
Biometrics Section News for June 2019

1 June 2019

The following JSM 2019 short courses are cosponsored by the Biometrics Section:

Saturday, July 27

Reproducible Computing
Led by Colin Rundel
Success in statistics and data science is dependent on the development of both analytical and computational skills. This workshop will cover the following:

  • Recognizing the problems reproducible research helps address
  • Identifying pain points in getting your analysis to be reproducible
  • The role of documentation, sharing, version control, automation, and organization in making your research more reproducible
  • Introducing tools to solve these problems, specifically R, RStudio, RMarkdown, git, GitHub, and make
  • Strategies for scaling these tools and methods for larger, more complex projects

Workshop attendees will work through several exercises and get first-hand experience using relevant toolchains and techniques, including R/RStudio, literate programming with R Markdown, automation with make, and collaboration and version control with git/GitHub.
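For a flavor of the toolchain before the workshop, the following minimal sketch (our illustration, not course material; report.Rmd is a placeholder file name) shows how a single R command can regenerate an R Markdown report, the kind of step one would then automate with make or a continuous-integration job:

```r
# Re-render an R Markdown analysis from a plain R script so the same command
# can be repeated by make, a CI job, or a collaborator.
# "report.Rmd" is a placeholder; substitute your own analysis document.
library(rmarkdown)

set.seed(2019)                                     # fix the seed so stochastic results repeat
render("report.Rmd", output_file = "report.html")  # knit code, results, and narrative together
sessionInfo()                                      # record R and package versions with the output
```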

Statistical and Computational Methods for Microbiome and Metagenomics Data Analysis
Led by Curtis Huttenhower and Hongzhe Li
High-throughput sequencing technologies enable individualized characterization of microbiome composition and function. The human microbiome, defined as the community of microbes in and on the human body, affects human health and risk of disease by dynamically interacting with host diet, genetics, metabolism, and environment. The resulting data can potentially be used for personalized diagnostic assessment, risk stratification, disease prevention, and treatment. The microbiome has become one of the most active areas of research in the biomedical sciences, and new computational and statistical methods are being developed to understand the function of microbial communities. In this short course, we will give detailed presentations on statistical and computational methods for measuring important features of the microbiome from 16S rRNA and shotgun metagenomic sequencing data, and on how these features are used as an outcome of an intervention, a mediator of a treatment, and a covariate to be controlled for when studying disease/exposure associations. The statistics underlying some of the most popular tools in microbiome data analysis will be presented—including the bioBakery tools for meta’omic profiling and tools for microbial community profiling (MetaPhlAn, HUMAnN, DADA2, DEMIC, etc.)—together with advanced methods for compositional data analysis and kernel-based association analysis.
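As a small taste of the compositional-data ideas mentioned above, the following sketch (our illustration with toy counts, not drawn from the course tools) applies the centered log-ratio (CLR) transform, a common first step before downstream modeling of relative abundances:

```r
# Toy taxon counts for two samples; values are illustrative only.
counts <- matrix(c(120, 30, 850,
                   400, 55, 545),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("s1", "s2"), c("taxonA", "taxonB", "taxonC")))

props <- counts / rowSums(counts)            # close the counts to relative abundances
clr   <- log(props) - rowMeans(log(props))   # centered log-ratio: subtract each sample's mean log
clr                                          # CLR values are free of the unit-sum constraint
```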

Sunday, July 28

Regression Modeling Strategies
Led by Frank Harrell
All standard regression models have assumptions that must be verified for the model to have power to test hypotheses and for it to be able to predict accurately. Of the principal assumptions (linearity, additivity, distributional), this course will emphasize methods for assessing and satisfying the first two. Practical but powerful tools are presented for validating model assumptions and presenting model results. The course provides methods for estimating the shape of the relationship between predictors and response using the widely applicable approach of augmenting the design matrix with restricted cubic splines. Even when assumptions are satisfied, overfitting can ruin a model’s predictive ability for future observations. Methods for data reduction will be introduced to deal with the common case in which the number of potential predictors is large in comparison with the number of observations. Methods of model validation (bootstrap and cross-validation) will be covered, as will auxiliary topics such as modeling interaction surfaces, variable selection, overly influential observations, collinearity, and shrinkage. A brief introduction to the R rms package for handling these problems will also be given. The methods covered apply to almost any regression model, including ordinary least squares, logistic regression, ordinal regression, quantile regression, longitudinal data analysis, and survival models.
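For readers who have not used the rms package, a minimal sketch along these lines (simulated data, our illustration rather than course material) fits a restricted cubic spline and bootstrap-validates the fit:

```r
# Fit a logistic model with a 4-knot restricted cubic spline and check overfitting
# via the bootstrap; the data are simulated for illustration.
library(rms)

set.seed(1)
d   <- data.frame(x = runif(200, 0, 10))
d$y <- rbinom(200, 1, plogis(-2 + 0.8 * sin(d$x)))       # nonlinear true relationship

dd <- datadist(d); options(datadist = "dd")               # rms bookkeeping for summaries/plots
fit <- lrm(y ~ rcs(x, 4), data = d, x = TRUE, y = TRUE)   # spline-expanded design matrix
validate(fit, B = 200)                                    # bootstrap estimate of optimism
```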

Functional Data Analysis for Wearables: Methods and Applications
Led by Vadim Zipunnikov and Jeff Goldsmith
Technological advances have made many wearable devices available for use in large epidemiological cohorts, national biobanks, and clinical studies. This opens up a tremendous opportunity for clinical and public health researchers to unveil previously hidden but pivotal physiological and behavioral signatures and relate them to disability and disease. Understanding, interpreting, and analyzing the complex multimodal and multilevel data produced by such devices therefore become crucial.

The main goal of this workshop is to present an overview of the functional data analysis methods for modeling physical activity data, review their strengths and limitations, and demonstrate their implementation in R packages refund and mgcv. We will also examine several nonfunctional approaches for extracting informative and interpretable features from wearable data. We will discuss applications in epidemiological studies such as the Head Start Program and National Health and Nutrition Examination Survey and a clinical study of congestive heart failure.
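As a hint of what the refund workflow looks like, here is a minimal sketch (simulated activity-like curves, not data from the studies above) of functional principal component analysis:

```r
# Functional PCA on simulated daily "activity profiles" observed on a common grid.
library(refund)

set.seed(42)
t_grid <- seq(0, 1, length.out = 144)                    # e.g., 10-minute epochs over a day
n      <- 50
Y <- outer(rnorm(n, 1, 0.3), sin(2 * pi * t_grid)) +     # subject-specific diurnal pattern
     matrix(rnorm(n * length(t_grid), sd = 0.2), n)      # measurement noise

fp <- fpca.face(Y)                      # smoothed functional principal component analysis
round(fp$evalues / sum(fp$evalues), 3)  # proportion of variance explained by each component
```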

Tuesday, July 30

Measuring the Impact of Nonignorable Missing Data
Led by Daniel Heitjan and Hui Xie
The popular but typically unverifiable assumption of ignorability greatly simplifies analyses with incomplete data, both conceptually and computationally. We say missingness is ignorable when the probability that an observation is missing depends only on fully observed information and nonignorable when the probability that an observation is missing depends on the value of the observation, even after conditioning on available design variables and covariates.

For example, in a clinical trial, the data are plausibly nonignorably missing when the subjects who drop out are those for whom the drug is either ineffective or excessively toxic. The possibility that the missing observations in a study are the result of a nonignorable mechanism casts doubt on the validity of conclusions based on the assumption of ignorability. Unfortunately, it is generally impossible to robustly assess the validity of this assumption with just the data at hand.

One way to address this problem is to conduct a local sensitivity analysis: essentially, one recomputes the estimated parameters of interest under models that slightly violate the assumption of ignorability. If the parameters change only modestly under violation of the assumption, then it is safe to proceed with an ignorable model. If they change drastically, then a simple ignorable analysis is of questionable validity.

To conduct such a sensitivity analysis in a systematic and efficient way, we have developed a measure we call the index of local sensitivity to nonignorability (ISNI), which evaluates the rate of change of parameter estimates in the neighborhood of an ignorable model. Computation of ISNI is straightforward and avoids the need to estimate a nonignorable model or to posit a specific magnitude of nonignorability. We have developed a suite of statistical methods for ISNI analysis, now implemented in an R package named isni.
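To make the idea concrete without invoking the isni package itself, the toy sketch below uses a simple pattern-mixture formulation in which missing values are assumed to sit delta units away from observed ones; the slope of the estimate near delta = 0 plays the role of a local sensitivity index (our illustration only, not the ISNI formulas):

```r
# Local sensitivity of an estimated mean to a nonignorability parameter delta.
set.seed(7)
y        <- rnorm(200, mean = 5)
observed <- runif(200) < 0.7                 # roughly 30% of values treated as missing
p_mis    <- mean(!observed)

mean_under <- function(delta) {              # overall mean if missing values average delta higher
  mean(y[observed]) + p_mis * delta
}

sapply(seq(-1, 1, by = 0.5), mean_under)         # estimate across slight departures from ignorability
(mean_under(0.01) - mean_under(-0.01)) / 0.02    # numerical slope at delta = 0 (equals p_mis here)
```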

In this half-day short course, we will describe these methods and train users to apply them to inform evaluations of the reliability of empirical findings when data are incomplete.

An Introduction to the Joint Modeling of Longitudinal and Survival Data, with Applications in R
Led by Dimitris Rizopoulos
In follow-up studies, different types of outcomes are typically collected for each subject. These include longitudinally measured responses (e.g., biomarkers) and the time until an event of interest occurs (e.g., death, dropout). Often, these outcomes are analyzed separately, but on many occasions it is of scientific interest to study their association. This type of research question has given rise to the class of joint models for longitudinal and time-to-event data. These models constitute an attractive paradigm for the analysis of follow-up data, mainly applicable in two settings: when the focus is on a survival outcome and we wish to account for the effect of endogenous time-dependent covariates measured with error, and when the focus is on the longitudinal outcome and we wish to correct for nonrandom dropout.
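Although the abstract does not name specific software, one widely used implementation is the instructor's JM package; the sketch below follows the structure of its documented AIDS example (variable names come from that bundled data set and may differ from the course materials):

```r
# Joint model: longitudinal CD4 counts linked to time to death.
library(JM)   # loads nlme and survival as well

# Longitudinal submodel: subject-specific CD4 trajectories
lmeFit <- lme(CD4 ~ obstime, random = ~ obstime | patient, data = aids)

# Survival submodel: one row per patient; x = TRUE keeps the design matrix
coxFit <- coxph(Surv(Time, death) ~ drug, data = aids.id, x = TRUE)

# Joint model associating the current CD4 value with the hazard of death
jointFit <- jointModel(lmeFit, coxFit, timeVar = "obstime")
summary(jointFit)
```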

This full-day course is aimed at applied researchers and graduate students and will provide a comprehensive introduction to this modeling framework. We will explain when these models should be used in practice, what the key assumptions behind them are, and how they can be used to extract relevant information from the data. Emphasis is placed on applications; at the end of the course, participants will be able to define appropriate joint models to answer their questions of interest.

This course assumes knowledge of basic statistical concepts such as standard statistical inference using maximum likelihood and regression models. Also, basic knowledge of R would be beneficial but is not required. Participants are required to bring their laptop with the battery fully charged. Before the course, instructions will be sent for installing the required software.

Adaptive Treatment Strategies: An Introduction to Statistical Approaches for Estimation
Led by Erica Moodie
Evidence-based medicine relies on using data to provide recommendations for effective treatment decisions. However, in many settings, response is heterogeneous across patients. Patient response may also vary over time, and physicians are faced with the daunting task of making sequential therapeutic decisions having seen few patients with a given clinical history.

Adaptive treatment strategies (ATS) operationalize the sequential decision-making process in the precision medicine paradigm, offering statisticians principled estimation tools that can be used to incorporate patients’ characteristics into a clinical decision-making framework so as to adapt the type, dosage, or timing of treatment according to patients’ evolving needs.

This half-day course will provide an overview of precision medicine from the statistical perspective. We will begin with a discussion of relevant data sources. We will then turn our attention to estimation and consider multiple approaches—and their relative strengths and weaknesses—to estimating tailored treatment rules in a one-stage setting. Next, we will consider the multi-stage setting and the inferential challenges in this area. Relevant clinical examples will be discussed, as will available software tools.
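As one concrete example of a one-stage approach, the sketch below uses simple Q-learning (an outcome regression with a treatment-by-covariate interaction) on simulated data; it is our illustration of the general idea, not the course's specific material:

```r
# One-stage Q-learning: estimate a tailored treatment rule from a randomized sample.
set.seed(123)
n <- 500
x <- rnorm(n)                                      # a patient characteristic (e.g., baseline severity)
a <- rbinom(n, 1, 0.5)                             # randomized treatment indicator
y <- 1 + 0.5 * x + a * (1 - 1.5 * x) + rnorm(n)    # treatment benefit depends on x

qfit <- lm(y ~ x * a)                              # Q-function: outcome model with interaction
blip <- coef(qfit)["a"] + coef(qfit)["x:a"] * x    # estimated treatment effect for each patient
rule <- as.numeric(blip > 0)                       # treat when the estimated effect is positive
table(rule)                                        # how many patients the rule would treat
```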
