Regularized Classification and Survival Analysis for Expression Profiling of Cancer

The objectives of this project are to develop novel statistical methods and computer packages for cancer classification and survival analysis using high-dimensional gene expression data and clinical measurements. Statistical methods that can handle high-dimensional problems in estimating the relationship between cancer clinical outcomes and genomic data will contribute to a better understanding of the genetic basis of cancer, better diagnoses, and better survival prediction, which in turn can potentially have an important impact on public health.

Study Period

January 1, 2008 - December 31, 2011

Acknowledgements

This study has been supported by R01 CA120988 from NCI, NIH (P.I.: Dr. Jian Huang, Department of Statistics and Actuarial Science, University of Iowa). We would like to thank members of the Yale Cancer Center and the University of Iowa Holden Comprehensive Cancer Center for insightful discussions.

Integrative Analysis of Genomic Data and Marker Identification

The goal of this study is to develop novel meta analysis and integrative analysis methods that can pool and analyze data from multiple heterogeneous genomic studies and select important markers. This study can make more efficient use of existing genomic data, yield more accurate predictions and more insight into the genomic mechanisms of disease occurrence and progression, and more generally benefit pharmacogenomic studies.

Presentations

  1. Integrative analysis of cancer genomic data. The 57th Session of the International Statistical Institute. August 18th, 2009.
  2. Identification of cancer-associated gene pathways from analysis of expression data. JSM. August 3rd, 2009.
  3. A Tale of Two Streets: Incorporating grouping structure in high dimensional data mining. Data Mining and Business Intelligence Conference. June 6th, 2009.
  4. Identification of genes associated with multiple cancers via integrative analysis. Division of Biostatistics, Washington University at St. Louis. April 3rd, 2009.
  5. Identifying common transcriptional profiles of neoplastic transformation and progression via integrative microarray analysis. Yale Center for Clinical Investigation. Sep. 29th, 2008.
  6. Regularized meta analysis of cancer microarray. NRC. July 30th, 2008.
  7. Data mining in cancer informatics. Renmin University. June 19th, 2008.
  8. Regularized microarray meta-analysis. ICSA. June 7th, 2008.
  9. Meta analysis of cancer microarray studies. Yale Cancer Center. May 14th, 2008.

Acknowledgements

This study has been supported by:

  • R03LM009828 from NLM/NIH (National Library of Medicine): Efficient microarray meta analysis and cancer biomarker selection. Funding period: 09/01/2009--08/31/2010. PI: Shuangge Ma.
  • CTSA award to Yale Center for Clinical Investigation: Identifying common transcriptional profiles of neoplastic transformation and progression via integrative microarray analysis. Funding period: 01/01/2008--06/30/2010. PI: R. Sherwin. Project PI: Shuangge Ma.

Gene Set (Pathway and Network) Based Genomic Data Analysis

The goal of this study is to develop statistical methods that make more efficient use of cancer genomic data by properly accounting for the clustering (pathway or network) structure of gene expressions and selecting predictive cancer biomarkers. Such methods can provide more insight into the genomic mechanisms of cancer occurrence and progression.

Oral Presentations

  1. Identification of cancer-associated gene pathways from analysis of expression data. JSM. August 3rd, 2009.
  2. Variable selection in the accelerated failure time model via the bridge method. IMS, China. July 3rd, 2009.
  3. A Tale of Two Streets: Incorporating grouping structure in high dimensional data mining. Data Mining and Business Intelligence Conference. June 6th, 2009.
  4. Two-level gene selection via group bridge penalization. Department of Statistics, Columbia University. Jan. 22nd, 2009.


Posters

  1. Gene network analysis and identification of lymphoma prognosis markers. 2010 Clinical and Translational Research and Education Meeting ACRT/SCTS Joint Annual Meeting.

Acknowledgements

This study has been supported by:

  • R03LM009754 from NLM/NIH (National Library of Medicine): Effective clustering penalized methods for genomic biomarker selection. Funding period: 08/01/2009--07/31/2010. PI: Shuangge Ma.
  • DMS-0904181 from NSF (DMS): Novel methods for pharmacogenomic data analysis using gene clusters. Funding period: 08/15/2009--08/14/2012. PI: Shuangge Ma. Co-PI: Michael Kosorok, Department of Biostatistics, UNC Chapel Hill.

We would like to thank members of Yale Cancer Center for insightful discussions.

Semiparametric Analysis

The long-term goal is to develop novel semiparametric analysis tools for biological, economic, demographic, and medical studies. In the study supported by DMS-0805984 from NSF, the goal is to investigate semiparametric two-part models and their applications in biomedical studies.

Presentations

  1. Interval censored data with a cured subgroup. Academy of Mathematical and System Science, Chinese Academy of Sciences. June 19th, 2009.
  2. Interval censored data with a cured group. Department of Mathematics, Washington University at St. Louis. April 2nd, 2009.

Acknowledgements

This study has been supported by DMS-0805984 from NSF: Novel Semiparametric Two-part Models: New Theories and Applications. Funding period: 07/01/2008--06/30/2011. PI: Shuangge Ma; Co-PI: Dr. Andrew (Xiao-Hua) Zhou, Department of Biostatistics, University of Washington.

Collaborative Research

Genetics of Functional Disability in Schizophrenia and Bipolar Illness

  • Study CSP #572 conducted by VA CT Healthcare System/CERC
  • Role: Senior genetic statistician
  • Key collaborators: Dr. John Concato

Cancer Genomics

  • Key collaborators: Drs. Yawei Zhang, Yong Zhu, Michael Krauthammer

Health Economics

  • Key collaborators: Dr. Jennifer Ruger

Cardiovascular Disease

  • Key collaborators: Dr. Richard Kronmal