New Resource to Help Scientists Better Classify Cancer Subtypes

Cancer prognosis has traditionally been informed by classification of tumors by their site of origin, with subtypes based on histologic features, morphologic grade, and American Joint Committee on Cancer/International Union Against Cancer TNM (tumor, nodes, metastasis) stage. Large-scale cancer genome projects, such as The Cancer Genome Atlas (TCGA), have further enhanced cancer classification by defining molecular subtypes for all major cancer types. However, these methods produce results intrinsic to specific TCGA datasets and do not carry over to other samples in clinical trials or research studies. An article published in Cancer Cell describes additional research that aimed to bridge the gap between the discovery of molecular subtypes in an existing cohort and the application of those subtype labels in the clinic.

On the basis of this research, the authors created an online resource of 737 publicly available, containerized predictive models for 26 cancer cohorts to expand the utility of TCGA subtypes using machine-learning approaches. Five different machine-learning approaches were used to produce distinct subtype classifier models incorporating 5 different data types and comprising 26 different cancer cohorts with 106 subtypes. The authors suggest that these tumor subtype classifiers will be useful in both prospective cancer research and in clinical practice.

High level
This new resource supports gene-based feature sets for creating compact cancer testing panels and kits to clinically subtype non-TCGA patient tumor samples. For most research applications, the classifier models can be applied to samples from other studies, even when only 1 data type is available, once the data have been transformed to match the range and distribution in the TCGA cohort. Deep-learning techniques raise the possibility of improving biological classification tasks, and the methods used in this research may inform approaches to fine-tuning large models to improve cancer subtype classification.
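
The summary does not say which transformation the authors applied. As an illustration only, the sketch below uses quantile mapping, one common way to bring an external measurement onto the range and distribution of a reference cohort. The function name `quantile_map` and the example values are hypothetical and are not taken from the published resource.

```python
from bisect import bisect_right


def quantile_map(external, reference):
    """Map external values onto a reference distribution by rank.

    Assumption: quantile mapping stands in for the unspecified
    "appropriate data transformation"; the published models may
    require a different procedure.
    """
    if not external or not reference:
        raise ValueError("both inputs must be non-empty")

    ext_sorted = sorted(external)
    ref_sorted = sorted(reference)
    n_ext, n_ref = len(ext_sorted), len(ref_sorted)

    mapped = []
    for value in external:
        # Number of external values <= this one (empirical rank, 1..n_ext).
        count = bisect_right(ext_sorted, value)
        # Ceiling of count * n_ref / n_ext gives the matching reference rank.
        ref_rank = (count * n_ref + n_ext - 1) // n_ext
        mapped.append(ref_sorted[ref_rank - 1])
    return mapped


# Hypothetical example: expression values for one gene on different scales.
reference_cohort = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
external_samples = [500.0, 10.0, 120.0]
transformed = quantile_map(external_samples, reference_cohort)
```

After the mapping, each external value takes the reference value at the same quantile, so sample order is preserved while the scale matches the reference cohort.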

Ground level
External cancer datasets that use various molecular assay platforms or include formalin-fixed, paraffin-embedded samples can be transformed so that these classifier models yield accurate subtype predictions. Clinicians should note that new and undocumented subtypes may be encountered in non-TCGA datasets, so it may be beneficial to assign an “unknown” label to a new sample when no existing subtype has a strong enough class prediction score.
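
The "unknown" fallback described above can be sketched as a simple score threshold. This is a minimal illustration, not the authors' procedure: the function name `assign_subtype`, the subtype names, and the 0.5 cutoff are all assumptions, and a suitable threshold would need to be chosen per model.

```python
def assign_subtype(scores, threshold=0.5):
    """Return the top-scoring subtype, or "unknown" if no score is strong enough.

    `scores` maps subtype label -> class prediction score.
    The default threshold of 0.5 is a hypothetical placeholder.
    """
    if not scores:
        return "unknown"
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else "unknown"


# Hypothetical prediction scores for two new samples.
confident = assign_subtype({"subtype_A": 0.90, "subtype_B": 0.10})
ambiguous = assign_subtype({"subtype_A": 0.40, "subtype_B": 0.35, "subtype_C": 0.25})
```

Here the first sample receives the label `subtype_A`, while the second is labeled `unknown` because its highest score falls below the cutoff.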