Triple-negative breast cancer (TNBC) is a subtype with heterogeneous patient outcomes. Approximately 40% of patients experience rapid relapse, while the remaining patients have long-term disease-free survival. To determine if there are molecular differences between primary tumors that predict prognosis, we performed RNA-seq on 47 macrodissected tumors from newly diagnosed patients with TNBC (n = 47; 22 relapse, 25 no relapse; follow-up median, 8 years; range, 2–11 years). We discovered that expression of the MHC class II (MHC II) antigen presentation pathway in tumor tissue was the most significant pathway associated with progression-free survival (HR, 0.36; log-rank P = 0.0098). The association between MHC II pathway expression and good prognosis was confirmed in a public gene expression database of 199 TNBC cases (HR, 0.28; log-rank P = 4.5 × 10⁻⁸). Further analysis of immunohistochemistry, laser-capture microdissected tumors, and TNBC cell lines demonstrated that tumor cells, in addition to immune cells, aberrantly express the MHC II pathway. MHC II pathway expression was also associated with B-cell and T-cell infiltration in the tumor. Together, these data support the model that aberrant expression of the MHC II pathway in TNBC tumor cells may trigger an antitumor immune response that reduces the rate of relapse and enhances progression-free survival. Cancer Immunol Res; 4(5); 390–9. ©2016 AACR.
Triple-negative breast cancer (TNBC) describes a clinical subtype of invasive breast cancer tumors that lack expression of the estrogen receptor (ER−) and the progesterone receptor (PR−) and do not overexpress HER2. These tumors behave aggressively and are not candidates for ER or HER2/Neu-targeted therapy. Most patients are treated with surgery and receive adjuvant or neoadjuvant chemotherapy with or without local radiotherapy. Patient outcome is heterogeneous: 42% of patients relapse rapidly, with a peak at 3 years from diagnosis, whereas the relapse rate is low from years 5 to 10 (1). TNBC tumor types vary in their genomic makeup, with the majority categorized as basal-like (BL) subtype. In general, BL and non-BL subtypes share similarly aggressive biology (2).
Over the past 15 years, a major research effort has been directed at using genomic techniques to analyze the biology of breast cancer and to establish genomic signatures to assess prognosis (3). This work has primarily produced prognostic gene expression signatures derived from microarray genomic platforms (Affymetrix, Illumina, etc.), with more recent studies using RNA sequencing (RNA-seq) technology (4). The approach has been most successful in ER+ breast cancer. Some of these genomic assays have received FDA approval and are used widely to assist therapy decision making in ER+ disease (5). Prognostic gene expression signatures are not as well developed for TNBC and are not used in clinical practice. Several large multigene signatures have performed well in multivariate analysis, which indicates that gene expression differences between tumors are associated with different clinical outcomes (6–9).
The presence of tumor-infiltrating lymphocytes (TIL) can be detected by morphology, immunohistology, and genomic methodologies. The presence of TILs in TNBC has been associated with good prognosis in several studies (9–12). The conclusions from many of these TIL studies are that the patient's immune response has a positive effect on progression-free survival (PFS), therapy response, and overall survival, especially in TNBC (13, 14). It is unclear what differences between TNBC tumors lead to differences in lymphocyte infiltration.
In this study, we used RNA-seq technology, which has multiple advantages over microarray genomic platforms (15, 16), to examine gene expression in TNBC. We designed this study to determine which genes had significantly different expression between patients who relapsed and those who did not relapse during the follow-up period (median, 8 years; range, 2–11 years). Whole-transcriptome analysis of macrodissected tumor tissue revealed that expression of the MHC class II antigen presentation pathway (MHC II) in TNBC tumors was the most significant pathway associated with good clinical outcomes in our dataset. We confirmed the association between MHC II pathway expression and good prognosis in a public gene expression database. We performed further analysis of tumor immunohistochemistry, laser-capture microdissected tumors, and TNBC cell lines to demonstrate that tumor cells, in addition to immune cells, aberrantly express the MHC II pathway. We found that expression of the MHC II pathway is correlated with the presence of a TIL gene expression signature in the same tumors. This study provides a means to assess prognosis in TNBC and may also provide a coherent mechanism for the generation of endogenous antitumor immunity in patients with good clinical outcomes.
Materials and Methods
The Tumor Procurement Shared Facility of the University of Alabama at Birmingham (UAB) Comprehensive Cancer Center has an Institutional Review Board–approved protocol for collection of tumor and normal tissue samples for research purposes using deidentified clinical data and laboratory analysis. TNBC breast cancer tissues (n = 47) were selected for analysis on the basis that the tumors were ER and PR negative, HER2/Neu was not overexpressed, snap-frozen tissue was available, there was adequate patient follow-up (>24 months), and the patient had received no anticancer therapy prior to tissue collection.
The frozen tumor tissue underwent macrodissection by a board-certified pathologist (W.E. Grizzle; see Supplementary Data). This process included taking serial frozen sections, staining them with hematoxylin and eosin, and estimating tumor cell content. Areas of the specimen that contained uninvolved breast and/or leukocytic infiltration were removed to enrich for the malignant cells in the specimen. The deidentified tumor specimens had >50% tumor nuclei and were shipped on dry ice to HudsonAlpha Institute for Biotechnology (Huntsville, AL). More details are provided in Supplementary Methods.
The 47 tumor specimens were weighed and underwent RNA extraction (see Supplementary Data). RNA-seq libraries were constructed (17) and were quantified using the Qubit dsDNA High Sensitivity Assay Kit and the Qubit 2.0 fluorometer (Invitrogen). Three barcoded libraries were pooled in equimolar quantities per sequencing lane on an Illumina HiSeq 2000 sequencing machine. They were sequenced using paired-end 50-bp reads and a 6-bp index read to a depth of at least 50 million read pairs per library. The RNA-seq data are publicly available through GEO Accession GSE58135 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE58135).
RNA-seq data analysis
Gene expression values (fragments per kilobase of transcript per million, FPKM) were calculated using TopHat v 1.4.1 (18), GENCODE version 9 (19), BEDtools (20), and Cufflinks 1.3.0 with -u option (ref. 21; see Supplementary Data). We performed unsupervised clustering on normalized gene read counts to identify subclusters of samples within our dataset using the ConsensusClusterPlus R package (22). TNBC subtype of each sample was determined using TNBCType (23). The SAMseq function was used to perform supervised analysis to identify genes differentially expressed between tumors from patients who did or did not relapse with q values of <5% (24). Kaplan–Meier curves and survival analysis were performed using RNA-seq FPKM values and an R script (25). The Supplementary Data section contains more details about these bioinformatics analyses.
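As a rough illustration of the FPKM unit used throughout this analysis, the following minimal Python sketch applies the textbook definition to a single transcript. The function name and the example numbers are hypothetical. Cufflinks estimates abundances with a statistical model (including the multi-read correction enabled by -u), so its output will not generally match this simple ratio.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments.

    Textbook definition only; this is not how Cufflinks computes its estimates.
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("transcript length and library size must be positive")
    kilobases = transcript_length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragments / kilobases / millions


# Hypothetical numbers: 500 fragments assigned to a 2,000-bp transcript
# in a library of 50 million mapped fragments.
example_fpkm = fpkm(500, 2_000, 50_000_000)
print(example_fpkm)
```

Normalizing by both transcript length and library size is what allows expression values to be compared across genes and across the pooled libraries.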
Public microarray data analysis
Kaplan–Meier and survival analyses were performed on public microarray data using the Kaplan–Meier Plotter tool (http://kmplot.com; ref. 26). Patients were censored at the follow-up threshold (8–10 years). Only JetSet best probe sets were used for each gene in the microarray data analysis (27). Analysis was restricted to the 199 patients whose tumors were ER−, PR−, and were classified as basal intrinsic breast cancer subtype (25). Basal TNBC tumors were identified based on the St. Gallen criteria (28) using the procedure described by the developers of the Kaplan–Meier Plotter tool (29).
Tumor versus stroma gene expression
Five archived deidentified TNBC tumor specimens underwent standard immunohistochemical analysis with anti-CD74 (Leica/Novocastra) and anti–HLA-DPB1 (Sigma-Aldrich). An anatomic pathologist estimated the fraction of antibody-positive tumor cells and the localization of the staining (see Supplementary Data). To examine gene expression by epithelial tumor cells versus stroma, we used a public laser-capture microdissection dataset (GEO-GSE5847; ref. 30). The raw dataset (.cel and matrix files) was uploaded to Partek Genomics Suite (PGS) for data background subtraction, quality control, and robust multiarray average normalization. Of the 31 patients in this database, we selected the 14 patients who had invasive TNBC to measure the gene expression in their epithelial tumor cells.
Cell line interferon gamma treatment and RNA-seq
The TNBC cell lines MDA-MB-468 (ATCC HTB-132) and MDA-MB-436 (ATCC HTB-130) were purchased from the ATCC, and RNA-seq was performed 3 months after purchase (21 passages). Due to the short interval between the purchase and the RNA-seq experiment, no additional authentication was performed. MDA-MB-468 was cultured in DMEM, 10% FBS, 1 mmol/L sodium pyruvate, and 1× nonessential amino acids. MDA-MB-436 was cultured in DMEM, 10% FBS, 10 μg/mL insulin, and 16 μg/mL glutathione. Cells were treated with 10 U/mL Recombinant Human Interferon-γ (Thermo Fisher product # PHC4031) in duplicate for 24 hours. Cells were lysed with 350 μL RLT (Qiagen) plus 1% BME, RNA was extracted from the lysate using the Norgen Animal Tissue RNA Purification Kit, and RNA-seq libraries were constructed using the KAPA Stranded mRNA-seq Kit. Libraries were sequenced on the Illumina HiSeq 2500 with 10 samples per lane. Data were aligned to the UCSC hg19 transcriptome using HISAT (31). HTSeq was used to calculate gene counts (32). DESeq2 (33) was used to calculate fold changes between the two replicates of IFNγ-treated cells and the two replicates of untreated cells for each cell line.
Statistical analysis
Descriptive analysis was provided for patients' characteristics, including Student t test and χ² statistics. The individual MHC II gene expression values were log2-transformed to better approximate a normal distribution (34, 35). Expression of individual genes was classified as high or low relative to the median, tertiles, or quartiles. PFS was defined as the time from diagnosis to the first documented disease progression or death due to any cause, whichever occurred first. Subjects without relapse were considered censored. The Kaplan–Meier method and log-rank test were used to compare PFS between expression groups. The HR and its 95% confidence interval (CI) from the Cox model (36) with the Efron method for tied event times were reported. Pearson correlation coefficients were estimated to examine the correlation among individual genes. The association between MHC II gene expression and good prognosis was independently significant in multivariate analysis by Cox regression analysis using the variables of age, race, stage, tumor size, node status, adjuvant therapy, and breast cancer subtype. In addition, high levels of MHC II gene expression remained associated with a reduced HR for relapse even when controlling for the effects of stage (which includes the variables of tumor size and node status).
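The Kaplan–Meier estimate and two-group log-rank comparison described above can be sketched in plain Python as follows. This is a minimal illustration, not the R script used in the study: the function names and the follow-up times are hypothetical, and the Cox model with the Efron method is not shown.

```python
from math import erfc, sqrt


def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times: follow-up in months; events: 1 = relapse or death, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all subjects that share this follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P value)."""
    pooled = list(zip(times_a, events_a)) + list(zip(times_b, events_b))
    event_times = sorted({t for t, e in pooled if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return chi2, erfc(sqrt(chi2 / 2))


# Hypothetical follow-up data (months) for high- and low-expression groups.
high_t, high_e = [30, 96, 100, 110, 120], [1, 0, 0, 0, 0]
low_t, low_e = [10, 14, 18, 25, 90], [1, 1, 1, 1, 0]
low_curve = kaplan_meier(low_t, low_e)
chi2, p_value = logrank(high_t, high_e, low_t, low_e)
```

Censored subjects leave the risk set without lowering the survival estimate, which is how patients without relapse contribute follow-up time to the analysis.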
Results
Forty-seven women with TNBC were selected for this case series with a median follow-up time of 8 years (range, 2–11 years; Table 1). As expected, the 22 patients who had disease relapse had significantly higher stage (P = 0.0295), tumor size (P = 0.0029), and node involvement (P = 0.0394) compared with patients who did not experience disease relapse. Both groups received similar adjuvant therapy. Anthracycline combinations were used in the majority of patients. Five relapsed patients and two nonrelapsed patients received no adjuvant treatment. Similar numbers of patients had conservative surgical management and radiotherapy. The median time to relapse was 18.5 months (range, 8–27 months), and the follow-up of nonrelapsed patients had a median of 96 months (range, 25–137 months). Racial makeup of the two groups was similar, and overall 81% of the tumors were BL by the St. Gallen criteria (28), a proportion that was similar in both groups. This case series generally represents the diverse presentation and outcomes that are seen in TNBC patients in clinical practice.
Unsupervised consensus clustering analysis
RNA-seq was performed on macrodissected flash-frozen tumor specimens that were surgically resected from the women in this study before they began chemotherapy or radiotherapy. To determine if there were molecular differences between patients' tumors that are associated with clinical outcomes, we first performed unsupervised consensus clustering analysis on whole-transcriptome data. The analysis identified three main clusters (1, 2, and 3) composed of 20, 17, and 10 patient tumors, respectively (Fig. 1A). The cluster analysis did not simply reflect seven previously defined TNBC subtypes (23, 37); each cluster we identified contained multiple TNBC subtypes, and the TNBC subtypes were represented in multiple clusters (Supplementary Fig. S1). Figure 1B and C provide the PFS for the three cluster groups. Cluster 2 had improved PFS for the group as a whole (P = 0.023) and when subdivided based on lymph node involvement (P = 0.013). Patient tumors in Cluster 2 had higher expression of immunomodulatory genes than tumors in the other clusters (Supplementary Table S1). The rate of relapse in Cluster 2 (3/17; 18%) was significantly lower than in Cluster 1 (12/20; 60%) and Cluster 3 (7/10; 70%); P = 0.0067. This analysis is consistent with findings from previous reports that a subset of tumors has increased expression of many different immune-related pathways, which is associated with better clinical outcomes (14, 37, 38).
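The relapse counts per cluster quoted above can be compared with a contingency-table test, sketched here in Python. The text does not state which test produced P = 0.0067, so the Pearson chi-square test used below is an assumption and is not expected to reproduce that value exactly; on these counts it yields P < 0.01. The function name is hypothetical.

```python
from math import exp


def chi_square_2xk(relapsed, totals):
    """Pearson chi-square statistic for relapse (yes/no) across k groups."""
    not_relapsed = [n - r for r, n in zip(relapsed, totals)]
    grand_total = sum(totals)
    stat = 0.0
    for row in (relapsed, not_relapsed):
        row_total = sum(row)
        for observed, group_size in zip(row, totals):
            expected = row_total * group_size / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


# Relapses and cluster sizes for Clusters 1, 2, and 3, as given in the text.
stat = chi_square_2xk(relapsed=[12, 3, 7], totals=[20, 17, 10])

# Three clusters give 2 degrees of freedom, for which the chi-square
# survival function has the closed form exp(-x / 2).
p_value = exp(-stat / 2)
```

With expected counts near 5 in the smallest cluster, an exact test would be a reasonable alternative to the chi-square approximation.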
Gene expression analysis
We next performed an analysis to determine which genes in the transcriptome had the most significant expression differences between tumors from patients who relapsed compared with tumors from patients who did not relapse. Table 2 provides the list of 24 genes identified with an FDR of 5% (q value < 0.05). A heatmap of the 24 genes (Supplementary Fig. S2) illustrated that each of the 24 genes exhibited higher average expression across the tumors from patients who did not relapse. Eleven of these genes are major components of the MHC II antigen presentation pathway: CIITA, CD74, HLA-DPA1, HLA-DPB1, HLA-DPB2, HLA-DQA1, HLA-DRB1, HLA-DRB5, HLA-DRB6, CTSH, and NCOA1 (Supplementary Fig. S3; Table 2). The expression of these MHC II pathway genes is highly correlated across patient samples (P < 0.0001), which suggests that they undergo coordinated expression regulation in these tumors (Supplementary Table S2). To our knowledge, this is the first time that the coordinated expression of the MHC II pathway genes has been reported as the most significant independent predictor of good prognosis in TNBC.
The 11 MHC class II genes that we identified as significantly associated with good prognosis include each step in the MHC II antigen presentation process from the master transcriptional transactivator, CIITA, to the antigen-presenting complex components, HLA-DP, DQ, and DR. HLA-DM is another crucial pathway member that allows HLA binding of peptides prior to display on the antigen-presenting cell surface. While the HLA-DMA and HLA-DMB genes did not reach the strict statistical threshold for genome-wide significance that the other 11 genes attained, they were highly correlated with CD74 and other significant genes in the pathway (r = 0.84 and 0.89, respectively; P < 0.001 for both). This suggests that HLA-DM is also coordinately regulated with other MHC II pathway members in TNBC tumors.
To further characterize the relationship between the 13-gene MHC II pathway signature and prognosis, we determined the statistical association between gene expression and PFS. The average expression value of the 13 MHC II genes in our 47 patients was used to generate Kaplan–Meier curves which demonstrated a significant association between MHC II pathway expression and PFS (Fig. 2A; log-rank P = 0.0098; HR, 0.36).
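A pathway-level score of this kind can be sketched as the per-sample mean of the signature genes followed by a split at the median, as in the Python example below. Several details are assumptions made for illustration: averaging is done on log2-transformed values with a pseudocount of 1, only two of the 13 genes are shown, and the FPKM values and sample names are invented.

```python
from math import log2
from statistics import mean, median


def signature_scores(expression, genes, pseudocount=1.0):
    """Map {sample: {gene: FPKM}} to {sample: mean log2(FPKM + pseudocount)}.

    The pseudocount (an assumption) keeps log2 defined for zero expression.
    """
    return {
        sample: mean(log2(values[gene] + pseudocount) for gene in genes)
        for sample, values in expression.items()
    }


def split_at_median(scores):
    """Label each sample 'high' if its score exceeds the median, else 'low'."""
    cutoff = median(scores.values())
    return {s: ("high" if v > cutoff else "low") for s, v in scores.items()}


# Hypothetical FPKM values for two signature genes in four tumors.
expression = {
    "T1": {"CIITA": 1.0, "CD74": 3.0},
    "T2": {"CIITA": 7.0, "CD74": 15.0},
    "T3": {"CIITA": 0.0, "CD74": 1.0},
    "T4": {"CIITA": 31.0, "CD74": 63.0},
}
scores = signature_scores(expression, genes=["CIITA", "CD74"])
groups = split_at_median(scores)
```

The resulting high and low groups are the inputs to a Kaplan–Meier comparison; the same scores could instead be divided into tertiles or quartiles.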
Given the strong correlation among overexpressed MHC II pathway genes, we examined the association between PFS and expression on a single-gene basis. A summary of HR values for 10 of the MHC II differentially expressed genes is provided in Supplementary Table S2. High expression of CIITA or CD74 alone was significantly associated with PFS (CIITA log-rank P = 0.0002; HR, 0.167; Fig. 2B; CD74 log-rank P = 0.0164; HR, 0.349; Fig. 2C). When CIITA gene expression is classified as either above or below the median, it is an independent predictor for PFS (P = 0.008) by multivariable Cox regression analysis. When controlling for tumor stage, the HR for high versus low CIITA was 0.147 (CI, 0.048–0.450). Similarly, CD74 was an independent predictor for PFS (P = 0.0322) with an HR of 0.362 (CI, 0.143–0.917) after adjusting for tumor stage. When gene expression values were divided into tertiles (high, intermediate, and low values), only two of the 16 patients in the highest CIITA tertile relapsed, including one who relapsed at >90 months. In the lowest CD74 tertile, 12 of 16 patients relapsed, all within 25 months (Supplementary Fig. S4A and S4B). Together, these statistical analyses demonstrate that expression of the MHC II pathway genes is strongly associated with PFS.
Confirmation that MHC II expression is associated with a good prognosis
To determine if the association between MHC II expression and prognosis was specific to our patient case series or RNA-seq methods, we sought to confirm this result in another dataset. We examined a large meta-analysis of Affymetrix microarray data that were assembled to encompass gene expression profiles from all available breast cancer studies that had adequate clinical follow-up (25). This database aggregates the gene expression data and clinical follow-up from samples that were collected for many different studies. We analyzed 199 patients in this meta-analysis dataset with ER−, PR−, and basal-intrinsic subtype tumors and examined the expression levels of our 13 MHC II genes. One of the MHC II genes was not represented in this database (HLA-DPB2). The average expression value of the remaining 12 MHC II genes had a striking association with PFS, with a log-rank P = 4.5 × 10⁻⁸ and an HR of 0.28 (0.17–0.45), as shown in Fig. 3A. Similar to the results in our patient data, expression of individual MHC II genes was significantly associated with PFS, as shown for CD74 in Fig. 3B [log-rank P = 1.9 × 10⁻⁶; HR, 0.31 (0.18–0.51)]. Despite the differences in gene expression measurement technology and the multiple institutions and studies included in the public meta-dataset, MHC II expression was confirmed to be strongly associated with TNBC prognosis.
Tumor cell expression of MHC II genes
Classically, MHC II antigen processing and presentation are attributed to dendritic cells, B cells, and macrophages, which are found in tumor stroma, lymph nodes, and spleen. To determine whether the prognostic MHC II gene expression signature was derived from tumor cells or surrounding cells in the tumor sample, we performed immunohistochemistry on five randomly selected TNBC specimens. We assessed staining for CD74 and HLA-DPB1 in the malignant epithelium, recording the percentage of tumor cells stained and the pattern of staining (cytoplasmic, membranous, or both), and scoring the staining intensity as weak, moderate, or strong with respect to background lymphocytes, which served as an internal control. We found that all five TNBC tumor specimens had CD74 protein expression in tumor cells. This staining was cytoplasmic, membranous, or both, with 5% to 90% of tumor cells showing immunoreactivity that was weak to moderate in intensity (Fig. 4A). HLA-DPB1 protein expression was noted in two of five TNBC tumors, with weak to moderate cytoplasmic and membranous staining in 20% and 40% of tumor cells, respectively (Fig. 4B). These results are consistent with previous observations that MHC II proteins can be detected in TNBC tumor cells (39–43).
In addition, we examined the expression of MHC II genes in laser-capture microdissected breast tumor tissues using a publicly available Affymetrix microarray dataset (ref. 30; GSE5847). In 14 patients with TNBC, the range of values in the tumor cells for HLA-DPA1 was 6.70 to 11.48 RMA units and for HLA-DRB1 was 11.35 to 13.12 RMA units. Expression of MHC II genes did not differ significantly between stroma and epithelium by either unpaired or paired t test. These analyses further support the conclusion that TNBC epithelial tumor cells can express MHC II genes.
In professional antigen-presenting cells (APC), interferon gamma (IFNγ) is the signal that induces CIITA to transcriptionally activate the MHC II antigen presentation pathway. We analyzed breast cancer cell line RNA-seq data from a previous study (44) and identified two TNBC cell lines that had little or no expression (FPKM < 0.5) of the prognostic MHC II pathway genes in standard media conditions. We then treated these cell lines with IFNγ and performed RNA-seq. We found that the majority of the 13 prognostic MHC II pathway genes were highly expressed after IFNγ induction in both TNBC cell lines (Fig. 5). This result confirms findings from previous reports (41, 45, 46) that TNBC tumor cells can express components of the MHC class II antigen presentation pathway and demonstrates that induction of this pathway can occur through the same signaling pathway that activates APCs.
Tumor MHC II gene expression correlation with infiltrating lymphocytes
The strong association between tumor cell MHC II pathway expression and PFS suggests that antitumor immunity is involved in conferring the good prognosis. To test if an antitumor immune response was associated with the MHC II–positive tumors, we assessed the correlation of representative MHC II genes with the B- and T-cell gene signatures used by West and colleagues (12) to identify TILs. As can be seen in our dataset (Table 3A) and the public database (Table 3B), there is a substantial correlation of MHC II gene expression with B-cell and T-cell genes in both datasets. In general, the MHC II gene correlations were higher with the T-cell genes than the B-cell genes (CD19 and CD20). This observation is consistent with MHC II antigen presentation inducing activation of T cells.
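The gene-by-gene correlations summarized in Table 3 rest on the Pearson coefficient, which can be computed as in this minimal Python sketch. The expression values below are invented log2 values for one MHC II gene and one T-cell marker gene across six tumors; they are not data from this study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical log2 expression values across six tumors.
mhc2_gene = [2.1, 3.4, 5.0, 6.2, 7.9, 8.5]
tcell_gene = [0.8, 1.9, 2.7, 4.1, 4.6, 5.8]
r = pearson_r(mhc2_gene, tcell_gene)
```

A coefficient near 1 indicates that tumors with higher MHC II expression also tend to have higher lymphocyte marker expression; it does not by itself establish the direction of the relationship.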
Discussion
We demonstrated that coordinated expression of the MHC class II antigen presentation pathway occurs in a subset of TNBC patients' tumors and is associated with TILs and long-term PFS. Previous studies have reported individual components of these results, including the correlation between TILs and good prognosis (10–12, 14), the presence of an immunomodulatory gene expression signature in TNBC (7, 9, 37), and expression of various individual HLA proteins in tumor cells (39, 40, 42, 43, 46). Here, however, we have linked these observations through a specific coherent model that suggests a mechanism explaining why a subset of TNBC patients has long-term PFS. Based upon the data collected in this study, we propose that a subset of TNBC patients have aberrant expression of the MHC II pathway in their tumor cells that results in the presentation of tumor-specific neoantigens to CD4+ T cells, which become activated and induce the recruitment of other TILs. This TIL invasion may reflect the induction of an antitumor immune response that reduces the rate of relapse in patients after treatment of their primary tumor. This model suggests that endogenous antitumor immunity plays an important role in TNBC prognosis.
Although we associated this mode of endogenous antitumor immunity with prognosis in TNBC patients, the principle behind this concept had already been demonstrated in animal models. Elegant studies have demonstrated that ectopic expression of the MHC II pathway in tumor cells from mouse models of breast cancer can induce Th1-mediated antitumor immunity and antitumor memory in the syngeneic host mouse (47–50).
Our results indicate that a biomarker test that measures MHC class II expression could be a powerful way to predict risk of relapse in TNBC patients. Further studies are warranted to overcome the limitations of our approach and develop a clinical-scale assay to specifically measure expression of the MHC II pathway signature genes in clinical specimens. A qPCR assay or a NanoString assay that is compatible with the fragmented RNA derived from formalin-fixed paraffin-embedded clinical specimens could be a promising approach to determine if this discovery has clinical utility as a prognostic biomarker in a larger case series of TNBC patients. We are also excited by the possibility that therapies that induce MHC class II expression in tumor cells may be a particularly valuable strategy for converting MHC II–negative poor-prognosis TNBC tumors into MHC II–positive tumors that present tumor-specific neoantigens (45, 51, 52) and induce antitumor immunity.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Conception and design: A. Forero, D.J. Buchsbaum, A.F. LoBuglio, K.E. Varley
Development of methodology: A. Forero, W.E. Grizzle, A.F. LoBuglio, K.E. Varley
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): W.E. Grizzle, N.D. Merz, E. Downs-Kelly, C. Vaklavas, K.E. Varley
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): A. Forero, Y. Li, D. Chen, K.L. Updike, E. Downs-Kelly, T.C. Burwell, C. Vaklavas, A.F. LoBuglio, K.E. Varley
Writing, review, and/or revision of the manuscript: A. Forero, Y. Li, D. Chen, W.E. Grizzle, E. Downs-Kelly, T.C. Burwell, C. Vaklavas, D.J. Buchsbaum, R.M. Myers, A.F. LoBuglio, K.E. Varley
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): W.E. Grizzle, T.C. Burwell, C. Vaklavas, R.M. Myers
Study supervision: A. Forero, R.M. Myers, A.F. LoBuglio
Funding for the work has been provided by the UAB SPORE in Breast Cancer (NCI P50 CA089019), the Breast Cancer Research Foundation of Alabama, the Cancer Center Core Support Grant (P30 CA013148), and the Susan G. Komen for the Cure Foundation (KG090969).
Note: Supplementary data for this article are available at Cancer Immunology Research Online (http://cancerimmunolres.aacrjournals.org/).
- Received September 24, 2015.
- Revision received January 8, 2016.
- Accepted January 28, 2016.
- ©2016 American Association for Cancer Research.