Transcripts with ESTs derived exclusively or predominantly from testis, and not from other normal tissues, are likely to be products of genes with testis-restricted expression, and are thus potential cancer/testis (CT) antigen genes. A list of 371 genes with such characteristics was compiled by analyzing publicly available EST databases. RT-PCR analysis of normal and tumor tissues was performed to validate an initial selection of 20 of these genes. Several new CT and CT-like genes were identified. One of these, CT46/HORMAD1, is expressed strongly in testis and weakly in placenta; the highest level of expression in other tissues is <1% of testicular expression. The CT46/HORMAD1 gene was expressed in 31% (34/109) of the carcinomas examined, with 11% (12/109) showing expression levels >10% of the testicular level of expression. CT46/HORMAD1 is a single-copy gene on chromosome 1q21.3, encoding a putative protein of 394 aa. Conserved protein domain analysis identified a HORMA domain involved in chromatin binding. The CT46/HORMAD1 protein was found to be homologous to the prototype HORMA domain-containing protein, Hop1, a yeast meiosis-specific protein, as well as to asy1, a meiotic synaptic mutant protein in Arabidopsis thaliana.
This article was published in Cancer Immunity, a Cancer Research Institute journal that ceased publication in 2013 and is now provided online in association with Cancer Immunology Research.
CT antigens represent potential cancer vaccine targets for a wide range of tumor types (1). The MAGE, BAGE, and GAGE gene families were the first CT antigens to be identified. This was achieved on the basis of the autologous CD8+ T cell responses they elicited in cancer patients (2). Subsequently, it was recognized that CT antigens also elicit antibody responses in cancer patients, and a further series of CT antigen genes was identified by SEREX (serological analysis of recombinant cDNA expression libraries of human tumors) (3, 4). The SEREX-defined CT antigens include the SSX gene family, SCP1, NY-ESO-1, CT7, HOM-TES-85, CAGE, CAGE1, and NY-SAR-35 (5). More recently, CT antigens have also been sought by identifying genes with restricted CT mRNA expression pattern, regardless of their immunogenicity. This strategy has resulted in the identification of LAGE-1, CT9, CT10, and SAGE by representational difference analysis, and several other CT antigen genes, including CT15, CT16, FATE, and TPTE, by EST database mining (1). In the present study, we continued our search for new CT antigens by analyzing EST databases for genes with predominant expression in testis, and then evaluating their expression in tumors by RT-PCR. Of the 20 CT candidate genes analyzed, we identified CT46/HORMAD1 as a novel CT antigen gene that encodes a putative meiosis-related protein.
Selection of CT candidate genes by EST-based database analysis
The Ludwig Institute for Cancer Research (LICR) Transcriptome database (6) was analyzed, and transcripts with somatic tissue:testicular EST ratios of <5% (P < 0.05) were selected, resulting in a list of 371 genes. Twelve of the 371 genes had already been described in the literature as having a CT expression pattern, including seven listed in the recently compiled CT database (SPANXA1/CT11.1, MAGEA2/CT1.2, GAGED2/CT12.1, BORIS/CT27, HAGE/CT13, AF15q14/CT29, and TDRD1/CT41.1; 5). The remaining 359 genes were evaluated with online bioinformatics tools to confirm the testis specificity of the mRNA transcript and to seek evidence of expression in cancer cell lines or tissues. Two hundred thirty genes were found to have ESTs present in more than two somatic tissues, no ESTs in any cancer cDNA libraries (except germ cell tumors), or inadequate data in the database. All such genes were eliminated. A sample of 20 genes was then selected from among the remaining 129 genes based on their higher testis:normal EST ratios and the presence of ESTs from more than one type of cancer. The mRNA distribution of these genes in normal tissues was analyzed by RT-PCR (Table 1).
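The candidate counts in this selection step can be checked with a few lines of arithmetic. The sketch below is only a bookkeeping aid; the counts are taken from the text, and the variable names are illustrative rather than part of the original analysis.

```python
# Bookkeeping for the EST-based candidate funnel described above.
# Counts come from the text; variable names are illustrative.
initial_candidates = 371      # genes passing the EST ratio filter
already_known_ct = 12         # previously described CT expression pattern
eliminated_in_silico = 230    # broad somatic ESTs, no cancer ESTs, or inadequate data

evaluated_in_silico = initial_candidates - already_known_ct
remaining_after_filter = evaluated_in_silico - eliminated_in_silico

print(evaluated_in_silico)     # 359
print(remaining_after_filter)  # 129
```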
Identification of four CT or CT-like genes by RT-PCR
Ten of the twenty genes selected showed ubiquitous expression in all twelve normal tissues examined, and three showed differential expression, with at least moderate expression in two or more somatic tissues (Table 2). Seven genes remained as potential CT genes, including four true testis-specific genes (BOLL, PRM2, LOC440934, LOC151273) and three genes with limited and/or weak expression in somatic tissues (CPXCR1, C10orf94, and HORMAD1).
The expression of these 7 genes was then evaluated in 29 cell lines consisting of 15 melanomas, 4 small cell lung cancers (NCI-H82, -H128, -H187, -H740), 3 non-small cell lung cancers (SK-LC-5, -14, -17), 3 colon cancers (SW403, HCT15, LS174T), 1 renal cancer (SK-RCC-1), 1 hepatocellular carcinoma (SK-HEP-1), 1 bladder cancer (T24), and 1 sarcoma (SW982). Melanoma expresses known CT antigens more frequently than most other tumor types (5). The other cell lines have previously been typed and have been shown to express one or more known CT genes (data not shown).
The expression profile of the 7 potential CT genes in this selected "CT-rich" cell line panel is summarized in Table 3. Three genes - BOLL, PRM2, and LOC151273 - showed no expression in any of the twenty-nine cell lines, indicating that these genes, although having cancer-derived ESTs in GenBank, are rarely expressed in cancer. The other four genes, CPXCR1, C10orf94, LOC440934, and HORMAD1, showed moderate to strong expression in one or more cell lines, which identified them as new CT or CT-like genes. The entire process of analyzing the expression pattern of the 20 genes by RT-PCR is summarized in Figure 1.
Of these four genes, CPXCR1 and C10orf94 showed moderate to strong mRNA expression in normal brain tissue by RT-PCR. LOC440934 was expressed only in five of seven cell lines derived from lung cancer (including four from small cell lung cancer), and was not found to be expressed in any of the other twenty-two cell lines from other cell lineages. CPXCR1, C10orf94, and LOC440934 are thus likely to be differentiation antigens of a particular somatic lineage that are concurrently strongly expressed in testis, but not in other somatic tissues, rather than true CT genes. This phenomenon has previously been observed, for example in the case of NY-BR-1, which is a breast differentiation antigen that is also expressed in testis (7). The products of CPXCR1 and C10orf94 are not likely to be useful as targets for cancer vaccines, as the concomitant brain expression raises the concern of antineuronal autoimmunity. On the other hand, the LOC440934 gene product might be of value as a vaccine target for lung cancer.
In comparison to these three genes, HORMAD1 was expressed in three melanoma cell lines and two nonmelanoma cell lines, and thus appears to be a new CT gene. This gene was designated CT46, following our proposed CT nomenclature system (5).
Quantitative RT-PCR (qRT-PCR) analysis of CT46/HORMAD1 expression
To confirm the qualitative RT-PCR data on cell lines and to further evaluate the expression of CT46/HORMAD1 in tumor tissues, qRT-PCR was performed. In addition to showing strong expression of CT46/HORMAD1 in testicular tissue, qualitative RT-PCR (Table 2) showed weak expression of CT46/HORMAD1 in brain, breast, colon, spleen, and placenta. These data were confirmed by qRT-PCR. Among 11 nontesticular normal tissues, the highest expression was seen in placental tissue, with 0.76% of the testicular expression, followed by spleen (0.55%) and colon (0.23%). Other normal tissues expressed CT46/HORMAD1 mRNA at <0.1% of testicular expression, including breast (0.046%) and brain (0.044%).
Quantitative RT-PCR analyses of cell lines confirmed the qualitative PCR data. Of the 15 melanoma cell lines tested, the 3 positive lines - SK-MEL-12, -24, and -80 - expressed CT46/HORMAD1 at 2.85%, 6.39%, and 8.33% of the testicular expression level, respectively. All other melanoma cell lines found to be negative by qualitative RT-PCR had CT46/HORMAD1 mRNA levels that were <0.02% of the testicular expression level. There is thus 100% concordance between the qualitative and quantitative RT-PCR results. Since these two assays utilized primers derived from different regions of the gene, these data validate the expression data for CT46/HORMAD1 in normal tissue and in cell lines.
The expression of CT46/HORMAD1 in additional tumor cell lines and tumor specimens was then examined by qRT-PCR and is summarized in Figure 2. We observed weak, moderate, and strong CT46/HORMAD1 expression by qualitative RT-PCR to be approximately equivalent to >0.1%, >1%, and >10% of testicular expression as measured by qRT-PCR. Based on these cut-off values, moderate to strong CT46/HORMAD1 expression (>1% testicular level) was seen in 14/30 (47%) non-small cell lung cancer specimens, 4/11 (36%) breast cancer specimens, 7/20 (35%) esophageal cancer specimens, 5/18 (28%) endometrial cancer specimens, 3/15 (20%) bladder cancer specimens, and 1/15 (7%) colon cancer specimens. Similar levels of expression were also seen in 4/12 (25%) small cell lung cancer cell lines and 2/17 (12%) colon cancer cell lines, but not in neuroblastoma cell lines (0/5). In total, 34/109 (31%) tumor specimens showed >1% testicular level of expression, with 12/109 (11%) exhibiting strong (>10%) expression of CT46/HORMAD1.
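The specimen frequencies above can be re-derived from the per-tumor counts. The following is a minimal sketch, not the authors' analysis code: the `classify` helper and the "below cut-off" label are our own illustrative names, and the cut-offs are the approximate equivalences stated in the text, so individual qualitative calls may differ near a boundary.

```python
# Tumor specimens with CT46/HORMAD1 expression >1% of the testicular level,
# as (positive, total) pairs taken from the text.
specimens = {
    "non-small cell lung cancer": (14, 30),
    "breast cancer": (4, 11),
    "esophageal cancer": (7, 20),
    "endometrial cancer": (5, 18),
    "bladder cancer": (3, 15),
    "colon cancer": (1, 15),
}

def classify(percent_of_testis):
    """Map a qRT-PCR value (% of testicular expression) to the approximate
    qualitative category: >0.1% weak, >1% moderate, >10% strong."""
    if percent_of_testis > 10:
        return "strong"
    if percent_of_testis > 1:
        return "moderate"
    if percent_of_testis > 0.1:
        return "weak"
    return "below cut-off"  # illustrative label, not used in the paper

for tumor_type, (positive, total) in specimens.items():
    print(f"{tumor_type}: {positive}/{total} ({100 * positive / total:.0f}%)")

total_positive = sum(p for p, _ in specimens.values())
total_specimens = sum(t for _, t in specimens.values())
print(f"all specimens: {total_positive}/{total_specimens} "
      f"({100 * total_positive / total_specimens:.0f}%)")
```

With these cut-offs, `classify(8.33)` (the SK-MEL-80 value) returns "moderate" and `classify(0.76)` (placenta) returns "weak".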
CT46/HORMAD1 protein is immunogenic in cancer patients
BLAST analysis of the CT46/HORMAD1 sequence against the patent database showed that a partial CT46/HORMAD1 cDNA sequence had previously been identified by Obata et al. (GenBank Accession No. AX053429) by SEREX analysis of breast cancer with autologous patient serum. This indicates that CT46/HORMAD1 is immunogenic and capable of eliciting spontaneous antibody responses in cancer patients.
The CT46/HORMAD1 gene and its products
CT46/HORMAD1 is a single-copy gene, located on chromosome 1q21.3, that spans 22.8 kb and encodes an mRNA of 1880 bp (excluding the polyA tail). An intronless pseudogene was also identified on chromosome 6q12-14.1 (GenBank Accession No. AL132673), with 93% sequence identity to the CT46/HORMAD1 cDNA sequence.
RT-PCR and DNA sequencing of testicular CT46/HORMAD1 cDNA revealed two transcript variants. The predominant, full-length CT46/HORMAD1 transcript consists of 13 exons, whereas the alternative transcript variant lacks exon 4 (64 bp). The major transcript encodes a putative protein of 394 aa, with the translational initiation site located in exon 2. If the same initiation site is used for transcript variant 2, the encoded protein would be only 60 aa in length, because the missing 64 bp shifts the open reading frame. Alternatively, this minor, shorter transcript may be translated from a new initiation site in exon 3, yielding a putative 323-aa protein, of which the carboxyl-terminal 313 residues are identical to those of the main product.
A search for conserved protein domains identified a HORMA domain comprising the entire length of the full-length 394-aa sequence (KOG4652, HORMA domain; and pfam02301, HORMA domain) (Figure 3). Indeed, while this study was ongoing, the Human Genome Organization (HUGO) named the gene HORMAD1, recognizing it as a HORMA domain-containing protein. HORMA (for Hop1p, Rev7p, and MAD2) domain proteins are involved in modulating chromatin structure and dynamics. Specifically, it has been suggested that the HORMA domain recognizes chromatin states that result from DNA double-strand breaks or nonattachment to the mitotic spindle and acts as an adaptor to recruit other proteins (8). Hop1, the prototype HORMA domain protein, is a yeast meiosis-specific protein, with which CT46/HORMAD1 shares 25.8% homology over a 215-aa region. Although it is not certain whether CT46/HORMAD1 is the human Hop1 ortholog, the presence of the HORMA domain, the similarity to Hop1 and asy1 (Arabidopsis thaliana, meiotic asynaptic mutant protein; 27.65% similarity over 260 residues), together with the germ cell-restricted expression of CT46/HORMAD1, all point to CT46/HORMAD1 being a meiosis-related protein.
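The frameshift noted above for the transcript variant lacking exon 4 follows from simple codon arithmetic: skipping a stretch whose length is not a multiple of three nucleotides moves all downstream codons out of frame. A short sketch of that check, with illustrative function and variable names:

```python
# Exon skipping preserves the reading frame only if the skipped length
# is a multiple of three nucleotides (one codon).
CODON_LENGTH = 3

def frame_offset(skipped_bp):
    """Return the reading-frame offset (0, 1, or 2) caused by removing
    `skipped_bp` nucleotides from a coding sequence."""
    return skipped_bp % CODON_LENGTH

exon4_length = 64  # bp, from the text
offset = frame_offset(exon4_length)
print(offset)       # 1
print(offset != 0)  # True: downstream codons are read out of frame

# For the alternative exon 3 initiation site, the text gives a 323-aa
# product sharing its carboxyl-terminal 313 residues with the main product,
# which leaves this many variant-specific amino-terminal residues:
variant_specific_residues = 323 - 313
print(variant_specific_residues)  # 10
```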
CT46/HORMAD1 is highly conserved across species
Homology searches using predicted CT46/HORMAD1 protein sequences identified orthologs in other primates (Macaca fascicularis, GenPept Accession No. BAB63133), as well as rodents (Mus musculus, RefSeq Accession No. NP_080765; Rattus norvegicus, RefSeq Accession No. XP_228333). All are hypothetical proteins predicted from cDNA sequences. Each of the cDNAs was derived from testis, indicating conserved testis-specific transcription.
The available monkey cDNA sequence (GenBank Accession No. AB070034) is a partial sequence encoding the carboxyl 298 residues, with 98.3% (293/298) sequence identity to human CT46/HORMAD1. The mouse and rat counterparts are full-length sequences, with predicted proteins of 374 aa and 391 aa, respectively. The mouse protein shows 78% sequence identity to CT46/HORMAD1 (89% similarity allowing conservative amino acid changes), and the rat protein has 72% identity to CT46/HORMAD1, with 83% sequence similarity, including conservative changes.
In addition to identifying these ortholog genes, the protein homology search identified additional meiotic synapsis proteins, including the meiotic synapsis protein from rice [GenPept Accession No. BAD00095, from Oryza sativa (japonica cultivar-group)] and the Asy1 meiotic protein from Chinese kale (GenPept Accession No. AAN37925), further supporting the hypothesis that CT46/HORMAD1 is an evolutionarily conserved meiotic protein.
MGC26710 is a human protein homologous to CT46/HORMAD1
Among human proteins, MGC26710 is most similar to CT46/HORMAD1. The MGC26710 gene is located on chromosome 22q12 and encodes a putative protein of 307 aa (RefSeq Accession No. NM_152510). Its similarity to CT46/HORMAD1 lies in the N-terminal HORMA domain, with 54% sequence identity (72% similarity, including conservative changes) over approximately the first 240 residues (Figure 3).
The mRNA expression of MGC26710 in normal tissues was evaluated by qualitative RT-PCR. The results indicated tissue-restricted expression, with strong expression in testis, liver, and brain tissues, weak expression in kidney tissue, and no or minimal expression in 8 other normal tissues. Examination of the cancer cell lines showed moderate to strong expression in three of twenty-one cell lines tested (NCI-H82, SK-LC-14, and T24), which did not coincide with CT46/HORMAD1 expression. MGC26710 is thus a differentially expressed gene, but differs from CT46/HORMAD1 in its normal and tumor-tissue expression profile.
Through analysis of genes with predominant expression in testis, we have identified CT46/HORMAD1 as a novel CT antigen. Twenty-seven ESTs from normal tissues corresponding to CT46/HORMAD1 were found in GenBank, twenty-three from testis and four from brain tissue. By comparison, 9 ESTs derived from tumor tissue were found, including 4 from germ cell tumors, 4 from breast cancer, and 1 from lung cancer. The EST distribution thus suggested that CT46/HORMAD1 is a germ cell-specific gene that can be activated in non-germ cell malignancies, which is characteristic of CT antigen genes. Our experimental data confirm this impression, revealing CT46/HORMAD1 expression in lung, breast, esophageal, endometrial, bladder, and colon cancers. Although qRT-PCR detected amplification products in a few somatic tissues, we could not formally exclude the possibility that this was the result of amplifying contaminating genomic DNA, as the intronless pseudogene is highly homologous, even in the region where the trans-intronic primers and probe were derived. Even if mRNA were expressed in somatic tissues, our data demonstrated that the level of expression in all somatic tissues is <1% that of testicular expression. Similar low-level expression has also been observed for other CT antigens (1), which does not preclude their use as targets for cancer vaccines.
It has been observed that CT antigens can be separated into two groups, based on whether or not they are located on chromosome X. Chromosome X has been shown to contain an unusually high number of testis-specific genes (9, 10), some of which are CT antigen genes. CT antigen genes belonging to this group include MAGE, GAGE, NY-ESO-1, SSX, XAGE, SPANX, and the recently identified CT45 (11). These genes are almost always members of multigene families, with highly similar members derived from recent gene duplication events. In contrast, most CT antigen genes not located on chromosome X are single-copy genes. CT46/HORMAD1 is a new member of the latter group.
Although the function of CT46/HORMAD1 remains to be experimentally validated, the predicted protein contains a HORMA-domain, and is thus likely to be involved in regulating chromatin structure and dynamics. More specifically, CT46/HORMAD1 is highly similar to meiotic proteins, consistent with its tissue-specific expression in germ cells. This likely association with meiosis is of particular interest, as other meiosis-related proteins have also been found to be CT antigens, including Spo11 and SCP-1 (synaptonemal complex protein 1) (12). We have speculated that expression of such meiosis-specific proteins in somatic cells may lead to genome instability and thus contribute to tumor progression (13).
Although this study resulted in the identification of CT46/HORMAD1, EST-based analyses in general are not particularly effective at identifying tissue-specific genes, including CT genes. Most genes that appeared to have a testis-specific or testis-predominant expression pattern based on EST data exhibited broad-spectrum expression in multiple somatic tissues upon RT-PCR analysis with gene-specific primers. One major reason for this is the underrepresentation of certain types of normal tissues in the EST database. At the time of our "Virtual Northern" analysis, 1,989,425 ESTs were included in the cDNA pool from "normal" tissues. This included 224,322 ESTs from brain and 92,259 from testis, whereas fewer than 1% of the ESTs were derived from each of pancreas (7,614 ESTs), ovary (8,152), spleen (16,164), and colon (17,509). As a result, the database frequently contains no ESTs from these tissues for genes with low-abundance transcripts. In comparison, recently developed gene-profiling techniques, such as massively parallel signature sequencing (MPSS), appear to promise more comprehensive coverage of rare transcripts, at least at the present time. In parallel to the current study, we have also taken the MPSS approach to identify new CT antigen genes, and this has led to the identification of more than a dozen novel CT genes, including CT45, a gene family located on chromosome Xq26 (11).
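The underrepresentation argument above can be made concrete by computing each tissue's share of the normal-tissue EST pool. In the sketch below the EST counts are those quoted in the text; the transcript abundance of 1 per 100,000 is a purely hypothetical value chosen to illustrate why rare transcripts often have no ESTs in small libraries.

```python
# Share of the "normal" EST pool contributed by each tissue (counts from the text).
total_normal_ests = 1_989_425
ests_by_tissue = {
    "brain": 224_322,
    "testis": 92_259,
    "colon": 17_509,
    "spleen": 16_164,
    "ovary": 8_152,
    "pancreas": 7_614,
}

share_percent = {
    tissue: 100 * count / total_normal_ests
    for tissue, count in ests_by_tissue.items()
}
for tissue, share in share_percent.items():
    print(f"{tissue}: {share:.2f}% of normal-tissue ESTs")

# Hypothetical assumption: a transcript making up 1 in 100,000 mRNAs.
assumed_abundance = 1e-5
expected_ests = {
    tissue: count * assumed_abundance
    for tissue, count in ests_by_tissue.items()
}
for tissue, expected in expected_ests.items():
    print(f"{tissue}: about {expected:.2f} ESTs expected for such a transcript")
```

Under that assumed abundance, fewer than one EST would be expected from pancreas, ovary, spleen, or colon, whereas brain would be expected to yield about two.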
Materials and methods
Tumor tissue specimens and cell lines
Tumor tissue specimens were obtained from the Departments of Pathology at the Weill Medical College of Cornell University and at Memorial Sloan-Kettering Cancer Center, following protocols approved by their institutional review boards. Cell lines were obtained from the cell-line bank maintained at the New York Branch of the Ludwig Institute for Cancer Research.
EST-based identification of genes with a predominantly CT expression pattern
The LICR Transcriptome database (6) was used to search for genes showing a predominantly CT expression pattern, which we refer to as CT-like genes. This relational database documents clusters of transcript sequences (including ESTs) aligned to the genome, and the fine structure of the genes from which they are derived (6). A set of controlled vocabularies (eVOCs) (14) is used to describe the origin of EST libraries contributing to the database, allowing reliable searches for genes with specific tissue-expression patterns. The version of the Transcriptome database used during this study was based on Build 30 of the NCBI assembly of the human genome.
Three pools of ESTs were derived from the database. Pool A contained ESTs derived from cDNA libraries of normal adult tissues excluding testis, ovary, placenta, pooled normal tissues, and normal tissues of unknown origin. Pool B included ESTs from libraries of any cancer type except testicular cancer. Finally, pool C contained libraries from normal testis. Normalized and subtracted libraries, as well as small libraries (fewer than 600 ESTs), were excluded, in an attempt to avoid nonrepresentative EST data.
Genes that showed an expression level in normal tissues (pool A) below 5% of that observed in normal testis (pool C), and that were also found in cancers (pool B), were retrieved. Fisher's exact test was applied to test the significance of the representational difference observed between pools A and C for the putative CT genes, and genes with P < 0.05 were retained. This list contained 371 candidates, including several genes already listed in the CT database (5).
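The two filters described above, a pool A:pool C ratio below 5% and a significant Fisher's exact test, can be sketched as follows. This is an illustration under stated assumptions and not the code used in the study: we assume the ratio is computed on EST frequencies normalized to pool size, we assume a one-sided test, and all EST counts and pool sizes in the example are hypothetical.

```python
from math import comb

def somatic_to_testis_ratio(gene_testis, gene_somatic, testis_pool, somatic_pool):
    """Ratio of the gene's EST frequency in pool A (somatic) to pool C (testis).
    Normalizing by pool size is an assumption made for this sketch."""
    return (gene_somatic / somatic_pool) / (gene_testis / testis_pool)

def fisher_exact_greater(gene_testis, gene_somatic, testis_pool, somatic_pool):
    """One-sided Fisher's exact test: probability, under the hypergeometric
    null, that at least `gene_testis` of the gene's ESTs fall in the testis
    pool, given the two pool sizes."""
    total_ests = testis_pool + somatic_pool
    gene_total = gene_testis + gene_somatic
    denominator = comb(total_ests, gene_total)
    numerator = sum(
        comb(testis_pool, k) * comb(somatic_pool, gene_total - k)
        for k in range(gene_testis, min(gene_total, testis_pool) + 1)
    )
    return numerator / denominator

# Hypothetical gene: 20 ESTs in a 90,000-EST testis pool and
# 2 ESTs in a 1,800,000-EST somatic pool.
ratio = somatic_to_testis_ratio(20, 2, 90_000, 1_800_000)
p_value = fisher_exact_greater(20, 2, 90_000, 1_800_000)
is_candidate = ratio < 0.05 and p_value < 0.05
print(f"ratio = {ratio:.4f}, candidate = {is_candidate}")
```

Exact integer binomial coefficients are used so the sketch needs only the standard library; a statistics package would be a more practical choice for screening thousands of genes.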
In silico analysis
To select the most promising candidates among the 371 candidate CT genes identified, the expression profile of each gene in normal and tumor tissues was evaluated using a combination of the SAGE Anatomic Viewer and its Virtual Northern tool (15) and database searches using BLASTN (16). The objective of the analysis was to identify Unigene clusters containing ESTs derived from testis as well as from non-germ cell tumors, but with limited expression in somatic tissues. Once a Unigene cluster was determined to be a likely CT candidate, the intron-exon structure of the corresponding gene was defined using tools at the NCBI Web site. This information was then used to design trans-intronic primers for RT-PCR.
For specific genes of interest, for example, CT46/HORMAD1, various tools on the NCBI Web site were used for protein similarity searches, the identification of conserved domains, and the prediction of possible transcript variants and proteins. Gene identifiers were retrieved from the Ensembl database (17) in order to maintain a consistent naming convention; short names were assigned to each new gene identified in the project, using Human Gene Nomenclature Committee (HGNC)-approved symbols whenever possible.
For RT-PCR analysis of normal tissue expression, a panel of normalized cDNA (MTC panels I and II, BD Biosciences, Palo Alto, CA) derived from 16 normal tissues was used. Tissues included in these panels were brain, colon, heart, kidney, leukocytes, liver, lung, ovary, pancreas, placenta, prostate, skeletal muscle, small intestine, spleen, thymus, and testis.
To evaluate gene expression in tumor cell lines, total RNA was prepared by the standard guanidinium thiocyanate-CsCl gradient method, and 2 µg was used in a 20 µl reverse transcription reaction. Two microliters of the synthesized cDNA was then used per 25 µl PCR reaction. PCRs were set up using a commercial master mix (Platinum Taq Supermix, Invitrogen, Carlsbad, CA) and 35 cycles of amplification, each consisting of 15 s at 94°C, 1 min at 60°C, and 1 min at 72°C. The PCR products were visualized by 1% agarose gel electrophoresis and ethidium bromide staining.
Quantitative RT-PCR was performed using an ABI PRISM 7000 Sequence Detection System (Applied Biosystems, Foster City, CA). Normal testis total RNA was obtained commercially (Ambion, Austin, TX). Tumor tissue total RNA was prepared using TRIzol reagent (Invitrogen). Two micrograms of total RNA was used per 20 µl reverse transcription reaction, and 2 µl cDNA was then used for each 25 µl PCR. The reactions were set up in duplicate, and the level of expression was determined as the abundance relative to that in the normal testis sample. For this purpose, a standard curve was established for each PCR plate, consisting of testicular cDNA in four-fold serial dilutions. Forty-five two-step cycles of amplification were performed, with each cycle consisting of 15 s at 95°C and 1 min at 60°C. The RNA quality of the cell lines and tissues was evaluated by separate control amplification of GUS and GAPDH transcripts. All specimens included in the final analysis had Ct values for these control transcripts differing by less than four cycles, indicating similar cDNA quality and quantity.
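Relative quantification against a serial-dilution standard curve, as described above, can be sketched as a linear fit of Ct against the logarithm of the relative template amount, followed by interpolation of the sample Ct. The Ct values below are hypothetical and idealized (each four-fold dilution adds exactly two cycles), and the function names are our own; this is not the instrument software's procedure.

```python
from math import log10

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def relative_quantity(sample_ct, slope, intercept):
    """Invert the standard curve: relative amount (testis = 1) for a given Ct."""
    return 10 ** ((sample_ct - intercept) / slope)

# Four-fold serial dilutions of testicular cDNA (undiluted = 1).
standard_quantities = [4.0 ** -i for i in range(5)]
# Hypothetical Ct values for those dilutions.
standard_cts = [20.0, 22.0, 24.0, 26.0, 28.0]

slope, intercept = fit_line([log10(q) for q in standard_quantities], standard_cts)

sample_ct = 26.0  # hypothetical tumor specimen
percent_of_testis = 100 * relative_quantity(sample_ct, slope, intercept)
print(f"{percent_of_testis:.2f}% of testicular expression")
```

In this idealized example a sample Ct of 26 corresponds to the 1/64 dilution, that is, about 1.56% of the testicular level, which would fall in the moderate range (>1%).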
This work was supported by funding from the Cancer Research Institute (to Y.-T. Chen, C. V. Jongeneel, and A. O. Gure) through the Cancer Antigen Discovery Collaborative.
- Received April 21, 2005.
- Accepted June 6, 2005.
- Copyright © 2005 by Yao-Tseng Chen