C. albicans first generation DNA Arrays
Introduction
Candida albicans genome sequence data from GenBank and from the Stanford Genome Technology Center (Assembly 3) were used to make gene arrays. Using these data, 3313 putative ORFs were identified, from which 2016 ORFs were selected for array construction. They comprised 364 GenBank entries, 1020 S. cerevisiae homologues and 632 hypothetical ORFs. A 3'-fragment of 300-400 bp from each ORF was amplified by polymerase chain reaction (PCR), yielding 2002 products that were arrayed on nylon membranes (Fig. 1). The array covers about one third of the expected 6000 protein-coding genes in C. albicans.
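The figures quoted above can be checked with a few lines of arithmetic. This is only a sanity check on the numbers given in the text; the variable names are illustrative and do not come from the original work.

```python
# Counts taken from the introduction; variable names are illustrative.
genbank_entries = 364
sc_homologues = 1020
hypothetical_orfs = 632

selected_orfs = genbank_entries + sc_homologues + hypothetical_orfs
arrayed_products = 2002
expected_genes = 6000

print(selected_orfs)                                # 2016
print(f"{arrayed_products / selected_orfs:.1%}")    # share of selected ORFs that gave a usable product
print(f"{arrayed_products / expected_genes:.1%}")   # share of the expected gene set on the array
```

The three ORF classes add up to the 2016 selected ORFs, and 2002 of 6000 expected genes is roughly one third, as stated.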
A table is provided with information on all the probes present on these arrays. It gives the location of the probes on the arrays, the location of the identified ORFs in the contigs of Assembly 3 of the C. albicans genome or in the GenBank entries, gene names, functional assignments, and links to the C. albicans genomic database CandidaDB, to the Stanford Genome Technology Center database for Assembly 6 of the C. albicans genome and to GenBank.
These arrays were first described in:
Murad AMA, d'Enfert C, Gaillardin C, Tournu H, Tekaia F, Talibi D, Maréchal D, Marchais V, Cottin J, Brown AJ (2001) Transcript profiling in Candida albicans reveals new cellular functions for the transcriptional repressors, CaTup1, CaMig1 and CaNrg1. Mol. Microbiol. 42:981-993.
Sequence data for Candida albicans was obtained from the Stanford Genome Technology Center website. Sequencing of Candida albicans was accomplished with the support of the NIDR and the Burroughs Wellcome Fund.
To download the HDF.xls table, click here
To get more information on the gene annotation procedure and the nomenclature, click here
To get more information on the arrays, click here
To download the transcriptdata.xls file of Murad et al., click here
C. albicans gene annotation
A total of 6213 protein sequences predicted from the S. cerevisiae genome sequence were downloaded from MIPS (25 April 1995). C. albicans sequences were obtained from two different sources: 380 non-redundant entries of C. albicans ORFs were retrieved from GenBank (9 July 1999), and Assembly 3 was obtained from the Stanford C. albicans sequencing project (22 April 1999). Assembly 3 contained 1919 contigs of at least 2 kb, covering 12 301 999 bp, or around 80%, of the C. albicans genome, assuming a haploid genome size of 15.5 Mb.
The largest contigs (Contig3-3189 to Contig3-3718), representing 6 992 176 bp, were annotated using two approaches. Firstly, ORFs were identified using the graphical analysis tool orffinder. Secondly, each contig was searched for segments matching known C. albicans or S. cerevisiae genes: Assembly 3 was compared with the C. albicans GenBank entries using blastn, and with S. cerevisiae ORFs using blastx with the alternative yeast nuclear code (Ohama et al., 1993). All blast searches were automatically launched and displayed using the scripts blastallgenomes and readblast (Tekaia et al., 2000), which provided a working annotation table listing the ORF name, its length, the percentage identity/similarity/gap and the position of the beginning and end of the match in the contig and in the ORF. paintblast was used to compare orffinder and blast outputs, which facilitated identification of possible frameshifts or introns.
Several classes of C. albicans ORFs were annotated: known C. albicans genes, homologues of S. cerevisiae genes and additional ORFs larger than 150 codons. Overlapping ORFs on the same strand were assumed to result from sequencing errors and were considered as a single ORF. The longest ORF was retained when two ORFs overlapped on opposite strands. 3'-truncated ORFs that lay at the end of contigs were discarded because 3'-ends of ORFs were to be arrayed. A total of 3313 putative C. albicans ORFs were identified, and 2016 of these were selected for the construction of gene arrays.
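The three ORF selection rules described above (merge same-strand overlaps, keep the longest of two opposite-strand overlaps, discard 3'-truncated ORFs) can be sketched as follows. This is an illustrative reconstruction, not the scripts used by the authors: the `Orf` record, the function names, the 1-based inclusive coordinates and the order in which the rules are applied are all assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Orf:
    """Hypothetical ORF record; coordinates are assumed 1-based and inclusive."""
    start: int                  # leftmost position on the contig
    end: int                    # rightmost position on the contig
    strand: str                 # "+" or "-"
    truncated_3p: bool = False  # True if the 3' end runs off the contig

    @property
    def length(self) -> int:
        return self.end - self.start + 1

def overlaps(a: Orf, b: Orf) -> bool:
    return a.start <= b.end and b.start <= a.end

def merge_same_strand(orfs):
    """Treat overlapping ORFs on the same strand as a single ORF."""
    merged = []
    for orf in sorted(orfs, key=lambda o: (o.strand, o.start)):
        last = merged[-1] if merged else None
        if last and last.strand == orf.strand and overlaps(last, orf):
            merged[-1] = Orf(last.start, max(last.end, orf.end), orf.strand,
                             last.truncated_3p or orf.truncated_3p)
        else:
            merged.append(orf)
    return merged

def keep_longest_across_strands(orfs):
    """When ORFs overlap (here: on opposite strands), retain the longest."""
    kept = []
    for orf in sorted(orfs, key=lambda o: o.length, reverse=True):
        if not any(overlaps(orf, k) for k in kept):
            kept.append(orf)
    return sorted(kept, key=lambda o: o.start)

def select_orfs(orfs):
    """Apply the three rules; the order of application is an assumption."""
    resolved = keep_longest_across_strands(merge_same_strand(orfs))
    return [o for o in resolved if not o.truncated_3p]

candidates = [
    Orf(100, 700, "+"), Orf(650, 1200, "+"),   # same strand, overlapping
    Orf(1100, 1500, "-"),                      # shorter, opposite strand
    Orf(3000, 3600, "+", truncated_3p=True),   # runs off the contig end
]
print(select_orfs(candidates))
```

With these made-up coordinates, the two "+" ORFs are merged into one spanning 100 to 1200, the shorter "-" ORF that overlaps it is dropped, and the truncated ORF is discarded.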
In the accompanying table, spot numbers are linked to an ORF name of the form A.x.y, where:
A is either the GenBank accession number or the Contig3 number in the April 1999 release of the C. albicans genome sequence available from the Stanford DNA Sequencing and Genome Technology Center
x corresponds to the 5' end of the ORF in the GenBank or April 1999 release DNA sequence
y corresponds to the 3' end of the ORF in the GenBank or April 1999 release DNA sequence
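An ORF name of this three-part form can be split into its components with a short helper. This is a minimal sketch: the `OrfName` type, the `parse_orf_name` function and the example name are hypothetical, and the helper assumes that the two coordinates are the last two dot-separated fields.

```python
from typing import NamedTuple

class OrfName(NamedTuple):
    source: str       # GenBank accession number or Contig3 identifier
    five_prime: int   # position of the 5' end of the ORF in the source sequence
    three_prime: int  # position of the 3' end of the ORF in the source sequence

def parse_orf_name(name: str) -> OrfName:
    """Split an ORF name into source and coordinates.

    Splitting from the right keeps any dot inside the source identifier intact.
    Raises ValueError if the name does not have at least three fields or if
    the coordinates are not integers.
    """
    source, five_prime, three_prime = name.rsplit(".", 2)
    return OrfName(source, int(five_prime), int(three_prime))

# Made-up example name, for illustration only.
print(parse_orf_name("Contig3-3189.1501.2700"))
```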
Different annotations are shown for each ORF:
Type:
Genbank if the ORF was extracted from GenBank
ho if the ORF was extracted from a contig but did not show a match with an S. cerevisiae ORF in our annotation process
sc if the ORF was extracted from a contig and showed a match with an entire S. cerevisiae ORF in our annotation process
scN if the ORF was extracted from a contig and showed a match with the COOH-terminal part of an S. cerevisiae ORF in our annotation process
SChomol: S. cerevisiae homologue identified by comparison of the contig to an S. cerevisiae ORF database using blastx
YPD link: HTML link to annotation of the corresponding S. cerevisiae gene at Proteome Inc.
Stanford: Corresponding ORF in the Stanford Contig6 release of October 2000 as deduced from a blastn analysis of the GCCA database vs the Contig6 database
Stan link: HTML link to annotation of the corresponding C. albicans ORF at Stanford
CandidaDB: Accession number of corresponding ORF in the C. albicans genomic database CandidaDB. Annotation in CandidaDB was performed by the European Galar Fungail Consortium using Assembly 6 of the C. albicans genome available from the Stanford DNA Sequencing and Genome Technology Center
Gene: Gene name as available in CandidaDB
Function: Function as available in CandidaDB
C. albicans array construction
PCR products were arrayed using published procedures (Richmond et al., 1999). PCR primers of 18-22 bases in length were designed using primer3 software to amplify a 3'-region of 300-400 bp from each ORF. These oligonucleotides were synthesized with a 5'-tag (5'-oligonucleotide, 5'-CGACGCCCGCTGATA: 3'-oligonucleotide, 5'-GTCCGGGAGCCATC') to facilitate subsequent re-amplification of the PCR products. Using these oligonucleotides, the ORFs were PCR-amplified from the C. albicans SC5314 genome. The purity and length of all PCR products were checked by agarose gel electrophoresis. A total of 2002 PCR products, which satisfied our quality controls, were spotted in duplicate onto nylon membranes using a BioGrid System (BioRobotics) (Fig. 1). Candida albicans genomic DNA, E. coli ORFs and S. cerevisiae ORFs were included on the membranes as controls.
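The tagging step can be illustrated with a small helper that prepends the 5'-tag to a gene-specific primer and checks the length range quoted above. This is a sketch only: the function names and the example primer are hypothetical, and the tag constants reproduce the base sequences given in the text without the 5'/3' markers.

```python
# Tag sequences as given in the text; constant names are illustrative.
FORWARD_TAG = "CGACGCCCGCTGATA"  # 5'-tag added to the 5'-oligonucleotide
REVERSE_TAG = "GTCCGGGAGCCATC"   # 5'-tag added to the 3'-oligonucleotide

def tag_primer(primer: str, tag: str) -> str:
    """Return the tagged oligonucleotide (tag followed by the gene-specific part)."""
    primer = primer.upper()
    if not 18 <= len(primer) <= 22:
        raise ValueError("gene-specific primer must be 18-22 bases long")
    if set(primer) - set("ACGT"):
        raise ValueError("primer contains characters other than A, C, G, T")
    return tag + primer

def amplicon_in_range(length_bp: int) -> bool:
    """Check a product against the 300-400 bp target size for the 3'-region."""
    return 300 <= length_bp <= 400

# Made-up 20-base primer, for illustration only.
example = tag_primer("ATGTCTGCTCCAACTGAAGG", FORWARD_TAG)
print(example, len(example))
```

Keeping the tag separate from the gene-specific sequence means the same pair of tag primers can later re-amplify any product on the array.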
A schematic representation of the array is shown below.
The sequences of the PCR primers for each ORF are available upon request to Christophe d'Enfert.
Publications
Murad AMA, d'Enfert C, Gaillardin C, Tournu H, Tekaia F, Talibi D, Maréchal D, Marchais V, Cottin J, Brown AJ (2001) Transcript profiling in Candida albicans reveals new cellular functions for the transcriptional repressors, CaTup1, CaMig1 and CaNrg1. Mol. Microbiol. 42:981-993.
Murad, A.M.A., Leng, P., Straffon, M., Wishart, J., Macaskill, S., MacCallum, D., Schnell, N., Talibi, D., Marechal, D., Tekaia, F., d'Enfert, C., Gaillardin, C., Odds, F.C. and Brown, A.J.P. (2001) NRG1 represses yeast-hypha morphogenesis and hypha-specific gene expression in Candida albicans. EMBO J. 20:4742-4752.
Fradin, C., Kretschmar, M., Nichterlein, T., Gaillardin, C., d'Enfert, C. and Hube, B. (2002) Stage-specific gene expression of Candida albicans in human blood. Mol. Microbiol. 47:1523-1543.