Galar Fungail
CandidaDB is a C. albicans genomics database developed by the Galar Fungail European Consortium in collaboration with members of the Institut Pasteur.
To learn more about CandidaDB, please read below.
Any comments concerning CandidaDB are welcome and should be sent to denfert@pasteur.fr.
CandidaDB is a server hosting a database dedicated to the analysis of the genome of the human fungal pathogen Candida albicans.
Its purpose is to collate and integrate various aspects of the genomic information on C. albicans, which is currently responsible for the vast majority of life-threatening fungal infections in immunocompromised individuals. CandidaDB provides an almost complete dataset of DNA and protein sequences derived from C. albicans strain SC5314, linked to the relevant annotations and functional assignments. It allows users to browse these data easily and retrieve information using various criteria (gene names, location, keywords, etc.).
Nucleotide sequence data for C. albicans were obtained from the Stanford Genome Technology Center website. Sequencing of C. albicans was accomplished with the support of the NIDR and the Burroughs Wellcome Fund.
CandidaDB is supplemented with information from C. albicans entries in the EMBL/GenBank/DDBJ databanks, as well as observations either published in international journals or communicated directly to us by individual researchers.
Things to know about CandidaDB:
The following strategy was used to develop CandidaDB:
- Identification in the C. albicans genome sequence of open reading frames (ORFs) that encode proteins of more than 150 amino acids, have a homologue in public databases, or have a significant coding probability according to GeneMark prediction.
- Identification of duplicated protein coding sequences, which occur in the C. albicans genome because of heterozygosity. In such cases, only one allele is included in CandidaDB; information about the location of the other allele in Assembly 6 of the C. albicans genome will soon be provided.
- Annotation of all remaining protein coding sequences using the following conventions:
- Gene names: The standard convention for naming C. albicans genes was used. Only genes that have already been characterized, or that can be postulated to encode a functional homologue of the most closely related Saccharomyces cerevisiae gene, are named according to this convention. Genes that did not meet these criteria have been designated IPFxxxx, where IPF stands for "Individual Protein File".
Synonyms include the corresponding ORF in the dataset available at the Stanford Genome Technology Center.
- Gene name tags: Several tags have been added to gene names to account for the occurrence of frame-shifts, contig breaks and exon/intron structures present in the current version of the C. albicans genome:
- ".5" and ".3" indicate that the protein coding sequence only corresponds to the 5'- or 3'-end of a gene and that the corresponding 3'- or 5'-end has not been defined.
- ".5f" and ".3f" indicate that the protein coding sequence only corresponds to the 5'- or 3'-end of a gene and that the corresponding 3'- or 5'-end is found contiguously on the genome.
- ".5eoc" and ".3eoc" indicate that the protein coding sequence only corresponds to the 5'- or 3'-end of a gene and is located at the end of a Contig in Assembly 6 of the C. albicans genome available at the Stanford Genome Technology Center.
- ".exon1" and ".exon2" indicate exon regions of genes that have an exon/intron structure.
- Functions: Functions were assigned on the basis of published data when available, or otherwise on the basis of homology to proteins of known function. In the latter case, the proposed function is followed by a "(by homology)" tag to emphasize that it has not been validated by experimental data and should therefore be used cautiously.
- S. cerevisiae homologue: Information is provided on the most closely related gene in S. cerevisiae. When available, this includes the gene name, alternate gene names, function, the e-value of a blastp search of the C. albicans protein against the S. cerevisiae proteome, and whether the blastp result is reciprocal.
- In addition to this information and to the various search and genome-browsing tools, CandidaDB provides links to the sequence of the Assembly 6 Contig in which a coding sequence is located, and to the corresponding ORF and its annotation on the Stanford Genome Technology Center website.
- WARNING: For the purpose of developing CandidaDB, we have generated a virtual genome sequence by linking the Contig DNA sequences of Assembly 6 available at the Stanford Genome Technology Center. Contig sequences are separated by stretches of 500 undefined nucleotides (X). The order in which Contig sequences are linked is independent of their actual order on the C. albicans genome. Therefore, the locations of coding sequences on this virtual genome do not reflect their actual locations on the C. albicans genome. The location of each coding sequence on the Contig from which it originates is also provided in CandidaDB.
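The first step of the strategy above retains an ORF if it meets any one of three criteria. The sketch below only illustrates that OR logic; it is not the CandidaDB pipeline. The record type and its field names are hypothetical, and because the cutoff for a "significant" GeneMark coding probability is not stated, it is left as a caller-supplied parameter rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional

# From the CandidaDB strategy: proteins of more than 150 amino acids are retained.
MIN_PROTEIN_LENGTH = 150


@dataclass
class CandidateOrf:
    """Hypothetical record for one candidate ORF; field names are illustrative."""
    name: str
    protein_length: int                    # length of the encoded protein, in amino acids
    has_database_homologue: bool           # True if a homologue was found in public databases
    genemark_probability: Optional[float]  # GeneMark coding probability, or None if unavailable


def is_retained(orf: CandidateOrf, genemark_threshold: float) -> bool:
    """Keep an ORF if it satisfies ANY of the three criteria.

    genemark_threshold is supplied by the caller because the cutoff for a
    'significant' coding probability is not stated.
    """
    if orf.protein_length > MIN_PROTEIN_LENGTH:
        return True
    if orf.has_database_homologue:
        return True
    return (
        orf.genemark_probability is not None
        and orf.genemark_probability >= genemark_threshold
    )


if __name__ == "__main__":
    candidates = [
        CandidateOrf("orf_a", 420, False, None),  # long enough on its own
        CandidateOrf("orf_b", 90, True, None),    # short, but has a homologue
        CandidateOrf("orf_c", 90, False, 0.2),    # short, no homologue, low score
    ]
    for orf in candidates:
        print(orf.name, is_retained(orf, genemark_threshold=0.5))
```

Note that the length test is strict ("more than 150 amino acids"), so a 150-residue protein with no homologue and no significant GeneMark score would not be retained.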
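The gene name tags described above are dot-separated suffixes, so a downstream script can split a CandidaDB name into its base name and tag. The following is a minimal sketch under the assumption that a tag, when present, is always the text after the last dot and is one of the eight listed suffixes; the example names (IPF1234, IPF5678) are hypothetical, and the tag descriptions are paraphrased from the conventions above.

```python
from dataclasses import dataclass
from typing import Optional

# Tag meanings, paraphrased from the CandidaDB gene name conventions.
TAG_MEANINGS = {
    "5": "5'-end only; 3'-end not defined",
    "3": "3'-end only; 5'-end not defined",
    "5f": "5'-end only; 3'-end found contiguously on the genome",
    "3f": "3'-end only; 5'-end found contiguously on the genome",
    "5eoc": "5'-end only; located at the end of an Assembly 6 Contig",
    "3eoc": "3'-end only; located at the end of an Assembly 6 Contig",
    "exon1": "exon region of a gene with an exon/intron structure",
    "exon2": "exon region of a gene with an exon/intron structure",
}


@dataclass(frozen=True)
class ParsedName:
    base: str           # name without the tag, e.g. "IPF1234"
    tag: Optional[str]  # one of the TAG_MEANINGS keys, or None if untagged
    is_ipf: bool        # True for IPFxxxx designations


def parse_gene_name(name: str) -> ParsedName:
    """Split a name such as 'IPF1234.5f' into base name and tag (assumed format)."""
    base, sep, suffix = name.rpartition(".")
    if sep and suffix in TAG_MEANINGS:
        tag = suffix
    else:
        # No dot, or the text after the last dot is not a known tag.
        base, tag = name, None
    return ParsedName(base=base, tag=tag, is_ipf=base.startswith("IPF"))


if __name__ == "__main__":
    for name in ["HWP1", "IPF1234.5f", "IPF5678.exon2"]:  # hypothetical examples
        parsed = parse_gene_name(name)
        print(parsed.base, parsed.tag, TAG_MEANINGS.get(parsed.tag, "no tag"))
```

Matching only known suffixes, rather than splitting on any dot, keeps an unexpected name intact instead of silently cutting it.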
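The WARNING above explains why positions on the virtual genome must be converted back to Contig coordinates before they mean anything. The sketch below, which is not the CandidaDB code, builds such a virtual sequence by joining Contigs with 500-X spacers and maps a virtual position back to a Contig and offset. The Contig names are hypothetical, and 0-based coordinates are an assumption made for the example.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

SPACER_LEN = 500  # undefined nucleotides (X) inserted between Contigs

# Each index entry is (virtual start, contig name, contig length), 0-based.
Index = List[Tuple[int, str, int]]


def build_virtual_genome(contigs: List[Tuple[str, str]]) -> Tuple[str, Index]:
    """Concatenate (name, sequence) pairs, separated by X spacers, in the given order."""
    parts: List[str] = []
    index: Index = []
    pos = 0
    for i, (name, seq) in enumerate(contigs):
        if i > 0:
            parts.append("X" * SPACER_LEN)
            pos += SPACER_LEN
        index.append((pos, name, len(seq)))
        parts.append(seq)
        pos += len(seq)
    return "".join(parts), index


def to_contig_coord(index: Index, virtual_pos: int) -> Optional[Tuple[str, int]]:
    """Map a virtual position to (contig name, offset); None if it falls in a spacer."""
    starts = [start for start, _, _ in index]
    i = bisect_right(starts, virtual_pos) - 1  # last Contig starting at or before pos
    if i < 0:
        return None
    start, name, length = index[i]
    offset = virtual_pos - start
    return (name, offset) if offset < length else None


if __name__ == "__main__":
    virtual, index = build_virtual_genome([("contig_a", "ACGT"), ("contig_b", "GGCC")])
    print(len(virtual), to_contig_coord(index, 2), to_contig_coord(index, 505))
```

Because the joining order is arbitrary, the Contig and offset returned here are the only biologically meaningful coordinates; the virtual position itself says nothing about where a gene lies on a chromosome.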
This server is built on top of a Sybase database running under UNIX and uses the Genolist framework developed at the Institut Pasteur.
More information about the Genolist database can be found in:
I. Moszer, P. Glaser and A. Danchin
"SubtiList: a relational database for the Bacillus subtilis genome"
Microbiology (1995) 141:261-268
If you use CandidaDB in a publication, please cite it with the following acknowledgment:
"Nucleotide sequence data for Candida albicans were obtained from the
Stanford Genome Technology Center website at
http://www-sequence.stanford.edu/group/candida. Sequencing of C. albicans was accomplished with the support of the NIDR and the
Burroughs Wellcome Fund.
Information about coding sequences and proteins was obtained from CandidaDB, available at http://www.pasteur.fr/Galar_Fungail/CandidaDB/, which has been developed by the Galar Fungail European Consortium (QLK2-2000-00795)."
This World-Wide Web server has been set up at the Institut Pasteur (Paris - FRANCE) in the framework of the European project "Novel Approaches for the Control of Fungal Disease - Galar Fungail" (Fifth Framework Programme, EC contract: QLK2-2000-00795), coordinated by Alistair Brown (see The Galar Fungail consortium and the CandidaDB genomic database for C. albicans). It has also benefited from funding from the French Ministere de la Recherche (Reseau Infections Fongiques, Programme de Recherche Fondamentale en Microbiologie, Maladies Infectieuses et Parasitaires) to Christophe d'Enfert and Claude Gaillardin.
This server has been constructed by Louis Jones from the Service d'Informatique Scientifique and Ivan Moszer from the Unité de Génétique des Génomes Bactériens.
The original SubtiList database was developed by Ivan Moszer with invaluable help from Claudine Médigue and Alain Viari.
Data curation is performed by Christophe d'Enfert and Lionel Frangeul.
It has benefited from the expertise of the European Galar Fungail Consortium with the participation of: Alistair Brown and Abigail Mavor (University of Aberdeen, UK), Claude Gaillardin and Djamila Onésime (INRA, Grignon, France), Joachim Ernst and S. Krishnamurthy (University of Duesseldorf, Germany), Angel Dominguez, Maria-Carmen Lopez and Nuria Martin (University of Salamanca, Spain), Jose Perez Martin (CSIC, Madrid, Spain), Piet de Groot and Frans Klis (University of Amsterdam, the Netherlands), Luis Castillo and Rafael Santandreu (University of Valencia, Spain), Oliver Bader, Chantal Fradin, Donita Kunze and Bernhard Hube (Robert Koch Institute, Berlin, Germany) and Fredj Tekaia, Sylvie Rodriguez and Susana Garcia (Institut Pasteur, Paris, France).
Development of CandidaDB would not have been possible without the C. albicans genome sequence, which the Stanford Genome Technology Center made available to the scientific community prior to any publication. Sequencing of C. albicans at the Stanford Genome Technology Center was accomplished with the support of the NIDR and the Burroughs Wellcome Fund.