Unit: Biological Software and Databases
Director: Bernard Caudron
The "Biological Software and Databanks" Group is in charge of installing, maintaining and interfacing biological software for researchers, and of maintaining and providing access to a selection of international biological databanks. We also design scientific databases that allow laboratories to put their results online. The interfaces used to present the biological software are fully released; their installation on the new cluster is under way. The group takes part in teaching activities: bioinformatics for the continuing education courses, an introduction to bioinformatics for the Genome Analysis and the Cell and Molecular Genetics courses, and the organisation of the Informatics for Biology course of the Institut Pasteur.
Software for biology:
The software available for biology comprises 340 packages containing about 1500 programs. Installing, releasing and maintaining this software represents the major part of the group's work. The packages are being installed on the new cluster. The migration began during the 4th quarter of 2005 and about 6 months of work remain.
To make this software more user-friendly, we have built 380 Web interfaces that guide users in choosing programs and their parameters, and that let them launch a job from the Web and retrieve the results by email. This service is open both on campus and from outside; it is comparable to the services that the NCBI, the EBI or the SIB provide to researchers worldwide. Software sometimes contains bugs; our mission is to detect and analyse them so that we can send the authors either a report or, when we have found one, a fix.
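As an illustration only, the way a Web interface might turn a user's choices into a program call can be sketched as follows. This is a minimal sketch under stated assumptions, not the group's actual implementation: the program name `example_search`, the flags `-d` and `-e`, the bank names and the function `build_command` are all hypothetical.

```python
# Hypothetical description of one program's interface:
# parameter name -> (command-line flag, allowed values or None for free text).
EXAMPLE_DESCRIPTION = {
    "program": "example_search",  # placeholder name, not a real tool
    "params": {
        "database": ("-d", {"bank_a", "bank_b"}),
        "evalue": ("-e", None),
    },
}


def build_command(description, form, email):
    """Validate submitted form values and return (argv, recipient)."""
    if "@" not in email:
        raise ValueError("an e-mail address is needed to return the results")
    argv = [description["program"]]
    for name, (flag, allowed) in description["params"].items():
        if name not in form:
            continue  # parameter left at the program's default
        value = str(form[name])
        if allowed is not None and value not in allowed:
            raise ValueError(f"invalid value for {name}: {value}")
        argv += [flag, value]
    return argv, email


if __name__ == "__main__":
    argv, recipient = build_command(
        EXAMPLE_DESCRIPTION,
        {"database": "bank_a", "evalue": 0.001},
        "user@example.org",
    )
    print(argv, recipient)
```

Keeping the program description as data rather than code is one possible design: each new interface then only needs a new description, not new validation logic.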
Our group works constructively with the following centres: NCBI (for Blast programs), the University of Washington (phred, phrap, consed, phylip) and the EMBOSS developers (European Molecular Biology Open Software Suite).
The Mobyle project, which builds on studies carried out since the end of 2003 and is intended to improve the current bioinformatics analysis portal, is about 75% implemented.
Biological databanks: (person in charge: N. Joly)
Since 1999, the biological databanks have been updated by an automated system developed by Nicolas Joly, Marc Baudoin and myself. This software automatically updates the biological databanks currently available on our servers. The large data volume remains the most difficult constraint to manage. Across all formats, disk usage amounts to 650 GB for GenBank, 600 GB for Embl and 150 GB for the other databanks; in total, 1400 GB are reserved for databanks. Today 36 banks, including Embl, GenBank, RefSeq, Uniprot, Nrprot, Genpept, Pdb and Pir, are local copies of the original banks, which the automated system checks periodically on their respective production sites.
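The disk budget and the periodic check described above can be sketched as follows. This is a minimal sketch, not the group's actual updater: the `Bank` class, its release labels and the two helper functions are hypothetical, and only the three disk usage figures come from the text.

```python
from dataclasses import dataclass

# Disk usage figures quoted in the text, in GB, summed over all formats.
DISK_USAGE_GB = {"GenBank": 650, "Embl": 600, "other databanks": 150}


@dataclass
class Bank:
    name: str
    local_release: str   # hypothetical: release label of the local copy
    remote_release: str  # hypothetical: label seen on the production site


def total_reserved_gb(usage):
    """Sum the per-bank disk usage to get the total reserved space."""
    return sum(usage.values())


def banks_to_update(banks):
    """Return the names of banks whose local copy differs from the remote one."""
    return [b.name for b in banks if b.local_release != b.remote_release]


if __name__ == "__main__":
    banks = [
        Bank("GenBank", "r1", "r2"),  # illustrative labels, not real releases
        Bank("Embl", "r1", "r1"),
    ]
    print(total_reserved_gb(DISK_USAGE_GB))
    print(banks_to_update(banks))
```

In a real setting the remote label would be fetched from each production site on a schedule; that step is left out here because the text does not describe how it is done.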
Databases: (persons in charge: Louis Jones, C. Jorge)
In 2005, new databases were added, following the "GenoList" model, to present on the Web the eukaryotic genome of Candida albicans (CandidaDB), the genome of Aspergillus fumigatus (AspergiList) and the genome of Mycobacterium smegmatis (SmegmaList). A biodiversity database has been developed by L. Jones for Ph. Glaser. This work resulted in a presentation during the departmental days of June 2005 (The Biodiversity database, L. Jones).
The ARPAS software, which manages and queries the CRBIP's database, has been running for 3 years; it is being adapted to integrate Web access.
A database has been developed for the Mouse Molecular Genetics Unit to analyse, compare and label the X-chromosome regions around the mouse and bovine Xist gene.
C. Jorge is currently working with the "Centre of resources in biostatistics, epidemiology and pharmaco-epidemiology" (D. Guillemot, C. Toneatti-Lemare, L. Lafitte) on the development and testing of data collection tools, via secured databases and via the Internet (eCRF), adapted to pharmaco-epidemiology research.
Electronic mail help:
Since 2005, a new e-mail address, pi-bioinfo at pasteur.fr, has enabled researchers on campus to ask questions about the software, databanks and databases applied to biology. We also answer questions from external users who connect to our Bioweb server over the Internet.
Bioinformatics training sessions: (person in charge: C. Maufrais)
Since 1993, together with the continuing education department, we have offered theoretical and practical training sessions on the autonomous and critical use of biological data analysis software. One session is held each year over 3 weeks in November and December, that is, fifteen half-days, and can be attended by about thirty researchers, technicians or trainees.
Courses: (persons in charge: C. Letondal, C. Maufrais, K. Schuerer, E. Deveaud)
The Informatics in Biology course ran from January 6th to April 29th, 2005. This course aims to give biologists autonomy in creating software tools. After a theoretical part that introduces the basics of computer science and involves about twenty lecturers in total (half of whom belong to the Institut Pasteur), 16 trainees successfully applied their knowledge to complete a bioinformatics project. The theme of each project is a real problem set either by the student's laboratory or by one of the lecturers.
In 2005, Catherine Letondal, in collaboration with Thierry Rose and F. Hantraye, coordinated a small scientific and technical discussion group with 70 subscribers, some of whom meet about twice a month to present their work and share their experience in bioinformatics.
Keywords: bioinformatics, biological software, databases, databanks