Unit: Biological Software and Databases
Director: Bernard Caudron
The "Biological Software and Databanks" Group is very active in installing, maintaining and interfacing biological software for researchers. A major mission is to give access to a selection of international biological databanks that can be searched with this software. We also build scientific databases that make it possible to put laboratory results online. The interfaces used to present the biological software are undergoing a full new release. The group takes part in teaching activities: bioinformatics for the continuing education courses, and an introduction to bioinformatics for the Genome Analysis and the Cell and Molecular Genetics courses. The informatics for biology course is organised by three members of the group, and the other members join them as teachers.
Software for biology:
The software available for biology comprises 340 packages containing about 1500 programs. Installing, upgrading and maintaining this software accounts for most of the group's work. The packages come from varied sources, and the installation effort depends on how professionally their authors prepared them.
To make this software easier to use, we have built 380 Web interfaces that guide the user in choosing programs and their parameters, so that jobs can be launched from the Web and results retrieved by email. This service is open both on campus and to outside users; it is comparable to the services that the NCBI, the EBI or Infobiogen provide in turn to researchers worldwide. Software sometimes contains bugs; our mission is to detect and analyse them, and to send the authors either a report or a fix when we have found one.
Our group interacts constructively with the following centres: NCBI (for the Blast programs), the University of Washington (phred, phrap, consed, phylip) and the HGMP Centre (for the EMBOSS package).
The Mobyle project, which builds on studies carried out since the end of 2003 and aims to improve the current bioinformatics analysis portal, entered its implementation stage in the second half of 2004.
Biological databanks: (person in charge: N. Joly)
Since 1999, the biological databanks have been updated by an automated system developed by Nicolas Joly, Marc Baudoin and myself. This software automatically updates the biological databanks currently available on our servers. The large volume of data has remained the most difficult constraint to manage. Across all formats, disk usage is 500 GB for GenBank, 400 GB for Embl and 100 GB for the other databanks; in total, 1000 GB are reserved for databanks. Today 36 banks, including Embl, GenBank, Swissprot, Uniprot, TrEmbl, Genpept, Pdb and Pir, are local copies of the original banks, which the automated system fetches periodically from their respective production sites.
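The update principle described above can be sketched as follows. This is only a minimal illustration, not the group's actual software: the `Bank` class, its fields, the function names and all release dates are hypothetical, and only the disk figures (500, 400 and 100 GB, with 1000 GB reserved) come from the text.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Bank:
    """One mirrored databank (hypothetical structure, for illustration only)."""
    name: str
    local_release: date   # release date of the copy held locally
    remote_release: date  # latest release date seen on the production site
    size_gb: int          # disk space used across all formats


def banks_to_refresh(banks):
    """Return the names of banks whose production site has a newer release."""
    return [b.name for b in banks if b.remote_release > b.local_release]


def fits_budget(banks, reserved_gb):
    """Check that the local copies fit within the reserved disk space."""
    return sum(b.size_gb for b in banks) <= reserved_gb


# Sizes are taken from the report; the dates are invented for the example.
BANKS = [
    Bank("GenBank", date(2004, 10, 15), date(2004, 12, 15), 500),
    Bank("Embl", date(2004, 12, 1), date(2004, 12, 1), 400),
    Bank("others", date(2004, 11, 1), date(2004, 11, 20), 100),
]

if __name__ == "__main__":
    print("To refresh:", banks_to_refresh(BANKS))
    print("Fits in 1000 GB:", fits_budget(BANKS, 1000))
```

In this sketch a bank is refreshed only when the production site advertises a newer release than the local copy, which avoids transferring several hundred gigabytes needlessly.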
Databases: (persons in charge: Louis Jones, C. Jorge)
New databases following the "GenoList" model have been added to present on the web the bacterial genome of Legionella pneumophila (LegioList) and a eukaryotic genome, that of Candida albicans (CandidaDB). The "GenoList" model still provides the expected services, and several genomes were updated in 2004: Mycobacterium tuberculosis (TubercuList), Mycobacterium leprae (Leproma) and Mycobacterium bovis (BoviList).
The ARPAS software, which manages and queries the CRBIP databases, has been operational for two years; development is under way to add web access.
A database has been developed for the Mouse Molecular Genetics Unit to analyse, compare and annotate the X-chromosome regions around the mouse and bovine Xist gene.
C. Jorge is currently collaborating with the "Centre of resources in biostatistics, epidemiology and pharmaco-epidemiology" (D. Guillemot, C. Toneatti-Lemare, L. Lafitte) on the development and testing of data collection tools, via secured databases and the Internet (eCRF), adapted to pharmaco-epidemiology research.
Electronic mail help:
In 2004, the mail received at the address email@example.com amounted to 1100 help requests, to which 1800 answers were sent: two thirds by the "Systems and network" group and one third by the "Biological software and databanks" group. We also handle numerous questions from Bioweb users connecting over the Internet.
Bioinformatics training sessions: (person in charge: C. Maufrais)
Since 1993 we have offered, together with the continuing education department, theoretical and practical training sessions on the autonomous and critical use of biological data analysis software. One session is held each year over 3 weeks in November and December, that is, fifteen half-days, and can be attended by about thirty researchers, technicians or trainees.
The content of this training was used for the "2nd Bioinformatics course in Dakar", organised with the Dakar Institut Pasteur, the AUF, the Dakar University and the Paris Institut Pasteur from June 14th to 30th, 2004. This course enabled 16 students from West African countries to improve their skills with bioinformatics tools.
Courses: (persons in charge: C. Letondal, C. Maufrais, K. Schuerer, E. Deveaud)
The informatics in biology course took place from January 6th to April 23rd, 2004. This course aims to make biologists autonomous in creating software tools. After a theoretical part that introduces the basics of computer science and involves about twenty lecturers in total (half of them from the Institute), 15 trainees successfully applied their knowledge by completing a bioinformatics project. The subject of each project is a real problem posed either by the student's laboratory or by one of the lecturers.
In 2004, Catherine Letondal, in collaboration with Thierry Rose, head of research in the Immunogenetics of the Cell, coordinated a small scientific and technical discussion group comprising 70 people in total, all subscribers to the discussion list; some of them meet about twice a month for presentations or conference reports.
In 2004, group members participated in the following scientific projects:
Participation in the assembly of the whole Candida glabrata genome (Dujon et al., 2004)
Search for a PPI signature in the SwissProt bank and creation of the "sig" program (Garcia et al., 2004)
CandidaDB: a database for the Candida albicans genome (d'Enfert et al., 2005)
Publication of a conference article and a book chapter on research into programming by biologist users (C. Letondal et al., 2004)
Keywords: bioinformatics, biological software, databases, databanks