17.4. Parsing

In bioinformatics, parsing is very important: it lets you extract information from data files, or from the results produced by various analysis programs, and make that information available in your programs. For instance, a Blast parser transforms the text output of Blast into a list of hits with their alignments, available as a data structure (for example, Biopython Bio.Blast.Record objects) that you can use in a Python program.
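The Blast example can be made concrete with a minimal sketch. Instead of using Biopython, it assumes Blast's tabular output (`-outfmt 6` in BLAST+, whose twelve default columns are query, subject, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, e-value and bit score) and keeps only a few of those fields. The `Hit` class and the `parse_blast_tabular` function are hypothetical names, and the two input lines are made up for illustration.

```python
from dataclasses import dataclass


# Hypothetical container for one hit; only some of the tabular columns are kept.
@dataclass
class Hit:
    query: str
    subject: str
    identity: float
    length: int
    evalue: float
    bitscore: float


def parse_blast_tabular(text):
    """Turn tabular Blast text (assumed -outfmt 6 column order) into a list of Hit objects."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines (as written by -outfmt 7)
        fields = line.split("\t")
        hits.append(Hit(
            query=fields[0],
            subject=fields[1],
            identity=float(fields[2]),
            length=int(fields[3]),
            evalue=float(fields[10]),
            bitscore=float(fields[11]),
        ))
    return hits


# Made-up sample input with two hits, tab-separated.
sample = (
    "seq1\tsp|P12345|X\t98.50\t200\t3\t0\t1\t200\t5\t204\t1e-50\t380\n"
    "seq1\tsp|Q67890|Y\t45.20\t150\t80\t2\t10\t158\t20\t167\t2e-10\t65.1\n"
)
for hit in parse_blast_tabular(sample):
    print(hit.subject, hit.evalue)
```

Once the text has been turned into `Hit` objects, the rest of the program can sort, filter or count hits without ever looking at the raw text again, which is the whole point of parsing.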

The purpose of this section is not to present everything about parsing, but just to introduce some basic notions.

Parsing means analyzing a text and producing structured data in a form that is useful for programs. The result can be a list of strings, a set of class instances, or just a boolean, depending on your needs and on the parsing system you use. An important aspect of parsing is the architecture used to process the text that you want to analyze.
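To illustrate that the form of the result depends on your needs, the sketch below parses the same FASTA text in three ways: into a list of class instances, into a list of strings, and into a boolean. `FastaRecord`, `parse_fasta`, `headers` and `is_fasta` are hypothetical names chosen for this example, and `is_fasta` is only a rough check, not a full validation of the format. For real work, Biopython's `Bio.SeqIO` module already provides FASTA parsing.

```python
class FastaRecord:
    """Hypothetical minimal container for one FASTA entry."""

    def __init__(self, header, sequence):
        self.header = header
        self.sequence = sequence


def parse_fasta(text):
    """Result as class instances: one FastaRecord per '>' entry."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:  # close the previous record
                records.append(FastaRecord(header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence lines before the first header are dropped
    if header is not None:  # close the last record
        records.append(FastaRecord(header, "".join(chunks)))
    return records


def headers(text):
    """Result as a list of strings: just the header lines, without '>'."""
    return [line[1:].strip() for line in text.splitlines() if line.startswith(">")]


def is_fasta(text):
    """Result as a boolean: starts with a header and every record has a sequence."""
    return text.lstrip().startswith(">") and all(r.sequence for r in parse_fasta(text))


fasta = ">seq1 demo\nATGC\nGGTA\n>seq2\nTTAA\n"
for record in parse_fasta(fasta):
    print(record.header, record.sequence)
print(headers(fasta))                    # ['seq1 demo', 'seq2']
print(is_fasta(fasta), is_fasta("ATGC"))  # True False
```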

Whatever the architecture, there is always an engine driving the whole process, be it a simple loop or a specialized component. In this chapter, we will just do some "manual" parsing with the patterns introduced in Section 17.5. Event-driven parsing will be practiced in the exercise on abstract frameworks (see Exercise 19.4) and during the Web/XML course.
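The idea of an engine driving the process can be sketched in a few lines. In the event-driven style (the same spirit as Python's `xml.sax` module), the engine is a plain loop that reads lines and emits events, while a separate handler object decides what to do with each event. This is only an illustration on FASTA text, not the solution to Exercise 19.4; the `drive` function, the `CountingHandler` class and the event names `start_record`, `sequence_line` and `end_record` are all hypothetical.

```python
class CountingHandler:
    """Hypothetical handler: reacts to events by computing each sequence's length."""

    def __init__(self):
        self.lengths = {}
        self._current = None

    def start_record(self, header):
        self._current = header
        self.lengths[header] = 0

    def sequence_line(self, line):
        self.lengths[self._current] += len(line)

    def end_record(self):
        self._current = None


def drive(lines, handler):
    """The engine: a simple loop that turns lines into events sent to the handler."""
    in_record = False
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if in_record:
                handler.end_record()
            handler.start_record(line[1:])
            in_record = True
        elif in_record:
            handler.sequence_line(line)
    if in_record:
        handler.end_record()


handler = CountingHandler()
drive(">a\nATG\nCC\n>b\nTTTT\n".splitlines(), handler)
print(handler.lengths)  # {'a': 5, 'b': 4}
```

Because `lines` can be any iterable of strings, an open file object works too. Swapping in another handler (for example, one that builds record objects) changes what is produced without touching the engine, which is the main benefit of separating the two.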