Introduction to Programming using Python

Programming Course for Biologists at the Pasteur Institute

Katja Schuerer

Corinne Maufrais

Catherine Letondal

Eric Deveaud

Marie-Agnes Petit

February 2, 2008

Abstract

The objective of this course is to teach programming concepts to biologists. It is aimed at people who are not professional computer scientists but who need better control of computers for their own research. This programming course is part of a course in informatics for biology. If you are already a programmer and are only looking for an introduction to Python, the Python course (in Bioinformatics) may suit you better.

PDF version of this course

This course is still under construction. Comments are welcome.

Handouts for practical sessions (still under construction) will be available on request.

Contact: ieb@pasteur.fr


Table of Contents

1. Introduction
1.1. First session
1.1.1. Let's run the python interpreter
1.1.2. Variables
1.1.3. Strings
1.1.4. Functions
1.1.5. Lists
1.1. A first program
1.2. Why Python
1.3. Programming Languages
2. Variables
2.1. Data, values and types of values
2.2. Variables or naming values
2.3. Variable and keywords, variable syntax
2.4. Namespaces or representing variables
2.5. Reassignment of variables
3. Statements, expressions and functions
3.1. Statements
3.2. Sequences or chaining statements
3.3. Conditionals
3.4. Iterations
3.5. Functions
3.6. Operations
3.7. Composition and Evaluation of Expressions
4. Communication with outside
4.1. Output
4.2. Formatting strings
4.3. Input
4.4. Files
4.4.1. Reading from a file
4.4.2. Writing into a file
4.4.3. Errors with files
5. Program execution
5.1. Executing code from a file
5.2. Interpreter and Compiler
6. Strings
6.1. Values as objects
6.1. Working with strings
6.1.1. Unicode
6.1.2. Iterations on strings
7. Branching and Decisions
7.1. Conditional execution
7.2. Conditions and Boolean expressions
7.3. Logical operators
7.4. Alternative execution
7.5. Chained conditional execution
7.6. Nested conditions
7.7. Solutions
8. Defining Functions
8.1. Defining Functions
8.2. Parameters and Arguments or the difference between a function definition and a function call
8.3. Functions and namespaces
8.4. Boolean functions
9. Collections
9.1. Datatypes for collections
9.2. Methods, Operators and Functions on Lists
9.3. Methods, Operators and Functions on Dictionaries
9.4. What data type for which collection
10. Nested data structures
10.1. Nested data structures
10.2. Identity of objects
10.3. Copying complex data structures
10.4. Modifying nested structures
11. Repetitions
11.1. Repetitions
11.2. The for loop
11.3. The while loop
11.4. Comparison of for and while loops
11.5. Range and Xrange objects
11.6. The map function
11.7. List comprehensions
12. Exceptions
12.1. General Mechanism
12.2. Python built-in exceptions
12.3. Raising exceptions
12.4. Defining exceptions
13. Functions II
13.1. Passing argument by name
13.2. Defining default values
13.3. Variable number of parameters
13.4. Functions and namespaces
13.5. Defining a function as a parameter
14. Modules and packages in Python
14.1. Modules
14.1.1. Using modules
14.1.2. Building modules
14.1.3. Where are the modules?
14.1.4. How does it work?
14.1.5. Running a module from the command line
14.2. Packages
14.2.1. Loading
14.3. Getting information on available modules and packages
14.4. Designing a module
15. Recursive functions
15.1. Recursive functions definition
15.2. Flow of execution of recursive functions
15.3. Recursive data structures
15.4. Solutions
16. Files
16.1. Handle files in programs
16.2. Reading data from files
16.3. Writing in files
16.4. Design problems
16.5. Documentation strings
17. Scripting
17.1. Using the system environment: os and sys modules
17.2. Running Programs
17.3. Parsing command line options with getopt
17.4. Parsing
17.5. Searching for patterns
17.5.1. Introduction to regular expressions
17.5.2. Regular expressions in Python
17.5.3. Prosite
17.5.4. Searching for patterns and parsing
18. Object-oriented programming
18.1. Introduction
18.2. What are objects and classes? An example
18.2.1. Objects description
18.2.2. Methods
18.2.3. Classes
18.2.4. Creating objects
18.3. Defining classes in Python
18.4. Combining objects
18.5. Classes and objects in Python: technical aspects
18.5.1. Namespaces
18.5.2. Objects lifespan
18.5.3. Objects equality
18.5.4. Classes and types
18.5.5. Getting information on classes and instances
18.6. Solutions
19. Object-oriented design
19.1. Introduction
19.2. Components
19.2.1. Software quality factors
19.2.2. Large scale programming
19.2.3. Modularity
19.2.4. Methodology
19.2.5. Reusability
19.3. Abstract Data Types
19.3.1. Definition
19.3.2. Information hiding
19.3.3. Using special methods within classes
19.4. Inheritance: sharing code among classes
19.4.1. Introduction
19.4.2. Discussion
19.5. Flexibility
19.5.1. Summary of mechanisms for flexibility in Python
19.5.2. Manual overloading
19.6. Object-oriented design patterns
19.7. Solutions
Bibliography

List of Figures

1.1. History of programming languages
2.1. Namespace
2.2. Reassigning values to variables
3.1. Syntax tree of the expression
4.1. Interpretation of formatting templates
5.1. Comparison of compiled and interpreted code
5.2. Execution of byte compiled code
6.1. String indices
6.2. Unicode
7.1. Flow of execution of a simple condition
7.2. If statement
7.3. Block structure of the if statement
7.4. Flow of execution of an alternative condition
7.5. Multiple alternatives or Chained conditions
7.6. Nested conditions
7.7. Multiple alternatives without elif
8.1. Function definitions
8.2. Blocks and indentation
8.3. Stack diagram of function calls
9.1. Comparison of some collection datatypes
10.1. Representation of nested lists
10.2. Accessing elements in nested lists
10.3. Representation of a nested dictionary
10.4. List comparison
10.5. Copying nested structures
10.6. Modifying compound objects
11.1. The for loop
11.2. Flow of execution of a while statement
11.3. Structure of the while statement
11.4. Passing functions as arguments
12.1. Exceptions class hierarchy
14.1. Module namespace
14.2. Loading specific components
15.1. Stack diagram of recursive function calls
15.2. Stack diagram of recursive function calls for function fact()
15.3. A phylogenetic tree topology
15.4. Tree representation using a recursive list structure
16.1. ReBase file format
16.2. Flowchart of the processing of the sequence
17.1. Manual parsing
17.2. Event-based parsing
17.3. Parsing: decorated grammar
17.4. Parsing result as a hierarchical document
17.5. Pattern searching
17.6. Python regular expressions
17.7. Python regular expressions: classes and methods summary
18.1. A DNA object
18.2. Representation showing object's methods as counters
18.3. A Protein object
18.4. Protein and DNA objects
18.5. Classes and instances namespaces
18.6. Class attributes in class dictionary
18.7. Classes methods and bound methods
18.8. Types of classes and objects.
19.1. Components as a language
19.2. A stack
19.3. Dynamic binding (1)
19.4. Dynamic binding (2)
19.5. UML diagram for inheritance
19.6. Multiple Inheritance
19.7. Delegation
19.8. A composite tree

List of Tables

3.1. Order of operator evaluation (highest to lowest)
4.1. String formatting: Conversion characters
4.2. String formatting: Modifiers
4.3. Type conversion functions
6.1. String methods, operators and builtin functions
6.2. Boolean methods and operators on strings
7.1. Boolean operators
9.1. Sequence types: Operators and Functions
9.2. List methods
9.3. Dictionary methods and operations
16.1. File methods
16.2. File modes
19.1. Stack class interface
19.2. Some of the special methods to redefine Python operators

List of Examples

5.1. Executing code from a file
8.1. More complex function definition
8.2. Function to check whether a character is a valid amino acid
10.1. A mixed nested datastructure
11.1. Translate a cds sequence into its corresponding protein sequence
11.2. First example of a while loop
12.1. Filename error
12.2. Give the user a chance to enter a proper filename
12.3. Raising an exception in case of a wrong DNA character
12.4. Raising your own exception in case of a wrong DNA character
12.5. Exceptions defined in Biopython
13.1. Default values of parameters
13.2. Function execution namespaces
13.3. Global statement
13.4. Global statement (2)
14.1. A module
14.2. Using the Bio.Fasta package
14.3. A genealogy module
15.1. Factorial
16.1. Reading from files
16.2. Restriction of a DNA sequence
17.1. Walking subdirectories
17.2. Running a program (1)
17.3. Running a program (2)
17.4. Running a program (3)
17.5. Getopt example
17.6. Searching for the occurrence of PS00079 and PS00080 Prosite patterns in the Human Ferroxidase protein
18.1. DNA, a class for DNA sequences (first version)
18.2. DNA, a class for DNA sequences (complete version)
19.1. A Stack
19.2. Stack class
19.3. Defining operators for the DNA class
19.4. Inheritance example (1): sequences
19.5. Curve class: manual overloading
19.6. An uppercase sequence class
19.7. ImmutableList class
19.8. SequenceDB class
19.9. A composite tree
19.10. A recursive visitor

List of Exercises

3.1. Composition
4.1. Copying a file
5.1. Execute code from a file
6.1. Iterating on strings
7.1. Chained conditions
7.2. Nested condition
10.1. Representing complex structures
11.1. Write the complete codon usage function
11.2. Converting a list from integers to floats
11.3. Generate the list of all possible codons
13.1. Sorting a dictionary by its values
14.1. Locating modules
14.2. Bio.SwissProt package
14.3. Using a class from a module
14.4. Import from Bio.Clustalw
15.1. Recursive definition of reverse
16.1. Multiple sequences for all enzymes
17.1. Basename of the current working directory
17.2. Finding files in directories
18.1. A Point class
18.2. A Point class (continued)
19.1. ADT for the Point class
19.2. Operators for the DNA class
19.3. A ConstantPoint class
19.4. Example of an abstract framework: Enzyme parser
19.5. Point factory
19.6. An analyzed sequence class
19.7. A partially editable sequence

List of Definitions

2.1. Type
2.2. Binding
2.3. Variable
2.4. Namespace
2.5. Reference
3.1. Statement
3.2. Program
3.3. Sequence
3.4. Function
3.5. Function call
3.6. Arguments of functions
3.7. Operations and Operators
3.8. Composition and Expression
4.1. Files
5.1. Interpreter
5.2. Interactive interpreter session
5.3. Compiler
6.1. Object
6.2. Attribute
6.3. Method
6.4. Strings
7.1. Branching or conditional statements
7.2. Conditions or Boolean expressions
7.3. Multiple alternative conditions
8.1. Abstraction
8.2. Basic instruction
8.3. Parameter
8.4. Block
8.5. Stack
8.6. builtin namespace
8.7. Boolean function
9.1. List
9.2. Dictionary
11.1. The for loop
11.2. The while loop
15.1. Recursive function
15.2. Recursive definition
15.3. Nested list
18.1. Object
18.2. Class
18.3. Instance
18.4. Method
19.1. Overloading
19.2. Polymorphism