Scripting

Table of Contents

17.1. Using the system environment: os and sys modules
17.2. Running Programs
17.3. Parsing command line options with getopt
17.4. Parsing
17.5. Searching for patterns.
17.5.1. Introduction to regular expressions
17.5.2. Regular expressions in Python
17.5.3. Prosite
17.5.4. Searching for patterns and parsing

17.1. Using the system environment: os and sys modules

There are modules in the Python library that help you to interact with the system.

The sys module.  The sys module provides an interface to the Python interpreter: you can retrieve the interpreter version, the strings displayed as prompts (by default '>>>' and '...'), and so on. It also gives you the arguments that were provided on the command line:

% python -i prog.py myseq.fasta
>>> import sys
>>> sys.argv
['prog.py', 'myseq.fasta']
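
As the session above shows, the first element of sys.argv is the name of the script, and the real arguments start at index 1. A minimal sketch of how a script might pick up its input file follows; the function name input_filename is hypothetical and only used for illustration:

```python
def input_filename(argv):
    """Return the first argument after the script name, or None if there is none."""
    # argv[0] is the script name; the real arguments start at index 1.
    if len(argv) < 2:
        return None
    return argv[1]

# The function is called with explicit lists here so that the behaviour
# is easy to see; a real script would call input_filename(sys.argv)
# after "import sys".
print(input_filename(['prog.py', 'myseq.fasta']))   # prints: myseq.fasta
print(input_filename(['prog.py']))                  # prints: None
```

Testing the length of the list first avoids an IndexError when the script is started without arguments.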
	
The file handles for standard input, standard output, and standard error are accessible from the sys module:

>>> sys.stdout.write("a string\n")
a string
>>> sys.stdin.read()
a line
another line
'a line\nanother line\n'                                                  (1)
	    
(1) You have to type Ctrl-D here to end the input.
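
Because sys.stdin, sys.stdout and sys.stderr are ordinary file objects, a function can be written against any stream and connected to the standard ones later. The following is a sketch under stated assumptions, not a fixed recipe: the function name copy_upper is hypothetical, the code uses Python 3 syntax, and io.StringIO stands in for sys.stdin so that the example runs without typed input.

```python
import io
import sys

def copy_upper(instream, outstream):
    """Copy all text from instream to outstream in upper case; return the line count."""
    text = instream.read()            # reads until end of input (Ctrl-D on a terminal)
    outstream.write(text.upper())
    return len(text.splitlines())

# io.StringIO plays the role of sys.stdin in this demonstration.
fake_stdin = io.StringIO("a line\nanother line\n")
n = copy_upper(fake_stdin, sys.stdout)

# Diagnostics go to standard error so they do not mix with the real output.
sys.stderr.write("%d lines read\n" % n)
```

In a real script the call would be copy_upper(sys.stdin, sys.stdout), which lets the program take part in a shell pipeline.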

The os module.  This module is very helpful for handling files, directories, and processes, and for reading environment variables (see the environ dictionary). One of its most useful components is the os.path module, which you use to get information about files:

>>> import os.path
>>> os.path.exists('myseq.fasta')
True
>>> os.path.isfile('myseq.fasta')
True
>>> os.path.isdir('myseq.fasta')
False
>>> os.path.basename('/local/bin/perl')
'perl'
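
The tests above can be combined in a small helper, and the environ dictionary can be read like any other dictionary. This is only an illustrative sketch: the function name describe_path is hypothetical, and HOME is used as an example of an environment variable that may or may not be set on your system.

```python
import os
import os.path

def describe_path(path):
    """Return a short description of what kind of entry path is."""
    if not os.path.exists(path):
        return "missing"
    if os.path.isdir(path):
        return "directory"
    if os.path.isfile(path):
        return "file, %d bytes" % os.path.getsize(path)
    return "other"

# os.environ behaves like a dictionary; get() avoids a KeyError
# when the variable is not set.
print(os.environ.get("HOME", "(HOME is not set)"))

# os.curdir is '.', the current directory.
print(describe_path(os.curdir))                 # prints: directory

# os.path.join builds a path with the separator of the current platform.
print(os.path.join("data", "myseq.fasta"))
```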
	  

Exercise 17.1. Basename of the current working directory

Write the statements to display the basename of the current working directory.

The os.path module provides a function, walk, that visits every directory below a starting directory and calls a given function on each one.

Example 17.1. Walking subdirectories

The following code displays, for each directory, its name and how many entries (files and subdirectories) it contains:

>>> def f(arg, dirname, fnames): 
...     print dirname, ": ", len(fnames)

>>> os.path.walk('.', f, None)
	  
The function f must take three arguments, in this order: arg, dirname, and fnames. dirname is the name of the directory being visited, and fnames is a list containing the names of the files and subdirectories in dirname. arg is a free parameter: it receives the third argument passed to walk (here: None).
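
Note that os.path.walk exists only in Python 2; it was removed in Python 3, where the generator os.walk (available since Python 2.3) does the same job without a callback. The sketch below reproduces the count from Example 17.1 with os.walk; the function name count_entries is hypothetical.

```python
import os

def count_entries(top):
    """Return a list of (dirname, number of entries) pairs for top and its subdirectories."""
    counts = []
    # os.walk yields one (dirpath, dirnames, filenames) tuple per directory.
    for dirpath, dirnames, filenames in os.walk(top):
        # os.path.walk passed files and subdirectories together in fnames,
        # so both lists are added to obtain the same count.
        counts.append((dirpath, len(dirnames) + len(filenames)))
    return counts

for dirname, n in count_entries('.'):
    print("%s : %d" % (dirname, n))
```

Since os.walk already separates files from subdirectories, counting only the files is a matter of using len(filenames) alone.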

Exercise 17.2. Finding files in directories

Find the files that have a given name and are bigger than a given size in a directory and its subdirectories. Only consider files, not directories.