Modules and packages in Python

Table of Contents

14.1. Modules
14.1.1. Using modules
14.1.2. Building modules
14.1.3. Where are the modules?
14.1.4. How does it work?
14.1.5. Running a module from the command line
14.2. Packages
14.2.1. Loading
14.3. Getting information on available modules and packages
14.4. Designing a module

14.1. Modules

A module is a component providing Python definitions of functions, variables or classes, all related to a specific theme. All these definitions are contained in a single Python file. Thanks to modules, you can reuse ready-made definitions in your own programs. Python also makes it simple to build your own modules.

14.1.1. Using modules

In order to use a module, just use the import statement. Let us take an example. Python comes with numerous modules, and a very useful one is the sys module (sys stands for "system"): it provides information on the context of the run and the environment of the Python interpreter. For instance, consider the following code:

#!/local/bin/python
import sys

print "arguments: ", sys.argv
	 
Say that you stored it in a file named modexa.py, and that you run it like this:
./modexa.py 1 a seq.fasta
	 
This produces the following output:
arguments:  ['./modexa.py', '1', 'a', 'seq.fasta']
	 
Explanation: By using the argv variable defined in the sys module, you can access the values provided on the command line when launching the program. As shown in this example, access to this information is made possible by:

  • importing the module through the import statement, which provides access to the module's definitions
  • using the argv variable defined in the module by a qualified name: sys.argv.

You may also select specific components from the module:

	      from sys import argv
	      print "arguments: ", argv
          
In this case, only one definition (the argv variable) is bound in your program. The sys module is still loaded in full, but its other definitions are not directly accessible.
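
The two import forms can be compared with a minimal sketch. It is written with the Python 3 print() function rather than the print statement used in this chapter, and it only suggests what to check: both forms give access to the same object, and the sys module is registered as loaded in either case.

```python
import sys
from sys import argv

# Both names refer to the very same list object.
same_object = argv is sys.argv
print("same object:", same_object)

# The module is registered in sys.modules whichever form is used.
print("sys loaded:", "sys" in sys.modules)
```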

14.1.2. Building modules

You build your own module by creating a Python file. For instance, if the file ValSeq.py contains the following code (adapted from the Biopython ValSeq module):

Example 14.1. A module

	# file ValSeq.py

	valid_sequence_dict = {
	    "P1": "complete protein", "F1": "protein fragment",
	    "DL": "linear DNA", "DC": "circular DNA",
	    "RL": "linear RNA", "RC": "circular RNA",
	    "N3": "transfer RNA", "N1": "other",
	}

	def find_valid_key(e):
	    # return the first key whose value equals e (None if there is none)
	    for key, value in valid_sequence_dict.items():
	        if value == e:
	            return key
	  

you can use it by loading it:

	import ValSeq
	  
where ValSeq is the module name. You can then access its definitions, which may be variables, functions, classes, etc.:
	>>> print ValSeq.valid_sequence_dict['RL']
	linear RNA
	>>> ValSeq.find_valid_key("linear RNA")
	'RL'
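
The whole cycle of building and then loading a module can be sketched in one script. This is only an illustration under stated assumptions: the module name ValSeqDemo and the scratch directory are hypothetical, the dictionary is shortened, and the code targets Python 3.

```python
import importlib
import os
import sys
import tempfile

# Source of a small module, a shortened copy of the ValSeq example.
SOURCE = '''
valid_sequence_dict = {"RL": "linear RNA", "DL": "linear DNA"}

def find_valid_key(e):
    for key, value in valid_sequence_dict.items():
        if value == e:
            return key
'''

# Write the module file into a scratch directory (hypothetical location).
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "ValSeqDemo.py"), "w") as handle:
    handle.write(SOURCE)

# Make the directory visible to the import machinery, then import.
sys.path.insert(0, module_dir)
importlib.invalidate_caches()
import ValSeqDemo

print(ValSeqDemo.valid_sequence_dict["RL"])     # linear RNA
print(ValSeqDemo.find_valid_key("linear RNA"))  # RL
```

The file name without its .py suffix becomes the module name used in the import statement.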
	   

14.1.3. Where are the modules?

Modules are mainly stored in files that are searched:

  • in your current working directory,
  • in PYTHONHOME, where Python has been installed,
  • in a path, i.e. a colon (':') separated list of directories, stored in the environment variable PYTHONPATH. You can check the resulting search path through the sys.path variable.
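
The search locations listed above can be inspected from a running interpreter. The following is a minimal sketch for Python 3; the variable names are illustrative, and the output depends entirely on your installation.

```python
import os
import sys

# sys.path is an ordinary list of directory names, searched in order.
for position, directory in enumerate(sys.path):
    print(position, repr(directory))

# PYTHONPATH may be unset; os.environ.get then returns None.
pythonpath = os.environ.get("PYTHONPATH")
extra_dirs = pythonpath.split(os.pathsep) if pythonpath else []
print("PYTHONPATH entries:", extra_dirs)
```

Since sys.path is a plain list, a program can also append a directory to it before importing a module stored there.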

Files may be:

  • Python files, suffixed by .py (when a file is loaded for the first time, a compiled version is stored in the corresponding .pyc file),
  • defined as C extensions,
  • built-in modules linked to the Python interpreter.
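
A short sketch can hint at which kind of module you are dealing with. It assumes the standard CPython interpreter and Python 3; the string and sys modules are chosen only as convenient examples.

```python
import string
import sys

# A module written in Python records the file it was loaded from.
module_file = getattr(string, "__file__", None)
print("string comes from:", module_file)

# Modules linked into the interpreter are listed by name here.
sys_is_builtin = "sys" in sys.builtin_module_names
print("sys is built in:", sys_is_builtin)
```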

Exercise 14.1. Locating modules

Sometimes, it is not enough to use pydoc or help. Looking at the source code can bring a better understanding, even if you should of course never use undocumented features.

Browse the directory tree PYTHONHOME/site-packages/Bio/.

14.1.4. How does it work?

When importing a module, the interpreter creates a new namespace, in which the Python code of the module's file is run. The interpreter also defines a variable (such as sys, ValSeq, ...) that refers to this new namespace, by which the namespace becomes available to your program (Figure 14.1).

Figure 14.1. Module namespace

A module is loaded only once, i.e., a second import statement will neither re-execute the code inside the module (see the Python reload function in the reference guides) nor re-create the corresponding namespace.
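
The load-once behaviour can be observed with a throwaway module whose body creates a fresh object each time it runs. This is a sketch under assumptions: the module name load_once_demo is hypothetical, and the code uses Python 3, where reload lives in importlib (in Python 2 it is a built-in function).

```python
import importlib
import os
import sys
import tempfile

# The module body creates a new object every time it is executed.
SOURCE = "token = object()\n"

module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "load_once_demo.py"), "w") as handle:
    handle.write(SOURCE)
sys.path.insert(0, module_dir)

import load_once_demo
first_token = load_once_demo.token

# Second import: the module body is not executed again.
import load_once_demo
unchanged = first_token is load_once_demo.token
print("same object after second import:", unchanged)  # True

# An explicit reload does re-execute the body.
importlib.reload(load_once_demo)
replaced = first_token is not load_once_demo.token
print("new object after reload:", replaced)            # True
```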

When selecting specific definitions from a module:

	      >>> from ValSeq import find_valid_key
	      >>> find_valid_key("linear RNA")
	      'RL'
	    
the other components stay hidden. As illustrated in Figure 14.2, no variable referring to the module's namespace is created in your program: the imported definition is simply added to the current namespace.

Figure 14.2. Loading specific components

This causes errors if your code then tries to access other definitions of the module, e.g.:

	      >>> print valid_sequence_dict['RL']
	      NameError: name 'valid_sequence_dict' is not defined
	      >>> print ValSeq.valid_sequence_dict['RL']
	      NameError: name 'ValSeq' is not defined
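
The following sketch, for Python 3 and with a hypothetical module named from_demo, illustrates what is and is not visible after a selective import. The imported function still works, because it looks up valid_sequence_dict in the namespace of its own module, while your code cannot name the dictionary directly.

```python
import os
import sys
import tempfile

SOURCE = '''
valid_sequence_dict = {"RL": "linear RNA"}

def find_valid_key(e):
    for key, value in valid_sequence_dict.items():
        if value == e:
            return key
'''

module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "from_demo.py"), "w") as handle:
    handle.write(SOURCE)
sys.path.insert(0, module_dir)

from from_demo import find_valid_key

# The function finds the dictionary in its own module's namespace.
print(find_valid_key("linear RNA"))  # RL

# The dictionary itself was not added to the current namespace.
try:
    valid_sequence_dict
except NameError as error:
    print("NameError:", error)

# The module was nevertheless loaded and registered.
print("from_demo" in sys.modules)  # True
```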
	    

You can also load "all" the components from a module, which makes them available directly into your code:

	    >>> from ValSeq import *
	    >>> find_valid_key("linear RNA")
	  
You probably did this many times in order to use the string module's definitions, right? The result of:
	    >>> from string import *
	  
is that all the definitions of the module are copied into your current namespace.

Caution

Be aware of potential name collisions: for instance, if your current namespace contains a definition of a variable called, say, count, it will be overwritten by the string module's definition of the count function.
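
A collision of this kind can be reproduced in a few lines. The sketch below is for Python 3 and uses the math module instead of string, since math defines a constant named e that is easy to clash with. A dictionary stands in for "your current namespace" so that the effect can be inspected; the variable e and its value are hypothetical user data.

```python
import math

# A namespace that already holds a variable named e.
namespace = {"e": "linear RNA"}

# Star-import into that namespace: math also defines a name e.
exec("from math import *", namespace)

# The original string has been silently replaced by math.e.
overwritten = namespace["e"] == math.e
print("e now holds math.e:", overwritten)  # True
```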

Caution

You can restrict the components being imported by an import * statement. The __all__ variable, also used for packages (Section 14.2), can explicitly list the components that should be directly accessible (see Exercise 14.4).
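
The effect of __all__ can be sketched as follows, for Python 3 and with a hypothetical module named all_demo. As a simplification, the star-import is run inside a dictionary that plays the role of the importing namespace, so that its content can be checked afterwards.

```python
import os
import sys
import tempfile

SOURCE = '''
__all__ = ["find_valid_key"]

valid_sequence_dict = {"RL": "linear RNA"}

def find_valid_key(e):
    for key, value in valid_sequence_dict.items():
        if value == e:
            return key
'''

module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "all_demo.py"), "w") as handle:
    handle.write(SOURCE)
sys.path.insert(0, module_dir)

# Star-import into a fresh namespace.
namespace = {}
exec("from all_demo import *", namespace)

print("find_valid_key" in namespace)              # True
print("valid_sequence_dict" in namespace)         # False
print(namespace["find_valid_key"]("linear RNA"))  # RL
```

Names left out of __all__ remain reachable through a qualified name after a plain import of the module.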

14.1.5. Running a module from the command line

When the file of a module is run from the command line (instead of being imported):

	    % python ValSeq.py
	  
the module does not behave like a module anymore. It is instead run within the default __main__ module (i.e., not the ValSeq module):
% python -i ValSeq.py
>>> ValSeq.find_valid_key("linear RNA")
NameError: name 'ValSeq' is not defined
>>> find_valid_key("linear RNA")
'RL'
	    
For this reason, the code executed when the module is loaded (either with import or from the command line) can be made dependent on its current name by testing this name. The current module's name is stored in a special-purpose variable, __name__:
	    if __name__ == '__main__':
	        # statements that you want to be executed only when the
	        # module is executed from the command line
	        # (not when importing the code by an import statement)
	        print find_valid_key("linear RNA")
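
Both situations can be compared within a single script. The sketch below targets Python 3 and rests on assumptions: the module name name_demo is hypothetical, and the standard runpy module is used to imitate a command-line run inside the current process instead of starting a new interpreter.

```python
import os
import runpy
import sys
import tempfile

SOURCE = '''
def find_valid_key(e):
    return {"linear RNA": "RL"}.get(e)

seen_name = __name__          # remember the name the code ran under
if __name__ == "__main__":
    result = find_valid_key("linear RNA")
'''

module_dir = tempfile.mkdtemp()
path = os.path.join(module_dir, "name_demo.py")
with open(path, "w") as handle:
    handle.write(SOURCE)
sys.path.insert(0, module_dir)

# Imported: __name__ is the module name, the guarded block is skipped.
import name_demo
print(name_demo.seen_name)           # name_demo
print(hasattr(name_demo, "result"))  # False

# Run as a script: __name__ is "__main__", the guarded block runs.
script_globals = runpy.run_path(path, run_name="__main__")
print(script_globals["seen_name"])   # __main__
print(script_globals["result"])      # RL
```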