18.3. Defining classes in Python

So, how do we actually define classes? As an example, let us describe a DNA class. The program listed in Example 18.1 defines the DNA class introduced in Figure 18.2. A class definition is composed of two main parts: a header, which provides the name of the class, and a body, which is a list of definitions, mainly method definitions but sometimes also assignments (see Section 18.5.1).

Example 18.1. DNA, a class for DNA sequences (first version)

Let us look at a first version of our DNA class, with a limited set of methods. A complete version will be presented later on.
class DNA:                                                                (1)
    
    def __init__(self, name=None, seq=None):                              (2)
        self.name = name
        self.seq = seq.lower()                                            (3)
 
    def gc(self):
        count_c = self.seq.count('c')                                     (4)
        count_g = self.seq.count('g')
        return float(count_c + count_g) / len(self.seq)

    def setname(self, name):                                              (5)
        self.name = name
	      
(1) This statement declares and creates DNA as a class.

(2) The __init__ method is automatically called at instance creation (see below).

(3) Initialization of the instance attributes (name and seq).

(4) This method defines how to compute the GC content of the sequence, as a fraction between 0 and 1.

(5) This method changes the name of the sequence.

The self parameter represents the object itself. You could of course use any other name, such as carrot or ego, but this would not help others read your code. So self appears in the method definitions of a class each time a reference to the object instance is needed.

Let us first look at one of these method definitions, the gc method, which computes the GC content of the sequence (as a fraction of its length):

    
    def gc(self):
        count_c = self.seq.count('c')
        count_g = self.seq.count('g')
        return float(count_c + count_g) / len(self.seq)
	  
Method definitions follow exactly the same pattern as standard function definitions, except that they must declare a first parameter (here: self) that refers to the instance. Indeed, a reference to the instance is required for the statements within a method to access the attributes of the current instance: here, access to the seq attribute is needed in order to perform the count. Python automatically passes the instance reference as the first argument of a method call, so it is bound to the first parameter, self. You do not need to supply this argument yourself. In fact, calling:
>>> s2.gc()
0.66
 	
is equivalent to:
>>> DNA.gc(s2)
0.66
 	
The interpreter finds which class s2 belongs to and passes the instance reference argument automatically.
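The equivalence of the two call forms can be checked directly. The following is a minimal sketch, written for Python 3 and limited to the gc method; it assumes the string method seq.lower() in place of the lower function used in the listing.

```python
class DNA:
    def __init__(self, name=None, seq=None):
        self.name = name
        # Assumption: the str.lower() method replaces the lower function.
        self.seq = seq.lower()

    def gc(self):
        count_c = self.seq.count('c')
        count_g = self.seq.count('g')
        return float(count_c + count_g) / len(self.seq)

s2 = DNA(name='seq2',
         seq='acaagatgccattgtcccccggcctcctgctgctgctgctctccggggcca')

# Call through the instance: Python supplies s2 as the self argument.
via_instance = s2.gc()
# Call through the class: the instance is passed explicitly.
via_class = DNA.gc(s2)

print(via_instance == via_class)   # True
```

Both calls run the same function object; only the way the instance reaches the self parameter differs.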

How does the method compute its result? For this, it needs to access the character sequence of the DNA object. This is done through the seq attribute, which was defined at instantiation (i.e. by the __init__ method, see below). Within the method, the attribute is reached through the object with the dot operator: self.seq. This shows that the attributes of an object remain available as long as the object itself exists. Attributes are thus accessible from all the methods of the class, and they are a way to share data among methods. The method also uses local variables, count_c and count_g, to hold intermediate results. These variables have a local scope, restricted to the method's namespace, exactly like local variables defined in functions.
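The difference between attributes and local variables can be observed from outside the object. In this sketch (Python 3), the length method is a hypothetical addition, present only to show a second method reading the same attribute; seq.lower() is assumed in place of the lower function.

```python
class DNA:
    def __init__(self, name=None, seq=None):
        self.name = name
        self.seq = seq.lower()

    def gc(self):
        # count_c and count_g are local: they vanish when gc returns.
        count_c = self.seq.count('c')
        count_g = self.seq.count('g')
        return float(count_c + count_g) / len(self.seq)

    def length(self):
        # Hypothetical extra method: it reads the same seq attribute as gc.
        return len(self.seq)

s = DNA(name='demo', seq='ACGT')
print(s.gc())                  # 0.5
print(s.length())              # 4
print(hasattr(s, 'seq'))       # True: seq is stored on the object
print(hasattr(s, 'count_c'))   # False: count_c only existed inside gc
```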

Let us now look at the __init__ method.

    def __init__(self, name=None, seq=None):
        self.name = name
        self.seq = lower(seq)
	
This is a special method which, when defined, is called at class instantiation, e.g. when you run the following statement:
>>> s2 = DNA(name='seq2', seq='acaagatgccattgtcccccggcctcctgctgctgctgctctccggggcca')
the __init__ method defined for the DNA class is in fact called with three arguments. As for the other methods, the self argument is automatically provided by Python as a reference to the newly created instance. You do not have to provide an __init__ method, but it is usually a good place to put initialization statements. Initial values for attributes can be passed as arguments and assigned to attributes. A good practice is to give these parameters default values, such as None. Notice also that the seq attribute is converted to lowercase at initialization: the other methods thus do not have to check the case in order to perform their computation.
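One point deserves care when None is the default value: None cannot be converted to lowercase. The following sketch (Python 3) shows one possible way to handle it; the guard on None is an assumption added for illustration and is not part of the listing.

```python
class DNA:
    def __init__(self, name=None, seq=None):
        self.name = name
        # Assumption: keep None when no sequence is given,
        # because None has no lower() method.
        self.seq = seq.lower() if seq is not None else None

# __init__ receives three arguments: the new instance, 'seq2' and the string.
s2 = DNA(name='seq2', seq='ACAAGATGCC')
print(s2.name)   # seq2
print(s2.seq)    # acaagatgcc

# With no arguments, both attributes take their default value.
empty = DNA()
print(empty.name, empty.seq)   # None None
```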

Attributes can of course be changed at any time, not only in the __init__ method. When called with a string argument standing for a new name, the setname method changes the name attribute of our object:

    
    def setname(self, name):
        self.name = name
	  
The body of this method is quite straightforward: it consists of a single statement, which assigns the value passed as an argument to the name attribute.
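As a short illustration (Python 3, with a class reduced to the two methods needed here, and seq.lower() assumed in place of the lower function), the name can be changed through setname or by a direct assignment to the attribute:

```python
class DNA:
    def __init__(self, name=None, seq=None):
        self.name = name
        self.seq = seq.lower()

    def setname(self, name):
        self.name = name

s = DNA(name='seq2', seq='acgt')
print(s.name)          # seq2

# Change the attribute through the method.
s.setname('renamed')
print(s.name)          # renamed

# A direct assignment has the same effect on the object.
s.name = 'direct'
print(s.name)          # direct
```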

Exercise 18.1. A Point class

Write the definition of a Point class. Objects of this class should have a show() method to display the coordinates of the point, and a move() method to change these coordinates. The following Python code provides an example of the expected behaviour of objects belonging to this class:

>>> p1 = Point(2,3)
>>> p2 = Point(3,3)
>>> p1.show()
(2, 3)
>>> p2.show()
(3, 3)
>>> p1.move(10, -10)
>>> p1.show()
(12, -7)
>>> p2.show()
(3, 3)
	  

Solution 18.1

The following is the complete definition of the DNA class, with two methods added: one computes the reverse complement, the other computes the translated protein sequence.

Example 18.2. DNA, a class for DNA sequences (complete version)

class DNA:
    
    def __init__(self, name=None, seq=None):
        self.name = name
        self.seq = seq.lower()
 
    def gc(self):
        count_c = self.seq.count('c')
        count_g = self.seq.count('g')
        return float(count_c + count_g) / len(self.seq)

    def revcompl(self):
        revseq = ''
        for c in self.seq:
            revseq = c + revseq                                           (1)

        revcompseq = ''
        for base in revseq:
            if base == 'a':
                revcompseq += 't'
            elif base == 't':
                revcompseq += 'a'
            elif base == 'c':
                revcompseq += 'g'
            elif base == 'g':
                revcompseq += 'c'

        return revcompseq

    def translate(self, frame=0):                                         (2)
        """
        frame: 0, 1, 2, -1, -2, -3
        """
        if frame < 0 :
            seq = self.revcompl()
            frame = abs(frame) - 1
        else:
            seq = self.seq

        if frame > 2:
            return ''

        protseq = ''

        for i in range(frame,len(seq) - 2,3):
            codon = seq[i:i+3]
            protseq += Standard_Genetic_Code[codon]                       (3)

        return protseq

    def setname(self, name):
        self.name = name
	      
(1) This method defines how to compute the reverse complement of the sequence.

(2) This method defines how to translate the DNA sequence into a protein sequence.

(3) The Standard_Genetic_Code dictionary is defined elsewhere.

So, let us now look at the translate method.

    def translate(self, frame=0):
        """
        frame: 0, 1, 2, -1, -2, -3
        """
        if frame < 0 :
            seq = self.revcompl()
            frame = abs(frame) - 1
        else:
            seq = self.seq

        if frame > 2:
            return ''

        protseq = ''

        for i in range(frame,len(seq) - 2,3):
            codon = seq[i:i+3]
            protseq += Standard_Genetic_Code[codon]

        return protseq
	  
To call this method, you can provide the frame (it defaults to 0):
>>> s2.translate(0)
TRCHCPPASCCCCSPGP
	
Indeed, this method declares a frame parameter. It thus has two parameters: self and frame. As for the gc method, the first argument is not specified at call time, only the remaining ones.
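The different ways of supplying the frame can be sketched as follows (Python 3). Two simplifications are assumptions made for brevity: the Standard_Genetic_Code dictionary below is a stand-in holding only the codons of the demo sequence, and translate handles forward frames only.

```python
# Hypothetical stand-in for Standard_Genetic_Code:
# only the codons used by the demo sequence.
Standard_Genetic_Code = {
    'atg': 'M', 'gcc': 'A', 'taa': '*',
    'tgg': 'W', 'cct': 'P',
}

class DNA:
    def __init__(self, name=None, seq=None):
        self.name = name
        self.seq = seq.lower()

    def translate(self, frame=0):
        # Forward frames only, to keep the sketch short.
        protseq = ''
        for i in range(frame, len(self.seq) - 2, 3):
            protseq += Standard_Genetic_Code[self.seq[i:i+3]]
        return protseq

s = DNA(name='demo', seq='atggcctaa')
print(s.translate())          # MA*  (frame takes its default value, 0)
print(s.translate(1))         # WP   (frame passed by position)
print(s.translate(frame=1))   # WP   (frame passed by keyword)
print(DNA.translate(s, 1))    # WP   (self passed explicitly)
```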

The translate method needs to call another method of the class: revcompl, which computes the reverse complement. Notice that this method is called without any argument. None is necessary, because it is called on the object itself, referenced by the variable self with the dot operator:

self.revcompl()
	
revcompl returns a character string, which translate uses to compute the translation in a negative frame.
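The interplay between the two methods can be tried on a short sequence. This sketch (Python 3) makes two assumptions: revcompl is written with str.maketrans and slicing, an alternative to the character-by-character loop of Example 18.2, and Standard_Genetic_Code is a stand-in limited to the codons needed here.

```python
# Hypothetical stand-in for Standard_Genetic_Code (demo codons only).
Standard_Genetic_Code = {
    'atg': 'M', 'gcc': 'A', 'taa': '*',
    'tta': 'L', 'ggc': 'G', 'cat': 'H',
}

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans('acgt', 'tgca')

class DNA:
    def __init__(self, name=None, seq=None):
        self.name = name
        self.seq = seq.lower()

    def revcompl(self):
        # Complement every base, then reverse the string.
        # Unlike the loop version, unknown characters are kept unchanged.
        return self.seq.translate(COMPLEMENT)[::-1]

    def translate(self, frame=0):
        if frame < 0:
            # No argument needed: revcompl is called on the object itself.
            seq = self.revcompl()
            frame = abs(frame) - 1
        else:
            seq = self.seq
        if frame > 2:
            return ''
        protseq = ''
        for i in range(frame, len(seq) - 2, 3):
            protseq += Standard_Genetic_Code[seq[i:i+3]]
        return protseq

s = DNA(name='demo', seq='atggcctaa')
print(s.revcompl())      # ttaggccat
print(s.translate(0))    # MA*
print(s.translate(-1))   # LGH  (frame 0 of the reverse complement)
```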