19.4. Inheritance: sharing code among classes

19.4.1. Introduction

What we have seen so far, object-based programming, consists in designing programs with objects, that are built with classes. In most object-oriented programming languages, you also have a mechanism that enables classes to share code. The idea is very simple: whenever there are some commonalities between classes, why having them repeat the same code, thus leading to maintenance complexity and potential inconsistency? So the principle of this mechanism is to define some classes as being the same as other ones, with some differences.

Example 19.4. Inheritance example (1): sequences

So far, we have design two different classes for different types of sequences: indeed, having all the methods defined only for protein sequences also available for DNA sequences can be considered a problem. At the same time, some functionalities are common to both types of sequences: cleaning the sequence, producing a Fasta format to name a few. So we keep our two classes to deal with DNA and protein sequences, and we add a new class: Sequence, which will be a common class to deal with general sequence functions. In order to specify that a DNA (or a Protein) is a Sequence in Python is:

class DNA(Sequence): 
	    
The DNA class is called a subclass, and the Sequence is called a superclass or a base class. Following the class statement, there are only the definitions specific to the DNA or Protein classes: you do not need to re-specify the other ones. The following code shows how to specify the Sequence superclass.
class Sequence:

    def __init__(self, name=None, seq=None):
        self.name = name
        self.seq = seq
        self.clean()

    def clean(self):
        seq = ""
        for c in self.seq:
           if c in self.alphabet:
              seq += c
        self.seq = seq

    def __str__(self):
        return ">"+self.name+"\n"+self.seq

    def __getitem__(self, i):
        return self.seq[i]

    def getname(self):
        return self.seq

    def getseq(self):
        return self.seq


  
class DNA(Sequence): 

      alphabet = "atcg"

      def gc(self):
          """GC percent"""
          ...

      def revcompl(self):
          """reverse complement"""
          ...

      def translate(self):
          """translation into a protein"""
          ...



class Protein(Sequence):

    weight = {"A":71.08,"C":103.14 ,"D":115.09 ,"E":129.12 ,"F":147.18 ,"G":57.06 ,"H":137.15 ,"I":113.17 ,"K":128.18 ,"L":113.17 ,"M":131.21 ,"N":114.11 ,"P":97.12 ,"Q":128.41 ,"R":156.20 ,"S":87.08 ,"T":101.11,"V":99.14 ,"W":186.21 ,"Y":163.18 ,"X": 110}

    alphabet = weight.keys()

    def mw(self):
        molW = 0
        for aa in self.seq:
            molW += self.weight[aa]
     
        molW += 18.02
        molW = molW / 1000
        
        return molW
	    
Look at how we use these definitions:
>>> s = DNA("s1", "attgccctt")
>>> s.gc()
66.66
>>> print s
>s1
attgccctt
    
The __str__ method that is called when issuing a print statement is not defined for the DNA class. So python looks up one level further, finds a definition at the Sequence class definition level and calls it. Provided with the self reference to the object, exactly the same way as for DNA class methods, it can access to the attributes in order to produce a printable string of the object.

The clean method is a common method, except that it uses a specific alphabet for each type of sequence. This is why each subclass must define an alphabet class attribute. When referring to self.alphabet, python looks up in the object's namespace first, then one level up in the class namespace and finds it.

Now look carefuly at the __init__ method: it is now defined in the superclass. There are indeed common things to do for both DNA and Protein classes: attributes initialization, sequence cleaning, ... So it is enough to define the __init__ method at the superclass level: whenever the __init__ method, or any method is not defined at the subclass level, it is the method at the superclass level which is called (see below Section 19.4.1.2). But, remember, the DNA sequence is supposed to be in lowercase, whereas the protein sequence is in uppercase. How can we do that? The idea is to define an __init__ method at the subclass level that calls the __init__ method of the superclass, and then proceed to actions specific to the DNA or the Protein sequences.

class Sequence:

    def __init__(self, name=None, seq=None):
        self.name = name
        self.seq = seq
        self.clean()

...

class DNA(Sequence): 

      def __init__(self, name=None, seq=None):
        Sequence.__init__(self, name, lower(seq))

...

class Protein(Sequence):

    def __init__(self, name=None, seq=None, dna=None):
        Sequence.__init__(self, name, upper(seq))
        self.dna = dna  # when called by DNA.translate()
...
	    

19.4.1.1. Overloading

Let us decide that a generic mw method can be generalized and put at the Sequence class level. This way, a potential RNA subclas of Sequence will have a pre-defined mw method, common to all the molecular sequences. We do not need any mw method at the Protein level anymore. But, since the computation of the molecular weight differs slightly for DNA sequences, we define a specific mw method in the DNA class.

class Sequence:

    def mw(self):
        molW = 0
        for c in self.seq:
            molW += self.weight[c]
     
        return molW

...

class DNA(Sequence): 

    def mw(self):
        """
        Computes the molecular weight of the 2 strands of a DNA sequence.
        """
        molW = 0
        for c in self.seq:
            molW += self.weight[c]

        for c in self.revcompl():
            molW += self.weight[c]
     
        return molW

...
	    
When a method is redefined (or overriden) in the subclass, which is the case of the mw method, it is said that the method is overloaded. This means that, according to the actual class of a sequence instance, which can be either DNA or Protein, the method actually called can vary.

Overloading

Overloading a method is to redefine at a subclass level a method that exists in upper classes of a class hierarchy.

Another term that is used in object-oriented programming is the term: "polymorphism". "Polymorphism" litteraly means: several forms. In other words, the name mw has several meanings, depending on the context of the call.

Polymorphism

A method is said to be polymorphic if it has several definitions at different levels of a class hierarchy.

The term "polymorphism" is also used about operators. The '+' operator, for instance, is polymorphic in the sense that it refers to different operations when used with strings and integers.

Exercise 19.3. A ConstantPoint class

Define a ConstantPoint subclass of you Point class, such as points created as ConstantPoint cannot be moved.

19.4.1.2. How does it work? Dynamic binding.

As we said above, by declaring DNA as a subclass of Sequence, you do not need to re-specify in DNA all the methods that are already defined in Sequence. The way it works is that Python looks up methods starting from the actual class of the object, for instance DNa and, if not found, by looking up further in Sequence super class for the needed method or attribute. So, when calling:

>>> my_dna.gc()
      
python looks up for the gc method (Figure 19.3). Python finds it in the current class. When looking up for other methods, such as the __str__ method, which is not defined in the DNA class, Python follows the graph of base classes (Figure 19.4). Here, we only have one: Sequence, where __str__ is defined.

Figure 19.3. Dynamic binding (1)

Figure 19.4. Dynamic binding (2)

Forms of inheritance.  There are two forms of inheritance: extension and specialisation. In other words, inheritance can be used to extend a base class adding functionalities, or to specialise a base class. In the case of the DNA class, it is rather an extension. Sometimes, the term subclass is criticized because, in the case of extension, you actually provide more services, rather than a subset. The term subclass fits better to the case of specialization, where the subclass addresses a specific subset of the base class potential objects (for instance, dogs are a subset of mammals).

UML diagrams.  Classes relationships (as well as instances relationships, not represented here) can be represented by so-called UML diagrams, as illustrated in Figure 19.5.

Figure 19.5. UML diagram for inheritance

19.4.1.3. Multiple inheritance

In Python, as in several programming language, you can have a class inherit from several base classes. Normally, this happens when you need to mix very different functionalities. For instance, you want to wrap your class with a class that provides tools for testing (Figure 19.6), or services for persistence. Inheriting from classes that come from the same hierarchy can be tricky, in particular if methods are defined everywhere: you will have to know how the interpretor choose a path in the classes graph. But this case is more like a theoretical limit case, that should not happen in a well designed program.

Figure 19.6. Multiple Inheritance

19.4.2. Discussion

Benefits of inheritance. 

  • Regarding flexibility, the inheritance mechanism provides a convenient way to define variants at the method level: with inheritance indeed, the methods become parameters, since you can redefine any of them. So you get more than just the possibility of changing a parameter value.
  • Regarding reusability, inheritance is very efficient, since the objective is to reuse a code which is already defined. Components designed with the idea of reuse in mind very often have a clean inheritance hierarchy in order to provide a way for programmers to adapt the component to their own need.
  • Inheritance also provides an elegant extensibility mechanism, by definition. It lets you extend a class without changing the code of this class, or the code of the module containing the class.

Combining class or combining objects?  Using inheritance is not mandatory. The main risk of using it too much is to get a complex set of classes having a lot of dependencies and that are difficult to understand. There are generally two possibilities to get things done in object-oriented programming:

  • Inheritance: you combine classes in order to get a "rich" class providing a set of services coming from each combined class.
  • Composition: you combine objects from different classes.
The use of composition instead of inheritance is also illustrated by the design patterns from [Gamma95], that are introduced in Section 19.6.

Problem with inheritance for extension.  When using inheritance to extend a base class, you might want to have a method in the subclass not just overloading the method in the base class, but as a complement. In this case, one usually first calls the base class, and then performs the complementary stuff. In Python, you have to know explicitely the name of the superclass to perform this call (see for instance method __init__). Be aware, that this can become rather tricky sometimes, for you have to design a real protocol describing the sequence of calls that have to be done among classes, when not only one method is involved.

When using inheritance or composition: summary.  The question of choosing between inheritance and composition to combine objects A and B results in deciding whether A is-a B or whether A has-a B. Unfortunately, it is not always possible to decide about this, only on the basis of the nature of A and B. There a few guidelines, though (see [Harrison97], chapter 2).

  • The main advantage of inheritance over composition is that method binding, e.g lookup and actual method call, is done automatically: you do not have to perform the method lookup and call, whereas, when combining objects, you have to know which one has the appropriate method. For instance, a Protein instance may have a dna if created by translate.
  • Use composition when you catch yourself making exceptions to the is-a rule and redefining several inherited methods or willing to protect access to the base class attributes and/or methods. In such a case, the advantage described in the previous item of having automatic attribute and method access becomes a problem.
  • Use composition when the relationships between objects are dynamic. For instance, a different way to design sequence tools, such as clean, __str__, etc... could be to design a SeqTools class:

    class SeqTools:
    
         def __init__(self, seqobject):
             self.seqobject = seqobject
    
         del clean(self):
             seq = ""
             for c in self.seqobject.seq:
                  if c in self.seqobject.alphabet:
                     seq = seq + c
             self.seqobject.seq = seq
    
        def __getitem__(self, i):
            return self.seqobject.seq[i]
    
        def __str__(self):
            return ">"+self.seqobject.name+"\n"+self.seqobject.seq
    
    ...
    	  
    The DNA or Protein instances can then have a reference to a SeqTools instance, created at instantiation:
    class DNA:
    
          def __init__(self, name=None, seq=None):
            self.toolbox = SeqTools(self)
            self.name = name
            self.seq = seq
            self.toolbox.clean()
    
        def __str__(self):
            return self.toolbox.__str__()
    
        def __getitem__(self, i):
            return self.toolbox.__getitem__(i)
    
    	  
    When a program then calls the usual DNA methods, such as:
    >>> s = DNA("s1", "attgccctt")
    >>> print s
    >s1
    attgccctt
    >>> s[0]
    'a'
    	    
    it first invokes the DNA methods, which just behaves as wrappers for the SeqTools methods.

    The main advantage is flexibility: you could more easily redefine which sequence toolbox to use, provided the public interface is the same, than with inheritance. For this purpose, you could provide a toolbox constructor, where the class to be used for the tools could be easily changed at runtime:

    class DNA:
    
          def __init__(self, name=None, seq=None):
            self.toolbox = build_seqtools(self)
            self.name = name
            self.seq = seq
            self.toolbox.clean()
    
    def build_toolbox(seqobject, toolbox=SeqTools):
         return toolbox(seqobject)
    	  
    Indeed, inheritance relationships are fixed at definition time:
    class DNA(Sequence): 
    ...
    	  

  • Use composition when a single object must have more than one part of a given type.
  • Use composition to avoid deep inheritance hierarchies.
  • If you can't decide between inheritance and composition, take composition.
  • Do not use inheritance when you get too many combined method calls between base class and sub-classes, which can happen when using inheritance for extension.
  • Use inheritance when you want to build an abstract framework, which purpose is to be specialized for various uses (see Exercise 19.4). A good example is the parsing framework in Biopython, that lets you create as many new parsers as needed. Inheritance also provides a good mechanism to design several layers of abstraction, that define interfaces that programmers must follow when developping their components. Bioperl modules, although not developped in a true object-oriented language, are a good example of this.

    Exercise 19.4. Example of an abstract framework: Enzyme parser

    Design a parser for the Enzyme database, using Biopython parsing framework. See ??? for more details.