19.2. Components

19.2.1. Software quality factors

The topics introduced in this section address some of the issues of software quality, and how Python can help in this matter.

Before going into details, let us summarize some important concepts (you can find a good and more exhaustive description in [Meyer97]). There is no absolute quality in software: depending on the context, scale, scope and goals of the program being developed, you might very well either write short pieces of code on the fly to solve a temporary problem, or spend significant effort to have your application follow an industrial design process. So, rather than a single global standard that should be applied to every program you write, there are a few quality factors to be evaluated according to the actual needs of the project. Among these factors, one usually distinguishes between internal and external factors. External quality factors are the ones that correspond to visible requirements, directly important for the user of the software, such as validity, robustness or efficiency. Internal quality factors are properties of the code itself, such as legibility or modularity. In fact, internal factors often indirectly help to achieve external quality. Some factors, such as reusability, extensibility and compatibility, which we will study more thoroughly here, belong to external quality factors in the sense that they can be required by the users of a software library or the programmers of shared source code; they are also important internal quality factors in the sense that they improve the internal quality of the source code. The aim of this chapter is mainly to describe these factors, as well as internal quality factors.

19.2.2. Large scale programming

Theoretically, in order to get a program that performs a given task and solves the problem you have specified, a basic set of constructs such as branching, repetition, expressions and data structures can be sufficient. However, the programs that you produce can become a problem in themselves, for several reasons:

  • They can become very large, resulting in thousands of lines of code in which it becomes difficult to make even a slight change (extensibility problem).
  • When an application is developed within a team, it is important for different people to be able to share the code and combine parts developed by different people (compatibility); having big and complex source files can become a problem.
  • During your life as a programmer, or within a team, you will very often have to reuse the same kinds of instruction sets: searching for an item in a collection, organizing a hierarchical data structure, converting data formats, ...; moreover, such typical blocks of code have certainly already been written elsewhere (reusability). Generally, source code that is not well designed for reuse can thus be a problem.

So, depending on the context of the project, there are some issues which are just related to the management of source code by humans, as opposed to the specification of the task to perform. And if you think about it, you probably tend to use variable names that are relevant to your data and problem, don't you? Why? This is probably not for the computer but, of course, for the human reader. So, in order to handle source structure and management issues, several conceptual and technical solutions have been designed in modern programming languages. This is the topic of this chapter.

Let us say that we have to develop source code for a big application and that we want this source code to be spread and shared among several team members, to be easy to maintain and evolve (extensible), and to be useful outside of the project for other people (reusable). What properties should source code have for this purpose?

  • it should be divided in small manageable chunks
  • these chunks should be logically divided
  • they should be easy to understand and use
  • they should be as independent as possible: you should not have to use chunk A each time you need to use chunk B
  • they should have a clear interface defining what they can do

The most important concept for obtaining these properties is called modularity, or how to build modular software components. The general idea behind software components is that an application program can be built by combining small logical building blocks. In this approach, as shown in Figure 19.1, building blocks form a kind of high-level language.

Figure 19.1. Components as a language

19.2.3. Modularity

The simplest form of modularity is actually something that you already know: writing a function to encapsulate a block of statements within a logical unit, with some form of generalization, or abstraction, through the definition of some parameters. But there are more general and elaborate forms of components, namely modules and packages.
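
As a minimal sketch of this simplest form of modularity, the function below wraps a few statements in a named unit and generalizes them through one parameter. The name gc_content and the example sequences are illustrative assumptions, not part of the course code.

```python
def gc_content(seq):
    """Return the fraction of G and C bases in a DNA sequence (0.0 if empty)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    gc = seq.count("G") + seq.count("C")
    return gc / len(seq)


# The same block of statements can now be reused on any sequence.
print(gc_content("ATGC"))   # 0.5
```

The caller only needs to know the function name, its parameter and what it returns; the statements inside can be changed without affecting the calling code.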

So, what is modularity? As developed in [Meyer97], modularity is again not a single general property, but is rather described by a few principles:

  • Few interfaces: a component must communicate with as few other components as possible. The graph of dependencies between components should be rather loosely coupled.
  • Small interfaces: whenever two components communicate, there should be as little communication as possible between them.
  • Explicit interfaces: interfaces should be explicit. Indirect coupling, in particular through shared variables, should be made explicitly public.
  • Information hiding: information in a component should generally remain private, except for elements explicitly belonging to the interface. This means that it should not be necessary to use the non-public attributes of a component in order to use it. In languages such as Python, as we will see later, it is technically difficult to hide a component's attributes. So, some care must be taken to document which attributes are public and which are private.
  • Syntactic units: Components must correspond to syntactic units of the language. In Python, this means that components should correspond to known elements such as modules, packages, classes, or functions that you use in Python statements:
    import dna
    from Bio.Seq import Seq
    dna, Bio, Bio.Seq and Seq are syntactic units, not only files, directories or blocks of statements. In fact, Python really helps in defining components: almost everything that you define in a module is a syntactic unit.
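
The principles of small, explicit interfaces and information hiding can be sketched as follows. The class Sequence and its methods are hypothetical names chosen for illustration (this is not the Bio.Seq class), and the leading underscore is only the usual Python naming convention for "private": the language does not enforce it.

```python
class Sequence:
    """Hypothetical component with a small, explicit interface."""

    def __init__(self, letters):
        # Leading underscore: private by convention only; Python does not
        # prevent client code from accessing this attribute.
        self._letters = letters.upper()

    def length(self):
        """Public: number of bases."""
        return len(self._letters)

    def complement(self):
        """Public: return a new Sequence with each base complemented."""
        pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
        return Sequence("".join(pairs[base] for base in self._letters))

    def __str__(self):
        return self._letters


s = Sequence("atgc")
print(s.length())       # 4
print(s.complement())   # TACG
```

Client code uses only length, complement and str; how the letters are stored stays an internal matter. At the module level, the __all__ list plays a similar documenting role, since it names what "from module import *" exports.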

You can view this approach as taking into account not only the user of the application, but also the programmer, as the user of an intermediate-level product. This is why interface design is also needed at the component level.

19.2.4. Methodology

These properties may be easier to obtain by choosing an appropriate design methodology. A design methodology should indeed:

  • help in defining components by successive decomposition;
  • help in defining components that are easy to combine;
  • help in designing self-understandable components: a programmer should be able to understand how to use a component by looking only at this component;
  • help in defining extensible components; the more independent components are, the easier they are to evolve; for instance, components sharing an internal data structure representation are difficult to change, because you have to modify all of them whenever you change the data structure.
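
The last point can be illustrated with a small sketch, under the assumption that a sequence record is stored as a (name, letters) tuple. All function names here are hypothetical. The client functions describe and longest never index the tuple directly; they go through three small accessor functions.

```python
# Assumed internal representation of a record: a (name, letters) tuple.
def make_record(name, letters):
    return (name, letters)

def name_of(record):
    return record[0]

def letters_of(record):
    return record[1]


# Client components: they rely only on the accessors above,
# never on the fact that a record is a tuple.
def describe(record):
    return "%s (%d bases)" % (name_of(record), len(letters_of(record)))

def longest(records):
    return max(records, key=lambda r: len(letters_of(r)))


records = [make_record("seq1", "ATG"), make_record("seq2", "ATGCCG")]
print(describe(longest(records)))   # seq2 (6 bases)
```

If the representation later becomes, say, a dictionary, only make_record, name_of and letters_of have to change; describe and longest remain untouched.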

19.2.5. Reusability

Programming is by definition a very repetitive task, and programmers have long dreamed of being able to pick general-purpose components off the shelf, relieving them of the burden of programming the same code again and again. However, this objective is still far from being reached. There are several reasons for this, both non-technical and technical. Non-technical reasons encompass organisational and psychological obstacles: although this has probably been reduced by wide access to the Web, being aware of existing software, taking the time to learn it, and accepting to use something you have not built yourself are common difficulties in reusing components. On the technical side, there are some conditions for modules to be reusable.

  1. Flexibility: One of the main difficulties in making reusable components lies in the fact that, while you have the impression of programming the same stereotyped code again, you do not really repeat exactly the same code. There are indeed slight variations that typically concern the following aspects (for instance, in a standard table lookup):
    • types: the exact data type being used may vary: the table might contain integers, strings, ...
    • data structures and algorithms may vary: the table might be implemented with an array, a dictionary, a binary search tree, ...; the comparison function in the sort procedure may also vary according to the type of the items.
    So, as you can understand from these remarks, the more flexible the component is, the more reusable it is. Flexibility can be partly obtained by modularity, as long as modules are well designed. However, in order to get real flexibility, other techniques are required, such as genericity, polymorphism, or inheritance, which are described in Section 19.5.
  2. Independence from the internal representation: by providing an interface that does not imply any specific internal data structure, the module can be used more safely. The client program will be able to use the same interface, even if the internal representation is modified.
  3. Group of related objects: it is easier to use components when all objects that should be used together (the data structures, data and algorithms) are actually grouped in the same component.
  4. Common features: common features or similar templates among different modules should be made shareable, thus making the whole set of modules more consistent.
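
The table lookup mentioned under flexibility can be sketched as follows. This is only one possible design under stated assumptions: the class name SortedTable, its methods and the key parameter are hypothetical, and the sorted lists searched with the standard bisect module are just one of the internal representations the text lists. The item type is left open, and the comparison is customized through a key function.

```python
import bisect


class SortedTable:
    """Hypothetical lookup table; items are kept sorted by key(item)."""

    def __init__(self, items=(), key=lambda item: item):
        self._key = key
        # Internal representation (private): two parallel sorted lists.
        self._items = sorted(items, key=key)
        self._keys = [key(item) for item in self._items]

    def insert(self, item):
        """Add an item, keeping the table sorted."""
        k = self._key(item)
        pos = bisect.bisect_left(self._keys, k)
        self._keys.insert(pos, k)
        self._items.insert(pos, item)

    def lookup(self, k):
        """Return an item whose key equals k, or None if there is none."""
        pos = bisect.bisect_left(self._keys, k)
        if pos < len(self._keys) and self._keys[pos] == k:
            return self._items[pos]
        return None


numbers = SortedTable([30, 10, 20])
print(numbers.lookup(20))    # 20

names = SortedTable(["Seq", "dna", "Bio"], key=str.lower)
print(names.lookup("bio"))   # Bio
```

The same component serves integers and strings, and clients only call insert and lookup, so the two lists could be replaced by a dictionary or a search tree without changing client code.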