Chapter 9. Collections

Table of Contents

9.1. Datatypes for collections
9.2. Methods, Operators and Functions on Lists
9.3. Methods, Operators and Functions on Dictionaries
9.4. What data type for which collection

9.1. Datatypes for collections

In a previous chapter we saw that strings can be interpreted as collections of characters. But they are a very restricted sort of collection: you can only put characters in a string, and a string is always ordered. We often need to handle collections of all sorts of objects, and sometimes these collections are not even homogeneous, meaning that they may contain objects of different types. Python provides several predefined data types that can manage such collections. The two most commonly used are called lists and dictionaries. Both can handle collections of different sorts of objects, but what are their differences?

List

Lists are mutable ordered collections which may contain objects of different sorts. The objects are accessed using their position in the ordered collection.

Dictionary

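Dictionaries are mutable collections of key-value pairs which may contain objects of different sorts. The objects are accessed using a key, not a position. Before Python 3.7 the order of the pairs was arbitrary; since then dictionaries keep insertion order, but they still have no positional index.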

Here are some examples comparing a list version of a collection of restriction enzyme patterns and a dictionary version of the same collection. A list is written as a comma-separated sequence of elements enclosed in square brackets, whereas a dictionary is written as a comma-separated sequence of key-value pairs enclosed in braces, with a colon separating each key from its value.

>>> patternList = [ 'gaattc', 'ggatcc', 'aagctt' ]
>>> patternList
['gaattc', 'ggatcc', 'aagctt']

>>> patternDict = { 'EcoRI' : 'gaattc', 'BamHI' : 'ggatcc', 'HindIII' : 'aagctt' }
>>> patternDict
{'EcoRI': 'gaattc', 'BamHI': 'ggatcc', 'HindIII': 'aagctt'}
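
The two collections above hold only strings, but both datatypes accept mixed content. The following is a minimal sketch of a non-homogeneous list and dictionary; the names `enzymeRecord` and `enzymeInfo` and their contents are invented for illustration only.

```python
# A list may mix objects of different types; each is reached by position.
enzymeRecord = ['EcoRI', 'gaattc', len('gaattc'), ['g', 'a', 't', 'c']]
name = enzymeRecord[0]          # a string
siteLength = enzymeRecord[2]    # an integer
alphabet = enzymeRecord[3]      # a nested list

# A dictionary may mix types as well; each value is reached by its key.
enzymeInfo = {'name': 'EcoRI', 'pattern': 'gaattc', 'length': 6}
pattern = enzymeInfo['pattern']

print(name, siteLength, alphabet, pattern)
```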
      
To access the elements we use the position in the list and the key for the dictionary.

Important

List indices have the same properties as string indices, in particular they start with 0 (remember Figure 6.1).

>>> patternList[0]
'gaattc'
>>> patternDict['EcoRI']
'gaattc'
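
To make the parallel with string indices concrete, here is a short sketch that applies the same index expressions to a string and to a list. The variable names are chosen for this example only.

```python
pattern = 'gaattc'
patternList = ['gaattc', 'ggatcc', 'aagctt']

# Index 0 is the first element in both cases.
firstChar = pattern[0]            # 'g'
firstPattern = patternList[0]     # 'gaattc'

# Negative indices count from the end, for lists as for strings.
lastChar = pattern[-1]            # 'c'
lastPattern = patternList[-1]     # 'aagctt'

print(firstChar, firstPattern, lastChar, lastPattern)
```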
    
As with strings, you can get the number of elements in the collection with len(), as well as the smallest and the greatest element with min() and max().

>>> len(patternList)
3
>>> len(patternDict)
3
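
The smallest and greatest elements are not shown in the session above. A minimal sketch, assuming the same two collections: `min()` and `max()` compare strings alphabetically, and on a dictionary they compare the keys, not the values.

```python
patternList = ['gaattc', 'ggatcc', 'aagctt']
patternDict = {'EcoRI': 'gaattc', 'BamHI': 'ggatcc', 'HindIII': 'aagctt'}

# On a list of strings, the comparison is alphabetical.
smallestPattern = min(patternList)    # 'aagctt'
greatestPattern = max(patternList)    # 'ggatcc'

# On a dictionary, min() and max() look at the keys.
smallestKey = min(patternDict)        # 'BamHI'
greatestKey = max(patternDict)        # 'HindIII'

print(smallestPattern, greatestPattern, smallestKey, greatestKey)
```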
      
Lists can be sliced but dictionaries cannot. Remember that the elements of a dictionary are reached by key, not by position, so taking a slice of a dictionary does not make any sense.

>>> digest = patternList[:1]
>>> digest
['gaattc']
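
The sketch below shows a few more list slices and what happens when a slice is attempted on a dictionary. The exact exception raised by the dictionary depends on the Python version (a `TypeError` in older versions, a `KeyError` from Python 3.12 on), so the example catches both; the variable names are illustrative.

```python
patternList = ['gaattc', 'ggatcc', 'aagctt']
patternDict = {'EcoRI': 'gaattc', 'BamHI': 'ggatcc', 'HindIII': 'aagctt'}

# A slice of a list is a new list; the original list is left unchanged.
firstTwo = patternList[:2]    # ['gaattc', 'ggatcc']
lastTwo = patternList[1:]     # ['ggatcc', 'aagctt']

# A dictionary has no positions, so a slice cannot select anything.
try:
    patternDict[:1]
    sliceWorked = True
except (TypeError, KeyError):
    sliceWorked = False

print(firstTwo, lastTwo, sliceWorked)
```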
      
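You can ask whether an element is in the collection. For a dictionary the test applies to the keys, and there are two ways to write it. The has_key method exists only in Python 2 and was removed in Python 3; the in operator works in both.

>>> 'gaattc' in patternList
True
>>> patternDict.has_key('HindIII')   # Python 2 only
True
>>> 'HindIII' in patternDict
True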
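
Because membership in a dictionary is tested against its keys, looking for a pattern rather than an enzyme name needs one more step. A minimal sketch, using the standard `values()` method of dictionaries:

```python
patternDict = {'EcoRI': 'gaattc', 'BamHI': 'ggatcc', 'HindIII': 'aagctt'}

# 'in' on a dictionary looks at the keys only.
hasEnzyme = 'EcoRI' in patternDict                # True
patternIsKey = 'gaattc' in patternDict            # False

# To look for a value, test against values() instead.
hasPattern = 'gaattc' in patternDict.values()     # True

print(hasEnzyme, patternIsKey, hasPattern)
```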
      
Unlike strings, both collections are mutable. This means that you can remove, add or even change their elements.

>>> del patternList[0]
>>> patternList
['ggatcc', 'aagctt']
>>> patternList[0] = 'gaattc'
>>> patternList
['gaattc', 'aagctt']
>>> patternList.append('ggattc')
>>> patternList
['gaattc', 'aagctt', 'ggattc']
>>> del patternList[:2]
>>> patternList
['ggattc']

>>> del patternDict['EcoRI']
>>> patternDict
{'BamHI': 'ggatcc', 'HindIII': 'aagctt'}
>>> patternDict['EcoRI'] = 'gaattc'
>>> patternDict
{'BamHI': 'ggatcc', 'HindIII': 'aagctt', 'EcoRI': 'gaattc'}
>>> patternDict['BamHI'] = ''
>>> patternDict
{'BamHI': '', 'HindIII': 'aagctt', 'EcoRI': 'gaattc'}
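
For contrast, the following sketch shows the same kind of assignment failing on a string and succeeding on a list. Converting the string with `list()` and rebuilding it with `join()` is only one possible workaround, shown here as an illustration.

```python
pattern = 'gaattc'

# Strings are immutable: assigning to a position raises a TypeError.
try:
    pattern[0] = 'c'
    stringChanged = True
except TypeError:
    stringChanged = False

# A list of the same characters can be changed in place.
characters = list(pattern)       # ['g', 'a', 'a', 't', 't', 'c']
characters[0] = 'c'
changed = ''.join(characters)    # 'caattc'

print(stringChanged, pattern, changed)
```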
      
As with strings, you cannot access elements that do not exist.

>>> patternList[10]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>> patternDict['ScaI']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'ScaI'
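
A program does not have to stop on these errors. The sketch below, with illustrative variable names and an arbitrary default value, catches both exceptions and also shows the standard `get()` method of dictionaries, which returns a default instead of raising a `KeyError`.

```python
patternList = ['gaattc', 'ggatcc', 'aagctt']
patternDict = {'EcoRI': 'gaattc', 'BamHI': 'ggatcc', 'HindIII': 'aagctt'}

# A position outside the list raises IndexError.
try:
    pattern = patternList[10]
except IndexError:
    pattern = None

# An unknown key raises KeyError.
try:
    site = patternDict['ScaI']
except KeyError:
    site = None

# get() returns the given default when the key is missing.
siteOrDefault = patternDict.get('ScaI', 'unknown')

print(pattern, site, siteOrDefault)
```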
      

Figure 9.1 compares different actions on collections for strings, lists and dictionaries.

Figure 9.1. Comparison of some collection datatypes