16.3. Writing to files

Let us continue our restriction site example. Now that we have the enzyme patterns from the ReBase database, we can search our sequence for all of these patterns using the findpos function (Example 11.2). There is only one restriction: at the moment, findpos can only find exact restriction patterns, so we have to exclude all patterns containing ambiguous bases.

INPUT: a dictionary enz_dict containing all restriction site patterns, accessible by enzyme name, and a sequence seq to search in

OUTPUT: for each pattern in the dictionary, a list of the start positions of every occurrence.

The print_matches function in the following listing prints the results of the analysis on the screen.

seq = ""

def isexact(pat):
    for c in pat.upper():
        if c not in 'ATGC':
            return 0
    return 1

def findpos(seq, pat):
    matches = []
    current_match = seq.find(pat)
    while current_match != -1:
        matches.append(current_match)
        current_match = seq.find(pat, current_match + 1)
    return matches


def print_matches(enz, matches):
    if matches:
        print "Enzyme %s matches at:" % enz,
        for m in matches:
            print m,
        print
    else:
        print "No match found for enzyme %s." % enz
        
for enzname in enz_dict.keys():
    pat = enz_dict[enzname]
    if isexact(pat):
        print_matches(enzname, findpos(seq, pat))
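
To make the behaviour of isexact and findpos concrete, here is a small worked sketch. The three-entry enz_dict and the short sequence are made-up toy data for illustration only (the real dictionary comes from the ReBase file), and the results are collected in a dictionary named results, which is not part of the listing above. Each message is passed to print as a single parenthesised string so that the sketch also runs on newer interpreters.

```python
def isexact(pat):
    # Same test as in the listing: only A, T, G and C are allowed.
    for c in pat.upper():
        if c not in 'ATGC':
            return 0
    return 1

def findpos(seq, pat):
    # Same search as in the listing: restart one base after each match,
    # so overlapping occurrences are reported too.
    matches = []
    current_match = seq.find(pat)
    while current_match != -1:
        matches.append(current_match)
        current_match = seq.find(pat, current_match + 1)
    return matches

# Toy data (hypothetical): two exact patterns and one ambiguous pattern.
enz_dict = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HinfI": "GANTC"}
seq = "CCGAATTCAAGGATCCTTGAATTCGG"

results = {}
for enzname in sorted(enz_dict.keys()):
    pat = enz_dict[enzname]
    if isexact(pat):          # HinfI contains N and is skipped
        results[enzname] = findpos(seq, pat)

for enzname in sorted(results.keys()):
    print("%s: %s" % (enzname, results[enzname]))
# BamHI: [10]
# EcoRI: [2, 18]

overlapping = findpos("AAAA", "AA")   # [0, 1, 2]
```

Positions are counted from 0, as with every Python string index.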


To store the results permanently, we will now see how to write the information to a file. As for reading, we have to open the file, although this time in writing mode, write our results to it, and then close it.

def print_matches(enz, matches):
    ofh = open("rebase.res", "w")
    if matches:
        print >>ofh, "Enzyme %s matches at:" % enz,
        for m in matches:
            print >>ofh, m,
        print >>ofh
    else:
        print >>ofh, "No match found for enzyme %s." % enz
    ofh.close()

The problem with this print_matches function is that the file ends up containing only the result of the last enzyme. Because the function opens the file in "w" mode on every call and closes it after writing, each new call overwrites the result of the previous one. There are two ways to solve this. First, we can open the file in append mode, so that each result is added at the end of the file. Second, we can open the file for writing in the main part of the program, pass the file object as an argument to print_matches, and close the file only when all results have been written. We prefer the second solution.
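
For comparison, the first possibility (append mode) could be sketched as follows. This is only an illustration under some assumptions: the function name append_matches and its filename parameter are invented here, the output goes to a temporary directory rather than to the working directory, and the write method is used instead of the print statement so that the sketch also runs on newer interpreters.

```python
import os
import tempfile

def append_matches(filename, enz, matches):
    # "a" opens the file for appending: existing content is kept
    # and new text is added at the end.
    ofh = open(filename, "a")
    if matches:
        positions = " ".join([str(m) for m in matches])
        ofh.write("Enzyme %s matches at: %s\n" % (enz, positions))
    else:
        ofh.write("No match found for enzyme %s.\n" % enz)
    ofh.close()

# Hypothetical results, written to a scratch file.
resfile = os.path.join(tempfile.mkdtemp(), "rebase.res")
append_matches(resfile, "EcoRI", [2, 18])
append_matches(resfile, "BamHI", [])

ifh = open(resfile)
lines = ifh.readlines()
ifh.close()
# Both results are now in the file, in the order they were written.
```

Note that append mode also keeps whatever a previous run of the program left in the file, and that the file is reopened once per enzyme. Both points argue for the second solution.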

seq = ""

def isexact(pat):
    for c in pat.upper():
        if c not in 'ATGC':
            return 0
    return 1

def findpos(seq, pat):
    matches = []
    current_match = seq.find(pat)
    while current_match != -1:
        matches.append(current_match)
        current_match = seq.find(pat, current_match + 1)
    return matches
				
def print_matches(ofh, enz, matches):
    if matches:
        print >>ofh, "Enzyme %s matches at:" % enz,
        for m in matches:
            print >>ofh, m,
        print >>ofh
    else:
        print >>ofh, "No match found for enzyme %s." % enz

ofh = open("rebase.res", "w")       
for enzname in enz_dict.keys():
    pat = enz_dict[enzname]
    if isexact(pat):
        print_matches(ofh, enzname, findpos(seq, pat))
ofh.close()

Although it is also possible to use the write or writelines methods of the file object, the example above shows how to pass a file object to the print statement. Which syntax you use in your own code is a matter of taste, but the code can become difficult to read if you mix them.
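
As a rough sketch of the alternative, the same report could be produced with the write and writelines methods. The function name write_matches and the scratch file are assumptions made for this illustration. Unlike the print statement, neither method adds spaces or a newline, so both have to be written explicitly.

```python
import os
import tempfile

def write_matches(ofh, enz, matches):
    if matches:
        ofh.write("Enzyme %s matches at:" % enz)
        # writelines writes each string of the list as is,
        # without adding any separator or newline.
        ofh.writelines([" %d" % m for m in matches])
        ofh.write("\n")
    else:
        ofh.write("No match found for enzyme %s.\n" % enz)

# Hypothetical results, written to a scratch file.
resfile = os.path.join(tempfile.mkdtemp(), "rebase.res")
ofh = open(resfile, "w")
write_matches(ofh, "EcoRI", [2, 18])
write_matches(ofh, "BamHI", [])
ofh.close()

ifh = open(resfile)
content = ifh.read()
ifh.close()
```

The file is opened once and closed once, exactly as in the preferred solution above. Only the way each line is written differs.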