In this session we will cover functions and the dictionary data type, and we will move from IPython Notebooks to scripts.
For this course we will use Spyder to create and run Python scripts. Open "spyder-app" from the Anaconda launcher.
Spyder is a so-called IDE (integrated development environment) that aids in the creation of Python scripts. An IDE is not required (Notepad or any other text editor will do) but, in general, IDEs use knowledge of the programming language to make a programmer's life a little easier.
Python scripts are text files that contain sequences of Python statements. A script can be executed by the Python interpreter: Python reads the script and executes its statements from top to bottom, stopping with an error message if a statement fails.
The general structure of a script is as follows:
#!/usr/bin/env python
import ...

def function1(...):
    statements

def function2(...):
    statements

def main(...):
    statements

if __name__ == "__main__":
    main(...)
To execute a script, run the following in a terminal (or DOS-box):
python myscript.py
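As an illustration of the structure above, here is a minimal sketch of a complete script. The function `greeting()` and its text are made up for this example. The last lines make sure that `main()` only runs when the file is executed as a script, and not when it is imported by another script.

```python
#!/usr/bin/env python

def greeting(name):
    # build and return a string; nothing is printed here
    return "Hello " + name

def main():
    print(greeting("World"))

if __name__ == "__main__":
    # __name__ equals "__main__" only when this file is run directly
    main()
```

Saved as myscript.py, this could be run with the command shown above.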
Functions in Python (and in programming in general) are very similar to mathematical functions. A function may accept arguments, and in Python functions always return a value (the special value None if nothing is returned explicitly). The keyword def is used to define functions in Python, and within functions the return keyword is used to return values. Create a script in Spyder that contains the following code and observe what happens if you run it using the green "play button":
#!/usr/bin/env python

def area(length, width):
    return length * width

def mean(numbers):  # this defines a function `mean()`
    mysum = sum(numbers)
    return float(mysum) / len(numbers)

def hello():
    print("Hello World")

def main():  # this defines a function `main()`
    x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(mean(x))
    print(type(mean(x)))
    myvariable = hello()
    print(type(myvariable))
    print(area(10, 15))

if __name__ == "__main__":
    main()
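The script above mixes functions that return a value with one that only prints. The following small sketch isolates that difference; the names `add()` and `show()` are made up for this illustration. A function without a return statement still returns something, namely None.

```python
def add(a, b):
    # hands the result back to the caller
    return a + b

def show(a, b):
    # prints the result, but has no return statement
    print(a + b)

result = add(2, 3)     # result now holds the returned value
nothing = show(2, 3)   # the sum is printed, but None is returned

print(result)
print(nothing)
```

Only a returned value can be stored in a variable and used later in the script, which is why `mean()` and `area()` use return rather than print.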
Dictionaries are a data type that holds more than one item. However, instead of indexing the collection by a number, dictionaries can be indexed by other data types, such as strings. The indices used for accessing items in a dictionary are called keys. The following examples illustrate the dictionary (dict, see documentation). Run the examples below in a script in Spyder. Try to stick to the general structure of scripts by putting the code into a main() function.
# an empty dictionary
a = {}
print(a)
# a dictionary, indexed by strings
a = {"one": 1, "two": 2, "three": 3}
print(a["two"])
# does a key exist? (the `in` operator)
print("one" in a)
print("four" in a)
# or like this
if "one" in a:
    print("a contains 'one'")
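Beyond the lookups above, entries can also be added, changed and removed. The sketch below shows these operations on the same dictionary, together with the `get()` method of dict, which is not used elsewhere in this session: it returns a default value when a key is missing, whereas indexing with a missing key stops the script with a KeyError.

```python
a = {"one": 1, "two": 2, "three": 3}

# add a new entry and change an existing one
a["four"] = 4
a["one"] = 100

# a["five"] would raise a KeyError, because the key does not exist;
# get() returns the given default value (here 0) instead
missing = a.get("five", 0)
print(missing)

# remove an entry
del a["two"]
print(a)
print(len(a))
```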
The following short example shows how a dictionary could be used to count the occurrences of letters in a string. The letters do not have to be known in advance for this to work.
# counting letters
mystring = "Dictionaries are a data type that holds more than one item"
mydict = {}
for letter in mystring:
    if letter in mydict:
        mydict[letter] = mydict[letter] + 1  # updates an existing entry
    else:
        mydict[letter] = 1  # creates a new entry
print(mydict)
# obtaining just the keys could be useful too
mykeys = mydict.keys()
print(mykeys)
# also, we can iterate over the keys of the dictionary
print("All keys")
for key in mydict:
    print(key)
# example
for key in mydict:
    print("the letter '" + key + "' occurred " + str(mydict[key]) + " times.")
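The counting loop above can also be wrapped in a function, following the general script structure. This is one possible sketch, not the only way to do it: the function name `count_letters()` and the test word are made up, and the if/else is replaced by the dict method `get()`, which returns 0 for a letter that has not been seen yet.

```python
#!/usr/bin/env python

def count_letters(text):
    counts = {}
    for letter in text:
        # get() returns the current count, or 0 if the letter is new
        counts[letter] = counts.get(letter, 0) + 1
    return counts

def main():
    counts = count_letters("banana")
    # sorted() gives the keys in alphabetical order
    for letter in sorted(counts):
        print("the letter '" + letter + "' occurred " + str(counts[letter]) + " times.")

if __name__ == "__main__":
    main()
```

Returning the dictionary instead of printing it inside the function makes it possible to reuse the counts elsewhere in the script.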
For this session we will be doing four bioinformatics problems (1, 2, 3 and 5) that are listed on the Rosalind website. Try to work through the problems by writing a script for each of them.
For the last problem, store the GC-content of all sequences in a dictionary and print it, rather than returning only the sequence with the highest GC-content. We have posted two example scripts online (see "course extra materials" on our website).
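To make the last task more concrete, here is a rough sketch of storing GC-content in a dictionary. It is not one of the posted example scripts. The sequence names and sequences are invented, reading the actual input file is left out, and the function `gc_content()` assumes a non-empty sequence in upper case.

```python
#!/usr/bin/env python

def gc_content(sequence):
    # percentage of G and C bases in a DNA string
    gc = sequence.count("G") + sequence.count("C")
    return 100.0 * gc / len(sequence)

def main():
    # hypothetical sequences; in the exercise these come from the input data
    sequences = {"seq_a": "GGCC", "seq_b": "ATGC", "seq_c": "ATAT"}

    # one entry per sequence: name -> GC-content
    gc_by_id = {}
    for seq_id in sequences:
        gc_by_id[seq_id] = gc_content(sequences[seq_id])

    for seq_id in gc_by_id:
        print(seq_id + ": " + str(gc_by_id[seq_id]))

if __name__ == "__main__":
    main()
```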