## Review exercises (10 minutes)
```python
a = 1; b = 2; c = 2.0; d = 'abcdefg'; e = '1'
1 / 2
a / b
a / c
float(a) / b
a / 'b'
a + b
d + e
d + 'e'
if 'd' in d:
    print('found d')
if d in 'd':
    print('found abcdefg')
else:
    print('the variable d was not in the string "d"')
```
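The division results above depend on the Python version. In Python 2, `/` between two integers truncates (so `1 / 2` gives `0`), which is presumably why `float(a) / b` appears in the exercise. In Python 3, `/` always returns a float and `//` performs floor division. A minimal Python 3 sketch of the division and `in` checks, reusing the exercise's values:

```python
a, b, c = 1, 2, 2.0
d = 'abcdefg'

# In Python 3, / always produces a float, even for two ints
true_div = a / b        # 0.5
# // is floor division; with two ints the result stays an int
floor_div = a // b      # 0
# int / float -> float
mixed = a / c           # 0.5

# 'in' on strings tests for a substring, so the order of operands matters
d_in_string = 'd' in d          # True: 'd' is a substring of 'abcdefg'
string_in_d = d in 'd'          # False: 'abcdefg' is not a substring of 'd'

print(true_div, floor_div, mixed, d_in_string, string_in_d)
```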
## Exercises 4: slices (20 minutes)
```python
x[0]
x[2:4]
x[4]
x[:]
x[:-3]
x[-2:-1]
x[-5:4]
x[-5:-9]
x[1:]
```
Try some values of your own to see if you can work out how negative indices and omitted start and end arguments work.
Store `'gene_1_UAUCCUA_0.3'` as a variable and write a slice notation to retrieve the third character (the 'two-eth' character; remember that the first item is item number 0). This should give back `'n'`.
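A short sketch of indexing and slicing on the gene string, assuming it is stored as `x` (the name later exercises use). It shows both the index and the one-character slice that return `'n'`, plus how negative indices and omitted start/end arguments behave:

```python
x = 'gene_1_UAUCCUA_0.3'

third_char = x[2]        # index 2 is the third character: 'n'
third_slice = x[2:3]     # a one-character slice: also 'n'

# Negative indices count from the end: -1 is the last character
last_char = x[-1]        # '3'
# Omitting the start means "from the beginning", omitting the end means "to the end"
first_four = x[:4]       # 'gene'
tail = x[-3:]            # '0.3'
# A slice whose start lies after its end is empty rather than an error
empty = x[-5:-9]         # ''

print(third_char, third_slice, last_char, first_four, tail, repr(empty))
```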
## Exercises 5: lists and splitting things (20-30 minutes)
Using the gene string from Exercise 4 stored as `x`, try this command: `x2 = x.split('_')`.
What does `x2` look like when you print it? Note the `[` and `]` symbols and the commas, as well as the extra quotation marks. These features denote that `x2` is a list (that's what the brackets at either end denote) composed of several strings (commas delimit the individual elements of the list). The `split` command that created this list from a string will be covered in part 5.
Using `x` and `x2`, try these commands to compare the slice notation properties of a list with the slice notation properties of a string.
```python
type(x)
type(x2)
x[0]
x2[0]
x[0:2]
x2[0:2]
x[1:4][1:3]
x2[1:4][1:3]
type(x2[1:3])
type(x2[1:3][1])
x2[1:3][1][3]
x[1:3][1][3]
```
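A sketch of why the last two commands behave differently, assuming `x = 'gene_1_UAUCCUA_0.3'`. Slicing a list returns a list and indexing it returns one element (here a string), so the chain on `x2` ends on a character, while the same chain on `x` runs out of characters. The `try`/`except` only keeps the expected error from stopping the script:

```python
x = 'gene_1_UAUCCUA_0.3'
x2 = x.split('_')           # ['gene', '1', 'UAUCCUA', '0.3']

# Slicing a list returns a list; indexing a list returns one element (a str here)
sub_list = x2[1:3]          # ['1', 'UAUCCUA']
element = x2[1:3][1]        # 'UAUCCUA'
letter = x2[1:3][1][3]      # 'C', the fourth character of 'UAUCCUA'

# The same chain on the string: x[1:3] is 'en', x[1:3][1] is 'n',
# and the one-character string 'n' has no index 3
try:
    x[1:3][1][3]
except IndexError as err:
    print('IndexError:', err)

print(type(sub_list), type(element), letter)
```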
`list` variable we created (`x2`).

The `split` method can be generalized as follows: `some_list = 'some_string'.split(some_delimiter)`. There are two inputs to this `split` statement (the string that needs splitting and the delimiter to split on) and one output (the resulting list). Try the following commands to familiarize yourself with how `split` breaks a string into a list.
```python
x = 'abcdefgh'.split('c')
print(x)
print('abcdefgh'.split('r'))
a = 'hi, my name is Bob, this is Una'
print(a.split(','))
z = a.split(' ')
print(z)
print(z[1])
l = a.split()  # a very useful trait of split: look up what happens when you split with empty parentheses
print(l)
caveman = a.split(' is ')  # notice that multiple characters can be used as the delimiter
print(caveman)
```
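The difference between `split(' ')` and `split()` only shows up when spaces repeat or sit at the ends of the string. A small sketch using a made-up, hypothetical string `messy`:

```python
messy = '  gene1   gene2 gene3 '

# Splitting on a single space keeps an empty string for every extra space
by_space = messy.split(' ')
# ['', '', 'gene1', '', '', 'gene2', 'gene3', '']

# With no argument, split treats any run of whitespace (spaces, tabs,
# newlines) as one delimiter and ignores leading/trailing whitespace
by_whitespace = messy.split()
# ['gene1', 'gene2', 'gene3']

# A delimiter that never occurs gives back a one-element list
no_match = 'abcdefgh'.split('r')
# ['abcdefgh']

print(by_space)
print(by_whitespace)
print(no_match)
```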
```python
'ACCGCGU,LLMNAQR,2.4'
'gene1 gene2 gene3'
'gene1, gene2, gene3'
```
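Assuming the goal with these three strings is to break each into its individual fields, one possible sketch follows. The main point is that the delimiter should match exactly what separates the fields, including any space after the comma:

```python
record = 'ACCGCGU,LLMNAQR,2.4'
spaced = 'gene1 gene2 gene3'
comma_spaced = 'gene1, gene2, gene3'

fields = record.split(',')               # ['ACCGCGU', 'LLMNAQR', '2.4']
genes_a = spaced.split()                 # ['gene1', 'gene2', 'gene3']
# Splitting on ',' alone leaves a leading space on 'gene2' and 'gene3'
genes_b_rough = comma_spaced.split(',')  # ['gene1', ' gene2', ' gene3']
# The two-character delimiter ', ' gives clean names
genes_b = comma_spaced.split(', ')       # ['gene1', 'gene2', 'gene3']

print(fields, genes_a, genes_b_rough, genes_b)
```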
## 15-minute break
## Exercises 6: more properties of lists and 'methods' like split (20-30 minutes)
```python
x = [1, 2, 3]
y = 'ABCDE'
x[1] = 'ab'
print(x)
x[1:3] = 'R'
print(x)
y[1] = 'L'
print(y)
y = y[0:1] + 'hello' + y[2:]
print(y)
```
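A minimal sketch contrasting list and string mutability with the exercise's values. Lists allow item and slice assignment; strings do not, so a "changed" string has to be built anew from slices. The `try`/`except` is only there so the expected `TypeError` does not stop the script:

```python
x = [1, 2, 3]
x[1] = 'ab'          # lists are mutable: x becomes [1, 'ab', 3]
x[1:3] = 'R'         # slice assignment replaces x[1:3] with the items of 'R': [1, 'R']

y = 'ABCDE'
try:
    y[1] = 'L'       # strings are immutable, so item assignment fails
except TypeError as err:
    print('TypeError:', err)

# To "change" a string, build a new one from slices and reassign the name
y = y[0:1] + 'hello' + y[2:]   # 'AhelloCDE'
print(x, y)
```

Note that slice assignment iterates over the right-hand side, so assigning `'RS'` instead of `'R'` would insert two elements, `'R'` and `'S'`.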
Split the DNA sequence `'ATGCACTATTGCGTTAACTAGATGGGGCATTTTTAAATGGGACCCTGA'` into open reading frames (ORFs), using the start codon (`'ATG'`) as your delimiter. Next, we need to fix each individual ORF in the list (as each ORF is now missing its start codon). To fix each ORF, replace each element in the list with the start codon plus the element (hint: use the string concatenation operator `+`).
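One possible sketch of the ORF exercise. Like the exercise, it treats every `'ATG'` as a start codon and ignores reading frame and stop codons. Because the sequence begins with `'ATG'`, the first piece of the split is an empty string, which this sketch drops at the end (an assumption that only holds when the sequence starts with the start codon):

```python
dna = 'ATGCACTATTGCGTTAACTAGATGGGGCATTTTTAAATGGGACCCTGA'
start_codon = 'ATG'

orfs = dna.split(start_codon)
# dna begins with 'ATG', so orfs[0] is '' (nothing came before the first ATG)

# Replace each element with the start codon plus the element
for i in range(len(orfs)):
    orfs[i] = start_codon + orfs[i]

# Drop the leftover first element, which is now just 'ATG' on its own
orfs = orfs[1:]
print(orfs)
```

Joining the repaired ORFs back together reproduces the original sequence, which is a quick way to check that nothing was lost.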
In an expression like `'abcdefgh'.split('c')`, the `.something` part is a method of whatever came before the dot. Methods are a major feature of the Python programming language. We've been using the `split` method of strings, which operates on a string and returns a list. Usually, a method uses whatever came before the dot as the item to operate on and whatever is in the parentheses as parameters, and returns some value which the user can store as a variable. Importantly, these three components can be supplied in very creative ways, so long as they evaluate to values that the method knows how to handle (in this case, the method requires an input string and uses a second string as a delimiter). Try these bizarre-looking exercises to test this.
```python
x = ['abcdefgh', 'cd']
y = x[0].split(x[1])
y = x[0].split(x[0][2])
y = x[0].split(x[0][4])[1]
y = x[0].split(x[0][4])[x[1].index('c')]
y = x[0].split(x[0][4])[1].split('g')
```
In general, you can make your code compact by feeding the result of one operation directly into the input fields of the next, or readable by storing each step as a variable. Here is an alternative version of the final statement:
```python
first_string = x[0]
delimiter = x[0][4]
first_list = first_string.split(delimiter)
new_string = first_list[1]
final_list = new_string.split('g')
```
Using the string below, write code that isolates the `interesting_genes` portion of the string (with `split`), splits this portion by `'gene5'` to look only at the part that comes after `'gene5'` (with a second split), and returns the value associated with gene5 (with a third split):
```python
'boring_genes; gene1:2.6, gene2:3.8, interesting_genes; gene4:1.9, gene5:8.2, gene6:9.1'
```
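A sketch of the three-split approach. One choice here is an assumption of this sketch rather than part of the exercise: the second split uses `'gene5:'` (including the colon), so the third split only has to cut at the next comma:

```python
expression = ('boring_genes; gene1:2.6, gene2:3.8, '
              'interesting_genes; gene4:1.9, gene5:8.2, gene6:9.1')

# 1st split: keep only what comes after 'interesting_genes'
interesting_part = expression.split('interesting_genes')[1]
# '; gene4:1.9, gene5:8.2, gene6:9.1'

# 2nd split: keep only what comes after 'gene5:'
after_gene5 = interesting_part.split('gene5:')[1]
# '8.2, gene6:9.1'

# 3rd split: the value ends at the next comma
gene5_value = after_gene5.split(',')[0]
# '8.2' (still a string; float() converts it to a number)

print(gene5_value, float(gene5_value))
```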
Add an `if` statement to your code so that it finds the gene5 expression level regardless of whether gene5 is in the `interesting_genes` or the `boring_genes`, and so that it reports back which set of genes gene5 was in.
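A minimal sketch of the `if`-based version, wrapped in a hypothetical helper `find_gene5`. It assumes the string always contains the word `interesting_genes` exactly once, with the boring genes before it, as in the example string:

```python
def find_gene5(expression):
    """Return (gene_set_name, value_string) for gene5, or (None, None).

    Hypothetical helper; assumes the 'boring_genes; ... interesting_genes; ...'
    layout of the exercise string.
    """
    boring_part, interesting_part = expression.split('interesting_genes')
    if 'gene5:' in interesting_part:
        gene_set, portion = 'interesting_genes', interesting_part
    elif 'gene5:' in boring_part:
        gene_set, portion = 'boring_genes', boring_part
    else:
        return None, None
    # Text after 'gene5:' up to the next comma, with stray spaces removed
    value = portion.split('gene5:')[1].split(',')[0].strip()
    return gene_set, value


original = 'boring_genes; gene1:2.6, gene2:3.8, interesting_genes; gene4:1.9, gene5:8.2, gene6:9.1'
moved = 'boring_genes; gene1:2.6, gene5:3.8, interesting_genes; gene4:1.9, gene6:9.1'
print(find_gene5(original))
print(find_gene5(moved))
```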
`print`, and `len` to explore the properties of your new nested list.
## Exercises 7: loops and nested lists (20-30 minutes)
```python
for hamster_plan in 'ALMJKLKJ':
    print(hamster_plan)
for horse_vitamin in ['frosted', 'berry', 'cereal']:
    print(horse_vitamin)
```
What is being iterated through in the string `'ALMJKLKJ'`? In other words, what is each `hamster_plan`? What is being iterated through in the list `['frosted', 'berry', 'cereal']`? What is each `horse_vitamin`? Do you notice the difference between the type of data retrieved by the `hamster_plan` loop (which is going through a string) and the `horse_vitamin` loop (which is going through a list)? What will `bean_juice` be if you nest the loops as below?
```python
for horse_vitamin in ['frosted', 'berry', 'cereal']:
    for bean_juice in horse_vitamin:
        print(bean_juice)
```
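To see what each loop variable holds, this sketch collects the values into lists instead of printing them one by one (the collecting lists are illustrative names, not part of the exercise). Looping over a string yields single characters, looping over a list yields its elements, and the nested loop walks through the characters of each word:

```python
hamster_plans = []
for hamster_plan in 'ALMJKLKJ':
    hamster_plans.append(hamster_plan)       # one single-character string per step

horse_vitamins = []
for horse_vitamin in ['frosted', 'berry', 'cereal']:
    horse_vitamins.append(horse_vitamin)     # one whole word per step

bean_juices = []
for horse_vitamin in ['frosted', 'berry', 'cereal']:
    for bean_juice in horse_vitamin:         # inner loop: each character of the word
        bean_juices.append(bean_juice)

print(hamster_plans)
print(horse_vitamins)
print(len(bean_juices), bean_juices[:7])
```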
Write a loop that goes through the string `'ALSQRWQT'` and prints each character.
Then make your loop print `'found Q'` every time the character is `'Q'`.
Try setting `x` to some initial value of your choosing in the line preceding your loop from above (the part 4 loop), and putting `x = x + 1` within the loop. What happens to `x` as you go through the loop? Use this nifty property to print out the letter number where `'Q'` was found whenever `'Q'` is found.
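A sketch of the counter pattern on `'ALSQRWQT'`. The counter name `position` stands in for the `x` of the exercise, and it starts at 0 here so the reported numbers match Python's indices (start it at 1 for human-style letter numbers). The list `q_positions` is only there to keep a record of the hits:

```python
sequence = 'ALSQRWQT'
position = 0                      # counter set before the loop; 0 matches Python indexing
q_positions = []

for letter in sequence:
    print(letter)
    if letter == 'Q':
        print('found Q at position', position)
        q_positions.append(position)
    position = position + 1       # advance the counter once per character

print(q_positions)
```

Python's built-in `enumerate` does the same bookkeeping for you: `for position, letter in enumerate(sequence):` yields the index and the character together.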
Use the `len` function on `x` (below) to figure out how many lists are in `x`. Use a loop to print each of the lists that make up `x`.
```python
x = [[['gene1', 'heart'], ['gene2', 'brain']], [['gene4', 'appendix'], ['gene5', 'stomach'], ['gene6', 'esophagus']]]
```
For each list in `x`, see how many items the component list has (with the `len` function), and make a loop that prints out what those items are.
(for example, `x[1]` or `x[1][1]`). If you have time, try looping through all elements of the slices instead of the full list (replacing `x` in the outermost loop with `x[1]`, `x[1][0:2]`, etc.).
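A sketch of one loop per level of nesting for the gene/tissue list. The names `group`, `pair`, `item`, and `all_items` are illustrative choices; `all_items` just records every string the innermost loop reaches:

```python
x = [[['gene1', 'heart'], ['gene2', 'brain']],
     [['gene4', 'appendix'], ['gene5', 'stomach'], ['gene6', 'esophagus']]]

print(len(x))                       # 2 lists at the top level

all_items = []
for group in x:                     # each group is a list of [gene, tissue] pairs
    print(group, 'has', len(group), 'items')
    for pair in group:              # each pair is a two-element list
        print(pair)
        for item in pair:           # each item is a string
            print('   ', item)
            all_items.append(item)
```

To loop over a slice instead, replace `x` in the outermost loop with, for example, `x[1:]` (a list of groups); replacing it with `x[1]` means each `group` is already a single `[gene, tissue]` pair, so one fewer nested loop is needed.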
```python
dna = 'AATTACCGCATTCCACGGGACCTACGAATTATAGTACCTAAA'
i = 0
while i < 10:
    ....
    print(dna[i])
```
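A `while` loop only stops when its condition becomes false, so something inside the body must change `i`. One way such a loop can be written, assuming the goal is to print the first ten bases of `dna` (the `first_ten` list is an illustrative addition for checking the result):

```python
dna = 'AATTACCGCATTCCACGGGACCTACGAATTATAGTACCTAAA'
first_ten = []
i = 0
while i < 10:
    print(dna[i])
    first_ten.append(dna[i])
    i = i + 1    # without this update, i stays 0 and the loop never ends
```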