## Review exercises
'this is a string'
'this is a modified string'
'my'
.'m'
in the alphabet.

For the next several exercises, please download the following files and save them in the same folder as this notebook.
http://bioinfo.umassmed.edu/bootstrappers/bootstrappers-courses/python1/Python_I/yeast/Saccharomyces_cerevisiae.R64-1-1.78_transcripts.bed
http://bioinfo.umassmed.edu/bootstrappers/bootstrappers-courses/python1/Python_I/yeast/Saccharomyces_cerevisiae.R64-1-1.78_sample.gtf
http://bioinfo.umassmed.edu/bootstrappers/bootstrappers-courses/python1/Python_I/yeast/sacCer3.genome
http://bioinfo.umassmed.edu/bootstrappers/bootstrappers-courses/python1/Python_I/yeast/README.txt
x = open('sacCer3.genome') # x is now a 'file object' variable
print(x)
y = x.readline()
print(y)
print(x.readline())
print(x.readline())
print(x.readlines()) # note this is readlines and not readline
print(x.readline())
x.seek(0)
print(x.readline())
x.seek(8)
print(x.readline())
print(x.readline())
file to figure out what you can do with files)
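The calls above can be tried without the downloaded file. The following is a minimal sketch, assuming a small made-up two-column file as a stand-in for 'sacCer3.genome' (the names and sizes in `content` are hypothetical, not real yeast data). It shows what `readline()`, `readlines()`, and `seek()` each return.

```python
import os
import tempfile

# Made-up stand-in for 'sacCer3.genome' (names and sizes are hypothetical).
content = "chrA\t100\nchrB\t200\nchrC\t300\n"
with tempfile.NamedTemporaryFile("w", suffix=".genome", delete=False) as tmp:
    tmp.write(content)
    path = tmp.name

handle = open(path)
first = handle.readline()    # one line, trailing '\n' included
rest = handle.readlines()    # every remaining line, as a list
at_end = handle.readline()   # '' because nothing is left to read
handle.seek(0)               # jump back to the start of the file
again = handle.readline()    # the first line once more
handle.seek(3)               # jump to the fourth character of the file
partial = handle.readline()  # the rest of the first line from that position
handle.close()
os.remove(path)

print(repr(first), rest, repr(at_end), repr(again), repr(partial))
```

An empty string from `readline()` is how a file object signals that it has reached the end of the file.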
x = open('sacCer3.genome')
for line in x:
    print('the current line of the file is: ' + line)
Notice that when you iterate over a file handle, Python essentially calls the .readline() method of the file object again and again, storing each result in the loop's user-defined variable until the file runs out of lines. Alternatively, try this implementation, which avoids creating a file object variable at all:
for line in open('sacCer3.genome'):
    print('the current line is: ' + line)
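To make the comparison with `.readline()` concrete, here is a small sketch that reads the same text both ways and checks that the results agree. It assumes `io.StringIO` as an in-memory stand-in for a real file, and the two lines of `text` are made up for illustration.

```python
import io

text = "chrA\t100\nchrB\t200\n"  # made-up file contents

# Way 1: let the for loop pull lines from the file-like object.
lines_from_for = []
for line in io.StringIO(text):
    lines_from_for.append(line)

# Way 2: call readline() by hand until it returns an empty string.
handle = io.StringIO(text)
lines_from_readline = []
while True:
    line = handle.readline()
    if line == "":
        break
    lines_from_readline.append(line)

print(lines_from_for == lines_from_readline)
```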
line to print only the fifth, sixth, and seventh characters of each line from the text file.
'\t') and only print the second column (column[1])
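As a hint for the two exercises above, the sketch below applies slicing and `split('\t')` to a single made-up line (the value of `line` is hypothetical). Python counts from zero, so the fifth, sixth, and seventh characters sit at indexes 4, 5, and 6.

```python
line = "chrDemo\t12345\n"  # made-up line: two tab-separated columns

# Slicing stops before the end index, so [4:7] covers indexes 4, 5 and 6.
middle = line[4:7]

# Strip the newline first so it does not stick to the last column.
columns = line.rstrip("\n").split("\t")
second = columns[1]

print(middle)
print(second)
```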
output_file = open('test_output.txt', 'w')
for letter in 'ACCGT':
    output_file.write(letter)
output_file.close()
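A common alternative to calling `close()` by hand is a `with` block, which closes the file when the block ends. The sketch below is one possible variant, not part of the original exercise. It writes into a temporary folder (the `path` is an assumption, so it does not touch your notebook folder) and reads the result back.

```python
import os
import tempfile

# Hypothetical location, chosen so the example leaves your folder untouched.
path = os.path.join(tempfile.mkdtemp(), "test_output.txt")

with open(path, "w") as output_file:
    for letter in "ACCGT":
        output_file.write(letter)
# Leaving the with block has already closed output_file.

with open(path) as check:
    written = check.read()

print(written)
```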
x = open('Saccharomyces_cerevisiae.R64-1-1.78_sample.gtf')
y = open('test_output2.txt', 'w')
for line in x:
    y.write(line)
x.close()
y.close()
Afterward, open 'test_output2.txt' to check that its contents match 'Saccharomyces_cerevisiae.R64-1-1.78_sample.gtf'.
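If you would rather check the copy in code than by eye, the standard library's `filecmp.cmp` can compare two files. This is a sketch under stated assumptions: the file names and the two lines of content are made up, and the files live in a temporary folder.

```python
import filecmp
import os
import tempfile

folder = tempfile.mkdtemp()
source = os.path.join(folder, "source.gtf")  # hypothetical file names
copy = os.path.join(folder, "copy.txt")

with open(source, "w") as handle:
    handle.write("made-up line 1\nmade-up line 2\n")

# Same line-by-line copy as in the exercise.
with open(source) as x, open(copy, "w") as y:
    for line in x:
        y.write(line)

# shallow=False compares the actual contents, not just size and timestamps.
same = filecmp.cmp(source, copy, shallow=False)
print(same)
```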
[[chromosome, start, end, ..., ..., ...], [chromosome, start, end, ..., ..., ...], ....]
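The nested list above has one inner list per line and one element per column. Below is a minimal sketch of building such a structure, assuming tab-separated input. The helper name `read_table` and the two sample lines are hypothetical; with a real file you would pass the open file object instead of `bed_lines`.

```python
def read_table(lines):
    """Turn tab-separated lines into a list of lists, one inner list per line."""
    table = []
    for line in lines:
        columns = line.rstrip("\n").split("\t")
        table.append(columns)
    return table


# Made-up lines in a BED-like layout.
bed_lines = [
    "chrDemo\t10\t50\tT1\t0\t+\n",
    "chrDemo\t80\t120\tT2\t0\t-\n",
]
table = read_table(bed_lines)
print(table[0][0], table[1][2])
```

Every value is still a string at this point, so convert with `int()` before doing arithmetic on start or end positions.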
The file 'Saccharomyces_cerevisiae.R64-1-1.78_sample.gtf' contains an excerpt from the gene annotations for yeast, as downloaded from Ensembl. The format of the file is explained in 'README.txt'. Try to create a file containing genomic location (chromosome, start, end), transcript name and strand, separated by tabs (five columns), based on 'Saccharomyces_cerevisiae.R64-1-1.78_sample.gtf'. Be sure to only output transcripts.
chrVI 194812 196314 YFR021W +
chrXVI 298571 299503 YPL134C -
... etc.
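One way to approach this exercise is sketched below. It assumes the usual nine-column GTF layout (feature type in the third column, start and end in the fourth and fifth, strand in the seventh, attributes in the ninth) and a `transcript_id "..."` attribute. Check 'README.txt' to confirm the layout for this file. The function name `transcript_rows` and the sample lines are made up, and coordinates are copied through unchanged.

```python
def transcript_rows(gtf_lines):
    """Return tab-separated 'chromosome, start, end, name, strand' rows."""
    rows = []
    for line in gtf_lines:
        if line.startswith("#"):  # skip comment lines
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 9 or columns[2] != "transcript":
            continue  # keep transcript features only
        name = None
        for attribute in columns[8].split(";"):
            attribute = attribute.strip()
            if attribute.startswith("transcript_id "):
                name = attribute.split(" ", 1)[1].strip('"')
        if name is None:
            continue
        rows.append("\t".join([columns[0], columns[3], columns[4], name, columns[6]]))
    return rows


# Made-up GTF lines, not taken from the real file.
sample = [
    'chrDemo\tdemo\tgene\t100\t500\t.\t+\t.\tgene_id "G1";\n',
    'chrDemo\tdemo\ttranscript\t100\t500\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    'chrDemo\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
]
rows = transcript_rows(sample)
print(rows)
```

To produce the output file, write each row followed by '\n' to a file opened with 'w'.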
## import and genomic data (30 minutes)

For this exercise we will be using the "ucscgenome" module. This module most likely needs to be installed before it can be used. Open a terminal (or a "DOS-box", or cmd.exe) and install "ucscgenome" with pip, using the following command:
pip install ucscgenome
import ucscgenome
genome = ucscgenome.Genome("sacCer3")
sequence = genome["chrIV"]
print(sequence[100:110])
The following code makes it possible to translate one set of characters into another set of characters in a string. How could you apply this to obtain a reverse complement sequence from a forward sequence?
# example
# In Python 3 the translation table is built with str.maketrans
# (string.maketrans existed only in Python 2).
t = str.maketrans("aei", "qwe")  # this creates a translation table `t`
print('this needs to be translated'.translate(t))
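One possible answer to the question above, sketched for Python 3, where translation tables are built with `str.maketrans`. The names `COMPLEMENT` and `reverse_complement` are made up for this example, and the table only covers A, C, G, and T in upper and lower case. Any other character (such as N) passes through unchanged.

```python
# Map each base to its complement, for both upper and lower case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Complement every base, then reverse the string with a [::-1] slice."""
    return sequence.translate(COMPLEMENT)[::-1]


print(reverse_complement("AACG"))
```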