Bootstrappers:
Python 2, Session 4

February 12, 2015

Chris MacKay
and Mike Purcaro & the GSBS Bootstrappers

Outline

  • Rosalind Problems:
    • more on classes and methods
    • more on using python from the terminal
    • briefly go over module and package basics
    • using basic command line arguments

Rosalind Problems:

- DNA Counting DNA Nucleotides
- RNA Transcribing DNA into RNA
- REVC Complementing a Strand of DNA
- GC Computing GC Content
- HAMM Counting Point Mutations
- PROT Translating RNA into Protein
- SPLC RNA Splicing
- SUBS Finding a Motif in DNA
- PRTM Calculating Protein Mass
- REVP Locating Restriction Sites

Problem 1: DNA

  • Given: A DNA string s of length at most 1000 nt.

  • Return: Four integers (separated by spaces) counting the respective number of times that the symbols 'A', 'C', 'G', and 'T' occur in s.

Try this:

  • solve the problem with a DNASequence class, with a method that returns a count of all bases
  • what if your DNA sequence has "N"s?
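One possible shape for this, as a minimal sketch rather than the expected solution. The class name comes from the slide; the attribute `seq`, the method name `count_bases`, and the idea of passing a wider alphabet to pick up "N"s are all assumptions made for illustration.

```python
class DNASequence(object):
    """Hypothetical minimal class for the DNA problem."""

    def __init__(self, sequence):
        # upper-case the input so 'a' and 'A' are counted together
        self.seq = sequence.upper()

    def count_bases(self, alphabet="ACGT"):
        # map each symbol in `alphabet` to the number of times it occurs
        return dict((base, self.seq.count(base)) for base in alphabet)


dna = DNASequence("AACGTNN")
counts = dna.count_bases()
# Rosalind wants the counts in A, C, G, T order, separated by spaces
answer = " ".join(str(counts[base]) for base in "ACGT")   # "2 1 1 1"
# widening the alphabet is one way to see how many Ns there are
n_count = dna.count_bases("ACGTN")["N"]                    # 2
```

Returning a dict keeps the method reusable; the space-separated string Rosalind asks for is built only at the end.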

Problem 2: RNA

  • Given: A DNA string t having length at most 1000 nt.

  • Return: The transcribed RNA string of t.

Try this:

  • how about adding a new method to your DNASequence class that returns an RNA version of the DNA sequence?
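A minimal sketch of such a method, assuming the sequence is stored in an attribute called `seq` and that the method is named `to_rna` (both names are made up here):

```python
class DNASequence(object):
    def __init__(self, sequence):
        self.seq = sequence

    def to_rna(self):
        # for this problem, transcription just swaps every T for a U
        return self.seq.replace("T", "U")


rna = DNASequence("GATGGAACTTGACTACGTAAATT").to_rna()
# rna is "GAUGGAACUUGACUACGUAAAUU"
```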

Problem 3: REVC

  • Given: A DNA string $s$ of length at most 1000 bp.

  • Return: The reverse complement $s^{c}$ of $s$.

Try this:

  • how about adding a new method to your DNASequence class?
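One way the method could look, as a sketch only. The method name `reverse_complement` and the `COMPLEMENT` lookup table are assumptions; a sequence containing anything other than A, C, G, T would raise a KeyError in this version.

```python
class DNASequence(object):
    # hypothetical lookup table: each base maps to its complement
    COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

    def __init__(self, sequence):
        self.seq = sequence

    def reverse_complement(self):
        # walk the sequence backwards, complementing each base
        return "".join(self.COMPLEMENT[base] for base in reversed(self.seq))


rev_comp = DNASequence("AAAACCCGGT").reverse_complement()
# rev_comp is "ACCGGGTTTT"
```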

Sorting a list (pg 1)...(sorted(list))

sorted(list) creates a new list and leaves the old list intact

my_list = [67, 81, 24, 100]
test = sorted(my_list)
print test  # [24, 67, 81, 100]
test = sorted(my_list, reverse = True)
print test # [100, 81, 67, 24] 

def getKey(item): # to sort the list by last digit only
    item = str(item)
    return int(item[-1])

test = sorted(my_list, key = getKey)
print test # [100, 81, 24, 67]

test = sorted(my_list, key = lambda x: int(str(x)[-1]))
print test # [100, 81, 24, 67]

Sorting a list (pg 2)...(list.sort())

list.sort() sorts the list in place (NO NEW LIST MADE)

my_list = [67, 81, 24, 100]
my_list.sort()
print my_list # [24, 67, 81, 100]
my_list.sort(reverse = True)
print my_list # [100, 81, 67, 24]

def getKey(item):
    item = str(item)
    return int(item[-1])

my_list.sort(key = getKey)
print my_list # [100, 81, 24, 67]

my_list.sort(key = lambda x: int(str(x)[-1]))
print my_list # [100, 81, 24, 67]
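A common slip when switching between the two forms is to assign the result of list.sort(), which is always None. The sketch below shows that, plus a key function applied to (ID, GC-content) pairs; the IDs and percentages are invented for illustration.

```python
my_list = [67, 81, 24, 100]

result = my_list.sort()          # sorts in place, returns None
# result is None, my_list is now [24, 67, 81, 100]

new_list = sorted(my_list, reverse=True)
# new_list is [100, 81, 67, 24], my_list is unchanged

# made-up (id, gc_content) pairs, sorted by the second element of each pair
records = [("seq_1", 60.9), ("seq_2", 53.8), ("seq_3", 59.1)]
by_gc = sorted(records, key=lambda record: record[1], reverse=True)
# by_gc is [("seq_1", 60.9), ("seq_3", 59.1), ("seq_2", 53.8)]
```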

Problem 4: GC

  • Given: At most 10 DNA strings in FASTA format (of length at most 1 kbp each).

  • Return: The ID of the string having the highest GC-content, followed by the GC-content of that string on the next line.

Try this:

  • try creating a FASTAFile class with a method that returns a DNASequence object for each sequence in the file
  • how about adding a new method to your DNASequence class that calculates the GC content?
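A rough sketch of how the two classes might fit together. Only the class names come from the slide; the method names `sequences` and `gc_content`, and the choice to hand FASTAFile an iterable of lines (an open file object would also work) instead of a filename, are assumptions.

```python
class DNASequence(object):
    def __init__(self, sequence, id):
        self.seq = sequence
        self.id = id

    def gc_content(self):
        # percentage of bases that are G or C (0.0 for an empty sequence)
        if not self.seq:
            return 0.0
        gc = self.seq.count("G") + self.seq.count("C")
        return 100.0 * gc / len(self.seq)


class FASTAFile(object):
    def __init__(self, lines):
        # `lines` is any iterable of text lines, e.g. an open file
        self.lines = lines

    def sequences(self):
        # build one DNASequence per '>' header, joining wrapped lines
        records = []
        seq_id, chunks = None, []
        for line in self.lines:
            line = line.strip()
            if line.startswith(">"):
                if seq_id is not None:
                    records.append(DNASequence("".join(chunks), seq_id))
                seq_id, chunks = line[1:], []
            elif line:
                chunks.append(line)
        if seq_id is not None:
            records.append(DNASequence("".join(chunks), seq_id))
        return records


# tiny made-up FASTA input; seq_2 is wrapped over two lines
fasta_text = ">seq_1\nGGCC\n>seq_2\nATGC\nAT\n"
records = FASTAFile(fasta_text.splitlines()).sequences()
best = max(records, key=lambda record: record.gc_content())
# best.id is "seq_1"
```

With a real file, something like FASTAFile(open(filename)) should behave the same way, since iterating over a file object yields its lines.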

Problem 5: HAMM

  • Given: Two DNA strings $s$ and $t$ of equal length (not exceeding 1 kbp).

  • Return: The Hamming distance $d_{H}(s,t)$.

Try this:

  • add another method to your DNASequence class
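As a sketch, the method could take a second DNASequence and compare the two position by position. The name `hamming_distance` and the decision to raise ValueError on unequal lengths are assumptions.

```python
class DNASequence(object):
    def __init__(self, sequence):
        self.seq = sequence

    def hamming_distance(self, other):
        # the Hamming distance is only defined for equal-length strings
        if len(self.seq) != len(other.seq):
            raise ValueError("sequences must be the same length")
        # count the positions where the two sequences differ
        return sum(1 for a, b in zip(self.seq, other.seq) if a != b)


s = DNASequence("GAGCCTACTAACGGGAT")
t = DNASequence("CATCGTAATGACGGCCT")
distance = s.hamming_distance(t)
# distance is 7
```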

Problem 6: PROT

  • Given: An RNA string $s$ corresponding to a strand of mRNA (of length at most 10 kbp).

  • Return: The protein string encoded by $s$.

  • HELP: codon table
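Typing out all 64 codons by hand is error-prone, so one compact trick is to generate the codons in U, C, A, G order and zip them against a 64-letter string of amino acids ('*' marks a stop codon). The sketch below assumes the standard genetic code; the names `CODON_TABLE` and `translate` are made up.

```python
BASES = "UCAG"
# amino acids for UUU, UUC, UUA, UUG, UCU, ... GGG in that order
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(rna):
    # read the RNA three bases at a time and stop at the first stop codon
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


protein = translate("AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA")
# protein is "MAMAPRTEINSTRING"
```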

Problem 7: SPLC

  • Given: A DNA string $s$ (of length at most 1 kbp) and a collection of substrings of $s$ acting as introns. All strings are given in FASTA format.

  • Return: A protein string resulting from transcribing and translating the exons of $s$. (Note: Only one solution will exist for the dataset provided.)

Try this:

  • use your FASTAFile class
  • how about adding a new method to your DNASequence class that returns the spliced sequence?

Adding default attributes to class and keyword arguments


class DNASequence(object):
    def __init__(self, sequence, id, introns = None):
        self.seq = sequence
        self.id = id
        self.introns = introns

    def spliced(self):
        ...
        return spliced_sequence

new_sequence = DNASequence('ATCGCTAGAGCT', 'seq_12345')

next_sequence = DNASequence(id = 'seq_35452', 
                            sequence = 'TGCTAGCTGAATCA', 
                            introns = [seq_obj1, seq_obj2, seq_obj3])
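To make the pattern concrete, here is one hypothetical way to fill in spliced(); it is a sketch, not the intended solution. It assumes each intron is itself a DNASequence object and simply deletes every occurrence of each intron's sequence. It also shows why the default is None and not []: a default value is created once, when the function is defined, so a list default would be shared by every instance.

```python
class DNASequence(object):
    def __init__(self, sequence, id, introns=None):
        self.seq = sequence
        self.id = id
        # assumption: turn the None default into a fresh empty list
        self.introns = introns if introns is not None else []

    def spliced(self):
        # remove each intron's sequence from a copy of the full sequence
        spliced_sequence = self.seq
        for intron in self.introns:
            spliced_sequence = spliced_sequence.replace(intron.seq, "")
        return spliced_sequence


# positional arguments, introns left at its default
intron_1 = DNASequence("AAA", "intron_1")

# keyword arguments can be given in any order
gene = DNASequence(id="gene_1",
                   sequence="ATGAAACCCTAA",
                   introns=[intron_1])
exons = gene.spliced()
# exons is "ATGCCCTAA"
```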

Problem 8: SUBS

  • Given: Two DNA strings $s$ and $t$ (each of length at most 1 kbp).

  • Return: All locations of $t$ as a substring of $s$.

NOTE:

  • Python uses 0-based indexing, but is that what Rosalind is looking for?
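Rosalind reports positions starting from 1, so each index Python gives back needs 1 added to it. A minimal sketch using str.find (the function name `find_motif` is made up); stepping forward by a single character keeps overlapping matches.

```python
def find_motif(s, t):
    positions = []
    start = s.find(t)                  # 0-based index, or -1 if not found
    while start != -1:
        positions.append(start + 1)    # convert to a 1-based position
        start = s.find(t, start + 1)   # move one character on, so overlaps count
    return positions


locations = find_motif("GATATATGCATATACTT", "ATAT")
# locations is [2, 4, 10]
```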

Problem 9: PRTM

  • Given: A protein string $P$ of length at most 1000 aa.

  • Return: The total weight of $P$. Consult the monoisotopic mass table.

Try this:

  • try creating a Protein class with a method that calculates its mass
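A sketch of what that class might look like. The names `Protein`, `MASS_TABLE`, and `mass` are assumptions; the values are the standard monoisotopic residue masses in daltons, and are worth checking against the table Rosalind links to.

```python
class Protein(object):
    # monoisotopic mass of each amino acid residue, in daltons
    MASS_TABLE = {
        "A": 71.03711, "C": 103.00919, "D": 115.02694, "E": 129.04259,
        "F": 147.06841, "G": 57.02146, "H": 137.05891, "I": 113.08406,
        "K": 128.09496, "L": 113.08406, "M": 131.04049, "N": 114.04293,
        "P": 97.05276, "Q": 128.05858, "R": 156.10111, "S": 87.03203,
        "T": 101.04768, "V": 99.06841, "W": 186.07931, "Y": 163.06333,
    }

    def __init__(self, sequence):
        self.seq = sequence

    def mass(self):
        # total weight is the sum of the residue masses
        return sum(self.MASS_TABLE[aa] for aa in self.seq)


weight = Protein("SKADYEK").mass()
# round(weight, 3) is 821.392
```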

tangent on Modules...

say you have a function called reverseComplement in a myCode.py file which is in the same directory as this script:

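# 1. import the whole module, and use its name as a prefix
import myCode
new_rev_comp = myCode.reverseComplement(sequence)

# 2. import the whole module under a shorter name
import myCode as my
new_rev_comp = my.reverseComplement(sequence)

# 3. import just the one function
from myCode import reverseComplement
new_rev_comp = reverseComplement(sequence)

# 4. import just the one function under a shorter name
from myCode import reverseComplement as rc
new_rev_comp = rc(sequence)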

to read more on modules and packages, see the Modules chapter of the official Python tutorial
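For the command line arguments item in the outline, here is a rough sketch of what myCode.py itself might contain. The function name reverseComplement comes from the slide; `main`, the usage message, and the choice to turn unrecognised bases into "N" are assumptions. sys.argv is the standard list of command line arguments, with the script name at index 0.

```python
import sys


def reverseComplement(sequence):
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    # assumption: any base outside A, C, G, T is reported as N
    return "".join(complement.get(base, "N") for base in reversed(sequence))


def main(argv):
    # argv[0] is the script name, argv[1] the first argument after it
    if len(argv) != 2:
        return "usage: python myCode.py SEQUENCE"
    return reverseComplement(argv[1])


# this part runs only when the file is executed from the terminal,
# not when another script does `import myCode`
if __name__ == "__main__":
    print(main(sys.argv))
```

From the terminal, `python myCode.py AAAACCCGGT` should then print ACCGGGTTTT, while a script that imports myCode gets the function without anything being printed.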

Problem 10: REVP

  • Given: A DNA string of length at most 1 kbp in FASTA format.

  • Return: The position and length of every reverse palindrome in the string having length between 4 and 12. You may return these pairs in any order.

4 6
5 4
6 6 ...

Try this:

  • try to call in some of your previously written classes and functions from another .py file...
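A brute-force sketch of one approach: check every substring of length 4 to 12 against its own reverse complement. To keep the example self-contained the helper is defined inline; in the spirit of the exercise it would instead be imported from your own .py file. The function names and the (position, length) tuple output are assumptions.

```python
def reverse_complement(sequence):
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(sequence))


def reverse_palindromes(sequence, min_length=4, max_length=12):
    hits = []
    for start in range(len(sequence)):
        for length in range(min_length, max_length + 1):
            end = start + length
            if end > len(sequence):
                break                  # longer fragments would run off the end
            fragment = sequence[start:end]
            if fragment == reverse_complement(fragment):
                hits.append((start + 1, length))   # 1-based position
    return hits


pairs = reverse_palindromes("GCATGC")
# pairs is [(1, 6), (2, 4)]
```

Since the sequence is at most 1 kbp and only nine lengths are tried per position, the brute-force loop is fast enough here.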

Thanks everyone!

website http://bioinfo.umassmed.edu/bootstrappers/
github crmackay