# Bootstrappers: Python 2, Session 4

## February 12, 2015

Chris MacKay
and Mike Purcaro & the GSBS Bootstrappers

## Outline

• Rosalind Problems:
• more on classes and methods
• more on using python from the terminal
• briefly go over module and package basics
• using basic command line arguments

## Rosalind Problems:

- DNA Counting DNA Nucleotides
- RNA Transcribing DNA into RNA
- REVC Complementing a Strand of DNA
- GC Computing GC Content
- HAMM Counting Point Mutations
- PROT Translating RNA into Protein
- SPLC RNA Splicing
- SUBS Finding a Motif in DNA
- PRTM Calculating Protein Mass
- REVP Locating Restriction Sites


## Problem 1: DNA

• Given: A DNA string s of length at most 1000 nt.

• Return: Four integers (separated by spaces) counting the respective number of times that the symbols 'A', 'C', 'G', and 'T' occur in s.

Try this:

• solve the problem with a DNASequence class, with a method that returns a count of all bases
• what if your DNA sequence has "N"s?
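
One way to approach this, as a minimal sketch: the class below counts every symbol it sees, so "N"s are tallied too, and a second method reports only the four bases Rosalind asks for. The method names `count_bases` and `rosalind_counts` are illustrative assumptions, not names from the session. `print(...)` is called with a single argument so the snippet runs under both Python 2 and 3.

```python
class DNASequence(object):
    """Illustrative class; the method names are assumptions."""

    def __init__(self, sequence):
        # store upper-case so 'a' and 'A' are counted together
        self.seq = sequence.upper()

    def count_bases(self):
        # map every symbol seen (including 'N') to its count
        counts = {}
        for base in self.seq:
            counts[base] = counts.get(base, 0) + 1
        return counts

    def rosalind_counts(self):
        # Rosalind wants A, C, G, T in that order; other symbols are skipped
        counts = self.count_bases()
        return [counts.get(base, 0) for base in "ACGT"]


dna = DNASequence("AACGTTNNA")
print(" ".join(str(n) for n in dna.rosalind_counts()))  # 3 1 1 2
```

Keeping the full dictionary in `count_bases` means the "N" count is still available when you want it.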

## Problem 2: RNA

• Given: A DNA string t having length at most 1000 nt.

• Return: The transcribed RNA string of t.

Try this:

• how about adding a new method to your DNASequence class that returns an RNA version of the DNA sequence?
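
A possible shape for that method, as a sketch only: transcription here is modeled the way Rosalind defines it, by replacing every 'T' with 'U'. The method name `to_rna` is an assumption.

```python
class DNASequence(object):
    def __init__(self, sequence):
        self.seq = sequence

    def to_rna(self):
        # Rosalind's definition of transcription: swap every T for U
        return self.seq.replace("T", "U")


dna = DNASequence("GATGGAACTTGACTACGTAAATT")
print(dna.to_rna())  # GAUGGAACUUGACUACGUAAAUU
```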

## Problem 3: REVC

• Given: A DNA string $s$ of length at most 1000 bp.

• Return: The reverse complement $s^{c}$ of $s$.

Try this:

• how about adding a new method to your DNASequence class?
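
As a rough sketch of such a method: walk the sequence backwards and look up each base's partner in a dictionary. The name `reverse_complement` and the choice to raise a `KeyError` on unexpected symbols are assumptions made for illustration.

```python
class DNASequence(object):
    # pairing rules for the four DNA bases
    COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

    def __init__(self, sequence):
        self.seq = sequence

    def reverse_complement(self):
        # reversed() walks the string back to front; anything other
        # than A, C, G, T raises KeyError in this sketch
        return "".join(self.COMPLEMENT[base] for base in reversed(self.seq))


dna = DNASequence("AAAACCCGGT")
print(dna.reverse_complement())  # ACCGGGTTTT
```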

## Sorting a list (pg 1): sorted(list)

sorted(list) creates a new list and leaves the old list intact

my_list = [67, 81, 24, 100]
test = sorted(my_list)
print test  # [24, 67, 81, 100]
test = sorted(my_list, reverse = True)
print test # [100, 81, 67, 24]
def getKey(item):    # to sort the list by last digit only
    item = str(item)
    return int(item[-1])
test = sorted(my_list, key = getKey)
print test # [100, 81, 24, 67]
test = sorted(my_list, key = lambda x: int(str(x)[-1]))
print test # [100, 81, 24, 67]


## Sorting a list (pg 2): list.sort()

list.sort() sorts the list in place (NO NEW LIST MADE)

my_list = [67, 81, 24, 100]
my_list.sort()
print my_list # [24, 67, 81, 100]
my_list.sort(reverse = True)
print my_list # [100, 81, 67, 24]
def getKey(item):
    item = str(item)
    return int(item[-1])
my_list.sort(key = getKey)
print my_list # [100, 81, 24, 67]
my_list.sort(key = lambda x: int(str(x)[-1]))
print my_list # [100, 81, 24, 67]
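
The same `key` idea carries over to lists of tuples, which is handy for the GC problem that follows. The sketch below uses made-up IDs and GC percentages (they are placeholders, not real data) and also shows `max` with a `key`, which avoids sorting when only the top item is needed.

```python
# hypothetical (id, GC percentage) pairs, invented for this example
gc_by_id = [("seq_a", 53.75), ("seq_b", 60.92), ("seq_c", 50.0)]

# sort by the second element of each tuple, highest first
by_gc = sorted(gc_by_id, key=lambda pair: pair[1], reverse=True)
print(by_gc[0][0])  # seq_b

# max() accepts the same key argument and skips the full sort
best = max(gc_by_id, key=lambda pair: pair[1])
print(best[0])  # seq_b
```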


## Problem 4: GC

• Given: At most 10 DNA strings in FASTA format (of length at most 1 kbp each).

• Return: The ID of the string having the highest GC-content, followed by the GC-content of that string on the next line.

Try this:

• try creating a FASTAFile class with a method that returns a DNASequence object for each sequence in the file
• how about adding a new method to your DNASequence class that calculates the GC content?
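
A minimal sketch of how the two classes could fit together. To keep the example self-contained, `FASTAFile` takes any iterable of lines (a list here, but an open file object would work the same way) rather than a filename. That choice, and the method names `sequences` and `gc_content`, are assumptions.

```python
class DNASequence(object):
    def __init__(self, sequence, id):
        self.seq = sequence
        self.id = id

    def gc_content(self):
        # percentage of G and C; 100.0 forces float division in Python 2
        if not self.seq:
            return 0.0
        gc = self.seq.count("G") + self.seq.count("C")
        return 100.0 * gc / len(self.seq)


class FASTAFile(object):
    def __init__(self, lines):
        # any iterable of text lines, e.g. an open file object
        self.lines = lines

    def sequences(self):
        # yield one DNASequence per '>' header, joining wrapped lines
        seq_id, chunks = None, []
        for line in self.lines:
            line = line.strip()
            if line.startswith(">"):
                if seq_id is not None:
                    yield DNASequence("".join(chunks), seq_id)
                seq_id, chunks = line[1:], []
            elif line:
                chunks.append(line)
        if seq_id is not None:
            yield DNASequence("".join(chunks), seq_id)


example = [">seq_1", "ATGC", "GGCC", ">seq_2", "ATAT"]
best = max(FASTAFile(example).sequences(), key=lambda s: s.gc_content())
print(best.id)            # seq_1
print(best.gc_content())  # 75.0
```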

## Problem 5: HAMM

• Given: Two DNA strings $s$ and $t$ of equal length (not exceeding 1 kbp).

• Return: The Hamming distance $d_{H}(s,t)$.

Try this:

• add another method to your DNASequence class
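
One possible version of that method, as a sketch: pair up the bases with `zip` and count the positions that differ. The name `hamming_distance` and the decision to raise `ValueError` on unequal lengths are assumptions.

```python
class DNASequence(object):
    def __init__(self, sequence):
        self.seq = sequence

    def hamming_distance(self, other):
        # the distance is only defined for sequences of equal length
        if len(self.seq) != len(other.seq):
            raise ValueError("sequences must be the same length")
        # zip pairs the bases position by position
        return sum(1 for a, b in zip(self.seq, other.seq) if a != b)


s = DNASequence("GAGCCTACTAACGGGAT")
t = DNASequence("CATCGTAATGACGGCCT")
print(s.hamming_distance(t))  # 7
```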

## Problem 6: PROT

• Given: An RNA string $s$ corresponding to a strand of mRNA (of length at most 10 kbp).

• Return: The protein string encoded by $s$.

• HELP: codon table
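
If typing out all 64 codons feels tedious, here is one hedged sketch of an alternative: build the standard genetic code from a compact 64-letter string, with codons enumerated in U, C, A, G order and `*` marking stop codons. The names `CODON_TABLE` and `translate` are assumptions, and it is worth checking a few entries against the codon table before relying on it.

```python
BASES = "UCAG"
# standard genetic code, one letter per codon in UCAG order; '*' is stop
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")

CODON_TABLE = {}
index = 0
for first in BASES:
    for second in BASES:
        for third in BASES:
            CODON_TABLE[first + second + third] = AMINO_ACIDS[index]
            index += 1


def translate(rna):
    # read three bases at a time and stop at the first stop codon
    protein = []
    for start in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA"))
# MAMAPRTEINSTRING
```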

## Problem 7: SPLC

• Given: A DNA string $s$ (of length at most 1 kbp) and a collection of substrings of $s$ acting as introns. All strings are given in FASTA format.

• Return: A protein string resulting from transcribing and translating the exons of $s$. (Note: Only one solution will exist for the dataset provided.)

Try this:

• use your FASTAFile class
• how about adding a new method to your DNASequence class that returns the spliced sequence (with the introns removed)?

## Adding default attributes to class and keyword arguments


class DNASequence(object):
    def __init__(self, sequence, id, introns = None):
        self.seq = sequence
        self.id = id
        self.introns = introns

    def spliced(self):
        ...
        return spliced_sequence

new_sequence = DNASequence('ATCGCTAGAGCT', 'seq_12345')

next_sequence = DNASequence(id = 'seq_35452',
                            sequence = 'TGCTAGCTGAATCA',
                            introns = [seq_obj1, seq_obj2, seq_obj3])
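
The body of `spliced` is left open on the slide. One way it could be filled in, purely as an illustration, is shown below. It assumes each intron is itself a `DNASequence` object, that each intron occurs exactly once, and that removing one intron does not create a new match for another. The sequences are invented for the example.

```python
class DNASequence(object):
    def __init__(self, sequence, id, introns=None):
        self.seq = sequence
        self.id = id
        # fall back to an empty list so spliced() can always loop
        self.introns = introns if introns is not None else []

    def spliced(self):
        spliced_sequence = self.seq
        for intron in self.introns:
            # drop only the first occurrence of each intron
            spliced_sequence = spliced_sequence.replace(intron.seq, "", 1)
        return spliced_sequence


# made-up example: ATG + GTAAGT + AAA + CCCC + TAG
intron_1 = DNASequence("GTAAGT", "intron_1")
intron_2 = DNASequence("CCCC", "intron_2")
gene = DNASequence(id="gene_1",
                   sequence="ATGGTAAGTAAACCCCTAG",
                   introns=[intron_1, intron_2])
print(gene.spliced())  # ATGAAATAG
```

Using `None` as the default and creating the list inside `__init__` avoids sharing one mutable list between every instance.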


## Problem 8: SUBS

• Given: Two DNA strings $s$ and $t$ (each of length at most 1 kbp).

• Return: All locations of $t$ as a substring of $s$.

NOTE:

• Python uses 0-based indexing, but is that what Rosalind is looking for?
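
A small sketch of one approach, assuming the answer should use 1-based positions and that overlapping matches count: `str.find` is called repeatedly, restarting one character past the previous hit. The function name `find_motif` is an assumption.

```python
def find_motif(s, t):
    # collect every start position of t in s, overlaps included
    positions = []
    start = s.find(t)
    while start != -1:
        positions.append(start + 1)   # convert 0-based index to 1-based
        start = s.find(t, start + 1)  # resume one character later
    return positions


hits = find_motif("GATATATGCATATACTT", "ATAT")
print(" ".join(str(p) for p in hits))  # 2 4 10
```

Note that `s.count(t)` would miss the overlapping match here, because it only counts non-overlapping occurrences.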

## Problem 9: PRTM

• Given: A protein string $P$ of length at most 1000 aa.

• Return: The total weight of $P$. Consult the monoisotopic mass table.

Try this:

• try creating a Protein class with a method that calculates its mass.
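
As a sketch of what that class might look like: the dictionary below holds monoisotopic residue masses as they appear in Rosalind's table (worth double-checking against the table itself), and the method simply sums them. The names `Protein`, `mass`, and `MONOISOTOPIC_MASS` are assumptions.

```python
# monoisotopic residue masses in daltons; verify against Rosalind's table
MONOISOTOPIC_MASS = {
    "A": 71.03711, "C": 103.00919, "D": 115.02694, "E": 129.04259,
    "F": 147.06841, "G": 57.02146, "H": 137.05891, "I": 113.08406,
    "K": 128.09496, "L": 113.08406, "M": 131.04049, "N": 114.04293,
    "P": 97.05276, "Q": 128.05858, "R": 156.10111, "S": 87.03203,
    "T": 101.04768, "V": 99.06841, "W": 186.07931, "Y": 163.06333,
}


class Protein(object):
    def __init__(self, sequence):
        self.seq = sequence

    def mass(self):
        # total weight is the sum of the residue masses
        return sum(MONOISOTOPIC_MASS[aa] for aa in self.seq)


print("%.3f" % Protein("SKADYEK").mass())  # 821.392
```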

## tangent on Modules...

Say you have a function called reverseComplement in a file named myCode.py, in the same directory as your script. Any of these four import styles will work:

import myCode
new_rev_comp = myCode.reverseComplement(sequence)

import myCode as my
new_rev_comp = my.reverseComplement(sequence)

from myCode import reverseComplement
new_rev_comp = reverseComplement(sequence)

from myCode import reverseComplement as rc
new_rev_comp = rc(sequence)
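
For reference, here is a guess at what a file like myCode.py could contain. Everything in it is hypothetical. It also ties in the command line arguments from the outline: the `if __name__ == "__main__":` guard means the function can be imported by another script without side effects, while `python myCode.py AAAACCCGGT` run from the terminal reads the sequence from `sys.argv`.

```python
# myCode.py (hypothetical contents, for illustration only)
import sys

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverseComplement(sequence):
    # unknown symbols such as 'N' are written out as 'N'
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(sequence))


def main(argv):
    # argv[0] is the script name; argv[1] should be the DNA string
    if len(argv) != 2:
        print("usage: python myCode.py DNA_STRING")
        return 1
    print(reverseComplement(argv[1].upper()))
    return 0


if __name__ == "__main__":
    # runs only when the file is executed directly, not when imported
    main(sys.argv)
```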


To read more on modules and packages, see the Modules chapter of the official Python tutorial.

## Problem 10: REVP

• Given: A DNA string of length at most 1 kbp in FASTA format.

• Return: The position and length of every reverse palindrome in the string having length between 4 and 12. You may return these pairs in any order.

4 6
5 4
6 6 ...


Try this:

• try to call in some of your previously written classes and functions from another .py file...
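
A brute-force sketch of the search itself. In your own solution `reverse_complement` would be imported from your other .py file; it is defined inline here only so the example runs on its own. Both function names are assumptions. Positions are reported 1-based to match the sample output above.

```python
def reverse_complement(seq):
    # inline stand-in for a helper you would normally import
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def reverse_palindromes(seq, min_len=4, max_len=12):
    # try every start position and every allowed length
    hits = []
    for start in range(len(seq)):
        for length in range(min_len, max_len + 1):
            piece = seq[start:start + length]
            if len(piece) < length:
                break  # ran past the end of the sequence
            if piece == reverse_complement(piece):
                hits.append((start + 1, length))  # 1-based position
    return hits


for position, length in reverse_palindromes("TCAATGCATGCGGGTCTATATGCAT"):
    print("%d %d" % (position, length))
```

With sequences capped at 1 kbp and only nine candidate lengths per position, this exhaustive check stays fast enough without any optimization.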

## Thanks everyone!

website http://bioinfo.umassmed.edu/bootstrappers/
github crmackay