Downloading a list of files simple

The following will take a file where each line is a different url of a file to download, here’s a link to download script here downloadFiles.py

downloadFiles.py

#!/usr/bin/env python

import argparse, urllib

def parse_downloadFiles_args():
    parser = argparse.ArgumentParser(description="Take in a file where the first column holds the url of a file to be downloaded, will overwrite current files if they exist")
    parser.add_argument('-f', '--file', type=str, required = True)
    return parser.parse_args()

def download_files(urlsFile):
    with open(urlsFile, "r") as f:
        for line in f:
            lineSplit = line.split()
            print ("Downloading {url} to {file}".format(url = lineSplit[0], file = os.path.basename(lineSplit[0])))
            urllib.urlretrieve(lineSplit[0], os.path.basename(lineSplit[0]))
    
if __name__ == "__main__":
    args = parse_downloadFiles_args()
    download_files(args.file)
cat files/download_sra_urls.txt
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR053/SRR053682/SRR053682.sra
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR053/SRR053683/SRR053683.sra
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR053/SRR053684/SRR053684.sra
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR053/SRR053685/SRR053685.sra
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR053/SRR053686/SRR053686.sra
./downloadFiles.py -file download_sra_urls.txt
Downloading ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR053/SRR053682/SRR053682.sra to SRR053682.sra
Downloading ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR053/SRR053683/SRR053683.sra to SRR053683.sra
Downloading ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR053/SRR053684/SRR053684.sra to SRR053684.sra
Downloading ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR053/SRR053685/SRR053685.sra to SRR053685.sra
Downloading ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR053/SRR053686/SRR053686.sra to SRR053686.sra

urlgrabber

urlgrabber is a python packages that offers a lot more than the default python utilties for downloading files including resumiing stopped downloads or checking if the file is already downloaded

To install

pip install pycurl urlgrabber
from urlgrabber.grabber import URLGrabber
g = URLGrabber()
data = g.urlread(url)

This is nice because you can specify options when you create the grabber. For example, lets turn on simple reget mode so that if we have part of a file, we only need to fetch the rest.

from urlgrabber.grabber import URLGrabber
g = URLGrabber(reget='simple')
local_filename = g.urlgrab(url)

The available options are listed in the module documentation, and can usually be specified as a default at the grabber-level or as options to the method.

from urlgrabber.grabber import URLGrabber
g = URLGrabber(reget='simple')
local_filename = g.urlgrab(url, filename=None, reget=None)

requests

Requests is also a great library for dealing with other http operations as well

pip install requests

See their website which has a great documentation.
http://docs.python-requests.org/en/master/user/quickstart