The following script takes a file where each line is the URL of a file to download. You can download the script here: downloadFiles.py
downloadFiles.py
#!/usr/bin/env python
import argparse, os
from urllib.request import urlretrieve

def parse_downloadFiles_args():
    parser = argparse.ArgumentParser(description="Take in a file where the first column holds the url of a file to be downloaded, will overwrite current files if they exist")
    parser.add_argument('-f', '--file', type=str, required=True)
    return parser.parse_args()

def download_files(urlsFile):
    with open(urlsFile, "r") as f:
        for line in f:
            # the first column of each line is the URL; save it to its basename in the current directory
            lineSplit = line.split()
            print("Downloading {url} to {file}".format(url=lineSplit[0], file=os.path.basename(lineSplit[0])))
            urlretrieve(lineSplit[0], os.path.basename(lineSplit[0]))

if __name__ == "__main__":
    args = parse_downloadFiles_args()
    download_files(args.file)
cat files/download_sra_urls.txt
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR053/SRR053682/SRR053682.sra
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR053/SRR053683/SRR053683.sra
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR053/SRR053684/SRR053684.sra
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR053/SRR053685/SRR053685.sra
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR053/SRR053686/SRR053686.sra
./downloadFiles.py --file download_sra_urls.txt
Downloading ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR053/SRR053682/SRR053682.sra to SRR053682.sra
Downloading ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR053/SRR053683/SRR053683.sra to SRR053683.sra
Downloading ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR053/SRR053684/SRR053684.sra to SRR053684.sra
Downloading ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR053/SRR053685/SRR053685.sra to SRR053685.sra
Downloading ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR053/SRR053686/SRR053686.sra to SRR053686.sra
urlgrabber is a Python package that offers a lot more than the default Python utilities for downloading files, including resuming stopped downloads and checking whether a file has already been downloaded.
To install:
pip install pycurl urlgrabber
from urlgrabber.grabber import URLGrabber
g = URLGrabber()
data = g.urlread(url)
This is nice because you can specify options when you create the grabber. For example, let's turn on simple reget mode so that if we already have part of a file, we only need to fetch the rest.
from urlgrabber.grabber import URLGrabber
g = URLGrabber(reget='simple')
local_filename = g.urlgrab(url)
The available options are listed in the module documentation, and can usually be specified both as a default at the grabber level and as options to each method call.
from urlgrabber.grabber import URLGrabber
g = URLGrabber(reget='simple')
local_filename = g.urlgrab(url, filename=None, reget=None)
Requests is also a great library for dealing with other HTTP operations.
pip install requests
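As a minimal sketch of downloading a file with requests (the url below is just a placeholder), the response can be streamed so that large files are written in chunks rather than read into memory all at once.
import os
import requests

url = "http://example.com/data/somefile.sra"  # placeholder URL, for illustration only
local_filename = os.path.basename(url)

# stream=True fetches the body in chunks instead of loading it all into memory
with requests.get(url, stream=True) as r:
    r.raise_for_status()  # raise an error for 4xx/5xx responses
    with open(local_filename, "wb") as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)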
See their website, which has great documentation:
http://docs.python-requests.org/en/master/user/quickstart