Wget tricks and tips

Wget and curl make a wonderful pair on Linux. Here are a few glimpses of what Wget can do.

Download a single file/page:

wget http://required_site/file

Download a site recursively with the -r option (by default wget follows links up to 5 levels deep):

wget -r http://required_site/

Download only certain file types with the -A (accept list) option.

Say, to download only PDF and MP3 files, use:

wget -r -A pdf,mp3 http://required_site/

To follow links to other hosts as well, add the -H (span hosts) option:

wget -r -H -A pdf,mp3 http://required_site/

To limit which domains wget may span to, use the -D option with a comma-separated list of domains:

wget -r -H -A pdf,mp3 -D files.site.com http://required_site/

The number of levels to recurse when using -r can be set with the -l option:

wget -r -l 2 http://required_site/

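Download all GIF and JPEG images linked from the start page (one level deep), ignoring robots.txt and never ascending to the parent directory:

wget -e robots=off -r -l1 --no-parent -A .gif,.jpg http://required_site/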
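The -e option executes a command as if it were part of .wgetrc, so robots=off only applies to that one run. To make it the default, the same setting can go in a startup file. A minimal sketch, assuming a hypothetical file ./my-wgetrc selected through the WGETRC environment variable (without it, wget reads ~/.wgetrc):

```shell
#!/bin/sh
# Hypothetical startup file name; wget normally reads ~/.wgetrc
RC=./my-wgetrc

# Same setting that "-e robots=off" applies for a single run
printf 'robots = off\n' > "$RC"

# Then point wget at it (uncomment to actually download):
# WGETRC="$RC" wget -r -l1 --no-parent -A .gif,.jpg http://required_site/
```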


Still more... (trickier ones)

Using wget to download content protected by referer and cookies:
1. Fetch the base URL and save its cookies to a file.
2. Fetch the protected content using the stored cookies, with the base URL as the referer.

wget --cookies=on --keep-session-cookies --save-cookies=cookie.txt http://first_page

wget --referer=http://first_page --cookies=on --load-cookies=cookie.txt --keep-session-cookies --save-cookies=cookie.txt http://second_page
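The two steps can be wrapped in a small shell function so the cookie jar and referer always match. This is only a sketch: the function name fetch_with_cookies and the default jar name are hypothetical, and it uses exactly the flags from the two commands above.

```shell
#!/bin/sh
# Hypothetical helper: fetch a base page, then a protected page,
# sharing one cookie jar and sending the base page as the Referer.
# Usage: fetch_with_cookies FIRST_URL SECOND_URL [COOKIE_JAR]
fetch_with_cookies() {
    first_url=$1
    second_url=$2
    jar=${3:-cookie.txt}   # default jar name (assumption)

    # Step 1: visit the base page and store its cookies, including session cookies
    wget --cookies=on --keep-session-cookies --save-cookies="$jar" "$first_url" || return 1

    # Step 2: request the protected page with the stored cookies and the referer
    wget --referer="$first_url" --cookies=on --load-cookies="$jar" \
         --keep-session-cookies --save-cookies="$jar" "$second_url"
}

# Example (placeholders from the text above):
# fetch_with_cookies http://first_page http://second_page
```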

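Mirror a website to a static copy for local browsing (-w 2 waits 2 seconds between requests, -p fetches page requisites such as images and stylesheets, --html-extension saves HTML pages with an .html suffix and is called --adjust-extension in newer releases, --convert-links rewrites links for local viewing, and -P sets the local download directory):

wget --mirror -w 2 -p --html-extension --convert-links -P local_dir http://required_site/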

Run wget in the background, retrying up to 45 times and writing its messages to a file named log:

wget -t 45 -o log http://required_site &


Wget for FTP (wget handles the login for you, anonymously unless you supply credentials):

wget ftp://required_site

Read the list of URLs to download from a file (one URL per line):

wget -i file
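When the URLs follow a pattern, the list can be generated in the shell first. A minimal sketch, assuming hypothetical numbered pages page1.html to page5.html on required_site and a list file named urls.txt:

```shell
#!/bin/sh
# Build urls.txt with one URL per line (page names are hypothetical)
: > urls.txt
for i in 1 2 3 4 5; do
    printf 'http://required_site/page%s.html\n' "$i" >> urls.txt
done

# Then hand the list to wget (uncomment to actually download):
# wget -i urls.txt
```

If the file name is given as -, wget reads the URLs from standard input instead, so a generator can also be piped straight into wget -i -.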









