Jarrod Trainque

8 Jan

Spidering an entire site with DeepVacuum

I recently discovered a cool little program for OS X called DeepVacuum that lets you download the contents of an entire website.

Major features:

  • Downloads entire sites or pages with all required content.
  • Can download selected media (pictures, music, clips) from pages.
  • Can download links from a selected text/html file.
  • Document architecture allows simultaneous downloads from many sites.
  • Wget installer is included.
  • Allows powerful fine-tuning of download settings.

You essentially point this application at a given URL, wait a few minutes, and end up with a complete archive of a website.
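To make the idea of "spidering" a bit more concrete, here is a minimal Python sketch of the basic loop a site downloader runs. It is not DeepVacuum's code. The names `same_site_links` and `crawl` are hypothetical, and so is the `fetch` callback, which is assumed to return a page's HTML text, or `None` for non-HTML files. Passing it in keeps the sketch free of any real network code.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects every URL a page links to or embeds (a, link, img, script, ...)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # href covers <a> and <link>; src covers <img>, <script>, <iframe>, etc.
            if name in ("href", "src") and value:
                # Resolve relative links and drop "#fragment" parts.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.urls.append(absolute)


def same_site_links(html, base_url):
    """Return the unique absolute URLs in `html` that stay on base_url's host."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    host = urlparse(base_url).netloc
    links = []
    for url in collector.urls:
        # Off-site links (and mailto:, which has no host) are skipped.
        if urlparse(url).netloc == host and url not in links:
            links.append(url)
    return links


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl of one site; returns {url: html_or_None}."""
    queue = deque([start_url])
    visited = {}
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        html = fetch(url)  # hypothetical callback: HTML text, or None
        visited[url] = html
        if html:
            queue.extend(u for u in same_site_links(html, url) if u not in visited)
    return visited
```

Starting from one URL, the crawler gathers every same-host link and embedded file, queues the ones it has not seen yet, and stops when the queue runs dry or `max_pages` is reached. A real tool would also save each response to disk along the way.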

Also cool is the ability to override a site's robots.txt file, which (usually) is used to tell web spiders which files they should not crawl.
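A well-behaved spider checks each URL against the rules in robots.txt before fetching it, so "overriding" the file simply means skipping that check. Below is a minimal sketch using Python's standard `urllib.robotparser` module; the `make_robots_checker` helper and its `ignore_robots` flag are hypothetical and are not DeepVacuum settings.

```python
from urllib.robotparser import RobotFileParser


def make_robots_checker(robots_txt, user_agent="*", ignore_robots=False):
    """Return a function url -> bool saying whether the spider may fetch url."""
    parser = RobotFileParser()
    # Parse robots.txt text we already have, instead of downloading it with read().
    parser.parse(robots_txt.splitlines())

    def allowed(url):
        # Overriding robots.txt just means never consulting the rules.
        if ignore_robots:
            return True
        return parser.can_fetch(user_agent, url)

    return allowed
```

For example, with a robots.txt containing `User-agent: *` and `Disallow: /private/`, the default checker refuses `http://example.com/private/notes.html`, while one built with `ignore_robots=True` allows it.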

I’m using it to download and save certain sites for offline reference, but other possible uses include saving archived versions of personal sites and turning database-driven (dynamic) sites into static HTML sites.
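Turning a dynamic site into static pages largely comes down to giving every URL, including ones with query strings, its own file name on disk. The sketch below uses a made-up naming scheme: `local_path_for` and `STATIC_EXTENSIONS` are hypothetical, and this is not necessarily how DeepVacuum or wget name files. A complete mirror would also rewrite the links inside each saved page so they point at these local files.

```python
import posixpath
from urllib.parse import quote, urlparse

# Extensions assumed (for this sketch) to be static files that can be saved unchanged.
STATIC_EXTENSIONS = {".html", ".htm", ".css", ".js", ".png", ".jpg", ".jpeg", ".gif"}


def local_path_for(url):
    """Map a URL to a relative file path for an offline, static copy of a site."""
    parts = urlparse(url)
    path = parts.path or "/"
    if path.endswith("/"):
        # Directory-style URLs become index.html inside that directory.
        path += "index.html"
    if parts.query:
        # Fold the query string into the file name so each dynamic page
        # (e.g. article.php?id=7) is saved as its own static file.
        path += "_" + quote(parts.query, safe="")
    if posixpath.splitext(path)[1].lower() not in STATIC_EXTENSIONS:
        # Output of server-side scripts (.php, .asp, ...) is saved as HTML.
        path += ".html"
    return parts.netloc + path
```

With this scheme, `http://example.com/article.php?id=7` is saved as `example.com/article.php_id%3D7.html`, while `http://example.com/img/logo.png` keeps its original name.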

You can get the application here. There’s a $12 registration fee to support further development, but the unregistered version does not appear to be limited in functionality.
