Spidering an entire site with DeepVacuum
I recently discovered a cool little program for OSX called DeepVacuum that allows you to download the contents of an entire website.
Major features:
- Downloads entire sites or pages with all required content.
- Can download selected media (pictures, music, clips) from pages.
- Can download links from a selected text/html file.
- Document architecture allows simultaneous downloads from many sites.
- Wget installer is included.
- Allows powerful fine tuning of download settings.
You essentially point this application at a given URL, wait a few minutes, and end up with a complete archive of a website.
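Since the feature list mentions a bundled Wget installer, my assumption is that DeepVacuum is driving wget under the hood. If that's right, the "point it at a URL and wait" workflow corresponds roughly to a wget mirror run. Here is a minimal sketch that only builds such a command line. The helper name `build_wget_mirror_command`, the `site-archive` directory, and the example URL are all made up for illustration, and I haven't checked which options DeepVacuum actually passes.

```python
import shlex


def build_wget_mirror_command(url, output_dir="site-archive", accept_extensions=None):
    """Build a wget argument list for mirroring a site (hypothetical helper).

    This only assembles the command; it does not run wget.
    """
    command = [
        "wget",
        "--mirror",            # recursive download with timestamping
        "--page-requisites",   # also fetch images, CSS, etc. needed to display pages
        "--convert-links",     # rewrite links so the local copy works offline
        "--no-parent",         # never climb above the starting directory
        "--directory-prefix=" + output_dir,
    ]
    if accept_extensions:
        # Restrict the download to selected file types, e.g. media files only.
        command.append("--accept=" + ",".join(accept_extensions))
    command.append(url)
    return command


if __name__ == "__main__":
    # Print a copy-pasteable command line for a whole-site mirror.
    print(shlex.join(build_wget_mirror_command("https://example.com/")))
```

Passing something like `accept_extensions=["jpg", "png"]` would approximate the "selected media" feature, again assuming wget is what does the work.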
Also cool is the ability to override robots.txt, the file that site owners (usually) use to tell web spiders not to crawl certain files or directories.
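To make it concrete what overriding robots.txt means, here is a small sketch using Python's standard `urllib.robotparser` to show the check a well-behaved spider performs before fetching a page. The sample rules, the `ExampleSpider` user agent, and the URLs are invented for illustration. Ignoring robots.txt simply means skipping this check.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt that asks all spiders to stay out of /private/.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/index.html",
                "https://example.com/private/notes.html"):
        print(url, is_allowed(SAMPLE_ROBOTS_TXT, "ExampleSpider", url))
```

In wget itself, the switch that turns this check off is `-e robots=off`. Whether DeepVacuum uses exactly that is a guess on my part.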
I’m using it to download and save certain sites for offline reference, but other possible uses include saving archived versions of personal sites and turning database-driven (dynamic) sites into static HTML sites.
You can get the application here. There’s a $12 registration fee to support further development, but it appears that functionality isn’t limited if you do not register.