
Monday, April 18, 2011

How to download pages of a site from Google Cache

Losing an important file on your computer is worrying. Some people lose their whole website to a server error and regret not having backed it up. Google, Yahoo, Bing and other major search engines cache the webpages they have already indexed. For a typical blog, Google generally indexes more pages than the other search engines. Here’s a trick with which you can recover your website from the Google cache. The process has limitations and can be very tedious for large websites, but it’s nonetheless very handy.

With this trick, you can get a text version of the pages on your website that have already been indexed by Google. It is best to use Mozilla Firefox, because the process relies on Firefox plugins.

In the Google search box, type the following and run the search:
site:yourdomain.com
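
If you would rather build the search address than type it, the query above can be turned into a URL. This is only a minimal sketch: the function names are made up for illustration, and it assumes that results are served from https://www.google.com/search and that the `start` parameter is the offset of the first result on a page.

```python
from urllib.parse import urlencode


def build_site_query(domain):
    """Return the search text from the step above, e.g. 'site:yourdomain.com'."""
    return "site:" + domain.strip()


def build_search_url(domain, start=0):
    """Build a results-page URL for the site: query.

    Assumptions (not from the original post): the search endpoint is
    https://www.google.com/search and 'start' is the zero-based offset
    of the first result shown on the page.
    """
    params = {"q": build_site_query(domain)}
    if start:
        params["start"] = start
    return "https://www.google.com/search?" + urlencode(params)


if __name__ == "__main__":
    # First results page, then the page starting at result 10.
    print(build_search_url("yourdomain.com"))
    print(build_search_url("yourdomain.com", start=10))
```

Opening the printed address in Firefox should land you on the same results page as typing the query by hand.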

This will return a list of all pages from your website that have been indexed by Google. Below each result, you can see a link called “Cached”. Click on any “Cached” link and you’ll get the copy of that page stored in the Google cache. You can save the cached file and repeat the process for every such link, but that gets tedious and takes a lot of time. We can make this task a lot quicker.

DownThemAll (Download Them All) is a useful download manager for Firefox. Install this plugin in Firefox. Now go to the search results page for site:yourdomain.com and right-click anywhere on the page. Click on DownThemAll in the context menu. This will open a dialogue box with options for downloading content from the page.
You can see a list of all links on the page, along with each link’s title. The links to cache pages are titled “Cached”. Selecting them manually can take a long time. To select the links to cache pages automatically, enter “cached” in the “Fast Filtering” box. Now if you scroll through the links, you’ll find the cache pages highlighted automatically. Click on the “Start” button and your download will begin. DownThemAll will prompt you to rename the files named “search.htm”. Accept it and the files will be renamed automatically.
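
To make the filtering step concrete, here is a rough sketch of the same idea in Python: collect every link on a saved results page and keep only those whose link text contains “cached”. It is not how the plugin itself works internally; the class and function names are hypothetical, and the sample HTML in the usage part is invented for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text pieces seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def filter_links(html, keyword="cached"):
    """Return the hrefs of links whose text contains the keyword (case-insensitive)."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    keyword = keyword.lower()
    return [href for href, text in parser.links if keyword in text.lower()]


if __name__ == "__main__":
    # Invented sample markup standing in for a saved results page.
    sample = (
        '<a href="http://yourdomain.com/post">My post</a> '
        '<a href="http://example.com/cache?page=1">Cached</a>'
    )
    print(filter_links(sample))
```

Matching on the link text rather than the address mirrors typing “cached” into the filter box, since the cache links are the ones titled “Cached”.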

While doing this, be careful not to go too fast. Move on to the second results page only after the cache pages from the first results page have finished downloading; otherwise Google may temporarily block your IP address from accessing the cache pages. If you want to download caches from two results pages at a time, use the Auto Page plugin for Firefox: scroll down to the second page and repeat the above process. But don’t try more than two pages at a time.
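
The same caution applies if you script the downloads instead of using the plugin: fetch one page at a time and pause between requests. The sketch below shows one way to do that. The 30-second default is purely an assumption, since no safe rate is documented here, and `download_slowly` and `fetch_url` are hypothetical helper names.

```python
import time
import urllib.request


def fetch_url(url):
    """Download one URL and return its raw bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def download_slowly(urls, delay_seconds=30.0, fetch=fetch_url, sleep=time.sleep):
    """Fetch URLs one after another, pausing between consecutive requests.

    delay_seconds is an assumed value, not a documented limit. The fetch
    and sleep arguments can be swapped out, which makes dry runs possible.
    """
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # wait before every request except the first
        pages[url] = fetch(url)
    return pages
```

Calling `download_slowly(cache_links)` with your list of cache links returns a dictionary that maps each link to the downloaded content, which you can then write to disk.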

You won’t get images in the cached pages, but they will contain all the text from the original pages. So with this trick, you can easily recover the pages of your website from the Google cache.
