How I think (the blog)

Internal 301 to homepage treated as 404 by Google

Back in May 2013, during a Webmaster Central hangout, John Mueller confirmed that Google treats internal 301s to the homepage as 404s. That should mean that if you 301 internal pages to the root, they won’t pass PageRank. However, this can still be misinterpreted, as many questions remain unanswered.
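To make "internal 301 to the homepage" concrete, here is a minimal sketch that flags such redirects in a crawl export. The function name and its three inputs (source URL, status code, Location header) are hypothetical, chosen for illustration only. It says nothing about how Google itself evaluates these redirects.

```python
from urllib.parse import urljoin, urlsplit


def is_internal_301_to_homepage(source_url, status_code, location):
    """Return True when a non-root page 301-redirects to the root of the same host.

    Hypothetical helper: the inputs are assumed to come from your own crawl data.
    """
    if status_code != 301 or not location:
        return False

    source = urlsplit(source_url)
    # Resolve relative Location values such as "/" against the source URL.
    target = urlsplit(urljoin(source_url, location))

    same_host = target.netloc.lower() == source.netloc.lower()
    target_is_root = target.path in ("", "/") and not target.query
    source_is_root = source.path in ("", "/") and not source.query
    return same_host and target_is_root and not source_is_root


# Example rows: (source URL, status code, Location header)
rows = [
    ("https://example.com/old-page", 301, "/"),
    ("https://example.com/old-page", 301, "https://example.com/new-page"),
    ("https://example.com/old-page", 302, "/"),
]
flags = [is_internal_301_to_homepage(*row) for row in rows]
print(flags)
```

Only the first row is flagged: the second points at a real replacement page and the third is not a 301.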

How to extract title & meta data using Gdocs, Xpath and ImportXml

I’m pretty sure everyone knows I have an unhealthy obsession with Google Docs and the wonderful things it can achieve. Paul from asked how to get common meta data from a webpage quickly using ImportXML, so here it is:
1) Get the meta description: //meta[@name='description']/@content
2) Get the title: //title
3) Get the keywords: //meta[@name='keywords']/@content
Okay, let’s go ahead
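The same three lookups can be tried outside a spreadsheet. This is a minimal sketch using Python's standard library, assuming well-formed markup (the sample page below is made up). ElementTree supports only a subset of XPath and cannot select attributes with /@content, so the attribute is read with .get() instead. Real-world HTML usually needs a more forgiving parser.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed sample page used only for illustration.
SAMPLE_PAGE = """<html>
  <head>
    <title>Example page</title>
    <meta name="description" content="A short example description." />
    <meta name="keywords" content="example, xpath, importxml" />
  </head>
  <body><p>Hello</p></body>
</html>"""


def extract_meta(markup):
    """Return the title, meta description and meta keywords of a page."""
    root = ET.fromstring(markup)

    def meta_content(name):
        # Equivalent in spirit to //meta[@name='...']/@content
        node = root.find(f".//meta[@name='{name}']")
        return node.get("content") if node is not None else None

    return {
        "title": root.findtext(".//title"),
        "description": meta_content("description"),
        "keywords": meta_content("keywords"),
    }


result = extract_meta(SAMPLE_PAGE)
print(result)
```

Missing tags come back as None rather than raising an error, which roughly mirrors an empty result in a spreadsheet cell.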

How to download all attachments from a gmail thread

You may have several attachments within a Gmail thread, but downloading them one at a time is too time-consuming. There’s a Forward All option in the top nav menu that lets you forward the thread back to yourself, so you can conveniently download all the attachments in the thread. See

How to detect languages of webpages in bulk using Google docs

There’s a neat little function in Google Docs spreadsheets that detects language. It’s =DetectLanguage(), yep, that simple. The catch is that it only detects the language of the text in a cell, so if we use it alongside =ImportXml to extract the <title> of a webpage, we can detect the language of the page itself. Your formula would then become =DetectLanguage(ImportXml("","//title")), with the page URL as the first argument of ImportXml. Heads
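For the bulk case, one option is to generate a formula per URL and paste the results into a column. The sketch below is only an illustration: the helper name and the example URLs are hypothetical, and it assumes double quotes inside a spreadsheet string are escaped by doubling them.

```python
def detect_language_formula(url, xpath="//title"):
    """Build a =DetectLanguage(ImportXml(...)) formula string for one URL."""
    # Assumption: a double quote inside a spreadsheet string literal is written as "".
    safe_url = url.replace('"', '""')
    safe_xpath = xpath.replace('"', '""')
    return f'=DetectLanguage(ImportXml("{safe_url}","{safe_xpath}"))'


# Placeholder URLs; swap in your own list.
urls = ["https://example.com/", "https://example.org/fr/"]
formulas = [detect_language_formula(u) for u in urls]
for formula in formulas:
    print(formula)
```

Inside the spreadsheet, referencing a cell that holds the URL achieves the same thing without generating text, so this is mainly handy when the URL list lives elsewhere.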

Bulk ImportXml tool & source (Google docs spreadsheets)

A few of you have been requesting a way to bypass the 50 ImportXml limit in Google Docs, so I’ve decided to release something publicly. Click here to view the spreadsheet. Just make sure to sign in, then make a copy, then press the run button once to authorize the script. If the script doesn’t

Will the canonical tag remove a page from the index?

If I use rel=canonical on a page that points to this page, can I find the canonicalized page in the index? Hypothesis: Setting a canonical target from page A to page B will remove page A from the index. Result: Nope, it’s not visible in the index – at least I can’t find it with
