This one might sound a little advanced for some, but really it’s as easy as cutting up an avocado.
It’s setting up your robots.txt file.
No, we’re not going to start texting robots. The robots.txt file is one of those things that sits quietly in the background of your site. If it’s there and set up right, it’s a great step towards being found in Google. If it’s set up wrong, your site could be flying under the radar and you probably wouldn’t even know it.
Like the name suggests, it’s a text file. And, as you already guessed, it’s for the robots: the ‘bots that search engines send to your site. This file simply tells them which pages and posts are OK to spider and which ones aren’t.
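It’s nothing secret: robots.txt files are public, and every public site should have one.
Here are the robots.txt files for Google (https://www.google.com/robots.txt), Facebook (https://www.facebook.com/robots.txt), and even the White House (https://www.whitehouse.gov/robots.txt)!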
If you have a WordPress site, I bet you already have one too. It’s probably not as fancy as Google’s or Facebook’s, but maybe you don’t need it to be, eh?
Here’s the point of this post: don’t blow this off as something too small or inconsequential. It’s important stuff.
Let’s go check your robots.txt file and make sure it’s doing what it should be.
Just type your site’s address into your browser, add /robots.txt right after the .com or .net (or .org), and hit Enter.
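Why does it have to be the root? Bots only ever look for the file in one spot, right at the top of your domain, no matter which page they landed on. Here’s a small Python sketch of that rule. It’s just an illustration: example.com stands in for your own address, and robots_url is a name I made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the site that hosts page_url.

    Bots only look for robots.txt at the root of the host, so the page's
    path, query string and fragment are thrown away.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # example.com is a placeholder; swap in your own domain.
    print(robots_url("https://example.com/2014/05/my-first-post/?ref=home"))
    # → https://example.com/robots.txt
```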
It’s probably just 3 lines. Am I right? Something like this:
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
We can decipher that. It’s telling the spiders to ignore two folders in your server directory: “wp-admin” and “wp-includes”. That’s OK, because those two folders hold the stuff that makes your site work, and there’s nothing in there that should be indexed by Google.
There’s a third folder in your WordPress install called “wp-content”. As you might guess, we don’t want to tell the spiders to ignore the ‘content’; that would be bad. It holds your uploaded images and the theme files that make your pages look right. If we want to be found in Google, that’s a folder we DO want them to get into.
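Want to double-check what those lines actually block? Python comes with a little robots.txt reader built in (urllib.robotparser). This is a minimal sketch, assuming your file is the three-line WordPress default described above; example.com and the sample URLs are placeholders, and check is a made-up helper name.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents: the three-line WordPress default. Paste in your own file's lines.
WP_ROBOTS = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
"""


def check(rules: str, urls: list[str], agent: str = "Googlebot") -> dict[str, bool]:
    """Map each URL to True (allowed) or False (blocked) for the given bot."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return {url: parser.can_fetch(agent, url) for url in urls}


if __name__ == "__main__":
    results = check(WP_ROBOTS, [
        "https://example.com/wp-admin/options.php",
        "https://example.com/wp-includes/js/jquery/jquery.js",
        "https://example.com/wp-content/uploads/avocado.jpg",
        "https://example.com/my-first-post/",
    ])
    for url, allowed in results.items():
        print("allowed" if allowed else "blocked", url)
```

If the wp-content and post URLs come back “allowed” and the other two “blocked”, your file is doing what the text above says it should.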
Now, IF your robots.txt file looks like that AND you’re happy with your entire site being crawled by spiders, you have nothing to do. Go be happy!
However, IF you want to be a little pickier about who can spider what, you can either learn to adjust this file yourself (all you need is FTP access and a plain-text editor like Notepad) or hire someone to do it (like me!).
If you want to keep your site out of the Internet Archive’s Wayback Machine (I’m not sure why you would), here’s the text for that:
# Internet Archive Wayback Machine
User-agent: ia_archiver
Disallow: /
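Here’s how a bot would read that rule, again using Python’s built-in urllib.robotparser. It’s a sketch that assumes you’ve added the Wayback Machine lines underneath the WordPress defaults; example.com is a placeholder. Keep in mind robots.txt is a polite request, not a lock: it only works on bots that choose to follow it.

```python
from urllib.robotparser import RobotFileParser

# Assumed file: the WordPress defaults plus the Wayback Machine block.
RULES = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/

# Internet Archive Wayback Machine
User-agent: ia_archiver
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Compare how two different bots are treated by the same file.
for agent in ("Googlebot", "ia_archiver"):
    for url in ("https://example.com/my-first-post/", "https://example.com/wp-admin/"):
        verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
        print(f"{agent:12} {verdict:8} {url}")
```

Googlebot can still read your posts, while ia_archiver gets turned away from everything, because a bot follows the group that names it instead of the catch-all “*” group.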
If there’s another site whose Bot you don’t want poking around, all you need to know is the name of that Bot.
You were going to ask me how to find that out, right?
I’m a step ahead of you! Here’s a whole database of Bots! You’re welcome!
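Once you have the names, writing the rules is the easy part: each Bot gets its own User-agent line followed by Disallow: /. Here’s a tiny Python sketch that writes those lines for you. block_bots and “ExampleBot” are made-up names for illustration, so swap in the real Bot names you looked up.

```python
def block_bots(bot_names: list[str]) -> str:
    """Build robots.txt groups that shut each named bot out of the whole site."""
    groups = [f"User-agent: {name}\nDisallow: /" for name in bot_names]
    # A blank line between groups keeps each bot's rules separate.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    # Hypothetical list; replace with the actual bot names you want to block.
    print(block_bots(["ia_archiver", "ExampleBot"]), end="")
```

Paste what it prints into your robots.txt (over FTP, like we talked about) below whatever’s already there.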
