Holy Guacamole! This is just the 8th in the series of…
20 Secret Blogging Tips in 20 Days!
Check out the bottom of this post to see the rest.
This one might sound a little advanced for some but really it’s as easy as cutting up an avocado.
It’s setting up your robots.txt file.
No, we’re not going to start texting robots. The robots.txt file is one of those things that sits quietly in your site’s background. If it’s there and set up right, it’s a great step towards being found in Google. If it’s set up wrong, your site could be running under the radar and you probably don’t even know it.
Like the name suggests, it’s a text file. And as you already guessed, it’s for the robots. Those are the ‘Bots sent by search engines to your site. This file simply tells them which pages and posts are ok to spider and which ones aren’t.
It’s nothing secret. Anyone can read it, and all public sites should have one.
Here are the robots.txt files for Google, Facebook, and even the White House!
If you have a WordPress site, I bet you already have one too. It’s probably not as fancy as Google’s or Facebook’s, but maybe you don’t need it to be, eh?
Here’s the point to this post. Don’t blow this off as something too small or inconsequential. It’s very important stuff.
Let’s go check your robots.txt file and make sure it’s doing what it should be.
Are you ready? Ok, here we go!
Just type your site’s address into your browser, add /robots.txt right after the .com or .net (or .org), and hit enter.
It’s probably just 3 lines. Am I right?
We can decipher that. It’s telling the spiders to ignore the two folders “wp-admin” and “wp-includes” in your server directory. That’s ok because those two folders contain stuff that makes your site work, but there’s nothing in there that should be indexed by Google.
There’s a third folder that’s part of your WordPress install called “wp-content”. As you might guess, we don’t want to tell the spiders to ignore the ‘content’. That would be bad. If we want to be found in Google, that’s the folder we DO want them to go to.
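If you’d rather test it than take my word for it, here’s a minimal Python sketch. It assumes your file is the typical three-line WordPress default (that’s a guess about your site, so paste in your own text), and it uses example.com as a stand-in address. Python’s built-in urllib.robotparser reads the rules and answers whether a given bot may visit a given page. The helper name can_spider is just something I made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents: a typical three-line WordPress default.
# Paste in whatever your own /robots.txt shows instead.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
"""


def can_spider(robots_txt, bot, url):
    """Return True if `bot` may crawl `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot, url)


if __name__ == "__main__":
    # example.com is a placeholder; the paths are standard WordPress folders.
    for path in ("/", "/wp-content/uploads/photo.jpg", "/wp-admin/options.php"):
        url = "https://example.com" + path
        print(path, can_spider(ROBOTS_TXT, "Googlebot", url))
```

With the assumed file, the home page and anything under wp-content come back True, while anything under wp-admin or wp-includes comes back False.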
Now, IF your robots.txt file only blocks those two folders AND you are happy with the rest of your site being crawled by spiders, you have nothing to do. Go be happy!
However, IF you want to be a little bit pickier about who can spider what, you can either learn to adjust this file yourself (all you need is FTP access and Notepad) or you can hire someone to do it (like me!).
BONUS:
If you want to exclude your site from being archived by the Internet Archive’s Wayback Machine (I’m not sure why you would), here’s your text for that.
# Internet Archive Wayback Machine
User-agent: ia_archiver
Disallow: /
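Here’s a quick way to sanity-check a rule like that before you upload it. This is only a sketch: it tacks the Wayback lines onto an assumed WordPress default file, and the helper name blocked_bots, the post URL and the example.com address are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed starting point (a common WordPress default) plus the bonus rule.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/

# Internet Archive Wayback Machine
User-agent: ia_archiver
Disallow: /
"""


def blocked_bots(robots_txt, bots, url):
    """Return the bots from `bots` that are NOT allowed to crawl `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, url)]


if __name__ == "__main__":
    # Hypothetical post URL on a placeholder domain.
    page = "https://example.com/my-latest-post/"
    print(blocked_bots(ROBOTS_TXT, ["Googlebot", "Bingbot", "ia_archiver"], page))
    # prints ['ia_archiver']
```

The search engine bots still fall under the general rules, so they can read the post. Only the bot named in its own block is shut out of the whole site.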
If there’s another site whose Bot you don’t want crawling yours, all you need to know is the name of their Bot.
You were going to ask me how to find that out – right?
I’m a step ahead of you! Here’s a whole database of Bots! You’re welcome!
Oh and in case you were wondering…
“The Skynet Funding Bill is passed. The system goes on-line August 4th, 1997. Human decisions are removed from strategic defense. Skynet begins to learn at a geometric rate. It becomes self-aware at 2:14 a.m. Eastern time, August 29th. In a panic, they try to pull the plug. Skynet has become self aware.”
Ready for a killer WordPress website? Click HERE. Read the rest of the series…