Improving SEO with a valid XML Sitemap

Google Webmaster Tools offers a Python script to help you build an XML Sitemap quickly. You don’t need to know XML or Python code to get your website indexed correctly. A valid XML sitemap also makes your website more search engine friendly.

Here I’ll show you how to modify, set up and run these files to create your XML Sitemap. I will be using Powweb as the host in this example. We need two files: config.xml and the Python script sitemap_gen.py, which creates our actual sitemap.xml. Please download the latest version from here: http://code.google.com/p/sitemap-generators/ Unzip it, right-click on example_config.xml and open it with Frontpage, Dreamweaver, or any XML editor.

Let’s only edit what we need to get it running for now. Locate the site node, which is the first instance of this code:

<site base_url="http://www.example.com/"
store_into="/var/www/docroot/sitemap.xml"
verbose="1"
sitemap_type="web"
>

Change the base_url to suit your website. The store_into="/path/file.xml" value can end in .xml or .xml.gz. Your host can tell you your DocumentRoot path. If you are hosted with Powweb, you will find your DocumentRoot path in OPS> Services> Overview. It should look something like this:

<site
base_url="http://www.streetsie.com/"
store_into="/home/users/web/b1234/pow.user/htdocs/sitemap.xml"
verbose="1"
sitemap_type="web"
>
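A missing or curly quote in these attributes is enough to stop the config from loading, so it can be worth checking the file before you upload it. Here is a minimal sketch of such a check in Python. The function name read_site_settings and the SAMPLE text are my own for illustration (they are not part of sitemap_gen.py), and I am assuming that the site node is the root element of the config file.

```python
import xml.etree.ElementTree as ET


def read_site_settings(config_text):
    """Parse config XML text and return the <site> attributes as a dict.

    Raises xml.etree.ElementTree.ParseError if the XML is not well-formed,
    for example when an attribute value is missing its quotes.
    """
    root = ET.fromstring(config_text)
    # Assumption: <site> is the root element of the config file.
    if root.tag != "site":
        raise ValueError("expected <site> as the root element, got <%s>" % root.tag)
    return dict(root.attrib)


# Hypothetical sample using the values from this article.
SAMPLE = """<site
  base_url="http://www.streetsie.com/"
  store_into="/home/users/web/b1234/pow.user/htdocs/sitemap.xml"
  verbose="1"
  sitemap_type="web"
></site>"""

if __name__ == "__main__":
    for name, value in sorted(read_site_settings(SAMPLE).items()):
        print(name, "=", value)
```

To check your own file, read config.xml into a string and pass it to the function. A ParseError usually points at the line and column of the bad quote.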

Okay, scroll down and locate the input nodes:

<!--
<url href="http://www.example.com/stats?q=name" />
<url
href="http://www.example.com/stats?q=age"
lastmod="2004-11-14T01:00:00-07:00"
changefreq="yearly"
priority="0.3"
/>
-->

Remove the

<!-- and the -->

to enable the code. Change the domain to yours, leaving /stats?q=name and /stats?q=age at the end. I update this site about once a week, so I set changefreq to weekly and gave it a priority of 0.5 as follows:

<url href="http://www.streetsie.com/stats?q=name" />
<url
href="http://www.streetsie.com/stats?q=age"
lastmod="2009-10-16T01:00:00-07:00"
changefreq="weekly"
priority="0.5"
/>
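The lastmod value is a date, a time and a timezone offset joined together, and it is easy to mistype by hand. As a small sketch, Python’s standard datetime module can produce the same shape. The helper name w3c_lastmod and the -07:00 offset are assumptions of mine, chosen to match the example above, so use your own timezone offset.

```python
from datetime import datetime, timedelta, timezone


def w3c_lastmod(moment):
    """Format a timezone-aware datetime as YYYY-MM-DDThh:mm:ss+hh:mm."""
    if moment.tzinfo is None:
        raise ValueError("lastmod needs a timezone-aware datetime")
    # Drop microseconds so the result matches the shape used in the config.
    return moment.replace(microsecond=0).isoformat()


# Assumed offset of -07:00, matching the example in this article.
offset = timezone(timedelta(hours=-7))

if __name__ == "__main__":
    print(w3c_lastmod(datetime(2009, 10, 16, 1, 0, 0, tzinfo=offset)))
    # prints 2009-10-16T01:00:00-07:00
```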

Now scroll down a little further and locate the directory nodes:

<!--
<directory path="/var/www/icons" url="http://www.example.com/images/" />
<directory
path="/var/www/docroot"
url="http://www.example.com/"
default_file="index.html"
remove_empty_directories="true"
/>
-->

Here again we remove the

<!-- and the -->

to enable the code. Insert your DocumentRoot path and domain name URL. For default_file, use whatever your default homepage is, e.g. index.php or home.html. I also change remove_empty_directories to false, like this:

<directory path="/home/users/web/b1234/pow.user/htdocs/" url="http://www.streetsie.com/" />
<directory
path="/home/users/web/b1234/pow.user/htdocs/"
url="http://www.streetsie.com/"
default_file="index.html"
remove_empty_directories="false"
/>
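If you are wondering what a directory node is for, the idea is that each file under path is paired with a URL under url. The sketch below is only a rough illustration of that mapping, not the actual logic inside sitemap_gen.py. The function name map_files_to_urls is hypothetical, and treating the default file as the URL of its folder is my assumption about the intent of default_file.

```python
import os


def map_files_to_urls(path, url, default_file="index.html"):
    """Walk `path` and return the URL each file would map to under `url`."""
    urls = []
    for folder, _subdirs, files in os.walk(path):
        rel_folder = os.path.relpath(folder, path)
        for name in files:
            rel = name if rel_folder == "." else os.path.join(rel_folder, name)
            # URLs always use forward slashes, whatever the local separator is.
            rel = rel.replace(os.sep, "/")
            # Assumption: the default file stands for its folder's URL.
            if name == default_file:
                rel = rel[: -len(name)]
            urls.append(url.rstrip("/") + "/" + rel)
    return sorted(urls)
```

Called with your DocumentRoot path and your domain, it returns a sorted list of URLs, which gives a feel for what the generated sitemap is likely to contain.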

That’s all the editing you need to do for now. You can play with the other nodes and options later. Save your file as config.xml and upload it along with the sitemap_gen.py Python script into your host’s root folder. For Powweb users that means inside your htdocs folder. At Powweb, now go to OPS> Services> Site Tools> Scheduled Jobs. Here we run the script and can set a cronjob to update the Sitemap. In “Add New Job” click the radio button for “Command to run” and enter this command line using your details:

python /home/users/web/b1234/pow.user/htdocs/sitemap_gen.py --config=/home/users/web/b1234/pow.user/htdocs/config.xml

This should all be on one line, with a single space after “python” and another between “sitemap_gen.py” and “--config=”, and no other spaces. Set the “Run this job” option to weekly and select minute 1 of 11pm on Sundays. Click “Schedule this Job”. Give it a minute to walk through the directory listing and you should start to see the output in the Command output window. Check the output for errors. A sitemap.xml file should now have been created in your htdocs folder.
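Once the file exists, a quick way to confirm it is well-formed and to see which pages made it in is to parse it and list the loc entries. This is a minimal sketch using only Python’s standard library. The function name list_sitemap_urls is my own, and it ignores the XML namespace so that it does not depend on which sitemap schema version the script wrote.

```python
import gzip
import xml.etree.ElementTree as ET


def list_sitemap_urls(sitemap_path):
    """Return the <loc> values found in a sitemap.xml or sitemap.xml.gz file."""
    # store_into may end in .xml or .xml.gz, so pick the matching opener.
    opener = gzip.open if sitemap_path.endswith(".gz") else open
    with opener(sitemap_path, "rb") as handle:
        root = ET.parse(handle).getroot()
    locations = []
    for element in root.iter():
        # Tags look like "{namespace}loc"; compare only the local name.
        if element.tag.rsplit("}", 1)[-1] == "loc" and element.text:
            locations.append(element.text.strip())
    return locations
```

If the file is not valid XML, ET.parse raises a ParseError, which is a useful early warning before a search engine rejects the sitemap.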

The Python script automatically informs Google of your new sitemap. The cronjob will fire the Google Sitemap Python script every week, updating your sitemap.xml automatically. That wasn’t so hard now, was it? See how clever you are?

Written by
Croc Hunter
streetsie.com

8 thoughts on “Improving SEO with a valid XML Sitemap”

  1. I have a carpool site and i was having such a hard time updating the search engines about newly posted rides. this script helps a lot. thanks a ton for such a detailed explanation
    divvymyride

  2. Most search engines don’t recognise the ? if you force them to parse links with a ? it may harm your page ranking so I advise against it. Just because they don’t appear in your sitemap does not mean they won’t appear in search results. Google builds the XML file to best suit the way they list your pages. Can’t get a much more Google friendly XML sitemap than one created by Google. Bing has taken a slice where Yahoo, Alta Vista etc couldn’t but I still find a good Google listing filters down to a good listing in all major search engines.

  3. Graham,
    Thanks for the detailed reply. I will set up the cron job with the Python script. By the way, I am sure you must be aware that Google has come up with a new version of the sitemap generator that requires root permission, setting up Apache etc., but unfortunately that does not help people with shared hosting.
    keep up the good work :)

  4. Thank you very much, this helps. It’s very hard when you start out building these. I had used a tool but Google denied my sitemap, so I’m going to retry.

  5. Wow, even in 2012 this script is still useful, thanks. I’ll keep it in hand in case to double check against the auto-generating sitemap tools we use.
