Augment the sitemap
A sitemap is a structured list of the pages that make up a website. Site crawlers such as Google use it to index the site's content. When you create a site with Content, a set of sitemap files is automatically created, published, and updated as you create and modify your site pages, without any additional effort on your part. However, if you want to augment these programmatically generated sitemap files, for example by adding metadata or a modified URL path, you can do so through the Acoustic Content developer tools or the Content APIs. This tutorial shows you how to augment your sitemap files by using both methods.
Before you begin
To get started, you need the Content developer tool known as wchtools, which provides a way to interact with your site from the command line. You can download and install wchtools from GitHub.
Understand the programmatically generated sitemap files
The automatically generated sitemap consists of a small set of files that live in the root path of your site.
- /robots.txt -- The entry point. Web crawlers read this file to determine the rules for crawling your site. The robots.txt file of a website hosted in Content contains pointers to one or more sitemap index files, sitemap files, or both.
- /sitemap_index.xml -- A sitemap index file that points to one or more sitemap files.
- /sitemap.xml -- A sitemap file. This is the file that actually lists the pages on your site and their URLs.
Note:
The file names shown above are generic. Because multiple websites can be hosted on the same server, your files have unique prefixes that associate them with your particular site, such as mycustomhost_robots.txt or default_sitemap.xml. For simplicity, this tutorial uses the generic names throughout.
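To see how the files chain together, the following minimal sketch reads the body of a robots.txt file and lists the sitemap or sitemap index URLs that it points to. The function name `sitemap_urls` is hypothetical and not part of Content or wchtools. The sketch assumes one entry per line in the `sitemap: <url>` form.

```python
def sitemap_urls(robots_txt: str) -> list[str]:
    """Return the URLs listed on 'sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop comments (everything after '#') and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        # Split at the first colon only, so the URL's own colons survive.
        key, sep, value = line.partition(":")
        # The field name is compared case-insensitively.
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls
```

Each returned URL can then be fetched and inspected to tell a sitemap index apart from a plain sitemap file.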
Create new sitemap files
You might need to update the sitemap files for your site. In Content, the sitemap and sitemap_index files are regenerated whenever you create a page on your site. Therefore, any changes that you make directly to the generated sitemap files can be lost at any time.
Saving your augmentations in a new, separate sitemap file and adding this new file to the list in the robots.txt file prevents such overwrites. Additions to the robots.txt file are always preserved.
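As an illustration, a separate sitemap file could be generated with a short script rather than written by hand. The sketch below uses Python's standard library and the sitemaps.org XML namespace. The function name `build_sitemap` and the example URLs are placeholders, and `lastmod` stands in for whatever additional metadata you want to add.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML (bytes) from (loc, lastmod) pairs; lastmod may be None."""
    # Serialize the sitemap namespace as the default one (no prefix).
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod:
            # Optional metadata: date of last modification (YYYY-MM-DD).
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Placeholder pages; replace them with your own site URLs.
    pages = [
        ("https://example.com/products", "2024-01-15"),
        ("https://example.com/about", None),
    ]
    with open("alternate_sitemap.xml", "wb") as handle:
        handle.write(build_sitemap(pages))
```

The output file name `alternate_sitemap.xml` is only an example. Choose a name that does not collide with the generated sitemap files, so that regeneration does not overwrite it.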
- You can use the existing sitemap as a template to create a new file.
- After you create a new file, you must add it as a separate entry in the robots.txt file. For example:
#My robots.txt
User-agent: *
Disallow:
sitemap: https://content-XX-N.content-cms.com/4030f492-e057-4e05-8876-ad2077c9fa78/default_sitemap_index.xml
sitemap: https://content-XX-N.content-cms.com/4030f492-e057-4e05-8876-ad2077c9fa78/alternate_sitemap.xml
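If you maintain several sites, the extra robots.txt entry can also be added by script. The following sketch appends a `sitemap:` line only when that URL is not already listed, so it is safe to run repeatedly. The function name `add_sitemap_entry` is hypothetical, and the sketch assumes that you apply it to the robots.txt file in your local working directory.

```python
def add_sitemap_entry(robots_txt: str, sitemap_url: str) -> str:
    """Append a 'sitemap:' line for sitemap_url unless it is already present."""
    for line in robots_txt.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip() == sitemap_url:
            return robots_txt  # already listed; leave the content unchanged
    # Make sure the new entry starts on its own line.
    if robots_txt and not robots_txt.endswith("\n"):
        robots_txt += "\n"
    return robots_txt + f"sitemap: {sitemap_url}\n"
```

Because the existing lines are returned untouched, the pointer to the generated sitemap index stays in place alongside the new entry.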
You can update the sitemap files or add new files by using the Acoustic Content developer tools.
Use the developer tools to update or add a new sitemap file
Using wchtools to update your sitemap file is simple.
- Use the wchtools init command to establish a connection to your site's API URL. For more information, see the Getting Started section in the wchtools readme.
- Run
wchtools pull -A --dir <path-to-working-directory>
to extract all of the site's artifacts to your local system.
When the pull command completes, you should see an assets subdirectory in your local working directory with the robots.txt, sitemap.xml, and sitemap_index.xml files.
Note:
These file names will probably have a unique prefix for your website. You can edit these files to suit your needs by either updating the existing sitemap.xml files or adding additional files.
- After you have modified the files to your satisfaction, run
wchtools push -A --dir <path-to-working-directory>
to push the modified files or the newly added files back to your server. Only files that have changed since you downloaded them will be pushed back. You can refer to wchtools-cli.log to verify the success of the push.
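Because the pulled file names carry a site-specific prefix, it can help to locate them by pattern before you edit them. The sketch below searches the assets subdirectory of the working directory for robots.txt and sitemap files. The function name `find_sitemap_assets` is hypothetical, and the recursive search is an assumption in case the files are nested below assets.

```python
from pathlib import Path


def find_sitemap_assets(working_dir):
    """Return robots.txt and sitemap XML files found under <working_dir>/assets."""
    assets = Path(working_dir) / "assets"
    matches = []
    for path in sorted(assets.rglob("*")):
        if not path.is_file():
            continue
        name = path.name.lower()
        # Matches prefixed names such as default_robots.txt or default_sitemap.xml.
        is_robots = name.endswith("robots.txt")
        is_sitemap = name.endswith(".xml") and "sitemap" in name
        if is_robots or is_sitemap:
            matches.append(path)
    return matches
```

Run it against the same directory that you pass to `--dir`, and edit or add files next to the ones it reports before you push.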
Expected outcome
Once these steps are complete and all updated assets have been published, you are done. Your new sitemap file continues to be served until you remove it, and you can update it whenever you want with the authoring APIs.