There are several situations in which it may be useful for a single codebase to serve different robots.txt files for different domains or environments.
There are two methods of serving an alternate robots.txt file from Drupal: you can use the RobotsTxt module, or you can use .htaccess and Apache mod_rewrite rules.
The easiest approach is to use the RobotsTxt module. The module generates the robots.txt file and allows you to edit it for each site through the web user interface. You then have to delete or rename the robots.txt file in the root of the site so that the module's generated file is served instead.
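A minimal sketch of that rename step, using a throwaway directory to stand in for the Drupal docroot (the paths and the .bak name are arbitrary; adjust for your install):

```shell
# Stand-in for the Drupal docroot with its shipped robots.txt.
mkdir -p /tmp/drupal-docroot && cd /tmp/drupal-docroot
touch robots.txt

# Rename the static file so the web server no longer finds it and the
# request for /robots.txt falls through to Drupal (and the RobotsTxt module).
mv robots.txt robots.txt.bak
```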
An alternative method is to use a separate, static file with different robots.txt rules, and then use mod_rewrite to serve a different file for each domain. For example, the production site would serve the default robots.txt file, while the staging site might serve a more restrictive one. Ideally, you should also restrict access to your staging and development servers by securely password protecting those environments.
To serve the alternate file, add the following to the .htaccess file in the site's root directory:
# alternative robots.txt file on staging
RewriteCond %{HTTP_HOST} ^staging\.example\.com$ [NC]
RewriteRule ^robots\.txt$ robots_noindex.txt [L]
In the preceding example, the robots_noindex.txt file contains a basic, restrictive set of rules, such as:
User-agent: *
Disallow: /
Conflicts Between the RobotsTxt Module and Fast 404
The RobotsTxt module requires you to remove the robots.txt file from the Drupal docroot so that Drupal can generate and serve the response, instead of your web server serving a static file. Problems can occur if you are using Drupal's core Fast 404 functionality or the contrib Fast 404 module: when the static robots.txt file is missing, Fast 404 delivers a 404 (Not Found) response instead of passing the request to Drupal. You must allowlist this path for Fast 404 and RobotsTxt to work together.
For information on using Fast 404, see Using Fast 404 in Drupal.
Drupal core Fast 404
If you use the core Fast 404 functionality, you can exclude robots.txt by adding it to the 404_fast_paths_exclude variable in your settings.php file.
$conf['404_fast_paths_exclude'] = '/\/(?:styles)\/|robots\.txt/';
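The outer slashes in that string are PCRE delimiters; the pattern exempts any path containing /styles/ (image derivative URLs) or robots.txt from Fast 404 handling. A quick sanity check of the pattern body, assuming GNU grep with PCRE support (-P):

```shell
# Pattern from 404_fast_paths_exclude, without the PCRE delimiters.
pattern='/(?:styles)/|robots\.txt'

# These paths match, so they bypass Fast 404 and reach Drupal:
echo '/robots.txt' | grep -qP "$pattern" && echo 'excluded: /robots.txt'
echo '/sites/default/files/styles/thumb/a.jpg' | grep -qP "$pattern" \
  && echo 'excluded: styles path'

# A genuinely missing page does not match, so it still gets the fast 404:
echo '/no-such-page' | grep -qP "$pattern" || echo 'fast 404: /no-such-page'
```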
Fast 404 module
If you use the Fast 404 module, you can exclude robots.txt by adding it to the fast_404_string_whitelisting array in your settings.php file:
# Array of allowlisted URL fragment strings that conflict with fast_404.
$conf['fast_404_string_whitelisting'] = array('cdn/farfuture', '/advagg_', 'robots.txt');