Sometimes you will need to understand what traffic or features on your site result in Solr queries. One example is when your Acquia Search usage is too high, similar to situations described in https://docs.acquia.com/acquia-search/activate/view-usage/staying-under-limits/
For this, you may need to log and analyze all the Solr queries coming from your Drupal site in order to determine their frequency, originating IPs, User Agents, etc.
Enabling Solr request logging in Search API Solr (D7 only)
If you are using the Search API Solr module, you can enable logging of the Drupal-to-Solr requests into the Drupal watchdog log by following these steps:
- Go to the Search API Admin page. Example URL: https://example.com/admin/config/search/search_api/
- On the "Server" entry you want to troubleshoot, click Edit
- Expand the "Advanced" fieldset and look for the Log Search Requests option, enable it and save.
Once you do this, you will see your search requests come up in your watchdog log.
NOTE: We do not recommend leaving this enabled permanently in a production site, as it may cause performance or storage issues depending of the rate of logging.
If you additionally have the syslog Drupal module enabled, you can then analyze the log entries either through your browser in real time, or SSH into your webserver for longer-term or real-time analysis.
Viewing via Logstream
You can read about live-streaming logfiles on your browser by reading: https://docs.acquia.com/acquia-cloud/monitor/logstream/
Once you're streaming logs, you can see only the Solr requests coming from your application by choosing to stream the drupal-watchdog.log file, and add a filter to match the string "search_api_solr".
Viewing via an SSH connection
If you SSH to your webserver, this command will get you to the folder that keeps your `drupal-watchdog.log` file:
# cd to the logfiles folder
cd /var/log/sites/${AH_SITE_NAME}/logs/`hostname -s`
For example, you can live-tail only the Solr queries with the following command:
#
# cd to the logfiles folder and then live-tail the drupal-watchdog.log looking for a certain pattern.
# Run this from the webserver's shell.
#
cd /var/log/sites/${AH_SITE_NAME}/logs/`hostname -s`; tail -f drupal-watchdog.log |grep "search_api_solr.*Query:"
Understanding the Search API Solr logs
Here's some example output that you may see after enabling this extra logging on the Search API Solr module:
{ ... snip ... }
Feb 6 14:21:55 srv-12345 example[14610]: https://example.com|1580998915|search_api_solr|111.111.111.111|https://example.com/search?query=cake||0||Query: fl=item_id%2Cscore&fq=index_id%3A%22recipes_index%22&fq=hash%3Aabcdefg&start=0&rows=10&q.op=AND&q.alt=%2A%3A%2A&spellcheck=true&spellcheck.onlyMorePopular=true&spellcheck.count=1&hl=false&hl.fl=spell&hl.simple.pre=%5BHIGHLIGHT%5D&hl.simple.post=%5B%2FHIGHLIGHT%5D&hl.snippets=3&hl.fragsize=70&hl.mergeContiguous=true&wt=json&json.nl=map request_id="v-06b3fade-48ec-11ea-aeb0-22000a098256"
{ ... snip ... }
The above contains a lot of information including:
- The originating IP address: "111.111.111.111"
- Module that logged this information: "search_api_solr")
- The Drupal site's URL at the moment this log line was written: "https://example.com/search?query=cake"
- The query sent from Drupal to Solr during generation of the above page: "Query: fl=item_id%2Cscore&fq=index_id%3A%22recipes_index ... { ... snip ... }"
In this example, we can determine that there was an HTTP request to the Drupal site at https://example.com/search where there was a keyword search for "cake".
This generated a request to Solr which includes many arguments, like fl=... (fields), fq=... (filter query), spellcheck=on (enable spellchecking), etc. The actual meaning of all these arguments is beyond the scope of this article, but you can consult Solr Documentation to find out more. Needless to say, this request has all the parameters and arguments that tell Solr what data to match, what data to return back, and any other actions to take like use the spellchecker, return facets, provide snippets, etc. The Solr query is generated by the Search API Solr module depending on things like View arguments and settings, any modules that alter/hook into the generation of this request, etc.
Finding out more about the Web request
Note that each line in drupal-requests.log includes a section with a request_id=XXXX value. This is a unique value that identifies a single Web request across all logs in Acquia Cloud. You can read more about it at https://docs.acquia.com/acquia-cloud/develop/drupal/requestid/
You can use the request_id values to figure out which User Agent caused any of the above queries. For the example log line above, we can look for the request_id string v-06b3fade-48ec-11ea-aeb0-22000a098256 in access.log, which includes the User Agent:
#
# Get all relevant log entries for the single incoming web request identified by `request_id`
# Run this from the logs folder.
#
$ grep 'v-06b3fade-48ec-11ea-aeb0-22000a098256' *.log
{ ... snip ... }
111.111.111.111 - - [06/Feb/2020:14:21:55 +0000] "GET /search?query=cake HTTP/1.1" 200 19892 "-" "Mozilla/5.0 (compatible; adscanner/)/1.0 (Mozilla/5.0 (compatible; seoscanners.net/1.0; +spider@seoscanners.net); http://seoscanners.net; spider@seoscanners.net)" vhost=example.devcloud.acquia-sites.com host=example.com hosting_site=example pid=14931 request_time=1009467 forwarded_for="111.111.111.111, 10.11.12.13" request_id="v-06b3fade-48ec-11ea-aeb0-22000a098256" location="-"
{ ... snip ... you may also see more entries for the same incoming request }
For this particular request, it turns out the User Agent is:
Mozilla/5.0 (compatible; adscanner/)/1.0 (Mozilla/5.0 (compatible; seoscanners.net/1.0; +spider@seoscanners.net); http://seoscanners.net; spider@seoscanners.net)
which looks like a robot or spider.
Because customers only have a limited allotment of Solr queries per month, you may want to block robots from consuming your Solr search entitlements by blocking them. You can read this Knowledgebase article on blocking certain known bad robots to find out how.
You can learn more about logging in Acquia Cloud here: https://docs.acquia.com/acquia-cloud/monitor/logs/