Sometimes Solr search seems like an unfathomable black box; you plug in a few words and "magic" happens. Luckily, there are options to help you decipher what is going on behind the scenes, and give you control of your search experience.
Pre-requisite modules
Before you can debug, install and configure the modules for your Drupal version:
- Drupal 7: Install and enable both Devel and the Solr Devel module. You may also need to manually enable and place the "Solr Query Debugger" block in a region visible alongside your search results.
- Drupal 8:
- Install Devel module (and ideally, the included Kint module as well)
- Configure the Variables Dumper section at /admin/config/development/devel to use "Symfony var-dumper" or "Kint".
- The next step depends on your Search API Solr module version:
- Search API Solr module 8.x-1.x: (Acquia Search with Solr 3/4)
- Enabling Devel automatically adds a "Solr" subtab under the "Devel" tab on your entities.
- To enable the Query Debugger block, you will need to:
- Patch Search API Solr 8.x-1.x with the patch from Drupal.org issue #2924132 comment #9
- Clear Drupal caches and manually place the block named "Search API Solr: Query debugger" at /admin/structure/blocks
- Go to /admin/config/search/search-api and edit the appropriate Search API Server entries by enabling the "Debug Solr queries" checkbox.
- Search API Solr module 8.x-4.x: (Acquia Search with Solr 7)
- You must enable the search_api_solr_devel module (included with the Search API Solr module), both to see the "Devel > Solr" subtab on nodes and to dump the Solr Query Debugger information on-screen.
After this, you can debug in various ways. Read on for 2 example scenarios.
Scenario 1: Why do search results have weird ordering or show irrelevant items?
Example:
- You have a site for the local government of the City of Awesomeville.
- The mayor is complaining that searching for "Parking lots in the city" shows nonsensical results.
- Also: the actual page titled Public Parking Lots should be first on the list, but it's not!
How you can troubleshoot this:
See fields included in keyword matching along with keyword processing/stemming
- Run your search as an admin and look for the Solr Query debugger block
- Click through to open up the debugger: debug > parsedquery_toString and analyze it:
Example output for the search words Parking lots in the city :
+(( (content:park^40.0 | taxonomy_names:park^2.0 | label:park^5.0 | tos_name:park^3.0)~0.01 (content:lot^40.0 | taxonomy_names:lot^2.0 | label:lot^5.0 | tos_name:lot^3.0)~0.01 (content:citi^40.0 | taxonomy_names:citi^2.0 | label:citi^5.0 | tos_name:citi^3.0)~0.01 )~1)
The above tells us:
- Our keywords are being searched for within the Solr fields content, taxonomy_names, label and tos_name.
- The ^XX.X notation means boosting (telling Solr the relative importance of a match on this field)
- The pipe ( | ) operator means "or", thus we are looking for word matches in any of the fields.
- The words are stemmed and processed according to existing rules in the Solr index's schema.xml and other files:
- Parking was converted to the lowercase, stemmed word "park"
- city was converted to the stemmed word "citi"
Note that you control which fields to search across either in your View settings (Search API) or in your Bias settings for Apache Solr module. Additionally, modules can implement some hooks to add fields to search within at query time.
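When the parsed query is long, it can help to pull the field boosts out of it programmatically. The following is a minimal sketch, assuming you have copied the parsedquery_toString value out of the debugger as plain text; the function name boosts_by_term and the regular expression are illustrative and are not part of any Drupal or Solr API.

```python
import re
from collections import defaultdict

# Example value copied from debug > parsedquery_toString in this article.
PARSED_QUERY = (
    "+(( (content:park^40.0 | taxonomy_names:park^2.0 | label:park^5.0 | "
    "tos_name:park^3.0)~0.01 (content:lot^40.0 | taxonomy_names:lot^2.0 | "
    "label:lot^5.0 | tos_name:lot^3.0)~0.01 (content:citi^40.0 | "
    "taxonomy_names:citi^2.0 | label:citi^5.0 | tos_name:citi^3.0)~0.01 )~1)"
)

# Matches clauses such as "content:park^40.0" and captures field, term, boost.
CLAUSE = re.compile(r"(\w+):(\w+)\^([\d.]+)")


def boosts_by_term(parsed_query):
    """Group the field boosts found in a parsed query string by search term."""
    result = defaultdict(dict)
    for field, term, boost in CLAUSE.findall(parsed_query):
        result[term][field] = float(boost)
    return dict(result)


if __name__ == "__main__":
    for term, fields in boosts_by_term(PARSED_QUERY).items():
        print(term, fields)
```

Running this against the example lists the three processed terms (park, lot, citi) with the boost applied to each field, which makes it easier to spot a field that is boosted far more than intended.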
In this scenario, we might reach the conclusion that Parking is actually being indexed and searched for as park.
- That makes your content about "Recreational parks" show up when searching for parking.
- It turns out this also happens when searching for the name Parker! (Oh no!)
- So! To fix this problem, we probably need to protect the term parking by adding it to the protwords.txt file in Solr.
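To see why protecting a term helps, here is a small sketch under stated assumptions. The toy_stem function is a deliberately crude stand-in for a stemmer and is not the algorithm Solr uses; PROTWORDS_TXT only mimics the protwords.txt layout (one term per line, with "#" starting a comment line).

```python
# Illustrative protwords.txt content: one term per line, "#" starts a comment.
PROTWORDS_TXT = """\
# Terms that must not be stemmed
parking
"""


def load_protected(text):
    """Return the set of protected terms, skipping blank lines and comments."""
    terms = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            terms.add(line.lower())
    return terms


def toy_stem(word, protected=frozenset()):
    """Lowercase a word and strip one common suffix unless it is protected.

    This is a toy approximation, NOT the stemmer Solr actually applies.
    """
    word = word.lower()
    if word in protected:
        return word
    for suffix in ("ing", "er", "s"):
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    protected = load_protected(PROTWORDS_TXT)
    for word in ("Parking", "parks", "Parker"):
        print(word, "->", toy_stem(word), "| protected:", toy_stem(word, protected))
```

Without the protected list, all three words collapse to the same term, so they match each other. With parking protected, it stays distinct from parks and Parker. Because this processing also happens at index time, content generally needs to be re-indexed after such a change before results reflect it.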
Note that stemming and other processing is usually an incredibly helpful Solr feature which increases recall; it just turns out that the Solr defaults weren't the best option for this site and thus need tweaking. Some other things that happen behind the scenes include:
- CamelCase or mixed alphanumeric words are usually split up into separate terms. For example:
- HelloWorld would be split into the 2 words "hello" and "world", as well as indexed as the single "helloworld" term.
- ABC123XYZ would be split into 3 words: ABC, 123 and XYZ as well as the whole term.
- Common stopwords are dropped.
- 1 or 2-character terms are dropped.
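The behaviours listed above can be approximated in a few lines. This is only a rough sketch: the stopword list is a hypothetical short sample, the splitting rule is a simplification, and the real behaviour depends on the analyzers configured in your index's schema.

```python
import re

# Hypothetical, very short stopword list used only for this illustration.
STOPWORDS = {"in", "the", "a", "of"}


def split_parts(token):
    """Split on lower-to-upper case changes and on letter/digit boundaries."""
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", token)


def analyze(text):
    """Return the lowercase terms a simplified analyzer would keep."""
    terms = []
    for token in text.split():
        parts = split_parts(token)
        candidates = [part.lower() for part in parts]
        if len(parts) > 1:
            # Also keep the whole token as a single joined term.
            candidates.append("".join(parts).lower())
        for term in candidates:
            # Drop stopwords and terms of one or two characters.
            if len(term) > 2 and term not in STOPWORDS:
                terms.append(term)
    return terms


if __name__ == "__main__":
    for sample in ("HelloWorld", "ABC123XYZ", "Parking lots in the city"):
        print(sample, "->", analyze(sample))
```

Note that this sketch does not stem, so "Parking" stays "parking" here; it only illustrates splitting, stopword removal and the dropping of very short terms.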
Understanding how relevance ranking/ordering is calculated
The Solr Query Debugger block also shows us useful explain information from Solr. We can use it to understand how a score is assigned to each item shown in the search results.
This section includes:
- One element for each of the entities matched and shown on the current search results page.
- The calculations done by Solr for each matching term/word, for each field, each generating a number.
- The total number (score) assigned to this entity by Solr.
The explain contents include quite a lot of Solr technical terms and concepts which are beyond the scope of this document. However, you can learn more by looking at the Further reading section at the end of this document.
Some things you can do to affect the above scoring calculations are discussed in our article Obtaining relevant search results.
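As a rough illustration of how the per-field numbers combine, consider the parsed query from Scenario 1. Each group of fields joined by "|" is a disjunction-max clause, and the ~0.01 after it is its tie-breaker: the clause scores as the best field score plus the tie-breaker times the sum of the other field scores. The sketch below applies that rule to made-up numbers; the per-field scores are hypothetical, and real explain output includes further factors that this sketch leaves out.

```python
def dismax_term_score(field_scores, tie=0.01):
    """Best field score plus `tie` times the sum of the remaining field scores."""
    if not field_scores:
        return 0.0
    best = max(field_scores.values())
    return best + tie * (sum(field_scores.values()) - best)


def document_score(per_term_field_scores, tie=0.01):
    """Add up the disjunction-max score of every matching term."""
    return sum(
        dismax_term_score(scores, tie) for scores in per_term_field_scores.values()
    )


if __name__ == "__main__":
    # Hypothetical per-field scores for one document (boosts already applied).
    example = {
        "park": {"content": 2.0, "label": 1.0},
        "lot": {"content": 1.5},
    }
    print(round(document_score(example), 4))
```

With such a small tie-breaker, the best-matching field dominates each term's contribution, which is why a heavily boosted field such as content can outweigh a match in the title (label).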
Scenario 2: An item isn't showing anywhere in search results. Why?
For example:
- Your site sells starship parts, and you have a page that sells a "2nd generation Warp Core NX-001-A"
- Your customers are searching for "NX-001-A" but can't find this item at all!
We can do various things:
Checking if an item is indexed in the first place
- You can use the Devel > Solr Index tab that is shown on entities.
- This will show a table telling you if the item is Present in the index and if it is queued to be indexed (sometime in the future when the indexing job(s) process it)
This is how it looks for Drupal 7 Solr Devel:
You will see a similar table for Drupal 8 Search API Solr 8.x-1.2 or higher:
If the item is not in the index, try re-saving it without changes. This should now make it show as "Queued for Indexing" and you can now index it normally. If re-saving doesn't get it queued, it's possible that module settings exclude this item from being indexed: you should check your index settings and confirm if some entity types/bundles are excluded, or if you have a module that avoids indexing certain items.
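If you have direct access to a Solr core (for example, a local development instance), you could also check for the document with a plain select query. The sketch below only builds the request URL. The base URL is hypothetical, and the field names entity_type and entity_id are assumptions about how your module names its fields, so check the fields shown in the Devel tab and adjust accordingly.

```python
from urllib.parse import urlencode


def build_lookup_url(base_url, entity_type, entity_id):
    """Build a Solr select URL that looks for a single entity's document.

    The field names used in `q` are assumptions; adjust them to your schema.
    """
    params = {
        "q": f"entity_type:{entity_type} AND entity_id:{entity_id}",
        "fl": "*",     # return every stored field
        "rows": 1,     # at most one document is expected
        "wt": "json",  # ask for a JSON response
    }
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical local core URL; replace it with your own.
    print(build_lookup_url("http://localhost:8983/solr/drupal", "node", 748))
```

If the response reports zero documents found, the item is not in the index under those field values.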
See what data is present in Solr for an entity
Once we know an item does exist in the index, we can see what its data is like in the Solr index.
- Use the Devel > Solr Index tab shown on entities.
- To see all the Solr fields for this entity, use the table described above:
- Drupal 7: click on the Analyze Document option.
- Drupal 8: click on Solr and Local indexing data.
- Here is a screenshot of that for our example's product page:
You could, for instance, find that the word(s) you are searching for (which are in the entity) actually never made it to the Solr index, and therefore this item will never match. In most cases, you can control which fields make it to the index by either:
- defining a "Search Index" View mode for a bundle and adding your fields there
- specifying which fields to index within the Index settings (Search API module only)
- having modules implement certain hooks to add content/fields to the Solr index
Check if a certain query will match a specific item
Note: This isn't available for Drupal 8 Search API Solr yet.
- Use the Devel > Solr Index tab shown on entities
- Then use the Analyze Query option. This shows a query form where you can enter search keywords, and the form will respond telling you if that query will match the current entity or not.
- For example: to find out if a search for "NX-001-A" will match node/748, the below example shows a match:
Further reading
- Acquia Documentation: Obtaining relevant search results
- Lucene Tutorial: Scoring, Solr Scoring Tutorial and TFIDF.com explain concepts like TF, IDF, queryNorm, boosts, and more.