Thoughts about technology, business and all that's life.

This blog has moved to http://shal.in.

Tuesday, April 07, 2009

Tagging and Excluding Filters

Multi-select faceting is a new feature in the soon-to-be-released Solr 1.4. It introduces support for tagging and excluding filters, which lets us request facets on a superset of the results from Solr.

The Problem

Out-of-the-box support for faceted search is a very compelling enhancement that Solr provides on top of Lucene. If you are not familiar with faceted search, I highly recommend reading Yonik's excellent article on the subject at Lucid Imagination's website.

Faceting on a field returns a list of (term, document-count) pairs for that field. These counts are always calculated on the current result set, so whatever the current results are, the facets stay in sync with them. This is both an advantage and a disadvantage.
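To make the "current result set" point concrete, here is a toy, in-memory sketch of what field faceting computes. It is not how Solr implements faceting internally; the documents, field names and the `facet_counts` helper are made up for illustration.

```python
from collections import Counter

# Toy documents standing in for an index (hypothetical data).
DOCS = [
    {"make": "chevrolet", "model": "Impala", "location": "Dallas"},
    {"make": "chevrolet", "model": "Impala", "location": "Austin"},
    {"make": "chevrolet", "model": "Malibu", "location": "Dallas"},
    {"make": "chevrolet", "model": "Tahoe", "location": "Houston"},
    {"make": "ford", "model": "Focus", "location": "Dallas"},
]

def facet_counts(results, field):
    """Return (term, document-count) pairs for `field`, computed only over
    `results`, sorted by descending count and then by term."""
    counts = Counter(doc[field] for doc in results)
    return sorted(counts.items(), key=lambda p: (-p[1], p[0]))

# The "current result set" for q=chevrolet.
results = [d for d in DOCS if d["make"] == "chevrolet"]
print(facet_counts(results, "model"))
print(facet_counts(results, "location"))
```

Because the counts are derived from `results` alone, narrowing the results (for example with a filter query) narrows the facets with them; the Ford "Focus" never shows up here.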

Let us take the search UI for finding used vehicles on the Vast.com website as an example. It has facets on the seller's location and the vehicle's model. Let us assume that the Solr query behind that page looks like the following:
q=chevrolet&facet=true&facet.field=location&facet.field=model&facet.mincount=1
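Note that facet.field appears twice in this request. If you build it in code, a minimal Python sketch using only the standard library keeps the parameters as a list of pairs so that the duplicate key survives. The endpoint URL is an assumption for a default local install, not something taken from the Vast.com setup.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host, port and path for your install.
SOLR_SELECT = "http://localhost:8983/solr/select"

# A list of (name, value) tuples preserves repeated parameters such as facet.field,
# which a plain dict could not hold.
params = [
    ("q", "chevrolet"),
    ("facet", "true"),
    ("facet.field", "location"),
    ("facet.field", "model"),
    ("facet.mincount", "1"),
]

query_string = urlencode(params)
url = SOLR_SELECT + "?" + query_string
print(url)
```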


What happens when you select a model by clicking on, say, "Impala"? The facet for vehicle model disappears. Why? Because now only Impalas are shown, so there are no other models present in the current result set. The Solr query now looks like the following:
q=chevrolet&facet=true&facet.field=location&facet.field=model&facet.mincount=1&fq=model:Impala

So what is wrong with this? Nothing really, except that for ease of navigation you may still want to show all the other models, with their document counts, that were shown for the superset of the current results (the previous page). But, as noted earlier, facets are computed on the current result set, in which every vehicle is an Impala. Even if we drop facet.mincount=1 and facet on the model field with the filter query applied, we get a list of all models, but every model except "Impala" has a document count of zero.
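The effect of the filter query combined with facet.mincount can be reproduced with a toy model. This is a sketch under stated assumptions: the documents and the `facet` helper are hypothetical, and the "vocabulary" stands in for the set of terms Solr walks in the index when facet.mincount=0.

```python
from collections import Counter

# Hypothetical index contents.
INDEX = [
    {"make": "chevrolet", "model": "Impala"},
    {"make": "chevrolet", "model": "Impala"},
    {"make": "chevrolet", "model": "Malibu"},
    {"make": "chevrolet", "model": "Tahoe"},
    {"make": "ford", "model": "Focus"},
]

def facet(docs, field, vocabulary, mincount):
    """Count `field` values over `docs`, reporting each term of `vocabulary`
    whose count is at least `mincount`."""
    counts = Counter(doc[field] for doc in docs)
    return {t: counts[t] for t in sorted(vocabulary) if counts[t] >= mincount}

# Every model in the index, roughly what is listed when facet.mincount=0.
vocabulary = {doc["model"] for doc in INDEX}

# q=chevrolet followed by fq=model:Impala.
matches = [d for d in INDEX if d["make"] == "chevrolet"]
filtered = [d for d in matches if d["model"] == "Impala"]

print(facet(filtered, "model", vocabulary, mincount=1))
print(facet(filtered, "model", vocabulary, mincount=0))
```

With mincount=1 only Impala survives; with mincount=0 the other models come back, but all with a count of zero, which is no help for navigation.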

Solution #1 - Make another Solr query

Make another call to Solr without the filter query to get the other values. Our example query would look like:
q=chevrolet&facet=true&facet.field=model&facet.mincount=1&rows=0
The rows=0 parameter is specified because we don't want the actual results, just the facets for the model field. This solution works with any version of Solr. However, it costs one additional HTTP request. That is a bit inconvenient, but usually fast enough. It becomes expensive, though, if you are using Solr's Distributed Search, which sends one or more queries to each shard.
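With this approach the client has to stitch the two responses together. A rough Python sketch of that step, assuming both requests used wt=json with the default flat list format for facet_fields (term, count, term, count, ...). The `pairs` and `merge` helpers and the trimmed sample responses are hypothetical.

```python
def pairs(flat):
    """Turn Solr's flat facet list [term1, count1, term2, count2, ...]
    into a list of (term, count) tuples."""
    return list(zip(flat[0::2], flat[1::2]))

def merge(filtered_response, superset_response, field):
    """Keep the documents and facets of the filtered request, but replace
    `field`'s facet with the counts from the unfiltered rows=0 request."""
    facets = {
        name: pairs(flat)
        for name, flat in filtered_response["facet_counts"]["facet_fields"].items()
    }
    facets[field] = pairs(superset_response["facet_counts"]["facet_fields"][field])
    return filtered_response["response"]["docs"], facets

# Trimmed, made-up responses shaped like Solr's wt=json output.
filtered = {
    "response": {"numFound": 2, "docs": [{"id": "1"}, {"id": "2"}]},
    "facet_counts": {"facet_fields": {
        "location": ["Dallas", 1, "Austin", 1],
        "model": ["Impala", 2],
    }},
}
superset = {
    "response": {"numFound": 4, "docs": []},
    "facet_counts": {"facet_fields": {"model": ["Impala", 2, "Malibu", 1, "Tahoe", 1]}},
}

docs, facets = merge(filtered, superset, "model")
print(facets["model"])
```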

Solution #2 - Tag and exclude filters

This is where multi-select faceting support comes in handy. With Solr 1.4, it is possible to tag filter queries with a name and then exclude one or more tagged filters when requesting facets. All of this is expressed as extra metadata added to the request parameters, using a syntax called Local Params.
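The Local Params involved are plain prefixes on the parameter values. A small Python sketch of two helpers that build them; the helper names are hypothetical, while the `{!tag=...}` and `{!ex=...}` syntax is the one used in the queries below. Several tags can be excluded at once by separating them with commas.

```python
def tagged_filter(tag, query):
    """Return an fq value carrying a tag local param, e.g. {!tag=impala}model:Impala."""
    return "{!tag=%s}%s" % (tag, query)

def excluding_facet(field, *tags):
    """Return a facet.field value that ignores filter queries carrying any of `tags`."""
    if not tags:
        raise ValueError("at least one tag is required")
    return "{!ex=%s}%s" % (",".join(tags), field)

print(tagged_filter("impala", "model:Impala"))
print(excluding_facet("model", "impala"))
```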

Let us go step by step, changing the query from the above example, and see what the request to Solr looks like.

1. The original request in the above example without tagging:
q=chevrolet&facet=true&facet.field=location&facet.field=model&facet.mincount=1&fq=model:Impala
2. The filter query tagged with 'impala':
q=chevrolet&facet=true&facet.field=location&facet.field=model&facet.mincount=1&fq={!tag=impala}model:Impala
3. The facet field with the 'impala' filter query excluded:
q=chevrolet&facet=true&facet.field=location&facet.field={!ex=impala}model&facet.mincount=1&fq={!tag=impala}model:Impala
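Conceptually, a facet marked with {!ex=impala} is counted over the documents that match q and every filter query except the ones tagged 'impala', while the results and the other facets still honor all filters. The toy model below only illustrates those semantics; it is not Solr's implementation, and the data and the `facet` helper are made up.

```python
from collections import Counter

# Hypothetical documents already matching q=chevrolet.
MATCHES = [
    {"model": "Impala", "location": "Dallas"},
    {"model": "Impala", "location": "Austin"},
    {"model": "Malibu", "location": "Dallas"},
    {"model": "Tahoe", "location": "Houston"},
]

# Filter queries as (tag, predicate) pairs; an untagged filter would use None.
FILTERS = [("impala", lambda d: d["model"] == "Impala")]

def facet(field, excluded_tags=()):
    """Count `field` over the matches that pass every filter query whose
    tag is not in `excluded_tags` (a toy model of {!ex=...})."""
    active = [pred for tag, pred in FILTERS if tag not in excluded_tags]
    docs = [d for d in MATCHES if all(pred(d) for pred in active)]
    return dict(Counter(d[field] for d in docs))

print(facet("location"))                         # facet.field=location
print(facet("model", excluded_tags={"impala"}))  # facet.field={!ex=impala}model
```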
Now, with this one query, you get the results and the location facet for the current (filtered) result set, while the model facet is computed on the superset, without the need to make another call to Solr. If you want Solr to return this particular facet field under an alternate name, you can add a 'key=alternative-name' local param. For example, the following Solr query returns the 'model' facet under the name 'allModels':
q=chevrolet&facet=true&facet.field=location&facet.field={!ex=impala key=allModels}model&facet.mincount=1&fq={!tag=impala}model:Impala
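When this request is sent from code rather than typed into a browser, the local params need URL-encoding. A minimal Python sketch that builds the query above with the standard library; with wt=json, the model counts should then come back under facet_counts.facet_fields.allModels.

```python
from urllib.parse import parse_qsl, urlencode

params = [
    ("q", "chevrolet"),
    ("facet", "true"),
    ("facet.field", "location"),
    # Exclude the 'impala' filter and return the counts under 'allModels'.
    ("facet.field", "{!ex=impala key=allModels}model"),
    ("facet.mincount", "1"),
    ("fq", "{!tag=impala}model:Impala"),
]

# urlencode percent-encodes the local-params syntax ({, !, =, }) and turns
# the space inside {!...} into '+', which the server decodes back to a space.
query_string = urlencode(params)
print(query_string)
```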
Tagging, excluding and renaming are not limited to facet fields. They can be used with facet queries, facet prefixes and date faceting too.
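As an illustration of exclusion on facet queries, here is a Python sketch that generates price-range buckets which ignore a tagged price filter, so users can still see how many vehicles fall in the other ranges. The `price` field, the `price` tag and the `range_facet_queries` helper are all hypothetical.

```python
from urllib.parse import urlencode

def range_facet_queries(field, bounds, exclude_tag):
    """Build one facet.query per [lo TO hi] bucket over `field`, each ignoring
    the filter queries tagged `exclude_tag`. Solr's [lo TO hi] is inclusive at
    both ends, so a value equal to a shared bound is counted in both buckets."""
    return [
        ("facet.query", "{!ex=%s}%s:[%s TO %s]" % (exclude_tag, field, lo, hi))
        for lo, hi in zip(bounds, bounds[1:])
    ]

params = [
    ("q", "chevrolet"),
    ("facet", "true"),
    ("fq", "{!tag=price}price:[5000 TO 10000]"),
] + range_facet_queries("price", ["*", "5000", "10000", "*"], "price")

print(urlencode(params))
```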

This is another cool contribution by Yonik (see also my previous post). I'm really looking forward to the Solr 1.4 release. It brings a bunch of very useful features, including the super-easy-to-set-up Java-based replication. But more on that in a later post.

2 comments:

shankar ramse said...

Can this tagging be used to support something like the following?

Suppose I have a free-text field, say textfield, and I index it. If I search for textfield:glass, I need to get facet counts for the most common words found in textfield.
i.e.

example: search for textfield:glass
should return facet counts for common words found in textfield: semiconductor (10), iron (20), silicon (25), material (8), thin (25) and so on.
Can this be done using tagging?

Anonymous said...

Can you please provide an example configuration for a field query?

I tried, but I am not getting the expected result.

The query parameters are given below.
fq={!tag=impala}sal_amt:[100%20TO%20500]&facet.query={!ex=impala}sal_amt.

Here sal_amt is the field being queried.

About Me

Committer on Apache Solr. Principal Software Engineer at AOL.
