How Do You Index Content Sources?
It is quite easy to integrate third-party sources, such as Atlassian Confluence, ServiceNow or OpenText Content Server, in order to turn Apache Solr into a fully featured enterprise search. Although Apache Solr does not come with built-in connectors, it offers all the APIs needed to index content sources.
Thus, you can write your own connectors and crawlers, use the open-source framework Apache ManifoldCF, which can index many content sources, or use our Raytion Enterprise Search Connectors for indexing.
In order to integrate new content sources into Apache Solr, you need to set up and configure the following components:
- Within Apache Solr, you need to create a collection.
- You need to have a security token store up and running, e.g., the Raytion Custom Security Manager or similar.
- You need to deploy and configure the connector that indexes the content source.
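The first and third steps can be sketched in a few lines of Python. This is a minimal illustration, not a connector implementation: it assumes Solr runs in SolrCloud mode at a local address, and the collection name, the helper functions and the field names (title_t, body_t, acl_allow_ss) are hypothetical. The field suffixes rely on the dynamic fields of Solr's default configset, which your schema may not provide. The functions only build the requests; a real connector would send them over HTTP and handle errors, batching and deletions.

```python
import json
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed local Solr address


def create_collection_url(name, num_shards=1, replication_factor=1):
    """Build a Collections API CREATE request URL (SolrCloud mode)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


def to_solr_document(item):
    """Map one crawled item to a Solr document, keeping its ACL entries.

    The field names are illustrative assumptions, not a fixed schema.
    """
    return {
        "id": item["id"],
        "title_t": item["title"],
        "body_t": item["body"],
        # Principals (users, groups) allowed to read the item.
        "acl_allow_ss": sorted(item["allowed_principals"]),
    }


def update_request(collection, items):
    """Return (url, body) for a JSON update that a connector would POST."""
    url = f"{SOLR_BASE}/{collection}/update?commit=true"
    body = json.dumps([to_solr_document(item) for item in items])
    return url, body
```

Storing the allowed principals on each document is what later lets the search interface restrict results per user.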
Search Experience and User Journeys
Having the data of the content source indexed is the first step towards a great enterprise search. However, indexing alone does not yet present the data to your users. For that, you need a search experience, that is, a search interface.
In the context of Apache Solr or Elasticsearch, the search interface has multiple responsibilities. In enterprise search scenarios, it needs to make sure that only authenticated users can search, so it has to support authentication providers such as Azure AD, Google Cloud Identity, Okta or Active Directory.
The search interface has to offer a query pipeline. Within this query pipeline, the token store is queried with the user ID of the searching user. Based on the response, the original search query is extended with an ACL filter (access control list filter), which is needed for secure search (i.e., security trimming). Furthermore, you can integrate synonym expansions, natural language understanding, ranking hints and more within the query pipeline.
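The security trimming step of such a query pipeline can be sketched as follows. This is a simplified illustration under several assumptions: the token store is replaced by a hypothetical in-memory dictionary, the ACL field name acl_allow_ss and all helper functions are invented for this example, and the Solr address is a local default. The user's query goes into the q parameter unchanged, while the ACL restriction is added as a separate filter query (fq).

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed local Solr address

# Hypothetical stand-in for a security token store: user ID -> principals.
TOKEN_STORE = {
    "alice": ["user:alice", "group:engineering"],
    "bob": ["user:bob"],
}


def lookup_principals(user_id):
    """Return the principals of a user (empty list if unknown)."""
    return TOKEN_STORE.get(user_id, [])


def quote_term(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def acl_filter(principals, field="acl_allow_ss"):
    """Build a filter query matching documents readable by any principal."""
    terms = " OR ".join(quote_term(p) for p in principals)
    return f"{field}:({terms})"


def secure_query_params(user_id, user_query, rows=10):
    """Add the ACL filter (fq) to the user's query; fail closed if unknown."""
    principals = lookup_principals(user_id)
    if not principals:
        raise PermissionError(f"no principals found for user {user_id!r}")
    return {"q": user_query, "fq": acl_filter(principals), "rows": rows}


def select_url(collection, params):
    """Build the URL for Solr's /select request handler."""
    return f"{SOLR_BASE}/{collection}/select?{urlencode(params)}"
```

For example, secure_query_params("alice", "vacation policy") yields a filter that only matches documents whose ACL field contains one of Alice's principals. Rejecting users without any principals, instead of searching without a filter, keeps the pipeline from leaking restricted documents when the token store has no entry.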
After communicating with Apache Solr, the search interface needs to render the search results, display filters and more. A common aim is that users can quickly distinguish between the search results and understand which ones help them most in solving their task at hand.
Our customers implement the search interface either on their own or with our help. Open-source frameworks are available that can be used as a starting point for your search interface implementation. Alternatively, our commercial framework Raytion Search & Retrieval Interface is a compatible turnkey solution for Apache Solr.
The Outcome
Apache Solr is a solid, open-source foundation for a great enterprise search experience. It offers full flexibility when it comes to customizing the query processing, content processing and search experience. However, you need to build or purchase some components around this search engine: the search interface, connectors and a security token store. Our customers use the search engine in their enterprise search scenarios as well as for e-commerce or site searches, together with their Adobe Experience Manager or Sitecore.
If you are interested in more information on how to build an enterprise search based on Apache Solr, then please reach out to our experts.