What search API to use?

This article is for programmers who implement search components. Four services are available, each suited to different requirements. The following matrix helps you select the best-fitting service.

Search Feature Matrix

Topic

Search Service

Structure Service

core-service query

core-service raw esql

Backend storage used

Elasticsearch

Elasticsearch

Relational DB 

Relational DB

Short description

For search result lists and aggregations.

For search and building nested virtual trees using aggregations.

Simple AND conditions on documents and folders. Returns full DMS objects as JSON or XML.

Provides simple (raw) database values through a feature-rich, SQL-like query.

In production usage

Client main search and client dashlets.

Client folder view to build the virtual child objects tree.

API-only usage.

API-only usage.

Service name

search-service

structure-service

rest-ws - ResultService - query

rest-ws - ResultService - raw

API Documentation link

Search API

Structure Service API

Live Swagger-UI for rest-ws / ResultService / query

Live Swagger-UI for rest-ws / ResultService / raw

See also Using eSQL

Supports SQL-like query?

No

Yes, but only a very limited subset.

No

Yes, full eSQL functionality.

Supports fulltext search conditions?

Yes

Yes

No - only wildcard search values on single fields.

No, but the SQL LIKE operator can be used.

Supports search for document text content?

Yes, by fulltext search.

Yes, by fulltext search.

No

No

Supports aggregations? (COUNT, SUM)

Yes, partially

Yes, fully, including nested aggregations and retrieving arbitrary aggregations in one call.

No

Yes, limited, by using SUM() and COUNT() as part of the eSQL functionality.

Supports access to inactive versions?

No, inactive versions are not permanently stored in Elasticsearch. Inactive versions cannot be accessed through the API.

No

Yes, by using the "(all versions)" eSQL qualifier.

Streaming capable? (1)

No

No

Yes

Yes

Data consistency

The Elasticsearch data is updated lazily. Any change to indexdata or content text is processed asynchronously through a message queue. An immediate update can be forced by using the synchronous storage mechanism, but even then the data in Elasticsearch may not be up to date. Under heavy system load, or while a reindexing is running, it may take several minutes before the data is updated. Even when the indexdata of an object is up to date, it may take additional time until the content text is extracted by the rendition service and updated in the Elasticsearch index.

The data is always up to date. Changes are committed and confirmed on the relational database before any message about the change is sent.

Access control

The Elasticsearch index holds an access control list (ACL) for each entry. This list is checked against the user's access list on each query. The performance cost at query time is constant, regardless of the complexity of the rights system. The disadvantage is that changes in the security system, such as new or removed roles or changes to visibility clauses, must be propagated to the Elasticsearch index. Depending on the amount of data and the complexity of the visibility clauses, it may take a long time before the ACL is updated in the index.

User access is checked at query time. The SQL contains not only the conditions given by the user but also the visibility clauses for the folder and document objects. This adds a query performance cost that depends on the complexity of the request and the rights system. The advantage is that changes to the rights system take effect immediately.

Basic performance considerations

Elasticsearch is built to deliver very good performance for fulltext searches. Indexdata conditions and aggregations also perform well. It is not well suited to retrieving large amounts of data.

Relational databases are good at joining tables and at evaluating WHERE conditions on indexed columns. For good query performance, it is very important to use indexed columns in the WHERE conditions. If LIKE is used with a leading wildcard, an index cannot be used.
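
The effect of a leading wildcard on index usage can be observed with any relational database. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. The table docs, the index idx_docs_name and the helper query_plan are hypothetical names chosen for illustration, and the database behind the core-service may plan queries differently.

```python
import sqlite3


def query_plan(conn, pattern):
    """Return SQLite's query plan text for a LIKE query with a literal pattern."""
    # The pattern is inlined as a literal so the planner can inspect it.
    # Only pass trusted, hard-coded patterns to this demo helper.
    sql = f"EXPLAIN QUERY PLAN SELECT id FROM docs WHERE name LIKE '{pattern}'"
    rows = conn.execute(sql).fetchall()
    # The human-readable plan detail is the fourth column of each row.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
# NOCASE collation lets SQLite use the index for its case-insensitive LIKE.
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, name TEXT COLLATE NOCASE)")
conn.execute("CREATE INDEX idx_docs_name ON docs (name)")
conn.executemany(
    "INSERT INTO docs (name) VALUES (?)",
    [(f"invoice-{i}",) for i in range(1000)],
)

prefix_plan = query_plan(conn, "inv%")        # wildcard at the end
leading_plan = query_plan(conn, "%voice-1")   # wildcard in front

print("prefix :", prefix_plan)
print("leading:", leading_plan)
```

In this sketch the prefix pattern is reported as an index search, while the pattern with the wildcard in front is reported as a scan over all rows. The exact plan wording depends on the SQLite version.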

(1) What is "Streaming capable?"

A service is streaming capable if the result from the backend storage is delivered to the calling client without holding the complete result in memory. When the core-service is used, the client's request is transformed into an SQL statement that queries the database. The rows the database returns are transformed one by one and sent back to the client, so memory consumption is minimal. If the receiving client can also process the result as a stream, even large datasets can be handled.
The search and structure services do not support stream handling. The complete data set from the backend storage (Elasticsearch) is held and transformed in memory. It is therefore mandatory to set an upper limit for the result size. If no size is given, the service enforces a default size.
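
The difference between the two approaches can be sketched in a few lines of Python. This is an illustration only, not the implementation of any of the services: sqlite3 stands in for the backend storage, and stream_rows, fetch_buffered and DEFAULT_SIZE are hypothetical names with an assumed default value.

```python
import json
import sqlite3
from typing import Iterator, List, Optional

DEFAULT_SIZE = 50  # assumed default limit, not the value used by the real services


def stream_rows(conn: sqlite3.Connection, sql: str) -> Iterator[str]:
    """Streaming style: transform and hand over one row at a time."""
    cursor = conn.execute(sql)
    columns = [col[0] for col in cursor.description]
    for row in cursor:  # rows are pulled from the cursor lazily
        yield json.dumps(dict(zip(columns, row)))


def fetch_buffered(conn: sqlite3.Connection, sql: str,
                   size: Optional[int] = None) -> List[dict]:
    """Buffered style: the whole result is held in memory, so it must be capped."""
    limit = DEFAULT_SIZE if size is None else size
    cursor = conn.execute(f"SELECT * FROM ({sql}) LIMIT ?", (limit,))
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO docs (name) VALUES (?)",
    [(f"doc-{i}",) for i in range(1000)],
)

query = "SELECT id, name FROM docs"
streamed_count = sum(1 for _ in stream_rows(conn, query))  # never holds all rows
buffered = fetch_buffered(conn, query)                     # capped by the default size
print(streamed_count, len(buffered))
```

The generator keeps only the current row in memory, which is why the streaming variant needs no size limit, while the buffered variant applies one even when the caller gives none.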

 
