Search

Any content you put into Cloud CMS is automatically indexed for full-text and structured search. This lets your editorial teams instantly search for content and find the things they're looking for.

Under the hood, Cloud CMS uses Elastic Search and gives your editorial users and developers access to the full syntax of the Elastic Search Query DSL. This allows you to execute simple searches as well as more complex queries that take into account term and phrase matching, nested operations, logical constructs, fuzziness, proximity, wildcards and regular expression matching.

Cloud CMS automatically maintains Elastic Search indexes for you on a per-branch basis. Thus, you can create as many branches as you would like and each branch will have a uniquely maintained and runtime-ready index to power your search API calls.

Cloud CMS additionally offers the Find API. The Find API lets you execute concurrent MongoDB and Elastic Search queries and compose them into a single, intersecting record set.

If you're looking for a reference on how to write search queries, we recommend visiting our page on Query Strings.

Automatic Search Indexing

Cloud CMS automatically indexes all of your content. This includes both its JSON structure and any binary attachments that belong to the node.

For example, a node might have 3 JSON metadata fields and 2 binary payloads (let's say, a PDF document and a Word document). Cloud CMS will index the JSON fields first and then also index the PDF document and the Word document. To do so, Cloud CMS performs text extraction on each binary file and loads the extracted tokens onto special fields within Elastic Search for discovery. Thus, all of your content is instantly available for full-text search.

Cloud CMS supports a wide variety of desktop MIME type files, including Microsoft Office formats, PDF, text formats and most common audio and video formats. Depending on the MIME type, different elements are automatically extracted. For some formats, such as audio and video formats, header information is extracted, whereas other formats (such as PowerPoint or PDF) will have their textual elements extracted. Cloud CMS essentially tries to extract as much as it can.

Per-branch Search Indexes

Search indexes are maintained at a branch level. If you're working in the master branch, it will have its own index which represents the tip view of content within that branch. If you fork another branch, it will have its own index. These indexes are automatically maintained for you as you use Cloud CMS.

Searching within Projects

Within the Cloud CMS user interface, searching is available within a search box for every project.

From within a project, you can search for all documents contained within that project. Cloud CMS provides a search screen that gives you the ability to write out the text of your search as well as set up common filters (such as property filters, date/time and more).

By default, search results include scores and a few interesting properties. You may further wish to customize the results list to show custom properties. This can be done by writing custom UI Templates.

Federated Search across Projects

Cloud CMS also lets you perform a unified search across multiple projects. From within the Cloud CMS UI, you simply navigate to your platform and type into the search box. This performs a single search across ALL of your projects.

Search results come back with full node properties loaded and some metadata about the search (including each result's score within the relevant search index).

Permissions

As with everything in Cloud CMS, the search API respects the underlying permissions and authorities that have been granted to the objects that are considered result candidates. Authorities are checked before content is retrieved, which means that two people could execute the same search and get different results.

For example, suppose that there are 10 content items with the term "Pink Floyd" in them. An administrator (who has super authorities and can do just about anything) might run this query and get back 10 results. However, user A might only have CONSUMER authorities against 4 of those content items. When user A performs the search, they would only get back a result set of size 4.

Permissions are baked into Cloud CMS all the way down to the core. If you need to get back the full set of content objects for purposes of synchronization or anything else, make sure that you have sufficient authorities to do so.
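
The effect of permission filtering can be pictured with a small, purely conceptual sketch. It is not how Cloud CMS implements authority checks; the item list, the canRead table and the searchAs helper are all hypothetical names made up for illustration.

```javascript
// Conceptual model only: ten items that all match the search term.
const matches = [];
for (let i = 1; i <= 10; i++) {
    matches.push({ id: "item" + i, title: "Pink Floyd " + i });
}

// Hypothetical authority table: the item ids each principal may read.
const canRead = {
    admin: matches.map(function(item) { return item.id; }),
    userA: ["item1", "item2", "item3", "item4"]
};

// Return only those matches that the given principal is allowed to read.
function searchAs(principal) {
    const allowed = new Set(canRead[principal] || []);
    return matches.filter(function(item) {
        return allowed.has(item.id);
    });
}

console.log(searchAs("admin").length); // 10
console.log(searchAs("userA").length); // 4
```

The same search produces a different result set size for each principal, which is the behavior described above.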

How to Describe Searches

To run a search, you simply pass Cloud CMS a JSON object or some text that you wish to search for.
If you pass some text, then the text is expected to be an Elastic Search Query String. If you pass a JSON object, then the object is expected to conform to the Elastic Search Query DSL.

Using a Query String

A text search involves passing a string that might simply be a keyword, such as:

"joe smith"

This will run a search across all of your content and find any content where the phrase joe smith exists. This is a case-insensitive search, so it will also match content containing Joe Smith or JOE SMITH.

You can also pass text that contains a more elaborate Query String. A Query String must conform to the Elastic Search syntax for Query Strings, which is documented here:

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html

Using this DSL, you can express a far more complicated query as a bit of text. For example, you might want to find all content of type my:article created in 2018 that contains the text Cloud CMS is awesome using a proximity of 5:

__type:"my:article" AND _system.created_on.year:2018 AND "Cloud CMS is awesome"~5
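
Query strings like this one are often assembled from variables. The following is a minimal sketch of doing so; buildArticleQuery is a hypothetical helper, not part of any Cloud CMS driver, and the field names are simply the ones used in the example above.

```javascript
// Hypothetical helper: builds a query string from a content type, a creation
// year, and a phrase with a proximity value.
function buildArticleQuery(type, year, phrase, proximity) {
    // Wrap a value in double quotes, escaping embedded quotes and backslashes
    // so the value cannot break out of its quoted section.
    function quote(value) {
        return '"' + String(value).replace(/(["\\])/g, "\\$1") + '"';
    }

    return [
        "__type:" + quote(type),
        "_system.created_on.year:" + year,
        quote(phrase) + "~" + proximity
    ].join(" AND ");
}

console.log(buildArticleQuery("my:article", 2018, "Cloud CMS is awesome", 5));
// __type:"my:article" AND _system.created_on.year:2018 AND "Cloud CMS is awesome"~5
```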

For a full reference on text searches, please see our reference on Query Strings.

Using Query JSON

If you send a JSON object, then the JSON object should be expressed using the Elastic Search Query DSL. This DSL lets you take full control of the search mechanics and perform 100% of the search functionality that Elastic Search offers.

In contrast, a Query String (text) is more limited in its expression. A query string is effectively parsed into its JSON form before being executed. For example, a query string search for "joe smith" ends up looking like the following in JSON form:

"query_string" : {
    "query" : "joe smith"
}

The JSON DSL is very powerful and gives you full access to everything Elastic Search can do. If you wish to take full advantage of Elastic Search and find that you cannot achieve what you want to achieve using a textual query string, you will eventually want to write your queries as JSON.
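
The relationship between the two forms can be sketched in a few lines. The helper below, toQueryJson, is hypothetical and only mirrors the conversion described above; the actual parsing happens on the server.

```javascript
// Hypothetical helper: accept either a query string or a Query DSL object
// and always return the JSON form.
function toQueryJson(search) {
    if (typeof search === "string") {
        // Text is wrapped in a query_string block, as in the example above.
        return { "query_string": { "query": search } };
    }
    // Objects are assumed to already be Query DSL and pass through unchanged.
    return search;
}

console.log(JSON.stringify(toQueryJson("joe smith")));
// {"query_string":{"query":"joe smith"}}
```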

Search API

There are several methods available on the Cloud CMS REST API to perform searches. All of the REST methods assume a repository and branch that identifies the index to be searched.

You can perform simple text-based searches using little more than a GET method like this:

GET /repositories/{repositoryId}/branches/{branchId}/nodes/search?text={text}

Or you can perform more elaborate searches using the Elastic Search Query DSL (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html) with a POST method like this:

POST /repositories/{repositoryId}/branches/{branchId}/nodes/search

Set your request content type to application/json and pass a payload consisting of the Elastic Search DSL configuration block. For example, you might do this:

POST /repositories/{repositoryId}/branches/{branchId}/nodes/search
{
    "query_string" : {
        "default_field" : "content",
        "query" : "this AND that OR thus"
    }
}
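
Either call can be prepared in JavaScript before handing it to an HTTP client. The sketch below only builds the request pieces and does not send anything. The helper names and the https://api.example.com base URL are placeholders, and authentication headers are omitted because they depend on your setup.

```javascript
// Placeholder base URL; substitute your own API endpoint.
const BASE_URL = "https://api.example.com";

// Path of the search endpoint for a given repository and branch.
function searchPath(repositoryId, branchId) {
    return "/repositories/" + encodeURIComponent(repositoryId) +
        "/branches/" + encodeURIComponent(branchId) + "/nodes/search";
}

// GET form: the text is URL-encoded since it may contain spaces and quotes.
function buildTextSearchUrl(repositoryId, branchId, text) {
    return BASE_URL + searchPath(repositoryId, branchId) +
        "?text=" + encodeURIComponent(text);
}

// POST form: the Query DSL block is sent as the JSON body.
function buildDslSearchRequest(repositoryId, branchId, query) {
    return {
        url: BASE_URL + searchPath(repositoryId, branchId),
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(query)
    };
}

console.log(buildTextSearchUrl("repo1", "master", "joe smith"));
// https://api.example.com/repositories/repo1/branches/master/nodes/search?text=joe%20smith

const dslRequest = buildDslSearchRequest("repo1", "master", {
    "query_string": { "default_field": "content", "query": "this AND that OR thus" }
});
console.log(dslRequest.method + " " + dslRequest.url);
// POST https://api.example.com/repositories/repo1/branches/master/nodes/search
```

The object returned by buildDslSearchRequest carries the method, headers and body in the shape that a fetch-style client accepts as its options argument.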

Here is an example where we search for all content nodes with the text "eddie van halen" in them:

// assume we have a branch
var branch = ...;

// search!
branch.searchNodes("eddie van halen").each(function() {
    console.log("Found a node with title: " + this.title);
});

All of the content that you put into Cloud CMS is indexed within Elastic Search. This means that all of your JSON properties are available for search purposes using the Elastic Search DSL.

Suppose we have content objects that look like this one:

{
    "_type": "my:product",
    "title": "My Product",
    "product": "shirt",
    "price": 10.99,
    "tags": ["shirt", "blue", "popular"],    
    "audience": ["children", "toddlers"],
    "size": "small"
}

That's just an example. Let's imagine we have hundreds or thousands of those content objects in Cloud CMS. Those objects will be available for both query (backed by MongoDB) and search (using Elastic Search). Cloud CMS also provides facilities for performing hybrid query/search combinations across both.

Let's stick to search for the moment. Suppose we want to find all of the my:product content instances that describe size small shirts. The Elastic Search DSL is pretty powerful and there are probably many ways to write this. But here is one:

{
    "bool": {
        "must": [
            { "match": { "product": "shirt" }},
            { "match": { "size": "small" }}
        ],
        "filter": {
            "term": {
                "__type": "my:product"
            }
        }
    }
}

Elastic Search will find all matches where product is shirt and size is small, and keep only those instances whose content type is my:product.

Note that in Elastic Search, the special __type field is used instead of _type. This is due to a limitation in Elastic Search 6.x that we expect to be resolved in a future version of Elastic Search. For now, please use __type. At a future point, we expect to support _type and will retain support for __type for backward compatibility.

So how do we use this?

From an API perspective, we can post the following:

POST /repositories/{repositoryId}/branches/{branchId}/nodes/search
{
    "bool": {
        "must": [
            { "match": { "product": "shirt" }},
            { "match": { "size": "small" }}
        ],
        "filter": {
            "term": {
                "__type": "my:product"
            }
        }
    }
}

Here is an example of how this is done in code:

// assume we have a branch
var branch = ...;

// search!
branch.searchNodes({
    "bool": {
        "must": [
            {"match": {"product": "shirt"}},
            {"match": {"size": "small"}}
        ],
        "filter": {
            "term": {
                "__type": "my:product"
            }
        }
    }
}).each(function() {
    console.log("Found a node with title: " + this.title);
});
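
Queries of this kind are easy to generate. The following sketch builds a type-filtered query from a map of field values. buildTypedQuery is a hypothetical helper, and it emits a bool query with must and filter clauses, which assumes that your Elastic Search version accepts those constructs.

```javascript
// Hypothetical helper: require a match on every given field and filter the
// results down to a single content type.
function buildTypedQuery(type, fields) {
    const must = Object.keys(fields).map(function(name) {
        const match = {};
        match[name] = fields[name];
        return { "match": match };
    });

    return {
        "bool": {
            "must": must,
            "filter": { "term": { "__type": type } }
        }
    };
}

const productQuery = buildTypedQuery("my:product", { product: "shirt", size: "small" });
console.log(JSON.stringify(productQuery, null, 4));
```

The resulting object could be passed to branch.searchNodes in place of a hand-written literal.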

Note that the results come back with full pagination information, allowing you to limit the number of objects that come back and the starting location within the database. Smaller sets perform faster so be mindful of this with your calls.

As with all Cloud CMS queries, you control pagination via request parameters. These include skip and limit. Pagination also allows you to specify sort information to control the result set order.

For more information on pagination, please see our documentation on Pagination.
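
As a rough illustration, the skip and limit parameters can be appended to a search URL as shown below. The withPagination helper is hypothetical, and sort is left out because its parameter format is not described on this page.

```javascript
// Hypothetical helper: append skip and limit request parameters to a path.
function withPagination(path, skip, limit) {
    const params = new URLSearchParams();
    params.set("skip", String(skip));
    params.set("limit", String(limit));
    // Use "&" when the path already carries a query string, "?" otherwise.
    return path + (path.indexOf("?") === -1 ? "?" : "&") + params.toString();
}

// Third page of 25 results for a text search.
console.log(withPagination("/repositories/repo1/branches/master/nodes/search?text=shirt", 50, 25));
// /repositories/repo1/branches/master/nodes/search?text=shirt&skip=50&limit=25
```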

For details on how to get the most out of the Elastic Search DSL, check out the Elastic Search DSL Query page.