Search

Cloud CMS provides full-text and structured search for all of your content. The platform uses Elastic Search under the hood to automatically create and manage search indexes for your content so that you can find anything at any time.

The platform also offers "composite" search operations which let you layer searches on top of structured queries and traversals around node objects.

Cloud CMS automatically indexes all of your content, including both its JSON structure and any binary attachments you add to the node. For example, a node might have 3 JSON metadata fields and 2 binary attachments (say, a PDF document and a Word document). Cloud CMS will index the JSON fields first and then also index the PDF and Word documents. To do so, Cloud CMS performs text extraction on each attachment according to its mimetype and loads the extracted tokens into special fields within Elastic Search for discovery. Thus, all of your content is available for full-text search.

Within the Cloud CMS user interface, searching is available within a search box for every project. From within a project, you can search for all documents contained within that project. Furthermore, Cloud CMS provides a platform view with a search box that lets you perform a single search across ALL of your projects (we call this a federated search).

Search results come back with full node properties loaded, along with some metadata about the search (including each result's score within the relevant search index).

Per-branch Search Indexes

Search indexes are maintained at a branch level. If you're working in the master branch, it will have its own index which represents the tip view of content within that branch. If you fork another branch, that branch will have its own index as well. These indexes are automatically maintained for you as you use Cloud CMS.

Permissions

As with everything in Cloud CMS, the search API respects the underlying permissions and authorities that have been granted on the objects that are considered result candidates. Authorities are checked before content is retrieved, which means that two people could execute the same search and get different results.

For example, suppose that there are 10 content items with the term "Pink Floyd" in them. An administrator (who has super authorities and can do just about anything) might run this query and get back 10 results. However, user A might only have CONSUMER authorities against 4 of those content items. When user A performs the search, they would only get back a result set of size 4.
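The effect can be pictured with a small in-memory model. This is only an illustrative sketch: the matches array, the grants map and the visibleTo function are hypothetical names, and the real authority checks happen on the server inside Cloud CMS.

```javascript
// Toy model only: ten matching content items, identified by hypothetical ids.
const matches = Array.from({ length: 10 }, (_, i) => "node" + (i + 1));

// Hypothetical grants: which node ids each principal may read.
const grants = {
    admin: new Set(matches),                              // can read everything
    userA: new Set(["node1", "node2", "node3", "node4"])  // CONSUMER on 4 items
};

// Keep only the matches that the given principal is allowed to read.
function visibleTo(principal, candidates) {
    const readable = grants[principal] || new Set();
    return candidates.filter((id) => readable.has(id));
}

console.log(visibleTo("admin", matches).length); // 10
console.log(visibleTo("userA", matches).length); // 4
```

The same ten candidates go in for both principals; only the set of readable ids differs, which is why the result sizes differ.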

Permissions are baked into Cloud CMS all the way down to the core. If you need to get back the full set of content objects for purposes of synchronization or anything else, make sure that you have sufficient authorities to do so.

Search API

There are several methods available on the Cloud CMS REST API to perform searches. All of the REST methods assume a repository and branch that identifies the index to be searched.

You can perform simple text-based searches using little more than a GET method like this:

GET /repositories/{repositoryId}/branches/{branchId}/nodes/search?text={text}
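The search text has to be URL-encoded before it goes into the query string. A minimal sketch of building that path follows; the buildTextSearchPath helper and the sample ids "repo1" and "branch1" are assumptions for illustration, not part of any Cloud CMS driver.

```javascript
// Hypothetical helper: builds the path for a simple text search.
function buildTextSearchPath(repositoryId, branchId, text) {
    return "/repositories/" + encodeURIComponent(repositoryId) +
        "/branches/" + encodeURIComponent(branchId) +
        "/nodes/search?text=" + encodeURIComponent(text);
}

// "repo1" and "branch1" are placeholder ids.
console.log(buildTextSearchPath("repo1", "branch1", "eddie van halen"));
// /repositories/repo1/branches/branch1/nodes/search?text=eddie%20van%20halen
```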

Or you can perform more elaborate searches using the Elastic Search Query DSL (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html):

POST /repositories/{repositoryId}/branches/{branchId}/nodes/search

Set your request content type to application/json and pass a payload consisting of the Elastic Search DSL configuration block. For example, you might do this:

POST /repositories/{repositoryId}/branches/{branchId}/nodes/search
{
    "query_string" : {
        "default_field" : "content",
        "query" : "this AND that OR thus"
    }
}
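One way to prepare such a request from JavaScript is sketched below. The buildSearchRequest helper, the placeholder ids and the baseUrl mentioned in the trailing comment are hypothetical, and authentication is left out entirely; treat this as a sketch under those assumptions rather than the driver's own method.

```javascript
// Hypothetical helper: describes the POST request for a DSL search.
function buildSearchRequest(repositoryId, branchId, query) {
    return {
        path: "/repositories/" + encodeURIComponent(repositoryId) +
            "/branches/" + encodeURIComponent(branchId) + "/nodes/search",
        options: {
            method: "POST",
            headers: { "Content-Type": "application/json" },
            body: JSON.stringify(query)
        }
    };
}

// "repo1" and "branch1" are placeholder ids.
const request = buildSearchRequest("repo1", "branch1", {
    "query_string": {
        "default_field": "content",
        "query": "this AND that OR thus"
    }
});

console.log(request.path);
console.log(request.options.body);

// To send it from a runtime with a global fetch (baseUrl is a placeholder,
// authentication omitted):
// fetch(baseUrl + request.path, request.options).then((res) => res.json());
```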

Here is an example where we search for all content nodes with the text "eddie van halen" in them:

// assume we have a branch
var branch = ...;

// search!
branch.searchNodes("eddie van halen").each(function() {
    console.log("Found a node with title: " + this.title);
});

All of the content that you put into Cloud CMS is indexed within Elastic Search. This means that all of your JSON properties are available for search purposes using the Elastic Search DSL.

Suppose we have content objects that look like this one:

{
    "_type": "my:product",
    "title": "My Product",
    "product": "shirt",
    "price": 10.99,
    "tags": ["shirt", "blue", "popular"],    
    "audience": ["children", "toddlers"],
    "size": "small"
}

That's just an example. Let's imagine we have hundreds or thousands of those content objects in Cloud CMS. Those objects will be available for both query (backed by MongoDB) and search (using Elastic Search). Cloud CMS also provides facilities for performing hybrid query/search combinations across both.

Let's stick to search for the moment. Suppose we want to find all of the my:product content instances that describe size small shirts. The Elastic Search DSL is pretty powerful and there are probably many ways to write this. But here is one:

{
    "filtered": {    
        "filter": {
            "term": {
                "_type":  "my_product"
            }
        },
        "query": {
            "bool": {
                "must": [
                    { "match": { "product":  "shirt" }},
                    { "match": { "size":  "small" }}
                ]
            }         
        }
    }
}

Elastic Search will run the query and find all matches where product is shirt and size is small. It will then filter to keep only those instances whose content type is my:product.

Note that in Elastic Search, the special _type field does not contain a colon. Cloud CMS changes the colon to an underscore and converts the value to lowercase, so my:product is indexed as my_product. Be sure to keep this in mind when you write your query.
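A minimal sketch of that conversion, in case you want to derive the indexed type name in code instead of typing it by hand. The toIndexedType function is a hypothetical helper that only applies the two rules stated above.

```javascript
// Applies the two rules described above to a content type name:
// the colon becomes an underscore and the value is lowercased.
function toIndexedType(qname) {
    return qname.replace(/:/g, "_").toLowerCase();
}

console.log(toIndexedType("my:product")); // my_product
console.log(toIndexedType("My:Product")); // my_product
```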

So how do we use this?

From an API perspective, we can post the following:

POST /repositories/{repositoryId}/branches/{branchId}/nodes/search
{
    "filtered": {    
        "filter": {
            "term": {
                "_type":  "my_product"
            }
        },
        "query": {
            "bool": {
                "must": [
                    { "match": { "product":  "shirt" }},
                    { "match": { "size":  "small" }}
                ]
            }         
        }
    }
}

Here is an example of how this is done in code:

// assume we have a branch
var branch = ...;

// search!
branch.searchNodes({
    "filtered": {
        "filter": {
            "term": {
                "_type": "my_product"
            }
        },
        "query": {
            "bool": {
                "must": [
                    {"match": {"product": "shirt"}},
                    {"match": {"size": "small"}}
                ]
            }
        }
    }
}).each(function() {
    console.log("Found a node with title: " + this.title);
});
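If you build many queries of this shape, a small helper can assemble them from a content type and a set of field values. The sketch below is one possible approach and not part of the Cloud CMS driver; buildTypedQuery is a hypothetical name, and it places the match clauses under bool "must" so that every listed field has to match.

```javascript
// Hypothetical helper: builds a "filtered" query restricted to one content type.
// `fields` maps property names to the values they have to match.
function buildTypedQuery(typeQName, fields) {
    return {
        "filtered": {
            "filter": {
                "term": {
                    // colon becomes an underscore and the value is lowercased
                    "_type": typeQName.replace(/:/g, "_").toLowerCase()
                }
            },
            "query": {
                "bool": {
                    // one match clause per field; all of them are required
                    "must": Object.keys(fields).map(function (name) {
                        return { "match": { [name]: fields[name] } };
                    })
                }
            }
        }
    };
}

const query = buildTypedQuery("my:product", { product: "shirt", size: "small" });
console.log(JSON.stringify(query, null, 4));

// The result can be passed to the search call shown above:
// branch.searchNodes(query).each(function() { ... });
```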

Note that the results come back with full pagination information, allowing you to limit the number of objects that come back and to set the starting offset within the result set. Smaller result sets are returned faster, so be mindful of this with your calls.

As with all Cloud CMS queries, you control pagination via request parameters. These include skip and limit. Pagination also allows you to specify sort information to control the order of the result set.
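As a rough sketch, the skip and limit parameters can be appended to a search path like this. The withPagination helper, the zero-based page arithmetic and the placeholder ids are illustrative assumptions; sort is left out here because its format is not described on this page.

```javascript
// Hypothetical helper: appends skip and limit to a request path.
// `page` is zero-based, so page 2 with a page size of 10 skips 20 results.
function withPagination(path, page, pageSize) {
    const params = new URLSearchParams({
        skip: String(page * pageSize),  // number of results to skip
        limit: String(pageSize)         // maximum number of results to return
    });
    // Use "&" when the path already carries a query string.
    const separator = path.includes("?") ? "&" : "?";
    return path + separator + params.toString();
}

// "repo1" and "branch1" are placeholder ids.
const base = "/repositories/repo1/branches/branch1/nodes/search";
console.log(withPagination(base, 2, 10));
// /repositories/repo1/branches/branch1/nodes/search?skip=20&limit=10
```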

For more information on pagination, please see our documentation on Pagination.

For details on how to get the most out of the Elastic Search DSL, check out the Elastic Search DSL Query page.