API Server

The Cloud CMS API Server is a Java application that runs inside a Java Servlet Container. It surfaces a REST API as well as backend services and DAOs that provide connectivity to Mongo DB, Elastic Search and a number of Amazon services, including S3, SNS, SQS, Route 53, CloudFront and more.

Properties File

Cloud CMS is primarily configured via a properties file that is auto-detected and loaded when the underlying Spring Framework starts up. This properties file is typically named docker.properties.

For the most part, this properties file consists of static key=value pairs. Values may be strings, numbers or booleans (true/false). You do not need to wrap strings in quotes; Cloud CMS handles the conversion for you.

In addition, you may wish to pull in environment variables from your Docker container OS. This is useful if you're launching Docker in AWS (or similar) and wish to store sensitive values (such as passwords and access keys) outside of the docker.properties config file. It also allows a single Docker configuration to be reused more easily across environments (by changing only the environment variables).

To pull in environment variables, use the ${VARIABLE} syntax. For example:

cluster.aws.access-key=${AWS_ACCESS_KEY}
cluster.aws.secret-key=${AWS_SECRET_KEY}

For information on AWS, keep on reading. This is just an example.

Admin User Password

One of the first things you'll usually want to configure is the admin user's password. The admin user is created when you first start up Cloud CMS. You can set the password like this:

gitana.admin.password=admin

Concurrent Request Rate Limits

You can limit concurrent requests on a per-tenant and per-user basis within Cloud CMS. This keeps track of how many HTTP requests are "in-flight" at any given moment. When the number of "in-flight" or concurrent requests exceeds the specified amount, an HTTP 429 status code is returned.

From a web architecture viewpoint, a 429 is a valid status code response and client code that calls into Cloud CMS is expected to handle this gracefully.
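
For example, a client might wrap its calls with a simple retry-and-backoff when it receives a 429. Below is a minimal sketch using Java's built-in HTTP client; the endpoint URL and access token shown are placeholders, not part of the Cloud CMS API definition.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RateLimitAwareClient {

    // Placeholder endpoint and token -- substitute your own tenant's values
    private static final String URL = "https://api.example.com/repositories";
    private static final String ACCESS_TOKEN = "<access token>";

    public static HttpResponse<String> getWithRetry(HttpClient client) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(URL))
                .header("Authorization", "Bearer " + ACCESS_TOKEN)
                .GET()
                .build();

        // Retry a few times with a growing delay whenever the server responds with 429
        for (int attempt = 0; attempt < 5; attempt++) {
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() != 429) {
                return response;
            }
            Thread.sleep(500L * (attempt + 1));
        }
        throw new IllegalStateException("Request still rate limited after retries");
    }
}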

To specify rate limits per-tenant, use the following:

gitana.ratelimiting.tenant.enabled=false
gitana.ratelimiting.tenant.defaultMaxConnections=-1

To specify rate limits per-user, use the following:

gitana.ratelimiting.user.enabled=false
gitana.ratelimiting.user.defaultMaxConnections=-1

Auditing

The Cloud CMS auditing service tracks every operation against auditable objects within the system and writes those operations to a special Audit collection. This Audit collection provides a reliable capture of what users did within your system and records every method invocation, including:

  • the method invoked
  • arguments passed to the method
  • the value returned from the method
  • the invoker of the method
  • when the method was invoked
  • exceptions that were raised

Auditing is controlled by the following property. Set it to true to enable the audit service:

org.gitana.repo.audit.AuditService.enabled=true

Logging

The API uses Log4j2 as its logging engine and provides a configuration-based way for you to customize the logging of various sub-services within the product. Each logger within the product logs at a given log level. You can increase or reduce the amount of logging for each logger by adjusting its respective log level.

Custom Log Levels

To customize the log levels, add a log4j2-docker.xml file to the classpath. This file should sit at the root of the classpath. If you're unfamiliar with how to mount this into Docker, take a look at the quickstart example provided in the Cloud CMS Docker distribution. It provides a sample log4j2-docker.xml file with blocks of configuration that you can easily comment out to get started quickly with debugging.

The log4j2-docker.xml file provides a way for you to specify the Log4j2 log level for individual service beans or entire packages of services at once.

By default, the log levels within Cloud CMS are fairly conservative and are optimized for production usage. However, if you're running Cloud CMS in development or would like more log information to diagnose a problem, you can adjust the log4j2-docker.xml configuration to provide the level of granularity that you seek. Often, this involves adjusting log levels from INFO to DEBUG.

Note that DEBUG should only be used while debugging or in development. We do not recommend running Cloud CMS with DEBUG logging in place on production. It will produce far too many logs and will also run more slowly.

Let's say you wanted to enable more logging for the rules engine within the product. You could do that with a log4j2-docker.xml file like this:

<?xml version="1.0" encoding="UTF-8"?>
<Configuration>
    <Loggers>

        <!-- Set all classes in the org.gitana.platform.services.rule package to log with DEBUG -->
        <Logger name="org.gitana.platform.services.rule" level="DEBUG"/>

    </Loggers>
</Configuration>

Log Outputs (stdout / stderr)

Several log files are maintained by the API server Java application. These can be accessed by exposing the log folder as a "volume" on the API server's Docker container. Update the docker-compose.yml config file so that the /opt/tomcat/logs folder is available to the Docker host system.

This example mounts the logs folder to a host folder named "api-logs":

  api:
    build: ./api
    networks:
      - cloudcms
    depends_on:
      - mongodb
      - elasticsearch
    env_file:
      - ./api/api.env
    ports:
      - "8080:8080"
    volumes:
      - ./api-logs:/opt/tomcat/logs

After running a new build (docker-compose build --force-rm), the folder will be created and the log files will be visible to the host.

The default server logging goes to cloudcms.log.

If you require logging of each API call to the API server, you can monitor the Tomcat access logs: localhost_access_log.*.txt

Access Logs

Every request and response that the API processes is logged to its own Log4j2 appender. You can customize this appender to write those entries to a different location, format them differently, roll them over on disk or even roll them over to S3 periodically.

By default, the access logs appender (named RequestFile) is defined like this:

<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="info" packages="org.gitana.platform.services.log">
    <Properties>
        <Property name="baseDir">logs</Property>
    </Properties>
    <Appenders>
        <RollingFile name="RequestFile" fileName="${baseDir}/cloudcms-requests.log" filePattern="${baseDir}/cloudcms-requests/cloudcms-requests-%d{yyyy-MM-dd-HH-mm}-%i.log" append="false">
            <PatternLayout>
                <pattern>%msg%n</pattern>
            </PatternLayout>
            <Policies>
                <OnStartupTriggeringPolicy/>
                <TimeBasedTriggeringPolicy interval="30"/>
                <SizeBasedTriggeringPolicy size="256 MB"/>
            </Policies>
            <DefaultRolloverStrategy max="20"/>
        </RollingFile>
    </Appenders>
</Configuration>

You can override these settings via the same process as described above in which you add your own log4j2-docker.xml file. This file will be picked up by Cloud CMS and its settings will be merged into an overall Log4j2 configuration set. Specifically, you can redeclare the RequestFile RollingFile implementation and Cloud CMS will let your implementation override the one provided out-of-the-box.

Upon making changes, make sure to rebuild the API Docker container (docker-compose build --force-rm) and restart.

You should now see JSON objects describing API calls written to the logs folder as cloudcms-requests.log.

S3 Rollover

Cloud CMS includes an S3RolloverStrategy implementation that you can use to have your access logs rollover to S3 (in addition to rolling over on disk). This provides a convenient way to get your access logs off the server. And if you're running in a cluster, this provides a way to get your logs all collected into a single place.

To use this strategy, you simply need to add it to your RollingFile appender. This strategy takes a few arguments:

  • accessKey: the AWS account access key
  • secretKey: the AWS account secret key
  • region: the S3 region
  • bucket: the S3 bucket name
  • prefix: a prefix to prepend to any created keys (such as cloudcms/production)

You can use a Properties block to accomplish this elegantly as shown in the code below:

<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="info" packages="org.gitana.platform.services.log">
    <Properties>
        <Property name="REQUESTS_S3_ACCESS_KEY"></Property>
        <Property name="REQUESTS_S3_SECRET_KEY"></Property>
        <Property name="REQUESTS_S3_REGION">us-east-1</Property>
        <Property name="REQUESTS_S3_BUCKET"></Property>
        <Property name="REQUESTS_S3_PREFIX">cloudcms/production</Property>
    </Properties>
    <Appenders>
        <RollingFile name="RequestFile" fileName="${baseDir}/cloudcms-requests.log" filePattern="${baseDir}/cloudcms-requests/cloudcms-requests-%d{yyyy-MM-dd-HH-mm}-%i.log">
            <PatternLayout>
                <pattern>%msg%n</pattern>
            </PatternLayout>
            <Policies>
                <OnStartupTriggeringPolicy/>
                <TimeBasedTriggeringPolicy interval="30"/>
                <SizeBasedTriggeringPolicy size="10M"/>
            </Policies>
            <S3RolloverStrategy accessKey="${REQUESTS_S3_ACCESS_KEY}" secretKey="${REQUESTS_S3_SECRET_KEY}" region="${REQUESTS_S3_REGION}" bucket="${REQUESTS_S3_BUCKET}" prefix="${REQUESTS_S3_PREFIX}" />
        </RollingFile>
    </Appenders>
</Configuration>

API Pagination

By default, any API calls that support pagination will be given a pagination limit of 25. In other words, if your API call doesn't specify how many records it wants back, it will get back 25 records at most. Cloud CMS does this to help protect against code that errantly omits pagination when working with very large record sets. Very large record sets imply long execution times, high memory consumption and so on.

gitana.api.pagination.limit.default=25

Be on the safe side and specify a limit in your paginated calls.

In addition, Cloud CMS will enforce a maximum pagination limit of 1000. If you try to retrieve more than 1000 records in a result set, your result will be capped. You're free to change this maximum in your own Docker installation to suit your needs:

gitana.api.pagination.limit.max=1000

If you need to retrieve more than 1000 results, we recommend making multiple calls to paginate through the total set using the limit and skip options. See our documentation on Pagination.
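
For example, a client can walk a large result set by advancing skip by limit on each call. Below is a minimal sketch using Java's built-in HTTP client; the endpoint URL, access token and JSON parsing step are placeholders for your own values and code.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PaginatedFetch {

    // Placeholder endpoint and token -- substitute your own tenant's values
    private static final String BASE_URL = "https://api.example.com/repositories/REPO_ID/branches/master/nodes";
    private static final String ACCESS_TOKEN = "<access token>";

    public static void fetchAllPages(HttpClient client) throws Exception {
        int limit = 100;   // always pass an explicit limit, well under the configured maximum
        int skip = 0;
        while (true) {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(BASE_URL + "?limit=" + limit + "&skip=" + skip))
                    .header("Authorization", "Bearer " + ACCESS_TOKEN)
                    .GET()
                    .build();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

            // Placeholder: parse the JSON body, process the rows for this page and
            // return how many rows the page contained.
            int rowsReturned = processPage(response.body());
            if (rowsReturned < limit) {
                break;   // a short (or empty) page signals the end of the result set
            }
            skip += limit;
        }
    }

    // Placeholder for your own JSON handling
    private static int processPage(String body) {
        return 0;
    }
}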

Clustering

Each API server that spins up supports clustering. When an API server comes online, it searches for other API servers that might be out there and part of the same cluster. If it finds any, it connects to them and redistributes any cache state to balance out the cluster.

Similarly, if a Cloud CMS API server goes offline, the other servers in the cluster become aware and re-balance as needed. In this way, a Cloud CMS API cluster is an ephemeral thing - servers may join and leave the cluster as demand increases or falls away.

Every API server, regardless of whether you intend it to participate in a multi-server cluster or not, requires that you provide a cluster.group.name and a cluster.group.password:

cluster.group.name=
cluster.group.password=

You can set these to anything you like. However, if you do intend to have other API servers join the cluster, they will also need to specify the same cluster.group.name and cluster.group.password.

When Cloud CMS starts up, it binds a port that it uses to communicate with other cluster members (should they come along). Even if your cluster is just size 1, this port will still be bound.

cluster.port=5800

The default port is 5800 if you don't otherwise specify it. This means that any other API servers can communicate with this API server instance on port 5800.

Cloud CMS will assume the IP address of the first network interface it spots. In most cases, this will resolve to localhost or 127.0.0.1 for development or simple server configurations. For more complex configurations where you may have multiple network interfaces, you can specify which interface(s) should be considered via the following configuration:

cluster.interfaces.enabled=true
cluster.interfaces.interface=

The cluster.interfaces.interface property can either be the IP address of the interface or a comma-delimited value of multiple interfaces to consider in order.
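
For example, to consider a single interface, or two interfaces in preference order (the addresses shown are illustrative):

cluster.interfaces.enabled=true
cluster.interfaces.interface=10.0.1.15,192.168.1.15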

Note that the use of interfaces applies most specifically when using multicast or tcpip discovery providers. For other provider types, such as aws or zookeeper, you will likely not need to bother with this.

In cloud deployment scenarios, where containers are distributed across multiple hosts (and across availability zones), you will need to provide a publicly accessible IP address per server instance. This public-address is an IP address which other Cloud CMS API Servers can use to connect to -this- API server.

cluster.public-address=

The public address identifies the public IP address of the box as seen by the outside world. If you have two EC2 instances, with IP addresses 12.34.56.788 and 12.34.56.789, each box will have a slightly different configuration with cluster.public-address=12.34.56.788 for the first box and cluster.public-address=12.34.56.789 for the second.

Suppose box #1 comes online first. It will bind to 12.34.56.788:5800. Now suppose that box #2 comes online. It will bind to 12.34.56.789:5800. If configured properly, it will then call out to try to find other boxes that are members of the same cluster. It will find the first box and connect to it on 12.34.56.788.
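
Putting this together, the two boxes in this example might be configured like this (the group name and password values are illustrative):

Box #1:

cluster.group.name=mycluster
cluster.group.password=mysecret
cluster.port=5800
cluster.public-address=12.34.56.788

Box #2:

cluster.group.name=mycluster
cluster.group.password=mysecret
cluster.port=5800
cluster.public-address=12.34.56.789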

Thus, the public address is important as it establishes an IP address that the outside world can use to connect to your API server. This IP address must be publicly accessible.

If you're configuring this and get stuck, use Telnet to make sure that your cluster port can be connected to. It must be available from one API server to the other.

Note that the public-address property is typically required for aws, zookeeper and other dynamic and cloud-friendly deployment scenarios.

By default, Cloud CMS will perform the handshake described above. If it can't find something on 5800, it'll make another attempt at 5801. Then 5802 and so on. It will try three times and if nothing works, it'll consider things to have failed. This feature is provided to make it easier to have things work in cases where there are minor port conflicts or multiple API containers running on a single host.

cluster.port-auto-increment=true

Note that port auto incrementing is enabled by default.

Cloud CMS's clustering and discovery of other API servers runs automatically at startup. It then remains active throughout the lifetime of the server. As other servers come and go, your API server will log messages to indicate that it has discovered a new member or that a member left.

By default, Cloud CMS will wait 10 seconds upon startup to "allow things to settle". For low latency or typical environments, this is more than sufficient. However, for slow network scenarios, you may wish to increase this.

cluster.initial.wait.seconds=10

Cluster Discovery

When Cloud CMS servers are brought online, they discover one another and then get on about business. The exact strategy used to discover one another is configurable. The following discovery services are available:

  • Multicast
  • TCP/IP
  • Amazon Web Services
  • ZooKeeper

By default, the Cloud CMS API server is configured to use Multicast. This means that it will work out-of-the-box for scenarios where containers are launched against the host's network (such as when Docker is launched with --net=host).

However, for cases where containers are launched on their own network or launched as part of a Docker Machine running on a cloud provider (such as EC2), you will need to use a different discovery mechanism.

Multicast

Cloud CMS supports multicast for scenarios where containers are deployed to the same network as the host. In this case, everything (host and containers) is bound under the same network interface and multicast can be used as a communication mechanism between the Cloud CMS API servers.

Multicast is typically very good for development servers or even test servers that run on top of a simplified or common network configuration. When Docker is launched with the --net=host option, the network used by the containers is the same as that of the host and multicast applies.

cluster.multicast.enabled=true
cluster.multicast.group=224.2.2.3
cluster.multicast.port=54327
cluster.multicast.loopbackmode.enabled=true

In anything that is production grade (such as cloud deployments), you will typically have your containers running in isolated network environments and so multicast will simply not work.

TCP/IP

Cloud CMS provides a way for you to explicitly name all of the servers participating in your cluster. The members field in the tcpip config lets you explicitly spell out the network-reachable <ip>:<port> addresses of the API servers.

In this way, if you wish to have no dynamic or automatic discovery at all, you can use tcpip discovery to spell out all participants from the beginning.

cluster.tcpip.enabled=true
cluster.tcpip.members=
cluster.tcpip.timeout.seconds=-1
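
For example, a three-server cluster could be spelled out explicitly like this (the addresses shown are illustrative):

cluster.tcpip.enabled=true
cluster.tcpip.members=10.0.1.10:5800,10.0.1.11:5800,10.0.1.12:5800
cluster.tcpip.timeout.seconds=-1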

In general, you should think of tcpip discovery mode as a fallback if nothing else works. Running in tcpip mode trades away any dynamic detection of servers except for those that start up on the given list of IP addresses. This can be good for some scenarios, but for most "cloudy" deployments you will want to use a different discovery service such as aws or zookeeper.

Amazon Web Services

If you're running your containers on Amazon AWS (EC2), then you will likely want to take advantage of the aws discovery service. To use this service, just provide the following:

cluster.aws.enabled=true
cluster.aws.access-key=
cluster.aws.secret-key=
cluster.aws.region=
cluster.aws.timeout.seconds=-1
cluster.aws.tag.key=
cluster.aws.tag.value=
cluster.aws.iamrole=
cluster.aws.securitygroup=

When your API server starts up, it connects to Amazon's API to find other EC2 instances that came online and were running API servers for the same cluster group and password. It then auto-configures for the public IP addresses and ports of those API servers.

This is a very nice and efficient mechanism for discovery. It gives you the advantages of elastic instances (you can add and remove instances on the go) and avoids the need to detect and wire in IP addresses ahead of time.

The required properties are:

cluster.aws.enabled=true
cluster.aws.region=

You must also specify either an access/secret key pair (cluster.aws.access-key and cluster.aws.secret-key) or an IAM role (cluster.aws.iamrole).

All other properties are optional:

  • cluster.aws.timeout.seconds specifies the maximum amount of time to wait for a member of the cluster to discover another member of the cluster.

Filtering

By default, the EC2 discovery process looks across all of the EC2 instances in your region. It connects to each one and tries to authenticate using the cluster group name and password. If the EC2 instance doesn't respond or the cluster parameters do not match, the EC2 instance is filtered out. The remaining set after filtering comprises the cluster members.

For efficiency purposes, you can narrow this lookup by filtering on either tags or security groups, as shown in the example after this list:

  • cluster.aws.tag.key and cluster.aws.tag.value filters to only consider EC2 instances with a matching tag key/value pair.
  • cluster.aws.securitygroup filters to only include EC2 instances in a given security group.
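
For example, to limit discovery to EC2 instances carrying a particular tag (the tag key and value shown are illustrative):

cluster.aws.tag.key=cloudcms-cluster
cluster.aws.tag.value=production
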
Public Address

There are several ways to launch Cloud CMS within AWS. These may include: the AWS Docker Engine for Docker Machine, Amazon's Elastic Container Service, Amazon Elastic Beanstalk or perhaps another way. Depending on how you launch, you'll end up with Docker containers running in a network environment over which you will have some degree of control.

The ideal scenario is one where the networking environment is running in host mode so that the container's networking environment is shared with the host. In this configuration, the default network interface is usually the public interface. The clustering mechanism usually picks up the right "public" IP address in this scenario.

However, in some environments (such as when using multiple container Elastic Beanstalk Dockerrun.aws.json files to launch), you'll simply be given a network environment. These may be bridged environments with multiple interfaces. In these scenarios, it isn't always possible for the clustering mechanism to pick out the right public IP address.

Furthermore, AWS maintains the notion of "public" IP addresses vs. "private" IP addresses for your EC2 instances. In most cases, what you're looking to use is the "private" IP address, and the clustering mechanism ought to be able to pick it out for you.

However, at times when it cannot, you may need to set the cluster.public-address property to nudge things along. This property tells your container how to identify itself to other members in the cluster. Suppose you have Container A and Container B. Container B starts up and calls out to Container A to see if it is a member of the cluster. Container A must reply with a "public address" that matches what Container B expects. For simple networking configurations, this is trivial and automatic. However, if you have multiple interfaces, it is possible for Container B to pick the wrong interface and thus the wrong IP address in response.

In these scenarios, you can force the public address to the private IP of your EC2 instance like this:

cluster.public-address=ec2:private-ip

If you want to force the public IP instead, you can do so like this:

cluster.public-address=ec2:public-ip

ZooKeeper

If you're running your containers in the cloud and are either not running on AWS or elect not to use AWS EC2 Discovery Services for any reason, then you may choose to use ZooKeeper as an alternative.

Apache ZooKeeper provides a directory service whereby Cloud CMS API Servers register themselves as they come online. As additional servers come online, they discover the previously registered servers and so on.

The following properties must be set:

cluster.zookeeper.enabled=true
cluster.zookeeper.url=
cluster.zookeeper.path=

In effect, ZooKeeper provides the same mechanism as AWS Discovery Services but will work for any cloud provider. The only caveat is that you must run ZooKeeper yourself as part of your environment so that any API servers running can connect and utilize it as a service.
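
For example (the ZooKeeper URL and path shown are illustrative):

cluster.zookeeper.enabled=true
cluster.zookeeper.url=zookeeper.mycompany.com:2181
cluster.zookeeper.path=/cloudcms/cluster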

Binary Storage

Cloud CMS lets you configure one or more back-end Binary Storage providers that the system will use to persist and retrieve binary files for a given datastore. Binary Storage providers are sensitive to datastore scoped configurations allowing tenants to customize storage on a per-datastore, per-project and per-platform basis.

At its core, Binary Storage providers are configured at the Spring bean level. You can define as many Binary Storage providers as you wish. Each Binary Storage provider instance is a singleton that plays the role of manufacturing Binary Storage instances when requested by upstream services. The provider framework takes on responsibility for caching storage instances where appropriate.

Binary Storage providers are used to store binary files and attachments where needed in the product. Attachments to content nodes, for example, are stored via a Binary Storage provider, as are images attached to principals, archives, projects and more.

By default, Cloud CMS has several providers wired for you out-of-the-box so as to make it simple to set up global binary persistence to a single location. This is the most common use case and it doesn't preclude further configuration and customization later.

The following Binary Storage provider types are available out-of-the-box:

  • mongodb
  • file
  • AWS_S3
  • caching
  • fallback
  • s3gridfs

To configure the global Binary Storage provider, set the following property to one of the values above:

org.gitana.platform.services.binary.storage.provider={providerType}

If not otherwise specified, the default is to use a provider type of mongodb (which is for GridFS).

Mongo DB (Grid FS) provider

This provider is configured by default and will be used if you don't override the global setting.

GridFS is a "file system" implementation that is provided by Mongo DB whereby binary files are written into Mongo DB. The advantage here is that your files will reside in Mongo DB (keeping everything in one place) and will enjoy all of the replica set and shard architecture advantages that Mongo DB affords. Furthermore, GridFS is low latency and works nicely in a clustered or distributed setting where all servers in the cluster can fall back on it as a single source of the truth.

One downside with GridFS is that, since everything goes into MongoDB, your MongoDB data partitions and volumes need to grow to accommodate the total storage size of the binary files. If you start putting really big binary files into Cloud CMS, your MongoDB storage requirements will increase likewise. This can have cost implications and may introduce some challenges from a DevOps perspective in terms of managing EBS volumes and the like.

GridFS is a good solution if your total binary file size is predictable and manageable.

To enable the mongodb GridFS as the global Binary Storage provider, set the following:

org.gitana.platform.services.binary.storage.provider=mongodb

File System provider

For non-clustered environments where you only have a single Cloud CMS API server, the file system Binary Storage provider is available and will let you store all of your binary files on local disk. This is ideal for development boxes.

This provider is not distributed or cluster-aware. It cannot be used in clustered API deployments in any capacity since each server in the cluster is essentially managing its own local store.

To enable the file system as the global Binary Storage provider, set the following:

org.gitana.platform.services.binary.storage.provider=file

You must also specify the storagePath where files are to be written:

org.gitana.platform.services.binary.storage.provider.file.storagePath=/data/cms/binaries

This directory must exist and the API process must have sufficient read/write privileges to the directory.

Amazon S3 provider

Cloud CMS can be configured to write and read binary files from Amazon S3 directly. This enables your API cluster to use Amazon's scalable S3 storage without concern for growth in local disk volumes. Since each server in the cluster communicates to S3 and uses S3 as a common resource, this Binary Storage provider implementation is cluster-safe and ready to go.

To enable the AWS_S3 backend as the global Binary Storage provider, set the following:

org.gitana.platform.services.binary.storage.provider=AWS_S3

You must then specify the Amazon API keys to use and any additional properties for the S3 connection pool:

org.gitana.platform.services.binary.storage.provider.AWS_S3.accessKey=
org.gitana.platform.services.binary.storage.provider.AWS_S3.secretKey=
org.gitana.platform.services.binary.storage.provider.AWS_S3.bucketName=
org.gitana.platform.services.binary.storage.provider.AWS_S3.maxConnections=500

One downside to using S3 directly is latency. Since every binary operation requires a network connection back to S3, latency will eventually prove to be an issue. Fortunately, Cloud CMS provides per-server local caching for optimized performance. Read on to learn more about this.

Caching provider

A caching provider lets you add cluster-aware caching to any other provider. It wraps an existing provider with caching so that binary files are written to local disk and served back from local disk whenever possible. As assets are updated, local disk cache is maintained and purged as needed across the cluster.

org.gitana.platform.services.binary.storage.provider=caching

You must set up caching like this:

org.gitana.platform.services.binary.storage.provider.caching.primaryProviderType=AWS_S3
org.gitana.platform.services.binary.storage.provider.caching.cachePath=/data/cms/binaries

In this example, the caching provider is set up to wrap around the AWS_S3 provider to offer cluster-aware, disk-based caching. Binary files are written to disk at /data/cms/binaries.

If cachePath is not provided, a temp directory path will be used (and cached disk state will not survive server restarts).

Fallback provider

The fallback Binary Storage provider provides a safe way to layer a "master" provider on top of another (or several others) with the objective of gradually migrating binary dependencies from the other providers to the master. The fallback Binary Storage provider takes a list of providers and binds them together into this configuration.

The first provider in the list is the "master" provider. The master provider is the preferred provider.

  • When binary files are read, the master provider is consulted first. If the file isn't found there, the other providers are consulted in turn. If none of the providers have the file, a 404 is returned. If any of the providers have the file, it is streamed back.
  • When binary files are created or updated, the master provider receives the file.
  • When binary files are deleted, they are deleted from ALL providers.

This is ideal for situations where you may have data already existing in one provider (GridFS) and want to transition to using another provider (S3). In this case, you'd set S3 as the primary provider and GridFS as the other provider in the list. Binary data that cannot be found in S3 will fallback to being served from GridFS. However, any new binary data going forward will be written solely to S3.

To enable the fallback provider as the global Binary Storage provider, set the following:

org.gitana.platform.services.binary.storage.provider=fallback

And then configure the provider like this:

org.gitana.platform.services.binary.storage.provider.fallback.providerTypes=AWS_S3,mongodb

S3 with Grid FS fallback and file caching

A specific provider is offered out-of-the-box to support S3 with file caching turned on. It further supports GridFS as a fallback in case binary content pre-existed therein.

org.gitana.platform.services.binary.storage.provider=s3gridfs

You must then configure the provider like this:

org.gitana.platform.services.binary.storage.provider.s3gridfs.cache=true
org.gitana.platform.services.binary.storage.provider.s3gridfs.cachePath=/data/cms/binaries

If cachePath is not provided, a temp directory path will be used. Note that a temp directory means that the local disk cache (per server) will not survive server restarts. First requests for binary assets to newly started servers will pull down from S3 and begin rebuilding the cache anew.

We recommend using s3gridfs in production and as a simplified means of configuring S3 in production clusters.

Mongo DB

Mongo DB provides the primary data store for Cloud CMS. Cloud CMS creates a connection pool that it uses to communicate with Mongo DB while Cloud CMS is in service. You can configure Cloud CMS to connect to Mongo DB running either as a Docker container or as a standalone service.

In addition, you can configure Cloud CMS to connect to Mongo DB running as a standalone service, in a replica set or in a sharded configuration.

Hosts

Use the mongodb.hosts setting to specify a comma-delimited set of <host>:<port> entries. Each entry should be the network-accessible address of either a mongod process (in the case of a standalone or replica set configuration) or a mongos process (in the case of a sharded configuration).

To connect to a single mongod or mongos process, you might use:

mongodb.hosts=mongodb.mycompany.com:27017

Or connect to multiple servers in a replica set:

mongodb.hosts=repl1.mycompany.com:27017,repl2.mycompany.com:27017,repl3.mycompany.com:27017

Cloud CMS will initialize the MongoDB driver connection and manage that connection to best take advantage of what was supplied. If you supplied a list of replicas, for example, Cloud CMS will automatically migrate between members of the replica set on failure or when you switch primary.

Authentication

By default, Cloud CMS assumes that connectivity to Mongo DB is unauthenticated. In other words, it assumes that Mongo DB has been configured in such a way as to not require authentication.

This is generally acceptable for development. But for production environments, you will want to make sure that Mongo DB is configured for authentication, that a user exists in Mongo DB with sufficient access privileges and that you supply the username and password of that user like this:

mongodb.default.authentication.required=true
mongodb.default.authentication.username=
mongodb.default.authentication.password=

SSL

By default, Cloud CMS assumes that connectivity to Mongo DB does not use SSL. If you wish to enable SSL, set the following property:

mongodb.default.ssl.enabled=true

Grid FS

By default, Cloud CMS will store binary files into Mongo DB's Grid FS storage system. This is specified via the following configuration option:

org.gitana.platform.services.binary.storage.provider=mongodb

You can use this setting to change to a different storage provider or to activate a custom implementation that you've built.

Count

Mongo DB has an interesting "feature" (which some might argue is a bug and others would religiously defend is not) in that count() operations against collections take O(n) time. This means that the amount of time needed to count rows in a collection will increase as the collection size increases.

This opens up a downside in that a request could come along, perform a count() and therefore take an unpredictable amount of time. To defend against this, Cloud CMS lets you specify the maximum amount of time you wish a count() operation to execute before it is forced to fail. A forced fail means that the operation fails, the database cursor is released, memory is cleaned up and the API server cleanly releases any resources it might be holding on to.

mongodb.defaultMaxCountTimeMs=2000

In addition, you may opt to limit the maximum number of items to count. This is an alternative to defaultMaxCountTimeMs in that you can lock down the maximum count size, letting you be sure that things can't get out of hand. This makes your request calls more predictable but also means that your total record set size may be inaccurate for large counts.

mongodb.defaultMaxCount=100000

An example - suppose defaultMaxCountTimeMs were set to 10 seconds and there were 10 million items in a collection. If you ran a count() and it took 11 seconds to execute, an exception would be raised and the operation would fail. You could raise defaultMaxCountTimeMs to 11 seconds and then things would work (but the operation would still take 11 seconds to complete).

But what if you really didn't care whether the result had 10 million items or 1000 items? Your end users aren't going to paginate through 10 million entries, are they? (Or maybe they are... it is your call.)

Still, suppose we knew they'd never do that. Perhaps our UI front end doesn't even provide pagination or we have some other kind of UI control which is far more intuitive. Anyway, in that case, we could set defaultMaxCount to 1000.

Now when the count() operation occurs, it will be extremely fast, simply because it takes far less time to count the first 1000 results than the first 10 million.

Find

Another potentially expensive operation in Mongo DB is a find(). An end user could run a query that runs for a very long time. While the query is running, the Mongo DB database connection is consumed, a thread is hung in the API server and the end user is waiting.

In general, when building scalable and fail-fast applications, we'd rather nip this in the bud. To do so, we can limit the maximum amount of time a find() operation can run like this:

mongodb.defaultMaxFindTimeMs=60000

Now, if the find() operation runs for more than the prescribed amount of time, an exception is raised, the operation fails and the resources released. Nice and clean.

Slow Queries

While developing your front-end applications, you'll occasionally experience slow queries that build up as you put in more and more content. These queries are usually slow because they haven't been indexed. Cloud CMS lets you add custom indexing to branches and so you'll want to do that.

To help you along, Cloud CMS offers the ability to log slow queries (with their explanations) to its log file. This is complementary to the tenant log file that you already find in Cloud CMS; however, it is available to your developers and administration team. To enable it, set the following to true (it defaults to false):

mongodb.query.explain=true

This will produce a lot of explanations and so we only recommend this on development or non-production environments.

Write Concern

By default, the Write Concern for MongoDB is set to ACKNOWLEDGED. This means that any writes to the MongoDB database will wait for acknowledgement from the DB before proceeding. This is a relatively safe way to run as it ensures that MongoDB is aware of any data that was committed and puts MongoDB in a position where it has the opportunity to control the situation from a disaster recovery perspective.

That said, you may wish to change the Write Concern to make it more robust. The JOURNALED setting, for example, tells Cloud CMS to wait for MongoDB to acknowledge that the data was written to its journal before continuing. This takes a little longer but ensures that MongoDB can fully recover from its own journal via a repair or on next startup.

If you are running a replica set, then W1, W2 and W3 tell Cloud CMS to wait until the data has been successfully written to 1, 2 or 3 members of the replica set respectively. You may also choose to set this to MAJORITY to tell Cloud CMS to wait until the majority of replica set members commit the data before proceeding.

With replica sets, it is important to understand that the primary (W1) must be committed and the other members are eventually consistent by default. You may choose to set the non-primaries as slaveOk (done within MongoDB) to support reads from non-primary members. However, if you do this, you will need to make sure that the WriteConcern configured here enforces consistency across the non-primary members on commit (in other words, the data should be written across all members or none on each commit).

To adjust the write concern, use the following setting:

mongodb.default.writeconcern=ACKNOWLEDGED
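
For example, to have writes wait for acknowledgement from a majority of replica set members before proceeding:

mongodb.default.writeconcern=MAJORITY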

Elastic Search

Elastic Search provides a secondary index of searchable content for Cloud CMS. It provides full-text search and structured query from within Cloud CMS using the Elastic Search DSL.

When content is written into Cloud CMS, it is primarily written into Mongo DB. It is then secondarily written into Elastic Search. As content is created, updated and deleted, Elastic Search's indexes are kept perfectly in sync so that text-based search is available against every branch in a repository.

Elastic Search runs as a separate service from the API server. Each API server, upon starting up, creates a connection pool to your Elastic Search endpoint that it uses to communicate, execute queries and maintain its search indexes in real time.

Cloud CMS supports Elastic Search clustering. It only needs to know the IP address of one member of the cluster and the cluster name to connect.

The configuration properties are specified like this:

elasticsearch.remote.cluster.name=elasticsearch
elasticsearch.remote.hosts=
elasticsearch.remote.defaultPort=9300

Where hosts is a comma-delimited set of <host>:<port> or simply <host> entries.
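
For example, to connect to a two-node Elastic Search cluster (the host names and cluster name shown are illustrative):

elasticsearch.remote.cluster.name=cloudcms-search
elasticsearch.remote.hosts=es1.mycompany.com:9300,es2.mycompany.com:9300
elasticsearch.remote.defaultPort=9300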

Email Provider

Cloud CMS allows you to configure an email provider that will be used to send email to people that you invite to participate in your project. The default email provider is used to dispatch those invitations and is also used to send emails during a coordinated registration process.

To set your email provider, use the following properties:

oneteam.emailprovider.host=
oneteam.emailprovider.port=-1
oneteam.emailprovider.username=
oneteam.emailprovider.password=
oneteam.emailprovider.smtp.enabled=true
oneteam.emailprovider.smtp.secure=true
oneteam.emailprovider.smtp.requiresauth=true
oneteam.emailprovider.smtp.starttls=true
oneteam.emailprovider.from=
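
For example, a typical SMTP configuration might look like this (the host, credentials and from address shown are placeholders for your own provider's values):

oneteam.emailprovider.host=smtp.example.com
oneteam.emailprovider.port=587
oneteam.emailprovider.username=cms-mailer
oneteam.emailprovider.password=<password>
oneteam.emailprovider.smtp.enabled=true
oneteam.emailprovider.smtp.secure=true
oneteam.emailprovider.smtp.requiresauth=true
oneteam.emailprovider.smtp.starttls=true
oneteam.emailprovider.from=no-reply@example.com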

Job Dispatcher

Each API server that spins up performs two primary functions. The first is to handle incoming web requests and turn them around quickly. The second is to process background jobs that are queued up in the distributed job queue. Background jobs typically include content indexing, exporting and complex mimetype conversions or data extractions.

These kinds of jobs "take a while" and so they're frequently moved off the request and placed in the asynchronous job queue. All servers in the cluster (or in an API worker cluster) that are configured to process jobs work together to coordinate the distribution of job work across the cluster members.

By default, every API server performs both functions (web request handler and job worker). However, you can control this behavior via the following flag:

gitana.jobdispatcher.enabled=true

For more advanced and scalable deployments of Cloud CMS, you will want to run two tiers of API servers -- one tier for handling web requests and the other for working on background jobs. That way, any long-running intensive work won't steal CPU cycles from your web request handling. You can use the flag above to achieve this by enabling the job dispatcher only for the API servers in the job worker tier.
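
For example, under this split the two tiers might be configured like this (all other settings identical across both tiers):

API servers in the web request tier:

gitana.jobdispatcher.enabled=false

API servers in the job worker tier:

gitana.jobdispatcher.enabled=true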

Any server running the job dispatcher can configure the maximum number of concurrent jobs using this setting:

gitana.jobqueue.server.maxConcurrentJobs=25

You can also configure the maximum number of jobs that the job dispatcher will dispatch per tenant platform:

gitana.jobqueue.server.maxConcurrentJobsPerPlatform=5

If you're running a multi-tenant offering, you can use this to ensure that no single tenant may draw too much job handling. If you're running single tenant, you may disable this by setting:

gitana.jobqueue.server.maxConcurrentJobsPerPlatform=-1

In addition, individual job workers can be turned on and off. This allows you to segment your API workers so that you can allocate job workers to certain servers (so that heavier tasks can be allocated to higher powered servers and so on).

gitana.workers.webcapture.enabled=true
gitana.workers.transfer.import.enabled=true
gitana.workers.transfer.export.enabled=true
gitana.workers.transfer.copy.enabled=true
gitana.workers.bulkTransactionCommit.enabled=true
gitana.workers.bulkTransactionCleanup.enabled=true
gitana.workers.export.enabled=true
gitana.workers.create-project.enabled=true
gitana.workers.binaryStorageMigration.enabled=true
gitana.workers.indexDatastore.enabled=true
gitana.workers.indexPlatform.enabled=true
gitana.workers.oneteamStartProjectCopy.enabled=true
gitana.workers.replication.export.enabled=true
gitana.workers.replication.import.enabled=true
gitana.workers.publication.export.enabled=true
gitana.workers.publication.import.enabled=true
gitana.workers.pdfPreview.enabled=true
gitana.workers.rendition.enabled=true
gitana.workers.generateThumbnails.enabled=true
gitana.workers.nodeListReduction.enabled=true
gitana.workers.index.enabled=true
gitana.workers.filefolder-reindex.enabled=true
gitana.workers.search-reindex.enabled=true
gitana.workers.finalize-release.enabled=true
gitana.workers.create-release.enabled=true
gitana.workers.indexBranch.enabled=true
gitana.workers.interactionPageInsight.enabled=true

Web Shot

Cloud CMS features the ability to take snapshots of web pages and other HTTP resources. Most of this is integrated into the product automatically. The Cloud CMS analytics engine, for example, captures snapshots automatically for pages whose interaction is being reported upon.

The Cloud CMS Web Shot server provides the ability to capture these snapshots. The Cloud CMS API calls out to this server when needed.

This setting is optional. If provided, Cloud CMS will capture snapshots for your assets.

To configure the location of the Cloud CMS Web Shot server, use this property:

org.gitana.platform.services.webdriver.webshot.endpoint=

Backdoor Authentication

At times, you will want to configure a secret 'backdoor' password that allows you to log in or impersonate any user in your platform. You can do so by enabling backdoor authentication and specifying a password like this:

org.gitana.platform.services.authentication.backdoor.enabled=false
org.gitana.platform.services.authentication.backdoor.password=

Field Encryption

Cloud CMS generates a salt value that is used to encrypt fields such as password fields or credential keys and secrets. This salt value takes a number of inputs -- one of which is a secret key that you can provide like this:

org.gitana.platform.services.encryption.secret=<anything you like>

Ticket Encryption

Cloud CMS writes back a GITANA_TICKET cookie with every API request. The ticket provides a cookie-based way to store the access token that is required for every request to the Cloud CMS API. In this way, the Cloud CMS API can be used to serve assets directly back to a browser when requests do not originate from a driver but instead from native HTML elements like IMG tags (with src attributes).

This GITANA_TICKET can be encrypted so that the access token is never out in the open. To encrypt the ticket, you simply need to provide an encryption secret for the ticket generator, like this:

org.gitana.platform.services.ticket.secret=

Deletions

Cloud CMS retains deleted nodes in per-branch collections (known as "deletion" collections). The product provides services so that editorial teams can quickly restore deletions back to the originating branch in the event that something was deleted accidentally. The "deletions" collection (per-branch) provides a fast and efficient way to discover recent deletions and restore without a deep interrogation of the master node list.

Note that each branch's "deletions" collection is essentially a copy of the node made available to support query and recovery quickly. The actual master copy of the deleted node is always contained in the master node record. Cloud CMS is a copy-on-write system and so a master record of all nodes is always retained. In the end, your data is always recoverable.

That said, the facility of the deletions collection can be customized according to your editorial needs. Given that it tracks all deletions, the collection can grow to be quite large and so you may want to set automatic collection capping so that the total number of available deletions is limited in size over time.

To enable auto-capping, set the following properties, for example:

org.gitana.platform.services.deletion.DeletionService.autocap.enabled=true
org.gitana.platform.services.deletion.DeletionService.autocap.maxSize=10000
org.gitana.platform.services.deletion.DeletionService.autocap.resetSize=7500

The maxSize setting describes the maximum number of deleted records allowed before the collection is capped back to the resetSize. In this case, if the collection has 10000 deletion records in it and you delete just one more node, the deletion records will drop their oldest entries and the collection will resize to 7500.

Third Party / OS Libraries

The Cloud CMS API runs in an OS environment that has been configured and optimized to support its runtime needs. These include OS updates and third party installations of common libraries.

ImageMagick

https://www.imagemagick.org/script/index.php

Provides Cloud CMS with services for mimetype transformation, extraction and more for image formats.

org.gitana.platform.services.transform.imagemagick.basepath=/usr/bin

FFMpeg

https://ffmpeg.org

Provides Cloud CMS with video services including mimetype conversion, extraction and image manipulation of frames.

org.gitana.platform.services.transform.ffmpeg.basepath=/usr/bin

LibreOffice

https://www.libreoffice.org

Provides Cloud CMS with support for OpenDoc and Microsoft Office Formats.

org.gitana.platform.services.transform.openoffice.enabled=true
org.gitana.platform.services.transform.openoffice.path=/usr/lib64/libreoffice
org.gitana.platform.services.transform.openoffice.portNumbers=9100
org.gitana.platform.services.transform.openoffice.maxTasksPerProcess=200
org.gitana.platform.services.transform.openoffice.taskExecutionTimeout=120000
org.gitana.platform.services.transform.openoffice.taskQueueTimeout=30000

Clam AV Antivirus

https://www.clamav.net

Provides Cloud CMS with support for virus scanning and file quarantine on upload and storage.

org.gitana.platform.services.antivirus.clamav.enabled=true
org.gitana.platform.services.antivirus.clamav.executable.path=/usr/bin/clamscan

GeoLite2 Database

https://dev.maxmind.com/geoip/geoip2/geolite2

Provides Cloud CMS with the ability to interpret latitude/longitude information. The database for GeoLite2 is included with the Cloud CMS API.

org.gitana.platform.services.geolocation.databaseFilePath=/opt/geoip2/GeoLite2-City.mmdb

Zip/Unzip

org.gitana.platform.services.zip.executable.path=zip
org.gitana.platform.services.zip.timeout=1800000
org.gitana.platform.services.unzip.executable.path=unzip
org.gitana.platform.services.unzip.timeout=1800000

Web / HTTP

The API runs inside of a servlet container. You can adjust the server-side request-handling characteristics through the following properties.

Multipart File Handling

Set the maximum size (in bytes) of any multipart requests. This is the maximum size of the entire multipart request (as a summation of the sizes of all parts). This is set to 1GB by default.

org.gitana.platform.services.webapp.multipart.maxUploadSize=1073741824

Set the maximum size (in bytes) of any individual parts in the multipart request. This is set to 512MB by default.

org.gitana.platform.services.webapp.multipart.maxUploadSizePerFile=536870912

Set the maximum size (in bytes) for a part that is allowed to reside in a memory buffer during multipart processing. This is set to 128KB by default.

org.gitana.platform.services.webapp.multipart.maxInMemorySize=131072

Set the default encoding for any parts. This is set to the servlet spec ISO-8859-1 by default.

org.gitana.platform.services.webapp.multipart.defaultEncoding=ISO-8859-1