Cloud Connected

Thoughts and Ideas from the Gitana Development Team

Gitana 4 Roadmap – Job queue performance and management

With the early arrival of Gitana 4.0, we’ve delivered a number of important improvements to our customers: the user interface has been enhanced to provide a better editorial experience, the publishing engine now uses deltas and differencing for faster releases with smaller payloads, and we’ve baked both generative and discerning AI services into the foundation of our product, just to name a few.

In this article, I’d like to provide some insight into our future direction. Specifically, I’d like to highlight our active investments into our Distributed Job Engine.

Distributed Job Engine

The Distributed Job Engine is a cluster-wide service that spawns and coordinates workers to execute long-running tasks or jobs. These are sometimes thought of as background tasks. The Job Engine is used to coordinate publishing and deployment operations, perform real-time replication, integrate with AI services, index content for full-text search, transform MIME type payloads from one format to another, extract text and metadata from files, classify and tag documents into taxonomies and more.

While the job engine is efficient and horizontally scalable in 4.0, we have identified avenues for improvement that are truly exciting. These include scheduling improvements, the introduction of fast lane assignment, dynamic reallocation, predictive routing and enhancements to reporting, events and notifications.

Scheduling

Our support for priority queues will improve to allow for a configurable, rules-based assignment of resource requirements and limits for individual jobs. This will allow the scheduler to allocate jobs based not only on priority but also on required service levels and resource needs. It will empower the scheduler to allocate workers for higher priority jobs onto pods that guarantee a required service level and affinity (i.e. adequate CPU and memory to tackle the task at hand).

When all is said and done, developers will be able to launch jobs that schedule with higher or lower priority and execute within a much tighter deviation for quality of service.
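To make the idea concrete, here is a minimal sketch of priority scheduling that also respects resource requirements. It is an illustration only, not the Gitana scheduler: the `Job`, `Pod` and `schedule` names, the fields and the first-fit placement rule are all assumptions made for this example.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    priority: int                          # lower number = scheduled first
    name: str = field(compare=False)
    cpu: float = field(compare=False)      # required CPU cores (hypothetical field)
    memory_mb: int = field(compare=False)  # required memory (hypothetical field)


@dataclass
class Pod:
    name: str
    free_cpu: float
    free_memory_mb: int

    def fits(self, job: Job) -> bool:
        return self.free_cpu >= job.cpu and self.free_memory_mb >= job.memory_mb


def schedule(jobs, pods):
    """Assign jobs in priority order to the first pod with enough free CPU and memory."""
    queue = list(jobs)
    heapq.heapify(queue)
    assignments = {}
    while queue:
        job = heapq.heappop(queue)
        pod = next((p for p in pods if p.fits(job)), None)
        if pod is None:
            assignments[job.name] = None  # no capacity right now; job stays queued
            continue
        pod.free_cpu -= job.cpu
        pod.free_memory_mb -= job.memory_mb
        assignments[job.name] = pod.name
    return assignments


pods = [
    Pod("small", free_cpu=1, free_memory_mb=1024),
    Pod("large", free_cpu=4, free_memory_mb=8192),
]
jobs = [
    Job(priority=5, name="reindex", cpu=1, memory_mb=512),
    Job(priority=1, name="publish", cpu=2, memory_mb=4096),
    Job(priority=9, name="bulk-transform", cpu=4, memory_mb=4096),
]
result = schedule(jobs, pods)
print(result)
```

In this sketch the high-priority `publish` job is placed first and lands on the larger pod, while the low-priority `bulk-transform` job waits because no pod has enough free CPU left.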

Fast Lanes / Multiple Queues

Our Scheduling improvements will also include a simpler model, i.e. the notion of “fast lanes”. In effect, these are separate queues whose parameters are specified in the queue configuration itself. This frees developers from having to assign those parameters at the time that a job is submitted.

Customers will be able to separate out “fast lane” queues that automatically allocate to pods with more memory and more available resources. Some queues can be configured to take priority over others. This makes it easy for customers to monitor the quality of service of their executing jobs and make adjustments at the queue level to accommodate variations in demand.
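The fast lane idea can be sketched as queue-level configuration. The example below is hypothetical: the `QueueConfig` fields, the queue names and the pod sizes are assumptions chosen for illustration and do not reflect the actual Gitana configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueConfig:
    name: str
    priority: int            # lower number = served first
    min_pod_memory_mb: int   # pods must offer at least this much memory


# Hypothetical queue definitions; parameters live here, not on each job.
QUEUES = {
    "fast-lane": QueueConfig("fast-lane", priority=0, min_pod_memory_mb=8192),
    "default": QueueConfig("default", priority=10, min_pod_memory_mb=1024),
}

# Hypothetical pods: name -> memory in MB
PODS = {"pod-a": 2048, "pod-b": 16384}


def submit(job_name, queue_name="default"):
    """A job only names its queue; scheduling parameters come from the queue config."""
    cfg = QUEUES[queue_name]
    return {"job": job_name, "queue": cfg.name, "priority": cfg.priority}


def eligible_pods(queue_name):
    """Pods that satisfy the queue's memory requirement."""
    cfg = QUEUES[queue_name]
    return sorted(p for p, mem in PODS.items() if mem >= cfg.min_pod_memory_mb)


def drain_order(submissions):
    """Order jobs by queue priority, keeping submission order within a queue."""
    return [s["job"] for s in sorted(submissions, key=lambda s: s["priority"])]


submitted = [submit("nightly-export"), submit("breaking-news", "fast-lane")]
order = drain_order(submitted)
print(order, eligible_pods("fast-lane"))
```

Adjusting quality of service then becomes a matter of editing one queue entry rather than touching every call site that submits a job.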

Dynamic Reallocation

Workers that execute in the cluster can transition jobs into different states. They can even pause jobs, interrupt them or reschedule them. However, when priority work arrives, long-running and lower priority jobs sometimes need to be not only paused, but reallocated onto different pods running in the cluster.

Dynamic reallocation provides the ability for jobs to be paused, have their state passivated and then be remounted onto a new pod running elsewhere in the cluster, either immediately or at a later point in time.

While this ability exists for some job types in Gitana 4.0, we will be extending it to all job types. This will support some of our improvements to priority scheduling by allowing the scheduler to query, interpret and potentially reallocate jobs that are already in-flight.
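As a rough sketch of the pause, passivate and remount cycle, the example below serializes a paused job's progress to JSON and resumes it under a different pod name. The `Job` class, the `passivate` and `remount` helpers and the cursor-based state are assumptions for illustration; the real engine's state format is not described here.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: str
    items: list
    cursor: int = 0                  # index of the next item to process
    state: str = "RUNNING"
    done: list = field(default_factory=list)

    def run(self, pod, max_items=None):
        """Process items on the given pod, optionally stopping early (a pause point)."""
        self.state = "RUNNING"
        budget = len(self.items) if max_items is None else max_items
        while self.cursor < len(self.items) and budget > 0:
            self.done.append(f"{self.items[self.cursor]}@{pod}")
            self.cursor += 1
            budget -= 1
        self.state = "FINISHED" if self.cursor == len(self.items) else "PAUSED"


def passivate(job):
    """Serialize a paused job so that nothing ties it to the old pod's memory."""
    return json.dumps(
        {"job_id": job.job_id, "items": job.items, "cursor": job.cursor, "done": job.done}
    )


def remount(snapshot):
    """Rebuild the job from its passivated state, ready to run on any pod."""
    data = json.loads(snapshot)
    return Job(data["job_id"], data["items"], data["cursor"], "PAUSED", data["done"])


job = Job("export-1", items=["a", "b", "c"])
job.run("pod-1", max_items=2)   # priority work arrives; the job pauses after two items
snapshot = passivate(job)
resumed = remount(snapshot)
resumed.run("pod-2")            # remaining work continues on another pod
print(resumed.state, resumed.done)
```

The key design point in this sketch is that all progress lives in the serialized snapshot, so the job can be picked up wherever capacity is available.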

Predictive Routing

With additional metrics being gathered for jobs and executing workers, we will see increased usage of predictive artificial intelligence models to make determinations about optimal scheduling.

These models use historical information about the past performance of jobs to make future decisions on how best to allocate jobs onto worker pods. These decisions incorporate predictions about a job that take into account factors like potential execution time, memory and CPU consumption.
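A very simple stand-in for such a model is to average past runs per job type and route on the prediction. The sketch below does only that. The history data, the `predict` and `route` functions and the best-fit rule are all hypothetical, and a production system would presumably use a much richer model.

```python
from statistics import mean

# Hypothetical history: job type -> list of (duration_seconds, peak_memory_mb)
HISTORY = {
    "index": [(10, 400), (14, 600), (12, 500)],
    "transform": [(90, 3000), (110, 5000)],
}


def predict(job_type):
    """Predict duration (mean of past runs) and memory (worst past run)."""
    runs = HISTORY.get(job_type)
    if not runs:
        return None
    return {
        "duration_s": mean(r[0] for r in runs),
        "memory_mb": max(r[1] for r in runs),
    }


def route(job_type, pods):
    """Pick the smallest pod whose free memory still covers the prediction (best fit)."""
    prediction = predict(job_type)
    if prediction is None:
        # No history yet: fall back to the pod with the most free memory.
        return max(pods, key=pods.get)
    candidates = {p: free for p, free in pods.items() if free >= prediction["memory_mb"]}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


# Hypothetical pods: name -> free memory in MB
pods = {"pod-small": 1024, "pod-medium": 4096, "pod-large": 8192}
print(route("index", pods), route("transform", pods), route("unknown", pods))
```

Best-fit placement keeps the larger pods free for the jobs that are predicted to need them.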

For jobs that execute on content in branches, these predictive services will also aid scheduling decisions that are predicated on branch locks, the number of content items being operated upon and more. These factors will play an important role in increasing parallelism and improving throughput for operations that would otherwise block on branch-level locking.

Reporting, Events and Notifications

The additional metrics collected will be available via the API and from within the user interface. Customers will be able to inspect individual jobs (as they can now). But they’ll also be able to inspect queues to understand and validate the intended quality of service for any individual queue.

Each queue will allow for custom limits and event handlers to be configured. When an individual queue’s quality of service tests those limits, an event is raised that will trigger an event handler.

Customers can use this feature to send notifications (such as an email, Slack notification or SMS message). Or they can configure automated actions or even server-side scripted code that runs and handles the event as they best see fit.
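The limit-and-handler pattern described above might look something like the following sketch. The `MonitoredQueue` class, the `max_wait_seconds` limit and the event fields are assumptions for illustration and are not the actual Gitana API. Here the handler simply collects a message where a real one might send an email or a Slack notification.

```python
from dataclasses import dataclass, field


@dataclass
class MonitoredQueue:
    name: str
    max_wait_seconds: int                      # hypothetical quality-of-service limit
    handlers: list = field(default_factory=list)

    def on_limit_exceeded(self, handler):
        """Register a callable that receives the event when the limit is exceeded."""
        self.handlers.append(handler)

    def report_wait(self, job_id, waited_seconds):
        """Record how long a job waited; raise an event if the limit was exceeded."""
        if waited_seconds <= self.max_wait_seconds:
            return False
        event = {
            "queue": self.name,
            "job": job_id,
            "waited": waited_seconds,
            "limit": self.max_wait_seconds,
        }
        for handler in self.handlers:
            handler(event)
        return True


notifications = []


def notify(event):
    # Stand-in for an email, Slack or SMS notification.
    notifications.append(
        f"Queue {event['queue']}: job {event['job']} waited "
        f"{event['waited']}s (limit {event['limit']}s)"
    )


queue = MonitoredQueue("fast-lane", max_wait_seconds=30)
queue.on_limit_exceeded(notify)
first = queue.report_wait("j1", 12)
second = queue.report_wait("j2", 45)
print(notifications)
```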

Summary

We’re really excited at Gitana about these features. These improvements to scheduling will result in increased throughput and even better performance for our customers. We’re also excited to give our customers more control and visibility into their job executions.

ChatGPT and Cloud CMS

Lately, the new development taking the tech world and media by storm is ChatGPT, an incredible new chatbot from OpenAI that is capable of producing clear and well-worded text of all kinds, from instructions for building a treehouse to poems written from the perspective of a pirate. While the technology is not perfect, its potential is immense, and it got us thinking: how might an AI of this caliber be applied to the future of content management?

As it turns out, the training process for OpenAI's models comprehensively scraped sites across the internet, and this included our own well-indexed documentation for both Cloud CMS and our forms engine, Alpaca JS. This meant that ChatGPT had an understanding of all sorts of Cloud CMS concepts and was able to provide explanations and answer questions about them, although sometimes with small inaccuracies. It can even generate code samples. Take this example custom field extension that it was able to generate:

customField.png

What got us particularly excited was the bot's ability to take English descriptions of content models and convert them into usable JSON. Take this example we generated for a content model of a car:

generatedModel.png
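As a rough, hypothetical illustration of the kind of JSON involved (this is not the actual output shown in the screenshot), a description such as "a car has a make, a model, a year and a color" could map onto a JSON-Schema-style document like the one built below. The property names are invented for this example, and any Cloud CMS-specific keys are left out.

```python
import json

# Hypothetical content model for a car, written as a JSON-Schema-style document.
# Property names are illustrative; any Cloud CMS-specific keys are omitted.
car_model = {
    "title": "Car",
    "type": "object",
    "properties": {
        "make": {"type": "string", "title": "Make"},
        "model": {"type": "string", "title": "Model"},
        "year": {"type": "integer", "title": "Year"},
        "color": {"type": "string", "title": "Color"},
    },
    "required": ["make", "model"],
}

# Serialize to the JSON text an editor would paste or load into a CMS.
car_model_json = json.dumps(car_model, indent=2)
print(car_model_json)
```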

The possibilities here seem incredible! The idea that one could write a plain English description of a content model and receive a usable JSON schema would once not even have been considered a possibility, but it now seems to be at our fingertips. So, we decided to mock up an extension to our UI that allows content models to be generated by OpenAI's text completion API and then sent right into the Cloud CMS UI for immediate editing and use.

askGpt.png

describeModel.png

loadingModel.png

contentModelJson.png

contentModelUi.png

All of this from just a description! There is still a long way to go: depending on how the input text is formatted and written, the output can vary and may be inaccurate and thus unusable. But this technology is improving by leaps and bounds, and GPT-4, which will power the next generation of these chatbots, seems to be on the near horizon and promises even greater advancements over the currently used GPT-3.

Regardless, here at Cloud CMS we are very excited to see where this will go and what wild and crazy things we and our customers will be able to build with these AI tools. We will be looking soon to implement text completion as a new External Service Integration as a way to get AI insights based on fields in your content, so stay tuned!

Requirements for CMS Publishing

What does "publish" mean?

In a CMS, publishing means moving content from a draft state to a published version. It will update the content on the published target(s) with the changes made since the last publish date. This sounds simple enough but the requirements for publishing are usually far more complex and interesting.

What Publishing must do

  1. Publish must work
    This may seem obvious, but it is critical. Editors must be able to confidently publish the desired content to the right place at the right time.

  2. Collaboration
    Publishing is usually more than changing the status of content from “draft” to “live”. Changes to content often need to occur together and therefore get published together. Alternatively, some changes are independent of other changes and should be published separately. Suppose you have an update scheduled to occur every hour of the day, e.g., product updates on Black Friday. Being able to separate each scheduled release in order to review it, and perhaps cancel or update it, is a perfectly reasonable requirement. Cloud CMS uses Editorial Tasks (each a mini branch) to collaborate on changes. When the work is finished, the Editorial Task is published, either immediately or at a scheduled time in the future. This is a perfect mechanism to collaborate on content and then publish as required.

  3. Related content / Structured Content
    A simple example: edit and publish a news article. The news article also has related content such as images. When publishing the news article, any related content should also be published; otherwise the article would appear broken on the target. It may be difficult for the editor to know all of the related content, and working out manually what needs to be published could be very difficult. In Cloud CMS the relationships are defined in the Content Model. Therefore, when content is published in Cloud CMS, the related content is also published. Note: the publish will only publish content that has been updated and the related content that has not already been published.

  4. Scheduling
    Content needs to be published either immediately or at a scheduled time in the future. This may be a single change, a small group of changes, or a major campaign or launch. Often content needs to be prepared in advance, reviewed and then published at a precise time. With Cloud CMS there are no limitations. You can schedule for any date/time in the future. You can even add your changes into an existing scheduled release. You can review your scheduled changes and even go back to past scheduled releases to review what was in that release.

  5. Accurate and Timely reporting and status
    When content is published (or scheduled to be published), the editorial team needs to know its status: whether it is scheduled, whether it has been successfully published on the target, or whether there were any errors.

  6. Merge Conflicts
    A conflict occurs when two users make changes to the same piece of content such that their changes are incompatible. In Cloud CMS, when a user tries to publish the conflicting change, Cloud CMS identifies the issue and asks the editorial team to determine which of the two sets of changes to keep. There is also an option to resolve merge conflicts automatically by always accepting the latest change.

  7. Multiple Publishing Targets
    What if you need to publish:

  • different content to different targets?
  • the same content to multiple targets?
  • large and frequent updates to various targets?

Cloud CMS publishing lets you define deployment targets and then, within the project, define the publishing strategies. The editor can therefore publish automatically to the desired targets without needing to call DevOps.

  8. Workflow Approval process
    Usually, some level of review and approval is required in a publishing process.
  • Preview changes
  • Track changes
  • Comment on a set of changes
  • Finally, approve or reject (or cancel)

    With the Cloud CMS publishing Workflow, comments can be added and workflow events defined. Changes can be reviewed at any time, and a preview option is available for any branch (both editorial tasks and scheduled releases are branches, so they can be previewed).

  9. Priority of content publishes
    Some content publishes are more critical than others, e.g., a news alert that must go out now or a message for the CEO. Likewise, an external site update is more important than an internal site update. With Cloud CMS we address this issue in a number of ways:
  • Making the publish as efficient as possible
  • Job Bandwidth: reserving bandwidth for certain projects, jobs, users
  • Job Queues: additional queues for certain projects, jobs, users
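The related content requirement above lends itself to a short sketch: starting from the item being published, follow the relationships and collect everything that still has unpublished changes. The graph, the `DIRTY` flags and the `publish_set` function below are hypothetical and only illustrate the idea; they are not the Cloud CMS implementation.

```python
from collections import deque

# Hypothetical content graph: node id -> ids of related content,
# as would be derived from relationships in the content model.
RELATED = {
    "article": ["hero-image", "author"],
    "hero-image": [],
    "author": ["author-photo"],
    "author-photo": [],
}

# Node id -> True if the node has changes that are not yet on the target.
DIRTY = {"article": True, "hero-image": True, "author": False, "author-photo": True}


def publish_set(root):
    """Walk relationships breadth-first from root; return the nodes that need publishing."""
    seen = {root}
    to_publish = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if DIRTY.get(node, False):
            to_publish.append(node)
        for related in RELATED.get(node, []):
            if related not in seen:   # guard against cycles and duplicates
                seen.add(related)
                queue.append(related)
    return to_publish


to_publish = publish_set("article")
print(to_publish)
```

In this sketch the unchanged `author` node is skipped, but the walk still passes through it so that its changed photo is picked up.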

More information

With Cloud CMS we try to provide solutions for now and the future. We do not want you to meet just 60% or less of your requirements, or to be limited in your use of the CMS. We want you to meet all your requirements now and in the future. Our Publishing Solution has grown with our experience and our customers’ feedback. If you want to find out more, see a demo or request a free trial, please contact us at info@cloudcms.com.