Cloud Connected

Thoughts and Ideas from the Gitana Development Team

Introduction to Changeset Versioning

Cloud CMS provides you with content repositories that are powered by a “changeset” versioning model.  This is a powerful versioning model that you won’t find in most conventional CMS products.  It’s one of the reasons why Cloud CMS is such a great platform for collaboration!

Document-level Versioning

A lot of legacy CMS products feature document-level versioning.  With document-level versioning, when you make a change to a document, the system simply increments a version counter.  You end up with multiple versions of your document.

It might look something like the following:

We all have or had an awesome grandparent who knew how to cook something good. For a recipe stored in a Microsoft Word file, the document-versioning model works pretty well!

Problems with Document-level Versioning

That said, there are some major drawbacks.

  1. Desktop Documents Only.  Document-level versioning is really only good for desktop documents (like Microsoft Office files) where everything (all of your nested images, fonts, etc.) is contained within a single file.

    That’s why Dropbox uses file-level versioning.  It makes sense for people who work almost exclusively with desktop documents.
     
  2. No way to handle Sets of Changes.  If you’re working on mobile applications, web sites, or just about any non-back-office project, your content will be spread over multiple files.

    Think about a web site.  A web site might have hundreds or thousands of files - things like HTML, CSS, JS, image files and much more.  When you publish a web site, you really want to version the full set of files all at once so that you can push, pull and roll back updates to your web site.
     
  3. Bottlenecks.  If you’ve ever worked with Microsoft SharePoint or any document-versioning CMS, then you’re aware of the bottlenecks that get introduced when two people want to work on something at the same time.  Either they both make changes (and you have to manually merge them together) or one person locks the file and the other person sits on their hands.

    Most products that feature document-level versioning do so simply because it’s easy to implement.  However, it leaves your business users with extremely limited tools for collaboration.  This makes collaboration frustrating as it cuts off people’s initiative, creativity and productivity.
     
  4. No ability to scale.  Okay, so let’s suppose now that you want to scale your content ingestion and production capabilities out to the broader world.  You might want to pull in content from Twitter, Facebook or Quora in real-time.  And let a broad community collaborate together…

    Nah, forget it.  With document-level versioning, that’d be like giving everyone a phone and telling them to call each other.

    And then only giving them one phone line.

Changeset Versioning

Fortunately, this problem has been solved.  The solution comes out of the source control world and it is known as distributed “changeset versioning”.

If you’ve ever used Git, Mercurial or any modern source control software, then you’re already familiar with the concept.  It’s been around for a while and has become extremely popular since it enables folks to work unimpeded, fully distributed and without any of the headaches of file locking and so forth.

It should be noted: Cloud CMS is the only Content Management System to offer changeset versioning.  We’re it.  Why?  I suppose because it is hard to implement.

And maybe because everyone else is busy chasing the desktop document problem.  However, if you’ve ever tried to build a web or mobile app or tried consuming social content from Twitter, Facebook, LinkedIn, etc… well, then you know it’s all about JSON, XML, object relationships, lots of composite documents, highly concurrent writes and reads and so on!

Only your sales person will believe that a document-versioning system could be used for that purpose!

Changeset Versioning: The Basics

This article by no means intends to provide a Master’s thesis on how changeset versioning works.  However, let’s delve into the basics!

Let’s start with writing, editing and deleting content.  

When you write content into the Cloud CMS repository, your content gets stored on a “changeset”.  A changeset is a lot like a transparency (from the old transparency projector days).  This is a see-through sheet of plastic that you write on with one of those Sharpie pens.  The projector projects whatever you write up onto the screen.

The cool thing about transparencies is that you can layer them, one on top of the other.  What ends up getting projected is the composite of everything layered together.

So when you write content, the repository basically gets a new transparency and puts your content onto it.

If you make a change, it gets out another transparency, writes your change and layers it on top.

It also does this if you delete something.  It gets out a new transparency, masks (or covers up) your content so that it appears deleted.  

However, your content isn’t really deleted.  It is safe and tucked away somewhere in the stack of transparencies.  It’s just been hidden by the top-most transparency!

You can write as many things onto a changeset (transparency) as you want.  Cloud CMS manages the changesets for you, keeps them in a nice stack and lets you roll back changes if you make a mistake anywhere along the way.
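The transparency analogy can be sketched in a few lines of Python.  This is only a toy model for illustration, not how Cloud CMS is actually implemented.  The names `Repository`, `commit`, `read` and `rollback` are hypothetical and do not come from the Cloud CMS API.

```python
# Toy model of the transparency analogy; all names here are hypothetical.
DELETED = object()  # marker that masks a document on a lower layer


class Repository:
    def __init__(self):
        self.changesets = []  # bottom of the stack first

    def commit(self, writes=None, deletes=()):
        """Add one changeset holding any number of writes and deletes."""
        layer = dict(writes or {})
        for doc_id in deletes:
            layer[doc_id] = DELETED
        self.changesets.append(layer)
        return len(self.changesets)  # 1-based position in the stack

    def read(self, doc_id):
        """Look down from the top; the first layer that mentions the document wins."""
        for layer in reversed(self.changesets):
            if doc_id in layer:
                value = layer[doc_id]
                return None if value is DELETED else value
        return None

    def rollback(self, position):
        """Discard every changeset above the given position."""
        del self.changesets[position:]


repo = Repository()
repo.commit({"recipe": {"title": "Apple Pie", "sugar": "1 cup"}})
repo.commit({"recipe": {"title": "Apple Pie", "sugar": "3/4 cup"}})
repo.commit(deletes=["recipe"])
hidden = repo.read("recipe")    # masked by the top layer, so nothing is visible
repo.rollback(2)                # peel off the "delete" transparency
restored = repo.read("recipe")  # the second version shows through again
```

Note that the delete never touches the earlier layers.  It only adds a mask on top, which is why peeling it off brings the content back.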

Changeset Versioning: Branches and Merges

As noted, Cloud CMS manages your changesets for you.  The “stack” of changesets is known as a Branch.  As you add more changesets to the branch, the length of the branch gets longer (just like the stack of transparencies gets thicker).

A read operation simply pulls information out of the repository.  A write or a delete adds a new changeset.  Consider the branch shown below.  The reading operation just peeks at the branch looking down from the top.  The writing operation adds a new changeset.

With just a single branch, you can still get into the situation where two people want to change the same file at the same time.  Cloud CMS lets you lock the object and all that kind of thing if you want.  Or, you can create new branches so that everyone can work together at the same time and on the same things.

It kind of looks like this:

Here we have two workspaces.  Each workspace has its own branch, which stems from the Master Branch at changeset V5.  The first user works on Branch A and the second user works on Branch B.  Both Branch A and Branch B have a common ancestor (changeset V5 in the Master Branch).

This allows both users to do whatever they want without stepping on each other’s toes. They can update documents, delete things and create new content.  At any time, they can push and pull changes between their workspace and any other workspace.  This gives them a way to preview what other people are working on and merge their work into their own branches.  They can also merge back to the Master Branch.
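To make the idea of a common ancestor concrete, here is a minimal sketch in Python.  It assumes a branch is just its own stack of changesets plus a pointer to the spot on the parent branch where it stemmed off.  The `Branch` class and its `fork`, `write` and `read` methods are made up for this illustration and are not the Cloud CMS API.

```python
# Toy model of branches; Branch, fork, write and read are hypothetical names.
class Branch:
    def __init__(self, parent=None, fork_point=0):
        self.parent = parent          # branch this one stemmed from, if any
        self.fork_point = fork_point  # how many parent changesets are visible
        self.changesets = []          # changesets added on this branch only

    def write(self, doc_id, value):
        self.changesets.append({doc_id: value})

    def fork(self):
        """Start a new branch at this branch's current tip."""
        return Branch(parent=self, fork_point=len(self.changesets))

    def read(self, doc_id, limit=None):
        """Look down from the top of this branch, then into the ancestor."""
        own = self.changesets if limit is None else self.changesets[:limit]
        for layer in reversed(own):
            if doc_id in layer:
                return layer[doc_id]
        if self.parent is not None:
            return self.parent.read(doc_id, self.fork_point)
        return None


master = Branch()
for version in range(1, 6):   # changesets V1..V5 on the Master Branch
    master.write("home", f"V{version}")

branch_a = master.fork()      # both branches stem from changeset V5
branch_b = master.fork()
branch_a.write("home", "A1")  # the first user edits on Branch A
master.write("home", "V6")    # meanwhile, the Master Branch moves on
```

In this sketch, Branch A sees its own edit, Branch B still sees the content as of V5, and neither is disturbed by the later V6 changeset on the Master Branch.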

Cloud CMS provides an elegant merge algorithm that walks the changeset history tree from the common ancestor on up.  It uses a JSON differencing algorithm to allow for JSON property-level conflicts (as opposed to document-level conflicts).  And it provides content model and scriptable policy validation for the merged result.
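To give a feel for what a property-level conflict means, here is a rough sketch of a three-way merge over flat JSON objects.  This is a generic textbook technique and not the actual Cloud CMS merge algorithm.  The function name `merge_properties` is hypothetical, nested objects are compared as whole values, and no model or policy validation is attempted.

```python
# Sketch of a three-way, property-level merge; not the Cloud CMS algorithm.
MISSING = object()  # stands in for "property not present"


def merge_properties(ancestor, ours, theirs):
    """Merge two edited copies of a JSON object against their common ancestor.

    Returns (merged, conflicts), where conflicts lists the property names
    that both sides changed to different values.
    """
    merged, conflicts = {}, []
    for key in sorted(set(ancestor) | set(ours) | set(theirs)):
        base = ancestor.get(key, MISSING)
        a = ours.get(key, MISSING)
        b = theirs.get(key, MISSING)
        if a == b:        # same on both sides (or untouched by both)
            value = a
        elif a == base:   # only "theirs" changed it
            value = b
        elif b == base:   # only "ours" changed it
            value = a
        else:             # both changed it differently
            conflicts.append(key)
            value = a     # keep "ours" provisionally; needs review
        if value is not MISSING:
            merged[key] = value
    return merged, conflicts


ancestor = {"title": "Apple Pie", "sugar": "1 cup", "bake": "45 min"}
ours = {"title": "Grandma's Apple Pie", "sugar": "1 cup", "bake": "50 min"}
theirs = {"title": "Apple Pie", "sugar": "3/4 cup", "bake": "40 min",
          "notes": "use tart apples"}

merged, conflicts = merge_properties(ancestor, ours, theirs)
```

With document-level conflicts, the whole recipe would be flagged because both people touched it.  Here only the `bake` property conflicts, while the title change, the sugar change and the new note merge cleanly.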

The result is a highly collaborative experience that encourages your users to experiment and take a shot at contributing without the worry of blocking others or screwing up the master content.

In a future blog post, we’ll cover the details of how branching and merging works.  Our approach is one that did not seek to reinvent the wheel but rather ride on top of the wonderful innovation that has already occurred over the last decade within source control tools like Mercurial, Git and Bazaar.