Cloud Connected

Thoughts and Ideas from the Gitana Development Team

Automatic Extraction from Attachments

When you upload files into Cloud CMS all kinds of interesting things happen.

For example, Cloud CMS automatically indexes your files for full-text search. All of the textual content is discovered and made available so your editorial teams can instantly search and find things after uploading.

Did you know that Cloud CMS can also automatically extract metadata about uploaded files and make that information available? This is done via the Cloud CMS f:extractable feature.

When you upload files, Cloud CMS is able to automatically look at the kind of file it is. Based on the kind of file, Cloud CMS readies a small army of content extractors that will interrogate the file for header and tag information. In some cases, such as with XML and JSON, the document payloads are parsed and iterated over as well to pull out interesting information.

This information is then extracted and populated onto your JSON so that your applications have access to the information at runtime.

To use this, all you need to do is add the f:extractable feature to your content instance (or content types depending on how you prefer to do things).

Let's take a look at an example.

Extracting from an Image

Suppose I have an image file called ultima5.jpg (the cover art from one of the truly great games of years past). The file is 269 pixels in width and 371 pixels in height.

ultima5.jpg

After I upload this image to Cloud CMS, I click into the Features section and add the f:extractable feature to it.

add-feature.png

When I click save, the feature will reload with the following extracted properties:

{
    "extracted": {
        "default": {
            "imageHeight": "371",
            "imageWidth": "269"
        }
    }
}

As shown here:

extracted-properties.png

All images support a few basic image-specific properties such as the ones shown above.

Cloud CMS knows how to extract from a wide variety of file types, including popular Office document formats, PDFs, text files, audio files, video files, image files and much, much more!

Different kinds of properties will be extracted depending on:

  • the type of file that is uploaded
  • the kinds of extractors that apply to that file type
  • the availability of properties within the file

By default, extraction is turned off for nodes. That is to say, the base n:node type does not have the f:extractable feature on it. This means that your content won't automatically extract as shown above - you'll have to add the f:extractable feature to your content instances directly or add them to your content types as you see fit.

For more information on extraction, please read up on the f:extractable feature here: https://www.cloudcms.com/documentation/features/extractable.html

We hope you enjoy playing with extraction! We're always looking for feedback and ways to improve. If you discover any file types that you feel Cloud CMS really should extract from (but doesn't currently), let us know!