Integrating Google Cloud Vision

Description: Integrating Google Cloud Vision into Pega using native data pages, Connect-REST, and declarative properties
Version: as of 8.1
Application: Pega Platform
Capability/Industry Area: Data Integration



Google Cloud Vision

What if Pega were able to interpret images and take appropriate action? Google Cloud Vision (GCV) is a powerful API that integrates well with our platform and dramatically enhances its capabilities with computer vision. If you're new to the party, computer vision (CV) can be defined as "an interdisciplinary scientific field that deals with how computers can be made to gain high-level understanding from digital images or videos" (Wikipedia). Instead of requiring you to train your own model, GCV comes with pre-trained models that offer a wide variety of features.

Use case examples

Here are some examples where GCV can help:

  • Face detection: locate faces along with likely expressions such as joy or anger
  • Landmarks: detect names and geographic coordinates of well-known landmarks
  • Logos: locate individual logos
  • Labels: provide a generalized description of what is shown in the image
  • Text detection and document text detection: detect text via optical character recognition (OCR), including its hierarchy (pages, blocks, paragraphs, words) and language
  • Objects: detect individual objects in an image

Before you begin

You will need an account for Google Cloud Services. While the first 1,000 calls per month are free, you are required to set up at least one active payment method. Prices are in the range of $1.50 per feature per 1,000 requests. One example taken from Google's pricing page: 5,000 images with a single feature would cost about $6 per month, since the first 1,000 units are free and the remaining 4,000 are billed at $1.50 per 1,000. Next, you need to create a new project and enable the Google Cloud Vision API. Finally, note that all requests need to be authenticated, so you will need either an API key or an OAuth token. This guide uses the former.

Here are some more useful links and tools to help you get started:

  • Postman is a cross-platform app allowing you to test REST APIs. You can use this to create and verify requests and the associated JSON payloads.
  • Make a Vision API request shows you what request and response JSONs look like, the appropriate endpoints, and how to enable different features.
  • Try it! is useful if you just want to test what the API can do for your image. You can use this in addition to your demo to show features that you didn't cover, and the polygons drawn around detected objects or faces are quite useful as well.
GCV tryit.png

Process/Steps to achieve objective

Using the Wizard

Pega works declaratively. This means no manual calls to the web service in an activity, no hand-written Connect-REST invocations - just a data page that is referenced in a case. As soon as this page is needed, Pega will automatically issue the request. Since App Studio's wizard can't work with a JSON payload yet (as of 8.3), you need to use Dev Studio. When you provide the following URL, Pega automatically exposes the only query string parameter (your API key - do not share this with anyone). Also, make sure to create the Content-Type header, which we will later set to application/json: https://vision.googleapis.com/v1/images:annotate?key=YOUR_KEY_GOES_HERE

GCV rest 1.png

The next step is straightforward. GCV supports POST only, so select this method:

GCV rest 2.png
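
For reference, outside of Pega the call we just configured boils down to this raw HTTP request (with the JSON payload from the next step as its body):

  POST https://vision.googleapis.com/v1/images:annotate?key=YOUR_KEY_GOES_HERE
  Content-Type: application/json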

In the data model step, select Add a REST response and provide the following:

  1. Your API key
  2. Content-Type should be application/json
  3. Paste the sample request below (JSON)
  4. Click Run.
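
The following minimal request can serve as that sample. It assumes a single WEB_DETECTION feature (matching the example referenced later in this guide), with BASE64_ENCODED_IMAGE standing in for your encoded test image:

  {
    "requests": [
      {
        "image": {
          "content": "BASE64_ENCODED_IMAGE"
        },
        "features": [
          {
            "type": "WEB_DETECTION",
            "maxResults": 10
          }
        ]
      }
    ]
  }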

GCV rest 3.png

The result should look like the following screenshot. Select Submit and Next.

GCV rest 4.png

On the final screen, just accept defaults and select Create. Pega will then automatically create the data type, data page, and the appropriate request and response data transforms for you (and plenty of other records).

Recommended modifications

While Pega prepares almost everything for us, there is one thing you should change. By default, GCV takes a base64-encoded image, essentially transforming a binary file into text. Said text is then embedded in the request JSON.

GCV data transform 1.png

Since you're likely to send a different image to GCV each time, this should become a parameter. In Dev Studio, head over to your data types and open the data page just created by the wizard (you may first need to select your data type for its records to show up):

GCV showing data pages.png

Then, add a parameter named imageBase64. While you're at it, make both parameters required; if you want, you can also hard-code your API key here, which is handy for demos.

GCV data transform 2.png

On the Definitions tab, open the request data transform. Note that parameters are passed to this transform automatically (Use current parameter page is checked). In addition, the wizard only picked up the first feature of the array - if you used the sample request above, this should be WEB_DETECTION. Change two things:

  1. Use Param.imageBase64 as source for content
  2. Duplicate the features node to enable as many features in parallel as required (the resulting request JSON is sketched after the screenshots below).

GCV data transform before.png

And here is the data transform after the changes. Feel free to experiment with the settings - for example, you could expose maxResults or the requested feature as additional parameters.

GCV data transform after.png
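
With the features node duplicated, the request JSON that the transform produces would look roughly like this (the exact set of features is up to you; all four types below are real GCV feature types):

  {
    "requests": [
      {
        "image": { "content": "BASE64_ENCODED_IMAGE" },
        "features": [
          { "type": "WEB_DETECTION", "maxResults": 10 },
          { "type": "LABEL_DETECTION", "maxResults": 10 },
          { "type": "FACE_DETECTION", "maxResults": 10 },
          { "type": "DOCUMENT_TEXT_DETECTION" }
        ]
      }
    ]
  }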

Note that you might need to make similar adjustments to the response data transform. For example, fullTextAnnotation isn't present if you used the sample request provided above.

After making the above changes, make sure to test-drive your data page; any online base64 encoder can convert a test image for you. If everything goes well, the results should look similar to the following figure:

GCV data transform test.png

Creating a demo case type

Next, you want to use your data page in a sample case type. The lifecycle consists of two simple steps - uploading an image followed by presenting the results in a view:

GCV demo case type stages.png

The recommended data model is explained below:

GCV demo case type data model.png

  1. Image - this field will contain the image uploaded by the user.
  2. Image Base64 - this field will hold a textual representation of the uploaded image (i.e. a base64-encoded string). Images are a lot of data, so make sure to select an appropriate max length (I usually go for 2147483647, the maximum positive value of a 32-bit signed integer).
  3. Image Base64 Inline - this field is purely optional and used to render the image when presenting results. Again, this uses a max length of 2147483647.
  4. Image MimeType - again, purely optional and only used for rendering an image.
  5. Response - a data reference to your data type and data page. Note that our parameter imageBase64 is linked to the Image Base64 field of our case.

In addition, a data transform is required between the two assignments. This data transform sets all image-related fields to the appropriate values.

GCV demo case type data transform.png

  • .ImageBase64 is set to pyAttachmentPage.pyAttachStream, which already contains the base64 encoded data of the image uploaded by the user. Note that this isn't ideal because users may upload additional images or remove attachments, but it should suffice for demos.
  • The optional field .ImageMimeType is set to pyAttachmentPage.pyAttachMimeType.
  • The second optional field .ImageBase64Inline is set to "data:"+pyAttachmentPage.pyAttachMimeType+";base64,"+pyAttachmentPage.pyAttachStream. This so-called data URL will later be used to display the image in an icon control (see the example after this list).
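
For illustration, the resulting data URL for a PNG upload would start like this (base64 payload truncated; iVBORw0KGgo is simply the encoded PNG file header):

  data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA...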

Rendering the image in a section

While the attachment data type is ideal for uploading an image and showing a thumbnail, a full-sized image might be preferable when presenting results. This is where ImageBase64Inline comes into play: just add an image/icon control to your section and configure it as follows.

GCV image control.png

Displaying table data in a section

Since most features are returned as arrays (0..n faces, labels, web entities), tables are a good way to display results. Keep in mind that GCV also supports batch processing of several images at once - in this case you'd see multiple responses. For our demo we're working with a single image at a time, hence hard-coding the first element in the list: .Response.response(1).
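
For reference, the raw response JSON has roughly this shape (abbreviated): one entry in the responses array per image sent, which is why the demo hard-codes the first element. The exact set of nodes inside each entry depends on the features you requested:

  {
    "responses": [
      {
        "labelAnnotations": [ ... ],
        "webDetection": { ... },
        "faceAnnotations": [ ... ]
      }
    ]
  }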

GCV table control.png

Results

Here are some examples that make use of GCV and a section (view) in Pega:

Face detection

The figure below shows how face detection works for a crowd of people. In addition to a confidence score, GCV provides information about possible emotions such as joy or surprise, features such as headwear, and properties related to the photograph itself such as underexposure or blurriness. You can use this in your demos to detect whether a headshot provided by the customer has low quality (underexposed, blurred) or violates certain regulations (e.g. rules requiring no smile and no headwear).

Faces and expressions detected by Google Cloud Vision
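
For reference, each entry in the faceAnnotations array of the response carries likelihood fields roughly like these (abbreviated; values range from VERY_UNLIKELY to VERY_LIKELY, and the confidence shown is illustrative):

  {
    "detectionConfidence": 0.98,
    "joyLikelihood": "VERY_LIKELY",
    "angerLikelihood": "VERY_UNLIKELY",
    "surpriseLikelihood": "UNLIKELY",
    "headwearLikelihood": "VERY_UNLIKELY",
    "underExposedLikelihood": "VERY_UNLIKELY",
    "blurredLikelihood": "VERY_UNLIKELY"
  }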

Logo detection

GCV also provides solid logo detection. Think of a roadside assistance case - the customer takes a picture of the car, and GCV automatically picks up the make (and in some cases even the model).

Logos from a billboard detected by Google Cloud Vision

Text Annotations, Labels and Web Entities

The figure below shows several possible use cases. Note how GCV correctly classifies the image as a driver license from California. You can use that in your demos, for example in customer onboarding. While this isn't due diligence with regard to ID verification, it can at least sort out images that are unlikely to be an acceptable form of identification. Another aspect is shown below - GCV can read text via optical character recognition (OCR). This doesn't just give you individual words, but also metadata such as hierarchical information - pages, blocks, paragraphs - and the most likely language.

Text extracted from a California Driver's License
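
Abbreviated, the OCR part of the response nests that hierarchy under fullTextAnnotation, while the flat text property holds the complete extracted text (the sample text is illustrative):

  {
    "fullTextAnnotation": {
      "pages": [
        { "blocks": [
            { "paragraphs": [
                { "words": [ ... ] }
            ] }
        ] }
      ],
      "text": "CALIFORNIA DRIVER LICENSE ..."
    }
  }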

Web Entities

Web Entities are also a kind of label, but they are derived by comparing your image to similar images on the web. This comes in different flavors - the figure below shows fitting entities for the photograph (in this case Cafe, Coffee, Restaurant, Bakery). Web Entities can also surface visually similar images and where they were found - you could use this to detect license or IP violations.

Web entities extracted from an image
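
Abbreviated, the webDetection node of the response pairs entities with scores and lists matching and visually similar images (the scores and truncated URLs are illustrative):

  {
    "webDetection": {
      "webEntities": [
        { "description": "Cafe", "score": 1.2 },
        { "description": "Coffee", "score": 0.9 }
      ],
      "visuallySimilarImages": [ { "url": "https://..." } ],
      "pagesWithMatchingImages": [ { "url": "https://..." } ]
    }
  }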

There is more!

As laid out earlier, take a look at all available features. GCV can do much more - for example, you can detect whether a photo is likely to contain adult content or depicts violence, and more.