Extracting data from documents and attachments into a case
|Description||Learn more about how to handle documents and attachments, and how to extract their data into your cases|
|Version as of||8.5|
|Capability/Industry Area||Data Integration|
Documents and document handling are essential to any digital application, because documents are an important source of the information that workflows require. They also serve as a bridge between a Pega application and the personas that cannot directly interact with the application.
For example, a customer in a bank loan dispute has to present their income information to their financial institution, but does not have direct access to the institution’s banking application. The customer can thus present the information in an Excel spreadsheet that can then be used to populate the case with the relevant information, with the document attached to the case as proof.
When designing a feature around documents, some of the questions that you might ask yourself are:
- How will the document be uploaded to the application?
- Will any data need to be extracted to the case?
- What are the permissible document formats?
- Should the document be attached to the case?
- What actions are allowed on the attachments - can they be updated or deleted by everyone?
Ingestion, Extraction and Population of attachment data to a Case
The most common way of ingesting a document into a Pega application is by using out-of-the-box flow actions such as pxFileUpload, which uploads documents to the file system on the server. From there, the documents are available for any further processing, such as serving as the source for data extraction or being attached to the case.
Documents can also be ingested and linked to a case directly by using the Attach Content smart shape, or the pyAttachContent flow action in a flow.
Based on the document type, a corresponding extraction mechanism can then be configured. For example, if the source file for extracting the information is an Excel spreadsheet, you can use the pxParseExcel file. This mechanism requires the file to be stored on the server file system, so in this scenario pxFileUpload is a better design choice than case attachments.
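The idea of selecting an extraction mechanism by document type can be sketched outside of Pega as a simple lookup table. The following Python example is a conceptual illustration only and does not use any Pega API: the function names, the `EXTRACTORS` table, and the choice of CSV and JSON as stand-in formats are all assumptions made for the sketch.

```python
import csv
import io
import json
from pathlib import PurePath


def extract_csv(text):
    """Return one dict per CSV row, keyed by the header line."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_json(text):
    """Return a list of records from a JSON document."""
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


# Hypothetical allow-list: permitted formats and the extractor for each one.
EXTRACTORS = {".csv": extract_csv, ".json": extract_json}


def extract(file_name, text):
    """Pick the extractor that matches the file extension, or reject the file."""
    suffix = PurePath(file_name).suffix.lower()
    if suffix not in EXTRACTORS:
        raise ValueError(f"Unsupported document format: {suffix or 'none'}")
    return EXTRACTORS[suffix](text)


rows = extract("income.csv", "Month,GrossIncome\n2023-01,4200.50\n")
print(rows)
```

Keeping the permitted formats and their extractors in a single table means that the question of which document formats are permissible is answered in one place.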
These extraction mechanisms typically populate properties on a given clipboard page or a page list, which can then be directly used to populate a case using standard rules, such as data transforms or activities.
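As a rough illustration of that last step, the following Python sketch treats the extracted page list as a list of dictionaries and copies it onto a case, much as a data transform would. It is a conceptual sketch only and not Pega code: the row layout, the property names `IncomeRecords` and `TotalGrossIncome`, and the case ID are assumptions made for the example.

```python
from decimal import Decimal

# Hypothetical page list as an extraction step might leave it:
# one dict per spreadsheet row, with every value still a string.
extracted_rows = [
    {"Month": "2023-01", "Employer": "Acme", "GrossIncome": "4200.50"},
    {"Month": "2023-02", "Employer": "Acme", "GrossIncome": "4350.00"},
]


def populate_case(case, rows):
    """Copy the extracted rows onto the case, converting amounts to Decimal."""
    case["IncomeRecords"] = [
        {
            "Month": row["Month"],
            "Employer": row["Employer"],
            "GrossIncome": Decimal(row["GrossIncome"]),
        }
        for row in rows
    ]
    # Derive a summary property from the copied rows.
    case["TotalGrossIncome"] = sum(
        (record["GrossIncome"] for record in case["IncomeRecords"]),
        Decimal("0"),
    )
    return case


case = populate_case({"pyID": "D-1001"}, extracted_rows)
print(case["TotalGrossIncome"])  # prints 8550.50
```

Converting the values while copying keeps the raw extracted page separate from the typed case data, so the extraction step can be replaced without changing the case structure.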
Attachments are instances of the Data-WorkAttach-File class that are stored in a database and are linked to a work object. If you choose to use the Attach Content shape or the pyAttachContent flow action, the document will be directly attached to the case and no additional configuration will be required.
As in the previous example, if a mechanism such as pxFileUpload is used, then the document will be stored on the server file system and will have to be converted into a Data-WorkAttach-File instance and linked to the case. You can do this by creating your own activity, or by using the standard Pega API, for example, the AttachFile activity.
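Conceptually, this conversion reads the file from the file system, wraps its content in an attachment record, and links that record to the case. The following Python sketch shows the shape of that logic under stated assumptions: the `Attachment` and `Case` classes, their field names, and the `attach_file` function are hypothetical and do not reflect the Data-WorkAttach-File schema or the signature of the AttachFile activity.

```python
import base64
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Attachment:
    file_name: str
    category: str
    content_b64: str  # file content, Base64-encoded for storage


@dataclass
class Case:
    case_id: str
    attachments: list = field(default_factory=list)


def attach_file(case, path, category="File"):
    """Read a file from disk, wrap it in an attachment, and link it to the case."""
    path = Path(path)
    attachment = Attachment(
        file_name=path.name,
        category=category,
        content_b64=base64.b64encode(path.read_bytes()).decode("ascii"),
    )
    case.attachments.append(attachment)
    return attachment


# Usage: simulate an uploaded file on the server file system.
with tempfile.TemporaryDirectory() as upload_dir:
    uploaded = Path(upload_dir) / "income.csv"
    uploaded.write_text("Month,GrossIncome\n2023-01,4200.50\n")
    dispute = Case("D-1001")
    attach_file(dispute, uploaded)

print(dispute.attachments[0].file_name)  # prints income.csv
```

Because the content is copied into the attachment record, the attachment remains available after the temporary upload location is cleaned up.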
Your design can restrict how users access an attachment, for example, by making the attachment read-only or by granting access only to users who hold a certain privilege. These restrictions are governed by category rules, which hold the relevant configuration: privileges, access when conditions, and create, edit, and delete permissions.
Pega Platform ships with the following five category rules out of the box:
- Scanned Document
Based on your design, you can use one of these default categories, extend them, or create new ones if you need stricter or more lenient restrictions. For more information, see the Pega Community article on Creating and configuring categories for case types.
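To make the idea of category-based restrictions concrete, the following Python sketch models a category as a table of operations and the privilege that each operation requires. It is a conceptual illustration only: the `CategoryRule` class, the `IncomeProof` category, and the `ManageAttachments` privilege are hypothetical names, and the model omits access when conditions for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class CategoryRule:
    name: str
    # Maps an operation to the privilege it requires.
    # None means that no privilege is needed; a missing operation is denied.
    required_privilege: dict = field(default_factory=dict)


def is_allowed(rule, operation, user_privileges):
    """Return True if a user with the given privileges may perform the operation."""
    if operation not in rule.required_privilege:
        return False
    needed = rule.required_privilege[operation]
    return needed is None or needed in user_privileges


# Hypothetical category: anyone can view or create, only privileged users can
# delete, and nobody can edit.
income_proof = CategoryRule(
    "IncomeProof",
    {"view": None, "create": None, "delete": "ManageAttachments"},
)

print(is_allowed(income_proof, "delete", {"ManageAttachments"}))  # prints True
print(is_allowed(income_proof, "delete", set()))  # prints False
print(is_allowed(income_proof, "edit", {"ManageAttachments"}))  # prints False
```

Denying any operation that the category does not list is a deliberate choice in this sketch, so that a new category starts out restrictive and is loosened explicitly.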