Using background processes to achieve consistent outcomes at scale
|Description|Guidelines for successful batch processing|
|Version as of|8.5|
|Capability/Industry Area|System Administration|
Batch processing is a routine that typically runs in the background to process multiple items through a series of business operations and bring them to a common or consistent outcome. In large-scale batch processing, a batch can contain thousands of items, and in some instances millions.
The primary merit of batch processing is that it needs little or no human intervention while it runs, and that it can be designed to scale horizontally, running in parallel across multiple nodes and threads to achieve the desired processing rate. Some real-world examples of batch processes are:
- An insurance claim batch file, containing multiple claims
- A request to add a new healthcare service (benefit) to a list of existing health plans
- Generation of monthly statements for credit cards
Phases of batch processing
The batch process can be divided into several phases.
In the intake phase, the batch is submitted to the system for processing. A batch can be submitted in several ways, for example:
- A case submitted by the user
- A file dropped into a location that a file listener monitors
- The invocation of a service (REST, HTTP, etc.)
- A scheduled routine that runs at intervals (job scheduler)
Whatever the intake channel, the received data must be persisted in the system, preferably in an as-is state, for future reference and traceability. A good practice is to create a case for each batch process and attach the received file or content to the case.
Scrubbing can be used as an optional sub-step in the intake phase, to perform any high-level data validations or minimal checks. Some examples are:
- Verifying that the batch file has the content in the prescribed format
- Verifying that the batch metadata is present and valid, such as the submitter’s name on a healthcare claim file
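The scrubbing checks above can be sketched as follows. This is a minimal illustration in Python, not a Pega Platform API: the file layout (a `submitter=` line followed by a CSV header), the column names, and the function `scrub_batch` are all assumptions made for the example.

```python
import csv
import io

# Hypothetical prescribed layout: line 1 is "submitter=<name>",
# line 2 is a CSV header with exactly these columns.
REQUIRED_COLUMNS = ["claim_id", "member_id", "amount"]
SUBMITTER_PREFIX = "submitter="


def scrub_batch(raw_text):
    """Return a list of problems; an empty list means the batch may proceed."""
    lines = raw_text.splitlines()
    if not lines:
        return ["batch file is empty"]

    problems = []

    # Metadata check: the submitter name must be present and non-blank.
    first = lines[0]
    if not first.startswith(SUBMITTER_PREFIX) or not first[len(SUBMITTER_PREFIX):].strip():
        problems.append("missing or blank submitter name")

    # Format check: the column header must match the prescribed layout.
    reader = csv.reader(io.StringIO("\n".join(lines[1:])))
    columns = next(reader, None)
    if columns != REQUIRED_COLUMNS:
        problems.append(f"unexpected columns: {columns}")

    return problems
```

Only high-level checks are performed here. Validation of individual items is left to the processing phase, so that a single bad item does not reject the whole batch at intake.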
Batch processing can be designed using one of several approaches:
- Fire-and-forget: Each item in the batch is launched into a process (a workflow or a series of business rules) and never returns to the trigger point. The process itself should be designed to handle all intended and unintended behaviors, such as exceptions. This approach provides the greatest scalability, because each item can run in an independent thread.
- Sequential: Each item in the batch is processed one after another, in a single thread. This paradigm can be used when:
  - The items in the batch are inter-related; the outcome of one item’s process influences the subsequent item.
  - The items in the batch use or update a common artifact, which cannot be shared or used across multiple parallel threads.
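The difference between the two approaches can be sketched in a few lines. This is an illustrative Python sketch under stated assumptions, not how Pega Platform implements either approach: the item shape `(item_id, amount)`, the in-memory `store`, and all function names are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def process_item(item_id, amount, store, lock):
    """Hypothetical per-item process that handles its own failures."""
    try:
        if amount < 0:
            raise ValueError("negative amount")
        status = "completed"
    except ValueError as exc:
        # The item does not report back to the trigger point;
        # it records its own outcome instead.
        status = f"failed: {exc}"
    with lock:
        store[item_id] = status


def run_fire_and_forget(items, max_workers=4):
    """Launch every item independently on a pool of worker threads."""
    store, lock = {}, threading.Lock()
    # Leaving the with-block waits for the workers to finish. That wait
    # exists only so this example can inspect the recorded outcomes.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for item_id, amount in items:
            pool.submit(process_item, item_id, amount, store, lock)
    return store


def run_sequential(items):
    """Process items in order; each result depends on the previous one."""
    running_total = 0
    results = []
    for item_id, amount in items:
        running_total += amount  # shared state that cannot be split across threads
        results.append((item_id, running_total))
    return results
```

In the sequential variant the running total plays the role of the common artifact: because each item reads and updates it, the items cannot safely be distributed across parallel threads.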
Whenever an item in a batch cannot reach the desired destination, the user should be given the actions needed to review and correct the situation. The cause could be an anticipated business scenario, such as a benefit that is needed to adjudicate a healthcare claim but could not be found, or an unanticipated situation, such as an external service that did not respond. After taking the corrective actions, the user should be able to either:
- Resume the process from where the problem occurred
- Restart the process from the beginning
- Resolve the item, if the problem cannot or need not be rectified
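The three recovery options can be sketched as operations on an item that remembers how far it progressed. This is a minimal Python illustration. The step names, the `BatchItem` class, and the `fail_at` field (used only to simulate failures) are assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical processing steps for one item.
STEPS = ["validate", "enrich", "adjudicate"]


@dataclass
class BatchItem:
    item_id: str
    next_step: int = 0                          # index of the next step to run
    status: str = "pending"
    fail_at: set = field(default_factory=set)   # steps that fail (simulation only)


def run(item):
    """Run the remaining steps; stop at the first failing step."""
    while item.next_step < len(STEPS):
        if STEPS[item.next_step] in item.fail_at:
            item.status = "failed"              # next_step still points at the failed step
            return item
        item.next_step += 1
    item.status = "completed"
    return item


def resume(item):
    """Continue from the step where the problem occurred."""
    return run(item)


def restart(item):
    """Discard progress and run the process from the beginning."""
    item.next_step = 0
    return run(item)


def resolve(item):
    """Close the item without further processing."""
    item.status = "resolved"
    return item
```

For example, an item that failed at `enrich` keeps `next_step == 1`, so `resume` re-runs only `enrich` and `adjudicate` once the underlying problem has been corrected.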
The Pega Platform rules that are commonly used and recommended for batch processing are data flows and queue processors.
Large-scale batch processes can, in some cases, run for hours. So, a comprehensive reporting mechanism must be built to notify the user about:
- Completion of the process: If the batch is run in a sequential manner, the user can be notified when the process completes. This can be done in multiple ways, such as sending an email upon completion, routing an assignment to the user, or letting the user review the completion results.
- Cumulative status of the processes: A report on statistics related to the batch process helps the user understand trends in the outcome of the batch process. This report could include the number of items per batch, and group the items by status (completed, errored out, etc.).
In addition, keep the following considerations in mind when designing a batch process:
- Robust error handling: Because a batch process runs in the background, the root cause of a technical error can be extremely difficult to trace. So, a comprehensive error-trapping mechanism should be built to report the reason for failure back to the user, who can then assist in debugging.
- Shared resource contention: If the individual items in a batch use common shared artifacts, such as the healthcare plan used in claim adjudication, take care not to lock shared records, because such locks can cause the processing of other records to fail.
- Rollback mechanism: All database update operations should be designed to come under a single transaction, so that a failure in the process can roll back all database writes. However, if the user can resume the process, then the work completed up to the point of failure should be committed despite the error.
- Traceability of the execution path: To trace the path an item took during the process, add audit messages at the critical points in the process.
- Multi-level processing: If a case submission is the trigger for the background process, the actual background process should not be started immediately on the user’s session. The primary case should be routed to a work queue. Then, a secondary routine (queue processor or job scheduler) should trigger the processing of individual items from the primary case.
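The rollback guideline in the list above can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module purely for demonstration; it is not Pega Platform code, and the `plan_benefit` table and the function names are hypothetical. The scenario assumed here is adding one benefit to several health plans as a single unit of work.

```python
import sqlite3


def open_demo_db():
    """Create an in-memory database with a hypothetical plan_benefit table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE plan_benefit ("
        " plan_id TEXT NOT NULL,"
        " benefit TEXT NOT NULL,"
        " UNIQUE (plan_id, benefit))"
    )
    return conn


def apply_updates(conn, updates):
    """Write all (plan_id, benefit) rows in one transaction.

    Returns True if every row was written, False if the whole set was rolled back.
    """
    try:
        # Used as a context manager, the connection commits when the block
        # succeeds and rolls back when the block raises.
        with conn:
            for plan_id, benefit in updates:
                conn.execute(
                    "INSERT INTO plan_benefit (plan_id, benefit) VALUES (?, ?)",
                    (plan_id, benefit),
                )
        return True
    except sqlite3.IntegrityError:
        return False
```

If the second row of a set violates the uniqueness constraint, the first row of that set is rolled back as well, so the table never holds a partially applied update. A resumable process would instead commit after each completed step, trading atomicity for the ability to continue from the point of failure.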