Modifying the intake process for external data
|Description||Best practices for modifying the external system data intake process by editing the file listener and queuing|
|Version as of||8.1|
|Capability/Industry Area||Data integration|
You can use several methods to modify the external system data intake process in Pega Platform. This document covers the file listener and queuing methods. To understand which method to choose based on your use case, review the following information.
Use a file listener to import and process batch data from files on a specified network location or web file storage. A file listener can be configured to process files periodically, or to look for a specific name pattern. For more information, see Configuring a file service and file listener to process data in files.
Use queues to ingest and process data delivered through events, messages or message streams. You can implement the queuing method by using Pega-provided queue integrations such as JMS, or messaging integrations such as Kafka. These integrations allow your application to continuously receive and process events. For more information about JMS, see Messaging Service Overview.
For more information about Kafka, see Creating a Kafka configuration instance.
Sample scenarios for choosing a file listener or queuing
In a claims processing application, providers send payers batches of claims in EDI file format, separated by provider or by date. In this case, the intake data is already in a file format. To process each file, an EDI specification is defined and processing logic is followed to separate the file into individual claims.
Because the provider application already generates the data in a flat file in the EDI format with a batch of claims, you can use a file listener to consume the file, separate the claims and create a case instance from each. The file listener can be configured to process multiple files concurrently and to recover from any failure (such as a node restart) to continue processing from the last successfully processed batch.
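The split-and-resume behavior described above can be illustrated with a minimal sketch. It is not how the Pega file listener is implemented. It assumes a simplified EDI-like layout in which segments end with `~`, elements are separated by `*`, each claim starts with a `CLM` segment, and an `SE` segment closes the batch. The function names, the `checkpoint` dictionary, and the sample data are hypothetical.

```python
# Hypothetical sample batch: envelope segment, two claims, trailer segment.
SAMPLE_BATCH = "ST*837*0001~CLM*A1*100~SV1*X~CLM*A2*250~SV1*Y~SE*6*0001~"


def split_claims(edi_text, segment_terminator="~", element_separator="*"):
    """Group segments into claims; a new claim starts at each CLM segment."""
    claims = []
    current = None
    for raw in edi_text.split(segment_terminator):
        segment = raw.strip()
        if not segment:
            continue
        segment_id = segment.split(element_separator)[0]
        if segment_id == "CLM":
            current = [segment]
            claims.append(current)
        elif segment_id == "SE":
            # Trailer segment: the last claim is complete.
            current = None
        elif current is not None:
            current.append(segment)
        # Segments before the first CLM (envelope data) are skipped.
    return claims


def process_claims(claims, create_case, checkpoint):
    """Create one case per claim, recording progress after each success.

    If create_case raises (for example, because the node restarts), the
    checkpoint still holds the index of the next unprocessed claim, so a
    later call resumes there instead of starting over.
    """
    start = checkpoint.get("next_index", 0)
    for index in range(start, len(claims)):
        create_case(claims[index])
        checkpoint["next_index"] = index + 1
```

In a real deployment the checkpoint would need to be stored durably (for example, in a database) so that it survives the restart; the in-memory dictionary here only shows the control flow.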
In a banking application, to detect fraudulent transactions, transaction events are pushed to a Pega application. In this case, the intake data is not delivered as a batch in a file. Instead, it is sent as small messages containing the properties of the transaction.
For this scenario, you can use any queuing method that establishes an integration mechanism to push the data from the banking application to the Pega application in a data format that both systems support, such as JSON. Because each message can be ingested and processed separately, and multiple messages can be processed in parallel, the system can support a high throughput of messages.
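As a rough illustration of why independent messages parallelize well, the following sketch parses JSON transaction messages and handles them on a thread pool. It does not use any Pega, JMS, or Kafka API. The message fields (`id`, `amount`), the `amount_limit` rule, and the function names are assumptions made for the example.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def handle_message(raw_message, amount_limit=10_000):
    """Parse one JSON transaction message and apply a placeholder check.

    The threshold rule is a stand-in for real fraud detection logic.
    """
    event = json.loads(raw_message)
    return {"id": event["id"], "suspicious": event["amount"] > amount_limit}


def process_messages(raw_messages, workers=4):
    """Handle messages in parallel; each message is processed independently.

    pool.map returns results in the same order as the input messages.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle_message, raw_messages))
```

Because no message depends on another, the number of workers can be raised without changing the handler, which is the property that lets a queue-based intake scale out.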
File listener vs queuing
|Pega (OOTB)||File listener||Queuing|
|Configuration||Pega-provided file listener rule with many configuration options.||Pega-provided integration rules for JMS and MQ. For Kafka, you can use a dataset with real-time processing.|
|Message size limit||Supports large file sizes, but could be constrained by the available memory.||Yes, based on the type of queuing technology used and its constraints.|
|Infrastructure to process the message in chunks and records||Yes, built-in configuration to identify how to process the message in chunks and records.||No, messages are expected to be small and represent a single or small number of records.|
|Concurrent processing of multiple files and messages||Yes, multiple files can be processed concurrently, but each file is processed by only a single thread at a time. The file listener can be configured to scale horizontally using node types, and vertically using multiple threads on each node.||Yes, in each integration rule. For Kafka, use the real-time processing mechanism to configure threads and node types.|
|Risk of message processed before it is fully loaded||Yes. If the file is large, there is a risk that the file listener detects it and begins processing it before the file has been completely transferred or created. When possible, use a temporary file name until the file is ready to be processed, and then rename the file to the name or pattern expected by the file listener. For some storage systems, the file is not detectable or available until it is complete.|
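The temporary-name approach mentioned in the table can be sketched from the producer's side as follows. This is an illustration under stated assumptions, not Pega configuration: the `.part` suffix, the `*.edi` pattern, and both function names are hypothetical, and the rename is only atomic when the temporary and final paths are on the same file system.

```python
import fnmatch
import os


def publish_file(directory, final_name, data, temp_suffix=".part"):
    """Write data under a temporary name, then rename it to the final name.

    A listener matching only the final name pattern never sees a
    half-written file, because the final name appears only after the
    content has been completely written.
    """
    final_path = os.path.join(directory, final_name)
    temp_path = final_path + temp_suffix
    with open(temp_path, "wb") as handle:
        handle.write(data)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(temp_path, final_path)
    return final_path


def visible_to_listener(directory, pattern="*.edi"):
    """Return the file names that a listener using this pattern would pick up."""
    return sorted(
        name for name in os.listdir(directory) if fnmatch.fnmatch(name, pattern)
    )
```

While the file still carries the `.part` suffix it does not match `*.edi`, so a listener configured with that pattern would ignore it until the rename completes.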
Choosing between a file listener and queuing
Choose the file listener:
1. If you have an existing system that produces the files.
2. If the order between processing rows of data in the file needs to be maintained.
3. If there is no requirement for particularly high throughput or low latency.
Choose queuing:
1. If you have an existing queue where the messages are published.
2. If performance is a concern. With queues, you can scale within the container by using threads and multiple nodes in a cluster.
Data import from the UI
Pega Platform provides a data migration wizard to import CSV files. For more information, see Importing external data.