Ingesting data from Kafka topics into the Decision Data Store
Ingesting data from Kafka topics into the Decision Data Store
Introduction[edit]
This article explains how to achieve the following goals when data for different entities, such as customers, purchases, and devices, comes from Kafka topics:
- Build the customer data model (xCAR) in the Decision Data Store (DDS) and produce the Operational Data Model (ODM).
- Handle daily batch data and real-time published to Kafka topics.
Use Cases[edit]
In general, client systems produce two types of data sets:
- Daily data sets
The client system provides daily data sets for different entities at a scheduled interval. - Real-time data sets
The client system provides data changes in real time, for example, real-time transactional updates, mobile and web transactions, and tags and deletions of customer records. This data is updated in real time, and the client system provides these changes throughout the day.
Naming conventions[edit]
The following terms are used in this article:
- Entity
Customer or xCAR tables that Pega Customer Decision Hub™ uses to provide next best actions. The client system produces data for entities, such as customers, purchases, and devices. - Extended Customer Analytical Record (xCAR)
A table which Pega Customer Decision Hub uses to provide next best actions. - Operational Data Model (ODM)
A wide table with customer data that is transformed in a way that allows Pega Customer Decision Hub to provide faster decisions. - Decision Data Store (DDS)
A Cassandra database.
Recommendations for client system setup[edit]
General recommendations[edit]
- Two Kafka topics per entity: one topic for daily data push and another topic for real-time changes.
- All relevant data for an entity is published to a topic.
- Topics and data in the topics are provided by the client systems.
Data job schedules[edit]
- Daily data
Daily ETL jobs publish customer data (xCAR) to daily Kafka topics. - Real-time data
Real-time data changes are published to real-time topics as the data changes in the client systems throughout the day.
High-level ingestion architecture[edit]
Real-time process[edit]
- Reads data from Kafka topics.
- Creates the Customer, xCAR, Realtime, and Daily tables in the Decision Data Store and inserts data into the corresponding tables.
- Posts customer IDs in the Pega Platform™ internal queue.
ODM process[edit]
- Reads a queue item, for example, a customer ID.
- Reads data from the Customer and xCAR tables in the DDS.
- Runs the process to produce the Operational Data Model.
Ingestion of daily data sets[edit]
In the daily data ingestion use case, client systems publish the customer, purchase, and device data to Kafka topics at scheduled intervals.
The following diagram shows the topics and tables that are used in the ingestion of daily data sets:
Entity streams (client system)
- Kafka topics for customer, purchase, and device data are set up on a client system.
- The client system publishes data to the Kafka topics.
Customer Decision Hub
- The Realtime Process consists of one real-time data flow per Kafka topic to capture the data posted to the topic.
- This article describes three real-time data flows that capture data from the Customer, Purchase, and Device topics.
- Every real-time data flow saves data to the corresponding table in the Decision Data Store, and then saves the data into the daily table based on the day.
- Finally, the data flows post the customer IDs to an internal Pega Platform queue to produce the Operational Data Model.
Ingestion process for daily data published to the Customer topic[edit]
The following figure shows how the system ingests daily data from the Customer topic into the DDS:
Data sets
- Customer Stream data set
A Kafka data set. Reads data from the Customer topic. - Customer DSS data set
A DDS data set. Saves customer data to a DDS table. - ODM Queue data set
The customer ID is posted to an internal stream data set to produce the Operational Data Model. - Daily data sets
DDS data sets, based on days of the week. The system saves customer data to the corresponding day table in the DDS.
Tables in the DDS
- An out-of-the-box Pega Platform capability creates tables in the DDS.
- Daily tables populated by the Customer Stream Realtime changes data flow have a similar structure.
The following figure is an example of a Customer table in the DDS:
Ingestion process for daily data published to the Purchase topic[edit]
The following figure shows how how the system ingests daily data from the Purchase topic into the DDS:
Data sets
- Purchase Stream data set
A Kafka data set. Reads data from the Purchase topic. - Purchase DSS data set
A DDS data set. Saves purchase data to a DDS table. - ODM Queue data set
By using the convert shape, customer IDs are posted to an internal stream data set to produce the Operational Data Model. - Daily data sets
DDS data sets, based on days of the week. Purchase data is saved to the corresponding day table in the DDS.
Tables in the DDS
- An out-of-the-box Pega Platform capability creates tables in the DDS.
- Daily tables populated by the Purchase Stream Realtime changes data flow have a similar structure.
The following figure is an example of a Purchase table in the DDS:
Ingestion process for daily data published to the Device topic[edit]
The following figure shows how the system ingests daily data from the Device topic into the DDS:
Data sets
- Device Stream data set
A Kafka data set. Reads data from the Device topic. - Device DSS data set
A DDS data set. Saves device data to a DDS table. - ODM Queue data set
By using the convert shape, customer IDs are posted to an internal stream data set to produce the Operational Data Model. - Daily data sets
DDS data sets, based on days of the week. Device data is saved to the corresponding day table in the DDS.
Tables in the DDS
- An out-of-the-box Pega Platform capability creates tables in the DDS.
- Daily tables populated by the Device Stream Realtime changes data flow have a similar structure.
The following figure is an example of a Device table in the DDS:
ODM process for daily data[edit]
The ODMProcess data flow consolidates the data from the Customer, xCAR, and Realtime tables in the DDS, and then produces the Operational Data Model in the DDS as a wide table.
Components
- ODM Queue data set
A stream data set. Reads customer IDs when they are posted and executes the logic in this data flow to produce the Operational Data Model. - Merge Customer Data shape
Reads customer data from the DDS and sets the customer context. - DDS data sets
- Daily data sets
Hold the daily data that the system receives at scheduled intervals. The system reads the corresponding entity data from the DDS. - Real-time data sets
Hold the real-time data changes. The system reads the corresponding entity data from the DDS.
- Daily data sets
- Merge Data based on time stamp shape
A data transform that merges the data based on the timestamp.- This step is only required if the daily data pipe produces data that is older than the real-time data available in the system. The assumption is that the real-time tables always have the latest information.
- Whether this step is required or not, depends on how the source system produces data.
- ODM data set
A wide table that holds the operational data that Pega Customer Decision Hub uses to provide next best actions.
In this example, the ODM data set saves full records to the DDS:
Tables in the DDS
ODM process for daily data with precalculations[edit]
If required, you can also inject precalculations during the ingestion process.
Data precalculations
You can use a strategy to do precalculations based on the data. The system saves this data as part of the Operational Data Model.
Customer ODM
In this example, the ODM table also holds the precalculated eligibilities along with customer records which can be used while presenting next best actions.
Ingestion of real-time data sets[edit]
In the real-time data ingestion use case, the client systems publish real-time customer, purchase, and device data to Kafka topics as soon as the data changes in the source systems.
The following diagram shows the topics and tables that are used in the ingestion of real-time data sets:
Entity streams
- The client system publishes data changes to real-time topics.
- Important: The client system publishes the full record of the changes that occurred in the customer, purchase, and device entities.
Customer Decision Hub
- The Realtime Process consists of one real-time data flow per Kafka topic to capture the data posted to the topic.
- This article describes three real-time data flows that capture data from the Customer_RT, Purchase_RT, and Device_RT topics.
- Every real-time data flow saves data to the corresponding table in the Decision Data Store.
- Finally, the data flows post the customer IDs to an internal Pega Platform queue to produce the Operational Data Model.
Ingestion process for real-time data published to the Customer topic[edit]
The following figure shows how the system ingests real-time data from the Customer_RT topic into the DDS:
Data sets
- Customer Stream Realtime changes data set
A Kafka data set. Reads the data changes published in real time to the Customer_RT topic. - Customer DSS RT data set
A DDS data set. Saves the real-time changes to the customer data to a DDS table. - ODM Queue RT data set
The system posts the customer IDs to the internal stream data set which produces the Operational Data Model.
Tables in the DDS
An out-of-the-box Pega Platform capability creates tables in the DDS. The table structure is similar to that of the daily table, but it only contains the customer data that changes in real time.
Ingestion process for real-time data published to the Purchase topic[edit]
The following figure shows how the system ingests real-time data from the Purchase_RT topic into the DDS:
Data sets
- Purchase Stream Realtime changes data set
A Kafka data set. Reads the purchase data changes published in real time to the Purchase_RT topic. - Purchase DSS RT data set
A DDS data set. Saves the real-time changes to the purchase data to a DDS table. - ODM Queue RT data set
Using a convert shape, the system posts the customer IDs to the internal stream data set which produces the Operational Data Model.
Tables in the DDS
An out-of-the-box Pega Platform capability creates tables in the DDS. The table structure is similar to that of the daily table, but it only contains the purchase data that changes in real time.
Ingestion process for real-time data published to the Device topic[edit]
The following figure shows how the system ingests real-time data from the Device_RT topic into the DDS:
Data sets
- Device Stream Realtime changes data set
A Kafka data set. Reads the device data changes published in real time to the Device_RT topic. - Device DSS RT data set
A DDS data set. Saves the real-time changes to the device data to a DDS table. - ODM Queue RT data set
Using a convert shape, the system posts the customer IDs the internal stream data set which produces the Operational Data Model.
Tables in the DDS
An out-of-the-box Pega Platform capability creates tables in the DDS. The table structure is similar to that of the daily table, but it only contains the device data that changes in real time.
ODM process for real-time data[edit]
The ODMProcess_RT real-time data flow consolidates the data from the Customer and xCAR tables in the DDS, and then produces or updates the Operational Data Model in the DDS.
Components
- ODM Queue RT data set
A stream data set. Reads the customer IDs when the are posted, and then executes the logic in this data flow to produce the Operational Data Model. - Merge shape
Reads the real-time customer data changes from the DDS and sets the customer context. - DDS data sets
Data sets that read the corresponding real-time entity data from the DDS. - Customer ODM data set
An ODM table in the DDS that the system uses to present next best actions. By using the customer IDs, the system loads the corresponding ODM records. - Merge ODM Data data transform
Merges the data into the ODM. If the event is related to purchases, the merge process adds/updates the records in the Purchases page list. - Data Pre Calculations decision strategy
If applicable, you can run a strategy to precalculate the data with the incoming real-time data changes. - ODM data set
In this example, the ODM table is updated based on the event type. For example, if the event is related to devices, the process only updates the Devices column in the ODM table as shown in the following figure:
Tables in the DDS
Summary[edit]
The system is configured with two sets of data flow rules to support the ingestion of daily data and real-time data.