Ingesting data from Kafka topics into the Decision Data Store

From PegaWiki
This is the approved revision of this page, as well as being the most recent.
Jump to navigation Jump to search

Ingesting data from Kafka topics into the Decision Data Store

Description Learn how to configure you Pega Customer Decision Hub application to ingest customer data that a client system posts to Kafka topics.
Version as of 8.5
Application Pega Customer Decision Hub
Capability/Industry Area Data Management



Introduction[edit]

This article explains how to achieve the following goals when data for different entities, such as customers, purchases, and devices, comes from Kafka topics:

  • Build the customer data model (xCAR) in the Decision Data Store (DDS) and produce the Operational Data Model (ODM).
  • Handle daily batch data and real-time published to Kafka topics.

Use Cases[edit]

In general, client systems produce two types of data sets:

  • Daily data sets
    The client system provides daily data sets for different entities at a scheduled interval.
  • Real-time data sets
    The client system provides data changes in real time, for example, real-time transactional updates, mobile and web transactions, and tags and deletions of customer records. This data is updated in real time, and the client system provides these changes throughout the day.

Naming conventions[edit]

The following terms are used in this article:

  • Entity
    Customer or xCAR tables that Pega Customer Decision Hub™ uses to provide next best actions. The client system produces data for entities, such as customers, purchases, and devices.
  • Extended Customer Analytical Record (xCAR)
    A table which Pega Customer Decision Hub uses to provide next best actions.
  • Operational Data Model (ODM)
    A wide table with customer data that is transformed in a way that allows Pega Customer Decision Hub to provide faster decisions.
  • Decision Data Store (DDS)
    A Cassandra database.

Recommendations for client system setup[edit]

General recommendations[edit]

  • Two Kafka topics per entity: one topic for daily data push and another topic for real-time changes.
  • All relevant data for an entity is published to a topic.
  • Topics and data in the topics are provided by the client systems.

Data job schedules[edit]

  • Daily data
    Daily ETL jobs publish customer data (xCAR) to daily Kafka topics.
  • Real-time data
    Real-time data changes are published to real-time topics as the data changes in the client systems throughout the day.

High-level ingestion architecture[edit]

Entity streams are connected to Customer Decision Hub which uses a Cassandra database.

Real-time process[edit]

  1. Reads data from Kafka topics.
  2. Creates the Customer, xCAR, Realtime, and Daily tables in the Decision Data Store and inserts data into the corresponding tables.
  3. Posts customer IDs in the Pega Platform™ internal queue.

ODM process[edit]

  1. Reads a queue item, for example, a customer ID.
  2. Reads data from the Customer and xCAR tables in the DDS.
  3. Runs the process to produce the Operational Data Model.

Ingestion of daily data sets[edit]

In the daily data ingestion use case, client systems publish the customer, purchase, and device data to Kafka topics at scheduled intervals.

The following diagram shows the topics and tables that are used in the ingestion of daily data sets:

Customer, Purchase, and Device entity streams are connected to Customer Decision Hub, which uses a Cassandra data base.

Entity streams (client system)

  • Kafka topics for customer, purchase, and device data are set up on a client system.
  • The client system publishes data to the Kafka topics.

Customer Decision Hub

  • The Realtime Process consists of one real-time data flow per Kafka topic to capture the data posted to the topic.
  • This article describes three real-time data flows that capture data from the Customer, Purchase, and Device topics.
  • Every real-time data flow saves data to the corresponding table in the Decision Data Store, and then saves the data into the daily table based on the day.
  • Finally, the data flows post the customer IDs to an internal Pega Platform queue to produce the Operational Data Model.

Ingestion process for daily data published to the Customer topic[edit]

The following figure shows how the system ingests daily data from the Customer topic into the DDS:

A Customer Stream data flow in Dev Studio contains several components.

Data sets

  • Customer Stream data set
    A Kafka data set. Reads data from the Customer topic.
  • Customer DSS data set
    A DDS data set. Saves customer data to a DDS table.
  • ODM Queue data set
    The customer ID is posted to an internal stream data set to produce the Operational Data Model.
  • Daily data sets
    DDS data sets, based on days of the week. The system saves customer data to the corresponding day table in the DDS.

Tables in the DDS

  • An out-of-the-box Pega Platform capability creates tables in the DDS.
  • Daily tables populated by the Customer Stream Realtime changes data flow have a similar structure.

The following figure is an example of a Customer table in the DDS:

The table contains customer records that include customer IDs, credit status, payment history, and so on.

Ingestion process for daily data published to the Purchase topic[edit]

The following figure shows how how the system ingests daily data from the Purchase topic into the DDS:

A Purchase Stream data flow in Dev Studio contains several components.

Data sets

  • Purchase Stream data set
    A Kafka data set. Reads data from the Purchase topic.
  • Purchase DSS data set
    A DDS data set. Saves purchase data to a DDS table.
  • ODM Queue data set
    By using the convert shape, customer IDs are posted to an internal stream data set to produce the Operational Data Model.
  • Daily data sets
    DDS data sets, based on days of the week. Purchase data is saved to the corresponding day table in the DDS.

Tables in the DDS

  • An out-of-the-box Pega Platform capability creates tables in the DDS.
  • Daily tables populated by the Purchase Stream Realtime changes data flow have a similar structure.

The following figure is an example of a Purchase table in the DDS:

The table contains purchase records that include purchase type, credit status, amount, and date.

Ingestion process for daily data published to the Device topic[edit]

The following figure shows how the system ingests daily data from the Device topic into the DDS:

A Device Stream data flow in Dev Studio contains several components.

Data sets

  • Device Stream data set
    A Kafka data set. Reads data from the Device topic.
  • Device DSS data set
    A DDS data set. Saves device data to a DDS table.
  • ODM Queue data set
    By using the convert shape, customer IDs are posted to an internal stream data set to produce the Operational Data Model.
  • Daily data sets
    DDS data sets, based on days of the week. Device data is saved to the corresponding day table in the DDS.

Tables in the DDS

  • An out-of-the-box Pega Platform capability creates tables in the DDS.
  • Daily tables populated by the Device Stream Realtime changes data flow have a similar structure.

The following figure is an example of a Device table in the DDS:

The table contains device records that include the device model and type.

ODM process for daily data[edit]

The ODMProcess data flow consolidates the data from the Customer, xCAR, and Realtime tables in the DDS, and then produces the Operational Data Model in the DDS as a wide table.

The ODMProcess data flow for consolidating data for Customer, xCAR, and Realtime tables from DDS and producing the Operational Data Model in DDS.

Components

  • ODM Queue data set
    A stream data set. Reads customer IDs when they are posted and executes the logic in this data flow to produce the Operational Data Model.
  • Merge Customer Data shape
    Reads customer data from the DDS and sets the customer context.
  • DDS data sets
    • Daily data sets
      Hold the daily data that the system receives at scheduled intervals. The system reads the corresponding entity data from the DDS.
    • Real-time data sets
      Hold the real-time data changes. The system reads the corresponding entity data from the DDS.
  • Merge Data based on time stamp shape
    A data transform that merges the data based on the timestamp.
    • This step is only required if the daily data pipe produces data that is older than the real-time data available in the system. The assumption is that the real-time tables always have the latest information.
    • Whether this step is required or not, depends on how the source system produces data.
  • ODM data set
    A wide table that holds the operational data that Pega Customer Decision Hub uses to provide next best actions.

In this example, the ODM data set saves full records to the DDS:

ODM data set destination configuration with the Save full record option selected.

Tables in the DDS

A DDS table with purchase data.

ODM process for daily data with precalculations[edit]

If required, you can also inject precalculations during the ingestion process.

The ODMProcess data flow with data precalculations.

Data precalculations
You can use a strategy to do precalculations based on the data. The system saves this data as part of the Operational Data Model.

Customer ODM
In this example, the ODM table also holds the precalculated eligibilities along with customer records which can be used while presenting next best actions.

Ingestion of real-time data sets[edit]

In the real-time data ingestion use case, the client systems publish real-time customer, purchase, and device data to Kafka topics as soon as the data changes in the source systems.

The following diagram shows the topics and tables that are used in the ingestion of real-time data sets:

A workflow starts with real-time data streams that are connected to Customer Decision Hub, which uses a Cassandra database.

Entity streams

  • The client system publishes data changes to real-time topics.
  • Important: The client system publishes the full record of the changes that occurred in the customer, purchase, and device entities.

Customer Decision Hub

  • The Realtime Process consists of one real-time data flow per Kafka topic to capture the data posted to the topic.
  • This article describes three real-time data flows that capture data from the Customer_RT, Purchase_RT, and Device_RT topics.
  • Every real-time data flow saves data to the corresponding table in the Decision Data Store.
  • Finally, the data flows post the customer IDs to an internal Pega Platform queue to produce the Operational Data Model.

Ingestion process for real-time data published to the Customer topic[edit]

The following figure shows how the system ingests real-time data from the Customer_RT topic into the DDS:

A data flow to ingest customer stream real-time changes

Data sets

  • Customer Stream Realtime changes data set
    A Kafka data set. Reads the data changes published in real time to the Customer_RT topic.
  • Customer DSS RT data set
    A DDS data set. Saves the real-time changes to the customer data to a DDS table.
  • ODM Queue RT data set
    The system posts the customer IDs to the internal stream data set which produces the Operational Data Model.

Tables in the DDS

An out-of-the-box Pega Platform capability creates tables in the DDS. The table structure is similar to that of the daily table, but it only contains the customer data that changes in real time.

Ingestion process for real-time data published to the Purchase topic[edit]

The following figure shows how the system ingests real-time data from the Purchase_RT topic into the DDS:

A data flow to ingest purchase stream real-time changes.

Data sets

  • Purchase Stream Realtime changes data set
    A Kafka data set. Reads the purchase data changes published in real time to the Purchase_RT topic.
  • Purchase DSS RT data set
    A DDS data set. Saves the real-time changes to the purchase data to a DDS table.
  • ODM Queue RT data set
    Using a convert shape, the system posts the customer IDs to the internal stream data set which produces the Operational Data Model.

Tables in the DDS

An out-of-the-box Pega Platform capability creates tables in the DDS. The table structure is similar to that of the daily table, but it only contains the purchase data that changes in real time.

Ingestion process for real-time data published to the Device topic[edit]

The following figure shows how the system ingests real-time data from the Device_RT topic into the DDS:

A data flow to ingest device stream real-time changes.

Data sets

  • Device Stream Realtime changes data set
    A Kafka data set. Reads the device data changes published in real time to the Device_RT topic.
  • Device DSS RT data set
    A DDS data set. Saves the real-time changes to the device data to a DDS table.
  • ODM Queue RT data set
    Using a convert shape, the system posts the customer IDs the internal stream data set which produces the Operational Data Model.

Tables in the DDS

An out-of-the-box Pega Platform capability creates tables in the DDS. The table structure is similar to that of the daily table, but it only contains the device data that changes in real time.

ODM process for real-time data[edit]

The ODMProcess_RT real-time data flow consolidates the data from the Customer and xCAR tables in the DDS, and then produces or updates the Operational Data Model in the DDS.

The ODMProcess real-time data flow.

Components

  • ODM Queue RT data set
    A stream data set. Reads the customer IDs when the are posted, and then executes the logic in this data flow to produce the Operational Data Model.
  • Merge shape
    Reads the real-time customer data changes from the DDS and sets the customer context.
  • DDS data sets
    Data sets that read the corresponding real-time entity data from the DDS.
  • Customer ODM data set
    An ODM table in the DDS that the system uses to present next best actions. By using the customer IDs, the system loads the corresponding ODM records.
  • Merge ODM Data data transform
    Merges the data into the ODM. If the event is related to purchases, the merge process adds/updates the records in the Purchases page list.
  • Data Pre Calculations decision strategy
    If applicable, you can run a strategy to precalculate the data with the incoming real-time data changes.
  • ODM data set
    In this example, the ODM table is updated based on the event type. For example, if the event is related to devices, the process only updates the Devices column in the ODM table as shown in the following figure:

Customer ODM data set destination configuration with the option to save to the Devices field selected.

Tables in the DDS

A DDS table with purchase data.

Summary[edit]

The system is configured with two sets of data flow rules to support the ingestion of daily data and real-time data.