Ethical Bias check algorithms

From PegaWiki
Description: Ethical Bias check algorithms
Version as of: 8.4
Application: Pega Customer Decision Hub
Capability/Industry Area: Testing and Simulation



Ethical Bias check algorithms

This document describes the algorithms and metrics used in the ethical bias check. It is aimed at readers who want to know exactly how the ethical bias check works.

Use case examples

The ethical bias check is a simulation type that tests whether a Next-Best-Action strategy introduces unwanted bias. This document describes how the rate ratio and the Gini coefficient are used to detect bias.

Before you begin

Bias fields

The metric used to detect bias is the rate ratio for categorical or binary fields and the Gini coefficient for numeric (ordinal) fields.

The following Pega data types are allowed as bias types; fields of these types can be added as bias fields in the Ethical Bias Policy. Note that only users whose access group includes the pzBiasPolicyConfiguration privilege can configure the bias policy.

| Pega field type | Bias treated as | Bias measure | Examples | Remarks |
| --- | --- | --- | --- | --- |
| Text / TrueFalse / Integer | Categorical | Rate ratio | Gender, Ethnicity | Bias is checked for each categorical value. If there are many categorical values, only the 20 most frequent values are checked for bias and the remaining values are treated as a rest category. Missing values are ignored. Note: Integer is allowed as a Pega field type, but it should only be used if the information in the field is not ordinal; otherwise select Numeric treatment. |
| Decimal / Double / Integer | Numeric | Gini coefficient | Age | Missing values are ignored. |
| Date / Datetime / Identifier / Password / TextEncrypted / TimeOfDay | Not allowed as bias fields | | | |
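
The categorical treatment described in the table (keep the 20 most frequent values, group the remainder into a rest category, ignore missing values) can be sketched as follows. This is a minimal illustration, not the product implementation: the function name, the `(rest)` label, and the use of `None` for missing values are assumptions made for the example.

```python
from collections import Counter

def bin_categories(values, max_categories=20, rest_label="(rest)"):
    """Keep the most frequent categories and map all others to a rest label.

    Missing values (None here, by assumption) are dropped before counting.
    """
    present = [v for v in values if v is not None]
    # most_common returns the categories ordered by descending frequency
    top = {cat for cat, _ in Counter(present).most_common(max_categories)}
    return [v if v in top else rest_label for v in present]

# Small demonstration with a limit of 2 categories instead of 20
example = bin_categories(["a", "a", "b", None, "c"], max_categories=2)
print(example)
```

The rate ratio would then be computed once per retained category, each time comparing that category against all other cases.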

Rate ratio

This measure of discrimination is used for binary or categorical fields and is based on a contingency table. It is also called the risk ratio, but this document uses the term rate ratio because being selected for an action is not a risk.

The rate ratio is calculated from a 2x2 contingency table.

An example of a contingency table is given below:

| | female | male |
| --- | --- | --- |
| selected for action | 500 | 1,000 |
| not selected for action | 20,000 | 18,000 |

Rate ratio for female = [500 / (500 + 20,000)] / [1,000 / (1,000 + 18,000)] = 0.0244 / 0.0526 = 0.46

Rate ratio for male = [1,000 / (1,000 + 18,000)] / [500 / (500 + 20,000)] = 0.0526 / 0.0244 = 2.16
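
The calculation can be reproduced with a few lines of Python. This is a sketch for checking the arithmetic; the function and argument names are chosen for this example only.

```python
def rate_ratio(selected_cat, not_selected_cat, selected_rest, not_selected_rest):
    """Selection rate of the category divided by the selection rate of the rest."""
    rate_cat = selected_cat / (selected_cat + not_selected_cat)
    rate_rest = selected_rest / (selected_rest + not_selected_rest)
    return rate_cat / rate_rest

# Numbers from the contingency table above
rr_female = rate_ratio(500, 20_000, 1_000, 18_000)
rr_male = rate_ratio(1_000, 18_000, 500, 20_000)
print(round(rr_female, 2), round(rr_male, 2))  # 0.46 2.16
```

With only two categories, the two rate ratios are each other's reciprocal.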

The rate ratio ranges between 0 and infinity.

  • A value of 1 indicates there is no bias at all
  • Values below 1 indicate that the category is selected less often than the rest; values above 1 indicate that it is selected more often. Both indicate bias

Missing value treatment: cases containing missing values are ignored

References:

https://en.wikipedia.org/wiki/Odds_ratio

https://en.wikipedia.org/wiki/Contingency_table

https://en.wikipedia.org/wiki/Relative_risk

Setting a bias threshold for the rate ratio

The bias thresholds set an allowed range for the rate ratio. If the rate ratio falls outside this allowed range with 95% confidence, this is signaled as significant bias. Each threshold setting treats a positive shift (towards > 1) and a negative shift (towards < 1) in the same way:

| Bias threshold | Allowed rate ratio range |
| --- | --- |
| no bias allowed | any bias with a 95% confidence for the rate ratio to be greater than 1 or less than 1 will be detected |
| very light | 0.90 - 1.11 |
| light | 0.80 - 1.25 |
| heavy | 0.66 - 1.50 |
| very heavy | 0.50 - 2.00 |
| all bias allowed | no bias detection |

Figure: How bias is detected with respect to the set threshold and confidence interval.
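
One plausible reading of the detection rule is that bias is signaled only when the whole confidence interval of the rate ratio lies outside the allowed range. The sketch below encodes that reading; the decision rule, the function name, and the dictionary are assumptions made for illustration and are not confirmed by this document.

```python
# Allowed ranges copied from the threshold table above
ALLOWED_RANGES = {
    "very light": (0.90, 1.11),
    "light": (0.80, 1.25),
    "heavy": (0.66, 1.50),
    "very heavy": (0.50, 2.00),
}

def rate_ratio_exceeds_threshold(ci_low, ci_high, threshold):
    """Assumed rule: flag bias when the interval lies entirely outside the range."""
    if threshold == "all bias allowed":
        return False
    if threshold == "no bias allowed":
        low, high = 1.0, 1.0  # any interval that excludes 1 counts as bias
    else:
        low, high = ALLOWED_RANGES[threshold]
    return ci_high < low or ci_low > high

print(rate_ratio_exceeds_threshold(0.37, 0.39, "light"))  # True
print(rate_ratio_exceeds_threshold(0.95, 1.05, "light"))  # False
```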




Gini measure

Figure: Illustration of how the Gini coefficient is used to measure the difference between two distributions.

The Gini measure is used to calculate the bias in an ordinal field, for example age. It expresses how well age can be used to discriminate between the group that receives action X and the group that does not.

Missing value treatment: cases containing missing values are ignored.

Range: the Gini coefficient ranges between 0 and 1 (or 0 and 100%).

Note that the Gini coefficient is directly related to the Area Under the ROC Curve (AUC): AUC = (Gini + 1) / 2.
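
The relation to the AUC suggests a direct way to compute the coefficient: estimate the AUC as the probability that a randomly chosen case that receives the action has a higher field value than a randomly chosen case that does not, then rescale. The sketch below assumes this pairwise (Mann-Whitney) estimate and takes the absolute value so that the result stays in the stated 0 to 1 range; both choices are assumptions, not a description of the product code.

```python
def gini_coefficient(receiving, not_receiving):
    """Gini = |2 * AUC - 1|, with the AUC estimated over all pairs of cases."""
    wins = 0.0
    for x in receiving:
        for y in not_receiving:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5  # ties count as half
    auc = wins / (len(receiving) * len(not_receiving))
    return abs(2.0 * auc - 1.0)

# Fully separated ages give 1, identical distributions give 0
print(gini_coefficient([30, 35, 40], [45, 50, 55]))  # 1.0
print(gini_coefficient([1, 2, 3], [1, 2, 3]))        # 0.0
```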

References:

https://en.wikipedia.org/wiki/Gini_coefficient


Setting a bias threshold for the Gini coefficient

The bias thresholds set an allowed range for the Gini coefficient. If the measured Gini coefficient falls outside this allowed range with 95% confidence, this is signaled as significant bias.

| Bias threshold | Allowed Gini coefficient range |
| --- | --- |
| no bias allowed | any bias with a 95% confidence for the Gini coefficient to be greater than 0 will be detected |
| very light | < 0.10 |
| light | < 0.20 |
| heavy | < 0.50 |
| very heavy | < 0.70 |
| all bias allowed | no bias detection |
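
Because the allowed Gini range is one-sided, a check only needs the lower bound of the confidence interval. The sketch below assumes that bias is signaled when this lower bound is above the threshold; the rule and all names are illustrative assumptions.

```python
# Upper limits copied from the threshold table above
GINI_LIMITS = {
    "very light": 0.10,
    "light": 0.20,
    "heavy": 0.50,
    "very heavy": 0.70,
}

def gini_exceeds_threshold(ci_low, threshold):
    """Assumed rule: flag bias when even the lower bound is above the limit."""
    if threshold == "all bias allowed":
        return False
    limit = 0.0 if threshold == "no bias allowed" else GINI_LIMITS[threshold]
    return ci_low > limit

print(gini_exceeds_threshold(0.444, "light"))  # True
print(gini_exceeds_threshold(0.444, "heavy"))  # False
```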

Methods for determining the confidence intervals

For the rate ratio, the confidence interval is calculated using the following approximation for the standard error of log(RR):

(for the 95% confidence interval, z = 1.96).
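
The formula itself is not reproduced here. A widely used approximation, described in the relative risk reference listed earlier, is SE(log RR) = sqrt(1/a - 1/(a+b) + 1/c - 1/(c+d)), with the interval given by exp(log RR ± z · SE). The sketch below assumes this standard form, which may differ in detail from the one used in the product; a and b are the selected and not selected counts for the category, c and d the same counts for the rest.

```python
import math

def rate_ratio_ci(a, b, c, d, z=1.96):
    """Rate ratio with a confidence interval from the log approximation.

    a, b: selected / not selected in the category
    c, d: selected / not selected in the rest
    """
    rr = (a / (a + b)) / (c / (c + d))
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    low = math.exp(math.log(rr) - z * se_log_rr)
    high = math.exp(math.log(rr) + z * se_log_rr)
    return rr, low, high

# Female column of the contingency table in the rate ratio section
rr, low, high = rate_ratio_ci(500, 20_000, 1_000, 18_000)
print(round(rr, 2), round(low, 2), round(high, 2))
```

The interval is symmetric on the log scale, so it is not symmetric around the rate ratio itself.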

The confidence interval for the Gini coefficient is calculated using DeLong's method. This method is also used by well-known packages such as pROC. Because Gini = 2 · AUC - 1, the error on the Gini coefficient is twice the error on the AUC.
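
For a single AUC, DeLong's variance estimate can be written compactly in terms of per-case "placement" values. The sketch below is a plain O(m·n) illustration of that estimate under a normal approximation; it is not the product code, the names are chosen for the example, and it returns the signed value 2 · AUC - 1 rather than an absolute value.

```python
import math

def gini_confidence_interval(receiving, not_receiving, z=1.96):
    """Gini with a DeLong-style interval. Each group needs at least 2 cases."""
    m, n = len(receiving), len(not_receiving)

    def psi(x, y):
        # 1 if the receiving case ranks higher, 0.5 for a tie, 0 otherwise
        return 1.0 if x > y else 0.5 if x == y else 0.0

    # Placement values: one per receiving case and one per non-receiving case
    v10 = [sum(psi(x, y) for y in not_receiving) / n for x in receiving]
    v01 = [sum(psi(x, y) for x in receiving) / m for y in not_receiving]

    auc = sum(v10) / m
    s10 = sum((v - auc) ** 2 for v in v10) / (m - 1)
    s01 = sum((v - auc) ** 2 for v in v01) / (n - 1)
    se_auc = math.sqrt(s10 / m + s01 / n)

    gini = 2.0 * auc - 1.0
    se_gini = 2.0 * se_auc  # error on Gini is twice the error on AUC
    return gini, gini - z * se_gini, gini + z * se_gini

gini, low, high = gini_confidence_interval([1, 2, 3, 4], [1, 2, 3, 4])
print(gini, low, high)
```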

Dynamic System Setting for the confidence level used to determine the interval width

In Pega Platform, the confidence level for detecting bias above the threshold can be set through a Dynamic System Setting (DSS). The default value is 0.9999 (99.99%). Note that lowering this level increases the probability of false bias alerts.

Pega-DecisionEngine decision/simulation/ethicalbias/confidenceinterval 0.9999
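
The confidence level determines the interval width through the z multiplier. Assuming a two-sided interval based on the normal distribution, the multiplier for a given level can be computed as in the sketch below; the function name is illustrative.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Two-sided z multiplier for a confidence level such as 0.95 or 0.9999."""
    return NormalDist().inv_cdf((1.0 + level) / 2.0)

print(round(z_for_confidence(0.95), 2))    # 1.96
print(round(z_for_confidence(0.9999), 2))  # 3.89
```

A lower level gives a smaller multiplier and therefore a narrower interval, which makes it easier for the interval to fall outside the allowed range.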

Ethical bias report

For each category, the report contains 4 contingency numbers, 2 rates, and 1 rate ratio.

For each numeric field, the report contains 1 Gini coefficient.

The numbers that can be part of the bias report are illustrated by this example:

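| Bias exceeds threshold | Issue | Group | Action | Field | Category | Rate ratio | (Rate bootstrapped) | Rate ratio 95% confidence interval | Rate ratio allowed range | Receiving action, #category | Receiving action, #rest | Receiving action, category rate | Not receiving action, #category | Not receiving action, #rest | Not receiving action, category rate | #category total | Gini coefficient | (Gini coefficient bootstrapped) | Gini 95% confidence interval | Gini coefficient threshold | Receiving action, average value | Not receiving action, average value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Yes | Sales | Phones | iPhoneX | Gender | Female | 0.38 | 0.37 | 0.37-0.39 | 0.80-1.25 | 1179 | 6662 | 15% | 9592 | 15128 | 38% | 10,771 (33%) | | | | | | |
| Yes | | | | | Male | 2.60 | 2.55 | 2.53-2.57 | 0.80-1.25 | 6662 | 1179 | 38% | 15128 | 9592 | 15% | 21,790 (66%) | | | | | | |
| | | | | | (Missing) | | | | | | | | | | | 0 | | | | | | |
| No | | | | Ethnicity | Ethnicity-White | | | | | | | | | | | | | | | | | |
| No | | | | | Ethnicity-Asian | | | | | | | | | | | | | | | | | |
| No | | | | | Ethnicity-Other | | | | | | | | | | | | | | | | | |
| | | | | | (Missing) | | | | | | | | | | | | | | | | | |
| Yes | | | | Age | | | | | | | | | | | | | 45% | 44.5 | 44.4-44.6 | 40% | 38.1 | 39.2 |
| | | | | | (Missing) | | | | | | | | | | | 1234 | | | | | | |
| Yes | Risk | Loan | Personal Loan | Gender | Female | | | | 0.80-1.25 | | | | | | | | | | | | | |
| Yes | | | | | Male | | | | 0.80-1.25 | | | | | | | | | | | | | |
| | | | | | (Missing) | | | | | | | | | | | | | | | | | |
| No | | | | Ethnicity | Ethnicity-White | | | | | | | | | | | | | | | | | |
| No | | | | | Ethnicity-Asian | | | | | | | | | | | | | | | | | |
| No | | | | | Ethnicity-Other | | | | | | | | | | | | | | | | | |
| Yes | | | | Age | | | | | | | | | | | | | | | | | | |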