Synthetic Banking Dataset

A realistic, full-stack retail banking dataset covering 221,000 customers, 151 million transactions, and 5 product lines across a 24-month window (Jan 2024 – Dec 2025).

Customers: 221K
Transactions: 151.1M
Total volume: €71.7B
Accounts: 293.7K
Network edges: 1.71M
Months: 24
Tables: 8
Total rows: 161.4M

Customer Segments

Three distinct segments with fundamentally different transaction patterns, volumes, and product penetration.

Private

Customers: 200,000 (90.5%)
Transactions: 120.9M
Avg transactions per customer: 605
Avg transaction: €190
Total volume: €23.0B

SME

Customers: 20,000 (9.0%)
Transactions: 22.2M
Avg transactions per customer: 1,111
Avg transaction: €818
Total volume: €18.2B

Corporate

Customers: 1,000 (0.5%)
Transactions: 7.4M
Avg transactions per customer: 7,362
Avg transaction: €3,916
Total volume: €28.8B
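
The per-segment averages are plain ratios of the segment totals. Average transactions per customer is transactions ÷ customers, and average transaction is total volume ÷ transactions. Below is a minimal Python sketch of that arithmetic. It assumes you already have the three totals for a segment. `SegmentTotals` and `segment_kpis` are hypothetical names and are not part of the dataset.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SegmentTotals:
    """Raw totals for one customer segment (hypothetical container)."""
    customers: int
    transactions: int
    volume_eur: float


def segment_kpis(t: SegmentTotals) -> dict[str, float]:
    """Derive the per-segment averages from the raw totals."""
    return {
        # Transactions per customer: total transactions / customers.
        "avg_txn_per_customer": t.transactions / t.customers,
        # Average transaction size: total EUR volume / transaction count.
        "avg_txn_eur": t.volume_eur / t.transactions,
    }


# Rounded headline figures for the SME segment.
sme = SegmentTotals(customers=20_000, transactions=22_200_000, volume_eur=18.2e9)
print(segment_kpis(sme))
```

The inputs here are the rounded headline figures. As a result, the outputs (1,110 transactions per customer and about €820 per transaction for SME) are close to the published 1,111 and €818 but do not match them exactly.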

Transaction Overview

Transaction Count by Type

Card purchase: 90.6M
Business payment: 21.5M
Utility bill: 11.0M
Internal transfer: 5.8M
Subscription: 4.6M
Business receipt: 3.9M
Payroll run: 3.3M
Salary: 3.0M
Household transfer: 2.3M
Card refund: 1.3M

Volume by Type (EUR)

Business payment: €38.5B
Payroll run: €7.8B
Salary: €7.1B
Business receipt: €6.9B
Card purchase: €5.5B
Household transfer: €1.5B
Tax withholding: €0.9B
Support income: €0.9B
Internal transfer: €0.6B
Bonus: €0.6B

Monthly Transaction Volume (Jan 2024 – Dec 2025)

Jan 2024: 4.3M transactions / €2.3B
Dec 2024: 6.8M transactions / €3.7B
Dec 2025: 8.0M transactions / €4.3B

Temporal Patterns

Transactions follow realistic time-of-day and day-of-week patterns. Friday is the busiest day (24.2M transactions) and Sunday the quietest (17.5M).

Hourly Distribution

[Chart: transaction count by hour of day, 0–23]

Day of Week

Monday: 21.8M
Tuesday: 21.9M
Wednesday: 21.7M
Thursday: 21.3M
Friday: 24.2M
Saturday: 22.7M
Sunday: 17.5M

Demographics & Geography

Age Distribution (Private customers)

18 – 25: 23,762 (11.9%)
26 – 35: 29,772 (14.9%)
36 – 45: 29,666 (14.8%)
46 – 55: 29,680 (14.8%)
56 – 65: 29,937 (15.0%)
Over 65: 56,779 (28.4%)

Regions

Tallinn (TLN): 92,526 (41.9%)
Rural (RUR): 46,737 (21.1%)
Tartu (TRT): 31,024 (14.0%)
Narva (NRV): 21,922 (9.9%)
Pärnu (PRN): 15,756 (7.1%)
Other (OTH): 13,035 (5.9%)

Language Preference

Estonian (ET): 135,071 (61.1%)
Russian (RU): 52,602 (23.8%)
English (EN): 33,327 (15.1%)

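Household Structure (Private)

The counts sum to 221,000, which is all customers. The percentages, however, are relative to the 200,000 Private customers.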

Single: 111,001 (55.5%)
Primary (head): 49,906 (25.0%)
Partner: 49,906 (25.0%)
Dependent: 10,187 (5.1%)

Channels & Currency

Transaction Channels

Card (POS): 91.9M (60.8%)
Online banking: 42.1M (27.9%)
Mobile app: 11.0M (7.3%)
Branch: 5.6M (3.7%)
ATM: 472K (0.3%)

Currency (by EUR volume)

EUR: €69.0B (96.3%)
USD: €2.1B (2.9%)
GBP: €331M (0.5%)
SEK: €171M (0.2%)
NOK: €152M (0.2%)

Money Flows

Credits vs Debits

Credits (money in): €16.7B across 14.4M transactions
Debits (money out): €55.0B across 136.7M transactions

Amount Distribution (EUR)

1st percentile: €1.00
5th percentile: €3.40
25th percentile: €13.57
Median: €33.89
75th percentile: €272.64
90th percentile: €1,200.61
95th percentile: €2,573.62
99th percentile: €6,645.19
Mean: €474.44
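
You can recompute these statistics from a column of `amount_eur` values using Python's standard `statistics` module. This is only a sketch. The method behind the published figures is not documented, so the `inclusive` interpolation below is an assumption. `amount_profile` is a hypothetical helper name. With 151M rows, you would run it on a sample or on a single month rather than on a full in-memory list.

```python
from __future__ import annotations

import statistics


def amount_profile(amounts: list[float]) -> dict[str, float]:
    """Percentiles, median and mean of EUR amounts (needs at least two values)."""
    # quantiles(n=100) returns the 99 cut points for the 1st..99th percentiles,
    # so the k-th percentile is at index k - 1. "inclusive" interpolates between
    # observed values, treating the data as the whole population.
    cuts = statistics.quantiles(amounts, n=100, method="inclusive")
    profile = {f"p{k}": cuts[k - 1] for k in (1, 5, 25, 75, 90, 95, 99)}
    profile["median"] = statistics.median(amounts)
    profile["mean"] = statistics.fmean(amounts)
    return profile


# Toy sample (made up): many small payments plus a few large transfers.
sample = [3.40, 12.00, 13.57, 25.00, 33.89, 41.00, 60.00, 272.64, 1200.61, 6645.19]
profile = amount_profile(sample)
print(f"median={profile['median']:.2f}  mean={profile['mean']:.2f}")
```

The median (€33.89) and the mean (€474.44) are far apart, which shows how right-skewed the amounts are. Relatively few large business payments and salaries pull the mean up, so the median better describes a typical transaction.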

Average Transaction Amount by Type (EUR)

Bonus: €2,368
Salary: €2,343
Tax withholding: €1,987
Business payment: €1,791
Social insurance: €1,070
Support income: €726
Loan servicing: €710
Household transfer: €683
Internal transfer: €99
Subscription: €65
Card purchase: €61
ATM withdrawal: €50
Utility bill: €34

Merchant Geography

Card Purchases by Merchant Country

Estonia: 70.1M (72.3%)
Lithuania: 5.8M
Finland: 5.5M
Latvia: 4.4M
Sweden: 3.5M
Poland: 3.5M
Germany: 2.3M
Netherlands: 1.9M

Product Portfolio

Five product lines with monthly snapshots tracking lifecycle state, balances, and contributions.

Leasing clients: 21.8K
Life insurance: 45.0K
Property insurance: 74.0K
Pension holders: 123.6K
Balance snapshots: 5.3M

Product Penetration (% of all customers)

Pension: 123,585 (55.9%)
Property insurance: 73,956 (33.5%)
Life insurance: 45,042 (20.4%)
Leasing / mortgage: 21,781 (9.9%)

Monthly Snapshot Rows

Margin assets: 4,591,506
Pension: 2,966,040
Property insurance: 1,071,805
Margin liabilities: 714,413
Life insurance: 606,089
Leasing: 141,967

Data Dictionary

Column-level reference for all 8 tables. Where a column lists codes or values, they are the complete set of allowed values for that field.

customers — 221,000 rows

Column | Type | Description
client_id | string | Unique customer identifier, e.g. CL0000001. Join key to all other tables via client_id_fk.
segment | string | Private, SME, Corporate
type_cd | string | H = natural person, C = legal entity (company)
gender_cd | string | M = male, F = female, U = unknown/undisclosed, X = data anomaly
lang_pref | string | ET = Estonian, EN = English, RU = Russian
birth_or_reg_dt | date | Date of birth (Private) or company registration date (SME/Corporate).
bank_reg_dt | date | Date the customer registered with the bank. Transactions only exist on or after this date.
postal_code | string | 5-digit postal code.
region_cd | string | TLN = Tallinn, TRT = Tartu, NRV = Narva, PRN = Pärnu, RUR = Rural, OTH = Other
household_id | string | Household group. Prefix S = single, HH = multi-person household.
household_role | string | primary = household head, partner = spouse, dependent = child, single = lives alone
group_head_flg | string | 0 or 1: whether this company heads a corporate group.
corp_group_id | string | Corporate group identifier (SME/Corporate only).
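
The bank_reg_dt invariant says that no transaction exists before registration. This is easy to check once both tables are loaded, and a violation is one of the temporal inconsistencies described under Data Quality. Below is a minimal sketch. It assumes the rows have already been read into Python dicts with `datetime.date` values. `pre_registration_txns` is a hypothetical helper, and the sample rows are made up.

```python
from __future__ import annotations

from datetime import date


def pre_registration_txns(customers: list[dict], transactions: list[dict]) -> list[str]:
    """Return txn_ids booked before the owning customer's bank_reg_dt.

    customers:    dicts with client_id and bank_reg_dt (datetime.date)
    transactions: dicts with txn_id, client_id_fk and book_date (datetime.date)
    """
    # Index registration dates by client_id for constant-time lookups.
    reg_dates = {c["client_id"]: c["bank_reg_dt"] for c in customers}
    flagged = []
    for t in transactions:
        reg = reg_dates.get(t["client_id_fk"])
        # Skip unknown owners; flag any booking earlier than registration.
        if reg is not None and t["book_date"] < reg:
            flagged.append(t["txn_id"])
    return flagged


customers = [{"client_id": "CL0000001", "bank_reg_dt": date(2024, 3, 1)}]
transactions = [
    {"txn_id": "TX00000000001", "client_id_fk": "CL0000001", "book_date": date(2024, 2, 28)},
    {"txn_id": "TX00000000002", "client_id_fk": "CL0000001", "book_date": date(2024, 3, 1)},
]
print(pre_registration_txns(customers, transactions))  # prints ['TX00000000001']
```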

transactions — 151.1M rows

Column | Type | Description
txn_id | string | Unique transaction ID, e.g. TX00000000001.
client_id_fk | string | Account owner. Joins to customers.client_id.
acct_iban | string | IBAN of the account being debited or credited.
counterparty_iban | string | IBAN of the other party (employer, merchant, recipient).
book_date | date | Booking date.
book_time | string | Time of day, HH:MM:SS.
amount_orig | float | Amount in the original currency.
trx_ccy | string | EUR, USD, SEK, NOK, GBP
amount_eur | float | Amount converted to EUR.
channel | string | card = POS terminal, online = internet banking, mobile = mobile app, branch = physical branch, atm = ATM
dc_indicator | string | C = credit (money in), D = debit (money out)
mcc | string | Merchant Category Code (ISO 18245). Present on card purchases; null for transfers and salary.
merchant_country | string | ISO 3166-1 alpha-2 country code of the merchant.
txn_type | string | Transaction classification (see reference below).
purpose_code | string | SEPA purpose code (see reference below).
period_yy_mm | string | Year and month as YYMM, e.g. 2401 = January 2024.
day_in_month | int | Day of month (1–31).
weekday_name | string | Mon, Tue, Wed, Thu, Fri, Sat, Sun
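
Several transaction columns are derived from book_date, so you can cross-check them row by row. The sketch below decodes period_yy_mm (YYMM, e.g. 2401) and compares the date parts and weekday_name against book_date. It also turns dc_indicator into a signed amount. It assumes every period falls in the 2000s. The dictionary does not say whether amount_eur carries a sign, so the sketch takes the absolute value before applying the C/D direction. All function names are hypothetical.

```python
from __future__ import annotations

from datetime import date

WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def parse_period(period_yy_mm: str) -> tuple[int, int]:
    """Decode YYMM, e.g. '2401' -> (2024, 1). Assumes years 2000-2099."""
    return 2000 + int(period_yy_mm[:2]), int(period_yy_mm[2:])


def date_fields_consistent(txn: dict) -> bool:
    """True if period_yy_mm, day_in_month and weekday_name agree with book_date."""
    d: date = txn["book_date"]
    return (
        parse_period(txn["period_yy_mm"]) == (d.year, d.month)
        and txn["day_in_month"] == d.day
        # date.weekday() is 0 for Monday, matching the order of WEEKDAYS.
        and txn["weekday_name"] == WEEKDAYS[d.weekday()]
    )


def signed_amount_eur(txn: dict) -> float:
    """Positive for credits (C, money in), negative for debits (D, money out)."""
    amount = abs(txn["amount_eur"])  # sign convention of amount_eur is not documented
    return amount if txn["dc_indicator"] == "C" else -amount


txn = {
    "book_date": date(2024, 1, 1),  # a Monday
    "period_yy_mm": "2401",
    "day_in_month": 1,
    "weekday_name": "Mon",
    "amount_eur": 12.5,
    "dc_indicator": "D",
}
print(date_fields_consistent(txn), signed_amount_eur(txn))
```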

Transaction Type Reference

txn_type | Direction | Description
salary | C | Monthly salary credit from employer
bonus | C | Year-end (December) bonus
payroll_run | D | Employer-side debit paired with a salary or bonus credit
tax_withholding | D | Income tax paid by the employer to the tax authority
social_insurance | D | Social insurance contribution paid by the employer
support_income | C | Government benefits, pension, or unemployment benefit
card_purchase | D | Point-of-sale or online card payment
card_refund | C | Refund of a prior card purchase
subscription | D | Recurring subscription (streaming, SaaS, etc.)
internal_transfer | C/D | Person-to-person or B2B transfer within the bank
business_payment | D | Outgoing business payment (suppliers, invoices)
business_receipt | C | Incoming payment from another bank client
household_transfer | C/D | Rent share between household members
allowance_transfer | C/D | Allowance from household head to dependent
utility_bill | D | Recurring utility (energy, gas, water, telecom, municipal)
savings_sweep | C | Automated savings deposit
loan_servicing | D | Monthly leasing or mortgage repayment
cash_withdrawal | D | ATM cash withdrawal
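
The Direction column can also serve as a validation rule. The minimal sketch below transcribes the table into a lookup. It then flags rows whose txn_type is unknown or whose dc_indicator is not allowed for that type. `EXPECTED_DIRECTION` and `direction_violations` are hypothetical names.

```python
# Allowed dc_indicator values per txn_type, transcribed from the reference table.
EXPECTED_DIRECTION = {
    "salary": {"C"},
    "bonus": {"C"},
    "payroll_run": {"D"},
    "tax_withholding": {"D"},
    "social_insurance": {"D"},
    "support_income": {"C"},
    "card_purchase": {"D"},
    "card_refund": {"C"},
    "subscription": {"D"},
    "internal_transfer": {"C", "D"},
    "business_payment": {"D"},
    "business_receipt": {"C"},
    "household_transfer": {"C", "D"},
    "allowance_transfer": {"C", "D"},
    "utility_bill": {"D"},
    "savings_sweep": {"C"},
    "loan_servicing": {"D"},
    "cash_withdrawal": {"D"},
}


def direction_violations(transactions):
    """Yield (txn_id, reason) for rows whose type or direction is off-reference."""
    for t in transactions:
        allowed = EXPECTED_DIRECTION.get(t["txn_type"])
        if allowed is None:
            yield t["txn_id"], f"unknown txn_type {t['txn_type']!r}"
        elif t["dc_indicator"] not in allowed:
            yield t["txn_id"], f"{t['txn_type']} booked as {t['dc_indicator']}"


rows = [
    {"txn_id": "TX1", "txn_type": "salary", "dc_indicator": "C"},
    {"txn_id": "TX2", "txn_type": "card_purchase", "dc_indicator": "C"},
    {"txn_id": "TX3", "txn_type": "mystery", "dc_indicator": "D"},
]
for txn_id, reason in direction_violations(rows):
    print(txn_id, reason)
```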

SEPA Purpose Codes

Code | Meaning
SALA | Salary payment
BONU | Bonus payment
BENE | Government benefit / social transfer
SSBE | Social security / social insurance
TAXS | Tax payment
GDDS | Purchase of goods or services
SUBS | Subscription
RENT | Rent payment
CHAR | Charity or allowance
ELEC | Electricity / utility bill
SAVG | Savings transfer
LOAR | Loan repayment
SUPP | Supplier payment
CASH | Cash withdrawal
GIFT | Gift / personal transfer
FEES | Fee payment
OTHR | Other

leasing_agreements — 141,967 rows

Column | Type | Description
client_id_fk | string | Leaseholder.
snapshot_date | date | Month-end snapshot date.
lease_status | string | offer = proposed, enactment = signed and disbursing, valid = active, temporary terminated = paused, terminated = ended, completely terminated = fully settled, archived = closed
orig_principal | float | Original loan principal (EUR).
outstanding_principal | float | Remaining principal (EUR).
carrying_amount | float | Book value on the bank's balance sheet (EUR).
lease_ccy | string | EUR, USD, SEK
year, month | string | Partition keys.

life_insurance — 606,089 rows

Column | Type | Description
client_id_fk | string | Policyholder.
snapshot_date | date | Month-end snapshot.
policy_status | string | ALG = initial, REGTUD = registered, MAKSEOOT = premium due, JOUS = in force, KATK = suspended, LOPET = terminated by holder, LOPPKAES = expired, LOPPSURM = death claim, ANULL = annulled
sum_insured | float | Total insured amount (EUR).
next_premium | float | Next premium amount (EUR).
policy_ccy | string | EUR, USD
year, month | string | Partition keys.

property_insurance — 1,071,805 rows

Column | Type | Description
client_id_fk | string | Policyholder.
snapshot_date | date | Month-end snapshot.
agreement_status | string | -1 = cancelled, 0 = inactive, 1 = active, 2 = claim in progress
cover_type_cd | string | E = endowment (savings-linked), P = pure protection
premium_eur | float | Monthly premium (EUR).
surrender_value_eur | float | Cash surrender value (EUR).
year, month | string | Partition keys.

pension — 2,966,040 rows

Column | Type | Description
client_id_fk | string | Account holder.
contrib_date | date | Month-end date of the contribution.
fund_code | string | PEN-A = conservative, PEN-B = balanced, PEN-C = growth, PEN-D = aggressive, PEN-E = index
unit_qty | float | Cumulative fund units held.
contrib_status | string | active = regular contributions, stopped = contributions suspended
year, month | string | Partition keys.
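
Because unit_qty is cumulative, the units added or removed in a given month equal the difference between consecutive snapshots. This sketch assumes at most one row per client, fund and contrib_date. Rows are keyed by fund, so a holding in a second fund appears as a separate series. `monthly_unit_changes` is a hypothetical helper, and the sample rows are made up.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import date


def monthly_unit_changes(rows: list[dict]) -> dict[tuple[str, str], list[tuple[date, float]]]:
    """Convert cumulative unit_qty snapshots into month-over-month unit changes.

    rows: dicts with client_id_fk, fund_code, contrib_date (datetime.date), unit_qty.
    The first snapshot of each series has no predecessor, so it yields no change.
    """
    series: dict[tuple[str, str], list[tuple[date, float]]] = defaultdict(list)
    for r in rows:
        series[(r["client_id_fk"], r["fund_code"])].append((r["contrib_date"], r["unit_qty"]))

    changes = {}
    for key, points in series.items():
        points.sort()  # chronological by contrib_date
        # Pair each snapshot with the previous one and take the difference.
        changes[key] = [
            (d, qty - prev_qty) for (_, prev_qty), (d, qty) in zip(points, points[1:])
        ]
    return changes


rows = [
    {"client_id_fk": "CL0000001", "fund_code": "PEN-B", "contrib_date": date(2024, 2, 29), "unit_qty": 15.0},
    {"client_id_fk": "CL0000001", "fund_code": "PEN-B", "contrib_date": date(2024, 1, 31), "unit_qty": 10.0},
    {"client_id_fk": "CL0000001", "fund_code": "PEN-B", "contrib_date": date(2024, 3, 31), "unit_qty": 15.0},
]
print(monthly_unit_changes(rows))
```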

margin_assets — 4,591,506 rows

Column | Type | Description
client_id_fk | string | Account holder.
month_end | date | Last day of the month.
acct_ccy | string | EUR, USD, SEK, NOK
end_balance_eur | float | End-of-month balance (EUR).
avg_balance_eur | float | Average daily balance for the month (EUR).
avg_interest_base_eur | float | Interest-bearing balance after activity penalties (EUR).
year, month | string | Partition keys.

margin_liabilities — 714,413 rows

Column | Type | Description
client_id_fk | string | Account holder.
month_end | date | Last day of the month.
contract_ccy | string | EUR, USD, SEK, NOK
end_balance_eur | float | Outstanding liability at month-end (EUR, positive number).
avg_balance_eur | float | Average liability balance (EUR).
avg_interest_base_eur | float | Interest-bearing liability base (EUR).
year, month | string | Partition keys.
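
margin_liabilities stores end_balance_eur as a positive number, so a client's net month-end position is assets minus liabilities. This minimal sketch assumes both tables have been read into dicts and that month_end values compare equal across the two tables. Rows that share a client and month are summed, which covers cases such as one row per currency. `net_position_eur` is a hypothetical name, and the sample rows are made up.

```python
from __future__ import annotations

from collections import defaultdict


def net_position_eur(assets: list[dict], liabilities: list[dict]) -> dict[tuple, float]:
    """Net end-of-month position per (client_id_fk, month_end): assets minus liabilities."""
    net: dict[tuple, float] = defaultdict(float)
    for a in assets:
        # Sum every asset row a client has for that month.
        net[(a["client_id_fk"], a["month_end"])] += a["end_balance_eur"]
    for liab in liabilities:
        # Liabilities are stored as positive numbers, so subtract them.
        net[(liab["client_id_fk"], liab["month_end"])] -= liab["end_balance_eur"]
    return dict(net)


assets = [
    {"client_id_fk": "CL0000001", "month_end": "2024-01-31", "end_balance_eur": 1000.0},
    {"client_id_fk": "CL0000001", "month_end": "2024-01-31", "end_balance_eur": 250.0},
]
liabilities = [
    {"client_id_fk": "CL0000001", "month_end": "2024-01-31", "end_balance_eur": 400.0},
    {"client_id_fk": "CL0000002", "month_end": "2024-01-31", "end_balance_eur": 100.0},
]
print(net_position_eur(assets, liabilities))
```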

Data Quality

Intentional anomalies (~0.5%): A small fraction of rows contain injected data quality issues that mimic real-world problems — gender code flips, temporal inconsistencies, FX amount mismatches, card cloning bursts (rapid foreign purchases from unusual countries), and money-mule transfer chains. These are designed for data quality monitoring and anomaly detection exercises.
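
To illustrate the card-cloning pattern described above, a sliding window over each client's card purchases can flag several foreign purchases in quick succession. This is a simplified sketch and not the generator's own rule. It treats any merchant_country other than EE as foreign instead of modelling which countries are unusual for a given client. The 30-minute window and the 3-purchase threshold are arbitrary assumptions. `cloning_bursts` is a hypothetical helper, and the sample rows are made up.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import datetime, timedelta

HOME_COUNTRY = "EE"  # assumption: Estonia is every customer's home market


def cloning_bursts(
    card_txns: list[dict],
    window: timedelta = timedelta(minutes=30),
    min_count: int = 3,
) -> dict[str, list[str]]:
    """Flag clients with >= min_count foreign card purchases inside one time window.

    card_txns: dicts with txn_id, client_id_fk, txn_type, merchant_country,
    book_date (date or 'YYYY-MM-DD') and book_time ('HH:MM:SS').
    Returns {client_id_fk: [txn_id, ...]} for the first burst found per client.
    """
    foreign = defaultdict(list)
    for t in card_txns:
        if t["txn_type"] == "card_purchase" and t["merchant_country"] != HOME_COUNTRY:
            ts = datetime.fromisoformat(f"{t['book_date']}T{t['book_time']}")
            foreign[t["client_id_fk"]].append((ts, t["txn_id"]))

    flagged = {}
    for client, events in foreign.items():
        events.sort()  # time order
        start = 0
        # Two-pointer sliding window: shrink from the left until it spans <= window.
        for end in range(len(events)):
            while events[end][0] - events[start][0] > window:
                start += 1
            if end - start + 1 >= min_count:
                flagged[client] = [txn_id for _, txn_id in events[start:end + 1]]
                break
    return flagged


burst = [
    {"client_id_fk": "CL0000001", "txn_id": f"TX{i}", "txn_type": "card_purchase",
     "merchant_country": "BR", "book_date": "2024-05-01", "book_time": f"10:0{i}:00"}
    for i in range(3)
]
print(cloning_bursts(burst))
```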

Download the Dataset

8 Parquet files plus the data dictionary. Covers the Private segment only (200K customers, 121M transactions). 2.4 GB compressed.

Download synthetic_banking.tar.gz