Synthetic Banking Dataset
A realistic, full-stack retail banking dataset covering 221,000 customers, 151 million transactions, and 5 product lines across a 24-month window (Jan 2024 – Dec 2025).
Customer Segments
Three distinct segments with fundamentally different transaction patterns, volumes, and product penetration.
| Segment | Customers | Share of customers | Transactions | Avg txn / customer | Avg transaction | Total volume |
| Private | 200,000 | 90.5% | 120.9M | 605 | €190 | €23.0B |
| SME | 20,000 | 9.0% | 22.2M | 1,111 | €818 | €18.2B |
| Corporate | 1,000 | 0.5% | 7.4M | 7,362 | €3,916 | €28.8B |
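The per-segment averages follow from the customer counts, transaction counts, and total volumes. The sketch below recomputes them from the rounded figures published here, so small deviations from the listed averages are expected. The `segments` dictionary and the `segment_averages` helper are illustrative names, not part of the dataset.

```python
# Rounded per-segment figures copied from the segment summary.
segments = {
    "Private":   {"customers": 200_000, "txns": 120.9e6, "volume_eur": 23.0e9},
    "SME":       {"customers": 20_000,  "txns": 22.2e6,  "volume_eur": 18.2e9},
    "Corporate": {"customers": 1_000,   "txns": 7.4e6,   "volume_eur": 28.8e9},
}


def segment_averages(seg):
    """Return (transactions per customer, average EUR per transaction)."""
    per_customer = seg["txns"] / seg["customers"]
    avg_amount = seg["volume_eur"] / seg["txns"]
    return per_customer, avg_amount


for name, seg in segments.items():
    per_customer, avg_amount = segment_averages(seg)
    print(f"{name}: {per_customer:,.1f} txn/customer, EUR {avg_amount:,.1f} avg")
```

Because the inputs are rounded to one decimal place (millions and billions), the recomputed values should land within about 1% of the published averages rather than match them exactly.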
Transaction Overview
Transaction Count by Type
Monthly Transaction Volume (Jan 2024 – Dec 2025)
| Month | Transactions | Volume |
| Jan 2024 | 4.3M | €2.3B |
| Dec 2024 | 6.8M | €3.7B |
| Dec 2025 | 8.0M | €4.3B |
Temporal Patterns
Transactions exhibit realistic time-of-day and day-of-week patterns that mirror actual banking behaviour.
Hourly Distribution (chart: transaction counts by hour of day)
Demographics & Geography
Age Distribution (Private customers)
Regions
Tallinn (TLN): 92,526 (41.9%)
Rural (RUR): 46,737 (21.1%)
Tartu (TRT): 31,024 (14.0%)
Language Preference
Estonian (ET): 135,071 (61.1%)
Russian (RU): 52,602 (23.8%)
English (EN): 33,327 (15.1%)
Household Structure (Private)
Primary (head): 49,906 (25.0%)
Channels & Currency
Transaction Channels
Online banking: 42.1M (27.9%)
Money Flows
Credits vs Debits
| Direction | Volume | Transactions |
| Credits (money in) | €16.7B | 14.4M |
| Debits (money out) | €55.0B | 136.7M |
Amount Distribution (EUR)
| Statistic | Amount |
| 1st | €1.00 |
| 5th | €3.40 |
| 25th | €13.57 |
| Median | €33.89 |
| 75th | €272.64 |
| 90th | €1,200.61 |
| 95th | €2,573.62 |
| 99th | €6,645.19 |
| Mean | €474.44 |
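The mean sits far above the median, which indicates a strongly right-skewed amount distribution. As a rough consistency check, the mean multiplied by the total transaction count should approximate the combined credit and debit volume. The sketch below uses only the rounded figures published on this page; the variable names are illustrative.

```python
# Figures copied from this page (all rounded as published).
mean_eur = 474.44       # mean transaction amount
median_eur = 33.89      # median transaction amount
txn_count = 151.1e6     # total transactions
credits_eur = 16.7e9    # total credits
debits_eur = 55.0e9     # total debits

# Mean x count should be close to total volume (credits + debits).
implied_total = mean_eur * txn_count
reported_total = credits_eur + debits_eur
relative_gap = abs(implied_total - reported_total) / reported_total

# A mean many times larger than the median signals a long right tail.
skew_ratio = mean_eur / median_eur

print(f"implied total:  EUR {implied_total / 1e9:.1f}B")
print(f"reported total: EUR {reported_total / 1e9:.1f}B")
print(f"mean / median:  {skew_ratio:.1f}")
```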
Average Transaction Amount by Type (EUR)
Merchant Geography
Card Purchases by Merchant Country
Product Portfolio
Five product lines with monthly snapshots tracking lifecycle state, balances, and contributions.
Product Penetration (% of all customers)
Property insurance: 73,956 (33.5%)
Life insurance: 45,042 (20.4%)
Leasing / mortgage: 21,781 (9.9%)
Monthly Snapshot Rows
Property insurance: 1,071,805
Margin liabilities: 714,413
Data Dictionary
Column-level reference for all 8 tables. Where a description lists coded values, that list is the complete set of allowed values for the field.
customers — 221,000 rows
| Column | Type | Description |
| client_id | string | Unique customer identifier, e.g. CL0000001. Join key to all other tables via client_id_fk. |
| segment | string | One of: Private, SME, Corporate. |
| type_cd | string | H = natural person; C = legal entity (company). |
| gender_cd | string | M = male; F = female; U = unknown/undisclosed; X = data anomaly. |
| lang_pref | string | ET = Estonian; EN = English; RU = Russian. |
| birth_or_reg_dt | date | Date of birth (Private) or company registration date (SME/Corporate). |
| bank_reg_dt | date | Date the customer registered with the bank. Transactions only exist on or after this date. |
| postal_code | string | 5-digit postal code. |
| region_cd | string | TLN = Tallinn; TRT = Tartu; NRV = Narva; PRN = Pärnu; RUR = Rural; OTH = Other. |
| household_id | string | Household group. Prefix S = single, HH = multi-person household. |
| household_role | string | primary = head; partner = spouse; dependent = child; single = lives alone. |
| group_head_flg | string | 0 or 1. Whether this company heads a corporate group. |
| corp_group_id | string | Corporate group identifier (SME/Corporate only). |
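Since `client_id` is the join key to every other table through `client_id_fk`, a customer attribute such as `segment` can be attached to transactions with a simple lookup. The following is a minimal in-memory sketch using the documented column names; the sample rows are made up for illustration, and `volume_by_segment` is a hypothetical helper, not something shipped with the dataset.

```python
from collections import defaultdict

# Made-up sample rows that use the documented column names.
customers = [
    {"client_id": "CL0000001", "segment": "Private"},
    {"client_id": "CL0000002", "segment": "SME"},
]
transactions = [
    {"txn_id": "TX00000000001", "client_id_fk": "CL0000001", "amount_eur": 25.0},
    {"txn_id": "TX00000000002", "client_id_fk": "CL0000002", "amount_eur": 800.0},
    {"txn_id": "TX00000000003", "client_id_fk": "CL0000001", "amount_eur": 15.0},
]


def volume_by_segment(customers, transactions):
    """Sum amount_eur per customer segment by joining on client_id."""
    segment_of = {c["client_id"]: c["segment"] for c in customers}
    totals = defaultdict(float)
    for txn in transactions:
        # Rows without a matching customer are kept visible, not dropped.
        segment = segment_of.get(txn["client_id_fk"], "UNMATCHED")
        totals[segment] += txn["amount_eur"]
    return dict(totals)


print(volume_by_segment(customers, transactions))
```

For the full 151.1M-row table a dataframe or SQL engine would be the more practical choice; the dictionary lookup here only illustrates the join relationship.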
transactions — 151.1M rows
| Column | Type | Description |
| txn_id | string | Unique transaction ID, e.g. TX00000000001. |
| client_id_fk | string | Account owner. Joins to customers.client_id. |
| acct_iban | string | IBAN of the account being debited or credited. |
| counterparty_iban | string | IBAN of the other party (employer, merchant, recipient). |
| book_date | date | Booking date. |
| book_time | string | Time of day, HH:MM:SS. |
| amount_orig | float | Amount in the original currency. |
| trx_ccy | string | One of: EUR, USD, SEK, NOK, GBP. |
| amount_eur | float | Amount converted to EUR. |
| channel | string | card = POS terminal; online = internet banking; mobile = app; branch = physical; atm. |
| dc_indicator | string | C = credit (money in); D = debit (money out). |
| mcc | string | Merchant Category Code (ISO 18245). Present on card purchases; null for transfers/salary. |
| merchant_country | string | ISO 2-letter country code of the merchant. |
| txn_type | string | Transaction classification (see reference below). |
| purpose_code | string | SEPA purpose code (see reference below). |
| period_yy_mm | string | Year-month, e.g. 2401 = January 2024. |
| day_in_month | int | Day of month (1–31). |
| weekday_name | string | One of: Mon, Tue, Wed, Thu, Fri, Sat, Sun. |
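The columns `period_yy_mm`, `day_in_month`, and `weekday_name` all describe the booking date. The sketch below assumes they are derived from `book_date` (the table does not state this explicitly) and checks a row for agreement, which could help surface the temporal inconsistencies among the injected anomalies. The helper names are hypothetical.

```python
from datetime import date

# Order matches date.weekday(), where Monday is 0.
WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def derived_date_fields(book_date: date) -> dict:
    """Build the three date-derived columns from a booking date."""
    return {
        # Two-digit year followed by two-digit month, e.g. "2401".
        "period_yy_mm": f"{book_date.year % 100:02d}{book_date.month:02d}",
        "day_in_month": book_date.day,
        "weekday_name": WEEKDAYS[book_date.weekday()],
    }


def date_fields_consistent(row: dict) -> bool:
    """True if the stored derived columns agree with book_date."""
    expected = derived_date_fields(row["book_date"])
    return all(row[key] == value for key, value in expected.items())


row = {
    "book_date": date(2024, 1, 15),
    "period_yy_mm": "2401",
    "day_in_month": 15,
    "weekday_name": "Mon",
}
print(date_fields_consistent(row))
```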
Transaction Type Reference
| txn_type | Direction | Description |
| salary | C | Monthly salary credit from employer |
| bonus | C | Year-end (December) bonus |
| payroll_run | D | Employer-side debit paired with salary/bonus |
| tax_withholding | D | Income tax from employer to tax authority |
| social_insurance | D | Social insurance contribution from employer |
| support_income | C | Government benefits, pension, or unemployment |
| card_purchase | D | Point-of-sale or online card payment |
| card_refund | C | Refund of a prior card purchase |
| subscription | D | Recurring subscription (streaming, SaaS, etc.) |
| internal_transfer | C/D | Person-to-person or B2B transfer within the bank |
| business_payment | D | Outgoing business payment (suppliers, invoices) |
| business_receipt | C | Incoming payment from another bank client |
| household_transfer | C/D | Rent share between household members |
| allowance_transfer | C/D | Allowance from household head to dependent |
| utility_bill | D | Recurring utility (energy, gas, water, telecom, municipal) |
| savings_sweep | C | Automated savings deposit |
| loan_servicing | D | Monthly leasing or mortgage repayment |
| cash_withdrawal | D | ATM cash withdrawal |
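The Direction column can be turned into a validation rule: each `txn_type` should only appear with the `dc_indicator` values listed for it. The sketch below encodes the reference table as a dictionary and reports rows that disagree. `ALLOWED_DIRECTIONS` and `direction_violations` are illustrative names, and the sample rows are made up.

```python
# Allowed dc_indicator values per txn_type, copied from the reference table.
ALLOWED_DIRECTIONS = {
    "salary": {"C"},
    "bonus": {"C"},
    "payroll_run": {"D"},
    "tax_withholding": {"D"},
    "social_insurance": {"D"},
    "support_income": {"C"},
    "card_purchase": {"D"},
    "card_refund": {"C"},
    "subscription": {"D"},
    "internal_transfer": {"C", "D"},
    "business_payment": {"D"},
    "business_receipt": {"C"},
    "household_transfer": {"C", "D"},
    "allowance_transfer": {"C", "D"},
    "utility_bill": {"D"},
    "savings_sweep": {"C"},
    "loan_servicing": {"D"},
    "cash_withdrawal": {"D"},
}


def direction_violations(rows):
    """Return (txn_id, problem) pairs for rows that contradict the table."""
    problems = []
    for row in rows:
        allowed = ALLOWED_DIRECTIONS.get(row["txn_type"])
        if allowed is None:
            problems.append((row["txn_id"], "unknown txn_type"))
        elif row["dc_indicator"] not in allowed:
            problems.append((row["txn_id"], "unexpected direction"))
    return problems


sample = [
    {"txn_id": "TX00000000001", "txn_type": "salary", "dc_indicator": "C"},
    {"txn_id": "TX00000000002", "txn_type": "card_purchase", "dc_indicator": "C"},
    {"txn_id": "TX00000000003", "txn_type": "internal_transfer", "dc_indicator": "D"},
]
print(direction_violations(sample))
```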
SEPA Purpose Codes
| Code | Meaning |
| SALA | Salary payment |
| BONU | Bonus payment |
| BENE | Government benefit / social transfer |
| SSBE | Social security / social insurance |
| TAXS | Tax payment |
| GDDS | Purchase of goods or services |
| SUBS | Subscription |
| RENT | Rent payment |
| CHAR | Charity or allowance |
| ELEC | Electricity / utility bill |
| SAVG | Savings transfer |
| LOAR | Loan repayment |
| SUPP | Supplier payment |
| CASH | Cash withdrawal |
| GIFT | Gift / personal transfer |
| FEES | Fee payment |
| OTHR | Other |
leasing_agreements — 141,967 rows
| Column | Type | Description |
| client_id_fk | string | Leaseholder. |
| snapshot_date | date | Month-end snapshot date. |
| lease_status | string | offer = proposed; enactment = signed, disbursing; valid = active; temporary terminated = paused; terminated = ended; completely terminated = fully settled; archived = closed. |
| orig_principal | float | Original loan principal (EUR). |
| outstanding_principal | float | Remaining principal (EUR). |
| carrying_amount | float | Book value on the bank's balance sheet (EUR). |
| lease_ccy | string | One of: EUR, USD, SEK. |
| year / month | string | Partition keys. |
life_insurance — 606,089 rows
| Column | Type | Description |
| client_id_fk | string | Policyholder. |
| snapshot_date | date | Month-end snapshot. |
| policy_status | string | ALG = initial; REGTUD = registered; MAKSEOOT = premium due; JOUS = in-force; KATK = suspended; LOPET = terminated by holder; LOPPKAES = expired; LOPPSURM = death claim; ANULL = annulled. |
| sum_insured | float | Total insured amount (EUR). |
| next_premium | float | Next premium amount (EUR). |
| policy_ccy | string | One of: EUR, USD. |
| year / month | string | Partition keys. |
property_insurance — 1,071,805 rows
| Column | Type | Description |
| client_id_fk | string | Policyholder. |
| snapshot_date | date | Month-end snapshot. |
| agreement_status | string | -1 = cancelled; 0 = inactive; 1 = active; 2 = claim in progress. |
| cover_type_cd | string | E = endowment (savings-linked); P = pure protection. |
| premium_eur | float | Monthly premium (EUR). |
| surrender_value_eur | float | Cash surrender value (EUR). |
| year / month | string | Partition keys. |
pension — 2,966,040 rows
| Column | Type | Description |
| client_id_fk | string | Account holder. |
| contrib_date | date | Contribution month-end. |
| fund_code | string | PEN-A = conservative; PEN-B = balanced; PEN-C = growth; PEN-D = aggressive; PEN-E = index. |
| unit_qty | float | Cumulative fund units held. |
| contrib_status | string | active = regular contributions; stopped = suspended. |
| year / month | string | Partition keys. |
margin_assets — 4,591,506 rows
| Column | Type | Description |
| client_id_fk | string | Account holder. |
| month_end | date | Last day of the month. |
| acct_ccy | string | One of: EUR, USD, SEK, NOK. |
| end_balance_eur | float | End-of-month balance (EUR). |
| avg_balance_eur | float | Average daily balance for the month (EUR). |
| avg_interest_base_eur | float | Interest-bearing balance after activity penalties (EUR). |
| year / month | string | Partition keys. |
margin_liabilities — 714,413 rows
| Column | Type | Description |
| client_id_fk | string | Account holder. |
| month_end | date | Last day of the month. |
| contract_ccy | string | One of: EUR, USD, SEK, NOK. |
| end_balance_eur | float | Outstanding liability at month-end (EUR, positive number). |
| avg_balance_eur | float | Average liability balance (EUR). |
| avg_interest_base_eur | float | Interest-bearing liability base (EUR). |
| year / month | string | Partition keys. |
Data Quality
Intentional anomalies (~0.5%): A small fraction of rows contain injected data quality issues that mimic real-world problems — gender code flips, temporal inconsistencies, FX amount mismatches, card cloning bursts (rapid foreign purchases from unusual countries), and money-mule transfer chains. These are designed for data quality monitoring and anomaly detection exercises.
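Two of these anomaly classes can be screened with simple row-level rules. The sketch below flags transactions booked before the customer's `bank_reg_dt` (the data dictionary states transactions only exist on or after that date) and EUR-denominated rows where `amount_orig` and `amount_eur` differ. The second rule assumes that no conversion is applied to EUR transactions, and the 0.01 tolerance is an arbitrary illustrative choice. `find_anomalies` is a hypothetical helper, not part of the dataset.

```python
from datetime import date


def find_anomalies(txn: dict, bank_reg_dt: date, tolerance: float = 0.01) -> list:
    """Return the names of the simple rule checks that a transaction fails."""
    issues = []

    # Temporal inconsistency: booked before the customer joined the bank.
    if txn["book_date"] < bank_reg_dt:
        issues.append("booked_before_bank_registration")

    # FX mismatch (assumption): EUR rows should carry the same amount twice.
    if txn["trx_ccy"] == "EUR":
        if abs(txn["amount_orig"] - txn["amount_eur"]) > tolerance:
            issues.append("eur_amount_mismatch")

    return issues


# Made-up example row using the documented column names.
suspicious = {
    "txn_id": "TX00000000001",
    "book_date": date(2024, 3, 1),
    "trx_ccy": "EUR",
    "amount_orig": 100.0,
    "amount_eur": 112.5,
}
print(find_anomalies(suspicious, bank_reg_dt=date(2024, 6, 1)))
```

Pattern-based anomalies such as card cloning bursts and money-mule chains span multiple rows, so they would need windowed or graph-style analysis rather than per-row rules like these.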