What is Private AI?
Definition of private AI, on-premise architecture, BezChmury 11B as the 2026 sweet spot.
A pivot from Villa Mamma SEO to AI compliance for Polish businesses. Six months of training on BezChmury 11B v3, two RTX 5090 GPUs, 41,016 training pairs - and a decision to never send a single invoice to the United States.
BezChmury did not start as an idea for an AI startup. It started with questions that began landing at the table where we usually talked about how many chairs fit into a wedding hall.
For several years I ran SEO for Villa Mamma - a wedding venue in Nadarzyn near Warsaw. It was a good business: local, deeply Polish, with very specific clients. Within the family and client circle of Villa Mamma there were a handful of accountants. Towards the end of 2024 they began asking questions that came up more and more often in the same form: "Listen, do you use this ChatGPT thing? Would you give it our full books?" Or: "Can I paste a JPK_V7M into it? My client's data?".
The first GPT-4 tests with Polish invoices delivered three results. First - hallucinations. The model was confident that KSeF had been live since 2024, when in reality mandatory KSeF was rescheduled in stages (large taxpayers from 1 February 2026, smaller entities from 1 April 2026). Second - no knowledge of FA(3), the latest logical structure for the e-invoice, which takes effect in 2026. Third, and worst of all: every question containing a client's NIP was a transfer of personal data to OpenAI's servers in the United States.
Schrems II in 2020 invalidated Privacy Shield. The Data Privacy Framework from 2023 partially replaces it, but not all major providers are confirmed as certified - as of 1 May 2026 the status of OpenAI and Anthropic in the DPF still requires per-provider verification. This means that, in the worst case, a Polish accountant was sending client data they had no legal right to expose to anyone into an American data centre.
In Q4 2024 and Q1 2025 I started looking for Polish language models. I looked at PLLuM (project formally concluded 31 December 2024 - model still available, but the roadmap closed), at Trurl (good, but not for our use cases), and finally at the BezChmury 11B family from SpeakLeash. BezChmury 11B looked like something completely different from the rest: open-weights, Polish tokeniser, Polish team, Polish training data. The decision was not even difficult - the question was not "should we build on Bielik" but "how fast can we start".
The first version of BezChmury was supposed to run on Llama 70B. After two months of prototyping we changed the path entirely.
In April 2026 we made the pivot to BezChmury 11B v3 from SpeakLeash. Concrete facts from the official model cards (accessed 1 May 2026):
BezChmury 11B v3 quantised to Q4_K_M is a file of around 6.5 GB. It fits comfortably on a MacBook M2 with 16 GB of memory. It fits on a PC with an RTX 4060 / 4070. Whatever Llama 70B promised in theory, BezChmury 11B delivers in practice - and it does it in Polish.
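As a sanity check, the ~6.5 GB figure is consistent with a back-of-envelope estimate. Note the ~4.85 bits-per-weight value for Q4_K_M is an approximation drawn from llama.cpp community measurements, not an official spec:

```python
# Rough on-disk size of an 11B-parameter model quantised to Q4_K_M.
# 4.85 bpw is an approximate effective figure (weights + quant scales).
BITS_PER_WEIGHT_Q4_K_M = 4.85

def gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of a quantised GGUF file in gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

size = gguf_size_gb(11e9, BITS_PER_WEIGHT_Q4_K_M)
print(f"~{size:.1f} GB")  # lands in the ~6.5 GB ballpark quoted above
```

The same arithmetic explains why the file also leaves room for the KV cache on a 16 GB machine.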
I deliberately won't put loss curves, eval curves or benchmark percentages here. The reason is simple: for the BezChmury 11B v3 family I have not found a publicly confirmed set of standard benchmarks that would let me honestly compare my fine-tuned version against the baseline. Rather than make up numbers, I describe the methodology. Full reports are available on request under NDA.
BezChmury 11B is base-trained on general Polish. We add very narrow domain knowledge on top: KSeF, ZUS, VAT, JPK, GDPR, the Polish codes. If we hit the model with aggressive SFT without a warm-up, we risk catastrophic forgetting - the model "forgets" basic Polish to make room for the new knowledge.
Stage 1 is a gentle LoRA pass with parameters: LR 1.0-1.2e-5, LoRA r=32, short run. The task: teach the model the "feel" of the BezChmury domain without disturbing the base weights.
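To see why an r=32 LoRA pass is "gentle", compare the adapter's trainable parameters against a full weight update. The 4096x4096 layer shape below is purely illustrative, not the actual model geometry:

```python
# For a d_out x d_in weight matrix, LoRA trains two low-rank factors
# of r*(d_in + d_out) parameters instead of updating all d_in*d_out
# base weights - so the base model is barely touched.
def lora_param_fraction(d_in: int, d_out: int, r: int) -> float:
    full = d_in * d_out            # parameters of the frozen base layer
    adapter = r * (d_in + d_out)   # parameters of the A and B adapters
    return adapter / full

frac = lora_param_fraction(4096, 4096, 32)
print(f"LoRA r=32 trains {frac:.2%} of this layer's parameters")
```

At r=32 the adapter is under 2% of the layer, which is what keeps the base Polish intact during the warm-up.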
Full SFT on 41,016 training pairs. The dataset was built manually and semi-automatically from three sources: official KSeF/ZUS/VAT documentation, our own corpus of accountant questions (Villa Mamma + network), and synthetic augmentations generated by larger models with verification in Polish.
Stage 2 parameters: LR 3.0-3.5e-6, 1 epoch, long sequences (chunking + multi-chunk + hierarchy + map-reduce + explicit uncertainty). The canary A/B experiment: two LRs running simultaneously on two GPUs, 7B Mini and 11B v3 side by side - twice as fast as sequential.
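The chunking step of the long-sequence recipe can be sketched like this. Whitespace tokenisation here is a stand-in; a real pipeline would use the model tokenizer:

```python
# Split a token stream into fixed-size windows with overlap, so no
# fact is cut in half exactly at a chunk boundary.
def chunk(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    assert 0 <= overlap < size
    step = size - overlap
    return [tokens[i:i + size]
            for i in range(0, max(len(tokens) - overlap, 1), step)]

doc = "art 106 ust 1 ustawy o VAT okresla elementy faktury".split()
for c in chunk(doc, size=4, overlap=1):
    print(c)
```

Each window shares `overlap` tokens with its neighbour; the hierarchy and map-reduce steps then summarise the windows back together.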
A short pass with LR 1.0e-6, replaying general Polish knowledge (90% PL / 10% EN). The point: when SFT turns out to be too aggressive, this stage restores some of the base knowledge without breaking the freshly learned skills. Insurance against catastrophic forgetting.
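A minimal sketch of the 90% PL / 10% EN replay mix; the dataset contents are placeholders, not the real corpora:

```python
import random

# Resample a replay buffer so that roughly 90% of the examples are
# general Polish and 10% English, per the stage's data recipe.
def replay_mix(pl: list, en: list, n: int,
               pl_ratio: float = 0.9, seed: int = 0) -> list:
    rng = random.Random(seed)          # fixed seed for reproducible mixes
    n_pl = round(n * pl_ratio)
    return rng.choices(pl, k=n_pl) + rng.choices(en, k=n - n_pl)

batch = replay_mix(["pl_ex"] * 100, ["en_ex"] * 100, n=1000)
print(batch.count("pl_ex"), batch.count("en_ex"))  # 900 100
```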
A 2,000-pair pack split into 5 blocks: (1) IKE/IKZE anti-stale - fixes persistent hallucinations on contribution limits; (2) fresh facts top fails - the most common errors from the eval probe; (3) no-EN - hard-blocks English "leak" in Polish responses; (4) long-context PL grounding - full Polish contexts; (5) retention - a safety net for earlier stages.
Parameters: LR 6e-7 (11B) / LR 8e-7 (Mini), 1 epoch. The reason this stage had to appear at all: earlier Stage 3 DPO attempts caused a regression (Mini EN-leak: 12 to 21, WORSE). DPO teaches preferences; it does not nail down numbers. Microfacts require SFT.
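A toy version of the EN-leak probe behind numbers like "12 to 21" could look like this. The marker word list is illustrative only; a production check would use proper language identification rather than a stop list:

```python
import re

# Common English function words that should never appear in a
# Polish-language answer (illustrative subset).
EN_MARKERS = {"the", "and", "is", "of", "to", "with", "however", "therefore"}

def en_leak_score(answer: str) -> int:
    """Count English marker words leaking into a Polish response."""
    words = re.findall(r"[a-ząćęłńóśźż]+", answer.lower())
    return sum(w in EN_MARKERS for w in words)

clean = "Limit wplat na IKZE w 2026 roku zalezy od formy dzialalnosci."
leaky = "Limit wplat na IKZE is defined by the ustawa and rozporzadzenie."
print(en_leak_score(clean), en_leak_score(leaky))  # 0 3
```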
Direct Preference Optimization, parameters LR 5e-7, beta 0.1. For v3 this is an optional and light stage - SpeakLeash has already done DPO-P (114k pairs) and GRPO (143k pairs) on their side, so we do not duplicate their work. We only add a domain "polish" of around 1.5-3k BezChmury pairs.
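For reference, the DPO objective itself fits in a few lines, which makes clear what beta=0.1 controls. The log-probabilities below are made-up scalars, not real model outputs:

```python
import math

# DPO pushes the policy to prefer the "chosen" answer over the
# "rejected" one *relative to a frozen reference model*; beta scales
# how hard the margin is enforced.
def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    margin = beta * ((logp_chosen - ref_chosen)
                     - (logp_rejected - ref_rejected))
    return -math.log(1 / (1 + math.exp(-margin)))  # -log(sigmoid(margin))

# Policy already prefers the chosen answer more than the reference does,
# so the loss is small:
print(round(dpo_loss(-5.0, -9.0, -6.0, -8.0), 4))
```

Because the loss only shapes a preference margin, it has no mechanism to pin down exact figures - which is exactly why the microfacts stage had to be SFT.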
We train on two RTX 5090s. The chip's Tj Max is 90 degrees Celsius; soft throttle starts around 87-88 degrees Celsius - those are the real numbers, not the 83 degrees of older GPUs. Asymmetric power: 500W on GPU 0 and 550W on GPU 1 (GPU 0 has historically had worse thermals, peaking at 92 degrees Celsius).
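A guard that encodes these thresholds might look like the sketch below. The 87/90 degree thresholds match the numbers above; the 50 W step and the 0.7 hard-cut factor are our own illustrative assumptions:

```python
# Decide a per-GPU power cap from the current temperature:
# below the soft-throttle band keep the configured cap, inside it
# back the cap off, and cut hard once the chip reaches Tj Max.
def power_cap(temp_c: float, configured_w: int,
              soft_c: float = 87.0, tjmax_c: float = 90.0) -> int:
    if temp_c >= tjmax_c:
        return int(configured_w * 0.7)  # hard cut at/above Tj Max
    if temp_c >= soft_c:
        return configured_w - 50        # back off in the soft-throttle band
    return configured_w                 # normal operation

print(power_cap(82, 550), power_cap(88, 550), power_cap(92, 550))
# 550 500 385
```

The asymmetric 500W/550W configuration simply means GPU 0 enters this logic with a lower `configured_w`.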
The cheapest thermal upgrade for the whole setup was not an extra card - it was Vornado #1 as an intake fan. Wedged under the desk as a forced cool-air feed, it cut temperatures by 13 degrees Celsius while letting us raise the power budget by 25%. Plan for summer 2026: a second Vornado plus air conditioning at a 16 degrees Celsius setpoint, for a target of 575W/575W.
Every change in the PC environment is made live, without rebooting the machine. The practical reason: 8 x 32 GB of DDR5 gives plenty of headroom for training, but DDR5 memory training after a reboot can take hours. systemctl reload > restart, every day of the week.
The whole product is built around five principles, which are hard engineering decisions and not just marketing slogans.
AI Act 2024/1689 - Annex III high-risk obligations apply from 2 August 2026. BezChmury in its current form is not classified as high-risk (it is an information assistant, it does not make administrative or judicial decisions on behalf of a person), but we design to the highest Annex III standards anyway - simply with lower operational friction.
Schrems II + DPF 2023 - as I wrote above: the status of OpenAI and Anthropic in the Data Privacy Framework requires per-provider verification. A Polish legal entity processing client data does not have the luxury of waiting - either the data stays in Poland or we fall into a regulatory gap.
Cloud Act 2018 + FISA 702 - a fundamental asymmetry: any United States provider can be compelled by US law to surrender a European customer's data without that customer's knowledge, regardless of European DPF guarantees. This is not a hypothetical risk; it is a structural one.
Polish DPA trend 2024-2026. Decision DKN.5131.3.2025 underlines the obligation of risk analysis before deploying AI. 13 August 2024 - a fine of PLN 1,499,000 against a medical entity. We do not need a crystal ball to forecast that 2026-2027 will see the first Polish DPA decisions concerning accounting firms which routinely send client data to public chatbots. Better to be on the right side of that history.
The roadmap is deliberately conservative. I list only what I can already build today - no "nice to have" without a code commitment behind it.
Four packages go public: KSeF Lite (PLN 1,490), KSeF Private (PLN 4,990), Accounting Private (PLN 9,990), GDPR Pro Bundle (PLN 14,900). Annual Update Packs PLN 990-5,990. Enterprise on-prem from PLN 49,900 (custom scoping under NDA). The Beta programme - 100 seats: 30 days of free KSeF Private trial, 1:1 onboarding with me, influence on Q3 roadmap priorities.
The Bielik-PL-Minitron-7B-v3.0-Instruct variant (33.4% fewer parameters than the 11B) as the base for the mobile edition. Target: older laptops with 8-12 GB of RAM. Use cases: quick lookup outside the office (a tax inspection in the field, a phone call from the lift, a quick KSeF code check on a phone tethered to the laptop).
The plan assumes an upgrade to BezChmury 11B v3.1 once that release is published by SpeakLeash (date unknown; we follow bielik-papers).
Annual update pack for current Pro customers: a new SSoT (VAT changes, new error codes), new weights if the baseline shifts. Roadmap explicitly "NOT committed" - the decision will be research-driven, not date-driven.
ISO 27001 - path planned, with certification targeted for 2027. Enterprise on-prem multi-tenant - deployments for larger entities (chains of law firms, large accounting firms with 10+ staff). EN locale - English-language support in the UI for Polish firms with international clients (now live, you are reading it). All of these are directions, not promises.
If you run an accounting firm, a law firm with a focus on GDPR/healthcare, or a mid-sized medical company - I want to talk to you.
The Beta programme is not a "waiting list". It is 100 seats for people who want a real say in the product before it goes public.
The application takes five minutes.
Fill in the demo form - in the "source" field I will automatically tag blog-launch, so I know you came from this post.
Finally, Polish AI compliance that doesn't send your invoices to the United States. Six months of training, eleven billion parameters, two graphics cards, one Vornado fan under the desk, local model. Let's start together.
SOURCES
All verbatim quotes in the article come from the official sources above. Inline references marked [N] link to this list.
A short KSeF Private demo (15 min). We will show local execution, control questions, source base and how BezChmury reduces the risk of hallucinations.