GDPR and AI in 2026
Schrems II, AI Act, DPIA template. For compliance officers and DPOs.
A practical guide to local AI for professions bound by professional secrecy. GDPR (RODO is the Polish term), Schrems II, the EU-US Data Privacy Framework, on-premise architecture, BezChmury 11B v3, and the real trade-offs of running a model on your own laptop.
Private AI (also called on-premise AI, local AI, or AI without the cloud) is a language model that runs locally on the firm's or user's hardware, without sending questions and documents to external servers. All these terms describe the same architecture: the model, the tokeniser, and the knowledge base sit inside the customer's device.
The contrast is simple. Cloud AI - ChatGPT (OpenAI), Claude (Anthropic), Microsoft Copilot or Google Gemini - runs on the provider's servers. Every question, together with its context (a copied excerpt of a contract, an invoice, a client's data), travels as an HTTPS request to a data centre, most often located in the United States. The answer comes back the same way. In an on-prem model the entire dialogue takes place on your laptop or on the firm's server.
A simple example: Anna, an accountant. Anna asks: "What does KSeF (Poland's National e-Invoice System) error code 440 mean and how do I fix it?" In cloud AI, this question - together with the context (invoice number, client tax ID, fragment of the invoice XML if she pasted it) - is sent to the United States, processed by the model, and returned as an answer. In an on-prem model the same question never leaves the laptop - all 16,000 context tokens plus the model's answer remain in the RAM of her computer.
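To make the contrast concrete, here is a minimal sketch of the on-prem request path, assuming a local runner such as Ollama (mentioned later in this article) is already serving the model; the model tag used here is hypothetical, so substitute whatever your runner actually registers:

```python
# Minimal sketch: querying a model over the loopback interface.
# Assumes an Ollama server running locally; "bezchmury-11b-v3" is a
# hypothetical model tag, not an official name.
import json
import urllib.request

LOCAL_ENDPOINT = "http://127.0.0.1:11434/api/generate"  # loopback only, no WAN

payload = {
    "model": "bezchmury-11b-v3",  # hypothetical local model tag
    "prompt": "What does KSeF error code 440 mean and how do I fix it?",
    "stream": False,              # single JSON response, easier to log
}

req = urllib.request.Request(
    LOCAL_ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# The request never leaves the machine: 127.0.0.1 is the loopback interface,
# so the prompt and any pasted invoice fragment stay in local RAM.
with urllib.request.urlopen(req) as resp:
    answer = json.loads(resp.read())["response"]

print(answer)
```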
This is not just a technical debate. It is a decision about where, physically, your client's data sits at the moment you ask the model for help.
The adoption trend is real. According to the Wolters Kluwer report "AI in accounting and HR", based on a survey of 581 specialists in October-November 2025, more than 80% of respondents already use AI tools in their daily work. The survey covers accounting departments, HR, accounting firms, and tax advisory practices in Poland. The question is no longer "should we use AI" but "which AI does not send my client's data outside Poland".
In the legal sector the proportions are even higher. The Future Ready Lawyer 2026 report by Wolters Kluwer, covering 810 lawyers from the United States, China, and nine European countries (including Poland), reports that 92% of lawyers use at least one AI tool, and 62% save 6-20% of their weekly working time thanks to AI. Country-level data for Poland alone is not broken out in the public summary, but the inclusion of Polish respondents in the sample is confirmed by the official communication. The Polish market has answered with products: LEX Expert AI (Wolters Kluwer), Libra by Wolters Kluwer, Beck-Noxtua (C.H. Beck/Legalis). Each of these tools, however, runs in cloud mode - which leads us straight to the next section.
One terminological nuance is worth flagging. "Private AI" in our usage is not just "an application on my own server". It is a defensive chain of locality: no telemetry by default, no call-home by default, no vendor support-channel access to production data without a separate agreement. A cloud provider may offer "EU data centres" yet still retain administrative access to logs, mandatory push updates, and management servers located outside the EU. On-prem in the compliance sense is an architecture in which vendor access to customer data is technically blocked by default and any support access requires a separately governed path.
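That "technically blocked by default" claim is auditable, not just contractual. A rough sketch, assuming the psutil library and a known backend process id, of how a compliance team might verify that a local AI process holds no connections beyond the loopback interface:

```python
# Rough compliance check for "no call-home by default".
# Assumes psutil is installed; on some OSes listing system-wide
# connections may require elevated privileges.
import ipaddress
import psutil

def non_local_connections(pid: int):
    """Return remote endpoints of `pid` that leave the machine."""
    offenders = []
    for conn in psutil.net_connections(kind="inet"):
        if conn.pid != pid or not conn.raddr:
            continue  # different process, or no remote endpoint (listener)
        remote_ip = ipaddress.ip_address(conn.raddr.ip)
        if not remote_ip.is_loopback:
            offenders.append((conn.raddr.ip, conn.raddr.port, conn.status))
    return offenders

# Example: audit the local AI backend's process (hypothetical pid).
for ip, port, status in non_local_connections(pid=12345):
    print(f"WARNING: outbound connection to {ip}:{port} ({status})")
```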
The SaaS architecture works in many use cases. In professions bound by professional secrecy - accountants, tax advisors, attorneys-at-law (radca prawny), advocates (adwokat), doctors - it introduces, however, four concrete risks that are difficult to address through a contract alone.
Regulation (EU) 2016/679 (GDPR; Polish term: RODO) requires the data controller, in Article 32, to apply "appropriate technical and organisational measures", including in transfers to third countries. UODO (the Polish Data Protection Authority) has consistently shown in its 2024-2026 communications that what the regulator asks for is not buzzwords but a documented risk analysis. In decision DKN.5131.3.2025 the authority required the controller to demonstrate whether it had carried out the risk analysis necessary to assess whether the incident had resulted in a breach of the rights or freedoms of natural persons (orzeczenia.uodo.gov.pl).
The Schrems II judgment in case C-311/18 was handed down on 16 July 2020. The Court of Justice of the European Union upheld Standard Contractual Clauses (SCCs) in principle, but invalidated Privacy Shield and confirmed the "essentially equivalent" protection test for transfers outside the European Economic Area. The practical conclusion: a contract is not enough if the law of the third country allows public authorities excessive access to the data (curia.europa.eu).
"The protection afforded by that mechanism must, in practice, be actionable."
The EU-US Data Privacy Framework entered into force at the level of EU implementing decision 2023/1795 on 10 July 2023. The European Commission found that the United States ensures an adequate level of protection - but only for organisations formally certified under the DPF (eur-lex.europa.eu).
The public DPF list confirms the presence of Google LLC (/participant/5780) and Microsoft Corporation (/participant/6474). However - based on research as of 1 May 2026 - we were not able to confirm official DPF entries for OpenAI or Anthropic. This does not prove the absence of any transfer basis whatsoever, but it means that you should not automatically attribute "DPF-certified" status to them without a fresh, separate check.
The CLOUD Act (2018) and FISA section 702 set out mechanisms by which US national security authorities may obtain data from entities subject to US jurisdiction. From a GDPR perspective the key point is this: the DPF and SCCs regulate the transfer, but they do not switch off US law applicable to the American provider. This is the main argument for on-premise for customers who want to minimise transfer risk and the risk of access by third-country authorities as much as possible.
Two smaller but real risks remain. First, the audit trail in cloud AI is limited. The customer typically does not have full control over query logs, nor over which model versions were used to produce which answer. For a UODO auditor, an inspection by KIRP (the Polish Bar Council of Legal Advisors) or NRA (the Polish Bar Association), or an internal compliance procedure, that means no deterministic audit trail. Second - vendor lock-in. Pricing changes unilaterally, the terms of service change unilaterally, and the model you worked with yesterday may behave differently today after a quiet update on the provider's side.
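A deterministic audit trail is, by contrast, cheap to build on-prem. A minimal sketch using only the Python standard library; the record fields are illustrative, not a BezChmury specification. Each record pins the model version and chains a SHA-256 hash over the previous entry, so later tampering is detectable:

```python
# Sketch of an append-only, hash-chained local audit log (JSONL).
# Field names are illustrative assumptions, not a product schema.
import hashlib
import json
import time
from pathlib import Path

LOG = Path("audit_log.jsonl")

def append_audit_record(question: str, answer: str, model_version: str,
                        sources: list[str]) -> None:
    prev_hash = "0" * 64
    if LOG.exists():
        lines = LOG.read_text(encoding="utf-8").splitlines()
        if lines:
            prev_hash = json.loads(lines[-1])["record_hash"]
    record = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "model_version": model_version,  # e.g. "11B v3, Q4_K_M"
        "question_sha256": hashlib.sha256(question.encode()).hexdigest(),
        "answer_sha256": hashlib.sha256(answer.encode()).hexdigest(),
        "sources": sources,              # cited SSoT fact ids
        "prev_hash": prev_hash,          # chains this record to the last one
    }
    record["record_hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    with LOG.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```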
UODO has signalled these risks in its 2024-2026 decisions. The most prominent publicly confirmed case: a fine of nearly PLN 1.5 million against a medical company on 13 August 2024, after a hacker attack in which "unauthorised persons gained access to the data of patients and employees of the company" (uodo.gov.pl). This is not a case against AI - it is a case against insufficient technical and organisational measures, which apply to any pipeline processing sensitive data.
The operational conclusion. The "EU data centre but US-headquartered provider" model is operationally safer than vanilla SaaS, but it does not deliver as hard a position as a true on-prem deployment based on the customer's own infrastructure. This conclusion follows from the logic of Schrems II and the US legal regime - it is not a literal quotation from the regulation but a compliance interpretation that you will encounter in the communications of UODO, CNIL, and the EDPB.
Local AI is not a magic box. It is three layers of software installed on the customer's hardware: a language model, a tokeniser, and a retrieval layer over the knowledge base (RAG), wrapped in a desktop application.
The local model can be run in a few different ways, depending on hardware and scale. The public repository BezChmury 11B v3 GGUF lists the quantisations Q4_K_M, Q5_K_M, Q6_K and Q8_0 (huggingface.co).
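The quantisation choice is mostly a RAM question. A back-of-the-envelope estimate, using approximate community bits-per-weight figures for GGUF K-quants (these are rough assumptions, not official numbers):

```python
# Rough RAM footprint for an 11B model at each published quantisation.
# Bits-per-weight values are approximate community averages for GGUF
# K-quants -- treat the output as an estimate, not a specification.
PARAMS = 11e9
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q6_K": 6.6, "Q8_0": 8.5}

for quant, bits in BITS_PER_WEIGHT.items():
    gib = PARAMS * bits / 8 / 2**30
    print(f"{quant}: ~{gib:.1f} GiB of weights (plus KV cache and overhead)")
# Roughly ~6 GiB at Q4_K_M up to ~11 GiB at Q8_0 -- which is why Q4_K_M
# is the laptop default.
```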
A full BezChmury-class application has four layers, installed as a single installer (DMG on macOS, EXE on Windows). First the model, BezChmury 11B (local inference, Q4_K_M quantisation for a laptop). Then the APT4 tokeniser, optimised for Polish. The third element is the RAG layer - a local SSoT (Source of Truth) knowledge base; in the case of KSeF Private it contains 630 verified facts. The fourth element is the Electron UI, a chat window that behaves like ChatGPT but communicates only with the local backend on 127.0.0.1.
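The phrase "local backend on 127.0.0.1" has a concrete meaning. A sketch, assuming llama.cpp's llama-server binary and a hypothetical GGUF file name:

```python
# Sketch: starting a llama.cpp server bound to the loopback address.
# The GGUF file name is hypothetical; binding to 127.0.0.1 is what
# keeps the chat UI's traffic on-device.
import subprocess

subprocess.Popen([
    "llama-server",
    "-m", "bezchmury-11b-v3-Q4_K_M.gguf",  # hypothetical local file name
    "--host", "127.0.0.1",                 # loopback only: unreachable from LAN
    "--port", "8080",
    "-c", "16384",                         # 16k context, as in Anna's example
])
# The Electron UI then talks to http://127.0.0.1:8080 and nothing else.
```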
Hard "tokens/s" benchmarks for BezChmury 11B on specific configurations of M2 Pro / M3 Pro / RTX 4060 / RTX 5090 have not yet been published in a single official document. The realistic range on consumer hardware with the 11B Q4 model is 30-60 tokens/s during generation - enough for Q&A scenarios where one answer is 200-400 tokens.
A practical heuristic: llama.cpp and Ollama win on simplicity for local desktops and laptops; vLLM wins only when you move into GPU-server territory, FP8, and higher throughput. This is not absolute truth - it is a decision shortcut for someone choosing a stack for an accounting firm for the first time. A Linux office server with five concurrent users will, in 90% of cases, be served comfortably by a single GPU of the RTX 4090 or RTX 5080 class running Q5_K_M.
There is one more layer that descriptions of "local AI" often miss: RAG as a separate component. RAG (Retrieval-Augmented Generation) is the mechanism in which, before generating an answer, the model searches a local knowledge base (for example BezChmury's 630 SSoT facts, the Ministry of Finance documentation, or the firm's internal policies) and receives concrete passages from those documents as the context for its answer. The practical effect: the model does not "guess" answers from its parameters but cites concrete, verified excerpts. This is the key to a deterministic source citation - without RAG every AI answer is, in fact, a hallucination (more or less accurate).
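A deliberately tiny sketch of the retrieval step, with word-overlap scoring standing in for the embedding search a real deployment would use; the flow is the same, and the facts shown are illustrative, not actual entries from the BezChmury SSoT base:

```python
# Minimal RAG sketch: score facts by word overlap with the question,
# then prepend the best hits to the prompt so the model can cite them.
# Real deployments use embeddings; the facts below are illustrative.
def retrieve(question: str, facts: list[dict], k: int = 3) -> list[dict]:
    q_words = set(question.lower().split())
    scored = sorted(
        facts,
        key=lambda f: len(q_words & set(f["text"].lower().split())),
        reverse=True,
    )
    return scored[:k]

facts = [
    {"id": "ksef-440", "source": "MF manual, part II",
     "text": "KSeF error code 440 means a duplicate invoice was submitted."},
    {"id": "fa3-date", "source": "MF regulation",
     "text": "From 1 February 2026 the FA(3) logical schema is mandatory."},
]

question = "What does KSeF error code 440 mean?"
context = retrieve(question, facts)
prompt = "Answer using ONLY these sources, and cite their ids:\n"
prompt += "\n".join(f"[{f['id']}] {f['text']}" for f in context)
prompt += f"\n\nQuestion: {question}"
print(prompt)  # this prompt is what actually goes to the local model
```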
Local AI has four real trade-offs that there is no point hiding. You buy privacy and control at the cost of certain limitations.
BezChmury 11B has 11 billion parameters. It is not GPT-4 or Claude Opus, which external estimates place at 1-2 trillion parameters. A smaller model means lower scores on general benchmarks like MMLU. On the other hand, in specialist tasks (KSeF Q&A, FA(3) validation, KSeF error code 440 diagnosis) the difference between 11B and 1T disappears, because what decides the outcome is the quality of the knowledge base (RAG), not the raw model size. This is a conscious trade-off: you give up a slice of SOTA in exchange for privacy and Polish-language quality.
Installing local AI in 2024 required Python, CUDA, and command-line skills. In 2026 it is much simpler - Ollama and LM Studio ship with one-click installers. Applications like BezChmury go even further: a single DMG (Mac) or EXE (Windows) file, one click, done. In practice an accounting firm still needs IT support for the first deployment (firewall, permissions, licence distribution), but this is an hour of work rather than a week.
Cloud AI updates itself - overnight, without your knowledge. That is convenient, but it also means that the model you worked with yesterday may behave differently today. The local model is the opposite. You update it manually, when SpeakLeash releases BezChmury 11B v3.1 or v3.2. In the BezChmury model, updates are bundled into an annual Update Pack - a one-off purchase of the application plus an optional yearly package of updates to the SSoT knowledge base and the model itself.
Every LLM has a training data cutoff. After that date the model does not know about events or legal changes "from memory". The solution is not continuous retraining - it is too expensive. Instead, we use RAG: the local SSoT fact base is updated without the need to retrain the model itself. When KSeF changed on 1 February 2026 (FA(3) becoming the mandatory schema), all that was required was to add a few dozen facts to the SSoT - BezChmury 11B did not have to be reworked.
"Do wszystkich faktur ustrukturyzowanych wystawianych od 1 lutego 2026 r.
stosuje się strukturę logiczną FA(3)."
Working translation: "All structured invoices issued from 1 February 2026 use the FA(3)
logical schema."
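This is why the FA(3) change was cheap to absorb: new law lands as new facts in the local SSoT file, and RAG picks them up on the next query. A sketch, with illustrative field names rather than the actual BezChmury schema:

```python
# Sketch: absorbing a legal change as an SSoT fact instead of retraining.
# Field names are illustrative assumptions, not the product schema.
import json

new_fact = {
    "id": "fa3-mandatory",
    "effective_from": "2026-02-01",
    "source": "MF regulation on the FA(3) logical schema",
    "text": ("All structured invoices issued from 1 February 2026 "
             "use the FA(3) logical schema."),
}

with open("ssot_facts.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(new_fact, ensure_ascii=False) + "\n")
# Model weights untouched; only the retrieval layer sees the new fact.
```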
The Polish LLM ecosystem has several projects. For an on-prem deployment in an accounting firm or a law firm, BezChmury 11B v3 strikes the best balance between model size, Polish-language quality, and licensing.
"Bielik-PL-11B-v3.0-Instruct is a generative text model featuring 11 billion parameters."
"...after replacing its tokenizer to the APT4 tokenizer optimized specifically for the Polish language."
"...reduce the model's parameter count by 33.4% (from 11.04B to 7.35B)."
We deliberately do not publish concrete benchmark numbers in this article - for example MT-Bench PL, MMLU PL, or Open LLM Leaderboard PL. The reason is simple: as of 1 May 2026 there is no single official document that consolidates comparable scores for BezChmury 11B vs PLLuM vs Trurl 2 across all these benchmarks. Industry mentions suggest that BezChmury 11B v3 sits in the top tier of Polish models (My Company Polska), but before publishing a hard comparison table you would need to run your own tests with a documented methodology.
For the same reason we do not use the name "Krakowiak" as a model - research as of 1 May 2026 was unable to confirm such a project from a public, credible model card or repository. Until an official source is found, we treat it as an unverified name.
Nor do we publish a specific "+30% efficiency for Polish" figure for the APT4 tokeniser. The model cards mention that the tokeniser is optimised for Polish, but the numerical justification of that delta requires a manual reading of the full PDF (huggingface.co/papers/2601.11579), which we have not yet completed in our research.
Anna runs an accounting firm serving 50 clients. She receives, from one of them, an invoice rejected by KSeF (Poland's National e-Invoice System) - error code 440 "Duplicate invoice". The client phones to ask why the invoice did not go through and how to fix it.
Time to a correct answer: 2-3 minutes. Time for the alternative process (looking through the Ministry of Finance brochure, asking a colleague, calling the help line): 30-45 minutes. These figures are illustrative - a more rigorous market-wide benchmark would require a measured costing of manual KSeF processing, for which the industry description of "an accounting firm with 50 NIPs" mentions 3-4 hours per day without automation (drukarkaksef.pl).
The key in this scenario is not just the speed. The key is the source citation. Anna can hand the client not only the fix but also the basis for it: "Ministry of Finance manual, Part II, the chapter on duplicates". The client knows where the answer comes from. If, a year later, KIRP or UODO ask why the firm advised this particular fix, Anna has the citation in her archive. A chatbot without an audit trail cannot give you that.
The second critical element of the scene: none of the client's data left Anna's device. The invoice number, the client's tax ID, the XML fragment - all of it stayed in the RAM of her laptop, was processed by the local model, and was written to the local audit log. Had Anna used ChatGPT, the same dialogue would have generated an HTTPS request to OpenAI's servers in the United States, with session metadata, an account token, and the full query context. From the perspective of GDPR Article 32 that is a material step - or rather its absence.
On numbers: we do not publish "an average saving of 40 hours per month" as a hard product promise. The figures floating around in industry mentions (3-4 hours per day on manual KSeF logistics for a firm with 50 NIPs, falling to 30-45 minutes with the right tool) come from a single vendor's marketing copy, not from a representative market study. In our hard communication BezChmury stays cautious - we show the mechanism, not magic percentages.
A full description of this scenario is available in the Anna case study.
FAQ
Which model does BezChmury run on? The engine is the open model published at speakleash/Bielik-PL-11B-v3.0-Instruct. Creators: SpeakLeash and ACK Cyfronet AGH. BezChmury as a product is a desktop application (DMG/EXE) built on top of that engine - the model engine is free, the application is sold as a one-off. See BezChmury pricing.
Is the model open-source? The repository speakleash/Bielik-PL-11B-v3.0-Instruct includes weights, configuration, and the tokeniser. Training data is partly open (SpeakLeash GitHub). The BezChmury application itself is closed-source (a code audit is available for Enterprise customers). More: GDPR and AI on-premise.
For more legal context see the article GDPR and AI on-premise - full compliance guide.
If you run an accounting firm, a law firm, or a compliance department, the question is no longer "should we use AI" but "which AI does not send my client's data outside Poland". BezChmury is our answer. The Polish BezChmury 11B v3 model, a local SSoT knowledge base, deterministic citations, full offline operation, a one-off purchase. Your client's data stays where it should - on your hardware.
BETA LIST · 30% DISCOUNT BEFORE LAUNCH
Join the beta list - an exclusive circle of early BezChmury testers. Every two weeks I send a developer diary: what I am building, what is breaking, and what I am deciding.
A short KSeF Private demo (15 minutes). We will show local execution, control questions, the source knowledge base, and how BezChmury reduces the risk of hallucinations.