EDUCATION · AI FUNDAMENTALS

What is Private AI?
Why accountants and lawyers need on-premise AI

A practical guide to local AI for professions bound by professional secrecy. GDPR (RODO is the Polish term), Schrems II, the EU-US Data Privacy Framework, on-premise architecture, BezChmury 11B v3, and the real trade-offs of running a model on your own laptop.

Author: Dominik Witanowski · Published: 1 May 2026 · ~14 min read
SECTION 1

What is Private AI?

Private AI (also called on-premise AI, local AI, or AI without the cloud) is a language model that runs locally on the firm's or user's hardware, without sending questions and documents to external servers. All these terms describe the same architecture: the model, the tokeniser, and the knowledge base sit inside the customer's device.

The contrast is simple. Cloud AI - ChatGPT (OpenAI), Claude (Anthropic), Microsoft Copilot or Google Gemini - runs on the provider's servers. Every question, together with its context (a copied excerpt of a contract, an invoice, a client's data), travels as an HTTPS request to a data centre, most often located in the United States. The answer comes back the same way. In an on-prem model the entire dialogue takes place on your laptop or on the firm's server.

A simple example: Anna, an accountant. Anna asks: "What does KSeF (Poland's National e-Invoice System) error code 440 mean and how do I fix it?" In cloud AI, this question - together with the context (invoice number, client tax ID, fragment of the invoice XML if she pasted it) - is sent to the United States, processed by the model, and returned as an answer. In an on-prem model the same question never leaves the laptop - all 16,000 context tokens plus the model's answer remain in the RAM of her computer.

This is not just a technical debate. It is a decision about where, physically, your client's data sits at the moment you ask the model for help.

The adoption trend is real. According to the Wolters Kluwer report "AI in accounting and HR", based on a survey of 581 specialists in October-November 2025, more than 80% of respondents already use AI tools in their daily work. The survey covers accounting departments, HR, accounting firms, and tax advisory practices in Poland. The question is no longer "should we use AI" but "which AI does not send my client's data outside Poland".

In the legal sector the proportions are even higher. The Future Ready Lawyer 2026 report by Wolters Kluwer, covering 810 lawyers from the United States, China, and nine European countries (including Poland), reports that 92% of lawyers use at least one AI tool, and 62% save 6-20% of their weekly working time thanks to AI. Country-level data for Poland alone is not broken out in the public summary, but the inclusion of Polish respondents in the sample is confirmed by the official communication. The Polish market has answered with products: LEX Expert AI (Wolters Kluwer), Libra by Wolters Kluwer, Beck-Noxtua (C.H. Beck/Legalis). Each of these tools, however, runs in cloud mode - which leads us straight to the next section.

One terminological nuance is worth flagging. "Private AI" in our usage is not just "an application on my own server". It is a defensive chain of locality: no telemetry by default, no call-home by default, no vendor support-channel access to production data without a separate agreement. A cloud provider may offer "EU data centres" yet still retain administrative access to logs, mandatory push updates, and management servers located outside the EU. On-prem in the compliance sense is an architecture in which vendor access to customer data is technically blocked by default and any support access requires a separately governed path.
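A practical way to spot-check that locality chain on your own machine is to list which processes hold listening network sockets and on which interfaces. A minimal sketch using the psutil library - the process names are whatever your setup runs; a loopback-only binding is the property you want to see:

```python
# Sketch: list listening sockets and flag anything bound beyond loopback.
# Requires: pip install psutil. On macOS, resolving other processes' PIDs
# may need elevated privileges.
import psutil

for conn in psutil.net_connections(kind="inet"):
    if conn.status != psutil.CONN_LISTEN or not conn.laddr:
        continue
    name = psutil.Process(conn.pid).name() if conn.pid else "?"
    local_only = conn.laddr.ip in ("127.0.0.1", "::1")
    flag = "OK (loopback)" if local_only else "CHECK (exposed)"
    print(f"{flag}: {name} on {conn.laddr.ip}:{conn.laddr.port}")
```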

SECTION 2

Why cloud AI doesn't fit accountants and lawyers

The SaaS architecture works in many use cases. For professions bound by professional secrecy, however - accountants, tax advisors, attorneys-at-law (radca prawny), advocates (adwokat), doctors - it introduces four concrete risks that a contract alone cannot fully address.

GDPR Article 32 and the duty of "appropriate measures"

Regulation (EU) 2016/679 (GDPR; Polish term: RODO) requires the data controller, in Article 32, to apply "appropriate technical and organisational measures" - a duty that does not disappear when processing involves transfers to third countries. UODO (the Polish Data Protection Authority) has consistently shown in its 2024-2026 communications that what the regulator asks for is not buzzwords but a documented risk analysis. In decision DKN.5131.3.2025 the authority required the controller to demonstrate that it had carried out the risk analysis needed to assess whether the incident had resulted in a breach of the rights or freedoms of natural persons (orzeczenia.uodo.gov.pl).

Schrems II: SCCs alone are not enough

The Schrems II judgment in case C-311/18 was handed down on 16 July 2020. The Court of Justice of the European Union upheld Standard Contractual Clauses (SCCs) in principle, but invalidated Privacy Shield and confirmed the "essentially equivalent" protection test for transfers outside the European Economic Area. The practical conclusion: a contract is not enough if the law of the third country allows public authorities excessive access to the data (curia.europa.eu).

"The protection afforded by that mechanism must, in practice, be actionable."

Source: case C-311/18, paragraph 184, CURIA.

DPF 2023: who is NOT on the list

The EU-US Data Privacy Framework took effect with Commission Implementing Decision (EU) 2023/1795 of 10 July 2023. The European Commission found that the United States ensures an adequate level of protection - but only for organisations formally certified under the DPF (eur-lex.europa.eu).

The public DPF list confirms the presence of Google LLC (/participant/5780) and Microsoft Corporation (/participant/6474). However - based on research as of 1 May 2026 - we were not able to confirm official DPF entries for OpenAI or Anthropic. This does not prove the absence of any transfer basis whatsoever, but it means that you should not automatically attribute "DPF-certified" status to them without a fresh, separate check.

CLOUD Act and FISA 702: US law still applies

The CLOUD Act (2018) and FISA section 702 set out mechanisms by which US national security authorities may obtain data from entities subject to US jurisdiction. From a GDPR perspective the key point is this: the DPF and SCCs regulate the transfer, but they do not switch off US law applicable to the American provider. This is the main argument for on-premise for customers who want to minimise transfer risk and the risk of access by third-country authorities as much as possible.

Audit trail and vendor lock-in

Two smaller but real risks remain. First, the audit trail in cloud AI is limited. The customer typically does not have full control over query logs, nor over which model versions were used to produce which answer. For an UODO auditor, an inspection by KIRP (the Polish Bar Council of Legal Advisors) or NRA (the Polish Bar Association), or an internal compliance procedure, that means no deterministic audit trail. Second - vendor lock-in. Pricing changes unilaterally, the terms of service change unilaterally, and the model you worked with yesterday may behave differently today after a quiet update on the provider's side.

UODO has signalled these risks in its 2024-2026 decisions. The most prominent publicly confirmed case: a fine of nearly PLN 1.5 million against a medical company on 13 August 2024, after a hacker attack in which "unauthorised persons gained access to the data of patients and employees of the company" (uodo.gov.pl). This is not a case against AI - it is a case against insufficient technical and organisational measures, which apply to any pipeline processing sensitive data.

The operational conclusion. The "EU data centre but US-headquartered provider" model is operationally safer than vanilla SaaS, but it does not deliver as hard a position as a true on-prem deployment based on the customer's own infrastructure. This conclusion follows from the logic of Schrems II and the US legal regime - it is not a literal quotation from the regulation but a compliance interpretation that you will encounter in the communications of UODO, CNIL, and the EDPB.

SECTION 3

On-premise architecture - how it works

Local AI is not a magic box. It is three layers of software installed on the customer's hardware: a language model, a tokeniser, and a retrieval layer over the knowledge base (RAG), wrapped in a desktop application.

[Diagram: on-premise local AI. Three blocks connected by arrows inside a "YOUR DEVICE · NO INTERNET" frame: Anna's question ("Code 440? How to fix?") on the left, the local BezChmury 11B + RAG engine with the 630-fact SSoT base (KSeF/VAT/ZUS/GDPR) in the middle, and the answer with a citation (Source: KSeF 2.0 Manual) on the right.]
The diagram simplifies the real pipeline. Every arrow stays inside the customer's device.

Three deployment patterns

The local model can be run in a few different ways, depending on hardware and scale. The public repository BezChmury 11B v3 GGUF lists the quantisations Q4_K_M, Q5_K_M, Q6_K and Q8_0 (huggingface.co).

  • GGUF (llama.cpp / Ollama / LM Studio): the simplest desktop path. The BezChmury 11B Q4_K_M file is roughly 6 GB and runs comfortably on laptops with 16-32 GB RAM (a minimal loading sketch follows after this list).
  • MLX (Apple Silicon): M1 / M2 / M3 variants with unified memory. Public model cards for Q4 MLX show a file of about 5.9 GB and a peak memory footprint of about 6.4 GB; the Q8 MLX variant is roughly an 11 GB file with about 11.9 GB peak memory on Apple Silicon (LibraxisAI/Bielik-PL-11B-v3.0-Instruct-mlx-q4).
  • FP8 + vLLM: the server path for multi-user deployments. The official FP8-Dynamic card requires vLLM >= 0.5.0 or SGLang and a GPU with compute capability 8.9 or higher (Ada Lovelace / Hopper architectures) (huggingface.co).
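To make the GGUF path concrete, here is a minimal loading sketch using the llama-cpp-python bindings. The .gguf filename is an assumption based on the repository naming above - verify the exact file name in the BezChmury 11B v3 GGUF repo before downloading:

```python
# Minimal GGUF sketch with llama-cpp-python (pip install llama-cpp-python).
# The model filename below is an assumption -- check the repo's file list.
from llama_cpp import Llama

llm = Llama(
    model_path="Bielik-PL-11B-v3.0-Instruct.Q4_K_M.gguf",  # ~6 GB file
    n_ctx=16384,      # 16k-token context window, as in Anna's example
    n_gpu_layers=-1,  # offload all layers to GPU/Metal when available
)

answer = llm.create_chat_completion(
    messages=[{"role": "user",
               "content": "Co oznacza kod błędu KSeF 440 i jak go naprawić?"}],
    max_tokens=400,   # typical Q&A answers run 200-400 tokens
)
print(answer["choices"][0]["message"]["content"])
```

The same file loads unchanged in Ollama or LM Studio; llama-cpp-python is simply the scriptable route.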

Stack components

A full BezChmury-class application has four layers, shipped as a single installer (DMG on macOS, EXE on Windows). First, the model: BezChmury 11B (local inference, Q4_K_M quantisation for a laptop). Second, the APT4 tokeniser, optimised for Polish. Third, the RAG layer: a local SSoT (Source of Truth) knowledge base - in the case of KSeF Private it contains 630 verified facts. Fourth, the Electron UI: a chat window that behaves like ChatGPT but communicates only with the local backend on 127.0.0.1.
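The loopback address is the architecturally important detail: the UI talks to a backend reachable only from the same machine. The endpoint path, port, and payload below are illustrative assumptions - BezChmury's internal API is not published - but the 127.0.0.1 binding is the point:

```python
# Illustrative only: a chat request that never leaves the machine.
# Endpoint path, port, and payload shape are assumptions for this sketch.
import requests

resp = requests.post(
    "http://127.0.0.1:8080/v1/chat",  # loopback interface -- no network egress
    json={"question": "Co oznacza kod 440 w KSeF?"},
    timeout=120,
)
print(resp.json())
```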

Hardware requirements

  • Accountant's laptop: a MacBook Pro M2 / M3 with 16+ GB unified memory or a PC with an RTX 3060 / 4060 (12 GB VRAM). Enough for one user with the 11B Q4 model.
  • Office desktop workstation: 32 GB RAM plus a 16-24 GB GPU (RTX 4080 / 5080). Margin for larger contexts and parallel processes.
  • Multi-user firm server: 64+ GB RAM, a data-centre-class GPU (A100, H100, RTX 5090), and vLLM as the serving layer. Recommended for five or more concurrent users.

Hard "tokens/s" benchmarks for BezChmury 11B on specific configurations of M2 Pro / M3 Pro / RTX 4060 / RTX 5090 have not yet been published in a single official document. The realistic range on consumer hardware with the 11B Q4 model is 30-60 tokens/s during generation; at those speeds a typical 200-400-token answer streams in roughly 4-13 seconds - enough for Q&A scenarios.

A practical heuristic: llama.cpp and Ollama win on simplicity for local desktops and laptops; vLLM wins only when you move into GPU server territory, FP8, and higher throughput. This is not absolute truth - it is a decision shortcut for someone choosing a stack for an accounting firm for the first time. In 90% of cases, a single RTX 4090- or RTX 5080-class GPU running Q5_K_M will handle a Linux office server with five concurrent users.
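For the server path, a minimal vLLM sketch looks like the following. The FP8 model id is an assumption mirroring the FP8-Dynamic card mentioned above - verify the exact repository name on Hugging Face before deploying:

```python
# Server-path sketch with vLLM (pip install vllm); FP8 assumes a GPU with
# compute capability 8.9 or higher. The model id below is an assumption.
from vllm import LLM, SamplingParams

llm = LLM(model="speakleash/Bielik-PL-11B-v3.0-Instruct-FP8-Dynamic")
params = SamplingParams(temperature=0.2, max_tokens=400)

outputs = llm.generate(["Co oznacza kod błędu KSeF 440 i jak go naprawić?"],
                       params)
print(outputs[0].outputs[0].text)
```

In a real multi-user deployment you would more likely run vLLM's OpenAI-compatible server (vllm serve ...) on the firm's internal network rather than script it directly.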

There is one more layer that descriptions of "local AI" often miss: RAG as a separate component. RAG (Retrieval-Augmented Generation) is the mechanism in which, before generating an answer, the model searches a local knowledge base (for example BezChmury's 630 SSoT facts, the Ministry of Finance documentation, or the firm's internal policies) and receives concrete passages from those documents as the context for its answer. The practical effect: the model does not "guess" answers from its parameters but cites concrete, verified excerpts. This is the key to a deterministic source citation - without RAG every AI answer is, in fact, a hallucination (more or less accurate).
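Stripped to its essence, the retrieval step fits in a dozen lines. The two facts and the embedding model below are illustrative stand-ins - BezChmury's actual SSoT format and retriever are not published:

```python
# RAG in miniature: retrieve the best-matching local fact, then prepend it
# to the prompt. Facts and embedding model are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

facts = [
    "Kod 440: duplikat faktury wykrywany po NIP sprzedawcy, numerze P_2 i typie.",
    "Od 1 lutego 2026 r. obowiązuje struktura logiczna FA(3).",
]

embedder = SentenceTransformer("intfloat/multilingual-e5-small")
fact_vecs = embedder.encode(facts, convert_to_tensor=True)

question = "Co oznacza kod 440 w KSeF?"
q_vec = embedder.encode(question, convert_to_tensor=True)

best = util.cos_sim(q_vec, fact_vecs).argmax().item()  # highest cosine match
prompt = (f"Kontekst: {facts[best]}\n\n"
          f"Pytanie: {question}\nOdpowiedz, cytując źródło.")
print(prompt)  # this prompt goes to the local BezChmury 11B instance
```

Because the retrieved passage travels inside the prompt, the model can quote it verbatim - that is where the deterministic citation comes from.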

SECTION 4

Trade-offs of the local model

Local AI has four real trade-offs that there is no point hiding. You buy privacy and control at the cost of certain limitations.

Model size vs SOTA

BezChmury 11B has 11 billion parameters. It is not GPT-4 or Claude Opus, which are estimated at 1-2 trillion parameters. A smaller model means lower scores on general benchmarks like MMLU. On the other hand, in specialist tasks (KSeF Q&A, FA(3) validation, KSeF error code 440 diagnosis) the difference between 11B and 1T disappears, because what decides the outcome is the quality of the knowledge base (RAG), not raw model size. This is a conscious trade-off: you give up a slice of SOTA in exchange for privacy and Polish-language quality.

Learning curve

Installing local AI in 2024 required Python, CUDA, and command-line skills. In 2026 it is much simpler - Ollama and LM Studio ship with one-click installers. Applications like BezChmury go even further: a single DMG (Mac) or EXE (Windows) file, one click, done. In practice an accounting firm still needs IT support for the first deployment (firewall, permissions, licence distribution), but this is an hour of work rather than a week.

Update cycle

Cloud AI updates itself - overnight, without your knowledge. That is convenient, but it also means that the model you worked with yesterday may behave differently today. The local model is the opposite. You update it manually, when SpeakLeash releases BezChmury 11B v3.1 or v3.2. In the BezChmury model, updates are bundled into an annual Update Pack - a one-off purchase of the application plus an optional yearly package of updates to the SSoT knowledge base and the model itself.

Training data cutoff

Every LLM has a training data cutoff. After that date the model does not know about events or legal changes "from memory". The solution is not continuous retraining - it is too expensive. Instead, we use RAG: the local SSoT fact base is updated without the need to retrain the model itself. When KSeF changed on 1 February 2026 (FA(3) becoming the mandatory schema), all that was required was to add a few dozen facts to the SSoT - BezChmury 11B did not have to be reworked.

"Do wszystkich faktur ustrukturyzowanych wystawianych od 1 lutego 2026 r. stosuje się strukturę logiczną FA(3)."

Working translation: "All structured invoices issued from 1 February 2026 use the FA(3) logical schema."
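Mechanically, such an update is nothing more than new records appended to the local fact base - no model weights are touched. A sketch under an assumed JSONL layout (the real SSoT format is not published):

```python
# Sketch: appending a new fact to a local SSoT base after a legal change.
# The JSONL layout and field names are assumptions for illustration.
import json

new_fact = {
    "id": "ksef-fa3-2026-02-01",
    "text": "Od 1 lutego 2026 r. do faktur ustrukturyzowanych stosuje się FA(3).",
    "source": "Ministry of Finance brochure, FA(3)",
}

with open("ssot_facts.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(new_fact, ensure_ascii=False) + "\n")
```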

SECTION 5

BezChmury 11B v3 as the sweet spot

The Polish LLM ecosystem has several projects. For an on-prem deployment in an accounting firm or a law firm, BezChmury 11B v3 strikes the best balance between model size, Polish-language quality, and licensing.

What we can confirm publicly

  • Apache-2.0 licence - full openness, commercial use permitted (huggingface.co).
  • 11 billion parameters - the model card simply says "11B", not "11.2B".
  • Base model: Mistral-7B-v0.2, scaled to 11 billion parameters - the variant Bielik-11B-v3-Base-20250730.
  • Creators: SpeakLeash in cooperation with ACK Cyfronet AGH, on the PLGrid infrastructure (the Athena and Helios supercomputers).
  • Tokeniser: APT4, optimised for Polish. The model cards explicitly mention the replacement of the previous tokeniser with one optimised specifically for the Polish language.
  • Documentation repository: bielik-papers on GitHub contains materials for the v1, v2, v3, v3_minitron, and v3_small versions (github.com/speakleash/bielik-papers).

"Bielik-PL-11B-v3.0-Instruct is a generative text model featuring 11 billion parameters."

"...after replacing its tokenizer to the APT4 tokenizer optimized specifically for the Polish language."

The v3 family as of 1 May 2026

  • BezChmury 11B v3.0 Instruct - full model, instruct-tuned, Q4_K_M file roughly 6 GB.
  • Bielik-PL-Minitron-7B-v3.0-Instruct - compressed via Minitron (an NVIDIA technique), 7.35 billion parameters (a reduction from 11.04B to 7.35B, i.e. 33.4%) (huggingface.co).

"...reduce the model's parameter count by 33.4% (from 11.04B to 7.35B)."

Source: BezChmury 11B Minitron model card, Hugging Face.

What we do not promise

We deliberately do not publish concrete benchmark numbers in this article - for example MT-Bench PL, MMLU PL, or Open LLM Leaderboard PL. The reason is simple: as of 1 May 2026 there is no single official document that consolidates comparable scores for BezChmury 11B vs PLLuM vs Trurl 2 across all these benchmarks. Industry mentions suggest that BezChmury 11B v3 sits in the top tier of Polish models (My Company Polska), but before publishing a hard comparison table you would need to run your own tests with a documented methodology.

For the same reason we do not use the name "Krakowiak" as a model - research as of 1 May 2026 was unable to confirm such a project from a public, credible model card or repository. Until an official source is found, we treat it as an unverified name.

Nor do we publish a specific "+30% efficiency for Polish" figure for the APT4 tokeniser. The model cards mention that the tokeniser is optimised for Polish, but the numerical justification of that delta requires a manual reading of the full PDF (huggingface.co/papers/2601.11579), which we have not yet completed in our research.

SECTION 6 · CASE

Practical example - Anna analyses an invoice

Illustrative persona. A scenario typical of accounting firms in Poland in 2026, NOT a real measurement of a single client. Times and steps are based on the industry description of "an accounting firm with 50 tax IDs (NIPs)" and the FA(3) rules from the Ministry of Finance brochure.

Anna runs an accounting firm serving 50 clients. She receives, from one of them, an invoice rejected by KSeF (Poland's National e-Invoice System) - error code 440 "Duplicate invoice". The client phones to ask why the invoice did not go through and how to fix it.

Workflow with local AI (BezChmury)

  1. Anna opens BezChmury - the desktop application that runs without an internet connection. The whole chat session stays on her laptop.
  2. She asks: "What does KSeF code 440 mean and how do I fix it?" The query classifier recognises this as an error code lookup and routes it to the relevant SSoT collection.
  3. The local engine (BezChmury 11B + RAG) generates an answer with a deterministic citation: "code 440 is a duplicate invoice detected by KSeF based on the seller's tax ID, the invoice number (P_2), and the invoice type; uniqueness is checked 10 years back". Source citation: KSeF 2.0 Manual, Part II.
  4. Anna receives a concrete fix: "apply an idempotency key on tax ID + P_2 + XML hash and check whether the client's archive contains an invoice with the same number from the previous 10 years".
  5. The audit log is saved locally: question, SSoT source, timestamp, query hash - ready for later replay and inspection.
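Steps 4 and 5 both reduce to a few lines of code. The sketch below shows the idempotency key from step 4 (seller tax ID + P_2 + XML hash) and the local audit record from step 5; field names and file layout are illustrative assumptions, not BezChmury's actual schemas:

```python
# Step 4: idempotency key against KSeF error 440 (duplicate invoice).
# Step 5: local, append-only audit record. Schemas are assumptions.
import hashlib
import json
import time

def idempotency_key(nip: str, p_2: str, invoice_xml: bytes) -> str:
    # seller tax ID + invoice number (P_2) + hash of the invoice XML
    return f"{nip}:{p_2}:{hashlib.sha256(invoice_xml).hexdigest()}"

def log_answer(question: str, ssot_source: str,
               path: str = "audit.jsonl") -> None:
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "question": question,
        "ssot_source": ssot_source,
        "query_hash": hashlib.sha256(question.encode("utf-8")).hexdigest(),
    }
    with open(path, "a", encoding="utf-8") as f:  # stays on Anna's laptop
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

log_answer("Co oznacza kod 440 w KSeF?", "KSeF 2.0 Manual, Part II")
```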

Time to a correct answer: 2-3 minutes. Time for the alternative process (looking through the Ministry of Finance brochure, asking a colleague, calling the help line): 30-45 minutes. These figures are illustrative - a more rigorous market-wide benchmark would require measured timings of manual KSeF processing, for which the industry description of "an accounting firm with 50 NIPs" mentions 3-4 hours per day without automation (drukarkaksef.pl).

The key in this scenario is not just the speed. The key is the source citation. Anna can hand the client not only the fix but also the basis for it: "Ministry of Finance manual, Part II, the chapter on duplicates". The client knows where the answer comes from. If, a year later, KIRP or UODO ask why the firm advised this particular fix, Anna has the citation in her archive. A chatbot without an audit trail cannot give you that.

The second critical element of the scene: none of the client's data left Anna's device. The invoice number, the client's tax ID, the XML fragment - all of it stayed in the RAM of her laptop, was processed by the local model, and was written to the local audit log. Had Anna used ChatGPT, the same dialogue would have generated an HTTPS request to OpenAI's servers in the United States, with session metadata, an account token, and the full query context. From the perspective of GDPR Article 32 that is a material step - or rather its absence.

On numbers: we do not publish "an average saving of 40 hours per month" as a hard product promise. The figures floating around in industry mentions (3-4 hours per day on manual KSeF logistics for a firm with 50 NIPs, falling to 30-45 minutes with the right tool) come from a single vendor's marketing copy, not from a representative market study. In our hard communication BezChmury stays cautious - we show the mechanism, not magic percentages.

A full description of this scenario is available in the Anna case study.

FAQ

Frequently asked questions

What is Private AI?
An AI model running locally on the firm's hardware, without sending data to external servers. Antonym: cloud AI (ChatGPT, Claude). Recommended for accountants and lawyers because of GDPR (RODO is the Polish term) and Schrems II. Sweet spot: BezChmury 11B v3 Apache-2.0. More: On-premise architecture.
How does on-premise AI differ from cloud AI?
On-premise = local model, data stays on the user's computer. Cloud = every prompt is sent to the provider's servers (typically in the United States). Consequence: cloud AI requires a DPIA plus a transfer impact assessment for the US (Schrems II). On-premise removes the transfer leg entirely - and with it most of that assessment burden. Full context: GDPR and AI in 2026.
Is private AI legal for an accounting firm?
Yes. On-premise private AI does not transfer data to external entities, so no transfer assessment under GDPR is required. A DPIA under Article 35 GDPR is still needed (but shorter than for cloud). Trigger event: a UODO (Polish DPA) fine of PLN 1,499,000 against a medical company on 13 August 2024. See 7-step DPIA template.
How much does private AI cost for a small firm?
BezChmury KSeF Lite from PLN 1,490 as a one-off purchase (1 seat, 12 months of updates). Higher tiers: KSeF Private PLN 4,990, Accounting Private PLN 9,990, Pro Bundle PLN 14,900, Enterprise from PLN 49,900. No subscription. Full pricing.
What hardware does BezChmury 11B require?
Q4_K_M quantisation: roughly a 6 GB file, a minimum of 16 GB RAM (MacBook M2/M3 with 16 GB or a PC with an RTX 3060+ and 12 GB VRAM). For multi-user setups: a firm server with 64+ GB RAM and an RTX 5090 or A100. Throughput: 30-60 tokens/s on consumer GPUs. More in BezChmury architecture.
Is BezChmury 11B free?
Yes - the model is Apache-2.0 open-source, available on Hugging Face (speakleash/Bielik-PL-11B-v3.0-Instruct). Creators: SpeakLeash and ACK Cyfronet AGH. BezChmury as a product is a desktop application (DMG/EXE) built on top of BezChmury 11B - the model engine is free, the application is sold as a one-off. See BezChmury pricing.
BezChmury 11B vs ChatGPT - which is better?
Different categories. ChatGPT (GPT-4) is a state-of-the-art general-purpose model, cloud-only, with a US transfer. BezChmury 11B is Polish-language-optimised (APT4 tokeniser), open-source under Apache-2.0, and runs locally. For Polish compliance (KSeF, ZUS, GDPR/RODO) BezChmury 11B wins on control and absence of transfer risk. For broad general problem-solving ChatGPT has a larger model. See our cloud vs on-prem analysis.
What is the Apache-2.0 licence?
An open-source licence permitting commercial use, modification, and distribution without restrictions of the kind imposed by Llama (Meta CCA requires a separate licence for monthly active users above 700M). BezChmury 11B on Apache-2.0 means you can fine-tune, deploy, and sell products without the creators' consent. Full text: apache.org. More in the BezChmury 11B sweet spot section.
Does on-premise AI require an internet connection?
After installation - no. The BezChmury application plus BezChmury 11B runs offline. Internet is required only for: (1) updating the SSoT knowledge base (once a quarter), (2) updating BezChmury 11B (when v3.1+ is released). Audit logs and answers stay local. Trust signal: "works offline after installation". See BezChmury KSeF Private architecture.
How long does BezChmury installation take?
Phase 1 (DMG/EXE download): about 5 minutes. Phase 2 (installation plus first model run): 10-15 minutes on a MacBook M2 16 GB. Phase 3 (integration with the accounting system): 1-3 working days with BezChmury support (Pro+ tier). Phase 4 (team training): 2-3 hours. Full plan in Anna's case study.
Can I view the BezChmury 11B source code?
Yes. BezChmury 11B v3 is open-source under Apache-2.0 on Hugging Face: speakleash/Bielik-PL-11B-v3.0-Instruct. The repository includes the weights, configuration, and tokeniser. Training data is partly open (SpeakLeash GitHub). The BezChmury application itself is closed-source (a code audit is available for Enterprise customers). More: GDPR and AI on-premise.
Will there be a BezChmury Mobile?
Q3 2026 plan: BezChmury Mobile based on Bielik-PL-Minitron-7B-v3.0 (7.35B parameters, a 33.4% reduction vs the 11B version). Target: older laptops with 8-12 GB RAM, quick lookup outside the office. Stage: research, no committed release date. Today: BezChmury desktop runs on a 16 GB RAM minimum. See current packages.

For more legal context see the article GDPR and AI on-premise - full compliance guide.

SUMMARY

Local AI is not hype. It is a compliance architecture.

If you run an accounting firm, a law firm, or a compliance department, the question is no longer "should we use AI" but "which AI does not send my client's data outside Poland". BezChmury is our answer. The Polish BezChmury 11B v3 model, a local SSoT knowledge base, deterministic citations, full offline operation, a one-off purchase. Your client's data stays where it should - on your hardware.

Book a demo (15 min) · See KSeF Private → · Check pricing →
Dominik Witanowski

Building BezChmury since 2024. 10 years in IT, ex-SEO Villa Mamma, author of the KSeF Private pipeline with 147/150 PASS on an internal probe of 150 test cases (a BezChmury vendor metric, NOT an official MF benchmark).

BETA LIST · 30% DISCOUNT BEFORE LAUNCH

Be the first when we launch in Q3 2026

Join the beta list - an exclusive circle of early BezChmury testers. Every 2 weeks I send a developer diary: what I am building, what is breaking, what I am deciding.

GLOSSARY

Glossary of terms used in this article

On-premise AI
Artificial intelligence running locally on the firm's or user's own hardware, without sending data to external servers. Antonym: cloud AI.
BezChmury 11B v3
Polish open-source large language model, 11 billion parameters, base Mistral-7B-v0.2 scaled, created by SpeakLeash and ACK Cyfronet AGH. Apache-2.0 licence.
Apache-2.0
Open-source licence permitting commercial use, modification, and distribution. Without the restrictions of Llama (Meta CCA).
APT4 tokeniser
Bielik's tokeniser, optimised specifically for the Polish language. Better support for Polish diacritics and inflection.
GGUF
Model file format for llama.cpp / Ollama / LM Studio. Roughly a 6 GB file for BezChmury 11B Q4_K_M; 16-32 GB RAM recommended.
MLX
Apple Silicon native framework for LLMs. Uses the unified memory of M1/M2/M3; often faster than GGUF on MacBooks.
FP8
8-bit floating-point quantisation for high-end serving (vLLM, GPU compute capability 8.9+). Smaller than FP16, larger than Int4.
Q4_K_M
4-bit quantisation scheme with mixed precision per layer. Most popular option for consumer hardware (smaller files, acceptable quality loss).
RAG
Retrieval-Augmented Generation - AI model with access to a local knowledge base (e.g. BezChmury SSoT with 630 records). Each answer = generated + retrieved.
GDPR (RODO)
General Data Protection Regulation (Regulation 2016/679). RODO is the Polish term. EU-wide personal data protection rules.

SOURCES

Official sources and references

  [1] Bielik-PL-11B-v3.0-Instruct (model card) - Hugging Face. https://huggingface.co/speakleash/Bielik-PL-11B-v3.0-Instruct · accessed 2026-05-01
  [2] Regulation (EU) 2024/1689 (AI Act) - EUR-Lex. https://eur-lex.europa.eu/eli/reg/2024/1689/oj · accessed 2026-05-01
  [3] Regulation (EU) 2016/679 (GDPR / RODO) - EUR-Lex. https://eur-lex.europa.eu/eli/reg/2016/679/oj · accessed 2026-05-01
  [4] bielik-papers (GitHub repo) - SpeakLeash. https://github.com/speakleash/bielik-papers · accessed 2026-05-01
  [5] Apache License 2.0 - Apache Software Foundation. https://www.apache.org/licenses/LICENSE-2.0 · accessed 2026-05-01
  [6] AI usage in accounting/HR (Wolters Kluwer survey, 581 respondents, October-November 2025) - Wolters Kluwer Polska. https://www.wolterskluwer.com/pl-pl · accessed 2026-05-01

All verbatim quotations in the article come from the official sources above. Inline references marked [N] link to this list.

Want to see private AI for your business?

A short KSeF Private demo (15 min). We will show local execution, control questions, the source base, and how BezChmury reduces the risk of hallucinations.

Join the beta list or book a demo →