Public-information briefing · April 2026
Ubicloud — Holistic View
Company briefing + AI / inference deep-dive
Part I
Company
Founders, funding, architecture, products, regulatory, DD verdict
TL;DR
- YC W24 open-source IaaS/PaaS cloud, founded 2023, HQ San Francisco
- Three-founder team from Citus Data / Microsoft Azure / Heroku Postgres
- $16M seed (Mar 2024, YC + 500 Emerging Europe); no subsequent round disclosed
- Software abstraction layer running on leased bare-metal (Hetzner, Leaseweb, OVH, AWS Bare Metal) — not a hardware lessor
- Core services: VMs, block storage, networking, managed Postgres, managed Kubernetes (beta), GitHub Actions runners, AI inference
- Positioned as 3x–10x cheaper than AWS; Postgres claimed 9x price/performance vs RDS/Aurora
- AGPL-3.0 — core code is self-hostable; managed service runs the same code
- ClickHouse partnership (Jan 2026) — ClickHouse's native managed Postgres now runs on Ubicloud (private preview)
- The sovereign/open pitch is stronger than the certification stack that backs it
Company Basics
- Founded: 2023
- YC batch: W24 (primary partner Garry Tan)
- Offices: San Francisco (HQ), Amstelveen NL, Istanbul TR (Levent)
- Team size: ~10 at seed (Mar 2024); YC page later listed 15; current not published
- Legal entities: Ubicloud Inc. (Delaware/US) + Ubicloud B.V. (Netherlands)
- Mission framing: "What Linux is to proprietary operating systems, Ubicloud is to cloud"
- Core theses: radical cost compression, eliminate vendor lock-in, architectural transparency / self-hosting option
Founders at a Glance
Three-founder team's common thread: managed Postgres-as-a-service across Heroku, Citus, Azure, and Crunchy Bridge. Ubicloud is Daniel Farina's 4th managed-cloud control plane.
| Founder | Role | Prior |
| Umur Cubukcu | Co-founder, Co-CEO | Citus Data co-founder/CEO (YC S11), 4y Azure Postgres lead, YC Visiting Partner 2023 |
| Ozgun Erdogan | Co-founder, Co-CEO / CTO | Citus Data co-founder/CTO, Amazon distributed systems, 4y Azure engineering lead |
| Daniel Farina | Co-founder | Core Heroku Postgres engineer, primary WAL-E author, Citus Cloud, Crunchy Bridge |
Umur and Ozgun met at Stanford (along with third Citus co-founder Sumedh Pathak, who is not part of Ubicloud).
Umur Cubukcu (Co-CEO)
- Education: BS Boğaziçi (Istanbul); MS Management Science & Engineering, Stanford (~2001–2003)
- Prior roles: BCG management consultant → Citus Data co-founder & CEO (YC S11, 2011–Jan 2019) → Microsoft Azure Data, product lead for Azure Database for PostgreSQL / Hyperscale (Citus) (Jan 2019–Oct 2022) → YC Visiting Group Partner W23/S23 (Oct 2022–Oct 2023) → Ubicloud
- Public presence: O'Reilly Strata NY 2018 speaker; Citus blog author; heavily quoted in TechCrunch/SiliconANGLE launch coverage
- Signature framing: "OpenStack takes an army of people; Ubicloud is signup-to-VM in two minutes"
- Profiles: LinkedIn /umurc · X @umurc
Ozgun Erdogan (Co-CEO / CTO)
- Education: BS Galatasaray (Istanbul); MS Computer Science, Stanford
- Prior roles: Amazon distributed systems engineer (Seattle, ~2006–2010; holds patents on distributed cache consistency and load balancing) → Citus Data co-founder & CTO (technical lead on Citus distributed-Postgres planner/executor) → Microsoft Azure engineering lead for Citus/Hyperscale (~4y) → Ubicloud
- Public presence: QCon SF 2017 speaker; PostgreSQL Person of the Week; Heavybit community speaker; General Assembly instructor; Startup Reporter EU interview (2026)
- Signature framing: "The entire stack is open-source, from bare metal to application layers, so businesses can audit our privacy and security claims"
- Profiles: LinkedIn /ozgune
Daniel Farina (Co-founder, Infra)
- Education: not publicly disclosed
- Prior roles: Plumtree Software (early career) → Heroku Postgres core engineer ~2010–2015 (widely credited as primary author of WAL-E, the Postgres continuous-archiving tool) → Citus Cloud control plane ~2016–2019 → Microsoft Azure ~2019–2021 → Crunchy Bridge at Crunchy Data ~2021–2023 → Ubicloud
- Public presence: US Patent 8,484,243 (stream query processing, 2013); active PostgreSQL mailing-list contributor; RubyConf 2024 talk "Build a Cloud in Thirteen Years"
- Signature framing: Ubicloud as the 4th iteration of a 13-year Postgres-as-a-service arc; Ruby chosen for infra orchestration because REPL + mature libraries = productivity advantage for a small team
- Profiles: LinkedIn /danfarina
Funding & Capitalization
- Seed: $16M, closed Jan 2024, announced Mar 5, 2024
- Lead: Y Combinator + 500 Emerging Europe
- Other disclosed: Pioneer Fund, Liquid 2 Ventures, ScaleX Ventures (Turkish), e2vc, Rainfall, Maxitech, angels
- Valuation: not publicly disclosed
- No Series A publicly announced as of Apr 2026
- Capital efficiency thesis: software abstraction layer, not hardware lessor — avoids the multi-billion CapEx of CoreWeave-style plays
- Implied runway: strong for lean SF/NL/TR distributed team with no owned datacenter
Product Portfolio
- Elastic Compute — x86_64 and ARM64 Linux VMs; standard and burstable classes
- Block Storage — non-replicated, AES-XTS encrypted at rest, backed by local NVMe
- Virtual Networking — VPC-style private networks, dual-stack IPv4/IPv6, IPsec-encrypted tunnels, nftables firewalls
- Load Balancer
- Managed PostgreSQL (flagship) — HA across AZs, PITR, read replicas, connection pooling, ParadeDB full-text extension, automated backups
- Managed Kubernetes — public beta, single-node and 3-node HA control plane, UbiCSI driver for local NVMe PVs
- GitHub Actions runners — Standard and Premium tiers; x64 and ARM64; 10x larger cache on Premium
- AI Inference Endpoints — OpenAI-compatible API on vLLM V1 via per-model subdomains ({model}.ai.ubicloud.com/v1); open-weight models only; streaming, JSON mode, function calling; 500k tokens/month free
- EuroGPT Enterprise — €19/user/mo, Llama 3.1 405B + Llama Guard 3, all GPU processing in Germany, GDPR-compliant
- IAM with ABAC — attribute-based access control from day one
- Strategic note: deprecated raw GPU VM rentals Dec 31, 2025 — moved up-stack to managed inference
Architecture (the "Clover" stack)
- Control plane: Ruby + Roda (HTTP) + Sequel (ORM) + Rodauth (auth) + PostgreSQL (state); orchestrates hosts over SSH (no heavy agent, net-ssh library)
- Host "cloudification": Prog::Vm::HostNexus workflow installs Rhizome host-agent code, SPDK, and nftables; configures hugepages; caches boot images
- Virtualization: Linux KVM + Cloud Hypervisor (Rust-based VMM; lighter and more security-focused than QEMU); QEMU 10.1+ used specifically for Blackwell B200 GPU topology
- Tenant isolation: each Cloud Hypervisor instance in its own Linux namespace, runs unprivileged, seccomp-bpf supported
- Block storage: SPDK user-space stack; bdev_aio → vbdev_crypto (AES-XTS + envelope encryption + auto key rotation) → bdev_ubi (custom COW module for instant VM provisioning from base images)
- Networking: IPsec tunnels, nftables, Linux namespaces, dual-stack IPv4/IPv6
- Opinionated: single stack, deliberately rejects OpenStack's "support everything" complexity
Open Source Footprint
- Repo: github.com/ubicloud/ubicloud
- License: AGPL-3.0 — strong copyleft, prevents hyperscaler repackaging-as-SaaS (the "Elastic/MongoDB problem")
- Primary language: Ruby (~92.5%) — deliberate, inherited from Heroku-era experience
- Stars/Forks: ~12k / ~558
- Dual deployment: managed service at console.ubicloud.com OR self-hosted via docker compose + cloudify-your-own-bare-metal
- Third-party OSS leveraged: Cloud Hypervisor, KVM, SPDK, nftables, strongSwan/IPsec, QEMU, PostgreSQL, vLLM, Tailwind
Pricing & Cost Positioning
Representative prices (Germany region, 2026):
| Service | Ubicloud | Hyperscaler | Savings |
| VM: 2 vCPU / 8 GB | ~$26 / mo | AWS ~$69, Azure ~$65, GCP ~$62 | ~60–65% |
| VM: 32 vCPU / 128 GB | ~$416 / mo (linear scaling from the 2 vCPU price) | AWS ~$1,104 / mo | ~60–65% |
| Burstable 1 vCPU | $6.65 / mo | — | — |
| Managed Postgres Hobby | $12.41 / mo | — | — |
| Managed Postgres Standard (2 vCPU) | $49 / mo | AWS RDS ~$200 / mo | ~67% |
| Managed Kubernetes (dev) | $46 / mo | EKS control + EC2 variable | ~73% |
| GitHub Actions 2 vCPU Linux | $0.0008 / min | GitHub $0.0080 / min | 10x (90%) |
| AI inference (Qwen2.5-VL-72B) | $0.80 / M tokens (in+out) | — | — |
| AI inference (Qwen3-Embedding-8B) | $0.05 / M input tokens | — | — |
Public IPv4: $3/mo. Egress: free up to ~0.625 TB per 2 vCPUs, then $3/TB (≈30x cheaper than hyperscaler egress). Free tier on inference: 500k tokens/month. Per-token pricing for most chat models (Llama 3.3, Mistral Small 3, DeepSeek V3/R1) is dashboard-only.
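The egress rates above can be sketched as a quick cost helper (an illustration of the stated figures, not an official calculator; check current pricing before relying on it):

```python
def monthly_egress_cost(egress_tb: float, vcpus: int) -> float:
    """Estimate Ubicloud egress cost in USD/month.

    Free allowance: ~0.625 TB per 2 vCPUs; overage billed at $3/TB.
    Figures taken from the pricing notes above.
    """
    free_tb = 0.625 * (vcpus / 2)
    overage_tb = max(0.0, egress_tb - free_tb)
    return overage_tb * 3.0

# A 4-vCPU VM pushing 5 TB/month: 5 - 1.25 = 3.75 TB over, at $3/TB
print(monthly_egress_cost(5, 4))  # 11.25
```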
Performance Claims (Postgres)
Self-published benchmarks vs AWS (independent third-party verification not found):
- TPC-C (transactional): 1.4x more TPS than Aurora at 5.8x lower cost; 4.6x more TPS than RDS at 2.8x lower cost
- Latency: 1.91x lower than Aurora, 7.65x lower than RDS
- TPC-H (analytical): 2.42x faster than Aurora; 2.96x faster than RDS
- Headline: "9x price/performance" vs RDS/Aurora
- Driver: SPDK + local NVMe + Cloud Hypervisor = less I/O overhead per dollar
- Caveat: all numbers sourced from ubicloud.com — no external benchmark surfaced
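One way to read the "9x" headline is as throughput ratio times cost ratio, i.e. transactions per second per dollar. A quick check against the self-published figures (composing them into a single multiple is our interpretation, not Ubicloud's published methodology):

```python
# Price/performance multiple = (TPS ratio) x (cost ratio): more
# throughput per dollar. Input figures are the self-published
# Ubicloud numbers quoted above.
def price_perf_multiple(tps_ratio: float, cost_ratio: float) -> float:
    return tps_ratio * cost_ratio

vs_aurora = price_perf_multiple(1.4, 5.8)   # TPC-C vs Aurora
vs_rds    = price_perf_multiple(4.6, 2.8)   # TPC-C vs RDS
print(round(vs_aurora, 2), round(vs_rds, 2))  # 8.12 12.88
```

On this reading the "9x" headline sits roughly between the two implied multiples.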
Competitive Positioning
Vs hyperscalers (AWS / GCP / Azure)
- 3x–10x cheaper, open source, portable
- Opinionated and narrow — targets the 10% of services that drive 80% of spend; explicitly no Lambda/DynamoDB/SageMaker equivalents
Vs open-source cloud (OpenStack etc.)
- Offers a first-party managed service
- Opinionated stack vs pluggable-everything
- Modern components (Cloud Hypervisor, SPDK) post-dating OpenStack's design era
- Cubukcu: "OpenStack takes an army of people"
Vs bare-metal VPS (Hetzner, DO, Linode, Vultr, Scaleway, OVH)
- Adds managed PaaS layer (Postgres, K8s, runners, inference) they lack
Vs CI specialists
- RunsOn, Depot, BuildJet, Blacksmith, Namespace Labs
Vs GPU clouds
- CoreWeave, Lambda — Ubicloud exited this race (GPU rental deprecated Dec 2025); pivot to inference-as-PaaS
Key Customers & Partnerships
- ClickHouse (Jan 22, 2026) — strategic wedge. ClickHouse launched its own native managed Postgres service in private preview, powered entirely by Ubicloud. Coincided with ClickHouse's $400M Series D (Dragoneer-led). ClickHouse engineers now contribute upstream. Shifts Ubicloud toward B2B2B infrastructure play.
- Direct customers with public stories: Felt, Hatchet (formal case studies); Resmo, Windmill, PeerDB (homepage logos)
- AudienceKey — cited by third-party research as achieving 50% DB cost reduction post-migration (not independently verified on Ubicloud's site)
- Claimed scale: ~400 paying customers per a Reddit-sourced figure — unverified
- No public Turkish enterprise, government, or bank customers announced
Office & Data Center Footprint
Offices
| Office | Address |
| San Francisco (HQ) | 450 Townsend St., SF, CA 94107 |
| Amsterdam / Amstelveen | Turfschip 267, 1186XK, Amstelveen NL |
| Istanbul | Esentepe Mah. Talatpaşa Cad. No:5/1, Levent |
Production data center regions
| Region ID | Provider | Location |
| eu-central-h1 | Hetzner | Falkenstein, Germany |
| eu-north-h1 | Hetzner | Helsinki, Finland |
| us-east-a2 | Leaseweb | Manassas, Virginia, USA |
| Türkiye (Istanbul) Private | not disclosed | Istanbul — GPU-only (B200), on request, Oct 2025 |
Marketing materials reference future regions (Frankfurt, Oregon, Singapore, São Paulo) and additional bare-metal partners (OVHcloud, Latitude.sh, AWS Bare Metal). No broader MENA or APAC presence. Ubicloud owns no physical hardware.
Recent Developments (2025)
- ARM64 VMs and ARM GitHub Actions runners GA; "100x price/performance" on certain ARM CI workloads
- Premium Runners launched (2x faster builds, 10x larger cache, 100 GB free cache)
- Managed Kubernetes moved to public beta (Germany + Virginia); UbiCSI local-NVMe PV driver in preview
- Postgres dashboard overhaul (June 2025)
- AI Inference Endpoints — OpenAI-compatible API on vLLM with open-weight models, managed multi-GPU
- SOC 2 Type II certified (Feb 2025 changelog)
- Deprecated raw GPU VM runners (effective Dec 31, 2025) — strategic exit from CapEx-heavy GPU race
- B200 HGX GPU launched in Türkiye (Istanbul) Private Location (Oct 2025); 4- and 8-GPU partitions added Nov 2025
- B200 HGX GPU virtualization (Dec 15, 2025) — deep technical post on QEMU 10.1+, VFIO-PCI, NVIDIA Fabric Manager, Shared NVSwitch Multitenancy; HN front page
Recent Developments (2026)
- ClickHouse partnership (Jan 22, 2026) — ClickHouse native Postgres powered by Ubicloud; private preview; engineering cross-contributions; tied to ClickHouse's $400M Series D
- Blog output — LLM coding practices, VLM-based OCR, documentation automation, CPU-performance myths ("Does MHz still matter?"), AI Coding sober review
- EuroGPT Enterprise continuing to scale (launched Nov 2024) — privacy-first ChatGPT Enterprise alternative, €19/user/mo, Llama 3.1 405B hosted in Germany
- No new funding round publicly disclosed — most recent remains the Mar 2024 seed
EU/EMEA Regulatory Posture — the credible parts
- Dual-entity controller structure: Ubicloud B.V. (NL) and Ubicloud Inc. (US) — Schrems-II-aware
- EEA-only storage of Customer Account Data (personal data of customers themselves)
- Transfer basis: Article 45(1) adequacy + Article 46(2)(c) Standard Contractual Clauses
- SOC 2 Type II confirmed (Feb 2025 changelog; dedicated /docs/security/soc2 URL currently 404s)
- Matomo for analytics (not Google Analytics) — GDPR-friendlier choice
- Penetration test referenced, available on request
- Proactive engagement on EU Data Act — Nov 2023 blog post welcoming cloud-switching/portability provisions is their most substantive regulatory communication
- EuroGPT residency guarantee: all GPU processing stays in Germany; no customer data used for training
EU/EMEA Regulatory Posture — the gaps
Silent or not-yet-claimed despite their EU sovereignty pitch:
- No ISO 27001 / 27017 / 27018
- No C5 (German BSI — often required for Bundesverwaltung procurement, conspicuous given the German region)
- No SecNumCloud (France / ANSSI)
- No ENS (Spain)
- No EUCS claim, no Gaia-X participation
- No public DORA posture — notable given ClickHouse partnership targets financial services; DORA in force since Jan 17, 2025
- No public NIS2 posture — Ubicloud's IaaS would normally be in scope
- No public EU AI Act role classification — despite operating EuroGPT and inference APIs
- No published BAA process for HIPAA — ToS prohibits PHI absent separate written agreement
- No public SLA posted
Short version: GDPR/SOC 2 baseline is credible; certification stack is light relative to the "sovereign, open, portable" pitch.
Contract Gotchas (Terms of Service)
- Governing law: California
- Data residency not contractually guaranteed by default — ToS permits Ubicloud to move Services Content between regions at its sole discretion absent a written addendum (EuroGPT is a named exception)
- No SLA in the ToS — no uptime commitment, no service-credit regime
- Backups are the customer's responsibility — "Ubicloud does not promise to retain any preservations or backups"
- Termination at sole discretion, with or without notice; may result in immediate data destruction
- PHI and GDPR Article 9 special-category data prohibited without separate written agreement
- DPA not published — available only on request via [email protected]
- Trust Center URL resolves to an empty SPA shell for anonymous visitors
- Sub-processors (Mar 30, 2026): Hetzner (DE/FI), Latitude.sh (DE), Leaseweb (US) for workloads; Cloudflare, Stripe, GitHub, Slack, Matomo, Hubspot, etc. for account data
Risks & Open Questions
Technical / operational
- Bare-metal supply-chain dependency — margin tied to Hetzner/Leaseweb pricing
- Storage non-replicated — distributed multi-AZ replicated block storage still ahead
- Feature-parity deficit — no serverless, no DynamoDB equivalent, no object storage at scale
- Limited regions — 3 production regions; no MENA, APAC, or LatAm
Go-to-market / competitive
- Hyperscaler retaliation — aggressive discounting could erode cost advantage
- Crowded alt-cloud market — DigitalOcean, Linode/Akamai, OVH, CoreWeave, Render all well-funded
- All performance claims self-published
Regulatory / enterprise-readiness
- Certification stack light for EU regulated-sector procurement
- No DORA/NIS2/AI Act public posture
- DPA and SOC 2 report are request-only
Opacity
- Current headcount, revenue, ARR, churn not public
- No post-seed valuation
- ClickHouse deal economics not disclosed
Due-Diligence Verdict Summary
Strong fundamentals
- Elite founder pedigree (Citus/Heroku/Azure) → technical credibility
- Capital-efficient software-abstraction model → not burning GPU-cloud CapEx
- Real strategic wedge in Postgres (9x claimed price/performance)
- Landmark partnership (ClickHouse) validates the tech as embeddable infrastructure — B2B2B pivot signal
- AGPL-3.0 is a defensible legal moat against hyperscaler repackaging
Caveats for buyers and investors
- Sovereign-cloud pitch outruns the certification paperwork
- Self-published benchmarks only
- Data-residency not contractual by default
- No public SLA
- Turkish presence is operational, not commercial — no Turkey-market motion
- Regional footprint insufficient for MENA, APAC, or French public-sector workloads
Fit-for-Purpose Matrix
| Use case | Fit |
| CI/CD optimization (GitHub Actions runners) | Strong — 10x cost savings, low switching cost |
| Postgres-heavy SaaS workloads | Strong — flagship product, real performance claims |
| Stateless / ephemeral compute | Strong — 3x–10x cheaper than hyperscalers |
| Open-source LLM inference (commodity) | Strong — OpenAI-compatible API, 10x cheaper |
| European GDPR-sensitive workloads | Good — with limitations (no ISO 27001 etc.) |
| EuroGPT for GDPR-regulated EU teams | Strong niche — turnkey sovereign ChatGPT alternative |
| Build-your-own-cloud for national / sovereign deployments | Unique — AGPL + BYOC is rare in the market |
| Regulated financial services (DORA-critical) | Weak — no public DORA posture |
| Healthcare / PHI workloads | Weak — prohibited by default ToS |
| French public sector (SecNumCloud required) | No fit |
| Global edge / CDN / deeply integrated serverless | No fit — out of scope by design |
| MENA / APAC / LatAm residency | No fit for managed service; BYOC possible |
Key Sources
Ubicloud primary
Press & founders
HN threads: 37154138 (Aug 2023), 39598826 (Mar 2024), 44167607 (2025), 46312792 (B200, Dec 2025).
Part II
AI / Inference
Inference endpoints, model catalog, vLLM internals, EuroGPT, B200
Briefing · April 2026
Ubicloud AI
Open-source inference endpoints, EuroGPT Enterprise,
and B200 virtualization
TL;DR — AI Strategy
- Pivoted from raw GPU rentals to managed inference PaaS — GPU GitHub Actions runners deprecated Dec 31, 2025; GPU VMs repositioned as private/enterprise-only
- Open-weight only — no Claude/GPT/Gemini re-hosting; every model on the platform is open-weight
- Three product surfaces: inference endpoints (dev API), EuroGPT Enterprise (SaaS), private B200 VMs (enterprise/BYOC)
- Production runtime: vLLM V1 with FlashAttention-3, FlashInfer, speculative decoding, prefix caching
- Signature technical work: open-source virtualization of NVIDIA HGX B200 using QEMU 10.1+ + Fabric Manager Shared NVSwitch Multitenancy
- AI footprint: Germany (Falkenstein, Helsinki, EuroGPT processing) + Türkiye Istanbul Private Location for B200
Product Surface
Two API surfaces:
| Surface | Base URL | Purpose | Auth |
| Management | https://api.ubicloud.com | Manage API keys, endpoints, projects | Bearer JWT |
| Inference data plane | https://{model}.ai.ubicloud.com/v1 | OpenAI-compatible inference | Bearer API key |
Per-model subdomain pattern — each model gets its own hostname (e.g. llama-3-3-70b-turbo.ai.ubicloud.com/v1). There is no unified inference host.
SDK support: any OpenAI-compatible SDK (Python openai, JS); first-party Ruby SDK + ubi CLI (beta).
Free tier: 500,000 tokens / month.
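A minimal sketch of the data-plane call pattern using only the Python standard library (the subdomain pattern and Bearer auth come from the table above; the helper names are our own, and the chat call is not invoked here because it needs a real API key):

```python
import json
import urllib.request

def base_url_for(model_id: str) -> str:
    """Per-model subdomain pattern: each model gets its own hostname."""
    return f"https://{model_id}.ai.ubicloud.com/v1"

def chat(model_id: str, api_key: str, messages: list[dict]) -> dict:
    """Minimal POST /v1/chat/completions with stdlib only.

    Any OpenAI-compatible SDK (Python openai, JS) works the same way:
    point its base_url at the model's subdomain, pass a Bearer API key.
    """
    req = urllib.request.Request(
        base_url_for(model_id) + "/chat/completions",
        data=json.dumps({"model": model_id, "messages": messages}).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

print(base_url_for("llama-3-3-70b-turbo"))
# https://llama-3-3-70b-turbo.ai.ubicloud.com/v1
```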
OpenAI Compatibility
Documented and working against the per-model base URL:
- POST /v1/chat/completions — non-streaming
- POST /v1/chat/completions with stream=true — SSE streaming
- POST /v1/chat/completions with response_format={"type":"json_object"} — JSON mode
- POST /v1/chat/completions with tools=[...], tool_choice="auto" — function/tool calling
- /v1/embeddings — implied by Qwen3-Embedding-8B launch (endpoint path not explicitly documented)
Not documented or not offered: /v1/completions (legacy), /v1/models on data plane, audio, image, batch API, fine-tuning API.
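The documented variants map onto request bodies like the following (shapes follow the OpenAI API convention; model ids are from the catalog, and the get_weather tool is hypothetical):

```python
import json

# JSON mode: constrain output to a valid JSON object
json_mode = {
    "model": "mistral-small-3",
    "messages": [{"role": "user", "content": "Return {\"ok\": true}."}],
    "response_format": {"type": "json_object"},
}

# Function/tool calling: get_weather is a hypothetical tool definition
tool_calling = {
    "model": "llama-3-3-70b-turbo",
    "messages": [{"role": "user", "content": "Weather in Helsinki?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    "tool_choice": "auto",
}

# Streaming: server responds with SSE chunks instead of one body
streaming = {
    "model": "llama-3-3-70b-turbo",
    "messages": [{"role": "user", "content": "Count to three."}],
    "stream": True,
}

for body in (json_mode, tool_calling, streaming):
    json.dumps(body)  # all three serialize as valid request payloads
```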
Model Catalog (Confirmed Public)
| Model ID | Family | Role | First seen |
| llama-3-3-70b-turbo | Llama 3.3 70B | Chat | Feb 2025 |
| mistral-small-3 | Mistral Small 3 (24B) | Chat | Feb 2025 |
| ds-r1-qwen-32b | DeepSeek-R1-Distill-Qwen-32B | Reasoning | Feb–Mar 2025 |
| DeepSeek V3 | DeepSeek V3 | Chat | Jun 2025 |
| DeepSeek R1 | DeepSeek R1 | Reasoning | Jun 2025 |
| Qwen2.5-VL-72B | Qwen 2.5 VL | Vision-language | Jul 2025 |
| Qwen3 VL | Qwen 3 VL | Vision-language | Oct 2025 |
| Qwen3-Embedding-8B | Qwen 3 Embedding | Text embeddings | Mar 2026 |
| Llama Guard 3 | Meta | Moderation (EuroGPT) | Nov 2024 |
| Llama 3.1 405B | Meta | Chat (EuroGPT) | Nov 2024 |
Open-weight only. No Llama 4 in public materials. Context windows and quantization not published per-model.
Public Pricing
Per-token pricing is dashboard-only for most chat models. Only two models are publicly priced on the web:
| Model | Price | Notes |
| Qwen2.5-VL-72B | $0.80 / M tokens (input + output) | Jul 2025 |
| Qwen3-Embedding-8B | $0.05 / M input tokens | Mar 2026 |
| Free tier | 500k tokens / month | Feb 2025 |
March 2026 addition: new GET /project/{id}/inference-endpoint API returns full price table programmatically with separate per_million_prompt_tokens and per_million_completion_tokens.
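A sketch of how a client might turn those two fields into a blended cost (the two field names come from the changelog note above; the row shape and helper are our assumptions):

```python
def cost_usd(prompt_toks: int, completion_toks: int, price_row: dict) -> float:
    """Blended request cost from the per-million price fields the
    March 2026 inference-endpoint API exposes. Only the two field
    names are sourced; the surrounding shape is an assumption."""
    return (prompt_toks / 1e6) * price_row["per_million_prompt_tokens"] \
         + (completion_toks / 1e6) * price_row["per_million_completion_tokens"]

# Hypothetical row shaped like the public Qwen2.5-VL price ($0.80/M, in+out)
row = {"per_million_prompt_tokens": 0.80, "per_million_completion_tokens": 0.80}
print(round(cost_usd(250_000, 50_000, row), 2))  # 0.24
```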
Positioning claims (Ubicloud-authored): "3–10x lower than comparable offerings" for cloud overall; "3x lower than US alternatives" for EuroGPT. "10x cheaper than OpenAI" is NOT a Ubicloud claim — that phrasing came from third-party research.
Hardware Stack
| GPU | Status | First public mention |
| NVIDIA A100 | Preview (Germany) | May 2025 |
| NVIDIA H100 | Production (prior GPU VMs) | — |
| NVIDIA HGX B200 | Production (Türkiye Istanbul, on request) | Oct 2025 |
| NVIDIA RTX PRO 6000 | On request | Dec 2025 |
Not offered in public materials: H200, L40S, MI300X.
B200 partitioning via Shared NVSwitch Multitenancy
| Partition size | When added |
| 1-GPU, 2-GPU | Oct 2025 launch |
| 4-GPU, 8-GPU | Nov 2025 |
Inside a partition: full NVLink/NVSwitch bandwidth. Across partitions: isolated. Fabric Manager enforces routing.
B200 Virtualization — Signature Tech Work
Ubicloud wrote the "missing manual" on open-source virtualization of NVIDIA HGX B200. Stack:
- QEMU 10.1+ (not Cloud Hypervisor) — B200 needs multi-level PCIe topology that Cloud Hypervisor's flat topology can't produce; 10.1 added BAR-mapping optimizations critical for B200's 256 GB Region 2 BAR per GPU
- VFIO-PCI passthrough — vfio-pci.ids=10de:2901, intel_iommu=on iommu=pt; blacklist nouveau/nvidia/nvidia_drm
- nvidia-open driver on guest (proprietary stack can't drive B200)
- NVIDIA Fabric Manager in FABRIC_MODE=1 (Shared NVSwitch Multitenancy) on host; fmpm CLI for partition management
- Host/guest driver versions must match exactly (e.g., 580.95.05)
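The host-side settings above boil down to a little boot configuration; a sketch (the parameter values are the ones quoted in the bullets; the file paths are typical Debian/Ubuntu defaults and may differ per distro):

```shell
# /etc/default/grub -- bind the B200s (PCI ID 10de:2901) to vfio-pci
# and enable IOMMU passthrough mode, per the parameters quoted above
GRUB_CMDLINE_LINUX="vfio-pci.ids=10de:2901 intel_iommu=on iommu=pt"

# /etc/modprobe.d/blacklist-nvidia.conf -- keep host GPU drivers off
# the passed-through devices
blacklist nouveau
blacklist nvidia
blacklist nvidia_drm
```

After editing, regenerate the bootloader config and rebuild the initramfs for the changes to take effect at next boot.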
Competitive point: entire stack is open source; operators can replicate it. Reached HN front page Dec 15, 2025.
vLLM V1 Internals
Production runtime is vLLM V1. Three main components:
- AsyncLLM — async wrapper for tokenization/detokenization; talks to the engine via IPC, sidestepping the Python GIL
- EngineCore — busy loop: pull from input queue, run scheduler + one forward pass per step
- Scheduler — continuous batching via max_num_batched_tokens; all requests finish prefill before decode
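The scheduling rule can be illustrated with a toy token-budget step (deliberately simplified; this is not vLLM's actual scheduler code):

```python
def schedule_step(waiting, running, max_num_batched_tokens=2048):
    """Toy continuous-batching step.

    Each running request contributes 1 decode token per step; the
    remaining budget admits waiting requests' full prefills, mirroring
    the V1 rule that a request finishes prefill before it decodes.
    """
    budget = max_num_batched_tokens - len(running)  # 1 token per decode
    admitted = []
    for req_tokens in list(waiting):                # iterate over a copy
        if req_tokens <= budget:
            budget -= req_tokens
            admitted.append(req_tokens)
            waiting.remove(req_tokens)
    return admitted, budget

waiting = [1500, 800, 100]       # prefill sizes of queued requests
running = ["a", "b"]             # two requests already decoding
admitted, left = schedule_step(waiting, running)
print(admitted, waiting, left)   # [1500, 100] [800] 446
```

The 800-token request waits for the next step: admitting it would blow the per-step token budget that keeps forward-pass latency bounded.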
Optimization layer
- FlashAttention-3 for forward passes
- FlashInfer (integrated Feb 2025) as high-performance kernel generator
- PagedAttention-lineage block-based KV cache, dynamically allocated
- Speculative decoding on DeepSeek R1 32B (Mar 2025)
- Prefix caching referenced in Dewey.py deep-research demo
Not covered publicly: multi-worker load balancing, health checks, auto-restart, model hot-swap.
EuroGPT Enterprise
The consumer/SaaS face of Ubicloud AI. Available at eurogpt.ubicloud.com.
- €19 per user per month — framed as 3x cheaper than ChatGPT Enterprise / Copilot
- LLM: Meta Llama 3.1 405B (open weights)
- Moderation: Llama Guard 3 (optional, input + output)
- Embeddings: E5-Mistral-7B for RAG with private knowledge base
- Web search: DuckDuckGo (privacy-preserving)
- Data residency: "Data remains in Germany, including all GPU processing"
- Training: "No customer data or metadata used for training purposes"
- Security: encryption in transit + envelope encryption at rest, key rotation, file upload
- SSO: OIDC at platform level (Jul 2025); EuroGPT-specific SSO not explicitly documented
Not disclosed: quantization of the 405B deployment. Not offered: private API for EuroGPT — raw API consumers use Inference Endpoints directly.
Strategic Pivot: GPU Rentals → Inference PaaS
Before (2024)
Offered raw GPU rentals (RTX 4000 Ada / H100) as GitHub Actions runners and GPU VMs.
Inflection (2025)
Recognized the CapEx-heavy raw-GPU race against CoreWeave, Lambda, AWS P5, Azure NDv5 as structurally unviable for a seed-stage company. Moved up-stack to managed inference PaaS + dedicated enterprise GPU (private locations).
After (Dec 31, 2025)
- GPU GitHub Actions runners deprecated
- GPU VMs repositioned as private/enterprise deployments (B200, RTX PRO 6000 on request)
- Open-weight inference endpoints become the primary AI front door
- EuroGPT Enterprise becomes the productized SaaS face
Implication: Ubicloud is no longer competing on GPU-hours; it is competing on tokens and on the quality of the managed inference stack.
Positioning
| Competitor class | Examples | Ubicloud's angle |
| Closed-model LLM vendors | OpenAI, Anthropic | Open-weight only; lower price; EU residency; no training use |
| Fast-inference specialists | Groq, Together, Fireworks, DeepInfra | Same model class; adds full IaaS underneath + EuroGPT SaaS on top |
| GPU clouds | CoreWeave, Lambda, AWS P5 | Open-source B200 virtualization; control plane on GitHub; BYOC option |
| GPU-on-demand | RunPod, Vast.ai | Managed-first; GDPR-native; EuroGPT SaaS |
| European sovereign AI | Mistral-La Plateforme, Aleph Alpha | Broader IaaS (compute + K8s + Postgres) beyond just models |
Differentiators actually claimable
- End-to-end AGPL-3.0 stack (hypervisor → vLLM → UI)
- Proven B200 virtualization (with public technical writeup)
- Germany-resident EuroGPT turnkey product
- Strong Postgres heritage → good RAG / vector story when paired with managed Postgres
Gaps & What's Missing
- No public SLA, rate limits, latency, or throughput numbers for inference endpoints
- No public per-token pricing for chat/reasoning models (only Qwen2.5-VL and Qwen3-Embedding priced on web) — dashboard-only
- Not offered: batch inference API, fine-tuning / LoRA, image generation, audio (Whisper/TTS), multimodal beyond vision-language input
- No public EU AI Act role classification (provider vs deployer) despite operating EuroGPT and open inference
- No named AI customers in public materials; no case studies beyond Ubicloud's own Dewey.py deep-research demo
- No benchmarks vs CoreWeave / Lambda / AWS P5 on B200 workloads; vs OpenAI / Groq / Together on inference throughput or latency
- Istanbul B200 hosting provider not publicly named — framed as "Private Location" / on-request