Public-information briefing · April 2026

Ubicloud — Holistic View

Company briefing + AI / inference deep-dive
Part I

Company

Founders, funding, architecture, products, regulatory, DD verdict

TL;DR

  • YC W24 open-source IaaS/PaaS cloud, founded 2023, HQ San Francisco
  • Three-founder team from Citus Data / Microsoft Azure / Heroku Postgres
  • $16M seed (Mar 2024, YC + 500 Emerging Europe); no subsequent round disclosed
  • Software abstraction layer running on leased bare-metal (Hetzner, Leaseweb, OVH, AWS Bare Metal) — not a hardware lessor
  • Core services: VMs, block storage, networking, managed Postgres, managed Kubernetes (beta), GitHub Actions runners, AI inference
  • Positioned as 3x–10x cheaper than AWS; Postgres claimed 9x price/performance vs RDS/Aurora
  • AGPL-3.0 — core code is self-hostable; managed service runs the same code
  • ClickHouse partnership (Jan 2026) — ClickHouse's native managed Postgres now runs on Ubicloud (private preview)
  • The sovereign/open pitch is stronger than the certification stack that backs it

Company Basics

  • Founded: 2023
  • YC batch: W24 (primary partner Garry Tan)
  • Offices: San Francisco (HQ), Amstelveen NL, Istanbul TR (Levent)
  • Team size: ~10 at seed (Mar 2024); YC page later listed 15; current not published
  • Legal entities: Ubicloud Inc. (Delaware/US) + Ubicloud B.V. (Netherlands)
  • Mission framing: "What Linux is to proprietary operating systems, Ubicloud is to cloud"
  • Core theses: radical cost compression, eliminate vendor lock-in, architectural transparency / self-hosting option

Founders at a Glance

The three founders' common thread: managed Postgres-as-a-service across Heroku, Citus, Azure, and Crunchy Bridge. Ubicloud is Daniel Farina's fourth managed-cloud control plane.

Founder | Role | Prior
Umur Cubukcu | Co-founder, Co-CEO | Citus Data co-founder/CEO (YC S11), 4y Azure Postgres lead, YC Visiting Partner 2023
Ozgun Erdogan | Co-founder, Co-CEO / CTO | Citus Data co-founder/CTO, Amazon distributed systems, 4y Azure engineering lead
Daniel Farina | Co-founder | Core Heroku Postgres engineer, primary WAL-E author, Citus Cloud, Crunchy Bridge

Umur and Ozgun met at Stanford (with third Citus co-founder Sumedh Pathak, who is not in Ubicloud).

Umur Cubukcu (Co-CEO)

  • Education: BS Boğaziçi (Istanbul); MS Management Science & Engineering, Stanford (~2001–2003)
  • Prior roles: BCG management consultant → Citus Data co-founder & CEO (YC S11, 2011–Jan 2019) → Microsoft Azure Data, product lead for Azure Database for PostgreSQL / Hyperscale (Citus) (Jan 2019–Oct 2022) → YC Visiting Group Partner W23/S23 (Oct 2022–Oct 2023) → Ubicloud
  • Public presence: O'Reilly Strata NY 2018 speaker; Citus blog author; heavily quoted in TechCrunch/SiliconANGLE launch coverage
  • Signature framing: "OpenStack takes an army of people; Ubicloud is signup-to-VM in two minutes"
  • Profiles: LinkedIn /umurc · X @umurc

Ozgun Erdogan (Co-CEO / CTO)

  • Education: BS Galatasaray (Istanbul); MS Computer Science, Stanford
  • Prior roles: Amazon distributed systems engineer (Seattle, ~2006–2010; holds patents on distributed cache consistency and load balancing) → Citus Data co-founder & CTO (technical lead on Citus distributed-Postgres planner/executor) → Microsoft Azure engineering lead for Citus/Hyperscale (~4y) → Ubicloud
  • Public presence: QCon SF 2017 speaker; PostgreSQL Person of the Week; Heavybit community speaker; General Assembly instructor; Startup Reporter EU interview (2026)
  • Signature framing: "The entire stack is open-source, from bare metal to application layers, so businesses can audit our privacy and security claims"
  • Profiles: LinkedIn /ozgune

Daniel Farina (Co-founder, Infra)

  • Education: not publicly disclosed
  • Prior roles: Plumtree Software (early career) → Heroku Postgres core engineer ~2010–2015 (widely credited as primary author of WAL-E, the Postgres continuous-archiving tool) → Citus Cloud control plane ~2016–2019 → Microsoft Azure ~2019–2021 → Crunchy Bridge at Crunchy Data ~2021–2023 → Ubicloud
  • Public presence: US Patent 8,484,243 (stream query processing, 2013); active PostgreSQL mailing-list contributor; RubyConf 2024 talk "Build a Cloud in Thirteen Years"
  • Signature framing: Ubicloud as the 4th iteration of a 13-year Postgres-as-a-service arc; Ruby chosen for infra orchestration because REPL + mature libraries = productivity advantage for a small team
  • Profiles: LinkedIn /danfarina

Funding & Capitalization

  • Seed: $16M, closed Jan 2024, announced Mar 5, 2024
  • Lead: Y Combinator + 500 Emerging Europe
  • Other disclosed: Pioneer Fund, Liquid 2 Ventures, ScaleX Ventures (Turkish), e2vc, Rainfall, Maxitech, angels
  • Valuation: not publicly disclosed
  • No Series A publicly announced as of Apr 2026
  • Capital efficiency thesis: software abstraction layer, not hardware lessor — avoids the multi-billion CapEx of CoreWeave-style plays
  • Implied runway: strong for lean SF/NL/TR distributed team with no owned datacenter

Product Portfolio

  • Elastic Compute — x86_64 and ARM64 Linux VMs; standard and burstable classes
  • Block Storage — non-replicated, AES-XTS encrypted at rest, backed by local NVMe
  • Virtual Networking — VPC-style private networks, dual-stack IPv4/IPv6, IPsec-encrypted tunnels, nftables firewalls
  • Load Balancer
  • Managed PostgreSQL (flagship) — HA across AZs, PITR, read replicas, connection pooling, ParadeDB full-text extension, automated backups
  • Managed Kubernetes — public beta, single-node and 3-node HA control plane, UbiCSI driver for local NVMe PVs
  • GitHub Actions runners — Standard and Premium tiers; x64 and ARM64; 10x larger cache on Premium
  • AI Inference Endpoints — OpenAI-compatible API on vLLM V1 via per-model subdomains {model}.ai.ubicloud.com/v1; open-weight models only; streaming, JSON mode, function calling; 500k tokens/month free
  • EuroGPT Enterprise — €19/user/mo, Llama 3.1 405B + Llama Guard 3, all GPU processing in Germany, GDPR-compliant
  • IAM with ABAC — attribute-based access control from day one
  • Strategic note: deprecated raw GPU VM rentals Dec 31, 2025 — moved up-stack to managed inference

Architecture (the "Clover" stack)

  • Control plane: Ruby + Roda (HTTP) + Sequel (ORM) + Rodauth (auth) + PostgreSQL (state); orchestrates hosts over SSH (no heavy agent, net-ssh library)
  • Host "cloudification": Prog::Vm::HostNexus workflow installs Rhizome host-agent code, SPDK, nftables, configures hugepages, caches boot images
  • Virtualization: Linux KVM + Cloud Hypervisor (Rust-based VMM; lighter and more security-focused than QEMU); QEMU 10.1+ used specifically for Blackwell B200 GPU topology
  • Tenant isolation: each Cloud Hypervisor instance in its own Linux namespace, runs unprivileged, seccomp-bpf supported
  • Block storage: SPDK user-space stack; bdev_aio → bdev_crypto (AES-XTS + envelope encryption + auto key rotation) → bdev_ubi (custom COW module for instant VM provisioning from base images)
  • Networking: IPsec tunnels, nftables, Linux namespaces, dual-stack IPv4/IPv6
  • Opinionated: single stack, deliberately rejects OpenStack's "support everything" complexity

Open Source Footprint

  • Repo: github.com/ubicloud/ubicloud
  • License: AGPL-3.0 — strong copyleft, prevents hyperscaler repackaging-as-SaaS (the "Elastic/MongoDB problem")
  • Primary language: Ruby (~92.5%) — deliberate, inherited from Heroku-era experience
  • Stars/Forks: ~12k / ~558
  • Dual deployment: managed service at console.ubicloud.com OR self-hosted via docker compose + cloudify-your-own-bare-metal
  • Third-party OSS leveraged: Cloud Hypervisor, KVM, SPDK, nftables, strongSwan/IPsec, QEMU, PostgreSQL, vLLM, Tailwind

Pricing & Cost Positioning

Representative prices (Germany region, 2026):

Service | Ubicloud | Hyperscaler | Savings
VM: 2 vCPU / 8 GB | ~$26 / mo | AWS ~$69, Azure ~$65, GCP ~$62 | ~60–65%
VM: 32 vCPU / 128 GB | linear scaling | AWS ~$1,104 / mo | ~60–65%
Burstable 1 vCPU | $6.65 / mo | |
Managed Postgres Hobby | $12.41 / mo | |
Managed Postgres Standard (2 vCPU) | $49 / mo | AWS RDS ~$200 / mo | ~67%
Managed Kubernetes (dev) | $46 / mo | EKS control + EC2 variable | ~73%
GitHub Actions 2 vCPU Linux | $0.0008 / min | GitHub $0.0080 / min | 10x (90%)
AI inference (Qwen2.5-VL-72B) | $0.80 / M tokens (in+out) | |
AI inference (Qwen3-Embedding-8B) | $0.05 / M input tokens | |

Public IPv4: $3/mo. Egress: free up to ~0.625 TB per 2 vCPUs, then $3/TB (≈30x cheaper than hyperscaler egress). Free tier on inference: 500k tokens/month. Per-token pricing for most chat models (Llama 3.3, Mistral Small 3, DeepSeek V3/R1) is dashboard-only.
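The egress scheme can be turned into a quick estimator. A minimal sketch, assuming the free allowance scales linearly with vCPU count (extrapolated from the "~0.625 TB per 2 vCPUs" figure):

```python
def egress_cost_usd(vcpus: int, egress_tb: float,
                    free_tb_per_2vcpu: float = 0.625,
                    rate_per_tb: float = 3.0) -> float:
    """Estimate monthly egress cost: the free allowance scales with vCPUs
    (assumed linear), and overage is billed at a flat per-TB rate."""
    free_allowance = free_tb_per_2vcpu * (vcpus / 2)
    billable = max(0.0, egress_tb - free_allowance)
    return billable * rate_per_tb

# A 2 vCPU VM pushing 2 TB/month: 0.625 TB free, 1.375 TB billed at $3/TB
print(round(egress_cost_usd(2, 2.0), 2))  # 4.12
```

Even at heavy egress volumes the bill stays in single-digit dollars, which is the basis of the ≈30x-cheaper-than-hyperscaler claim.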

Performance Claims (Postgres)

Self-published benchmarks vs AWS (independent third-party verification not found):

  • TPC-C (transactional): 1.4x more TPS than Aurora at 5.8x lower cost; 4.6x more TPS than RDS at 2.8x lower cost
  • Latency: 1.91x lower than Aurora, 7.65x lower than RDS
  • TPC-H (analytical): 2.42x faster than Aurora; 2.96x faster than RDS
  • Headline: "9x price/performance" vs RDS/Aurora
  • Driver: SPDK + local NVMe + Cloud Hypervisor = less I/O overhead per dollar
  • Caveat: all numbers sourced from ubicloud.com — no external benchmark surfaced

Competitive Positioning

Vs hyperscalers (AWS / GCP / Azure)

  • 3x–10x cheaper, open source, portable
  • Opinionated and narrow — targets the 10% of services that drive 80% of spend; explicitly no Lambda/DynamoDB/SageMaker equivalents

Vs open-source cloud (OpenStack etc.)

  • Offers a first-party managed service
  • Opinionated stack vs pluggable-everything
  • Modern components (Cloud Hypervisor, SPDK) post-dating OpenStack's design era
  • Cubukcu: "OpenStack takes an army of people"

Vs bare-metal VPS (Hetzner, DO, Linode, Vultr, Scaleway, OVH)

  • Adds managed PaaS layer (Postgres, K8s, runners, inference) they lack

Vs CI specialists

RunsOn, Depot, BuildJet, Blacksmith, Namespace Labs

Vs GPU clouds

CoreWeave, Lambda — Ubicloud exited this race (GPU rental deprecated Dec 2025); pivot to inference-as-PaaS

Key Customers & Partnerships

  • ClickHouse (Jan 22, 2026) — strategic wedge. ClickHouse launched its own native managed Postgres service in private preview, powered entirely by Ubicloud. Coincided with ClickHouse's $400M Series D (Dragoneer-led). ClickHouse engineers now contribute upstream. Shifts Ubicloud toward B2B2B infrastructure play.
  • Direct customers with public stories: Felt, Hatchet (formal case studies); Resmo, Windmill, PeerDB (homepage logos)
  • AudienceKey — cited by third-party research as achieving 50% DB cost reduction post-migration (not independently verified on Ubicloud's site)
  • Claimed scale: ~400 paying customers per a Reddit-sourced figure — unverified
  • No public Turkish enterprise, government, or bank customers announced

Office & Data Center Footprint

Offices

Office | Address
San Francisco (HQ) | 450 Townsend St., SF, CA 94107
Amsterdam / Amstelveen | Turfschip 267, 1186XK, Amstelveen NL
Istanbul | Esentepe Mah. Talatpaşa Cad. No:5/1, Levent

Production data center regions

Region ID | Provider | Location
eu-central-h1 | Hetzner | Falkenstein, Germany
eu-north-h1 | Hetzner | Helsinki, Finland
us-east-a2 | Leaseweb | Manassas, Virginia, USA
Türkiye (Istanbul) Private | not disclosed | Istanbul; GPU-only (B200), on request, Oct 2025

Marketing materials reference future regions (Frankfurt, Oregon, Singapore, São Paulo) and additional bare-metal partners (OVHcloud, Latitude.sh, AWS Bare Metal). No broader MENA or APAC presence. Ubicloud owns no physical hardware.

Recent Developments (2025)

  • ARM64 VMs and ARM GitHub Actions runners GA; "100x price/performance" on certain ARM CI workloads
  • Premium Runners launched (2x faster builds, 10x larger cache, 100 GB free cache)
  • Managed Kubernetes moved to public beta (Germany + Virginia); UbiCSI local-NVMe PV driver in preview
  • Postgres dashboard overhaul (June 2025)
  • AI Inference Endpoints — OpenAI-compatible API on vLLM with open-weight models, managed multi-GPU
  • SOC 2 Type II certified (Feb 2025 changelog)
  • Deprecated raw GPU VM runners (effective Dec 31, 2025) — strategic exit from CapEx-heavy GPU race
  • B200 HGX GPU launched in Türkiye (Istanbul) Private Location (Oct 2025); 4- and 8-GPU partitions added Nov 2025
  • B200 HGX GPU virtualization (Dec 15, 2025) — deep technical post on QEMU 10.1+, VFIO-PCI, NVIDIA Fabric Manager, Shared NVSwitch Multitenancy; HN front page

Recent Developments (2026)

  • ClickHouse partnership (Jan 22, 2026) — ClickHouse native Postgres powered by Ubicloud; private preview; engineering cross-contributions; tied to ClickHouse's $400M Series D
  • Blog output — LLM coding practices, VLM-based OCR, documentation automation, CPU-performance myths ("Does MHz still matter?"), AI Coding sober review
  • EuroGPT Enterprise continuing to scale (launched Nov 2024) — privacy-first ChatGPT Enterprise alternative, €19/user/mo, Llama 3.1 405B hosted in Germany
  • No new funding round publicly disclosed — most recent remains the Mar 2024 seed

EU/EMEA Regulatory Posture — the credible parts

  • Dual-entity controller structure: Ubicloud B.V. (NL) and Ubicloud Inc. (US) — Schrems-II-aware
  • EEA-only storage of Customer Account Data (personal data of customers themselves)
  • Transfer basis: Article 45(1) adequacy + Article 46(2)(c) Standard Contractual Clauses
  • SOC 2 Type II confirmed (Feb 2025 changelog; dedicated /docs/security/soc2 URL currently 404s)
  • Matomo for analytics (not Google Analytics) — GDPR-friendlier choice
  • Penetration test referenced, available on request
  • Proactive engagement on EU Data Act — Nov 2023 blog post welcoming cloud-switching/portability provisions is their most substantive regulatory communication
  • EuroGPT residency guarantee: all GPU processing stays in Germany; no customer data used for training

EU/EMEA Regulatory Posture — the gaps

Silent or not-yet-claimed despite their EU sovereignty pitch:

  • No ISO 27001 / 27017 / 27018
  • No C5 (German BSI — often required for Bundesverwaltung procurement, conspicuous given the German region)
  • No SecNumCloud (France / ANSSI)
  • No ENS (Spain)
  • No EUCS claim, no Gaia-X participation
  • No public DORA posture — notable given ClickHouse partnership targets financial services; DORA in force since Jan 17, 2025
  • No public NIS2 posture — Ubicloud's IaaS would normally be in scope
  • No public EU AI Act role classification — despite operating EuroGPT and inference APIs
  • No published BAA process for HIPAA — ToS prohibits PHI absent separate written agreement
  • No public SLA posted
Short version: GDPR/SOC 2 baseline is credible; certification stack is light relative to the "sovereign, open, portable" pitch.

Contract Gotchas (Terms of Service)

  • Governing law: California
  • Data residency not contractually guaranteed by default — ToS permits Ubicloud to move Services Content between regions at its sole discretion absent a written addendum (EuroGPT is a named exception)
  • No SLA in the ToS — no uptime commitment, no service-credit regime
  • Backups are the customer's responsibility — "Ubicloud does not promise to retain any preservations or backups"
  • Termination at sole discretion, with or without notice; may result in immediate data destruction
  • PHI and GDPR Article 9 special-category data prohibited without separate written agreement
  • DPA not published — available only on request via [email protected]
  • Trust Center URL resolves to an empty SPA shell for anonymous visitors
  • Sub-processors (Mar 30, 2026): Hetzner (DE/FI), Latitude.sh (DE), Leaseweb (US) for workloads; Cloudflare, Stripe, GitHub, Slack, Matomo, Hubspot, etc. for account data

Risks & Open Questions

Technical / operational

  • Bare-metal supply-chain dependency — margin tied to Hetzner/Leaseweb pricing
  • Storage non-replicated — distributed multi-AZ replicated block storage still ahead
  • Feature-parity deficit — no serverless, no DynamoDB equivalent, no object storage at scale
  • Limited regions — 3 production regions; no MENA, APAC, or LatAm

Go-to-market / competitive

  • Hyperscaler retaliation — aggressive discounting could erode cost advantage
  • Crowded alt-cloud market — DigitalOcean, Linode/Akamai, OVH, CoreWeave, Render all well-funded
  • All performance claims self-published

Regulatory / enterprise-readiness

  • Certification stack light for EU regulated-sector procurement
  • No DORA/NIS2/AI Act public posture
  • DPA and SOC 2 report are request-only

Opacity

  • Current headcount, revenue, ARR, churn not public
  • No post-seed valuation
  • ClickHouse deal economics not disclosed

Due-Diligence Verdict Summary

Strong fundamentals

  • Elite founder pedigree (Citus/Heroku/Azure) → technical credibility
  • Capital-efficient software-abstraction model → not burning GPU-cloud CapEx
  • Real strategic wedge in Postgres (9x claimed price/performance)
  • Landmark partnership (ClickHouse) validates the tech as embeddable infrastructure — B2B2B pivot signal
  • AGPL-3.0 is a defensible legal moat against hyperscaler repackaging

Caveats for buyers and investors

  • Sovereign-cloud pitch outruns the certification paperwork
  • Self-published benchmarks only
  • Data-residency not contractual by default
  • No public SLA
  • Turkish presence is operational, not commercial — no Turkey-market motion
  • Regional footprint insufficient for MENA, APAC, or French public-sector workloads

Fit-for-Purpose Matrix

Use case | Fit
CI/CD optimization (GitHub Actions runners) | Strong — 10x cost savings, low switching cost
Postgres-heavy SaaS workloads | Strong — flagship product, real performance claims
Stateless / ephemeral compute | Strong — 3x–10x cheaper than hyperscalers
Open-source LLM inference (commodity) | Strong — OpenAI-compatible API, 10x cheaper
European GDPR-sensitive workloads | Good — with limitations (no ISO 27001 etc.)
EuroGPT for GDPR-regulated EU teams | Strong niche — turnkey sovereign ChatGPT alternative
Build-your-own-cloud for national / sovereign deployments | Unique — AGPL + BYOC is rare in the market
Regulated financial services (DORA-critical) | Weak — no public DORA posture
Healthcare / PHI workloads | Weak — prohibited by default ToS
French public sector (SecNumCloud required) | No fit
Global edge / CDN / deeply integrated serverless | No fit — out of scope by design
MENA / APAC / LatAm residency | No fit for managed service; BYOC possible

Key Sources

Ubicloud primary

Press & founders

HN threads: 37154138 (Aug 2023), 39598826 (Mar 2024), 44167607 (2025), 46312792 (B200, Dec 2025).

Part II

AI / Inference

Inference endpoints, model catalog, vLLM internals, EuroGPT, B200
Briefing · April 2026

Ubicloud AI

Open-source inference endpoints, EuroGPT Enterprise,
and B200 virtualization

TL;DR — AI Strategy

  • Pivoted from raw GPU rentals to managed inference PaaS — GPU GitHub Actions runners deprecated Dec 31, 2025; GPU VMs repositioned as private/enterprise-only
  • Open-weight only — no Claude/GPT/Gemini re-hosting; every model on the platform is open-weight
  • Three product surfaces: inference endpoints (dev API), EuroGPT Enterprise (SaaS), private B200 VMs (enterprise/BYOC)
  • Production runtime: vLLM V1 with FlashAttention-3, FlashInfer, speculative decoding, prefix caching
  • Signature technical work: open-source virtualization of NVIDIA HGX B200 using QEMU 10.1+ + Fabric Manager Shared NVSwitch Multitenancy
  • AI footprint: Germany (Falkenstein, Helsinki, EuroGPT processing) + Türkiye Istanbul Private Location for B200

Product Surface

Two API surfaces:

Surface | Base URL | Purpose | Auth
Management | https://api.ubicloud.com | Manage API keys, endpoints, projects | Bearer JWT
Inference data plane | https://{model}.ai.ubicloud.com/v1 | OpenAI-compatible inference | Bearer API key

Per-model subdomain pattern — each model gets its own hostname (e.g. llama-3-3-70b-turbo.ai.ubicloud.com/v1). There is no unified inference host.

SDK support: any OpenAI-compatible SDK (Python openai, JS); first-party Ruby SDK + ubi CLI (beta).

Free tier: 500,000 tokens / month.

OpenAI Compatibility

Documented and working against the per-model base URL:

  • POST /v1/chat/completions — non-streaming
  • POST /v1/chat/completions with stream=True — SSE streaming
  • POST /v1/chat/completions with response_format={"type":"json_object"} — JSON mode
  • POST /v1/chat/completions with tools=[...], tool_choice="auto" — function/tool calling
  • /v1/embeddings — implied by Qwen3-Embedding-8B launch (endpoint path not explicitly documented)
Not documented or not offered: /v1/completions (legacy), /v1/models on data plane, audio, image, batch API, fine-tuning API.
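A minimal sketch of addressing the per-model data plane. The base-URL helper and request body below are illustrative: the model ID and the /chat/completions fields follow the surface described above, and the SDK call itself is shown only as a comment.

```python
def base_url(model_id: str) -> str:
    """Each model lives on its own subdomain; there is no shared inference host."""
    return f"https://{model_id}.ai.ubicloud.com/v1"

# JSON-mode request body for POST /v1/chat/completions (OpenAI-compatible schema)
json_mode_request = {
    "model": "llama-3-3-70b-turbo",
    "messages": [{"role": "user", "content": 'Return {"ok": true} as JSON.'}],
    "response_format": {"type": "json_object"},
    "stream": False,
}

# With any OpenAI-compatible SDK (not executed here):
#   client = openai.OpenAI(base_url=base_url("llama-3-3-70b-turbo"), api_key=KEY)
#   client.chat.completions.create(**json_mode_request)

print(base_url("llama-3-3-70b-turbo"))
# https://llama-3-3-70b-turbo.ai.ubicloud.com/v1
```

Switching models means switching hostnames, not just the `model` field — a consequence of the per-model-subdomain design.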

Model Catalog (Confirmed Public)

Model ID | Family | Role | First seen
llama-3-3-70b-turbo | Llama 3.3 70B | Chat | Feb 2025
mistral-small-3 | Mistral Small 3 (24B) | Chat | Feb 2025
ds-r1-qwen-32b | DeepSeek-R1-Distill-Qwen-32B | Reasoning | Feb–Mar 2025
DeepSeek V3 | DeepSeek V3 | Chat | Jun 2025
DeepSeek R1 | DeepSeek R1 | Reasoning | Jun 2025
Qwen2.5-VL-72B | Qwen 2.5 VL | Vision-language | Jul 2025
Qwen3 VL | Qwen 3 VL | Vision-language | Oct 2025
Qwen3-Embedding-8B | Qwen 3 Embedding | Text embeddings | Mar 2026
Llama Guard 3 | Meta | Moderation (EuroGPT) | Nov 2024
Llama 3.1 405B | Meta | Chat (EuroGPT) | Nov 2024

Open-weight only. No Llama 4 in public materials. Context windows and quantization not published per-model.

Public Pricing

Per-token pricing is dashboard-only for most chat models. Only two models are publicly priced on web:

Model | Price | Notes
Qwen2.5-VL-72B | $0.80 / M tokens (input + output) | Jul 2025
Qwen3-Embedding-8B | $0.05 / M input tokens | Mar 2026
Free tier | 500k tokens / month | Feb 2025

March 2026 addition: a new GET /project/{id}/inference-endpoint API returns the full price table programmatically, with separate per_million_prompt_tokens and per_million_completion_tokens fields.
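A sketch of turning one price-table row into a bill. The two field names come from the API description above; the rest of the row shape and the example prices are assumptions.

```python
def monthly_cost(price_row: dict, prompt_tokens: int, completion_tokens: int) -> float:
    """Compute an inference bill from one price-table row.
    Assumes prices are USD per million tokens, as the field names suggest."""
    return (prompt_tokens / 1e6) * price_row["per_million_prompt_tokens"] \
         + (completion_tokens / 1e6) * price_row["per_million_completion_tokens"]

# Hypothetical row, shaped like a GET /project/{id}/inference-endpoint entry
row = {"model": "example-model",
       "per_million_prompt_tokens": 0.40,
       "per_million_completion_tokens": 0.80}

# 10M prompt + 2M completion tokens in a month
print(round(monthly_cost(row, 10_000_000, 2_000_000), 2))  # 5.6
```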

Positioning claims (Ubicloud-authored): "3–10x lower than comparable offerings" for cloud overall; "3x lower than US alternatives" for EuroGPT. "10x cheaper than OpenAI" is NOT a Ubicloud claim — that phrasing came from third-party research.

Hardware Stack

GPU | Status | First public mention
NVIDIA A100 | Preview (Germany) | May 2025
NVIDIA H100 | Production (prior GPU VMs) |
NVIDIA HGX B200 | Production (Türkiye Istanbul, on request) | Oct 2025
NVIDIA RTX PRO 6000 | On request | Dec 2025

Not offered in public materials: H200, L40S, MI300X.

B200 partitioning via Shared NVSwitch Multitenancy

Partition size | When added
1-GPU, 2-GPU | Oct 2025 launch
4-GPU, 8-GPU | Nov 2025

Inside a partition: full NVLink/NVSwitch bandwidth. Across partitions: isolated. Fabric Manager enforces routing.

B200 Virtualization — Signature Tech Work

Ubicloud wrote the "missing manual" on open-source virtualization of NVIDIA HGX B200. Stack:

  • QEMU 10.1+ (not Cloud Hypervisor) — B200 needs multi-level PCIe topology that Cloud Hypervisor's flat topology can't produce; 10.1 added BAR-mapping optimizations critical for B200's 256 GB Region 2 BAR per GPU
  • VFIO-PCI passthrough — vfio-pci.ids=10de:2901, intel_iommu=on iommu=pt; blacklist nouveau/nvidia/nvidia_drm
  • nvidia-open driver on guest (proprietary stack can't drive B200)
  • NVIDIA Fabric Manager in FABRIC_MODE=1 (Shared NVSwitch Multitenancy) on host; fmpm CLI for partition management
  • Host/guest driver versions must match exactly (e.g., 580.95.05)
Competitive point: entire stack is open source; operators can replicate it. Reached HN front page Dec 15, 2025.
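Under those constraints, the host-side boot and module configuration might look like the fragment below. The parameter values come from the list above; the file paths and layout are illustrative, not Ubicloud's published configs.

```shell
# /etc/default/grub — enable the IOMMU and bind the B200s (PCI ID 10de:2901)
# to vfio-pci at boot, before any host GPU driver can claim them
GRUB_CMDLINE_LINUX="intel_iommu=on iommu=pt vfio-pci.ids=10de:2901"

# /etc/modprobe.d/blacklist-nvidia.conf — keep host GPU drivers off the devices
blacklist nouveau
blacklist nvidia
blacklist nvidia_drm
```

After regenerating the bootloader config and rebooting, Fabric Manager runs on the host in Shared NVSwitch mode while each guest loads the nvidia-open driver at the exactly matching version.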

vLLM V1 Internals

Production runtime is vLLM V1. Three main components:

  • AsyncLLM — async wrapper handling tokenization/detokenization; talks to EngineCore over IPC from a separate process, sidestepping the Python GIL
  • EngineCore — busy loop: pull from input queue, run scheduler + one forward pass per step
  • Scheduler — continuous batching via max_num_batched_tokens; all requests finish prefill before decode
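A toy illustration of the continuous-batching loop, not vLLM's actual implementation: each step admits new requests with their full prompt (prefill) and gives every previously admitted request one generated token (decode), all within a shared max_num_batched_tokens budget.

```python
from collections import deque

def schedule_step(waiting: deque, running: list, max_num_batched_tokens: int):
    """One engine step of continuous batching (toy version).

    The token budget is shared: a waiting request is admitted only if its
    whole prompt fits (prefill), then each earlier-admitted request is
    charged one decode token. A request prefilled this step starts
    decoding on the next step."""
    budget = max_num_batched_tokens
    batch = []
    decoders = list(running)  # only requests prefilled in earlier steps decode now
    while waiting and waiting[0]["prompt_len"] <= budget:
        req = waiting.popleft()
        budget -= req["prompt_len"]
        batch.append((req["id"], "prefill", req["prompt_len"]))
        running.append(req)
    for req in decoders:
        if budget <= 0:
            break
        budget -= 1
        batch.append((req["id"], "decode", 1))
    return batch

# Two steps with an 8-token budget: both prompts fit in step 1,
# then each request decodes one token in step 2.
waiting = deque([{"id": "a", "prompt_len": 5}, {"id": "b", "prompt_len": 3}])
running = []
print(schedule_step(waiting, running, 8))  # [('a', 'prefill', 5), ('b', 'prefill', 3)]
print(schedule_step(waiting, running, 8))  # [('a', 'decode', 1), ('b', 'decode', 1)]
```

The point of the toy: one knob (the per-step token budget) trades prefill throughput against decode latency, which is exactly what max_num_batched_tokens tunes.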

Optimization layer

  • FlashAttention-3 for forward passes
  • FlashInfer (integrated Feb 2025) as high-performance kernel generator
  • PagedAttention-lineage block-based KV cache, dynamically allocated
  • Speculative decoding on DeepSeek R1 32B (Mar 2025)
  • Prefix caching referenced in Dewey.py deep-research demo

Not covered publicly: multi-worker load balancing, health checks, auto-restart, model hot-swap.

EuroGPT Enterprise

The consumer/SaaS face of Ubicloud AI. Available at eurogpt.ubicloud.com.

  • €19 per user per month — framed as 3x cheaper than ChatGPT Enterprise / Copilot
  • LLM: Meta Llama 3.1 405B (open weights)
  • Moderation: Llama Guard 3 (optional, input + output)
  • Embeddings: Mistral E5 7B for RAG with private knowledge base
  • Web search: DuckDuckGo (privacy-preserving)
  • Data residency: "Data remains in Germany, including all GPU processing"
  • Training: "No customer data or metadata used for training purposes"
  • Security: encryption in transit + envelope encryption at rest, key rotation, file upload
  • SSO: OIDC at platform level (Jul 2025); EuroGPT-specific SSO not explicitly documented

Not disclosed: quantization of the 405B deployment. Not offered: private API for EuroGPT — raw API consumers use Inference Endpoints directly.

Strategic Pivot: GPU Rentals → Inference PaaS

Before (2024)

Offered raw GPU rentals (RTX 4000 Ada / H100) as GitHub Actions runners and GPU VMs.

Inflection (2025)

Recognized the CapEx-heavy raw-GPU race against CoreWeave, Lambda, AWS P5, Azure NDv5 as structurally unviable for a seed-stage company. Moved up-stack to managed inference PaaS + dedicated enterprise GPU (private locations).

After (Dec 31, 2025)

  • GPU GitHub Actions runners deprecated
  • GPU VMs repositioned as private/enterprise deployments (B200, RTX PRO 6000 on request)
  • Open-weight inference endpoints become the primary AI front door
  • EuroGPT Enterprise becomes the productized SaaS face
Implication: Ubicloud is no longer competing on GPU-hours — they're competing on tokens and on the quality of the managed inference stack.

Positioning

Competitor class | Examples | Ubicloud's angle
Closed-model LLM vendors | OpenAI, Anthropic | Open-weight only; lower price; EU residency; no training use
Fast-inference specialists | Groq, Together, Fireworks, DeepInfra | Same model class; adds full IaaS underneath + EuroGPT SaaS on top
GPU clouds | CoreWeave, Lambda, AWS P5 | Open-source B200 virtualization; control plane on GitHub; BYOC option
GPU-on-demand | RunPod, Vast.ai | Managed-first; GDPR-native; EuroGPT SaaS
European sovereign AI | Mistral (La Plateforme), Aleph Alpha | Broader IaaS (compute + K8s + Postgres) beyond just models

Differentiators actually claimable

  • End-to-end AGPL-3.0 stack (hypervisor → vLLM → UI)
  • Proven B200 virtualization (with public technical writeup)
  • Germany-resident EuroGPT turnkey product
  • Strong Postgres heritage → good RAG / vector story when paired with managed Postgres

Gaps & What's Missing

  • No public SLA, rate limits, latency, or throughput numbers for inference endpoints
  • No public per-token pricing for chat/reasoning models (only Qwen2.5-VL and Qwen3-Embedding priced on web) — dashboard-only
  • Not offered: batch inference API, fine-tuning / LoRA, image generation, audio (Whisper/TTS), multimodal beyond vision-language input
  • No public EU AI Act role classification (provider vs deployer) despite operating EuroGPT and open inference
  • No named AI customers in public materials; no case studies beyond Ubicloud's own Dewey.py deep-research demo
  • No benchmarks vs CoreWeave / Lambda / AWS P5 on B200 workloads; vs OpenAI / Groq / Together on inference throughput or latency
  • Istanbul B200 hosting provider not publicly named — framed as "Private Location" / on-request

Key AI Sources