Meyerbro AI Labs

Research & engineering for local-first AI infrastructure.

Meyerbro AI Labs is the research arm of Meyerbro Ltd. We focus on the systems problems that sit between a trained model and a production deployment running inside a customer's own network, with particular attention to the NVIDIA Blackwell architecture and high-density small-form-factor compute.

In development

Meyerbro Core

Local AI orchestration layer for secure enterprise environments.

Meyerbro Core is a proprietary orchestration layer being developed by Meyerbro AI Labs. It is designed to run entirely inside a customer's network boundary and coordinate on-premise inference workloads across dense GPU fleets — with first-class support for NVIDIA Blackwell and small-form-factor deployment footprints.

The focus is on organisations in regulated sectors — finance, healthcare, government, legal — who need modern AI capability without handing over their data to a third-party API. Meyerbro Core handles model placement, request routing, lifecycle management, and observability, so that teams can treat their internal GPU fleet like a private inference platform.

Inference

TensorRT-LLM and vLLM runtimes with quantisation-aware deployment and request batching tuned per model (see the sketch after this list).

Orchestration

High-density GPU scheduling with MIG partitioning, heterogeneous node awareness, and graceful degradation.

Footprint

Designed for small-form-factor Blackwell nodes so AI capacity can live beside the data — on-site, on-prem, or at the edge.

Boundary

Air-gap friendly. No telemetry egress, no dependency on external inference endpoints.
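To give the inference layer some concrete texture, here is a minimal sketch using vLLM's offline Python API, one of the two runtimes named above. The model path, quantisation scheme, and batching limit are illustrative placeholders, not Meyerbro Core defaults.

```python
# Minimal sketch: serving a locally hosted model with vLLM's offline API.
# The model path is a placeholder for weights that live inside the
# customer boundary; quantisation and batching settings are illustrative.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/models/local-slm",  # local weights only, no hub download
    quantization="awq",         # one example of quantisation-aware deployment
    max_num_seqs=64,            # cap on concurrently batched sequences
)

params = SamplingParams(temperature=0.2, max_tokens=256)
outputs = llm.generate(["Summarise the attached policy document."], params)
print(outputs[0].outputs[0].text)
```

In an orchestrated deployment these settings would be derived per model by the control plane rather than hard-coded.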

Meyerbro Core is in active development. Early design partners welcome — get in touch via LinkedIn to discuss a pilot.

# Design Principles

The constraints that shape everything Meyerbro AI Labs builds.

01

Local-First

Models, weights, and prompts never leave the customer boundary. Private by construction, not by policy.

02

Performance per Watt

Optimisation measured against energy cost and thermal envelope, not just tokens per second (a tokens-per-joule sketch follows this list).

03

Production-Grade

Observability, upgrade paths, and failure recovery built in from day one — not bolted on after a demo.
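Principle 02 reduces to a simple metric. A back-of-envelope sketch, with all numbers as illustrative placeholders rather than measured results:

```python
# Tokens per joule: since 1 W = 1 J/s, dividing throughput (tok/s) by
# average power draw (W) yields energy efficiency (tok/J).
def tokens_per_joule(tokens_per_second: float, avg_power_watts: float) -> float:
    return tokens_per_second / avg_power_watts

# Two hypothetical profiles of the same node:
uncapped = tokens_per_joule(1800, 600)  # 3.0 tok/J
capped = tokens_per_joule(1500, 350)    # ~4.29 tok/J

# The power-capped profile loses on raw throughput but wins on efficiency,
# which is exactly the trade-off this principle is meant to surface.
print(f"uncapped: {uncapped:.2f} tok/J, capped: {capped:.2f} tok/J")
```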

# Research Areas

Where the lab's engineering effort is focused.

TensorRT-LLM Optimisation

Quantisation, kernel tuning, and batching strategies for production inference on NVIDIA Blackwell and Hopper GPUs. Targeting sub-second latency on locally hosted SLMs.
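As an illustration of the arithmetic underneath weight quantisation, here is a symmetric per-tensor INT8 sketch in NumPy. It is deliberately the simplest possible scheme; production deployments typically use per-channel scales, FP8, or calibration-driven methods.

```python
# Symmetric per-tensor INT8: one scale maps float weights onto [-127, 127].
import numpy as np

def quantise_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    scale = float(np.abs(weights).max()) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantise(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4096, 4096).astype(np.float32)
q, s = quantise_int8(w)
# The round trip is lossy; the engineering work is keeping this error
# from compounding into visible quality loss at inference time.
print(f"mean abs error: {np.abs(dequantise(q, s) - w).mean():.6f}")
```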

On-Premise SLM Deployment

Hardened reference architectures for running small and mid-size language models entirely inside customer-controlled networks. No data egress, no cloud dependency.

High-Density GPU Orchestration

Scheduling and lifecycle management for dense GPU fleets, including Kubernetes device plugins, MIG partitioning, and heterogeneous workload placement.
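As a flavour of what heterogeneous placement involves, here is a hypothetical best-fit selector over MIG slices. The profile names follow the style NVIDIA's MIG partitioning exposes; the data structures and policy are illustrative, not Meyerbro Core's scheduler.

```python
from dataclasses import dataclass

@dataclass
class MigSlice:
    node: str
    profile: str    # e.g. "1g.10gb", in MIG's profile naming style
    memory_gb: int
    free: bool

def place(job_memory_gb: int, slices: list[MigSlice]) -> MigSlice | None:
    """Best fit: the smallest free slice that still satisfies the job."""
    fits = [s for s in slices if s.free and s.memory_gb >= job_memory_gb]
    return min(fits, key=lambda s: s.memory_gb, default=None)

fleet = [
    MigSlice("node-a", "1g.10gb", 10, free=True),
    MigSlice("node-a", "3g.40gb", 40, free=True),
    MigSlice("node-b", "7g.80gb", 80, free=False),
]
print(place(job_memory_gb=24, slices=fleet))  # selects the free 3g.40gb slice
```

A real scheduler layers node health, locality, and graceful degradation on top of this; the best-fit core keeps large slices free for the jobs that actually need them.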

Small Form Factor Compute

Engineering for SFF nodes: thermally aware placement, power budgeting, and edge deployment patterns that bring AI capacity closer to where data is generated.
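To sketch the power-budgeting half of this, an illustrative admission check for an SFF node: a workload is admitted only while projected draw stays inside the chassis envelope, with headroom reserved for thermal transients. All limits are placeholder numbers, not product behaviour.

```python
CHASSIS_BUDGET_W = 850    # hypothetical SFF chassis power envelope
THERMAL_HEADROOM = 0.15   # keep 15% in reserve for transient spikes

def admits(current_draw_w: float, workload_draw_w: float) -> bool:
    usable = CHASSIS_BUDGET_W * (1.0 - THERMAL_HEADROOM)  # 722.5 W
    return current_draw_w + workload_draw_w <= usable

print(admits(500, 200))  # True: 700 W fits under the 722.5 W budget
print(admits(600, 200))  # False: 800 W would exceed it
```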

$ interested in a design partnership?

We are talking to organisations in regulated sectors who need on-premise AI without compromising data boundaries.

Get in Touch