Numia Data
Data Architecture · Data Engineering · Data Warehouse · Real-Time · Bootstrapped · AI Agents
I cofounded Numia in late 2022. The problem was simple: blockchain data was a mess. Protocols wanted analytics but couldn't get reliable numbers. Developers needed APIs but existing options were slow, expensive, or both. We decided to build the data layer that crypto actually needed.
Today we're bootstrapped, profitable, and serving 10M+ API requests per day across 30+ chains. Our clients include TradingView, CoinGecko, CoinMarketCap, DexScreener, and DefiLlama. On the protocol side: dYdX, Osmosis, Celestia, Stride. Small team, no outside funding.
What I Do Here
When you're a cofounder at a startup, your job title is whatever needs doing that week. I've written chain indexers, designed APIs, built ML pipelines, shipped frontend dashboards, handled customer calls, and debugged production at 2AM. That's the job.
The data side takes most of my time. Building indexers that process billions of events without breaking. Setting up DBT transformations that turn raw chain data into something useful. Tuning ClickHouse for the queries our customers actually run. Making sure the API layer can handle 10M+ daily requests at p99 under 100ms.
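At the indexer layer the shape is always the same: pull blocks from a node, flatten the events, and batch-insert them into the warehouse. Here's a minimal sketch of that loop, with a hypothetical node endpoint and a stubbed writer; this is illustrative, not our production code:

```python
import time
import requests

NODE_URL = "https://rpc.example-chain.org"  # hypothetical RPC endpoint
BATCH: list[dict] = []

def fetch_block(height: int) -> dict:
    """Pull one block; production indexers retry and cross-check multiple nodes."""
    resp = requests.get(f"{NODE_URL}/block", params={"height": height}, timeout=10)
    resp.raise_for_status()
    return resp.json()

def extract_events(block: dict) -> list[dict]:
    """Flatten per-transaction events into warehouse rows."""
    rows = []
    for tx in block.get("txs", []):
        for ev in tx.get("events", []):
            rows.append({
                "height": block["height"],
                "tx_hash": tx["hash"],
                "type": ev["type"],
                "attributes": ev.get("attributes", {}),
            })
    return rows

def flush(rows: list[dict]) -> None:
    """Stub: in production this is a bulk insert into ClickHouse or BigQuery."""
    rows.clear()

def run(start_height: int) -> None:
    height = start_height
    while True:
        BATCH.extend(extract_events(fetch_block(height)))
        if len(BATCH) >= 10_000:  # large batches keep insert overhead low
            flush(BATCH)
        height += 1
        time.sleep(0.1)  # pacing; real code tracks the chain head instead
```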
On the product side, I led API development and built the analytics dashboards that protocol teams use for decision-making. An onchain CRM that treats wallets as users instead of email addresses. The AI layer that became NumiaAI. A SQL interface for analysts who want direct access. Each one started as a customer problem and became a product.
The Products
We ended up building a suite of tools because blockchain data has different problems at different layers.
- Web3 API. Real-time and historical on-chain data. 10M+ requests per day, powering apps like TradingView, CoinGecko, and DexScreener. The main product I've led.
- NumiaSQL. Data ingestion and distribution layer. Hundreds of TB of indexed chain data on BigQuery, feeding integrators like Dune, Artemis, Token Terminal, and Nansen.
- Celestia Data. Custom analytics platform built for the Celestia Foundation. DA layer metrics, rollup tracking, and token economics across 50+ networks.
- Token Pulse. Real-time token tracking with whale movements, exchange flows, and wallet segmentation. Sub-second latency for the signals that matter before price moves.
- Datalenses. Analytics dashboards for Cosmos Hub, Osmosis, dYdX, and Celestia. Protocol teams and investors use these to understand what's happening on their chains.
- NumiaAI. Ask questions about blockchain data in plain English. Launched with dYdX. Only works because the underlying data is clean.
- NumiaEngage. On-chain CRM and growth platform. Segment and target wallets based on behavior, distributed through Keplr and Leap wallets.
- DEX Anomaly Detection. ML-based detection of unusual trading patterns using autoencoders. Helps protocols spot wash trading and manipulation (sketched below).
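The anomaly detection follows the standard reconstruction-error setup: train an autoencoder on normal trades, then flag the trades it reconstructs poorly. A minimal PyTorch sketch, where the feature set and layer sizes are illustrative rather than the production model:

```python
import torch
from torch import nn

class TradeAutoencoder(nn.Module):
    """Compress per-trade feature vectors and reconstruct them."""
    def __init__(self, n_features: int = 8, n_latent: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 16), nn.ReLU(), nn.Linear(16, n_latent))
        self.decoder = nn.Sequential(nn.Linear(n_latent, 16), nn.ReLU(), nn.Linear(16, n_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

def train(model: TradeAutoencoder, normal_trades: torch.Tensor, epochs: int = 50) -> None:
    """Train on trades assumed to be normal, so manipulation reconstructs badly."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(normal_trades), normal_trades).backward()
        opt.step()

def anomaly_scores(model: TradeAutoencoder, trades: torch.Tensor) -> torch.Tensor:
    """Per-trade reconstruction error; high scores get flagged for review."""
    with torch.no_grad():
        recon = model(trades)
        return ((trades - recon) ** 2).mean(dim=1)
```

The threshold is the judgment call: flag only the top fraction of a percent of scores on historical data and hand those to an analyst, rather than acting on them automatically.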
Technical Decisions
We built a chain-agnostic data model early on. One unified schema across 30+ chains. Every chain has its quirks: different event formats, different block structures, different ideas about what an "event" even is. Getting that abstraction right was painful upfront, but now we can add a new chain in days instead of weeks.
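Concretely, the unified model is one row shape for every chain plus a thin adapter per chain family. A simplified sketch, with field names that are illustrative rather than the exact warehouse schema:

```python
from dataclasses import dataclass, field

@dataclass
class ChainEvent:
    """One normalized event row, identical across every chain we index."""
    chain_id: str            # e.g. "osmosis-1", "dydx-mainnet-1"
    block_height: int
    block_time: str          # ISO-8601 timestamp
    tx_hash: str
    event_type: str          # normalized type, e.g. "transfer", "swap"
    attributes: dict = field(default_factory=dict)  # chain-specific leftovers

def normalize_cosmos_event(chain_id: str, block: dict, tx: dict, raw: dict) -> ChainEvent:
    """Adapter for Cosmos-SDK style events; each chain family gets one of these."""
    return ChainEvent(
        chain_id=chain_id,
        block_height=int(block["header"]["height"]),
        block_time=block["header"]["time"],
        tx_hash=tx["hash"],
        event_type=raw["type"],
        attributes={a["key"]: a["value"] for a in raw.get("attributes", [])},
    )
```

Adding a chain means writing one adapter; nothing downstream has to change.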
For most of our pipelines, we chose batch over real-time. When we dug into what clients actually needed, the vast majority of use cases worked perfectly fine with 15-minute to 1-hour latency. Reserving real-time infrastructure for the endpoints that genuinely need sub-second responses, like DEX APIs or real-time alerting, saved us months of engineering and keeps our infrastructure costs predictable.
On the storage side, we adapt to whatever fits the use case best. ClickHouse for high-throughput analytical queries, BigQuery for massive batch transformations and SQL access, Postgres for serving live entity state, and whatever else a client's stack requires. Forcing one database to do everything never works. Matching the tool to the layer saves headaches and money.
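The routing itself is boring on purpose: each workload type maps to a store, and sub-second endpoints go to the live one. A toy version (the store names are real, the table and thresholds are illustrative):

```python
# Illustrative workload-to-store routing; the real system is config-driven per endpoint.
STORE_FOR_WORKLOAD = {
    "analytical_query": "clickhouse",  # high-throughput aggregations over event history
    "batch_transform": "bigquery",     # DBT models over hundreds of TB on a schedule
    "live_entity": "postgres",         # current wallet/position state behind the API
}

def pick_store(workload: str, max_staleness_seconds: int) -> str:
    """Sub-second freshness goes to the live store; everything else is batch-backed."""
    if max_staleness_seconds < 60:
        return "postgres"
    return STORE_FOR_WORKLOAD.get(workload, "clickhouse")
```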
We use AI across the whole workflow: writing code, exploring data, analyzing results. On the product side, we turned that into NumiaAI, where protocols and institutions can query blockchain data in plain English. It works because the underlying data layer is clean. Without that, you're just pointing a language model at noise.
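Structurally it's a text-to-SQL loop over a curated schema. The sketch below uses hypothetical helpers (`complete`, `run_query`, `summarize`) standing in for the model and warehouse clients, and made-up table names:

```python
SCHEMA_DOC = """
trades(chain_id, block_time, market, maker, taker, size_usd)
transfers(chain_id, block_time, sender, receiver, amount_usd)
"""  # illustrative tables; the real doc is generated from the warehouse schema

def complete(prompt: str) -> str:
    """Hypothetical LLM call; swap in whichever model client you use."""
    raise NotImplementedError

def run_query(sql: str) -> list[dict]:
    """Hypothetical warehouse client returning result rows."""
    raise NotImplementedError

def summarize(question: str, rows: list[dict]) -> str:
    """Hypothetical second model pass that phrases the answer."""
    raise NotImplementedError

def answer(question: str) -> str:
    prompt = (
        "Write read-only SQL against this schema only.\n"
        f"{SCHEMA_DOC}\n"
        f"Question: {question}\nSQL:"
    )
    sql = complete(prompt)
    if not sql.strip().lower().startswith("select"):  # reject anything that isn't a SELECT
        raise ValueError("generated query must be read-only")
    return summarize(question, run_query(sql))
```

The clean data layer is what keeps the schema doc small and unambiguous; without it, the model is guessing.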
What I Learned
Blockchain data is harder than it looks. Every chain behaves differently. Documentation lies. Edge cases are most of the job. Chains fork unexpectedly, and APIs return different data depending on which node you hit, so events disappear and reappear. You build for that or you break.
Bootstrapping forces clarity. When you can't throw money at problems, you learn which problems actually matter. Every feature decision had to pass "does this pay for itself?" We said no to a lot of things that would have been cool but didn't make business sense. That discipline is why we're profitable.
Small teams can build infrastructure. People assume you need armies of engineers for data at this scale. We built a profitable company serving enterprise clients with a small team because we automated aggressively and optimized from day one (kudos to the team for owning every piece of software!). Fewer people means less coordination overhead, which turns out to matter more than headcount.