NumiaSQL
SQL Analytics · Data Warehouse · BigQuery · DBT
Blockchain data is only useful if people can actually access it. Most chains generate tons of on-chain activity, but that data sits behind node RPC endpoints that were never designed for analytics. If you're Dune or Token Terminal trying to support a new chain, you need someone who's already done the ingestion work.
That's NumiaSQL. We ingest raw on-chain data from 30+ chains, transform it into clean tables, and make it available through BigQuery. Hundreds of terabytes of indexed blockchain data, continuously updated.
The product
We run indexers that pull raw data from each chain, handle the chaos (reorgs, schema changes, failed transactions, chain upgrades that break everything), and land it in BigQuery, where DBT models normalize everything into consistent tables. Each chain structures its data differently. A token transfer on Cosmos looks nothing like one on an Ethereum L2. Our models handle that translation, turning chain-specific event formats into a unified schema that works the same way regardless of which chain you're querying. Transactions, token movements, staking positions, governance activity, DeFi interactions.
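A minimal sketch of what that translation can look like in a DBT model, assuming hypothetical source and column names (these are not Numia's actual models): each chain gets a staging CTE that renames its native fields into the shared schema, and the model unions them into one table.

```sql
-- Illustrative dbt model: unify chain-specific transfer events into one schema.
-- Source names, column names, and decimals handling are placeholders.

with cosmos_transfers as (
    select
        'cosmos'                as chain,
        block_time,
        tx_hash,
        sender                  as from_address,
        recipient               as to_address,
        denom                   as token,
        cast(amount as numeric) as amount
    from {{ source('cosmos_raw', 'transfer_events') }}
),

evm_transfers as (
    select
        'ethereum_l2'           as chain,
        block_timestamp         as block_time,
        transaction_hash        as tx_hash,
        from_address,
        to_address,
        token_address           as token,
        cast(value as numeric)  as amount
    from {{ source('evm_raw', 'erc20_transfers') }}
)

select * from cosmos_transfers
union all
select * from evm_transfers
```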
Who pays depends on the chain. Sometimes a foundation like Celestia or dYdX funds the indexing for their ecosystem, and then anyone can query it. Sometimes an integrator like Dune, Artemis, Token Terminal, or Nansen pays for access to chains they need. Either way, the data ends up queryable, with governance analytics, treasury tracking, ecosystem growth metrics, and whatever custom models the use case requires. Direct SQL access, no APIs, no rate limits.
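"Direct SQL access" means integrators query the warehouse itself. As a rough example of an ecosystem-growth query (the project, dataset, and column names are placeholders, not the real NumiaSQL paths):

```sql
-- Daily active addresses per chain over the last 30 days.
-- `numia-data.unified.transactions` is a placeholder table name.
SELECT
  chain,
  DATE(block_time)               AS day,
  COUNT(DISTINCT signer_address) AS active_addresses,
  COUNT(*)                       AS transactions
FROM `numia-data.unified.transactions`
WHERE block_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY chain, day
ORDER BY chain, day;
```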
The architecture
BigQuery as the warehouse. Ecosystem integration was the deciding factor: analysts already work in Google Sheets and Looker Studio, and the Python drivers work out of the box. We're plugging into workflows that already exist.
DBT handles all transformation. The models parse raw transactions into tables organized by event type: all swaps in one table, all transfers in another, staking events in another, and so on. Same structure regardless of which chain the data came from. We also build ad-hoc models when a specific use case needs something different. We partition by time and cluster by the fields people actually filter on, because BigQuery charges by the amount of data scanned. The difference between a $50 query and a $0.50 query is table design, and we handle that upfront.
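A sketch of what that table design looks like in a DBT model config, using the standard dbt-bigquery partition_by and cluster_by options; the model and column names are hypothetical.

```sql
-- Hypothetical incremental model for a unified transfers table.
{{
    config(
        materialized='incremental',
        partition_by={'field': 'block_time', 'data_type': 'timestamp', 'granularity': 'day'},
        cluster_by=['chain', 'token', 'from_address']
    )
}}

select
    chain,
    block_time,
    tx_hash,
    from_address,
    to_address,
    token,
    amount
from {{ ref('stg_token_transfers') }}

{% if is_incremental() %}
  -- On incremental runs, only process blocks newer than what's already loaded.
  where block_time > (select max(block_time) from {{ this }})
{% endif %}
```

When a query filters on block_time and chain, BigQuery prunes the partitions the time filter excludes and reads only the clustered blocks that match, so the bytes scanned (and the bill) shrink with the filter.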
What I learned
We spent a lot of early energy on ingestion: getting the pipelines right, handling edge cases, making indexers reliable. Necessary work, but the actual value turned out to be in distribution. Public datasets that major platforms can depend on, private datasets that chain foundations trust for their own reporting. The ingestion is hard, but the distribution is the product.
Standardization across chains took more time than anything else. Every chain thinks it's special, and architecturally each one is. But the people querying the data just want "show me transfers over $10k" to work everywhere. Naming conventions, consistent schemas, handling the cases where chains don't map cleanly to the standard model. That's the moat, not the infrastructure.
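Against the unified schema, that request is one query regardless of chain; the table and column names below are placeholders.

```sql
-- Transfers over $10k across every indexed chain, last 7 days.
-- `numia-data.unified.token_transfers` and `amount_usd` are placeholder names.
SELECT chain, block_time, tx_hash, from_address, to_address, amount_usd
FROM `numia-data.unified.token_transfers`
WHERE amount_usd > 10000
  AND block_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
ORDER BY amount_usd DESC;
```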
We could have built our own analytics platform and tried to get everyone to come to us. Instead, we became the data layer that feeds the platforms people already use. Dune gets our data, Artemis gets our data, Token Terminal gets our data. We're infrastructure, not a product competing with our own customers. That positioning is why the business works.