
A unified approach to extracting blockchain data at scale—handling Solana-level throughput with Python.
Here's a paradox: blockchain data is the most transparent in the world, yet among the hardest to actually use at scale.
Every transaction, every block, every state change is recorded on an immutable ledger. Yet accessing this data reliably at enterprise scale remains a significant infrastructure challenge.
Organizations building analytics platforms, compliance systems, or blockchain intelligence tools face a common problem: how do you extract, transform, and load blockchain data across multiple networks without drowning in operational complexity?
This article explores how we built a unified data harvesting platform capable of handling everything from Ethereum to Solana-scale throughput—and why we chose Python to do it.
The blockchain data extraction landscape has been shaped significantly by open-source projects like blockchain-etl. These pioneering tools demonstrated that blockchain data could be systematically extracted and made available for analysis at scale.
However, as organizations expand to support more blockchain networks, a pattern emerges: each chain requires its own tooling. Separate repositories for ethereum-etl, bitcoin-etl, solana-etl, and others. Each with its own patterns, configurations, and operational requirements.
The operational overhead compounds quickly: each new chain multiplies the repositories, configurations, and operational requirements a team must maintain.

Our approach: consolidate proven concepts into a unified architecture. One codebase. Shared abstractions. Consistent patterns across all supported chains.
Blockchain data presents a unique combination of big data challenges—the "3 Vs" of blockchain:
Volume: Modern blockchains generate massive amounts of data. A single day of Ethereum activity produces gigabytes of transaction data, logs, and traces. Historical backfills for mature chains require processing terabytes.
Velocity: High-throughput chains operate at unprecedented speeds. Solana produces blocks every 400 milliseconds with thousands of transactions per second. Data pipelines must keep pace or fall irretrievably behind.
Variety: Every blockchain architecture is different. EVM chains share common patterns, but variations exist in trace formats, receipt structures, and RPC behaviors. Non-EVM chains like Solana, Stellar, or Celestia have entirely different data models.
A solution that works for one chain but fails on another isn't a solution; it's a workaround.
Solana represents the ultimate stress test for blockchain data infrastructure. With thousands of transactions per second and 400ms block times, it exposes every weakness in a naive approach.
Meeting this challenge required fundamental architectural decisions.
The principle: if the system handles Solana, it handles any blockchain.
High-throughput data challenges often lead teams down a familiar path of specialized frameworks, exotic dependencies, or a switch to another language. We took a different approach: understand the problem deeply, then solve it with fundamentals.
The bottleneck in blockchain data extraction isn't CPU-bound computation—it's I/O. Network calls to RPC nodes, disk writes, downstream system latency. Once we understood this, the path forward became clear.
The technical approach: no exotic frameworks, minimal dependencies, standard library primitives used correctly.
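To make that concrete, here is a minimal sketch of the idea, assuming a generic EVM-style JSON-RPC endpoint; the URL, method, and worker count are illustrative placeholders, not our production code. Because each call spends most of its time waiting on the network, a standard library thread pool is enough to keep dozens of requests in flight:

```python
# Minimal I/O-bound extraction sketch using only the standard library.
# The endpoint is a placeholder; error handling is deliberately omitted.
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

RPC_URL = "https://example-node.invalid"  # placeholder RPC endpoint

def fetch_block(number: int) -> dict:
    """Fetch one block over JSON-RPC (eth_getBlockByNumber)."""
    payload = json.dumps({
        "jsonrpc": "2.0",
        "id": number,
        "method": "eth_getBlockByNumber",
        "params": [hex(number), True],  # True = include full transactions
    }).encode()
    req = urllib.request.Request(
        RPC_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["result"]

def fetch_range(start: int, end: int, workers: int = 32) -> list[dict]:
    """Overlap network latency by keeping many requests in flight at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_block, range(start, end)))
```

The same shape works with asyncio; the concurrency primitive is deliberately boring, and the throughput comes from overlapping I/O rather than from clever computation.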
The result: Solana-scale throughput with a codebase that remains readable, maintainable, and accessible to any developer. Good architecture beats complexity every time.
The platform follows a straightforward three-layer pattern:

Extractors manage RPC communication with blockchain nodes—batching requests, handling rate limits, implementing retry logic.
Processors transform raw blockchain data into structured formats, normalizing diverse RPC responses into consistent schemas.
Exporters deliver processed data to various destinations: file systems for batch workloads, Kafka for streaming pipelines, cloud storage for data lake integration.
Each layer maintains clean interfaces. Adding blockchain support means implementing an Extractor and Processor. Adding output destinations means implementing an Exporter. The core pipeline remains unchanged.
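Sketched in Python, those seams might look like the following; the names and signatures are illustrative, not the platform's actual API:

```python
# Hypothetical layer interfaces. Each seam is a small structural type,
# so adding a chain or a destination never touches the core pipeline.
from typing import Any, Iterable, Protocol

class Extractor(Protocol):
    def extract(self, start_block: int, end_block: int) -> Iterable[dict[str, Any]]:
        """Pull raw blocks from an RPC node, batching and retrying as needed."""
        ...

class Processor(Protocol):
    def process(self, raw: dict[str, Any]) -> dict[str, Any]:
        """Normalize a raw RPC response into the shared schema."""
        ...

class Exporter(Protocol):
    def export(self, records: Iterable[dict[str, Any]]) -> None:
        """Deliver processed records to files, Kafka, or cloud storage."""
        ...

def run_pipeline(extractor: Extractor, processor: Processor,
                 exporter: Exporter, start: int, end: int) -> None:
    """The core loop is chain-agnostic: it sees only the three interfaces."""
    exporter.export(processor.process(raw)
                    for raw in extractor.extract(start, end))
```

Structural typing via Protocol keeps implementations decoupled from the core loop; no shared inheritance hierarchy is required.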
The platform currently supports a growing set of chains, from EVM networks to Solana.
The unified architecture ensures operational knowledge transfers across chains. Understanding how the platform handles one chain provides 80% of the knowledge needed for any other chain.
Distributed systems fail. RPC nodes time out. Networks experience issues. The question isn't whether failures occur; it's how the system responds.
Automatic retry with exponential backoff: Transient failures resolve without manual intervention.
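A minimal version of that policy, with illustrative limits and random jitter so many workers don't retry in lockstep:

```python
# Retry-with-exponential-backoff sketch; parameter values are illustrative.
import random
import time

def with_retries(call, max_attempts=5, base_delay=0.5, max_delay=30.0):
    """Run call(), retrying transient failures with jittered backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except (TimeoutError, ConnectionError):
            if attempt == max_attempts:
                raise  # attempts exhausted: surface the failure to the caller
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            time.sleep(delay * random.uniform(0.5, 1.5))  # jitter spreads retries
```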
Progress tracking and resume: Long-running extractions can be interrupted and resumed without data loss or duplication.
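One simple way to implement this is a file-based checkpoint; the path here is illustrative. The harvester records the last completed block and reads it back on startup:

```python
# File-based checkpoint sketch; the path is illustrative.
import os

CHECKPOINT = "harvest.checkpoint"

def last_completed_block(default: int) -> int:
    """Return the resume point, falling back to the configured start block."""
    try:
        with open(CHECKPOINT) as f:
            return int(f.read().strip())
    except FileNotFoundError:
        return default

def record_progress(block_number: int) -> None:
    """Write the checkpoint atomically so a crash never leaves it corrupt."""
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(block_number))
    os.replace(tmp, CHECKPOINT)  # atomic rename on POSIX and Windows
```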
Atomic batch semantics: Batches succeed completely or retry entirely. No partial state to debug.
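For file destinations, one way to get those semantics is staging plus a single atomic rename, as in this sketch (paths illustrative): a half-written batch never becomes visible, so a retry simply rewrites the staging file.

```python
# All-or-nothing batch export sketch: stage, sync, then publish atomically.
import json
import os

def export_batch(records: list[dict], path: str) -> None:
    tmp = path + ".staging"
    with open(tmp, "w") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")  # newline-delimited JSON
        f.flush()
        os.fsync(f.fileno())  # ensure the batch is durable before publishing
    os.replace(tmp, path)  # the batch appears atomically, or not at all
```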
Graceful degradation: Under RPC pressure, the system throttles rather than crashes, catching up when conditions improve.
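A sketch of one such policy, with illustrative thresholds: widen the delay between requests when the node signals pressure, and shrink it again as calls succeed.

```python
# Adaptive throttle sketch; step sizes and bounds are illustrative.
import time

class AdaptiveThrottle:
    def __init__(self, min_delay=0.0, max_delay=5.0, step=0.1):
        self.min_delay, self.max_delay, self.step = min_delay, max_delay, step
        self.delay = min_delay

    def before_request(self) -> None:
        """Pause before each RPC call according to the current delay."""
        if self.delay:
            time.sleep(self.delay)

    def on_pressure(self) -> None:
        """Called on rate-limit or timeout responses: back off, don't crash."""
        self.delay = min(self.max_delay, self.delay + self.step)

    def on_success(self) -> None:
        """Recover gradually once the node keeps up again."""
        self.delay = max(self.min_delay, self.delay - self.step / 10)
```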
The blockchain ecosystem is evolving toward a multi-chain world where value and activity flow across dozens of networks. Data infrastructure must handle this diversity without proportional complexity.
Our platform delivers enterprise-grade blockchain data extraction without the complexity typically associated with high-throughput systems. A unified codebase supports diverse blockchain architectures with consistent operational patterns—enabling rapid expansion as the ecosystem evolves.
The multi-chain future requires multi-chain infrastructure. We built it. Interested in learning more about our blockchain data infrastructure? Get in touch—we'd love to discuss how we're solving these challenges.
This article describes high-level architectural approaches to blockchain data extraction. Specific implementation details remain proprietary.