Subsquid achieves fast blockchain data indexing by combining decentralized storage, batched data retrieval, and a purpose-built SDK. Its architecture targets a core Web3 data problem: extracting large volumes of historical on-chain data quickly.
Decentralized Data Lake
Subsquid operates a peer-to-peer network for storing blockchain data. The Subsquid Network is designed to scale horizontally: new nodes can join the network and take on a share of the data, adding storage and query capacity. Data is compressed and distributed across the network's nodes, each of which is responsible for specific block ranges.
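To make the range-based layout concrete, here is a minimal sketch, assuming fixed-size block ranges, zlib-compressed JSON payloads, and a round-robin placement policy. All three are illustrative assumptions rather than Subsquid's actual storage format or assignment rules, and the names `Chunk`, `build_chunks`, and `find_chunk` are hypothetical. It shows why range ownership makes lookups cheap: finding the node for any block is a binary search over sorted range starts.

```python
import bisect
import json
import zlib
from dataclasses import dataclass


@dataclass
class Chunk:
    first_block: int
    last_block: int
    worker_id: str
    payload: bytes  # zlib-compressed JSON of the blocks in this range


def build_chunks(blocks, chunk_size, workers):
    """Split consecutive blocks into fixed-size ranges, compress each range,
    and assign ranges to workers round-robin (a hypothetical placement policy)."""
    chunks = []
    for i in range(0, len(blocks), chunk_size):
        part = blocks[i:i + chunk_size]
        payload = zlib.compress(json.dumps(part).encode("utf-8"))
        worker = workers[(i // chunk_size) % len(workers)]
        chunks.append(Chunk(part[0]["number"], part[-1]["number"], worker, payload))
    return chunks


def find_chunk(chunks, block_number):
    """Binary-search the sorted ranges for the one containing block_number."""
    starts = [c.first_block for c in chunks]
    idx = bisect.bisect_right(starts, block_number) - 1
    if idx < 0 or block_number > chunks[idx].last_block:
        raise KeyError(f"block {block_number} is not stored")
    return chunks[idx]


blocks = [{"number": n, "hash": f"0x{n:064x}"} for n in range(1000)]
chunks = build_chunks(blocks, 250, ["worker-a", "worker-b", "worker-c"])
chunk = find_chunk(chunks, 612)
print(chunk.worker_id, chunk.first_block, chunk.last_block)  # worker-c 500 749
```

Because every node owns whole ranges, adding a node only means handing it new ranges; no single node has to hold the full chain.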
Data Retrieval Process
The system follows four steps to access data:
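- Archive Height Determination - Querying the highest block currently available in the archive
- Worker ID Acquisition - Identifying the worker responsible for the start block
- Worker Querying - Requesting the required data from that worker, which returns a batch of consecutive blocks
- Iterative Retrieval - Repeating the lookup and query from the block after the last one returned until all desired blocks are obtained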
This approach differs fundamentally from traditional RPC-based ingestion: instead of fetching blocks one at a time, each worker query returns many blocks in a single response.
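The four steps can be sketched as a simple loop. This is a minimal, in-memory sketch, assuming that each worker serves one fixed-size range and answers a query with every requested block it holds. `MockArchive`, `height`, `worker_for`, `query`, and `fetch_range` are hypothetical names, not Subsquid's real client or HTTP API. The request counter shows how the number of round trips depends on the number of ranges rather than the number of blocks.

```python
from dataclasses import dataclass


@dataclass
class MockArchive:
    """In-memory stand-in for the network. Method names are hypothetical and
    do not correspond to a real Subsquid client."""
    height_value: int  # highest block the archive holds
    range_size: int    # number of consecutive blocks each worker serves
    requests: int = 0  # round trips made so far

    def height(self):
        self.requests += 1
        return self.height_value

    def worker_for(self, block):
        self.requests += 1
        return f"worker-{block // self.range_size}"

    def query(self, worker, from_block, to_block):
        self.requests += 1
        if worker != f"worker-{from_block // self.range_size}":
            raise ValueError("this worker does not hold the requested start block")
        # The worker returns every requested block it holds, up to the end of its range.
        range_end = (from_block // self.range_size + 1) * self.range_size - 1
        last = min(to_block, range_end, self.height_value)
        return [{"number": n} for n in range(from_block, last + 1)]


def fetch_range(archive, from_block, to_block):
    to_block = min(to_block, archive.height())               # 1. archive height
    blocks = []
    next_block = from_block
    while next_block <= to_block:
        worker = archive.worker_for(next_block)              # 2. worker lookup
        batch = archive.query(worker, next_block, to_block)  # 3. query the worker
        blocks.extend(batch)
        next_block = batch[-1]["number"] + 1                 # 4. continue after the last block
    return blocks


archive = MockArchive(height_value=99_999, range_size=10_000)
blocks = fetch_range(archive, 0, 150_000)
print(len(blocks), archive.requests)  # 100000 21
```

In this toy setup, fetching 100,000 blocks takes 21 round trips (one height check plus a lookup and a query for each of the ten ranges), whereas a loop that requested one block per RPC call would need 100,000.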
Granular API
Users can request specific data fields from blocks rather than downloading entire blocks. For example, requesting only the transaction hashes and gas usage for transactions involving particular addresses reduces both the volume of data transferred and the processing needed downstream.
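In Subsquid the field selection is part of the query sent to the network. The sketch below only imitates its effect locally on in-memory data, so the function name `select_fields` and the block layout are hypothetical, and matching on the recipient (`to`) address is an assumption made for illustration. It keeps matching transactions and projects them down to the requested fields.

```python
def select_fields(blocks, addresses, fields):
    """Return only the requested fields of transactions sent to `addresses`.

    Matching on the recipient (`to`) is an assumption; a real query could
    also filter on the sender or on logs."""
    wanted = {a.lower() for a in addresses}
    selected = []
    for block in blocks:
        for tx in block["transactions"]:
            to = tx.get("to")
            if to is not None and to.lower() in wanted:  # contract creations have no recipient
                selected.append({"block": block["number"], **{f: tx[f] for f in fields}})
    return selected


blocks = [
    {"number": 1, "transactions": [
        {"hash": "0xa1", "from": "0x111", "to": "0xAAA", "gasUsed": 21000, "input": "0x"},
        {"hash": "0xa2", "from": "0x222", "to": "0xBBB", "gasUsed": 46000, "input": "0xa9059cbb"},
    ]},
    {"number": 2, "transactions": [
        {"hash": "0xb1", "from": "0x333", "to": None, "gasUsed": 120000, "input": "0x6080"},
        {"hash": "0xb2", "from": "0x111", "to": "0xaaa", "gasUsed": 52000, "input": "0x"},
    ]},
]

print(select_fields(blocks, ["0xaaa"], ["hash", "gasUsed"]))
# [{'block': 1, 'hash': '0xa1', 'gasUsed': 21000}, {'block': 2, 'hash': '0xb2', 'gasUsed': 52000}]
```

Dropping unused fields such as `input` matters most for contract-heavy chains, where calldata often dominates transaction size.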
Squid SDK Features
The SDK provides additional capabilities:
- Real-time indexing of new blocks
- Automatic blockchain reorganization handling
- Built-in GraphQL API integration
- Multiple storage format options (Parquet, CSV)
- RPC endpoint integration for accessing the most recent blocks not yet available in the network
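The Squid SDK handles reorganizations automatically, and its internal mechanism is not described here. As an illustration of the general problem, the sketch below shows one common, generic technique: track recent unfinalized blocks and, when a new block does not extend the stored head, roll back until its parent is found. The class and method names are hypothetical, and the sketch assumes blocks arrive in order, with a new branch delivered starting from the block after the common ancestor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    number: int
    hash: str
    parent_hash: str


class HotChain:
    """Tracks recent, not-yet-finalized blocks and rolls back on a fork.

    Assumes blocks arrive in order and that, after a reorg, the new branch is
    delivered starting from the block right after the common ancestor."""

    def __init__(self):
        self.blocks: list[Block] = []
        self.rolled_back: list[Block] = []

    def push(self, block: Block) -> None:
        # Pop stored blocks at or above the new height, or that the new block does not extend.
        while self.blocks and (
            self.blocks[-1].number >= block.number
            or self.blocks[-1].hash != block.parent_hash
        ):
            # A real indexer would also undo the database writes made for this block.
            self.rolled_back.append(self.blocks.pop())
        self.blocks.append(block)


chain = HotChain()
for b in [Block(1, "1a", "0a"), Block(2, "2a", "1a"), Block(3, "3a", "2a"),
          Block(2, "2b", "1a"), Block(3, "3b", "2b"), Block(4, "4b", "3b")]:
    chain.push(b)

print([b.hash for b in chain.blocks])       # ['1a', '2b', '3b', '4b']
print([b.hash for b in chain.rolled_back])  # ['3a', '2a']
```

A production indexer would also prune blocks once they are finalized and would need to refetch ancestors if a fork reaches deeper than the blocks it is tracking.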
Performance Impact
Subsquid reports that this architecture can index blockchain data up to 1,000 times faster than traditional methods such as subgraphs.