Overview
This blog post explores the concept of data indexing, tracing its historical roots and explaining its critical importance for Web3 development.
Historical Context
Indexing traces back to the 13th century, when Robert Grosseteste is credited with creating the first index: a “Tabula” of 440 topics designed to organize knowledge efficiently. The practice gained prominence after the printing press made standardized pagination in books possible.
The evolution continued with Paul Otlet, who in the 1930s envisioned knowledge projected onto individual screens so that users could access information from their armchairs, an early prediction of modern digital information systems.
Web2 Indexing
Today, indexing underpins a large share of our online interactions, powering everything from Netflix's localized content delivery to email and social media. Google's dominance shows how indexing revolutionized web usability, to the point that the company's name became synonymous with searching.
Blockchain’s Challenge
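Blockchains present unique indexing difficulties. Bitcoin's ledger has grown to approximately 0.5 terabytes, while Ethereum's state size approaches 1 terabyte. Because this data is stored as an append-only sequence of blocks rather than in tables keyed by account or contract, even a simple question such as “which transactions sent funds to this address?” requires scanning the entire history. This creates a significant obstacle for developers: what is easy in a SQL database is impossible on a blockchain without an index.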
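To make the contrast concrete, here is a minimal Python sketch over a toy, in-memory chain. The `Tx` type and all helper functions are hypothetical illustrations, not any node client's API. Finding every transaction sent to one address without an index means walking every block; an index built in one pass (and extendable block by block as new blocks arrive) turns each later lookup into a single dictionary access.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    """Hypothetical, simplified transaction record."""
    sender: str
    recipient: str
    amount: int


def scan_received(chain: list[list[Tx]], address: str) -> list[tuple[int, int]]:
    """Unindexed query: visit every transaction in every block."""
    return [
        (height, i)
        for height, block in enumerate(chain)
        for i, tx in enumerate(block)
        if tx.recipient == address
    ]


def index_block(index: dict, height: int, block: list[Tx]) -> None:
    """Add one block's transactions to a recipient -> [(height, tx_index)] map."""
    for i, tx in enumerate(block):
        index[tx.recipient].append((height, i))


def build_recipient_index(chain: list[list[Tx]]) -> dict:
    """Build the index with a single pass over the chain."""
    index = defaultdict(list)
    for height, block in enumerate(chain):
        index_block(index, height, block)
    return index


def indexed_received(index: dict, address: str) -> list[tuple[int, int]]:
    """Indexed query: one lookup; .get avoids inserting unknown addresses."""
    return index.get(address, [])


chain = [
    [Tx("alice", "bob", 5)],
    [Tx("bob", "carol", 2), Tx("alice", "carol", 1)],
]
index = build_recipient_index(chain)
print(indexed_received(index, "carol"))  # [(1, 0), (1, 1)]
```

The trade-off is the usual one for any index: extra storage and an up-front pass in exchange for queries that no longer grow with the length of the chain.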
Solutions: Indexing Protocols
The Graph pioneered decentralized indexing through subgraphs, which define how blockchain data is retrieved and structured: a schema describing the queryable entities, plus mapping code that transforms on-chain events into those entities. Subsquid emerged as an alternative, taking a modular approach that combines a decentralized data lake with a query engine, aimed at developers who need multi-chain data access and custom schemas.
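The schema-plus-mapping idea behind a subgraph can be sketched in plain Python. This is a conceptual illustration only, not The Graph's API (real subgraph mappings are written in AssemblyScript): `Account`, `Store`, `handle_transfer`, and `top_accounts` are hypothetical names, and events are assumed to arrive as simple dicts. It follows the common ERC-20 convention that mints appear as transfers from the zero address.

```python
from dataclasses import dataclass, field

# Common ERC-20 convention: minted tokens are "sent" from the zero address.
ZERO_ADDRESS = "0x" + "0" * 40


@dataclass
class Account:
    """Hypothetical entity type, playing the role of a schema entry."""
    id: str
    balance: int = 0
    transfer_count: int = 0


@dataclass
class Store:
    """Hypothetical entity store keyed by entity ID."""
    accounts: dict[str, Account] = field(default_factory=dict)

    def load_or_create(self, account_id: str) -> Account:
        if account_id not in self.accounts:
            self.accounts[account_id] = Account(id=account_id)
        return self.accounts[account_id]


def handle_transfer(store: Store, event: dict) -> None:
    """Mapping handler: turn one raw Transfer event into entity updates."""
    value = event["value"]
    if event["from"] != ZERO_ADDRESS:  # mints have no real sender to debit
        sender = store.load_or_create(event["from"])
        sender.balance -= value
        sender.transfer_count += 1
    recipient = store.load_or_create(event["to"])
    recipient.balance += value
    recipient.transfer_count += 1


def top_accounts(store: Store, n: int) -> list[Account]:
    """Query the indexed entities instead of re-reading raw blocks."""
    return sorted(store.accounts.values(), key=lambda a: a.balance, reverse=True)[:n]


store = Store()
for event in [
    {"from": ZERO_ADDRESS, "to": "alice", "value": 10},
    {"from": "alice", "to": "bob", "value": 3},
]:
    handle_transfer(store, event)
print([a.id for a in top_accounts(store, 2)])  # ['alice', 'bob']
```

The point of the split is that the expensive work (reading and decoding events) happens once at indexing time, while queries run against small, purpose-built entities.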