The Graph Protocol: Indexing Blockchain Data for dApps

Uvin Vindula·June 17, 2024·11 min read

TL;DR

Every dApp I build needs fast, structured access to on-chain data. The Graph Protocol solves this by letting you define subgraphs — custom indexers that listen to smart contract events, transform them into queryable entities, and serve the results through a GraphQL API. In this guide, I walk through everything I have learned from building production subgraphs for DeFi protocols and NFT platforms: schema design that avoids performance traps, event handler patterns that keep your data consistent, deployment on Subgraph Studio, and the decision between hosted service and the decentralized network. Every example comes from real projects I have shipped. If you need custom blockchain indexing for your project, check out my services.


Why You Need Indexing

If you have ever tried to build a frontend for a smart contract, you know the pain. The blockchain stores state, but it does not give you a convenient way to query it. Want to show a user their transaction history? You have to scan every block. Want to display a leaderboard of top stakers? You need to aggregate data from thousands of events across the entire chain history.

The naive approach is to call eth_getLogs with a filter and iterate through the results. I tried this on my first DeFi dashboard. It worked for the first hundred users. Then the RPC calls started timing out, the frontend froze, and the UX became unusable.

Here is what calling contract state directly looks like in practice:

typescript
// The painful way — direct RPC calls for historical data
async function getUserTransactions(
  userAddress: string,
  contractAddress: string
) {
  const provider = new ethers.JsonRpcProvider(RPC_URL);
  const contract = new ethers.Contract(
    contractAddress,
    abi,
    provider
  );

  // This scans EVERY block from genesis (block 0) to now
  const filter = contract.filters.Transfer(userAddress);
  const events = await contract.queryFilter(filter, 0, "latest");

  // For 50,000 blocks, this takes 30+ seconds
  // For 1,000,000 blocks, your RPC provider will throttle you
  return events.map((event) => ({
    from: event.args.from,
    to: event.args.to,
    amount: event.args.amount.toString(),
    block: event.blockNumber,
  }));
}

This scales terribly. Every new user means another full chain scan. Every page load hammers your RPC provider. And if you need data from multiple contracts — say a DEX with a router, factory, and dozens of pair contracts — the problem multiplies.

The Graph fixes this by running an indexer that processes events as they happen, stores the transformed data in a PostgreSQL database, and serves it through a GraphQL endpoint. Your frontend sends a single GraphQL query and gets exactly the data it needs in milliseconds.

The difference is architectural. Instead of querying the blockchain on every request, you query an index that has already processed and organized the data. It is the same reason Google indexes the web instead of crawling every site when you search.


How The Graph Works

The Graph is built on three core concepts: subgraphs, indexers, and curators. As a developer, your job is to write the subgraph. The network handles the rest.

A subgraph is a package that tells The Graph what to index and how to transform it. It contains three files:

my-subgraph/
  subgraph.yaml          # Manifest — what contracts and events to watch
  schema.graphql         # Entity definitions — how data is structured
  src/
    mapping.ts           # Event handlers — how events become entities

The manifest (subgraph.yaml) declares which smart contracts to index, on which network, starting from which block, and which events to listen for. Think of it as the configuration layer.

The schema (schema.graphql) defines your data model using GraphQL types. Each type becomes a table in the indexer's database. You design this based on what your frontend needs to query.

The mappings (mapping.ts) are AssemblyScript functions that run every time a matching event is emitted. They receive the event data, transform it, and save it as entities defined in your schema. AssemblyScript is a subset of TypeScript that compiles to WebAssembly — it looks like TypeScript but has some limitations I will cover later.

When you deploy a subgraph, here is what happens:

  1. An indexer node picks up your subgraph
  2. It starts processing blocks from your specified start block
  3. For each block, it checks if any of your tracked contracts emitted events you care about
  4. When it finds a match, it runs your mapping function
  5. The mapping function creates or updates entities in the database
  6. The GraphQL endpoint becomes available for queries

The indexer stays in sync with the chain, processing new blocks as they are produced. If the chain reorganizes, the indexer rolls back and reprocesses the affected blocks. This is handled automatically — you do not need to worry about reorgs in your mapping code.


Subgraph Schema Design

Schema design is where most subgraph developers make their first mistakes. A poorly designed schema leads to slow queries, missing data relationships, and mapping functions that fight the data model instead of working with it.

Here is a schema I use for indexing a DeFi staking protocol:

graphql
type StakingPool @entity {
  id: Bytes!
  token: Token!
  totalStaked: BigInt!
  rewardRate: BigInt!
  lastUpdateTime: BigInt!
  rewardPerTokenStored: BigInt!
  stakerCount: Int!
  stakes: [Stake!]! @derivedFrom(field: "pool")
  rewardEvents: [RewardEvent!]! @derivedFrom(field: "pool")
  createdAt: BigInt!
  createdAtBlock: BigInt!
}

type Token @entity {
  id: Bytes!
  symbol: String!
  name: String!
  decimals: Int!
  totalSupply: BigInt!
  pools: [StakingPool!]! @derivedFrom(field: "token")
}

type Stake @entity {
  id: Bytes!
  pool: StakingPool!
  user: User!
  amount: BigInt!
  rewardDebt: BigInt!
  depositedAt: BigInt!
  depositedAtBlock: BigInt!
}

type User @entity {
  id: Bytes!
  stakes: [Stake!]! @derivedFrom(field: "user")
  rewards: [RewardEvent!]! @derivedFrom(field: "user")
  totalStaked: BigInt!
  totalRewardsClaimed: BigInt!
  firstActiveAt: BigInt!
}

type RewardEvent @entity(immutable: true) {
  id: Bytes!
  pool: StakingPool!
  user: User!
  amount: BigInt!
  timestamp: BigInt!
  blockNumber: BigInt!
  transactionHash: Bytes!
}

A few design decisions worth explaining:

Use `Bytes!` for IDs, not `String!`. Contract addresses and transaction hashes are bytes. Storing them as Bytes is more efficient than converting to hex strings. The Graph performs faster lookups on Bytes IDs because it avoids string comparison overhead.

Use `@derivedFrom` for reverse lookups. The stakes field on StakingPool is not stored — it is derived from the pool field on Stake entities. This avoids maintaining redundant arrays and keeps your mappings simpler. The GraphQL layer resolves these at query time.

Mark event logs as `@entity(immutable: true)`. The RewardEvent type represents historical facts that never change. Marking it immutable tells the indexer it never needs to update these records, which significantly improves indexing performance. Any entity that represents a log or historical record should be immutable.

Store timestamps and block numbers. Every entity that represents a point-in-time action should include timestamp and blockNumber. Your frontend will need these for sorting, filtering by date range, and displaying human-readable times.

Aggregate counters on parent entities. The stakerCount on StakingPool and totalRewardsClaimed on User are denormalized counters. You could compute these from the child entities at query time, but that is expensive for large datasets. Maintaining counters in your mappings is more work but makes queries fast.

Here is a schema pattern I use for NFT marketplaces:

graphql
type Collection @entity {
  id: Bytes!
  name: String!
  symbol: String!
  totalSupply: BigInt!
  floorPrice: BigInt!
  volumeTraded: BigInt!
  tokens: [Token!]! @derivedFrom(field: "collection")
  listings: [Listing!]! @derivedFrom(field: "collection")
}

type Token @entity {
  id: ID!
  collection: Collection!
  tokenId: BigInt!
  owner: User!
  metadataURI: String!
  currentListing: Listing
  transfers: [Transfer!]! @derivedFrom(field: "token")
  mintedAt: BigInt!
}

type Listing @entity {
  id: Bytes!
  collection: Collection!
  token: Token!
  seller: User!
  price: BigInt!
  status: ListingStatus!
  createdAt: BigInt!
  soldAt: BigInt
  buyer: User
}

enum ListingStatus {
  Active
  Sold
  Cancelled
}

type Transfer @entity(immutable: true) {
  id: Bytes!
  token: Token!
  from: User!
  to: User!
  timestamp: BigInt!
  blockNumber: BigInt!
  transactionHash: Bytes!
}

type User @entity {
  id: Bytes!
  tokens: [Token!]! @derivedFrom(field: "owner")
  sales: [Listing!]! @derivedFrom(field: "seller")
  purchases: [Listing!]! @derivedFrom(field: "buyer")
}

Notice the currentListing field on Token — it is a nullable reference to the active listing. This lets your frontend query a token and immediately know if it is for sale without fetching all listings and filtering. Small design decisions like this make the difference between a responsive dApp and one that frustrates users.
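The frontend side of this pattern is a single null check. The types and helpers below are my own illustration of what a query selecting currentListing returns, not generated code:

```typescript
// Shape a token query returns when it selects currentListing.
// Illustrative types — not generated by the Graph tooling.
interface TokenWithListing {
  id: string;
  tokenId: string;
  currentListing: { id: string; price: string } | null;
}

// One null check replaces fetching every listing and filtering
function isForSale(token: TokenWithListing): boolean {
  return token.currentListing !== null;
}

// Format an 18-decimal BigInt price string using plain string math
function displayPrice(token: TokenWithListing, decimals: number = 18): string {
  if (token.currentListing === null) return "Not listed";
  const p = token.currentListing.price.padStart(decimals + 1, "0");
  const whole = p.slice(0, p.length - decimals);
  const frac = p.slice(p.length - decimals, p.length - decimals + 4);
  return `${whole}.${frac} ETH`;
}
```

Because prices are BigInt strings in the subgraph response, formatting them as strings avoids JavaScript number precision issues entirely.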


Event Handlers

Event handlers are the heart of your subgraph. They run every time a contract emits an event you are tracking, and they transform raw event data into the entities defined in your schema.

Here is the manifest that tells The Graph which events to watch:

yaml
specVersion: 1.0.0
indexerHints:
  prune: auto
schema:
  file: ./schema.graphql
dataSources:
  - kind: ethereum
    name: StakingPool
    network: arbitrum-one
    source:
      address: "0x1234567890abcdef1234567890abcdef12345678"
      abi: StakingPool
      startBlock: 150000000
    mapping:
      kind: ethereum/events
      apiVersion: 0.0.7
      language: wasm/assemblyscript
      entities:
        - StakingPool
        - Stake
        - User
        - RewardEvent
      abis:
        - name: StakingPool
          file: ./abis/StakingPool.json
        - name: ERC20
          file: ./abis/ERC20.json
      eventHandlers:
        - event: Staked(indexed address,uint256)
          handler: handleStaked
        - event: Withdrawn(indexed address,uint256)
          handler: handleWithdrawn
        - event: RewardPaid(indexed address,uint256)
          handler: handleRewardPaid
      file: ./src/staking.ts

The startBlock is critical. Set it to the block where your contract was deployed. If you set it to zero, the indexer will scan every block from genesis, which wastes hours of indexing time on blocks that contain nothing relevant.
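If you do not know the deployment block offhand, you can binary-search for it: eth_getCode returns "0x" for any block before deployment, so the first block with bytecode is the deployment block. A sketch, generic over any getCode function (the simulated chain below is my own illustration — in practice you would pass `(b) => provider.getCode(address, b)` from ethers):

```typescript
// Binary-search the first block at which the contract has bytecode.
// getCode mirrors eth_getCode semantics: "0x" before deployment.
async function findDeploymentBlock(
  getCode: (block: number) => Promise<string>,
  latestBlock: number
): Promise<number> {
  let lo = 0;
  let hi = latestBlock;
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    const code = await getCode(mid);
    if (code === "0x") {
      lo = mid + 1; // not deployed yet — search later blocks
    } else {
      hi = mid; // deployed — first occurrence is at or before mid
    }
  }
  return lo;
}

// Simulated chain for illustration: contract deployed at block 150,000,000
const DEPLOY_BLOCK = 150_000_000;
const fakeGetCode = async (block: number) =>
  block >= DEPLOY_BLOCK ? "0x6080" : "0x";
```

This costs O(log n) RPC calls — around 28 for a 200-million-block chain — instead of scanning linearly.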

Now here are the mapping functions:

typescript
import {
  Staked,
  Withdrawn,
  RewardPaid,
} from "../generated/StakingPool/StakingPool";
import {
  StakingPool,
  Stake,
  User,
  RewardEvent,
} from "../generated/schema";
import { BigInt, Bytes } from "@graphprotocol/graph-ts";

const ZERO = BigInt.fromI32(0);

function getOrCreateUser(address: Bytes): User {
  let user = User.load(address);
  if (user === null) {
    user = new User(address);
    user.totalStaked = ZERO;
    user.totalRewardsClaimed = ZERO;
    user.firstActiveAt = ZERO;
    user.save();
  }
  return user;
}

function getOrCreateStake(
  poolAddress: Bytes,
  userAddress: Bytes
): Stake {
  let id = poolAddress.concat(userAddress);
  let stake = Stake.load(id);
  if (stake === null) {
    stake = new Stake(id);
    stake.pool = poolAddress;
    stake.user = userAddress;
    stake.amount = ZERO;
    stake.rewardDebt = ZERO;
    stake.depositedAt = ZERO;
    stake.depositedAtBlock = ZERO;
  }
  return stake;
}

export function handleStaked(event: Staked): void {
  let poolAddress = event.address;
  // Address is a Bytes subclass, so no hex round-trip is needed
  let userAddress: Bytes = event.params.user;
  let amount = event.params.amount;

  // Update user
  let user = getOrCreateUser(userAddress);
  user.totalStaked = user.totalStaked.plus(amount);
  if (user.firstActiveAt.equals(ZERO)) {
    user.firstActiveAt = event.block.timestamp;
  }
  user.save();

  // Update stake position
  let stake = getOrCreateStake(poolAddress, userAddress);
  let isNewStaker = stake.amount.equals(ZERO);
  stake.amount = stake.amount.plus(amount);
  stake.depositedAt = event.block.timestamp;
  stake.depositedAtBlock = event.block.number;
  stake.save();

  // Update pool
  let pool = StakingPool.load(poolAddress);
  if (pool !== null) {
    pool.totalStaked = pool.totalStaked.plus(amount);
    if (isNewStaker) {
      pool.stakerCount = pool.stakerCount + 1;
    }
    pool.lastUpdateTime = event.block.timestamp;
    pool.save();
  }
}

export function handleWithdrawn(event: Withdrawn): void {
  let poolAddress = event.address;
  let userAddress: Bytes = event.params.user;
  let amount = event.params.amount;

  // Update user
  let user = getOrCreateUser(userAddress);
  user.totalStaked = user.totalStaked.minus(amount);
  user.save();

  // Update stake position
  let stake = getOrCreateStake(poolAddress, userAddress);
  stake.amount = stake.amount.minus(amount);
  stake.save();

  // Update pool
  let pool = StakingPool.load(poolAddress);
  if (pool !== null) {
    pool.totalStaked = pool.totalStaked.minus(amount);
    if (stake.amount.equals(ZERO)) {
      pool.stakerCount = pool.stakerCount - 1;
    }
    pool.lastUpdateTime = event.block.timestamp;
    pool.save();
  }
}

export function handleRewardPaid(event: RewardPaid): void {
  let userAddress: Bytes = event.params.user;
  let amount = event.params.reward;

  // Create immutable reward event
  let id = event.transaction.hash.concatI32(
    event.logIndex.toI32()
  );
  let rewardEvent = new RewardEvent(id);
  rewardEvent.pool = event.address;
  rewardEvent.user = userAddress;
  rewardEvent.amount = amount;
  rewardEvent.timestamp = event.block.timestamp;
  rewardEvent.blockNumber = event.block.number;
  rewardEvent.transactionHash = event.transaction.hash;
  rewardEvent.save();

  // Update user totals
  let user = getOrCreateUser(userAddress);
  user.totalRewardsClaimed =
    user.totalRewardsClaimed.plus(amount);
  user.save();
}

A few patterns I follow in every subgraph:

Use `getOrCreate` helper functions. The first time you encounter a user or entity, you need to initialize all its fields. Wrapping this in a function avoids duplicating initialization logic across handlers.

Build composite IDs with `concat`. The stake entity is unique per pool-user pair, so the ID is poolAddress.concat(userAddress). For event entities, use transaction.hash.concatI32(logIndex.toI32()) to guarantee uniqueness even when multiple events fire in the same transaction.
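The idea behind composite IDs is simply joining the raw bytes of the parts. Here is a plain-TypeScript analogue of what graph-ts `concat` produces — my own illustration, not the graph-ts API:

```typescript
// Plain-TypeScript analogue of Bytes.concat for illustration:
// a composite ID is the bytes of both parts joined end to end.
function concatHex(a: string, b: string): string {
  return a + b.replace(/^0x/, "");
}

const pool = "0x1111111111111111111111111111111111111111";
const alice = "0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
const bob = "0xbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb";

// One Stake entity per (pool, user) pair — distinct users, distinct IDs
const aliceStakeId = concatHex(pool, alice);
const bobStakeId = concatHex(pool, bob);
```

Because the parts are fixed-width (20-byte addresses), the concatenation is unambiguous: two different pool-user pairs can never collide.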

Update aggregates in every handler. When a user stakes, you update the Stake, the User, and the StakingPool. It is tempting to compute aggregates on the fly, but that makes queries slower. Keep your counters current in the mappings.

Never use `store.remove()` unless you genuinely need to delete data. In most cases, setting a status field or zeroing out a balance is better than deleting the entity. Deleted entities disappear from query results, which means you lose historical context.


Deploying Your Subgraph

Deployment happens through the Graph CLI. Here is the workflow I follow for every new subgraph:

bash
# Install the Graph CLI globally
npm install -g @graphprotocol/graph-cli

# Initialize a new subgraph project
graph init --studio my-staking-subgraph

# Generate types from your schema and ABIs
graph codegen

# Build the subgraph (compiles AssemblyScript to WASM)
graph build

# Authenticate with Subgraph Studio
graph auth --studio YOUR_DEPLOY_KEY

# Deploy to Subgraph Studio
graph deploy --studio my-staking-subgraph

The graph codegen step is essential. It reads your schema.graphql and ABI files, then generates TypeScript types for your entities and event parameters. If you skip this step, your mapping functions will not have type-safe access to event data. I run graph codegen every time I change the schema or update an ABI.

The graph build step compiles your AssemblyScript mappings to WebAssembly. This is where you will catch most errors — AssemblyScript is strict about types and does not support all TypeScript features. Common issues include:

  • No closures or arrow functions in callbacks
  • No optional chaining (?.)
  • Nullable types require explicit null checks
  • No union types or `any` — every value has a single concrete type
  • BigInt arithmetic instead of native number operations
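For example, where everyday TypeScript would reach for `pool?.totalStaked`, AssemblyScript forces an explicit branch. The pattern below compiles in both languages (the entity class is a stand-in I made up for illustration):

```typescript
// Stand-in for a generated entity class, for illustration only
class PoolEntity {
  totalStaked: string = "0";
}

// AssemblyScript-safe null handling: no optional chaining,
// just an explicit branch on the nullable load() result
function readTotalStaked(pool: PoolEntity | null): string {
  if (pool === null) {
    return "0"; // entity not indexed yet — fall back to a default
  }
  return pool.totalStaked;
}
```

Writing your handlers in this defensive style from the start saves a round of build errors later.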

When the build succeeds, graph deploy pushes your subgraph to Subgraph Studio. The studio gives you a playground to test queries, monitors indexing progress, and shows you if the indexer encounters errors.

After deployment, your subgraph starts syncing from the startBlock. Depending on how many blocks need processing and how complex your handlers are, initial sync can take anywhere from minutes to hours. Monitor the sync progress in Subgraph Studio — it shows the current block, target block, and estimated time remaining.


Querying with GraphQL

Once your subgraph is synced, you get a GraphQL endpoint. Here is how I query it from a Next.js frontend:

typescript
const GRAPH_URL =
  "https://gateway.thegraph.com/api/YOUR_API_KEY/subgraphs/id/YOUR_SUBGRAPH_ID";

interface StakePosition {
  id: string;
  amount: string;
  depositedAt: string;
  pool: {
    id: string;
    totalStaked: string;
    rewardRate: string;
  };
}

interface UserStakesResponse {
  user: {
    stakes: StakePosition[];
    totalStaked: string;
    totalRewardsClaimed: string;
  } | null;
}

async function getUserStakes(
  address: string
): Promise<UserStakesResponse> {
  const query = `
    query GetUserStakes($user: Bytes!) {
      user(id: $user) {
        totalStaked
        totalRewardsClaimed
        stakes(
          where: { amount_gt: "0" }
          orderBy: depositedAt
          orderDirection: desc
        ) {
          id
          amount
          depositedAt
          pool {
            id
            totalStaked
            rewardRate
          }
        }
      }
    }
  `;

  const response = await fetch(GRAPH_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      query,
      variables: { user: address.toLowerCase() },
    }),
  });

  const { data } = await response.json();
  return data;
}

For React components, I wrap subgraph queries in React Query for caching and automatic revalidation:

typescript
import { useQuery } from "@tanstack/react-query";

function useUserStakes(address: string | undefined) {
  return useQuery({
    queryKey: ["userStakes", address],
    queryFn: () => getUserStakes(address!),
    enabled: !!address,
    staleTime: 15_000, // Treat data as fresh for 15 seconds before refetching
    select: (data) => data.user,
  });
}

function StakingDashboard({ address }: { address: string }) {
  const { data: user, isLoading, error } = useUserStakes(address);

  if (isLoading) return <StakingSkeleton />;
  if (error) return <ErrorDisplay error={error} />;
  if (!user) return <NoStakesFound />;

  return (
    <div>
      <StakingSummary
        totalStaked={user.totalStaked}
        totalRewards={user.totalRewardsClaimed}
      />
      {user.stakes.map((stake) => (
        <StakeCard key={stake.id} stake={stake} />
      ))}
    </div>
  );
}

GraphQL gives you powerful filtering and sorting out of the box. Here are queries I use regularly:

graphql
# Get top staking pools by TVL
query TopPools {
  stakingPools(
    first: 10
    orderBy: totalStaked
    orderDirection: desc
    where: { stakerCount_gte: 5 }
  ) {
    id
    totalStaked
    rewardRate
    stakerCount
  }
}

# Get recent reward events with pagination
query RecentRewards($skip: Int!, $first: Int!) {
  rewardEvents(
    first: $first
    skip: $skip
    orderBy: timestamp
    orderDirection: desc
  ) {
    id
    amount
    timestamp
    user {
      id
    }
    pool {
      id
    }
  }
}

# Full-text search on token names (requires fulltext config)
query SearchTokens($search: String!) {
  tokenSearch(text: $search, first: 10) {
    id
    name
    symbol
  }
}

One limitation to know: The Graph caps skip at 5000. For deep pagination, use cursor-based pagination with the entity ID:

graphql
query PaginatedStakes($lastId: Bytes!) {
  stakes(
    first: 100
    where: { id_gt: $lastId, amount_gt: "0" }
    orderBy: id
    orderDirection: asc
  ) {
    id
    amount
    user {
      id
    }
  }
}

This pattern scales to millions of entities without the performance cliff that skip creates.
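Driving that query from the client is a simple loop that advances the cursor until it gets a short page. The fetchPage callback below stands in for whatever transport you use to send the GraphQL request — this is my own sketch, not a Graph SDK function:

```typescript
interface StakeRow {
  id: string;
  amount: string;
}

// Walk the full entity set in pages, cursoring on id.
// fetchPage stands in for a GraphQL request with { id_gt: lastId }.
async function fetchAllStakes(
  fetchPage: (lastId: string, pageSize: number) => Promise<StakeRow[]>,
  pageSize: number = 100
): Promise<StakeRow[]> {
  const all: StakeRow[] = [];
  let lastId = "0x00"; // assumed lower than any real Bytes id
  while (true) {
    const page = await fetchPage(lastId, pageSize);
    all.push(...page);
    if (page.length < pageSize) break; // short page — reached the end
    lastId = page[page.length - 1].id; // advance cursor to last seen id
  }
  return all;
}
```

Each round trip does an indexed `id_gt` lookup, so page 500 costs the same as page 1 — unlike skip, which degrades with depth.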


Hosted vs Decentralized Network

The Graph has two deployment targets, and the distinction matters for production applications.

The Hosted Service was The Graph's original offering. It was free, centrally managed by Edge & Node, and supported many chains. The hosted service has been sunset for most networks as of 2024 — Ethereum mainnet subgraphs were migrated to the decentralized network. Some chains still have hosted service support during the transition period, but you should plan for the decentralized network.

The Decentralized Network is the production-grade option. Indexers stake GRT tokens and compete to serve your subgraph. You pay per query using GRT or through the gateway's billing. The benefits are significant:

  • Redundancy. Multiple indexers serve your subgraph. If one goes down, queries automatically route to another.
  • Censorship resistance. No single entity controls who can deploy or query subgraphs.
  • Economic incentives. Indexers are financially motivated to keep your subgraph synced and performant.
  • SLA guarantees. The network economically penalizes indexers that serve stale data.

The cost model works like this: you create an API key in Subgraph Studio, fund it with GRT, and pay per query. As of 2024, query costs are fractions of a cent. For a dApp with moderate traffic, expect to spend $50-200 per month on indexing queries. High-traffic protocols spend more, but the cost is predictable and scales linearly.
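Because cost scales linearly with query volume, a back-of-envelope estimate is easy. The per-query rate below is an assumption for illustration only — check the gateway's current pricing before budgeting:

```typescript
// Rough monthly cost estimate. RATE_USD_PER_QUERY is an assumed
// illustrative figure, not official gateway pricing.
const RATE_USD_PER_QUERY = 0.00004; // $4 per 100k queries (assumption)

function estimateMonthlyCost(queriesPerDay: number): number {
  return queriesPerDay * 30 * RATE_USD_PER_QUERY;
}
```

At that assumed rate, a dApp serving 50,000 queries a day lands around $60 a month — in the range quoted above.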

For development and testing, Subgraph Studio provides a free query endpoint with rate limiting. I use this for local development and staging environments, then switch to a funded API key for production.


Subgraph Studio

Subgraph Studio is your command center for managing subgraphs on the decentralized network. Here is what I use it for in every project:

Deployment management. Each subgraph gets a unique ID and versioned deployments. You can deploy new versions without downtime — the studio maintains the old version until the new one is fully synced. This is critical for production. Never deploy a schema change that breaks your frontend queries until the new version has caught up to the chain head.

Query playground. The built-in GraphQL playground lets you test queries against live data before writing frontend code. I spend time here validating that my schema design supports the queries my UI needs. If a query requires complex filtering that the schema does not support well, I go back and restructure the schema before the subgraph is in production.

Indexing monitoring. The studio shows real-time sync progress, error logs, and handler execution metrics. When a mapping function fails, the studio shows you exactly which block and transaction caused the error. This is invaluable for debugging.

Signal and curation. On the decentralized network, you signal GRT on your subgraph to attract indexers. More signal means more indexers will pick up your subgraph, which improves query reliability and speed. For a production dApp, I recommend signaling at least 10,000 GRT to ensure consistent indexer coverage.

API key management. Create separate API keys for development, staging, and production. Set rate limits and authorized domains to prevent abuse. Monitor query volume and costs per key.

Here is a typical workflow for managing subgraph versions:

bash
# Version 1 is in production, serving queries

# Make schema changes for version 2
# Update schema.graphql and mapping handlers

# Generate new types
graph codegen

# Build and test locally
graph build

# Deploy as a new version
graph deploy --studio my-staking-subgraph --version-label v2.0.0

# Monitor sync progress in Studio
# Wait for v2 to reach chain head

# Update frontend to use new query endpoint
# The old version continues serving until you deprecate it

Performance Tips

After building a dozen production subgraphs, here are the performance patterns that make the biggest difference:

Set the correct `startBlock`. This is the single highest-impact optimization. Setting startBlock to your contract's deployment block instead of zero can save hours of initial indexing time. For proxy contracts that get upgraded, use the block where the current implementation was set.

Use `@entity(immutable: true)` for event logs. Any entity that represents a historical event — transfers, swaps, reward claims — should be marked immutable. The indexer skips update checks for immutable entities, which speeds up both indexing and queries.

Avoid loading entities you do not modify. Every Entity.load() call is a database read. If your handler emits a Transfer event and you only need to create the transfer record, do not load the sender and receiver entities unless you actually update their fields.

typescript
// Expensive — loads entities unnecessarily
export function handleTransfer(event: Transfer): void {
  let from = User.load(event.params.from); // DB read
  let to = User.load(event.params.to); // DB read
  // ... but we never modify from or to

  let transfer = new TransferEvent(
    event.transaction.hash.concatI32(event.logIndex.toI32())
  );
  transfer.from = event.params.from;
  transfer.to = event.params.to;
  transfer.save();
}

// Efficient — only creates what we need
export function handleTransfer(event: Transfer): void {
  let transfer = new TransferEvent(
    event.transaction.hash.concatI32(event.logIndex.toI32())
  );
  transfer.from = event.params.from;
  transfer.to = event.params.to;
  transfer.save();
}

Batch entity saves. If a single event handler creates or updates multiple entities, the order of .save() calls matters. Save child entities before parent entities that reference them. And if you load an entity, modify it, and save it — do not load it again later in the same handler. Keep a reference to the in-memory object.
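The re-loading point is easy to trip over, because load() hands you a copy of the last saved state, not a live reference. A toy model of the store — my own simulation, not graph-ts internals:

```typescript
// Toy entity store mimicking load()/save() semantics for illustration.
// load() returns a copy of the last *saved* state, so in-memory edits
// on a loaded object are invisible to later load() calls until saved.
class ToyStore {
  private rows = new Map<string, number>();

  load(id: string): { id: string; amount: number } | null {
    const amount = this.rows.get(id);
    return amount === undefined ? null : { id, amount };
  }

  save(row: { id: string; amount: number }): void {
    this.rows.set(row.id, row.amount);
  }
}
```

If you modify a loaded entity and then load it again before saving, you get the stale saved state back — which is exactly why you should keep one in-memory reference per handler.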

Use `Bytes` IDs over `String` IDs. Bytes comparisons are faster than string comparisons in the indexer database. For entities keyed by address or transaction hash, always use Bytes! as the ID type.

Minimize call handlers. The Graph supports callHandlers that trigger on function calls, not just events. But call handlers are significantly slower than event handlers because they require trace-level indexing. Prefer event handlers whenever possible. If your contract does not emit an event for a state change you need to track, consider adding the event to the contract rather than using a call handler.

Use `indexerHints` for pruning. In your manifest, set indexerHints.prune: auto to let the indexer prune historical data that is no longer needed. This reduces storage requirements and speeds up indexing. Only use this for subgraphs where you do not need the full history of every entity — if you need historical queries, keep pruning disabled.

Time-travel queries. The Graph supports querying entity state at a specific block number using the block parameter. This is powerful for building historical dashboards but adds indexer overhead. Only enable time-travel for subgraphs that genuinely need it.

graphql
# Query pool state at a specific block
query HistoricalTVL {
  stakingPool(
    id: "0x1234..."
    block: { number: 18000000 }
  ) {
    totalStaked
    stakerCount
  }
}

Alternative Indexing Solutions

The Graph is the most established protocol, but it is not the only option. Here is how alternatives compare based on my experience:

Ponder is a newer framework that lets you write indexing logic in TypeScript (not AssemblyScript). The developer experience is better — you get full TypeScript features, hot module reloading during development, and a simpler mental model. Ponder is self-hosted, which means you run your own indexer. I have used it for projects where the client wants to own their infrastructure and does not need the decentralized network's redundancy.

Envio focuses on speed. Their HyperIndex product claims 100x faster indexing than The Graph for initial sync. It supports multi-chain indexing natively, which is useful for protocols deployed across several L2s. Envio is hosted — they run the infrastructure, and you pay for the service.

Goldsky provides managed subgraph hosting with Mirror pipelines that stream indexed data into your own database (PostgreSQL, Kafka, or webhooks). This is ideal if you need indexed blockchain data in your existing data infrastructure. I have used Goldsky for a project that needed subgraph data joined with off-chain data in a Supabase database.

Custom indexers using ethers.js or viem to listen to events and write to your own database. This gives you complete control but means you own all the infrastructure, handle chain reorgs, manage database migrations, and ensure uptime. I only recommend this for teams with dedicated DevOps resources and specific requirements that no indexing protocol supports.

Here is my decision framework:

  Requirement                            Solution
  Decentralized, trustless indexing      The Graph (Decentralized Network)
  Fast development iteration             Ponder
  Multi-chain with fast sync             Envio
  Streaming to your own DB               Goldsky
  Full infrastructure control            Custom indexer
  Budget-constrained prototype           The Graph (Subgraph Studio free tier)

For most projects, I start with The Graph because the ecosystem tooling is mature, the GraphQL API is familiar to frontend developers, and the decentralized network provides production-grade reliability. If a project has specific needs that push toward an alternative, I evaluate on a case-by-case basis.


Key Takeaways

  1. Direct RPC calls do not scale. If your dApp reads historical data from the blockchain, you need an indexing layer. The Graph turns event-driven blockchain data into a queryable API.
  2. Schema design determines performance. Use Bytes IDs, @derivedFrom for reverse lookups, @entity(immutable: true) for event logs, and denormalized counters for frequently queried aggregates.
  3. Event handlers must be deterministic. The same input must always produce the same output. Use getOrCreate patterns, composite IDs with concat, and always update aggregates when child entities change.
  4. Set `startBlock` correctly. This single configuration change can save hours of indexing time. Use the block where your contract was deployed.
  5. Subgraph Studio is your deployment hub. Use versioned deployments, monitor sync progress, and test queries in the playground before writing frontend code.
  6. Cursor-based pagination over `skip`. The skip parameter caps at 5000 and gets slower with depth. Use id_gt for deep pagination across large datasets.
  7. The decentralized network is production-grade. Multiple indexers, economic incentives for uptime, and automatic failover. Fund your API key with GRT and use separate keys for dev and production.
  8. Alternatives exist for specific needs. Ponder for better DX, Envio for speed, Goldsky for streaming to your own database. Evaluate based on your project's requirements, not hype.

Every DeFi dashboard and NFT platform I build uses some form of blockchain indexing. The Graph has been my default for two years because it works reliably, the ecosystem is mature, and the decentralized network gives my clients confidence that their data layer will not have a single point of failure. If you are building a dApp that reads on-chain data — and almost every dApp does — invest the time to learn subgraph development. It will save you from the RPC-scanning pain I went through on my first project.

Need help building a production subgraph for your protocol? I have designed and deployed indexing solutions for DeFi protocols, NFT marketplaces, and governance systems across Ethereum, Arbitrum, and Base. Check out my services to discuss your project.


*Written by Uvin Vindula — Web3 and AI engineer building production-grade decentralized applications from Sri Lanka and the UK. I write about smart contract development, DeFi protocols, and the tools that make blockchain data accessible. Follow my work at @IAMUVIN or reach out at contact@uvin.lk.*


Web3 and AI engineer based in Sri Lanka and the UK. Author of The Rise of Bitcoin. Director of Blockchain and Software Solutions at Terra Labz. Founder of uvin.lk — Sri Lanka's Bitcoin education platform with 10,000+ learners.