
Event-Driven Architecture Wasn't a Buzzword — It Was a Fix

How we moved wallet monitoring from batch processing to event-driven architecture — and why the architectural shift matters beyond the performance numbers.


The Old System

Every 5 minutes, a cron job woke up, pulled the full list of monitored wallets from the database, and queried each one for new transactions. At 500 wallets, this was fine. At 15,000 wallets, it was a disaster.

// The old approach -- don't do this
class WalletPollingService {
  async pollAllWallets(): Promise<void> {
    const wallets = await this.db.query('SELECT * FROM monitored_wallets WHERE active = true');

    for (const wallet of wallets) {
      try {
        // One sequential round trip per wallet, so cycle time grows
        // linearly with wallet count -- and getBlockNumber() is fetched
        // once per wallet when it could be hoisted out of the loop.
        const latestBlock = await this.provider.getBlockNumber();
        const txns = await this.provider.getTransactions(
          wallet.address,
          wallet.last_checked_block,
          latestBlock
        );

        for (const tx of txns) {
          await this.processTransaction(wallet, tx);
        }

        await this.db.query(
          'UPDATE monitored_wallets SET last_checked_block = $1 WHERE id = $2',
          [latestBlock, wallet.id]
        );
      } catch (err) {
        logger.error('Poll failed', { wallet: wallet.address, error: err.message });
      }
    }
  }
}

// Triggered every 5 minutes via cron

The problems were compounding. Each polling cycle took longer as the wallet count grew. At 15,000 wallets, a single cycle took over four minutes — dangerously close to the five-minute interval. We were making 15,000+ RPC calls per cycle, burning through rate limits and racking up provider costs. Worst of all, if a high-value transaction happened at minute one, the user wouldn’t know until minute five. In crypto, five minutes is an eternity.

We also had a subtle correctness bug: if the polling cycle took longer than five minutes, the next cycle would start before the previous one finished, leading to duplicate transaction processing. We patched it with a mutex lock, but that just meant cycles got skipped entirely under load.
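The patch looked roughly like this (a reconstructed sketch, not the original code): an in-flight flag that makes overlapping cycles skip instead of stack.

```typescript
// Reconstructed sketch of the mutex patch -- not the original code.
// An in-flight flag prevents overlapping cycles, but any tick that
// arrives while a slow cycle is still running is silently dropped.
class GuardedPoller {
  private running = false;
  public skipped = 0;

  constructor(private poll: () => Promise<void>) {}

  async tick(): Promise<void> {
    if (this.running) {
      this.skipped++; // under load, whole cycles vanish here
      return;
    }
    this.running = true;
    try {
      await this.poll();
    } finally {
      this.running = false;
    }
  }
}
```

The guard trades duplicate processing for dropped cycles, which is exactly the failure mode described above.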

The New System

We replaced the polling architecture with webhook-driven event processing. Instead of asking every wallet “did anything happen?”, we told Alchemy “tell us when something happens.” The inversion is simple in concept but fundamentally changes the system’s performance characteristics.

// Event-driven approach
interface WebhookEvent {
  webhookId: string;
  type: 'ADDRESS_ACTIVITY';
  event: {
    network: string;
    activity: Array<{
      fromAddress: string;
      toAddress: string;
      value: number;
      asset: string;
      hash: string;
      blockNum: string;
      category: 'external' | 'internal' | 'token';
    }>;
  };
}

class WalletEventProcessor {
  private eventBus: EventEmitter;

  async handleWebhook(payload: WebhookEvent, signature: string): Promise<void> {
    if (!this.verifySignature(payload, signature)) {
      throw new UnauthorizedError('Invalid webhook signature');
    }

    const activities = payload.event.activity;

    for (const activity of activities) {
      const event: WalletEvent = {
        id: `${activity.hash}-${activity.fromAddress}-${activity.toAddress}`,
        hash: activity.hash,
        from: activity.fromAddress,
        to: activity.toAddress,
        value: activity.value,
        asset: activity.asset,
        network: payload.event.network,
        category: activity.category,
        blockNumber: parseInt(activity.blockNum, 16),
        receivedAt: new Date(),
      };

      // Deduplicate -- webhooks can fire more than once. SET with NX is
      // atomic, so two concurrent deliveries can't both pass the check.
      const firstSeen = await this.cache.set(`tx:${event.id}`, '1', 'EX', 86400, 'NX');
      if (firstSeen !== 'OK') continue;

      // Process immediately
      await this.processEvent(event);
    }
  }

  private async processEvent(event: WalletEvent): Promise<void> {
    // Find all monitoring rules that match this event
    const rules = await this.db.query(
      `SELECT * FROM monitoring_rules
       WHERE (watch_address = $1 OR watch_address = $2) AND active = true`,
      [event.from, event.to]
    );

    for (const rule of rules) {
      if (this.matchesRule(event, rule)) {
        await this.notify(rule, event);
      }
    }
  }

  private matchesRule(event: WalletEvent, rule: MonitoringRule): boolean {
    if (rule.min_value && event.value < rule.min_value) return false;
    if (rule.asset_filter && event.asset !== rule.asset_filter) return false;
    if (rule.direction === 'incoming' && event.to !== rule.watch_address) return false;
    if (rule.direction === 'outgoing' && event.from !== rule.watch_address) return false;
    return true;
  }

  private verifySignature(payload: WebhookEvent, signature: string): boolean {
    // Note: signing the re-serialized payload only works if JSON key
    // order survives parsing; verifying against the raw request body
    // is more robust.
    const hmac = crypto.createHmac('sha256', process.env.ALCHEMY_WEBHOOK_SECRET!);
    hmac.update(JSON.stringify(payload));
    const digest = hmac.digest('hex');
    // Constant-time comparison -- a plain === can leak timing information
    if (signature.length !== digest.length) return false;
    return crypto.timingSafeEqual(Buffer.from(digest), Buffer.from(signature));
  }
}
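The flip side of “tell us when something happens” is keeping the provider’s address list in sync with `monitored_wallets`. A minimal sketch, assuming Alchemy’s Notify API — the endpoint path, header name, and payload field names here are from memory and should be checked against the current docs:

```typescript
// Hedged sketch: endpoint, header, and payload shapes are assumptions
// about Alchemy's Notify API and may differ from the current version.
interface AddressUpdate {
  webhook_id: string;
  addresses_to_add: string[];
  addresses_to_remove: string[];
}

function buildAddressUpdate(
  webhookId: string,
  add: string[],
  remove: string[] = []
): AddressUpdate {
  return { webhook_id: webhookId, addresses_to_add: add, addresses_to_remove: remove };
}

async function syncAddresses(update: AddressUpdate, authToken: string): Promise<void> {
  const res = await fetch('https://dashboard.alchemy.com/api/update-webhook-addresses', {
    method: 'PATCH',
    headers: { 'X-Alchemy-Token': authToken, 'Content-Type': 'application/json' },
    body: JSON.stringify(update),
  });
  if (!res.ok) throw new Error(`Address sync failed: ${res.status}`);
}
```

Every wallet added to or removed from monitoring triggers one of these sync calls, so the provider’s view never drifts from the database.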

The results were immediate. Notification latency dropped from an average of 2.5 minutes to under 2 seconds. RPC calls dropped by 98%. The system that used to strain under 15,000 wallets now handles 50,000 without breaking a sweat — because it does zero work when nothing happens.

Why the Architectural Shift Matters

The performance improvements are obvious, but the real value is structural. The old system scaled with the number of wallets. The new system scales with the number of events. Those are very different growth curves.

Most wallets are quiet most of the time. In our dataset, 80% of monitored wallets see fewer than two transactions per day. Under the polling model, we were burning 15,000 API calls every five minutes to discover that 12,000 wallets had nothing new. Under the event model, those 12,000 wallets cost us exactly nothing.
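The difference is easy to put numbers on. A back-of-envelope comparison using the figures above (15,000 wallets, five-minute cycles, a generous two transactions per wallet per day):

```typescript
// Back-of-envelope: daily request volume under each model, using the
// numbers quoted in the text.
const wallets = 15_000;
const cyclesPerDay = (24 * 60) / 5;                // 288 polling cycles
const pollingCalls = wallets * cyclesPerDay;        // one RPC call per wallet per cycle

const avgTxPerWalletPerDay = 2;                     // generous: 80% of wallets see fewer
const eventCalls = wallets * avgTxPerWalletPerDay;  // one webhook delivery per event

console.log(pollingCalls); // 4320000 calls/day
console.log(eventCalls);   // 30000 deliveries/day
```

That is a reduction of over 99% in theory — the same order as the 98% drop we measured, with the gap explained by retries and the reconciliation job described below.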

This also changed how we think about features. Adding a new monitoring rule in the old system meant adding another check inside the polling loop, making it slower. In the event-driven system, adding a rule is just adding a row to the database. The event processing logic evaluates it when — and only when — relevant activity occurs.
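Concretely, a new rule is a single insert against the columns `matchesRule` already reads — the exact schema here is a sketch for illustration, and the address and values are placeholders:

```typescript
// Sketch: adding a monitoring rule is one row, not new loop code.
// Column names mirror those read by matchesRule above; the schema
// details and values are assumptions for illustration.
interface Db {
  query: (sql: string, params: unknown[]) => Promise<unknown>;
}

async function addRule(db: Db): Promise<void> {
  await db.query(
    `INSERT INTO monitoring_rules (watch_address, min_value, asset_filter, direction, active)
     VALUES ($1, $2, $3, $4, true)`,
    ['0x1111111111111111111111111111111111111111', 1.5, 'ETH', 'incoming']
  );
}
```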

The Tradeoffs

Event-driven architecture isn’t free. We traded one set of problems for another.

Webhook delivery isn’t guaranteed. Alchemy has retry logic, but we still needed a reconciliation job that runs hourly to catch any missed events. It processes far fewer wallets than the old system — just the ones that haven’t received an event in a suspiciously long time — but it exists because you cannot fully trust external delivery guarantees.
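The reconciliation pass boils down to a staleness check plus a targeted re-poll of whatever it selects. A sketch of the selection logic — the column names and the 24-hour threshold are assumptions:

```typescript
// Sketch of the hourly reconciliation pass: only wallets that have been
// silent for suspiciously long get re-polled directly. Column names and
// the threshold are assumptions.
interface WalletRow {
  id: number;
  address: string;
  last_event_at: Date;
}

function isStale(wallet: WalletRow, now: Date, maxSilenceHours = 24): boolean {
  const silenceMs = now.getTime() - wallet.last_event_at.getTime();
  return silenceMs > maxSilenceHours * 3_600_000;
}

function selectForReconciliation(wallets: WalletRow[], now = new Date()): WalletRow[] {
  // In production this is a WHERE clause, not an in-memory filter.
  return wallets.filter((w) => isStale(w, now));
}
```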

Debugging is harder. A cron job that runs sequentially is easy to reason about. An event-driven system that processes asynchronous webhooks, deduplicates, and fans out to notification channels has more moving parts. We invested in structured logging and distributed tracing to compensate.

Testing required a different approach. We built a webhook simulator that could replay production event patterns against staging. Without it, we would have shipped bugs around edge cases like reorged blocks and zero-value internal transactions.
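At its core the simulator is a replayer: recorded payloads, re-signed with the staging secret, posted in order. A minimal sketch — the signing mirrors `verifySignature` above, while the signature header name is an assumption:

```typescript
import { createHmac } from 'node:crypto';

// Minimal replayer sketch: re-sign recorded payloads with the staging
// secret and POST them to the staging webhook endpoint in order.
function signPayload(body: string, secret: string): string {
  return createHmac('sha256', secret).update(body).digest('hex');
}

async function replay(events: object[], url: string, secret: string): Promise<void> {
  for (const event of events) {
    const body = JSON.stringify(event);
    await fetch(url, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'X-Alchemy-Signature': signPayload(body, secret), // header name is an assumption
      },
      body,
    });
  }
}
```

Replaying recorded production traffic (rather than hand-written fixtures) is what surfaced the reorg and zero-value edge cases.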

Lessons

The biggest lesson: don’t poll for data when you can subscribe to it. This sounds obvious, but I’ve seen polling architectures survive far longer than they should because they’re simple and familiar. The simplicity is real, but it becomes a trap. By the time polling is visibly broken, you’ve already accumulated a lot of logic that assumes a synchronous, batch-oriented world.

Start event-driven when you can. Retrofitting it is harder than building it from the beginning. The webhook-based system took three weeks to build and deploy, but two of those weeks were spent untangling assumptions from the polling architecture — not writing new code.

