Observability

Yew Backend implements a comprehensive observability stack using self-hosted open-source solutions. The three pillars of observability are:

Logging - Structured logs aggregated in Loki
Metrics - Time-series metrics collected by Prometheus
Tracing - Distributed traces stored in Tempo

All observability features can be individually enabled or disabled via environment variables.

Architecture Overview

The Three Pillars

┌────────────────────────────────────────────────────────┐
│                      Yew Backend                       │
│                                                        │
│  ┌────────────┐    ┌────────────┐      ┌────────────┐  │
│  │  Logging   │    │  Metrics   │      │  Tracing   │  │
│  │   (Push)   │    │   (Pull)   │      │   (Push)   │  │
│  └─────┬──────┘    └──────▲─────┘      └─────┬──────┘  │
│        │                  │                  │         │
└────────┼──────────────────┼──────────────────┼─────────┘
         │                  │                  │
     HTTP POST          HTTP GET           OTLP/HTTP
         │                  │                  │
         ▼                  │                  ▼
    ┌─────────┐        ┌────┴─────┐       ┌─────────┐
    │  Loki   │        │Prometheus│       │  Tempo  │
    └─────────┘        └──────────┘       └─────────┘
         │                  │                  │
         └──────────────────┴──────────────────┘
                            │
                     ┌──────▼──────┐
                     │   Grafana   │
                     │ (Dashboard) │
                     └─────────────┘

Push vs Pull Models

Push Model (Logging & Tracing)

Backend actively sends data to the aggregation service
Loki: Backend sends logs via HTTP POST to /loki/api/v1/push
Tempo: Backend sends traces via OTLP to /v1/traces
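Advantages: Near real-time delivery; works even when the aggregation service cannot reach the backend (e.g., behind NAT or a firewall), because connections are outbound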
Trade-off: Backend must handle network failures gracefully
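The push side can be sketched in TypeScript. This is a minimal, hypothetical example (`buildLokiPushBody`, `pushToLoki`, and the `FetchLike` type are illustrative names, not the backend's code) that builds a request body in the JSON shape Loki's push API accepts and sends it so that a network failure is reported rather than thrown:

```typescript
// Hypothetical sketch: buildLokiPushBody, pushToLoki, and FetchLike are
// illustrative names, not part of CustomLoggerService.
type LokiPushBody = {
  streams: { stream: Record<string, string>; values: [string, string][] }[];
};

// Minimal fetch signature so the sender can be tested without a network.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean }>;

// Loki expects each timestamp as a string of Unix epoch nanoseconds.
function toEpochNanos(ms: number): string {
  return `${Math.trunc(ms)}000000`;
}

// One stream = one unique label set; each value is [timestamp, log line].
function buildLokiPushBody(
  labels: Record<string, string>,
  entries: { timestampMs: number; line: string }[],
): LokiPushBody {
  return {
    streams: [
      {
        stream: labels,
        values: entries.map((e): [string, string] => [toEpochNanos(e.timestampMs), e.line]),
      },
    ],
  };
}

// Push model: the backend initiates the request. Failures are reported and
// swallowed so an unavailable Loki never breaks the code that produced the log.
async function pushToLoki(pushUrl: string, body: LokiPushBody, fetchImpl: FetchLike): Promise<boolean> {
  try {
    const res = await fetchImpl(pushUrl, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    });
    return res.ok;
  } catch (err) {
    console.error('Failed to send log to Loki', err);
    return false;
  }
}
```

In a Node 18+ backend you would pass the global `fetch` and the full push URL, for example `http://localhost:3100/loki/api/v1/push`.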

Pull Model (Metrics)

Aggregation service scrapes data from the backend
Prometheus: Scrapes the /metrics endpoint on a fixed interval (15s in the prometheus.yml below; Prometheus's built-in default is 1m)
Advantages: Simple backend implementation, Prometheus controls rate
Trade-off: Requires backend to be reachable by Prometheus
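On the pull side, the backend only has to keep values in memory and render them when scraped. The sketch below uses a hypothetical `CounterRegistry` (not `CustomMetricsService`) to render counters in the Prometheus text exposition format that a `GET /metrics` request returns:

```typescript
// Hypothetical sketch of pull-model metrics: values live in memory and are
// rendered in the Prometheus text exposition format only when scraped.
function escapeLabelValue(v: string): string {
  return v.replace(/\\/g, '\\\\').replace(/\n/g, '\\n').replace(/"/g, '\\"');
}

// Sorting label names gives each label set one stable series key.
function formatLabels(labels: Record<string, string>): string {
  const names = Object.keys(labels).sort();
  if (names.length === 0) return '';
  return '{' + names.map((n) => `${n}="${escapeLabelValue(labels[n])}"`).join(',') + '}';
}

class CounterRegistry {
  private readonly help = new Map<string, string>();
  private readonly series = new Map<string, Map<string, number>>();

  register(name: string, help: string): void {
    this.help.set(name, help);
    if (!this.series.has(name)) this.series.set(name, new Map());
  }

  inc(name: string, labels: Record<string, string> = {}, by = 1): void {
    const byLabels = this.series.get(name);
    if (!byLabels) throw new Error(`Counter ${name} is not registered`);
    if (by < 0) throw new Error('Counters can only go up');
    const key = formatLabels(labels);
    byLabels.set(key, (byLabels.get(key) ?? 0) + by);
  }

  // The body a GET /metrics request would return.
  render(): string {
    const lines: string[] = [];
    this.series.forEach((byLabels, name) => {
      lines.push(`# HELP ${name} ${this.help.get(name)}`);
      lines.push(`# TYPE ${name} counter`);
      byLabels.forEach((value, key) => lines.push(`${name}${key} ${value}`));
    });
    return lines.join('\n') + '\n';
  }
}
```

Nothing leaves the process until Prometheus asks for it, which is why the backend side of the pull model stays simple.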

Environment Variables

All observability features are controlled via environment variables defined in backend/src/common/config/secrets.ts.

Feature Flags

# Enable/disable each observability component independently
LOGGING_ENABLED=true    # Enable structured logging to Loki
TRACING_ENABLED=false   # Enable distributed tracing to Tempo
METRICS_ENABLED=false   # Enable metrics collection for Prometheus
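A minimal sketch of how such flags might be parsed, assuming the defaults shown above and strict `true`/`false`/`1`/`0` values. `loadObservabilityFlags` is hypothetical; the real parsing in `secrets.ts` may differ:

```typescript
// Hypothetical sketch: loadObservabilityFlags is illustrative; the real
// parsing lives in backend/src/common/config/secrets.ts and may differ.
type Env = Record<string, string | undefined>;

interface ObservabilityFlags {
  loggingEnabled: boolean;
  tracingEnabled: boolean;
  metricsEnabled: boolean;
}

// Accepts true/false/1/0 (any case); unset or empty falls back to the default.
function parseBool(name: string, value: string | undefined, fallback: boolean): boolean {
  if (value === undefined || value.trim() === '') return fallback;
  const v = value.trim().toLowerCase();
  if (v === 'true' || v === '1') return true;
  if (v === 'false' || v === '0') return false;
  throw new Error(`${name} must be true or false, got "${value}"`);
}

function loadObservabilityFlags(env: Env): ObservabilityFlags {
  return {
    // Assumed defaults, matching the example values above.
    loggingEnabled: parseBool('LOGGING_ENABLED', env.LOGGING_ENABLED, true),
    tracingEnabled: parseBool('TRACING_ENABLED', env.TRACING_ENABLED, false),
    metricsEnabled: parseBool('METRICS_ENABLED', env.METRICS_ENABLED, false),
  };
}
```

Called as `loadObservabilityFlags(process.env)`, a typo such as `LOGGING_ENABLED=yes` fails at startup instead of silently disabling a component.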

Service Configuration

# Application metadata
SERVICE_NAME=yew-backend
NODE_ENV=development
HOSTNAME=localhost

Observability Endpoints

# Loki - Log aggregation (Push Model)
# Backend pushes logs to this URL
LOKI_URL=http://localhost:3100

# Tempo - Distributed tracing (Push Model)
# Backend pushes traces via OTLP to this URL
TEMPO_URL=http://localhost:4318

# Prometheus - Metrics (Pull Model)
# Prometheus scrapes the /metrics endpoint
# No URL needed - configure in prometheus.yml instead
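The backend needs full endpoint URLs, not just base URLs. A hypothetical helper (`resolveEndpoints` is illustrative, and its fallbacks assume the local ports above) might append the Loki push path and the OTLP/HTTP traces path:

```typescript
// Hypothetical helper; the fallbacks assume the local ports used in this guide.
// There is no Prometheus entry because Prometheus pulls from /metrics.
interface ObservabilityEndpoints {
  lokiPushUrl: string;
  tempoTracesUrl: string;
}

// Join a base URL and an absolute path without doubling the slash.
function joinUrl(base: string, path: string): string {
  return base.replace(/\/+$/, '') + path;
}

function resolveEndpoints(env: Record<string, string | undefined>): ObservabilityEndpoints {
  const lokiBase = env.LOKI_URL || 'http://localhost:3100';
  const tempoBase = env.TEMPO_URL || 'http://localhost:4318';
  return {
    lokiPushUrl: joinUrl(lokiBase, '/loki/api/v1/push'), // Loki push API
    tempoTracesUrl: joinUrl(tempoBase, '/v1/traces'), // OTLP/HTTP traces path
  };
}
```

If `TEMPO_URL` is handed to OpenTelemetry's OTLP HTTP trace exporter through its `url` option, that option expects the full path including `/v1/traces`.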

Local Development Setup

Prerequisites

You need to run Loki, Prometheus, Tempo, and Grafana locally. The recommended approach is using Docker Compose.

Docker Compose Configuration

Create a docker-compose.observability.yml file:

version: '3.8'

services:
  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/local-config.yaml
    volumes:
      - loki-data:/loki

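  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    # Lets the container reach the backend on the host via host.docker.internal
    # (required on Linux; Docker Desktop provides this name automatically)
    extra_hosts:
      - "host.docker.internal:host-gateway"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'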

  tempo:
    image: grafana/tempo:latest
    ports:
      - "4318:4318"  # OTLP HTTP
      - "3200:3200"  # Tempo API
    command: -config.file=/etc/tempo/tempo.yaml
    volumes:
      - ./tempo.yaml:/etc/tempo/tempo.yaml
      - tempo-data:/tmp/tempo

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
    volumes:
      - grafana-data:/var/lib/grafana

volumes:
  loki-data:
  prometheus-data:
  tempo-data:
  grafana-data:

Prometheus Configuration

Create prometheus.yml:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'yew-backend'
    static_configs:
      - targets: ['host.docker.internal:8443']
    metrics_path: '/metrics'

Tempo Configuration

Create tempo.yaml:

server:
  http_listen_port: 3200

distributor:
  receivers:
    otlp:
      protocols:
        http:
          endpoint: 0.0.0.0:4318

storage:
  trace:
    backend: local
    local:
      path: /tmp/tempo/traces

query_frontend:
  search:
    enabled: true

Starting the Stack

docker compose -f docker-compose.observability.yml up -d

Accessing the Services

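Grafana: http://localhost:3000 (add Loki, Prometheus, and Tempo as data sources using the Compose service names: http://loki:3100, http://prometheus:9090, http://tempo:3200)
Prometheus: http://localhost:9090
Loki: http://localhost:3100 (API only; query logs through Grafana)
Tempo: http://localhost:3200 (API only; query traces through Grafana)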
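To confirm the stack is up before starting the backend, a small script can poll each service's readiness or health endpoint (`/ready` for Loki and Tempo, `/-/ready` for Prometheus, `/api/health` for Grafana). This is a sketch; `checkStack` is hypothetical and takes the fetch function as a parameter so it can be tested without network access:

```typescript
// Hypothetical smoke test for the local stack; URLs assume the ports above.
type FetchLike = (url: string) => Promise<{ ok: boolean; status: number }>;

const CHECKS: { name: string; url: string }[] = [
  { name: 'Loki', url: 'http://localhost:3100/ready' },
  { name: 'Prometheus', url: 'http://localhost:9090/-/ready' },
  { name: 'Tempo', url: 'http://localhost:3200/ready' },
  { name: 'Grafana', url: 'http://localhost:3000/api/health' },
];

// Runs all checks in parallel and reports a status string per service.
async function checkStack(fetchImpl: FetchLike): Promise<Record<string, string>> {
  const results: Record<string, string> = {};
  await Promise.all(
    CHECKS.map(async ({ name, url }) => {
      try {
        const res = await fetchImpl(url);
        results[name] = res.ok ? 'ready' : `not ready (HTTP ${res.status})`;
      } catch {
        // Connection refused, DNS failure, etc.
        results[name] = 'unreachable';
      }
    }),
  );
  return results;
}
```

Run it with `checkStack(fetch).then(console.log)`. Loki in particular can report not ready for a short time after it starts, so retry before concluding something is wrong.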

Using Observability Services

Logging

The CustomLoggerService provides structured logging with context.

import { Injectable } from '@nestjs/common';
import { CustomLoggerService, LogContext } from './common/logger/logger.service';

@Injectable()
export class MyService {
  constructor(private readonly logger: CustomLoggerService) {}

  async doSomething() {
    const context: LogContext = {
      module: 'MyService',
      method: 'doSomething',
      userId: '123',
    };

    this.logger.logWithContext('Starting operation', context);

    try {
      // ... your code
      this.logger.logWithContext('Operation completed', context);
    } catch (error) {
      // `error` is `unknown` under strict TypeScript, so narrow before reading .stack
      this.logger.errorWithContext(
        'Operation failed',
        error instanceof Error ? error.stack : String(error),
        context
      );
      throw error; // let the caller (or an exception filter) decide how to respond
    }
  }
}

Log Levels:

logWithContext() - Info level
errorWithContext() - Error level
warnWithContext() - Warning level
debugWithContext() - Debug level
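For reference, a structured log record is usually one JSON object per line. The sketch below is hypothetical (its `LogContext` is a simplified stand-in following the example above; the actual `CustomLoggerService` format may differ) and also shows a minimum-level filter of the kind used to keep debug logs out of production:

```typescript
// Hypothetical sketch; LogContext here is a simplified stand-in, not the
// type exported by logger.service.ts.
type LogLevel = 'debug' | 'info' | 'warn' | 'error';

interface LogContext {
  module?: string;
  method?: string;
  userId?: string;
  requestId?: string;
  [key: string]: unknown;
}

const LEVEL_RANK: Record<LogLevel, number> = { debug: 10, info: 20, warn: 30, error: 40 };

// Drop records below the configured minimum (e.g. minLevel 'info' in production).
function shouldLog(level: LogLevel, minLevel: LogLevel): boolean {
  return LEVEL_RANK[level] >= LEVEL_RANK[minLevel];
}

// One JSON object per line, so Loki's LogQL `json` parser can extract fields.
// Fixed fields are spread last so context keys cannot overwrite them.
function formatLogLine(level: LogLevel, message: string, context: LogContext, now: Date = new Date()): string {
  return JSON.stringify({ ...context, timestamp: now.toISOString(), level, message });
}
```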

Metrics

The CustomMetricsService provides Prometheus-compatible metrics.

import { Injectable } from '@nestjs/common';
import { CustomMetricsService } from './common/metrics/metrics.service';

@Injectable()
export class MyService {
  constructor(private readonly metrics: CustomMetricsService) {}

  async processItem(itemType: string) {
    // Set a gauge to a current value (e.g. items waiting in a queue)
    this.metrics.setGauge('queue_size', 42);

    // Time the work and record its duration in a histogram.
    // timeAsync is assumed to resolve to the callback's return value.
    const result = await this.metrics.timeAsync(
      'process_duration_seconds',
      async () => {
        // ... your code
        return { itemType, processed: true };
      },
      { type: itemType }
    );

    // Increment a counter only after the work has actually succeeded
    this.metrics.incrementCounter('items_processed_total', {
      type: itemType,
      status: 'success',
    });

    return result;
  }
}

Metric Types:

incrementCounter() - Monotonically increasing counter
setGauge() / incrementGauge() / decrementGauge() - Current value gauge
recordHistogram() - Distribution of values (e.g., durations, sizes)
recordSummary() - Summary with percentiles
timeAsync() - Automatically time async function execution
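A `timeAsync`-style helper typically wraps the callback, measures elapsed time, and feeds a histogram whether the callback succeeds or throws. This is a simplified, hypothetical sketch; the bucket bounds and the `Histogram` class are assumptions, not `CustomMetricsService` internals:

```typescript
// Hypothetical sketch of a histogram fed by a timing wrapper.
class Histogram {
  readonly bounds: number[];
  readonly bucketCounts: number[];
  sum = 0;
  count = 0;

  constructor(bounds: number[] = [0.005, 0.01, 0.05, 0.1, 0.5, 1, 5]) {
    this.bounds = [...bounds].sort((a, b) => a - b);
    this.bucketCounts = this.bounds.map(() => 0);
  }

  observe(value: number): void {
    this.sum += value;
    this.count += 1;
    // Buckets are cumulative: each "le" bucket counts every observation
    // less than or equal to its upper bound (+Inf is simply `count`).
    this.bounds.forEach((bound, i) => {
      if (value <= bound) this.bucketCounts[i] += 1;
    });
  }
}

// Times fn and records its duration in seconds, even if fn throws.
async function timeAsync<T>(histogram: Histogram, fn: () => Promise<T>): Promise<T> {
  const start = Date.now();
  try {
    return await fn();
  } finally {
    histogram.observe((Date.now() - start) / 1000);
  }
}
```

Recording in `finally` matters: if only successful calls were timed, slow failures (timeouts, for example) would vanish from the latency histogram.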

Tracing

The CustomTracingService provides distributed tracing using OpenTelemetry.

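import { Injectable } from '@nestjs/common';
import { SpanStatusCode } from '@opentelemetry/api';
import { CustomTracingService } from './common/tracing/tracing.service';

@Injectable()
export class MyService {
  constructor(private readonly tracing: CustomTracingService) {}

  async processOrder(orderId: string) {
    // Create a span for the entire operation
    return await this.tracing.traceAsync(
      'MyService.processOrder',
      async (span) => {
        span.setAttributes({
          'order.id': orderId,
          'order.type': 'standard',
        });

        // ... your code

        // Add events to mark milestones within the span
        span.addEvent('Order validated');

        // ... more code

        return { orderId, status: 'processed' };
      },
      { orderId }
    );
  }

  async callExternalService() {
    // Manual span management: you are responsible for ending the span
    const span = this.tracing.startSpan('callExternalService');

    try {
      const result = await fetch('https://api.example.com/data');
      // fetch only rejects on network errors, so treat HTTP errors explicitly
      if (!result.ok) {
        throw new Error(`Upstream responded with HTTP ${result.status}`);
      }
      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (error) {
      // `error` is `unknown` under strict TypeScript
      span.recordException(error instanceof Error ? error : String(error));
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw error;
    } finally {
      // Always end the span; spans that are never ended are never exported
      span.end();
    }
  }
}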
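Both methods above follow the same lifecycle: start a span, run the work, mark failures, and always end the span. The dependency-free sketch below illustrates that lifecycle; `withSpan`, `SimpleSpan`, and the `finishedSpans` array are hypothetical stand-ins, not the OpenTelemetry API or `CustomTracingService`:

```typescript
// Hypothetical, dependency-free illustration of the span lifecycle.
type SpanStatus = 'UNSET' | 'OK' | 'ERROR';

interface SimpleSpan {
  name: string;
  attributes: Record<string, string | number | boolean>;
  events: string[];
  status: SpanStatus;
  error?: Error;
  startMs: number;
  endMs?: number;
}

// Stand-in for an exporter; a real SDK batches finished spans and sends them to Tempo.
const finishedSpans: SimpleSpan[] = [];

async function withSpan<T>(
  name: string,
  attributes: SimpleSpan['attributes'],
  fn: (span: SimpleSpan) => Promise<T>,
): Promise<T> {
  const span: SimpleSpan = { name, attributes: { ...attributes }, events: [], status: 'UNSET', startMs: Date.now() };
  try {
    const result = await fn(span);
    span.status = 'OK';
    return result;
  } catch (err) {
    span.status = 'ERROR';
    span.error = err instanceof Error ? err : new Error(String(err));
    throw err; // re-throw so the caller still sees the failure
  } finally {
    span.endMs = Date.now(); // ending the span is what makes it exportable
    finishedSpans.push(span);
  }
}
```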

CLI Commands

All observability features are automatically disabled when running CLI commands. This is configured in backend/src/cli/cli.ts:

// Disable observability for CLI commands
process.env.LOGGING_ENABLED = 'false';
process.env.TRACING_ENABLED = 'false';
process.env.METRICS_ENABLED = 'false';

This prevents unnecessary noise during administrative tasks and ensures CLI commands run without dependency on external services.

Interceptors

The application uses interceptors to automatically collect observability data from HTTP requests:

ContextInterceptor (always enabled) - Stores request context in AsyncLocalStorage
TracingInterceptor (when TRACING_ENABLED=true) - Extracts trace context from headers
MetricsInterceptor (when METRICS_ENABLED=true) - Records HTTP metrics
ResponseInterceptor (always enabled) - Adds response metadata

The interceptor chain is configured in backend/src/main.ts and automatically respects the feature flags.
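The core of a ContextInterceptor-style component is Node's `AsyncLocalStorage`. A minimal sketch, assuming a context with only `requestId` and `userId`; `runWithRequestContext` and `currentRequestId` are hypothetical helper names:

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';
import { randomUUID } from 'node:crypto';

// Hypothetical sketch: per-request data stored in AsyncLocalStorage is visible
// to any code running inside the request (loggers, metrics) without being
// passed explicitly through every call.
interface RequestContext {
  requestId: string;
  userId?: string;
}

const requestContext = new AsyncLocalStorage<RequestContext>();

// Wrap the handling of one request; everything awaited inside sees the context.
function runWithRequestContext<T>(ctx: Partial<RequestContext>, handler: () => T): T {
  return requestContext.run({ requestId: ctx.requestId ?? randomUUID(), userId: ctx.userId }, handler);
}

// Read the current request's id, or undefined outside a request.
function currentRequestId(): string | undefined {
  return requestContext.getStore()?.requestId;
}
```

An interceptor would call `runWithRequestContext` around the request handler, after which the logger can attach `currentRequestId()` to every record.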

Best Practices

When to Log

Info: Normal operations, state changes, significant events
Warning: Recoverable errors, deprecated usage, unexpected but handled conditions
Error: Unhandled exceptions, critical failures, data loss
Debug: Detailed debugging information (usually disabled in production)

When to Add Metrics

Counters: Events that happen (requests, errors, items processed)
Gauges: Current state (queue size, active connections, memory usage)
Histograms: Distributions (request durations, payload sizes)

When to Create Spans

Service boundaries: Calls to external services, databases, queues
Business operations: Key business logic functions
Complex workflows: Multi-step processes that benefit from visualization

Context Propagation

All three components carry contextual information that lets you group and correlate data:

Logging: LogContext carries requestId, userId, and module info
Metrics: Labels allow grouping and filtering (e.g., by status, endpoint)
Tracing: OpenTelemetry propagates trace context between services via W3C Trace Context headers (traceparent and tracestate); TracingInterceptor extracts it from incoming requests
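The W3C Trace Context `traceparent` header has the form `version-traceid-parentid-flags`. The sketch below parses and formats version `00` headers only; it illustrates the wire format and is not how OpenTelemetry's propagator is implemented:

```typescript
// Sketch of the W3C "traceparent" header (version 00 only).
interface TraceParent {
  traceId: string; // 16 bytes, 32 lowercase hex chars
  parentId: string; // 8 bytes, 16 lowercase hex chars (the caller's span id)
  sampled: boolean; // bit 0 of trace-flags
}

const TRACEPARENT_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header: string): TraceParent | null {
  const m = TRACEPARENT_RE.exec(header.trim());
  if (!m) return null;
  const [, traceId, parentId, flags] = m;
  // All-zero trace or parent ids are invalid per the spec.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return { traceId, parentId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}

function formatTraceparent(tp: TraceParent): string {
  return `00-${tp.traceId}-${tp.parentId}-${tp.sampled ? '01' : '00'}`;
}
```

When the backend calls another service, it sends a new `traceparent` with the same `traceId` and its own span id as `parentId`; that shared trace id is what lets Tempo stitch the spans into one trace.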

Troubleshooting

Logs Not Appearing in Loki

Check LOGGING_ENABLED=true in your .env
Verify LOKI_URL is accessible from the backend
Check backend logs for "Failed to send log to Loki" errors
Verify Loki is running: curl http://localhost:3100/ready

Metrics Not Scraped by Prometheus

Check METRICS_ENABLED=true in your .env
Verify /metrics endpoint is accessible: curl http://localhost:8443/metrics
Check Prometheus targets: http://localhost:9090/targets
Verify prometheus.yml has correct backend URL

Traces Not Appearing in Tempo

Check TRACING_ENABLED=true in your .env
Verify TEMPO_URL is accessible from the backend
Check OpenTelemetry initialization in main.ts
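Verify Tempo is running: curl http://localhost:3200/ready (the OTLP endpoint on port 4318 only accepts POST requests, so a plain curl to /v1/traces cannot confirm that data is arriving; search for recent traces in Grafana instead)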

Performance Impact

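Each observability component is designed to keep its overhead low:

Logging: Asynchronous, fire-and-forget HTTP calls that do not block request handling
Metrics: In-memory counters and gauges, serialized only when Prometheus scrapes /metrics
Tracing: Roughly 1-2% overhead with proper sampling; the cost grows with the number of spans created and exported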

For production, consider:

Sampling traces (not every request needs to be traced)
Using appropriate log levels (avoid debug logs in production)
Monitoring metric cardinality (avoid unbounded label values)
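Trace sampling is usually decided from the trace id so that every service handling a request makes the same keep-or-drop decision. The sketch below is a simplified ratio sampler in the spirit of OpenTelemetry's `TraceIdRatioBasedSampler`, not its exact algorithm; `shouldSample` is hypothetical:

```typescript
// Simplified ratio-based sampler (not OpenTelemetry's exact algorithm).
// Because the decision depends only on the trace id, it is deterministic:
// every service that sees the same trace reaches the same answer.
function shouldSample(traceId: string, ratio: number): boolean {
  if (!/^[0-9a-f]{32}$/.test(traceId)) throw new Error('traceId must be 32 lowercase hex chars');
  if (ratio <= 0) return false;
  if (ratio >= 1) return true;
  // Treat the last 8 hex chars (32 bits) of the id as a uniform random number.
  const value = parseInt(traceId.slice(-8), 16);
  return value < ratio * 0x100000000;
}
```

With a ratio of 0.1, roughly one trace in ten is kept, and the decision never varies for a given trace id.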
