
Observability

Yew Backend implements a comprehensive observability stack using self-hosted open-source solutions. The three pillars of observability are:

  1. Logging - Structured logs aggregated in Loki
  2. Metrics - Time-series metrics collected by Prometheus
  3. Tracing - Distributed traces stored in Tempo

All observability features can be individually enabled or disabled via environment variables.

Architecture Overview

The Three Pillars

┌────────────────────────────────────────────────────────┐
│                      Yew Backend                       │
│                                                        │
│  ┌────────────┐    ┌────────────┐      ┌────────────┐  │
│  │  Logging   │    │  Metrics   │      │  Tracing   │  │
│  │   (Push)   │    │   (Pull)   │      │   (Push)   │  │
│  └─────┬──────┘    └──────▲─────┘      └─────┬──────┘  │
│        │                  │                  │         │
└────────┼──────────────────┼──────────────────┼─────────┘
         │                  │                  │
     HTTP POST          HTTP GET           OTLP/HTTP
         │                  │                  │
         ▼                  │                  ▼
    ┌─────────┐        ┌────┴─────┐       ┌─────────┐
    │  Loki   │        │Prometheus│       │  Tempo  │
    └────┬────┘        └────┬─────┘       └────┬────┘
         │                  │                  │
         └──────────────────┼──────────────────┘
                     ┌──────▼──────┐
                     │   Grafana   │
                     │ (Dashboard) │
                     └─────────────┘

Push vs Pull Models

Push Model (Logging & Tracing)

  • Backend actively sends data to the aggregation service
  • Loki: Backend sends logs via HTTP POST to /loki/api/v1/push
  • Tempo: Backend sends traces via OTLP to /v1/traces
  • Advantages: Real-time delivery, works behind firewalls
  • Trade-off: Backend must handle network failures gracefully
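
The push flow for logs can be sketched as follows. This is a minimal illustration, not the backend's actual logger: the names buildLokiPayload and pushToLoki are hypothetical, and only the /loki/api/v1/push path and the JSON stream format come from Loki's HTTP API. It assumes a runtime with a global fetch (Node 18 or later).

```typescript
// Hypothetical helper names; only the push path and payload shape are Loki's.
type LokiLabels = Record<string, string>;

interface LokiPushPayload {
  streams: Array<{ stream: LokiLabels; values: Array<[string, string]> }>;
}

// Build a payload carrying a single log line.
// Loki expects each timestamp as a string of Unix epoch nanoseconds.
function buildLokiPayload(
  labels: LokiLabels,
  line: string,
  timestampMs: number = Date.now(),
): LokiPushPayload {
  // Appending six zeros converts whole milliseconds to nanoseconds
  // without losing precision to floating-point arithmetic.
  const timestampNs = `${Math.trunc(timestampMs)}000000`;
  return { streams: [{ stream: labels, values: [[timestampNs, line]] }] };
}

// Fire-and-forget push: a Loki outage should never break the request path,
// so failures are reported locally and then dropped.
async function pushToLoki(lokiUrl: string, payload: LokiPushPayload): Promise<boolean> {
  try {
    const response = await fetch(new URL('/loki/api/v1/push', lokiUrl).toString(), {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
    });
    return response.ok;
  } catch (error) {
    console.error('Failed to send log to Loki', error);
    return false;
  }
}
```

Returning a boolean instead of throwing keeps the caller free to ignore the outcome, which is one way to meet the "handle network failures gracefully" trade-off listed above.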

Pull Model (Metrics)

  • Aggregation service scrapes data from the backend
  • Prometheus: Scrapes /metrics endpoint on a schedule (typically 15s)
  • Advantages: Simple backend implementation, Prometheus controls rate
  • Trade-off: Requires backend to be reachable by Prometheus
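
To make the pull model concrete, the sketch below renders one counter in the Prometheus text exposition format, which is what a scrape of /metrics returns. The function names are hypothetical and the real CustomMetricsService may delegate this to a client library; the sketch only shows the wire format.

```typescript
// Escape backslashes, double quotes, and newlines, as the text format requires
// inside label values.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, '\\\\').replace(/"/g, '\\"').replace(/\n/g, '\\n');
}

// Render a single counter sample, preceded by its TYPE line.
function renderCounter(
  name: string,
  labels: Record<string, string>,
  value: number,
): string {
  const labelText = Object.entries(labels)
    .map(([key, labelValue]) => `${key}="${escapeLabelValue(labelValue)}"`)
    .join(',');
  const series = labelText ? `${name}{${labelText}}` : name;
  return `# TYPE ${name} counter\n${series} ${value}\n`;
}
```

Because the backend only has to serve this text on request, it needs no knowledge of where Prometheus runs or how often it scrapes.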

Environment Variables

All observability features are controlled via environment variables defined in backend/src/common/config/secrets.ts.

Feature Flags

# Enable/disable each observability component independently
LOGGING_ENABLED=true # Enable structured logging to Loki
TRACING_ENABLED=false # Enable distributed tracing to Tempo
METRICS_ENABLED=false # Enable metrics collection for Prometheus
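
Environment variables are always strings, so the flags above need explicit parsing. The sketch below shows one way to do it; parseFlag and readObservabilityFlags are hypothetical names, and the defaults simply mirror the sample values shown here rather than the actual contents of secrets.ts.

```typescript
// Parse a boolean feature flag from an environment variable.
// Note that Boolean('false') is true in JavaScript, so compare the text instead.
function parseFlag(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined || value.trim() === '') {
    return fallback;
  }
  return value.trim().toLowerCase() === 'true';
}

interface ObservabilityFlags {
  logging: boolean;
  tracing: boolean;
  metrics: boolean;
}

// Defaults are assumptions taken from the example values in this section.
function readObservabilityFlags(
  env: Record<string, string | undefined>,
): ObservabilityFlags {
  return {
    logging: parseFlag(env.LOGGING_ENABLED, true),
    tracing: parseFlag(env.TRACING_ENABLED, false),
    metrics: parseFlag(env.METRICS_ENABLED, false),
  };
}
```

Calling readObservabilityFlags(process.env) once at startup gives the rest of the application a typed view of the three switches.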

Service Configuration

# Application metadata
SERVICE_NAME=yew-backend
NODE_ENV=development
HOSTNAME=localhost

Observability Endpoints

# Loki - Log aggregation (Push Model)
# Backend pushes logs to this URL
LOKI_URL=http://localhost:3100

# Tempo - Distributed tracing (Push Model)
# Backend pushes traces via OTLP to this URL
TEMPO_URL=http://localhost:4318

# Prometheus - Metrics (Pull Model)
# Prometheus scrapes the /metrics endpoint
# No URL needed - configure in prometheus.yml instead
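
The two base URLs combine with the push paths named earlier (/loki/api/v1/push and /v1/traces). A small sketch of that derivation follows; resolvePushEndpoints is a hypothetical helper, and the fallback values are the local defaults shown above.

```typescript
interface PushEndpoints {
  lokiPush: string;
  tempoTraces: string;
}

// Derive the full push URLs from the configured base URLs.
// new URL(path, base) tolerates a trailing slash on the base.
function resolvePushEndpoints(
  env: Record<string, string | undefined>,
): PushEndpoints {
  const lokiBase = env.LOKI_URL || 'http://localhost:3100';
  const tempoBase = env.TEMPO_URL || 'http://localhost:4318';
  return {
    lokiPush: new URL('/loki/api/v1/push', lokiBase).toString(),
    tempoTraces: new URL('/v1/traces', tempoBase).toString(),
  };
}
```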

Local Development Setup

Prerequisites

You need to run Loki, Prometheus, Tempo, and Grafana locally. The recommended approach is using Docker Compose.

Docker Compose Configuration

Create a docker-compose.observability.yml file:

version: '3.8'

services:
  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/local-config.yaml
    volumes:
      - loki-data:/loki

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'

  tempo:
    image: grafana/tempo:latest
    ports:
      - "4318:4318" # OTLP HTTP
      - "3200:3200" # Tempo API
    command: -config.file=/etc/tempo/tempo.yaml
    volumes:
      - ./tempo.yaml:/etc/tempo/tempo.yaml
      - tempo-data:/tmp/tempo

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
    volumes:
      - grafana-data:/var/lib/grafana

volumes:
  loki-data:
  prometheus-data:
  tempo-data:
  grafana-data:

Prometheus Configuration

Create prometheus.yml:

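global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'yew-backend'
    metrics_path: '/metrics'
    static_configs:
      # host.docker.internal resolves out of the box on Docker Desktop.
      # On Linux, add extra_hosts: ["host.docker.internal:host-gateway"]
      # to the prometheus service in the Compose file.
      - targets: ['host.docker.internal:8443']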

Tempo Configuration

Create tempo.yaml:

server:
  http_listen_port: 3200

distributor:
  receivers:
    otlp:
      protocols:
        http:
          endpoint: 0.0.0.0:4318

storage:
  trace:
    backend: local
    local:
      path: /tmp/tempo/traces

query_frontend:
  search:
    enabled: true

Starting the Stack

docker-compose -f docker-compose.observability.yml up -d

Accessing the Services

With the port mappings from the Compose file above, the services are reachable at:

  • Grafana: http://localhost:3000 (anonymous access with the Admin role)
  • Prometheus: http://localhost:9090
  • Loki: http://localhost:3100
  • Tempo: http://localhost:3200 (API), with OTLP/HTTP ingestion on http://localhost:4318

Using Observability Services

Logging

The CustomLoggerService provides structured logging with context.

import { CustomLoggerService, LogContext } from './common/logger/logger.service';

export class MyService {
  constructor(private readonly logger: CustomLoggerService) {}

  async doSomething() {
    // Context fields are attached to every log line emitted below
    const context: LogContext = {
      module: 'MyService',
      method: 'doSomething',
      userId: '123',
    };

    this.logger.logWithContext('Starting operation', context);

    try {
      // ... your code
      this.logger.logWithContext('Operation completed', context);
    } catch (error) {
      // A caught value is `unknown` in strict TypeScript, so narrow it before reading `.stack`
      const stack = error instanceof Error ? error.stack : String(error);
      this.logger.errorWithContext('Operation failed', stack, context);
    }
  }
}

Log Levels:

  • logWithContext() - Info level
  • errorWithContext() - Error level
  • warnWithContext() - Warning level
  • debugWithContext() - Debug level

Metrics

The CustomMetricsService provides Prometheus-compatible metrics.

import { CustomMetricsService } from './common/metrics/metrics.service';

export class MyService {
  constructor(private readonly metrics: CustomMetricsService) {}

  async processItem(itemType: string) {
    // Increment a counter
    this.metrics.incrementCounter('items_processed_total', {
      type: itemType,
      status: 'success',
    });

    // Set a gauge
    this.metrics.setGauge('queue_size', 42);

    // Record histogram (e.g., duration)
    const duration = await this.metrics.timeAsync(
      'process_duration_seconds',
      async () => {
        // ... your code
        return result;
      },
      { type: itemType }
    );
  }
}

Metric Types:

  • incrementCounter() - Monotonically increasing counter
  • setGauge() / incrementGauge() / decrementGauge() - Current value gauge
  • recordHistogram() - Distribution of values (e.g., durations, sizes)
  • recordSummary() - Summary with percentiles
  • timeAsync() - Automatically time async function execution
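
A helper like timeAsync() can be sketched as below. This is an assumption about how such a helper might work, not the CustomMetricsService implementation: the recorder callback and the injectable clock are hypothetical parameters added so the sketch is self-contained, and it assumes the helper returns the wrapped function's result.

```typescript
type MetricLabels = Record<string, string>;
type HistogramRecorder = (name: string, seconds: number, labels: MetricLabels) => void;

// Run `fn`, then record its wall-clock duration in seconds.
// The `finally` block records the duration even when `fn` throws,
// so failed operations still show up in the latency distribution.
async function timeAsync<T>(
  name: string,
  fn: () => Promise<T>,
  labels: MetricLabels,
  record: HistogramRecorder,
  now: () => number = () => Date.now(),
): Promise<T> {
  const startedAtMs = now();
  try {
    return await fn();
  } finally {
    record(name, (now() - startedAtMs) / 1000, labels);
  }
}
```

Durations are recorded in seconds because that is the base unit Prometheus conventions recommend, which also matches the process_duration_seconds name used above.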

Tracing

The CustomTracingService provides distributed tracing using OpenTelemetry.

import { SpanStatusCode } from '@opentelemetry/api';
import { CustomTracingService } from './common/tracing/tracing.service';

export class MyService {
  constructor(private readonly tracing: CustomTracingService) {}

  async processOrder(orderId: string) {
    // Create a span for the entire operation
    return await this.tracing.traceAsync(
      'MyService.processOrder',
      async (span) => {
        span.setAttributes({
          'order.id': orderId,
          'order.type': 'standard',
        });

        // ... your code

        // Add events to the span
        span.addEvent('Order validated');

        // ... more code

        return result;
      },
      { orderId }
    );
  }

  async callExternalService() {
    // Manual span management: the caller is responsible for ending the span
    const span = this.tracing.startSpan('callExternalService');

    try {
      const result = await fetch('https://api.example.com/data');
      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (error) {
      // Attach the exception to the span and mark it failed before rethrowing
      span.recordException(error as Error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw error;
    } finally {
      // Always end the span, otherwise it is never exported
      span.end();
    }
  }
}

CLI Commands

All observability features are automatically disabled when running CLI commands. This is configured in backend/src/cli/cli.ts:

// Disable observability for CLI commands
process.env.LOGGING_ENABLED = 'false';
process.env.TRACING_ENABLED = 'false';
process.env.METRICS_ENABLED = 'false';

This prevents unnecessary noise during administrative tasks and ensures CLI commands run without dependency on external services.

Interceptors

The application uses interceptors to automatically collect observability data from HTTP requests:

  1. ContextInterceptor (always enabled) - Stores request context in AsyncLocalStorage
  2. TracingInterceptor (when TRACING_ENABLED=true) - Extracts trace context from headers
  3. MetricsInterceptor (when METRICS_ENABLED=true) - Records HTTP metrics
  4. ResponseInterceptor (always enabled) - Adds response metadata

The interceptor chain is configured in backend/src/main.ts and automatically respects the feature flags.
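
The idea behind ContextInterceptor can be illustrated with Node's AsyncLocalStorage. The sketch below is not the interceptor itself: RequestContext, runWithContext, and currentRequestId are hypothetical names, and the real context likely carries more fields.

```typescript
import { AsyncLocalStorage } from 'async_hooks';

// Hypothetical shape of the per-request context.
interface RequestContext {
  requestId: string;
  userId?: string;
}

const requestContextStorage = new AsyncLocalStorage<RequestContext>();

// Run a handler with the given context bound to its whole async call chain.
function runWithContext<T>(context: RequestContext, handler: () => T): T {
  return requestContextStorage.run(context, handler);
}

// Read the current request ID from anywhere below the handler,
// without threading it through every function signature.
function currentRequestId(): string | undefined {
  const store = requestContextStorage.getStore();
  return store ? store.requestId : undefined;
}
```

An interceptor would call runWithContext once per incoming request; loggers and metrics code further down the call chain can then read the context without receiving it as an argument.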

Best Practices

When to Log

  • Info: Normal operations, state changes, significant events
  • Warning: Recoverable errors, deprecated usage, unexpected but handled conditions
  • Error: Unhandled exceptions, critical failures, data loss
  • Debug: Detailed debugging information (usually disabled in production)

When to Add Metrics

  • Counters: Events that happen (requests, errors, items processed)
  • Gauges: Current state (queue size, active connections, memory usage)
  • Histograms: Distributions (request durations, payload sizes)

When to Create Spans

  • Service boundaries: Calls to external services, databases, queues
  • Business operations: Key business logic functions
  • Complex workflows: Multi-step processes that benefit from visualization

Context Propagation

All three observability components support context propagation:

  • Logging: LogContext carries requestId, userId, module info
  • Metrics: Labels allow grouping and filtering (e.g., by status, endpoint)
  • Tracing: OpenTelemetry automatically propagates trace context via W3C Trace Context headers
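
For reference, the W3C Trace Context traceparent header has the form version-traceid-parentid-flags. The sketch below parses it by hand to show what is being propagated; in the backend this is handled by OpenTelemetry, and parseTraceParent is a hypothetical name. It is deliberately strict and accepts only the four-field layout used by version 00.

```typescript
interface TraceParent {
  version: string;
  traceId: string;
  parentId: string;
  sampled: boolean;
}

// Four lowercase-hex fields: 2, 32, 16, and 2 characters long.
const TRACEPARENT_PATTERN = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceParent(header: string): TraceParent | null {
  const match = TRACEPARENT_PATTERN.exec(header.trim());
  if (!match) {
    return null;
  }
  const [, version, traceId, parentId, flags] = match;
  // Version ff and all-zero IDs are invalid per the specification.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(parentId)) {
    return null;
  }
  // The lowest bit of the flags byte is the "sampled" flag.
  return { version, traceId, parentId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}
```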

Troubleshooting

Logs Not Appearing in Loki

  1. Check LOGGING_ENABLED=true in your .env
  2. Verify LOKI_URL is accessible from the backend
  3. Check backend logs for "Failed to send log to Loki" errors
  4. Verify Loki is running: curl http://localhost:3100/ready

Metrics Not Scraped by Prometheus

  1. Check METRICS_ENABLED=true in your .env
  2. Verify /metrics endpoint is accessible: curl http://localhost:8443/metrics
  3. Check Prometheus targets: http://localhost:9090/targets
  4. Verify prometheus.yml has correct backend URL

Traces Not Appearing in Tempo

  1. Check TRACING_ENABLED=true in your .env
  2. Verify TEMPO_URL is accessible from the backend
  3. Check OpenTelemetry initialization in main.ts
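  4. Verify Tempo is running: curl http://localhost:3200/ready (the OTLP endpoint at http://localhost:4318/v1/traces accepts only POST requests, so a plain GET fails even when Tempo is healthy)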

Performance Impact

Each observability component is designed to add little overhead:

  • Logging: Asynchronous, fire-and-forget HTTP calls
  • Metrics: In-memory counters and gauges, exported on-demand
  • Tracing: ~1-2% overhead with proper sampling

For production, consider:

  • Sampling traces (not every request needs to be traced)
  • Using appropriate log levels (avoid debug logs in production)
  • Monitoring metric cardinality (avoid unbounded label values)
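
One common way to keep label cardinality bounded is to normalize request paths before using them as label values, so that /users/1 and /users/2 do not become separate time series. The sketch below is one possible approach under that assumption; normalizeRouteLabel is a hypothetical helper, and a framework's own route template is preferable when it is available.

```typescript
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Replace numeric and UUID path segments with placeholders and drop the
// query string, so the label takes a small, fixed set of values.
function normalizeRouteLabel(path: string): string {
  const withoutQuery = path.split('?')[0];
  return withoutQuery
    .split('/')
    .map((segment) => {
      if (/^\d+$/.test(segment)) {
        return ':id';
      }
      if (UUID_PATTERN.test(segment)) {
        return ':uuid';
      }
      return segment;
    })
    .join('/');
}
```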
