Observability
Yew Backend implements a comprehensive observability stack using self-hosted open-source solutions. The three pillars of observability are:
- Logging - Structured logs aggregated in Loki
- Metrics - Time-series metrics collected by Prometheus
- Tracing - Distributed traces stored in Tempo
All observability features can be individually enabled or disabled via environment variables.
Architecture Overview
The Three Pillars
┌────────────────────────────────────────────────────────┐
│                      Yew Backend                       │
│                                                        │
│  ┌────────────┐    ┌────────────┐      ┌────────────┐  │
│  │  Logging   │    │  Metrics   │      │  Tracing   │  │
│  │   (Push)   │    │   (Pull)   │      │   (Push)   │  │
│  └─────┬──────┘    └──────▲─────┘      └─────┬──────┘  │
│        │                  │                  │         │
└────────┼──────────────────┼──────────────────┼─────────┘
         │                  │                  │
     HTTP POST          HTTP GET           OTLP/HTTP
         │                  │                  │
         ▼                  │                  ▼
    ┌─────────┐        ┌────┴─────┐       ┌─────────┐
    │  Loki   │        │Prometheus│       │  Tempo  │
    └─────────┘        └──────────┘       └─────────┘
         │                  │                  │
         └──────────────────┴──────────────────┘
                            │
                     ┌──────▼──────┐
                     │   Grafana   │
                     │ (Dashboard) │
                     └─────────────┘
Push vs Pull Models
Push Model (Logging & Tracing)
- Backend actively sends data to the aggregation service
- Loki: Backend sends logs via HTTP POST to /loki/api/v1/push
- Tempo: Backend sends traces via OTLP to /v1/traces
- Advantages: Real-time delivery, works behind firewalls
- Trade-off: Backend must handle network failures gracefully
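The push path and its failure handling can be sketched as follows. This is a minimal illustration, not the backend's actual logger: the names buildLokiPayload, pushLog, and Sender are hypothetical, and only the /loki/api/v1/push path and the "Failed to send log to Loki" message come from this document. The body follows Loki's JSON push format, where each value is a pair of a nanosecond timestamp (as a string) and the log line.

```typescript
// Hypothetical helper names; this is a sketch, not the CustomLoggerService internals.
type LokiLabels = Record<string, string>;

interface LokiPushBody {
  streams: { stream: LokiLabels; values: [string, string][] }[];
}

// Build a push body: each value is [timestamp in nanoseconds as a string, log line].
function buildLokiPayload(labels: LokiLabels, line: string, timestampMs: number): LokiPushBody {
  // Milliseconds to nanoseconds by appending six zeros (avoids precision loss in numbers).
  const timestampNs = String(timestampMs) + '000000';
  return { streams: [{ stream: labels, values: [[timestampNs, line]] }] };
}

type Sender = (url: string, body: string) => Promise<void>;

// Default sender: POST JSON with the global fetch (assumes Node 18+).
const httpSender: Sender = async (url, body) => {
  const res = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body,
  });
  if (!res.ok) throw new Error(`Loki responded with HTTP ${res.status}`);
};

// Push one line; network failures are reported locally and never thrown to the caller.
async function pushLog(
  lokiUrl: string,
  labels: LokiLabels,
  line: string,
  send: Sender = httpSender,
): Promise<boolean> {
  const url = new URL('/loki/api/v1/push', lokiUrl).toString();
  const body = JSON.stringify(buildLokiPayload(labels, line, Date.now()));
  try {
    await send(url, body);
    return true;
  } catch (err) {
    console.error('Failed to send log to Loki:', err);
    return false;
  }
}
```

Injecting the sender keeps the sketch testable without a running Loki; a production logger would more likely batch lines and retry with backoff rather than send one request per line.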
Pull Model (Metrics)
- Aggregation service scrapes data from the backend
- Prometheus: Scrapes the /metrics endpoint on a schedule (typically 15s)
- Advantages: Simple backend implementation, Prometheus controls rate
- Trade-off: Requires backend to be reachable by Prometheus
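To make the pull model concrete, the sketch below renders a counter in the Prometheus text exposition format and serves it on /metrics with Node's built-in http module. It is an illustration only: CounterSample, renderCounter, and startMetricsServer are hypothetical names, and the real backend presumably exposes its metrics through CustomMetricsService instead.

```typescript
import { createServer } from 'node:http';

// Hypothetical shape for one counter series; not the backend's real metrics model.
interface CounterSample {
  name: string;
  help: string;
  labels: Record<string, string>;
  value: number;
}

// Escape backslash, double quote, and newline, as the text format requires for label values.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, '\\\\').replace(/"/g, '\\"').replace(/\n/g, '\\n');
}

// Render one counter as HELP line, TYPE line, and a single sample line.
function renderCounter(sample: CounterSample): string {
  const labelText = Object.keys(sample.labels)
    .map((key) => `${key}="${escapeLabelValue(sample.labels[key])}"`)
    .join(',');
  const series = labelText ? `${sample.name}{${labelText}}` : sample.name;
  return [
    `# HELP ${sample.name} ${sample.help}`,
    `# TYPE ${sample.name} counter`,
    `${series} ${sample.value}`,
  ].join('\n') + '\n';
}

// Serve the current samples on GET /metrics; Prometheus decides when to call it.
function startMetricsServer(port: number, getSamples: () => CounterSample[]) {
  return createServer((req, res) => {
    if (req.url === '/metrics') {
      res.writeHead(200, { 'Content-Type': 'text/plain; version=0.0.4; charset=utf-8' });
      res.end(getSamples().map(renderCounter).join(''));
    } else {
      res.writeHead(404);
      res.end();
    }
  }).listen(port);
}
```

The backend only keeps values in memory and formats them on request, which is why the pull model needs no retry or buffering logic on the application side.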
Environment Variables
All observability features are controlled via environment variables defined in backend/src/common/config/secrets.ts.
Feature Flags
# Enable/disable each observability component independently
LOGGING_ENABLED=true # Enable structured logging to Loki
TRACING_ENABLED=false # Enable distributed tracing to Tempo
METRICS_ENABLED=false # Enable metrics collection for Prometheus
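Environment variables are always strings, so the value 'false' is truthy if tested directly. The sketch below shows one way such flags could be parsed; it is not the contents of secrets.ts, the names parseFlag and readObservabilityFlags are hypothetical, and the choice to treat a missing variable as disabled is an assumption.

```typescript
// Hypothetical parser: only the exact (case-insensitive) string "true" enables a feature.
function parseFlag(raw: string | undefined, fallback: boolean): boolean {
  if (raw === undefined || raw.trim() === '') return fallback;
  return raw.trim().toLowerCase() === 'true';
}

interface ObservabilityFlags {
  logging: boolean;
  tracing: boolean;
  metrics: boolean;
}

// Assumption: a missing variable means the feature is off.
function readObservabilityFlags(env: Record<string, string | undefined>): ObservabilityFlags {
  return {
    logging: parseFlag(env.LOGGING_ENABLED, false),
    tracing: parseFlag(env.TRACING_ENABLED, false),
    metrics: parseFlag(env.METRICS_ENABLED, false),
  };
}
```

In an application this would typically be called once at startup as readObservabilityFlags(process.env).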
Service Configuration
# Application metadata
SERVICE_NAME=yew-backend
NODE_ENV=development
HOSTNAME=localhost
Observability Endpoints
# Loki - Log aggregation (Push Model)
# Backend pushes logs to this URL
LOKI_URL=http://localhost:3100
# Tempo - Distributed tracing (Push Model)
# Backend pushes traces via OTLP to this URL
TEMPO_URL=http://localhost:4318
# Prometheus - Metrics (Pull Model)
# Prometheus scrapes the /metrics endpoint
# No URL needed - configure in prometheus.yml instead
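The base URLs above combine with the push paths described earlier (/loki/api/v1/push and /v1/traces). A small sketch of resolving and validating them at startup, assuming a hypothetical helper named resolvePushEndpoints that is not part of the backend:

```typescript
interface PushEndpoints {
  logs: string;
  traces: string;
}

// Hypothetical startup check: new URL() throws a TypeError on a malformed base URL,
// so misconfiguration surfaces immediately rather than on the first failed push.
function resolvePushEndpoints(env: Record<string, string | undefined>): PushEndpoints {
  const lokiUrl = env.LOKI_URL;
  const tempoUrl = env.TEMPO_URL;
  if (!lokiUrl || !tempoUrl) {
    throw new Error('LOKI_URL and TEMPO_URL must both be set');
  }
  // Note: an absolute path replaces any path already present on the base URL.
  return {
    logs: new URL('/loki/api/v1/push', lokiUrl).toString(),
    traces: new URL('/v1/traces', tempoUrl).toString(),
  };
}
```

With the local defaults shown above, this yields http://localhost:3100/loki/api/v1/push and http://localhost:4318/v1/traces.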
Local Development Setup
Prerequisites
You need to run Loki, Prometheus, Tempo, and Grafana locally. The recommended approach is using Docker Compose.
Docker Compose Configuration
Create a docker-compose.observability.yml file:
version: '3.8'
services:
  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/local-config.yaml
    volumes:
      - loki-data:/loki
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
  tempo:
    image: grafana/tempo:latest
    ports:
      - "4318:4318" # OTLP HTTP
      - "3200:3200" # Tempo API
    command: -config.file=/etc/tempo/tempo.yaml
    volumes:
      - ./tempo.yaml:/etc/tempo/tempo.yaml
      - tempo-data:/tmp/tempo
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
    volumes:
      - grafana-data:/var/lib/grafana
volumes:
  loki-data:
  prometheus-data:
  tempo-data:
  grafana-data:
Prometheus Configuration
Create prometheus.yml:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'yew-backend'
    static_configs:
      - targets: ['host.docker.internal:8443']
    metrics_path: '/metrics'
Tempo Configuration
Create tempo.yaml:
server:
  http_listen_port: 3200
distributor:
  receivers:
    otlp:
      protocols:
        http:
          endpoint: 0.0.0.0:4318
storage:
  trace:
    backend: local
    local:
      path: /tmp/tempo/traces
query_frontend:
  search:
    enabled: true
Starting the Stack
docker-compose -f docker-compose.observability.yml up -d
Accessing the Services
- Grafana: http://localhost:3000
- Prometheus: http://localhost:9090
- Loki: http://localhost:3100
- Tempo: http://localhost:3200
Using Observability Services
Logging
The CustomLoggerService provides structured logging with context.
import { CustomLoggerService, LogContext } from './common/logger/logger.service';
export class MyService {
  constructor(private readonly logger: CustomLoggerService) {}
  async doSomething() {
    const context: LogContext = {
      module: 'MyService',
      method: 'doSomething',
      userId: '123',
    };
    this.logger.logWithContext('Starting operation', context);
    try {
      // ... your code
      this.logger.logWithContext('Operation completed', context);
    } catch (error) {
      // A caught value is `unknown` in strict TypeScript, so narrow it before reading `.stack`
      const stack = error instanceof Error ? error.stack : String(error);
      this.logger.errorWithContext('Operation failed', stack, context);
    }
  }
}
Log Levels:
- logWithContext() - Info level
- errorWithContext() - Error level
- warnWithContext() - Warning level
- debugWithContext() - Debug level
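For orientation, a structured log entry is typically a single JSON object that merges the message with its context fields. The sketch below is an assumption about what such an entry could look like, not the actual output of CustomLoggerService; SketchLogContext and formatLogEntry are hypothetical names, and the field set mirrors only the context fields mentioned in this document.

```typescript
// Hypothetical stand-in for the real LogContext type defined in logger.service.
interface SketchLogContext {
  module: string;
  method?: string;
  userId?: string;
  requestId?: string;
}

type LogLevel = 'debug' | 'info' | 'warn' | 'error';

// Serialize one entry as a JSON line; context fields become top-level keys.
function formatLogEntry(
  level: LogLevel,
  message: string,
  context: SketchLogContext,
  now: Date = new Date(),
): string {
  return JSON.stringify({
    timestamp: now.toISOString(),
    level,
    message,
    ...context,
  });
}
```

Keeping context as top-level keys makes each field easy to extract with a JSON parser at query time, without encoding high-cardinality values such as userId into stream labels.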
Metrics
The CustomMetricsService provides Prometheus-compatible metrics.
import { CustomMetricsService } from './common/metrics/metrics.service';
export class MyService {
  constructor(private readonly metrics: CustomMetricsService) {}
  async processItem(itemType: string) {
    // Increment a counter
    this.metrics.incrementCounter('items_processed_total', {
      type: itemType,
      status: 'success',
    });
    // Set a gauge
    this.metrics.setGauge('queue_size', 42);
    // Record histogram (e.g., duration)
    const duration = await this.metrics.timeAsync(
      'process_duration_seconds',
      async () => {
        // ... your code
        return result;
      },
      { type: itemType }
    );
  }
}
Metric Types:
- incrementCounter() - Monotonically increasing counter
- setGauge() / incrementGauge() / decrementGauge() - Current value gauge
- recordHistogram() - Distribution of values (e.g., durations, sizes)
- recordSummary() - Summary with percentiles
- timeAsync() - Automatically time async function execution
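A timing helper like timeAsync() can be understood as a thin wrapper around a histogram. The sketch below is a standalone illustration under assumptions: it takes the recorder as an explicit hypothetical parameter (record), and it returns the callback's result. This document does not state whether the real method returns the result or the measured duration.

```typescript
// Hypothetical recorder signature, standing in for something like recordHistogram().
type RecordHistogram = (name: string, seconds: number, labels: Record<string, string>) => void;

// Run fn, record its wall-clock duration in seconds, and pass its result (or error) through.
async function timeAsyncSketch<T>(
  name: string,
  fn: () => Promise<T>,
  labels: Record<string, string>,
  record: RecordHistogram,
): Promise<T> {
  const start = performance.now();
  try {
    return await fn();
  } finally {
    // The finally block runs on success and on failure, so errors are timed too.
    record(name, (performance.now() - start) / 1000, labels);
  }
}
```

Recording in a finally block means slow failures show up in the duration distribution instead of silently disappearing from it.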
Tracing
The CustomTracingService provides distributed tracing using OpenTelemetry.
import { SpanStatusCode } from '@opentelemetry/api';
import { CustomTracingService } from './common/tracing/tracing.service';
export class MyService {
  constructor(private readonly tracing: CustomTracingService) {}
  async processOrder(orderId: string) {
    // Create a span for the entire operation
    return await this.tracing.traceAsync(
      'MyService.processOrder',
      async (span) => {
        span.setAttributes({
          'order.id': orderId,
          'order.type': 'standard',
        });
        // ... your code
        // Add events to the span
        span.addEvent('Order validated');
        // ... more code
        return result; // placeholder: the value produced by your code
      },
      { orderId }
    );
  }
  async callExternalService() {
    // Manual span management
    const span = this.tracing.startSpan('callExternalService');
    try {
      const result = await fetch('https://api.example.com/data');
      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (error) {
      // A caught value is `unknown` in strict TypeScript; cast it before recording
      span.recordException(error as Error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw error;
    } finally {
      // Always end the span, otherwise it is never exported
      span.end();
    }
  }
}
CLI Commands
All observability features are automatically disabled when running CLI commands. This is configured in backend/src/cli/cli.ts:
// Disable observability for CLI commands
process.env.LOGGING_ENABLED = 'false';
process.env.TRACING_ENABLED = 'false';
process.env.METRICS_ENABLED = 'false';
This prevents unnecessary noise during administrative tasks and ensures CLI commands run without dependency on external services.
Interceptors
The application uses interceptors to automatically collect observability data from HTTP requests:
- ContextInterceptor (always enabled) - Stores request context in AsyncLocalStorage
- TracingInterceptor (when TRACING_ENABLED=true) - Extracts trace context from headers
- MetricsInterceptor (when METRICS_ENABLED=true) - Records HTTP metrics
- ResponseInterceptor (always enabled) - Adds response metadata
The interceptor chain is configured in backend/src/main.ts and automatically respects the feature flags.
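The request-context mechanism used by the ContextInterceptor relies on Node's AsyncLocalStorage. The sketch below shows the underlying idea in isolation; it is not the interceptor's code, and the names RequestContext, runWithContext, and currentRequestId are hypothetical.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

// Hypothetical context shape; the real interceptor may store more fields.
interface RequestContext {
  requestId: string;
  userId?: string;
}

const requestContext = new AsyncLocalStorage<RequestContext>();

// Run a handler with the given context; everything it calls or awaits can read it.
function runWithContext<T>(context: RequestContext, handler: () => T): T {
  return requestContext.run(context, handler);
}

// Read the current request id from anywhere in the call chain, without passing it along.
function currentRequestId(): string | undefined {
  return requestContext.getStore()?.requestId;
}
```

Because the store follows the asynchronous call chain, a logger deep inside a service can attach the request id without every function taking it as a parameter. Outside of runWithContext, currentRequestId() returns undefined.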
Best Practices
When to Log
- Info: Normal operations, state changes, significant events
- Warning: Recoverable errors, deprecated usage, unexpected but handled conditions
- Error: Unhandled exceptions, critical failures, data loss
- Debug: Detailed debugging information (usually disabled in production)
When to Add Metrics
- Counters: Events that happen (requests, errors, items processed)
- Gauges: Current state (queue size, active connections, memory usage)
- Histograms: Distributions (request durations, payload sizes)
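Every distinct combination of label values creates a separate time series, so labels taken from raw request data need to be bounded. One common approach, sketched here with a hypothetical normalizeRouteLabel helper that is not part of the backend, is to replace identifier-like path segments with a placeholder before using the path as a label value.

```typescript
const UUID_SEGMENT = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const NUMERIC_SEGMENT = /^\d+$/;

// Collapse numeric and UUID path segments into ":id" and drop the query string,
// so /users/1 and /users/2 map to the same label value.
function normalizeRouteLabel(path: string): string {
  const withoutQuery = path.split('?')[0];
  return withoutQuery
    .split('/')
    .map((segment) =>
      NUMERIC_SEGMENT.test(segment) || UUID_SEGMENT.test(segment) ? ':id' : segment,
    )
    .join('/');
}
```

If the HTTP framework exposes the matched route pattern, using that pattern directly is usually more reliable than pattern-matching on segments.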
When to Create Spans
- Service boundaries: Calls to external services, databases, queues
- Business operations: Key business logic functions
- Complex workflows: Multi-step processes that benefit from visualization
Context Propagation
All three observability components support context propagation:
- Logging: LogContext carries requestId, userId, module info
- Metrics: Labels allow grouping and filtering (e.g., by status, endpoint)
- Tracing: OpenTelemetry automatically propagates trace context via W3C Trace Context headers
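The W3C Trace Context header that carries this information is named traceparent and has four hyphen-separated hex fields: version, trace id (32 characters), parent span id (16 characters), and flags (2 characters). OpenTelemetry handles this automatically; the sketch below only illustrates the format for version 00, and the names TraceParent, parseTraceParent, and formatTraceParent are hypothetical.

```typescript
interface TraceParent {
  traceId: string;
  spanId: string;
  sampled: boolean;
}

// Version 00 only: 2 hex version, 32 hex trace id, 16 hex span id, 2 hex flags.
const TRACEPARENT_V00 = /^(00)-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Returns undefined for malformed headers or all-zero ids, which the spec treats as invalid.
function parseTraceParent(header: string): TraceParent | undefined {
  const match = TRACEPARENT_V00.exec(header.trim());
  if (!match) return undefined;
  const [, , traceId, spanId, flags] = match;
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return undefined;
  // Bit 0 of the flags byte is the "sampled" flag.
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// Serialize a context back into a version 00 header value.
function formatTraceParent(ctx: TraceParent): string {
  return `00-${ctx.traceId}-${ctx.spanId}-${ctx.sampled ? '01' : '00'}`;
}
```

A downstream service that receives this header continues the same trace id and creates its own span id, which is how spans from different services end up in one trace.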
Troubleshooting
Logs Not Appearing in Loki
- Check LOGGING_ENABLED=true in your .env
- Verify LOKI_URL is accessible from the backend
- Check backend logs for "Failed to send log to Loki" errors
- Verify Loki is running: curl http://localhost:3100/ready
Metrics Not Scraped by Prometheus
- Check METRICS_ENABLED=true in your .env
- Verify /metrics endpoint is accessible: curl http://localhost:8443/metrics
- Check Prometheus targets: http://localhost:9090/targets
- Verify prometheus.yml has correct backend URL
Traces Not Appearing in Tempo
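- Check TRACING_ENABLED=true in your .env
- Verify TEMPO_URL is accessible from the backend
- Check OpenTelemetry initialization in main.ts
- Verify Tempo's OTLP receiver is reachable: curl http://localhost:4318/v1/traces (a plain GET cannot confirm ingestion, but any HTTP response, even an error status, shows the port is listening)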
Performance Impact
Each observability component has minimal performance impact:
- Logging: Asynchronous, fire-and-forget HTTP calls
- Metrics: In-memory counters and gauges, exported on-demand
- Tracing: ~1-2% overhead with proper sampling
For production, consider:
- Sampling traces (not every request needs to be traced)
- Using appropriate log levels (avoid debug logs in production)
- Monitoring metric cardinality (avoid unbounded label values)
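Trace sampling is often made deterministic on the trace id, so that every service reaches the same keep-or-drop decision for a given trace. The sketch below illustrates the idea with a hypothetical shouldSample function; it is a simplified illustration, not OpenTelemetry's sampler implementation, and in practice the SDK's built-in ratio-based sampler would normally be configured instead.

```typescript
// Keep roughly `ratio` of all traces, deciding from the first 32 bits of the trace id.
// The same trace id always produces the same decision.
function shouldSample(traceId: string, ratio: number): boolean {
  if (ratio <= 0) return false;
  if (ratio >= 1) return true;
  // First 8 hex characters as an integer in the range 0 .. 2^32 - 1.
  const bucket = parseInt(traceId.slice(0, 8), 16);
  // A malformed id yields NaN, and NaN < x is false, so such traces are dropped.
  return bucket < ratio * 0x100000000;
}
```

With a ratio of 0.1, about one trace in ten is kept, assuming trace ids are uniformly random.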