Tracing Standards

Distributed tracing to Grafana Tempo and Datadog APM using OpenTelemetry. Automatic HTTP/database tracing via global interceptors.

Philosophy: Logging + HTTP Tracing = 95% of Observability

For V1, we DO NOT trace service methods by default. HTTP-level tracing combined with structured logging provides excellent observability for most use cases.

What You Get Automatically (No Code Required)

  • ✅ HTTP Request Tracing - Every request traced with timing, status, route
  • ✅ Log Correlation - All logs share requestId and traceId for flow reconstruction
  • ✅ Database Tracing - TypeORM operations automatically traced
  • ✅ Error Tracking - HTTP layer captures all unhandled exceptions

When to Add Service-Level Tracing

Only add manual tracing for:

  1. Complex multi-step operations (5+ sequential steps where you need timing breakdowns)
  2. Performance bottlenecks (when HTTP trace shows slowness but you need granular timing)
  3. Distributed operations (when one request triggers multiple background jobs or external calls)

For simple CRUD operations, logging is sufficient.


Automatic Tracing

HTTP requests and database queries are traced automatically. No manual code needed in controllers or services.

Automatic spans created for:

  • HTTP requests (method, route, status, duration)
  • Database queries (query type, table, duration)
  • External HTTP calls (via instrumented HTTP clients)
  • RabbitMQ messages (via instrumented message handlers)

Manual Tracing (Use Sparingly)

⚠️ Only use manual tracing when you specifically need performance breakdown of complex operations.

For most services, skip traceAsync and rely on logs + HTTP tracing.

import { Injectable } from '@nestjs/common';
import { CustomTracingService } from '../../common/tracing/tracing.service';

@Injectable()
export class UserService {
  // this.repository and this.toUser are assumed to be defined elsewhere on the service (omitted here)
  constructor(private readonly tracingService: CustomTracingService) {}

  async createUser(values: UserCreateInput): Promise<User> {
    return await this.tracingService.traceAsync(
      'UserService.createUser',
      async (span) => {
        // Add span attributes
        span.setAttributes({
          'user.email': values.email,
          'user.role': values.role,
        });

        // Business logic
        const user = await this.repository.save(values);

        // Add result attributes
        span.setAttributes({
          'user.id': user.id,
          'user.created': true,
        });

        return this.toUser(user);
      },
      {
        'service.name': 'UserService',
        'operation.type': 'create',
      },
    );
  }
}
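The implementation of CustomTracingService is not shown in this document. As a rough mental model only, a traceAsync-style helper could behave like the sketch below. InMemorySpan, finishedSpans, and traceAsyncSketch are hypothetical names that stand in for the real span and exporter so the example runs without dependencies; the actual service presumably delegates to an OpenTelemetry tracer and may differ in detail.

```typescript
// Hypothetical stand-ins: these types only mimic the span methods used in this guide.
type Attributes = Record<string, string | number | boolean>;
type Status = { code: 'UNSET' | 'OK' | 'ERROR'; message?: string };

class InMemorySpan {
  attributes: Attributes = {};
  exceptions: Error[] = [];
  status: Status = { code: 'UNSET' };
  ended = false;

  constructor(readonly name: string) {}

  setAttributes(attributes: Attributes): this {
    Object.assign(this.attributes, attributes);
    return this;
  }

  recordException(error: Error): void {
    this.exceptions.push(error);
  }

  setStatus(status: Status): this {
    this.status = status;
    return this;
  }

  end(): void {
    this.ended = true;
  }
}

// Collected spans, standing in for an exporter.
const finishedSpans: InMemorySpan[] = [];

// Runs fn inside a span, records any thrown error, and always ends the span.
async function traceAsyncSketch<T>(
  name: string,
  fn: (span: InMemorySpan) => Promise<T>,
  attributes: Attributes = {},
): Promise<T> {
  const span = new InMemorySpan(name).setAttributes(attributes);
  try {
    // Status is left UNSET on success in this sketch.
    return await fn(span);
  } catch (error) {
    const err = error instanceof Error ? error : new Error(String(error));
    span.recordException(err);
    span.setStatus({ code: 'ERROR', message: err.message });
    throw error; // rethrow so callers and the HTTP layer still see the failure
  } finally {
    span.end();
    finishedSpans.push(span);
  }
}
```

The try/catch/finally shape is the point: the span is ended on every path, and the error is rethrown after being recorded, so wrapping a method does not change its behaviour for callers.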

Trace Context

Trace context (traceId, spanId) flows automatically through:

  • HTTP requests via W3C Trace Context headers
  • ContextService (available in all services)
  • RabbitMQ messages
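For reference, the W3C Trace Context standard carries the IDs in a `traceparent` header of the form `version-traceId-parentId-flags`. The instrumentation reads and writes this header for you; the sketch below only illustrates what it contains. parseTraceparent and TraceParent are hypothetical names, and the parser handles the version 00 layout only.

```typescript
interface TraceParent {
  version: string;
  traceId: string; // 32 lowercase hex characters
  spanId: string; // 16 lowercase hex characters (the caller's span)
  sampled: boolean; // lowest bit of the flags byte
}

const TRACEPARENT_PATTERN = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header: string): TraceParent | null {
  const match = TRACEPARENT_PATTERN.exec(header.trim());
  if (!match) return null;

  const version = match[1];
  const traceId = match[2];
  const spanId = match[3];
  const flags = match[4];

  // The spec treats version ff and all-zero IDs as invalid.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;

  return { version, traceId, spanId, sampled: (parseInt(flags, 16) & 0x01) === 0x01 };
}

// Example header taken from the W3C Trace Context specification.
const parsed = parseTraceparent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01');
```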

Access current trace context:

const context = this.contextService.getLoggingContext();
// { requestId, userId, traceId, spanId }
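Log correlation depends on these fields appearing on every log line. The project's logger presumably attaches them for you; the sketch below shows the idea with a hypothetical buildLogEntry helper. The LoggingContext shape is taken from the comment above, with every field treated as optional.

```typescript
// Shape assumed from the getLoggingContext() comment; all fields optional in this sketch.
interface LoggingContext {
  requestId?: string;
  userId?: string;
  traceId?: string;
  spanId?: string;
}

type LogLevel = 'info' | 'warn' | 'error';

// Builds one structured (JSON) log line that carries the correlation fields.
function buildLogEntry(
  context: LoggingContext,
  level: LogLevel,
  message: string,
  extra: Record<string, unknown> = {},
): string {
  // Context is spread last so caller-supplied fields cannot overwrite the correlation IDs.
  return JSON.stringify({ level, message, ...extra, ...context });
}

const line = buildLogEntry(
  { requestId: 'req-1', traceId: '4bf92f3577b34da6a3ce929d0e0e4736' },
  'info',
  'User created',
  { traceId: 'overridden?', durationMs: 12 },
);
```

Filtering logs by traceId then yields every line written while handling one request, which is how a flow is reconstructed without service-level spans.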

Span Attributes

Add metadata to spans for filtering and analysis.

span.setAttributes({
  'service.name': 'UserService',
  'service.method': 'readOne',
  'resource.id': userId,
  'resource.type': 'user',
  'operation.type': 'read',
});

Common attributes:

  • service.name - Service class name
  • service.method - Method name
  • resource.id - Entity ID
  • resource.type - Entity type (user, order, etc.)
  • operation.type - Operation (create, read, update, delete)
  • error.type - Error class name (on errors)
  • http.method - HTTP method
  • http.route - API route
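One way to keep these attribute names consistent across services is a small typed builder. This is a sketch, not an existing utility in the codebase: commonSpanAttributes and its input type are hypothetical.

```typescript
type OperationType = 'create' | 'read' | 'update' | 'delete';

interface CommonSpanAttributeInput {
  serviceName: string;
  method: string;
  resourceType: string;
  operationType: OperationType;
  resourceId?: string;
}

// Maps typed input onto the attribute keys listed above.
function commonSpanAttributes(input: CommonSpanAttributeInput): Record<string, string> {
  const attributes: Record<string, string> = {
    'service.name': input.serviceName,
    'service.method': input.method,
    'resource.type': input.resourceType,
    'operation.type': input.operationType,
  };
  // resource.id is only known for operations on an existing entity.
  if (input.resourceId !== undefined) {
    attributes['resource.id'] = input.resourceId;
  }
  return attributes;
}

const readAttributes = commonSpanAttributes({
  serviceName: 'UserService',
  method: 'readOne',
  resourceType: 'user',
  operationType: 'read',
  resourceId: 'user-123',
});
```

The result can be passed straight to span.setAttributes(...). A misspelled operation type then fails at compile time instead of producing an unfilterable attribute value.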

Span Events

Add timeline events to spans.

span.addEvent('validation.started');
await validateInput(values);
span.addEvent('validation.completed');

span.addEvent('cache.miss', { key: cacheKey });
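If several steps need the same started/completed pair, the pattern can be wrapped in a helper. The sketch below assumes only that the span exposes addEvent(name, attributes?); EventSink and withStepEvents are hypothetical names, and the `.failed` event is an addition that the examples above do not use.

```typescript
// Minimal view of a span: only the addEvent method is needed here.
interface EventSink {
  addEvent(name: string, attributes?: Record<string, string | number | boolean>): unknown;
}

// Emits `<step>.started` before fn and `<step>.completed` or `<step>.failed` after it.
async function withStepEvents<T>(
  span: EventSink,
  step: string,
  fn: () => Promise<T>,
): Promise<T> {
  span.addEvent(`${step}.started`);
  try {
    const result = await fn();
    span.addEvent(`${step}.completed`);
    return result;
  } catch (error) {
    span.addEvent(`${step}.failed`);
    throw error;
  }
}
```

Usage inside a traced operation would look like `await withStepEvents(span, 'validation', () => validateInput(values));`.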

Error Handling

Errors are automatically recorded in spans when using traceAsync.

return await this.tracingService.traceAsync(
  'UserService.readOne',
  async (span) => {
    const user = await this.repository.findOne({ where: { id } });

    if (!user) {
      // Error automatically recorded by traceAsync
      throw new UserNotFoundError(id);
    }

    return this.toUser(user);
  },
);

Manual error recording:

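import { SpanStatusCode } from '@opentelemetry/api';

try {
  await operation();
} catch (error) {
  // Catch variables are typed unknown under strict TypeScript, so narrow before use
  const err = error instanceof Error ? error : new Error(String(error));
  span.recordException(err);
  span.setStatus({
    code: SpanStatusCode.ERROR,
    message: err.message,
  });
  throw error;
}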
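When manual recording is needed in more than one place, the catch-block logic can be pulled into a helper that also sets the error.type attribute from the common attributes list. This is a sketch under stated assumptions: recordSpanError and ErrorRecordingSpan are hypothetical, and STATUS_ERROR mirrors the numeric value of SpanStatusCode.ERROR so the example runs without the OpenTelemetry package. Real code should import the enum instead.

```typescript
// Minimal view of a span: only the methods this helper calls.
interface ErrorRecordingSpan {
  recordException(error: Error): void;
  setStatus(status: { code: number; message?: string }): unknown;
  setAttribute(key: string, value: string): unknown;
}

// Same numeric value as SpanStatusCode.ERROR in @opentelemetry/api.
const STATUS_ERROR = 2;

// Anything can be thrown in JavaScript, so normalise to an Error first.
function toError(thrown: unknown): Error {
  return thrown instanceof Error ? thrown : new Error(String(thrown));
}

// Records the error on the span and returns the normalised Error.
function recordSpanError(span: ErrorRecordingSpan, thrown: unknown): Error {
  const error = toError(thrown);
  span.recordException(error);
  // error.name is the class name for built-in errors; custom errors must set this.name.
  span.setAttribute('error.type', error.name);
  span.setStatus({ code: STATUS_ERROR, message: error.message });
  return error;
}
```

A catch block then reduces to `recordSpanError(span, error); throw error;`.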

Anti-Patterns

❌ Don't trace every method

Only trace important business operations. HTTP and database are automatic.

// Bad - unnecessary tracing
async toUser(entity: UserEntity): Promise<User> {
  return await this.tracingService.traceAsync('toUser', async () => {
    return { id: entity.id, name: entity.name }; // Too granular
  });
}

// Good - trace business operations
async processOrder(values: OrderInput): Promise<Order> {
  return await this.tracingService.traceAsync('OrderService.processOrder', async (span) => {
    // Complex multi-step operation worth tracing
  });
}

❌ Don't forget to pass span to callback

The span parameter is needed to add attributes and events.

// Bad - no span parameter
this.tracingService.traceAsync('operation', async () => {
  // Can't add attributes or events
  return result;
});

// Good - use span parameter
this.tracingService.traceAsync('operation', async (span) => {
  span.setAttributes({ key: 'value' });
  return result;
});

❌ Don't put high-cardinality values in span names

Keep unbounded values such as user IDs out of span names; record them as span attributes instead.

// Bad - unique span names
this.tracingService.traceAsync(`UserService.getUser.${userId}`, ...); // WRONG

// Good - attributes for variable data
this.tracingService.traceAsync('UserService.getUser', async (span) => {
  span.setAttributes({ 'user.id': userId }); // Right
});
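A rough guard for this rule can run in a unit test or a lint step. The sketch below is a heuristic under stated assumptions, not an existing check in the codebase: it flags a span name only when a dot-separated segment is purely numeric or UUID-shaped, so other unbounded values (emails, slugs) would slip through. Both function names are hypothetical.

```typescript
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// True when a segment is all digits or has the shape of a UUID.
function looksLikeIdentifier(segment: string): boolean {
  return /^\d+$/.test(segment) || UUID_PATTERN.test(segment);
}

// True when no dot-separated segment of the span name looks like an identifier.
function hasLowCardinalityName(spanName: string): boolean {
  return !spanName.split('.').some(looksLikeIdentifier);
}

const staticName = hasLowCardinalityName('UserService.getUser');
const numericSuffix = hasLowCardinalityName('UserService.getUser.42');
const uuidSuffix = hasLowCardinalityName('UserService.getUser.123e4567-e89b-12d3-a456-426614174000');
```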

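❌ Don't block on tracing

Tracing is non-blocking: span methods such as setAttributes and addEvent return synchronously, and finished spans are exported in the background. Don't await span operations; await only the traceAsync call itself.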

Other mistakes:

  • ❌ Forgetting to set span status on errors
  • ❌ Not adding useful attributes to spans
  • ❌ Creating spans for trivial operations
  • ❌ Hand-rolling try/catch error recording where traceAsync would record the error automatically