news 2026/5/16 0:41:20

Link Tracing and Distributed Tracing: Building Observable Microservice Systems


张小明

Front-end Development Engineer


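1. Distributed Tracing Overview

1.1 Why Tracing Is Needed

In a microservice architecture, a single request may pass through several cooperating services. Without tracing, this causes the following problems:

  • Hard to localize faults: when something fails, it is difficult to tell quickly which service is responsible
  • Unclear performance bottlenecks: there is no view of how time is spent across the whole call chain
  • Complex dependencies: the call relationships between services are hard to untangle
  • Opaque call chains: the complete path of a request cannot be followed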

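1.2 Core Tracing Concepts

| Concept    | Description                                                         |
|------------|---------------------------------------------------------------------|
| Trace      | The complete path of one request, identified by a shared trace ID   |
| Span       | A single unit of work within the trace                              |
| Annotation | An event marked at a point in time                                  |
| Baggage    | Context data that travels with the request                          |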
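
The four concepts above can be pictured as one small data model. The sketch below is illustrative only: `SpanModel` and its members are names made up for this example and are not the API of Sleuth, Zipkin, or OpenTelemetry.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative span model; class and field names are hypothetical, not a library API. */
class SpanModel {
    final String traceId;       // Trace: shared by every span of one request
    final String spanId;        // Span: one unit of work within the trace
    final String parentSpanId;  // null for the root span
    final String name;
    final List<String> annotations = new ArrayList<>();         // Annotation: timestamped events
    final Map<String, String> baggage = new LinkedHashMap<>();  // Baggage: context carried with the request

    SpanModel(String traceId, String spanId, String parentSpanId, String name) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentSpanId = parentSpanId;
        this.name = name;
    }

    /** Records an event at the given timestamp (epoch milliseconds). */
    void annotate(long timestampMillis, String event) {
        annotations.add(timestampMillis + " " + event);
    }

    /** Creates a child span that shares the trace ID and inherits the baggage. */
    SpanModel child(String childSpanId, String childName) {
        SpanModel child = new SpanModel(traceId, childSpanId, spanId, childName);
        child.baggage.putAll(baggage);
        return child;
    }

    public static void main(String[] args) {
        SpanModel root = new SpanModel("abc123", "1", null, "GET /users/42");
        root.baggage.put("user-id", "42");
        SpanModel db = root.child("2", "queryDatabase");
        db.annotate(System.currentTimeMillis(), "query.start");
        System.out.println(db.traceId + " " + db.parentSpanId + " " + db.baggage);
    }
}
```

Every span created through `child` keeps the same trace ID, which is what lets a backend reassemble the spans of one request into a single trace.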

1.3 Tracing Architecture

    ┌─────────────────────────────────────────────────────────────────────────┐
    │                    Distributed Tracing Architecture                     │
    ├─────────────────────────────────────────────────────────────────────────┤
    │                                                                         │
    │  ┌─────────┐     ┌─────────┐     ┌─────────┐     ┌─────────┐            │
    │  │ Client  │────▶│Service A│────▶│Service B│────▶│Service C│            │
    │  └─────────┘     └─────────┘     └─────────┘     └─────────┘            │
    │       │               │               │               │                 │
    │       ▼               ▼               ▼               ▼                 │
    │  ┌──────────────────────────────────────────────────────────────────┐   │
    │  │                          Trace Context                           │   │
    │  │ traceId: abc123 | spanId: 1 | parentSpanId: null | sampled: true │   │
    │  └──────────────────────────────────────────────────────────────────┘   │
    │                                    │                                    │
    │                                    ▼                                    │
    │                           ┌─────────────────┐                           │
    │                           │    Collector    │                           │
    │                           │ (Zipkin/Jaeger) │                           │
    │                           └─────────────────┘                           │
    │                                    │                                    │
    │                                    ▼                                    │
    │                           ┌─────────────────┐                           │
    │                           │     Storage     │                           │
    │                           │   (ES/MySQL)    │                           │
    │                           └─────────────────┘                           │
    │                                                                         │
    └─────────────────────────────────────────────────────────────────────────┘
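
The trace context in the diagram has to be serialized into a request header at every hop. One common wire format is the W3C `traceparent` header (`version-traceId-spanId-flags`). The sketch below formats and parses version `00` of that header; the class name `TraceParent` and its methods are hypothetical, and the short IDs in the diagram (`abc123`, `1`) are illustrative, whereas the real format uses 32 and 16 lowercase hex characters.

```java
/** Minimal sketch of the W3C traceparent header, version 00. Class name is hypothetical. */
class TraceParent {
    final String traceId;   // 32 lowercase hex characters
    final String spanId;    // 16 lowercase hex characters
    final boolean sampled;  // lowest bit of the flags byte

    TraceParent(String traceId, String spanId, boolean sampled) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.sampled = sampled;
    }

    /** Serializes the context as "00-<traceId>-<spanId>-<flags>". */
    String toHeader() {
        return "00-" + traceId + "-" + spanId + "-" + (sampled ? "01" : "00");
    }

    /** Parses a version-00 header; returns null when the header is malformed. */
    static TraceParent parse(String header) {
        if (header == null) {
            return null;
        }
        String[] parts = header.trim().split("-", -1);
        if (parts.length != 4 || !parts[0].equals("00")) {
            return null;
        }
        if (!parts[1].matches("[0-9a-f]{32}") || !parts[2].matches("[0-9a-f]{16}")
                || !parts[3].matches("[0-9a-f]{2}")) {
            return null;
        }
        // All-zero IDs are defined as invalid by the specification
        if (parts[1].equals("0".repeat(32)) || parts[2].equals("0".repeat(16))) {
            return null;
        }
        boolean sampled = (Integer.parseInt(parts[3], 16) & 1) == 1;
        return new TraceParent(parts[1], parts[2], sampled);
    }

    /** The next hop keeps the trace ID and sampling decision but gets its own span ID. */
    TraceParent withSpanId(String newSpanId) {
        return new TraceParent(traceId, newSpanId, sampled);
    }

    public static void main(String[] args) {
        TraceParent incoming = parse("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01");
        System.out.println(incoming.withSpanId("b7ad6b7169203331").toHeader());
    }
}
```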

2. Spring Cloud Sleuth Configuration

2.1 Basic Dependencies

    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-sleuth</artifactId>
    </dependency>
    <!-- Optional: add OpenTelemetry support -->
    <!-- Artifact name as given in the original; confirm it exists for your Spring Cloud release -->
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-tracing</artifactId>
    </dependency>
    <dependency>
        <groupId>io.opentelemetry</groupId>
        <artifactId>opentelemetry-exporter-otlp</artifactId>
    </dependency>

2.2 Sleuth Configuration

    spring:
      application:
        name: user-service
      sleuth:
        sampler:
          # probability and rate select different samplers; usually only one is set
          probability: 1.0   # sampling rate, 0-1
          rate: 100          # maximum number of sampled traces per second
        propagation:
          type: B3
          w3c:
            enabled: true
        baggage:
          remote-fields:
            - user-id
            - request-id
          correlation-enabled: true
          header-names:
            user-id: X-User-Id
        instrument:
          web:
            enabled: true
          reactor:
            enabled: true
          mongo:
            enabled: true
          redis:
            enabled: true
        logs:
          enabled: true
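
The `probability` and `rate` settings above correspond to two different ways of deciding whether a trace is kept. The sketch below shows both ideas in plain Java. It is not Sleuth's implementation: `SamplerSketch`, `sampleByProbability`, and `RateLimiter` are hypothetical names, and the bucket arithmetic is one simple choice among many.

```java
/** Two sampling ideas in plain Java; names and arithmetic are assumptions, not Sleuth code. */
class SamplerSketch {

    /** Probability sampling: keeps roughly the given fraction of traces, decided from the trace ID. */
    static boolean sampleByProbability(long traceIdLow, double probability) {
        if (probability <= 0.0) {
            return false;
        }
        if (probability >= 1.0) {
            return true;
        }
        long bucket = Math.abs(traceIdLow % 10_000);      // 0..9999
        return bucket < Math.round(probability * 10_000);
    }

    /** Rate limiting: keeps at most maxPerSecond traces in each one-second window. */
    static class RateLimiter {
        private final int maxPerSecond;
        private boolean started;
        private long windowStartMillis;
        private int usedInWindow;

        RateLimiter(int maxPerSecond) {
            this.maxPerSecond = maxPerSecond;
        }

        /** The clock is passed in so the behaviour is easy to test. */
        synchronized boolean trySample(long nowMillis) {
            if (!started || nowMillis - windowStartMillis >= 1000) {
                started = true;
                windowStartMillis = nowMillis;   // open a new window
                usedInWindow = 0;
            }
            if (usedInWindow < maxPerSecond) {
                usedInWindow++;
                return true;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(sampleByProbability(1234L, 0.5));
        RateLimiter limiter = new RateLimiter(100);
        System.out.println(limiter.trySample(System.currentTimeMillis()));
    }
}
```

Deriving the decision from the trace ID means the same trace always gets the same answer, while the rate limiter caps the volume regardless of how much traffic arrives.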

2.3 Creating Spans Manually

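    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.cloud.sleuth.Span;
    import org.springframework.cloud.sleuth.Tracer;
    import org.springframework.stereotype.Service;

    @Service
    public class UserService {

        private static final Logger log = LoggerFactory.getLogger(UserService.class);

        @Autowired
        private Tracer tracer;

        // Used below but never declared in the original; assumed to be a Spring Data repository
        @Autowired
        private UserRepository userRepository;

        public User getUserById(Long id) {
            // Create a span for the whole lookup (a child of the current span, if there is one)
            Span span = tracer.nextSpan().name("getUserById").start();
            // Sleuth 3.x names this method withSpan(); withSpanInScope() is the Brave equivalent
            try (Tracer.SpanInScope inScope = tracer.withSpan(span)) {
                log.info("Getting user by id: {}", id);

                // Create a nested child span for the database call
                Span dbSpan = tracer.nextSpan().name("queryDatabase").start();
                try (Tracer.SpanInScope dbScope = tracer.withSpan(dbSpan)) {
                    dbSpan.tag("db.system", "mysql");
                    dbSpan.tag("db.statement", "SELECT * FROM users WHERE id = ?");
                    return userRepository.findById(id).orElse(null);
                } finally {
                    dbSpan.end();
                }
            } finally {
                span.end();
            }
        }
    }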
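
The nested try-with-resources blocks above work because a scope makes a span "current" for the thread and restores the previous one when it closes. The sketch below reproduces that mechanism with a plain `ThreadLocal`; `CurrentSpan` and `Scope` are hypothetical names for this example and do not mirror the internals of any tracer.

```java
/** Sketch of the "span in scope" mechanism using a ThreadLocal; names are hypothetical. */
class CurrentSpan {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    /** A scope that can be used in try-with-resources without a checked exception. */
    interface Scope extends AutoCloseable {
        @Override
        void close();
    }

    /** Returns the name of the span that is current on this thread, or null. */
    static String current() {
        return CURRENT.get();
    }

    /** Makes spanName current and returns a scope that restores the previous span on close. */
    static Scope open(String spanName) {
        String previous = CURRENT.get();
        CURRENT.set(spanName);
        return () -> {
            if (previous == null) {
                CURRENT.remove();
            } else {
                CURRENT.set(previous);
            }
        };
    }

    public static void main(String[] args) {
        try (Scope outer = open("getUserById")) {
            try (Scope inner = open("queryDatabase")) {
                System.out.println("inside: " + current());
            }
            System.out.println("after inner scope: " + current());
        }
        System.out.println("after outer scope: " + current());
    }
}
```

If a scope is opened and never closed, the thread keeps reporting a stale current span, which is why the scope belongs in a try-with-resources block.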

3. Jaeger Integration

3.1 Jaeger Server Configuration

    version: '3.8'
    services:
      jaeger:
        image: jaegertracing/all-in-one:latest
        ports:
          - "16686:16686"    # UI
          - "6831:6831/udp"  # Jaeger.thrift (compact)
          - "14250:14250"    # gRPC
          - "4317:4317"      # OTLP over gRPC (needed because COLLECTOR_OTLP_ENABLED is set)
          - "4318:4318"      # OTLP over HTTP, used by the client configuration in 3.2
        environment:
          - COLLECTOR_OTLP_ENABLED=true
          - SPAN_STORAGE_TYPE=elasticsearch
          - ES_SERVER_URLS=http://elasticsearch:9200
        depends_on:
          - elasticsearch
      elasticsearch:
        image: docker.elastic.co/elasticsearch/elasticsearch:7.17.0
        environment:
          - discovery.type=single-node
          - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
        ports:
          - "9200:9200"

3.2 Integrating Jaeger with Spring Boot

    spring:
      application:
        name: user-service
      autoconfigure:
        exclude:
          - org.springframework.cloud.sleuth.autoconfig.SleuthReactorInstrumentationAutoConfiguration
    management:
      otlp:
        tracing:
          # Spring Boot 3.x reads the OTLP endpoint from management.otlp.tracing
          endpoint: http://localhost:4318/v1/traces
          headers:
            Authorization: Bearer your-token
      tracing:
        sampling:
          probability: 1.0
        propagation:
          type: w3c
        # Kept from the original; check whether your version supports path exclusions here
        exclusions:
          - /actuator/**
          - /health

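3.3 Custom Jaeger Configuration

    @Configuration
    public class JaegerConfig {

        // Configurer, Propagation and ProbabilisticSampler are used as in the original;
        // their packages depend on the Jaeger client starter in use.
        @Bean
        public Configurer samplerConfigurer() {
            return builder -> builder
                    .withLogSpans(true)
                    .withCodec(Propagation.B3)
                    .withSampler(new ProbabilisticSampler(0.5)); // sample 50% of traces
        }

        // Add a tracing interceptor to every RestTemplate so that outgoing calls carry the trace context
        @Bean
        public RestTemplateCustomizer jaegerRestTemplateCustomizer(Tracer tracer) {
            return restTemplate -> {
                List<ClientHttpRequestInterceptor> interceptors = new ArrayList<>(
                        restTemplate.getInterceptors());
                interceptors.add(new TracingClientHttpRequestInterceptor(tracer));
                restTemplate.setInterceptors(interceptors);
            };
        }
    }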

4. Zipkin Integration

4.1 Zipkin Server Configuration

    # docker-compose.yml
    version: '3.8'
    services:
      zipkin:
        image: openzipkin/zipkin:latest
        ports:
          - "9411:9411"
        environment:
          - STORAGE_TYPE=elasticsearch
          - ES_HOSTS=http://elasticsearch:9200
          - RABBIT_URI=amqp://guest:guest@rabbit:5672
        # The elasticsearch service is assumed to be defined as in section 3.1
        depends_on:
          - elasticsearch

4.2 Integrating Zipkin with Spring Boot

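    spring:
      application:
        name: user-service
      zipkin:
        base-url: http://localhost:9411
        sender:
          type: web  # HTTP sender; alternatives are rabbit and kafka
        locator:
          discovery:
            enabled: true  # discover the Zipkin server through Eureka
      sleuth:
        sampler:
          probability: 1.0  # sampling rate (a Sleuth property, not a Zipkin one)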

4.3 Asynchronous Sending

    spring:
      zipkin:
        sender:
          type: rabbit
        rabbit:
          queue: zipkin
          connection-name: zipkin-sender
      rabbitmq:
        host: localhost
        port: 5672
        username: guest
        password: guest
    management:
      metrics:
        export:
          zipkin:
            enabled: true

5. OpenTelemetry Integration

5.1 OpenTelemetry SDK Configuration

    spring:
      application:
        name: user-service
    otel:
      exporter:
        otlp:
          endpoint: http://localhost:4317
          headers:
            api-key: your-api-key
      service:
        name: ${spring.application.name}
        version: 1.0.0
      traces:
        exporter: otlp
      metrics:
        exporter: otlp
      logs:
        exporter: otlp
      sampler:
        ratio: 1.0
        parent-based: true
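
The `ratio` and `parent-based` sampler settings above combine two rules: a span with a parent follows the parent's decision, and only root spans are decided by the ratio. The sketch below illustrates that logic under simplified assumptions; `ParentBasedSampler` and `shouldSample` are hypothetical names, and the ratio arithmetic is not the OpenTelemetry SDK's algorithm.

```java
/** Sketch of parent-based ratio sampling; names and arithmetic are illustrative assumptions. */
class ParentBasedSampler {

    /**
     * @param parentSampled the caller's decision, or null when this is a root span
     * @param traceIdLow    low 64 bits of the trace ID, used to make the root decision repeatable
     * @param ratio         fraction of root traces to keep, between 0.0 and 1.0
     */
    static boolean shouldSample(Boolean parentSampled, long traceIdLow, double ratio) {
        if (parentSampled != null) {
            return parentSampled;   // never split a trace: follow the upstream decision
        }
        long bucket = Math.abs(traceIdLow % 10_000);   // 0..9999
        return bucket < Math.round(ratio * 10_000);
    }

    public static void main(String[] args) {
        System.out.println(shouldSample(null, 42L, 1.0));          // root span, ratio decides
        System.out.println(shouldSample(Boolean.FALSE, 42L, 1.0)); // child span, parent decides
    }
}
```

Following the parent keeps traces complete: either every service records its part of a request or none does.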

5.2 Custom Span Configuration

    @Component
    public class TracingInterceptor implements HandlerInterceptor {

        private static final String SPAN_ATTR = "currentSpan";
        private static final String SCOPE_ATTR = "currentSpanScope";

        private final Tracer tracer;

        public TracingInterceptor(Tracer tracer) {
            this.tracer = tracer;
        }

        @Override
        public boolean preHandle(HttpServletRequest request,
                                 HttpServletResponse response,
                                 Object handler) {
            Span span = tracer.nextSpan()
                    .name(request.getMethod() + " " + request.getRequestURI())
                    .tag("http.method", request.getMethod())
                    .tag("http.url", request.getRequestURL().toString())
                    .tag("http.host", request.getRemoteHost())
                    .start();
            // Keep the scope so it can be closed in afterCompletion; the original never closed it
            request.setAttribute(SCOPE_ATTR, tracer.withSpan(span));
            request.setAttribute(SPAN_ATTR, span);
            return true;
        }

        @Override
        public void afterCompletion(HttpServletRequest request,
                                    HttpServletResponse response,
                                    Object handler,
                                    Exception ex) {
            // Read back the span started in preHandle rather than whatever span is current now
            Span span = (Span) request.getAttribute(SPAN_ATTR);
            Tracer.SpanInScope scope = (Tracer.SpanInScope) request.getAttribute(SCOPE_ATTR);
            if (span == null) {
                return;
            }
            try {
                span.tag("http.status_code", String.valueOf(response.getStatus()));
                if (ex != null) {
                    span.error(ex); // records the exception on the span
                }
            } finally {
                if (scope != null) {
                    scope.close();
                }
                span.end();
            }
        }
    }

5.3 Database Tracing

    @Component
    public class TracingDataSourceDecorator extends DataSourceWrapper {

        // DataSourceWrapper, TracingConnection and getActiveCount() are helper types assumed by the original
        private final Tracer tracer;

        public TracingDataSourceDecorator(DataSource delegate, Tracer tracer) {
            super(delegate);
            this.tracer = tracer;
        }

        @Override
        public Connection getConnection() throws SQLException {
            // This span covers only the time spent acquiring a connection
            Span span = tracer.nextSpan().name("db.getConnection").start();
            try (Tracer.SpanInScope inScope = tracer.withSpan(span)) {
                span.tag("db.system", "mysql");
                span.tag("db.pool.active", String.valueOf(getActiveCount())); // tag values are strings
                Connection connection = super.getConnection();
                // The span ends in the finally block, so the wrapper receives the tracer
                // and is expected to create its own spans per statement
                return new TracingConnection(connection, tracer);
            } catch (Exception e) {
                span.error(e);
                throw e;
            } finally {
                span.end();
            }
        }
    }

6. Request Context Propagation

6.1 Context Propagation Configuration

    @Configuration
    public class ContextPropagationConfig {

        // The registry and handler types below are used as in the original; they are
        // assumed helper types, not a confirmed library API.

        @Bean
        public ContextRegistry contextRegistry() {
            ContextRegistry registry = ContextRegistry.getInstance();
            registry.registerContextPropagator(TextMapPropagator.getDefault());
            return registry;
        }

        @Bean
        public BaggageRegistry baggageRegistry() {
            BaggageRegistry registry = BaggageRegistry.newBuilder()
                    // Copy every baggage entry into the logging MDC
                    .addDefaultBaggageHandler((key, value) -> MDC.put(key, value))
                    .build();
            // The original call was missing the parentheses around its argument
            registry.register(BaggageHandler.forEntry(
                    Entry.of("user-id", new MDCEntryToContextCarrier())));
            return registry;
        }
    }
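
Whatever registry is used, baggage such as `user-id` ultimately travels between services as a header. The W3C `baggage` header carries it as a comma-separated list of `key=value` members with percent-encoded values. The sketch below encodes and decodes that form in plain Java; `BaggageHeader` is a hypothetical class name, and the parser ignores optional member properties (the part after `;`) for brevity.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Sketch of W3C baggage header encoding; class name is hypothetical. */
class BaggageHeader {

    /** Serializes entries as "k1=v1,k2=v2" with percent-encoded values. */
    static String format(Map<String, String> baggage) {
        StringJoiner joiner = new StringJoiner(",");
        for (Map.Entry<String, String> entry : baggage.entrySet()) {
            // URLEncoder writes a space as '+', so convert it to the percent-encoded form
            String value = URLEncoder.encode(entry.getValue(), StandardCharsets.UTF_8)
                    .replace("+", "%20");
            joiner.add(entry.getKey() + "=" + value);
        }
        return joiner.toString();
    }

    /** Parses a baggage header; malformed members are skipped. */
    static Map<String, String> parse(String header) {
        Map<String, String> baggage = new LinkedHashMap<>();
        if (header == null || header.trim().isEmpty()) {
            return baggage;
        }
        for (String member : header.split(",")) {
            String keyValue = member.split(";", 2)[0];   // drop optional properties
            int eq = keyValue.indexOf('=');
            if (eq <= 0) {
                continue;
            }
            String key = keyValue.substring(0, eq).trim();
            // Protect a literal '+' so URLDecoder does not turn it into a space
            String raw = keyValue.substring(eq + 1).trim().replace("+", "%2B");
            baggage.put(key, URLDecoder.decode(raw, StandardCharsets.UTF_8));
        }
        return baggage;
    }

    public static void main(String[] args) {
        Map<String, String> baggage = new LinkedHashMap<>();
        baggage.put("user-id", "42");
        baggage.put("request-id", "req 7");
        String header = format(baggage);
        System.out.println(header);
        System.out.println(parse(header));
    }
}
```

Because baggage is sent on every downstream call, it is worth keeping the set of keys small and free of sensitive data.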

6.2 MDC Integration

    @Component
    public class MdcTracingFilter extends OncePerRequestFilter {

        private static final String TRACE_ID = "traceId";
        private static final String SPAN_ID = "spanId";

        @Autowired
        private Tracer tracer;

        @Override
        protected void doFilterInternal(HttpServletRequest request,
                                        HttpServletResponse response,
                                        FilterChain chain)
                throws ServletException, IOException {
            // Expose the current trace and span IDs to the logging framework
            Span currentSpan = tracer.currentSpan();
            if (currentSpan != null) {
                MDC.put(TRACE_ID, currentSpan.context().traceId());
                MDC.put(SPAN_ID, currentSpan.context().spanId());
            }

            try {
                chain.doFilter(request, response);
            } finally {
                // Remove only the keys set here; MDC.clear() would also wipe entries added by other filters
                MDC.remove(TRACE_ID);
                MDC.remove(SPAN_ID);
            }
        }
    }

6.3 Passing Context Across Services

    @Service
    public class UserServiceClient {

        private final RestTemplate restTemplate;
        private final Tracer tracer;
        // org.springframework.cloud.sleuth.propagation.Propagator replaces the
        // Injector/TracingPropagators calls of the original, which are not Sleuth API
        private final Propagator propagator;

        public UserServiceClient(RestTemplate restTemplate, Tracer tracer, Propagator propagator) {
            this.restTemplate = restTemplate;
            this.tracer = tracer;
            this.propagator = propagator;
        }

        public User getUserById(Long id) {
            HttpHeaders headers = new HttpHeaders();

            // Inject the current span's context into the outgoing HTTP headers.
            // A RestTemplate bean instrumented by Sleuth does this automatically;
            // manual injection is only needed for clients that are not instrumented.
            Span span = tracer.currentSpan();
            if (span != null) {
                propagator.inject(span.context(), headers, HttpHeaders::set);
            }

            HttpEntity<Void> entity = new HttpEntity<>(headers);
            ResponseEntity<User> response = restTemplate.exchange(
                    "http://user-service/api/users/{id}",
                    HttpMethod.GET,
                    entity,
                    User.class,
                    id
            );
            return response.getBody();
        }
    }
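
To make the injection step concrete, the sketch below writes a trace context into a header map using the B3 multi-header format (`X-B3-TraceId`, `X-B3-SpanId`, `X-B3-ParentSpanId`, `X-B3-Sampled`) and shows how the receiving service derives its own span from it. `B3Propagation` and its methods are hypothetical, and a plain `Map` stands in for real HTTP headers, so header names are matched case-sensitively here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of B3 multi-header propagation over a plain Map; class name is hypothetical. */
class B3Propagation {

    /** Client side: writes the current span's context into the outgoing headers. */
    static void inject(String traceId, String spanId, String parentSpanId,
                       boolean sampled, Map<String, String> headers) {
        headers.put("X-B3-TraceId", traceId);
        headers.put("X-B3-SpanId", spanId);
        if (parentSpanId != null) {
            headers.put("X-B3-ParentSpanId", parentSpanId);
        }
        headers.put("X-B3-Sampled", sampled ? "1" : "0");
    }

    /**
     * Server side: builds the context of a new child span from the incoming headers.
     * Returns null when the request carries no trace context.
     */
    static Map<String, String> childContext(Map<String, String> headers, String newSpanId) {
        String traceId = headers.get("X-B3-TraceId");
        String callerSpanId = headers.get("X-B3-SpanId");
        if (traceId == null || callerSpanId == null) {
            return null;
        }
        Map<String, String> context = new LinkedHashMap<>();
        context.put("traceId", traceId);             // same trace as the caller
        context.put("spanId", newSpanId);            // fresh ID for this unit of work
        context.put("parentSpanId", callerSpanId);   // the caller's span becomes the parent
        context.put("sampled", String.valueOf("1".equals(headers.get("X-B3-Sampled"))));
        return context;
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        inject("463ac35c9f6413ad48485a3953bb6124", "a2fb4a1d1a96d312", null, true, headers);
        System.out.println(headers);
        System.out.println(childContext(headers, "0020000000000001"));
    }
}
```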

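7. Trace Analysis

7.1 Slow Span Analysis

    @Service
    public class SlowQueryAnalyzer {

        // The original logged through an undeclared "log" field
        private static final Logger log = LoggerFactory.getLogger(SlowQueryAnalyzer.class);

        @Autowired
        private Tracer tracer;

        public void analyze() {
            Span currentSpan = tracer.currentSpan();
            if (currentSpan == null) {
                return;
            }

            // Fetch the children of the current span.
            // getChildSpans and SpanData are assumed helpers backed by the span store.
            Collection<SpanData> childSpans = getChildSpans(currentSpan.context().spanId());

            // Keep spans slower than one second, slowest first
            List<SpanData> slowSpans = childSpans.stream()
                    .filter(span -> span.durationMs() > 1000)
                    .sorted(Comparator.comparing(SpanData::durationMs).reversed())
                    .collect(Collectors.toList());

            if (!slowSpans.isEmpty()) {
                log.warn("Slow spans detected: {}", slowSpans);
            }
        }
    }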
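
The filtering step does not depend on any tracing library, so it can be tried on its own. The sketch below applies the same threshold-and-sort logic to an in-memory list; `SlowSpanFinder` and `SpanRecord` are hypothetical stand-ins for the `SpanData` type used above.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Self-contained version of the slow-span filter; type names are hypothetical. */
class SlowSpanFinder {

    static class SpanRecord {
        final String name;
        final long durationMs;

        SpanRecord(String name, long durationMs) {
            this.name = name;
            this.durationMs = durationMs;
        }

        @Override
        public String toString() {
            return name + "(" + durationMs + "ms)";
        }
    }

    /** Returns spans strictly slower than thresholdMs, slowest first. */
    static List<SpanRecord> slowest(Collection<SpanRecord> spans, long thresholdMs) {
        return spans.stream()
                .filter(span -> span.durationMs > thresholdMs)
                .sorted(Comparator.comparingLong((SpanRecord span) -> span.durationMs).reversed())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<SpanRecord> spans = Arrays.asList(
                new SpanRecord("loadProfile", 500),
                new SpanRecord("queryDatabase", 1500),
                new SpanRecord("callBilling", 3000));
        System.out.println(slowest(spans, 1000));
    }
}
```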

7.2 Call Chain Analysis

    @Service
    public class TraceAnalyzer {

        // SpanRepository, SpanData, CallGraph, Node and Path are application types assumed by the original
        @Autowired
        private SpanRepository spanRepository;

        // Build a graph with one node per span and one edge per parent-child relationship
        public CallGraph buildCallGraph(String traceId) {
            List<SpanData> spans = spanRepository.findByTraceId(traceId);
            CallGraph graph = new CallGraph();

            for (SpanData span : spans) {
                Node node = new Node(
                        span.getSpanId(),
                        span.getOperationName(),
                        span.getDurationMs()
                );
                graph.addNode(node);

                if (span.getParentSpanId() != null) {
                    graph.addEdge(span.getParentSpanId(), span.getSpanId());
                }
            }
            return graph;
        }

        // The critical path is taken to be the longest path through the call graph
        public List<Path> findCriticalPath(String traceId) {
            CallGraph graph = buildCallGraph(traceId);
            return graph.findLongestPath();
        }
    }
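
The `findLongestPath` call above is not shown in the article. One simple way to approximate a critical path, sketched below, is to start at the root span and repeatedly follow the child with the largest duration. This is a deliberate simplification (it ignores concurrency and gaps between children), and `CriticalPath` and `SpanNode` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Greedy "slowest chain" walk over a span tree; a simplification, names are hypothetical. */
class CriticalPath {

    static class SpanNode {
        final String id;
        final String parentId;   // null for the root span
        final String name;
        final long durationMs;

        SpanNode(String id, String parentId, String name, long durationMs) {
            this.id = id;
            this.parentId = parentId;
            this.name = name;
            this.durationMs = durationMs;
        }
    }

    /** Returns span names from the root down, always following the slowest child. */
    static List<String> slowestChain(List<SpanNode> spans) {
        Map<String, List<SpanNode>> childrenByParent = new HashMap<>();
        SpanNode root = null;
        for (SpanNode span : spans) {
            if (span.parentId == null) {
                root = span;
            } else {
                childrenByParent.computeIfAbsent(span.parentId, k -> new ArrayList<>()).add(span);
            }
        }

        List<String> path = new ArrayList<>();
        Set<String> visited = new HashSet<>();   // guards against malformed, cyclic data
        SpanNode current = root;
        while (current != null && visited.add(current.id)) {
            path.add(current.name);
            SpanNode slowest = null;
            for (SpanNode child : childrenByParent.getOrDefault(current.id, Collections.emptyList())) {
                if (slowest == null || child.durationMs > slowest.durationMs) {
                    slowest = child;
                }
            }
            current = slowest;
        }
        return path;
    }

    public static void main(String[] args) {
        List<SpanNode> spans = Arrays.asList(
                new SpanNode("1", null, "GET /orders", 100),
                new SpanNode("2", "1", "checkAuth", 30),
                new SpanNode("3", "1", "loadOrders", 60),
                new SpanNode("4", "3", "queryDatabase", 50));
        System.out.println(slowestChain(spans));
    }
}
```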

7.3 Dependency Analysis

    @Service
    public class DependencyAnalyzer {

        // Used below but never declared in the original
        @Autowired
        private SpanRepository spanRepository;

        public ServiceDependencyGraph buildDependencyGraph() {
            List<SpanData> allSpans = spanRepository.findAll();
            Map<String, Set<String>> dependencies = new HashMap<>();

            for (SpanData span : allSpans) {
                String service = span.getServiceName();
                // Tags prefixed with "peer." describe the remote side of a call
                span.getTags().forEach((key, value) -> {
                    if (key.startsWith("peer.")) {
                        // extractPeerService is an assumed helper that maps a tag value to a service name
                        String peerService = extractPeerService(value);
                        if (peerService != null) {
                            dependencies.computeIfAbsent(service, k -> new HashSet<>())
                                    .add(peerService);
                        }
                    }
                });
            }
            return new ServiceDependencyGraph(dependencies);
        }
    }
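
When spans carry no `peer.*` tags, the same dependency map can be derived from parent-child links instead: if a span's parent belongs to a different service, the parent's service depends on the child's. The sketch below uses that alternative technique, not the tag-based one above; `DependencyGraph` and `SpanRecord` are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Derives service dependencies from parent-child span links; names are hypothetical. */
class DependencyGraph {

    static class SpanRecord {
        final String spanId;
        final String parentSpanId;   // null for the root span
        final String serviceName;

        SpanRecord(String spanId, String parentSpanId, String serviceName) {
            this.spanId = spanId;
            this.parentSpanId = parentSpanId;
            this.serviceName = serviceName;
        }
    }

    /** Maps each calling service to the set of services it calls. */
    static Map<String, Set<String>> build(List<SpanRecord> spans) {
        Map<String, String> serviceBySpanId = new HashMap<>();
        for (SpanRecord span : spans) {
            serviceBySpanId.put(span.spanId, span.serviceName);
        }

        Map<String, Set<String>> dependencies = new TreeMap<>();   // sorted for stable output
        for (SpanRecord span : spans) {
            if (span.parentSpanId == null) {
                continue;
            }
            String caller = serviceBySpanId.get(span.parentSpanId);
            // Spans whose parent is in the same service are internal work, not a dependency
            if (caller != null && !caller.equals(span.serviceName)) {
                dependencies.computeIfAbsent(caller, k -> new TreeSet<>()).add(span.serviceName);
            }
        }
        return dependencies;
    }

    public static void main(String[] args) {
        List<SpanRecord> spans = Arrays.asList(
                new SpanRecord("1", null, "gateway"),
                new SpanRecord("2", "1", "user-service"),
                new SpanRecord("3", "2", "user-service"),
                new SpanRecord("4", "3", "order-service"));
        System.out.println(build(spans));
    }
}
```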

8. Alerting Configuration

8.1 Error Rate Alerts

    # Prometheus alerting rules
    # Metric names are taken as-is from the original; adjust them to what your exporter emits
    groups:
      - name: tracing-alerts
        rules:
          - alert: HighErrorRate
            expr: |
              sum(rate(spring_sleuth_spans{tag_error="true"}[5m])) by (service)
              /
              sum(rate(spring_sleuth_spans_count[5m])) by (service) > 0.05
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: "High error rate in {{ $labels.service }}"
              description: "Error rate is {{ $value | humanizePercentage }}"

          - alert: SlowResponseTime
            expr: |
              histogram_quantile(0.95,
                sum(rate(spring_sleuth_spans_duration_seconds_bucket[5m])) by (le, service)
              ) > 2
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: "Slow response time in {{ $labels.service }}"
              description: "95th percentile is {{ $value | humanizeDuration }}"
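
The `SlowResponseTime` rule depends on `histogram_quantile`, which estimates a percentile from cumulative bucket counts by linear interpolation inside the bucket that contains the target rank. The sketch below follows that idea on made-up numbers; `HistogramQuantile` is a hypothetical class, the bucket values are invented, and edge cases that Prometheus handles (such as missing `+Inf` buckets) are left out.

```java
/** Sketch of quantile estimation from cumulative histogram buckets; simplified, names hypothetical. */
class HistogramQuantile {

    /**
     * @param q                target quantile between 0 and 1
     * @param upperBounds      bucket upper bounds in ascending order, ending with +Infinity
     * @param cumulativeCounts number of observations less than or equal to each bound
     */
    static double quantile(double q, double[] upperBounds, double[] cumulativeCounts) {
        int n = upperBounds.length;
        double total = cumulativeCounts[n - 1];
        if (n < 2 || total == 0) {
            return Double.NaN;
        }
        double rank = q * total;

        // Find the first bucket whose cumulative count reaches the rank
        int i = 0;
        while (i < n - 1 && cumulativeCounts[i] < rank) {
            i++;
        }
        if (Double.isInfinite(upperBounds[i])) {
            return upperBounds[n - 2];   // rank falls in the +Inf bucket: report the highest finite bound
        }

        double lowerBound = (i == 0) ? 0.0 : upperBounds[i - 1];
        double countBelow = (i == 0) ? 0.0 : cumulativeCounts[i - 1];
        double countInBucket = cumulativeCounts[i] - countBelow;
        if (countInBucket == 0) {
            return upperBounds[i];
        }
        // Assume observations are spread evenly inside the bucket
        return lowerBound + (upperBounds[i] - lowerBound) * (rank - countBelow) / countInBucket;
    }

    public static void main(String[] args) {
        double[] bounds = {0.5, 1.0, 2.0, Double.POSITIVE_INFINITY};
        double[] counts = {50, 80, 96, 100};   // invented sample data
        double p95 = quantile(0.95, bounds, counts);
        System.out.println("p95 = " + p95 + "s, alert fires: " + (p95 > 2));
    }
}
```

The estimate can never be more precise than the bucket layout, so bucket boundaries should sit close to the alert threshold (2 seconds in the rule above).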

8.2 Latency Alerts

    # Fires when the 5-minute average latency exceeds 1.5x the average over the last hour
    - alert: LatencyIncrease
      expr: |
        sum(rate(spring_sleuth_spans_duration_seconds_sum[5m])) by (service)
        /
        sum(rate(spring_sleuth_spans_duration_seconds_count[5m])) by (service)
        > 1.5 * avg_over_time(
          (
            sum(rate(spring_sleuth_spans_duration_seconds_sum[1h])) by (service)
            /
            sum(rate(spring_sleuth_spans_duration_seconds_count[1h])) by (service)
          )[1h:5m]
        )
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Latency increased in {{ $labels.service }}"
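
The arithmetic behind the `LatencyIncrease` rule is small enough to check by hand: average latency is the duration sum divided by the request count, and the alert fires when the current average exceeds 1.5 times the mean of the baseline samples. The sketch below restates that with hypothetical names (`LatencyBaseline`, `average`, `latencyIncreased`) and invented numbers.

```java
/** Restates the latency-increase check in plain Java; names and data are illustrative. */
class LatencyBaseline {

    /** Average latency in seconds, or NaN when there were no requests. */
    static double average(double durationSumSeconds, double requestCount) {
        return requestCount == 0 ? Double.NaN : durationSumSeconds / requestCount;
    }

    /** True when current exceeds factor times the mean of the baseline samples. */
    static boolean latencyIncreased(double current, double[] baselineSamples, double factor) {
        if (baselineSamples.length == 0) {
            return false;   // no baseline yet, so nothing to compare against
        }
        double sum = 0.0;
        for (double sample : baselineSamples) {
            sum += sample;
        }
        double baseline = sum / baselineSamples.length;
        return current > factor * baseline;
    }

    public static void main(String[] args) {
        double current = average(40.0, 100.0);        // 0.4 s average over the last 5 minutes
        double[] lastHour = {0.1, 0.2, 0.3};          // invented 5-minute samples
        System.out.println(latencyIncreased(current, lastHour, 1.5));
    }
}
```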

9. Grafana Dashboards

9.1 Tracing Panels

    {
      "title": "Request Trace Overview",
      "panels": [
        {
          "title": "Request Rate by Service",
          "type": "graph",
          "targets": [
            {
              "expr": "sum(rate(spring_sleuth_spans_count[5m])) by (service)",
              "legendFormat": "{{ service }}"
            }
          ]
        },
        {
          "title": "Error Rate",
          "type": "graph",
          "targets": [
            {
              "expr": "sum(rate(spring_sleuth_spans{tag_error=\"true\"}[5m])) by (service)",
              "legendFormat": "{{ service }}"
            }
          ]
        },
        {
          "title": "P99 Latency",
          "type": "graph",
          "targets": [
            {
              "expr": "histogram_quantile(0.99, sum(rate(spring_sleuth_spans_duration_seconds_bucket[5m])) by (le, service))",
              "legendFormat": "{{ service }}"
            }
          ]
        }
      ]
    }

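10. Best Practices

10.1 Sampling Strategies

| Strategy               | Use case                                          | Configuration                              |
|------------------------|---------------------------------------------------|--------------------------------------------|
| Full sampling          | Development and debugging                         | probability: 1.0                           |
| Probabilistic sampling | Routine production traffic                        | probability: 0.1-0.5                       |
| Head-based sampling    | One sampling decision made at the request entry   | sampler: HeadBased                         |
| Adaptive sampling      | Dynamic adjustment                                | Raise the sampling rate when errors occur  |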
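
The adaptive row is the only strategy in the table without a concrete setting. One possible shape for it, sketched below, is to track the observed error rate and switch to a higher sampling probability once it crosses a threshold. `AdaptiveSampler` and its parameters are hypothetical, and a production version would use a sliding window rather than lifetime counters.

```java
/** Sketch of error-aware adaptive sampling; names and thresholds are assumptions. */
class AdaptiveSampler {
    private final double baseProbability;
    private final double boostedProbability;
    private final double errorRateThreshold;
    private long total;
    private long errors;

    AdaptiveSampler(double baseProbability, double boostedProbability, double errorRateThreshold) {
        this.baseProbability = baseProbability;
        this.boostedProbability = boostedProbability;
        this.errorRateThreshold = errorRateThreshold;
    }

    /** Records the outcome of one finished request. */
    synchronized void record(boolean error) {
        total++;
        if (error) {
            errors++;
        }
    }

    /** Returns the probability to use for the next sampling decision. */
    synchronized double currentProbability() {
        if (total == 0) {
            return baseProbability;
        }
        double errorRate = (double) errors / total;
        return errorRate > errorRateThreshold ? boostedProbability : baseProbability;
    }

    public static void main(String[] args) {
        AdaptiveSampler sampler = new AdaptiveSampler(0.1, 1.0, 0.05);
        for (int i = 0; i < 90; i++) {
            sampler.record(false);
        }
        for (int i = 0; i < 10; i++) {
            sampler.record(true);
        }
        System.out.println(sampler.currentProbability());
    }
}
```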

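10.2 Performance Recommendations

  1. Asynchronous sending: send trace data asynchronously through Kafka or RabbitMQ
  2. Sampling strategy: adjust the sampling rate dynamically according to traffic
  3. Data compression: enable compression for trace data
  4. Batch sending: aggregate multiple spans and send them in batches
  5. Storage optimization: choose a suitable storage backend and indexing strategy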
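
Batch sending from the list above can be sketched as a buffer that hands spans to the transport only once enough have accumulated. The example below is illustrative: `BatchingReporter` is a hypothetical class, spans are plain strings, and the `sender` callback stands in for a real Kafka, RabbitMQ, or HTTP client. A production reporter would also flush on a timer and bound the buffer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Sketch of a size-triggered batching reporter; class name and callback are hypothetical. */
class BatchingReporter {
    private final int batchSize;
    private final Consumer<List<String>> sender;   // stand-in for the real transport
    private final List<String> buffer = new ArrayList<>();

    BatchingReporter(int batchSize, Consumer<List<String>> sender) {
        this.batchSize = batchSize;
        this.sender = sender;
    }

    /** Buffers one span and sends a batch once batchSize spans have accumulated. */
    synchronized void report(String span) {
        buffer.add(span);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is buffered; call this on shutdown so no spans are lost. */
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sender.accept(new ArrayList<>(buffer));   // copy so the sender owns its batch
        buffer.clear();
    }

    public static void main(String[] args) {
        BatchingReporter reporter = new BatchingReporter(3,
                batch -> System.out.println("sending " + batch.size() + " spans"));
        for (int i = 0; i < 7; i++) {
            reporter.report("span-" + i);
        }
        reporter.flush();
    }
}
```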

10.3 Security Considerations

    # Sensitive data filtering
    spring:
      sleuth:
        instrument:
          exclude:
            - org.springframework.web.servlet.Filter
        propagation:
          type: w3c
        baggage:
          correlation-enabled: false  # disable automatic MDC correlation
      data:
        redis:
          customizers:
            - tracing-repository-customizer
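
Configuration alone does not stop application code from putting secrets into span tags. A complementary safeguard, sketched below, is to mask the values of sensitive tag keys before spans are exported. `TagRedactor` and its key list are assumptions for this example and would need to match the tags your services actually emit.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Sketch of span tag redaction; class name and key list are assumptions. */
class TagRedactor {
    // Keys are compared in lower case
    private static final Set<String> SENSITIVE_KEYS =
            Set.of("password", "authorization", "token", "cookie");

    /** Returns a copy of the tags with sensitive values replaced by "***". */
    static Map<String, String> redact(Map<String, String> tags) {
        Map<String, String> safe = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : tags.entrySet()) {
            String key = entry.getKey().toLowerCase(Locale.ROOT);
            safe.put(entry.getKey(), SENSITIVE_KEYS.contains(key) ? "***" : entry.getValue());
        }
        return safe;
    }

    public static void main(String[] args) {
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("http.method", "GET");
        tags.put("Authorization", "Bearer your-token");
        System.out.println(redact(tags));
    }
}
```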

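11. Summary

Distributed tracing is a core component of microservice observability. This article covered:

  1. Tracing overview: core concepts such as Trace, Span, and Annotation
  2. Spring Cloud Sleuth: the basic building block for distributed tracing in Spring applications
  3. Jaeger integration: a tracing system hosted by the CNCF
  4. Zipkin integration: a tracing system open-sourced by Twitter
  5. OpenTelemetry: a cross-language tracing standard
  6. Context propagation: passing the trace context across services
  7. Trace analysis: slow spans, call chains, and dependency analysis
  8. Alerting configuration: alert rules based on Prometheus
  9. Grafana dashboards: visualizing trace data

With a complete tracing system in place, you can locate problems quickly, optimize performance, understand how the system behaves, and build a microservice system that is genuinely observable.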
