Introduction to Grafana Alloy

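Grafana Alloy is the successor to Grafana Agent.
It is an all-in-one observability collector from Grafana that can collect metrics, logs, and traces at the same time.

For the reasons Grafana Agent is being retired, see the official blog.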

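Alloy's strengths:

discovery: automatically discovers scrape targets
Bundles multiple metric exporters: you no longer need to install each exporter separately, which simplifies environment setup

Note, however, that the metric names and labels produced by the built-in exporters may differ from those of the community exporters, so dashboards may need to be modified.

Alloy's weaknesses:

For logs, only Loki is supported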

Hands-on

The tests below use the helm chart https://artifacthub.io/packages/helm/grafana/alloy

scrape metrics in k8s

A simple set of helm values:

controller:
  volumes:
    extra: 
    - hostPath:
        path: /proc
        type: ""
      name: proc
    - hostPath:
        path: /sys
        type: ""
      name: sys
    - hostPath:
        path: /
        type: ""
      name: root

alloy:
  mounts:
    extra: 
    - mountPath: /host/proc
      name: proc
      readOnly: true
    - mountPath: /host/sys
      name: sys
      readOnly: true
    - mountPath: /host/root
      mountPropagation: HostToContainer
      name: root
      readOnly: true

  configMap:
    content: |-
      // metrics
      prometheus.remote_write "default" {
        endpoint {
          url = "http://gf-stack-mimir-distributor.monitor.svc:8080/api/v1/push"
          headers = {
            "X-Scope-OrgID" = "alloy",
          }
        }
      }

      prometheus.exporter.unix "node" { }

      // Configure a prometheus.scrape component to collect unix metrics.
      prometheus.scrape "node" {
        targets    = prometheus.exporter.unix.node.targets
        forward_to = [prometheus.remote_write.default.receiver]
      }

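      // Discover pods so that prometheus.scrape "pod" below has targets;
      // without this block, discovery.kubernetes.pod.targets is undefined.
      discovery.kubernetes "pod" {
        role = "pod"
      }

      prometheus.scrape "pod" {
        targets    = discovery.kubernetes.pod.targets
        forward_to = [prometheus.remote_write.default.receiver]
      }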

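Grafana Alloy uses discovery.kubernetes to obtain each pod's ip:port
and adds them to the scrape targets
prometheus.scrape then tries to fetch metrics from the /metrics endpoint of each target
Grafana Alloy uses its built-in node exporter (prometheus.exporter.unix) to generate scrape targets, which prometheus.scrape then scrapes
prometheus.remote_write sends the metrics to a Prometheus-compatible remote-write endpoint (Mimir in this config)

Meanwhile, let's look at the resource usage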
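
Two details of the configuration above may be worth adjusting. The hostPath volumes are mounted at /host/proc, /host/sys, and /host/root, but prometheus.exporter.unix "node" is declared with no arguments, so it reads the default paths inside the container. And, as noted earlier, the labels set by the built-in exporter may not match what community dashboards expect. The following is a sketch under assumptions, not something tested in this post: the job value "node-exporter" is a hypothetical placeholder for whatever your dashboards filter on.

```alloy
// Point the built-in exporter at the host filesystems mounted by the helm values above.
prometheus.exporter.unix "node" {
  procfs_path = "/host/proc"
  sysfs_path  = "/host/sys"
  rootfs_path = "/host/root"
}

// Rewrite the job label before scraping.
// "node-exporter" is a hypothetical value; use whatever your dashboards expect.
discovery.relabel "node" {
  targets = prometheus.exporter.unix.node.targets

  rule {
    target_label = "job"
    replacement  = "node-exporter"
  }
}

prometheus.scrape "node" {
  targets    = discovery.relabel.node.output
  forward_to = [prometheus.remote_write.default.receiver]
}
```

These blocks would take the place of the prometheus.exporter.unix and prometheus.scrape "node" blocks in the config; prometheus.remote_write "default" stays unchanged. Metric names themselves would still need to be checked against the dashboard queries.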

$ kubectl top pod 
NAME                                                    CPU(cores)   MEMORY(bytes)   
alloy-7kwd9                                             39m          461Mi           
victoria-metrics-agent-d6cff6696-lg964                  2m           102Mi           
victoria-metrics-agent-prometheus-node-exporter-6l5jm   4m           10Mi

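Frankly, I find this resource usage hard to accept: Alloy uses 39m CPU and 461Mi of memory, while victoria-metrics-agent and its node-exporter together use 6m and 112Mi.
victoria-metrics-agent still looks like the better choice.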
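
One thing that might be worth trying before ruling Alloy out: a discovery.kubernetes component without selectors discovers pods across the whole cluster from every Alloy pod, whereas the log configuration later in this post restricts discovery to the local node "to reduce cpu & memory usage". The sketch below applies that same selector to the metrics pipeline. It assumes Alloy runs as a DaemonSet with HOSTNAME set to the node name, and it is untested here, so whether it narrows the gap in the numbers above is an open question.

```alloy
// Variant of the pod discovery block: only discover pods scheduled on this node.
// The selector is the one used by the log pipeline in this post.
discovery.kubernetes "pod" {
  role = "pod"

  selectors {
    role  = "pod"
    field = "spec.nodeName=" + coalesce(sys.env("HOSTNAME"), constants.hostname)
  }
}

prometheus.scrape "pod" {
  targets    = discovery.kubernetes.pod.targets
  forward_to = [prometheus.remote_write.default.receiver]
}
```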

scrape logs in k8s

A simple set of helm values:

alloy:
  mounts:
    varlog: true

  configMap:
    content: |-
      
      // local.file_match discovers files on the local filesystem using glob patterns and the doublestar library. It returns an array of file paths.
      local.file_match "node_logs" {
        path_targets = [{
            // Monitor syslog to scrape node-logs
            __path__  = "/var/log/syslog",
            job       = "node/syslog",
            node_name = sys.env("HOSTNAME"),
            cluster   = "local",
        }]
      }

      // loki.source.file reads log entries from files and forwards them to other loki.* components.
      // You can specify multiple loki.source.file components by giving them different labels.
      loki.source.file "node_logs" {
        targets    = local.file_match.node_logs.targets
        forward_to = [loki.write.default.receiver]
      }


      // discovery.kubernetes allows you to find scrape targets from Kubernetes resources.
      // It watches cluster state and ensures targets are continually synced with what is currently running in your cluster.
      discovery.kubernetes "pod" {
        role = "pod"
        // Restrict to pods on the node to reduce cpu & memory usage
        selectors {
          role = "pod"
          field = "spec.nodeName=" + coalesce(sys.env("HOSTNAME"), constants.hostname)
        }
      }

      // discovery.relabel rewrites the label set of the input targets by applying one or more relabeling rules.
      // If no rules are defined, then the input targets are exported as-is.
      discovery.relabel "pod_logs" {
        targets = discovery.kubernetes.pod.targets

        // Label creation - "namespace" field from "__meta_kubernetes_namespace"
        rule {
          source_labels = ["__meta_kubernetes_namespace"]
          action = "replace"
          target_label = "namespace"
        }

        // Label creation - "pod" field from "__meta_kubernetes_pod_name"
        rule {
          source_labels = ["__meta_kubernetes_pod_name"]
          action = "replace"
          target_label = "pod"
        }

        // Label creation - "container" field from "__meta_kubernetes_pod_container_name"
        rule {
          source_labels = ["__meta_kubernetes_pod_container_name"]
          action = "replace"
          target_label = "container"
        }

        // Label creation -  "app" field from "__meta_kubernetes_pod_label_app_kubernetes_io_name"
        rule {
          source_labels = ["__meta_kubernetes_pod_label_app_kubernetes_io_name"]
          action = "replace"
          target_label = "app"
        }

      }

      // loki.source.kubernetes tails logs from Kubernetes containers using the Kubernetes API.
      loki.source.kubernetes "pod_logs" {
        targets    = discovery.relabel.pod_logs.output
        forward_to = [loki.process.pod_logs.receiver]
      }

      // loki.process receives log entries from other Loki components, applies one or more processing stages,
      // and forwards the results to the list of receivers in the component's arguments.
      loki.process "pod_logs" {
        stage.static_labels {
            values = {
              cluster = "laterstack",
              job = "alloy",
            }
        }
        forward_to = [loki.write.default.receiver]
      }
    
      loki.write "default" {
        endpoint {
          url = "http://loki-write.monitor.svc:3100/loki/api/v1/push"
          tenant_id = "alloy"
        }
      }

Grafana Alloy uses discovery.kubernetes to obtain each pod's labels and adds them to the targets
loki.source.kubernetes fetches the pod logs
loki.source.file fetches the host logs
loki.write sends the logs to Loki

Meanwhile, let's look at the resource usage
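
In the config above the node logs carry a node_name label, while the pod logs do not. If you also want to filter pod logs by node, one option is an extra relabel rule. This is a sketch rather than part of the tested setup, and it assumes the __meta_kubernetes_pod_node_name meta label is present on the discovered pod targets.

```alloy
// Add inside discovery.relabel "pod_logs", next to the existing rules.
// Label creation - "node_name" field from "__meta_kubernetes_pod_node_name"
rule {
  source_labels = ["__meta_kubernetes_pod_node_name"]
  action        = "replace"
  target_label  = "node_name"
}
```

Reusing the node_name key keeps pod logs and node logs queryable with the same label.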

$ kubectl top pod
NAME                                                    CPU(cores)   MEMORY(bytes)   
alloy-85jxg                                             8m           74Mi            
promtail-9hpgh                                          30m          126Mi

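By comparison, Alloy does considerably better than promtail (8m CPU and 74Mi versus 30m and 126Mi).
It looks safe to migrate from promtail to Alloy.
Although fluent-bit showed worse CPU performance in the earlier tests,
its feature set far outclasses Alloy's, so I would still recommend fluent-bit.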

conclusion

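The idea behind Grafana Alloy is a good one,
but an all-in-one observability collector still runs up against the fact that each tool has its own specialty.
For now, choosing the collector best suited to each job remains the better option.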

Last updated on 2025 Nov 3
