Logging

This section aims to provide guidance on which logging solutions can be used with a Kubernetes cluster and its applications.

Centralised logging

Given the multi-cloud, distributed nature of today’s systems, running, deploying and maintaining cloud-based systems can be a challenge for the people involved. Part of this challenge is the need to aggregate various metrics from a wide range of sources and present them in a ‘single pane of glass’.

One of the key data sources that typically needs to be managed in this way is log data. Whether it is generated by the system or by applications, it is desirable to forward all log output to specialised systems such as Elasticsearch, Kibana or Grafana, which handle the display, search and analysis of the received log data.

It is the role of the log collectors such as Fluentd or Logstash to forward these logs from their origins on to the chosen analysis tools.

Fluentd

Fluentd is a Cloud Native Computing Foundation (CNCF) open source data collector aimed at providing a unified logging layer with a pluggable architecture.

It attempts to structure all data as JSON in order to unify the collecting, filtering, buffering and outputting of log data from multiple sources and destinations.

Pluggable architecture

The flexible Fluentd plugin system makes it easier for users to get value from their log data: more than 500 community-created plugins provide a wide range of supported data sources and data outputs.

Shipping logs to an AWS S3 bucket

In this example we add a Fluentd daemonset to our cluster so that we can export the logs to a central S3 bucket. This in turn makes our log data available to any downstream analysis tools we choose.

The following configuration details are used in this example; modify them to fit your own environment.

For each step we list the manifest file and the command required to deploy it.

  • cluster namespace : fluentdlogging

  • AWS S3 bucket name : fluentdlogs-cc

  • AWS S3 bucket prefix : cluster-1

  • AWS access key id: ‘<AWS_ACCESS_KEY>’

  • AWS secret access key: ‘<AWS_SECRET_ACCESS_KEY>’

First we need to create the namespace to deploy our Fluentd application into, along with a service account and a cluster role that can read pods and namespaces.

# fluentd-s3-rbac.yml

---
# create logging namespace
apiVersion: v1
kind: Namespace
metadata:
  name: fluentdlogging

---
# create the fluentd ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd
  namespace: fluentdlogging

---
# create fluentd ClusterRole to access pods and namespaces
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluentd
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - namespaces
  verbs:
  - get
  - list
  - watch

---
# bind the ServiceAccount to the ClusterRole
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd
roleRef:
  kind: ClusterRole
  name: fluentd
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: fluentd
  namespace: fluentdlogging


$ kubectl apply -f fluentd-s3-rbac.yml
namespace/fluentdlogging created
serviceaccount/fluentd created
clusterrole.rbac.authorization.k8s.io/fluentd created
clusterrolebinding.rbac.authorization.k8s.io/fluentd created

The next part is the configmap that holds the configuration for the Fluentd S3 plugin. Further parameters are described in the plugin documentation; in our example we have modified the following:

  • path: defines the prefix used for objects in the S3 bucket we are uploading to.

  • timekey: the interval at which buffered output is flushed, which we have dropped to 5 minutes for the purpose of testing.

# fluentd-s3-configmap.yml

---
# create the fluentd.conf config map
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-configmap
  namespace: fluentdlogging
data:
  fluent.conf: |
    @include "#{ENV['FLUENTD_SYSTEMD_CONF'] || 'systemd'}.conf"
    @include "#{ENV['FLUENTD_PROMETHEUS_CONF'] || 'prometheus'}.conf"
    @include kubernetes.conf
    @include conf.d/*.conf

    <match **>
      # docs: https://docs.fluentd.org/v0.12/articles/out_s3
      # note: this configuration relies on the nodes having an IAM instance profile with access to your S3 bucket
      @type s3
      @id out_s3
      @log_level info
      s3_bucket "#{ENV['S3_BUCKET_NAME']}"
      s3_region "#{ENV['S3_BUCKET_REGION']}"
      s3_object_key_format %{path}/%Y/%m/%d/cluster-log-%{index}.%{file_extension}
      path "cluster-1"
      <inject>
        time_key time
        tag_key tag
        localtime false
      </inject>
      <buffer>
        @type file
        path /var/log/fluentd-buffers/s3.buffer
        timekey 300
        timekey_use_utc true
        chunk_limit_size 256m
      </buffer>
    </match>

$ kubectl apply -f fluentd-s3-configmap.yml
configmap/fluentd-configmap created
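
To make the effect of path, timekey and s3_object_key_format more concrete, the following shell sketch approximates how an event timestamp maps to a flush window and an object key. It is an illustration only, not the plugin's implementation: the function names timekey_window_start and s3_object_key are hypothetical, it assumes GNU date, and it assumes the plugin's default gzip compression, which gives a gz file extension.

```shell
#!/usr/bin/env bash
# Sketch only: approximates how the S3 output names objects; not the plugin's code.
# Assumes GNU date (for -d "@epoch") and gzip output, i.e. a "gz" file extension.

# Start of the timekey window (epoch seconds) that an event timestamp falls into.
timekey_window_start() {
  local epoch=$1 timekey=$2
  echo $(( epoch - epoch % timekey ))
}

# Expand %{path}/%Y/%m/%d/cluster-log-%{index}.%{file_extension} for one chunk.
s3_object_key() {
  local path=$1 epoch=$2 index=$3 ext=${4:-gz}
  printf '%s/%s/cluster-log-%s.%s\n' \
    "$path" "$(date -u -d "@${epoch}" +%Y/%m/%d)" "$index" "$ext"
}

event=1589765855                             # 2020-05-18 01:37:35 UTC
start=$(timekey_window_start "$event" 300)   # 5 minute timekey
date -u -d "@${start}" '+window starts %H:%M:%S UTC'
s3_object_key "cluster-1" "$start" 0
```

Under these assumptions, an event logged at 01:37:35 UTC falls into the window starting at 01:35:00, and the first chunk written that day would be expected at cluster-1/2020/05/18/cluster-log-0.gz. Because the key format only contains the date, later chunks on the same day are distinguished by the index.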

The following manifest stores our AWS credentials in a secret.

# fluentd-s3-secrets.yml

---
# create secrets for AWS access
apiVersion: v1
kind: Secret
metadata:
  name: aws-secret-fluentd
  namespace: fluentdlogging
stringData:
  aws_access_key_id: '<AWS_ACCESS_KEY>'
  aws_secret_access_key: '<AWS_SECRET_ACCESS_KEY>'

$ kubectl apply -f fluentd-s3-secrets.yml
secret/aws-secret-fluentd created
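
Values supplied under stringData are written in plain text, and Kubernetes stores them base64 encoded under the data field of the secret. The short sketch below shows that encoding round trip using a made-up placeholder value rather than a real credential; it assumes a base64 tool that accepts --decode.

```shell
#!/usr/bin/env bash
# Sketch only: shows the encoding applied to stringData values.
# The value below is a made-up placeholder, not a real credential.
plain='example'

# This is the form that ends up under .data in the stored secret.
encoded=$(printf '%s' "$plain" | base64)
echo "stored:  $encoded"

# Decoding recovers the original value: base64 is an encoding, not encryption.
decoded=$(printf '%s' "$encoded" | base64 --decode)
echo "decoded: $decoded"
```

The same decoding step can be used to check what was actually stored, for example by piping the output of kubectl -n fluentdlogging get secret aws-secret-fluentd -o jsonpath='{.data.aws_access_key_id}' into base64 --decode.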

Finally, we create a daemonset to run the Fluentd pods. This will create one pod per worker node.

The config for the Fluentd container specifies the necessary S3 parameters via environment variables. It also loads the previously supplied config map through a volume mount.

# fluentd-s3-daemonset.yml

---
# create the fluentd daemonset
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: fluentdlogging
  labels:
    k8s-app: fluentd-logging
    version: v1
    kubernetes.io/cluster-service: "true"
spec:
  # apps/v1 requires a selector that matches the pod template labels
  selector:
    matchLabels:
      k8s-app: fluentd-logging
  template:
    metadata:
      labels:
        k8s-app: fluentd-logging
        version: v1
        kubernetes.io/cluster-service: "true"
    spec:
      serviceAccountName: fluentd
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1.10-debian-s3-1
        env:
        - name: FLUENT_UID
          value: "0"
        - name: S3_BUCKET_NAME
          value: "fluentdlogs-cc"
        - name: S3_BUCKET_REGION
          value: "ap-southeast-2"
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: aws-secret-fluentd
              key: aws_access_key_id
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: aws-secret-fluentd
              key: aws_secret_access_key
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: fluentd-etc-volume
          mountPath: /fluentd/etc/fluent.conf
          subPath: fluent.conf
          readOnly: true
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: fluentd-etc-volume
        configMap:
          name: fluentd-configmap

$ kubectl apply -f fluentd-s3-daemonset.yml
daemonset.apps/fluentd created

Once the full configuration is deployed we can check that the Fluentd pods are online.

$ kubectl get pod -n fluentdlogging
NAME            READY   STATUS    RESTARTS   AGE
fluentd-9dgxb   1/1     Running   0          6m10s
fluentd-jjq8m   1/1     Running   0          6m10s
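
When this check is scripted rather than read by eye, the STATUS column can be tested directly. The following is a minimal sketch under stated assumptions: the helper name all_pods_running is hypothetical, it expects the default kubectl get pod column layout on stdin, and the sample input is simply the output shown above.

```shell
#!/usr/bin/env bash
# Sketch only: reads `kubectl get pod` style output on stdin.
# Succeeds when at least one pod is listed and every STATUS value is "Running".
all_pods_running() {
  awk 'NR > 1 { seen++; if ($3 != "Running") bad++ }
       END { exit !(seen > 0 && bad == 0) }'
}

# Sample input copied from the output above; in practice pipe in
#   kubectl get pod -n fluentdlogging
if all_pods_running <<'EOF'
NAME            READY   STATUS    RESTARTS   AGE
fluentd-9dgxb   1/1     Running   0          6m10s
fluentd-jjq8m   1/1     Running   0          6m10s
EOF
then
  echo "all fluentd pods are running"
else
  echo "some fluentd pods are not running yet"
fi
```

The helper treats an empty pod list as a failure, so a daemonset that has not scheduled any pods yet is not reported as healthy.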

Once they reach the Running state we can query the pod logs to confirm that the bucket was created.

$ kubectl -n fluentdlogging logs fluentd-9dgxb | grep "Creating bucket"
2020-05-18 01:37:35 +0000 [info]: #0 [out_s3] Creating bucket fluentdlogs-cc on

We can also validate this using the awscli tool and see that our S3 bucket has been created.

$ aws s3 ls
2020-05-18 13:37:37 fluentdlogs-cc

To clean up once you are finished, run the following command.

$ kubectl delete -f fluentd-s3-daemonset.yml \
-f fluentd-s3-secrets.yml \
-f fluentd-s3-configmap.yml \
-f fluentd-s3-rbac.yml