Logging Kubernetes to Catalyst Cloud

Introduction

This tutorial shows how to set up centralised logging to Catalyst Cloud for your applications running on Kubernetes.

Background

Given the nature of today’s multi-cloud, distributed way of working, running, deploying and maintaining cloud-based systems can be a challenge for the people involved. Part of this challenge is the need to aggregate a variety of metrics from a wide range of sources and present them in a ‘single pane of glass’.

One of the key data sources that typically needs to be managed in this way is log data. Whether it is generated by the system or by applications, it is desirable to be able to forward all of this log output to specialised systems such as Elasticsearch, Kibana or Grafana, which handle the display, searchability and analysis of the received log data.

It is the role of log collectors such as Fluentd or Logstash to forward these logs from their origins to the chosen analysis tools.

Fluentd

Fluentd is a Cloud Native Computing Foundation (CNCF) open source data collector aimed at providing a unified logging layer with a pluggable architecture.

It attempts to structure all data as JSON in order to unify the collecting, filtering, buffering and outputting of log data from multiple sources and destinations.
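
As an illustration only (the exact tag and fields depend on your inputs and filters), a single container log line might be represented by Fluentd as a tagged, timestamped JSON record along these lines:

{
  "tag": "kubernetes.var.log.containers.myapp-abc123_default_myapp.log",
  "time": "2024-04-11T04:10:26Z",
  "record": {
    "log": "GET /healthz 200",
    "stream": "stdout"
  }
}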

The flexible nature of the Fluentd plugin system allows users to make better and easier use of their log data through the 500+ community-created plugins, which provide a wide range of supported data source and data output options.

This tutorial shows how you can use Fluentd to set up logging for Kubernetes clusters backed by the Catalyst Cloud Object Storage service.

Overview

We will be adding a Fluentd DaemonSet to our cluster so that we can export the logs to a Catalyst Cloud Object Storage container via the S3 API, using the Fluentd S3 plugin.

This allows you to make log data available to any downstream analysis tool that is able to use the supported Object Storage APIs.

Creating target container

First, we will create the Object Storage container that Fluentd will publish log files to.

For more information on how to create an Object Storage container in Catalyst Cloud, please refer to Using Containers.

Run the following command to create the fluentd container.

openstack container create fluentd

By default, the container is created with the multi-region replication policy. To use one of the single-region replication policies, use the --storage-policy option to set a custom storage policy when creating the container.

For example, for single-region replication to the nz-hlz-1 region:

openstack container create fluentd --storage-policy nz-hlz-1--o1--sr-r3

To confirm the container was created successfully, run openstack container show fluentd to list the properties of the container:

$ openstack container show fluentd
+----------------+---------------------------------------+
| Field          | Value                                 |
+----------------+---------------------------------------+
| account        | AUTH_e5d4c3b2a1e5d4c3b2a1e5d4c3b2a1e5 |
| bytes_used     | 0                                     |
| container      | fluentd                               |
| object_count   | 0                                     |
| storage_policy | nz--o1--mr-r3                         |
+----------------+---------------------------------------+

Creating namespace and service account

Now in Kubernetes, we will create the namespace that Fluentd will run in, along with a dedicated service account that grants Fluentd the required privileges.

Create a YAML file named fluentd-rbac.yml with the content as shown below.

A logging namespace is created for Fluentd to run in, along with a fluentd service account. A matching new cluster role is also created with the required permissions, along with a binding for the cluster role to the service account.

---
# Create logging namespace.
apiVersion: v1
kind: Namespace
metadata:
  name: logging

---
# Create the fluentd ServiceAccount.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd
  namespace: logging

---
# Create a fluentd ClusterRole to access pods and namespaces.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluentd
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - namespaces
  verbs:
  - get
  - list
  - watch

---
# Bind the ServiceAccount to the ClusterRole.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluentd
roleRef:
  kind: ClusterRole
  name: fluentd
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: fluentd
  namespace: logging

Run kubectl apply -f fluentd-rbac.yml to create the resources in the Kubernetes cluster.

$ kubectl apply -f fluentd-rbac.yml
namespace/logging created
serviceaccount/fluentd created
clusterrole.rbac.authorization.k8s.io/fluentd created
clusterrolebinding.rbac.authorization.k8s.io/fluentd created
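
If you would like to verify the permissions before continuing, kubectl auth can-i can impersonate the new service account (the names below match the manifest above):

# Both commands should print "yes" if the cluster role binding is in place.
kubectl auth can-i list pods --as=system:serviceaccount:logging:fluentd
kubectl auth can-i watch namespaces --as=system:serviceaccount:logging:fluentd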

Configuring Fluentd

We now need to create the ConfigMap that will hold the configuration for Fluentd.

The configuration file featured below sets up Fluentd to:

  • Upload all logs to a Catalyst Cloud Object Storage container.

  • Read the following parameters from environment variables:

    • Object Storage container name, region, and an optional path prefix.

    • Access Key ID and Secret Access Key.

    • Optional partitioning configuration overrides (e.g. upload frequency, chunk size).

Create a YAML file named fluentd-configmap.yml with the content as shown below.

---
# Create the fluent.conf config map.
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd
  namespace: logging
data:
  fluent.conf: |
    @include "#{ENV['FLUENTD_SYSTEMD_CONF'] || 'systemd'}.conf"
    @include "#{ENV['FLUENTD_PROMETHEUS_CONF'] || 'prometheus'}.conf"
    @include kubernetes.conf
    @include conf.d/*.conf

    <match **>
      # docs: https://docs.fluentd.org/output/s3
      @type s3
      @id out_s3
      @log_level info
      s3_bucket "#{ENV['S3_BUCKET_NAME']}"
      s3_region "#{ENV['S3_BUCKET_REGION']}"
      s3_endpoint "#{ENV['S3_ENDPOINT_URL'] || use_default}"
      force_path_style "#{ENV['S3_FORCE_PATH_STYLE'] || use_default ? true : false}"
      aws_key_id "#{ENV['AWS_ACCESS_KEY_ID']}"
      aws_sec_key "#{ENV['AWS_ACCESS_SECRET_KEY']}"
      path "#{ENV['S3_PATH'] || use_default}"
      s3_object_key_format "#{ENV['S3_OBJECT_KEY_FORMAT'] || '%{path}%Y/%m/%d/cluster-log-%{index}.%{file_extension}'}"
      <inject>
        time_key time
        tag_key tag
        localtime false
      </inject>
      <buffer>
        @type file
        path /var/log/fluentd-buffers/s3.buffer
        timekey "#{ENV['S3_TIMEKEY'] || '3600'}"
        timekey_use_utc true
        chunk_limit_size "#{ENV['S3_CHUNK_LIMIT_SIZE'] || '256m'}"
      </buffer>
    </match>

Run kubectl apply -f fluentd-configmap.yml to create the config map.

$ kubectl apply -f fluentd-configmap.yml
configmap/fluentd created
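
Optionally, confirm that the configuration was stored as expected by describing the config map:

# Shows the fluent.conf key and its contents.
kubectl describe configmap fluentd -n logging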

Creating application credentials

We now need to create the EC2 credentials that Fluentd will use to authenticate with the Object Storage S3 API.

This consists of an Access Key ID and a Secret Access Key.

Run the following command to create the EC2 credentials:

openstack ec2 credentials create

The credentials are returned in the output. access is the Access Key ID, and secret is the Secret Access Key.

Copy these values, as they will be used in the next step.

$ openstack ec2 credentials create
+-----------------+------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field           | Value                                                                                                                                                |
+-----------------+------------------------------------------------------------------------------------------------------------------------------------------------------+
| access          | ee55dd44cc33bb2211aaee55dd44cc33                                                                                                                     |
| access_token_id | None                                                                                                                                                 |
| app_cred_id     | None                                                                                                                                                 |
| links           | {'self': 'https://api.nz-por-1.catalystcloud.io:5000/v3/users/e5d4c3b2a1e5d4c3b2a1e5d4c3b2a1e5/credentials/OS-EC2/1a2b3c4d5e1a2b3c4d5e1a2b3c4d5e1a'} |
| project_id      | e5d4c3b2a1e5d4c3b2a1e5d4c3b2a1e5                                                                                                                     |
| secret          | 11aa22bb33cc44dd55ee11aa22bb33cc                                                                                                                     |
| trust_id        | None                                                                                                                                                 |
| user_id         | 1a2b3c4d5e1a2b3c4d5e1a2b3c4d5e1a                                                                                                                     |
+-----------------+------------------------------------------------------------------------------------------------------------------------------------------------------+
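
If you misplace these values later, they can be retrieved again by listing the EC2 credentials in your project:

# Lists the Access (Key ID) and Secret values for all EC2 credentials in the project.
openstack ec2 credentials list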

We now need to create a Secret containing the Access Key ID and Secret Access Key. This Secret will be referenced by the DaemonSet to provide the values to the Fluentd configuration as environment variables.

Create a YAML file named fluentd-secrets.yml, pasting in the correct values for aws_access_key_id and aws_secret_access_key.

---
# Create secrets for Fluentd.
apiVersion: v1
kind: Secret
metadata:
  name: fluentd
  namespace: logging
stringData:
  aws_access_key_id: 'ee55dd44cc33bb2211aaee55dd44cc33'
  aws_secret_access_key: '11aa22bb33cc44dd55ee11aa22bb33cc'

Run kubectl apply -f fluentd-secrets.yml to create the secrets.

$ kubectl apply -f fluentd-secrets.yml
secret/fluentd created
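
To double-check that the values were stored correctly (Kubernetes base64-encodes the data in a Secret), you can decode one of the keys:

# Prints the stored Access Key ID; repeat with aws_secret_access_key to check the other value.
kubectl get secret fluentd -n logging -o jsonpath='{.data.aws_access_key_id}' | base64 -d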

Creating the daemon set

Finally, we will create the DaemonSet to run the Fluentd service. This will create one pod per worker node.

The Fluentd container definition mounts the previously created config map as the Fluentd configuration file, which then loads credentials, container/bucket parameters and other options from the environment variables passed to the container from the daemon set.

Note

This daemon set is designed to be used on a Kubernetes cluster hosted on Catalyst Cloud Kubernetes Service.

If you are using your own Kubernetes clusters, the container environment variables may need some slight changes. Check the daemon set definition below for more information.

Create a YAML file named fluentd-daemonset.yml.

Make sure to set the following environment variables to the correct values for your environment:

  • S3_BUCKET_NAME - Name of the Object Storage container to save logs to.

  • OS_REGION_NAME - Catalyst Cloud region to use to connect to the Object Storage S3 API.

    • If the container uses a single-region replication policy, set this to the region the container is located in.

      • For example, if the container is located in the nz-hlz-1 region, set this to nz-hlz-1.

    • If the container uses the multi-region replication policy AND the Kubernetes cluster is also hosted on Catalyst Cloud, set this to the same region in which the Kubernetes cluster is located.

      • For example, if the Kubernetes cluster is hosted in the nz-hlz-1 region, set this to nz-hlz-1.

    • If none of the above apply, set this to nz-por-1.

---
# Create the Fluentd daemon set.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: logging
  labels:
    k8s-app: fluentd
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  selector:
    matchLabels:
      name: fluentd
  template:
    metadata:
      labels:
        name: fluentd
    spec:
      serviceAccountName: fluentd
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        effect: NoSchedule
      containers:
      - name: fluentd
        # Check the Docker Hub page for updated versions of the image:
        # https://hub.docker.com/r/fluent/fluentd-kubernetes-daemonset
        image: fluent/fluentd-kubernetes-daemonset:v1.16-debian-s3-1
        env:
        - name: S3_BUCKET_NAME
          value: "fluentd"
        - name: OS_REGION_NAME
          value: "nz-por-1"
        # Required on Catalyst Cloud Kubernetes Service.
        # For other Kubernetes clusters, this may need to be set to `json`
        # if the cluster uses Docker with the `json-file` log driver.
        - name: FLUENT_CONTAINER_TAIL_PARSER_TYPE
          value: '/^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/'
        # Optional values:
        #  * S3_PATH - Add prefix to the log files in the target container/bucket.
        #  * S3_OBJECT_KEY_FORMAT - Format string for the log file path.
        #  * S3_TIMEKEY - Interval for log files, in seconds. Default is 3600 seconds (1 hour).
        #  * S3_CHUNK_LIMIT_SIZE - Maximum size limit for chunks. Default is '256m' (256MB).
        - name: S3_ENDPOINT_URL
          value: "https://object-storage.$(OS_REGION_NAME).catalystcloud.io"
        - name: S3_BUCKET_REGION
          value: "us-east-1"
        - name: S3_FORCE_PATH_STYLE
          value: "true"
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: fluentd
              key: aws_access_key_id
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: fluentd
              key: aws_secret_access_key
        - name: FLUENT_UID
          value: "0"
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: var-log
          mountPath: /var/log
        - name: var-lib-docker-containers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: fluent-conf
          mountPath: /fluentd/etc/fluent.conf
          subPath: fluent.conf
          readOnly: true
      terminationGracePeriodSeconds: 30
      volumes:
      - name: var-log
        hostPath:
          path: /var/log
      - name: var-lib-docker-containers
        hostPath:
          path: /var/lib/docker/containers
      - name: fluent-conf
        configMap:
          name: fluentd

Run kubectl apply -f fluentd-daemonset.yml to create the daemon set.

$ kubectl apply -f fluentd-daemonset.yml
daemonset.apps/fluentd created

Testing Fluentd

Once the daemon set has been created, a Fluentd pod will be started on each Kubernetes control plane and worker node.
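
If you prefer to wait for the rollout to complete before checking individual pods, you can use kubectl rollout status:

# Blocks until all Fluentd pods in the daemon set are ready.
kubectl rollout status daemonset/fluentd -n logging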

Run kubectl get pod -n logging to check the status of all pods in the logging namespace.

$ kubectl get pod -n logging
NAME            READY   STATUS    RESTARTS   AGE
fluentd-5mkjp   1/1     Running   0          4m35s
fluentd-mggwm   1/1     Running   0          4m35s
fluentd-vwvf9   1/1     Running   0          4m35s
fluentd-zgskc   1/1     Running   0          4m35s

Once they reach the Running state, we can query the pod logs to make sure Fluentd is running correctly.

$ kubectl logs -n logging pod/fluentd-mggwm | grep "fluentd worker is now running"
2024-04-11 04:10:26 +0000 [info]: #0 fluentd worker is now running worker=0

At this point Fluentd should start logging to Object Storage, with compressed log files being uploaded at the end of each hour.
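
If you would like uploads to happen more frequently while testing, the optional S3_TIMEKEY environment variable (read by the buffer timekey setting in the config map above) can be added to the daemon set's env list, for example with a 5 minute interval:

        - name: S3_TIMEKEY
          value: "300"

Re-run kubectl apply -f fluentd-daemonset.yml after making the change.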

Once the hour has finished, check the Object Storage container to see if log files were uploaded.

Run openstack object list fluentd to list all files in the fluentd container.

$ openstack object list fluentd
+-----------------------------+
| Name                        |
+-----------------------------+
| 2024/04/11/cluster-log-0.gz |
| 2024/04/11/cluster-log-1.gz |
| 2024/04/11/cluster-log-2.gz |
+-----------------------------+

If the log files are successfully being saved, congratulations! Fluentd is now working on your Kubernetes cluster to upload logs to Catalyst Cloud Object Storage.
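
If you would like to inspect the contents of an uploaded log file, it can be downloaded with openstack object save and decompressed. The object name below is taken from the example listing above, so substitute one of your own:

# Download one of the uploaded log files and view its contents.
openstack object save fluentd 2024/04/11/cluster-log-0.gz --file cluster-log-0.gz
zcat cluster-log-0.gz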

Cleanup

And that’s the end of this tutorial!

If you’d like to clean up the work done in this tutorial, keep reading.

Run the following command to delete all created Kubernetes resources.

kubectl delete -f fluentd-daemonset.yml -f fluentd-secrets.yml -f fluentd-configmap.yml -f fluentd-rbac.yml

Once the command has finished running, all of the resources will have been deleted.

$ kubectl delete -f fluentd-daemonset.yml -f fluentd-secrets.yml -f fluentd-configmap.yml -f fluentd-rbac.yml
daemonset.apps "fluentd" deleted
secret "fluentd" deleted
configmap "fluentd" deleted
namespace "logging" deleted
serviceaccount "fluentd" deleted
clusterrole.rbac.authorization.k8s.io "fluentd" deleted
clusterrolebinding.rbac.authorization.k8s.io "fluentd" deleted

You will also need to delete the resources you made on Catalyst Cloud.

Delete the EC2 credentials created for Fluentd by running openstack ec2 credentials delete and passing it the Access Key ID.

openstack ec2 credentials delete ee55dd44cc33bb2211aaee55dd44cc33

Run the following command to delete the fluentd container, and all objects stored within it.

openstack container delete fluentd --recursive

Once this has been done, the fluentd container should no longer be returned by openstack container list.