radu's blog

Filter secrets from Kubernetes logs

· Radu Matei

Running any non-trivial application on Kubernetes will most likely require authorized access to other components - databases, storage buckets, APIs - all of which require a connection string or some sort of access key. Storing these values in Kubernetes is done through Secrets, and while there are plenty of ways to make sure the secrets are safe while at rest, as well as how to configure an external KMS provider, once the secret is injected into your application container, its value will be plain text.

In this article we will explore how to filter Kubernetes secrets out of application logs - the entire project for this article can be found on GitHub, together with a sample application.

How to filter the logs of a Kubernetes pod

One approach could be to run a privileged DaemonSet that mounts the logs directory and runs a filtering process on the logs. This has the immediate advantage of not requiring any changes to your applications - however, filtering the logs of every pod deployed on the cluster is a highly resource-intensive task, and mounting part of the node filesystem inside a pod (privileged or not) can prove disruptive. That is why, while this approach is mentioned for completeness, it is not recommended.

So, is there a way to filter the logs of a single application, in a non-privileged way? We can easily achieve this by making use of the sidecar pattern.

The sidecar pattern is a single node pattern made up of two containers. The first is the application container. It contains the core logic for the application. Without this container, the application would not exist. In addition to the application container, there is a sidecar container. The role of the sidecar is to augment and improve the application container, often without the application container’s knowledge. In its simplest form, a sidecar container can be used to add functionality to a container that might otherwise be difficult to improve.

Excerpt from the second chapter of the book Designing Distributed Systems, by Brendan Burns.

In a nutshell, this is our approach:

  • forward the stdout of the main application container to a file
  • run a sidecar container and share the file where the main application container writes the logs
  • run a filtering process in the sidecar that continuously reads the main application log file and writes the filtered logs to the sidecar stdout
  • collect the sidecar stdout as the application logs
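
Before wiring this into a pod, the handoff between the two containers can be sketched with plain shell (the paths and the secret value here are illustrative, not from the repository):

```shell
mkdir -p /tmp/shared

# "main application container": write logs to the shared file
echo "login: user=super-secret-admin-username" >> /tmp/shared/app.log

# "sidecar": read the shared file, redact the known secret value, and emit
# the result on its own stdout (a real sidecar would tail -F the file)
sed 's/super-secret-admin-username/[ redacted ]/g' /tmp/shared/app.log
```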

Note that if you have an alpha cluster with shared PID namespaces - that is, a cluster created with --feature-gates=PodShareProcessNamespace=true - then you can access the main process’ stdout directly, without sharing a volume between the main container and the sidecar.

Since this is still an alpha feature (and thus not recommended on production clusters), we will continue to explore how to achieve this through volume mounts - but this guide still applies with shared PID namespaces, with the only difference that the log file will be accessed in the sidecar as /proc/<main-application-pid>/fd/1, the file descriptor of stdout.

The main advantage of this approach is that we don’t need to mount parts of the node filesystem in a pod, and we can enable the filter for individual applications, filtering the secrets of a specific namespace. The only requirement is that we must be able to redirect the logs of the main application to a specific file. If that is not possible, explore running the sidecar with a shared PID namespace and accessing stdout directly.

The filtering process

The filtering application is a simple Go application that tails the file where the main application writes its logs, iterates through all Kubernetes secrets in the provided namespace, and checks whether any secret value is present in the log line. If one is found, it redacts it, then writes the filtered line to stdout. This container’s logs become the application logs:

func filter(line string, secrets []v1.Secret) string {
	for _, s := range secrets {
		for _, v := range s.Data {
			// if the log line contains a secret value redact it
			if strings.Contains(line, string(v)) {
				line = strings.Replace(line, string(v), "[ redacted ]", -1)
			}
		}
	}
	return line
}

The filtering algorithm is fairly simple - it only does a strings.Contains on the log line for every Kubernetes secret value in the namespace – this means that the more secrets in the namespace, the more CPU cycles it takes to filter each line – so be mindful of this when running in a namespace with lots of secrets.

Note that the filtering function above is a simple example of how to process your logs – for any production use, you should fork the repository and replace the function with your own. An immediate alternative is to use a compiled regular expression instead of the loop with strings.Contains – but you are free to use virtually any filtering algorithm.
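
As a sketch of that regex alternative (none of this code is in the repository), the secret values could be compiled into a single alternation once per cache sync, so each log line needs one regex pass instead of one strings.Contains per secret:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// buildRedactor compiles one alternation of all secret values; QuoteMeta
// escapes any regex metacharacters a secret value may contain.
func buildRedactor(values []string) *regexp.Regexp {
	quoted := make([]string, 0, len(values))
	for _, v := range values {
		quoted = append(quoted, regexp.QuoteMeta(v))
	}
	return regexp.MustCompile(strings.Join(quoted, "|"))
}

func main() {
	re := buildRedactor([]string{"super-secret-admin-password", "s3cr3t"})
	fmt.Println(re.ReplaceAllString("password: super-secret-admin-password", "[ redacted ]"))
	// → password: [ redacted ]
}
```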

A Kubernetes cache is used to avoid fetching all secrets from the API server on every filtering request; the resync period for the cache is set to 30 seconds. If a new secret is added and the application prints it before the cache has resynced, that log line will contain the secret value. If your use case demands it, reduce the resync period - but keep in mind the impact this will have on networking and on the API server.

Because of the simplicity of the filtering loop, any transformation applied to a secret value (such as base64 encoding) before printing it in the logs will not be caught by the current implementation – however, the filter can easily be modified to accommodate such transformations.
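
To illustrate, a variant of the filter that also catches base64-encoded values might look like this (filterWithEncodings is a hypothetical extension, not part of the repository):

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// filterWithEncodings redacts both the raw secret value and its
// base64-encoded form, catching logs that re-encode the value.
func filterWithEncodings(line string, secrets [][]byte) string {
	for _, v := range secrets {
		for _, form := range []string{
			string(v),
			base64.StdEncoding.EncodeToString(v),
		} {
			line = strings.Replace(line, form, "[ redacted ]", -1)
		}
	}
	return line
}

func main() {
	secret := []byte("super-secret-admin-password")
	line := "raw=super-secret-admin-password b64=" +
		base64.StdEncoding.EncodeToString(secret)
	fmt.Println(filterWithEncodings(line, [][]byte{secret}))
	// → raw=[ redacted ] b64=[ redacted ]
}
```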

The filtering loop currently runs for each new line. For a real-world scenario, filtering the logs in chunks of lines is a reasonable approach – this can be changed in the main function of the filter.
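
A chunked variant could look roughly like this (the chunk size and function names are illustrative):

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// filterChunk joins a batch of lines and applies each redaction once over
// the whole chunk instead of once per line.
func filterChunk(lines []string, secrets []string) string {
	chunk := strings.Join(lines, "\n")
	for _, v := range secrets {
		chunk = strings.Replace(chunk, v, "[ redacted ]", -1)
	}
	return chunk
}

func main() {
	secrets := []string{"super-secret-admin-password"}
	input := "user: radu\npassword: super-secret-admin-password\nstatus: ok\n"
	scanner := bufio.NewScanner(strings.NewReader(input))

	const chunkSize = 2 // illustrative; tune for your log volume
	var batch []string
	for scanner.Scan() {
		batch = append(batch, scanner.Text())
		if len(batch) == chunkSize {
			fmt.Println(filterChunk(batch, secrets))
			batch = batch[:0]
		}
	}
	if len(batch) > 0 { // flush the final, partial chunk
		fmt.Println(filterChunk(batch, secrets))
	}
}
```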

Important note: This method only prevents the accidental printing of secret values in an application’s output. It is not designed to prevent a malicious attempt to gain access to Kubernetes secrets – and it should be used accordingly.

The sample application

Now that we have seen how the filtering process works, let’s see an example of it in action.

The sample, which is included in the GitHub repository, is a simple NodeJS application that logs the query parameters of every request. The following snippet redirects the application stdout to a log file whose name is provided as an environment variable:

var fs = require('fs');

var logsFile = process.env.LOGS_FILE;

// redirect stdout and stderr to the shared log file
var access = fs.createWriteStream(logsFile);
process.stdout.write = process.stderr.write = access.write.bind(access);

This ensures that all console.log calls from within the application are redirected to our desired log file, which is shared with the filtering container through a volume mount.

Of course, the main application can be written in any language, and redirecting the logs to a file should be a fairly straightforward task in any modern framework.

Now we simply build a container image out of our application and push it to a container registry. Then, we need to create a Kubernetes manifest with the main application and the sidecar:

apiVersion: v1
kind: Pod
metadata:
  name: filter-logs
spec:
  # if running in a cluster without RBAC, remove the service account
  serviceAccountName: filter-sa
  containers:
  - name: main-app
    image: radumatei/node-redirect-stdout:alpine
    env:
    - name: LOGS_FILE
      value: "/var/log/app"
    volumeMounts:
    - name: varlog
      mountPath: /var/log

  - name: filter
    image: radumatei/filter-kubernetes-logs:latest
    env:
    - name: LOGS_FILE
      value: "/var/log/app"
    - name: NAMESPACE
      value: "<your-namespace>"
    volumeMounts:
    - name: varlog
      mountPath: /var/log

  volumes:
  - name: varlog
    emptyDir: {}

This is a simple Kubernetes manifest for a pod with two containers: the main application container and the sidecar filter. We pass the log file path as an environment variable to both containers, and mount a shared volume containing the log file in both.

Please note that in the repo there is also a Kubernetes manifest that must be used on RBAC-enabled clusters – that is not shown here for brevity.

For simplicity, let’s create a new namespace:

$ kubectl create namespace filter-logs

Now, in this namespace, create a new secret following the instructions in the Kubernetes documentation. The values I used are super-secret-admin-username for the username and super-secret-admin-password for the password.
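
If you follow those instructions with kubectl, the command will look something like this (the secret name admin-credentials is illustrative):

```shell
# create a generic secret holding the two values used in this walkthrough
kubectl create secret generic admin-credentials \
    --namespace filter-logs \
    --from-literal=username=super-secret-admin-username \
    --from-literal=password=super-secret-admin-password
```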

Now deploy the pod (and, if needed, the filter-role.yaml file containing the RBAC objects). We expect that if the web application ever tries to write super-secret-admin-username or super-secret-admin-password (or the value of any secret in the namespace) to the logs, the filtering sidecar will redact it.

Let’s see if that actually happens:

Clearly, the sidecar container redacts all secret values from the application logs, so we avoid accidentally outputting secret values in the logs of our applications. Executing kubectl logs also works as expected:

$ kubectl logs filter-logs -c filter
{ user: 'radu',
  password: 'abcdef' }

{ user: '[ redacted ]',
  password: '[ redacted ]' }

{ user: '[ redacted ]',
  password: '[ redacted ]abc' }

{ user: '[ redacted ]2',
  password: '[ redacted ]abc' }

Conclusion

In this article we saw how to filter Kubernetes secrets from the logs of our applications by running a sidecar container that continuously redacts the secret values from the logs. As mentioned, you are free to write your own filtering algorithm based on the needs of your application, as well as implement filtering in chunks of multiple log lines.

Thanks for reading, let me know your thoughts in the comments, and have fun filtering your application logs!