Architecture

kty is a standalone SSH server that provides much of the standard SSH functionality as if a Kubernetes cluster were a single host. This allows ssh me@my-cluster to provide a shell in a running pod, tunnel between resources without a VPN, and copy files from the cluster. By leveraging OpenID and RBAC, you can provide access to your clusters from anywhere using ssh, a tool included in almost every environment.

Like normal SSH servers, it delegates authentication and authorization to other systems. In particular, identity is verified via OpenID providers or SSH public keys that have been added as custom resources to the cluster. It is possible to bring your own provider, but a default one is provided that allows GitHub and Google authentication out of the box.
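As a rough illustration of the public-key-as-resource idea, a CRD like this could be declared with kube's CustomResource derive. The group, kind, and fields below are hypothetical stand-ins, not kty's actual schema:

```rust
use std::collections::BTreeMap;

use kube::CustomResource;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};

/// Hypothetical shape of a stored SSH public key. The real resource
/// (see the Key resource under Design Decisions) differs; this only
/// sketches the mapping from a key to an identity with an expiry.
#[derive(CustomResource, Clone, Debug, Deserialize, Serialize, JsonSchema)]
#[kube(group = "example.dev", version = "v1alpha1", kind = "PublicKey")]
pub struct PublicKeySpec {
    /// OpenSSH-encoded public key presented during authentication.
    pub key: String,
    /// Claims copied from the OpenID token (email by default).
    pub claims: BTreeMap<String, String>,
    /// RFC 3339 timestamp after which the key is no longer accepted.
    pub expiration: String,
}
```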

Once authenticated, each user's session has a Kubernetes client attached to it that impersonates the user - in fact, the server has almost no permissions of its own. Based on the permissions granted to the user via RBAC, the server acts on the user's behalf according to what the SSH session asks it to do. This is effectively what kubectl does with authentication plugins, just without the plugin or the kubectl binary involved.
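A minimal sketch of how such an impersonating client can be built with the kube crate (the kube and anyhow crates are assumed; group impersonation is elided):

```rust
use kube::{Client, Config};

/// Build a Kubernetes client whose requests are evaluated against
/// `user`'s RBAC rules rather than the server's own credentials.
async fn impersonated_client(user: &str) -> anyhow::Result<Client> {
    // Start from the in-cluster or kubeconfig configuration.
    let mut config = Config::infer().await?;

    // Ask the API server to impersonate `user` on every request.
    // Groups could be set similarly via `impersonate_groups`.
    config.auth_info.impersonate = Some(user.to_string());

    Ok(Client::try_from(config)?)
}
```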

How it works

Shell Access

After a session has been established, the SSH client can issue a PTY request. The flow looks like this:

  1. The client requests a new channel.
  2. The PTY request is sent to the server.
  3. A k8s client is created that impersonates the user.
  4. An IO loop and UI loop are started as background tasks.
  5. The user receives a TUI dashboard that renders the k8s client’s output.

Note: log streaming works similarly here. A stream is set up, and its output is appended to a buffer on each loop of the renderer.
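A simplified sketch of that log stream, assuming a recent version of the kube crate where pod/logs is exposed as Api::log_stream; the shared render buffer is reduced to a plain Vec:

```rust
use futures::{AsyncBufReadExt, TryStreamExt};
use k8s_openapi::api::core::v1::Pod;
use kube::{api::LogParams, Api, Client};

/// Follow a pod's logs, pushing each line onto a buffer that the
/// renderer drains on every draw loop (simplified to a Vec here).
async fn stream_logs(client: Client, ns: &str, pod: &str) -> anyhow::Result<()> {
    let pods: Api<Pod> = Api::namespaced(client, ns);

    let mut lines = pods
        .log_stream(pod, &LogParams { follow: true, ..LogParams::default() })
        .await?
        .lines();

    let mut buffer: Vec<String> = Vec::new();
    while let Some(line) = lines.try_next().await? {
        buffer.push(line);
    }

    Ok(())
}
```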

Once the dashboard has been started up, a user can go to the Shell tab. This:

  1. Issues a pod/exec request in the client.
  2. Pauses drawing the UI, leaving it in its current state.
  3. Switches the UI to raw mode. Instead of drawing the UI, it streams directly from the pod’s exec input/output.
  4. On destruction of the stream, it resumes drawing the UI.
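Step 1 reduces to a pod/exec call with a TTY attached. A sketch with the kube crate (the container name and shell path are illustrative):

```rust
use k8s_openapi::api::core::v1::Pod;
use kube::api::{Api, AttachParams, AttachedProcess};
use kube::Client;

/// Exec a shell in the container with a TTY. The returned process
/// exposes stdin/stdout streams that the SSH channel is copied into
/// while the dashboard's own drawing is paused.
async fn raw_shell(
    client: Client,
    ns: &str,
    pod: &str,
    container: &str,
) -> anyhow::Result<AttachedProcess> {
    let pods: Api<Pod> = Api::namespaced(client, ns);

    let attach = AttachParams::default()
        .container(container)
        .stdin(true)
        .stdout(true)
        .stderr(false) // stderr is merged into stdout once a tty is allocated
        .tty(true);

    Ok(pods.exec(pod, ["/bin/bash"], &attach).await?)
}
```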

Client functionality used:

  • A Reflector to list pods and get all the state there.
  • pod/logs to stream logs from the pod.
  • pod/exec to exec /bin/bash in a container and return a shell. This also uses the tty functionality to allow for direct access instead of a simple stream.
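The reflector half might look roughly like this with kube-runtime; the store is what the dashboard reads on every frame:

```rust
use futures::StreamExt;
use k8s_openapi::api::core::v1::Pod;
use kube::runtime::{reflector, watcher, WatchStreamExt};
use kube::{Api, Client};

/// Keep an in-memory store of every pod visible to the impersonated
/// client; the UI loop reads snapshots from the store instead of
/// hitting the API server on each frame.
async fn watch_pods(client: Client) -> anyhow::Result<()> {
    let pods: Api<Pod> = Api::all(client);
    let (store, writer) = reflector::store::<Pod>();

    let stream = reflector(writer, watcher(pods, watcher::Config::default()))
        .applied_objects();
    futures::pin_mut!(stream);

    // Driving the stream keeps the store up to date; `store` can be
    // cloned into the UI loop and queried with `store.state()`.
    while let Some(_event) = stream.next().await {
        let _snapshot = store.state();
    }

    Ok(())
}
```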

Ingress Tunneling

Once a session has been established:

  1. The client starts listening to the configured port on localhost.
  2. When a new connection to that port is received, it opens a direct-tcpip channel to the server.
  3. The incoming hostname is mapped from a k8s resource string (<resource>/<namespace>/<name>) to something that can be connected to. In the case of nodes and pods, this is the IP address associated with the resource itself. For services, it is the service’s cluster DNS name.
  4. A connection from the server to the resource is established.
  5. A background task is started that copies data between the client’s channel and the stream between the server and resource.
  6. When EOF is sent on either the source stream or the destination stream, the background task is canceled and the channel is closed.
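A sketch of steps 3-5, assuming the kube crate and tokio; the exact resource-string spelling and error handling are simplified:

```rust
use k8s_openapi::api::core::v1::{Node, Pod};
use kube::{Api, Client};
use tokio::io::{AsyncRead, AsyncWrite};

/// Step 3: map a `<resource>/<namespace>/<name>` string to an address
/// the server can dial. Services resolve via cluster DNS; pods and
/// nodes resolve to the IP reported in their status.
async fn resolve_target(client: Client, host: &str) -> anyhow::Result<String> {
    let parts: Vec<&str> = host.split('/').collect();
    match parts.as_slice() {
        ["services", ns, name] => Ok(format!("{name}.{ns}.svc.cluster.local")),
        ["pods", ns, name] => {
            let pod = Api::<Pod>::namespaced(client, ns).get(name).await?;
            pod.status
                .and_then(|s| s.pod_ip)
                .ok_or_else(|| anyhow::anyhow!("pod has no IP yet"))
        }
        // Nodes are cluster-scoped; the namespace segment is ignored.
        ["nodes", _, name] | ["nodes", name] => {
            let node = Api::<Node>::all(client).get(name).await?;
            node.status
                .and_then(|s| s.addresses)
                .and_then(|addrs| addrs.into_iter().find(|a| a.type_ == "InternalIP"))
                .map(|a| a.address)
                .ok_or_else(|| anyhow::anyhow!("node has no InternalIP"))
        }
        _ => anyhow::bail!("unknown resource string: {host}"),
    }
}

/// Steps 4-5: shuttle bytes between the SSH channel and the upstream
/// connection until either side reaches EOF.
async fn pump<A, B>(channel: &mut A, upstream: &mut B) -> std::io::Result<()>
where
    A: AsyncRead + AsyncWrite + Unpin,
    B: AsyncRead + AsyncWrite + Unpin,
{
    tokio::io::copy_bidirectional(channel, upstream).await.map(|_| ())
}
```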

Notes:

  • The client creates a new channel on each incoming TCP connection.
  • Usually, a client sends a PTY request and then asks for new tcpip channels. Because of this behavior, we show tunnel status in the dashboard and report errors on a per-session basis.
  • The server assumes that it has access to the requested resources. If not, the connection will fail and the user will receive an error.
  • This does not use pod/port-forward like kubectl does. It proxies directly from the server to the resource.
  • This does not rely on proxy support in the API server. That is restricted to HTTP/HTTPS and doesn’t allow raw TCP.

Client functionality used:

  • get for the resource in question (node, pod).

Egress Tunneling

Once a session has been established:

  1. The client issues a tcpip-forward request.
  2. The client optionally sends a pty request.
  3. A background task is started that listens on a random port on the server.
  4. The service in the connection string (-R default/foo:8080:localhost:8080) is patched (or created) so that it has no selector.
  5. An EndpointSlice is created with:
    • An address pointing at the server’s IP address. This comes from local-ip-address when running within a cluster and from the serve config otherwise.
    • A TargetRef that is the server. On cluster, this is built from the kubeconfig’s default namespace and the hostname of the pod.
  6. Incoming connections open a forwarded-tcpip channel to the client.
  7. A background task is started that copies data between the source (something on the cluster) and the destination (the localhost).
  8. When EOF is sent on either the source stream or the destination stream, the background task is canceled and the SSH channel is closed. This does not terminate the session - that is still running and could be handling other things.
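Step 5 could look roughly like this with the kube crate and server-side apply; the slice name, TargetRef values, and field manager below are illustrative rather than kty's actual naming:

```rust
use std::collections::BTreeMap;

use k8s_openapi::api::core::v1::ObjectReference;
use k8s_openapi::api::discovery::v1::{Endpoint, EndpointPort, EndpointSlice};
use kube::api::{Api, Patch, PatchParams};
use kube::core::ObjectMeta;
use kube::Client;

/// Point `svc` in `ns` at the kty server by applying an EndpointSlice
/// whose only endpoint is the server's own IP address.
async fn register_endpoint(
    client: Client,
    ns: &str,
    svc: &str,
    server_ip: &str,
    port: i32,
) -> anyhow::Result<()> {
    let name = format!("{svc}-kty");

    let slice = EndpointSlice {
        metadata: ObjectMeta {
            name: Some(name.clone()),
            // This label is what ties the slice back to the service.
            labels: Some(BTreeMap::from([(
                "kubernetes.io/service-name".to_string(),
                svc.to_string(),
            )])),
            ..ObjectMeta::default()
        },
        address_type: "IPv4".to_string(),
        endpoints: vec![Endpoint {
            addresses: vec![server_ip.to_string()],
            // TargetRef pointing at the server's own pod (illustrative).
            target_ref: Some(ObjectReference {
                kind: Some("Pod".into()),
                namespace: Some("default".into()),
                name: Some("kty-server".into()),
                ..ObjectReference::default()
            }),
            ..Endpoint::default()
        }],
        ports: Some(vec![EndpointPort {
            port: Some(port),
            ..EndpointPort::default()
        }]),
    };

    let slices: Api<EndpointSlice> = Api::namespaced(client, ns);
    slices
        .patch(&name, &PatchParams::apply("kty").force(), &Patch::Apply(&slice))
        .await?;

    Ok(())
}
```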

Notes:

  • A new channel is created to the client for every incoming connection from the cluster. This works because SSH sessions are assumed to be multiplexed and bidirectional.
  • The service is patched; the assumption is that the user issuing the request is allowed to override the service if desired. It is entirely possible, however, that an important service gets overwritten.
  • The service and endpoint created are not garbage collected. OwnerReferences are not cross-namespace, so it becomes difficult to know what is unused and what isn’t.

Client functionality used:

  • patch for services and endpoints.

SCP / SFTP

Once a session has been established:

  1. The client requests a new channel.
  2. A subsystem request for sftp is issued.
  3. The channel is handed off to a background task which handles the sftp protocol.

At this point, what happens depends on the incoming request. For a simple scp me@my-cluster:/default/foo/bar/etc/hosts /tmp/hosts:

  • An fstat request is sent to verify that the file exists.
  • The default namespace is fetched.
  • The foo pod is fetched.
  • The bar container is verified to exist.
  • A pod/exec is started in the default/foo pod for the bar container that does ls -l /etc/hosts.
  • The result of this is parsed into something that can be sent back to the client.
  • The client issues an open request.
  • The client issues a read request.
  • Another pod/exec is started, this time with cat /etc/hosts. The content of this is streamed back to the client.
  • The client finally issues a close request and an EOF.
  • The server closes the connection.
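The read path reduces to a second pod/exec that cats the file and streams its stdout back over the channel. A sketch with the kube crate:

```rust
use k8s_openapi::api::core::v1::Pod;
use kube::api::{Api, AttachParams};
use kube::Client;
use tokio::io::AsyncReadExt;

/// Read a file from a container the way the SFTP handler does:
/// run `cat <path>` via pod/exec and collect stdout.
async fn read_file(
    client: Client,
    ns: &str,
    pod: &str,
    container: &str,
    path: &str,
) -> anyhow::Result<Vec<u8>> {
    let pods: Api<Pod> = Api::namespaced(client, ns);

    let mut attached = pods
        .exec(
            pod,
            ["cat", path],
            &AttachParams::default()
                .container(container)
                .stdout(true)
                .stderr(false),
        )
        .await?;

    let mut stdout = attached
        .stdout()
        .ok_or_else(|| anyhow::anyhow!("no stdout stream"))?;

    let mut contents = Vec::new();
    stdout.read_to_end(&mut contents).await?;
    attached.join().await?;

    Ok(contents)
}
```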

In addition to stat and read requests, SFTP allows for browsing entire file trees. This is handled at the namespace/pod level via list requests for those resources (e.g. list namespaces for the root). Inside the container, pod/exec and ls are used again on a per-directory basis.
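A simplified sketch of how the top levels of that tree come together from list requests (kube crate; namespaces become directories, pods become their children):

```rust
use k8s_openapi::api::core::v1::{Namespace, Pod};
use kube::api::ListParams;
use kube::{Api, Client, ResourceExt};

/// Enumerate the first two levels of the SFTP tree: one directory per
/// namespace, one child directory per pod.
async fn list_tree(client: Client) -> anyhow::Result<Vec<String>> {
    let namespaces: Api<Namespace> = Api::all(client.clone());
    let mut entries = Vec::new();

    for ns in namespaces.list(&ListParams::default()).await? {
        let pods: Api<Pod> = Api::namespaced(client.clone(), &ns.name_any());
        for pod in pods.list(&ListParams::default()).await? {
            entries.push(format!("/{}/{}", ns.name_any(), pod.name_any()));
        }
    }

    Ok(entries)
}
```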

Notes:

  • This seems a little ridiculous, and it is. This is almost how kubectl cp works! Instead of ls and cat, it uses tar and hopes that it works.

Client functionality used:

  • list for namespaces and pods.
  • get for namespaces and pods.
  • pod/exec for files inside the container.

Design Decisions

  • Instead of having a separate User CRD to track the user, we rely on k8s’ users/groups, which are subjects on (Cluster)RoleBindings. Identity is extracted from the OpenID tokens via claims (email by default) and then mapped to those k8s concepts. The Key resource maps the values from a previous token to the SSH key used during the original authentication attempt. This key expires when the token itself would have, and it can be created manually with any desired expiration.

  • The minimum viable access is list for pods across all namespaces. Understanding what subset of a cluster a user can see is a PITA, because k8s cannot filter list requests down to the subset a user is authorized for. Combined with the fact that SelfSubjectRulesReview only works on a per-namespace basis, it becomes extremely expensive to work out what an individual user can see across the entire cluster. This will be improved in the future, but it requires namespaces to be available via UI elements.
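One cheap way to probe for that minimum access with the kube crate is a pod list across all namespaces capped at a single item; this is a sketch of the idea, not kty's actual check:

```rust
use k8s_openapi::api::core::v1::Pod;
use kube::api::ListParams;
use kube::{Api, Client};

/// Probe the minimum viable permission: listing pods in all
/// namespaces. A limit of one keeps the response tiny.
async fn can_list_pods(client: Client) -> bool {
    Api::<Pod>::all(client)
        .list(&ListParams::default().limit(1))
        .await
        .is_ok()
}
```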