Networking
Kubernetes networking is a fundamental and often advanced topic, critical for understanding how applications communicate within a cluster and with the outside world. It addresses several key challenges, including communication between containers within a Pod, communication between different Pods, exposing applications to external users, and enabling services to communicate internally within the cluster. All the practical examples discussed here can be performed using Minikube on an AWS EC2 instance.
Container-to-Container Communication Within a Pod
Within a single Kubernetes Pod, multiple containers can exist. While a container itself does not have its own IP address, the Pod does. If there are two or more containers inside the same Pod, they can communicate with each other using `localhost`. This is akin to members of the same household not needing a telephone to speak to each other; they are within the same "house" (the Pod). For example, if you have two containers, `c0` and `c01`, within a single Pod, and `c01` is running an Apache server, `c0` can access `c01` by making a request to `localhost` on the Apache server's port (e.g., `localhost:80`).
To illustrate this, consider a Pod definition that creates two containers: one running Ubuntu (`c0`) and another running Apache HTTPD (`c01`).
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
  - name: c0
    image: ubuntu
    command: ["/bin/bash", "-c", "while true; do sleep 5; done"]
  - name: c01
    image: httpd          # Apache HTTP Server
    ports:
    - containerPort: 80   # Exposing port 80 for Apache
```
After applying this manifest with `kubectl apply -f <filename>.yaml`, you can check the Pod status using `kubectl get pods`. Once the Pod is running, you can execute a command inside the `c0` container to test communication with `c01`. You would typically install `curl` in `c0` first, then run `curl localhost:80`. A successful response like "It works" (the default Apache message) indicates that `c0` can communicate with `c01`.
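The verification steps described above, collected as commands (a sketch, assuming the Pod is named `test-pod` as in the manifest and the standard Ubuntu image, which does not ship with curl):

```bash
# Open an interactive shell in the c0 container of test-pod
kubectl exec -it test-pod -c c0 -- /bin/bash

# Inside c0: install curl, then call the Apache container
# over the Pod's shared network namespace
apt-get update && apt-get install -y curl
curl localhost:80        # expected response: "It works!"
```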
Pod-to-Pod Communication Within the Same Node
When different Pods need to communicate, even if they are on the same Node, they cannot use `localhost` because they are in different "houses" (different Pods). Instead, each Pod is assigned its own unique IP address. These Pod IPs enable communication between different Pods on the same Node. For instance, if you have `Pod1` running Nginx and `Pod2` running Apache on the same Node, `Pod1` can communicate with `Pod2` by using `Pod2`'s IP address and the port it exposes.
The YAML for two separate Pods would look something like this.

For an Nginx Pod (e.g., `pod-nginx.yaml`):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
spec:
  containers:
  - name: nginx-container
    image: nginx
    ports:
    - containerPort: 80
```
For an Apache Pod (e.g., `pod-apache.yaml`):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: apache-pod
spec:
  containers:
  - name: apache-container
    image: httpd
    ports:
    - containerPort: 80
```
After applying both manifests, you can find each Pod's IP address with `kubectl get pods -o wide`. Then, from within one Pod (e.g., `nginx-pod`), you can execute `curl <apache-pod-ip>:80` to confirm communication.
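The same check as commands (a sketch; install curl inside the container first if the image does not provide it):

```bash
# List both Pods together with their cluster-internal IP addresses
kubectl get pods -o wide

# From inside nginx-pod, call apache-pod by its Pod IP
kubectl exec -it nginx-pod -- curl http://<apache-pod-ip>:80
# expected response: "It works!" (the default Apache page)
```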
It is important to remember that by default, these Pod IPs are not accessible from outside the Node. They are meant for internal cluster communication.
The Challenge of Ephemeral Pod IPs and the Role of Services
A significant challenge in Kubernetes is the ephemeral nature of Pod IPs. Pods are designed to be short-lived and can be terminated and recreated for various reasons, such as scaling operations, updates, or failures. When a Pod is recreated, it gets a new IP address. This poses a major problem for applications or users trying to access these Pods, as they cannot reliably keep track of constantly changing IP addresses. For example, if a frontend application needs to connect to a backend database Pod, and that database Pod's IP keeps changing, the frontend would lose its connection.
To overcome this, Kubernetes introduces the Service object. A Service acts as a logical bridge or a stable abstraction layer in front of a set of Pods. It provides a static Virtual IP (VIP) and a DNS name that remains constant, regardless of whether the underlying Pods are terminated, recreated, or moved to different Nodes. When a client connects to the Service's VIP, the Service then redirects the traffic to the appropriate Pod(s). This redirection is dynamically maintained by a component called Kube-proxy, which continuously monitors the cluster and updates the mapping between the Service's VIP and the current Pod IPs. Kube-proxy queries the Kubernetes API server to learn about new Services and Pods in the cluster.
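As an illustration of this stable abstraction, once a Service exists (such as the `demo-service` created later in this section), any Pod in the same namespace can reach it through cluster DNS by name instead of tracking Pod IPs. A minimal sketch, where `<some-pod>` is a placeholder:

```bash
# Call the Service by its DNS name from any Pod in the same namespace
kubectl exec -it <some-pod> -- curl http://demo-service:80
# Cluster DNS also provides the fully qualified form:
#   demo-service.default.svc.cluster.local
```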
Services use labels and selectors to determine which Pods they should target. You define specific labels on your Pods (e.g., `app: my-app`), and then the Service's selector is configured to match those labels. This ensures the Service only routes traffic to the intended Pods.
There are four main types of Services:
1. ClusterIP: Exposes the Service on a cluster-internal IP. This is the default type and is only accessible from within the cluster.
2. NodePort: Exposes the Service on a static port (the NodePort) on each Node in the cluster. This makes the Service accessible from outside the cluster using `<NodeIP>:<NodePort>`.
3. LoadBalancer: Exposes the Service externally using a cloud provider's load balancer. This type is specific to cloud environments.
4. Headless: Used when you don't need a stable virtual IP but want to reach Pods directly. It doesn't allocate a ClusterIP and instead provides DNS records for each Pod (a minimal sketch follows the hierarchy note below).
The source describes a hierarchy: NodePort builds on ClusterIP, and LoadBalancer builds on NodePort.
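Headless Services are not demonstrated later in this section, so here is a minimal hedged sketch: setting `clusterIP: None` is what makes a Service headless. The name `demo-headless` is an assumption, and the `app: my-apache-app` label is reused from the examples below purely for illustration:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: demo-headless
spec:
  clusterIP: None          # No virtual IP is allocated; this makes the Service headless
  selector:
    app: my-apache-app     # DNS returns the IPs of the individual Pods with this label
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
```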
ClusterIP Service Example
A ClusterIP Service provides a virtual IP that is only visible and accessible from within the cluster. It is commonly used for internal communication between different components of a microservice architecture.
Here is a conceptual YAML for a Deployment and a ClusterIP Service:
First, a Deployment (e.g., `deploy-httpd.yaml`) to manage Apache Pods:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-apache-app     # Label for Pods
  template:
    metadata:
      labels:
        app: my-apache-app   # Labels applied to Pods
    spec:
      containers:
      - name: apache-container
        image: httpd
        ports:
        - containerPort: 80
```
Then, a ClusterIP Service (e.g., `service-clusterip.yaml`) to expose the Deployment:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: demo-service
spec:
  selector:
    app: my-apache-app   # Matches the Pods' label
  ports:
    - protocol: TCP
      port: 80           # Service port
      targetPort: 80     # Container port
  type: ClusterIP        # Explicitly defined as ClusterIP
```
After applying both manifests, `kubectl get svc` will show the Service and its assigned ClusterIP. You can then use `curl <ClusterIP>:80` from another Pod within the cluster to access the Apache application. If you then manually delete the Apache Pod (`kubectl delete pod <pod-name>`), the Deployment will automatically recreate it with a new IP, but the ClusterIP of the Service will remain the same, and you will still be able to access the new Pod using the original ClusterIP. This demonstrates the static nature of the Service's Virtual IP.
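A hedged sketch of this ClusterIP check, assuming the file and object names used above (`deploy-httpd.yaml`, `service-clusterip.yaml`, `demo-service`) and placeholder Pod names:

```bash
# Apply the Deployment and the ClusterIP Service, then note the Service's virtual IP
kubectl apply -f deploy-httpd.yaml -f service-clusterip.yaml
kubectl get svc demo-service

# From any Pod in the cluster, the Apache Pods are reachable through that ClusterIP
kubectl exec -it <some-pod> -- curl http://<ClusterIP>:80

# Delete the backing Pod: the Deployment recreates it with a new Pod IP,
# yet the Service's ClusterIP (and the curl above) keep working unchanged
kubectl delete pod <pod-name>
kubectl get pods -o wide
```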
NodePort Service Example
A NodePort Service enables external access to applications running in Pods. It does this by exposing the Service on a static port (the NodePort, typically in the range 30000-32767) on each Node in the cluster. External users can then access the application using the Node's IP address (or Public DNS) and this specific NodePort. The NodePort will forward the traffic to the Service's ClusterIP, which then routes it to the correct Pod.
Using the same Deployment as before, here is a conceptual YAML for a NodePort Service (e.g., `service-nodeport.yaml`):
```yaml
apiVersion: v1
kind: Service
metadata:
  name: demo-service
spec:
  selector:
    app: my-apache-app   # Matches the Pods' label
  ports:
    - protocol: TCP
      port: 80           # Service port
      targetPort: 80     # Container port
      nodePort: 31341    # Optional: specific NodePort, or Kubernetes assigns one
  type: NodePort         # Explicitly defined as NodePort
```
After applying this manifest, `kubectl get svc` will show the NodePort assigned to your Service (e.g., `80:31341/TCP`). To access the application from outside the cluster (e.g., from your web browser), you would use the Public DNS of your EC2 instance (Node) followed by the NodePort, for instance `http://<EC2-Public-DNS>:31341`. It is crucial to ensure that the assigned NodePort is allowed in your cloud provider's security group (e.g., AWS Security Group) for inbound traffic.
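A sketch of the external check, assuming the fixed NodePort 31341 from the manifest and a Minikube setup that exposes NodePorts on the EC2 instance itself (this depends on the Minikube driver); the security group must already allow the port:

```bash
# Confirm the NodePort that the Service exposes on every Node
kubectl get svc demo-service        # PORT(S) column shows e.g. 80:31341/TCP

# From outside the cluster, reach the application via the Node's public address
curl http://<EC2-Public-DNS>:31341  # expected response: "It works!"
```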
Kubernetes Volumes for Data Persistence
Containers are designed to be stateless and ephemeral; any data stored directly within a container's filesystem is lost if the container crashes or is terminated. This presents a problem for applications that need to store persistent data. Kubernetes solves this using Volumes. A Volume in Kubernetes is essentially a directory that is accessible to the containers within a Pod. Unlike container-specific storage, a Volume's lifecycle is tied to the Pod, not individual containers.
Important Volume behaviors:
* If a container within a Pod crashes or is restarted, the data in the Volume persists and the new or restarted container can access it.
* If the Pod itself is deleted or fails, the Volume associated with it is also deleted, and any data it contained is lost.
Volumes can be shared between multiple containers within the same Pod. Kubernetes offers various types of Volumes, including local storage (like `emptyDir` and `hostPath`), network file systems (like NFS), and cloud-provider-specific storage solutions (like AWS EBS or Azure Disk). The video focuses on `emptyDir` and `hostPath` volumes.
EmptyDir Volume Example
An EmptyDir volume is created when a Pod is assigned to a Node and exists as long as that Pod is running on that Node. It starts empty, hence its name. Its primary purpose is to provide temporary, shared storage for multiple containers within the same Pod. Data in an EmptyDir volume is lost if the Pod is deleted.
Consider a Pod definition with two containers (`c1` and `c2`) that share an EmptyDir volume:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-volume-emptydir
spec:
  containers:
  - name: c1
    image: ubuntu
    command: ["/bin/bash", "-c", "while true; do sleep 5; done"]
    volumeMounts:
    - name: exchange             # Volume name
      mountPath: /tmp/exchange   # Path inside container c1
  - name: c2
    image: ubuntu
    command: ["/bin/bash", "-c", "while true; do sleep 5; done"]
    volumeMounts:
    - name: exchange             # Same volume name
      mountPath: /tmp/data       # Path inside container c2
  volumes:
  - name: exchange               # Defines the EmptyDir volume
    emptyDir: {}
```
After applying this manifest, exec into `c1` to create a file within its mounted path (`/tmp/exchange/`). For example, run `kubectl exec -it my-volume-emptydir -c c1 -- /bin/bash` to enter the container, then `echo "Technical Guftgu Zindabad" > /tmp/exchange/my-file.txt`. Then, exit `c1` and enter `c2` using `kubectl exec -it my-volume-emptydir -c c2 -- /bin/bash`. Navigate to `c2`'s mounted path (`/tmp/data/`) and list the contents. You should see `my-file.txt` created by `c1`. This confirms that both containers are accessing and sharing the same underlying volume.
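The same walkthrough condensed into commands (a sketch, assuming the Pod and container names from the manifest above; the file name and message are only illustrative):

```bash
# Write a file from c1 into its mount of the shared volume
kubectl exec my-volume-emptydir -c c1 -- \
  /bin/bash -c 'echo "Technical Guftgu Zindabad" > /tmp/exchange/my-file.txt'

# Read the same file from c2, which mounts the same volume at a different path
kubectl exec my-volume-emptydir -c c2 -- cat /tmp/data/my-file.txt
```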
HostPath Volume Example
A HostPath volume maps a file or directory from the host Node's filesystem into a Pod. This type of volume is useful when you want the data to persist even if the Pod is deleted, as the data resides on the underlying Node's disk. It also allows Pods to access specific files or directories on the host. However, it ties the Pod's data to a specific Node, which makes it unsuitable for multi-node clusters where Pods might move, or for highly available applications, unless carefully managed [not explicit in the source, which focuses on HostPath's persistence rather than its multi-node limitations].
Here is a conceptual YAML for a Pod using a HostPath volume:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-volume-hostpath
spec:
  containers:
  - name: hostpath-container
    image: ubuntu
    command: ["/bin/bash", "-c", "while true; do sleep 5; done"]
    volumeMounts:
    - name: test-volume
      mountPath: /tmp/hostpath-mount   # Path inside the container
  volumes:
  - name: test-volume
    hostPath:
      path: /tmp/data                  # Path on the host Node's filesystem
      type: DirectoryOrCreate          # Ensures the directory exists or creates it
```
After applying this manifest, exec into `hostpath-container` to create a file in its mounted path (`/tmp/hostpath-mount/`). For instance, run `kubectl exec -it my-volume-hostpath -c hostpath-container -- /bin/bash`, then `echo "I love Technical Guftgu" > /tmp/hostpath-mount/my-file.txt`. After creating the file, exit the container. Now, access the host Node's filesystem directly (e.g., via SSH into the EC2 instance) and navigate to `/tmp/data/`. You should find `my-file.txt` there, demonstrating that the data written by the container was persisted to the host machine. Similarly, if you create a file directly on the host in `/tmp/data/`, the container can also see and access that file.
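The equivalent commands as a sketch, assuming the names from the manifest above; the final checks run on the Node itself after SSH-ing in:

```bash
# Write a file from the container into the HostPath mount
kubectl exec my-volume-hostpath -c hostpath-container -- \
  /bin/bash -c 'echo "I love Technical Guftgu" > /tmp/hostpath-mount/my-file.txt'

# On the Node (e.g., the EC2 instance), the file is visible on the host filesystem
cat /tmp/data/my-file.txt

# Even after deleting the Pod, the file remains on the Node's disk
kubectl delete pod my-volume-hostpath
cat /tmp/data/my-file.txt
```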
Common Kubernetes Commands
Throughout these operations, various `kubectl` commands are essential for managing and inspecting Kubernetes resources:
* `kubectl apply -f <filename.yaml>`: Applies a configuration defined in a YAML file.
* `kubectl get <resource-type>`: Lists resources (e.g., `kubectl get pods`, `kubectl get deploy`, `kubectl get svc`, `kubectl get rs`).
* `kubectl get <resource-type> -o wide`: Provides more detailed output, often including IP addresses.
* `kubectl describe <resource-type> <resource-name>`: Shows extensive details about a specific resource, including its configuration, events, and related objects.
* `kubectl logs -f <pod-name>`: Streams logs from a container in a Pod.
* `kubectl exec -it <pod-name> -c <container-name> -- <command>`: Executes a command inside a specific container, often used to get an interactive shell.
* `kubectl scale --replicas=<count> deploy <deployment-name>`: Scales a Deployment up or down by changing the desired number of Pod replicas [Previous conversation, not explicit in source, but implied by Deployment functionality].
* `kubectl rollout history deploy <deployment-name>`: Shows the revision history of a Deployment [Previous conversation, not explicit in source, but implied by Deployment functionality].
* `kubectl rollout undo deploy <deployment-name>`: Reverts a Deployment to its previous version [Previous conversation, not explicit in source, but implied by Deployment functionality].
* `kubectl rollout undo deploy <deployment-name> --to-revision=<number>`: Reverts a Deployment to a specific historical revision [Previous conversation, not explicit in source, but implied by Deployment functionality].
* `kubectl delete -f <filename.yaml>` or `kubectl delete <resource-type> <resource-name>`: Deletes a resource.
Potential Deployment Failure Scenarios (Interview Focus)
During an interview, understanding why Kubernetes Deployments might fail is crucial. Common reasons include [Previous conversation, not explicit in source, but good for exam/interview]:
* Insufficient Quota: Lack of available CPU or memory resources in the cluster or on the Nodes.
* Unhealthy Nodes: The Kubernetes Nodes are in an unhealthy state or are down, preventing Pod scheduling.
* Image Pull Errors: The specified container image cannot be pulled from the registry due to an incorrect name/tag, authentication issues for private registries, or the image not existing.
* Insufficient Permissions: The user or service account deploying the application lacks the necessary permissions to create or manage Kubernetes resources.
* Limit Ranges: Resource limits defined (e.g., CPU or memory requests/limits) are exceeded, preventing Pods from being scheduled or starting correctly.
* Application Runtime Misconfiguration: Issues within the application code or its configuration prevent it from starting or running correctly inside the container.
* `kubectl` Connection Issues: Problems with the `kubectl` client's configuration or network connectivity to the Kubernetes API server.
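When any of these failures occur, a few of the commands from the previous section (plus `kubectl get events`) usually reveal the cause. A sketch with placeholder names:

```bash
# Inspect the Deployment and its Pods for events such as ImagePullBackOff or FailedScheduling
kubectl describe deploy <deployment-name>
kubectl describe pod <pod-name>

# Review recent cluster events and container logs for runtime or configuration errors
kubectl get events --sort-by=.metadata.creationTimestamp
kubectl logs <pod-name> -c <container-name>
```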