This repository captures comprehensive, enterprise-grade Kubernetes and GCP operations, including provisioning, automation, and troubleshooting. It documents hands-on implementations across container orchestration, infrastructure as code, observability, security, and multi-environment cloud deployments, demonstrating both practical expertise and real-world operational workflows.
All projects, manifests, and exercises reflect production-style scenarios, showcasing the ability to deploy, manage, and troubleshoot complex Kubernetes clusters and cloud resources efficiently.
This lab was executed on a modern enterprise Kubernetes and GCP stack, including the following core tools and platforms:
Operating System: Ubuntu 24.04 LTS
Containerization: Docker 28.3.3
Kubernetes: kubectl 1.33.4, Minikube v1.33.1
Service Mesh: Istio 1.27.0 (demo profile)
Infrastructure as Code: Terraform v1.13.1
Cloud Platform: Google Cloud Platform (Compute Engine, Cloud Storage, IAM, Billing, gcloud CLI)
Development Tools: Visual Studio Code v1.103.1, YAML manifests
This environment supports end-to-end experimentation, including cluster provisioning, service deployment, monitoring, storage management, networking, and security validation across multiple namespaces and cloud resources.
├── README.md
├── Screenshots/ # Visual documentation of all tasks
│ ├── Deployments/
│ ├── Docker/
│ ├── GCP/
│ ├── Istio/
│ ├── Minikube/
│ ├── Namespaces/
│ ├── Networking/
│ ├── Pods/
│ ├── Prometheus/
│ ├── ReplicaSets/
│ ├── Security/
│ ├── ServiceAccounts/
│ ├── Scripts/
│ ├── Storage/
│ └── Terraform/
├── Manifests/ # YAML manifests for Kubernetes resources
├── Deployments/ # Deployment manifests & testing (rollouts, scaling, rollback)
├── Docker/ # Docker installation, container management, volumes
├── GCP/ # GCP provisioning, storage, VM management, billing, IAM
├── Istio/ # Service mesh setup, Bookclub demo app, gateway & virtual services
├── Minikube/ # Local Kubernetes cluster setup, load balancer, filesystem management
├── Namespaces/ # Namespace creation, management, and taint validation
├── Networking/ # Network policies and troubleshooting
├── Pods/ # Pod creation, validation, troubleshooting, volume testing
├── Prometheus/ # Monitoring and observability solution, scrapes for pod uptime
├── ReplicaSets/ # ReplicaSet manifests, pod scaling, HA validation
├── Security/ # Certificates, JSON keys, Kubernetes secrets
├── ServiceAccounts/ # Service account creation, JWT retrieval, pod attachment
├── Scripts/ # Bash scripting to automate Prometheus, Grafana and Kubernetes deployments and administration
├── Storage/ # PersistentVolume (PV) and PersistentVolumeClaim (PVC) creation & validation
└── Terraform/ # Terraform manifests, provisioning, validation, and destruction
Summary: This section covers creating, updating, rolling out, scaling, and troubleshooting Kubernetes deployments. Demonstrates knowledge of pod lifecycle, replication, rollout strategies, annotations, and error handling.
- 2025-08-17 Create deployment manifest, deploy, and validate pod status.
- 2025-08-17 Validate deployment details.
- 2025-08-17 Validate auto-scaling of replicas per deployment specifications.
- 2025-08-17 Confirm all components are running and in intended state.
- 2025-08-17 Delete deployment, scale replicas, create new deployment, validate replicas and health.
- 2025-08-17 Rollout new deployment, validate 6/6 pods are up and healthy.
- 2025-08-17 Annotate deployment, roll it out, check history, confirm change-cause annotation.
- 2025-08-17 Validate change-cause in deployment description.
- 2025-08-17 Update deployment image, annotate change-cause, rollout, validate history.
- 2025-08-17 Validate scaling of replicas.
- 2025-08-17 Downgrade deployment (fail), rollback, validate change-cause annotations and pod health.
- 2025-08-19 Deploy invalid image, confirm rollout error, validate broken pods and correct replica count.
- 2025-08-19 Rollback deployment, confirm pods terminated and correct replicas running.
- 2025-08-19 Validate image version rolled back.
- 2025-08-19 Confirm replica count via deployment description.
Summary: This section demonstrates installing Docker, launching containers, creating and mounting volumes, and validating persistent storage at the container level.
- 2025-08-13 Install Docker and launch Hello World container.
- 2025-08-13 Validate locally installed Docker version.
- 2025-09-01 Create local volume on node, validate availability.
- 2025-09-01 Start container with attached local volume, validate pod writing to volume, connect via exec -it, confirm data written.
Summary: This section covers provisioning compute resources and storage buckets, configuring IAM and gservice accounts, managing billing, creating snapshots, and enabling authentication from local workstation. Emphasizes cloud operations and observability.
- 2025-08-29 Create storage buckets in GCP project for compute VMs, Prometheus monitoring/observability, and Kubernetes.
- 2025-08-29 Create primary VM in GCP project to provision worker VMs with Terraform.
- 2025-08-29 Create RSA keypair on main workstation to facilitate SSH authentication to primary GCP VM.
- 2025-08-29 Add public key to primary GCP VM, validate SSH authentication from main workstation.
- 2025-08-29 Create bash script on primary workstation to facilitate faster SSH authentication to primary GCP VM.
- 2025-08-29 Change primary Terraform GCP VM IP type to standard ephemeral and network type to standard in GCP Compute.
- 2025-08-29 Install and validate the gcloud shell on main workstation.
- 2025-09-02 Review and analyze overall spend for the month in Billing.
- 2025-09-02 Additional review and analysis of spend by project and service.
- 2025-09-02 Review Billing dashboard.
- 2025-09-02 Filter Billing dashboard by SKU.
- 2025-09-02 Generate billing report.
- 2025-09-02 Set alerting thresholds for budgeting in Billing.
- 2025-08-29 Assign primary VM persistent disk to snapshot schedule.
- 2025-09-01 Validate primary disk for Terraform bastion host assigned to snapshot schedule, confirm run times.
- 2025-09-01 Create one-off snapshot for primary disk/bastion host, validate success.
- 2025-09-02 Enable authentication to GCP project using gcloud from primary workstation.
- 2025-09-02 Create service account, assign Storage Admin IAM binding via gcloud.
- 2025-09-02 Validate current buckets in Cloud Storage accessible via gcloud from Minikube.
- 2025-09-02 Create multiple service accounts, assign Storage Admin IAM bindings via gcloud.
Summary: Installation and validation of Istio service mesh, deploying sample applications, creating gateways, exposing services externally, and validating ingress traffic.
- 2025-08-24 Install Istio with demo profile, validate installation stability.
- 2025-08-13 Validate locally installed Istio version.
- 2025-08-29 Install Bookclub app in Istio default namespace with demo profile, validate services and pods are running.
- 2025-08-29 Validate Bookclub app in default namespace is accepting HTTP requests served by Praqma pod using curl.
- 2025-08-29 Create gateway and virtual networking services in Istio default namespace, confirm gateway is up, export INGRESS_NAME, validate external IP created, export INGRESS_PORT, SECURE_INGRESS_PORT, and TCP_INGRESS_PORT.
- 2025-08-29 Validate Bookclub app is accessible via curl in default Istio namespace.
- 2025-08-29 Validate both load balancer external IP & internal cluster IPs, confirm $GATEWAY_URL matches external IP.
- 2025-08-29 Validate Bookclub app IP exposed externally, reachable via browser.
- 2025-08-29 Validate Bookclub app is externally exposed and serving round-robin reviews from app in default Istio namespace.
- 2025-08-29 Cleanup and shutdown Bookclub application in default Istio namespace.
Summary: Local Kubernetes cluster setup, pod deployment, external load balancer provisioning, filesystem creation, and troubleshooting Minikube startup issues.
- 2025-08-12 Install Minikube.
- 2025-08-12 Start cluster and validate Minikube version.
- 2025-08-12 Validate cluster is running, stop cluster, then delete.
- 2025-08-14 Install kubectl, start cluster, validate cluster info and kubectl version.
- 2025-08-16 Identify Minikube not launching due to memory allocation; reduce memory and start cluster, create nginx pod.
- 2025-08-16 Fix Minikube not starting, reduce memory allocation create pod, create nginx pod.
- 2025-08-29 Start and validate external load balancer in Minikube.
- 2025-09-02 Create and validate filesystem on Minikube VM.
Summary: Demonstrates creating and managing Kubernetes namespaces, validating taints, and applying proper namespace scoping for resources.
- 2025-08-19: Create namespace using manifest, then create the first namespace.
- 2025-08-19: Create namespace directly with kubectl command.
- 2025-08-19: Inspect node for any existing taints.
- 2025-09-02: Create multiple namespaces and validate proper creation and functionality.
Summary: Covers applying network policies, identifying manifest errors, and validating connectivity to pods, showing hands-on troubleshooting skills for Kubernetes networking.
- 2025-08-24: TROUBLESHOOTING: Create a network policy manifest, apply it to the node, identify and correct JSON errors in the manifest, apply changes, and validate the policy is correctly applied to the pod.
Summary: Creation, management, troubleshooting, and validation of pods. Includes manifest writing, exec into containers, volume mounts, and pod lifecycle events.
- 2025-08-16: Validate running pods on the node and review pod details.
- 2025-08-16: Delete a pod and confirm it is no longer running on the node.
- 2025-08-16: Create pod manifests, validate formatting, spacing, and content correctness.
2025-08-16: TROUBLESHOOTING: Resolve a pod not starting due to a crash loop.
2025-08-16: TROUBLESHOOTING: Fix pod with the wrong image defined in the manifest.
2025-08-16: TROUBLESHOOTING: Correct pod with incorrect image version.
2025-08-16: Describe running pods and review start processes for all pods.
2025-08-16: Create multiple pods via manifest, start them, validate 3/3 are running.
2025-08-16: Delete all pods and confirm none remain on the node.
2025-08-23: TROUBLESHOOTING: Correct pod creation failure due to DockerHub image name mismatch.
2025-08-23: TROUBLESHOOTING: Resolve Prometheus pod failing to start due to incorrect image.
2025-08-23: Validate all available Kubernetes API resources on the node.
2025-08-29: Write and deploy Praqma pod manifest, validate pod is running.
2025-09-01: Create Golang pod from manifest, validate running state, connect via exec -it, and confirm hostPath volume is writing data.
2025-09-01: Describe Golang pod to confirm hostPath volume mount and configuration.
2025-09-02: Validate pod label selectors properly filter and sort pods by assigned labels.
2025-09-07: Create config map for prometheus in monitoring namespace.
Summary: Covers monitoring and observability of pods, ensuring visibility into pod up status and overall health checks of defined metrics.
- 2025-08-17: Create and run bash script -> this apply the manifest to the DNS services, pods and configmaps in the monitoring namespace, sleeps the prometheus podfor 20 seconds to allow time for it to initialize, starts port forwarding and finally internally exposes the Prometheus GUI so scraped data may be visualized.
- 2025-08-17: Validate the pods exposed to the prometheus service are showing up in monitoring and their up state may be validated under target health status.
Summary: Covers ReplicaSet manifest creation, scaling, validating replica counts, and ensuring high availability by recreating deleted pods automatically.
- 2025-08-17: Create ReplicaSet manifest in Visual Studio Code, validate accessibility via bash.
- 2025-08-17: Create ReplicaSet manifest via bash, validate correct number of pods are running and healthy.
- 2025-08-17: Create new pod manifest in Visual Studio Code for ReplicaSet testing, validate availability in bash.
- 2025-08-17: Confirm correct number of pods defined in the ReplicaSet are running.
- 2025-08-17: Delete a pod and validate that ReplicaSet automatically creates a new pod; ensure 3/3 pods are running (HA validation).
- 2025-08-17: Check details of the ReplicaSet, including replica and pod counts.
- 2025-08-17: Edit ReplicaSet manifest to scale replicas to 4, confirm the fourth pod is created and running.
- 2025-08-17: Scale down replicas to 2 using CLI without editing the manifest, validate only two pods are running.
Summary: Creating certificates for Kubernetes components, signing them, managing gservice account keys, and creating Kubernetes secrets to secure sensitive credentials.
- 2025-08-21: Create the root certificate.
- 2025-08-21: Create and sign an administrator certificate.
- 2025-08-21: Create and sign the system masters certificate.
- 2025-08-21: Create and sign certificates for kube-scheduler, kube-controller-manager, and kube-proxy services.
- 2025-09-02: Create JSON keys and assign them to each applicable GCP service account.
- 2025-09-02: Create Kubernetes secrets from JSON keys for use by all three GCP service accounts.
Summary: Demonstrates creation of service accounts, attaching them to pods, decoding JWT tokens, validating token TTL, and troubleshooting service account deployment issues.
- 2025-08-21: Create a service account, validate on the node, describe the service account.
- 2025-08-21: Describe a pod with an attached service account.
- 2025-08-23: Grab Java Web Token (JWT) for the Prometheus pod.
- 2025-08-23: Install jq locally to view the decoded token JSON; check exp and iat values to confirm token expiration is set for one year.
- 2025-08-23 (TROUBLESHOOTING): Manifest not creating service account or pod; updated the manifest on the node, deleted the pod, re-applied the manifest, validated creation.
Summary: Demonstrates PersistentVolume (PV) and PersistentVolumeClaim (PVC) creation, ephemeral storage validation, mounting across pods, multi-namespace validation, and StorageClass creation.
- 2025-09-01: Write PersistentVolume manifest for both Redis and Nginx pods, validate creation and review details via kubectl describe.
- 2025-09-01: Write manifest for both Nginx and Redis pods, start and validate running 2/2; illustrate difference between persistent and ephemeral storage.
- 2025-09-01: Describe Redis pod and validate automatically created ephemeral volume is present and mapped to the pod; check mountpoint.
- 2025-09-01: Describe Nginx pod and validate automatically created ephemeral volume is present and mapped to the pod; check mountpoint.
- 2025-09-01: Write PersistentVolumeClaim (PVC) manifest, apply, validate bound to PersistentVolume (PV).
- 2025-09-02: Create multiple pods and associated PersistentVolumes in multiple namespaces, validate creation, ready status, and namespace.
- 2025-09-02: Validate multiple PV and PVC bind status across multiple namespaces.
- 2025-09-02: Create and validate StorageClass on node.
Summary: Shows declarative infrastructure management in GCP including installing Terraform, initializing, planning, applying manifests, and destroying resources declaratively.
- 2025-08-29: Install Terraform and validate on primary GCP VM.
- 2025-09-02: Initialize Terraform, validate no issues.
- 2025-09-02: Run terraform plan against manifest, validate no issues.
- 2025-09-02: Validate GCP compute environment prior to applying compute manifest.
- 2025-09-02: Apply compute manifest using Terraform, validate additional compute resources have been provisioned declaratively.
- 2025-09-02: Validate additional compute resources (VMs) are present in the GCP project compute service.
- 2025-09-02: Apply compute manifest using Terraform, validate additional compute resources have been destroyed declaratively.