Description
TL;DR
CRITICAL BUG: Breaks fundamental IaC principles
Warning
"Untracked GKE-managed resources blocking destruction of tracked resources" resourceInUseByAnotherResource
This private-cluster-update-variant module violates the fundamental promise of Infrastructure-as-Code. terraform destroy fails catastrophically due to GKE creating untracked forwarding rules linked to the load balancer that block subnet deletion.
3 DAYS of debugging and manual intervention required for what should be a simple destroy operation. It has been very frustrating to say the least.
Affected: All users using http_load_balancing = true (the default).
Looking at the module structure, this affects modules/private-cluster-update-variant/:
terraform-google-kubernetes-engine/modules/private-cluster-update-variant/cluster.tf
Lines 22 to 32 in 98ffedd
```hcl
resource "google_container_cluster" "primary" {
  provider          = google
  name              = var.name
  description       = var.description
  project           = var.project_id
  resource_labels   = var.cluster_resource_labels
  location          = local.location
  node_locations    = local.node_locations
  cluster_ipv4_cidr = var.cluster_ipv4_cidr
  # ... (snippet truncated)
```
terraform-google-kubernetes-engine/modules/private-cluster-update-variant/networks.tf
Lines 19 to 26 in 98ffedd
```hcl
data "google_compute_subnetwork" "gke_subnetwork" {
  provider = google
  count    = var.add_cluster_firewall_rules ? 1 : 0
  name     = var.subnetwork
  region   = local.region
  project  = local.network_project_id
}
```
Note: This is similar to this issue https://discuss.hashicorp.com/t/gcp-delete-automatic-created-resources-not-by-terraform/28749
Expected behavior
Classic scenario
When I run terraform destroy, ALL resources should be cleaned up automatically without any manual intervention. That's the entire point of IaC: declarative, reproducible, hands-off infrastructure management.
Observed behavior
BROKEN DESTROY PROCESS
- terraform destroy runs
- GKE cluster gets destroyed
- VPC module SUBNET DELETION FAILS with dependency errors
- Manual detective work required to identify phantom forwarding rules created by GKE
- Manual gcloud commands needed to clean up orphaned resources
- Multiple destroy attempts required
- Complete failure of IaC principles
```
terraform destroy
...
module.vpc["vpc"].module.subnets.google_compute_subnetwork.subnetwork["us-east1/vllm-subnet"]: Destroying... [id=projects/**/regions/us-east1/subnetworks/vllm-subnet]
╷
│ Error: Error when reading or editing Subnetwork: googleapi: Error 400:
│ The subnetwork resource 'projects/**/regions/us-east1/subnetworks/vllm-subnet' is already being used by
│ 'projects/**/regions/us-east1/forwardingRules/a910a2abd46d247119f8a241b7234957', resourceInUseByAnotherResource
```
Root Cause:
When http_load_balancing = true, GKE automatically creates forwarding rules that reference the subnet. These resources are NOT tracked by Terraform, creating invisible dependencies that break the destroy process.
project → vpc → subnet → gke → [INVISIBLE FORWARDING RULES] → subnet deletion fails
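If the GCE Ingress controller isn't actually needed, one mitigation that follows from this root cause (untested sketch, and it costs you HTTP load balancing support) is to opt out of the addon so GKE never creates the untracked rules in the first place:

```hcl
module "gke" {
  source = "./modules/private-cluster-update-variant"
  # ... same inputs as the configuration below ...

  # Addon disabled: GKE does not provision forwarding rules for
  # GCE Ingress, so nothing invisible holds a reference to the subnet.
  http_load_balancing = false
}
```

This is a workaround, not a fix: users who rely on GCE Ingress still hit the bug.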
Terraform Configuration
Using modules/private-cluster-update-variant with standard VPC setup:
```hcl
module "gke" {
  source              = "./modules/private-cluster-update-variant"
  project_id          = local.target_project_id
  name                = var.cluster_name
  region              = var.region
  network             = local.vpc_name
  subnetwork          = local.subnet_name
  http_load_balancing = true # DEFAULT - causes the issue
  deletion_protection = false
  # ... other config
}
```
VPC:

```hcl
module "vpc" {
  source = "./modules/google-network"
  # version = "~> 9.0"
  for_each = var.create_vpc ? { "vpc" = {} } : {}

  # Required parameters
  project_id   = local.target_project_id
  network_name = var.vpc_name
  routing_mode = "REGIONAL"

  # Subnets configuration (GCP uses single subnet + secondary ranges)
  subnets = [
    {
      subnet_name           = var.subnetwork_name
      subnet_ip             = var.subnetwork_cidr
      subnet_region         = var.region
      subnet_private_access = "true"
      subnet_flow_logs      = "true"
      description           = "GKE cluster subnet"
    }
  ]

  # Secondary IP ranges for GKE pods and services (singular naming)
  secondary_ranges = {
    (var.subnetwork_name) = [
      {
        range_name    = var.pod_range_name
        ip_cidr_range = var.pod_cidr
      },
      {
        range_name    = var.service_range_name
        ip_cidr_range = var.service_cidr
      }
    ]
  }

  # Optional parameters
  delete_default_internet_gateway_routes = false

  depends_on = [
    module.vllm_gke_project,
    google_project_service.existing_project_services,
  ]
}
```

Terraform Version
Terraform: 1.3+
Google Provider: >= 6.42.0, < 7
Module Version: v38.0.0
Terraform Provider Versions
```hcl
terraform {
  required_version = ">= 1.3, < 2.0"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 6.27.0, < 7"
    }
    google-beta = {
      source  = "hashicorp/google-beta"
      version = ">= 4.64, < 7"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.10"
    }
    helm = {
      source  = "hashicorp/helm"
      version = ">= 2.15"
    }
    kubectl = {
      source  = "gavinbunney/kubectl"
      version = ">= 1.19.0"
    }
    random = {
      source  = "hashicorp/random"
      version = ">= 2.1"
    }
    local = {
      source  = "hashicorp/local"
      version = ">= 2.5"
    }
  }
}

# Configure the Google provider
provider "google" {
  project = var.project_id # i.e. TF_VAR_project_id
  region  = var.region
}

provider "google-beta" {
  project = var.project_id
  region  = var.region
}

# Get access token for authentication
data "google_client_config" "default" {}

provider "random" {}

# ...
```

Additional information
Failed Workaround Attempts
1. Import-based Resource Management (doesn't work):
```hcl
# Attempted to import GKE-created forwarding rules
resource "google_compute_forwarding_rule" "managed_lb_rules" {
  for_each = { for rule in local.blocking_rules : rule.name => rule }

  lifecycle { ignore_changes = all }
}

import {
  for_each = { for rule in local.blocking_rules : rule.name => rule }
  to       = google_compute_forwarding_rule.managed_lb_rules[each.key]
  id       = "projects/${local.target_project_id}/regions/${var.region}/forwardingRules/${each.key}"
}
```

2. Destroy-time Cleanup (doesn't work):
```hcl
# Data source that runs after GKE is created
data "google_compute_forwarding_rules" "all_rules" {
  project = local.target_project_id
  region  = var.region
  # depends_on = [
  #   module.gke
  # ]
}

locals {
  blocking_rules = [
    for rule in data.google_compute_forwarding_rules.all_rules.rules :
    rule if can(regex("/${var.subnetwork_name}$", rule.subnetwork))
  ]
  blocking_map = { for r in local.blocking_rules : r.name => r }
}

resource "null_resource" "cleanup_forwarding_rule" {
  for_each = { for r in local.blocking_rules : r.name => r }

  triggers = {
    project_id = local.target_project_id
    region     = var.region
    rule_name  = each.key
  }

  provisioner "local-exec" {
    when       = destroy
    command    = "gcloud compute forwarding-rules delete '${self.triggers.rule_name}' --region='${self.triggers.region}' --project='${self.triggers.project_id}' --quiet"
    on_failure = continue
  }

  depends_on = [module.gke]
}
```

The gcloud command works perfectly outside Terraform but doesn't solve the dependency ordering within Terraform's destroy process:

```
gcloud compute addresses delete fw-id --region=us-east1 --project=$project_id
```

Impact
This bug completely breaks the Infrastructure as Code paradigm. Users are forced into manual intervention, defeating the entire purpose of declarative infrastructure management. No production environment can accept this implementation.
❌ Hidden Circular Dependency Created by GKE:
subnet ← [invisible forwarding rules] ← gke cluster ← subnet
This proves the module creates dependencies outside Terraform's knowledge that break even perfect configurations.
Needed Fix
- Ensure proper destruction ordering
- Provide a declarative solution that doesn't require manual intervention
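One declarative direction that might satisfy this (an UNTESTED sketch, reusing the names from the configuration above) is to invert the ordering of the failed destroy-time cleanup attempt: make the cleanup resource depend on the VPC, and the cluster depend on the cleanup resource. Since Terraform destroys in reverse dependency order, the cluster would be destroyed first, then the destroy-time provisioner would fire, then the subnet could be deleted:

```hcl
# UNTESTED sketch: ordering is the inverse of the failed attempt above.
# Destroy order is the reverse of create order, so:
#   1. the cluster (which depends on this resource) is destroyed first,
#   2. then this resource's destroy-time provisioner deletes leftover rules,
#   3. then the subnet this resource depends on can be deleted.
resource "null_resource" "lb_cleanup" {
  triggers = {
    project_id = local.target_project_id
    region     = var.region
    subnet     = local.subnet_name
  }

  provisioner "local-exec" {
    when       = destroy
    on_failure = continue
    # Delete any forwarding rules still referencing the subnet.
    command = <<-EOT
      for rule in $(gcloud compute forwarding-rules list \
          --project='${self.triggers.project_id}' \
          --regions='${self.triggers.region}' \
          --filter="subnetwork:${self.triggers.subnet}" \
          --format='value(name)'); do
        gcloud compute forwarding-rules delete "$rule" \
          --region='${self.triggers.region}' \
          --project='${self.triggers.project_id}' --quiet
      done
    EOT
  }

  # Created after the subnet, therefore destroyed before it.
  depends_on = [module.vpc]
}

# Hypothetical: the cluster would additionally need
#   depends_on = [null_resource.lb_cleanup]
# so that it is destroyed before the cleanup runs.
```

Even if this ordering trick works, it still shells out to gcloud, so a proper fix inside the module (or in the provider) is what's really needed.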
Thank You