Joe Guaneri

Misadventures of AWS Default VPC and Terraform

When you create a new AWS account, one of the best practices to follow is to remove all the default VPC resources. So when version 4.0 of the AWS provider was released, it was exciting to see a new option to help with this:

resource/aws_default_vpc: If no default VPC exists in the current AWS Region one is now created. The force_destroy destroy argument has been added (defaults to false). Setting this argument to true deletes the default VPC on terraform destroy (#22253)

So going forward, you’d think you could now simply do this:

resource "aws_default_vpc" "default_vpc" {
  force_destroy = true
}

All the headaches of removing them from your newly vended AWS account would go away, and with one small block we could remove scripts from vending machines. Unfortunately, it doesn’t work like this. This article explains why, shows a brute-force solution, and discusses alternatives.

The Dependency Problem

First, when you run this block, the apply adds the existing default VPC to state (or creates one if it does not exist) rather than removing it. The default VPC is an odd resource because it is created along with the account, so it exists before any Terraform state does. Setting force_destroy = true is required so that when a destroy is executed, the resource is removed from the account and not just from the state file. So the first problem is that you have to apply to add the resource to state before you can even destroy it.

No big deal: you run an apply to add it, then run a second apply with the -destroy flag to delete it. The second problem occurs when you run the destroy. The apply will tease you for around 5 minutes before it fails with an error.

Error: error deleting EC2 VPC (vpc-000): DependencyViolation:

This error has been an open issue since March 3, 2022. The problem is that the default VPC cannot be deleted unless all its dependencies are removed first. This may seem strange, given that deleting the VPC in the AWS management console is all-inclusive, but AWS simply performs the multiple API calls behind the scenes. There is a Terraform resource for the default subnets that get created. Unfortunately, nothing equivalent exists for the default IGW. So before you can destroy the default VPC, you first need to find and then destroy the IGWs.
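As a rough sketch of the "find" half of that step, the IGW attached to the default VPC can be looked up with data sources. The names `default` and `default_igw_id` are illustrative assumptions, not taken from the repository linked below, and the lookup assumes a default VPC with an attached IGW still exists in the provider's region.

```hcl
# Look up the default VPC in the provider's current region.
data "aws_vpc" "default" {
  default = true
}

# Find the internet gateway attached to that VPC.
data "aws_internet_gateway" "default" {
  filter {
    name   = "attachment.vpc-id"
    values = [data.aws_vpc.default.id]
  }
}

# Expose the ID so a later step can import and destroy the gateway.
# The output name is hypothetical.
output "default_igw_id" {
  value = data.aws_internet_gateway.default.id
}
```

If either lookup finds nothing, the plan fails, which is one reason to keep this lookup separate from the configuration that later destroys the resources.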

The Terraform Solution

Deleting the default VPC with Terraform requires multiple steps to replicate the AWS management console experience. To show the audacity of the code required to do this, the following repository was created with everything needed to perform the steps: https://github.com/joeguaneri/delete-default-vpc

1. Locate all the IGWs
2. Import all the IGWs into a state file
3. Destroy IGWs in the state file
4. Add all the default VPCs into a state file
5. Add all the default subnets into a state file
6. Destroy all the default VPCs and default subnets in the state file
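For steps 4 through 6, a minimal sketch of one region's configuration might look like the following. The resource labels and the availability zone name are assumptions for illustration; the repository's actual layout may differ. Applying this adds the default VPC and subnet to state, and a later destroy removes them from the account because force_destroy is set.

```hcl
# Adopt the region's default VPC into state; force_destroy lets a
# later destroy delete it from the account, not just from state.
resource "aws_default_vpc" "default" {
  force_destroy = true
}

# One block is needed per availability zone that has a default subnet.
# "us-east-1a" is an assumed example zone.
resource "aws_default_subnet" "az_a" {
  availability_zone = "us-east-1a"
  force_destroy     = true
}
```

The destroy only succeeds once the IGWs from steps 1 through 3 are already gone. Otherwise the DependencyViolation error described earlier comes back.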

Some notes on why this code looks bad (and it definitely is very bad):
– The IGW import requires two phases, since import blocks are processed before data lookups in the lifecycle of a terraform plan/apply. This is why the CLI import was used over the import block: the data lookup can be read easily from a terraform output after the initial plan is created. The output does contain quotes (“”) that need to be trimmed.
– The data lookup itself is also isolated, since the VPC and subnets are added and destroyed after the IGWs are removed. When this code was combined (which was attempted originally), the data lookups failed because what they look up no longer exists.
– There’s definitely a cleaner way to iterate through all of the regions, but this was done explicitly since not all regions have the same number of subnets.
– Each region really needs its own provider for this architecture, so to make the code somewhat manageable it was split by region, with each region aliased explicitly so it is easy to extend if new default regions are added.
– There is an oddity in us-west-1, where there are 3 availability zones but only 2 get provisioned. The extra resource will error but does not cause the apply to fail.
– The Terraform state files are created locally, as they are just unnecessary artifacts once the code has run to completion. The run.sh provided deletes them prior to every run, so this can be reused for multiple accounts without any additional work.
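To illustrate the per-region provider note above, here is a hedged sketch of explicit provider aliases for two regions. The alias names and resource labels are assumptions chosen for readability rather than the repository's exact naming.

```hcl
# One aliased provider per region; add a block for each enabled region.
provider "aws" {
  alias  = "us_east_1"
  region = "us-east-1"
}

provider "aws" {
  alias  = "us_west_1"
  region = "us-west-1"
}

# Each default VPC resource is pinned to its region's provider.
resource "aws_default_vpc" "us_east_1" {
  provider      = aws.us_east_1
  force_destroy = true
}

resource "aws_default_vpc" "us_west_1" {
  provider      = aws.us_west_1
  force_destroy = true
}
```

Writing each region out this way is repetitive, but it keeps the differing subnet counts per region visible and makes adding a region a copy-and-edit change.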

Conclusion

Using Terraform to solve this seemed like a great idea, but after writing the first 10 lines it quickly felt like a bad one. The Terraform files are more complex than an elegant script. Unless there is some real justification for using Terraform here, a script still seems like the better approach.
