Service Catalog Version 0.90.3. Last updated in version 0.90.1.

Amazon ECS Cluster


Overview

This service contains Terraform code to deploy a production-grade cluster on AWS using Elastic Container Service (ECS).

This service launches an ECS cluster on top of an Auto Scaling Group that you manage. If you wish to launch an ECS cluster on top of Fargate that is completely managed by AWS, refer to the ecs-fargate-cluster module. Refer to the section EC2 vs Fargate Launch Types for more information on the differences between the two flavors.

[Figure: ECS architecture]

Features

This Terraform Module launches an EC2 Container Service Cluster that you can use to run Docker containers. The cluster consists of a configurable number of instances in an Auto Scaling Group (ASG). Each instance:

  • Runs the ECS Container Agent so it can communicate with the ECS scheduler.

  • Authenticates with a Docker repo so it can download private images. The Docker repo auth details should be encrypted using Amazon Key Management Service (KMS) and passed in as input variables. The instances, when booting up, will use gruntkms to decrypt the data in-memory. Note that the IAM role for these instances, which uses var.cluster_name as its name, must be granted access to the Customer Master Key (CMK) used to encrypt the data.

  • Runs the CloudWatch Logs Agent to send all logs in syslog to CloudWatch Logs. This is configured using the cloudwatch-agent module.

  • Emits custom metrics that are not available by default in CloudWatch, including memory and disk usage. This is configured using the cloudwatch-agent module.

  • Runs the syslog module to automatically rotate and rate limit syslog so that your instances don’t run out of disk space from large volumes of logs.

  • Runs the ssh-grunt module so that developers can upload their public SSH keys to IAM and use those SSH keys, along with their IAM user names, to SSH to the ECS Nodes.

  • Runs the auto-update module so that the ECS nodes install security updates automatically.

Learn

Note: This repo is part of the Gruntwork Service Catalog, a collection of reusable, battle-tested, production-ready infrastructure code. If you’ve never used the Service Catalog before, make sure to read How to use the Gruntwork Service Catalog!

Under the hood, this is all implemented using Terraform modules from the Gruntwork terraform-aws-ecs repo. If you are a subscriber and don’t have access to this repo, email support@gruntwork.io.

Core concepts

To understand core concepts, such as what ECS is and the different cluster types, see the documentation in the terraform-aws-ecs repo.

To use ECS, you first deploy one or more EC2 Instances into a "cluster". The ECS scheduler can then deploy Docker containers across any of the instances in this cluster. Each instance needs to have the Amazon ECS Agent installed so it can communicate with ECS and register itself as part of the right cluster.

For more info on ECS clusters, including how to run Docker containers in a cluster, how to add additional security group rules, how to handle IAM policies, and more, check out the ecs-cluster documentation in the terraform-aws-ecs repo.

For info on finding your Docker container logs and custom metrics in CloudWatch, check out the cloudwatch-agent documentation.

Repo organization

  • modules: The main implementation code for this repo, broken down into multiple standalone, orthogonal submodules.
  • examples: This folder contains working examples of how to use the submodules.
  • test: Automated tests for the modules and examples.

Deploy

Non-production deployment (quick start for learning)

If you just want to try this repo out for experimenting and learning, check out the following resources:

  • examples/for-learning-and-testing folder: Contains standalone sample code optimized for learning, experimenting, and testing (but not direct production usage).

Production deployment

If you want to deploy this repo in production, check out the following resources:

Manage

For information on how to configure cluster autoscaling, see How do you configure cluster autoscaling?

For information on how to manage your ECS cluster, see the documentation in the terraform-aws-ecs repo.

Reference

Required

cluster_instance_ami (string, required)

The AMI to run on each instance in the ECS cluster. You can build the AMI using the Packer template ecs-node-al2.json. One of cluster_instance_ami or cluster_instance_ami_filters is required.

cluster_instance_ami_filters (object(…), required)

Properties on the AMI that can be used to look up a prebuilt AMI for use with ECS workers. You can build the AMI using the Packer template ecs-node-al2.json. Only used if cluster_instance_ami is null. One of cluster_instance_ami or cluster_instance_ami_filters is required. Set to null if cluster_instance_ami is set.

object({
# List of owners to limit the search. Set to null if you do not wish to limit the search by AMI owners.
owners = list(string)

# Name/Value pairs to filter the AMI off of. There are several valid keys, for a full reference, check out the
# documentation for describe-images in the AWS CLI reference
# (https://docs.aws.amazon.com/cli/latest/reference/ec2/describe-images.html).
filters = list(object({
name = string
values = list(string)
}))
})
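
As an illustration of the object shape above, here is a minimal sketch of a value that could be passed as cluster_instance_ami_filters. The owner, the name pattern, and the local name ecs_ami_filters are hypothetical placeholders; "name" and "state" are filter keys from the describe-images reference linked above.

```hcl
locals {
  # Hypothetical example value; adjust owners and filters to match your own AMIs.
  ecs_ami_filters = {
    # Restrict the search to AMIs owned by the current account.
    owners = ["self"]

    filters = [
      {
        # Match AMIs by name. The pattern below is an assumed naming scheme.
        name   = "name"
        values = ["ecs-node-*"]
      },
      {
        # Only consider AMIs that are ready to launch.
        name   = "state"
        values = ["available"]
      },
    ]
  }
}
```

You would then set cluster_instance_ami = null and cluster_instance_ami_filters = local.ecs_ami_filters on the module.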
cluster_instance_type (string, required)

The type of instances to run in the ECS cluster (e.g. t2.medium)

cluster_max_size (number, required)

The maximum number of instances to run in the ECS cluster

cluster_min_size (number, required)

The minimum number of instances to run in the ECS cluster

cluster_name (string, required)

The name of the ECS cluster

vpc_id (string, required)

The ID of the VPC in which the ECS cluster should be launched

vpc_subnet_ids (list(string), required)

The IDs of the subnets in which to deploy the ECS cluster instances
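
Putting the required inputs together, a module block might look roughly like the sketch below. The source URL and ref are assumptions (check the repo for the actual module path and pin a released version), and every ID and name is a placeholder.

```hcl
module "ecs_cluster" {
  # Assumed source path and version; verify both against the repo before use.
  source = "git::git@github.com:gruntwork-io/terraform-aws-service-catalog.git//modules/services/ecs-cluster?ref=v0.90.3"

  cluster_name          = "example-ecs-cluster"
  cluster_instance_type = "t2.medium"
  cluster_min_size      = 1
  cluster_max_size      = 3

  # Provide either an AMI ID or AMI filters; set the unused one to null.
  cluster_instance_ami         = "ami-0123456789abcdef0" # placeholder AMI ID
  cluster_instance_ami_filters = null

  vpc_id         = "vpc-0123456789abcdef0" # placeholder VPC ID
  vpc_subnet_ids = [
    "subnet-0123456789abcdef0", # placeholder subnet IDs
    "subnet-0fedcba9876543210",
  ]
}
```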

Optional

alarms_sns_topic_arn (list(string), optional)

The ARNs of SNS topics where CloudWatch alarms (e.g., for CPU, memory, and disk space usage) should send notifications

[]
allow_ssh_from_cidr_blocks (list(string), optional)

The IP address ranges in CIDR format from which to allow incoming SSH requests to the ECS instances.

[]

The IDs of security groups from which to allow incoming SSH requests to the ECS instances.

[]

Protect EC2 instances running ECS tasks from being terminated due to scale in (spot instances do not support lifecycle modifications). Note that the behavior of termination protection differs between clusters with capacity providers and clusters without. When capacity providers are turned on and this flag is true, only instances that have 0 ECS tasks running will be scaled in, regardless of capacity_provider_target. If capacity providers are turned off and this flag is true, this will prevent ANY instances from being scaled in.

false

Enable a capacity provider to autoscale the EC2 ASG created for this ECS cluster.

false

Maximum step adjustment size to the ASG's desired instance count. A number between 1 and 10000.

null

Minimum step adjustment size to the ASG's desired instance count. A number between 1 and 10000.

null

Target cluster utilization for the ASG capacity provider; a number from 1 to 100. This number influences when scale out happens, and when instances should be scaled in. For example, a setting of 90 means that new instances will be provisioned when all instances are at 90% utilization, while instances that are only 10% utilized (CPU and Memory usage from tasks = 10%) will be scaled in.

null
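
A minimal sketch of turning on the capacity provider, using only the two input names that appear in the descriptions above (capacity_provider_enabled and capacity_provider_target). The module source and required inputs are omitted, so this is a fragment rather than a complete configuration.

```hcl
module "ecs_cluster" {
  # source and required inputs omitted for brevity

  # Let an ECS capacity provider scale the underlying ASG.
  capacity_provider_enabled = true

  # Aim for 90% cluster utilization before provisioning new instances.
  capacity_provider_target = 90
}
```

A lower target leaves more spare capacity for new tasks at the cost of running more instances.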
cloud_init_parts (map(object(…)), optional)

Cloud init scripts to run on the ECS cluster instances during boot. See the part blocks in https://www.terraform.io/docs/providers/template/d/cloudinit_config.html for syntax

map(object({
filename = string
content_type = string
content = string
}))
{}
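
For illustration, a value matching the map(object) type above could look like the following sketch. The map key, the filename, and the script body are hypothetical; text/x-shellscript is the standard cloud-init content type for shell scripts.

```hcl
locals {
  # Hypothetical cloud-init part that runs a small shell script at boot.
  cloud_init_parts = {
    "example-boot-script" = {
      filename     = "example-boot-script.sh"
      content_type = "text/x-shellscript"
      content      = <<-EOF
        #!/usr/bin/env bash
        echo "Hello from cloud-init" > /tmp/cloud-init-example.txt
      EOF
    }
  }
}
```

You would pass this to the module as cloud_init_parts = local.cloud_init_parts.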

The ID (ARN, alias ARN, AWS ID) of a customer managed KMS Key to use for encrypting log data.

null

The name of the log group to create in CloudWatch. Defaults to the value of cluster_name with the suffix -logs.

""

The number of days to retain log events in the log group. Refer to https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_log_group#retention_in_days for all the valid values. When null, the log events are retained forever.

null
cloudwatch_log_group_tags (map(string), optional)

Tags to apply on the CloudWatch Log Group, encoded as a map where the keys are tag keys and values are tag values.

null
cluster_access_from_sgs (list(any), optional)

Specify a list of Security Groups that will have access to the ECS cluster. Only used if enable_cluster_access_ports is set to true

Any types represent complex values of variable type. For details, please consult `variables.tf` in the source repo.
[]

Whether to associate a public IP address with an instance in a VPC

false

The name of the Key Pair that can be used to SSH to each instance in the ECS cluster

null
default_user (string, optional)

The default OS user for the ECS worker AMI. For AWS Amazon Linux AMIs, which is what the Packer template in ecs-node-al2.json uses, the default OS user is 'ec2-user'.

"ec2-user"
disallowed_availability_zones (list(string), optional)

A list of availability zones in the region that should be skipped when deploying ECS. You can use this to avoid availability zones that may not be able to provision the resources (e.g., the instance type does not exist there). If empty, allows all availability zones.

[]

Set to true to enable CloudWatch log aggregation for the ECS cluster

true

Set to true to enable CloudWatch metrics collection for the ECS cluster

true

Specify a list of ECS Cluster ports which should be accessible from the security groups given in cluster_access_from_sgs

Any types represent complex values of variable type. For details, please consult `variables.tf` in the source repo.
[]

Set to true to enable several basic CloudWatch alarms around CPU usage, memory usage, and disk space usage. If set to true, make sure to specify SNS topics to send notifications to using alarms_sns_topic_arn

true
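
As a sketch of wiring the alarms to a notification target, the fragment below pairs enable_ecs_cloudwatch_alarms with alarms_sns_topic_arn. The SNS topic ARN is a placeholder, and the module source and required inputs are omitted.

```hcl
module "ecs_cluster" {
  # source and required inputs omitted for brevity

  # Create the basic CPU, memory, and disk space alarms.
  enable_ecs_cloudwatch_alarms = true

  # Placeholder ARN; point this at an SNS topic that exists in your account.
  alarms_sns_topic_arn = [
    "arn:aws:sns:us-east-1:111122223333:example-ecs-alarms",
  ]
}
```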
enable_fail2ban (bool, optional)

Enable fail2ban to block brute force login attempts. Defaults to true

true
enable_imds (bool, optional)

Set this variable to true to enable the Instance Metadata Service (IMDS) endpoint, which is used to fetch information such as user-data scripts, instance IP address and region, etc. Set this variable to false if you do not want the IMDS endpoint enabled for instances launched into the Auto Scaling Group for the workers.

true
enable_ip_lockdown (bool, optional)

Enable ip-lockdown to block access to the instance metadata. Defaults to true

true
enable_ssh_grunt (bool, optional)

Set to true to add IAM permissions for ssh-grunt (https://github.com/gruntwork-io/terraform-aws-security/tree/master/modules/ssh-grunt), which will allow you to manage SSH access via IAM groups.

true

Since our IAM users are defined in a separate AWS account, this variable is used to specify the ARN of an IAM role that allows ssh-grunt to retrieve IAM group and public SSH key info from that account.

""

The number of periods over which data is compared to the specified threshold

2

The period, in seconds, over which to measure the CPU utilization percentage. Only used if enable_ecs_cloudwatch_alarms is set to true

300

The statistic to apply to the alarm's high CPU metric. One of the following is supported: SampleCount, Average, Sum, Minimum, Maximum

"Average"

Trigger an alarm if the ECS Cluster has a CPU utilization percentage above this threshold. Only used if enable_ecs_cloudwatch_alarms is set to true

90

The period, in seconds, over which to measure the disk utilization percentage. Only used if enable_ecs_cloudwatch_alarms is set to true

300

Trigger an alarm if the EC2 instances in the ECS Cluster have a disk utilization percentage above this threshold. Only used if enable_ecs_cloudwatch_alarms is set to true

90

The number of periods over which data is compared to the specified threshold

2

The period, in seconds, over which to measure the memory utilization percentage. Only used if enable_ecs_cloudwatch_alarms is set to true

300

The statistic to apply to the alarm's high memory metric. One of the following is supported: SampleCount, Average, Sum, Minimum, Maximum

"Average"

Trigger an alarm if the ECS Cluster has a memory utilization percentage above this threshold. Only used if enable_ecs_cloudwatch_alarms is set to true

90

The desired HTTP PUT response hop limit for instance metadata requests for the workers.

null
internal_alb_sg_ids (list(string), optional)

The Security Group IDs for the internal ALB

[]

Enable a multi-az capacity provider to autoscale the EC2 ASGs created for this ECS cluster, only if capacity_provider_enabled = true

false
public_alb_sg_ids (list(string), optional)

The Security Group IDs for the public ALB

[]

When true, precreate the CloudWatch Log Group to use for log aggregation from the EC2 instances. This is useful if you wish to customize the CloudWatch Log Group with various settings such as retention periods and KMS encryption. When false, the CloudWatch agent will automatically create a basic log group to use.

true
ssh_grunt_iam_group (string, optional)

If you are using ssh-grunt, this is the name of the IAM group from which users will be allowed to SSH to the nodes in this ECS cluster. This value is only used if enable_ssh_grunt=true.

"ssh-grunt-users"

If you are using ssh-grunt, this is the name of the IAM group from which users will be allowed to SSH to the nodes in this ECS cluster with sudo permissions. This value is only used if enable_ssh_grunt=true.

"ssh-grunt-sudo-users"
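
A minimal sketch of the ssh-grunt settings, using only the input names shown in this reference (enable_ssh_grunt and ssh_grunt_iam_group). The group name is a hypothetical example, and the module source and required inputs are omitted.

```hcl
module "ecs_cluster" {
  # source and required inputs omitted for brevity

  # Add the IAM permissions ssh-grunt needs to look up groups and public keys.
  enable_ssh_grunt = true

  # Hypothetical IAM group whose members may SSH to the ECS nodes.
  ssh_grunt_iam_group = "ecs-ssh-users"
}
```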
tenancy (string, optional)

The tenancy of this server. Must be one of: default, dedicated, or host.

"default"
use_imdsv1 (bool, optional)

Set this variable to true to enable the use of Instance Metadata Service Version 1 in this module's aws_launch_configuration. Note that while IMDSv2 is preferred due to its special security hardening, we allow this in order to support the use case of AMIs built outside of these modules that depend on IMDSv1.

true
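
If your AMI does not depend on IMDSv1, one possible hardening step is to keep the metadata endpoint enabled while turning off Version 1, as in this sketch. It assumes everything on the instance can work with IMDSv2 alone, which you should confirm for your AMI first. The module source and required inputs are omitted.

```hcl
module "ecs_cluster" {
  # source and required inputs omitted for brevity

  # Keep the Instance Metadata Service endpoint available to the workers.
  enable_imds = true

  # Disable IMDSv1 (assumes the AMI and its tooling support IMDSv2).
  use_imdsv1 = false
}
```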

When true, all IAM policies will be managed as dedicated policies rather than inline policies attached to the IAM roles. Dedicated managed policies are friendlier to automated policy checkers, which may scan a single resource for findings. As such, it is important to avoid inline policies when targeting compliance with various security standards.

true