AWS security groups are among the most used, and most abused, configurations in an AWS environment, as anyone who has run workloads on the cloud for a long time will know. Because security groups are simple to configure, users often overlook their importance and do not follow best practices. In reality, operating security groups day to day is far more intensive and complex than configuring them once, a fact that nobody talks about! So, in this article, based on our experience with AWS security groups since 2008, here is a set of best practices for configuration and day-to-day operations.
In the world of security, proactive and reactive speed determines the winner, so many of these best practices should be automated. AWS has released so many security features in the last few years that viewing security groups in isolation no longer makes sense. A security group should always be seen in the overall security context. Let’s start with the pointers.
- Enable AWS VPC Flow Logs at the VPC, subnet, or ENI level
AWS VPC Flow Logs can be configured to capture both accepted and rejected traffic flowing through the ENIs and security groups of EC2, ELB, and some other services. These flow log entries can be scanned to detect attack patterns, alert on abnormal activity and information flows inside the VPC, and provide valuable insights to the SOC/MS team operations.
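As a minimal sketch of the kind of scanning described above, the Python below parses flow log records in the default (version 2) space-separated format and counts REJECT entries per source address. The function name `rejected_sources` and the threshold are illustrative assumptions, not part of any AWS API.

```python
from collections import Counter

# Field order of the default (version 2) VPC Flow Log record format.
FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()


def rejected_sources(lines, threshold=3):
    """Return {source IP: count} for sources with at least `threshold` REJECTs.

    Hypothetical helper; the threshold is an assumed tuning knob.
    """
    rejects = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # skip headers, custom formats, or malformed lines
        record = dict(zip(FIELDS, parts))
        if record["action"] == "REJECT":
            rejects[record["srcaddr"]] += 1
    return {ip: n for ip, n in rejects.items() if n >= threshold}
```

A real pipeline would read the records from CloudWatch Logs or S3 and feed the result into whatever alerting the SOC/MS team already uses.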
- Use IAM to manage access permissions
Use AWS Identity and Access Management (IAM) to control who in your organization has permission to create and manage security groups and network ACLs (NACL). Isolate the responsibilities and roles for better defense. For example, you can give only your network administrators or security admin the permission to manage the security groups and restrict other roles.
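One way to express this separation is an identity-based policy attached only to the network or security admin role. The sketch below builds such a policy document in Python; the `Sid`, the choice of actions, and the wildcard resource are assumptions for illustration, and it covers security groups only (NACL actions would be added the same way).

```python
import json

# Hypothetical policy for a network/security admin role.
# The action names are EC2 API actions; the scope ("*") is an assumption
# and should be narrowed to match your own account layout.
SG_ADMIN_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ManageSecurityGroups",
            "Effect": "Allow",
            "Action": [
                "ec2:CreateSecurityGroup",
                "ec2:DeleteSecurityGroup",
                "ec2:AuthorizeSecurityGroupIngress",
                "ec2:AuthorizeSecurityGroupEgress",
                "ec2:RevokeSecurityGroupIngress",
                "ec2:RevokeSecurityGroupEgress",
                "ec2:DescribeSecurityGroups",
            ],
            "Resource": "*",
        }
    ],
}

# Serialized form, ready to paste into the IAM console or pass to tooling.
policy_json = json.dumps(SG_ADMIN_POLICY, indent=2)
```

Other roles simply do not receive this policy, so they cannot change security groups.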
- Enable AWS CloudTrail logs for your account
AWS CloudTrail logs all security group events, and it is needed for the management and operations of security groups. Event streams can be created from AWS CloudTrail logs and processed using AWS Lambda.
For example, whenever a security group is deleted, the event is captured with its details in the AWS CloudTrail logs. The event can trigger an AWS Lambda function that processes the SG change and alerts the MS/SOC on the dashboard or by email, as per your workflow. This is a very powerful way of reacting to events in under 7 minutes. Alternatively, you can process the AWS CloudTrail logs stored in S3 as a batch every X minutes and achieve the same result, but the operations team’s reaction time will then vary with the generation and polling frequency of the logs. This activity is a must for your operations team.
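The Lambda side of this flow can be sketched as follows. It assumes the function is invoked with an "AWS API Call via CloudTrail" event, where the CloudTrail record sits under the `detail` key; the shape of the returned `alert` dict is an assumption, and forwarding to email or a dashboard is deliberately left out.

```python
# CloudTrail event names (EC2 API actions) that change security groups.
SG_EVENTS = {
    "CreateSecurityGroup",
    "DeleteSecurityGroup",
    "AuthorizeSecurityGroupIngress",
    "AuthorizeSecurityGroupEgress",
    "RevokeSecurityGroupIngress",
    "RevokeSecurityGroupEgress",
}


def lambda_handler(event, context):
    """Turn a security group API call into a small alert record, else None."""
    detail = event.get("detail", {})
    name = detail.get("eventName")
    if name not in SG_EVENTS:
        return None  # not a security group change, nothing to report
    params = detail.get("requestParameters") or {}
    alert = {
        "event": name,
        "group": params.get("groupId") or params.get("groupName"),
        "actor": detail.get("userIdentity", {}).get("arn"),
        "time": detail.get("eventTime"),
    }
    # Forwarding is workflow-specific (email, dashboard, ticket); here the
    # alert is only printed, which in Lambda lands in the function's logs.
    print(alert)
    return alert
```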
- Enable AWS Config for your AWS account
AWS Config records configuration changes to the resources in your account, including all changes to your security groups, and can even send notifications such as emails when a change occurs.
- Follow proper naming conventions for the AWS security group
The naming convention should follow certain enterprise standards.
For example, it can follow the notation: “AWS Region + Environment Code + OS Type + Tier + Application Code”
Security Group Name – EU-P-LWA001
AWS Region (2 characters) = EU, VA, CA, etc.
Environment Code (1 character) = P-Production, Q-Quality Analysis, T-testing, D-Development, etc.
OS Type (1 character) = L-Linux, W-Windows, etc.
Tier (1 character) = W-Web, A-App, C-Cache, D-DB, etc.
Application Code (4 characters) = A001
We have been using Amazon Web Services since 2008 and have found over the years that managing security groups across multiple environments is itself a huge task. Adopting a proper naming convention from the beginning is a simple practice, but it will make your AWS journey manageable.
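A convention like this is easy to check mechanically. The sketch below encodes the example notation as a regular expression; the hyphen placement follows the sample name `EU-P-LWA001`, and the allowed letters are limited to the codes listed above, which is an assumption since the list ends in "etc.".

```python
import re

# Pattern for names such as EU-P-LWA001 (assumed layout, see lead-in).
NAME_PATTERN = re.compile(
    r"^(?P<region>[A-Z]{2})-"   # AWS region, 2 characters
    r"(?P<env>[PQTD])-"         # environment code, 1 character
    r"(?P<os>[LW])"             # OS type, 1 character
    r"(?P<tier>[WACD])"         # tier, 1 character
    r"(?P<app>[A-Z]\d{3})$"     # application code, 4 characters
)


def parse_group_name(name):
    """Return the name's components as a dict, or None if it does not conform."""
    match = NAME_PATTERN.match(name)
    return match.groupdict() if match else None
```

The same function can back a periodic scan: any group whose name yields `None` is a candidate for an alert.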
- Ensure the security groups naming convention stays confidential
For security purposes, make sure your security group naming convention is not self-explanatory and that it stays internal.
Example: an AWS security group named ‘UbuntuWebCRMProd’ tells hackers plainly that it protects a production CRM web tier running on Ubuntu. Have an automated program periodically scan the names of your AWS SG assets with regex patterns for information-revealing names and alert the SOC/Managed service teams.
- Delete AWS security groups that don’t follow the naming standards
Periodically detect, alert on, or delete AWS security groups that do not strictly follow the organization's naming standards. Have an automated program do this as part of your SOC/Managed service operations. Once this stricter control is in place, things will fall in line automatically.
- Identify the AWS security groups staying idle
Have automation in place to detect all EC2, ELB, and other AWS assets associated with security groups. This automation helps you periodically detect security groups lying idle with no associations, alert the MS team, and clean them up. Unwanted security groups accumulated over time create needless confusion.
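The idle check can be sketched as a set difference. The function below assumes its inputs are lists shaped like the `SecurityGroups` and `NetworkInterfaces` entries returned by the EC2 describe calls (for example via boto3); fetching them is left to the caller, and the function name is hypothetical.

```python
def idle_security_groups(security_groups, network_interfaces):
    """Return IDs of groups not attached to any network interface.

    Every EC2 instance, load balancer, or similar asset reaches a security
    group through an ENI, so a group with no ENI is treated as idle here.
    Groups that are only referenced from another group's rules are NOT
    detected by this sketch and would still be reported as idle.
    """
    in_use = {
        group["GroupId"]
        for eni in network_interfaces
        for group in eni.get("Groups", [])
    }
    return sorted(
        sg["GroupId"]
        for sg in security_groups
        # The VPC's "default" group cannot be deleted, so it is skipped.
        if sg["GroupId"] not in in_use and sg.get("GroupName") != "default"
    )
```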
- Constantly detect “default” security groups
In your AWS account, when you create a VPC, AWS automatically creates a default security group for the VPC. If you don’t specify a different security group when you launch an instance, the instance is automatically associated with the appropriate default security group. It will allow inbound traffic only from other instances associated with the “default” security group and allow all outbound traffic from the instance.
The default security group specifies itself as a source security group in its inbound rules. This is what allows instances associated with the default security group to communicate with other instances associated with the default security group.
Relying on the default security group in this way is not a good security practice. If you don’t want your instances to use the default security group, create your own security groups and specify them when you launch your instances. This applies to EC2, RDS, ElastiCache, and some other AWS services. So, periodically detect usage of “default” security groups and alert the SOC/MS.
- Trigger alerts when security groups are modified
Alerts by email and on the cloud management dashboard should be triggered whenever critical security groups or rules are added, modified, or deleted in production. This is important both for prompt reaction by your managed services or security operations team and for audit purposes.
- Enable automated programs to detect conflicting SG/rules
When you associate multiple security groups with an Amazon EC2 instance, the rules from each security group are effectively aggregated to create one set of rules. AWS uses this set of rules to determine whether to allow access or not. If there is more than one SG rule for a specific port, AWS applies the most permissive rule. For example, if you have a rule that allows access to TCP port 22 (SSH) from IP address 203.0.113.10 and another rule that allows access to TCP port 22 for everyone, then everyone will have access to TCP port 22 because the most permissive rule takes precedence.
a. Have automated programs detect EC2 instances associated with multiple SGs/rules and alert the SOC/MS periodically. Manually condense these to 1-3 rules at most as part of your operations.
b. Have automated programs detect conflicting SGs/rules, such as restrictive and permissive rules together, and alert the SOC/MS periodically.
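A sketch of check (b): given all the security groups attached to one instance, it looks for a port that has both an open-to-everyone source and a restricted source, which is the SSH situation described above. The input is assumed to follow the `IpPermissions` layout of the EC2 describe output; the function name is hypothetical.

```python
from collections import defaultdict

OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def conflicting_rules(security_groups):
    """Return (protocol, from_port, to_port) keys mixing open and restricted sources."""
    sources = defaultdict(set)
    for sg in security_groups:
        for perm in sg.get("IpPermissions", []):
            key = (perm.get("IpProtocol"), perm.get("FromPort"), perm.get("ToPort"))
            for rng in perm.get("IpRanges", []):
                sources[key].add(rng["CidrIp"])
            for rng in perm.get("Ipv6Ranges", []):
                sources[key].add(rng["CidrIpv6"])
    return [
        key
        for key, cidrs in sources.items()
        # Conflict: at least one open source AND at least one restricted one.
        if cidrs & OPEN_CIDRS and cidrs - OPEN_CIDRS
    ]
```

The restricted rule in such a pair has no effect, so it is usually the open rule that should be removed.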
- Overly permissive security groups should not be created
Do not create overly permissive security groups, such as ones with rules open to 0.0.0.0/0, which means everyone.
Since web servers need to receive HTTP and HTTPS traffic from anywhere, only their SG can be permissive, like
0.0.0.0/0, TCP, 80, allow inbound HTTP access from anywhere
0.0.0.0/0, TCP, 443, allow inbound HTTPS access from anywhere
Every overly permissive SG created in your account should be alerted to the SOC/MS teams immediately.
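This rule can be sketched as a filter over a group's inbound rules: anything open to the world is flagged unless it is exactly TCP 80 or TCP 443. The input is assumed to be one entry of the EC2 `SecurityGroups` describe output, and the function name is hypothetical.

```python
WEB_PORTS = {80, 443}
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def overly_permissive_rules(security_group):
    """Return (group_id, protocol, from_port, to_port) for world-open non-web rules."""
    findings = []
    for perm in security_group.get("IpPermissions", []):
        cidrs = [rng["CidrIp"] for rng in perm.get("IpRanges", [])]
        cidrs += [rng["CidrIpv6"] for rng in perm.get("Ipv6Ranges", [])]
        if not OPEN_CIDRS.intersection(cidrs):
            continue  # rule is restricted to specific sources
        from_port, to_port = perm.get("FromPort"), perm.get("ToPort")
        is_web = (
            perm.get("IpProtocol") == "tcp"
            and from_port == to_port
            and from_port in WEB_PORTS
        )
        if not is_web:
            findings.append(
                (security_group["GroupId"], perm.get("IpProtocol"), from_port, to_port)
            )
    return findings
```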
- Create automated alerts to notify when security groups are created with default ports
Have a security policy not to launch servers on default ports like 3306, 1630, 1433, 11211, 6379, etc. If the policy is adopted, security groups must also be created on the new, non-default listening ports instead of the default ports. This provides a small layer of defense, since the service running on an EC2 instance cannot be inferred from the port in its security group. Automated detection and alerts should be created for the SOC/MS if security groups are created with default ports.
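Detection of default ports can be sketched as below, using the port list from this section. The input is assumed to be one entry of the EC2 `SecurityGroups` describe output; the function name is hypothetical.

```python
# Default ports named in this article.
DEFAULT_PORTS = frozenset({3306, 1630, 1433, 11211, 6379})


def default_ports_exposed(security_group, ports=DEFAULT_PORTS):
    """Return the sorted default ports that the group's inbound rules cover."""
    exposed = set()
    for perm in security_group.get("IpPermissions", []):
        proto = perm.get("IpProtocol")
        if proto == "-1":
            # "-1" means all protocols and all ports.
            exposed.update(ports)
        elif proto in ("tcp", "udp"):
            exposed.update(
                p for p in ports if perm["FromPort"] <= p <= perm["ToPort"]
            )
    return sorted(exposed)
```

Note that a port range can expose a default port without naming it, which is why the check tests ranges rather than exact matches.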
- Create security groups on secure ports for regulated applications
Applications with stricter compliance requirements like HIPAA, PCI, etc. need end-to-end transport encryption implemented on the server back end in AWS. The communication from ELB to Web->App->DB->other tiers needs to be encrypted using SSL/TLS or HTTPS. This means only secured ports like 443, 465, and 22 are permitted in the corresponding EC2 security groups. Automated detection and alerts should be created for the SOC/MS if security groups are not created on secure ports for regulated applications.
- Check for any anomalies in your production environment
Detection, alerting, and follow-up actions can be driven by parsing the AWS CloudTrail logs and comparing them against the usual patterns observed in your production environment.
a. If a port was opened and closed within 30 (or X) minutes in production, it can be a candidate for suspicious activity if that is not the normal pattern for your production.
b. If a permissive security group was created and removed within 30 (or X) minutes, it can be a candidate for suspicious activity if that is not the normal pattern for your production.
In short, detect anomalies in how long a security group change stayed in effect in production before it was reverted.
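Case (a) can be sketched by pairing authorize and revoke events per group. The sketch assumes the CloudTrail records have already been flattened into dicts with `eventName`, `eventTime`, and `groupId` keys (in raw CloudTrail the group ID sits under `requestParameters`), and it tracks one opening per group rather than individual ports.

```python
from datetime import datetime, timedelta


def parse_time(value):
    """Parse a CloudTrail-style UTC timestamp such as 2024-01-01T10:00:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def short_lived_openings(events, max_minutes=30):
    """Return (group_id, duration) for openings reverted within max_minutes."""
    window = timedelta(minutes=max_minutes)
    opened = {}  # group id -> time of the latest unmatched authorize event
    suspicious = []
    for ev in sorted(events, key=lambda e: parse_time(e["eventTime"])):
        group, when = ev["groupId"], parse_time(ev["eventTime"])
        if ev["eventName"] == "AuthorizeSecurityGroupIngress":
            opened[group] = when
        elif ev["eventName"] == "RevokeSecurityGroupIngress" and group in opened:
            age = when - opened.pop(group)
            if age < window:
                suspicious.append((group, age))
    return suspicious
```

Whether a short-lived opening is actually suspicious depends on your normal change pattern, so `max_minutes` is the "X" to tune.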
- Automate the process to enhance security
When ports must be opened in a security group, or a permissive security group needs to be applied, automate the entire process as part of your operations so that the security group stays open for an agreed X minutes and is then closed automatically, in line with your change management. Reducing manual intervention avoids operational errors and adds security.
- Enable strict controls while creating SSH/RDP connection
Make sure SSH/RDP connection is open in AWS Security Group only for jump box/bastion hosts for your VPC/subnets. Have stricter controls/policies to avoid opening SSH/RDP to other instances of the production environment. Periodically check, alert, and close for this loophole as part of your operations.
- Avoid SSH open to the entire internet
It is a bad practice to have SSH open to the entire Internet for emergency or remote support. When the entire Internet can reach your SSH port, every attacker is free to probe and attempt to exploit your EC2 instance. The best practice is to allow only very specific IP addresses in your security groups. These could be the addresses of your office, on-premises network, or data center through which you connect to your jump box.
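A sketch of this check using Python's standard `ipaddress` module: it reports any SSH source that does not fall inside an allow-list of networks. The allow-list values are placeholders, the sketch only looks at IPv4 ranges, and the input is assumed to follow the EC2 describe output layout.

```python
import ipaddress


def ssh_sources_outside_allowlist(security_group, allowed_cidrs):
    """Return IPv4 source CIDRs that can reach port 22 but are not allow-listed."""
    allowed = [ipaddress.ip_network(cidr) for cidr in allowed_cidrs]
    offenders = []
    for perm in security_group.get("IpPermissions", []):
        proto = perm.get("IpProtocol")
        covers_ssh = proto == "-1" or (
            proto == "tcp" and perm["FromPort"] <= 22 <= perm["ToPort"]
        )
        if not covers_ssh:
            continue
        for rng in perm.get("IpRanges", []):
            source = ipaddress.ip_network(rng["CidrIp"])
            # A source is fine only if it sits entirely inside an allowed network.
            if not any(source.subnet_of(net) for net in allowed):
                offenders.append(rng["CidrIp"])
    return offenders
```

Replacing 22 with 3389 gives the equivalent check for RDP.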
- Check for the number of security groups being created
Too many or too few: how many security groups are appropriate for a typical multi-tiered web app is a frequently asked question.
Option 1: One security group cutting across multiple tiers is easy to configure, but it is not recommended for secure production applications.
Option 2: One security group for every instance is excessive and tough to manage operationally in the longer term.
Option 3: An individual security group for each tier of the application. For example, have separate security groups for the ELB, Web, App, DB, and Cache tiers of your application stack.
Periodically check whether Option 1 type security groups are being created in your production and alert the SOC/MS.
- Avoid allowing UDP or ICMP in security groups
Avoid allowing UDP (User Datagram Protocol) or ICMP (Internet Control Message Protocol) for private instances in Security groups. This is not a good practice unless specifically needed.
- Open only specific ports in a security group
Opening a range of ports in a security group is not a good practice. A security group can hold many inbound rules, so there is no need to group ports together. When opening ports, always open specific ones like 80, 443, etc. rather than a range like 200-300.
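Rules that open a range are easy to spot because their start and end ports differ. A minimal sketch, assuming the `IpPermissions` layout of the EC2 describe output and a hypothetical function name:

```python
def port_range_rules(security_group):
    """Return (protocol, from_port, to_port) for TCP/UDP rules spanning a range."""
    return [
        (perm["IpProtocol"], perm["FromPort"], perm["ToPort"])
        for perm in security_group.get("IpPermissions", [])
        # ICMP uses these fields for type/code, so only TCP/UDP are checked.
        if perm.get("IpProtocol") in ("tcp", "udp")
        and perm["FromPort"] != perm["ToPort"]
    ]
```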
- Private subnet instances should be accessible only from the VPC CIDR IP range
Opening private subnet instances to public IP ranges is possible, but it does not make any sense, because they are not reachable from the Internet in the first place.
E.g., opening HTTP to 0.0.0.0/0 in the SG of a private subnet instance does not make any sense. So, detect and clean up such rules.
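Such rules can be found by testing each source range against the VPC CIDR. The sketch below uses the standard `ipaddress` module, handles IPv4 ranges only, and assumes the caller knows the VPC CIDR and which groups belong to private subnet instances; the function name is hypothetical.

```python
import ipaddress


def sources_outside_vpc(security_group, vpc_cidr):
    """Return (from_port, cidr) for inbound sources not contained in the VPC CIDR."""
    vpc = ipaddress.ip_network(vpc_cidr)
    outside = []
    for perm in security_group.get("IpPermissions", []):
        for rng in perm.get("IpRanges", []):
            source = ipaddress.ip_network(rng["CidrIp"])
            if not source.subnet_of(vpc):
                outside.append((perm.get("FromPort"), rng["CidrIp"]))
    return outside
```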
- Use AWS Lambda events to detect abnormal activities
AWS CloudTrail logs capture security-related events. AWS Lambda functions or other automated programs should trigger alerts to operations when abnormal activities are detected.
a. Alert when X number of SG were added/deleted at “Y” Hours or Day by IAM user/Account
b. Alert when X number of SG Rules were added/deleted at “Y” Hours or Day by IAM user/Account
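Both alerts reduce to counting events per identity inside a time window. The sketch below assumes the CloudTrail records have been flattened into dicts with `actor` and `eventTime` keys and already filtered to the event type of interest; `max_changes` and `window_hours` stand in for X and Y.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def noisy_identities(events, max_changes, window_hours):
    """Return actors with more than max_changes events inside any sliding window."""
    window = timedelta(hours=window_hours)
    by_actor = defaultdict(list)
    for ev in events:
        when = datetime.strptime(ev["eventTime"], "%Y-%m-%dT%H:%M:%SZ")
        by_actor[ev["actor"]].append(when)

    flagged = []
    for actor, times in by_actor.items():
        times.sort()
        start = 0
        for end, current in enumerate(times):
            # Shrink the window from the left until it spans at most `window`.
            while current - times[start] > window:
                start += 1
            if end - start + 1 > max_changes:
                flagged.append(actor)
                break
    return flagged
```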
- Automate most of the security group related tasks
If you are an enterprise, make sure all security group related activities in production are part of your change management process. Security group actions can be manual or automated within that process.
If you are an agile startup or SMB without a complicated change management process, automate most of the security group related tasks and events as illustrated in the best practices above. This will bring immense efficiency to your operations.
- Use outbound/egress security groups wherever applicable within your VPC
For example, restrict FTP connections from your VPC to servers on the Internet. This way you can prevent data dumps and important files from being transferred out of your VPC. Defend harder and make it tougher!
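An egress check for the FTP case might look like the sketch below. It inspects `IpPermissionsEgress`, the outbound counterpart of `IpPermissions` in the EC2 describe output, and only considers the FTP control port 21 over IPv4; the function name is hypothetical.

```python
def allows_ftp_egress_to_internet(security_group):
    """Return True if any outbound rule lets TCP port 21 reach 0.0.0.0/0."""
    for perm in security_group.get("IpPermissionsEgress", []):
        proto = perm.get("IpProtocol")
        covers_ftp = proto == "-1" or (
            proto == "tcp" and perm["FromPort"] <= 21 <= perm["ToPort"]
        )
        open_to_world = any(
            rng["CidrIp"] == "0.0.0.0/0" for rng in perm.get("IpRanges", [])
        )
        if covers_ftp and open_to_world:
            return True
    return False
```

A newly created security group allows all outbound traffic by default, so this check flags every group whose egress rules were never tightened.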
- Use ELB in front of your instance as a security proxy
For some tiers of your application, use ELB in front of your instance as a security proxy with restrictive security groups – restrictive ports and IP ranges. This doubles your defense but increases the latency.
- Tools used to achieve these best practices
Some of the tools we use in conjunction to automate and meet the above best practices are ServiceNow, AWS CloudFormation templates (CFT), AWS APIs, Rundeck, Puppet, Chef, and automated programs written in Python, .NET, and Java.
About the Author
Harish Ganesan was the Chief Technology Officer (CTO) of SecureKloud, responsible for the overall technology direction of the SecureKloud products and services. He has around two decades of experience in architecting and developing Cloud Computing, E-commerce, and Mobile application systems. He has also built large internet banking solutions that catered to the needs of millions of users, where security and authentication were critical factors. He is also a prolific blogger and frequent speaker at popular cloud conferences.