Detecting the Use of Stolen AWS Lambda Credentials

Summary

Amazon Web Services (AWS) Lambda is a serverless event-driven compute service. It is a function as a service (FaaS) that allows users to deploy application functionality without the complexity of maintaining the underlying infrastructure. Lambda executions can be triggered by events from other AWS services or software-as-a-service (SaaS) applications.

Inside the Lambda execution environment is a set of AWS Security Token Service (AWS STS) temporary and limited-privilege credentials for AWS Identity and Access Management (IAM). An attacker may be able to steal these credentials via a user application vulnerability or resource misconfiguration. The attacker could then use these credentials to escalate privileges, maintain persistence, or move laterally through an organization’s AWS account or accounts.

Secureworks® Counter Threat Unit™ (CTU) researchers have developed a technique using AWS CloudTrail to detect the use of stolen credentials. Every time an AWS Lambda execution environment is initialized, it generates AWS CloudTrail logging events that can be used to establish a baseline for normal operation. The use of stolen credentials can then be detected when a logging event deviates from the baseline. A similar approach could be applied to detect AWS credentials stolen from other services. Although Amazon GuardDuty detects EC2 instance credentials used from another AWS account, this detection does not apply to Lambda or other services, nor does it detect credentials being used within the same account.

AWS Lambda execution and event logging

Understanding the detection technique requires some knowledge of the AWS Lambda architecture, operation, and management API calls.

AWS CloudTrail

AWS CloudTrail monitors and records account activity and API usage across all AWS accounts. By default, CloudTrail records a 90-day history of management events and makes this data freely available to AWS customers. Customers may optionally enable data event logging that includes AWS Lambda execution and Invoke events, but this feature incurs an additional cost. Because not all AWS customers enable this feature, the detections need to rely on the management events captured in all environments.

Lambda microVMs and Workers

According to AWS, “Lambda will create its execution environments on a fleet of Amazon EC2 instances called AWS Lambda Workers. Workers are […] launched and managed by Lambda in a separate isolated AWS account which is not visible to customers. Workers have one or more hardware-virtualized micro virtual machines [(microVMs)] created by Firecracker.” Figure 1 shows the isolation of two customers’ Lambda functions on shared infrastructure.

Figure 1. Isolation model for AWS Lambda Workers. (Source: AWS)

Lambda execution environment lifecycle

The standard lifecycle of the execution environment includes three primary phases (see Figure 2).

Figure 2. Lambda execution environment lifecycle. (Source: AWS)

Init — Lambda creates or unfreezes an execution environment with the configured resources, downloads the code for the function and all layers, initializes extensions, initializes the runtime, and then runs the function’s initialization code (i.e., the code outside the main handler). The Init phase happens either during the first invocation or in advance of function invocations if provisioned concurrency is enabled. The Init phase is split into three sub-phases: Extension init, Runtime init, and Function init. These sub-phases ensure that all extensions and the runtime complete their setup tasks before the function code runs.
When Lambda creates an execution environment during the Init phase, it is commonly referred to as a ‘cold start.’ When an already initialized environment is invoked again before shutdown is triggered, it is called a ‘warm start.’
Invoke — Lambda invokes the function handler. After the function runs to completion, Lambda prepares to handle another function invocation.
Shutdown — This phase is triggered if the Lambda function does not receive any invocations for a period of time. It is unclear what logic AWS uses to calculate this timeframe. In the Shutdown phase, Lambda shuts down the runtime, alerts the extensions to let them stop cleanly, and then removes the environment.

According to AWS, Workers have a maximum lease time of 14 hours. However, CTU researchers observed much shorter lifecycles. The AWS STS credentials used by the Worker have a default expiration of 12 hours, so it is probable that the credentials are still valid after a Worker is shut down.

AWS introduced Lambda SnapStart for Java 11 runtime in November 2022. Its lifecycle is slightly different from the standard Lambda lifecycle. However, the standard lifecycle is suitable for this analysis because CTU™ researchers observed no differences in the detection logic.

Lambda initialization and logging

Every Lambda cold start records two CloudTrail events: AssumeRole and CreateLogStream (see Figure 3).

Figure 3. API calls during Lambda initialization. (Source: Secureworks)

The AssumeRole event is recorded when the customer-defined Lambda IAM execution role requests AWS STS credentials from the invoke service. The timing of this event in the execution lifecycle is unclear, but CTU research shows it occurs prior to initialization of the Lambda Worker.
The Lambda Worker makes an API call and attempts to create a CloudWatch log stream. If the CloudWatch log group does not exist, then the first Lambda execution’s CreateLogStream API call fails with error code ResourceNotFoundException. The Lambda Worker then attempts to call CreateLogGroup, followed by a successful CreateLogStream call.

The detection outlined in this analysis relies on the following assumptions:

The Lambda execution IAM role is configured with permissions to log to CloudWatch. The AWS managed policy that is typically used, AWSLambdaBasicExecutionRole, includes these permissions.
If the CloudWatch log group exists but the Lambda execution role does not have logging permission, then CloudTrail will still log the failed events.
If the log group does not exist and the execution role does not have logging permission, then CreateLogStream generates no CloudTrail events.

Because the Lambda Worker is the source of the CreateLogStream event, the event includes the AWS region and source IP address where the customer’s Lambda function executes. The source of the AssumeRole event is outside the Worker and therefore does not contain this information. Invoke events are data events and are labeled in Figure 3 for completeness, but they are excluded from the following detection logic because they may not exist in all environments.
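The relationship between the two cold-start events can be sketched in Python. This is a minimal illustration, not the CTU implementation: it assumes the AssumeRole record exposes the issued key ID at responseElements.credentials.accessKeyId, and the function name pair_cold_start_events and all sample values are hypothetical.

```python
# Sketch: pair each Lambda cold start's AssumeRole event with the
# CreateLogStream events that later use the issued credentials.
# Assumption: AssumeRole records expose the issued key ID at
# responseElements.credentials.accessKeyId. All sample values are made up.

def pair_cold_start_events(events):
    """Map each access key ID to its AssumeRole and CreateLogStream events."""
    pairs = {}
    for event in events:
        name = event.get("eventName")
        if name == "AssumeRole":
            creds = (event.get("responseElements") or {}).get("credentials", {})
            key_id = creds.get("accessKeyId")
            if key_id:
                pairs.setdefault(key_id, {"assume_role": None, "log_streams": []})
                pairs[key_id]["assume_role"] = event
        elif name == "CreateLogStream":
            key_id = event.get("userIdentity", {}).get("accessKeyId")
            if key_id:
                pairs.setdefault(key_id, {"assume_role": None, "log_streams": []})
                pairs[key_id]["log_streams"].append(event)
    return pairs


sample_events = [
    {
        "eventName": "AssumeRole",
        "eventTime": "2023-03-01T10:00:00Z",
        "responseElements": {"credentials": {"accessKeyId": "ASIAEXAMPLE0000000001"}},
    },
    {
        "eventName": "CreateLogStream",
        "eventTime": "2023-03-01T10:00:03Z",
        "awsRegion": "us-west-2",
        "sourceIPAddress": "198.51.100.10",
        "userIdentity": {"accessKeyId": "ASIAEXAMPLE0000000001"},
    },
]

pairs = pair_cold_start_events(sample_events)
for key_id, pair in pairs.items():
    worker_ips = [e["sourceIPAddress"] for e in pair["log_streams"]]
    print(key_id, pair["assume_role"]["eventTime"], worker_ips)
```

Only the CreateLogStream side of each pair carries the region and source IP address, which is why the detection builds its baseline from that event.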

Figure 4 lists an example AssumeRole event created during the Lambda Init phase.

Figure 4. AssumeRole CloudTrail event. (Source: Secureworks)

Figure 5 lists an example CreateLogStream event.

Figure 5. CreateLogStream CloudTrail event. (Source: Secureworks)

Concurrency

Sometimes multiple CreateLogStream events possess the same access key ID. These events possibly represent the Lambda Worker starting multiple processes with the same STS credentials. Figure 6 shows the diff command output where two event keys match but the IP addresses are different.

Figure 6. Output of diff command comparing CreateLogStream events. (Source: Secureworks)

Figure 7 is an example of a Lambda initialization with concurrency.

Figure 7. API calls during Lambda initialization with concurrent execution. (Source: Secureworks)

AWS STS key reuse

CTU researchers discovered that AWS STS access keys can be reused over time and associated with unrelated events. The two events in Figure 8 possess the same STS access key ID (accessKeyId), but the credentials were assumed by different user identities that had different IAM roles. The principalId and userName fields are different components of the IAM role’s Amazon Resource Names (ARNs). The difference between the two creationDate values is only 11 days. It is possible that the AWS account ID is used as input to the access key generation process.
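One way to look for this kind of reuse is to group events by access key ID and report keys seen with more than one ARN. The Python sketch below is illustrative only: the function name find_reused_access_keys and the sample records are hypothetical. Note that an assumed-role ARN includes the session name, so the same role used with different session names would also be reported.

```python
from collections import defaultdict

# Sketch: report access key IDs that appear with more than one identity ARN.
# Sample records are made up; field names follow the CloudTrail record format.

def find_reused_access_keys(events):
    """Return {accessKeyId: set of ARNs} for keys seen with multiple ARNs."""
    arns_by_key = defaultdict(set)
    for event in events:
        identity = event.get("userIdentity", {})
        key_id = identity.get("accessKeyId")
        arn = identity.get("arn")
        if key_id and arn:
            arns_by_key[key_id].add(arn)
    return {k: arns for k, arns in arns_by_key.items() if len(arns) > 1}


sample_events = [
    {"userIdentity": {"accessKeyId": "ASIAEXAMPLE0000000001",
                      "arn": "arn:aws:sts::111122223333:assumed-role/role-a/fn-a"}},
    {"userIdentity": {"accessKeyId": "ASIAEXAMPLE0000000001",
                      "arn": "arn:aws:sts::111122223333:assumed-role/role-b/fn-b"}},
    {"userIdentity": {"accessKeyId": "ASIAEXAMPLE0000000002",
                      "arn": "arn:aws:sts::111122223333:assumed-role/role-a/fn-a"}},
]

reused = find_reused_access_keys(sample_events)
print(sorted(reused))
```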

Figure 8. Output of diff command comparing CloudTrail events for different roles with the same access key ID. (Source: Secureworks)

Amazon Virtual Private Cloud (VPC) access

A Lambda function runs inside a VPC owned by the Lambda service. Lambdas can use an elastic network interface (ENI) to connect from the AWS-managed Lambda VPC to private subnets in a customer-managed VPC. When a Lambda is configured to access resources in a customer VPC and security groups are configured to allow egress, the CreateLogStream event’s source IP address will not match the public source IP address of other events.

Figure 9 is an example of a complete Lambda configuration with VPC access. Each number corresponds to a different source IP address in the CloudTrail event logs. CreateNetworkInterface and AllocateAddress events are generated by the Lambda Worker when connecting to the customer VPC. Other source IP addresses may appear in CloudTrail when a Lambda function interacts with other AWS resources. The source IP address for a request to Amazon S3 via a VPC gateway endpoint is the private IP address assigned to the ENI. The request to DynamoDB routes via the internet, and the source is the public IP address assigned to the NAT gateway.

Figure 9. API calls during Lambda initialization with VPC access. (Source: Secureworks)

The VPC access scenario uses AllocateAddress events instead of CreateLogStream to find the public IP address, specifically the responseElements.publicIp field (see Figure 10).

Figure 10. AllocateAddress CloudTrail event. (Source: Secureworks)

Proof-of-concept detection

The proof-of-concept detection uses Amazon Athena to identify CloudTrail events associated with the use of stolen Lambda credentials. This detection is portable and can be used ad hoc in any AWS account, such as during an incident response investigation.

Detection logic

The first step in building the detector is to extract metadata from the CreateLogStream events. These events are generated by the Workers during the Init phase of the Lambda execution lifecycle. The CloudTrail logs are filtered on the following fields:

eventName — Filter for the ‘CreateLogStream’ value, which the Worker uses to set up CloudWatch logging
userAgent — Filter for strings that contain the ‘awslambda-worker’ value (e.g., awslambda-worker/1.0 rusoto/0.48.0 rust/1.67.1 linux)
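As a rough illustration of these two filters, the following Python sketch selects Worker-generated CreateLogStream events from CloudTrail records that have already been parsed from JSON. The function name is_worker_log_stream_event and the sample records (including the non-Worker user agent strings) are hypothetical.

```python
# Sketch: select CreateLogStream events emitted by the Lambda Worker.
# The sample records below are made up for illustration only.

def is_worker_log_stream_event(event):
    """Return True if the record matches both the eventName and userAgent filters."""
    return (
        event.get("eventName") == "CreateLogStream"
        and "awslambda-worker" in event.get("userAgent", "")
    )


sample_events = [
    {"eventName": "CreateLogStream",
     "userAgent": "awslambda-worker/1.0 rusoto/0.48.0 rust/1.67.1 linux"},
    {"eventName": "CreateLogStream", "userAgent": "example-client/1.0"},
    {"eventName": "AssumeRole", "userAgent": "example-service"},
]

worker_events = [e for e in sample_events if is_worker_log_stream_event(e)]
print(len(worker_events))
```

The Athena query in the Appendix uses a prefix match (LIKE 'awslambda-worker%'); the substring match in this sketch is slightly looser.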

A table is created using these events to form a baseline of standard operation. The table is populated with the following metadata fields:

eventTime — timestamp of the CreateLogStream event (This event occurs seconds after the related AssumeRole event and can be used to infer when the STS credentials were generated.)
userIdentity.accessKeyId — AWS STS access key ID used by the Worker and Lambda function
userIdentity.arn — ARN of the IAM role associated with the access credentials, used to deconflict when STS access key IDs are reused
sourceIPAddress — source IP address of the event (When Lambda concurrency is enabled, multiple IP addresses associated with Lambda executions use the same credentials.)
awsRegion — geographical location of the AWS region where the event originated (This location provides additional information to disambiguate events.)
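The baseline table can be approximated in Python as a dictionary keyed by access key ID, region, and ARN. This is a sketch under the assumption that the input contains only Worker CreateLogStream events. The function name build_baseline and every sample value are hypothetical.

```python
# Sketch: build a cold-start baseline from Worker CreateLogStream events.
# Each entry collects the source IPs (several when concurrency is in use)
# and the earliest eventTime seen for an (accessKeyId, awsRegion, arn) key.

def build_baseline(events):
    """Group events by (accessKeyId, awsRegion, arn)."""
    baseline = {}
    for event in events:
        identity = event["userIdentity"]
        key = (identity["accessKeyId"], event["awsRegion"], identity["arn"])
        entry = baseline.setdefault(
            key, {"source_ips": set(), "first_seen": event["eventTime"]}
        )
        entry["source_ips"].add(event["sourceIPAddress"])
        # ISO 8601 UTC timestamps in one format compare correctly as strings.
        entry["first_seen"] = min(entry["first_seen"], event["eventTime"])
    return baseline


ROLE_ARN = "arn:aws:sts::111122223333:assumed-role/example-role/example-function"
sample_events = [
    {"eventTime": "2023-03-01T10:00:05Z", "awsRegion": "us-west-2",
     "sourceIPAddress": "198.51.100.11",
     "userIdentity": {"accessKeyId": "ASIAEXAMPLE0000000001", "arn": ROLE_ARN}},
    {"eventTime": "2023-03-01T10:00:03Z", "awsRegion": "us-west-2",
     "sourceIPAddress": "198.51.100.10",
     "userIdentity": {"accessKeyId": "ASIAEXAMPLE0000000001", "arn": ROLE_ARN}},
]

baseline = build_baseline(sample_events)
for key, entry in baseline.items():
    print(key[0], sorted(entry["source_ips"]), entry["first_seen"])
```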

A second table is created that contains all customer-allocated public IP addresses. These are IP addresses that have been associated with a resource in an AWS account, such as a NAT gateway or EC2 instance. CloudTrail logs are filtered for ‘AllocateAddress’ events, and the following fields are extracted:

publicIp — IP address allocated to an AWS resource
allocationId — unique identifier for the original AllocateAddress event
networkBorderGroup — location where the IP address was allocated within an AWS region
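The extraction of these three fields can be sketched in Python as follows. The sketch accepts responseElements either as a parsed object or as a JSON string, and skips records with no response elements on the assumption that these are failed calls. The function name extract_allocated_addresses and the sample values are hypothetical.

```python
import json

# Sketch: pull the allocated public IP details out of AllocateAddress events.
# Sample records are made up for illustration only.

def extract_allocated_addresses(events):
    """Return one row per AllocateAddress event that has response elements."""
    rows = []
    for event in events:
        if event.get("eventName") != "AllocateAddress":
            continue
        response = event.get("responseElements")
        if isinstance(response, str):
            response = json.loads(response)
        if not response:
            continue  # assumed to be a failed call with nothing allocated
        rows.append({
            "publicip": response.get("publicIp"),
            "allocationid": response.get("allocationId"),
            "networkbordergroup": response.get("networkBorderGroup"),
        })
    return rows


sample_events = [
    {"eventName": "AllocateAddress",
     "responseElements": {"publicIp": "192.0.2.44",
                          "allocationId": "eipalloc-0123456789abcdef0",
                          "networkBorderGroup": "us-west-2"}},
    {"eventName": "AllocateAddress",
     "responseElements": '{"publicIp": "192.0.2.45", '
                         '"allocationId": "eipalloc-0123456789abcdef1", '
                         '"networkBorderGroup": "us-west-2"}'},
    {"eventName": "AllocateAddress", "responseElements": None},
    {"eventName": "CreateLogStream", "responseElements": None},
]

allocated = extract_allocated_addresses(sample_events)
print([row["publicip"] for row in allocated])
```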

To determine if credentials have been exfiltrated or used outside the standard Lambda execution, a query is run against all CloudTrail events. It looks for events where the access key ID exists in the baseline Lambda events table and where the following criteria are met:

The sourceIPAddress is different from the IP addresses used by the Lambda Worker in the Init phase.
The userIdentity.arn is the same as the ARN used by the Lambda.
The eventName is not Decrypt. AWS Key Management Service (KMS) generates Decrypt events when it is in use, and this detector ignores them because they are noisy.
The eventTime is more recent than the Lambda Init phase.
The sourceIPAddress is not a public IP address that has been allocated to a resource in the account. False positives could occur if an address was allocated so long ago that its AllocateAddress event falls outside the log retention period. This condition should not be an issue with this detector because the address gets allocated to the ENI at approximately the same time as the CreateLogStream event.
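The criteria above can be combined into a single check. The Python sketch below is an illustration rather than the CTU implementation: baseline rows mirror the metadata table described earlier, the 'AWS Internal' exclusion is taken from the Appendix query, and the function name find_suspicious_events, the key ID, the ARN, and the allocated address are hypothetical sample values.

```python
# Sketch of the detection criteria applied to parsed CloudTrail records.

def find_suspicious_events(events, baseline, allocated_ips):
    """Return events that use baseline Lambda credentials from an unexpected source."""
    findings = []
    for event in events:
        identity = event.get("userIdentity", {})
        source_ip = event.get("sourceIPAddress")
        for row in baseline:
            if (
                identity.get("accessKeyId") == row["accesskeyid"]
                and identity.get("arn") == row["arn"]  # guards against key ID reuse
                and source_ip not in row["sourceipaddresses"]
                and source_ip != "AWS Internal"  # exclusion from the Appendix query
                and event.get("eventName") != "Decrypt"
                and event.get("eventTime", "") > row["eventtime"]
                and source_ip not in allocated_ips
            ):
                findings.append(event)
                break
    return findings


ROLE_ARN = "arn:aws:sts::111122223333:assumed-role/example-role/example-function"
IDENTITY = {"accessKeyId": "ASIAEXAMPLE0000000001", "arn": ROLE_ARN}
baseline = [{"accesskeyid": "ASIAEXAMPLE0000000001", "arn": ROLE_ARN,
             "sourceipaddresses": {"34.220.84.211"},
             "eventtime": "2023-03-01T10:00:03Z"}]
allocated_ips = {"192.0.2.44"}
events = [
    {"eventName": "GetCallerIdentity", "eventTime": "2023-03-01T11:00:00Z",
     "sourceIPAddress": "203.0.113.9", "userIdentity": IDENTITY},
    {"eventName": "Decrypt", "eventTime": "2023-03-01T11:00:01Z",
     "sourceIPAddress": "203.0.113.9", "userIdentity": IDENTITY},
    {"eventName": "PutItem", "eventTime": "2023-03-01T11:00:02Z",
     "sourceIPAddress": "34.220.84.211", "userIdentity": IDENTITY},
    {"eventName": "GetObject", "eventTime": "2023-03-01T11:00:03Z",
     "sourceIPAddress": "192.0.2.44", "userIdentity": IDENTITY},
]

findings = find_suspicious_events(events, baseline, allocated_ips)
print([f["eventName"] for f in findings])
```

In this sample, only the first event meets every criterion: the Decrypt event is ignored, the third event comes from the Worker's own address, and the fourth comes from a customer-allocated address.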

Applying the detection using Amazon Athena

Before AWS CloudTrail logs can be queried with Amazon Athena, the following prerequisites are required:

Configure a trail to write logs to an S3 bucket if one does not already exist. AWS recommends this process as a security best practice to store and retain events longer than 90 days. For the detection to be accurate, the trail should be configured to collect logs from all enabled AWS regions.
Create an Athena table for CloudTrail logs. The default table name is ‘cloudtrail_logs’. If a different table name is used, the FROM clauses in the Athena queries need to be updated with the revised name.

When the prerequisites are completed, Amazon Athena can search CloudTrail events for use of stolen credentials. The Appendix of this analysis includes these Athena queries in plain text for researchers who want to replicate the detection.

Create a separate table containing the CreateLogStream events (see Figure 11). The query will retrieve the earliest eventTime value and combine the source IP addresses from concurrent execution environments into an array.

Figure 11. Athena query to create table for CreateLogStream events. (Source: Secureworks)

Create another table containing all of the public IP addresses associated with customer resources (e.g., a NAT gateway) (see Figure 12).

Figure 12. Athena query to create table for customer-allocated IP addresses. (Source: Secureworks)

Query for events using stolen Lambda credentials (see Figure 13).

Figure 13. Athena query to select events using stolen credentials. (Source: Secureworks)

Figure 14 shows an example GetCallerIdentity event found with this detection.

Figure 14. Event detected using stolen credentials. (Source: Secureworks)

The following values should immediately flag this event as suspicious:

awsRegion “us-east-1” does not match “us-west-2”
eventName “GetCallerIdentity” is unlikely to be run by a legitimate Lambda function
sourceIPAddress “203.0.113.9” does not match “34.220.84.211”
sourceIPAddress “203.0.113.9” is not within the AWS IP address space

Taking the detector further

AWS publishes its public IP address ranges using Classless Inter-Domain Routing (CIDR) notation in JSON format. This data could increase detection logic efficiency by first checking if an IP address exists outside AWS, and then using more computationally expensive logic to check other conditions.
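A cheap pre-filter along these lines could look like the following Python sketch, which uses the standard ipaddress module. It assumes the published JSON lists IPv4 CIDR blocks under prefixes[].ip_prefix and IPv6 blocks under ipv6_prefixes[].ipv6_prefix. The embedded sample only imitates that shape and is not a copy of the published file, and the function names are hypothetical.

```python
import ipaddress

# Sketch: decide cheaply whether a source address lies outside AWS ranges.
# SAMPLE_RANGES only imitates the assumed layout of the published JSON file.

SAMPLE_RANGES = {
    "prefixes": [
        {"ip_prefix": "34.208.0.0/12", "region": "us-west-2", "service": "EC2"},
    ],
    "ipv6_prefixes": [],
}


def load_networks(ip_ranges):
    """Parse the CIDR strings into network objects once, up front."""
    cidrs = [p["ip_prefix"] for p in ip_ranges.get("prefixes", [])]
    cidrs += [p["ipv6_prefix"] for p in ip_ranges.get("ipv6_prefixes", [])]
    return [ipaddress.ip_network(cidr) for cidr in cidrs]


def is_outside_aws(source, networks):
    """Return True only when source is an IP literal in none of the networks."""
    try:
        address = ipaddress.ip_address(source)
    except ValueError:
        # Not an IP literal (for example "AWS Internal"); leave to other checks.
        return False
    return not any(address in network for network in networks)


networks = load_networks(SAMPLE_RANGES)
for source in ("203.0.113.9", "34.220.84.211", "AWS Internal"):
    print(source, is_outside_aws(source, networks))
```

Events whose source address falls outside every AWS range could be flagged immediately, leaving the more expensive table lookups for addresses inside the AWS address space.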

Similar detection logic could be applied to other services within AWS. Primary candidates would be AWS Elastic Container Service (ECS) or Elastic Kubernetes Service (EKS). Both services can leverage the AWS Fargate serverless compute engine, which is built on the same Firecracker microVM as the Lambda Worker.

Caveats

Lambda pricing is calculated per one million requests, so execution volumes can be very high. Cold starts happen less often in environments with a high frequency of executions. However, real-time detection could still generate a large quantity of detector database inserts and result in a large table size.

The detection logic described in this analysis relies on an undocumented feature. At any time, AWS could change how Lambda executes.

Conclusion

AWS CloudTrail is a rich source of management events. Network defenders can use specific events in the AWS Lambda operating environment to provide a baseline or context for other events. The proof-of-concept detector using Athena can effectively search events for malicious behavior.

Appendix — Athena queries

Text versions of the Athena queries used in the detection proof of concept are provided for convenience to other researchers who want to explore this functionality.

Create table for CreateLogStream events

CREATE TABLE "lambda_coldstart" AS
SELECT 
    useridentity.accessKeyId as accesskeyid, 
    -- Source IP can be different for the same access key id due to Lambda concurrency
    array_agg(sourceipaddress) as sourceipaddresses, 
    awsregion,
    useridentity.arn as arn,
    array_agg(useragent) as useragents,
    MIN(eventtime) as eventtime
FROM cloudtrail_logs WHERE
    eventname = 'CreateLogStream' AND 
    useragent LIKE 'awslambda-worker%'
GROUP BY 1, 3, 4;

Create table for customer-allocated IP addresses

CREATE TABLE allocated_addresses AS
SELECT 
    json_extract_scalar(responseelements, '$.publicIp') as publicip,
    json_extract_scalar(responseelements, '$.allocationId') as allocationid,
    -- this is region name
    json_extract_scalar(responseelements, '$.networkBorderGroup') as networkbordergroup
FROM 
    "cloudtrail_logs" 
WHERE 
    eventname = 'AllocateAddress';

Select events using stolen credentials

SELECT
    lcs.accesskeyid,
    lcs.sourceipaddresses,
    lcs.awsregion,
    ct.useridentity.accessKeyId,
    ct.sourceipaddress, 
    ct.awsregion,
    ct.eventname,
    ct.eventid
FROM cloudtrail_logs ct, lambda_coldstart lcs WHERE
    lcs.accesskeyid = ct.useridentity.accessKeyId AND
    not contains(lcs.sourceipaddresses, ct.sourceipaddress) AND
    -- Exclude AWS managed services
    ct.sourceipaddress != 'AWS Internal' AND
    -- Access key IDs can be reused, so make sure it is the same ARN (a reused key has a different ARN)
    ct.useridentity.arn = lcs.arn AND
    -- Decrypt is noisy for the purposes of this detector
    ct.eventname != 'Decrypt' AND
    ct.eventtime > lcs.eventtime AND
    -- Lookup IP addresses that have been allocated to account and exclude if they match
    NOT EXISTS (SELECT 1 FROM allocated_addresses aa WHERE aa.publicip = ct.sourceipaddress);

This article was originally published by Secureworks.com.