Securing S3 Bucket From Direct Access

After implementing an AWS CloudFront distribution for serving content from AWS S3 it is best practice to prevent direct access to the S3 bucket. This will prevent duplicate content issues on search engines and will also mean your content can only be accessed by the domains you expect.

Pre-requisites

This article will be based on the blog described in Hexo, AWS and Serverless Framework and Securing A Test Environment Using AWS WAF; however, it should be easy to apply this logic to any project hosting on AWS and managed with Serverless Framework.

Create an Origin Access Identity

The first part of removing direct S3 access is to create an identity for the AWS CloudFront distribution to use to identify itself to AWS S3. To do this a CloudFrontOriginAccessIdentity resource needs to be created.

Create a resource under the Resources element in the resources.yml file (at line 144)
1
2
3
4
5
6
7
8
9
10
CloudFrontIdentity:
Type: AWS::CloudFront::CloudFrontOriginAccessIdentity
Properties:
CloudFrontOriginAccessIdentityConfig:
Comment:
Fn::Join:
- " "
- - ${self:custom.domain.domainname}
- CloudFront
- Identity

Apply the Identity to the CloudFront Distribution

To apply the previous identity to the AWS CloudFront distribution the distribution configuration in the resources.yml file needs to be updated. The setting is specific to an origin within the configuration and can only be applied to an S3 Origin.

Modify the current origin configuration (at lines 62-67)
1
2
3
4
5
6
7
8
9
10
11
12
- DomainName:
Fn::GetAtt:
- WebsiteS3Bucket
- DomainName
Id: defaultOrigin
S3OriginConfig:
OriginAccessIdentity:
Fn::Join:
- "/"
- - origin-access-identity
- cloudfront
- Ref: CloudFrontIdentity

Add Functionality to Display index.html Files By Default

As documented on AWS CloudFront Developer Guide - Specifying a Default Root Object:

However, if you define a default root object, an end-user request for a subdirectory of your distribution does not return the default root object.

[...]

The behavior of CloudFront default root objects is different from the behavior of Amazon S3 index documents. When you configure an Amazon S3 bucket as a website and specify the index document, Amazon S3 returns the index document even if a user requests a subdirectory in the bucket.

This means logic has to be added to display the index.html files Hexo generates in each directory. The first step to implement this is to create a AWS Lambda function; then add the function to Serverless Framework's configuration; and finally hook the function to the AWS CloudFront distribution.

Start by installing serverless-plugin-cloudfront-lambda-edge.

Install the serverless plugin
1
npm i @silvermine/serverless-plugin-cloudfront-lambda-edge --save-dev

Add the plugin to the serverless.yl file.

Add the plugin at line 8 of serverless.yml
1
- '@silvermine/serverless-plugin-cloudfront-lambda-edge'

Then create a new file at functions/urlRewrite.js with the AWS Lambda function (in this case it's Javascript).

Create functions/urlRewrite.js
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
'use strict';
exports.handler = (event, context, callback) => {

// Extract the request from the CloudFront event that is sent to Lambda@Edge
var request = event.Records[0].cf.request;

// Extract the URI from the request
var olduri = request.uri;

// Match any '/' that occurs at the end of a URI. Replace it with a default index
var newuri = olduri.replace(/\/$/, '\/index.html');

// Log the URI as received by CloudFront and the new URI to be used to fetch from origin
console.log("Old URI: " + olduri);
console.log("New URI: " + newuri);

// Replace the received URI with the URI that includes the index page
request.uri = newuri;

// Return to CloudFront
return callback(null, request);

};

Serverless Framework now needs to know about the new function so it can coordinate the deployment for the code, configuration of AWS Lambda, and linking the function to the distribution. This is done by adding a new top level element to the serverless.yml file.

Insert the function definition at line 95 of serverless.yml
1
2
3
4
5
6
7
8
9
10
11
# Define the Lambda functions for the site
functions:
# This function will be deployed to Lambda@Edge and rewrite URLs to include index.html
urlrewrite:
name: ${self:service}-${self:custom.stage}-cf-url-rewriter
handler: functions/urlRewrite.handler
memorySize: 128
timeout: 1
lambdaAtEdge:
distribution: WebsiteCloudFrontDistribution
eventType: origin-request

Update the S3 Bucket Policy

Now an identifier for AWS CloudFront has been configured the policy on the AWS S3 bucket can be restricted to only allow access via the CloudFront distribution.

In the resources.yml file there is a Statement for the S3 Bucket Policy. This needs to be adjusted to remove public access and grant access to CloudFront.

Modify the S3 Bucket Policy Statement (at lines 22-32 of resources.yml)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
- Sid: CloudFrontForGetBucketObjects
Effect: Allow
Principal:
CanonicalUser:
Fn::GetAtt:
- CloudFrontIdentity
- S3CanonicalUserId
Action: 's3:GetObject'
Resource:
Fn::Join:
- ''
-
- 'arn:aws:s3:::'
- Ref: WebsiteS3Bucket
- /*

Configure File Uploads to be Private

Within the main serverless.yml file, under the custom.assets.targets element, configuration needs to be added to ensure all files are uploaded as private.

Add this line before the files element in each target definition (lines 42 and 53)
1
acl: private

Update the S3 Bucket

All the requirements are in place for AWS CloudFront to be able to access the AWS S3 bucket contents. All that's left to do it remove all public access from the bucket.

To do this locate the bucket configuration in resources.yml, update the AccessControl to BucketOwnerFullControl and delete the WebsiteConfiguration.

Modify the S3 Bucket definition (at lines 5-10)
1
2
3
4
5
6
7
8
Properties:
AccessControl: BucketOwnerFullControl
BucketName: ${self:custom.domain.domainname}
PublicAccessBlockConfiguration:
BlockPublicAcls: true
BlockPublicPolicy: true
IgnorePublicAcls: true
RestrictPublicBuckets: true

Testing

Prior to these changes the website was accessible from:

  • https://s3-us-west-2.amazonaws.com/<domain>/index.html
  • http://<domain>.s3-website-us-west-2.amazonaws.com/
  • https://<domain>.s3-us-west-2.amazonaws.com/index.html
  • https://>domain>/index.html

Of these only the last one should continue to work.

The Final Configuration Files

[serverless.yml] []view raw
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
# The name of your project
service: <project>

# Plugins for additional Serverless functionality
plugins:
- serverless-s3-deploy
- serverless-plugin-scripts
- '@silvermine/serverless-plugin-cloudfront-lambda-edge'

# Configuration for AWS
provider:
name: aws
runtime: nodejs8.10
profile: serverless
# Some future functionality requires us to use us-east-1 at this time
region: us-east-1

# This enables us to use the default stage definition, but override it from the command line
stage: ${opt:stage, self:provider.stage}
# This enables us to prepend the stage name for non-production environments
domain:
fulldomain:
prod: ${self:custom.domain.domain}
other: ${self:custom.stage}.${self:custom.domain.domain}
# This value has been customised so I can maintain multiple demonstration sites
domain: ${self:custom.postname}.${self:custom.domain.zonename}
domainname: ${self:custom.domain.fulldomain.${self:custom.stage}, self:custom.domain.fulldomain.other}
# DNS Zone name (this is only required so I can maintain multiple demonstration sites)
zonename: alphageek.com.au
cacheControlMaxAgeHTMLByStage:
# HTML Cache time for production environment
prod: 3600
# HTML Cache time for other environments
other: 0
cacheControlMaxAgeHTML: ${self:custom.domain.cacheControlMaxAgeHTMLByStage.${self:custom.stage}, self:custom.domain.cacheControlMaxAgeHTMLByStage.other}
sslCertificateARN: arn:aws:acm:us-east-1:165657443288:certificate/61d202ea-12f2-4282-b602-9c3b83183c7a
assets:
targets:
# Configuration for HTML files (overriding the default cache control age)
- bucket:
Ref: WebsiteS3Bucket
acl: private
files:
- source: ./public/
headers:
CacheControl: max-age=${self:custom.domain.cacheControlMaxAgeHTML}
empty: true
globs:
- '**/*.html'
# Configuration for all assets
- bucket:
Ref: WebsiteS3Bucket
acl: private
files:
- source: ./public/
empty: true
globs:
- '**/*.js'
- '**/*.css'
- '**/*.jpg'
- '**/*.png'
- '**/*.gif'
scripts:
hooks:
# Run these commands when creating the deployment artifacts
package:createDeploymentArtifacts: >
hexo clean &&
hexo generate
# Run these commands after infrastructure changes have been completed
deploy:finalize: >
sls s3deploy -s ${self:custom.stage}
# AWS Region to S3 website hostname mapping
s3DNSName:
us-east-2: s3-website.us-east-2.amazonaws.com
us-east-1: s3-website-us-east-1.amazonaws.com
us-west-1: s3-website-us-west-1.amazonaws.com
us-west-2: s3-website-us-west-2.amazonaws.com
ap-south-1: s3-website.ap-south-1.amazonaws.com
ap-northeast-3: s3-website.ap-northeast-3.amazonaws.com
ap-northeast-2: s3-website.ap-northeast-2.amazonaws.com
ap-southeast-1: s3-website-ap-southeast-1.amazonaws.com
ap-southeast-2: s3-website-ap-southeast-2.amazonaws.com
ap-northeast-1: s3-website-ap-northeast-1.amazonaws.com
ca-central-1: s3-website.ca-central-1.amazonaws.com
eu-central-1: s3-website.eu-central-1.amazonaws.com
eu-west-1: s3-website-eu-west-1.amazonaws.com
eu-west-2: s3-website.eu-west-2.amazonaws.com
eu-west-3: s3-website.eu-west-3.amazonaws.com
eu-north-1: s3-website.eu-north-1.amazonaws.com
sa-east-1: s3-website-sa-east-1.amazonaws.com
# Determine what resources file to include based on the current stage
customConfigFile: ${self:custom.customConfigFiles.${self:custom.stage}, self:custom.customConfigFiles.other}
customConfigFiles:
prod: prod
other: other

# Define the Lambda functions for the site
functions:
# This function will be deployed to Lambda@Edge and rewrite URLs to include index.html
urlrewrite:
name: ${self:service}-${self:custom.stage}-cf-url-rewriter
handler: functions/urlRewrite.handler
memorySize: 128
timeout: 1
lambdaAtEdge:
distribution: WebsiteCloudFrontDistribution
eventType: origin-request

# Define the resources we will need to host the site
resources:
# Include the resources file
- ${file(config/resources.yml)}
# Include the outputs file
- ${file(config/outputs.yml)}
# Include a custom configuration file based on the environment
- ${file(config/resources/environment/${self:custom.customConfigFile}.yml)}

[config/resources.yml] []view raw
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
Resources:
# Set-up an S3 bucket to store the site
WebsiteS3Bucket:
Type: AWS::S3::Bucket
Properties:
AccessControl: BucketOwnerFullControl
BucketName: ${self:custom.domain.domainname}
PublicAccessBlockConfiguration:
BlockPublicAcls: true
BlockPublicPolicy: true
IgnorePublicAcls: true
RestrictPublicBuckets: true
# Set-up a policy on the bucket so it can be used as a website
WebsiteBucketPolicy:
Type: AWS::S3::BucketPolicy
Properties:
PolicyDocument:
Id:
Fn::Join:
- ""
- - ${self:service.name}
- BucketPolicy
Statement:
- Sid: CloudFrontForGetBucketObjects
Effect: Allow
Principal:
CanonicalUser:
Fn::GetAtt:
- CloudFrontIdentity
- S3CanonicalUserId
Action: 's3:GetObject'
Resource:
Fn::Join:
- ''
-
- 'arn:aws:s3:::'
- Ref: WebsiteS3Bucket
- /*
Bucket:
Ref: WebsiteS3Bucket
# Configure CloudFront to get all content from S3
WebsiteCloudFrontDistribution:
Type: 'AWS::CloudFront::Distribution'
Properties:
DistributionConfig:
WebACLId:
Ref: CustomAuthorizationHeaderRestriction
Aliases:
- ${self:custom.domain.domainname}
- www.${self:custom.domain.domainname}
CustomErrorResponses:
- ErrorCode: '404'
ResponsePagePath: "/error.html"
ResponseCode: '200'
ErrorCachingMinTTL: '30'
DefaultCacheBehavior:
Compress: true
ForwardedValues:
QueryString: false
Cookies:
Forward: all
SmoothStreaming: false
TargetOriginId: defaultOrigin
ViewerProtocolPolicy: redirect-to-https
DefaultRootObject: index.html
Enabled: true
Origins:
- DomainName:
Fn::GetAtt:
- WebsiteS3Bucket
- DomainName
Id: defaultOrigin
S3OriginConfig:
OriginAccessIdentity:
Fn::Join:
- "/"
- - origin-access-identity
- cloudfront
- Ref: CloudFrontIdentity
PriceClass: PriceClass_All
ViewerCertificate:
AcmCertificateArn: ${self:custom.domain.sslCertificateARN}
SslSupportMethod: sni-only
# DNS Record for the domain
WebsiteDNSRecord:
Type: "AWS::Route53::RecordSet"
Properties:
AliasTarget:
DNSName:
Fn::GetAtt:
- WebsiteCloudFrontDistribution
- DomainName
HostedZoneId: Z2FDTNDATAQYW2
HostedZoneName: ${self:custom.domain.domain}.
Name: ${self:custom.domain.domainname}
Type: 'A'
# DNS Record for www.domain
WebsiteWWWDNSRecord:
Type: "AWS::Route53::RecordSet"
Properties:
AliasTarget:
DNSName:
Fn::GetAtt:
- WebsiteCloudFrontDistribution
- DomainName
HostedZoneId: Z2FDTNDATAQYW2
HostedZoneName: ${self:custom.domain.domain}.
Name: www.${self:custom.domain.domainname}
Type: 'A'
# Predicate to match the authorization header
CustomAuthorizationHeader:
Type: AWS::WAF::ByteMatchSet
Properties:
ByteMatchTuples:
-
FieldToMatch:
Type: HEADER
Data: Authorization
TargetString:
Fn::Join:
- " "
- - Custom
- "<Password>"
TextTransformation: NONE
PositionalConstraint: EXACTLY
Name:
Fn::Join:
- "_"
- - ${self:custom.domain.domainname}
- Authorization
- Header
CustomAuthorizationHeaderRule:
Type: AWS::WAF::Rule
Properties:
Name:
Fn::Join:
- "_"
- - ${self:custom.domain.domainname}
- Authorization
- Header
- Rule
MetricName:
Fn::Join:
- ""
- - ${self:custom.stage}
- ${self:service.name}
- Authorization
- Header
- Rule
Predicates:
-
DataId:
Ref: CustomAuthorizationHeader
Negated: false
Type: ByteMatch

[config/outputs.yml] []view raw
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
Outputs:
WebsiteURL:
Value:
Fn::GetAtt:
- WebsiteS3Bucket
- WebsiteURL
Description: URL for my website hosted on S3
S3BucketSecureURL:
Value:
Fn::Join:
- ''
-
- 'https://'
- Fn::GetAtt:
- WebsiteS3Bucket
- DomainName
Description: Secure URL of S3 bucket to hold website content

[config/prod.yml] []view raw
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
Resources:
# Allow the custom authorisation header in the production environment
CustomAuthorizationHeaderRestriction:
Type: AWS::WAF::WebACL
Properties:
DefaultAction:
Type: ALLOW
Name:
Fn::Join:
- "_"
- - ${self:custom.domain.domainname}
- Authorization
- Header
- Restriction
MetricName:
Fn::Join:
- ""
- - ${self:custom.stage}
- ${self:service.name}
- Authorization
- Header
- Restriction
Rules:
-
Action:
Type: ALLOW
Priority: 1
RuleId:
Ref: CustomAuthorizationHeaderRule

[config/other.yml] []view raw
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
Resources:
# Require the custom authorisation header with the correct password in non-production environment
CustomAuthorizationHeaderRestriction:
Type: AWS::WAF::WebACL
Properties:
DefaultAction:
Type: BLOCK
Name:
Fn::Join:
- "_"
- - ${self:custom.domain.domainname}
- Authorization
- Header
- Restriction
MetricName:
Fn::Join:
- ""
- - ${self:custom.stage}
- ${self:service.name}
- Authorization
- Header
- Restriction
Rules:
-
Action: ALLOW
Priority: 1
RuleId:
Ref: CustomAuthorizationHeaderRule

Example Site