Sources
AWS S3
Amazon S3 (Simple Storage Service) is a highly scalable and secure object storage solution that serves as a versatile component in data workflows. In a streaming ETL (Extract, Transform, Load) workflow, S3 often acts as a source, destination, or intermediate storage for raw and transformed data. Its flexibility and cost-effectiveness make it an ideal choice for handling large-scale data streams and archiving processed data.
Connecting S3 (IAM Role Setup - General Setup)
To integrate S3 into your ProxyHook, you first need to create an IAM role for ProxyHook within your AWS account. If you've already done this, you can skip to the next section.
Connecting S3 (IAM Role Setup - Bucket Access)
To read files from your S3 bucket, your ProxyHook IAM role needs permission to read from the desired bucket. To grant it, add the following policy to the role, replacing YOUR_S3_BUCKET_HERE with the name of your bucket:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::YOUR_S3_BUCKET_HERE/*"
    }
  ]
}
Click Next, give the policy a name, and save it. If done correctly, the IAM role will now be able to read files from your S3 bucket.
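If you manage several buckets, it can help to generate the policy document rather than edit the JSON by hand. The following is a minimal sketch in Python that reproduces the policy shown above for a given bucket name. The function name `build_read_policy` and the bucket name `example-ingest-bucket` are illustrative assumptions, not part of ProxyHook or AWS.

```python
import json


def build_read_policy(bucket_name: str) -> dict:
    """Return an IAM policy document allowing s3:GetObject on one bucket.

    build_read_policy is a hypothetical helper written for this example.
    """
    if not bucket_name or "/" in bucket_name:
        raise ValueError("bucket_name must be a bare bucket name, e.g. 'my-bucket'")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:GetObject",
                # The trailing /* scopes the grant to objects inside the bucket.
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


if __name__ == "__main__":
    # "example-ingest-bucket" is a placeholder; substitute your own bucket name.
    print(json.dumps(build_read_policy("example-ingest-bucket"), indent=2))
```

The printed JSON can be pasted into the policy editor in place of the hand-edited version.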
Enabling Event Notifications
The final step to start receiving the data from new S3 files is to add an Event Notification for your bucket.
At this point, any new files dropped into your S3 bucket (subject to any optional criteria you set during notification setup) will feed into your ProxyHook event.
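For readers who prefer to script the notification instead of using the AWS console, the sketch below builds an S3 notification configuration for newly created objects and, optionally, applies it with boto3. It rests on assumptions this page does not confirm: that the notification targets an SNS topic, and that the topic ARN shown is whatever destination your ProxyHook setup provides. The helper names, the ARN, and the prefix and suffix values are all hypothetical.

```python
from typing import Optional


def build_notification_config(
    topic_arn: str,
    prefix: Optional[str] = None,
    suffix: Optional[str] = None,
) -> dict:
    """Build an S3 notification configuration for object-created events.

    Assumption: the destination is an SNS topic. Use the destination type
    that your ProxyHook setup actually specifies.
    """
    rules = []
    if prefix:
        rules.append({"Name": "prefix", "Value": prefix})
    if suffix:
        rules.append({"Name": "suffix", "Value": suffix})

    topic = {"TopicArn": topic_arn, "Events": ["s3:ObjectCreated:*"]}
    if rules:
        # Optional criteria: only keys matching the prefix/suffix trigger events.
        topic["Filter"] = {"Key": {"FilterRules": rules}}
    return {"TopicConfigurations": [topic]}


def apply_notification_config(bucket_name: str, config: dict) -> None:
    """Apply the configuration to a bucket (requires boto3 and AWS credentials).

    Note: this call replaces the bucket's existing notification configuration.
    """
    import boto3  # imported lazily so the builder works without boto3 installed

    s3 = boto3.client("s3")
    s3.put_bucket_notification_configuration(
        Bucket=bucket_name,
        NotificationConfiguration=config,
    )


if __name__ == "__main__":
    # Placeholder ARN and filters, for illustration only.
    config = build_notification_config(
        "arn:aws:sns:us-east-1:123456789012:example-topic",
        prefix="incoming/",
        suffix=".csv",
    )
    print(config)
```

Separating the builder from the boto3 call lets you inspect or review the configuration before it overwrites whatever notifications the bucket already has.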