This blog provides a step-by-step guide to setting up a Snowflake environment for loading data from AWS S3.
Prerequisites:
Snowflake Account -> Ensure you have access to a Snowflake account.
AWS S3 Bucket -> Have your data stored in an S3 bucket with appropriate permissions.
AWS IAM Role -> Create an IAM role that Snowflake can use to access your S3 bucket.
Step 1:
Create the S3 bucket and set up its permissions
Configure the AWS IAM role:
Update the S3 bucket policy to allow access from Snowflake via the IAM role, keeping your Snowflake account’s AWS region in mind; a minimal policy sketch follows.
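As a rough guide, an IAM permissions policy for read-only loading might look like the sketch below. The bucket name pilot24 matches the one used in the rest of this post; the exact action list is an assumption, and unloading data back to S3 would additionally require s3:PutObject and s3:DeleteObject.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:GetObjectVersion"],
      "Resource": "arn:aws:s3:::pilot24/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::pilot24"
    }
  ]
}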
Step 2:
Set up the Snowflake environment for loading data from the AWS S3 bucket
A warehouse in Snowflake supplies the compute resources, including CPU, memory, and temporary storage, needed to execute DML operations. Create a database and a schema as well for this walkthrough.
create warehouse pilot24;
create database demo;
create schema garden;
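The remaining commands assume this warehouse, database, and schema are the active session context; if they are not, set the context explicitly:
use warehouse pilot24;
use database demo;
use schema garden;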
Step 3:
Execute the command below to create a storage integration object; this object lets Snowflake connect to the AWS bucket. Note that creating an integration typically requires the ACCOUNTADMIN role or a role with the CREATE INTEGRATION privilege.
create or replace storage integration aws_s3
type = external_stage
storage_provider = s3
enabled = true
storage_aws_role_arn = 'arn:aws:iam::940482408934:role/snowflakerole'
storage_allowed_locations = ('s3://pilot24/');
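If the stage will be created later by a role other than the integration’s owner, that role needs usage on the integration first; this grant is a sketch assuming a role named sysadmin:
grant usage on integration aws_s3 to role sysadmin;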
Step 4:
The describe command below returns the Amazon Resource Name (ARN) and external ID that Snowflake generated for the integration; Amazon Resource Names (ARNs) uniquely identify AWS resources. Take the STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID values from the output and add Snowflake as a trusted entity by attaching a trust policy to your AWS role, as sketched after the command.
desc integration aws_s3;
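A trust policy of roughly the following shape is what the role needs; the two placeholders are to be replaced with the actual STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID values from the desc output:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "<STORAGE_AWS_IAM_USER_ARN>" },
      "Action": "sts:AssumeRole",
      "Condition": { "StringEquals": { "sts:ExternalId": "<STORAGE_AWS_EXTERNAL_ID>" } }
    }
  ]
}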
Step 5:
Create a named file format that describes a set of staged data to access or load into Snowflake tables.
create or replace file format csv_format type = csv field_delimiter = ',' skip_header = 1 null_if = ('NULL','null') empty_field_as_null = true;
Step 6:
Create a stage to use for loading data from files into Snowflake tables and unloading data from tables into files:
create or replace stage aws_stage_pilot24 storage_integration = aws_s3 url = 's3://pilot24/' file_format = csv_format;
Step 7:
The list command shows the files that have been staged in Amazon S3:
list @aws_stage_pilot24;
Step 8:
Create a table named plant_details for the load test.
CREATE TABLE plant_details (
    plant_name VARCHAR(100) NOT NULL,  -- Assuming plant_name is a text field
    UOM CHAR(1) NOT NULL,              -- UOM as a single character
    Low_End_of_Range INT NOT NULL,     -- Integer for the low end of the range
    High_End_of_Range INT NOT NULL     -- Integer for the high end of the range
);
Step 9:
Copy the data into the Snowflake table; on_error = 'skip_file' skips any file that fails to load rather than aborting the whole operation.
copy into plant_details from @aws_stage_pilot24 on_error = 'skip_file';
Step 10:
Validate Your Data
select * from plant_details;
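To check the load status and any per-file errors, you can also query the load history. This sketch uses the INFORMATION_SCHEMA.COPY_HISTORY table function with a one-hour lookback window, which you can adjust:
select file_name, status, row_count, first_error_message
from table(information_schema.copy_history(
    table_name => 'PLANT_DETAILS',
    start_time => dateadd(hours, -1, current_timestamp())));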
Conclusion:
With these steps complete, you have a working Snowflake environment for importing data from AWS S3. Snowflake’s storage integration makes the process of handling S3 data smooth and efficient.