SQL Server administration and T-SQL development, Web Programming with ASP.NET, HTML5 and Javascript, Windows Phone 8 app development, SAP Smartforms and ABAP Programming, Windows 7, Visual Studio and MS Office software
Development resources, articles, tutorials, code samples, tools and downloads for AWS Amazon Web Services, Redshift, AWS Lambda Functions, S3 Buckets, VPC, EC2, IAM

Load Data from Compressed Text File into Amazon Redshift with COPY Command


In Amazon S3 bucket folders cloud users can store compressed data to gain from storage space. As many AWS services Amazon Redshift SQL COPY command supports to load data from compressed text files. In this tutorial, I want to share how compressed text files including delimited or fixed length data can be easily imported into Amazon Redshift database tables.

In this tutorial, I will use sample Amazon Redshift database table sales in tickit schema. For more on Amazon Redshift sample database, please check referenced tutorial.

Please note that AWS supports load of compressed files using following options gzip, lzop, or bzip2.
For details, check official documentation for loading compressed data files from Amazon S3.

This is very important, otherwise during COPY command execution to load data into Redshift table some errors may occur as in this sample

copy tickit.sales
from 's3://kodyaz/sales_tab.zip'
iam_role 'arn:aws:iam::1234567890:role/Amazon-Redshift-Copy-Command-Role'
timeformat as 'MM/DD/YYYY HH24:MI:SS'
delimiter '\t' region 'eu-central-1' gzip;
Code

[Amazon](500310) Invalid operation: S3CurlException: Failed writing body (0 != 16347)
error: S3CurlException: Failed writing body (0 != 16347)
Cause: Failed to inflateinvalid or incomplete deflate data. zlib
error code: -3, CurlError 23, multiCurlError 0, CanRetry 0, UserError 0 code: 9001
context: S3 key being read : s3://kodyaz/sales_tab.zip
query: 1260803
location: table_s3_scanner.cpp:373 process: query0_119_1260803 [pid=9448];

This error occurred since the text data file was not compressed with "gzip" compression method.
In such cases where the compressed file or zipped file format does not match with the format passed to the Amazon Redshift COPY command, these error will occur.

On the other hand, if you download following file sales_tab.bz2 which is compressed using the BZIP2 compression method, by modifying the COPY command for the compression method parameter, we can easily load data from compressed text files into Amazon Redshift database table from AWS S3 buckets.

archiving type to compress text file for Amazon Redshift COPY command

copy tickit.sales
from 's3://kodyaz/sales_tab.bz2'
iam_role 'arn:aws:iam::1234567890:role/Amazon-Redshift-Copy-Command-Role'
timeformat as 'MM/DD/YYYY HH24:MI:SS'
delimiter '\t' region 'eu-central-1' bzip2;
Code

This COPY command will be exexcuted successfully loading data from compressed text file stored on Amazon S3 bucket folders into Redshift database table sales.

As last note please visit and have a look at following resources to load data from SQL Server to Amazon Redshift database using Copy command, to create sample database on Amazon Redshift and load sample data with COPY command and lastly for common COPY command errors on Amazon Redshift database.



AWS


Copyright © 2004 - 2021 Eralper YILMAZ. All rights reserved.