Redshift Database - Too much content are deleted during executing alter distkey command

On Redshift database, when I execute SQL command to add sortkey on an existing database table based on Amazon Redshift Advisor recommendations, I got invalid operation error indicating that too much content of database table are deleted during executing alter distkey command. When I checked the data size of the table, the Redshift database table size was around 6 GB. Retrying to execute the ALTER TABLE SQL command did not help me to define a new sortkey for the selected table.

At first I thought there is a configuration setting which is a soft limit for data reorganizations on data nodes but I could not figure out such a solution for this Redshift error.

The workaround I found is to create a new table based on the DDL script of the table or the Data Definition Language SQL script of the source table with sortkey definition as an addition to the original CREATE TABLE SQL script. Then I copied all data from source table to new Redshift table with sortkey column definition is done.

After this point, I have duplicates of the data on each data node in two separate Redshift database tables.
I drop the original table and rename the new table as previous dropped database table name.
After this renaming is completed, all queries will execute without any SQL errors and the Amazon Redshift Advisor recommendation is applied successfully.

Here is the SQL command that I executed on Redshift database resulted with invalid operation error


ALTER TABLE "public"."dl_th_customer_events_incl_dim" ALTER SORTKEY ("event_date");

Code

Here is the SQL exception details for the "Too much content of <table name> are deleted during executing alter distkey command" error


An error occurred when executing the SQL command:

ALTER TABLE "public"."dl_th_customer_events_incl_dim" ALTER SORTKEY ("event_date")



[Amazon](500310) Invalid operation: Too much content of 'dl_th_customer_events_incl_dim' are deleted during executing alter distkey command. Please Retry.; [SQL State=XX000, DB Errorcode=500310]

1 statement failed.

Code

Too much content of table are deleted during executing alter distkey command

Amazon Redshift database administrators and SQL developers can check the existing sort key and distribution key of a database table by querying the SVV_TABLE_INFO system view.
The size of the table in MB and the number of table rows (including rows marked as deleted waiting for a vacuum) are also visible in this system view for database tables.

I get the CREATE TABLE DDL SQL command of the table using the Redshift database client tool.
For example, if you are using SQL Workbench/J SQL programmer can right click on the table and then choose "Show source" context menu option.

Or if you are using DBeaver Universal Database Management tool for developing your SQL solutions on Amazon Redshift Data Warehouse, you can right click on table name then follow menu options "Generate SQL > DDL" path

generate DDL SQL CREATE command of a Redshift database table

After the source table DDL script is fetched, change the table name with a new name by adding _tmp for example.
Also add the sortkey option by specifying the required table column(s) like below SQL code


CREATE TABLE IF NOT EXISTS public.dl_th_customer_events_incl_dim_tmp

(

tid VARCHAR(4000) ENCODE lzo

,event_type_id VARCHAR(4000) ENCODE lzo

,event_type VARCHAR(4000) ENCODE lzo

,event_date TIMESTAMP WITHOUT TIME ZONE

,"session" VARCHAR(4000) ENCODE lzo

)

SORTKEY (event_date);

Code

After the new table is created, I executed following SQL command to copy data from one table into another


insert into public.dl_th_customer_events_incl_dim_tmp select * from public.dl_th_customer_events_incl_dim

Code

Actually, the INSERT statement executed less than 20 seconds astonishing me.
I am surprised because ALTER TABLE could not manage to reorganize table data on the other hand INSERT INTO SELECT FROM performed quite well.

Then I dropped the original database table. Be careful, be sure that the all data is now clonned into new table which is defined with SORTKEY column.


drop table public.dl_th_customer_events_incl_dim;

Code

And rename the new table with deleted table's name so all existing code can be executed without any effect


alter table public.dl_th_customer_events_incl_dim_tmp rename to dl_th_customer_events_incl_dim;

Code

Please note, in above ALTER TABLE RENAME SQL command, the first table name includes the schema name on the other hand the second table name does not.

Now if database programmers or data warehouse developers query the SVV_TABLE_INFO system view, they will see the sort key is applied successfully as a workaround.


select * from SVV_TABLE_INFO where "table" = 'dl_th_customer_events_incl_dim';

Code

display sort key columns of Amazon Redshift database tables

I hope this Redshift SQL tutorial is useful for database administartors and SQL programmers building DWH solutions on cloud