Redshift Database - Too much content are deleted during executing alter distkey command
On Redshift database, when I execute SQL command to add sortkey on an existing database table based on Amazon Redshift Advisor recommendations, I got invalid operation error indicating that too much content of database table are deleted during executing alter distkey command. When I checked the data size of the table, the Redshift database table size was around 6 GB. Retrying to execute the ALTER TABLE SQL command did not help me to define a new sortkey for the selected table.
At first I thought there is a configuration setting which is a soft limit for data reorganizations on data nodes but I could not figure out such a solution for this Redshift error.
The workaround I found is to create a new table based on the DDL script of the table or the Data Definition Language SQL script of the source table with sortkey definition as an addition to the original CREATE TABLE SQL script. Then I copied all data from source table to new Redshift table with sortkey column definition is done.
After this point, I have duplicates of the data on each data node in two separate Redshift database tables.
I drop the original table and rename the new table as previous dropped database table name.
After this renaming is completed, all queries will execute without any SQL errors and the Amazon Redshift Advisor recommendation is applied successfully.
Here is the SQL command that I executed on Redshift database resulted with invalid operation error
Here is the SQL exception details for the "Too much content of <table name> are deleted during executing alter distkey command" error
Amazon Redshift database administrators and SQL developers can check the existing sort key and distribution key of a database table by querying the SVV_TABLE_INFO system view.
The size of the table in MB and the number of table rows (including rows marked as deleted waiting for a vacuum) are also visible in this system view for database tables.
I get the CREATE TABLE DDL SQL command of the table using the Redshift database client tool.
For example, if you are using SQL Workbench/J SQL programmer can right click on the table and then choose "Show source" context menu option.
Or if you are using DBeaver Universal Database Management tool for developing your SQL solutions on Amazon Redshift Data Warehouse, you can right click on table name then follow menu options "Generate SQL > DDL" path
After the source table DDL script is fetched, change the table name with a new name by adding _tmp for example.
Also add the sortkey option by specifying the required table column(s) like below SQL code
After the new table is created, I executed following SQL command to copy data from one table into another
Actually, the INSERT statement executed less than 20 seconds astonishing me.
I am surprised because ALTER TABLE could not manage to reorganize table data on the other hand INSERT INTO SELECT FROM performed quite well.
Then I dropped the original database table. Be careful, be sure that the all data is now clonned into new table which is defined with SORTKEY column.
And rename the new table with deleted table's name so all existing code can be executed without any effect
Please note, in above ALTER TABLE RENAME SQL command, the first table name includes the schema name on the other hand the second table name does not.
Now if database programmers or data warehouse developers query the SVV_TABLE_INFO system view, they will see the sort key is applied successfully as a workaround.
I hope this Redshift SQL tutorial is useful for database administartors and SQL programmers building DWH solutions on cloud