Split String Delimited Data on Amazon Redshift Database using SQL
In this Amazon Redshift database SQL tutorial, I want to show SQL developers how to split string values using SQL string split_part function with examples. Splitting string is a requirement for many cases where data is stored as a concatenated form in a database table column. In this Redshift tutorial, I will show how to split string values and split string data type column values using split_part function.
As helper database objects and Redshift database functions; I will use a numbers table, Common Table Expression CTE and regexp_count function
Let's assume that you want to split a string formed of concatenated value of database platform names like 'Amazon Redshift, SQL Server, Oracle, MySQL, PostgreSQL, Aurora'
We can split this string into pieces using split_part string function and fetch the first item in the databases list as "Amazon Redshift".
SQL programmer can continue to extract second, third, and other items from the concatenated string value.
Although there are not 7 items in the databases list, the last column value is empty string. The index starts from 1 and continues with positive numbers.
And now assume, we have a table of lists on different data.
Here is our sample database table DDL script and sample data Insert commands.
Please note that I have created a temporary table on Amazon Redshift database.
By executing below SQL code on Redshift database, I can also display the number of items in each list.
regexp_count function enables SQL developer to find the items count of delimiter character "," comma
By using below SQL query, we can query table data to extract items from the delimited list.
On the other hand, it is not a good method to provide third parameter of the SQL string function split_part
It is difficult and useless to try to guess the number of pieces in delimited value. Instead of this, we can create a numbers table on Amazon Redshift database.
To create a numbers table on Amazon Redshift database, I could use generate_series function to create a series of numbers starting from 1 to a large number enough to return all delimited parts.
Unfortunately, the numbers table created on Redshift database using SQL generate_series function can not be joined with other database tables.
Another way to build a SQL numbers table on Redshift database is executing following CREATE Table code
Above code creates a database table populated with integer values starting from 1 to 256
Now we can combine these two tables or better use CTE Command Table Expression and using an INNER JOIN to split delimited table data just like database developers do on SQL Server and many other databases with CROSS JOIN and split table functions.
The result of executing above SQL Select query on Redshift table with sample data contains 16 rows each mapping to a part of concatenated list column values.
As seen in this Redshift SQL tutorial, although it is a bit confusing, splitting string on Redshift database is possible.
One last note, SQL split functions on data warehouses is resource and time consuming task because databases like Redshift are produced for aggregations not for string processing.