SQL Server administration and T-SQL development, Web Programming with ASP.NET, HTML5 and Javascript, Windows Phone 8 app development, SAP Smartforms and ABAP Programming, Windows 7, Visual Studio and MS Office software
Development resources, articles, tutorials, code samples, tools and downloads for AWS Amazon Web Services, Redshift, AWS Lambda Functions, S3 Buckets, VPC, EC2, IAM

Split String Delimited Data on Amazon Redshift Database using SQL


In this Amazon Redshift database SQL tutorial, I want to show SQL developers how to split string values using SQL string split_part function with examples. Splitting string is a requirement for many cases where data is stored as a concatenated form in a database table column. In this Redshift tutorial, I will show how to split string values and split string data type column values using split_part function.

As helper database objects and Redshift database functions; I will use a numbers table, Common Table Expression CTE and regexp_count function

SQL Split String on Redshift Database

Let's assume that you want to split a string formed of concatenated value of database platform names like 'Amazon Redshift, SQL Server, Oracle, MySQL, PostgreSQL, Aurora'
We can split this string into pieces using split_part string function and fetch the first item in the databases list as "Amazon Redshift".

SELECT SPLIT_PART('Amazon Redshift, SQL Server, Oracle, MySQL, PostgreSQL, Aurora', ',', 1);
Code

SQL programmer can continue to extract second, third, and other items from the concatenated string value.

SELECT
SPLIT_PART('Amazon Redshift, SQL Server, Oracle, MySQL, PostgreSQL, Aurora', ',', 1),
SPLIT_PART('Amazon Redshift, SQL Server, Oracle, MySQL, PostgreSQL, Aurora', ',', 2),
SPLIT_PART('Amazon Redshift, SQL Server, Oracle, MySQL, PostgreSQL, Aurora', ',', 3),
SPLIT_PART('Amazon Redshift, SQL Server, Oracle, MySQL, PostgreSQL, Aurora', ',', 4),
SPLIT_PART('Amazon Redshift, SQL Server, Oracle, MySQL, PostgreSQL, Aurora', ',', 5),
SPLIT_PART('Amazon Redshift, SQL Server, Oracle, MySQL, PostgreSQL, Aurora', ',', 6),
SPLIT_PART('Amazon Redshift, SQL Server, Oracle, MySQL, PostgreSQL, Aurora', ',', 7);
SQL split code

Although there are not 7 items in the databases list, the last column value is empty string. The index starts from 1 and continues with positive numbers.

split string on Amazon Redshift database

And now assume, we have a table of lists on different data.
Here is our sample database table DDL script and sample data Insert commands.
Please note that I have created a temporary table on Amazon Redshift database.

Create Temporary Table DelimitedItems (
 Id int,
 List varchar(max)
);

Insert Into DelimitedItems Select 1, 'Turkey, Germany, Russia, China, Spain';
Insert Into DelimitedItems Select 2, 'VB, C-Sharp, C++, Java, Python, Go';
Insert Into DelimitedItems Select 3, 'smallint, int, bigint, date, varchar';
sample data for SQL code

By executing below SQL code on Redshift database, I can also display the number of items in each list.

Select
 id,
 list,
 regexp_count(list, ',')+1 as "count"
From DelimitedItems;
Using RegExp_Count SQL function

regexp_count function enables SQL developer to find the items count of delimiter character "," comma

Redshift temporary table with delimited data

By using below SQL query, we can query table data to extract items from the delimited list.

Select
 id,
 split_part(list,',',1) as first,
 split_part(list,',',2) as second,
 split_part(list,',',3) as third,
 split_part(list,',',4) as fourth,
 split_part(list,',',5) as fifth,
 split_part(list,',',6) as sixth
From DelimitedItems;
SQL String function split_part

On the other hand, it is not a good method to provide third parameter of the SQL string function split_part
It is difficult and useless to try to guess the number of pieces in delimited value. Instead of this, we can create a numbers table on Amazon Redshift database.

To create a numbers table on Amazon Redshift database, I could use generate_series function to create a series of numbers starting from 1 to a large number enough to return all delimited parts.

select generate_series as id from generate_series(1,30,1);
Create series in SQL

Unfortunately, the numbers table created on Redshift database using SQL generate_series function can not be joined with other database tables.

Amazon Redshift database SQL numbers table

Another way to build a SQL numbers table on Redshift database is executing following CREATE Table code

CREATE TEMPORARY TABLE numbers AS (
SELECT
1 + p0.n
+ p1.n*2
+ p2.n * POWER(2,2)
+ p3.n * POWER(2,3)
+ p4.n * POWER(2,4)
+ p5.n * POWER(2,5)
+ p6.n * POWER(2,6)
+ p7.n * POWER(2,7)
as num
FROM
(SELECT 0 as n UNION SELECT 1) p0,
(SELECT 0 as n UNION SELECT 1) p1,
(SELECT 0 as n UNION SELECT 1) p2,
(SELECT 0 as n UNION SELECT 1) p3,
(SELECT 0 as n UNION SELECT 1) p4,
(SELECT 0 as n UNION SELECT 1) p5,
(SELECT 0 as n UNION SELECT 1) p6,
(SELECT 0 as n UNION SELECT 1) p7
);
SQL temporary table

Above code creates a database table populated with integer values starting from 1 to 256

Now we can combine these two tables or better use CTE Command Table Expression and using an INNER JOIN to split delimited table data just like database developers do on SQL Server and many other databases with CROSS JOIN and split table functions.

With numbers as (
 SELECT
  1 + p0.n
  + p1.n*2
  + p2.n * POWER(2,2)
  + p3.n * POWER(2,3)
  + p4.n * POWER(2,4)
  + p5.n * POWER(2,5)
  + p6.n * POWER(2,6)
  + p7.n * POWER(2,7)
  as num
 FROM
  (SELECT 0 as n UNION SELECT 1) p0,
  (SELECT 0 as n UNION SELECT 1) p1,
  (SELECT 0 as n UNION SELECT 1) p2,
  (SELECT 0 as n UNION SELECT 1) p3,
  (SELECT 0 as n UNION SELECT 1) p4,
  (SELECT 0 as n UNION SELECT 1) p5,
  (SELECT 0 as n UNION SELECT 1) p6,
  (SELECT 0 as n UNION SELECT 1) p7
), delimited as (
 Select
  id,
  list,
  regexp_count(list, ',') + 1 as "count"
 From DelimitedItems
)
Select
 tbl.id,
 tbl.list,
 tbl.count,
 numbers.num,
 split_part(tbl.list, ',', numbers.num::int) as t
From delimited tbl
Inner Join numbers
On numbers.num <= tbl.count
Order BY
 tbl.id, numbers.num;
SQL code

The result of executing above SQL Select query on Redshift table with sample data contains 16 rows each mapping to a part of concatenated list column values.

SQL split code for Redshift database table with delimited data

As seen in this Redshift SQL tutorial, although it is a bit confusing, splitting string on Redshift database is possible.

One last note, SQL split functions on data warehouses is resource and time consuming task because databases like Redshift are produced for aggregations not for string processing.



AWS


Copyright © 2004 - 2024 Eralper YILMAZ. All rights reserved.