SAP HANA SQLScript Split String Function
For SAP HANA database SQLScript developers it is possible to create a split string function using regular expressions function substr_regexpr() and series data function series_generate_integer(). In this HANA SQL tutorial, I want to show how SQLScript programmers can use these two HANA functions with Concat() function for string split tasks required in many database developments.
Let's assume that you have a string variable consists of a list of string values seperated with comma ',' as follows:
If you are familiar with regular expressions from your previous development efforts, you will be happy to learn that you can use SUBSTR_REGEXPR function to identify specific patterns that will help HANA database SQL developer to split delimeter seperated string values into tabular format.
For example, if you look at the pString variable value, you can realize that we don't have delimeter character "," at the beginning and at the end of the variable value.
What happens if we add delimeter character to the beginning and to the end of the SQL string value.
Developers experienced with Regular Expressions can now easily see the trick here.
If we search for a specific pattern that starts with "," our delimeter character followed with a number of characters and ends with next "," delimeter, we can split given input string into pieces.
You see we have extracted "Yoda" out of the pString SQLScript string variable. This is the first occurrence of the searched pattern.
What about extracting the second pattern in the variable, can we find it, too?
Let's execute following modified SQLScript SELECT statement
Unfortunately, the results are not as what we exactly expected.
"Anakin Skywalker" the second SubStr_RegExpr() function output is not actually the second term in the string list variable pString
This is because, if we remove pattern ",*," for ",Yoda," the remaining string "Mace Windu,Anakin Skywalker,Luke,Me," is not lacking of beginning "," delimeter character.
So the next occurrence fetched for ",*," pattern is at ",Anakin Skywalker,"
Similarly, if we use above SQLScript regular expression query we will fetch the third occurrence for ",Me," by-passing "Luke"
Database developers can modify the regular expression pattern from '[,][^,]*[?=,]' to '([,])([^,]*)(?=,)' form.
Additionally using the GROUP clause in the SUBSTR_REGEXPR() function will enable the HANA database developer to choose only the names, that is the 2nd group in the expressions by excluding the "," commas in the matched pattern.
Let's execute the new version of the SQLScript code
As expected the output will be as seen in following screenhot from SAP HANA Studio
If the SQL developer does not know how many items will be after splitting string and if he wants to get the resultset in tabular format not in a pivot format, then we need a numbers table function.
And the easiest method to create numbers table on SAP HANA database is using SERIES_GENERATE_INTEGER series data function.
Let's see now how to split string using SQL on SAP HANA database with SUBSTR_REGEXPR() and SERIES_GENERATE_INTEGER() functions.
You see how easy it is string splitting using SQLScript on SAP HANA database for SQL developers.
Series_Generate_Integer() function created an array of integer numbers starting 1 to 10.
And I used the column Element_Number in Occurrence clause so that I can match every occurrence of the regular expression within the given string variable.
But of course the upper limit of the table function or Series_Generate_Integer() function determines the maximum split count.
In this case 10.
So, if the database developer is not sure a number like 100 maybe more than enough.
Those "?" or NULL values are for those occurrences that does not have a match in the given string value.
Now another step maybe working with a database table which has a column of string data type like nvarchar(5000), etc where concatenated string values are stored.
Let's create a sample database table.
And now query HANA database table using following SQL Split String SELECT statement
Please note that since I used another SELECT statement and filtered NULL pieces that are caused of non-matched series data function values. The output of the SQL string split query is as seen in following screenshot.