Select Random Rows From SAP HANA Database Table using SQLScript
To Select random rows or selecting random records on SAP HANA database table, SQLScript developers can use SQL functions just like in other databases.
Another requirement is to order or sort a set of rows randomly.
In both cases, database programmers can use RAND() function.
Another option can be using the SQLScript TABLESAMPLE clause to select randomly choosen rows from a dataset.
Select Specific Number of Random Rows on SAP HANA Database
By definition SQL Rand() function returns a pseudo-random numeric value (in Double data type). Rand_Secure() is a similar function which is better for using security purposes. On the other hand, Rand() performs better.
Rand() and Rand_Secure() random functions returns a double numeric value within the range of 0 to 1 (less than 1 excluding it).
Let's demonstrate random selection of table records with a few RAND() function SQLScript samples.
Here is our first SQL example. Please note that both of the SELECT statements are same in nature.
The random row selection in SQL Server can be accomplished by sorting the table with ORDER BY NewID() clause.
Not a HANA database solution but as a SQL developer you can keep it in your mind :)
Of course if the SQL developers require more than a single row, they can replace 1 with number n. Or it is better to replace the number with an integer variable and use in random Select statement as follows:
Select Random Rows for Sampling on HANA Database
As a developer, if you are interested to work with a randomly selected sample data from a database table, SQLScript provides TABLESAMPLE clause in SELECT statement syntax.
TableSample clause returns a random sample of the table data.
As the SQL programmer, you can specify the sample size as a percentage of the data source.
Let's run our sample SQL script code to demonstrate how database developers can use TABLESAMPLE clause with SELECT statement.
Above SQL Select statements returns randomly selected rows forming approximately 10% of the source object
What I have experienced when I have work with a small amount of data (like slightly more than 100 rows) is that:
If you are working with a small percentage of sampling size, Bernoulli sampling or System sampling can return no rows where you expect to see at least 1 record.
If you choose, let's say 10 percent instead of 1 percent, Bernoulli sampling returns a number of rows that is close to the number which can be calculated based on the total number of rows in the source and sampling percentage.
On the other hand, System handling returns a set of rows whose row number has a wider range.
In short, I'ld prefer to use Bernoulli sampling in case I can pay the performance cost.
In addition to the TableSample clause, within the returned list, you can sort rows using Rand() random function and select top 3 rows only, for example. But since you might have less rows than expected from the table sampling functions (TABLESAMPLE BERNOULLI or TABLESAMPLE SYSTEM), it is not a convenient method of selected specific number of random rows from a database table.
Although Bernoulli sampling returns what I require, System sampling returned nothing because of its nature for wider variance.