Data Virtuality SQL Query for Online CSV Data using Web Service Connector
Data virtualization architects and developers can consume and query data published as CSV from a web URL using Data Virtuality. In this Data Virtuality tutorial I want to show how to connect to a CSV data file using generic web service connector invokeHTTP producedure and use standard SQL syntax to query this comma seperated values data from remote web service.
If you search on the web, especially data scientists will find a lot of resources publishing many statistics as CSV as publibly available like stats.govt.nz
Especiall Google's Dataset Search tool is great for finding public data available on the web.
For the sake of this tutorial, I will use the English Dictionary data set published in CSV format via URL CSV data file
First, launch Data Virtuality Studio.
Since we are going to use invokeHTTP producedure, first we need to create a web service data source using the Web Service connector on Data Virtuality server.
If you have created any web service data source before, you can use that web service and call the invokeHTTP producedure of that previously created web service.
In this case you can skip this web service data source creation step.
Create Web Service Data Source for CSV File
In order to create a web service data source, right click on "Data Sources" node within Data Explorer window. From the context menu, select option "Add data source"
Select "Web Service" data source type on the first dialog screen of the data source creation wizard
Continue by pressing "Next" button to next screen for entering data source parameters for the new web service we are creating on Data Virtuality.
An Alias type a descriptive name for your new resource. For example, ws_English for our English dictionary CSV file.
For the End point, we can paste the CSV file download URL.
That is: https://www.bragitoff.com/wp-content/uploads/2016/03/dictionary.csv
Since that is public and can be accessed anonymously, keep "Security type" as "None" which is the default option
Leave all other options as default
Click Next and then Finish to complete web service data source creation to consume the CSV file on Data Virtuality.
When you get "Data source added" message, click OK to close the data source creation wizard.
Query CSV Data using CSV Query Builder Tool
Now, using one of the the web service data sources created on Data Virtuality server, we will now query CSV data and create a SQL View object for later use.
Let's open the SQL Editor. You can use "Alt+E" or use the icon
Open CSV Query Builder tool using the icon
When CSV Query Builder tool is launched, switch to Web service tab.
For Data Source, choose the web service data source we have just created, or one of the previously created web services that you have access on the Data Virtuality server.
For End Point, be sure that it is the CSV file download URL
If you have downloaded the file on to your computer and checked the CSV data file, you will see whether it has a header line or you will see which character is used as the seperator character for the CSV file. Using your observation on the CSV file, you can configure Dynamic parsing section. In most cases, default options are correct. But for this English words list, there is not a header line, so I unmark this checkbox next to Header.
When you click Apply button the tool will analyze the source data.
As a result of the data source analyze, it will display the columns data and provide some sample data rows parsed accordingly.
For most cases when there is a header, we can directly close the CSV Query Builder application.
But in this case it is better to rename the data columns as follows:
Click OK to close the wizard
When the CSV Query Builder tool closes, it created automatically the SQL code we need to access online CSV data using the web service data source created in previous steps.
SQL programmers and data virtualization platform developers can modify following SQL code for different CSV files. Although this is an option for SQL developer, using the CSV Query Builder tool will visualize the data with preview option and enable use all configuration options for the data source.
SELECT
"csv_table"."Word"
,"csv_table"."Tip"
,"csv_table"."Meaning"
FROM
(
exec "ws_English".invokeHTTP (
endpoint => 'https://www.bragitoff.com/wp-content/uploads/2016/03/dictionary.csv'
,action => 'GET'
,requestContentType => 'application/xml'
)
) w
,TEXTTABLE (
to_chars (
w.result
,'UTF-8'
) COLUMNS "Word" STRING
,"Tip" STRING
,"Meaning" STRING DELIMITER ',' QUOTE '"'
) "csv_table";;
Let's now create a SQL View using above SQL Select statement.
You will need a Virtual Schema for this. If you have previously created one, you can also use that virtual schema, too
All you need to do is adding the following first two lines of SQL code "CREATE VIEW VirtualSchema.ViewName AS" in front of the above SQL Select query
Create View Utilities.English_Words_List
AS
SELECT
"csv_table"."Word"
,"csv_table"."Tip"
,"csv_table"."Meaning"
FROM
(
exec "ws_English".invokeHTTP (
endpoint => 'https://www.bragitoff.com/wp-content/uploads/2016/03/dictionary.csv'
,action => 'GET'
,requestContentType => 'application/xml'
)
) w
,TEXTTABLE (
to_chars (
w.result
,'UTF-8'
) COLUMNS "Word" STRING
,"Tip" STRING
,"Meaning" STRING DELIMITER ',' QUOTE '"'
) "csv_table";;
Now I can only refer to view without using all underelying web service invokeHTTP procedure execution codes as follows to query specific words in English for their meaning
SELECT "Word", "Tip", "Meaning"
FROM "Utilities.English_Words_List"
WHERE "Word" LIKE 'Eagle%';;
To summarize, data virtualization platform architects can create a single general web service data source for CSV files from different web resources. Data Virtuality developers can use this web service connector with the help of CSV Query Builder tool to fetch the CSV data or to execute SQL queries on the online data.
I hope data professionals working on Logical Data Warehouse solution Data Virtuality will find this tutorial useful.