Data Virtuality File Connector and Query CSV File Data
Using Data Virtuality File Connector, users can define a path for data files on a file share or local path and query data files like CSV files or XML/JSON files using SQL queries. In this tutorial, I want to show the steps how a local directory can be defined as a data source to Data Virtuality Server using File Connector and how the contents of a CSV file can be displayed and a SQL query can be build on top of CSV file data.
Create Connection to Local Path using Data Virtuality File Connector
On my local computer I have a number of CSV files which contains some data. Using Data Virtuality File Connector I want to introduce the file directory (or a file share) as a data source for Data Virtuality Logical Data Warehouse for my installation.
In this tutorial, I want to show the steps of using File Connector with a sample and how to access data over this file connection.
Launch Data Virtuality Studio and connect to the target Data Virtuality server as the first step.
On Data Explorer window right click on Data sources node to create a new Data Source using File connector.
When the "Select data source type" screen is displayed, among File node choose File Connector as seen in following screenshot.
To configure the File Connector settings for a new file folder, provide similar entries according to your case seen in below screenshot.
Provide a descriptive name to your folder path. This will be the schema name when you try to reach files in this path via the file connector.
Enter the directory path. Since my files containing data are stored in "C:/Kodyaz" folder I entered this file folder path
Keep other parameters as offered default.
Click Next to continue and if you wish you can mark "Gather statistics" checkbox to collect statistical information about this data source. Click on Finish button to complete setup.
If you get the message "Data source added." then you have successfully defined a new data source using File connector to reach data contained in files within given path.
After the file folder is added as a new data source, you will not see the files as a table or view under the data source schema name. It will be empty showing 0 tables/views.
But using the procedures that are assigned to data sources created with File Connector, it is possible to access files and data within the data source or file folder.
For example, to get the list of all files in target directory or file path, following CALL command can be used to execute getFiles procedure. Just double click on getFiles, a template for the CALL procedure command will be displayed in a new SQL editor. You can modify the procedure call command in the SQL editor according to your requirement.
The output of the above SQL command will give us the filePath including file name and folder, last modified time, last access time, creation time, whether it is a regular file or a directory, the file size and a file key, etc.
Create Virtual Schema
Within Data Virtuality Studio, in Data Explorer window, right click on "Virtual Schemas" and from context menu option choose "Create schema" as follows:
Give a descriptive name to your new virtual schema. In this example tutorial, I will use "localfiles" since the schema will contain data from local files connected by Data Virtuality File Connector
Then click Add button. The new schame localfiles will be automatically listed under the list "Virtual schemas" on Data Explorer window. If not, you can use the Refresh list context menu option displayed by a right-click on Virtual Schemas node.
We are now ready to create a view within the virtual schema so it is possible to query the data within data files any time easily.
Create View
In this step, we will create a new view within our recently created Virtual Schema localfiles which SELECT and display all data in sample schools.csv file which locates on file folder defined by the file connector.
Please note that, above TEXTTABLE() function splits the file column data that is fetched with getFiles() procedure into rows as well as into columns defined in COLUMNS parameter and other parameters like DELIMETER, HEADER line count, etc.
Execute SQL Query on CSV File via View
Data professionals can now build SQL query to query data that .csv file contains using standard SQL query. It is also possible to use view data defined by using file connector within SQL queries and join with other data sources, too.
Following SQL query is a very sample SELECT query without any filtering criteria or a JOIN condition.
I hope Data Virtuality users now have a quick step to use File Connector with text files including data and stored on local folders. As seen in this tutorial, it is easy to map to the file directory using File connector using as data source type and read file contents with procedures available for the data type.