SQL Server administration and T-SQL development, Web Programming with ASP.NET, HTML5 and Javascript, Windows Phone 8 app development, SAP Smartforms and ABAP Programming, Windows 7, Visual Studio and MS Office software
Data Virtualization Tools and Software Denodo Platform and Data Virtuality Tutorials and Downloads for SQL Database Developer

Connect FTP Servers and Query Data using SQL in Data Virtuality

Data Virtuality provides FTP and SFTP connectors for Logical Data Warehouse administrator and architect to enable them to connect remote FTP/SFTP servers, list files on FTP server, upload and download files and even enables SQL developers working on DV server to query remote FTP CSV files using SQL commands. In this Data Virtuality tutorial, I want to show how to create an FTP data source, how to connect FTP server from Data Virtuality Studio, list files, delete files, etc. and query data stored in CSV files stored on FTP server using SQL query within Data Virtuality SQL Editor.

Data Virtuality FTP data connector

To be sure that every reader of this Data Virtuality tutorial can follow the steps described in this tutorial easily with hands-on experience, I checked for a public FTP server which enables connect, read, write and even delete files from the FTP server. I will use DLP Test FTP Server for this data virtualization tutorial which provides free public access with a predefined username and password.

If you visit the web page above you will see following provided server information for a successfull connection.

FTP Host: ftp.dlptest.com
Username: dlpuser@dlptest.com
Password: eUj8GeW55SvYaswqUyDSm5v6N


Create FTP Data Source on Data Virtuality

Now let's launch Data Virtuality Studio and create a new FTP data source for above public FTP server.
On Data Explorer windows right-click on Data Sources node and click on "Add data source"

add FTP data source to Data Virtuality

Select "FTP" under "File" as the data source type for our remote FTP server connection from Data Virtuality server.
Please note, althouth the configurations are very similar, for SFTP connections there is an other connector named SFTP

Data Virtuality FTP connector to access remote FTP servers

Click Next to continue

Now on FTP data source parameters dialog screen, please populate given fields below according to your target server.
For this tutorial, I will use the dlptest FTP server.

Alias is a descriptive name for your server. By following my naming convention, I start with "ftp_" as indicating data source type and the server name. So my alias or data source name is ftp_dlptest

Host is the FTP server name, for my case it is ftp.dlptest.com
Please note that the host name does not include "ftp://" etc.

Remote Directory: Some FTP server administrators forces their clients to write to a specific folder. So they have some rules about incoming file like Upload folder, outgoing, or public files, etc. For our case, we will use the home directory.

Port is by default 21 for FTP connections unless the server has been configured to use an alternative port.

User name and Password: Please use a valid credential that will provide you access to the remote FTP server. For our test, I shared the username and password given by the FTP server owners in previous sections in this tutorial.

FTP server connection parameters on Data Virtuality data source

When you provide all necessary connection parameters, press Test Connection button to validate if the FTP server access can be established successfully. If yes continue to next data source wizard screen and finish configuration. If you see "Data source added" message then we can proceed to following step.


Data Virtuality FTP Data Source Procedures

Under the FTP data source, we have just created there are 5 procedures shipped by default.
These are:

deleteFiles: Deletes the specified files.
getFilesReturns text files that match the given path and pattern as BLOBs.
getTextFilesReturns text files that match the given path and pattern as CLOBs.
listFilesLists all files that match the given path and pattern.
saveFileSaves the given value to the given path. Any existing file will be overridden.

FTP data source Data Virtuality procedures

Now by using listFiles procedure, we can list the files on target FTP server.

call "ftp_dlptest.listFiles"(
 "pathAndPattern" => '*.csv'
);;

Optional "pathAndPattern" parameter enables developer on Data Virtuality, to filter the files list based on the file name and its path

list files on FTP server on Data Virtuality

You can also use the same procedure in following format for all files on the remote FTP server.

call "ftp_dlptest.listFiles"();;

Let's now delete one of the files (test.csv) using deleteFiles FTP data source procedure.

call "ftp_dlptest.deleteFiles"("pathAndPattern" => 'test.csv');;

After the deleteFiles procedure, if the Data Virtuality developer executes the listFiles procedure once more, he or she will see that the file is not any more on the remote FTP server and deleted successfully.


Export Data from Data Virtuality to FTP Server in CSV Format

In order to transfer data from one of the existing data sources into FTP server, we can use following SQL code which is executing the procedure named UTILS.csvExport
For more on UTILS.csvExport please refer to Data Virtuality documentation at CSV Export with FTP Upload

Following csvExport procedure execution will enable me to create a CSV file named Cities.csv on remote FTP server with data source name ftp_dlptest.

The CSV file will contain the data in table/view Cities within schema or data source named sandbox
Actually, I can read a table or view by its name and export all its data in CSV format to an FTP server easily.

call "UTILS.csvExport" (
 "sourceSchema" => 'sandbox'
 ,"sourceTable" => 'Cities'
 ,"targetSchema" => 'ftp_dlptest'
 ,"targetFile" => 'Cities.csv'
 ,"header" => true
);;

If I again execute the listFiles procedure, I see that a new CSV file is created on target FTP server

export data from Data Virtuality to FTP server in CSV format


SQL Query for FTP Server CSV Data on Data Virtuality

Now let's launch a new SQL Editor SQL Editor and try to prepare a SQL query using CSV Query Builder tool CSV Query Builder.

On File source tab, select the FTP data source name from the "Data source" combo box.
Then on "File format", be sure that "CSV" is selected.

"File name" can be typed from the list we fetched bby executing listFiles procedure on the FTP data source.
So my taget CSV data file name "Cities.csv"

On Dynamic parsing section, I know that the file has headers and the delimeter is "," comma character. So I keep all setting as default values.

When I click Apply, the CSV Query Builder will analyze the remote file stored on FTP server and parse it. As a result, I can see the column names and sample rows are displayed in preview screen as follows

Data Virtuality CSV Query Builder for FTP server files

If I click OK, following SQL code will be prepared for Data Virtuality developer on the SQL Editor screen.

SELECT
 "csv_table"."LatD",
 "csv_table"."LatM",
 "csv_table"."LatS",
 "csv_table"."NS",
 "csv_table"."LonD",
 "csv_table"."LonM",
 "csv_table"."LonS",
 "csv_table"."EW",
 "csv_table"."City",
 "csv_table"."State"
FROM
(call "ftp_dlptest".getFiles('Cities.csv')) f,
 TEXTTABLE(to_chars(f.file,'UTF-8')
  COLUMNS
  "LatD" STRING ,
  "LatM" STRING ,
  "LatS" STRING ,
  "NS" STRING ,
  "LonD" STRING ,
  "LonM" STRING ,
  "LonS" STRING ,
  "EW" STRING ,
  "City" STRING ,
  "State" STRING
  DELIMITER ','
  QUOTE '"'
  HEADER 1
 )
"csv_table";;

You see, by executing the SELECT query, I can access FTP server, parse CSV file and fetch data stored in the remote file using SQL easily on Data Virtuality

Query data using SQL code from FTP CSV file on Data Virtuality

In this Data Virtuality tutorial, I wanted to demonstrate how easy it is to query FTP server CSV files using standard SQL codes. Data virtualization platforms like Data Virtuality provides options for data architects and data consumers to access data on demand from different data sources. FTP servers are one of the data sources that data consumers or data visualization tools like Power BI, Tableau, etc can access and query data from.

Data Virtualization


Copyright © 2004 - 2021 Eralper YILMAZ. All rights reserved.