Connect SQL Server Database from Pivotal Greenplum using PXF
In this Greenplum database tutorial, I want to share the steps to connect SQL Server database using PXF Greenplum Platform Extension Framework and external tables. PXF enables SQL developers to define database connections to remote data sources and create external tables which are linked tables or virtual tables with data is read on demand from external resource. Since I did not come across with a sample that connects Greenplum to SQL Server database on the web, I wanted to show the required configurations in order with database programmers and Greenplum data warehouse developers.
Connect to Greenplum Server Master Node
Connect to Greenplum master node using putty
Login with "gpadmin" user and execute following command
Download SQL Server JDBC Driver
Download the appropriate JDBC driver for SQL Server from Microsoft SQL Docs web portal.
If you choose Microsoft JDBC Driver 7.4 for SQL Server like me to download and install on your Greenplum Server,
For Linux the driver download file is sqljdbc_7.4.1.0_enu.tar.gz
And to install the driver on a Windows server, the download file is sqljdbc_7.4.1.0_enu.exe
Both have a size of 37.5 MB
If you unzip the compressed file, you will see instructions for how to install the JDBC driver on Linux
For accessing SQL Server from Pivotal Greenplum using PXF, the data warehouse administrators can use the mssql-jdbc-7.4.1.jre8.jar .jar file
Greenplum PXF Configuration to Connect SQL Server
Check PXF configuration folder and switch to that directory.
Run following command one after an other
Echo command will show us that the $PXF_CONF (PXF Configuration Directory) is usr/local/greenplum-pxf, for your notes. Then switch to PXF user configuration directory.
If you list the contents of the folder, you will see that there are subfolders:
lib folder for external driver files,
templates folder where the PXF server connection configuration file templates are stored,
servers folder where Greenplum PXF developers should create the server connection file
Switch to "lib" folder and copy the SQL Server JDBC driver file mssql-jdbc-7.4.1.jre8.jar into this directory using "cp" copy command
Now, go back one level up from "lib" folder. Then switch to "templates" folder. You will see "jdbc-site.xml" template file in this folder. We will copy this file and make modifications on it in following steps
Again return to $PXF_CONF folder which is back in one level up using "cd .." command
Now enter into "servers" folder.
The Greenplum administrator should create separate sub-folder for each external JDBC connection in this "servers" folder
If you have not yet defined anything, you will only see the "default" subfolder here.
As seen below, I have already created JDBC connections for PXF external tables since I have 3 more subfolders in this directory.
To connect to SQL Server, you need to create a server connection definition.
Let's name it "sqlserver" .
By the way, you can give any descriptive name here, this name is not related with the target data source platform.
Create the subfolder using following command
And enter to sqlserver folder
And now copy the jdbc connection template file into this new folder
Now the Greenplum developers should edit the jdbc-site.xml file contents for the target SQL Server database instance.
If you look at the contents with "more jdbc-site.xml" command, you will see that we need following information to successfully connect to a remote SQL Server instance.
JDBC driver class name,
JDBC URL for SQL Server,
A valid database user and its password
Using "Ctrl+C" you can exit from the file content display command.
For the JDBC connection URL for SQL Server is different from the one shared in the template which is valid for PostgreSQL databases.
So, if you are trying to connect to a different data platform than PostgreSQL (or its variants like Amazon Redshift, Greenplum, etc.), you should have a look at JDBC Drivers Reference
For SQL Server the JDBC URL syntax is in following format:
Port is default 1433.
For example, if you want to access to a SQL Server database named "kodyazdb"running on host "10.160.80.40" thant the JDBC URL will be:
As I said before the template is misleading the developers at this point, you should double-check the JDBC connection URL from other resources too.
Now edit the "jdbc-site.xml" connection configuration file using "vi" tool.
Save the changes.
Create External Table on Greenplum Database
Now we are ready to create external table on Greenplum database which connects to SQL Server database table and reads data on demand using PXF, Greenplum Platform Extension Framework.
Here is the CREATE EXTERNAL TABLE command for SQL developers on Greenplum database.
In following above SQL SELECT command;
"dbo.states" is the schema name and the table or view name where the data is being read on the SQL Server data source. Since the database name is already defined in jdbc-site.xml file, database is not mentioned in the script.
"sqlserver", the value of "SERVER" parameter in Location argument is the name of the folder created for server configuration where the jdbc-site.xml file is created in
Then by querying the external table pxf_states using following SQL Select statement;
SQL developers building applications on Greenplum database can read data from external database tables easily using PXF, Greenplum Platform Extension Framework.