Skip to main content

Query Excel file data using SQL Server PolyBase

What is PolyBase?

PloyBase is the technology used to connect external data sources to the SQL server. It can be used to query tables or files in an external data source. Initially, this was introduced with SQL Server 2016. At that time, it was limited to a few external data sources such as Big data clusters and Azure Blog Storage data. However, in SQL Server 2019 version it was expanded to add more data sources.

The following data sources can connect using SQL Server 2019 instance.

  • Big data clusters
  • Azure Blob storage
  • Relational Database (ex: SQL Server, Oracle, Teradata)
  • NoSQL (ex: MongoDB)
  • ODBC (ex: Excel)


1 Polybase basic hierarchy
                                                                         Polybase basic hierarchy


How to configure PolyBase in SQL Server?

You can install PolyBase with a new SQL Server instance or you can add PolyBase feature to an existing SQL Server instance. In both scenarios, you need to select PolyBase Query Service for External Data and Java connector for the HDFS data source (Not mandatory).


2 Add Polybase to SQL Server
                                                                       Add Polybase to SQL Server



After installation complete, you need to go to the Services and check the following new services you started. If not running, you need to start these two services manually.


3 PolyBase services
                                                                               PolyBase services


To confirm that the PolyBase installation is successful, you can run the following command in SQL Server Management Studio new query window.

SELECT SERVERPROPERTY ('IsPolyBaseInstalled') AS IsSuccessfullyInstalled;

If the installation is successful, it will display ‘1’ in the results window.


Query excel file using PolyBase

SQL Server 2019 introduced a facility to connect Excel file to SQL Server Using PolyBase via ODBC driver. Before starting, you need to make sure that the correct ODBC driver is installed on your computer. To check that you need to open the ODBC Data Source Administrator application. In the ODBC Data source driver list, the following driver should be installed. If not, you can download it from the here.


4 ODBC driver installation
                                                              ODBC driver installation


To read excel file data using SQL Server PolyBase, you need to follow the below steps.

  • First, create a new SQL Server Database.
  • Then, generate the Master Key for above created database using the following command.

  • Next step is to create a new excel file or download a sample excel file from the web (For this example, download the excel file from here)


  • After downloading the excel file, create a new External data source for it using the following command. You should give the correct excel file path for CONNECTION_OPTIONS. For the password, you can use the previously created password for the master key.



OPEN MASTER KEY DECRYPTION BY PASSWORD = '<Enter created master key Password >';
    LOCATION = 'odbc://noplace',

  CONNECTION_OPTIONS = 'Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)}; DBQ=D:\PolyBase\FinancialSample.xlsx'


  • If this command is successfully executed, it should create a new Data Source in External Data Source section in the created database.


5 Created external data source
     Created external data source


  • Next, you need to generate an external table for the excel file. For that, you can use the following command. Excel sheet column names and Table column names should match each other, data types should match each other as well. For the Location, you have to use the name of the excel sheet.


  CREATE EXTERNAL TABLE dbo.FinancialSample


        Segment NVARCHAR(255),
        Country NVARCHAR(255),
        Product NVARCHAR(255),
        DiscountBand NVARCHAR(255),
        UnitsSold FLOAT(53),
        ManufacturingPrice FLOAT(53),
        SalePrice FLOAT(53),
        GrossSales FLOAT(53),     
        COGS FLOAT(53),
        Profit FLOAT(53),
        MonthNumber FLOAT(53),
        MonthName NVARCHAR(255),
        Year NVARCHAR(255)
        LOCATION = '[Sheet1$]',
        DATA_SOURCE = FinancialSample


  • When this is successfully executed, it should display the new External Table in your created database.


6 Created external table
             Created external table



  • Now you can Query Excel file data using the SQL Server. To do that you can run the following select command for the created external table. If this is executed successfully, the result section should display excel file data.


SELECT  [Segment]


  FROM [PolyBaseDemo].[dbo].[FinancialSample]


7 Query Results window
                                                                                           Query Results window


  • If you change data in your excel file and rerun the above select query, those changes will appear in the results window. There’s no need to run any other Import query.



You are reading these data from the originally located excel file. You’re not importing data to SQL Server Database using any other external method. Therefore, you can save the data importing time. Also, you can query the real time data in the file. So, this is very important when working with big data.





© 2021 Creative Software. All Rights Reserved | Privacy | Terms of Use