cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Is it possible to connect QGIS to Databricks using ODBC?

gfar
New Contributor II

I can connect ArcGIS to Databricks using ODBC, but using the same ODBC DSN for QGIS I get an error - Unable to initialize ODBC connection to DSN

Has anyone got this working?

13 REPLIES 13

Anonymous
Not applicable

@Grainne Farrantโ€‹ :

It is possible to connect QGIS to Databricks using ODBC, but it requires additional configuration. Here are the general steps to follow:

  1. Install the ODBC driver for Databricks on your machine where QGIS is installed. You can download the driver from the Databricks website.
  2. Set up a DSN for Databricks in the ODBC Data Source Administrator on your machine. Make sure you test the connection to ensure it is working properly.
  3. In QGIS, go to the Layer menu and choose Add Layer > Add Vector Layer. Choose the Database tab and click the New button to create a new connection.
  4. Choose the ODBC option and enter the name of the DSN you created in step 2.
  5. Enter your Databricks username and password and click Test Connection to make sure it is working properly.
  6. Once the connection is established, you should be able to see the tables and views available in your Databricks instance and add them to QGIS as layers.

Note that some users have reported issues with using ODBC drivers with QGIS on MacOS, so you may need to use a different approach if you are on that platform.

gfar
New Contributor II

Hello,

I'm on Windows. I have downloaded the driver, set-up the DSN for Databricks and tested it - all works fine. I have used this DSN to connect to Databricks from ArcGIS.

In QGIS, I followed your instructions and when I click test connections, I get an error - Connection failed- check settings and try again. Extended error information: Unable to initialize ODBC connection to DSN for <username>/<password>@<DSN>

I'm using QGIS 3.28.5-Firenze.

lukeoz
New Contributor III

I have been able to connect to Databricks in QGIS using the ODBC connection. An extra step not listed above that I had to do was to generate a personal access token (Databricks user Settings > Developer > Access tokens > Manage > Generate new token).

When setting up the DSN in step 2 above, I used the "User Name and Password" Authentication mechansim, using "token" as the username and the token generated above as the password. SSL has to be enabled in the SSL Options, and the value for Thrift Transport set to HTTP with the HTTP path value (available from the Databricks cluster's Advanced Options JDBC/ODBC tab) set in the HTTP Options

 

Then, once in QGIS, I was able to go to Layer > Add Layer > Add Vector Layer, choose Database, type ODBC, create a new connection, with the value of the Data Source Name when creating the DSN in ODBC DSN Setup used as the value for Database in the Connection information, and for the Authentication configuration, add an authentication configuration, using the same username ("token") and password (personal access token value) from creating the DSN

This all worked for me, and I was successfully able to connect to the database and the tables within them ... 

However, despite the tables I want having fields of "geometry" type, the connector doesn't seem to recognise these fields and be able to draw them in QGIS.

This is where I am currently stuck, if anyone is able to suggest what can be done to display the spatial features from the table stored in Databricks. The geometry features are valid, as I am able to visualise them using SedonaKepler in the local Databricks notebook. In QGIS, the geometry field values appear as "BLOB" in the attribute table and do not seem to be recognised as spatial features.

fgoulet
New Contributor III

Hi!

Did you do something on the DB-side to be able to connect from QGIS? 

When I test my connection in QGIS it works, but when I try to "add", there's an error message saying that "

invalid Datasource: ODBC:token/xxxxxxxxxxxxxxxxxxxxxxxx@sql_warehouse_analyst is not a valid datasource or recognize" (free translation from French ๐Ÿ˜‰ 

Our storage is parquet. Do we need Delta tables?

Thanks!

lukeoz
New Contributor III

I was loading in Delta tables, yes.

fgoulet
New Contributor III

Maybe that's our problem. Thanks!

lukeoz
New Contributor III

Good luck, and let me know if you can get the spatial features showing as well!

fgoulet
New Contributor III

I've confirmed that we use Delta Table. For the geometry column, I was going to do a ST_GeomFromWKT or ... to create a Query Layer...

On Databricks, in the Query History, I have what is attached... It's not a query. Problem from QGIS? Could I ask which version you are using? I have 3.38.2-Grenoble

Thanks!

fgoulet
New Contributor III

We needed to add our schema as Server Side Properties in the Advanced Options. We now see our tables in QGIS.

To view the geometry, you can use the Database Manager and choose "Virtual Layers/Project Layers" (I'm guessing, my QIGS is in French) and use 'st_geometryfromtext' to create a Query Layer. 

I tried with points, but as long as your geometry is in WKT in Databricks, it should work!

lukeoz
New Contributor III

With respect to adding your schema as Server Side Properties in the Advanced Options, could you please provide a screenshot so that I can see what the Keys and Values look like for your schema? Maybe this is what I am missing??

I am using QGIS 3.34 (long term release version)

My Delta table in Databricks had the spatial features as a geometry type in its schema, but in QGIS it did not recognise it as geometry, with the attribute table not showing any spatial features, with "BLOB" appearing as the value for all rows in the geometry field.

lukeoz_1-1736292811851.png

lukeoz_3-1736292883094.png

 

What I managed to do, however, was to add a field to the Delta table of the geometry WKT and store the field as a string type. When QGIS reads in this table, I then created a virtual layer, with the query on the Delta table running a ST_GeomFromText transformation on the geometry string type field, and it displayed the spatial features that way.

lukeoz_4-1736293064657.png

lukeoz_5-1736293080775.png

While this is a workaround, I would like to think it is possible to store the geometry features in the Delta table as a geometry type, and for QGIS to recognise that field as geometry.

 

Please let me know about how the schema was added to the Server Side Properties ... if that doesn't work, I may have to stick with keeping the geometry as a string format in the Delta Table

 

Merci! ๐Ÿ™‚

 

 

 

 

fgoulet
New Contributor III

The key is "databricks.catalog", but it didn't solved my problem with the geometry either. It only works for WKT (but, don't quote me on this ๐Ÿ˜‰ I heard that Databricks may be working on a geometry type that could be native (we have geometry as a UDT and use Sedona).

Also, I have probably 30 tables in my schema (not that much) with millions of row (again, not billions) and yesterday it analyzed the schema for hours (the objects being greyed and unselectionable) before being able to add them to QGIS. 

lukeoz
New Contributor III

Ah, maybe that's what the problem is. With respect to the large number of tables, I have experienced the same problem. I have tried changing the ODBC connection's HTTP path so that it connected to a Serverless Warehouse, but I haven't been able to work that out yet. Also, while I am still trying to work out how to only connect to a single named schema (I haven't been able to figure out a wildcard), I can connect to a single Delta table, which is a much quicker process. In the Database field, I added the name of the table after the name of the ODBC connection.

It only lists that one table, but it does make the process much quicker. Hopefully that is of some help. If you make progress in other areas, I'll be happy to hear from you!

 

lukeoz_0-1736373484519.png

 

fgoulet
New Contributor III

That should probably help, but I tried and my table has 0 rows when the same table loaded with all the schema analyzed has 835...

Still have testing to do, but with that, you can now choose a single file to add using the connection string
ODBC:token/yourpersonalprivatetoken@yourodbcconnection,yourtable

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group