SQLAlchemy ORM Connection String Error
09-21-2022 12:55 PM
We tried to insert records into a Delta table using an ORM. It looks like SQLAlchemy is the only ORM with an option to connect to a Delta table.
We tried the following code:
from sqlalchemy import Column, String, DateTime, Integer, create_engine
from sqlalchemy.orm import declarative_base
Base = declarative_base()  # declarative base for the ORM models
# Attempt 1: connect through a SQL endpoint (http_path)
engine = create_engine("databricks+pyhive://token:<mytoken>@dbc-5a522242-184b.cloud.databricks.com:443/default", connect_args={"http_path": "/sql/1.0/endpoints/955e53e5f53e8105"}, echo=True)
# Attempt 2: connect through an interactive cluster by name
engine = create_engine("databricks+pyhive://token:<mytoken>@dbc-5a522242-184b.cloud.databricks.com:443/default", connect_args={"cluster": "Starter Warehouse"}, echo=True)
Base.metadata.create_all(engine)
But we get the following error:
EOFError Traceback (most recent call last)
<command-1061455742937075> in <module>
1 # 3 - create table in db (migration)
2 print(BASE_DIR)
----> 3 Base.metadata.create_all(engine)
/local_disk0/pythonVirtualEnvDirs/virtualEnv-7c147b6d-75bb-4981-8ab3-879a1c8e5fa7/lib/python3.8/site-packages/sqlalchemy/sql/schema.py in create_all(self, bind, tables, checkfirst)
4915 if bind is None:
4916 bind = _bind_or_error(self)
-> 4917 bind._run_ddl_visitor(
4918 ddl.SchemaGenerator, self, checkfirst=checkfirst, tables=tables
4919 )
/local_disk0/pythonVirtualEnvDirs/virtualEnv-7c147b6d-75bb-4981-8ab3-879a1c8e5fa7/lib/python3.8/site-packages/sqlalchemy/engine/base.py in _run_ddl_visitor(self, visitorcallable, element, **kwargs)
3225
3226 def _run_ddl_visitor(self, visitorcallable, element, **kwargs):
-> 3227 with self.begin() as conn:
3228 conn._run_ddl_visitor(visitorcallable, element, **kwargs)
3229
- Labels: Delta, SQLAlchemy ORM Connection
09-21-2022 03:12 PM
Hi @Frank Zhang, I have not seen anyone try this yet. Why are you looking to write data to Delta using an ORM? Is there some documentation/blog you are following to do this?
Typically you could just use df.write.saveAsTable to save data to Delta.
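Something like this, for reference (a minimal PySpark sketch; the table and column names are made up for illustration):
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Hypothetical batch of metrics; the schema is illustrative only
df = spark.createDataFrame(
    [(1, "2022-09-21T12:00:00", 0.42)],
    ["run_id", "event_time", "metric"],
)
# saveAsTable writes a managed table; on Databricks this is Delta by default
df.write.mode("append").saveAsTable("default.ml_metrics")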
09-22-2022 03:30 PM
Below are some examples. There is also a library for that.
https://github.com/dropbox/PyHive/issues/305
https://dev.to/fruiza/build-a-python-flask-api-for-your-delta-lake-3bgf
We are open to other approaches, though. This is only related to the data ingestion side: we have some transactions that we want to bring in, and using an ORM would let other existing apps and code keep working with it.
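If it helps, the third-party sqlalchemy-databricks dialect that those links point to would be used roughly like this (a sketch only, not verified end to end; the hostname, token, and endpoint ID are placeholders):
# Assumes: pip install sqlalchemy-databricks (third-party package)
from sqlalchemy import create_engine, text

engine = create_engine(
    "databricks+connector://token:<mytoken>@dbc-xxxx.cloud.databricks.com:443/default",
    connect_args={"http_path": "/sql/1.0/endpoints/<endpoint-id>"},
)
with engine.connect() as conn:
    # Simple smoke test of the connection
    print(conn.execute(text("SELECT 1")).fetchall())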
09-23-2022 10:45 AM
@Frank Zhang - I see, that makes sense. So you are not coding in a Databricks notebook, correct? You have a Flask application you developed that runs against a Databricks cluster/warehouse.
1 - If I understand correctly (as stated above), do you have the driver installed? I think this would likely require you to use a Simba driver to create the connection, and I think that would be the only way to use the ORM.
2 - If you don't want to install a driver, you can use the Python SQL Connector for Databricks. Check out this blog I wrote that connects a Flask application. Note that I don't think the ORM would work here, but you may be able to find a way to do so; a minimal sketch is below.
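A minimal sketch of option 2 with the databricks-sql-connector package (the server hostname, HTTP path, and access token below are placeholders):
# Assumes: pip install databricks-sql-connector
from databricks import sql

with sql.connect(
    server_hostname="dbc-xxxx.cloud.databricks.com",
    http_path="/sql/1.0/endpoints/<endpoint-id>",
    access_token="<mytoken>",
) as connection:
    with connection.cursor() as cursor:
        # Simple smoke test; replace with your real query
        cursor.execute("SELECT 1")
        print(cursor.fetchall())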
Is that helpful?
09-23-2022 02:16 PM
Hi Ryan,
Yes, item 2 (the Python SQL Connector) is the first thing I tried, before I was asked to explore the ORM option.
So if the SQL connector works, do I still need to install the Simba driver? And do I need to install the driver on my local server where I run Flask, or on the Databricks cluster?
I thought that since the SQL connector works, maybe I do not need another driver.
09-23-2022 04:39 PM
I tried to install the JDBC driver on macOS. I am using Python code. Can I still use this driver? Is there a doc on how to install it on a Mac? I get this error:
“DatabricksJDBC42.jar” cannot be opened because it is from an unidentified developer.
09-26-2022 08:42 AM
Hi @Frank Zhang,
Please disregard the driver comment.
The Python SQL Connector requires no driver. Just a pip install databricks-sql-connector and you are good to go.
The links you provided don't actually show a working example of using SQLAlchemy's ORM to connect to Databricks. It looks like someone created a separate library, not part of the SQLAlchemy project, to solve this problem.
While I am not certain, I do not believe you can use the SQLAlchemy ORM to read the data. My recommendation would be to use the Python SQL Connector.
When you are reading the data, where are you reading it from? Application or Databricks?
09-26-2022 07:28 PM
How come Redash can use Flask and the SQLAlchemy ORM with Delta Lake?
We can use SQL for now, but it would be good to know which way is better. The SQL connector is super slow for some reason.
1. Is there another data ingestion method you recommend for fast, transactional data?
2. Why is Redash using the ORM? Is it faster than SQL?
3. Currently in the query analyzer, each query's planning time is 2 s. Why is the planning time so long with the SQL connector?
Thanks,
09-27-2022 06:21 AM
Do you have an example of Redash + SQLAlchemy ORM? It could be possible; I just can't find an example of it to look at.
A few follow-up questions:
- How much data are you reading?
- What size SQL warehouse are you using?
- How long is it taking?
- Why are you using Redash instead of Databricks SQL?
09-26-2022 07:31 PM
When you are reading the data, where are you reading it from? Application or Databricks?
We get the data from machine learning machines, which generate metrics at a constant rate.
We need to put that data somewhere, for example a Delta lake.
The front-end dashboard will read the data. It could be some aggregate function over the raw data, for example the min/max/average of the raw metrics. We will not use Databricks to read the data.