09-21-2022 12:55 PM
We tried to insert records to Delta table using ORM. It looks like only SQLAlchemy has option to connect to Delta table.
We tried the following code
from sqlalchemy import Column, String, DateTime, Integer, create_engine
engine = create_engine("databricks+pyhive://token:<mytoken>@dbc-5a522242-184b.cloud.databricks.com:443/default", connect_args={"http_path": "/sql/1.0/endpoints/955e53e5f53e8105"}, echo=True)
engine = create_engine("databricks+pyhive://token:<mytoken> @dbc-5a522242-184b.cloud.databricks.com:443/default", connect_args={"cluster": "Starter Warehouse"}, echo=True)
Base.metadata.create_all(engine)
But there is errors
EOFError Traceback (most recent call last)
<command-1061455742937075> in <module>
1 # 3 - create table in db (migration)
2 print(BASE_DIR)
----> 3 Base.metadata.create_all(engine)
/local_disk0/pythonVirtualEnvDirs/virtualEnv-7c147b6d-75bb-4981-8ab3-879a1c8e5fa7/lib/python3.8/site-packages/sqlalchemy/sql/schema.py in create_all(self, bind, tables, checkfirst)
4915 if bind is None:
4916 bind = _bind_or_error(self)
-> 4917 bind._run_ddl_visitor(
4918 ddl.SchemaGenerator, self, checkfirst=checkfirst, tables=tables
4919 )
/local_disk0/pythonVirtualEnvDirs/virtualEnv-7c147b6d-75bb-4981-8ab3-879a1c8e5fa7/lib/python3.8/site-packages/sqlalchemy/engine/base.py in _run_ddl_visitor(self, visitorcallable, element, **kwargs)
3225
3226 def _run_ddl_visitor(self, visitorcallable, element, **kwargs):
-> 3227 with self.begin() as conn:
3228 conn._run_ddl_visitor(visitorcallable, element, **kwargs)
3229
09-21-2022 03:12 PM
Hi @Frank Zhang , I have not seen someone try this yet. Why are you looking to write data to delta using ORM? Is there some documentation/blog you are following to do this?
Typically you could just use df.write.saveAsTable to save data to delta.
09-22-2022 03:30 PM
Below are some examples. There is a library also for that.
https://github.com/dropbox/PyHive/issues/305
https://dev.to/fruiza/build-a-python-flask-api-for-your-delta-lake-3bgf
I guess we are open to other ways. This is only related with the ingestion data side. We have some transactions that we want to read in. Using ORM can make sure other existing app and code can work with it.
09-23-2022 10:45 AM
@Frank Zhang - I see that makes sense. So you are not coding in a Databricks notebook correct? You have a Flask application you developed that is running against Databricks cluster/warehouse.
1 - If I understand correctly (as stated above). Do you have the driver installed? I think this would likely require you to use a Simba Driver to create the connection. I think this would be the only way to use ORM.
2 - If you don't want to a driver you can use the Python Connector for Databricks. Check out this blog I wrote that connects a flask application. Note that I don't think ORM would work here but you may be able to find a way to do so.
Is that helpful?
09-23-2022 02:16 PM
Hi Ryan
Yes the item 2 Python SQL Connector is the first thing I tried before I was asked to explore on the ORM option.
So if the SQL connector works, I still need to install the Simba driver? Do I need to install this driver in my local server where I run flask or do I need to install this in Databrick cluster?
I thought since SQL connector can work, maybe I do not need another driver.
09-23-2022 04:39 PM
I tried to install the JDBC driver on MAC OS. I am using Python code. Can I still use this driver? Is there a doc on how to install it on Mac?
“DatabricksJDBC42.jar” cannot be opened because it is from an unidentified developer.
09-26-2022 08:42 AM
Hi @Frank Zhang ,
Please disregard the driver comment.
The Python SQL Connector requires no driver. Just a pip install and you are good to go.
The links you provided don't actually show a working example of using SQL Alchemy's ORM to connect to Databricks. It looks like someone created a library to do this that is not part of the SQLAlchemy project to solve this problem.
While I am not certain I do not believe you can use SQLAlchemy ORM to read the data. My recommendation would be to use the Python SQL Connector.
When you are reading the data, where are you reading it from? Application or Databricks?
09-26-2022 07:28 PM
09-27-2022 06:21 AM
Do you have an example of Redash+SQLAlchemy ORM? It could be possible I just can't find an example of it to take a look.
A few follow up questions:
09-26-2022 07:31 PM
Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.
If there isn’t a group near you, start one and help create a community that brings people together.
Request a New Group