SQLAlchemy ORM Connection String Error
09-21-2022 12:55 PM
We tried to insert records into a Delta table using an ORM. It looks like SQLAlchemy is the only ORM with an option to connect to a Delta table.
We tried the following code:
from sqlalchemy import Column, String, DateTime, Integer, create_engine
from sqlalchemy.orm import declarative_base
Base = declarative_base()  # declarative base for the ORM models
# Attempt 1: connect through a SQL endpoint (http_path)
engine = create_engine("databricks+pyhive://token:<mytoken>@dbc-5a522242-184b.cloud.databricks.com:443/default", connect_args={"http_path": "/sql/1.0/endpoints/955e53e5f53e8105"}, echo=True)
# Attempt 2: connect through an interactive cluster by name
engine = create_engine("databricks+pyhive://token:<mytoken>@dbc-5a522242-184b.cloud.databricks.com:443/default", connect_args={"cluster": "Starter Warehouse"}, echo=True)
Base.metadata.create_all(engine)
But we get the following error:
EOFError Traceback (most recent call last)
<command-1061455742937075> in <module>
1 # 3 - create table in db (migration)
2 print(BASE_DIR)
----> 3 Base.metadata.create_all(engine)
/local_disk0/pythonVirtualEnvDirs/virtualEnv-7c147b6d-75bb-4981-8ab3-879a1c8e5fa7/lib/python3.8/site-packages/sqlalchemy/sql/schema.py in create_all(self, bind, tables, checkfirst)
4915 if bind is None:
4916 bind = _bind_or_error(self)
-> 4917 bind._run_ddl_visitor(
4918 ddl.SchemaGenerator, self, checkfirst=checkfirst, tables=tables
4919 )
/local_disk0/pythonVirtualEnvDirs/virtualEnv-7c147b6d-75bb-4981-8ab3-879a1c8e5fa7/lib/python3.8/site-packages/sqlalchemy/engine/base.py in _run_ddl_visitor(self, visitorcallable, element, **kwargs)
3225
3226 def _run_ddl_visitor(self, visitorcallable, element, **kwargs):
-> 3227 with self.begin() as conn:
3228 conn._run_ddl_visitor(visitorcallable, element, **kwargs)
3229
- Labels: Delta, SQLAlchemy ORM Connection
09-21-2022 03:12 PM
Hi @Frank Zhang, I have not seen anyone try this yet. Why are you looking to write data to Delta using an ORM? Is there some documentation/blog you are following to do this?
Typically you could just use df.write.saveAsTable to save data to Delta.
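Something like this, for reference (a minimal PySpark sketch; the table and column names are made up for illustration):
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Hypothetical batch of metrics; the schema is illustrative only
df = spark.createDataFrame(
    [(1, "2022-09-21T12:00:00", 0.42)],
    ["run_id", "event_time", "metric"],
)
# saveAsTable writes a managed table; on Databricks this is Delta by default
df.write.mode("append").saveAsTable("default.ml_metrics")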
09-22-2022 03:30 PM
Below are some examples. There is also a library for that.
https://github.com/dropbox/PyHive/issues/305
https://dev.to/fruiza/build-a-python-flask-api-for-your-delta-lake-3bgf
We are open to other approaches, though. This is only related to the data ingestion side: we have some transactions that we want to bring in, and using an ORM would let other existing apps and code keep working with it.
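If it helps, the third-party sqlalchemy-databricks dialect that those links point to would be used roughly like this (a sketch only, not verified end to end; the hostname, token, and endpoint ID are placeholders):
# Assumes: pip install sqlalchemy-databricks (third-party package)
from sqlalchemy import create_engine, text

engine = create_engine(
    "databricks+connector://token:<mytoken>@dbc-xxxx.cloud.databricks.com:443/default",
    connect_args={"http_path": "/sql/1.0/endpoints/<endpoint-id>"},
)
with engine.connect() as conn:
    # Simple smoke test of the connection
    print(conn.execute(text("SELECT 1")).fetchall())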
09-23-2022 10:45 AM
@Frank Zhang - I see, that makes sense. So you are not coding in a Databricks notebook, correct? You have a Flask application you developed that runs against a Databricks cluster/warehouse.
1 - If I understand correctly (as stated above), do you have the driver installed? I think this would likely require you to use a Simba driver to create the connection, and I think that would be the only way to use the ORM.
2 - If you don't want to install a driver, you can use the Python SQL Connector for Databricks. Check out this blog I wrote that connects a Flask application. Note that I don't think the ORM would work here, but you may be able to find a way to do so; a minimal sketch is below.
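A minimal sketch of option 2 with the databricks-sql-connector package (the server hostname, HTTP path, and access token below are placeholders):
# Assumes: pip install databricks-sql-connector
from databricks import sql

with sql.connect(
    server_hostname="dbc-xxxx.cloud.databricks.com",
    http_path="/sql/1.0/endpoints/<endpoint-id>",
    access_token="<mytoken>",
) as connection:
    with connection.cursor() as cursor:
        # Simple smoke test; replace with your real query
        cursor.execute("SELECT 1")
        print(cursor.fetchall())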
Is that helpful?
09-23-2022 02:16 PM
Hi Ryan,
Yes, item 2 (the Python SQL Connector) is the first thing I tried, before I was asked to explore the ORM option.
So if the SQL connector works, do I still need to install the Simba driver? And do I need to install the driver on my local server where I run Flask, or on the Databricks cluster?
I thought that since the SQL connector works, maybe I do not need another driver.
09-23-2022 04:39 PM
I tried to install the JDBC driver on macOS. I am using Python code. Can I still use this driver? Is there a doc on how to install it on a Mac? I get this error:
“DatabricksJDBC42.jar” cannot be opened because it is from an unidentified developer.
09-26-2022 08:42 AM
Hi @Frank Zhang,
Please disregard the driver comment.
The Python SQL Connector requires no driver. Just a pip install databricks-sql-connector and you are good to go.
The links you provided don't actually show a working example of using SQLAlchemy's ORM to connect to Databricks. It looks like someone created a separate library, not part of the SQLAlchemy project, to solve this problem.
While I am not certain, I do not believe you can use the SQLAlchemy ORM to read the data. My recommendation would be to use the Python SQL Connector.
When you are reading the data, where are you reading it from? Application or Databricks?
09-26-2022 07:28 PM
How come Redash can use Flask and the SQLAlchemy ORM with Delta Lake?
We can use SQL for now, but it would be good to know which way is better. The SQL connector is super slow for some reason.
1. Is there another data ingestion method you recommend for fast, transactional data?
2. Why is Redash using the ORM? Is it faster than SQL?
3. Currently in the query analyzer, each query's planning time is 2 s. Why is the planning time so long with the SQL connector?
Thanks,
09-27-2022 06:21 AM
Do you have an example of Redash + SQLAlchemy ORM? It could be possible; I just can't find an example of it to look at.
A few follow-up questions:
- How much data are you reading?
- What size SQL warehouse are you using?
- How long is it taking?
- Why are you using Redash instead of Databricks SQL?
09-26-2022 07:31 PM
When you are reading the data, where are you reading it from? Application or Databricks?
We get the data from machine learning machines, which generate metrics at a constant rate.
We need to put that data somewhere, for example a Delta lake.
The front-end dashboard will read the data. It could be some aggregate function over the raw data, for example the min/max/average of the raw metrics. We will not use Databricks to read the data.