cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

SQLAlchemy ORM Connection String Error

Frank
New Contributor III

We tried to insert records to Delta table using ORM. It looks like only SQLAlchemy has option to connect to Delta table.

We tried the following code

from sqlalchemy import Column, String, DateTime, Integer, create_engine 
 
engine = create_engine("databricks+pyhive://token:<mytoken>@dbc-5a522242-184b.cloud.databricks.com:443/default", connect_args={"http_path": "/sql/1.0/endpoints/955e53e5f53e8105"}, echo=True)
 
 
engine = create_engine("databricks+pyhive://token:<mytoken> @dbc-5a522242-184b.cloud.databricks.com:443/default", connect_args={"cluster": "Starter Warehouse"}, echo=True)
 
 
Base.metadata.create_all(engine)

But there is errors

EOFError                                  Traceback (most recent call last)
<command-1061455742937075> in <module>
      1 # 3 - create table in db (migration)
      2 print(BASE_DIR)
----> 3 Base.metadata.create_all(engine)
 
/local_disk0/pythonVirtualEnvDirs/virtualEnv-7c147b6d-75bb-4981-8ab3-879a1c8e5fa7/lib/python3.8/site-packages/sqlalchemy/sql/schema.py in create_all(self, bind, tables, checkfirst)
   4915         if bind is None:
   4916             bind = _bind_or_error(self)
-> 4917         bind._run_ddl_visitor(
   4918             ddl.SchemaGenerator, self, checkfirst=checkfirst, tables=tables
   4919         )
 
/local_disk0/pythonVirtualEnvDirs/virtualEnv-7c147b6d-75bb-4981-8ab3-879a1c8e5fa7/lib/python3.8/site-packages/sqlalchemy/engine/base.py in _run_ddl_visitor(self, visitorcallable, element, **kwargs)
   3225 
   3226     def _run_ddl_visitor(self, visitorcallable, element, **kwargs):
-> 3227         with self.begin() as conn:
   3228             conn._run_ddl_visitor(visitorcallable, element, **kwargs)
   3229 

9 REPLIES 9

Ryan_Chynoweth
Esteemed Contributor

Hi @Frank Zhangโ€‹ , I have not seen someone try this yet. Why are you looking to write data to delta using ORM? Is there some documentation/blog you are following to do this?

Typically you could just use df.write.saveAsTable to save data to delta.

Frank
New Contributor III

Below are some examples. There is a library also for that.

https://github.com/dropbox/PyHive/issues/305

https://dev.to/fruiza/build-a-python-flask-api-for-your-delta-lake-3bgf

I guess we are open to other ways. This is only related with the ingestion data side. We have some transactions that we want to read in. Using ORM can make sure other existing app and code can work with it.

Ryan_Chynoweth
Esteemed Contributor

@Frank Zhangโ€‹ - I see that makes sense. So you are not coding in a Databricks notebook correct? You have a Flask application you developed that is running against Databricks cluster/warehouse.

1 - If I understand correctly (as stated above). Do you have the driver installed? I think this would likely require you to use a Simba Driver to create the connection. I think this would be the only way to use ORM.

2 - If you don't want to a driver you can use the Python Connector for Databricks. Check out this blog I wrote that connects a flask application. Note that I don't think ORM would work here but you may be able to find a way to do so.

Is that helpful?

Frank
New Contributor III

Hi Ryan

Yes the item 2 Python SQL Connector is the first thing I tried before I was asked to explore on the ORM option.

So if the SQL connector works, I still need to install the Simba driver? Do I need to install this driver in my local server where I run flask or do I need to install this in Databrick cluster?

I thought since SQL connector can work, maybe I do not need another driver.

Frank
New Contributor III

I tried to install the JDBC driver on MAC OS. I am using Python code. Can I still use this driver? Is there a doc on how to install it on Mac?

โ€œDatabricksJDBC42.jarโ€ cannot be opened because it is from an unidentified developer.

Ryan_Chynoweth
Esteemed Contributor

Hi @Frank Zhangโ€‹ ,

Please disregard the driver comment.

The Python SQL Connector requires no driver. Just a pip install and you are good to go.

The links you provided don't actually show a working example of using SQL Alchemy's ORM to connect to Databricks. It looks like someone created a library to do this that is not part of the SQLAlchemy project to solve this problem.

While I am not certain I do not believe you can use SQLAlchemy ORM to read the data. My recommendation would be to use the Python SQL Connector.

When you are reading the data, where are you reading it from? Application or Databricks?

Frank
New Contributor III
Hi Ryan
How come the Redash can use Flask and SQLAlchemy ORM for Delta Lake?
We can use SQL for now but it will be good to know which way is better? The SQL connector is super slow for some reason.
1. Is there other data ingestion method you recommend for fast and transactional data?
2. Why Redash is using ORM? Is it faster than SQL?
3. Currently in query analyzer, each query is planning time is 2S why the planning time so long with SQL connector?
Thanks,

Ryan_Chynoweth
Esteemed Contributor

Do you have an example of Redash+SQLAlchemy ORM? It could be possible I just can't find an example of it to take a look.

A few follow up questions:

  • How much data are you reading?
  • What size SQL warehouse are you using?
  • How long is it taking?
  • Why are you using Redash instead of Databricks SQL?

Frank
New Contributor III
Thanks Ryan for your help and inputs.
When you are reading the data, where are you reading it from? Application or Databricks?
We got the data from machine learning machines. They will generate the metrics at a constant rate.
We need to put those data somewhere, for example Delta lake.
The front end dashboard will read the data. It could be some aggregated function on the raw data, for example min/max/average of the raw data. We will not use Databrick to read the data.

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group