Data Engineering

Forum Posts

by AP • New Contributor III

08-01-2022 3:02:58 PM

1352 Views
2 replies
2 kudos

How can we connect to the Databricks-managed metastore?

Hi, I am trying to take advantage of the treasure trove of information that the metastore contains and take some actions to improve performance. In my case, the metastore is managed by Databricks; we don't use an external metastore. How can I connect to ...

Latest Reply

Prabakar
Esteemed Contributor III

08-01-2022 3:25:27 PM

2 kudos

@AKSHAY PALLERLA You can get the JDBC/ODBC information from the cluster configuration. On the cluster configuration page, under Advanced Options, there is a JDBC/ODBC tab. Click that tab and it should give you the details you are looking ...
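
A minimal sketch of using those details from Python, assuming the `databricks-sql-connector` package is installed, a personal access token is used for authentication, and the tab shows a JDBC URL of the form `jdbc:spark://<host>:443/default;...;httpPath=<path>;...`. The helper names `parse_jdbc_url` and `list_tables` are hypothetical; `sql.connect`, `cursor.execute` and `fetchall` are the connector's own calls.

```python
def parse_jdbc_url(jdbc_url: str) -> dict:
    """Extract server_hostname and http_path from a JDBC URL of the assumed
    form jdbc:spark://<host>:<port>/<db>;key=value;key=value..."""
    for prefix in ("jdbc:spark://", "jdbc:databricks://"):
        if jdbc_url.startswith(prefix):
            break
    else:
        raise ValueError("unrecognised JDBC URL prefix")
    head, *params = jdbc_url[len(prefix):].split(";")
    host = head.split("/", 1)[0].split(":", 1)[0]  # drop port and database
    options = dict(p.split("=", 1) for p in params if "=" in p)
    return {"server_hostname": host, "http_path": options.get("httpPath")}


def list_tables(jdbc_url: str, token: str, database: str = "default"):
    """Return the rows of SHOW TABLES for one database, via the cluster."""
    # Imported lazily so parse_jdbc_url works without the connector installed.
    from databricks import sql

    params = parse_jdbc_url(jdbc_url)
    with sql.connect(
        server_hostname=params["server_hostname"],
        http_path=params["http_path"],
        access_token=token,
    ) as connection:
        with connection.cursor() as cursor:
            # `database` is interpolated directly: pass trusted names only.
            cursor.execute(f"SHOW TABLES IN {database}")
            return cursor.fetchall()
```

Other metastore queries (for example `DESCRIBE DETAIL <table>` for Delta tables) could be run through the same cursor.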

by ThomasKastl • Contributor

07-26-2022 12:47:18 AM

2341 Views
6 replies
5 kudos

Resolved! Databricks runs cell, but stops output and hangs afterwards.

tl;dr: A cell that executes purely on the head node stops printed output during execution, but output still shows up in the cluster logs. After execution of the cell, Databricks does not notice the cell is finished and gets stuck. When trying to canc...

Latest Reply

Hubert-Dudek
Esteemed Contributor III

07-26-2022 6:00:44 AM

5 kudos

As that library works on pandas, the problem may be that it doesn't support pandas on Spark. Locally, you are probably using non-distributed pandas. You can check the behavior by switching between:

import pandas as pd
import pyspark.pandas as pd
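
A minimal sketch for making that switch from a single flag, plus a guard that hands the library plain pandas objects, assuming the library only understands plain pandas. The helper names are hypothetical; `to_pandas()` is the pandas-on-Spark method that converts to a plain pandas DataFrame.

```python
import importlib


def pandas_module_name(distributed: bool) -> str:
    """pandas API on Spark when distributed=True, plain pandas otherwise."""
    return "pyspark.pandas" if distributed else "pandas"


def load_pandas_api(distributed: bool):
    """Import the chosen module so the same cell can be re-run with either."""
    return importlib.import_module(pandas_module_name(distributed))


def ensure_plain_pandas(df):
    """Convert a pandas-on-Spark DataFrame to plain pandas before passing it
    to a pandas-only library. Note: to_pandas() collects all rows to the
    driver, so use it only on data that fits in driver memory."""
    if type(df).__module__.startswith("pyspark.pandas") and hasattr(df, "to_pandas"):
        return df.to_pandas()
    return df
```

In a notebook, `pd = load_pandas_api(distributed=True)` at the top of the cell would replace the hand-edited import.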

by 165036 • New Contributor III

07-25-2022 11:09:31 PM

1001 Views
1 replies
1 kudos

Resolved! Mounting of S3 bucket via Terraform is frequently timing out

Summary of the problem: When mounting an S3 bucket via Terraform, the creation process frequently times out (running beyond 10 minutes). When I check the Log4j logs in the GP cluster, I see the following error message repeated: ```22/07/26 05:54:43 ER...

Latest Reply

165036
New Contributor III

07-31-2022 8:44:28 PM

1 kudos

Solved. See here: https://github.com/databricks/terraform-provider-databricks/issues/1500

by dataAllMyLife • New Contributor

04-27-2022 1:16:20 PM

655 Views
1 replies
0 kudos

JDBC connection closes between stmt.execute( ... ) and stmt.executeQuery( ... )

I'm running a Java application that registers a CSV table with Hive and then checks the number of rows imported. It's done in several steps:

Statement stmt = con.createStatement();
....
stmt.execute( "CREATE TABLE ( <definition> < > );
.....
ResultSet rs...

Latest Reply

Noopur_Nigam
Valued Contributor II

07-31-2022 2:31:21 AM

0 kudos

@Reto Matter Are you running a JAR job or using Databricks Connect to run the Java code? Please share how you are making the connection, along with the full exception stack trace.

by 624398 • New Contributor III

06-22-2022 3:17:50 AM

1149 Views
4 replies
2 kudos

Resolved! Making py connector to raise an error for wrong SQL when asking to plan a query

Hey all, my aim is to validate a given SQL string without actually running it. I thought I could use the `EXPLAIN` statement to do so. So I tried using the `databricks-sql-connector` for Python to explain a query, and so determine whether it's valid or ...

Latest Reply

Kaniz
Community Manager

06-27-2022 9:33:03 AM

2 kudos

Hi @Nativ Issac, we haven’t heard from you since the last response from @Hubert Dudek, and I was checking back to see if his suggestions helped you. Otherwise, if you have found a solution, please share it with the community, as it can be helpful to o...
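
A minimal sketch of the EXPLAIN-based check, assuming a DB-API style cursor such as the one from `databricks-sql-connector` (`execute`, `fetchall`), and assuming that, as in open-source Spark 3.x, `EXPLAIN` on a query that parses but fails analysis (for example an unknown table) returns a plan whose first row starts with `Error occurred during query planning` instead of raising. Syntax errors are expected to raise from `execute()` directly. `validate_sql` and `InvalidSQLError` are hypothetical names.

```python
PLANNING_ERROR_PREFIX = "Error occurred during query planning"


class InvalidSQLError(ValueError):
    """Raised when EXPLAIN reports that a query cannot be planned."""


def validate_sql(cursor, query: str) -> str:
    """Plan `query` with EXPLAIN without running it.

    Returns the plan text, or raises InvalidSQLError if the plan output
    reports a planning error. Parse errors are expected to be raised by
    cursor.execute() itself.
    """
    cursor.execute(f"EXPLAIN {query}")
    plan = "\n".join(str(row[0]) for row in cursor.fetchall())
    if plan.lstrip().startswith(PLANNING_ERROR_PREFIX):
        raise InvalidSQLError(plan)
    return plan
```

With the connector, this would be called on a cursor opened from `sql.connect(...)`, e.g. `validate_sql(cursor, "SELECT * FROM my_table")`, where `my_table` is a placeholder.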

by jayallenmn • New Contributor III

07-26-2022 8:41:27 AM

926 Views
4 replies
3 kudos

Resolved! Couple of Delta Lake questions

Hey guys, we're considering Delta Lake as the storage for our project and have a couple of questions. The first one is: what's the pricing for Delta Lake? I can't seem to find a page that says x amount costs y. The second question is more technical: if we...

Latest Reply

-werners-
Esteemed Contributor III

07-27-2022 4:03:02 AM

3 kudos

Delta Lake itself is free. It is a file format, but you will of course have to pay for storage and compute. If you want to use Databricks with Delta Lake, it will not be free unless you use the Community Edition. Depending on what you are planning to...

by karthikM • New Contributor

06-29-2022 3:06:53 PM

956 Views
3 replies
1 kudos

Delta Live Tables

Is DLT supported for Scala? Are there any reference implementations or wikis to get started?

Latest Reply

Kaniz
Community Manager

07-04-2022 2:00:50 AM

1 kudos

Hi @Karthik Munipalle, Delta Live Tables queries can be implemented in Python or SQL. Here are a few articles that explain DLT well. Please have a look:
https://docs.databricks.com/data-engineering/delta-live-tables/index.html
https://databricks.com/...
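
For illustration, a minimal Python sketch of a two-table DLT pipeline, assuming JSON input at a hypothetical path `/mnt/raw/events` with a hypothetical `event_type` column. `dlt.table` and `dlt.read` come from the Databricks DLT Python module, which is only available when the notebook runs as part of a pipeline; the import guard only lets the file load elsewhere.

```python
try:
    import dlt  # Databricks Delta Live Tables module, present inside a pipeline
    # An unrelated PyPI package is also imported as `dlt`, so check for the
    # DLT functions used below before relying on it.
    HAVE_DLT = hasattr(dlt, "table") and hasattr(dlt, "read")
except ImportError:
    HAVE_DLT = False

RAW_PATH = "/mnt/raw/events"  # hypothetical input location

if HAVE_DLT:

    @dlt.table(comment="Raw events loaded as-is from JSON files.")
    def raw_events():
        # `spark` is provided as a global by the pipeline runtime.
        return spark.read.format("json").load(RAW_PATH)

    @dlt.table(comment="Events that have an event_type.")
    def clean_events():
        # dlt.read() reads another table defined in the same pipeline.
        return dlt.read("raw_events").where("event_type IS NOT NULL")
```

The same two tables could equally be declared in SQL; Scala is not one of the listed options.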

by Daps022 • New Contributor

06-28-2022 6:10:05 PM

1046 Views
4 replies
2 kudos

How are streaming pipelines handled by the Delta Lakehouse?

Latest Reply

Kaniz
Community Manager

06-30-2022 6:57:12 AM

2 kudos

Hi @Dhaval Patel, we haven’t heard from you since the last response from @Ralph David Lagos, and I was checking back to see if his suggestions helped you. Otherwise, if you have found a solution, please share it with the community, as it can be helpful to o...

by Yagao • New Contributor

06-28-2022 3:52:09 PM

1517 Views
5 replies
2 kudos

How do I use Python within a SQL query in Databricks?

Can anyone show me a use case for running Python within a SQL query?

Latest Reply

tomasz
New Contributor III

07-12-2022 3:20:58 PM

2 kudos

To run Python within a SQL query, you first have to define a Python function and then register it as a UDF. Once that is done, you can call that UDF within a SQL query. Please take a look at this documentation: https://docs.databricks.com/s...
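
A minimal sketch of those two steps, assuming a notebook where the SparkSession is available as `spark`. The function name `squared` and the query are made up for illustration; `spark.udf.register` is the PySpark call that makes a Python function callable from SQL.

```python
def squared(x):
    """Plain Python logic to expose to SQL; SQL NULLs arrive as None."""
    return None if x is None else x * x


def register_and_query(spark):
    """Register `squared` as a SQL function and call it from a SQL query."""
    from pyspark.sql.types import LongType

    # After this call, `squared` can be used by name in any SQL in the session.
    spark.udf.register("squared", squared, LongType())
    # range(5) is a built-in table-valued function producing a bigint `id` column.
    return spark.sql("SELECT id, squared(id) AS id_squared FROM range(5)")
```

In a notebook, `register_and_query(spark).show()` would display the result. Python UDFs run row by row in Python worker processes, so a built-in SQL function is usually faster when one exists.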

by Komal7 • New Contributor

06-28-2022 3:37:17 PM

517 Views
2 replies
0 kudos

When is AQE becoming GA? Is it coming to older Spark versions?

Latest Reply

jose_gonzalez
Moderator

07-05-2022 11:31:07 AM

0 kudos

Hi @Komal Gyanani, AQE was a major improvement added in Spark 3.0. It has been available since Databricks Runtime 7.3 LTS (Spark 3.0): https://docs.databricks.com/release-notes/runtime/releases.html
Here are the docs on AQE: https://docs.databricks.com/spark/late...
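
On Spark 3.x runtimes, AQE is controlled by the standard Spark setting `spark.sql.adaptive.enabled`. A minimal sketch that checks and enables it for the current session, assuming a SparkSession named `spark`; on recent runtimes it may already be on by default, in which case this is a no-op. `ensure_aqe_enabled` is a hypothetical helper name.

```python
AQE_KEY = "spark.sql.adaptive.enabled"


def ensure_aqe_enabled(spark) -> bool:
    """Turn on Adaptive Query Execution for this session if it is off.

    Returns True if the setting had to be changed.
    """
    # RuntimeConfig.get(key, default) returns the value as a string.
    current = spark.conf.get(AQE_KEY, "false")
    if current.lower() == "true":
        return False
    spark.conf.set(AQE_KEY, "true")
    return True
```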

by Marcosan • New Contributor II

06-28-2022 2:42:55 PM

719 Views
3 replies
4 kudos

What’s the best way to pass dependency versions dynamically to a cluster?

I am using init scripts and would like to be able to control the version of a component that we release internally and frequently. We are currently updating a requirement.txt file on DBFS manually, but I think that this problem may have been encountered befor...

Latest Reply

Hubert-Dudek
Esteemed Contributor III

07-11-2022 8:24:49 AM

4 kudos

You can programmatically create cluster templates as JSON files and include config JSON files listing the libraries needed. In that scenario, cluster deployment needs to be controlled via the API: https://docs.databricks.com/dev-tools/api/latest/clusters.html
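
A minimal sketch of that approach, assuming the init script reads the component version from an environment variable. The version is injected into the template's `spark_env_vars` field and the spec is posted to the Clusters API `clusters/create` endpoint with a bearer token. The variable name `COMPONENT_VERSION`, the template contents, and the host/token values are hypothetical.

```python
import copy
import json
import urllib.request

VERSION_ENV_VAR = "COMPONENT_VERSION"  # hypothetical name read by the init script


def render_cluster_spec(template: dict, component_version: str) -> dict:
    """Return a copy of a cluster JSON template with the version pinned."""
    spec = copy.deepcopy(template)  # leave the template file's dict untouched
    spec.setdefault("spark_env_vars", {})[VERSION_ENV_VAR] = component_version
    return spec


def create_cluster(host: str, token: str, spec: dict) -> str:
    """POST the spec to the Clusters API and return the new cluster_id."""
    request = urllib.request.Request(
        host.rstrip("/") + "/api/2.0/clusters/create",
        data=json.dumps(spec).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["cluster_id"]
```

The init script could then install the pinned build with something like `pip install my-component==$COMPONENT_VERSION` (package name hypothetical), so a release only changes the value passed to `render_cluster_spec`, not a file on DBFS.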

by Edel • New Contributor II

06-28-2022 1:05:40 PM

619 Views
2 replies
2 kudos

Have you compared the performance between ADWC and Delta Lake for data warehousing?

I just want to know if you have a benchmark or some tests comparing Oracle ADWC vs. Delta Lake for data warehousing.

Latest Reply

Kaniz
Community Manager

06-30-2022 3:29:38 AM

2 kudos

Hi @Edelweiss Kammermann, check out this article for a comparison: https://www.trustradius.com/compare-products/databricks-lakehouse-platform-vs-oracle-autonomous-data-warehouse

by HowardZ • New Contributor

06-28-2022 12:55:20 PM

1125 Views
2 replies
0 kudos

Resolved! How do I create an Athena table (instead of a Hive table) in Databricks?

My dashboard uses Athena as its data source for availability (I don't need to fire up the cluster and manually refresh the data), but it requires me to create the tables manually. I'm wondering if there is a method similar to .saveAsTable() to crea...

Latest Reply

Kaniz
Community Manager

06-30-2022 4:30:38 AM

0 kudos

Hi @Howard Zhang, Here's a fantastic article for your use case. Please have a read.
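
One possible approach, sketched under assumptions: the data is written from Databricks as plain Parquet to S3, `boto3` is installed with credentials allowed to use Athena, and the table is then registered by submitting a `CREATE EXTERNAL TABLE` statement through the Athena `start_query_execution` call. Database, table, bucket and column names are hypothetical, and the helper names are made up for illustration.

```python
def athena_parquet_ddl(database, table, columns, location):
    """Build a CREATE EXTERNAL TABLE statement for Parquet files in S3.

    `columns` is a list of (name, athena_type) pairs.
    """
    cols = ",\n".join(f"  `{name}` {col_type}" for name, col_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"{cols}\n"
        ")\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


def create_athena_table(ddl, database, results_location):
    """Submit the DDL to Athena and return the query execution id."""
    import boto3  # imported lazily so the DDL builder works without boto3

    client = boto3.client("athena")
    # start_query_execution is asynchronous: it returns before the DDL finishes.
    response = client.start_query_execution(
        QueryString=ddl,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": results_location},
    )
    return response["QueryExecutionId"]
```

For example, after `df.write.mode("overwrite").parquet("s3://my-bucket/sales/")`, the DDL could be built with `athena_parquet_ddl("analytics", "sales", [("id", "bigint"), ("amount", "double")], "s3://my-bucket/sales/")`. This assumes plain Parquet output: a Delta table's directory can also hold data files that are no longer part of the current table version, so registering it as plain Parquet could return stale rows.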

by xsn • New Contributor II

06-28-2022 11:06:10 AM

644 Views
2 replies
2 kudos

How does Unity Catalog run agnostic to the underlying system?

Latest Reply

Kaniz
Community Manager

07-04-2022 3:45:17 AM

2 kudos

Hi @xiaosong n, Unity Catalog requires a Databricks account on the Premium plan. You can learn all about it in this guide.

by avinash_goje • New Contributor II

06-26-2022 7:04:51 PM

1613 Views
3 replies
2 kudos

How to send metrics from GCP Databricks to Grafana Cloud through Prometheus?

While connecting Databricks and Grafana, I have tried the following approaches:

Install Grafana Agent on Databricks clusters from the Databricks console --> Not working, since the system is not booted with systemd as its init system.
Since Spark 3 has Pro...

Latest Reply

Kaniz
Community Manager

06-28-2022 12:44:04 PM

2 kudos

Hi @Avinash Goje, we haven’t heard from you since the last response from @Hubert Dudek, and I was checking back to see if his suggestions helped you. Otherwise, if you have found a solution, please share it with the community, as it can be helpful to o...
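
Regarding the Spark 3 Prometheus support mentioned in the question, a minimal sketch of the open-source Spark 3 settings that expose metrics in Prometheus format, kept as a Python dict that could go into a cluster's Spark config. These are standard Apache Spark keys; whether the resulting driver endpoints (`/metrics/prometheus` and `/metrics/executors/prometheus`) are reachable by a Grafana Agent or Prometheus scraper on a GCP Databricks cluster is an assumption not covered here. `as_cluster_spark_config` is a hypothetical helper.

```python
# Standard Apache Spark 3.x keys for Prometheus-format metrics.
PROMETHEUS_SPARK_CONF = {
    # Executor metrics at /metrics/executors/prometheus on the driver UI port.
    "spark.ui.prometheus.enabled": "true",
    # Driver metrics via the PrometheusServlet metrics sink.
    "spark.metrics.conf.*.sink.prometheusServlet.class":
        "org.apache.spark.metrics.sink.PrometheusServlet",
    "spark.metrics.conf.*.sink.prometheusServlet.path": "/metrics/prometheus",
}


def as_cluster_spark_config(conf: dict) -> str:
    """Render key/value pairs as 'key value' lines, the format used by the
    Spark config box in the cluster UI."""
    return "\n".join(f"{key} {value}" for key, value in sorted(conf.items()))
```

The scraper would still need a network path to the driver; that part depends on the workspace setup.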
