Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

118004
by New Contributor II
  • 2279 Views
  • 1 reply
  • 2 kudos

Resolved! Installing pdpbox plugin on cluster

Hello, we are having issues installing the pdpbox library on a fresh cluster. This includes trying to upload and install a whl file, or using pip in a notebook. I have attached an example of an error received. Can anybody assist with installing the...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

PDPbox is updated rarely, and it requires older versions of matplotlib (3.1.1): https://github.com/SauceCat/PDPbox It tries to install but fails because matplotlib requires pkgconfig. The solution to that is to use the Machine Learning runtime. There it will...
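If it helps to make that concrete, a hedged sketch of the notebook-level workaround on a standard runtime (the exact matplotlib pin is taken from the reply above; verify it against PDPbox's own requirements before relying on it):

```
%pip install matplotlib==3.1.1
%pip install pdpbox
```

After installing, restart the Python process (dbutils.library.restartPython() on recent runtimes) so the pinned matplotlib is picked up.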

PSY
by New Contributor III
  • 5708 Views
  • 5 replies
  • 2 kudos

Resolved! Updating git token fails

When updating an expired Azure DevOps personal access token (PAT) for Git integration, I get the error message "Failed to save. Please try again." The error persists with different tokens. Previously (months ago), updating the token did not result i...

Latest Reply
Atanu
Databricks Employee
  • 2 kudos

Is this happening for all users, @Pencho Yordanov?

4 More Replies
al_joe
by Contributor
  • 6661 Views
  • 3 replies
  • 6 kudos

Resolved! Can I use Databricks CLI with community edition?

I installed the CLI but am unable to configure it to connect to my instance, as I am unable to find the "Generate Access Tokens" option under the User Settings page. The documentation does not say whether this feature is disabled for Community Edition.

Latest Reply
Prabakar
Databricks Employee
  • 6 kudos

Hi @Al Jo, we understand your interest in learning Databricks. However, the Community Edition is limited in features; certain features are available only in the paid version. If you are interested in using the full features, then I would suggest you g...

2 More Replies
Ryan512
by New Contributor III
  • 1779 Views
  • 2 replies
  • 2 kudos

Autoloader (GCP) Custom PubSub Queue

I want to know if what I describe below is possible with Auto Loader on the Google Cloud Platform. Problem description: We have GCS buckets for every client/account. Inside these buckets is a path/blob for each client's instances of our platform. A clie...

Latest Reply
Noopur_Nigam
Databricks Employee
  • 2 kudos

Hello @Ryan Ebanks, please let us know if more help is needed on this.

1 More Replies
laus
by New Contributor III
  • 9723 Views
  • 6 replies
  • 3 kudos

Resolved! How to load a json file in pyspark with colon character in file name

Hi, I'm trying to load this JSON file, which contains the colon character in its name: file_name.2022-03-05_11:30:00.json, but I get the error in the screenshot below saying that there is a relative path in an absolute URI. Any idea how to read this file...

Latest Reply
Noopur_Nigam
Databricks Employee
  • 3 kudos

Hi @Laura Blancarte, I hope that @Pearl Ubaru's answer helped you in resolving your issue. Please let us know if you need more help on this.
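For anyone hitting the same "relative path in absolute URI" issue, here is a minimal local sketch of one common workaround: copy the file to a colon-free name before reading it. The file names are hypothetical; on Databricks the same idea would use dbutils.fs.cp on the DBFS path.

```python
import json
import os
import shutil
import tempfile

tmp = tempfile.mkdtemp()

# A file whose name contains colons, as in the question.
src = os.path.join(tmp, "file_name.2022-03-05_11:30:00.json")
with open(src, "w") as f:
    json.dump({"ok": True}, f)

# Workaround: copy to a colon-free name, then read the copy.
dst = os.path.join(tmp, "file_name.2022-03-05_113000.json")
shutil.copyfile(src, dst)

with open(dst) as f:
    data = json.load(f)
# data == {'ok': True}
```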

5 More Replies
AP
by New Contributor III
  • 2985 Views
  • 2 replies
  • 2 kudos

How can we connect to the databricks managed metastore

Hi, I am trying to take advantage of the treasure trove of information that the metastore contains and take some actions to improve performance. In my case, the metastore is managed by Databricks; we don't use an external metastore. How can I connect to ...

Latest Reply
Prabakar
Databricks Employee
  • 2 kudos

@AKSHAY PALLERLA, you can get the JDBC/ODBC information from the cluster configuration. On the cluster configuration page, under Advanced Options, there is a JDBC/ODBC tab. Click on that tab and it should give you the details you are looking ...

1 More Replies
ThomasKastl
by Contributor
  • 6509 Views
  • 6 replies
  • 5 kudos

Resolved! Databricks runs cell, but stops output and hangs afterwards.

tl;dr: A cell that executes purely on the head node stops printing output during execution, but the output still shows up in the cluster logs. After execution of the cell, Databricks does not notice the cell is finished and gets stuck. When trying to canc...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 5 kudos

As that library works on pandas, the problem may be that it doesn't support pandas on Spark. On the local version, you probably use non-distributed pandas. You can check the behavior by switching between:

import pandas as pd
import pyspark.pandas as pd

5 More Replies
165036
by New Contributor III
  • 2522 Views
  • 1 reply
  • 1 kudos

Resolved! Mounting of S3 bucket via Terraform is frequently timing out

Summary of the problem: When mounting an S3 bucket via Terraform, the creation process frequently times out (running beyond 10 minutes). When I check the Log4j logs in the GP cluster, I see the following error message repeated: ```22/07/26 05:54:43 ER...

Latest Reply
165036
New Contributor III
  • 1 kudos

Solved. See here: https://github.com/databricks/terraform-provider-databricks/issues/1500

dataAllMyLife
by New Contributor
  • 1385 Views
  • 1 reply
  • 0 kudos

JDBC Connection closes between 'stmt.execute( ... )' and 'stmt.executeQuery( ... )'

I'm running a Java application that registers a CSV table with Hive and then checks the number of rows imported. It's done in several steps: Statement stmt = con.createStatement(); ... stmt.execute( "CREATE TABLE ( <definition> )" ); ... ResultSet rs...

Latest Reply
Noopur_Nigam
Databricks Employee
  • 0 kudos

@Reto Matter, are you running a jar job or using dbconnect to run Java code? Please provide how you are trying to make the connection and the full exception stack trace.

624398
by New Contributor III
  • 2766 Views
  • 3 replies
  • 2 kudos

Resolved! Making py connector to raise an error for wrong SQL when asking to plan a query

Hey all, my aim is to validate a given SQL string without actually running it. I thought I could use the `EXPLAIN` statement to do so. So I tried using the `databricks-sql-connector` for Python to explain a query, and so determine whether it's valid or ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

Please try in PySpark:

try:
    spark.sql("SELECT BAD-QUERY AS FOO")._jdf.queryExecution().toString()
except:
    print("incorrect query")

or just:

try:
    spark.sql("SELECT BAD-QUERY AS FOO").explain()
except:
    print("incorrect query")

2 More Replies
jayallenmn
by New Contributor III
  • 3007 Views
  • 4 replies
  • 3 kudos

Resolved! Couple of Delta Lake questions

Hey guys, we're considering Delta Lake as the storage for our project and have a couple of questions. The first one is what the pricing for Delta Lake is - we can't seem to find a page that says x amount costs y. The second question is more technical - if we...

Latest Reply
-werners-
Esteemed Contributor III
  • 3 kudos

Delta Lake itself is free; it is a file format. But you will have to pay for storage and compute, of course. If you want to use Databricks with Delta Lake, it will not be free unless you use the Community Edition. Depending on what you are planning to...

3 More Replies
Daps022
by New Contributor
  • 2417 Views
  • 3 replies
  • 1 kudos
Latest Reply
Rheiman
Contributor II
  • 1 kudos

Try looking into the Structured Streaming API. There you will learn how to join streams with static data, how to set triggers for streams, micro-batching, and other things that are important to the reliability of your application. Structured Str...

2 More Replies
Yagao
by New Contributor
  • 5118 Views
  • 5 replies
  • 2 kudos

How to do python within sql query in Databricks ?

Can anyone show me one use case of how to run Python within a SQL query?

Latest Reply
tomasz
Databricks Employee
  • 2 kudos

To run Python within a SQL query you have to first define a Python function and then register it as a UDF. Once that is done, you are able to call that UDF within a SQL query. Please take a look at the documentation here: https://docs.databricks.com/s...

4 More Replies
Komal7
by New Contributor
  • 1164 Views
  • 2 replies
  • 0 kudos
Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Hi @Komal Gyanani, AQE was a major improvement added to Spark 3.0. It has been available since Databricks Runtime 7.3 LTS (Spark 3.0): https://docs.databricks.com/release-notes/runtime/releases.html, and here are the docs on AQE: https://docs.databricks.com/spark/late...
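For reference, a sketch of how AQE can be checked or toggled per session (it is enabled by default on recent runtimes; treat the default as something to verify for your runtime version):

```sql
-- Inspect the current value of Adaptive Query Execution
SET spark.sql.adaptive.enabled;
-- Enable it explicitly for this session
SET spark.sql.adaptive.enabled = true;
```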

1 More Replies
