Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

by VovaVili (New Contributor II)
  • 1615 Views
  • 3 replies
  • 0 kudos

Databricks Runtime 13.3 - can I use Databricks Connect without Unity Catalog?

Hello all, The official documentation for Databricks Connect states that, for Databricks Runtime versions 13.0 and above, my cluster needs to have Unity Catalog enabled for me to use Databricks Connect and use a Databricks cluster through an IDE like...

Latest Reply
mohaimen_syed
New Contributor III
  • 0 kudos

Hi, I'm currently using Databricks Connect without Unity Catalog on VS Code. Although I have connected Unity Catalog separately on multiple occasions, I don't think it's required. Here is the doc: https://docs.databricks.com/en/dev-tools/databrick...

2 More Replies
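For reference, a minimal Databricks Connect session along the lines of the reply above, assuming databricks-connect 13.x or later is installed; the host, token, and cluster ID below are placeholders to fill in with your own workspace values:

    from databricks.connect import DatabricksSession

    # Placeholders; fill in your own workspace values.
    spark = DatabricksSession.builder.remote(
        host="https://<workspace-instance>",
        token="<personal-access-token>",
        cluster_id="<cluster-id>",
    ).getOrCreate()

    # Smoke test against the remote cluster.
    print(spark.range(5).collect())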
by amartinez (New Contributor III)
  • 3941 Views
  • 6 replies
  • 5 kudos

Workaround for GraphFrames not working on Delta Live Tables?

According to this page, the GraphFrames package has been included in the Databricks Runtime since at least 11.0. However, trying to run a connected components algorithm inside a Delta Live Tables notebook yields the error java.lang.ClassNotFoundException: or...

Latest Reply
lprevost
Contributor
  • 5 kudos

I'm also trying to use GraphFrames inside a DLT pipeline. I get an error that graphframes is not installed on the cluster. I'm using it successfully in test notebooks on the ML version of the cluster. Is there a way to use it inside a DLT job?

5 More Replies
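A workaround sketch consistent with the thread, not a confirmed fix: the Python wheel alone won't resolve the java.lang.ClassNotFoundException, so the JVM package also has to reach the pipeline cluster; the package coordinates below are an assumption to match against your Spark version:

    # At the top of the DLT pipeline notebook: install the Python wrapper.
    # %pip install graphframes

    # The JVM side must also be attached, e.g. via the pipeline cluster
    # configuration (coordinates are an assumption; match your Spark version):
    #   spark.jars.packages  graphframes:graphframes:0.8.3-spark3.5-s_2.12

    import dlt
    from graphframes import GraphFrame

    @dlt.table
    def connected_components():
        # connectedComponents() requires a checkpoint directory.
        spark.sparkContext.setCheckpointDir("/tmp/graphframes-cc")
        g = GraphFrame(dlt.read("vertices"), dlt.read("edges"))
        return g.connectedComponents()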
by Maulik (New Contributor)
  • 297 Views
  • 1 reply
  • 0 kudos

How to set a callback for a Databricks Statement Execution SQL API query?

I'm using https://docs.databricks.com/api/workspace/statementexecution for long-running queries. My wait time is zero. Queries might take 1 hour, and I don't want to poll https://docs.databricks.com/api/workspace/statementexecution/getstatemen...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Maulik, When dealing with long-running queries in Databricks, there are a few strategies you can consider to optimize performance and avoid polling. Let's explore some options: Query Optimization: Check the query plan to identify any ineffic...

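Since the Statement Execution API itself offers no callback, the usual pattern is submit-then-poll at a low frequency. A minimal sketch, with the warehouse ID as a placeholder and endpoints per the linked API docs:

    import os
    import time
    import requests

    HOST = os.environ["DATABRICKS_HOST"]
    HEADERS = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

    # Submit and return immediately rather than holding the connection open.
    resp = requests.post(
        f"{HOST}/api/2.0/sql/statements",
        headers=HEADERS,
        json={
            "warehouse_id": "<warehouse-id>",
            "statement": "SELECT 1",
            "wait_timeout": "0s",
        },
    )
    statement_id = resp.json()["statement_id"]

    # Poll with a generous back-off; for hour-long queries even a few
    # requests per hour is enough to detect completion.
    while True:
        state = requests.get(
            f"{HOST}/api/2.0/sql/statements/{statement_id}", headers=HEADERS
        ).json()
        if state["status"]["state"] in ("SUCCEEDED", "FAILED", "CANCELED"):
            break
        time.sleep(60)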
by camilo_s (Contributor)
  • 323 Views
  • 1 reply
  • 1 kudos

Parametrizing query for DEEP CLONE

Update: Hey moderator, I've removed the link to the Bobby Tables XKCD to reassure you that this post is not spam. Hi, I'm somehow unable to write a parametrized query to create a DEEP CLONE. I'm trying really hard to avoid using string interpolation (to p...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @camilo_s, I appreciate your efforts to avoid string interpolation and prevent SQL injection. The approach you’ve taken using the IDENTIFIER clause with named parameter markers is indeed the right direction. However, there’s a subtle issue in your...

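For reference, the IDENTIFIER-plus-named-parameter pattern the reply refers to looks roughly like this; the table names are hypothetical, and whether both positions of DEEP CLONE accept IDENTIFIER may depend on your runtime version:

    # Named parameter markers bind the table *names* via IDENTIFIER(),
    # avoiding string interpolation entirely (requires DBR 13.x+ / Spark 3.4+).
    spark.sql(
        "CREATE OR REPLACE TABLE IDENTIFIER(:target) DEEP CLONE IDENTIFIER(:source)",
        args={"target": "main.backup.sales", "source": "main.prod.sales"},
    )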
by anni (New Contributor II)
  • 1174 Views
  • 2 replies
  • 0 kudos

Classroom setup Error

I'm encountering an error when running the classroom setup command. Please help me resolve this issue. Thank you.

(attachment: Screenshot_20240628-033819_Chrome.jpg)
Latest Reply
jacovangelder
Honored Contributor
  • 0 kudos

The error happens in the classroom-setup notebook you're running. It is not possible to debug with the information given. 

1 More Reply
by Sadam97 (New Contributor)
  • 317 Views
  • 1 reply
  • 0 kudos

Databricks (GCP) Cluster not resolving Hostname into IP address

We have #mongodb hosts that must resolve to the private internal load balancer IPs (of another cluster), and we are unable to add host aliases in the Databricks GKE cluster so that Spark can connect to MongoDB and resolve t...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Sadam97, Check if your DNS server is responding. You can do this by running a ping command from a notebook in Databricks to reach your secondary DNS server. Edit the /etc/resolv.conf file on the cluster and update the nameserver value with a wo...

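A quick notebook-level resolution check along the lines of the reply; the hostname below is hypothetical:

    import socket

    # Replace with the MongoDB host that should resolve to the internal LB.
    host = "mongodb.internal.example.com"
    try:
        print(f"{host} resolves to {socket.gethostbyname(host)}")
    except socket.gaierror as err:
        print(f"DNS resolution failed for {host}: {err}")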
by feliximmanuel (New Contributor)
  • 426 Views
  • 1 reply
  • 0 kudos

Error: oidc: fetch .well-known: Get "https://%E2%80%93host/oidc/.well-known/oauth-authorization-serv

I'm trying to authenticate Databricks using WSL but suddenly getting this error: ./databricks-asset-bundle$ databricks auth login –host https://<XXXXXXXXX>.12.azuredatabricks.net Databricks Profile Name: <XXXXXXXXX> Error: oidc: fetch .well-known: Get "ht...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @feliximmanuel, The error message you’re seeing indicates a problem with SSL certificate verification. To resolve this, follow these steps: In your Databricks configuration file (usually located at ~/.databrickscfg), add the following line to ...

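Worth noting: the %E2%80%93 in the failing URL decodes to an en dash, which suggests the shell received "–host" (a pasted en dash) rather than "--host"; retyping the flag by hand is a quick thing to try. A decode check:

    from urllib.parse import unquote

    # The escaped bytes are an en dash, not two ASCII hyphens.
    print(unquote("https://%E2%80%93host/oidc/.well-known/oauth-authorization-server"))
    # -> https://–host/oidc/.well-known/oauth-authorization-server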
by greyfine (New Contributor II)
  • 9216 Views
  • 8 replies
  • 7 kudos

Resolved! Is it possible to set up alerts at the query level for PySpark notebooks that run on a schedule in Databricks, so that we can receive an email alert when a query returns an expected result?

In the image above you can see we have 3 workspaces. We have the alert option available in the SQL workspace but not in our Data Science and Engineering workspace; is there any way we can incorporate this in our DS and Engineering workspace?

(attachment: image.png)
Latest Reply
JKR
Contributor
  • 7 kudos

How can I receive a call on Teams/phone/Slack if any job fails?

7 More Replies
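On the follow-up question, one option sketch is to attach failure notifications to the job itself via the Jobs API; the job ID and address below are placeholders, and Slack or Teams can then be wired up through workspace notification destinations:

    import os
    import requests

    HOST = os.environ["DATABRICKS_HOST"]
    HEADERS = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

    # Add an on-failure email notification to an existing job (ID illustrative).
    requests.post(
        f"{HOST}/api/2.1/jobs/update",
        headers=HEADERS,
        json={
            "job_id": 123,
            "new_settings": {
                "email_notifications": {"on_failure": ["team@example.com"]},
            },
        },
    )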
by ndatabricksuser (New Contributor)
  • 1618 Views
  • 3 replies
  • 3 kudos

Vacuum and Streaming Issue

Hi User Community, Requesting some advice on the issue below, please: I have 4 Databricks notebooks, 1 that ingests data from a Kafka topic (metric data from many servers) and dumps the data in Parquet format into a specified location. My 2nd data brick...

Labels: Data Engineering, Delta Lake, optimize, spark, structured streaming, vacuum
Latest Reply
mroy
Contributor
  • 3 kudos

Vacuuming is also a lot faster with inventory tables!

2 More Replies
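A sketch of the inventory-table variant of VACUUM the reply mentions, assuming a runtime that supports USING INVENTORY; all table names are hypothetical, and the inventory query must supply the columns the command expects:

    spark.sql("""
        VACUUM main.metrics.raw_events
        USING INVENTORY (
            SELECT path, length, isDir, modificationTime
            FROM main.ops.file_inventory
        )
        RETAIN 168 HOURS
    """)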
by nolanlavender00 (New Contributor)
  • 4704 Views
  • 4 replies
  • 1 kudos

Resolved! How to stop a Streaming Job based on time of the week

I have an always-on job cluster triggering Spark Streaming jobs. I would like to stop this streaming job once a week to run table maintenance. I was looking to leverage the foreachBatch function to check a condition and stop the job accordingly.

Latest Reply
mroy
Contributor
  • 1 kudos

You could also use the "Available-now micro-batch" trigger. It only processes one batch at a time, and you can do whatever you want in between batches (sleep, shut down, vacuum, etc.)

3 More Replies
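A minimal sketch of the available-now trigger from the reply, with placeholder paths: the stream drains whatever is pending and then stops, leaving a window for maintenance between runs:

    (
        spark.readStream.format("delta")
        .load("/data/in")
        .writeStream.format("delta")
        .option("checkpointLocation", "/chk/weekly_job")
        .trigger(availableNow=True)  # process available data, then stop
        .start("/data/out")
        .awaitTermination()
    )

    # Between invocations the table is quiescent, so maintenance such as
    # OPTIMIZE or VACUUM can run safely.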
by aranjan99 (New Contributor III)
  • 1852 Views
  • 5 replies
  • 3 kudos

system.billing.usage table missing data for jobs running in my databricks account

I have some jobs running on Databricks. I can obtain their jobId from the Jobs UI or the List Job Runs API. However, when trying to get DBU usage for the corresponding jobs from system.billing.usage, I do not see the same job_id in that table. It's been mor...

Latest Reply
Kaniz_Fatma
Community Manager
  • 3 kudos

Hi @aranjan99, apologies for the delayed response. If you're not seeing job IDs from the UI or API in the billing table, it's possible that the job run IDs are not being populated for long-running jobs. To address this, consider restarting the comput...

4 More Replies
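For reference, a hedged version of the lookup being attempted; column names follow the system-table docs, so verify them against your workspace:

    spark.sql("""
        SELECT usage_metadata.job_id, SUM(usage_quantity) AS dbus
        FROM system.billing.usage
        WHERE usage_metadata.job_id IS NOT NULL
          AND usage_date >= current_date() - INTERVAL 7 DAYS
        GROUP BY usage_metadata.job_id
    """).show()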
by aranjan99 (New Contributor III)
  • 1684 Views
  • 4 replies
  • 1 kudos

system.access.table_lineage table missing data

I am using the system.access.table_lineage table to figure out the tables accessed by SQL queries and the corresponding SQL queries. However, I am noticing this table is missing data or values very often. For example, for SQL queries executed by our DBT jobs, ...

Latest Reply
jacovangelder
Honored Contributor
  • 1 kudos

Is all your ETL querying/referencing the full table name (i.e. catalog.schema.table)? If you query delta files for example, metadata for data lineage will not be captured. 

3 More Replies
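A sketch of inspecting lineage rows for one table; the table name is hypothetical and column names are per the system-table docs:

    spark.sql("""
        SELECT source_table_full_name, target_table_full_name,
               entity_type, event_time
        FROM system.access.table_lineage
        WHERE target_table_full_name = 'main.analytics.orders'
        ORDER BY event_time DESC
        LIMIT 20
    """).show(truncate=False)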
by dm7 (New Contributor II)
  • 2421 Views
  • 2 replies
  • 0 kudos

Resolved! Unit Testing DLT Pipelines

Now that we are moving our DLT pipelines into production, we would like to start looking at unit testing the transformation logic inside DLT notebooks. We want to know how we can unit test the PySpark logic/transformations independently without having to s...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @dm7, Instead of embedding all your transformation logic directly in the DLT notebook, create separate Python modules (files) for your transformations. This allows you to interactively test transformations from notebooks and write unit tests speci...

1 More Reply
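A sketch of the structure the reply suggests, with hypothetical names: keep transformations in a plain module that both the DLT notebook and a pytest suite can import:

    # transformations.py -- plain PySpark, no dlt import, so it is unit-testable.
    from pyspark.sql import DataFrame
    import pyspark.sql.functions as F

    def add_ingest_date(df: DataFrame) -> DataFrame:
        return df.withColumn("ingest_date", F.current_date())

    # test_transformations.py -- runs on a local SparkSession, no pipeline needed.
    def test_add_ingest_date(spark):  # `spark` supplied by a pytest fixture
        out = add_ingest_date(spark.createDataFrame([(1,)], ["id"]))
        assert "ingest_date" in out.columns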
by WWoman (New Contributor III)
  • 1850 Views
  • 2 replies
  • 0 kudos

Resolved! Persisting query history data

Hello, I am looking for a way to persist query history data. I do not have direct access to the system tables, but I do have access to a query_history view created by selecting from the system.query.history and system.access.audit system tables. I want ...

Latest Reply
syed_sr7
New Contributor II
  • 0 kudos

Is there a system table for query history?

1 More Reply
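One hedged way to persist the view's contents: periodically append a snapshot into a Delta table you own. The names below are hypothetical, and you should deduplicate on a key if snapshots overlap between runs:

    # One-time setup: empty archive table with the view's schema.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS main.ops.query_history_archive
        AS SELECT * FROM query_history WHERE 1 = 0
    """)

    # Scheduled append; dedupe on a key such as statement_id if rows overlap.
    spark.sql("INSERT INTO main.ops.query_history_archive SELECT * FROM query_history")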

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group