Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

anni
by New Contributor II
  • 1552 Views
  • 2 replies
  • 0 kudos

Classroom setup Error

I'm encountering an error when running the classroom setup command. Please help me resolve this issue. Thank you.

Latest Reply
jacovangelder
Honored Contributor
  • 0 kudos

The error happens in the classroom-setup notebook you're running. It is not possible to debug with the information given. 

1 More Replies
ndatabricksuser
by New Contributor
  • 2660 Views
  • 2 replies
  • 1 kudos

Vacuum and Streaming Issue

Hi User Community, requesting some advice on the issue below, please: I have 4 Databricks notebooks, 1 that ingests data from a Kafka topic (metric data from many servers) and dumps the data in Parquet format into a specified location. My 2nd data brick...

Data Engineering
Delta Lake
optimize
spark
structured streaming
vacuum
Latest Reply
mroy
Contributor
  • 1 kudos

Vacuuming is also a lot faster with inventory tables!

1 More Replies
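The inventory-based VACUUM mentioned in the reply can be sketched roughly as below. Table names are placeholders, and the inventory relation is assumed to expose the path/length/isDir/modificationTime columns that Databricks' `VACUUM ... USING INVENTORY` expects:

```python
def vacuum_with_inventory_sql(table: str, inventory_table: str,
                              retain_hours: int = 168) -> str:
    """Build a Databricks VACUUM statement that reads file listings from an
    inventory table instead of listing cloud storage directly, which is much
    faster on large tables. Names here are placeholders for your own tables."""
    return (
        f"VACUUM {table} USING INVENTORY "
        f"(SELECT path, length, isDir, modificationTime FROM {inventory_table}) "
        f"RETAIN {retain_hours} HOURS"
    )

# On a cluster: spark.sql(vacuum_with_inventory_sql(
#     "main.metrics.events", "main.metrics.events_inventory"))
```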
nolanlavender00
by New Contributor
  • 5844 Views
  • 2 replies
  • 1 kudos

Resolved! How to stop a Streaming Job based on time of the week

I have an always-on job cluster triggering Spark Streaming jobs. I would like to stop this streaming job once a week to run table maintenance. I was looking to leverage the foreachBatch function to check a condition and stop the job accordingly.

Latest Reply
mroy
Contributor
  • 1 kudos

You could also use the "Available-now micro-batch" trigger. It only processes one batch at a time, and you can do whatever you want in between batches (sleep, shut down, vacuum, etc.)

1 More Replies
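The available-now approach from the reply can be sketched like this; the weekly schedule, paths, and table names are assumptions for illustration:

```python
from datetime import datetime

def in_maintenance_window(now: datetime) -> bool:
    # Assumed schedule: Sundays 02:00-04:00; adjust to your maintenance slot.
    return now.weekday() == 6 and 2 <= now.hour < 4

def run_one_cycle(spark, source: str, target: str, checkpoint: str) -> None:
    # Available-now trigger: process everything currently available, then
    # stop, leaving a natural gap for maintenance between cycles.
    q = (spark.readStream.format("delta").load(source)
         .writeStream.format("delta")
         .option("checkpointLocation", checkpoint)
         .trigger(availableNow=True)
         .start(target))
    q.awaitTermination()  # returns once the backlog is drained

# Driver loop sketch (run by the job):
# run_one_cycle(spark, src, tgt, chk)
# if in_maintenance_window(datetime.utcnow()):
#     spark.sql("VACUUM my_table")  # table name is a placeholder
```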
aranjan99
by Contributor
  • 2978 Views
  • 3 replies
  • 2 kudos

system.billing.usage table missing data for jobs running in my databricks account

I have some jobs running on Databricks. I can obtain their jobId from the Jobs UI or the List Job Runs API. However, when trying to get DBU usage for the corresponding jobs from system.billing.usage, I do not see the same job_id in that table. It's been mor...

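For reference, job runs surface in the billing table keyed by usage_metadata.job_id, and the table is populated with some delay rather than in real time. A minimal query sketch, with a hypothetical job ID:

```python
def job_usage_query(job_id: str) -> str:
    # Query sketch against system.billing.usage: job runs are identified by
    # the usage_metadata.job_id struct field, not a top-level job_id column.
    return f"""
        SELECT usage_date, sku_name, usage_quantity
        FROM system.billing.usage
        WHERE usage_metadata.job_id = '{job_id}'
        ORDER BY usage_date
    """

# On a cluster: spark.sql(job_usage_query("123456789")).display()
```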
dm7
by New Contributor II
  • 3396 Views
  • 1 reply
  • 0 kudos

Unit Testing DLT Pipelines

Now that we are moving our DLT pipelines into production, we would like to start looking at unit testing the transformation logic inside DLT notebooks. We want to know how we can unit test the PySpark logic/transformations independently without having to s...

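One common pattern, sketched below under the assumption that the transformations can be factored into plain functions: keep the PySpark logic free of dlt imports so it can be tested against a local SparkSession, and let the @dlt.table functions be thin wrappers. All names here are illustrative:

```python
def add_is_adult(df):
    # Pure transformation: takes and returns a DataFrame, touching no dlt or
    # notebook globals, so it can be unit-tested outside the pipeline.
    from pyspark.sql import functions as F
    return df.withColumn("is_adult", F.col("age") >= 18)

# In the DLT notebook, the table definition stays a thin wrapper:
# import dlt
# @dlt.table
# def users_enriched():
#     return add_is_adult(dlt.read("users"))

# In a pytest file (run outside DLT, e.g. on a plain cluster or locally):
# spark = SparkSession.builder.master("local[1]").getOrCreate()
# out = add_is_adult(spark.createDataFrame([(20,), (10,)], ["age"]))
# assert out.filter("is_adult").count() == 1
```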
WWoman
by Contributor
  • 2666 Views
  • 2 replies
  • 0 kudos

Resolved! Persisting query history data

Hello, I am looking for a way to persist query history data. I do not have direct access to the system tables, but I do have access to a query_history view created by selecting from the system.query.history and system.access.audit system tables. I want ...

Latest Reply
syed_sr7
New Contributor II
  • 0 kudos

Is there a system table for query history?

1 More Replies
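A minimal sketch of one way to persist it: incrementally append new rows from the view into a user-owned Delta table on a schedule. The view name (query_history), watermark column (end_time), and target table are all assumptions to adapt:

```python
def persist_query_history_sql(target: str = "main.ops.query_history_archive") -> str:
    # Incremental append using end_time as a watermark, so re-running the
    # statement (e.g. from a daily job) only copies rows not yet archived.
    return f"""
        INSERT INTO {target}
        SELECT * FROM query_history
        WHERE end_time > (SELECT COALESCE(MAX(end_time), TIMESTAMP'1970-01-01')
                          FROM {target})
    """

# After creating the target table once (CREATE TABLE ... AS SELECT ... WHERE 1=0):
# spark.sql(persist_query_history_sql())
```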
CarstenWeber
by New Contributor III
  • 11690 Views
  • 9 replies
  • 3 kudos

Resolved! Invalid configuration fs.azure.account.key trying to load ML Model with OAuth

Hi Community, I was trying to load an ML model from an Azure storage account (abfss://....) with: model = PipelineModel.load(path). I set the Spark config: spark.conf.set("fs.azure.account.auth.type", "OAuth") spark.conf.set("fs.azure.account.oauth.provi...

Latest Reply
chhavibansal
New Contributor III
  • 3 kudos

@daniel_sahal Any possible reason you know of why it works in OSS Spark but not in a Databricks notebook? Why is there a disparity?

8 More Replies
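For readers hitting the same error: one thing worth verifying is that the OAuth settings are account-qualified, i.e. each key carries the storage account's DFS endpoint suffix. The key names below are the standard hadoop-azure ones; the helper itself is illustrative:

```python
def abfss_oauth_confs(storage_account: str, client_id: str,
                      client_secret: str, tenant_id: str) -> dict:
    # Account-qualified OAuth settings for ADLS Gen2. Qualifying each key
    # with the storage account avoids the generic (unqualified) keys being
    # ignored or clobbered when several accounts are in play.
    sfx = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{sfx}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{sfx}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{sfx}": client_id,
        f"fs.azure.account.oauth2.client.secret.{sfx}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{sfx}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }

# On a cluster:
# for k, v in abfss_oauth_confs("mystorage", cid, secret, tid).items():
#     spark.conf.set(k, v)
```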
camilo_s
by Contributor
  • 701 Views
  • 0 replies
  • 1 kudos

Parametrizing query for DEEP CLONE

Update: Hey moderator, I've removed the link to the Bobby Tables XKCD to reassure you that this post is not spam. Hi, I'm somehow unable to write a parametrized query to create a DEEP CLONE. I'm trying really hard to avoid using string interpolation (to p...

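One possible direction, assuming your runtime supports the IDENTIFIER() clause and named parameter markers in spark.sql (both are relatively recent features, and IDENTIFIER support inside CLONE statements may vary by runtime version):

```python
def deep_clone_stmt(source: str, target: str):
    # IDENTIFIER() lets table names be bound as parameters instead of being
    # interpolated into the SQL string, which is the usual injection-safe
    # route. Returns (sql, args) suitable for spark.sql(sql, args).
    sql = "CREATE OR REPLACE TABLE IDENTIFIER(:tgt) DEEP CLONE IDENTIFIER(:src)"
    return sql, {"tgt": target, "src": source}

# On a cluster:
# sql, args = deep_clone_stmt("main.src.events", "main.bak.events")
# spark.sql(sql, args)
```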
greyfine
by New Contributor II
  • 12546 Views
  • 5 replies
  • 5 kudos

Hi everyone, I was wondering if it is possible to set up query-level alerts for PySpark notebooks that run on a schedule in Databricks, so that we can receive an email alert when we get an expected result?

Above you can see we have 3 workspaces. We have the alert option available in the SQL workspace but not in our Data Science and Engineering workspace. Is there any way we can incorporate this in our DS and Engineering workspace?

Latest Reply
JKR
Contributor
  • 5 kudos

How can I receive a call on Teams/phone/Slack if any job fails?

4 More Replies
Aidzillafont
by New Contributor II
  • 1384 Views
  • 1 reply
  • 0 kudos

How to pick the right cluster for your workflow

Hi All, I am attempting to execute a workflow on various job clusters, including general-purpose and memory-optimized clusters. My main bottleneck is that data is being written to disk because I'm running out of RAM. This is due to the large dataset t...

Latest Reply
Ravivarma
Databricks Employee
  • 0 kudos

Hello @Aidzillafont, greetings! Please find below the document explaining compute configuration best practices: https://docs.databricks.com/en/compute/cluster-config-best-practices.html I hope this helps! Regards, Ravi

Sadam97
by New Contributor III
  • 658 Views
  • 0 replies
  • 0 kudos

Databricks (GCP) Cluster not resolving Hostname into IP address

We have MongoDB hosts that must be resolved to the private internal load-balancer IPs (of another cluster), and we are unable to add host aliases in the Databricks GKE cluster so that Spark can connect to MongoDB and resolve t...

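One workaround sketch, assuming you can run a cluster-scoped init script on the GKE-backed cluster: append static entries to /etc/hosts on each node so the driver and executors resolve the private hostnames. IPs and hostnames below are placeholders:

```python
def append_host_aliases(entries: dict, hosts_path: str = "/etc/hosts") -> None:
    # Appends static hostname -> IP mappings; intended to be called from a
    # cluster-scoped init script so every node gets the aliases at startup.
    with open(hosts_path, "a") as f:
        for ip, host in entries.items():
            f.write(f"{ip}\t{host}\n")

# In the init script (placeholder addresses for your load balancer):
# append_host_aliases({
#     "10.0.0.12": "mongo-0.internal.example",
#     "10.0.0.13": "mongo-1.internal.example",
# })
```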
feliximmanuel
by New Contributor II
  • 1174 Views
  • 0 replies
  • 1 kudos

Error: oidc: fetch .well-known: Get "https://%E2%80%93host/oidc/.well-known/oauth-authorization-serv

I'm trying to authenticate Databricks using WSL but am suddenly getting this error: /databricks-asset-bundle$ databricks auth login –host https://<XXXXXXXXX>.12.azuredatabricks.net Databricks Profile Name: <XXXXXXXXX> Error: oidc: fetch .well-known: Get "ht...

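A clue worth checking: the %E2%80%93 in the failing URL is a URL-encoded en dash (U+2013), which suggests the flag was typed or pasted as "–host" rather than "--host" (word processors and some chat tools auto-convert double hyphens). A quick way to confirm the encoding:

```python
from urllib.parse import quote

# An en dash percent-encodes to exactly the bytes seen in the error URL,
# so the CLI treated "–host" as part of the host string, not as a flag.
print(quote("–"))  # prints "%E2%80%93"
```

Retyping the command with a plain double hyphen (--host) should make the URL parse as intended.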
Sudheer_DB
by New Contributor II
  • 980 Views
  • 3 replies
  • 0 kudos

DLT SQL schema definition

Hi All, while defining a schema for a table created with Auto Loader and DLT in SQL, I am getting a schema mismatch error between the defined schema and the inferred schema. CREATE OR REFRESH STREAMING TABLE csv_test(a0 STRING,a1 STRING,a2 STRING,a3 STRI...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 0 kudos

@Sudheer_DB You can specify your own _rescued_data column name by setting the rescuedDataColumn option: https://docs.databricks.com/en/ingestion/auto-loader/schema.html#what-is-the-rescued-data-column

2 More Replies
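A sketch of the same option from the Python side of Auto Loader (the option name is from the linked doc; the CSV settings and paths are illustrative):

```python
def autoloader_csv_options(rescued_col: str = "_rescued_data") -> dict:
    # Auto Loader reader options: rescuedDataColumn renames the rescued-data
    # column so it doesn't collide with an explicitly defined schema.
    return {
        "cloudFiles.format": "csv",
        "header": "true",
        "rescuedDataColumn": rescued_col,
    }

# On a cluster (path is a placeholder):
# df = (spark.readStream.format("cloudFiles")
#       .options(**autoloader_csv_options("_my_rescued"))
#       .schema("a0 STRING, a1 STRING, a2 STRING, a3 STRING")
#       .load("abfss://container@account.dfs.core.windows.net/landing/"))
```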
