Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

greyfine
by New Contributor II
  • 9109 Views
  • 8 replies
  • 7 kudos

Resolved! Hi everyone, I was wondering if it is possible to have alerts set up at the query level for PySpark notebooks that run on a schedule in Databricks, so that if we get some expected result from them we can receive a mail alert?

In the attached screenshot you can see we have 3 workspaces. We have the alert option available in the SQL workspace but not in our Data Science and Engineering workspace. Is there any way we can incorporate this in our DS and Engineering workspace?

[attached screenshot: image.png, showing the three workspaces]
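
A hedged workaround sketch, not necessarily the accepted answer in this thread: make the scheduled notebook fail when the expected result is not met, and attach an on-failure email notification to the job. The table name and threshold below are hypothetical.

```python
# Hypothetical check inside the scheduled PySpark notebook. If it fails, the job
# run is marked as failed and the job's "on failure" email notification fires.
row_count = spark.table("my_catalog.my_schema.daily_metrics").count()  # assumed table

expected_minimum = 1000  # assumed threshold
if row_count < expected_minimum:
    raise ValueError(
        f"Expected at least {expected_minimum} rows, found {row_count}"
    )
```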
Latest Reply
JKR
Contributor
  • 7 kudos

How can I receive a call on Teams/a phone number/Slack if any job fails?

7 More Replies
ndatabricksuser
by New Contributor
  • 1594 Views
  • 3 replies
  • 3 kudos

Vacuum and Streaming Issue

Hi User Community, requesting some advice on the below issue please: I have 4 Databricks notebooks, 1 that ingests data from a Kafka topic (metric data from many servers) and dumps the data in Parquet format into a specified location. My 2nd data brick...

Data Engineering
Delta Lake
optimize
spark
structured streaming
vacuum
Latest Reply
mroy
Contributor
  • 3 kudos

Vacuuming is also a lot faster with inventory tables!
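
For readers who have not used it, a rough sketch of an inventory-based VACUUM; the USING INVENTORY clause and the columns the inventory query must return (path, length, isDir, modificationTime) should be checked against the VACUUM docs for your runtime, and the table and inventory names are made up.

```python
# Hedged sketch: VACUUM driven by a pre-computed file inventory instead of a
# full directory listing. Names are illustrative; verify the exact syntax in
# the Databricks VACUUM documentation for your runtime version.
spark.sql("""
    VACUUM my_catalog.my_schema.metrics
    USING INVENTORY (
        SELECT path, length, isDir, modificationTime
        FROM my_catalog.my_schema.metrics_file_inventory
    )
    RETAIN 168 HOURS
""")
```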

2 More Replies
nolanlavender00
by New Contributor
  • 4679 Views
  • 4 replies
  • 1 kudos

Resolved! How to stop a Streaming Job based on time of the week

I have an always-on job cluster triggering Spark Streaming jobs. I would like to stop this streaming job once a week to run table maintenance. I was looking to leverage the foreachBatch function to check a condition and stop the job accordingly.
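
One way to sketch this without foreachBatch is a driver-side loop that stops the query when the maintenance window arrives; the source and target tables, checkpoint path, and window below are all hypothetical.

```python
import datetime
import time

# Hedged sketch: keep the stream running, but stop it from the driver during a
# weekly maintenance window so table maintenance can run. Names are illustrative.
query = (
    spark.readStream.table("my_catalog.my_schema.raw_events")
    .writeStream
    .option("checkpointLocation", "/tmp/checkpoints/raw_events")  # assumed path
    .toTable("my_catalog.my_schema.events_silver")
)

while query.isActive:
    now = datetime.datetime.now()
    if now.weekday() == 6 and now.hour == 2:  # e.g. Sunday between 02:00 and 03:00
        query.stop()  # lets the in-flight micro-batch finish, then stops
        break
    time.sleep(60)

# ...run OPTIMIZE/VACUUM here, then let the next scheduled run restart the stream.
```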

Latest Reply
mroy
Contributor
  • 1 kudos

You could also use the "Available-now micro-batch" trigger. It only processes one batch at a time, and you can do whatever you want in between batches (sleep, shut down, vacuum, etc.)
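
A minimal sketch of the trigger mentioned above, assuming hypothetical source and target tables: each run drains whatever data is currently available and then stops, leaving a window for maintenance between runs.

```python
# Hedged sketch of an available-now streaming run: process everything that is
# currently available, then stop. Names and the checkpoint path are illustrative.
(
    spark.readStream.table("my_catalog.my_schema.raw_events")
    .writeStream
    .trigger(availableNow=True)
    .option("checkpointLocation", "/tmp/checkpoints/events_available_now")
    .toTable("my_catalog.my_schema.events_silver")
    .awaitTermination()
)
```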

3 More Replies
aranjan99
by New Contributor III
  • 1843 Views
  • 5 replies
  • 3 kudos

system.billing.usage table missing data for jobs running in my databricks account

I have some jobs running on Databricks. I can obtain their jobId from the Jobs UI or the List Job Runs API. However, when trying to get DBU usage for the corresponding jobs from system.billing.usage, I do not see the same job_id in that table. It's been mor...
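
For context, a hedged sketch of the kind of query involved; column names follow the documented system.billing.usage schema, but note that usage_metadata.job_id is typically only populated for job compute and rows can lag behind the runs themselves.

```python
# Hedged sketch: DBUs per job and day from the billing system table.
usage_df = spark.sql("""
    SELECT usage_metadata.job_id,
           usage_date,
           SUM(usage_quantity) AS dbus
    FROM system.billing.usage
    WHERE usage_metadata.job_id IS NOT NULL
    GROUP BY usage_metadata.job_id, usage_date
    ORDER BY usage_date DESC
""")
display(usage_df)
```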

Latest Reply
Kaniz_Fatma
Community Manager
  • 3 kudos

Hi, @aranjan99. Apologies for the delayed response. If you’re not seeing job IDs from the UI or API in the billing table, it’s possible that the job run IDs are not being populated for long-running jobs. To address this, consider restarting the comput...

4 More Replies
aranjan99
by New Contributor III
  • 1672 Views
  • 4 replies
  • 1 kudos

system.access.table_lineage table missing data

I am using the system.access.table_lineage table to figure out the tables accessed by SQL queries and the corresponding SQL queries. However, I am noticing this table is missing data or values very often. For example, for SQL queries executed by our DBT jobs, ...

Latest Reply
jacovangelder
Honored Contributor
  • 1 kudos

Is all your ETL querying/referencing the full table name (i.e. catalog.schema.table)? If you query delta files for example, metadata for data lineage will not be captured. 
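
A small illustration of the distinction above, with hypothetical table names: reads through the three-level Unity Catalog name show up in system.access.table_lineage, while path-based Delta reads do not.

```python
# Captured in table lineage: the query references catalog.schema.table.
orders_by_name = spark.table("my_catalog.my_schema.orders")

# Not captured: a path-based read bypasses Unity Catalog metadata.
orders_by_path = spark.read.format("delta").load("/mnt/lake/orders")
```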

3 More Replies
dm7
by New Contributor II
  • 2405 Views
  • 2 replies
  • 0 kudos

Resolved! Unit Testing DLT Pipelines

Now that we are moving our DLT pipelines into production, we would like to start looking at unit testing the transformation logic inside DLT notebooks. We want to know how we can unit test the PySpark logic/transformations independently without having to s...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @dm7, Instead of embedding all your transformation logic directly in the DLT notebook, create separate Python modules (files) for your transformations. This allows you to interactively test transformations from notebooks and write unit tests speci...
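
A hedged sketch of the module-plus-test layout described above; the file names, the transformation, and the test are all illustrative.

```python
# transformations.py -- plain PySpark logic, no dlt imports, so it runs anywhere.
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F

def add_ingest_date(df: DataFrame) -> DataFrame:
    """Example transformation; the DLT notebook would wrap this in @dlt.table."""
    return df.withColumn("ingest_date", F.current_date())

# test_transformations.py -- a pytest unit test against a local SparkSession
# (shown in the same block for brevity; it would import from transformations.py).
def test_add_ingest_date():
    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df = spark.createDataFrame([(1, "a")], ["id", "value"])
    assert "ingest_date" in add_ingest_date(df).columns
```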

1 More Replies
WWoman
by New Contributor III
  • 1839 Views
  • 2 replies
  • 0 kudos

Resolved! Persisting query history data

Hello, I am looking for a way to persist query history data. I do not have direct access to the system tables. I do have access to a query_history view created by selecting from the system.query.history and system.access.audit system tables. I want ...
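
A hedged sketch of one way to do this from a scheduled notebook, assuming the query_history view is resolvable by that name and exposes a unique statement_id column; the archive table name is made up.

```python
# Hedged sketch: append only rows not yet archived from the query_history view
# into a team-owned Delta table, so the data outlives the system tables' retention.
spark.sql("""
    CREATE TABLE IF NOT EXISTS my_catalog.ops.query_history_archive
    AS SELECT * FROM query_history WHERE 1 = 0
""")

spark.sql("""
    INSERT INTO my_catalog.ops.query_history_archive
    SELECT qh.*
    FROM query_history AS qh
    LEFT ANTI JOIN my_catalog.ops.query_history_archive AS a
      ON qh.statement_id = a.statement_id
""")
```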

Latest Reply
syed_sr7
New Contributor II
  • 0 kudos

Is there any system table for query history?

1 More Replies
CarstenWeber
by New Contributor III
  • 3127 Views
  • 9 replies
  • 3 kudos

Resolved! Invalid configuration fs.azure.account.key trying to load ML Model with OAuth

Hi Community, I was trying to load an ML model from an Azure storage account (abfss://....) with: model = PipelineModel.load(path) I set the Spark config: spark.conf.set("fs.azure.account.auth.type", "OAuth") spark.conf.set("fs.azure.account.oauth.provi...
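
For reference, a hedged sketch of the usual service-principal OAuth configuration for abfss paths, using the account-scoped key names (with the storage account suffix); the storage account, secret scope, and tenant placeholder are illustrative.

```python
# Hedged sketch: OAuth (service principal) access to ADLS Gen2. All names are
# illustrative; credentials come from a secret scope rather than plain text.
storage_account = "mystorageaccount"  # assumed storage account name
suffix = f"{storage_account}.dfs.core.windows.net"

spark.conf.set(f"fs.azure.account.auth.type.{suffix}", "OAuth")
spark.conf.set(
    f"fs.azure.account.oauth.provider.type.{suffix}",
    "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
)
spark.conf.set(
    f"fs.azure.account.oauth2.client.id.{suffix}",
    dbutils.secrets.get("my-scope", "sp-client-id"),
)
spark.conf.set(
    f"fs.azure.account.oauth2.client.secret.{suffix}",
    dbutils.secrets.get("my-scope", "sp-client-secret"),
)
spark.conf.set(
    f"fs.azure.account.oauth2.client.endpoint.{suffix}",
    "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
)

# model = PipelineModel.load(f"abfss://my-container@{suffix}/models/my_model")
```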

Latest Reply
chhavibansal
New Contributor III
  • 3 kudos

@daniel_sahal Any possible reason you know of why it works in OSS Spark while it does not work in a Databricks notebook? Why is there a disparity?

8 More Replies
Aidzillafont
by New Contributor II
  • 303 Views
  • 1 reply
  • 0 kudos

How to pick the right cluster for your workflow

Hi All, I am attempting to execute a workflow on various job clusters, including general-purpose and memory-optimized clusters. My main bottleneck is that data is being written to disk because I’m running out of RAM. This is due to the large dataset t...

Latest Reply
Ravivarma
New Contributor III
  • 0 kudos

Hello @Aidzillafont, greetings! Please find below the document that explains compute configuration best practices: https://docs.databricks.com/en/compute/cluster-config-best-practices.html I hope this helps you! Regards, Ravi

Enrique1987
by New Contributor III
  • 1428 Views
  • 1 reply
  • 2 kudos

Resolved! When to activate Photon and when not to?

Photon appears as an option to check and uncheck as appropriate. The use of Photon leads to higher consumption of DBUs and higher costs. At what point does it pay off, and when should it not be enabled? More costs for the use of Photon, but at the same time less...

Latest Reply
jacovangelder
Honored Contributor
  • 2 kudos

This is my own experience: For SQL workloads, with not too many joins, it will speed things up. For building facts and dimensions using many joins, I found Photon to increase costs by a lot, while not bringing much better performance. The only real w...

Sudheer_DB
by New Contributor II
  • 411 Views
  • 3 replies
  • 0 kudos

DLT SQL schema definition

Hi All, while defining a schema when creating a table using Auto Loader and DLT using SQL, I am getting a schema mismatch error between the defined schema and the inferred schema. CREATE OR REFRESH STREAMING TABLE csv_test(a0 STRING,a1 STRING,a2 STRING,a3 STRI...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 0 kudos

@Sudheer_DB You can specify your own _rescued_data column name by setting the rescuedDataColumn option. https://docs.databricks.com/en/ingestion/auto-loader/schema.html#what-is-the-rescued-data-column
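
A hedged Python Auto Loader sketch of the option mentioned above (the DLT SQL form takes the same option through its options map); the paths and the column name are illustrative.

```python
# Hedged sketch: rename the rescued-data column when reading CSVs with Auto Loader.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.schemaLocation", "/tmp/schemas/csv_test")  # assumed path
    .option("rescuedDataColumn", "_my_rescued_data")
    .load("/mnt/landing/csv_test/")  # assumed source path
)
```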

2 More Replies
drii_cavalcanti
by New Contributor III
  • 1988 Views
  • 3 replies
  • 0 kudos

DBUtils commands do not work on shared access mode clusters

Hi there, I am trying to upload a file to an S3 bucket. However, none of the dbutils commands seem to work, and neither does the boto3 library. Clusters with the same configuration, except for the shared access mode, seem to work fine. These are the error m...

Latest Reply
mchugani
New Contributor II
  • 0 kudos

@drii_cavalcanti Were you able to resolve this?

2 More Replies
pm71
by New Contributor II
  • 860 Views
  • 4 replies
  • 3 kudos

Issue with os and sys Operations in Repo Path on Databricks

Hi, Starting from today, I have encountered an issue when performing operations using the os and sys modules within the Repo path in my Databricks environment. Specifically, any operation that involves these modules results in a timeout error. However...

Latest Reply
mgradowski
New Contributor III
  • 3 kudos

https://status.azuredatabricks.net/pages/incident/5d49ec10226b9e13cb6a422e/667c08fa17fef71767abda04 "Degraded performance" is a pretty mild way of saying almost nothing productive can be done ATM...

3 More Replies
Hertz
by New Contributor II
  • 631 Views
  • 2 replies
  • 0 kudos

System Tables / Audit Logs action_name createWarehouse/createEndpoint

I am creating a cost dashboard across multiple accounts. I am working to get SQL warehouse names and warehouse IDs so I can combine them with system.access.billing on warehouse_id. But the only action_names that include both the warehouse_id and warehouse_n...

Data Engineering
Audit Logs
cost monitor
createEndpoint
createWarehouse
Latest Reply
Hertz
New Contributor II
  • 0 kudos

I just wanted to circle back to this. It appears that the ID is returned in the response column of the create action_name.
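
Building on that, a hedged sketch of what extracting the new warehouse ID from the response column could look like; the response.result JSON structure and the '$.id' field name are assumptions to verify against your own audit rows.

```python
# Hedged sketch: warehouse names and ids from warehouse-creation audit events,
# ready to join against billing data on warehouse_id.
warehouses = spark.sql("""
    SELECT request_params['name']                    AS warehouse_name,
           get_json_object(response.result, '$.id')  AS warehouse_id,
           event_time
    FROM system.access.audit
    WHERE action_name IN ('createWarehouse', 'createEndpoint')
""")
display(warehouses)
```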

1 More Replies
