Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Supposedly there are four major types of clusters in Databricks: General Purpose, Storage Optimized, Memory Optimized, and Compute Optimized. But I'm not able to find detailed information on which cluster to choose specifically in which...
What is the best method to expose Azure Databricks metrics to Prometheus specifically? And is it possible to get the underlying Spark metrics as well? All I can see clearly defined in the documentation is the serving endpoint metrics: https://learn.micro...
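One possible route for the underlying Spark metrics (separate from the serving-endpoint metrics the docs describe) is Spark's built-in metrics system, which since Spark 3.0 ships a PrometheusServlet sink that exposes metrics in Prometheus text format on the Spark UI port. A minimal config sketch, assuming your workspace lets you set these Spark configs on the cluster:

```properties
# Expose driver metrics in Prometheus format under the Spark UI
*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
*.sink.prometheusServlet.path=/metrics/prometheus
# Executor-level metrics can additionally be enabled (experimental) via:
# spark.ui.prometheus.enabled=true
```

Prometheus would then scrape the servlet path; whether that port is reachable from your Prometheus depends on the network setup of the workspace.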
Hi there! I am trying to write batch data to a Kafka topic with a schema registry in Databricks using PySpark. I serialize the data with the PySpark to_avro function and write it to the topic, but the consumers can't read the schema id. If they do not separ...
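A common cause of this symptom: PySpark's to_avro emits only the raw Avro body, while Confluent-compatible consumers expect the wire format of a magic byte plus a 4-byte big-endian schema id in front of the payload. A minimal sketch of that framing (the payload and schema id here are made-up; in practice the id comes from registering or looking up the schema in the registry):

```python
import struct

MAGIC_BYTE = 0  # Confluent wire-format magic byte


def add_confluent_header(avro_payload: bytes, schema_id: int) -> bytes:
    """Prepend [magic byte][4-byte big-endian schema id] to a raw Avro body."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_payload


# Example with a dummy Avro body and schema id 42:
framed = add_confluent_header(b"\x02\x06foo", 42)
# framed[:5] is the 5-byte header, the rest is the Avro body
```

In Spark this logic could be applied, for example, as a UDF over the to_avro column before writing to Kafka, so consumers using a schema-registry deserializer can find the id.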
To identify the reasons for a data process's poor performance, we need to navigate and analyze the metrics in the Spark UI manually... However, replicating those steps for a large group of Spark applications would be very time-consuming... Given thi...
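One way to avoid clicking through the Spark UI per application is Spark's monitoring REST API, which exposes the same stage metrics as JSON under /api/v1/applications/&lt;app-id&gt;/stages. A sketch of ranking stages once that JSON is in hand (the records below are made-up; the field names follow the REST API's stage data):

```python
# Hypothetical stage records, shaped like Spark's
# /api/v1/applications/<app-id>/stages response.
stages = [
    {"stageId": 0, "name": "scan", "executorRunTime": 1200, "shuffleWriteBytes": 10},
    {"stageId": 1, "name": "join", "executorRunTime": 98000, "shuffleWriteBytes": 5_000_000},
    {"stageId": 2, "name": "write", "executorRunTime": 4000, "shuffleWriteBytes": 0},
]

# Rank stages by executor run time to surface the bottleneck automatically.
ranked = sorted(stages, key=lambda s: s["executorRunTime"], reverse=True)
slowest = ranked[0]
```

Looping this over each application id would give a first automated pass at spotting the slowest stages across a fleet of jobs, without opening the UI for each one.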
Hi all, I am calling the get job run list API to get all task ids and reference them in the dbt-artifacts view created by a dbt job run. The question is, I can see 'task run id' on screen, but it doesn't come back in the API response. Is there a way to get it? I checked ...
Never mind, I have found the task run id present in the getrun API: https://docs.databricks.com/api/azure/workspace/jobs/getrun. I overlooked it at first since it is buried under a nested JSON structure: tasks[] > run_id. This clarifies and solves my problem!
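For anyone hitting the same thing: the per-task run ids sit inside the tasks array of the getrun response, one level below the top-level run_id. A minimal sketch with a made-up response shape mirroring that nesting (the real response has many more fields):

```python
# Made-up getrun-style response illustrating the tasks[] > run_id nesting.
run = {
    "run_id": 1001,  # the job-level run id
    "tasks": [
        {"task_key": "dbt_seed", "run_id": 2001},
        {"task_key": "dbt_run", "run_id": 2002},
    ],
}

# Map each task_key to its task-level run_id (the value shown on screen).
task_run_ids = {t["task_key"]: t["run_id"] for t in run.get("tasks", [])}
```

The same dictionary can then be joined against the dbt-artifacts view by task run id.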
Hi, I still have some questions. I have Databricks on AWS and I need to mount S3 buckets. According to the documentation, it is recommended to do this through Unity Catalog, but how would I go about reading data from a notebook that would be mount...
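If the bucket is registered through Unity Catalog (for example as a volume), notebooks read it by path rather than through a classic DBFS mount: volumes surface under /Volumes/&lt;catalog&gt;/&lt;schema&gt;/&lt;volume&gt;/. A small helper sketch for building such paths (all catalog/schema/volume names here are hypothetical):

```python
def uc_volume_path(catalog: str, schema: str, volume: str, relative: str = "") -> str:
    """Build the /Volumes/... path that Unity Catalog volumes are exposed under."""
    base = f"/Volumes/{catalog}/{schema}/{volume}"
    return f"{base}/{relative}".rstrip("/") if relative else base


# In a notebook this would then be read with, e.g.:
# spark.read.parquet(uc_volume_path("main", "raw", "s3_landing", "events/"))
```

The point is that once the external location/volume is set up in Unity Catalog, the notebook code is just an ordinary path-based read; no dbutils.fs.mount call is involved.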