- 1550 Views
- 1 reply
- 0 kudos
We have a structured streaming job configured to read from Event Hubs and persist to the Delta raw/bronze layer via a MERGE inside foreachBatch. However, of late, the merge has been taking longer. How can I optimize this pipeline?
Latest Reply
Delta Lake completes a MERGE in two steps: first, perform an inner join between the target table and source table to select all files that have matches; second, perform an outer join between the selected files in the target and source tables and write out the update...
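A hedged sketch of the usual mitigation: since both join steps operate on the files selected by the merge condition, narrowing that condition with a partition column lets Delta prune files instead of scanning and rewriting large parts of the table. The paths, the event_id/ingest_date columns, and eh_conf below are hypothetical:

```python
from delta.tables import DeltaTable

BRONZE_PATH = "/mnt/delta/bronze/events"  # hypothetical path

def upsert_to_bronze(micro_batch_df, batch_id):
    bronze = DeltaTable.forPath(spark, BRONZE_PATH)
    (bronze.alias("t")
        .merge(
            micro_batch_df.alias("s"),
            # Including the partition column in the join condition lets the
            # inner-join file-selection step skip untouched partitions.
            "t.event_id = s.event_id AND t.ingest_date = s.ingest_date")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

(spark.readStream
    .format("eventhubs")      # eh_conf: your Event Hubs connection options
    .options(**eh_conf)
    .load()
    .writeStream
    .foreachBatch(upsert_to_bronze)
    .option("checkpointLocation", "/mnt/checkpoints/bronze_events")
    .start())
```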
- 919 Views
- 0 replies
- 0 kudos
How is it different from regular autologging? When should I consider enabling Databricks Autologging? How can I switch the feature on?
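A hedged sketch of switching autologging on via the open-source MLflow API, which Databricks Autologging builds on (managed defaults vary by workspace):

```python
import mlflow
import numpy as np
from sklearn.linear_model import LinearRegression

# One call enables automatic logging of params, metrics, and models
# for supported libraries (scikit-learn, Spark MLlib, etc.).
mlflow.autolog()

X, y = np.arange(10).reshape(-1, 1), np.arange(10)
LinearRegression().fit(X, y)  # the fit is captured as an MLflow run automatically
```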
- 1376 Views
- 1 reply
- 1 kudos
Would it require Databricks Connect, the Databricks CLI, or the API?
Latest Reply
MLflow is an open-source framework, so you could, for example, pip install mlflow on your laptop. https://mlflow.org/docs/latest/quickstart.html
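A minimal local sketch under that assumption (no Databricks Connect, CLI, or API needed; runs go to a local ./mlruns directory by default):

```python
import mlflow

# Log a run against the default local file store.
with mlflow.start_run():
    mlflow.log_param("alpha", 0.5)
    mlflow.log_metric("rmse", 0.72)
# Browse the results locally with: mlflow ui
```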
- 2447 Views
- 2 replies
- 1 kudos
Use Case Background: We have an ongoing SecOps project going live here in 4 weeks. We have set up Splunk to monitor syslog logs and want to integrate this with Delta. Our forwarder collects the data from remote machines, then forwards data to the inde...
Latest Reply
The Databricks Add-on for Splunk, built as part of Databricks Labs, can be leveraged for Splunk integration. It's a bi-directional framework that allows for in-place querying of data in Databricks from within Splunk by running queries, notebooks or jobs ...
1 More Reply
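On the Delta side, a hedged sketch of what analysis looks like once events have landed; the table and column names here are hypothetical:

```python
# Assumes forwarded syslog events have been persisted to a Delta table.
syslogs = spark.read.table("secops.syslog_events")  # hypothetical table

(syslogs
    .where("severity >= 4")              # e.g. warning and above
    .groupBy("host", "facility")
    .count()
    .orderBy("count", ascending=False)
    .show())
```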
- 827 Views
- 1 reply
- 0 kudos
I want to understand more about the Delta Live Tables cluster. When the cluster starts we do not have visibility into it. I also heard that operational tasks like OPTIMIZE can happen on another cluster, leaving the original cluster for only the main work of data proces...
Latest Reply
The Delta Live Tables pipeline definition has a place to define the cluster configuration. DLT execution is encapsulated in the pipeline, and you monitor the overall pipeline, which is the higher-order abstraction, rather than the cluster itself.
by aladda • Databricks Employee
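As an illustration of where that configuration lives, a hedged sketch of pipeline settings with the "clusters" section (field values are hypothetical); the separate "maintenance" label is where DLT runs housekeeping tasks like OPTIMIZE, matching the behavior described in the question:

```python
# Illustrative DLT pipeline settings; values are placeholders.
pipeline_settings = {
    "name": "bronze_ingest_pipeline",
    "clusters": [
        # Cluster that runs the pipeline's main data-processing work.
        {"label": "default", "num_workers": 4},
        # Cluster DLT uses for maintenance tasks such as OPTIMIZE/VACUUM.
        {"label": "maintenance", "num_workers": 1},
    ],
    "libraries": [{"notebook": {"path": "/pipelines/bronze_ingest"}}],
}
```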
- 2905 Views
- 1 reply
- 0 kudos
I'm running 3 separate dbt processes in parallel. All of them read data from different Databricks databases and create different staging tables using a dbt alias, but at the end they all update/insert into the same target table. The 3 processes r...
Latest Reply
You're likely running into the issue described here, along with a solution to it. While Delta does support concurrent writers to separate partitions of a table, depending on your query structure (join/filter/where in particular) there may still be a n...
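A hedged sketch of the usual mitigation: make each writer's partition explicit in the MERGE condition so Delta's conflict detection can prove the concurrent writers are disjoint. The table, columns, partition value, and staging_df below are hypothetical:

```python
from delta.tables import DeltaTable

target = DeltaTable.forName(spark, "analytics.target_table")  # hypothetical

(target.alias("t")
    .merge(
        staging_df.alias("s"),  # staging_df: this process's staging data
        # Pinning the partition column to this writer's value tells Delta
        # the concurrent MERGEs touch disjoint partitions, which avoids
        # ConcurrentAppendException.
        "t.id = s.id AND t.region = 'emea'")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())
```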
- 6482 Views
- 1 reply
- 1 kudos
Can you please explain the difference between Jobs and Delta Live Tables?
Latest Reply
Jobs are designed for automated execution (scheduled or manual) of Databricks notebooks, JARs, spark-submit jobs, etc. It's essentially a generic framework to run any kind of Data Engineering, Data Analysis or Data Science workload. Delta Live Tables, on the...
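For contrast, a minimal sketch of a Delta Live Tables definition: DLT manages the table dependencies and orchestration that a Job would leave to your code (the source path is hypothetical):

```python
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Raw events loaded from cloud storage")
def raw_events():
    return spark.read.format("json").load("/mnt/landing/events")  # hypothetical path

@dlt.table(comment="Cleaned events; DLT infers the dependency on raw_events")
def clean_events():
    return dlt.read("raw_events").where(col("event_id").isNotNull())
```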
- 2288 Views
- 1 reply
- 0 kudos
I see the default in the UI is to always create clusters in a single AZ (e.g. us-west-2a), but I want to distribute workloads across all available AZs.
Latest Reply
Found the answer - it's not available in the UI, but via the API you can submit the cluster definition with:
"aws_attributes": {
  "zone_id": "auto"
}
This is documented in the Clusters API: https://docs.databricks.com/dev-tools/api/latest/clusters.html#aw...
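A hedged sketch of submitting that via the Clusters API over plain HTTP; the host, token, and node/runtime values are placeholders:

```python
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                        # placeholder

resp = requests.post(
    f"{HOST}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "cluster_name": "auto-az-cluster",
        "spark_version": "11.3.x-scala2.12",   # placeholder runtime
        "node_type_id": "i3.xlarge",           # placeholder node type
        "num_workers": 2,
        # "auto" lets Databricks pick an AZ with available capacity
        # instead of pinning the cluster to a single zone.
        "aws_attributes": {"zone_id": "auto"},
    },
)
resp.raise_for_status()
print(resp.json()["cluster_id"])
```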