- 1762 Views
- 1 replies
- 0 kudos
I followed the documentation to configure the instance profile. The EC2 instance is able to access the S3 bucket when configured with the same instance profile. However, the cluster configured to use the same instance profile failed to access the S3 buc...
Latest Reply
I suspect this is because AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY have been added to the Spark environment variables. You can run %sh env | grep -i aws on your cluster and make sure AWS_ACCESS_KEY_ID is not present. If it is present, then please remove it e...
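For a quick check from a Python cell, a minimal sketch equivalent to the shell command above (it inspects only the driver's environment):

```python
import os

# Static AWS credentials in the environment take precedence over the
# instance profile; verify they are not set on the driver.
for var in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"):
    print(f"{var}: {'present' if var in os.environ else 'not set'}")
```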
- 1264 Views
- 1 replies
- 0 kudos
I am trying to re-optimize a Delta table with a max file size of 32 MB. But after changing spark.databricks.delta.optimize.maxFileSize and trying to optimize a partition, it doesn't split larger files into smaller ones. How can I get it to work?
Latest Reply
spark.databricks.delta.optimize.maxFileSize controls the target size used to bin-pack files when you run the OPTIMIZE command, but it will not split larger files into smaller ones today. File splitting does happen when ZORDER is run, however.
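For illustration, a minimal sketch in a Databricks notebook (the ambient spark session is assumed; the table name events and the column event_date are placeholders):

```python
# Target file size for bin-packing during OPTIMIZE: 32 MB, in bytes.
spark.conf.set("spark.databricks.delta.optimize.maxFileSize", 32 * 1024 * 1024)

# OPTIMIZE alone only bin-packs small files up to the target size;
# adding ZORDER BY rewrites the matching files, which is where
# splitting can occur.
spark.sql("OPTIMIZE events ZORDER BY (event_date)")
```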
- 766 Views
- 1 replies
- 0 kudos
We have a Structured Streaming job configured to read from Event Hubs and persist to the Delta raw/bronze layer via MERGE inside a foreachBatch. However, of late, the merge process has been taking longer. How can I optimize this pipeline?
Latest Reply
Delta Lake completes a MERGE in two steps: (1) perform an inner join between the target table and the source table to select all files that have matches; (2) perform an outer join between the selected files in the target and source tables and write out the update...
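A minimal sketch of the foreachBatch MERGE pattern from the question, assuming a Databricks notebook; stream_df stands in for the Event Hubs source, and the table/column names (bronze, id) are placeholders:

```python
from delta.tables import DeltaTable

def upsert_to_bronze(micro_batch_df, batch_id):
    # Merge each micro-batch into the bronze table. A more selective
    # join condition (e.g., adding partition predicates) shrinks the
    # set of files selected in the inner-join step described above.
    bronze = DeltaTable.forName(micro_batch_df.sparkSession, "bronze")
    (bronze.alias("t")
        .merge(micro_batch_df.alias("s"), "t.id = s.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

(stream_df.writeStream
    .foreachBatch(upsert_to_bronze)
    .option("checkpointLocation", "/tmp/checkpoints/bronze")  # placeholder
    .start())
```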
- 477 Views
- 0 replies
- 0 kudos
How is it different from regular autologging? When should I consider enabling Auto autologging? How can I switch the feature on?
- 737 Views
- 1 replies
- 1 kudos
Would it require Databricks Connect / the Databricks CLI / the API?
Latest Reply
MLflow is an open-source framework, and you can pip install mlflow on your laptop, for example. See https://mlflow.org/docs/latest/quickstart.html
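For example, a minimal local sketch along the lines of the linked quickstart (the parameter and metric values are arbitrary):

```python
import mlflow

# Logs to a local ./mlruns directory by default; no Databricks
# workspace, Databricks Connect, or CLI is required.
with mlflow.start_run():
    mlflow.log_param("alpha", 0.5)
    mlflow.log_metric("rmse", 0.87)
```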
- 1497 Views
- 2 replies
- 1 kudos
Use Case Background: We have an ongoing SecOps project going live in 4 weeks. We have set up Splunk to monitor syslog logs and want to integrate this with Delta. Our forwarder collects the data from remote machines, then forwards it to the inde...
Latest Reply
The Databricks Add-on for Splunk, built as part of Databricks Labs, can be leveraged for Splunk integration. It's a bi-directional framework that allows for in-place querying of data in Databricks from within Splunk by running queries, notebooks, or jobs ...
1 More Reply
- 494 Views
- 1 replies
- 0 kudos
I want to understand more about the Delta Live Tables cluster. When the cluster starts, we do not have visibility into it. I also heard that operational tasks like OPTIMIZE can happen on another cluster, leaving the original cluster for only the main work of data proces...
Latest Reply
The Delta Live Tables pipeline definition has a place to define the cluster configuration. DLT execution is encapsulated in the pipeline, and you monitor the overall pipeline, which is the higher-order construct, rather than having to monitor the cluster itself.
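As an illustration, a sketch of the clusters section of a DLT pipeline's JSON settings, written here as a Python dict (worker counts are placeholders); the separate "maintenance" cluster is what runs housekeeping tasks such as OPTIMIZE, matching the behavior the question describes:

```python
# Sketch of the `clusters` block from a DLT pipeline's settings.
# "default" does the main data processing; "maintenance" handles
# housekeeping such as OPTIMIZE on its own cluster.
pipeline_settings = {
    "clusters": [
        {"label": "default", "num_workers": 2},
        {"label": "maintenance"},
    ]
}
```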
by aladda • Honored Contributor II
- 959 Views
- 1 replies
- 0 kudos
I’m running 3 separate dbt processes in parallel. All of them read data from different Databricks databases and create different staging tables by using a dbt alias, but at the end they all update/insert into the same target table. The 3 processes r...
Latest Reply
You’re likely running into the issue described here, along with a solution to it. While Delta does support concurrent writers to separate partitions of a table, depending on your query structure (joins, filters, and WHERE clauses in particular), there may still be a n...
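One common mitigation, sketched below: include the partition predicate in each writer's MERGE condition so Delta can prove the concurrent transactions touch disjoint files. The table and column names (target_tbl, source_system, id) and the updates DataFrame are placeholders, and the ambient spark session of a Databricks notebook is assumed:

```python
from delta.tables import DeltaTable

# Assumes target_tbl is partitioned by source_system and each of the
# three dbt processes writes exactly one source_system value.
target = DeltaTable.forName(spark, "target_tbl")
(target.alias("t")
    .merge(
        updates.alias("s"),
        # The explicit partition predicate keeps concurrent MERGEs
        # from conflicting on shared files.
        "t.source_system = 'erp' AND t.id = s.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())
```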
- 4331 Views
- 1 replies
- 1 kudos
Can you please explain the difference between Jobs and Delta Live Tables?
Latest Reply
Jobs are designed for automated execution (scheduled or manual) of Databricks notebooks, JARs, spark-submit jobs, etc. It's essentially a generic framework for running any kind of data engineering, data analysis, or data science workload. Delta Live Tables, on the...