Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

sajith_appukutt
by Honored Contributor II
  • 1140 Views
  • 1 replies
  • 0 kudos
Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

You could leverage SHOW GRANT, which displays the privileges: SHOW GRANT [<user>] ON [CATALOG | DATABASE <database-name> | TABLE <table-name> | VIEW <view-name> | FUNCTION <function-name> | ANONYMOUS FUNCTION | ANY FILE]. You could use this code snippet ...

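As a quick illustration of the syntax above, a minimal sketch you could run in a notebook cell (the user and table names are placeholders):

# Hypothetical example: list the privileges a user holds on a table
grants_df = spark.sql("SHOW GRANT `someone@example.com` ON TABLE my_db.my_table")
grants_df.show(truncate=False)
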
sajith_appukutt
by Honored Contributor II
  • 1218 Views
  • 1 replies
  • 0 kudos

Resolved! MERGE operation on PI data getting slower. How can I debug?

We have a Structured Streaming job configured to read from Event Hubs and persist to the Delta raw/bronze layer via MERGE inside a foreachBatch. However, of late, the merge process has been taking longer. How can I optimize this pipeline?

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

Delta Lake completes a MERGE in two steps: (1) perform an inner join between the target table and the source table to select all files that have matches; (2) perform an outer join between the selected files in the target and source tables and write out the update...

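Not from the thread, but one common way to speed up the first step (the inner join that finds matching files) is to co-locate the data on the MERGE join key so fewer files have to be scanned and rewritten; a hedged sketch with placeholder table and column names:

# Compact the bronze table and Z-ORDER it on the key used in the foreachBatch MERGE condition
spark.sql("OPTIMIZE bronze.events ZORDER BY (event_id)")
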
Anonymous
by Not applicable
  • 712 Views
  • 0 replies
  • 0 kudos

What is Auto autologging?

How is it different from regular autologging? When should I consider enabling Auto autologging? How can I switch the feature on?

Anonymous
by Not applicable
  • 1082 Views
  • 1 replies
  • 1 kudos
Latest Reply
sajith_appukutt
Honored Contributor II
  • 1 kudos

MLflow is an open source framework, and you could pip install mlflow on your laptop, for example. https://mlflow.org/docs/latest/quickstart.html

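A minimal sketch along the lines of the linked quickstart (the parameter and metric names are made up):

# After `pip install mlflow` on your laptop
import mlflow

# Logs a parameter and a metric to a local ./mlruns directory by default
with mlflow.start_run():
    mlflow.log_param("alpha", 0.5)
    mlflow.log_metric("rmse", 0.72)
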
User16826987838
by Contributor
  • 1159 Views
  • 2 replies
  • 0 kudos
Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

def getVaccumSize(table: String): Long = {
  // VACUUM in DRY RUN mode lists the files that would be deleted; sum their sizes
  val listFiles = spark.sql(s"VACUUM $table DRY RUN").select("path").collect().map(_(0)).toList
  var sum = 0L
  listFiles.foreach(x => sum += dbutils.fs.ls(x.toString)(0).size)
  sum
}

getVaccumSize("<yo...

1 More Replies
r_van_niekerk
by New Contributor II
  • 2012 Views
  • 2 replies
  • 1 kudos

I have a multi-part question around Databricks integration with Splunk?

Use Case Background: We have an ongoing SecOps project going live here in 4 weeks. We have set up Splunk to monitor syslog logs and want to integrate this with Delta. Our forwarder collects the data from remote machines and then forwards data to the inde...

Latest Reply
aladda
Honored Contributor II
  • 1 kudos

The Databricks Add-on for Splunk, built as part of Databricks Labs, can be leveraged for Splunk integration. It's a bi-directional framework that allows for in-place querying of data in Databricks from within Splunk by running queries, notebooks or jobs ...

1 More Replies
User16826994223
by Honored Contributor III
  • 666 Views
  • 1 replies
  • 0 kudos

Delta Live Tables cluster

I want to understand more about the Delta Live Tables cluster. When the cluster starts, we do not have visibility into it. I also heard that operational tasks like OPTIMIZE can happen on another cluster, leaving the original cluster for only the main work of data proces...

Latest Reply
aladda
Honored Contributor II
  • 0 kudos

The Delta Live Tables pipeline definition has a place to define the cluster configuration. DLT execution is encapsulated in the pipeline, and you're monitoring the overall pipeline, which is the higher-order function, vs having to monitor the cluster itself.

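For illustration, a hedged sketch of the clusters block in a DLT pipeline settings JSON (the pipeline name and sizes are placeholders); an additional entry with "label": "maintenance" can be added to configure the cluster DLT uses for maintenance tasks such as OPTIMIZE and VACUUM:

{
  "name": "my-dlt-pipeline",
  "clusters": [
    {
      "label": "default",
      "autoscale": {
        "min_workers": 1,
        "max_workers": 4
      }
    }
  ]
}
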
User16826987838
by Contributor
  • 1532 Views
  • 1 replies
  • 0 kudos
Latest Reply
aladda
Honored Contributor II
  • 0 kudos

Databricks recommends launching the cluster so that the Spark driver is on an on-demand instance, which allows saving the state of the cluster even after losing spot instance nodes. If you choose to use all spot instances including the driver, any ca...

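A hedged sketch of the relevant Clusters API attributes (values are illustrative): first_on_demand keeps the first node, i.e. the driver, on an on-demand instance, while SPOT_WITH_FALLBACK uses spot for the remaining nodes and falls back to on-demand if spot capacity is unavailable:

"aws_attributes": {
  "first_on_demand": 1,
  "availability": "SPOT_WITH_FALLBACK",
  "spot_bid_price_percent": 100
}
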
Anonymous
by Not applicable
  • 1043 Views
  • 1 replies
  • 0 kudos
Latest Reply
aladda
Honored Contributor II
  • 0 kudos

In general, the Delta cache accelerates data reads by creating copies of remote files in nodes' local storage using a fast intermediate data format. The data is cached automatically whenever a file has to be fetched from a remote locatio...

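A small sketch to make this concrete (the table name is a placeholder; the Delta/disk cache is already enabled by default on storage-optimized worker types):

# Explicitly enable the Delta/disk cache for the current cluster
spark.conf.set("spark.databricks.io.cache.enabled", "true")

# Optionally pre-warm the cache instead of waiting for the first remote read
spark.sql("CACHE SELECT * FROM sales.transactions")
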
aladda
by Honored Contributor II
  • 1858 Views
  • 1 replies
  • 0 kudos

Resolved! I read that Delta supports concurrent writes to separate partitions of the table but I'm getting an error when doing so

I'm running 3 separate dbt processes in parallel. All of them read data from different Databricks databases and create different staging tables by using a dbt alias, but they all, at the end, update/insert into the same target table. The 3 processes r...

Latest Reply
aladda
Honored Contributor II
  • 0 kudos

You're likely running into the issue described here, and a solution to it as well. While Delta does support concurrent writers to separate partitions of a table, depending on your query structure (join/filter/where in particular), there may still be a n...

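A hedged sketch of the usual workaround (table, column and partition names are placeholders): make each process's MERGE condition pin the partition it owns, so Delta's conflict detection can prove the concurrent writes touch disjoint files:

from delta.tables import DeltaTable

def upsert_region(staging_df, region):
    # Target table is assumed to be partitioned by `region`; each dbt process owns one region
    target = DeltaTable.forName(spark, "analytics.orders")
    (target.alias("t")
        .merge(staging_df.alias("s"),
               f"t.region = '{region}' AND t.region = s.region AND t.order_id = s.order_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())
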
aladda
by Honored Contributor II
  • 7668 Views
  • 1 replies
  • 1 kudos
Latest Reply
aladda
Honored Contributor II
  • 1 kudos

The Databricks Add-on for Splunk, built as part of Databricks Labs, can be leveraged for Splunk integration. It's a bi-directional framework that allows for in-place querying of data in Databricks from within Splunk by running queries, notebooks or jobs ...

Anonymous
by Not applicable
  • 5374 Views
  • 1 replies
  • 1 kudos

Resolved! Jobs - Delta Live tables difference

Can you please explain the difference between Jobs and Delta Live tables?

Latest Reply
aladda
Honored Contributor II
  • 1 kudos

Jobs are designed for automated execution (scheduled or manual) of Databricks notebooks, JARs, spark-submit jobs, etc. It's essentially a generic framework to run any kind of data engineering, data analysis or data science workload. Delta Live Tables, on the...

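For contrast, a hedged sketch of what a dataset definition inside a Delta Live Tables pipeline looks like (table and column names are placeholders), versus a Job that simply schedules a notebook, JAR or spark-submit task:

import dlt
from pyspark.sql import functions as F

# DLT manages this table, its dependencies and its updates as part of the pipeline
@dlt.table(comment="Orders with null keys filtered out")
def clean_orders():
    return (spark.read.table("raw.orders")
                 .where(F.col("order_id").isNotNull()))
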
aladda
by Honored Contributor II
  • 2950 Views
  • 1 replies
  • 0 kudos
Latest Reply
aladda
Honored Contributor II
  • 0 kudos

Notebooks in Databricks are part of the web app, which is run and managed by Databricks from the control plane. See the high-level architecture here for details: https://docs.databricks.com/getting-started/overview.html

tj-cycyota
by New Contributor III
  • 1765 Views
  • 1 replies
  • 0 kudos

Resolved! How can I make a cluster start up in the availability-zone (AZ) with the most available IPs?

I see the default in the UI is to always create clusters in a single AZ (e.g. us-west-2a), but want to distribute workloads across all available AZs.

Latest Reply
tj-cycyota
New Contributor III
  • 0 kudos

Found the answer: it is not available in the UI, but via the API you can submit the cluster definition with "aws_attributes": { "zone_id": "auto" }. This is documented in the Clusters API: https://docs.databricks.com/dev-tools/api/latest/clusters.html#aw...

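A hedged sketch of a Clusters API create payload using that attribute (all other fields are placeholders):

{
  "cluster_name": "auto-az-cluster",
  "spark_version": "13.3.x-scala2.12",
  "node_type_id": "i3.xlarge",
  "num_workers": 2,
  "aws_attributes": {
    "zone_id": "auto"
  }
}
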
