Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

User16790091296
by Contributor II
  • 777 Views
  • 1 replies
  • 0 kudos
Latest Reply
aladda
Honored Contributor II
  • 0 kudos

Depends on what you're looking for from a management perspective, but one option is the Account API, which allows deploying/updating/configuring multiple workspaces in a given E2 account. Use this API to programmatically deploy, update, and delete works...
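A hedged sketch of listing workspaces with that API from Python; the endpoint path, auth scheme, and response fields are assumptions, so verify them against the Account API docs before use:

import requests

ACCOUNT_ID = "<account-id>"  # placeholder
BASE = "https://accounts.cloud.databricks.com"

# Assumed endpoint and basic-auth scheme for listing workspaces in an E2 account
resp = requests.get(
    f"{BASE}/api/2.0/accounts/{ACCOUNT_ID}/workspaces",
    auth=("<account-admin-user>", "<password>"),  # placeholders
)
resp.raise_for_status()
for ws in resp.json():
    print(ws.get("workspace_name"), ws.get("workspace_status"))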

brickster_2018
by Esteemed Contributor
  • 1145 Views
  • 1 replies
  • 0 kudos
Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

# Query the Azure Instance Metadata Service for the VM's tags and extract the value of the "Creator" tag
curl -H Metadata:true --noproxy "*" "http://169.254.169.254/metadata/instance?api-version=2020-09-01" | jq '.compute.tagsList[] | select(.name=="Creator") | .value'

aladda
by Honored Contributor II
  • 2649 Views
  • 1 replies
  • 1 kudos

Why do Databricks deployments require 2 subnets for each workspace

Databricks must have access to at least two subnets for each workspace, with each subnet in a different availability zone per docs here

Latest Reply
aladda
Honored Contributor II
  • 1 kudos

This is designed for optimal user experience and as a capacity-planning strategy: if instances are not available in one AZ, the subnet in the other AZ can be used to deploy instances instead.

brickster_2018
by Esteemed Contributor
  • 1907 Views
  • 1 replies
  • 1 kudos
Latest Reply
brickster_2018
Esteemed Contributor
  • 1 kudos

Find the DriverDaemon:
%sh jps
Take the heap dump:
%sh jmap -dump:live,format=b,file=pbs_worker_DriverDaemon.hprof 2413
Copy out to download:
%sh cp pbs_worker_DriverDaemon.hprof /dbfs/FileStore/pbs_worker_04-30-2021T15-50-00.hprof

User16826992666
by Valued Contributor
  • 4655 Views
  • 1 replies
  • 0 kudos

Resolved! When using MLflow should I use log_model or save_model?

They seem to have similar functions. What is the recommended pattern here?

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

mlflow.<model-type>.log_model(model, ...) saves the model to the MLflow tracking server. mlflow.<model-type>.save_model(model, modelpath) saves the model locally, for example to a DBFS path. More details at https://docs.databricks.com/applications/mlflow/models...
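A minimal sketch of the difference using scikit-learn; the model and the save path are placeholders:

import mlflow
import mlflow.sklearn
from sklearn.linear_model import LinearRegression

model = LinearRegression().fit([[0.0], [1.0]], [0.0, 1.0])  # toy model for illustration

# log_model records the model as an artifact of the active run on the tracking server
with mlflow.start_run():
    mlflow.sklearn.log_model(model, "model")

# save_model writes the model to a path you choose (here a hypothetical DBFS path)
mlflow.sklearn.save_model(model, "/dbfs/tmp/my_model")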

Anonymous
by Not applicable
  • 1484 Views
  • 2 replies
  • 2 kudos

Resolved! Spot instances - Best practice

We are having difficulties running our jobs with spot instances that get re-claimed by AWS during shuffles. Do we have any documentation / best-practices around this? We went through this article but is there anything else to keep in mind?

Latest Reply
User16783853906
Contributor III
  • 2 kudos

Due to recent changes in the AWS spot marketplace, legacy techniques like a higher spot bid price (>100%) are ineffective at retaining acquired spot nodes, and the instances can be lost with two minutes' notice, causing workloads to fail. To mitigate this, w...
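The reply above is cut off; one common mitigation (an assumption here, not necessarily what the author goes on to recommend) is to keep some nodes on-demand and use spot-with-fallback, e.g. in the cluster spec sent to the Clusters API:

# Hedged sketch of AWS cluster attributes for the Clusters API; values are illustrative only
cluster_spec = {
    "num_workers": 8,
    "aws_attributes": {
        "first_on_demand": 1,                  # keep the first node(s) on on-demand capacity
        "availability": "SPOT_WITH_FALLBACK",  # fall back to on-demand when spot capacity is unavailable
        "spot_bid_price_percent": 100,
    },
}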

1 More Replies
Ryan_Chynoweth
by Honored Contributor III
  • 693 Views
  • 1 replies
  • 0 kudos
Latest Reply
Ryan_Chynoweth
Honored Contributor III
  • 0 kudos

No, each table must be defined once. You can use UNION if you need to combine multiple inputs to create a table. Adding or removing a UNION from an incremental table is a breaking operation that requires a full refresh.
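A minimal sketch of combining two inputs into one table with a union, meant to run inside a DLT pipeline; the upstream live tables below are hypothetical:

import dlt

@dlt.table
def combined_events():
    # events_a and events_b are placeholder upstream tables defined elsewhere in the pipeline
    a = dlt.read("events_a")
    b = dlt.read("events_b")
    return a.unionByName(b)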

User16826992666
by Valued Contributor
  • 796 Views
  • 1 replies
  • 0 kudos

Where can I find the tables I created in my Delta Live Tables pipeline?

I created several tables in my DLT pipeline but didn't specify a location to save them on creation. The pipeline seems to have run, but I don't know where the tables actually are. How can I find them?

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

Check out the storage configuration under settings. If you didn't specify the storage setting, the system defaults to a location under dbfs:/pipelines/
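For example, you could browse that default location from a notebook; the pipeline id is a placeholder and the exact directory layout is an assumption:

# List what the pipeline wrote under its default storage root (runs in a Databricks notebook)
display(dbutils.fs.ls("dbfs:/pipelines/<pipeline-id>/"))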

User16826987838
by Contributor
  • 910 Views
  • 1 replies
  • 0 kudos
Latest Reply
Ryan_Chynoweth
Honored Contributor III
  • 0 kudos

Yes, in your write stream you can save it as a table in Delta format without a problem. In DBR 8, the default table format is Delta. See this code; please note that the "..." is supplied to show that additional options may be required: df.writeSt...
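The snippet above is truncated; a minimal sketch of such a write, with a placeholder table name and checkpoint path:

# Stream df into a managed table; on DBR 8+ new tables default to the Delta format
(df.writeStream
   .format("delta")
   .option("checkpointLocation", "/tmp/checkpoints/events")  # placeholder checkpoint path
   .toTable("events"))                                       # placeholder table name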

User16826992666
by Valued Contributor
  • 1757 Views
  • 1 replies
  • 0 kudos

When using Delta Live Tables, how do I set a table to be incremental vs complete using Python?

When using SQL, I can use the Create Live Table command and the Create Incremental Live Table command to set the run type I want the table to use. But I don't seem to have that same syntax for Python. How can I set this table type while using Python?

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

The documentation at https://docs.databricks.com/data-engineering/delta-live-tables/delta-live-tables-user-guide.html#mixing-complete-tables-and-incremental-tables has an example: the first two functions load data incrementally and the last one loads...
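A minimal sketch of the pattern; the source path and column name are placeholders. The incremental table reads a stream, while the complete table is recomputed from its input on each update:

import dlt

# Incremental: processes only newly arriving data
@dlt.table
def raw_events():
    return spark.readStream.format("json").load("/data/events")  # placeholder source path

# Complete: fully recomputed from its input on every pipeline update
@dlt.table
def event_counts():
    return dlt.read("raw_events").groupBy("event_type").count()  # placeholder column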

brickster_2018
by Esteemed Contributor
  • 5909 Views
  • 1 replies
  • 0 kudos
Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

By default, only 10 MB of data can be broadcast. spark.sql.autoBroadcastJoinThreshold can be increased up to 8 GB. There is an upper limit in terms of records as well: we can't broadcast more than 512 million records. So it's either 512 million records or 8 GB, which...
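For example, the threshold can be raised (the value is in bytes, and -1 disables auto-broadcast), or a broadcast can be forced with a hint; the DataFrames and join key below are placeholders:

from pyspark.sql.functions import broadcast

# Raise the auto-broadcast threshold from the ~10 MB default to 100 MB
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", str(100 * 1024 * 1024))

# Force the smaller side to be broadcast regardless of the threshold
joined = large_df.join(broadcast(small_df), "id")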

User16826992666
by Valued Contributor
  • 1621 Views
  • 1 replies
  • 0 kudos

Is it possible to disable the maintenance job associated with a Delta Live Table?

After creating my Delta Live Table and running it once, I notice that the maintenance job that was created along with it continues to run at the scheduled time. I have not made any updates to the DLT, so the maintenance job theoretically shouldn't ha...

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

You could change the table properties of the associated tables to disable automatic scheduled optimizations. More details at https://docs.databricks.com/data-engineering/delta-live-tables/delta-live-tables-language-ref.html#table-properties
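A hedged sketch of what this might look like in the Python interface; the property name below is my assumption of the relevant setting, so confirm it against the table-properties reference linked above:

import dlt

@dlt.table(
    # Assumed property for turning off scheduled auto-optimization; verify in the docs
    table_properties={"pipelines.autoOptimize.managed": "false"}
)
def my_table():
    return dlt.read("upstream_table")  # placeholder upstream live table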
