Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

brickster_2018
by Databricks Employee
  • 2315 Views
  • 1 reply
  • 0 kudos
Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

curl -H Metadata:true --noproxy "*" "http://169.254.169.254/metadata/instance?api-version=2020-09-01" | jq '.compute.tagsList[] | select(.name=="Creator") | .value'

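The command above queries the Azure Instance Metadata Service (IMDS) for the instance's "Creator" tag. For reference, a rough Python equivalent of the same lookup might look like the sketch below; the endpoint, header, and tag name come from the curl command, everything else is illustrative:

import requests

# Query the Azure Instance Metadata Service from the driver node.
session = requests.Session()
session.trust_env = False  # ignore any configured proxies, like curl's --noproxy "*"

resp = session.get(
    "http://169.254.169.254/metadata/instance",
    headers={"Metadata": "true"},
    params={"api-version": "2020-09-01"},
    timeout=5,
)
resp.raise_for_status()

# Pull the "Creator" tag out of the instance tags, mirroring the jq filter above.
tags = resp.json().get("compute", {}).get("tagsList", [])
creator = next((t["value"] for t in tags if t.get("name") == "Creator"), None)
print(creator)
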
aladda
by Databricks Employee
  • 5524 Views
  • 1 reply
  • 1 kudos

Why do Databricks deployments require 2 subnets for each workspace?

Databricks must have access to at least two subnets for each workspace, with each subnet in a different availability zone, per the docs here.

Latest Reply
aladda
Databricks Employee
  • 1 kudos

This is designed for an optimal user experience and as a capacity-planning strategy: if instances are not available in one AZ, the subnet in the other AZ can be used to deploy instances instead.

brickster_2018
by Databricks Employee
  • 3827 Views
  • 1 reply
  • 1 kudos
Latest Reply
brickster_2018
Databricks Employee
  • 1 kudos

Find the DriverDaemon process ID:
%sh jps

Take the heap dump (2413 is the DriverDaemon PID from jps):
%sh jmap -dump:live,format=b,file=pbs_worker_DriverDaemon.hprof 2413

Copy it out to a download location:
%sh cp pbs_worker_DriverDaemon.hprof /dbfs/FileStore/pbs_worker_04-30-2021T15-50-00.hprof

User16826992666
by Databricks Employee
  • 7475 Views
  • 1 reply
  • 0 kudos

Resolved! When using MLflow should I use log_model or save_model?

They seem to have similar functions. What is the recommended pattern here?

Latest Reply
sajith_appukutt
Databricks Employee
  • 0 kudos

mlflow.<model-type>.log_model(model, ...) saves the model to the MLflow tracking server. mlflow.<model-type>.save_model(model, modelpath) saves the model locally to a DBFS path. More details at https://docs.databricks.com/applications/mlflow/models...

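To make the difference concrete, here is a minimal sketch assuming the scikit-learn flavor; the model, run, and target path are illustrative only:

import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

# log_model: records the model as an artifact of the active MLflow run,
# so it shows up in the tracking server alongside params and metrics.
with mlflow.start_run():
    mlflow.sklearn.log_model(model, artifact_path="model")

# save_model: writes the model to an explicit path you choose, e.g. a DBFS location.
mlflow.sklearn.save_model(model, path="/dbfs/tmp/iris_model")
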
Anonymous
by Not applicable
  • 3948 Views
  • 2 replies
  • 2 kudos

Resolved! Spot instances - Best practice

We are having difficulties running our jobs with spot instances that get reclaimed by AWS during shuffles. Do we have any documentation or best practices around this? We went through this article, but is there anything else to keep in mind?

Latest Reply
User16783853906
Databricks Employee
  • 2 kudos

Due to the recent changes in the AWS spot marketplace, legacy techniques like a higher spot bid price (>100%) are ineffective at retaining acquired spot nodes, and the instances can be lost with 2 minutes' notice, causing workloads to fail. To mitigate this, w...

1 More Replies
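The reply above is truncated. One commonly used mitigation, sketched below against the Clusters API's aws_attributes fields (values are illustrative, and this is not necessarily the exact approach the full reply goes on to describe), is to keep the driver on on-demand capacity and fall back to on-demand when spot capacity cannot be acquired:

# Sketch of a cluster spec for the Clusters/Jobs API; adjust to your workload.
cluster_spec = {
    "spark_version": "8.4.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 8,
    "aws_attributes": {
        "first_on_demand": 1,                  # keep the driver on on-demand capacity
        "availability": "SPOT_WITH_FALLBACK",  # fall back to on-demand if spot is unavailable
        "zone_id": "auto",                     # let Databricks pick an AZ with capacity
        "spot_bid_price_percent": 100,
    },
}
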
Ryan_Chynoweth
by Databricks Employee
  • 1454 Views
  • 1 reply
  • 0 kudos
Latest Reply
Ryan_Chynoweth
Databricks Employee
  • 0 kudos

No, each table must be defined once. You can use UNION if you need to combine multiple inputs to create a table. Adding or removing UNION from an incremental table is a breaking operation that requires a full refresh.

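A minimal Python sketch of that pattern, assuming two hypothetical Auto Loader sources feeding a single table that is defined exactly once:

import dlt

@dlt.table(name="orders_us")
def orders_us():
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/data/orders/us"))   # hypothetical source path

@dlt.table(name="orders_eu")
def orders_eu():
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/data/orders/eu"))   # hypothetical source path

# The combined table is defined only once; its two inputs are merged with a UNION.
@dlt.table(name="orders_all")
def orders_all():
    return dlt.read_stream("orders_us").unionByName(dlt.read_stream("orders_eu"))
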
User16826992666
by Databricks Employee
  • 1578 Views
  • 1 reply
  • 0 kudos

Where can I find the tables I created in my Delta Live Tables pipeline?

I created several tables in my DLT pipeline but didn't specify a location to save them on creation. The pipeline seems to have run, but I don't know where the tables actually are. How can I find them?

Latest Reply
sajith_appukutt
Databricks Employee
  • 0 kudos

Check the storage configuration under the pipeline settings. If you didn't specify the storage setting, the system defaults to a location in dbfs:/pipelines/.

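For example, from a notebook you could list that default location. The pipeline id below is a placeholder to look up in the pipeline's settings, and the exact folder layout under it (such as a tables directory) may differ by release:

# List pipelines that used the default storage location.
display(dbutils.fs.ls("dbfs:/pipelines/"))

# Inspect the data written by one pipeline (replace <pipeline-id> with the real id).
display(dbutils.fs.ls("dbfs:/pipelines/<pipeline-id>/tables"))
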
User16826987838
by Databricks Employee
  • 1841 Views
  • 1 reply
  • 0 kudos
Latest Reply
Ryan_Chynoweth
Databricks Employee
  • 0 kudos

Yes, in your write stream you can save it as a table in the Delta format without a problem. In DBR 8, the default table format is Delta. See this code; note that the "..." is supplied to show that additional options may be required: df.writeSt...

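A minimal sketch of such a streaming write, assuming a streaming DataFrame df and hypothetical checkpoint and table names:

(df.writeStream
   .format("delta")                                          # explicit, though Delta is the default on DBR 8+
   .option("checkpointLocation", "/tmp/checkpoints/events")  # required for streaming writes
   .outputMode("append")
   .toTable("events"))                                       # .toTable needs Spark 3.1+/DBR 8+; older runtimes can use .start("/path")
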
User16826992666
by Databricks Employee
  • 2517 Views
  • 1 reply
  • 0 kudos

When using Delta Live Tables, how do I set a table to be incremental vs complete using Python?

When using SQL, I can use the Create Live Table command and the Create Incremental Live Table command to set the run type I want the table to use. But I don't seem to have that same syntax for Python. How can I set this table type while using Python?

Latest Reply
sajith_appukutt
Databricks Employee
  • 0 kudos

The documentation at https://docs.databricks.com/data-engineering/delta-live-tables/delta-live-tables-user-guide.html#mixing-complete-tables-and-incremental-tables has an example: the first two functions load data incrementally and the last one loads...

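In outline, that pattern looks like the sketch below, assuming a hypothetical raw_events table defined elsewhere in the pipeline: the table that reads with read_stream() is processed incrementally, while the one that reads with read() is recomputed as a complete table.

import dlt

@dlt.table(name="events_incremental")
def events_incremental():
    return dlt.read_stream("raw_events")   # streaming read -> incremental processing

@dlt.table(name="events_complete")
def events_complete():
    return dlt.read("raw_events")          # batch read -> complete recomputation on each update
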
User16826992666
by Databricks Employee
  • 2838 Views
  • 1 reply
  • 0 kudos

Is it possible to disable the maintenance job associated with a Delta Live Table?

After creating my Delta Live Table and running it once, I notice that the maintenance job that was created along with it continues to run at the scheduled time. I have not made any updates to the DLT, so the maintenance job theoretically shouldn't ha...

Latest Reply
sajith_appukutt
Databricks Employee
  • 0 kudos

You could change the table properties of the associated tables to disable automatic scheduled optimizations. More details at https://docs.databricks.com/data-engineering/delta-live-tables/delta-live-tables-language-ref.html#table-properties

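A minimal sketch of setting such a property in Python; the property key below is my assumption of the relevant key and should be confirmed against the linked table-properties reference:

import dlt

@dlt.table(
    name="events",
    table_properties={"pipelines.autoOptimize.managed": "false"},  # opt out of scheduled optimization (confirm key in the docs)
)
def events():
    return dlt.read("raw_events")   # hypothetical upstream table
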
User16790091296
by Databricks Employee
  • 3899 Views
  • 1 reply
  • 1 kudos

Secrets in Databricks

I created a secret on Databricks using the Secrets API. Code:
Scope_name = {"scope": "dbtest", "initial_manage_principal": "user"}
Resp = requests.post('https://instancename.net/mynoteid/api/2.0/secrets/scopes/create', json=Scope_name)
In a similar way, I adde...

Latest Reply
aladda
Databricks Employee
  • 1 kudos

You'll have to specify the scope and the key in the format below to get the value: dbutils.secrets.get(scope="dbtest", key="user"). It's probably a good idea to review the Secret Management documentation for details on how to get this set up the right way - ...

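Putting the two halves together, a minimal sketch of creating the scope and secret over the REST API and then reading the value back in a notebook; the host, token, and values are placeholders:

import requests

host = "https://<databricks-instance>"
headers = {"Authorization": "Bearer <personal-access-token>"}

# Create the scope, then store a secret in it.
requests.post(f"{host}/api/2.0/secrets/scopes/create", headers=headers,
              json={"scope": "dbtest", "initial_manage_principal": "users"})
requests.post(f"{host}/api/2.0/secrets/put", headers=headers,
              json={"scope": "dbtest", "key": "user", "string_value": "s3cret"})

# Inside a Databricks notebook, the value is then retrieved (and redacted in output) with:
value = dbutils.secrets.get(scope="dbtest", key="user")
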
User16826992666
by Databricks Employee
  • 2680 Views
  • 1 reply
  • 0 kudos

Resolved! MLflow Model Serving latency expectations

What kind of latency should I expect when using the built-in model serving capability in MLflow? Evaluating whether it would be a good fit for our use case.

Latest Reply
sajith_appukutt
Databricks Employee
  • 0 kudos

What are your throughput requirements, in addition to latency? Currently this is in private preview, and Databricks recommends it only for low-throughput and non-critical applications. However, as it moves towards GA, this would change. Please get in...

brickster_2018
by Databricks Employee
  • 1722 Views
  • 1 reply
  • 1 kudos
Latest Reply
brickster_2018
Databricks Employee
  • 1 kudos

val oldestVersionAvailable =
val newestVersionAvailable =
val pathToDeltaTable = ""
val pathToFileName = ""
(oldestVersionAvailable to newestVersionAvailable).map { version =>
  var df1 = spark.read.json(f"$pathToDeltaTable/_delta_log/$version%0...

User16826992666
by Databricks Employee
  • 3851 Views
  • 1 reply
  • 1 kudos

Trying to write my dataframe out as a tab separated .txt file but getting an error

When I try to save my file I get org.apache.spark.sql.AnalysisException: Text data source supports only a single column, and you have 2 columns. Is there any way to save a dataframe with more than one column to a .txt file?

Latest Reply
sajith_appukutt
Databricks Employee
  • 1 kudos

Would pyspark.sql.DataFrameWriter.csv work? You could specify the separator (sep) as tab: df.write.csv(os.path.join(tempfile.mkdtemp(), 'data'))

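A minimal sketch of that suggestion, assuming a multi-column DataFrame df and a hypothetical output directory:

import os
import tempfile

out_dir = os.path.join(tempfile.mkdtemp(), "data")

# Write tab-separated files via the CSV writer, sidestepping the single-column
# limitation of the text data source.
df.write.csv(out_dir, sep="\t", header=True)

# Equivalent option-style form:
# df.write.option("sep", "\t").option("header", "true").csv(out_dir)
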
