Data Engineering

Forum Posts

Sorted by:

Start a conversation

by brickster_2018 • Databricks Employee

06-25-2021 3:28:28 PM

2315 Views
1 replies
0 kudos

How to get Get tag values in Azure VM using metadata endpoint

Data Engineering

2315 Views
1 replies
0 kudos

06-25-2021 3:28:28 PM

View Replies

Latest Reply

brickster_2018
Databricks Employee

06-25-2021 3:28:56 PM

0 kudos

curl -H Metadata:true --noproxy "*" "http://169.254.169.254/metadata/instance?api-version=2020-09-01" | jq '.compute.tagsList[] | select(.name=="Creator") | .value'

0 kudos

06-25-2021 3:28:56 PM

by aladda • Databricks Employee

06-25-2021 3:25:17 PM

5524 Views
1 replies
1 kudos

Why do Databricks deployments require 2 subnets for each workspace

Databricks must have access to at least two subnets for each workspace, with each subnet in a different availability zone per docs here

Data Engineering

5524 Views
1 replies
1 kudos

06-25-2021 3:25:17 PM

View Replies

Latest Reply

aladda
Databricks Employee

06-25-2021 3:27:21 PM

1 kudos

This is designed for optimal user experience and as a capacity planning strategy where if instances are not available in one AZ, the other subnet in a different AZ can be used to deploy instance from instead

1 kudos

06-25-2021 3:27:21 PM

by brickster_2018 • Databricks Employee

06-25-2021 3:24:25 PM

3827 Views
1 replies
1 kudos

Resolved! How to capture the heap dump of the Spark driver JVM

Data Engineering

3827 Views
1 replies
1 kudos

06-25-2021 3:24:25 PM

View Replies

Latest Reply

brickster_2018
Databricks Employee

06-25-2021 3:26:29 PM

1 kudos

Find the DriverDaemon%sh jpsTake the heap dump%sh jmap -dump:live,format=b,file=pbs_worker_DriverDaemon.hprof 2413Copy out to download%sh cp pbs_worker_DriverDaemon.hprof /dbfs/FileStore/pbs_worker_04-30-2021T15-50-00.hprof

1 kudos

06-25-2021 3:26:29 PM

by User16826992666 • Databricks Employee

06-25-2021 10:42:17 AM

7475 Views
1 replies
0 kudos

Resolved! When using MLflow should I use log_model or save_model?

They seem to have similar functions. What is the recommended pattern here?

Data Engineering

7475 Views
1 replies
0 kudos

06-25-2021 10:42:17 AM

View Replies

Latest Reply

sajith_appukutt
Databricks Employee

06-25-2021 3:26:16 PM

0 kudos

mlflow.<model-type>.log_model(model, ...) saves the model to the MLflow tracking server. mlflow.<model-type>.save_model(model, modelpath) saved the model locally to a DBFS path.More details at https://docs.databricks.com/applications/mlflow/models...

0 kudos

06-25-2021 3:26:16 PM

by User16790091296 • Databricks Employee

06-25-2021 3:24:31 PM

891 Views
0 replies
0 kudos

Where can I get updated information on databricks features upcoming on GCP?

Data Engineering

891 Views
0 replies
0 kudos

06-25-2021 3:24:31 PM

by Anonymous • Not applicable

06-14-2021 2:26:10 PM

3948 Views
2 replies
2 kudos

Resolved! Spot instances - Best practice

We are having difficulties running our jobs with spot instances that get re-claimed by AWS during shuffles. Do we have any documentation / best-practices around this? We went through this article but is there anything else to keep in mind?

Data Engineering

3948 Views
2 replies
2 kudos

06-14-2021 2:26:10 PM

View Replies

Latest Reply

User16783853906
Databricks Employee

06-25-2021 3:08:38 PM

2 kudos

Due to the recent changes in AWS spot market place , legacy techniques like higher spot bid price (>100%) are ineffective to retain the acquired spot node and the instances can be lost in 2 minutes notice causing workloads to fail.To mitigate this, w...

2 kudos

06-25-2021 3:08:38 PM

1 More Replies

by Ryan_Chynoweth • Databricks Employee

06-25-2021 3:06:13 PM

1454 Views
1 replies
0 kudos

Can I have multiple queries in a pipeline writing to the same target table?

Data Engineering

1454 Views
1 replies
0 kudos

06-25-2021 3:06:13 PM

View Replies

Latest Reply

Ryan_Chynoweth
Databricks Employee

06-25-2021 3:06:35 PM

0 kudos

No, each table must be defined once. You can Use UNION If you need to combine multiple inputs to create a table. Adding or removing UNION from an incremental table is a breaking operation that requires a full-refresh.

0 kudos

06-25-2021 3:06:35 PM

by User16826992666 • Databricks Employee

06-25-2021 12:02:53 PM

1578 Views
1 replies
0 kudos

Where can I find the tables I created in my Delta Live Tables pipeline?

I created several tables in my DLT pipeline but didn't specify a location to save them on creation. The pipleline seems to have ran, but I don't know where the tables actually are. How can I find them?

Data Engineering

1578 Views
1 replies
0 kudos

06-25-2021 12:02:53 PM

View Replies

Latest Reply

sajith_appukutt
Databricks Employee

06-25-2021 3:04:17 PM

0 kudos

Checkout the configuration storage under settings . If you didn't specify the storage setting, the system will default to a location in dbfs:/pipelines/

0 kudos

06-25-2021 3:04:17 PM

by User16826987838 • Databricks Employee

06-25-2021 12:58:21 PM

1841 Views
1 replies
0 kudos

is there a way to write streaming data to delta and create a table in spark catalog at the same time?

Data Engineering

1841 Views
1 replies
0 kudos

06-25-2021 12:58:21 PM

View Replies

Latest Reply

Ryan_Chynoweth
Databricks Employee

06-25-2021 3:03:34 PM

0 kudos

Yes, in your write stream you can save it as a table in the delta format without a problem. In DBR 8, the default table format is delta. See this code, please note that the "..." is supplied to show that additional options may be required: df.writeSt...

0 kudos

06-25-2021 3:03:34 PM

by User16826992666 • Databricks Employee

06-25-2021 11:59:12 AM

2517 Views
1 replies
0 kudos

When using Delta Live Tables, how do I set a table to be incremental vs complete using Python?

When using SQL, I can use the Create Live Table command and the Create Incremental Live Table command to set the run type I want the table to use. But I don't seem to have that same syntax for python. How can I set this table type while using Python?

Data Engineering

2517 Views
1 replies
0 kudos

06-25-2021 11:59:12 AM

View Replies

Latest Reply

sajith_appukutt
Databricks Employee

06-25-2021 3:00:20 PM

0 kudos

The documentation at https://docs.databricks.com/data-engineering/delta-live-tables/delta-live-tables-user-guide.html#mixing-complete-tables-and-incremental-tables has an example the first two functions load data incrementally and the last one loads...

0 kudos

06-25-2021 3:00:20 PM

by User16826992666 • Databricks Employee

06-25-2021 12:24:14 PM

2838 Views
1 replies
0 kudos

Is it possible to disable the maintenance job associated with a Delta Live Table?

After creating my Delta Live Table and running it once, I notice that the maintenance job that was created along with it continues to run at the scheduled time. I have not made any updated to the DLT, so the maintenance job theoretically shouldn't ha...

Data Engineering

2838 Views
1 replies
0 kudos

06-25-2021 12:24:14 PM

View Replies

Latest Reply

sajith_appukutt
Databricks Employee

06-25-2021 2:51:42 PM

0 kudos

You could change the table properties of the associated tables to disable automatic scheduled optimizations. More details at https://docs.databricks.com/data-engineering/delta-live-tables/delta-live-tables-language-ref.html#table-properties

0 kudos

06-25-2021 2:51:42 PM

by User16790091296 • Databricks Employee

06-24-2021 8:40:07 AM

3899 Views
1 replies
1 kudos

Secrets in databricks

I created a secret on databricks using the secrets API.Code :Scope_name : {"scope": "dbtest", "initial_manage_principal":"user"} Resp= requests.post('https://instancename.net/mynoteid/api/2.0/secrets/scopes/create',json=Scope_name)Similar way, I adde...

Data Engineering

3899 Views
1 replies
1 kudos

06-24-2021 8:40:07 AM

View Replies

Latest Reply

aladda
Databricks Employee

06-25-2021 2:51:27 PM

1 kudos

You'll have to specify the scope and the key in the format below to get the value. dbutils.secret.get(scope="dbtest", key="user") Probably a good idea to review the Secret Management documentation for details on how to get this setup the right way - ...

1 kudos

06-25-2021 2:51:27 PM

by User16826992666 • Databricks Employee

06-25-2021 1:50:42 PM

2680 Views
1 replies
0 kudos

Resolved! MLflow Model Serving latency expectations

What kind of latency should I expect when using the built in model serving capability in MLflow. Evaluating whether it would be a good fit for our use case

Data Engineering

2680 Views
1 replies
0 kudos

06-25-2021 1:50:42 PM

View Replies

Latest Reply

sajith_appukutt
Databricks Employee

06-25-2021 2:47:58 PM

0 kudos

What are your throughput requirements in addition to latency. Currently this is in private preview and databricks recommends this only for low throughput and non-critical applications. However, as it move towards GA, this would change. Please get in...

0 kudos

06-25-2021 2:47:58 PM

by brickster_2018 • Databricks Employee

06-25-2021 2:45:10 PM

1722 Views
1 replies
1 kudos

Resolved! How to find which delta commit removed a specific file?

Data Engineering

1722 Views
1 replies
1 kudos

06-25-2021 2:45:10 PM

View Replies

Latest Reply

brickster_2018
Databricks Employee

06-25-2021 2:45:34 PM

1 kudos

val oldestVersionAvailable = val newestVersionAvailable = val pathToDeltaTable = "" val pathToFileName = "" (oldestVersionAvailable to newestVersionAvailable).map { version => var df1 = spark.read.json(f"$pathToDeltaTable/_delta_log/$version%0...

1 kudos

06-25-2021 2:45:34 PM

by User16826992666 • Databricks Employee

06-25-2021 2:10:37 PM

3851 Views
1 replies
1 kudos

Trying to write my dataframe out as a tab separated .txt file but getting an error

When I try to save my file I getorg.apache.spark.sql.AnalysisException: Text data source supports only a single column, and you have 2 columns.; Is there any way to save a dataframe with more than one column to a .txt file?

Data Engineering

3851 Views
1 replies
1 kudos

06-25-2021 2:10:37 PM

View Replies

Latest Reply

sajith_appukutt
Databricks Employee

06-25-2021 2:45:01 PM

1 kudos

Would pyspark.sql.DataFrameWriter.csv work? You could specify the separator (sep) as tabdf.write.csv(os.path.join(tempfile.mkdtemp(), 'data'))

1 kudos

06-25-2021 2:45:01 PM

Databricks Community

Forum Posts

How to get Get tag values in Azure VM using metadata endpoint

Why do Databricks deployments require 2 subnets for each workspace

Resolved! How to capture the heap dump of the Spark driver JVM

Resolved! When using MLflow should I use log_model or save_model?

Where can I get updated information on databricks features upcoming on GCP?

Resolved! Spot instances - Best practice

Can I have multiple queries in a pipeline writing to the same target table?

Where can I find the tables I created in my Delta Live Tables pipeline?

is there a way to write streaming data to delta and create a table in spark catalog at the same time?

When using Delta Live Tables, how do I set a table to be incremental vs complete using Python?

Is it possible to disable the maintenance job associated with a Delta Live Table?

Secrets in databricks

Resolved! MLflow Model Serving latency expectations

Resolved! How to find which delta commit removed a specific file?

Trying to write my dataframe out as a tab separated .txt file but getting an error

Join Us as a Local Community Builder!

Warehouse ID specified in job yaml file for sql ta...

Result Difference Between View and Manually Run Vi...

Unable to update DLT-based materialized view if cl...

AutoLoader Type Widening

Serverless Compute Access Restriction Not Supporte...