Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

xiaozy
by New Contributor
  • 1020 Views
  • 2 replies
  • 1 kudos
Latest Reply
Prabakar
Esteemed Contributor III
  • 1 kudos

Hi @xiaojun wang, please check the blog and let us know if this helps you: https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html

1 More Reply
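
The linked post covers window functions in Spark SQL; for quick reference, a minimal PySpark sketch of the pattern it describes (the data and column names are purely illustrative):

```
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

# Illustrative data: revenue per product within a category
df = spark.createDataFrame(
    [("thin", "cell phone", 6000), ("ultra thin", "cell phone", 5000),
     ("mini", "tablet", 5500), ("normal", "tablet", 1500)],
    ["product", "category", "revenue"],
)

# Rank products by revenue within each category using a window function
w = Window.partitionBy("category").orderBy(F.desc("revenue"))
df.withColumn("rank", F.dense_rank().over(w)).show()
```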
dbu_spark
by New Contributor III
  • 5024 Views
  • 10 replies
  • 6 kudos

Older Spark Version loaded into the spark notebook

I have the Databricks runtime for a job set to the latest 10.0 Beta (includes Apache Spark 3.2.0, Scala 2.12). In the notebook, when I check the Spark version, I see version 3.1.0 instead of version 3.2.0. I need Spark version 3.2 to process workloads a...

Latest Reply
jose_gonzalez
Moderator
  • 6 kudos

Hi @Dhaivat Upadhyay, good news: DBR 10 was released yesterday, October 20th. You can find more details on the release notes website.

9 More Replies
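
For reference, a quick way to confirm which Spark version the attached cluster is actually running. In a Databricks notebook the spark session is already defined; the builder call below is only there to keep the snippet self-contained:

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Prints the Spark version of the current session, e.g. "3.2.0" on DBR 10.0
print(spark.version)
```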
D3nnisd
by New Contributor III
  • 11212 Views
  • 15 replies
  • 6 kudos

Resolved! BufferHolder Exceeded on Json flattening

On Databricks, we use the following code to flatten JSON in Python. The data is from a REST API: ```df = spark.read.format("json").option("header", "true").option("multiline", "true").load(SourceFileFolder + sourcetable + "*.json") df2 = df.select(psf...

Latest Reply
Dan_Z
Honored Contributor
  • 6 kudos

@Dennis D, what's happening here is that more than 2 GB (2,147,483,648 bytes) is being loaded into a single column value. This is a hard limit for serialization. This KB article addresses it. The solution would be to find some way to have this loaded ...

14 More Replies
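
For context, a minimal sketch of the flattening pattern discussed in this thread, with an illustrative source path and illustrative field names (the actual schema is not shown above; note also that the "header" option in the original snippet applies to CSV, not JSON):

```
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Illustrative source path; the thread builds it from SourceFileFolder + sourcetable
df = (spark.read.format("json")
      .option("multiline", "true")
      .load("/mnt/source/mytable*.json"))

# Flatten nested structs and explode arrays into separate rows/columns,
# so no single column value has to carry the whole nested payload.
flat = (df
        .select("id", F.explode("items").alias("item"))
        .select("id", "item.*"))
flat.show()
```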
Erik
by Valued Contributor II
  • 1277 Views
  • 4 replies
  • 3 kudos

Feature request: It is possible to add comments to both Databricks SQL databases and tables. It would be really useful if these comments could show u...

Feature request: It is possible to add comments to both Databricks SQL databases and tables. It would be really useful if these comments could show up (if they are provided) in Power BI when one connects to the Databricks SQL endpoint, e.g. in this w...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

Nice idea!

3 More Replies
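
For reference, the comments the request refers to are set with standard DDL; a minimal sketch via spark.sql, where the table name, column names, and comment text are all illustrative. Whether they then surface in Power BI depends on the connector, which is exactly what the feature request asks for:

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Comments can be attached at table and column level when creating the table
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_orders (
        order_id BIGINT COMMENT 'Primary key',
        amount   DOUBLE COMMENT 'Order total in EUR'
    )
    USING DELTA
    COMMENT 'One row per customer order'
""")
```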
tarente
by New Contributor III
  • 2298 Views
  • 6 replies
  • 5 kudos

Resolved! How to implement the where-not-exists pattern in Scala?

I have a dataframe with the following columns: Key1, Key2, Y_N_Col, Col1, Col2. For the key tuple (Key1, Key2), I have rows with Y_N_Col = "Y" and Y_N_Col = "N". I need a new dataframe with all rows with Y_N_Col = "Y" (regardless of the key tuple), plus all Y_N_...

Latest Reply
-werners-
Esteemed Contributor III
  • 5 kudos

I'd use a left-anti join. So create a df with all the Y rows, then create a df with all the N rows and do a left_anti join (on Key1 and Key2) against the df with the Y rows, then union those two.

5 More Replies
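
The thread asks for Scala, but the DataFrame API is the same; a minimal PySpark sketch of the left-anti-join-plus-union approach described above (the sample rows are illustrative):

```
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Illustrative rows with the columns described in the question
df = spark.createDataFrame(
    [(1, "a", "Y", 10), (1, "a", "N", 20), (2, "b", "N", 30)],
    ["Key1", "Key2", "Y_N_Col", "Col1"],
)

ys = df.filter(F.col("Y_N_Col") == "Y")
ns = df.filter(F.col("Y_N_Col") == "N")

# Keep N rows only where no Y row exists for the same (Key1, Key2),
# then add back all the Y rows.
result = ys.unionByName(ns.join(ys, on=["Key1", "Key2"], how="left_anti"))
result.show()
```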
Programming_Sch
by New Contributor
  • 310 Views
  • 0 replies
  • 0 kudos


What is the future of AWS? The future of AWS is very promising. So, if you are thinking of a cloud career or want to switch your position to something related to the cloud, I would highly recommend going for AWS training. No matter what field you ...

xiaozy
by New Contributor
  • 1648 Views
  • 1 reply
  • 0 kudos
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @xiaozy! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first; otherwise, I will get back to you soon. Thanks.

User16826992666
by Valued Contributor
  • 831 Views
  • 1 reply
  • 0 kudos

If data from a Delta table is cached in Databricks SQL and the table is altered in the backend, does it invalidate the cache?

Basically I'm worried about the scenario where data that gets cached on Databricks SQL endpoints becomes out of sync with the source Delta table. If that were to happen and data was read from the cache, it would be out of date/incorrect. Is this a con...

Latest Reply
mathan_pillai
Valued Contributor
  • 0 kudos

There are three types of caching: (1) Databricks SQL UI caching, (2) query result caching, and (3) Delta caching. (1) does not get invalidated; like a BI dashboard, it needs to be refreshed manually. (2) and (3) are invalidated automatically. Please check...

nlee
by New Contributor
  • 2387 Views
  • 1 reply
  • 1 kudos

Resolved! How to create a temporary file with sql

What are the commands to create a temporary file with SQL?

Latest Reply
mathan_pillai
Valued Contributor
  • 1 kudos

In Spark SQL, you could use commands like INSERT OVERWRITE DIRECTORY, which indirectly creates a temporary file with the data: https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-syntax-dml-insert-overwrite-directory.html#example...

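
For reference, a minimal sketch of the INSERT OVERWRITE DIRECTORY command the reply links to, run through spark.sql (the output path and the inline VALUES are illustrative):

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Writes the query result out as Parquet files under the given directory
spark.sql("""
    INSERT OVERWRITE DIRECTORY '/tmp/my_export'
    USING parquet
    SELECT * FROM VALUES (1, 'alice'), (2, 'bob') AS t(id, name)
""")
```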
Sumeet_Dora
by New Contributor II
  • 1417 Views
  • 2 replies
  • 4 kudos

Resolved! Write mode features in BigQuery using a Databricks notebook

Currently, using df.write.format("bigquery"), Databricks only supports append and overwrite modes for writing to BigQuery tables. Does Databricks have any option for executing DML like MERGE INTO BigQuery from Databricks notebooks?

Latest Reply
mathan_pillai
Valued Contributor
  • 4 kudos

@Sumeet Dora, unfortunately there is no direct "merge into" option for writing to BigQuery from a Databricks notebook. You could write to an intermediate Delta table using Delta's MERGE INTO option, then read from the Delta table and pe...

1 More Reply
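
A rough sketch of the workaround described above: MERGE INTO an intermediate Delta table, then overwrite the BigQuery table. All table and bucket names are illustrative, and the BigQuery write options follow the spark-bigquery connector's documented options; check them against your connector version:

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# 1) Merge the incoming batch into an intermediate Delta table
spark.table("staging.incoming_orders").createOrReplaceTempView("updates")
spark.sql("""
    MERGE INTO intermediate.orders AS target
    USING updates AS source
    ON target.order_id = source.order_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

# 2) Push the merged result to BigQuery with overwrite
(spark.table("intermediate.orders")
    .write.format("bigquery")
    .option("table", "my_dataset.orders")
    .option("temporaryGcsBucket", "my-temp-bucket")
    .mode("overwrite")
    .save())
```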
gbrueckl
by Contributor II
  • 4711 Views
  • 10 replies
  • 9 kudos

Slow performance of VACUUM on Azure Data Lake Store Gen2

We need to run VACUUM on one of our biggest tables to free up storage. According to our analysis using VACUUM bigtable DRY RUN, this affects 30M+ files that need to be deleted. If we run the final VACUUM, the file listing takes up to 2h (which is OK) ...

Latest Reply
Deepak_Bhutada
Contributor III
  • 9 kudos

@Gerhard Brueckl, we have seen roughly 80k-120k file deletions per hour in Azure while running VACUUM on Delta tables; the vacuum is simply slower on Azure and S3. It might take some time, as you said, when deleting the files from the Delta path. ...

9 More Replies
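
For reference, a minimal sketch of the dry-run-then-vacuum flow discussed above, via spark.sql (the table name and retention window are illustrative):

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Preview what would be deleted before committing to a multi-hour cleanup
spark.sql("VACUUM big_db.bigtable RETAIN 168 HOURS DRY RUN").show(truncate=False)

# Run the actual cleanup once the dry-run numbers look sane
spark.sql("VACUUM big_db.bigtable RETAIN 168 HOURS")
```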
Erik
by Valued Contributor II
  • 3416 Views
  • 8 replies
  • 2 kudos

Run more concurrent tasks than the number of cores.

We are using the Terraform Databricks provider, which starts a cluster and checks every mount (since there is no mount REST API!). Each mount takes 20 seconds to check, 99.9% of that time is idle waiting, and it starts a job per mount. If w...

Latest Reply
jose_gonzalez
Moderator
  • 2 kudos

Hi @Erik Parmann, it is possible to do, but you might also need to enable dynamic allocation at the cluster level to make sure your settings are applied at cluster creation. You can find more details here. As a best practice, we do not recom...

7 More Replies
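
A small sketch for inspecting the settings the reply mentions. The conf keys are standard Spark properties, but treating them as the right knobs for this use case is an assumption, and on Databricks they have to be set in the cluster's Spark config at creation time rather than from a running notebook:

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Cluster-level Spark config the reply refers to (set at cluster creation), e.g.:
#   spark.dynamicAllocation.enabled true
# Inspect what the running cluster currently uses:
print(spark.conf.get("spark.dynamicAllocation.enabled", "not set"))
print(spark.conf.get("spark.executor.cores", "not set"))
```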
Jon
by New Contributor II
  • 13479 Views
  • 3 replies
  • 5 kudos

How can I use custom python library in Azure Databricks?

I am trying to access functions in my coreapi.py by importing it in the main notebook, but I get the error ModuleNotFoundError: No module named 'coreapi'. I tried uploading the file into the same folder, and I tried creating a Python egg and uploading it...

Latest Reply
-werners-
Esteemed Contributor III
  • 5 kudos

There is also the possibility to use the Repos files functionality: https://databricks.com/blog/2021/10/07/databricks-repos-is-now-generally-available.html

2 More Replies
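
When the helper module lives in a Databricks Repo next to the notebook, a common pattern is to put the repo folder on sys.path and import it like any other module. A sketch with an illustrative path and a hypothetical function name:

```
import sys

# Illustrative path to the repo checkout that contains coreapi.py
sys.path.append("/Workspace/Repos/me@example.com/my-repo")

import coreapi                 # resolves once the folder is on sys.path
coreapi.some_function()        # hypothetical function inside coreapi.py
```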
RKNutalapati
by Valued Contributor
  • 2737 Views
  • 4 replies
  • 4 kudos

Reading and saving BLOB data from Oracle to S3 via Databricks is slow

I am trying to import a table from Oracle which has around 1.3 million rows, and one of the columns is a BLOB; the total size of the data on Oracle is around 250+ GB. Reading and saving to S3 as a Delta table is taking around 60 min. I tried with parallel (200 thread...

Latest Reply
User16829050420
New Contributor III
  • 4 kudos

Hello @Rama Krishna N, we will need to check the task on the Spark UI to validate whether the operation is a read from the Oracle database or a write into S3. The task should show the specific operation in the UI. Also, the active threads on the Spark UI will ...

3 More Replies
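
For context, the usual levers for speeding up a JDBC import like this are partitioned reads and a larger fetch size; a minimal sketch with illustrative connection details (the option names are standard Spark JDBC options):

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Illustrative Oracle connection; tune numPartitions and fetchsize to the source DB
df = (spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1")
      .option("dbtable", "MYSCHEMA.BLOB_TABLE")
      .option("user", "my_user")
      .option("password", "my_password")
      .option("partitionColumn", "ID")   # numeric key to split the read on
      .option("lowerBound", "1")
      .option("upperBound", "1300000")
      .option("numPartitions", "64")
      .option("fetchsize", "1000")
      .load())

# Land it as a Delta table on S3 (path is illustrative)
df.write.format("delta").mode("overwrite").save("s3://my-bucket/blob_table_delta")
```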