Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

nameziane
by New Contributor III
  • 14960 Views
  • 4 replies
  • 2 kudos

Set version (VERSION AS OF) dynamically from the return of a subquery

Hello, we have a business request to compare the evolution of a certain Delta table. We would like to compare the latest version of the table with the previous one using Delta time travel. The main issue we are facing is to retrieve programmatically us...

Latest Reply
apingle
Contributor
  • 2 kudos

In the docs it says that "Neither timestamp_expression nor version can be subqueries." So it does sound challenging. I also tried playing with widgets to see if it could be populated using SQL but didn't succeed. With Python it's really easy to do.
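A minimal Python sketch of that workaround (the table name "my_table" is a placeholder, not from the thread): look the latest version up from the table history, then splice it into the time-travel query as a literal.

# `spark` is the notebook's SparkSession
latest = spark.sql("DESCRIBE HISTORY my_table").agg({"version": "max"}).first()[0]
current = spark.sql(f"SELECT * FROM my_table VERSION AS OF {latest}")
previous = spark.sql(f"SELECT * FROM my_table VERSION AS OF {latest - 1}")
current.exceptAll(previous).show()  # rows present now but not in the previous version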

  • 2 kudos
3 More Replies
cmilligan
by Contributor II
  • 2718 Views
  • 3 replies
  • 4 kudos

Resolved! Pass through if a job was run as scheduled or if manual

I have a notebook that sets up parameters for the run based on some job parameters set by the user as well as the current date of the run. I want to supersede some of this logic and just use the manual values if kicked off manually. Is there a way to...

Latest Reply
SS2
Valued Contributor
  • 4 kudos

You can create widgets by using this: dbutils.widgets.text("widgetName", ""). To get the value for that widget: dbutils.widgets.get("widgetName"). So by using this you can manually create widgets (variables) and run the process by giving the desired valu...
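A minimal sketch of that widget approach; the widget name "run_mode" and its values are illustrative assumptions, not from the thread.

# Create the widget with a default that the scheduled job will use;
# a manual run can change the widget value before executing the notebook.
dbutils.widgets.text("run_mode", "scheduled")
run_mode = dbutils.widgets.get("run_mode")

if run_mode == "manual":
    pass  # use the manually supplied parameter values
else:
    pass  # fall back to the date-driven logic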

  • 4 kudos
2 More Replies
rrussell25
by New Contributor
  • 1571 Views
  • 1 reply
  • 0 kudos

Read arguments in a Scala notebook invoked by a job

In a Scala notebook, how do I read input arguments (e.g. those provided by a job that runs a Scala notebook)? In Python, dbutils.notebook.entry_point.getCurrentBindings() works. How about for Scala?

Latest Reply
UmaMahesh1
Honored Contributor III
  • 0 kudos

Hi @Robert Russell You can use dbutils.notebook.getContext.currentRunId in Scala notebooks. Other methods are also available, like dbutils.notebook.getContext.jobGroup, dbutils.notebook.getContext.rootRunId, dbutils.notebook.getContext.tags, etc. You ...

  • 0 kudos
RajibRajib_Mand
by New Contributor III
  • 4345 Views
  • 2 replies
  • 0 kudos

Reading a password-protected Excel (.xlsx) file in Databricks

I want to read a password-protected Excel file and load the data into a Delta table. Can you please let me know how this can be achieved in Databricks?

Latest Reply
igorsalo22
New Contributor II
  • 0 kudos

df = spark.read.format("com.crealytics.spark.excel") \
    .option("dataAddress", "'Base'!A1") \
    .option("header", "true") \
    .option("workbookPassword", "test") \
    .load("test.xlsx")
display(df)
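Note that this snippet relies on the third-party spark-excel connector (Maven coordinates com.crealytics:spark-excel), which is not bundled with Databricks; it has to be installed on the cluster as a library before the code above will run.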

  • 0 kudos
1 More Replies
DK03
by Contributor
  • 2611 Views
  • 2 replies
  • 2 kudos
Latest Reply
UmaMahesh1
Honored Contributor III
  • 2 kudos

As @Werner Stinckens said, it would be ok. But generally decimal column joins are not recommended, as other factors come into play like precision, length, etc. Also, when you are joining on decimal columns, be sure to check out the abs value of...

  • 2 kudos
1 More Replies
fury88
by New Contributor II
  • 2743 Views
  • 1 reply
  • 1 kudos

Does CACHE TABLE/VIEW have a create or replace like view?

I'm trying to cache data/queries that we normally have as temporary views that get replaced when the code is run based on dynamic python. What I'd like to know is will CACHE TABLE get overwritten each time you run it? Is it smart enough to recognize ...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 1 kudos

Hi @Matt Fury Yes... I guess the cache overwrites each time you run it, because for me it took nearly the same amount of time for 1 million records to be cached. However, you can check whether the table is cached or not using the .storageLevel method. E.g. I have...
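A minimal sketch of that check (the view name "my_view" is a placeholder, not from the thread):

# `spark` is the notebook's SparkSession; "my_view" must already exist as a table or temp view
spark.sql("CACHE TABLE my_view")
print(spark.catalog.isCached("my_view"))    # True once the data is cached
print(spark.table("my_view").storageLevel)  # storage level backing the cache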

  • 1 kudos
Durbinar
by New Contributor III
  • 6348 Views
  • 4 replies
  • 4 kudos

Resolved! Azure Databricks Default DNS

My Azure Databricks workspace default DNS is 168.63.129.16. This DNS doesn't seem to resolve Azure storage accounts which were created a year ago; after tweaking the cluster to use 8.8.8.8 I was able to resolve the desired storage accounts. Is there a d...

Latest Reply
Durbinar
New Contributor III
  • 4 kudos

IP address 168.63.129.16 is a virtual public IP address that is used to facilitate a communication channel to Azure platform resources. Customers can define any address space for their private virtual network in Azure. Therefore, the Azure platform...

  • 4 kudos
3 More Replies
200723
by New Contributor II
  • 3185 Views
  • 3 replies
  • 3 kudos

"No SRV records" intermittent error when running Databricks Pyspark to connect Mongo Atlas

My Mongo Atlas connection URL is like mongodb+srv://<srv_hostname>. I don't want to use a direct URL like mongodb://<hostname1, hostname2, hostname3....> because our Mongo Atlas global clusters have many hosts; it would be hard to maintain. Our Java programs...

Latest Reply
Noopur_Nigam
Databricks Employee
  • 3 kudos

Hi @Raymond Lai The issue looks to be on the MongoDB connector side. The connection is created and maintained by the mongo-spark connector. You can try using the direct mongodb hosts in the connection string instead of SRV to avoid doing DNS lookups or...
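A heavily hedged sketch of that suggestion, assuming the mongo-spark connector 10.x is installed on the cluster; the hosts, database, and collection names are placeholders, and option names differ between connector versions.

df = (spark.read.format("mongodb")
      .option("connection.uri", "mongodb://host1:27017,host2:27017")  # direct hosts instead of +srv
      .option("database", "mydb")
      .option("collection", "mycollection")
      .load())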

  • 3 kudos
2 More Replies
Dicer
by Valued Contributor
  • 7965 Views
  • 4 replies
  • 5 kudos

Is it reasonable for the process "Determining the location of DBIO file fragments." to take me 7 hours?

I only have 1000 columns. Each column has 252 rows, so there are only 252,000 data points. How come it can route tasks for the best-cached locality for 7 hours?

Latest Reply
Noopur_Nigam
Databricks Employee
  • 5 kudos

Hi @Cheuk Hin Christophe Poon Have you optimized your table at any time since its creation? If not, then OPTIMIZE may take some time depending on the number of underlying files. Please try to run OPTIMIZE manually as described in the document below: https://docs....
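A minimal sketch of running that manually from a notebook (the table name "my_table" is a placeholder):

spark.sql("OPTIMIZE my_table")                 # compacts small underlying files
spark.sql("DESCRIBE HISTORY my_table").show()  # confirms the OPTIMIZE operation was recorded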

  • 5 kudos
3 More Replies
shrutis23
by New Contributor III
  • 5369 Views
  • 4 replies
  • 4 kudos

How to use Delta Live Tables with Google Cloud Storage

Hi Team, I have been working on a POC exploring Delta Live Tables with a GCS location. I have some doubts: how to access the GCS bucket? We have a connection established using a Databricks service account. In normal cluster creation, we go to the cluster page...

Latest Reply
Senthil1
Contributor
  • 4 kudos

Kindly mount the GCS bucket to a DBFS location; see below: Mounting cloud object storage on Databricks | Databricks on Google Cloud
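A minimal sketch of that mount, assuming the cluster's Google service account already has read access to the bucket; the bucket and mount-point names are placeholders.

bucket_name = "my-gcs-bucket"
mount_name = "my-gcs-mount"
dbutils.fs.mount(f"gs://{bucket_name}", f"/mnt/{mount_name}")
display(dbutils.fs.ls(f"/mnt/{mount_name}"))  # list the bucket contents through the mount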

  • 4 kudos
3 More Replies
SS2
by Valued Contributor
  • 8162 Views
  • 4 replies
  • 3 kudos

Spark out of memory error. You can resolve this error by increasing the size of the cluster in Databricks.


Latest Reply
DK03
Contributor
  • 3 kudos

Adding some more points to @karthik p's answer:
  • Use the Kryo serializer instead of the Java serializer.
  • Use an optimized garbage collector such as G1GC.
  • Use partitioning wisely on a field.
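A minimal sketch of those settings. On Databricks they are normally entered in the cluster's Spark config rather than in code, but the equivalent SparkSession builder form (for a plain Spark job) looks like this; the keys are standard Spark configuration options.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
         .config("spark.executor.extraJavaOptions", "-XX:+UseG1GC")
         .config("spark.driver.extraJavaOptions", "-XX:+UseG1GC")
         .getOrCreate())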

  • 3 kudos
3 More Replies
cchiulan
by New Contributor III
  • 3433 Views
  • 3 replies
  • 7 kudos

Databricks Log4J Custom Appender Not Working as expected

I'm trying to figure out how a custom appender should be configured in a Databricks environment, but I cannot figure it out. When the cluster is running, in `driver logs`, the time is displayed as 'unknown' for my custom log file, and when the cluster is stopped, c...

Latest Reply
Wolf
New Contributor II
  • 7 kudos

We're having the same problem with 11.3 LTS. Are there any updates? We would like to deliver log4j messages from Databricks Notebooks to custom log files and then upload those to S3 or DBFS. Best

  • 7 kudos
2 More Replies
Mado
by Valued Contributor II
  • 43752 Views
  • 3 replies
  • 10 kudos

Resolved! How to get all occurrences of duplicate records in a PySpark DataFrame based on specific columns?

Hi, I need to find all occurrences of duplicate records in a PySpark DataFrame. Following is the sample dataset: # Prepare Data data = [("A", "A", 1), ("A", "A", 2), ("A", "A", 3), ("A", "B", 4), ("A", "B", 5), ("A", "C", ...

Latest Reply
NhatHoang
Valued Contributor II
  • 10 kudos

Hi, in my experience, if you use dropDuplicates(), Spark will keep a random row. Therefore, you should define logic to remove the duplicated rows.
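A minimal PySpark sketch for the original question: keep every row whose key columns appear more than once. The column names follow the sample data but are assumptions.

from pyspark.sql import functions as F, Window

# `spark` is the notebook's SparkSession
data = [("A", "A", 1), ("A", "A", 2), ("A", "A", 3), ("A", "B", 4), ("A", "B", 5)]
df = spark.createDataFrame(data, ["col1", "col2", "col3"])

w = Window.partitionBy("col1", "col2")
duplicates = (df.withColumn("cnt", F.count("*").over(w))
                .filter("cnt > 1")
                .drop("cnt"))
duplicates.show()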

  • 10 kudos
2 More Replies
Shalabh007
by Honored Contributor
  • 5286 Views
  • 5 replies
  • 19 kudos

Practice Exams for Databricks Certified Data Engineer Professional exam

Can anyone help with an official practice exam set for the Databricks Certified Data Engineer Professional exam, like the one below for the Databricks Certified Data Engineer Associate: Practice exam for the Databricks Certified Data Engineer Associate exam

Latest Reply
Nayan7276
Valued Contributor II
  • 19 kudos

Hi @Shalabh Agarwal I am not able to find any official practice paper. It is still not available.

  • 19 kudos
4 More Replies
