Data Engineering

Forum Posts


by Hubert-Dudek • Esteemed Contributor III

03-09-2023 2:25:07 PM

574 Views
1 replies
7 kudos

Starting with Databricks Runtime 12.2 LTS, the explode function can be used in the FROM clause to manipulate data in new and powerful ways. This function takes an array column as input and returns a new row for each element in the array, offering new poss...
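To make the row-per-element behavior concrete, here is a plain-Python sketch of what explode does to a table. It is only an illustration: rows are modeled as dicts, and the `orders` data and `tags` column are hypothetical. In Spark SQL, explode drops rows whose array is null or empty, while explode_outer keeps them with a null element; the `outer` flag mirrors that difference.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional

Row = Dict[str, Any]


def explode_rows(rows: Iterable[Row], column: str, alias: str = "col",
                 outer: bool = False) -> Iterator[Row]:
    """Emit one output row per element of rows[column].

    Rows whose array is None or empty produce no output (like explode);
    with outer=True they produce one row with a None element (like explode_outer).
    """
    for row in rows:
        values: Optional[List[Any]] = row.get(column)
        # Carry every other column through unchanged.
        base = {k: v for k, v in row.items() if k != column}
        if not values:
            if outer:
                yield {**base, alias: None}
            continue
        for value in values:
            yield {**base, alias: value}


# Hypothetical input table with an array column named "tags".
orders = [
    {"order_id": 1, "tags": ["new", "priority"]},
    {"order_id": 2, "tags": []},
    {"order_id": 3, "tags": None},
]

print(list(explode_rows(orders, "tags")))
# [{'order_id': 1, 'col': 'new'}, {'order_id': 1, 'col': 'priority'}]
```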

Latest Reply

Ajay-Pandey
Esteemed Contributor III

03-09-2023 8:08:51 PM

7 kudos

It's very useful for SQL developers.

by Mado • Valued Contributor II

03-08-2023 7:53:06 PM

1916 Views
2 replies
0 kudos

Overwriting the existing table in Databricks; Mechanism and History?

Hi, assume that I have a Delta table stored in an Azure storage account. When new records arrive, I repeat the transformation and overwrite the existing table (DF.write.format("delta").mode("overwrite").option("...

Latest Reply

-werners-
Esteemed Contributor III

03-09-2023 3:57:29 AM

0 kudos

The overwrite adds new files, keeps the old ones, and records in a log which files hold the current data and which hold old data. If the overwrite fails, you get an error message in the Spark program, and the data that was to be overwritten will still be the cur...
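The mechanism described in the reply can be illustrated with a deliberately simplified transaction log: each commit records which data files were added and which were logically removed, nothing is physically deleted, and a failed write never reaches the log. This is a toy model, not Delta Lake's actual log format; the class and file names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Commit:
    version: int
    added: List[str]
    removed: List[str]


@dataclass
class ToyTransactionLog:
    """Simplified model: files are only added; removal is just a log entry."""
    commits: List[Commit] = field(default_factory=list)
    storage: Set[str] = field(default_factory=set)  # files physically present

    def live_files(self, version: int = -1) -> Set[str]:
        """Replay the log up to `version` (default: latest) to find current files."""
        last = len(self.commits) - 1 if version < 0 else version
        files: Set[str] = set()
        for commit in self.commits[: last + 1]:
            files |= set(commit.added)
            files -= set(commit.removed)
        return files

    def overwrite(self, new_files: List[str], fail: bool = False) -> None:
        # New data files are written first; old files stay in storage.
        self.storage |= set(new_files)
        if fail:
            # No commit is recorded, so readers still see the previous version.
            raise RuntimeError("write failed before commit")
        old = sorted(self.live_files())
        self.commits.append(Commit(len(self.commits), list(new_files), old))


log = ToyTransactionLog()
log.overwrite(["part-000.parquet"])     # version 0
log.overwrite(["part-001.parquet"])     # version 1 logically replaces version 0
print(log.live_files())                 # {'part-001.parquet'}
print(log.live_files(version=0))        # {'part-000.parquet'}
try:
    log.overwrite(["part-002.parquet"], fail=True)
except RuntimeError:
    pass
print(log.live_files())                 # {'part-001.parquet'}
```

In real Delta tables, files that are no longer referenced stay in storage until VACUUM removes them after the retention period, which is what keeps older versions available to DESCRIBE HISTORY and time travel.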

by elgeo • Valued Contributor II

03-08-2023 3:29:15 AM

3888 Views
2 replies
0 kudos

Resolved! Convert date to integer

Hello. Is there a way in Databricks SQL to convert a date to an integer? In Db2, there is the DAYS function (DAYS - IBM Documentation). For example, '2023-03-01' is converted to the value 738580. Thank you in advance.

Latest Reply

SergeRielau
Valued Contributor

03-09-2023 9:19:30 AM

0 kudos

Try this:

CREATE OR REPLACE FUNCTION days(dt DATE) RETURN unix_date(dt) - unix_date(DATE'0001-01-01') + 1;
SELECT current_date, days(current_date);

Result: 2023-03-09  738588

I verified this on Db2 for LUW, and it matches.
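The formula counts days so that 0001-01-01 is day 1. As a cross-check outside SQL, here is the same arithmetic in Python, with `unix_date` emulated as the number of days since 1970-01-01. Python's proleptic Gregorian `date.toordinal()` uses the same day-1 origin, so the two agree.

```python
from datetime import date

EPOCH = date(1970, 1, 1)


def unix_date(d: date) -> int:
    """Days since 1970-01-01, like Spark SQL's unix_date()."""
    return (d - EPOCH).days


def days(d: date) -> int:
    """Same arithmetic as the SQL function in the reply: 0001-01-01 is day 1."""
    return unix_date(d) - unix_date(date(1, 1, 1)) + 1


print(days(date(2023, 3, 1)))  # 738580, the value from the question
print(days(date(2023, 3, 9)))  # 738588, the value from the reply
assert days(date(2023, 3, 1)) == date(2023, 3, 1).toordinal()
```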

by griffinw • New Contributor III

06-22-2022 7:43:10 AM

3018 Views
5 replies
3 kudos

Resolved! Unable to import tkinter in notebook

Hello, I am unable to import tkinter (or Tkinter) into a Python notebook. I also tried %pip install tkinter at the top of the notebook. Has anyone else been successful at this, or, if it's impossible, why? Thank you

Latest Reply

ahmedE_
New Contributor II

03-08-2023 11:32:17 AM

3 kudos

Hi @Will Griffin, can you confirm whether this worked for you? I get the message `ERROR: No matching distribution found for python3-tk`.
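Some background that may explain both errors: tkinter ships with CPython itself rather than on PyPI, so `%pip install tkinter` has nothing to install, and `python3-tk` is the name of a Debian/Ubuntu system (apt) package, not a pip package. Even when the module imports, a Tk window needs a display, which headless cluster nodes typically lack. The hypothetical helper below only reports those two conditions; it does not try to open a window.

```python
import os
from typing import Tuple


def tkinter_status() -> Tuple[bool, bool]:
    """Return (module_importable, display_available) for the current process."""
    try:
        import tkinter  # noqa: F401  (stdlib module backed by the _tkinter C extension)
        importable = True
    except ImportError:
        # Typical when the interpreter was built or packaged without Tcl/Tk.
        importable = False
    # On Linux, Tk needs an X display; an unset DISPLAY usually means headless.
    display = bool(os.environ.get("DISPLAY"))
    return importable, display


importable, display = tkinter_status()
print(f"tkinter importable: {importable}, display available: {display}")
```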

by haylee • New Contributor II

02-06-2023 11:01:05 AM

1085 Views
4 replies
0 kudos

I added a secret scope to the Databricks environment, and I get this error when trying to run either of the following:

Commands attempted:
dbutils.secrets.listScopes()
dbutils.secrets.get(scope = "{InsertScope}", key = "{InsertKey}")

Error: "shaded.v245.com.fasterxml.jackson.core.JsonParseException: Unexpected character ('<' (code 60)): expected a valid value (number, ...
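The parser error says the client received a response body starting with `<` where it expected JSON, which usually means an HTML page (for example an error or sign-in page) came back instead of an API response. The thread does not establish the root cause. The sketch below is a hypothetical helper, not part of dbutils; it shows how such a body can be recognized before JSON parsing so the actual content can be inspected.

```python
import json
from typing import Any


def parse_json_body(body: str) -> Any:
    """Parse an API response body, failing clearly if it is markup rather than JSON."""
    stripped = body.lstrip()
    if stripped.startswith("<"):
        # Same situation as Jackson's "Unexpected character ('<' (code 60))".
        snippet = stripped[:80].replace("\n", " ")
        raise ValueError(f"Expected JSON but received markup: {snippet!r}")
    return json.loads(body)


print(parse_json_body('{"scopes": []}'))  # {'scopes': []}
```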

Latest Reply

jose_gonzalez
Moderator

03-01-2023 10:32:57 AM

0 kudos

Hi @Haylee Gaddy, just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.

by thushar • Contributor

01-23-2023 12:41:21 AM

1191 Views
6 replies
0 kudos

'GeneratedAlwaysAs' along with dataframe.write

Is it possible to use a calculated column definition (like generatedAlwaysAs in a Delta table) while writing a DataFrame as Delta files with df.write.format("delta")? Are there any options on the dataframe.write method to achieve this...

Latest Reply

pvignesh92
Honored Contributor

03-09-2023 6:27:58 AM

0 kudos

Hi @Thushar R, this option is not part of the DataFrame write API: the GeneratedAlwaysAs feature applies only to the Delta format, whereas df.write is a common API that handles writes for all formats. If you want to achieve this programmatically, you can still use...

by Dave_Nithio • Contributor

10-14-2022 1:29:07 PM

2011 Views
5 replies
7 kudos

Resolved! Delta Live Table Schema Comment

I predefined my schema for a Delta Live Tables Auto Loader source. This included comments for some attributes. When I perform a standard readStream, my comments appear, but in Delta Live Tables I get no comments. Is there anything I need to do to get comment...

Latest Reply

Kaniz
Community Manager

10-20-2022 8:11:01 AM

7 kudos

Hi @Dave Wilson, we haven't heard from you since the last responses from @Debayan Mukherjee and @Hubert Dudek, and I was checking back to see whether you have a resolution yet. If you have a solution, please share it with the community, as it can be ...

by AmineHY • Contributor

01-11-2023 3:14:37 AM

13893 Views
4 replies
1 kudos

Resolved! How to get rid of "Command result size exceeds limit"

I am working in a Databricks notebook, trying to display a map using Folium, and I keep getting this error: > Command result size exceeds limit: Exceeded 20971520 bytes (current = 20973510). How can I increase the memory limit? I already reduced the...
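The numbers in the error are worth unpacking: 20971520 bytes is exactly 20 MiB (20 × 1024 × 1024), and the reported output was 20973510 bytes, only 1990 bytes over. Below is a minimal sketch for checking a rendered map's size before displaying it, assuming the limit applies to the serialized HTML (an assumption; the thread does not say how the size is measured). With Folium, the HTML string could come from `m.get_root().render()`; the helper names are hypothetical.

```python
LIMIT_BYTES = 20 * 1024 * 1024  # 20971520, the limit quoted in the error


def output_size(html: str) -> int:
    """Size of the HTML in bytes when encoded as UTF-8."""
    return len(html.encode("utf-8"))


def check_output(html: str, limit: int = LIMIT_BYTES) -> int:
    """Return the remaining headroom in bytes; negative means over the limit."""
    return limit - output_size(html)


print(LIMIT_BYTES)             # 20971520
print(LIMIT_BYTES - 20973510)  # -1990: how far over the reported output was
```

If the headroom is negative, saving the map to a file (the save-to-disk option mentioned later in the thread) or reducing the number of plotted points shrinks what the notebook has to return.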

Latest Reply

labromb
Contributor

03-09-2023 1:26:22 AM

1 kudos

Hi, I have the same problem with keplergl, and the save-to-disk option, while helpful, isn't very practical. So how does one plot large datasets in Kepler? Any thoughts are welcome.

by raj123 • New Contributor II

02-16-2023 11:08:47 AM

828 Views
2 replies
3 kudos

Resolved! Data lineage graph not working

I created the tables below, but when I click the lineage graph I can't see the upstream or downstream tables. The + sign disappears after a few seconds, and I can't click it. Is anyone else having this issue? CREATE TABLE IF NOT EXISTS lineage_d...

Latest Reply

Anonymous
Not applicable

03-09-2023 1:19:32 AM

3 kudos

Hi @Raj Sharma, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

by Chilangdon • New Contributor

02-16-2023 10:39:02 AM

797 Views
2 replies
1 kudos

Resolved! How to load multiple xlsx files stored in different folders with the same name pattern in blob storage into a Delta table?

Hi, I have a blob storage container with multiple unzipped folders that share the same naming pattern, for example folder_report_name_01_2023_01_02 -> file_name_2023_01_02.xlsx. I want to load all of this data using pandas or PySpark and insert it into my Delta table. I'm trying to use widget...
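One way to collect every report file before loading is to glob across the folders by name pattern. A minimal sketch, assuming the folders are reachable through a local or mounted path (the `root` directory and helper name are hypothetical) and follow the naming shown in the question; each returned path could then be read with `pandas.read_excel` and appended to the Delta table.

```python
import re
import tempfile
from pathlib import Path
from typing import List, Tuple

# Matches names like file_name_2023_01_02.xlsx and captures the date parts.
FILE_PATTERN = re.compile(r"file_name_(\d{4})_(\d{2})_(\d{2})\.xlsx$")


def find_report_files(root: Path) -> List[Tuple[str, Path]]:
    """Return (ISO date, path) pairs for every matching .xlsx in the report folders."""
    matches = []
    for path in root.glob("folder_report_name_*/*.xlsx"):
        m = FILE_PATTERN.search(path.name)
        if m:
            iso_date = "-".join(m.groups())
            matches.append((iso_date, path))
    return sorted(matches)


# Usage with a throwaway directory that mimics the folder layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for stamp in ("2023_01_02", "2023_01_03"):
        folder = root / f"folder_report_name_01_{stamp}"
        folder.mkdir()
        (folder / f"file_name_{stamp}.xlsx").touch()
    for iso_date, path in find_report_files(root):
        print(iso_date, path.name)
# 2023-01-02 file_name_2023_01_02.xlsx
# 2023-01-03 file_name_2023_01_03.xlsx
```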

Latest Reply

Anonymous
Not applicable

03-09-2023 1:18:59 AM

1 kudos

Hi @Fernando Vázquez, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. T...

by Ajay-Pandey • Esteemed Contributor III

01-28-2023 7:35:19 AM

1127 Views
2 replies
2 kudos

Why does Azure Databricks need to store data in temporary storage in Azure before writing to Synapse?

I was following the tutorial on data transformation with Azure Databricks, and it says that before loading data into Azure Synapse Analytics, the data transformed by Azure Databricks is first saved to temporary storage in Azure Blob Storage before loa...

Latest Reply

Anonymous
Not applicable

03-08-2023 8:08:55 PM

2 kudos

@Ajay Pandey Saving the transformed data to temporary storage in Azure Blob Storage before loading into Azure Synapse Analytics provides a number of benefits to ensure that the data is accurate, optimized, and performs well in the target environmen...

by chhavibansal • New Contributor II

01-17-2023 1:22:22 AM

463 Views
1 replies
0 kudos

What is the upper bound for dataSkippingNumIndexedCols, which controls how many columns have stats kept in the Delta log?

Is there an upper bound on the number I can assign to delta.dataSkippingNumIndexedCols for computing statistics? Is there a benchmark available on the tradeoffs of increasing this number beyond 32?

Latest Reply

Anonymous
Not applicable

03-08-2023 8:21:43 PM

0 kudos

@Chhavi Bansal: The delta.dataSkippingNumIndexedCols configuration property controls the maximum number of columns for which Delta Lake collects the statistics used in data skipping. By default, this value is set to 32. There is no hard upper bound on th...
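To make the setting concrete: Delta Lake collects file-level statistics (such as min/max values) for the first N columns of the table schema, where N is `delta.dataSkippingNumIndexedCols` (default 32; -1 means all columns). The plain-Python sketch below only models which columns of a flat, hypothetical schema fall within that window; nested fields and the real statistics format are not modeled.

```python
from typing import List


def stats_columns(schema: List[str], num_indexed_cols: int = 32) -> List[str]:
    """Columns that would get statistics, in schema order.

    num_indexed_cols mirrors delta.dataSkippingNumIndexedCols: the first N
    columns are indexed, and -1 means every column.
    """
    if num_indexed_cols == -1:
        return list(schema)
    if num_indexed_cols < -1:
        raise ValueError("num_indexed_cols must be -1 or non-negative")
    return schema[:num_indexed_cols]


schema = [f"c{i}" for i in range(40)]  # hypothetical 40-column table
print(len(stats_columns(schema)))      # 32
print(len(stats_columns(schema, -1)))  # 40
print(stats_columns(schema, 3))        # ['c0', 'c1', 'c2']
```

Because only leading columns are indexed, columns that are frequently filtered on benefit from sitting early in the schema. Each extra indexed column adds statistics to every file entry in the log and extra work at write time, which is roughly the tradeoff the question asks about.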

by nounou • New Contributor II

01-13-2023 3:08:07 AM

2759 Views
1 replies
1 kudos

How can I export my dashboard in HTML format using the Databricks API?

Hi everyone, I would like to export my dashboard in HTML format and embed it in the body of my email in order to send it to my team. Here is my Python code for the Databricks API; I got this error, and when I put my HTML in the body of my message i...

Latest Reply

Anonymous
Not applicable

03-08-2023 8:20:36 PM

1 kudos

@mathild noun :import databricks.workspace as workspace_api import requests # set up your Databricks workspace credentials domain = "<your Databricks workspace domain>" token = "<your Databricks API token>" # set up the workspace client workspac...
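For reference, the Workspace API's export endpoint (`GET /api/2.0/workspace/export`) can return a notebook rendered as HTML when called with `format=HTML`, and the response JSON carries the file as base64 in its `content` field. The sketch below is a swapped-in approach using only the Python standard library, not the client library used in the reply; the host, token and notebook path are placeholders, the function names are hypothetical, and whether the exported HTML reproduces a notebook dashboard view is not established in this thread.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_export_request(host: str, token: str, path: str) -> urllib.request.Request:
    """Build a GET request for the Workspace export endpoint with format=HTML."""
    query = urllib.parse.urlencode({"path": path, "format": "HTML"})
    url = f"https://{host}/api/2.0/workspace/export?{query}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def decode_export(response_body: bytes) -> str:
    """Extract the base64 'content' field and decode it to an HTML string."""
    payload = json.loads(response_body)
    return base64.b64decode(payload["content"]).decode("utf-8")


def export_notebook_html(host: str, token: str, path: str) -> str:
    """Fetch and decode; requires network access and a valid token."""
    request = build_export_request(host, token, path)
    with urllib.request.urlopen(request) as response:
        return decode_export(response.read())


# Placeholders only: replace with your workspace host, token and notebook path.
# html = export_notebook_html("<workspace-host>", "<token>", "/Users/<user>/<notebook>")
```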

by VinayEmmadi • New Contributor

01-25-2023 10:57:45 AM

3333 Views
1 replies
0 kudos

How does hash shuffle join work in Spark?

Hi all, I am trying to understand the internals of shuffle hash join and want to check whether my understanding is correct. Let's say I have two tables, t1 and t2, joined on column country (8 distinct values). If I set the number of shuffle partitions as ...

Latest Reply

Anonymous
Not applicable

03-08-2023 8:17:39 PM

0 kudos

@Vinay Emmadi : In Spark, a hash shuffle join is a type of join that is used when joining two data sets on a common key. The data is first partitioned based on the join key, and then each partition is shuffled and sent to a node in the cluster. The ...
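A plain-Python sketch of the two phases of a shuffle hash join, to check the reasoning in the question: rows from both sides are assigned to a partition by hashing the join key modulo the number of shuffle partitions, and within each partition a hash table built from one side (here the right; Spark typically builds on the smaller side) is probed with the other. Spark uses Murmur3 hashing rather than `zlib.crc32`, and the table contents are made up. What the sketch does show holds regardless of the hash function: with 8 distinct `country` values, at most 8 partitions receive any rows, however many shuffle partitions are configured.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, str]  # (country, payload)


def partition_of(key: str, num_partitions: int) -> int:
    """Deterministic stand-in for Spark's Murmur3-based hash partitioning."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def shuffle(rows: List[Row], num_partitions: int) -> Dict[int, List[Row]]:
    """Phase 1: send each row to the partition chosen by its join key."""
    partitions: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        partitions[partition_of(row[0], num_partitions)].append(row)
    return partitions


def shuffle_hash_join(left: List[Row], right: List[Row],
                      num_partitions: int) -> List[Tuple[str, str, str]]:
    """Phase 2: per partition, build a hash table on the right side and probe it."""
    left_parts = shuffle(left, num_partitions)
    right_parts = shuffle(right, num_partitions)
    result = []
    for pid, left_rows in left_parts.items():
        build: Dict[str, List[str]] = defaultdict(list)
        for key, payload in right_parts.get(pid, []):
            build[key].append(payload)
        for key, payload in left_rows:
            for other in build.get(key, []):
                result.append((key, payload, other))
    return sorted(result)


countries = ["US", "UK", "DE", "FR", "IN", "JP", "BR", "CA"]  # 8 distinct keys
t1 = [(c, f"t1-{i}") for i, c in enumerate(countries * 3)]
t2 = [(c, f"t2-{c}") for c in countries]
print(len(shuffle(t1, 200)) <= 8)            # True
print(len(shuffle_hash_join(t1, t2, 200)))   # 24
```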

by Bartek • Contributor

01-27-2023 5:07:14 AM

1910 Views
1 replies
1 kudos

Save Spark DataFrame to shape file (.shp format)

Hello, I know how to create a .shp file from a GeoPandas dataframe using code similar to this (also mentioned on SO):

gpd_df = geopandas.GeoDataFrame(pandas_df, geometry='geom')
gpd_df.to_file("username/nh.shp")

However, I have .parquet files that I can load...

Latest Reply

Anonymous
Not applicable

03-08-2023 8:14:39 PM

1 kudos

@Bartosz Maciejewski: Spark does not have native support for writing Shapefiles directly. However, you can use a third-party library such as GeoPandas or PyShp to write your Spark DataFrame to a Shapefile. Here's an example of how to use GeoPandas to...

DELTA_EXCEED_CHAR_VARCHAR_LIMIT

Not able to set run_as service_principal_name

Pyspark operations slowness in CLuster 14.3LTS as ...

[Databricks Assets Bundles] Workflow trigger on fi...

Addressing Pipeline Error Handling in Databricks b...