Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

hemprasad
by New Contributor II
  • 4384 Views
  • 1 replies
  • 0 kudos

I am trying to use the Spark session of the compute in a Java JAR to run queries against Unity Catalog tables

I am trying to use the Spark session of the compute in a Java JAR to run queries against Unity Catalog tables. I get the following error: SparkSession spark = SparkSession.builder().appName("Databricks Query Example").confi...

Latest Reply
samantha789
New Contributor II
  • 0 kudos

@hemprasad wrote: I am trying to use the Spark session of the compute in a Java JAR to run queries against Unity Catalog tables. I get the following error: SparkSession spark = SparkSession.builder().appName("Databricks Qu...

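The reply above is truncated, so the thread's resolution isn't visible here. As a hedged illustration only (not the confirmed fix): when code runs on Databricks compute, the usual pattern is to attach to the cluster's existing session via getOrCreate() rather than configuring a new one. The Python sketch below mirrors what the Java builder call would do; the table name is a placeholder.

```python
from pyspark.sql import SparkSession

# Attach to the session the Databricks cluster already provides instead of
# building a brand-new, locally configured one. (Java equivalent:
# SparkSession.builder().appName(...).getOrCreate())
spark = SparkSession.builder.appName("Databricks Query Example").getOrCreate()

# Placeholder Unity Catalog table name
df = spark.sql("SELECT * FROM my_catalog.my_schema.my_table LIMIT 10")
df.show()
```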
Majid
by New Contributor
  • 5092 Views
  • 1 replies
  • 1 kudos

Conversion of time zone from UTC to US/Eastern

Hi All, I am new to Databricks and I am writing a query to fetch data from Databricks, but I encountered an error. Please see the query details and the error below. Any help would be appreciated. Thank you in advance! Query: cast(TO_UTC_TIMESTA...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @Majid, Spark SQL doesn't support AT TIME ZONE, which is why you got this error. To achieve a similar result, you can use the to_utc_timestamp or from_utc_timestamp function. These functions support a timezone parameter: https://docs.databricks.com/en/sq...

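A minimal PySpark sketch of the suggestion above, using from_utc_timestamp instead of AT TIME ZONE; it assumes a Databricks notebook where spark is already defined, and the column name and sample value are hypothetical.

```python
from pyspark.sql import functions as F

# Hypothetical sample data: event times stored as UTC timestamps
df = (spark.createDataFrame([("2024-01-15 18:30:00",)], ["event_time_utc"])
          .withColumn("event_time_utc", F.to_timestamp("event_time_utc")))

# Convert UTC to US/Eastern (the equivalent of the unsupported AT TIME ZONE)
df = df.withColumn("event_time_eastern",
                   F.from_utc_timestamp("event_time_utc", "US/Eastern"))
df.show(truncate=False)
```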
Bilal1
by New Contributor III
  • 47511 Views
  • 7 replies
  • 3 kudos

Resolved! Simply writing a dataframe to a CSV file (non-partitioned)

When writing a dataframe to a CSV file in PySpark, a folder is created and a partitioned CSV file is created inside it. I then have to rename this file in order to distribute it to my end user. Is there any way I can simply write my data to a CSV file, with the name ...

Latest Reply
chris0706
New Contributor II
  • 3 kudos

I know this post is a little old, but ChatGPT actually put together a very clean and straightforward solution for me (in Scala): // Set the temporary output directory and the desired final file path val tempDir = "/tmp/your_file_name" val finalOutputP...

6 More Replies
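A PySpark version of the same temp-directory-then-rename idea from the reply above, assuming a Databricks notebook where dbutils is available; the dataframe df and both paths are placeholders.

```python
temp_dir = "/tmp/my_report_tmp"
final_path = "/Volumes/my_catalog/my_schema/exports/my_report.csv"

# Write a single partition so only one part-*.csv file is produced
(df.coalesce(1)
   .write.mode("overwrite")
   .option("header", "true")
   .csv(temp_dir))

# Move that single part file to the desired name and clean up the temp folder
part_file = [f.path for f in dbutils.fs.ls(temp_dir) if f.name.startswith("part-")][0]
dbutils.fs.mv(part_file, final_path)
dbutils.fs.rm(temp_dir, recurse=True)
```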
Mathias
by New Contributor II
  • 2045 Views
  • 3 replies
  • 1 kudos

Connecting to Blob storage using abfss not working with serverless compute

I tried to follow the instructions found here: Connect to Azure Data Lake Storage Gen2 and Blob Storage - Azure Databricks | Microsoft Learn. E.g. this code: spark.conf.set("fs.azure.account.key.<storage-account>.dfs.core.windows.net", dbutils.secrets.ge...

Latest Reply
Mathias
New Contributor II
  • 1 kudos

Can you point me to some documentation on how to do that?

2 More Replies
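For reference, the account-key pattern from the linked Microsoft Learn page looks roughly like the sketch below on classic compute; the storage account, container, secret scope, and key names are placeholders. Serverless compute restricts session-level fs.azure.* settings, which is likely why the snippet fails there; Unity Catalog external locations or volumes are the commonly recommended alternative.

```python
# Classic-compute pattern from the Azure docs: authenticate with an account key
# stored in a secret scope (all names below are placeholders).
spark.conf.set(
    "fs.azure.account.key.<storage-account>.dfs.core.windows.net",
    dbutils.secrets.get(scope="<scope>", key="<storage-account-access-key>"),
)

df = (spark.read
      .format("csv")                # hypothetical file format
      .option("header", "true")
      .load("abfss://<container>@<storage-account>.dfs.core.windows.net/<path>"))
df.show()
```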
sticky
by New Contributor II
  • 1428 Views
  • 2 replies
  • 0 kudos

Running a cell with R-script keeps waiting status

So, I have an R notebook with different cells and a '15.4 LTS ML (includes Apache Spark 3.5.0, Scala 2.12)' cluster. If I select 'Run all', all cells run immediately and the run finishes quickly and fine. But if I would like to run the cells one...

Latest Reply
sticky
New Contributor II
  • 0 kudos

Today, I tried the glm function from the SparkR package, and it initially seemed to solve the problem. However, when you save the result of the glm function in a variable, things seem to go wrong, but only when the variabl...

1 More Replies
omsurapu
by New Contributor II
  • 2599 Views
  • 2 replies
  • 0 kudos

Can one workspace connect to multiple AWS accounts/regions?

Hi, I'd like to know if one workspace can be used to connect to multiple accounts (account A and account B) / regions. I know that multiple accounts/regions can't be selected during setup. Is it possible?

Latest Reply
omsurapu
New Contributor II
  • 0 kudos

OK, thanks! There is no official Databricks documentation available for this requirement. I assume it can be done with cross-account IAM roles, but I have never tested it. Any leads?

1 More Replies
SagarJi
by New Contributor II
  • 3520 Views
  • 2 replies
  • 1 kudos

SQL merge to update one of the nested column

I have an existing Delta Lake table as the target, and a small set of records at hand as CURRENT_BATCH. I have a requirement to update the dateTimeUpdated column inside parent2, using the following merge query: MERGE INTO mydataset AS target USING CURRENT_BA...

Latest Reply
filipniziol
Esteemed Contributor
  • 1 kudos

Hi @SagarJi, According to the documentation, updates to nested columns are not supported. What you can do is construct the whole struct and update the parent: MERGE INTO mydataset AS target USING CURRENT_BATCH AS incoming ON target.parent1.comp...

1 More Replies
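A hedged sketch of the "rebuild the whole struct" approach described above; the table, struct fields, and join key are placeholders, and CURRENT_BATCH is assumed to be available as a temp view.

```python
# Rather than assigning target.parent2.dateTimeUpdated directly, rebuild the
# entire parent2 struct, carrying the untouched fields over from the target.
spark.sql("""
    MERGE INTO mydataset AS target
    USING CURRENT_BATCH AS incoming
      ON target.parent1.companyId = incoming.parent1.companyId
    WHEN MATCHED THEN UPDATE SET
      target.parent2 = named_struct(
        'dateTimeUpdated', incoming.parent2.dateTimeUpdated,
        'otherField',      target.parent2.otherField
      )
""")
```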
jfpatenaude
by New Contributor
  • 1330 Views
  • 1 replies
  • 1 kudos

MalformedInputException when using extended ASCII characters in dbutils.notebook.exit()

I have a specific use case where I call another notebook using the dbutils.notebook.run() function. The other notebook does some processing and returns a string via the dbutils.notebook.exit() function to the caller notebook. The returned string has some...

Latest Reply
jennie258fitz
New Contributor III
  • 1 kudos

@jfpatenaude wrote: I have a specific use case where I call another notebook using the dbutils.notebook.run() function. The other notebook does some processing and returns a string via the dbutils.notebook.exit() function to the caller...

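The reply above is truncated, so the accepted fix isn't visible; one common workaround (an assumption, not the thread's confirmed answer) is to base64-encode the string so only plain ASCII crosses the exit/run boundary. The notebook path, timeout, and sample string below are placeholders.

```python
import base64

# --- child notebook ---
result = "café – résumé"   # hypothetical string containing non-ASCII characters
dbutils.notebook.exit(base64.b64encode(result.encode("utf-8")).decode("ascii"))

# --- caller notebook ---
encoded = dbutils.notebook.run("./child_notebook", 600)   # path and timeout are placeholders
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)
```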
Kody_Devl
by New Contributor II
  • 31392 Views
  • 2 replies
  • 0 kudos

Export to Excel xlsx

Hi All, does anyone have some code or an example of how to export my Databricks SQL results directly to an existing spreadsheet? Many thanks, Kody_Devl

Latest Reply
Emit
New Contributor II
  • 0 kudos

There is an add-on that imports a table directly into a spreadsheet: https://workspace.google.com/marketplace/app/bricksheet/979793077657

1 More Replies
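The thread itself only links an add-on; as a separate, hedged alternative, query results can be written into an existing .xlsx workbook from a notebook with pandas and openpyxl. The table name, path, and sheet name are placeholders, and openpyxl must be installed on the cluster.

```python
import pandas as pd

# Pull the SQL results into pandas
pdf = spark.sql("SELECT * FROM my_catalog.my_schema.my_table").toPandas()

# Append (or replace) a sheet in an existing workbook
target = "/Volumes/my_catalog/my_schema/exports/report.xlsx"
with pd.ExcelWriter(target, mode="a", engine="openpyxl", if_sheet_exists="replace") as writer:
    pdf.to_excel(writer, sheet_name="sql_results", index=False)
```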
MikeGo
by Contributor II
  • 1636 Views
  • 3 replies
  • 0 kudos

How to control file size with OPTIMIZE

Hi, I have a Delta table under UC, no partition, no liquid clustering. I tried: OPTIMIZE foo; -- OR ALTER TABLE foo SET TBLPROPERTIES (delta.targetFileSize = '128mb'); OPTIMIZE foo; I expected to see some change in the files after the above, but the OP...

Latest Reply
filipniziol
Esteemed Contributor
  • 0 kudos

Hi @MikeGo, Databricks is a big data processing engine. Instead of testing with 3 files, try testing with 3000 files. OPTIMIZE isn't merging your small files because there may not be enough files or data for it to act upon. Regarding why DESC DETAIL shows 3 fil...

2 More Replies
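A compact sketch of the sequence discussed above, run from a notebook; the table name is a placeholder.

```python
# Set a target file size, rewrite the files, then inspect the resulting layout
spark.sql("ALTER TABLE my_catalog.my_schema.foo "
          "SET TBLPROPERTIES ('delta.targetFileSize' = '128mb')")
spark.sql("OPTIMIZE my_catalog.my_schema.foo")

spark.sql("DESCRIBE DETAIL my_catalog.my_schema.foo") \
     .select("numFiles", "sizeInBytes").show()
```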
kjoudeh
by New Contributor II
  • 2789 Views
  • 2 replies
  • 1 kudos

External Location not showing up

Hello, for some reason I am not able to see the external locations that we have in our workspace. I am 100% sure that many exist, but for some reason I am not able to see them. Is there a reason why? Am I missing something? I know other user...

Latest Reply
filipniziol
Esteemed Contributor
  • 1 kudos

Hi @kjoudeh, it is due to permissions. For external locations you need the BROWSE privilege: https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/manage-privileges/privileges Ask the metastore admin or a workspace...

1 More Replies
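A hedged sketch of what a metastore admin could grant so the locations become visible; the external location and group names are placeholders.

```python
# Grant the BROWSE privilege on a specific external location to a group,
# then check what the current user can now list.
spark.sql("GRANT BROWSE ON EXTERNAL LOCATION my_external_location TO `data_engineers`")
spark.sql("SHOW EXTERNAL LOCATIONS").show(truncate=False)
```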
sathyafmt
by New Contributor III
  • 3318 Views
  • 5 replies
  • 3 kudos

Resolved! Cannot read JSON from /Volumes

I am trying to read a JSON file with this in the SQL Editor and it fails with None.get: CREATE TEMPORARY VIEW multilineJson USING json OPTIONS (path="/Volumes/my_catalog/my_schema/jsondir/test.json", multiline=true); None.get is all the error it has. Th...

Latest Reply
sathyafmt
New Contributor III
  • 3 kudos

@filipniziol - Yes, I was on a Serverless SQL Warehouse. It works with "CREATE TABLE ..", thanks! I am surprised that the warehouse type affects this feature. But I got the SQL from the Databricks documentation: https://docs.databricks.com/en/query/format...

4 More Replies
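A sketch following the working direction from the thread (CREATE TABLE instead of a temporary view), plus the DataFrame equivalent; catalog, schema, and path are placeholders.

```python
# SQL route: create a table over the JSON file in the /Volumes path
spark.sql("""
    CREATE TABLE my_catalog.my_schema.multiline_json
    USING json
    OPTIONS (path '/Volumes/my_catalog/my_schema/jsondir/test.json', multiline 'true')
""")

# DataFrame route: read the multi-line JSON directly
df = (spark.read
      .option("multiLine", "true")
      .json("/Volumes/my_catalog/my_schema/jsondir/test.json"))
df.show()
```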
manish1987c
by New Contributor III
  • 7974 Views
  • 1 replies
  • 1 kudos

Calculate the number of parallel tasks that can be executed in a Databricks PySpark cluster

I want to confirm whether this understanding is correct: to calculate the number of parallel tasks that can be executed in a Databricks PySpark cluster with the given configuration, we need to consider the number of executors that can run on each node a...

Latest Reply
dylanberry
New Contributor II
  • 1 kudos

Hi @Retired_mod, this is really fantastic guidance. Will something similar be added to the Databricks docs?

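A small sketch for checking the math against a live cluster; the worker count and core numbers below are hypothetical, and the general rule of thumb is workers × cores per worker ÷ CPUs per task.

```python
sc = spark.sparkContext

# Task slots Spark reports for the cluster (roughly workers x cores per worker)
print("defaultParallelism:", sc.defaultParallelism)

# Manual estimate, assuming one executor per worker node
num_workers = 4                                        # hypothetical cluster size
cores_per_worker = 8                                   # hypothetical instance type
cpus_per_task = int(spark.conf.get("spark.task.cpus", "1"))
print("estimated parallel tasks:", num_workers * cores_per_worker // cpus_per_task)
```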
UdayRPai
by New Contributor II
  • 1935 Views
  • 3 replies
  • 0 kudos

Issue with Combination of INSERT + CTE (with clause) + Dynamic query (IDENTIFIER function)

Hi, we are trying to insert into a table using a CTE (WITH clause query). In the insert we are using the IDENTIFIER function, as the catalog name is retrieved dynamically. This is causing the insert to fail with an error - The table or view `cte_query` ...

Latest Reply
UdayRPai
New Contributor II
  • 0 kudos

Please mark this as resolved.

2 More Replies
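The thread doesn't show its final fix; one possible workaround (an assumption, not the confirmed resolution) is to inline the CTE as a subquery and pass the catalog through a named parameter, so IDENTIFIER() only wraps the table names. Schema and table names below are placeholders.

```python
# Placeholder schema/table names; the catalog is resolved at runtime.
spark.sql(
    """
    INSERT INTO IDENTIFIER(:catalog || '.my_schema.target_table')
    SELECT id, amount
    FROM (
        SELECT id, amount
        FROM IDENTIFIER(:catalog || '.my_schema.source_table')
        WHERE amount > 0
    ) AS filtered
    """,
    args={"catalog": "my_catalog"},
)
```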
giohappy
by New Contributor III
  • 4824 Views
  • 3 replies
  • 1 kudos

Resolved! SedonaSqlExtensions is not autoregistering types and functions

The usual way to use Apache Sedona inside PySpark is by first registering Sedona types and functions with SedonaRegistrator.registerAll(spark). We need to have these auto-registered when the cluster starts (to be able, for example, to perform geospatial q...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Giovanni Allegri, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answ...

2 More Replies
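For context, the Sedona documentation describes registering the extension at cluster level via Spark config so the functions exist at startup; the keys appear in the comments below (exact class names can vary by Sedona version), with explicit notebook-level registration as a fallback.

```python
# Cluster-level Spark config (set in the cluster's "Spark config" box), per the
# Sedona docs -- values may differ by Sedona version:
#   spark.sql.extensions    org.apache.sedona.sql.SedonaSqlExtensions
#   spark.serializer        org.apache.spark.serializer.KryoSerializer
#   spark.kryo.registrator  org.apache.sedona.core.serde.SedonaKryoRegistrator

# Notebook-level fallback: register explicitly, then sanity-check a function
from sedona.register import SedonaRegistrator

SedonaRegistrator.registerAll(spark)
spark.sql("SELECT ST_Point(1.0, 2.0) AS pt").show()
```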