Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Anonymous
by Not applicable
  • 524 Views
  • 1 reply
  • 1 kudos

The Next Databricks Office Hours

Our next Office Hours session is scheduled for February 23, 2022 - 8:00 am PDT. Do you have questions about how to set up or use Databricks? Do you want to get best practices for deploying your use case or tips on data a...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Great!

ftc
by New Contributor II
  • 2032 Views
  • 3 replies
  • 0 kudos

Resolved! Multi-hop architecture for ingesting data via HTTP API

I'd like to know what the design pattern is for ingesting data via an HTTP API request. The pattern needs to use the multi-hop architecture. Do we need to ingest the JSON output to cloud storage first (not the bronze layer), then use Auto Loader to process the data further? ...

Latest Reply
artsheiko
Valued Contributor III
  • 0 kudos

API -> Cloud Storage -> Delta is the more suitable approach. Auto Loader helps you avoid losing any data (it keeps track of discovered files in the checkpoint location using RocksDB to provide exactly-once ingestion guarantees) and enables schema inference ev...
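
A minimal sketch of that flow, assuming the API responses have already been landed as JSON files in cloud storage; the bucket paths and table name below are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

raw = (
    spark.readStream.format("cloudFiles")               # Auto Loader
    .option("cloudFiles.format", "json")                # landed API payloads
    .option("cloudFiles.schemaLocation", "s3://my-bucket/_schemas/api_raw")
    .load("s3://my-bucket/landing/api_raw/")
)

(
    raw.writeStream
    .option("checkpointLocation", "s3://my-bucket/_checkpoints/api_raw")
    .trigger(availableNow=True)                         # incremental, batch-style run
    .toTable("bronze.api_raw")                          # bronze Delta table
)
```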

2 More Replies
ASN
by New Contributor II
  • 9663 Views
  • 5 replies
  • 2 kudos

Python read CSV - don't treat a comma as a separator when it's within quotes, even if the quotes are not immediately adjacent to the separator

I have data like the sample below, and when reading it as CSV I don't want a comma to be treated as a separator when it's within quotes, even if the quotes are not immediately adjacent to the separator (like record #2). Records 1 and 3 are fine if we use the separator, but it fails on the 2nd record...

[Image: input and expected output]
Latest Reply
Pholo
Contributor
  • 2 kudos

Hi, I think you can use this option for the CSV reader: spark.read.options(header=True, sep=",", unescapedQuoteHandling="BACK_TO_DELIMITER").csv("your_file.csv") - especially the unescapedQuoteHandling. You can search for the other options at this l...
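
A runnable sketch of that suggestion, with a made-up inline sample so it can be tried anywhere; the file path and column names are illustrative only:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Record 2's quotes are not immediately adjacent to the separator.
sample = (
    "id,desc,code\n"
    '1,"plain, quoted value",A\n'
    '2, "quoted, part" suffix,B\n'
    "3,plain value,C\n"
)
path = "/tmp/quoted_commas.csv"
with open(path, "w") as f:
    f.write(sample)

df = spark.read.options(
    header=True,
    sep=",",
    # on an unescaped quote, treat the value as unquoted and keep
    # accumulating characters until the next delimiter
    unescapedQuoteHandling="BACK_TO_DELIMITER",
).csv(f"file://{path}")
df.show(truncate=False)
```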

4 More Replies
Rahul_Samant
by Contributor
  • 3182 Views
  • 4 replies
  • 1 kudos

Resolved! Spark SQL Connector

I am trying to read data from an Azure SQL database from Databricks. The Azure SQL database is created with a private link endpoint. Using a DBR 10.4 LTS cluster, the expectation is that the connector is pre-installed as per the documentation. Using the below code to fetch...

Latest Reply
artsheiko
Valued Contributor III
  • 1 kudos

It seems that .option("databaseName", "test") is redundant here, as you need to include the db name in the URL. Please verify that you use a connector compatible with your cluster's Spark version: Apache Spark connector: SQL Server & Azure SQL
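
A hedged sketch of that advice, with the database name embedded in the JDBC URL; the server, table, and credentials are placeholders, and the format name assumes the Apache Spark connector for SQL Server is available on the cluster:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# The database name lives in the URL, so no separate "databaseName" option is needed.
url = "jdbc:sqlserver://myserver.database.windows.net:1433;databaseName=test"

df = (
    spark.read.format("com.microsoft.sqlserver.jdbc.spark")
    .option("url", url)
    .option("dbtable", "dbo.my_table")
    .option("user", "my_user")
    .option("password", "my_password")  # prefer a Databricks secret scope in practice
    .load()
)
df.show(5)
```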

3 More Replies
Anonymous
by Not applicable
  • 766 Views
  • 1 reply
  • 3 kudos


March Madness + Data

Here at Databricks we like to use (you guessed it) data in our daily lives. Today kicks off a series called Databrags. Databrags are glimpses into how Bricksters and community folks like you use data to solve everyday problems, e...

Latest Reply
Kaniz_Fatma
Community Manager
  • 3 kudos

@Lindsay Olson, Awesome!

mick042
by New Contributor III
  • 940 Views
  • 1 reply
  • 0 kudos

Does Spark utilise a temporary stage when writing to Snowflake? How does that work?

Folks, when I want to push data to Snowflake I need to use a stage for files before copying the data over. However, when I utilise the net.snowflake.spark.snowflake.Utils library and do a spark.write, as in... spark.read.format("csv").option("header", ...

Latest Reply
mick042
New Contributor III
  • 0 kudos

Yes, it uses a temporary stage. I should have just looked in the Snowflake history.
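
A hedged sketch of the write path in question, using the Snowflake Spark connector (which, as confirmed above, stages data in a temporary internal stage before loading it); all connection options below are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

sf_options = {
    "sfURL": "myaccount.snowflakecomputing.com",
    "sfUser": "my_user",
    "sfPassword": "my_password",   # prefer a secret scope in practice
    "sfDatabase": "MY_DB",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "MY_WH",
}

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])

(
    df.write.format("snowflake")   # net.snowflake.spark.snowflake
    .options(**sf_options)
    .option("dbtable", "MY_TABLE")
    .mode("overwrite")
    .save()
)
```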

165036
by New Contributor III
  • 1428 Views
  • 3 replies
  • 1 kudos

Resolved! Error message when editing schedule cron expression on job

When attempting to edit the schedule cron expression on one of our jobs we receive the following error message: Cluster validation error: Validation failed for spark_conf, spark.databricks.acl.dfAclsEnabled must be false (is "true"). The spark.databric...

Latest Reply
165036
New Contributor III
  • 1 kudos

FYI this was a temporary Databricks bug. Seems to be resolved now.

2 More Replies
AP
by New Contributor III
  • 3166 Views
  • 5 replies
  • 3 kudos

Resolved! Auto Optimize, the OPTIMIZE command, and the VACUUM command: order and production implementation best practices

So Databricks gives us a great toolkit in the form of optimization and vacuum. But in terms of operationalizing them, I am really confused about the best practice. Should we enable "optimized writes" by setting the following at a workspace level? spark.conf.set...
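
For context, a hedged sketch of the knobs the question refers to; the table name, ZORDER column, and retention window are placeholders, and current Databricks docs should be checked before adopting these settings:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Session-level toggles for Delta optimized writes and auto compaction
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "true")
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "true")

# Periodic maintenance, typically run as a scheduled job
spark.sql("OPTIMIZE my_db.my_table ZORDER BY (event_date)")  # compact and co-locate
spark.sql("VACUUM my_db.my_table RETAIN 168 HOURS")          # keep the 7-day default
```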

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@AKSHAY PALLERLA Just checking in to see if you got a solution to the issue you shared above. Let us know! Thanks to @Werner Stinckens for jumping in, as always!

4 More Replies
Jayesh
by New Contributor III
  • 1958 Views
  • 5 replies
  • 3 kudos

Resolved! How can we copy data from Databricks SQL using a notebook?

Hi Team, we have a scenario where we have to connect to Databricks SQL instance 1 from another Databricks instance 2 using a notebook or Azure Data Factory. Can you please help?
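
The thread's confirmed fix isn't shown in this excerpt, but one possible approach, assuming the Databricks JDBC driver is installed on instance 2's cluster, is to read from instance 1's SQL warehouse over JDBC; the hostname, HTTP path, token, and table below are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder connection details for the remote Databricks SQL warehouse
jdbc_url = (
    "jdbc:databricks://adb-1234567890123456.7.azuredatabricks.net:443/default;"
    "transportMode=http;ssl=1;AuthMech=3;"
    "httpPath=/sql/1.0/warehouses/abcdef1234567890;"
    "UID=token;PWD=<personal-access-token>"
)

remote_df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("driver", "com.databricks.client.jdbc.Driver")
    .option("dbtable", "my_catalog.my_schema.my_table")
    .load()
)
remote_df.show(5)
```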

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Thanks for jumping in to help, @Arvind Ravish, @Hubert Dudek, and @Artem Sheiko!

4 More Replies
Jeade
by New Contributor II
  • 2045 Views
  • 3 replies
  • 1 kudos

Resolved! Pulling data from Azure Boards into Databricks

Looking for best practices/examples on how to pull data (epics, features, PBIs) from Azure Boards into Databricks for analysis. Any ideas/help appreciated!

Latest Reply
artsheiko
Valued Contributor III
  • 1 kudos

You can use export to CSV (link), push the file to storage mounted to Databricks, or just import the obtained file to DBFS.
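
A minimal sketch of that last step, assuming the Azure Boards CSV export has been uploaded to DBFS; the path is a placeholder:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

boards_df = (
    spark.read.option("header", True)
    .option("inferSchema", True)
    .csv("dbfs:/FileStore/azure_boards/work_items.csv")  # uploaded export
)
boards_df.show(5)
```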

2 More Replies
cralle
by New Contributor II
  • 4338 Views
  • 7 replies
  • 2 kudos

Resolved! Cannot display DataFrame when I filter by length

I have a DataFrame that I have created based on a couple of datasets and multiple operations. The DataFrame has multiple columns, one of which is an array of strings. But when I take the DataFrame and try to filter based upon the size of this array co...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Strange, it works fine here. What version of Databricks are you on? What you could do to identify the issue is to output the query plan (.explain). Also, creating a new df for each transformation could help. That way you can check step by step where...
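
A small sketch of that debugging approach, with made-up data: filter on the array column's length with F.size and print the plan with .explain():

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [(1, ["a", "b"]), (2, ["c"]), (3, [])],
    ["id", "tags"],
)

filtered = df.filter(F.size("tags") > 1)  # filter by array length
filtered.explain()                        # inspect the physical plan
filtered.show()
```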

6 More Replies
tej1
by New Contributor III
  • 2648 Views
  • 6 replies
  • 7 kudos

Resolved! Trouble accessing `_metadata` column using cloudFiles in Delta Live Tables

We are building a Delta Live Tables pipeline where we ingest CSV files in AWS S3 using cloudFiles, and it is necessary to access the file modification timestamp of the file. As documented here, we tried selecting the `_metadata` column in a task in delta live p...

Latest Reply
tej1
New Contributor III
  • 7 kudos

Update: We were able to test the `_metadata` column feature in DLT "preview" mode (which is DBR 11.0). Databricks doesn't recommend "preview" mode for production workloads, but nevertheless, we're glad to be using this feature in DLT.
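
A hedged sketch of the pattern this thread lands on: a DLT table that ingests the CSVs with Auto Loader and captures the file modification timestamp from `_metadata` (requires a runtime channel that supports the column); the S3 path and names are placeholders:

```python
import dlt  # available inside a Delta Live Tables pipeline
from pyspark.sql import functions as F

@dlt.table(name="bronze_events")
def bronze_events():
    # `spark` is provided by the DLT runtime
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("header", True)
        .load("s3://my-bucket/landing/events/")
        .withColumn(
            "source_file_mtime",
            F.col("_metadata.file_modification_time"),  # per-file timestamp
        )
    )
```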

5 More Replies