Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hi, everyone. I'm currently trying to implement Spark Structured Streaming with PySpark. I would like to merge multiple rows into a single row containing an array and sink it to a downstream message queue for another service to use. A related example follows (see the sketch below): * Befor...
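A minimal sketch of one way to do this, assuming a Kafka source and sink; the broker, topic names, and checkpoint path are placeholders, not from the original post:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, collect_list, struct, to_json

spark = SparkSession.builder.appName("merge-rows-demo").getOrCreate()

# Hypothetical source: a Kafka topic with string keys and values.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "input-topic")                # placeholder topic
    .load()
    .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
)

# Merge all rows sharing a key into one row holding an array of values,
# then serialize to JSON for the downstream consumer.
merged = (
    events.groupBy("key")
    .agg(collect_list("value").alias("values"))
    .select(col("key"), to_json(struct("key", "values")).alias("value"))
)

# Sink back to a message queue. A streaming aggregation without a
# watermark must use "update" or "complete" output mode, not "append".
query = (
    merged.writeStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("topic", "output-topic")
    .option("checkpointLocation", "/tmp/checkpoints/merge-rows")
    .outputMode("update")
    .start()
)
```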
My data is a dump of a JSON response from an API. The schema of the JSON is:

col_name: data
data_type: array<struct<attributes:struct<name: string, age: int, relationships:struct<address:struct<data:array<struct<id: long, type: string>>>>>>>
...
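A sketch of flattening this structure, assuming the dump is already loaded into a DataFrame `df` with the schema above (the output column aliases are hypothetical):

```python
from pyspark.sql.functions import col, explode

# Explode the outer `data` array first, then the nested address array.
flattened = (
    df.select(explode("data").alias("d"))
    .select(
        col("d.attributes.name").alias("name"),
        col("d.attributes.age").alias("age"),
        explode("d.attributes.relationships.address.data").alias("addr"),
    )
    .select(
        "name",
        "age",
        col("addr.id").alias("address_id"),
        col("addr.type").alias("address_type"),
    )
)
```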
ALTER TABLE <TABLE_NAME> SET ROW FILTER <func_name> ON (COLUMN) Got the error "[PARSE_SYNTAX_ERROR] Syntax error at or near 'ROW' (line 2, pos 4)" while running this code. Please help with this issue. We tried this code as part of an access polic...
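For reference, a minimal sketch of the documented row-filter flow, run from PySpark; the catalog, table, function, and column names are hypothetical. Note that SET ROW FILTER only parses on Unity Catalog tables with a sufficiently recent Databricks runtime, which is a common cause of this parse error:

```python
# Define the filter function: admins see everything, others only US rows.
spark.sql("""
  CREATE OR REPLACE FUNCTION main.default.us_filter(region STRING)
  RETURN IF(IS_ACCOUNT_GROUP_MEMBER('admin'), TRUE, region = 'US')
""")

# Attach the filter to a table column.
spark.sql("""
  ALTER TABLE main.default.sales
  SET ROW FILTER main.default.us_filter ON (region)
""")
```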
Here is the current output of my SELECT statement. I would like it to return one row for this jobsubmissionid, where it selects only the non-zero value from each of the rows. I tried using SELECT DISTINCT jobsubmissionid, but it still returned 5 rows.
Is that the complete query you are using? I'm guessing you are using SELECT DISTINCT * FROM table_name. If you want distinct values for an individual column, you have to apply a filter condition or aggregate the data accordingly. Anyway, a complete ...
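A sketch of the aggregation approach, assuming the value columns are non-negative so MAX picks the single non-zero entry from each group; the column names other than jobsubmissionid are hypothetical:

```python
from pyspark.sql.functions import max as spark_max

# Collapse the 5 rows per jobsubmissionid into one, keeping the
# non-zero value from each column.
result = df.groupBy("jobsubmissionid").agg(
    spark_max("col_a").alias("col_a"),
    spark_max("col_b").alias("col_b"),
)
```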
I have tried pulling a single row from a .csv using df.query(). However, the data being returned doesn't coincide with the data I'm expecting; it is pulling a different row. Here is my code: df = spark.read.option("header", True).csv(data_fldr + "config...
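Worth noting that DataFrame.query() is a pandas method, not a Spark one; a Spark DataFrame needs filter()/where(). A sketch under that assumption, where the file name, column, and value are placeholders (the original path is truncated):

```python
from pyspark.sql.functions import col

df = (
    spark.read.option("header", True)
    .csv(data_fldr + "config.csv")  # placeholder for the truncated file name
)

# Pull the single matching row with an explicit predicate.
rows = df.filter(col("config_key") == "some_value").collect()
```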
Hi @Peter Ott, does @Hubert Dudek's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!
Hello guys, I'm having an issue when trying to get row values from a Spark DataFrame. I have a DF with an index column, and I need to be able to return a row based on the index as fast as possible. I tried to partitionBy the index column, optimize with Z-Or...
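A sketch of a point lookup on the index column, assuming a Delta table so that Z-ordering lets Spark skip files; the table name, column name, and lookup value are hypothetical:

```python
from pyspark.sql.functions import col

# Cluster the data files by the lookup column (Delta tables only).
spark.sql("OPTIMIZE my_table ZORDER BY (index)")

# Point lookup: with Z-ordering, file statistics prune most files.
row = (
    spark.table("my_table")
    .filter(col("index") == 12345)
    .limit(1)
    .collect()
)
```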
I have created an external table using Spark via the command below (using Data Science & Engineering): df.write.mode("overwrite").format("parquet").saveAsTable(name=f'{db_name}.{table_name}', path="dbfs:/reports/testing") I have tried to delete a row based on...
Hi @karthick J, can you try to delete the row and execute your command on a non-high-concurrency cluster? The reason I'm asking is that we first need to isolate the error message and understand why it is happening to be able to find the best ...
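One possible cause worth checking, since the original error is truncated: parquet tables do not support row-level DELETE on Databricks, while Delta tables do. A sketch of recreating the table as Delta, mirroring the names from the question (the DELETE predicate is hypothetical):

```python
# Recreate the external table in Delta format, which supports DELETE.
df.write.mode("overwrite").format("delta").saveAsTable(
    name=f"{db_name}.{table_name}", path="dbfs:/reports/testing"
)

# Row-level delete now works through SQL.
spark.sql(f"DELETE FROM {db_name}.{table_name} WHERE id = 1")  # hypothetical predicate
```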