Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

MarsSu
by New Contributor II
  • 8145 Views
  • 3 replies
  • 0 kudos

How to merge multiple rows into a single row with an array without causing OOM?

Hi, everyone. I am currently trying to implement Spark Structured Streaming with PySpark, and I would like to merge multiple rows into a single row with an array and sink it to a downstream message queue for another service to use. A related example follows: * Befor...

Latest Reply
917074
New Contributor II
  • 0 kudos

Is there any solution to this? @MarsSu, were you able to solve it? Kindly shed some light on this if you resolved it.

  • 0 kudos
2 More Replies
qwerty1
by Contributor
  • 10439 Views
  • 2 replies
  • 2 kudos

Resolved! Doing a join within the same row in SQL

My data is a dump of a JSON response from an API. The schema of the JSON is: col_name = data, data_type = array<struct<attributes:struct<name: String, age: Int, relationships:struct<address:struct<data:array<struct<id: long, type: string>>>>>>>  ...

Latest Reply
  • 2 kudos

1 More Replies
Anonymous
by Not applicable
  • 2183 Views
  • 1 reply
  • 1 kudos

"[PARSE_SYNTAX_ERROR] Syntax error at or near 'ROW'(line 2, pos 4)".

ALTER TABLE <TABLE_NAME> SET ROW FILTER <func_name> ON (COLUMN) We got the following error while running the code above: "[PARSE_SYNTAX_ERROR] Syntax error at or near 'ROW'(line 2, pos 4)". Please help with this issue. We tried this code as part of access polic...

Image
Latest Reply
Rajeev45
Databricks Employee
  • 1 kudos

Hello, can you please confirm which DBR version you are using and whether you use Unity Catalog?

  • 1 kudos
dshao
by New Contributor II
  • 5645 Views
  • 2 replies
  • 0 kudos

Resolved! Best way to get one row back per ID? Select Distinct is not working.

Here is the current output for my select statement. I would like it to return one row for this jobsubmissionid, where it selects only the non-zero value from each of the rows. I tried using SELECT DISTINCT jobsubmissionid, but it still returned 5 rows.

image
Latest Reply
UmaMahesh1
Honored Contributor III
  • 0 kudos

Is that the complete query you are using? I'm guessing that you are using select distinct * from table_name. If you want distinct values for an individual column, you have to apply a filter condition or aggregate the data accordingly. Anyway, a complete ...

  • 0 kudos
1 More Replies
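
For anyone landing on this thread: the "aggregate the data accordingly" suggestion can be sketched roughly as below. This is only an illustration, not the accepted answer. It uses Python's built-in sqlite3 rather than Databricks SQL, and the table name (submissions) and value columns (fee_a, fee_b) are made up, since the original post's schema is only visible in the image. It also assumes each column has at most one non-zero, non-negative value per jobsubmissionid, so MAX picks it out.

```python
import sqlite3

# Hypothetical table and columns; the original post's schema is not shown.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE submissions (jobsubmissionid INTEGER, fee_a INTEGER, fee_b INTEGER)"
)
conn.executemany(
    "INSERT INTO submissions VALUES (?, ?, ?)",
    [(101, 50, 0), (101, 0, 75), (102, 0, 20)],
)

# GROUP BY collapses the rows to one per id. MAX keeps the non-zero value,
# assuming at most one non-zero, non-negative value per column and id.
rows = conn.execute(
    """
    SELECT jobsubmissionid, MAX(fee_a) AS fee_a, MAX(fee_b) AS fee_b
    FROM submissions
    GROUP BY jobsubmissionid
    ORDER BY jobsubmissionid
    """
).fetchall()

print(rows)  # [(101, 50, 75), (102, 0, 20)]
```

SELECT DISTINCT only removes rows that are identical across every selected column, which is why it does not collapse rows that differ in their value columns. The same GROUP BY / MAX query shape should carry over to Spark SQL, but check it against your own data first.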
plynton
by New Contributor II
  • 1316 Views
  • 3 replies
  • 1 kudos

Incorrect results with df.query()

I have tried pulling a single row from a .csv using df.query(). However, the data being returned doesn't match the data I'm expecting - it is pulling a different row. Here is my code: df = spark.read.option("header",True).csv(data_fldr + "config...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Peter Ott, does @Hubert Dudek's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!

  • 1 kudos
2 More Replies
Orianh
by Valued Contributor II
  • 2223 Views
  • 0 replies
  • 0 kudos

Retrieve a row from indexed spark data frame.

Hello guys, I'm having an issue when trying to get a row's values from a Spark data frame. I have a DF with an index column, and I need to be able to return a row based on the index in the fastest way possible. I tried to partitionBy the index column, optimize with zor...

kjoth
by Contributor II
  • 10006 Views
  • 9 replies
  • 7 kudos

Resolved! Delete row from table is not working.

I have created an external table using Spark via the command below (using Data Science & Engineering): df.write.mode("overwrite").format("parquet").saveAsTable(name=f'{db_name}.{table_name}', path="dbfs:/reports/testing") I have tried to delete a row based on...

Latest Reply
jose_gonzalez
Databricks Employee
  • 7 kudos

Hi @karthick J, can you try to delete the row and execute your command in a non-high-concurrency cluster? The reason I'm asking is that we first need to isolate the error message and understand why it is happening to be able to find the best ...

  • 7 kudos
8 More Replies