Data Engineering

Forum Posts

MarsSu
by New Contributor II
  • 5920 Views
  • 3 replies
  • 0 kudos

How to merge multiple rows into a single row with an array without causing OOM?

Hi, everyone. Currently I am trying to implement Spark Structured Streaming with PySpark. I would like to merge multiple rows into a single row with an array and sink it to a downstream message queue for another service to use. A related example follows: * Befor...

Latest Reply
917074
New Contributor II
  • 0 kudos

Is there any solution to this? @MarsSu, were you able to solve it? Kindly shed some light on this if you resolved it.

2 More Replies
qwerty1
by Contributor
  • 3748 Views
  • 4 replies
  • 3 kudos

Resolved! Doing a join within the same row in SQL

My data is a dump of a JSON response from an API. The schema of the JSON is: col_name = data, data_type = array<struct<attributes:struct<name: String, age: Int, relationships:struct<address:struct<data:array<struct<id: long, type: string>>>>>>>  ...

Latest Reply
Kaniz
Community Manager
  • 3 kudos

Hi @Ashwin Bhaskar, you can use the SQL JOIN operation to join the data and include arrays based on the id field. Here's an example SQL query that should accomplish this: SELECT data.attributes.name AS name, data.attributes.age AS age, included...

3 More Replies
Anonymous
by Not applicable
  • 1366 Views
  • 1 reply
  • 1 kudos

"[PARSE_SYNTAX_ERROR] Syntax error at or near 'ROW'(line 2, pos 4)".

Alter table <TABLE_NAME> SET ROW FILTER <func_name> on (COLUMN) We got the following error while running this code: "[PARSE_SYNTAX_ERROR] Syntax error at or near 'ROW'(line 2, pos 4)". Please help with this issue. We tried this code as part of access polic...

Latest Reply
Rajeev45
New Contributor III
  • 1 kudos

Hello, can you please confirm which DBR version you are using, and whether you use Unity Catalog?

dshao
by New Contributor II
  • 3492 Views
  • 2 replies
  • 0 kudos

Resolved! Best way to get one row back per ID? Select Distinct is not working.

Here is the current output for my select statement. I would like it to return one row for this jobsubmissionid, where it selects only the non-zero value from each of the rows. I tried using SELECT DISTINCT jobsubmissionid but it still returned 5 rows.

Latest Reply
UmaMahesh1
Honored Contributor III
  • 0 kudos

Is that the complete query you are using? I'm guessing that you are using select distinct * from table_name. If you want distinct values for an individual column, you have to apply a filter condition or aggregate the data accordingly. Anyways, a complete ...

1 More Replies
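
For the one-row-per-ID question above, the aggregation the reply suggests could look roughly like the sketch below. It is not taken from the thread: the table name `submissions` and the columns `metric_a`, `metric_b`, `metric_c` are hypothetical (only `jobsubmissionid` appears in the post), and it uses Python's built-in sqlite3 instead of Spark so it runs anywhere. It assumes each column holds at most one non-zero, positive value per ID, so MAX picks it out. SELECT DISTINCT deduplicates whole result rows, not a single column, which is why rows that differ in any selected column all remain.

```python
import sqlite3

# Hypothetical data: several rows per jobsubmissionid, each row carrying
# a non-zero value in a different column (column names are illustrative).
rows = [
    (101, 5, 0, 0),
    (101, 0, 7, 0),
    (101, 0, 0, 9),
    (102, 3, 0, 0),
    (102, 0, 4, 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE submissions ("
    "jobsubmissionid INTEGER, metric_a INTEGER, "
    "metric_b INTEGER, metric_c INTEGER)"
)
conn.executemany("INSERT INTO submissions VALUES (?, ?, ?, ?)", rows)

# GROUP BY collapses the rows of each ID into one row; MAX keeps the
# non-zero value per column (assumption: non-zero values are positive).
query = """
    SELECT jobsubmissionid,
           MAX(metric_a) AS metric_a,
           MAX(metric_b) AS metric_b,
           MAX(metric_c) AS metric_c
    FROM submissions
    GROUP BY jobsubmissionid
    ORDER BY jobsubmissionid
"""
result = conn.execute(query).fetchall()
print(result)
```

The same SELECT ... GROUP BY shape should carry over to Spark SQL, though the right aggregate (MAX, SUM, or something else) depends on what the real columns contain.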
plynton
by New Contributor II
  • 771 Views
  • 3 replies
  • 1 kudos

Incorrect results with df.query()

I have tried pulling a single row from a .csv using df.query(). However, the data being returned doesn't match the data I'm expecting: it is pulling a different row. Here is my code: df = spark.read.option("header",True).csv(data_fldr + "config...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Peter Ott, does @Hubert Dudek's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!

2 More Replies
Orianh
by Valued Contributor II
  • 1506 Views
  • 0 replies
  • 0 kudos

Retrieve a row from an indexed Spark DataFrame.

Hello guys, I'm having an issue when trying to get a row's values from a Spark DataFrame. I have a DF with an index column, and I need to be able to return a row based on the index in the fastest way possible. I tried to partitionBy index column, optimize with zor...

kjoth
by Contributor II
  • 6380 Views
  • 9 replies
  • 7 kudos

Resolved! Delete row from table is not working.

I have created an external table using Spark via the command below (using Data Science & Engineering): df.write.mode("overwrite").format("parquet").saveAsTable(name=f'{db_name}.{table_name}', path="dbfs:/reports/testing") I have tried to delete a row based on...

Latest Reply
jose_gonzalez
Moderator
  • 7 kudos

Hi @karthick J, can you try to delete the row and execute your command in a non-high-concurrency cluster? The reason I'm asking is that we first need to isolate the error message and understand why it is happening to be able to find the best ...

8 More Replies