cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

UmaMahesh1
by Honored Contributor III
  • 4827 Views
  • 2 replies
  • 15 kudos

Resolved! Pyspark dataframe column comparison

I have a string column which is a concatenation of elements with a hyphen as follows. Let 3 values from that column looks like below, Row 1 - A-B-C-D-E-FRow 2 - A-B-G-C-D-E-FRow 3 - A-B-G-D-E-FI want to compare 2 consecutive rows and create a column ...

  • 4827 Views
  • 2 replies
  • 15 kudos
Latest Reply
NhatHoang
Valued Contributor II
  • 15 kudos

Hi,I think you can follow these steps:1. Use window function to create a new column by shifting, then your df will look like thisid value lag1 A-B-C-D-E-F null2 A-B-G-C-D-E-F A-B-C-D-E-F3 A-B-G-D-E-F ...

  • 15 kudos
1 More Replies
irfanaziz
by Contributor II
  • 1782 Views
  • 2 replies
  • 3 kudos

How to make a string column with numeric and alphabet values use as partition?

So i have two partitions defined for this delta table, One is year('GJHAR') contains year values, and the other is a string column('BUKS') with around 124 unique values. However, there is one problem with the 2nd partition column('BUKS'), The values ...

  • 1782 Views
  • 2 replies
  • 3 kudos
Latest Reply
-werners-
Esteemed Contributor III
  • 3 kudos

@nafri A​ , So to make sure I understand correctly: if you partition the table with only numeric data in BUKS, new incoming data cannot be added if it contains a string; but the other way around it does work?Could it be that spark has inferred the co...

  • 3 kudos
1 More Replies
Gopal_Sir
by New Contributor III
  • 30244 Views
  • 5 replies
  • 7 kudos

Resolved! How to convert a string column to Array of Struct ?

I have a nested struct , where on of the field is a string , it looks something like this ....string = "[{\"to_loc\":\"6183\",\"to_loc_type\":\"S\",\"qty_allocated\":\"18\"},{\"to_loc\":\"6137\",\"to_loc_type\":\"S\",\"qty_allocated\":\"9\"},{\"to_lo...

  • 30244 Views
  • 5 replies
  • 7 kudos
Latest Reply
-werners-
Esteemed Contributor III
  • 7 kudos

Can you mark the question as answered so others can find the solution?

  • 7 kudos
4 More Replies
shan_chandra
by Databricks Employee
  • 19382 Views
  • 1 replies
  • 3 kudos

Resolved! dataframe - cast string to decimal when encountering zeros returns OE-16

The user is trying to cast string to decimal when encountering zeros. The cast function displays the  '0' as '0E-16'. could you please let us know your thoughts on whether 0s can be displayed as 0s?from pyspark.sql import functions as F df = spark.s...

Screen Shot 2022-03-09 at 12.13.11 PM
  • 19382 Views
  • 1 replies
  • 3 kudos
Latest Reply
shan_chandra
Databricks Employee
  • 3 kudos

If the scale of decimal type is greater than 6, scientific notation kicks in hence seeing 0E-16.This behavior is described in the existing OSS spark issue - https://issues.apache.org/jira/browse/SPARK-25177Kindly cast the column to a decimal type les...

  • 3 kudos
Labels