Data Engineering

Forum Posts

by SaraCorralLou • New Contributor III

02-07-2023 6:29:21 AM

16859 Views
3 replies
2 kudos

Resolved! Differences between lit(None) or lit(None).cast('string')

I want to define a column with null values in my dataframe using pyspark. This column will later be used for other calculations. What is the difference between creating it in these two different ways?
df.withColumn("New_Column", lit(None))
df.withColumn...

Latest Reply

shadowinc
New Contributor III

01-31-2025 4:33:37 AM

2 kudos

For me, df.withColumn("New_Column", lit(None).cast(StringType())) didn't work. I used this instead: df.withColumn("New_Column", lit(null).cast(StringType))

2 More Replies

by Aviral-Bhardwaj • Esteemed Contributor III

01-28-2023 10:16:45 PM

19736 Views
2 replies
13 kudos

Understanding Rename in Databricks Now there are multiple ways to rename Spark Data Frame Columns or Expressions. We can rename columns or expressions...

Understanding Rename in Databricks
Now there are multiple ways to rename Spark DataFrame columns or expressions.
We can rename columns or expressions using alias as part of select.
We can add or rename columns or expressions using withColumn on top of t...

Latest Reply

Ajay-Pandey
Esteemed Contributor III

01-30-2023 4:17:21 AM

13 kudos

Very informative, Thanks for sharing

1 More Replies

by noimeta • Contributor III

07-28-2022 4:56:56 AM

4199 Views
4 replies
1 kudos

Apply change data with delete and schema evolution

Hi, Currently, I'm using Structured Streaming to insert/update/delete rows in a table. A row will be deleted if the value in the 'Operation' column is 'deleted'. Everything seems to work fine until there's a new column. Since I don't need the 'Operation' column in the t...

Latest Reply

User16753725469
Contributor II

09-01-2022 12:33:09 AM

1 kudos

Please go through this documentation: https://docs.delta.io/latest/api/python/index.html

3 More Replies

by Confused • New Contributor III

12-03-2021 3:18:17 AM

9781 Views
7 replies
2 kudos

Schema evolution issue

Hi All, I am loading some data using Auto Loader but am having trouble with schema evolution. A new column has been added to the data I am loading and I am getting the following error: StreamingQueryException: Encountered unknown field(s) during parsing:...

Latest Reply

rgrosskopf
New Contributor II

07-15-2022 7:16:06 AM

2 kudos

I agree that hints are the way to go if you have the schema available, but the whole point of schema evolution is that you might not always know the schema in advance. I received a similar error with a similar streaming query configuration. The issue w...

6 More Replies

by Abeeya • New Contributor II

04-01-2022 4:57:00 AM

6582 Views
1 reply
5 kudos

Resolved! How to Overwrite Using pyspark's JDBC without losing constraints on table columns

Hello, my table has a primary key constraint on a particular column, and I'm losing the primary key constraint on that column each time I overwrite the table. What can I do to preserve it? Any heads-up would be appreciated. Tried below: df.write.option("truncate", ...

Latest Reply

Hubert-Dudek
Esteemed Contributor III

04-01-2022 5:06:23 AM

5 kudos

@Abeeya, the "truncate" option is correct for preserving the table. However, when you want to add a new column (mismatched schema), it wants to drop the table anyway.

by ahana • New Contributor III

11-09-2021 10:32:31 PM

15402 Views
11 replies
2 kudos

Resolved! I am trying to find the difference between two dates but I am getting a null value in the new column. Below are the dates in the same format; I tried to change the format but it is still not working in Databricks

Latest Reply

jose_gonzalez
Databricks Employee

11-12-2021 3:49:55 PM

2 kudos

Hi @ahana, did any of the replies help you solve this issue? Would you be happy to mark their answer as best so that others can quickly find the solution? Thank you

10 More Replies

by omsas • New Contributor

10-15-2021 4:48:38 AM

2890 Views
2 replies
0 kudos

How to add Columns for Automatic Fill on Pandas Python

1. I have data x. I would like to create a new column with the condition that the values are 1, 2 or 3.
2. The name of the column is SHIFT, where this SHIFT column will be filled automatically if the TIME_CREATED column meets the conditions.
3. the conditi...

Latest Reply

Ryan_Chynoweth
Esteemed Contributor

10-15-2021 12:59:20 PM

0 kudos

You can do something like this in pandas. Note there could be a more performant way to do this too.

import pandas as pd
import numpy as np

df = pd.DataFrame({'a':[1,2,3,4]})
df.head()
>    a
> 0  1
> 1  2
> 2  3
> 3  4

conditions = [(df['a'] <=2...
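Since the reply above is cut off, here is a separate minimal sketch of the same general idea applied to the question's SHIFT and TIME_CREATED columns, using `numpy.select`. The shift boundaries (06:00 to 14:00 for shift 1, 14:00 to 22:00 for shift 2, everything else shift 3) and the sample timestamps are assumptions made up for illustration; the actual conditions in the question are truncated.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "TIME_CREATED": pd.to_datetime([
        "2021-10-15 06:30:00",
        "2021-10-15 14:05:00",
        "2021-10-15 23:45:00",
        "2021-10-15 03:10:00",
    ])
})

# Extract the hour of day so the conditions stay readable.
hour = df["TIME_CREATED"].dt.hour

# Assumed shift boundaries (not from the original post).
conditions = [
    (hour >= 6) & (hour < 14),   # shift 1
    (hour >= 14) & (hour < 22),  # shift 2
]
choices = [1, 2]

# Rows matching no condition fall through to the default, shift 3.
df["SHIFT"] = np.select(conditions, choices, default=3)

print(df)
```

`np.select` evaluates the conditions in order and takes the first match per row, so overlapping conditions resolve to the earlier entry in the list.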

1 More Replies
