Topics with Label: Pyspark Dataframe

Forum Posts

by FarBo • New Contributor III

01-05-2023 6:57:40 AM

7793 Views
5 replies
5 kudos

Spark issue handling data from json when the schema DataType mismatch occurs

Hi, I have encountered a problem using Spark when creating a DataFrame from a raw JSON source. I have defined a schema for my data, and the problem is that when there is a mismatch between one of the column values and its defined schema, Spark not onl...

Data Engineering

Latest Reply

Anonymous
Not applicable

04-10-2023 6:11:47 AM

5 kudos

@Farzad Bonabi: Thank you for reporting this issue. It seems to be a known bug in Spark when dealing with malformed decimal values. When a decimal value in the input JSON data is not parseable by Spark, it sets not only that column to null but also ...

4 More Replies
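
For the JSON schema-mismatch thread above, one possible workaround is to validate the decimal field before the data reaches Spark, so that unparseable values are set aside instead of affecting the rest of the row. The following is a minimal plain-Python sketch of that idea, not Spark's own parsing logic; the function names and the `amount` field are hypothetical and chosen only for illustration.

```python
import json
from decimal import Decimal, InvalidOperation


def parse_decimal(value):
    """Return a finite Decimal for `value`, or None if it cannot be parsed."""
    try:
        # str() keeps numeric JSON tokens such as 7 or 10.5 readable by Decimal
        parsed = Decimal(str(value))
    except InvalidOperation:
        return None
    # Reject NaN and Infinity, which Decimal accepts but a decimal column would not
    return parsed if parsed.is_finite() else None


def split_by_decimal_field(lines, field):
    """Split JSON lines into (good, bad) records depending on whether `field` parses."""
    good, bad = [], []
    for line in lines:
        record = json.loads(line)
        parsed = parse_decimal(record.get(field))
        if parsed is None:
            bad.append(record)  # keep the original value for inspection
        else:
            record[field] = parsed
            good.append(record)
    return good, bad


# Hypothetical sample input: the second record has a malformed amount
sample_lines = [
    '{"id": 1, "amount": "10.25"}',
    '{"id": 2, "amount": "ten"}',
    '{"id": 3, "amount": 7}',
]
good_records, bad_records = split_by_decimal_field(sample_lines, "amount")
print([r["id"] for r in good_records], [r["id"] for r in bad_records])
```

The rejected records keep their other fields intact, so they can be logged or repaired separately rather than silently lost.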

by jonathan-dufaul • Valued Contributor

12-14-2022 9:50:16 AM

2383 Views
2 replies
1 kudos

How do I specify column types when writing to a MSSQL server using the JDBC driver (

I have a PySpark DataFrame that I'm writing to an on-prem MSSQL server. It's a stopgap while we convert data warehousing jobs over to Databricks. The processes that use those tables in the on-prem server rely on the tables maintaining the identical s...

Data Engineering

Latest Reply

dasanro
New Contributor II

11-08-2023 7:50:13 AM

1 kudos

It's happening to me too! Did you find any solution, @jonathan-dufaul? Thanks!!

1 More Replies

by Christine • Contributor II

05-24-2022 11:42:57 PM

9053 Views
9 replies
5 kudos

Resolved! pyspark dataframe empties after it has been saved to delta lake.

Hi, I am facing a problem that I hope to get some help to understand. I have created a function that is supposed to check if the input data already exist in a saved delta table and if not, it should create some calculations and append the new data to...

Data Engineering

Latest Reply

SharathE
New Contributor III

09-23-2023 11:04:59 AM

5 kudos

Hi, I'm also having a similar issue. Does creating a temp view and reading it again after saving to a table work?

8 More Replies

by Skv • New Contributor II

05-11-2023 12:07:04 AM

8306 Views
2 replies
1 kudos

Resolved! Snowflake query with time travel not working from Databricks while reading into Dataframe.

I am trying to read the change data from a Snowflake query into a DataFrame using Databricks. The same query works in Snowflake but not in Databricks. The time zone and the timestamp format are the same on both sides. I am trying to implement changetrackin...

Data Engineering

Latest Reply

sher
Valued Contributor II

06-23-2023 8:18:29 AM

1 kudos

Your format is wrong; that's why you got an error. Try this:
SELECT * FROM TestTable CHANGES(INFORMATION => DEFAULT) AT(TIMESTAMP => TO_TIMESTAMP_TZ('2023-05-03 00:43:34.885','YYYY-MM-DD HH24:MI:SS.FF'))

1 More Replies
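
For the Snowflake time-travel thread above, the query suggested in the reply can be assembled in Python so that the timestamp literal always matches the format string. This is a small sketch under assumptions: `build_changes_query` is a hypothetical helper, and the table name is assumed to come from trusted code rather than user input, since it is interpolated directly into the SQL text.

```python
from datetime import datetime


def build_changes_query(table, at):
    """Build a Snowflake CHANGES query for `table` as of the datetime `at`."""
    # %f yields six fractional digits; trim to milliseconds as in the reply's example
    ts = at.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]
    return (
        f"SELECT * FROM {table} "
        "CHANGES(INFORMATION => DEFAULT) "
        f"AT(TIMESTAMP => TO_TIMESTAMP_TZ('{ts}', 'YYYY-MM-DD HH24:MI:SS.FF'))"
    )


query = build_changes_query("TestTable", datetime(2023, 5, 3, 0, 43, 34, 885000))
print(query)

# In a Databricks notebook the string could then be handed to the Snowflake
# connector, for example (sf_options is a hypothetical dict of connection settings):
#   df = spark.read.format("snowflake").options(**sf_options).option("query", query).load()
```

Passing an explicit format string to `TO_TIMESTAMP_TZ` avoids relying on session-level timestamp format settings, which may differ between a Snowflake worksheet and a connector session.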

by Rishitha • New Contributor III

05-09-2023 10:50:02 AM

1931 Views
2 replies
2 kudos

Resolved! Normalizing data from autoloader

I have data on S3 and I'm using Auto Loader to load it. My JSON docs have fields that are arrays of structures. When I don't specify any schema, all of the data is stored as strings; even the arrays of structures are just a blob of string, making it ...

Data Engineering

Latest Reply

Anonymous
Not applicable

05-20-2023 10:15:04 PM

2 kudos

Hi @Rishitha Reddy, Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us s...

1 More Replies

by kll • New Contributor III

05-01-2023 8:21:36 PM

9271 Views
2 replies
3 kudos

Nested struct type not supported pyspark error

I am attempting to apply a function to a PySpark DataFrame, save the API response to a new column, and then parse it using `json_normalize`. This works fine in pandas; however, I run into an exception with `pyspark`.
import pyspark.pandas as ps i...

Data Engineering

Latest Reply

Anonymous
Not applicable

05-18-2023 11:25:47 PM

3 kudos

Hi @Keval Shah, Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers yo...

1 More Replies

by frank7 • New Contributor II

04-28-2023 12:25:13 PM

3515 Views
2 replies
1 kudos

Resolved! Is it possible to write a pyspark dataframe to a custom log table in Log Analytics workspace?

I have a PySpark DataFrame that contains information about the tables that I have in a SQL database (creation date, number of rows, etc.). Sample data: { "Day":"2023-04-28", "Environment":"dev", "DatabaseName":"default", "TableName":"discount"...

Data Engineering

Latest Reply

Anonymous
Not applicable

05-13-2023 9:55:43 AM

1 kudos

@Bruno Simoes: Yes, it is possible to write a PySpark DataFrame to a custom log table in Log Analytics workspace using the Azure Log Analytics Workspace API. Here's a high-level overview of the steps you can follow: Create an Azure Log Analytics Works...

1 More Replies
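
For the Log Analytics thread above, the reply's steps are cut off, so the following is only a sketch under a stated assumption: that the API in question is the Azure Monitor HTTP Data Collector API, which signs each request with the workspace's shared key. The workspace ID, key, and `TableInventory` log type are placeholders, and only the request is built here; nothing is sent. Microsoft has since introduced a newer Logs Ingestion API, so current documentation should be checked before relying on this approach.

```python
import base64
import hashlib
import hmac
import json
from datetime import datetime, timezone


def build_signature(workspace_id, shared_key, date, content_length,
                    method="POST", content_type="application/json",
                    resource="/api/logs"):
    """Build the SharedKey Authorization header value for one request."""
    string_to_sign = (
        f"{method}\n{content_length}\n{content_type}\nx-ms-date:{date}\n{resource}"
    )
    decoded_key = base64.b64decode(shared_key)
    digest = hmac.new(decoded_key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return f"SharedKey {workspace_id}:{base64.b64encode(digest).decode()}"


def build_request(workspace_id, shared_key, log_type, rows):
    """Return (url, headers, body) for posting `rows` to a custom log table."""
    body = json.dumps(rows).encode("utf-8")
    date = datetime.now(timezone.utc).strftime("%a, %d %b %Y %H:%M:%S GMT")
    headers = {
        "Content-Type": "application/json",
        "Authorization": build_signature(workspace_id, shared_key, date, len(body)),
        "Log-Type": log_type,  # name of the custom log table
        "x-ms-date": date,
    }
    url = f"https://{workspace_id}.ods.opinsights.azure.com/api/logs?api-version=2016-04-01"
    return url, headers, body


# Placeholder credentials and one row using the fields visible in the question.
# With a real DataFrame, rows could come from [r.asDict() for r in df.collect()].
fake_key = base64.b64encode(b"not-a-real-key").decode()
rows = [{"Day": "2023-04-28", "Environment": "dev",
         "DatabaseName": "default", "TableName": "discount"}]
url, headers, body = build_request("my-workspace-id", fake_key, "TableInventory", rows)
print(url)
```

The returned triple can be passed to any HTTP client; collecting rows on the driver is only reasonable for a small DataFrame such as a table inventory.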

by DeviJaviya • New Contributor II

04-29-2023 10:15:56 PM

2966 Views
2 replies
1 kudos

Trying to build subquery in Databricks notebook, similar to SQL in a data frame with the Top(1)

Hello everyone, I am new to Databricks, so I am at the learning stage. It would be very helpful if someone could help me resolve the issue, or rather help me fix my code. I have built a query that fetches data based on CASE; in the CASE I have a ...

Data Engineering

Latest Reply

DeviJaviya
New Contributor II

05-03-2023 8:40:16 PM

1 kudos

Hello Rishabh, Thank you for your suggestion. We tried LIMIT 1, but the output values come out the same for all the dates, which is not correct.

1 More Replies

by brian_0305 • New Contributor II

02-22-2023 11:45:58 AM

4515 Views
3 replies
2 kudos

Use JDBC connect to databrick default cluster and read table into pyspark dataframe. All the column turned into same as column name

I used code like the below to connect over JDBC to the Databricks default cluster and read a table into a PySpark DataFrame:
url = 'jdbc:databricks://[workspace domain]:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=[path];AuthMech=3;UID=token;PWD=[your_ac...

Data Engineering

Latest Reply

Anonymous
Not applicable

04-24-2023 9:02:41 PM

2 kudos

@yu zhang: It looks like the issue with the first code snippet you provided is that it is not specifying the correct query to retrieve the data from your database. When using the load() method with the jdbc data source, you need to provide a SQL quer...

2 More Replies

by Vindhya • New Contributor II

04-18-2023 3:41:51 PM

2023 Views
1 reply
0 kudos

Dataframes to Pandas conversion step is failing with exception "java.lang.IndexOutOfBoundsException: index: 16384, length: 4 (expected: range(0, 16384))"

The DataFrame to pandas conversion step is failing with the exception "java.lang.IndexOutOfBoundsException: index: 16384, length: 4 (expected: range(0, 16384))". Please find the screenshot below for more details.

Data Engineering

Latest Reply

Anonymous
Not applicable

04-23-2023 9:14:00 PM

0 kudos

Hi @Vindhya D, Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers you...

by elgeo • Valued Contributor II

02-21-2023 3:21:41 AM

8641 Views
1 reply
0 kudos

Iteration - Pyspark vs Pandas

Hello. Could someone please explain why iteration over a PySpark DataFrame is much slower than over a pandas DataFrame?
PySpark:
df_list = df.collect()
for index in range(0, len(df_list)): .....
Pandas:
df_pnd = df.toPandas()
for index, row in df_p...

Data Engineering

Latest Reply

Anonymous
Not applicable

04-22-2023 12:11:56 AM

0 kudos

Hi @ELENI GEORGOUSI, Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us ...

by maartenvr • New Contributor III

02-28-2023 5:06:06 AM

29427 Views
9 replies
2 kudos

Resolved! Unable to clear cache using a pyspark session

Hi all, I am using a persist call on a Spark DataFrame inside an application to speed up computations. The DataFrame is used throughout my application, and at the end of the application I am trying to clear the cache of the whole Spark session by calli...

Data Engineering

Latest Reply

maartenvr
New Contributor III

03-13-2023 2:52:53 AM

2 kudos

No solution yet: Hi @Suteja Kanuri, Thank you for thinking along and replying! Unfortunately, I have not found a solution yet. I am getting an error that there exists no `.getCache()` method on a Spark context. Also note that I have tried to do som...

8 More Replies

by Mado • Valued Contributor II

03-16-2023 5:02:26 AM

6028 Views
4 replies
1 kudos

Resolved! How to set properties for a delta table when I want to write a DataFrame?

Hi, I have a PySpark DataFrame with 11 million records. I created the DataFrame on a cluster. It is not saved on DBFS or a storage account.
import pyspark.sql.functions as F
from pyspark.sql.functions import col, when, floor, expr, hour, minute, to_time...

Data Engineering

Latest Reply

Lakshay
Databricks Employee

03-16-2023 5:57:04 AM

1 kudos

Hi @Mohammad Saber, Are you getting the error while writing the file to the table? Or before that?

3 More Replies

by uzairm • New Contributor III

02-28-2023 12:56:10 PM

15505 Views
2 replies
2 kudos

Resolved! ThreadPoolExecutor in Databricks

I am using a ThreadPoolExecutor and running notebooks in parallel. However, these parallel notebooks are not using the executors at all, and all the load is going to the driver node, resulting in the driver node running out of memory and eventual...

Data Engineering

Latest Reply

Anonymous
Not applicable

03-12-2023 9:47:08 PM

2 kudos

Hi @uzair mustafa, Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedbac...

1 More Replies
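
For the ThreadPoolExecutor thread above, it may help to see the pattern in isolation. The threads themselves live in the driver's Python process, so one modest mitigation is to cap how many notebooks run at once. The sketch below is runnable outside Databricks and uses a hypothetical `run_notebook` stand-in; in a notebook it would presumably wrap `dbutils.notebook.run`. The paths and the `max_workers` value are illustrative assumptions, not recommendations from the thread.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_notebook(path):
    """Hypothetical stand-in for a child notebook run.

    In a Databricks notebook this would call something like
    dbutils.notebook.run(path, timeout_seconds), which is not available here.
    """
    return f"finished {path}"


def run_all(paths, max_workers=2):
    """Run notebooks concurrently, at most `max_workers` at a time."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its notebook path so failures can be attributed
        futures = {pool.submit(run_notebook, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # keep going if one notebook fails
                results[path] = f"failed: {exc}"
    return results


# Hypothetical notebook paths
results = run_all(["/jobs/a", "/jobs/b", "/jobs/c"])
print(results)
```

Limiting concurrency only bounds driver pressure. Whether the work inside each notebook reaches the executors depends on whether it uses distributed Spark operations or driver-side code such as collected lists and pandas.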

by raghub1 • New Contributor II

05-10-2022 1:12:02 AM

7477 Views
4 replies
3 kudos

Resolved! Writing PySpark DataFrame onto AWS Glue throwing error

I have followed the steps mentioned in this blog: https://www.linkedin.com/pulse/aws-glue-data-catalog-metastore-databricks-deepak-rajak/ but when I try saveAsTable(table_name), it gives the error IllegalArgumentException: Path must be ...

Data Engineering

Latest Reply

Anonymous
Not applicable

06-21-2022 9:10:24 AM

3 kudos

Hey @Raghu Bharadwaj Tallapragada, Just wanted to check in: were you able to resolve your issue, or do you need more help? We'd love to hear from you. Thanks!

3 More Replies