Data Engineering

Forum Posts

Sas
by New Contributor II
  • 856 Views
  • 1 reply
  • 0 kudos

A streaming job going into an infinite loop

Hi, below I am trying to read data from Kafka, determine whether it is fraud or not, and then write it back to MongoDB. Below is my code (read_kafka.py): from pyspark.sql import SparkSession from pyspark.sql.functions import * from pyspark.sql.types i...
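A minimal sketch of the pipeline described above, assuming a JSON payload, a hypothetical topic name, and a local MongoDB URI; the MongoDB Spark connector (10.x) is assumed to be on the classpath:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("fraud-stream").getOrCreate()

# Hypothetical schema for the JSON payload in the Kafka value column.
schema = StructType([
    StructField("txn_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("status", StringType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")  # assumption
       .option("subscribe", "transactions")                  # assumed topic
       .load())

parsed = (raw.select(from_json(col("value").cast("string"), schema).alias("v"))
          .select("v.*"))
fraud = parsed.filter(col("status") == "FRAUD")

# Write each micro-batch to MongoDB via foreachBatch (connector 10.x options).
def write_to_mongo(batch_df, batch_id):
    (batch_df.write.format("mongodb")
     .mode("append")
     .option("connection.uri", "mongodb://localhost:27017")  # assumption
     .option("database", "fraud_db")        # hypothetical database
     .option("collection", "alerts")        # hypothetical collection
     .save())

query = fraud.writeStream.foreachBatch(write_to_mongo).start()
query.awaitTermination()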

Latest Reply
swethaNandan
New Contributor III
  • 0 kudos

Hi Saswata, can you remove the filter and see if it is printing output to the console? kafka_df5 = kafka_df4.filter(kafka_df4.status == "FRAUD") Thanks and regards, Swetha Nandajan
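For reference, a quick sketch of the suggested debugging step: write the unfiltered stream (kafka_df4 from the question) to the console sink to verify rows arrive before the filter is applied.

query = (kafka_df4.writeStream   # kafka_df4 as defined in the question
         .format("console")
         .option("truncate", "false")
         .outputMode("append")
         .start())
query.awaitTermination()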

farefin
by New Contributor II
  • 1564 Views
  • 2 replies
  • 5 kudos

Need help with PySpark code in Databricks to calculate a new measure column.

Details of the requirement are as below: I have a table with the structure shown, and I have to write PySpark code to calculate a new column. The logic for the new column is the sum of Magnitude for different Categories divided by the total Magnitude, and it should b...

Sample Data
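A minimal sketch of one way to compute such a measure with window functions, assuming columns named Category and Magnitude as in the post; the sample rows are made up:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("measure").getOrCreate()

# Made-up sample rows with the column names from the post.
df = spark.createDataFrame(
    [("A", 10.0), ("A", 20.0), ("B", 30.0), ("C", 40.0)],
    ["Category", "Magnitude"],
)

by_category = Window.partitionBy("Category")
overall = Window.partitionBy()  # single window over the whole DataFrame

result = df.withColumn(
    "measure",
    F.sum("Magnitude").over(by_category) / F.sum("Magnitude").over(overall),
)
result.show()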
Latest Reply
Anonymous
Not applicable
  • 5 kudos

Hi @Faizan Arefin, hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Than...

1 More Reply
Databricks_7045
by New Contributor III
  • 1234 Views
  • 4 replies
  • 0 kudos

Resolved! Encapsulate Databricks PySpark/SparkSQL code

Hi all, I have custom code (PySpark & SparkSQL notebooks) which I want to deploy at a customer location and encapsulate so that end customers don't see the actual code. Currently we have all code in notebooks (PySpark/Spark SQL). Could you please l...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

With notebooks that is not possible. You can write your code in Scala/Java and build a jar, which you then run with spark-submit (example), or use Python and deploy a wheel (example). This can become quite complex when you have dependencies. Also: a jar et...
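As a rough illustration of the wheel route mentioned in the reply (all names are hypothetical), a single-module job plus the build and submit commands:

# Project layout (hypothetical):
#   myjob/__init__.py
#   myjob/main.py   <- this file
#   setup.py
from pyspark.sql import SparkSession

def run():
    spark = SparkSession.builder.appName("myjob").getOrCreate()
    spark.range(10).show()  # placeholder for the real logic

if __name__ == "__main__":
    run()

# setup.py (separate file):
#   from setuptools import setup, find_packages
#   setup(name="myjob", version="0.1.0", packages=find_packages())
#
# Build the wheel and run it (shell):
#   python setup.py bdist_wheel
#   spark-submit --py-files dist/myjob-0.1.0-py3-none-any.whl myjob/main.py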

3 More Replies
fymaterials_199
by New Contributor II
  • 677 Views
  • 1 reply
  • 0 kudos

PySpark intermediate DataFrame consumes a lot of memory

I have PySpark code running on my local Mac, which has 6 cores and 16 GB. I run it in PyCharm to do a first test. spark = ( SparkSession.builder.appName("loc") .master("local[2]") .config("spark.driver.bindAddress", "localhost") .config("...
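A sketch of how the truncated session setup above might continue, with an explicit driver memory cap, often the first knob to check when a local job runs out of memory (the values are illustrative):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("loc")
    .master("local[2]")                               # 2 of the 6 cores
    .config("spark.driver.bindAddress", "localhost")
    .config("spark.driver.memory", "8g")              # assumption: cap well below 16 GB
    .config("spark.sql.shuffle.partitions", "8")      # small shuffle for local runs
    .getOrCreate()
)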

Latest Reply
fymaterials_199
New Contributor II
  • 0 kudos

Here is my input file:

EID,EffectiveTime,OrderHistory,dummy_col,Period_Start_Date
11,2019-04-19T02:50:42.6918667Z,"[{'Codes': [{'CodeSystem': 'sys_1', 'Code': '1-2'}], 'EffectiveDateTime': '2019-04-18T23:48:00Z', 'ComponentResults': [{'Codes': [{'CodeSy...
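A possible sketch for loading this file: read the CSV, then parse the quoted OrderHistory column with from_json. The path is hypothetical, and the schema is guessed from the visible fragment, truncated where the post is:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType, StructType, StructField, StringType

spark = SparkSession.builder.appName("parse-history").getOrCreate()

# Path is hypothetical; header=True picks up the column names shown above.
df = spark.read.csv("input.csv", header=True, quote='"', escape='"')

code_schema = ArrayType(StructType([
    StructField("CodeSystem", StringType()),
    StructField("Code", StringType()),
]))
history_schema = ArrayType(StructType([
    StructField("Codes", code_schema),
    StructField("EffectiveDateTime", StringType()),
    # the post is truncated here; add the remaining fields as needed
]))

parsed = df.withColumn(
    "OrderHistory",
    F.from_json("OrderHistory", history_schema, {"allowSingleQuotes": "true"}),
)
parsed.printSchema()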
