Data Engineering

Forum Posts

Sas
by New Contributor II
  • 856 Views
  • 1 reply
  • 0 kudos

A streaming job going into an infinite loop

Hi, below I am trying to read data from Kafka, determine whether it is fraud or not, and then write it back to MongoDB. Below is my code (read_kafka.py): from pyspark.sql import SparkSession from pyspark.sql.functions import * from pyspark.sql.types i...
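A minimal sketch of the pipeline described above, assuming a JSON payload, a hypothetical topic name, and a local MongoDB URI; the MongoDB Spark connector (10.x) is assumed to be on the classpath:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("fraud-stream").getOrCreate()

# Hypothetical schema for the JSON payload in the Kafka value column.
schema = StructType([
    StructField("txn_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("status", StringType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")  # assumption
       .option("subscribe", "transactions")                  # assumed topic
       .load())

parsed = (raw.select(from_json(col("value").cast("string"), schema).alias("v"))
          .select("v.*"))
fraud = parsed.filter(col("status") == "FRAUD")

# Write each micro-batch to MongoDB via foreachBatch (connector 10.x options).
def write_to_mongo(batch_df, batch_id):
    (batch_df.write.format("mongodb")
     .mode("append")
     .option("connection.uri", "mongodb://localhost:27017")  # assumption
     .option("database", "fraud_db")        # hypothetical database
     .option("collection", "alerts")        # hypothetical collection
     .save())

query = fraud.writeStream.foreachBatch(write_to_mongo).start()
query.awaitTermination()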

Latest Reply
swethaNandan
New Contributor III
  • 0 kudos

Hi Saswata, can you remove the filter and see if it is printing output to the console? kafka_df5 = kafka_df4.filter(kafka_df4.status == "FRAUD") Thanks and regards, Swetha Nandajan
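For reference, a quick sketch of the suggested debugging step: write the unfiltered stream (kafka_df4 from the question) to the console sink to verify rows arrive before the filter is applied.

query = (kafka_df4.writeStream   # kafka_df4 as defined in the question
         .format("console")
         .option("truncate", "false")
         .outputMode("append")
         .start())
query.awaitTermination()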

farefin
by New Contributor II
  • 1564 Views
  • 2 replies
  • 5 kudos

Need help with PySpark code in Databricks to calculate a new measure column.

Details of the requirement are as below: I have a table with the structure shown, and I have to write PySpark code to calculate a new column. The logic for the new column is the sum of Magnitude for different Categories divided by the total Magnitude, and it should b...

Sample Data
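A minimal sketch of one way to compute such a measure with window functions, assuming columns named Category and Magnitude as in the post; the sample rows are made up:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("measure").getOrCreate()

# Made-up sample rows with the column names from the post.
df = spark.createDataFrame(
    [("A", 10.0), ("A", 20.0), ("B", 30.0), ("C", 40.0)],
    ["Category", "Magnitude"],
)

by_category = Window.partitionBy("Category")
overall = Window.partitionBy()  # single window over the whole DataFrame

result = df.withColumn(
    "measure",
    F.sum("Magnitude").over(by_category) / F.sum("Magnitude").over(overall),
)
result.show()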
Latest Reply
Anonymous
Not applicable
  • 5 kudos

Hi @Faizan Arefin, hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Than...

1 More Reply
Databricks_7045
by New Contributor III
  • 1234 Views
  • 4 replies
  • 0 kudos

Resolved! Encapsulate Databricks PySpark/SparkSQL code

Hi all, I have custom code (PySpark & SparkSQL notebooks) which I want to deploy at a customer location and encapsulate so that end customers don't see the actual code. Currently we have all code in notebooks (PySpark/Spark SQL). Could you please l...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

With notebooks that is not possible. You can write your code in Scala/Java and build a jar, which you then run with spark-submit (example), or use Python and deploy a wheel (example). This can become quite complex when you have dependencies. Also: a jar et...
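As a rough illustration of the wheel route mentioned in the reply (all names are hypothetical), a single-module job plus the build and submit commands:

# Project layout (hypothetical):
#   myjob/__init__.py
#   myjob/main.py   <- this file
#   setup.py
from pyspark.sql import SparkSession

def run():
    spark = SparkSession.builder.appName("myjob").getOrCreate()
    spark.range(10).show()  # placeholder for the real logic

if __name__ == "__main__":
    run()

# setup.py (separate file):
#   from setuptools import setup, find_packages
#   setup(name="myjob", version="0.1.0", packages=find_packages())
#
# Build the wheel and run it (shell):
#   python setup.py bdist_wheel
#   spark-submit --py-files dist/myjob-0.1.0-py3-none-any.whl myjob/main.py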

3 More Replies
fymaterials_199
by New Contributor II
  • 677 Views
  • 1 reply
  • 0 kudos

PySpark intermediate DataFrame consumes a lot of memory

I have PySpark code running on my local Mac, which has 6 cores and 16 GB. I run it in PyCharm to do a first test. spark = ( SparkSession.builder.appName("loc") .master("local[2]") .config("spark.driver.bindAddress", "localhost") .config("...
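A sketch of how the truncated session setup above might continue, with an explicit driver memory cap, often the first knob to check when a local job runs out of memory (the values are illustrative):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("loc")
    .master("local[2]")                               # 2 of the 6 cores
    .config("spark.driver.bindAddress", "localhost")
    .config("spark.driver.memory", "8g")              # assumption: cap well below 16 GB
    .config("spark.sql.shuffle.partitions", "8")      # small shuffle for local runs
    .getOrCreate()
)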

Latest Reply
fymaterials_199
New Contributor II
  • 0 kudos

Here is my input file:

EID,EffectiveTime,OrderHistory,dummy_col,Period_Start_Date
11,2019-04-19T02:50:42.6918667Z,"[{'Codes': [{'CodeSystem': 'sys_1', 'Code': '1-2'}], 'EffectiveDateTime': '2019-04-18T23:48:00Z', 'ComponentResults': [{'Codes': [{'CodeSy...
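A possible sketch for loading this file: read the CSV, then parse the quoted OrderHistory column with from_json. The path is hypothetical, and the schema is guessed from the visible fragment, truncated where the post is:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType, StructType, StructField, StringType

spark = SparkSession.builder.appName("parse-history").getOrCreate()

# Path is hypothetical; header=True picks up the column names shown above.
df = spark.read.csv("input.csv", header=True, quote='"', escape='"')

code_schema = ArrayType(StructType([
    StructField("CodeSystem", StringType()),
    StructField("Code", StringType()),
]))
history_schema = ArrayType(StructType([
    StructField("Codes", code_schema),
    StructField("EffectiveDateTime", StringType()),
    # the post is truncated here; add the remaining fields as needed
]))

parsed = df.withColumn(
    "OrderHistory",
    F.from_json("OrderHistory", history_schema, {"allowSingleQuotes": "true"}),
)
parsed.printSchema()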
