Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

p42af
by New Contributor
  • 4245 Views
  • 4 replies
  • 1 kudos

Resolved! rdd.foreachPartition() does nothing?

I expected the code below to print "hello" for each partition, and "world" for each record. But when I ran it, the code completed with no printouts of any kind. No errors either. What is happening here? %scala   val rdd = spark.sparkContext.parallelize(S...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

It is lazily evaluated, so you need to trigger an action, I guess.
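
For reference, a minimal PySpark sketch of the same situation (hypothetical; the thread's snippet is Scala, but the behaviour matches, and spark is assumed to be the notebook's predefined SparkSession). Code passed to foreachPartition runs on the executors, so its print output lands in the executor stdout logs rather than in the notebook cell; bringing the data back to the driver makes it visible there.

rdd = spark.sparkContext.parallelize(["a", "b", "c", "d"], numSlices=2)

def show(partition):
    # Runs on an executor: this output goes to the executor's stdout log,
    # not to the driver / notebook cell output.
    print("hello")
    for record in partition:
        print("world", record)

rdd.foreachPartition(show)

# To see the records on the driver, bring them back first, e.g.:
for record in rdd.collect():
    print("world", record)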

3 More Replies
Sandesh87
by New Contributor III
  • 2359 Views
  • 2 replies
  • 2 kudos

Resolved! create a dataframe with all the responses from the api requests within foreachPartition

I am trying to execute an API call to get an object (JSON) from Amazon S3, and I am using foreachPartition to execute multiple calls in parallel. df.rdd.foreachPartition(partition => { //Initialize list buffer var buffer_accounts1 = new ListBuffer[St...
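
One hedged way to get the responses back as a DataFrame, sketched in PySpark (the thread's code is Scala; fetch_object() and the "account" column name are assumed placeholders): return the responses from mapPartitions instead of discarding them in foreachPartition, and create the DataFrame on the driver.

def fetch_partition(accounts):
    for account in accounts:
        # Plain API/HTTP call only; no SparkContext or DataFrame creation on the executors.
        yield (account, fetch_object(account))

responses_rdd = df.rdd.map(lambda row: row["account"]).mapPartitions(fetch_partition)
# The DataFrame itself is built back on the driver:
responses_df = spark.createDataFrame(responses_rdd, ["account", "response"])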

Latest Reply
jose_gonzalez
Moderator
  • 2 kudos

Hi @Sandesh Puligundla, thank you for sharing the solution. We will mark it as the "best" response so that, in the future, if another user has the same question, they will be able to find the solution right away.

1 More Replies
Sandesh87
by New Contributor III
  • 1647 Views
  • 3 replies
  • 2 kudos

Resolved! log error to cosmos db

Objective: Retrieve objects from an S3 bucket using a 'get' API call, write the retrieved object to Azure Data Lake, and in case of errors like 404s (object not found) write the error message to Cosmos DB. "my_dataframe" consists of a column (s3Obje...

Latest Reply
User16763506477
Contributor III
  • 2 kudos

Hi @Sandesh Puligundla, the issue is that you are using the Spark context inside foreachPartition. You can create a DataFrame only on the Spark driver. A few Stack Overflow references: https://stackoverflow.com/questions/46964250/nullpointerexception-creatin...
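
A rough PySpark sketch of that split (s3_get(), write_to_adls() and the "s3ObjectKey" column are placeholder names, not the thread's actual code): do the per-object work inside mapPartitions, yield only the error records, and create the error DataFrame on the driver.

def process_partition(keys):
    for key in keys:
        try:
            body = s3_get(key)            # 'get' API call to S3
            write_to_adls(key, body)      # write the retrieved object to the data lake
        except Exception as err:          # e.g. a 404 / object-not-found error
            yield (key, str(err))

errors_rdd = my_dataframe.rdd.map(lambda row: row["s3ObjectKey"]).mapPartitions(process_partition)
errors_df = spark.createDataFrame(errors_rdd, ["s3ObjectKey", "error"])   # created on the driver
# errors_df can then be written to Cosmos DB from the driver, e.g. via the Cosmos DB Spark connector.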

2 More Replies
halfwind22
by New Contributor III
  • 9280 Views
  • 11 replies
  • 12 kudos

Resolved! Unable to write csv files to Azure BLOB using pandas to_csv()

I am using a Python function to read some data from a GET endpoint and write it as a CSV file to an Azure BLOB location. My GET endpoint takes 2 query parameters, param1 and param2. So initially, I have a dataframe paramDf that has two columns param1 and ...

Latest Reply
halfwind22
New Contributor III
  • 12 kudos

@Hubert Dudek I can't issue a Spark command to an executor node; it throws an error because foreach distributes the processing.
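
A hedged sketch of one workaround consistent with that point: collect the parameter rows to the driver and do the GET and the blob upload there with the azure-storage-blob SDK, instead of calling pandas or Spark from inside foreach on the executors. The endpoint URL, storage account, container, and file naming below are placeholders, not the thread's actual values.

import pandas as pd
import requests
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient(account_url="https://<account>.blob.core.windows.net", credential="<access-key>")
container = service.get_container_client("<container>")

for row in paramDf.collect():                     # runs on the driver, not inside foreach()
    resp = requests.get("https://<endpoint>", params={"param1": row["param1"], "param2": row["param2"]})
    csv_text = pd.DataFrame(resp.json()).to_csv(index=False)
    container.upload_blob(name=f"{row['param1']}_{row['param2']}.csv", data=csv_text, overwrite=True)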

10 More Replies
JoãoRafael
by New Contributor II
  • 2580 Views
  • 3 replies
  • 0 kudos

Double job execution caused by databricks' RemoteServiceExec using databricks-connector

Hello! I'm using databricks-connector to launch Spark jobs using Python. I've validated that the Python version (3.8.10) and runtime version (8.1) are supported by the installed databricks-connect (8.1.10). Every time a mapPartitions/foreachParti...

2 More Replies