Data Engineering

Forum Posts

halfwind22
by New Contributor III
  • 6874 Views
  • 11 replies
  • 12 kudos

Resolved! Unable to write csv files to Azure BLOB using pandas to_csv ()

I am using a Python function to read some data from a GET endpoint and write it as a CSV file to an Azure Blob location. My GET endpoint takes two query parameters, param1 and param2. So initially, I have a dataframe paramDf that has two columns param1 and ...

Latest Reply
halfwind22
New Contributor III
  • 12 kudos

@Hubert Dudek I can't issue a Spark command on an executor node; it throws an error because foreach distributes the processing.

MartinB
by Contributor III
  • 6321 Views
  • 4 replies
  • 3 kudos

Resolved! Interoperability Spark ↔ Pandas: can't convert Spark dataframe to Pandas dataframe via df.toPandas() when it contains datetime value in distant future

Hi, I have multiple datasets in my data lake that feature valid_from and valid_to columns indicating the validity of rows. If a row is currently valid, this is indicated by valid_to=9999-12-31 00:00:00. Example: Loading this into a Spark dataframe works fine...

Latest Reply
shan_chandra
Honored Contributor III
  • 3 kudos

Currently, out-of-bounds timestamps are not supported in PyArrow/pandas. Please refer to the associated JIRA issue below: https://issues.apache.org/jira/browse/ARROW-5359?focusedCommentId=17104355&page=com.atlassian.jira.plugin.system.issuetabpanels%3...
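
One possible workaround, sketched here under assumptions and not taken from the thread itself, is to clamp open-ended sentinel values such as 9999-12-31 into the range that pandas can store at nanosecond precision before converting. The helper name `clamp_to_pandas_range` and the two constants are hypothetical; the bounds are `pd.Timestamp.min` and `pd.Timestamp.max` truncated to microseconds.

```python
from datetime import datetime

# Bounds of pandas' nanosecond-precision Timestamp, truncated inward to
# microseconds (assumption: the default datetime64[ns] resolution is used).
PANDAS_NS_MIN = datetime(1677, 9, 21, 0, 12, 43, 145225)
PANDAS_NS_MAX = datetime(2262, 4, 11, 23, 47, 16, 854775)


def clamp_to_pandas_range(value):
    """Clamp a datetime into the range representable at ns precision."""
    if value is None:
        return None
    return min(max(value, PANDAS_NS_MIN), PANDAS_NS_MAX)


# A "valid until further notice" sentinel like the one in the question.
open_ended = datetime(9999, 12, 31)
clamped = clamp_to_pandas_range(open_ended)
```

In a Spark job the same comparison would more likely be applied to the valid_to column before calling toPandas(); the pure-Python version above only illustrates the bounds involved.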

brij
by New Contributor III
  • 3180 Views
  • 8 replies
  • 3 kudos

Resolved! Databricks snowflake dataframe.toPandas() taking more space and time

I have two identical tables (same rows and schema). One table resides in an AZSQL Server database and the other is in a Snowflake database. We have some existing code that we want to migrate from AZSQL to Snowflake, but when we are trying to create a panda...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@Brijan Elwadhi​ - That's wonderful. Thank you for sharing your solution.

Jack
by New Contributor II
  • 1374 Views
  • 1 replies
  • 0 kudos

Resolved! Creating Pandas Data Frame of Features After Applying Variance Reduction

I am building a classification model using the following data frame of 120,000 records (sample of 5 records shown): Using this data, I have built the following model: from sklearn.model_selection import train_test_split from sklearn.feature_extraction....

Latest Reply
Dan_Z
Honored Contributor
  • 0 kudos

This is more of a scikit-learn question than a Databricks question, but after poking around I think VT_reduced.get_support() is probably what you are looking for: https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.VarianceThreshold....
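
A minimal sketch of how get_support() could be used to rebuild a labelled pandas data frame after variance thresholding. The toy data and the names `features`, `selector`, and `reduced_df` are hypothetical and stand in for the poster's actual features, which are not shown in full here.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Hypothetical toy features; "constant" has zero variance and gets dropped.
features = pd.DataFrame({
    "word_a": [0.0, 1.0, 0.0, 1.0],
    "constant": [1.0, 1.0, 1.0, 1.0],
    "word_b": [0.2, 0.9, 0.4, 0.7],
})

selector = VarianceThreshold(threshold=0.0)
reduced = selector.fit_transform(features)  # plain NumPy array, no labels

# get_support() returns a boolean mask aligned with the input columns,
# so it can be used to recover the names of the surviving features.
kept_columns = features.columns[selector.get_support()]
reduced_df = pd.DataFrame(reduced, columns=kept_columns, index=features.index)
```

If the features come from a text vectorizer instead of a data frame, the same mask would be applied to the vectorizer's feature names.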

SindhuG
by New Contributor
  • 592 Views
  • 1 replies
  • 0 kudos

Hi All, I need to extract rows of dates from a dataframe based on a list of values (e.g. dates) located in a CSV file. Can anyone please help me? I have tried the groupby function but am not able to get the expected result. Thanks in advance.

My dataframe looks like this. df = Date column2 column3 Machine 1-jan-2020 A 2-jan-2020 --- A 18-jan-2020 A 11-jan-2020 B 12-jan-2020 B 6-feb-2020 C 7-feb-2020 --- C 14-feb-2020 C. The date details CSV file looks like this. D = Machine Selected Date A 15-jan-2020 C 12-f...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @SindhuG! My name is Kaniz, and I'm a technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your question first. Otherwise, I will follow up shortly with a response.
