cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

halfwind22
by New Contributor III
  • 7767 Views
  • 11 replies
  • 12 kudos

Resolved! Unable to write csv files to Azure BLOB using pandas to_csv ()

I am using a Py function to read some data from a GET endpoint and write them as a CSV file to a Azure BLOB location.My GET endpoint takes 2 query parameters,param1 and param2. So initially, I have a dataframe paramDf that has two columns param1 and ...

  • 7767 Views
  • 11 replies
  • 12 kudos
Latest Reply
halfwind22
New Contributor III
  • 12 kudos

@Hubert Dudek​ I cant issue a spark command to executor node, throws up an error ,because foreach distributes the processing.

  • 12 kudos
10 More Replies
MartinB
by Contributor III
  • 7137 Views
  • 4 replies
  • 3 kudos

Resolved! Interoperability Spark ↔ Pandas: can't convert Spark dataframe to Pandas dataframe via df.toPandas() when it contains datetime value in distant future

Hi,I have multiple datasets in my data lake that feature valid_from and valid_to columns indicating validity of rows.If a row is valid currently, this is indicated by valid_to=9999-12-31 00:00:00.Example:Loading this into a Spark dataframe works fine...

Example_SCD2
  • 7137 Views
  • 4 replies
  • 3 kudos
Latest Reply
shan_chandra
Esteemed Contributor
  • 3 kudos

Currently, out of bound timestamps are not supported in pyArrow/pandas. Please refer to the below associated JIRA issue. https://issues.apache.org/jira/browse/ARROW-5359?focusedCommentId=17104355&page=com.atlassian.jira.plugin.system.issuetabpanels%3...

  • 3 kudos
3 More Replies
brij
by New Contributor III
  • 3853 Views
  • 8 replies
  • 3 kudos

Resolved! Databricks snowflake dataframe.toPandas() taking more space and time

I have 2 exactly same table(rows and schema). One table recides in AZSQL server data base and other one is in snowflake database. Now we have some existing code which we want to migrate from azsql to snowflake but when we are trying to create a panda...

  • 3853 Views
  • 8 replies
  • 3 kudos
Latest Reply
Anonymous
Not applicable
  • 3 kudos

@Brijan Elwadhi​ - That's wonderful. Thank you for sharing your solution.

  • 3 kudos
7 More Replies
Jack
by New Contributor II
  • 1616 Views
  • 1 replies
  • 0 kudos

Resolved! Creating Pandas Data Frame of Features After Applying Variance Reduction

I am building a classification model using the following data frame of 120,000 records (sample of 5 records shown):Using this data, I have built the following model:from sklearn.model_selection import train_test_split from sklearn.feature_extraction....

df df3
  • 1616 Views
  • 1 replies
  • 0 kudos
Latest Reply
Dan_Z
Honored Contributor
  • 0 kudos

This is more of a scikit-learn question than a Databricks question. But poking around I think VT_reduced.get_support() is probably what you are looking for:https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.VarianceThreshold....

  • 0 kudos
SindhuG
by New Contributor
  • 753 Views
  • 1 replies
  • 0 kudos

Hi All, I need to extract rows of dates from a dataframe based on list of values(e.g. dates) located in a CSV file. Can anyone please help me? I have tried groupby function but am not able to get the expected result. Thanks in advance.

my dataframe looks like this.df = Datecolumn2column3Machine1-jan-2020A2-jan-2020--- A 18-jan-2020 A 11-jan-2020 B 12-jan-2020 B 6-feb-2020C7-feb-2020---C14-feb-2020C Date details csv file looks like this D = MachineSelected DateA15-jan-2020C12-f...

  • 753 Views
  • 1 replies
  • 0 kudos
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @ SindhuG! My name is Kaniz, and I'm a technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your questions first. Or else I will follow up shortly with a response.

  • 0 kudos
Labels