Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

User16776430979
by New Contributor III
  • 1193 Views
  • 1 replies
  • 0 kudos

How to optimize conversion between PySpark and Arrow?

Seems like you can convert between dataframes and Arrow objects by using Pandas as an intermediary, but there are some limitations (e.g. it collects all records in the DataFrame to the driver and should be done on a small subset of the data, you hit ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @josephine.ho! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your questions first. Or else I will follow up shortly with a response.

  • 0 kudos
User16776430979
by New Contributor III
  • 2166 Views
  • 1 replies
  • 0 kudos

How to optimize and convert a Spark DataFrame to Arrow?

Example use case: When connecting a sample Plotly Dash application to a large dataset, in order to test the performance, I need the file format to be in either hdf5 or arrow. According to this doc: Optimize conversion between PySpark and pandas DataF...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @josephine.ho! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your questions first. Or else I will follow up shortly with a response.

  • 0 kudos
Josh21
by New Contributor II
  • 887 Views
  • 1 replies
  • 1 kudos

2012-12-30 has year of both 2012 and 2013 sql

I am trying to obtain the month and year in the format "MM-YYY", then "YYY", to get values such as 12-2012. I noticed an error where a timestamp of 2012-12-30T00:00:00.000+0000 results in both 12-2013 and 2013. This is an error, since 2012-12-30...
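The symptom described here is consistent with a week-based-year pattern letter (uppercase `Y` in Java-style date patterns) being used where the calendar year (lowercase `y`) was intended. That is an assumption, since the post is truncated. The sketch below uses plain Python rather than Spark SQL, and `us_week_year` is a hypothetical helper that imitates US-style week rules (weeks run Sunday to Saturday, and week 1 is the week containing January 1). It shows how 2012-12-30, a Sunday, can land in week-year 2013 while its calendar year is still 2012.

```python
from datetime import date, timedelta


def us_week_year(d: date) -> int:
    """Week-based year under assumed US-style rules:
    weeks run Sunday..Saturday and week 1 contains January 1."""
    # date.weekday() returns Monday=0 ... Sunday=6, so shift to Sunday=0.
    days_since_sunday = (d.weekday() + 1) % 7
    week_start = d - timedelta(days=days_since_sunday)
    week_end = week_start + timedelta(days=6)
    # If the week spills into the next year, it contains January 1,
    # so the whole week belongs to that next year.
    return week_end.year


d = date(2012, 12, 30)
calendar_label = f"{d.month:02d}-{d.year}"            # calendar year
week_label = f"{d.month:02d}-{us_week_year(d)}"       # week-based year

print(calendar_label)
print(week_label)
```

If this is the cause, switching the pattern to the calendar-year letter (for example `MM-yyyy`) would likely give the expected 12-2012.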

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Josh21! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your questions first. Or else I will follow up shortly with a response.

  • 1 kudos
User15787040559
by New Contributor III
  • 1408 Views
  • 1 replies
  • 0 kudos

How to translate Apache Pig LOAD statement to Spark?

If you have the following Apache Pig LOAD statement:
TOCCT = LOAD 'db_custbase.ods_corp_cust_t' using $HCatLoader;
the equivalent code in Apache Spark is:
TOCCT_DF = spark.read.table("db_custbase.ods_corp_cust_t")

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @User15787040559729892342! My name is Kaniz, and I'm a technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your questions first. Or else I will follow up shortly with a ...

  • 0 kudos
User15787040559
by New Contributor III
  • 1406 Views
  • 1 replies
  • 0 kudos

How to translate Apache Pig FILTER statement to Spark?

If you have the following Apache Pig FILTER statement:
XCOCD_ACT_Y = FILTER XCOCD BY act_ind == 'Y';
the equivalent code in Apache Spark is:
from pyspark.sql.functions import col
XCOCD_ACT_Y_DF = XCOCD_DF.filter(col("act_ind") == "Y")

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @User15787040559729892342! My name is Kaniz, and I'm a technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your questions first. Or else I will follow up shortly with a ...

  • 0 kudos
Anonymous
by Not applicable
  • 810 Views
  • 1 replies
  • 0 kudos
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @User16143885715632505170! My name is Kaniz, and I'm a technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your questions first. Or else I will follow up shortly with a...

  • 0 kudos
User16826992666
by Valued Contributor
  • 1037 Views
  • 1 replies
  • 0 kudos

If I write functionally equivalent code in PySpark and Koalas, will they end up evaluating to the same execution plan?

I am wondering how similar the backend execution of the two APIs is. If I have code that performs the same operations written in both styles, is there any functional difference between them when it comes to execution?

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @trevor.bishop! My name is Kaniz, and I'm a technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your questions first. Or else I will follow up shortly with a response.

  • 0 kudos
Charbel
by New Contributor II
  • 1272 Views
  • 1 replies
  • 1 kudos

Delta table is not writing data read from Kafka

Guys, could you help me? I'm reading 5 Kafka threads through a list and saving the data in a Delta table. The execution will run once a day. Everything seems to be working, but I noticed that when I read the topic and it has no message, it still gen...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Charbel! My name is Kaniz, and I'm a technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your questions first. Or else I will follow up shortly with a response.

  • 1 kudos
lycenok
by New Contributor II
  • 828 Views
  • 1 replies
  • 0 kudos

display function eats consecutive spaces

When using display, runs of more than one space in strings are not preserved. Can we change that behaviour? Are there any options for the display function? Code example: display( spark.createDataFrame( [ ( 'a a' , 'a a' ) ], [ 'string_column', 'string_column_2' ] )...
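One possible workaround, assuming the spaces are collapsed by HTML rendering of the results table (the post does not confirm this), is to swap ordinary spaces for non-breaking spaces in a copy of the data used only for display. The sketch below is plain Python with a hypothetical helper name, `preserve_spaces`. In a notebook the same replacement could be applied to a string column before calling display.

```python
NBSP = "\u00a0"  # non-breaking space, which HTML rendering does not collapse


def preserve_spaces(text: str) -> str:
    """Return a display-only copy of text with every ordinary space
    replaced by a non-breaking space."""
    return text.replace(" ", NBSP)


original = "a   a"                 # three consecutive spaces
shown = preserve_spaces(original)  # same length, spaces swapped for NBSP

print(repr(shown))
```

The replacement changes the characters in the string, so it is best kept to a display copy rather than written back to the table.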

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @lycenok! My name is Kaniz, and I'm a technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your questions first. Or else I will follow up shortly with a response.

  • 0 kudos
AkankshaGupta
by New Contributor II
  • 1546 Views
  • 1 replies
  • 1 kudos

Target database.table1 must be delta table

I created table1 with some data, then truncated it to load a new dataset. When I run select * from table, I get a row count of 0. But when I try to copy into it using the following command, I get an error saying the target table must be a Delta table: COPY INTO...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @AkankshaGupta! My name is Kaniz, and I'm a technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your questions first. Or else I will follow up shortly with a response.

  • 1 kudos
SindhuG
by New Contributor
  • 932 Views
  • 1 replies
  • 0 kudos

Hi All, I need to extract rows of dates from a dataframe based on a list of values (e.g. dates) located in a CSV file. Can anyone please help me? I have tried the groupby function but am not able to get the expected result. Thanks in advance.

my dataframe looks like this.df = Datecolumn2column3Machine1-jan-2020A2-jan-2020--- A 18-jan-2020 A 11-jan-2020 B 12-jan-2020 B 6-feb-2020C7-feb-2020---C14-feb-2020C Date details csv file looks like this D = MachineSelected DateA15-jan-2020C12-f...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @SindhuG! My name is Kaniz, and I'm a technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your questions first. Or else I will follow up shortly with a response.

  • 0 kudos
RajuNagarajan
by New Contributor
  • 746 Views
  • 1 replies
  • 0 kudos

GroupBy in a multi node environment

I have a group of rows with information on nested product calls. Example: Trxn1-product1-caller1-local1 Trxn1-Product1-local1-local2 Trxn1-Product1-local2-local3. Here are the expected calls for a product: product1-caller1-local1 Product1-local1-loc...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @RajuNagarajan! My name is Kaniz, and I'm a technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your questions first. Or else I will follow up shortly with a response.

  • 0 kudos
lprevost
by Contributor
  • 1484 Views
  • 1 replies
  • 0 kudos

Incremental updates to s3 csv files, autoloader, and delta lake updates

I'm using the Databricks Auto Loader to incrementally load a series of CSV files on S3, which I update with an API. My typical work process is to update only the latest year's file each night. But there are occasions where previous years also get update...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @lprevost! My name is Kaniz, and I'm a technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your questions first. Or else I will follow up shortly with a response.

  • 0 kudos
MohitAnchlia
by New Contributor II
  • 1319 Views
  • 1 replies
  • 0 kudos

Accessing Databricks from Presto SQL

What's the best way to federate a query to Delta Lake or Databricks from Presto SQL without having to create external tables? PrestoSQL doesn't have access to S3. Can PrestoSQL be configured with a JDBC driver or plugin?

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @MohitAnchlia! My name is Kaniz, and I'm a technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your questions first. Or else I will follow up shortly with a response.

  • 0 kudos
MalachiBunn
by New Contributor
  • 1459 Views
  • 2 replies
  • 0 kudos

Toggle titles to show by default for a user or notebook

I find titles to be useful in organizing my notebooks, but I don't like having to toggle the title display for each cell in order to add a title. Is there a way to toggle the UI to show titles by default for a user/notebook? This would be a good fea...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi there! I am here to help you. If you want a new feature to be added, you can request it at this link: https://docs.databricks.com/resources/ideas.html.

  • 0 kudos
1 More Replies
