Data Engineering

Forum Posts

by ImAbhishekTomar • New Contributor III

10-07-2022 6:45:43 AM

10723 Views
7 replies
4 kudos

kafkashaded.org.apache.kafka.common.errors.TimeoutException: topic-downstream-data-nonprod not present in metadata after 60000 ms.

I am facing an error when trying to write data to Kafka using Spark Structured Streaming.

#Extract
source_stream_df = (spark.readStream
    .format("cosmos.oltp.changeFeed")
    .option("spark.cosmos.container", PARM_CONTAINER_NAME)
    .option("spark.cosmos.read.inferSchema.en...

Latest Reply

devmehta
New Contributor III

09-10-2024 2:54:28 AM

4 kudos

Which Event Hubs namespace were you using? I had the same problem and resolved it by changing the pricing plan from Basic to Standard, as Kafka apps are not supported on the Basic plan. Let me know if you ran into anything else. Thanks
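
For readers hitting the same timeout, here is a minimal sketch of the Kafka options usually passed to Spark when the target is the Kafka endpoint of an Azure Event Hubs namespace. The thread does not confirm that the original poster was using Event Hubs. The helper name `eventhubs_kafka_options`, the namespace, the connection string, and the checkpoint path are placeholders, and the `kafkashaded.` prefix assumes the Kafka client bundled with Databricks.

```python
def eventhubs_kafka_options(namespace: str, connection_string: str) -> dict:
    """Build Spark Kafka sink options for an Event Hubs Kafka endpoint.

    Hypothetical helper for illustration; not part of any library.
    """
    # Event Hubs exposes its Kafka-compatible endpoint on port 9093.
    bootstrap = f"{namespace}.servicebus.windows.net:9093"
    # Event Hubs uses SASL PLAIN with the literal user name "$ConnectionString".
    jaas = (
        "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule "
        f'required username="$ConnectionString" password="{connection_string}";'
    )
    return {
        "kafka.bootstrap.servers": bootstrap,
        "kafka.security.protocol": "SASL_SSL",
        "kafka.sasl.mechanism": "PLAIN",
        "kafka.sasl.jaas.config": jaas,
    }


# Placeholder values; substitute your own namespace and connection string.
opts = eventhubs_kafka_options("my-namespace", "Endpoint=sb://...")

# Usage with a streaming DataFrame (needs a Spark session, so left commented):
# (source_stream_df.writeStream
#     .format("kafka")
#     .options(**opts)
#     .option("topic", "topic-downstream-data-nonprod")
#     .option("checkpointLocation", "/tmp/checkpoints/example")  # hypothetical path
#     .start())
```

The Kafka sink expects the DataFrame to carry a `value` column, so the stream usually needs a projection to that shape before `writeStream`.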

by Sam • New Contributor III

12-02-2021 3:53:18 PM

1471 Views
1 reply
4 kudos

collect_set/ collect_list Pushdown

Hello,

I've noticed that collect_set and collect_list are not pushed down to the database.

Runtime: DB 9.1 LTS
Spark: 3.1.2
Database: Snowflake

Is there any way to get a distinct set from a group by in a way that will push down the query to the database?

Latest Reply

-werners-
Esteemed Contributor III

12-02-2021 11:29:53 PM

4 kudos

Hm, so collect_set does not get translated to listagg. Can you try the following?
- use a more recent version of the Databricks Runtime
- use Delta Lake as the Spark source
- use the latest version of the Snowflake connector
- check if pushdown to Snowflake is enabled
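
Another route, not mentioned in the reply, is to send the aggregation to Snowflake explicitly through the connector's `query` option instead of relying on automatic pushdown. This is a minimal sketch, assuming Snowflake's `ARRAY_AGG(DISTINCT ...)` produces the distinct set you want. The helper names, the table, and the column names are hypothetical, and `sf_options` stands for your own connection options.

```python
def distinct_set_query(table: str, group_col: str, value_col: str) -> str:
    """Build a Snowflake query that collects distinct values per group.

    Hypothetical helper; identifiers are interpolated as-is, so pass
    trusted names only.
    """
    return (
        f"SELECT {group_col}, "
        f"ARRAY_AGG(DISTINCT {value_col}) AS {value_col}_set "
        f"FROM {table} GROUP BY {group_col}"
    )


def read_distinct_sets(spark, sf_options: dict, sql: str):
    """Run the query inside Snowflake and return the result as a DataFrame."""
    # "snowflake" is the short format name available on Databricks;
    # elsewhere the full name is "net.snowflake.spark.snowflake".
    return (
        spark.read.format("snowflake")
        .options(**sf_options)
        .option("query", sql)
        .load()
    )


# Hypothetical table and columns, used only to show the generated SQL.
sql = distinct_set_query("orders", "customer_id", "product_id")
print(sql)
```

Because the whole statement runs in Snowflake, only the aggregated rows travel to Spark. The array column may arrive as a JSON string, so check the resulting schema before using it.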

by twotwoiscute • New Contributor

07-14-2021 8:00:55 PM

1741 Views
0 replies
0 kudos

PySpark pandas_udf slower than single thread

I used @pandas_udf to write a function to speed up a process (parsing XML files) and then compared its speed with single-threaded code. Surprisingly, the @pandas_udf version is two times slower than the single-threaded code. And the number of xml files I need to p...

Databricks Community
