Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

VirajV
by New Contributor
  • 1098 Views
  • 1 replies
  • 0 kudos

mlflow project train and validate - Control over the data used in the script?

Hi there, I'm trying to decide whether to get started with ML, and I've really enjoyed it so far. While going through the documentation, I hit a blocker: I feel the documentation doesn't mention much about the dataset used to train t...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @VirajV! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your question first. Or else I will follow up shortly with a response.

  • 0 kudos
fabiwilys84
by New Contributor II
  • 836 Views
  • 1 replies
  • 1 kudos

Databricks spark certification

Hi guys, is there any way to get a 100% off voucher or a good discount voucher for the Databricks Spark certification? Currently the certification is very costly ($200). Any help is appreciated.

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @fabiwilys84! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your questions first. Or else I will follow up shortly with a response.

  • 1 kudos
LudovicBENARD
by New Contributor
  • 1776 Views
  • 1 replies
  • 0 kudos

I tried to install a cluster on Databricks and it doesn't work. I have the following message:

Time Message Cluster terminated. Reason: Network Configuration Failure. The data plane network is misconfigured. Please verify that the network for your data plane is configured correctly. Instance ID: ............... Error mess...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @LudovicBENARD! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your questions first. Or else I will follow up shortly with a response...

  • 0 kudos
MohitAnchlia
by New Contributor II
  • 1008 Views
  • 1 replies
  • 1 kudos

Change AWS storage setting and account

I am seeing some very weird behaviour in Databricks. We initially configured the following:
1. Account X in Account Console -> AWS Account arn:aws:iam::X:role/databricks-s3
2. We set up databricks-s3 as the S3 bucket in Account Console -> AWS Storage
3. W...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @MohitAnchlia! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your questions first. Or else I will follow up shortly with a response.

  • 1 kudos
twotwoiscute
by New Contributor
  • 1477 Views
  • 1 replies
  • 0 kudos

PySpark pandas_udf slower than single thread

I used @pandas_udf to write a function to speed up the process (parsing XML files) and then compared its speed with single-threaded code. Surprisingly, using @pandas_udf is two times slower than the single-threaded code. And the number of XML files I need to p...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @twotwoiscute! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your questions first. Or else I will follow up shortly with a response...

  • 0 kudos
Digan_Parikh
by Valued Contributor
  • 1570 Views
  • 1 replies
  • 0 kudos

Widgets - Way to validate config parameters

Yes, you can use the widgets API to validate the input before you pass the values to the rest of your code. For example:
folder = dbutils.widgets.get("Folder")
if folder == "":
    raise Exception("Folder missing")
or to get spark se...
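
The check above can be wrapped in a small reusable helper. This is only a sketch: `require_non_empty` is a hypothetical name, the "/mnt/raw" value is a placeholder, and the `dbutils.widgets.get` call appears only in a comment because `dbutils` is available only inside a Databricks notebook. The function itself is plain Python.

```python
def require_non_empty(name: str, value: str) -> str:
    """Return the stripped widget value, or raise if it is empty."""
    cleaned = (value or "").strip()
    if not cleaned:
        raise ValueError(f"{name} missing")
    return cleaned


# In a Databricks notebook the value would come from the widget, e.g.:
#   folder = require_non_empty("Folder", dbutils.widgets.get("Folder"))
# Here a placeholder string stands in for the widget value.
folder = require_non_empty("Folder", "  /mnt/raw  ")
print(folder)
```

Raising early, before any Spark work starts, keeps a bad parameter from surfacing later as a harder-to-read failure deep in the job.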

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @User16187108406241337282! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your questions first. Or else I will follow up shortly with ...

  • 0 kudos
User16776430979
by New Contributor III
  • 1207 Views
  • 1 replies
  • 0 kudos

How to optimize conversion between PySpark and Arrow?

Seems like you can convert between dataframes and Arrow objects by using Pandas as an intermediary, but there are some limitations (e.g. it collects all records in the DataFrame to the driver and should be done on a small subset of the data, you hit ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @josephine.ho! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your questions first. Or else I will follow up shortly with a response.

  • 0 kudos
User16776430979
by New Contributor III
  • 2191 Views
  • 1 replies
  • 0 kudos

How to optimize and convert a Spark DataFrame to Arrow?

Example use case: When connecting a sample Plotly Dash application to a large dataset, in order to test the performance, I need the file format to be in either hdf5 or arrow. According to this doc: Optimize conversion between PySpark and pandas DataF...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @josephine.ho! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your questions first. Or else I will follow up shortly with a response.

  • 0 kudos
Josh21
by New Contributor II
  • 893 Views
  • 1 replies
  • 1 kudos

2012-12-30 has year of both 2012 and 2013 sql

I am trying to obtain the month and year in the format "MM-YYY", then "YYY", to get values such as 12-2012. I noticed an error where a timestamp of 2012-12-30T00:00:00.000+0000 results in both 12-2013 and 2013. This is an error, since 2012-12-30...
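
One possible explanation, offered as an assumption because the post is truncated: in Java `SimpleDateFormat`-style patterns, uppercase `Y` means week year rather than calendar year. With US-style weeks (Sunday start, week 1 is the week containing January 1), 2012-12-30 is a Sunday that opens week 1 of 2013. If that is what happened here, the lowercase `yyyy` pattern would be the one that returns 2012. The plain-Python sketch below reproduces that week-year rule; `us_week_year` is a hypothetical helper, not a Spark function.

```python
from datetime import date, timedelta


def us_week_year(d: date) -> int:
    """Week year, assuming weeks start on Sunday and week 1 contains Jan 1."""
    days_since_sunday = (d.weekday() + 1) % 7  # weekday(): Monday=0 ... Sunday=6
    week_end = d - timedelta(days=days_since_sunday) + timedelta(days=6)
    # Under the assumed rule, a week belongs to the year its Saturday falls in.
    return week_end.year


d = date(2012, 12, 30)
print(d.year)           # calendar year
print(us_week_year(d))  # week year under the assumed rule
```

For 2012-12-30 the two values differ (2012 versus 2013), while for 2012-12-29, the Saturday before, both are 2012.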

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Josh21! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your questions first. Or else I will follow up shortly with a response.

  • 1 kudos
User15787040559
by New Contributor III
  • 1427 Views
  • 1 replies
  • 0 kudos

How to translate Apache Pig LOAD statement to Spark?

If you have the following Apache Pig LOAD statement:
TOCCT = LOAD 'db_custbase.ods_corp_cust_t' using $HCatLoader;
the equivalent code in Apache Spark is:
TOCCT_DF = spark.read.table("db_custbase.ods_corp_cust_t")

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @User15787040559729892342! My name is Kaniz, and I'm a technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your questions first. Or else I will follow up shortly with a ...

  • 0 kudos
User15787040559
by New Contributor III
  • 1419 Views
  • 1 replies
  • 0 kudos

How to translate Apache Pig FILTER statement to Spark?

If you have the following Apache Pig FILTER statement:
XCOCD_ACT_Y = FILTER XCOCD BY act_ind == 'Y';
the equivalent code in Apache Spark is:
XCOCD_ACT_Y_DF = (XCOCD_DF.filter(col("act_ind") == "Y"))

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @User15787040559729892342! My name is Kaniz, and I'm a technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your questions first. Or else I will follow up shortly with a ...

  • 0 kudos
Anonymous
by Not applicable
  • 825 Views
  • 1 replies
  • 0 kudos
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @User16143885715632505170! My name is Kaniz, and I'm a technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your questions first. Or else I will follow up shortly with a...

  • 0 kudos
User16826992666
by Valued Contributor
  • 1059 Views
  • 1 replies
  • 0 kudos

If I write functionally equivalent code in PySpark and Koalas, will they end up evaluating to the same execution plan?

I am wondering how similar the backend execution of the two APIs is. If I have code that does the same operations written in both styles, is there any functional difference between them when it comes to execution?

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @trevor.bishop! My name is Kaniz, and I'm a technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your questions first. Or else I will follow up shortly with a response.

  • 0 kudos
Charbel
by New Contributor II
  • 1287 Views
  • 1 replies
  • 1 kudos

Delta table is not writing data read from kafka

Guys, could you help me? I'm reading 5 Kafka topics through a list and saving the data in a Delta table. The job will run once a day. Everything seems to be working, but I noticed that when I read the topic and it has no message, it still gen...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Charbel! My name is Kaniz, and I'm a technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your questions first. Or else I will follow up shortly with a response.

  • 1 kudos
