Data Engineering

Forum Posts

by ackerman_chris, New Contributor III
  • 1407 Views
  • 4 replies
  • 0 kudos

Resolved! Databricks Lakehouse Fundamentals Badge Not Found

Hello, I've successfully completed the Databricks Lakehouse Fundamentals course and am looking for where the badge is. I found this post here, but I haven't received an email about my completion from <service.accredible.email@databricks.com> yet. I successfull...
Latest Reply
ackerman_chris
New Contributor III
  • 0 kudos

Thank you all for the great responses. I eventually received the badge; it took 30+ minutes to arrive, but I finally did get the email notification. I will mark this post as resolved.
3 More Replies
by KrishZ, Contributor
  • 10041 Views
  • 4 replies
  • 3 kudos

[pyspark.pandas] PicklingError: Could not serialize object (this error happens only for large datasets)

Context: I am using pyspark.pandas in a Databricks Jupyter notebook and doing some text manipulation within the dataframe. pyspark.pandas is the Pandas API on Spark and can be used in exactly the same way as regular pandas. Error: PicklingError: Could not seria...
Latest Reply
ryojikn
New Contributor III
  • 3 kudos

@Krishna Zanwar, I'm receiving the same error. For me, the behavior occurs when trying to broadcast a random forest (sklearn 1.2.0) recently loaded from mlflow and using a Pandas UDF to run predictions. However, the same code works perfectly on Spark 2....
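For reference, a minimal sketch of the broadcast-plus-pandas-UDF pattern discussed in this thread, with a toy scikit-learn model standing in for the mlflow-loaded one (all names are illustrative, not taken from the thread):

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.getOrCreate()

# Train a small model on the driver; in the thread it is loaded from mlflow.
model = RandomForestRegressor(n_estimators=10).fit([[0.0], [1.0], [2.0]], [0.0, 1.0, 2.0])
# Broadcasting is the step that raises PicklingError when the object cannot be serialized.
bc_model = spark.sparkContext.broadcast(model)

@pandas_udf(DoubleType())
def predict(x: pd.Series) -> pd.Series:
    # Each task unpickles the broadcast model once and scores a pandas batch.
    return pd.Series(bc_model.value.predict(x.to_frame()))

df = spark.createDataFrame([(0.5,), (1.5,)], ["x"])
df.select(predict("x").alias("prediction")).show()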

3 More Replies
by anujsen18, New Contributor
  • 1562 Views
  • 3 replies
  • 0 kudos

How to overwrite a partition in a DLT pipeline?

I am trying to replicate my existing Spark pipeline in DLT, but I am not able to achieve the desired result. Current pipeline: source setup: CSV file ingested into bronze using SCP; frequency: monthly; bronze dir: /cntdlt/bronze/emp/year=2022 /...
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Anuj kumar sen, we haven't heard from you since the last response from @Kristian Foster, and I was checking back to see if his suggestions helped you. If you have found a solution, please share it with the community, as it can be helpful to...
2 More Replies
by SIRIGIRI, Contributor
  • 402 Views
  • 1 reply
  • 1 kudos

sharikrishna26.medium.com

Difference between “ and ‘ in the Spark DataFrame API. You must tell your compiler that you want to represent a string inside a string by using a different symbol for the inner string. Here is an example: “ Name = “HARI” “. The above is wrong. Why? Because the in...
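To make the excerpt's point concrete, a small illustrative sketch (Python, but the rule is the same in any language):

# Wrong: the parser ends the string at the second double quote.
# name = "Name = "HARI""
# Right: alternate the quote characters, or escape the inner ones.
name1 = 'Name = "HARI"'
name2 = "Name = \"HARI\""
print(name1 == name2)  # True: both hold the same text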

Latest Reply
sher
Valued Contributor II
  • 1 kudos

Thanks for sharing!

by Raghu101, New Contributor III
  • 2940 Views
  • 6 replies
  • 3 kudos

How to Call Oracle Stored Procedures from Databricks?

Latest Reply
sher
Valued Contributor II
  • 3 kudos

Try this link; it may help you: https://datathirst.net/blog/2018/10/12/executing-sql-server-stored-procedures-on-databricks-pyspark/
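The linked post drives plain JDBC from the driver node; a hedged sketch of the same idea adapted for Oracle (URL, credentials, and procedure name are placeholders, and the Oracle JDBC driver must be installed on the cluster):

# Call an Oracle stored procedure through java.sql via Spark's JVM gateway.
jvm = spark._sc._gateway.jvm
conn = jvm.java.sql.DriverManager.getConnection(
    "jdbc:oracle:thin:@//dbhost:1521/ORCL", "username", "password")
stmt = conn.prepareCall("{call my_schema.my_procedure(?)}")
stmt.setInt(1, 42)  # bind the procedure's first parameter
stmt.execute()
stmt.close()
conn.close()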

5 More Replies
by A_Jabbar, New Contributor
  • 1258 Views
  • 2 replies
  • 2 kudos

Resolved! I am unable to create a Databricks Community Edition account

This is what I am doing: I enter all the details on page 1 and click on "Get started with Community Edition"; after verification, I get the following error.

[Screenshot: error message on the second page of registration]
Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Abdul Jabbar, thank you for reaching out, and we're sorry to hear about this log-in issue! We have this Community Edition login troubleshooting post on Community. Please take a look and follow the troubleshooting steps. If the steps do not resol...

1 More Reply
by KKo, Contributor III
  • 920 Views
  • 2 replies
  • 5 kudos

Read and write to XMLA from Databricks notebook

I am trying to refresh Power BI dataset partitions from Azure Databricks using the XMLA endpoint. I have Power BI Premium capacity with read/write enabled. I tried a few approaches found via Google, but they did not work for one reason or another. If any of y...
Latest Reply
Anonymous
Not applicable
  • 5 kudos

Hi @Kris Koirala, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer first; otherwise, Bricksters will get back to you soon. Thanks.

1 More Reply
by KVNARK, Honored Contributor II
  • 1325 Views
  • 4 replies
  • 6 kudos

Resolved! Best practices for SQL DB authentication from Databricks

I would like to know the best practices for authenticating to a SQL database from Databricks/Python. I am especially interested in token-based DB authentication methods other than credential-based (username/password) ones.
Latest Reply
Vivian_Wilfred
Honored Contributor
  • 6 kudos

@KVNARK, have you checked the PAT token for authentication? https://docs.databricks.com/sql/api/authentication.html
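Besides PAT tokens for the Databricks SQL API, one common token-based pattern for an Azure SQL database is passing an Azure AD access token to the JDBC reader. A hedged sketch (server, database, table, and service-principal values are placeholders; it assumes the azure-identity package and the Microsoft SQL Server JDBC driver are available):

from azure.identity import ClientSecretCredential

# Acquire an Azure AD token instead of storing a username/password.
credential = ClientSecretCredential(
    tenant_id="<tenant-id>", client_id="<app-id>", client_secret="<secret>")
token = credential.get_token("https://database.windows.net/.default").token

df = (spark.read.format("jdbc")
      .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb")
      .option("dbtable", "dbo.my_table")
      .option("accessToken", token)  # token-based auth, no password in code
      .load())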

3 More Replies
by jm99, New Contributor III
  • 1835 Views
  • 1 reply
  • 1 kudos

Resolved! ForeachBatch() - Get results from batchDF._jdf.sparkSession().sql('merge stmt')

Most Python examples show the structure of the foreachBatch method as:

def foreachBatchFunc(batchDF, batchId):
    batchDF.createOrReplaceTempView('viewName')
    (batchDF
        ._jdf.sparkSession()
        .sql( ...
Latest Reply
jm99
New Contributor III
  • 1 kudos

Just found a solution... Need to convert the Java DataFrame (jdf) to a DataFrame:

from pyspark import sql

def batchFunc(batchDF, batchId):
    batchDF.createOrReplaceTempView('viewName')
    sparkSession = batchDF._jdf.sparkSession()
    resJdf = sparkSes...
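A hedged reconstruction of the full pattern the (truncated) reply sketches; the MERGE statement and table names are placeholders, and depending on the PySpark version the wrapper's second argument is a SQLContext or a SparkSession:

from pyspark.sql import DataFrame

def batchFunc(batchDF, batchId):
    batchDF.createOrReplaceTempView('viewName')
    sparkSession = batchDF._jdf.sparkSession()
    resJdf = sparkSession.sql(
        "MERGE INTO target t USING viewName s ON t.id = s.id "
        "WHEN MATCHED THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT *")
    # Wrap the returned Java DataFrame so the merge metrics are usable in Python.
    resDF = DataFrame(resJdf, batchDF.sparkSession)
    resDF.show()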

by ks1248, New Contributor III
  • 1405 Views
  • 4 replies
  • 6 kudos

Resolved! Autoloader creates columns not present in the source

I have been exploring Auto Loader to ingest gzipped JSON files from an S3 source. The notebook fails on the first run due to a schema mismatch; after re-running the notebook, the schema evolves and the ingestion runs successfully. On analysing the schema ...
Latest Reply
ks1248
New Contributor III
  • 6 kudos

Hi @Debayan Mukherjee, @Kaniz Fatma, thank you for replying to my question. I was able to figure out the issue: I was creating the schema and checkpoint folders in the same path as the source location for Auto Loader. This caused the schema to ch...
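A minimal sketch of the corrected layout described above, with the schema and checkpoint locations kept outside the source directory (all paths and the table name are placeholders; assumes a recent DBR):

df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "json")
      # Keep Auto Loader metadata out of the ingestion path so it is not
      # picked up as source data.
      .option("cloudFiles.schemaLocation", "s3://bucket/_meta/schema")
      .load("s3://bucket/landing/"))

(df.writeStream
   .option("checkpointLocation", "s3://bucket/_meta/checkpoint")
   .trigger(availableNow=True)
   .toTable("bronze.events"))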

3 More Replies
by Phani1, Valued Contributor
  • 1616 Views
  • 2 replies
  • 0 kudos

SUBNET_EXHAUSTED_FAILURE (CLOUD_FAILURE): No more address space to create NIC within injected virtual network

We are currently using an all-purpose compute cluster. When we tried to move the scheduled jobs to a job cluster, we were blocked by the following error: SUBNET_EXHAUSTED_FAILURE(CLOUD_FAILURE): azure_error_code: SubnetIsFull, azure_error_message: No mo...
Latest Reply
daniel_sahal
Esteemed Contributor
  • 0 kudos

Answering your questions: yes, your VNet/subnet is out of unoccupied IPs, and this can be fixed by allocating more IPs to your network address space. Each cluster node requires its own IP, so if none are available, the cluster simply cannot start.
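As a rough worked example (assuming Azure's usual five reserved addresses per subnet): a /26 subnet has 64 addresses, minus 5 reserved leaves 59 usable, and every cluster node consumes an address in each injected subnet, so a few concurrently running job clusters with a driver plus several workers each can exhaust the space quickly.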

1 More Reply
by lewit, New Contributor II
  • 845 Views
  • 2 replies
  • 1 kudos

Is it possible to create a feature store training set directly from a feature store table?

Rather than joining features from different tables, I just want to use a single feature store table and select some of its features, but still log the model in the feature store. The problem I am facing is that I do not know how to create the train...
Latest Reply
Debayan
Esteemed Contributor III
  • 1 kudos

Hi, could you please refer to https://docs.databricks.com/machine-learning/feature-store/train-models-with-feature-store.html#create-a-trainingset-using-the-same-feature-multiple-times and let us know if this helps.
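Following the linked docs, a hedged sketch of building a training set from a single feature table (table, column, and label names are placeholders, and label_df is assumed to hold just the lookup keys and the label):

from databricks.feature_store import FeatureStoreClient, FeatureLookup

fs = FeatureStoreClient()
lookups = [FeatureLookup(
    table_name="feature_db.customer_features",
    feature_names=["age", "tenure"],  # select only the features you need
    lookup_key="customer_id")]

training_set = fs.create_training_set(
    df=label_df,  # keys + label only; no manual join required
    feature_lookups=lookups,
    label="churned")
training_df = training_set.load_df()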

1 More Reply
by gpzz, New Contributor II
  • 820 Views
  • 2 replies
  • 1 kudos

MEMORY_ONLY not working

val doubledAmount = premiumCustomers.map(x => (x._1, x._2 * 2)).persist(StorageLevel.MEMORY_ONLY)

error: not found: value StorageLevel
Latest Reply
Chaitanya_Raju
Honored Contributor
  • 1 kudos

Hi @Gaurav Poojary, can you please try the below? As displayed in the image, it is working for me without any issues. Happy learning!!
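The referenced image is not preserved here, but the error in the question almost always means a missing import: in Scala, add import org.apache.spark.storage.StorageLevel before using StorageLevel.MEMORY_ONLY. For comparison, a minimal PySpark sketch of the same persist call (assumes a notebook where spark is predefined):

from pyspark import StorageLevel

rdd = spark.sparkContext.parallelize([("a", 1.0), ("b", 2.0)])
# The same fix applies here: StorageLevel must be imported before use.
doubled = rdd.map(lambda x: (x[0], x[1] * 2)).persist(StorageLevel.MEMORY_ONLY)
print(doubled.collect())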

1 More Reply
by bozhu, Contributor
  • 917 Views
  • 3 replies
  • 3 kudos

Set taskValues in DLT notebooks

Is "setting taskValues in DLT workbooks" supported?I tried setting a task value in a DLT workbook, but it does not seem supported, so downstream workbooks within the same workflows job cannot consume this task value.

  • 917 Views
  • 3 replies
  • 3 kudos
Latest Reply
Lê_Ngọc_Lợi
New Contributor III
  • 3 kudos

I have the same issue. I also want to know whether Databricks supports passing taskValues between a job task and DLT.

2 More Replies
by Vik1, New Contributor II
  • 5996 Views
  • 4 replies
  • 5 kudos

Some very simple functions in Pandas on Spark are very slow

I have a pandas-on-Spark dataframe with 8 million rows and 20 columns. It took 3.48 minutes to run df.shape, and df.head took 4.55 minutes. By contrast, df.var1.value_counts().reset_index() took only 0.18 sec...
Latest Reply
PeterDowdy
New Contributor II
  • 5 kudos

The reason why this is slow is that pandas-on-Spark needs an index column to perform `shape` or `head`. If you don't provide one, pyspark.pandas enumerates the entire dataframe to create a default one. For example, given columns A, B, and C in dataframe `d...
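A hedged sketch of the usual mitigations (path and column names are placeholders): either name a real index column when loading, or switch to a cheaper distributed default index.

import pyspark.pandas as ps

# Cheaper default index that avoids enumerating the whole dataframe.
ps.set_option("compute.default_index_type", "distributed")

# Better still: supply a real index column so no default index is built at all.
psdf = ps.read_parquet("/path/to/data", index_col="id")
print(psdf.shape)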

3 More Replies