Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

KrishZ
by Contributor
  • 12253 Views
  • 4 replies
  • 3 kudos

[Pyspark.Pandas] PicklingError: Could not serialize object (this error is happening only for large datasets)

Context: I am using pyspark.pandas in a Databricks Jupyter notebook and doing some text manipulation within the dataframe. pyspark.pandas is the Pandas API on Spark and can be used exactly the same as usual Pandas. Error: PicklingError: Could not seria...

Latest Reply
ryojikn
New Contributor III
  • 3 kudos

@Krishna Zanwar, I'm receiving the same error. For me, it happens when I broadcast a random forest (sklearn 1.2.0) recently loaded from MLflow and use a Pandas UDF to predict with the model. However, the same code works perfectly on Spark 2....

3 More Replies
anujsen18
by New Contributor
  • 2081 Views
  • 3 replies
  • 0 kudos

How to overwrite a partition in a DLT pipeline?

I am trying to replicate my existing Spark pipeline in DLT, but I am not able to achieve the desired result. Current pipeline: source setup: CSV file ingested in bronze using SCP; frequency: monthly; bronze dir: /cntdlt/bronze/emp/year=2022 /...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Anuj kumar sen, we haven't heard from you since the last response from @Kristian Foster, and I was checking back to see if his suggestions helped you. If you have found a solution, please share it with the community, as it can be helpful to...

2 More Replies
SIRIGIRI
by Contributor
  • 616 Views
  • 1 reply
  • 1 kudos

sharikrishna26.medium.com

Difference between “ and ‘ in the Spark DataFrame API. You must tell your compiler that you want to represent a string inside a string by using a different symbol for the inner string. Here is an example: “ Name = “HARI” “. The above is wrong. Why? Because the in...
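The quoting rule described in this post can be sketched in Python as follows. This is only an illustration, not the author's own example: the column name `Name`, the value `HARI`, and the dataframe `df` mentioned in the comments are assumptions.

```python
# Writing " Name = "HARI" " is a syntax error: the second double quote
# closes the string early, leaving HARI outside any string literal.

# Option 1: double quotes outside, single quotes for the inner string.
expr_a = "Name = 'HARI'"

# Option 2: single quotes outside, double quotes for the inner string.
expr_b = 'Name = "HARI"'

# Option 3: the same quote on both levels, with the inner quotes escaped.
expr_c = "Name = \"HARI\""

# Options 2 and 3 produce exactly the same text.
print(expr_b == expr_c)

# Hypothetical usage with a Spark dataframe named df:
#     df.filter(expr_a)
```

Using a different quote character for the inner string usually reads better than escaping, which is why the first two forms are the common choice.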

Latest Reply
sher
Valued Contributor II
  • 1 kudos

thanks for sharing

Raghu101
by New Contributor III
  • 4000 Views
  • 6 replies
  • 3 kudos

How to Call Oracle Stored Procedures from Databricks?

How to Call Oracle Stored Procedures from Databricks?

Latest Reply
sher
Valued Contributor II
  • 3 kudos

https://datathirst.net/blog/2018/10/12/executing-sql-server-stored-procedures-on-databricks-pyspark/ Try this link; it may help you.

5 More Replies
A_Jabbar
by New Contributor
  • 1764 Views
  • 2 replies
  • 2 kudos

Resolved! I am unable to create a Databricks Community Edition account!

This is what I am doing: I enter all the details on page 1 and click "Getting started with Community Edition". After verification, I get the following error:

Error Message on the second page of Registration
Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Abdul Jabbar, thank you for reaching out, and we’re sorry to hear about this log-in issue! We have a Community Edition login troubleshooting post on Community. Please take a look and follow the troubleshooting steps. If the steps do not resol...

1 More Replies
KKo
by Contributor III
  • 1315 Views
  • 2 replies
  • 5 kudos

Read and write to XMLA from Databricks notebook

I am trying to process a Power BI dataset partition refresh from Azure Databricks using the XMLA endpoint. I have Power BI Premium capacity with read/write enabled. I tried a few approaches found on Google, but none worked for one reason or another. If any of y...

Latest Reply
Anonymous
Not applicable
  • 5 kudos

Hi @Kris Koirala, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Otherwise, Bricksters will get back to you soon. Thanks.

1 More Replies
KVNARK
by Honored Contributor II
  • 1913 Views
  • 4 replies
  • 6 kudos

Resolved! Best practices for SQL DB authentication from Databricks

I would like to know the best practices for authenticating to a SQL DB from Databricks/Python. I am more interested in token-based DB authentication methods than in credential-based (username/password) ones.

Latest Reply
Vivian_Wilfred
Honored Contributor
  • 6 kudos

@KVNARK, have you looked at personal access tokens (PAT) for authentication? https://docs.databricks.com/sql/api/authentication.html

3 More Replies
jm99
by New Contributor III
  • 2432 Views
  • 1 reply
  • 1 kudos

Resolved! ForeachBatch() - Get results from batchDF._jdf.sparkSession().sql('merge stmt')

Most Python examples show the structure of the foreachBatch method as:
def foreachBatchFunc(batchDF, batchId):
    batchDF.createOrReplaceTempView('viewName')
    ( batchDF ._jdf.sparkSession() .sql( ...

Latest Reply
jm99
New Contributor III
  • 1 kudos

Just found a solution... Need to convert the Java DataFrame (jdf) to a DataFrame:
from pyspark import sql

def batchFunc(batchDF, batchId):
    batchDF.createOrReplaceTempView('viewName')
    sparkSession = batchDF._jdf.sparkSession()
    resJdf = sparkSes...
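A possible alternative to wrapping the Java DataFrame by hand is sketched below. It is not the code from the reply above (which is truncated) and it assumes PySpark 3.3 or later, where `DataFrame.sparkSession` returns the Python session, so `sql()` already yields a Python DataFrame. The view name `updates`, the table `target`, the join column `id`, and the Delta-style MERGE statement are all hypothetical.

```python
# Hypothetical MERGE statement; table, view and column names are placeholders.
MERGE_SQL = """
MERGE INTO target AS t
USING updates AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
"""


def merge_batch(batch_df, batch_id):
    """foreachBatch handler that runs a MERGE and returns its result rows."""
    # Register the micro-batch under a temporary view name (assumed name).
    batch_df.createOrReplaceTempView("updates")

    # Assumption: PySpark 3.3+. batch_df.sparkSession is the Python session
    # that owns the view, so no manual conversion from a Java DataFrame is needed.
    result_df = batch_df.sparkSession.sql(MERGE_SQL)

    # Collect whatever the statement returns so it can be logged or inspected.
    rows = result_df.collect()
    print(f"batch {batch_id}: merge result = {rows}")
    return rows
```

It would be attached with something like `stream_df.writeStream.foreachBatch(merge_batch).start()`, where `stream_df` is a hypothetical streaming DataFrame. Spark ignores the handler's return value; it is returned here only to make the function easy to check in isolation.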

ks1248
by New Contributor III
  • 1963 Views
  • 4 replies
  • 6 kudos

Resolved! Autoloader creates columns not present in the source

I have been exploring Autoloader to ingest gzipped JSON files from an S3 source. The notebook fails on the first run due to a schema mismatch; after re-running the notebook, the schema evolves and the ingestion runs successfully. On analysing the schema ...

Latest Reply
ks1248
New Contributor III
  • 6 kudos

Hi @Debayan Mukherjee, @Kaniz Fatma, thank you for replying to my question. I was able to figure out the issue: I was creating the schema and checkpoint folders in the same path as the source location for the Autoloader. This caused the schema to ch...
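One way to guard against the situation described in this reply is to check, before starting the stream, that the schema and checkpoint locations do not sit under the source path. The sketch below is an illustration under assumptions, not the poster's code: the helper names, the paths, and the target table are hypothetical, while `cloudFiles.format`, `cloudFiles.schemaLocation`, and `checkpointLocation` are the standard option names.

```python
def is_inside(path: str, parent: str) -> bool:
    """Return True if path equals parent or lies underneath it."""
    path = path.rstrip("/")
    parent = parent.rstrip("/")
    return path == parent or path.startswith(parent + "/")


def start_autoloader(spark, source, schema_location, checkpoint_location, target_table):
    """Start a JSON Auto Loader stream after validating the metadata paths."""
    for name, location in (("schema", schema_location),
                           ("checkpoint", checkpoint_location)):
        if is_inside(location, source):
            # Metadata files under the source path could be picked up as input.
            raise ValueError(f"{name} location {location!r} is inside source {source!r}")

    stream = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", schema_location)
        .load(source)
    )
    return (
        stream.writeStream
        .option("checkpointLocation", checkpoint_location)
        .toTable(target_table)
    )
```

For example, a source of `s3://bucket/raw/events` with a schema location of `s3://bucket/raw/events/_schema` is rejected, while `s3://bucket/meta/events/_schema` passes the check (all of these paths are made up for illustration).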

3 More Replies
Phani1
by Valued Contributor
  • 2063 Views
  • 2 replies
  • 0 kudos

SUBNET_EXHAUSTED_FAILURE(CLOUD_FAILURE): or No more address space to create NIC within injected virtual network

Currently we are using an all-purpose compute cluster. When we tried to move the scheduled jobs to a job cluster, we were blocked by the following error: SUBNET_EXHAUSTED_FAILURE(CLOUD_FAILURE): azure_error_code:SubnetIsFull,azure_error_message:No mo...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 0 kudos

Answering your questions: yes, your VNet/subnet has run out of unoccupied IPs, and this can be fixed by allocating more IPs to your network address space. Each cluster requires its own IP, so if none are available, it simply cannot start.
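To get a rough feel for how quickly a subnet fills up, the arithmetic can be sketched with Python's standard `ipaddress` module. The CIDR ranges below are made-up examples, and the sketch assumes that Azure reserves five addresses in every subnet and that each running node takes one address from the subnet; check the actual sizing rules for your workspace before relying on these numbers.

```python
import ipaddress

# Azure keeps 5 addresses per subnet for itself
# (network, default gateway, two for Azure DNS, broadcast).
AZURE_RESERVED_PER_SUBNET = 5


def usable_addresses(cidr: str) -> int:
    """Addresses in the subnet that are available for nodes."""
    network = ipaddress.ip_network(cidr)
    return max(network.num_addresses - AZURE_RESERVED_PER_SUBNET, 0)


def free_addresses(cidr: str, in_use: int) -> int:
    """Addresses still free, assuming one address per running node."""
    return max(usable_addresses(cidr) - in_use, 0)


# Example ranges (hypothetical): a /26 versus a /24.
for cidr in ("10.0.0.0/26", "10.0.0.0/24"):
    print(cidr, "usable:", usable_addresses(cidr))
# 10.0.0.0/26 usable: 59
# 10.0.0.0/24 usable: 251
```

Under these assumptions a /26 subnet leaves room for at most 59 nodes across all clusters that share it, which is why moving scheduled jobs to separate job clusters can exhaust a small subnet.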

1 More Replies
lewit
by New Contributor II
  • 1240 Views
  • 2 replies
  • 1 kudos

Is it possible to create a feature store training set directly from a feature store table?

Rather than joining features from different tables, I just wanted to use a single feature store table and select some of its features, but still log the model in the feature store. The problem I am facing is that I do not know how to create the train...

Latest Reply
Debayan
Esteemed Contributor III
  • 1 kudos

Hi, could you please refer to https://docs.databricks.com/machine-learning/feature-store/train-models-with-feature-store.html#create-a-trainingset-using-the-same-feature-multiple-times and let us know if this helps?

1 More Replies
gpzz
by New Contributor II
  • 1196 Views
  • 2 replies
  • 1 kudos

MEMORY_ONLY not working

val doubledAmount = premiumCustomers.map(x => (x._1, x._2 * 2)).persist(StorageLevel.MEMORY_ONLY)
error: not found: value StorageLevel

Latest Reply
Chaitanya_Raju
Honored Contributor
  • 1 kudos

Hi @Gaurav Poojary, can you please try the approach shown in the image below? It is working for me without any issues. Happy learning!

1 More Replies
bozhu
by Contributor
  • 1291 Views
  • 3 replies
  • 3 kudos

Set taskValues in DLT workbooks

Is "setting taskValues in DLT workbooks" supported? I tried setting a task value in a DLT workbook, but it does not seem to be supported, so downstream workbooks within the same Workflows job cannot consume this task value.

Latest Reply
Lê_Ngọc_Lợi
New Contributor III
  • 3 kudos

I have the same issue. I would also like to know whether Databricks supports taskValues between a job task and DLT.

2 More Replies
Vik1
by New Contributor II
  • 7305 Views
  • 4 replies
  • 5 kudos

Some very simple functions in Pandas on Spark are very slow

I have a pandas on Spark dataframe with 8 million rows and 20 columns. It took 3.48 minutes to run df.shape, and df.head also took a long time: 4.55 minutes. By contrast, df.var1.value_counts().reset_index() took only 0.18 sec...

Latest Reply
PeterDowdy
New Contributor II
  • 5 kudos

This is slow because pandas needs an index column to perform `shape` or `head`. If you don't provide one, pyspark pandas enumerates the entire dataframe to create a default one. For example, given columns A, B, and C in dataframe `d...
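Two ways of avoiding the default index, in the spirit of this reply, are sketched below. This is not the reply's own example (which is truncated): the function names are hypothetical, `sdf` stands for an existing Spark DataFrame, and whether either option actually removes the slowdown depends on the data and the Spark version.

```python
def to_pandas_on_spark(sdf, index_col):
    """Convert a Spark DataFrame to pandas-on-Spark with an explicit index.

    Naming an existing column as the index means no default index has to be
    generated. Assumes PySpark 3.2+, where DataFrame.pandas_api is available.
    """
    return sdf.pandas_api(index_col=index_col)


def use_distributed_default_index():
    """Switch the default index to the 'distributed' type.

    This index type is computed without a global ordering pass, at the price
    of index values that are not consecutive.
    """
    import pyspark.pandas as ps  # imported lazily; requires pyspark

    ps.set_option("compute.default_index_type", "distributed")
```

With a Spark DataFrame that has a suitable key column, say a hypothetical column `A`, the call would be `psdf = to_pandas_on_spark(sdf, "A")`.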

3 More Replies
sunil_smile
by Contributor
  • 3178 Views
  • 2 replies
  • 1 kudos

VNet peering settings are not enabled in Azure Databricks Premium, even though it is deployed inside my VNet?

Hi all, VNet peering settings are not enabled in Azure Databricks, even though it is deployed inside my VNet? Here I have not mentioned my VNet and subnet details, but I filled these in and created Databricks (without private endpoint - allow public access) virtual...

Latest Reply
Debayan
Esteemed Contributor III
  • 1 kudos

Hi, VNet peering is not supported on VNet-injected workspaces. Please refer to: https://learn.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/vnet-peering#requirements

1 More Replies