Hello, I've successfully completed the Databricks Lakehouse Fundamentals course and am looking for where the badge is. I found this post here, but I haven't received an email about my completion from <service.accredible.email@databricks.com> yet. I successfull...
Thank you all for the great responses. I eventually received the badge; it took around 30+ minutes, but I finally did get the email notification. I will mark this post as resolved.
Context: I am using pyspark.pandas in a Databricks Jupyter notebook and doing some text manipulation within the DataFrame. pyspark.pandas is the pandas API on Spark and can largely be used the same way as regular pandas. Error: PicklingError: Could not seria...
@Krishna Zanwar, I'm receiving the same error. For me, the behavior occurs when broadcasting a random forest (sklearn 1.2.0) recently loaded from MLflow and using a Pandas UDF to run predictions. However, the same code works perfectly on Spark 2....
I am trying to replicate my existing Spark pipeline in DLT, but I am not able to achieve the desired result with DLT. Current pipeline: source setup: CSV files ingested into bronze using SCP; frequency: monthly; bronze dir: /cntdlt/bronze/emp/year=2022 /...
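For what it's worth, a minimal DLT bronze-table sketch for a monthly CSV drop might look like the following. This is only an illustration, not the original pipeline: the table name, path, and columns are placeholders, and it runs only inside a Databricks DLT pipeline (where `spark` is provided).

```python
# Hypothetical DLT sketch; table/path names are illustrative placeholders.
import dlt
from pyspark.sql import functions as F

@dlt.table(name="bronze_emp", comment="Raw employee CSVs landed via SCP")
def bronze_emp():
    return (
        spark.read.format("csv")
        .option("header", "true")
        .load("/cntdlt/bronze/emp/")              # placeholder path
        .withColumn("ingest_ts", F.current_timestamp())
    )
```

For a monthly cadence, the DLT pipeline itself would typically run triggered on a monthly schedule rather than continuously.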
Hi @Anuj kumar sen, we haven't heard from you since the last response from @Kristian Foster, and I was checking back to see if his suggestions helped you. If you have found a solution, please share it with the community, as it can be helpful to...
Difference between " and ' in the Spark DataFrame API: you must tell the parser that you want to represent a string inside a string by using a different quote character for the inner string. Here is an example: " Name = "HARI" " — the above is wrong. Why? Because the in...
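A quick illustration in Python (the same rule applies inside Spark SQL expression strings): alternate the quote characters, or escape the inner quotes.

```python
# Use a different quote character for the inner string, or escape it.
expr1 = "Name = 'HARI'"      # double quotes outside, single quotes inside
expr2 = 'Name = "HARI"'      # single quotes outside, double quotes inside
expr3 = "Name = \"HARI\""    # or escape the inner double quotes

print(expr1)  # Name = 'HARI'
```

This is exactly why a filter like `df.filter("Name = 'HARI'")` uses single quotes for the SQL string literal inside the double-quoted Python string.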
This is what I am doing: enter all the details on page 1, click on "Get started with Community Edition"; after verification, I get the following error.
Hi @Abdul Jabbar Thank you for reaching out, and we’re sorry to hear about this log-in issue! We have this Community Edition login troubleshooting post on Community. Please take a look, and follow the troubleshooting steps. If the steps do not resol...
I am trying to refresh a Power BI dataset partition from Azure Databricks using the XMLA endpoint. I have Power BI Premium capacity with read/write enabled. I tried a few approaches found on Google, but none worked, for one reason or another. If any of y...
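One commonly used alternative (my suggestion, not from the thread) is the Power BI REST API's dataset-refresh endpoint, which on Premium supports "enhanced refresh" and can target specific tables/partitions in the request body. A rough sketch — the workspace/dataset IDs, partition names, and the Azure AD token acquisition are all placeholders you would supply:

```python
import requests

def build_refresh_url(group_id: str, dataset_id: str) -> str:
    # Power BI REST API endpoint for triggering a dataset refresh
    return (f"https://api.powerbi.com/v1.0/myorg/groups/{group_id}"
            f"/datasets/{dataset_id}/refreshes")

def trigger_partition_refresh(token: str, group_id: str, dataset_id: str):
    # token: an Azure AD access token for the Power BI service
    # (https://analysis.windows.net/powerbi/api scope)
    resp = requests.post(
        build_refresh_url(group_id, dataset_id),
        headers={"Authorization": f"Bearer {token}"},
        json={
            "type": "Full",
            # enhanced refresh: limit the refresh to specific objects
            "objects": [{"table": "Sales", "partition": "Sales2022"}],  # placeholders
        },
    )
    resp.raise_for_status()
```

If you specifically need TMSL/TOM-style control over partitions, the XMLA endpoint is still the route, but the REST call above avoids the XMLA client-library dependency from a Databricks notebook.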
Hi @Kris Koirala, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer first; otherwise, Bricksters will get back to you soon. Thanks.
I would like to know the best practices for authenticating to a SQL database from Databricks/Python. I am most interested in token-based DB authentication methods rather than credential-based (username/password) ones.
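One token-based pattern (an assumption on my part, not confirmed by the thread) is Azure AD access-token authentication for Azure SQL via pyodbc: instead of a username/password, you pack an AAD access token into the driver's `SQL_COPT_SS_ACCESS_TOKEN` connection attribute. The server/database names below are placeholders.

```python
import struct

SQL_COPT_SS_ACCESS_TOKEN = 1256  # msodbcsql pre-connect attribute for AAD tokens

def pack_access_token(token: str) -> bytes:
    # The ODBC driver expects the token UTF-16-LE encoded,
    # prefixed with its byte length as a little-endian 4-byte integer.
    raw = token.encode("utf-16-le")
    return struct.pack("<I", len(raw)) + raw

def connect(server: str, database: str, token: str):
    # Hypothetical usage; requires pyodbc and the Microsoft ODBC driver installed.
    import pyodbc
    conn_str = (f"Driver={{ODBC Driver 17 for SQL Server}};"
                f"Server={server};Database={database}")
    return pyodbc.connect(
        conn_str,
        attrs_before={SQL_COPT_SS_ACCESS_TOKEN: pack_access_token(token)},
    )
```

The token itself would typically come from a service principal or managed identity (e.g. via `azure-identity`'s `DefaultAzureCredential`), so no password ever lives in the notebook.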
Most Python examples show the structure of the foreachBatch method as:

def foreachBatchFunc(batchDF, batchId):
    batchDF.createOrReplaceTempView('viewName')
    (
        batchDF
        ._jdf.sparkSession()
        .sql(
            ...
Just found a solution... You need to convert the Java DataFrame (jdf) returned by the internal session back into a Python DataFrame:

from pyspark import sql

def batchFunc(batchDF, batchId):
    batchDF.createOrReplaceTempView('viewName')
    sparkSession = batchDF._jdf.sparkSession()
    # sql() on the internal Java session returns a Java DataFrame, e.g.:
    resJdf = sparkSession.sql('select * from viewName')
    # wrap the Java DataFrame back into a Python DataFrame
    resDf = sql.DataFrame(resJdf, batchDF.sql_ctx)
I have been exploring Auto Loader to ingest gzipped JSON files from an S3 source. The notebook fails on the first run due to a schema mismatch; after re-running the notebook, the schema evolves and the ingestion runs successfully. On analysing the schema ...
Hi @Debayan Mukherjee, @Kaniz Fatma, thank you for replying to my question. I was able to figure out the issue: I was creating the schema and checkpoint folders in the same path as the Auto Loader's source location. This caused the schema to ch...
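For anyone hitting the same thing, the pattern that avoids it is keeping `cloudFiles.schemaLocation` and the checkpoint outside the source prefix. A hedged sketch (bucket and table names are placeholders; the stream itself runs only on Databricks, where the `cloudFiles` source and a `spark` session exist):

```python
# Illustrative Auto Loader layout; keep metadata paths OUTSIDE the source prefix.
source_path     = "s3://my-bucket/raw/events/"        # gzipped JSON lands here
schema_path     = "s3://my-bucket/_schemas/events/"   # NOT under source_path
checkpoint_path = "s3://my-bucket/_checkpoints/events/"

def start_stream(spark):
    # Databricks-only: cloudFiles is the Auto Loader source.
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .option("cloudFiles.schemaLocation", schema_path)
            .load(source_path)
            .writeStream
            .option("checkpointLocation", checkpoint_path)
            .toTable("bronze_events"))
```

If the schema folder sits under the source path, Auto Loader will try to ingest its own schema-tracking files, which is how the schema appears to "change" between runs.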
Currently we are using an all-purpose compute cluster. When we tried to allocate the scheduled jobs to a job cluster, we were blocked by the following error: SUBNET_EXHAUSTED_FAILURE(CLOUD_FAILURE): azure_error_code: SubnetIsFull, azure_error_message: No mo...
Answering your questions: yes, your VNet/subnet is out of unoccupied IPs, and this can be fixed by allocating more IPs to your network address space. Each cluster node requires its own IP, so if none are available, the cluster simply cannot start.
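As a sanity check on sizing: Azure reserves 5 addresses in every subnet, and with Databricks VNet injection each node consumes an IP in both the host and the container subnet, so both must be large enough. A quick estimate of usable addresses per subnet:

```python
import ipaddress

def usable_ips(cidr: str) -> int:
    # Azure reserves 5 addresses per subnet
    # (network, broadcast, gateway, and two for Azure DNS).
    return ipaddress.ip_network(cidr).num_addresses - 5

print(usable_ips("10.0.1.0/26"))  # 64 - 5 = 59 addresses available for nodes
```

So a /26 caps you at 59 nodes per subnet across all concurrently running clusters; job clusters spinning up alongside the all-purpose cluster can easily exhaust that.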
Rather than joining features from different tables, I just want to use a single feature store table and select some of its features, but still log the model with the feature store. The problem I am facing is that I do not know how to create the train...
Hi, could you please refer to https://docs.databricks.com/machine-learning/feature-store/train-models-with-feature-store.html#create-a-trainingset-using-the-same-feature-multiple-times and let us know if this helps.
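Sketching the pattern from that docs page with placeholder names (this is Databricks-only and assumes the `databricks.feature_store` client; `label_df` and all table/column names are hypothetical): a single feature table can back a TrainingSet by listing it once in `feature_lookups` with only the columns you want.

```python
# Hypothetical sketch; Databricks-only API, all names are placeholders.
from databricks.feature_store import FeatureStoreClient, FeatureLookup

fs = FeatureStoreClient()

lookups = [
    FeatureLookup(
        table_name="feature_db.customer_features",   # the single feature table
        feature_names=["age", "tenure_days"],        # subset of its columns
        lookup_key="customer_id",
    )
]

# label_df holds only the lookup key and the label column
training_set = fs.create_training_set(
    df=label_df,
    feature_lookups=lookups,
    label="churned",
    exclude_columns=["customer_id"],
)
train_df = training_set.load_df()
# fs.log_model(..., training_set=training_set) then records the feature lineage
```

Because the model is logged with the `training_set`, the feature store still tracks which table and features fed it, even though only one table was used.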
Is setting taskValues in DLT notebooks supported? I tried setting a task value in a DLT notebook, but it does not seem to be supported, so downstream notebooks within the same Workflows job cannot consume the task value.
I have a pandas-on-Spark dataframe with 8 million rows and 20 columns. It took 3.48 minutes to run df.shape, and df.head also takes a long time (4.55 minutes). By contrast, df.var1.value_counts().reset_index() took only 0.18 sec...
The reason this is slow is that pandas needs an index column to perform `shape` or `head`. If you don't provide one, pyspark.pandas enumerates the entire dataframe to create a default one. For example, given columns A, B, and C in dataframe `d...
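Two mitigations follow from that explanation (a hedged sketch; it needs a running Spark session, and the path/column names are placeholders): pick a cheaper default index type, or promote an existing unique column to the index so pandas-on-Spark never has to enumerate rows.

```python
# Sketch for pandas-on-Spark; requires a Spark session (e.g. a Databricks notebook).
import pyspark.pandas as ps

# 'distributed' attaches a monotonically increasing id per partition without a
# global ordering pass, which is far cheaper than the default 'sequence' index
# on large frames (at the cost of non-consecutive index values).
ps.set_option("compute.default_index_type", "distributed")

psdf = ps.read_parquet("/data/events")   # placeholder path
psdf = psdf.set_index("event_id")        # or reuse an existing unique column
print(psdf.shape)
```

With an explicit index (or the `distributed` default), `shape` and `head` no longer trigger the full-dataframe enumeration described above.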