Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

databricks_amit
by New Contributor
  • 1797 Views
  • 0 replies
  • 0 kudos

PicklingError while registering a UDF

PicklingError: Could not serialize object: Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. I...

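This error usually means the UDF's closure captures the SparkContext, directly or via a DataFrame or broadcast reference, so it cannot be pickled and shipped to workers. A minimal sketch of the fix, assuming a Databricks notebook where spark is already defined:

from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

# Anti-pattern: referencing spark or sc inside the UDF body makes the
# closure capture the driver-only SparkContext and fail with PicklingError.
# Fix: keep the UDF body a pure function of its column inputs.
def str_len(s):
    return len(s) if s is not None else 0

str_len_udf = udf(str_len, IntegerType())               # for DataFrame use
spark.udf.register("str_len", str_len, IntegerType())   # for SQL use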
Mado
by Valued Contributor II
  • 9993 Views
  • 6 replies
  • 2 kudos

Resolved! How to see if a condition is True/False for all rows in a DataFrame?

Assume that I have a Spark DataFrame, and I want to see if records satisfy a condition. Example dataset:
# Prepare data
data = [('A', 1), ('A', 2), ('B', 3)]
# Create DataFrame
columns = ['col_1', 'col_2']
df = spark.createDataF...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 2 kudos

Hi, you can use the display() or show() function; that will give you the expected results.

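Beyond eyeballing the output with display() or show(), one programmatic way to test whether a condition holds for every row is to count violations; a short sketch using the dataset from the question:

from pyspark.sql import functions as F

data = [('A', 1), ('A', 2), ('B', 3)]
df = spark.createDataFrame(data, ['col_1', 'col_2'])

condition = F.col('col_2') > 0

# The condition is True for all rows iff no row violates it.
all_rows_match = df.filter(~condition).limit(1).count() == 0
print(all_rows_match)  # True for this dataset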
5 More Replies
KVK
by New Contributor II
  • 2120 Views
  • 1 reply
  • 3 kudos

Unable to read image and video data in Databricks using OpenCV.

I have tried reading image and video data in Azure Databricks using OpenCV. When I checked the type of the image, it was shown as "NoneType", and when I tried with a video file, the file itself could not be opened. (Note: these files are stored on azure ...

Latest Reply
sachinkumar
New Contributor II
  • 3 kudos

Kindly let me know if you find the answer!

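A common cause of the "NoneType" result is passing a dbfs:/ or abfss:// URI to cv2.imread, which silently returns None when it cannot open the path. OpenCV only understands local filesystem paths, so on Databricks the /dbfs FUSE mount is the usual workaround; the path below is hypothetical:

import cv2

# cv2.imread returns None instead of raising when the path can't be opened.
img = cv2.imread("/dbfs/mnt/images/sample.jpg")  # hypothetical mount path
if img is None:
    raise FileNotFoundError("OpenCV could not open the file; check the /dbfs path")
print(img.shape)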
lcalca95
by New Contributor II
  • 1846 Views
  • 0 replies
  • 0 kudos

Azure Databricks job and exception handling

Hi, I'm working on Azure Databricks and I created two jobs, one based on a Python wheel and the other based on a notebook, with the same code. The code gets data from Azure Blob Storage, processes it with PySpark, and sends it to Event Hub. The whole co...

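Since the post is truncated, a general pattern worth noting here: catch, log, and re-raise so the job run is marked as failed. A minimal sketch, with run_pipeline standing in for the real Blob-Storage-to-Event-Hub code:

import logging

logger = logging.getLogger(__name__)

def run_pipeline():
    # Placeholder for the real work: read from Blob Storage, process
    # with PySpark, send to Event Hub.
    raise RuntimeError("simulated failure")

try:
    run_pipeline()
except Exception:
    logger.exception("Pipeline failed")
    raise  # re-raising makes the wheel or notebook job run fail visibly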
Garvita1
by New Contributor II
  • 3202 Views
  • 5 replies
  • 2 kudos

Databricks Certified Data Engineer Associate Certificate and Badge not received

I have attempted the exam and passed, but I have not received the badge and certificate. I have also raised a request but have not got any response yet. It is urgently required. I request the Databricks team to provide me with the same as ...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Garvita Kumari, just a friendly follow-up: were you able to get your certification? If yes, then mark the answer as best, or if you need further assistance kindly let me know. Thanks and regards

4 More Replies
Manimkm08
by New Contributor III
  • 3330 Views
  • 3 replies
  • 0 kudos

Jobs are failed with AWS_INSUFFICIENT_FREE_ADDRESSES_IN_SUBNET_FAILURE

We have assigned 3 dedicated subnets (one per AZ) to the Databricks workspace, each with a /24 CIDR, but noticed that all the jobs are running in a single subnet, which causes AWS_INSUFFICIENT_FREE_ADDRESSES_IN_SUBNET_FAILURE. Is there a way to segregat...

Latest Reply
Manimkm08
New Contributor III
  • 0 kudos

@karthik p I have configured one subnet per AZ (three in total) and followed the same steps as mentioned in the document. Is there a way to check whether Databricks uses all the subnets or not? @Debayan Mukherjee I am not getting how to use an LB in this set...

2 More Replies
berserkersap
by Contributor
  • 14124 Views
  • 3 replies
  • 5 kudos

What is the timeout for dbutils.notebook.run with timeout = 0?

Hello everyone, I have several notebooks (around 10) and I want to run them in sequential order. At first I thought of using %run, but I have a variable that is used repeatedly in every notebook. So now I am thinking of passing that variable from one ma...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 5 kudos

Hi @pavan venkata, yes, as the document says, 0 means no timeout. It means that the notebook will take its sweet time to complete execution without throwing an error due to a time limit, whether the notebook takes 1 min, 1 hour, 1 day, or more. H...

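A minimal sketch of the pattern being discussed: a master notebook runs children sequentially with timeout_seconds=0 (no timeout) and passes the shared variable as an argument; the paths and the shared_date parameter are hypothetical:

notebooks = ["/Workspace/etl/step_01", "/Workspace/etl/step_02"]
shared_value = "2023-01-01"

for path in notebooks:
    # timeout_seconds=0 means no timeout: the call waits as long as needed.
    result = dbutils.notebook.run(path, timeout_seconds=0,
                                  arguments={"shared_date": shared_value})
    print(f"{path} returned: {result}")

# Each child notebook reads the value with:
# shared_value = dbutils.widgets.get("shared_date")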
2 More Replies
Databrickguy
by New Contributor II
  • 7116 Views
  • 6 replies
  • 3 kudos

Resolved! How to parse/extract/format a string based on a pattern?

How to parse, extract, or format a string based on a pattern? SQL Server has a function which will format a string based on a pattern. For example, if a string is "abcdefgh" and the pattern is XX-XX-XXXX, then the string will be "ab-cd-efgh". How to achieve this wit...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 3 kudos

@Tim zhang, thanks for your code, and here is your answer. I asked this question on Stack Overflow and got this answer. Here is the Stack Overflow link: https://stackoverflow.com/questions/74845760/how-to-parse-a-pattern-and-use-it-to-format-a-string-u...

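Independent of the Stack Overflow answer, a simple pure-Python take on the mask is to walk the pattern, consuming one input character per X and copying literals through; wrapped in a UDF, the same function can be applied to a DataFrame column:

def format_by_pattern(s, pattern):
    out, i = [], 0
    for ch in pattern:
        if ch == 'X':
            out.append(s[i])  # consume the next input character
            i += 1
        else:
            out.append(ch)    # copy literal pattern characters through
    return ''.join(out)

print(format_by_pattern("abcdefgh", "XX-XX-XXXX"))  # ab-cd-efgh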
5 More Replies
Pat
by Honored Contributor III
  • 8204 Views
  • 5 replies
  • 9 kudos

Reading data from "dbfs:/mnt/"

Hi community, I don't know what is happening, TBH. I have a use case where data is written to the location "dbfs:/mnt/..."; don't ask me why it's mounted, it's just a side project. I do believe that the data is stored in ADLS2. I've been trying to read the ...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 9 kudos

This is really interesting; I've never faced this type of situation, @Pat Sienkiewicz. Can you please share the whole code so that we can test and debug this in our system? Thanks, Aviral

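While waiting for the full code, the usual patterns for reading from a mount look like the sketch below; the format and paths are hypothetical, since the original post truncates them:

# Delta table written under a mount point:
df = spark.read.format("delta").load("dbfs:/mnt/mydata/table_path")

# Raw files, e.g. parquet, via the same mount:
df = spark.read.parquet("/mnt/mydata/raw/")

# Sanity-check what the mount actually contains:
display(dbutils.fs.ls("dbfs:/mnt/mydata"))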
4 More Replies
bakselrud
by New Contributor III
  • 12387 Views
  • 12 replies
  • 2 kudos

Resolved! DLT pipeline failure - Detected a data update... This is currently not supported

We are using a DLT pipeline in a Databricks workspace hosted on the Microsoft Azure platform, which is failing intermittently and for an unclear reason. The pipeline is as follows: spark.readStream.format("delta").option("mergeSchema", "true").option("ignoreChange...

Latest Reply
bakselrud
New Contributor III
  • 2 kudos

OK, so after doing some investigation on the way to resolving my original question, I think we're getting some clarity after all. Consider the following DataFrame that is ingested by the DLT streaming pipeline: dfMock = spark.sparkContext.parallelize([[1,...

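For readers hitting the same error: "Detected a data update ... not supported" typically appears when a streaming read sees updates or deletes in its Delta source. A sketch of the commonly suggested workaround, with a hypothetical source path; note that ignoreChanges re-emits rows from updated files, so downstream logic must tolerate duplicates:

df = (spark.readStream
      .format("delta")
      .option("mergeSchema", "true")
      .option("ignoreChanges", "true")  # newer runtimes also offer skipChangeCommits
      .load("dbfs:/mnt/source/delta_table"))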
11 More Replies
SM14
by New Contributor
  • 1642 Views
  • 1 reply
  • 0 kudos

Row Level Validation

I have two arrays, one for devl and the other for prod. Inside these there are many tables. How do I compare and check the count difference? I want to create an automated script to check the count difference and perform row-level validation. PySpark script...

Latest Reply
Debayan
Databricks Employee
  • 0 kudos

Hi, you can use the EXCEPT command for this. Please refer to: https://stackoverflow.com/questions/70366209/databricks-comparing-two-tables-to-see-which-records-are-missing. Please let us know if this helps.

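A sketch of how that EXCEPT-style comparison might look in PySpark, looping over a hypothetical table list; exceptAll keeps duplicate rows, unlike a distinct EXCEPT:

tables = ["customers", "orders"]  # illustrative names

for t in tables:
    devl_df = spark.table(f"devl.{t}")
    prod_df = spark.table(f"prod.{t}")

    count_diff = devl_df.count() - prod_df.count()
    missing_in_prod = devl_df.exceptAll(prod_df).count()
    extra_in_prod = prod_df.exceptAll(devl_df).count()

    print(t, "count diff:", count_diff,
          "missing in prod:", missing_in_prod,
          "extra in prod:", extra_in_prod)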
g96g
by New Contributor III
  • 1613 Views
  • 2 replies
  • 1 kudos

Databricks SQL permission problems

We are using a catalog, and normally I have the ALL PRIVILEGES user status, but I'm not able to modify SQL scripts that were created by some of my colleagues. They have to give me access, and after that I'm able to modify them. How can I solve this proble...

Latest Reply
Debayan
Databricks Employee
  • 1 kudos

Hi, when you are not able to modify, could you please confirm the error you are receiving? Also, you can refer to https://docs.databricks.com/_static/notebooks/set-owners-notebook.html and https://docs.databricks.com/sql/admin/transfer-ownership.html

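For tables and views, the ownership transfer from the linked docs can also be done in SQL; a sketch with hypothetical object names (saved Databricks SQL queries themselves are transferred via the admin UI or API in the linked pages instead):

# Inspect existing grants on the underlying table:
spark.sql("SHOW GRANTS ON TABLE main.analytics.sales").show(truncate=False)

# Transfer ownership so another user can modify the object:
spark.sql("ALTER TABLE main.analytics.sales OWNER TO `someone@example.com`")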
1 More Replies
Viren123
by Contributor
  • 6394 Views
  • 5 replies
  • 6 kudos

API to write into Databricks tables

Hello experts, is there any API in Databricks that allows writing data into Databricks tables? I would like to send small-sized log information to Databricks tables from another service. What are my options? Thank you very much.

Latest Reply
jneira
New Contributor III
  • 6 kudos

And what about using the JDBC/ODBC driver, either programmatically or with a tool like DBeaver?

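One concrete programmatic option is the databricks-sql-connector Python package (pip install databricks-sql-connector), which talks to a SQL warehouse over its JDBC/ODBC-style endpoint; the hostname, HTTP path, token, and table below are placeholders:

from databricks import sql

with sql.connect(server_hostname="adb-1234567890.12.azuredatabricks.net",
                 http_path="/sql/1.0/warehouses/abc123",
                 access_token="dapi-REDACTED") as conn:
    with conn.cursor() as cursor:
        # Suited to small, infrequent writes like log rows; bulk loads
        # are better served by landing files and using COPY INTO.
        cursor.execute(
            "INSERT INTO logs.service_events "
            "VALUES ('2023-01-01T00:00:00Z', 'INFO', 'hello from another service')"
        )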
4 More Replies
CHANDAN_NANDY
by New Contributor III
  • 5041 Views
  • 2 replies
  • 4 kudos

Resolved! GitHub Copilot Support

Any idea why GitHub Copilot is not available in Azure Databricks, though it supports GitHub?

Latest Reply
nightcoder
New Contributor II
  • 4 kudos

That is true (this is not an answer but a comment): VS Code is supported. But VS Code does not integrate with notebooks on AWS. When will this feature be available?

1 More Replies
brickster_2018
by Databricks Employee
  • 2575 Views
  • 3 replies
  • 0 kudos

Resolved! For the Autoloader, cloudFiles.includeExistingFiles option, is ordering respected?

If yes, how is ordering ensured? For example, let's say a number of CDC change files are uploaded to a directory over time. If a table were created using the cloudFiles source, in what order would those files be processed?

Latest Reply
Hanish_Goel
New Contributor II
  • 0 kudos

Hi, is there any new development in terms of ensuring the ordering of files in Auto Loader?

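For reference, a sketch of an Auto Loader stream that also picks up pre-existing files; the paths are hypothetical. Auto Loader does not guarantee processing order, so order-sensitive CDC logic is usually driven by a sequence or timestamp column downstream rather than by file order:

df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.includeExistingFiles", "true")
      .load("dbfs:/mnt/landing/cdc/"))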
2 More Replies
