Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

akanksha_gupta
by New Contributor II
  • 2216 Views
  • 2 replies
  • 0 kudos

ERROR: Failure starting repl. Try detaching and re-attaching the notebook. This error occurs when running any Python command on a 10.4 LTS cluster configured with https://github.com/mspnp/spark-monitoring to send Databricks Spark logs to Log Analytics.

Error description: java.lang.Exception: Cannot run program "/local_disk0/pythonVirtualEnvDirs/virtualEnv-5acc1ea9-d03f-4de3-b76b-203d42614000/bin/python" (in directory "."): error=2, No such file or directory at java.lang.ProcessB...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Akanksha Gupta: The error message suggests that the Python executable file specified in the configuration of the Databricks cluster cannot be found or accessed. Specifically, it seems that the Python executable file at the path "/local_disk0/python...

1 More Replies
Raghu1216
by New Contributor II
  • 1830 Views
  • 3 replies
  • 0 kudos

Issue with passing parameters to the queries in a Spark SQL temporary function

I have created a function like the one below: create function test(location STRING, designation STRING, name STRING) RETURNS TABLE (cnt INT) RETURN (SELECT CASE WHEN location = 'INDIA' THEN (SELECT COUNT(*) FROM tbl_customers where job_role = design...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Raghu Dandu: The error message suggests that the column "designation" does not exist in the table "tbl_customers". There could be several reasons for this error, such as a typo in the column name, a missing or deleted column, or a difference in the...

2 More Replies
Anonymous
by Not applicable
  • 13209 Views
  • 2 replies
  • 7 kudos

Weekly Raffle to Win Ticket to Data + AI Summit 2023

NO PURCHASE NECESSARY TO ENTER OR WIN. A PURCHASE OF ANY KIND WILL NOT INCREASE YOUR CHANCES OF WINNING. VOID WHERE PROHIBITED. We are giving away one ticket to Data + AI Summit 2023 every week ...

Latest Reply
Anonymous
Not applicable
  • 7 kudos

@Fjoraldo Mamutaj, @Aviral Bhardwaj and @Shubham Soni: we have emailed you about some clarifications on your week 1 participation. Please reply to my email.

1 More Replies
Anonymous
by Not applicable
  • 1053 Views
  • 2 replies
  • 0 kudos

Tuesday get to know each other series: Hey everyone! I'm curious to know what tech books you've been reading lately. As a community of tech en...

Tuesday get to know each other series: Hey everyone! I'm curious to know what tech books you've been reading lately. As a community of tech enthusiasts, I'm sure we all have some great recommendations to share with each other. To get the ball rolling, ...

Latest Reply
Serlal
New Contributor III
  • 0 kudos

If you are interested in distributed machine learning, I would suggest Scaling Machine Learning with Spark by Adi Polak. It is just out, so it has all the new goodies, including the latest in MLflow. I am halfway through it already but it looks like ...

1 More Replies
User16618471166
by New Contributor II
  • 4444 Views
  • 3 replies
  • 1 kudos

When I aggregate over more data, I get the below error message. I've tried multiple ways of diagnosis like going back to a version I know it was w...

When I aggregate over more data, I get the below error message. I've tried multiple ways of diagnosis like going back to a version I know it was working fine (but still got the same error below). Please advise as this is a critical report where the b...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Jeff Wu: The error message suggests that there is a syntax error in a SQL statement, specifically near the end of the input. Without the full SQL statement or additional information, it's difficult to pinpoint the exact cause of the error. However,...

2 More Replies
andrew0117
by Contributor
  • 3442 Views
  • 4 replies
  • 2 kudos

Resolved! master notebook cannot find the udf registered in the child notebook

The master notebook is calling a child notebook using dbutils.notebook.run("PathToChildnotebook"). The child notebook defines a user-defined function (UDF) and registers it using spark.udf.register. However, when the child notebook finishes running a...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@andrew li: The reason why the UDF cannot be found is that when the child notebook finishes running, the Spark context that was used to define and register the UDF is destroyed. Therefore, the UDF is no longer available in the Spark context used by ...

3 More Replies
ayesharahmat
by New Contributor II
  • 2671 Views
  • 3 replies
  • 2 kudos

AutoLoader issue - java.lang.AssertionError

I am encountering the error below. I am using microbatch for Auto Loader. Please help me rectify this issue: java.lang.AssertionError: assertion failed: Invalid batch: path#36188,modificationTime#36189,length#36190L,content#36191,PROVIDER#36192,LOCATIO...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Ayesha Rahmatali: The error message you provided suggests that there is an assertion failure due to invalid batch data in your AutoLoader implementation. The error specifically indicates that the schema of the incoming data is not matching with the...

2 More Replies
Data_Engineer3
by Contributor III
  • 11126 Views
  • 4 replies
  • 5 kudos

How can I use the same Spark session from one notebook in another notebook in Databricks

I want to use the Spark session that was created in one notebook in another notebook within the same environment. For example, if an object (variable) was initialized in the first notebook, I need to use the same object in t...

Latest Reply
Manoj12421
Valued Contributor II
  • 5 kudos

You can use %run and then use the location of the notebook - %run "/folder/notebookname"

3 More Replies
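
A note on the suggestion above, with a small sketch. `%run` executes the target notebook inline, so the functions and variables it defines become available in the calling notebook. `dbutils.notebook.run`, by contrast, starts a separate run and hands back only a string. One way to return structured values through that string is JSON. The notebook path `/folder/child`, the helper names, and the payload keys below are hypothetical, and the `dbutils` calls are left as comments because they exist only inside a Databricks notebook.

```python
import json

# --- Child notebook (hypothetical path: /folder/child) ---
def build_exit_payload(row_count, table_name):
    """Serialize the values the caller needs into one JSON string."""
    return json.dumps({"row_count": row_count, "table_name": table_name})

# In the child notebook, the last cell would be:
# dbutils.notebook.exit(build_exit_payload(42, "tbl_customers"))

# --- Parent notebook ---
def parse_exit_payload(returned):
    """Turn the string returned by the child run back into a dict."""
    return json.loads(returned)

# In the parent notebook:
# returned = dbutils.notebook.run("/folder/child", 600)
# Here the returned string is simulated so the sketch runs anywhere.
returned = build_exit_payload(42, "tbl_customers")
result = parse_exit_payload(returned)
print(result["row_count"], result["table_name"])
```

Only JSON-serializable values survive this round trip; live objects such as DataFrames would need to be written to a table or temporary view instead.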
yopbibo
by Contributor II
  • 3485 Views
  • 2 replies
  • 0 kudos

pip install in cluster using web UI and extra index

In an init script or a notebook, we can run: pip install --index-url=<our private pypi url> --extra-index-url=https://pypi.org/simple <a module>. In the cluster web UI (Libraries -> Install library), we can give only the URL of our private repository, but n...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Philippe CRAVE, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answer...

1 More Replies
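
For reference, the pip command quoted in the question can be sketched in Python as below. The flags `--index-url` and `--extra-index-url` come from the original post; the private index URL `https://pypi.example.com/simple`, the package name `mymodule`, and the helper `build_pip_install` are hypothetical placeholders. The sketch only builds and prints the command; it does not install anything.

```python
import shlex
import sys

def build_pip_install(package, index_url, extra_index_url="https://pypi.org/simple"):
    """Build the argument list for the pip command described in the post."""
    return [
        sys.executable, "-m", "pip", "install",
        f"--index-url={index_url}",              # private repository (hypothetical URL)
        f"--extra-index-url={extra_index_url}",  # fallback to the public index
        package,
    ]

cmd = build_pip_install("mymodule", "https://pypi.example.com/simple")
print(shlex.join(cmd[1:]))

# To actually run it, e.g. from an init script written in Python:
# import subprocess; subprocess.run(cmd, check=True)
```

In a notebook cell, the same two flags could be passed to `%pip install` directly.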
Sabytheseeker
by New Contributor
  • 749 Views
  • 1 replies
  • 0 kudos

I just passed the Lakehouse Fundamentals Accreditation and I haven't received my badge yet and the certification seems to be messed up

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Sabyasachi Samaddar, we are going through a contract renewal with our vendor, Accredible. Once our new contract goes through you will get your badge this week. Thank you for understanding.

kris08
by New Contributor
  • 1880 Views
  • 1 replies
  • 0 kudos

Kafka consumer groups in Databricks

I was trying to find information about configuring consumer groups for a Kafka stream in Databricks. By doing so I want to parallelize the stream and load it into Databricks tables. Does Databricks handle this internally? If we can configure th...

Latest Reply
Debayan
Databricks Employee
  • 0 kudos

Hi, we have a few examples on stream processing using Kafka (https://docs.databricks.com/structured-streaming/kafka.html), there is no straight public document for Kafka consumer group creation. You can refer to https://kafka.apache.org/documentation...
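
As a rough sketch of what is configurable on the Spark side: the Structured Streaming Kafka source generates a unique consumer group for each query by default, and read parallelism follows the topic's partitions rather than a manually sized consumer group. The option names below (`kafka.bootstrap.servers`, `subscribe`, `startingOffsets`, `minPartitions`, `groupIdPrefix`) are standard Kafka source options in Apache Spark. The broker address, topic name, prefix, and the helper `kafka_source_options` are hypothetical placeholders.

```python
def kafka_source_options(bootstrap_servers, topic, min_partitions=None, group_id_prefix=None):
    """Collect reader options for the Structured Streaming Kafka source."""
    opts = {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "latest",
    }
    if min_partitions is not None:
        # Ask Spark to split the Kafka partitions into at least this many tasks.
        opts["minPartitions"] = str(min_partitions)
    if group_id_prefix is not None:
        # Prefix of the consumer group id that Spark generates for the query.
        opts["groupIdPrefix"] = group_id_prefix
    return opts

options = kafka_source_options("broker1:9092", "events", min_partitions=8, group_id_prefix="ingest")

# Inside a Databricks notebook, where `spark` is predefined:
# df = spark.readStream.format("kafka").options(**options).load()
for key, value in sorted(options.items()):
    print(key, "=", value)
```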

Data_Analytics1
by Contributor III
  • 3254 Views
  • 4 replies
  • 2 kudos

Delta table property is not set.

I have set the Delta table property at cluster level: spark.databricks.delta.retentionDurationCheck.enabled false. When I create a new table, the retentionDurationCheck property is not shown in the table details. But when I set this with ALTER TABLE for a s...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Mahesh Chahare, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us s...

3 More Replies
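
One possible reading of the behavior described in this thread, offered as an assumption rather than a confirmed diagnosis: `spark.databricks.delta.retentionDurationCheck.enabled` is a Spark session configuration that controls a safety check on `VACUUM`, so it would not be expected to appear among a table's properties. Retention that is stored with the table is set through properties such as `delta.deletedFileRetentionDuration`. A minimal sketch that builds the SQL, with a hypothetical table name `my_table` and a hypothetical helper `set_retention_sql`:

```python
def set_retention_sql(table, interval="interval 7 days"):
    """Build an ALTER TABLE statement for a Delta retention table property."""
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES "
        f"('delta.deletedFileRetentionDuration' = '{interval}')"
    )

stmt = set_retention_sql("my_table")
print(stmt)

# Inside a Databricks notebook, where `spark` is predefined:
# spark.sql(stmt)
# The retention check itself is a session-level setting, not a table property:
# spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")
```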
Sachinbt
by New Contributor II
  • 1559 Views
  • 2 replies
  • 2 kudos

Databricks Certification Exam Got Suspended. Need help in resolving the issue

Hi Team, my Databricks exam was suspended this morning, 16th April, and it is still in the suspended state. I have raised a support request using the link below, https://help.databricks.com/s/contact-us?ReqType=training, but I haven’t received the ti...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Sachin Kumara, we are going through a contract renewal with our vendor, Accredible. Once our new contract goes through you will get your badge this week. Thank you for understanding!

1 More Replies
tytytyc26
by New Contributor II
  • 2471 Views
  • 3 replies
  • 0 kudos

Resolved! Problem with accessing element using Pandas UDF in Image Processing

Hi everyone, I have been stuck on this for a very long time. I am not very familiar with using Spark for image processing. I was trying to resize images that are loaded into a Spark DF. However, it keeps throwing an error that I am not able to access the element...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Yan Chong Tan: The error you are facing is due to the fact that you are trying to access the attribute "width" of a string object in the resize_image function. Specifically, input_dim is a string object, but you are trying to access its width attr...

2 More Replies
andrew0117
by Contributor
  • 5200 Views
  • 4 replies
  • 0 kudos

Resolved! partition on a csv file

When I use SQL code like "create table myTable (column1 string, column2 string) using csv options('delimiter' = ',', 'header' = 'true') location 'pathToCsv'" to create a table from a single CSV file stored in a folder within an Azure Data Lake contai...

Latest Reply
pvignesh92
Honored Contributor
  • 0 kudos

Hi @andrew li, when you specify a path with the LOCATION keyword, Spark will consider that to be an EXTERNAL table. So when you drop the table, your underlying data, if any, will not be cleared. So in your case, as this is an external table, your folder s...

3 More Replies
