Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by Sricharan05 (New Contributor III)
  • 353 Views
  • 4 replies
  • 2 kudos

Databricks Certified Associate Developer exam got suspended. Requesting support for the same.

Request #00482566. Hello Team, I had a pathetic experience while attempting my 1st Databricks certification. I had some network issues and lighting issues. My test was stopped in the middle and I was connected with the proctor for review. As r...

Latest Reply
Sricharan05 (New Contributor III)

Hi @Kaniz @Sujitha @APadmanabhan @Cert-Team @Cert-Bricks @Cert-TeamOPS, I have been waiting for more than 40 hours since I raised my ticket. I still haven't received any response from the support team or anyone else. Can you please escalate this issue...

3 More Replies
by zero234 (New Contributor III)
  • 2283 Views
  • 3 replies
  • 2 kudos

Resolved! I have created a materialized view using a Delta Live Tables pipeline and it is not appending data

I have created a materialized view using a Delta Live Tables pipeline; for some reason it is overwriting data every day. I want it to append data to the table instead of doing a full refresh. Suppose I had 8 million records in the table and if I run the...

Latest Reply
kulkpd (Contributor)

@zero234, adding some suggestions based on answers from @Kaniz_Fatma. Important point to note here: "To define a materialized view in Python, apply @table to a query that performs a static read against a data source. To define a streaming table, apply...
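A minimal DLT sketch of that distinction, assuming a Delta source named source_table (all names here are hypothetical): a @dlt.table over a static read is a materialized view and is fully recomputed on each update, while one over a streaming read is a streaming table and appends incrementally.

    import dlt

    # Materialized view: static read, fully recomputed on every pipeline update
    @dlt.table(name="daily_summary_mv")
    def daily_summary_mv():
        return spark.read.table("source_table")

    # Streaming table: streaming read, processes only new records (appends)
    @dlt.table(name="events_streaming")
    def events_streaming():
        return spark.readStream.table("source_table")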

2 More Replies
by alonisser (Contributor)
  • 375 Views
  • 2 replies
  • 1 kudos

Since moving to DBR 14.3 with Python jobs, I don't see the stack trace for exceptions

The logs don't even contain the error line I see (I downloaded all log files from the UI and checked them). How can I see the stack trace? It's essential for debugging certain issues.

Latest Reply
alonisser (Contributor)

Thanks for the answer, but I fail to see what it has to do with my question. It's not a "general Python error"; I run lots of Python jobs on Databricks clusters and know how to run Python jobs and dependencies. I'm pointing to a specific issue ...
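Until the underlying behavior is resolved, one workaround sketch (my suggestion, not from the thread) is to catch exceptions at the job entry point and print the full traceback explicitly so it lands in the driver log:

    import logging
    import traceback

    logger = logging.getLogger("job")

    def main():
        raise ValueError("example failure")  # stand-in for the real job logic

    if __name__ == "__main__":
        try:
            main()
        except Exception:
            # Write the full stack trace to the driver log, then re-raise
            # so the job run still fails and surfaces the error.
            logger.error("Job failed:\n%s", traceback.format_exc())
            raise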

1 More Replies
by dzsuzs (New Contributor II)
  • 347 Views
  • 2 replies
  • 1 kudos

OOM Issue in Streaming with foreachBatch()

I have a stateless streaming application that uses foreachBatch. This function executes between 10 and 400 times each hour based on custom logic. The logic within foreachBatch includes: collect() on very small DataFrames (a few megabytes) --> driver mem...

Latest Reply
xorbix_rshiva (New Contributor III)

From the information you provided, your issue might be resolved by setting a watermark on the streaming DataFrame. The purpose of watermarks is to set a maximum time for records to be retained in state. Without a watermark, records in your state will...
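A minimal watermark sketch for context; the source table, timestamp column, sink table, and checkpoint path are all hypothetical:

    # Bound the state by declaring how late events may arrive
    stream = (spark.readStream.table("events")
                   .withWatermark("event_time", "10 minutes"))

    def process_batch(batch_df, batch_id):
        # Keep per-batch work small; avoid holding references to old DataFrames
        batch_df.write.mode("append").saveAsTable("events_out")

    (stream.writeStream
           .foreachBatch(process_batch)
           .option("checkpointLocation", "/tmp/checkpoints/events")
           .start())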

1 More Replies
by shanebo425 (New Contributor III)
  • 562 Views
  • 2 replies
  • 0 kudos

Saving Widgets to Git

We use Databricks widgets in our Python notebooks to pass parameters in jobs, but also for when we are running the notebooks manually (outside of a job context) for various reasons. We're a small team, but I've noticed that when I create a notebook an...

Latest Reply
daniel_sahal (Esteemed Contributor)

@shanebo425 You can add your widgets to the code, e.g.:

    dbutils.widgets.text("test", "")
    dbutils.widgets.get("test")

Remember that the cell with the widget needs to be run for the widget to actually be visible in the notebook.
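Spelling out the pattern (the widget names and defaults below are hypothetical): widgets defined in a code cell are part of the notebook source, so they get versioned in Git along with everything else.

    # First cell of the notebook: widget definitions live in code and are committed to Git
    dbutils.widgets.text("env", "dev", "Environment")
    dbutils.widgets.dropdown("mode", "incremental", ["incremental", "full"], "Load mode")

    # Later cells read the values; job runs can override them via notebook parameters
    env = dbutils.widgets.get("env")
    mode = dbutils.widgets.get("mode")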

1 More Replies
by avrm91 (New Contributor III)
  • 5127 Views
  • 3 replies
  • 1 kudos

Resolved! XML DLT Autoloader - Ingestion of XML Files

I want to ingest multiple XML files with varying but similar structures without defining a schema. For example: <?xml version="1.0" encoding="ISO-8859-1"?> <LIEFERUNG> <ABSENDER> <RZLZ>R00000001</RZLZ> <NAME>Informatik GmbH </NAME> <ST...

Latest Reply
avrm91 (New Contributor III)

@Kaniz_Fatma Thanks a lot. I found an issue in the from_xml function. I posted above: SELECT from_xml(CONCAT('<ABSENDER>', ABSENDER, '</ABSENDER>'), schema_of_xml(' <ABSENDER> <RZLZ>R00000001</RZLZ> <NAME>Informatik GmbH</NAME> <STRASSE>M...
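For reference, a minimal Auto Loader sketch for schema-less XML ingestion, assuming a runtime with native XML support; the paths and the rowTag value are hypothetical:

    df = (spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "xml")
            .option("rowTag", "ABSENDER")  # XML element treated as one row
            .option("cloudFiles.inferColumnTypes", "true")
            .option("cloudFiles.schemaLocation", "/tmp/schemas/absender")
            .load("/landing/xml/"))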

2 More Replies
by daindana (New Contributor III)
  • 3880 Views
  • 8 replies
  • 3 kudos

Resolved! How to preserve my database when the cluster is terminated?

Whenever my cluster is terminated, I lose my whole database (I'm not sure if it's related, but I made those databases with the Delta format). And since the cluster is terminated after 2 hours of inactivity, I wake up with no database every morning. I don't wa...

Latest Reply
dhpaulino (New Contributor II)

As the files are still in DBFS, you can just recreate the references to your tables and continue working, with something like this:

    db_name = "mydb"
    from pathlib import Path
    path_db = f"dbfs:/user/hive/warehouse/{db_name}.db/"
    tables_dirs = dbutils.fs....
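A sketch of what the full pattern presumably looks like (the listing loop and the CREATE TABLE statement are my reconstruction, not the original code):

    db_name = "mydb"
    path_db = f"dbfs:/user/hive/warehouse/{db_name}.db/"

    spark.sql(f"CREATE DATABASE IF NOT EXISTS {db_name}")
    # Re-register each table directory as a reference to the existing Delta files
    for table_dir in dbutils.fs.ls(path_db):
        table_name = table_dir.name.rstrip("/")
        spark.sql(
            f"CREATE TABLE IF NOT EXISTS {db_name}.{table_name} "
            f"USING DELTA LOCATION '{table_dir.path}'"
        )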

7 More Replies
by lnsnarayanan (New Contributor II)
  • 7203 Views
  • 8 replies
  • 11 kudos

Resolved! I cannot see the Hive databases or tables once I terminate the cluster and use another cluster.

I am using Databricks Community Edition for learning purposes. I created some Hive-managed tables through Spark SQL as well as with the df.saveAsTable option. But when I connect to a new cluster, "SHOW DATABASES" only returns the default database....

Latest Reply
dhpaulino (New Contributor II)

As the files are still in DBFS, you can just recreate the references to your tables and continue working, with something like this (the same approach sketched in the previous thread):

    db_name = "mydb"
    from pathlib import Path
    path_db = f"dbfs:/user/hive/warehouse/{db_name}.db/"
    tables_dirs = dbutils.fs.l...

7 More Replies
by DBUser2 (New Contributor II)
  • 337 Views
  • 2 replies
  • 0 kudos

Simba Spark ODBC driver .NET core compatibility

Hi, is the Simba Spark ODBC driver (2.08.00.1002) compatible with .NET Core?

Latest Reply
NandiniN (Honored Contributor)

Hi @DBUser2, I checked the official doc https://www.databricks.com/spark/odbc-drivers-download; we currently provide the Simba Apache Spark ODBC Connector 2.8.0. In the archives it is also available back to 2.6.15: https://www.databricks.com/spark/odbc-d...

1 More Replies
by v01d (New Contributor III)
  • 927 Views
  • 2 replies
  • 0 kudos

Databricks Auto Loader authorization exception

Hello, I'm trying to run Auto Loader with the notifications=true option (Azure ADLS) and get an unclear authorization error. The exception log is attached. It looks like all the required permissions are granted to the service principal:

[Attachment: Screenshot_2024-06-01_at_14_32_06.png]
Latest Reply
Kaniz_Fatma (Community Manager)

Hi @v01d, there can be three probable causes. The service principal used for authentication may lack the necessary permissions. Confirm that the service principal has the required permissions on the ADLS; specifically, ensure that it has Read permissio...
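For orientation, a minimal file-notification sketch with the service-principal options Auto Loader expects on Azure (every value below is a placeholder). Note that notification mode additionally needs rights to create Event Grid subscriptions and storage queues, which is a common source of this kind of error:

    df = (spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "json")
            .option("cloudFiles.useNotifications", "true")  # event-driven file discovery
            .option("cloudFiles.clientId", "<sp-client-id>")
            .option("cloudFiles.clientSecret", "<sp-client-secret>")
            .option("cloudFiles.tenantId", "<tenant-id>")
            .option("cloudFiles.subscriptionId", "<subscription-id>")
            .option("cloudFiles.resourceGroup", "<resource-group>")
            .load("abfss://container@account.dfs.core.windows.net/landing/"))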

1 More Replies
by AkasBala (New Contributor III)
  • 1714 Views
  • 3 replies
  • 0 kudos

Primary key not working as expected on Unity Catalog Delta tables

Hi @Chetan Kardekar, I noticed that you had commented on primary keys on Delta tables. Has that feature already been released in Databricks Premium? I have a Unity Catalog and I created a table with a primary key, though it doesn't act like a primary key...

Latest Reply
Anonymous (Not applicable)

Hi @Bala Akas, hope all is well! Just wanted to check in whether you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!
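For context, the likely explanation (my understanding, not confirmed in the thread): primary key constraints on Unity Catalog tables are informational only; they document intent for tools and the optimizer but are not enforced, so duplicate keys are not rejected. A hypothetical example (catalog, schema, and table names made up):

    spark.sql("""
        CREATE TABLE main.default.customers (
            id BIGINT NOT NULL,
            name STRING,
            CONSTRAINT customers_pk PRIMARY KEY (id)
        )
    """)
    # The constraint is recorded in the catalog, but inserting two rows
    # with the same id still succeeds:
    spark.sql("INSERT INTO main.default.customers VALUES (1, 'a'), (1, 'b')")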

2 More Replies
by paranoid_jvm (New Contributor)
  • 653 Views
  • 1 reply
  • 0 kudos

Spark tasks getting stuck on one executor

Hi All, I am running a Spark job on a cluster with 8 executors with 8 cores each. The job involves execution of a UDF. The job processes a few hundred thousand rows. When I run the job, each executor is assigned 8 tasks. Usually the job succeeds in les...

Latest Reply
Kaniz_Fatma (Community Manager)

Hi @paranoid_jvm, timeout exceptions can occur when the executor is under memory constraints or facing out-of-memory (OOM) issues while processing data. This can impact the garbage-collection process, causing further delays. Consider increasing the ex...
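Another common mitigation when a UDF workload pins one executor is to rebalance partitions first; a sketch (the UDF, column name, and partition count below are hypothetical):

    from pyspark.sql import functions as F

    # Spread rows evenly across the cluster before the expensive UDF,
    # so one skewed partition cannot pin all the work on a single executor.
    balanced = df.repartition(64)  # roughly executors * cores; tune for the workload
    result = balanced.withColumn("scored", my_udf(F.col("payload")))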

by Erik (Valued Contributor II)
  • 4873 Views
  • 9 replies
  • 10 kudos

Resolved! How to use dbx for local development.

Databricks Connect is a program which allows you to run Spark code locally, while the actual execution happens on a Spark cluster. Notably, it allows you to debug and step through the code locally in your own IDE. Quite useful. But it is now being...

Latest Reply
FeliciaWilliam (Contributor)

I found answers to my questions here
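For readers landing on this thread today: the newer Databricks Connect flow looks roughly like the sketch below, assuming databricks-connect is installed and a CLI profile is configured (the profile name is hypothetical):

    from databricks.connect import DatabricksSession

    # Build a Spark session whose queries execute remotely on a Databricks cluster
    spark = DatabricksSession.builder.profile("DEFAULT").getOrCreate()

    df = spark.range(10)
    print(df.count())  # runs remotely, but is debuggable from a local IDE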

8 More Replies
by Himanshu4 (New Contributor II)
  • 950 Views
  • 4 replies
  • 2 kudos

Inquiry Regarding Enabling Unity Catalog in Databricks Cluster Configuration via API

Dear Databricks Community, I hope this message finds you well. I am currently working on automating cluster configuration updates in Databricks using the API. As part of this automation, I am looking to ensure that Unity Catalog is enabled within ...

Latest Reply
Himanshu4 (New Contributor II)

Hi Raphael, can we fetch job details from one workspace and create a new job in another workspace with the same "job id" and configuration?
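A rough sketch of cloning a job definition across workspaces with the Jobs 2.1 REST API (hosts, tokens, and the job id below are placeholders). As far as I know, the destination workspace always assigns a fresh job_id; the configuration can be copied, but the original id cannot be forced:

    import requests

    SRC = "https://src-workspace.cloud.databricks.com"
    DST = "https://dst-workspace.cloud.databricks.com"

    # Fetch the job's settings from the source workspace
    job = requests.get(
        f"{SRC}/api/2.1/jobs/get",
        headers={"Authorization": "Bearer <src-token>"},
        params={"job_id": 12345},
    ).json()

    # Create a job with the same settings in the destination workspace;
    # the response carries the newly assigned job_id
    created = requests.post(
        f"{DST}/api/2.1/jobs/create",
        headers={"Authorization": "Bearer <dst-token>"},
        json=job["settings"],
    ).json()
    print(created["job_id"])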

3 More Replies
by fury-kata (New Contributor II)
  • 540 Views
  • 2 replies
  • 0 kudos

ModuleNotFoundError when running with foreachBatch in serverless mode

I am using notebooks to do some transformations. I installed a new whl: %pip install --force-reinstall /Workspace/<my_lib>.whl %restart_python Then I successfully import the installed lib: from my_lib.core import test. However, when I run my code with fo...

Latest Reply
Kaniz_Fatma (Community Manager)

Hi @fury-kata, make sure that the path to your custom module is correctly added to the Python path (sys.path). You mentioned installing the .whl file, so ensure that the installation path is accessible from your Databricks notebook. Verify that th...
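A workaround sketch along those lines (the library path and module names are hypothetical): foreachBatch may execute in a worker process that does not inherit the notebook's sys.path, so re-assert the path and import inside the function:

    import sys

    def process_batch(batch_df, batch_id):
        lib_path = "/Workspace/libs"  # hypothetical install location of the wheel's modules
        if lib_path not in sys.path:
            sys.path.append(lib_path)
        from my_lib.core import test  # import after the path fix, inside the function
        test(batch_df)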

1 More Replies