Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hi, I'm connecting to a Databricks instance on Azure from a Windows application using the Simba ODBC driver. When running SQL statements on Delta tables, like INSERT, UPDATE, and DELETE commands using Execute, the result doesn't indicate the number of rows a...
I am running the following structured streaming Scala code in a DBR 13.3 LTS job: val query = spark.readStream.format("delta")
.option("ignoreDeletes", "true")
.option("maxFilesPerTrigger", maxEqlPerBatch)
.load(tblPath)
.writeStream
.qu...
I have "Git provider" job created and running fine on the remote git. The problem is that I have to manually trigger it. Is there a way to run the job automatically whenever a new commit to the branch? (In "Schedules & Triggers section", I can find a...
Hello, this is a question about our platform running `Databricks Runtime 11.3 LTS`. I'm running a job with multiple tasks in parallel using a shared cluster. Each task runs a dedicated Scala class within a JAR library attached as a dependency. One of the tasks fails (c...
Hi, this actually should not be marked as solved. We are having the same problem: whenever a shared job cluster crashes for some reason (generally OOM), all tasks keep failing indefinitely with the error message described above. This is ac...
I am trying to ingest data into Databricks with Kafka. I have Kafka installed on a virtual machine, where the data I need is already stored as JSON in a Kafka topic. In Databricks, I have the following code: ```df = (spark.readStream
.format("kaf...
You need to check the driver's logs while your stream is initializing. Please check the log4j output in the driver's logs. If there is an issue connecting to your Kafka broker, you will be able to see it there.
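For reference, a minimal working Kafka read in PySpark usually looks like the sketch below; the broker address, topic name, schema, and paths are placeholders, and `spark` is the notebook's SparkSession:

```
# Minimal sketch of reading JSON messages from Kafka into a Delta table.
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType

# Assumed message schema; adjust to your actual JSON payload.
schema = StructType([
    StructField("id", StringType()),
    StructField("payload", StringType()),
])

df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "<vm-ip>:9092")   # must be reachable from the cluster
      .option("subscribe", "<topic-name>")
      .option("startingOffsets", "earliest")
      .load()
      # Kafka delivers the message body as binary; cast it and parse the JSON.
      .select(from_json(col("value").cast("string"), schema).alias("data"))
      .select("data.*"))

(df.writeStream
   .format("delta")
   .option("checkpointLocation", "/tmp/checkpoints/kafka_demo")  # placeholder path
   .toTable("bronze_kafka_demo"))
```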
Avoiding ConcurrentAppendException requires a good partitioning strategy. Here my logic works without fault for the "whenMatchedUpdate" and "whenNotMatchedInsert" clauses. When using "whenNotMatchedBySourceUpdate", however, it seems that the condition doesn't isolate...
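For context, the usual way to keep concurrent merges from conflicting is to pin every clause, including the by-source one, to the partitions the batch actually touches. A hedged sketch, assuming the table is partitioned by a `region` column (table path, column names, and the source DataFrame are placeholders):

```
# Sketch: restricting a Delta merge to one partition so concurrent writers on
# other partitions do not trigger ConcurrentAppendException.
from delta.tables import DeltaTable

updates_df = spark.table("staging_updates")           # placeholder source of changes
target = DeltaTable.forPath(spark, "/mnt/delta/my_table")

(target.alias("t")
 .merge(
     updates_df.alias("s"),
     # Partition predicate plus join key: concurrent jobs on other regions won't conflict.
     "t.region = 'EU' AND t.id = s.id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 # The by-source clause needs the same partition predicate; without it the clause
 # scans rows in every partition of the target, widening the conflict window.
 .whenNotMatchedBySourceUpdate(
     condition="t.region = 'EU'",
     set={"is_active": "false"})
 .execute())
```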
Vectorized Pandas UDFs offer improved performance compared to standard PySpark UDFs by leveraging the power of Pandas and operating on entire columns of data at once, rather than row by row. They provide a more intuitive and familiar programming inter...
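For illustration, a minimal vectorized Pandas UDF sketch (the column name and conversion function are just an example):

```
# Minimal sketch of a vectorized (Pandas) UDF: the function receives a whole
# pandas Series per batch instead of one row at a time.
import pandas as pd
from pyspark.sql.functions import pandas_udf

@pandas_udf("double")
def celsius_to_fahrenheit(c: pd.Series) -> pd.Series:
    return c * 9.0 / 5.0 + 32.0

df = spark.createDataFrame([(0.0,), (100.0,)], ["temp_c"])
df.select(celsius_to_fahrenheit("temp_c").alias("temp_f")).show()
```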
Getting this error in Databricks and don't know how to solve it: OSError: [Errno 7] Argument list too long: '/dbfs/databricks/aaecz/dev/w000aaecz/etl-framework-adb/0.4.31-20230503.131701-1/etl_libraries/utils/datadog/restart_datadog.sh'. If anyone can help.
@MUA Just a friendly follow-up. Did any of the responses help you resolve your question? If it did, please mark it as best. Otherwise, please let us know if you still need help.
I am trying to troubleshoot why spill occurred during DeltaOptimizeWrite. I am running a 64-core cluster with 256 GB of RAM, which I expect to be able to handle this amount of data (see attached DAG).
You can resolve the spill to memory by increasing the shuffle partitions, but 16 GB of spill memory should not have a major impact on your job execution. Could you share more details on the actual source code that you are running?
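For reference, a hedged sketch of the usual knobs to try when a stage spills (the values are only illustrative and depend on your data volume):

```
# Illustrative only: increase shuffle parallelism so each task handles a
# smaller slice of data and is less likely to spill. 512 is an example value,
# not a recommendation for every workload.
spark.conf.set("spark.sql.shuffle.partitions", 512)

# With AQE enabled (the default on recent runtimes), Spark can coalesce
# small shuffle partitions back together after the shuffle.
spark.conf.set("spark.sql.adaptive.enabled", "true")
```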
Got the below failure on a scheduled job on an interactive cluster, and the next scheduled run executed fine. I want to know why this error occurred and how I can prevent it from happening again. And how can I debug these errors in the future? com.databricks.backend.commo...
@JKR Just a friendly follow-up. Did any of the responses help you resolve your question? If it did, please mark it as best. Otherwise, please let us know if you still need help.
Hello, I have a Git repository on Databricks with notebooks that are meant to be shared with other users. The reason these notebooks are in Git, as opposed to already being in the "Shared" workspace, is that they are continuously improved and need sepa...
Hello,
Thanks for contacting Databricks Support.
I presume you're looking to transfer files from external repositories to the Databricks workspace. I'm afraid there is currently no direct support for it.
You may consider using the REST API, which allows for...
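A hedged sketch of importing a file into the workspace via the Workspace API (host, token, and paths are placeholders):

```
# Sketch: import a notebook source file into the workspace via the Workspace API.
import base64
import os
import requests

host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]

# Read the local file and base64-encode it, as the import endpoint expects.
with open("my_notebook.py", "rb") as f:
    content = base64.b64encode(f.read()).decode("utf-8")

resp = requests.post(
    f"{host}/api/2.0/workspace/import",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "path": "/Shared/my_notebook",   # placeholder workspace destination
        "format": "SOURCE",
        "language": "PYTHON",
        "content": content,
        "overwrite": True,
    },
    timeout=30,
)
resp.raise_for_status()
```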
Hi, we are experiencing instability when executing queries using Databricks Connect. Sometimes we are unable to receive the full result set without encountering an error with the message "Driver is up but is not responsive..." When we run the same query...
Hello Team, I am trying to use the below libraries in Databricks, but they are not supported: import com.microsoft.spark.sqlanalytics; from com.microsoft.spark.sqlanalytics.Constants import Constants. Please advise the correct library names. Regards, Rohit
Hi @Rohit Kulkarni​, yes, this module is not supported in Databricks. May I know the use case behind using this library in Databricks? FYI: to access Azure Synapse from Databricks, use the Azure Synapse connector. Check the doc below: https://docs.data...
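For reference, the Azure Synapse connector is invoked through the `com.databricks.spark.sqldw` format rather than the `com.microsoft.spark.sqlanalytics` package. A hedged sketch, with the JDBC URL, temp directory, and table name as placeholders:

```
# Sketch of reading a table from Azure Synapse with the built-in connector.
df = (spark.read
      .format("com.databricks.spark.sqldw")
      .option("url", "jdbc:sqlserver://<server>.sql.azuresynapse.net:1433;database=<db>;user=<user>;password=<pwd>")
      # Staging location in ADLS used by the connector for data transfer.
      .option("tempDir", "abfss://<container>@<storage-account>.dfs.core.windows.net/tmp")
      .option("forwardSparkAzureStorageCredentials", "true")
      .option("dbTable", "dbo.my_table")
      .load())

df.show()
```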
We use a PySpark streaming listener and it is lagging by 10 hrs. Data streamed at 10 AM IST is logged at 10 PM IST. Can someone explain how the logging listener interface works?
If you're experiencing lag in a Spark Streaming application, there are several potential reasons and corresponding solutions you can try:
1. **Resource Allocation**
- **Insufficient Resources**: Make sure that you have allocated enough resources (CPU,...
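If it helps to rule out the listener itself, here is a minimal StreamingQueryListener sketch; this assumes a runtime where the Python listener API is available (Spark 3.4+), and the logging destination is just an example. Note that onQueryProgress fires when a micro-batch completes, not when the data arrived, so a backlog of batches shows up as "late" log entries:

```
# Minimal sketch of a streaming query listener that logs each micro-batch's
# progress as soon as the batch completes.
from pyspark.sql.streaming import StreamingQueryListener

class ProgressLogger(StreamingQueryListener):
    def onQueryStarted(self, event):
        print(f"Query started: {event.id}")

    def onQueryProgress(self, event):
        # event.progress carries the batch id, timestamp, and input row count.
        p = event.progress
        print(f"batch={p.batchId} timestamp={p.timestamp} rows={p.numInputRows}")

    def onQueryTerminated(self, event):
        print(f"Query terminated: {event.id}")

spark.streams.addListener(ProgressLogger())
```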
Hi Team, I am building a DLT pipeline and planning to use APPLY CHANGES from Bronze to Silver. In the Bronze table, a column has a JSON value. This value contains questions and answers as key-value pairs and can change depending on the list of questions h...
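For context, a hedged sketch of the apply_changes call in a DLT pipeline, keeping the JSON column as a raw string so its changing keys don't break the Silver schema (table names, key column, and sequence column are placeholders):

```
# Sketch of APPLY CHANGES from Bronze to Silver in a DLT pipeline.
import dlt

# Target streaming table for the CDC output.
dlt.create_streaming_table("silver_responses")

dlt.apply_changes(
    target="silver_responses",
    source="bronze_responses",       # Bronze table carrying the raw JSON column
    keys=["response_id"],            # placeholder business key
    sequence_by="ingest_ts",         # placeholder ordering column
    stored_as_scd_type=1,
)
```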