Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Has anyone successfully connected to a DB2 database on z/OS from a Databricks cluster using a JDBC connection? I also need to specify an SSL certificate path, and I'm not sure if I need to use an init script on the cluster to do so. Any examples would be ver...
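For illustration, a hedged sketch of what such a read might look like, assuming the IBM JDBC driver jar is installed on the cluster and an init script has already copied a Java trust store to a DBFS path; the host, database, paths, and secret names below are placeholders, not from the original post:

    # Sketch only: DB2 on z/OS over JDBC with SSL. sslConnection /
    # sslTrustStoreLocation are IBM driver connection properties; every
    # name and path here is an assumption.
    df = (spark.read.format("jdbc")
          .option("url", "jdbc:db2://db2host.example.com:446/MYDB:"
                         "sslConnection=true;"
                         "sslTrustStoreLocation=/dbfs/certs/db2_truststore.jks;"
                         "sslTrustStorePassword=changeit;")
          .option("driver", "com.ibm.db2.jcc.DB2Driver")
          .option("dbtable", "MYSCHEMA.MYTABLE")
          .option("user", dbutils.secrets.get("my-scope", "db2-user"))
          .option("password", dbutils.secrets.get("my-scope", "db2-password"))
          .load())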
Hello, I have an ETL pipeline in Databricks that works perfectly when I execute it manually in the notebook using an all-purpose cluster. However, when I try to schedule it using a job cluster, it fails immediately with the error message: 'Azure conta...
Hello. I'm getting a cluster validation error while trying to deploy a DLT pipeline via DAB. See attached screenshots for the config and error. Hoping someone has run into this before and can guide me. Thanks.
Hey @ahab, were you able to solve this issue? If yes, would you mind sharing your findings? If not: looking at your cluster policy, I see that you are using dlt as the cluster type, which is the correct type for DLT pipelines, but in the resourc...
DBR 14.3, Spark 3.5.0. We use AWS Glue Metastore. On August 20th some of our pipelines started timing out during writes to a Delta table. We're experiencing many hours of the driver executing post-commit hooks. We write dataframes to Delta with `mode=overw...
The spark.databricks.delta.catalog.update.enabled=true setting helped, but I still don't understand why the problem started to occur. https://docs.databricks.com/en/archive/external-metastores/external-hive-metastore.html#external-apache-hive-metastore-leg...
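For anyone landing here, the setting mentioned above can be applied in the cluster's Spark config or at session level; a minimal session-level sketch:

    # Session-level form of the setting referenced in the reply above.
    spark.conf.set("spark.databricks.delta.catalog.update.enabled", "true")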
I am trying to write logs to a Delta table, but after running for some time the job is getting stuck at saveAsTable.
Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/errors/exceptions.py", line 228, in deco
    return f(*a, **kw)
  File ...
Hey @bhakti! Please provide the full stack trace / error message. Your log doesn't give any strong clue; a failure during write can occur for various reasons.
I have a large notebook and want to divide it into multiple notebooks and use Databricks Jobs to run them in parallel. However, one of the notebooks uses a dataframe produced by another, so it has to run downstream of that one. Now, since...
Hi @Amodak91, you could use the %run magic command from within the downstream notebook to call the upstream notebook, thus having it run in the same context with all its variables accessible, including the dataframe, without needing to persist it....
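As a rough sketch of that pattern, the downstream notebook's first cell would contain only the %run command (the path is hypothetical):

    %run ./upstream_notebook

After that cell executes, anything the upstream notebook defined, e.g. a dataframe named upstream_df (also hypothetical), is directly usable in later cells of the downstream notebook, such as display(upstream_df).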
Hello, we are trying to use this library (https://github.com/GoogleCloudDataproc/spark-bigquery-connector) to read BigQuery data from a Databricks cluster in Azure. Could someone confirm if this library is fully available and supported on Databricks? ...
Hi @JaviPA, the documentation refers to the library you're going to use: GitHub - GoogleCloudDataproc/spark-bigquery-connector: BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables. Also, it...
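For context, a minimal read sketch with that connector, assuming the connector jar is installed on the cluster and GCP service-account credentials are already configured; the project, dataset, and table names are placeholders:

    # Sketch: read a BigQuery table into a Spark DataFrame via the
    # spark-bigquery-connector (all identifiers below are assumptions).
    df = (spark.read.format("bigquery")
          .option("parentProject", "my-gcp-project")
          .option("table", "my-gcp-project.my_dataset.my_table")
          .load())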
Hello everyone, I am using UCX for the migration to Unity, and I've noticed that re-running the assessment does not update the dashboards with jobs that are incompatible with Unity. To get the dashboards updated, I had to uninstall and reinstall UCX, ...
Hello team, I have a 1.2 GB file in txt format. I am trying to upload the data into an MS SQL Server database table, but only 10% of the data gets uploaded. Example: total records in the file: 51303483; number of records inserted: 10224430. I am usi...
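For reference, a hedged sketch of a Spark JDBC write to SQL Server; for loads of this size, the JDBC batchsize and the dataframe's partition count are the usual tuning knobs. The URL, table, secret names, and numbers below are placeholders:

    # Sketch only: bulk JDBC append to SQL Server with an explicit batch size.
    (df.repartition(16)  # number of parallel writers is an assumption
       .write.format("jdbc")
       .option("url", "jdbc:sqlserver://myhost:1433;databaseName=mydb")
       .option("dbtable", "dbo.target_table")
       .option("user", dbutils.secrets.get("my-scope", "sql-user"))
       .option("password", dbutils.secrets.get("my-scope", "sql-password"))
       .option("batchsize", 10000)
       .mode("append")
       .save())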
We had an error in a DABs deploy, and subsequent retries resulted in a locked state. As suggested in the logs, we used the --force-lock option and the deploy succeeded. However, it created duplicate jobs for all assets in the bundle instead of updating the...
Is it possible to attach a notebook to a cluster and run it via the REST API? The closest approach I have found is to run a notebook, export the results (HTML!) and import it into the workspace again, but this does not allow us to retain the original ex...
I'm looking for a way to programmatically copy a notebook in Databricks using the workspace/export and workspace/import APIs. Once the notebook is copied, I want to automatically attach it to a specific cluster using its cluster ID. The challenge is ...
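A hedged sketch of that flow against the documented REST endpoints (workspace export/import, then a one-time run on an existing cluster via jobs/runs/submit); the host, token, notebook paths, and cluster ID are placeholders:

    # Sketch only: copy a notebook, then run the copy on a given cluster.
    import requests

    host = "https://<workspace-host>"
    headers = {"Authorization": "Bearer <token>"}

    # 1. Export the source notebook (content comes back base64-encoded).
    src = requests.get(f"{host}/api/2.0/workspace/export", headers=headers,
                       params={"path": "/Users/me/source_nb",
                               "format": "SOURCE"}).json()

    # 2. Import it under a new path.
    requests.post(f"{host}/api/2.0/workspace/import", headers=headers, json={
        "path": "/Users/me/copied_nb",
        "format": "SOURCE",
        "language": "PYTHON",
        "content": src["content"],
        "overwrite": True,
    })

    # 3. "Attach and run": submit a one-time run on the existing cluster.
    requests.post(f"{host}/api/2.1/jobs/runs/submit", headers=headers, json={
        "run_name": "run copied notebook",
        "tasks": [{
            "task_key": "main",
            "existing_cluster_id": "<cluster-id>",
            "notebook_task": {"notebook_path": "/Users/me/copied_nb"},
        }],
    })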
@Sujitha I am happy to see workflows maturing day by day; this is going to be a game changer for the market. I am also very excited about the upcoming feature, Lakeflow.
Hello everyone. First of all, I would like to thank Databricks for enabling system tables for customers; it helps a lot. I am working on the topic of cost optimization, particularly serverless SQL warehouses. I am not sure all of you have tried system...
Hey VIRALKUMAR,
I recommend using the billing usage system table to find total DBUs by SKU (SQL) and the pricing system table to find the appropriate price. You can use the sample queries in those pages to get started.
Hope that's helpful!
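As a starting point, a hedged sketch of that join against the documented system.billing schema; the SKU filter and the use of the list price column are assumptions to verify against your own account:

    # Sketch: estimated list cost for serverless SQL by SKU from system tables.
    spark.sql("""
        SELECT u.sku_name,
               SUM(u.usage_quantity)                     AS total_dbus,
               SUM(u.usage_quantity * p.pricing.default) AS est_list_cost
        FROM system.billing.usage u
        JOIN system.billing.list_prices p
          ON u.sku_name = p.sku_name
         AND u.usage_start_time >= p.price_start_time
         AND (p.price_end_time IS NULL OR u.usage_start_time < p.price_end_time)
        WHERE u.sku_name LIKE '%SERVERLESS_SQL%'
        GROUP BY u.sku_name
    """).show()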
While DLT has some powerful features, I found myself doing a double-take when I realized it doesn’t natively support hard deletes. Instead, it leans on a delete flag identifier to manage these in the source table. A bit surprising for a tool of its c...
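For readers unfamiliar with the pattern being described, a rough sketch using DLT's apply_changes CDC API, where a flag column on the source marks rows to delete; the source name, keys, and column names are assumptions:

    # Sketch: flag-based deletes with DLT apply_changes (names hypothetical).
    import dlt
    from pyspark.sql.functions import expr

    dlt.create_streaming_table("target")

    dlt.apply_changes(
        target="target",
        source="cdc_source",                           # upstream CDC feed
        keys=["id"],
        sequence_by="event_ts",
        apply_as_deletes=expr("operation = 'DELETE'")  # the delete flag
    )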
I'm facing an error in Delta Live Tables when I want to pivot a table. The error is the following (screenshot not captured here), and the code to replicate it is the following:
import pandas as pd
import pyspark.sql.functions as F
pdf = pd.DataFrame({"A": ["foo", "foo", "f...
The DLT documentation says that "pivot" is not supported in DLT, but I noticed that if you want the pivot function to work, you have to do one of the following things: apply the pivot in your first dlt.view + the config "spark.databricks.d...
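A rough sketch of the shape of that workaround, doing the pivot inside a dlt.view; the exact Spark config flag mentioned in the reply is truncated in the post, so it is deliberately omitted here, and the source and column names are placeholders:

    # Sketch only: pivot inside a DLT view, then materialize as a table.
    import dlt
    import pyspark.sql.functions as F

    @dlt.view
    def pivoted_view():
        src = spark.read.table("source_table")  # hypothetical source
        return src.groupBy("A").pivot("B").agg(F.first("C"))

    @dlt.table
    def pivoted_table():
        return dlt.read("pivoted_view")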