Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Karl
by New Contributor II
  • 207 Views
  • 0 replies
  • 0 kudos

DB2 JDBC Connection from Databricks cluster

Has anyone successfully connected to a DB2 database on z/OS from a Databricks cluster using a JDBC connection? I also need to specify an SSL certificate path, and I'm not sure whether I need to use an init script on the cluster to do so. Any examples would be ver...

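For anyone attempting the same, a minimal sketch of what such a read might look like, assuming the IBM JCC driver JAR is installed on the cluster; the host, database, table, credentials, and truststore path below are hypothetical, and the truststore could be placed on the cluster via an init script or a volume:

```python
# Hedged sketch: DB2 on z/OS over JDBC with SSL, using IBM JCC URL properties.
# Every connection detail below is a placeholder, not a verified value.
df = (spark.read.format("jdbc")
      .option("driver", "com.ibm.db2.jcc.DB2Driver")
      .option("url", "jdbc:db2://zos-host.example.com:446/MYDB:"
                     "sslConnection=true;"
                     "sslTrustStoreLocation=/dbfs/certs/db2-truststore.jks;"
                     "sslTrustStorePassword=changeit;")
      .option("dbtable", "MYSCHEMA.MY_TABLE")
      .option("user", "db2user")
      .option("password", "db2password")
      .load())
df.show(5)
```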
bantarobugs
by New Contributor
  • 213 Views
  • 0 replies
  • 0 kudos

Job Run failure - Azure Container does not exist

Hello, I have an ETL pipeline in Databricks that works perfectly when I execute it manually in the notebook using an all-purpose cluster. However, when I try to schedule it using a job cluster, it fails immediately with the error message: 'Azure conta...

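A common cause of this symptom is that the job cluster lacks the storage credentials the all-purpose cluster carried in its Spark config. A minimal sketch, assuming service-principal (OAuth) access to ADLS Gen2; the storage account, secret scope, and key names are hypothetical:

```python
# Hedged sketch: set ADLS Gen2 OAuth credentials on the Spark session so a
# scheduled job cluster can reach the container. All names are placeholders.
storage_account = "mystorageacct"                                   # hypothetical
client_id = dbutils.secrets.get("my-scope", "sp-client-id")         # hypothetical scope/keys
client_secret = dbutils.secrets.get("my-scope", "sp-client-secret")
tenant_id = dbutils.secrets.get("my-scope", "sp-tenant-id")

spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net", client_id)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net", client_secret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net",
               f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")
```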
ahab
by New Contributor
  • 1340 Views
  • 1 reply
  • 0 kudos

Error deploying DLT DAB: Validation failed for cluster_type, the value must be dlt (is "job")

Hello. I'm getting a cluster validation error while trying to deploy a DLT pipeline via DAB. See the attached screenshots for the config and the error. Hoping someone has run into this before and can guide me. Thanks.

Latest Reply
MohcineRouessi
New Contributor II
  • 0 kudos

Hey @ahab, were you able to solve this issue? If yes, would you mind sharing your findings? If not: looking at your cluster policy, I see that you are using dlt as the cluster type, which is the correct type for DLT pipelines, but in the resourc...

ivanychev
by Contributor II
  • 1542 Views
  • 3 replies
  • 0 kudos

Resolved! Delta table takes too long to write due to S3 full scan

DBR 14.3, Spark 3.5.0. We use AWS Glue Metastore. On August 20th some of our pipelines started timing out during writes to a Delta table. The driver spends many hours executing post-commit hooks. We write dataframes to Delta with `mode=overw...

Latest Reply
ivanychev
Contributor II
  • 0 kudos

The spark.databricks.delta.catalog.update.enabled=true setting helped, but I still don't understand why the problem started occurring. https://docs.databricks.com/en/archive/external-metastores/external-hive-metastore.html#external-apache-hive-metastore-leg...

2 More Replies
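For readers who land here with the same symptom, a minimal sketch of applying the setting from the reply above before the overwrite write; whether session scope suffices, versus setting it in the cluster's Spark config, is an assumption, and the table name is hypothetical:

```python
# Hedged sketch: apply the config the reply above reports fixed the slow
# post-commit hooks, then perform the overwrite write the post describes.
spark.conf.set("spark.databricks.delta.catalog.update.enabled", "true")

(df.write.format("delta")
   .mode("overwrite")
   .saveAsTable("analytics.events"))  # hypothetical target table
```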
bhakti
by New Contributor II
  • 793 Views
  • 7 replies
  • 0 kudos

Databricks job getting stuck at saveAsTable

I am trying to write logs to a Delta table, but after running for some time the job gets stuck at saveAsTable. Traceback (most recent call last): File "/databricks/spark/python/pyspark/errors/exceptions.py", line 228, in deco: return f(*a, **kw) File ...

Latest Reply
ivanychev
Contributor II
  • 0 kudos

Hey @bhakti! Please provide the full stack trace / error message. Your log doesn't give any strong clue; a failure during write can occur for various reasons.

6 More Replies
Amodak91
by New Contributor II
  • 500 Views
  • 1 reply
  • 4 kudos

Resolved! How to use a dataframe created in one notebook from another, without writing it anywhere?

I have a large notebook and want to divide it into multiple notebooks and use Databricks Jobs to run them in parallel. However, one of the notebooks uses a dataframe from one of the other notebooks, so it has to run downstream of them. Now, since...

Latest Reply
menotron
Valued Contributor
  • 4 kudos

Hi @Amodak91, you could use the %run magic command from within the downstream notebook to call the upstream notebook, thus having it run in the same context, with all its variables accessible, including the dataframe, without needing to persist it....

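For reference, a minimal sketch of the pattern from the reply above, as two cells of the downstream notebook; the notebook path and the dataframe name df are hypothetical, and %run must sit alone in its own cell:

```python
# Cell 1 (hypothetical path): run the upstream notebook in this context.
%run ./upstream_notebook

# Cell 2: everything the upstream notebook defined is now in scope,
# including its dataframe, without persisting it anywhere.
display(df.limit(10))
```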
JaviPA
by New Contributor
  • 323 Views
  • 1 reply
  • 0 kudos

Spark BigQuery Connector Availability

Hello, we are trying to use this library (https://github.com/GoogleCloudDataproc/spark-bigquery-connector) to read BigQuery data from a Databricks cluster on Azure. Could someone confirm whether this library is fully available and supported on Databricks? ...

Latest Reply
szymon_dybczak
Contributor III
  • 0 kudos

Hi @JaviPA, the documentation refers to the library you're going to use: GitHub - GoogleCloudDataproc/spark-bigquery-connector: BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables. Also, it...

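For completeness, a minimal sketch of a BigQuery read via the connector, assuming the library is installed on the cluster and GCP credentials are configured; the project, dataset, and table names are hypothetical:

```python
# Hedged sketch: read a BigQuery table into a Spark DataFrame.
df = (spark.read.format("bigquery")
      .option("parentProject", "my-gcp-project")  # hypothetical billing project
      .option("table", "my-gcp-project.my_dataset.my_table")
      .load())
df.printSchema()
```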
RohitKulkarni
by Contributor II
  • 1319 Views
  • 11 replies
  • 6 kudos

Partial upload of a 1.2 GB file

Hello Team, I have a 1.2 GB file in txt format. I am trying to upload the data into an MS SQL Server database table, but only 10% of the data gets uploaded. Example: total records in the file: 51303483; number of records inserted: 10224430. I am usi...

Latest Reply
RohitKulkarni
Contributor II
  • 6 kudos

There were access lines in the document; because of them, the data loaded only partially. Thanks for the support.

10 More Replies
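For anyone debugging a similar partial load, a minimal sketch that quarantines lines the parser cannot handle instead of silently dropping them, then writes to SQL Server over JDBC; the delimiter, paths, and connection details are assumptions:

```python
# Hedged sketch: load the text file with a bad-records quarantine (a
# Databricks CSV/JSON reader option), then bulk-write over JDBC.
# All names below are placeholders.
raw = (spark.read
       .option("header", "true")
       .option("badRecordsPath", "/mnt/landing/bad_records")  # unparsable lines land here
       .csv("/mnt/landing/bigfile.txt", sep="|"))             # hypothetical path/delimiter

(raw.write.format("jdbc")
    .option("url", "jdbc:sqlserver://sqlhost:1433;databaseName=mydb")
    .option("dbtable", "dbo.target_table")
    .option("user", "sqluser").option("password", "sqlpassword")
    .mode("append")
    .save())
```

Comparing the source row count against the table row count plus the quarantined lines should reveal where the missing 90% went.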
ahen
by New Contributor
  • 216 Views
  • 0 replies
  • 0 kudos

Deployed DABs job via GitLab CI/CD; it is creating duplicate jobs.

We had an error in the DABs deploy, and subsequent retries resulted in a locked state. As suggested in the logs, we used the --force-lock option and the deploy succeeded. However, it created duplicate jobs for all assets in the bundle instead of updating the...

akihiko
by New Contributor III
  • 3082 Views
  • 4 replies
  • 1 kudos

Resolved! Attach notebook to cluster via REST API

Is it possible to attach a notebook to a cluster and run it via the REST API? The closest approach I have found is to run a notebook, export the results (HTML!) and import it into the workspace again, but this does not allow us to retain the original ex...

Latest Reply
baert23
New Contributor II
  • 1 kudos

I'm looking for a way to programmatically copy a notebook in Databricks using the workspace/export and workspace/import APIs. Once the notebook is copied, I want to automatically attach it to a specific cluster using its cluster ID. The challenge is ...

3 More Replies
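For the archive, a minimal sketch of the nearest documented pattern: submitting a one-time notebook run against an existing cluster via the Jobs API. This runs the notebook as a job run rather than attaching it in the notebook UI; the host, token, cluster ID, and notebook path are hypothetical:

```python
# Hedged sketch: POST /api/2.1/jobs/runs/submit with an existing_cluster_id.
import requests

HOST = "https://adb-0000000000000000.0.azuredatabricks.net"  # hypothetical workspace
TOKEN = "dapi-..."                                           # hypothetical PAT

resp = requests.post(
    f"{HOST}/api/2.1/jobs/runs/submit",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "run_name": "notebook-on-existing-cluster",
        "tasks": [{
            "task_key": "nb",
            "existing_cluster_id": "0000-000000-abcd1234",             # hypothetical
            "notebook_task": {"notebook_path": "/Users/me/my_notebook"},
        }],
    },
)
resp.raise_for_status()
print("run_id:", resp.json()["run_id"])
```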
Rishabh-Pandey
by Esteemed Contributor
  • 456 Views
  • 2 replies
  • 1 kudos

Resolved! The Latest Improvements to Databricks Workflows

What's new in Workflows?  @Sujitha @Retired_mod

Latest Reply
Rishabh-Pandey
Esteemed Contributor
  • 1 kudos

@Sujitha I am happy to see workflows maturing day by day; this is going to be a game changer for the market. I am also very excited about the upcoming feature, Lakeflow.

1 More Reply
VIRALKUMAR
by Contributor II
  • 925 Views
  • 2 replies
  • 0 kudos

How to Determine the Cost for Each Query Run Against SQL Warehouse Serverless?

Hello everyone. First of all, I would like to thank Databricks for enabling system tables for customers. It does help a lot. I am working on a cost optimization topic, particularly SQL warehouse serverless. I am not sure whether all of you have tried system...

Latest Reply
katefray
Databricks Employee
  • 0 kudos

Hey VIRALKUMAR, I recommend using the billing usage system table to find total DBUs by SKU (SQL) and the pricing system table to find the appropriate price. You can use the sample queries in those pages to get started. Hope that's helpful!

1 More Reply
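Following the reply above, a minimal sketch that joins billing usage to list prices to estimate serverless SQL spend per day; the SKU filter and the price-validity join condition are assumptions to verify against the system-table docs:

```python
# Hedged sketch: estimated daily cost by SKU from the billing system tables.
cost_df = spark.sql("""
    SELECT u.usage_date,
           u.sku_name,
           SUM(u.usage_quantity * p.pricing.default) AS estimated_cost
    FROM system.billing.usage u
    JOIN system.billing.list_prices p
      ON u.sku_name = p.sku_name
     AND u.usage_start_time >= p.price_start_time
     AND (p.price_end_time IS NULL OR u.usage_start_time < p.price_end_time)
    WHERE u.sku_name LIKE '%SQL%'
      AND u.sku_name LIKE '%SERVERLESS%'   -- hypothetical serverless SQL filter
    GROUP BY u.usage_date, u.sku_name
    ORDER BY u.usage_date
""")
cost_df.show()
```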
CURIOUS_DE
by New Contributor III
  • 239 Views
  • 0 replies
  • 1 kudos

A Surprising Finding in Delta Live Tables

While DLT has some powerful features, I found myself doing a double-take when I realized it doesn’t natively support hard deletes. Instead, it leans on a delete flag identifier to manage these in the source table. A bit surprising for a tool of its c...

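For context, a minimal sketch of the delete-flag pattern the post describes, using DLT's apply_changes; the source feed, key, sequence column, and operation flag names are hypothetical:

```python
# Hedged sketch: propagate deletes from a CDC feed via a delete-flag column.
import dlt
from pyspark.sql.functions import expr

dlt.create_streaming_table("customers")

dlt.apply_changes(
    target="customers",
    source="customers_cdc",                         # hypothetical CDC source view
    keys=["customer_id"],                           # hypothetical key column
    sequence_by="event_ts",                         # hypothetical ordering column
    apply_as_deletes=expr("operation = 'DELETE'"),  # the delete-flag identifier
)
```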
mangel
by New Contributor III
  • 6325 Views
  • 6 replies
  • 3 kudos

Resolved! Delta Live Tables pivot error

I'm facing an error in Delta Live Tables when I want to pivot a table. The error is shown in the attached image, and the code to replicate it is the following: import pandas as pd import pyspark.sql.functions as F pdf = pd.DataFrame({"A": ["foo", "foo", "f...

Latest Reply
Khalil
Contributor
  • 3 kudos

The DLT documentation says that "pivot" is not supported in DLT, but I noticed that if you want the pivot function to work you have to do one of the following things: apply the pivot in your first dlt.view + the config "spark.databricks.d...

5 More Replies
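To make the first workaround concrete, a minimal sketch of pivoting inside a dlt.view that feeds a table; the source table and columns are hypothetical, and the config name the reply mentions is truncated above, so it is not reproduced here:

```python
# Hedged sketch: keep the pivot in a batch dlt.view rather than a streaming table.
import dlt
import pyspark.sql.functions as F

@dlt.view()
def pivoted_view():
    src = spark.read.table("source_table")  # hypothetical batch source
    return src.groupBy("A").pivot("B").agg(F.sum("C"))

@dlt.table()
def pivoted():
    return dlt.read("pivoted_view")
```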
