Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

DeepankarB
by New Contributor III
  • 7774 Views
  • 2 replies
  • 2 kudos

Resolved! Error API calling with Service Principal Secret

Hi, I am working on a Databricks workspace setup on AWS and trying to use a Service Principal to execute API calls for (CI/CD) deployment through Bitbucket. So I created a secret for the service principal and tried to test the token. The test failed with below...

Latest Reply
DeepankarB
New Contributor III
  • 2 kudos

I have been able to resolve this issue. Apparently you need to generate an access token using the service principal's client ID and client secret. saurabh18cs's solution is more relevant to Azure Databricks. I got the below link from Databricks, which provides a generic...
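
For reference, a minimal sketch of that flow, assuming the standard Databricks OAuth machine-to-machine token endpoint; the workspace URL and credentials below are placeholders:

import requests

WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"  # placeholder
CLIENT_ID = "<service-principal-client-id>"      # placeholder
CLIENT_SECRET = "<service-principal-secret>"     # placeholder

# Exchange the client ID/secret for a short-lived OAuth access token
resp = requests.post(
    f"{WORKSPACE_URL}/oidc/v1/token",
    auth=(CLIENT_ID, CLIENT_SECRET),
    data={"grant_type": "client_credentials", "scope": "all-apis"},
)
resp.raise_for_status()
token = resp.json()["access_token"]

# Use the token as a Bearer token on subsequent REST API calls
headers = {"Authorization": f"Bearer {token}"}
print(requests.get(f"{WORKSPACE_URL}/api/2.0/preview/scim/v2/Me", headers=headers).json())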

1 More Replies
gourishrivastav
by New Contributor
  • 3961 Views
  • 1 reply
  • 0 kudos

Resolved! Databricks Fundamentals Certificate

Dear Team, I have successfully completed the Databricks Fundamentals training and aced the certificate quiz with a perfect score of 200 out of 200. However, I have not yet received the certificate. Can you please let me know the expected timeline for ...

Latest Reply
Satyadeepak
Databricks Employee
  • 0 kudos

You should receive it immediately. Can you share the user ID with which you took the quiz?

krocodl
by Contributor
  • 12343 Views
  • 12 replies
  • 3 kudos

OOM while loading a lot of data through JDBC

public void bigDataTest() throws Exception {
    int rowsCount = 100_000;
    int colSize = 1024;
    int colCount = 12;
    String colValue = "'" + "x".repeat(colSize) + "'";
    String query = "select explode(s...

Data Engineering
JDBC
Out-of-memory
resource leaking
Latest Reply
-werners-
Esteemed Contributor III
  • 3 kudos

@Retired_mod any idea?

11 More Replies
KacperG
by New Contributor III
  • 3120 Views
  • 5 replies
  • 2 kudos

Resolved! Merge operation stuck on scanning files for matches

Hi, I'm executing a simple merge; however, it always gets stuck at "MERGE operation - scanning files for matches". Both Delta tables are not big - the source has about 100 MiB in 1 file and the target has 1.5 GiB across 7 files - so it should be quite a fast operation, however ...

Latest Reply
KacperG
New Contributor III
  • 2 kudos

Well, in the end, it was caused by skewed data. Document_ID was -1 for returns in sales, so a big part of the table was filled with -1 values. Adding an extra column to the merge condition solved the problem. This article helped me a lot: https://www.databrick...
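
As an illustration of that fix, a hedged sketch of a merge whose condition includes a second, better-distributed column so file scanning can prune despite the skewed Document_ID (table and column names are illustrative):

from delta.tables import DeltaTable

target = DeltaTable.forName(spark, "sales_target")  # illustrative table name

(
    target.alias("t")
    .merge(
        source_df.alias("s"),  # source_df: the small source table as a DataFrame (assumed defined)
        # Document_ID alone is skewed (-1 for all returns); the extra
        # column lets Delta narrow down which files can contain matches.
        "t.Document_ID = s.Document_ID AND t.Document_Date = s.Document_Date",
    )
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)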

4 More Replies
NagarajuBondala
by New Contributor II
  • 4425 Views
  • 1 reply
  • 1 kudos

Resolved! AI-Suggested Comments Not Appearing for Delta Live Tables Populated Tables

I'm working with Delta Live Tables (DLT) in Databricks and have noticed that AI-suggested comments for columns are not showing up for tables populated using DLT. Interestingly, this feature works fine for tables that are not populated using DLT. Is t...

Data Engineering
AI
Delta Live Tables
dlt
Latest Reply
Satyadeepak
Databricks Employee
  • 1 kudos

It's because materialized views (MVs) and streaming tables (STs) in DLT don't support ALTER, which is needed to persist those AI-generated comments.
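
One hedged workaround (not from the thread): since ALTER isn't available on DLT-managed tables, column comments can instead be declared up front in the table's schema definition; the table and columns below are illustrative.

import dlt

@dlt.table(
    comment="Orders materialized by the pipeline",
    schema="""
        order_id BIGINT COMMENT 'Unique order identifier',
        amount DOUBLE COMMENT 'Order total in USD'
    """,
)
def orders():
    # Illustrative source table; select must match the declared schema
    return spark.read.table("raw_orders").select("order_id", "amount")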

ls
by New Contributor III
  • 3321 Views
  • 3 replies
  • 1 kudos

Resolved! Change spark configs in Serverless compute clusters

Howdy! I wanted to know how I can change some Spark configs in Serverless compute. I have a base.yml file and tried placing:

spark_conf:
  - spark.driver.maxResultSize: "16g"

but I still get this error: [CONFIG_NOT_AVAILABLE] Configuration spark.driv...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

To address the memory issue in your Serverless compute environment, you can consider the following strategies. Optimize the query - filter early: ensure that you are filtering the data as early as possible in your query to reduce the amount of data b...
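
A brief sketch of that "filter early, aggregate before collecting" idea, since driver-side limits like spark.driver.maxResultSize cannot be changed on serverless compute; the table and column names are placeholders:

df = spark.read.table("events")  # placeholder table

small = (
    df.filter("event_date >= '2025-01-01'")  # push the filter as early as possible
      .groupBy("event_type")
      .count()                               # aggregate before collecting
)

result = small.toPandas()  # only the small aggregate is pulled to the driver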

2 More Replies
Uj337
by New Contributor III
  • 4300 Views
  • 8 replies
  • 0 kudos

Library installation failed for library due to user error for wheel file

Hi All, recently we implemented a change to make the Databricks workspace accessible only via a private network. After this change, we found a lot of connectivity errors, such as from Power BI to Databricks, Azure Data Factory to Databricks, etc. I was ...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi @Uj337, how are you doing today? This issue seems to be tied to the private network setup affecting access to the .whl file on DBFS. I recommend starting by ensuring the driver node has proper access to the dbfs:/Volumes/any.whl path and that al...
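
A quick hedged check along those lines, run from a notebook on the affected cluster; the volume path is a placeholder mirroring the one above:

# Fails fast if the private networking setup blocks access to the volume
display(dbutils.fs.ls("/Volumes/<catalog>/<schema>/<volume>/"))  # placeholder path

# If listing succeeds, try installing the wheel from that same location:
# %pip install /Volumes/<catalog>/<schema>/<volume>/any.whl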

7 More Replies
jordan_boles
by New Contributor II
  • 3922 Views
  • 1 reply
  • 2 kudos

Future of iceberg-kafka-connect

Databricks acquired the iceberg-kafka-connect repo this past summer. There are open issues and PRs that devs would like to address and collaborate on to improve the connector. But Databricks has not yet engaged with this community in the ~6 months si...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 2 kudos

Thanks for sharing this, @jordan_boles. Happy Data Engineering!

infinitylearnin
by New Contributor III
  • 572 Views
  • 1 reply
  • 2 kudos

Resolved! Role of Data Practitioner in AI Era

As the AI revolution takes off in 2025, there is a renewed emphasis on adopting a Data-First approach. Organizations are increasingly recognizing the need to establish a robust data foundation while preparing a skilled fleet of Data Engineers to tack...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 2 kudos

Good work, @infinitylearnin. Keep it up.

Cantheman
by New Contributor III
  • 1708 Views
  • 11 replies
  • 0 kudos

Weird workflow error - Error in run but job does not exist

Hello, I have an error. Job link (116642657143475); Job run / Task run link (74750905368136); Status: Failed; Started at: 2025-01-14 07:48:16 UTC; Duration: 2m 40s; Launched: Manually. If I try to access this job I get: The job you are looking for may have been moved or ...

Latest Reply
Cantheman
New Contributor III
  • 0 kudos

Will do. Thanks.

10 More Replies
somedeveloper
by New Contributor III
  • 1945 Views
  • 3 replies
  • 2 kudos

Resolved! Accessing Application Listening to Port Through Driver Proxy URL

Good afternoon, I have an application, Evidently, for which I am starting a dashboard service that listens on an open port. I would like to access this through the driver proxy URL, but when starting the service and accessing it, I am given a 502 B...

Latest Reply
VZLA
Databricks Employee
  • 2 kudos

Glad it helped! Thanks for confirming the solution.

2 More Replies
iamgoda
by New Contributor III
  • 8942 Views
  • 15 replies
  • 4 kudos

Databricks SQL script slow execution in workflows using serverless

I am running a very simple SQL script within a notebook, using an X-Small SQL Serverless warehouse (that is already running). The execution time differs depending on how it's run: 4s if run interactively (and through the SQL editor); 26s if run within ...

Latest Reply
iamgoce
New Contributor III
  • 4 kudos

So I was told that the Q4 date was incorrect - in fact, there is currently no ETA for when this issue will be fixed. It's considered lower priority by Databricks, as not enough customers are impacted or have raised this type of issue. I would recomm...

14 More Replies
neeth
by New Contributor III
  • 1588 Views
  • 9 replies
  • 0 kudos

Databricks Connect error

Hello, I am new to Databricks and Scala. I created a Scala application on my local machine and tried to connect to my cluster in a Databricks workspace using Databricks Connect, as per the documentation. My cluster is using Databricks Runtime version 16.0 ...

Latest Reply
saurabh18cs
Honored Contributor II
  • 0 kudos

Try this with parameters once:

def get_remote_spark(host: str, cluster_id: str, token: str) -> SparkSession:
    from databricks.connect import DatabricksSession
    return DatabricksSession.builder.remote(host=host, cluster_id=cluster_id, token=token)...
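
A hypothetical usage of that helper, assuming the truncated builder chain ends with .getOrCreate() and that the environment variables below are set:

import os

spark = get_remote_spark(
    host=os.environ["DATABRICKS_HOST"],
    cluster_id=os.environ["DATABRICKS_CLUSTER_ID"],
    token=os.environ["DATABRICKS_TOKEN"],
)
spark.range(5).show()  # quick connectivity smoke test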

8 More Replies
CBL
by New Contributor
  • 1706 Views
  • 1 reply
  • 0 kudos

Schema Evolution in Azure databricks

Hi All - in my scenario, I am loading data from hundreds of JSON files. The problem is that fields/columns are missing when a JSON file contains new fields. Full load: while writing JSON to Delta, use the option ("mergeSchema", "true") so that we do not miss new columns. Inc...

Latest Reply
cgrant
Databricks Employee
  • 0 kudos

For these scenarios, you can use schema evolution capabilities like mergeSchema or opt to use the new VariantType to avoid requiring a schema at time of ingest.
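
A hedged sketch of both options; paths and table names are placeholders, and the VARIANT route requires a recent runtime (DBR 15.3+):

from pyspark.sql.functions import col, parse_json

# Option 1: schema evolution - new JSON fields are added to the Delta schema
(
    spark.read.json("/path/to/json/")          # placeholder path
    .write.format("delta")
    .option("mergeSchema", "true")
    .mode("append")
    .saveAsTable("bronze.events")              # placeholder table
)

# Option 2: no schema at ingest - keep each record as a single VARIANT column
(
    spark.read.text("/path/to/json/")
    .select(parse_json(col("value")).alias("payload"))
    .write.mode("append")
    .saveAsTable("bronze.events_variant")
)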

TheDataEngineer
by New Contributor
  • 5183 Views
  • 1 reply
  • 0 kudos

'replaceWhere' clause in spark.write for a partitioned table

Hi, I want to be clear about the 'replaceWhere' clause in spark.write. Here is the scenario: I would like to add a column to a few existing records. The table is already partitioned on the "PickupMonth" column. Here is an example without 'replaceWhere': spark.read \ .f...

Latest Reply
cgrant
Databricks Employee
  • 0 kudos

For this style of ETL, there are two methods. The first method, strictly for partitioned tables, is dynamic partition overwrites, which require a Spark configuration to be set and detect which partitions are to be overwritten by scanning the input...
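
A hedged sketch of both methods against the thread's PickupMonth-partitioned table; updated_df and the table name are placeholders:

# Method 1: dynamic partition overwrite (partitioned tables only) -
# only the partitions present in updated_df are replaced
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
(
    updated_df.write.format("delta")
    .mode("overwrite")
    .saveAsTable("taxi_trips")
)

# Method 2: replaceWhere - explicitly declare the slice being rewritten
(
    updated_df.write.format("delta")
    .mode("overwrite")
    .option("replaceWhere", "PickupMonth = '2025-01'")
    .saveAsTable("taxi_trips")
)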

