Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

susanne
by Contributor
  • 863 Views
  • 3 replies
  • 0 kudos

Resolved! Authentication failure Lakeflow SQL Server Ingestion

Hi all, I am trying to create a Lakeflow ingestion pipeline for SQL Server, but I am running into the following authentication error when using my Databricks database user for the connection: Gateway is stopping. Authentication failure while obtaining ...

Latest Reply
susanne
Contributor
  • 0 kudos

Hi @szymon_dybczak, thanks a lot, that did the trick.

2 More Replies
Alena
by New Contributor II
  • 297 Views
  • 1 reply
  • 0 kudos

Programmatically set minimum workers for a job cluster based on file size?

I’m running an ingestion pipeline with a Databricks job: a file lands in S3, a Lambda is triggered, and the Lambda runs a Databricks job. The incoming files vary a lot in size, which makes processing times vary as well. My job cluster has autoscaling enabled, b...

Latest Reply
kerem
Contributor
  • 0 kudos

Hi Alena, the Jobs API has an update endpoint that can do that (https://docs.databricks.com/api/workspace/jobs_21/update). If for some reason you can’t update your pipeline before you trigger it, you can also consider creating a new job with desired c...
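
A rough sketch of that flow from the Lambda side, assuming the workspace URL, token, and job ID are available as environment variables and the job cluster uses the job_cluster_key "main"; the size threshold and worker counts are illustrative. Note that top-level fields inside new_settings are replaced wholesale, so in practice the full cluster spec for that job cluster would need to be included.

import json
import os
import urllib.request

HOST = os.environ["DATABRICKS_HOST"]            # e.g. https://adb-xxxx.azuredatabricks.net
TOKEN = os.environ["DATABRICKS_TOKEN"]
JOB_ID = int(os.environ["DATABRICKS_JOB_ID"])

def _post(path, payload):
    # Small helper around the Databricks REST API
    req = urllib.request.Request(
        f"{HOST}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def lambda_handler(event, context):
    # Size of the S3 object that triggered this Lambda
    size_bytes = event["Records"][0]["s3"]["object"]["size"]
    min_workers = 2 if size_bytes < 5 * 1024**3 else 8   # illustrative sizing rule

    # Raise the job cluster's minimum workers before triggering the run
    _post("/api/2.1/jobs/update", {
        "job_id": JOB_ID,
        "new_settings": {
            "job_clusters": [{
                "job_cluster_key": "main",
                "new_cluster": {
                    # full cluster spec goes here, since this field is replaced as a whole
                    "autoscale": {"min_workers": min_workers, "max_workers": 16},
                },
            }]
        },
    })
    # Then trigger the run
    return _post("/api/2.1/jobs/run-now", {"job_id": JOB_ID})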

Nick_Pacey
by New Contributor III
  • 674 Views
  • 2 replies
  • 0 kudos

Question on best method to deliver Azure SQL Server data into Databricks Bronze and Silver.

Hi, we have an Azure SQL Server database (replicating from an on-prem SQL Server) that is required to be in Databricks bronze and beyond. This database has hundreds of tables that are all required. Table sizes will vary from very small up to the biggest tables 1...

Latest Reply
kerem
Contributor
  • 0 kudos

Hey Nick, have you tried the SQL Server connector with Lakeflow Connect? This should provide a native connection to your SQL Server, potentially allowing for incremental updates and CDC setup. https://learn.microsoft.com/en-us/azure/databricks/ingestion...

1 More Reply
yit
by Contributor III
  • 306 Views
  • 1 reply
  • 0 kudos

Unable to Upcast DECIMAL Field in Autoloader

I’m using Autoloader to read Parquet files and write them to a Delta table. I want to enforce a schema in which Column1 is defined as DECIMAL(10,2). However, in the Parquet files being ingested, Column1 is defined as DECIMAL(8,2). When Autoloader read...

Latest Reply
kerem
Contributor
  • 0 kudos

Hi Yit, to potentially simplify your issue, why not read this column as a string in your stream and then cast it to DECIMAL(10, 2) afterwards? That should eliminate the rescue behaviour. Kerem Durak
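
A minimal sketch of that suggestion; the paths, table name, and second column are illustrative stand-ins, not the original pipeline.

from pyspark.sql import functions as F

# Declare Column1 as STRING in the enforced schema, then widen it explicitly.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .schema("Column1 STRING, Column2 STRING")
    .load("/Volumes/main/bronze/landing/")
)

(
    df.withColumn("Column1", F.col("Column1").cast("decimal(10,2)"))
      .writeStream
      .option("checkpointLocation", "/Volumes/main/bronze/_checkpoints/my_table")
      .toTable("main.bronze.my_table")
)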

ManojkMohan
by Honored Contributor
  • 335 Views
  • 2 replies
  • 0 kudos

Resolved! Compute kind SERVERLESS_REPL_VM is not allowed to use cluster scoped libraries.

I have an S3 URI 's3://salesforcedatabricksorders/orders_data.xlsx'. I have created a connector between Databricks and Salesforce. I am first getting the orders_data.xlsx into the Databricks layer, performing basic transformations on it, and then sending it to sales...

Latest Reply
kerem
Contributor
  • 0 kudos

Hello, I’ve come across the same issue reading an Excel file into a PySpark DataFrame on serverless compute. As the error states, with serverless you cannot install a cluster-scoped library, so you have to use notebook-scoped libraries (%pip install…)...
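
A minimal sketch of that workaround, assuming the workbook has been staged in a Unity Catalog volume (the volume path and the choice of openpyxl are assumptions); run the %pip line in its own cell first.

%pip install openpyxl

import pandas as pd

# Read the Excel file with pandas, then convert to a Spark DataFrame for the
# transformation and the Salesforce write-back step.
pdf = pd.read_excel("/Volumes/main/raw/files/orders_data.xlsx", engine="openpyxl")
df = spark.createDataFrame(pdf)
display(df)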

1 More Reply
Pratikmsbsvm
by Contributor
  • 613 Views
  • 1 reply
  • 1 kudos

Resolved! How to Create Metadata driven Data Pipeline in Databricks

I am creating a data pipeline as shown below. 1. Files from multiple input sources arrive in their respective folders in the bronze layer. 2. Databricks performs the transformations and loads the transformed data to Azure SQL, and also to ADLS Gen2 silver (not shown ...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @Pratikmsbsvm, it's a totally realistic requirement. In fact, you can find many articles that suggest approaches for designing such a control table. Take, for example, the following article: https://medium.com/dbsql-sme-engineering/a-primer-for-metadat...
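
As a very small illustration of the control-table idea (the table name, columns, and generic load below are assumptions, not the linked article's design):

# Control table describing each source and its target
spark.sql("""
    CREATE TABLE IF NOT EXISTS ops.pipeline_control (
        source_path  STRING,
        file_format  STRING,
        target_table STRING,
        is_active    BOOLEAN
    )
""")

# Generic driver: one loop handles every active source the same way
for row in spark.table("ops.pipeline_control").where("is_active").collect():
    (spark.read.format(row.file_format)
         .load(row.source_path)
         .write.mode("append")
         .saveAsTable(row.target_table))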

Hatter1337
by New Contributor III
  • 3471 Views
  • 5 replies
  • 3 kudos

Resolved! Write Spark DataFrame into OpenSearch

Hi Databricks Community, I'm trying to read an index from OpenSearch or write a DataFrame into an OpenSearch index using the native Spark OpenSearch connector: host = dbutils.secrets.get(scope="opensearch", key="host") port = dbutils.secrets.get(scope=...

Latest Reply
Alena
New Contributor II
  • 3 kudos

Thank you so much for your help—it works! One thing I’m trying to do is authenticate hadoop-opensearch using a different role than the one my cluster is mapped to. Environment variables only seem to work if they’re set in the cluster configuration. I...

4 More Replies
Sainath368
by New Contributor III
  • 607 Views
  • 1 reply
  • 1 kudos

Resolved! How to Retrieve the spark.statistics.createdAt When Statistics Were Last Updated in Databricks?

Hi everyone, I regularly (once a week) run ANALYZE TABLE COMPUTE STATISTICS on all my tables in Databricks to keep statistics up to date for query optimization. In the table UI in the catalog, I can see some statistics metadata like spark.st...

Latest Reply
Advika
Databricks Employee
  • 1 kudos

Hello @Sainath368! sql.statistics.createdAt reflects the epoch time when statistics were created. Unfortunately, there's no direct command available to check when the statistics were last updated. As a workaround, you can manually set the current tim...
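
A sketch of that workaround (the table name and the property key are illustrative): record the timestamp yourself right after running ANALYZE, then read it back from the table properties later.

from datetime import datetime, timezone

table = "main.sales.orders"
spark.sql(f"ANALYZE TABLE {table} COMPUTE STATISTICS FOR ALL COLUMNS")

# Stamp the table with the time the statistics were refreshed
ts = datetime.now(timezone.utc).isoformat()
spark.sql(f"ALTER TABLE {table} SET TBLPROPERTIES ('stats.last_analyzed' = '{ts}')")

# Later: when were the statistics last refreshed?
spark.sql(f"SHOW TBLPROPERTIES {table} ('stats.last_analyzed')").show(truncate=False)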

kenmyers-8451
by Contributor
  • 947 Views
  • 2 replies
  • 1 kudos

Should have the option to mark "succeeded with failures" as a failure rather than a success

Hi, we are having an issue with the way "succeeded with failures" is handled. We get emails telling us that we have a failure, which is correct, but then the pipeline actually treats it like a success and keeps going; actually we would like to ...

Latest Reply
kenmyers-8451
Contributor
  • 1 kudos

Thanks @Advika, we'll give that a shot for now.

1 More Reply
Itai_Sharon
by New Contributor II
  • 638 Views
  • 3 replies
  • 1 kudos

dbutils.notebook.run() returns a general error instead of the specific one

Hi, in a Python file I'm running a specific notebook using dbutils.notebook.run(). The notebook is failing, but I'm getting a general error log instead of the real, specific log. When I run the notebook directly, I get the specific error log. gen...

Latest Reply
Itai_Sharon
New Contributor II
  • 1 kudos

@Vinay_M_R BTW, when trying to run a job using the Databricks API, I encounter the same issue (a general "FAILED: Workload failed"): from databricks.sdk import WorkspaceClient client = WorkspaceClient() run = client.jobs.run_now(job_id) error message: state_...

2 More Replies
Sadam97
by New Contributor III
  • 406 Views
  • 2 replies
  • 1 kudos

Databricks job cancel does not wait for termination of streaming tasks

We have created Databricks jobs and each has multiple tasks. Each task is a 24/7 streaming task with checkpointing enabled. We want it to be stateful when we cancel and rerun the job, but it seems that when we cancel the job run, it kills the parent process a...

Latest Reply
Vidhi_Khaitan
Databricks Employee
  • 1 kudos

If the “reporting” layer is essentially micro-batching over bounded backlogs, run it with availableNow (or a scheduled batch job) so each run is naturally bounded and exits cleanly on its own, no manual cancel. This greatly reduces chances of partial...
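
For reference, a minimal sketch of an availableNow run (the source table, target table, and checkpoint path are illustrative):

# Bounded run: process everything currently available in the source, then stop on its own.
(
    spark.readStream.table("main.bronze.events")
    .writeStream
    .option("checkpointLocation", "/Volumes/main/chk/reporting_events")
    .trigger(availableNow=True)
    .toTable("main.silver.reporting_events")
)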

1 More Reply
Srajole
by New Contributor
  • 678 Views
  • 1 reply
  • 1 kudos

Write data issue

My Databricks job is completing successfully but my data is not written into the target table. The source path is correct, each and every thing is correct, but I am not sure why the data is not written into the Delta table.

Latest Reply
Vidhi_Khaitan
Databricks Employee
  • 1 kudos

Hi @Srajole, there are a bunch of possibilities as to why the data is not being written into the table: you’re writing to a path different from the table’s storage location, or using a write mode that doesn’t replace data as expected. spark.sql("DESCR...
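
Along the same lines, a couple of hedged checks you could run in a notebook (the table name below is illustrative):

# Where does the table actually live, and did the job's commit land there?
spark.sql("DESCRIBE DETAIL main.bronze.target_table").show(truncate=False)
spark.sql("DESCRIBE HISTORY main.bronze.target_table").select(
    "version", "timestamp", "operation", "operationMetrics"
).show(truncate=False)

# And confirm the write targets the table (or its exact location) with the intended mode, e.g.
# df.write.mode("append").saveAsTable("main.bronze.target_table")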

dbr_data_engg
by New Contributor III
  • 1261 Views
  • 2 replies
  • 0 kudos

Using Databricks Bladebridge or Lakebridge for SQL Migration

Getting a transpile error while executing the command for Databricks Bladebridge or Lakebridge: databricks labs lakebridge transpile --source-dialect mssql --input-source "<Path>/sample.sql" --output-folder "<Path>\output" Error: TranspileError(code=FAILURE, ...

Latest Reply
Abhimanyu
New Contributor II
  • 0 kudos

Did you find a solution?

1 More Reply
juanjomendez96
by Contributor
  • 816 Views
  • 2 replies
  • 3 kudos

Resolved! Best practices for compute usage

Hello there! I am writing this open message to learn how you are using compute in your use cases. Currently, in my company, we have multiple compute instances that can be differentiated into two main types: clusters with a large instance for b...

Latest Reply
radothede
Valued Contributor II
  • 3 kudos

Hello @juanjomendez96, to the best of my knowledge and experience, an autoscaled shared cluster (using smaller instances) works well for most second-case scenarios (clusters for ad-hoc/development team usage). This approach allows you to reuse the resources across t...

1 More Reply
