Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ashraf1395
by Honored Contributor
  • 1584 Views
  • 2 replies
  • 1 kudos

Fetching the catalog and schema set in the DLT pipeline configuration

I have a DLT pipeline, and the notebook that runs in it has some requirements. I want to get the catalog and schema that are set in my DLT pipeline. Reason: I have to specify my volume file paths etc., and my volume is on the sa...

Latest Reply
SP_6721
Honored Contributor II
  • 1 kudos

Hi @ashraf1395, can you try this to get the catalog and schema set by your DLT pipeline in the notebook:

    catalog = spark.conf.get("pipelines.catalog")
    schema = spark.conf.get("pipelines.schema")

1 More Replies
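
For the catalog and schema thread above, a minimal sketch of how the two values could be combined into a Unity Catalog volume path of the form /Volumes/<catalog>/<schema>/<volume>/... The helper name `volume_path`, the volume name `landing`, and the sample values are hypothetical. The `pipelines.catalog` and `pipelines.schema` keys come from the reply and are not verified here.

```python
from posixpath import join


def volume_path(catalog: str, schema: str, volume: str, *parts: str) -> str:
    """Build a Unity Catalog volume path: /Volumes/<catalog>/<schema>/<volume>/..."""
    for name, value in (("catalog", catalog), ("schema", schema), ("volume", volume)):
        # Reject empty names and names that would change the path depth.
        if not value or "/" in value:
            raise ValueError(f"invalid {name}: {value!r}")
    return join("/Volumes", catalog, schema, volume, *parts)


# In a pipeline notebook (assumption, per the reply above):
#   catalog = spark.conf.get("pipelines.catalog")
#   schema = spark.conf.get("pipelines.schema")
catalog, schema = "dev_catalog", "bronze"  # hypothetical sample values

print(volume_path(catalog, schema, "landing", "orders", "2024"))
```

Keeping the path logic in a plain function means it can be checked outside the pipeline, with only the two `spark.conf.get` lookups depending on the runtime.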
maikel
by Contributor III
  • 404 Views
  • 3 replies
  • 1 kudos

Resolved! Job tasks monitoring

Hello Community, we have a case in our project that we would like to solve in an elegant and scalable manner. As always, I would really appreciate your suggestions and experience. In short: we have a multi-step job consisting of 4 stages. In one of the ...

Latest Reply
maikel
Contributor III
  • 1 kudos

@MoJaMa thanks a lot for these suggestions!

2 More Replies
Raj_DB
by Contributor
  • 163 Views
  • 1 reply
  • 1 kudos

Resolved! Automating Job Permission Updates in Databricks Using a Notebook

Hi everyone, I am looking to create a notebook that, when executed by a user, performs the following actions:
  • Retrieves all Databricks jobs created by the current user
  • Checks whether a specific role already has permissions on those jobs
  • Automatically add...

Latest Reply
ziafazal
Databricks Partner
  • 1 kudos

Hi @Raj_DB, you can use the Databricks SDK to retrieve all jobs and filter them, keeping only those whose owner is the current user. Something like this:

    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()
    # Specify the user email/username you want...
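
Since the reply is cut off, here is a rough sketch of the filtering step only, assuming the Databricks SDK for Python is installed and configured. It filters on each job's `creator_user_name`, which matches the question's "created by the current user" but is not necessarily the same as the job's owner. The helper names `jobs_created_by` and `list_my_jobs` are hypothetical. Checking and granting permissions is left out.

```python
from typing import Iterable, Iterator, Optional


def jobs_created_by(jobs: Iterable, user_name: str) -> Iterator:
    """Yield only the jobs whose creator_user_name equals user_name."""
    for job in jobs:
        if getattr(job, "creator_user_name", None) == user_name:
            yield job


def list_my_jobs(user_name: Optional[str] = None) -> list:
    """List jobs created by user_name (default: the calling user)."""
    # Imported lazily so the filter above works without the SDK installed.
    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()  # credentials are resolved from the environment
    me = user_name or w.current_user.me().user_name
    return list(jobs_created_by(w.jobs.list(), me))
```

Splitting the pure filter from the SDK call keeps the selection logic testable with stand-in objects before it touches a real workspace.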

flourishingsing
by New Contributor III
  • 243 Views
  • 1 reply
  • 0 kudos

Resolved! How can I retrieve the backfill run parameter in Python?

I'm trying to run a backfill with the following parameter. How can I access this in the Python script? Do I need to change anything in the yml? I usually set task parameters the following way: These are then parsed using the argparse Python module.

flourishingsing_0-1779284296139.png flourishingsing_1-1779284438804.png
Latest Reply
flourishingsing
New Contributor III
  • 0 kudos

Found the following solution:

Add job-level parameters:

    parameters:
      - name: run_timestamp
        default: "some_default_value"

Reference in task-level parameters:

    tasks:
      - task_key: my_task
        spark_python_task:
          python_file: ../../script.py
          ...
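
On the script side, a minimal argparse sketch for reading the value. It assumes the task-level parameters pass the job parameter as `--run_timestamp` followed by a `{{job.parameters.run_timestamp}}` reference. The truncated solution does not show that part, so the flag name is an assumption.

```python
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse task parameters; argv=None falls back to sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="Backfill-aware task script")
    # Assumption: the task passes the job parameter as --run_timestamp <value>.
    parser.add_argument("--run_timestamp", default="some_default_value")
    return parser.parse_args(argv)


# Explicit argument list for illustration; a real task would call parse_args().
args = parse_args(["--run_timestamp", "2024-05-01T00:00:00"])
print(f"run_timestamp={args.run_timestamp}")
```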

manish_de
by New Contributor II
  • 514 Views
  • 5 replies
  • 5 kudos

query based connector snapshot feature

In an ingestion pipeline with a query-based connector, the Cursor column dropdown offers batch snapshot as an option instead of a column name. If I choose batch snapshot, will the Databricks engine run select * from my source table, say Sql serv...

Latest Reply
michaelfriendly
New Contributor II
  • 5 kudos

@rbtv It may execute something very similar to a `SELECT *` on the source table unless the platform adds its own partitioning or optimisation behind the scenes. From what I've observed, selecting batch snapshot often means the connector handles each ...

4 More Replies
vedanth
by New Contributor
  • 167 Views
  • 1 reply
  • 0 kudos

Salesforce Connector - Lakeflow Connect 400 Error

Hi all, I have been trying to set up Salesforce using Lakeflow Connect and followed the instructions in the docs: https://docs.databricks.com/aws/en/connect/managed-ingestion#sfdc However, I run into an invalid_grant error. However, login history on Salesforce sh...

vedanth_0-1779009668052.png
Latest Reply
GaneshI
New Contributor II
  • 0 kudos

Hi Vedanth, the invalid_grant error usually occurs due to authentication or OAuth configuration issues between Salesforce and Databricks Lakeflow Connect. Could you please verify the following points: Ensure the Salesforce user account is not locked and...

DazzaiDe
by New Contributor III
  • 280 Views
  • 2 replies
  • 1 kudos

Best Practices: 1 job per 1 target table

We’re currently designing our Medallion Architecture pipelines using Lakeflow Jobs, and I wanted to get some opinions on orchestration best practices. Right now, our approach is essentially 1 job per target table (for example, each Bronze/Silver/Gold ...

Latest Reply
LBoydston
New Contributor II
  • 1 kudos

We typically organize our workloads with one job per catalog, and then use one or more pipelines to load tables into the appropriate schemas. As our data engineers ingest raw data, this structure is primarily applied in the Silver and Gold layers of ...

1 More Replies
theanhdo
by New Contributor III
  • 4889 Views
  • 5 replies
  • 1 kudos

Run continuous job for a period of time

Hi there, I have a job where the Trigger type is configured as Continuous. I want to run the Continuous job only for a period of time each day, e.g. 8AM - 5PM. I understand that we can achieve this by manually starting and cancelling the job in the UI, o...

Latest Reply
KrisJohannesen
Contributor II
  • 1 kudos

The "not-so-pretty-but-it-works" solution I have come across is exactly what you are hinting at yourself.
  • Create the Continuous job - have it be paused
  • Create a secondary "start job"-job - which is basically just that API call in a notebook or python f...

4 More Replies
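
The start/stop approach described in this thread can be sketched as a window check plus a request body. This is only an illustration: the payload shape assumes the Jobs API update endpoint accepts `new_settings.continuous.pause_status`, which should be checked against the API reference. The job ID `123` and the helper names are hypothetical, and sending the request is left out.

```python
from datetime import time


def in_window(now: time, start: time = time(8, 0), end: time = time(17, 0)) -> bool:
    """True if now falls in [start, end); same-day windows only."""
    return start <= now < end


def pause_payload(job_id: int, paused: bool) -> dict:
    """Request body that pauses or unpauses a continuous job (assumed shape)."""
    status = "PAUSED" if paused else "UNPAUSED"
    return {"job_id": job_id, "new_settings": {"continuous": {"pause_status": status}}}


def desired_payload(job_id: int, now: time) -> dict:
    """Unpause inside the window, pause outside it."""
    return pause_payload(job_id, paused=not in_window(now))


print(desired_payload(123, time(9, 30)))
```

A scheduled "start" job at 8AM and a "stop" job at 5PM could each send the matching body, so the continuous job itself needs no time logic.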
susanne
by Databricks Partner
  • 2134 Views
  • 4 replies
  • 0 kudos

Resolved! Authentication failure Lakeflow SQL Server Ingestion

Hi all, I am trying to create a Lakeflow Ingestion Pipeline for SQL Server, but I am running into the following authentication error when using my Databricks Database User for the connection: Gateway is stopping. Authentication failure while obtaining ...

Latest Reply
rkhbo3003
New Contributor II
  • 0 kudos

I am also facing the same issue. We use the service principal name as the user ID; however, the SQL log shows the application ID and reports that it cannot log in. The service principal (name) has the highest privileges in the SQL DB. However, the same works fine through JDBC.

3 More Replies
chiruinfo5262
by New Contributor II
  • 1606 Views
  • 6 replies
  • 0 kudos

Trying to convert oracle sql to databricks sql but not getting the desired output

ORACLE SQL:
    COUNT( CASE WHEN TRUNC(WORKORDER.REPORTDATE) BETWEEN SELECTED_PERIOD_START_DATE AND SELECTED_PERIOD_END_DATE THEN 1 END ) SELECTED_PERIOD_BM,
    COUNT( CASE WHEN TRUNC(WORKORDER.REPORTDATE) BETWEEN COMPARISON_PERIOD_START_DATE AND COMPARISON_...

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 0 kudos

You’re using date_format(...), which turns dates into strings, so BETWEEN becomes a string comparison. You can also look into Databricks Lakebridge, which can assist with code conversion or migrations: https://databrickslabs.github.io/lakebridge/

5 More Replies
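
The string-comparison pitfall from the latest reply can be shown in a few lines. Python is used here only for illustration, and the day-first format is an assumption, since the listing does not show the Databricks query. The same effect applies to BETWEEN on formatted strings in SQL, where comparing real DATE values avoids it.

```python
from datetime import date

FMT = "%d-%m-%Y"  # assumed day-first format, for illustration only

start, end = date(2023, 12, 1), date(2023, 12, 31)
report = date(2024, 1, 15)  # falls outside the selected period

# Formatted strings compare character by character, so "15-01-2024" sorts
# between "01-12-2023" and "31-12-2023" although the date is out of range.
as_strings = start.strftime(FMT) <= report.strftime(FMT) <= end.strftime(FMT)

# Comparing real date values gives the intended answer.
as_dates = start <= report <= end

print(as_strings, as_dates)
```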
ittzzmalind
by New Contributor III
  • 473 Views
  • 2 replies
  • 1 kudos

Resolved! Azure Databricks Serverless – SFTP Connectivity (external provider)

Hi, I need to establish connectivity from Azure Databricks serverless compute to an external SFTP provider hosted outside our organization. When I searched, I found that one way is IP whitelisting: 1). The SFTP provider requires IP whitelistin...

Latest Reply
Lu_Wang_ENB_DBX
Databricks Employee
  • 1 kudos

Recommendation: if the external SFTP vendor strictly requires source-IP allowlisting, the most reliable path is usually classic compute with your own NAT gateway/static public IP. For serverless, Azure Databricks can reach public external resources v...

1 More Replies
pepco
by New Contributor III
  • 705 Views
  • 8 replies
  • 7 kudos

Resolved! DAB git - sometimes doesn't see modules

We are using DABs to deploy our jobs. DABs have source set to git branch or git tag depending on the environment.  Repository is structured in mono repo fashion. We don't use wheels for our modules. Sometimes when the jobs run they "randomly" fail th...

Latest Reply
amirabedhiafi
New Contributor III
  • 7 kudos

Hello @pepco! I will share my personal experience with very similar behaviour. If you check the Databricks docs, you will find that git_source and task source: GIT are not recommended for DABs because local relative paths may not point ...

7 More Replies
yanchr
by New Contributor II
  • 333 Views
  • 2 replies
  • 2 kudos

Resolved! Migrating external tables to managed tables from HMS to UC

I think the easiest way to do that is to use DEEP CLONE. However, since the SET MANAGED approach was introduced in DBR 17, wouldn't it be better to first migrate the table as external and then convert it to managed using SET MANAGED? The Databricks a...

Data Engineering
migration
UC
Unity Catalog
Latest Reply
amirabedhiafi
New Contributor III
  • 2 kudos

Hello @yanchr! Yes, if the table can first be registered as a UC external table, then ALTER TABLE ... SET MANAGED is now better practice than a direct DEEP CLONE, especially on DBR 17+ / serverless. It is recommended to use SET MANAGED for con...

1 More Replies
tt_921
by New Contributor II
  • 440 Views
  • 2 replies
  • 2 kudos

Resolved! Lakeflow Declarative Pipeline queue

In the January 2026 release notes, it was announced that: "Pipelines now support queued execution mode, where multiple update requests are automatically queued and executed sequentially instead of failing with conflicts. This simplifies operations fo...

Data Engineering
DAB
pipeline
Latest Reply
tt_921
New Contributor II
  • 2 kudos

Thank you very much for the detailed response! We unfortunately can't proceed with option 1, as we do require multiple places that can trigger the pipeline (an API call to the parent job, and a direct API call to the pipeline itself). This is due to ...

1 More Replies