Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Mithos
by New Contributor
  • 176 Views
  • 1 reply
  • 0 kudos

ZCube Tags not present in Databricks Delta Tables

The design doc for Liquid Clustering for Delta refers to Z-Cubes to enable incremental clustering in batches. This is the link - https://docs.google.com/document/d/1FWR3odjOw4v4-hjFy_hVaNdxHVs4WuK1asfB6M6XEMw/edit?pli=1&tab=t.0. It is also mentioned th...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Hi @Mithos thanks for the question! This is the OSS version of LC applicable to OSS Delta. Databricks has a different implementation, so you won't be able to find it in a liquid table written by DBR. 
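For anyone who wants to verify this against an OSS Delta table, below is a minimal sketch that scans the transaction log for clustering-related file tags. The table path is a placeholder, and the exact tag names (e.g. ZCUBE_ID) are taken from the design doc above, so treat them as assumptions:

```python
import json
from pathlib import Path

# Placeholder path to an OSS Delta table's transaction log.
log_dir = Path("/path/to/delta_table/_delta_log")

# Each commit file is JSON-lines; AddFile actions may carry "tags",
# e.g. a ZCUBE_ID tag per the OSS liquid clustering design doc.
for commit in sorted(log_dir.glob("*.json")):
    for line in commit.read_text().splitlines():
        action = json.loads(line)
        add = action.get("add")
        if add and add.get("tags"):
            print(commit.name, add["path"], add["tags"])
```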

templier2
by New Contributor II
  • 420 Views
  • 3 replies
  • 0 kudos

Log jobs stdout to an Azure Logs Analytics workspace

Hello, I have enabled cluster log shipping through mspnp/spark-monitoring, but I don't see stdout/stderr/log4j logs there. Is it supported?

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Hi @templier2, if it works, it's not duct tape and chewing gum; it's a paperclip away from advanced engineering! You're right, I forgot this option is only there for AWS/S3. So yes, I think mount points are currently the only way.
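For reference, a minimal sketch of the cluster-spec fragment that enables cluster log delivery of driver stdout/stderr/log4j, with the destination pointed at a placeholder mount backed by Azure storage:

```python
# Fragment of a cluster spec enabling cluster log delivery; the driver's
# stdout/stderr/log4j files land under <destination>/<cluster-id>/driver.
# "dbfs:/mnt/adls-logs" is a placeholder mount point backed by ADLS.
cluster_spec = {
    "cluster_log_conf": {
        "dbfs": {"destination": "dbfs:/mnt/adls-logs/cluster-logs"}
    }
}
```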

2 More Replies
theanhdo
by New Contributor III
  • 913 Views
  • 3 replies
  • 0 kudos

Run continuous job for a period of time

Hi there, I have a job where the trigger type is configured as Continuous. I want to run the Continuous job only for a period of time per day, e.g. 8 AM - 5 PM. I understand that we can achieve it by manually starting and cancelling the job in the UI, o...

Latest Reply
theanhdo
New Contributor III
  • 0 kudos

Hi @MuthuLakshmi, thank you for your answer. However, it doesn't address my question, so let me rephrase it. In short: how do I configure a Continuous job to run for a period of time, e.g. from 8 AM to 5 PM every day, and ...
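One way to approximate this without touching the UI is to flip the continuous job's pause status from two small scheduled jobs (one in the morning, one in the evening). A minimal sketch against the Jobs 2.1 API, where JOB_ID and the credentials are placeholders:

```python
import os
import requests

HOST = os.environ["DATABRICKS_HOST"]   # e.g. https://adb-....azuredatabricks.net
TOKEN = os.environ["DATABRICKS_TOKEN"]
JOB_ID = 123456789                     # placeholder: the continuous job's ID

def set_pause_status(status: str) -> None:
    """Set the continuous job's pause status to 'PAUSED' or 'UNPAUSED'."""
    resp = requests.post(
        f"{HOST}/api/2.1/jobs/update",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"job_id": JOB_ID,
              "new_settings": {"continuous": {"pause_status": status}}},
    )
    resp.raise_for_status()

set_pause_status("UNPAUSED")  # call from the 8 AM job; use "PAUSED" at 5 PM
```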

2 More Replies
jkb7
by New Contributor III
  • 648 Views
  • 6 replies
  • 2 kudos

Resolved! Keep history of task runs in Databricks Workflows while moving it from one job to another

We are using Databricks Asset Bundles (DAB) to orchestrate multiple workflow jobs, each containing multiple tasks. The execution schedule is managed at the job level, i.e., all tasks within a job start together. We often face the issue of rescheduling...

Latest Reply
Walter_C
Databricks Employee
  • 2 kudos

You can submit it through https://docs.databricks.com/en/resources/ideas.html#ideas

5 More Replies
vickytscv
by New Contributor II
  • 390 Views
  • 3 replies
  • 0 kudos

Adobe query support from databricks

Hi Team, we are working with the Adobe tool for campaign metrics, which needs to pull data from AEP using the explode option. When we pass a query, it takes a long time and performance is also very poor. Is there any better way to pull data from AEP? Please le...

Latest Reply
jodbx
Databricks Employee
  • 0 kudos

https://github.com/Adobe-Marketing-Cloud/aep-cloud-ml-ecosystem 

2 More Replies
T_I
by New Contributor II
  • 641 Views
  • 4 replies
  • 0 kudos

Connect Databricks to Airflow

Hi, I have Databricks on top of AWS and a Databricks connection in Airflow (MWAA). I am able to connect and execute a Databricks job via Airflow using a personal access token. I believe the best practice is to connect using a service principal. I und...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @T_I, instead of the PAT token you have to specify the settings below to be able to use the service principal. For workspace-level operations, set the following environment variables: DATABRICKS_HOST, set to the Databricks workspace URL, for exam...
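A minimal sketch of those settings, assuming OAuth machine-to-machine authentication resolved by the Databricks SDK; every value below is a placeholder:

```python
import os

# Workspace-level service principal (OAuth M2M) auth via environment
# variables, as picked up by the Databricks SDK's unified auth.
os.environ["DATABRICKS_HOST"] = "https://adb-1234567890123456.7.azuredatabricks.net"
os.environ["DATABRICKS_CLIENT_ID"] = "<service-principal-application-id>"
os.environ["DATABRICKS_CLIENT_SECRET"] = "<oauth-secret>"

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()                  # resolves the variables above
print(w.current_user.me().user_name)   # sanity check: who am I?
```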

3 More Replies
Steve_Harrison
by New Contributor III
  • 750 Views
  • 2 replies
  • 0 kudos

Invalid Path when getting Notebook Path

The undocumented feature to get a notebook path is great, but it does not actually return a valid path that can be used in Python, e.g.: from pathlib import Path; print(Path(dbutils.notebook.entry_point.getDbutils().notebook().getContext().notebookPat...

Latest Reply
Steve_Harrison
New Contributor III
  • 0 kudos

I actually think the major issue is that the above is undocumented and not supported. A supported and documented way of doing this would be much appreciated.
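Until then, a commonly used workaround is to prefix the workspace-relative path with /Workspace, where workspace files are mounted on recent runtimes. A sketch, with the caveat that it relies on the same undocumented behavior:

```python
from pathlib import Path

# The context returns a workspace-relative path (e.g. /Users/me@example.com/nb);
# on recent runtimes the workspace is mounted under /Workspace, so prefixing
# it yields a filesystem path usable from Python. Verify on your runtime.
nb_path = dbutils.notebook.entry_point.getDbutils().notebook().getContext().notebookPath().get()
fs_path = Path("/Workspace") / nb_path.lstrip("/")
print(fs_path, fs_path.parent.exists())
```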

1 More Replies
Phani1
by Valued Contributor II
  • 7370 Views
  • 10 replies
  • 10 kudos

Delta Live Table name dynamically

Hi Team, can we pass the Delta Live Table name dynamically [from a configuration file, instead of hardcoding the table name]? We would like to build a metadata-driven pipeline.
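For illustration, a minimal metadata-driven sketch in which the table name is resolved from the pipeline configuration instead of being hardcoded; the target_table key and table names are placeholders:

```python
import dlt

# The table name comes from the DLT pipeline configuration
# (Settings > Configuration); "target_table" is a placeholder key.
table_name = spark.conf.get("target_table", "fallback_table")

@dlt.table(name=table_name, comment="Name resolved from pipeline configuration")
def dynamic_table():
    # Placeholder source; in a metadata-driven pipeline this would also
    # be read from configuration.
    return spark.read.table("source_catalog.source_schema.source_table")
```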

Latest Reply
bmhardy
New Contributor III
  • 10 kudos

Is this post referring to Direct Publishing Mode? As we are multi-tenanted, we have to have a separate schema per client, which currently means a single pipeline per client. This is not cost-effective at all, so we are very much reliant on DPM. I believ...

9 More Replies
maikl
by New Contributor III
  • 430 Views
  • 4 replies
  • 0 kudos

Resolved! DABs job name must start with a letter or underscore

Hi, in the UI I used the pipeline name 00101_source_bronze. I wanted to do the same in Databricks Asset Bundles, but when the configuration is refreshed against the Databricks workspace I see this error: I found that this issue can be connected to Terraform v...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

As mentioned above, this is a limitation directly with Terraform; because of this, our engineering team is limited in the actions that can be taken. You can find more information about this limitation in the Terraform documentation: https://developer.hashic...

3 More Replies
Anonymous
by Not applicable
  • 694 Views
  • 1 reply
  • 1 kudos

Resolved! workflow set maximum queued items

Hi all, I have a question regarding Workflows and the queuing of job runs. I'm running into a case where jobs run longer than expected, resulting in job runs being queued, which is expected and desired. However, in this particular case we only nee...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

Unfortunately, there is no way to control the number of jobs that will be moved to the queued status when queuing is enabled.
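For reference, queueing is a per-job on/off flag in the job settings; there is no maximum-queue-size knob. A sketch of the relevant fragment:

```python
# Job-settings fragment: queued runs pile up behind the concurrency
# limit, and only the on/off flag is configurable.
job_settings = {
    "queue": {"enabled": True},   # queue new runs instead of skipping them
    "max_concurrent_runs": 1,     # at most one active run at a time
}
```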

alcatraz96
by New Contributor II
  • 548 Views
  • 3 replies
  • 0 kudos

Guidance Needed for Developing CI/CD Process in Databricks Using Azure DevOps

Hi everyone, I am working on setting up a complete end-to-end CI/CD process for my Databricks environment using Azure DevOps. So far, I have developed a build pipeline to create a Databricks artifact (DAB). Now, I need to create a release pipeline to ...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @alcatraz96, one question: why don't you use Databricks Asset Bundles? Then the whole process would be much simpler. Here you have a good end-to-end example: CI/CD Integration with Databricks Workflows - Databricks Community - 81821
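As a sketch of what the release side can look like once the bundle artifact exists, a small script that validates and deploys it with the Databricks CLI; the "prod" target name comes from a hypothetical databricks.yml, and authentication is assumed to be provided via environment variables:

```python
import subprocess

def run_cli(*args: str) -> None:
    """Run a Databricks CLI command and fail the pipeline on error."""
    subprocess.run(["databricks", *args], check=True)

# Validate the bundle configuration, then deploy to the "prod" target.
run_cli("bundle", "validate", "-t", "prod")
run_cli("bundle", "deploy", "-t", "prod")
```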

2 More Replies
skarpeck
by New Contributor III
  • 377 Views
  • 3 replies
  • 0 kudos

Update set in foreachBatch

I need to track the codes of records that were ingested in a foreachBatch function and pass them as a task value, so downstream tasks can take actions based on this output. What would be the best approach to achieve that? Right now I have the following solution, b...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Another approach is to persist the collected codes in a Delta table and then read from this table in downstream tasks. Make sure to add ample logging and counts. Checkpointing would also help if you suspect the counts in the set are not the same as what ...
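A minimal sketch of that approach, with placeholder table and column names:

```python
from pyspark.sql import functions as F

def track_codes(batch_df, batch_id):
    # Persist the distinct codes seen in this micro-batch to a Delta
    # tracking table that downstream tasks can read.
    (batch_df.select("code").distinct()
        .withColumn("batch_id", F.lit(batch_id))
        .write.mode("append")
        .saveAsTable("ops.ingested_codes"))

(spark.readStream.table("source_stream")
    .writeStream
    .foreachBatch(track_codes)
    .option("checkpointLocation", "/tmp/checkpoints/track_codes")
    .start())
```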

2 More Replies
JKR
by Contributor
  • 2763 Views
  • 1 reply
  • 0 kudos

Databricks sql variables and if/else workflow

I have 2 tasks in a Databricks job workflow. The first task is of type SQL, and the SQL task is a query. In that query I've declared 2 variables and SET the values by running a query, e.g.: DECLARE VARIABLE max_timestamp TIMESTAMP DEFAULT '1970-01-01'; SET VARIABLE max_...

Data Engineering
databricks-sql
Workflows
Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Please try with: max_timestamp = dbutils.jobs.taskValues.get(taskKey="sql_task_1", key="max_timestamp") in the downstream Python task to read the value set by the SQL task, and dbutils.jobs.taskValues.set(key="max_timestamp", value=max_timestamp) to pass it further downstream. Reference: https://docs.databricks.com/en/jobs/task-values.html

willie_nelson
by New Contributor II
  • 436 Views
  • 3 replies
  • 1 kudos

ABFS Authentication with a SAS token -> 403!

Hi guys, I'm running a streamReader/Writer with Auto Loader from StorageV2 (general purpose v2) over abfss instead of wasbs. My checkpoint location is valid, the reader properly reads the file schema, and Auto Loader is able to sample 105 files to do so....

Latest Reply
BricksGuy
New Contributor III
  • 1 kudos

Would you mind pasting the sample code, please? I am trying to use ABFS with Auto Loader and getting an error like yours.
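In the meantime, a hedged sketch of fixed SAS token authentication combined with Auto Loader; the storage account, container, token, and paths are all placeholders:

```python
# Session-level SAS configuration for abfss; the token needs read/list
# permissions on the container.
account = "mystorageacct"
spark.conf.set(f"fs.azure.account.auth.type.{account}.dfs.core.windows.net", "SAS")
spark.conf.set(
    f"fs.azure.sas.token.provider.type.{account}.dfs.core.windows.net",
    "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider",
)
spark.conf.set(f"fs.azure.sas.fixed.token.{account}.dfs.core.windows.net", "<sas-token>")

# Auto Loader stream over the container; schema inference state is kept
# at a placeholder schemaLocation.
df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.schemaLocation",
              f"abfss://mycontainer@{account}.dfs.core.windows.net/_schemas")
      .load(f"abfss://mycontainer@{account}.dfs.core.windows.net/input/"))
```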

2 More Replies
Vetrivel
by Contributor
  • 1296 Views
  • 3 replies
  • 1 kudos

Resolved! SSIS packages migration to Databricks Workflows

We are doing a POC to migrate SSIS packages to Databricks Workflows as part of our effort to build the analytics layer, including dimension and fact tables. How can we accelerate or automate the SSIS package migration to the Databricks environment?

Latest Reply
BlakeHill
New Contributor II
  • 1 kudos

Thank you so much for the solution.

2 More Replies
