Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

f1nesse13
by New Contributor
  • 602 Views
  • 1 reply
  • 0 kudos

Question about notifications and failed jobs

Hello, I had a question about rerunning a job from a checkpoint using ‘Repair Run’. I have a job which failed, and I'm looking to rerun the stream from a checkpoint. My job uses notifications for file detection (cloudFiles.useNotifications). My que...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

When rerunning your job from a checkpoint using Repair Run with cloudFiles.useNotifications, only unprocessed messages in the queue (representing new or failed-to-process files) will be consumed. Files or events already recorded in the checkpoint wil...

eballinger
by Contributor
  • 1544 Views
  • 2 replies
  • 1 kudos

Resolved! Any way to ignore DLT tables in pipeline

Hello, in our testing environment we would like to be able to update only the DLT tables we are testing for our pipeline. This would help speed up testing. We currently have the pipeline code generated dynamically based on how many tables th...

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

Hi @eballinger. To address your requirement of updating only specific Delta Live Tables (DLT) in your testing environment without removing the others, you can leverage the @dlt.table decorator and the temporary parameter in your Python code. This app...

1 More Replies
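A minimal sketch of the gating idea behind Alberto's suggestion: only define the tables under test when the pipeline runs in a test environment, and mark them temporary so they are cleaned up. All names here (PIPELINE_ENV, TABLES_UNDER_TEST, should_define) are hypothetical, not part of the DLT API:

```python
# Sketch: gate DLT table definitions behind an environment flag so a
# test pipeline only materializes the tables under test.
# PIPELINE_ENV and TABLES_UNDER_TEST are made-up names for illustration.
import os

TABLES_UNDER_TEST = {"silver_orders"}  # hypothetical allow-list

def should_define(table_name: str) -> bool:
    """Return True if this table should be defined in the current environment."""
    if os.environ.get("PIPELINE_ENV", "prod") != "test":
        return True  # production pipelines define every table
    return table_name in TABLES_UNDER_TEST

# In the pipeline source (sketch; @dlt.table usage assumed, not executed here):
# if should_define("silver_orders"):
#     @dlt.table(temporary=True)  # temporary tables are not kept after the update
#     def silver_orders(): ...
```

Whether this fits depends on how the pipeline code is generated; since the poster already generates table definitions dynamically, the same generator could consult an allow-list like this.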
ynskrbn
by New Contributor II
  • 1500 Views
  • 4 replies
  • 0 kudos

"Databricks Bundle Deploy -t prod" command deletes log of historical runs

I'm using Databricks Asset Bundles with Azure DevOps CI/CD for workflow deployment. While the initial deployment to production works fine, I encounter an issue when updating the workflow in the development environment and redeploying it to production...

(two screenshots attached)
Latest Reply
PabloCSD
Valued Contributor II
  • 0 kudos

When you re-deploy your job, do you increment the version (e.g., 4.3.0 -> 4.3.1)? I have been through this: when I change a definition in databricks.yml, for example the bundle name, it gets detected as a new workflow. Can you explain ...

3 More Replies
bcsalay
by New Contributor II
  • 2007 Views
  • 4 replies
  • 0 kudos

Random failure in a loop in PySpark

Hi, I'm encountering an issue in PySpark code where I'm calculating certain information monthly in a loop. The flow is roughly: read input and create/read intermediate parquet files; upsert records in intermediate parquet files with the monthl...

Latest Reply
JacekLaskowski
Databricks MVP
  • 0 kudos

Can you show some code to get the gist of what the code does? Are the parquet files accessed as a catalog table? Could it be that some other job makes changes to input tables?

3 More Replies
eballinger
by Contributor
  • 1658 Views
  • 1 reply
  • 1 kudos

Resolved! Check for row level security and column masking

Hi all, we have sensitive tables and have applied row-level security and column masking. I would like to build a check into our job to make sure these tables still have the row filters and column masks applied. This would help ensure these security fi...

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

Hi @eballinger. Have you tried using DESCRIBE TABLE EXTENDED on the table? That will give you details about the filters applied to it.

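A rough sketch of what such a job-level check could look like, assuming the DESCRIBE TABLE EXTENDED output has already been collected into (key, value) rows. The sample rows and labels below are illustrative, not verbatim output from any specific DBR version:

```python
# Sketch: verify that row-filter and column-mask entries are present in
# collected DESCRIBE TABLE EXTENDED rows. SAMPLE_ROWS is illustrative.
SAMPLE_ROWS = [
    ("Name", "main.secure.customers"),
    ("Row Filter", "main.secure.region_filter"),
    ("Column Masks", "ssn: main.secure.mask_ssn"),
]

def security_applied(rows) -> bool:
    """True if both a row filter and at least one column mask are listed."""
    present = {key for key, value in rows if value and value.strip()}
    return "Row Filter" in present and "Column Masks" in present

print(security_applied(SAMPLE_ROWS))  # → True
```

A scheduled job could run this check per sensitive table and fail the run (or alert) when either entry is missing.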
mkEngineer
by New Contributor III
  • 2799 Views
  • 2 replies
  • 0 kudos

Integrating Azure Log Analytics with Delta Live Tables Pipelines and Job Clusters

Hi, I'm setting up a Delta Live Tables (DLT) pipeline for my medallion architecture. I'm interested in tracking, ingesting, and analyzing the log files in Azure Log Analytics. However, I haven't found much information on how to configure this setup. Sp...

Latest Reply
mkEngineer
New Contributor III
  • 0 kudos

 "message": " File <command-68719476741>, line 10\n log_analytics_pkey = dbutils.secrets.get(scope=\"ScopeLogAnalyticsPKey\", key=\"LogAnalyticsPKey\")\n ^\nSyntaxError: invalid syntax\n", "error_class": "_UNCLASSIFIED_PYTHON_COMMAND_ERROR" It seems ...

1 More Replies
data_eng_hard
by New Contributor III
  • 18440 Views
  • 4 replies
  • 2 kudos

How to check table size by partition?

I want to check the size of a Delta table by partition. As you can see, only the size of the whole table can be checked, not the size per partition.

Latest Reply
Carsten_Herbe
New Contributor II
  • 2 kudos

The previous two answers did not work for me (DBR 15.4). I found a hacky way using the Delta log: find the latest (group of) checkpoint (parquet) file(s) in the Delta log and use it as the source prefix `000000000000xxxxxxx.checkpoint`: SELECT partition_column_1,...

3 More Replies
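For context, a small sketch of the idea in plain Python: Delta commit files are JSON lines whose `add` actions carry `partitionValues` and `size`, so summing `size` grouped by partition gives approximate per-partition bytes. The log lines below are hand-written in that shape, not output from a real table:

```python
# Sketch: sum data-file sizes per partition from Delta log `add` actions.
import json
from collections import defaultdict

LOG_LINES = [  # hand-written examples in the shape of real add actions
    '{"add": {"path": "date=2024-01-01/part-0.parquet",'
    ' "partitionValues": {"date": "2024-01-01"}, "size": 1048576}}',
    '{"add": {"path": "date=2024-01-01/part-1.parquet",'
    ' "partitionValues": {"date": "2024-01-01"}, "size": 1048576}}',
    '{"add": {"path": "date=2024-01-02/part-0.parquet",'
    ' "partitionValues": {"date": "2024-01-02"}, "size": 524288}}',
]

def size_by_partition(lines):
    """Aggregate `add.size` by partitionValues across commit-log lines."""
    totals = defaultdict(int)
    for line in lines:
        add = json.loads(line).get("add")
        if add:
            key = tuple(sorted(add["partitionValues"].items()))
            totals[key] += add["size"]
    return dict(totals)
```

A real check would also subtract `remove` actions, or read the latest checkpoint parquet as Carsten describes, since raw commits include files that have since been removed.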
Sadam97
by New Contributor III
  • 2804 Views
  • 7 replies
  • 0 kudos

Enable predictive optimization for my account

I want to enable predictive optimization for my account, but I cannot see the option mentioned in the documentation: access the accounts console, navigate to Settings, then Feature enablement, and select Enabled next to Predictive optimization. I am metas...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Got it. After reviewing your workspace, I can see that it is a GCP workspace located in europe-west3. As of now, Predictive Optimization is not supported on GCP; a few regions are set to be enabled by the end of Q4, but unfortunately the ...

6 More Replies
Mcnamara
by New Contributor
  • 1045 Views
  • 1 reply
  • 0 kudos

Pyspark and SQL Warehouse

If I write PySpark code and need to get the data into Power BI, will it be possible to merge the data into one semantic model? For instance, the pipelines were developed using SQL, so they are directly compatible with SQL Warehouse.

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Yes, it is possible to merge data into one semantic model in Power BI when using PySpark code to get data. Databricks supports integration with Power BI, allowing you to create a unified semantic model. You can develop your data pipeline using PySpar...

User16752244127
by Databricks Employee
  • 23477 Views
  • 4 replies
  • 5 kudos
Latest Reply
felixdmeshio
New Contributor III
  • 5 kudos

Hey, if you're working with SAP HANA data and looking to integrate it into Databricks, our SAP HANA to Databricks Connector can greatly simplify the process! The connector enables you to extract data directly from SAP HANA tables and load it iterativel...

3 More Replies
ricard98
by New Contributor II
  • 12703 Views
  • 6 replies
  • 5 kudos

How to integrate SAP ERP with Databricks

Is there a way to integrate SAP ERP with a Databricks notebook through Python?

Latest Reply
felixdmeshio
New Contributor III
  • 5 kudos

Hey, yes, there is a way to integrate SAP (SAP HANA) data into Databricks, and our SAP HANA to Databricks connector makes this process seamless. The connector enables you to extract data directly from SAP HANA tables and load it iteratively into Databr...

5 More Replies
amruth
by New Contributor
  • 5201 Views
  • 4 replies
  • 0 kudos

How do I retrieve timestamp data from history in Databricks SQL without using a Delta table? The data comes from SAP

I am not using Delta tables; my data is from SAP. How do I retrieve the timestamp (history) dynamically from an SAP table using Databricks SQL?

Latest Reply
felixdmeshio
New Contributor III
  • 0 kudos

Hello, if you're trying to bring timestamp data or any other SAP table from SAP (SAP HANA) into Databricks, our SAP HANA to Databricks Connector can help streamline this process. The connector enables you to extract data directly from SAP HANA tables ...

3 More Replies
Suman-Sourav
by New Contributor II
  • 1339 Views
  • 2 replies
  • 0 kudos

Identify the job name when a notebook is triggered from another notebook

I am running another notebook from one of my notebooks as below. The main notebook is scheduled with a job/workflow: dbutils.notebook.run("./ABC/XYZ/another_notebook", 1000). Usually, to get the workflow/job name, I use either of the two options below. 1. ...

Latest Reply
Suman-Sourav
New Contributor II
  • 0 kudos

Thanks Alberto. I am aware of that solution, but I was trying to find a way to get the job name in the scenario I described. So I assume there is no way to get the job name in that scenario.

1 More Replies
wallystart
by New Contributor III
  • 1902 Views
  • 3 replies
  • 0 kudos

Configure a single-node cluster with a service principal in Azure

Hi! We can configure a single-node cluster with a service principal as the single user using this command: databricks clusters create --json ' { "cluster_name": "my-cluster", "spark_version": "13.3.x-scala2.12", "node_type_id": "Standard_DS3_v2"...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Can you share a screenshot of the cluster in the UI, please? If you are creating a cluster assigned to a service principal, it should already have the private preview enabled. If the private preview is not enabled, it will throw th...

2 More Replies
MRTN
by Contributor
  • 15325 Views
  • 5 replies
  • 3 kudos

Resolved! Feature request delta tables : drop duplicate rows

A deltaTable.dropDuplicates(columns) would be a very nice feature, simplifying the complex procedures that are suggested online. Or am I missing any existing procedures that can be done without merge operations or similar?

Latest Reply
MRTN
Contributor
  • 3 kudos

I created a feature request in the delta table project: [Feature Request] data deduplication on existing delta table · Issue #1767 · delta-io/delta (github.com)

4 More Replies