Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Mkk1
by New Contributor
  • 1076 Views
  • 1 reply
  • 0 kudos

Joining tables across DLT pipelines

How can I join a silver table (s1) from a DLT pipeline (D1) to another silver table (S2) from a different DLT pipeline (D2)? #DLT #DeltaLiveTables

Latest Reply
JothyGanesan
New Contributor II
  • 0 kudos

@Mkk1 Did you ever get this completed? We are in a similar situation; how did you achieve it?
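
For anyone else landing here, a minimal sketch of one common approach, assuming both pipelines publish their tables to Unity Catalog (catalog, schema, and join-key names below are hypothetical):

    import dlt

    # Tables published by two separate DLT pipelines (D1 and D2) can be read
    # by fully qualified name once both pipelines publish to Unity Catalog.
    @dlt.table(name="joined_silver")
    def joined_silver():
        s1 = spark.read.table("main.pipeline_d1.s1")  # silver table from pipeline D1
        s2 = spark.read.table("main.pipeline_d2.s2")  # silver table from pipeline D2
        return s1.join(s2, on="id", how="inner")      # hypothetical join key

Reading by fully qualified name keeps the two pipelines decoupled; the downstream table simply picks up whatever the upstream pipelines last published.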

rpshgupta
by New Contributor III
  • 433 Views
  • 10 replies
  • 2 kudos

How to find the source code for the data engineering learning path?

Hi everyone, I am taking the data engineering learning path on customer-academy.databricks.com. I am not able to find any source code attached to the course. Can you please help me find it so that I can try it hands-on as well? Thanks, Rupesh

Latest Reply
ogramos
New Contributor II
  • 2 kudos

Hello folks, I also opened a ticket with Databricks Academy, and it seems that Partner Learning doesn't include the code anymore. You need a Databricks Labs subscription. Quote: "Are you referring to the labs that are not available? If so, we are sorry ...

9 More Replies
SwathiChidurala
by New Contributor II
  • 4323 Views
  • 2 replies
  • 3 kudos

Resolved! Delta format

Hi, I am a student learning Databricks. In the code below I tried to write data in Delta format to a gold layer. I authenticated using the service principal method to read, write, and execute data, and I assigned the Storage Blob Contributor role, but...

Latest Reply
Avinash_Narala
Valued Contributor II
  • 3 kudos

Hi @SwathiChidurala, the error is because you don't have the folder trip_zone inside the gold folder. You can try removing trip_zone from the location, or adding the folder trip_zone inside the gold folder in ADLS, and then try again. If th...
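
A minimal sketch of such a gold-layer write, with a hypothetical storage account and container; make sure the target folder exists (or drop the extra trip_zone segment) before retrying:

    # Hypothetical example data; replace with the real gold DataFrame.
    df = spark.range(5)

    # Write in Delta format to the gold container; the abfss path below is
    # a placeholder for the real storage account and folder layout.
    (df.write.format("delta")
        .mode("overwrite")
        .save("abfss://gold@mystorageaccount.dfs.core.windows.net/trip_zone"))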

1 More Replies
Abdurrahman
by New Contributor II
  • 151 Views
  • 3 replies
  • 3 kudos

Move files from DBFS to workspace folders in Databricks

I want to move a zip file from DBFS to a workspace folder. I am using dbutils.fs.cp("dbfs file path", "workspace folder path") in a Databricks notebook, and I am seeing the following error: ExecutionError: An error occurred while calling o455.cp. : jav...

Latest Reply
nick533
New Contributor
  • 3 kudos

Permission denied appears to be the cause of the error message. To read from the DBFS path and write to the workspace folder, please make sure you have the required permissions. The following permissions may be required: the DBFS file path can be read...
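
A minimal sketch of the copy, assuming workspace files are enabled in the workspace; note the file:/ scheme on the workspace side (both paths are hypothetical):

    # Copy a zip from DBFS into a workspace folder. Workspace paths are
    # addressed with the file:/ scheme when used through dbutils.
    dbutils.fs.cp(
        "dbfs:/FileStore/archive.zip",
        "file:/Workspace/Users/someone@example.com/archive.zip",
    )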

2 More Replies
garciargs
by New Contributor III
  • 65 Views
  • 1 reply
  • 2 kudos

DLT: multiple source tables to a single silver table generating unexpected results

Hi, I've been trying this all day long. I'm building a POC of a pipeline that would be used in my everyday ETL. I have two initial tables, vendas and produtos, and they are as follows: vendas_raw: venda_id, produto_id, data_venda, quantidade, valor_total, dth_in...

Latest Reply
NandiniN
Databricks Employee
  • 2 kudos

When dealing with Change Data Capture (CDC) in Delta Live Tables, it's crucial to handle out-of-order data correctly. You can use the APPLY CHANGES API to manage this. The APPLY CHANGES API ensures that the most recent data is used by specifying a co...
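
A minimal sketch of the Python API, with hypothetical target, key, and ordering columns:

    import dlt
    from pyspark.sql.functions import col

    dlt.create_streaming_table("vendas_silver")  # hypothetical target name

    # sequence_by tells DLT which column orders events, so late or
    # out-of-order records still resolve to the most recent state.
    dlt.apply_changes(
        target="vendas_silver",
        source="vendas_raw",
        keys=["venda_id"],
        sequence_by=col("event_ts"),  # hypothetical ordering column
        stored_as_scd_type=1,
    )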

ashraf1395
by Valued Contributor
  • 115 Views
  • 1 reply
  • 2 kudos

Connecting Fivetran with Databricks

We are migrating a Hive metastore to a UC catalog, and we have some Fivetran connections. We are creating all tables as external tables, and we have specified the external locations at the schema level. So when we specify the destination in the Fivetra...

Latest Reply
NandiniN
Databricks Employee
  • 2 kudos

This message just means that if you do not provide the {{path}}, it will use the default location, which is on DBFS. When configuring the Fivetran connector, you will be prompted to select the catalog name and schema name, and then specify the externa...

yvishal519
by Contributor
  • 152 Views
  • 1 reply
  • 0 kudos

Identifying Full Refresh vs. Incremental Runs in Delta Live Tables

Hello Community, I am working with a Delta Live Tables (DLT) pipeline that primarily operates in incremental mode. However, there are specific scenarios where I need to perform a full refresh of the pipeline. I am looking for an efficient and reliable...

Latest Reply
TakuyaOmi
Valued Contributor II
  • 0 kudos

Hello, there are two ways to determine whether a DLT pipeline is running in full-refresh or incremental mode. First, the DLT event log schema: the details column in the DLT event log includes information on "full_refresh". You can use this to identify whethe...
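
A minimal sketch of querying the event log for that flag, assuming a Unity Catalog pipeline and the event_log table-valued function (the table name below is hypothetical):

    # details is a JSON string; the create_update event carries the
    # full_refresh flag for each pipeline update.
    spark.sql("""
        SELECT timestamp,
               details:create_update:full_refresh AS full_refresh
        FROM event_log(TABLE(main.silver.my_dlt_table))
        WHERE event_type = 'create_update'
        ORDER BY timestamp DESC
    """).show()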

Phani1
by Valued Contributor II
  • 469 Views
  • 5 replies
  • 1 kudos

Cluster idle time and usage details

How can we find out the usage details of the Databricks cluster? Specifically, we need to know how many nodes are in use, how long the cluster is idle, the time it takes to start up, and the jobs it is running along with their durations. Is there a q...

Latest Reply
Isi
New Contributor
  • 1 kudos

Hey @hboleto, it's difficult to accurately estimate the final cost of a serverless cluster, as it is fully managed by Databricks. In contrast, classic clusters allow finer resource tuning, since you can define spot instances and other instance type...
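
For the usage side of the question, a minimal sketch against the billing system table, assuming system tables are enabled in the workspace:

    # DBUs per cluster per day over the last 7 days.
    spark.sql("""
        SELECT usage_metadata.cluster_id,
               date(usage_start_time) AS usage_date,
               sum(usage_quantity)    AS dbus
        FROM system.billing.usage
        WHERE usage_start_time >= current_date() - INTERVAL 7 DAYS
          AND usage_metadata.cluster_id IS NOT NULL
        GROUP BY 1, 2
        ORDER BY usage_date DESC
    """).show()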

4 More Replies
susanne
by New Contributor II
  • 61 Views
  • 0 replies
  • 0 kudos

Views in DLT with Private Preview feature Direct Publish

Hi everyone, I am building a DLT pipeline in which I am using the Direct Publish feature, which is as of now still under Private Preview. While it works well to create streaming tables and write them to a schema other than the DLT default schema, I ge...

kazinahian
by New Contributor III
  • 3388 Views
  • 1 reply
  • 1 kudos

How can I create a new calculated field in Databricks using PySpark?

Hello, great people. I am new to Databricks and learning PySpark. How can I create a new column called "sub_total", where I group by "category", "subcategory", and "monthly" sales value? I appreciate your help.

Data Engineering
calculation
Latest Reply
Miguel_Suarez
Databricks Employee
  • 1 kudos

Hi @kazinahian, I believe what you're looking for is the .withColumn() DataFrame method in PySpark. It will allow you to create a new column with aggregations on other columns: https://docs.databricks.com/en/pyspark/basics.html#create-columns Best
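
A minimal sketch with hypothetical column names; for per-group totals, a groupBy with an aggregation produces the sub_total directly:

    from pyspark.sql import functions as F

    # Hypothetical sample data standing in for the real sales table.
    df = spark.createDataFrame(
        [("toys", "puzzles", "2024-01", 10.0),
         ("toys", "puzzles", "2024-01", 5.0)],
        ["category", "subcategory", "monthly", "sales"],
    )

    # Sum sales within each category/subcategory/month as sub_total.
    sub_totals = (df.groupBy("category", "subcategory", "monthly")
                    .agg(F.sum("sales").alias("sub_total")))
    sub_totals.show()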

HoussemBL
by New Contributor III
  • 89 Views
  • 2 replies
  • 0 kudos

External tables in DLT pipelines

Hello community, I have implemented a DLT pipeline. In the "Destination" setting of the pipeline I have specified a Unity Catalog with a target schema of type external, referring to an S3 destination. My DLT pipeline works well. Yet, I noticed that all str...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hello @HoussemBL, you can use the code example below:

    import dlt

    @dlt.create_streaming_table(
        name="your_table_name",
        path="s3://your-bucket/your-path/",
        schema="schema-definition",
    )
    def your_table_function():
        return (spark.readStream.format("your_format").op...
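
A fuller hedged sketch of the same idea using the documented @dlt.table decorator, which accepts a path argument (bucket, prefix, and source format below are hypothetical; whether an explicit path is honored depends on how the pipeline's catalog and storage are configured):

    import dlt

    # Write the streaming table to an explicit external location rather
    # than the pipeline's default managed storage.
    @dlt.table(
        name="my_external_table",
        path="s3://my-bucket/my-prefix/my_external_table",
    )
    def my_external_table():
        return (spark.readStream
                     .format("cloudFiles")
                     .option("cloudFiles.format", "json")
                     .load("s3://my-bucket/landing/"))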

1 More Replies
yvishal519
by Contributor
  • 655 Views
  • 8 replies
  • 2 kudos

Handling Audit Columns and SCD Type 1 in Databricks DLT Pipeline with Unity Catalog: Circular Depend

I am working on a Delta Live Tables (DLT) pipeline with Unity Catalog, where we are reading data from Azure Data Lake Storage (ADLS) and creating a table in the silver layer with Slowly Changing Dimensions (SCD) Type 1 enabled. In addition, we are ad...

Latest Reply
yvishal519
Contributor
  • 2 kudos

@NandiniN  @RBlum I haven’t found an ideal solution for handling audit columns effectively in Databricks Delta Live Tables (DLT) when implementing SCD Type 1. It seems there’s no straightforward way to incorporate these columns into the apply_changes...
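
One workaround sometimes used is to stamp the audit columns on a source view feeding apply_changes, so they flow through without a circular dependency; a hedged sketch with hypothetical names:

    import dlt
    from pyspark.sql import functions as F

    @dlt.view(name="source_with_audit")
    def source_with_audit():
        # Stamp audit columns upstream; under SCD Type 1 they are simply
        # overwritten on each update of a key.
        return (spark.readStream.table("bronze_source")
                     .withColumn("etl_updated_at", F.current_timestamp()))

    dlt.create_streaming_table("silver_target")
    dlt.apply_changes(
        target="silver_target",
        source="source_with_audit",
        keys=["id"],                    # hypothetical key
        sequence_by=F.col("event_ts"),  # hypothetical ordering column
        stored_as_scd_type=1,
    )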

7 More Replies
MauricioS
by New Contributor II
  • 114 Views
  • 3 replies
  • 2 kudos

Delta Live Tables - Dynamic Target Schema

Hi all, I have a requirement to migrate a few jobs from standard Databricks notebooks orchestrated by Azure Data Factory to DLT pipelines, which is pretty straightforward so far. The tricky part is that the data tables in the catalog are...

Latest Reply
fmadeiro
New Contributor III
  • 2 kudos

@MauricioS Great question! Databricks Delta Live Tables (DLT) pipelines are very flexible, but by default, the target schema specified in the pipeline configuration (such as target or schema) is fixed. That said, you can implement strategies to enable...
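
A minimal sketch of one such strategy: drive names from a pipeline configuration value set in the pipeline's settings (the mypipeline.env key and source naming convention below are hypothetical), and let each environment's pipeline publish to its own target schema:

    import dlt

    # Read a pipeline-level configuration value, e.g. mypipeline.env = dev.
    env = spark.conf.get("mypipeline.env", "dev")

    @dlt.table(name=f"sales_{env}")
    def sales():
        # Hypothetical per-environment source naming convention.
        return spark.read.table(f"main.{env}_bronze.sales_raw")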

2 More Replies
garciargs
by New Contributor III
  • 206 Views
  • 2 replies
  • 2 kudos

Resolved! Incremental load from two tables

Hi, I am looking to build an ETL process for an incremental-load silver table. This silver table, let's say "contracts_silver", is built by joining two bronze tables, "contracts_raw" and "customer". contracts_silver: CONTRACT_ID, STATUS, CUSTOMER_NAME; e.g. 1, SIGNED, Pet...

Latest Reply
garciargs
New Contributor III
  • 2 kudos

Hi @hari-prasad, thank you! Will give it a try. Regards!
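
For readers following along, one common pattern for this kind of incremental join is Delta change data feed; a hedged sketch with hypothetical column names (CDF must be enabled on contracts_raw, and the starting version would come from your own checkpointing):

    from pyspark.sql import functions as F

    # Read only rows added or updated in contracts_raw since the last
    # processed version, then join to customer to rebuild the silver shape.
    changes = (spark.read.format("delta")
                    .option("readChangeFeed", "true")
                    .option("startingVersion", 42)  # hypothetical checkpointed version
                    .table("contracts_raw")
                    .filter(F.col("_change_type").isin("insert", "update_postimage")))

    customers = spark.read.table("customer")
    silver_increment = (changes.join(customers, "customer_id")
                               .select("contract_id", "status", "customer_name"))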

1 More Replies