Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Hubert-Dudek
by Esteemed Contributor III
  • 6293 Views
  • 1 replies
  • 0 kudos

dlt append_flow = multiple streams into a single Delta table

With the append_flow method in Delta Live Tables, you can effortlessly combine data from multiple streams into a single Delta table.
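As a sketch of that pattern (table and stream names are hypothetical; this only runs inside a Delta Live Tables pipeline):

```python
import dlt

# Single target table that several streams append into
dlt.create_streaming_table("orders_all_regions")

# One append flow per source stream (source table names are hypothetical)
@dlt.append_flow(target="orders_all_regions")
def orders_us():
    return spark.readStream.table("orders_us_raw")

@dlt.append_flow(target="orders_all_regions")
def orders_eu():
    return spark.readStream.table("orders_eu_raw")
```

Each flow keeps its own checkpoint, so new sources can typically be added without a full refresh of the target.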

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Thank you for sharing this information @Hubert-Dudek 

Hubert-Dudek
by Esteemed Contributor III
  • 10106 Views
  • 1 replies
  • 3 kudos

row-level concurrency

Databricks Runtime 14.2 now has row-level concurrency generally available and enabled by default for Delta tables with deletion vectors. This feature dramatically reduces conflicts between concurrent write operations.
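Since the feature rides on deletion vectors, a minimal sketch of opting a table in (table name hypothetical; runs only on a Databricks cluster with DBR 14.2+):

```python
spark.sql("""
    ALTER TABLE main.sales.events
    SET TBLPROPERTIES ('delta.enableDeletionVectors' = 'true')
""")
```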

Latest Reply
jose_gonzalez
Databricks Employee
  • 3 kudos

Thank you for sharing this @Hubert-Dudek !!!

grazie
by Contributor
  • 1804 Views
  • 0 replies
  • 1 kudos

Run a job as different service principals

We currently have several workflows that are basically copies with the only difference being that they run with different service principals and so have different permissions and configuration based on who is running. The way this is managed today is...
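One way to collapse such duplicated workflows, assuming the Jobs API 2.1 `run_as` setting behaves as documented, is to keep a single job definition and switch the identity it runs as. A minimal sketch of building the `jobs/update` payload (endpoint usage and IDs are illustrative):

```python
import json

def run_as_update_payload(job_id: int, sp_application_id: str) -> dict:
    """Build a Jobs API 2.1 `jobs/update` payload that switches which
    service principal the job runs as."""
    return {
        "job_id": job_id,
        "new_settings": {
            "run_as": {"service_principal_name": sp_application_id},
        },
    }

# The payload would be POSTed to /api/2.1/jobs/update with a bearer token
print(json.dumps(run_as_update_payload(123, "00000000-1111-2222-3333-444444444444")))
```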

reshmir18
by New Contributor II
  • 2263 Views
  • 1 replies
  • 0 kudos

Unable to setcheckpointdir in unitycatalog enabled workspace

I have a Unity catalog enabled workspace where I am trying to setCheckpointDir during runtime. The method looks to authenticate using fs.azure.account.key instead of storage credentials. I am using databricks access connector which has "Storage Blob ...
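For context: `sparkContext.setCheckpointDir` is an RDD-level API that goes through the cluster's Hadoop filesystem configuration, which may explain the fallback to `fs.azure.account.key` rather than Unity Catalog storage credentials. For Structured Streaming, the usual pattern is to pass a checkpoint location on a UC external location instead (paths and table names hypothetical; runs only on Databricks):

```python
(df.writeStream
   .option("checkpointLocation",
           "abfss://container@account.dfs.core.windows.net/checkpoints/job1")
   .toTable("main.silver.events"))
```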

Data Engineering
autoloader
Databricks
storagecredentials
streaming
unitycatalog
Latest Reply
reshmir18
New Contributor II
  • 0 kudos

@Retired_mod I have provided all the necessary permissions and was able to browse through the folders of the container added as an external location. I don't understand why the method setCheckpointDir looks for an account key when the access is already ...

Anup
by New Contributor III
  • 9232 Views
  • 1 replies
  • 1 kudos

Resolved! Copy Into : Pattern for sub-folders

While trying to ingest data from the S3 bucket, we are running into a situation where the data in S3 buckets is in sub-folders of multiple depths. Is there a good way of specifying patterns for the above case? We tried using the following for a depth o...
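For readers landing here: COPY INTO's PATTERN clause accepts glob patterns, and the usual trick for fixed depths is one `*` per directory level (or a recursive `**` where supported — check the current docs for exact semantics). The path-glob behaviour can be sketched with `pathlib` (file layout is illustrative):

```python
import pathlib, tempfile

root = pathlib.Path(tempfile.mkdtemp())
(root / "2024" / "01").mkdir(parents=True)
(root / "2024" / "01" / "a.csv").write_text("x")
(root / "top.csv").write_text("y")

# One star per level: only depth-2 files
depth2 = sorted(p.name for p in root.glob("*/*/*.csv"))
# Recursive: csv files at any depth
any_depth = sorted(p.name for p in root.glob("**/*.csv"))
print(depth2, any_depth)  # ['a.csv'] ['a.csv', 'top.csv']
```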

MinMin
by New Contributor II
  • 4206 Views
  • 3 replies
  • 0 kudos

Extra underscore behind ".xlsm" and ".xlsx" after exporting excel files from Databricks

Hi all, I tried to export several Excel files from Databricks. But there will always be one extra underscore behind ".xlsm" and ".xlsx" if I export them and try to open the files on my local system. I have to manually remove the underscore from the fil...

Latest Reply
DH_Fable
New Contributor II
  • 0 kudos

Hi, did you find a solution to this? I have the same/similar problem where, when I save a dataframe from a Databricks notebook using to_excel(), it saves the file with extension ".xlsx_" rather than ".xlsx", meaning to open it I have to manually download and ...
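As a stopgap while the root cause is unclear, the trailing underscore can be stripped programmatically after download instead of by hand. A minimal sketch (pure Python, no Databricks APIs):

```python
import pathlib

def strip_trailing_underscore(path: str) -> str:
    """Rename a file ending in '.xlsx_' or '.xlsm_' so the real Excel
    extension is restored; returns the (possibly new) path."""
    p = pathlib.Path(path)
    if p.suffix in (".xlsx_", ".xlsm_"):
        fixed = p.with_suffix(p.suffix.rstrip("_"))
        p.rename(fixed)
        return str(fixed)
    return path
```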

2 More Replies
Kira
by New Contributor
  • 1269 Views
  • 0 replies
  • 0 kudos

FeatureStoreClient speed up create_training_set

I am trying to create a training set with 10 Feature Lookups (about 1200 features total). # all args for create_training_set df = fs.create_training_set(args).load_df() I must store this data to a delta table for further analysis. Writing this returned da...

Data Engineering
Feature Store
MachineLearning
williamwjs
by New Contributor II
  • 8818 Views
  • 2 replies
  • 1 kudos

Issue with Could not initialize class $linec4a1686037264c21b0e58b369fab8f2d59.$read$

Our job is written in Scala on Databricks. It used to have the same problem, but we managed to make it work by putting all case classes in a separate cell. However, lately it started to fail again with the same error: Could not initialize class $linec4a1...

Latest Reply
williamwjs
New Contributor II
  • 1 kudos

Hi @Retired_mod , may I ask if there's any updates to this issue? Thank you!

1 More Replies
fijoy
by Contributor
  • 20003 Views
  • 6 replies
  • 11 kudos

How to remove widgets from a notebook dashboard?

I'm creating a dashboard from the output of a notebook cell, but noticing that the dashboard displays the widgets of the notebook in addition to the cell output. How can I remove the widgets from the dashboard?
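One approach that is often suggested is clearing the widgets before presenting the dashboard (widget name is hypothetical; runs only inside a Databricks notebook):

```python
# Remove every widget in the notebook
dbutils.widgets.removeAll()

# Or remove a single widget by name
dbutils.widgets.remove("date_filter")
```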

Latest Reply
Nico2
New Contributor II
  • 11 kudos

Did you find any solution for this? I am facing a similar issue, wanting to create multiple dashboards from a single notebook where not all widgets are relevant for both dashboards. This makes it difficult for users to understand the dashboard.

5 More Replies
Dp15
by Contributor
  • 9305 Views
  • 1 replies
  • 1 kudos

Schema Deletion -Structured Streaming

Hi, I have a Structured Stream which reads data from my silver layer and creates a gold layer using foreachBatch. The stream has been working fine, but now I have a change where there are deletions to the schema and some of the columns from the silver l...

Latest Reply
Dp15
Contributor
  • 1 kudos

@Retired_mod Thank you so much for a detailed explanation 

Phani1
by Databricks MVP
  • 13704 Views
  • 2 replies
  • 2 kudos

encryption

Hi Databricks, could you please guide me on the below scenario? Here is the use case we are trying to solve for. Currently the environment is using “Voltage” as an encryption tool for encrypting the data in S3 in conjunction with business-provided ...

Latest Reply
AliaCollier
New Contributor II
  • 2 kudos

To replace "Voltage" with Databricks encryption, follow these steps: set up a Customer Managed Key in AWS, configure the S3 bucket, read data in Databricks, and implement custom UDFs for AES encryption/decryption.
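For the UDF step, note that recent Databricks Runtimes ship built-in `aes_encrypt`/`aes_decrypt` SQL functions, which may remove the need for custom UDFs. A sketch (runs only on Databricks; scope and key names are hypothetical, and the key should come from a secret scope rather than being inlined):

```python
encrypted = spark.sql("""
    SELECT base64(aes_encrypt('sensitive value',
                              secret('my-scope', 'aes-key'))) AS ciphertext
""")
```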

1 More Replies
TiagoMag
by New Contributor III
  • 11166 Views
  • 1 replies
  • 2 kudos

Resolved! DLT pipeline evolution schema error

Hello everyone, I am currently working on my first DLT pipeline, and I stumbled on a problem which I am struggling to solve. I am working on several tables where I have a column called "my_column" with an array of JSON with two keys: 1st key: score, 2n...

david3
by New Contributor III
  • 4748 Views
  • 4 replies
  • 3 kudos

Resolved! delta live table udf not known when defined in python module

Hi, I have the problem that my "module" is not known when used in a user-defined function. The precise message is posted below. I have a repo structure as follows:  analytics_pipelines │ ├── __init__.py │ ├── coordinate_transformation.py │ ├── d...

Latest Reply
david3
New Contributor III
  • 3 kudos

Hi, yes, I discovered three working possibilities:
  • Define the pandas functions as inline functions, as pointed out above
  • Define the pandas function in the same script that is imported as a "library" in the dlt config (libraries: - notebook: path: ./pipeline...

3 More Replies
561064
by New Contributor II
  • 8903 Views
  • 2 replies
  • 0 kudos

Exporting delta table to one CSV

The process to export a delta table is taking ~2 hrs. The delta table has 66 partitions with a total size of ~6 GB, 4 million rows and 270 columns. Used the below command: df.coalesce(1).write.csv("path"). What are my options to reduce the time?
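For anyone hitting this: `coalesce(1)` funnels the entire write through a single task, which is usually the bottleneck. A common alternative is to write with normal parallelism (with headers) and merge the part files afterwards. A pure-Python merge sketch (paths are illustrative):

```python
import glob, shutil

def merge_csv_parts(part_dir: str, out_path: str) -> None:
    """Merge Spark CSV part files (each written with a header) into one
    file, keeping the header only from the first part."""
    parts = sorted(glob.glob(f"{part_dir}/part-*.csv"))
    with open(out_path, "w") as out:
        for i, part in enumerate(parts):
            with open(part) as f:
                header = f.readline()
                if i == 0:
                    out.write(header)
                shutil.copyfileobj(f, out)
```

Usage would look like `df.write.option("header", True).csv(part_dir)` on the cluster, then merging the downloaded parts locally with `merge_csv_parts(part_dir, "final.csv")`.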

Latest Reply
Dribka
New Contributor III
  • 0 kudos

A very interesting task in front of you.... let me know how you solve it!

1 More Replies
BenLambert
by Contributor
  • 13204 Views
  • 4 replies
  • 0 kudos

Resolved! Explode is giving unexpected results.

I have a dataframe with a schema similar to the following:
id: string
array_field: array
  element: struct
    field1: string
    field2: string
    array_field2: array
      element: struct
        nested_field: stri...

Latest Reply
BenLambert
Contributor
  • 0 kudos

It turns out that if the exploded fields don't match the schema that was defined when reading the JSON in the first place, all the data that doesn't match is silently dropped. This is not really nice default behaviour.
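One way to make such mismatches fail loudly instead of silently, assuming the standard Spark JSON reader options (it may not catch every struct-level mismatch), is to read with FAILFAST; schema and path are hypothetical:

```python
df = (spark.read
      .schema(expected_schema)
      .option("mode", "FAILFAST")   # raise on records that don't fit the schema
      .json("/path/to/input"))
```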

3 More Replies
