Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

grazie
by Contributor
  • 1335 Views
  • 0 replies
  • 0 kudos

Run a job as different service principals

We currently have several workflows that are essentially copies; the only difference is that they run as different service principals and so have different permissions and configuration based on who is running them. The way this is managed today is...

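One way to collapse such near-identical workflows into a single definition is to keep one job spec and vary only the `run_as` block per deployment. A hedged sketch in the Jobs API 2.1 job-settings format, where `run_as` accepts a `service_principal_name` (the principal's application ID); the job name, notebook path, and ID below are hypothetical:

```json
{
  "name": "shared-etl",
  "run_as": {
    "service_principal_name": "11111111-2222-3333-4444-555555555555"
  },
  "tasks": [
    {
      "task_key": "main",
      "notebook_task": { "notebook_path": "/Shared/etl_main" }
    }
  ]
}
```

Each deployment then needs only a different `run_as` value rather than a full copy of the workflow.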
reshmir18
by New Contributor II
  • 1645 Views
  • 1 replies
  • 0 kudos

Unable to setCheckpointDir in Unity Catalog-enabled workspace

I have a Unity Catalog-enabled workspace where I am trying to call setCheckpointDir at runtime. The method appears to authenticate using fs.azure.account.key instead of storage credentials. I am using a Databricks access connector which has the "Storage Blob ...

Data Engineering
autoloader
Databricks
storagecredentials
streaming
unitycatalog
Latest Reply
reshmir18
New Contributor II
  • 0 kudos

@Retired_mod I have provided all the necessary permissions and was able to browse through the folders of the container added as an external location. I don't understand why the method setCheckpointDir looks for an account key when the access is already ...

Anup
by New Contributor III
  • 8498 Views
  • 1 replies
  • 1 kudos

Resolved! Copy Into : Pattern for sub-folders

While trying to ingest data from the S3 bucket, we are running into a situation where the data in S3 buckets is in sub-folders of multiple depths. Is there a good way of specifying patterns for this case? We tried using the following for a depth o...

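For context, COPY INTO's PATTERN option takes a glob relative to the FROM path, and a single `*` matches only one path segment, so files two sub-folders deep need a pattern like `*/*/*.csv` (one wildcard per level). A small Python illustration of that segment-by-segment glob behaviour, with an invented directory layout:

```python
import glob
import os
import tempfile

# Illustration only: glob syntax where '*' matches a single path segment,
# so files two sub-folders deep need '*/*/*.csv'. The layout is made up.
root = tempfile.mkdtemp()
for rel in ("2024/01/a.csv", "2024/02/b.csv", "top.csv"):
    full = os.path.join(root, rel)
    os.makedirs(os.path.dirname(full), exist_ok=True)
    open(full, "w").close()

# '*/*/*.csv' matches only the depth-2 files, not top.csv
depth2 = sorted(
    os.path.relpath(p, root) for p in glob.glob(os.path.join(root, "*/*/*.csv"))
)
print(depth2)
```

To cover several depths at once, glob alternation such as `PATTERN => '{*.csv,*/*.csv,*/*/*.csv}'` may be an option, assuming the documented `{a,b}` alternation syntax applies here.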
MinMin
by New Contributor II
  • 3080 Views
  • 3 replies
  • 0 kudos

Extra underscore behind ".xlsm" and ".xlsx" after exporting excel files from Databricks

Hi all, I tried to export several Excel files from Databricks, but there is always one extra underscore after ".xlsm" and ".xlsx" when I export them and try to open the files on my local system. I have to manually remove the underscore from the fil...

Latest Reply
DH_Fable
New Contributor II
  • 0 kudos

Hi, did you find a solution to this? I have the same/similar problem: when I save a dataframe from a Databricks notebook using to_excel() it saves the file with extension ".xlsx_" rather than ".xlsx", meaning to open it I have to manually download and ...

2 More Replies
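A workaround sketch for the trailing-underscore symptom, assuming it comes from writing the Excel file straight to a mounted path: write to local disk first, then copy the finished file. The helper below is illustrative, not a Databricks API; on Databricks the final copy would typically use dbutils.fs.cp.

```python
import os
import shutil
import tempfile

def export_via_local(write_fn, dest_path):
    """Write with write_fn to a local temp file, then copy the finished
    file to dest_path. Sketch: on Databricks, the copy step would instead
    be dbutils.fs.cp(f"file:{tmp}", "dbfs:/mnt/...") to cloud storage."""
    tmp = os.path.join(tempfile.mkdtemp(), os.path.basename(dest_path))
    write_fn(tmp)                # e.g. lambda p: df.to_excel(p, index=False)
    shutil.copy(tmp, dest_path)  # local stand-in for dbutils.fs.cp
    return dest_path
```

Usage would look like `export_via_local(lambda p: df.to_excel(p, index=False), "/dbfs/mnt/reports/report.xlsx")`, with pandas/openpyxl assumed available.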
Kira
by New Contributor
  • 939 Views
  • 0 replies
  • 0 kudos

FeatureStoreClient speed up create_training_set

I am trying to create a training set with 10 Feature Lookups (about 1200 features total): # all args for create_training_set df = fs.create_training_set(args).load_df() I must store this data to a delta table for further analysis. Writing this returned da...

Data Engineering
Feature Store
MachineLearning
williamwjs
by New Contributor II
  • 8274 Views
  • 2 replies
  • 1 kudos

Issue with Could not initialize class $linec4a1686037264c21b0e58b369fab8f2d59.$read$

Our job is written in Scala on Databricks. It used to have the same problem, but we managed to work around it by putting all case classes in a separate cell. However, lately it started to fail again with the same error: Could not initialize class $linec4a1...

Latest Reply
williamwjs
New Contributor II
  • 1 kudos

Hi @Retired_mod , may I ask if there's any updates to this issue? Thank you!

1 More Replies
fijoy
by Contributor
  • 16721 Views
  • 6 replies
  • 11 kudos

How to remove widgets from a notebook dashboard?

I'm creating a dashboard from the output of a notebook cell, but noticing that the dashboard displays the the widgets of the notebook in addition to the cell output. How can I remove the widgets from the dashboard?

Latest Reply
Nico2
New Contributor II
  • 11 kudos

Did you find any solution for this? I am facing a similar issue: I want to create multiple dashboards from a single notebook, where not all widgets are relevant for both dashboards. This makes it difficult for users to understand the dashboard.

5 More Replies
Dp15
by Contributor
  • 8728 Views
  • 1 replies
  • 1 kudos

Schema Deletion -Structured Streaming

Hi, I have a Structured Stream which reads data from my silver layer and creates a gold layer using foreachBatch. The stream has been working fine, but now I have a change where there are deletions to the schema and some of the columns from the silver l...

Latest Reply
Dp15
Contributor
  • 1 kudos

@Retired_mod Thank you so much for the detailed explanation.

Phani1
by Valued Contributor II
  • 12671 Views
  • 2 replies
  • 2 kudos

encryption

Hi Databricks, could you please guide me on the below scenario? Here is the use case we are trying to solve for. Currently the environment is using “Voltage” as an encryption tool for encrypting the data in S3 in conjunction with business-provided ...

Latest Reply
AliaCollier
New Contributor II
  • 2 kudos

To replace "Voltage" with Databricks encryption, follow these steps: set up a Customer Managed Key in AWS, configure the S3 bucket, read data in Databricks, and implement custom UDFs for AES encryption/decryption.

1 More Replies
TiagoMag
by New Contributor III
  • 9636 Views
  • 1 replies
  • 2 kudos

Resolved! DLT pipeline evolution schema error

Hello everyone, I am currently working on my first DLT pipeline, and I stumbled on a problem which I am struggling to solve. I am working on several tables where I have a column called "my_column" with an array of JSON objects with two keys: 1st key: score, 2n...

david3
by New Contributor III
  • 3491 Views
  • 4 replies
  • 3 kudos

Resolved! delta live table udf not known when defined in python module

Hi, I have the problem that my "module" is not known when used in a user-defined function. The precise message is posted below. I have a repo structure as follows:

analytics_pipelines
│ ├── __init__.py
│ ├── coordinate_transformation.py
│ ├── d...

Latest Reply
david3
New Contributor III
  • 3 kudos

Hi, yes, I discovered three working possibilities: define the pandas functions as inline functions, as pointed out above; define the pandas function in the same script that is imported as a "library" in the DLT config ( libraries: - notebook: path: ./pipeline...

3 More Replies
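The "library" route in that reply refers to the DLT pipeline settings' libraries list, which names the notebooks the pipeline loads before resolving imports. A hedged sketch of that settings fragment; the paths are hypothetical:

```yaml
libraries:
  - notebook:
      path: ./pipelines/pipeline_main
  - notebook:
      path: ./pipelines/udf_definitions
```

Defining the UDF in a notebook listed here means it is evaluated in the pipeline's own context, which is what makes the function visible to the pipeline at runtime.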
561064
by New Contributor II
  • 6729 Views
  • 2 replies
  • 0 kudos

Exporting delta table to one CSV

The process to export a delta table is taking ~2 hrs. The delta table has 66 partitions, total size ~6 GB, 4 million rows and 270 columns. I used the command below: df.coalesce(1).write.csv("path") What are my options to reduce the time?

Latest Reply
Dribka
New Contributor III
  • 0 kudos

A very interesting task in front of you.... let me know how you solve it!

1 More Replies
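df.coalesce(1) funnels the whole write through a single task, which is usually why it takes so long. One alternative sketch, with a helper that is illustrative rather than any Databricks API: let Spark write its normal many-part CSV output in parallel, then merge the part-*.csv files into one file afterwards, assuming every part carries the same header line:

```python
import glob
import os
import shutil

def merge_csv_parts(parts_dir, out_path):
    """Concatenate Spark-style part-*.csv files into one CSV, keeping
    the header line from the first part only."""
    parts = sorted(glob.glob(os.path.join(parts_dir, "part-*.csv")))
    with open(out_path, "w") as out:
        for i, part in enumerate(parts):
            with open(part) as f:
                header = f.readline()
                if i == 0:
                    out.write(header)
                shutil.copyfileobj(f, out)
    return out_path
```

On Databricks the part files would come from something like `df.write.csv(path, header=True)`, and the merge can then run on the driver against the local /dbfs view of that directory.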
BenLambert
by Contributor
  • 11174 Views
  • 4 replies
  • 0 kudos

Resolved! Explode is giving unexpected results.

I have a dataframe with a schema similar to the following:

id: string
array_field: array
  element: struct
    field1: string
    field2: string
    array_field2: array
      element: struct
        nested_field: stri...

Latest Reply
BenLambert
Contributor
  • 0 kudos

It turns out that if the exploded fields don't match the schema that was defined when reading the JSON in the first place, all the data that doesn't match is silently dropped. This is not really nice default behaviour.

3 More Replies
afk
by New Contributor III
  • 5101 Views
  • 2 replies
  • 2 kudos

Change data feed from target tables of APPLY CHANGES

Up until yesterday I was (sort of) able to read changes from target tables of apply changes operations (either through tables_changes() or using readChangeFeed). I say sort of because the meta columns (_change_type, _commit_version, _commit_timestamp...

ElaPG
by New Contributor III
  • 2340 Views
  • 1 replies
  • 0 kudos

DLT concurrent pipeline updates.

Hi! Regarding this info "An Azure Databricks workspace is limited to 100 concurrent pipeline updates." (Release 2023.16 - Azure Databricks | Microsoft Learn), what is considered an update: changes in pipeline logic, or each pipeline run?

