Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

William_Scardua
by Valued Contributor
  • 10 Views
  • 1 reply
  • 0 kudos

What is the best framework/package for data quality?

Hi everyone, I'm currently looking for a data-quality solution for my environment. I don't have DLT tables or a Unity Catalog in place. In your opinion, what is the best framework or package to implement reliable data-quality checks under these conditi...

Latest Reply
nayan_wylde
Esteemed Contributor
  • 0 kudos

Here are a few DQ packages for DLT or LDP that you can try.
1. Databricks Labs DQX: purpose-built for Spark and Databricks. Rule-based checks on DataFrames (batch & streaming). Supports quarantine and profiling. Lightweight and easy to integrate.
2. Great Exp...
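As a generic illustration of the rule-based-check-plus-quarantine pattern these packages provide, here is a minimal plain-PySpark sketch (not the DQX API; table names and rules are hypothetical):

```python
# Generic illustration of rule-based checks with a quarantine split; not tied to any
# specific DQ framework. Table names and rules are hypothetical.
from pyspark.sql import functions as F

df = spark.read.table("bronze.orders")  # hypothetical source table

# Rows failing any rule are routed to quarantine instead of the clean target.
failed = (
    F.col("order_id").isNull()
    | (F.col("amount") <= 0)
    | ~F.col("country").isin("US", "DE", "IN")
)

quarantine_df = df.filter(failed)
clean_df = df.filter(~failed)

clean_df.write.mode("append").saveAsTable("silver.orders")
quarantine_df.write.mode("append").saveAsTable("quarantine.orders")
```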

ManojkMohan
by Honored Contributor II
  • 6 Views
  • 1 reply
  • 0 kudos

Accessing Databricks data in Salesforce via zero copy

I have uploaded clickstream data as shown below. Do I have to mandatorily share via Delta Sharing for the values to be exposed in Salesforce? At the Salesforce end I have confirmed that I have a working connector where I am able to see sample data, but u...

(screenshots attached)
Latest Reply
bianca_unifeye
New Contributor III
  • 0 kudos

@hasnat_unifeye I think you recently set up the Salesforce connector. Can you please have a look at the issue described above? Thanks.

Nidhig
by Contributor
  • 2 Views
  • 0 replies
  • 0 kudos

Lakeflow jobs

Hi, I am currently working on migrating all ADF jobs to Lakeflow jobs. I have a few questions:
Pipeline cost: What is the cost model for running Lakeflow pipelines? Any documentation available? ADF vs Lakeflow jobs?
Job reuse: Do Lakeflow jobs reuse the...

ShivMukesh
by New Contributor
  • 3678 Views
  • 3 replies
  • 0 kudos

Upgrade HMS to UC using UCX tool - workspace to workspace migration

Hello team, I understand that an automatic upgrade to UC utilizing the UCX tool (Databricks Labs project) is now available to complete this migration from HMS to UC in an automated way. But does this tool allow workspace to workspace catalog/artifact migra...

Latest Reply
nayan_wylde
Esteemed Contributor
  • 0 kudos

@ShivMukesh I have used UCX to migrate to Unity Catalog. It is a great tool, but it also involves a lot of workarounds, especially in group migration and table migration. In group migration it renames the old workspace group and assigns the same permissi...

2 More Replies
Pratikmsbsvm
by Contributor
  • 43 Views
  • 3 replies
  • 1 kudos

Data Pipeline for Bringing Data from Oracle Fusion to Azure Databricks

I am trying to bring Oracle Fusion (SCM, HCM, Finance) data and push it to ADLS Gen2. Databricks is used for data transformation and Power BI for report visualization. I have 3 options (Option 1, Option 2, Option 3; diagrams attached). Could someone please help me decide which is bes...

Latest Reply
Raman_Unifeye
Contributor
  • 1 kudos

Option 1, using Oracle's bulk extraction utility BICC. It can directly export the extracted data files (typically CSV) to an Oracle cloud storage destination, and then you could use ADF to copy them over to ADLS.

2 More Replies
Naveenkumar1811
by New Contributor II
  • 20 Views
  • 2 replies
  • 0 kudos

SkipChangeCommit to True Scenario on Data Loss Possibility

Hi Team, I have the below scenario: I have a Spark Streaming job with a processing-time trigger of 3 seconds, running continuously 365 days. We are performing a weekly delete job on the source of this streaming job based on a custom retention policy. It is a D...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

It shouldn't. You have an append-only stream, and skipChangeCommits will ignore any modifications that were applied to already existing files.
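For reference, a minimal sketch of that setup (paths and table names are hypothetical): an append-only Delta source read with skipChangeCommits, so commits that only rewrite existing files, such as retention deletes, are ignored.

```python
# Hedged sketch with hypothetical paths: stream from a Delta table while skipping commits
# that only modify already-existing files (e.g. a weekly retention delete job).
df = (spark.readStream
      .format("delta")
      .option("skipChangeCommits", "true")
      .load("/delta/source_table"))

query = (df.writeStream
         .format("delta")
         .option("checkpointLocation", "/checkpoints/source_table_stream")
         .trigger(processingTime="3 seconds")
         .toTable("target_table"))
```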

1 More Replies
Han_bbb
by Visitor
  • 16 Views
  • 1 reply
  • 0 kudos

Need to restore my scripts from the legacy version

Dear support team, the last time I used Databricks was back in 2024 and I have several scripts stored in it. I really need to get access to them now, but I can't log in; I get the message "User is not a member of this workspace." Please help. Thanks

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Han_bbb, could you provide more details? Is it a workspace at your work or your private one? Which cloud provider?

StephenDsouza
by New Contributor II
  • 3052 Views
  • 2 replies
  • 0 kudos

Error during build process for serving model caused by detectron2

Hi All, Introduction: I am trying to register my model on Databricks so that I can serve it as an endpoint. The packages that I need are "torch", "mlflow", "torchvision", "numpy" and "git+https://github.com/facebookresearch/detectron2.git". For this, ...

Latest Reply
StephenDsouza
New Contributor II
  • 0 kudos

Found an answer! Basically pip was somehow installing the dependencies from the git repo first and was not following the given order, so to solve this, I added the libraries for conda to install.``` conda_env = { "channels": [ "defa...
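The conda_env in the reply is cut off above; purely as a hedged illustration (the Python and package versions are assumptions, not the poster's exact environment), a complete environment dict passed to MLflow usually looks like this:

```python
# Hedged illustration of a conda_env dict for MLflow model logging. Versions and ordering
# are assumptions; the original answer's full snippet is truncated above.
conda_env = {
    "channels": ["defaults", "conda-forge"],
    "dependencies": [
        "python=3.10",
        "pip",
        {
            "pip": [
                "mlflow",
                "torch",
                "torchvision",
                "numpy",
                "git+https://github.com/facebookresearch/detectron2.git",
            ]
        },
    ],
    "name": "detectron2_env",
}

# Logged alongside the model so model serving rebuilds the same environment, e.g.:
# mlflow.pyfunc.log_model("model", python_model=my_wrapper, conda_env=conda_env)
```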

1 More Replies
Naveenkumar1811
by New Contributor II
  • 18 Views
  • 4 replies
  • 0 kudos

How do I create a workspace object with SP ownership

Hi Team, I have a scenario where I have a jar file (24 MB) to be put in a workspace directory, but the ownership should be associated with the SP rather than any individual ID. I tried the Databricks CLI export option, but it has a limitation of 10 MB max. Plea...

Latest Reply
Raman_Unifeye
Contributor
  • 0 kudos

Reference Link - https://docs.databricks.com/aws/en/volumes/volume-files#upload-files-to-a-volume
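The linked page covers Unity Catalog volumes; as a minimal sketch (catalog, schema, and volume names are hypothetical), copying a large jar into a volume from a notebook avoids the 10 MB workspace-import limit:

```python
# Hedged sketch with a hypothetical volume path: copy a local jar into a Unity Catalog volume,
# which accepts files far larger than the workspace import API's 10 MB limit.
dbutils.fs.cp(
    "file:/tmp/my-library.jar",                                # local file on the driver
    "/Volumes/my_catalog/my_schema/my_volume/my-library.jar",  # UC volume destination
)
```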

3 More Replies
Sainath368
by New Contributor III
  • 15 Views
  • 2 replies
  • 0 kudos

Migrating from directory-listing to Autoloader Managed File events

Hi everyone, we are currently migrating from a directory-listing-based streaming approach to managed file events in Databricks Auto Loader for processing our data in structured streaming. We have a function that handles structured streaming where we ar...

Latest Reply
Raman_Unifeye
Contributor
  • 0 kudos

Yes, for your setup, Databricks Auto Loader will create a separate event queue for each independent stream running with the cloudFiles.useManagedFileEvents = true option. As you are running 1 stream per table, 1 unique directory per stream, and 1 uni...
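For reference, a minimal Auto Loader read using the option mentioned above (file format, schema location, and landing path are hypothetical):

```python
# Hedged sketch with hypothetical paths: one Auto Loader stream per source directory,
# using managed file events instead of directory listing.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.useManagedFileEvents", "true")
      .option("cloudFiles.schemaLocation", "/checkpoints/my_table/_schema")
      .load("abfss://landing@mystorageaccount.dfs.core.windows.net/my_table/"))
```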

1 More Replies
Naveenkumar1811
by New Contributor II
  • 62 Views
  • 3 replies
  • 0 kudos

Can we change the ownership of a Databricks-managed secret to an SP in Azure Databricks?

Hi Team, earlier we faced an issue where a jar file (created by an old employee) in a workspace directory is used as a library in a cluster which is run from an SP. Since the employee left the org and the ID got removed, even though the SP is part of ADMI...

Latest Reply
Coffee77
Contributor III
  • 0 kudos

I think there is no other way. In any case, here is how I usually configure my (all-purpose and jobs compute) clusters to access secrets via environment variables so that you don't have to update all references if some similar issue arises again. The...
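As an illustration of that pattern (scope, key, and variable names are hypothetical), the cluster sets an environment variable that references the secret, and notebook or job code only reads the variable:

```python
# Hedged sketch: in the cluster's Spark environment variables, set something like
#   MY_SERVICE_TOKEN={{secrets/my_scope/my_service_token}}
# Databricks resolves the reference at cluster start, so code never hard-codes the scope/key
# and nothing needs updating if the secret's ownership or location changes later.
import os

token = os.environ.get("MY_SERVICE_TOKEN")  # resolved secret value, or None if unset
```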

2 More Replies
DarioB
by New Contributor III
  • 25 Views
  • 1 reply
  • 1 kudos

Resolved! Issues recreating Tables with enableRowTracking and DBR16.4 and below

We are running a Deep Clone script to copy catalogs between environments; this script is run through a job (run by an SP) with DBR 16.4.12. Some tables are deep cloned and other ones are dropped and recreated to load partial data. The ones dropped are re...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Happy Monday @DarioB , I did some digging and would like to provide you with some helpful hints/tips. Thanks for the detailed context—this is a known rough edge in DBR 16.x when recreating tables that have row tracking materialized. What’s happening ...
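The rest of the reply is truncated above, so purely as a hedged illustration (hypothetical table name, and not necessarily the fix described in the reply), this is the Delta property that row tracking hinges on and how it can be inspected or toggled:

```python
# Hedged illustration with a hypothetical table name; shows the table property behind
# row tracking, which the (truncated) reply above discusses.
spark.sql("SHOW TBLPROPERTIES my_catalog.my_schema.my_table").show(truncate=False)

spark.sql("""
    ALTER TABLE my_catalog.my_schema.my_table
    SET TBLPROPERTIES ('delta.enableRowTracking' = 'false')
""")
```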

Volker
by Contributor
  • 4943 Views
  • 2 replies
  • 0 kudos

Structured Streaming schemaTrackingLocation does not work with starting_version

Hello Community, I came across a strange behaviour when using structured streaming on top of a Delta table. I have a stream that I wanted to start from a specific version of a Delta table using option("starting_version", x) because I did no...

Data Engineering
Delta Lake
schemaTrackingLocation
starting_version
structured streaming
Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

This issue is related to how Delta Lake’s structured streaming interacts with schema evolution and options like startingVersion and schemaTrackingLocation. The behavior you've observed has been noted by other users, and can be subtle due to how check...
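For context, a hedged sketch (version number and paths are hypothetical) of a Delta source stream combining the two options discussed in this thread:

```python
# Hedged sketch with hypothetical paths and version: a Delta stream pinned to a starting
# version while tracking schema changes in a dedicated location.
df = (spark.readStream
      .format("delta")
      .option("startingVersion", 42)
      .option("schemaTrackingLocation", "/checkpoints/my_stream/_schema_log")
      .load("/delta/my_source_table"))
```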

1 More Replies
stevenayers-bge
by Contributor
  • 4117 Views
  • 2 replies
  • 1 kudos

Querying Unity Managed Tables from Redshift

I built a script about 6 months ago to make our Delta tables accessible in Redshift for another team, but it's a bit nasty...
Generate a Delta Lake manifest each time the Databricks Delta table is updated
Recreate the Redshift external table (in case th...
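For reference, the manifest step described above boils down to a single Delta Lake command (the table path is hypothetical):

```python
# Hedged sketch with a hypothetical table path: regenerate the symlink manifest that
# Redshift Spectrum external tables read, as in the first step of the script above.
spark.sql("GENERATE symlink_format_manifest FOR TABLE delta.`/mnt/datalake/my_delta_table`")
```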

Latest Reply
mark_ott
Databricks Employee
  • 1 kudos

There is indeed a better and more integrated way to make Delta Lake tables accessible in Redshift without manually generating manifests and dynamically creating external tables or partitions. Some important points and options: Databricks Delta Lake ...

1 More Replies
Mangeysh
by New Contributor
  • 3745 Views
  • 2 replies
  • 0 kudos

Azure Databricks API for JSON output, displaying on UI

Hello All, I am new to Azure Databricks and trying to show Azure Databricks table data on a UI using React JS. Let's say there are 2 tables, Employee and Salary; I need to join these two tables on empid, generate JSON output, and call an API (end ...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The most effective way to display joined data from Azure Databricks tables (like Employee and Salary) in a React JS UI involves exposing your Databricks data through an API and then consuming that API in your frontend. Flask can work, but there are b...
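As a hedged sketch of that approach (hostnames, HTTP path, and table names are hypothetical), a small Flask endpoint can run the join on a Databricks SQL warehouse via the databricks-sql-connector and return JSON to the React frontend:

```python
# Hedged sketch with hypothetical connection details and table names.
# Requires: pip install flask databricks-sql-connector
import os

from databricks import sql
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/api/employee-salary")
def employee_salary():
    # Connection details come from environment variables (workspace URL, warehouse HTTP path, PAT).
    with sql.connect(
        server_hostname=os.environ["DATABRICKS_HOST"],
        http_path=os.environ["DATABRICKS_HTTP_PATH"],
        access_token=os.environ["DATABRICKS_TOKEN"],
    ) as conn, conn.cursor() as cur:
        cur.execute("""
            SELECT e.empid, e.name, s.salary
            FROM employee e
            JOIN salary s ON e.empid = s.empid
        """)
        cols = [c[0] for c in cur.description]
        rows = [dict(zip(cols, r)) for r in cur.fetchall()]
    return jsonify(rows)
```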

1 More Replies
