Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

wschoi
by New Contributor III
  • 20018 Views
  • 17 replies
  • 17 kudos

How to fix plots and image color rendering on Notebooks?

I am currently running dark mode for my Databricks Notebooks, and am using the "new UI" released a few days ago (May 2023) and the "New notebook editor." Currently all plots (like matplotlib) are showing wrong colors. For example, denoting: ```... p...

Latest Reply
griffen_kociela
  • 17 kudos

Still a problem when using Plotly visualizations.

16 More Replies
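A common workaround for the dark-mode rendering issue in this thread is to force matplotlib to draw figures on an explicit opaque background, so the notebook theme cannot bleed through transparent areas. A minimal sketch — the rcParams shown are a reasonable default, not a Databricks-documented fix:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; rendering is theme-independent
import matplotlib.pyplot as plt

# Force an opaque white background so figure colors render the same
# regardless of the notebook UI theme (light or dark).
matplotlib.rcParams.update({
    "figure.facecolor": "white",
    "axes.facecolor": "white",
    "savefig.facecolor": "white",
})

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4], color="tab:blue")
```

The same settings can be applied per-figure via `plt.subplots(facecolor="white")` if you don't want to change global defaults.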
Darshan137
by New Contributor II
  • 86 Views
  • 1 reply
  • 1 kudos

Transitioning from ADF to Databricks Workflows: Best Practices in a Multi-Workspace (dev-prod)

Hi Community, We have a data processing framework running on Azure Databricks with Unity Catalog, and we're evaluating options to consolidate our orchestration entirely within the Databricks ecosystem. CURRENT ARCHITECTURE: ~20 use cases, each containin...

Latest Reply
Lu_Wang_ENB_DBX
Databricks Employee
  • 1 kudos

Answers to your questions. Orchestration (replace ADF): Use Lakeflow Jobs (Databricks Jobs) as the primary orchestrator: one job per use case with a task graph (notebook / SQL / pipeline tasks) to express both sequential and parallel branches, retrie...

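The "one job per use case with a task graph" pattern from the reply above can be expressed declaratively, for example as a Databricks Asset Bundle fragment. A hedged sketch — the job name, task keys, and notebook paths are made up for illustration:

```yaml
resources:
  jobs:
    use_case_orders:              # hypothetical job, one per use case
      name: use-case-orders
      tasks:
        - task_key: ingest
          notebook_task:
            notebook_path: /Workspace/use_cases/orders/ingest
        - task_key: transform_a   # transform_a and transform_b run in parallel
          depends_on: [{task_key: ingest}]
          notebook_task:
            notebook_path: /Workspace/use_cases/orders/transform_a
        - task_key: transform_b
          depends_on: [{task_key: ingest}]
          notebook_task:
            notebook_path: /Workspace/use_cases/orders/transform_b
        - task_key: publish       # runs after both parallel branches finish
          depends_on: [{task_key: transform_a}, {task_key: transform_b}]
          notebook_task:
            notebook_path: /Workspace/use_cases/orders/publish
```

The `depends_on` edges give both the sequential chain (ingest → publish) and the parallel fan-out the reply describes.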
shan-databricks
by Databricks Partner
  • 32 Views
  • 1 reply
  • 0 kudos

Lakeflow Connect: Data Ingestion from SQL Server to Databricks

We have a use case to ingest data from SQL Server into Databricks using Lakeflow Connect. There are 100 tables, and on a daily basis we will perform inserts, updates, and deletes based on CDC data. For this requirement, how can we enable multiple par...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @shan-databricks, Databricks recommends up to ~250 tables per pipeline, so 100 is well within guidance. Lakeflow Connect doesn’t offer a user-facing control for multiple parallel connections. Instead, configure a single SQL Server gateway with suf...

faruko
by New Contributor II
  • 45 Views
  • 2 replies
  • 0 kudos

Best practices for initial large-scale ingestion from on‑premises Oracle to Databricks

Hello everyone, I am responsible for designing and implementing a Lakehouse architecture in an industrial company. I am currently facing some challenges regarding the initial ingestion of data from our on‑premises Oracle database into Databricks. The dat...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @faruko, You can split the initial load using partitioned reads. We took that approach in one of our projects. So instead of doing something like this: SELECT * FROM large_table you can do this: SELECT * FROM table WHERE id BETWEEN 0 AND 1,000,000 With tha...

1 More Replies
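The ID-range approach in the reply above can be sketched as a small helper that builds non-overlapping BETWEEN predicates; each predicate then becomes one parallel JDBC partition. The column name `id` and the bounds are assumptions for illustration:

```python
def partition_predicates(min_id: int, max_id: int, num_partitions: int) -> list[str]:
    """Split [min_id, max_id] into non-overlapping BETWEEN clauses."""
    step = (max_id - min_id) // num_partitions + 1
    predicates, lo = [], min_id
    while lo <= max_id:
        hi = min(lo + step - 1, max_id)
        predicates.append(f"id BETWEEN {lo} AND {hi}")
        lo = hi + 1
    return predicates

preds = partition_predicates(0, 3_999_999, 4)
# With Spark, each predicate is read by a separate task, e.g.:
# df = spark.read.jdbc(url=oracle_jdbc_url, table="large_table",
#                      predicates=preds, properties=connection_props)
```

For a numeric, roughly uniform key, `spark.read.jdbc` with `partitionColumn`/`lowerBound`/`upperBound`/`numPartitions` achieves the same effect without hand-built predicates.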
cvh
by New Contributor III
  • 238 Views
  • 7 replies
  • 3 kudos

Does Lakeflow Connect Have Any Change Tracking Diagnostics?

We have set up Change Tracking on multiple SQL Servers for Lakeflow Connect successfully in the past, but lately we are having lots of problems with a couple of servers. The latest utility script has been run and both lakeflowSetupChangeTracking and ...

Latest Reply
cvh
New Contributor III
  • 3 kudos

Thanks @Ashwin_DSA, @amirabedhiafi for your swift responses. I had high hopes when I saw lakeflowUtilityVersion_1_5() is queried, as I found the database user for the ingestion gateway connection (i.e. the @User parameter for both dbo.lakeflowSetupCh...

6 More Replies
kcyugesh
by New Contributor II
  • 50 Views
  • 1 reply
  • 0 kudos

Unity Catalog storage credential fails although same Access Connector works in another credential

In Azure Databricks Unity Catalog, I have two storage credentials that use the same connector_id / Azure Databricks Access Connector. One credential works and can access ADLS Gen2 successfully, but the other fails with: Failed to access cloud storag...

Latest Reply
juan_maedo
Contributor
  • 0 kudos

I've never used this scenario before, so I just tested the exact same scenario and it works correctly with two storage credentials using the same Access Connector: cred1 → ext_1: abfss://data-test@data_test_storage.dfs.core.windows.net/path1/ cred2 → ex...

MikeGo
by Valued Contributor
  • 441 Views
  • 7 replies
  • 2 kudos

Table update trigger and File Arrival trigger latency

Hi team, When using a table update or file arrival trigger, what latency can I expect for the trigger? Does Databricks poll the source on a schedule? If so, is the poll free? Thanks

Latest Reply
MikeGo
Valued Contributor
  • 2 kudos

Hi @Ashwin_DSA, Thanks for the further clarification. Let's make this even clearer. "the trigger hands your job a parameter payload with the updated table list and the most recent commit version" This is a good thing, but likely it cannot be used, ...

6 More Replies
Avinash_Narala
by Databricks Partner
  • 122 Views
  • 2 replies
  • 0 kudos

Data Loss in Incremental Batch Jobs Due to Latency in delta file write to blob

Hi everyone, I am facing a data consistency issue in my Databricks incremental pipeline where records are being skipped because of a time gap between when a record is processed and when the physical file is finalized in Azure Blob Storage (ABFS). Our A...

Latest Reply
balajij8
Contributor III
  • 0 kudos

You can handle it as below. Fix the Bronze Write: the 20+ minute commit gap suggests metadata contention or "small file issues" in the bronze Delta tables. You can optimize tables manually or enable Optimized Write and Auto Optimize if feasible. This...

1 More Replies
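The Optimized Write / Auto Optimize suggestion in the reply above maps to standard Delta table properties; a sketch, with a hypothetical table name:

```sql
-- Enable optimized writes and auto compaction on an existing bronze table
ALTER TABLE bronze.events SET TBLPROPERTIES (
  'delta.autoOptimize.optimizeWrite' = 'true',
  'delta.autoOptimize.autoCompact'   = 'true'
);
```

These properties coalesce small files at write time and compact them afterwards, which addresses the small-file pressure the reply attributes the long commit gap to.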
AdrianLobacz
by Databricks Partner
  • 132 Views
  • 1 reply
  • 0 kudos

Best option for parallel processing

I faced some challenges in my projects related to parallel processing in Databricks. In many cases, the issue was not the volume of data itself, but the overall execution time. I was processing a relatively small number of objects, but each object re...

Latest Reply
balajij8
Contributor III
  • 0 kudos

The Driver was the bottleneck in the Thread Pool approach. By moving to Serverless Workflows, you can shift the orchestration weight to the Databricks Control Plane. Eliminate Driver Saturation: Serverless compute for Workflows natively handles task d...

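For context on the reply above, the driver-side thread-pool pattern it refers to typically looks like the sketch below: the objects are processed concurrently, but all scheduling and result collection funnel through one driver process, which is exactly what moving orchestration to job tasks avoids (`process_object` is a stand-in for the real per-object work):

```python
from concurrent.futures import ThreadPoolExecutor

def process_object(name: str) -> str:
    # Stand-in for per-object work (e.g. read, transform, write one object).
    return f"{name}:done"

objects = [f"obj_{i}" for i in range(8)]

# All eight objects run concurrently, but orchestration, retries, and
# collected results all live on a single driver.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_object, objects))
```

With one job task per object (or a `for_each` task), each unit of work instead gets its own scheduled task, so a saturated driver no longer caps throughput.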
RodrigoE
by New Contributor III
  • 185 Views
  • 4 replies
  • 0 kudos

Ingest data from REST endpoint into Databricks

Hello, I'm looking for the best option to retrieve between 1-1.5 TB of data per day from a REST API into Databricks. Thank you, Rodrigo Escamilla

Latest Reply
rohan22sri
New Contributor III
  • 0 kudos

Hi Rodrigo, One simple approach I've used is calling the REST API directly from a Databricks notebook using standard Python libraries; no extra setup or tools required. The idea is to keep it minimal: generate the API signature, call the endpoint, and l...

3 More Replies
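A minimal sketch of the notebook-only pattern described in the reply above: page through the API until an empty page comes back, landing each raw batch before moving on. `fetch_page` is a stub standing in for the real authenticated HTTP call, and in practice the sink would be a UC volume or bronze table rather than a list:

```python
def fetch_page(page: int) -> list:
    # Stub for a real HTTP call, e.g. requests.get(f"{base_url}?page={page}").json()
    return list(range(page * 2, page * 2 + 2)) if page < 3 else []

def ingest_all(fetch, sink: list) -> int:
    """Pull pages until the API returns an empty batch; land each batch raw."""
    page = 0
    while True:
        batch = fetch(page)
        if not batch:
            break
        sink.extend(batch)  # in practice: persist the raw batch before continuing
        page += 1
    return len(sink)

records: list = []
total = ingest_all(fetch_page, records)
```

At the 1-1.5 TB/day scale in the question, the same loop would usually be sharded (by date range, entity, or page window) across parallel job tasks rather than run as a single sequential crawl.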
AlexSantiago
by New Contributor II
  • 17258 Views
  • 22 replies
  • 4 kudos

spotify API get token - raw_input was called, but this frontend does not support input requests.

Hello everyone, I'm trying to use Spotify's API to analyse my music data, but I'm receiving an error during authentication, specifically when I try to get the token; my code is below. Is it a Databricks bug? pip install spotipy from spotipy.oauth2 import SpotifyO...

Latest Reply
unifiedfilter
New Contributor
  • 4 kudos

That’s where unifiedfilter comes in  offering reliable air and water filtration solutions that help improve indoor air quality and water purity, ensuring a cleaner, unifiedfilter.com  safer, and more comfortable living space for you and your loved on...

21 More Replies
Oumeima
by New Contributor III
  • 789 Views
  • 5 replies
  • 2 kudos

Resolved! Lakeflow Connect - SQL Server - Database Setup step keeps failing

Hello, I am trying to ingest data from an Azure SQL Database using Lakeflow Connect. - I'm using a service principal for authentication (created the login and user in the DB I am trying to ingest) - The utility script was executed by a DB owner === Install...

Latest Reply
Oumeima
New Contributor III
  • 2 kudos

We figured out the issue finally! We checked the database SQL audit logs and noticed that there was a particular query that was taking too long (4 min) for the ingestion user. This was causing a timeout. This query is very simple and usually takes a c...

4 More Replies
ashutoshacharya
by New Contributor
  • 135 Views
  • 1 reply
  • 1 kudos

Resolved! Unable to see lakeflow designer option in my free edition databricks account

I am unable to see the Lakeflow Designer option in my Databricks account. Even the Previews option is not there... Please let me know how I can access it.

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @ashutoshacharya, Right now, Lakeflow Designer is in Public Preview, and it isn’t fully rolled out to Databricks Free Edition yet, which is why you don’t see it in the UI or under Previews. On full (paid or trial) workspaces, a workspace admin can...

maikel
by Contributor II
  • 152 Views
  • 1 reply
  • 0 kudos

Uploading file to volume and start ingestion job

Hello Community! I am writing to you with my idea about a data ingestion job which we have to implement in our project. The data we have is in CSV file format and, depending on the case, it differs a little bit. Before uploading we pivot the CSV file...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @maikel, You don't have to build a custom solution for this. Databricks now has native components that align very well with what you want. If you want the job to start as soon as new files land in a volume, the recommended approach is to use file-...

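The file-arrival approach recommended in the reply above maps to the Jobs file arrival trigger, which watches a storage location and starts the job when new files land. Sketched as an Asset Bundle fragment — the job name, volume path, and notebook path are hypothetical:

```yaml
resources:
  jobs:
    csv_ingest:                    # hypothetical job started by new files
      name: csv-ingest
      trigger:
        file_arrival:
          url: /Volumes/main/raw/incoming/       # UC volume to watch
          min_time_between_triggers_seconds: 60  # debounce bursts of uploads
      tasks:
        - task_key: load_csv
          notebook_task:
            notebook_path: /Workspace/ingest/load_csv
```

This removes the need for a custom listener: uploading the pivoted CSV to the volume is itself the event that starts the ingestion job.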