Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

sminamioka
by New Contributor III
  • 627 Views
  • 5 replies
  • 1 kudos

Resolved! Compute tab doesn't show and doesn't give the option to create a cluster

I've just created an Azure Databricks workspace (Premium tier). When I click Compute to create a cluster, the UI automatically opens the SQL Warehouses menu instead; I'm not sure if it's a glitch, as shown below. Someone said "Ask the admin to ...

Data Engineering
cluster
clusters
Latest Reply
gcj0310
Databricks Partner
  • 1 kudos

Hi @sminamioka, this does not look like a UI glitch. In newer Azure Databricks workspaces, access to classic compute / clusters depends on workspace entitlements and compute policy permissions. If clicking Compute takes you directly to SQL Warehouses, ...

  • 1 kudos
4 More Replies
Guillermo-HR
by New Contributor
  • 212 Views
  • 1 reply
  • 0 kudos

Streaming read and writing with aggregation

Hi, I have the following problem: in a medallion architecture, I receive files in a bronze volume every month containing each sensor reading for the period from the 1st of the month at 00:00 to the last day at 23:00. I have a manual job that calls the Python files ...

Latest Reply
Saritha_S
Databricks Employee
  • 0 kudos

Hi @Guillermo-HR, yes, batch is usually the right fix here. What's happening is that your query is using event-time window aggregation in Structured Streaming with append output mode. In that mode, Spark only emits a window after it is sure the wind...
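To make the append-mode behavior in this reply concrete, here is a minimal pure-Python simulation (not Spark code). It assumes one-hour tumbling windows and a two-hour watermark delay, and it advances the watermark per event rather than per micro-batch, which is a simplification. The function name and parameters are hypothetical.

```python
from datetime import datetime, timedelta


def simulate_append_mode(event_times, window=timedelta(hours=1), delay=timedelta(hours=2)):
    """Return (emitted, pending) window start times after processing all events.

    A window [start, start + window) is emitted only once the watermark
    (max event time seen so far minus `delay`) has reached the window end.
    This is a simplified model of append output mode, not Spark itself.
    """
    epoch = datetime(1970, 1, 1)
    open_windows = set()
    emitted = []
    max_seen = None
    for ts in event_times:
        max_seen = ts if max_seen is None else max(max_seen, ts)
        watermark = max_seen - delay
        # Align the event timestamp to the start of its tumbling window.
        start = epoch + ((ts - epoch) // window) * window
        # Events for windows that are already closed are treated as too late.
        if start + window > watermark:
            open_windows.add(start)
        # Emit every open window whose end the watermark has passed.
        for w in sorted(open_windows):
            if w + window <= watermark:
                emitted.append(w)
                open_windows.discard(w)
    return emitted, sorted(open_windows)


# One reading per hour for the last day of a month (hypothetical data).
events = [datetime(2024, 1, 31, h) for h in range(24)]
emitted, pending = simulate_append_mode(events)
```

In this sketch the final windows stay pending because no later event arrives to advance the watermark, which is consistent with a stream that only receives new files once a month. A batch aggregation has no watermark and would return all windows at once.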

  • 0 kudos
Radeesh
by New Contributor
  • 245 Views
  • 2 replies
  • 0 kudos

Unable to download Data Ingestion with Lakeflow notebook

I have registered for the Data Engineer Learning Plan, but I am unable to set up the lab shown in the video. Additionally, I cannot find where to download the notebook ZIP file. Could you please help me with this?

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @Radeesh, Can you clarify which particular module you are referring to? Unfortunately, notebooks are not available for download in the current self-paced course. The narration is inherited from an earlier/instructor-led version of the material whe...

  • 0 kudos
1 More Replies
theanhdo
by New Contributor III
  • 5171 Views
  • 5 replies
  • 1 kudos

Run continuous job for a period of time

Hi there, I have a job where the Trigger type is configured as Continuous. I want to only run the Continuous job for a period of time per day, e.g. 8AM - 5PM. I understand that we can achieve it by manually starting and cancelling the job on the UI, o...

Latest Reply
KrisJohannesen
Contributor III
  • 1 kudos

The "not-so-pretty-but-it-works" solution I have come across is exactly what you are hinting at yourself.
  • Create the Continuous job - have it be paused
  • Create a secondary "start job"-job - which is basically just that API call in a notebook or python f...
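A rough sketch of what such a "start job" / "stop job" helper might look like. The exact API call the reply refers to is cut off above, so this assumes the Jobs API 2.1 `jobs/update` endpoint with the `continuous.pause_status` setting; check the current API reference before relying on it. The host, token, job ID and function names are placeholders, and the request is only built here, not sent.

```python
import json
from datetime import time
from urllib.request import Request


def desired_pause_status(now, start=time(8, 0), end=time(17, 0)):
    """Return 'UNPAUSED' inside the daily window [start, end), else 'PAUSED'."""
    return "UNPAUSED" if start <= now < end else "PAUSED"


def build_update_request(host, token, job_id, pause_status):
    """Build (but do not send) a request that sets the continuous pause status.

    Assumption: POST /api/2.1/jobs/update with new_settings.continuous.pause_status.
    """
    payload = {
        "job_id": job_id,
        "new_settings": {"continuous": {"pause_status": pause_status}},
    }
    return Request(
        url=f"{host}/api/2.1/jobs/update",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values for illustration only.
req = build_update_request(
    "https://example.cloud.databricks.com", "dummy-token", 123, desired_pause_status(time(9, 30))
)
```

A scheduled secondary job could call something like this at 8AM and 5PM and send the request with `urllib.request.urlopen(req)`.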

  • 1 kudos
4 More Replies
Areqio
by New Contributor II
  • 339 Views
  • 2 replies
  • 1 kudos

Trying to send data from a stream table to an Azure Event Hub in a serverless cluster

Is there a way to stream data from Databricks to Azure Event Hubs in a serverless pipeline environment without using the azure-eventhub library, since it isn’t compatible with serverless pipelines, and instead rely solely on the Kafka-compatible inte...

Latest Reply
amirabedhiafi
Contributor
  • 1 kudos

Hello @Areqio! Yes, you can use Azure Event Hubs through its Kafka-compatible endpoint and not the azure-eventhubs-spark / azure-eventhub connector. JVM libraries are not allowed in LSDP and Event Hubs should be accessed through the built-in Spark Ka...
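As a hedged illustration of the Kafka-compatible route, the helper below builds the options commonly used to point Spark's Kafka connector at an Event Hubs namespace. The function name and arguments are hypothetical; the `kafkashaded.` prefix on the login module is an assumption that applies to Databricks runtimes (open-source Spark would use `org.apache.kafka...` directly), and the connection string should come from a secret rather than a literal.

```python
def eventhubs_kafka_options(namespace, event_hub, connection_string):
    """Build Kafka writer options for an Event Hubs Kafka endpoint (sketch only)."""
    # Event Hubs expects the literal username "$ConnectionString" and the
    # namespace connection string as the password.
    jaas = (
        "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="$ConnectionString" password="{connection_string}";'
    )
    return {
        # The Kafka endpoint of an Event Hubs namespace listens on port 9093.
        "kafka.bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "kafka.security.protocol": "SASL_SSL",
        "kafka.sasl.mechanism": "PLAIN",
        "kafka.sasl.jaas.config": jaas,
        # The event hub name plays the role of the Kafka topic.
        "topic": event_hub,
    }


# Placeholder values for illustration only.
opts = eventhubs_kafka_options("my-namespace", "my-hub", "Endpoint=sb://placeholder/")
```

These options could then be passed to a Kafka sink, for example `.format("kafka").options(**opts)`, on a DataFrame that exposes a `value` column.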

  • 1 kudos
1 More Replies
HTD360
by New Contributor III
  • 496 Views
  • 3 replies
  • 4 kudos

Autoscaling with Auto Loader without SDP

Hi there, I have a question regarding Auto Loader without SDP and autoscaling of clusters. I'm reading the following in the docs: Production considerations for Structured Streaming | Databricks on AWS: Do not enable autoscaling for compute for Struc...

Latest Reply
HTD360
New Contributor III
  • 4 kudos

Hi, thank you for your answer. Could you elaborate a bit on this? "for non SDP available now auto loader jobs autoscaling can be reasonable" How do you decide whether it is reasonable or not? Especially since you said it is not recommended to enable compute...

  • 4 kudos
2 More Replies
Abhishek_sinha
by New Contributor III
  • 533 Views
  • 2 replies
  • 3 kudos

Connecting DBeaver to Databricks Lakebase — Setup & Troubleshooting

I recently connected DBeaver to Databricks Lakebase and wanted to share the setup steps along with a couple of troubleshooting issues I encountered. Since Lakebase is PostgreSQL-compatible, the standard PostgreSQL driver works directly without requiri...

Latest Reply
amirabedhiafi
Contributor
  • 3 kudos

Hello @Abhishek_sinha! Thanks for sharing this, very useful. A few things I can add (from my personal experience): it is better to use the PostgreSQL driver and not the Databricks JDBC driver, because Lakebase is PostgreSQL-compatible, so DBeaver should be configur...

  • 3 kudos
1 More Replies
dbr_data_engg
by New Contributor III
  • 2608 Views
  • 3 replies
  • 1 kudos

Using Databricks Bladebridge or Lakebridge for SQL Migration

Getting a transpile error while executing the command for Databricks Bladebridge or Lakebridge:
databricks labs lakebridge transpile --source-dialect mssql --input-source "<Path>/sample.sql" --output-folder "<Path>\output"
Error: TranspileError(code=FAILURE, ...

Latest Reply
Satyam4u
New Contributor III
  • 1 kudos

Looks like a dependency/runtime issue with the Lakebridge installation on Windows.
pip uninstall databricks-labs-lakebridge -y
pip install databricks-labs-lakebridge
Also check Python version compatibility. Python 3.10/3.11 worked better in my case.

  • 1 kudos
2 More Replies
shan-databricks
by Databricks Partner
  • 432 Views
  • 1 reply
  • 0 kudos

Lakeflow Connect Data ingestion from SQL Server and PostgreSQL to Databricks with CDC

We have a requirement to use Lakeflow Connect for data ingestion from SQL Server and PostgreSQL into Databricks with CDC and Lakehouse Federation. I would like to understand the pros and cons of Lakeflow Connect in the following areas: Firewall/gatewa...

Data Engineering
@Lakeflow Connect @Lakehouse Federation
Latest Reply
ziafazal
Databricks Partner
  • 0 kudos

Hi @shan-databricks, you should set up PostgreSQL for ingestion via Lakeflow Connect. Once your Postgres logical replication is ready, you have to create ingestion pipelines, which comprise a gateway and an ingestion pipeline. Your gateway pipeline is conti...

  • 0 kudos
Akshay_Petkar
by Valued Contributor
  • 841 Views
  • 6 replies
  • 4 kudos

Resolved! Lakebridge reconciliation code keeps running continuously without Spark jobs or errors

Hi, I am facing an issue while running the Lakebridge reconciliation code in Databricks using TriggerReconService.trigger_recon(). The code keeps running continuously without any output, error, or logs. Also, no Spark jobs are getting triggered or show...

Latest Reply
KrisJohannesen
Contributor III
  • 4 kudos

Could you give an example of the config files you have set up for running the reconciliation? The file determines most of the settings - so without it it is hard to assist you 

  • 4 kudos
5 More Replies
yit337
by Contributor
  • 340 Views
  • 2 replies
  • 1 kudos

Performance optimization on auto_cdc_flow

I've got a fact streaming table, which is updated by SCD2 records from the CDF of a silver table. The join is on pk (a hash key generated from the dimensions' business keys) and factory_code (60 unique values). On each incremental processing, it reads all ...

Latest Reply
amirabedhiafi
Contributor
  • 1 kudos

Hi @yit337! This is expected for AUTO CDC with SCD2. It is not doing a simple append, because it must upsert incoming CDC rows into the target based on the declared keys, and for SCD2 it also maintains historical rows with __START_AT and __END_AT. So...
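To show why an SCD2 upsert is more than an append, here is a small conceptual sketch in plain Python. It is not how AUTO CDC is implemented internally; the `key`, `value` and `seq` field names and the `apply_scd2` function are hypothetical. The point is that each incoming change has to find the currently open row for its key in the target before a new version can be added.

```python
def apply_scd2(target, change):
    """Apply one change to an in-memory SCD2 table (list of dicts), in place.

    change: dict with 'key', 'value' and 'seq' (the sequencing value).
    Conceptual sketch only; real engines do this as a merge over the target.
    """
    for row in target:
        # Look up the open (current) version of this key in the target.
        if row["key"] == change["key"] and row["__END_AT"] is None:
            if row["value"] == change["value"]:
                return target  # nothing changed, keep the open row as is
            row["__END_AT"] = change["seq"]  # close the previous version
            break
    # Add the new current version.
    target.append(
        {
            "key": change["key"],
            "value": change["value"],
            "__START_AT": change["seq"],
            "__END_AT": None,
        }
    )
    return target


table = []
apply_scd2(table, {"key": "k1", "value": "a", "seq": 1})
apply_scd2(table, {"key": "k1", "value": "b", "seq": 2})
```

After the second call the first row is closed with `__END_AT = 2` and a new open row starts at 2, so the existing target rows had to be read and modified, not just appended to.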

  • 1 kudos
1 More Replies
ideal_knee
by New Contributor III
  • 13946 Views
  • 7 replies
  • 8 kudos

Reading an Iceberg table with AWS Glue Data Catalog as metastore

I have created an Iceberg table using AWS Glue, however whenever I try to read it using a Databricks cluster, I get `java.lang.InstantiationException`. I have tried every combination of Spark configs for my Databricks compute cluster that I can think...

Latest Reply
ideal_knee
New Contributor III
  • 8 kudos

In case someone happens upon this in the future, I ended up using Unity Catalog with Hive metastore federation for Glue. The Iceberg support is currently "coming soon in Public Preview."

  • 8 kudos
6 More Replies
batch_bender
by New Contributor II
  • 1519 Views
  • 4 replies
  • 2 kudos

create_auto_cdc_from_snapshot_flow vs create_auto_cdc_flow – when is snapshot CDC actually worth it?

I am deciding between create_auto_cdc_from_snapshot_flow() and create_auto_cdc_flow() in a pipeline.
My source is a daily full snapshot table:
  • No operation column (no insert/update/delete flags)
  • Order can be derived from snapshot_date (sequence by)
  • Rows ...
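For readers weighing the two options, the general idea behind snapshot-based CDC can be sketched as a key-wise comparison of consecutive snapshots. This is a plain-Python illustration of the concept, not the pipeline API; `diff_snapshots` and the sample rows are hypothetical, and it assumes each row can be identified by a key.

```python
def diff_snapshots(previous, current):
    """Compare two full snapshots keyed by primary key.

    previous/current: dict mapping key -> row (dict of column values).
    Returns (inserts, updates, deletes) inferred from the comparison.
    """
    inserts = {k: v for k, v in current.items() if k not in previous}
    updates = {k: v for k, v in current.items() if k in previous and previous[k] != v}
    # A key missing from the newer snapshot is treated as a delete.
    deletes = [k for k in previous if k not in current]
    return inserts, updates, deletes


day1 = {1: {"name": "pump", "status": "on"}, 2: {"name": "valve", "status": "off"}}
day2 = {1: {"name": "pump", "status": "off"}, 3: {"name": "fan", "status": "on"}}
inserts, updates, deletes = diff_snapshots(day1, day2)
```

Without an operation column, deletes can only be inferred this way, from a key disappearing between snapshots.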

Latest Reply
manish_de
New Contributor III
  • 2 kudos

Does this work only for tables with a PK? What if the source table doesn't even have a PK? Does it use some type of hashing, concatenating all columns and then using that key for the merge?

  • 2 kudos
3 More Replies
yit337
by Contributor
  • 549 Views
  • 2 replies
  • 2 kudos

Resolved! Does Lakeflow Connect guarantee no out-of-order records?

I use Lakeflow Connect to load data from my source databases to bronze tables. Then I have auto_cdc to track SCD2 changes in my silver tables. I use _commit_timestamp from the bronze CDF, as sequence_by property in auto_cdc in order to order the vers...

Latest Reply
Lu_Wang_ENB_DBX
Databricks Employee
  • 2 kudos

Recommendation: use a business/effective timestamp in sequence_by if your source can emit late/backdated changes and you want SCD2 history to reflect source event time, not bronze arrival/commit time. If ties are possible, use a STRUCT for determinis...
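A small plain-Python sketch of why a composite sequence value makes tie-breaking deterministic. Tuples compare field by field from left to right, which mirrors how a STRUCT is ordered. The field names `event_ts` and `commit_version` and the function `latest_per_key` are hypothetical.

```python
def latest_per_key(changes):
    """Pick the winning change per key using a composite (event_ts, commit_version) sequence."""
    latest = {}
    for change in changes:
        # Tuples compare left to right: event_ts first, commit_version breaks ties.
        seq = (change["event_ts"], change["commit_version"])
        current = latest.get(change["key"])
        if current is None or seq > current[0]:
            latest[change["key"]] = (seq, change)
    return {key: entry[1] for key, entry in latest.items()}


a = {"key": "k1", "event_ts": "2024-01-01T10:00:00", "commit_version": 7, "value": "old"}
b = {"key": "k1", "event_ts": "2024-01-01T10:00:00", "commit_version": 8, "value": "new"}

# The same row wins regardless of arrival order.
result_ab = latest_per_key([a, b])
result_ba = latest_per_key([b, a])
```

With only `event_ts` as the sequence, the two changes above would tie and the outcome would depend on arrival order.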

  • 2 kudos
1 More Replies
Danish11052000
by Contributor
  • 369 Views
  • 3 replies
  • 1 kudos

Need to fetch Mount Point details

Hi Team, I’m currently working on building a consolidated view of access permissions across our Databricks environment. For Unity Catalog (UC) objects, I’m able to retrieve permission details using system tables (privileges / audit logs). However, for l...

Latest Reply
amirabedhiafi
Contributor
  • 1 kudos

Hello @Danish11052000! Thank you for the question; it really helped me review my knowledge and pay closer attention to this subject. And guess what? You are correct, because UC permissions alone will not give complete access governance for leg...

  • 1 kudos
2 More Replies