Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by TalessRocha, New Contributor II
  • 5896 Views
  • 11 replies
  • 8 kudos

Resolved! Connect to Azure Data Lake Storage using Databricks Free Edition

Hello guys, I'm using Databricks Free Edition (serverless) and I am trying to connect to an Azure Data Lake Storage account. The problem I'm having is that in the Free Edition we can't configure the cluster, so I tried to make the connection via notebook using ...

Latest Reply
pjvi, New Contributor II
  • 8 kudos

If you want to read from your Azure storage account using Databricks Free Edition, you can add a specific option when reading: spark.read.option("fs.azure.account.key.<storage-account-name>.dfs.core.windows.net", "your_storage_account...
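
For readers landing here, a minimal runnable sketch of that approach (Python); the storage account, container, path, and secret scope are placeholders I've assumed, and a secret is preferable to hard-coding the key:

```python
# Read from ADLS Gen2 on Free Edition / serverless by passing the account key
# as a per-read option instead of cluster-level Spark config.
storage_account = "<storage-account-name>"                 # placeholder
account_key = dbutils.secrets.get("my-scope", "adls-key")  # hypothetical secret scope

df = (
    spark.read
    .option(f"fs.azure.account.key.{storage_account}.dfs.core.windows.net", account_key)
    .format("csv")
    .option("header", "true")
    .load(f"abfss://<container>@{storage_account}.dfs.core.windows.net/path/to/data")
)
display(df)
```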

10 More Replies
by maikel, Contributor II
  • 336 Views
  • 4 replies
  • 1 kudos

Resolved! Uploading a file to a volume and starting an ingestion job

Hello Community! I am writing to you with my idea about a data ingestion job which we have to implement in our project. The data we have are in CSV format and, depending on the case, differ a little bit. Before uploading, we pivot the CSV file ...
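
To make the described flow concrete, a minimal sketch with the Databricks SDK; the local file, volume path, and job ID are hypothetical:

```python
# Upload the (already pivoted) CSV to a Unity Catalog volume, then trigger
# the ingestion job that reads from that volume.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

with open("readings_pivoted.csv", "rb") as f:  # hypothetical local file
    w.files.upload(
        "/Volumes/main/raw/landing/readings_pivoted.csv",  # hypothetical volume path
        f,
        overwrite=True,
    )

run = w.jobs.run_now(job_id=123)  # hypothetical ingestion job ID
```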

Latest Reply
maikel, Contributor II
  • 1 kudos

Yeah, understood. Thank you very much once again! 

3 More Replies
by maikel, Contributor II
  • 17 Views
  • 0 replies
  • 0 kudos

Job tasks monitoring

Hello Community, We have a case in our project that we would like to solve in an elegant and scalable manner. As always, I would really appreciate your suggestions and experience. In short: we have a multi-step job consisting of 4 stages. In one of the ...

by Danish11052000, Contributor
  • 1104 Views
  • 7 replies
  • 1 kudos

Resolved! How should I correctly extract the full table name from request_params in audit logs?

I'm trying to build a UC usage/refresh tracking table for every workspace. For each workspace, I want to know how many times a UC table was refreshed or accessed each month. To do this, I'm reading the Databricks audit logs and I need to extract only ...

Latest Reply
SteveOstrowski, Databricks Employee
  • 1 kudos

Hi @Danish11052000, You are on the right track with the COALESCE approach. The reason for the inconsistency is that different Unity Catalog action types populate different keys in request_params. Here is a breakdown of the key fields and which action...
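
To illustrate the pattern in Python, a minimal sketch over the system audit table; the specific request_params keys below are assumptions to verify against your own logs, since different action types populate different keys:

```python
# Derive one table_full_name column by coalescing the request_params keys
# that different Unity Catalog actions populate, then count monthly accesses.
from pyspark.sql import functions as F

audit = spark.table("system.access.audit").where(F.col("service_name") == "unityCatalog")

table_full_name = F.coalesce(
    F.col("request_params")["full_name_arg"],    # assumed key, e.g. getTable-style actions
    F.col("request_params")["table_full_name"],  # assumed key, e.g. credential requests
    F.col("request_params")["name"],             # assumed fallback key
)

monthly_usage = (
    audit.withColumn("table_full_name", table_full_name)
    .where(F.col("table_full_name").isNotNull())
    .groupBy(F.date_trunc("month", F.col("event_time")).alias("month"), "table_full_name")
    .count()
)
```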

6 More Replies
by Pranav_1699, New Contributor
  • 22 Views
  • 0 replies
  • 0 kudos

Building a Spark Declarative Pipeline OSS with Apache Iceberg and AWS Glue Catalog

Hey everyone, I recently worked on building a modern financial data lakehouse using Spark Declarative Pipeline OSS (SDP OSS), Apache Iceberg, and AWS Glue Catalog. The blog covers:
  • Building declarative data pipelines with Spark
  • Using Apache Iceberg a...

Data Engineering
Spark Declarative Pipelines
by mnissen1337, New Contributor II
  • 66 Views
  • 1 reply
  • 0 kudos

Managing Unity Catalog Permissions for Databricks Apps via DABs

I’m currently developing a Databricks App, and the app’s service principal needs access to Unity Catalog tables. From what I can tell, it doesn’t seem possible to grant Unity Catalog permissions through DABs yet — only through the UI, based on the cu...

Latest Reply
szymon_dybczak, Esteemed Contributor III
  • 0 kudos

Hi @mnissen1337, But there is a way to do this in DABs. Look at the following section in the documentation: Manage Databricks apps using Declarative Automation Bundles | Databricks on AWS. If my answer was helpful, please consider marking it as accepted solutio...

by sminamioka, New Contributor III
  • 213 Views
  • 5 replies
  • 1 kudos

Compute tab doesn't show and doesn't give the option to create a cluster

I've just created an Azure Databricks workspace (Premium tier), and when trying to create a cluster, clicking on Compute automatically opens the SQL Warehouse menu. I'm not sure if it's a glitch, as shown below. Someone said "Ask the admin to ...

[screenshot attachment: sminamioka_0-1778276402869.png]
Data Engineering
cluster
clusters
Latest Reply
gcj0310, Databricks Partner
  • 1 kudos

Hi @sminamioka, This does not look like a UI glitch. In newer Azure Databricks workspaces, access to classic compute/clusters depends on workspace entitlements and compute policy permissions. If clicking Compute takes you directly to SQL Warehouses, ...

4 More Replies
by Guillermo-HR, New Contributor
  • 70 Views
  • 1 reply
  • 0 kudos

Streaming read and writing with aggregation

Hi, I have the following problem: in a medallion architecture, on a bronze volume I get files every month containing the data for each sensor reading during the period from the 1st of the month at 00:00 to the last day at 23:00. I have a manual job that calls the Python files ...

Latest Reply
Saritha_S, Databricks Employee
  • 0 kudos

Hi @Guillermo-HR, Yes — batch is usually the right fix here. What's happening is that your query is using event-time window aggregation in Structured Streaming with append output mode. In that mode, Spark only emits a window after it is sure the wind...
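
For context, a minimal sketch of the append-mode pattern being described; the table, column, and path names are illustrative, not from the thread:

```python
# Event-time window aggregation in append mode: a window's row is only emitted
# once the watermark passes the window's end, so monthly-arriving batches
# appear "stuck" until newer events advance the watermark.
from pyspark.sql import functions as F

agg = (
    spark.readStream.table("bronze.sensor_readings")  # illustrative source
    .withWatermark("ts", "1 hour")                    # bound how late data may be
    .groupBy(F.window("ts", "1 hour"), "sensor_id")
    .agg(F.avg("value").alias("avg_value"))
)

(agg.writeStream
    .outputMode("append")
    .option("checkpointLocation", "/Volumes/main/default/checkpoints/sensor_agg")
    .toTable("silver.sensor_hourly"))
```

Running the same aggregation as a plain batch query avoids the watermark wait entirely, which is why batch fits monthly-delivered files.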

by Radeesh, New Contributor
  • 78 Views
  • 2 replies
  • 0 kudos

Unable to download the Data Ingestion with Lakeflow notebook

I have registered for the Data Engineer Learning Plan, but I am unable to set up the lab shown in the video. Additionally, I cannot find where to download the notebook ZIP file. Could you please help me with this?

Latest Reply
Ashwin_DSA, Databricks Employee
  • 0 kudos

Hi @Radeesh, Can you clarify which particular module you are referring to? Unfortunately, notebooks are not available for download in the current self-paced course. The narration is inherited from an earlier/instructor-led version of the material whe...

1 More Reply
by theanhdo, New Contributor III
  • 4376 Views
  • 5 replies
  • 1 kudos

Run continuous job for a period of time

Hi there, I have a job where the trigger type is configured as Continuous. I want to run the Continuous job only for a period of time per day, e.g. 8AM - 5PM. I understand that we can achieve it by manually starting and cancelling the job in the UI, o...

Latest Reply
KrisJohannesen, Contributor
  • 1 kudos

The "not-so-pretty-but-it-works" solution I have come across is exactly what you are hinting at yourself.Create the Continuous job - have it be pausedCreate a secondary "start job"-job - which is basically just that API call in a notebook or python f...

4 More Replies
by Areqio, New Contributor II
  • 165 Views
  • 2 replies
  • 1 kudos

Trying to send data from a streaming table to an Azure Event Hub in a serverless cluster

Is there a way to stream data from Databricks to Azure Event Hubs in a serverless pipeline environment without using the azure-eventhub library, since it isn’t compatible with serverless pipelines, and instead rely solely on the Kafka-compatible inte...

Latest Reply
amirabedhiafi, New Contributor III
  • 1 kudos

Hello @Areqio! Yes, you can use Azure Event Hubs through its Kafka-compatible endpoint and not the azure-eventhubs-spark / azure-eventhub connector. JVM libraries are not allowed in LSDP, and Event Hubs should be accessed through the built-in Spark Ka...
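
A minimal sketch of that Kafka-compatible write; the namespace, Event Hub name, and connection string are placeholders, and the shaded JAAS module name below is the one Databricks documents for its bundled Kafka client (verify on your runtime):

```python
# Write a streaming DataFrame `df` to Azure Event Hubs through its
# Kafka-compatible endpoint (port 9093, SASL_SSL, connection string as
# the password) using the built-in Spark Kafka sink.
EH_NAMESPACE = "<namespace>"                    # placeholder
EH_CONN_STR = "<event-hubs-connection-string>"  # placeholder

jaas = (
    "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required "
    f'username="$ConnectionString" password="{EH_CONN_STR}";'
)

(df.selectExpr("to_json(struct(*)) AS value")
   .writeStream
   .format("kafka")
   .option("kafka.bootstrap.servers", f"{EH_NAMESPACE}.servicebus.windows.net:9093")
   .option("kafka.security.protocol", "SASL_SSL")
   .option("kafka.sasl.mechanism", "PLAIN")
   .option("kafka.sasl.jaas.config", jaas)
   .option("topic", "<event-hub-name>")
   .option("checkpointLocation", "/Volumes/main/default/checkpoints/eh_sink")
   .start())
```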

1 More Reply
by HTD360, New Contributor III
  • 157 Views
  • 3 replies
  • 4 kudos

Autoscaling with Auto Loader without SDP

Hi there, I have a question regarding Auto Loader without SDP and auto-scaling of clusters. I'm reading the following in the docs, Production considerations for Structured Streaming | Databricks on AWS: Do not enable autoscaling for compute for Struc...

Latest Reply
HTD360, New Contributor III
  • 4 kudos

Hi, thank you for your answer. Could you elaborate a bit on this: "for non SDP available now auto loader jobs autoscaling can be reasonable"? How do you decide whether it is reasonable or not? Especially since you said it is not recommended to enable compute...

2 More Replies
by Abhishek_sinha, New Contributor II
  • 110 Views
  • 2 replies
  • 3 kudos

Connecting DBeaver to Databricks Lakebase — Setup & Troubleshooting

I recently connected DBeaver to Databricks Lakebase and wanted to share the setup steps along with a couple of troubleshooting issues I encountered. Since Lakebase is PostgreSQL-compatible, the standard PostgreSQL driver works directly without requiri...

Latest Reply
amirabedhiafi, New Contributor III
  • 3 kudos

Hello @Abhishek_sinha! Thanks for sharing this, very useful. A few things I can add (from my personal experience): it is better to use the PostgreSQL driver and not the Databricks JDBC driver, because Lakebase is PostgreSQL-compatible, so DBeaver should be configur...
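
As a quick sanity check outside DBeaver, a minimal sketch with psycopg2; the host, user, and database name are placeholders, and the assumption that a Databricks OAuth token serves as the password should be verified against the Lakebase docs:

```python
# Connect to Lakebase over the PostgreSQL wire protocol, exactly as DBeaver's
# PostgreSQL driver does, and run a trivial query to confirm connectivity.
import psycopg2

conn = psycopg2.connect(
    host="<lakebase-instance-host>",     # placeholder
    port=5432,
    dbname="databricks_postgres",        # assumed default database name
    user="<databricks-user-or-sp-id>",   # placeholder
    password="<oauth-token>",            # assumed: OAuth token as password
    sslmode="require",                   # assumed: SSL required
)
with conn, conn.cursor() as cur:
    cur.execute("SELECT version()")
    print(cur.fetchone()[0])
```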

1 More Reply
by dbr_data_engg, New Contributor III
  • 2424 Views
  • 3 replies
  • 1 kudos

Using Databricks BladeBridge or Lakebridge for SQL Migration

Getting a transpile error while executing the command for Databricks BladeBridge or Lakebridge: databricks labs lakebridge transpile --source-dialect mssql --input-source "<Path>/sample.sql" --output-folder "<Path>\output". Error: TranspileError(code=FAILURE, ...

Latest Reply
Satyam4u, New Contributor
  • 1 kudos

Looks like some dependency/runtime issue with the Lakebridge installation on Windows. Try: pip uninstall databricks-labs-lakebridge -y, then pip install databricks-labs-lakebridge. Also check Python version compatibility; Python 3.10/3.11 worked better in my case.

2 More Replies
by shan-databricks, Databricks Partner
  • 94 Views
  • 1 reply
  • 0 kudos

Lakeflow Connect data ingestion from SQL Server and PostgreSQL to Databricks with CDC

We have a requirement to use Lakeflow Connect for data ingestion from SQL Server and PostgreSQL into Databricks with CDC and Lakehouse Federation. I would like to understand the pros and cons of Lakeflow Connect in the following areas: firewall/gatewa...

Data Engineering
@Lakeflow Connect @Lakehouse Federation
Latest Reply
ziafazal, Databricks Partner
  • 0 kudos

Hi @shan-databricks, You should set up PostgreSQL for ingestion via Lakeflow Connect. Once your Postgres logical replication is ready, you have to create ingestion pipelines, which comprise a gateway and an ingestion pipeline. Your gateway pipeline is conti...
