Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

DanielW
by New Contributor III
  • 1245 Views
  • 12 replies
  • 3 kudos

Resolved! Databricks Rest api swagger definition not handling bigint or integer

I want to test creating a custom connector in a Power App that connects to a table in Databricks. The issue is if I have any columns like int or bigint. No matter what I define in the response in my swagger definition (see below), it is not the correct type...

[Attachments: DanielW_0-1747312458356.png, DanielW_0-1747313218694.png]
Latest Reply
DanielW
New Contributor III

Hi @lingareddy_Alva, this might warrant another post to keep the conversation focused, but I found a couple of things with the custom connector that make it a bit cumbersome to use. 1) I don't seem to be able to have two POST operations under /statmen...

11 More Replies
chexa_Wee
by New Contributor III
  • 821 Views
  • 2 replies
  • 2 kudos

How to Implement Incremental Loading in Azure Databricks for ETL

Hi everyone, I'm currently working on an ETL process using Azure Databricks (Standard Tier) where I load data from Azure SQL Database into Databricks. I run a notebook daily to extract, transform, and load the data for Power BI reports. Right now, the ...

Latest Reply
-werners-
Esteemed Contributor III

In case you do not want to use DLT (and there are reasons not to), you can also check the docs for Auto Loader and MERGE notebooks. These two do basically the same as DLT but without the extra cost and with more control. You have to write more code though. For...
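For readers landing here from search, here is a minimal sketch of the Auto Loader + MERGE pattern the reply describes. The paths, table names, and the `id` key column are placeholders, not details from the original thread:

```python
from delta.tables import DeltaTable

# Placeholder locations and names; adjust to your workspace.
SOURCE_PATH = "abfss://raw@mystorage.dfs.core.windows.net/orders/"
TARGET_TABLE = "main.silver.orders"
CHECKPOINT = "abfss://raw@mystorage.dfs.core.windows.net/_checkpoints/orders/"

def upsert_batch(batch_df, batch_id):
    # MERGE each micro-batch into the target Delta table on the business key.
    (DeltaTable.forName(spark, TARGET_TABLE)
        .alias("t")
        .merge(batch_df.alias("s"), "t.id = s.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

(spark.readStream
    .format("cloudFiles")                      # Auto Loader
    .option("cloudFiles.format", "parquet")    # format of the landed extract files
    .load(SOURCE_PATH)
    .writeStream
    .foreachBatch(upsert_batch)
    .option("checkpointLocation", CHECKPOINT)  # remembers which files were already processed
    .trigger(availableNow=True)                # process what is new, then stop (fits a daily job)
    .start())
```

Run as a scheduled job, this only reads files that arrived since the last run, which is the incremental behaviour the original question asks about.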

1 More Replies
Heman2
by Valued Contributor II
  • 18423 Views
  • 5 replies
  • 22 kudos

Resolved! How to export the output data in the Excel format into the dbfs location

Is there any way to export the output data in Excel format into DBFS? I'm only able to do it in CSV format.

Latest Reply
haidereli
New Contributor II

As shared above, I tested it and it worked fine for loading, updating, and saving: import openpyxl; wb = openpyxl.load_workbook('Test.xlsx'); ws = wb.active; for row in ws.iter_rows(): print([col.value for col in row])  # show all data; ws['A1'] = 'Data'; wb.save('Tes...
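For the original question (writing notebook output to DBFS as Excel), a minimal sketch, assuming openpyxl is installed on the cluster (e.g. `%pip install openpyxl`); the paths and DataFrame name are placeholders:

```python
# df is a Spark DataFrame produced earlier in the notebook.
# Convert to pandas and write straight to a /dbfs/ FUSE path; pandas uses
# openpyxl as its .xlsx engine.
dbutils.fs.mkdirs("dbfs:/FileStore/exports")

pdf = df.toPandas()
pdf.to_excel("/dbfs/FileStore/exports/output.xlsx", index=False, engine="openpyxl")

# The workbook is then available at dbfs:/FileStore/exports/output.xlsx
display(dbutils.fs.ls("dbfs:/FileStore/exports/"))
```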

4 More Replies
Einsatz
by New Contributor II
  • 1159 Views
  • 1 reply
  • 0 kudos

Dataframe getting updated after creating temporary view

I'm observing different behavior between Databricks Runtime versions when working with DataFrames and temporary views, and would appreciate any clarification. In both environments, I performed the following steps in a notebook (each connected to its o...

Latest Reply
nikhilj0421
Databricks Employee

Hi @Einsatz, this is expected in DBR version 14.3 and above since we don't have a Spark context. This is happening due to cache invalidation. To resolve the issue, please use a dynamic name for the view each time.
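A minimal sketch of the workaround the reply suggests; the view-name scheme is only illustrative:

```python
import uuid

# Use a fresh, unique view name on each run so a stale cached view
# from a previous run can never be picked up.
view_name = f"orders_tmp_{uuid.uuid4().hex[:8]}"
df.createOrReplaceTempView(view_name)

result = spark.sql(f"SELECT COUNT(*) AS n FROM {view_name}")
display(result)
```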

mstfkmlbsbdk
by New Contributor II
  • 309 Views
  • 1 reply
  • 0 kudos

Analyzing Serverless SQL Warehouse Cost Projection Using System Tables

Hello everyone, I'm working on analyzing cost projections for Serverless SQL Warehouses using system tables, and I'd like to share a visualization approach I'm using to highlight some key differences between classic and serverless SQL warehouses. (Loo...

[Attachment: Screenshot 2025-05-21 at 11.51.03.png]
Latest Reply
nikhilj0421
Databricks Employee

Hi @mstfkmlbsbdk, great analysis. However, as you mentioned, the classic warehouse is not terminating even though there are no active queries on it. The reason is that when a classic warehouse is created, it by default takes 45 minutes to termina...
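If the goal is simply to stop paying for idle classic warehouses sooner, the auto-stop window can be shortened. A sketch using the Databricks Python SDK; the warehouse ID is a placeholder, and the `warehouses.edit` / `auto_stop_mins` names should be verified against your installed SDK version:

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # reads host/token from the environment or ~/.databrickscfg

# Inspect current auto-stop settings (field names assume the current SDK).
for wh in w.warehouses.list():
    print(wh.id, wh.name, wh.auto_stop_mins)

# Shorten the idle auto-stop window from the 45-minute default to 10 minutes
# for one warehouse (placeholder ID).
w.warehouses.edit(id="1234567890abcdef", auto_stop_mins=10)
```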

laus
by New Contributor III
  • 9766 Views
  • 6 replies
  • 6 kudos

Resolved! How to sort widgets in a specific order?

I'd like to have a couple of widgets, one for the start date and another for the end date. I want them to appear in that order, but when I run the code below, the end date shows up before the start date. How can I order them the way I want? dbutils.widgets.text("s...

Latest Reply
markok
New Contributor II

Doing it manually is not optimal. It should be possible to do this automatically, either by creation date or via a function to sort widgets.
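Until there is a native sort option, a common workaround, assuming the widget panel lists widgets alphabetically by name (as other replies in this thread suggest), is to encode the desired order in the widget names:

```python
# Numeric prefixes force the desired left-to-right order when the panel
# sorts widgets alphabetically by name.
dbutils.widgets.text("01_start_date", "2024-01-01", "Start date")
dbutils.widgets.text("02_end_date", "2024-12-31", "End date")

start_date = dbutils.widgets.get("01_start_date")
end_date = dbutils.widgets.get("02_end_date")
```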

5 More Replies
chexa_Wee
by New Contributor III
  • 479 Views
  • 1 reply
  • 0 kudos

How to Implement Incremental Loading in Azure Databricks for ETL

Hi everyone, I'm currently working on an ETL process using Azure Databricks (Standard Tier) where I load data from Azure SQL Database into Databricks. I run a notebook daily to extract, transform, and load the data for Power BI reports. Right now, the ...

Latest Reply
nikhilj0421
Databricks Employee

Hi @chexa_Wee, you can leverage the DLT feature to do so. Please check: https://docs.databricks.com/aws/en/dlt/transform and https://docs.databricks.com/aws/en/dlt/stateful-processing. Here is the step-by-step tutorial: https://docs.databricks.com/aws/en/dlt...
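A minimal sketch of the DLT CDC pattern those docs describe; the source view, key, and sequence column are placeholders rather than details from the thread:

```python
import dlt
from pyspark.sql import functions as F

@dlt.view
def sql_db_extract():
    # Placeholder: in practice this is the daily extract landed from Azure SQL
    # (for example, files ingested with Auto Loader into a staging table).
    return spark.readStream.table("staging.daily_extract")

dlt.create_streaming_table("silver_orders")

# apply_changes upserts rows into the target using the key and ordering column,
# so only new or changed records are applied on each pipeline run.
dlt.apply_changes(
    target="silver_orders",
    source="sql_db_extract",
    keys=["order_id"],
    sequence_by=F.col("updated_at"),
)
```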

BF7
by Contributor
  • 453 Views
  • 1 reply
  • 1 kudos

Resolved! Migrating DLT tables from TEST to PROD catalogs

Why can't we just copy all the DLT tables and materialized views from one UC catalog to another to get the historical data in place and then run the DLT pipelines on those UC tables? We are migrating many very large tables from our TEST catalog to our...

Data Engineering
Delta Live Tables
Unity Catalog
Latest Reply
nikhilj0421
Databricks Employee

Hi @BF7, we do not support moving streaming tables yet. If we clone a DLT streaming table, it will be converted into a normal Delta table instead of a streaming table. In that case, we need to go with the "full refresh all" option and inde...

eballinger
by Contributor
  • 1243 Views
  • 4 replies
  • 0 kudos

List all users groups and the actual users in them in sql

We have a bunch of cloud AD groups in Databricks and I can see which users are in each group by using the user interface: Manage Account -> Users and groups -> Groups. I would like to be able to produce this full list in SQL. I have found the below code ...

Latest Reply
BigRoux
Databricks Employee

Ok. Well, my last suggestion is to have a look at the SCIM Users API, and SCIM Groups API. You should be able to make the API calls right in the notebook. Cheers, Lou.
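For reference, a sketch of doing this from a notebook with the Databricks Python SDK rather than raw SCIM calls; this covers workspace-level groups, and the field names should be verified against your installed SDK version:

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # authenticates from the notebook context / environment

rows = []
for group in w.groups.list():            # workspace-level groups (SCIM under the hood)
    for member in group.members or []:   # members may be None for empty groups
        rows.append((group.display_name, member.display))

# Turn the result into something queryable with SQL.
spark.createDataFrame(rows, "group_name string, member string") \
     .createOrReplaceTempView("group_members")
display(spark.sql("SELECT * FROM group_members ORDER BY group_name, member"))
```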

3 More Replies
rajib76
by New Contributor II
  • 3117 Views
  • 2 replies
  • 2 kudos

Resolved! DBFS with Google Cloud Storage(GCS)

Does DBFS support GCS?

Latest Reply
Hubert-Dudek
Esteemed Contributor III

Yes, you just need to create a service account for Databricks and then assign the Storage Admin role on the bucket. After that you can mount GCS the standard way: bucket_name = "<bucket-name>" mount_name = "<mount-name>" dbutils.fs.mount("gs://%s" % bucket_name, "/m...
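The reply's snippet is cut off; the documented mount call follows this shape (bucket and mount names are placeholders, and on recent platform versions Unity Catalog external locations are generally preferred over mounts):

```python
bucket_name = "<bucket-name>"
mount_name = "<mount-name>"

# Mount the GCS bucket under /mnt/<mount-name>; the cluster must run as the
# service account that was granted Storage Admin on the bucket.
dbutils.fs.mount("gs://%s" % bucket_name, "/mnt/%s" % mount_name)

display(dbutils.fs.ls("/mnt/%s" % mount_name))
```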

1 More Replies
ChristianRRL
by Valued Contributor II
  • 457 Views
  • 1 reply
  • 1 kudos

Resolved! File Arrival Trigger - Reduce Listing

Hi there, the file arrival trigger seems handy, but I have questions about the performance and cost implications of using it. Per the file arrival trigger documentation: "File arrival triggers do not incur additional costs other than cloud provider costs ...

Latest Reply
lingareddy_Alva
Honored Contributor II

Hi @ChristianRRL, your assumptions are partially correct. You're right about several key points: 1. File listing overhead: yes, the trigger does need to list files in the monitored location to detect new arrivals. 2. Cloud provider costs: listing operation...

kenmyers-8451
by New Contributor III
  • 737 Views
  • 1 reply
  • 0 kudos

Resolved! Is it possible to selectively overwrite data in a partitioned delta table with a sql warehouse

My team has a workflow that currently runs with databricks sql using a standalone cluster. We are trying to switch this job to using a sql warehouse but I keep getting errors. The current job runs in a for-each loop to break up the work into smaller ...

Latest Reply
kenmyers-8451
New Contributor III

A teammate helped me realize that `replace where` doesn't work with INSERT OVERWRITE, but does work with INSERT INTO.
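Based on that finding, a sketch of the working shape; the table, partition column, and predicate are placeholders, and the same statement can be submitted to a SQL warehouse directly instead of through spark.sql:

```python
# Selectively overwrite one slice of a partitioned Delta table: REPLACE WHERE
# deletes the rows matching the predicate in the target and inserts the new
# rows in a single transaction. Per the reply, use INSERT INTO (not OVERWRITE).
spark.sql("""
    INSERT INTO target_db.sales
    REPLACE WHERE sale_date >= '2024-01-01' AND sale_date < '2024-02-01'
    SELECT *
    FROM staging_db.sales_batch
    WHERE sale_date >= '2024-01-01' AND sale_date < '2024-02-01'
""")
```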

help_needed_445
by New Contributor III
  • 712 Views
  • 2 replies
  • 2 kudos

Resolved! Notebook cell won't finish running or cancelling. Interrupt button greyed out.

A cell in a notebook that is using the %run magic command to run another notebook was running for what I considered too long, so I clicked the interrupt button, and now the button is greyed out and showing a spinning circle. The interrupt button at t...

Latest Reply
Advika
Databricks Employee

Hello @help_needed_445! Are you still experiencing this issue? You can try restarting your cluster to force-stop any ongoing processes. If that doesn’t resolve the problem, detaching and reattaching the notebook to the cluster might help.

1 More Replies
HariharaSam
by Contributor
  • 32570 Views
  • 9 replies
  • 4 kudos

Resolved! To get Number of rows inserted after performing an Insert operation into a table

Consider we have two tables A & B. qry = """INSERT INTO Table A SELECT * FROM Table B WHERE Id IS NULL""" spark.sql(qry) I need to get the number of records inserted after running this in Databricks.

Latest Reply
GRCL
New Contributor III

Almost the same advice as Hubert: I use the history of the Delta table: df_history.select(F.col('operationMetrics')).collect()[0].operationMetrics['numOutputRows'] You can also find other 'operationMetrics' values, like 'numTargetRowsDeleted'.
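A fuller sketch of that approach; the table name is a placeholder:

```python
from delta.tables import DeltaTable
from pyspark.sql import functions as F

# Equivalent to DESCRIBE HISTORY: the most recent entry carries the metrics
# for the INSERT that was just run.
df_history = DeltaTable.forName(spark, "table_a").history(1)

metrics = df_history.select(F.col("operationMetrics")).collect()[0].operationMetrics
print(metrics["numOutputRows"])   # rows written by the last operation
```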

8 More Replies
pavel_merkle
by New Contributor II
  • 13327 Views
  • 6 replies
  • 0 kudos

Databricks SDK - create new job using JSON

Hello, I am trying to create a Job via the Databricks SDK. As input, I use the JSON generated via the Workflows UI (Workflows -> Jobs -> View YAML/JSON -> JSON API -> Create), generating pavel_job.json. When trying to run the SDK function jobs.create as dbk = WorkspaceCli...

Latest Reply
mike933
New Contributor II

This is probably the easiest way to create a job from JSON: import json; from databricks.sdk import WorkspaceClient; from databricks.sdk.service.jobs import CreateJob; client = WorkspaceClient(host=WORKSPACE_DICT[WORKSPACE_NAME]["host_name"], token=...
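A sketch of the pattern this reply appears to describe; `CreateJob.from_dict` and expanding its fields into `jobs.create` are assumptions about the SDK's generated request dataclasses, so verify both against your installed databricks-sdk version:

```python
import json
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import CreateJob

w = WorkspaceClient()  # host/token from environment variables or ~/.databrickscfg

# pavel_job.json is the JSON exported from Workflows -> View YAML/JSON -> Create.
with open("pavel_job.json") as f:
    job_spec = json.load(f)

# Assumption: from_dict converts the raw JSON into typed SDK objects, and
# jobs.create accepts the same fields as keyword arguments.
request = CreateJob.from_dict(job_spec)
job = w.jobs.create(**request.__dict__)
print(f"Created job {job.job_id}")
```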

5 More Replies
