Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

jar
by New Contributor III
  • 413 Views
  • 6 replies
  • 2 kudos

Resolved! Continuous workflow job creating new job clusters?

Hey. I am testing a continuous workflow job which executes the same notebook, so it's rather simple and it works well. It seems like it re-creates the job cluster for every iteration, instead of just re-using the one created at the first execution. Is that...

Latest Reply
jar
New Contributor III
  • 2 kudos

Thank you all for your answers! I did use dbutils.notebook.run() inside a while-loop at first, but ultimately I would run into OOM errors, even when I added a cache-clearing step after each iteration. I'm curious @RefactorDuncan, if you don't mind...
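
For context, a rough sketch of the looping pattern described above (paths and the timeout are placeholders, not the poster's actual code); driver memory can still grow across iterations even with explicit cache clearing, which is one reason to let a continuous job re-run the notebook fresh instead:

while True:
    # run the child notebook and wait up to an hour for it to finish
    dbutils.notebook.run("/Workspace/Users/me/child_notebook", timeout_seconds=3600)
    # drop any cached tables/DataFrames between iterations
    spark.catalog.clearCache()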

  • 2 kudos
5 More Replies
Mano99
by New Contributor II
  • 213 Views
  • 2 replies
  • 0 kudos

Resolved! Databricks External table row maximum size

Hi Databricks Team/Community, we have created a Databricks external table on top of ADLS Gen2, as both Parquet and Delta tables. We are loading a nested JSON structure into the table, and a few columns will have huge nested JSON data. I'm getting results too large...

Latest Reply
dennis65
New Contributor
  • 0 kudos

@Mano99 wrote: Hi Databricks Team/Community, we have created a Databricks external table on top of ADLS Gen2, as both Parquet and Delta tables. We are loading a nested JSON structure into the table, and a few columns will have huge nested JSON data. I'm getting...

  • 0 kudos
1 More Replies
YuriS
by New Contributor II
  • 1430 Views
  • 2 replies
  • 0 kudos

VACUUM with Azure Storage Inventory Report is not working

Could someone please advise regarding VACUUM with an Azure Storage inventory report, as I have failed to make it work. DBR 15.4 LTS; the VACUUM command is being run with the USING INVENTORY clause, as follows: VACUUM schema.table USING INVENTORY ( select 'https://...

Latest Reply
YuriS
New Contributor II
  • 0 kudos

After additional investigation it turned out the proper "fully-qualified URL" path should be a dbfs:/mnt/... path, i.e. 'dbfs:/mnt/{endpoint}/' || ir.Name as path, and not 'https://xxx.blob.core.windows.net/' || ir.Name as path.
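
A hedged sketch of the corrected command, assuming an inventory table named inventory_report with Azure blob inventory columns (Name, Content-Length, Last-Modified); table, mount and retention values are placeholders. The key points are that path must be the dbfs:/mnt/ form of the location and that VACUUM ... USING INVENTORY expects the columns path, length, isDir and modificationTime:

spark.sql("""
  VACUUM my_schema.my_table USING INVENTORY (
    SELECT
      'dbfs:/mnt/my_mount/' || ir.Name              AS path,              -- dbfs:/ path, not https://
      ir.`Content-Length`                           AS length,            -- file size in bytes
      false                                         AS isDir,
      unix_millis(to_timestamp(ir.`Last-Modified`)) AS modificationTime   -- epoch milliseconds
    FROM inventory_report ir
  )
  RETAIN 168 HOURS
""")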

  • 0 kudos
1 More Replies
vziog
by New Contributor II
  • 328 Views
  • 2 replies
  • 0 kudos

Costs from Azure portal cost management are not aligned with costs calculated from the usage system table

Hello, the costs for the Databricks service from cost management in the Azure portal (45,869...) are not aligned with the costs calculated from the usage system table (75,34). The costs from the portal are filtered based on the desired period (usage_date ...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @vziog, the Azure portal typically aggregates costs from various billing categories (such as DBUs, infrastructure, storage, and networking) based on usage logs and pricing. On the other hand, the query you designed extracts detailed cost estima...
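
For reference, a simplified sketch of a DBU-only cost estimate from the billing system tables (assumes access to system.billing.usage and system.billing.list_prices; the join ignores cloud, currency and promotional pricing, and the dates are placeholders). It only covers DBU charges, which is one reason the number will not match the Azure portal figure that also includes VM, storage and networking costs:

dbu_costs = spark.sql("""
    SELECT u.usage_date,
           SUM(u.usage_quantity * lp.pricing.default) AS estimated_dbu_cost
    FROM system.billing.usage u
    JOIN system.billing.list_prices lp
      ON  u.sku_name = lp.sku_name
      AND u.usage_start_time >= lp.price_start_time
      AND (lp.price_end_time IS NULL OR u.usage_start_time < lp.price_end_time)
    WHERE u.usage_date BETWEEN '2025-03-01' AND '2025-03-31'
    GROUP BY u.usage_date
    ORDER BY u.usage_date
""")
display(dbu_costs)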

  • 0 kudos
1 More Replies
petitregny
by New Contributor II
  • 184 Views
  • 3 replies
  • 0 kudos

Reading from an S3 bucket using boto3 on serverless cluster

Hello all, I am trying to read a CSV file from my S3 bucket in a notebook running on serverless. I am using the two standard functions below, but I get a credentials error (Error reading CSV from S3: Unable to locate credentials). I don't have this issu...

Latest Reply
Isi
Contributor III
  • 0 kudos

Hi @petitregny, the issue you're encountering is likely due to the access mode of your cluster. Serverless compute uses standard/shared access mode, which does not allow you to directly access AWS credentials (such as the instance profile) in the sam...
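
One common workaround (a hedged sketch, not necessarily the full recommendation in this thread): pass explicit AWS credentials to boto3 instead of relying on an instance profile, for example keys stored in a Databricks secret scope. The scope, key and bucket names below are placeholders:

import boto3

# fetch keys (or STS credentials) from a secret scope rather than the instance profile
s3 = boto3.client(
    "s3",
    aws_access_key_id=dbutils.secrets.get("my_scope", "aws_access_key_id"),
    aws_secret_access_key=dbutils.secrets.get("my_scope", "aws_secret_access_key"),
)

obj = s3.get_object(Bucket="my-bucket", Key="path/to/file.csv")
csv_text = obj["Body"].read().decode("utf-8")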

  • 0 kudos
2 More Replies
smpa01
by New Contributor III
  • 154 Views
  • 1 replies
  • 0 kudos

Resolved! global temp view issue

I am following doc1 and doc2 but I am getting an error. I was under the impression from the documentation that it is doable in pure SQL. What am I doing wrong? I know how to do this in Python using the DataFrame API and I am not looking for that soluti...

Latest Reply
smpa01
New Contributor III
  • 0 kudos

It is missing a ';' (semicolon) between the statements.
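
For anyone hitting the same thing, a minimal sketch of the working pattern (the view name is a placeholder): create the global temp view, then query it through the global_temp schema; in a pure-SQL cell the two statements must be separated by a semicolon:

spark.sql("CREATE OR REPLACE GLOBAL TEMPORARY VIEW my_view AS SELECT 1 AS id")
spark.sql("SELECT * FROM global_temp.my_view").show()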

  • 0 kudos
dbx_687_3__1b3Q
by New Contributor III
  • 4664 Views
  • 2 replies
  • 1 kudos

Databricks Asset Bundle (DAB) from existing workspace?

Can anyone point us to some documentation that explains how to create a DAB from an EXISTING workspace? We've been building pipelines, notebooks, tables, etc in a single workspace and a DAB seems like a great way to deploy it all to our Test and Prod...

Latest Reply
Ezio
New Contributor
  • 1 kudos

Refer to this document to understand the approach for migrating code and workflows to a DAB: https://docs.databricks.com/aws/en/dlt/convert-to-dab

  • 1 kudos
1 More Replies
HaripriyaP
by New Contributor II
  • 884 Views
  • 4 replies
  • 0 kudos

Multiple Notebooks Migration from one workspace to another without using Git.

Hi all! I need to migrate multiple notebooks from one workspace to another. Is there any way to do it without using Git? Since manual import and export is difficult to do for multiple notebooks and folders, we need an alternative solution. Please reply as so...

Latest Reply
HaripriyaP
New Contributor II
  • 0 kudos

Thank you @aayrm5. I will check on it.

  • 0 kudos
3 More Replies
thackman
by New Contributor III
  • 351 Views
  • 2 replies
  • 0 kudos

How does liquid clustering handle high cardinality strings?

Liquid clustering on integers or dates seems intuitive. But it's less clear how it would decide to partition files when the key column is a high-cardinality string. Does it try the first character of the string, and then if that's not unique enough it g...

Latest Reply
emiliaswan
New Contributor
  • 0 kudos

I found this great breakdown of current typography trends that touches on immersive typography and where it's heading. If you're interested in how 3D and AR are evolving in the design world, especially in terms of functionality and aesthetics,

  • 0 kudos
1 More Replies
yopbibo
by Contributor II
  • 22153 Views
  • 5 replies
  • 4 kudos

How can I connect to an Azure SQL db from a Databricks notebook?

I know how to do it with Spark, and read/write tables (like https://docs.microsoft.com/en-gb/azure/databricks/data/data-sources/sql-databases#python-example). But this time, I need to only update a field of a specific row in a table. I do not think I ...

Latest Reply
raopheefah
New Contributor
  • 4 kudos

Look at your compute configuration. It looks like this works perfectly on Dedicated (formerly: single user) or No isolation clusters, but not on Standard (formerly: Shared) ones. Maybe you need a disposable one-time job cluster with these settings.
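
Since Spark's JDBC writer is oriented at whole tables, one common way to update a single row (a hedged sketch, not the exact solution from this thread) is a plain UPDATE via pyodbc; this assumes pyodbc and the Microsoft ODBC driver are installed on the cluster, and all names and credentials below are placeholders:

import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myserver.database.windows.net;DATABASE=mydb;"
    "UID=myuser;PWD=" + dbutils.secrets.get("my_scope", "sql_password")
)
cur = conn.cursor()
# parameterized single-row update
cur.execute("UPDATE dbo.my_table SET status = ? WHERE id = ?", ("done", 42))
conn.commit()
conn.close()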

  • 4 kudos
4 More Replies
Y2KEngineer
by New Contributor
  • 158 Views
  • 1 replies
  • 0 kudos

Query limiting to only 10000 rows

Hi, I am querying my Azure Databricks table using a VB script with the Simba Spark ODBC driver. While querying the DB (let's say 'Select * from table_1') it is not returning any data. However, while querying with a limit (let's say 'Select TOP 10000 ID from table_1'), i...

Labels: Data Engineering, community, limitation in databricks
Latest Reply
SP_6721
New Contributor II
  • 0 kudos

Hi @Y2KEngineer, this issue is likely related to row-fetching limits or buffer sizes in the driver or system settings. You can try adjusting a couple of things in your ODBC connection string: set RowsFetchedPerBlock=50000 to make sure it fetches all ro...
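
The same setting shown in a connection string (illustrated here with pyodbc for brevity; the keywords are Simba Spark ODBC driver settings and apply equally to a VBScript connection string). The host, HTTP path and token are placeholders:

import pyodbc

conn = pyodbc.connect(
    "Driver=Simba Spark ODBC Driver;"
    "Host=adb-1234567890123456.7.azuredatabricks.net;Port=443;"
    "HTTPPath=/sql/1.0/warehouses/abcdef1234567890;"
    "SSL=1;ThriftTransport=2;AuthMech=3;"
    "UID=token;PWD=<personal-access-token>;"
    "RowsFetchedPerBlock=50000",   # fetch larger blocks so full result sets come back
    autocommit=True,
)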

  • 0 kudos
vignesh22
by New Contributor
  • 109 Views
  • 2 replies
  • 0 kudos

"Pipelines are expected to have at least one table" error while running a DLT pipeline

Error: "Pipelines are expected to have at least one table defined, but no tables were found in your pipeline." I wrote simple code as a phase-1 debug: %sql CREATE OR REFRESH STREAMING TABLE test_table AS SELECT "hello" as greeting; Can you please help with what's wrong...

Latest Reply
aayrm5
Valued Contributor III
  • 0 kudos

Hey @vignesh22 - adding to what @Takuya-Omi san has mentioned - the streaming table definition in your code is incorrect. You're trying to create a streaming table from a batch source, which will result in the DLT analysis exception as descr...
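
A minimal sketch consistent with this advice (Python shown; the SQL equivalent would be CREATE OR REFRESH MATERIALIZED VIEW): a plain batch query like SELECT "hello" belongs in a materialized view / live table, whereas a streaming table expects a streaming source:

import dlt

@dlt.table(name="test_table")   # materialized (batch) table, not a streaming table
def test_table():
    return spark.sql("SELECT 'hello' AS greeting")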

  • 0 kudos
1 More Replies
saikrishna1020
by New Contributor
  • 110 Views
  • 1 replies
  • 1 kudos

Community Edition Data recovery

I was using Databricks Community Edition for some practice work, and I had created a few notebooks as part of my learning. However, when I recently tried to log in, I received a message saying, "We were not able to find a Community Edition." Now, non...

Latest Reply
Brahmareddy
Honored Contributor III
  • 1 kudos

Hi saikrishna1020, how are you doing today? I totally understand how upsetting it can be to lose work you've put effort into. With Databricks Community Edition, unfortunately, inactivity for an extended period (usually 14–30 days) can cause the worksp...

  • 1 kudos
dplatform_user
by New Contributor
  • 84 Views
  • 1 replies
  • 0 kudos

INVALID_PARAMETER_VALUE.LOCATION_OVERLAP when trying to copy from s3 location

Hi, currently we are getting an issue when we try to copy a file from an S3 location using dbutils.fs.cp, please see the example below: source = s3://test-bucket/external/zones/{database_name}/{table_name}/test.csv, destination = s3://test-bucket/external/desti...

Latest Reply
Brahmareddy
Honored Contributor III
  • 0 kudos

Hi dplatform_user, how are you doing today? As per my understanding, this error is actually a common one when working with external storage paths that overlap with Unity Catalog-managed locations. The error message is basically saying that your sourc...

  • 0 kudos
Mado
by Valued Contributor II
  • 14443 Views
  • 4 replies
  • 0 kudos

Resolved! How to enforce delta table column to have unique values?

Hi, I have defined a Delta table with a primary key: %sql CREATE TABLE IF NOT EXISTS test_table_pk ( table_name STRING NOT NULL, label STRING NOT NULL, table_location STRING NOT NULL, CONSTRAINT test_table_pk_col PRIMARY KEY(table_name) ...

Latest Reply
SibbirSihan
New Contributor
  • 0 kudos

CREATE TABLE table_name (id_col1 BIGINT GENERATED ALWAYS AS IDENTITY,id_col2 BIGINT GENERATED ALWAYS AS IDENTITY (START WITH -1 INCREMENT BY 1),id_col3 BIGINT GENERATED BY DEFAULT AS IDENTITY,id_col4 BIGINT GENERATED BY DEFAULT AS IDENTITY (START WIT...
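
A short, hedged sketch that completes the idea in the truncated excerpt above (table and column names are placeholders): identity columns give you engine-generated unique values, whereas PRIMARY KEY constraints on Databricks are informational and not enforced, so they do not by themselves guarantee uniqueness:

spark.sql("""
    CREATE TABLE IF NOT EXISTS my_catalog.my_schema.my_table (
      id   BIGINT GENERATED ALWAYS AS IDENTITY,   -- engine-generated, guaranteed unique
      name STRING NOT NULL
    ) USING DELTA
""")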

  • 0 kudos
3 More Replies
