Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ankitmit
by New Contributor III
  • 1718 Views
  • 5 replies
  • 0 kudos

How to specify path while creating tables using DLT

Hi All, I am trying to create a table using DLT and would like to specify the path where all the files should reside. I am trying something like this: dlt.create_streaming_table( name="test", schema="""product_id STRING NOT NULL PRIMARY KEY, ...

Data Engineering
Databricks
dlt
Unity Catalog
Latest Reply
joma
New Contributor II
  • 0 kudos

I have the same issue. I don't like saving with a random name inside __unitystorage. java.lang.IllegalArgumentException: Cannot specify an explicit path for a table when using Unity Catalog. Remove the explicit path:

4 More Replies
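For reference, the fix for the error quoted above is to drop the explicit path and let Unity Catalog manage the table's storage location. A minimal sketch of the DLT call (any column beyond the one shown in the post is illustrative):

```python
import dlt

# Under Unity Catalog, omit any `path` argument: UC manages the storage
# location, and an explicit path raises the IllegalArgumentException above.
dlt.create_streaming_table(
    name="test",
    schema="""product_id STRING NOT NULL PRIMARY KEY,
              product_name STRING""",  # second column is illustrative
)
```

If the files must land in a specific place, a managed location can be set on the catalog or schema instead (e.g. `CREATE SCHEMA my_schema MANAGED LOCATION '<abfss-path>'`, names illustrative).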
Sunflower7500
by New Contributor II
  • 3612 Views
  • 4 replies
  • 2 kudos

Databricks PySpark error: OutOfMemoryError: GC overhead limit exceeded

I have a Databricks PySpark query that has been running fine for the last two weeks but am now getting the following error despite no changes to the query: OutOfMemoryError: GC overhead limit exceeded. I have done some research on possible solutions a...

Latest Reply
loic
Contributor
  • 2 kudos

When you say: "I have a Databricks PySpark query that has been running fine for the last two weeks but am now getting the following error despite no changes to the query: OutOfMemoryError: GC overhead limit exceeded." Can you tell us how do you execut...

3 More Replies
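For context, "GC overhead limit exceeded" means the JVM is spending nearly all its time in garbage collection, usually because too much data is being materialized in one place. A minimal sketch of the usual first mitigations, assuming driver-side materialization is the culprit (table and column names are illustrative):

```python
# Keep work distributed instead of pulling results to the driver.
df = spark.table("some_large_table")        # illustrative table name

# Anti-pattern: df.collect() or df.toPandas() materializes everything
# in the driver JVM. Aggregate or write out instead:
summary = df.groupBy("some_key").count()    # illustrative column
summary.write.mode("overwrite").saveAsTable("summary_table")

# Release cached data that is no longer needed:
df.unpersist()
spark.catalog.clearCache()
```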
g96g
by New Contributor III
  • 818 Views
  • 2 replies
  • 0 kudos

Streaming with Medallion Architecture and Star Schema: Help

What are the best practices for implementing non-stop streaming in a Medallion Architecture with a Star Schema? Use case: We have operational data and need to enable near real-time reporting in Power BI, with a maximum latency of 3 minutes. No Delta li...

Latest Reply
MadhuB
Valued Contributor
  • 0 kudos

@g96g I've set up a near real-time (30-minute latency) streaming solution that ingests data from SQL Server into Delta Lake. Changes in the source SQL Server tables are captured using Change Data Capture (CDC) and written to CSV files in a data lake. A ...

1 More Reply
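For the 3-minute latency target above, one common shape is a Structured Streaming pipeline between the medallion layers with a processing-time trigger. A minimal sketch, assuming Delta tables for bronze and silver (table names and checkpoint path are illustrative):

```python
# Stream changes from bronze to silver, micro-batching every 3 minutes
# to stay inside the latency budget.
(spark.readStream
      .table("bronze.orders")                                          # illustrative source
      .writeStream
      .option("checkpointLocation", "/tmp/checkpoints/silver_orders")  # illustrative path
      .trigger(processingTime="3 minutes")
      .toTable("silver.orders"))                                       # illustrative target
```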
ac567
by New Contributor III
  • 2287 Views
  • 3 replies
  • 0 kudos

Resolved! com.databricks.backend.common.rpc.DriverStoppedException

com.databricks.backend.common.rpc.DriverStoppedException: Driver down cause: driver state change (exit code: 143). Facing this cluster issue while I deploy and run my workflow through an asset bundle. I have tried everything to update in the Spark configurati...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Awesome, good to hear!

2 More Replies
Kayla
by Valued Contributor II
  • 3219 Views
  • 14 replies
  • 6 kudos

New error: middleware.base:exception while intercepting server message

We started getting a very weird error at random from Databricks. This is from cells that routinely work, and after it happens once it will happen on every cell. It appears to be including the full text of a .py file we're importing, that I've had to remo...

Latest Reply
Kayla
Valued Contributor II
  • 6 kudos

@TKr "Hey everybody - sorry that you experienced these issues. We identified the issue and reverted the feature causing it. Things should be back to normal already." I'm glad to hear that. Are you a Databricks employee? Referring to your question, we did...

13 More Replies
cdn_yyz_yul
by New Contributor II
  • 1495 Views
  • 3 replies
  • 4 kudos

Resolved! Should data in Raw/Bronze be in Catalog?

Hello, What are the benefits of not "registering" Raw data into Unity Catalog when the data in Raw will be in its original format, such as .csv, .json, .parquet, etc.? An example scenario could be: Data arrives at Landing as .zip; the zip will be verifie...

Latest Reply
cdn_yyz_yul
New Contributor II
  • 4 kudos

Thanks @Rjdudley. I meant to say, the scenario is: Data arrives at Landing as .zip; the zip will be verified for correctness and then unzipped; the extracted files will be saved to Raw as-is, in a pre-defined folder structure. Unity Catalog will not...

2 More Replies
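One middle ground for the scenario above is to register the Raw folder as a Unity Catalog external volume: the files stay in their original formats but become discoverable and access-controlled. A sketch; the catalog, schema, volume, and location names are illustrative:

```python
# Register the Raw area as an external volume in Unity Catalog.
spark.sql("""
    CREATE EXTERNAL VOLUME IF NOT EXISTS main.raw.landing_files
    LOCATION 'abfss://raw@mystorageaccount.dfs.core.windows.net/landing'
""")

# Files are then addressable through the /Volumes path:
df = spark.read.json("/Volumes/main/raw/landing_files/2024/01/")
```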
Sega2
by New Contributor III
  • 4196 Views
  • 1 reply
  • 0 kudos

Adding a message to azure service bus

I am trying to send a message to a Service Bus in Azure, but I get the following error: ServiceBusError: Handler failed: DefaultAzureCredential failed to retrieve a token from the included credentials. This is the line that fails: credential = DefaultAzure...

Latest Reply
Panda
Valued Contributor
  • 0 kudos

@Sega2 - Can you explicitly use ClientSecretCredential and try?

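Expanding on that suggestion, a minimal sketch of sending a message with an explicit ClientSecretCredential instead of DefaultAzureCredential (which needs a managed identity or environment configuration the cluster may not have); all IDs and names are placeholders, and in practice they would come from a Databricks secret scope:

```python
from azure.identity import ClientSecretCredential
from azure.servicebus import ServiceBusClient, ServiceBusMessage

# Service principal credentials; read these from a secret scope in practice,
# e.g. dbutils.secrets.get(scope="...", key="...").
credential = ClientSecretCredential(
    tenant_id="<tenant-id>",
    client_id="<client-id>",
    client_secret="<client-secret>",
)

with ServiceBusClient("<namespace>.servicebus.windows.net", credential) as client:
    with client.get_queue_sender(queue_name="<queue-name>") as sender:
        sender.send_messages(ServiceBusMessage("hello from Databricks"))
```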
noorbasha534
by Valued Contributor II
  • 2776 Views
  • 1 reply
  • 0 kudos

Spot instances usage in Azure Databricks

Hi all, as per the below article - https://community.databricks.com/t5/technical-blog/optimize-costs-for-your-data-and-ai-workloads-with-azure-and-aws/ba-p/662411. It is possible to choose the number of spot instances using the 'availability' parameter. Bu...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hello @noorbasha534, Thanks for your question! 1. 'availability' Parameter: The 'availability' parameter in Azure Databricks controls whether the compute uses on-demand or spot instances. The values for this parameter are: ON_DEMAND_AZURE: This value...

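To make the 'availability' parameter concrete: in a cluster spec, azure_attributes.first_on_demand fixes how many nodes (driver first) stay on-demand, and the remaining workers follow the availability policy. A sketch of the relevant fragment, with illustrative values:

```python
# Fragment of a Clusters API / asset bundle cluster spec, as a Python dict.
cluster_spec = {
    "num_workers": 8,
    "azure_attributes": {
        "first_on_demand": 1,                        # driver stays on-demand
        "availability": "SPOT_WITH_FALLBACK_AZURE",  # spot, falling back to on-demand
        "spot_bid_max_price": -1,                    # -1 = up to the on-demand price
    },
}
```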
Shivaprasad
by New Contributor III
  • 2538 Views
  • 11 replies
  • 3 kudos

Accessing Delta tables using an API outside Azure (Workiva)

I need to access Delta tables with an API outside Azure in a reporting tool, Workiva, using the connector. Can someone provide details on how I can achieve this?

Latest Reply
jack533
New Contributor III
  • 3 kudos

 I need to retrieve delta tables using an API outside of Azure without requiring a connection using a reporting tool named Workiva. I would appreciate it if someone could tell me exactly how to do that.

10 More Replies
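One documented route for reaching Delta tables from outside Azure over plain HTTPS is the Databricks SQL Statement Execution API, which a tool with a generic REST connector could call. A minimal sketch; the host, token, warehouse ID, and table name are placeholders, and whether Workiva's connector can issue such calls would need to be verified:

```python
import requests

host = "https://<workspace>.azuredatabricks.net"               # placeholder
headers = {"Authorization": "Bearer <personal-access-token>"}  # placeholder

# Submit a query to a SQL warehouse and wait up to 30s for the result.
resp = requests.post(
    f"{host}/api/2.0/sql/statements",
    headers=headers,
    json={
        "warehouse_id": "<sql-warehouse-id>",                  # placeholder
        "statement": "SELECT * FROM main.default.my_table LIMIT 100",
        "wait_timeout": "30s",
    },
)
resp.raise_for_status()
print(resp.json().get("result"))  # row data, when the statement finishes in time
```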
negrinij
by New Contributor II
  • 37876 Views
  • 4 replies
  • 2 kudos

Understanding Used Memory in Databricks Cluster

Hello, I wonder if anyone could give me any insights regarding used memory and how I could change my code to "release" some memory as the code runs. I am using a Databricks notebook. Basically, what we need to do is perform a query, create a Spark SQL...

Latest Reply
loic
Contributor
  • 2 kudos

I have exactly the same kind of problem. I really do not understand why my driver goes out of memory even though I do not cache anything in Spark. Since I don't cache anything, I expect references to objects that are no longer used to be freed. Even a s...

3 More Replies
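For the pattern described above, the usual levers are releasing caches and driver-side references between iterations. A minimal sketch, with illustrative table names:

```python
# Materialize, use, then explicitly release each batch.
for table in ["t1", "t2", "t3"]:                       # illustrative names
    df = spark.table(table).cache()
    df.count()                                         # materialize the cache
    df.write.mode("overwrite").saveAsTable(f"{table}_out")
    df.unpersist(blocking=True)                        # free executor memory
    del df                                             # drop the driver-side reference

spark.catalog.clearCache()                             # clear anything left over
```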
asingamaneni
by New Contributor II
  • 3015 Views
  • 1 reply
  • 2 kudos

Brickflow - An opinionated python framework to help build and deploy Databricks workflows at scale

Hello, Databricks Community, I’m thrilled to introduce “Brickflow,” our innovative solution designed to streamline the development and deployment of workflows at scale, leveraging straightforward Python coding. This tool has already achieved a signif...

Latest Reply
matthiasg
New Contributor II
  • 2 kudos

Brickflow is indeed an awesome tool to work w/ Databricks!

kasiviss42
by New Contributor III
  • 643 Views
  • 2 replies
  • 0 kudos

Query related to Z ordering

I have a join on two large tables. If I apply Z-ordering on 3 columns for both tables, and I am joining the two tables on the basis of the same 3 columns used for Z-ordering, will I get any benefit from Z-ordering on performance when I use joins here? So as p...

Latest Reply
kasiviss42
New Contributor III
  • 0 kudos

I asked this question with respect to predicate pushdown to storage. Here, as part of the join, data needs to be loaded into memory first and then the join is performed. So how does Z-ordering help here if it can't skip the data being fetched from s...

1 More Reply
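For reference, Z-ordering is applied via OPTIMIZE and pays off when file statistics let the engine skip files on the Z-ordered columns, e.g. through filters or dynamic file pruning during a join; it does not reorder data at join time. A sketch, with illustrative table and column names:

```python
# Cluster both tables on the join keys so file skipping can apply
# when predicates or dynamic file pruning touch these columns.
spark.sql("OPTIMIZE big_table_a ZORDER BY (key1, key2, key3)")
spark.sql("OPTIMIZE big_table_b ZORDER BY (key1, key2, key3)")

joined = spark.table("big_table_a").join(
    spark.table("big_table_b"), on=["key1", "key2", "key3"]
)
```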
kumarPatra_07
by New Contributor
  • 3895 Views
  • 1 reply
  • 0 kudos

Resolved! Getting an error while mounting to a storage account.

While mounting to the storage account using the code below: dbutils.fs.mount( source=f"wasbs://{cointainer_name}@{storage_name}.blob.core.windows.net", mount_point=f"/mnt/{cointainer_name}", extra_configs={f"fs.azure.account.key.{storage_name}.blob.c...

Latest Reply
Ayushi_Suthar
Databricks Employee
  • 0 kudos

Hi @kumarPatra_07, greetings! From the code you have shared, I can see that you are using the WASBS driver to mount the storage; as of now, WASB is deprecated. Reference document: https://learn.microsoft.com/en-us/azure/dat...

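Since WASB is deprecated as noted above, the ABFS driver (abfss://) is its replacement; direct access with an account key looks like the sketch below, though Unity Catalog external locations are the recommended long-term approach. The storage names and secret scope are placeholders:

```python
storage_name = "<storage-account>"       # placeholder
container_name = "<container>"           # placeholder

# Configure the account key via a secret scope, then read directly
# through the ABFS driver instead of a WASB mount.
spark.conf.set(
    f"fs.azure.account.key.{storage_name}.dfs.core.windows.net",
    dbutils.secrets.get(scope="<scope>", key="<key>"),   # placeholder secret
)

df = spark.read.parquet(
    f"abfss://{container_name}@{storage_name}.dfs.core.windows.net/path/to/data"
)
```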
ksenija
by Contributor
  • 3352 Views
  • 1 reply
  • 0 kudos

Log data from reports in PowerBI

Where can I find log data from Power BI? I need to find what tables are being used in my Power BI reports that point to Databricks. I tried system.access.audit but I'm not finding new data when I refresh my report.

Latest Reply
Allia
Databricks Employee
  • 0 kudos

@ksenija To enable ODBC logging in Power BI, go to the C:\Program Files\Microsoft Power BI Desktop\bin\ODBC Drivers\Simba Spark ODBC Driver folder, create or edit the file microsoft.sparkodbc.ini, and update it as below:

[Driver]
LogLevel=6
LogPath=<...

shahabm
by New Contributor III
  • 13938 Views
  • 5 replies
  • 2 kudos

Resolved! Databricks job keep getting failed due to GC issue

There is a job that was running successfully, but for more than a month we have been experiencing long runs that fail. In the stdout log file (attached), there are numerous messages like the following: [GC (Allocation Failure) [PSYoungGen:...] and [Full GC ...

Latest Reply
siddhu30
New Contributor II
  • 2 kudos

Thanks a lot @shahabm for your prompt response, appreciate it. I'll try to debug in this direction. Thanks again!

4 More Replies
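When [Full GC (Allocation Failure)] messages dominate stdout, the JVM heap is under sustained pressure. Beyond resizing the cluster, common first steps are switching to G1GC and raising shuffle parallelism via the cluster's Spark config. A sketch of the relevant keys, with illustrative values:

```python
# Cluster Spark config (set in the cluster UI or a job cluster spec).
spark_conf = {
    "spark.executor.extraJavaOptions": "-XX:+UseG1GC",
    "spark.driver.extraJavaOptions": "-XX:+UseG1GC",
    "spark.sql.shuffle.partitions": "400",   # illustrative; raise to shrink task size
}
```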
