Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

asisaarav
by New Contributor
  • 829 Views
  • 1 replies
  • 0 kudos

Error : The spark driver has stopped unexpectedly and is restarting

Hi community, I am getting an error in the code: "Error: The spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically restarted." Can you help here in understanding what methods we can use to get it fixed? I tried look...

Latest Reply
saurabh18cs
Honored Contributor II
  • 0 kudos

The error message indicates an issue with the Spark driver in your Databricks environment. This can be caused by various factors, such as: Check Cluster Configuration: Ensure that your Databricks cluster has sufficient resources (CPU, memory) to handle...

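A common trigger for this restart is pulling a large result onto the driver with collect(). A minimal sketch of the safer pattern, assuming the driver is running out of memory (the batching helper below is plain Python, not a Databricks API; on a cluster you would feed it from df.toLocalIterator() instead of df.collect()):

```python
def batched(rows, size):
    """Yield fixed-size batches so only `size` rows are held in memory at once."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

# Stand-in for df.toLocalIterator(); the driver never holds the full result.
for chunk in batched(range(10), 4):
    print(len(chunk))
```

The same principle applies to any driver-side work: keep per-iteration state bounded rather than materializing the whole dataset.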
giladba
by New Contributor III
  • 8798 Views
  • 12 replies
  • 11 kudos

access to event_log TVF

Hi, According to the documentation: https://docs.databricks.com/en/delta-live-tables/observability.html "The event_log TVF can be called only by the pipeline owner and a view created over the event_log TVF can be queried only by the pipeline owner. The...

Latest Reply
larsbbb
New Contributor III
  • 11 kudos

@LakehouseGuy @mkEngineer @hcjp @neha_ayodhya I just saw the following option in DLT pipelines! I haven't tested it yet, but it looks promising. This also looks like new documentation: https://learn.microsoft.com/en-us/azure/databricks/dlt/observability#q...

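For reference, a typical event_log TVF call (run by the pipeline owner, per the documentation quoted above) can be built as a SQL string and executed with spark.sql(...). The pipeline ID below is a hypothetical placeholder:

```python
# Hypothetical pipeline ID; replace with your own DLT pipeline's ID.
pipeline_id = "00000000-0000-0000-0000-000000000000"

query = (
    "SELECT timestamp, event_type, message "
    f"FROM event_log('{pipeline_id}') "
    "WHERE event_type = 'flow_progress'"
)
print(query)
# In a notebook: spark.sql(query)
```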
11 More Replies
sandeepmankikar
by Contributor
  • 947 Views
  • 1 replies
  • 0 kudos

Complex Embedded Workflows

Can complex embedded workflows be created using Databricks Bundle, where multiple workflows are interconnected in a parent-child format? If Databricks Bundle doesn't support this, what would be the best alternative for creating and deploying such wor...

Latest Reply
ashraf1395
Honored Contributor
  • 0 kudos

Yes, you can create complex workflows in Databricks bundles as well. For example, you can have all of them defined from child to parent and call those child workflows as a workflow task in the parent workflows; all different kinds of tasks...

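The parent-child wiring the reply describes uses the Jobs API's run_job_task task type. A sketch of a parent job definition, written here as the Python dict a bundle's YAML resources section would serialize to (job and task names are hypothetical):

```python
# Parent job that triggers two hypothetical child jobs in sequence.
# "${resources.jobs.*.id}" is the bundle variable syntax for referencing
# jobs defined elsewhere in the same bundle.
parent_job = {
    "name": "parent_workflow",
    "tasks": [
        {
            "task_key": "run_child_a",
            "run_job_task": {"job_id": "${resources.jobs.child_a.id}"},
        },
        {
            "task_key": "run_child_b",
            "depends_on": [{"task_key": "run_child_a"}],
            "run_job_task": {"job_id": "${resources.jobs.child_b.id}"},
        },
    ],
}
```

The depends_on edge is what gives the parent-child chaining; each child remains an independently deployable job.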
ShivangiB
by New Contributor III
  • 1116 Views
  • 3 replies
  • 0 kudos

Liquid Clustering limitation clustering on write does not support source queries that include filter

I have a query: %sql insert into ucdata.brz.liquidcluster_table_data select sum(col1) as col1, col2, sum(col3) as col3 from ucdata.brz.liquidcluster_table_data group by col2. This query I am running with runtime version 13.3 and it is still working. But...

Latest Reply
ShivangiB
New Contributor III
  • 0 kudos

Hey team, can you please help with this?

2 More Replies
Datanoob123
by New Contributor II
  • 3760 Views
  • 6 replies
  • 1 kudos

Query to show column names in common between multiple tables

Hi all, I have a large number of tables, and I would like a query that pulls the column names common to all of them. I know about SHOW COLUMNS, but can't seem to use this or another method to achieve this. This ...

Data Engineering
comparing tables
show columns
sql
Latest Reply
KaranamS
Contributor III
  • 1 kudos

Hi @Datanoob123, I agree with @Stefan-Koch! It could be that you don't have access to the system tables. Please reach out to your Databricks admin to grant you the required permissions to the system tables. You can then use the query I shared to get the requ...

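The logic the replies point at is a set intersection over per-table column lists, e.g. collected from system.information_schema.columns or SHOW COLUMNS. A sketch in plain Python with hypothetical table and column names:

```python
# Hypothetical column listings, one entry per table; in practice populate
# this from system.information_schema.columns or SHOW COLUMNS per table.
columns_by_table = {
    "sales":   ["id", "date", "amount", "region"],
    "returns": ["id", "date", "reason", "region"],
    "targets": ["id", "region", "quota", "date"],
}

# Columns present in every table.
common = set.intersection(*(set(cols) for cols in columns_by_table.values()))
print(sorted(common))
```

The pure-SQL equivalent groups information_schema.columns by column_name and keeps names whose table count equals the number of tables being compared.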
5 More Replies
HarryRichard08
by New Contributor II
  • 1700 Views
  • 3 replies
  • 0 kudos

Unable to Access S3 from Serverless but Works on Cluster

Hi everyone, I am trying to access data from S3 using an access key and secret. When I run the code through Databricks clusters, it works fine. However, when I try to do the same from a serverless cluster, I am unable to access the data. I have alread...

Latest Reply
KaranamS
Contributor III
  • 0 kudos

Hi @HarryRichard08, Databricks recommends using instance profiles (IAM roles) to connect to AWS S3, as they provide a secure and scalable method without embedding credentials in a notebook. Have you tried this approach? https://docs.databricks.com/aws/...

2 More Replies
HoussemBL
by New Contributor III
  • 1494 Views
  • 2 replies
  • 0 kudos

Databricks asset bundle deploys DLT pipelines as duplicate resources

Dear Community, I have a deployment issue after restructuring my project. Previously, our project was organized with the following structure: project/src/, project/resources/, project/databricks.yml. As part of an optimization effort, we have transitioned to ...

Latest Reply
HoussemBL
New Contributor III
  • 0 kudos

Hi @ashraf1395, I am creating two separate databricks.yml files, one for each sub-project.

1 More Replies
kamilmuszynski
by New Contributor II
  • 11494 Views
  • 4 replies
  • 1 kudos

Asset Bundles - path is not contained in bundle root path

I'm trying to adapt a code base to use asset bundles. I was trying to come up with a folder structure that will work for our bundles and came up with a layout as below: common/ (source code), services/ (source code), dist/ (here artifacts from monorepo are bu...

Data Engineering
asset-bundles
Latest Reply
PabloCSD
Valued Contributor II
  • 1 kudos

When I have worked with Databricks Asset Bundles (DAB), I left the databricks.yaml file in the root, and used just one databricks.yaml file. I also made a simple functional DAB project; the file system structure is like this, if it helps you: dab_test_repo/...

3 More Replies
Ian_P
by New Contributor II
  • 9771 Views
  • 6 replies
  • 2 kudos

Databricks Unity Catalog Shared Mode Cluster Py4J Security Issue

Hi there, I am getting this error when trying to use Databricks Runtime 13.1, Shared Mode (we need Unity Catalog), multimode cluster (this works in single user mode, but we need shared mode): py4j.security.Py4JSecurityException: Method public java.la...

Data Engineering
Databricks
spark
Unity Catalog
Latest Reply
DB_Learner17
New Contributor II
  • 2 kudos

Hi, I too am working to create a job cluster in Databricks Workflows which should be Unity Catalog enabled. But it works only in single-user mode and not shared, while the team where I work needs it as a shared one. I got the same error as show...

5 More Replies
minhhung0507
by Valued Contributor
  • 1598 Views
  • 2 replies
  • 2 kudos

How to setup alert and retry policy for specific pipeline?

Hi everyone, I'm running multiple real-time pipelines on Databricks using a single job that submits them via a thread pool. Most of the pipelines work fine, but a few of them occasionally get stuck for several hours, causing data loss. The challenge is t...

Latest Reply
minhhung0507
Valued Contributor
  • 2 kudos

Hi @Brahmareddy, Thanks a lot for your solution. We are currently using Databricks with GCP. We will try it and see if it solves our problem. Regards,

1 More Replies
messiah
by New Contributor II
  • 7625 Views
  • 5 replies
  • 0 kudos

How to Create Iceberg Tables in Databricks Using Parquet Files from S3?

Hi Databricks Community, I'm trying to create Apache Iceberg tables in Databricks using Parquet files stored in an S3 bucket. I found a guide from Dremio, but I'm unable to create Iceberg tables using that method. Here's what I need: Read Parquet files ...

Latest Reply
Raashid_Khan
New Contributor II
  • 0 kudos

How do I create/insert into Databricks tables in Iceberg format? I have Iceberg Parquet files in GCS and want to store them as Iceberg tables in Databricks catalogs.

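One route worth checking here (assuming Unity Catalog and a recent runtime) is Delta UniForm, which writes Iceberg metadata alongside a Delta table so Iceberg readers can consume it. A sketch of the DDL as a SQL string for spark.sql(...); the table name and GCS path below are hypothetical placeholders:

```python
catalog_table = "main.bronze.events"    # hypothetical UC table name
source_path = "gs://my-bucket/events/"  # hypothetical GCS Parquet path

# CTAS from raw Parquet with UniForm enabled so Iceberg metadata is generated.
ddl = f"""
CREATE TABLE {catalog_table}
TBLPROPERTIES (
  'delta.enableIcebergCompatV2' = 'true',
  'delta.universalFormat.enabledFormats' = 'iceberg'
)
AS SELECT * FROM parquet.`{source_path}`
"""
print(ddl)
# In a notebook: spark.sql(ddl)
```

Note this produces a Delta table readable as Iceberg, not a native Iceberg table; whether that fits depends on which engines need to read it.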
4 More Replies
MuesLee
by New Contributor
  • 2298 Views
  • 1 replies
  • 0 kudos

Merge rewrites many unmodified files

Hello. I want to do a merge on a subset of my Delta table partitions to do incremental upserts to keep two tables in sync. I do not use a whenNotMatchedBySource clause to clean up stale rows in my target because of this GitHub issue. Because of that...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi MuesLee, how are you doing today? As per my understanding, yes, your understanding is mostly correct. The reason even unchanged partitions are being rewritten is likely because of how Delta Lake's merge operation handles partition pruning and upda...

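The usual mitigation is to put an explicit partition filter in the merge condition so Delta can prune files in untouched partitions. A sketch with hypothetical column names (id as the key, event_date as the partition column):

```python
# Partition values touched by this incremental batch (hypothetical).
partitions = ["2024-01-01", "2024-01-02"]

# Build the pruning predicate and fold it into the merge condition.
part_filter = "target.event_date IN ({})".format(
    ", ".join(f"'{p}'" for p in partitions)
)
condition = f"target.id = source.id AND {part_filter}"
print(condition)
# On Databricks this condition would go into:
# DeltaTable.forName(spark, "tgt").alias("target") \
#     .merge(source_df.alias("source"), condition)...
```

Without the partition clause, the join key alone gives the optimizer no way to rule out files in other partitions, which matches the rewrites described above.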
code_vibe
by New Contributor
  • 1016 Views
  • 1 replies
  • 0 kudos

Delta lake federated table not working as expected

I’m facing an issue while working with federated Redshift tables in Databricks, and I’m hoping someone here can help me out. I have a source table (material) in Redshift that I’m querying through the Delta Lake federation in Databricks. When I run the ...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi code_vibe, how are you doing today? As per my understanding, it looks like the issue might be due to predicate pushdown not happening when querying the federated Redshift table in Databricks. Predicate pushdown helps filter data at the source (Red...

Jorge3
by New Contributor III
  • 1598 Views
  • 1 replies
  • 0 kudos

Too many small files in the "landing area"

Hello everyone, I’m currently working on a setup where my unprocessed real-time data arrives as .json files in Azure Data Lake Storage (ADLS). Every x minutes, I use Databricks Autoloader to pick up the new data, run my ETL transformations, and store ...

Latest Reply
koji_kawamura
Databricks Employee
  • 0 kudos

Hi @Jorge3, since you mentioned the "cloudFiles.useNotifications" option, I assume you know Auto Loader's file detection modes. File notification mode should be the best solution for your situation. Have you tried it already and encountered an issue? If so, please let us kn...

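A sketch of the Auto Loader options for file-notification mode, which avoids repeatedly listing a directory full of small files; the schema-location path and the commented load path are hypothetical placeholders:

```python
# Options for an event-driven Auto Loader stream (file notification mode).
autoloader_options = {
    "cloudFiles.format": "json",
    # Use storage events instead of directory listing to detect new files.
    "cloudFiles.useNotifications": "true",
    # Hypothetical checkpoint path for schema tracking.
    "cloudFiles.schemaLocation": "abfss://checkpoints@mystorage.dfs.core.windows.net/schema",
}

# In a notebook:
# df = (spark.readStream.format("cloudFiles")
#         .options(**autoloader_options)
#         .load("abfss://landing@mystorage.dfs.core.windows.net/events/"))
```

This addresses detection cost only; compacting the small files themselves would still happen downstream (e.g. in the bronze table).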
Kayla
by Valued Contributor II
  • 1430 Views
  • 4 replies
  • 3 kudos

Unity Catalog "Sync" Question

I'm having a little trouble fully following the documentation on the SYNC command. I have a table in hive_metastore that still needs to be updated daily for the next few months, but I also need to define a view in Unity Catalog based on tha...

Latest Reply
Nivethan_Venkat
Contributor III
  • 3 kudos

Hi @Kayla, the SYNC command syncs your Hive EXTERNAL table to your Unity Catalog namespace. If the table is external, the UC table will be in sync with your external location. If it is a Hive managed table, you can't use the SYNC command to have your mana...

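For an external Hive table, the SYNC statement the reply refers to looks like the sketch below, built as a SQL string for spark.sql(...) or a SQL cell; both table names are hypothetical:

```python
hive_table = "hive_metastore.sales.orders"  # hypothetical external Hive table
uc_table = "main.sales.orders"              # hypothetical Unity Catalog target

# DRY RUN reports what would happen without creating or changing anything.
sync_sql = f"SYNC TABLE {uc_table} FROM {hive_table} DRY RUN"
print(sync_sql)
# Drop "DRY RUN" to actually create/refresh the UC table.
```

Because the daily job keeps writing to the hive_metastore table, re-running SYNC (or scheduling it) keeps the UC copy's metadata up to date until the migration completes.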
3 More Replies
