Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Ian_P
by New Contributor II
  • 11021 Views
  • 6 replies
  • 2 kudos

Databricks Unity Catalog Shared Mode Cluster Py4J Security Issue

Hi there, I am getting this error when trying to use Databricks Runtime 13.1 in Shared mode (we need Unity Catalog) on a multi-node cluster (this works in single-user mode, but we need shared mode): py4j.security.Py4JSecurityException: Method public java.la...

Data Engineering
Databricks
spark
Unity Catalog
Latest Reply
DB_Learner17
New Contributor II
  • 2 kudos

Hi, I too am working to create a job cluster in Databricks Workflows which should be Unity Catalog enabled. But it works only in single-user mode and not shared, while the team where I work needs it as a shared one. I got the same error as show...

5 More Replies
minhhung0507
by Valued Contributor
  • 2322 Views
  • 2 replies
  • 2 kudos

How to setup alert and retry policy for specific pipeline?

Hi everyone, I'm running multiple real-time pipelines on Databricks using a single job that submits them via a thread pool. Most of the pipelines work fine, but a few of them occasionally get stuck for several hours, causing data loss. The challenge is t...

Latest Reply
minhhung0507
Valued Contributor
  • 2 kudos

Hi @Brahmareddy, thanks a lot for your solution. We are currently using Databricks on GCP. We will try it and see if it solves our problem. Regards,

1 More Replies
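For per-task retries and failure alerts (as opposed to in-pipeline handling), the Databricks Jobs API exposes retry and notification settings on each task. A minimal sketch follows; the field names come from the Jobs API, while all values and the email address are placeholders:

```python
# Sketch of Databricks Jobs API task settings for retries and failure alerts.
# Field names follow the Jobs API; the values and email address are placeholders.
task_settings = {
    "max_retries": 3,                     # retry a failed task up to 3 times
    "min_retry_interval_millis": 60_000,  # wait at least 1 minute between retries
    "retry_on_timeout": True,             # also retry when the task times out
    "timeout_seconds": 3 * 60 * 60,       # kill a task stuck for over 3 hours
    "email_notifications": {"on_failure": ["oncall@example.com"]},
}
```

Setting `timeout_seconds` is what bounds a stuck pipeline: a hung task is killed instead of silently blocking for hours, and the retry settings then relaunch it.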
messiah
by Databricks Partner
  • 10397 Views
  • 5 replies
  • 0 kudos

How to Create Iceberg Tables in Databricks Using Parquet Files from S3?

Hi Databricks Community, I'm trying to create Apache Iceberg tables in Databricks using Parquet files stored in an S3 bucket. I found a guide from Dremio, but I'm unable to create Iceberg tables using that method. Here's what I need: Read Parquet files ...

Latest Reply
Raashid_Khan
New Contributor II
  • 0 kudos

How do I create/insert into Databricks tables in Iceberg format? I have Iceberg Parquet files in GCS and want to store them as Iceberg tables in Databricks catalogs.

4 More Replies
MuesLee
by New Contributor
  • 3783 Views
  • 1 reply
  • 0 kudos

Merge rewrites many unmodified files

Hello. I want to do a merge on a subset of my Delta table partitions to do incremental upserts and keep two tables in sync. I do not use a whenNotMatchedBySource clause to clean up stale rows in my target because of this GitHub issue. Because of that...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi MuesLee, how are you doing today? As per my understanding, yes, your understanding is mostly correct. The reason even unchanged partitions are being rewritten is likely because of how Delta Lake's merge operation handles partition pruning and upda...

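A common mitigation for this kind of over-rewriting, sketched below with hypothetical table and column names, is to put an explicit partition predicate in the merge condition so Delta can prune partitions the source batch does not touch:

```python
# Build a merge condition that pins the target to specific partitions.
# Table alias names (t, s) and columns (partition_date, id) are hypothetical.
def partition_scoped_condition(partitions: list[str]) -> str:
    """Merge condition restricted to the given target partitions."""
    dates = ", ".join(f"'{p}'" for p in partitions)
    return (
        f"t.partition_date IN ({dates}) "
        "AND t.partition_date = s.partition_date "
        "AND t.id = s.id"
    )

# On Databricks (sketch; adjust the clauses to your sync logic):
# (DeltaTable.forName(spark, "target").alias("t")
#    .merge(source_df.alias("s"), partition_scoped_condition(["2024-01-01"]))
#    .whenMatchedUpdateAll()
#    .whenNotMatchedInsertAll()
#    .execute())
```

The literal `IN (...)` list is what lets the optimizer prune: a predicate that only references the source side cannot be used to skip target files.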
code_vibe
by New Contributor
  • 1381 Views
  • 1 reply
  • 0 kudos

Delta lake federated table not working as expected

I'm facing an issue while working with federated Redshift tables in Databricks, and I'm hoping someone here can help me out. I have a source table (material) in Redshift that I'm querying through Delta Lake federation in Databricks. When I run the ...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi code_vibe, how are you doing today? As per my understanding, it looks like the issue might be due to predicate pushdown not happening when querying the federated Redshift table in Databricks. Predicate pushdown helps filter data at the source (Red...

Jorge3
by New Contributor III
  • 2450 Views
  • 1 reply
  • 0 kudos

Too many small files in the "landing area"

Hello everyone, I'm currently working on a setup where my unprocessed real-time data arrives as .json files in Azure Data Lake Storage (ADLS). Every x minutes, I use Databricks Auto Loader to pick up the new data, run my ETL transformations, and store ...

Latest Reply
koji_kawamura
Databricks Employee
  • 0 kudos

Hi @Jorge3, since you mentioned the "cloudFiles.useNotifications" option, I assume you know Auto Loader's file detection modes. That should be the best solution for your situation. Have you tried it already and encountered an issue? If so, please let us kn...

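For reference, file notification mode is switched on through Auto Loader reader options like the following. This is a sketch assuming JSON input; the schema location and load path are placeholders:

```python
# Auto Loader options enabling file-notification mode instead of directory
# listing, which avoids repeatedly scanning a landing area full of small files.
# Option names follow the cloudFiles option set; values are placeholders.
autoloader_options = {
    "cloudFiles.format": "json",
    "cloudFiles.useNotifications": "true",  # subscribe to storage events
    "cloudFiles.schemaLocation": "/mnt/checkpoints/schema",  # placeholder path
}

# On Databricks (sketch, placeholder ADLS path):
# df = (spark.readStream.format("cloudFiles")
#         .options(**autoloader_options)
#         .load("abfss://landing@account.dfs.core.windows.net/json/"))
```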
Kayla
by Valued Contributor II
  • 2117 Views
  • 4 replies
  • 3 kudos

Unity Catalog "Sync" Question

I'm having a little trouble fully following the documentation on the SYNC command. I have a table in hive_metastore that still needs to be updated daily for the next few months, but I also need to define a view in Unity Catalog based on tha...

Latest Reply
Nivethan_Venkat
Databricks MVP
  • 3 kudos

Hi @Kayla, the SYNC command syncs your Hive EXTERNAL table into your Unity Catalog namespace. If the table is external, the UC table will be kept in sync with your external location. If it is a Hive managed table, you can't use the SYNC command to have your mana...

3 More Replies
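For reference, a SYNC invocation looks like the following. This is a sketch with hypothetical catalog, schema, and table names; the DRY RUN variant previews the upgrade without applying it:

```python
# Build the SYNC statement that upgrades an external hive_metastore table
# into Unity Catalog. All table names here are hypothetical.
def sync_statement(uc_table: str, hms_table: str, dry_run: bool = False) -> str:
    """Return a SYNC TABLE statement, optionally as a DRY RUN preview."""
    stmt = f"SYNC TABLE {uc_table} FROM {hms_table}"
    return stmt + " DRY RUN" if dry_run else stmt

# On Databricks (sketch):
# spark.sql(sync_statement("main.sales.orders", "hive_metastore.sales.orders"))
```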
the_dude
by New Contributor II
  • 3592 Views
  • 1 reply
  • 0 kudos

Impossibility to have multiple versions of the same Python package installed

Hello, we package our Spark jobs + utilities in a custom package to be used in wheel tasks in Databricks. In my opinion, having several versions of this job (say "production" and "dev") run on the same cluster against different versions of this custo...

Latest Reply
the_dude
New Contributor II
  • 0 kudos

If someone comes across this post: as per the documentation, library/package installation can be notebook-scoped. Thus, to overcome the limitation described in the initial post, we are instead experimenting with notebook tasks whose only respons...

Phani1
by Databricks MVP
  • 871 Views
  • 1 reply
  • 0 kudos

Reading Multiple Data Formats

Hi all, I'm looking to develop generic code that can read multiple data formats, such as Parquet, Delta, and Iceberg, and save the data as Delta. Can you provide some insights or guidance on how to achieve this? Regards, Phani

Latest Reply
Erika_Fonseca
Databricks Employee
  • 0 kudos

Take a look at these two projects that follow a metadata-driven approach: Lakehouse Engine and DLT-META.

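The metadata-driven idea can also be sketched as a small dispatcher: validate the declared source format, read with the matching Spark reader, and write out as Delta. The helper below is plain Python so it runs without a cluster; the Spark calls are commented and all paths are placeholders:

```python
# Map a user-declared source format to a Spark reader format name.
SUPPORTED_FORMATS = {"parquet", "delta", "iceberg"}

def reader_format(fmt: str) -> str:
    """Normalize and validate a source format name."""
    normalized = fmt.strip().lower()
    if normalized not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported source format: {fmt!r}")
    return normalized

# On Databricks (sketch, placeholder paths):
# def load_as_delta(spark, fmt, src_path, dst_path):
#     df = spark.read.format(reader_format(fmt)).load(src_path)
#     df.write.format("delta").mode("overwrite").save(dst_path)
```

Failing fast on an unknown format keeps a metadata-driven pipeline from silently misreading a source when the config table contains a typo.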
zmsoft
by Contributor
  • 1274 Views
  • 1 reply
  • 0 kudos

How to copy file from UC volume to external location folder

Hi there, how do I copy a file from a UC volume to an external location folder? Thanks & regards, zmsoft

Latest Reply
Advika_
Databricks Employee
  • 0 kudos

Hello @zmsoft! To copy a file from a UC volume to an external location, you can use: dbutils.fs.cp("UC_volume_path", "external_location_path"). Ensure the external location is preconfigured in Unity Catalog and you have the necessary permission...

Fatimah-Tariq
by New Contributor III
  • 1791 Views
  • 4 replies
  • 0 kudos

Schema update Issue in DLT

I have a pipeline in Databricks with this flow: SQL Server (source) -> Staging (Parquet) -> Bronze (DLT) -> Silver (DLT) -> Gold (DLT). The pipeline has been up and running smoothly for months, but recently there was a schema update at my source level and one o...

Latest Reply
Fatimah-Tariq
New Contributor III
  • 0 kudos

Hi @Alberto_Umana, is there any word on how to fix my data and bring all the records back to the pipeline schema?

3 More Replies
zmsoft
by Contributor
  • 1552 Views
  • 1 reply
  • 0 kudos

Resolved! How do I use an Azure Databricks DLT pipeline to consume Azure Event Hub data

Hi there, how do I use an Azure Databricks DLT pipeline to consume Azure Event Hub data? Code: TOPIC = "myeventhub" KAFKA_BROKER = "" GROUP_ID = "group_dev" raw_kafka_events = (spark.readStream .format("kafka") .option("subscribe", EH_NAME) .opt...

Latest Reply
ashraf1395
Honored Contributor
  • 0 kudos

Hi there @zmsoft, did you have a look at this reference doc? https://docs.databricks.com/aws/en/dlt/event-hubs This might help.

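As the linked doc describes, Event Hubs exposes a Kafka-compatible endpoint on port 9093, so the stream is read with the Kafka source. The option-building below is a sketch; the namespace, topic, and connection string are all placeholders:

```python
# Build spark.readStream Kafka options for an Azure Event Hub, which speaks
# the Kafka protocol on port 9093. All identifiers here are placeholders.
def eventhub_kafka_options(namespace: str, topic: str, conn_str: str) -> dict:
    """Kafka reader options for an Event Hubs Kafka-compatible endpoint."""
    jaas = (
        "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule "
        f'required username="$ConnectionString" password="{conn_str}";'
    )
    return {
        "kafka.bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "subscribe": topic,
        "kafka.security.protocol": "SASL_SSL",
        "kafka.sasl.mechanism": "PLAIN",
        "kafka.sasl.jaas.config": jaas,
        "startingOffsets": "latest",
    }

# In a DLT pipeline (sketch; EH_CONN is a secret-backed connection string):
# raw = (spark.readStream.format("kafka")
#          .options(**eventhub_kafka_options("myns", "myeventhub", EH_CONN))
#          .load())
```

Note that the username is literally `$ConnectionString`; the Event Hubs connection string goes in the password field.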
Brianben
by New Contributor III
  • 962 Views
  • 1 reply
  • 0 kudos

Getting Errors when reading data from Excel InternalError: pip is not installed for /local_disk

Hi all, we have a daily Databricks job that downloads Excel files from SharePoint and reads them. The job worked fine until today (3 March). We are getting the following error message when running the code to read the Excel file: org.apache.spark.SparkExc...

Latest Reply
Renu_
Valued Contributor II
  • 0 kudos

I think the issue comes from installing Office365-REST-Python-Client using dbutils.library.installPyPI, which seems to create a conflicting Python environment for the Spark executors. Since notebook-specific installs modify the environment dynamically, t...

xx123
by New Contributor III
  • 881 Views
  • 1 reply
  • 0 kudos

ETL Pipeline work fine, but when executed via Workflow it fails due to StorageAccessError

I have a fairly simple ETL pipeline that uses DLT. It streams data from an ADLS2 storage account and creates a materialized view using two tables. It works fine when I execute it on its own; the materialized view is properly refreshed. Now I wanted to add this as a task to...

Latest Reply
Nivethan_Venkat
Databricks MVP
  • 0 kudos

Hi @xx123, could you please provide more details on the cluster configuration? I am guessing the cluster policy you use when deploying the job in a workflow and the one you use when testing might be different. Please try to use the same cluster policy ...

Nalapriya
by Databricks Partner
  • 2439 Views
  • 3 replies
  • 0 kudos

I've data in s3/Iceberg tables. How to read it using databricks SparkSQL ?

I tried this method: df = spark.read.format("iceberg").load("s3-bucket-path"). But I got an error: Multiple sources found for iceberg (com.databricks.sql.transaction.tahoe.uniform.sources.IcebergBrowseOnlyDataSource, org.apache.iceberg.spark.source.Icebe...

Latest Reply
Nalapriya
Databricks Partner
  • 0 kudos

Hi @Alberto_Umana, I tried the steps you provided, but I'm still not able to read data in Iceberg format. It would be helpful to get any other suggestions.

2 More Replies