Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

code_vibe
by New Contributor
  • 705 Views
  • 1 reply
  • 0 kudos

Delta Lake federated table not working as expected

I’m facing an issue while working with federated Redshift tables in Databricks, and I’m hoping someone here can help me out. I have a source table (material) in Redshift that I’m querying through Lakehouse Federation in Databricks. When I run the ...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi @code_vibe, how are you doing today? As per my understanding, it looks like the issue might be due to predicate pushdown not happening when querying the federated Redshift table in Databricks. Predicate pushdown helps filter data at the source (Red...
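
For anyone hitting the same symptom, one way to verify whether a filter is actually pushed down is to inspect the physical plan. A minimal sketch; the catalog, schema, table, and filter below are hypothetical placeholders:

# Hypothetical federated catalog/schema/table; adjust to your setup
df = spark.table("redshift_cat.public.material").filter("material_id = 123")

# Look for the filter under the external scan (pushed filters) in the plan output;
# if it only appears in a Spark-side Filter node, pushdown did not happen at the source
df.explain(True)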

Jorge3
by New Contributor III
  • 1023 Views
  • 1 reply
  • 0 kudos

Too many small files in the "landing area"

Hello everyone, I’m currently working on a setup where my unprocessed real-time data arrives as .json files in Azure Data Lake Storage (ADLS). Every x minutes, I use Databricks Auto Loader to pick up the new data, run my ETL transformations, and store ...

Latest Reply
koji_kawamura
Databricks Employee
  • 0 kudos

Hi @Jorge3, since you mentioned the "cloudFiles.useNotifications" option, I assume you know about Auto Loader's file detection modes. File notification mode should be the best solution for your situation. Have you tried it already and encountered an issue? If so, please let us kn...
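
For reference, a minimal sketch of Auto Loader running in file notification mode; every path, storage account, and table name below is a hypothetical placeholder:

# File notification mode avoids re-listing a directory full of small files on every trigger
df = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useNotifications", "true")
    .option("cloudFiles.schemaLocation", "abfss://landing@myaccount.dfs.core.windows.net/_schemas/raw")
    .load("abfss://landing@myaccount.dfs.core.windows.net/raw"))

(df.writeStream
    .option("checkpointLocation", "abfss://landing@myaccount.dfs.core.windows.net/_checkpoints/raw")
    .trigger(availableNow=True)  # fits the "every x minutes" scheduled-job pattern
    .toTable("bronze.raw_events"))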

Kayla
by Valued Contributor II
  • 1013 Views
  • 4 replies
  • 3 kudos

Unity Catalog "Sync" Question

I'm having a little trouble fully following the documentation on the SYNC command. I have a table in hive_metastore that still needs to be updated daily for the next few months, but I also need to define a view in Unity Catalog based on tha...

Latest Reply
Nivethan_Venkat
Contributor III
  • 3 kudos

Hi @Kayla, the SYNC command syncs a Hive EXTERNAL table into your Unity Catalog namespace. If the table is external, the UC table will stay in sync with your external location. If it is a Hive managed table, you can't use the SYNC command to have your mana...
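
For reference, a hedged sketch of the command; the catalog, schema, and table names below are hypothetical placeholders:

# Preview what SYNC would do without making any changes
spark.sql("SYNC TABLE main.default.my_table FROM hive_metastore.default.my_table DRY RUN")

# Run it for real once the dry run looks right
spark.sql("SYNC TABLE main.default.my_table FROM hive_metastore.default.my_table")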

3 More Replies
the_dude
by New Contributor II
  • 2572 Views
  • 1 reply
  • 0 kudos

Impossible to have multiple versions of the same Python package installed

Hello, we package our Spark jobs and utilities in a custom package to be used in wheel tasks in Databricks. In my opinion, having several versions of this job (say "production" and "dev") run on the same cluster against different versions of this custo...

Latest Reply
the_dude
New Contributor II
  • 0 kudos

If someone comes across this post: as per the documentation, library/package installation can be notebook-scoped. Thus, to overcome the limitation described in the initial post, we are experimenting with notebook tasks whose only respons...
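
For anyone following the same route, a minimal sketch of a notebook-scoped install at the top of each notebook task; the package name and version are hypothetical:

# First cell of the notebook task; each notebook gets its own isolated Python environment
%pip install our_jobs_package==2.0.0   # a dev task could pin a different version instead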

Phani1
by Valued Contributor II
  • 553 Views
  • 1 reply
  • 0 kudos

Reading Multiple Data Formats

Hi all, I'm looking to develop generic code that can read multiple data formats, such as Parquet, Delta, and Iceberg, and save the result as Delta. Can you provide some insights or guidance on how to achieve this? Regards, Phani

Latest Reply
Erika_Fonseca
Databricks Employee
  • 0 kudos

Take a look at these two projects that follow a metadata-driven approach: Lakehouse Engine and DLT-Meta.
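
Before adopting either framework, a minimal hand-rolled sketch of the same idea may help; the formats, paths, and table names below are hypothetical placeholders:

# Parquet, Delta, and Iceberg can all go through the same DataFrame reader API
def read_source(fmt: str, path: str):
    return spark.read.format(fmt).load(path)

for fmt, path, target in [
    ("parquet", "/Volumes/main/raw/parquet_files", "main.bronze.t1"),
    ("delta", "/Volumes/main/raw/delta_dir", "main.bronze.t2"),
]:
    read_source(fmt, path).write.format("delta").mode("append").saveAsTable(target)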

zmsoft
by Contributor
  • 785 Views
  • 1 reply
  • 0 kudos

How to copy a file from a UC volume to an external location folder

Hi there, how do I copy a file from a UC volume to an external location folder? Thanks & regards, zmsoft

Latest Reply
Advika_
Databricks Employee
  • 0 kudos

Hello @zmsoft! To copy a file from a UC volume to an external location, you can use dbutils.fs.cp("UC_volume_path", "external_location_path"). Ensure the external location is preconfigured in Unity Catalog and that you have the necessary permission...
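
Spelled out as a runnable sketch; both paths below are hypothetical placeholders:

# Source: a file in a Unity Catalog volume; destination: a UC external location on ADLS
dbutils.fs.cp(
    "/Volumes/main/default/my_volume/report.csv",
    "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/exports/report.csv",
)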

Eduard
by New Contributor II
  • 117638 Views
  • 3 replies
  • 1 kudos

Cluster xxxxxxx was terminated during the run.

Hello, I have a problem with the autoscaling of a cluster. Every time autoscaling is activated I get this error. Does anyone have any idea why this could be? "Cluster xxxxxxx was terminated during the run (cluster state message: Lost communication ...

Latest Reply
louisgarza
New Contributor II
  • 1 kudos

Hello Databricks Community, the error message indicates that the driver node was lost, which can happen due to network issues or malfunctioning instances. Here are a few possible reasons and solutions: Instance Instability: If your cloud provider has u...

2 More Replies
Fatimah-Tariq
by New Contributor III
  • 841 Views
  • 4 replies
  • 0 kudos

Schema update Issue in DLT

I have a pipeline in Databricks with this flow: SQL Server (source) -> Staging (Parquet) -> Bronze (DLT) -> Silver (DLT) -> Gold (DLT). The pipeline has been up and running smoothly for months, but recently there was a schema update at my source level and one o...

Latest Reply
Fatimah-Tariq
New Contributor III
  • 0 kudos

Hi @Alberto_Umana, is there any word on how to fix my data and bring all the records back to the pipeline schema?

3 More Replies
cpayne_vax
by New Contributor III
  • 24363 Views
  • 15 replies
  • 9 kudos

Resolved! Delta Live Tables: dynamic schema

Does anyone know if there's a way to specify an alternate Unity Catalog schema in a DLT workflow using the @dlt.table syntax? In my case, I’m looping through folders in Azure Data Lake Storage to ingest data. I’d like those folders to get created in different...
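
For readers arriving from search, a minimal sketch of the looping pattern discussed in this thread; note that placing tables in schemas other than the pipeline default via fully qualified names depends on the pipeline's publishing mode, and every name and path below is a hypothetical placeholder:

import dlt

def define_table(folder: str, schema: str):
    # A fully qualified name places the table in an alternate schema
    @dlt.table(name=f"{schema}.{folder}")
    def _t():
        return (spark.readStream.format("cloudFiles")
                .option("cloudFiles.format", "json")
                .load(f"abfss://landing@myaccount.dfs.core.windows.net/{folder}/"))

for folder, schema in [("orders", "sales"), ("clicks", "web")]:
    define_table(folder, schema)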

Latest Reply
abhishek_02
New Contributor II
  • 9 kudos

Hi @kuldeep-in, could you please provide the exact location for disabling the DPM enabled option? I was not able to locate it in the pipeline settings or Databricks settings. Thank you.

14 More Replies
zmsoft
by Contributor
  • 665 Views
  • 1 reply
  • 0 kudos

Resolved! How do I use an Azure Databricks DLT pipeline to consume Azure Event Hubs data

Hi there, how do I use an Azure Databricks DLT pipeline to consume Azure Event Hubs data? Code:

TOPIC = "myeventhub"
KAFKA_BROKER = ""
GROUP_ID = "group_dev"
raw_kafka_events = (spark.readStream
    .format("kafka")
    .option("subscribe", EH_NAME)
    .opt...

Latest Reply
ashraf1395
Honored Contributor
  • 0 kudos

Hi there @zmsoft, did you have a look at this reference doc: https://docs.databricks.com/aws/en/dlt/event-hubs? This might help.
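
Following the pattern in that doc, a hedged sketch of reading Event Hubs through its Kafka-compatible endpoint; the namespace, hub name, and secret scope below are hypothetical placeholders:

EH_NAMESPACE = "myeventhubns"   # the Event Hubs namespace, not the hub itself
EH_NAME = "myeventhub"
# Keep the connection string in a secret scope rather than hard-coding it
EH_CONN_STR = dbutils.secrets.get(scope="eventhub", key="conn-string")

raw_kafka_events = (spark.readStream
    .format("kafka")
    .option("subscribe", EH_NAME)
    .option("kafka.bootstrap.servers", f"{EH_NAMESPACE}.servicebus.windows.net:9093")
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option("kafka.sasl.jaas.config",
            "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule"
            f' required username="$ConnectionString" password="{EH_CONN_STR}";')
    .option("startingOffsets", "latest")
    .load())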

Brianben
by New Contributor III
  • 547 Views
  • 1 reply
  • 0 kudos

Getting errors when reading data from Excel: InternalError: pip is not installed for /local_disk

Hi all, we have a daily Databricks job that downloads Excel files from SharePoint and reads them. The job worked fine until today (3 March). We are getting the following error message when running the code to read the Excel file: org.apache.spark.SparkExc...

Latest Reply
Renu_
Valued Contributor II
  • 0 kudos

I think the issue comes from installing Office365-REST-Python-Client using dbutils.library.installPyPI, which seems to create a conflicting Python environment for the Spark executors. Since notebook-specific installs modify the environment dynamically, t...
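
If that diagnosis holds, the usual workaround is to switch to the %pip magic, which scopes the install to the notebook's own environment; a minimal sketch:

# Put this in the first cell of the notebook, before any Spark work touches the library
%pip install Office365-REST-Python-Client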

xx123
by New Contributor III
  • 360 Views
  • 1 reply
  • 0 kudos

ETL pipeline works fine, but when executed via a workflow it fails due to a StorageAccessError

I have a fairly simple ETL pipeline that uses DLT. It streams data from an ADLS Gen2 storage account and creates a materialized view using two tables. It works fine when I execute it on its own; the materialized view is properly refreshed. Now I wanted to add this as a task to...

[Two screenshots attached]
Latest Reply
Nivethan_Venkat
Contributor III
  • 0 kudos

Hi @xx123, could you please provide more details on the cluster configuration? I am guessing the cluster policy used when deploying the job in the workflow and the one used when you are testing might be different. Please try to use the same cluster policy ...

Nalapriya
by New Contributor II
  • 1202 Views
  • 3 replies
  • 0 kudos

I have data in S3 Iceberg tables. How do I read it using Databricks Spark SQL?

I tried this method: df = spark.read.format("iceberg").load("s3-bucket-path"). But I got an error: Multiple sources found for iceberg (com.databricks.sql.transaction.tahoe.uniform.sources.IcebergBrowseOnlyDataSource, org.apache.iceberg.spark.source.Icebe...

Latest Reply
Nalapriya
New Contributor II
  • 0 kudos

Hi @Alberto_Umana, I tried the steps you provided, but I'm still not able to read data that is in Iceberg format. Any other suggestions would be helpful.
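
For later readers: when Spark reports multiple sources for a short format name, one generic way to disambiguate is to pass the fully qualified class name of the source you want (the class appears in the error message above). A hedged sketch with a placeholder bucket path; whether this works alongside the Databricks browse-only source is an assumption:

# Bypass the ambiguous short name "iceberg" by naming the source class explicitly
df = (spark.read
      .format("org.apache.iceberg.spark.source.IcebergSource")
      .load("s3://my-bucket/warehouse/db/my_table"))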

2 More Replies
Srujanm01
by New Contributor III
  • 3069 Views
  • 1 reply
  • 0 kudos

Databricks Managed RG Storage cost is High

Hi Community, how do I calculate the Databricks storage cost, and where can I see the data that is stored and charged in Databricks? I'm trying to understand the storage cost on a managed resource group, and I'm clueless about the data and where it is stored...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi, how are you doing today? To understand Databricks storage costs in Azure, you can check where your data is stored and how it’s being charged. Managed tables, DBFS files, and Unity Catalog volumes are usually stored in an Azure Data Lake Storage (A...

