Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Jorge3
by New Contributor III
  • 6670 Views
  • 3 replies
  • 1 kudos

Dynamic partition overwrite with Streaming Data

Hi, I'm working on a job that propagates updates from a Delta table to parquet files (a requirement of the consumer). The data is partitioned by day (year > month > day) and the daily data is updated every hour. I'm using table read streaming w...

Latest Reply
JacintoArias
New Contributor III
  • 1 kudos

We had a similar situation, @Hubert-Dudek. We are using Delta, but we're having some problems when propagating updates via merge, as you cannot read the resulting table as a streaming source anymore... so we're using a complete overwrite over parquet partition...

2 More Replies
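The pattern discussed in this thread (propagating hourly updates from a Delta source into day-partitioned parquet without rewriting the whole sink) can be sketched with `foreachBatch` plus dynamic partition overwrite. This is a minimal sketch, assuming a running Databricks/Spark cluster; the table name, storage path, and checkpoint location are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# With "dynamic" mode, an overwrite only replaces the partitions present in
# the DataFrame being written, not the entire target directory.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

def overwrite_touched_partitions(batch_df, batch_id):
    # Each micro-batch rewrites only the year/month/day partitions it contains.
    (batch_df.write
        .mode("overwrite")
        .partitionBy("year", "month", "day")
        .parquet("abfss://exports@myaccount.dfs.core.windows.net/daily"))

(spark.readStream
    .format("delta")
    .table("source_table")
    .writeStream
    .foreachBatch(overwrite_touched_partitions)
    .option("checkpointLocation", "/tmp/checkpoints/daily_export")
    .start())
```

Because `foreachBatch` uses the batch writer, this sidesteps the limitation that a merged Delta table can no longer be read as an append-only streaming source.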
dbengineer516
by New Contributor III
  • 4945 Views
  • 1 reply
  • 0 kudos

Resolved! IOStream.flush Timed Out

Hello, I'm encountering an issue with a Python script/notebook that I have developed and used in a daily job run in Databricks. It has worked perfectly fine for months, but now continues to fail constantly. After digging a little deeper, when running ...

Latest Reply
raphaelblg
Databricks Employee
  • 0 kudos

Hello @dbengineer516, from my research it looks to be an IPython cache error. Maybe your Python REPL is getting throttled due to too many requests. Please check: https://github.com/ipython/ipykernel/issues/334 This comment seems to be a possible solu...

Philospher1425
by New Contributor II
  • 5574 Views
  • 4 replies
  • 2 kudos

Renaming a file in Databricks is so hard. How to make it simpler

Hi Community, my requirement is simple: I need to drop files into Azure Data Lake Gen2 storage from Databricks. When I use df.coalesce(1).write.csv("url to gen 2/stage/"), it creates a part-*.csv file, but I need to rename it to a cust...

Latest Reply
raphaelblg
Databricks Employee
  • 2 kudos

Hi @Philospher1425, allow me to clarify that dbutils.fs serves as an interface to submit commands to your cloud provider storage. As such, the speed of copy operations is determined by the cloud provider and is beyond Databricks' control. That be...

3 More Replies
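The part-file name comes from Spark writing one file per task, so the usual workaround is to write to a staging directory with coalesce(1) and then move the single part file to the name you want. On Databricks paths this is typically done with dbutils.fs.ls and dbutils.fs.mv; the same idea with the standard library (a sketch, with a hypothetical helper name) looks like:

```python
import glob
import os
import shutil

def promote_part_file(stage_dir: str, target_path: str) -> str:
    """Find the single part-* file Spark wrote into stage_dir and move it
    to target_path under a custom name."""
    parts = glob.glob(os.path.join(stage_dir, "part-*"))
    if len(parts) != 1:
        raise ValueError(f"expected exactly one part file, found {len(parts)}")
    shutil.move(parts[0], target_path)
    return target_path
```

Note that coalesce(1) funnels the whole write through one task, so this pattern is only sensible for small outputs.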
Lizhi_Dong
by New Contributor II
  • 2699 Views
  • 6 replies
  • 1 kudos

Tables disappear when I re-start a new cluster on Community Edition

What would be the best plan for an independent course creator? Hi folks! I want to use Databricks Community Edition as the platform to teach online courses. As you may know, for Community Edition, you need to create a new cluster when the old one terminat...

Latest Reply
Shivanshu_
Contributor
  • 1 kudos

I believe only the metadata gets removed from the HMS, not the Delta files from DBFS. Instead of loading the data again and again, try using CTAS with that DBFS location.

5 More Replies
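The suggestion above relies on the Delta files surviving in DBFS after the cluster (and its metastore entries) go away, so on a fresh cluster you only need to re-register the table against the existing location. A minimal sketch, assuming a running Databricks session; the table name and path are hypothetical (note this re-registers the existing files rather than copying data, as a true CTAS would):

```python
# Re-attach the surviving Delta files to the new cluster's metastore.
spark.sql("""
  CREATE TABLE IF NOT EXISTS my_course_table
  USING DELTA
  LOCATION 'dbfs:/user/hive/warehouse/my_course_table'
""")
```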
JonM
by New Contributor
  • 628 Views
  • 0 replies
  • 0 kudos

Information_schema appears empty

Hi, we've encountered a problem with the information schema for one of our catalogs. For context: we're using dbt to implement our logic. We noticed this issue because dbt queries the information_schema.tables view to check which tables should be drop...

Pratibha
by New Contributor II
  • 1026 Views
  • 2 replies
  • 0 kudos

which cluster/worker/driver type is best for analytics work?

which cluster/worker/driver type is best for analytics work?

Latest Reply
jacovangelder
Honored Contributor
  • 0 kudos

Analytics work as in querying and analyzing data? Preferably using Databricks SQL? If so, then a SQL Warehouse is your best friend. 

1 More Replies
kiko_roy
by Contributor
  • 2019 Views
  • 5 replies
  • 2 kudos

Resolved! Delta sharing open protocol in Unity catalog: FileNotFoundError

Hi Team, I have created a recipient under Delta Sharing (Azure Databricks). Unity Catalog is enabled and data is stored in ADLS Gen2. I have downloaded the credential file and am trying to reuse it in my Python script (as per Databricks documentation) for a...

Latest Reply
jacovangelder
Honored Contributor
  • 2 kudos

I wasn't able to reproduce your issue. Is your Delta table operable? Can you see sample data from within Databricks and query the table from within Databricks? It almost looks like some parquet files are missing, making your Delta table not queryable anym...

4 More Replies
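For readers hitting the same FileNotFoundError, the open-source delta-sharing client gives a quick way to check whether the credential file and share are reachable before loading data. A sketch, assuming `pip install delta-sharing`; the profile path and table coordinates are hypothetical:

```python
import delta_sharing

# Credential (profile) file downloaded from the sharing provider.
profile = "/path/to/config.share"

# List what the recipient can actually see; if this fails, the problem is the
# credential or network, not the table itself.
client = delta_sharing.SharingClient(profile)
print(client.list_all_tables())

# Table URL format: <profile-path>#<share>.<schema>.<table>
df = delta_sharing.load_as_pandas(f"{profile}#my_share.my_schema.my_table")
```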
medha
by New Contributor
  • 685 Views
  • 1 reply
  • 0 kudos

Issues with Common Data Model as Source - different column size for blobs

I have a Dataverse Synapse Link set up to extract data into ADLS Gen2. I am trying to connect ADLS Gen2 as the data source to read the data files in Databricks. I have CDC enabled for CDM data, partitioned by year and month. So, for example, if ...

Latest Reply
jacovangelder
Honored Contributor
  • 0 kudos

This is a thoughtful consideration, but have you considered using .option("mergeSchema", "true") when writing? Do keep in mind that this will affect the target table and possibly downstream consumers. Ideally you want a strict schema contract ...

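The mergeSchema option mentioned above lets an append succeed when incoming files carry extra columns, widening the target table's schema instead of failing. A minimal sketch, assuming a Spark session and an existing DataFrame `df`; the storage path is hypothetical:

```python
# Appends rows even when df has columns the target table lacks; the new
# columns are added to the table schema (existing rows get nulls for them).
(df.write
   .format("delta")
   .mode("append")
   .option("mergeSchema", "true")
   .save("abfss://data@myaccount.dfs.core.windows.net/bronze/cdm_entity"))
```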
maskepravin02
by New Contributor II
  • 1935 Views
  • 2 replies
  • 0 kudos

Resolved! How can we connect to 2 different hive spark.hadoop.hive.metastore.uris

We need to read a table from 2 different spark.hadoop.hive.metastore.uris and do some validations. We are not able to connect to both spark.hadoop.hive.metastore.uris at the same time using sparkSession. I will be using Spark version 3.1.1 and the lan...

Latest Reply
ashraf1395
Valued Contributor II
  • 0 kudos

Hi there @maskepravin02, we once implemented this approach of reading from two different Hive metastores, but it was not on AWS and GCP; maybe the docs can help. This setup is not recommended, though. The best approach is to create separate Spark applications...

1 More Replies
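The reason a single session cannot serve both metastores is that hive.metastore.uris is fixed when the SparkSession (and its underlying Hive client) is created. A sketch of the separate-applications approach suggested above, assuming a Spark-with-Hive deployment; the app name and thrift URIs are hypothetical:

```python
from pyspark.sql import SparkSession

# Application 1: pinned to metastore A at session creation time.
spark = (SparkSession.builder
         .appName("validate-metastore-a")
         .config("spark.hadoop.hive.metastore.uris", "thrift://metastore-a:9083")
         .enableHiveSupport()
         .getOrCreate())

df_a = spark.table("db.some_table")
df_a.write.mode("overwrite").parquet("/staging/validation/metastore_a")

# A second application is launched the same way against
# thrift://metastore-b:9083, and the two exports are then compared
# (e.g. by reading both staging paths in a third job).
```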
pt07
by New Contributor
  • 579 Views
  • 1 reply
  • 0 kudos

How do I pass a value from one task in a workflow to another task?

How do I pass a value from one task in a workflow to another task? #workflow #orchestration

Latest Reply
brockb
Databricks Employee
  • 0 kudos

Hi @pt07, this may be what you're looking for. Can you please take a look? https://www.databricks.com/blog/2022/08/02/sharing-context-between-tasks-in-databricks-workflows.html

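The blog post linked above describes task values, the mechanism for sharing context between tasks in a Databricks workflow. A minimal sketch of that API, which only works inside a job run; the task and key names are hypothetical:

```python
# In the upstream task: publish a value for downstream tasks.
dbutils.jobs.taskValues.set(key="row_count", value=12345)

# In the downstream task: read it back by producing task and key.
row_count = dbutils.jobs.taskValues.get(
    taskKey="upstream_task",  # name of the producing task in the workflow
    key="row_count",
    default=0,                # returned if the key was never set
    debugValue=0,             # returned when run interactively, outside a job
)
```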
subashdsouza
by New Contributor
  • 674 Views
  • 1 reply
  • 0 kudos
Latest Reply
mhiltner
Databricks Employee
  • 0 kudos

Delta UniForm is the way to go at the moment: https://www.databricks.com/blog/delta-lake-universal-format-uniform-iceberg-compatibility-now-ga In short, tables are written in Delta but have compatibility with Iceberg, as they will also save Iceberg...

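Enabling UniForm is done through table properties: Delta keeps writing Delta, but also emits Iceberg metadata so Iceberg clients can read the same files. A sketch assuming a Databricks session; the table name and columns are hypothetical:

```python
# Create a Delta table that also maintains Iceberg metadata (UniForm).
spark.sql("""
  CREATE TABLE uniform_demo (id BIGINT, name STRING)
  TBLPROPERTIES (
    'delta.enableIcebergCompatV2' = 'true',
    'delta.universalFormat.enabledFormats' = 'iceberg'
  )
""")
```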
Peter-M
by New Contributor II
  • 2074 Views
  • 3 replies
  • 2 kudos
Latest Reply
mhiltner
Databricks Employee
  • 2 kudos

You could have a workflow with two tasks, one being a "trigger checker" that could be a super light task scheduled to run every X hours/minutes. This first task would check for your different triggers and define success criteria for your next task....

2 More Replies
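The "trigger checker" idea above hinges on the first task failing (raising) when no trigger has fired, so the downstream task, which depends on it, only runs when a trigger passed. A minimal sketch in plain Python; the function name and predicates are hypothetical:

```python
from typing import Callable, Dict

def check_triggers(triggers: Dict[str, Callable[[], bool]]) -> str:
    """Evaluate named trigger predicates and return the first one that fires.

    Raising makes the (hypothetical) Databricks checker task fail, which
    blocks the downstream task until the next scheduled check."""
    for name, predicate in triggers.items():
        if predicate():
            return name
    raise RuntimeError("no trigger fired; skipping downstream task")
```

Each predicate could, for instance, check for a newly landed file or poll an API flag; the downstream task runs only on a successful checker run.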
