Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

mmendez1012
by New Contributor
  • 276 Views
  • 0 replies
  • 0 kudos

Workflows

Can someone give me some advice on Parquet file sizes when moving data?

ashwinhabbu
by New Contributor
  • 93 Views
  • 0 replies
  • 0 kudos

Summit 24 experience

Great to see collaboration between Nvidia and Databricks!! Excited about everything serverless

Michael_Galli
by Contributor II
  • 324 Views
  • 3 replies
  • 1 kudos

Resolved! Importing data into Excel from Databricks over ODBC OAuth / Simba Spark Driver

Hi all, I am referring to this article: Connect to Azure Databricks from Microsoft Excel - Azure Databricks | Microsoft Learn. I use the latest SimbaSparkODBC-2.8.2.1013-Windows-64bit driver and configured it as described in that documentation. In Databricks I use ...

Latest Reply
Michael_Galli
Contributor II
  • 1 kudos

Can you provide a link please?

2 More Replies
Jorge3
by New Contributor III
  • 2346 Views
  • 4 replies
  • 3 kudos

Dynamic partition overwrite with Streaming Data

Hi, I'm working on a job that propagates data updates from a Delta table to Parquet files (a requirement of the consumer). The data is partitioned by day (year > month > day) and the daily data is updated every hour. I'm using table read streaming w...

Latest Reply
JacintoArias
New Contributor III
  • 3 kudos

We had a similar situation. @Hubert-Dudek, we are using Delta, but we are having some problems when propagating updates via merge, as you cannot read the resulting table as a streaming source anymore... so using complete overwrite over parquet partition...

3 More Replies
dbengineer516
by New Contributor III
  • 445 Views
  • 1 replies
  • 0 kudos

IOStream.flush Timed Out

Hello, I'm encountering an issue with a Python script/notebook that I developed and use in a daily job run in Databricks. It worked perfectly fine for months, but now it fails constantly. After digging a little deeper, when running ...

Latest Reply
raphaelblg
Honored Contributor
  • 0 kudos

Hello @dbengineer516, from my research it looks to be an IPython cache error. Maybe your Python REPL is getting throttled due to too many requests. Please check: https://github.com/ipython/ipykernel/issues/334 This comment seems to be a possible solu...

Philospher1425
by New Contributor II
  • 498 Views
  • 4 replies
  • 2 kudos

Renaming a file in Databricks is so hard. How to make it simpler?

Hi Community, my requirement is actually simple: I need to drop files into Azure Data Lake Storage Gen2 from Databricks. But when I use df.coalesce(1).write.csv("url to gen 2/stage/") it creates a part .CSV file, and I need to rename it to a cust...

Latest Reply
raphaelblg
Honored Contributor
  • 2 kudos

Hi @Philospher1425, allow me to clarify that dbutils.fs serves as an interface to submit commands to your cloud provider storage. As such, the speed of copy operations is determined by the cloud provider and is beyond Databricks' control. That be...

3 More Replies
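
Editor's sketch for the renaming question above, not taken from the thread: Spark writes a directory containing a part-*.csv file plus marker files such as _SUCCESS, so the usual workaround is to locate the single part file and move it to the desired name. The example below does this on a local directory with pathlib as a stand-in for cloud storage. The function name rename_single_part_file and the file names are hypothetical. On Databricks the analogous steps would presumably use dbutils.fs.ls to find the part file and dbutils.fs.mv to rename it.

```python
import tempfile
from pathlib import Path


def rename_single_part_file(output_dir: str, target_name: str) -> Path:
    """Rename the only part-*.csv file in output_dir to target_name.

    Hypothetical helper: it works on a local directory, not on ADLS Gen2.
    """
    out = Path(output_dir)
    parts = sorted(out.glob("part-*.csv"))
    # coalesce(1) should leave exactly one part file; fail loudly otherwise.
    if len(parts) != 1:
        raise ValueError(f"expected exactly one part file, found {len(parts)}")
    target = out / target_name
    parts[0].rename(target)
    return target


# Simulate the layout of a Spark CSV output directory and rename the part file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "part-00000-demo.csv").write_text("a,b\n1,2\n")
    (Path(tmp) / "_SUCCESS").write_text("")
    renamed = rename_single_part_file(tmp, "report.csv")
    print(renamed.name)
```

The helper raises an error when it does not find exactly one part file, so a write that produced several partitions is not silently mishandled.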
Lizhi_Dong
by New Contributor II
  • 1357 Views
  • 6 replies
  • 1 kudos

Tables disappear when I re-start a new cluster on Community Edition

What would be the best plan for an independent course creator? Hi folks! I want to use Databricks Community Edition as the platform to teach online courses. As you may know, on Community Edition you need to create a new cluster when the old one terminat...

Latest Reply
Shivanshu_
Contributor
  • 1 kudos

I believe only the metadata gets removed from HMS, not the Delta files from DBFS. Instead of loading the data again and again, try using CTAS with that DBFS location.

5 More Replies
JonM
by New Contributor
  • 165 Views
  • 0 replies
  • 0 kudos

Information_schema appears empty

Hi, we've encountered a problem with the information schema for one of our catalogs. For context: we're using dbt to implement our logic. We noticed this issue because dbt queries the information_schema.tables view to check which tables should be drop...

bulbur
by New Contributor
  • 138 Views
  • 0 replies
  • 0 kudos

Use pandas in DLT pipeline

Hi, I am trying to work with pandas in a Delta Live Table. I have created some example code: import pandas as pd import pyspark.sql.functions as F pdf = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "...

Pratibha
by New Contributor II
  • 368 Views
  • 2 replies
  • 0 kudos

Which cluster/worker/driver type is best for analytics work?

Which cluster/worker/driver type is best for analytics work?

Latest Reply
jacovangelder
Contributor III
  • 0 kudos

Analytics work as in querying and analyzing data? Preferably using Databricks SQL? If so, then a SQL Warehouse is your best friend. 

1 More Replies
kiko_roy
by Contributor
  • 605 Views
  • 5 replies
  • 2 kudos

Resolved! Delta sharing open protocol in Unity catalog: FileNotFoundError

Hi Team, I have created a recipient under Delta Sharing (Azure Databricks). Unity Catalog is enabled and data is stored in ADLS Gen2. I have downloaded the credential file and am trying to reuse it in my Python script (as per the Databricks documentation) for a...

Latest Reply
jacovangelder
Contributor III
  • 2 kudos

I wasn't able to reproduce your issue. Is your Delta table operable? Can you see sample data from within Databricks and query the table from within Databricks? It almost looks like some Parquet files are missing, causing your Delta table to not be queryable anym...

4 More Replies