Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

sashikanth
by New Contributor II
  • 451 Views
  • 2 replies
  • 0 kudos

Updates are going in as inserts in a Databricks job

There is no code change in the Databricks job or notebook, but we have observed a malfunction in CDC. Records that are meant to be updates are going in as inserts, causing duplicates. At the same time we had made sure the PKs based on which merge ru...
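For context, a minimal Delta MERGE sketch (hypothetical table, key, and DataFrame names) showing the WHEN MATCHED / WHEN NOT MATCHED branches that decide update-versus-insert; if join keys fail to match (nulls, type or casing drift), rows meant as updates fall through to the insert branch:

from delta.tables import DeltaTable

# Hypothetical names: "pk" stands in for the real merge key columns
target = DeltaTable.forName(spark, "main.default.target_table")
(target.alias("t")
 .merge(updates_df.alias("s"), "t.pk = s.pk")
 .whenMatchedUpdateAll()     # PK match: row is updated
 .whenNotMatchedInsertAll()  # no PK match: row is inserted
 .execute())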

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @sashikanth, I would recommend opening a case with us to further investigate this behavior, since it requires a code logic review and additional checks. Please refer to: https://docs.databricks.com/en/resources/support.html

1 More Replies
Youngwb
by New Contributor II
  • 1090 Views
  • 1 reply
  • 1 kudos

Databricks ODBC driver takes a long time to list columns

I'm testing the performance of Databricks. When I use the ODBC driver to submit a query, I found it slower than the notebook, and it looks like the ODBC driver sends a "list columns" request to Databricks. 1. Is there a way to prevent the ODBC driver from sending...

Data Engineering
Databricks
deltalake
ODBC driver
Latest Reply
varunjaincse
New Contributor III
  • 1 kudos

@Youngwb Did you fix this issue?

mangel
by New Contributor III
  • 12914 Views
  • 7 replies
  • 3 kudos

Resolved! Delta Live Tables error pivot

I'm facing an error in Delta Live Tables when I want to pivot a table. The error is the following: And the code to replicate the error is the following:
import pandas as pd
import pyspark.sql.functions as F
pdf = pd.DataFrame({"A": ["foo", "foo", "f...
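A hedged workaround sketch (not the thread's accepted answer, which is truncated here): pivoting in plain PySpark with an explicit value list, which skips the extra job Spark otherwise runs to infer the pivot columns and keeps the output schema fixed, assuming the pivoted values are known up front:

import pyspark.sql.functions as F

# Hypothetical: df has columns A, B, C; pivot on B with known values
pivoted = (df.groupBy("A")
             .pivot("B", ["one", "two"])  # explicit values: no inference pass
             .agg(F.sum("C")))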

Latest Reply
Michiel_Povre
New Contributor II
  • 3 kudos

Hi, was this a specific design choice, to not allow pivots in DLT? I'm under the impression DLT expects fixed table structures by design, but I don't understand the reason. Conceptually, I understand that fixed structures make lineage...

6 More Replies
anantkharat
by New Contributor II
  • 935 Views
  • 2 replies
  • 1 kudos

Resolved! Getting

payload = {"clusters": [{"num_workers": 4}],"pipeline_id": pipeline_id}update_url = f"{workspace_url}/api/2.0/pipelines/{pipeline_id}"response = requests.put(update_url, headers=headers, json=payload)for this, i'm getting below output with status cod...

Data Engineering
Databricks
Delta Live Tables
Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

Thank you for accepting the solution, I am glad it worked.

1 More Replies
furqanraza
by New Contributor II
  • 641 Views
  • 2 replies
  • 0 kudos

ETL job execution failure in serverless compute

Hi, we are facing an issue when executing an ETL pipeline on serverless compute. In the ETL pipeline, for some users, a task gets stuck every time we run the job. However, a similar ETL pipeline works fine for other users. Furthermore, there are canceled...

Latest Reply
ozaaditya
Contributor
  • 0 kudos

Hi, could you please share the error message or any logs you're seeing when the task gets stuck? This will help in diagnosing the issue and identifying potential solutions.

1 More Replies
NhanNguyen
by Contributor III
  • 3306 Views
  • 8 replies
  • 2 kudos

ConcurrentAppendException after enabling Liquid Clustering and row-level concurrency on a Delta table

Every time I run a parallel job, it fails with this error: ConcurrentAppendException: Files were added to the root of the table by a concurrent update. Please try the operation again. I did a lot of research and also created a liquid clustering table an...

Latest Reply
cgrant
Databricks Employee
  • 2 kudos

Thanks for sharing. In the original screenshots I've noticed that you've set delta.isolationLevel to Serializable, which is the strongest (and most strict) level. Please try WriteSerializable, which is the default level.
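For example, a minimal sketch of switching the table property the reply mentions (hypothetical table name):

# delta.isolationLevel is a documented Delta table property
spark.sql("""
    ALTER TABLE main.default.my_table
    SET TBLPROPERTIES ('delta.isolationLevel' = 'WriteSerializable')
""")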

7 More Replies
JissMathew
by Valued Contributor
  • 994 Views
  • 1 reply
  • 0 kudos

Auto Loader

from pyspark.sql.types import StructType, StructField, LongType, StringType, TimestampType
from pyspark.sql import functions as F, Window
from delta.tables import DeltaTable
import logging
# Set up logging
logging.basicConfig(level=logging.INFO)
logger = l...

Latest Reply
aayrm5
Honored Contributor
  • 0 kudos

If you intend to capture data changes, take a look at this doc, which talks about change data feed in Databricks.
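A minimal sketch of the change data feed the reply links to (hypothetical table name; the property must be set before the changes you want to read):

# One-time: enable CDF on the table
spark.sql("ALTER TABLE main.default.src SET TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true')")
# Read row-level changes from a starting version
changes = (spark.read.format("delta")
           .option("readChangeFeed", "true")
           .option("startingVersion", 1)
           .table("main.default.src"))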

nalindhar
by New Contributor II
  • 656 Views
  • 2 replies
  • 1 kudos

Historical Migration of Data from Delta Tables to Delta Live Tables

Our team is planning to migrate to the Delta Live Tables (DLT) framework for data ingestion. We currently have Delta tables populated with several years of data from ingested files and wish to avoid re-ingesting these files. What is the best approach...

Latest Reply
aayrm5
Honored Contributor
  • 1 kudos

Hey @nalindhar, I assume you want the target table to have the same name once you declare your ETL operations in the DLT pipeline. If that's the case, begin by renaming your Delta tables to indicate that they contain historical data. You can do this v...
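The rename step the reply starts with is a one-liner (hypothetical table names):

spark.sql("ALTER TABLE main.default.events RENAME TO main.default.events_historical")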

1 More Replies
kfloresip
by New Contributor
  • 1336 Views
  • 1 reply
  • 0 kudos

Bloomberg API and Databricks

Dear Databricks Community, do you have any experience connecting Databricks (Python) to the Bloomberg Terminal to retrieve data in an automated way? I tried running the following code without success: %python %pip install blpapi Thanks for your help, Kevi...

Latest Reply
filipniziol
Esteemed Contributor
  • 0 kudos

Hi @kfloresip, as per the documentation you need to run: %pip install --index-url=https://blpapi.bloomberg.com/repository/releases/python/simple/ blpapi This means the package is not available on the standard Python Package Index (PyPI) and Bloomberg h...
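Put together as notebook cells (install command taken verbatim from the reply; the import is just a hypothetical sanity check, run in a separate cell after the install):

%pip install --index-url=https://blpapi.bloomberg.com/repository/releases/python/simple/ blpapi

import blpapi  # succeeds only if the package resolved from Bloomberg's index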

Smu_Tan
by New Contributor
  • 3941 Views
  • 6 replies
  • 1 kudos

Resolved! Does Databricks supports the Pytorch Distributed Training for multiple devices?

Hi, I'm trying to use the Databricks platform to do PyTorch distributed training, but I didn't find any info about this. What I expected is to use multiple clusters to run a common job using PyTorch distributed data parallel (DDP) with the code belo...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

If only the driver is active, this probably means you are not using Spark. When running pure Python (etc.) code, the driver will execute it. If Spark is active, workers receive their tasks from the driver. Generally the driver is not that active, the...
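One hedged option on recent ML runtimes (not mentioned in the visible part of this thread): TorchDistributor schedules an existing DDP training function on Spark workers, assuming a train_fn like the poster's:

from pyspark.ml.torch.distributor import TorchDistributor

# train_fn is the user's DDP training function (hypothetical here)
result = TorchDistributor(num_processes=2,
                          local_mode=False,  # run on workers, not the driver
                          use_gpu=True).run(train_fn)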

5 More Replies
Layer
by New Contributor
  • 588 Views
  • 1 reply
  • 0 kudos

Can I send multiple POST requests to an API endpoint and get the info on whether all succeeded?

Hello, I am trying to send multiple POST requests to an endpoint. I have a Spark DataFrame, and each column of this DataFrame is sent through the payload of the POST request. However, when I run this in my notebook, no exception is raised. I'm guessing i...

Latest Reply
cgrant
Databricks Employee
  • 0 kudos

The return type for foreachPartition is None, so this is expected. If you're looking to do arbitrary code execution and return a result, mapInPandas or Pandas UDFs are good choices - you'd want to combine those with something like a .toLocalIterator ...
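A hedged sketch of the mapInPandas approach the reply suggests: each batch posts its rows and returns the HTTP status codes, so failures surface in the result instead of vanishing inside foreachPartition (hypothetical endpoint; df as in the post):

import pandas as pd
import requests

def post_batch(batches):
    # Runs on executors; yields one status row per input row
    for pdf in batches:
        statuses = [requests.post("https://example.com/endpoint", json=rec).status_code
                    for rec in pdf.to_dict(orient="records")]
        yield pd.DataFrame({"status": statuses})

result = df.mapInPandas(post_batch, schema="status int")
result.groupBy("status").count().show()  # did everything return 200?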

lauraxyz
by Contributor
  • 2064 Views
  • 8 replies
  • 4 kudos

How to execute .sql file in volume

I have giant queries (SELECT .. FROM) that I store in .sql files. I want to put those files in a Volume and run the queries from a workflow task. I can load the file content into a 'text' format string, then run the query. My question is, is there...

Latest Reply
lauraxyz
Contributor
  • 4 kudos

Issue resolved: for .py, I was using Spark, and I had to explicitly create the Spark session so that it can run properly and insert data.
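A minimal sketch of the working pattern (hypothetical volume path; in a standalone .py task the session must be created explicitly, as the poster found):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # explicit session for a .py task
sql_text = open("/Volumes/main/default/queries/report.sql").read()
df = spark.sql(sql_text)  # assumes the file holds a single SELECT statement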

7 More Replies
100databricks
by New Contributor III
  • 959 Views
  • 2 replies
  • 1 kudos

Resolved! How can I force a data frame to evaluate without saving it?

The problem at hand requires me to take a set of actions on a very large DataFrame df_1. This set of actions results in a second DataFrame df_2, and from this second DataFrame I have multiple downstream tasks, task_1, task_2 ... By default, t...

Latest Reply
filipniziol
Esteemed Contributor
  • 1 kudos

Hi @100databricks, yes, you can run df_2.cache() or df_2.persist() (for DataFrames, df_2.cache() is a shortcut for df_2.persist(StorageLevel.MEMORY_AND_DISK)). Here is the pseudo-code: # df_1 is your large initial DataFrame df_1 = ... # Perform expensive transformations ...
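Completing the truncated pseudo-code as a hedged sketch (hypothetical columns): cache df_2, then trigger one action so downstream tasks reuse the materialized result:

df_2 = df_1.select("*")            # stand-in for the real expensive transformations
df_2.cache()                        # mark df_2 for caching
df_2.count()                        # action: materializes df_2 into the cache
out_1 = df_2.filter("amount > 0")   # downstream tasks now read from the cache
out_2 = df_2.groupBy("id").count()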

1 More Replies
RobsonNLPT
by Contributor III
  • 2351 Views
  • 2 replies
  • 0 kudos

Delta Identity latest value after insert

Hi all. I would like to know if Databricks has some feature to retrieve the latest identity column value (always generated) after insert or upsert operations (DataFrame APIs and SQL)? Database engines such as Azure SQL and Oracle have a feature that enable...

Latest Reply
tapash-db
Databricks Employee
  • 0 kudos

Hi, You can always query "SELECT MAX(identity_column) FROM your_table_name" and see the latest value of the identity column. However, there are no direct functions available to give the latest identity column value.
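A hedged sketch of the workaround in the reply; note that MAX is only reliable when no concurrent writers are inserting at the same time (hypothetical table and column names):

latest_id = spark.sql(
    "SELECT MAX(id) AS latest FROM main.default.orders"  # id is the identity column
).collect()[0]["latest"]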

1 More Replies
eballinger
by Contributor
  • 1497 Views
  • 2 replies
  • 0 kudos

Looking for ways to speed up DLT testing

Hi guys, I am new to this community. I am guessing we have a typical setup (DLT tables, 3 layers: bronze, silver, and gold), and while it works fine in our development environment, I have always looked for ways to speed things up for testers. For exampl...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

There isn't a direct way to achieve this within the current DLT framework. When a DLT table is undeclared, it is designed to be removed from the pipeline, which includes the underlying data. However, there are a few strategies you can consider to spe...

1 More Replies
