Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

explorer
by New Contributor III
  • 3233 Views
  • 4 replies
  • 1 kudos

Resolved! Deleting records manually in a Databricks streaming table

Hi Team, let me know if there is any way I can delete records manually from a Databricks streaming table without corrupting the table and data. Can we delete a few records (based on some condition) manually in a Databricks streaming table (having checkpoi...

Latest Reply
JunYang
New Contributor III
  • 1 kudos

  If you use the applyChanges method in DLT for Change Data Capture (CDC), you can delete records manually without affecting the consistency of the table, as applyChanges respects manual deletions. You must configure your DLT pipeline to respect manu...
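For reference, a minimal sketch of the apply_changes pattern the reply refers to, where deletes flow through the change feed into the streaming target. The source table and column names (raw.customer_cdc, id, sequence_num, operation) are placeholders, not from the thread:

```python
import dlt
from pyspark.sql.functions import col, expr

# Hypothetical CDC feed; table and column names are placeholders.
@dlt.view
def cdc_source():
    return spark.readStream.table("raw.customer_cdc")

# Target streaming table maintained by apply_changes.
dlt.create_streaming_table("customers")

dlt.apply_changes(
    target="customers",
    source="cdc_source",
    keys=["id"],                                   # key used to match rows
    sequence_by=col("sequence_num"),               # ordering column for out-of-order events
    apply_as_deletes=expr("operation = 'DELETE'")  # rows flagged as deletes remove the record
)
```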

3 More Replies
nadia
by New Contributor II
  • 18832 Views
  • 3 replies
  • 2 kudos

Resolved! Executor heartbeat timed out

Hello, I'm trying to read a table located in PostgreSQL that contains 28 million rows. I get the following error: "SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in sta...

Latest Reply
JunYang
New Contributor III
  • 2 kudos

Please also review the Spark UI for the failed Spark job and stage: check the GC time, the data spill to memory and disk, and any error on the failed task in the stage view. This will confirm data skew or GC/memory...
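Since the failing job reads 28 million rows over JDBC, a single-task read is a common cause of executor heartbeat timeouts. A rough sketch of a partitioned JDBC read; the connection details, table name, and id bounds are placeholders, not from the thread:

```python
# Split the PostgreSQL read across several tasks instead of one oversized task.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://<host>:5432/<db>")  # placeholder connection
      .option("dbtable", "public.big_table")                # hypothetical table name
      .option("user", "<user>")
      .option("password", "<password>")
      .option("partitionColumn", "id")      # assumes a roughly uniform numeric key
      .option("lowerBound", "1")
      .option("upperBound", "28000000")
      .option("numPartitions", "16")        # 16 parallel, smaller queries
      .load())
```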

2 More Replies
Chris_Konsur
by New Contributor III
  • 15104 Views
  • 4 replies
  • 6 kudos

Resolved! Error: The associated location ... is not empty but it's not a Delta table

I'm trying to create a table but I get this error: AnalysisException: Cannot create table ('`spark_catalog`.`default`.`citation_all_tenants`'). The associated location ('dbfs:/user/hive/warehouse/citation_all_tenants') is not empty but it's not a Delta t...

Latest Reply
sachin_tirth
New Contributor II
  • 6 kudos

Hi Team, I am facing the same issue. When we try to load data into a table in the production batch, we get an error saying the table is not in Delta format. There is no recent change to the table, and we are not running any CREATE OR REPLACE TABLE; this is an existing table in pr...
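When the original error appears because stale non-Delta files are left at the managed location, a cautious sketch is to inspect the path first and only then clear it. The path comes from the error message; deleting is irreversible, so this is purely an illustration:

```python
location = "dbfs:/user/hive/warehouse/citation_all_tenants"  # path from the error message

# Inspect what is actually sitting at the location before doing anything destructive.
display(dbutils.fs.ls(location))

# Only after confirming the leftover files are disposable:
# dbutils.fs.rm(location, True)   # remove the non-Delta leftovers, then recreate the table
```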

3 More Replies
wyzer
by Contributor II
  • 6357 Views
  • 8 replies
  • 4 kudos

Resolved! How to pass parameters in SSRS/Power BI (report builder) ?

Hello, in SSRS/Power BI (Report Builder), how do you query a table in Databricks with parameters? This code doesn't work: SELECT * FROM TempBase.Customers WHERE Name = {{ @P_Name }}. Thanks.

Latest Reply
Nj11
New Contributor II
  • 4 kudos

Hi, I am not able to see the data in SSRS when I use date parameters, but with hard-coded dates the data populates fine. The data source points to Databricks. I am not sure what I am missing here; please help. Thanks. I am trying with que...

7 More Replies
Abbe
by New Contributor II
  • 1730 Views
  • 2 replies
  • 0 kudos

Update the data type of a column within a table that has a GENERATED ALWAYS AS IDENTITY column

I want to cast the data type of a column "X" in a table "A" where column "ID" is defined as GENERATED ALWAYS AS IDENTITY. The Databricks docs refer to an overwrite to achieve this: https://docs.databricks.com/delta/update-schema.html. The following operation: (spar...

Latest Reply
RajuBolla
New Contributor II
  • 0 kudos

UPDATE is not working, but DELETE is when I changed to the DEFAULT property. AnalysisException: UPDATE on IDENTITY column "XXXX_ID" is not supported.
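For the overwrite route the question links to, a rough sketch is below, assuming column "X" is being cast to DOUBLE. Note that rewriting the table this way does not preserve the GENERATED ALWAYS AS IDENTITY definition on "ID"; it would have to be redefined if it must be kept:

```python
from pyspark.sql.functions import col

# Rewrite table A with X cast to a new type (DOUBLE is an assumed target type).
# Caveat: the identity property on ID is lost by a schema overwrite.
(spark.read.table("A")
    .withColumn("X", col("X").cast("double"))
    .write
    .format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")
    .saveAsTable("A"))
```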

1 More Replies
MBV3
by New Contributor III
  • 9513 Views
  • 6 replies
  • 7 kudos

Resolved! External table from parquet partition

Hi, I have data in Parquet format in GCS buckets, partitioned by name, e.g. gs://mybucket/name=ABCD/. I am trying to create a table in Databricks as follows: DROP TABLE IF EXISTS name_test; CREATE TABLE name_test USING parquet LOCATION "gs://mybucket/name=*/...

Latest Reply
Pat
Honored Contributor III
  • 7 kudos

Hi @M Baig, the error doesn't tell me much, but you could try:
CREATE TABLE name_test USING parquet PARTITIONED BY (name STRING) LOCATION "gs://mybucket/";
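Building on the reply above, a rough end-to-end sketch. The data columns (id, value) are hypothetical since the real schema isn't shown in the thread, and existing name=.../ directories still have to be registered after the table is created:

```python
spark.sql("DROP TABLE IF EXISTS name_test")

spark.sql("""
    CREATE TABLE name_test (id STRING, value DOUBLE, name STRING)  -- hypothetical schema
    USING parquet
    PARTITIONED BY (name)
    LOCATION 'gs://mybucket/'
""")

# Discover the existing name=ABCD/... partition directories.
spark.sql("MSCK REPAIR TABLE name_test")
```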

5 More Replies
AkifCakir
by New Contributor II
  • 18090 Views
  • 4 replies
  • 3 kudos

Resolved! Why does Spark save mode "overwrite" always drop the table even though "truncate" is true?

Hi Dear Team, I am trying to load data from Databricks into Exasol DB. I am using the following code with Spark version 3.0.1: dfw.write \ .format("jdbc") \ .option("driver", exa_driver) \ .option("url", exa_url) \ .option("db...

Latest Reply
Gembo
New Contributor II
  • 3 kudos

@AkifCakir, were you able to find a way to truncate without dropping the table using the .write function? I am facing the same issue as well.
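For anyone landing here, a rough sketch of the write pattern being discussed. Whether "truncate" is actually honored depends on Spark's JDBC dialect for the target database; if the dialect cannot guarantee a safe truncate, Spark falls back to drop-and-recreate. dfw, exa_driver, and exa_url are from the original post; the table name is a placeholder:

```python
(dfw.write
    .format("jdbc")
    .option("driver", exa_driver)
    .option("url", exa_url)
    .option("dbtable", "MY_SCHEMA.MY_TABLE")  # placeholder target table
    .option("truncate", "true")               # ask Spark to TRUNCATE instead of DROP/CREATE
    .mode("overwrite")
    .save())
```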

3 More Replies
Graham
by New Contributor III
  • 5517 Views
  • 5 replies
  • 2 kudos

"MERGE" always slower than "CREATE OR REPLACE"

Overview: To update our Data Warehouse tables, we have tried two methods: "CREATE OR REPLACE" and "MERGE". With every query we've tried, "MERGE" is slower. My question is this: has anyone successfully gotten a "MERGE" to perform faster than a "CREATE OR...

Latest Reply
Manisha_Jena
New Contributor III
  • 2 kudos

Hi @Graham, can you please try Low Shuffle Merge (LSM) and see if it helps? LSM is a new MERGE algorithm that aims to maintain the existing data organization (including z-order clustering) for unmodified data, while simultaneously improving performan...
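A minimal sketch of trying Low Shuffle Merge, assuming placeholder table names. On recent Databricks Runtime versions LSM is already on by default, so the config line only matters on older runtimes:

```python
# Enable Low Shuffle Merge explicitly (default on newer DBR versions).
spark.conf.set("spark.databricks.delta.merge.enableLowShuffle", "true")

spark.sql("""
    MERGE INTO warehouse.dim_customer AS t        -- placeholder target
    USING staging.dim_customer_updates AS s       -- placeholder source
    ON t.customer_id = s.customer_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```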

4 More Replies
my_community2
by New Contributor III
  • 10410 Views
  • 8 replies
  • 6 kudos

Resolved! dropping a managed table does not remove the underlying files

The documentation states that DROP TABLE "Deletes the table and removes the directory associated with the table from the file system if the table is not an EXTERNAL table. An exception is thrown if the table does not exist." In case of an external table...

Latest Reply
MajdSAAD_7953
New Contributor II
  • 6 kudos

Hi, is there a way to force-delete files after dropping the table, rather than waiting 30 days to see the size in S3 decrease? The tables I dropped relate to dev and staging; I don't want to keep their files for 30 days.
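For hive_metastore tables where you control the underlying storage, a rough sketch is to capture the location before dropping and then remove the files yourself. Unity Catalog managed tables handle cleanup on their own schedule, and deleting paths in shared storage is irreversible, so treat this purely as an illustration; the table name is a placeholder:

```python
# Grab the physical location before the metadata disappears.
location = (spark.sql("DESCRIBE DETAIL dev_schema.my_old_table")  # placeholder table
                 .collect()[0]["location"])

spark.sql("DROP TABLE dev_schema.my_old_table")

# Remove the data files immediately instead of waiting for retention cleanup.
dbutils.fs.rm(location, True)   # verify the path first; this cannot be undone
```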

7 More Replies
Juha
by New Contributor II
  • 1790 Views
  • 3 replies
  • 2 kudos
Latest Reply
lawrence009
Contributor
  • 2 kudos

Have you figured out what the problem was? Could the issue be permission related?

2 More Replies
HariharaSam
by Contributor
  • 74024 Views
  • 6 replies
  • 3 kudos

Resolved! Alter Delta table column datatype

Hi, I have a Delta table that contains data, and I need to alter the data type of a particular column. For example: consider table A with a column Amount of data type DECIMAL(9,4). I need to alter the Amount column data type from...

Latest Reply
saipujari_spark
Valued Contributor
  • 3 kudos

Hi @HariharaSam, the following page documents how to alter a Delta table schema: https://docs.databricks.com/delta/update-schema.html
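Per that page, Delta does not change a column's decimal precision in place; the usual route is to rewrite the table with the new type. A rough sketch, writing to a new table name so the original isn't read and replaced in the same statement; DECIMAL(18,4) and A_new are assumptions, not from the thread:

```python
# Rewrite with Amount cast to the wider type; the remaining columns carry over unchanged.
spark.sql("""
    CREATE OR REPLACE TABLE A_new AS
    SELECT CAST(Amount AS DECIMAL(18,4)) AS Amount, * EXCEPT (Amount)
    FROM A
""")
# After validating A_new, swap it in for A (e.g. rename it or run a final CTAS back to A).
```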

5 More Replies
Eelke
by New Contributor II
  • 6234 Views
  • 3 replies
  • 0 kudos

I want to perform interpolation on a streaming table in delta live tables.

I have the following code:

%pip install dbl-tempo
from tempo import TSDF
from pyspark.sql.functions import *

# interpolate target_cols columns linearly for a TSDF dataframe
def interpolate_tsdf(tsdf_data, target_c...

Latest Reply
Eelke
New Contributor II
  • 0 kudos

The issue was not resolved because we were trying to use a streaming table within TSDF which does not work.
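A rough sketch of the batch-side workaround implied here: interpolate against a static read of the table rather than a streaming one. Column names (event_ts, id, value) and the table name are assumptions, and the interpolate arguments follow the dbl-tempo documentation, so they may need adjusting for the installed version:

```python
# %pip install dbl-tempo   (run in its own notebook cell first)
from tempo import TSDF

# Static (batch) read instead of a streaming table, which TSDF does not support.
df = spark.read.table("my_schema.sensor_readings")   # hypothetical table

tsdf = TSDF(df, ts_col="event_ts", partition_cols=["id"])

interpolated_df = tsdf.interpolate(
    freq="1 minute",        # resample grid
    func="mean",            # aggregation within each interval
    target_cols=["value"],  # columns to fill
    method="linear",        # linear interpolation between known points
).df
```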

2 More Replies
HariharaSam
by Contributor
  • 20819 Views
  • 10 replies
  • 4 kudos

Resolved! How to get the number of rows inserted after performing an INSERT operation into a table

Consider we have two tables, A and B.

qry = """INSERT INTO Table A
Select * from Table B where Id is null"""
spark.sql(qry)

I need to get the number of records inserted after running this in Databricks.

Latest Reply
GRCL
New Contributor III
  • 4 kudos

Almost the same advice as Hubert: I use the history of the Delta table: df_history.select(F.col('operationMetrics')).collect()[0].operationMetrics['numOutputRows']. You can also find other 'operationMetrics' values, like 'numTargetRowsDeleted'.
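A slightly fuller sketch of the same idea using the DeltaTable API; the table name A comes from the question, and the metric applies to the most recent operation on the table:

```python
from delta.tables import DeltaTable

# Read the most recent history entry for table A and pull the row-count metric.
last_op = DeltaTable.forName(spark, "A").history(1).collect()[0]
num_inserted = last_op["operationMetrics"].get("numOutputRows")
print(last_op["operation"], num_inserted)
```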

9 More Replies
tototox
by New Contributor III
  • 11492 Views
  • 3 replies
  • 2 kudos

how to check table size by partition?

I want to check the size of a Delta table by partition. As you can see, only the size of the whole table can be checked, not the size of each partition.

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@jin park: You can use the Databricks Delta Lake SHOW TABLE EXTENDED command to get the size of each partition of the table. Here's an example: %sql SHOW TABLE EXTENDED LIKE '<table_name>' PARTITION (<partition_column> = '<partition_value>') SELECT...
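Another rough way to get per-partition sizes, for a Delta table laid out with Hive-style partition directories, is to sum the file sizes under each partition folder. The path is a placeholder, and the totals include files not yet removed by VACUUM:

```python
table_path = "dbfs:/mnt/datalake/my_table"   # placeholder table location

def dir_size(path):
    """Recursively sum file sizes under a directory."""
    total = 0
    for f in dbutils.fs.ls(path):
        total += dir_size(f.path) if f.isDir() else f.size
    return total

# Partition directories look like <column>=<value>/ ; this skips _delta_log and loose files.
for entry in dbutils.fs.ls(table_path):
    if entry.isDir() and "=" in entry.name:
        print(entry.name, dir_size(entry.path), "bytes")
```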

2 More Replies