Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

param3sh
by New Contributor
  • 2675 Views
  • 3 replies
  • 0 kudos

Performance b/w Managed Table and Un-Managed table

I am using Databricks in Azure. I want to mount ADLS Gen2 on Databricks and create unmanaged (external) tables on the mount point. But before that I want to know which will give the best performance: a Managed table (stores data in DBFS root) or Un-ma...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Paramesh Malla Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedba...

  • 0 kudos
2 More Replies
Upendra_Kumar
by New Contributor
  • 2211 Views
  • 3 replies
  • 0 kudos

Not able to perform update in delta table in databricks using 3 tables

Hi, I am able to perform a merge from 2 tables, but I have a requirement to update a table based on 3 tables, like the following query:

update a set a.name = b.name
from table1 a
inner join table2 b on a.id = b.id
inner join table3 c on a.id = c.id

Thanks in advance.

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @upendra kumar sharma Help us build a vibrant and resourceful community by recognizing and highlighting insightful contributions. Mark the best answers and show your appreciation! Thanks and Regards

  • 0 kudos
2 More Replies
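
One way to approach the three-table update asked about above is to fold table2 and table3 into a single joined source and update table1 from that. The sketch below is an assumption, not an answer taken from the thread: it uses Python's built-in sqlite3 only so the logic can run locally, and the rows are made up. On a Delta table the same idea would typically be written as a MERGE INTO table1 USING a subquery that joins table2 and table3.

```python
import sqlite3

# In-memory database with hypothetical rows; table and column names follow the post.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table3 (id INTEGER PRIMARY KEY);
    INSERT INTO table1 VALUES (1, 'old-1'), (2, 'old-2'), (3, 'old-3');
    INSERT INTO table2 VALUES (1, 'new-1'), (2, 'new-2');
    INSERT INTO table3 VALUES (1), (3);
""")

# table2 and table3 are joined first, so only ids present in BOTH feed the update.
conn.execute("""
    UPDATE table1
    SET name = (
        SELECT b.name
        FROM table2 AS b
        JOIN table3 AS c ON b.id = c.id
        WHERE b.id = table1.id
    )
    WHERE id IN (
        SELECT b.id FROM table2 AS b JOIN table3 AS c ON b.id = c.id
    )
""")

rows = conn.execute("SELECT id, name FROM table1 ORDER BY id").fetchall()
print(rows)
```

Only id 1 exists in all three tables, so it is the only row whose name changes; ids 2 and 3 keep their old values.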
DataBricks_Use1
by New Contributor
  • 2570 Views
  • 2 replies
  • 0 kudos

DLT live Table-Incremental Refresh

Hi All, In our ETL framework, we have four layers: Raw, Foundation, Trusted & Unified. In Raw we are copying the file in JSON format from a source, using an ADF pipeline. In the next layer (i.e. Foundation) we are flattening the JSON files and converting t...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @DataBricks_User9 c Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best an...

  • 0 kudos
1 More Replies
Anonymous
by Not applicable
  • 4641 Views
  • 1 replies
  • 0 kudos

I am getting an exception "RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive."

I have a parquet dataframe df. I first add a column using df.withColumn("version", lit(currentTimestamp)) and append it to a table db.tbl with format parquet, partitioned by the "version" column. I then ran MSCK REPAIR TABLE db.tbl. I have then create...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@vikashk84 The exception "RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive" typically occurs when there is an issue with Hive metadata related to partitioning in Databricks. Here are a few steps you ...

  • 0 kudos
oleole
by Contributor
  • 15317 Views
  • 1 replies
  • 1 kudos

Resolved! MERGE to update a column of a table using Spark SQL

Coming from an MS SQL background, I'm trying to write a query in Spark SQL that simply updates a column value of table A (source table) by INNER JOINing a new table B with a filter. The MS SQL query looks like this: UPDATE T SET T.OfferAmount = OSE.EndpointEve...

Latest Reply
oleole
Contributor
  • 1 kudos

Posting answer to my question:

MERGE INTO TempOffer VIEW
USING OfferSeq OSE
ON VIEW.OfferId = OSE.OfferID AND OSE.OfferId = 1
WHEN MATCHED THEN UPDATE SET VIEW.OfferAmount = OSE.EndpointEventAmountValue;

  • 1 kudos
andrew0117
by Contributor
  • 3093 Views
  • 3 replies
  • 2 kudos

Resolved! Will a table backed by a SQL server database table automatically get updated if the base table in SQL server database is updated?

If I create a table using the code below: CREATE TABLE IF NOT EXISTS jdbcTable USING org.apache.spark.sql.jdbc OPTIONS (url "sql_server_url", dbtable "sqlserverTable", user "username", password "password") will jdbcTable always be automatically sync...

Latest Reply
pvignesh92
Honored Contributor
  • 2 kudos

Hi @andrew li​ There is a feature introduced from DBR11 where you can directly ingest the data to the table from a selected list of sources. As you are creating a table, I believe this command will create a managed table by loading the data from the...

  • 2 kudos
2 More Replies
Kayla
by Valued Contributor II
  • 4378 Views
  • 2 replies
  • 0 kudos

BigQuery - Delete or update from Databricks?

I'm trying to sync a Delta table in Databricks to a BigQuery table. For the most part, appending is sufficient, but occasionally we need to overwrite rows, which we've only been able to do by overwriting the entire table. Is there any way to do upda...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Kayla Pomakoy Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Than...

  • 0 kudos
1 More Replies
Mado
by Valued Contributor II
  • 5337 Views
  • 2 replies
  • 0 kudos

Overwriting the existing table in Databricks; Mechanism and History?

Hi, Assume that I have a Delta table stored on an Azure storage account. When new records arrive, I repeat the transformation and overwrite the existing table. (DF.write .format("delta") .mode("overwrite") .option("...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

The overwrite will add new files, keep the old ones, and keep track in a log of which data is current and which is old. If the overwrite fails, you will get an error message in the Spark program, and the data to be overwritten will still be the cur...

  • 0 kudos
1 More Replies
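
The mechanism described in the reply above (new files are added, old files stay on disk, and a log decides which files are current) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Delta Lake's implementation: `ToyTable`, its methods, and the file naming are all hypothetical, and real Delta tables record commits as files under `_delta_log` rather than in an in-memory list.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy model of a log-based table: data files plus an ordered commit log."""
    files: dict = field(default_factory=dict)  # file name -> rows (never deleted here)
    log: list = field(default_factory=list)    # one entry per committed version

    def current_files(self, version=None):
        """Replay the log up to `version` (latest if None) to find the live files."""
        live = set()
        commits = self.log if version is None else self.log[: version + 1]
        for commit in commits:
            live -= set(commit["remove"])
            live |= set(commit["add"])
        return sorted(live)

    def overwrite(self, rows, fail=False):
        """Write a new file, then commit 'remove old, add new' as one log entry."""
        name = f"part-{len(self.files):04d}"
        self.files[name] = list(rows)  # the new data file is written first
        if fail:
            raise RuntimeError("job failed before commit")
        self.log.append({"add": [name], "remove": self.current_files()})

    def read(self, version=None):
        """Return the rows of every file that is live at the given version."""
        return [row for f in self.current_files(version) for row in self.files[f]]


table = ToyTable()
table.overwrite([1, 2])       # version 0
table.overwrite([1, 2, 3])    # version 1 replaces version 0 logically
try:
    table.overwrite([99], fail=True)  # simulated failure: file written, no commit
except RuntimeError:
    pass

latest = table.read()
previous = table.read(version=0)
print(latest, previous)
```

In this toy, the failed overwrite leaves an orphaned file behind but never changes what readers see, and the older version stays readable because its file was only removed logically, in the log.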
tw1
by New Contributor III
  • 13383 Views
  • 9 replies
  • 3 kudos

Resolved! Can't write / overwrite delta table with error: oxxxx.saveAsTable. (Driver Error: OutOfMemory)

Current cluster config: Standard_DS3_v2 (14 GB, 4 cores), 2-6 workers; Standard_DS3_v2 (14 GB, 4 cores) for the driver; Runtime: 10.4x-scala2.12. We want to overwrite a temporary Delta table with new records. The records will be loaded from another Delta table and tran...

Latest Reply
tw1
New Contributor III
  • 3 kudos

Hi, thank you for your help! We tested the configuration settings and it runs without any errors. Could you give us some more information on where we can find documentation about such settings? We searched for hours to fix our problem. So we contacted th...

  • 3 kudos
8 More Replies
Bie1234
by New Contributor III
  • 5825 Views
  • 3 replies
  • 4 kudos

Resolved! How to delete records that column have same value in another table?

delete from DWH.SALES_FACT where SALES_DATE in (select SALES_DATE from STG.SALES_FACT_SRC) AND STORE_ID in (select STORE_ID from STG.SALES_FACT_SRC)
Output: Error in SQL statement: DeltaAnalysisException: Nested subquery is not supported in the...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @pansiri panaudom Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. T...

  • 4 kudos
2 More Replies
JRT5933
by New Contributor III
  • 4485 Views
  • 4 replies
  • 7 kudos

Resolved! GOLD table slowed down at MERGE INTO

Howdy - I recently took a table FACT_TENDER and made it into a medallion-style table to test performance, since I suspected medallion would be quicker. Key differences: both tables use bronze data; the original has all logic in one long notebook; MERGE INTO t...

Latest Reply
JRT5933
New Contributor III
  • 7 kudos

I ended up instituting true and tried PARTITIONING and PRUNING methods to boost performance, which has succeeded.

  • 7 kudos
3 More Replies
zeta_load
by New Contributor II
  • 2146 Views
  • 2 replies
  • 0 kudos

Resolved! When does delta lake actually compute a table?

Maybe I'm completely wrong, but from my understanding delta lake only calculates a table at certain points, for instance when you display your data. Before that point, operations are only written to the log file and are not executed (meaning no chang...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Lukas Goldschmied​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you....

  • 0 kudos
1 More Replies
nagini_sitarama
by New Contributor III
  • 3767 Views
  • 3 replies
  • 2 kudos

Error while optimizing the table . Failure of InSet.sql for UTF8String collection

Count of the table: 1125089 for October data, so I am optimizing the table: optimize table where batchday >= "2022-10-01" and batchday <= "2022-10-31". I am getting an error like: GC overhead limit exceeded at org.apache.spark.unsafe.types.UTF8St...

Latest Reply
Priyanka_Biswas
Databricks Employee
  • 2 kudos

Hi @Nagini Sitaraman​ To understand the issue better I would like to get some more information. Does the error occur at the driver side or executor side? Can you please share the full error stack trace? You may need to check the spark UI to find wher...

  • 2 kudos
2 More Replies
pasiasty2077
by New Contributor
  • 8449 Views
  • 1 replies
  • 1 kudos

Partition filter is skipped when table is used in where condition, why?

Hi, maybe someone can help me. I want to run a very narrow query:
SELECT * FROM my_table WHERE snapshot_date IN ('2023-01-06', '2023-01-07')
-- part of the physical plan:
-- Location: PreparedDeltaFileIndex [dbfs:/...]
-- PartitionFilters: [cast(snaps...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

No hints on partition pruning afaik. The reason the partitions were not pruned is that the second query generates a completely different plan. To be able to filter the partitions, a join first has to happen. And in this case it means the table has...

  • 1 kudos
lmcglone
by New Contributor II
  • 8147 Views
  • 2 replies
  • 3 kudos

Comparing 2 dataframes and create columns from values within a dataframe

Hi, I have a dataframe that has name and company:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()
columns = ["company","name"]
data = [("company1", "Jon"), ("company2", "Steve"), ("company1", "...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

You need to join and pivot: df.join(df2, on=[df.company == df2.job_company]).groupBy("company", "name").pivot("job_company").count()

  • 3 kudos
1 More Replies
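
To make the shape of that join-and-pivot result concrete, here is a plain-Python sketch of what an inner join followed by groupBy, pivot, and count computes. It is illustrative only: the rows are hypothetical because the thread's data is truncated, the `jobs` layout (a `job_company` value plus a made-up job title) is an assumption, and missing combinations are shown as `None` where Spark would show null.

```python
from collections import Counter

# Hypothetical rows; "Ann" and every entry in `jobs` are invented for illustration.
people = [("company1", "Jon"), ("company2", "Steve"), ("company1", "Ann")]
jobs = [("company1", "job1"), ("company1", "job2"), ("company2", "job3")]

# Inner join on company == job_company.
joined = [
    (company, name, job_company)
    for company, name in people
    for job_company, _job_title in jobs
    if company == job_company
]

# groupBy(company, name), pivot on job_company, count rows per cell.
counts = Counter(joined)
pivot_columns = sorted({job_company for _, _, job_company in joined})
pivoted = {
    (company, name): {col: counts.get((company, name, col)) for col in pivot_columns}
    for company, name, _ in joined
}

for key, row in pivoted.items():
    print(key, row)
```

Because the join condition forces `company` to equal `job_company`, each person only ever has a count in their own company's column.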