Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

DataBricks_Use1
by New Contributor
  • 1812 Views
  • 2 replies
  • 0 kudos

DLT live Table-Incremental Refresh

Hi All, in our ETL framework we have four layers: Raw, Foundation, Trusted & Unified. In Raw we are copying the file in JSON format from a source, using an ADF pipeline. In the next layer (i.e. Foundation) we are flattening the JSON files and converting t...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @DataBricks_User9, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best an...

1 More Replies
Anonymous
by Not applicable
  • 3424 Views
  • 1 reply
  • 0 kudos

I am getting an exception "RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive."

I have a parquet DataFrame df. I first add a column using df.withColumn("version", lit(currentTimestamp)) and append it to a table db.tbl with format parquet, partitioned by the "version" column. I then ran MSCK REPAIR TABLE db.tbl. I have then create...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@vikashk84 The exception "RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive" typically occurs when there is an issue with Hive metadata related to partitioning in Databricks. Here are a few steps you ...

oleole
by Contributor
  • 11984 Views
  • 1 reply
  • 1 kudos

Resolved! MERGE to update a column of a table using Spark SQL

Coming from an MS SQL background, I'm trying to write a query in Spark SQL that simply updates a column value of table A (source table) by INNER JOINing a new table B with a filter. The MS SQL query looks like this: UPDATE T SET T.OfferAmount = OSE.EndpointEve...

Latest Reply
oleole
Contributor
  • 1 kudos

Posting the answer to my question:

MERGE INTO TempOffer VIEW
USING OfferSeq OSE
ON VIEW.OfferId = OSE.OfferID AND OSE.OfferId = 1
WHEN MATCHED THEN UPDATE SET VIEW.OfferAmount = OSE.EndpointEventAmountValue;

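The MERGE above updates matching target rows in place from a source table. As a plain-Python sketch of those semantics (illustrative only — table and column names follow the poster's example, this is not the Spark API):

```python
# Toy model of MERGE ... WHEN MATCHED THEN UPDATE: for each target row whose
# key matches a source row (and passes the extra ON filter), copy the value.
def merge_update(target, source, offer_id):
    """Update OfferAmount on target rows whose OfferId matches a source row."""
    amounts = {row["OfferId"]: row["EndpointEventAmountValue"] for row in source}
    for row in target:
        if row["OfferId"] == offer_id and row["OfferId"] in amounts:
            row["OfferAmount"] = amounts[row["OfferId"]]
    return target

temp_offer = [{"OfferId": 1, "OfferAmount": 0}, {"OfferId": 2, "OfferAmount": 5}]
offer_seq = [{"OfferId": 1, "EndpointEventAmountValue": 99}]
merge_update(temp_offer, offer_seq, offer_id=1)
```

Only the row matching both the join key and the `OfferId = 1` filter is updated; other rows are left untouched, which is exactly why MERGE is the Spark SQL stand-in for an MS SQL UPDATE-with-JOIN.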
andrew0117
by Contributor
  • 2230 Views
  • 3 replies
  • 2 kudos

Resolved! Will a table backed by a SQL server database table automatically get updated if the base table in SQL server database is updated?

If I create a table using the code below:

CREATE TABLE IF NOT EXISTS jdbcTable
USING org.apache.spark.sql.jdbc
OPTIONS (
  url "sql_server_url",
  dbtable "sqlserverTable",
  user "username",
  password "password"
)

will jdbcTable always be automatically sync...

Latest Reply
pvignesh92
Honored Contributor
  • 2 kudos

Hi @andrew li, there is a feature introduced in DBR 11 where you can directly ingest data to a table from a selected list of sources. As you are creating a table, I believe this command will create a managed table by loading the data from the...

2 More Replies
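The distinction the reply is drawing can be illustrated without Spark: an external JDBC-backed table re-reads the live source on each query, while a managed table created by loading data is a one-time snapshot. A plain-Python sketch (the helpers here are made up for illustration):

```python
# "External" access re-reads the source every time; a "managed" copy is
# materialized once at CREATE time and does not see later source updates.
source_db = {"rows": [1, 2, 3]}

def query_external():
    # Re-reads the live source each call, so it always reflects updates.
    return list(source_db["rows"])

snapshot = list(source_db["rows"])   # managed copy made at CREATE time

source_db["rows"].append(4)          # base table in SQL Server changes

live = query_external()              # sees the new row
stale = snapshot                     # frozen at creation
```

Whether the poster's jdbcTable stays in sync therefore hinges on which of these two behaviors the CREATE statement produces on their runtime.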
Kayla
by Valued Contributor II
  • 3310 Views
  • 2 replies
  • 0 kudos

BigQuery - Delete or update from Databricks?

I'm trying to sync a Delta table in Databricks to a BigQuery table. For the most part, appending is sufficient, but occasionally we need to overwrite rows - which we've only been able to do by overwriting the entire table. Is there any way to do upda...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Kayla Pomakoy, hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Than...

1 More Replies
Mado
by Valued Contributor II
  • 4078 Views
  • 2 replies
  • 0 kudos

Overwriting the existing table in Databricks; Mechanism and History?

Hi, assume that I have a Delta table stored on an Azure storage account. When new records arrive, I repeat the transformation and overwrite the existing table. (DF.write .format("delta") .mode("overwrite") .option("...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

The overwrite will add new files and keep the old ones; a log keeps track of which data is current and which is old. If the overwrite fails, you will get an error message in the Spark program, and the data to be overwritten will still be the cur...

1 More Replies
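The log-based overwrite described in the reply can be sketched in plain Python (a simplified model of Delta's transaction log, not the real implementation): every overwrite appends a new snapshot, the log points at the current one, and old snapshots stay readable.

```python
# Minimal model: commits are append-only, "current" is just the latest entry
# in the log, and earlier versions remain available for time travel.
class ToyDeltaTable:
    def __init__(self):
        self.versions = []                 # every committed snapshot is kept

    def overwrite(self, rows):
        self.versions.append(list(rows))   # commit = append to the log

    def current(self):
        return self.versions[-1]

    def time_travel(self, version):
        return self.versions[version]

t = ToyDeltaTable()
t.overwrite([1, 2, 3])
t.overwrite([4, 5])
```

Because a failed commit simply never lands in the log, readers keep seeing the previous version — the safety property the reply describes.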
tw1
by New Contributor III
  • 10202 Views
  • 9 replies
  • 3 kudos

Resolved! Can't write / overwrite delta table with error: oxxxx.saveAsTable. (Driver Error: OutOfMemory)

Current cluster config: Standard_DS3_v2 (14 GB, 4 cores), 2-6 workers; Standard_DS3_v2 (14 GB, 4 cores) for the driver; Runtime: 10.4x-scala2.12. We want to overwrite a temporary Delta table with new records. The records will be loaded from another Delta table and tran...

Latest Reply
tw1
New Contributor III
  • 3 kudos

Hi, thank you for your help! We tested the configuration settings and it runs without any errors. Could you give us some more information on where we can find documentation about such settings? We searched for hours to fix our problem. So we contacted th...

8 More Replies
Bie1234
by New Contributor III
  • 4617 Views
  • 3 replies
  • 4 kudos

Resolved! How to delete records that column have same value in another table?

DELETE FROM DWH.SALES_FACT
WHERE SALES_DATE IN (SELECT SALES_DATE FROM STG.SALES_FACT_SRC)
  AND STORE_ID IN (SELECT STORE_ID FROM STG.SALES_FACT_SRC)

Output: Error in SQL statement: DeltaAnalysisException: Nested subquery is not supported in the...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @pansiri panaudom, hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. T...

2 More Replies
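The DeltaAnalysisException above comes from nesting subqueries inside a DELETE; a common workaround is to express the delete as a join-style match instead (e.g. MERGE ... WHEN MATCHED THEN DELETE). A plain-Python sketch of the intended semantics, matching on the (date, store) pair — note that the original two independent IN filters would actually delete a broader set than pair-matching does:

```python
# Delete fact rows whose (SALES_DATE, STORE_ID) pair appears in the source -
# the join-based equivalent of MERGE ... WHEN MATCHED THEN DELETE.
def delete_matching(fact, src):
    keys = {(r["SALES_DATE"], r["STORE_ID"]) for r in src}
    return [r for r in fact if (r["SALES_DATE"], r["STORE_ID"]) not in keys]

fact = [
    {"SALES_DATE": "2023-01-01", "STORE_ID": 1, "amt": 10},
    {"SALES_DATE": "2023-01-02", "STORE_ID": 2, "amt": 20},
]
src = [{"SALES_DATE": "2023-01-01", "STORE_ID": 1}]
remaining = delete_matching(fact, src)
```

Only the row whose full key pair matches is removed; rows sharing just the date or just the store survive.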
JRT5933
by New Contributor III
  • 3456 Views
  • 4 replies
  • 7 kudos

Resolved! GOLD table slowed down at MERGE INTO

Howdy - I recently took a table FACT_TENDER and made it into a medallion-style table to test performance, since I suspected medallion would be quicker. Key differences: both tables use bronze data; the original has all logic in one long notebook; MERGE INTO t...

Latest Reply
JRT5933
New Contributor III
  • 7 kudos

I ended up instituting PARTITIONING and PRUNING methods to boost performance, which has succeeded.

3 More Replies
zeta_load
by New Contributor II
  • 1526 Views
  • 2 replies
  • 0 kudos

Resolved! When does delta lake actually compute a table?

Maybe I'm completely wrong, but from my understanding delta lake only calculates a table at certain points, for instance when you display your data. Before that point, operations are only written to the log file and are not executed (meaning no chang...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Lukas Goldschmied, hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you....

1 More Replies
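The questioner's intuition — transformations are only recorded until an action forces them to run — is Spark's lazy-evaluation model. A plain-Python sketch of that model (not the Spark API; just the record-then-execute pattern):

```python
# Transformations are appended to a plan; nothing runs until collect() - the
# stand-in for an action such as display(), collect(), or a write.
log = []

def make_lazy(data):
    steps = []
    def transform(fn):
        steps.append(fn)          # only recorded, nothing computed yet
    def collect():                # the "action" that triggers execution
        out = data
        for fn in steps:
            log.append("ran step")
            out = fn(out)
        return out
    return transform, collect

transform, collect = make_lazy([1, 2, 3])
transform(lambda xs: [x * 2 for x in xs])
before = len(log)                 # nothing has executed yet
result = collect()                # now the recorded step actually runs
```

Until `collect()` is called, the work log is empty — which mirrors why "no changes are made to the data" before an action.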
nagini_sitarama
by New Contributor III
  • 2439 Views
  • 3 replies
  • 2 kudos

Error while optimizing the table . Failure of InSet.sql for UTF8String collection

Count of the table: 1125089 for October data, so I am optimizing the table:

OPTIMIZE table WHERE batchday >= "2022-10-01" AND batchday <= "2022-10-31"

I am getting an error like: GC overhead limit exceeded at org.apache.spark.unsafe.types.UTF8St...

Latest Reply
Priyanka_Biswas
Databricks Employee
  • 2 kudos

Hi @Nagini Sitaraman, to understand the issue better I would like to get some more information. Does the error occur on the driver side or the executor side? Can you please share the full error stack trace? You may need to check the Spark UI to find wher...

2 More Replies
pasiasty2077
by New Contributor
  • 7862 Views
  • 1 reply
  • 1 kudos

Partition filter is skipped when table is used in where condition, why?

Hi, maybe someone can help me. I want to run a very narrow query:

SELECT * FROM my_table WHERE snapshot_date IN ('2023-01-06', '2023-01-07')

-- part of the physical plan:
-- Location: PreparedDeltaFileIndex [dbfs:/...]
-- PartitionFilters: [cast(snaps...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

No hints on partition pruning AFAIK. The reason the partitions were not pruned is that the second query generates a completely different plan. To be able to filter the partitions, a join first has to happen, and in this case that means the table has...

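The pruning point above can be shown with a toy model (partition values and row contents are made up): a literal IN-filter lets the planner pick partitions before reading anything, whereas a filter that depends on a join result cannot be pushed down, so every partition must be scanned first.

```python
# Partition map: partition value -> rows stored in that partition.
partitions = {
    "2023-01-06": [1, 2],
    "2023-01-07": [3],
    "2023-01-08": [4, 5, 6],
}

def scan_with_literal_filter(wanted):
    # Pruned scan: only the partitions named by the literal filter are touched.
    touched = [d for d in partitions if d in wanted]
    rows = [r for d in touched for r in partitions[d]]
    return touched, rows

touched, rows = scan_with_literal_filter({"2023-01-06", "2023-01-07"})
```

With the literal filter, the third partition is never read; replace `wanted` with values that only exist after a join, and this up-front selection is no longer possible.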
lmcglone
by New Contributor II
  • 6420 Views
  • 2 replies
  • 3 kudos

Comparing 2 dataframes and create columns from values within a dataframe

Hi, I have a dataframe that has name and company:

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()
columns = ["company","name"]
data = [("company1", "Jon"), ("company2", "Steve"), ("company1", "...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

You need to join and pivot:

df.join(df2, on=[df.company == df2.job_company])
  .groupBy("company", "name")
  .pivot("job_company")
  .count()

1 More Replies
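The join-and-pivot answer boils down to counting, per (company, name), how many rows of the second table share that company. A plain-Python version of that computation (sample data follows the post where visible; "Ann" and the job values are made up, since the post is truncated):

```python
from collections import Counter

people = [("company1", "Jon"), ("company2", "Steve"), ("company1", "Ann")]
jobs = [("company1", "eng"), ("company1", "sales"), ("company2", "eng")]

# Join on company, then pivot-count: each person gets the number of job rows
# for their company - the same result the groupBy/pivot/count chain produces.
job_counts = Counter(c for c, _ in jobs)
pivot = {(c, n): job_counts[c] for c, n in people}
```

Every "company1" person sees a count of 2 and every "company2" person a count of 1, matching the total column of the pivoted result.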
Anatoly
by New Contributor III
  • 6543 Views
  • 5 replies
  • 4 kudos

"Detected schema change" error while reading from delta table in streaming after applying "ALTER COLUMN DROP NOT NULL" to more than one columns.

Hi! I have a Delta table and a process reading a stream from this table. I need to drop the NOT NULL constraint from some of the columns of this table. The first drop command does not affect the reading stream, but the second command results in erro...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @Anatoly Tikhonov, hope everything is going great. Does @Kaniz Fatma's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!

4 More Replies
Jennifer_Lu
by New Contributor III
  • 1390 Views
  • 1 reply
  • 3 kudos

Why does DLT CDC sometimes manifest the results table as a table and other times as a view?

I have a simple DLT pipeline that reads from an existing table, does some transformations, saves to a view, and then uses dlt.apply_changes() to insert the view into a results table. My question is: why is my results table a view and not a table like I ...

Latest Reply
Jfoxyyc
Valued Contributor
  • 3 kudos

I find most of my apply_changes tables are being created as materialized views as well. They do recalculate at runtime, so they're up to date and behave a lot like a table, but they aren't tables in the same sense.
