Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

User16826992666
by Valued Contributor
  • 2011 Views
  • 1 reply
  • 0 kudos

When using Delta Live Tables, how do I set a table to be incremental vs complete using Python?

When using SQL, I can use the CREATE LIVE TABLE command and the CREATE INCREMENTAL LIVE TABLE command to set the run type I want the table to use. But I don't seem to have that same syntax for Python. How can I set this table type while using Python?

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

The documentation at https://docs.databricks.com/data-engineering/delta-live-tables/delta-live-tables-user-guide.html#mixing-complete-tables-and-incremental-tables has an example: the first two functions load data incrementally and the last one loads...
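A minimal Python sketch of the distinction, assuming the Delta Live Tables Python API (the dlt module); the table names and source path below are hypothetical. Reading the source with spark.readStream makes the table incremental, while a batch read (dlt.read) makes it a complete table that is recomputed on every pipeline update:

import dlt

# Incremental (streaming) live table: a streaming read means new data is
# processed incrementally as it arrives. Source path is hypothetical.
@dlt.table(name="events_incremental")
def events_incremental():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/raw/events")
    )

# Complete live table: a batch read of the upstream table means the result
# is fully recomputed on every pipeline update.
@dlt.table(name="events_summary")
def events_summary():
    return dlt.read("events_incremental").groupBy("event_type").count()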

User16826992666
by Valued Contributor
  • 1544 Views
  • 1 reply
  • 0 kudos

How can I run OPTIMIZE on a table if I am streaming to it 24/7?

I have a table that I need to be continuously streaming into. I know it's best practice to run OPTIMIZE on my tables periodically. But if I never stop writing to the table, how and when can I run OPTIMIZE against it?

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

If the streaming job is making blind appends to the Delta table, then it's perfectly fine to run the OPTIMIZE query in parallel. However, if the streaming job is performing MERGE or UPDATE, then it can conflict with the OPTIMIZE operations. In such cases w...
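A minimal sketch of the append-only case, assuming OPTIMIZE is run from a separate notebook or scheduled job while the stream keeps writing; the table and column names are hypothetical:

# Run from a separate job while the append-only stream keeps writing.
spark.sql("OPTIMIZE events_bronze")

# Optionally co-locate data on a frequently filtered column.
spark.sql("OPTIMIZE events_bronze ZORDER BY (event_date)")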

brickster_2018
by Databricks Employee
  • 1591 Views
  • 1 reply
  • 0 kudos

Resolved! Unable to drop a table

I have a table for which I no longer have access to the underlying data. We do not need this dataset anymore, but I am unable to drop the table.

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

Use the below code snippet to forcefully drop the table: package org.apache.spark.sql.hive { import org.apache.spark.sql.hive.HiveUtils import org.apache.spark.SparkContext object utils { def dropTable(sc: SparkContext, dbName: String, tableName...

User16752241457
by New Contributor II
  • 1102 Views
  • 1 reply
  • 0 kudos

Overwriting Delta Table Using SQL

I have a Delta table that is updated nightly, which I drop and recreate at the start of each day. However, this isn't ideal because every time I drop the table I lose all the info in the transaction log. Is there a way that I can do the equivalent of:...

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 0 kudos

I think you are looking for the INSERT OVERWRITE command in Spark SQL. Check out the documentation here: https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-syntax-dml-insert-overwrite-table.html
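A minimal sketch of that approach with hypothetical table names; INSERT OVERWRITE replaces the table's contents in a single transaction, so the Delta history is preserved instead of being lost to a DROP/CREATE:

# Overwrite the nightly table from a staging source (names are hypothetical).
spark.sql("""
    INSERT OVERWRITE TABLE nightly_snapshot
    SELECT * FROM staging_nightly_snapshot
""")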

User16826992666
by Valued Contributor
  • 2098 Views
  • 1 reply
  • 0 kudos

Resolved! When running a Merge, if records from the table are deleted are the underlying files that contain the records deleted as well?

I know I have the option to delete rows from a Delta table when running a merge. But I'm confused about how that would actually affect the files that contain the deleted records. Are those files deleted, or are they rewritten, or what?

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

Delta implements MERGE by physically rewriting existing files. It is implemented in two steps: first, perform an inner join between the target table and source table to select all files that have matches; second, perform an outer join between the selected files in t...
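A minimal Python sketch of a merge that deletes rows, assuming the delta.tables API and a hypothetical source DataFrame updates_df; the table, key, and flag names are also hypothetical. Deleted rows disappear from the current table version, but the affected files are rewritten rather than edited in place, and the old files remain until VACUUM removes them:

from delta.tables import DeltaTable

target = DeltaTable.forName(spark, "customers")  # hypothetical target table

(target.alias("t")
    .merge(updates_df.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedDelete(condition="s.is_deleted = true")  # delete matched rows flagged in the source
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())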

jose_gonzalez
by Databricks Employee
  • 6746 Views
  • 1 reply
  • 0 kudos

Resolved! How can I read a specific Delta table part file?

Is there a way to read a specific part file of a Delta table? When I try to read the Parquet file as Parquet, I get an error in the notebook that I'm using the incorrect format, as it's part of a Delta table. I just want to read a single Parquet file, not ...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

To disable the Delta format check and read the file as Parquet, you need to set the following Spark setting to false: >> SET spark.databricks.delta.formatCheck.enabled=false OR >> spark.conf.set("spark.databricks.delta.formatCheck.enabled", "false"). It's not recommended to re...
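A minimal sketch of reading one part file with the check disabled, using hypothetical paths; as noted above, this bypasses the Delta transaction log and is not recommended for normal use:

spark.conf.set("spark.databricks.delta.formatCheck.enabled", "false")

# Read a single data file of the Delta table as plain Parquet (hypothetical path).
part_df = spark.read.parquet("/mnt/delta/events/part-00000-<uuid>.snappy.parquet")
part_df.show(5)

# Re-enable the check afterwards.
spark.conf.set("spark.databricks.delta.formatCheck.enabled", "true")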

User16826988857
by Databricks Employee
  • 2870 Views
  • 0 replies
  • 0 kudos

How to allow table deletion without requiring ownership on the table?

Problem Description: In DBR 6 (and earlier), a non-admin user can delete a table that the user doesn't own, as long as the user has ownership on the table's parent database (perhaps throu...

Anonymous
by Not applicable
  • 912 Views
  • 0 replies
  • 0 kudos

Escaped quotes mess up table records

When table content is dumped from an RDBMS (e.g., Oracle), some column values may contain escaped double quotes (\"), which may cause the values from multiple columns to be concatenated into one value and result in corrupted reco...
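A minimal sketch of one way to handle this when loading the dump with the Spark CSV reader, using a hypothetical path; the quote and escape options tell the parser that \" is an escaped quote inside a value rather than the end of the field:

df = (
    spark.read.format("csv")
    .option("header", "true")
    .option("quote", '"')
    .option("escape", "\\")   # treat \" as an escaped quote inside a value
    .load("/mnt/raw/oracle_dump.csv")
)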

Anonymous
by Not applicable
  • 1190 Views
  • 0 replies
  • 0 kudos

Newline characters mess up the table records

When creating tables from text files containing newline characters in the middle of the lines, the table records will have null column values because the newline characters in the middle of the lines break the lines into two different records and fill up ...
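A minimal sketch of one way to handle this with the Spark CSV reader, using a hypothetical path; multiLine keeps a quoted value that spans several physical lines in a single record, provided the embedded newlines sit inside quoted fields:

df = (
    spark.read.format("csv")
    .option("header", "true")
    .option("multiLine", "true")   # keep embedded newlines inside quoted values
    .load("/mnt/raw/export_with_newlines.csv")
)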

KiranRastogi
by New Contributor
  • 38406 Views
  • 2 replies
  • 2 kudos

Pandas dataframe to a table

I want to write a pandas DataFrame to a table. How can I do this? The write command is not working; please help.

Latest Reply
amy_wang
New Contributor II
  • 2 kudos

Hey Kiran, just taking a stab in the dark, but do you want to convert the pandas DataFrame to a Spark DataFrame and then write out the Spark DataFrame as a non-temporary SQL table? import pandas as pd ## Create Pandas Frame pd_df = pd.DataFrame({u'20...
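A minimal, runnable version of that idea, with hypothetical data and table name: build the pandas DataFrame, convert it with spark.createDataFrame, then save it as a managed (non-temporary) table:

import pandas as pd

# Create a small pandas DataFrame (hypothetical data).
pd_df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})

# Convert to a Spark DataFrame and write it out as a non-temporary table.
spark_df = spark.createDataFrame(pd_df)
spark_df.write.mode("overwrite").saveAsTable("my_pandas_table")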

1 More Reply
ChristianKeller
by New Contributor II
  • 15130 Views
  • 6 replies
  • 0 kudos

Two stage join fails with java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainLongDictionary

Sometimes the error is part of "org.apache.spark.SparkException: Exception thrown in awaitResult:". The error originates in the step where we extract, for the second time, the rows where the data was updated. We can count the rows, but we cannot display or w...

Latest Reply
activescott
New Contributor III
  • 0 kudos

Thanks Lleido. I eventually found that I had inadvertently changed the schema of a partitioned DataFrame by narrowing a column's type from a long to an integer. While a rather obvious cause of the problem in hindsight, it was terribly di...
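A minimal sketch of the kind of fix implied above, with hypothetical names: cast the narrowed column back to the type the existing partitioned Parquet data was written with before appending, so the reader does not hit mixed Parquet dictionary types across files:

from pyspark.sql.functions import col

# Keep the column as long, matching the files already written to the table.
fixed_df = new_df.withColumn("event_id", col("event_id").cast("long"))

fixed_df.write.mode("append").partitionBy("event_date").parquet("/mnt/tables/events")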

5 More Replies
RobertWalsh
by New Contributor II
  • 20248 Views
  • 11 replies
  • 0 kudos

Dataframe Write Append to Parquet Table - Partition Issue

Hello, I am attempting to append new JSON files into an existing Parquet table defined in Databricks. Using a dataset defined by this command (the DataFrame is initially added to a temp table): val output = sql("select headers.event_name, to_date(from_unix...

Latest Reply
anil_s_langote
New Contributor II
  • 0 kudos

We came across a similar situation. We are using Spark 1.6.1 and have a daily load process that pulls data from Oracle and writes it as Parquet files. This works fine for 18 days of data (until the 18th run); the problem comes after the 19th run, where the data frame l...
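A minimal sketch of a daily append that keeps the schema explicit, with hypothetical names, paths, and schema; pinning the schema on read helps later runs avoid drifting from the column types already written to the Parquet table:

from pyspark.sql.types import StructType, StructField, StringType, LongType, DateType

# Hypothetical schema matching the existing table's column types.
schema = StructType([
    StructField("event_name", StringType()),
    StructField("event_id", LongType()),
    StructField("event_date", DateType()),
])

daily_df = spark.read.schema(schema).json("/mnt/raw/events/2024-01-19/")

(daily_df.write
    .mode("append")
    .partitionBy("event_date")
    .parquet("/mnt/tables/events"))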

10 More Replies