Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

User16826988857
by Databricks Employee
  • 3365 Views
  • 1 reply
  • 0 kudos

How to allow Table deletion without requiring ownership on table?

Problem Description: In DBR 6 (and earlier), a non-admin user can delete a table that the user doesn't own, as long as the user has ownership on the table's parent database (perhaps throu...

Latest Reply
abueno
Contributor
  • 0 kudos

I am having the same issue but on Python 3.10.12. I need to be able to have another user have "manage" access to a table in the Unity Catalog. We both have write access to the schema.

  • 0 kudos
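
Editor's note: a minimal sketch of the grant the reply is looking for, assuming Unity Catalog on a runtime where the MANAGE privilege is available; the table and principal names are placeholders.

    # Sketch (assumption: Unity Catalog with the MANAGE privilege available).
    # Grants a non-owner the right to administer, and thus drop, the table.
    spark.sql("GRANT MANAGE ON TABLE main.sales.orders TO `user@example.com`")
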
boskicl
by New Contributor III
  • 35483 Views
  • 8 replies
  • 11 kudos

Resolved! Table write command stuck "Filtering files for query."

Hello all, background: I am having an issue today with Databricks using pyspark-sql and writing a Delta table. The dataframe is made by doing an inner join between two tables, and that is what I am trying to write to a Delta table. The table ...

Tags: filtering, job_info, spill_memory
Latest Reply
nvashisth
New Contributor III
  • 11 kudos

@timo199, @boskicl I had a similar issue where the job was getting stuck at "Filtering files for query" indefinitely. I checked the Spark logs and, based on that, figured out that we had enabled Photon acceleration on our cluster for the job, and the datatype of our columns...

  • 11 kudos
7 More Replies
HariharaSam
by Contributor
  • 33829 Views
  • 10 replies
  • 4 kudos

Resolved! To get Number of rows inserted after performing an Insert operation into a table

Consider we have two tables A & B. qry = """INSERT INTO Table A SELECT * FROM Table B WHERE Id IS NULL""" spark.sql(qry) I need to get the number of records inserted after running this in Databricks.

Latest Reply
User16653924625
Databricks Employee
  • 4 kudos

In case someone is looking for a purely SQL-based solution (add LIMIT 1 to the query if you are looking for the last op only): select t.timestamp, t.operation, t.operationMetrics.numOutputRows as numOutputRows from ( DESCRIBE HISTORY <catalog>.<schema>....

  • 4 kudos
9 More Replies
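
Editor's note: a runnable version of the reply's SQL, wrapped in PySpark; the three-level table name is a placeholder.

    # Sketch: read numOutputRows for the most recent insert from the Delta history.
    last_op = spark.sql("""
        SELECT t.timestamp,
               t.operation,
               t.operationMetrics.numOutputRows AS numOutputRows
        FROM (DESCRIBE HISTORY my_catalog.my_schema.table_a) t
        WHERE t.operation IN ('WRITE', 'INSERT')
        ORDER BY t.timestamp DESC
        LIMIT 1
    """)
    last_op.show(truncate=False)
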
tototox
by New Contributor III
  • 17063 Views
  • 4 replies
  • 2 kudos

How to check table size by partition?

I want to check the size of the Delta table by partition. As you can see, only the size of the table can be checked, but not by partition.

Latest Reply
Carsten_Herbe
New Contributor II
  • 2 kudos

The previous two answers did not work for me (DBX 15.4). I found a hacky way using the delta log: find the latest (group of) checkpoint (parquet) file(s) in the delta log and use it as a source prefix `000000000000xxxxxxx.checkpoint`: SELECT partition_column_1,...

  • 2 kudos
3 More Replies
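
Editor's note: a sketch along the lines of the reply, with a hypothetical table path and partition column; for simplicity it assumes single-part checkpoint files and that at least one checkpoint exists.

    # Sketch: sum live data-file sizes per partition from the latest Delta checkpoint.
    from pyspark.sql import functions as F

    table_path = "dbfs:/path/to/delta/table"  # placeholder
    part_col = "date"                         # placeholder: your partition column
    checkpoints = sorted(
        f.path for f in dbutils.fs.ls(f"{table_path}/_delta_log")
        if ".checkpoint" in f.path and f.path.endswith(".parquet")
    )

    (spark.read.parquet(checkpoints[-1])            # latest checkpoint
         .where("add IS NOT NULL")                  # 'add' entries are live files
         .groupBy(F.col("add.partitionValues")[part_col].alias(part_col))
         .agg(F.sum("add.size").alias("bytes"))
         .orderBy(F.desc("bytes"))
         .show(truncate=False))
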
nadia
by New Contributor II
  • 29394 Views
  • 4 replies
  • 2 kudos

Resolved! Executor heartbeat timed out

Hello, I'm trying to read a table that is located on PostgreSQL and contains 28 million rows. I get the following result: "SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in sta...

Latest Reply
SparkJun
Databricks Employee
  • 2 kudos

Please also review the Spark UI to see the failed Spark job and Spark stage. Please check on the GC time and data spill to memory and disk. See if there is any error in the failed task in the Spark stage view. This will confirm data skew or GC/memory...

  • 2 kudos
3 More Replies
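
Editor's note: one common mitigation for this kind of timeout is to split the JDBC read into parallel partitions instead of one oversized task; a sketch with placeholder connection details and a hypothetical partition column.

    # Sketch: parallel JDBC read from PostgreSQL to avoid a single huge task.
    df = (spark.read.format("jdbc")
          .option("url", "jdbc:postgresql://host:5432/mydb")  # placeholder
          .option("dbtable", "public.big_table")              # placeholder
          .option("user", "me")
          .option("password", "secret")
          .option("partitionColumn", "id")   # numeric, indexed, roughly uniform
          .option("lowerBound", "1")
          .option("upperBound", "28000000")  # ~row count from the post
          .option("numPartitions", "64")     # tune to cluster cores
          .load())
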
my_community2
by New Contributor III
  • 19981 Views
  • 9 replies
  • 6 kudos

Resolved! dropping a managed table does not remove the underlying files

The documentation states that "DROP TABLE": Deletes the table and removes the directory associated with the table from the file system if the table is not an EXTERNAL table. An exception is thrown if the table does not exist. In case of an external table...

Latest Reply
MajdSAAD_7953
New Contributor II
  • 6 kudos

Hi, is there a way to force-delete files after dropping the table, rather than waiting 30 days to see the size in S3 decrease? The tables I dropped relate to dev and staging; I don't want to keep their files for 30 days.

  • 6 kudos
8 More Replies
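
Editor's note: one hedged option for the reply's question, assuming you still have direct access to the dropped table's storage path; VACUUM with zero retention destroys time travel and is only sensible for throwaway dev/staging data.

    # Sketch: reclaim a dropped table's data files before the default retention.
    # The path is a placeholder; the retention-check override is deliberately unsafe.
    spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")
    spark.sql("VACUUM delta.`s3://my-bucket/dev/my_table` RETAIN 0 HOURS")
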
explorer
by New Contributor III
  • 6032 Views
  • 4 replies
  • 1 kudos

Resolved! Deleting records manually in a Databricks streaming table

Hi Team, let me know if there are any ways I can delete records manually from a Databricks streaming table without corrupting the table and data. Can we delete a few records (based on some condition) manually in a Databricks streaming table (having checkpoi...

Latest Reply
SparkJun
Databricks Employee
  • 1 kudos

If you use the applyChanges method in DLT for Change Data Capture (CDC), you can delete records manually without affecting the consistency of the table, as applyChanges respects manual deletions. You must configure your DLT pipeline to respect manu...

  • 1 kudos
3 More Replies
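
Editor's note: a minimal sketch of the applyChanges route the reply mentions, using the Python dlt API; the source/target names and the delete condition are placeholders.

    # Sketch: a DLT pipeline where deletes in the CDC feed are honored,
    # so removing records does not corrupt the streaming target.
    import dlt
    from pyspark.sql.functions import expr

    dlt.create_streaming_table("target_table")

    dlt.apply_changes(
        target="target_table",
        source="cdc_feed",                          # hypothetical CDC source view
        keys=["id"],
        sequence_by="event_ts",
        apply_as_deletes=expr("operation = 'DELETE'")
    )
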
Chris_Konsur
by New Contributor III
  • 23782 Views
  • 4 replies
  • 7 kudos

Resolved! Error: The associated location ... is not empty but it's not a Delta table

I am trying to create a table, but I get this error: AnalysisException: Cannot create table ('`spark_catalog`.`default`.`citation_all_tenants`'). The associated location ('dbfs:/user/hive/warehouse/citation_all_tenants') is not empty but it's not a Delta t...

Latest Reply
sachin_tirth
New Contributor II
  • 7 kudos

Hi Team, I am facing the same issue. When we try to load data to a table in a production batch, we get an error that the table is not in Delta format. There is no recent change to the table, and we are not trying any CREATE OR REPLACE TABLE. This is an existing table in pr...

  • 7 kudos
3 More Replies
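
Editor's note: when the location really does contain stale non-Delta files, a common hedged fix is to clear it and recreate the table; this is destructive, so verify the directory's contents first, and the column list below is a placeholder.

    # Sketch: remove the leftover directory that blocks CREATE TABLE, then recreate.
    dbutils.fs.rm("dbfs:/user/hive/warehouse/citation_all_tenants", recurse=True)
    spark.sql("""
        CREATE TABLE spark_catalog.default.citation_all_tenants (id BIGINT)  -- placeholder schema
        USING DELTA
    """)
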
wyzer
by Contributor II
  • 10825 Views
  • 8 replies
  • 4 kudos

Resolved! How to pass parameters in SSRS/Power BI (Report Builder)?

Hello, in SSRS/Power BI (Report Builder), how do you query a table in Databricks with parameters? Because this code doesn't work: SELECT * FROM TempBase.Customers WHERE Name = {{ @P_Name }} Thanks.

Latest Reply
Nj11
New Contributor II
  • 4 kudos

Hi, I am not able to see the data in SSRS while I am using date parameters, but with manual dates the data populates fine. The database is pointing to Databricks. I am not sure what I am missing here. Please help me with this. Thanks. I am trying with que...

  • 4 kudos
7 More Replies
Abbe
by New Contributor II
  • 3039 Views
  • 2 replies
  • 0 kudos

Update data type of a column within a table that has a GENERATED ALWAYS AS IDENTITY-column

I want to cast the data type of a column "X" in a table "A" where column "ID" is defined as GENERATED ALWAYS AS IDENTITY. Databricks refers to overwrite to achieve this: https://docs.databricks.com/delta/update-schema.html The following operation: (spar...

Latest Reply
RajuBolla
New Contributor II
  • 0 kudos

UPDATE is not working, but DELETE is, when I changed to the DEFAULT property: AnalysisException: UPDATE on IDENTITY column "XXXX_ID" is not supported.

  • 0 kudos
1 More Reply
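
Editor's note: because an IDENTITY column blocks both UPDATE and an in-place overwrite cast, a hedged workaround is to rebuild the table; the new type for X is an assumption, and note that the ID values are regenerated rather than preserved.

    # Sketch: rebuild table A with a new type for X. GENERATED ALWAYS AS IDENTITY
    # columns cannot be written explicitly, so ID is re-generated on insert.
    spark.sql("""
        CREATE OR REPLACE TABLE A_new (
            ID BIGINT GENERATED ALWAYS AS IDENTITY,
            X  DECIMAL(18,2)  -- hypothetical new type for X
        ) USING DELTA
    """)
    spark.sql("INSERT INTO A_new (X) SELECT CAST(X AS DECIMAL(18,2)) FROM A")
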
MBV3
by Contributor
  • 14673 Views
  • 5 replies
  • 7 kudos

Resolved! External table from parquet partition

Hi, I have data in Parquet format in GCS buckets partitioned by name, e.g. gs://mybucket/name=ABCD/. I am trying to create a table in Databricks as follows: DROP TABLE IF EXISTS name_test; CREATE TABLE name_test USING parquet LOCATION "gs://mybucket/name=*/...

Latest Reply
Pat
Esteemed Contributor
  • 7 kudos

Hi @M Baig, the error doesn't tell me much, but you could try: CREATE TABLE name_test USING parquet PARTITIONED BY (name STRING) LOCATION "gs://mybucket/";

  • 7 kudos
4 More Replies
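
Editor's note: a sketch combining the accepted answer with the partition-discovery step that such external tables typically need before queries return rows.

    # Sketch: create the partitioned external table, then register its partitions.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS name_test
        USING parquet
        PARTITIONED BY (name STRING)
        LOCATION 'gs://mybucket/'
    """)
    spark.sql("MSCK REPAIR TABLE name_test")  # picks up name=ABCD/... directories
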
AkifCakir
by New Contributor II
  • 26034 Views
  • 3 replies
  • 4 kudos

Resolved! Why does Spark save mode "overwrite" always drop the table although "truncate" is true?

Hi Dear Team, I am trying to import data from Databricks into Exasol DB. I am using the following code with Spark version 3.0.1: dfw.write \ .format("jdbc") \ .option("driver", exa_driver) \ .option("url", exa_url) \ .option("db...

Latest Reply
Gembo
New Contributor III
  • 4 kudos

@AkifCakir, were you able to find a way to truncate without dropping the table using the .write function? I am facing the same issue as well.

  • 4 kudos
2 More Replies
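
Editor's note: for reference, the truncate option only takes effect with overwrite mode and when the JDBC dialect supports it; a hedged sketch of the intended write, reusing the post's driver/url variables with a placeholder target table.

    # Sketch: overwrite via JDBC asking Spark to TRUNCATE instead of DROP+CREATE.
    # Spark silently falls back to dropping the table if truncate is unsupported
    # or the schema changes.
    (dfw.write
        .format("jdbc")
        .option("driver", exa_driver)             # from the post
        .option("url", exa_url)                   # from the post
        .option("dbtable", "my_schema.my_table")  # placeholder
        .option("truncate", "true")
        .mode("overwrite")
        .save())
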
Graham
by New Contributor III
  • 10756 Views
  • 5 replies
  • 3 kudos

"MERGE" always slower than "CREATE OR REPLACE"

Overview: To update our Data Warehouse tables, we have tried two methods: "CREATE OR REPLACE" and "MERGE". With every query we've tried, "MERGE" is slower. My question is this: Has anyone successfully gotten a "MERGE" to perform faster than a "CREATE OR...

  • 10756 Views
  • 5 replies
  • 3 kudos
Latest Reply
Manisha_Jena
Databricks Employee
  • 3 kudos

Hi @Graham, can you please try Low Shuffle Merge (LSM) and see if it helps? LSM is a new MERGE algorithm that aims to maintain the existing data organization (including z-order clustering) for unmodified data, while simultaneously improving performan...

  • 3 kudos
4 More Replies
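
Editor's note: a sketch of acting on the reply; the opt-in flag below applies to older DBRs (Low Shuffle Merge is enabled by default from DBR 10.4), so treat the exact config key as an assumption, and the table names are placeholders.

    # Sketch: opt in to Low Shuffle Merge, then run the MERGE as usual.
    spark.conf.set("spark.databricks.delta.merge.enableLowShuffle", "true")  # assumption
    spark.sql("""
        MERGE INTO warehouse.dim_customer AS t
        USING staging.dim_customer_updates AS s
        ON t.customer_id = s.customer_id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)
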
Juha
by New Contributor II
  • 3489 Views
  • 3 replies
  • 2 kudos
Latest Reply
lawrence009
Contributor
  • 2 kudos

Have you figured out what the problem was? Could the issue be permission related?

  • 2 kudos
2 More Replies