Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ivanychev
by Contributor II
  • 1188 Views
  • 3 replies
  • 0 kudos

Resolved! Delta table takes too long to write due to S3 full scan

DBR 14.3, Spark 3.5.0. We use the AWS Glue Metastore. On August 20th some of our pipelines started timing out during writes to a Delta table. We're seeing the driver spend many hours executing post-commit hooks. We write dataframes to Delta with `mode=overw...

Latest Reply
ivanychev
Contributor II
  • 0 kudos

The `spark.databricks.delta.catalog.update.enabled=true` setting helped, but I still don't understand why the problem started to occur. https://docs.databricks.com/en/archive/external-metastores/external-hive-metastore.html#external-apache-hive-metastore-leg...
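For reference, that setting can be applied per notebook session or added to the cluster's Spark config (a config fragment, assuming a Databricks notebook where `spark` is already in scope):

```
# Set for the current session (or add the key/value pair
# under the cluster's Spark config instead):
spark.conf.set("spark.databricks.delta.catalog.update.enabled", "true")
```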

2 More Replies
Rishabh-Pandey
by Esteemed Contributor
  • 268 Views
  • 2 replies
  • 1 kudos

Resolved! The Latest Improvements to Databricks Workflows

What's new in Workflows?  @Sujitha @Kaniz_Fatma

Latest Reply
Rishabh-Pandey
Esteemed Contributor
  • 1 kudos

@Sujitha I am happy to see workflows maturing day by day; this is going to be a game changer for the market. I am also very excited about the upcoming feature, Lakeflow.

1 More Replies
Mangeysh
by New Contributor
  • 230 Views
  • 2 replies
  • 0 kudos

Converting databricks query output in JSON and creating end point

Hello, I am very new to Databricks and am building a UI where I need to show data from a Databricks table. Unfortunately, I am not getting access to the Delta Sharing feature from the administrator. Planning to develop my own API and expose an endpoint with JSON output. I am sure th...

Latest Reply
menotron
Valued Contributor
  • 0 kudos

Hi @Mangeysh, you could achieve this using the Databricks SQL Statement Execution API. I would recommend going through the docs and looking at the functionality and limitations to see if it serves your need before planning to develop your own APIs.
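As a rough, untested sketch of that approach: the Statement Execution API takes a `POST /api/2.0/sql/statements` call with a warehouse ID and a SQL string, and returns JSON. The host, token, and warehouse ID below are placeholders you would replace with your own workspace values.

```python
import json
import urllib.request

# Hypothetical workspace values -- replace with your own.
HOST = "https://<workspace>.cloud.databricks.com"
TOKEN = "dapi..."        # a personal access token
WAREHOUSE_ID = "abc123"  # a running SQL warehouse

def build_statement_request(statement: str, warehouse_id: str) -> dict:
    """Build the JSON body for POST /api/2.0/sql/statements."""
    return {
        "warehouse_id": warehouse_id,
        "statement": statement,
        "wait_timeout": "30s",      # wait synchronously up to 30 s
        "format": "JSON_ARRAY",     # rows come back as JSON arrays
    }

def run_statement(statement: str) -> dict:
    """Execute a SQL statement and return the parsed JSON response."""
    body = json.dumps(build_statement_request(statement, WAREHOUSE_ID)).encode()
    req = urllib.request.Request(
        f"{HOST}/api/2.0/sql/statements",
        data=body,
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        # The response includes a "result" field with the row data.
        return json.load(resp)
```

A thin endpoint of your own could then just call `run_statement(...)` and relay the `result` field to the UI, which avoids needing Delta Sharing at all.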

1 More Replies
LasseL
by New Contributor II
  • 428 Views
  • 4 replies
  • 3 kudos

The best practice to remove old data from DLT pipeline created tables

Hi, I didn't find any "reasonable" way to clean old data from DLT pipeline tables. In DLT we have used materialized views and streaming tables (SCD1, append-only). What is the best way to delete old data from the tables (storage size increases linearly...

Latest Reply
LasseL
New Contributor II
  • 3 kudos

Exactly, this is not a "trivial problem". One possible solution is to take bronze out of the DLT pipeline (to manage it yourself, for example with structured streaming from the source using skipChangeCommits, partitioned by year/month, whatever you want to do to ...
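For a table managed outside DLT, the cleanup itself is typically a periodic `DELETE` followed by `VACUUM`. A minimal sketch that only builds those statements (the table name, date column, and retention windows are hypothetical, and the `RETAIN` period must respect your time-travel requirements):

```python
def retention_sql(table: str, date_col: str, keep_days: int) -> list[str]:
    """Build DELETE + VACUUM statements for a simple retention job."""
    return [
        # Remove rows older than the retention window.
        f"DELETE FROM {table} "
        f"WHERE {date_col} < date_sub(current_date(), {keep_days})",
        # Physically remove files no longer referenced by the table.
        f"VACUUM {table} RETAIN 168 HOURS",
    ]

# In a Databricks notebook you would then run each statement, e.g.:
# for stmt in retention_sql("bronze_events", "event_date", 365):
#     spark.sql(stmt)
```

Note that `DELETE` alone only removes rows logically; storage is reclaimed once `VACUUM` drops the unreferenced files after the retention period.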

3 More Replies
hrushi512
by New Contributor II
  • 210 Views
  • 1 reply
  • 1 kudos

Resolved! External Table on Databricks using DBT (Data Build Tool) Models

How can we create external tables in Databricks using DBT Models?

Latest Reply
szymon_dybczak
Contributor
  • 1 kudos

Hi @hrushi512, you can try to use the `location_root` config parameter, as they did here: https://discourse.getdbt.com/t/add-location-to-create-database-schema-statement-in-databricks-to-enable-creation-of-managed-tables-on-external-storage-accounts/6894
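For illustration, `location_root` can be set per folder of models in `dbt_project.yml` (a config fragment; the project name, folder, and bucket path below are hypothetical):

```
# dbt_project.yml
models:
  my_project:
    staging:
      +materialized: table
      +location_root: "s3://my-bucket/external/staging"
```

With `location_root` set, dbt-databricks creates the tables at that external storage location instead of the default managed location.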

Phani1
by Valued Contributor II
  • 375 Views
  • 3 replies
  • 2 kudos

Multi languages support in Databricks

Hi Team, how can I set up multiple languages in Databricks? For example, if I connect from Germany, the workspace and data should support German. If I connect from China, it should support Chinese, and if I connect from the US, it should be in English...

Latest Reply
Rishabh-Pandey
Esteemed Contributor
  • 2 kudos

1. In that case you need to encode the data in that language's format. For example, if the data is in Japanese then you need to encode it in UTF-8: CREATE OR REPLACE TEMP VIEW japanese_data AS SELECT * FROM csv.`path/to/japanese_data.csv` OPTIONS ('encoding'='UTF-8') al...

2 More Replies
berk
by New Contributor II
  • 325 Views
  • 2 replies
  • 1 kudos

Delete Managed Table from S3 Bucket

Hello, I am encountering an issue with our managed tables in Databricks. The tables are stored in an S3 bucket. When I drop a managed table (either through the UI or by running a DROP TABLE statement in a notebook), the associated data is not being deleted f...

Latest Reply
berk
New Contributor II
  • 1 kudos

@kenkoshaw, thank you for your reply. It is indeed interesting that the data isn't immediately deleted after the table is dropped, and that there's no way to force this process. I suppose I'll have to manually delete the files from the S3 bucket if I...

1 More Replies
Ian_Neft
by New Contributor
  • 9546 Views
  • 3 replies
  • 0 kudos

Data Lineage in Unity Catalog not Populating

I have been trying to get data lineage to populate with the simplest of queries on a Unity-enabled catalog with a Unity-enabled cluster. I am essentially running the example provided, with more data, to see how it works with various aggregates dow...

Latest Reply
AlexYu
New Contributor III
  • 0 kudos

You might need to update your outbound firewall rules to allow for connectivity to the Amazon Kinesis / Event Hubs endpoint. https://docs.databricks.com/en/data-governance/unity-catalog/data-lineage.html#:~:text=To%20view%20lineage%20for%20a,Runtime%2...

2 More Replies
Roxio
by New Contributor II
  • 220 Views
  • 1 reply
  • 1 kudos

Resolved! Materialized view noticeably slower than table, with lots of time on "Optimizing query & pruning files"

I have a query that calls different materialized views; most of the query time is spent in "Optimizing query & pruning files" rather than in execution. The difference is like 2-3 s for the optimization and 300-400 ms for the execution. Similar i...

Latest Reply
Brahmareddy
Valued Contributor II
  • 1 kudos

Hi Roxio, how are you doing today? The difference in query times between materialized views and tables likely comes from the complexity of the views, as they often involve more steps in the background. To reduce the optimization time, you can try simp...

Zeruno
by New Contributor
  • 139 Views
  • 0 replies
  • 0 kudos

How to use DLT Expectations for uniqueness checks on a dataset?

I am using dlt through Python to build a DLT pipeline. One of the things I would like to do is check that each incoming row does not exist in the target table; I want to be sure that each row is unique. I am confused because it seems like this is not p...

pgrandjean
by New Contributor III
  • 9110 Views
  • 7 replies
  • 2 kudos

How to transfer ownership of a database and/or table?

We created a new Service Principal (SP) on Azure and would like to transfer the ownership of the databases and tables created with the old SP. The issue is that these databases and tables are not visible to users using the new SP. I am using a Hiv...

Latest Reply
VivekChandran
New Contributor II
  • 2 kudos

Regarding the [PARSE_SYNTAX_ERROR] Syntax error at or near 'OWNER': remember to wrap the new owner name in the SQL statement with the grave accent (`), as in the sample below. ALTER SCHEMA schema_name OWNER TO `new_owner_name`;

6 More Replies
ashraf1395
by Contributor II
  • 272 Views
  • 1 reply
  • 0 kudos

Resolved! Authentication Issue while connecting to Databricks using Looker Studio

So previously I created source connections from Looker with Databricks using my personal access token. I followed this Databricks doc: https://docs.databricks.com/en/partners/bi/looker-studio.html But from 10 July, I think basic authentication has bee...

Latest Reply
menotron
Valued Contributor
  • 0 kudos

Hi, you would still connect using OAuth tokens. It is just that Databricks recommends using personal access tokens belonging to service principals instead of workspace users. To create tokens for service principals, see Manage tokens for a service pri...
