Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ivanychev
by Contributor II
  • 2618 Views
  • 3 replies
  • 0 kudos

Resolved! Delta table takes too long to write due to S3 full scan

DBR 14.3, Spark 3.5.0. We use AWS Glue Metastore. On August 20th some of our pipelines started timing out during writes to a Delta table. We're seeing the driver spend many hours executing post-commit hooks. We write dataframes to delta with `mode=overw...

Latest Reply
ivanychev
Contributor II
  • 0 kudos

The spark.databricks.delta.catalog.update.enabled=true setting helped, but I still don't understand why the problem started to occur. https://docs.databricks.com/en/archive/external-metastores/external-hive-metastore.html#external-apache-hive-metastore-leg...

  • 0 kudos
2 More Replies
Rishabh-Pandey
by Esteemed Contributor
  • 839 Views
  • 2 replies
  • 1 kudos

Resolved! The Latest Improvements to Databricks Workflows

What's new in Workflows?  @Sujitha @Retired_mod

Latest Reply
Rishabh-Pandey
Esteemed Contributor
  • 1 kudos

@Sujitha I am happy to see workflows maturing day by day; this is going to be a game changer for the market. I am also very excited about the upcoming feature, Lakeflow.

  • 1 kudos
1 More Replies
Mangeysh
by New Contributor
  • 1020 Views
  • 2 replies
  • 0 kudos

Converting Databricks query output to JSON and creating an endpoint

Hello, I am very new to Databricks and am building a UI where I need to show data from a Databricks table. Unfortunately, I am not getting access to the Delta Sharing feature from the administrator. I am planning to develop my own API and expose an endpoint with JSON output. I am sure th...

Latest Reply
menotron
Valued Contributor
  • 0 kudos

Hi @Mangeysh, you could achieve this using the Databricks SQL Statement Execution API. I would recommend going through the docs to review its functionality and limitations and see if it serves your need before planning to develop your own APIs.

  • 0 kudos
1 More Replies
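
A minimal sketch of what the suggestion above might look like in Python, using only the standard library. It assumes the SQL Statement Execution API is reached with a POST to `/api/2.0/sql/statements/` and that inline results arrive as `manifest.schema.columns` plus `result.data_array`; check the current API docs before relying on either. The host, token, warehouse ID, and table name are hypothetical placeholders, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_statement_request(host, token, warehouse_id, statement):
    """Build (but do not send) a POST request that submits one SQL statement."""
    payload = {
        "warehouse_id": warehouse_id,   # hypothetical SQL warehouse ID
        "statement": statement,
        "wait_timeout": "30s",          # wait synchronously for up to 30 seconds
        "format": "JSON_ARRAY",         # rows come back as arrays of values
        "disposition": "INLINE",        # small results are returned in the response body
    }
    return urllib.request.Request(
        url=f"https://{host}/api/2.0/sql/statements/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def rows_as_dicts(response):
    """Turn an inline JSON_ARRAY response into a list of {column: value} dicts."""
    columns = [col["name"] for col in response["manifest"]["schema"]["columns"]]
    data = response.get("result", {}).get("data_array", [])
    return [dict(zip(columns, row)) for row in data]


# Hypothetical response, shaped the way this sketch assumes the API answers.
sample_response = {
    "status": {"state": "SUCCEEDED"},
    "manifest": {"schema": {"columns": [{"name": "id"}, {"name": "city"}]}},
    "result": {"data_array": [["1", "Berlin"], ["2", "Pune"]]},
}

request = build_statement_request(
    "example.cloud.databricks.com", "<token>", "<warehouse-id>",
    "SELECT id, city FROM demo.customers LIMIT 2",
)
json_for_ui = json.dumps(rows_as_dicts(sample_response))
```

To actually run a query, the request would be passed to `urllib.request.urlopen` and the parsed body handed to `rows_as_dicts`; a UI backend could then return `json_for_ui` as its endpoint payload.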
hrushi512
by New Contributor II
  • 1670 Views
  • 1 reply
  • 1 kudos

Resolved! External Table on Databricks using DBT(Data Build Tool) Models

How can we create external tables in Databricks using DBT Models?

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @hrushi512, you can try the location_root config parameter, as they did here: https://discourse.getdbt.com/t/add-location-to-create-database-schema-statement-in-databricks-to-enable-creation-of-managed-tables-on-external-storage-accounts/6894

  • 1 kudos
Phani1
by Valued Contributor II
  • 1750 Views
  • 3 replies
  • 2 kudos

Multi-language support in Databricks

Hi Team, how can I set up multiple languages in Databricks? For example, if I connect from Germany, the workspace and data should support German. If I connect from China, it should support Chinese, and if I connect from the US, it should be in English...

Latest Reply
Rishabh-Pandey
Esteemed Contributor
  • 2 kudos

1. In that case you need to encode the data in that language's format. For example, if the data is in Japanese, you need to encode it in UTF-8: CREATE OR REPLACE TEMP VIEW japanese_data AS SELECT * FROM csv.`path/to/japanese_data.csv` OPTIONS ('encoding'='UTF-8') al...

  • 2 kudos
2 More Replies
berk
by New Contributor II
  • 2076 Views
  • 2 replies
  • 1 kudos

Delete Managed Table from S3 Bucket

Hello, I am encountering an issue with our managed tables in Databricks. The tables are stored in an S3 bucket. When I drop a managed table (either through the UI or by running a DROP TABLE statement in a notebook), the associated data is not being deleted f...

Latest Reply
berk
New Contributor II
  • 1 kudos

@kenkoshaw, thank you for your reply. It is indeed interesting that the data isn't immediately deleted after the table is dropped, and that there's no way to force this process. I suppose I'll have to manually delete the files from the S3 Bucket if I...

  • 1 kudos
1 More Replies
Roxio
by New Contributor II
  • 1418 Views
  • 1 reply
  • 1 kudos

Resolved! Materialized view much slower than table, with lots of time spent on "Optimizing query & pruning files"

I have a query that calls different materialized views, but most of the query time is spent in "Optimizing query & pruning files" rather than in execution. The difference is about 2-3 secs for the optimization versus 300-400 ms for the execution. Similar i...

Latest Reply
Brahmareddy
Honored Contributor III
  • 1 kudos

Hi Roxio, how are you doing today? The difference in query times between materialized views and tables likely comes from the complexity of the views, as they often involve more steps in the background. To reduce the optimization time, you can try simp...

  • 1 kudos
pgrandjean
by New Contributor III
  • 14776 Views
  • 6 replies
  • 2 kudos

How to transfer ownership of a database and/or table?

We created a new Service Principal (SP) on Azure and would like to transfer the ownership of the databases and tables created with the old SP. The issue is that these databases and tables are not visible to the users using the new SP. I am using a Hiv...

Latest Reply
VivekChandran
New Contributor II
  • 2 kudos

Regarding the error "[PARSE_SYNTAX_ERROR] Syntax error at or near 'OWNER'": remember to wrap the new owner name in the SQL statement in grave accents (`), as in this sample: ALTER SCHEMA schema_name OWNER TO `new_owner_name`;

  • 2 kudos
5 More Replies
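
A small Python sketch of the quoting advice above, for generating the statements when many schemas need a new owner. The `ALTER SCHEMA ... OWNER TO` form comes from the reply; applying the same form to tables and escaping an embedded backtick by doubling it are assumptions here, and the helper names, schema names, and principal ID are hypothetical.

```python
def quote_identifier(name):
    """Wrap a name in grave accents, doubling any backtick it already contains."""
    return "`" + name.replace("`", "``") + "`"


def alter_owner_statement(object_type, object_name, new_owner):
    """Return an ALTER ... OWNER TO statement with the owner name backtick-quoted."""
    kind = object_type.upper()
    # Only the two object kinds discussed in the thread are accepted (assumption).
    if kind not in {"SCHEMA", "TABLE"}:
        raise ValueError(f"unsupported object type: {object_type}")
    return f"ALTER {kind} {object_name} OWNER TO {quote_identifier(new_owner)};"


# Hypothetical service principal application ID and schema names.
new_owner = "0a1b2c3d-0000-1111-2222-333344445555"
statements = [
    alter_owner_statement("schema", schema, new_owner)
    for schema in ["bronze", "silver"]
]
```

Each generated string could then be run in a notebook with `spark.sql(...)` after dropping the trailing semicolon, or pasted into a SQL editor as-is. Without the backticks, a principal name containing hyphens or an `@` is what tends to trigger the parse error quoted above.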
ashraf1395
by Honored Contributor
  • 2016 Views
  • 1 reply
  • 0 kudos

Resolved! Authentication Issue while connecting to Databricks using Looker Studio

Previously, I created source connections from Looker Studio to Databricks using my personal access token, following this Databricks doc: https://docs.databricks.com/en/partners/bi/looker-studio.html But from 10 July, I think basic authentication has bee...

Latest Reply
menotron
Valued Contributor
  • 0 kudos

Hi, you would still connect using OAuth tokens. It is just that Databricks recommends using personal access tokens belonging to service principals instead of workspace users. To create tokens for service principals, see Manage tokens for a service pri...

  • 0 kudos
Maatari
by New Contributor III
  • 820 Views
  • 2 replies
  • 0 kudos

Resolved! Pre-partitioning a Delta table to reduce shuffling of wide operations

Assuming I need to perform a groupby, i.e. an aggregation, on a dataset stored in a Delta table: if the Delta table is partitioned by the field to group by, can that reduce the shuffling that the groupby would normally cause? As a connecte...

Latest Reply
Retired_mod
Esteemed Contributor III
  • 0 kudos

Hi @Maatari, Thanks for reaching out! Please review the responses and let us know which best addresses your question. Your feedback is valuable to us and the community.   If the response resolves your issue, kindly mark it as the accepted solution. T...

  • 0 kudos
1 More Replies
Rishabh-Pandey
by Esteemed Contributor
  • 5116 Views
  • 1 reply
  • 3 kudos

Key Advantages of Serverless Compute in Databricks

Serverless compute in Databricks offers several advantages, enhancing efficiency, scalability, and ease of use. Here are some key benefits: 1. Simplified Infrastructure Management. No Server Management: Users don't need to manage or configure servers or...

Latest Reply
Ashu24
Contributor
  • 3 kudos

Thanks for the clear understanding 

  • 3 kudos
Maatari
by New Contributor III
  • 1291 Views
  • 0 replies
  • 0 kudos

Reading a partitioned table in Spark Structured Streaming

Does the pre-partitioning of a Delta table influence the number of "default" partitions of a DataFrame when reading the data? Put differently, using Spark Structured Streaming, when reading from a Delta table, is the number of DataFrame par...

Maatari
by New Contributor III
  • 1132 Views
  • 0 replies
  • 0 kudos

Chaining stateful operators

I would like to do a groupby followed by a join in Structured Streaming. I would read from two Delta tables in snapshot mode, i.e. the latest snapshot. My question is specifically about chaining the stateful operators. groupby is update mode chaining grou...

bulbur
by New Contributor II
  • 1753 Views
  • 1 reply
  • 0 kudos

Use pandas in DLT pipeline

Hi, I am trying to work with pandas in a Delta Live Table. I have created some example code: import pandas as pd import pyspark.sql.functions as F pdf = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "...

Latest Reply
bulbur
New Contributor II
  • 0 kudos

I have taken the advice given by the documentation (However, you can include these functions outside of table or view function definitions because this code is run once during the graph initialization phase.) and moved the toPandas call to a function...

  • 0 kudos