Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ggsmith
by Contributor
  • 1237 Views
  • 2 replies
  • 0 kudos

Resolved! DLT Streaming Schema and Select

I am reading JSON files written to ADLS from Kafka, using DLT and spark.readStream to create a streaming table for my raw ingest data. My schema is two arrays at the top level: a NewRecord array and an OldRecord array. I pass the schema and I run a select on Ne...

Data Engineering
dlt
streaming
Latest Reply
ggsmith
Contributor

I did a full refresh of the Delta Live Tables pipeline and that fixed it. I guess it was remembering the first run, where I just had the top-level arrays as two columns in the table.

  • 0 kudos
1 More Reply
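
As a sketch of the setup described in this thread (the path, field names, and table name are assumptions, not the poster's actual code), a DLT streaming table that reads the Kafka-produced JSON from ADLS with an explicit two-array schema and selects one of the arrays might look like the following; a full refresh is what clears state left over from earlier runs with a different shape:

```python
# Hypothetical sketch of the thread's setup: a DLT streaming table reading
# JSON files from ADLS with an explicit schema, then selecting NewRecord.
import dlt
from pyspark.sql.functions import col
from pyspark.sql.types import ArrayType, StringType, StructField, StructType

raw_path = "abfss://container@account.dfs.core.windows.net/kafka/json"  # assumed

schema = StructType([
    StructField("NewRecord", ArrayType(StructType([
        StructField("id", StringType()),      # illustrative fields only
        StructField("value", StringType()),
    ]))),
    StructField("OldRecord", ArrayType(StringType())),  # shape assumed
])

@dlt.table(name="raw_ingest")
def raw_ingest():
    return (
        spark.readStream.format("json")
        .schema(schema)
        .load(raw_path)
        .select(col("NewRecord"))
    )
```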
DBUser2
by New Contributor III
  • 1640 Views
  • 2 replies
  • 0 kudos

How to use transactions when connecting to Databricks using the Simba ODBC driver

I'm connecting to a Databricks instance using the Simba ODBC driver (version 2.8.0.1002), and I am able to perform reads and writes on the Delta tables. But if I want to do some INSERT/UPDATE/DELETE operations within a transaction, I get the below error, an...

Latest Reply
florence023
New Contributor III

@DBUser2 wrote: I'm connecting to a Databricks instance using the Simba ODBC driver (version 2.8.0.1002), and I am able to perform reads and writes on the Delta tables. But if I want to do some INSERT/UPDATE/DELETE operations within a transaction, I get the ...

  • 0 kudos
1 More Reply
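
Databricks SQL does not support multi-statement BEGIN/COMMIT transactions, which is the usual cause of this kind of error; each statement commits as its own atomic Delta transaction. A minimal pyodbc sketch with autocommit left on (all connection values are placeholders, not the poster's configuration):

```python
# Sketch assuming the Simba Spark ODBC driver is installed locally;
# host, HTTP path, and token are placeholders.
import pyodbc

conn = pyodbc.connect(
    "Driver=Simba Spark ODBC Driver;"
    "Host=<workspace-host>;Port=443;"
    "HTTPPath=<http-path>;SSL=1;ThriftTransport=2;"
    "AuthMech=3;UID=token;PWD=<personal-access-token>",
    autocommit=True,  # each statement is its own atomic transaction
)

cur = conn.cursor()
cur.execute("INSERT INTO my_catalog.my_schema.my_table VALUES (1, 'a')")
cur.close()
conn.close()
```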
ivanychev
by Contributor II
  • 2930 Views
  • 3 replies
  • 0 kudos

Resolved! Delta table takes too long to write due to S3 full scan

DBR 14.3, Spark 3.5.0. We use the AWS Glue metastore. On August 20th some of our pipelines started timing out during writes to a Delta table. We're experiencing many hours of the driver executing post-commit hooks. We write dataframes to Delta with `mode=overw...

Latest Reply
ivanychev
Contributor II

The spark.databricks.delta.catalog.update.enabled=true setting helped, but I still don't understand why the problem started to occur. https://docs.databricks.com/en/archive/external-metastores/external-hive-metastore.html#external-apache-hive-metastore-leg...

  • 0 kudos
2 More Replies
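
For readers hitting the same symptom, a minimal sketch of applying the setting named in the accepted reply before the overwrite (the source dataframe and table name are placeholders):

```python
# Apply the setting from the reply before writing; the reply reports this
# avoids the long post-commit-hook phase against the Glue metastore.
spark.conf.set("spark.databricks.delta.catalog.update.enabled", "true")

df = spark.table("staging.events_updates")  # placeholder source

(df.write
   .format("delta")
   .mode("overwrite")
   .saveAsTable("analytics.events"))  # hypothetical target table
```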
Rishabh-Pandey
by Esteemed Contributor
  • 954 Views
  • 2 replies
  • 1 kudos

Resolved! The Latest Improvements to Databricks Workflows

What's new in Workflows?  @Sujitha @Retired_mod

Latest Reply
Rishabh-Pandey
Esteemed Contributor

@Sujitha I am happy to see workflows maturing day by day; this is going to be a game changer for the market. I am also very excited about the upcoming feature, Lakeflow.

  • 1 kudos
1 More Reply
Mangeysh
by New Contributor
  • 1292 Views
  • 2 replies
  • 0 kudos

Converting databricks query output in JSON and creating end point

Hello, I am very new to Databricks and building a UI where I need to show data from a Databricks table. Unfortunately, I am not getting access to the Delta Sharing feature from the administrator. Planning to develop my own API and expose an endpoint with JSON output. I am sure th...

Latest Reply
menotron
Valued Contributor

Hi @Mangeysh, You could achieve this using the Databricks SQL Statement Execution API. I would recommend going through the docs and looking at the functionality and limitations to see if it serves your need before planning to develop your own APIs.

  • 0 kudos
1 More Reply
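
A minimal sketch of calling the SQL Statement Execution API the reply points to: POST a query to a SQL warehouse and read the rows back as JSON. Host, token, warehouse ID, and the sample query are placeholders; the docs cover pagination, result formats, and limits.

```python
# Submit a statement and print the result rows (synchronous wait).
import requests

host = "https://<workspace-host>"
headers = {"Authorization": "Bearer <personal-access-token>"}

resp = requests.post(
    f"{host}/api/2.0/sql/statements",
    headers=headers,
    json={
        "warehouse_id": "<warehouse-id>",
        "statement": "SELECT * FROM my_schema.my_table LIMIT 10",  # placeholder
        "wait_timeout": "30s",
    },
)
resp.raise_for_status()
print(resp.json()["result"]["data_array"])
```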
hrushi512
by New Contributor II
  • 1981 Views
  • 1 reply
  • 1 kudos

Resolved! External Table on Databricks using DBT(Data Build Tool) Models

How can we create external tables in Databricks using DBT Models?

Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @hrushi512, You can try to use the location_root config parameter, as they did here: https://discourse.getdbt.com/t/add-location-to-create-database-schema-statement-in-databricks-to-enable-creation-of-managed-tables-on-external-storage-accounts/6894

  • 1 kudos
Phani1
by Valued Contributor II
  • 2030 Views
  • 3 replies
  • 2 kudos

Multi-language support in Databricks

Hi Team, How can I set up multiple languages in Databricks? For example, if I connect from Germany, the workspace and data should support German. If I connect from China, it should support Chinese, and if I connect from the US, it should be in English...

Latest Reply
Rishabh-Pandey
Esteemed Contributor

1. In that case you need to encode the data in that language's format; for example, if the data is in Japanese, you need to encode it in UTF-8: CREATE OR REPLACE TEMP VIEW japanese_data AS SELECT * FROM csv.`path/to/japanese_data.csv` OPTIONS ('encoding'='UTF-8')...

  • 2 kudos
2 More Replies
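
For reference, a PySpark equivalent of the reply's approach (the path and the header option are assumptions):

```python
# Read a non-Latin-script CSV with an explicit encoding so the text is
# decoded correctly, then expose it as a temp view as in the reply.
df = (spark.read
      .option("encoding", "UTF-8")   # match the file's actual encoding
      .option("header", "true")      # assumption: file has a header row
      .csv("path/to/japanese_data.csv"))

df.createOrReplaceTempView("japanese_data")
```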
berk
by New Contributor II
  • 2400 Views
  • 2 replies
  • 1 kudos

Delete Managed Table from S3 Bucket

Hello, I am encountering an issue with our managed tables in Databricks. The tables are stored in an S3 bucket. When I drop a managed table (either through the UI or by running DROP TABLE code in a notebook), the associated data is not being deleted f...

Latest Reply
berk
New Contributor II

@kenkoshaw, thank you for your reply. It is indeed interesting that the data isn't immediately deleted after the table is dropped, and that there's no way to force this process. I suppose I'll have to manually delete the files from the S3 Bucket if I...

  • 1 kudos
1 More Reply
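
A sketch of the manual cleanup discussed in this thread: capture the table's storage location before dropping it, then remove the files yourself. The table name and path are placeholders; be certain nothing else lives under that prefix before deleting.

```python
# Look up the managed table's location, drop the table, then delete the
# underlying files. dbutils is available in Databricks notebooks.
location = (spark.sql("DESCRIBE DETAIL my_schema.my_table")
            .select("location").first()[0])

spark.sql("DROP TABLE my_schema.my_table")

dbutils.fs.rm(location, recurse=True)  # permanent; double-check the path
```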
Roxio
by New Contributor II
  • 1644 Views
  • 1 reply
  • 1 kudos

Resolved! Materialized view noticeably slower than table, with lots of time spent on "Optimizing query & pruning files"

I have a query that calls different materialized views; most of the query's time is spent in "Optimizing query & pruning files" versus the execution. The difference is like 2-3 secs for the optimization and 300-400 ms for the execution. Similar i...

Latest Reply
Brahmareddy
Honored Contributor III

Hi Roxio, how are you doing today? The difference in query times between materialized views and tables likely comes from the complexity of the views, as they often involve more steps in the background. To reduce the optimization time, you can try simp...

  • 1 kudos
pgrandjean
by New Contributor III
  • 15285 Views
  • 6 replies
  • 2 kudos

How to transfer ownership of a database and/or table?

We created a new Service Principal (SP) on Azure and would like to transfer the ownership of the databases and tables created with the old SP. The issue is that these databases and tables are not visible to the users using the new SP. I am using a Hiv...

Latest Reply
VivekChandran
New Contributor II

Regarding the [PARSE_SYNTAX_ERROR] Syntax error at or near 'OWNER': remember to wrap the new owner name in the SQL statement in grave accents (backticks), as in the sample below. ALTER SCHEMA schema_name OWNER TO `new_owner_name`;

  • 2 kudos
5 More Replies
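
Building on the reply, a sketch that applies the same fix across a schema and every table in it (the schema name and principal are placeholders; the backticks matter because service-principal application IDs contain dashes):

```python
# Transfer ownership of a schema and each of its tables to the new SP.
new_owner = "`<application-id-of-new-sp>`"

spark.sql(f"ALTER SCHEMA my_schema OWNER TO {new_owner}")

for t in spark.sql("SHOW TABLES IN my_schema").collect():
    spark.sql(f"ALTER TABLE my_schema.{t.tableName} OWNER TO {new_owner}")
```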
ashraf1395
by Honored Contributor
  • 2441 Views
  • 1 reply
  • 0 kudos

Resolved! Authentication Issue while connecting to Databricks using Looker Studio

So previously I created source connections from Looker to Databricks using my personal access token. I followed these Databricks docs: https://docs.databricks.com/en/partners/bi/looker-studio.html But from 10 July, I think basic authentication has bee...

Latest Reply
menotron
Valued Contributor

Hi, You would still connect using OAuth tokens. It is just that Databricks recommends using personal access tokens belonging to service principals instead of workspace users. To create tokens for service principals, see Manage tokens for a service pri...

  • 0 kudos
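
A sketch of the approach the reply recommends: mint a personal access token on behalf of a service principal via the Token Management API, then use that token in the Looker Studio connection. Host, admin token, and application ID are placeholders.

```python
# Create an on-behalf-of token for a service principal (requires admin).
import requests

host = "https://<workspace-host>"
headers = {"Authorization": "Bearer <admin-personal-access-token>"}

resp = requests.post(
    f"{host}/api/2.0/token-management/on-behalf-of/tokens",
    headers=headers,
    json={
        "application_id": "<service-principal-application-id>",
        "lifetime_seconds": 7776000,  # 90 days; adjust to your policy
        "comment": "Looker Studio connection",
    },
)
resp.raise_for_status()
print(resp.json()["token_value"])  # shown once; store it securely
```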
Maatari
by New Contributor III
  • 985 Views
  • 2 replies
  • 0 kudos

Resolved! Pre-partitioning a Delta table to reduce shuffling in wide operations

Assuming I need to perform a groupBy, i.e. an aggregation, on a dataset stored in a Delta table: if the Delta table is partitioned by the field by which to group, can that have an impact on the shuffling that the groupBy would normally cause? As a connecte...

Latest Reply
Retired_mod
Esteemed Contributor III

Hi @Maatari, Thanks for reaching out! Please review the responses and let us know which best addresses your question. Your feedback is valuable to us and the community.   If the response resolves your issue, kindly mark it as the accepted solution. T...

  • 0 kudos
1 More Reply
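
One empirical way to answer the question is to check the physical plan of the aggregation for an Exchange (shuffle) operator; the table and column names below are placeholders:

```python
# Inspect whether the groupBy on the partition column still shuffles.
df = spark.read.table("my_schema.partitioned_table")

agg = df.groupBy("partition_col").count()
agg.explain()  # an "Exchange hashpartitioning(...)" node indicates a shuffle
```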
Rishabh-Pandey
by Esteemed Contributor
  • 6144 Views
  • 1 reply
  • 3 kudos

Key Advantages of Serverless Compute in Databricks

Serverless compute in Databricks offers several advantages, enhancing efficiency, scalability, and ease of use. Here are some key benefits: 1. Simplified Infrastructure Management. No Server Management: Users don't need to manage or configure servers or...

Latest Reply
Ashu24
Contributor

Thanks for the clear explanation.

  • 3 kudos
Maatari
by New Contributor III
  • 2172 Views
  • 0 replies
  • 0 kudos

Reading a partitioned table in Spark Structured Streaming

Does the pre-partitioning of a Delta table have an influence on the number of "default" partitions of a DataFrame when reading the data? Put differently, using Spark Structured Streaming, when reading from a Delta table, is the number of DataFrame par...

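
As a starting point for this open question, a sketch of the reader options that typically bound how much data each micro-batch pulls when streaming from a Delta table, which in turn drives the number of tasks (the path and values are illustrative):

```python
# Bound micro-batch size when streaming from a Delta table.
stream = (spark.readStream
          .format("delta")
          .option("maxFilesPerTrigger", 100)   # files per micro-batch
          .option("maxBytesPerTrigger", "1g")  # soft byte cap per batch
          .load("/delta/events"))              # placeholder path
```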