Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

noimeta
by Contributor III
  • 15507 Views
  • 15 replies
  • 12 kudos

Resolved! Error when create an external location using code

I'm trying to create an external location from a notebook, and I got this kind of error: [PARSE_SYNTAX_ERROR] Syntax error at or near 'LOCATION' (line 1, pos 16)   == SQL == CREATE EXTERNAL LOCATION IF NOT EXISTS test_location URL 's3://test-bronze/db/tes...
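For reference, CREATE EXTERNAL LOCATION is Unity Catalog DDL and only parses on UC-enabled compute; on a non-UC cluster the parser rejects it at the LOCATION keyword, which matches the error above. A minimal sketch composing the documented syntax (the location, URL, and credential names are hypothetical):

```python
def create_external_location_sql(name: str, url: str, credential: str) -> str:
    """Compose the Unity Catalog DDL. Running it requires a UC-enabled
    cluster or SQL warehouse; otherwise the parser stops at the LOCATION
    keyword, as in the PARSE_SYNTAX_ERROR above."""
    return (
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS {name} "
        f"URL '{url}' "
        f"WITH (STORAGE CREDENTIAL {credential})"
    )

stmt = create_external_location_sql(
    "test_location", "s3://test-bronze/db/test", "my_credential"
)
print(stmt)
# In a notebook: spark.sql(stmt)
```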

Latest Reply
Lokeshv
New Contributor II
  • 12 kudos

Hey everyone, I'm facing an issue with retrieving data from a volume or table that contains a string with a symbol, for example 'databricks+'. Whenever I try to retrieve this data, I encounter a syntax error. Can anyone help me resolve this issue?

14 More Replies
aliacovella
by Contributor
  • 1958 Views
  • 8 replies
  • 5 kudos

Resolved! DLT Pipeline with only views

I'm trying to create a pipeline containing a view from a federated source. In this case, I'd like to just create materialized views from the federation and schedule the pipeline for execution. If I define a pipeline with only something like the ...
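As the resolved thread suggests, a pipeline whose only datasets are materialized views is valid: each decorated function becomes one materialized view. A minimal sketch with hypothetical catalog/schema/table names (`dlt` is only importable inside a running pipeline, so the decorator part is shown commented):

```python
def federated_query_sql(catalog: str, schema: str, table: str) -> str:
    """Build the SELECT that the materialized view runs over the federated source."""
    return f"SELECT * FROM {catalog}.{schema}.{table}"

# import dlt
#
# @dlt.table(name="orders_mv")  # one function per materialized view
# def orders_mv():
#     return spark.sql(federated_query_sql("fed_catalog", "sales", "orders"))

print(federated_query_sql("fed_catalog", "sales", "orders"))
```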

Latest Reply
Alberto_Umana
Databricks Employee
  • 5 kudos

No problem, if you have any other questions let me know!

7 More Replies
TejeshS
by New Contributor III
  • 1653 Views
  • 7 replies
  • 0 kudos

How to enable row tracking on Delta Live Tables?

We are encountering a scenario where we need to enable support for incremental processing on materialized views that have DLT base tables. However, we have observed that the compute is being executed with the COMPLETE_RECOMPUTE mode instead of INCREMENT...
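For context, row tracking on a Delta table is controlled by the `delta.enableRowTracking` table property; in DLT it would be passed through `table_properties`. A minimal sketch with hypothetical table names (`dlt` is only importable inside a pipeline, so that part is commented), keeping in mind the CDF interaction discussed later in the thread:

```python
# delta.enableRowTracking is the Delta table property that turns on row tracking.
row_tracking_props = {"delta.enableRowTracking": "true"}

# import dlt
#
# @dlt.table(name="my_table", table_properties=row_tracking_props)
# def my_table():
#     return spark.read.table("source_table")

print(row_tracking_props)
```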

Latest Reply
TejeshS
New Contributor III
  • 0 kudos

Moreover, we have CDF-enabled DLT tables, but per the documentation we see a limitation: if CDF is enabled, then row tracking won't be possible (Use row tracking for Delta tables | Databricks on AWS). But per our use case we need incremental processing ...

6 More Replies
David_Billa
by New Contributor III
  • 976 Views
  • 1 reply
  • 1 kudos

Create table from json and flatten in the same SQL

Any help in writing SQL to create the table from a JSON file and flatten it in the same step? As I'm new to JSON, it would be nice if someone could give a heads up by referencing any document, or help provide the recommended solution. Sample JSON fil...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

To create a table directly from a JSON file and flatten it using SQL in Databricks, you can use the CREATE TABLE statement with the USING JSON clause. However, SQL alone does not provide a direct way to flatten nested JSON structures. You would typic...
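The SQL specifics are Databricks-dependent, but the flattening step itself just maps nested objects to dotted column names, which is what dot/colon field paths express in SQL. A pure-Python illustration of that mapping (not the Databricks API; the sample record is made up):

```python
import json

def flatten(obj: dict, prefix: str = "") -> dict:
    """Recursively turn nested JSON objects into flat, dotted keys."""
    out = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, name + "."))
        else:
            out[name] = value
    return out

record = json.loads('{"id": 1, "customer": {"name": "Ann", "address": {"city": "Oslo"}}}')
print(flatten(record))
# {'id': 1, 'customer.name': 'Ann', 'customer.address.city': 'Oslo'}
```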

Erik
by Valued Contributor III
  • 541 Views
  • 3 replies
  • 1 kudos

What does durationMs.commitBatch measure?

With a structured streaming job from Kafka, we have a metric in durationMs called commitBatch. There is also an example of this in the Databricks documentation. I cannot find any description of what this measures, and how it relates to the other met...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

The commitBatch metric is a part of the overall triggerExecution time, which encompasses all stages of planning and executing the microbatch, including committing the batch data and updating offsets. The commitBatch metric may not always be present i...
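To make the relationship concrete, here is an illustrative breakdown of one microbatch's `durationMs` (the numbers are made up, not taken from the docs): `commitBatch` is one component phase inside the end-to-end `triggerExecution` time.

```python
import json

# Made-up durationMs values from one StreamingQueryProgress event.
progress = json.loads("""
{"durationMs": {"triggerExecution": 1500,
                "getBatch": 40,
                "addBatch": 1200,
                "commitBatch": 150,
                "commitOffsets": 60}}
""")
d = progress["durationMs"]

# Sum the component phases; together they fit inside triggerExecution.
components = sum(v for k, v in d.items() if k != "triggerExecution")
print(d["commitBatch"], components, d["triggerExecution"])
```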

2 More Replies
Tahseen0354
by Valued Contributor
  • 6839 Views
  • 5 replies
  • 3 kudos

Resolved! Why am I not receiving any mail sent to the Azure AD Group mailbox when a Databricks job fails?

I have created an Azure AD Group of the "Microsoft 365" type with its own email address, which is added to the notifications of a Databricks job (on failure). But no mail is sent to the Azure Group mailbox when the job fails. I am able to send a d...

Latest Reply
Lanky
New Contributor II
  • 3 kudos

Hello guys, I have set up SES email receiving for Databricks notifications. When I send an email message from Google Mail or Yahoo Mail, it gets to the SES email receiving rule. However, notifications from Databricks don't get to the same SES email receivi...

4 More Replies
meghana_tulla
by New Contributor III
  • 691 Views
  • 1 reply
  • 1 kudos

Automating Admin Consent for Azure Databricks SCIM App Creation Using Terraform.

I am trying to automate the creation of an Azure AD application (specifically, an Azure Databricks SCIM app) and grant admin consent for its API permissions using Terraform. The required API permissions include Application.ReadWrite.All, Application....

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

The Service Principal you're using for authentication may not have sufficient permissions to grant admin consent. Ensure that the Service Principal has the necessary roles assigned, such as "Global Administrator" or "Privileged Role Administrator". ...

KishanDaxini
by New Contributor
  • 578 Views
  • 1 reply
  • 0 kudos

Handling non notebook file types in repos

Hi, I have .py, .txt, .yml, and .json files in my repo. When I merge the feature branch into the master branch, the file type of these files gets changed to notebook, which causes an error while importing these files into my different notebooks. P.S. ...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

IPYNB notebooks are the default format when creating a new notebook on Databricks. To change the default to the Databricks source format, log into your Databricks workspace, click your profile in the upper-right of the page, then click Settings and n...

Abishrp
by Contributor
  • 453 Views
  • 1 reply
  • 1 kudos

Issue in finding OS in which my cluster runs

During the configuration of a job compute cluster, I didn't mention any OS details. How can I find which OS my cluster is running? Also, is there any way to get pricing details of all instances across the different categories: Job Compute - premium, All Purpo...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

You can find the OS for your runtime in the system environment information in release notes, for example for Runtime 16.1 it can be found in: https://docs.databricks.com/en/release-notes/runtime/16.1.html#system-environment which is Ubuntu 24.04.1 LT...
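You can also check directly from a notebook by reading `/etc/os-release` on the driver. A small plain-Python sketch (the `sample` content is made-up example data; in a real notebook you would pass `open("/etc/os-release").read()` instead):

```python
def parse_os_release(text: str) -> dict:
    """Parse the KEY="value" lines of an os-release file into a dict."""
    info = {}
    for line in text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            info[key] = value.strip().strip('"')
    return info

sample = 'NAME="Ubuntu"\nVERSION_ID="24.04"\nPRETTY_NAME="Ubuntu 24.04.1 LTS"'
print(parse_os_release(sample)["PRETTY_NAME"])
# In a notebook: parse_os_release(open("/etc/os-release").read())
```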

my_super_name
by New Contributor II
  • 2736 Views
  • 2 replies
  • 2 kudos

Auto Loader Schema Hint Behavior: Addressing Nested Field Errors

Hello, I'm using Auto Loader to stream a table of data and have added schema hints to specify field values. I've observed that when my initial data file is missing fields specified in the schema hint, Auto Loader correctly identifies this and ad...

Latest Reply
Mathias_Peters
Contributor II
  • 2 kudos

Hi, we are having similar issues with schema hints formulated in fully qualified DDL, e.g. "a STRUCT<b INT>", etc. Did you find a solution? Also, did you specify the schema hint using dot-notation, e.g. "a.b INT", before ingesting any data or after...
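For what it's worth, a minimal sketch of the dot-notation variant expressed as Auto Loader options (the paths and field names are hypothetical; the actual read needs a Spark session, so it is shown commented):

```python
# Hypothetical Auto Loader configuration with a nested schema hint.
autoloader_options = {
    "cloudFiles.format": "json",
    "cloudFiles.schemaLocation": "s3://bucket/_schemas/events",
    "cloudFiles.schemaHints": "a.b INT",  # hint for nested field b inside struct a
}

# (spark.readStream.format("cloudFiles")
#       .options(**autoloader_options)
#       .load("s3://bucket/events"))

print(autoloader_options["cloudFiles.schemaHints"])
```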

1 More Replies
anshi_t_k
by New Contributor III
  • 1325 Views
  • 4 replies
  • 0 kudos

Practice question for data engineer exam

A data engineer, User A, has promoted a pipeline to production by using the REST API to programmatically create several jobs. A DevOps engineer, User B, has configured an external orchestration tool to trigger job runs through the REST API. Both user...

Latest Reply
rakeshdey
New Contributor II
  • 0 kudos

The answer should be B: when you try to get job run information, creator_user_email is always populated as the 'Run As' user in the workflow, i.e. whichever credential was used to trigger the job. If you get the workflow info through the REST API, then answer A is correct.

3 More Replies
Karthik_2
by New Contributor
  • 974 Views
  • 1 reply
  • 0 kudos

Query on SQL Warehouse Concurrency in Azure Databricks

Hi, we are planning to migrate the backend of our web application, currently hosted on App Service with an Azure SQL Database, to Azure Databricks as the data source. For this, we intend to use the SQL Warehouse in Databricks to execute queries and in...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Hello Karthik, many thanks for your question. Databricks SQL Warehouses use dynamic concurrency to handle varying demands. Unlike static-capacity warehouses, Databricks SQL adjusts compute resources in real time to manage concurrent loads and maximiz...

tseader
by New Contributor III
  • 2497 Views
  • 3 replies
  • 1 kudos

Resolved! Python SDK clusters.create_and_wait - Sourcing from cluster-create JSON

I am attempting to create a compute cluster using the Python SDK while sourcing a cluster-create configuration JSON file, which is how it's done for the databricks-cli and what databricks provides through the GUI.  Reading in the JSON as a Dict fails...

Latest Reply
tseader
New Contributor III
  • 1 kudos

@Retired_mod The structure of the `cluster-create.json` is perfectly fine. The issue, as stated above, is that the SDK does not allow nested structures from the JSON file to be used; instead they need to be cast to spec...
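The casting step being described can be sketched generically in plain Python. The dataclasses below are hypothetical stand-ins for the SDK's typed classes in `databricks.sdk.service.compute` (the SDK's own dataclasses provide similar `from_dict` helpers); the point is that nested dicts must become typed objects before the create call:

```python
from dataclasses import dataclass, fields

@dataclass
class AutoScale:  # stand-in for the SDK's AutoScale
    min_workers: int
    max_workers: int

@dataclass
class ClusterSpec:  # stand-in for the SDK's typed cluster spec
    spark_version: str
    node_type_id: str
    autoscale: AutoScale

def from_nested_dict(cls, d: dict):
    """Recursively build a dataclass from a plain JSON-style dict."""
    kwargs = {}
    for f in fields(cls):
        value = d[f.name]
        kwargs[f.name] = from_nested_dict(f.type, value) if isinstance(value, dict) else value
    return cls(**kwargs)

cfg = {"spark_version": "15.4.x-scala2.12",
       "node_type_id": "i3.xlarge",
       "autoscale": {"min_workers": 1, "max_workers": 4}}
spec = from_nested_dict(ClusterSpec, cfg)
print(spec.autoscale.max_workers)
```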

2 More Replies
praful
by New Contributor II
  • 2944 Views
  • 5 replies
  • 1 kudos

Recover Lost Notebook

Hi Team, I was using Databricks Community Edition for learning purposes. I had an account https://community.cloud.databricks.com/?o=6822095545287159 where I stored all my learning notebooks. Unfortunately, this account suddenly stopped working, and I ...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

The workspace ID you have shared seems to belong to a workspace which is still in a running state. If you have lost login access to this workspace, the team you have reached over email will be able to assist. I will add the following doc for s...

4 More Replies
minhhung0507
by Valued Contributor
  • 1070 Views
  • 7 replies
  • 4 kudos

Resolved! How to reduce cost of "Regional Standard Class A Operations"

Hi Databricks experts, we're experiencing unexpectedly high costs from Regional Standard Class A Operations in GCS while running a Databricks pipeline. The costs seem related to frequent metadata queries, possibly tied to Delta table operations. In las...

Latest Reply
VZLA
Databricks Employee
  • 4 kudos

@minhhung0507 it's hard to say without having more direct insight, but generally speaking, many streaming jobs with very frequent intervals will likely contribute; 300 jobs triggered continuously will also contribute, depending on the use case of these j...

6 More Replies
