Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

flamezi2
by New Contributor
  • 4155 Views
  • 1 replies
  • 0 kudos

Invalid request when using the Manual generation of an account-level access token

I need to generate an access token using the REST API and was using the guide seen here: manually-generate-an-account-level-access-token. When I try this cURL in Postman, I get an error but the error description is not helpful. Error: I don't know what I'm missi...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Are you replacing the Account_id with your actual account id associated with your subscription? Also what token are you using to authenticate or run this API call?
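
For reference, the manual flow in that guide boils down to a single OAuth token request. A minimal Python sketch, assuming a service principal's OAuth client ID and secret; all placeholder values are illustrative, and the accounts host differs per cloud:

import requests

# Hypothetical placeholders: substitute your own account ID and
# the service principal's OAuth client ID / secret.
ACCOUNT_ID = "<databricks-account-id>"
CLIENT_ID = "<service-principal-client-id>"
CLIENT_SECRET = "<oauth-secret>"

# Account-level OAuth token endpoint (AWS accounts host shown;
# Azure and GCP accounts use their own hosts).
url = f"https://accounts.cloud.databricks.com/oidc/accounts/{ACCOUNT_ID}/v1/token"

resp = requests.post(
    url,
    auth=(CLIENT_ID, CLIENT_SECRET),  # HTTP basic auth with the SP credentials
    data={"grant_type": "client_credentials", "scope": "all-apis"},
)
resp.raise_for_status()
print(resp.json()["access_token"])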

  • 0 kudos
GodSpeed
by New Contributor
  • 3700 Views
  • 1 replies
  • 0 kudos

Postman Collection Alternatives for Data-Centric API Management?

I’ve been using Postman collections to manage APIs in my data projects, but I’m exploring alternatives. Are there tools like Apidog or Insomnia that perform better for API management, particularly when working with large data sets or data-driven work...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Insomnia is another strong alternative that is frequently recommended. It is known for its simplicity and effectiveness in making REST API requests. Insomnia supports the import of Postman collections and is praised for its performance and ...

  • 0 kudos
Jcowell
by New Contributor II
  • 773 Views
  • 2 replies
  • 0 kudos

Is the 'Limit input rate' documentation incorrect?

In the Databricks docs it says "If you use maxBytesPerTrigger in conjunction with maxFilesPerTrigger, the micro-batch processes data until either the maxFilesPerTrigger or maxBytesPerTrigger limit is reached." But based on the source code this is not true...

Latest Reply
ozaaditya
Contributor
  • 0 kudos

In my opinion, the reason for not using both options simultaneously is that the framework would face a logical conflict: should it stop reading after the maximum number of files is reached, even if the size limit hasn't been exceeded? Or should it stop ...
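
For reference, a minimal sketch of an Auto Loader stream with both limits set (paths and format are placeholders); the thread above is about which of the two actually ends the micro-batch:

# Minimal sketch: both rate limits set on the same Auto Loader stream.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.schemaLocation", "/tmp/schema_loc")   # placeholder
      .option("cloudFiles.maxFilesPerTrigger", 1000)   # cap on files per micro-batch
      .option("cloudFiles.maxBytesPerTrigger", "10g")  # soft cap on bytes per micro-batch
      .load("/tmp/landing"))                           # placeholder source path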

  • 0 kudos
1 More Replies
KrzysztofPrzyso
by New Contributor III
  • 6858 Views
  • 1 replies
  • 3 kudos

Best Practices for Copying Data Between Environments

Hi everyone, I'd like to start a discussion about the best practices for copying data between environments. Here's a typical setup: Environment setup: the same region and metastore (Unity Catalog) is used across environments. Each environment has a singl...

Latest Reply
Sidhant07
Databricks Employee
  • 3 kudos

Using CTAS (CREATE TABLE AS SELECT) might be a more robust solution for your use case:
  • Independence: CTAS creates a new, independent copy of the data, avoiding dependencies on the source table.
  • Simplified access control: Access rights can be managed so...
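
For illustration, a minimal sketch of that CTAS approach across Unity Catalog environments; the catalog, schema, and table names below are hypothetical:

# Minimal sketch of CTAS between Unity Catalog environments.
# The new table is an independent copy with no dependency on the source.
spark.sql("""
    CREATE OR REPLACE TABLE dev_catalog.sales.orders
    AS SELECT * FROM prod_catalog.sales.orders
""")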

  • 3 kudos
arthurburkhardt
by New Contributor
  • 4678 Views
  • 2 replies
  • 1 kudos

Auto Loader changes the order of columns when inferring JSON schema (sorted lexicographically)

We are using Auto Loader to read JSON files from S3 and ingest data into the bronze layer. But it seems Auto Loader struggles with schema inference and, instead of preserving the order of columns from the JSON files, it sorts them lexicographically. Fo...

Data Engineering
auto.loader
json
schema
Latest Reply
Sidhant07
Databricks Employee
  • 1 kudos

Auto Loader's default behavior of sorting columns lexicographically during schema inference is indeed a limitation when preserving the original order of JSON fields is important. Unfortunately, there isn't a built-in option in Auto Loader to maintain...
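
A common workaround is to skip inference and pass the schema explicitly, so the declared column order is preserved. A minimal sketch, where the field names and S3 path are placeholders:

from pyspark.sql.types import StructType, StructField, StringType, LongType

# Hypothetical schema: the declared field order is kept because
# inference (and its lexicographic sort) is bypassed entirely.
explicit_schema = StructType([
    StructField("id", LongType()),
    StructField("event_name", StringType()),
    StructField("payload", StringType()),
])

df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .schema(explicit_schema)           # explicit schema instead of inference
      .load("s3://my-bucket/landing/"))  # placeholder path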

  • 1 kudos
1 More Replies
simple89
by New Contributor
  • 3536 Views
  • 1 replies
  • 0 kudos

Runtime increases exponentially from 11.3 to 13.3

Hello. I am using R on Databricks with the approach below. My Spark version: Single node: i3.2xlarge · On-demand · DBR: 11.3 LTS (includes Apache Spark 3.3.0, Scala 2.12) · us-east-1a; the job takes 1 hour. I install all R packages (including a geo...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hello! It's possible that the increase in runtime when upgrading from Spark 3.3.0 (DBR 11.3) to Spark 3.4.1 (DBR 13.3) is due to changes in the underlying R runtime or package versions. When you upgrade to a new version of Spark, the R packages that ...

  • 0 kudos
rcostanza
by New Contributor III
  • 3904 Views
  • 1 replies
  • 1 kudos

Changing a Delta Live Table's schema

I have a Delta Live Table whose source is a Kafka stream. One of the columns is a Decimal and I need to change its precision. What's the correct approach to changing the DLT's schema? Just changing the column's precision in the DLT definition will resu...

Latest Reply
Sidhant07
Databricks Employee
  • 1 kudos

To change the precision of a Decimal column in a Delta Live Table (DLT) with a Kafka stream source, you can follow these steps:
1. Create a new column in the DLT with the desired precision.
2. Copy the data from the old column to the new column.
3. Dro...
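
A different route that is often simpler in DLT is to cast to the new precision directly in the table definition and then trigger a full refresh so existing rows are rewritten. A minimal sketch, with a hypothetical broker, topic, and column name:

import dlt
from pyspark.sql.functions import get_json_object, col

# Minimal sketch: cast to the new precision inside the DLT definition.
# Broker, topic, and field names are hypothetical; changing a declared
# Decimal type generally requires a full refresh so existing data is rewritten.
@dlt.table(name="orders_silver")
def orders_silver():
    return (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
        .option("subscribe", "orders")                      # placeholder topic
        .load()
        .selectExpr("CAST(value AS STRING) AS raw")
        .select(get_json_object(col("raw"), "$.amount").cast("decimal(18,4)").alias("amount"))
    )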

  • 1 kudos
lprevost
by Contributor II
  • 3822 Views
  • 1 replies
  • 0 kudos

sampleBy stream in DLT

I would like to create a sampleBy (stratified version of sample) copy/clone of my delta table. Ideally, I'd like to do this using a DLT. My source table grows incrementally each month as batch files are added and autoloader picks them up. Id...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

You can create a stratified sample of your delta table using the `sampleBy` function in Databricks. However, DLT does not support the `sampleBy` function directly. To work around this, you can create a notebook that uses the `sampleBy` function to c...
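
A minimal sketch of that notebook workaround, with hypothetical table names, strata column, and fractions:

# Minimal sketch: stratified sample written to a separate Delta table.
src = spark.read.table("main.bronze.events")        # placeholder source table

# Keep 10% of rows for each value of the strata column.
fractions = {"A": 0.1, "B": 0.1, "C": 0.1}
sampled = src.sampleBy("category", fractions=fractions, seed=42)

sampled.write.mode("overwrite").saveAsTable("main.bronze.events_sample")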

  • 0 kudos
YOUKE
by New Contributor III
  • 1691 Views
  • 4 replies
  • 1 kudos

Resolved! Managed Tables on Azure databricks

Hi everyone, I was trying to understand: when a managed table is created, Databricks stores the metadata in the Hive metastore and the data in the cloud storage managed by it, which in the case of Azure Databricks will be an Azure Storage Account. But...

Latest Reply
BraydenJordan
New Contributor II
  • 1 kudos

Thank you so much for the solution.

  • 1 kudos
3 More Replies
Gusman
by New Contributor II
  • 650 Views
  • 1 replies
  • 1 kudos

Resolved! How to send BINARY parameters using the REST Sql API?

We are trying to send a SQL query to the REST API including a BINARY parameter, e.g. "INSERT INTO MyTable (BinaryField) VALUES(:binaryData)". We tried to encode the parameter as base64 and specify that it is a BINARY type, but it throws a mapping error; if w...

Latest Reply
cgrant
Databricks Employee
  • 1 kudos

Trying to serialize as binary can be pretty challenging; here's a way to do this with base64. The trick is to serialize as a base64 string and insert as binary with unbase64. databricks api post /api/2.0/sql/statements --json '{ "warehouse_id": "wa...
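
Expanded into a small Python sketch, with placeholder host, token, and warehouse ID: the bytes are sent as a base64-encoded STRING parameter and decoded to BINARY server-side with unbase64.

import base64
import requests

# Placeholders: substitute your workspace host, token, and warehouse ID.
HOST = "https://<workspace-host>"
TOKEN = "<personal-access-token>"
WAREHOUSE_ID = "<warehouse-id>"

payload_b64 = base64.b64encode(b"\x00\x01\x02binary payload").decode()

resp = requests.post(
    f"{HOST}/api/2.0/sql/statements",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "warehouse_id": WAREHOUSE_ID,
        # Send the bytes as a base64 STRING parameter and decode to BINARY in SQL.
        "statement": "INSERT INTO MyTable (BinaryField) VALUES (unbase64(:binaryData))",
        "parameters": [{"name": "binaryData", "value": payload_b64, "type": "STRING"}],
    },
)
resp.raise_for_status()
print(resp.json()["status"])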

  • 1 kudos
AcrobaticMonkey
by New Contributor II
  • 1061 Views
  • 2 replies
  • 0 kudos

Alerts for Failed Queries in Databricks

How can we set up automated alerts to notify us when queries executed by a specific service principal fail in Databricks?

Latest Reply
AcrobaticMonkey
New Contributor II
  • 0 kudos

@Alberto_Umana Our service principal uses the SQL Statement API to execute queries. We want to receive notifications for each query failure. While SQL Alerts are an option, they do not provide immediate responses. Is there a better solution to achieve...
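
One option, sketched below with a hypothetical host, token, and webhook URL, is to check each statement's status in the API response and post a notification whenever it comes back FAILED:

import requests

# Placeholders: workspace host, service principal token, and an
# incoming-webhook URL (Slack/Teams/etc.) are all hypothetical.
HOST = "https://<workspace-host>"
TOKEN = "<service-principal-token>"
WEBHOOK_URL = "https://hooks.example.com/notify"

def run_and_alert(statement: str, warehouse_id: str) -> dict:
    """Run a statement synchronously and post a notification if it fails."""
    resp = requests.post(
        f"{HOST}/api/2.0/sql/statements",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"warehouse_id": warehouse_id, "statement": statement, "wait_timeout": "30s"},
    )
    resp.raise_for_status()
    result = resp.json()
    if result["status"]["state"] == "FAILED":
        error = result["status"].get("error", {})
        requests.post(WEBHOOK_URL, json={"text": f"Query failed: {error.get('message')}"})
    return result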

  • 0 kudos
1 More Replies
seanstachff
by New Contributor II
  • 1198 Views
  • 5 replies
  • 0 kudos

Resolved! Using FROM_CSV giving unexpected results

Hello, I am trying to use from_csv in the SQL warehouse, but I am getting unexpected results. As a small example I am running: WITH your_table AS ( SELECT 'a,b,c\n1,"hello, world",3.14\n2,"goodbye, world",2.71' AS csv_column ) SELECT from_csv(csv_c...

Latest Reply
Takuya-Omi
Valued Contributor III
  • 0 kudos

@seanstachff Here is the code I used to produce the results shown in the image I shared earlier. It's a bit verbose, so I’m not entirely satisfied with it, but I hope it might provide some helpful insights for you. %sql WITH your_table AS ( -- Examp...
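
For comparison, here is one way to handle the same multi-row CSV string in PySpark rather than pure SQL; it is only an illustrative sketch that mirrors the example data above:

from pyspark.sql import functions as F

# Illustrative sketch: split the multi-row CSV string into lines,
# drop the header line, then parse each line with an explicit schema.
csv_value = 'a,b,c\n1,"hello, world",3.14\n2,"goodbye, world",2.71'
df = spark.createDataFrame([(csv_value,)], ["csv_column"])

parsed = (
    df.select(F.explode(F.split("csv_column", "\n")).alias("line"))
      .filter(F.col("line") != "a,b,c")                           # drop the header row
      .select(F.from_csv("line", "a INT, b STRING, c DOUBLE").alias("row"))
      .select("row.*")
)
parsed.show(truncate=False)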

  • 0 kudos
4 More Replies
martkev
by New Contributor
  • 3676 Views
  • 1 replies
  • 0 kudos

Networking Setup in Standard Tier – VNet Integration and Proxy Issues

Hi everyone, We are working on an order forecasting model using Azure Databricks and an ML model from Hugging Face, and we are running into an issue where the connection over SSL (port 443) fails during the handshake (EOF Error SSL 992). We suspect that a...

Latest Reply
arjun_kr
Databricks Employee
  • 0 kudos

It may depend on your UDR setup. If you have a UDR rule routing the traffic to a firewall appliance, the failure may be related to traffic not being allowed through the firewall. If there is no UDR, or the UDR rule routes this traffic to the Internet, it wou...

  • 0 kudos
Anonymous
by Not applicable
  • 20026 Views
  • 8 replies
  • 14 kudos

Resolved! MetadataChangedException

A Delta Lake table is created with an identity column and I'm not able to load the data in parallel from four processes; I'm getting the metadata exception error. I don't want to load the data into a temp table. I need to load it directly and in parallel into the delta...

Latest Reply
cpc0707
New Contributor II
  • 14 kudos

I'm having the same issue: I need to load a large amount of data from separate files into a Delta table, and I want to do it with a for-each loop so I don't have to run it sequentially, which would take days. There should be a way to handle this.
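
One pattern that can help with transient commit conflicts is a retry with backoff around each write; it does not remove the underlying limitation that identity-column tables serialize their metadata updates. A minimal sketch with hypothetical names, assuming the delta-spark Python exception classes are available on your runtime:

import time
from delta.exceptions import ConcurrentWriteException, MetadataChangedException

# Hypothetical retry wrapper around an append to a Delta table.
# It only smooths over transient commit conflicts; writes to a table
# with an identity column still effectively serialize.
def append_with_retry(df, table_name: str, max_attempts: int = 5) -> None:
    for attempt in range(1, max_attempts + 1):
        try:
            df.write.format("delta").mode("append").saveAsTable(table_name)
            return
        except (ConcurrentWriteException, MetadataChangedException):
            if attempt == max_attempts:
                raise
            time.sleep(2 ** attempt)  # exponential backoff before retrying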

  • 14 kudos
7 More Replies
Ulman
by New Contributor II
  • 5720 Views
  • 9 replies
  • 1 kudos

Switching to File Notification Mode with ADLS Gen2 - Encountering StorageException

Hello, We are currently utilizing an autoloader with file listing mode for a stream, which is experiencing significant latency due to the non-incremental naming of files in the directory, a condition that cannot be altered. In an effort to mitigate this...

Data Engineering
ADLS gen2
autoloader
file notification mode
Latest Reply
Rah_Cencora
New Contributor II
  • 1 kudos

You should also reevaluate your use of premium storage for your landing area files. Typically, storage for raw files does not need to be the fastest, most resilient, and most expensive tier. Unless you have a compelling reason for premium storage for la...

  • 1 kudos
8 More Replies
