Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ijaza0489
by New Contributor
  • 603 Views
  • 1 reply
  • 0 kudos

Best Strategy for Ingesting PostgreSQL Data into Bronze Layer in Databricks

I am designing a data ingestion strategy for ingesting 10 tables from a PostgreSQL 10 database into the Bronze layer using Databricks only (without ADF or other external tools). Full Load: 7 tables will be fully loaded in each run. Incremental Load: 3 ...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hello @ijaza0489, here are key points to keep in mind. Tracking and implementing incremental loads with Delta Lake: utilize Delta Lake for managing incremental loads; it supports ACID transactions and allows you to perform upserts and merges eff...
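For illustration, a minimal PySpark sketch of the merge pattern described above; the table name bronze.customers, key column id, and JDBC connection values are placeholders, not details from the thread:

from delta.tables import DeltaTable

incremental_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://<host>:5432/<db>")  # placeholder connection details
    .option("dbtable", "public.customers")                # one of the 3 incremental tables
    .option("user", "<user>")
    .option("password", "<password>")
    .load()
)

# Upsert the new batch into the Bronze Delta table by primary key
bronze = DeltaTable.forName(spark, "bronze.customers")
(
    bronze.alias("t")
    .merge(incremental_df.alias("s"), "t.id = s.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)

In practice the incremental read would also filter on a watermark column (e.g. an updated_at timestamp) so only changed rows are pulled from PostgreSQL.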

mgallagher
by New Contributor
  • 483 Views
  • 1 reply
  • 0 kudos

Limit access to certain pages of a dashboard

Hello, I would like to know if it is possible to restrict or limit access to certain pages of a multipage dashboard based on the user's group membership. In other words, the dashboard itself can be accessed by all, with some pages visible to all...

Data Engineering
access
dashboard
filter
group
possible
Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @mgallagher, Databricks does not natively support page-level access control within a single dashboard, but you can create separate dashboards for different user groups and control access at the dashboard level. This means creating a main dashboard acc...

John_Rotenstein
by New Contributor II
  • 22560 Views
  • 10 replies
  • 5 kudos

Retrieve job-level parameters in Python

Parameters can be passed to Tasks and the values can be retrieved with dbutils.widgets.get("parameter_name"). More recently, we have been given the ability to add parameters to Jobs. However, the parameters cannot be retrieved like Task parameters. Quest...

Latest Reply
lprevost
Contributor II
  • 5 kudos

The only thing that has worked for me consistently in Python is params = dbutils.widgets.getAll(), where an empty dictionary is returned if I'm in interactive mode and the job/task params are returned if they are present.
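A short usage sketch of that pattern; the parameter name my_job_param is hypothetical:

# Returns {} in interactive mode, or a dict of job/task parameters in a job run
params = dbutils.widgets.getAll()
value = params.get("my_job_param", "default_value")  # fall back when running interactively
print(value)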

9 More Replies
msgrac
by New Contributor II
  • 1365 Views
  • 2 replies
  • 0 kudos

Can't remove file on ADLS using dbutils.fs.rm because URL contains illegal character

The URL contains a "[", and I've tried to encode the path from "[" to "%5B%27", but it didn't work:

from urllib.parse import quote
path = ""
encoded_path = quote(path)

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Try these earlier threads: https://community.databricks.com/t5/data-engineering/how-can-i-delete-a-file-in-dbfs-with-illegal-character/td-p/9755 and https://community.databricks.com/t5/data-engineering/using-dbutils-fs-ls-on-uri-with-square-brackets-results-in-error/td-p/6928

1 More Replies
JrV
by New Contributor
  • 1000 Views
  • 2 replies
  • 0 kudos

SPARQL and RDF data

Hello Databricks Community, does anyone have experience with running SPARQL (https://en.wikipedia.org/wiki/SPARQL) queries in Databricks? Make a connection to the Community SolidServer (https://github.com/CommunitySolidServer/CommunitySolidServer) and que...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

You can use the rdflib library to connect to the Community SolidServer and execute SPARQL queries:

from rdflib import Graph
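Expanding that fragment into a minimal, self-contained sketch; the resource URL and the query are hypothetical, and rdflib must be installed on the cluster first (e.g. %pip install rdflib):

from rdflib import Graph

g = Graph()
# Fetch and parse an RDF resource; replace with a resource on your Solid server
g.parse("https://example.org/profile/card")

# Run a simple SPARQL query over the parsed graph
results = g.query("""
    SELECT ?s ?p ?o
    WHERE { ?s ?p ?o }
    LIMIT 10
""")
for row in results:
    print(row)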

1 More Replies
databrick3
by New Contributor
  • 368 Views
  • 2 replies
  • 0 kudos

R model deployment

Unable to serve an R model on Databricks; even tried with pyfunc.

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

You can try Posit Connect (https://posit.co/blog/databricks-udfs/). See also https://www.databricks.com/blog/databricks-and-posit-announce-new-integrations

1 More Replies
shreya_20202
by New Contributor II
  • 4620 Views
  • 1 reply
  • 1 kudos

Copy file structure including files from one storage to another incrementally using PySpark

I have a storage account dexflex and two containers, source and destination. The source container has directories and files as below:
results
  search
    03
      Module19111.json
      Module19126.json
    04
      Module11291...

Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

Is this directory structure a partitioned table? 
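For anyone attempting this while the thread is open, a minimal sketch of one possible approach (an assumption, not part of this reply): recursively copy only the files missing at the destination with dbutils.fs, preserving the directory structure. The container paths follow the question above; the helper functions are hypothetical.

src = "abfss://source@dexflex.dfs.core.windows.net/results"
dst = "abfss://destination@dexflex.dfs.core.windows.net/results"

def exists(path):
    # dbutils.fs.ls raises if the path does not exist
    try:
        dbutils.fs.ls(path)
        return True
    except Exception:
        return False

def copy_new_files(src_dir, dst_dir):
    for entry in dbutils.fs.ls(src_dir):
        target = dst_dir.rstrip("/") + "/" + entry.name
        if entry.isDir():
            copy_new_files(entry.path, target)   # recurse into subdirectories
        elif not exists(target):
            dbutils.fs.cp(entry.path, target)    # copy only files not yet present

copy_new_files(src, dst)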

AtanuC
by New Contributor
  • 11782 Views
  • 1 reply
  • 0 kudos

OOP programming in PySpark on the Databricks platform

Hello experts, I have a doubt, so I need your advice and opinion on the query below. Is OOP a good choice of programming for distributed data processing, like PySpark on the Databricks platform? If not, then what is, and what kind of challenges could b...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Functional programming is generally better suited for distributed data processing with PySpark on Databricks due to its emphasis on immutability, stateless operations, and higher-order functions. These features align well with Spark's execution model...
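A small illustration of that functional style: stateless, chained transformations over immutable DataFrames rather than mutable objects. The table and column names below come from the Databricks sample datasets (assumed available in your workspace) and serve only as an example:

from pyspark.sql import functions as F

result = (
    spark.read.table("samples.nyctaxi.trips")
    .filter(F.col("trip_distance") > 0)
    .withColumn("fare_per_mile", F.col("fare_amount") / F.col("trip_distance"))
    .groupBy("pickup_zip")
    .agg(F.avg("fare_per_mile").alias("avg_fare_per_mile"))
)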

kazinahian
by New Contributor III
  • 4167 Views
  • 2 replies
  • 1 kudos

How can I create a new calculated field in Databricks using PySpark?

Hello, great people. I am new to Databricks and learning PySpark. How can I create a new column called "sub_total", where I group by "category", "subcategory", and "monthly" sales value? Appreciate your help.

Data Engineering
calculation
Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

"I want to group by "category", "subcategory" and "monthly" sales value."

sub_total_df = df.groupBy("category", "subcategory", "monthly").agg(sum("sales_value").alias("sub_total"))

You could always type your query in the Databricks notebook, by clic...
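Note that the snippet above needs sum imported from pyspark.sql.functions (otherwise Python's built-in sum is used and the call fails). A complete version:

from pyspark.sql import functions as F

sub_total_df = df.groupBy("category", "subcategory", "monthly").agg(
    F.sum("sales_value").alias("sub_total")
)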

1 More Replies
Divyanshu
by New Contributor
  • 5186 Views
  • 1 reply
  • 0 kudos

java.lang.ArithmeticException: long overflow exception while writing to table | PySpark

Hey, I am trying to fetch data from Mongo and write to a Databricks table. I have read the data from Mongo using the pymongo library, then flattened the nested struct objects along with renaming columns (since there were a few duplicates), and then writing to databrick...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Sorry, I am not going through the entire schema and code, but in general the error "java.lang.ArithmeticException: long overflow" typically occurs when a calculation exceeds the range that can be represented by a long data type in Java. This issue ca...
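One common mitigation (a sketch, not specific to this thread's schema): cast any column whose values can exceed the 64-bit long range to a decimal before writing.

from pyspark.sql import functions as F

# "big_number" is a hypothetical column name
df = df.withColumn("big_number", F.col("big_number").cast("decimal(38,0)"))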

sanjay
by Valued Contributor II
  • 15340 Views
  • 2 replies
  • 0 kudos

PySpark dropDuplicates performance issue

Hi, I am trying to delete duplicate records found by key, but it's very slow. It's a continuously running pipeline, so the data is not that huge, but it still takes time to execute this command: df = df.dropDuplicates(["fileName"]). Is there any better approach to d...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Before dropDuplicates, ensure that your DataFrame operations are optimized by caching intermediate results if they are reused multiple times. This can help reduce the overall execution time. We could use some aggregates and grouping, like df_deduped ...
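For reference, a sketch of that grouping approach (an assumption about the truncated code, not a verbatim continuation): keep one row per fileName with a window function, ordering by whichever column decides which duplicate wins.

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# "ingest_time" is a hypothetical ordering column; replace with your own
w = Window.partitionBy("fileName").orderBy(F.col("ingest_time").desc())
df_deduped = (
    df.withColumn("rn", F.row_number().over(w))
    .filter(F.col("rn") == 1)
    .drop("rn")
)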

1 More Replies
Phani1
by Valued Contributor II
  • 3229 Views
  • 2 replies
  • 2 kudos

Execute Pyspark cells concurrently

Hi Team, is it feasible to run PySpark cells concurrently in Databricks notebooks? If so, kindly provide instructions on how to accomplish this. We aim to execute the intermediate steps simultaneously. The given scenario entails the simultaneou...

Latest Reply
NandiniN
Databricks Employee
  • 2 kudos

Databricks also supports executing SQL cells in parallel. While a command is running and your notebook is attached to an interactive cluster, you can run a SQL cell simultaneously with the current command. The SQL cell is executed in a new, parallel ...
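For Python cells specifically, one common workaround (an assumption, not part of this reply) is to launch independent Spark actions from threads inside a single cell; Spark schedules the resulting jobs concurrently.

from concurrent.futures import ThreadPoolExecutor

def step(table_name):
    # Each call triggers its own Spark job
    return spark.read.table(table_name).count()

tables = ["bronze.orders", "bronze.customers"]  # hypothetical table names
with ThreadPoolExecutor(max_workers=2) as pool:
    counts = list(pool.map(step, tables))
print(counts)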

1 More Replies
CM2
by New Contributor
  • 1902 Views
  • 1 reply
  • 0 kudos

Data transfer from AWS/Databricks to GEO repository via FTP

Does anyone have a Python script that runs in Databricks to transfer RNAseq data stored in an AWS bucket to a public repository (GEO)? All my attempts failed; it looks like the connection between Databricks and GEO isn't working as expected.

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

What is the exact error that you face? We can debug from there. I see there are some steps shared on GEO submissions https://www.ncbi.nlm.nih.gov/geo/info/submissionftp.html   
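As a starting point, a minimal ftplib sketch for the upload step; the host, credentials, folder, and file names are placeholders to be replaced with the values from GEO's submission instructions linked above, and the file is assumed to have been copied from the AWS bucket to DBFS first.

import ftplib

with ftplib.FTP("<geo_ftp_host>") as ftp:
    ftp.login(user="<geo_username>", passwd="<geo_password>")
    ftp.cwd("uploads/<your_submission_folder>")
    # Read the file via the /dbfs fuse mount and upload it in binary mode
    with open("/dbfs/tmp/sample_rnaseq.fastq.gz", "rb") as f:
        ftp.storbinary("STOR sample_rnaseq.fastq.gz", f)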

drag7ter
by Contributor
  • 2279 Views
  • 1 reply
  • 0 kudos

Bootstrap cluster timeout for job pipeline - Databricks bug?

From time to time we have these errors in scheduled PROD runs. It happens when the job starts and tries to create a one-time cluster. It happens once in every 10-20 runs, and we are not able to identify the root cause, as all network connectivity is fine, some...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

The error message "BOOTSTRAP_TIMEOUT (SERVICE_FAULT)" indicates that the cluster was terminated because it took too long to initialize. This can happen due to various reasons, including network connectivity issues between the data plane and the contr...

Mike_Szklarczyk
by Contributor
  • 387 Views
  • 1 reply
  • 2 kudos

Resolved! Retrieve information about table clustering from information_schema

Hi guys, I wonder if and when it will be possible to extract from information_schema how a table is clustered. I know that analogous information can be obtained when a table is partitioned, using this query: SELECT * FROM cdl_dev.information_schema.c...

Latest Reply
NandiniN
Databricks Employee
  • 2 kudos

It is not possible to extract clustering information of a table directly from information_schema. The information_schema.columns table can provide details about partitioning, but similar information for clustering is not available through the i...
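One possible workaround (an assumption, not stated in the reply): for Delta tables, DESCRIBE DETAIL exposes a clusteringColumns field that can be queried from a notebook.

# "cdl_dev.my_schema.my_table" is a hypothetical table name
detail = spark.sql("DESCRIBE DETAIL cdl_dev.my_schema.my_table")
display(detail.select("clusteringColumns"))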

