Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

vishwanath_1
by New Contributor III
  • 3282 Views
  • 4 replies
  • 1 kudos

Reading a 130 GB CSV file with multiLine=true takes 4 hours

Reading the 130 GB file without multiLine=true takes 6 minutes, but my file has multi-line records. How can I speed up the read here? I am using the command below: InputDF=spark.read.option("delimiter","^").option("header",false).option("encoding","UTF-8"...
Latest Reply
Lakshay
Databricks Employee
  • 1 kudos

Hi @vishwanath_1, can you try setting the config below and see if it resolves the issue? set spark.databricks.sql.csv.edgeParserSplittable=true;
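For reference, a minimal sketch of the read with the suggested config applied (the path is a placeholder; the config name comes from the reply above and its availability may depend on the DBR version):

```python
# Sketch based on this thread: multiLine=true normally makes a CSV file
# non-splittable, so a single task ends up reading the whole 130 GB file.
# The suggested config is meant to let the parser split multi-line files.
spark.sql("SET spark.databricks.sql.csv.edgeParserSplittable=true")

input_df = (
    spark.read
    .option("delimiter", "^")
    .option("header", "false")
    .option("encoding", "UTF-8")
    .option("multiLine", "true")          # records contain embedded newlines
    .csv("s3://my-bucket/path/to/file.csv")  # hypothetical path
)
```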
3 More Replies
vishu4rall
by New Contributor II
  • 721 Views
  • 4 replies
  • 0 kudos

Copy files from an Azure file share to an S3 bucket

Kindly help us with code to upload a text/CSV file from an Azure file share to an S3 bucket.
Latest Reply
gchandra
Databricks Employee
  • 0 kudos

Did you try using azcopy?  https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10?tabs=dnf

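Note that AzCopy is generally documented for copying from S3 into Azure rather than the reverse, so an SDK-based sketch may be closer to what was asked. A minimal version, assuming the azure-storage-file-share and boto3 packages and placeholder names throughout:

```python
import io

import boto3
from azure.storage.fileshare import ShareFileClient

# Sketch: stream one file from an Azure file share into an S3 bucket.
# Connection string, share/file names, bucket, and key are all placeholders.
file_client = ShareFileClient.from_connection_string(
    conn_str="<azure-storage-connection-string>",
    share_name="myshare",
    file_path="exports/data.csv",
)

buffer = io.BytesIO()
file_client.download_file().readinto(buffer)  # fine for small/medium files
buffer.seek(0)

s3 = boto3.client("s3")  # uses your configured AWS credentials
s3.upload_fileobj(buffer, Bucket="my-bucket", Key="exports/data.csv")
```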
3 More Replies
LeoGaller
by New Contributor II
  • 5078 Views
  • 3 replies
  • 1 kudos

What are the options for "spark_conf.spark.databricks.cluster.profile"?

Hey guys, I'm trying to find out what options we can pass to spark_conf.spark.databricks.cluster.profile. Looking around, I know some of the available values are singleNode and serverless, but are there others? Where is the documentation for it?...
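For illustration, here is how the singleNode profile is typically set on a cluster spec; the other values asked about are not well documented, and the names below are examples:

```python
# Sketch of a Clusters API payload for a single-node cluster, where the
# profile is set alongside the other settings single-node mode requires.
cluster_spec = {
    "cluster_name": "single-node-demo",   # hypothetical name
    "spark_version": "15.4.x-scala2.12",  # example runtime
    "node_type_id": "i3.xlarge",          # example node type
    "num_workers": 0,                     # single node: no workers
    "spark_conf": {
        "spark.databricks.cluster.profile": "singleNode",
        "spark.master": "local[*]",       # required with singleNode
    },
    "custom_tags": {"ResourceClass": "SingleNode"},
}
```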
lprevost
by Contributor II
  • 1290 Views
  • 5 replies
  • 0 kudos

Large/complex Incremental Autoloader Job -- Seeking Experience on approach

I'm experimenting with several approaches to implement an incremental Auto Loader query, either in DLT or in a pipeline job. The complexities: moving approximately 30B records from a nasty set of nested folders on S3 in several thousand CSV files. ...
Latest Reply
lprevost
Contributor II
  • 0 kudos

Crickets....

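Since the thread got no answers, here is a minimal sketch of the incremental Auto Loader pattern being described (bucket, glob, schema/checkpoint paths, and table name are placeholders; `spark` is provided by the Databricks runtime):

```python
# Sketch: incremental CSV ingestion with Auto Loader, run as a triggered
# ("available now") batch so each run picks up only files not yet processed.
(
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.schemaLocation", "s3://my-bucket/_schemas/my_table")
    .load("s3://my-bucket/nested/folders/*/*.csv")   # glob over nested dirs
    .writeStream
    .option("checkpointLocation", "s3://my-bucket/_checkpoints/my_table")
    .trigger(availableNow=True)                      # batch-style incremental run
    .toTable("main.default.my_table")
)
```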
4 More Replies
lprevost
by Contributor II
  • 390 Views
  • 1 reply
  • 0 kudos

Using GraphFrames on DLT job

I am trying to run a DLT job that uses GraphFrames, which is in the ML standard image. I am using it successfully in my job compute instances. Here are my overrides for the standard job compute policy: {"spark_version": {"type": "unlimited","defau...
Latest Reply
lprevost
Contributor II
  • 0 kudos

Crickets ....

lprevost
by Contributor II
  • 620 Views
  • 2 replies
  • 0 kudos

GraphFrames and DLT

I am trying to run a DLT job that uses GraphFrames, which is in the ML standard image. I am using it successfully in my job compute instances but I'm running into problems trying to use it in a DLT job. Here are my overrides for the standard job c...
Latest Reply
lprevost
Contributor II
  • 0 kudos

Crickets .....

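One unverified thing that might be worth trying: DLT supports notebook-scoped %pip installs, though GraphFrames also needs its JVM package on the cluster, which the DLT runtime does not ship, so this alone may not be enough:

```python
# Unverified sketch: notebook-scoped install of the GraphFrames Python
# bindings in a DLT pipeline notebook. The matching Scala/JVM package is
# not included in the DLT runtime, so imports may still fail at run time.
%pip install graphframes
```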
1 More Replies
Valentin1
by New Contributor III
  • 8436 Views
  • 6 replies
  • 3 kudos

Delta Live Tables Incremental Batch Loads & Failure Recovery

Hello Databricks community, I'm working on a pipeline and would like to implement a common use case using Delta Live Tables. The pipeline should include the following steps: incrementally load data from Table A as a batch; if the pipeline has previously...
Latest Reply
lprevost
Contributor II
  • 3 kudos

I totally agree that this is a gap in the Databricks solution. This gap exists between a static read and real-time streaming. My problem (and I suspect there are many use cases) is that I have slowly changing data coming into structured folders via ...
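For what it's worth, a minimal sketch of the incremental-batch pattern being discussed, using a streaming read in a triggered DLT pipeline so each run processes only new data and a failed run resumes from its checkpoint (table names are placeholders):

```python
import dlt

# Sketch: a streaming read of Table A gives incremental, exactly-once batch
# semantics when the pipeline runs in triggered mode; checkpoints handle
# failure recovery automatically.
@dlt.table(name="table_b")
def table_b():
    return spark.readStream.table("main.default.table_a")  # hypothetical source
```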
5 More Replies
Octavian1
by Contributor
  • 1187 Views
  • 2 replies
  • 1 kudos

Path of artifacts not found error in pyfunc.load_model using pyfunc wrapper

Hi, for a PySpark model, which also involves a pipeline, and which I want to register with MLflow, I am using a pyfunc wrapper. Steps I followed: 1. Pipeline and model serialization and logging (using a Volume locally; the logging will be performed in dbfs...
Latest Reply
pikapika
New Contributor II
  • 1 kudos

I was stuck with the same issue, but I managed to load it (I was looking to serve it using Model Serving as well). One thing I noticed is that we can use mlflow.create_experiment() at the beginning and specify the default artifact location parameter as D...
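A sketch of that suggestion (experiment name and artifact location are placeholders, and the wrapper below is a trivial stand-in for the one in the thread):

```python
import mlflow


class Passthrough(mlflow.pyfunc.PythonModel):
    # Hypothetical stand-in for the thread's pyfunc wrapper.
    def predict(self, context, model_input):
        return model_input


# Create the experiment first so its default artifact location is somewhere
# pyfunc.load_model can resolve later.
experiment_id = mlflow.create_experiment(
    name="/Shared/pyspark-pyfunc-demo",                       # placeholder
    artifact_location="dbfs:/Volumes/main/default/mlflow_artifacts",  # placeholder
)
mlflow.set_experiment(experiment_id=experiment_id)

with mlflow.start_run():
    mlflow.pyfunc.log_model(artifact_path="model", python_model=Passthrough())
```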
1 More Replies
KristiLogos
by Contributor
  • 1966 Views
  • 9 replies
  • 4 kudos

Resolved! Load parent columns and not unnest using pyspark? Found invalid character(s) ' ,;{}()\n' in schema

I'm not sure I'm doing this correctly, but I'm having some issues with the column names when I try to load to a table in our Databricks catalog. I have multiple .json.gz files in our blob container that I want to load to a table: df = spark.read.opti...
Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 4 kudos

Hi @KristiLogos, check whether your JSON keys contain any of the characters listed in the error message.
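A sketch of one way to work around the error by renaming top-level columns before writing (path and table name are placeholders; the character set comes from the error message):

```python
import re

# Sketch: replace the characters Delta rejects in column names
# (' ,;{}()\n\t=') with underscores, then save as a table.
df = spark.read.option("multiline", "true").json("dbfs:/mnt/container/*.json.gz")
clean = df.toDF(*[re.sub(r"[ ,;{}()\n\t=]", "_", c) for c in df.columns])
clean.write.mode("overwrite").saveAsTable("main.default.events")  # placeholder
```

Note this only renames parent (top-level) columns; nested struct fields would need a recursive rename or an explicit select with aliases.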
8 More Replies
wendyl
by New Contributor II
  • 1000 Views
  • 3 replies
  • 0 kudos

Connection Refused: [Databricks][JDBC](11640) Required Connection Key(s): PWD;

Hey, I'm trying to connect to Databricks using a client ID and secret. I'm using JDBC 2.6.38 and the following connection URL: jdbc:databricks://<server-hostname>:443;httpPath=<http-path>;AuthMech=11;Auth_Flow=1;OAuth2ClientId=<service-principal-...
Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @wendyl, could you answer the following questions?
  • Does your workspace have Private Link?
  • Do you use a Microsoft Entra ID managed service principal?
  • If you used an Entra ID managed SP, did you use a secret from Entra ID, or Azure Da...
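For reference, an OAuth M2M (client credentials) URL for the Databricks JDBC driver carries the client secret in OAuth2Secret rather than PWD; a sketch with placeholders:

```python
# Sketch: Databricks JDBC URL for OAuth machine-to-machine auth
# (AuthMech=11, Auth_Flow=1). Angle-bracket values are placeholders;
# no PWD key should be needed with this auth mechanism.
jdbc_url = (
    "jdbc:databricks://<server-hostname>:443;"
    "httpPath=<http-path>;"
    "AuthMech=11;"
    "Auth_Flow=1;"
    "OAuth2ClientId=<service-principal-application-id>;"
    "OAuth2Secret=<service-principal-oauth-secret>"
)
```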
2 More Replies
Himanshu4
by New Contributor II
  • 2238 Views
  • 5 replies
  • 2 kudos

Inquiry Regarding Enabling Unity Catalog in Databricks Cluster Configuration via API

Dear Databricks Community,I hope this message finds you well. I am currently working on automating cluster configuration updates in Databricks using the API. As part of this automation, I am looking to ensure that the Unity Catalog is enabled within ...

Latest Reply
Himanshu4
New Contributor II
  • 2 kudos

Hi Raphael, can we fetch job details from one workspace and create a new job in a new workspace with the same "job id" and configuration?
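On that follow-up question: job IDs are assigned by the workspace that creates the job, so they can't be carried over, but the configuration can. A sketch using the Jobs 2.1 API (hosts, tokens, and the job ID are placeholders):

```python
import requests

# Sketch: copy a job's settings between workspaces via the Jobs API.
SRC = "https://src-workspace.cloud.databricks.com"   # placeholder
DST = "https://dst-workspace.cloud.databricks.com"   # placeholder
src_headers = {"Authorization": "Bearer <src-token>"}
dst_headers = {"Authorization": "Bearer <dst-token>"}

job = requests.get(
    f"{SRC}/api/2.1/jobs/get", headers=src_headers, params={"job_id": 123}
).json()  # 123 is a placeholder job_id

created = requests.post(
    f"{DST}/api/2.1/jobs/create", headers=dst_headers, json=job["settings"]
).json()
print(created["job_id"])  # new ID assigned by the target workspace
```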
4 More Replies
mayur_05
by New Contributor II
  • 838 Views
  • 3 replies
  • 0 kudos

Access cluster executor logs

Hi Team, I want to get real-time logs for the cluster executor and driver (stderr/stdout) while performing data operations, and save those logs in a catalog Volume.
Latest Reply
gchandra
Databricks Employee
  • 0 kudos

You can create it for job cluster compute too. The specific cluster's log folder will be under /dbfs/cluster-logs (or whatever you change it to).
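For reference, log delivery is configured per cluster via cluster_log_conf; a sketch of the relevant fragment (the DBFS destination is the documented one shown here, and newer workspaces may also accept a Unity Catalog volume destination):

```python
# Sketch: cluster spec fragment that delivers driver and executor
# stdout/stderr to DBFS. Logs land under
# dbfs:/cluster-logs/<cluster-id>/driver and .../executor.
cluster_spec_fragment = {
    "cluster_log_conf": {
        "dbfs": {"destination": "dbfs:/cluster-logs"}  # placeholder path
    }
}
```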
2 More Replies
TheManOfSteele
by New Contributor III
  • 1375 Views
  • 2 replies
  • 0 kudos

Resolved! Databricks-connect Configure a connection to serverless compute Not working

Following the instructions at https://docs.databricks.com/en/dev-tools/databricks-connect/python/install.html#configure-a-connection-to-serverless-compute, there seems to be an issue with the example code: from databricks.connect import DatabricksSe...
Latest Reply
TheManOfSteele
New Contributor III
  • 0 kudos

Worked! Thank you!

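For anyone landing here, a minimal serverless connection with recent databricks-connect versions looks roughly like this (assuming your workspace credentials are already configured via a profile or environment variables):

```python
from databricks.connect import DatabricksSession

# Sketch: build a Spark session against serverless compute; authentication
# comes from your configured Databricks profile or environment variables.
spark = DatabricksSession.builder.serverless(True).getOrCreate()
print(spark.range(3).collect())
```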
1 More Replies
Dave_Nithio
by Contributor
  • 787 Views
  • 1 reply
  • 0 kudos

Delta Table Log History not Updating

I am running into an issue related to my Delta log and an old version. I currently have default Delta settings for delta.checkpointInterval (10 commits, as this table was created prior to DBR 11.1), delta.deletedFileRetentionDuration (7 days), and del...
Latest Reply
jennie258fitz
New Contributor III
  • 0 kudos

@Dave_Nithio wrote: I am running into an issue related to my Delta Log and an old version. I currently have default delta settings for delta.checkpointInterval (10 commits as this table was created prior to DBR 11.1), delta.deletedFileRetentionDuratio...
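For context, the settings mentioned are Delta table properties and can be inspected or adjusted like this (table name is a placeholder; the values shown are the defaults the post describes):

```python
# Sketch: inspect recent commits, then set the checkpoint/retention
# properties discussed above on a Delta table.
spark.sql("DESCRIBE HISTORY main.default.my_table LIMIT 10").show(truncate=False)

spark.sql("""
    ALTER TABLE main.default.my_table SET TBLPROPERTIES (
        'delta.checkpointInterval' = '10',
        'delta.deletedFileRetentionDuration' = 'interval 7 days'
    )
""")
```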
hpant
by New Contributor III
  • 716 Views
  • 1 reply
  • 0 kudos

" ResourceNotFound" error is coming on connecting devops repo to databricks workflow(job).

I have a .py file in a repo in Azure DevOps. I want to add it to a workflow in Databricks, and these are the values I have provided. I have provided all the values correctly but am getting this error: "ResourceNotFound". Can someon...
Latest Reply
nicole_lu_PM
Databricks Employee
  • 0 kudos

Can you try cloning the DevOps repo as a Git folder? The git folder clone interface should ask you to set up a Git credential if it's not already there.

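For reference, a job that pulls a task's source from Azure DevOps is defined with a git_source block; a sketch of the Jobs API payload (URL, branch, path, and cluster ID are placeholders):

```python
# Sketch: Jobs API payload fragment pointing a task at a Python file in an
# Azure DevOps repo. A ResourceNotFound error often means the path, branch,
# or Git credential doesn't resolve.
job_spec = {
    "name": "devops-repo-job",
    "git_source": {
        "git_url": "https://dev.azure.com/<org>/<project>/_git/<repo>",
        "git_provider": "azureDevOpsServices",
        "git_branch": "main",
    },
    "tasks": [
        {
            "task_key": "run_script",
            "spark_python_task": {"python_file": "path/to/script.py"},
            "existing_cluster_id": "<cluster-id>",
        }
    ],
}
```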
