Data Engineering

Forum Posts

by Mado • Valued Contributor II

01-09-2023 10:37:24 PM

6153 Views
4 replies
3 kudos

Resolved! Streaming Delta Live Table, if I re-run the pipeline, does it append the new data to the current table?

Hi, I have a question about DLT tables. Assume that I have a streaming DLT pipeline that reads data from a Bronze table and applies transformations to it. The pipeline mode is triggered. If I re-run the pipeline, does it append new data to the current tabl...

Latest Reply

Anonymous
Not applicable

04-10-2023 6:13:39 AM

3 kudos

@Mohammad Saber: In a Delta Live Tables (DLT) pipeline, when you re-run the pipeline in "append" mode, new data will be appended to the existing table. Delta Lake provides built-in support for handling duplicates through its "upsert" functionali...

by JJ_ • New Contributor II

01-30-2023 8:03:08 AM

3374 Views
3 replies
0 kudos

ODBC Connection to Another Compute Within the Same Workspace

Hello all! I couldn't find anything definitive related to this issue, so I hope I'm not duplicating another topic :). I have imported an R repository that normally runs on another machine and uses an ODBC driver to issue Spark SQL commands to a compute (le...

Latest Reply

JJ_
New Contributor II

04-11-2023 2:15:08 AM

0 kudos

Thanks @Suteja Kanuri for your response! I tried all of the steps you mentioned (and many more) but never managed to make it work. My suspicion was that our Azure networking setup was preventing this from happening. I have not found this documented ...

by a2_ish • New Contributor II

10-04-2022 3:31:34 AM

4323 Views
1 replies
0 kudos

Where are delta lake files stored by given path?

I have the code below, which works for the path below but fails when the path is an Azure storage account path. I have enough access to write to and update the storage account. I would like to know what I am doing wrong and, for the path below that works, how can I phys...

Latest Reply

Anonymous
Not applicable

04-14-2023 9:39:33 AM

0 kudos

@Ankit Kumar: The error message you received indicates that the user does not have sufficient permission to access the Azure Blob Storage account. You mentioned that you have enough access to write and update the storage account, but it's possible t...

by vicusbass • New Contributor II

04-13-2023 11:54:50 PM

22840 Views
3 replies
1 kudos

How to extract values from JSON array field?

Hi, while creating a SQL notebook, I am struggling with extracting some values from a JSON array field. I need to create a view where a field would be an array with values extracted from a field like the one below; specifically, I need the `value` fi...

Latest Reply

vicusbass
New Contributor II

04-14-2023 9:26:46 AM

1 kudos

Maybe I didn't explain it correctly. The JSON snippet from the description is a cell from a row from a table.
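A minimal Python sketch of the extraction being asked about, assuming the cell holds a JSON array of objects and that each object may carry a `value` key. The sample cell content and the helper name `extract_values` are hypothetical, since the original JSON snippet is not shown in this listing. In a SQL notebook the same logic would have to be expressed in SQL or registered as a UDF, which is not shown here.

```python
import json


def extract_values(cell: str, key: str = "value") -> list:
    """Parse a JSON array string and collect `key` from every object that has it."""
    items = json.loads(cell)
    return [item[key] for item in items if isinstance(item, dict) and key in item]


# Hypothetical cell content standing in for the field from the post.
sample_cell = '[{"name": "a", "value": 1}, {"name": "b", "value": 2}, {"name": "c"}]'

print(extract_values(sample_cell))
```

Objects without the key are skipped rather than raising, which is one possible choice; a stricter version could raise instead.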

by labromb • Databricks Partner

04-14-2023 4:02:45 AM

4508 Views
2 replies
0 kudos

Getting Databricks SQL dashboard to recognise change to an underlying query

Hi Community, Scenario: I have created a query in Databricks SQL, built a number of visualisations from it and published them to a dashboard. I then realise that I need to add another field to the underlying query, which I then want to leverage as a dashb...

Latest Reply

youssefmrini
Databricks Employee

04-14-2023 5:55:18 AM

0 kudos

Can you take a screenshot?

by Tim_T • New Contributor

04-14-2023 6:53:49 AM

1676 Views
0 replies
0 kudos

Are training/ecommerce data tables available as CSVs?

The course "Apache Spark™ Programming with Databricks" requires data sources such as training/ecommerce/events/events.parquet. Are these available as CSV files? My company's Databricks configuration does not allow me to mount to such repositories, bu...

by Hitesh_goswami • New Contributor

04-12-2023 9:25:35 AM

1995 Views
1 replies
0 kudos

Upgrading Ipython version without changing LTS version

I am using a specific PyDeequ function called ColumnProfilerRunner, which is only supported with Spark 3.0.1, so I must use 7.3 LTS. Currently, I am trying to install the "great_expectations" library on Python, which requires IPython version==7.16.3, an...

Latest Reply

Anonymous
Not applicable

04-14-2023 2:36:43 AM

0 kudos

@Hitesh Goswami: Please check if the below helps! To upgrade the IPython version on a Databricks 7.3 LTS cluster, you can follow these steps: Create a new library installation command using the Databricks CLI by running the following command in your l...

by JGil • New Contributor III

04-11-2023 11:35:16 PM

5037 Views
5 replies
0 kudos

Installing Bazel on databricks cluster

I am new to Azure Databricks and I want to install a library on a cluster; to do that, I need to install the Bazel build tool first. I checked the Bazel site, but I am still not sure how to do it in Databricks. I would appreciate it if anyone can help me and write me...

Latest Reply

Avinash_94
Databricks Employee

04-14-2023 12:25:06 AM

0 kudos

Databricks migrated over from the standard Scala Build Tool (SBT) to using Bazel to build, test and deploy our Scala code. Follow this doc https://www.databricks.com/blog/2019/02/27/speedy-scala-builds-with-bazel-at-databricks.html

by afzi • New Contributor II

08-10-2022 10:40:47 PM

4515 Views
1 replies
1 kudos

Pandas DataFrame error when using to_csv

Hi Everyone, I would like to write a pandas DataFrame to /dbfs/FileStore/ using the to_csv method. Usually it would just write the DataFrame to the path described, but it has been giving me "FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/FileStor...

Latest Reply

Avinash_94
Databricks Employee

04-14-2023 12:31:19 AM

1 kudos

f = open("/dbfs/mnt/blob/myNames.txt", "r")
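One common cause of a FileNotFoundError on write is that the parent directory of the target file does not exist yet. Whether that applies to the post above is an assumption, since the failing path is truncated in this listing. The sketch below creates the missing directories before writing; the helper name `ensure_parent_dir` and the sub-path `exports/daily/report.csv` are hypothetical, and a temporary directory stands in for /dbfs/FileStore/ so the example runs anywhere.

```python
import csv
import tempfile
from pathlib import Path


def ensure_parent_dir(path: str) -> Path:
    """Create any missing parent directories and return the path as a Path object."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    return target


# A temporary directory stands in for /dbfs/FileStore/ in this sketch.
base = tempfile.mkdtemp()
out = ensure_parent_dir(f"{base}/exports/daily/report.csv")

# The stdlib csv writer keeps the example dependency-free.
# With pandas, the equivalent write would be: df.to_csv(out, index=False)
with out.open("w", newline="") as fh:
    csv.writer(fh).writerows([["id", "name"], [1, "a"]])

print(out.exists())
```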

by User16826992783 • Databricks Employee

06-11-2021 7:53:45 AM

1989 Views
1 replies
0 kudos

Why are some of my AWS EBS volumes in my workspace unencrypted?

I noticed that 30GB of my EBS volumes are unencrypted. Is there a reason for this, and is there a way to encrypt these volumes?

Latest Reply

Abishek
Databricks Employee

04-14-2023 12:30:26 AM

0 kudos

https://docs.databricks.com/security/keys/customer-managed-keys-storage-aws.html#introduction
The Databricks cluster’s EBS volumes (optional) - For Databricks Runtime cluster nodes and other compute resources in the Classic data plane, you can option...

by wb • New Contributor II

09-17-2022 2:53:39 PM

1798 Views
1 replies
2 kudos

Import paths using repos and installed libraries get confused

We use Azure DevOps and Azure Databricks and have custom Python libraries. I placed my notebooks in the same repo and the structure is like this:
mylib/
mylib/__init__.py
mylib/code.py
notebooks/
notebooks/job_notebook.py
setup.py
Azure pipelines buil...

Latest Reply

Avinash_94
Databricks Employee

04-14-2023 12:28:50 AM

2 kudos

It looks for the configs locally, I suppose. If you can share requirements.txt, I can elaborate.
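When a package exists both in a repo checkout and as an installed library, a quick way to see which copy Python would pick up is to ask the import system where it resolves the name. This is a minimal sketch, not a fix for the post above; the helper name `module_origin` is hypothetical, and the standard-library module `json` stands in for the custom package so the example runs anywhere.

```python
import importlib.util
import sys


def module_origin(name: str):
    """Return the file a top-level module name resolves to, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# `json` stands in for a custom package such as the one in the post.
print(module_origin("json"))

# Entries earlier in sys.path win, so the order shows which location takes precedence.
print(sys.path[:3])
```

Running this from the notebook in question, with the real package name, shows whether the repo copy or the installed copy is being imported.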

by User16826990884 • Databricks Employee

06-25-2021 11:46:25 AM

3301 Views
1 replies
1 kudos

Delta log retention

Is there an impact on performance if I increase the Delta log retention to 3000?

Latest Reply

DD_Sharma
Databricks Employee

04-14-2023 12:28:43 AM

1 kudos

There will be no performance impact if you set the Delta log retention to 3000. However, it will increase the storage cost, so it is not advisable to use a large number unless it is really needed for the business use case. The default delta.logRetenti...
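For reference, log retention is set through the `delta.logRetentionDuration` table property. The sketch below only builds the statement text; it assumes the "3000" in the question means days, and the table name `my_schema.events` and the helper name `log_retention_statement` are hypothetical. On a cluster the statement would be passed to `spark.sql(...)`, which is left as a comment so the example runs without Spark.

```python
def log_retention_statement(table: str, days: int) -> str:
    """Build an ALTER TABLE statement that sets delta.logRetentionDuration in days."""
    if days <= 0:
        raise ValueError("days must be a positive integer")
    return (
        f"ALTER TABLE {table} "
        f"SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval {days} days')"
    )


# Hypothetical table name; 3000 is assumed to be a number of days.
stmt = log_retention_statement("my_schema.events", 3000)
print(stmt)

# On a Databricks cluster: spark.sql(stmt)
```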

by Saurabh98290 • New Contributor II

09-15-2022 8:24:54 PM

1570 Views
1 replies
2 kudos

Best Suited Language To Parallelize Notebook

I would like to know which language, Python or Scala, is best suited for writing code for parallel execution in a notebook.

Latest Reply

User16756723392
Databricks Employee

04-14-2023 12:28:03 AM

2 kudos

You need to test in both Python and Scala; depending on the complexity, one outperforms the other. In a few cases Python was faster, whereas in others Scala was. It is all about the efficiency of the code.
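As one illustration on the Python side, independent pieces of driver-side work can be fanned out with a thread pool from the standard library. This is a minimal sketch under the assumption that the tasks are independent of each other; `run_task` is a hypothetical placeholder for the real work, and `run_parallel` is a hypothetical helper name.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(n: int) -> int:
    """Placeholder for one independent unit of work."""
    return n * n


def run_parallel(inputs, max_workers: int = 4) -> list:
    """Run run_task over the inputs concurrently and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_task, inputs))


results = run_parallel([1, 2, 3, 4])
print(results)
```

Threads mainly help when each task spends its time waiting on something else, such as I/O or a remote job; CPU-bound pure-Python work gains little from them.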

by NimaiAhl • New Contributor II

09-08-2022 10:24:11 PM

1874 Views
1 replies
0 kudos

External Tables - SQL

To create external tables, we need to use the LOCATION keyword with the link to the storage location. In reference to that, does the user need to have permission for the storage location? If not, will we use storage credentials to provide the ac...

Latest Reply

Anu-sha
Databricks Employee

04-14-2023 12:23:22 AM

0 kudos

Hi Nimai, That's partially right. You can grant permissions directly on the storage credential, but Databricks recommends that you reference it in an external location and grant permissions to that instead. An external location combines a storage cre...

by Kotofosonline • New Contributor III

09-08-2021 4:41:09 AM

3435 Views
1 replies
2 kudos

Query with distinct sort and alias produces error column not found

I’m trying to use a SQL query on Azure Databricks with DISTINCT, a sort, and an alias:
SELECT DISTINCT album.ArtistId AS my_alias FROM album ORDER BY album.ArtistId
The problem is that if I add an alias, then I cannot use the non-aliased name in the ORDER BY cla...

Latest Reply

User16756723392
Databricks Employee

04-14-2023 12:22:33 AM

2 kudos

SELECT album.ArtistId, DISTINCT album.ArtistId AS my_alias FROM album ORDER BY album.ArtistId
Can you try this?
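Another option worth testing is to sort by the alias itself, which is standard SQL. The sketch below demonstrates that form with SQLite from the Python standard library, purely so it can run anywhere; SQLite is not Spark SQL, so behaviour on Databricks should be confirmed separately. The `AlbumId` column and the sample rows are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical album table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE album (AlbumId INTEGER, ArtistId INTEGER)")
conn.executemany(
    "INSERT INTO album VALUES (?, ?)",
    [(1, 3), (2, 1), (3, 3), (4, 2)],
)

# Sort by the alias rather than by the original column name.
rows = conn.execute(
    "SELECT DISTINCT album.ArtistId AS my_alias FROM album ORDER BY my_alias"
).fetchall()
artist_ids = [row[0] for row in rows]
conn.close()

print(artist_ids)
```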
