Data Engineering

Forum Posts


by yitao • New Contributor III

08-24-2021 9:06:03 AM

1635 Views
6 replies
11 kudos

Resolved! How to make sparklyr extension work with Databricks runtime?

Hello. I'm the current maintainer of sparklyr (an R interface for Apache Spark) and a few sparklyr extensions such as sparklyr.flint. Sparklyr was fortunate to receive some contributions from Databricks folks, which enabled R users to run `spark_connect...


Latest Reply

Kaniz
Community Manager

05-18-2022 2:35:15 PM

Hi @yitao, just a friendly follow-up. Do you still need help, or did the above response help you find the solution? Please let us know.

by Hubert-Dudek • Esteemed Contributor III

10-12-2021 6:17:42 AM

1361 Views
5 replies
18 kudos

Resolved! Azure: Permanently purge cluster logs

Is there any way to purge cluster logs via the API instead of clicking that option every day?


Latest Reply

Kaniz
Community Manager

05-18-2022 2:33:59 PM

Hi @Hubert Dudek, just a friendly follow-up. Do you still need help, or did @Prabakar Ammeappin's response help you find the solution? Please let us know.

by BorislavBlagoev • Valued Contributor III

09-14-2021 6:01:21 AM

2194 Views
3 replies
5 kudos

Resolved! Get package from Nexus repo.

I want to get a package from a Nexus repository in both a notebook and a job. If anyone has experience with this, please answer here!
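A minimal sketch for the Python-package case only, assuming the Nexus instance exposes a PyPI-format repository (Nexus Repository Manager 3 typically serves these under `/repository/<name>/simple/`). The host `nexus.example.com`, the repository name `pypi-proxy`, and the helper names are hypothetical placeholders. The sketch just builds a `pip install --index-url <url>` command; the same `--index-url` option can be passed to `%pip install` in a notebook cell. JAR dependencies from a Nexus Maven repository are a separate case and are not covered here.

```python
import subprocess
import sys
from typing import List

# Hypothetical Nexus host and PyPI repository name; replace with your own.
NEXUS_BASE = "https://nexus.example.com"
PYPI_REPO = "pypi-proxy"


def nexus_pip_command(package: str, base: str = NEXUS_BASE, repo: str = PYPI_REPO) -> List[str]:
    """Build a pip command that installs `package` from a Nexus PyPI repository."""
    # Assumed Nexus layout: PyPI-format repositories live under /repository/<name>/simple/
    index_url = f"{base.rstrip('/')}/repository/{repo}/simple/"
    return [sys.executable, "-m", "pip", "install", "--index-url", index_url, package]


def install_from_nexus(package: str) -> None:
    """Actually run the install (raises CalledProcessError if pip fails)."""
    subprocess.run(nexus_pip_command(package), check=True)


if __name__ == "__main__":
    # Only prints the command; call install_from_nexus() to really install.
    print(" ".join(nexus_pip_command("my-internal-package")))
```

Using `sys.executable -m pip` makes sure the package lands in the same Python environment that is running the code.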


Latest Reply

Kaniz
Community Manager

05-18-2022 2:28:02 PM

Hi @Borislav Blagoev, just a friendly follow-up. Do you still need help, or did the above response help you find the solution? Please let us know.

by soundari • New Contributor

10-06-2021 2:29:07 AM

1115 Views
3 replies
1 kudos

Resolved! Identify the partitionValues written yesterday from delta

We have streaming data written into Delta. We do not write to all partitions every day, so I am thinking of running a compaction Spark job only on the partitions that were modified yesterday. Is it possible to query the partitionValues wr...
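One possible approach, sketched under assumptions: every Delta commit is a newline-delimited JSON file in the table's `_delta_log` directory, and each `add` action records the new file's `partitionValues` and its `modificationTime` (epoch milliseconds). Scanning the `add` actions from yesterday therefore yields the partitions that received new data. The sketch assumes the log is reachable as a local path (for example through a FUSE mount), that yesterday's JSON commit files are still present (log cleanup removes old ones after the log retention period), and that "yesterday" means the previous UTC day. Actions with `dataChange` set to false, as written by OPTIMIZE or a compaction job, are skipped so earlier compactions are not picked up again. The function names are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Iterable, List, Set, Tuple

Partition = Tuple[Tuple[str, str], ...]


def partitions_added_between(
    log_lines: Iterable[str], start_ms: int, end_ms: int
) -> Set[Partition]:
    """Distinct partitionValues of `add` actions with start_ms <= modificationTime < end_ms."""
    found: Set[Partition] = set()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        add = json.loads(line).get("add")
        if add is None:
            continue  # commitInfo, remove, metaData, protocol, ...
        if not add.get("dataChange", True):
            continue  # rewrites from OPTIMIZE/compaction, not new data
        if start_ms <= add.get("modificationTime", -1) < end_ms:
            # Sort the keys so one partition always maps to the same tuple
            found.add(tuple(sorted(add.get("partitionValues", {}).items())))
    return found


def yesterday_window_ms(now: datetime) -> Tuple[int, int]:
    """[start, end) of the UTC calendar day before `now` (timezone-aware), in epoch ms."""
    today = now.astimezone(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)
    start = today - timedelta(days=1)
    return int(start.timestamp() * 1000), int(today.timestamp() * 1000)


def read_commit_lines(table_path: str) -> List[str]:
    """All lines of the JSON commit files under <table_path>/_delta_log."""
    lines: List[str] = []
    for commit in sorted(Path(table_path, "_delta_log").glob("*.json")):
        lines.extend(commit.read_text().splitlines())
    return lines
```

Usage would look like `start, end = yesterday_window_ms(datetime.now(timezone.utc))` followed by `partitions_added_between(read_commit_lines("/dbfs/path/to/table"), start, end)`, where the table path is a placeholder. The resulting tuples can then be turned into a partition filter for the compaction job.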


Latest Reply

Kaniz
Community Manager

05-18-2022 2:08:31 PM

Hi @Gnanasoundari Soundarajan, just a friendly follow-up. Do you still need help, or did @Deepak Bhutada's response help you find the solution? Please let us know.

by narek_margaryan • New Contributor II

10-06-2021 12:51:06 PM

1447 Views
3 replies
3 kudos

Resolved! Do Spark nodes read data from storage in a sequence?

I'm new to Spark and trying to understand how some of its components work. I understand that once the data is loaded into the memory of separate nodes, they process partitions in parallel, within their own memory (RAM). But I'm wondering whether the in...
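A simplified sketch of the idea behind Spark's file-based reads, not Spark's actual planner: the driver cuts the input files into byte-range splits bounded by `spark.sql.files.maxPartitionBytes` (128 MB by default), and each split becomes a task that an executor reads directly from storage. Reads therefore happen in parallel across tasks rather than in one sequence. Real Spark also factors in `spark.sql.files.openCostInBytes` and the cluster's default parallelism, packs small splits together, and reads non-splittable inputs (such as gzip-compressed text) as a single range. The function name is hypothetical.

```python
from typing import List, Tuple

MB = 1024 * 1024


def file_splits(file_sizes: List[int], max_partition_bytes: int = 128 * MB) -> List[Tuple[int, int, int]]:
    """Cut each file into (file_index, start_offset, length) byte ranges of at most
    max_partition_bytes. Each range is an independent unit of read work (a task)."""
    splits: List[Tuple[int, int, int]] = []
    for idx, size in enumerate(file_sizes):
        offset = 0
        while offset < size:
            length = min(max_partition_bytes, size - offset)
            splits.append((idx, offset, length))
            offset += length
    return splits
```

For example, a 300 MB file and a 10 MB file give four ranges (128 MB, 128 MB, 44 MB, 10 MB), and with enough executor cores all four can be read at the same time.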


Latest Reply

Kaniz
Community Manager

05-18-2022 2:07:37 PM

Hi @Narek Margaryan, just a friendly follow-up. Do you still need help, or did the above response help you find the solution? Please let us know.

by brendan-b • New Contributor II

10-09-2021 5:35:37 PM

6534 Views
4 replies
3 kudos

spark-xml not working with Databricks Connect and PySpark

Hi all, I currently have a cluster configured in Databricks with spark-xml (version com.databricks:spark-xml_2.12:0.13.0), which was installed using Maven. The spark-xml library itself works fine with PySpark when I am using it in a notebook within th...


Latest Reply

Kaniz
Community Manager

05-18-2022 2:06:10 PM

Hi @Brendan Banfield, this article describes how to read and write an XML file as an Apache Spark™ data source.

by User16783855534 • New Contributor III

06-07-2021 10:47:16 AM

5303 Views
6 replies
5 kudos

Resolved! How can I get the JSON spec of my Databricks Job?
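A minimal sketch using only the Python standard library to call the Jobs API 2.1 `jobs/get` endpoint; the `settings` field of its JSON response holds the job definition. The workspace URL, the token (for example a personal access token), and the job ID are placeholders, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_job_get_request(host: str, token: str, job_id: int) -> urllib.request.Request:
    """GET request for the Jobs API 2.1 jobs/get endpoint."""
    query = urllib.parse.urlencode({"job_id": job_id})
    url = f"{host.rstrip('/')}/api/2.1/jobs/get?{query}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def fetch_job_settings(host: str, token: str, job_id: int) -> dict:
    """Call the endpoint and return the job's `settings` object (its JSON spec)."""
    with urllib.request.urlopen(build_job_get_request(host, token, job_id)) as resp:
        return json.loads(resp.read().decode("utf-8"))["settings"]
```

Printing `json.dumps(fetch_job_settings("https://<workspace-url>", "<token>", 123), indent=2)` would then show the spec in a readable form.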


Latest Reply

Kaniz
Community Manager

05-18-2022 2:04:14 PM

Hi @Neil Patel, just a friendly follow-up. Do you still need help, or did the above responses help you find the solution? Please let us know.

by dataslicer • Contributor

09-27-2021 11:11:29 PM

1779 Views
3 replies
2 kudos

Resolved! upgraded R package rlang to 0.4.11 on DBR 8.3 SC, but sessionInfo() still shows rlang as 0.4.9

I am using Azure Databricks Runtime (DBR) 8.3 ML with a Python notebook and R cells together. I want to use "tidyverse", and one of its dependencies is rlang >= 0.4.10, but the base DBR 8.3 ML provides rlang 0.4.9. I successfully upgraded the R package t...


Latest Reply

Kaniz
Community Manager

05-18-2022 2:02:13 PM

Hi @Jim Huang, just a friendly follow-up. Do you still need help, or did the above responses help you find the solution? Please let us know.

by delta_lake • New Contributor

09-17-2021 2:36:53 AM

1066 Views
3 replies
1 kudos

Delta Lake Python

I have set up a virtual environment inside my existing Hadoop cluster. Since the current cluster does not have Spark 3 or later, I installed delta-spark in the virtual environment. While trying to access HDFS, which is Kerberized, I get the error below...


Latest Reply

Kaniz
Community Manager

05-18-2022 1:39:46 PM

Hi @Vasanth P, just a friendly follow-up. Do you still need help, or did the above responses help you find the solution? Please let us know.

by IkramMecheri • New Contributor II

05-08-2019 1:05:45 PM

7256 Views
5 replies
2 kudos

ImportError: No module named 'bs4'

Hi, I would like to do some web scraping; however, I am unable to import the libraries I traditionally use for that task: `import requests` and `from bs4 import BeautifulSoup`.
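The module imported as `bs4` is distributed on PyPI as `beautifulsoup4`, so that is the name to install (for example with `%pip install beautifulsoup4` in a notebook cell, or as a cluster library). The sketch below checks which modules are importable and reports the names to install; the mapping table and function name are illustrative, not part of any library.

```python
import importlib.util
from typing import Dict, List

# Import name -> PyPI distribution that provides it (they differ for bs4)
PIP_NAMES: Dict[str, str] = {"bs4": "beautifulsoup4", "requests": "requests"}


def missing_packages(import_names: List[str]) -> List[str]:
    """Return pip package names for top-level modules that cannot be imported."""
    return [
        PIP_NAMES.get(name, name)
        for name in import_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    todo = missing_packages(["requests", "bs4"])
    if todo:
        print("Install with: %pip install " + " ".join(todo))
```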


Latest Reply

Kaniz
Community Manager

05-18-2022 1:38:59 PM

Hi @Ikram Mecheri, just a friendly follow-up. Do you still need help, or did the above responses help you find the solution? Please let us know.

by User16868770416 • Contributor

09-23-2021 5:53:02 PM

978 Views
4 replies
2 kudos

What happens to my production jobs if the underlying Databricks Runtime is no longer supported? Will they fail?


Latest Reply

Kaniz
Community Manager

05-18-2022 1:28:16 PM

Hi @Will Block, just a friendly follow-up. Do you still need help, or did the above responses help you find the solution? Please let us know.

by Zen • New Contributor III

09-14-2021 10:04:38 AM

2296 Views
9 replies
2 kudos

Resolved! How do I run a Scala script from the Terminal?

Hello, how do I run a Scala script from a terminal on Databricks (the Web Terminal, or a notebook cell with `%sh`)? Just running `scala -nc script.scala` does not work. Thanks,


Latest Reply

Kaniz
Community Manager

05-18-2022 1:25:44 PM

Hi @Zen, just a friendly follow-up. Do you still need help, or did @DARSHAN BARGAL's response help you find the solution? Please let us know.

by Alex_G • New Contributor II

09-08-2021 10:40:23 AM

1106 Views
3 replies
5 kudos

Resolved! Databricks Feature Store in MLflow run CLI command

Hello! I am attempting to move some machine learning code from a Databricks notebook into an MLflow Git repository. I am using the Databricks Feature Store to load features that have already been processed. Currently I cannot get the databricks library to ...


Latest Reply

Kaniz
Community Manager

05-18-2022 1:23:12 PM

Hi @Alex Graff, just a friendly follow-up. Do you still need help, or did @Sean Owen's response help you find the solution? Please let us know.

by NickGoodfella • New Contributor

08-20-2021 5:22:41 AM

909 Views
2 replies
1 kudos

DNS_Analytics Notebook Problems

Hello everyone! This is my first post on the forums. I have been stuck on this for a while now and cannot understand why it is happening. Basically, I have been using what seems to be a premade notebook from Databricks for a DNS Analytics exa...


Latest Reply

Kaniz
Community Manager

05-18-2022 1:05:59 PM

Hi @NickGoodfella, just a friendly follow-up. Do you still need help, or did @Sean Owen's response help you find the solution? Please let us know.

by EricOX • New Contributor

09-28-2021 2:24:33 AM

2746 Views
3 replies
3 kudos

Resolved! How to handle configuration for different environment (e.g. DEV, PROD)?

What is the suggested way to handle different environment settings for the same code base? For example, the Data Lake mount point differs between DEV, UAT, and PROD. Are there any recommendations or best practices? Also, how should this be handled with Azure DevOps?
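One common pattern, offered as one option among several rather than an official Databricks recommendation, is to keep a single code base and select settings by an environment name supplied at run time (for example a job parameter, or an environment variable set on the cluster or by the Azure DevOps pipeline stage). The sketch assumes a hypothetical `APP_ENV` variable; the mount points and database names are made up.

```python
import os
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class EnvConfig:
    mount_point: str
    database: str


# Hypothetical per-environment settings; in practice these could live in a
# JSON/YAML file in the repo or be injected by the CI/CD pipeline.
CONFIGS: Dict[str, EnvConfig] = {
    "dev": EnvConfig(mount_point="/mnt/datalake-dev", database="sales_dev"),
    "uat": EnvConfig(mount_point="/mnt/datalake-uat", database="sales_uat"),
    "prod": EnvConfig(mount_point="/mnt/datalake-prod", database="sales_prod"),
}


def load_config(env: str = "") -> EnvConfig:
    """Pick the config for `env`, falling back to the APP_ENV variable (default: dev)."""
    name = (env or os.environ.get("APP_ENV", "dev")).lower()
    try:
        return CONFIGS[name]
    except KeyError:
        raise ValueError(f"Unknown environment {name!r}; expected one of {sorted(CONFIGS)}") from None
```

Failing loudly on an unknown name is deliberate: silently falling back to DEV settings in a production job would be harder to notice than an error.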


Latest Reply

Kaniz
Community Manager

05-18-2022 1:00:42 PM

Hi @Eric Yeung, just a friendly follow-up. Do you still need help, or did the above responses help you find the solution? Please let us know.

