Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

brickster_2018
by Databricks Employee
  • 4046 Views
  • 1 replies
  • 1 kudos
Latest Reply
sajith_appukutt
Honored Contributor II
  • 1 kudos

G1GC can help in some cases where garbage collection is a bottleneck. Check out https://databricks.com/blog/2015/05/28/tuning-java-garbage-collection-for-spark-applications.html
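For concreteness, a minimal sketch (not from the blog post itself) of how the G1GC switch might look in a cluster's Spark config, expressed as a spark_conf fragment:

    # Assumption: set at cluster launch; JVM options cannot be changed at runtime.
    spark_conf = {
        "spark.executor.extraJavaOptions": "-XX:+UseG1GC",
        "spark.driver.extraJavaOptions": "-XX:+UseG1GC",
    }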

brickster_2018
by Databricks Employee
  • 1209 Views
  • 1 replies
  • 1 kudos
Latest Reply
aladda
Databricks Employee
  • 1 kudos

For Databricks Runtime 5.5 LTS, Spark jobs, Python notebook cells, and library installation all support both Python 2 and 3. The default Python version for clusters created using the UI is Python 3. In Databricks Runtime 5.5 LTS the default version fo...
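If in doubt, a generic Python check (works in any notebook cell, not specific to DBR 5.5) confirms which interpreter a cluster is actually running:

    # Prints e.g. "3.x ..." on a Python 3 cluster.
    import sys
    print(sys.version)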

Satyadeepak
by Databricks Employee
  • 1607 Views
  • 1 replies
  • 1 kudos
Latest Reply
aladda
Databricks Employee
  • 1 kudos

Can you provide an example of what exactly you mean? If the reference is to how "Repos" shows up in the UI, that's more of a UX convenience. Repos as such are designed to be a container for version-controlled notebooks that live in the Git reposi...

aladda
by Databricks Employee
  • 5138 Views
  • 1 replies
  • 1 kudos

Why do Databricks deployments require 2 subnets for each workspace

Databricks must have access to at least two subnets for each workspace, with each subnet in a different availability zone, per the docs here.

Latest Reply
aladda
Databricks Employee
  • 1 kudos

This is designed for an optimal user experience and as a capacity-planning strategy: if instances are not available in one AZ, the subnet in the other AZ can be used to deploy instances instead.
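Purely for illustration, a hypothetical layout of the two required subnets (the CIDRs and zone names are made up):

    # Two non-overlapping subnets in different AZs; if one zone runs out of
    # capacity, instances can be launched in the other.
    workspace_subnets = [
        {"cidr": "10.0.0.0/22", "availability_zone": "us-east-1a"},
        {"cidr": "10.0.4.0/22", "availability_zone": "us-east-1b"},
    ]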

User16790091296
by Contributor II
  • 3592 Views
  • 1 replies
  • 1 kudos

Secrets in Databricks

I created a secret on Databricks using the Secrets API. Code: Scope_name = {"scope": "dbtest", "initial_manage_principal": "user"}; Resp = requests.post('https://instancename.net/mynoteid/api/2.0/secrets/scopes/create', json=Scope_name). In a similar way, I adde...

Latest Reply
aladda
Databricks Employee
  • 1 kudos

You'll have to specify the scope and the key in the format below to get the value: dbutils.secrets.get(scope="dbtest", key="user"). It's probably a good idea to review the Secret Management documentation for details on how to get this set up the right way - ...
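Putting the question and the answer together, a minimal sketch (the workspace URL and token are placeholders; the scope and key names come from the question):

    import requests

    HOST = "https://instancename.net"    # placeholder from the question
    TOKEN = "<personal-access-token>"    # assumption: PAT authentication
    headers = {"Authorization": f"Bearer {TOKEN}"}

    # Store a value under scope "dbtest", key "user" via the Secrets API.
    requests.post(f"{HOST}/api/2.0/secrets/put", headers=headers,
                  json={"scope": "dbtest", "key": "user", "string_value": "s3cret"})

    # Then, inside a notebook, read it back (note: dbutils.secrets, plural).
    value = dbutils.secrets.get(scope="dbtest", key="user")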

User16826990884
by New Contributor III
  • 3859 Views
  • 1 replies
  • 1 kudos

Impact on Databricks objects after a user is deleted

What happens to resources (notebooks, jobs, clusters, etc.) owned by a user when that user is deleted? The underlying problem we are trying to solve is that we want to automatically delete users through SCIM when the user leaves the company so that the u...

Latest Reply
sajith_appukutt
Honored Contributor II
  • 1 kudos

When you remove a user from Databricks, a special backup folder is created in the workspace. This backup folder contains all of the deleted user’s content. With respect to clusters and jobs, an admin can grant permissions to other users.
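As a hedged sketch of the SCIM automation the question describes (host, token, and user id are placeholders), deleting a user looks roughly like this, after which the backup-folder behavior above applies:

    import requests

    HOST = "https://<workspace-url>"
    TOKEN = "<personal-access-token>"
    user_id = "123456"  # hypothetical SCIM user id

    requests.delete(f"{HOST}/api/2.0/preview/scim/v2/Users/{user_id}",
                    headers={"Authorization": f"Bearer {TOKEN}"})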

Srikanth_Gupta_
by Databricks Employee
  • 2298 Views
  • 1 replies
  • 1 kudos
Latest Reply
sajith_appukutt
Honored Contributor II
  • 1 kudos

All three options are secure ways to store secrets. Databricks secrets have the additional functionality of redaction, so they are sometimes more convenient. Also, on Azure, you can use Azure Key Vault as the backend for secrets.
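A small illustration of the redaction behavior (the scope and key names are assumptions):

    # In a Databricks notebook, printing a secret does not leak its value.
    secret = dbutils.secrets.get(scope="dbtest", key="user")
    print(secret)  # prints [REDACTED] rather than the secret itself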

brickster_2018
by Databricks Employee
  • 4399 Views
  • 1 replies
  • 1 kudos

Resolved! Classpath issues when running spark-submit

How do I identify the jars used to load a particular class? I am sure I packed the classes correctly in my application jar. However, it looks like the class is loaded from a different jar. I want to understand the details so that I can ensure to use the r...

Latest Reply
brickster_2018
Databricks Employee
  • 1 kudos

Adding the configurations below at the cluster level can help print more logs to identify the jars from which classes are loaded:
spark.executor.extraJavaOptions=-verbose:class
spark.driver.extraJavaOptions=-verbose:class
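For example, as a spark_conf fragment in a cluster spec (the same options, just shown in Clusters API form):

    # Exactly the options from the reply, expressed as a cluster spec fragment.
    spark_conf = {
        "spark.executor.extraJavaOptions": "-verbose:class",
        "spark.driver.extraJavaOptions": "-verbose:class",
    }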

brickster_2018
by Databricks Employee
  • 2030 Views
  • 1 replies
  • 1 kudos

Resolved! Databricks Vs Yarn - Resource Utilization

I have a spark-submit application that worked fine with 8GB executor memory in YARN. I am testing the same job against a Databricks cluster with the same executor memory. However, the job runs slower on Databricks.

Latest Reply
brickster_2018
Databricks Employee
  • 1 kudos

This is not an apples-to-apples comparison. When you set 8GB as the executor memory in YARN, the container launched to run the executor JVM gets 8GB of memory, and the Xmx value of the heap is calculated accordingly. In Databricks, when...
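A back-of-the-envelope sketch of the YARN side, using Spark's standard memory-overhead default (10% of executor memory, minimum 384 MB); Databricks sizes the JVM differently, which is the point of the reply:

    executor_memory_mb = 8 * 1024                       # spark.executor.memory=8g
    overhead_mb = max(384, int(0.10 * executor_memory_mb))
    container_mb = executor_memory_mb + overhead_mb     # what YARN allocates
    print(f"YARN container: {container_mb} MB, JVM heap (Xmx): {executor_memory_mb} MB")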

MoJaMa
by Databricks Employee
  • 10112 Views
  • 3 replies
  • 1 kudos
Latest Reply
User16783853906
Contributor III
  • 1 kudos

Databricks does not charge DBUs while instances are idle in the pool. Instance-provider billing does apply. Please refer here for more information: https://docs.databricks.com/clusters/instance-pools/index.html

User16826992666
by Valued Contributor
  • 2877 Views
  • 1 replies
  • 1 kudos

Resolved! Does Databricks integrate with Immuta?

My company uses Immuta for data governance. Will Databricks be able to fit into our existing security patterns?

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 1 kudos

Yes, check out the Immuta web page on the Databricks integration: https://www.immuta.com/integrations/databricks

User16826994223
by Honored Contributor III
  • 2135 Views
  • 1 replies
  • 1 kudos

File path not recognized for notebook jobs in DBFS

We are working in IDEs, and once code is developed we put the .py file in DBFS and use that DBFS path to create a job. But with dbfs:/artifacts/kg/bootstrap.py I get a "notebook not found" error. What could be the is...

Latest Reply
User16826994223
Honored Contributor III
  • 1 kudos

The actual notebooks you create are stored in the control plane, not the data plane. You can import notebooks through the import option in the Databricks UI or via the API. A file placed in DBFS cannot be used to create a notebook job.
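A hedged sketch of importing a local .py file as a workspace notebook via the Workspace API, which a notebook job can then reference (host, token, and paths are placeholders):

    import base64
    import requests

    HOST = "https://<workspace-url>"
    TOKEN = "<personal-access-token>"

    with open("bootstrap.py", "rb") as f:
        content = base64.b64encode(f.read()).decode()

    requests.post(f"{HOST}/api/2.0/workspace/import",
                  headers={"Authorization": f"Bearer {TOKEN}"},
                  json={"path": "/Users/me@example.com/bootstrap",
                        "format": "SOURCE", "language": "PYTHON",
                        "content": content})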

User16826994223
by Honored Contributor III
  • 1538 Views
  • 1 replies
  • 1 kudos

How do I see all the DataFrame columns if I have more than 1000 columns in the DataFrame

I tried printSchema() on a DataFrame in Databricks. The DataFrame has more than 1500 columns, and apparently the printSchema function truncates the results, displaying only 1000 items. How do I see all columns?

Latest Reply
User16826994223
Honored Contributor III
  • 1 kudos

Databricks also shows the schema of the DataFrame when it's created: click the icon next to the name of the variable that holds the DataFrame. If the output exceeds that limit, I would suggest writing the schema out to a file.
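For instance, a minimal sketch of dumping the full schema to a file (df and the output path are placeholders):

    # Writes every column, one per line, bypassing the notebook display limit.
    with open("/dbfs/tmp/full_schema.txt", "w") as f:
        for field in df.schema.fields:
            f.write(f"{field.name}: {field.dataType.simpleString()}\n")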

Anonymous
by Not applicable
  • 1290 Views
  • 1 replies
  • 2 kudos
Latest Reply
User16826994223
Honored Contributor III
  • 2 kudos

Scala uses the JVM to run its code, and it cannot run different applications at a time with complete isolation of each task inside a single JVM; that is the reason Scala isn't supported on High Concurrency clusters. I don't think it is on the roadmap.

User16790091296
by Contributor II
  • 2566 Views
  • 1 replies
  • 1 kudos

Using Databricks Connect (DBConnect)

I'd like to edit Databricks notebooks locally using my favorite editor, and then use Databricks Connect to run the notebook remotely on a Databricks cluster that I usually access via the web interface. I run "databricks-connect configure", as suggest...

Latest Reply
sajith_appukutt
Honored Contributor II
  • 1 kudos

Here is the link to the configuration properties: https://docs.databricks.com/dev-tools/databricks-connect.html#step-2-configure-connection-properties
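Once configured, a quick local smoke test might look like this (the CLI also ships a "databricks-connect test" command to validate the setup):

    # Assumes databricks-connect is installed and configured; the session
    # builder routes work to the remote Databricks cluster.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    print(spark.range(5).collect())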
