Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

brickster_2018
by Databricks Employee
  • 4046 Views
  • 1 replies
  • 1 kudos
Latest Reply
sajith_appukutt
Honored Contributor II
  • 1 kudos

G1GC can help in some cases where garbage collection is a bottleneck. Check out https://databricks.com/blog/2015/05/28/tuning-java-garbage-collection-for-spark-applications.html
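For concreteness, a minimal sketch (not from the blog post itself) of how the G1GC switch might look in a cluster's Spark config, expressed as a spark_conf fragment:

    # Assumption: set at cluster launch; JVM options cannot be changed at runtime.
    spark_conf = {
        "spark.executor.extraJavaOptions": "-XX:+UseG1GC",
        "spark.driver.extraJavaOptions": "-XX:+UseG1GC",
    }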

brickster_2018
by Databricks Employee
  • 1209 Views
  • 1 replies
  • 1 kudos
Latest Reply
aladda
Databricks Employee
  • 1 kudos

For Databricks Runtime 5.5 LTS, Spark jobs, Python notebook cells, and library installation all support both Python 2 and 3. The default Python version for clusters created using the UI is Python 3. In Databricks Runtime 5.5 LTS the default version fo...
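If in doubt, a generic Python check (works in any notebook cell, not specific to DBR 5.5) confirms which interpreter a cluster is actually running:

    # Prints e.g. "3.x ..." on a Python 3 cluster.
    import sys
    print(sys.version)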

Satyadeepak
by Databricks Employee
  • 1607 Views
  • 1 replies
  • 1 kudos
Latest Reply
aladda
Databricks Employee
  • 1 kudos

Can you provide an example of what exactly you mean? If the reference is to how "Repos" shows up in the UI, that's more of a UX convenience. Repos as such are designed to be a container for version-controlled notebooks that live in the Git reposi...

aladda
by Databricks Employee
  • 5138 Views
  • 1 replies
  • 1 kudos

Why do Databricks deployments require 2 subnets for each workspace

Databricks must have access to at least two subnets for each workspace, with each subnet in a different availability zone, per the docs here.

Latest Reply
aladda
Databricks Employee
  • 1 kudos

This is designed for an optimal user experience and as a capacity-planning strategy: if instances are not available in one AZ, the subnet in the other AZ can be used to deploy instances instead.
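Purely for illustration, a hypothetical layout of the two required subnets (the CIDRs and zone names are made up):

    # Two non-overlapping subnets in different AZs; if one zone runs out of
    # capacity, instances can be launched in the other.
    workspace_subnets = [
        {"cidr": "10.0.0.0/22", "availability_zone": "us-east-1a"},
        {"cidr": "10.0.4.0/22", "availability_zone": "us-east-1b"},
    ]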

User16790091296
by Contributor II
  • 3592 Views
  • 1 replies
  • 1 kudos

Secrets in Databricks

I created a secret on Databricks using the Secrets API. Code: Scope_name = {"scope": "dbtest", "initial_manage_principal": "user"}; Resp = requests.post('https://instancename.net/mynoteid/api/2.0/secrets/scopes/create', json=Scope_name). In a similar way, I adde...

Latest Reply
aladda
Databricks Employee
  • 1 kudos

You'll have to specify the scope and the key in the format below to get the value: dbutils.secrets.get(scope="dbtest", key="user"). It's probably a good idea to review the Secret Management documentation for details on how to get this set up the right way - ...
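Putting the question and the answer together, a minimal sketch (the workspace URL and token are placeholders; the scope and key names come from the question):

    import requests

    HOST = "https://instancename.net"    # placeholder from the question
    TOKEN = "<personal-access-token>"    # assumption: PAT authentication
    headers = {"Authorization": f"Bearer {TOKEN}"}

    # Store a value under scope "dbtest", key "user" via the Secrets API.
    requests.post(f"{HOST}/api/2.0/secrets/put", headers=headers,
                  json={"scope": "dbtest", "key": "user", "string_value": "s3cret"})

    # Then, inside a notebook, read it back (note: dbutils.secrets, plural).
    value = dbutils.secrets.get(scope="dbtest", key="user")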

User16826990884
by New Contributor III
  • 3859 Views
  • 1 replies
  • 1 kudos

Impact on Databricks objects after a user is deleted

What happens to resources (notebooks, jobs, clusters, etc.) owned by a user when that user is deleted? The underlying problem we are trying to solve is that we want to automatically delete users through SCIM when the user leaves the company so that the u...

Latest Reply
sajith_appukutt
Honored Contributor II
  • 1 kudos

When you remove a user from Databricks, a special backup folder is created in the workspace. This backup folder contains all of the deleted user’s content. With respect to clusters and jobs, an admin can grant permissions to other users.
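As a hedged sketch of the SCIM automation the question describes (host, token, and user id are placeholders), deleting a user looks roughly like this, after which the backup-folder behavior above applies:

    import requests

    HOST = "https://<workspace-url>"
    TOKEN = "<personal-access-token>"
    user_id = "123456"  # hypothetical SCIM user id

    requests.delete(f"{HOST}/api/2.0/preview/scim/v2/Users/{user_id}",
                    headers={"Authorization": f"Bearer {TOKEN}"})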

Srikanth_Gupta_
by Databricks Employee
  • 2298 Views
  • 1 replies
  • 1 kudos
Latest Reply
sajith_appukutt
Honored Contributor II
  • 1 kudos

All three options are secure ways to store secrets. Databricks secrets have the additional functionality of redaction, so they are sometimes more convenient. Also, on Azure, you can use Azure Key Vault as the backend for secrets.
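A small illustration of the redaction behavior (the scope and key names are assumptions):

    # In a Databricks notebook, printing a secret does not leak its value.
    secret = dbutils.secrets.get(scope="dbtest", key="user")
    print(secret)  # prints [REDACTED] rather than the secret itself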

brickster_2018
by Databricks Employee
  • 4399 Views
  • 1 replies
  • 1 kudos

Resolved! Classpath issues when running spark-submit

How do I identify the jars used to load a particular class? I am sure I packed the classes correctly in my application jar. However, it looks like the class is loaded from a different jar. I want to understand the details so that I can ensure to use the r...

Latest Reply
brickster_2018
Databricks Employee
  • 1 kudos

Adding the configurations below at the cluster level can help print more logs to identify the jars from which classes are loaded:
spark.executor.extraJavaOptions=-verbose:class
spark.driver.extraJavaOptions=-verbose:class
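For example, as a spark_conf fragment in a cluster spec (the same options, just shown in Clusters API form):

    # Exactly the options from the reply, expressed as a cluster spec fragment.
    spark_conf = {
        "spark.executor.extraJavaOptions": "-verbose:class",
        "spark.driver.extraJavaOptions": "-verbose:class",
    }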

brickster_2018
by Databricks Employee
  • 2030 Views
  • 1 replies
  • 1 kudos

Resolved! Databricks Vs Yarn - Resource Utilization

I have a spark-submit application that worked fine with 8GB executor memory in YARN. I am testing the same job against a Databricks cluster with the same executor memory. However, the job runs slower on Databricks.

Latest Reply
brickster_2018
Databricks Employee
  • 1 kudos

This is not an apples-to-apples comparison. When you set 8GB as the executor memory in YARN, the container launched to run the executor JVM gets 8GB of memory, and the Xmx value of the heap is calculated accordingly. In Databricks, when...
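A back-of-the-envelope sketch of the YARN side, using Spark's standard memory-overhead default (10% of executor memory, minimum 384 MB); Databricks sizes the JVM differently, which is the point of the reply:

    executor_memory_mb = 8 * 1024                       # spark.executor.memory=8g
    overhead_mb = max(384, int(0.10 * executor_memory_mb))
    container_mb = executor_memory_mb + overhead_mb     # what YARN allocates
    print(f"YARN container: {container_mb} MB, JVM heap (Xmx): {executor_memory_mb} MB")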

MoJaMa
by Databricks Employee
  • 10112 Views
  • 3 replies
  • 1 kudos
Latest Reply
User16783853906
Contributor III
  • 1 kudos

Databricks does not charge DBUs while instances are idle in the pool. Instance-provider billing does apply. Please refer here for more information: https://docs.databricks.com/clusters/instance-pools/index.html

User16826992666
by Valued Contributor
  • 2877 Views
  • 1 replies
  • 1 kudos

Resolved! Does Databricks integrate with Immuta?

My company uses Immuta for data governance. Will Databricks be able to fit into our existing security patterns?

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 1 kudos

Yes, check out the Immuta web page on the Databricks integration: https://www.immuta.com/integrations/databricks

User16826994223
by Honored Contributor III
  • 2135 Views
  • 1 replies
  • 1 kudos

File path not recognized for notebook jobs in DBFS

We are working in IDEs, and once code is developed we put the .py file in DBFS and use that DBFS path to create a job. But with dbfs:/artifacts/kg/bootstrap.py I get a "notebook not found" error. What could be the is...

Latest Reply
User16826994223
Honored Contributor III
  • 1 kudos

The actual notebooks you create are stored in the control plane, not the data plane. You can import notebooks through the import option in the Databricks UI or via the API. A file placed in DBFS cannot be used to create a notebook job.
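A hedged sketch of importing a local .py file as a workspace notebook via the Workspace API, which a notebook job can then reference (host, token, and paths are placeholders):

    import base64
    import requests

    HOST = "https://<workspace-url>"
    TOKEN = "<personal-access-token>"

    with open("bootstrap.py", "rb") as f:
        content = base64.b64encode(f.read()).decode()

    requests.post(f"{HOST}/api/2.0/workspace/import",
                  headers={"Authorization": f"Bearer {TOKEN}"},
                  json={"path": "/Users/me@example.com/bootstrap",
                        "format": "SOURCE", "language": "PYTHON",
                        "content": content})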

User16826994223
by Honored Contributor III
  • 1538 Views
  • 1 replies
  • 1 kudos

How do I see all the DataFrame columns if I have more than 1000 columns in the DataFrame

I tried printSchema() on a DataFrame in Databricks. The DataFrame has more than 1500 columns, and apparently the printSchema function truncates the results, displaying only 1000 items. How do I see all columns?

Latest Reply
User16826994223
Honored Contributor III
  • 1 kudos

Databricks also shows the schema of the DataFrame when it's created: click the icon next to the name of the variable that holds the DataFrame. If the output exceeds that limit, I would suggest writing the schema out to a file.
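For instance, a minimal sketch of dumping the full schema to a file (df and the output path are placeholders):

    # Writes every column, one per line, bypassing the notebook display limit.
    with open("/dbfs/tmp/full_schema.txt", "w") as f:
        for field in df.schema.fields:
            f.write(f"{field.name}: {field.dataType.simpleString()}\n")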

Anonymous
by Not applicable
  • 1290 Views
  • 1 replies
  • 2 kudos
Latest Reply
User16826994223
Honored Contributor III
  • 2 kudos

Scala uses the JVM to run its code, and it cannot run different applications at a time with complete isolation of each task inside a single JVM; that is the reason Scala isn't supported on High Concurrency clusters. I don't think it is on the roadmap.

User16790091296
by Contributor II
  • 2566 Views
  • 1 replies
  • 1 kudos

Using Databricks Connect (DBConnect)

I'd like to edit Databricks notebooks locally using my favorite editor, and then use Databricks Connect to run the notebook remotely on a Databricks cluster that I usually access via the web interface. I run "databricks-connect configure", as suggest...

Latest Reply
sajith_appukutt
Honored Contributor II
  • 1 kudos

Here is the link to the configuration properties: https://docs.databricks.com/dev-tools/databricks-connect.html#step-2-configure-connection-properties
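Once configured, a quick local smoke test might look like this (the CLI also ships a "databricks-connect test" command to validate the setup):

    # Assumes databricks-connect is installed and configured; the session
    # builder routes work to the remote Databricks cluster.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    print(spark.range(5).collect())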
