- 675 Views
- 1 replies
- 0 kudos
I have a master Delta table that is continuously written to by a streaming job. I have Optimized Writes enabled and, in addition, I run the OPTIMIZE command every 3 hours. However, I think the downstream streaming jobs which are streaming the data...
Latest Reply
This is working as expected. For Delta streaming, the data files that were originally written are the ones served to the stream; the files produced by OPTIMIZE are not considered by the downstream streaming job. This is the reason it's not recommended to run VACUUM with f...
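For context, a minimal sketch of the setup described (paths and checkpoint locations are illustrative): the downstream reader keeps consuming the originally committed files, and because OPTIMIZE commits with dataChange=false, its compacted files are not fed into the stream again.

```scala
// Downstream job streaming from the master Delta table
val q = spark.readStream
  .format("delta")
  .load("/mnt/master/events")              // master table written by the upstream stream
  .writeStream
  .format("delta")
  .option("checkpointLocation", "/mnt/chk/downstream")
  .start("/mnt/derived/events")

// Periodic compaction (e.g. every 3 hours); rewrites data into larger files
// with dataChange=false, so the stream above does not re-read them
spark.sql("OPTIMIZE delta.`/mnt/master/events`")
```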
- 1450 Views
- 1 replies
- 1 kudos
What happens to resources (notebooks, jobs, clusters, etc.) owned by a user when that user is deleted? The underlying problem we are trying to solve is that we want to automatically delete users through SCIM when the user leaves the company so that the u...
Latest Reply
When you remove a user from Databricks, a special backup folder is created in the workspace; it contains all of the deleted user's content. With respect to clusters and jobs, an admin can grant permissions on them to other users.
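For the automation part, deleting a user is a single SCIM call. A sketch (the instance URL, token, and user id are placeholders; the endpoint shown is the SCIM 2.0 preview API):

```bash
# Remove a user via the Databricks SCIM API; their workspace content is moved
# to the backup folder described above.
curl -X DELETE \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  "https://<databricks-instance>/api/2.0/preview/scim/v2/Users/<user-id>"
```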
- 1364 Views
- 1 replies
- 1 kudos
Using %sh, I am able to run commands in the notebook (on the driver) and get output. How can I run a command on the executors and get the output? I want to avoid using the Spark APIs.
Latest Reply
It's not possible to use %sh to run commands on the executors; %sh runs only on the driver. Code along the lines below can be used to run a command on each executor and get the output. Note that runOnEachExecutor is a Databricks-specific helper (not part of OSS Spark), and the shell command here is a stand-in for the truncated original:

```scala
import sys.process._

val res = sc.runOnEachExecutor[String]({ () =>
  // "hostname" is an illustrative stand-in; the original command was truncated
  Seq("bash", "-c", "hostname").!!
})
```
- 10497 Views
- 1 replies
- 1 kudos
Is there general guidance around using views vs creating Delta tables? For example, I need to do some filtering and make small tweaks to a few columns for use in another application. Is there a downside of using a view here?
Latest Reply
Views won't duplicate the data, so if you are just filtering rows or columns or making small tweaks, a view might be a good option. If, however, the filtering is really expensive or you are doing a lot of calculations, then materialize the vi...
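As a sketch of both options in Spark SQL (the table and column names are made up):

```scala
// Option 1: a view – no data duplicated; the filter runs on every read
spark.sql("""
  CREATE OR REPLACE VIEW app_input AS
  SELECT id, upper(region) AS region, amount
  FROM sales
  WHERE amount > 0
""")

// Option 2: materialize as a Delta table – pays storage and refresh cost once,
// worthwhile when the transformation is expensive
spark.sql("""
  CREATE OR REPLACE TABLE app_input_tbl USING DELTA AS
  SELECT id, upper(region) AS region, amount
  FROM sales
  WHERE amount > 0
""")
```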
- 2180 Views
- 1 replies
- 1 kudos
How do I identify the jar used to load a particular class? I am sure I packed the classes correctly in my application jar; however, it looks like the class is loaded from a different jar. I want to understand the details so that I can ensure to use the r...
Latest Reply
Adding the configurations below at the cluster level prints verbose class-loading logs, which show the jar each class is loaded from:

```
spark.driver.extraJavaOptions=-verbose:class
spark.executor.extraJavaOptions=-verbose:class
```
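As a complementary runtime check (a minimal sketch; the class name here is just an example), you can also ask the JVM directly where a class came from:

```scala
// Prints the code source (jar path) of the loaded class.
// Replace the fully qualified name with the class you are debugging.
val cls = Class.forName("com.example.MyClass")
println(cls.getProtectionDomain.getCodeSource.getLocation)
```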
- 700 Views
- 1 replies
- 0 kudos
When trying to upload libraries via the UI, the upload fails.
Latest Reply
One corner-case scenario where you can hit this issue is if there is a /<shard name>/0/FileStore/jars file (rather than a directory) in the root bucket of the workspace. Once you remove the file, the upload should work fine.
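A quick way to inspect the path from a notebook (a sketch, assuming it maps to dbfs:/FileStore/jars):

```scala
// If this lists a single plain file named "jars" instead of directory
// contents, that matches the corner case described above.
dbutils.fs.ls("dbfs:/FileStore/jars").foreach(f => println(f.path))
```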
- 1426 Views
- 3 replies
- 0 kudos
When using spot fleet pools to schedule jobs, driver and worker nodes are provisioned from the spot pools, and we are noticing jobs failing with the below exception when there is a driver spot loss. Please share best practices around using fleet pools with 1...
Latest Reply
In this scenario, the driver node is reclaimed by AWS. Databricks has started a preview of the hybrid pools feature, which allows you to provision the driver node from a different pool. We recommend using an on-demand pool for the driver node to improve reliability i...
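With hybrid pools, the relevant part of the cluster spec is pointing the driver at a separate on-demand pool. A minimal sketch (pool IDs are placeholders; field names follow the Clusters API):

```json
{
  "instance_pool_id": "<spot-fleet-pool-id>",
  "driver_instance_pool_id": "<on-demand-pool-id>"
}
```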
- 811 Views
- 1 replies
- 1 kudos
I have a spark-submit application that worked fine with 8GB executor memory in YARN. I am testing the same job on a Databricks cluster with the same executor memory. However, the jobs are running slower in Databricks.
Latest Reply
This is not an apples-to-apples comparison. When you set 8GB as the executor memory in YARN, the container launched to run the executor JVM gets 8GB of memory, and the heap's Xmx value is calculated accordingly. In Databricks, when...
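To see what the executors actually get, a quick check (a sketch; it runs a handful of tasks and reports the JVM's effective max heap):

```scala
// Compare the configured executor memory with the actual JVM max heap (-Xmx)
// observed on the executors.
val configured = spark.conf.get("spark.executor.memory", "unset")
val heapsMb = sc.parallelize(1 to 100)
  .map(_ => Runtime.getRuntime.maxMemory / (1024 * 1024))
  .distinct()
  .collect()
println(s"configured=$configured, executor max heap (MB): ${heapsMb.mkString(", ")}")
```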
- 1031 Views
- 1 replies
- 0 kudos
Is it possible to run the OSS Spark history server and view the Spark event logs?
Latest Reply
Yes, it's possible. The OSS Spark history server can read the Spark event logs generated on a Databricks cluster. Using cluster log delivery, the Spark logs can be written to any arbitrary location. Event logs can be copied from there to the storage ...
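As a sketch of the last step (assuming the event logs have been copied to a local directory such as /tmp/spark-events), the OSS history server just needs to be pointed at that directory:

```bash
# Point the OSS history server at the copied event logs and start it
export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=file:/tmp/spark-events"
$SPARK_HOME/sbin/start-history-server.sh
```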
- 429 Views
- 0 replies
- 0 kudos
This is a 2-part question:
1. How do I go about encrypting an existing root S3 bucket?
2. Will this impact my Databricks environment (resources not being accessible, performance issues, etc.)?
- 1407 Views
- 1 replies
- 0 kudos
On the Spark UI, jobs appear to be running forever, but my notebook has already completed its operations. Why are the resources being wasted?
Latest Reply
This happens if the Spark driver is missing events. The jobs/tasks are not actually running; the Spark UI is just reporting incorrect stats, so this can be treated as a harmless UI issue. If you continue to see the issue consistently, then it might be good to review w...
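One way to confirm the scheduler is actually idle (a minimal sketch using the public SparkStatusTracker API):

```scala
// If these return empty arrays while the UI still shows running jobs,
// the discrepancy is a UI/event-reporting issue, not wasted compute.
println(sc.statusTracker.getActiveJobIds().mkString(", "))
println(sc.statusTracker.getActiveStageIds().mkString(", "))
```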