Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
In Databricks --> Workflows --> Job Runs we have a column "Run As". Where does this value come from? We are getting a user ID here, but we need to change it to a generic account. Any help would be appreciated. Thanks.
I'm surprised there's no option to set "Run as" to something like a system user. Why all this complication with a service principal? Where can I report this? @DataBricks
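The value in that column is the job's run_as identity, which defaults to the job's creator unless changed. A minimal sketch of pointing it at a service principal, assuming the Jobs 2.1 API's run_as field (host, token, job ID, and the service principal's application ID are all placeholders; verify the endpoint against your workspace's API docs):

```python
# Hedged sketch: update a job's "Run As" identity via the Jobs 2.1 REST API.
import requests

HOST = "https://<your-workspace>.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"                      # placeholder

resp = requests.post(
    f"{HOST}/api/2.1/jobs/update",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "job_id": 123,  # placeholder job ID
        "new_settings": {
            # Run the job as a service principal instead of a named user
            "run_as": {"service_principal_name": "<sp-application-id>"},
        },
    },
)
resp.raise_for_status()
```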
I see that Spark fully supports Scala 2.13. I wonder why there is no Databricks runtime with Scala 2.13 yet. Are there any plans to make this available? It would be super useful.
I agree with @777. As Scala 3 is maturing and there are more real use cases for Scala 3 on Spark now, support for Scala 2.13 would be valuable to users, including us. I think the recent upgrade of the Databricks runtime from JDK 8 to 17 was one of a ...
I recently created a table on a cluster in Azure running Databricks Runtime 11.1. The table is partitioned by a "date" column. I enabled column mapping, like this: ALTER TABLE {schema}.{table_name} SET TBLPROPERTIES('delta.columnMapping.mode' = 'name') ...
Hi @Retired_mod, I have a few queries on directory names with column mapping. I have this Delta table on ADLS and I am trying to read it, but I am getting the error below. How can we read Delta tables with column mapping enabled using PySpark? Can you pleas...
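For reference, column mapping is resolved by the Delta reader itself, so a plain delta-format load is the usual pattern, provided the reader is new enough. A minimal sketch (the path is a placeholder):

```python
# Hedged sketch: reading a Delta table with column mapping enabled. The reader
# must support Delta reader protocol version 2+ (recent DBR versions do).
path = "abfss://<container>@<account>.dfs.core.windows.net/path/to/table"  # placeholder

df = spark.read.format("delta").load(path)
df.printSchema()  # shows logical column names, not the physical mapped names
```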
According to this page, the GraphFrames package has been included in the Databricks runtime since at least 11.0. However, trying to run a connected components algorithm inside a Delta Live Tables notebook yields the error java.lang.ClassNotFoundException: or...
I'm also trying to use GraphFrames inside a DLT pipeline. I get an error that graphframes is not installed on the cluster. I'm using it successfully in test notebooks on the ML version of the cluster. Is there a way to use this inside a DLT job?
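For context, the call pattern in question looks roughly like this. It is a sketch assuming the graphframes JVM jar and Python bindings are both present on the cluster, which is exactly what the DLT cluster seems to be missing (DataFrame names and the checkpoint path are placeholders):

```python
# Hedged sketch of the connected-components call discussed above.
from graphframes import GraphFrame

# connectedComponents() requires a checkpoint directory to be set
spark.sparkContext.setCheckpointDir("/tmp/gf-checkpoints")  # placeholder path

# vertices_df needs an "id" column; edges_df needs "src" and "dst" columns
g = GraphFrame(vertices_df, edges_df)
components = g.connectedComponents()
components.show()
```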
We have a Databricks job running with a main class and a JAR file. Our JAR code base is in Scala. When the job starts running, we need to log the Job ID and Run ID to a database for future reference. How can we achieve this?
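One commonly used route, sketched below as a Jobs API task fragment, is passing the built-in task parameter variables {{job_id}} and {{run_id}} to the JAR task, so your Scala main() receives them as ordinary arguments it can write to the database (the main class name is a placeholder):

```python
# Hedged sketch: a spark_jar_task fragment for the Jobs API. Databricks expands
# {{job_id}} and {{run_id}} at run time before invoking main(String[] args).
jar_task = {
    "spark_jar_task": {
        "main_class_name": "com.example.Main",       # placeholder
        "parameters": ["{{job_id}}", "{{run_id}}"],  # expanded per run
    }
}
```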
Rename and drop columns with Delta Lake column mapping. Hi all, Databricks now supports column rename and drop. Column mapping requires the following Delta protocols: reader version 2 or above, and writer version 5 or above. Blog URL ## Available in D...
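Put together, enabling the feature and then using it looks roughly like this (a sketch; table and column names are placeholders):

```python
# Hedged sketch: enable column mapping, then rename and drop columns.
spark.sql("""
    ALTER TABLE my_schema.my_table SET TBLPROPERTIES (
        'delta.columnMapping.mode' = 'name',
        'delta.minReaderVersion'   = '2',
        'delta.minWriterVersion'   = '5'
    )
""")
spark.sql("ALTER TABLE my_schema.my_table RENAME COLUMN old_name TO new_name")
spark.sql("ALTER TABLE my_schema.my_table DROP COLUMN unused_col")
```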
Hi Team, we are wondering if there is a recommended way to install the Chromium browser and ChromeDriver on Databricks Runtime 10.4 and above. I have been through the site and have come across several links to this effect, but they all seem to be ins...
Look into Playwright instead of Selenium. I went through the same process y'all went through here (ended up writing an init script to install the drivers etc.). This is all done for you in Playwright. Refer to this post, I hope it helps! https://communit...
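For anyone following the suggestion above, a minimal Playwright sketch (install steps shown as comments; the target URL is a placeholder):

```python
# Hedged sketch: headless Chromium via Playwright on Databricks.
# In a notebook, install first (a cluster init script also works):
#   %pip install playwright
#   !playwright install chromium
#   !playwright install-deps   # OS libraries, where the image supports it
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com")  # placeholder URL
    print(page.title())
    browser.close()
```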
Hey, I'm trying to perform time-window aggregation in two different streams followed by a stream-stream window join, as described here. I'm running Databricks Runtime 13.1, exactly as advised. However, when I reproduce the following code: clicksWindow = c...
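For readers without the link, the shape of the documented example is roughly the following (a hedged reconstruction; stream and column names are assumptions):

```python
# Hedged sketch: windowed aggregation on two streams, then a stream-stream
# join on the window column. Both sides need watermarks for the join to work.
from pyspark.sql.functions import window

clicksWindow = (
    clicksStream.withWatermark("clickTime", "1 hour")
    .groupBy(window("clickTime", "1 hour"))
    .count()
)
impressionsWindow = (
    impressionsStream.withWatermark("impressionTime", "1 hour")
    .groupBy(window("impressionTime", "1 hour"))
    .count()
)
joined = clicksWindow.join(impressionsWindow, "window", "inner")
```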
I am getting an error about the `drop` function of PySpark on a cluster using 12.2 LTS. When I check the error, I see Spark has solved that bug; see SPARK-42444. Also, when I checked the maintenance updates page, I saw this solved issue included in the Databricks R...
Hi @Sevval Mehder, elevate our community by acknowledging exceptional contributions. Your participation in marking the best answers is a testament to our collective pursuit of knowledge.
Is there a way to manually update the required CUDA files in the Databricks runtime? There are some rather annoying bugs still in TF 2.11 that have been fixed in TF 2.12. Sadly, the latest Databricks runtime 13.1 (beta) only supports the older TF 2.11, even though 2.12 was ...
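A quick diagnostic sketch before attempting any manual upgrade; it shows which CUDA build the installed TensorFlow expects, which is the compatibility constraint at issue:

```python
# Hedged sketch: inspect the TF version, the CUDA version it was built
# against, and GPU visibility, before trying to swap in TF 2.12.
import tensorflow as tf

print(tf.__version__)
print(tf.sysconfig.get_build_info().get("cuda_version"))
print(tf.config.list_physical_devices("GPU"))
```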
We're using the following method (generated by dbx) to access dbutils, e.g. to retrieve parameters from secret scopes:

@staticmethod
def _get_dbutils(spark: SparkSession) -> "dbutils":
    try:
        from pyspark.dbutils import DBUtils  # available on Databricks clusters
        return DBUtils(spark)
    except ImportError:  # fall back to the dbutils injected into the notebook scope
        import IPython
        return IPython.get_ipython().user_ns["dbutils"]
After reviewing the deprecations page, I discovered that table access control is not supported in Databricks Runtime for Machine Learning. I want to understand why table access control is not designed for the ML runtime. Is there any reason behind this?
@Thanapat Sontayasara Table Access Control (TAC) is a feature in Databricks that allows you to restrict access to specific tables in your workspace based on user or group identity. According to the Databricks documentation, TAC is not supported in th...
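For concreteness, on a TAC-enabled (non-ML) cluster the feature is exercised with SQL grants, roughly like this (a sketch; table and group names are placeholders, and the exact SHOW syntax may vary by runtime):

```python
# Hedged sketch: table ACLs on a cluster with table access control enabled.
spark.sql("GRANT SELECT ON TABLE my_schema.my_table TO `analysts`")
spark.sql("SHOW GRANTS ON TABLE my_schema.my_table").show(truncate=False)
```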
Currently I am using the following cluster. It is using the default Python version, 3.9.5, and I would like to update it to 3.10.1.0. How can I achieve this?
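A quick check of what the cluster actually runs, for comparison against the runtime release notes; note that the interpreter version is tied to the runtime version, so moving to 3.10 generally means choosing a runtime that ships it rather than upgrading in place:

```python
# Hedged sketch: confirm the driver's Python version.
import sys
print(sys.version)
```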
Hi @Ayush Modi, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers yo...
I want to use the same Spark session created in one notebook in another notebook within the same environment. For example, if some objects (variables) get initialized in the first notebook, I need to use the same objects in t...
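Notebooks attached to the same cluster share one Spark application, so one workable pattern, sketched below, is publishing data through a global temp view rather than trying to pass the session object itself (the view and DataFrame names are placeholders):

```python
# Hedged sketch: sharing a DataFrame between notebooks on the same cluster.

# Notebook A -- publish a DataFrame for other notebooks:
df = spark.range(10)
df.createOrReplaceGlobalTempView("shared_df")

# Notebook B -- same cluster, same Spark application, different notebook:
shared = spark.table("global_temp.shared_df")
print(shared.count())
```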
Hi, we're using Databricks Runtime 11.3 LTS and executing a Spark Java job using a job cluster. To automate the execution of this job, we need to define (source from bash config files) some environment variables through an init script (clust...
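An alternative worth knowing about alongside init scripts, sketched below as a cluster-spec fragment: the spark_env_vars field on the job cluster definition, which makes environment variables visible to the driver and executors (names and values are placeholders):

```python
# Hedged sketch: a Jobs/Clusters API new_cluster fragment with environment
# variables set declaratively instead of sourcing them in an init script.
new_cluster = {
    "spark_version": "11.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",  # placeholder
    "num_workers": 2,
    "spark_env_vars": {
        "APP_CONFIG_PATH": "/dbfs/configs/app.conf",  # placeholder
        "APP_ENV": "prod",                            # placeholder
    },
}
```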
Hi @Rahul K, hope all is well! Just wanted to check in on whether you were able to resolve your issue. Would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!