Hi Databricks community, I'm using a Databricks Jobs Cluster to run some jobs. I'm setting the worker and driver type to AWS m6gd.large, which has 2 cores and 8G of memory each. After seeing that it defaults executor memory to 2G, I wanted to increase it,...
I think I found the right answer here: https://kb.databricks.com/en_US/clusters/spark-shows-less-memory It seems a fixed size of ~4GB is reserved for internal node services. So depending on the node type, `spark.executor.memory` is fixed by Databricks...
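Not a definitive answer, but a quick way to confirm what was actually applied is to read the value back from the Spark conf in a notebook on that cluster; the override path mentioned in the comments is an assumption based on the usual cluster Spark config mechanism, and the cap depends on the node type and the reservation described in the KB article:

```python
# Read back the executor memory Databricks computed for this node type
executor_mem = spark.sparkContext.getConf().get("spark.executor.memory")
print(executor_mem)

# You can still request a value via the cluster's Spark config
# (Advanced options > Spark, e.g. "spark.executor.memory 4g"), but the
# usable value is limited by what remains after the internal reservation.
```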
Hello, I am contacting you because I am having a problem with the performance of my notebooks on Databricks. My notebook is written in Python (PySpark); in it I read a Delta table that I copy to a dataframe, do several transformations, and create sever...
I've connected Databricks directly to S4/HANA ABAP layers but will re-iterate that it is extremely challenging if you do not have a background in sys administration, networking, devops, programming, and SAP.
I'm trying to use a custom library that I created from a .whl file in the workspace/shared location. The library attaches to the cluster without any issues, and I can see it when I list the modules using pip. When I try to call the module I get an error t...
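In case it helps with debugging, here is a hedged sketch of a notebook-scoped install straight from a Workspace path; the wheel path and package name below are hypothetical placeholders, not the actual library:

```python
# Cell 1: notebook-scoped install from a Workspace path
# (path and wheel file name are hypothetical placeholders)
%pip install /Workspace/Shared/libs/my_custom_lib-0.1.0-py3-none-any.whl

# Cell 2: %pip restarts the Python process, so import afterwards and check
# which installation is actually being resolved
import my_custom_lib  # placeholder for the package name inside the wheel
print(my_custom_lib.__file__)
```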
Hello guys, I am working on a project where we need to use the spark-excel library (Maven) in order to ingest data from Excel files. As those 3rd-party libraries are not allowed on shared clusters, do you have any workaround other than using pandas, for exa...
Yes, it is possible to connect databricks to a kerberized hbase cluster. The attached article explains the steps. It consists of setting up a kerberos client using a keytab in the cluster nodes, installing the hbase-spark integration library, and set...
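For illustration, a rough sketch of what the read can look like once the keytab, kerberos client, and hbase-spark integration library are in place; the table name and column mapping below are hypothetical, and the exact options depend on the connector version:

```python
# Hypothetical HBase table and column mapping; adjust to your schema.
df = (spark.read
      .format("org.apache.hadoop.hbase.spark")
      .option("hbase.table", "my_namespace:my_table")
      .option("hbase.columns.mapping",
              "row_key STRING :key, value STRING cf1:value")
      .option("hbase.spark.use.hbasecontext", "false")
      .load())
df.show(5)
```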
To read BigQuery data using spark.read, I'm using a query. This query executes and creates a table on the materializationDataset.

df = spark.read.format("bigquery") \
    .option("query", query) \
    .option("materializationProject", materializationProject) \
    ...
Hi @naga_databricks, The Databricks documentation does not explicitly state that spark.read BigQuery format will create a Materialized View.
Instead, it mentions that it can read from a BigQuery table or the result of a BigQuery SQL query. When you ...
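To make that concrete, here is a hedged version of the query-based read from the question; when a query is passed, the connector materializes its result into a temporary table in the given materializationDataset rather than creating a materialized view (variable names follow the question):

```python
# viewsEnabled must be true for query-based reads; the result is materialized
# into a temporary table under materializationProject.materializationDataset.
df = (spark.read.format("bigquery")
      .option("viewsEnabled", "true")
      .option("query", query)
      .option("materializationProject", materializationProject)
      .option("materializationDataset", materializationDataset)
      .load())
```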
Hello, We are using a reference dataset for our Production applications. We would like to create a delta table for this dataset to be used from our applications. Currently, manual updates will occur on this dataset through a script on a weekly basis. ...
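One possible pattern, sketched under the assumption that the weekly script can produce a dataframe with the new snapshot; the table, key, and path names below are hypothetical:

```python
from delta.tables import DeltaTable

# Hypothetical weekly extract produced by the existing script
updates_df = spark.read.option("header", "true").csv("/path/to/weekly_extract")

# Upsert the snapshot into the reference Delta table so applications always
# read a consistent, versioned copy
target = DeltaTable.forName(spark, "prod.reference_dataset")
(target.alias("t")
 .merge(updates_df.alias("s"), "t.ref_id = s.ref_id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())
```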
I noticed that enabling Photon acceleration is increasing the number of DBUs utilized per hour, which in turn increases our cost. In light of this, I am interested in gaining clarity on the costing of Photon acceleration, as I was led to believe that Pho...
Photon is more expensive DBU-wise. The cost optimization/reduction is achieved by (possibly) faster runtimes. So as you already noticed, it can be a cost reduction, but not in all cases (as with you, apparently). But it can also be interesting to serve da...
Hi, I have several questions regarding Trino integration: Is it recommended to use an external Hive metastore or leverage the Databricks-maintained Hive metastore when it comes to enabling external query engines such as Trino? When I tried to use ex...
> Is it recommended to use an external Hive metastore or leverage the Databricks-maintained Hive metastore when it comes to enabling external query engines such as Trino?

The Databricks-maintained Hive metastore is not recommended for external use. ...
I'm trying to implement a streaming pipeline that will run hourly using Spark Structured Streaming, Scala, and Delta tables. The pipeline will process different items with their details. The sources are Delta tables that already exist, written hourly u...
@Agus1 Could you try using CDC in Delta? You could use readChangeFeed to read only the changes that were applied on the source table. This is also explained here: https://learn.microsoft.com/en-us/azure/databricks/delta/delta-change-data-feed
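A minimal sketch of that approach in PySpark, assuming change data feed is enabled on the source table (delta.enableChangeDataFeed = true); the table name and starting version are placeholders:

```python
# Read only the changes since a given version instead of the full table
changes = (spark.readStream
           .format("delta")
           .option("readChangeFeed", "true")
           .option("startingVersion", 1)  # placeholder starting point
           .table("source_db.items"))

# Each micro-batch exposes _change_type, _commit_version and _commit_timestamp,
# so downstream logic can apply inserts/updates/deletes incrementally.
```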
When changing a delta table column data type in Unity Catalog, we noticed that a view referencing that table did not automatically update to reflect the new data type. Is there a way to update the delta table column data type so that it also update...
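One workaround, if recreating the view is acceptable, is to re-issue the view definition so it picks up the new column type; the catalog, schema, and column names below are placeholders:

```python
# Recreate the view so it reflects the table's updated column type
spark.sql("""
    CREATE OR REPLACE VIEW main.reporting.orders_v AS
    SELECT order_id,
           amount   -- now read with the table's new data type
    FROM main.reporting.orders
""")
```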
I am trying to save a dataframe, after a series of data manipulations using UDF functions, to a delta table. I tried using this code:

(df
    .write
    .format('delta')
    .mode('overwrite')
    .option('overwriteSchema', 'true')
    .saveAsTable('output_table'))

but this...
You should also look into the SQL plan to check whether the writing phase is indeed the part that is taking time. Since Spark works on lazy evaluation, some other phase might be the one that is actually slow.
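For example, something along these lines can show whether the UDF stages or the write dominate; the cache step is just one hedged option if the upstream work turns out to be recomputed:

```python
# Inspect the physical plan before writing; the write triggers all upstream
# UDF work because of lazy evaluation.
df.explain(mode="formatted")

# Optionally materialize the expensive transformations once, so the
# saveAsTable() call does not redo them.
df = df.cache()
df.count()  # forces the UDF transformations to run and populate the cache
```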
Facing an issue with cluster performance; in the event log we can see "cluster is not responsive likely due to GC". The number of pipelines (Databricks notebooks) running and the cluster configuration are the same as they used to be, but we started seeing this issue sin...