Data Engineering

Forum Posts

938452
by New Contributor III
  • 5475 Views
  • 4 replies
  • 3 kudos

Resolved! Executor memory increase limitation based on node type

Hi Databricks community, I'm using a Databricks Jobs cluster to run some jobs. I'm setting the worker and driver type to AWS m6gd.large, which has 2 cores and 8 GB of memory each. After seeing that it defaults the executor memory to 2 GB, I wanted to increase it, ...

Latest Reply
938452
New Contributor III
  • 3 kudos

I think I found the right answer here: https://kb.databricks.com/en_US/clusters/spark-shows-less-memory. It seems a fixed size of ~4 GB is reserved for internal node services. So depending on the node type, `spark.executor.memory` is fixed by Databricks...

3 More Replies
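A minimal sketch of how to confirm what that KB article describes, assuming you run it in a notebook attached to the job cluster (where `spark` is predefined). It only reads back the values Databricks assigned for the node type; overriding spark.executor.memory in the cluster config may be ignored or capped.

# Inspect the executor memory/cores Spark actually got on this cluster.
conf = spark.sparkContext.getConf()
print("spark.executor.memory:", conf.get("spark.executor.memory", "not set"))
print("spark.executor.cores:", conf.get("spark.executor.cores", "not set"))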
SaraCorralLou
by New Contributor III
  • 2331 Views
  • 7 replies
  • 2 kudos

Bad performance of UDF functions

Hello, I am contacting you because I am having a problem with the performance of my notebooks on Databricks. My notebook is written in Python (PySpark); in it I read a Delta table that I copy to a DataFrame and do several transformations and create several...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Looping over records is a performance killer, to be avoided at all costs. See "beware the for-loop" (databricks.com).

6 More Replies
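A sketch of the pattern this reply recommends: replace driver-side loops and row-by-row Python UDFs with built-in column expressions, which run inside the JVM and avoid per-row Python serialization. Table and column names here are placeholders, not from the original post.

from pyspark.sql import functions as F

df = spark.table("my_schema.my_delta_table")  # hypothetical source

# Same transformations expressed with native column functions instead of a loop/UDF.
df_fast = (
    df.withColumn("amount_eur", F.col("amount") * F.lit(0.92))
      .withColumn("name_clean", F.trim(F.lower(F.col("name"))))
)
df_fast.write.format("delta").mode("overwrite").saveAsTable("my_schema.my_output")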
ricard98
by New Contributor II
  • 3400 Views
  • 6 replies
  • 5 kudos

How to integrate SAP ERP with Databricks

Is there a way to integrate SAP ERP with a Databricks notebook through Python?

Latest Reply
Kong
New Contributor II
  • 5 kudos

I've connected Databricks directly to S4/HANA ABAP layers, but will reiterate that it is extremely challenging if you do not have a background in systems administration, networking, DevOps, programming, and SAP.

5 More Replies
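A hedged sketch of the database-level path only: reading a table from the underlying HANA database over JDBC. It assumes the SAP HANA JDBC driver (ngdbc.jar) is installed on the cluster; the host, port, schema/table, and secret scope below are placeholders. Extracting from the ABAP application layer itself (as the reply describes) usually requires SAP-specific tooling such as RFC, OData, or SLT instead.

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sap://<hana-host>:<port>")          # placeholder host/port
    .option("driver", "com.sap.db.jdbc.Driver")
    .option("dbtable", "SAPSCHEMA.SOME_TABLE")                # placeholder table
    .option("user", dbutils.secrets.get(scope="sap", key="user"))
    .option("password", dbutils.secrets.get(scope="sap", key="password"))
    .load()
)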
Chris_Shehu
by Valued Contributor III
  • 1433 Views
  • 2 replies
  • 1 kudos

Resolved! Custom Libraries (Unity Catalog Enabled Clusters)

I'm trying to use a custom library that I created from a .whl file in the workspace/shared location. The library attaches to the cluster without any issues and I can see it when I list the modules using pip. When I try to call the module I get an error ...

Latest Reply
Szpila
New Contributor II
  • 1 kudos

Hello guys, I am working on a project where we need to use the spark-excel library (Maven) to ingest data from Excel files. As those 3rd-party libraries are not allowed on a shared cluster, do you have any workaround other than using pandas, for example...

1 More Replies
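A minimal sketch of the pandas fallback mentioned in this thread, for shared Unity Catalog clusters where Maven libraries such as spark-excel cannot be attached. The file path and table name are placeholders; openpyxl can be installed notebook-scoped with %pip.

# %pip install openpyxl   (run as a notebook magic in its own cell)
import pandas as pd

pdf = pd.read_excel("/Volumes/my_catalog/my_schema/files/report.xlsx")  # placeholder path
spark.createDataFrame(pdf).write.mode("overwrite").saveAsTable("my_catalog.my_schema.report_bronze")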
User15986662700
by New Contributor III
  • 2719 Views
  • 4 replies
  • 1 kudos
Latest Reply
User15986662700
New Contributor III
  • 1 kudos

Yes, it is possible to connect Databricks to a Kerberized HBase cluster. The attached article explains the steps. It consists of setting up a Kerberos client using a keytab on the cluster nodes, installing the hbase-spark integration library, and setting...

3 More Replies
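A heavily hedged sketch of the read side of what this reply describes. It assumes the cluster nodes already have a Kerberos client initialized from a keytab (for example via an init script) and the Apache hbase-spark connector installed; the option names follow that connector's documentation and may differ between versions, and the table/column mapping is made up for illustration.

df = (
    spark.read.format("org.apache.hadoop.hbase.spark")
    .option("hbase.table", "events")  # placeholder HBase table
    .option("hbase.columns.mapping",
            "id STRING :key, payload STRING cf1:payload")  # placeholder mapping
    .load()
)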
naga_databricks
by Contributor
  • 1369 Views
  • 2 replies
  • 0 kudos

Reading BigQuery data using a query

To read BigQuery data using spark.read, I'm using a query. This query executes and creates a table in the materializationDataset. df = spark.read.format("bigquery") \ .option("query", query) \ .option("materializationProject", materializationProject) \...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @naga_databricks, The Databricks documentation does not explicitly state that spark.read BigQuery format will create a Materialized View. Instead, it mentions that it can read from a BigQuery table or the result of a BigQuery SQL query. When you ...

1 More Replies
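A hedged completion of the snippet in the post, following the spark-bigquery connector's query-based read path: the connector runs the query in BigQuery, materializes the result as a temporary table in the materialization dataset, and reads that table back. Depending on the connector version you may also need .option("viewsEnabled", "true"). The query and project/dataset values are placeholders.

query = "SELECT id, name FROM `my_project.my_dataset.my_table`"   # placeholder query
materializationProject = "my_gcp_project"                          # placeholder
materializationDataset = "spark_materialization"                   # placeholder

df = (
    spark.read.format("bigquery")
    .option("query", query)
    .option("materializationProject", materializationProject)
    .option("materializationDataset", materializationDataset)
    .load()
)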
EDDatabricks
by Contributor
  • 585 Views
  • 2 replies
  • 2 kudos

Appropriate storage account type for reference data (Azure)

Hello, we are using a reference dataset for our production applications. We would like to create a Delta table for this dataset to be used from our applications. Currently, manual updates will occur on this dataset through a script on a weekly basis. ...

Labels: Data Engineering, Delta Live Table, Storage account
Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

+1 for ADLS: hierarchical storage, hot/cold/premium storage, things not possible in plain Blob storage.

1 More Replies
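A minimal sketch of the recommended setup, assuming an ADLS Gen2 (hierarchical namespace) container the cluster can already reach, for example through a Unity Catalog external location. The abfss path and source file are placeholders; the weekly script would simply overwrite the reference Delta table.

ref_path = "abfss://reference@mystorageaccount.dfs.core.windows.net/reference_dataset"  # placeholder

weekly = spark.read.option("header", "true").csv("/Volumes/my_catalog/my_schema/staging/weekly_update.csv")  # placeholder source
weekly.write.format("delta").mode("overwrite").save(ref_path)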
564824
by New Contributor II
  • 1457 Views
  • 1 reply
  • 0 kudos

Resolved! Why is Photon increasing DBU used per hour?

I noticed that enabling Photon acceleration is increasing the number of DBUs utilized per hour, which in turn increases our cost. In light of this, I am interested in gaining clarity on the costing of Photon acceleration, as I was led to believe that Photon...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

Photon is more expensive DBU-wise. The cost optimization/reduction is achieved by (possibly) faster runtimes. So as you already noticed, it can be a cost reduction, but not in all cases (as with you, apparently). But it can also be interesting to serve data...

irispan
by New Contributor II
  • 1996 Views
  • 4 replies
  • 1 kudos

Recommended Hive metastore pattern for Trino integration

Hi, I have several questions regarding Trino integration: Is it recommended to use an external Hive metastore or leverage the Databricks-maintained Hive metastore when it comes to enabling external query engines such as Trino? When I tried to use ex...

Latest Reply
JunlinZeng
New Contributor II
  • 1 kudos

> Is it recommended to use an external Hive metastore or leverage the Databricks-maintained Hive metastore when it comes to enabling external query engines such as Trino?
The Databricks-maintained Hive metastore is not suggested to be used externally. ...

3 More Replies
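A hedged sketch of the cluster spark_conf one would set so that Databricks points at an external Hive metastore, which Trino can then share. The JDBC URL, driver, and credentials are placeholders; in practice the password would come from a secret rather than plain text.

external_metastore_conf = {
    "spark.sql.hive.metastore.version": "2.3.9",
    "spark.sql.hive.metastore.jars": "builtin",
    "spark.hadoop.javax.jdo.option.ConnectionURL": "jdbc:mysql://<metastore-host>:3306/metastore_db",
    "spark.hadoop.javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
    "spark.hadoop.javax.jdo.option.ConnectionUserName": "<user>",
    "spark.hadoop.javax.jdo.option.ConnectionPassword": "<password>",
}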
Agus1
by New Contributor III
  • 2557 Views
  • 3 replies
  • 3 kudos

Update destination table when using Spark Structured Streaming and Delta tables

I'm trying to implement a streaming pipeline that will run hourly using Spark Structured Streaming, Scala, and Delta tables. The pipeline will process different items with their details. The sources are Delta tables that already exist, written hourly u...

Latest Reply
Tharun-Kumar
Honored Contributor II
  • 3 kudos

@Agus1 Could you try using CDC in Delta? You could use readChangeFeed to read only the changes that were applied to the source table. This is also explained here: https://learn.microsoft.com/en-us/azure/databricks/delta/delta-change-data-feed

2 More Replies
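A minimal sketch of that suggestion: stream only the row-level changes from the source Delta table with Change Data Feed (delta.enableChangeDataFeed must be set on the source) and apply them to the destination with MERGE inside foreachBatch. All table, column, and path names are placeholders.

def upsert_batch(batch_df, batch_id):
    # Apply inserts and post-update images from the change feed to the target table.
    batch_df.createOrReplaceTempView("changes")
    batch_df.sparkSession.sql("""
        MERGE INTO target_items t
        USING (SELECT * FROM changes WHERE _change_type IN ('insert', 'update_postimage')) s
        ON t.item_id = s.item_id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)

(spark.readStream.format("delta")
    .option("readChangeFeed", "true")
    .table("source_items")
 .writeStream
    .option("checkpointLocation", "/Volumes/my_catalog/my_schema/checkpoints/items")
    .foreachBatch(upsert_batch)
    .trigger(availableNow=True)
    .start())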
Eric_Kieft
by New Contributor III
  • 1416 Views
  • 2 replies
  • 1 kudos

Unity Catalog Table/View Column Data Type Changes

When changing a Delta table column data type in Unity Catalog, we noticed that a view referencing that table did not automatically update to reflect the new data type. Is there a way to update the Delta table column data type so that it also updates...

Latest Reply
Lakshay
Esteemed Contributor
  • 1 kudos

Can you try refreshing the view by running the command REFRESH TABLE <viewname>?

1 More Replies
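A small sketch of that reply plus a fallback: refresh the view, and if its stored schema still shows the old column type, recreate it so it re-resolves against the altered table. Catalog, schema, and object names are placeholders.

spark.sql("REFRESH TABLE my_catalog.my_schema.my_view")

# Fallback if the view definition still carries the old data type:
spark.sql("""
    CREATE OR REPLACE VIEW my_catalog.my_schema.my_view AS
    SELECT * FROM my_catalog.my_schema.my_table
""")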
suresh1122
by New Contributor III
  • 8107 Views
  • 11 replies
  • 7 kudos

DataFrame takes an unusually long time to save as a Delta table using SQL for a very small dataset with 30k rows. It takes around 2 hrs. Is there a solution for this problem?

I am trying to save a DataFrame, after a series of data manipulations using UDF functions, to a Delta table. I tried using this code: df.write.format('delta').mode('overwrite').option('overwriteSchema', 'true').saveAsTable('output_table'), but this...

Latest Reply
Lakshay
Esteemed Contributor
  • 7 kudos

You should also look into the SQL plan to check whether the writing phase is indeed the part that is taking the time. Since Spark uses lazy evaluation, some other phase might be the one taking the time.

10 More Replies
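A rough way to check that point: because of lazy evaluation, the two hours may actually be spent in the UDF-heavy transformations rather than in the write. Materialize the transformations first, then time the write on its own. Here df is assumed to be the transformed DataFrame from the post.

df.cache()
df.count()   # forces the transformations (including the UDFs) to execute

(df.write
   .format("delta")
   .mode("overwrite")
   .option("overwriteSchema", "true")
   .saveAsTable("output_table"))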
NDK
by New Contributor II
  • 1028 Views
  • 1 reply
  • 0 kudos

Soft stop a Streaming Job

I have an Auto Loader streaming job running continuously. I want to stop that job at the weekend for some time and then restart it.
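One hedged approach to this: instead of a continuously running query, schedule the job (for example hourly on weekdays only) and use an availableNow trigger, so each run drains the backlog and stops by itself; the weekend pause is then just the job schedule. Paths, formats, and table names below are placeholders.

(spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/Volumes/my_catalog/my_schema/schemas/landing")
    .load("/Volumes/my_catalog/my_schema/landing")
 .writeStream
    .option("checkpointLocation", "/Volumes/my_catalog/my_schema/checkpoints/landing")
    .trigger(availableNow=True)
    .toTable("my_catalog.my_schema.bronze_landing"))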

Vibhor
by Contributor
  • 2022 Views
  • 5 replies
  • 4 kudos

Resolved! Cluster Performance

Facing an issue with cluster performance; in the event log we can see "cluster is not responsive, likely due to GC". The number of pipelines (Databricks notebooks) running and the cluster configuration are the same as before, but we started seeing this issue since...

Latest Reply
jose_gonzalez
Moderator
  • 4 kudos

Hi @Vibhor Sethi, do you see any other error messages? Did your data volume increase? What kind of job are you running?

4 More Replies