Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ramravi
by Contributor II
  • 15833 Views
  • 1 replies
  • 0 kudos

Spark is case sensitive?

Spark is not case sensitive by default. If you have the same column name in different cases (Name, name) and try to select either "Name" or "name", you get a column-ambiguity error. There is a way to handle this issue b...

Latest Reply
source2sea
Contributor
  • 0 kudos

Hi, even though I set the conf to true, writing to disk threw exceptions complaining about duplicate columns. Below is the error message: org.apache.spark.sql.AnalysisException: Found duplicate column(s) in the data to save: branchavailablity...

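For reference, the setting discussed in this thread is spark.sql.caseSensitive. A minimal sketch of toggling it from a notebook (assuming a running PySpark session; the sample column names are illustrative):

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.getOrCreate()

  # Treat "Name" and "name" as distinct columns during analysis.
  spark.conf.set("spark.sql.caseSensitive", "true")

  df = spark.createDataFrame([(1, 2)], ["Name", "name"])
  df.select("Name").show()  # no ambiguity error once case sensitivity is on

Note that, as the latest reply shows, some sinks still reject duplicate column names at write time (for example, saving to Parquet), regardless of this setting.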
Martin1
by New Contributor II
  • 9444 Views
  • 2 replies
  • 1 kudos

Referring to Azure Keyvault secrets in spark config

Hi all, in the Spark config for a cluster, it works well to refer to an Azure Key Vault secret in the "value" part of the name/value combo on a config row/setting. For example, this works fine (I've removed the string that is our specific storage account name...

Latest Reply
kp12
New Contributor II
  • 1 kudos

Hello, is there any update on this issue, please? Databricks no longer recommends mounting external locations, so the other way to access Azure storage is to use the Spark config as mentioned in this document: https://learn.microsoft.com/en-us/azure/databri...

1 More Replies
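For reference, the pattern in question puts a secret reference in the value position of a cluster Spark config line. A minimal sketch (the storage account, scope, and key names are placeholders):

  fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net {{secrets/<scope>/<client-id-key>}}
  fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net {{secrets/<scope>/<client-secret-key>}}

The {{secrets/<scope>/<key>}} reference is resolved when the cluster starts; it works in the value, which is what this thread confirms.
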
saikrishna3390
by New Contributor II
  • 7178 Views
  • 2 replies
  • 2 kudos

How do I configure a managed identity for a Databricks cluster and access Azure storage using Spark config?

A partner wants to use an ADF managed identity to connect to my Databricks cluster, access my Azure storage, and copy the data from my Azure storage to their Azure storage.

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @SAI PUSALA, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking "Select As Best" if it does. Your feedback w...

1 More Replies
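One common shape for this (a sketch only, assuming the Hadoop ABFS driver's MSI token provider; every account name and ID below is a placeholder):

  fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net OAuth
  fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.MsiTokenProvider
  fs.azure.account.oauth2.msi.tenant.<storage-account>.dfs.core.windows.net <tenant-id>
  fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net <managed-identity-client-id>

The managed identity also needs an appropriate role (for example, Storage Blob Data Contributor) on the storage account for reads and writes to succeed.
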
KVNARK
by Honored Contributor II
  • 3485 Views
  • 4 replies
  • 6 kudos

Resolved! How to parameterize the key of a Spark config in the job cluster linked service from ADF

How can we parameterize the key of the Spark config in the job cluster linked service from Azure Data Factory? We can parameterize the values, but is there any way to parameterize the key, so that when deploying to further environments it takes the PROD/QA v...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 6 kudos

@KVNARK, you can use Databricks Secrets (create a secret scope from AKV: https://learn.microsoft.com/en-us/azure/databricks/security/secrets/secret-scopes) and then reference a secret in the Spark configuration (https://learn.microsoft.com/en-us/azure/d...

3 More Replies
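The {{secrets/...}} syntax only substitutes values, not keys, so one workaround (a sketch; the scope and key names are hypothetical, and dbutils is available in Databricks notebooks) is to build the environment-specific key at runtime:

  # The per-environment storage account name is itself stored as a secret,
  # so the same code deploys unchanged to QA and PROD.
  account = dbutils.secrets.get(scope="env-scope", key="storage-account-name")
  client_id = dbutils.secrets.get(scope="env-scope", key="sp-client-id")

  spark.conf.set(
      f"fs.azure.account.oauth2.client.id.{account}.dfs.core.windows.net",
      client_id,
  )

Because each environment's Key Vault holds its own values under the same secret names, no key needs parameterizing in the linked service at all.
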
debanjan89
by New Contributor II
  • 2822 Views
  • 3 replies
  • 2 kudos

How do we concatenate some fixed string with a secret value in Spark Config in Databricks Job Cluster?

Hi Team, I am trying to configure access to ADLS through a service principal via the Spark config in a Databricks job cluster, like: fs.azure.account.oauth2.client.id.<adls_account_name>.dfs.core.windows.net {{secrets/scopeName/clientID}}. The above stateme...

Latest Reply
Manimkm08
New Contributor III
  • 2 kudos

@Kaniz Fatma, we are blocked on this issue. Can you please look into the thread and suggest a workaround?

2 More Replies
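A {{secrets/...}} reference in the cluster Spark config is substituted whole, so it cannot be concatenated with a literal string inside the config value. One workaround (a sketch; the fixed prefix is illustrative, scope and key names come from the question) is to compose the value at runtime in a notebook or job task:

  # Read the secret, then build the full value before setting the config.
  client_id = dbutils.secrets.get(scope="scopeName", key="clientID")
  spark.conf.set(
      "fs.azure.account.oauth2.client.id.<adls_account_name>.dfs.core.windows.net",
      "fixed-prefix-" + client_id,  # hypothetical fixed string
  )

This works because fs.azure.* settings can generally be set at session level, unlike JVM-startup settings.
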
AJMorgan591
by New Contributor II
  • 3470 Views
  • 4 replies
  • 0 kudos

Temporarily disable Photon

Is it possible to temporarily disable Photon? I have a large workload that greatly benefits from Photon, apart from a specific operation therein that is actually slowed by Photon. It's not worth creating a separate cluster for this operation, however, s...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Aaron Morgan, hope all is well! Just wanted to check in on whether you were able to resolve your issue, and if so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thank...

3 More Replies
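The thread shows no accepted answer, but the flag usually cited for this is spark.databricks.photon.enabled. Treat the name and its session-level behavior as an assumption to verify against your DBR version's documentation:

  # Assumed flag name; whether it can be flipped mid-session is not guaranteed.
  spark.conf.set("spark.databricks.photon.enabled", "false")
  # ... run the operation that regresses under Photon ...
  spark.conf.set("spark.databricks.photon.enabled", "true")

If the runtime rejects session-level changes, the fallback is setting the flag in the cluster-level Spark config, which disables Photon for the whole cluster.
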
Cassio
by New Contributor II
  • 3520 Views
  • 4 replies
  • 3 kudos

Resolved! "SparkSecurityException: Cannot read sensitive key" error when reading key from Spark config

In Databricks 10.1 it is possible to define in the "Spark config" of the cluster something like: spark.fernet {{secrets/myscope/encryption-key}}. In my case my scopes are tied to Azure Key Vault. With that I can make a query as follows: %sql SELECT d...

Latest Reply
Soma
Valued Contributor
  • 3 kudos

This solution exposes the entire secret if I use commands like the following: sql("""explain select upper("${spark.fernet.email}") as data""").display(). Please don't use this.

3 More Replies
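The caveat in the latest reply is worth underlining: a secret interpolated into SQL text via ${...} substitution can be echoed back by commands like EXPLAIN. A safer sketch (scope and key names from the thread) keeps the value behind the secrets API, whose results are redacted in notebook output:

  # Fetched values are redacted if accidentally displayed in a notebook.
  key = dbutils.secrets.get(scope="myscope", key="encryption-key")

  # Pass the key to the encryption routine directly rather than splicing
  # it into a SQL string that the engine may print back verbatim.
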
chandan_a_v
by Valued Contributor
  • 15594 Views
  • 6 replies
  • 6 kudos

Resolved! Spark Driver Out of Memory Issue

Hi, I am executing a simple job in Databricks for which I am getting the below error. I increased the driver size but still faced the same issue. Spark config: from pyspark.sql import SparkSession; spark_session = SparkSession.builder.appName("Demand Forecasting...

Latest Reply
chandan_a_v
Valued Contributor
  • 6 kudos

I am getting the above issue while writing a Spark DF as a parquet file to AWS S3. Not doing any broadcast join actually.

5 More Replies
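Since the writer reports the OOM during a plain Parquet write to S3 with no broadcast join, two commonly checked levers (a sketch; values and the output path are placeholders, and on Databricks the driver's heap itself is fixed by the chosen driver node type, not settable at runtime):

  # Rule out oversized automatic broadcasts contributing to driver pressure.
  spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

  # df stands in for the DataFrame from the thread; write directly from
  # executors instead of pulling data through the driver.
  df = spark.range(10_000_000).toDF("id")
  df.write.mode("overwrite").parquet("s3://<bucket>/<path>")

Avoiding driver-side operations such as collect() or toPandas() on large DataFrames is usually the bigger win.
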
baatchus
by New Contributor III
  • 4167 Views
  • 4 replies
  • 0 kudos

Resolved! Parameterize the Azure storage account name in the Spark cluster config in Databricks

Wondering if there is a way to parameterize the Azure storage account name part of the Spark cluster config in Databricks? I have a working example where the values reference secret scopes: spark.hadoop.fs.azure.account.oauth2.client.id.<azurestorageacc...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Fantastic! Thanks for letting us know!

3 More Replies
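As in the ADF thread above, only the value side of a cluster config line can reference secrets. A session-level sketch for several accounts (account names and scope are hypothetical; note the question's spark.hadoop. prefix applies to cluster-level config lines, while session-level spark.conf.set uses the bare fs.azure. key):

  accounts = ["devstorageacct", "prodstorageacct"]  # hypothetical names
  for acct in accounts:
      spark.conf.set(
          f"fs.azure.account.oauth2.client.id.{acct}.dfs.core.windows.net",
          dbutils.secrets.get(scope="env-scope", key="sp-client-id"),
      )
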
sarvesh
by Contributor III
  • 4180 Views
  • 3 replies
  • 4 kudos

Resolved! Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot modify the value of a Spark config: spark.executor.memory;

I am trying to read a 16 MB Excel file and was getting a GC overhead limit exceeded error. To resolve it, I tried to increase my executor memory with spark.conf.set("spark.executor.memory", "8g"), but I got the following stack: Using Spark's default l...

Latest Reply
Prabakar
Databricks Employee
  • 4 kudos

On the cluster configuration page, go to Advanced Options and click to expand the section. There you will find the Spark tab, where you can set the values in the "Spark config" field.

2 More Replies
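The underlying reason: spark.executor.memory sizes the executor JVM at launch, so it cannot be modified from a running session with spark.conf.set; that is exactly what the AnalysisException says. Following the accepted answer, it goes in the cluster's Spark config field (Advanced Options > Spark), one key-value pair per line:

  spark.executor.memory 8g

The cluster restarts with the new executor heap size, after which the Excel read can be retried.
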
sachinmkp1
by New Contributor II
  • 45737 Views
  • 2 replies
  • 1 kudos

Resolved! org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 69 tasks (4.0 GB) is bigger than spark.driver.maxResultSize (4.0 GB)

I set spark.conf.set("spark.driver.maxResultSize", "20g"), and spark.conf.get("spark.driver.maxResultSize") returns 20g as expected in the notebook. I did not set it at the cluster level, yet I still hit the 4g limit while executing the Spark job. Why? Because of th...

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Hi @sachinmkp1@gmail.com, you need to add this Spark configuration at the cluster level, not at the notebook level. When you add it at the cluster level, it will apply the setting properly. For more details on this issue, please check our knowledge...

1 More Replies
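Like spark.executor.memory above, spark.driver.maxResultSize is read when the driver starts, so a notebook-level spark.conf.set appears to succeed but does not take effect. Per the accepted answer, the line belongs in the cluster's Spark config:

  spark.driver.maxResultSize 20g

Raising the limit is a stopgap; if serialized results routinely approach it, reworking the job to collect less data to the driver is the more durable fix.
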
User16790091296
by Contributor II
  • 8397 Views
  • 1 replies
  • 0 kudos

Azure Databricks: How to add Spark configuration in Databricks cluster?

I am using a Databricks Spark cluster and want to add a customized Spark configuration. There is Databricks documentation on this, but I am not getting any clue about what changes I should make and how. Can someone please share an example to configure the Da...

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

You can set the configurations in the Databricks cluster UI: https://docs.databricks.com/clusters/configure.html#spark-configuration. To see the default configuration, run the below code in a notebook: %sql set;

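To make the answer concrete: the Spark config field on the cluster page takes one space-separated key-value pair per line. An illustrative sketch (these particular keys and values are examples, not recommendations):

  spark.sql.shuffle.partitions 200
  spark.serializer org.apache.spark.serializer.KryoSerializer

Changes apply on cluster (re)start, and %sql set; in a notebook then shows the effective configuration.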
Anonymous
by Not applicable
  • 1477 Views
  • 2 replies
  • 0 kudos

Changing default Delta behavior in DBR 8.x for writes

Is there any way to add a Spark config that reverts the default behavior for table writes from Delta to Parquet in DBR 8.0+? I know you can simply specify .format("parquet"), but that could involve a decent amount of code change for some client...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Thanks @Ryan Chynoweth!

1 More Replies
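The setting usually pointed to for this (an assumption to verify against the DBR 8 release notes, since the accepted reply is not shown here) is spark.sql.sources.default, which DBR 8 switched from parquet to delta. A sketch of reverting it at session level:

  spark.conf.set("spark.sql.sources.default", "parquet")

or as a cluster-level Spark config line:

  spark.sql.sources.default parquet

Unqualified writes such as df.write.saveAsTable("t") then default to Parquet again without code changes.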