Data Engineering
Forum Posts

User16618471166
by New Contributor II
  • 3127 Views
  • 3 replies
  • 1 kudos

When I aggregate over more data, I get the below error message. I've tried multiple ways of diagnosis like going back to a version I know it was working fine (but still got the same error below). Please advise as this is a critical report where the b...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Jeff Wu​ :The error message suggests that there is a syntax error in a SQL statement, specifically near the end of the input. Without the full SQL statement or additional information, it's difficult to pinpoint the exact cause of the error. However,...

2 More Replies
nicole_wong
by New Contributor II
  • 5758 Views
  • 13 replies
  • 7 kudos

Resolved! Can Terraform be used to set configurations in Admin / workspace settings?

I am posting this on behalf of my customer. They are currently working on the deployment & config of their workspace on AWS via Terraform. Is it possible to set some configs in the Admin/workspace settings via TF? According to the Terraform module, it...

Latest Reply
francly
New Contributor II
  • 7 kudos

Hi, can I get a full list of the latest supported configurable workspace_conf settings in TF? I can't find the list on the TF registry site.

12 More Replies
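For the Terraform question above, a minimal sketch of the `databricks_workspace_conf` resource from the Databricks Terraform provider. The key shown (`enableIpAccessLists`) is one documented example; the thread does not enumerate the full set of accepted keys, and other keys are passed through to the workspace conf API as-is.

```hcl
# Sketch: set workspace-level Admin settings via the Databricks Terraform
# provider. custom_config takes arbitrary key/value pairs; only some keys
# are documented, so verify each key against your workspace.
resource "databricks_workspace_conf" "this" {
  custom_config = {
    "enableIpAccessLists" = true
  }
}
```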
johnb1
by New Contributor III
  • 1298 Views
  • 3 replies
  • 0 kudos

Cluster Configuration for ML Model Training

Hi! I am training a Random Forest (pyspark.ml.classification.RandomForestClassifier) on Databricks with 1,000,000 training examples and 25 features. I employ a cluster with one driver (16 GB Memory, 4 Cores), 2-6 workers (32-96 GB Memory, 8-24 Cores),...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @John B​ Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we can...

2 More Replies
Arunsundar
by New Contributor III
  • 1413 Views
  • 4 replies
  • 3 kudos

Automating the initial configuration of dbx

Hi Team, good morning. As of now, for the deployment of our code to Databricks, dbx is configured by providing parameters such as cloud provider, git provider, etc. Say I have a code repository in any one of the git providers. Can this process of co...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Arunsundar Muthumanickam​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear fr...

3 More Replies
oteng
by New Contributor III
  • 1058 Views
  • 2 replies
  • 1 kudos

SET configuration in SQL DLT pipeline not working

I'm not able to get the SET command to work when using SQL in a DLT pipeline. I am copying the code from this documentation https://docs.databricks.com/workflows/delta-live-tables/delta-live-tables-sql-ref.html#sql-spec (relevant code below). When I ru...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Oliver Teng​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks...

1 More Replies
Sandy21
by New Contributor III
  • 846 Views
  • 1 reply
  • 2 kudos

Resolved! Cluster Configuration Best Practices

I have a cluster with the configuration of 400 GB RAM, 160 cores. Which of the following would be the ideal configuration to use in case of one or more VM failures? Cluster A: Total RAM 400 GB, Total Cores 160, Total VMs: 1, 400 GB/Exec & 160 c...

Latest Reply
karthik_p
Esteemed Contributor
  • 2 kudos

@Santhosh Raj​ Can you please confirm whether the cluster sizes you list refer to the driver or the worker nodes, and how much you want to allocate to each? Once we are sure about the type of driver and worker we would like to pick, we need to enable au...

yopbibo
by Contributor II
  • 1104 Views
  • 2 replies
  • 0 kudos

Resolved! Cluster configuration / notebook panel

Hi, is it possible to let regular users see all running notebooks (in the notebook panel of the cluster) on a specific cluster they can use (attach and restart)? By default, admins can see all running notebooks and users can see only their own notebo...

Latest Reply
Prabakar
Esteemed Contributor III
  • 0 kudos

Hi @Philippe CRAVE​, a user can see a notebook only if they have permission on that notebook; otherwise they won't be able to see it. Unfortunately, there is no way for a normal user to see the notebooks attached to a cluster if they do not have per...

1 More Replies
EricOX
by New Contributor
  • 2755 Views
  • 3 replies
  • 3 kudos

Resolved! How to handle configuration for different environment (e.g. DEV, PROD)?

Is there a suggested way to handle different environment variables for the same code base? For example, the mount point of Data Lake for DEV, UAT, and PROD. Any recommendations or best practices? Moreover, how to handle Azure DevOps?

Latest Reply
Kaniz
Community Manager
  • 3 kudos

Hi @Eric Yeung​, just a friendly follow-up. Do you still need help, or did the above responses help you find the solution? Please let us know.

2 More Replies
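For the DEV/UAT/PROD question above, one common pattern is to key environment-specific values (such as mount points) in a single mapping and resolve them once at startup. This is a sketch, not the thread's accepted answer; all names and paths below are hypothetical.

```python
# Sketch: resolve environment-specific settings from one mapping.
# In practice the environment name would come from a cluster environment
# variable, a job parameter, or a Databricks widget; here it is passed in.
CONFIG = {
    "dev":  {"mount_point": "/mnt/datalake-dev",  "storage_account": "mydatalakedev"},
    "uat":  {"mount_point": "/mnt/datalake-uat",  "storage_account": "mydatalakeuat"},
    "prod": {"mount_point": "/mnt/datalake-prod", "storage_account": "mydatalakeprod"},
}

def get_config(env: str) -> dict:
    """Return the settings for one environment, failing fast on typos."""
    try:
        return CONFIG[env]
    except KeyError:
        raise ValueError(f"Unknown environment {env!r}; expected one of {sorted(CONFIG)}")

print(get_config("dev")["mount_point"])  # -> /mnt/datalake-dev
```

The same mapping can be stored outside the code (e.g. a JSON file per environment in the repo) so that promoting code between environments via Azure DevOps requires no code changes.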
Vee
by New Contributor
  • 3205 Views
  • 2 replies
  • 1 kudos

Cluster configuration and optimal number for fs.s3a.connection.maximum , fs.s3a.threads.max

Please could you suggest the best cluster configuration for the use case stated below, and tips to resolve the errors shown below. Use case: There could be 4 or 5 Spark jobs that run concurrently. Each job reads 40 input files and spits out 120 output files ...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @Vetrivel Senthil​ , Just a friendly follow-up. Do you still need help? Please let us know.

1 More Replies
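The two S3A settings named in the title above are normally set in the cluster's Spark config with the `spark.hadoop.` prefix. The values below are illustrative starting points only, not recommendations from the thread; optimal numbers depend on concurrency and file counts.

```
spark.hadoop.fs.s3a.connection.maximum 200
spark.hadoop.fs.s3a.threads.max 100
```

As a rule of thumb, `fs.s3a.connection.maximum` should be at least as large as `fs.s3a.threads.max`, since each S3A upload/read thread needs a connection from the pool.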
Anonymous
by Not applicable
  • 4581 Views
  • 3 replies
  • 5 kudos

Cluster does not have proper permissions to view DBFS mount point to Azure ADLS Gen 2.

I've created other mount points and am now trying to use the OAUTH method. I'm able to define the mount point using the OAUTH Mount to ADLS Gen 2 Storage. I've created an App Registration with Secret, added the App Registration as Contributor to the ...

Latest Reply
Gerbastanovic
New Contributor II
  • 5 kudos

Also check whether you have set the right permissions for the app on the container's ACL: https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-access-control

2 More Replies
TJS
by New Contributor II
  • 12910 Views
  • 6 replies
  • 5 kudos

Resolved! Can you help with this error please? Issue when using a new high concurrency cluster

Hello, I am trying to use MLflow on a new high concurrency cluster but I get the error below. Does anyone have any suggestions? It was working before on a standard cluster. Thanks. py4j.security.Py4JSecurityException: Method public int org.apache.spar...

Latest Reply
User16753724828
New Contributor III
  • 5 kudos

@Tom Soto​ We have a workaround for this. Setting this cluster Spark configuration will disable Py4J security while still enabling passthrough: spark.databricks.pyspark.enablePy4JSecurity false

5 More Replies
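The workaround in the reply above goes in the cluster's Spark config (cluster edit page, Advanced options). A minimal fragment, exactly as the reply states it; note that disabling Py4J security relaxes an isolation control that high concurrency clusters rely on, so weigh this against your security requirements.

```
spark.databricks.pyspark.enablePy4JSecurity false
```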
trm
by New Contributor II
  • 1070 Views
  • 2 replies
  • 2 kudos

Resolved! mail configuration azure data bricks pyspark notebook

Hi all, I am new to Azure Databricks and I am using PySpark. We need to configure mail alerts for when a notebook fails or succeeds. Please can someone help me with mail configuration in Azure Databricks? Thanks.

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

The easiest way to schedule notebooks in Azure is to use Data Factory. In Data Factory you can schedule the notebooks and define the alerts you want to send. The other option is the one Hubert mentioned.

1 More Replies
DouglasLinder
by New Contributor III
  • 4110 Views
  • 5 replies
  • 1 kudos

Is it possible to pass configuration to a job on high concurrency cluster?

On a regular cluster, you can use: ```spark.sparkContext._jsc.hadoopConfiguration().set(key, value)``` These values are then available on the executors via the Hadoop configuration. However, on a high concurrency cluster, attempting to do so results ...

Latest Reply
Ryan_Chynoweth
Honored Contributor III
  • 1 kudos

I am not sure why you are getting that error on a high concurrency cluster, as I am able to set the configuration as you show above. Can you try the following code instead? sc._jsc.hadoopConfiguration().set(key, value)

4 More Replies
cfregly
by Contributor
  • 5555 Views
  • 5 replies
  • 0 kudos
Latest Reply
MatthewValenti
New Contributor II
  • 0 kudos

This is an old post; however, is this still accurate for the latest version of Databricks in 2019? If so, how to approach the following? 1. Connect to many MongoDBs. 2. Connect to MongoDB when connection string information is dynamic (i.e. stored in s...

4 More Replies