Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

User16618471166
by New Contributor II
  • 4458 Views
  • 3 replies
  • 1 kudos

When I aggregate over more data, I get the error message below. I've tried several ways to diagnose it, such as going back to a version I know was working fine (but I still got the same error). Please advise, as this is a critical report where the b...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Jeff Wu: The error message suggests that there is a syntax error in a SQL statement, specifically near the end of the input. Without the full SQL statement or additional information, it's difficult to pinpoint the exact cause of the error. However,...
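
An error "near the end of the input" can mean the parser ran out of text while still expecting something, such as a closing parenthesis or the end of a string literal, so a quick first check is whether those are balanced. The helper below is a hypothetical sketch (`find_unbalanced_sql` is not a Databricks or Spark API). It only tracks parentheses and single-quoted strings and ignores comments, double quotes, and backtick identifiers.

```python
def find_unbalanced_sql(sql: str) -> list[str]:
    """Report unclosed parentheses or string literals in a SQL statement.

    Hypothetical diagnostic helper: handles only ( ) and single-quoted strings;
    comments, double quotes, and backtick identifiers are not understood.
    """
    problems: list[str] = []
    depth = 0           # current parenthesis nesting level
    in_string = False   # True while inside a '...' literal
    for ch in sql:
        if ch == "'":
            # An escaped quote ('') toggles twice, so it leaves the state unchanged.
            in_string = not in_string
        elif not in_string:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:
                    problems.append("closing parenthesis without a matching opening one")
                    depth = 0
    if in_string:
        problems.append("unterminated string literal")
    if depth > 0:
        problems.append(f"{depth} unclosed parenthesis(es)")
    return problems


print(find_unbalanced_sql("SELECT region, sum(amount FROM sales GROUP BY region"))
```

An empty result does not prove the statement is valid; it only rules out these two common causes.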

2 More Replies
nicole_wong
by New Contributor II
  • 11065 Views
  • 10 replies
  • 7 kudos

Resolved! Can Terraform be used to set configurations in Admin / workspace settings?

I am posting this on behalf of my customer. They are currently working on the deployment and configuration of their workspace on AWS via Terraform. Is it possible to set some configs in the Admin/workspace settings via TF? According to the Terraform module, it...

Latest Reply
francly
New Contributor II
  • 7 kudos

Hi, is there a full list of the latest workspace_conf settings supported by the Terraform provider? I can't find the list on the Terraform Registry site.

9 More Replies
johnb1
by Contributor
  • 2459 Views
  • 3 replies
  • 0 kudos

Cluster Configuration for ML Model Training

Hi! I am training a Random Forest (pyspark.ml.classification.RandomForestClassifier) on Databricks with 1,000,000 training examples and 25 features. I employ a cluster with one driver (16 GB memory, 4 cores), 2-6 workers (32-96 GB memory, 8-24 cores),...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @John B, hope everything is going great. Just wanted to check in: were you able to resolve your issue? If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we can...

2 More Replies
Arunsundar
by New Contributor III
  • 2546 Views
  • 4 replies
  • 3 kudos

Automating the initial configuration of dbx

Hi Team, good morning. Currently, to deploy our code to Databricks, dbx is configured by providing parameters such as the cloud provider, git provider, etc. Say I have a code repository in one of the git providers. Can this process of co...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Arunsundar Muthumanickam, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear fr...

3 More Replies
oteng
by New Contributor III
  • 1963 Views
  • 1 reply
  • 0 kudos

SET configuration in SQL DLT pipeline not working

I'm not able to get the SET command to work when using SQL in a DLT pipeline. I am copying the code from this documentation: https://docs.databricks.com/workflows/delta-live-tables/delta-live-tables-sql-ref.html#sql-spec (relevant code below). When I ru...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Oliver Teng, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks...

Sandy21
by New Contributor III
  • 2084 Views
  • 1 reply
  • 2 kudos

Resolved! Cluster Configuration Best Practices

I have a cluster with 400 GB RAM and 160 cores. Which of the following would be the ideal configuration in case of one or more VM failures? Cluster A: Total RAM 400 GB, Total Cores 160, Total VMs: 1, 400 GB/Exec & 160 c...

Latest Reply
karthik_p
Esteemed Contributor
  • 2 kudos

@Santhosh Raj, can you please confirm whether the cluster sizes you mention refer to the driver or the worker nodes? How much do you want to allocate to the driver and to the workers? Once we are sure which driver and worker types to pick, we need to enable au...
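
To make the VM-failure trade-off in this question concrete, here is a small arithmetic sketch. It assumes the 400 GB / 160 cores are split evenly across identical VMs and that a failed VM takes its whole share of memory and cores with it; driver sizing and autoscaling, which the reply raises, are ignored. Only the one-VM layout (Cluster A) comes from the post; the other VM counts and the `ClusterLayout` class are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterLayout:
    """Total resources split evenly across identical VMs (simplifying assumption)."""
    total_ram_gb: float
    total_cores: int
    num_vms: int

    def per_vm(self) -> tuple[float, float]:
        """RAM (GB) and cores on each VM."""
        return self.total_ram_gb / self.num_vms, self.total_cores / self.num_vms

    def surviving_fraction(self, failed_vms: int) -> float:
        """Share of total capacity still available after `failed_vms` VMs are lost."""
        if not 0 <= failed_vms <= self.num_vms:
            raise ValueError("failed_vms must be between 0 and num_vms")
        return (self.num_vms - failed_vms) / self.num_vms


for vms in (1, 2, 4, 8):  # 1 VM = Cluster A; the other counts are hypothetical
    layout = ClusterLayout(total_ram_gb=400, total_cores=160, num_vms=vms)
    ram, cores = layout.per_vm()
    print(f"{vms} VM(s): {ram:g} GB / {cores:g} cores each, "
          f"{layout.surviving_fraction(1):.0%} of capacity left after one failure")
```

Under these assumptions, a single VM leaves nothing after one failure, while spreading the same totals over more VMs loses proportionally less. In practice Spark also reschedules the failed tasks, so the real impact depends on the workload.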

yopbibo
by Contributor II
  • 2065 Views
  • 2 replies
  • 0 kudos

Resolved! Cluster configuration / notebook panel

Hi, is it possible to let regular users see all running notebooks (in the notebook panel of the cluster) on a specific cluster they can use (attach and restart)? By default, admins can see all running notebooks and users can see only their own notebo...

Latest Reply
Prabakar
Databricks Employee
  • 0 kudos

Hi @Philippe CRAVE, a user can see a notebook only if they have permission on that notebook; otherwise they won't be able to see it. Unfortunately, there is no way for a regular user to see the notebooks attached to a cluster if they do not have per...

1 More Replies
Vee
by New Contributor
  • 5055 Views
  • 1 reply
  • 1 kudos

Cluster configuration and optimal values for fs.s3a.connection.maximum and fs.s3a.threads.max

Could you please suggest the best cluster configuration for the use case below, and tips to resolve the errors shown below? Use case: there could be 4 or 5 Spark jobs that run concurrently. Each job reads 40 input files and writes 120 output files ...

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Hi @Vetrivel Senthil, just wondering: is this question a duplicate of this one? https://community.databricks.com/s/feed/0D53f00001qvQJcCAM
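
For reference, the two Hadoop S3A properties named in this question can be set at cluster level by prefixing them with `spark.hadoop.`, which Spark copies into the Hadoop configuration. The sketch below only renders them in the one-`key value`-per-line form used by the cluster Spark config box. The helper name and the numbers are placeholders, not tuned recommendations for 4 or 5 concurrent jobs, and the check that the connection pool is at least as large as the thread pool is an assumption, made so that upload threads are not left waiting for a free connection.

```python
def render_s3a_spark_conf(connection_maximum: int, threads_max: int) -> str:
    """Render S3A pool sizes as 'key value' lines for a cluster's Spark config.

    Hypothetical helper; choosing the actual values is up to you.
    """
    if connection_maximum < threads_max:
        # Assumption: more worker threads than pooled connections means threads
        # can block waiting for a free connection.
        raise ValueError("connection_maximum should be >= threads_max")
    settings = {
        # The spark.hadoop. prefix makes Spark pass the property on to Hadoop.
        "spark.hadoop.fs.s3a.connection.maximum": connection_maximum,
        "spark.hadoop.fs.s3a.threads.max": threads_max,
    }
    return "\n".join(f"{key} {value}" for key, value in settings.items())


# Placeholder values only.
print(render_s3a_spark_conf(connection_maximum=200, threads_max=100))
```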

Anonymous
by Not applicable
  • 6496 Views
  • 2 replies
  • 4 kudos

Cluster does not have proper permissions to view DBFS mount point to Azure ADLS Gen 2.

I've created other mount points and am now trying to use the OAuth method. I'm able to define the mount point using the OAuth mount to ADLS Gen 2 storage. I've created an App Registration with a secret, added the App Registration as Contributor to the ...

Latest Reply
Gerbastanovic
New Contributor II
  • 4 kudos

Also check that you have set the right permissions for the app in the container's ACL: https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-access-control

1 More Replies
TJS
by New Contributor II
  • 16303 Views
  • 6 replies
  • 5 kudos

Resolved! Can you help with this error please? Issue when using a new high concurrency cluster

Hello, I am trying to use MLflow on a new high concurrency cluster, but I get the error below. Does anyone have any suggestions? It was working before on a standard cluster. Thanks. py4j.security.Py4JSecurityException: Method public int org.apache.spar...

Latest Reply
Pradeep54
Databricks Employee
  • 5 kudos

@Tom Soto, we have a workaround for this. The following cluster Spark configuration setting disables Py4J security while still keeping passthrough enabled: `spark.databricks.pyspark.enablePy4JSecurity false`

5 More Replies
adb-rm
by New Contributor II
  • 2044 Views
  • 2 replies
  • 2 kudos

Resolved! Mail configuration for an Azure Databricks PySpark notebook

Hi all, I am new to Azure Databricks and I am using PySpark. We need to configure mail alerts for when a notebook fails or succeeds. Can someone please help me with mail configuration in Azure Databricks? Thanks.

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

The easiest way to schedule notebooks in Azure is to use Data Factory. In Data Factory, you can schedule the notebooks and define the alerts you want to send. The other option is the one Hubert mentioned.

1 More Replies
EricOX
by New Contributor
  • 4581 Views
  • 1 reply
  • 3 kudos

Resolved! How to handle configuration for different environment (e.g. DEV, PROD)?

Is there a suggested way to handle different environment variables for the same code base? For example, the Data Lake mount point differs between DEV, UAT, and PROD. Any recommendations or best practices? Also, how should this be handled with Azure DevOps?

Latest Reply
-werners-
Esteemed Contributor III
  • 3 kudos

@Eric Yeung, you can put all your configuration parameters in a file (JSON, CONF, YAML, whatever you like) and read that file at the beginning of each program. I like to use ConfigFactory in Scala, for example. You only have to make sure the file c...
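
The reply's approach (one configuration file per environment, read at program start) can be sketched in Python with only the standard library, as a rough analogue of ConfigFactory in Scala. The file layout (`dev.json`, `uat.json`, `prod.json`), the `APP_ENV` variable, and the `mount_point` key are all hypothetical. With this layout, a deployment stage (for example in Azure DevOps) would only need to set the environment name, which is an assumption rather than something the thread describes.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional

KNOWN_ENVS = {"dev", "uat", "prod"}  # hypothetical environment names


def load_config(config_dir: Path, env: Optional[str] = None) -> dict:
    """Load <config_dir>/<env>.json; env defaults to the APP_ENV variable, then 'dev'."""
    env = (env or os.environ.get("APP_ENV", "dev")).lower()
    if env not in KNOWN_ENVS:
        raise ValueError(f"unknown environment: {env}")
    with open(config_dir / f"{env}.json", encoding="utf-8") as fh:
        return json.load(fh)


if __name__ == "__main__":
    # Demo: write one small file per environment into a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        config_dir = Path(tmp)
        for name in sorted(KNOWN_ENVS):
            settings = {"mount_point": f"/mnt/datalake-{name}"}  # hypothetical mount points
            (config_dir / f"{name}.json").write_text(json.dumps(settings), encoding="utf-8")
        cfg = load_config(config_dir, "PROD")
        print(cfg["mount_point"])  # prints /mnt/datalake-prod
```

The same code base then runs unchanged in every environment; only the selected file differs.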

DouglasLinder
by New Contributor III
  • 10130 Views
  • 4 replies
  • 1 kudos

Is it possible to pass configuration to a job on high concurrency cluster?

On a regular cluster, you can use:

```python
spark.sparkContext._jsc.hadoopConfiguration().set(key, value)
```

These values are then available on the executors through the Hadoop configuration. However, on a high concurrency cluster, attempting to do so results ...

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 1 kudos

I am not sure why you are getting that error on a high concurrency cluster, as I am able to set the configuration as you show above. Can you try the following code instead? `sc._jsc.hadoopConfiguration().set(key, value)`

3 More Replies
cfregly
by Contributor
  • 7714 Views
  • 5 replies
  • 0 kudos
Latest Reply
MatthewValenti
New Contributor II
  • 0 kudos

This is an old post; however, is this still accurate for the latest version of Databricks in 2019? If so, how should I approach the following?
1. Connect to many MongoDBs.
2. Connect to MongoDB when the connection string information is dynamic (i.e. stored in s...
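
For the second point, one common building block is assembling the connection string at runtime instead of hard-coding it. A minimal sketch, assuming standard `mongodb://` URIs: the username and password are percent-encoded with `urllib.parse.quote_plus`, which the MongoDB URI format requires for characters such as `@` and `:`. The source names, hosts, and credentials below are hypothetical, and where they are actually stored is left open, as in the question.

```python
from urllib.parse import quote_plus


def mongo_uri(host: str, port: int, user: str, password: str, auth_db: str) -> str:
    """Build a mongodb:// URI, percent-encoding the username and password."""
    # The path component names the default authentication database.
    return f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{auth_db}"


# Hypothetical connection details for several MongoDB instances, e.g. read at runtime.
sources = {
    "orders": {"host": "orders-db.example.com", "port": 27017,
               "user": "reader", "password": "p@ss:word", "auth_db": "admin"},
    "customers": {"host": "customers-db.example.com", "port": 27017,
                  "user": "reader", "password": "s3cret", "auth_db": "admin"},
}

# One URI per source; avoid printing or logging these, since they embed passwords.
uris = {name: mongo_uri(**details) for name, details in sources.items()}
```

Each URI can then be handed to whichever MongoDB client or Spark connector option you use.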

4 More Replies