Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Mohit_m
by Valued Contributor II
  • 4416 Views
  • 1 replies
  • 2 kudos

Resolved! Databricks jobs create API throws unexpected error

The Databricks Jobs create API throws an unexpected error.
Error response: {"error_code": "INVALID_PARAMETER_VALUE", "message": "Cluster validation error: Missing required field: settings.cluster_spec.new_cluster.size"}
Any idea on this?

Latest Reply
Mohit_m
Valued Contributor II
  • 2 kudos

Could you please specify num_workers in the JSON body and try the API again? Another recommendation is to configure what you want in the UI and then press the “JSON” button, which shows the corresponding JSON that you can use for the API.
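As a rough illustration of the suggestion above, the sketch below builds a Jobs create request body whose `new_cluster` carries an explicit `num_workers`. The job name, notebook path, `spark_version`, and `node_type_id` values are placeholders assumed for illustration, and the payload follows the Jobs API 2.1 `tasks` layout, which may differ from the API version the original poster was calling.

```python
import json


def build_job_payload(name, notebook_path, num_workers):
    """Build a Jobs create request body with an explicit cluster size."""
    new_cluster = {
        # Placeholder values: use ones that exist in your workspace.
        "spark_version": "11.3.x-scala2.12",
        "node_type_id": "i3.xlarge",
        # The cluster size that the validation error reports as missing.
        "num_workers": num_workers,
    }
    return {
        "name": name,
        "tasks": [
            {
                "task_key": "main",
                "notebook_task": {"notebook_path": notebook_path},
                "new_cluster": new_cluster,
            }
        ],
    }


def has_cluster_size(new_cluster):
    """A new_cluster needs either a fixed size or an autoscale range."""
    return "num_workers" in new_cluster or "autoscale" in new_cluster


payload = build_job_payload("example-job", "/Users/someone@example.com/etl", 2)
assert all(has_cluster_size(task["new_cluster"]) for task in payload["tasks"])
body = json.dumps(payload, indent=2)  # this string would be the POST body
```

Checking the payload locally before sending it can catch the missing size field without a round trip to the API.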

  • 2 kudos
lav
by New Contributor III
  • 1006 Views
  • 1 replies
  • 1 kudos

Correlated Column Exception in Spark SQL

Hi Johan, Were you able to resolve the correlated column exception issue? I have been stuck on this for the past week. If you can guide me, that would be a lot of help. Thanks.

Latest Reply
Johan_Van_Noten
New Contributor III
  • 1 kudos

Seems to be a duplicate of your comment on https://community.databricks.com/s/question/0D53f00001XCuCACA1/correlated-column-exception-in-sql-udf-when-using-udf-parameters. I guess you did that to be able to put other tags?

  • 1 kudos
darshan
by New Contributor III
  • 17062 Views
  • 13 replies
  • 12 kudos

Resolved! Is there a way to run notebooks concurrently in same session?

I tried using dbutils.notebook.run(notebook.path, notebook.timeout, notebook.parameters), but it takes 20 seconds to start a new session. %run uses the same session, but I cannot figure out how to use it to run notebooks concurrently.
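One commonly used pattern for this, sketched here under assumptions rather than taken from the thread, is to submit several `dbutils.notebook.run` calls from a thread pool. The helper name, the notebook paths, and the stand-in runner are hypothetical. Each call still starts its own notebook run, so this overlaps the startup time rather than removing it.

```python
from concurrent.futures import ThreadPoolExecutor


def run_notebooks_concurrently(run_fn, notebooks, max_workers=4):
    """Run several notebooks in parallel threads and collect their results.

    run_fn is expected to have the shape of dbutils.notebook.run:
    run_fn(path, timeout_seconds, arguments) -> str
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            nb["path"]: pool.submit(run_fn, nb["path"], nb["timeout"], nb["params"])
            for nb in notebooks
        }
        # result() re-raises any exception raised by a notebook run.
        return {path: future.result() for path, future in futures.items()}


def fake_run(path, timeout_seconds, arguments):
    """Stand-in for dbutils.notebook.run so the sketch runs outside Databricks."""
    return f"done:{path}"


notebooks = [
    {"path": "/jobs/a", "timeout": 600, "params": {"day": "2022-01-01"}},
    {"path": "/jobs/b", "timeout": 600, "params": {"day": "2022-01-02"}},
]
results = run_notebooks_concurrently(fake_run, notebooks)
```

Inside a Databricks notebook, `dbutils.notebook.run` would be passed in place of `fake_run`.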

Latest Reply
rudesingh56
New Contributor II
  • 12 kudos

I’ve been struggling with opening multiple browser sessions to open more than one notebook at a time.

  • 12 kudos
12 More Replies
TheOptimizer
by Contributor
  • 9349 Views
  • 5 replies
  • 8 kudos

Resolved! How to create delta table with identity column.

I'm sure this is probably some oversight on my part, but I don't see it. I'm trying to create a delta table with an identity column. I've tried every combination of the syntax I can think of. %sql create or replace table IDS.picklist ( picklist_id...

Latest Reply
lucas_marchand
New Contributor III
  • 8 kudos

I was also having this same error and my cluster was running Databricks Runtime Version 9.1 so I changed it to 11.0 and it worked.
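For reference, a minimal sketch of the identity-column syntax, generated from Python so it can be passed to `spark.sql`. The table and `picklist_id` names come from the question; `picklist_name` is a hypothetical column added for illustration, since the original statement is truncated. Identity columns are only supported on newer runtimes (as far as I recall, the documentation lists Databricks Runtime 10.4 LTS and above), which would be consistent with the runtime change described in the reply.

```python
def identity_table_ddl(table, id_column, other_columns):
    """Return a CREATE TABLE statement with a Delta identity column."""
    # Identity columns must be BIGINT.
    columns = [f"{id_column} BIGINT GENERATED ALWAYS AS IDENTITY"]
    columns += [f"{name} {dtype}" for name, dtype in other_columns]
    column_sql = ",\n  ".join(columns)
    return f"CREATE OR REPLACE TABLE {table} (\n  {column_sql}\n) USING DELTA"


# picklist_name is a hypothetical column; the original post is truncated.
ddl = identity_table_ddl("IDS.picklist", "picklist_id", [("picklist_name", "STRING")])
print(ddl)
# In a notebook, the statement would be executed with: spark.sql(ddl)
```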

  • 8 kudos
4 More Replies
abd
by Contributor
  • 1302 Views
  • 0 replies
  • 0 kudos

Why use Databricks over other tools?

What is special about Databricks? What does Databricks provide that no other tool on the market provides? How can I convince someone else to use Databricks rather than another tool?

spartakos
by New Contributor
  • 768 Views
  • 0 replies
  • 0 kudos

Big data ingest into Delta Lake

I have a feature table in BQ that I want to ingest into Delta Lake. This feature table in BQ has 100TB of data. This table can be partitioned by DATE. What best practices and approaches can I take to ingest this 100TB? In particular, what can I do to ...

merca
by Valued Contributor II
  • 1808 Views
  • 2 replies
  • 4 kudos

DLT schema ambiguity

I have schema:
| |-- costCentres: struct (nullable = true)
| | |-- dimension1: struct (nullable = true)
| | | |-- name: string (nullable = true)
| | | |-- value: string (nullable = true)
| | |-- dimension10: struct...

Latest Reply
PeteC
New Contributor III
  • 4 kudos

I've got the same problem - but using a SQL Select statement (with some explodes).

  • 4 kudos
1 More Replies
Shellytest
by New Contributor
  • 1292 Views
  • 1 replies
  • 1 kudos
Latest Reply
Rheiman
Contributor II
  • 1 kudos

Through mounting your Databricks resource and your storage resource. Here is a sample using Azure Blob storage: Azure Blob storage - Azure Databricks | Microsoft Docs

  • 1 kudos
Laniel
by New Contributor
  • 1013 Views
  • 1 replies
  • 0 kudos

‘How do you get cost of a notebook run?’

‘How do you get cost of a notebook run?’

Latest Reply
Rheiman
Contributor II
  • 0 kudos

You can check your cloud provider's portal. Go to the subscription > costs field and you should be able to see the costs of the VMs and Databricks. For more granular information, consider installing Overwatch. Environment Setup :: Overwatch (databrick...

  • 0 kudos
Reabouri
by New Contributor
  • 1188 Views
  • 1 replies
  • 1 kudos
Latest Reply
Rheiman
Contributor II
  • 1 kudos

Table ACLs, hashing, anonymization, and pseudonymization of PII, to name a few. You can learn everything in the Databricks Academy course for professional data engineering.
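To make the pseudonymization idea above concrete, here is a small sketch using a keyed hash (HMAC-SHA256) from the Python standard library. The function name, the normalization step, and the placeholder key are assumptions for illustration; this is one possible technique, not necessarily the one taught in the course mentioned.

```python
import hashlib
import hmac


def pseudonymize(value, secret_key):
    """Replace a PII value with a keyed SHA-256 digest (HMAC)."""
    # Normalize so trivially different spellings map to the same token.
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


# Placeholder key: in practice this would come from a secret store.
key = b"replace-with-a-key-from-a-secret-scope"
token_a = pseudonymize("Jane.Doe@example.com", key)
token_b = pseudonymize(" jane.doe@example.com ", key)
assert token_a == token_b  # same person, same token
```

Using a keyed hash rather than a plain hash means the tokens cannot be reproduced by someone who does not hold the key.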

  • 1 kudos
harrisriaz
by New Contributor
  • 3129 Views
  • 2 replies
  • 5 kudos

Resolved! What are the key data engineering problems that Databricks solves?

What problems does Databricks address from a typical data engineering perspective, compared with other cloud DE tools?

Latest Reply
Rheiman
Contributor II
  • 5 kudos

Annoying things Databricks solves:
  • Sane Data Movement (Fast Parallelized Compute, Table Versioning and History)
  • Environment Management (spark + delta + java) are installed out-of-the-box
  • Cost and Job Monitoring (Overwatch)
I've only worked with it for 6 m...

  • 5 kudos
1 More Replies
Zaphod
by New Contributor
  • 798 Views
  • 1 replies
  • 1 kudos

PII restriction

Can you enforce PII export restrictions at the user level?

Latest Reply
Rheiman
Contributor II
  • 1 kudos

Yes, you can, by enforcing table ACLs (a premium plan feature). A rule of thumb, though, is to do this at the group level. Table access control | Databricks on AWS
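A minimal sketch of what group-level table ACLs can look like, assuming a cluster with table access control enabled. The table names and the `analysts` group are hypothetical; the helpers only build the SQL strings, which would then be run with `spark.sql`.

```python
def grant_select(table, group):
    """Build a GRANT statement giving a group read access to one table."""
    return f"GRANT SELECT ON TABLE {table} TO `{group}`"


def revoke_select(table, group):
    """Build a REVOKE statement removing a group's read access to one table."""
    return f"REVOKE SELECT ON TABLE {table} FROM `{group}`"


# Hypothetical setup: analysts may read a masked table but not the raw one.
statements = [
    revoke_select("crm.customers_raw", "analysts"),
    grant_select("crm.customers_masked", "analysts"),
]
for statement in statements:
    print(statement)  # in a notebook: spark.sql(statement)
```

Granting to groups rather than to individual users keeps the rules manageable as people join or leave teams.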

  • 1 kudos
Dburgos
by New Contributor III
  • 1740 Views
  • 3 replies
  • 1 kudos

Resolved! Is there a way to protect the secrets on databricks?

Hi all, I have a lot of secrets on Databricks; however, my users are able to see them when they run a simple loop over the secret. Is there a way to prevent that? Regards

Latest Reply
Rheiman
Contributor II
  • 1 kudos

Do use the Databricks CLI or API 2.0 to manage secrets. Don't leave them in your notebooks for everyone to see; the same applies to salts or hash strings. Secret management - Azure Databricks | Microsoft Docs
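As a sketch of the advice above, a notebook can fetch a secret at run time through `dbutils.secrets.get(scope, key)` instead of embedding the value in code. The wrapper, the scope and key names, and the stand-in class are hypothetical. Note that this does not by itself stop a user with read access to the scope from revealing the value, so permissions on the scope still matter.

```python
def read_secret(secrets, scope, key):
    """Fetch a secret at run time instead of hardcoding it in the notebook.

    secrets is expected to behave like dbutils.secrets, that is, to expose
    get(scope, key) -> str.
    """
    value = secrets.get(scope, key)
    if not value:
        raise ValueError(f"secret {scope}/{key} is empty")
    return value


class FakeSecrets:
    """Stand-in for dbutils.secrets so the sketch runs outside Databricks."""

    def __init__(self, store):
        self._store = store

    def get(self, scope, key):
        return self._store[(scope, key)]


fake = FakeSecrets({("etl", "db-password"): "s3cr3t"})
password = read_secret(fake, "etl", "db-password")
```

In a Databricks notebook, `dbutils.secrets` would be passed in place of `fake`.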

  • 1 kudos
2 More Replies
abd
by Contributor
  • 14412 Views
  • 12 replies
  • 11 kudos

Resolved! Is there any difference between the performance of Python and SQL?

I read somewhere that Python code is converted to SQL in the end. Is that true, or is there any difference in performance when working with Scala, Python, or SQL?

Latest Reply
Rheiman
Contributor II
  • 11 kudos

To add to the consideration of UDFs: consider using HOFs (higher-order functions) first whenever possible, as there is a significant performance benefit, as seen here.

  • 11 kudos
11 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.
