Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Laniel
by New Contributor
  • 1652 Views
  • 1 replies
  • 0 kudos

‘How do you get cost of a notebook run?’

Latest Reply
Rheiman
Contributor II
  • 0 kudos

You can check your cloud provider's portal. Go to the subscription > costs field and you should be able to see the costs of the VMs and Databricks. For more granular information, consider installing Overwatch.
Environment Setup :: Overwatch (databrick...

Reabouri
by New Contributor
  • 1607 Views
  • 1 replies
  • 1 kudos
Latest Reply
Rheiman
Contributor II
  • 1 kudos

Table ACLs, hashing, anonymization, and pseudonymization of PII, to name a few. You can learn everything in the Databricks Academy course for professional data engineering.

harrisriaz
by New Contributor
  • 4384 Views
  • 2 replies
  • 5 kudos

Resolved! What are the key data engineering problems that Databricks solves?

What problems does Databricks address from a typical data engineering perspective, and how does it compare with other cloud DE tools?

Latest Reply
Rheiman
Contributor II
  • 5 kudos

Annoying things Databricks solves:
  • Sane Data Movement (Fast Parallelized Compute, Table Versioning and History)
  • Environment Management (Spark + Delta + Java are installed out-of-the-box)
  • Cost and Job Monitoring (Overwatch)
I've only worked with it for 6 m...

1 More Replies
Zaphod
by New Contributor
  • 1066 Views
  • 1 replies
  • 1 kudos

PII restriction

Can you enforce PII export restrictions at the user level?

Latest Reply
Rheiman
Contributor II
  • 1 kudos

Yes, you can, by enforcing table ACLs (a premium plan feature). As a rule of thumb, though, do this at the group level.
Table access control | Databricks on AWS

Dburgos
by New Contributor III
  • 2486 Views
  • 3 replies
  • 1 kudos

Resolved! Is there a way to protect the secrets on databricks?

Hi all, I have a lot of secrets on Databricks, but my users are able to see them by running a simple loop over the secret. Is there a way to prevent that? Regards

Latest Reply
Rheiman
Contributor II
  • 1 kudos

Use the Databricks CLI or API 2.0 to manage secrets. Don't leave them in your notebooks for everyone to see; the same applies to salts and hash strings.
Secret management - Azure Databricks | Microsoft Docs
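As a minimal sketch of keeping a secret out of notebook source, the helper below reads it from a secret scope at run time with `dbutils.secrets.get`. The scope and key names (`prod-scope`, `db-password`) are hypothetical, and `dbutils` is passed in as a parameter so the function does not assume the notebook global.

```python
def read_secret(dbutils, scope: str, key: str) -> str:
    """Fetch a secret value from a Databricks secret scope at run time."""
    return dbutils.secrets.get(scope=scope, key=key)


def jdbc_credentials(dbutils, user: str, scope: str, key: str) -> dict:
    """Assemble connection options without writing the password in the notebook."""
    return {"user": user, "password": read_secret(dbutils, scope, key)}


# In a notebook (hypothetical scope and key names):
#   opts = jdbc_credentials(dbutils, "etl_user", "prod-scope", "db-password")
```

Note that this alone does not stop a user who is allowed to read the scope from printing the value; limiting who has access to the scope is a separate step.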

2 More Replies
abd
by Contributor
  • 19585 Views
  • 12 replies
  • 11 kudos

Resolved! Is there any difference between the performance of Python and SQL?

I read somewhere that Python code is converted to SQL in the end. Is that true, or is there any difference in performance when working with Scala, Python, or SQL?

Latest Reply
Rheiman
Contributor II
  • 11 kudos

To add to the point about UDFs: consider using HOFs (higher-order functions) first whenever possible, as there is a significant performance benefit, as seen here.
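To make the HOF suggestion concrete, here is a minimal sketch that adds 1 to every element of an array column with the built-in `transform` higher-order function, called through `DataFrame.selectExpr`, instead of a Python UDF. The column names `id` and `scores` are hypothetical.

```python
# Spark SQL higher-order function: the lambda is evaluated by Spark itself,
# so rows are not serialized out to a Python worker as they are for a Python UDF.
PLUS_ONE_EXPR = "transform(scores, x -> x + 1) AS scores_plus_one"


def add_one_to_each(df):
    """Return df's id plus a new array column computed with a HOF.

    `df` is expected to be a Spark DataFrame with an `id` column and an
    array-typed `scores` column (both names are assumptions of this sketch).
    """
    return df.selectExpr("id", PLUS_ONE_EXPR)
```

The same expression can be used directly in a SQL cell, which is one reason the choice of Python versus SQL matters less than the choice of built-in functions versus UDFs.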

11 More Replies
Imran_Anwar
by New Contributor II
  • 1141 Views
  • 0 replies
  • 1 kudos

Structured streaming vs Confluent Kstream

For an ultra-low-latency, customer-facing app, I am curious about the cost efficiency of Structured Streaming versus KStream: which works better in terms of cost while still achieving ultra-low latency and a quality outcome? Appreciate any thoughts from p...

01_binary
by New Contributor III
  • 1322 Views
  • 1 replies
  • 1 kudos

Resolved! How to improve the performance of small delta tables?

Latest Reply
NM
New Contributor III
  • 1 kudos

Use the Databricks OPTIMIZE command on Delta tables. It compacts the small files into larger ones, which improves performance.
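A minimal sketch of running the command from Python, assuming an active `spark` session on a Databricks cluster. The table name `events.small_lookup` and the column `customer_id` are hypothetical; `ZORDER BY` is optional and only included when columns are given.

```python
def optimize_table(spark, table: str, zorder_cols=None) -> str:
    """Compact a Delta table's small files, optionally Z-ordering by columns.

    Returns the SQL statement that was submitted.
    """
    stmt = f"OPTIMIZE {table}"
    if zorder_cols:
        stmt += " ZORDER BY (" + ", ".join(zorder_cols) + ")"
    spark.sql(stmt)
    return stmt


# In a notebook (hypothetical names):
#   optimize_table(spark, "events.small_lookup")
#   optimize_table(spark, "events.small_lookup", ["customer_id"])
```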

Shomari
by New Contributor
  • 2720 Views
  • 1 replies
  • 2 kudos

Resolved! Workflow dependencies

Is it possible to make one workflow job dependent on the successful completion of another job?

Latest Reply
Tony_N
New Contributor III
  • 2 kudos

I believe you can set workflow dependencies between other workflows.
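One way this can be expressed, sketched here under assumptions: an orchestrating job whose tasks each trigger an existing job, with the second task listing the first in `depends_on`. The field names follow the Jobs API 2.1 task format (`task_key`, `depends_on`, `run_job_task`) as I understand it; the job name and job IDs are hypothetical, and the helper only builds the request body rather than calling the API.

```python
import json


def chained_jobs_payload(name: str, upstream_job_id: int, downstream_job_id: int) -> dict:
    """Build a job definition in which one job runs only after another.

    Each task triggers an existing job; the downstream task depends on the
    upstream task, so it is not started until that dependency has finished.
    """
    return {
        "name": name,
        "tasks": [
            {
                "task_key": "upstream",
                "run_job_task": {"job_id": upstream_job_id},
            },
            {
                "task_key": "downstream",
                "depends_on": [{"task_key": "upstream"}],
                "run_job_task": {"job_id": downstream_job_id},
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical job IDs; send the JSON to the jobs create endpoint of your workspace.
    print(json.dumps(chained_jobs_payload("nightly-chain", 111, 222), indent=2))
```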

zLiu
by New Contributor II
  • 1148 Views
  • 0 replies
  • 1 kudos

Project Lightspeed

It’s just a breeze for all the streaming users. What’s the best venue to learn more about it? Is there a Jira ticket that tracks all the progress? I also wonder which Spark version it will come with.

VictorP
by New Contributor
  • 2315 Views
  • 1 replies
  • 3 kudos

Resolved! Does databricks run on GPU?

Latest Reply
ron_defreitas
Contributor
  • 3 kudos

There is support for running on GPUs, which is beneficial to certain ML workloads. Clusters are configured to run on CPU by default, but you can choose GPU-based nodes during creation.

Di
by New Contributor
  • 1469 Views
  • 1 replies
  • 2 kudos

Resolved! Project Lightspeed

Is Spark Structured Streaming now comparable with Flink on streaming workloads?

Latest Reply
ron_defreitas
Contributor
  • 2 kudos

Hard to say. Project Lightspeed is a work in progress and has not yet been released.

ShuImamura
by New Contributor II
  • 2854 Views
  • 2 replies
  • 1 kudos

Resolved! How to use multi character encodings in Delta Tables?

Can we mix different encodings in Delta tables? Downstream needs different character encodings like UTF-8 and Shift JIS.

Latest Reply
ShuImamura
New Contributor II
  • 1 kudos

@Werner Stinckens thanks for answering!

1 More Replies
bgarcia
by New Contributor III
  • 1387 Views
  • 1 replies
  • 0 kudos

Resolved! Delta Tables

I’m beginning my journey into Delta tables, and one thing that still confuses me is where best to save Delta tables that you need to query later. For example, I'm migrating several tables from on-prem to Azure Databricks into in...

Latest Reply
fshimamoto
New Contributor III
  • 0 kudos

I usually recommend storing data in a separate storage account (either mounted or used directly) rather than in the workspace's internal storage. Primary reason: it's easier to share this data with other workspaces, or other...
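A minimal sketch of the "used directly" option on Azure, assuming the cluster already has credentials for the storage account. The container, account, path, and table names are hypothetical; the helpers build an `abfss://` location and register a Delta table over it so it can be queried by name later.

```python
def abfss_path(container: str, account: str, relative_path: str) -> str:
    """Build an ADLS Gen2 URI for a folder in a separate storage account."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{relative_path.lstrip('/')}"


def register_external_table(spark, table: str, location: str) -> str:
    """Register a Delta table whose files live at an external location.

    Returns the SQL statement that was submitted.
    """
    stmt = f"CREATE TABLE IF NOT EXISTS {table} USING DELTA LOCATION '{location}'"
    spark.sql(stmt)
    return stmt


# In a notebook (hypothetical names):
#   loc = abfss_path("lake", "mystorageacct", "bronze/customers")
#   df.write.format("delta").mode("overwrite").save(loc)
#   register_external_table(spark, "bronze.customers", loc)
```

Because the files sit outside the workspace, another workspace with access to the same storage account can register the same location under its own table name.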

L_
by New Contributor II
  • 4414 Views
  • 4 replies
  • 2 kudos

How to change email on databricks community account?

I want to change the email associated with my databricks community edition account to a different email. How do I do that?

Latest Reply
User16565259302
Databricks Employee
  • 2 kudos

You can add the new email in the account console and make it an admin. Another option is to close the current account and start a new Community Edition account with a different email.

3 More Replies
