Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Laniel
by New Contributor
  • 2186 Views
  • 1 replies
  • 0 kudos

‘How do you get cost of a notebook run?’

‘How do you get cost of a notebook run?’

Latest Reply
Rheiman
Contributor II
  • 0 kudos

You can check your cloud provider's portal. Go to the subscription > costs field and you should be able to see the costs of the VMs and Databricks. For more granular information, consider installing Overwatch. Environment Setup :: Overwatch (databrick...

Reabouri
by New Contributor
  • 1980 Views
  • 1 replies
  • 1 kudos
Latest Reply
Rheiman
Contributor II
  • 1 kudos

Table ACLs, hashing, anonymization, and pseudonymization of PII, to name a few. You can learn all of this in the Databricks Academy course for professional data engineering.
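
As a rough illustration of the pseudonymization mentioned above, here is a minimal Python sketch that replaces a PII value with a keyed hash (HMAC-SHA256). The function name `pseudonymize` and the sample key are assumptions made for illustration; in practice the key would come from a secret store rather than from the notebook itself.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Return a stable, keyed hash of a PII value (hex-encoded HMAC-SHA256)."""
    # Normalise so " Alice@Example.com " and "alice@example.com" map to the same token.
    normalised = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalised, hashlib.sha256).hexdigest()


# Hypothetical key for demonstration only; never hard-code a real key.
demo_key = b"replace-with-a-key-from-a-secret-scope"
token = pseudonymize("alice@example.com", demo_key)
```

The same input always yields the same token, so joins on the pseudonymized column still work, while the original value cannot be recomputed without the key.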

harrisriaz
by New Contributor
  • 5207 Views
  • 2 replies
  • 5 kudos

Resolved! What are the key data engineering problems that Databricks solves?

What problems does Databricks address from a typical data engineering perspective, compared with other cloud DE tools?

Latest Reply
Rheiman
Contributor II
  • 5 kudos

Annoying things Databricks solves:
  • Sane Data Movement (Fast Parallelized Compute, Table Versioning and History)
  • Environment Management (Spark + Delta + Java are installed out-of-the-box)
  • Cost and Job Monitoring (Overwatch)
I've only worked with it for 6 m...

1 More Replies
Zaphod
by New Contributor
  • 1318 Views
  • 1 replies
  • 1 kudos

PII restriction

Can you enforce PII export restrictions at the user level?

Latest Reply
Rheiman
Contributor II
  • 1 kudos

Yes, you can, by enforcing table ACLs (a Premium plan feature). As a rule of thumb, though, do this at the group level. Table access control | Databricks on AWS
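
A group-level grant of the kind suggested above might look like the following sketch. It only builds the SQL text; in a notebook each statement would be passed to `spark.sql(...)`. The table and group names are hypothetical, and the helper functions are illustrative rather than part of any Databricks library.

```python
def grant_select(table: str, group: str) -> str:
    """Build a GRANT statement giving a group read access to one table."""
    # Backticks allow principal names that contain spaces or dashes.
    return f"GRANT SELECT ON TABLE {table} TO `{group}`"


def revoke_select(table: str, group: str) -> str:
    """Build the matching REVOKE statement for the same group and table."""
    return f"REVOKE SELECT ON TABLE {table} FROM `{group}`"


# Hypothetical names: analysts may read the masked table but not the raw PII table.
statements = [
    grant_select("sales.customers_masked", "analysts"),
    revoke_select("sales.customers_pii", "analysts"),
]
```

Granting to a group rather than to individual users keeps the rules in one place when people join or leave a team.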

Dburgos
by Databricks Partner
  • 3306 Views
  • 3 replies
  • 1 kudos

Resolved! Is there a way to protect the secrets on databricks?

Hi all, I have a lot of secrets on Databricks; however, my users are able to see them by running a simple loop over the secret. Is there a way to prevent that? Regards

Latest Reply
Rheiman
Contributor II
  • 1 kudos

Use the Databricks CLI or API 2.0 to manage secrets. Don't leave them in your notebooks for everyone to see; the same applies to salts and hash strings. Secret management - Azure Databricks | Microsoft Docs
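
The loop described in the question works because notebook output redaction matches the literal secret string. The pure-Python toy model below is not Databricks code (`redact` and the sample secret are hypothetical); it only sketches why character-by-character output slips past literal matching, which is why read access to a secret scope is usually restricted instead of relying on redaction alone.

```python
REDACTED = "[REDACTED]"


def redact(output: str, secret: str) -> str:
    """Toy model of output redaction: mask literal occurrences of the secret."""
    return output.replace(secret, REDACTED)


secret = "s3cr3t-value"  # hypothetical secret, for illustration only

# Printing the secret directly is caught by literal matching.
direct = redact(f"token={secret}", secret)

# Printing it one character at a time no longer contains the literal string.
spaced = redact(" ".join(secret), secret)
```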

2 More Replies
abd
by Contributor
  • 24465 Views
  • 12 replies
  • 11 kudos

Resolved! Is there any difference between the performance of Python and SQL?

I read somewhere that Python code is converted to SQL in the end. Is that true, or is there any difference in performance when working with Scala, Python, or SQL?

Latest Reply
Rheiman
Contributor II
  • 11 kudos

To add to the consideration of UDFs: try HOFs (Higher Order Functions) first whenever possible, as there is a significant performance benefit, as seen here.
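
To make the HOF versus UDF comparison concrete, here is a minimal PySpark sketch, assuming a DataFrame with an array column named `nums` (a hypothetical name). Both functions add 1 to every array element; the first uses the built-in Spark SQL `transform` function, the second a Python UDF. PySpark is imported inside the functions so the snippet loads even where Spark is not installed.

```python
# Spark SQL higher-order function: evaluated by Spark itself, no Python worker involved.
HOF_EXPR = "transform(nums, x -> x + 1)"


def add_one_with_hof(df):
    """Add 1 to every element of the array column `nums` using a HOF."""
    from pyspark.sql import functions as F  # requires PySpark at call time

    return df.withColumn("plus_one", F.expr(HOF_EXPR))


def add_one_with_udf(df):
    """Same result via a Python UDF, which ships each row to a Python worker."""
    from pyspark.sql import functions as F
    from pyspark.sql.types import ArrayType, LongType

    plus_one = F.udf(lambda xs: [x + 1 for x in xs], ArrayType(LongType()))
    return df.withColumn("plus_one", plus_one("nums"))
```

Running both on the same DataFrame and comparing the query plans with `df.explain()` is one way to see where the Python round trip appears.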

11 More Replies
Imran_Anwar
by New Contributor II
  • 1493 Views
  • 0 replies
  • 1 kudos

Structured streaming vs Confluent Kstream

For an ultra-low-latency, customer-facing app, I am curious about the cost efficiency of Structured Streaming versus Kstream: which works better in terms of cost while still achieving ultra-low latency and a quality outcome? Appreciate any thoughts from p...

01_binary
by New Contributor III
  • 1670 Views
  • 1 replies
  • 1 kudos

Resolved! How to improve the performance of small delta tables?

How to improve the performance of small delta tables?

Latest Reply
NM
New Contributor III
  • 1 kudos

Use the Databricks OPTIMIZE command on Delta tables. It regroups (compacts) all the files and provides better performance.
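
A minimal sketch of issuing that command from Python: the helper below only builds the `OPTIMIZE` statement, optionally with `ZORDER BY`, and the result would be run with `spark.sql(...)` in a notebook. The helper name and the table and column names are hypothetical.

```python
from typing import Optional, Sequence


def optimize_statement(table: str, zorder_by: Optional[Sequence[str]] = None) -> str:
    """Build an OPTIMIZE statement for a Delta table, optionally with ZORDER BY."""
    stmt = f"OPTIMIZE {table}"
    if zorder_by:
        # ZORDER BY co-locates rows with similar values in the listed columns.
        stmt += " ZORDER BY (" + ", ".join(zorder_by) + ")"
    return stmt


# Hypothetical table and column names.
plain = optimize_statement("sales.orders")
zordered = optimize_statement("sales.orders", ["customer_id"])
```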

Shomari
by New Contributor
  • 3099 Views
  • 1 replies
  • 2 kudos

Resolved! Workflow dependencies

Is it possible to make one workflow job dependent on the successful completion of another job?

Latest Reply
Tony_N
New Contributor III
  • 2 kudos

I believe you can set workflow dependencies between other workflows.
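
One way this could be expressed, assuming the Jobs API 2.1 task fields `depends_on` and `run_job_task`, is a wrapper job whose first task triggers the upstream job and whose second task runs only after it. The sketch below just builds the request payload as a Python dict; the job ID, task keys, and notebook path are hypothetical, and cluster settings are omitted for brevity.

```python
def chained_job_payload(upstream_job_id: int) -> dict:
    """Build a job definition whose second task waits for another job to finish."""
    return {
        "name": "downstream-after-upstream",
        "tasks": [
            {
                # Triggers the existing upstream job and waits for it.
                "task_key": "run_upstream",
                "run_job_task": {"job_id": upstream_job_id},
            },
            {
                # Starts only after `run_upstream` has completed.
                "task_key": "downstream_step",
                "depends_on": [{"task_key": "run_upstream"}],
                "notebook_task": {"notebook_path": "/Workspace/Shared/downstream"},
            },
        ],
    }


payload = chained_job_payload(upstream_job_id=123)  # 123 is a placeholder ID
```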

zLiu
by New Contributor II
  • 1368 Views
  • 0 replies
  • 1 kudos

Project lightspeed

It’s just a breeze for all the streaming users. What’s the best venue to learn more about it? Is there a Jira ticket that tracks all the progress? I also wonder which Spark version it will come with.

VictorP
by New Contributor
  • 2880 Views
  • 1 replies
  • 3 kudos

Resolved! Does databricks run on GPU?

Does databricks run on GPU?

Latest Reply
ron_defreitas
Contributor
  • 3 kudos

There is support for running on GPUs, which is beneficial to certain ML workloads. Clusters are configured to run on CPU by default, but you can choose GPU-based nodes during creation.

Di
by New Contributor
  • 1886 Views
  • 1 replies
  • 2 kudos

Resolved! Project Lightspeed

Is Spark Structured Streaming now comparable with Flink on streaming workloads?

Latest Reply
ron_defreitas
Contributor
  • 2 kudos

Hard to say. Project Lightspeed is a work in progress and has not yet been released.

ShuImamura
by New Contributor II
  • 3515 Views
  • 2 replies
  • 1 kudos

Resolved! How to use multi character encodings in Delta Tables?

Can we mix different encodings in Delta tables? Downstream needs different character encodings like UTF-8 and Shift JIS.
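
One possible approach, sketched here under the assumption that the table keeps its string columns as Unicode text and that each downstream consumer gets its encoding applied at export time, is to convert only when writing out. The helper name `export_text` and the sample string are hypothetical.

```python
def export_text(text: str, encoding: str) -> bytes:
    """Encode a Unicode string for a downstream system that expects `encoding`."""
    return text.encode(encoding)


sample = "日本語"

# Same logical text, two different byte representations for two consumers.
utf8_bytes = export_text(sample, "utf-8")
sjis_bytes = export_text(sample, "shift_jis")
```

Keeping a single encoding in storage and converting at the edge avoids having to track which rows or columns use which encoding.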

Latest Reply
ShuImamura
New Contributor II
  • 1 kudos

@Werner Stinckens thanks for answering!

1 More Replies
bgarcia
by Databricks Partner
  • 1786 Views
  • 1 replies
  • 0 kudos

Resolved! Delta Tables

I’m beginning my journey into Delta tables, and one thing that still confuses me is where best to save Delta tables that need to be queried later. For example, I'm migrating several tables from on-prem to Azure Databricks into in...

Latest Reply
fshimamoto
New Contributor III
  • 0 kudos

I usually recommend storing data in a separate storage account (either mounted or used directly) rather than using the workspace's internal storage for that task. Primary reason: it's easier to share this data with other workspaces, or other...

L_
by New Contributor II
  • 5571 Views
  • 4 replies
  • 2 kudos

How to change email on databricks community account?

I want to change the email associated with my databricks community edition account to a different email. How do I do that?

Latest Reply
User16565259302
Databricks Employee
  • 2 kudos

You can add the new email in the account console and make it an admin. Another option is to close the current account and start a new Community Edition account with a different email.

3 More Replies