Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

abd
by Contributor
  • 1977 Views
  • 0 replies
  • 0 kudos

Why use Databricks over other tools?

What is special about Databricks? What does Databricks provide that no other tool on the market provides? How can I convince someone else to use Databricks and not some other tool?

spartakos
by New Contributor
  • 1302 Views
  • 0 replies
  • 0 kudos

Big data ingest into Delta Lake

I have a feature table in BQ that I want to ingest into Delta Lake. This feature table in BQ has 100TB of data. This table can be partitioned by DATE. What best practices and approaches can I take to ingest this 100TB? In particular, what can I do to ...
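
One way to approach a date-partitioned load of this size is to split it into bounded date windows and ingest one window at a time, so that a failed batch can be retried without restarting the whole copy. The sketch below only computes the windows. The batch size, the dates, the column name `event_date`, the table names, and the commented-out read/write calls are all illustrative assumptions, not a tested ingestion pipeline.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def date_windows(start: date, end: date, days_per_batch: int) -> Iterator[Tuple[date, date]]:
    """Yield inclusive (first_day, last_day) windows that cover start..end."""
    if days_per_batch < 1:
        raise ValueError("days_per_batch must be at least 1")
    current = start
    while current <= end:
        # Clamp the last window so it never runs past the requested end date.
        last = min(current + timedelta(days=days_per_batch - 1), end)
        yield current, last
        current = last + timedelta(days=1)


# Hypothetical usage: one bounded read and one append per window.
for first, last in date_windows(date(2022, 1, 1), date(2022, 1, 10), days_per_batch=4):
    predicate = f"event_date BETWEEN '{first}' AND '{last}'"
    print(predicate)
    # Assumption: on a cluster with the Spark BigQuery connector installed,
    # each window could be read with a pushed-down filter and appended to Delta:
    # df = (spark.read.format("bigquery")
    #       .option("table", "my_project.my_dataset.features")  # hypothetical name
    #       .option("filter", predicate)
    #       .load())
    # df.write.format("delta").mode("append").partitionBy("event_date").saveAsTable("features_delta")
```

Keeping each window small enough to finish in one job run makes it straightforward to record which windows are done and resume from the first missing one.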

merca
by Valued Contributor II
  • 2991 Views
  • 2 replies
  • 4 kudos

DLT schema ambiguity

I have schema:
| |-- costCentres: struct (nullable = true)
| | |-- dimension1: struct (nullable = true)
| | | |-- name: string (nullable = true)
| | | |-- value: string (nullable = true)
| | |-- dimension10: struct...

Latest Reply
PeteC
New Contributor III
  • 4 kudos

I've got the same problem - but using a SQL Select statement (with some explodes).

  • 4 kudos
1 More Replies
Shellytest
by New Contributor
  • 1973 Views
  • 1 replies
  • 1 kudos
Latest Reply
Rheiman
Contributor II
  • 1 kudos

Through mounting your Databricks resource and your storage resource. Here is a sample using Azure Blob storage: Azure Blob storage - Azure Databricks | Microsoft Docs

  • 1 kudos
Laniel
by New Contributor
  • 2071 Views
  • 1 replies
  • 0 kudos

‘How do you get cost of a notebook run?’

‘How do you get cost of a notebook run?’

Latest Reply
Rheiman
Contributor II
  • 0 kudos

You can check your cloud provider's portal. Go to the subscription > costs field and you should be able to see the costs of the VMs and Databricks. For more granular information, consider installing Overwatch. Environment Setup :: Overwatch (databrick...

  • 0 kudos
Reabouri
by New Contributor
  • 1865 Views
  • 1 replies
  • 1 kudos
Latest Reply
Rheiman
Contributor II
  • 1 kudos

Table ACLs, hashing, anonymization, and pseudonymization of PII, to name a few. You can learn all of this in the Databricks Academy course for professional data engineering.

  • 1 kudos
harrisriaz
by New Contributor
  • 5054 Views
  • 2 replies
  • 5 kudos

Resolved! What are the key data engineering problems that Databricks solves?

What problems does Databricks address from a typical data engineering perspective, compared with other cloud DE tools?

Latest Reply
Rheiman
Contributor II
  • 5 kudos

Annoying things Databricks solves:
  • Sane Data Movement (Fast Parallelized Compute, Table Versioning and History)
  • Environment Management (spark + delta + java are installed out-of-the-box)
  • Cost and Job Monitoring (Overwatch)
I've only worked with it for 6 m...

  • 5 kudos
1 More Replies
Zaphod
by New Contributor
  • 1246 Views
  • 1 replies
  • 1 kudos

PII restriction

Can you enforce PII export restrictions at the user level?

Latest Reply
Rheiman
Contributor II
  • 1 kudos

Yes, you can, by enforcing table ACLs (a premium plan feature). As a rule of thumb, though, do this at the group level. Table access control | Databricks on AWS
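
As a rough illustration of the group-level approach in the reply above, the helpers below build the SQL statements that grant a group read access to a table and remove a principal's privileges on it. This is a minimal sketch: the table name `hr.employees_pii` and the group name `pii_readers` are hypothetical, and the statements are only printed here. On a cluster with table access control enabled they would be run by an admin or the table owner, for example through `spark.sql(...)`.

```python
def grant_select(table: str, group: str) -> str:
    """Build a GRANT statement that gives a group read access to a table."""
    return f"GRANT SELECT ON TABLE {table} TO `{group}`"


def revoke_all(table: str, principal: str) -> str:
    """Build a REVOKE statement that removes a principal's privileges on a table."""
    return f"REVOKE ALL PRIVILEGES ON TABLE {table} FROM `{principal}`"


# Hypothetical names, for illustration only.
print(grant_select("hr.employees_pii", "pii_readers"))
print(revoke_all("hr.employees_pii", "users"))
```

Granting to a group rather than to individual users means access changes become a matter of group membership, not of editing table permissions one user at a time.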

  • 1 kudos
Dburgos
by Databricks Partner
  • 3046 Views
  • 3 replies
  • 1 kudos

Resolved! Is there a way to protect the secrets on Databricks?

Hi all, I have a lot of secrets on Databricks; however, my users are able to see them when they run a simple loop over the secret. Is there a way to prevent that? Regards

Latest Reply
Rheiman
Contributor II
  • 1 kudos

Do use the Databricks CLI or API 2.0 to manage secrets. Don't leave them in your notebooks for everyone to see; the same applies to salts and hash strings. Secret management - Azure Databricks | Microsoft Docs
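
To make the API 2.0 suggestion a little more concrete, the sketch below builds the request path and JSON body for storing a secret and for giving a principal read-only access to a scope. It does not send anything: the scope, key, and group names are hypothetical, and the workspace URL and authentication are deliberately left out. Limiting who has READ on a scope is what controls who can fetch a value at all, since output redaction in notebooks only hides literal matches.

```python
import json

VALID_PERMISSIONS = {"READ", "WRITE", "MANAGE"}


def put_secret_request(scope: str, key: str, value: str) -> dict:
    """Path and body for storing a string secret via the Secrets API 2.0."""
    return {
        "path": "/api/2.0/secrets/put",
        "body": {"scope": scope, "key": key, "string_value": value},
    }


def put_acl_request(scope: str, principal: str, permission: str = "READ") -> dict:
    """Path and body for setting a principal's permission on a secret scope."""
    if permission not in VALID_PERMISSIONS:
        raise ValueError(f"permission must be one of {sorted(VALID_PERMISSIONS)}")
    return {
        "path": "/api/2.0/secrets/acls/put",
        "body": {"scope": scope, "principal": principal, "permission": permission},
    }


# Hypothetical scope and group names; only the ACL request is printed so that
# no secret value ends up in the output.
print(json.dumps(put_acl_request("etl-scope", "data-engineers"), indent=2))

# In a notebook, the value would then be read at run time instead of being
# hard-coded, for example:
# password = dbutils.secrets.get(scope="etl-scope", key="db-password")
```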

  • 1 kudos
2 More Replies
abd
by Contributor
  • 23431 Views
  • 12 replies
  • 11 kudos

Resolved! Is there any difference between the performance of Python and SQL?

I read somewhere that Python code is converted to SQL in the end. Is that true, or is there a difference in performance when working with Scala, Python, or SQL?

Latest Reply
Rheiman
Contributor II
  • 11 kudos

To add to the point about UDFs: try HOFs (Higher Order Functions) first whenever possible, as there is a significant performance benefit, as seen here.
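
For readers who have not used higher-order functions, the sketch below shows the idea: Spark SQL functions such as `transform`, `filter`, and `aggregate` take a lambda and run inside the engine, so no Python UDF is involved. The query is only held as a string here, and the table `sensor_data` and its `array<int>` column `readings` are hypothetical. The plain-Python function mirrors what the query would return for one row, purely to show the semantics.

```python
from functools import reduce
from typing import Dict, List

# Spark SQL using built-in higher-order functions instead of a Python UDF.
# Assumption: `readings` is an array<int> column in a table named sensor_data.
HOF_QUERY = """
SELECT
  transform(readings, x -> x * 2)             AS doubled,
  filter(readings, x -> x > 10)               AS large_only,
  aggregate(readings, 0, (acc, x) -> acc + x) AS total
FROM sensor_data
"""


def reference_row(readings: List[int]) -> Dict[str, object]:
    """Plain-Python mirror of what HOF_QUERY computes for a single row."""
    return {
        "doubled": [x * 2 for x in readings],
        "large_only": [x for x in readings if x > 10],
        "total": reduce(lambda acc, x: acc + x, readings, 0),
    }


print(reference_row([4, 12, 20]))
# On a cluster, the query itself would be run with: spark.sql(HOF_QUERY)
```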

  • 11 kudos
11 More Replies
Imran_Anwar
by New Contributor II
  • 1433 Views
  • 0 replies
  • 1 kudos

Structured streaming vs Confluent Kstream

For an ultra-low-latency, customer-facing app, I am curious about the cost efficiency of Structured Streaming versus KStream: which works better in terms of cost while still achieving ultra-low latency and a quality outcome? Appreciate any thoughts from p...

01_binary
by New Contributor III
  • 1575 Views
  • 1 replies
  • 1 kudos

Resolved! How to improve the performance of small delta tables?

How to improve the performance of small delta tables?

Latest Reply
NM
New Contributor III
  • 1 kudos

Use the Databricks OPTIMIZE command on Delta tables. It regroups all the small files into larger ones and provides better performance.
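
A minimal sketch of what that looks like in practice: the helper below builds an `OPTIMIZE` statement, optionally with a `ZORDER BY` clause to co-locate rows on frequently filtered columns. The table and column names are hypothetical, and the statement is only printed; in a notebook it would be passed to `spark.sql(...)` or run in a SQL cell.

```python
from typing import Optional, Sequence


def optimize_statement(table: str, zorder_by: Optional[Sequence[str]] = None) -> str:
    """Build an OPTIMIZE statement for a Delta table, with optional Z-ordering."""
    statement = f"OPTIMIZE {table}"
    if zorder_by:
        statement += f" ZORDER BY ({', '.join(zorder_by)})"
    return statement


# Hypothetical table and column names.
print(optimize_statement("sales.orders_small"))
print(optimize_statement("sales.orders_small", zorder_by=["customer_id"]))
```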

  • 1 kudos
Shomari
by New Contributor
  • 2995 Views
  • 1 replies
  • 2 kudos

Resolved! Workflow dependencies

Is it possible to make one workflow job dependent on successful completion of another job?

Latest Reply
Tony_N
New Contributor III
  • 2 kudos

I believe you can set workflow dependencies between other workflows.
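
One way this can be expressed, sketched here under assumptions, is a parent job whose tasks each trigger an existing job, with the second task listing the first in `depends_on` so it only starts after the first finishes successfully. The payload below follows the shape of Jobs API 2.1 job settings (`tasks`, `task_key`, `depends_on`, `run_job_task`). The job IDs and names are hypothetical, and nothing is sent to a workspace.

```python
import json
from typing import Dict, List


def chained_job_settings(name: str, upstream_job_id: int, downstream_job_id: int) -> Dict:
    """Job settings where the downstream job runs only after the upstream one."""
    return {
        "name": name,
        "tasks": [
            {"task_key": "upstream", "run_job_task": {"job_id": upstream_job_id}},
            {
                "task_key": "downstream",
                # By default a task runs only if all of its dependencies succeeded.
                "depends_on": [{"task_key": "upstream"}],
                "run_job_task": {"job_id": downstream_job_id},
            },
        ],
    }


def undefined_dependencies(settings: Dict) -> List[str]:
    """Return any depends_on task keys that are not defined in the job."""
    defined = {task["task_key"] for task in settings["tasks"]}
    return [
        dep["task_key"]
        for task in settings["tasks"]
        for dep in task.get("depends_on", [])
        if dep["task_key"] not in defined
    ]


# Hypothetical job IDs, for illustration only.
settings = chained_job_settings("nightly-chain", upstream_job_id=111, downstream_job_id=222)
assert undefined_dependencies(settings) == []
print(json.dumps(settings, indent=2))
```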

  • 2 kudos
zLiu
by New Contributor II
  • 1330 Views
  • 0 replies
  • 1 kudos

Project Lightspeed

It's just a breeze for all the streaming users. What's the best venue to learn more about it? Is there a Jira ticket that tracks all the progress? I also wonder which Spark version it will come with.

VictorP
by New Contributor
  • 2711 Views
  • 1 replies
  • 3 kudos

Resolved! Does databricks run on GPU?

Does databricks run on GPU?

Latest Reply
ron_defreitas
Contributor
  • 3 kudos

There is support for running on GPU, which will be beneficial to certain ML workloads. Clusters are configured to run on CPU by default, but you can choose GPU-based nodes during creation.

  • 3 kudos