Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

successhawk
by New Contributor II
  • 2557 Views
  • 3 replies
  • 2 kudos

Resolved! Is there a way to tell if a created job is not compliant against configured cluster policies before it runs?

As a DevOps engineer, I want to enforce cluster policies at deployment time when the job is deployed/created, well before it is time to actually use it (i.e. before its scheduled/triggered run time without actually running it).

Latest Reply
irfanaziz
Contributor II
  • 2 kudos

Isn't it the linked service that defines the kind of cluster created or used for any job? So I believe you could control the configuration via the linked service settings.

2 More Replies
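There is no built-in dry-run, but one workaround at deployment time is to fetch the policy definition with the Cluster Policies API (`GET /api/2.0/policies/clusters/get`) and diff the job's cluster spec against it yourself, before the job ever runs. A minimal sketch, assuming a hypothetical policy and job spec; only top-level attributes and the `fixed`/`allowlist` rule types are handled (real policies also support nested `spark_conf.*` paths, `range`, `regex`, etc.):

```python
import json

# Hypothetical policy content, in the shape of the "definition" field
# returned by GET /api/2.0/policies/clusters/get (a JSON string).
POLICY_JSON = json.dumps({
    "spark_version": {"type": "fixed", "value": "11.3.x-scala2.12"},
    "node_type_id": {"type": "allowlist",
                     "values": ["Standard_DS3_v2", "Standard_DS4_v2"]},
    "autotermination_minutes": {"type": "fixed", "value": 60},
})

def violations(cluster_spec, policy_definition):
    """Return human-readable violations of 'fixed' and 'allowlist' rules.

    Simplified: only checks top-level cluster attributes.
    """
    problems = []
    for attr, rule in policy_definition.items():
        actual = cluster_spec.get(attr)
        if rule["type"] == "fixed" and actual != rule["value"]:
            problems.append(f"{attr}: expected {rule['value']!r}, got {actual!r}")
        elif rule["type"] == "allowlist" and actual not in rule["values"]:
            problems.append(f"{attr}: {actual!r} not in allowlist")
    return problems

# Hypothetical job cluster spec, as it would appear in the job definition.
job_cluster = {
    "spark_version": "10.4.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "autotermination_minutes": 60,
}
for p in violations(job_cluster, json.loads(POLICY_JSON)):
    print(p)
```

Wired into a CI/CD pipeline, a non-empty result would fail the deployment before the job is ever scheduled.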
labtech
by Valued Contributor II
  • 2093 Views
  • 3 replies
  • 20 kudos

Resolved! Create Databricks Workspace with different email address on Azure

Hi team, I wonder if we can create a Databricks workspace that is not related to an Azure email address. Thanks

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 20 kudos

Yes, I have done this multiple times.

2 More Replies
labtech
by Valued Contributor II
  • 2006 Views
  • 3 replies
  • 14 kudos

Get a new badge or certificate for version 3 of the DE exam

I took the DE certification exam (version 2). Do I receive a new badge or certificate when I pass the newest version of the DE exam? I'm going to take it and review my knowledge.

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 14 kudos

Hi @Gam Nguyen, I think there is no new badge for this one.

2 More Replies
cmilligan
by Contributor II
  • 969 Views
  • 0 replies
  • 1 kudos

Fail a multi-task job successfully

I have a multi-task job that runs every day, where the first notebook in the job checks whether the run should continue based on the date the job is run. The majority of the time the answer is no, and I'm raising an exception for the job to ...

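One way to end such a run without marking the whole job failed is to exit the gating notebook cleanly instead of raising. A sketch with a hypothetical date gate; the `dbutils` calls are shown as comments because they only exist on a Databricks cluster:

```python
from datetime import date

def should_run(run_date: date) -> bool:
    """Hypothetical gate: only continue on the first day of the month."""
    return run_date.day == 1

print(should_run(date(2024, 5, 1)), should_run(date(2024, 5, 2)))

# In the first notebook task of the multi-task job:
# if not should_run(date.today()):
#     dbutils.notebook.exit("skipped")   # ends this task with SUCCESS
#
# Raising an exception instead marks the task (and the job run) as FAILED,
# which is why an early exit -- or a task-level "Run if" / task-values
# condition on the downstream tasks -- is usually the cleaner way to
# "fail successfully".
```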
Harun
by Honored Contributor
  • 1778 Views
  • 1 replies
  • 1 kudos

Hi Community members and Databricks Officials, Now a days i am seeing lot of spam post in our groups and discussions. Forum admins and databricks offi...

Hi Community members and Databricks officials, nowadays I am seeing a lot of spam posts in our groups and discussions. Forum admins and Databricks officials, please take action on the users who are spamming the timeline with promotional content. As...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 1 kudos

Yes @Databricks Forum Admin, please take action on this.

DB_developer
by New Contributor III
  • 9617 Views
  • 2 replies
  • 3 kudos

How to optimize storage for sparse data in data lake?

I have a lot of tables where 80% of the columns are filled with nulls. I understand SQL Server provides a way to handle this kind of data in the table definition (with the SPARSE keyword). Do data lakes provide something similar?

Latest Reply
-werners-
Esteemed Contributor III
  • 3 kudos

The data lake itself doesn't, but the file format you use to store the data does. For example, Parquet uses column compression, so sparse data will compress pretty well. CSV, on the other hand: total disaster.

1 More Replies
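The compression point is easy to demonstrate even without Spark: a column that is mostly empty compresses far better than one full of varied values, which is why columnar formats like Parquet (with per-column dictionary and run-length encoding) handle sparse data well. A toy illustration using zlib on two simulated serialized columns:

```python
import zlib

rows = 10_000

# Sparse column: 80% of the rows are empty, the rest hold one value.
sparse = "\n".join("" if i % 5 else "42" for i in range(rows)).encode()

# Dense column: varied pseudo-random values in every row.
dense = "\n".join(str(i * 7919 % 1000) for i in range(rows)).encode()

print("sparse compressed:", len(zlib.compress(sparse)), "bytes")
print("dense compressed: ", len(zlib.compress(dense)), "bytes")
```

The sparse column compresses to a small fraction of the dense one, even though both start at a similar raw size; Parquet's columnar encodings exploit the same redundancy.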
grazie
by Contributor
  • 8725 Views
  • 5 replies
  • 3 kudos

Resolved! Can we use "Access Connector for Azure Databricks" to access Azure Key Vault?

We have a scenario where ideally we'd like to use managed identities to access storage, but also secrets. For now we have a setup with service principals accessing secrets through secret scopes, but we foresee a situation where we may get many service...

Latest Reply
grive
New Contributor III
  • 3 kudos

I have unofficial word that this is not supported, and the docs don't mention it. I have the feeling that even if I got it to work, it should not be trusted for now.

4 More Replies
andrew0117
by Contributor
  • 6435 Views
  • 4 replies
  • 4 kudos

Resolved! will I be charged by Databricks if leaving the cluster on but not running?

Or does Databricks only charge you when the cluster is actually running work, no matter how long you keep the cluster idle? Thanks!

Latest Reply
labtech
Valued Contributor II
  • 4 kudos

If you do not configure your cluster to auto-terminate after a period of idle time, yes, you will be charged for that.

3 More Replies
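To clarify the billing detail: DBUs accrue while the cluster is in a running state, even if no job is executing, so an idle-but-on cluster is still billed. Auto-termination is the standard guard; a sketch of a Clusters API create payload (names and values are hypothetical):

```python
import json

# Sketch of a Databricks Clusters API create payload. The key that
# prevents paying for an idle cluster is autotermination_minutes.
cluster_payload = {
    "cluster_name": "etl-dev",            # hypothetical name
    "spark_version": "11.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 2,
    "autotermination_minutes": 30,        # terminate after 30 idle minutes
}

print(json.dumps(cluster_payload, indent=2))
# You would POST this to the clusters/create endpoint with a bearer
# token (request code not shown); the same field can be set in the UI.
```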
seberino
by New Contributor III
  • 2262 Views
  • 1 replies
  • 1 kudos

Resolved! Why is the Revoke button initially greyed out in Data Explorer (SQL workspace) in the Permissions tab of a table?

The goal is to revoke SELECT permissions from a user for a table in Data Explorer in the SQL workspace. I've tried navigating to the Permissions tab of the table in Data Explorer. Initially the Revoke button is greyed out and only the Grant button is ...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 1 kudos

Hi @Christian Seberino, connect with Databricks support for this. You can also raise a support request.

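If the UI button stays greyed out, the same revoke can be issued as plain SQL, which doesn't depend on the Data Explorer state. A sketch; the table and user here are hypothetical, and the exact grammar can differ slightly across DBR and Unity Catalog versions:

```python
def revoke_select_sql(table: str, principal: str) -> str:
    """Build the SQL equivalent of the Data Explorer Revoke button."""
    return f"REVOKE SELECT ON TABLE {table} FROM `{principal}`"

stmt = revoke_select_sql("main.sales.orders", "some.user@example.com")
print(stmt)

# In a Databricks SQL editor or notebook you would run it directly:
# spark.sql(stmt)
```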
FerArribas
by Contributor
  • 3239 Views
  • 4 replies
  • 3 kudos

How to import a custom CA certificate into the Databricks SQL module?

We need to be able to import a custom certificate (https://learn.microsoft.com/en-us/azure/databricks/kb/python/import-custom-ca-cert) in the same way as in the Data Engineering module, but in the Databricks SQL module.

Latest Reply
VaibB
Contributor
  • 3 kudos

You can try downloading it to DBFS and maybe accessing it from there, if your use case really needs that.

3 More Replies
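For the classic (non-SQL) workspace, the linked KB article boils down to making the Python runtime trust one extra CA. A sketch of that pattern, assuming the PEM content below is a placeholder you would really read from DBFS or a secret scope; whether SQL warehouses honor this is exactly the open question in this thread:

```python
import os
import tempfile

# Placeholder for a custom root certificate in PEM format.
CUSTOM_CA_PEM = """-----BEGIN CERTIFICATE-----
...your CA certificate body...
-----END CERTIFICATE-----
"""

# Write the cert to a file and point Python HTTP clients at it via the
# standard REQUESTS_CA_BUNDLE environment variable.
bundle = os.path.join(tempfile.mkdtemp(), "custom-ca.pem")
with open(bundle, "w") as f:
    f.write(CUSTOM_CA_PEM)
os.environ["REQUESTS_CA_BUNDLE"] = bundle

print(os.environ["REQUESTS_CA_BUNDLE"])
```

In practice you would append the custom cert to the existing CA bundle rather than replacing it, so public sites keep verifying.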
ivanychev
by Contributor II
  • 15336 Views
  • 14 replies
  • 5 kudos

toPandas() causes IndexOutOfBoundsException in Apache Arrow

Using DBR 10.0. When calling toPandas(), the worker fails with IndexOutOfBoundsException. It seems like ArrowWriter.sizeInBytes (which looks like a proprietary method, since I can't find it in OSS) calls Arrow's getBufferSizeFor, which fails with this ...

Latest Reply
vikas_ahlawat
New Contributor II
  • 5 kudos

I am also facing the same issue. I have applied the config `spark.sql.execution.arrow.pyspark.enabled` set to `false`, but I'm still facing the same issue. Any idea what's going on? Please help me out. org.apache.spark.SparkException: Job aborted ...

13 More Replies
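For anyone trying the workaround discussed in this thread: the confs must be set before the conversion runs (ideally at cluster level), and the fallback flag only covers unsupported-type errors, so it may not catch an executor-side failure like this one. A sketch of the relevant settings:

```python
# Configs that control Arrow-backed toPandas(). Setting "enabled" to
# false makes Spark fall back to the slower, non-Arrow collection path.
ARROW_CONFS = {
    "spark.sql.execution.arrow.pyspark.enabled": "false",
    # If Arrow stays on, this flag at least falls back on conversion
    # errors for unsupported types instead of failing the job.
    "spark.sql.execution.arrow.pyspark.fallback.enabled": "true",
}

# On a cluster you would apply them to the active session:
# for k, v in ARROW_CONFS.items():
#     spark.conf.set(k, v)
for k, v in ARROW_CONFS.items():
    print(k, "=", v)
```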
Soma
by Valued Contributor
  • 3484 Views
  • 3 replies
  • 1 kudos

Resolved! Store data using client side encryption and read data using client side encryption

Hi all, I am looking for options to use the client-side encryption feature of Azure to store data in ADLS Gen2: https://learn.microsoft.com/en-us/azure/storage/blobs/client-side-encryption?tabs=java. Any help will be highly appreciated. Note: Fernet si...

Latest Reply
Soma
Valued Contributor
  • 1 kudos

@Vidula Khanna We are going with Fernet encryption, as a direct method is not available.

2 More Replies
Ravikumashi
by Contributor
  • 1375 Views
  • 2 replies
  • 0 kudos

Access Databricks secrets in init script

We are trying to install the Databricks CLI in init scripts, and in order to do this we need to authenticate with a Databricks token. But it is not secure, as anyone with access to the cluster can get hold of this token. We tried to inject the secrets into se...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

I think you don't need to install the CLI. The whole REST API is available from a notebook. Below is an example:
import requests
ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
host_name = ctx.tags().get("browserHostName").get()
host_toke...

1 More Replies
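The context-based pattern from the reply above can be sketched end to end. The `dbutils` lines are comments because they only exist on a running cluster, and the endpoint is just an example; the point is that host and token come from the notebook context, so no secret has to be baked into an init script:

```python
def api_headers(token: str) -> dict:
    """Bearer-token headers for the Databricks REST API."""
    return {"Authorization": f"Bearer {token}"}

# On a cluster, both values come from the notebook context at runtime:
# ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
# host = ctx.tags().get("browserHostName").get()
# token = ctx.apiToken().get()
#
# Example call (using the requests library available on DBR):
# requests.get(f"https://{host}/api/2.0/clusters/list",
#              headers=api_headers(token))

print(api_headers("dapi-example-token"))
```

The context token is short-lived and scoped to the notebook's user, which avoids distributing a long-lived PAT through cluster config.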
KVNARK
by Honored Contributor II
  • 5816 Views
  • 4 replies
  • 11 kudos

Resolved! Pyspark learning path

Can anyone suggest the best series of courses offered by Databricks to learn PySpark for ETL purposes, either on the Databricks partner learning portal or the Databricks learning portal?

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 11 kudos

To learn Databricks ETL, I highly recommend the videos made by Simon on this channel: https://www.youtube.com/@AdvancingAnalytics

3 More Replies
boyelana
by Contributor III
  • 3682 Views
  • 9 replies
  • 5 kudos

I am preparing for the data analyst exam and I need as many resources as I can get to fully prepare. Hands-on labs will be welcome as well

I am preparing for the data analyst exam and I need as many resources as I can get to fully prepare. Hands-on labs will be welcome as well

Latest Reply
tunstila
Contributor II
  • 5 kudos

Hi, kindly refer to the materials below:
Video: https://info.databricks.com/dc/kvtpV3WYob2etSFEoxuDGMYVc6afyrIMgIW50ZzIbvpUgj2uOQyz91VsFjIVPsTMDcYAQ8K0HTbFHGKunTHn_tZmFrrG7SaByl8pfwUNMIZfHhQHiMHwQEKzYSwtM9Vr6hKVl28RlEsSlOluDqaxKqoLcg8-qEwq4xtnrG8zKMEOSpQ...

8 More Replies
