Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Harun
by Honored Contributor
  • 1127 Views
  • 1 reply
  • 1 kudos


Hi Community members and Databricks officials, nowadays I am seeing a lot of spam posts in our groups and discussions. Forum admins and Databricks officials, please take action on the users who are spamming the timeline with promotional content. As...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 1 kudos

Yes @Databricks Forum Admin​, please take action on this.

DB_developer
by New Contributor III
  • 9309 Views
  • 2 replies
  • 3 kudos

How to optimize storage for sparse data in data lake?

I have a lot of tables with 80% of the columns filled with nulls. I understand SQL Server provides a way to handle this kind of data in the table definition (with the SPARSE keyword). Do data lakes provide something similar?

Latest Reply
-werners-
Esteemed Contributor III
  • 3 kudos

The data lake itself doesn't, but the file format you use to store the data does. F.e. Parquet uses column compression, so sparse data will compress pretty well. CSV, on the other hand: total disaster.

1 More Replies
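
To make -werners-'s point concrete, you can write the same sparse DataFrame both ways and compare the output sizes. A minimal PySpark sketch; the paths and column names are made up for illustration:

    from pyspark.sql import functions as F

    # Build a sparse DataFrame: roughly 80% of each value column is null.
    df = spark.range(1_000_000).select(
        "id",
        F.when(F.rand() < 0.2, F.col("id") * 2).alias("a"),  # ~80% nulls
        F.when(F.rand() < 0.2, F.lit("x")).alias("b"),       # ~80% nulls
    )

    # Parquet's dictionary/run-length encoding compresses the null runs away;
    # CSV writes every empty field out explicitly.
    df.write.mode("overwrite").parquet("/tmp/sparse_parquet")                 # hypothetical path
    df.write.mode("overwrite").option("header", True).csv("/tmp/sparse_csv")  # hypothetical path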
grazie
by Contributor
  • 7814 Views
  • 5 replies
  • 3 kudos

Resolved! Can we use "Access Connector for Azure Databricks" to access Azure Key Vault?

We have a scenario where ideally we'd like to use Managed Identities to access storage but also secrets. For now we have a setup with service principals accessing secrets through secret scopes, but we foresee a situation where we may get many service...

Latest Reply
grive
New Contributor III
  • 3 kudos

I have unofficial word that this is not supported, and docs don't mention it. I have the feeling that even if I got it to work it should not be trusted for now.

4 More Replies
andrew0117
by Contributor
  • 5678 Views
  • 4 replies
  • 4 kudos

Resolved! Will I be charged by Databricks if I leave the cluster on but not running?

Or does Databricks only charge you when you are actually running the cluster, no matter how long you keep the cluster idle? Thanks!

Latest Reply
labtech
Valued Contributor II
  • 4 kudos

If you do not configure your cluster to auto-terminate after a period of idle time, then yes, you will be charged for that.

3 More Replies
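
In other words, the safeguard is the cluster's auto-termination setting. A hedged sketch of setting it at creation time via the Clusters REST API; the workspace URL, token, and node type below are placeholders:

    import requests

    resp = requests.post(
        "https://<workspace>.cloud.databricks.com/api/2.0/clusters/create",
        headers={"Authorization": "Bearer <personal-access-token>"},
        json={
            "cluster_name": "etl-cluster",
            "spark_version": "10.4.x-scala2.12",
            "node_type_id": "i3.xlarge",
            "num_workers": 2,
            # Shut down after 30 idle minutes so DBUs stop accruing.
            "autotermination_minutes": 30,
        },
    )
    print(resp.json())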
seberino
by New Contributor III
  • 2048 Views
  • 1 reply
  • 1 kudos

Resolved! Why is the Revoke button initially greyed out in Data Explorer (of the SQL workspace) on the Permissions tab of a table?

The goal is to revoke SELECT permissions from a user for a table in Data Explorer in the SQL workspace. I've tried navigating to the Permissions tab of the table in Data Explorer. Initially the Revoke button is greyed out and only the Grant button is ...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 1 kudos

Hi @Christian Seberino​, please connect with Databricks support for this. You can also raise a support request for the same.

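
If the button stays greyed out, the same revoke can usually be issued in SQL instead. A minimal sketch via spark.sql, with hypothetical table and user names; it requires sufficient privileges on the object:

    # Hypothetical three-level table name and user principal.
    spark.sql(
        "REVOKE SELECT ON TABLE main.default.my_table FROM `some.user@example.com`"
    )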
FerArribas
by Contributor
  • 2884 Views
  • 4 replies
  • 3 kudos

How to import a custom CA certificate into the Databricks SQL module?

We need to be able to import a custom certificate (https://learn.microsoft.com/en-us/azure/databricks/kb/python/import-custom-ca-cert) in the same way as in the "data engineering" module, but in the Databricks SQL module.

Latest Reply
VaibB
Contributor
  • 3 kudos

You can try downloading it to DBFS and maybe accessing it from there, if your use case really needs that.

3 More Replies
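
For context, the linked KB achieves this on clusters by appending the custom CA to the certifi bundle that Python HTTP clients consult; whether Databricks SQL exposes an equivalent hook is exactly the open question here. A minimal sketch of the cluster-side approach, with a hypothetical certificate path:

    import certifi

    # Hypothetical path to your organization's root CA certificate (PEM format).
    custom_ca = "/dbfs/certs/my_org_root_ca.pem"

    # Append the custom CA to the bundle used by Python clients such as requests.
    with open(custom_ca) as src, open(certifi.where(), "a") as bundle:
        bundle.write("\n" + src.read())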
ivanychev
by Contributor II
  • 14521 Views
  • 14 replies
  • 5 kudos

toPandas() causes IndexOutOfBoundsException in Apache Arrow

Using DBR 10.0. When calling toPandas(), the worker fails with IndexOutOfBoundsException. It seems like ArrowWriter.sizeInBytes (which looks like a proprietary method, since I can't find it in OSS) calls Arrow's getBufferSizeFor, which fails with this err...

Latest Reply
vikas_ahlawat
New Contributor II
  • 5 kudos

I am also facing the same issue. I have applied the config `spark.sql.execution.arrow.pyspark.enabled` set to `false`, but am still facing the same issue. Any idea what's going on? Please help me out. ...org.apache.spark.SparkException: Job aborted ...

13 More Replies
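
For readers hitting the same IndexOutOfBoundsException, two settings are commonly tried: disabling Arrow for toPandas() entirely, or keeping Arrow but shrinking the batch size so no single buffer overflows. A sketch; as the reply above shows, neither is guaranteed to fix every case:

    # Option 1: bypass Arrow for toPandas() (slower, plain JVM serialization).
    spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "false")

    # Option 2: keep Arrow but cap the records per batch (default is 10000).
    spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
    spark.conf.set("spark.sql.execution.arrow.maxRecordsPerBatch", "1000")

    pdf = df.toPandas()  # df: the Spark DataFrame being converted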
Soma
by Valued Contributor
  • 3169 Views
  • 3 replies
  • 1 kudos

Resolved! Store data using client side encryption and read data using client side encryption

Hi all, I am looking for some options to add the client-side encryption feature of Azure to store data in ADLS Gen2: https://learn.microsoft.com/en-us/azure/storage/blobs/client-side-encryption?tabs=java. Any help will be highly appreciated. Note: Fernet si...

Latest Reply
Soma
Valued Contributor
  • 1 kudos

@Vidula Khanna​ We are going with Fernet encryption, as a direct method is not available.

2 More Replies
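
Since the thread settled on Fernet, here is a minimal sketch of column-level encryption with a PySpark UDF. The secret-scope, key, and column names are hypothetical; in practice the key should come from a secret scope, not be generated inline:

    from cryptography.fernet import Fernet
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    # Hypothetical: key = dbutils.secrets.get("my-scope", "fernet-key").encode()
    key = Fernet.generate_key()  # demo only; persist the real key securely

    def encrypt(value, key=key):
        return Fernet(key).encrypt(value.encode()).decode() if value else None

    def decrypt(token, key=key):
        return Fernet(key).decrypt(token.encode()).decode() if token else None

    encrypt_udf = F.udf(encrypt, StringType())
    decrypt_udf = F.udf(decrypt, StringType())

    # Encrypt a hypothetical 'email' column before writing to ADLS Gen2.
    df_enc = df.withColumn("email", encrypt_udf("email"))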
Ravikumashi
by Contributor
  • 1256 Views
  • 2 replies
  • 0 kudos

Access Databricks secrets in init script

We are trying to install the Databricks CLI in init scripts, and in order to do this we need to authenticate with a Databricks token, but this is not secure, as anyone with access to the cluster can get hold of the token. We tried to inject the secrets into se...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

I think you don't need to install the CLI. The whole API is available via notebook. Below is an example:
import requests
ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
host_name = ctx.tags().get("browserHostName").get()
host_toke...

1 More Replies
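
To complete the pattern in the reply above: the notebook context also exposes a short-lived API token, so REST calls need no hard-coded PAT. Note this entry point is internal and unofficial, so treat it as a convenience rather than a guarantee; the /clusters/list endpoint is just an example:

    import requests

    ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
    host_name = ctx.tags().get("browserHostName").get()
    host_token = ctx.apiToken().get()  # short-lived token for the notebook session

    resp = requests.get(
        f"https://{host_name}/api/2.0/clusters/list",
        headers={"Authorization": f"Bearer {host_token}"},
    )
    print(resp.json())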
KVNARK
by Honored Contributor II
  • 4625 Views
  • 4 replies
  • 11 kudos

Resolved! Pyspark learning path

Can anyone suggest the best series of courses offered by Databricks to learn PySpark for ETL purposes, either in the Databricks partner learning portal or the Databricks learning portal?

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 11 kudos

To learn Databricks ETL, I highly recommend the videos made by Simon on this channel: https://www.youtube.com/@AdvancingAnalytics

3 More Replies
boyelana
by Contributor III
  • 3365 Views
  • 9 replies
  • 5 kudos

I am preparing for the data analyst exam and I need as many resources as I can get to fully prepare. Hands-on labs will be welcome as well


Latest Reply
tunstila
Contributor II
  • 5 kudos

Hi, kindly refer to the materials below:
Video: https://info.databricks.com/dc/kvtpV3WYob2etSFEoxuDGMYVc6afyrIMgIW50ZzIbvpUgj2uOQyz91VsFjIVPsTMDcYAQ8K0HTbFHGKunTHn_tZmFrrG7SaByl8pfwUNMIZfHhQHiMHwQEKzYSwtM9Vr6hKVl28RlEsSlOluDqaxKqoLcg8-qEwq4xtnrG8zKMEOSpQ...

8 More Replies
Searce
by New Contributor III
  • 1668 Views
  • 3 replies
  • 5 kudos

Databricks Cross cloud

We have a service with AWS Databricks. We are building the same replica on GCP Databricks. Here we require that all the services and functionality run in AWS and AWS Databricks; only the data should be stored on GCP Storage. Simply funct...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 5 kudos

No, right now I don't think they support this type of architecture.

2 More Replies
Smitha1
by Valued Contributor II
  • 1498 Views
  • 1 reply
  • 2 kudos

#00244807 and #00245872 Ticket Status - HIGH Priority

Dear @Vidula Khanna​, Databricks team, @Nadia Elsayed​ @Jose Gonzalez​ @Aden Jaxson​ What is the SLA/ETA for a normal priority ticket and a HIGH priority ticket? I created tickets #00244807 on 7th Dec and #00245872 but haven't received any update ...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 2 kudos

You can only create high-priority cases if you have an enterprise plan; as a normal user you can only create normal-priority cases. If you have an enterprise plan, you can escalate the case, and the Databricks team will get back to you soon there.

john_odwyer
by New Contributor III
  • 4928 Views
  • 1 reply
  • 1 kudos

Resolved! Masking A Data Column

Is there a way to mask the data in a column in a table from specific users or user groups?

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 1 kudos

Yes, this doc will be helpful for you: https://www.databricks.com/blog/2020/11/20/enforcing-column-level-encryption-and-avoiding-data-duplication-with-pii.html

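
The linked blog focuses on encryption; for straightforward group-based masking, a dynamic view is a common pattern. A minimal sketch with hypothetical table, column, and group names:

    # Members of 'pii_readers' see the real value; everyone else sees a mask.
    spark.sql("""
        CREATE OR REPLACE VIEW sales_masked AS
        SELECT
            order_id,
            CASE WHEN is_member('pii_readers') THEN email
                 ELSE '***MASKED***'
            END AS email
        FROM sales
    """)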

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group