Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ivanychev
by Contributor
  • 6664 Views
  • 16 replies
  • 5 kudos

toPandas() causes IndexOutOfBoundsException in Apache Arrow

Using DBR 10.0. When calling toPandas(), the worker fails with IndexOutOfBoundsException. It seems that ArrowWriter.sizeInBytes (which looks like a proprietary method, since I can't find it in OSS) calls Arrow's getBufferSizeFor, which fails with this err...

Latest Reply
vikas_ahlawat
New Contributor II
  • 5 kudos

I am also facing the same issue. I have set `spark.sql.execution.arrow.pyspark.enabled` to `false`, but the issue persists. Any idea what's going on? Please help me out. org.apache.spark.SparkException: Job aborted ...

  • 5 kudos
15 More Replies
Soma
by Valued Contributor
  • 2313 Views
  • 5 replies
  • 1 kudos

Resolved! Store data using client side encryption and read data using client side encryption

Hi All, I am looking for options to use Azure's client-side encryption feature to store data in ADLS Gen2: https://learn.microsoft.com/en-us/azure/storage/blobs/client-side-encryption?tabs=java Any help will be highly appreciated. Note: Fernet si...

Latest Reply
Soma
Valued Contributor
  • 1 kudos

@Vidula Khanna We are going with Fernet encryption, as a direct method is not available.

  • 1 kudos
4 More Replies
Ravikumashi
by Contributor
  • 918 Views
  • 2 replies
  • 0 kudos

Access Databricks secrets in init script

We are trying to install the Databricks CLI in init scripts, and to do this we need to authenticate with a Databricks token. This is not secure, because anyone with access to the cluster can get hold of the token. We try to inject the secrets into se...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

I think you don't need to install the CLI. A whole API is available from the notebook. Below is an example:
import requests
ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
host_name = ctx.tags().get("browserHostName").get()
host_toke...

  • 0 kudos
1 More Replies
KVNARK
by Honored Contributor II
  • 2687 Views
  • 4 replies
  • 11 kudos

Resolved! PySpark learning path

Can anyone suggest the best series of courses offered by Databricks for learning PySpark for ETL purposes, either in the Databricks partner learning portal or the Databricks learning portal?

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 11 kudos

To learn Databricks ETL, I highly recommend the videos made by Simon on this channel: https://www.youtube.com/@AdvancingAnalytics

  • 11 kudos
3 More Replies
Harish2122
by Contributor
  • 7143 Views
  • 2 replies
  • 10 kudos

Databricks SQL string_agg

I am migrating some on-premises SQL views to Databricks and struggling to find conversions for some functions. The main one is the string_agg function: string_agg(field_name, ', '). Does anyone know how to convert that to Databricks SQL? Thanks in advance.

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 10 kudos

Hi @Harish K, you can use the query below in Spark SQL: %sql SELECT col1, array_join(collect_set(col2), ',') j FROM tmp GROUP BY col1
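One point worth checking before adopting the query above: `collect_set` removes duplicate values, while `string_agg` in most SQL dialects keeps them, so `array_join(collect_list(col2), ', ')` may be the closer match (note also that the question used `', '` as the separator and the reply used `','`). The plain-Python sketch below only illustrates that semantic difference; the function name and sample rows are hypothetical, and unlike Spark, which does not guarantee element order in either aggregate, this sketch keeps input order.

```python
from collections import defaultdict

# Hypothetical (col1, col2) rows standing in for the "tmp" table.
rows = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z")]


def string_agg_sketch(rows, sep=", ", distinct=False):
    """Join the col2 values of each col1 group with `sep`.

    distinct=False mimics array_join(collect_list(col2), sep): duplicates kept.
    distinct=True  mimics array_join(collect_set(col2), sep): duplicates dropped.
    Illustration only, not how Spark implements these aggregates.
    """
    groups = defaultdict(list)
    for key, value in rows:
        if distinct and value in groups[key]:
            continue  # set semantics: skip values already seen in this group
        groups[key].append(value)
    return {key: sep.join(values) for key, values in groups.items()}


with_duplicates = string_agg_sketch(rows)                    # collect_list-like
without_duplicates = string_agg_sketch(rows, distinct=True)  # collect_set-like
print(with_duplicates)
print(without_duplicates)
```

If the original on-premises views relied on duplicates being preserved, the `collect_list` form is the one to test first.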

  • 10 kudos
1 More Replies
boyelana
by Contributor III
  • 2452 Views
  • 9 replies
  • 5 kudos

I am preparing for the data analyst exam and I need as many resources as I can get to fully prepare. Hands-on labs will be welcome as well

Latest Reply
tunstila
Contributor II
  • 5 kudos

Hi, kindly refer to the materials below: Video: https://info.databricks.com/dc/kvtpV3WYob2etSFEoxuDGMYVc6afyrIMgIW50ZzIbvpUgj2uOQyz91VsFjIVPsTMDcYAQ8K0HTbFHGKunTHn_tZmFrrG7SaByl8pfwUNMIZfHhQHiMHwQEKzYSwtM9Vr6hKVl28RlEsSlOluDqaxKqoLcg8-qEwq4xtnrG8zKMEOSpQ...

  • 5 kudos
8 More Replies
Searce
by New Contributor III
  • 1180 Views
  • 3 replies
  • 5 kudos

Databricks Cross cloud

We have a service on AWS Databricks and are building a replica of it on GCP Databricks. We require all services and functionality to run in AWS and AWS Databricks; only the data should be stored in GCP storage. Simply funct...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 5 kudos

No, right now I don't think they support this type of architecture.

  • 5 kudos
2 More Replies
Smitha1
by Valued Contributor II
  • 1144 Views
  • 1 replies
  • 2 kudos

#00244807 and #00245872 Ticket Status - HIGH Priority

Dear @Vidula Khanna, Databricks team, @Nadia Elsayed @Jose Gonzalez @Aden Jaxson: What is the SLA/ETA for normal-priority and HIGH-priority tickets? I created ticket #00244807 on 7th Dec, and also #00245872, but haven't received any update ...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 2 kudos

You can only create high-priority tickets if you have an enterprise plan. As a normal user, you can only create normal-priority tickets. If you have an enterprise plan, you can escalate the case, and the Databricks team will get back to you there soon.

  • 2 kudos
john_odwyer
by New Contributor III
  • 3962 Views
  • 1 replies
  • 1 kudos

Resolved! Masking A Data Column

Is there a way to mask the data in a column in a table from specific users or user groups?

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 1 kudos

Yes, this doc will be helpful for you: https://www.databricks.com/blog/2020/11/20/enforcing-column-level-encryption-and-avoiding-data-duplication-with-pii.html

  • 1 kudos
Mahendra1
by New Contributor III
  • 684 Views
  • 1 replies
  • 0 kudos

Materials for preparing for the Databricks professional exam

Hi All, is there any book or other material for studying for the Databricks professional certification? Thank you!

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 0 kudos

Please check Databricks Academy; there you will find the right courses.

  • 0 kudos
183530
by New Contributor III
  • 603 Views
  • 2 replies
  • 1 kudos

I need a regex to match a whole word with parentheses

SELECT '(CC) ABC' REGEXP '\\b\\(CC\\)\\b' AS TEST1,    'A(CC) ABC' REGEXP '\\b\\(CC\\)\\b' AS TEST2,    'A (CC)A ABC' REGEXP '\\b\\(CC\\)\\b' AS TEST3,    'A (CC) A ABC' REGEXP '\\b\\(CC\\)\\b' AS TEST4,    'A ABC (CC)' REGEXP '\\b\\(CC\\)\\b' AS TES...

Latest Reply
183530
New Contributor III
  • 1 kudos

I want to get the whole word "(CC)". I had already written the expected output:
'(CC) ABC' REGEXP <<regex>> = TRUE
'A(CC) ABC' REGEXP <<regex>> = FALSE
'A (CC)A ABC' REGEXP <<regex>> = FALSE
'A (CC) A ABC' REGEXP <<regex>> = TRUE
'A ABC (CC)' REGEXP <<regex>> = ...
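A note on why `\b\(CC\)\b` cannot produce these results: `\b` only matches between a word character and a non-word character, and both `(` and `)` are non-word characters, so the pattern matches only when "(CC)" is glued to letters or digits on both sides. One possible alternative, assuming "whole word" here means "delimited by whitespace or the string boundaries", is to use lookarounds instead. The sketch below uses Python's `re` for illustration; the helper name is hypothetical. Java regular expressions, which Spark SQL uses, also support lookarounds, but the pattern has not been run on Databricks here and the backslashes would need doubling inside a SQL string literal as in the original query.

```python
import re

# Assumption: a "whole word" is a token with whitespace (or the start/end of
# the string) on both sides.
#   (?<!\S)  the previous character is not a non-space character
#   (?!\S)   the next character is not a non-space character
WHOLE_TOKEN = re.compile(r"(?<!\S)\(CC\)(?!\S)")


def has_whole_token(text: str) -> bool:
    """Return True if "(CC)" appears as a whitespace-delimited token."""
    return WHOLE_TOKEN.search(text) is not None


# The five test strings from the original query.
samples = ["(CC) ABC", "A(CC) ABC", "A (CC)A ABC", "A (CC) A ABC", "A ABC (CC)"]
results = {s: has_whole_token(s) for s in samples}
for sample, matched in results.items():
    print(f"{sample!r}: {matched}")
```

The first four results agree with the expected output listed in the reply. The expected value for the fifth string is cut off in the post; under this pattern it evaluates to True, because the end of the string counts as a delimiter.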

  • 1 kudos
1 More Replies