Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Pazuzu7
by New Contributor II
  • 2026 Views
  • 3 replies
  • 0 kudos

Upgrading to 11.3 LTS, Sedona functions throwing null when previously worked fine in 7.3

I'm in the process of upgrading to 11.3. I'm using Spark 3.3.0, Scala 2.12, Maven and Sedona 1.2.0 incubating, and followed the installation as outlined by Sedona here. Everything was running smoothly in version 7.3 but is currently throwing when reac...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @William Honeyman Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answ...

  • 0 kudos
2 More Replies
iwan_aucamp
by New Contributor III
  • 2265 Views
  • 2 replies
  • 1 kudos

Account SCIM API OpenAPI specification issues

I'm trying to get a list of all users, groups and service principals on Azure from a Python script. As I understand things, I should be using the Account SCIM API for this. According to the Azure documentation [ref], the OpenAPI specification for this...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Iwan Aucamp Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers y...

  • 1 kudos
1 More Replies
Rahul2025
by New Contributor III
  • 7740 Views
  • 11 replies
  • 1 kudos

Limitation on size of init script

Hi, We're using Databricks Runtime version 11.3 LTS and executing a Spark Java job using a job cluster. To automate the execution of this job, we need to define (source in from bash config files) some environment variables through an init script (clust...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Rahul K Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your ...

  • 1 kudos
10 More Replies
Anonymous
by Not applicable
  • 3235 Views
  • 4 replies
  • 0 kudos

Objective is to make table unique at ID using group by , concat_ws and collect_list ,combining distinct values in one row.

Objective is to make table unique at ID. Table structure is as in attached image. Query used is: `select ID, concat_ws(' & ', collect_list(distinct Gender)) as Gender from table group by ID`. It can be possible if we can order values within collect_list and...
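The result the question is after (one row per ID, with the distinct values combined in a stable order) can be sketched in plain Python. This only illustrates the intended output and is not Spark code; the sample rows and the `combine_distinct` helper are hypothetical, since the real table is only visible in the attached image. In Spark SQL, one commonly used way to get a comparable effect is to wrap `sort_array(collect_set(Gender))` in `concat_ws`, though whether that fits the poster's data is an assumption.

```python
from collections import defaultdict

# Hypothetical sample rows (ID, Gender); the real table is not shown in the post.
rows = [(1, "Male"), (1, "Female"), (1, "Male"), (2, "Female")]


def combine_distinct(rows, sep=" & "):
    """Group by ID, keep distinct values, sort them, and join them with sep."""
    values_by_id = defaultdict(set)
    for row_id, gender in rows:
        values_by_id[row_id].add(gender)  # the set drops duplicates
    # Sorting makes the joined string deterministic regardless of input order.
    return {row_id: sep.join(sorted(vals)) for row_id, vals in values_by_id.items()}


result = combine_distinct(rows)
print(result)
```

Sorting before joining is what makes two IDs with the same set of values produce the same string, which is the property an unordered collect cannot promise.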

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Rishabh Shanker Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answe...

  • 0 kudos
3 More Replies
Jerry01
by New Contributor III
  • 1238 Views
  • 2 replies
  • 0 kudos

Is writing custom function possible in transform(array,func) in databricks sql?

This is the query I am trying to implement:
Create function data_hide(data string) Return if(is_member('groupName'), data, '****')
Table: my_table
Id Subject
1. ['Eng','Bio']
2. ['Phy','Mat']
Select id, transform(Subject, x -> data_hide(x)) as new_data...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Naveena G Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers you...

  • 0 kudos
1 More Replies
goal1860
by New Contributor III
  • 48378 Views
  • 5 replies
  • 2 kudos

Resolved! Failed to signup community version

I've been trying to create a Community Edition account, but keep getting an "An error has occurred. Please try again later" message. I searched the other posts; there are some people running into the same issue as well, but I don't see any solution posted....

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Liang He Hope all is well! Just wanted to check in to see if you were able to resolve your issue, and whether you would be happy to share the solution or mark an answer as best. Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

  • 2 kudos
4 More Replies
tibfab
by New Contributor II
  • 5950 Views
  • 5 replies
  • 0 kudos

How can I build a custom docker image for the ML runtime (e.g. 12.1 ML)?

I successfully built a custom docker image for the Standard runtime following the steps described on the page Customize containers with Databricks Container Services and based on the image databricksruntime/standard:11.3-LTS. However, I cannot find ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Tibor Fabian​ Help us build a vibrant and resourceful community by recognizing and highlighting insightful contributions. Mark the best answers and show your appreciation!

  • 0 kudos
4 More Replies
nolanlavender00
by New Contributor
  • 6877 Views
  • 2 replies
  • 0 kudos

How to control garbage collection while using Autoloader File Notification?

I am using Autoloader to load files from a directory. I have set up File Notification with the Event Subscription. I have a backfill interval set to 1 day and have not run the stream for a week. There should only be about ~100 new files to pick up an...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @nolanlavender008 Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answ...

  • 0 kudos
1 More Replies
joshi
by New Contributor II
  • 3982 Views
  • 5 replies
  • 0 kudos

'Full screen video' button not working in Spark certification videos

Hi All, Many users have already posted about this, but no action has been taken so far. I tried different browsers and systems and am still not able to maximize the Spark training videos. Many months have passed and Databricks people are still not correcting this mistake. @...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Abhishek Joshi​ Help us build a vibrant and resourceful community by recognizing and highlighting insightful contributions. Mark the best answers and show your appreciation!

  • 0 kudos
4 More Replies
chanansh
by Contributor
  • 2477 Views
  • 3 replies
  • 0 kudos

delta table grouping by key which is not partitioned by is very slow

I have a big data delta table with timestamp, key and metric(s) columns (e.g. m1, m2, ...). I often will group by the key (e.g. select max(m1) group by timestamp, key). I cannot partition by `key` because there are too many values (~200K). I have tried ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Hanan Shteingart Hope all is well! Just wanted to check in to see if you were able to resolve your issue, and whether you would be happy to share the solution or mark an answer as best. Otherwise, please let us know if you need more help. We'd love to hear from you. T...

  • 0 kudos
2 More Replies
Soma
by Valued Contributor
  • 3131 Views
  • 5 replies
  • 0 kudos

Cosmos db spark patch api

Hi all, we are trying to use the Cosmos DB patch API on an array field, but the problem I see is that we need to collect the data to get the index. Can you please let us know if there is an alternative, as this causes a bottleneck on the driver?

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @somanath Sankaran Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your fee...

  • 0 kudos
4 More Replies
andrew0117
by Contributor
  • 6760 Views
  • 6 replies
  • 2 kudos

index a dataframe from a csv file based on the file's original order (not based on any specific column, based on the entire row) using spark

How can I guarantee that the index always follows the file's original order, no matter what? Currently, I'm using val df = spark.read.options(Map("header" -> "true", "inferSchema" -> "true")).csv("filePath").withColumn("index", monotonically_increasing...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

monotonically_increasing_id will not do that, as it only guarantees that every partition has separate IDs. What is the whole code? Do you load a directory with a lot of CSVs? What does "original order" mean? Is it CSVs ordered by file creation date, by file name? o...
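To see why the generated values are unique but not a consecutive row index, here is a minimal pure-Python simulation of the layout that the Spark documentation describes for `monotonically_increasing_id`: the partition ID goes in the upper 31 bits and the record number within the partition in the lower 33 bits. It is not Spark itself; `simulated_id` and the two-partition split are hypothetical and only illustrate the arithmetic.

```python
# Simulates the documented bit layout of Spark's monotonically_increasing_id():
# partition ID in the upper 31 bits, record number within the partition in the lower 33 bits.
RECORD_BITS = 33


def simulated_id(partition_id: int, row_in_partition: int) -> int:
    """Build an ID the way the documented layout describes (illustration only)."""
    return (partition_id << RECORD_BITS) | row_in_partition


# Hypothetical case: one file read as two partitions of three rows each.
ids = [simulated_id(p, r) for p in range(2) for r in range(3)]
print(ids)  # [0, 1, 2, 8589934592, 8589934593, 8589934594]
```

The jump between the third and fourth value shows that the IDs depend on how the data was partitioned at read time, so they cannot be treated as line numbers from the original file.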

  • 2 kudos
5 More Replies
Mado
by Valued Contributor II
  • 10744 Views
  • 3 replies
  • 0 kudos

How to update value of a column with MAP data-type in a delta table using a python dictionary and SQL UPDATE command?

I have a delta table created by: %sql CREATE TABLE IF NOT EXISTS dev.bronze.test_map ( id INT, table_updates MAP<STRING, TIMESTAMP>, CONSTRAINT test_map_pk PRIMARY KEY(id) ) USING DELTA LOCATION "abfss://bronze@Table Path" With initi...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Mohammad Saber Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedba...

  • 0 kudos
2 More Replies
kk007
by New Contributor III
  • 4409 Views
  • 4 replies
  • 4 kudos

Photon engine throws error "JSON document exceeded maximum allowed size 400.0 MiB"

I am reading an 83 MB JSON file using `spark.read.json(storage_path)`. When I display the data, it seems to display fine, but when I try a count command, it complains about the file size being more than 400 MB, which is not true. Photon JSON reader erro...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

@Kamal Kumar: The error message suggests that the JSON document size is exceeding the maximum allowed size of 400 MB. This could be caused by one or more documents in your JSON file being larger than this limit. It is not a bug, but a limitation set ...

  • 4 kudos
3 More Replies
zeta_load
by New Contributor II
  • 2164 Views
  • 1 replies
  • 1 kudos

Resolved! Unique ID of table values is not unique anymore after merge every x-times

I have two tables with unique IDs:
ID  val    ID  val
1   10     1   10
2   11     2   10
3   13     ...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Lukas Goldschmied: There are a few reasons why you might be experiencing this issue: Data Skew: Data skew is a common problem in distributed computing when one or more nodes in the cluster have more data to process than others. This can lead to long...

  • 1 kudos
