Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Soma
by Valued Contributor
  • 3637 Views
  • 5 replies
  • 0 kudos

Cosmos db spark patch api

Hi all, we are trying to use the Cosmos DB patch API on an array field, but the problem is that we need to collect the data to get the index. Is there an alternative? Collecting creates a bottleneck on the driver.

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @somanath Sankaran, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your fee...

  • 0 kudos
4 More Replies
andrew0117
by Contributor
  • 7689 Views
  • 6 replies
  • 2 kudos

Index a dataframe from a CSV file based on the file's original order (based on the entire row, not on any specific column) using Spark

How can I guarantee that the index always follows the file's original order, no matter what? Currently, I'm using val df = spark.read.options(Map("header" -> "true", "inferSchema" -> "true")).csv("filePath").withColumn("index", monotonically_increasing...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 2 kudos

monotonically_increasing_id will not do that, as its purpose is to guarantee that every partition gets separate ids. What is the whole code? Do you load a directory with a lot of CSVs? What does "original order" mean? Is it CSVs ordered by file creation date, by file name? o...

  • 2 kudos
5 More Replies
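
As a rough illustration of the reply above: the Spark documentation describes the current implementation of monotonically_increasing_id as putting the partition ID in the upper 31 bits and the record number within each partition in the lower 33 bits. The pure-Python sketch below only models that layout; it does not call Spark, and the function names are hypothetical. It shows why the IDs are unique and increasing but not consecutive. Whether they follow the file's original order depends on how Spark splits the input into partitions, which this sketch does not model.

```python
# Pure-Python model of the documented ID layout; no Spark session is needed.
RECORD_BITS = 33  # the lower 33 bits hold the row number within the partition


def simulated_monotonic_id(partition_id: int, row_in_partition: int) -> int:
    """Mimic the layout: partition ID in the upper bits, row number in the lower 33."""
    return (partition_id << RECORD_BITS) + row_in_partition


def ids_for_partitions(partition_sizes):
    """Return the simulated IDs for partitions of the given sizes, in partition order."""
    return [
        simulated_monotonic_id(partition, row)
        for partition, size in enumerate(partition_sizes)
        for row in range(size)
    ]


# Hypothetical example: two partitions holding 3 and 2 rows.
ids = ids_for_partitions([3, 2])
print(ids)
```

The IDs jump from 2 to 8589934592 (1 << 33) at the partition boundary, so they cannot be used directly as a gap-free row index.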
Mado
by Valued Contributor II
  • 13075 Views
  • 3 replies
  • 0 kudos

How to update value of a column with MAP data-type in a delta table using a python dictionary and SQL UPDATE command?

I have a delta table created by: %sql CREATE TABLE IF NOT EXISTS dev.bronze.test_map (id INT, table_updates MAP<STRING, TIMESTAMP>, CONSTRAINT test_map_pk PRIMARY KEY(id)) USING DELTA LOCATION "abfss://bronze@Table Path" With initi...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Mohammad Saber, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedba...

  • 0 kudos
2 More Replies
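
One possible way to approach the question above is to render the Python dictionary into a Spark SQL map(...) expression and embed it in the UPDATE statement. The sketch below is only a minimal illustration under stated assumptions: the helper name build_map_update and the key 'orders' are hypothetical, the statement replaces the whole table_updates map rather than merging into it, and keys are restricted to a conservative character set so that no quoting rules are needed.

```python
import re
from datetime import datetime

# Assumption: keys are simple identifiers, so no SQL escaping is required.
SAFE_KEY = re.compile(r"^[A-Za-z0-9_.]+$")


def build_map_update(table: str, row_id: int, updates: dict) -> str:
    """Build an UPDATE statement that replaces table_updates with the dict's entries."""
    if not updates:
        raise ValueError("updates must contain at least one entry")
    pairs = []
    for key, ts in updates.items():
        if not SAFE_KEY.match(key):
            raise ValueError(f"unsupported key: {key!r}")
        stamp = ts.strftime("%Y-%m-%d %H:%M:%S")
        # Each map entry becomes: 'key', CAST('timestamp string' AS TIMESTAMP)
        pairs.append(f"'{key}', CAST('{stamp}' AS TIMESTAMP)")
    return (
        f"UPDATE {table} SET table_updates = map({', '.join(pairs)}) "
        f"WHERE id = {int(row_id)}"
    )


# Hypothetical example values for the table from the question.
statement = build_map_update(
    "dev.bronze.test_map",
    1,
    {"orders": datetime(2023, 2, 1, 12, 0, 0)},
)
print(statement)
```

In a notebook, the resulting string could then be passed to spark.sql(statement); that call is not shown here because it needs a running Spark session.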
kk007
by New Contributor III
  • 4929 Views
  • 4 replies
  • 4 kudos

Photon engine throws error "JSON document exceeded maximum allowed size 400.0 MiB"

I am reading an 83MB JSON file using "spark.read.json(storage_path)". When I display the data, it seems to display fine, but when I run a count, it complains about the file size being more than 400MB, which is not true. Photon JSON reader erro...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

@Kamal Kumar: The error message suggests that the JSON document size is exceeding the maximum allowed size of 400MB. This could be caused by one or more documents in your JSON file being larger than this limit. It is not a bug, but a limitation set ...

  • 4 kudos
3 More Replies
zeta_load
by Databricks Partner
  • 2456 Views
  • 1 reply
  • 1 kudos

Resolved! Unique ID of table values is not unique anymore after merge every x-times

I have two tables with unique IDs:
ID  val    ID  val
1   10     1   10
2   11     2   10
3   13     ...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Lukas Goldschmied: There are a few reasons why you might be experiencing this issue: Data Skew: Data skew is a common problem in distributed computing when one or more nodes in the cluster have more data to process than others. This can lead to long...

  • 1 kudos
Alexander1
by New Contributor III
  • 17999 Views
  • 5 replies
  • 1 kudos

Resolved! Databricks JDBC/ODBC write batch size

I have spent way too much time trying to find a solution to the problem of efficiently writing data to Databricks via JDBC/ODBC. I have looked into countless docs, blogs and repos and I cannot find one example where someone is setting some kind of batch/bul...

Latest Reply
Alexander1
New Contributor III
  • 1 kudos

@Vidula Khanna Yes, I have done so. Thanks.

  • 1 kudos
4 More Replies
nupur_dogra
by New Contributor II
  • 7404 Views
  • 4 replies
  • 0 kudos

Unable to get or download the Fundamentals of the Databricks Lakehouse Platform badge

Hi Team, I have completed the Fundamentals of the Databricks Lakehouse Platform course and received the certificate, but I am unable to download or get the badge. When I log in to the website, I am unable to see the badge either. Please help me with this. I haven't received any...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Nupur Dogra, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so w...

  • 0 kudos
3 More Replies
744291
by Databricks Partner
  • 3185 Views
  • 6 replies
  • 0 kudos

I attended a certification preparation event for the Databricks Data Engineer Associate on 17th Jan 2023. I have filled the survey form and it was ...

I attended a certification preparation event for the Databricks Data Engineer Associate on 17th Jan 2023. I filled in the survey form, and it was mentioned that I would receive the voucher in early Feb. I still have not received it. Please update me as...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Rituparna Das, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so...

  • 0 kudos
5 More Replies
Meghala
by Valued Contributor II
  • 10882 Views
  • 11 replies
  • 2 kudos

Exam issues

How do we approach the Databricks team if we are facing a problem? Sometimes a question does not appear properly. If anyone knows the solution, kindly tell me.

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @S Meghala, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we ...

  • 2 kudos
10 More Replies
Krishna264
by New Contributor
  • 2232 Views
  • 2 replies
  • 0 kudos

Delta write stream to different folders dynamically based on input file

I have a root folder, and files are being ingested into subfolders. I want to build a workflow that writes a stream based on the file being ingested.

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Krishnamoorthy Natarajan, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Y...

  • 0 kudos
1 More Replies
tarhone
by New Contributor II
  • 7426 Views
  • 5 replies
  • 0 kudos

The Academy training "Introduction to Delta Live Tables" data source error

Hi, I am following the Databricks Academy training "Introduction to Delta Live Tables", but I cannot set up the environment. When I execute the first notebook, "1_DLT UI Walkthrough", I get the error below. Who can provide the new data source path for this traini...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @fan tian, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we c...

  • 0 kudos
4 More Replies
venkat-bodempud
by New Contributor III
  • 3591 Views
  • 3 replies
  • 0 kudos

Databricks-PowerBI-Architecture-Help

Hello Community, We are currently designing Power BI reports, and the data source is Databricks. We have all our reporting data in the bronze/silver layer of Databricks. We want to create summarized/aggregated tables in the Gold layer and we want to conne...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @bodempudi venkat, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us...

  • 0 kudos
2 More Replies
Ullsokk
by New Contributor III
  • 3422 Views
  • 2 replies
  • 0 kudos

What is a good way to implement unit tests using github actions for databricks?

I am trying to use a git template for unit tests on a databricks project. The framework uses pylint, pytest and black to check the code. But I am having a lot of trouble getting the github actions vm to run the code without issues. I have had issues ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Stian Arntsen, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so...

  • 0 kudos
1 More Replies
najmead
by Contributor
  • 5434 Views
  • 2 replies
  • 0 kudos

Creating an external table reference vs creating a view

In a practical sense, what is the difference between creating an external table; create table my_catalog.my_schema.my_favourite_table location 'abfss://path/to/my/data' versus creating a view that references the same dataset; create view my_catalog.my_sc...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Nicholas Mead, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedbac...

  • 0 kudos
1 More Replies
param3sh
by New Contributor
  • 2974 Views
  • 3 replies
  • 0 kudos

Performance b/w Managed Table and Un-Managed table

I am using Databricks in Azure. I want to mount ADLS Gen2 on Databricks and create unmanaged (external) tables on the mount point. But before that, I want to know which will give the best performance: a Managed table (stores data in DBFS root) or Un-ma...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Paramesh Malla, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedba...

  • 0 kudos
2 More Replies