Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hi, I have an init script that works from a DBFS location during cluster startup, but when the same shell script file is placed on an ABFSS location (ADLS Gen 2 storage) I get the following init script failure error and the cluster is unable to start. E...
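A common cause is that the cluster cannot authenticate to ADLS Gen 2 before it fetches the init script. Below is a minimal sketch of the relevant cluster-spec fields, assuming service-principal (OAuth) authentication; the storage account, container, secret scope, and script path are all placeholders:

```python
# Hypothetical cluster-spec fragment (all names are placeholders) pairing an
# ABFSS init script with the Spark configs the cluster needs to reach ADLS
# Gen 2 at boot time via a service principal.
cluster_spec = {
    "init_scripts": [
        {"abfss": {"destination": "abfss://scripts@mystorageacct.dfs.core.windows.net/init/setup.sh"}}
    ],
    "spark_conf": {
        "fs.azure.account.auth.type.mystorageacct.dfs.core.windows.net": "OAuth",
        "fs.azure.account.oauth.provider.type.mystorageacct.dfs.core.windows.net":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id.mystorageacct.dfs.core.windows.net": "<application-id>",
        "fs.azure.account.oauth2.client.secret.mystorageacct.dfs.core.windows.net":
            "{{secrets/my-scope/sp-secret}}",  # secret scope/key are placeholders
        "fs.azure.account.oauth2.client.endpoint.mystorageacct.dfs.core.windows.net":
            "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
    },
}
```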
Hi @Saravana KJ, I'm sorry you could not find a solution to your problem in the answers provided. Our community strives to provide helpful and accurate information, but sometimes an immediate solution may only be available for some issues. I suggest pr...
Hi @imma marra, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers yo...
Hi, I am trying to use the approxQuantile() function to populate a list that I made, yet somehow, whenever I run the code it's as if the list is empty and there are no values in it. Code is written as below: @dlt.table(name = "customer_order_silv...
Maybe try to use (and first test in a separate notebook) the standard df = spark.read.table("customer_order_silver") to calculate approxQuantile. Of course, you need to ensure that customer_order_silver has a target location in the catalog, so read us...
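For example, as a quick sanity check outside the DLT pipeline (assuming the pipeline publishes customer_order_silver to a catalog target; the column name order_amount is a placeholder):

```python
# Minimal sketch: compute quantiles on the published table rather than inside
# the @dlt.table function, where the list would be built at graph-definition
# time instead of from materialized data.
df = spark.read.table("customer_order_silver")

# approxQuantile returns a plain Python list of floats; relativeError=0.01
# trades a little accuracy for speed on large tables.
quantiles = df.approxQuantile("order_amount", [0.25, 0.5, 0.75], 0.01)
print(quantiles)  # e.g. [q1, median, q3]
```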
Hi all, I have a table in MongoDB Atlas that I am trying to read continuously into memory, and I will eventually write that data out. However, when I look at the in-memory table it doesn't have the correct schema. Code here: from pyspark.sql.types impo...
Hi @sharonbjehome, this has to be checked thoroughly via a support ticket. Did you follow https://docs.databricks.com/external-data/mongodb.html? Also, could you please check with MongoDB support? Was this working before?
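In the meantime, a minimal sketch of pinning the schema explicitly, assuming the MongoDB Spark connector v10+ (format "mongodb"); the connection URI, database, collection, and fields below are placeholders for your own:

```python
# Supplying an explicit schema stops the connector from inferring a wrong one
# from a sample of documents.
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

schema = StructType([
    StructField("_id", StringType(), True),    # placeholder fields --
    StructField("name", StringType(), True),   # replace with your
    StructField("price", DoubleType(), True),  # collection's actual layout
])

df = (
    spark.read.format("mongodb")
    .option("connection.uri", "<your-atlas-connection-uri>")  # placeholder
    .option("database", "mydb")        # placeholder
    .option("collection", "mycoll")    # placeholder
    .schema(schema)                    # enforce the schema explicitly
    .load()
)
df.printSchema()
```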
Hi, I am getting this error increasingly while loading VDS to Dremio. Do you know how I can avoid it? Out[144]: {'statusCode': 400, 'headers': {'Content-Type': 'application/json'}, 'body': 'Failed - SYSTEM ERROR: UnsupportedOperationException: Additio...
Hi, I tried to pull the report with the query below:
%python
df = quickbasePull('b5zj8k_pbz5_0_cd5h4wbbp77n4nvp56b4u','bqmnP8jm7',24)
but it is giving me the error "report too large". Then I tried the below:
%python
import pyqb
from pyspark.sql import *
import pandas as pd
qbc ...
There are a lot of datasets available in /databricks-datasets/ that you can look through. You'll have to turn them into a table so that you can access them in AutoML. There are datasets associated with the Spark Definitive Guide and Learning Spark ...
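For example, a minimal sketch of registering one sample dataset as a table that AutoML can select (the path and table name are illustrative; browse /databricks-datasets/ to find others):

```python
# Load a sample dataset and save it as a managed table so it shows up in the
# AutoML table picker. Path, separator, and table name are illustrative.
df = spark.read.csv(
    "/databricks-datasets/wine-quality/winequality-red.csv",
    header=True, inferSchema=True, sep=";",
)
df.write.mode("overwrite").saveAsTable("default.wine_quality")
```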
Hi, I'm interested to know: if multiple executors append to the same Hive table using saveAsTable or insertInto in Spark SQL, will that cause any data corruption? What configuration do I need to enable concurrent writes to the same Hive table? What about the s...
The Hive table will not like this, as the underlying data is in Parquet format, which is not ACID compliant. Delta Lake, however, is: https://docs.delta.io/0.5.0/concurrency-control.html You can see that inserts do not give conflicts.
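A minimal sketch of switching the append target to Delta, where concurrent inserts are mediated by optimistic concurrency control (the table and view names are placeholders):

```python
# Write as Delta instead of a Parquet-backed Hive table; concurrent appends
# from multiple jobs are then resolved by Delta's optimistic concurrency
# control rather than corrupting the data.
df.write.format("delta").mode("append").saveAsTable("default.events_delta")

# Equivalent SQL form once the Delta table exists ("staging_view" is a
# placeholder source):
spark.sql("INSERT INTO default.events_delta SELECT * FROM staging_view")
```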
Hi, I have a daily scheduled job which processes the data and writes it as Parquet files in a specific folder structure like root_folder/{CountryCode}/parquetfiles, where each day the job writes new data for a country code under that country code's folder. I am...
Most external consumers will read the partition as a column when they are properly configured (for example, Azure Data Factory or Power BI). The only way around it is to duplicate the column under another name (you cannot reuse the same name, as it will generate conf...
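A minimal sketch of that workaround: duplicate CountryCode under a placeholder name (country_code_part here) that is used only for partitioning, so the original column stays inside the data files:

```python
from pyspark.sql import functions as F

# Partition by a copy of the column so CountryCode itself survives in the
# Parquet files. Note the output folders become Hive-style
# country_code_part=<value> directories under root_folder.
(df.withColumn("country_code_part", F.col("CountryCode"))
   .write.mode("append")
   .partitionBy("country_code_part")
   .parquet("root_folder"))
```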