Data Engineering

Forum Posts

Sorted by:

by chaitanya • New Contributor II

08-13-2021 8:27:54 AM

2254 Views
3 replies
4 kudos

Resolved! While loading Data from blob to delta lake facing below issue

I'm calling the stored proc then store into pandas dataframe then creating list while creating list getting below error Databricks execution failed with error state Terminated. For more details please check the run page url: path An error occurred w...

Data Engineering

2254 Views
3 replies
4 kudos

08-13-2021 8:27:54 AM

View Replies

Latest Reply

shan_chandra
Honored Contributor III

01-06-2022 2:37:31 PM

4 kudos

@chaitanya , could you please try disabling arrow optimization and see if this resolves the issue?spark.sql.execution.arrow.enabled falsespark.sql.execution.arrow.pyspark.enabled false

4 kudos

01-06-2022 2:37:31 PM

2 More Replies

by sanjoydas6 • New Contributor III

11-16-2021 4:52:41 AM

3986 Views
15 replies
3 kudos

Resolved! Problem faced while trying to Reset my Community Edition Password

I have forgotten my Databricks Community Edition Password and is trying to Reset the same using the Forgot Password link. It is saying that an Email will be sent with the link to reset the password but the Email is not coming. However Databricks mail...

Data Engineering

3986 Views
15 replies
3 kudos

11-16-2021 4:52:41 AM

View Replies

Latest Reply

Kaniz
Community Manager

12-22-2021 9:51:12 AM

3 kudos

Hi @Sanjoy Das , We could not find an account associated to your email. Did you pass the correct email or did you delete your account? Can you please create a CE account or pass the correct email address over mail with which you used to browse the C...

3 kudos

12-22-2021 9:51:12 AM

14 More Replies

by maranBH • New Contributor III

11-24-2021 6:21:55 AM

1072 Views
4 replies
1 kudos

Resolved! Trained model artifact, CI/CD and Databricks without MLFlow.

Hi all,We are constructing our CI/CD pipelines with the Repos feature following this guide:https://databricks.com/blog/2021/09/20/part-1-implementing-ci-cd-on-databricks-using-databricks-notebooks-and-azure-devops.htmlI'm trying to implement my pipes...

Data Engineering

1072 Views
4 replies
1 kudos

11-24-2021 6:21:55 AM

View Replies

Latest Reply

sean_owen
Honored Contributor II

01-05-2022 7:14:39 PM

1 kudos

So you are managing your models with MLflow, and want to include them in a git repository?You can do that in a CI/CD process; it would run the mlflow CLI to copy the model you want (e.g. model:/my_model/production) to a git checkout and then commit i...

1 kudos

01-05-2022 7:14:39 PM

3 More Replies

by bchaubey • Contributor II

01-05-2022 6:22:00 PM

329 Views
0 replies
0 kudos

@Kunal Gaurav any whatsapp group for Databricks discussion?

Data Engineering

329 Views
0 replies
0 kudos

01-05-2022 6:22:00 PM

by dimsh • Contributor

01-04-2022 7:30:57 AM

9944 Views
3 replies
1 kudos

Resolved! Delta Table is not available in the Databricks SQL

Hi, there!I'm trying to read a data (simple SELECT * FROM schema.tabl_a) from the "Queries" Tab inside the Databricks SQL platform, but always getting "org.apache.spark.sql.AnalysisException: dbfs:/.../.. doesn't exist" DescribeRelation true, [col_na...

Data Engineering

9944 Views
3 replies
1 kudos

01-04-2022 7:30:57 AM

View Replies

Latest Reply

Anonymous
Not applicable

01-04-2022 12:42:14 PM

1 kudos

Because it's a delta table, you don't need to provide the schema.

1 kudos

01-04-2022 12:42:14 PM

2 More Replies

by RicksDB • Contributor II

12-29-2021 5:43:46 PM

2208 Views
6 replies
6 kudos

Resolved! SingleNode all-purpose cluster for small ETLs

Hi,I have many "small" jobs than needs to be executed quickly and at a predictable low cost from several Azure Data Factory pipelines. For this reason, I configured a small single node cluster to execute those processes. For the moment, everything se...

Data Engineering

2208 Views
6 replies
6 kudos

12-29-2021 5:43:46 PM

View Replies

Latest Reply

RicksDB
Contributor II

01-04-2022 5:52:06 AM

6 kudos

@Bilal Aslam In my case, it usually depends on the customers and their SLA. Most of them usually do not have a "true" high SLA requirement thus prefer the jobs to be throttled when the actual cost is within a certain range of the budget instead of ...

6 kudos

01-04-2022 5:52:06 AM

5 More Replies

by dcrezee • New Contributor III

12-20-2021 6:57:10 AM

3886 Views
11 replies
3 kudos

Resolved! Issue with quotes in struct type columns when using ODBC

I'm trying to connect to Databricks using pyodbc and I'm running into an issue with struct columns. As far as I understand, struct columns and array columns are not supported by pyodbc, but they are converted to JSON. However, when there are nested c...

Data Engineering

3886 Views
11 replies
3 kudos

12-20-2021 6:57:10 AM

View Replies

Latest Reply

BilalAslamDbrx
Honored Contributor II

12-27-2021 4:21:43 PM

3 kudos

@Derk Crezee - I learned something today. Apparently ODBC does not convert to JSON. There is no defined spec on how to return complex types, in fact that was added only in SQL 2016. That's exactly what you are running into!End of history lesson Her...

3 kudos

12-27-2021 4:21:43 PM

10 More Replies

by RicksDB • Contributor II

12-30-2021 6:28:44 AM

2131 Views
9 replies
1 kudos

Configure jobs throttling for ephemeral cluster ETLs

Hi,Is it possible to configure job throttling in order to queue jobs across a workspace after a given number of concurrent execution when using the ephemeral cluster pattern? The reason is mainly for cost control. We prefer reducing performance rathe...

Data Engineering

2131 Views
9 replies
1 kudos

12-30-2021 6:28:44 AM

View Replies

Latest Reply

RicksDB
Contributor II

12-30-2021 5:03:27 PM

1 kudos

Thanks for the help josephk. I will continue to use an interactive cluster for the time being until the release of that new feature. Hopefully, it will allow my use case. Is there visibility on the roadmap for an ETA or more information on it?

1 kudos

12-30-2021 5:03:27 PM

8 More Replies

by barashe • New Contributor II

12-30-2021 3:59:12 AM

731 Views
1 replies
0 kudos

Installing python modules on databricks job clusters

Different than all-purpose clusters, the databricks job new cluster configuration window does not have a "Libraries" tab, in which specific python modules could be installed. What's the best practice for installing python modules on such clusters?

Data Engineering

731 Views
1 replies
0 kudos

12-30-2021 3:59:12 AM

View Replies

Latest Reply

barashe
New Contributor II

01-02-2022 1:35:45 AM

0 kudos

It turns out that the option exists outside of the cluster configuration scope, in the task configuration window itself - under "Advanced options" -> "Add dependent libraries".

0 kudos

01-02-2022 1:35:45 AM

by pthaenraj • New Contributor III

09-11-2021 8:19:04 AM

2874 Views
10 replies
8 kudos

Resolved! Databricks Certified Professional Data Scientist Exam Question Types

Hello,I am not seeing a lot of information regarding the Databricks Certified Professional Data Scientistexam. I took the Associate Developer in Apache Spark Exam last year and the materials for the exam seemed much more focused than what I found for...

Data Engineering

2874 Views
10 replies
8 kudos

09-11-2021 8:19:04 AM

View Replies

Latest Reply

Abdull
New Contributor III

01-01-2022 12:15:05 AM

8 kudos

Hello @Sundar R , Yes I took the exam. Unfortunately I fail to reach the pass mark even though I got close. Things I could have did different:I focused so much in mastering each topics i.e. linear, logistic & regularized regression, ALS and etc. But...

8 kudos

01-01-2022 12:15:05 AM

9 More Replies

by guruv • New Contributor III

12-27-2021 10:22:53 PM

12636 Views
4 replies
5 kudos

Resolved! parquet file to include partitioned column in file

HI,I have a daily scheduled job which processes the data and write as parquet file in a specific folder structure like root_folder/{CountryCode}/parquetfiles. Where each day job will write new data for countrycode under the folder for countrycodeI am...

Data Engineering

12636 Views
4 replies
5 kudos

12-27-2021 10:22:53 PM

View Replies

Latest Reply

Hubert-Dudek
Esteemed Contributor III

12-28-2021 2:45:32 AM

5 kudos

Most external consumers will read partition as column when are properly configured (for example Azure Data Factory or Power BI).Only way around is that you will duplicate column with other name (you can not have the same name as it will generate conf...

5 kudos

12-28-2021 2:45:32 AM

3 More Replies

by RantoB • Valued Contributor

11-02-2021 8:17:44 AM

4135 Views
19 replies
7 kudos

Resolved! unzip twice the same file not executing

Hi, I need to unzip some files that are ingested but when I unzip twice the same zipped file, the unzip command does not execute :As suggesgted in the documentation I did :import urllib urllib.request.urlretrieve("https://resources.lendingclub.com/L...

Data Engineering

4135 Views
19 replies
7 kudos

11-02-2021 8:17:44 AM

View Replies

Latest Reply

Kaniz
Community Manager

12-29-2021 1:30:26 AM

7 kudos

Hi @Bertrand BURCKER , Create a script.sh and copy the script in the directory where is data.zip archive. This script is working with any name of archives and any name of csv.#!/bin/bash currLoc="$PWD" path="${currLoc}" cd ${currLoc} #EXTRACT ...

7 kudos

12-29-2021 1:30:26 AM

18 More Replies

by marsjuli • New Contributor II

10-24-2021 10:39:07 AM

8572 Views
5 replies
3 kudos

Resolved! How to handle <IPython.core.display.HTML object>

Some libraries have intermediate IPython HTML-objects returned to the notebook cell output.Since this happens during training a machine learning model the statements are typically buried within in the library so I cannot easily interfere. (e.g. in or...

Data Engineering

8572 Views
5 replies
3 kudos

10-24-2021 10:39:07 AM

View Replies

Latest Reply

marsjuli
New Contributor II

11-17-2021 12:30:57 AM

3 kudos

Hi @Kaniz Fatma ,thanks for showing me the link. This helps if you are in control of the generated html-object. If the html-content comes from a library, that is where the problems start, because I cannot wrap displayHTML().(I can of course look for...

3 kudos

11-17-2021 12:30:57 AM

4 More Replies

by Development • New Contributor III

12-28-2021 11:37:40 PM

359 Views
0 replies
0 kudos

Hi All, I hope you're doing well I am facing issue while installing an python library on ADB Cluster. lib - PyCaret ( latest version) its not gett...

Hi All,I hope you're doing wellI am facing issue while installing an python library on ADB Cluster.lib - PyCaret ( latest version)its not getting install and showing me 'Failed' Status.It would be great if you can help here !!Thanks

Data Engineering

359 Views
0 replies
0 kudos

12-28-2021 11:37:40 PM

by TimK • New Contributor II

12-20-2021 12:07:04 PM

2424 Views
3 replies
1 kudos

Resolved! Cannot Get Databricks SQL to read external Hive Metastore

I have followed the documentation and using the same metastore config that is working in the Data Engineering context. When attempting to view the Databases, I get the error:Encountered an internal errorThe following information failed to load:The li...

Data Engineering

2424 Views
3 replies
1 kudos

12-20-2021 12:07:04 PM

View Replies

Latest Reply

TimK
New Contributor II

12-27-2021 7:30:07 AM

1 kudos

@Bilal Aslam I didn't think to look there before since I hadn't tried to run any queries. I see the failed SHOW DATABASES queries in history and they identify the error: Builtin jars can only be used when hive execution version == hive metastore v...

1 kudos

12-27-2021 7:30:07 AM

2 More Replies

User

Count

1601

736

343

284

246

Databricks

Forum Posts

Resolved! While loading Data from blob to delta lake facing below issue

Resolved! Problem faced while trying to Reset my Community Edition Password

Resolved! Trained model artifact, CI/CD and Databricks without MLFlow.

@Kunal Gaurav any whatsapp group for Databricks discussion?

Resolved! Delta Table is not available in the Databricks SQL

Resolved! SingleNode all-purpose cluster for small ETLs

Resolved! Issue with quotes in struct type columns when using ODBC

Configure jobs throttling for ephemeral cluster ETLs

Installing python modules on databricks job clusters

Resolved! Databricks Certified Professional Data Scientist Exam Question Types

Resolved! parquet file to include partitioned column in file

Resolved! unzip twice the same file not executing

Resolved! How to handle <IPython.core.display.HTML object>

Hi All, I hope you're doing well I am facing issue while installing an python library on ADB Cluster. lib - PyCaret ( latest version) its not gett...

Resolved! Cannot Get Databricks SQL to read external Hive Metastore

DELTA_EXCEED_CHAR_VARCHAR_LIMIT

Not able to set run_as service_principal_name

Pyspark operations slowness in CLuster 14.3LTS as ...

[Databricks Assets Bundles] Workflow trigger on fi...

Addressing Pipeline Error Handling in Databricks b...