Data Engineering

Forum Posts

by AdamRink • New Contributor III

11-28-2022 6:03:26 AM

1004 Views
2 replies
6 kudos

How to limit batch size from Confluent Kafka

I have a large stream of data read from Confluent Kafka, 500+ million rows. When I initialize the stream I cannot control the batch sizes that are read. I've tried setting options on the readstream - maxBytesPerTrigger, maxOffsetsPerTrigger, fetc...

Latest Reply

UmaMahesh1
Honored Contributor III

11-29-2022 10:46:52 AM

6 kudos

Hi @Adam Rink, just checking for further info on your question. How are you deducing that the batch sizes are larger than the value you set for maxOffsetsPerTrigger?
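
One way to answer that question is to log the row count of every micro-batch. The following is only a minimal sketch under stated assumptions, not the poster's code: the broker address, topic name, checkpoint path, and all helper names are hypothetical, and `spark` is assumed to be an existing SparkSession. `maxOffsetsPerTrigger` is the Kafka source option named in the question.

```python
def kafka_source_options(bootstrap_servers, topic, max_offsets_per_trigger):
    """Build the option map for Spark's Kafka source (values are passed as strings)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,  # hypothetical broker address
        "subscribe": topic,                            # hypothetical topic name
        "startingOffsets": "earliest",
        "maxOffsetsPerTrigger": str(max_offsets_per_trigger),
    }


def log_batch_size(batch_df, batch_id):
    """foreachBatch callback: print how many rows each micro-batch contains."""
    print(f"batch {batch_id}: {batch_df.count()} rows")


def start_counting_stream(spark, options, checkpoint_path):
    """Start a stream that only reports batch sizes (spark is an existing SparkSession)."""
    stream = spark.readStream.format("kafka").options(**options).load()
    return (
        stream.writeStream
        .foreachBatch(log_batch_size)
        .option("checkpointLocation", checkpoint_path)
        .start()
    )
```

For example, `start_counting_stream(spark, kafka_source_options("broker:9092", "events", 100000), "/tmp/checkpoints/events")` would print one line per micro-batch. The limit is a total across all topic partitions per trigger, so the printed counts are the numbers to compare against it.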

1 More Replies

by Tahseen0354 • Contributor III

10-18-2022 2:03:51 AM

4759 Views
15 replies
39 kudos

How do I compare cost between Databricks on GCP and Azure Databricks?

I have a Databricks job running in Azure Databricks. A similar job is also running in Databricks on GCP. I would like to compare the cost. If I assign a custom tag to the job cluster running in Azure Databricks, I can see the cost incurred by that job i...

Latest Reply

Own
Contributor

11-29-2022 1:41:11 PM

39 kudos

In Azure, you can use Cost Management to track the expenses incurred by your Databricks instance.

14 More Replies

by ossinova • Contributor II

11-29-2022 4:27:04 AM

671 Views
1 reply
0 kudos

Schedule reload of system.information_schema for external tables in platform

Probably not feasible, but is there a way to update (via STORED PROCEDURE, FUNCTION or SQL query) the information schema of all external tables within Databricks? The last update I can see was when I converted the tables to Unity. From my understa...

Latest Reply

Own
Contributor

11-29-2022 1:28:35 PM

0 kudos

You can try running optimize and cache on the internal tables, such as the schema tables, to fetch updated information.

by rammy • Contributor III

11-21-2022 10:17:34 PM

1481 Views
3 replies
11 kudos

How would I retrieve JSON data with namespaces using Spark SQL?

File.json in the code below contains a large amount of JSON data in which each key carries a namespace prefix (the JSON file was converted from an XML file). I am able to retrieve records when the JSON does not contain namespaces, but what would be the approach to retrieve record...

Latest Reply

SS2
Valued Contributor

11-29-2022 12:45:22 PM

11 kudos

In the case of a struct, you can use dot (.) notation to extract the value.
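
When the field names carry a namespace prefix, the dot notation usually needs each name wrapped in backticks, because a colon is not valid in an unquoted Spark SQL identifier. The sketch below only builds the query text; the table name `orders` and the fields `ns:order` and `ns:id` are hypothetical, since the original file is not shown.

```python
def quote_field(name):
    """Wrap one field name in backticks, doubling any backtick inside it."""
    return "`" + name.replace("`", "``") + "`"


def struct_path(*fields):
    """Join quoted field names with dots to address a nested struct value."""
    return ".".join(quote_field(field) for field in fields)


def select_statement(table, *fields):
    """Build a SELECT that extracts one nested value from the given table or view."""
    return f"SELECT {struct_path(*fields)} FROM {table}"


# Hypothetical names for illustration only.
print(select_statement("orders", "ns:order", "ns:id"))
```

The resulting string can be passed to `spark.sql(...)` once the JSON has been loaded and registered as a table or temporary view.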

2 More Replies

by allan-silva • New Contributor III

11-23-2022 5:42:11 AM

1828 Views
3 replies
4 kudos

Resolved! Can't create database - UnsupportedFileSystemException No FileSystem for scheme "dbfs"

I'm following a class "DE 3.1 - Databases and Tables on Databricks", but it is not possible to create databases due to "AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: org.apache.hadoop.fs.Unsupp...

Latest Reply

allan-silva
New Contributor III

11-29-2022 12:15:06 PM

4 kudos

A colleague from my work figured out the problem: the cluster being used wasn't configured to use DBFS when running notebooks.

2 More Replies

by Shiva_Dsouz • New Contributor II

11-24-2022 6:22:13 AM

969 Views
1 reply
1 kudos

How to get Spark streaming metrics like input rows, processed rows and batch duration to Prometheus for monitoring

I have been reading this article https://www.databricks.com/session_na20/native-support-of-prometheus-monitoring-in-apache-spark-3-0 and it mentions that we can get the Spark streaming metrics like input rows, processing rate and batch dura...

Latest Reply

SS2
Valued Contributor

11-29-2022 11:37:37 AM

1 kudos

I think you can use the Spark UI to see detailed, low-level metrics.
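
If the numbers need to reach Prometheus rather than stay in the UI, one option is to convert a query's progress report into Prometheus text format yourself. This is a rough sketch, not the approach from the linked talk: it assumes a progress dictionary shaped like the one PySpark's `StreamingQuery.lastProgress` has returned (`numInputRows`, `processedRowsPerSecond`, `durationMs.triggerExecution`), and the metric names and the `spark_streaming` prefix are made up for illustration.

```python
def progress_to_prometheus(progress, prefix="spark_streaming"):
    """Render selected streaming progress fields as Prometheus text-format lines."""
    name = progress.get("name") or "unnamed"
    label = f'{{query="{name}"}}'
    metrics = {
        "input_rows": progress.get("numInputRows", 0),
        "processed_rows_per_second": progress.get("processedRowsPerSecond", 0.0),
        # triggerExecution is the wall-clock time spent on the whole micro-batch.
        "batch_duration_ms": progress.get("durationMs", {}).get("triggerExecution", 0),
    }
    lines = [f"{prefix}_{key}{label} {value}" for key, value in metrics.items()]
    return "\n".join(lines) + "\n"


# Hypothetical progress report for illustration.
sample = {
    "name": "orders",
    "numInputRows": 500,
    "processedRowsPerSecond": 250.0,
    "durationMs": {"triggerExecution": 2000},
}
print(progress_to_prometheus(sample))
```

The text could then be pushed to a Pushgateway or served from a small HTTP endpoint for Prometheus to scrape; which of those fits depends on the deployment.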

by andalo • New Contributor II

11-24-2022 2:45:53 PM

1375 Views
3 replies
2 kudos

Databricks cluster failure

Can you help me with the following error? Message: Cluster terminated. Reason: Azure Vm Extension Failure. Help: Instance bootstrap failed. Failure message: Cloud Provider Failure. Azure VM Extension stuck on transitioning state. Please try again later. VM extensio...

Latest Reply

SS2
Valued Contributor

11-29-2022 11:36:44 AM

2 kudos

You can restart the cluster and check again.

2 More Replies

by mickniz • Contributor

11-24-2022 11:30:37 PM

1814 Views
6 replies
10 kudos

What is the best way to handle dropping and renaming a column in schema evolution?

I would like some suggestions from the Databricks folks. As per the documentation on schema evolution, data is overwritten for drop and rename. Does it mean we lose data (because I read that data is not deleted but kind of staged)? Is it possible to query old da...

Latest Reply

SS2
Valued Contributor

11-29-2022 11:31:31 AM

10 kudos

The overwrite option will overwrite your data. If you want to change a column name, you can first alter the Delta table as needed and then append the new data. That resolves both problems.
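
As a sketch of the "alter first, then append" route: Delta Lake can rename a column in place once column mapping is enabled on the table, and earlier table versions can be read with time travel as long as their files have not been vacuumed. The helpers below only build the SQL text; the table and column names are hypothetical, and the table properties follow the Delta Lake column mapping documentation as I understand it, so check them against your runtime version.

```python
def rename_column_statements(table, old_name, new_name):
    """SQL to enable column mapping on a Delta table and then rename one column."""
    return [
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        "'delta.minReaderVersion' = '2', "
        "'delta.minWriterVersion' = '5', "
        "'delta.columnMapping.mode' = 'name')",
        f"ALTER TABLE {table} RENAME COLUMN {old_name} TO {new_name}",
    ]


def version_query(table, version):
    """SQL that reads an earlier version of a Delta table (time travel)."""
    return f"SELECT * FROM {table} VERSION AS OF {int(version)}"


# Hypothetical table and column names for illustration.
for statement in rename_column_statements("sales", "amt", "amount"):
    print(statement)
print(version_query("sales", 3))
```

Each string would be run with `spark.sql(...)`. Enabling column mapping upgrades the table protocol, so older readers and writers may no longer be able to access the table.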

5 More Replies

by Shirley • New Contributor III

11-25-2022 9:12:10 AM

4203 Views
12 replies
8 kudos

Cluster terminated after 120 mins and cannot restart

Last night the cluster was working properly, but this morning the cluster was terminated automatically and cannot be restarted. Got an error message under sparkUI: Could not find data to load UI for driver 5526297689623955253 in cluster 1125-062259-i...

Latest Reply

SS2
Valued Contributor

11-29-2022 11:24:15 AM

8 kudos

Then can use.

11 More Replies

by kodvakare • New Contributor III

11-25-2022 7:25:18 AM

2223 Views
5 replies
9 kudos

Resolved! How to write same code in different locations in the DB notebook?

The old version of the notebook had this feature, where you could Ctrl+click on different positions in a notebook cell to bring the cursor there, and type to update the code in both the positions like in JupyterLab. The newer version is awesome but s...

Old DataBricks version, update in multiple positions like Jupyter IDE

Latest Reply

SS2
Valued Contributor

11-29-2022 11:13:11 AM

9 kudos

Alt+click is working fine

4 More Replies

by SindhujaRaghupa • New Contributor II

03-21-2018 9:44:37 AM

6783 Views
3 replies
1 kudos

Job aborted due to stage failure: Task 0 in stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 (TID 4, localhost, executor driver): java.lang.NullPointerException

I have uploaded a CSV file that has well-formatted data, and I was trying to use display(questions) where questions=spark.read.option("header","true").csv("/FileStore/tables/Questions.csv"). This throws the following error: SparkException: Job abo...

Latest Reply

SS2
Valued Contributor

11-29-2022 11:05:45 AM

1 kudos

You can use the inferSchema option.
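
A minimal sketch of what that suggestion could look like, reusing the path from the question. The helper names are hypothetical and `spark` is assumed to be an existing SparkSession. Whether schema inference actually removes the NullPointerException is not established in the thread.

```python
def csv_read_options(infer_schema=True):
    """Options for Spark's CSV reader: treat the first row as a header, optionally infer types."""
    return {
        "header": "true",
        "inferSchema": "true" if infer_schema else "false",
    }


def read_questions(spark, path="/FileStore/tables/Questions.csv", infer_schema=True):
    """Read the CSV with the options above (spark is an existing SparkSession)."""
    return spark.read.options(**csv_read_options(infer_schema)).csv(path)
```

With `inferSchema` enabled, Spark makes an extra pass over the file to guess column types instead of reading every column as a string.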

2 More Replies

by pkgltn • New Contributor III

11-29-2022 9:40:03 AM

569 Views
0 replies
0 kudos

Mounting an Azure Storage Account path on Databricks

Hi, I have a Databricks instance and I mounted the Azure Storage Account. When I run the following command, the output is ExecutionError: An error occurred while calling o1168.ls.: shaded.databricks.org.apache.hadoop.fs.azure.AzureException: java.util...

by Muthumk255 • New Contributor

11-28-2022 10:18:34 PM

799 Views
2 replies
0 kudos

Cannot sign in at databricks partner-academy portal

Hi there, I used my company email to register an account for databricks learning .databricks.com a while back. Now I need to create an account with partner-academy.databricks.com using my company email too. However, when I register at part...

Latest Reply

Harshjot
Contributor III

11-29-2022 9:02:57 AM

0 kudos

Hi @Muthukrishnan Balasubramanian, I had the same issue a while back. What worked for me was registering on the partner academy with a personal account and later changing the email to my work email. I'm not sure it's the best way to sort out the issue.

1 More Replies

by Chandru • New Contributor II

11-29-2022 6:08:09 AM

2948 Views
2 replies
2 kudos

Resolved! Issue in importing librosa library while using databricks runtime engine 11.2

I have installed the library via PyPI on the cluster. When we import the package in a notebook, we get the following error:
import librosa
OSError: cannot load library 'libsndfile.so': libsndfile.so: cannot open shared object file: No such file or direct...

Latest Reply

Chandru
New Contributor II

11-29-2022 8:25:40 AM

2 kudos

Thank you werners. I just figured that out and used an init script to sort out the issue. The steps below helped me solve it.
dbutils.fs.mkdirs("dbfs:/cluster-init/scripts/")
dbutils.fs.put("/cluster-init/scripts/libsndfile-install.sh","""#!/bin/bas...
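
The script body is cut off in the post above, so the following is not a reconstruction of it. It is a hedged sketch of what an init script for this kind of error might contain, assuming an Ubuntu-based cluster image where the missing shared library is provided by the `libsndfile1` apt package. The helper names are hypothetical, and `dbutils` is the object available in Databricks notebooks.

```python
INIT_SCRIPT_PATH = "/cluster-init/scripts/libsndfile-install.sh"  # path taken from the post


def build_init_script(packages=("libsndfile1",)):
    """Return shell script text that installs the given apt packages (assumed package name)."""
    lines = [
        "#!/bin/bash",
        "set -e",  # stop at the first failing command
        "apt-get update",
        "apt-get install -y " + " ".join(packages),
    ]
    return "\n".join(lines) + "\n"


def write_init_script(dbutils, path=INIT_SCRIPT_PATH):
    """Write the script to DBFS, overwriting any existing file at that path."""
    dbutils.fs.put(path, build_init_script(), True)


print(build_init_script())
```

The script only takes effect after it is registered as a cluster init script and the cluster is restarted.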

1 More Replies

by db-avengers2rul • Contributor II

11-29-2022 5:18:27 AM

1050 Views
1 reply
0 kudos

Resolved! zip file not able to import in workspace

Dear Team, when I try to import a zip file in the Community Edition, it always throws an error.

Latest Reply

db-avengers2rul
Contributor II

11-29-2022 5:19:52 AM

0 kudos

Please refer to the error in the attachment. My question: does this restriction apply only to the Community Edition, or also to premium accounts?
