Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

AdamRink
by New Contributor III
  • 1497 Views
  • 2 replies
  • 6 kudos

How to limit batch size from Confluent Kafka

I have a large stream of data read from Confluent Kafka, 500+ million rows. When I initialize the stream I cannot control the batch sizes that are read. I've tried setting options on the readStream - maxBytesPerTrigger, maxOffsetsPerTrigger, fetc...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 6 kudos

Hi @Adam Rink, just checking for further info on your question. How are you deducing that the batch sizes are larger than the value you set for maxOffsetsPerTrigger?
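
For readers following this thread, here is a minimal sketch of how the rate limit is usually passed to the Kafka source. It assumes the standard Spark Kafka option names (`kafka.bootstrap.servers`, `subscribe`, `startingOffsets`, `maxOffsetsPerTrigger`); the helper name, broker address, and topic are hypothetical and not taken from the original post. It may also be worth checking the trigger mode, since a one-time trigger (`Trigger.Once`) is documented as not honoring rate limits.

```python
from typing import Dict


def kafka_rate_limited_options(
    bootstrap_servers: str,
    topic: str,
    max_offsets_per_trigger: int,
    starting_offsets: str = "earliest",
) -> Dict[str, str]:
    """Build option key/value pairs for a rate-limited Kafka readStream.

    Hypothetical helper for illustration; only the option keys are Spark's own.
    """
    if max_offsets_per_trigger <= 0:
        raise ValueError("max_offsets_per_trigger must be positive")
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": starting_offsets,
        # Upper bound on offsets consumed per micro-batch, split across partitions.
        "maxOffsetsPerTrigger": str(max_offsets_per_trigger),
    }


# Usage in a notebook where `spark` is defined (not executed here):
# reader = spark.readStream.format("kafka")
# for key, value in kafka_rate_limited_options("broker:9092", "events", 100_000).items():
#     reader = reader.option(key, value)
# df = reader.load()
```

One way to verify the limit is to compare `numInputRows` in the streaming query progress for each micro-batch against the configured value.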

1 More Replies
Tahseen0354
by Contributor III
  • 7130 Views
  • 15 replies
  • 39 kudos

How do I compare cost between Databricks on GCP and Azure Databricks?

I have a Databricks job running in Azure Databricks. A similar job is also running in Databricks on GCP. I would like to compare the cost. If I assign a custom tag to the job cluster running in Azure Databricks, I can see the cost incurred by that job i...

Latest Reply
Own
Contributor
  • 39 kudos

In Azure, you can use Cost Management to track the expenses incurred by your Databricks instance.

14 More Replies
ossinova
by Contributor II
  • 929 Views
  • 1 replies
  • 0 kudos

Schedule reload of system.information_schema for external tables in platform

Probably not feasible, but is there a way to update (via STORED PROCEDURE, FUNCTION or SQL query) the information schema of all external tables within Databricks? The last update that I can see was when I converted the tables to Unity. From my understa...

Latest Reply
Own
Contributor
  • 0 kudos

You can try running optimize and cache on the internal tables, such as the schema tables, to fetch updated information.

rammy
by Contributor III
  • 2119 Views
  • 3 replies
  • 11 kudos

How would I retrieve JSON data with namespaces using Spark SQL?

File.json from the below code contains huge JSON data with each key containing a namespace prefix (this JSON file was converted from an XML file). I am able to retrieve records if the JSON does not contain namespaces, but what could be the approach to retrieve record...

Latest Reply
SS2
Valued Contributor
  • 11 kudos

In the case of a struct, you can use dot (.) notation to extract the value.
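
A minimal sketch of how dot notation might be combined with namespace-prefixed keys: in Spark SQL, a field name containing a character such as `:` generally has to be wrapped in backticks, with each level of the struct quoted separately. The helper name and the `ns0:` key names below are hypothetical, since the original JSON is not shown.

```python
from typing import Iterable


def quote_field_path(parts: Iterable[str]) -> str:
    """Join nested field names into a Spark SQL path, backtick-quoting each part."""
    quoted = []
    for part in parts:
        # A literal backtick inside an identifier is escaped by doubling it.
        quoted.append("`" + part.replace("`", "``") + "`")
    return ".".join(quoted)


# Hypothetical key names; the real ones depend on the converted XML.
path = quote_field_path(["ns0:Root", "ns0:Record", "ns0:Id"])
print(path)  # `ns0:Root`.`ns0:Record`.`ns0:Id`

# Usage in a notebook where `spark` is defined (not executed here):
# spark.read.json("File.json").createOrReplaceTempView("records")
# spark.sql(f"SELECT {path} AS id FROM records").show()
```

Quoting each level on its own keeps the dots between levels as struct accessors rather than as part of a field name.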

2 More Replies
allan-silva
by New Contributor III
  • 2561 Views
  • 3 replies
  • 4 kudos

Resolved! Can't create database - UnsupportedFileSystemException No FileSystem for scheme "dbfs"

I'm following a class "DE 3.1 - Databases and Tables on Databricks", but it is not possible to create databases due to "AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: org.apache.hadoop.fs.Unsupp...

Latest Reply
allan-silva
New Contributor III
  • 4 kudos

A colleague from my work figured out the problem: the cluster being used wasn't configured to use DBFS when running notebooks.

2 More Replies
Shiva_Dsouz
by New Contributor II
  • 1359 Views
  • 1 replies
  • 1 kudos

How to get spark streaming metrics like input rows, processed rows and batch duration to Prometheus for monitoring

I have been reading this article https://www.databricks.com/session_na20/native-support-of-prometheus-monitoring-in-apache-spark-3-0 and it has been mentioned that we can get the spark streaming metrics like input rows, processing rate and batch dura...

Latest Reply
SS2
Valued Contributor
  • 1 kudos

I think you can use the Spark UI to see detailed, low-level metrics.

andalo
by New Contributor II
  • 1839 Views
  • 3 replies
  • 2 kudos

Databricks cluster failure

Can you help me with the following error? Message: Cluster terminated. Reason: Azure Vm Extension Failure. Help: Instance bootstrap failed. Failure message: Cloud Provider Failure. Azure VM Extension stuck on transitioning state. Please try again later. VM extensio...

Latest Reply
SS2
Valued Contributor
  • 2 kudos

You can restart the cluster and check again.

2 More Replies
mickniz
by Contributor
  • 2676 Views
  • 6 replies
  • 10 kudos

What is the best way to handle dropping and renaming a column in schema evolution?

I would like some suggestions from the Databricks folks. As per the documentation on schema evolution, for drop and rename the data is overwritten. Does it mean we lose data (because I read that data is not deleted but kind of staged)? Is it possible to query old da...

Latest Reply
SS2
Valued Contributor
  • 10 kudos

The overwrite option will overwrite your data. If you want to change a column name, you can first alter the Delta table as needed and then append new data as well. That way you can resolve both problems.

5 More Replies
Shirley
by New Contributor III
  • 5857 Views
  • 12 replies
  • 8 kudos

Cluster terminated after 120 mins and cannot restart

Last night the cluster was working properly, but this morning the cluster was terminated automatically and cannot be restarted. Got an error message under sparkUI: Could not find data to load UI for driver 5526297689623955253 in cluster 1125-062259-i...

Latest Reply
SS2
Valued Contributor
  • 8 kudos

Then can use.​

11 More Replies
kodvakare
by New Contributor III
  • 3288 Views
  • 5 replies
  • 9 kudos

Resolved! How to write same code in different locations in the DB notebook?

The old version of the notebook had this feature, where you could Ctrl+click on different positions in a notebook cell to bring the cursor there, and type to update the code in both the positions like in JupyterLab. The newer version is awesome but s...

[Image: old Databricks version, updating multiple positions as in the Jupyter IDE]
Latest Reply
SS2
Valued Contributor
  • 9 kudos

Alt+click is working fine.

4 More Replies
SindhujaRaghupa
by New Contributor II
  • 7428 Views
  • 3 replies
  • 1 kudos

Job aborted due to stage failure: Task 0 in stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 (TID 4, localhost, executor driver): java.lang.NullPointerException

I have uploaded a CSV file which has well-formatted data, and I was trying to use display(questions) where questions=spark.read.option("header","true").csv("/FileStore/tables/Questions.csv"). This is throwing an error as follows: SparkException: Job abo...

Latest Reply
SS2
Valued Contributor
  • 1 kudos

You can use the inferSchema option.

2 More Replies
pkgltn
by New Contributor III
  • 734 Views
  • 0 replies
  • 0 kudos

Mounting an Azure Storage Account path on Databricks

Hi, I have a Databricks instance and I mounted the Azure Storage Account. When I run the following command, the output is ExecutionError: An error occurred while calling o1168.ls.: shaded.databricks.org.apache.hadoop.fs.azure.AzureException: java.util...

Muthumk255
by New Contributor
  • 1258 Views
  • 2 replies
  • 0 kudos

Cannot sign in at databricks partner-academy portal

Hi there, I have used my company email to register an account for databricks learning .databricks.com a while back. Now what I need to do is create an account with partner-academy.databricks.com using my company email too. However when I register at part...

Latest Reply
Harshjot
Contributor III
  • 0 kudos

Hi @Muthukrishnan Balasubramanian, I had the same issue a while back. What worked for me was registering with a personal account on Partner Academy and later changing my email to my work email. Not sure if it's the best way to sort out the issue.

1 More Replies
Chandru
by New Contributor II
  • 3720 Views
  • 2 replies
  • 2 kudos

Resolved! Issue in importing librosa library while using databricks runtime engine 11.2

I have installed the library via PyPI on the cluster. When we import the package in a notebook, we get the following error: import librosa OSError: cannot load library 'libsndfile.so': libsndfile.so: cannot open shared object file: No such file or direct...

Latest Reply
Chandru
New Contributor II
  • 2 kudos

Thank you werners. Just figured that out and had an init script to sort out the issue. The steps below helped me solve the issue. dbutils.fs.mkdirs("dbfs:/cluster-init/scripts/") dbutils.fs.put("/cluster-init/scripts/libsndfile-install.sh","""#!/bin/bas...

1 More Replies
db-avengers2rul
by Contributor II
  • 1483 Views
  • 1 replies
  • 0 kudos

Resolved! Unable to import a zip file into the workspace

Dear Team, using the Community Edition, when I try to import a zip file it always throws an error.

Latest Reply
db-avengers2rul
Contributor II
  • 0 kudos

Please refer to the error in the attachment. My question is: does this restriction apply only to the Community Edition, or also to premium accounts?
