Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ossinova
by Contributor II
  • 1591 Views
  • 1 replies
  • 0 kudos

Schedule reload of system.information_schema for external tables in platform

Probably not feasible, but is there a way to update (via STORED PROCEDURE, FUNCTION, or SQL query) the information schema of all external tables within Databricks? The last update I can see was when I converted the tables to Unity. From my understa...

Latest Reply
Own
Contributor
  • 0 kudos

You can try running OPTIMIZE and refreshing the cache on the internal tables, such as the schema tables, to fetch updated information.

rammy
by Contributor III
  • 3603 Views
  • 3 replies
  • 11 kudos

How would I retrieve JSON data with namespaces using Spark SQL?

File.json from the code below contains huge JSON data, with each key carrying a namespace prefix (this JSON file was converted from an XML file). I am able to retrieve records if the JSON does not contain namespaces, but what could be the approach to retrieve record...

Latest Reply
SS2
Valued Contributor
  • 11 kudos

In case of a struct, you can use dot notation (.) for extracting the value.

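Building on the dot-notation tip above: namespace prefixes produced by an XML-to-JSON conversion are just part of the key names, so nested values stay reachable as long as the prefixed names are quoted. A minimal sketch (the `ns:` prefix, field names, and table name are hypothetical, not from the original post):

```python
import json

# Hypothetical sample mirroring an XML-to-JSON conversion, where every key
# keeps its namespace prefix ("ns:").
doc = json.loads("""
{
  "ns:book": {
    "ns:title": "Spark Guide",
    "ns:author": {"ns:name": "Jane"}
  }
}
""")

# The prefix is just part of the key string, so plain dict access works:
title = doc["ns:book"]["ns:title"]
print(title)  # Spark Guide

# In Spark SQL the same fields arrive as struct columns; the colon makes the
# name non-standard, so each path segment must be quoted with backticks:
expr = "SELECT `ns:book`.`ns:title` AS title FROM books"
```

The same backtick quoting applies in DataFrame code, e.g. `df.select("`ns:book`.`ns:title`")`.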
2 More Replies
allan-silva
by New Contributor III
  • 4351 Views
  • 3 replies
  • 4 kudos

Resolved! Can't create database - UnsupportedFileSystemException No FileSystem for scheme "dbfs"

I'm following a class "DE 3.1 - Databases and Tables on Databricks", but it is not possible to create databases due to "AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: org.apache.hadoop.fs.Unsupp...

Latest Reply
allan-silva
New Contributor III
  • 4 kudos

A colleague from my work figured out the problem: the cluster being used wasn't configured to use DBFS when running notebooks.

2 More Replies
Shiva_Dsouz
by New Contributor II
  • 2091 Views
  • 1 replies
  • 1 kudos

How to get spark streaming metrics like input rows, processed rows and batch duration to Prometheus for monitoring

I have been reading this article https://www.databricks.com/session_na20/native-support-of-prometheus-monitoring-in-apache-spark-3-0 and it has been mentioned that we can get the spark streaming metrics like input rows, processing rate and batch dura...

Latest Reply
SS2
Valued Contributor
  • 1 kudos

I think you can use the Spark UI to see deeper-level details.

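Beyond the Spark UI, the talk linked in the question describes Spark 3.0's native Prometheus support, which is switched on through Spark configuration rather than code. A sketch of the relevant spark-defaults.conf entries (verify the exact keys against your Spark version; how you set them on a Databricks cluster, e.g. via the cluster's Spark config UI, is an assumption):

```properties
# Expose executor metrics at <driver-ui>/metrics/executors/prometheus (Spark 3.0+, experimental)
spark.ui.prometheus.enabled true

# Publish Structured Streaming metrics (input rate, processing rate, latency) to the metrics system
spark.sql.streaming.metricsEnabled true

# Serve driver/executor metrics in Prometheus format via the built-in servlet sink
spark.metrics.conf.*.sink.prometheusServlet.class org.apache.spark.metrics.sink.PrometheusServlet
spark.metrics.conf.*.sink.prometheusServlet.path /metrics/prometheus
```

A Prometheus scrape job pointed at the driver UI port on those paths can then collect the streaming metrics for dashboards and alerting.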
andalo
by New Contributor II
  • 3009 Views
  • 3 replies
  • 2 kudos

Databricks cluster failure

Could you help me with the following error? Message: Cluster terminated. Reason: Azure VM Extension Failure. Help: Instance bootstrap failed. Failure message: Cloud Provider Failure. Azure VM Extension stuck on transitioning state. Please try again later. VM extensio...

Latest Reply
SS2
Valued Contributor
  • 2 kudos

You can restart the cluster and check once more.

2 More Replies
mickniz
by Contributor
  • 4438 Views
  • 6 replies
  • 10 kudos

What is the best way to handle dropping and renaming a column in schema evolution?

I would need some suggestions from Databricks folks. As per the documentation on schema evolution, for drop and rename the data is overwritten. Does that mean we lose data (because I read the data is not deleted but kind of staged)? Is it possible to query old da...

Latest Reply
SS2
Valued Contributor
  • 10 kudos

The overwrite option will overwrite your data. If you want to change a column name, you can first ALTER the Delta table as needed and then append the new data. That way you can resolve both problems.

5 More Replies
Shirley
by New Contributor III
  • 9650 Views
  • 12 replies
  • 8 kudos

Cluster terminated after 120 mins and cannot restart

Last night the cluster was working properly, but this morning the cluster was terminated automatically and cannot be restarted. Got an error message under sparkUI: Could not find data to load UI for driver 5526297689623955253 in cluster 1125-062259-i...

Latest Reply
SS2
Valued Contributor
  • 8 kudos

Then you can use it.

11 More Replies
kodvakare
by New Contributor III
  • 6222 Views
  • 5 replies
  • 9 kudos

Resolved! How to write same code in different locations in the DB notebook?

The old version of the notebook had this feature, where you could Ctrl+click on different positions in a notebook cell to bring the cursor there, and type to update the code in both the positions like in JupyterLab. The newer version is awesome but s...

[Image: the old Databricks version, updating code in multiple positions like the Jupyter IDE]
Latest Reply
SS2
Valued Contributor
  • 9 kudos

Alt+click is working fine.

4 More Replies
SindhujaRaghupa
by New Contributor II
  • 9573 Views
  • 2 replies
  • 1 kudos

Job aborted due to stage failure: Task 0 in stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 (TID 4, localhost, executor driver): java.lang.NullPointerException

I have uploaded a CSV file which has well-formatted data, and I was trying to use display(questions), where questions = spark.read.option("header", "true").csv("/FileStore/tables/Questions.csv"). This is throwing an error as follows: SparkException: Job abo...

Latest Reply
SS2
Valued Contributor
  • 1 kudos

You can use inferSchema.

1 More Replies
pkgltn
by New Contributor III
  • 1123 Views
  • 0 replies
  • 0 kudos

Mounting a Azure Storage Account path on Databricks

Hi, I have a Databricks instance and I mounted the Azure Storage Account. When I run the following command, the output is: ExecutionError: An error occurred while calling o1168.ls.: shaded.databricks.org.apache.hadoop.fs.azure.AzureException: java.util...

Muthumk255
by New Contributor
  • 2252 Views
  • 2 replies
  • 0 kudos

Cannot sign in at databricks partner-academy portal

Hi there. I used my company email to register an account on learning.databricks.com a while back. Now what I need to do is create an account with partner-academy.databricks.com using my company email too. However, when I register at part...

Latest Reply
Harshjot
Contributor III
  • 0 kudos

Hi @Muthukrishnan Balasubramanian, I got the same issue a while back. What worked for me was registering on the partner academy with a personal account, then later changing my email to my work email. Not sure if it's the best way to sort the issue.

1 More Replies
db-avengers2rul
by Contributor II
  • 2542 Views
  • 1 replies
  • 0 kudos

Resolved! zip file not able to import in workspace

Dear Team, using Community Edition, when I try to import a zip file it always throws an error.

Latest Reply
db-avengers2rul
Contributor II
  • 0 kudos

Please refer to the error in the attachment. My question: is this restriction only for Community Edition, or does it also apply to a premium account?

yang
by New Contributor II
  • 1639 Views
  • 1 replies
  • 2 kudos

Resolved! Error in DE 4.1 - DLT UI Walkthrough (from Data Engineering with Databricks v3 course)

I am working on the Data Engineering with Databricks v3 course. In notebook DE 4.1 - DLT UI Walkthrough, I encountered an error in cmd 11: DA.validate_pipeline_config(pipeline_language). The error message is: AssertionError: Expected the parameter "suite" to...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

The DA validate function just checks that you named the pipeline correctly, set up the correct number of workers (0), and other configurations. The name and directory aren't crucial to the learning process. The goal is to get familiar with the ...

eimis_pacheco
by Contributor
  • 1385 Views
  • 1 replies
  • 1 kudos

How to remove more than 4 byte characters using pyspark in databricks?

Hi community, we need to remove 4-byte characters using PySpark in Databricks, since these are not supported by Amazon Redshift. Does someone know how I can accomplish this? Thank you very much in advance. Regards

Latest Reply
Shalabh007
Honored Contributor
  • 1 kudos

Assuming you have a string-type column in a PySpark DataFrame, one possible way could be: identify the total number of characters for each value in the column (say a); identify the number of bytes taken by each character (say b); use the substring() function to select the first...

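To make the byte-counting idea concrete: UTF-8 uses 4 bytes exactly for code points above U+FFFF, so stripping those code points removes everything Redshift rejects. A plain-Python sketch of that filter (in PySpark the same character class can be applied column-wise, e.g. with regexp_replace and the Java regex escape `[\x{10000}-\x{10FFFF}]`; the function name here is made up for illustration):

```python
import re

# Code points above U+FFFF are exactly the characters UTF-8 encodes
# in 4 bytes, which Amazon Redshift VARCHAR columns do not accept.
FOUR_BYTE = re.compile(r"[\U00010000-\U0010FFFF]")

def strip_4byte(text: str) -> str:
    """Remove every character whose UTF-8 encoding is 4 bytes long."""
    return FOUR_BYTE.sub("", text)

print(strip_4byte("data \U0001F680 engineering"))  # "data  engineering" (emoji removed)
```

This keeps all accented and other BMP characters intact, dropping only supplementary-plane characters such as emoji.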
Ullsokk
by New Contributor III
  • 3587 Views
  • 1 replies
  • 5 kudos

How do I import a notebook from workspaces to repos?

I have a few notebooks in workspaces that I created before linking the repo to my git. I have tried importing them from the repo (Databricks Repos). The only two options are a local file from my PC or a URL. The URL for a notebook does not work. Do I need...

Latest Reply
Geeta1
Valued Contributor
  • 5 kudos

Hi @Stian Arntsen, when you click on the down arrow beside your notebook name (in your workspace), you will have an option called 'Clone'. You can use it to clone your notebook from your workspace to Repos. Hope it helps!

