Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

sujai_sparks
by New Contributor III
  • 13425 Views
  • 14 replies
  • 15 kudos

Resolved! How to convert records in Azure Databricks delta table to a nested JSON structure?

Let's say I have a delta table in Azure Databricks that stores the staff details (denormalized). I want to export the data in JSON format and save it as a single file on a storage location. I need help with the Databricks SQL query to group/co...
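A minimal sketch of one way to do this (table, column, and path names are hypothetical): group the denormalized staff rows, nest the per-staff details with struct and collect_list, and write the result out as JSON in a single part file.

    from pyspark.sql import functions as F

    staff = spark.table("staff_details")                  # hypothetical source table
    nested = (staff
              .groupBy("staff_id", "staff_name")
              .agg(F.collect_list(F.struct("role", "department", "location"))
                    .alias("assignments")))
    nested.coalesce(1).write.mode("overwrite").json("/mnt/export/staff_json")

Note that Spark writes a directory containing one part file rather than a single named .json file; producing one named file would take a follow-up dbutils.fs.mv step.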

Latest Reply
NateAnth
Valued Contributor
  • 15 kudos

Glad it worked for you!!

13 More Replies
EmilioGC
by New Contributor III
  • 4412 Views
  • 5 replies
  • 7 kudos

Resolved! Why was SQL formatting removed inside spark.sql functions? Now it looks like a plain string.

Previously we were able to see SQL queries inside spark.sql() like this: But now it just looks like a plain string: I know it's not a big issue, but it's still annoying to have to code in SQL while having it all be blue; it makes debugging more cumber...

[screenshots: old format vs. new format]
Latest Reply
jose_gonzalez
Moderator
  • 7 kudos

Hi @Emilio Garza, just a friendly follow-up: did any of the responses help you resolve your question? If so, please mark that answer as best. Otherwise, please let us know if you still need help.

4 More Replies
Arpi
by New Contributor II
  • 2898 Views
  • 3 replies
  • 4 kudos

Resolved! Database creation error

I am trying to create a database with an external abfss location but am facing the below error. AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs....
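Failures like this are often about storage authentication rather than the CREATE DATABASE statement itself. A minimal sketch, assuming service-principal (OAuth) access to ADLS Gen2; the storage account, container, secret scope, and IDs are all hypothetical:

    storage = "myaccount.dfs.core.windows.net"   # hypothetical storage account
    spark.conf.set(f"fs.azure.account.auth.type.{storage}", "OAuth")
    spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage}",
                   "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage}", "<application-id>")
    spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage}",
                   dbutils.secrets.get("kv-scope", "sp-secret"))
    spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage}",
                   "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

    spark.sql("CREATE DATABASE IF NOT EXISTS mydb "
              "LOCATION 'abfss://container@myaccount.dfs.core.windows.net/mydb'")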

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @Arpit Agrawal, hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise please let us know if you need more help. We'd love to hear from you. Thanks!

2 More Replies
Twilight
by New Contributor III
  • 2537 Views
  • 2 replies
  • 0 kudos

How to make backreferences in regexp_replace repl string work correctly in Databricks SQL?

Both of these work in Spark SQL:

regexp_replace('1234567890abc', '^(?<one>\\w)(?<two>\\w)(?<three>\\w)', '$1')
regexp_replace('1234567890abc', '^(?<one>\\w)(?<two>\\w)(?<three>\\w)', '${one}')

However, neither works in Databricks SQL. I found that this ...
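For reference, a minimal sketch of the behavior the poster describes as working in Spark SQL, run from a notebook cell. Spark's regexp_replace follows java.util.regex replacement syntax, where $1 is a numbered backreference and ${one} a named one; the doubled backslashes assume default SQL string escaping.

    # Both replacement styles refer to the first captured group.
    spark.sql(r"""
        SELECT regexp_replace('1234567890abc',
                              '^(?<one>\\w)(?<two>\\w)(?<three>\\w)', '$1')     AS numbered,
               regexp_replace('1234567890abc',
                              '^(?<one>\\w)(?<two>\\w)(?<three>\\w)', '${one}') AS named
    """).show()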

Latest Reply
User16764241763
Honored Contributor
  • 0 kudos

Hello @Stephen Wilcoxon, could you please share the expected output in Spark SQL?

1 More Replies
najmead
by Contributor
  • 15215 Views
  • 6 replies
  • 13 kudos

How to convert string to datetime with correct timezone?

I have a field stored as a string in the format "12/30/2022 10:30:00 AM". If I use the function TO_DATE, I only get the date part; I want the full date and time. If I use the function TO_TIMESTAMP, I get the date and time, but it's assumed to be UTC, ...

Latest Reply
Rajeev_Basu
Contributor III
  • 13 kudos

Use from_utc_timestamp(to_timestamp("<string>", <format>), <timezone>).
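A minimal sketch of that suggestion applied to the format from the question (the target timezone is a hypothetical choice):

    df = spark.sql("""
        SELECT from_utc_timestamp(
                 to_timestamp('12/30/2022 10:30:00 AM', 'MM/dd/yyyy hh:mm:ss a'),
                 'America/New_York') AS local_ts
    """)
    df.show(truncate=False)

to_timestamp parses the string with the given pattern, and from_utc_timestamp then reinterprets that UTC-assumed value in the named timezone.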

5 More Replies
lambarc
by New Contributor II
  • 11771 Views
  • 7 replies
  • 13 kudos

How to read file in pyspark with “]|[” delimiter

The data looks like this:

pageId]|[page]|[Position]|[sysId]|[carId
0005]|[bmw]|[south]|[AD6]|[OP4

There are at least 50 columns and millions of rows. I did try to use the below code to read: dff = sqlContext.read.format("com.databricks.spark.csv").option...
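One robust workaround is to read the file as plain text and split on the literal "]|[" delimiter yourself. A minimal sketch (the path is hypothetical) that treats the first row as the header:

    from pyspark.sql import functions as F

    raw = spark.read.text("/mnt/data/cars.txt")                 # hypothetical path
    parts = raw.select(F.split("value", r"\]\|\[").alias("cols"))

    header = parts.first()["cols"]                              # column names from row 1
    df = (parts
          .filter(F.col("cols")[0] != header[0])                # drop the header row
          .select(*[F.col("cols")[i].alias(name) for i, name in enumerate(header)]))

The split pattern is a regex, so ], |, and [ are escaped; this also assumes no data row repeats the header's first value.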

Latest Reply
rohit199912
New Contributor II
  • 13 kudos

You might also try the below options. 1) Use a different file format: you can try a file format that supports multi-character delimiters, such as text or JSON. 2) Use a custom Row class: you can write a custom Row class to parse the multi-...

6 More Replies
quakenbush
by Contributor
  • 3400 Views
  • 4 replies
  • 5 kudos

Resolved! Does Databricks offer something like Oracle's dblink?

I am aware I can load anything into a DataFrame using JDBC; that works well from Oracle sources. Is there an equivalent in Spark SQL, so I can combine datasets as well? Basically something like so - you get the idea... select lt.field1, rt.fie...
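A minimal sketch of the closest Spark SQL equivalent (all connection details and names are hypothetical): register a JDBC-backed table, then join it against local tables like any other relation. Reads against it are fetched over JDBC, so it behaves more like a remote view than a replicated link.

    # Hypothetical Oracle connection; the JDBC driver JAR must be on the cluster.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS oracle_remote
        USING org.apache.spark.sql.jdbc
        OPTIONS (
          url      'jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1',
          dbtable  'SCOTT.REMOTE_TABLE',
          user     'scott',
          password 'tiger'
        )
    """)

    joined = spark.sql("""
        SELECT lt.field1, rt.field2
        FROM local_table lt
        JOIN oracle_remote rt ON rt.id = lt.id
    """)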

Latest Reply
Kaniz_Fatma
Community Manager
  • 5 kudos

Hi @Roger Bieri, I appreciate your attempt to choose the best answer for us. I'm glad you got your query resolved. @Joseph Kambourakis and @Adrian Łobacz, thank you for giving excellent answers.

3 More Replies
Manojkumar
by New Contributor II
  • 2978 Views
  • 4 replies
  • 0 kudos

Can we assign a default value to selected columns in Spark SQL when the column is not present?

I'm reading an Avro file and loading it into a table. The Avro data is nested. From this table I'm trying to extract the necessary elements using Spark SQL, using the explode function where there is array data. The challenge is that there are cases like the ...
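One common workaround, sketched minimally here given your DataFrame df (column names and types are hypothetical), is to add any expected-but-missing columns as typed nulls before querying:

    from pyspark.sql import functions as F

    expected = {"optional_field": "string", "optional_items": "array<string>"}
    for name, dtype in expected.items():
        if name not in df.columns:                       # column absent in this batch
            df = df.withColumn(name, F.lit(None).cast(dtype))
    df.createOrReplaceTempView("events")                 # now safe to SELECT/explode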

Latest Reply
UmaMahesh1
Honored Contributor III
  • 0 kudos

Hi @manoj kumar, the easiest way would be to make use of unmanaged delta tables and, while loading data into the path of that table, enable mergeSchema. This handles all the schema differences; in case a column is not present as null an...

3 More Replies
aicd_de
by New Contributor III
  • 2027 Views
  • 1 reply
  • 0 kudos

Resolved! Error Using spark.catalog.dropTempView()

I have a set of Spark Dataframes that I convert into Temp Views to run Spark SQL with. Then, I delete them after my logic/use is complete. The delete step throws an odd error that I am not sure how to fix. Looking for some tips on fixing it. As a not...

Latest Reply
aicd_de
New Contributor III
  • 0 kudos

            spark.sql("DROP TABLE "+prefix_updates)            spark.sql("DROP TABLE "+prefix_main)Fixed it for me.

pvm26042000
by New Contributor III
  • 779 Views
  • 1 reply
  • 3 kudos

Spark SQL & Spark ML

I am using Spark SQL to import data into a machine learning pipeline. Once the data is imported I want to perform machine learning tasks on it using Spark ML. Which compute is best suited for this use case? Please help me! Thank you ...

Latest Reply
Debayan
Esteemed Contributor III
  • 3 kudos

Hi, please refer to https://docs.databricks.com/machine-learning/index.html, and let us know if this helps.
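For a concrete picture, a minimal sketch of the Spark SQL to Spark ML handoff (table and column names are hypothetical); on Databricks this would typically run on a cluster with an ML runtime:

    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import VectorAssembler

    train = spark.sql("SELECT feature_a, feature_b, label FROM training_data")
    pipeline = Pipeline(stages=[
        VectorAssembler(inputCols=["feature_a", "feature_b"], outputCol="features"),
        LogisticRegression(featuresCol="features", labelCol="label"),
    ])
    model = pipeline.fit(train)          # the same cluster runs both SQL and ML stages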

pvm26042000
by New Contributor III
  • 751 Views
  • 1 reply
  • 2 kudos

I am using Spark SQL to import data into a machine learning pipeline. Once the data is imported I want to perform machine learning tasks using Spark...

I am using Spark SQL to import data into a machine learning pipeline. Once the data is imported I want to perform machine learning tasks using Spark ML. Which compute is best suited for this use case? Please help me! Thank y...

Latest Reply
Debayan
Esteemed Contributor III
  • 2 kudos

Hi, please refer to https://docs.databricks.com/machine-learning/index.html, and let us know if this helps.

CBull
by New Contributor III
  • 1513 Views
  • 3 replies
  • 2 kudos

Spark Notebook to import data into Excel

Is there a way to create a notebook that runs the SQL I give it, populates an Excel file daily, and sends it to a particular person?
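A minimal sketch of one way to do this, assuming openpyxl is installed on the cluster and an SMTP relay is reachable (the query, paths, and addresses are all hypothetical); the notebook can then be scheduled daily as a Databricks job:

    import smtplib
    from email.message import EmailMessage

    pdf = spark.sql("SELECT * FROM daily_report").toPandas()
    path = "/tmp/daily_report.xlsx"
    pdf.to_excel(path, index=False)                     # requires openpyxl

    msg = EmailMessage()
    msg["Subject"] = "Daily report"
    msg["From"] = "noreply@example.com"
    msg["To"] = "person@example.com"
    with open(path, "rb") as f:
        msg.add_attachment(f.read(), maintype="application",
                           subtype="vnd.openxmlformats-officedocument.spreadsheetml.sheet",
                           filename="daily_report.xlsx")
    with smtplib.SMTP("smtp.example.com") as server:    # hypothetical relay
        server.send_message(msg)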

Latest Reply
Meghala
Valued Contributor II
  • 2 kudos

@Aviral Bhardwaj, thanks for this; I needed this info.

2 More Replies
ramankr48
by Contributor II
  • 2421 Views
  • 2 replies
  • 3 kudos

Issue with identity key column in Databricks?

For the identity key I've used both GENERATED ALWAYS AS IDENTITY (start with 1 increment by 1) and GENERATED BY DEFAULT AS IDENTITY (start with 1 increment by 1), but in both cases, if I'm running my script once then it is fine (the identity key is working as...
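For reference, a minimal sketch of the stricter variant, which the reply below recommends (the table name is hypothetical); ALWAYS rejects user-supplied values, so inserts can't bypass the generator:

    spark.sql("""
        CREATE TABLE staff_ids (
          id   BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
          name STRING
        ) USING DELTA
    """)

Identity values are unique but not necessarily consecutive; gaps are normal, especially with concurrent or retried writes.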

Latest Reply
lizou
Contributor II
  • 3 kudos

Yes, the BY DEFAULT option allows duplicated values by design. I would avoid this option and use only GENERATED ALWAYS AS IDENTITY. Using the BY DEFAULT option is worse than not using it at all: with BY DEFAULT, if I forget to set a starting value, the ID...

1 More Replies
enavuio
by New Contributor II
  • 1344 Views
  • 2 replies
  • 3 kudos

Count on External Table to Azure Data Storage is taking too long

I have created an external table to Azure Data Lake Storage Gen2. The container has about 200K JSON files. The structure of the JSON files is declared with ```CREATE EXTERNAL TABLE IF NOT EXISTS dbo.table( ComponentInfo STRUCT<ComponentHost: STRING, ...
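Counting over hundreds of thousands of small JSON files forces a full listing and scan on every query. A minimal sketch of one common fix (the new table name is hypothetical): materialize the data as Delta once, after which simple aggregates like COUNT(*) can be served from statistics in the transaction log instead of rescanning the files:

    spark.sql("""
        CREATE TABLE IF NOT EXISTS dbo_table_delta
        USING DELTA
        AS SELECT * FROM dbo.table
    """)
    spark.sql("SELECT COUNT(*) FROM dbo_table_delta").show()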

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Ena Vu, hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise please let us know if you need more help. We'd love to hear from you. Thanks!

1 More Replies
Sadiq
by New Contributor III
  • 2647 Views
  • 6 replies
  • 4 kudos

Fixed-length file from Databricks notebook (Spark SQL)

Hi, I need help writing data from an Azure Databricks notebook into a fixed-length .txt file. The dataset has 10 lakh (1 million) rows and 86 columns. Can anyone suggest an approach?
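A minimal sketch of one approach given your DataFrame df (column names, widths, and the output path are hypothetical): pad every column to its fixed width, concatenate into one string column, and write as text. With 86 columns the widths dict would simply be longer:

    from pyspark.sql import functions as F

    widths = {"emp_id": 10, "emp_name": 40, "dept": 20}      # width per column
    fixed = df.select(
        F.concat(*[F.rpad(F.coalesce(F.col(c).cast("string"), F.lit("")), w, " ")
                   for c, w in widths.items()]).alias("value"))
    fixed.coalesce(1).write.mode("overwrite").text("/mnt/out/staff_fixed")

Note that rpad also truncates values longer than their declared width, which is usually the desired behavior for fixed-length layouts.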

Latest Reply
Vidula
Honored Contributor
  • 4 kudos

Hi @sadiq vali, hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise please let us know if you need more help. We'd love to hear from you. Thanks!

5 More Replies