Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

hare
by New Contributor III
  • 4405 Views
  • 1 reply
  • 5 kudos

"Databricks" - "PySpark" - Read "JSON" file - Azure Blob container - "APPEND BLOB"

Hi All, we are receiving JSON files in an Azure Blob container whose "Blob Type" is "Append Blob". We get the error "AnalysisException: Unable to infer schema for JSON. It must be specified manually." when we try to read them using the below-mentioned scr...

Latest Reply
User16856839485
Databricks Employee
  • 5 kudos

There currently does not appear to be direct support for append-blob reads; however, converting the append blob to a block blob (and then to Parquet or Delta, etc.) is a viable option: https://kb.databricks.com/en_US/data-sources/wasb-check-blob-types?_ga...
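
For readers hitting the same error, a minimal sketch of that conversion, assuming the azure-storage-blob v12 SDK; the connection string, container name, and target prefix below are placeholders:

```
from azure.storage.blob import BlobServiceClient

# Placeholders: supply your own connection string, container, and target prefix.
service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
container = service.get_container_client("json-landing")

for props in container.list_blobs():
    if props.blob_type == "AppendBlob":
        data = container.get_blob_client(props.name).download_blob().readall()
        # upload_blob creates a BlockBlob by default, which Spark can then read
        container.get_blob_client(f"block/{props.name}").upload_blob(data, overwrite=True)
```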

leos1
by New Contributor II
  • 1737 Views
  • 2 replies
  • 0 kudos

Resolved! Question regarding ZORDER option of OPTIMIZE

Is the order of the columns in ZORDER important? For example, does ZORDER BY (product, site) and ZORDER BY (site, product) produce the same results?
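
For context (the resolved answer sits behind "1 More Replies"): Z-ordering interleaves the listed columns along a space-filling curve rather than sorting by them lexicographically, so reversing the column list should produce an equivalent clustering. Table and column names below are illustrative:

```
# Both statements interleave the same two dimensions; unlike a sort key,
# the order of columns in ZORDER BY is not significant.
spark.sql("OPTIMIZE sales ZORDER BY (product, site)")
spark.sql("OPTIMIZE sales ZORDER BY (site, product)")
```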

Latest Reply
leos1
New Contributor II
  • 0 kudos

Thanks for the quick reply.

1 More Replies
Trey
by New Contributor III
  • 3182 Views
  • 2 replies
  • 6 kudos

Resolved! Is it a good idea to use a managed Delta table as a temporary table?

Hi all! I would like to use a managed Delta table as a temporary table, meaning: create a managed table in the middle of the ETL process, then drop the managed table right after the process. This way I can perform merge, insert, or delete operations better than...

Latest Reply
karthik_p
Esteemed Contributor
  • 6 kudos

@Kwangwon Yi​ More than performance, the main issue with a managed table is that whenever you drop the table, the data under that table gets deleted as well. If you have a good reporting use case, the best approach is to use an external storage location to store your managed t...
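
A minimal sketch of that suggestion, using an external location so that DROP TABLE removes only metadata; the catalog, table, column, and path names are placeholders:

```
# External table: DROP TABLE removes metadata only; files at LOCATION survive.
spark.sql("""
    CREATE TABLE reporting.sales_staging (id BIGINT, amount DOUBLE)
    USING DELTA
    LOCATION 'abfss://data@myaccount.dfs.core.windows.net/staging/sales'
""")
# ... MERGE / INSERT / DELETE against reporting.sales_staging during the ETL run ...
spark.sql("DROP TABLE reporting.sales_staging")
```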

1 More Replies
Matt101122
by Contributor
  • 2371 Views
  • 1 reply
  • 1 kudos

Resolved! Why aren't RDDs using all available cores of the executor?

I'm extracting data from a custom format by day of month using a 32-core executor. I'm using RDDs to distribute work across the cores of the executor. I'm seeing an intermittent issue where sometimes a run uses 31 cores as expected, and ot...

Latest Reply
Matt101122
Contributor
  • 1 kudos

I may have figured this out! I'm explicitly setting the number of slices instead of using the default:

days_rdd = sc.parallelize(days_to_process, len(days_to_process))
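
For context, `sc.parallelize` otherwise splits the collection into `spark.default.parallelism` slices, which can pack two days into one partition and leave a core idle. A quick way to compare (variable names follow the reply, the day list is illustrative):

```
days_to_process = list(range(1, 32))  # e.g. 31 days of the month

default_rdd = sc.parallelize(days_to_process)
explicit_rdd = sc.parallelize(days_to_process, len(days_to_process))

print(default_rdd.getNumPartitions())   # driven by spark.default.parallelism
print(explicit_rdd.getNumPartitions())  # 31: one partition, and thus one task, per day
```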

enavuio
by New Contributor II
  • 1730 Views
  • 2 replies
  • 3 kudos

Count on external table over Azure Data Lake Storage is taking too long

I have created an external table over Azure Data Lake Storage Gen2. The container has about 200K JSON files. The table over the JSON files is created with ```CREATE EXTERNAL TABLE IF NOT EXISTS dbo.table( ComponentInfo STRUCT<ComponentHost: STRING, ...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Ena Vu​, hope all is well! Just wanted to check in: were you able to resolve your issue, and if so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!
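
In case it helps other readers: a COUNT over an external table of many small JSON files re-lists and re-parses every file on each query. One common workaround is a one-time conversion to Delta; this is a sketch with placeholder paths and names, and it assumes a pre-defined `json_schema`:

```
# One-time conversion: subsequent counts use Delta file statistics instead of
# re-parsing ~200K JSON files. Path, table name, and json_schema are placeholders.
raw = (spark.read
            .schema(json_schema)  # an explicit schema avoids a full inference pass
            .json("abfss://container@account.dfs.core.windows.net/logs/"))
raw.write.format("delta").mode("overwrite").saveAsTable("dbo.table_delta")

spark.sql("SELECT COUNT(*) FROM dbo.table_delta").show()
```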

1 More Replies
parthsalvi
by Contributor
  • 2381 Views
  • 3 replies
  • 1 kudos

Unable to update permissions on Unity Catalog objects in single-user mode (DBR 11.2)

We're trying to update permissions on catalogs in single-user cluster mode but are running into the following error. We were able to update permissions in shared mode. We used shared mode to create the objects, but using single-user mode to update permissions seems...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Parth Salvi​, hope all is well! Just wanted to check in: were you able to resolve your issue, and if so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks...

2 More Replies
AJMorgan591
by New Contributor II
  • 3707 Views
  • 4 replies
  • 0 kudos

Temporarily disable Photon

Is it possible to temporarily disable Photon? I have a large workload that greatly benefits from Photon, apart from one specific operation that Photon actually slows down. It's not worth creating a separate cluster for this operation, however, s...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Aaron Morgan​, hope all is well! Just wanted to check in: were you able to resolve your issue, and if so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thank...

3 More Replies
refint650
by New Contributor II
  • 8428 Views
  • 4 replies
  • 0 kudos

Resolved! String conversion to timestamp format

Hello, I'm converting HANA SQL code to Databricks. We have 4 columns, all in string format: start date, start time, end date, end time. 1) What expression can I use to convert the values of start date & start time from string format to datetime format wit...

Latest Reply
refint650
New Contributor II
  • 0 kudos

Hello Matt, the concat & to_timestamp functions partially worked, but values with the 24 timestamp format were not converted. Any other approach I can think of?
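
If the unconverted values are HANA's end-of-day '240000' times, `to_timestamp` returns NULL for hour 24, so one workaround is to map them to midnight of the next day. A sketch with illustrative column names (`end_date` = 'yyyyMMdd', `end_time` = 'HHmmss'):

```
from pyspark.sql import functions as F

# to_timestamp yields NULL for hour 24, so rewrite '240000' as 00:00:00 + 1 day.
is_hour_24 = F.col("end_time") == "240000"
base = F.to_timestamp(
    F.concat_ws(" ", F.col("end_date"),
                F.when(is_hour_24, F.lit("000000")).otherwise(F.col("end_time"))),
    "yyyyMMdd HHmmss")
df = df.withColumn("end_ts",
                   F.when(is_hour_24, base + F.expr("INTERVAL 1 DAY")).otherwise(base))
```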

3 More Replies
db-avengers2rul
by Contributor II
  • 3070 Views
  • 3 replies
  • 1 kudos

Resolved! Unable to create SQL warehouse using Azure Databricks subscription

Dear Team, I am unable to create a SQL warehouse using an Azure Databricks subscription; below are the details. I am able to create a single-node cluster, but I am not able to create a SQL warehouse. I am using cluster size 2X-Small, as part of the...

[Image: response from Microsoft support]
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

In portal.azure.com, search for "quotas" and request an increase: https://portal.azure.com/#view/Microsoft_Azure_Capacity/QuotaMenuBlade/~/myQuotas Additionally, in the SQL warehouse "Advanced options" you can change "Spot instance policy" from "cost-optimized"...

2 More Replies
elgeo
by Valued Contributor II
  • 4420 Views
  • 1 reply
  • 3 kudos

Resolved! Generate new token error

Hello. I need to install the Databricks CLI. While trying to generate a new access token (User Settings -> Generate new token), I get the following error: "Could not create token with comment 'cli' and lifetime (seconds) of 86400." I tried with different com...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

Please check in the Admin console that tokens are enabled and that you can manage them.

elgeo
by Valued Contributor II
  • 5054 Views
  • 3 replies
  • 5 kudos

Resolved! Delta Table - Reduce time travel storage size

Hello! I am trying to understand the time travel feature. With the "DESCRIBE HISTORY" command I see that all the transaction history on a specific table is recorded by version and timestamp. However, I understand that this occupies a lot of storage, especiall...

Latest Reply
elgeo
Valued Contributor II
  • 5 kudos

Thank you @Werner Stinckens​ for your reply. However, I still haven't managed to delete the history even after setting the below; the number of history rows remains the same when running "DESCRIBE HISTORY". SET spark.databricks.delta.retentionDurationCheck...
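
For anyone following this thread: the history shown by DESCRIBE HISTORY only shrinks after the table's log retention window passes and a new log checkpoint is written, so the change is not immediate. A hedged sketch of the usual sequence (table name and retention values are illustrative; disabling the retention check sacrifices time travel to the vacuumed versions):

```
# Retention values are illustrative; shorter retention means less time travel.
spark.sql("SET spark.databricks.delta.retentionDurationCheck.enabled = false")
spark.sql("""
    ALTER TABLE my_table SET TBLPROPERTIES (
        'delta.logRetentionDuration' = 'interval 1 days',
        'delta.deletedFileRetentionDuration' = 'interval 1 days')
""")
spark.sql("VACUUM my_table RETAIN 24 HOURS")  # removes no-longer-referenced data files
```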

2 More Replies
jerry747847
by New Contributor III
  • 3254 Views
  • 5 replies
  • 1 kudos

Resolved! Databricks Associate Practice Exam - query

Dear Experts, can anyone please explain why option "C" is the answer to Question 31 of the PracticeExam-DataEngineerAssociate? https://files.training.databricks.com/assessments/practice-exams/PracticeExam-DataEngineerAssociate.pdf?_ga=2.185796329.11...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Question 17 is even worse: "A data engineer is overwriting data" vs. "should simply be overwritten instead". One situation, I assume, is DROP and CREATE, and the other is INSERT OVERWRITE, but here both are called the same. A data engineer is overwriting ...

4 More Replies
archanarddy
by New Contributor
  • 1199 Views
  • 0 replies
  • 0 kudos

Metastore is down

I am trying to run a Scala notebook, but my job just spins and says "Metastore is down". Can someone help me? Thanks in advance.

Ken1
by New Contributor III
  • 2205 Views
  • 2 replies
  • 7 kudos

PySpark Error in Azure Databricks

I get this error - com.databricks.WorkflowException: com.databricks.NotebookExecutionException: FAILED - when I run dbutils.notebook.run(.......)

Latest Reply
Debayan
Databricks Employee
  • 7 kudos

Hi @Godswill Mbata​, this issue seems to be related to a High Concurrency cluster. Could you please confirm whether you are using a High Concurrency cluster? Please refer to: https://community.databricks.com/s/question/0D53f00001cx3ybCAA/strange-error-with-d...

1 More Replies
plynton
by New Contributor II
  • 1819 Views
  • 1 reply
  • 2 kudos

Resolved! Dataframe to update subset of fields in table...

I have a table that I'll update with multiple inputs (CSV). Is there a simple way to update my target when the source fields won't be a 1:1 match? Another challenge I've run into is that my sources don't have a header row, though I guess I could ...

  • 1819 Views
  • 1 replies
  • 2 kudos
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

Read your CSV as a DataFrame and then update using MERGE (upsert).
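
A minimal sketch of that pattern, with explicit column names since the CSVs have no header row; all table, path, and column names below are placeholders:

```
from delta.tables import DeltaTable

# Headerless CSVs: assign column names explicitly instead of relying on a header.
src = (spark.read.csv("/mnt/input/*.csv", header=False)
             .toDF("id", "field_a", "field_b"))

target = DeltaTable.forName(spark, "my_target")
(target.alias("t")
       .merge(src.alias("s"), "t.id = s.id")
       .whenMatchedUpdate(set={"field_a": "s.field_a",   # update only the fields
                               "field_b": "s.field_b"})  # the source provides
       .whenNotMatchedInsert(values={"id": "s.id",
                                     "field_a": "s.field_a",
                                     "field_b": "s.field_b"})
       .execute())
```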

