Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

sage5616 (Valued Contributor)
  • 4918 Views
  • 12 replies
  • 10 kudos

Error in SQL statement: AnalysisException: Cannot up cast documents from array

Hi everyone, I am getting the following error when running a SQL query, and I do not understand what it means or what can be done to resolve it. Any recommendations? View DDL: CREATE VIEW myschema.table ( accountId, agreementType, capture_file_name, ...

Latest Reply
Anonymous
  • 10 kudos

Hi @Michael Okulik, hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Tha...

11 More Replies
sage5616 (Valued Contributor)
  • 10620 Views
  • 3 replies
  • 2 kudos

Resolved! Choosing the optimal cluster size/specs.

Hello everyone, I am trying to determine the appropriate cluster specifications/sizing for my workload: run a PySpark task to transform a batch of input Avro files to Parquet files and create or re-create persistent views on these Parquet files. This t...

Latest Reply
Anonymous
  • 2 kudos

If the data is 100 MB, I'd try a single-node cluster, which is the smallest and least expensive option. You'll have more than enough memory to hold it all. You can automate this by running it on a jobs cluster.

2 More Replies
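As a rough illustration of the single-node suggestion in the reply above, here is a minimal sketch of a Databricks job cluster specification built as a Python dict, for example to pass as `new_cluster` when creating a job through the Jobs API. The single-node settings (`num_workers` set to 0, the `singleNode` cluster profile, `spark.master` set to local mode, and the `ResourceClass: SingleNode` tag) follow the pattern Databricks documents for single-node clusters. The helper name `single_node_job_cluster` and the default `spark_version` and `node_type_id` values are hypothetical placeholders, so check them against what your workspace actually offers.

```python
import json


def single_node_job_cluster(
    spark_version: str = "13.3.x-scala2.12",  # placeholder runtime key (assumption)
    node_type_id: str = "Standard_DS3_v2",    # placeholder Azure VM type (assumption)
) -> dict:
    """Hypothetical helper: build a single-node 'new_cluster' spec for a job."""
    return {
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        # Zero workers: the driver node runs all Spark tasks itself.
        "num_workers": 0,
        "spark_conf": {
            # Settings that mark the cluster as single-node and run Spark in local mode.
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",
        },
        "custom_tags": {"ResourceClass": "SingleNode"},
    }


if __name__ == "__main__":
    # Print the spec as JSON, e.g. to paste into a job definition.
    print(json.dumps(single_node_job_cluster(), indent=2))
```

With zero workers the driver runs Spark in local mode, which suits a batch of around 100 MB and avoids paying for worker nodes that would mostly sit idle.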
sage5616 (Valued Contributor)
  • 3184 Views
  • 3 replies
  • 4 kudos

Resolved! Spark persistent view on a partition parquet file

In Spark, is it possible to create a persistent view on a partitioned parquet file in Azure BLOB? The view must be available after the cluster is restarted, without having to re-create it, so it cannot be a temp view. I can create a temp view, b...

Latest Reply
sage5616 (Valued Contributor)
  • 4 kudos

Here is what worked for me. Hope this helps someone else: https://stackoverflow.com/questions/72913913/spark-persistent-view-on-a-partition-parquet-file/72914245#72914245
CREATE VIEW test as select * from parquet.`/mnt/folder-with-parquet-file(s)/`
@Hu...

2 More Replies
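To make the statement from the accepted reply safe to rerun (for example from a scheduled job), a small helper could build it with `CREATE OR REPLACE VIEW` instead of plain `CREATE VIEW`, so a second run overwrites the existing definition rather than failing. This is a minimal sketch: `parquet_view_ddl` is a hypothetical name, and the identifier and backtick checks are simple assumptions to avoid malformed SQL, not a complete escaping scheme.

```python
import re

# Simple or schema-qualified identifier, e.g. "test" or "myschema.test" (assumption).
_VIEW_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def parquet_view_ddl(view_name: str, path: str) -> str:
    """Hypothetical helper: build a persistent-view DDL over a parquet directory."""
    if not _VIEW_NAME.fullmatch(view_name):
        raise ValueError(f"unsupported view name: {view_name!r}")
    if "`" in path:
        # The path is wrapped in backticks below, so it must not contain any.
        raise ValueError("path must not contain backticks")
    # parquet.`<path>` reads the files at that location directly; CREATE OR REPLACE
    # registers (or overwrites) a persistent view rather than a session temp view.
    return f"CREATE OR REPLACE VIEW {view_name} AS SELECT * FROM parquet.`{path}`"


if __name__ == "__main__":
    print(parquet_view_ddl("test", "/mnt/folder-with-parquet-file(s)/"))
```

On a cluster you would pass the result to `spark.sql(...)`, e.g. `spark.sql(parquet_view_ddl("test", "/mnt/folder-with-parquet-file(s)/"))`. Because the view is registered in the metastore rather than only in the Spark session, it should still resolve after a cluster restart, which is what the question asked for.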