Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Daniel20
by New Contributor
  • 1459 Views
  • 0 replies
  • 0 kudos

Flattening a Nested Recursive JSON Structure into a Struct List

This is from the Spark event log, on the event SparkListenerSQLExecutionStart. How to flatten the sparkPlanInfo struct into an array of the same struct, then later explode it? Note that the element children is an array containing the parent struct, and the lev...

804082
by New Contributor III
  • 2784 Views
  • 4 replies
  • 1 kudos

Resolved! "Your workspace is hosted on infrastructure that cannot support serverless compute."

Hello, I wanted to try out Lakehouse Monitoring, but I receive the following message during setup: "Your workspace is hosted on infrastructure that cannot support serverless compute." I meet all requirements outlined in the documentation. My workspace ...

Latest Reply
SSundaram
Contributor
  • 1 kudos

Lakehouse Monitoring: This feature is in Public Preview in the following regions: eu-central-1, eu-west-1, us-east-1, us-east-2, us-west-2, ap-southeast-2. Not all workspaces in the regions listed are supported. If you see the error “Your workspace is ...

3 More Replies
Wayne
by New Contributor III
  • 31009 Views
  • 0 replies
  • 0 kudos

How to flatten a nested recursive JSON struct to a list of struct

This is from the Spark event log, on the event SparkListenerSQLExecutionStart. How to flatten the sparkPlanInfo struct into an array of the same struct, then later explode it? Note that the element children is an array containing the parent struct, and the lev...

Arnold_Souza
by New Contributor III
  • 7086 Views
  • 1 reply
  • 0 kudos

Delta Live Tables consuming different files from the same path is combining the schemas

Summary: I am using Delta Live Tables to create a pipeline in Databricks, and I am facing a problem where the schemas of different files placed in the same folder in a data lake are being merged, even though I am using File Patterns to separate the data inge...

Data Engineering
cloud_files
Databricks SQL
Delta Live Tables
read_files
Latest Reply
Arnold_Souza
New Contributor III
  • 0 kudos

Found a solution: never use 'fileNamePattern', '*file_1*'. Instead, put the pattern directly into the path: "abfss://<container>@<storage_account>.dfs.core.windows.net/path/to/folder/*file_1*"
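A minimal sketch of that approach, assuming an Auto Loader (cloudFiles) source inside a DLT pipeline; the table name, file format, container, and storage account below are placeholders, not from the thread:

```python
import dlt

# Sketch only: embed the glob pattern in the load path itself instead of
# passing a separate file-name pattern option. All names are placeholders.
@dlt.table(name="file_1_bronze")
def file_1_bronze():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .load("abfss://<container>@<storage_account>.dfs.core.windows.net/path/to/folder/*file_1*")
    )
```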

bzh
by New Contributor
  • 3943 Views
  • 3 replies
  • 0 kudos

Question: Delta Live Tables, multiple streaming sources to a single target

We are trying to write multiple sources to the same target table using DLT, but we are getting the errors below. Not sure what we are missing in the code. ...File /databricks/spark/python/dlt/api.py:817, in apply_changes(target, source, keys, sequence...

Latest Reply
nag_kanchan
New Contributor III
  • 0 kudos

The solution did not work for me. It was throwing an error stating: raise Py4JError( py4j.protocol.Py4JError: An error occurred while calling o434.readStream. Trace: py4j.Py4JException: Method readStream([class java. util.ArrayList]) does not exist.A...

2 More Replies
Faisal
by Contributor
  • 2624 Views
  • 1 reply
  • 0 kudos

DLT - how to log number of rows read and written

Hi @Retired_mod - how do I log the number of rows read and written in a DLT pipeline? I want to store it in audit tables after the pipeline update completes. Can you give me sample query code?
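No working query is given in this thread; one commonly used approach (an editorial assumption here, not from the replies) is to read the pipeline's event log and pull row metrics from flow_progress events. The event log path and the details JSON layout below are assumptions to verify against your workspace:

```python
from pyspark.sql import functions as F

# Hedged sketch: read the DLT event log (path is a placeholder) and extract
# the number of output rows reported by flow_progress events.
events = spark.read.format("delta").load("dbfs:/pipelines/<pipeline_id>/system/events")

row_metrics = (
    events
    .where(F.col("event_type") == "flow_progress")
    .select(
        "timestamp",
        F.col("origin.flow_name").alias("flow_name"),
        F.get_json_object("details", "$.flow_progress.metrics.num_output_rows")
            .cast("long")
            .alias("num_output_rows"),
    )
)

# The result could then be appended to an audit table after each update, e.g.
# row_metrics.write.mode("append").saveAsTable("audit.dlt_row_counts")
```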

Latest Reply
Faisal
Contributor
  • 0 kudos

Thanks @Retired_mod, but I asked how to log the number of rows read/written via a Delta Live Tables (DLT) pipeline, not a Delta Lake table, and the solution you gave is related to a Data Factory pipeline, which is not what I need.

AFox
by Contributor
  • 7035 Views
  • 3 replies
  • 3 kudos

databricks-connect: PandasUDFs importing local packages: ModuleNotFoundError

databricks-connect==14.1.0. Related to other posts: https://community.databricks.com/t5/data-engineering/modulenotfounderror-serializationerror-when-executing-over/td-p/14301 and https://stackoverflow.com/questions/59322622/how-to-use-a-udf-defined-in-a-sub-...

Latest Reply
AFox
Contributor
  • 3 kudos

There is a way to do this! spark.addArtifact(src_zip_path, pyfile=True). Some things of note: this only works on single-user (non-shared) clusters; src_zip_path must be a posixpath-type string (i.e. forward slashes) even on Windows (drop C: and replace t...
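A minimal sketch of the spark.addArtifact approach from this reply, assuming Databricks Connect 14.x against a single-user cluster; the zip path and package name are hypothetical:

```python
from databricks.connect import DatabricksSession

# Build a Spark Connect session via Databricks Connect (assumes authentication
# is already configured, e.g. through a Databricks config profile).
spark = DatabricksSession.builder.getOrCreate()

# POSIX-style path to a zip of the local package, even on Windows
# (hypothetical path for illustration, with the drive letter dropped).
src_zip_path = "/projects/my_project/dist/my_local_package.zip"
spark.addArtifact(src_zip_path, pyfile=True)

# Pandas UDFs executed on the cluster should now be able to
# `import my_local_package` from the uploaded artifact.
```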

2 More Replies
amitdatabricksc
by New Contributor II
  • 12133 Views
  • 4 replies
  • 2 kudos

how to zip a dataframe

How to zip a dataframe so that I get a zipped CSV output file? Please share the command. There is only 1 dataframe involved, not multiple.

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Writing to a local directory does not work. See this topic: https://community.databricks.com/s/feed/0D53f00001M7hNlCAJ
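For the compressed-CSV ask itself, a commonly used alternative (not from this thread) is to let Spark compress the CSV on write with gzip rather than zipping files afterwards; `df` and the output path below are placeholders. Note this produces .csv.gz part files rather than a single .zip archive.

```python
# Hedged sketch: write the DataFrame as gzip-compressed CSV part files.
(
    df.write
      .option("header", "true")
      .option("compression", "gzip")
      .mode("overwrite")
      .csv("dbfs:/tmp/zipped_csv_output")
)
```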

3 More Replies
harvey-c
by New Contributor III
  • 1809 Views
  • 0 replies
  • 0 kudos

How to manage data reload in DLT

Hi, community members, I had a situation where I needed to reload some data via a DLT pipeline. All data is stored in a landing storage account and has been loaded on a daily basis, for example from 1/Nov to 30/Nov. For some reason, I need to reload the data of 25/...

AkifCakir
by New Contributor II
  • 25382 Views
  • 3 replies
  • 4 kudos

Resolved! Why does the Spark save mode "overwrite" always drop the table although "truncate" is true?

Hi Dear Team, I am trying to import data from Databricks to Exasol DB. I am using the following code with Spark version 3.0.1: dfw.write \ .format("jdbc") \ .option("driver", exa_driver) \ .option("url", exa_url) \ .option("db...
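A hedged reconstruction of the write pattern described in the question; the excerpt is truncated, so the remaining options and the target table name below are assumptions carried over from the post:

```python
# Sketch of an "overwrite with truncate" JDBC write; exa_driver, exa_url,
# and the target table are placeholders from the post.
(
    dfw.write
       .format("jdbc")
       .option("driver", exa_driver)
       .option("url", exa_url)
       .option("dbtable", "MY_SCHEMA.MY_TABLE")  # assumed option/value, not in the excerpt
       .option("truncate", "true")               # intended to truncate instead of dropping
       .mode("overwrite")
       .save()
)
```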

Latest Reply
Gembo
New Contributor III
  • 4 kudos

@AkifCakir, were you able to find a way to truncate without dropping the table using the .write function? I am facing the same issue as well.

2 More Replies
feed
by New Contributor III
  • 19679 Views
  • 4 replies
  • 2 kudos

OSError: No wkhtmltopdf executable found: "b''"

OSError: No wkhtmltopdf executable found: "b''". If this file exists please check that this process can read it or you can pass path to it manually in method call, check README. Otherwise please install wkhtmltopdf - https://github.com/JazzCore/python-...

Latest Reply
Debayan
Databricks Employee
  • 2 kudos

Hi, when did you receive this error? While running code inside a notebook, running a cluster, or a job? Also, please tag @Debayan in your next response, which will notify me. Thank you!

3 More Replies
george_ognyanov
by New Contributor III
  • 6277 Views
  • 5 replies
  • 3 kudos

Resolved! Terraform Azure Databricks Unity Catalogue - Failed to check metastore quota limit for region

I am trying to create a metastore via the Terraform Azure databricks_metastore resource, but I keep getting the error: This is the exact code I am using to create the resource: I have tried using both my Databricks account and a service principal appli...

(Screenshots attached to the original post.)
Latest Reply
george_ognyanov
New Contributor III
  • 3 kudos

Hi @Retired_mod, as far as I understand, one region can have one metastore. I am able to create a metastore in the same region if I log into the Databricks GUI and do it there. Alternatively, if I already have a metastore created and try to execute the ...

4 More Replies
Nathant93
by New Contributor III
  • 3189 Views
  • 1 reply
  • 0 kudos

SQL Server OUTPUT clause alternative

I am looking to get, after a merge or insert has happened, the records in that batch that were inserted via either method, much like the OUTPUT clause in SQL Server. Does anyone have any suggestions? The only thing I can think of is to add a time...

Latest Reply
Nathant93
New Contributor III
  • 0 kudos

I've managed to do it like this:
qry = spark.sql("DESCRIBE HISTORY <table_name> LIMIT 1").collect()
current_version = int(qry[0][0])
prev_version = current_version - 1
Then do an EXCEPT statement between the versions.
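A minimal sketch of that version-diff approach, assuming the merge target is a Delta table; <table_name> is a placeholder and the EXCEPT query is an illustration of the step the reply describes:

```python
# Get the latest version of the Delta table from its history.
qry = spark.sql("DESCRIBE HISTORY <table_name> LIMIT 1").collect()
current_version = int(qry[0][0])
prev_version = current_version - 1

# Rows present in the new version but not in the previous one,
# roughly mirroring SQL Server's OUTPUT ... INSERTED behaviour.
inserted_rows = spark.sql(f"""
    SELECT * FROM <table_name> VERSION AS OF {current_version}
    EXCEPT
    SELECT * FROM <table_name> VERSION AS OF {prev_version}
""")
```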

