Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Wayne
by New Contributor III
  • 26291 Views
  • 0 replies
  • 0 kudos

How to flatten a nested recursive JSON struct to a list of structs

This is from the Spark event log, on the event SparkListenerSQLExecutionStart. How do I flatten the sparkPlanInfo struct into an array of the same struct, and then explode it later? Note that the element children is an array containing the parent struct, and the lev...

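One possible approach (a hedged sketch, not from the thread): Spark cannot follow a recursive schema to arbitrary depth, so a common workaround is to explode the children array level by level up to a fixed maximum depth and union the levels. The function name, depth cap, and the df input holding sparkPlanInfo are assumptions; DataFrame.isEmpty needs PySpark 3.3+.

```python
from pyspark.sql import functions as F

# Hedged sketch: iteratively explode `children` (an array of the same struct
# type as its parent) and union every level into one flat node list.
def flatten_plan(df, max_depth=10):
    level = df.select(F.col("sparkPlanInfo").alias("node"))  # root nodes
    collected = level
    for _ in range(max_depth):
        level = level.select(F.explode("node.children").alias("node"))
        if level.isEmpty():          # no deeper nodes anywhere: stop early
            break
        collected = collected.unionByName(level)
    return collected                 # one row per plan node, same struct schema

flatten_plan(df).select("node.nodeName").show(truncate=False)
```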
Arnold_Souza
by New Contributor III
  • 5671 Views
  • 1 reply
  • 0 kudos

Delta Live Tables consuming different files from the same path is combining the schemas

Summary: I am using Delta Live Tables to create a pipeline in Databricks, and I am facing a problem where the schemas of different files placed in the same folder in a data lake are being merged, even though I am using file patterns to separate the data inge...

Data Engineering
cloud_files
Databricks SQL
Delta Live Tables
read_files
Latest Reply
Arnold_Souza
New Contributor III
  • 0 kudos

Found a solution: never use 'fileNamePattern', '*file_1*'. Instead, put the pattern directly into the path: "abfss://<container>@<storage_account>.dfs.core.windows.net/path/to/folder/*file_1*"

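A minimal sketch of that fix inside a DLT pipeline, assuming an Auto Loader CSV source; the table name, format, and storage coordinates are placeholders:

```python
import dlt

# Hedged sketch: the glob lives in the load path itself, so only matching
# files feed this table and unrelated schemas are never merged.
@dlt.table(name="file_1_bronze")  # placeholder table name
def file_1_bronze():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")  # assumed input format
        .load("abfss://<container>@<storage_account>.dfs.core.windows.net/path/to/folder/*file_1*")
    )
```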
sgannavaram
by New Contributor III
  • 2442 Views
  • 2 replies
  • 1 kudos

How to connect to IBM MQ from Databricks notebook?

We are trying to connect to IBM MQ and post a message to MQ, which is eventually consumed by a mainframe application. What IBM MQ client .jars / libraries need to be installed on the cluster? If you have any sample code for connectivity, that would be helpful.

Latest Reply
Saleem
New Contributor II
  • 1 kudos

Kindly update if you are able to connect to MQ from Databricks. I am working on the same but with no luck, as I'm unable to install the pymqi library on the cluster; it shows an error that the MQ library could not be found.

1 More Reply
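For reference, a hedged sketch of posting a message with pymqi; it assumes the native IBM MQ client libraries are already installed on every cluster node (their absence is exactly the error described above), and all connection values are placeholders:

```python
import pymqi

# Placeholder connection details for an assumed queue manager.
queue_manager = "QM1"
channel = "DEV.APP.SVRCONN"
conn_info = "mq.example.com(1414)"   # host(port)

qmgr = pymqi.connect(queue_manager, channel, conn_info)
queue = pymqi.Queue(qmgr, "DEV.QUEUE.1")
queue.put(b"hello from Databricks")  # message for the mainframe consumer
queue.close()
qmgr.disconnect()
```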
bzh
by New Contributor
  • 2768 Views
  • 3 replies
  • 0 kudos

Question: Delta Live Tables, multiple streaming sources to a single target

We are trying to write multiple sources to the same target table using DLT, but we are getting the errors below. Not sure what we are missing here in the code.... File /databricks/spark/python/dlt/api.py:817, in apply_changes(target, source, keys, sequence...

Latest Reply
nag_kanchan
New Contributor III
  • 0 kudos

The solution did not work for me. It was throwing an error stating: raise Py4JError( py4j.protocol.Py4JError: An error occurred while calling o434.readStream. Trace: py4j.Py4JException: Method readStream([class java.util.ArrayList]) does not exist.A...

2 More Replies
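One documented DLT pattern for fanning multiple streams into one target, sketched with placeholder table names and assuming the sources share a compatible schema: create the streaming table once, then attach one append flow per source.

```python
import dlt

dlt.create_streaming_table("unified_target")  # placeholder target name

@dlt.append_flow(target="unified_target")
def from_source_a():
    # each flow appends its stream into the shared target table
    return spark.readStream.table("catalog.schema.source_a")  # placeholder

@dlt.append_flow(target="unified_target")
def from_source_b():
    return spark.readStream.table("catalog.schema.source_b")  # placeholder
```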
Faisal
by Contributor
  • 1341 Views
  • 1 reply
  • 0 kudos

DLT - how to log number of rows read and written

Hi @Retired_mod - how do I log the number of rows read and written in a DLT pipeline? I want to store them in audit tables after the pipeline update completes. Can you give me sample query code?

Latest Reply
Faisal
Contributor
  • 0 kudos

Thanks @Retired_mod, but I asked how to log the number of rows read/written via a Delta Live Tables (DLT) pipeline, not a Delta Lake table, and the solution you gave relates to a Data Factory pipeline, which is not what I need.

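One possible route, sketched as an assumption rather than a confirmed answer from this thread: DLT records per-flow metrics such as num_output_rows in its event log, which can be queried with the event_log() table function and appended to an audit table. All catalog/schema/table names are placeholders.

```python
# Hedged sketch: pull row metrics from the DLT event log into an audit table.
rows = spark.sql("""
    SELECT
        timestamp,
        origin.flow_name,
        details:flow_progress.metrics.num_output_rows::bigint AS num_output_rows
    FROM event_log(TABLE(my_catalog.my_schema.my_streaming_table))
    WHERE event_type = 'flow_progress'
      AND details:flow_progress.metrics.num_output_rows IS NOT NULL
""")
rows.write.mode("append").saveAsTable("my_catalog.audit.dlt_row_counts")
```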
AFox
by Contributor
  • 5646 Views
  • 3 replies
  • 2 kudos

databricks-connect: PandasUDFs importing local packages: ModuleNotFoundError

databricks-connect==14.1.0. Related to other posts: https://community.databricks.com/t5/data-engineering/modulenotfounderror-serializationerror-when-executing-over/td-p/14301 and https://stackoverflow.com/questions/59322622/how-to-use-a-udf-defined-in-a-sub-...

Latest Reply
AFox
Contributor
  • 2 kudos

There is a way to do this!! spark.addArtifact(src_zip_path, pyfile=True) Some things of note: this only works on single user (non-shared) clusters; src_zip_path must be a posixpath-type string (i.e. forward slashes) even on Windows (drop C: and replace t...

2 More Replies
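A sketch of that reply's approach under databricks-connect, with placeholder paths; per the reply it works on single-user clusters only.

```python
from pathlib import Path
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()

# posixpath-style string (forward slashes); per the reply, on Windows also
# drop the drive letter. The zip path is a placeholder.
src_zip_path = Path("dist/my_package.zip").resolve().as_posix()
spark.addArtifact(src_zip_path, pyfile=True)

# `import my_package` inside a pandas UDF should now resolve on the cluster
```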
amitdatabricksc
by New Contributor II
  • 8591 Views
  • 4 replies
  • 2 kudos

How to zip a DataFrame

How do I zip a DataFrame so that I get a zipped CSV output file? Please share the command. Only one DataFrame is involved, not multiple.

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Writing to a local directory does not work. See this topic: https://community.databricks.com/s/feed/0D53f00001M7hNlCAJ

3 More Replies
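One workaround (an illustration, not this thread's accepted answer): Spark itself has no zip output codec, so if the DataFrame fits in driver memory you can convert it to pandas, which writes zipped CSV directly. The output path is a placeholder.

```python
# Hedged sketch: write a single zipped CSV via pandas.
pdf = df.toPandas()  # only safe when the result fits on the driver
pdf.to_csv(
    "/dbfs/tmp/output.zip",  # placeholder path through the /dbfs fuse mount
    compression={"method": "zip", "archive_name": "output.csv"},
    index=False,
)
```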
harvey-c
by New Contributor III
  • 1301 Views
  • 0 replies
  • 0 kudos

How to manage data reload in DLT

Hi, Community members. I had a situation where I needed to reload some data via a DLT pipeline. All data is stored in a landing storage account and has been loaded on a daily basis, for example from 1/Nov to 30/Nov. For some reason, I need to reload the data of 25/...

AkifCakir
by New Contributor II
  • 19315 Views
  • 3 replies
  • 3 kudos

Resolved! Why does the Spark save mode "overwrite" always drop the table although "truncate" is true?

Hi Dear Team, I am trying to import data from Databricks to Exasol DB. I am using the following code, with Spark version 3.0.1: dfw.write \ .format("jdbc") \ .option("driver", exa_driver) \ .option("url", exa_url) \ .option("db...

Latest Reply
Gembo
New Contributor II
  • 3 kudos

@AkifCakir, were you able to find a way to truncate without dropping the table using the .write function? I am facing the same issue as well.

2 More Replies
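For context, a hedged sketch of the options involved (exa_driver and exa_url are assumed from the post; the table name is a placeholder): in overwrite mode Spark truncates instead of dropping only when option("truncate", "true") is set and the JDBC dialect supports TRUNCATE TABLE; otherwise it falls back to drop-and-recreate, which matches the behavior described here.

```python
# Hedged sketch: truncate-instead-of-drop on JDBC overwrite, dialect permitting.
(dfw.write
    .format("jdbc")
    .option("driver", exa_driver)             # assumed from the post
    .option("url", exa_url)                   # assumed from the post
    .option("dbtable", "MY_SCHEMA.MY_TABLE")  # placeholder
    .option("truncate", "true")
    .mode("overwrite")
    .save())
```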
feed
by New Contributor III
  • 13202 Views
  • 4 replies
  • 2 kudos

OSError: No wkhtmltopdf executable found: "b''"

OSError: No wkhtmltopdf executable found: "b''". If this file exists, please check that this process can read it, or you can pass the path to it manually in the method call; check the README. Otherwise please install wkhtmltopdf - https://github.com/JazzCore/python-...

Latest Reply
Debayan
Esteemed Contributor III
  • 2 kudos

Hi, when did you receive this error? While running code inside a notebook, running a cluster, or running a job? Also, please tag @Debayan in your next response, which will notify me. Thank you!

3 More Replies
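For anyone hitting this: pdfkit raises this OSError when it cannot find the wkhtmltopdf binary, and it accepts an explicit path once the binary is installed on the cluster (e.g. via an init script). A hedged sketch; the binary path is an assumption.

```python
import pdfkit

# Hedged sketch: point pdfkit at an explicitly installed wkhtmltopdf binary.
config = pdfkit.configuration(wkhtmltopdf="/usr/bin/wkhtmltopdf")  # assumed path
pdfkit.from_string("<h1>hello</h1>", "/tmp/out.pdf", configuration=config)
```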
george_ognyanov
by New Contributor III
  • 4313 Views
  • 5 replies
  • 3 kudos

Resolved! Terraform Azure Databricks Unity Catalog - Failed to check metastore quota limit for region

I am trying to create a metastore via the Terraform Azure databricks_metastore resource, but I keep getting the error shown in the attached screenshots, along with the exact code I am using to create the resource. I have tried using both my Databricks account and a service principal appli...

[screenshots: the error message and the Terraform resource code]
Latest Reply
george_ognyanov
New Contributor III
  • 3 kudos

Hi @Retired_mod, as far as I understand, one region can have one metastore. I am able to create a metastore in the same region if I log into the Databricks GUI and do it there. Alternatively, if I already have a metastore created and try to execute the ...

4 More Replies
Nathant93
by New Contributor III
  • 2114 Views
  • 1 reply
  • 0 kudos

SQL Server OUTPUT clause alternative

After a merge or insert has happened, I am looking to get the records in that batch that were inserted via either method, much like the OUTPUT clause in SQL Server. Does anyone have any suggestions? The only thing I can think of is to add a time...

Latest Reply
Nathant93
New Contributor III
  • 0 kudos

I've managed to do it like this:
qry = spark.sql("DESCRIBE HISTORY <table_name> LIMIT 1").collect()
current_version = int(qry[0][0])
prev_version = current_version - 1
Then do an EXCEPT statement between the versions.

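Spelled out as a hedged sketch (<table_name> stays a placeholder, as in the reply): Delta time travel lets you diff the latest version against the previous one to recover the just-written rows.

```python
# Hedged expansion of the reply: diff the two most recent Delta versions,
# approximating SQL Server's OUTPUT clause. <table_name> is a placeholder.
hist = spark.sql("DESCRIBE HISTORY <table_name> LIMIT 1").collect()
current_version = int(hist[0][0])
prev_version = current_version - 1

inserted = spark.sql(f"""
    SELECT * FROM <table_name> VERSION AS OF {current_version}
    EXCEPT
    SELECT * FROM <table_name> VERSION AS OF {prev_version}
""")
```

Note this assumes the most recent commit on the table was the merge or insert in question; it captures net-new rows, not updates.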
Akshay_127877
by New Contributor II
  • 30278 Views
  • 7 replies
  • 1 kudos

How to open a Streamlit URL that is hosted by Databricks in a local web browser?

I have run this web app code in a Databricks notebook. It works properly without any errors. With Databricks acting as the server, I am unable to open this link in my browser for this web app. But when I run the code in my local IDE, I am able to just open the U...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Akshay Aravinnakshan, thank you for posting your question in our community! Your input matters! Help our community thrive by coming back and marking the most helpful and accurate answers. Together, we can make a difference!

6 More Replies
