Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

LearnerShahid
by Databricks Partner
  • 9660 Views
  • 6 replies
  • 4 kudos

Resolved! Lesson 6.1 of Data Engineering. Error when reading stream - java.lang.UnsupportedOperationException: com.databricks.backend.daemon.data.client.DBFSV1.resolvePathOnPhysicalStorage(path: Path)

The function below executes fine:

def autoload_to_table(data_source, source_format, table_name, checkpoint_directory):
    query = (spark.readStream
             .format("cloudFiles")
             .option("cloudFiles.format", source_format)
             .option("cloudFile...

I have verified that source data exists.
Latest Reply
Anonymous
Not applicable
  • 4 kudos

Auto Loader is not supported on Databricks Community Edition.

5 More Replies
BenLambert
by Contributor
  • 3819 Views
  • 2 replies
  • 2 kudos

Resolved! Delta Live Tables not inferring table schema properly.

I have a Delta Live Tables pipeline that loads and transforms data. The problem is that the schema inferred by DLT does not match the actual schema of the table. The table is generated via a groupby.pivot operation as follows:...

Latest Reply
BenLambert
Contributor
  • 2 kudos

I was able to get around this by specifying the table schema in the table decorator.
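A minimal sketch of that workaround, assuming the pivot produces one column per known category value. The column names and types here are hypothetical and not taken from the original post; the helper only builds a DDL schema string, which a DLT pipeline could then pass as the `schema` argument of the `@dlt.table` decorator.

```python
def pivot_schema_ddl(key_columns, pivot_values, value_type="DOUBLE"):
    """Build a DDL schema string for a table produced by groupBy(...).pivot(...).

    key_columns: mapping of grouping column name -> SQL type
    pivot_values: the values that become columns after the pivot
    """
    fields = [f"{name} {sql_type}" for name, sql_type in key_columns.items()]
    # Backticks keep pivoted column names valid even if they contain spaces.
    fields += [f"`{value}` {value_type}" for value in pivot_values]
    return ", ".join(fields)


# Hypothetical columns, for illustration only.
schema = pivot_schema_ddl({"device_id": "STRING"}, ["temperature", "humidity"])
print(schema)

# Inside a DLT pipeline the string would be handed to the decorator, e.g.:
#   @dlt.table(schema=schema)
#   def pivoted(): ...
```

Declaring the schema up front means DLT does not have to infer the pivoted columns, which depend on the data at run time.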

1 More Replies
mick042
by New Contributor III
  • 1888 Views
  • 1 reply
  • 0 kudos

Optimal approach when using external script/executable for processing data

I need to process a number of files where I manipulate file text using an external executable that operates on stdin/stdout. I am quite new to Spark. What I am attempting is to use rdd.pipe, as in the following: exe_path = " /usr/local/bin/external...
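For readers unfamiliar with the stdin/stdout pattern, here is a plain-Python sketch of what piping records through an external program looks like. It is not the poster's code: the stand-in command is a Python one-liner, and `pipe_lines` is a hypothetical helper. In Spark, `rdd.pipe(command)` applies the same idea to each partition.

```python
import subprocess
import sys


def pipe_lines(command, lines):
    """Send lines to an external command on stdin and return its stdout lines."""
    result = subprocess.run(
        command,
        input="\n".join(lines) + "\n",
        capture_output=True,
        text=True,
        check=True,  # raise if the external program exits with an error
    )
    return result.stdout.splitlines()


# Stand-in for the external executable: upper-cases whatever arrives on stdin.
upper_cmd = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]
print(pipe_lines(upper_cmd, ["alpha", "beta"]))
```

The executable has to be present at the same path on every worker node for the Spark version of this to work.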

Latest Reply
User16753725469
Databricks Employee
  • 0 kudos

Hi @Michael Lennon, can you please elaborate on the use case and on what the external app at exe_path is doing?

Leszek
by Contributor
  • 3840 Views
  • 2 replies
  • 4 kudos

How to set up partitions on the streaming Delta Table?

Let's assume that we have 3 streaming Delta tables: Bronze, Silver, and Gold. My aim is to add partitioning to the Silver table (for example, by Date). As a result, the Gold table will throw an error that the source table has been updated, and I would need to set 'ignoreC...

Latest Reply
-werners-
Esteemed Contributor III
  • 4 kudos

Is the change data feed functionality (of your Silver table) an option, combined with a merge into your Gold table? https://docs.microsoft.com/en-us/azure/databricks/delta/delta-change-data-feed
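To make the idea concrete, here is a pure-Python illustration of what "change data feed plus merge" does logically. It is only a sketch and does not use Spark: the Gold table is a dict keyed by a hypothetical `id` column. The `_change_type` and `_commit_version` fields mirror the metadata columns that the Delta change data feed adds to each row.

```python
def apply_change_feed(target, changes, key="id"):
    """Apply change data feed rows to a dict-based stand-in for the Gold table."""
    # Process changes in commit order; sorted() is stable within a commit.
    for row in sorted(changes, key=lambda r: r["_commit_version"]):
        change_type = row["_change_type"]
        if change_type == "update_preimage":
            continue  # old value of an updated row; nothing to apply
        if change_type == "delete":
            target.pop(row[key], None)
        else:  # "insert" or "update_postimage"
            target[row[key]] = {k: v for k, v in row.items() if not k.startswith("_")}
    return target


gold = {1: {"id": 1, "amount": 10}}
changes = [
    {"id": 1, "amount": 10, "_change_type": "update_preimage", "_commit_version": 2},
    {"id": 1, "amount": 15, "_change_type": "update_postimage", "_commit_version": 2},
    {"id": 2, "amount": 7, "_change_type": "insert", "_commit_version": 3},
]
print(apply_change_feed(gold, changes))
```

In Delta itself the same upsert/delete logic would be expressed as a MERGE into the Gold table, driven by the change rows read from Silver.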

1 More Replies
Mohit_m
by Databricks Employee
  • 12494 Views
  • 1 reply
  • 1 kudos

Resolved! Databricks - python error when importing wheel distribution package

In previous days, all notebooks containing 'import anomalydetection' worked just fine. There was no change in any configuration of the cluster, notebook, or our imported library. However, recently notebooks just crashed with the error below. The same happens also...

Latest Reply
Mohit_m
Databricks Employee
  • 1 kudos

Solution: This is due to the latest version of the protobuf library. Please try downgrading the library, which should solve the issue: pip install protobuf==3.20.* A protobuf version that works is 3.20.1; if that does not work, try 3.18.1.
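A quick way to check whether an environment is affected, sketched under the assumption that anything newer than the 3.20 series is the "latest version" the reply refers to. `needs_protobuf_downgrade` is a hypothetical helper, not part of any library.

```python
from importlib import metadata


def needs_protobuf_downgrade(version):
    """Return True if the given protobuf version is newer than the 3.20 series."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) > (3, 20)


try:
    installed = metadata.version("protobuf")
    print(installed, "-> downgrade suggested:", needs_protobuf_downgrade(installed))
except metadata.PackageNotFoundError:
    print("protobuf is not installed in this environment")
```

If the check suggests a downgrade, the `pip install protobuf==3.20.*` command from the reply is the fix to try.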

noimeta
by Contributor III
  • 2143 Views
  • 0 replies
  • 0 kudos

How to use Terraform to add Git provider credentials to a workspace in order to use service principal for CI/CD

Hi, I'm very new to Terraform. Currently, I'm trying to automate the service principal setup process using Terraform. Following this example, I successfully created a service principal and an access token. However, when I tried adding databricks_git_cr...

jakubk
by Contributor
  • 5358 Views
  • 2 replies
  • 0 kudos

spark.read.parquet() - how to check for file lock before reading? (azure)

I have some Python code which takes parquet files from an ADLS Gen2 location and merges them into Delta tables (run as a workflow job on a schedule). I have a try/catch wrapper around this so that any files that fail get moved into a failed folder using dbu...

Latest Reply
jakubk
Contributor
  • 0 kudos

That's the problem - it's not being locked (or fs.mv() isn't checking/honoring the lock). The upload process/tool is a third-party external tool. I can see via the upload tool that the file upload is 'in progress'. I can also see the 0 byte destination file...
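Since no lock is visible, one possible heuristic is to skip files that are empty or still growing. The sketch below is an assumption-laden illustration, not a confirmed fix: `looks_fully_uploaded` is a hypothetical helper, and it uses local file sizes so it can run anywhere. On Databricks, `get_size` could instead wrap the `size` field returned by `dbutils.fs.ls`.

```python
import os
import tempfile
import time


def looks_fully_uploaded(path, wait_seconds=5.0, get_size=os.path.getsize):
    """Heuristic: treat a file as complete if it is non-empty and its size
    does not change across a short wait."""
    first = get_size(path)
    if first == 0:
        return False  # 0-byte placeholder: upload still in progress
    time.sleep(wait_seconds)
    return get_size(path) == first


with tempfile.TemporaryDirectory() as tmp:
    in_progress = os.path.join(tmp, "in_progress.parquet")
    finished = os.path.join(tmp, "finished.parquet")
    open(in_progress, "wb").close()  # simulate the 0-byte destination file
    with open(finished, "wb") as handle:
        handle.write(b"data")
    print(looks_fully_uploaded(in_progress, wait_seconds=0))
    print(looks_fully_uploaded(finished, wait_seconds=0))
```

A size check is only a heuristic; an upload that stalls for longer than the wait would still look complete.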

1 More Replies
VaDim
by New Contributor III
  • 11200 Views
  • 8 replies
  • 3 kudos

Resolved! Are MERGE INTO inserts supported when the delta table has an identity column ?

I can't seem to make it work, as I keep getting: DeltaInvariantViolationException: NOT NULL constraint violated for column: dl_id.

Latest Reply
byrdman
New Contributor III
  • 3 kudos

If you are using 'delta.columnMapping.mode' = 'name' on your table, I could not get it to work. Without that line, for the not-matched clause: WHEN NOT MATCHED THEN INSERT (columnname, columnName2) VALUES (columnname, columnName2) WHEN MATCHED THEN UPDAT...
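A sketch of the explicit-column-list approach described above, assuming the identity column (`dl_id` in the question) is simply left out of the INSERT list so that Delta generates it. The table and column names are hypothetical, and `build_merge_sql` is an illustrative helper that only assembles the statement text.

```python
def build_merge_sql(target, source, key, columns):
    """Build a MERGE statement that lists insert columns explicitly.

    `columns` must not include the identity column, so Delta can generate it.
    """
    cols = ", ".join(columns)
    vals = ", ".join(f"s.{c}" for c in columns)
    sets = ", ".join(f"t.{c} = s.{c}" for c in columns if c != key)
    return (
        f"MERGE INTO {target} AS t USING {source} AS s ON t.{key} = s.{key} "
        f"WHEN MATCHED THEN UPDATE SET {sets} "
        f"WHEN NOT MATCHED THEN INSERT ({cols}) VALUES ({vals})"
    )


# Hypothetical names; the identity column dl_id is deliberately omitted.
sql = build_merge_sql("target", "updates", "id", ["id", "name"])
print(sql)
```

The resulting string could be run with `spark.sql(sql)` in a notebook.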

7 More Replies
anders_poirel
by New Contributor II
  • 1847 Views
  • 0 replies
  • 2 kudos

Moving Notebook Cell causes browser to run out of memory

Platform: AWS Databricks; enabled "Turn on the new, updated code editor" in Notebook Settings; macOS 12.5.1; Firefox 104.0.2. When I attempt to drag a notebook cell to move it, the tab crashes and causes my computer to run out of memory. I profiled the tab to...

Taha_Hussain
by Databricks Employee
  • 2227 Views
  • 1 reply
  • 3 kudos

Register for Databricks Office Hours. September 14: 8:00 - 9:00 AM PT | 3:00 - 4:00 PM GMT. September 28: 11:00 AM - 12:00 PM PT | 6:00 - 7:00 PM GMT. Dat...

Register for Databricks Office Hours. September 14: 8:00 - 9:00 AM PT | 3:00 - 4:00 PM GMT. September 28: 11:00 AM - 12:00 PM PT | 6:00 - 7:00 PM GMT. Databricks Office Hours connects you directly with experts to answer your Databricks questions. Join us t...

Latest Reply
Taha_Hussain
Databricks Employee
  • 3 kudos

Check out some of the questions from fellow users during our last Office Hours. All these questions were answered live by a Databricks expert! Q: What's the best way of using a UDF in a class? A: You need to define your class and then register the func...

osoucy
by New Contributor II
  • 1856 Views
  • 0 replies
  • 1 kudos

Is it possible to join two aggregated streams of data?

Objective: Within the context of a Delta Live Table, I'm trying to merge two stream aggregations, but run into challenges. Is it possible to achieve such a join? Context: Assume - table trades stores a list of trades with their associated time stamps - tabl...

Aran_Oribu
by New Contributor II
  • 7333 Views
  • 5 replies
  • 2 kudos

Resolved! Create and update a csv/json file in ADLSG2 with Eventhub in Databricks streaming

Hello, this is my first post here and I am a total beginner with Databricks and Spark. Working on an IoT cloud project with Azure, I'm looking to set up continuous stream processing of data. A current architecture already exists thanks to Stream Ana...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

So the event hub creates files (json/csv) on ADLS. You can read those files into Databricks with the spark.read.csv/json method. If you want to read many files in one go, you can use wildcards, e.g. spark.read.json("/mnt/datalake/bronze/directory/*/*...

4 More Replies
jacob1
by Databricks Partner
  • 1889 Views
  • 1 reply
  • 1 kudos

I passed my DE associate exam, but unable to see/download my certificate on credentials.databricks.com. Can someone help download the certificate - this is time sensitive

I passed my DE associate exam, but I am unable to see/download my certificate on credentials.databricks.com. I am using the same email as the one on Kryterion on webassessor.com/databricks. I can log in to Kryterion and see that I have passed the exam.

Latest Reply
Vidula
Databricks Partner
  • 1 kudos

Hi @jacob stallone, thank you for reaching out! Let us look into this for you, and we will get back to you.

PChan
by New Contributor II
  • 1746 Views
  • 1 reply
  • 0 kudos

www.googleapis.com

It happens after Databricks deleted my cluster: { "protoPayload": { "@type": "type.googleapis.com/google.cloud.audit.AuditLog", "status": {}, "serviceName": "container.googleapis.com", "methodName": "google.container.v1.ClusterMa...

error
Latest Reply
PChan
New Contributor II
  • 0 kudos

I have attached the error log.

Anonymous
by Not applicable
  • 3067 Views
  • 1 reply
  • 5 kudos

www.linkedin.com

September 2022 Featured Member Interview: Aman Sehgal - @AmanSehgal. Pronouns: He, Him. Company: CyberCX. Job Title: Senior Data Engineer. Could you give a brief description of your professional journey to date? A. I started my career as software develope...

Latest Reply
AmanSehgal
Honored Contributor III
  • 5 kudos

Thank you @Lindsay Olson and @Christy Seto for interviewing me and nominating me as this month's featured member. It's a pleasure to be a member of the Databricks community, and I'm looking forward to contributing more in the future. To all the community members...
