Data Engineering

Forum Posts

Sorted by:

by db_knowledge • New Contributor II

06-04-2024 7:35:36 AM

276 Views
2 replies
0 kudos

Merge operation with ouputMode update in autoloader databricks

Hi team,I am trying to do merge operation along with outputMode('update') and foreachmode byusing below code but it is not updating data could you please any help on this?output=(casting_df.writeStream.format('delta').trigger(availableNow=True).optio...

Data Engineering

276 Views
2 replies
0 kudos

06-04-2024 7:35:36 AM

View Replies

Latest Reply

anardinelli
New Contributor III

06-04-2024 8:29:06 AM

0 kudos

Hi @db_knowledge Please try .foreachBatch(upsertToDelta) instead of creating the lambda inside it. Best, Alessandro

0 kudos

06-04-2024 8:29:06 AM

1 More Replies

by Adigkar • New Contributor

06-04-2024 2:40:25 AM

355 Views
3 replies
0 kudos

Reprocess of old data stored in adls

Hi,We have a requirement fir a scenario to reprocess old data using data factory pipeline.Here are the detailsStorage in ADLSGEN2Landing zone(where the data will be stored in the same format as we get from source),Data will be loaded from sql server ...

Data Engineering

355 Views
3 replies
0 kudos

06-04-2024 2:40:25 AM

View Replies

Latest Reply

Hkesharwani
Contributor II

06-04-2024 7:27:41 AM

0 kudos

@Kaniz_Fatma I just posted a possible solution for the above problem and it has been rejected community moderator without any explanation. This has happened to me twice in past as well.Can you please help in this case.

0 kudos

06-04-2024 7:27:41 AM

2 More Replies

by LucasBelpaire • New Contributor

06-04-2024 7:58:29 AM

187 Views
0 replies
0 kudos

Generate embeddings from third party API in Delta Live Tables

HiWe currently have a Delta Live Tables flow through which textual data is flowing. As a final enrichment step we would also want to generate embeddings using a third party api provider (probably Voyage.AI). They support batch embedding which would g...

Data Engineering

187 Views
0 replies
0 kudos

06-04-2024 7:58:29 AM

by mk1987c • New Contributor III

02-11-2023 8:21:13 PM

3427 Views
5 replies
1 kudos

Resolved! I am trying to use Databricks Autoloader with File Notification Mode

When i run my command for readstream using .option("cloudFiles.useNotifications", "true") it start reading the files from Azure blob (please note that i did not provide the configuration like subscription id , clint id , connect string and all while...

Data Engineering

3427 Views
5 replies
1 kudos

02-11-2023 8:21:13 PM

View Replies

Latest Reply

jose_gonzalez
Moderator

02-22-2023 2:27:59 PM

1 kudos

Hi,I would like to share the following docs that might be able to help you with this issue. https://docs.databricks.com/ingestion/auto-loader/file-notification-mode.html#required-permissions-for-configuring-file-notification-for-adls-gen2-and-azure-b...

1 kudos

02-22-2023 2:27:59 PM

4 More Replies

by Sricharan05 • New Contributor III

06-02-2024 7:48:51 AM

407 Views
4 replies
2 kudos

Databricks Certified Associate Developer Exam Got Suspended. Require support for the same.

Request #00482566Hello Team, I encountered Pathetic experience while attempting my 1st Databricks certification. I had some network issues and lighting issues. My test was stopped in the middle and I was connected with the proctor for reviewing. As r...

Data Engineering

407 Views
4 replies
2 kudos

06-02-2024 7:48:51 AM

View Replies

Latest Reply

Sricharan05
New Contributor III

06-04-2024 1:28:25 AM

2 kudos

Hi @Kaniz @Sujitha @APadmanabhan @Cert-Team @Cert-Bricks @Cert-TeamOPS I have been waiting for more than 40+ hours since I raised my ticket. Till now I dint get any response from the support team nor from anyone. Can you please escalate this issue...

2 kudos

06-04-2024 1:28:25 AM

3 More Replies

by zero234 • New Contributor III

02-16-2024 3:44:35 AM

2422 Views
3 replies
2 kudos

Resolved! i have created a materialized view table using delta live table pipeline and its not appending data

i have created a materialized view table using delta live table pipeline , for some reason it is overwriting data every day , i want it to append data to the table instead of doing full refresh suppose i had 8 million records in table and if irun the...

Data Engineering

2422 Views
3 replies
2 kudos

02-16-2024 3:44:35 AM

View Replies

Latest Reply

kulkpd
Contributor

02-17-2024 6:37:07 PM

2 kudos

@zero234 ,Adding some suggestion based on answers from @Kaniz_Fatma. Important point to note here: "To define a materialized view in Python, apply @table to a query that performs a static read against a data source. To define a streaming table, apply...

2 kudos

02-17-2024 6:37:07 PM

2 More Replies

by alonisser • Contributor

06-01-2024 12:44:21 PM

424 Views
2 replies
1 kudos

Since moving to dbr 14.3 with python jobs I don't see the stack trace for exceptions

or even the logs don't contain the error line I see (downloaded all logs file from the UI and checked them)How can I see the stacktrace? it's essential to debug certain issues

Data Engineering

424 Views
2 replies
1 kudos

06-01-2024 12:44:21 PM

View Replies

Latest Reply

alonisser
Contributor

06-03-2024 2:59:47 PM

1 kudos

Thanks for the answer, but i fail to see what it has to do with my questions. it's not a "general python error", I run lots of jobs with python on Databricks clusters and know how to run python jobs and dependencies, I'm pointing to a specific issue ...

1 kudos

06-03-2024 2:59:47 PM

1 More Replies

by dzsuzs • New Contributor II

06-03-2024 8:56:45 AM

417 Views
2 replies
1 kudos

OOM Issue in Streaming with foreachBatch()

I have a stateless streaming application that uses foreachBatch. This function executes between 10-400 times each hour based on custom logic. The logic within foreachBatch includes: collect() on very small DataFrames (a few megabytes) --> driver mem...

Data Engineering

417 Views
2 replies
1 kudos

06-03-2024 8:56:45 AM

View Replies

Latest Reply

xorbix_rshiva
New Contributor III

06-03-2024 12:33:27 PM

1 kudos

From the information you provided, your issue might be resolved by setting a watermark on the streaming dataframe. The purpose of watermarks is to set a maximum time for records to be retained in state. Without a watermark, records in your state will...

1 kudos

06-03-2024 12:33:27 PM

1 More Replies

by shanebo425 • New Contributor III

05-29-2024 2:45:34 PM

623 Views
2 replies
0 kudos

Saving Widgets to Git

We use Databricks widgets in our python notebooks to pass parameters in jobs but also for when we are running the notebooks manually (outside of a job context) for various reasons. We're a small team, but I've noticed that when I create a notebook an...

Data Engineering

623 Views
2 replies
0 kudos

05-29-2024 2:45:34 PM

View Replies

Latest Reply

daniel_sahal
Esteemed Contributor

05-30-2024 10:41:27 PM

0 kudos

@shanebo425 You can add your widgets to the code, ex:dbutils.widgets.text("test", "") dbutils.widgets.get("test") Remember that the cell with widget needs to be run in order for widgets to be actually visible in a notebook.

0 kudos

05-30-2024 10:41:27 PM

1 More Replies

by avrm91 • Contributor

05-31-2024 6:02:34 AM

8317 Views
3 replies
1 kudos

Resolved! XML DLT Autoloader - Ingestion of XML Files

I want to ingest multiple XML files with varying but similar structures without defining a schema.For example: <?xml version="1.0" encoding="ISO-8859-1"?> <LIEFERUNG> <ABSENDER> <RZLZ>R00000001</RZLZ> <NAME>Informatik GmbH </NAME> <ST...

Data Engineering

8317 Views
3 replies
1 kudos

05-31-2024 6:02:34 AM

View Replies

Latest Reply

avrm91
Contributor

06-03-2024 7:29:35 AM

1 kudos

@Kaniz_Fatma Thanks a lot.I found an issue in from_xml function.I posted above: SELECT from_xml(CONCAT('<ABSENDER>', ABSENDER, '</ABSENDER>'), schema_of_xml(' <ABSENDER> <RZLZ>R00000001</RZLZ> <NAME>Informatik GmbH</NAME> <STRASSE>M...

1 kudos

06-03-2024 7:29:35 AM

2 More Replies

by daindana • New Contributor III

10-13-2021 5:58:51 PM

4049 Views
8 replies
3 kudos

Resolved! How to preserve my database when the cluster is terminated?

Whenever my cluster is terminated, I lose my whole database(I'm not sure if it's related, I made those database with delta format. ) And since the cluster is terminated in 2 hours from not using it, I wake up with no database every morning.I don't wa...

Data Engineering

4049 Views
8 replies
3 kudos

10-13-2021 5:58:51 PM

View Replies

Latest Reply

dhpaulino
New Contributor II

06-03-2024 11:25:19 AM

3 kudos

As the file still in the dbfs you can just recreate the reference of your tables and continue the work, with something like this:db_name = "mydb" from pathlib import Path path_db = f"dbfs:/user/hive/warehouse/{db_name}.db/" tables_dirs = dbutils.fs....

3 kudos

06-03-2024 11:25:19 AM

7 More Replies

by lnsnarayanan • New Contributor II

08-22-2021 12:05:47 AM

7419 Views
8 replies
11 kudos

Resolved! I cannot see the Hive databases or tables once I terminate the cluster and use another cluster.

I am using Databricks community edition for learning purposes. I created some Hive-managed tables through spark sql as well as with df.saveAsTable options. But when I connect to a new cluser, "Show databases" only returns the default database....

Data Engineering

7419 Views
8 replies
11 kudos

08-22-2021 12:05:47 AM

View Replies

Latest Reply

dhpaulino
New Contributor II

06-03-2024 11:24:08 AM

11 kudos

11 kudos

06-03-2024 11:24:08 AM

7 More Replies

by DBUser2 • New Contributor II

05-31-2024 10:47:44 AM

373 Views
2 replies
0 kudos

Simba Spark ODBC driver .NET core compatibility

HiIs the Simba Spark ODBC driver (2.08.00.1002) compatible with .NET core?

Data Engineering

373 Views
2 replies
0 kudos

05-31-2024 10:47:44 AM

View Replies

Latest Reply

NandiniN
Honored Contributor

05-31-2024 10:01:01 PM

0 kudos

Hi @DBUser2 , I checked the official doc https://www.databricks.com/spark/odbc-drivers-download we currently provide Simba Apache Spark ODBC Connector 2.8.0 In the archives as well it is available until 2.6.15 https://www.databricks.com/spark/odbc-d...

0 kudos

05-31-2024 10:01:01 PM

1 More Replies

by v01d • New Contributor III

06-01-2024 5:11:43 AM

954 Views
2 replies
0 kudos

Databricks Auto Loader authorization exception

Hello,I'm trying to process the DB Auto Loader with notifications=true option (Azure ADLS) and get not clear authorization error. The exception log attached.Looks like all required permission are provided to the service principle:

Data Engineering

954 Views
2 replies
0 kudos

06-01-2024 5:11:43 AM

View Replies

Latest Reply

Kaniz_Fatma
Community Manager

06-03-2024 1:02:55 AM

0 kudos

Hi @v01d, There can be three probable causes - The service principal used for authentication lacks the necessary permissions. Confirm that the service principal has the required permissions on the ADLS.Specifically, ensure that it has Read permissio...

0 kudos

06-03-2024 1:02:55 AM

1 More Replies

by AkasBala • New Contributor III

05-17-2023 7:26:09 PM

1815 Views
3 replies
0 kudos

Primary Key not working as expected on Unity Catalog delta tables

Hi @Chetan Kardekar. I noticed that you had commented on Primary key on Delta tables. Do we have that feature already released in DataBricks Premium. I have a Unity Catalog and I created a table with Primary Key, though it doesnt act like Primary Key...

Data Engineering

1815 Views
3 replies
0 kudos

05-17-2023 7:26:09 PM

View Replies

Latest Reply

Anonymous
Not applicable

06-21-2023 12:12:57 AM

0 kudos

Hi @Bala Akas Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Thanks!

0 kudos

06-21-2023 12:12:57 AM

2 More Replies

User

Count

1603

744

348

285

247

Databricks Community

Forum Posts

Merge operation with ouputMode update in autoloader databricks

Reprocess of old data stored in adls

Generate embeddings from third party API in Delta Live Tables

Resolved! I am trying to use Databricks Autoloader with File Notification Mode

Databricks Certified Associate Developer Exam Got Suspended. Require support for the same.

Resolved! i have created a materialized view table using delta live table pipeline and its not appending data

Since moving to dbr 14.3 with python jobs I don't see the stack trace for exceptions

OOM Issue in Streaming with foreachBatch()

Saving Widgets to Git

Resolved! XML DLT Autoloader - Ingestion of XML Files

Resolved! How to preserve my database when the cluster is terminated?

Resolved! I cannot see the Hive databases or tables once I terminate the cluster and use another cluster.

Simba Spark ODBC driver .NET core compatibility

Databricks Auto Loader authorization exception

Primary Key not working as expected on Unity Catalog delta tables

Compute Policy Does Not Install Libraries

Is there a way to let the DLT pipeline retry by it...

Can't create Catalog on Databricks on AWS

Executing Notebooks - Run All Cells vs Run All Bel...

getting Status code: 301 Moved Permanently error