Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

suchitpathak08
by New Contributor
  • 391 Views
  • 3 replies
  • 0 kudos

Urgent Assistance Needed – Unity Catalog Storage Access Failure & VM SKU Availability (Databricks on

Hi everyone, I’m running into two blocking issues while trying to run a Delta Live Tables (DLT) pipeline on Databricks (Azure). I’m hoping someone can help me understand what’s going wrong. 1. Unity Catalog cannot access underlying ADLS storage. Every DL...

Latest Reply
bianca_unifeye
Contributor
  • 0 kudos

DLT pipelines always spin up job compute, and Azure is strict about SKU availability per region and per subscription. Most common causes: the quota for that VM family is set to 2 vCPUs (Databricks shows “Estimated available: 2” or “QuotaExceeded”). The SKU exists...

2 More Replies
Suheb
by Contributor
  • 192 Views
  • 1 reply
  • 2 kudos

How can I efficiently archive old data in Delta tables without slowing queries?

How can I remove or move older rows from my main Delta table so that queries on recent data are faster, while still keeping access to the historical data if needed?

Latest Reply
Coffee77
Contributor III
  • 2 kudos

Hi Suheb, when using Delta tables with Databricks, as long as you use proper liquid clustering keys or partitions, you should get good performance compared to relational engines when dealing with big data volumes. However, you can also separate tab...

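The archive-then-delete pattern discussed in this thread can be sketched roughly as below. This is a minimal sketch, not the replier's actual code: the table names (`sales`, `sales_archive`), the `event_date` column, and the 365-day retention window are all hypothetical, and `sales_archive` is assumed to already exist with the same schema.

```sql
-- Copy rows older than the retention window into the archive table.
INSERT INTO sales_archive
SELECT * FROM sales
WHERE event_date < current_date() - INTERVAL 365 DAYS;

-- Remove the archived rows from the main table so queries on
-- recent data scan fewer files.
DELETE FROM sales
WHERE event_date < current_date() - INTERVAL 365 DAYS;

-- Reclaim the deleted files once the Delta retention period has passed.
VACUUM sales;
```

Clustering or partitioning both tables on `event_date`, and running OPTIMIZE on the main table afterwards, keeps scans over recent data tight while historical rows stay queryable in the archive table.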
Suheb
by Contributor
  • 418 Views
  • 4 replies
  • 4 kudos

What are common pitfalls when migrating large on-premise ETL workflows to Databricks and how did you

When moving your big data pipelines from local servers to Databricks, what problems usually happen, and how did you fix them?

Latest Reply
tarunnagar
Contributor
  • 4 kudos

Migrating large on-premise ETL workflows to Databricks often goes wrong when teams try to “lift and shift” legacy logic directly into Spark. Poor data layout, tiny files, and inefficient partitioning can quickly hurt performance, so restructuring dat...

3 More Replies
pooja_bhumandla
by New Contributor III
  • 383 Views
  • 2 replies
  • 1 kudos

Error: Executor Memory Issue with Broadcast Joins in Structured Streaming – Unable to Store 69–80 MB

Hi Community, I encountered the following error: Failed to store executor broadcast spark_join_relation_1622863 (size = Some(67141632)) in BlockManager with storageLevel=StorageLevel(memory, deserialized, 1 replicas) in a Structured S...

Latest Reply
Yogesh_Verma_
Contributor II
  • 1 kudos

What Spark does during a broadcast join: Spark identifies the smaller table (say 80 MB). The driver collects this small table to a single JVM. The driver serializes the table into a broadcast variable. The broadcast variable is shipped to all executors. Ex...

1 More Replies
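One common mitigation for broadcast relations that are too large for executor memory (not necessarily the fix for this exact pipeline) is to stop Spark from auto-broadcasting the table. A minimal sketch; the table names `facts` and `dim` and the join columns are placeholders:

```sql
-- Lower (or disable) the automatic broadcast threshold so an ~80 MB
-- table is no longer broadcast; -1 disables auto-broadcast entirely.
SET spark.sql.autoBroadcastJoinThreshold = -1;

-- Or force a sort-merge join for a single query with a hint instead
-- of changing a session-wide setting.
SELECT /*+ MERGE(dim) */ f.*, dim.label
FROM facts f
JOIN dim ON f.dim_id = dim.id;
```

In a Structured Streaming job the configuration has to be set before the streaming query starts, e.g. via `spark.conf.set(...)` in the notebook or job setup.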
mkkao924
by New Contributor II
  • 401 Views
  • 3 replies
  • 1 kudos

Best practice to handle SQL table archives?

Many of our sources are set up in a way that the main table only keeps a small amount of data, and historical data is moved to another archive table with a very similar schema. My goal is to have one table in Databricks, maybe with a flag to indicate if th...

Latest Reply
Coffee77
Contributor III
  • 1 kudos

I would need to dive deeper into your scenario, but it sounds to me like a strategy could be: 1) Create a view in your SQL Server database with "current data" UNION "historical data". You can set an additional boolean field to True in the first query and False ...

2 More Replies
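Step 1 of the strategy in the reply above (a source-side view that unions current and archived rows with a flag) could look roughly like this. A sketch only: the schema, table names, and column are hypothetical, and it assumes both tables share the same column layout.

```sql
-- Hypothetical SQL Server-side view combining current and archived rows,
-- with a flag telling downstream consumers which source a row came from.
CREATE VIEW dbo.orders_unified AS
SELECT o.*, CAST(1 AS BIT) AS is_current
FROM dbo.orders o
UNION ALL
SELECT a.*, CAST(0 AS BIT) AS is_current
FROM dbo.orders_archive a;
```

Ingesting that single view into Databricks then yields one table, with `is_current` marking whether a row came from the main or the archive table.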
Direo
by Contributor II
  • 357 Views
  • 4 replies
  • 0 kudos

[DAB] registered_model aliases not being applied to Unity Catalog despite successful deploy

HiI'm experiencing an issue with Databricks Asset Bundles where model aliases defined in the bundle configuration are not being applied to Unity Catalog, even though the deployment succeeds and the Terraform state shows the aliases are set.Environmen...

Latest Reply
iyashk-DB
Databricks Employee
  • 0 kudos

Can you try explicitly running: databricks model-versions get-by-alias <catalog>.<schema>.<model> staging

3 More Replies
JackR
by New Contributor II
  • 380 Views
  • 1 reply
  • 2 kudos

Resolved! Inconsistent behaviour when using read_files to read UTF-8 BOM encoded csv

I have a simple piece of code to read a csv file from an AWS S3 bucket:

SELECT *
FROM read_files(
    myfile,
    format => 'csv',
    header => true,
    inferSchema => true,
    mode => 'FAILFAST')

It's a large file with over 100 column...

Latest Reply
bianca_unifeye
Contributor
  • 2 kudos

Short version: this is (unfortunately) a Databricks quirk, not you going mad. The SQL read_files path and the PySpark spark.read.csv path do not use the exact same schema inference code, and CSVs with a UTF-8 BOM hit a corner case where read_files fa...

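Since the reply above attributes the inconsistency to schema inference hitting a BOM corner case, one workaround is to avoid relying on inference for the affected columns. A sketch of the poster's query with pinned types; the `schemaHints` contents (`id`, `amount`) are placeholder column names, not from the thread:

```sql
-- Pin the types of problem columns up front so inference's BOM
-- corner case doesn't decide them; 'myfile' is a placeholder path.
SELECT *
FROM read_files(
    'myfile',
    format => 'csv',
    header => true,
    schemaHints => 'id BIGINT, amount DOUBLE',
    mode => 'FAILFAST');
```

Supplying the full schema (or hints for every ambiguous column) makes the SQL `read_files` path and the PySpark `spark.read.csv` path agree, since neither has to infer the disputed types.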
RIDBX
by Contributor
  • 810 Views
  • 6 replies
  • 1 kudos

Pushing data from databricks (cloud) to Oracle (on-prem) instance?

Thanks for reviewing my threads. I found some threads on this subject dated 2022 by @Ajay-Pandey (Databricks to Oracle). We find many...

Latest Reply
iyashk-DB
Databricks Employee
  • 1 kudos

Option 1: Spark JDBC write from Databricks to Oracle (recommended for "push"/ingestion). Use the built-in Spark JDBC writer with Oracle's JDBC driver. It's the most direct path for writing into on-prem Oracle and gives you control over batching, paral...

5 More Replies
ChrisHunt
by New Contributor III
  • 792 Views
  • 9 replies
  • 1 kudos

Resolved! Databricks external table lagging behind source files

I have a databricks external table which is pointed at an S3 bucket which contains an ever-growing number of parquet files (currently around 2000 of them). Each row in the file is timestamped to indicate when it was written. A new parquet file is add...

Latest Reply
ChrisHunt
New Contributor III
  • 1 kudos

Thanks for your answers. I got a solution in the end, but it was more weirdness. A colleague fired the same query at the same database on his machine, and got the latest data! So I rebooted my PC and opened a new Databricks session, and I got the late...

8 More Replies
adriennn
by Valued Contributor
  • 4809 Views
  • 3 replies
  • 5 kudos

Resolved! SQL Warehouse - Table does not support overwrite by expression:

I'm copying data from a foreign catalog using a replace where logic in the target table, this work fine for two other tables. But for a specific one, I keep getting this error:Table does not support overwrite by expression: DeltaTableV2(org.apache.sp...

Latest Reply
aakashnand-kt
New Contributor III
  • 5 kudos

Thank you @adriennn, I encountered the same issue and your post helped me resolve it. I agree that the error message given by Databricks is not very helpful; I wasted a lot of time investigating table properties before I found your post.

2 More Replies
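For reference, the "replace where" copy the thread describes typically looks like the sketch below; the error tends to surface when the target table does not meet the requirements for overwrite-by-expression. All catalog, schema, table, and column names here are hypothetical, not taken from the thread.

```sql
-- Selective overwrite of a date range in the target, sourced from a
-- foreign catalog; only rows matching the predicate are replaced.
INSERT INTO target_catalog.sales.orders
REPLACE WHERE order_date >= '2024-01-01'
SELECT *
FROM foreign_catalog.src.orders
WHERE order_date >= '2024-01-01';
```

When this error appears for just one table among several, comparing that table's type (managed vs. external, Delta vs. non-Delta) and properties against the working ones is usually the fastest way to spot the difference.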
dbernstein_tp
by New Contributor III
  • 393 Views
  • 4 replies
  • 2 kudos

Resolved! Naming question about SQL server database schemas

I have an MS SQL server database that has several schemas we need to ingest data from. Call them "SCHEMA1" tables and "SCHEMA2" tables. Let's call the server S and the database D. In unity catalog I have a catalog called "staging" where the staging (...

Latest Reply
dbernstein_tp
New Contributor III
  • 2 kudos

Thanks for the responses! @K_Anudeep suggestion makes sense in the context of our current lakehouse architecture so I think I will migrate to that.

3 More Replies
dkhodyriev1208
by New Contributor II
  • 441 Views
  • 4 replies
  • 2 kudos

Spark SQL INITCAP not capitalizing letters after periods in abbreviations

Using SELECT INITCAP("text (e.g., text, text, etc.)"), abbreviations with periods like "e.g." are not being fully capitalized. Current behavior: Input: "text (e.g., text, text, etc.)" Output: "Text (e.g., Text, Text, Etc.)" Expected behavior: Output: "Text ...

Latest Reply
iyashk-DB
Databricks Employee
  • 2 kudos

Yes, similar to what @Coffee77 said, you can alternatively create a SQL function with the custom logic using regexp and use it directly:

CREATE OR REPLACE FUNCTION PROPER_WITH_ABBREVIATIONS(input STRING)
RETURNS STRING
RETURN regexp_replace(INI...

3 More Replies
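The function in the reply above is cut off, so here is a self-contained sketch of the same idea: run INITCAP first, then fix up known abbreviations with a regexp. The body and the abbreviation handled are my guess at the approach, not the author's actual code.

```sql
-- After INITCAP, rewrite listed abbreviations to their fully
-- capitalized form; extend the pattern for i.e., etc. as needed.
CREATE OR REPLACE FUNCTION PROPER_WITH_ABBREVIATIONS(input STRING)
RETURNS STRING
RETURN regexp_replace(
  INITCAP(input),
  '(?i)\\be\\.g\\.',
  'E.G.');
```

Because the fix-up runs on INITCAP's output, the rest of the string keeps normal title-casing and only the abbreviations in the pattern are rewritten.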
Swathik
by New Contributor III
  • 396 Views
  • 1 reply
  • 0 kudos

Resolved! Best Practices for implementing DLT, Autoloader in Workflows

I am in the process of designing a Medallion architecture where the data sources include REST API calls, JSON files, SQL Server, and Azure Event Hubs. For the Silver and Gold layers, I plan to leverage Delta Live Tables (DLT). However, I am seeking gu...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The optimal approach for implementing the Bronze layer in a Medallion architecture with Delta Live Tables (DLT) involves balancing batch and streaming ingestion patterns, especially when combining DLT and Autoloader. The trigger(availableNow=True) op...

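A Bronze ingestion table combining DLT with Autoloader, as discussed above, can be sketched in DLT SQL like this. The table name, storage path, and extra `source_file` column are placeholders, and the syntax assumes a recent DLT release that supports `CREATE OR REFRESH STREAMING TABLE`.

```sql
-- Hypothetical DLT SQL bronze table ingesting JSON files with Auto Loader;
-- the landing path is a placeholder.
CREATE OR REFRESH STREAMING TABLE bronze_events
AS SELECT
  *,
  _metadata.file_path AS source_file
FROM STREAM read_files(
  'abfss://landing@mystorage.dfs.core.windows.net/events/',
  format => 'json');
```

The batch-style `trigger(availableNow=True)` pattern mentioned in the reply applies when you run the same ingestion as a scheduled pipeline update rather than a continuous stream.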
SupunK
by New Contributor II
  • 366 Views
  • 1 reply
  • 2 kudos

Databricks always loads built-in BigQuery connector (0.22.2), can’t override with 0.43.x

I am using Databricks Runtime 15.4 (Spark 3.5 / Scala 2.12) on AWS. My goal is to use the latest Google BigQuery connector because I need the direct write method (BigQuery Storage Write API): option("writeMethod", "direct"). This allows writing directly ...

Latest Reply
mark_ott
Databricks Employee
  • 2 kudos

There is no supported way on Databricks Runtime 15.4 to override or replace the built-in BigQuery connector to use your own version (such as 0.43.x) in order to access the direct write method. Databricks clusters come preloaded with their own managed...

Mathias_Peters
by Contributor II
  • 180 Views
  • 1 reply
  • 0 kudos

Question on how to properly write a dataset of custom objects to MongoDB

Hi, I am implementing a Spark Job in Kotlin (unfortunately a must-have) which reads from and writes to MongoDB. The reason for this is to reuse existing code in a MapFunction. The result of applying that map is a DataSet of type Consumer, a custom ob...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

You are correct: when you pass a BsonDocument to Spark's MongoDB connector using .write().format("mongodb"), Spark treats unknown types as generic serialized blobs, leading to documents stored as a single binary field (as you observed) rather than as ...
