Data Engineering

Forum Posts

Sorted by:

by VJ3 • New Contributor III

05-31-2024 1:25:56 PM

595 Views
2 replies
0 kudos

Databricks Upload local files (Create/Modify table)

Hello Team,I believe Databricks come out recently feature of Create or modify a table using file upload which is less than 2 GB (file format CSV, TSV, or JSON, Avro, Parquet, or text files to create or overwrite a managed Delta Lake table) on Self Se...

Data Engineering

595 Views
2 replies
0 kudos

05-31-2024 1:25:56 PM

View Replies

Latest Reply

VJ3
New Contributor III

06-05-2024 2:04:28 PM

0 kudos

Hello Nandini,Thank you for reply. Apologies for delay. Let's say I uploaded CSV file containing PII data using Upload feature available in Databricks UI. Will I be able to share that file with another user who should not have access to PII data elem...

0 kudos

06-05-2024 2:04:28 PM

1 More Replies

by PabloFelipe • New Contributor III

06-05-2024 8:05:33 AM

514 Views
4 replies
1 kudos

Resolved! My Libraries are not being installed in dbx-pipelines

Hello,I have some libraries on Azure Artifacts, but when I'm using notebooks, they are unreachable even though I'm explicitly adding the pip extra-url option (I have validated the tokens). So, I had to install them manually by downloading the wheel f...

Data Engineering

Databricks

dbx

514 Views
4 replies
1 kudos

06-05-2024 8:05:33 AM

View Replies

Latest Reply

PabloFelipe
New Contributor III

06-05-2024 1:43:33 PM

1 kudos

@shan_chandrawe solved it, it was an issue with the DevOps key-vault token associated of the artifacts token.

1 kudos

06-05-2024 1:43:33 PM

3 More Replies

by AH • New Contributor

06-05-2024 12:26:58 AM

229 Views
1 replies
0 kudos

Delta Lake Table Daily Read and Write job optimization

I have created 7 job for each business system to extract product data from each postgress source then write all job data into one data lake delta table [raw_product].each business system product table has around 20 GB of data.do the same thing for 15...

Data Engineering

229 Views
1 replies
0 kudos

06-05-2024 12:26:58 AM

View Replies

Latest Reply

shan_chandra
Esteemed Contributor

06-05-2024 12:51:23 PM

0 kudos

@AH - we can try out the config if read or fetch from postgres is slow , we can increase the fetchsize , numPartitions (to increase parallelism). kindly try to do a df.count() to check on slowness. https://spark.apache.org/docs/latest/sql-data-sou...

0 kudos

06-05-2024 12:51:23 PM

by ksilva • New Contributor

04-29-2022 5:04:45 AM

2341 Views
3 replies
1 kudos

Incorrect secret value when loaded as environment variable

I recently faced an issue that took good hours to identify. I'm loading an environment variable with a secretENVVAR: {{secrets/scope/key}}The secret is loaded in my application, I could verify it's there, but its value is not correct. I realised tha...

Data Engineering

2341 Views
3 replies
1 kudos

04-29-2022 5:04:45 AM

View Replies

Latest Reply

danmlopsmaz
New Contributor II

06-05-2024 11:59:23 AM

1 kudos

Hi team, is there an update or fix for this?

1 kudos

06-05-2024 11:59:23 AM

2 More Replies

by SamGreene • Contributor

05-09-2024 9:49:03 AM

1142 Views
4 replies
0 kudos

Resolved! Using parameters in a SQL Notebook and COPY INTO statement

Hi, My scenario is I have an export of a table being dropped in ADLS every day. I would like to load this data into a UC table and then repeat the process every day, replacing the data. This seems to rule out DLT as it is meant for incremental proc...

Data Engineering

1142 Views
4 replies
0 kudos

05-09-2024 9:49:03 AM

View Replies

Latest Reply

SamGreene
Contributor

06-05-2024 10:33:34 AM

0 kudos

The solution that worked what adding this python cell to the notebook: %pythonfrom pyspark.dbutils import DBUtilsdbutils = DBUtils(spark)dbutils.widgets.text("catalog", "my_business_app")dbutils.widgets.text("schema", "dev") Then in the SQL Cell: CRE...

0 kudos

06-05-2024 10:33:34 AM

3 More Replies

by JUPin • New Contributor II

05-30-2024 12:14:51 PM

650 Views
3 replies
0 kudos

REST API for Pipeline Events does not return all records

I'm using the REST API to retrieve Pipeline Events per the documentation:https://docs.databricks.com/api/workspace/pipelines/listpipelineeventsI am able to retrieve some records but the API stops after a call or two. I verified the number of rows us...

Data Engineering

650 Views
3 replies
0 kudos

05-30-2024 12:14:51 PM

View Replies

Latest Reply

JUPin
New Contributor II

06-05-2024 9:39:03 AM

0 kudos

I've attached some screenshots of the API call. It shows "59" records (Event Log API1.png) retrieved and a populated "next_page_token" however, when I pull the next set of data using the "next_page_token", the result set is empty(Event Log API2.png)...

0 kudos

06-05-2024 9:39:03 AM

2 More Replies

by galzamo • New Contributor

06-05-2024 1:09:14 AM

217 Views
1 replies
0 kudos

Job running time too long

Hi all,I'm doing my first data jobs.I create one job that consists of 4 other jobs.Yesterday I ran the 4 jobs separately and it worked fine (about half hour)-today I ran the big job, and the 4 jobs is running for 2 hours (and still running), Why is t...

Data Engineering

217 Views
1 replies
0 kudos

06-05-2024 1:09:14 AM

View Replies

Latest Reply

anardinelli
New Contributor III

06-05-2024 7:49:47 AM

0 kudos

Hello @galzamo how are you? You can check on the SparkUI for long running stages that might give you a clue where it's spending the most time on each task. Somethings can be the reason: 1. Increase of data and partitions on your source data 2. Cluste...

0 kudos

06-05-2024 7:49:47 AM

by EDDatabricks • Contributor

04-03-2024 3:43:54 AM

787 Views
2 replies
0 kudos

Expected size of managed Storage Accounts

Dear all,we are monitoring the size of managed storage accounts associated with our deployed Azure databricks instances.We have 5 databricks instances for specific components of our platform replicated in 4 environments (DEV, TEST, PREPROD, PROD).Dur...

Data Engineering

Filesize

LOGS

Managed Storage Account

787 Views
2 replies
0 kudos

04-03-2024 3:43:54 AM

View Replies

Latest Reply

Kaniz_Fatma
Community Manager

04-05-2024 3:51:51 AM

0 kudos

Hi @EDDatabricks, Let’s address your questions regarding Azure-managed storage accounts: What do these Storage Accounts contain? An Azure storage account contains various data objects, including: Blobs: Used for storing unstructured data like ima...

0 kudos

04-05-2024 3:51:51 AM

1 More Replies

by Kayla • Contributor III

06-04-2024 9:46:34 AM

633 Views
3 replies
6 kudos

Resolved! SQL Warehouse Timeout / Prevent Long Running Queries

We have an external service connecting to a SQL Warehouse, running a query that normally lasts 30 minutes.On occasion an error occurs and it will run for 6 hours.This happens overnight and is contributing to a larger bill. Is there any way to force l...

Data Engineering

633 Views
3 replies
6 kudos

06-04-2024 9:46:34 AM

View Replies

Latest Reply

Kayla
Contributor III

06-05-2024 4:10:05 AM

6 kudos

@lucasrocha @raphaelblg That is exactly what I was hoping to find. Thank you!

6 kudos

06-05-2024 4:10:05 AM

2 More Replies

by Rik • New Contributor III

08-07-2023 7:57:18 AM

3550 Views
5 replies
7 kudos

Resolved! File information is not passed to trigger job on file arrival

We are using the UC mechanism for triggering jobs on file arrival, as described here: https://learn.microsoft.com/en-us/azure/databricks/workflows/jobs/file-arrival-triggers.Unfortunately, the trigger doesn't actually pass the file-path that is gener...

Data Engineering

file arrival

trigger file

Unity Catalog

3550 Views
5 replies
7 kudos

08-07-2023 7:57:18 AM

View Replies

Latest Reply

marcuskw
Contributor

06-05-2024 1:18:23 AM

7 kudos

Also something I'm interested in using, would be really helpful to use File Trigger and get relevant information about exactly what file triggered the workflow!

7 kudos

06-05-2024 1:18:23 AM

4 More Replies

by jackgurae • New Contributor II

06-04-2024 9:32:08 PM

203 Views
0 replies
0 kudos

Best Practices for Filling Table Information for Databricks Assistant

Hi everyone,I’m looking for advice on the best practices for populating table information in Databricks to ensure that non-SQL users (such as PMs and Marketing teams) can easily query the tables using Databricks Assistant.Specifically, I have a few q...

Data Engineering

203 Views
0 replies
0 kudos

06-04-2024 9:32:08 PM

by AlokThampi • New Contributor

06-04-2024 6:01:47 PM

193 Views
0 replies
0 kudos

Issues while writing into bad_records path

Hello All,I would like to get your inputs with a scenario that I see while writing into the bad_records file.I am reading a ‘Ԓ’ delimited CSV file based on a schema that I have already defined. I have enabled error handling while reading the file to ...

Data Engineering

193 Views
0 replies
0 kudos

06-04-2024 6:01:47 PM

by LasseL • New Contributor

06-04-2024 4:52:57 AM

260 Views
1 replies
0 kudos

How to use change data feed when schema is changing between delta table versions?

How to use change data feed when delta table schema changes between delta table versions?I tried to read change data feed in parts (in code snippet I read version 1372, because 1371 and 1373 schema versions are different), but getting errorUnsupporte...

Data Engineering

260 Views
1 replies
0 kudos

06-04-2024 4:52:57 AM

View Replies

Latest Reply

raphaelblg
Honored Contributor II

06-04-2024 2:09:38 PM

0 kudos

Hi @LasseL, Please check: What is the schema for the change data feed? . It might help you

0 kudos

06-04-2024 2:09:38 PM

by Mitashi12 • New Contributor

06-04-2024 6:58:34 AM

192 Views
1 replies
0 kudos

Can we run Databricks sql built in function in spark sql on local

Data Engineering

192 Views
1 replies
0 kudos

06-04-2024 6:58:34 AM

View Replies

Latest Reply

raphaelblg
Honored Contributor II

06-04-2024 2:00:34 PM

0 kudos

Hi @Mitashi12 it's possible as long as you're on DBR 14.3+. Limitations with Databricks Connect for Python .

0 kudos

06-04-2024 2:00:34 PM

by MaximeGendre • New Contributor II

06-04-2024 1:50:01 PM

557 Views
0 replies
0 kudos

Problem using from_avro function

Hello everyone,I need your help with a topic that has been preoccupying me for a few days."from_avro" function gives me a strange result when I pass it the json schema of a Kafka topic.=================================================================...