Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

VJ3
by New Contributor III
  • 595 Views
  • 2 replies
  • 0 kudos

Databricks Upload local files (Create/Modify table)

Hello Team, I believe Databricks recently released a feature to create or modify a table by uploading a file smaller than 2 GB (CSV, TSV, JSON, Avro, Parquet, or text files, to create or overwrite a managed Delta Lake table) on Self Se...

Latest Reply
VJ3
New Contributor III
  • 0 kudos

Hello Nandini, thank you for the reply, and apologies for the delay. Let's say I uploaded a CSV file containing PII data using the upload feature available in the Databricks UI. Will I be able to share that file with another user who should not have access to PII data elem...

1 More Reply
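
The upload UI creates an ordinary managed Unity Catalog table, so access can be narrowed with standard grants. A minimal sketch in a Python notebook cell, with hypothetical table, column, and group names:

spark.sql("GRANT SELECT ON TABLE main.uploads.customers_csv TO `pii_readers`")

# For users who must not see PII, expose a view that omits those columns
# (all names here are placeholders):
spark.sql("""
CREATE OR REPLACE VIEW main.uploads.customers_csv_safe AS
SELECT customer_id, region FROM main.uploads.customers_csv
""")
spark.sql("GRANT SELECT ON TABLE main.uploads.customers_csv_safe TO `non_pii_readers`")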
PabloFelipe
by New Contributor III
  • 514 Views
  • 4 replies
  • 1 kudos

Resolved! My Libraries are not being installed in dbx-pipelines

Hello, I have some libraries on Azure Artifacts, but when I'm using notebooks they are unreachable, even though I'm explicitly adding the pip extra-index-url option (I have validated the tokens). So I had to install them manually by downloading the wheel f...

Data Engineering
Databricks
dbx
Latest Reply
PabloFelipe
New Contributor III
  • 1 kudos

@shan_chandra we solved it; it was an issue with the DevOps key vault token associated with the artifacts token.

3 More Replies
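
For anyone hitting the same wall, a minimal sketch of installing from an Azure Artifacts feed at runtime, with the PAT kept in a secret scope; the org, feed, package, and secret names are hypothetical:

import subprocess, sys

pat = dbutils.secrets.get(scope="devops", key="artifacts-pat")
index_url = f"https://build:{pat}@pkgs.dev.azure.com/myorg/_packaging/myfeed/pypi/simple/"
subprocess.check_call([
    sys.executable, "-m", "pip", "install", "mypackage",
    "--extra-index-url", index_url,
])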
AH
by New Contributor
  • 229 Views
  • 1 reply
  • 0 kudos

Delta Lake Table Daily Read and Write job optimization

I have created 7 jobs, one for each business system, to extract product data from each Postgres source and then write all of the jobs' data into one data lake Delta table [raw_product]. Each business system's product table has around 20 GB of data. We do the same thing for 15...

Latest Reply
shan_chandra
Esteemed Contributor
  • 0 kudos

@AH - if the read or fetch from Postgres is slow, we can try increasing fetchsize and numPartitions (to increase parallelism). Kindly try a df.count() to check on the slowness. https://spark.apache.org/docs/latest/sql-data-sou...

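
A sketch of the partitioned JDBC read the reply suggests, with fetchsize and numPartitions set; the host, database, table, secret names, and partition bounds are placeholders:

df = (spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://pg-host:5432/products_db")
    .option("dbtable", "public.product")
    .option("user", dbutils.secrets.get("scope", "pg-user"))
    .option("password", dbutils.secrets.get("scope", "pg-password"))
    .option("fetchsize", 10000)               # rows per round trip
    .option("partitionColumn", "product_id")  # numeric, date, or timestamp column
    .option("lowerBound", 1)
    .option("upperBound", 10000000)
    .option("numPartitions", 16)              # parallel JDBC connections
    .load())
df.write.mode("append").saveAsTable("raw_product")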
ksilva
by New Contributor
  • 2341 Views
  • 3 replies
  • 1 kudos

Incorrect secret value when loaded as environment variable

I recently faced an issue that took good hours to identify. I'm loading an environment variable with a secret: ENVVAR: {{secrets/scope/key}}. The secret is loaded in my application, and I could verify it's there, but its value is not correct. I realised tha...

Latest Reply
danmlopsmaz
New Contributor II
  • 1 kudos

Hi team, is there an update or fix for this?

2 More Replies
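
The excerpt cuts off before the root cause, but a quick way to check whether the environment variable matches the secret is to compare it against the value fetched through dbutils (scope and key names assumed from the post's reference):

import os

from_env = os.environ.get("ENVVAR")
from_api = dbutils.secrets.get(scope="scope", key="key")
# Printing the secret itself is redacted in notebooks, so compare indirectly:
print("lengths:", len(from_env or ""), len(from_api))
print("match:", from_env == from_api)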
SamGreene
by Contributor
  • 1142 Views
  • 4 replies
  • 0 kudos

Resolved! Using parameters in a SQL Notebook and COPY INTO statement

Hi, My scenario is I have an export of a table being dropped in ADLS every day.  I would like to load this data into a UC table and then repeat the process every day, replacing the data.  This seems to rule out DLT as it is meant for incremental proc...

Latest Reply
SamGreene
Contributor
  • 0 kudos

The solution that worked was adding this Python cell to the notebook:

%python
from pyspark.dbutils import DBUtils
dbutils = DBUtils(spark)
dbutils.widgets.text("catalog", "my_business_app")
dbutils.widgets.text("schema", "dev")

Then in the SQL cell: CRE...

3 More Replies
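
The same pattern can be driven entirely from Python, feeding the widget values into a COPY INTO statement via spark.sql; the landing path and table name below are hypothetical:

catalog = dbutils.widgets.get("catalog")
schema = dbutils.widgets.get("schema")

spark.sql(f"""
COPY INTO {catalog}.{schema}.my_table
FROM 'abfss://landing@myaccount.dfs.core.windows.net/export/'
FILEFORMAT = CSV
FORMAT_OPTIONS ('header' = 'true')
COPY_OPTIONS ('mergeSchema' = 'true')
""")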
JUPin
by New Contributor II
  • 650 Views
  • 3 replies
  • 0 kudos

REST API for Pipeline Events does not return all records

I'm using the REST API to retrieve Pipeline Events per the documentation: https://docs.databricks.com/api/workspace/pipelines/listpipelineevents. I am able to retrieve some records, but the API stops after a call or two. I verified the number of rows us...

Latest Reply
JUPin
New Contributor II
  • 0 kudos

I've attached some screenshots of the API call. It shows "59" records retrieved (Event Log API1.png) and a populated "next_page_token"; however, when I pull the next set of data using the "next_page_token", the result set is empty (Event Log API2.png)...

2 More Replies
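
For reference, a minimal pagination loop against the documented endpoint; the workspace host, token, and pipeline ID are placeholders. If next_page_token is populated but the next page comes back empty, this is the loop that would reproduce it:

import requests

host = "https://<workspace>.azuredatabricks.net"
token = dbutils.secrets.get("scope", "api-token")
pipeline_id = "<pipeline-id>"

events, page_token = [], None
while True:
    params = {"max_results": 100}
    if page_token:
        params["page_token"] = page_token
    resp = requests.get(
        f"{host}/api/2.0/pipelines/{pipeline_id}/events",
        headers={"Authorization": f"Bearer {token}"},
        params=params,
    ).json()
    events.extend(resp.get("events", []))
    page_token = resp.get("next_page_token")
    if not page_token:
        break

print(f"retrieved {len(events)} events")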
galzamo
by New Contributor
  • 217 Views
  • 1 reply
  • 0 kudos

Job running time too long

Hi all, I'm doing my first data jobs. I created one job that consists of 4 other jobs. Yesterday I ran the 4 jobs separately and they worked fine (about half an hour). Today I ran the big job, and the 4 jobs have been running for 2 hours (and are still running). Why is t...

Latest Reply
anardinelli
New Contributor III
  • 0 kudos

Hello @galzamo, how are you? You can check the Spark UI for long-running stages that might give you a clue where it's spending the most time on each task. Some things that can be the reason: 1. Increase of data and partitions in your source data 2. Cluste...

EDDatabricks
by Contributor
  • 787 Views
  • 2 replies
  • 0 kudos

Expected size of managed Storage Accounts

Dear all, we are monitoring the size of the managed storage accounts associated with our deployed Azure Databricks instances. We have 5 Databricks instances for specific components of our platform, replicated in 4 environments (DEV, TEST, PREPROD, PROD). Dur...

Data Engineering
Filesize
LOGS
Managed Storage Account
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @EDDatabricks, let's address your questions regarding Azure managed storage accounts. What do these storage accounts contain? An Azure storage account contains various data objects, including: Blobs: used for storing unstructured data like ima...

1 More Reply
Kayla
by Contributor III
  • 633 Views
  • 3 replies
  • 6 kudos

Resolved! SQL Warehouse Timeout / Prevent Long Running Queries

We have an external service connecting to a SQL Warehouse, running a query that normally lasts 30 minutes. On occasion an error occurs and it will run for 6 hours. This happens overnight and is contributing to a larger bill. Is there any way to force l...

Latest Reply
Kayla
Contributor III
  • 6 kudos

@lucasrocha @raphaelblg That is exactly what I was hoping to find. Thank you!

2 More Replies
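
The accepted answer isn't visible in the excerpt; one documented knob for capping runtime on a SQL warehouse is the STATEMENT_TIMEOUT configuration parameter (in seconds), settable per session or workspace-wide in the SQL admin settings. A sketch:

# Assumes the session is running against a Databricks SQL warehouse.
spark.sql("SET STATEMENT_TIMEOUT = 7200")  # cancel statements after 2 hours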
Rik
by New Contributor III
  • 3550 Views
  • 5 replies
  • 7 kudos

Resolved! File information is not passed to trigger job on file arrival

We are using the UC mechanism for triggering jobs on file arrival, as described here: https://learn.microsoft.com/en-us/azure/databricks/workflows/jobs/file-arrival-triggers. Unfortunately, the trigger doesn't actually pass the file path that is gener...

Data Engineering
file arrival
trigger file
Unity Catalog
Latest Reply
marcuskw
Contributor
  • 7 kudos

This is also something I'm interested in using; it would be really helpful to use the file trigger and get relevant information about exactly which file triggered the workflow!

4 More Replies
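
Until the trigger exposes the file path, one workaround is to let the triggered job discover new files itself with Auto Loader, which checkpoints what it has already processed; the paths and table names below are hypothetical:

from pyspark.sql.functions import col

df = (spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/Volumes/main/default/chk/schema")
    .load("abfss://landing@myaccount.dfs.core.windows.net/incoming/")
    .withColumn("source_file", col("_metadata.file_path")))  # record which file each row came from

(df.writeStream
    .option("checkpointLocation", "/Volumes/main/default/chk/data")
    .trigger(availableNow=True)  # drain the backlog, then stop
    .toTable("main.default.landing_raw"))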
AlokThampi
by New Contributor
  • 193 Views
  • 0 replies
  • 0 kudos

Issues while writing into bad_records path

Hello All, I would like to get your input on a scenario I see while writing into the bad_records file. I am reading a 'Ԓ'-delimited CSV file based on a schema that I have already defined. I have enabled error handling while reading the file to ...

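
The excerpt cuts off before the actual symptom, but for context, a sketch of the read pattern described: a custom-delimited CSV with a predefined schema and badRecordsPath enabled (the schema and paths are placeholders):

from pyspark.sql.types import StructType, StructField, StringType

schema = StructType([
    StructField("id", StringType()),
    StructField("name", StringType()),
])

df = (spark.read.format("csv")
    .schema(schema)
    .option("sep", "Ԓ")
    .option("badRecordsPath", "abfss://data@myaccount.dfs.core.windows.net/bad_records/")
    .load("abfss://data@myaccount.dfs.core.windows.net/input/"))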
LasseL
by New Contributor
  • 260 Views
  • 1 reply
  • 0 kudos

How to use change data feed when the schema changes between Delta table versions?

How can I use the change data feed when the Delta table schema changes between table versions? I tried to read the change data feed in parts (in the code snippet I read version 1372, because the 1371 and 1373 schema versions are different), but I'm getting the error: Unsupporte...

Latest Reply
raphaelblg
Honored Contributor II
  • 0 kudos

Hi @LasseL, please check "What is the schema for the change data feed?" in the documentation. It might help you.

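
A sketch of the per-version read the post describes: pinning startingVersion and endingVersion to a schema-stable range so each read uses a single schema (version numbers from the post, path hypothetical):

df = (spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 1372)
    .option("endingVersion", 1372)
    .load("/path/to/table"))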
MaximeGendre
by New Contributor II
  • 557 Views
  • 0 replies
  • 0 kudos

Problem using from_avro function

Hello everyone, I need your help with a topic that has been preoccupying me for a few days. The "from_avro" function gives me a strange result when I pass it the JSON schema of a Kafka topic. ...

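
Without the attached screenshots the exact symptom is unclear, but one common cause of garbled from_avro output on Kafka data is the Confluent wire format, whose 5-byte header (magic byte plus schema-registry ID) must be stripped before decoding. A sketch with a hypothetical topic and schema:

from pyspark.sql.avro.functions import from_avro
from pyspark.sql.functions import expr

avro_schema = '{"type":"record","name":"Product","fields":[{"name":"id","type":"long"},{"name":"name","type":"string"}]}'

df = (spark.read.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "products")
    .load())

# Skip the 5-byte Confluent header if the producer used Schema Registry:
payload = expr("substring(value, 6, length(value) - 5)")
decoded = df.select(from_avro(payload, avro_schema).alias("rec")).select("rec.*")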
