Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Trifa
by New Contributor II
  • 1525 Views
  • 3 replies
  • 1 kudos

Resolved! Override DLT Full Refresh using a Job parameter

Hello, I have a Job with a DLT pipeline as a first task. From time to time, I want to execute this Job with a full refresh of the DLT pipeline. How could I override my default "full_refresh = false"? This was possible before using the Legacy parameters...

Latest Reply
adriennn
Valued Contributor
  • 1 kudos

@Trifa luckily, it's simple to implement. You can bet the guys are going to release Pipeline Parameters® a week after you have deployed your solution though

2 More Replies
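A rough sketch of the usual workaround: the Jobs 2.1 run-now endpoint accepts a pipeline_params object whose full_refresh flag overrides the DLT task's refresh mode for that one run. The job ID and any endpoint details here are placeholders, not values from this thread.

```python
import json

def run_now_payload(job_id: int, full_refresh: bool) -> dict:
    """Build a Jobs API 2.1 run-now payload that overrides the
    DLT pipeline's refresh mode for this run only."""
    payload = {"job_id": job_id}
    if full_refresh:
        # pipeline_params.full_refresh asks the DLT task to do a full refresh
        payload["pipeline_params"] = {"full_refresh": True}
    return payload

# POSTing this JSON to /api/2.1/jobs/run-now (with a bearer token)
# triggers the job; omitting pipeline_params keeps the default run.
print(json.dumps(run_now_payload(123, True)))
```

Leaving pipeline_params out preserves the default "full_refresh = false" behavior, so only the occasional manual trigger needs the flag.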
Kumarn031425
by New Contributor
  • 1598 Views
  • 1 replies
  • 0 kudos

Automating Migration of Delta Live Tables Pipelines Across Environments Using Azure DevOps CI/CD

I am seeking guidance on automating the migration of Delta Live Tables (DLT) pipelines across various environments—specifically from development to testing, and ultimately to production—utilizing Azure DevOps for Continuous Integration and Continuous...

Latest Reply
ashraf1395
Honored Contributor
  • 0 kudos

Hi there @Kumarn031425, I guess this video tutorial will answer most of your questions: https://youtu.be/SZM49lGovTg?si=X7Cwp0Wfqlo1OnuS Here, a tutorial on deployment of workspace resources using Databricks, Azure DevOps and Databricks Asset Bundles is...

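One common pattern for the migration the question describes, sketched under the assumption that the DLT pipelines are defined in a Databricks Asset Bundle with one target per environment in databricks.yml. The target names below are hypothetical:

```python
import subprocess

def bundle_deploy_cmd(target: str) -> list[str]:
    # One bundle, promoted through environments by switching the -t
    # target; targets (dev/test/prod) are defined in databricks.yml.
    return ["databricks", "bundle", "deploy", "-t", target]

# In an Azure DevOps pipeline, each stage would run one of these:
for env in ("dev", "test", "prod"):
    print(" ".join(bundle_deploy_cmd(env)))
    # subprocess.run(bundle_deploy_cmd(env), check=True)  # uncomment in CI
```

Each Azure DevOps stage then only differs in the target it passes, which keeps the pipeline definition itself identical across environments.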
WYO
by New Contributor II
  • 1103 Views
  • 1 replies
  • 0 kudos
Latest Reply
ashraf1395
Honored Contributor
  • 0 kudos

Hi there @WYO, I don't think we have a way to add multiple notebooks for a single DLT pipeline from the DLT pipeline configuration settings. But there can be another way - you can create a single notebook which has multiple code blocks which use the run ...

Meghana89
by New Contributor II
  • 1337 Views
  • 2 replies
  • 0 kudos

Read Write Stream Data from Event Hub to Databricks Delta Lake

I am trying to read streaming data from Event Hub, which is in JSON format. I am able to read the data into a DataFrame, but the body column was coming through as binary; I converted it to string and decoded it, but while implementing the write stream I am facing an ongoin...

Latest Reply
Meghana89
New Contributor II
  • 0 kudos

@SantoshJoshi Thanks for the reply, please find the code snippet below:

from pyspark.sql import functions as F
from pyspark.sql.types import StringType
import base64

# Define the Event Hubs connection string
connectionString = endpoint (replace with endpoint f...

1 More Replies
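The binary-body step the poster describes can be illustrated outside Spark. In PySpark it is typically F.col("body").cast("string") followed by from_json; the plain-Python sketch below shows the same decode-then-parse idea with a made-up payload:

```python
import json

def decode_event_body(raw: bytes) -> dict:
    """Event Hubs delivers the message body as bytes; cast it to a
    string, then parse the JSON payload."""
    text = raw.decode("utf-8")  # equivalent of col("body").cast("string")
    return json.loads(text)

# Hypothetical event body, for illustration only
sample = b'{"device": "sensor-1", "temp": 21.5}'
event = decode_event_body(sample)
print(event["device"])  # sensor-1
```

The write-stream error itself would need the stack trace from the truncated post to diagnose; this only covers the decode path that precedes it.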
908314
by New Contributor II
  • 996 Views
  • 3 replies
  • 2 kudos

Cluster logs stopped getting written to S3

We have two Databricks Workspaces, and since a couple of days ago cluster logs are not getting persisted to S3 in both workspaces. Driver logs are available in the Databricks UI only while the job is active. Haven't seen any errors in the job logs relate...

Latest Reply
adriantaut
New Contributor II
  • 2 kudos

Hello, we are facing the same issue in both our Workspaces; our Cluster logs suddenly stopped being delivered to S3 on the 12th of March. There were no changes to Cluster settings or on the IAM side, and all IAM permissions should be in place according to the Databricks Off...

2 More Replies
DylanStout
by Contributor
  • 2004 Views
  • 2 replies
  • 0 kudos

Resolved! DBFS folder access

When trying to open a folder in dbfs, mnt in my case, my whole team gets the following error message - Uncaught Error: No QueryClient set, use QueryClientProvider to set one. Reloading the page results in this error not showing up anymore, but the fo...

Latest Reply
DylanStout
Contributor
  • 0 kudos

Compute had to be assigned first before being able to open the folder; this was done automatically before. The error message, however, does not make it clear at all that this has to be done or that this is what is causing the error.

1 More Replies
sparklez
by New Contributor III
  • 1087 Views
  • 2 replies
  • 1 kudos

DAB fails to deploy on first try: "TLS protocol version not supported"

Hello, I'm using the Databricks extension in VS Code, and when I attempt to deploy I often get this error: "Error: Post "https://adb-xxxxxx.xx.azuredatabricks.net/api/2.0/workspace-files/import-file/Workspace%2FUsers%2FXXXX@XXX.com%2F.bundle%2Fxxxx%2Floc...

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

Hi @sparklez, is your network blocking TLS 1.2? Databricks supports TLS 1.2 and TLS 1.3, so I am wondering if either of these is blocked.

1 More Replies
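A quick way to check the client side of this, using only the Python standard library (run it from the network where the deploy fails):

```python
import ssl

# Report which TLS protocol versions this client can negotiate;
# Databricks endpoints expect TLS 1.2 or 1.3.
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse anything older

print("TLS 1.2 compiled in:", ssl.HAS_TLSv1_2)
print("TLS 1.3 compiled in:", ssl.HAS_TLSv1_3)
print("client floor:", ctx.minimum_version.name)
```

If both versions are compiled in and the error persists, the negotiation is more likely being broken in transit (proxy, firewall, TLS inspection) than by the local client.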
azam-io
by New Contributor II
  • 1838 Views
  • 2 replies
  • 0 kudos

Passing Different Job Parameters or Environment Variables Based on Targets in DABs

Hi, I am currently using DABs in my project and have encountered an issue. Specifically, I need to pass different job parameters or environment variables depending on the target, and for specific jobs. Could you please provide guidance on how to approa...

Latest Reply
Nivethan_Venkat
Valued Contributor
  • 0 kudos

Hi @azam-io, please find the best practices for deploying jobs using Asset Bundles configuration here: https://docs.databricks.com/aws/en/dev-tools/bundles/settings If you would like to pass different parameters for your same job in different environme...

1 More Replies
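Conceptually, bundle targets behave like the sketch below: a base job definition plus per-target overrides merged at deploy time. The parameter names and targets here are invented for illustration; the real mechanism is the targets: section of databricks.yml, as described in the linked docs.

```python
# Base job parameters, with per-target override maps (all names hypothetical).
BASE_PARAMETERS = {"env": "dev", "batch_size": "100"}
TARGET_OVERRIDES = {
    "test": {"env": "test"},
    "prod": {"env": "prod", "batch_size": "1000"},
}

def resolve(target: str) -> dict:
    # Start from the base definition, then apply the target's overrides,
    # mirroring how bundle targets override shared resource settings.
    params = dict(BASE_PARAMETERS)
    params.update(TARGET_OVERRIDES.get(target, {}))
    return params

print(resolve("prod"))
```

Keeping the shared definition in one place and the per-environment differences in small override maps is the same design the bundle targets encourage.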
mlivshutz
by New Contributor II
  • 2878 Views
  • 2 replies
  • 2 kudos

How to configure DAB bundles to run serverless

I am following the guidelines in https://docs.databricks.com/aws/en/dev-tools/bundles/jobs-tutorial to set up the job for serverless. It says to "omit the job_clusters configuration from the bundle configuration file." It sounds like the idea is to si...

Latest Reply
mlivshutz
New Contributor II
  • 2 kudos

Hi @ashraf1395, thank you for looking at my question. My CLI is 0.243, which is current as of today (3/17/25). The task definition within resources/dbx_backfill_emotion_job.yml:

tasks:
  - task_key: dbx_backfill_base_fields_x_1
    # job_...

1 More Replies
noorbasha534
by Valued Contributor II
  • 1793 Views
  • 1 replies
  • 0 kudos

Databricks Jobs API - Throttling

Dear all, I am planning to execute a script that fetches Databricks job statuses every 10 minutes. I have around 500 jobs in my workspace. The APIs I use are listed below - list runs, get all job runs. I was wondering if this could cause throttling as t...

Latest Reply
koji_kawamura
Databricks Employee
  • 0 kudos

Hi @noorbasha534, different limits apply to different API endpoints. The "/jobs/runs/list" endpoint has a limit of 30 requests/second. The number of concurrent task executions is limited to 2000. These limits work separately, so the job list API ...

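At 500 jobs polled every 10 minutes the script is unlikely to hit the 30 requests/second limit mentioned above, but a small client-side limiter makes that explicit. A minimal sketch (the 10 req/s figure is an arbitrary safety margin, not a documented number):

```python
import time

class RateLimiter:
    """Space out calls so a polling loop stays well under the
    documented 30 req/s on /jobs/runs/list."""
    def __init__(self, per_second: float, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / per_second
        self.clock = clock      # injectable for testing
        self.sleep = sleep
        self.next_at = 0.0

    def wait(self):
        # Block until the next request slot, then reserve the one after it.
        now = self.clock()
        if now < self.next_at:
            self.sleep(self.next_at - now)
            now = self.next_at
        self.next_at = now + self.interval

limiter = RateLimiter(10)       # at most ~10 calls/second
for _ in range(3):
    limiter.wait()
    # requests.get(f"{host}/api/2.2/jobs/runs/list", ...)  # real call here
```

Pairing this with exponential backoff on HTTP 429 responses covers the case where other clients in the workspace share the same quota.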
brickster_2018
by Databricks Employee
  • 12377 Views
  • 6 replies
  • 3 kudos
Latest Reply
VasuBajaj
New Contributor II
  • 3 kudos

A .CRC file (Cyclic Redundancy Check) is an internal checksum file used by Spark (and Hadoop) to ensure data integrity when reading and writing files. Data Integrity Check - .CRC files store checksums of the actual data files. When reading a file, Spark/H...

5 More Replies
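The integrity check described above can be demonstrated with Python's built-in CRC32, the same family of checksum the .crc sidecar files use (the byte strings are made up for illustration):

```python
import zlib

# Same idea as Spark/Hadoop .crc sidecar files: store a checksum of
# the data, recompute it on read, and flag corruption on mismatch.
data = b"some parquet bytes"
stored_crc = zlib.crc32(data)          # written alongside the data file

assert zlib.crc32(data) == stored_crc          # intact file verifies
corrupted = b"some parquet byteZ"              # simulated bit rot
assert zlib.crc32(corrupted) != stored_crc     # mismatch is detected
print("checksum ok")
```

The checksum is cheap to compute relative to the I/O, which is why it is practical to verify on every read.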
BricksGuy
by New Contributor III
  • 1320 Views
  • 1 replies
  • 0 kudos

DLT Pipeline OOM issue

Hi, I am seeing performance issues in one of my pipelines, which is taking 5 hours to run even with no data where it was taking 1 hour earlier. It seems that as the volume of the source grows, the performance keeps degrading. I have the below setup. Source i...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi BricksGuy, how are you doing today? As per my understanding, it looks like your pipeline is slowing down because it is processing too many small Parquet files - over 10 million - which is causing high metadata overhead and memory issues. Since Spark ha...

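The small-file problem described in the reply is usually addressed by compaction (e.g. Delta's OPTIMIZE). The sketch below only illustrates the planning idea - packing many small files into roughly 128 MB rewrite batches so the engine tracks thousands of files instead of millions - with hypothetical sizes; it is not the actual OPTIMIZE algorithm:

```python
TARGET = 128 * 1024 * 1024  # common target file size for compaction

def plan_batches(file_sizes, target=TARGET):
    # Greedily pack small files into batches close to the target size;
    # each batch would be rewritten as one large file.
    batches, current, current_size = [], [], 0
    for size in file_sizes:
        if current and current_size + size > target:
            batches.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        batches.append(current)
    return batches

small_files = [4 * 1024 * 1024] * 100      # 100 hypothetical 4 MB files
print(len(plan_batches(small_files)))      # far fewer rewrite targets
```

On Delta tables the practical fix is usually to run OPTIMIZE (or enable auto compaction) rather than hand-rolling this, but the sketch shows why fewer, larger files reduce metadata overhead.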
lmorrissey
by New Contributor II
  • 880 Views
  • 3 replies
  • 0 kudos

Unable to connect to MongoDB Spark connector on a shared cluster

The connector works without issue if the cluster is made private; does anyone know why this is or have a workaround (besides spawning a bunch of private clusters)

Latest Reply
dewman
New Contributor II
  • 0 kudos

Any news on this? I too am having issues where a dedicated cluster can read from MongoDB no problem, but as soon as I try to run the notebook on a shared cluster, I get a ConflictType (of class com.mongodb.spark.sql.types.ConflictTypes) error

2 More Replies
Soumik
by New Contributor II
  • 3197 Views
  • 2 replies
  • 1 kudos

#N/A value is coming as null/NaN while using pandas.read_excel

Hi All, I am trying to read an input_file.xlsx file using pandas.read_excel. I am using the below options:

import pandas as pd
df = pd.read_excel(input_file, sheetname=sheetname, dtype=str, na_filter=False, keep_default_na=False)

Not sure but the va...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 1 kudos

Hi Soumik, how are you doing today? As per my understanding, it looks like Pandas is still treating #N/A as a missing value because Excel considers it a special type of NA. Even though you've set na_filter=False and keep_default_na=False, Pandas might...

1 More Replies
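For a plain text source, keep_default_na=False with na_filter=False does keep "#N/A" as a literal string, as the sketch below shows with CSV (read_excel accepts the same keywords). If the .xlsx cell holds an actual Excel #N/A error value rather than text, it can still arrive as missing regardless of these flags - that last point is an assumption worth verifying against the file:

```python
from io import StringIO
import pandas as pd

# Disable both the default NA sentinel list and NA detection entirely,
# so "#N/A" survives as a literal string.
csv = "code,value\n#N/A,1\nAB,2\n"
df = pd.read_csv(StringIO(csv), dtype=str,
                 keep_default_na=False, na_filter=False)
print(df["code"].tolist())  # ['#N/A', 'AB'] - kept as strings, not NaN
```

If the Excel file turns out to contain real error cells, re-entering the values as text (or post-processing with fillna) may be the only way to preserve them.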
dyusuf
by New Contributor II
  • 1764 Views
  • 2 replies
  • 0 kudos

Data Skewness

I am trying to visualize data skewness through a simple aggregation example by performing a groupby operation on a DataFrame. The data is highly skewed for one customer, and yet Databricks is balancing it automatically when I check the Spark UI. Is there a...

Latest Reply
SantoshJoshi
New Contributor III
  • 0 kudos

Hi @dyusuf, it could be because AQE (Adaptive Query Execution) is enabled. AQE dynamically handles skew. Please refer to the link below for more details: https://docs.databricks.com/aws/en/optimizations/aqe Can you please disable AQE and check if this wor...

1 More Replies
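The experiment the reply suggests comes down to two Spark conf toggles. This fragment assumes it runs in a Databricks notebook where `spark` is already defined:

```python
# Turn off AQE (and its skew-join handling) for this session only,
# so the skewed partition is left unbalanced and becomes visible.
spark.conf.set("spark.sql.adaptive.enabled", "false")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "false")

# With AQE off, the skewed customer's group should now appear as one
# long-running task in the Spark UI stage view instead of being split.
```

Re-enable both settings (or restart the cluster) after the demonstration, since AQE is the behavior you want in production.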