Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

taruntarun1345
by New Contributor
  • 2432 Views
  • 1 replies
  • 0 kudos

cluster creation

Hey all, I am having trouble creating a cluster. I can only see the SQL warehouse and its server creation options, but I need to create a cluster to work on a data engineering project.

Latest Reply
jameshughes
Contributor II
  • 0 kudos

There are a couple of things to explore here, as this can be solved in a couple of different ways.
1. A workspace admin needs to update your entitlements to allow cluster creation. This is generally not a best practice, as it can lead to unmanaged cluster sprawl...
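If the admin route in option 1 is taken, the entitlement can also be granted programmatically. The following is a minimal sketch. It assumes the workspace SCIM Users endpoint at `/api/2.0/preview/scim/v2/Users/{id}` and the `allow-cluster-create` entitlement value. The host, token, and user ID are placeholders, the helper names are hypothetical, and the request has to be sent with a workspace admin's token.

```python
import json
import urllib.request

# Entitlement value that lets a user create clusters without a policy restriction.
CLUSTER_CREATE_ENTITLEMENT = "allow-cluster-create"


def entitlement_patch_body(entitlement: str = CLUSTER_CREATE_ENTITLEMENT) -> dict:
    """Build a SCIM 2.0 PatchOp body that adds one entitlement to a user."""
    return {
        "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
        "Operations": [
            {"op": "add", "path": "entitlements", "value": [{"value": entitlement}]}
        ],
    }


def build_patch_request(host: str, token: str, user_id: str) -> urllib.request.Request:
    """Prepare (but do not send) the PATCH request for one user."""
    url = f"{host.rstrip('/')}/api/2.0/preview/scim/v2/Users/{user_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(entitlement_patch_body()).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="PATCH",
    )


# Hypothetical usage (placeholder values; sending requires network access and an admin token):
# req = build_patch_request("https://your-workspace-host", "ADMIN_TOKEN", "USER_ID")
# urllib.request.urlopen(req)
```

As the reply notes, broad cluster-creation rights can lead to sprawl, so this fits best when only a few users need the entitlement.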

Monteiro_12
by New Contributor II
  • 423 Views
  • 1 replies
  • 0 kudos

How to Add a Certified Tag to a Table Using a DLT Pipeline

Is there a table property or configuration that allows me to add a certified tag directly to a table when using a Delta Live Tables pipeline?

Latest Reply
SP_6721
Honored Contributor
  • 0 kudos

Hi @Monteiro_12, as far as I know, DLT pipelines don't support adding a certified tag directly through table properties or pipeline configurations. Tags like system.Certified need to be applied manually via SQL after the table is created.
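Following the reply, the tag would be applied in a separate step after the pipeline update, for example from a notebook or job task. The following is a minimal sketch that only builds the statement string. It assumes the `ALTER TABLE ... SET TAGS ('key' = 'value')` syntax. It takes the tag key `system.Certified` from the reply, and it assumes that an empty string is an acceptable value for that tag, which should be checked against the docs. The helper names are hypothetical, and the sketch assumes table-name parts contain no dots.

```python
def quote_identifier(name: str) -> str:
    """Backtick-quote each part of a multi-part table name, escaping embedded backticks."""
    return ".".join("`" + part.replace("`", "``") + "`" for part in name.split("."))


def set_tags_statement(table: str, tags: dict) -> str:
    """Build an ALTER TABLE ... SET TAGS statement to run after the pipeline update."""
    if not tags:
        raise ValueError("at least one tag is required")
    pairs = []
    for key, value in tags.items():
        # Keep the sketch simple: refuse values that would need string-literal escaping.
        if "'" in key or "'" in value:
            raise ValueError("single quotes are not supported in this sketch")
        pairs.append(f"'{key}' = '{value}'")
    return f"ALTER TABLE {quote_identifier(table)} SET TAGS ({', '.join(pairs)})"


# Hypothetical usage in a Databricks notebook that runs after the DLT update:
# spark.sql(set_tags_statement("main.sales.orders", {"system.Certified": ""}))
```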

samgon
by New Contributor III
  • 1507 Views
  • 4 replies
  • 4 kudos

Resolved! Study materials for the Certified Data Engineer Professional certification?

Can anyone recommend high-quality study materials or resources (courses, documentation, practice exams, etc.) that helped you prepare for the Professional-level exam?

Data Engineering
dataengineering
Latest Reply
samgon
New Contributor III
  • 4 kudos

Thanks a lot for the suggestion, much appreciated! I have already passed the Associate exam.

HariPrasad1
by New Contributor II
  • 469 Views
  • 2 replies
  • 0 kudos

Unable to create log files using logging.basicConfig()

When I run the code below, I cannot see the file under the specified path:
import logging
logger = logging.getLogger(__name__)
logging.basicConfig(filename='/Volumes/d_use1_ach_dbw_databricks1/default/ach_elegibility_raw/logs/example.log', enco...

Latest Reply
Yogesh_Verma_
Contributor
  • 0 kudos

The issue is happening because you're calling logging.getLogger(__name__) before setting up logging.basicConfig(). When the logger is created too early, it doesn't know about the file handler, so it doesn't write to the file. To fix this, make sure yo...
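There are two more details to check alongside the reply. According to the Python docs, `logging.basicConfig()` does nothing if the root logger already has handlers configured, unless `force=True` is passed (Python 3.8+). A managed notebook environment may already have attached such handlers. The target directory must also exist. The following is a minimal sketch that assumes Python 3.9+ for the `encoding` argument. `configure_file_logging` is a hypothetical helper name.

```python
import logging
import os


def configure_file_logging(log_path: str) -> logging.Logger:
    """Send root-logger output to log_path, replacing any handlers already installed."""
    log_dir = os.path.dirname(log_path)
    if log_dir:
        # basicConfig's FileHandler does not create missing directories.
        os.makedirs(log_dir, exist_ok=True)
    logging.basicConfig(
        filename=log_path,
        encoding="utf-8",
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # remove pre-existing root handlers so this configuration takes effect
    )
    # Create the module logger after configuring the root logger, as the reply suggests.
    return logging.getLogger(__name__)


# Hypothetical usage with the path from the question:
# logger = configure_file_logging("/Volumes/.../logs/example.log")
# logger.info("pipeline started")
```

Because the module logger propagates to the root logger, its records reach the file handler that `basicConfig` installed.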

Datamate
by New Contributor
  • 462 Views
  • 2 replies
  • 0 kudos

Databricks Connecting to ADLS Gen2 vs Azure SQL

What is the best approach to connecting Databricks to Azure SQL, or to ADLS Gen2? I am designing a system in which I plan to integrate Databricks with Azure. Could someone share their experience, the pros and cons of each approach, and best practic...

Latest Reply
kavithai
New Contributor II
  • 0 kudos

Use the Azure SQL Spark connector. This method allows Databricks to read from and write to Azure SQL Database efficiently, supporting both bulk operations and secure authentication.
Azure SQL: install the connector, configure JDBC, use Key Vault, set permiss...
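The JDBC configuration step can be sketched as follows. Note that this uses Spark's built-in `jdbc` data source rather than the dedicated connector the reply recommends. It assumes the Microsoft SQL Server JDBC driver is available on the cluster and that the password comes from a Key Vault-backed secret scope. The server, database, scope, and key names are placeholders, and the helper names are hypothetical.

```python
def azure_sql_jdbc_url(server: str, database: str) -> str:
    """Build a SQL Server JDBC URL for an Azure SQL Database (server name without suffix)."""
    return (
        f"jdbc:sqlserver://{server}.database.windows.net:1433;"
        f"database={database};encrypt=true;trustServerCertificate=false;"
        "hostNameInCertificate=*.database.windows.net;loginTimeout=30;"
    )


def jdbc_read_options(server: str, database: str, table: str, user: str, password: str) -> dict:
    """Options for spark.read.format("jdbc"); the password should come from a secret scope."""
    return {
        "url": azure_sql_jdbc_url(server, database),
        "dbtable": table,
        "user": user,
        "password": password,
    }


# Hypothetical usage in a Databricks notebook (scope and key names are placeholders):
# password = dbutils.secrets.get(scope="kv-scope", key="sql-password")
# opts = jdbc_read_options("myserver", "mydb", "dbo.orders", "etl_user", password)
# df = spark.read.format("jdbc").options(**opts).load()
```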

saicharandeepb
by New Contributor III
  • 429 Views
  • 1 replies
  • 1 kudos

Resolved! I'm trying to understand if predicate pushdown is supported when using the DESCRIBE HISTORY command

I'm trying to understand if predicate pushdown is supported when using the DESCRIBE HISTORY command on a Delta table in Databricks.

Latest Reply
Yogesh_Verma_
Contributor
  • 1 kudos

`DESCRIBE HISTORY` on a Delta table in Databricks does **not support predicate pushdown** in the same way as regular SQL queries on data tables. This is because `DESCRIBE HISTORY` is a **metadata operation** that reads the Delta log files to return ta...
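The following toy illustration is not Delta's actual implementation. It models history as rebuilt from the `commitInfo` action in each JSON commit file under `_delta_log`, so a filter on the result can only be applied after those commits have been read. In practice, `DESCRIBE HISTORY table_name LIMIT n` bounds how many versions are returned. The helper name and the returned fields are hypothetical.

```python
import json
from pathlib import Path


def read_commit_history(table_path, operation=None):
    """Toy reconstruction of table history from _delta_log commit files.

    Every commit file is read first; the optional operation filter is applied
    only afterwards, so it cannot reduce the amount of log that is scanned.
    """
    log_dir = Path(table_path) / "_delta_log"
    history = []
    for commit_file in sorted(log_dir.glob("*.json")):
        if not commit_file.stem.isdigit():  # skip JSON files that are not numbered commits
            continue
        version = int(commit_file.stem)
        with commit_file.open(encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue
                action = json.loads(line)
                if "commitInfo" in action:
                    info = action["commitInfo"]
                    history.append(
                        {
                            "version": version,
                            "timestamp": info.get("timestamp"),
                            "operation": info.get("operation"),
                        }
                    )
    if operation is not None:
        # The "predicate" runs on already-materialized history rows.
        history = [h for h in history if h["operation"] == operation]
    # DESCRIBE HISTORY lists the newest version first.
    return sorted(history, key=lambda h: h["version"], reverse=True)
```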

mai_luca
by New Contributor III
  • 854 Views
  • 5 replies
  • 2 kudos

Resolved! Validation with views - DLT pipeline expectations

I have a question about how expectations work when applied to views inside a Delta Live Tables (DLT) pipeline. For instance, suppose we define this view inside a pipeline to stop the pipeline if we spot duplicates:
@dlt.view( name=view_name, ...

Latest Reply
Yogesh_Verma_
Contributor
  • 2 kudos

In DLT, expectations defined with dlt.expect_or_fail() on views are only evaluated if the view is used downstream by a materialized table. Since views are logical and lazily evaluated, if no table depends on the view, the expectation is skipped and t...
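The lazy-evaluation behavior described in the reply can be pictured with a toy model. This is plain Python, not the DLT API. A view's fail-on-violation check runs only when some table reads from that view. The class and method names are hypothetical.

```python
from typing import Callable, Dict, List


class ExpectationFailed(Exception):
    """Raised when a view's fail-on-violation check is evaluated and does not hold."""


class ToyPipeline:
    """Toy model of lazy views: a view's expectation runs only when a table reads it."""

    def __init__(self) -> None:
        self.views: Dict[str, Callable[[], List[dict]]] = {}
        self.expectations: Dict[str, Callable[[dict], bool]] = {}
        self.tables: Dict[str, str] = {}  # table name -> source view name

    def view(self, name: str, source: Callable[[], List[dict]], expect_or_fail=None) -> None:
        self.views[name] = source
        if expect_or_fail is not None:
            self.expectations[name] = expect_or_fail

    def table(self, name: str, from_view: str) -> None:
        self.tables[name] = from_view

    def run(self) -> Dict[str, List[dict]]:
        results = {}
        # Only views referenced by a table are evaluated; unreferenced views never run.
        for table_name, view_name in self.tables.items():
            rows = self.views[view_name]()
            check = self.expectations.get(view_name)
            if check is not None and not all(check(row) for row in rows):
                raise ExpectationFailed(f"expectation failed in view {view_name!r}")
            results[table_name] = rows
        return results
```

In the toy, a duplicate check attached to a view with no consumer never fires. Going by the reply, the corresponding step in a real pipeline is to make a downstream table read from the view.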

bgerhardi
by New Contributor III
  • 14447 Views
  • 13 replies
  • 13 kudos

Surrogate Keys with Delta Live

We are considering moving to Delta Live Tables from a traditional SQL-based data warehouse. What worries me is this FAQ on identity columns, Delta Live Tables frequently asked questions | Databricks on AWS, which seems to suggest that we basically can't cre...
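One commonly used alternative to identity columns is a deterministic hash of the business key. This technique is not taken from the FAQ. The following is a minimal sketch in plain Python. In Spark, the same idea is usually written as `sha2(concat_ws('|', ...), 256)`, but `concat_ws` skips NULLs, so NULLs would need to be coalesced to the same sentinel for the two to match. The sentinel and the helper name are hypothetical.

```python
import hashlib
from typing import Iterable, Optional

# Hypothetical marker so that None and "" produce different keys.
NULL_SENTINEL = "\\N"


def surrogate_key(business_key_parts: Iterable[Optional[object]], sep: str = "|") -> str:
    """Deterministic SHA-256 surrogate key (64 hex chars) from business-key columns.

    Choose a separator that cannot occur in the data; otherwise ("a|b", "c")
    and ("a", "b|c") would hash to the same key.
    """
    normalized = [NULL_SENTINEL if part is None else str(part) for part in business_key_parts]
    return hashlib.sha256(sep.join(normalized).encode("utf-8")).hexdigest()
```

The trade-off is that keys are 64-character strings rather than compact integers. In exchange, the same input always yields the same key, regardless of load order or reprocessing.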

Latest Reply
tmaund1704
New Contributor II
  • 13 kudos

Hi, is there any resolution for the above? Thanks

sahil_s_jain
by New Contributor III
  • 964 Views
  • 3 replies
  • 0 kudos

How to Exclude or Overwrite Specific JARs in Databricks Jars

Spark version in Databricks 15.5 LTS: the runtime includes Apache Spark 3.5.x, which defines the SparkListenerApplicationEnd constructor as:
public SparkListenerApplicationEnd(long time)
This constructor takes a single long parameter.
Conflicting Spark ...

Latest Reply
baljeetyadav_23
New Contributor II
  • 0 kudos

Hi Alberto_Umana, is there a fix for this issue in 16.4 LTS?

pooja_bhumandla
by New Contributor II
  • 1229 Views
  • 3 replies
  • 0 kudos

data file size

"numRemovedFiles": "2099",
"numRemovedBytes": "29658974681",
"p25FileSize": "29701688",
"numDeletionVectorsRemoved": "0",
"minFileSize": "19920357",
"numAddedFiles": "883",
"maxFileSize": "43475356",
"p75FileSize": "34394580",
"p50FileSize": "31978037",
"numA...

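A few derived quantities help put these metrics in context. The following sketch uses only the visible values, since the truncated `numA...` entry is left out. It assumes that the size percentiles describe the files written by the operation and that sizes are in bytes. `summarize` is a hypothetical helper, and MB here means 10^6 bytes.

```python
METRICS = {
    "numRemovedFiles": "2099",
    "numRemovedBytes": "29658974681",
    "minFileSize": "19920357",
    "p25FileSize": "29701688",
    "p50FileSize": "31978037",
    "p75FileSize": "34394580",
    "maxFileSize": "43475356",
    "numAddedFiles": "883",
}
MB = 1_000_000  # decimal megabytes


def summarize(metrics: dict) -> dict:
    """Convert the string-valued metrics to numbers and derive a few ratios."""
    m = {key: int(value) for key, value in metrics.items()}
    return {
        "avg_removed_mb": m["numRemovedBytes"] / m["numRemovedFiles"] / MB,
        "min_mb": m["minFileSize"] / MB,
        "p25_mb": m["p25FileSize"] / MB,
        "p50_mb": m["p50FileSize"] / MB,
        "p75_mb": m["p75FileSize"] / MB,
        "max_mb": m["maxFileSize"] / MB,
        # How many input files were merged per output file, on average.
        "files_removed_per_added": m["numRemovedFiles"] / m["numAddedFiles"],
    }
```

With these numbers, the average removed file is about 14.1 MB, which is below the smallest file written (about 19.9 MB). Roughly 2.4 input files were merged per output file.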
Latest Reply
pooja_bhumandla
New Contributor II
  • 0 kudos

What criteria determine how far the maximum and minimum file sizes deviate from the target file size?

Alex79
by New Contributor II
  • 905 Views
  • 2 replies
  • 0 kudos

Get Job Run output through REST API call

I have a simple notebook that reads a DataFrame as input and returns another DataFrame, as follows:
from pyspark.sql import SparkSession
import pandas as pd, json
spark = SparkSession.builder \
    .appName("Pandas to Spark DataFrame Conversion")...

Latest Reply
Vidhi_Khaitan
Databricks Employee
  • 0 kudos

Hi team,
The error
{"error_code": "INVALID_PARAMETER_VALUE","message": "Retrieving the output of runs with multiple tasks is not supported..."}
means the job you're triggering (job_id = 'my_job_id') is a multi-task job (even if it has only one task). In such cas...
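For a multi-task run, the output is fetched per task. First, `GET /api/2.1/jobs/runs/get` on the parent run lists each task's own `run_id`. Then `GET /api/2.1/jobs/runs/get-output` is called with that task run ID. The following is a minimal sketch using only the standard library. The host and token are placeholders, and the helper names are hypothetical. `notebook_output.result` holds the string passed to `dbutils.notebook.exit()`, so a DataFrame has to be serialized (for example, to JSON) before it is returned.

```python
import json
import urllib.parse
import urllib.request


def get_json(host: str, token: str, path: str, params: dict) -> dict:
    """GET a Databricks REST endpoint and decode the JSON response."""
    url = f"{host.rstrip('/')}{path}?{urllib.parse.urlencode(params)}"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def task_run_ids(run: dict) -> dict:
    """Map task_key -> per-task run_id from a /api/2.1/jobs/runs/get response."""
    return {task["task_key"]: task["run_id"] for task in run.get("tasks", [])}


def get_task_outputs(host: str, token: str, parent_run_id: int) -> dict:
    """Fetch notebook output for every task of a (possibly multi-task) job run."""
    run = get_json(host, token, "/api/2.1/jobs/runs/get", {"run_id": parent_run_id})
    outputs = {}
    for task_key, task_run_id in task_run_ids(run).items():
        out = get_json(host, token, "/api/2.1/jobs/runs/get-output", {"run_id": task_run_id})
        # notebook_output.result is the string the notebook passed to dbutils.notebook.exit().
        outputs[task_key] = out.get("notebook_output", {}).get("result")
    return outputs


# Hypothetical usage (placeholders; requires network access and a valid token):
# outputs = get_task_outputs("https://your-workspace-host", "TOKEN", 123456)
```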

cool_cool_cool
by New Contributor II
  • 2076 Views
  • 3 replies
  • 0 kudos

Databricks Workflow is stuck on the first task and doesn't do any workload

Heya, I have a workflow in Databricks with 2 tasks. They are configured to run on the same job cluster, and the second task depends on the first. I have seen some weird behavior twice now: the job takes a long time (it usually finishes within 30...
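For reference, the setup described here has two tasks on one job cluster, with the second depending on the first. It corresponds roughly to Jobs API 2.1 settings like the ones built below. The per-task `timeout_seconds` is an addition that the post does not mention. It makes a task that hangs fail after a bound instead of running indefinitely, which can surface the problem while the root cause is investigated. All names, paths, and cluster values are hypothetical placeholders.

```python
def two_task_job_settings(notebook_1: str, notebook_2: str, timeout_seconds: int = 3600) -> dict:
    """Jobs API 2.1-style settings: two notebook tasks sharing one job cluster."""
    shared_cluster = {
        "job_cluster_key": "shared_cluster",
        "new_cluster": {
            "spark_version": "RUNTIME_VERSION_PLACEHOLDER",
            "node_type_id": "NODE_TYPE_PLACEHOLDER",
            "num_workers": 2,
        },
    }
    return {
        "name": "two-task-workflow",
        "job_clusters": [shared_cluster],
        "tasks": [
            {
                "task_key": "first",
                "job_cluster_key": "shared_cluster",
                "notebook_task": {"notebook_path": notebook_1},
                "timeout_seconds": timeout_seconds,  # fail instead of hanging
            },
            {
                "task_key": "second",
                "depends_on": [{"task_key": "first"}],  # runs only after "first" succeeds
                "job_cluster_key": "shared_cluster",
                "notebook_task": {"notebook_path": notebook_2},
                "timeout_seconds": timeout_seconds,
            },
        ],
    }
```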

Latest Reply
Sri_M
New Contributor II
  • 0 kudos

@cool_cool_cool I am facing the same issue. Has it been resolved for you? If so, could you please let me know what action you took?

lorenz
by New Contributor III
  • 12206 Views
  • 8 replies
  • 3 kudos

Resolved! Databricks approaches to CDC

I'm interested in learning more about Change Data Capture (CDC) approaches with Databricks. Can anyone provide insights on the best practices and recommendations for utilizing CDC effectively in Databricks? Are there any specific connectors or tools ...

Latest Reply
Deekay
New Contributor II
  • 3 kudos

Hi @jcozar, thank you so much for your response. I have some questions, and it would be really helpful if you could share your thoughts. How are you separating the tables from raw to bronze? Suppose Debezium is capturing CDC events from 100 tables, all changes are ...
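One possible way to do the raw-to-bronze split is to group raw Debezium events by the database, schema, and table fields in their `source` block, so each table can be written to its own bronze target. The sketch below is in plain Python rather than Spark. It assumes Debezium's standard JSON change-event envelope (`before`, `after`, `source`, `op`, `ts_ms`), either wrapped in `payload` (schemas enabled) or at the top level. `route_by_table` is a hypothetical helper.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def route_by_table(raw_events: Iterable[str]) -> Dict[str, List[dict]]:
    """Group Debezium JSON change events by source table, e.g. 'shop.orders'."""
    routed: Dict[str, List[dict]] = defaultdict(list)
    for raw in raw_events:
        event = json.loads(raw)
        if event is None:  # tombstone records have a null value
            continue
        # With schemas enabled the envelope sits under "payload"; otherwise at the top level.
        payload = event.get("payload", event)
        if payload is None:
            continue
        source = payload["source"]
        # Include the schema when present (e.g. PostgreSQL) so same-named tables don't collide.
        parts = [source.get("db"), source.get("schema"), source.get("table")]
        table_key = ".".join(str(p) for p in parts if p is not None)
        routed[table_key].append(
            {
                "op": payload["op"],  # c = create, u = update, d = delete, r = snapshot read
                "before": payload.get("before"),
                "after": payload.get("after"),
                "ts_ms": payload.get("ts_ms"),
            }
        )
    return dict(routed)
```

In a Spark job, the same grouping key could be derived from the `source` fields as a column and used to fan each micro-batch out to per-table bronze tables.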

lezwon
by Contributor
  • 825 Views
  • 2 replies
  • 3 kudos

Resolved! Install custom wheel from DBFS in serverless environment

Hey folks, I have a job that runs on serverless compute. I have also created a wheel file with custom functions that this job requires. I see from here that we cannot install libraries for a task and must use notebook-scoped libraries. So wha...

Latest Reply
loui_wentzel
Contributor
  • 3 kudos

Is your DBFS mounted? Otherwise, try uploading the wheel to your workspace's "Shared" folder; this is a common place to put these sorts of files. DBFS is slowly being phased out and is not part of current best practices.

