Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

SrinuM
by New Contributor III
  • 3934 Views
  • 1 reply
  • 0 kudos

Workspace Client dbutils issue

host = "https://adb-xxxxxx.xx.azuredatabricks.net"
token = "dapxxxxxxx"
We are using Databricks Connect:
from databricks.sdk import WorkspaceClient
dbutil = WorkspaceClient(host=host, token=token).dbutils
files = dbutil.fs.ls("abfss://container-name@storag...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The error where files and directories can be read at the root ADLS level but not at the blob/subdirectory level, combined with a "No file or directory exists on path" message, is frequently due to permission configuration, incorrect path usage, or ne...
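Since the reply points at incorrect path usage as one likely culprit, a quick sanity check is to build the abfss URI programmatically rather than by hand. This is a minimal, hypothetical sketch: the helper name and the account/container values are illustrative, not from the thread.

```python
def abfss_path(container: str, account: str, path: str = "") -> str:
    """Build an ADLS Gen2 URI of the shape dbutils.fs.ls expects.

    A malformed account suffix or a stray leading slash on the sub-path
    is a common cause of 'No file or directory exists on path' errors.
    """
    suffix = f"{account}.dfs.core.windows.net"
    sub = path.lstrip("/")
    return f"abfss://{container}@{suffix}/{sub}"

# With the Databricks SDK (needs a real workspace, so only sketched here):
#   from databricks.sdk import WorkspaceClient
#   w = WorkspaceClient(host=host, token=token)
#   files = w.dbutils.fs.ls(abfss_path("container-name", "storageaccount", "landing/2024"))

print(abfss_path("raw", "mystorageacct", "/landing/2024"))
# abfss://raw@mystorageacct.dfs.core.windows.net/landing/2024
```

If the root listing works but subdirectories fail, also compare the identity's RBAC role (e.g. Storage Blob Data Reader) at the container level, as the reply suggests.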

Anshul_DBX
by New Contributor
  • 3844 Views
  • 1 reply
  • 0 kudos

Executing Stored Procedures/update in Federated SQL Server

I have a federated Azure SQL DB in my DBX workspace, but I am not able to run update commands or execute a stored procedure. Is this still not supported? 

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Federated connections from Azure Databricks to Azure SQL DB via Lakehouse Federation currently only support read-only queries—meaning running update commands or executing stored procedures directly through the federated Unity Catalog interface is not...
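Because the federated catalog is read-only, writes and stored-procedure calls have to go over a direct JDBC connection instead. Below is a hedged sketch of that workaround; the server, database, and procedure names are placeholders, and the py4j `DriverManager` pattern in the comments is one commonly shown approach, not an official API.

```python
def sqlserver_jdbc_url(server: str, database: str) -> str:
    """Build a JDBC URL for Azure SQL DB; credentials are passed separately
    at connect time, not embedded in the URL."""
    return (f"jdbc:sqlserver://{server}.database.windows.net:1433;"
            f"database={database};encrypt=true;trustServerCertificate=false")

# On a cluster with the Microsoft SQL Server JDBC driver available, a stored
# procedure could then be executed outside Lakehouse Federation, e.g. (sketch):
#   url = sqlserver_jdbc_url("myserver", "mydb")
#   conn = spark._sc._gateway.jvm.java.sql.DriverManager.getConnection(url, user, pwd)
#   conn.prepareCall("{call dbo.my_proc(?)}").execute()
# Note this bypasses Unity Catalog governance, so credentials should come
# from a secret scope rather than notebook text.

print(sqlserver_jdbc_url("myserver", "mydb"))
```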

pooja_bhumandla
by New Contributor III
  • 183 Views
  • 2 replies
  • 0 kudos

Best Practice for Updating Data Skipping Statistics for Additional Columns

Hi Community, I have a scenario where I've already calculated delta statistics for the first 32 columns after enabling the data skipping property. Now, I need to include 10 more frequently used columns that were not part of the original 32. Goal: I want ...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @pooja_bhumandla, updating either of the two options below does not automatically recompute statistics for existing data. Rather, it affects how statistics are collected when data is later added to or updated in the table. - delta.dataSkippingNumInd...
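To make the reply's point concrete: changing the table property only affects future writes, so existing files need an explicit recompute. A sketch of the two statements typically involved, built as strings so the shape is visible; the table and column names are placeholders, and you should confirm `ANALYZE TABLE ... COMPUTE DELTA STATISTICS` is available on your runtime.

```python
def stats_update_sql(table: str, columns: list[str]) -> list[str]:
    """Return the two statements usually paired for this scenario:
    1) point data skipping at an explicit column list,
    2) recompute stats for files already in the table."""
    cols = ",".join(columns)
    return [
        f"ALTER TABLE {table} SET TBLPROPERTIES "
        f"('delta.dataSkippingStatsColumns' = '{cols}')",
        f"ANALYZE TABLE {table} COMPUTE DELTA STATISTICS",
    ]

# On a cluster these would be run as:
#   for stmt in stats_update_sql("cat.sch.orders", ["col_33", "col_34"]):
#       spark.sql(stmt)
for stmt in stats_update_sql("cat.sch.orders", ["col_33", "col_34"]):
    print(stmt)
```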

1 More Replies
vamsi_simbus
by New Contributor III
  • 133 Views
  • 1 reply
  • 0 kudos

System tables for DLT Expectations Quality Metrics

Hi Everyone, I'm working with Delta Live Tables (DLT) and using Expectations to track data quality, but I'm having trouble finding where the expectation quality metrics are stored in the DLT system tables. My questions are: Which specific system table(s...

Latest Reply
ManojkMohan
Honored Contributor II
  • 0 kudos

@vamsi_simbus DLT captures data quality metrics in specialized system tables known as "event" and "metrics" tables. Specifically, look in the following tables: LIVE.DLT_EVENT_LOG or LIVE.DLT_METRICS. These tables contain granular event logs and metric...
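Expectation results generally surface in the pipeline's event log under `details:flow_progress.data_quality.expectations`; the exact table or function name for the event log depends on how the pipeline publishes it, so treat the identifier below as an assumption. A sketch of the query, built as a string so its shape is testable without a cluster:

```python
def expectations_query(event_log: str) -> str:
    """Sketch of a query pulling expectation pass/fail counts from a DLT
    event log. The event-log name is a placeholder; the JSON path and
    struct fields follow the commonly documented flow_progress payload."""
    return f"""
SELECT
  timestamp,
  explode(from_json(details:flow_progress.data_quality.expectations,
                    'array<struct<name string, dataset string,
                                  passed_records bigint,
                                  failed_records bigint>>')) AS expectation
FROM {event_log}
WHERE event_type = 'flow_progress'
""".strip()

print(expectations_query("my_pipeline_event_log"))
```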

Suheb
by New Contributor III
  • 165 Views
  • 1 reply
  • 0 kudos

Resolved! What are best practices for designing a large-scale data engineering pipeline on Databricks for real

How do you design a scalable, reliable pipeline that handles both fast/continuous data and slower bulk data in the same system?

Latest Reply
Coffee77
Contributor III
  • 0 kudos

Very generic question. Here are general rules and best practices related to the Databricks well-architected framework: https://docs.databricks.com/aws/en/lakehouse-architecture/well-architected Take a deeper look at operational excellence, reliability an...

Askenm
by New Contributor
  • 1006 Views
  • 4 replies
  • 1 kudos

Docker tab missing in create compute

I am running Databricks Premium and looking to create a compute running Conda. It seems that the best way to do this is to boot the compute from a Docker image. However, in ```create_compute > advanced``` I cannot see the Docker option nor ca...

Data Engineering
conda
Docker
Latest Reply
Wull
New Contributor II
  • 1 kudos

@NandiniN @Advika I've followed the documentation and enabled DCS by using the Databricks CLI and running:
databricks workspace-conf set-status \
  --json '{"enableDcs": "true"}'
I even checked by running get-status. However, one month later, and the Docke...
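For anyone verifying the same setting, the get-status call returns a small JSON object. A sketch of checking it programmatically; the payload shape is assumed from the set-status body in the thread, and exact CLI flags may differ between CLI versions.

```python
import json

def dcs_enabled(get_status_json: str) -> bool:
    """Parse the JSON returned by the workspace-conf get-status call and
    report whether Databricks Container Services is switched on."""
    return json.loads(get_status_json).get("enableDcs") == "true"

# The JSON would come from something like (flags may vary by CLI version):
#   databricks workspace-conf get-status enableDcs
print(dcs_enabled('{"enableDcs": "true"}'))   # True
```

Note that even with the flag set, the Docker tab only appears on supported compute types, which may explain the behavior described above.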

3 More Replies
aravind-ey
by New Contributor II
  • 21454 Views
  • 23 replies
  • 5 kudos

vocareum lab access

Hi, I am doing a data engineering course in Databricks (Partner labs) and would like to have access to the Vocareum workspace to practice using the demo sessions. Can you please help me get access to this workspace? Regards, Aravind

Latest Reply
Eicke
New Contributor II
  • 5 kudos

You can log into Databricks, search for "Canada Sales" in the Marketplace and find "Simulated Canada Sales and Opportunities Data". Get free instant access, wait a few seconds for the warehouse to be built for you, et voilà: the tables for building th...

22 More Replies
sunnyday
by New Contributor
  • 5455 Views
  • 1 reply
  • 0 kudos

Naming jobs in the Spark UI in Databricks Runtime 15.4

I am asking almost the same question as: https://community.databricks.com/t5/data-engineering/how-to-improve-spark-ui-job-description-for-pyspark/td-p/48959. I would like to know how to improve the readability of the Spark UI by naming jobs. I am...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

You are correct—on Databricks Runtime 15.4 and with shared clusters (or clusters enabled with Unity Catalog), you will see the [JVM_ATTRIBUTE_NOT_SUPPORTED] error when trying to directly access sparkContext attributes that are only available in singl...
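A hedged sketch of working around that limitation: on classic clusters `sparkContext.setJobDescription` labels jobs directly, while shared/UC clusters (Spark Connect) raise the JVM_ATTRIBUTE error there but offer `addTag()` in recent PySpark versions. The fallback wrapper below is an illustrative pattern, not an official API, and assumes `addTag` exists on your runtime.

```python
def describe_jobs(spark, description: str) -> str:
    """Best-effort job labeling across cluster modes (a sketch).

    Returns which mechanism was used, so callers can log it. Tags may
    not allow spaces, so they are replaced with underscores."""
    try:
        # Classic (single-user) clusters: direct sparkContext access works.
        spark.sparkContext.setJobDescription(description)
        return "setJobDescription"
    except Exception:
        # Shared/UC clusters: sparkContext attributes raise
        # JVM_ATTRIBUTE_NOT_SUPPORTED; fall back to Spark Connect tags,
        # which appear in the UI's job tags rather than the description.
        spark.addTag(description.replace(" ", "_"))
        return "addTag"
```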

Vishnu_9959
by New Contributor
  • 3724 Views
  • 1 reply
  • 0 kudos

Can we develop a connector that integrates nintex and Databricks community version

Can we develop a connector that integrates Nintex and Databricks with the Community version?

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

It is technically possible to develop a connector that integrates Nintex with Databricks, but there are important limitations when trying to achieve this with Databricks Community Edition.
Connector Integration Overview: Nintex can be integrated with...

rpilli
by New Contributor
  • 4298 Views
  • 1 reply
  • 0 kudos

Conditional Execution in DLT Pipeline based on the output

Hello, I'm working on a Delta Live Tables (DLT) pipeline where I need to implement a conditional step that only triggers under specific conditions. Here's the challenge I'm facing: I have a function that checks if the data meets certain thresholds. If...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

In Delta Live Tables (DLT), native conditional or branch-based control flow is limited; all table/stream definitions declared in your pipeline will execute, and dependencies are handled via @dlt.table or @dlt.view decorators. You can't dynamically sk...
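Since DLT declares tables at import time, the practical way to branch is to decide whether to declare the optional table at all, typically based on a pipeline configuration value. A hedged sketch; the config key, table names, and gating function are placeholders, and the `dlt` parts are shown only as comments because they run solely inside a pipeline.

```python
def passes_threshold(metric: float, threshold: float) -> bool:
    """The gating check from the question, kept as a plain function so it
    can run (and be tested) outside a pipeline."""
    return metric >= threshold

# In the pipeline module (sketch; identifiers are illustrative):
#   import dlt
#   if spark.conf.get("my.pipeline.run_optional_step", "false") == "true":
#       @dlt.table(name="optional_step")
#       def optional_step():
#           return spark.read.table("LIVE.upstream").where("quality_ok")
#
# Because definitions are registered when the module is imported, gating the
# declaration itself is the branch point; an already-declared table cannot
# be skipped at run time. Another common pattern is to always declare the
# table but have it emit zero rows when the threshold fails.

print(passes_threshold(0.9, 0.8))  # True
```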

OmarCardoso
by New Contributor
  • 4779 Views
  • 1 reply
  • 0 kudos

Efficient Parallel JSON Extraction from Elasticsearch in Azure Databricks

Hi Databricks community, I'm facing a challenge extracting JSON data from Elasticsearch in Azure Databricks efficiently, maintaining header information. Previously, I had to use RDDs for parallel extraction, but they're no longer supported in Databrick...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

To efficiently extract JSON data from Elasticsearch in Azure Databricks—while maintaining header information and without reverting to legacy RDD-based parallelization—a few modern Spark-based strategies can be used. Spark DataFrames and Spark SQL sup...
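One way to get RDD-free parallelism is Elasticsearch's sliced scroll: generate one request body per slice and let each Spark task fetch its own slice (for example via a DataFrame of slice ids and a fetch UDF, or the es-hadoop connector's `"es"` format if it is installed). The slice-generation part is pure Python and sketched below; the Spark wiring is only commented because it needs a cluster, and the query body is illustrative.

```python
import json

def sliced_queries(base_query: dict, max_slices: int) -> list[str]:
    """Produce one Elasticsearch 'sliced scroll' request body per slice,
    so extraction parallelizes across tasks without RDDs. Each returned
    JSON string is an independent, non-overlapping partition of the scan."""
    return [
        json.dumps({**base_query, "slice": {"id": i, "max": max_slices}})
        for i in range(max_slices)
    ]

# On a cluster (sketch): distribute the bodies and fetch per task, keeping
# header fields alongside the hits when parsing each response:
#   df = spark.createDataFrame([(q,) for q in sliced_queries(q, 8)], ["body"])
#   hits = df.select(fetch_udf("body").alias("response"))

for body in sliced_queries({"query": {"match_all": {}}}, 4):
    print(body)
```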

Puspak
by New Contributor II
  • 594 Views
  • 2 replies
  • 0 kudos

DLT behaving differently when used with python syntax vs when used with sql syntax to read CDF

I was trying to read CDF data of a table as a DLT materialized view. It works fine with SQL syntax, reading all the columns of the source table along with the 3 CDF columns (_change_type, _commit_timestamp, _commit_version):
@dlt.table()
def change_table():...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

When accessing Change Data Feed (CDF) data in Delta Live Tables (DLT), the behavior between SQL and Python APIs differs notably regarding CDF metadata columns—_change_type, _commit_timestamp, and _commit_version. SQL Approach (using table_changes):T...
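On the Python side, the CDF metadata columns only appear when the reader is explicitly put into change-feed mode. The option names below are the documented Delta CDF reader options; the DLT wiring is sketched in comments because `dlt` only exists inside a pipeline, and the table name is a placeholder.

```python
def cdf_read_options(starting_version: int) -> dict:
    """Reader options that switch a Delta read into Change Data Feed mode,
    which is what surfaces _change_type, _commit_version, and
    _commit_timestamp in the Python API."""
    return {"readChangeFeed": "true", "startingVersion": str(starting_version)}

# In a DLT materialized view (sketch):
#   @dlt.table()
#   def change_table():
#       return (spark.read.format("delta")
#               .options(**cdf_read_options(0))
#               .table("source_table"))
#
# Unlike SELECT ... FROM table_changes(...), a plain spark.read.table() in
# Python returns only the source columns; without readChangeFeed the CDF
# columns are simply absent, which explains the SQL/Python difference.

print(cdf_read_options(0))
```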

1 More Replies
Datalight
by Contributor
  • 144 Views
  • 1 reply
  • 0 kudos

Design Oracle Fusion SCM to Azure Databricks

Hello techies, I am planning to migrate all modules of Oracle Fusion SCM data to Azure Databricks. Is BICC (Business Intelligence Cloud Connector) the only option, or are other options available? Can anyone please help me with a reference architecture...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

BICC (Business Intelligence Cloud Connector) is not the only option for migrating Oracle Fusion SCM data to Azure Databricks, though it is the most common and recommended tool for high-volume, scheduled extracts according to Oracle’s official guidanc...

Kayla
by Valued Contributor II
  • 187 Views
  • 1 reply
  • 1 kudos

JSON Medallion Best Practices

I'm looking at ingesting JSON files from an API, pulling a list of orders. Each JSON file has header information and then a nested array of items - I want to flatten this into a table with 1 row/item and the header repeated for every item. What is the...

Latest Reply
Coffee77
Contributor III
  • 1 kudos

I would need to know a little more about your scenario, but it reminds me of a similar case I faced. My approach was to use the silver layer to create a delta table with enforced schema, standard field names and types, etc. to perform typical actions ...
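The flattening the question describes (one row per item, header repeated) is what `explode(items)` plus a select of the header columns does in Spark. A pure-Python sketch of the same transformation, runnable anywhere; the field names (`items`, `order_id`, `sku`) are illustrative, not from the thread.

```python
def flatten_order(order: dict) -> list[dict]:
    """Flatten one order document (header fields + nested 'items' array)
    into one row per item, repeating the header on each row - the same
    shape explode(items) produces in Spark."""
    header = {k: v for k, v in order.items() if k != "items"}
    return [{**header, **item} for item in order.get("items", [])]

doc = {"order_id": 1, "customer": "acme",
       "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]}
for row in flatten_order(doc):
    print(row)  # two rows, each carrying order_id and customer
```

A common medallion split is to land the raw JSON untouched in bronze and apply this flattening (with an enforced schema) when writing silver, as the reply describes.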

leenack
by New Contributor II
  • 612 Views
  • 8 replies
  • 5 kudos

Resolved! No rows returned when calling Databricks procedure via .NET API and Simba ODBC driver

I created a simple Databricks procedure that should return a single value:
"SELECT 1 AS result;"
When I call this procedure from my .NET API using ExecuteReader, ExecuteAdapter, or ExecuteScalar, the call completes without any errors, but no rows are r...

Latest Reply
leenack
New Contributor II
  • 5 kudos

Thank you @mark_ott and @Coffee77 for your help. This has saved me a great deal of time. I now understand that I need to use procedures, functions, or direct SQL queries as a workaround to retrieve data in the .NET API. I will also keep an eye out...

7 More Replies
