Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

smoortema
by Contributor
  • 336 Views
  • 3 replies
  • 4 kudos

Resolved! how to know which join type was used (broadcast, shuffle hash or sort merge join) for a query?

What is the best way to know which kind of join (broadcast, shuffle hash, or sort merge) was used for a SQL query? How can the Spark UI or the query plan be interpreted?

Latest Reply
Louis_Frolio
Databricks Employee

@smoortema , Spark performance tuning is one of the hardest topics to teach or learn, and it’s even tougher to do justice to in a forum thread. That said, I’m really glad to see you asking the question. Tuning is challenging precisely because there a...
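As a concrete starting point for reading the plan: the physical plan from `df.explain()` (or `EXPLAIN FORMATTED` in SQL) names the chosen join operator directly. Below is a minimal, hypothetical sketch (not from the thread) that classifies a plan string by those operator names:

```python
def detect_join_strategy(physical_plan: str) -> str:
    """Classify the join strategy from a Spark physical plan string.

    The plan text would come from df.explain() (or EXPLAIN FORMATTED in SQL);
    the operator names below are the ones Spark prints in physical plans.
    """
    markers = [
        ("BroadcastHashJoin", "broadcast hash join"),
        ("BroadcastNestedLoopJoin", "broadcast nested loop join"),
        ("ShuffledHashJoin", "shuffle hash join"),
        ("SortMergeJoin", "sort merge join"),
    ]
    for marker, name in markers:
        if marker in physical_plan:
            return name
    return "unknown"

plan = "*(5) SortMergeJoin [id#1L], [id#5L], Inner"
print(detect_join_strategy(plan))  # sort merge join
```

The same operator names appear in the Spark UI under the SQL/DataFrame tab, in the query's physical plan graph.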

2 More Replies
mits1
by New Contributor II
  • 191 Views
  • 2 replies
  • 1 kudos

Resolved! Unable to navigate/login to Databricks Account Console

Hi, I have deployed Azure Databricks using an email id (say xx@gmail.com) and am able to launch a workspace. When I try to access the account console, it throws the below error: Selected user account does not exist in tenant 'Microsoft Services' and cannot access the ...

Latest Reply
Raman_Unifeye
Contributor III

An old link, but still relevant: https://github.com/cloudboxacademy/azure_databricks_course/blob/main/known-issues/unable-to-login-to-azure-databricks-account-console.md

1 More Replies
mh2587
by New Contributor II
  • 3981 Views
  • 1 reply
  • 1 kudos

Managing PCI-DSS Compliance and Access to Serverless Features in Azure Databricks

Hello Databricks Community, I am currently using Azure Databricks with PCI-DSS compliance enabled in our workspace, as maintaining stringent security standards is crucial for our organization. However, I've discovered that once PCI-DSS compliance is tu...

Latest Reply
mark_ott
Databricks Employee

Once PCI-DSS compliance is enabled in Azure Databricks, the workspace is locked into a set of restrictions to maintain those standards and safeguard sensitive data. These restrictions include disabling access to features like serverless compute, whic...

Zbyszek
by New Contributor II
  • 240 Views
  • 2 replies
  • 1 kudos

Resolved! Create a Hudi table with Databricks 17

Hi, I'm trying to run my existing code, which worked on an older DB version: CREATE TABLE IF NOT EXISTS catalog.demo.ABTHudi USING org.apache.hudi.Spark3DefaultSource OPTIONS ('primaryKey' = 'ID', 'hoodie.table.name' = 'ABTHudi') AS SELECT * FROM pa...

Latest Reply
Zbyszek
New Contributor II

Thank you for your response, I will wait for more updates on that. Regards, Ziggy

1 More Replies
Nes_Hdr
by New Contributor III
  • 5883 Views
  • 3 replies
  • 0 kudos

Path based access not supported for tables with row filters?

Hello, I recently encountered an issue and have not been able to find a solution yet. I have a job on Databricks that creates a table using dbt (dbt-databricks>=1.0.0,<2.0.0). I am setting the location_root configuration so that this table is externa...

Data Engineering
dbt
row_filter
Latest Reply
mark_ott
Databricks Employee

This issue occurs because Databricks does not support applying row filters or column masks to external tables when path-based access is used. While you are able to set the row filter policy on your table with no immediate error, the limitation only b...

2 More Replies
Harun
by Honored Contributor
  • 3978 Views
  • 1 reply
  • 0 kudos

Inquiry Regarding Serverless Compute Operations After Cloud Account Suspension

Hello everyone, I am currently benchmarking the new serverless compute feature and have observed unexpected behavior under specific circumstances. During my benchmarking process, I executed two notebooks: one utilizing serverless compute and the ot...

Latest Reply
mark_ott
Databricks Employee

Serverless compute resources in Azure Databricks and Azure SQL can operate independently of your cloud subscription state because they are fully managed, abstracted services that run on infrastructure controlled by Azure rather than your own cloud ac...

databricks8923
by New Contributor
  • 4091 Views
  • 1 reply
  • 0 kudos

DLT Pipeline, Autoloader, Streaming Query Exception: Could not find ADLS Gen2 Token

I have set up Autoloader to form a streaming table in my DLT pipeline: import dlt; @dlt.table def streamFiles_new(): return ( spark.readStream.format("cloudFiles") .option("cloudFiles.format", "json") .op...

Latest Reply
mark_ott
Databricks Employee

Your error suggests that while your DLT pipeline works for materialized views (batch reads), switching to a streaming table using Autoloader (readStream) is triggering an ADLS Gen2 authentication failure, specifically "Could not find ADLS Gen2 Token"...

Julien_Kronegg
by New Contributor
  • 4142 Views
  • 1 reply
  • 0 kudos

Cannot use Delta Table columns containing struct with date fields in Power BI

Hi everyone, I have a Delta Table in Databricks with a column of struct type (containing a field of type date) and a column of type date: create table date_struct (s struct<d:date>, d date, s_json string); insert into date_struct (s, d, s_json) values ...

Latest Reply
mark_ott
Databricks Employee

To access the s.d field from your Delta table in Power BI, you need the content of the s column to be correctly formatted as JSON so that Power BI's Json.Document() function can parse it. Your issue arises because the default string representation of...
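To see why the parse fails, here is a small stand-in illustration in plain Python: json.loads plays the role of Power BI's Json.Document, the "default rendering" string is an assumption about how a struct arrives as text, and to_json(s) on the Spark side is what would produce the parseable form.

```python
import json

# A struct<d:date> rendered with its default text form arrives roughly like
# this (an assumed example) -- not valid JSON:
default_repr = "{d=2024-01-15}"

# What Spark SQL's to_json(s) would emit instead -- valid JSON:
json_repr = '{"d":"2024-01-15"}'

def parses_as_json(text: str) -> bool:
    """Stand-in for Power BI's Json.Document: can the text be parsed as JSON?"""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False

print(parses_as_json(default_repr))  # False
print(parses_as_json(json_repr))     # True
```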

thewfhengineer
by New Contributor III
  • 180 Views
  • 1 reply
  • 1 kudos

Resolved! AWS SageMaker to Azure Databricks

I'm starting a project to migrate our compliance model (Pandas-based Python code) from AWS SageMaker to the Azure ecosystem. Source: AWS (SageMaker, Airflow). Target: Azure (Databricks, ADLS). I'm evaluating the high-level approach and would appreciate ...

Latest Reply
mark_ott
Databricks Employee

For migrating a Python/Pandas-based compliance model from AWS SageMaker/Airflow to Azure Databricks/ADLS, the best approach depends on priorities like speed, risk, cost, and future scalability. Both "Lift & Shift" and "Refactor & Modernize" have clea...

anhnnguyen
by New Contributor III
  • 466 Views
  • 5 replies
  • 7 kudos

Resolved! Adding maven dependency to ETL pipeline

Hello guys, I'm building an ETL pipeline and need to access the HANA data lake file system. To do that, I need the sap-hdlfs library in the compute environment; the library is available in the Maven repository. My job will have multiple notebook tasks and ETL ...

Latest Reply
XP
Databricks Employee

Hey @anhnnguyen, you can add libraries a few ways when building a notebook-based ETL pipeline: The best practice, scalable approach to add libraries across multiple workloads or clusters is to use Policy-scoped libraries. Any compute that uses the cl...
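For reference, a cluster or job library entry for a Maven package is typically a small JSON spec. The sketch below builds one in Python; the coordinates shown are placeholders, so look up the exact group:artifact:version for sap-hdlfs in the Maven repository before using them.

```python
import json

# Hypothetical library spec; "com.example:sap-hdlfs:1.0.0" is a placeholder,
# not the real Maven coordinates for the sap-hdlfs artifact.
library_spec = {
    "libraries": [
        {
            "maven": {
                "coordinates": "com.example:sap-hdlfs:1.0.0",
                "repo": "https://repo1.maven.org/maven2/",
            }
        }
    ]
}
print(json.dumps(library_spec, indent=2))
```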

4 More Replies
dbernstein_tp
by New Contributor III
  • 178 Views
  • 1 reply
  • 1 kudos

Resolved! Failed to edit ingestion pipeline PostgreSQL slot name cannot be empty or null

I'm trying to add tables to an existing SQL Server CDC ingestion pipeline and today am getting this mysterious error message: "Failed to edit ingestion pipeline: PostgreSQL slot name cannot be empty or null." I have not encountered this before. Is this simp...

Latest Reply
dbernstein_tp
New Contributor III

After I posted this I noticed that the gateway compute for this pipeline was repeatedly failing and retrying. This was resolved by increasing our quota of "Standard FS Family" compute on Azure. And when that was resolved the above error also disappea...

DataRabbit
by New Contributor II
  • 22527 Views
  • 5 replies
  • 0 kudos

Resolved! py4j.security.Py4JSecurityException: Constructor public org.apache.spark.ml.feature.VectorAssembler(java.lang.String) is not whitelisted.

Hello, I have a problem. When I try to run the MLlib VectorAssembler (from pyspark.ml.feature import VectorAssembler) I get this error and I don't know what to do anymore. Please help.

Latest Reply
VenuG
New Contributor III

Do you plan to support this in the Serverless Free Edition? Migration from Community Edition to Serverless has been fraught with these limitations.

4 More Replies
Pratikmsbsvm
by Contributor
  • 336 Views
  • 2 replies
  • 1 kudos

How to Design a Data Quality Framework for Medallion Architecture Data Pipeline

Hello, I am building a data pipeline that extracts data from Oracle Fusion and pushes it to the Databricks Delta Lake. I am using the Bronze, Silver, and Gold approach. Could someone please help me with how to control all three segments (Bronze, Silver, and Gold) wit...

Latest Reply
nayan_wylde
Esteemed Contributor

Here’s how you can implement DQ at each stage:
Bronze Layer
Checks:
  • File format validation (CSV, JSON, etc.)
  • Schema validation (column names, types)
  • Row count vs. source system
Tools:
  • Use Databricks Autoloader with schema evolution and badRecordsPath
Impl...
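As a framework-free illustration of the Bronze-layer checks described above (schema validation, type checks, and row-count reconciliation against the source), here is a hypothetical sketch; the column names and expected schema are invented for the example:

```python
# Hypothetical expected schema for an ingested record (invented columns).
EXPECTED_SCHEMA = {"id": int, "name": str}

def validate_records(records, source_row_count):
    """Return a list of DQ error messages; empty means all checks passed."""
    errors = []
    # Row count vs. source system.
    if len(records) != source_row_count:
        errors.append(
            f"row count mismatch: got {len(records)}, expected {source_row_count}"
        )
    for i, row in enumerate(records):
        # Schema validation: column names.
        if set(row) != set(EXPECTED_SCHEMA):
            errors.append(f"row {i}: unexpected columns {sorted(row)}")
            continue
        # Schema validation: column types.
        for col, typ in EXPECTED_SCHEMA.items():
            if not isinstance(row[col], typ):
                errors.append(f"row {i}: column {col!r} is not {typ.__name__}")
    return errors

good = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
print(validate_records(good, 2))  # []
bad = [{"id": "x", "name": "a"}]
print(validate_records(bad, 2))   # two errors: row count and id type
```

In a real pipeline the same checks would run on DataFrames (for example via Autoloader's badRecordsPath for malformed files), but the quarantine-or-fail logic stays the same shape.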

1 More Replies
Shalabh007
by Honored Contributor
  • 9060 Views
  • 6 replies
  • 19 kudos

Practice Exams for Databricks Certified Data Engineer Professional exam

Can anyone help with an official practice exam set for the Databricks Certified Data Engineer Professional exam, like the one we have below for the Databricks Certified Data Engineer Associate: Practice exam for the Databricks Certified Data Engineer Associate exam

Latest Reply
JOHNBOSCOW23
New Contributor II

I passed my exam today, thanks!

5 More Replies
Andolina1
by New Contributor III
  • 3240 Views
  • 6 replies
  • 1 kudos

How to trigger an Azure Data Factory pipeline through API using parameters

Hello all, I have a use case where I want to trigger an Azure Data Factory pipeline through an API. Right now I am calling the API in Databricks and using a Service Principal (token based) to connect to ADF from Databricks. The ADF pipeline has some paramete...

Latest Reply
rfranco
New Contributor II

Hello @Andolina1, try to send your payload like: body = {'curr_working_user': f'{parameters}'} and then response = requests.post(url, headers=headers, json=body). The pipeline's parameter should be named curr_working_user. With these changes your setup should work...
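Putting the reply together, here is a sketch of the payload construction; the segments of the ADF createRun URL are placeholders for your own subscription, resource group, factory, and pipeline names, and the parameter value is an invented example:

```python
import json

parameters = "some_user"
# The key must match the ADF pipeline's parameter name exactly.
body = {"curr_working_user": f"{parameters}"}

# Placeholder resource path; fill in your own identifiers.
url = (
    "https://management.azure.com/subscriptions/<sub-id>/resourceGroups/<rg>"
    "/providers/Microsoft.DataFactory/factories/<factory>/pipelines/<pipeline>"
    "/createRun?api-version=2018-06-01"
)

payload = json.dumps(body)
print(payload)  # {"curr_working_user": "some_user"}
# The actual call (with a Service Principal bearer token) would then be:
# response = requests.post(url, headers={"Authorization": f"Bearer {token}"}, json=body)
```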

5 More Replies
