Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

NathanSundarara
by Valued Contributor
  • 2337 Views
  • 1 reply
  • 0 kudos

Lakehouse federation bringing data from SQL Server

Has anyone tried bringing in data using the newly announced Lakehouse Federation and ingesting it with Delta Live Tables? I'm currently testing with materialized views: first I loaded the full data, and now I'm loading the last 3 days daily and recomputing using Mate...

Data Engineering
dlt
Lake house federation
Latest Reply
Nam_Nguyen
Databricks Employee
  • 0 kudos

Hi @NathanSundarara, regarding your current approach, here are potential solutions and considerations. Deduplication: implement a deduplication strategy within your DLT pipeline, for example clicksDedupDf = ( spark.readStream.table("LIVE.rawCl...
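A minimal sketch of that deduplication pattern, assuming a hypothetical LIVE.rawClicks source with hypothetical clickId and eventTime columns (neither is confirmed in the thread):

import dlt

@dlt.table(name="clicks_deduped")
def clicks_deduped():
    return (
        spark.readStream.table("LIVE.rawClicks")  # hypothetical raw source
        # Bound lateness so streaming state stays finite, then keep one row per click
        .withWatermark("eventTime", "3 days")
        .dropDuplicates(["clickId", "eventTime"])
    )

The 3-day watermark mirrors the rolling 3-day reload window described in the original post.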

jamson
by New Contributor
  • 18869 Views
  • 2 replies
  • 0 kudos

What are the best practices for optimizing Power BI reports and dashboards for performance in the PL

I’m studying for the PL-300 exam and would love some advice on how to optimize Power BI reports and dashboards for better performance. Specifically, I’m interested in: techniques for improving report load times and responsiveness; best practices for ma...

Latest Reply
emily2056
New Contributor II
  • 0 kudos

Here are the best practices for optimizing Power BI reports and dashboards for performance in the production lifecycle (PL): 1. Optimize data models: use a star schema design for efficient querying; avoid unnecessary columns and reduce column cardinality b...

1 More Reply
tanjil
by New Contributor III
  • 3925 Views
  • 4 replies
  • 2 kudos

print(flush = True) not working

Hello, I have the following minimal working example using multiprocessing: from multiprocessing import Pool   files_list = [('bla', 1, 3, 7), ('spam', 12, 4, 8), ('eggs', 17, 1, 3)]     def f(t): print('Hello from child process', flush = Tr...

Latest Reply
tanjil
New Contributor III
  • 2 kudos

No errors are generated. The code executes successfully, but the print statement for "Hello from child process" produces no output.
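A common workaround in notebook environments, where stdout from forked worker processes often never reaches the console, is to return the text from the worker and print it in the parent. This is a sketch of that idea, not a fix confirmed in this thread:

from multiprocessing import Pool

files_list = [('bla', 1, 3, 7), ('spam', 12, 4, 8), ('eggs', 17, 1, 3)]

def f(t):
    # Build the message in the child instead of printing it there,
    # since the child's stdout may not be captured by the notebook
    return f'Hello from child process: {t}'

with Pool(3) as pool:
    for message in pool.map(f, files_list):
        print(message, flush=True)  # printed by the parent, so it is visible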

3 More Replies
Myousief
by New Contributor II
  • 2595 Views
  • 7 replies
  • 1 kudos

Can't log in with password, SSO Enabled OIDC, Secret Key Expired

I am currently unable to log in to the Databricks Account Console. OpenID SSO is enabled for our workspace using Microsoft Entra ID, but the client secret has expired. As a result, SSO login is no longer functional. I attempted to log in using a password, ...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

Glad to hear you got unblocked.

6 More Replies
sangwan
by New Contributor
  • 607 Views
  • 1 reply
  • 0 kudos

Remorph: Error while running remorph-core-0.2.0-SNAPSHOT.jar after Maven build

We are encountering an issue while running the remorph-core-0.2.0-SNAPSHOT.jar file after successfully building it using Maven. The build completes without errors, but when we try to execute the generated .jar file, we get the following exception att...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Here are a few steps you can take to debug and resolve this issue: Check the Code for Either Usage: Look through your codebase for instances where Either is used. Ensure that you are handling both Left and Right cases properly. The error suggests th...

JamesY
by New Contributor III
  • 1077 Views
  • 1 reply
  • 0 kudos

Databricks JDBC write to table with PK column, error, key not found.

Hello, I am trying to write data to a table. It worked fine before, but after I recreated the table with one column as the PK, there is an error: Unable to write into the A_Table table....key not found: id. What is the correct way of doing this? PK column: [...

Data Engineering
Databricks
SqlMi
Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Looks like the primary key column ID is not being found during the write operation. Kindly verify the schema. Use a command like the one below to create the table with id as the primary key: CREATE TABLE A_Table ( ID BIGINT IDENTITY(1,1) PRIMARY KEY NOT NUL...
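If the id column is an IDENTITY value generated by the database, one hedged approach is to drop it from the DataFrame before the JDBC append so the database assigns it itself; the URL and credentials below are placeholders, and df stands for the DataFrame being written:

(
    df.drop("id")  # let the database populate the IDENTITY primary key
    .write.format("jdbc")
    .option("url", "jdbc:sqlserver://<host>:1433;databaseName=<db>")  # placeholder
    .option("dbtable", "A_Table")
    .option("user", "<user>")          # placeholder credentials
    .option("password", "<password>")
    .mode("append")
    .save()
)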

xamry
by New Contributor
  • 6717 Views
  • 1 reply
  • 0 kudos

java.lang.ClassCastException in JDBC driver's logger

Hi, our Java application is using the latest version of the Databricks JDBC driver (2.6.38). This application already uses Log4j 2.17.1 and SLF4J 2.0.13. When querying data from Databricks, java.lang.ClassCastException errors are printed on the console. Data...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Kindly use the latest jars from Maven; if that does not help, shading common packages to avoid conflicts should be our next resort.

meret
by New Contributor II
  • 1960 Views
  • 2 replies
  • 0 kudos

Trouble Accessing Trust Store for Oracle JDBC Connection on Shared Compute Cluster

Hi, I am trying to read data from an Oracle DB using the Oracle JDBC driver: df = (spark.read.format("jdbc").option("url", "jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCPS)(PORT=xxx)(HOST=xxx))(CONNECT_DATA=(SID=xxx)))").option("dbTable", "schema...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

The trust store file needs to be accessible from all nodes in the shared compute cluster. You can achieve this by storing the trust store file in a location that is accessible to all nodes, such as a mounted volume or a distributed file system. Here'...
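A sketch of that setup, assuming the trust store has been uploaded to a Unity Catalog volume; the path, port, password, and whether your Oracle driver version honors these connection properties are all assumptions to verify:

# Hypothetical volume path readable from every node in the cluster
trust_store = "/Volumes/main/default/certs/truststore.jks"

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCPS)"
                   "(PORT=2484)(HOST=dbhost))(CONNECT_DATA=(SID=orcl)))")
    .option("dbtable", "schema.table")
    .option("driver", "oracle.jdbc.OracleDriver")
    # Point the driver at the shared trust store for the TLS handshake
    .option("javax.net.ssl.trustStore", trust_store)
    .option("javax.net.ssl.trustStoreType", "JKS")
    .option("javax.net.ssl.trustStorePassword", "<password>")
    .load()
)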

1 More Reply
RotemBar
by New Contributor II
  • 649 Views
  • 3 replies
  • 1 kudos

Incremental refresh - non-serverless compute

Hey, I read the page about incremental refresh. Will you make it available on more than just serverless compute? If so, when? Thanks. Reference - https://docs.databricks.com/en/optimizations/incremental-refresh.html

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

Sure thing, will keep you posted in a DM.

2 More Replies
filipniziol
by Esteemed Contributor
  • 1270 Views
  • 2 replies
  • 1 kudos

Resolved! Is dbutils.notebook.run() supported from a local Spark Connect environment (VS Code)?

Hi everyone,I’m experimenting with the Databricks VS Code extension, using Spark Connect to run code locally in my Python environment while connecting to a Databricks cluster. I’m trying to call one notebook from another via: notebook_params = { ...

Latest Reply
VZLA
Databricks Employee
  • 1 kudos

@filipniziol just curious to know if getting the context and setting it manually would help, have you tried this approach? Example: ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext() dbutils.notebook.setContext(ctx) Or from pyspa...
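If setting the context does not help, one alternative not mentioned in the thread is to trigger the child notebook as a one-time job through the Databricks SDK, which also works from a local Spark Connect session; the cluster id and notebook path are placeholders:

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # reuses the same auth as the VS Code extension

run = w.jobs.submit(
    run_name="run-child-notebook",
    tasks=[
        jobs.SubmitTask(
            task_key="child",
            existing_cluster_id="<cluster-id>",  # placeholder cluster
            notebook_task=jobs.NotebookTask(
                notebook_path="/Workspace/path/to/child",  # placeholder path
                base_parameters={"param1": "value1"},
            ),
        )
    ],
).result()  # blocks until the one-time run finishes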

1 More Reply
dynia
by New Contributor
  • 358 Views
  • 1 reply
  • 0 kudos

REST API version 1

How long will version 1 of the REST API be supported?

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

There is no mention of a support duration for Databricks REST API version 1. I can check internally. Is there a specific API you have in mind?

Phani1
by Valued Contributor II
  • 1150 Views
  • 3 replies
  • 1 kudos

Databricks+DBT best practices

Hi all, could you provide best practices for building and optimizing dbt models in Databricks? Regards, Phani

Latest Reply
VZLA
Databricks Employee
  • 1 kudos

@Phani1 Please let us know if after going through @szymon_dybczak references, you still need some guidance on more specific aspects that we can help with.

2 More Replies
jeremy98
by Honored Contributor
  • 1527 Views
  • 11 replies
  • 0 kudos

Resolved! How to read the CDF logs in a DLT pipeline?

Hi Community, how do I read the CDF logs of materialized views created by a DLT pipeline? Thanks for your time.

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

@jeremy98 correct, If permissions management is complex, consider using standard Delta tables with CDF enabled and orchestrate changes through Databricks Workflows. This approach simplifies collaboration and avoids issues with restricted internal sch...
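A sketch of that alternative on a standard Delta table; the table name and starting version are placeholders:

# One-time setup: record row-level changes for the table
spark.sql(
    "ALTER TABLE main.default.orders "
    "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
)

# Read the change feed; rows carry _change_type, _commit_version,
# and _commit_timestamp metadata columns
changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 1)
    .table("main.default.orders")
)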

10 More Replies
Omri
by New Contributor
  • 2423 Views
  • 3 replies
  • 0 kudos

Optimizing a complex PySpark join

I have a complex join that I'm trying to optimize. df1 has cols id, main_key, col1, col1_isnull, col2, col2_isnull, ..., col30; df2 has cols id, main_key, col1, col2, ..., col_30. I'm trying to run this SQL query in PySpark: select df1.id, df2.id from df1 join df2 on df1.m...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

@Omri thanks for your question! To help optimize your complex join further, we need clarification on a few details. Data characteristics: approximate size of df1 and df2 (in rows and/or size); distribution of main_key in both dataframes—are the top...
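While those details are pending, two common levers for a join on a skewed main_key, sketched under the assumption that df2 is the smaller side:

from pyspark.sql import functions as F

# Lever 1: broadcast the smaller side so df1 is never shuffled
joined = df1.join(F.broadcast(df2), "main_key")

# Lever 2: salt the key so one hot main_key value spreads across partitions
SALT = 8
df1_salted = df1.withColumn("salt", (F.rand() * SALT).cast("int"))
salts = spark.range(SALT).withColumnRenamed("id", "salt")
df2_salted = df2.crossJoin(salts)  # replicate df2 once per salt value
joined_salted = df1_salted.join(df2_salted, ["main_key", "salt"])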

2 More Replies
jdata
by New Contributor II
  • 3426 Views
  • 5 replies
  • 1 kudos

Dashboard Usage

Hi there, my team is developing some SQL dashboards. I would like to know how many people view a dashboard, or at least click into it, and which queries are then triggered. I found out that there is one endpoint provided by Databricks: List Queries | Query Histor...

Latest Reply
jdata
New Contributor II
  • 1 kudos

When I click into the dashboard, its 6 statements produce 6 records in `system.access.audit`. But the event_time is different for each; I expected event_time to be the same across records. So, given the differences in event time, how c...
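One way to approximate a single "view" from those scattered events is to bucket them into short per-user time windows; the service name filter and the 1-minute window below are assumptions, since audit schemas and action names vary:

from pyspark.sql import functions as F

audit = spark.table("system.access.audit")

# Collapse the several statements fired by one dashboard load into one row
# by grouping on the user and a 1-minute event_time window
views = (
    audit
    .filter(F.col("service_name") == "databrickssql")  # assumed service name
    .groupBy(
        F.window("event_time", "1 minute"),
        F.col("user_identity.email").alias("user"),
    )
    .count()
)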

4 More Replies
