Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Omri
by New Contributor
  • 2831 Views
  • 3 replies
  • 0 kudos

Optimizing a complex pyspark join

I have a complex join that I'm trying to optimize. df1 has cols id, main_key, col1, col1_isnull, col2, col2_isnull ... col30. df2 has cols id, main_key, col1, col2 ... col_30. I'm trying to run this SQL query on PySpark: select df1.id, df2.id from df1 join df2 on df1.m...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

@Omri thanks for your question! To help optimize your complex join further, we need clarification on a few details. Data characteristics: approximate size of df1 and df2 (in rows and/or bytes); distribution of main_key in both dataframes; are the top...
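The skew question above matters because a hot main_key value concentrates all its rows on one task. A common remedy is key salting: replicate the smaller side once per salt value and tag the larger side with a random salt. This is a minimal pure-Python sketch of the idea (in Spark you would add a literal salt column before the join; the data and salt count here are made up):

```python
import random

# Conceptual sketch of salting a skewed join key (hypothetical data).
SALTS = 4

def salted_join(large, small):
    """large/small: lists of (main_key, payload) tuples; inner join on main_key."""
    # Replicate every small-side row once per salt value.
    small_exp = {}
    for key, payload in small:
        for s in range(SALTS):
            small_exp.setdefault((key, s), []).append(payload)
    # Tag each large-side row with a random salt, then join on (key, salt),
    # spreading one hot key across SALTS tasks instead of one.
    out = []
    for key, payload in large:
        s = random.randrange(SALTS)
        for other in small_exp.get((key, s), []):
            out.append((key, payload, other))
    return out

large = [("k1", i) for i in range(6)] + [("k2", 99)]  # "k1" is the hot key
small = [("k1", "a"), ("k2", "b")]
result = salted_join(large, small)
# Every large-side row still finds its match exactly once.
assert len(result) == 7
```

The same join result is produced as without salting; only the distribution of work changes.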

2 More Replies
jdata
by New Contributor II
  • 3580 Views
  • 5 replies
  • 1 kudos

Dashboard Usage

Hi there, my team is developing some SQL dashboards. I would like to know how many people view a dashboard (or at least click on it so that its queries are triggered). I found out that there is one endpoint provided by Databricks: List Queries | Query Histor...

Latest Reply
jdata
New Contributor II
  • 1 kudos

When I click on the dashboard, its 6 statements produce 6 records in `system.access.audit`. But each record has a different event_time; I expected event_time to be the same across records. Given the differences in event time, how c...
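One pragmatic way to reconcile the differing event_time values is to treat audit records that land within a short window of each other as a single dashboard view. A minimal sketch of that session-style grouping, assuming epoch-second timestamps and a made-up 60-second window:

```python
# Hypothetical sketch: collapse audit events fired by one dashboard refresh
# into a single "view" by bucketing event times that fall close together
# (per-statement events rarely share an exact timestamp).

WINDOW_SECONDS = 60

def group_views(event_times):
    """event_times: iterable of epoch seconds; returns a list of groups,
    each group holding the events of one presumed dashboard view."""
    groups = []
    for t in sorted(event_times):
        # Start a new group when the gap from the group's first event
        # exceeds the window; otherwise fold into the current group.
        if groups and t - groups[-1][0] <= WINDOW_SECONDS:
            groups[-1].append(t)
        else:
            groups.append([t])
    return groups

# Six statements fired over ~5 seconds count as one dashboard view.
events = [1000, 1001, 1001, 1002, 1004, 1005]
assert len(group_views(events)) == 1
```

In SQL the same idea is a session window over event_time; the 60-second threshold is an assumption to tune against how long your dashboard takes to refresh.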

4 More Replies
srtiemann
by New Contributor II
  • 1252 Views
  • 7 replies
  • 0 kudos

Shouldn't the global statement_timeout parameter prevail over the session parameter?

How can I block the use of statement_timeout at the session level in Databricks? I want the global parameter to be enforced even if a SET statement_timeout has been executed in Databricks notebooks.

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Yes, this global config only applies to serverless SQL warehouses; it won't apply if you use serverless compute for notebooks.

6 More Replies
jorperort
by Contributor
  • 3296 Views
  • 3 replies
  • 5 kudos

Resolved! Help with Integration Testing for SQL Notebooks in Databricks

Hi everyone,I’m looking for the best way to implement integration tests for SQL notebooks in an environment that uses Unity Catalog and workflows to execute these notebooks.For unit tests on SQL functions, I’ve reviewed the https://docs.databricks.co...

Latest Reply
filipniziol
Esteemed Contributor
  • 5 kudos

Hi @jorperort, I see the question is already answered, but it motivated me to write a Medium article and create a sample repo with an integration test written for a SQL notebook. I hope it will be useful for you: https://filip...

2 More Replies
EssamHisham
by New Contributor II
  • 1134 Views
  • 2 replies
  • 3 kudos

Lakehouse Fundamentals Course

I encountered an issue while attempting to access the Lakehouse Fundamentals badge quiz. Every time I try to access the quiz, I receive the following error message: "Access denied. You do not have permission to access this page. Please contact...

Latest Reply
Walter_C
Databricks Employee
  • 3 kudos

If this does not help, you can reach out to training-ops@databricks.com.

1 More Replies
jorperort
by Contributor
  • 1335 Views
  • 2 replies
  • 2 kudos

Resolved! WAP pattern with Unity Catalog

Good afternoon, I am looking for documentation to implement the WAP pattern using Unity Catalog, workflows, SQL notebooks, and any other services necessary to use this pattern. Could you share information on how to approach the problem with documentat...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @jorperort, apart from the nice step-by-step instructions that @VZLA has provided, you can also take a look at a short presentation of the WAP pattern on the official Databricks YouTube channel: https://youtu.be/4K3zAmUgViE?t=492

1 More Replies
alwaysmoredata
by New Contributor II
  • 1752 Views
  • 8 replies
  • 1 kudos

Is it possible to load data only using Databricks SDK?

Is it possible to load data using only the Databricks SDK? I have a custom library that has to load data into a table, and I know about other features like Auto Loader, COPY INTO, and notebooks with Spark dataframes... but I wonder if it is possible to load data dir...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

Got it. The reason it depends on the cluster is that on a shared-access cluster, access to the local file system is more restricted than on a single-user cluster due to security constraints. Since you are using serverless, it acts as a shared cluster; in this case you...

7 More Replies
jeremy98
by Honored Contributor
  • 699 Views
  • 2 replies
  • 0 kudos

Submit new records from gold layer to postgres db

Hi community, I want to ask which, in your opinion, is the best practice for loading data from the gold layer into a Postgres DB that is used to provide "real-time" data to a UI? Thanks for any help!

Latest Reply
jeremy98
Honored Contributor
  • 0 kudos

Hello @hari-prasad, thank you for your answer. Considering that we are using DLT pipelines, is it a good choice in this case? With DLT pipelines I don't actually see the metadata for these materialized tables, in this case the CDF statements.

1 More Replies
mkEngineer
by New Contributor III
  • 5032 Views
  • 1 reply
  • 0 kudos

Refresh options on PBI from Databricks workflow using Azure Databricks

Hi! I have a workflow that includes my medallion architecture and DLT. Currently, I have a separate notebook for refreshing my Power BI semantic model, which works based on the method described in "Refresh a PowerBI dataset from Azure Databricks". Howe...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @mkEngineer, have you reviewed this documentation: https://learn.microsoft.com/en-us/azure/databricks/partners/bi/power-bi? Also, I don't think serverless compute for notebooks will work for your connection with Power BI. You might need to set up a Se...

Anonymous
by Not applicable
  • 56856 Views
  • 7 replies
  • 13 kudos

How to connect to and extract data from SharePoint using Databricks (AWS)?

We are using Databricks (on AWS). We need to connect to SharePoint and extract and load data into a Databricks Delta table. Any possible solutions for this?

Latest Reply
yliu
New Contributor III
  • 13 kudos

Wondering the same... Can we use the SharePoint REST API to download the file, save it to DBFS or an external location, and read it?
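A sketch of the approach described above, assuming the classic SharePoint REST endpoint `GetFileByServerRelativeUrl` and placeholder site/file names; the authentication and download calls are shown only as comments since they need a real OAuth token:

```python
# Hypothetical sketch: build the SharePoint REST download URL for a file,
# then save it to a DBFS-style path. Site, path, and file names below are
# placeholders; verify the endpoint shape against the SharePoint REST docs.
SITE = "https://contoso.sharepoint.com/sites/finance"   # placeholder site
FILE = "Shared Documents/report.xlsx"                   # placeholder file

# Note: in real use the server-relative path should be URL-encoded.
download_url = (
    f"{SITE}/_api/web/"
    f"GetFileByServerRelativeUrl('/sites/finance/{FILE}')/$value"
)

# With a bearer token (e.g. from an Azure AD app registration), roughly:
#   resp = requests.get(download_url,
#                       headers={"Authorization": f"Bearer {token}"})
#   with open("/dbfs/tmp/report.xlsx", "wb") as f:
#       f.write(resp.content)
# The saved file can then be read with spark.read as usual.
```

From there the file lands in DBFS or an external location and can be loaded into a Delta table; for recurring loads, Microsoft Graph is an alternative API worth evaluating.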

6 More Replies
Phani1
by Valued Contributor II
  • 679 Views
  • 1 reply
  • 1 kudos

Access the data from cross-cloud.

Hi all, we have a use case where we need to connect AWS Databricks to a GCP storage bucket to access the data. In Databricks we're trying to use external locations and storage credentials, but it seems like AWS Databricks only supports AWS storage b...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @Phani1, you can use Delta Sharing. That way you can create a share that allows you to access data stored in GCS, governed by the UC permissions model. See "What is Delta Sharing? | Databricks on AWS". You can also use the legacy approach, but it doesn'...

svm_varma
by New Contributor II
  • 2372 Views
  • 1 reply
  • 2 kudos

Resolved! Azure Databricks quota restrictions on compute in Azure for students subscription

Hi all, regarding creating clusters in Databricks: I'm getting a quota error. I have tried to increase quotas in the region where the resource is hosted but am still unable to raise the limit. Is there any workaround, or could you help select the right cluster ...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @svm_varma, you can try to create a Standard_DS3_v2 cluster. It has 4 cores, and your current subscription limit for the given region is 6 cores. The one you're trying to create needs 8 cores, hence the quota-exceeded exception. You can also...

vijaypodili
by New Contributor III
  • 1738 Views
  • 9 replies
  • 0 kudos

Databricks job taking a long time to load 2.3 GB of data from blob storage to a SQL Server table

df_CorpBond = spark.read.format("parquet") \
    .option("header", "true") \
    .load(f"/mnt/{container_name}/raw_data/dsl.corporate.parquet")

df_CorpBond.repartition(20).write \
    .format("jdbc") \
    .option("url", url_connector) \
    .option("dbtable", "MarkIt...

Data Engineering
databricks
performance
Latest Reply
vijaypodili
New Contributor III
  • 0 kudos

Hi @RiyazAliM, this is my DAG diagram. The file size is 3.5 GB, and in the future we need to load 14 GB as well.
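Since throughput in a job like this is usually limited by the JDBC write rather than the parquet read, the options most often tuned are the writer's `batchsize` and `numPartitions` (both standard Spark JDBC data source options; the URL and table below are placeholders):

```python
# Hypothetical sketch of commonly tuned Spark JDBC writer options.
# Values are starting points to benchmark, not recommendations.
jdbc_options = {
    "url": "jdbc:sqlserver://<host>:1433;databaseName=<db>",  # placeholder
    "dbtable": "dbo.target_table",                            # placeholder
    "batchsize": "10000",      # rows per JDBC batch insert (default is 1000)
    "numPartitions": "8",      # parallel connections opened to the target DB
}

# In Spark this would be applied roughly as:
#   df.write.format("jdbc").options(**jdbc_options).mode("append").save()
# More partitions means more parallel inserts, but also more load on the
# target SQL Server, so raise numPartitions gradually.
```

Repartitioning to 20 while the target can only absorb a few concurrent writers mostly adds overhead; matching `numPartitions` to what the database sustains tends to matter more than the raw partition count.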

8 More Replies
singhanuj2803
by New Contributor III
  • 3480 Views
  • 1 reply
  • 1 kudos

Apache Spark SQL query to get organization hierarchy

I'm currently diving deep into Spark SQL and its capabilities, and I'm facing an interesting challenge. I'm eager to learn how to write CTE recursive queries in Spark SQL, but after thorough research, it seems that Spark doesn't natively support recu...

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

Hi @singhanuj2803, it is correct that Spark SQL does not natively support recursive Common Table Expressions (CTEs). However, there are some workarounds and alternative methods you can use to achieve similar results. Using the DataFrame API with loops:...
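The loop-based workaround mentioned above can be illustrated without Spark: repeatedly expand one level of the (employee, manager) mapping until no new rows appear. In Spark each pass would be a DataFrame self-join inside the loop; here is a pure-Python sketch with made-up data:

```python
# Iterative replacement for a recursive CTE over an org hierarchy.
# parent_of maps each employee to their manager (hypothetical data).

def hierarchy_levels(parent_of, root):
    """Return {employee: depth below root}, expanding one level per pass,
    the same fixpoint iteration a recursive CTE would perform."""
    depth = {root: 0}
    changed = True
    while changed:                       # stop when a pass adds no new rows
        changed = False
        for emp, mgr in parent_of.items():
            if mgr in depth and emp not in depth:
                depth[emp] = depth[mgr] + 1
                changed = True
    return depth

org = {"bob": "alice", "carol": "alice", "dave": "bob"}
levels = hierarchy_levels(org, "alice")
assert levels == {"alice": 0, "bob": 1, "carol": 1, "dave": 2}
```

In PySpark, each pass of the while loop becomes a join of the frontier against the parent-child DataFrame, unioned into the accumulated result until the frontier is empty.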

