Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

jeremy98
by New Contributor III
  • 2 Views
  • 1 reply
  • 0 kudos

How to read the CDF logs in a DLT pipeline?

Hi Community, how do I read the CDF logs in materialized views created by a DLT pipeline? Thanks for your time,

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @jeremy98, To read the Change Data Feed (CDF) logs in materialized views created by a Delta Live Tables (DLT) pipeline, you can follow these steps:   Enable Change Data Feed: Ensure that the change data feed is enabled on the base tables of the ma...
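
A minimal sketch of that first step plus the read, assuming a placeholder base table name main.sales.orders_base; the readChangeFeed option and the _change_type/_commit_version metadata columns are standard Delta Change Data Feed features:

```python
# Enable Change Data Feed on a base table feeding the materialized view
# (table name is a placeholder for illustration).
spark.sql("""
    ALTER TABLE main.sales.orders_base
    SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")

# Read the change feed from a chosen starting version.
changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 1)
    .table("main.sales.orders_base")
)
changes.select("_change_type", "_commit_version", "_commit_timestamp").show()
```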

mkEngineer
by New Contributor III
  • 29 Views
  • 1 reply
  • 0 kudos

Refresh options for Power BI from a Databricks workflow using Azure Databricks

Hi! I have a workflow that includes my medallion architecture and DLT. Currently, I have a separate notebook for refreshing my Power BI semantic model, which works based on the method described in Refresh a PowerBI dataset from Azure Databricks. Howe...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @mkEngineer, have you reviewed this documentation: https://learn.microsoft.com/en-us/azure/databricks/partners/bi/power-bi Also, I don't think serverless compute for notebooks will work for your connection with Power BI. You might need to set up a Se...
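
For context on what such a refresh notebook typically does, here is a minimal sketch that queues a semantic model refresh through the Power BI REST API; the token handling, secret scope, and workspace/dataset IDs are all placeholder assumptions:

```python
import requests

# Placeholders: obtain a real Azure AD token for the Power BI API
# (e.g. via MSAL or a service principal) and fill in your own IDs.
access_token = dbutils.secrets.get(scope="pbi", key="access-token")
group_id = "<workspace-id>"
dataset_id = "<dataset-id>"

resp = requests.post(
    f"https://api.powerbi.com/v1.0/myorg/groups/{group_id}/datasets/{dataset_id}/refreshes",
    headers={"Authorization": f"Bearer {access_token}"},
    json={"notifyOption": "MailOnFailure"},
)
resp.raise_for_status()  # HTTP 202 means the refresh was accepted and queued
```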

stevomcnevo007
by New Contributor
  • 100 Views
  • 4 replies
  • 1 kudos

agents.deploy NOT_FOUND: The directory being accessed is not found. error

I keep getting the following error, although the model definitely exists and the version and model names are correct: RestException: NOT_FOUND: The directory being accessed is not found. when calling # Deploy the model to the review app and a model...

Latest Reply
stevomcnevo007
New Contributor
  • 1 kudos

Checked the config.yml file and it looks like this:
agent_prompt: "Use functions to interact with questions about eggs / henhouse."
llm_endpoint: "databricks-meta-llama-3-3-70b-instruct"
warehouse_id: "1c7bf12e78b673de"
uc_functions: "main.egg_shop.*"
St...

3 More Replies
kasiviss42
by Visitor
  • 16 Views
  • 1 reply
  • 0 kudos

Unity Credential Scope id not found in thread locals

I am facing this issue: [UNITY_CREDENTIAL_SCOPE_MISSING_SCOPE] Missing Credential Scope. Unity Credential Scope id not found in thread locals. The issue occurs when we try to list files using dbutils.fs.ls, and also at times when we try to write o...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @kasiviss42, Are you using any Scala code in your notebook? The error [UNITY_CREDENTIAL_SCOPE_MISSING_SCOPE] Missing Credential Scope. Unity Credential Scope id not found in thread locals that you are encountering when using dbutils.fs.ls and whil...

soumiknow
by New Contributor III
  • 28 Views
  • 1 reply
  • 0 kudos

How to resolve a 'connection refused' error while using a google-cloud library in a Databricks notebook?

I want to use the google-cloud-bigquery library in my PySpark code, though I know that the spark-bigquery-connector is available. The reason I want to use it is that a Databricks 15.4 LTS cluster comes with the 0.22.2-SNAPSHOT version of spark-bigquery-connector, wh...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @soumiknow, To resolve the 'connection refused' error when using the google-cloud-bigquery library in your Databricks notebook, you need to ensure that your Databricks cluster is properly configured to authenticate with Google Cloud Platform (GCP)...
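
A minimal sketch of that configuration, assuming a service-account key file has been uploaded to a path the cluster can read (the path and project ID are placeholders):

```python
import os
from google.cloud import bigquery

# Point the client library at a service-account key. Without explicit
# credentials, google-auth may fall back to the GCE metadata server, which is
# unreachable from Databricks and can surface as a 'connection refused' error.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/dbfs/FileStore/keys/gcp-sa.json"

client = bigquery.Client(project="my-gcp-project")
result = client.query("SELECT 1 AS ok").result()
for row in result:
    print(row.ok)
```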

Phani1
by Valued Contributor II
  • 30 Views
  • 1 reply
  • 0 kudos

Access data cross-cloud.

Hi All, we have a use case where we need to connect AWS Databricks to a GCP storage bucket to access the data. In Databricks, we're trying to use external locations and storage credentials, but it seems like AWS Databricks only supports AWS storage b...

Latest Reply
szymon_dybczak
Contributor III
  • 0 kudos

Hi @Phani1, you can use Delta Sharing. That way you can create a share that allows you to access data stored in GCS, and it's governed by the UC permissions model. See What is Delta Sharing? | Databricks on AWS. You can also use a legacy approach, but it doesn'...
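
A minimal sketch of the provider-side setup for Databricks-to-Databricks sharing, assuming the GCS-backed table is already registered in a Unity Catalog metastore on the GCP side; the catalog, share, and recipient names are placeholders, and the recipient's sharing identifier (elided here) comes from the consuming AWS metastore:

```python
# Run on the workspace that can already reach the GCS data (provider side).
spark.sql("CREATE SHARE IF NOT EXISTS cross_cloud_share")
spark.sql("ALTER SHARE cross_cloud_share ADD TABLE gcp_catalog.analytics.events")

# Register the consuming (AWS) metastore as a recipient and grant access.
spark.sql("CREATE RECIPIENT IF NOT EXISTS aws_consumer USING ID 'aws:us-east-1:<metastore-uuid>'")
spark.sql("GRANT SELECT ON SHARE cross_cloud_share TO RECIPIENT aws_consumer")
```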

zmsoft
by New Contributor III
  • 25 Views
  • 2 replies
  • 0 kudos

How to load a Power BI dataset into Databricks

Hi there, I would like to know how to load a Power BI dataset into Databricks. Thanks & Regards, zmsoft

Latest Reply
jack533
New Contributor III
  • 0 kudos

It's not feasible, in my opinion. It is feasible to load a table from Databricks into a Power BI dataset, but not the other way around.

1 More Reply
svm_varma
by Visitor
  • 32 Views
  • 1 reply
  • 1 kudos

Azure Databricks quota restrictions on compute for an Azure for Students subscription

Hi All, regarding creating clusters in Databricks, I'm getting a quota error. I have tried to increase quotas in the region where the resource is hosted but am still unable to increase the limit. Is there any workaround, or could you help select the right cluster ...

Latest Reply
szymon_dybczak
Contributor III
  • 1 kudos

Hi @svm_varma, you can try to create a Standard_DS3_v2 cluster. It has 4 cores, and your current subscription limit for the given region is 6 cores. The one you're trying to create needs 8 cores, hence the quota exceeded exception. You can also...

guiferviz
by New Contributor II
  • 60 Views
  • 1 reply
  • 0 kudos

How to Determine if Materialized View is Performing Full or Incremental Refresh?

I'm currently testing materialized views and I need some help understanding the refresh behavior. Specifically, I want to know if my materialized view is querying the full table (performing a full refresh) or just doing an incremental refresh. From so...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @guiferviz, To determine the type of refresh used, you can query the Delta Live Tables event log. Look for the event_type called planning_information to see the technique used for the refresh. The techniques include:   FULL_RECOMPUTE: Indicates a ...
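
A sketch of that event log query, assuming a Unity Catalog materialized view named main.gold.daily_sales (a placeholder); event_log() is the table-valued function documented for DLT pipelines:

```python
# Each planning_information event records which refresh technique was chosen
# (e.g. FULL_RECOMPUTE vs. an incremental technique) for that update.
planning = spark.sql("""
    SELECT timestamp, details:planning_information
    FROM event_log(TABLE(main.gold.daily_sales))
    WHERE event_type = 'planning_information'
    ORDER BY timestamp DESC
""")
planning.show(truncate=False)
```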

hprasad
by New Contributor III
  • 37 Views
  • 1 reply
  • 0 kudos

Optimize Cluster Uptime by Avoiding Unwanted Library or Jar Installations

Whenever we discuss clusters or nodes in any service, we need to address the cluster bootstrap process. Traditionally, this involves configuring each node using a startup script (startup.sh). In this context, installing libraries in the cluster is par...

Data Engineering
cluster
job
jobs
Nodes
Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

For further details on managing init scripts and optimizing the bootstrap process, you can refer to the Databricks documentation on init scripts. This documentation provides recommendations for using built-in platform features instead of init scripts...
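
As one concrete example of the built-in features the documentation recommends over init scripts, libraries can be attached as cluster-scoped libraries through the API; a sketch using the Databricks Python SDK, where the cluster ID and package are placeholders and databricks-sdk is assumed to be installed:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.compute import Library, PythonPyPiLibrary

w = WorkspaceClient()

# Attach a PyPI package as a cluster-scoped library; Databricks installs it
# on every node, so no startup-script installation step is needed.
w.libraries.install(
    cluster_id="<cluster-id>",
    libraries=[Library(pypi=PythonPyPiLibrary(package="pandas==2.2.2"))],
)
```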

Omri
by New Contributor
  • 38 Views
  • 2 replies
  • 0 kudos

Optimizing a complex pyspark join

I have a complex join that I'm trying to optimize. df1 has columns id, main_key, col1, col1_isnull, col2, col2_isnull, ..., col30. df2 has columns id, main_key, col1, col2, ..., col_30. I'm trying to run this SQL query on PySpark: select df1.id, df2.id from df1 join df2 on df1.m...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Continuation of my comments:
Shuffle Hash Join: Prefer a shuffle hash join over a sort-merge join if applicable; this can be more efficient in certain scenarios: spark.conf.set("spark.sql.join.preferSortMergeJoin", "false")
Data Skew Remediation: Iden...
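
Beyond the global setting, the same preference can be scoped to a single join with a hint; a minimal sketch reusing the df1/df2 names from the question (the AQE skew settings are an assumption about what the truncated remediation point covers):

```python
# Ask Spark to use a shuffle hash join for this join only, avoiding the sort
# phase of a sort-merge join when partitions fit comfortably in memory.
joined = (
    df1.join(df2.hint("SHUFFLE_HASH"), on="main_key")
    .select(df1.id.alias("df1_id"), df2.id.alias("df2_id"))
)

# For skewed main_key values, adaptive query execution can split skewed
# partitions automatically (enabled by default on recent runtimes).
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
```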

1 More Reply
filipniziol
by Contributor III
  • 116 Views
  • 1 reply
  • 1 kudos

Resolved! Is dbutils.notebook.run() supported from a local Spark Connect environment (VS Code)?

Hi everyone,I’m experimenting with the Databricks VS Code extension, using Spark Connect to run code locally in my Python environment while connecting to a Databricks cluster. I’m trying to call one notebook from another via: notebook_params = { ...

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

Hi @filipniziol, it is confirmed that dbutils.notebook.run relies on the full Databricks notebook context, which is not available in a local Spark Connect session. Therefore, running a notebook with dbutils.notebook.run is not possible in a local env...
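
One commonly suggested workaround (an assumption on my part, not taken from the truncated reply above) is to trigger the child notebook as a one-time job run through the REST API, which does work from a local IDE; a sketch with the Databricks Python SDK, with all paths and IDs as placeholders:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # reuses the same auth profile as the VS Code extension

run = w.jobs.submit(
    run_name="run-child-notebook",
    tasks=[
        jobs.SubmitTask(
            task_key="child",
            existing_cluster_id="<cluster-id>",
            notebook_task=jobs.NotebookTask(
                notebook_path="/Workspace/Users/<me>/child_notebook",
                base_parameters={"param1": "value1"},
            ),
        )
    ],
).result()  # blocks until the run finishes
print(run.state.life_cycle_state)
```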

singhanuj2803
by New Contributor III
  • 46 Views
  • 1 reply
  • 1 kudos

Apache Spark SQL query to get organization hierarchy

I'm currently diving deep into Spark SQL and its capabilities, and I'm facing an interesting challenge. I'm eager to learn how to write CTE recursive queries in Spark SQL, but after thorough research, it seems that Spark doesn't natively support recu...

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

Hi @singhanuj2803, It is correct that Spark SQL does not natively support recursive Common Table Expressions (CTEs). However, there are some workarounds and alternative methods you can use to achieve similar results.   Using DataFrame API with Loops:...
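
A minimal sketch of the loop-based workaround, assuming an employees table with id, manager_id, and name columns (all names hypothetical):

```python
from pyspark.sql import functions as F

emp = spark.table("main.hr.employees")

# Seed: top-level employees (no manager), at level 0.
hierarchy = emp.filter(F.col("manager_id").isNull()).withColumn("level", F.lit(0))
frontier = hierarchy

# Expand one management level per iteration, bounded to guard against cycles.
for level in range(1, 20):
    nxt = (
        emp.alias("e")
        .join(frontier.alias("f"), F.col("e.manager_id") == F.col("f.id"))
        .select("e.*")
        .withColumn("level", F.lit(level))
    )
    if nxt.isEmpty():
        break
    hierarchy = hierarchy.unionByName(nxt)
    frontier = nxt

hierarchy.orderBy("level", "id").show()
```

Each pass joins the previous frontier back to the base table, which is the usual substitute for a recursive CTE; for deep hierarchies, caching or checkpointing the frontier keeps the growing query plan manageable.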

singhanuj2803
by New Contributor III
  • 47 Views
  • 1 reply
  • 1 kudos

How to run stored procedure in Azure Database for PostgreSQL using Azure Databricks Notebook

We have stored procedures available in Azure Database for PostgreSQL, and we want to call, run, or execute these PostgreSQL stored procedures in Azure Databricks through a notebook. We are attempting to run PostgreSQL stored procedures through Azure Databr...

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

To execute a PostgreSQL stored procedure from an Azure Databricks notebook, you need to follow these steps: Required Libraries: You need to install the psycopg2 library, which is a PostgreSQL adapter for Python. This can be done using the %pip install...
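
A minimal sketch of those steps, with the host, database, and procedure names as placeholders and the password read from a secret scope (assuming psycopg2-binary was installed first with %pip install psycopg2-binary):

```python
import psycopg2

# Connect to Azure Database for PostgreSQL; Azure requires SSL by default.
conn = psycopg2.connect(
    host="myserver.postgres.database.azure.com",
    port=5432,
    dbname="mydb",
    user="admin_user",
    password=dbutils.secrets.get(scope="pg", key="password"),
    sslmode="require",
)
try:
    with conn.cursor() as cur:
        # CALL invokes a stored procedure (PostgreSQL 11+).
        cur.execute("CALL my_schema.my_procedure(%s, %s)", ("arg1", 42))
    conn.commit()  # the CALL runs in a transaction unless autocommit is on
finally:
    conn.close()
```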


Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group