Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by erigaud (Honored Contributor)
  • 139 Views
  • 2 replies
  • 3 kudos

Databricks Asset Bundles and Dashboards - pass parameters depending on bundle target

Hello everyone! Since Databricks Asset Bundles can now be used to deploy dashboards, I'm wondering how to pass parameters so that the queries for the dev dashboard query the dev catalog, the dashboard in stg queries the stg catalog, etc. Is there any...

Latest Reply: Alberto_Umana (Databricks Employee)

Hi @erigaud, here's how you can achieve this. Define variables in the bundle configuration: you can define custom variables in your databricks.yml file to hold the catalog names for different environments. For example: variables: catalog_name...
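
A minimal sketch of what that databricks.yml could look like; the variable, catalog, and target names below are hypothetical:

    variables:
      catalog_name:
        description: Catalog the dashboard queries should read from
        default: dev_catalog

    targets:
      dev:
        variables:
          catalog_name: dev_catalog
      stg:
        variables:
          catalog_name: stg_catalog

The dashboard queries can then reference ${var.catalog_name}, so each deployed target points at its own catalog.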

  • 3 kudos
1 More Replies
by devagya (New Contributor)
  • 164 Views
  • 2 replies
  • 1 kudos

Infor Data Lake to Databricks

I'm working on this project which involves moving data from Infor to Databricks. Infor is somewhat of an enterprise solution. I could not find many resources on this. I could not even find a free trial option on their site. If anyone has experience w...

Latest Reply: michelle653burk (New Contributor III)

@devagya wrote: I'm working on this project which involves moving data from Infor to Databricks. Infor is somewhat of an enterprise solution. I could not find many resources on this. I could not even find a free trial option on their site. If anyone h...

  • 1 kudos
1 More Replies
by dixonantony (New Contributor III)
  • 343 Views
  • 6 replies
  • 0 kudos

Not able to create table from external Spark

py4j.protocol.Py4JJavaError: An error occurred while calling o123.sql.: io.unitycatalog.client.ApiException: generateTemporaryPathCredentials call failed with: 401 - {"error_code":"UNAUTHENTICATED","message":"Request to generate access credential for...

Latest Reply: NandiniN (Databricks Employee)

Can you also validate the privileges, please? Grant CREATE EXTERNAL TABLE on the external location used to create the Delta table; the UNITY_CATALOG_EXTERNAL_GENERATE_PATH_CREDENTIALS_DENIED error appears because the operation is blocked behind that privilege.
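
For reference, a hedged sketch of what that grant might look like in SQL; the external location and principal names here are hypothetical:

    GRANT CREATE EXTERNAL TABLE
    ON EXTERNAL LOCATION `my_external_location`
    TO `data_engineers`;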

  • 0 kudos
5 More Replies
by schluca (New Contributor II)
  • 129 Views
  • 1 reply
  • 0 kudos

Error Querying Shallow Clones: Couldn't Initialize File System for Path

Hi, we are offering data products through a central catalog for our users. To minimize data duplication and to display relationships between tables, we use shallow clones to provide access to the data. However, since implementing this approach, we occa...

Latest Reply: TakuyaOmi (Contributor III)

Hi @schluca, I've encountered an issue where an error occurred when trying to reference a table after deleting and recreating the source table for a shallow clone, and then performing the shallow clone again. As a solution, try deleting the destinati...
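
A minimal sketch of that fix in SQL, with hypothetical table names; dropping the stale clone and recreating it re-binds it to the new source files:

    DROP TABLE IF EXISTS main.analytics.orders_clone;
    CREATE TABLE main.analytics.orders_clone
      SHALLOW CLONE main.analytics.orders;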

  • 0 kudos
by Rahman823 (New Contributor II)
  • 214 Views
  • 2 replies
  • 1 kudos

Databricks table lineage

Hi, I wanted to know if it is possible to edit the lineage that we see in Databricks, like the one shown below. Can I edit this lineage graph, e.g. add other ETL tools (at the start of the tables) that I have used to get data into AWS and then into Databri...

[screenshot: table lineage graph]
Latest Reply: chm_user_1 (New Contributor II)

This will be extremely beneficial. We have certain use cases where we do not leverage Spark in our pipelines and lose the lineage. I would prefer to set an extra parameter when writing a table to specify the lineage. 

  • 1 kudos
1 More Replies
by vinitkhandelwal (New Contributor III)
  • 155 Views
  • 2 replies
  • 0 kudos

Error while running a notebook job using a git repo (GitLab)

I am trying to run a notebook job with a git repo hosted on GitLab. I have linked my GitLab account using a GitLab token, yet I am getting the following error on running the job. How do I resolve this?

[screenshots: job run error messages]
Latest Reply: Alberto_Umana (Databricks Employee)

Hi @vinitkhandelwal, it looks like the token could be missing permissions required for the operation. Please refer to the docs: you can clone public remote repositories without Git credentials (a personal access token and a username). To modify a public remote r...

  • 0 kudos
1 More Replies
by sharukh_lodhi (New Contributor III)
  • 940 Views
  • 3 replies
  • 2 kudos

Azure IMDS is not accessible when selecting shared compute policy

Hi, Databricks community, I recently encountered an issue while using the 'azure.identity' Python library on a cluster set to the personal compute policy in Databricks. In this case, Databricks successfully returns the Azure Databricks managed user id...

Labels: Data Engineering, azure IMDS, DefaultAzureCredential
Latest Reply: daisy08 (New Contributor II)

I'm having a similar problem. My aim is to invoke an Azure Data Factory pipeline from an Azure Databricks notebook. I created an Access Connector for Azure Databricks to which I gave Data Factory Contributor permissions, using these lines of Python: from azu...
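
A hedged Python sketch of triggering an ADF pipeline with azure.identity and azure-mgmt-datafactory; the subscription, resource group, factory, and pipeline names are hypothetical, and the identity behind the Access Connector still needs Data Factory Contributor on the factory:

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient

    # DefaultAzureCredential picks up the cluster's managed identity when available
    credential = DefaultAzureCredential()
    adf_client = DataFactoryManagementClient(credential, "<subscription-id>")

    # Trigger the pipeline and keep the run id for later status polling
    run = adf_client.pipelines.create_run(
        resource_group_name="my-rg",        # hypothetical
        factory_name="my-adf",              # hypothetical
        pipeline_name="my-pipeline",        # hypothetical
        parameters={},
    )
    print(run.run_id)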

  • 2 kudos
2 More Replies
by jeremy98 (New Contributor III)
  • 245 Views
  • 1 reply
  • 0 kudos

Resolved! How to pass parameters using DABs

Hello Community, I want to pass parameters to my Databricks job through the DABs CLI. Specifically, I'd like to be able to run a job with parameters directly using the command: databricks bundle run -t prod --params [for example: table_name="client"...

Latest Reply: szymon_dybczak (Contributor III)

Hi @jeremy98, you can pass parameters using the CLI in the following way: databricks bundle run -t ENV --params Param1=Value1,Param2=Value2 Job_Name. And in your yml file you should define parameters in a similar way to the following. You can find more info in the follo...
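
A minimal sketch of the job-parameters side in databricks.yml; the job key and parameter name are hypothetical:

    resources:
      jobs:
        my_job:
          name: my_job
          parameters:
            - name: table_name
              default: client

With that in place, databricks bundle run -t prod --params table_name=client my_job overrides the default at run time.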

  • 0 kudos
by parasupadhyay (New Contributor)
  • 108 Views
  • 1 reply
  • 0 kudos

PyDeequ is not working with shared mode clusters on a Unity Catalog enabled Databricks account

Hi folks, recently I was doing a PoC on PyDeequ. I found that it is not working with shared mode clusters in a Databricks account where Unity Catalog is enabled. It throws the below error. It works fine with single-mode clusters. Does anyone have ...

Latest Reply: szymon_dybczak (Contributor III)

Hi @parasupadhyay, I don't think there is much you can do. For security reasons, you can't access the sparkContext from shared access mode clusters in UC. Your only option to make it work is to use a single user mode cluster.

  • 0 kudos
by 646901 (New Contributor II)
  • 204 Views
  • 2 replies
  • 0 kudos

Get user who ran a job

From the Databricks API/CLI, is it possible to get the user who triggered a job run programmatically? The information can be found in the job "event log" and can be queried in the "audit log", but neither of these seems to have an API option. Is there a w...

Latest Reply: Stefan-Koch (Contributor III)

With the Databricks CLI you can get all the info about a job run with this command: databricks jobs get-run <run-id> (replace <run-id> with your actual run ID).
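
A quick sketch piping that command into jq; the run ID is hypothetical, and this assumes (from memory of the Jobs API run object) that the response includes a creator_user_name field:

    # Print who triggered the run
    databricks jobs get-run 123456789 | jq -r '.creator_user_name'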

  • 0 kudos
1 More Replies
by matmat13 (New Contributor II)
  • 14211 Views
  • 19 replies
  • 10 kudos

Resolved! Lakehouse Fundamentals Certificate/Badge not appearing

Hello! I just passed the Lakehouse Fundamentals accreditation and I haven't received any badge or certificate for it. I understand that I need to go to credentials.databricks.com, but it is not there. How long before it appears? Need help.

Latest Reply: andy25 (New Contributor II)

Hello, can you help me? I have the same problem: I don't have the certificate available on credentials.databricks.com.

  • 10 kudos
18 More Replies
by dev_puli (New Contributor III)
  • 35178 Views
  • 6 replies
  • 5 kudos

How to read a CSV file from the user's workspace

Hi! I have been carrying out a POC, so I created a CSV file in my workspace and tried to read its content using the techniques below in a Python notebook, but it did not work. Option 1: repo_file = "/Workspace/Users/u1@org.com/csv files/f1.csv" tmp_file_na...

Latest Reply: MujtabaNoori (New Contributor II)

Hi @Dev, generally Spark reader APIs point to DBFS by default. To read a file from the user workspace, we need to prepend 'file:/' to the path. Thanks
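
A minimal sketch in a notebook, reusing the path from the original post and assuming the CSV has a header row (spark is the notebook's built-in session):

    # 'file:/' tells the reader to use the workspace filesystem instead of DBFS
    repo_file = "file:/Workspace/Users/u1@org.com/csv files/f1.csv"
    df = spark.read.option("header", "true").csv(repo_file)
    display(df)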

  • 5 kudos
5 More Replies
by AH (New Contributor III)
  • 820 Views
  • 3 replies
  • 0 kudos

Databricks Genie AI Next Level Q&A

Hi team, could you please answer these three questions to onboard Genie Space for analytics? Do you know if we can use Genie Space in our web application through an API or SDK? Is there any way to manage access control other than Unity Catalog? Can we ad...

Latest Reply: jennie258fitz (New Contributor III)

@AH wrote: Hi team, could you please answer these three questions to onboard Genie Space for analytics? Do you know if we can use Genie Space in our web application through an API or SDK? Is there any way to manage access control other than the Uni...

  • 0 kudos
2 More Replies
by avnish26 (New Contributor III)
  • 11089 Views
  • 5 replies
  • 8 kudos

Spark 3.3.0 Kafka connection problem

I am trying to connect to my Kafka from Spark but am getting an error. Kafka version: 2.4.1; Spark version: 3.3.0. I am using a Jupyter notebook to execute the PySpark code below: from pyspark.sql.functions import *; from pyspark.sql.types import *; # import libr...

Latest Reply: jose_gonzalez (Databricks Employee)

Hi @avnish26, did you add the JAR files to the cluster? Do you still have issues? Please let us know.
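
For anyone landing here, a hedged sketch of pulling the Kafka connector in via spark.jars.packages instead of hand-managing JARs; the coordinate matches Spark 3.3.0 with Scala 2.12, and the broker and topic names are hypothetical:

    from pyspark.sql import SparkSession

    # Kafka connector fetched from Maven; the version must match the Spark build
    spark = (
        SparkSession.builder
        .appName("kafka-smoke-test")
        .config("spark.jars.packages",
                "org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.0")
        .getOrCreate()
    )

    df = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")  # hypothetical broker
        .option("subscribe", "my_topic")                      # hypothetical topic
        .load()
    )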

  • 8 kudos
4 More Replies
by Erik (Valued Contributor III)
  • 346 Views
  • 3 replies
  • 2 kudos

Managing streaming checkpoints with Unity Catalog

This is partly a question, partly a feature request: how do you handle streaming checkpoints in combination with Unity Catalog managed tables? It seems like the only way is to create a volume and manually specify paths in it as streaming checkpo...

Latest Reply: cgrant (Databricks Employee)

For Structured Streaming applications this would be a nice feature. Delta Live Tables manages checkpoints for you out of the box; you don't even have to reason about checkpoints at all. Would recommend checking it out!
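
A minimal sketch of the volume-based pattern the question describes, with hypothetical catalog, schema, volume, and table names:

    # Source and sink are UC managed tables; the checkpoint lives in a UC volume
    stream = spark.readStream.table("main.bronze.raw_events")

    (
        stream.writeStream
        .option("checkpointLocation",
                "/Volumes/main/bronze/checkpoints/events_stream")
        .trigger(availableNow=True)
        .toTable("main.silver.events")
    )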

  • 2 kudos
2 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group