Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Long_Tran
by New Contributor
  • 1114 Views
  • 1 reply
  • 0 kudos

Can job 'run_as' be assigned to users/principals who actually run it?

Can the job 'run_as' be assigned to the users/principals who actually run it, instead of always a fixed creator/user/principal? When a job is run, I would like to see in the job setting "run_as" the name of the actual user/principal who runs it. Currently, "run...

Latest Reply
Wojciech_BUK
Valued Contributor III
  • 0 kudos

This is not available in Workflows/Jobs. A job should never be run as the person who is executing it, especially in Production. The reason is that the output might not be the same, depending on the person who is running the job (e.g. different Row Level Access). If...
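For reference, a minimal sketch (not from the reply; the host/token variables, job ID and service principal are hypothetical) of how the fixed run_as identity is set on a job through the Jobs API 2.1, which is the currently supported mechanism rather than inheriting whoever triggers the run:

import os
import requests

# Assumption: DATABRICKS_HOST / DATABRICKS_TOKEN belong to an identity allowed to edit the job.
host = os.environ["DATABRICKS_HOST"].rstrip("/")
token = os.environ["DATABRICKS_TOKEN"]

resp = requests.post(
    f"{host}/api/2.1/jobs/update",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "job_id": 123,  # hypothetical job ID
        "new_settings": {
            # run_as is a fixed job setting: a named user or a service principal
            "run_as": {"service_principal_name": "00000000-0000-0000-0000-000000000000"}
        },
    },
)
resp.raise_for_status()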

esauesp_co
by New Contributor III
  • 4308 Views
  • 5 replies
  • 1 kudos

Resolved! My jobs and cluster were deleted in a suspicious way

I want to know what happened to my cluster and whether I can recover it. I logged in to my Databricks account and couldn't find my jobs or my cluster. I couldn't find any log of the deleted cluster because the log lives inside the cluster interface. I entered t...

Latest Reply
Sid_databricks
New Contributor II
  • 1 kudos

Dear folks, when the table has been deleted, why am I unable to create a table with the same name? It continuously gives me the error "DeltaAnalysisException: Cannot create table ('`spark_catalog`.`default`.`Customer_Data`'). The associated location ('...
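For illustration, a minimal sketch of the usual fix (the location below is an assumption, not the poster's actual path): the error appears because files from the dropped table are still sitting at its old storage location, so that location has to be emptied (or a different one used) before the table can be recreated:

# Drop the stale metadata entry, then clear the leftover files at the managed location.
spark.sql("DROP TABLE IF EXISTS spark_catalog.default.Customer_Data")

table_location = "dbfs:/user/hive/warehouse/customer_data"  # hypothetical location
dbutils.fs.rm(table_location, True)  # recursive delete of the orphaned files

spark.sql("""
    CREATE TABLE spark_catalog.default.Customer_Data (id INT, name STRING)
    USING DELTA
""")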

4 More Replies
MattPython
by New Contributor
  • 17518 Views
  • 4 replies
  • 0 kudos

How do you read files from the DBFS with OS and Pandas Python libraries?

I created translations for decoded values and want to save the dictionary object to the DBFS for mapping. However, I am unable to access the DBFS without using dbutils or the PySpark library. Is there a way to access the DBFS with the OS and Pandas Python libra...

Latest Reply
User16789202230
New Contributor II
  • 0 kudos

db_path = 'file:///Workspace/Users/l<xxxxx>@databricks.com/TITANIC_DEMO/tested.csv'
df = spark.read.csv(db_path, header="True", inferSchema="True")
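To address the original question directly, a minimal sketch (paths are hypothetical) using the /dbfs FUSE mount, which exposes DBFS on the driver's local filesystem so plain os and pandas calls work without dbutils or PySpark (note the mount is not available on every cluster type):

import os
import pandas as pd

dbfs_dir = "/dbfs/FileStore/tables"            # hypothetical DBFS directory
print(os.listdir(dbfs_dir))                    # list files with the os module

df = pd.read_csv(os.path.join(dbfs_dir, "tested.csv"))              # read with pandas
df.to_csv(os.path.join(dbfs_dir, "tested_copy.csv"), index=False)   # write back to DBFS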

3 More Replies
SimonXu
by New Contributor II
  • 8222 Views
  • 6 replies
  • 15 kudos

Resolved! Failed to launch pipeline cluster

Hi there. I encountered an issue when I was trying to create my Delta Live Tables pipeline. The error is "DataPlaneException: Failed to launch pipeline cluster 1202-031220-urn0toj0: Could not launch cluster due to cloud provider failures. azure_error...

cluster failed to start
usage and quota
Latest Reply
arpit
Valued Contributor
  • 15 kudos

@Simon Xu I suspect that DLT is trying to grab some machine types that you simply have zero quota for in your Azure account. By default, the following machine types get requested behind the scenes for DLT: AWS: c5.2xlarge, Azure: Standard_F8s, GCP: e2-standard-8...
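As an illustration, a minimal sketch (the node type and autoscale values are assumptions) of overriding the pipeline cluster in the DLT pipeline settings so it requests a VM family your Azure subscription actually has quota for; the dict below mirrors the "clusters" block of the pipeline's JSON settings:

# Paste the equivalent JSON into the pipeline's settings (UI "JSON" view or Pipelines API).
pipeline_cluster_override = {
    "clusters": [
        {
            "label": "default",
            "node_type_id": "Standard_DS3_v2",   # hypothetical type with available quota
            "autoscale": {"min_workers": 1, "max_workers": 3},
        }
    ]
}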

5 More Replies
shagun
by New Contributor III
  • 4135 Views
  • 3 replies
  • 0 kudos

Resolved! Delta live tables target schema

The first time I run my Delta Live Tables pipeline after setup, I get this error on starting it: org.apache.spark.sql.catalyst.parser.ParseException: Possibly unquoted identifier my-schema-name detected. Please con...

Latest Reply
BenTendo
New Contributor II
  • 0 kudos

This still errors on internal Databricks Spark/Python code like deltaTable.history().
@shagun wrote: The first time I run my Delta Live Tables pipeline after setup, I get this error on starting it: org.apache.spark.sql...
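For illustration, a minimal sketch (schema and table names are hypothetical) of the usual workaround for this ParseException: identifiers containing hyphens have to be backtick-quoted wherever they appear in Spark SQL, or the hyphen avoided altogether:

# Hyphenated identifiers must be quoted with backticks in Spark SQL.
spark.sql("SELECT * FROM `my-schema-name`.my_table").show()

# If the metastore allows a choice, the safer option is underscores, which need no quoting.
spark.sql("SELECT * FROM my_schema_name.my_table").show()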

2 More Replies
ashdam
by New Contributor III
  • 3744 Views
  • 2 replies
  • 2 kudos

Resolved! Databricks asset bundles: is it possible to use a cluster depending on the target (environment)?

Here is my bundle definition:
# This is a Databricks asset bundle definition for my_project.
# See https://docs.databricks.com/dev-tools/bundles/index.html for documentation.
experimental:
  python_wheel_wrapper: true
bundle:
  name: my_project
inc...

Data Engineering
Databricks Asset Bundles
Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @ashdam, certainly! It seems you’re dealing with a scenario where you want to use a specific cluster based on a defined target. Let’s explore some options: Conditional Cluster Selection: You can conditionally select a cluster based on the targe...

1 More Replies
Nastasia
by New Contributor II
  • 3492 Views
  • 3 replies
  • 1 kudos

Why is Spark creating multiple jobs for one action?

I noticed that when launching this bunch of code with only one action, I have three jobs that are launched.
from pyspark.sql import DataFrame
from pyspark.sql.types import StructType, StructField, StringType
from pyspark.sql.functions import avg
data:...

Latest Reply
RKNutalapati
Valued Contributor
  • 1 kudos

The above code will create two jobs.
JOB-1: dataframe: DataFrame = spark.createDataFrame(data=data, schema=schema)
The createDataFrame function is responsible for inferring the schema from the provided data or using the specified schema. Depending on the...
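A minimal sketch (not the poster's exact code) that reproduces the effect: even with a single visible action, the Spark UI can list more than one job, because createDataFrame on a local collection may schedule work of its own before the final action runs:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType
from pyspark.sql.functions import avg

data = [("a", 1), ("b", 2), ("a", 3)]
schema = StructType([
    StructField("key", StringType()),
    StructField("value", IntegerType()),
])

df = spark.createDataFrame(data=data, schema=schema)  # may register its own job
df.groupBy("key").agg(avg("value")).show()            # the single explicit action
# The Jobs tab in the Spark UI then shows one entry per internally scheduled job.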

2 More Replies
epps
by New Contributor
  • 1253 Views
  • 0 replies
  • 0 kudos

400 Unable to load OAuth Config

I've enabled SSO for my Databricks account with Okta as the identity provider and tested that the integration is working. I'm now trying to implement an on-behalf-of token exchange so that my API can make authenticated requests to Databricks's API (e.g. ) ...

bathulaj
by New Contributor
  • 488 Views
  • 0 replies
  • 0 kudos

Nondeterministic UDFs making multiple invocations

Hi, even after defining my UDFs as nondeterministic like here:
testUDF = udf(testMthod, StringType()).asNondeterministic()
it is still making multiple invocations. Is there anything that I am missing here? TIA, -John B
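For illustration, a minimal sketch (function and column names are hypothetical): asNondeterministic() only blocks certain optimizer rewrites, so if the UDF column is referenced by several downstream expressions Spark can still evaluate it more than once; materializing the intermediate DataFrame is a common workaround:

from pyspark.sql.functions import udf, col
from pyspark.sql.types import StringType

def test_method(value):
    return str(value).upper()            # stand-in for the real logic

test_udf = udf(test_method, StringType()).asNondeterministic()

df = spark.createDataFrame([(1,), (2,)], ["value"])
enriched = df.withColumn("label", test_udf(col("value"))).cache()  # evaluate once, keep result
enriched.count()                                                   # force materialization
result = enriched.filter(col("label") != "X").select("label", "value")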

famous_jt33
by New Contributor
  • 1254 Views
  • 2 replies
  • 2 kudos

SQL UDFs for DLT pipelines

I am trying to implement a UDF for a DLT pipeline. I have seen the documentation stating that it is possible but I am getting an error after adding an SQL UDF to a cell in the notebook attached to the pipeline. The aim is to have the UDF in a separat...

Latest Reply
6502
New Contributor III
  • 2 kudos

You can't. The SQL support on a DLT pipeline cluster is limited compared to a normal notebook. You can still define a UDF in Python using, of course, a Python notebook. In this case, you can use the spark.sql() function to execute your original SQL cod...
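Following that suggestion, a minimal sketch (table, column and UDF names are hypothetical) of defining the UDF in a Python notebook attached to the pipeline, registering it, and calling it from the original SQL through spark.sql():

import dlt
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

@udf(StringType())
def normalize_code(value):
    return value.strip().upper() if value else None

spark.udf.register("normalize_code", normalize_code)  # expose the UDF to SQL

@dlt.table
def cleaned_orders():
    # "bronze.raw_orders" is a hypothetical source table
    return spark.sql("SELECT normalize_code(code) AS code, amount FROM bronze.raw_orders")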

1 More Replies
sher
by Valued Contributor II
  • 3986 Views
  • 4 replies
  • 3 kudos

How do we use Delta Sharing between Databricks and Snowflake?

Hi all, is there any way to implement Delta Sharing from Databricks to Snowflake as a direct connection?

Latest Reply
NateAnth
Valued Contributor
  • 3 kudos

I don't think that Snowflake has implemented the ability to read from a table via Delta Sharing as of December 2023. Please reach out to your Snowflake representatives and urge them to consider this feature from their side.  Alternatively, you can qu...

3 More Replies
PrasSabb_97245
by New Contributor II
  • 3855 Views
  • 2 replies
  • 0 kudos

AWS S3 External Location Size in Unity Catalog

Hi, I am trying to get the raw size (total size) of a Delta table. I can get the Delta table size from the DeltaTable API, but that gives only the latest version's size. I need to find the actual size the table takes on S3. Is there any way to find the S3 size ...

Latest Reply
PrasSabb_97245
New Contributor II
  • 0 kudos

Hi Kaniz, thank you for your suggestions. As per my understanding, "snapshot.sizeInBytes" gives only the current snapshot size. But I am looking for the total size (all versions) of the table on S3.
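For illustration, a minimal sketch (the table location is an assumption) of approximating the full S3 footprint, including files retained for older versions, by recursively summing object sizes under the table's storage location:

def total_size_bytes(path):
    """Recursively sum the size of every object under a storage path."""
    size = 0
    for entry in dbutils.fs.ls(path):
        if entry.isDir() and entry.path != path:
            size += total_size_bytes(entry.path)
        else:
            size += entry.size
    return size

table_location = "s3://my-bucket/path/to/delta_table"  # hypothetical external location
print(round(total_size_bytes(table_location) / 1024**3, 2), "GiB")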

1 More Replies
erigaud
by Honored Contributor
  • 2359 Views
  • 4 replies
  • 0 kudos

The operation CHANGE DATA FEED is not allowed on Streaming Tables.

Hello everyone, I have a workflow that starts by reading the CDF data for a change data feed. The syntax is exactly the following:
(spark.readStream
  .format("delta")
  .option("readChangeFeed", "true")
  .option("startingVersion", 10)
  .table("my.str...

Latest Reply
afk
New Contributor III
  • 0 kudos

Hi, this seems to be related to the issue I've been getting around the same time here: Change data feed from target tables of APPLY CHANG... - Databricks - 54436. It would be great to get an explanation for the sudden change in behaviour.

3 More Replies
Jules
by New Contributor
  • 562 Views
  • 0 replies
  • 0 kudos

Access from DBT job to Azure DevOps repository using Service Principal

Hi, we are using Databricks bundles to deploy our DBT project. Everything is set up to deploy and run as a Service Principal. The DBT job is connected to an Azure DevOps repository. The problem is that we cannot find a way to properly authenticate the ...

Data Engineering
azure devops
bundles
dbt
NLearn
by New Contributor II
  • 852 Views
  • 2 replies
  • 0 kudos

Save default language of notebook into variable dynamically

For one of the project requirements, I want to save the default language of a notebook into a variable, based on a notebook path specified dynamically. For example, if the first notebook given by the user in a widget has Python as its default language, then the variable value...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @NLearn, To change the default language of a notebook in Databricks, you can select File -> Change default cell language. This will affect all the cells in the notebook that use the same language as the default one. You can also use magic commands...
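To get the language programmatically rather than via the UI, a minimal sketch (the notebook path is hypothetical) using the Workspace API through the databricks-sdk, whose get_status call returns the notebook's language:

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # assumes workspace authentication is already configured
notebook_path = "/Users/someone@example.com/first_notebook"  # hypothetical path

status = w.workspace.get_status(notebook_path)
default_language = status.language   # e.g. Language.PYTHON for a Python notebook
print(default_language)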

1 More Replies
