Data Engineering

Forum Posts

stevenayers-bge
by New Contributor II
  • 19 Views
  • 1 replies
  • 1 kudos

Bug with enabling UniForm Data Format?

In the documentation for enabling Iceberg compatibility on Delta tables, it states that the minReaderVersion for IcebergCompatV1 and IcebergCompatV2 is 2 (https://docs.databricks.com/en/delta/uniform.html#requirements). However, when you run the REORG...

Latest Reply
daniel_sahal
Esteemed Contributor

@stevenayers-bge I've just checked the source code of Delta and you're right - the documentation states that minReaderVersion should be >=2, but the source code upgrades it to 3: https://github.com/delta-io/delta/blob/78970abd96dfc0278e21c04cda442bb05ccde4...
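For anyone reproducing this, a minimal sketch (the table name is a placeholder, and the property set assumes the documented UniForm setup): enable Iceberg compatibility, then check which protocol version the table actually ends up on.

# Sketch only: run in a Databricks notebook where `spark` exists; table name is a placeholder.
spark.sql("""
  ALTER TABLE main.default.my_table SET TBLPROPERTIES (
    'delta.enableIcebergCompatV2' = 'true',
    'delta.universalFormat.enabledFormats' = 'iceberg'
  )
""")

# DESCRIBE DETAIL reports the protocol versions, so you can see whether
# minReaderVersion was bumped to 3 rather than the documented 2.
spark.sql("DESCRIBE DETAIL main.default.my_table") \
     .select("minReaderVersion", "minWriterVersion") \
     .show()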

angel_ba
by New Contributor II
  • 38 Views
  • 1 replies
  • 0 kudos

unity catalog system.access.audit lag

Hello, we have a Unity Catalog enabled workspace. To get the completion time of a pipeline that runs multiple times a day, I am checking the system.access.audit table. Comparing the completion time of the pipeline to the other pipelines' times, I am creat...

Latest Reply
daniel_sahal
Esteemed Contributor

@angel_ba System tables are still in public preview, so there are some limitations, and one of them is a blocker for your use case. There is currently no support for real-time monitoring; data is updated throughout the day. If you don’t see a log for a recent eve...
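For reference, a sketch of the kind of lookup involved (column names follow the documented system.access.audit schema; the event filter is an assumption and may need adjusting for the specific pipeline events being tracked):

# Sketch only: the filter window is illustrative; audit rows can lag, as noted above.
spark.sql("""
  SELECT event_time, service_name, action_name, request_params
  FROM system.access.audit
  WHERE event_date >= current_date() - INTERVAL 1 DAY
  ORDER BY event_time DESC
""").show(truncate=False)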

Hubert-Dudek
by Esteemed Contributor III
  • 51 Views
  • 0 replies
  • 0 kudos

How much USD are you spending on Databricks?

Join two system tables and get exactly how much USD you are spending. The short version of the query: SELECT u.usage_date, u.sku_name, SUM(u.usage_quantity * p.pricing.default) AS total_spent, p.currency_code FROM system.billing....
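The full query is truncated above, so here is a hedged reconstruction of that kind of join, based on the documented system.billing.usage and system.billing.list_prices schemas (the price-validity join condition is an assumption):

# Sketch only: the validity-window condition on list_prices is an assumption.
spark.sql("""
  SELECT u.usage_date,
         u.sku_name,
         SUM(u.usage_quantity * p.pricing.default) AS total_spent,
         p.currency_code
  FROM system.billing.usage u
  JOIN system.billing.list_prices p
    ON u.sku_name = p.sku_name
   AND u.usage_end_time >= p.price_start_time
   AND (p.price_end_time IS NULL OR u.usage_end_time < p.price_end_time)
  GROUP BY u.usage_date, u.sku_name, p.currency_code
  ORDER BY u.usage_date
""").show()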

(attached image: system_pig.png)
John_Rotenstein
by New Contributor II
  • 3430 Views
  • 3 replies
  • 2 kudos

Retrieve job-level parameters in Python

Parameters can be passed to Tasks and the values can be retrieved with: dbutils.widgets.get("parameter_name"). More recently, we have been given the ability to add parameters to Jobs. However, the parameters cannot be retrieved like Task parameters. Quest...

Latest Reply
cbern
Visitor

@Kaniz This method works for Task parameters. Is there a way to access Job parameters that apply to the entire workflow, set under a heading like this in the UI: I am able to read Job parameters in a different way from Task parameters, using dynamic v...
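For context, a minimal sketch of the dynamic-value-reference route mentioned here (parameter names are placeholders): map a task parameter to {{job.parameters.<name>}} in the task configuration, and the notebook can then read it like any task parameter.

# Sketch only: assumes a task parameter "my_param" is configured in the job's
# task settings with the value {{job.parameters.my_param}}.
my_param = dbutils.widgets.get("my_param")
print(f"job-level parameter: {my_param}")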

2 More Replies
sasi2
by New Contributor II
  • 174 Views
  • 0 replies
  • 0 kudos

Connecting to MuleSoft from Databricks

Hi, is there any connectivity pipeline already established to access MuleSoft or Anypoint Exchange data using Databricks? I have seen many options for accessing Databricks data in MuleSoft, but can we read data from MuleSoft into Databricks? Please gi...
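There is no built-in MuleSoft connector I can confirm here; as a sketch under the assumption that the Anypoint side exposes the data over a REST API, a generic pull into Databricks could look like this (endpoint, token, response shape, and table name are all placeholders):

# Sketch only: endpoint, auth, and response shape are assumptions.
import requests

resp = requests.get(
    "https://<anypoint-host>/api/records",          # placeholder endpoint
    headers={"Authorization": "Bearer <token>"},    # placeholder credentials
    timeout=30,
)
resp.raise_for_status()

# Assumes the endpoint returns a JSON array of flat records.
df = spark.createDataFrame(resp.json())
df.write.mode("append").saveAsTable("main.default.mulesoft_records")  # placeholder table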

jenshumrich
by New Contributor III
  • 221 Views
  • 2 replies
  • 0 kudos

Filter not using partition

I have the following code:
spark.sparkContext.setCheckpointDir("dbfs:/mnt/lifestrategy-blob/checkpoints")
result_df.repartitionByRange(200, "IdStation")
result_df_checked = result_df.checkpoint(eager=True)
unique_stations = result_df.select("IdStation...

Latest Reply
jenshumrich
New Contributor III

Thanks a lot for your response. It seems the Filter is not pushed down, no?
station_df.explain()
== Physical Plan ==
*(1) Filter (isnotnull(IdStation#2678) AND (IdStation#2678 = 1119844))
+- *(1) Scan ExistingRDD[Date#2718,WindSpeed#2675,Tower_Accele...
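That plan is consistent with the checkpoint: once a DataFrame is checkpointed, it is re-read as an ExistingRDD, so there is no source or partition metadata left for the filter to be pushed into. A hedged sketch of the workaround, reusing the names from the post:

# Sketch only: filter while the source can still prune, then checkpoint the
# already-reduced data (or skip the checkpoint and write a partitioned Delta table instead).
station_df = (
    result_df
    .filter("IdStation = 1119844")   # prune before checkpointing, while pushdown is still possible
    .checkpoint(eager=True)
)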

1 More Replies
israelst
by New Contributor II
  • 297 Views
  • 2 replies
  • 0 kudos

DLT can't authenticate with kinesis using instance profile

When running my notebook on personal compute with an instance profile, I am indeed able to readStream from Kinesis. But adding it as a DLT with UC, while specifying the same instance profile in the DLT pipeline settings, causes a "MissingAuthenticatio...
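For context, the source in question looks roughly like the sketch below (stream name, region, and role ARN are placeholders; the roleArn option is just one alternative credential path to try, not a confirmed fix):

import dlt

@dlt.table(name="kinesis_raw")
def kinesis_raw():
    # Sketch only: whether instance-profile credentials are honored in a
    # UC-enabled DLT pipeline is exactly the open question here.
    return (
        spark.readStream.format("kinesis")
        .option("streamName", "<stream-name>")                        # placeholder
        .option("region", "eu-west-1")                                # placeholder
        .option("roleArn", "arn:aws:iam::<account-id>:role/<role>")   # placeholder; optional alternative to the instance profile
        .load()
    )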

Data Engineering
Delta Live Tables
Unity Catalog
Latest Reply
Mathias_Peters
New Contributor II

Hi, were you able to solve this problem? If so, what was the solution?

1 More Replies
nikhilkumawat
by New Contributor III
  • 4876 Views
  • 6 replies
  • 3 kudos

Resolved! Get file information while using "Trigger jobs when new files arrive" https://docs.databricks.com/workflows/jobs/file-arrival-triggers.html

I am currently trying to use the "Trigger jobs when new files arrive" feature in one of my projects. I have an S3 bucket in which files arrive on random days, so I created a job and set the trigger to the "file arrival" type. And within the no...

Latest Reply
adriennn
New Contributor III

Looks like a major oversight not to be able to get the information on what file(s) have triggered the job. Anyway, the above explanations given by Anon read like the replies of ChatGPT, especially the scenario where a dataframe is passed to a trigger...
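One hedged workaround sketch, since the trigger itself does not pass the triggering file to the run: let Auto Loader discover whatever is new under the monitored path when the job fires (paths, format, and table name are placeholders):

# Sketch only: processes the files that arrived since the last run, then stops.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")                                 # placeholder format
    .option("cloudFiles.schemaLocation", "s3://my-bucket/_schemas/in")   # placeholder
    .load("s3://my-bucket/incoming/")                                    # the monitored path
    # On recent runtimes, _metadata.file_path records which file each row came from.
    .selectExpr("*", "_metadata.file_path AS source_file")
)

(df.writeStream
   .option("checkpointLocation", "s3://my-bucket/_checkpoints/in")       # placeholder
   .trigger(availableNow=True)
   .toTable("main.default.arrivals_raw"))                                # placeholder table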

5 More Replies
BerkerKozan
by New Contributor III
  • 38 Views
  • 0 replies
  • 0 kudos

Using AAD Spn on AWS Databricks

I use AWS Databricks, which has an SSO & SCIM integration with AAD. I generated an SPN in AAD, synced it to Databricks, and want to use this SPN with AAD client secrets to use the Databricks SDK, but it doesn't work. I don't want to generate another tok...
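Not a confirmation that AAD client secrets work against AWS workspaces; as an assumed alternative, the sketch below uses the Databricks SDK's own OAuth machine-to-machine flow with a Databricks-managed service principal secret (host and credentials are placeholders):

# Sketch only: this is Databricks-managed OAuth (M2M), not the AAD-secret route described above.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient(
    host="https://<workspace>.cloud.databricks.com",
    client_id="<service-principal-application-id>",
    client_secret="<databricks-oauth-secret>",
)
print([c.cluster_name for c in w.clusters.list()])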

Anske
by New Contributor II
  • 43 Views
  • 0 replies
  • 0 kudos

DLT apply_changes applies only deletes and inserts not updates

Hi, I have a DLT pipeline that applies changes from a source table (cdctest_cdc_enriched) to a target table (cdctest) with the following code:
dlt.apply_changes(
    target = "cdctest",
    source = "cdctest_cdc_enriched",
    keys = ["ID"],
    sequence_by...
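The call is truncated above; a hedged reconstruction of a complete apply_changes flow is shown below (the sequence_by column and the delete condition are assumptions, named only for illustration):

import dlt
from pyspark.sql.functions import col, expr

dlt.create_streaming_table("cdctest")

dlt.apply_changes(
    target = "cdctest",
    source = "cdctest_cdc_enriched",
    keys = ["ID"],
    sequence_by = col("tran_begin_time"),        # assumed ordering column
    apply_as_deletes = expr("operation = 1"),    # assumed delete marker in the CDC feed
    except_column_list = ["operation", "tran_begin_time"],
)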

Data Engineering
Delta Live Tables
zahra_Khedri
by Visitor
  • 80 Views
  • 1 replies
  • 0 kudos

An error occurred when loading Jobs and Workflows App.

Hi, I was trying to open Workflows, but there is an error: "An error occurred when loading Jobs and Workflows App." We need help understanding why this happened and how we can resolve it, please.

(attached screenshot: Screenshot 2024-04-25 at 11.31.53.png)
Latest Reply
GeoPer
Visitor

Same... and the weirdest thing is that all of the services look healthy on https://status.databricks.com/. Region: eu-central-1, Provider: AWS. Could anyone provide some info here?

stepysamud
by Visitor
  • 79 Views
  • 0 replies
  • 0 kudos

Workflow UI broken after creating job via the api

Hi all, I'm in the process of migrating from Databricks Azure to Databricks AWS. One part of this is migrating all our workflows, which I wanted to do via the /api/2.1/jobs/create API with the workflow passed via the JSON body. I have successfully created...
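For reference, a minimal sketch of that create call (the real workflow JSON being migrated is not shown in the post, so the payload, host, token, and cluster ID here are placeholders):

# Sketch only: placeholder payload against the documented Jobs 2.1 create endpoint.
import requests

resp = requests.post(
    "https://<workspace>.cloud.databricks.com/api/2.1/jobs/create",
    headers={"Authorization": "Bearer <token>"},
    json={
        "name": "migrated-workflow",
        "tasks": [{
            "task_key": "main",
            "notebook_task": {"notebook_path": "/Workspace/Shared/example"},
            "existing_cluster_id": "<cluster-id>",
        }],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())   # returns {"job_id": ...} on success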

(attached screenshot: stepysamud_0-1714037158355.png)
madrhr
by New Contributor
  • 91 Views
  • 2 replies
  • 1 kudos

SparkContext lost when running %sh script.py

I need to execute a .py file in Databricks from a notebook (with arguments, which for simplicity I exclude here). For this I am using:
%sh script.py
script.py:
from pyspark import SparkContext

def main():
    sc = SparkContext.getOrCreate()
    print(sc...

Data Engineering
%sh
.py
bash shell
SparkContext
SparkShell
Latest Reply
Yeshwanth
Contributor III

@madrhr I think this occurs because one session is initiated within the Python script (.py file), while in the Databricks notebook, we have a pre-configured Spark session. It is important to note that we cannot use more than one Spark session per not...
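A hedged sketch of the usual alternative, given that %sh launches a separate OS process with no connection to the notebook's driver: keep the logic in the .py file, but import it and hand it the notebook's existing session (file layout and names are placeholders):

# script.py (kept alongside the notebook, e.g. in a Repo) -- placeholder contents:
#
#   from pyspark.sql import SparkSession
#
#   def main(spark: SparkSession, station_id: int) -> None:
#       spark.range(10).show()
#
# Notebook cell: reuse the notebook's pre-configured session instead of
# creating a second one in a subprocess.
import script
script.main(spark, 1119844)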

1 More Replies