Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

nielsehlers
by New Contributor
  • 518 Views
  • 1 reply
  • 1 kudos

from_utc_timestamp gives strange results

I don't understand why from_utc_timestamp(col("original_time"), "Europe/Berlin") changes the timestamp instead of just setting the timezone. That's non-intuitive behaviour. spark.conf.set("spark.sql.session.timeZone", "UTC") from pyspark.sql import Row...

Latest Reply
Advika
Databricks Employee
  • 1 kudos

Hello @nielsehlers! Just to clarify, PySpark's from_utc_timestamp converts a UTC timestamp to the specified timezone (in this case it's Europe/Berlin), adjusting the actual timestamp value rather than just setting timezone metadata. This happens beca...
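
For reference, a minimal sketch of this behaviour (sample timestamp assumed; `spark` is the notebook's predefined session):

from pyspark.sql import Row
from pyspark.sql.functions import col, from_utc_timestamp

spark.conf.set("spark.sql.session.timeZone", "UTC")

df = (spark.createDataFrame([Row(original_time="2024-01-15 12:00:00")])
      .withColumn("original_time", col("original_time").cast("timestamp")))

# from_utc_timestamp interprets the input as UTC and returns the Berlin
# wall-clock time, so the value shifts by one hour (UTC+1 in winter):
# 2024-01-15 12:00:00 -> 2024-01-15 13:00:00. Spark timestamps carry no
# timezone metadata, so there is nothing to "set"; conversion must change
# the value itself.
df.select(from_utc_timestamp(col("original_time"), "Europe/Berlin")
          .alias("berlin_time")).show()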

al_rammos
by New Contributor II
  • 646 Views
  • 2 replies
  • 0 kudos

DROP VIEW IF EXISTS Failing on Dynamically Generated Temporary View in Databricks 15.4 LTS

Hello everyone, I'm experiencing a very strange issue with temporary views in Databricks 15.4 LTS that did not occur in 13.3. I have a workflow where I create a temporary view, run a query against it, and then drop it using a DROP VIEW IF EXISTS comma...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @al_rammos, Thanks for your detailed comments and replication of the issue. There have been known issues in recent DBR versions where dynamically created temporary views are not being properly resolved during certain operations due to incorrect sess...
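
For anyone reproducing this, a minimal sketch of the pattern in question (view name and query are hypothetical; `spark` is the notebook session):

from uuid import uuid4

# Hypothetical dynamically generated view name, as in the reported workflow.
view_name = f"tmp_view_{uuid4().hex}"

spark.range(10).createOrReplaceTempView(view_name)
row_count = spark.sql(f"SELECT COUNT(*) AS n FROM {view_name}").first()["n"]

# This is the statement reported to fail on DBR 15.4 even though the same
# sequence ran cleanly on 13.3.
spark.sql(f"DROP VIEW IF EXISTS {view_name}")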

1 More Replies
Volker
by Contributor
  • 295 Views
  • 4 replies
  • 1 kudos

Retention Period for Parquet Data in e.g. S3 After Dropping a Managed Delta Table

Hey community, I have a question regarding the data retention policy for managed Delta tables stored in, e.g., Amazon S3. Specifically: When a managed Delta table is dropped, what is the retention period for the underlying Parquet data files in S3 befor...

Latest Reply
Volker
Contributor
  • 1 kudos

Thanks for the resources! So, to adjust how long Parquet files are stored in the S3 bucket after I drop a table, I would need to adjust the delta.logRetentionDuration, right? And since dropping a Delta table marks the files for deletion after 7 days, I...
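
For reference, a sketch of setting that property (table name and interval are hypothetical; whether it also governs the post-DROP cleanup window is exactly the open question in this thread):

# Hypothetical managed table. delta.logRetentionDuration controls how long
# Delta transaction log history is kept, set as a table property.
spark.sql("""
    ALTER TABLE main.sales.orders
    SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 30 days')
""")

# Verify the property took effect.
spark.sql("SHOW TBLPROPERTIES main.sales.orders").show(truncate=False)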

3 More Replies
sandy311
by New Contributor III
  • 3115 Views
  • 5 replies
  • 5 kudos

If-else conditions in Databricks Asset Bundles

Can I use if-else conditions in databricks.yml and parameterize my asset bundles similarly to Azure Pipelines YAML?

Latest Reply
davidcardoner
New Contributor II
  • 5 kudos

Can we define a task conditionally, using if-else logic based on a variable passed at bundle deploy time?

4 More Replies
CashyMcSmashy
by New Contributor II
  • 317 Views
  • 2 replies
  • 0 kudos

Databricks Asset Bundles Firewall Issue

Hello, I'm trying to use Databricks Asset Bundles within a network that has limited access to the internet. When I try to deploy I get the error message "error downloading Terraform: Get "https://releases.hashicorp.com/terraform/1.5.5/index.json". Is...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @CashyMcSmashy, You might want to whitelist the URLs below:
https://releases.hashicorp.com/terraform/1.5.5/index.json
https://registry.terraform.io/.well-known/terraform.json

1 More Replies
Oliver_Angelil
by Valued Contributor II
  • 11280 Views
  • 10 replies
  • 1 kudos

How to use the git CLI in Databricks?

After making some changes in my feature branch, I have committed and pushed (to Azure Devops) some work (note I have not yet raised a PR or merge to any other branch). Many of the files I committed are data files and so I would like to reverse the co...

Latest Reply
turagittech
New Contributor III
  • 1 kudos

I would love to get an update on this. Git commands would be outstanding in some form. I have the same issue: I have changed directory to the workspace. ls shows the files in the repository, but git status fails. -rwxrwxrwx 1 root root 2386 Feb 13 01:1...

9 More Replies
Katalin555
by New Contributor II
  • 252 Views
  • 1 reply
  • 0 kudos

Found a potential bug in Job Details/Schedule and Trigger section

One of our jobs is scheduled to run at 4:30 AM based on the GMT+1 timezone, which is visible if we click on Edit trigger (Picture 1), but under job details it is shown as if it were scheduled to run at 4:30 AM UTC time (Picture 2). Based on previous runs ...

[Three screenshots attached]
Latest Reply
Isi
Contributor
  • 0 kudos

Hey @Katalin555, Even though in the "Edit Trigger" panel (Picture 2) the time is shown in the local timezone (e.g. GMT+1), once the schedule is saved and viewed under job details (Picture 1), Databricks always displays it as UTC, without making it visual...
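
One way to confirm what is actually stored is to read the job settings via the API; a sketch with the Databricks Python SDK (the job ID is hypothetical):

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # picks up workspace authentication

# Hypothetical job ID. The stored schedule keeps the configured timezone,
# even though the job details page renders the time as UTC.
schedule = w.jobs.get(job_id=123456789).settings.schedule
print(schedule.quartz_cron_expression)  # e.g. "0 30 4 * * ?"
print(schedule.timezone_id)             # e.g. "Europe/Berlin"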

Guigui
by New Contributor II
  • 208 Views
  • 2 replies
  • 0 kudos

Package installation for a multi-task job

I have a job with the same task to be executed twice with two sets of parameters. Each task runs after cloning a git repo, then installing it locally and running a notebook from this repo. However, as each task clones the same repo, I was wonderi...

Latest Reply
Guigui
New Contributor II
  • 0 kudos

That's what I've done, but I find it less elegant than setting up an environment and sharing it across multiple tasks. It seems to be impossible (unless I build a wheel file, and I don't want to) as tasks do not share environments, but anyway, as they run in p...

1 More Replies
Eric_Kieft
by New Contributor III
  • 467 Views
  • 5 replies
  • 4 kudos

Centralized Location of Table History/Timestamps in Unity Catalog

Is there a centralized location in Unity Catalog that retains the table history, specifically the last timestamp, for managed Delta tables? DESCRIBE HISTORY will provide it for a specific table, but I would like to get it for a number of tables. inform...

Latest Reply
Priyanka_Biswas
Databricks Employee
  • 4 kudos

Hi @Eric_Kieft @noorbasha534  system.access.table_lineage includes a record for each read or write event on a Unity Catalog table or path. This includes but is not limited to job runs, notebook runs, and dashboards updated with the read or write even...
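
A sketch of the kind of query this enables (the catalog filter is hypothetical; system tables must be enabled on the workspace):

# Last write event per table, from the Unity Catalog lineage system table.
last_writes = spark.sql("""
    SELECT target_table_full_name,
           MAX(event_time) AS last_write_time
    FROM system.access.table_lineage
    WHERE target_table_full_name LIKE 'main.sales.%'  -- hypothetical filter
    GROUP BY target_table_full_name
""")
last_writes.show(truncate=False)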

4 More Replies
William_Scardua
by Valued Contributor
  • 471 Views
  • 1 reply
  • 1 kudos

Upsert from Databricks to CosmosDB

Hi guys, I'm adjusting a data upsert process from Databricks to CosmosDB using the .jar connector. As the load is very large, do you know if it's possible to change only the fields that have been modified? Best regards

Latest Reply
BigRoux
Databricks Employee
  • 1 kudos

Yes, you can update only the modified fields in your Cosmos DB documents from Databricks using the Partial Document Update feature (also known as Patch API). This is particularly useful for large documents where sending the entire document for update...
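
A hedged sketch of what a patch-style write can look like with the Azure Cosmos DB Spark connector (endpoint, container, secret scope, and column names are all hypothetical; option names follow the connector's documented patch settings, so verify them against your connector version):

# `updates_df` is assumed to hold only the document id / partition key plus
# the fields that actually changed.
cosmos_opts = {
    "spark.cosmos.accountEndpoint": "https://myaccount.documents.azure.com:443/",
    "spark.cosmos.accountKey": dbutils.secrets.get("my-scope", "cosmos-key"),
    "spark.cosmos.database": "mydb",
    "spark.cosmos.container": "mycontainer",
    # Partial document update: patch the listed columns instead of
    # replacing whole documents.
    "spark.cosmos.write.strategy": "ItemPatch",
    "spark.cosmos.write.patch.defaultOperationType": "Set",
    "spark.cosmos.write.patch.columnConfigs": "[col(price).op(set), col(stock).op(set)]",
}

(updates_df.write
    .format("cosmos.oltp")
    .mode("append")
    .options(**cosmos_opts)
    .save())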

397973
by New Contributor III
  • 516 Views
  • 1 reply
  • 1 kudos

Resolved! What's the best way to get from Python dict > JSON > PySpark and apply as a mapping to a dataframe?

I'm migrating code from Python on Linux to Databricks PySpark. I have many mappings like this: {"main": {"honda": 1.0, "toyota": 2.9, "BMW": 5.77, "Fiat": 4.5}}. I exported using json.dump, saved to S3, and was able to import with sp...

Latest Reply
BigRoux
Databricks Employee
  • 1 kudos

For migrating your Python dictionary mappings to PySpark, you have several good options. Let's examine the approaches and identify the best solution.
Using F.create_map (Your Current Approach)
Your current approach using `F.create_map` is actually qu...
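
A minimal sketch of that approach with the mapping from the question (the sample DataFrame is assumed):

from itertools import chain
from pyspark.sql import functions as F

mapping = {"honda": 1.0, "toyota": 2.9, "BMW": 5.77, "Fiat": 4.5}

# Flatten the dict into alternating key/value literals for create_map.
mapping_expr = F.create_map(*[F.lit(x) for x in chain(*mapping.items())])

df = spark.createDataFrame([("honda",), ("Fiat",)], ["make"])  # assumed data
df.withColumn("score", mapping_expr[F.col("make")]).show()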

srinum89
by New Contributor II
  • 175 Views
  • 1 reply
  • 0 kudos

Workflow job failing with source as Git provider (remote GitHub repo) with SP

Facing an issue using the GitHub App when running a job with source as "Git provider" using a Service Principal. Since we can't use a PAT with an SP on GitHub, I am using the GitHub App for authentication. I followed the documentation below but it still gives a permission issue. ...

Latest Reply
BigRoux
Databricks Employee
  • 0 kudos

When running a Databricks workflow with a Git provider source using a Service Principal, you’re encountering permission issues despite using the GitHub App for authentication. This is a common challenge because Service Principals cannot use Personal ...

bricks3
by New Contributor III
  • 603 Views
  • 5 replies
  • 0 kudos

Resolved! How to schedule a workflow in a Python script

I saw how to schedule a workflow using the UI, but not with a Python script. Can someone help me find out how to schedule a workflow hourly in a Python script? Thank you.

Latest Reply
Isi
Contributor
  • 0 kudos

Hey @bricks3, Exactly, as far as I know you define the workflow configuration in the YAML file, and under the hood, DABs handles the API calls to Databricks (including scheduling). To run your workflow hourly, you just need to include the schedule bloc...
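
If it really has to be a standalone Python script rather than a bundle, a sketch with the Databricks Python SDK (the job ID is hypothetical):

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

# Attach an hourly Quartz cron schedule to an existing (hypothetical) job.
w.jobs.update(
    job_id=123456789,
    new_settings=jobs.JobSettings(
        schedule=jobs.CronSchedule(
            quartz_cron_expression="0 0 * * * ?",  # top of every hour
            timezone_id="UTC",
        )
    ),
)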

4 More Replies
minhhung0507
by Contributor III
  • 1093 Views
  • 8 replies
  • 6 kudos

CANNOT_UPDATE_TABLE_SCHEMA

I'm encountering a puzzling schema merge issue with my Delta Live Table. My setup involves several master tables on Databricks, and due to a schema change in the source database, one of my Delta Live Tables has a column (e.g., "reference_score") that...

Latest Reply
Brahmareddy
Honored Contributor III
  • 6 kudos

Dear Hung, Thank you so much for the kind words; I'm really glad the suggestions were helpful! You're absolutely doing the right thing by trying those options first before going for a full table drop. Testing with a new table and checking schema hints ...
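
For context, the analogous knob on a plain (non-DLT) Delta write is the mergeSchema option; a sketch with assumed data and an assumed table name:

from pyspark.sql import functions as F

# Assumed incoming batch that now carries an extra column.
incoming = spark.range(5).withColumn("reference_score", F.lit(0.0))

# mergeSchema lets an append add new columns to the table schema. It does
# not cover column type changes, which generally need overwriteSchema or,
# in DLT, a full refresh of the affected table.
(incoming.write
    .format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .saveAsTable("main.analytics.master_table"))  # hypothetical table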

7 More Replies
databicky
by Contributor II
  • 167 Views
  • 2 replies
  • 0 kudos

How to acknowledge incidents from Databricks

I want to connect to ServiceNow from Databricks and acknowledge the incidents: assign them to a person, change the status of the incident, and update the work notes as well. How can I achieve this by the hel...

Latest Reply
Isi
Contributor
  • 0 kudos

Hi @databicky Just a quick clarification: this is the Databricks community forum, not official Databricks Support. Think of it as a place where users can share ideas, help each other, or sometimes just browse; responses are not guaranteed here. If y...
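
If you do build it yourself, the usual route is ServiceNow's REST Table API called from a notebook; a hedged sketch (instance URL, sys_id, credentials, and field values are all hypothetical):

import requests

instance = "https://your-instance.service-now.com"  # hypothetical instance
incident_sys_id = "0123456789abcdef0123456789abcdef"  # hypothetical sys_id
auth = ("api_user", dbutils.secrets.get("my-scope", "snow-password"))

# Update assignee, state, and work notes on one incident via the Table API.
resp = requests.patch(
    f"{instance}/api/now/table/incident/{incident_sys_id}",
    auth=auth,
    headers={"Content-Type": "application/json", "Accept": "application/json"},
    json={
        "assigned_to": "some.user",
        "state": "2",  # state codes vary per instance
        "work_notes": "Acknowledged from Databricks",
    },
)
resp.raise_for_status()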

1 More Replies
