Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

TimReddick
by Contributor
  • 12713 Views
  • 6 replies
  • 2 kudos

Using run_job_task in Databricks Asset Bundles

Do Databricks Asset Bundles support run_job_task tasks? I've made various attempts to add a run_job_task with a specified job_id. See my code snippet below. I tried substituting the job_id using ${...} syntax, as well as three other ways which I've...

Data Engineering
Databrick Asset Bundles
run_job_task
Latest Reply
kyle_r
New Contributor II
  • 2 kudos

Ah, I see this is a known bug in the Databricks CLI: Asset bundle run_job_task fails · Issue #812 · databricks/cli (github.com). Anyone facing this issue should comment on that ticket and keep an eye on it for a resolution.
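For reference, a run_job_task in a job definition is a task that triggers another job by its numeric ID. The minimal Python sketch below builds that task as a plain dict, assuming the Jobs API 2.1 field names task_key, run_job_task.job_id and depends_on; the helper name, task keys and job ID are hypothetical placeholders, and this does not work around the CLI bug itself.

```python
import json
from typing import List, Optional


def build_run_job_task(task_key: str, job_id: int, depends_on: Optional[List[str]] = None) -> dict:
    """Build a job task (Jobs API 2.1 shape, assumed) that triggers another job by ID."""
    task = {"task_key": task_key, "run_job_task": {"job_id": job_id}}
    if depends_on:
        # Upstream tasks are referenced by their task_key.
        task["depends_on"] = [{"task_key": key} for key in depends_on]
    return task


# Hypothetical task keys and job ID, for illustration only.
task = build_run_job_task("trigger_downstream", 123456789, depends_on=["ingest"])
print(json.dumps(task, indent=2))
```

In a bundle, the same structure would appear in YAML under the job's tasks list; the integer job_id is the value the original post tried to substitute.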

User16765131552
by Databricks Employee
  • 11768 Views
  • 3 replies
  • 0 kudos

Resolved! Pull Cluster Tags

Does anybody know of any in-notebook or JAR code to pull cluster tags from the runtime environment? Something like dbutils.notebook.entry_point.getDbutils().notebook().getContext().tags().apply('user'), but for the cluster name?

Latest Reply
DatBoi
Contributor
  • 0 kudos

Did you find any documentation for the spark.conf.get properties? I am trying to get some metadata about the environment my notebook is running in (specifically, cluster custom tags), but I cannot find any information besides a couple of forum posts.
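One approach seen in community answers (not confirmed in this thread, and not documented as a stable API) is reading the spark.databricks.clusterUsageTags.* configuration keys, such as clusterName and clusterAllTags, where clusterAllTags is assumed to hold a JSON array of {"key": ..., "value": ...} objects. The sketch below only parses such a string into a dict; the sample string is made up, and the spark.conf.get calls are left as comments because they only work on a Databricks cluster.

```python
import json


def parse_cluster_tags(all_tags_json: str) -> dict:
    """Turn a JSON array of {"key": ..., "value": ...} objects into a plain dict."""
    return {item["key"]: item["value"] for item in json.loads(all_tags_json)}


# On a Databricks cluster (assumed config keys; verify on your runtime):
# cluster_name = spark.conf.get("spark.databricks.clusterUsageTags.clusterName")
# tags = parse_cluster_tags(spark.conf.get("spark.databricks.clusterUsageTags.clusterAllTags"))

# Made-up sample in the assumed format.
sample = '[{"key": "Vendor", "value": "Databricks"}, {"key": "team", "value": "data-eng"}]'
tags = parse_cluster_tags(sample)
print(tags["team"])  # data-eng
```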

arielmoraes
by New Contributor III
  • 3954 Views
  • 1 replies
  • 1 kudos

Resolved! Job Concurrency Queue not working as expected

I have a process that should run the same notebook with varying parameters, which translates to a job with queueing and concurrency enabled. When the first executions are triggered, the job runs work as expected, i.e. if the job has a max concurrency se...

Latest Reply
arielmoraes
New Contributor III
  • 1 kudos

Hi @Retired_mod, we double-checked everything: the resources are sufficient and all settings are set properly. I'll reach out to support by filing a new ticket. Thank you for your help.

b_1
by New Contributor II
  • 2326 Views
  • 2 replies
  • 1 kudos

to_timestamp function in non-legacy mode does not parse this format: yyyyMMddHHmmssSS

I have this datetime string in my dataset, '2023061218154258', and I want to convert it to a timestamp using the code below. However, the format I expect to work, yyyyMMddHHmmssSS, doesn't. This code reproduces the issue: from pyspark.sq...

Latest Reply
b_1
New Contributor II
  • 1 kudos

Does anybody else have the same issue, or know whether this is in fact a bug?
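A common workaround for this kind of pattern is to insert a separator between the 14 whole-second digits and the fractional digits, then parse with an explicit fraction field. The plain-Python sketch below shows that idea and assumes the trailing two digits are hundredths of a second. In Spark, the analogous step would be building the separated string (for example with F.concat and F.substring) and passing a pattern such as "yyyyMMddHHmmss.SS" to F.to_timestamp; whether that avoids the problem on a given runtime is an assumption to verify. The helper names are hypothetical.

```python
from datetime import datetime


def add_fraction_separator(raw: str, whole_digits: int = 14) -> str:
    """Insert '.' after the yyyyMMddHHmmss part (14 digits) and before the fraction."""
    return raw[:whole_digits] + "." + raw[whole_digits:]


def parse_compact_timestamp(raw: str) -> datetime:
    # %f right-pads the fraction, so "58" becomes 580000 microseconds (0.58 s).
    return datetime.strptime(add_fraction_separator(raw), "%Y%m%d%H%M%S.%f")


ts = parse_compact_timestamp("2023061218154258")
print(ts)  # 2023-06-12 18:15:42.580000
```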

orso
by New Contributor III
  • 11527 Views
  • 1 replies
  • 0 kudos

Resolved! Java - FAILED_WITH_ERROR when saving to Snowflake

I'm trying to move data from database A to database B on Snowflake. It is not a permission issue, since using the Python package snowflake.connector works. Databricks runtime version: 12.2 LTS (includes Apache Spark 3.3.2, Scala 2.12). Insert into database B fail...

Latest Reply
orso
New Contributor III
  • 0 kudos

Found the problem: the sub-roles didn't have grants on the warehouse. I hope this helps someone one day.

sanjay
by Valued Contributor II
  • 3001 Views
  • 1 replies
  • 0 kudos

Trigger Events in data pipeline

Hi, I am running a data pipeline in Databricks using the Matillion architecture. I am facing inconsistent events from the silver to the gold layer when any row is deleted or updated in a partition. Let me explain with an example: e.g. I have data in the silver layer with partit...

Latest Reply
sanjay
Valued Contributor II
  • 0 kudos

Thank you, Kaniz. I have further questions on this:
1. If I have nested partitions, e.g. on department and date (finance->09, finance->10), and I update one record in finance->09, will that also update partition finance->10?
2. Is it a good idea to have sm...

erigaud
by Honored Contributor
  • 7728 Views
  • 4 replies
  • 5 kudos

Resolved! DLT overwrite part of the table

Hello! We're currently building a file ingestion pipeline using Delta Live Tables and Auto Loader. The bronze tables have roughly the following schema: file_name | file_upload_date | colA | colB (well, there are actually 250+ columns...

Latest Reply
Tharun-Kumar
Databricks Employee
  • 5 kudos

@erigaud, using jobs/workflows would be the right choice for this.

Gilg
by Contributor II
  • 3486 Views
  • 3 replies
  • 1 kudos

DLT: Autoloader Perf

Hi Team, I am looking for some advice on performance-tuning my bronze layer using DLT. I have the following code, which is very simple and yet very effective:
  @dlt.create_table(name="bronze_events", comment = "New raw data ingested from storage account ...

Latest Reply
Tharun-Kumar
Databricks Employee
  • 1 kudos

Hi @Gilg, you mentioned that the micro-batch time has recently been around 12 minutes. Do you also see jobs/stages taking 12 minutes in the Spark UI? If so, the processing of the files itself takes 12 minutes. If not, the 12 minutes is spent on ...

Gilg
by Contributor II
  • 4093 Views
  • 1 replies
  • 1 kudos

APPLY_CHANGES late arriving data

Hi Team, I have a DLT pipeline that uses APPLY_CHANGES to load our Silver tables. I am using Id as the key and a timestamp to determine the sequence of the incoming data. Question: how does APPLY_CHANGES handle late-arriving data? I.e., for silver_table_1, the data ...
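APPLY CHANGES orders changes per key by the sequence_by column, so out-of-order records are resolved by sequence value rather than by arrival time. The plain-Python model below is a simplification of that SCD type 1 rule, not DLT's implementation: for each key, a record is applied only if its sequence value is strictly newer than the one already stored, and ties keep the stored row. Deletes and SCD type 2 history are not modelled, and the record layout and function name are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (key, sequence_value, payload).
# ISO-8601 strings of the same format compare correctly as plain strings.
Record = Tuple[int, str, str]


def apply_changes_scd1(target: Dict[int, Tuple[str, str]], batch: Iterable[Record]) -> None:
    """Upsert records into target, keeping only the newest sequence value per key."""
    for key, seq, payload in batch:
        current = target.get(key)
        # Apply when the key is new or the incoming record is strictly newer;
        # a late-arriving (older or equal) record leaves the stored row unchanged.
        if current is None or seq > current[0]:
            target[key] = (seq, payload)


silver: Dict[int, Tuple[str, str]] = {}
apply_changes_scd1(silver, [(1, "2023-10-05T10:00:00", "v2")])
# The older v1 update for key 1 arrives late, in a later batch.
apply_changes_scd1(silver, [(1, "2023-10-05T09:00:00", "v1"), (2, "2023-10-05T09:30:00", "a")])
print(silver[1][1])  # v2
```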

Monika_Bagyal
by New Contributor
  • 16953 Views
  • 0 replies
  • 0 kudos

Access denied error while reading a file from S3 into Spark

I'm seeing an access denied error from the Spark cluster while reading an S3 file into a notebook. I'm running on personal single-user compute with 13.3 LTS ML. The config setup looks like this:
  spark.conf.set("spark.hadoop.fs.s3a.access.key", access_id)
  spark.conf.set...

PradyumnJoshi
by New Contributor
  • 2278 Views
  • 1 replies
  • 0 kudos

Resolved! Databricks Academy - Advanced Data Engineering - Notebook Error while loading configurations

Hi Databricks Academy team, I am getting the errors below while running the classroom setup command in the Databricks Academy Advanced Data Engineering course notebooks in Databricks Community Edition. Please help me resolve it. #databricksacademy #advanceddat...

Latest Reply
User16847923431
Databricks Employee
  • 0 kudos

Hi, all. Our apologies - the Advanced Data Engineering with Databricks course will not run on Databricks Community Edition. If you would like a lab environment to run this course on, please see the new paid lab subscription available via the Databric...

RiyuLite
by New Contributor III
  • 3922 Views
  • 3 replies
  • 2 kudos

Where do I get account-level logs after enabling diagnostic logs for Azure Databricks?

I need to retrieve the accountBillableUsage events from the audit logs. I have enabled diagnostic logs, and it's been 36 hours. While enabling the logs, I selected every possible log category shown in this image. But I am still not able to see the containers for account level...

Latest Reply
RiyuLite
New Contributor III
  • 2 kudos

Hi @Retired_mod, I checked the Azure Monitor and log delivery documentation; the log delivery is the same as at the workspace level. What is the procedure to enable account-level services in audit logs for Azure?

naga_databricks
by Contributor
  • 4670 Views
  • 1 replies
  • 0 kudos

Resolved! Databricks asset bundles deployment to development

Hi All, I am using Databricks Asset Bundles to deploy my code from GitHub to a Databricks workspace. I have written the GitHub Action as provided in the Databricks documentation. I have set up the personal access token for the service principal I want to us...

Data Engineering
asset_bundles
Latest Reply
naga_databricks
Contributor
  • 0 kudos

Finally, I was able to identify the missing piece: setting up the environment identifier for the runner.
  name: "Deploy bundle"
  runs-on: ubuntu-latest
  environment: ${{github.event.inputs.Environment}}
With this, the action was able...

N_M
by Contributor
  • 2659 Views
  • 0 replies
  • 0 kudos

Unzip multipart files

Hi all, due to file size and file transfer limitations, we are receiving huge files compressed and split, in the format FILE.z01, FILE.z02, ..., FILE.zip. However, I can't find a way to unzip multipart files using Databricks. I already tried some of the ...

Data Engineering
bash
unzip
Kaviana
by New Contributor III
  • 2705 Views
  • 0 replies
  • 0 kudos

Internal server error when creating a workspace

I tried to create a workspace, and it is not created either automatically or manually. The strange thing is that it stopped working after a certain time. It seems like an internal Databricks error, but I don't know whether that is the case or whether it is a bug, wha...

