Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

juan_barreto
by New Contributor III
  • 3587 Views
  • 6 replies
  • 9 kudos

Problem with dropDuplicates in Databricks runtime 15.4LTS

Hi, I'm testing the latest version of the Databricks Runtime, but I'm getting errors doing a simple dropDuplicates. Using the following code: data = spark.read.table("some_table") data.dropDuplicates(subset=['SOME_COLUMN']).count() I'm getting this error...

Latest Reply
Witold
Databricks Partner
  • 9 kudos

Unless it was communicated as a breaking change between major updates, it would be OK. But I can't find anything in the release notes, so it's a bug.

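For readers hitting the same error: the call itself is standard PySpark, and the intent of dropDuplicates(subset=[...]) is to keep one surviving row per distinct value of the subset columns. The sketch below reproduces that semantics in plain Python (a stand-in only, not Spark, with hypothetical sample data), so the behavior the runtime should exhibit is unambiguous.

```python
# Plain-Python sketch of dropDuplicates(subset=['SOME_COLUMN']) semantics:
# keep the first row seen for each distinct value of the subset columns.
def drop_duplicates(rows, subset):
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[col] for col in subset)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept

rows = [
    {"SOME_COLUMN": 1, "val": "a"},
    {"SOME_COLUMN": 1, "val": "b"},  # duplicate key, dropped
    {"SOME_COLUMN": 2, "val": "c"},
]
deduped = drop_duplicates(rows, subset=["SOME_COLUMN"])
print(len(deduped))  # → 2, matching what .count() after dedup should return
```

Note that, like Spark's version, which of the duplicate rows survives is an implementation detail (here, the first encountered); only the count per key is guaranteed.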
5 More Replies
ossinova
by Contributor II
  • 4576 Views
  • 3 replies
  • 2 kudos

Reading data from S3 in Azure Databricks

Is it possible to create an external volume in Azure Databricks that points to an external S3 bucket so that I can read files for processing? Or is it only limited to ADLSv2?

Latest Reply
Ashley1
Contributor
  • 2 kudos

Yep, I'm keen to see this functionality as well. I think it is reasonable to expect that external locations can be on diverse storage types (at least the big players). I can nicely control access to Azure storage in UC, but not S3.

2 More Replies
del1000
by New Contributor III
  • 22840 Views
  • 8 replies
  • 3 kudos

Resolved! Is it possible to passthrough job's parameters to variable?

Scenario: I tried to run notebook_primary as a job with the same parameters map. This notebook is the orchestrator for notebooks_sec_1, notebooks_sec_2, notebooks_sec_3, and so on. I run them with the dbutils.notebook.run(path, timeout, arguments) function. So ho...

Latest Reply
nnalla
New Contributor II
  • 3 kudos

I am using getCurrentBindings(), but it returns an empty dictionary even though I passed parameters. I am running it in a scheduled workflow job.

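The commonly recommended pattern for this scenario is for the orchestrator to read its own job parameters and forward them explicitly in the `arguments` dict of dbutils.notebook.run, with each child reading them back via dbutils.widgets.get. Since dbutils only exists inside a Databricks notebook, the sketch below stubs it with a fake object so the flow can be run anywhere; the class and parameter names are hypothetical.

```python
# Sketch of the orchestrator pattern: forward job parameters to child
# notebooks via the `arguments` dict. `FakeDbutils` is a stand-in for the
# real dbutils object that exists only inside a Databricks notebook.

class FakeWidgets:
    def __init__(self, values):
        self._values = values

    def get(self, name):
        # Mirrors dbutils.widgets.get(name)
        return self._values[name]

class FakeDbutils:
    def __init__(self, params):
        self.widgets = FakeWidgets(params)
        self.ran = []  # records (path, arguments) calls for inspection

    def notebook_run(self, path, timeout, arguments):
        # In a real notebook this would be dbutils.notebook.run(path, timeout, arguments)
        self.ran.append((path, arguments))
        return "OK"

# Orchestrator side: collect the parameters this job was launched with...
dbutils = FakeDbutils({"env": "dev", "run_date": "2024-01-01"})
params = {name: dbutils.widgets.get(name) for name in ("env", "run_date")}

# ...and pass them explicitly to each child notebook.
for child in ("notebooks_sec_1", "notebooks_sec_2"):
    dbutils.notebook_run(child, timeout=600, arguments=params)
```

The key point is that child notebooks never see the parent's widgets implicitly; whatever the children need must travel through `arguments`.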
7 More Replies
rendorHaevyn
by New Contributor III
  • 14153 Views
  • 5 replies
  • 0 kudos

Databricks SQL Warehouse did not auto stop after specified 90 minute interval - why not?

In this specific case, we're running a 2X-Small SQL Warehouse on Databricks SQL. In looking at the SQL Warehouse monitoring log for this cluster, we noticed: the final query was executed by a user at 10:26 on 2023-06-20; no activity for some time, yet the cluster remai...

Latest Reply
jfid
New Contributor II
  • 0 kudos

Also dealing with the same issue! Does anybody have any idea how to check it? There are no logs, and no actual query happens.

4 More Replies
VeeruK
by New Contributor III
  • 4171 Views
  • 7 replies
  • 0 kudos

Databricks Lakehouse Fundamentals Badge

I have successfully passed the test after completing the course "Databricks Lakehouse Fundamentals". But I haven't received any badge; I have only been provided with a certificate. Please provide me with th...

Latest Reply
data_learner
New Contributor II
  • 0 kudos

I'm having the same issue

6 More Replies
Sangram
by New Contributor III
  • 5645 Views
  • 4 replies
  • 2 kudos

Turn on full screen for Databricks training videos

It seems the full-screen option for Databricks training videos is turned off. How can I turn it on?

Latest Reply
bennner
New Contributor II
  • 2 kudos

It sounds like the full-screen option is disabled by the platform hosting the Databricks training videos. If that's the case, it may be out of your control. However, you could try these workarounds: Browser zoom: use the zoom feature (Ctrl + "+" on Wi...

3 More Replies
Mario_D
by New Contributor III
  • 2626 Views
  • 2 replies
  • 1 kudos

Resolved! Foreign key constraint in a dlt pipeline

As primary/foreign key constraints are now supported in Databricks, how are foreign key constraints handled in a DLT pipeline? I.e., if a foreign key constraint is violated, is the record logged as a data quality issue and still added to the ...

Latest Reply
RCo
New Contributor III
  • 1 kudos

Hi @Mario_D! While primary & foreign key constraints are generally available in Databricks Runtime 15.2 and above, they are strictly informational only. This means that a primary key will not prevent duplicates from being added to a table, and a foreign...

1 More Replies
FG
by New Contributor II
  • 14953 Views
  • 5 replies
  • 1 kudos

Running unit tests from a different notebook (using Python unittest package) doesn't produce output (can't discover the test files)

I have a test file (test_transforms.py) which has a series of tests run using Python's unittest package. I can successfully run the tests inside the file with the expected output. But when I try to run this test file from a different notebook (run...

Latest Reply
SpaceDC
New Contributor II
  • 1 kudos

Hello, I have exactly the same issue. In my case, using the ipytest library on Databricks clusters, this is the error that occurs when I try to run the tests: EEEEE [100%] ============ ERRORS ============...

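A frequent cause of this symptom is relying on unittest.main(), which parses sys.argv; in a notebook, sys.argv holds kernel arguments rather than test names, so discovery finds nothing. A minimal stdlib-only sketch of loading and running a suite programmatically instead (the test class here is a hypothetical stand-in for the contents of test_transforms.py):

```python
import unittest

# Hypothetical stand-in for the tests defined in test_transforms.py.
class TestTransforms(unittest.TestCase):
    def test_upper(self):
        self.assertEqual("abc".upper(), "ABC")

# Build the suite explicitly instead of calling unittest.main(), whose
# sys.argv-based discovery tends to misbehave inside notebooks.
loader = unittest.TestLoader()
suite = loader.loadTestsFromTestCase(TestTransforms)
result = unittest.TextTestRunner(verbosity=2).run(suite)
print(result.wasSuccessful())  # → True
```

With imported modules (e.g. after %run or an import of the tests notebook), loader.loadTestsFromModule(module) works the same way and sidesteps file-based discovery entirely.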
4 More Replies
epps
by New Contributor
  • 4093 Views
  • 1 reply
  • 0 kudos

400 Unable to load OAuth Config

I've enabled SSO for my Databricks account with Okta as the identity provider and tested that the integration works. I'm now trying to implement an on-behalf-of token exchange so that my API can make authenticated requests to Databricks's API (e.g. ) ...

Latest Reply
riyadh-ruhr
New Contributor II
  • 0 kudos

Hello, were you able to fix the issue? I'm trying to implement the same thing.

JUPin
by Databricks Partner
  • 4906 Views
  • 3 replies
  • 0 kudos

REST API for Pipeline Events does not return all records

I'm using the REST API to retrieve Pipeline Events per the documentation: https://docs.databricks.com/api/workspace/pipelines/listpipelineevents I am able to retrieve some records, but the API stops after a call or two. I verified the number of rows us...

Latest Reply
wise_owl
New Contributor III
  • 0 kudos

You can leverage this code base. It works as expected using the "next_page_token" parameter. Don't forget to mark this solution as correct if it helped you.
import requests
token = 'your token'
url = 'your URL'
params = {'expand_tasks': 'true'}
header...

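The essential fix in the truncated snippet above is pagination: keep calling the list endpoint, passing back the next_page_token from each response, until the token is absent. A sketch with the HTTP call abstracted behind a callable so it runs without network access; the endpoint path in the comment and the "events"/"next_page_token" field names follow the ListPipelineEvents documentation, while the fake page data is purely illustrative.

```python
def list_all_events(fetch_page):
    """Collect events across all pages.

    `fetch_page(page_token)` stands in for a GET to
    /api/2.0/pipelines/{pipeline_id}/events (with page_token as a query
    parameter when not None) and must return the parsed JSON dict.
    """
    events, token = [], None
    while True:
        page = fetch_page(token)
        events.extend(page.get("events", []))
        token = page.get("next_page_token")
        if not token:  # last page: no token in the response
            return events

# Fake two-page response to show the loop follows tokens and terminates.
pages = {
    None: {"events": [1, 2], "next_page_token": "t1"},
    "t1": {"events": [3]},  # no next_page_token -> final page
}
all_events = list_all_events(lambda tok: pages[tok])
print(all_events)  # → [1, 2, 3]
```

In the real call, `fetch_page` would wrap requests.get with the bearer-token header from the snippet above and merge the page token into the query parameters.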
2 More Replies
Jaku6
by New Contributor
  • 1708 Views
  • 1 reply
  • 1 kudos

Run now with different parameters doesn't pass parameter to pipeline tasks

I have a job with some tasks. Some of the tasks are pipeline_tasks, some are notebook_tasks. When I run the job with "Run now with different parameters" and enter a new key-value pair, I see that the key-value is available in the notebook_tasks with dbut...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

As per the docs, it seems that the pipeline task type currently does not support passing parameters: https://docs.databricks.com/en/jobs/create-run-jobs.html#pass-parameters-to-a-databricks-job-task You could create a notebook task that runs before your pipeli...

Elderion
by New Contributor II
  • 5070 Views
  • 3 replies
  • 1 kudos

Resolved! Delta Live Tables + Databricks Assets Bundles

Hi, I'm trying to set up a CI/CD pipeline for Delta Live Tables jobs using Databricks Asset Bundles. I have a problem with the path to the notebook in the pipeline. According to this example: https://docs.databricks.com/en/delta-live-tables/tutorial-bundles.html the YAML file sho...

Latest Reply
ThierryBa
Databricks Partner
  • 1 kudos

I had this error once. You need to specify the extension of your file: if you set the notebook to be Python, then the path must end in .py; likewise .sql if you used SQL.
libraries:
  - notebook:
      path: ${workspace.file_path}/datab...

2 More Replies
Kurtis_R
by Databricks Partner
  • 1594 Views
  • 2 replies
  • 0 kudos

Excel Formula results

Hi all, just wanted to raise a question regarding Databricks workbooks and viewing results in cells. For the example provided in the screenshot, I want to view the results of an Excel formula that has been applied to a cell in our workbooks. Fo...

Latest Reply
User16756723392
Databricks Employee
  • 0 kudos

@Kurtis_R do you want to display the value of 45, or the formula by which 45 is computed?

1 More Replies
Sharmila_12
by Databricks Partner
  • 1043 Views
  • 1 reply
  • 0 kudos

I don't have a last name. What should I give in the mandatory last name field?

Hi, I was about to register for the Databricks Certified Data Engineer Associate exam. While registering for the exam, it asks for a last name, which is a mandatory field. But none of my government ID documents have a last name, only a first name. Wha...

Latest Reply
Anushree_Tatode
Databricks Employee
  • 0 kudos

Hi, to proceed with the registration, please enter a space or a full stop in the last name field. This should allow you to continue with the process. Feel free to reach out if you need any further assistance. Best regards, Anushree

Enrique1987
by New Contributor III
  • 6899 Views
  • 1 reply
  • 0 kudos

Photon Benchmark

I'm conducting my own comparative study between a cluster with Photon enabled and one without, to see what improvements occur. According to Databricks, there should be up to 12x better performance, but I'm only finding about a 20% improve...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Enrique1987, you can find more information about Photon in the whitepaper below: https://people.eecs.berkeley.edu/~matei/papers/2022/sigmod_photon.pdf
