Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

Pramod_G
by New Contributor II
  • 501 Views
  • 4 replies
  • 0 kudos

Job Cluster with Continuous Trigger Type: Is Frequent Restart Required?

Hi All, I have a job continuously processing IoT data. The workflow reads data from Azure Event Hub and inserts it into the Databricks bronze layer. From there, the data is read, processed, validated, and inserted into the Databricks silver layer. The...

Data Engineering
Driver or Cluster Stability Issues
Long-Running Job Challenges
Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

How are you ingesting the data? Are you using the Delta Live Tables mechanism - https://docs.databricks.com/en/delta-live-tables/index.html?

3 More Replies
lauraxyz
by Contributor
  • 1789 Views
  • 4 replies
  • 1 kudos

Put file into volume within Databricks

Hi! From a Databricks job, I want to copy a workspace file into a volume. How can I do that? I tried `dbutils.fs.cp("/Workspace/path/to/the/file", "/Volumes/path/to/destination")` but got: Public DBFS root is disabled. Access is denied on path: /Workspac...

Latest Reply
lauraxyz
Contributor
  • 1 kudos

Found the reason! It's the runtime: it doesn't work on Databricks Runtime 15.4 LTS, but started working after changing to 16.0. Maybe this is something supported only from the latest version?

3 More Replies
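For cases like the one above where `dbutils.fs.cp` is blocked by the disabled DBFS root, a common workaround is to use plain Python file operations: on Databricks, both /Workspace and /Volumes are exposed as local FUSE paths. A minimal sketch, assuming the paths below are placeholders for your own workspace file and volume:

```python
import shutil
from pathlib import Path

# On Databricks, /Workspace and /Volumes are mounted as local FUSE paths,
# so a plain file copy avoids the DBFS root entirely.
def copy_to_volume(src: str, dst: str) -> str:
    """Copy a single file, creating the destination directory if needed."""
    dst_path = Path(dst)
    dst_path.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(src, dst_path)
    return str(dst_path)

# Placeholder paths -- substitute your own:
# copy_to_volume("/Workspace/path/to/the/file",
#                "/Volumes/catalog/schema/volume/file")
```

This sidesteps the DBFS access check because no `dbfs:/` URI is involved; it only works on runtimes where the FUSE mounts are available to your cluster's access mode.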
GS_S
by New Contributor III
  • 881 Views
  • 7 replies
  • 0 kudos

Resolved! Error during merge operation: 'NoneType' object has no attribute 'collect'

Why does merge.collect() not return results in access mode: SINGLE_USER, but it does in USER_ISOLATION? I need to log the affected rows (inserted and updated) and can’t find a simple way to get this data in SINGLE_USER mode. Is there a solution or an...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

15.4 does not directly require serverless, but for fine-grained access control it does indeed require it when running on Single User mode, as mentioned: this data filtering is performed behind the scenes using serverless compute. In terms of costs: customers are charged for ...

6 More Replies
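When `merge.collect()` is unavailable in SINGLE_USER mode, the affected-row counts can usually be recovered from the table's Delta transaction history instead. A sketch: the metric keys below are the standard Delta Lake MERGE `operationMetrics`, but the helper itself is illustrative, not a Databricks API.

```python
# Extract inserted/updated/deleted counts from a Delta history entry's
# operationMetrics map (values arrive as strings in the history output).
def merge_row_counts(operation_metrics: dict) -> dict:
    return {
        "inserted": int(operation_metrics.get("numTargetRowsInserted", 0)),
        "updated": int(operation_metrics.get("numTargetRowsUpdated", 0)),
        "deleted": int(operation_metrics.get("numTargetRowsDeleted", 0)),
    }

# On a cluster you would feed it the latest history entry, e.g.:
# from delta.tables import DeltaTable
# last = (DeltaTable.forName(spark, "catalog.schema.table")
#         .history(1).collect()[0])
# print(merge_row_counts(last.operationMetrics))
```

Because this reads the transaction log rather than the merge result object, it works regardless of access mode.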
manojpatil04
by New Contributor III
  • 645 Views
  • 5 replies
  • 0 kudos

External dependency on serverless job from Airflow is not working on s3 path and workspace

I am working on a use case where we have to run a Python script from a serverless job through Airflow. When we try to trigger the serverless job and pass an external dependency as a wheel from an S3 path or workspace path, it does not work, but on a volume it ...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

As per the serverless compute limitations, I can see the following: task libraries are not supported for notebook tasks. Use notebook-scoped libraries instead. See Notebook-scoped Python libraries.

4 More Replies
stadelmannkevin
by New Contributor II
  • 666 Views
  • 4 replies
  • 2 kudos

init_script breaks Notebooks

Hi everyone, We would like to use our private company Python repository for installing Python libraries with pip install. To achieve this, I created a simple script which sets the index-url configuration of pip to our private repo. I set this script as a...

Latest Reply
Walter_C
Databricks Employee
  • 2 kudos

Did you also try cloning the cluster or using another cluster for testing? The "metastore down" message is normally a Hive Metastore issue and should not be the cause here, but you can check the log4j output under Driver logs for more details on the error.

3 More Replies
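For reference, a cluster-scoped init script that points pip at a private index can be as small as writing a pip config file. A sketch, assuming `https://pypi.example.com/simple` is a placeholder for your company repository; on a real cluster the script runs as root and would typically write `/etc/pip.conf`, while the user-level fallback path here keeps the script harmless to try elsewhere.

```shell
#!/bin/bash
# Init-script sketch: make pip install from a private package index.
# PIP_CONF_PATH lets you override the target; on a Databricks cluster
# you would normally write /etc/pip.conf so it applies to all users.
PIP_CONF="${PIP_CONF_PATH:-$HOME/.config/pip/pip.conf}"
mkdir -p "$(dirname "$PIP_CONF")"
cat > "$PIP_CONF" <<'EOF'
[global]
index-url = https://pypi.example.com/simple
EOF
```

If notebooks break after attaching such a script, a quick sanity check is whether the index URL is reachable from the cluster's network (a typo or blocked egress makes every `pip install` hang or fail during startup).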
sensanjoy
by Contributor
  • 23405 Views
  • 6 replies
  • 1 kudos

Resolved! Performance issue with pyspark udf function calling rest api

Hi All, I am facing a performance issue with a PySpark UDF that posts data to a REST API (which uses a Cosmos DB backend to store the data). Please find the details below: # The Spark dataframe (df) contains about 30-40k rows. # I am using pyt...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Sanjoy Sen, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedback w...

5 More Replies
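The usual fix for slow per-row REST calls from a UDF is to switch to `mapPartitions`, reuse one HTTP session per partition, and send records in batches instead of one request per row. The batching helper below is pure Python; the endpoint URL and the `mapPartitions` wiring are illustrative assumptions, not the original poster's code.

```python
from itertools import islice

def batched(iterable, size):
    """Yield lists of up to `size` items from any iterable."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

# Sketch of the Spark side (runs on a cluster, shown here as comments):
# def post_partition(rows):
#     import requests
#     with requests.Session() as session:      # one session per partition
#         for chunk in batched(rows, 500):     # one POST per 500 rows
#             resp = session.post("https://api.example.com/ingest",
#                                 json=[r.asDict() for r in chunk])
#             yield resp.status_code
#
# df.rdd.mapPartitions(post_partition).collect()
```

For a 30-40k row dataframe, this turns tens of thousands of connection setups into a handful of bulk requests, which is typically where most of the UDF's time goes.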
wi11iamr
by New Contributor II
  • 1175 Views
  • 5 replies
  • 0 kudos

PowerBI Connection: Possible to use ADOMDClient (or alternative)?

I wish to extract the metadata of all Measures, Relationships, and Entities from Power BI datasets. In VSCode I have a Python script that connects to the Power BI API using the Pyadomd module via the XMLA endpoint. After much trial and error I...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

I understand. Yes, it seems this is currently not possible; the only option would be to export your dataset as a CSV file and import it into Databricks.

4 More Replies
shusharin_anton
by New Contributor II
  • 511 Views
  • 1 replies
  • 1 kudos

Resolved! Sort after update on DWH

Running a query on a serverless DWH: `UPDATE catalog.schema.table SET col_tmp = CAST(col AS DECIMAL(30, 15))`. In query profiling, it has some sort and shuffle stages in the graph. The table is partitioned by the partition_date column. Some details in the sort node mention that so...

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

Hi @shusharin_anton, The sort and shuffle stages in your query profile are likely triggered by the need to redistribute and order the data based on the partition_date column. This behavior can be attributed to the way Spark handles data partitioning ...

rai00
by New Contributor
  • 316 Views
  • 1 replies
  • 0 kudos

Mock user doesn't have the required privileges to access catalog `remorph` while running 'make test'

Utility: Remorph (Databricks). Issue: 'User `me@example.com` doesn't have required privileges :: `` to access catalog `remorph`' while running the 'make test' cmd. I am encountering an issue while running tests for Databricks Labs Remorph using 'make test'...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @rai00, ensure that the mock user me@example.com has the necessary privileges at both the catalog and schema levels. The user needs specific privileges such as USE_SCHEMA and CREATE_VOLUME. Use the WorkspaceClient to check the effective privilege...

cool_cool_cool
by New Contributor II
  • 2444 Views
  • 2 replies
  • 2 kudos

Resolved! Trigger Dashboard Update At The End of a Workflow

Heya, I have a workflow that computes some data and writes to a Delta table, and I have a dashboard that is based on the table. How can I trigger a refresh of the dashboard once the workflow finishes? Thanks!

Latest Reply
DanWertheimer
New Contributor II
  • 2 kudos

How does one do this with the new dashboards? I only see the ability to do this with legacy dashboards.

1 More Replies
SparkMaster
by New Contributor III
  • 10231 Views
  • 11 replies
  • 2 kudos

Why can't I delete experiments without deleting the notebook? Or better Organize experiments into folders?

My Databricks Experiments page is cluttered with a whole lot of experiments. Many of them are notebooks which are showing up there for some reason (even though they didn't have an MLflow run associated with them). I would like to delete the experiments, but it...

Latest Reply
mhiltner
Databricks Employee
  • 2 kudos

Hey @Debayan @SparkMaster, a bit late here, but I believe this is caused by a click on the experiments icon on the right side. This may look like a meaningless click, but it actually triggers a run.

10 More Replies
jeremy98
by Contributor III
  • 455 Views
  • 1 replies
  • 0 kudos

Resolved! Can we modify the constraint of a primary key in an existed table?

Hello Community, is it possible to modify the schema of an existing table that currently has an ID column without any constraints? I would like to update the schema to make the ID column a primary key with auto-increment, starting from the maximum id al...

Latest Reply
PiotrMi
Contributor
  • 0 kudos

Hey @jeremy98, based on an older article it looks like it cannot be done: "There are a few caveats you should keep in mind when adopting this new feature. Identity columns cannot be added to existing tables; the tables will need to be recreated with the new ...

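Since identity columns cannot be added in place, the usual workaround is to recreate the table with `GENERATED ALWAYS AS IDENTITY`, seeding the start value from the current maximum id, and then copy the rows over. A sketch that just builds the DDL; all names are placeholders, and `start_id` would come from `SELECT MAX(id) + 1` on the existing table.

```python
# Recreate-and-backfill workaround for adding an identity primary key.
# Table and column names below are hypothetical examples.
def identity_table_ddl(table: str, start_id: int) -> str:
    return (
        f"CREATE TABLE {table}_new ("
        f" id BIGINT GENERATED ALWAYS AS IDENTITY"
        f" (START WITH {start_id} INCREMENT BY 1),"
        f" payload STRING,"
        f" CONSTRAINT {table.split('.')[-1]}_pk PRIMARY KEY (id))"
    )

ddl = identity_table_ddl("catalog.schema.events", start_id=1001)
# On a cluster: spark.sql(ddl), then backfill without touching id
# (it is GENERATED ALWAYS, so the engine assigns it):
# spark.sql("INSERT INTO catalog.schema.events_new (payload) "
#           "SELECT payload FROM catalog.schema.events")
```

After verifying the copy, the old table can be dropped and the new one renamed into place.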
Shreyash_Gupta
by New Contributor III
  • 2455 Views
  • 4 replies
  • 0 kudos

Resolved! Can we display key vault secret in Databricks notebook

I am using a Databricks notebook and Azure Key Vault. When I use the function below, I get [REDACTED] as output: `dbutils.secrets.get(scope_name, secret_name)`. I want to know if there is any way to display the secret in Databricks.

Latest Reply
daniel_sahal
Esteemed Contributor
  • 0 kudos

@Shreyash_Gupta You can simply iterate over each letter of the secret and print it. Something like this: `for letter in dbutils.secrets.get(scope_name, secret_name): print(letter)`

3 More Replies
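The character-by-character loop works because Databricks redacts output that matches the secret string exactly; separating the characters defeats that string match. A one-line equivalent, with `"s3cr3t"` standing in for the `dbutils.secrets.get(scope_name, secret_name)` call (which only exists on a cluster). Note that either way the secret ends up in plain text in the notebook output, so use this for debugging only.

```python
# Redaction matches the whole secret string; spacing the characters
# out bypasses it. "s3cr3t" is a stand-in for the real secret value.
secret = "s3cr3t"
spaced = " ".join(secret)
print(spaced)  # s 3 c r 3 t
```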
francisix
by New Contributor II
  • 5017 Views
  • 5 replies
  • 1 kudos

Resolved! I haven't received badge for completion

Hi, today I completed the test for Lakehouse Fundamentals and scored 85%, but I still haven't received the badge through my email francis@intellectyx.com. Kindly let me know, please! - Francis

Latest Reply
sureshrocks1984
New Contributor II
  • 1 kudos

Hi, I completed the test for Databricks Certified Data Engineer Associate on 17 December 2024, but I still haven't received the badge through my email sureshrocks.1984@hotmail.com. Kindly let me know, please! - Suresh K

4 More Replies
f1nesse13
by New Contributor
  • 276 Views
  • 1 replies
  • 0 kudos

Question about notifications and failed jobs

Hello, I had a question involving rerunning a job from a checkpoint using 'Repair Run'. I have a job which failed and I'm looking to rerun the stream from a checkpoint. My job uses notifications for file detection (cloudFiles.useNotifications). My que...

Latest Reply
BigRoux
Databricks Employee
  • 0 kudos

When rerunning your job from a checkpoint using Repair Run with cloudFiles.useNotifications, only unprocessed messages in the queue (representing new or failed-to-process files) will be consumed. Files or events already recorded in the checkpoint wil...

