Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Sujith_i
by New Contributor
  • 3320 Views
  • 1 replies
  • 0 kudos

Databricks SDK for Python authentication failing

I am trying to use the Databricks SDK for Python to do some account-level operations like creating groups. I created a Databricks config file locally and provided the profile name as an argument to AccountClient, but authentication keeps failing. The same con...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Authentication for account-level operations with Databricks SDK for Python requires more than just referencing the profile name in your local .databrickscfg file. While the CLI consults .databrickscfg for profiles and can use them directly, the SDK's...
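
For illustration, a minimal sketch of account-level authentication with the SDK, assuming a service principal; the host, account ID, and credential values below are placeholders:

    from databricks.sdk import AccountClient

    # Account-level auth needs the accounts-console host and the account ID,
    # not just a workspace profile. All values here are placeholders.
    a = AccountClient(
        host="https://accounts.cloud.databricks.com",   # accounts host, not a workspace URL
        account_id="<account-id>",
        client_id="<service-principal-client-id>",
        client_secret="<service-principal-oauth-secret>",
    )

    # Example account-level operation: list account groups
    for group in a.groups.list():
        print(group.display_name)

A .databrickscfg profile can also work with AccountClient(profile="..."), provided that profile itself points at the accounts host and includes account_id.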

AvneeshSingh
by New Contributor
  • 3267 Views
  • 2 replies
  • 1 kudos

Autoloader Data Reprocess

Hi, if possible can anyone please help me with some Autoloader options? I have 2 open queries: (i) Let's assume I am running an Autoloader stream and my job fails; instead of resetting the whole checkpoint, I want to run the stream from a specified timest...

Data Engineering
autoloader
Latest Reply
mark_ott
Databricks Employee
  • 1 kudos

In Databricks Autoloader, controlling the starting point for streaming data after a job failure requires careful management of checkpoints and configuration options. By default, Autoloader uses checkpoints to remember where the stream last left off, ...
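
For illustration, a minimal sketch of one option (an assumption, not the full reply): start a fresh checkpoint and use the modifiedAfter file-source option so only files newer than a given timestamp are picked up. Paths, format, timestamp, and table name are placeholders:

    # Replay only files modified after the given timestamp, writing to a new
    # checkpoint location instead of reusing or resetting the old one.
    df = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("modifiedAfter", "2025-01-01 00:00:00.000000 UTC")
        .load("abfss://raw@mystorage.dfs.core.windows.net/orders/")
    )

    (
        df.writeStream
        .option("checkpointLocation", "abfss://raw@mystorage.dfs.core.windows.net/_checkpoints/orders_replay/")
        .trigger(availableNow=True)
        .toTable("main.bronze.orders")
    )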

1 More Replies
Nidhig
by Contributor
  • 55 Views
  • 1 replies
  • 2 kudos

Resolved! Global Parameter at the Pipeline level in Lakeflow Job

Hi, is there any workaround, or can Databricks enable a global parameters feature at the pipeline level in Lakeflow Jobs? Currently I am working on migrating an ADF pipeline schedule setup to a Lakeflow Job.

Latest Reply
mark_ott
Databricks Employee
  • 2 kudos

Databricks Lakeflow Declarative Pipelines do not currently support truly global parameters at the pipeline level in the same way that Azure Data Factory (ADF) allows, but there are workarounds that enable parameterization to streamline migration from...
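
For illustration, a minimal sketch of one common workaround (an assumption, not the full reply): define key/value pairs in the pipeline's configuration settings and read them in the pipeline code. The keys, paths, and table names here are placeholders:

    import dlt

    # "env" and "source_path" are assumed to be set as configuration key/value
    # pairs in the Lakeflow pipeline settings.
    env = spark.conf.get("env", "dev")
    source_path = spark.conf.get("source_path", "/Volumes/main/raw/orders")

    @dlt.table(name=f"orders_bronze_{env}")
    def orders_bronze():
        return spark.read.format("json").load(source_path)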

VaDim
by New Contributor III
  • 77 Views
  • 1 replies
  • 0 kudos

transformWithStateInPandas. Invalid pickle opcode when updating ValueState with large (float) array

I am getting an error when the entity I need to store in a ValueState is a large array (over 15k-20k items). No error (and works correctly) if I trim the array to under 10k samples. The same error is raised when using it as a value for MapState or as...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The error you're facing, specifically PySparkRuntimeError: Error updating value state: invalid pickle opcode, usually points to a serialization (pickling) problem when storing large arrays in Flink/Spark state such as ValueState, ListState, or MapStat...
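
For illustration only, one workaround idea (not taken from the reply above): shrink the state payload by packing the float array into raw bytes before it goes into ValueState and unpacking it on read; numpy availability is assumed:

    import numpy as np

    def pack_floats(values):
        # float32 bytes are far more compact than a pickled Python list of floats
        return np.asarray(values, dtype=np.float32).tobytes()

    def unpack_floats(blob):
        return np.frombuffer(blob, dtype=np.float32)

    blob = pack_floats([0.1] * 20_000)   # roughly 80 KB of raw bytes
    restored = unpack_floats(blob)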

SamAdams
by Contributor
  • 42 Views
  • 1 replies
  • 0 kudos

Time window for "All tables are updated" option in job Table Update Trigger

I've been using the Table Update Trigger for some SQL alert workflows. I have a job that uses 3 tables with an "All tables updated" trigger: Table 1 was updated at 07:20 UTC, Table 2 was updated at 16:48 UTC, Table 3 was updated at 16:50 UTC -> Job is trig...

Data Engineering
jobs
TableUpdateTrigger
Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

There is no fixed or documented “window” time for the interval between updates to all monitored tables before a job with an "All tables updated" trigger runs in Databricks. The job is triggered as soon as every table in the set has seen at least one ...

  • 0 kudos
deano2025
by New Contributor II
  • 26 Views
  • 0 replies
  • 0 kudos

Databricks asset bundles CI/CD design for github actions

We want to use Databricks asset bundles and deploy code changes and tests using GitHub Actions. We have seen lots of content online, but nothing concrete on how this is done at scale. So I'm wondering, if we have many changes and therefore man...

Data Engineering
asset bundles
ak5har
by New Contributor II
  • 2704 Views
  • 9 replies
  • 2 kudos

Databricks connection to on-prem Cloudera

Hello, we are trying to evaluate the Databricks solution to extract data from an existing Cloudera schema hosted on a physical server. We are using the Databricks serverless compute provided by the Databricks Express setup and we assume we will not need t...

Latest Reply
Adrian_Ashley
New Contributor
  • 2 kudos

I work for a Databricks partner called Cirata. Our Data Migrator offering allows both data and metadata replication from Cloudera to be delivered to the Databricks environment, whether this is just delivering it to the ADLS2 object storage or to ...

8 More Replies
pepco
by New Contributor II
  • 38 Views
  • 2 replies
  • 0 kudos

Resolved! Environment in serverless

I'm playing a little bit with the Databricks free environment and I'm super confused by the documentation vs. actual behavior. Maybe you could help me understand better. For the workspace I can define a base environment which I can use in serverless ...

Data Engineering
base environment
serverless
Latest Reply
K_Anudeep
Databricks Employee
  • 0 kudos

Hello @pepco, Is it possible to use environments with notebook tasks? Yes, but only in a very specific way. Notebook tasks can use base environments, but you don't attach them in the job's YAML. You pick the base env in the notebook's Environment sid...

1 More Replies
JanFalta
by New Contributor
  • 27 Views
  • 0 replies
  • 0 kudos

Data Masking

Hi all, I need some help with this masking problem. If you create a view that uses a masking function based on a table, the user reading this view has to have read access to the underlying table. So theoretically, they can access the unmasked data in the table. I would...
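
For illustration, a sketch of one alternative approach (an assumption, not from this post): attach a Unity Catalog column mask directly to the table, so readers query the table itself and the masking function decides what they see. All object and group names are placeholders:

    # Define a masking function and attach it to the column; users outside the
    # hr_admins group see only the redacted value.
    spark.sql("""
        CREATE OR REPLACE FUNCTION main.hr.mask_ssn(ssn STRING)
        RETURN CASE WHEN is_account_group_member('hr_admins') THEN ssn
                    ELSE '***-**-****' END
    """)
    spark.sql("ALTER TABLE main.hr.employees ALTER COLUMN ssn SET MASK main.hr.mask_ssn")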

KKo
by Contributor III
  • 327 Views
  • 1 replies
  • 0 kudos

On-prem MS SQL to Azure Databricks

Hi all, I need to ingest data from on-prem MS SQL tables into Azure using Databricks. For the ingest, previously I used notebooks and JDBC connectors to read SQL tables and write to Unity Catalog tables. Now, I want to experiment with Databricks connectors f...

Latest Reply
AbhaySingh
Databricks Employee
  • 0 kudos

This feature is good to go... I can't think of any disadvantages. Here is a guide: https://landang.ca/2025/01/31/simple-data-ingestion-from-sql-server-to-databricks-using-lakeflow-connect/

Suheb
by New Contributor
  • 25 Views
  • 1 replies
  • 0 kudos

How have you set up a governance structure (data access control, workspace management, cluster polic

If your company uses Databricks with many people, how do you manage security, organize teams, and control costs — and what tools do you use to make it all work smoothly?

Latest Reply
AbhaySingh
Databricks Employee
  • 0 kudos

Please take a look here to get some initial ideas. https://medium.com/databricks-unity-catalog-sme/a-practical-guide-to-catalog-layout-data-sharing-and-distribution-with-databricks-unity-catalog-763e4c7b7351  

him
by New Contributor III
  • 25004 Views
  • 14 replies
  • 10 kudos

I am getting the below error while making a GET request to a job in Databricks after successfully running it

"error_code": "INVALID_PARAMETER_VALUE",  "message": "Retrieving the output of runs with multiple tasks is not supported. Please retrieve the output of each individual task run instead."}

Latest Reply
Octavian1
Contributor
  • 10 kudos

Hi @Debayan, I'd suggest also mentioning this explicitly in the documentation of the workspace client for get_run_output. One has to pay extra attention to the example run_id=run.tasks[0].run_id, otherwise it can easily be missed.
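
For illustration, a minimal sketch with the SDK's WorkspaceClient; the run ID is a placeholder:

    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()
    job_run_id = 123456                                  # placeholder multi-task job run ID

    run = w.jobs.get_run(run_id=job_run_id)
    for task in run.tasks:
        # get_run_output wants each task's run_id, not the parent job run's ID
        output = w.jobs.get_run_output(run_id=task.run_id)
        print(task.task_key, output.notebook_output)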

13 More Replies
toproximahk
by New Contributor
  • 137 Views
  • 3 replies
  • 0 kudos

Inquiry on GraphFrame Library Upgrade Timeline for Databricks Runtime for Machine Learning

Thanks to the Databricks community for maintaining such a valuable platform. I would like to inquire if there is a planned timeline for upgrading the GraphFrame library. We've noticed that the latest release on GitHub is v0.9.3, while the Databricks ...

Latest Reply
Sem-Sinchenko
New Contributor
  • 0 kudos

You can try to add the Maven dependency to your cluster manually ... For example, for Spark 3.5.x it will be like: io.graphframes:graphframes-spark3_2.12:0.10.0, and add a PyPI dependency graphframes-py. Adding the Maven coordinates should download and install al...
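
For illustration, a small usage sketch assuming the Maven coordinate and the graphframes-py package mentioned above are installed on the cluster:

    from graphframes import GraphFrame

    vertices = spark.createDataFrame([("a", "Alice"), ("b", "Bob")], ["id", "name"])
    edges = spark.createDataFrame([("a", "b", "follows")], ["src", "dst", "relationship"])

    g = GraphFrame(vertices, edges)
    g.inDegrees.show()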

2 More Replies
alhuelamo
by New Contributor II
  • 10301 Views
  • 5 replies
  • 1 kudos

Getting non-traceable NullPointerExceptions

We're running a job that's issuing NullPointerExceptions without traces of our job's code. Does anybody know what would be the best course of action when it comes to debugging these issues? The job is a Scala job running on DBR 11.3 LTS. In case it's rel...

Latest Reply
Amora
New Contributor
  • 1 kudos

You could try enabling full stack traces and checking the Spark executor logs for hidden errors. NullPointerExceptions in Scala on DBR often come from lazy evaluations or missing schema fields during I/O. Reviewing your DataFrame transformations a...

4 More Replies
