Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

SupunK
New Contributor II
  • 237 Views
  • 1 reply
  • 2 kudos

Databricks always loads built-in BigQuery connector (0.22.2), can’t override with 0.43.x

I am using Databricks Runtime 15.4 (Spark 3.5 / Scala 2.12) on AWS. My goal is to use the latest Google BigQuery connector because I need the direct write method (BigQuery Storage Write API): option("writeMethod", "direct"). This allows writing directly ...

Latest Reply
mark_ott
Databricks Employee

There is no supported way on Databricks Runtime 15.4 to override or replace the built-in BigQuery connector to use your own version (such as 0.43.x) in order to access the direct write method. Databricks clusters come preloaded with their own managed...
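
For reference, the direct write the poster is after looks like this with the open-source spark-bigquery connector. This is only a sketch: it assumes a 0.43.x connector actually resolves first on the classpath, which, per the reply above, a standard DBR 15.4 cluster will not let you arrange; the table name is hypothetical.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(10).toDF("id")  # stand-in DataFrame

# Direct write via the BigQuery Storage Write API, skipping the GCS staging
# bucket that the default "indirect" method uses.
(df.write.format("bigquery")
    .option("table", "my_project.my_dataset.my_table")  # hypothetical target
    .option("writeMethod", "direct")
    .mode("append")
    .save())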

Mathias_Peters
Contributor II
  • 160 Views
  • 1 reply
  • 0 kudos

Question on how to properly write a dataset of custom objects to MongoDB

Hi, I am implementing a Spark Job in Kotlin (unfortunately a must-have) which reads from and writes to MongoDB. The reason for this is to reuse existing code in a MapFunction. The result of applying that map is a DataSet of type Consumer, a custom ob...

Latest Reply
mark_ott
Databricks Employee

You are correct—when you pass a BsonDocument to Spark's MongoDB connector using .write().format("mongodb"), Spark treats unknown types as generic serialized blobs, leading to documents stored as a single binary field (as you observed) rather than as ...
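
A sketch of the fix in Python (the thread is Kotlin, but the v10 connector options are the same): give Spark a typed schema so each field maps to a BSON field instead of one serialized blob. The field names, URI, and database/collection names are hypothetical.

from pyspark.sql import SparkSession
from pyspark.sql.types import IntegerType, StringType, StructField, StructType

spark = SparkSession.builder.getOrCreate()

# With an explicit schema the connector can translate each column to a BSON
# type rather than storing the whole object as a single binary field.
schema = StructType([
    StructField("consumerId", StringType()),  # hypothetical fields on Consumer
    StructField("score", IntegerType()),
])
df = spark.createDataFrame([("c-001", 42)], schema)

(df.write.format("mongodb")
    .option("connection.uri", "mongodb://host:27017")  # placeholder URI
    .option("database", "mydb")                        # hypothetical names
    .option("collection", "consumers")
    .mode("append")
    .save())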

Neo_Reeves
New Contributor
  • 239 Views
  • 2 replies
  • 0 kudos

Trigger a Databricks job using a SQL query

Trying to run a SQL call function to trigger a Databricks job and am stuck with an error stating: "The schema <schema_name>.system cannot be found. Verify the spelling and correctness of the schema and catalog. If you did not qualify the name with a c...

Latest Reply
Coffee77
Contributor III

When you say "Databricks Job", do you mean custom SQL code you created in a stored procedure or function? I'm saying this because if you're really trying to run a real Databricks Job (Lakeflow), it would be much better to establish another strategy di...
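
If it really is a Lakeflow Job, one such strategy is to trigger it through the Jobs API instead of SQL. A minimal sketch with the Databricks Python SDK; the job ID is hypothetical.

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # picks up notebook or environment credentials

# Trigger an existing job and block until the run reaches a terminal state.
run = w.jobs.run_now(job_id=123456789).result()
print(run.state.result_state)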

1 More Reply
el_mark
New Contributor II
  • 387 Views
  • 3 replies
  • 1 kudo

Resolved! Delta Sharing Issue between AWS and Azure

Hi, we have attempted to set up a Delta Share from Azure to AWS. We can see the Delta Share table and metadata in AWS; however, when we attempt to query the table we hit a problem. If we use serverless SQL or a Notebook and whitelist the IP address ...

Latest Reply
el_mark
New Contributor II

Thank you @ManojkMohan. I can see the correct IP address when I run IPIFY from a compute notebook. So from what you are saying above, that implies the issue is with the Azure Storage firewall, right?
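
For anyone reproducing that check, the egress-IP test is a one-liner from a notebook (assuming outbound internet access from the compute):

import requests

# The public egress IP the Azure Storage firewall actually sees from this
# compute; compare it against the storage account's allow-list.
print(requests.get("https://api.ipify.org").text)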

2 More Replies
Naveenkumar1811
New Contributor III
  • 484 Views
  • 9 replies
  • 1 kudo

How do I create a workspace object with SP ownership

Hi Team, I have a scenario where I have a jar file (24 MB) to be put in a workspace directory, but the ownership should be associated with the SP, without any individual ID ownership. Tried the Databricks CLI export option, but it has a limitation of 10 MB max. Please ...

Latest Reply
Coffee77
Contributor III

Inspecting the underlying HTTP traffic while using the Databricks UI to import files into the Workspace, it turns out (as expected) that the Databricks API is used, with requests similar to: So, @Naveenkumar1811, use the Databricks API with the SP identity in a similar way as expect...
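
A minimal sketch of that approach with the Python SDK, authenticated as the SP so the SP ends up owning the object. The local file and workspace path are hypothetical, and I have not verified how the import size limit behaves for a 24 MB jar.

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.workspace import ImportFormat

# Authenticate as the SP (e.g. DATABRICKS_CLIENT_ID / DATABRICKS_CLIENT_SECRET)
# so the imported object is owned by the SP rather than an individual user.
w = WorkspaceClient()

with open("my-lib.jar", "rb") as f:  # hypothetical local file
    w.workspace.upload(
        "/Shared/libs/my-lib.jar",   # hypothetical workspace path
        f,
        format=ImportFormat.RAW,     # raw bytes, not a notebook
        overwrite=True,
    )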

8 More Replies
Etyr
Contributor II
  • 236 Views
  • 2 replies
  • 0 kudos

"Something went wrong, please try again later." On Sync tables for PostgreSQL

I'm using the Sync feature to load a Snowflake view from a catalog into PostgreSQL (to expose data to APIs for faster response times). I've been playing around with scripting the creation of the sync, and when I create + delete and recreate the same sync/...

Latest Reply
Etyr
Contributor II

Thank you for your response @stbjelcevic. So I tried to refresh the catalog and the schema when the table was deleted in Postgres + Unity Catalog (the sync one) and removed the pipeline: from databricks.sdk import WorkspaceClient; from databricks.sdk.servi...
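
The truncated snippet appears to be doing that cleanup via the SDK. One plausible shape, assuming the sync is backed by a pipeline; the table name and pipeline ID are hypothetical.

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Drop the synced table from Unity Catalog and remove its backing pipeline
# before recreating the sync, so no stale state survives.
w.tables.delete(full_name="main.my_schema.my_synced_table")
w.pipelines.delete(pipeline_id="0123456789abcdef")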

1 More Reply
analyticsnerd
New Contributor III
  • 379 Views
  • 4 replies
  • 6 kudos

Resolved! VACUUM vs VACUUM LITE

Hey Team, I have a few questions regarding VACUUM and VACUUM LITE: 1. How do they work internally? Do both of them scan the entire table storage directory? 2. How should we use these in our prod jobs? I mean, should we always run VACUUM LITE or VACUUM, or ...

Latest Reply
Raman_Unifeye
Contributor III

Thanks @K_Anudeep for the insights on VACUUM operations.
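
For readers landing here, both variants are plain SQL commands. As I understand it, a full VACUUM lists the files under the table storage directory, while VACUUM ... LITE (on newer runtimes) derives removable files from the Delta transaction log, which is cheaper but can miss files the log never tracked. A sketch against a hypothetical table:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Full vacuum: scans the table directory for unreferenced files.
spark.sql("VACUUM main.sales.orders RETAIN 168 HOURS")  # hypothetical table

# Lite vacuum: works off the Delta log instead of a full directory listing.
spark.sql("VACUUM main.sales.orders LITE")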

3 More Replies
GijsR
New Contributor II
  • 285 Views
  • 3 replies
  • 3 kudos

Resolved! List workspace permissions should return identity

Hi there, looking through the documentation, I noticed that /api/2.0/accounts/{account_id}/workspaces/{workspace_id}/permissionassignments/permissions only returns the permissions but not the identity assigned the permission. This would be helpful for ...

Latest Reply
iyashk-DB
Databricks Employee

Hello @GijsR, for “who has what” today, the most reliable alternatives are the system tables and Unity Catalog information schema views, which do include principals. You can use the information_schema to list the current grants the principals (GRANT...
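
For example, the table-level grants view already pairs each privilege with its principal, which is the part the permissionassignments endpoint leaves out:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Each row names a grantee alongside the privilege it holds.
spark.sql("""
    SELECT grantee, privilege_type, table_catalog, table_schema, table_name
    FROM system.information_schema.table_privileges
    ORDER BY grantee
""").show(truncate=False)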

2 More Replies
dbernstein_tp
New Contributor III
  • 272 Views
  • 3 replies
  • 2 kudos

Lakeflow Connect CDC error, broken links

I get this error, regarding database validation, when setting up a Lakeflow Connect CDC pipeline (see screenshot). The two links mentioned in the message are broken; they give me a "404 - Content Not Found" when I try to open them.

Latest Reply
dbernstein_tp
New Contributor III

@Advika Thank you. My reason for this post was to alert the SQL Server ingestion team to this bug in the interface. I will file a report about this (didn't know I could do that) and a few other issues with the feature that I've found recently.

2 More Replies
sk007
New Contributor
  • 1174 Views
  • 4 replies
  • 2 kudos

Resolved! Lakeflow Connect - Postgres connector

Hi, I was wondering what the ETA is for the LF Connector for PostgreSQL (even in public/private preview)?

Latest Reply
Louis_Frolio
Databricks Employee

Ask your workspace administrator if they disabled access to it. Louis  

3 More Replies
ashishCh
New Contributor II
  • 300 Views
  • 2 replies
  • 1 kudo

Facing CANNOT_OPEN_SOCKET error after job cluster fails to upscale to target nodes

This error pops up in my Databricks workflow 1 out of 10 times, and every time it occurs I see the below message in the event logs: "Compute upsize complete, but below target size. The current worker count is 1, out of a target of 3." And right after this my...

Latest Reply
iyashk-DB
Databricks Employee

@ashishCh  The [CANNOT_OPEN_SOCKET] failures stem from PySpark’s default, socket‑based data transfer path used when collecting rows back to Python (e.g., .collect(), .first(), .take()), where the local handshake to a JVM‑opened ephemeral port on 127....
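
A sketch of the pattern being described: the failing handshake only happens when rows are pulled back into the Python process, so keeping results on the cluster sidesteps it entirely (the target table is hypothetical).

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(1_000_000)

# Calls like these move rows to the Python driver over the local socket that
# [CANNOT_OPEN_SOCKET] complains about:
preview = df.limit(10).collect()

# Writing the result out keeps the data on the executors instead:
df.write.mode("overwrite").saveAsTable("main.tmp.range_output")  # hypothetical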

1 More Reply
maddan80
New Contributor II
  • 2529 Views
  • 5 replies
  • 3 kudos

Oracle Essbase connectivity

Team, I wanted to understand the best way of connecting to Oracle Essbase to ingest data into the delta lake

Latest Reply
hyaqoob
New Contributor II

I am currently working with Essbase 21c and I need to pull data from Databricks through a SQL query. I was able to successfully set up a JDBC connection to Databricks, but when I try to create a data source using a SQL query, it gives me an error: "[Data...
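
I'm not aware of a first-party Essbase connector for Spark, so a common fallback for the original question is to expose the cube through a relational or JDBC-accessible layer and ingest it generically into Delta. A sketch with placeholder URL, query, and credentials; it also assumes a suitable JDBC driver is installed on the cluster.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Generic JDBC read; every connection detail below is a placeholder.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//essbase-host:1521/SVC")
      .option("query", "SELECT * FROM exported_cube_view")
      .option("user", "ingest_user")
      .option("password", "***")
      .load())

df.write.mode("append").saveAsTable("main.finance.essbase_raw")  # hypothetical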

4 More Replies
Dimitry
Valued Contributor
  • 310 Views
  • 4 replies
  • 1 kudo

DataFrame from SQL query glitches when grouping - what is going on!?

I have a query with some grouping, and I'm using spark.sql to run it: skus = spark.sql('with cte as (select ... group by all) select *, ... from cte group by all'). It displays as the expected table. This table I want to split into batches for processing, ...

Latest Reply
Coffee77
Contributor III

Try to use this code, customized in the way you need: instead of using the monotonically_increasing_id function directly, use row_number over the previous result. This will ensure sequential "small" numbers. This was indeed the exact solution I used to sol...
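
A sketch of that fix; skus here is a stand-in for the poster's grouped query.

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()
skus = spark.range(10_000).toDF("sku_id")  # stand-in for the grouped query

# monotonically_increasing_id() is sparse (the partition id sits in the high
# bits), so use it only as an ordering key and let row_number() assign the
# dense, sequential ids the batching needs. The single-partition window is
# acceptable here because the goal is one global sequence.
w = Window.orderBy(F.monotonically_increasing_id())
skus = skus.withColumn("rn", F.row_number().over(w))

batch_size = 1000
first_batch = skus.filter(F.col("rn") <= batch_size)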

3 More Replies
Aviraldb
New Contributor
  • 918 Views
  • 3 replies
  • 0 kudos

Moving files from Volume to Workspace

Hello Team, I am trying to move some files from a volume to the workspace with: %sh databricks fs cp dbfs:/Volumes/workspace/default/delc/generated_scripts/*.py Workspace/Shared/Delc_Project/scripts/ I tried all ways, please help me to move them. @DataBricks @Louis_Frolio ...

Latest Reply
Prajapathy_NKR
Contributor

@Aviraldb please try the below way: %sh cp /dbfs/Volumes/workspace/default/delc/generated_scripts/*.py /Workspace/Shared/Delc_Project/scripts/ Hope it helps.
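
An equivalent that stays in Python, relying on both UC Volumes and Workspace files being FUSE-mounted as local paths on the driver (same paths as in the thread; whether the /dbfs/ prefix is needed depends on the runtime):

import glob
import os
import shutil

src_pattern = "/Volumes/workspace/default/delc/generated_scripts/*.py"
dst_dir = "/Workspace/Shared/Delc_Project/scripts/"

# Both sides are plain local paths on the driver, so a file copy works.
os.makedirs(dst_dir, exist_ok=True)
for src in glob.glob(src_pattern):
    shutil.copy(src, dst_dir)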

2 More Replies
Suheb
Contributor
  • 416 Views
  • 2 replies
  • 5 kudos

Resolved! What strategies have you found most effective for optimizing ETL pipelines built on the Databricks Lakehouse?

If you are building data pipelines in Databricks (where data is Extracted, Transformed, and Loaded), what tips, methods, or best practices do you use to make those pipelines run faster, cheaper, and more efficiently?

Latest Reply
bianca_unifeye
Contributor

When I think about optimising ETL on the Databricks Lakehouse, I split it into four layers: data layout, Spark/SQL design, platform configuration, and operational excellence. And above all: you are not building pipelines for yourself, you are building...

1 More Reply
