Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

AanchalSoni
by Databricks Partner
  • 1589 Views
  • 4 replies
  • 1 kudos

Resolved! Checkpoint Location Error

Hi! I'm facing an error related to the checkpoint whenever I try to display a DataFrame using Auto Loader in Databricks Free Edition. Please refer to the screenshot. To work around this, I have to delete the checkpoint folder and then execute the display or writ...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @AanchalSoni, I can’t see the full history of your notebook, so I’m not sure of the exact cause. But the behaviour strongly suggests that an earlier version of the stream used complete mode against the same checkpointLocation, and that configurati...

  • 1 kudos
3 More Replies
AanchalSoni
by Databricks Partner
  • 1393 Views
  • 2 replies
  • 2 kudos

Resolved! NULL rows getting inserted in delta table- Schema mismatch

I'm trying to add the _metadata column while reading a JSON file:
from pyspark.sql.functions import col
from pyspark.sql.types import StructType, StructField, LongType, TimestampType
df_accounts_read = spark.readStream.format("cloudFiles").\
    option("clo...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 2 kudos

Hi @AanchalSoni, Looking at the first snapshot, it appears the path in all three records points to the checkpoint location. The _metadata column isn’t the root cause here. The issue is that Autoloader is ingesting your checkpoint files as data. Becau...

  • 2 kudos
1 More Replies
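
The reply above attributes the NULL rows to Auto Loader picking up checkpoint files as input. A minimal sketch of a pre-flight check, assuming the cause is a checkpoint directory placed inside the source directory: the function names and the example paths are hypothetical, and the check compares plain paths only (it does not understand glob patterns in the source path).

```python
from pathlib import PurePosixPath


def is_nested_under(candidate: str, parent: str) -> bool:
    """Return True if `candidate` equals `parent` or lies somewhere beneath it."""
    cand = PurePosixPath(candidate)
    par = PurePosixPath(parent)
    # Comparing path components avoids the string-prefix pitfall where
    # ".../accounts_ckpt" would wrongly look nested under ".../accounts".
    return cand == par or par in cand.parents


def check_autoloader_paths(source_path: str, checkpoint_path: str) -> None:
    """Raise before starting the stream if the checkpoint sits inside the source."""
    if is_nested_under(checkpoint_path, source_path):
        raise ValueError(
            f"checkpoint {checkpoint_path!r} is inside source {source_path!r}; "
            "the stream may ingest its own checkpoint files as data"
        )


# Hypothetical paths, for illustration only.
check_autoloader_paths(
    source_path="/Volumes/demo/landing/accounts",
    checkpoint_path="/Volumes/demo/checkpoints/accounts",
)
```

Calling the check before `writeStream` starts keeps the failure early and explicit, rather than discovering it later as unexpected rows in the target table.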
IM_01
by Valued Contributor
  • 1360 Views
  • 9 replies
  • 3 kudos

Resolved! Partition cols for a temporary table in Lakeflow SDP

Hi, I was going through the documentation on quarantining records. Initially I thought that partitioning was not supported for temporary tables; however, I came across the following: @dp.table( temporary=True, partition_cols=["is_quarantined"], ) @dp.ex...

Latest Reply
IM_01
Valued Contributor
  • 3 kudos

Thanks Ashwin

  • 3 kudos
8 More Replies
DhivyaKeerthana
by New Contributor III
  • 942 Views
  • 4 replies
  • 0 kudos

Using spark.read.excel – dataAddress with only start cell is not working (DBR 17.x)

Hi, has anyone successfully used the Databricks Runtime 17.x native Excel reader with a dataAddress containing only a start cell (no end cell)? Even in the documentation, it is not specified (https://learn.microsoft.com/en-us/azure/databricks/query/f...

Latest Reply
DhivyaKeerthana
New Contributor III
  • 0 kudos

Thanks @SteveOstrowski for the response. Yes I am using OPTION 1: USE A LARGE BOUNDING RANGE as a workaround.

  • 0 kudos
3 More Replies
toast_2001
by New Contributor II
  • 740 Views
  • 2 replies
  • 1 kudos

Resolved! Non-existent schema on redeployment of DAB with external volumes.

Hi all, DAB issue. My setup: Running CLI v0.294 on Python 3.12.11. Deployment is in direct mode and uses standard serverless compute. External locations in ADLS ST Container (container per ext loc). I'm attempting to deploy a bundle according to the followi...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Hi @toast_2001, I did some digging and have a few helpful tips/tricks to assist your troubleshooting. So let me walk through what's likely happening and what to actually do about it. The error tells you that on the second deployment, DAB is trying to...

  • 1 kudos
1 More Replies
dbr_data_engg
by New Contributor III
  • 638 Views
  • 2 replies
  • 0 kudos

Resolved! Lakebridge Reconcile Config not supporting "sfAuthenticator" for Snowflake

User is getting the error below. Py4JJavaError: An error occurred while calling o508.load. : net.snowflake.client.jdbc.SnowflakeSQLException: Incorrect username or password was specified. https://github.com/databrickslabs/lakebridge/blob/201f9b7cae5ae45583...

Latest Reply
jameswood32
Contributor
  • 0 kudos

It seems the issue lies in the way Lakebridge is handling the Snowflake authentication, particularly with the sfAuthenticator parameter. While the plain Python code works, Lakebridge might not be correctly passing the necessary configuration for Snow...

  • 0 kudos
1 More Replies
gaurang033
by New Contributor II
  • 1697 Views
  • 2 replies
  • 2 kudos

How to access snapshots in Iceberg tables?

I have created an Iceberg table in Databricks and inserted a bunch of values into it. How do I list the snapshots and other metadata of the table? create table raw.landing.emp_ice(id int, name string) using iceberg The following doesn't work: https://iceber...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

Greetings @gaurang033, You're reading the Iceberg docs correctly. In a vanilla Iceberg-on-Spark setup, metadata tables like snapshots, history, and files are queryable like this: SELECT * FROM prod.db.table.snapshots; Your query follows that patte...

  • 2 kudos
1 More Replies
bricks_2026
by New Contributor III
  • 560 Views
  • 2 replies
  • 0 kudos

Resolved! Vacuum Command runs without any retention period even though the retention period was set

Hello, I am trying to test VACUUM in pytest. Command executed -> VACUUM unittest_mobi_edwhc_bul_replikation_001.t_bul_vacuum_experiment_1 RETAIN 0.05150017944444444 HOURS But the VACUUM command is deleting all files and not considering the provide...

Latest Reply
bricks_2026
New Contributor III
  • 0 kudos

Hi Ashwin, many thanks for your answer. I also tested it in local pytest yesterday. The minimum supported value is 1 hour.

  • 0 kudos
1 More Replies
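
The reply above reports that 1 hour was the smallest retention value that behaved as expected in the poster's tests. A minimal sketch of a test helper built on that observation: `build_vacuum_statement` is a hypothetical name, and rounding a fractional value up to whole hours (with a floor of 1) is an assumption drawn from the thread, not documented behaviour.

```python
import math


def build_vacuum_statement(table_name: str, retain_hours: float) -> str:
    """Build a VACUUM statement with a whole-hour retention of at least 1 hour."""
    if retain_hours < 0:
        raise ValueError("retain_hours must not be negative")
    # Assumption from the thread: fractional hours such as 0.0515 are not
    # honored, so round up and never go below 1 hour.
    whole_hours = max(1, math.ceil(retain_hours))
    return f"VACUUM {table_name} RETAIN {whole_hours} HOURS"


# The fractional value from the question becomes a 1-hour retention.
statement = build_vacuum_statement(
    "unittest_mobi_edwhc_bul_replikation_001.t_bul_vacuum_experiment_1",
    0.05150017944444444,
)
print(statement)
```

In a pytest suite the returned string would be passed to `spark.sql(...)`; building it separately keeps the rounding rule testable without a Spark session.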
monojmckvie
by New Contributor II
  • 5598 Views
  • 3 replies
  • 1 kudos

Delta Live Table - Cannot redefine dataset

Hi, I am new to Delta Live Tables. I am trying to create a Delta Live Table from the Databricks tutorial. I have created a notebook and attached an interactive cluster (DBR 14.3 LTS). I am running the code below. When I ran it for the 1st time it ran succes...

Data Engineering
Delta Live Table
dlt
Latest Reply
Malthe
Valued Contributor II
  • 1 kudos

@Walter_C there is no such "delete_table" API, see reference: https://learn.microsoft.com/en-us/azure/databricks/ldp/developer/python-ref#api-reference

  • 1 kudos
2 More Replies
der
by Valued Contributor
  • 1897 Views
  • 12 replies
  • 4 kudos

Databricks DBR 18.1 access workspace files error

import json
with open("/Workspace/Users/<USER>/config.json", "r") as f:
    config = json.load(f)
print(config)
Throws the following error:
OSError: [Errno 5] Input/output error: '/Workspace/Users/<USER>/config.json'
[Trace ID: 00-874e2bc3d747c3611c0c4a...

Latest Reply
Malthe
Valued Contributor II
  • 4 kudos

This happens to us on 18.0.3, but I haven't seen it on < 18.

  • 4 kudos
11 More Replies
Malthe
by Valued Contributor II
  • 1071 Views
  • 1 replies
  • 0 kudos

Resolved! Intermittent OSError: [Errno 5] Input/output error accessing workspace files from job

Every now and then, we get the following error in job cluster runs (when opening a workspace file): OSError: [Errno 5] Input/output error: '/Workspace/Users/<uuid>/.bundle/<path>' This happens even though the Python notebook (which is also in workspace...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @Malthe, Sounds similar to this. Have you raised a support ticket? I can help raise it internally if there is a support ticket. If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find t...

  • 0 kudos
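
Since the error in the post above is described as intermittent, one defensive pattern is to retry the read a few times before failing. This is a minimal sketch only, not a fix confirmed in the thread: `load_json_with_retry` and its parameters are hypothetical, and it retries only on `errno.EIO` (the "[Errno 5] Input/output error" from the post) while letting every other `OSError`, such as a missing file, surface immediately.

```python
import errno
import json
import time


def load_json_with_retry(path, attempts=3, delay_seconds=1.0):
    """Load a JSON file, retrying only when the read fails with EIO (errno 5)."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            with open(path, "r") as f:
                return json.load(f)
        except OSError as exc:
            # Anything other than an input/output error is not treated as transient.
            if exc.errno != errno.EIO:
                raise
            last_error = exc
            if attempt < attempts:
                # Linear backoff: wait a little longer after each failed attempt.
                time.sleep(delay_seconds * attempt)
    raise last_error
```

A retry only hides the symptom, so logging each failed attempt and still raising a support ticket, as the reply suggests, remains worthwhile.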
Adrianj
by New Contributor III
  • 27717 Views
  • 21 replies
  • 13 kudos

Databricks Bundles - How to select which jobs resources to deploy per target?

Hello, My team and I are experimenting with bundles, we follow the pattern of having one main file Databricks.yml and each job definition specified in a separate yaml for modularization. We wonder if it is possible to select from the main Databricks....

Latest Reply
IM_01
Valued Contributor
  • 13 kudos

One solution is to create a separate databricks.yml file for each target, such as qa/databricks.yml and prod/databricks.yml, where qa and prod are folders named after the target environments. Hope this helps.

  • 13 kudos
20 More Replies
Annie420
by Databricks Partner
  • 1244 Views
  • 3 replies
  • 2 kudos

Resolved! Workspace folder is visible but .py file cannot be read on job cluster (DBR 18)

Hi everyone, We are running into a strange issue when running notebooks on Databricks job clusters using DBR 18. It looks like the Workspace folder is mounted, but the .py file inside cannot be read immediately. I wanted to check if anyone else has ex...

Latest Reply
stbjelcevic
Databricks Employee
  • 2 kudos

+1 to @pradeep_singh  The Workspace FUSE (WSFS) daemons use ports 1015, 1017, and 1021 for communication between the driver and the executor. NFS tooling (hardcoded in glibc) can race with these ports during cluster startup, causing FUSE daemons to f...

  • 2 kudos
2 More Replies
DB1To3
by New Contributor III
  • 1621 Views
  • 6 replies
  • 5 kudos

Resolved! Apache "Spark Connect"

Can someone confirm if this is the right message board for discussing the open-source Apache core of "Spark Connect" (aka Databricks Connect)? We are hosting workloads on Azure Databricks, but would like to ensure that these workloads are following the...

Latest Reply
DB1To3
New Contributor III
  • 5 kudos

>> there is no native R UDF pathway over the wire. sparklyr works around this using rpy2, a Python library that embeds and executes R code
This is interesting. I would not think of Python as the best runtime for bridging. I'm wondering if this invol...

  • 5 kudos
5 More Replies
dpc
by Contributor III
  • 949 Views
  • 3 replies
  • 4 kudos

Resolved! Allowing a job parameter that is pushed down to be overridden

Hello, I have a job that calls a job, which calls a job (this could go on). I want to generate an ID for each job and log that ID along with a parent ID and job name. So, I am creating an ID at each level as the first task, then passing this to the next level...

Latest Reply
stbjelcevic
Databricks Employee
  • 4 kudos

+1 to @jooguilhermesc Option 1 response above.

  • 4 kudos
2 More Replies