Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Ramukamath1988
by New Contributor II
  • 49 Views
  • 3 replies
  • 0 kudos

Resolved! vacuum does not work as expected

delta.logRetentionDuration (default 30 days) is generally not set on any table in my workspace. Per the documentation, you can time travel within the log retention duration, provided delta.deletedFileRetentionDuration is also set to 30 days. Which ...

Latest Reply
Ramukamath1988
New Contributor II
  • 0 kudos

This is precisely my observation after vacuuming. I understand these two parameters, but it's not working as expected. Even after vacuuming (30-day retention) we can go back two months, and logs are retained for more than three months.

2 More Replies
chinmay0924
by New Contributor III
  • 157 Views
  • 4 replies
  • 0 kudos

mapInPandas returning an intermittent error related to data type interconversion

```
File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 346, in _create_array
    return pa.Array.from_pandas(
File "pyarrow/array.pxi", line 1126, in pyarrow.lib.Array.from_pandas
File "pyarrow/array.pxi", line 3...
```

Latest Reply
Raghavan93513
Databricks Employee
  • 0 kudos

Hi @chinmay0924, good day! Could you please confirm the following: Does the ID column incorrectly contain strings, which PyArrow fails to convert to integers (int64)? Is the data processed in both dataframes exactly the same? Additionally, could you pro...
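If the diagnosis above holds (declared schema expects an integer id while the pandas batch occasionally holds strings), coercing the column before yielding each batch avoids the intermittent Arrow conversion error. A minimal sketch with hypothetical column name and data:

```python
import pandas as pd

def normalize_ids(pdf: pd.DataFrame) -> pd.DataFrame:
    # If the Spark schema declares id as LongType but the batch holds strings,
    # pyarrow.lib.Array.from_pandas raises the error in the traceback above.
    # Coerce non-numeric values to NA and use a nullable integer dtype.
    pdf["id"] = pd.to_numeric(pdf["id"], errors="coerce").astype("Int64")
    return pdf

batch = pd.DataFrame({"id": ["1", "2", "oops"]})
cleaned = normalize_ids(batch)
# cleaned["id"] is now a nullable integer column: 1, 2, <NA>
```

Inside a mapInPandas function, the same cast would run per batch before the generator yields it.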

3 More Replies
oneill
by New Contributor II
  • 39 Views
  • 2 replies
  • 0 kudos

Resolved! SET a parameter in BEGIN END statement

Hello, how do you set a parameter in a BEGIN ... END statement? For example, the following query fails: BEGIN SET ansi_mode = true; END; with: Cannot resolve variable `ANSI_MODE` on search path `SYSTEM`.`SESSION`. SQLSTATE: 42883

Latest Reply
Vinay_M_R
Databricks Employee
  • 0 kudos

Hello @oneill. There is currently no supported workaround to dynamically change system/session parameters such as ansi_mode within a BEGIN ... END block in Databricks SQL procedures or scripts. Can you set these parameters before executing any proced...

1 More Replies
shrutikatyal
by New Contributor
  • 297 Views
  • 6 replies
  • 0 kudos

commit time is coming as null in autoloader

Per the new Databricks Auto Loader feature, we can use the archival and move capability. I am trying to use it on Databricks 16.4.x-scala2.12; however, commit time is still coming back null, as mentioned in the documen...

Latest Reply
mani_22
Databricks Employee
  • 0 kudos

@shrutikatyal I believe commit_time is only populated when the cloudFiles.cleanSource option is enabled. I don't see this option in your snippet. Could you please enable it for the read and check? Refer to the documentation below, ...
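A sketch of the read with that option enabled (option names per the Auto Loader docs; format, mode, and paths are hypothetical):

```python
# cleanSource must be enabled for commit_time to be populated; MOVE also
# requires a destination for the archived source files.
autoloader_options = {
    "cloudFiles.format": "json",
    "cloudFiles.cleanSource": "MOVE",  # or "DELETE" / "OFF"
    "cloudFiles.cleanSource.moveDestination": "/Volumes/main/landing/archive",
}

# On a cluster this would feed a streaming read, e.g.:
# df = (spark.readStream
#         .format("cloudFiles")
#         .options(**autoloader_options)
#         .load("/Volumes/main/landing/incoming"))
```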

5 More Replies
lezwon
by New Contributor III
  • 58 Views
  • 3 replies
  • 0 kudos

Unable to install custom wheel in serverless environment

Hey guys, I have created a custom wheel to hold my common code. Since I cannot install task libraries in a serverless environment, I am installing this library in multiple notebooks using %pip install. What I do is upload the library to a volume in...

Latest Reply
lezwon
New Contributor III
  • 0 kudos

Hello @Isi, I got around this by creating a requirements.txt file instead of a symlink. With the wheel path in there, I can just run pip install -r /path/to/requirements.txt and have it installed. I am now having another issue where the notebook give...

2 More Replies
rizkyjarr
by New Contributor
  • 62 Views
  • 2 replies
  • 0 kudos

"with open" not working in single user access mode cluster (no such file or directory found)

Hi fellow engineers, I was trying to read binary files (.jpg) in an ADLS2 mounted container, but when I try to read a file using "with open" I keep getting an error: No such file or directory. I've read something related to this matter on So...

(two screenshots attached)
Latest Reply
UmaMahesh1
Honored Contributor III
  • 0 kudos

Weird... I'm able to access it without any issues. If you are using Community Edition clusters, try copying the file to the driver node first and then reading it. A second option is below:
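One common cause is passing a Spark-style path to Python's open(), which only sees the mount through the local FUSE view. A tiny hypothetical helper illustrating the path mapping (helper name and example paths are mine, not from the thread):

```python
def to_fuse_path(path: str) -> str:
    """Map a dbfs:/ or /mnt path to the /dbfs FUSE view that open() expects.

    Spark APIs read 'dbfs:/mnt/...', while plain Python file I/O on the
    driver only sees the mount as '/dbfs/mnt/...'.
    """
    if path.startswith("dbfs:/"):
        path = path[len("dbfs:"):]
    if not path.startswith("/dbfs"):
        path = "/dbfs" + path
    return path

# On a cluster, reading the binary file would then look like:
# with open(to_fuse_path("dbfs:/mnt/raw/images/photo.jpg"), "rb") as f:
#     data = f.read()
```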

1 More Replies
Yuki
by New Contributor III
  • 78 Views
  • 2 replies
  • 1 kudos

Trouble connecting to Amazon S3 using Spark

I can't connect to Amazon S3. I'm referencing and following this document: https://docs.databricks.com/gcp/en/connect/storage/amazon-s3, but I still can't access S3. I believe the credentials are correct because I have verified that I can access ...

Latest Reply
Yuki
New Contributor III
  • 1 kudos

Hi Isi, thank you for your response, I really appreciate it. Apologies, I didn't explain my concern clearly. What I'm trying to confirm is whether the instance profile overrides the spark.conf settings defined in a notebook. For example, I want to a...

1 More Replies
trang_le
by Contributor
  • 959 Views
  • 1 replies
  • 0 kudos

Announcing a new portfolio of Generative AI learning offerings on Databricks Academy

Today we launched new Generative AI (including LLMs) learning offerings for everyone from technical and business leaders to data practitioners, such as Data Scientis...

Latest Reply
adb_newbie
New Contributor II
  • 0 kudos

Where can I find all the scripts/notebooks presented in the course "Large Language Models (LLMs): Application through Production"?

maarko
by New Contributor
  • 48 Views
  • 1 replies
  • 0 kudos

Inconsistent Decimal Comparison Behavior Between SQL Warehouse (Photon) and Spark Clusters

I'm seeing non-deterministic behavior when running the same query in SQL Warehouse (Photon) vs. interactive/job clusters (non-Photon), specifically involving a LEFT OUTER JOIN and a DECIMAL comparison in a WHERE clause. I have two views: View A cont...

Latest Reply
lingareddy_Alva
Honored Contributor II
  • 0 kudos

Hi @maarko. This is a fascinating issue that points to several potential causes related to differences between Photon and standard Spark execution engines, particularly around decimal handling and parallelism. Root causes: 1. Decimal precision and scale ...

amitpm
by New Contributor
  • 83 Views
  • 1 replies
  • 0 kudos

Lakeflow Connect - Column filtering

Hi community, I am interested in learning more about the feature mentioned at the recent summit about query pushdown in Lakeflow Connect for SQL Server. I believe this feature will allow selecting only the required columns from source tables. I...

Latest Reply
Isi
Contributor III
  • 0 kudos

Hey @amitpm, according to the documentation this feature is currently in Public Preview, so if your Databricks account has access to public preview features, you can reach out to support to enable it and start testing performance. Setup guide for Lake...

SenthilJ
by New Contributor III
  • 4262 Views
  • 2 replies
  • 1 kudos

Databricks Deep Clone

Hi, I am working on a DR design for Databricks in Azure. The recommendation from Databricks is to use Deep Clone to clone the Unity Catalog tables (within or across catalogs). My design is to ensure that DR is managed across different regions, i.e. pri...

Data Engineering
Disaster Recovery
Unity Catalog
Latest Reply
Isi
Contributor III
  • 1 kudos

Hi, in my opinion Databricks Deep Clone does not currently support cloning Unity Catalog tables natively across different metastores (each region having its own metastore). Deep Clone requires that both source and target belong to the same metastore ...

1 More Replies
manish24101981
by New Contributor
  • 11 Views
  • 0 replies
  • 0 kudos

DLT or DataBricks for CDC and NRT

We are currently delivering a large-scale healthcare data migration project involving:
  • One-time historical migration of approx. 80 TB of data, already completed and loaded into Delta Lake.
  • CDC merge logic already developed and validated using Apache...

arun_6482
by New Contributor
  • 280 Views
  • 1 replies
  • 0 kudos

NPIP_TUNNEL_SETUP_FAILURE

Hello Databricks team, I have configured Databricks in AWS, but while creating a cluster I am getting the error below. Could you please help fix this issue? Error: VM setup failed due to Ngrok setup timeout. Please check your network configuration and try again o...

Latest Reply
mani_22
Databricks Employee
  • 0 kudos

@arun_6482 The error you have shared suggests a network issue in your Databricks deployment within your AWS account. Please review the documentation below and ensure that all your routes and ports are configured correctly. Doc:...

andr3s
by New Contributor II
  • 36080 Views
  • 6 replies
  • 2 kudos

SSL_connect: certificate verify failed with Power BI

Hi, I'm getting this error with Power BI. Any ideas? Thanks in advance, Andres

(screenshot attached)
Latest Reply
benjaminpieplow
New Contributor
  • 2 kudos

We had a very similar issue. The full (redacted) error from Power BI: "Unable to update connection credentials. Unable to connect to the data source. Either the data source is inaccessible, a connection timeout occurred, or the data source credentia...

5 More Replies
kavithai
by New Contributor II
  • 143 Views
  • 3 replies
  • 2 kudos
Latest Reply
Isi
Contributor III
  • 2 kudos

Hey @kavithai. Sometimes there are limitations in the laws of each country regarding "sharing" data outside private clouds or regions, which make it impossible to transmit data outside your private networks. This is especially true for banks, which...

2 More Replies
