Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Ramukamath1988
by New Contributor II
  • 817 Views
  • 3 replies
  • 0 kudos

Resolved! vacuum does not work as expected

The delta.logRetentionDuration property (default 30 days) is generally not set on any table in my workspace. Per the documentation, you can time travel within the log retention window, provided delta.deletedFileRetentionDuration is also set to 30 days. Which ...

Latest Reply
Ramukamath1988
New Contributor II
  • 0 kudos

This is precisely my observation after vacuuming. I understand these two parameters, but it's not working as expected. Even after vacuuming (with a 30-day retention), we can still go back two months, and logs are retained for more than three months.

2 More Replies
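As a sketch of how the two retention properties interact (table name hypothetical): delta.deletedFileRetentionDuration controls which data files VACUUM may remove, while delta.logRetentionDuration controls how long transaction-log entries, and therefore time-travel metadata, are kept. Log files are also cleaned up lazily, when checkpoints are written, which can explain logs surviving well past the configured window:

```sql
-- Hypothetical table; set both properties so VACUUM and time travel agree.
ALTER TABLE my_catalog.my_schema.my_table SET TBLPROPERTIES (
  'delta.logRetentionDuration' = 'interval 30 days',
  'delta.deletedFileRetentionDuration' = 'interval 30 days'
);

-- Remove data files unreferenced for longer than 30 days (720 hours).
VACUUM my_catalog.my_schema.my_table RETAIN 720 HOURS;
```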
chinmay0924
by New Contributor III
  • 1052 Views
  • 4 replies
  • 0 kudos

mapInPandas returning an intermittent error related to data type interconversion

```
File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 346, in _create_array
    return pa.Array.from_pandas(
           ^^^^^^^^^^^^^^^^^^^^^
File "pyarrow/array.pxi", line 1126, in pyarrow.lib.Array.from_pandas
File "pyarrow/array.pxi", line 3...
```

Latest Reply
Raghavan93513
Databricks Employee
  • 0 kudos

Hi @chinmay0924, good day! Could you please confirm the following: Does the ID column incorrectly contain strings that PyArrow fails to convert to integers (int64)? Is the data processed in both dataframes exactly the same? Additionally, could you pro...

3 More Replies
oneill
by New Contributor II
  • 571 Views
  • 2 replies
  • 0 kudos

Resolved! SET a parameter in BEGIN END statement

Hello, how do I set a parameter in a BEGIN ... END statement? For example, the following query fails: BEGIN SET ansi_mode = true; END; with: Cannot resolve variable `ANSI_MODE` on search path `SYSTEM`.`SESSION`. SQLSTATE: 42883

Latest Reply
Vinay_M_R
Databricks Employee
  • 0 kudos

Hello @oneill  There is currently no supported workaround to dynamically change system/session parameters such as ansi_mode within a BEGIN ... END block in Databricks SQL procedures or scripts. Can you set these parameters before executing any proced...

1 More Replies
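A minimal sketch of the workaround suggested above: set the session parameter before the compound statement rather than inside it (illustrative only):

```sql
-- Session parameters are set outside the BEGIN ... END block...
SET ansi_mode = true;

-- ...while variables declared inside the block resolve normally.
BEGIN
  DECLARE msg STRING DEFAULT 'ansi mode was set before the block';
  SELECT msg;
END;
```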
Yuki
by Contributor
  • 513 Views
  • 2 replies
  • 1 kudos

Unable to connect to Amazon S3 using Spark

I can't connect to Amazon S3. I'm following this document: https://docs.databricks.com/gcp/en/connect/storage/amazon-s3, but I still can't access S3. I believe the credentials are correct, because I have verified that I can access ...

Latest Reply
Yuki
Contributor
  • 1 kudos

Hi Isi, thank you for your response, I really appreciate it. Apologies, I didn't explain my concern clearly. What I'm trying to confirm is whether the instance profile overrides the spark.conf settings defined in a notebook. For example, I want to a...

1 More Replies
trang_le
by Databricks Employee
  • 1346 Views
  • 1 replies
  • 0 kudos

Announcing a new portfolio of Generative AI learning offerings on Databricks Academy

Today, we launched new Generative AI learning offerings, including LLMs, for everyone from technical and business leaders to data practitioners, such as Data Scientis...

Latest Reply
adb_newbie
New Contributor III
  • 0 kudos

Where can i find all the scripts / notebooks presented in the course for "Large Language Models (LLMs): Application through Production" ?

maarko
by New Contributor II
  • 1070 Views
  • 1 replies
  • 0 kudos

Inconsistent Decimal Comparison Behavior Between SQL Warehouse (Photon) and Spark Clusters

I'm seeing non-deterministic behavior when running the same query in SQL Warehouse (Photon) vs. interactive/job clusters (non-Photon), specifically involving a LEFT OUTER JOIN and a DECIMAL comparison in a WHERE clause. I have two views: View A: cont...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 0 kudos

Hi @maarko This is a fascinating issue that points to several potential causes related to differences between Photon and standard Spark execution engines, particularly around decimal handling and parallelism. Root causes: 1. Decimal Precision and Scale ...

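One common mitigation, assuming the issue is implicit decimal precision/scale coercion differing between engines, is to cast both sides of the predicate to an explicit common type (view and column names below are illustrative):

```sql
SELECT a.id, a.amount
FROM view_a AS a
LEFT OUTER JOIN view_b AS b
  ON a.id = b.id
-- Pin both operands to the same precision and scale before comparing.
WHERE CAST(a.amount AS DECIMAL(38, 6)) = CAST(b.amount AS DECIMAL(38, 6));
```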
amitpm
by New Contributor
  • 503 Views
  • 1 replies
  • 0 kudos

Lakeflow Connect - Column filtering

Hi community, I am interested in learning more about the feature mentioned at the recent summit about query pushdown in Lakeflow Connect for SQL Server. I believe this feature will allow selecting only the required columns from source tables. I...

Latest Reply
Isi
Honored Contributor III
  • 0 kudos

Hey @amitpm According to the documentation, this feature is currently in Public Preview, so if your Databricks account has access to public preview features, you can reach out to support to enable it and start testing performance. Setup guide for Lake...

SenthilJ
by New Contributor III
  • 5179 Views
  • 2 replies
  • 1 kudos

Databricks Deep Clone

Hi, I am working on a DR design for Databricks in Azure. The recommendation from Databricks is to use Deep Clone to clone the Unity Catalog tables (within or across catalogs). My design is to ensure that DR is managed across different regions, i.e. pri...

Data Engineering
Disaster Recovery
Unity Catalog
Latest Reply
Isi
Honored Contributor III
  • 1 kudos

Hi, in my opinion, Databricks Deep Clone does not currently support cloning Unity Catalog tables natively across different metastores (each region having its own metastore). Deep Clone requires that both source and target belong to the same metastore ...

1 More Replies
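Within a single metastore, the Deep Clone pattern the thread discusses can be sketched as follows (catalog, schema, and table names are hypothetical); re-running the clone statement copies only the incremental changes:

```sql
-- Initial clone into the DR catalog (same metastore as the source).
CREATE TABLE IF NOT EXISTS dr_catalog.sales.orders
  DEEP CLONE prod_catalog.sales.orders;

-- Subsequent runs refresh the clone incrementally.
CREATE OR REPLACE TABLE dr_catalog.sales.orders
  DEEP CLONE prod_catalog.sales.orders;
```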
arun_6482
by New Contributor
  • 2314 Views
  • 1 replies
  • 0 kudos

NPIP_TUNNEL_SETUP_FAILURE

Hello Databricks team, I have configured Databricks in AWS, but I get the error below while creating a cluster. Could you please help fix this issue? Error: VM setup failed due to Ngrok setup timeout. Please check your network configuration and try again o...

Latest Reply
mani_22
Databricks Employee
  • 0 kudos

@arun_6482 The error you have shared suggests that there is a network issue in your Databricks deployment within your AWS account. Please review the documentation provided below and ensure that all your routes and ports are configured correctly. Doc:...

kavithai
by New Contributor II
  • 944 Views
  • 3 replies
  • 2 kudos
Latest Reply
Isi
Honored Contributor III
  • 2 kudos

Hey @kavithai Sometimes there are limitations in the laws of each country regarding "sharing" data outside private clouds or regions, which make it impossible to transmit data outside of your private networks. This is especially true for banks, which...

2 More Replies
noorbasha534
by Valued Contributor II
  • 480 Views
  • 1 replies
  • 0 kudos

Global INIT script on sql warehouse

Dear all, is it possible to configure a global init script on a SQL warehouse? If not, how can I achieve the requirement below? For example, this script will have two key-value pairs defined: src_catalog_name=ABC, tgt_catalog_name=DEF. I want these two to be ref...

Latest Reply
Isi
Honored Contributor III
  • 0 kudos

Hello @noorbasha534 Unfortunately, in SQL warehouses you can't attach an init script that automatically runs when the warehouse starts (similar to what you can do with clusters). However, there are a few alternatives you can consider: Session Variable...

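The session-variable alternative might look like this sketch, reusing the key-value pairs from the question (the referenced table is hypothetical):

```sql
-- Declare once per session; there is no warehouse-level init hook.
DECLARE OR REPLACE VARIABLE src_catalog_name STRING DEFAULT 'ABC';
DECLARE OR REPLACE VARIABLE tgt_catalog_name STRING DEFAULT 'DEF';

-- Reference the variables in later statements via IDENTIFIER().
SELECT * FROM IDENTIFIER(src_catalog_name || '.my_schema.my_table');
```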
AliviaB
by New Contributor
  • 445 Views
  • 1 replies
  • 0 kudos

Authorization Issue while creating first Unity catalog table

Hi all, we are setting up our new UC-enabled Databricks workspace. We have completed the metastore setup for our workspace and created a new catalog and schema, but while creating a table we are getting an authorization issue. Below is the table s...

Latest Reply
cgrant
Databricks Employee
  • 0 kudos

Are there locations specified for the catalog/table/schema? Or do you keep these at defaults?  Also, do you have a storage credential and external location set for mystorageaccount/mycontainer?

mai_luca
by New Contributor III
  • 2431 Views
  • 1 replies
  • 0 kudos

Resolved! Understanding dropDuplicates in Delta Live Tables (DLT) with Photon

Hi everyone, I've been working with Delta Live Tables (DLT) in Databricks, and I'm particularly interested in understanding how the dropDuplicates function works when using the Photon engine. Photon is known for its columnar data processing capabiliti...

plan.png
Latest Reply
cgrant
Databricks Employee
  • 0 kudos

FIRST() never stitches together values from different rows. When Photon executes dropDuplicates, it deterministically chooses one complete row for each set of duplicate keys and returns every column from that same row. If you ever encounter a result w...

surajitDE
by New Contributor III
  • 824 Views
  • 2 replies
  • 1 kudos

Resolved! How to Enable Sub-300 Millisecond Real-Time Mode in Delta Live Tables (DLT)

Hi folks, during the recent Data + AI Summit, there was a mention of a new real-time streaming mode in Delta Live Tables (DLT) that enables sub-300 millisecond latency. This sounds really promising! Could someone please guide me on: How do we enable thi...

Latest Reply
cgrant
Databricks Employee
  • 1 kudos

Real-time mode, right now, is in private preview. Reach out to your account team for enablement. It's separate from pipelines.trigger.interval. The engine is the same, just a different mode within it.

1 More Replies
pargit2
by New Contributor II
  • 947 Views
  • 5 replies
  • 0 kudos

dlt vs delta table

Hi, I'm building the gold and silver layers; in bronze I ingest using Auto Loader. The data is updated once a month. Should I save the df in silver notebooks using a Delta Live Table or a Delta table? In the past I simply used: df.write.save("s3.."...

Latest Reply
nayan_wylde
Esteemed Contributor
  • 0 kudos

I would say that if the data is not complex and you are not handling any DQ checks in the pipeline, then go for a regular Databricks workflow and save it as a Delta table, since you are refreshing the data every month and it is not a streaming workload.

4 More Replies
