Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

sachin_kanchan
by New Contributor III
  • 2955 Views
  • 2 replies
  • 0 kudos

Community Edition? More Like Community Abandonment - Thanks for NOTHING, Databricks!

To the Databricks Team (or whoever is pretending to care), let me get this straight. You offer a "Community Edition" to supposedly help people learn, right? Well, congratulations, you've created the most frustrating, useless signup process I've ever s...

Latest Reply
Advika_
Databricks Employee
  • 0 kudos

Hello @sachin_kanchan! I understand the frustration, and I appreciate you sharing your experience. The Community Edition is meant to provide a smooth experience, and this shouldn’t be happening. We usually ask users to drop an email to help@databrick...

1 More Replies
mstfkmlbsbdk
by New Contributor II
  • 5530 Views
  • 1 reply
  • 1 kudos

Resolved! Access ADLS with serverless. CONFIG_NOT_AVAILABLE error

I have my own Autoloader repo, and this repo is responsible for ingesting data from the landing layer (ADLS) and loading it into the raw layer in Databricks. In that repo, I created a couple of workflows and run these workflows with a serverless cluster. And I u...

Data Engineering
ADLS
autoloader
dbt
NCC
serverless cluster
Latest Reply
cgrant
Databricks Employee
  • 1 kudos

The recommended approach for accessing cloud storage is to create Databricks storage credentials. These storage credentials can refer to Entra service principals, managed identities, etc. After a credential is created, create an external location. Wh...

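The reply above can be sketched in SQL. This is only an illustration: all identifiers are hypothetical, and it assumes a storage credential (here called `adls_mi_cred`) has already been created, for example from a managed identity or Entra service principal in Catalog Explorer.

```sql
-- Hypothetical names; adls_mi_cred is an existing Unity Catalog storage credential
CREATE EXTERNAL LOCATION IF NOT EXISTS landing_layer
  URL 'abfss://landing@<storage-account>.dfs.core.windows.net/'
  WITH (STORAGE CREDENTIAL adls_mi_cred);

-- Grant the principals that run the serverless workflows access to the path
GRANT READ FILES ON EXTERNAL LOCATION landing_layer TO `data-engineers`;
```

Once the external location covers the ADLS path, serverless compute resolves access through Unity Catalog instead of cluster-level Spark configs, which is what the CONFIG_NOT_AVAILABLE error typically points at.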
Sjoshi
by New Contributor
  • 2171 Views
  • 2 replies
  • 1 kudos

How to make the write operation faster for writing a spark dataframe to a delta table

So, I am doing 4 spatial join operations on files with the following sizes: Base_road_file, which is 1 gigabyte; the Telematics file, which is 1.2 gigs; and the state boundary file, BH road file, client_geofence file and kpmg_geofence_file, which are not too large. My...

Latest Reply
cgrant
Databricks Employee
  • 1 kudos

We recommend using spatial frameworks like Databricks Mosaic or Apache Sedona to speed up operations such as spatial joins and point-in-polygon lookups. Without these frameworks, many of these operations result in unoptimized, explosive cross joins.

1 More Replies
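As a rough illustration of the reply above, a point-in-polygon join expressed with Sedona's SQL functions might look like the sketch below. Table and column names are invented, and it assumes Apache Sedona is installed and its SQL functions are registered on the cluster.

```sql
-- Hypothetical tables: telematics has a point column, geofences a polygon column
SELECT t.vehicle_id,
       z.zone_id
FROM telematics t
JOIN geofences z
  ON ST_Contains(z.geom, t.geom)  -- Sedona can plan this as a spatial join
```

With a spatial framework, a predicate like `ST_Contains` can use spatial partitioning/indexing rather than degenerating into a full cross join followed by a filter.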
lprevost
by Contributor III
  • 1903 Views
  • 2 replies
  • 1 kudos

Resolved! Autoloader streaming table - how to determine if new rows were updated from query?

If I'm running a scheduled batch Autoloader query which reads from CSV files on S3 and incrementally loads a Delta table, how can I determine if new rows were added? I'm currently trying to do this from the streaming query.lastProgress as follows. s...

Latest Reply
lprevost
Contributor III
  • 1 kudos

Thank you!

1 More Replies
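The `query.lastProgress` approach discussed in this thread can be sketched as a small helper. Field names follow Structured Streaming's StreamingQueryProgress JSON: `numInputRows` at the top level, and `numOutputRows` under `sink` for Delta sinks (reported as -1 by sinks that don't track it). The helper name is made up for illustration.

```python
from typing import Optional

def rows_in_last_batch(progress: Optional[dict]) -> int:
    """Rows handled by the most recent micro-batch; 0 if no batch has run yet."""
    if not progress:
        return 0
    sink = progress.get("sink") or {}
    n = sink.get("numOutputRows", -1)  # Delta sinks report rows written; -1 = unknown
    return n if n >= 0 else progress.get("numInputRows", 0)

# usage on Databricks (assuming `query` is the StreamingQuery handle):
# new_rows = rows_in_last_batch(query.lastProgress)
```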
aladda
by Databricks Employee
  • 5788 Views
  • 2 replies
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

Here's the difference between a View and a Table in the context of a Delta Live Tables pipeline: Views are similar to a temporary view in SQL and are an alias for some computation. A view allows you to break a complicated query into smaller or easier-to-understan...

1 More Replies
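The view-vs-table distinction described in the reply above can be sketched with the DLT Python decorators. Dataset names here are invented, and `spark` is assumed to be the ambient SparkSession available in a DLT notebook.

```python
import dlt
from pyspark.sql import functions as F

@dlt.view   # alias for a computation: re-derived when referenced, nothing persisted
def cleaned_orders():
    return spark.read.table("raw_orders").where(F.col("order_id").isNotNull())

@dlt.table  # materialized: results are written out to storage as a Delta table
def daily_orders():
    return dlt.read("cleaned_orders").groupBy("order_date").count()
```

The view keeps the intermediate cleanup step readable without paying storage for it; the table materializes the aggregate that downstream consumers query.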
BillBishop
by New Contributor III
  • 957 Views
  • 1 reply
  • 0 kudos

Resolved! Using initcap function in materialized view fails

This query works: select order_date, initcap(customer_name), count(*) AS number_of_orders from ... The initcap does as advertised and capitalizes the customer_name column. However, if I wrap the same exact select in a create materialized view, I get an...

Latest Reply
BillBishop
New Contributor III
  • 0 kudos

NOTE: I got it to work by aliasing the customer_name column; it's documented here: https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-ddl-create-materialized-view#limitations However, it wasn't clear that "Non-column reference expre...

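The aliasing workaround mentioned in the reply can be sketched as below. Table and column names are illustrative; the point is that the expression `initcap(customer_name)` gets an explicit column alias inside the materialized view definition.

```sql
-- Illustrative names; the AS alias on the expression is the fix
CREATE MATERIALIZED VIEW orders_by_customer AS
SELECT
  order_date,
  initcap(customer_name) AS customer_name,
  count(*) AS number_of_orders
FROM orders
GROUP BY order_date, initcap(customer_name);
```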
devpdi
by New Contributor
  • 3698 Views
  • 3 replies
  • 0 kudos

Re-use jobs as tasks with the same cluster.

Hello, I am facing an issue with my workflow. I have a job (call it the main job) that, among others, runs 5 concurrent tasks, which are defined as jobs (not notebooks). Each of these jobs is identical to the others (call them sub-job-1), with the only diff...

Latest Reply
razi9126
New Contributor II
  • 0 kudos

Did you find any solution?

2 More Replies
diguid
by New Contributor III
  • 5919 Views
  • 3 replies
  • 13 kudos

Using foreachBatch within Delta Live Tables framework

Hey there! I was wondering if there's any way of declaring a Delta Live Table where we use foreachBatch to process the output of a streaming query. Here's a simplification of my code: def join_data(df_1, df_2): df_joined = ( df_1 ...

Latest Reply
cgrant
Databricks Employee
  • 13 kudos

foreachBatch support in DLT is coming soon, and you now have the ability to write to non-DLT sinks as well

2 More Replies
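Outside of DLT, the same pattern is plain Structured Streaming `foreachBatch`, sketched below. Table names and the checkpoint path are placeholders; this is only an illustration of the shape of the code, not the DLT feature the reply refers to.

```python
# Merge each micro-batch into a Delta table (illustrative names)
def upsert_batch(batch_df, batch_id):
    batch_df.createOrReplaceTempView("updates")
    batch_df.sparkSession.sql("""
        MERGE INTO target t
        USING updates s ON t.id = s.id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)

(spark.readStream.table("source")
    .writeStream
    .foreachBatch(upsert_batch)
    .option("checkpointLocation", "/tmp/_chk")  # placeholder path
    .start())
```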
shan-databricks
by Databricks Partner
  • 4716 Views
  • 1 reply
  • 0 kudos

LEGACY_ERROR_TEMP_DELTA_0007 A schema mismatch detected when writing to the Delta table.

Need help to resolve this issue. Error: com.databricks.sql.transaction.tahoe.DeltaAnalysisException: [_LEGACY_ERROR_TEMP_DELTA_0007] A schema mismatch detected when writing to the Delta table. I am using the below code and my JSON is dynamically changi...

Latest Reply
cgrant
Databricks Employee
  • 0 kudos

For datasets with constantly changing schemas, we recommend using the Variant type.

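The VARIANT suggestion above can be sketched as follows. Table and column names are illustrative, and VARIANT requires a recent runtime (DBR 15.3+); fields are extracted at query time, so schema drift in the JSON no longer breaks the write.

```sql
-- Land the raw JSON as VARIANT instead of a rigid struct schema
CREATE TABLE raw_events (payload VARIANT);

INSERT INTO raw_events
SELECT parse_json(json_str) FROM staged_events;

-- Extract fields lazily with path syntax and casts
SELECT payload:device.id::string AS device_id
FROM raw_events;
```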
thackman
by Databricks Partner
  • 1977 Views
  • 1 reply
  • 0 kudos

Inconsistent handling of null structs vs structs with all null values.

Summary: We have a weird behavior with structs that we have been trying (unsuccessfully) to track down. We have a struct column in a silver table that should only have data for 1 in every 500 records. It's normally null. But for about 1 in every 50 r...

Latest Reply
cgrant
Databricks Employee
  • 0 kudos

Here are some strategies for debugging this: Before you perform each merge, write your source DataFrame out as a table, and include the target table's version in the table's name. If possible, enable the change data feed on your table so as to see chan...

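The change-data-feed step from the reply can be sketched in SQL. The table name and version numbers are placeholders; `table_changes` takes a start and end table version and exposes a `_change_type` column for inspecting exactly what a merge did.

```sql
-- Illustrative table name and versions
ALTER TABLE silver_table
SET TBLPROPERTIES (delta.enableChangeDataFeed = true);

-- After a merge, inspect which rows changed between two versions
SELECT *
FROM table_changes('silver_table', 101, 102)
WHERE _change_type IN ('update_preimage', 'update_postimage');
```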
sachamourier
by Contributor
  • 2173 Views
  • 1 reply
  • 0 kudos

Install Python libraries on Databricks job cluster

Hello, I am trying to install a wheel file and a requirements.txt file from my Unity Catalog Volumes on my Databricks job cluster using an init script, but the results are very inconsistent. Has anyone ever faced this? What's wrong with my approa...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @sachamourier, could you please clarify what the inconsistency is? Are some packages missing, or was an incorrect library loaded?

LeenB
by New Contributor
  • 742 Views
  • 1 reply
  • 0 kudos

Running a notebook as 'Run all below' when scheduled via Azure Data Factory

We have a notebook with a lot of subsequent cells that can run independently of each other. When we execute the notebook manually via 'Run all', the run stops when an error is thrown. When we execute manually via 'Run all below', the run proceeds ti...

Latest Reply
PiotrMi
Contributor
  • 0 kudos

Hi @LeenB, you can wrap each cell's execution in a try/except block. Example below:

try:
    print("Hello world")
    # your code for each cell
except Exception as e:
    print("Issue with printing hello world")

For sure it is not recommended ...

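The per-cell try/except idea from the reply can be sketched in pure Python: run each independent step, collect failures instead of stopping, then surface an error at the end so a scheduled run (e.g. from Data Factory) is still marked failed. The helper and step names are invented for illustration.

```python
def run_steps(steps):
    """Run (name, callable) steps in order; collect failures instead of stopping."""
    errors = {}
    for name, fn in steps:
        try:
            fn()
        except Exception as exc:  # deliberately broad: mimic "Run all below"
            errors[name] = str(exc)
    return errors

errors = run_steps([
    ("load", lambda: None),
    ("transform", lambda: 1 / 0),  # this step fails...
    ("report", lambda: None),      # ...but this one still runs
])
if errors:
    print(f"steps failed: {errors}")  # raise here instead to fail the scheduled run
```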
data_mifflin
by New Contributor III
  • 1777 Views
  • 6 replies
  • 1 kudos

Accessing Job parameters using cluster v15.4

After upgrading the Databricks cluster to version 15.4, is there any way to access job parameters in a notebook other than the following? dbutils.widgets.get("parameter_name"). In v15.4, dbutils.notebook.entry_point.getCurrentBindings() has been discontinued...

Latest Reply
Pawan1979
New Contributor II
  • 1 kudos

For me it is working at 15.4 LTS (includes Apache Spark 3.5.0, Scala 2.12)

5 More Replies
JW_99
by New Contributor II
  • 1519 Views
  • 2 replies
  • 2 kudos

PySparkRuntimeError: [CONTEXT_ONLY_VALID_ON_DRIVER]

I've troubleshot this like 20+ times. I am aware that the current code is causing the spark session to be passed to the workers, where it should only be applied to the driver. Can someone please help me resolve this (the schema is defined earlier)?--...

Latest Reply
narasimha_reddy
New Contributor II
  • 2 kudos

You cannot use the Spark session explicitly inside executor logic. Here you are using mapPartitions, which makes the custom logic execute inside executor threads. Either you need to change the whole problem approach to segregate Spark variable usag...

1 More Replies
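The usual fix for this error, in the spirit of the reply above, is to do all SparkSession work on the driver and ship plain Python data to the executors, e.g. via a broadcast variable. The sketch below uses invented table and column names.

```python
# Driver side only: materialize the lookup with the SparkSession
lookup = {row["k"]: row["v"] for row in spark.table("dim").collect()}
b_lookup = spark.sparkContext.broadcast(lookup)

def enrich(rows):
    # Executor side: only plain Python objects here, no SparkSession access
    for r in rows:
        yield (r["id"], b_lookup.value.get(r["key"]))

result = df.rdd.mapPartitions(enrich)
```

Alternatively, express the lookup as a DataFrame join and let Spark distribute it, which avoids the driver/executor split entirely.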
adhi_databricks
by Contributor
  • 4770 Views
  • 4 replies
  • 0 kudos

Connect snowflake to Databricks

Hey folks, I just want to know if there is a way to mirror Snowflake tables in Databricks, meaning creating a table using format snowflake and giving the table options (host, user, pwd and dbtable in Snowflake). I just tried it as per this code bel...

Latest Reply
adhi_databricks
Contributor
  • 0 kudos

Hi @Alberto_Umana, just a quick question: would we be able to change table properties, like adding column details, column tagging, and column-level masking, on the Snowflake tables that are under the foreign catalog created?

3 More Replies
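The foreign-catalog setup discussed in this thread (Lakehouse Federation) can be sketched as below. All identifiers are placeholders, and it assumes the Snowflake password is stored in a Databricks secret scope.

```sql
-- Placeholders throughout; password comes from a secret scope
CREATE CONNECTION snowflake_conn TYPE snowflake
OPTIONS (
  host '<account>.snowflakecomputing.com',
  port '443',
  sfWarehouse '<warehouse>',
  user '<user>',
  password secret('snowflake_scope', 'password')
);

CREATE FOREIGN CATALOG snowflake_cat
USING CONNECTION snowflake_conn
OPTIONS (database '<SNOWFLAKE_DB>');
```

Tables under the foreign catalog are read-through views of Snowflake objects, which is why governance features that rewrite table metadata behave differently than on native Unity Catalog tables.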