Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

diguid
by New Contributor III
  • 5913 Views
  • 3 replies
  • 13 kudos

Using foreachBatch within Delta Live Tables framework

Hey there!​I was wondering if there's any way of declaring a delta live table where we use foreachBatch to process the output of a streaming query.​Here's a simplification of my code:​def join_data(df_1, df_2): df_joined = ( df_1 ...

Latest Reply
cgrant
Databricks Employee
  • 13 kudos

foreachBatch support in DLT is coming soon, and you now have the ability to write to non-DLT sinks as well

2 More Replies
shan-databricks
by Databricks Partner
  • 4713 Views
  • 1 reply
  • 0 kudos

LEGACY_ERROR_TEMP_DELTA_0007 A schema mismatch detected when writing to the Delta table.

Need help to resolve the issue Error : com.databricks.sql.transaction.tahoe.DeltaAnalysisException: [_LEGACY_ERROR_TEMP_DELTA_0007] A schema mismatch detected when writing to the Delta table.I am using the below code and my JSON is dynamically changi...

Latest Reply
cgrant
Databricks Employee
  • 0 kudos

For datasets with constantly changing schemas, we recommend using the Variant type.
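A minimal sketch of what the Variant approach could look like. The table and column names are placeholders, and VARIANT with `parse_json()` assumes a recent Databricks Runtime; verify availability in your workspace before relying on it.

```python
# Hypothetical sketch: land schema-drifting JSON strings into a VARIANT column.
def variant_load_sql(staging_table: str, target_table: str) -> str:
    """Build the SQL that ingests free-form JSON into a VARIANT column.

    Table names are placeholders; VARIANT and parse_json() are assumed to be
    available on the runtime in use.
    """
    return (
        f"CREATE TABLE IF NOT EXISTS {target_table} (raw VARIANT);\n"
        f"INSERT INTO {target_table} "
        f"SELECT parse_json(json_str) FROM {staging_table};"
    )

# In a notebook you would execute each statement, e.g.:
# for stmt in variant_load_sql("staging_events", "bronze_events").split(";\n"):
#     if stmt.strip():
#         spark.sql(stmt)
```

Because the raw document lands as a single VARIANT value, new JSON fields no longer trigger a schema mismatch on write; they are extracted at query time instead.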

thackman
by Databricks Partner
  • 1965 Views
  • 1 reply
  • 0 kudos

Inconsistent handling of null structs vs structs with all null values.

Summary:We have a weird behavior with structs that we have been trying (unsuccessfully) to track down.  We have a struct column in a silver table that should only have data for 1 in every 500 records. It's normally null. But for about 1 in every 50 r...

Latest Reply
cgrant
Databricks Employee
  • 0 kudos

Here are some strategies for debugging this:
  • Before you perform each merge, write your source dataframe out as a table, and include the target table's version in the table's name.
  • If possible, enable the change data feed on your table so as to see chan...
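The first suggestion above can be sketched as a small helper. The table names are placeholders, and the Spark-dependent lines are shown as comments since they only run on a cluster:

```python
def snapshot_table_name(source_table: str, target_version: int) -> str:
    """Name for a pre-merge copy of the source, tagged with the target's version."""
    return f"{source_table}_pre_merge_v{target_version}"

# In a notebook, before each merge (names are hypothetical):
# version = (spark.sql("DESCRIBE HISTORY target_table LIMIT 1")
#                 .collect()[0]["version"])
# source_df.write.saveAsTable(snapshot_table_name("source_table", version))
#
# Enabling the change data feed (the second suggestion) is a one-off:
# spark.sql("ALTER TABLE target_table "
#           "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)")
```

With the version embedded in the snapshot name, each saved source can later be lined up against `table_changes()` output for the matching target version.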

sachamourier
by Contributor
  • 2169 Views
  • 1 reply
  • 0 kudos

Install Python libraries on Databricks job cluster

Hello,I am trying to install some wheel file and requirements.txt file from my Unity Catalog Volumes on my Databricks job cluster using an init script but the results are very inconsistent.Does anyone have ever faced that ?What's wrong with my approa...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @sachamourier, could you please clarify what the inconsistency is? Are some packages missing, or was an incorrect library loaded?

LeenB
by New Contributor
  • 738 Views
  • 1 reply
  • 0 kudos

Running a notebook as 'Run all below' when scheduled via Azure Data Factory

We have a notebook with a lot of subsequent cells that can run independent from each other. When we execute the notebook manually via 'Run all', the runs stops when an error is thrown. When we execute manually via 'Run all below', the run proceeds ti...

Latest Reply
PiotrMi
Contributor
  • 0 kudos

Hi @LeenB, you can wrap each cell's code in a try/except block so that a failure does not stop the run. Example below:

try:
    print("Hello world")
    # your code for each cell
except Exception as e:
    print("Issue with printing hello world")

For sure it is not recommended ...
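The per-cell try/except idea above can be generalized into a small runner that mimics 'Run all below' by continuing past failing cells. The cell functions here are stand-ins for real notebook cells:

```python
def run_cells(cells):
    """Run each zero-arg 'cell' in order, collecting errors instead of stopping.

    `cells` is a list of (name, callable) pairs; returns a dict of failures.
    """
    errors = {}
    for name, cell in cells:
        try:
            cell()
        except Exception as exc:  # keep going, remember what failed
            errors[name] = exc
    return errors

# Stand-in cells for illustration:
def cell_a():
    print("hello")

def cell_b():
    raise ValueError("boom")

def cell_c():
    print("still runs after the failure")
```

Running `run_cells([("a", cell_a), ("b", cell_b), ("c", cell_c)])` executes all three cells and returns only `b` in the error dict, which the job can then inspect or re-raise at the end.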

data_mifflin
by New Contributor III
  • 1764 Views
  • 6 replies
  • 1 kudos

Accessing Job parameters using cluster v15.4

After upgrading databricks cluster to version 15.4, is there any way to access job parameters in notebook except the following way ?dbutils.widgets.get("parameter_name")In v15.4, dbutils.notebook.entry_point.getCurrentBindings() has been discontinued...

Latest Reply
Pawan1979
New Contributor II
  • 1 kudos

For me it is working at 15.4 LTS (includes Apache Spark 3.5.0, Scala 2.12)

5 More Replies
JW_99
by New Contributor II
  • 1496 Views
  • 2 replies
  • 2 kudos

PySparkRuntimeError: [CONTEXT_ONLY_VALID_ON_DRIVER]

I've troubleshot this like 20+ times. I am aware that the current code is causing the spark session to be passed to the workers, where it should only be applied to the driver. Can someone please help me resolve this (the schema is defined earlier)?--...

Latest Reply
narasimha_reddy
New Contributor II
  • 2 kudos

You cannot use the Spark session explicitly inside executor logic. Here you are calling mapPartitions, which makes the custom logic execute inside an executor thread. Either you need to change the whole approach to segregate Spark variable usag...
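The segregation described above amounts to keeping the per-partition code free of the SparkSession: do the row-level work with a plain function, and touch `spark` only on the driver. A sketch with a hypothetical transformation (the pure function is what `mapPartitions` ships to executors):

```python
def transform_partition(rows):
    """Pure per-partition logic: no SparkSession, no driver-only objects.

    Takes an iterator of row-like dicts and yields transformed dicts, so it is
    safe to serialize to executors.
    """
    for row in rows:
        yield {"id": row["id"], "doubled": row["value"] * 2}

# Driver-side usage in a notebook (`spark` exists only here; schema is assumed):
# rdd = (spark.createDataFrame([{"id": 1, "value": 10}])
#             .rdd.map(lambda r: r.asDict())
#             .mapPartitions(transform_partition))
# spark.createDataFrame(rdd).show()
```

Because `transform_partition` closes over nothing but its arguments, pickling it for the executors cannot drag the SparkSession along, which is what raises CONTEXT_ONLY_VALID_ON_DRIVER.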

1 More Replies
adhi_databricks
by Contributor
  • 4705 Views
  • 4 replies
  • 0 kudos

Connect snowflake to Databricks

Hey Folks,I just want to know if there is a way to mirror the Snowflake tables in Databricks , Meaning creating a table using format snowflake and give in options of table (host,user,pwd and dbtable in snowflake). I just tried it as per this code bel...

Latest Reply
adhi_databricks
Contributor
  • 0 kudos

Hi @Alberto_Umana , Just a QQ would we be able to change table properties like adding column details, column tagging and Column level masking on the snowflake tables that are under the foreign catalog created?

3 More Replies
nikhilkumawat
by Databricks Partner
  • 21282 Views
  • 11 replies
  • 15 kudos

Resolved! Get file information while using "Trigger jobs when new files arrive" https://docs.databricks.com/workflows/jobs/file-arrival-triggers.html

I am currently trying to use this feature of "Trigger jobs when new file arrive" in one of my project. I have an s3 bucket in which files are arriving on random days. So I created a job to and set the trigger to "file arrival" type. And within the no...

Latest Reply
Jaison
New Contributor III
  • 15 kudos

Issue with Databricks File Arrival Trigger – missing file name information: the File Arrival Trigger in Databricks is practically useless if it does not provide the file name and path of the triggering file. In Azure Blob Storage triggers (Function App...
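Until the trigger exposes the triggering file, one workaround is to list the watched path inside the job and diff it against a stored watermark. The helper below is a pure sketch; the path and the `dbutils.fs.ls` usage in the comment are assumptions to adapt:

```python
def files_since(entries, last_run_ms):
    """Return paths whose modification time is after the last processed run.

    `entries` is an iterable of (path, modification_time_ms) pairs, e.g. built
    from a dbutils.fs.ls() listing on Databricks.
    """
    return sorted(path for path, mtime in entries if mtime > last_run_ms)

# Inside the triggered job (placeholder path; persist the new watermark after):
# listing = dbutils.fs.ls("s3://my-bucket/incoming/")
# entries = [(f.path, f.modificationTime) for f in listing]
# new_paths = files_since(entries, last_run_ms)
```

The job then processes only `new_paths` and stores `max` of the seen modification times as the next run's watermark, which recovers the per-file information the trigger itself does not pass.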

10 More Replies
jeremy98
by Honored Contributor
  • 1750 Views
  • 4 replies
  • 0 kudos

Resolved! how to read excel files inside a databricks notebook?

Hi community,Is it possible to read excel files from dbfs using a notebook file inside Databricks? If yes, how to do it?

Latest Reply
jeremy98
Honored Contributor
  • 0 kudos

Amazing, yes, that's totally what I need! Thanks Stefan!
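The accepted solution is in a reply not shown in this digest, so for completeness: a common pattern is to read the Excel file with pandas through the /dbfs FUSE mount. The path translation is pure Python; the file name and the presence of pandas plus openpyxl on the cluster are assumptions:

```python
def dbfs_local_path(dbfs_path: str) -> str:
    """Translate a dbfs:/ URI into the /dbfs FUSE path that pandas can open."""
    prefix = "dbfs:/"
    if dbfs_path.startswith(prefix):
        return "/dbfs/" + dbfs_path[len(prefix):]
    return dbfs_path

# In a notebook (hypothetical file; requires pandas and openpyxl):
# import pandas as pd
# pdf = pd.read_excel(dbfs_local_path("dbfs:/FileStore/report.xlsx"))
# df = spark.createDataFrame(pdf)  # convert to a Spark DataFrame if needed
```

An alternative on newer workspaces is the spark-excel style data source or storing the file in a Unity Catalog volume, but the FUSE-path approach above needs no extra cluster libraries beyond pandas and its Excel engine.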

3 More Replies
jakub_adamik
by New Contributor III
  • 2656 Views
  • 2 replies
  • 0 kudos

Resolved! Delta Live Tables - BAD_REQUEST: Pipeline cluster is not reachable.

Hi all,I have very simple pipeline: -- Databricks notebook source CREATE OR REFRESH STREAMING TABLE `catalog-prod`.default.dlt_table AS SELECT * FROM STREAM read_files('/Volumes/catalog-prod/storage/*', format=> 'json') -- COMMAND ---------- CREATE...

Latest Reply
jakub_adamik
New Contributor III
  • 0 kudos

Hi, thank you for your response. In the meantime I found a bug in the Databricks UI that caused this behaviour. I will raise a ticket with Databricks. Please see the draft of the ticket below for a workaround: We’re facing an issue with Delta Live Tables pip...

1 More Replies
wilmorlserios
by Databricks Partner
  • 1123 Views
  • 1 reply
  • 0 kudos

Using databricks-sql-connector in Notebook

I am attempting to utilse the databricks-sql-connector python package within a generalised application deployed to run within a Databricks notebook. Upon attempting to import, I am receiving a  module not found error. However, the package is visible ...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @wilmorlserios, the import is incorrect. It should be: from databricks import sql

vk217
by Contributor
  • 18898 Views
  • 4 replies
  • 1 kudos

ModuleNotFoundError: No module named 'pyspark.dbutils'

I have a class in a python file like this from pyspark.sql import SparkSession from pyspark.dbutils import DBUtils class DatabricksUtils: def __init__(self‌‌): self.spark = SparkSession.getActiveSession() self.dbutils = DBUtil...

Latest Reply
T0M
Contributor
  • 1 kudos

Had the same Problem in my GitLab CI/CD Pipeline while trying to deploy: $ databricks bundle deploy -t dev Building package... Error: build failed package, error: exit status 1, output: Traceback (most recent call last): [...] File "/builds/user/...
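A common way to keep such modules importable both on a Databricks cluster and in CI (where `pyspark.dbutils` does not exist) is to acquire dbutils lazily and fall back to None off-cluster. A hedged sketch, not the official API:

```python
def get_dbutils(spark=None):
    """Return a DBUtils handle on a Databricks runtime, else None (e.g. in CI).

    pyspark.dbutils is only importable on Databricks clusters, so the import
    lives inside the function and any failure is treated as "not on Databricks".
    """
    try:
        from pyspark.dbutils import DBUtils  # Databricks-only module
        return DBUtils(spark)
    except Exception:
        return None

# Usage in a class like the one from the question:
# self.dbutils = get_dbutils(self.spark)
# if self.dbutils is None:
#     ...  # running outside Databricks (unit tests, bundle build, CI)
```

Because the import is deferred, `databricks bundle deploy` and other off-cluster tooling can import the module without tripping ModuleNotFoundError at load time.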

3 More Replies
MathewDRitch
by Databricks Partner
  • 5862 Views
  • 5 replies
  • 1 kudos

Connecting from Databricks to Network Path

Hi All,Will appreciate if someone can help me with some references links on connecting from Databricks to external network path. I have Databricks on AWS and previously used to connect to files on external network path using Mount method. Now Databri...

Latest Reply
om_khade
New Contributor II
  • 1 kudos

Do we have any update on this?

4 More Replies
ozmike
by New Contributor II
  • 1105 Views
  • 3 replies
  • 0 kudos

Databrick select from web address that returns JSON

Hi I'm in a data bricks notebook and want to select from a web site that returns json.  For example this web site http://ergast.com/api/f1/2004/1/results.jsonwill return some JSON. (example only) Can i do the following or Do you need to use python. (...

Latest Reply
Stefan-Koch
Databricks Partner
  • 0 kudos

yes, you could use SQL to read the data from the volume:

%sql
SELECT * FROM json.`/Volumes/demo/raw/files/ergast/my_results.json`

You still have to flatten the result. The thing with the shell was just an example. Wouldn't recommend that, just use pytho...
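In Python, the pattern the reply points at looks like the sketch below. The response shape assumed in `race_results` follows the Ergast API URL from the question; the network call itself is left to the notebook:

```python
import json
from urllib.request import urlopen

def fetch_json(url: str) -> dict:
    """Fetch a URL and parse the body as JSON (performs a network call)."""
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))

def race_results(payload: dict) -> list:
    """Extract the race list from an Ergast-style response (assumed shape)."""
    return payload["MRData"]["RaceTable"]["Races"]

# In a notebook:
# data = fetch_json("http://ergast.com/api/f1/2004/1/results.json")
# df = spark.createDataFrame(race_results(data))  # then query it with SQL
```

Once the parsed payload is turned into a DataFrame, it can be registered as a temp view and queried with plain SQL, which gets close to the "select from a web address" the question asked for.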

2 More Replies