Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Jiri_Koutny
by New Contributor III
  • 5916 Views
  • 11 replies
  • 3 kudos

Delay in files update on filesystem

Hi, I noticed that there is quite a significant delay (2-10 s) between making a change to a file in Repos via the Databricks file edit window and the propagation of that change to the filesystem. Our engineers and scientists use YAML config files. If the...

Latest Reply
Irka
New Contributor II
  • 3 kudos

Is there a solution to this? BTW, the "ls" command trick didn't work for me.

  • 3 kudos
10 More Replies
JamesY
by New Contributor III
  • 600 Views
  • 0 replies
  • 0 kudos

Databricks JDBC write to table with PK column, error, key not found.

Hello, I am trying to write data to a table. It worked fine before, but after I recreated the table with one column as the PK, there is an error: "Unable to write into the A_Table table... key not found: id". What is the correct way of doing this? PK column: [...

Data Engineering
Databricks
SqlMi
Prajwal_082
by New Contributor II
  • 1150 Views
  • 3 replies
  • 0 kudos

Overwriting a delta table using DLT

Hello, We are trying to ingest a bunch of CSV files that we receive on a daily basis using DLT. We chose a streaming table for this purpose, but since a streaming table is append-only, records keep adding up daily, which will cause multiple rows in downst...

Latest Reply
giuseppegrieco
New Contributor III
  • 0 kudos

In your scenario, if the data loaded on day 2 also includes all the data from day 1, you can still apply a "remove duplicates" logic. For instance, you could compute a hashdiff by hashing all the columns and use this to exclude rows you've already se...

  • 0 kudos
2 More Replies
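As a rough sketch of the hashdiff idea from the reply above (the CSV path and column handling below are placeholders, not from the original thread):

from pyspark.sql import functions as F

# Hash every column into a single value, then keep only the first occurrence
# of each hash so rows re-delivered from previous days are dropped.
df = spark.read.option("header", "true").csv("/mnt/raw/daily/")  # placeholder path

hashed = df.withColumn(
    "hashdiff",
    F.sha2(
        F.concat_ws("||", *[F.coalesce(F.col(c).cast("string"), F.lit("")) for c in df.columns]),
        256,
    ),
)
deduplicated = hashed.dropDuplicates(["hashdiff"]).drop("hashdiff")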
Kjetil
by New Contributor III
  • 1299 Views
  • 1 replies
  • 1 kudos

Resolved! Read and process large CSV files that updates regularly

I've got a lot of large CSV files (> 1 GB) that update regularly (stored in Data Lake Gen 2). The task is to concatenate these files into a single dataframe that is written to parquet format. However, since these files update very often, I get a rea...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 1 kudos

@Kjetil Since they are getting updated often, IMO making a copy would make sense. What you could try is to create a Microsoft.Storage.BlobCreated event to replicate the .CSV into a secondary bucket. However, best practice would be to have some kind...

  • 1 kudos
Bazhar
by New Contributor
  • 629 Views
  • 0 replies
  • 0 kudos

Understanding this IPython-related error in cluster logs

Hi Databricks Community! I'm getting this error in a cluster's logs: [IPKernelApp] ERROR | Exception in control handler: Traceback (most recent call last): File "/databricks/python/lib/python3.10/site-packages/ipykernel/kernelbase.py", line 334, in p...

virementz
by New Contributor II
  • 2470 Views
  • 4 replies
  • 2 kudos

Cluster Failed to Start - Cluster-scoped init script failed: Script exit status is non-zero

I have been using a cluster-scoped init script for around a year and everything was working fine. But suddenly, the Databricks cluster has failed to restart since last Thursday (13th June 2024). It returns this error: "Failed to add 2 container...

Latest Reply
Wojciech_BUK
Valued Contributor III
  • 2 kudos

Just maybe there is no outbound connection on DEV from the cluster VNet to the URL you are trying to reach? You can spin up an all-purpose cluster and test the connection with the %sh magic command.

  • 2 kudos
3 More Replies
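An alternative to the %sh check suggested above is a quick connectivity probe in a Python notebook cell; a minimal sketch using only the standard library, where the URL is a placeholder for whatever endpoint the init script needs to reach:

import urllib.request

url = "https://example.com/init-script-dependency"  # placeholder endpoint

try:
    with urllib.request.urlopen(url, timeout=10) as resp:
        print(f"Reachable, HTTP status {resp.status}")
except Exception as exc:
    print(f"Could not reach {url}: {exc}")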
ChingizK
by New Contributor III
  • 732 Views
  • 1 replies
  • 0 kudos

Exclude a job from bundle deployment in PROD

My question is regarding Databricks Asset Bundles. I have defined a databricks.yml file the following way:
bundle:
  name: my_bundle_name
include:
  - resources/jobs/*.yml
targets:
  dev:
    mode: development
    default: true
    workspace: ...

Latest Reply
giuseppegrieco
New Contributor III
  • 0 kudos

Hello, if you want, you can deploy specific jobs only in the development environment. Since you have only two environments, a straightforward approach is to modify your jobs' YAML definition as follows:
resources:
  jobs:
    # Define the jobs to be de...

  • 0 kudos
HASSAN_UPPAL123
by New Contributor II
  • 2865 Views
  • 1 replies
  • 0 kudos

Resolved! Getting com.databricks.client.jdbc.Driver is not found error while connecting to databricks

Hi Community, I need help regarding a class-not-found issue. I'm trying to connect to Databricks in Python via jaydebeapi. I provided the proper class name `com.databricks.client.jdbc.Driver` and jar path `databricks-jdbc-2.6.34.jar`, but I'm getting com.data...

Latest Reply
User16502773013
Databricks Employee
  • 0 kudos

Hello @HASSAN_UPPAL123, the class name is correct; for the jar, please try downloading the latest from here. This may also be a classpath issue where the jar is not exported correctly in your client setup. I see similar issues/suggested solutions ...

  • 0 kudos
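For reference, a hedged sketch of passing the driver jar explicitly to jaydebeapi so the class ends up on the classpath; the host, HTTP path, token, and jar path below are placeholders:

import jaydebeapi

jdbc_url = (
    "jdbc:databricks://<workspace-host>:443/default;"
    "transportMode=http;ssl=1;httpPath=<http-path>;AuthMech=3"
)

conn = jaydebeapi.connect(
    "com.databricks.client.jdbc.Driver",
    jdbc_url,
    {"UID": "token", "PWD": "<personal-access-token>"},
    jars="/path/to/databricks-jdbc-2.6.34.jar",  # explicit jar path so the driver class can be found
)
cursor = conn.cursor()
cursor.execute("SELECT 1")
print(cursor.fetchall())
cursor.close()
conn.close()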
Avinash_Narala
by Contributor
  • 7015 Views
  • 2 replies
  • 0 kudos

export notebook

Hi, I want to export a notebook programmatically in Python. Is there a way to leverage the Databricks CLI from Python, or any other way to export the notebook to my local PC?

Latest Reply
Pri-databricks
New Contributor II
  • 0 kudos

Is there a way to export a notebook through Terraform? If so, please provide examples. With terraform-provider-databricks.exe we are able to export all the notebooks from the workspace, but not a single notebook. Any suggestions?

  • 0 kudos
1 More Replies
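One programmatic option (not the only one) is the Workspace Export REST API; a hedged sketch using only the standard library, with the host, token, and notebook path as placeholders:

import base64
import json
import urllib.parse
import urllib.request

host = "https://<workspace-host>"           # placeholder
token = "<personal-access-token>"           # placeholder
notebook_path = "/Users/me@example.com/my_notebook"  # placeholder

params = urllib.parse.urlencode({"path": notebook_path, "format": "SOURCE"})
req = urllib.request.Request(
    f"{host}/api/2.0/workspace/export?{params}",
    headers={"Authorization": f"Bearer {token}"},
)

with urllib.request.urlopen(req) as resp:
    payload = json.loads(resp.read())

# The API returns the notebook contents base64-encoded.
with open("my_notebook.py", "wb") as f:
    f.write(base64.b64decode(payload["content"]))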
xiangzhu
by Contributor III
  • 1920 Views
  • 5 replies
  • 1 kudos

Resolved! Retrieve job-level parameters in spark_python_task (not notebooks)

Hello, I would like to use job parameters in spark_python_task (not notebook_task). Does anyone know how to retrieve these parameters inside pure Python? I tried: 1/ dbutils.widgets.get("debug"), got error: com.databricks.dbutils_v1.InputWidgetNotD...

Latest Reply
xiangzhu
Contributor III
  • 1 kudos

@daniel_sahal just tested {{job.parameters.[name]}}, it works, thanks again !

  • 1 kudos
4 More Replies
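For a spark_python_task, a resolved {{job.parameters.[name]}} reference arrives as an ordinary command-line argument, so the script can parse it itself; a hedged sketch, where the --debug flag and file name are placeholders:

# my_task.py - entry point of the spark_python_task (placeholder file name).
# The task's parameters list would contain something like
#   ["--debug", "{{job.parameters.debug}}"]
# so the resolved value shows up in sys.argv.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--debug", default="false")
args = parser.parse_args()

print(f"debug flag resolved to: {args.debug}")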
runninsavvy
by New Contributor II
  • 1301 Views
  • 2 replies
  • 0 kudos

Resolved! Cannot pass arrays to spark.sql() using named parameter markers

Hello all, I am attempting to use named parameter markers as shown in this article: https://docs.databricks.com/en/sql/language-manual/sql-ref-parameter-marker.html#named-parameter-markers. I can pass strings and numbers in perfectly fine, but the issue...

Latest Reply
User16502773013
Databricks Employee
  • 0 kudos

Hello @runninsavvy, the following code sample can be used in such a case:
val argArray = Array(1, 2, 3)
val argMap = Map("param" -> argArray.mkString(","))
spark.sql("SELECT 1 IN (SELECT explode(split(:param, ',')))", argMap).show()

  • 0 kudos
1 More Replies
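A PySpark variant of the same workaround shown in the Scala reply above (join the array into a string, then split and explode it inside the query); this assumes a Spark version whose spark.sql accepts the args argument:

arg_array = [1, 2, 3]
args = {"param": ",".join(str(v) for v in arg_array)}

spark.sql(
    "SELECT 1 IN (SELECT explode(split(:param, ','))) AS matched",
    args=args,
).show()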
Nastia
by New Contributor III
  • 1062 Views
  • 1 replies
  • 0 kudos

DLT fails with Queries with streaming sources must be executed with writeStream.start();

Hi guys! I am having an issue with passing the "streaming flow" between layers of the DLT pipeline. The first layer, "ETD_Bz", is passing through, but then "ETD_Flattened_Bz" is failing with "pyspark.errors.exceptions.captured.AnalysisException: Queries with streamin...

Latest Reply
Nastia
New Contributor III
  • 0 kudos

UPDATE: tried adding writeStream.start() as the error suggested and as per other posts, and ended up with the following error/code:
@dlt.table(
    name="ETD_Bz",
    temporary=False
)
def Bronze():
    return (spark.readStream
            .format("delta")
            ...

  • 0 kudos
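For context, the usual way to pass a stream between DLT layers is to let the downstream table read the upstream one as a stream rather than calling writeStream.start() yourself; a hedged sketch, with a placeholder source path and the flattening logic elided:

import dlt

@dlt.table(name="ETD_Bz")
def etd_bronze():
    return spark.readStream.format("delta").load("/mnt/raw/etd")  # placeholder source path

@dlt.table(name="ETD_Flattened_Bz")
def etd_flattened():
    # DLT manages the streaming write itself, so no writeStream.start() is needed here.
    return dlt.read_stream("ETD_Bz")  # flattening transformations would go here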
ata_lh
by New Contributor II
  • 2049 Views
  • 1 replies
  • 0 kudos

Automatic conversion of timestamp to the default timezone

I am encountering an issue when ingesting data from ADLS XML or JSON files to process them via PySpark (Auto Loader or just reading a DataFrame). The timestamp is automatically converted to the default timezone, and I have dynamic timezone values. Did any...

Latest Reply
ata_lh
New Contributor II
  • 0 kudos

Hi @Retired_mod, the point is that for our project we need the timestamp attributes to stay as they are in the source system. So basically our aim is to keep the attribute without the timezone conversion. I did the below tests so fa...

  • 0 kudos
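Two commonly used options for avoiding the implicit session-timezone shift, sketched with placeholder paths and column names:

from pyspark.sql.types import StructType, StructField, StringType

# Option 1: pin the session timezone so parsed timestamps are not shifted
# to the cluster default (values are still normalized to this zone).
spark.conf.set("spark.sql.session.timeZone", "UTC")

# Option 2: keep the original text, including its offset, by ingesting the
# attribute as a string and parsing it only when needed downstream.
schema = StructType([StructField("event_time", StringType(), True)])  # placeholder column
df = spark.read.schema(schema).json("/mnt/raw/events/")  # placeholder path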
sensanjoy
by Contributor
  • 1423 Views
  • 4 replies
  • 3 kudos

Java SQL Driver Manager not working in Unity Catalog shared mode

Hi All, We are facing an issue establishing a connection to Azure SQL Server through JDBC to perform an UPSERT operation. Please find the connection statement and the exception received during the run: conn = spark._sc._jvm.java.sql.DriverMana...

Latest Reply
sensanjoy
Contributor
  • 3 kudos

Thanks @User16502773013 @jacovangelder. It is interesting to know that Lakehouse Federation does not support UPSERT (MERGE INTO ...)! @jacovangelder, I think the above approach (link shared by you) only supports "append" and "overwrite", but ...

  • 3 kudos
3 More Replies
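For reference, a sketch of the plain Spark JDBC writer the reply refers to, which supports save modes such as append and overwrite but not an UPSERT/MERGE; connection details and table names below are placeholders:

df = spark.table("my_catalog.my_schema.source_table")  # placeholder source

(df.write
   .format("jdbc")
   .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>")
   .option("dbtable", "dbo.target_table")
   .option("user", "<user>")
   .option("password", "<password>")
   .mode("append")
   .save())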

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group