Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Prajwal_082
by New Contributor II
  • 1441 Views
  • 3 replies
  • 0 kudos

Overwriting a delta table using DLT

Hello, we are trying to ingest a bunch of CSV files that we receive on a daily basis using DLT. We chose a streaming table for this purpose, but since streaming tables are append-only, records keep adding up on a daily basis, which will cause multiple rows in downst...

Latest Reply
giuseppegrieco
New Contributor III
  • 0 kudos

In your scenario, if the data loaded on day 2 also includes all the data from day 1, you can still apply a "remove duplicates" logic. For instance, you could compute a hashdiff by hashing all the columns and use this to exclude rows you've already se...

2 More Replies
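A minimal PySpark sketch of the hashdiff-style dedup giuseppegrieco describes, assuming a DLT streaming table fed by Auto Loader; the table name and landing path below are hypothetical:

import dlt
from pyspark.sql import functions as F

@dlt.table(name="csv_ingest_deduped")  # hypothetical table name
def csv_ingest_deduped():
    df = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "csv")
          .option("header", "true")
          .load("/Volumes/landing/daily_csv/"))  # hypothetical landing path
    # Hash every column into a single hashdiff value, then drop rows whose
    # hash has already been seen in the stream (state grows without a watermark).
    hashed = df.withColumn("hashdiff", F.sha2(F.concat_ws("||", *df.columns), 256))
    return hashed.dropDuplicates(["hashdiff"])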
Kjetil
by New Contributor III
  • 1514 Views
  • 1 reply
  • 1 kudos

Resolved! Read and process large CSV files that updates regularly

I've got a lot of large CSV files (> 1 GB) that update regularly (stored in Data Lake Gen 2). The task is to concatenate these files into a single dataframe that is written to Parquet format. However, since these files update very often I get a rea...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 1 kudos

@Kjetil Since they are getting updated often, IMO making a copy would make sense. What you could try is to create a Microsoft.Storage.BlobCreated event to replicate the .CSV into a secondary bucket. However, best practice would be to have some kind...

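On the consumption side, a rough sketch of concatenating the replicated copies and writing a single Parquet dataset, run from a Databricks notebook where spark is predefined; the storage paths are placeholders:

# Read the snapshot copies (not the live, frequently-updated originals)
# and write them out as one Parquet dataset.
df = (spark.read
      .option("header", "true")
      .csv("abfss://snapshots@mystorageacct.dfs.core.windows.net/daily/"))  # hypothetical copy location
(df.write
   .mode("overwrite")
   .parquet("abfss://curated@mystorageacct.dfs.core.windows.net/combined/"))  # hypothetical output path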
Bazhar
by New Contributor
  • 693 Views
  • 0 replies
  • 0 kudos

Understanding this Ipython related error in cluster logs

Hi Databricks Community! I'm getting this error in a cluster's logs: [IPKernelApp] ERROR | Exception in control handler: Traceback (most recent call last): File "/databricks/python/lib/python3.10/site-packages/ipykernel/kernelbase.py", line 334, in p...

virementz
by New Contributor II
  • 3060 Views
  • 4 replies
  • 2 kudos

Cluster Failed to Start - Cluster-scoped init script failed: Script exit status is non-zero

I have been using a cluster-scoped init script for around a year and everything was working fine. But suddenly, the Databricks cluster has failed to restart since last week Thursday (13th June 2024). It returns this error: "Failed to add 2 container...

(two screenshots of the cluster error attached)
Latest Reply
Wojciech_BUK
Valued Contributor III
  • 2 kudos

Just maybe - there is no outbound connection on DEV from the cluster VNet to the URL you are trying to reach? You can spin up an all-purpose cluster and test the connection with the %sh magic command.

3 More Replies
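If you'd rather test from a Python cell than %sh, a quick connectivity check might look like the sketch below; the host is a placeholder for whatever endpoint the init script downloads from:

import socket
import urllib.request

host, port = "artifacts.example.com", 443  # hypothetical endpoint used by the init script
try:
    socket.create_connection((host, port), timeout=10).close()
    print(f"TCP connection to {host}:{port} succeeded")
    with urllib.request.urlopen(f"https://{host}/", timeout=10) as resp:
        print("HTTP status:", resp.status)
except Exception as exc:
    print("Outbound connectivity check failed:", exc)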
ChingizK
by New Contributor III
  • 840 Views
  • 1 reply
  • 0 kudos

Exclude a job from bundle deployment in PROD

My question is regarding Databricks Asset Bundles. I have defined a databricks.yml file the following way:

bundle:
  name: my_bundle_name
include:
  - resources/jobs/*.yml
targets:
  dev:
    mode: development
    default: true
    workspace: ...

Latest Reply
giuseppegrieco
New Contributor III
  • 0 kudos

Hello, if you want, you can deploy specific jobs only in the development environment. Since you have only two environments, a straightforward approach is to modify your jobs YAML definition as follows:

resources:
  jobs:
    # Define the jobs to be de...

HASSAN_UPPAL123
by New Contributor II
  • 3186 Views
  • 1 reply
  • 0 kudos

Resolved! Getting com.databricks.client.jdbc.Driver is not found error while connecting to databricks

Hi Community, I need help regarding a class-not-found issue. I'm trying to connect to Databricks in Python via jaydebeapi. I provided the proper class name `com.databricks.client.jdbc.Driver` and jar path `databricks-jdbc-2.6.34.jar`, but I'm getting com.data...

Latest Reply
User16502773013
Databricks Employee
  • 0 kudos

Hello @HASSAN_UPPAL123, the class name is correct; for the jar, please try downloading the latest from here. This may also be a classpath issue where the jar is not exported correctly in your client setup; I see similar issues/suggested solutions ...

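For reference, a minimal jaydebeapi connection sketch that passes the driver jar explicitly; the workspace host, HTTP path, token, and jar location are placeholders:

import jaydebeapi

# The jar must be visible to the JVM that jaydebeapi starts; a wrong or
# relative path here is a common cause of "class not found" errors.
conn = jaydebeapi.connect(
    "com.databricks.client.jdbc.Driver",
    "jdbc:databricks://<workspace-host>:443/default;transportMode=http;ssl=1;"
    "httpPath=<http-path>;AuthMech=3;UID=token;PWD=<personal-access-token>",
    jars="/absolute/path/to/databricks-jdbc-2.6.34.jar",
)
cur = conn.cursor()
cur.execute("SELECT 1")
print(cur.fetchall())
conn.close()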
Avinash_Narala
by Valued Contributor II
  • 7239 Views
  • 2 replies
  • 0 kudos

export notebook

Hi, I want to export a notebook programmatically in Python. Is there a way to leverage the Databricks CLI from Python, or any other way to export the notebook to my local PC?

Latest Reply
Pri-databricks
New Contributor II
  • 0 kudos

Is there a way to export a notebook through Terraform? If so, please provide examples. With terraform-provider-databricks.exe we are able to export all the notebooks from a workspace, but not a single notebook. Any suggestions?

1 More Replies
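One way to do this from plain Python (no CLI or Terraform) is the Workspace export REST API; the workspace URL, token, and notebook path below are placeholders:

import base64
import requests

host = "https://<your-workspace>.cloud.databricks.com"  # placeholder
token = "<personal-access-token>"                        # placeholder
notebook_path = "/Users/me@example.com/my_notebook"      # placeholder

resp = requests.get(
    f"{host}/api/2.0/workspace/export",
    headers={"Authorization": f"Bearer {token}"},
    params={"path": notebook_path, "format": "SOURCE"},
    timeout=30,
)
resp.raise_for_status()
# The API returns the notebook content base64-encoded.
with open("my_notebook.py", "wb") as f:
    f.write(base64.b64decode(resp.json()["content"]))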
xiangzhu
by Contributor III
  • 2308 Views
  • 5 replies
  • 1 kudos

Resolved! Retrieve job-level parameters in spark_python_task (not notebooks)

Hello, I would like to use job parameters in a spark_python_task (not notebook_task). Does anyone know how to retrieve these parameters inside pure Python? I tried: 1/ dbutils.widgets.get("debug"), got error: com.databricks.dbutils_v1.InputWidgetNotD...

(two screenshots attached)
Latest Reply
xiangzhu
Contributor III
  • 1 kudos

@daniel_sahal just tested {{job.parameters.[name]}}; it works, thanks again!

4 More Replies
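A sketch of how the accepted answer is usually wired up: the job passes {{job.parameters.debug}} into the spark_python_task's parameters list, and the script reads it from the command line; the parameter name debug is just an example:

# my_task.py, invoked by the spark_python_task with
# parameters: ["--debug", "{{job.parameters.debug}}"] in the job definition
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--debug", default="false")
args = parser.parse_args()

debug_enabled = args.debug.lower() == "true"
print(f"debug flag resolved to: {debug_enabled}")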
runninsavvy
by New Contributor II
  • 1512 Views
  • 2 replies
  • 0 kudos

Resolved! Cannot pass arrays to spark.sql() using named parameter markers

Hello all, I am attempting to use named parameter markers as shown in this article: https://docs.databricks.com/en/sql/language-manual/sql-ref-parameter-marker.html#named-parameter-markers. I can pass strings and numbers in perfectly fine, but the issue...

Latest Reply
User16502773013
Databricks Employee
  • 0 kudos

Hello @runninsavvy, the following code sample can be used in such a case:

val argArray = Array(1, 2, 3)
val argMap = Map("param" -> argArray.mkString(","))
spark.sql("SELECT 1 IN (SELECT explode(split(:param, ',')))", argMap).show()

1 More Replies
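The same idea in PySpark, for anyone not on Scala: serialize the array to a comma-separated string and pass it through the args mapping (accepted by spark.sql on recent runtimes):

arg_array = [1, 2, 3]
args = {"param": ",".join(str(x) for x in arg_array)}

# split/explode turns the single string parameter back into one row per element.
spark.sql(
    "SELECT 1 IN (SELECT explode(split(:param, ','))) AS found",
    args=args,
).show()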
Nastia
by New Contributor III
  • 1216 Views
  • 1 reply
  • 0 kudos

DLT fails with Queries with streaming sources must be executed with writeStream.start();

Hi guys! I am having an issue with passing the "streaming flow" between layers of the DLT. The first layer "ETD_Bz" is passing through, but then "ETD_Flattened_Bz" is failing with "pyspark.errors.exceptions.captured.AnalysisException: Queries with streamin...

Latest Reply
Nastia
New Contributor III
  • 0 kudos

UPDATE: I tried adding writeStream.start() as the error suggested + as per other posts, and ended up with the following error/code:

@dlt.table(
    name="ETD_Bz",
    temporary=False
)
def Bronze():
    return (spark.readStream
                .format("delta")
       ...

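For DLT specifically you normally do not call writeStream.start() at all: each layer returns a streaming DataFrame, and the next layer reads it with dlt.read_stream. A rough sketch following the table names from the post; the source path and flattening logic are placeholders:

import dlt

@dlt.table(name="ETD_Bz", temporary=False)
def etd_bronze():
    # Return the streaming DataFrame; DLT manages the writeStream for you.
    return spark.readStream.format("delta").load("/path/to/source")  # placeholder path

@dlt.table(name="ETD_Flattened_Bz")
def etd_flattened():
    # Read the upstream DLT table as a stream instead of creating a new reader.
    return dlt.read_stream("ETD_Bz")  # add flattening transformations here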
ata_lh
by New Contributor II
  • 2373 Views
  • 1 reply
  • 0 kudos

Automatic conversion of timestamp to the default timezone

I am encountering an issue when ingesting data from ADLS XML or JSON files to process them via PySpark (Autoloader or just reading a DataFrame): the timestamp is automatically converted to the default timezone, and I have dynamic timezone values. Did any...

Latest Reply
ata_lh
New Contributor II
  • 0 kudos

Hi @Retired_mod, the point is that for the aims of our project we need the timestamp attributes to stay exactly as they are in the source system. So basically our aim would be to keep the attribute without the timezone conversion. I did the below tests so fa...

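Two workarounds that are commonly tried for this kind of issue, sketched below; the schema, column name, timestamp format, and path are all placeholders, not a confirmed fix for this thread:

from pyspark.sql import functions as F

# Option 1: pin the session time zone so values are rendered consistently
# rather than in the cluster's default zone.
spark.conf.set("spark.sql.session.timeZone", "UTC")

# Option 2: ingest the timestamp as a plain string and parse it yourself,
# keeping the original offset information available.
df = (spark.read
      .schema("event_ts STRING, payload STRING")  # hypothetical schema
      .json("abfss://raw@account.dfs.core.windows.net/events/"))  # placeholder path
df = df.withColumn("event_ts_parsed",
                   F.to_timestamp("event_ts", "yyyy-MM-dd'T'HH:mm:ssXXX"))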
sensanjoy
by Contributor
  • 1811 Views
  • 4 replies
  • 3 kudos

Java SQL Driver Manager not working in Unity Catalog shared mode

Hi all, we are facing an issue establishing a connection with Azure SQL Server through JDBC to perform an UPSERT operation into SQL Server. Please find the connection statement and the exception received during the run: conn = spark._sc._jvm.java.sql.DriverMana...

Latest Reply
sensanjoy
Contributor
  • 3 kudos

Thanks @User16502773013 @jacovangelder. It is interesting to know that Lakehouse Federation does not support UPSERT (MERGE INTO ...)! @jacovangelder, I think the above approach (the link you shared) only supports "append" and "overwrite", but ...

3 More Replies
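Since the plain Spark JDBC writer only supports append/overwrite, a common workaround is to append into a staging table and let SQL Server run the MERGE; a minimal sketch of the append step, with the connection details and table names as placeholders:

# df is assumed to be the DataFrame holding the rows to upsert.
jdbc_url = "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>"  # placeholder

(df.write
   .format("jdbc")
   .option("url", jdbc_url)
   .option("dbtable", "dbo.staging_my_table")  # hypothetical staging table
   .option("user", "<sql-user>")
   .option("password", "<sql-password>")
   .mode("append")                             # only append/overwrite are supported here
   .save())
# The UPSERT itself (MERGE from staging into the target table) would then run
# on the SQL Server side, e.g. via a stored procedure or another orchestration step.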
skirock
by New Contributor
  • 996 Views
  • 0 replies
  • 0 kudos

DLT live tables error while reading file from datalake gen2

I am getting the following error while running a cell in Python. The same file runs fine when I upload the JSON file into Databricks and then give that path to the df.read syntax. When I use DLT for the same file in the data lake it gives me the follo...

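Without the full error message it is hard to say what is failing here, but for comparison this is a typical DLT pattern for reading JSON straight from ADLS Gen2 with Auto Loader; the container, account, path, and table name are placeholders:

import dlt

@dlt.table(name="raw_events")
def raw_events():
    return (spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("abfss://container@account.dfs.core.windows.net/path/to/files/"))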
dbdude
by New Contributor II
  • 10025 Views
  • 2 replies
  • 2 kudos

Delete Delta Live Table Completely

I've been struggling to figure out how to delete a managed Delta Live Table. If I run a DROP command in Databricks SQL I get: [STREAMING_TABLE_OPERATION_NOT_ALLOWED.DROP_DELTA_LIVE_TABLE] The operation DROP is not allowed: The operation does not a...

