Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Martin_Pham
by New Contributor III
  • 1314 Views
  • 1 reply
  • 1 kudos

Resolved! Is Databricks-Salesforce already available to use?

Reference: Salesforce and Databricks Announce Strategic Partnership to Bring Lakehouse Data Sharing and Shared ... I was going through this article and wanted to know if this is already released. My assumption is that there’s no need to use third-part...

Latest Reply
Martin_Pham
New Contributor III
  • 1 kudos

Looks like it has been released - Salesforce BYOM

Jackson1111
by New Contributor III
  • 1161 Views
  • 1 reply
  • 0 kudos

How to use job.run_id as a run parameter of a jar job when triggering it through the REST API

Passing "{{job.run_id}}" in the legacy parameters array is rejected with: {"error_code": "INVALID_PARAMETER_VALUE", "message": "Legacy parameters cannot contain references."}

Latest Reply
Jackson1111
New Contributor III
  • 0 kudos

How do I get the Job ID and Run ID while the job is running?
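The error means the legacy run-now parameters cannot resolve dynamic value references. A hedged workaround is to put the reference in the task definition itself, where {{job.run_id}} is resolved per run; this sketch assumes a hypothetical job name, main class, and cluster ID:

import requests

host = "https://<your-workspace>.cloud.databricks.com"  # placeholder
token = "<personal-access-token>"  # placeholder

job_spec = {
    "name": "jar-job-with-run-id",  # hypothetical job name
    "tasks": [{
        "task_key": "main",
        "existing_cluster_id": "<cluster-id>",  # placeholder
        "spark_jar_task": {
            "main_class_name": "com.example.Main",  # hypothetical class
            # The reference lives in the task spec, so it is resolved at run time
            "parameters": ["{{job.run_id}}"],
        },
    }],
}
resp = requests.post(f"{host}/api/2.1/jobs/create",
                     headers={"Authorization": f"Bearer {token}"},
                     json=job_spec)
print(resp.json())

The jar's main method then receives the resolved run ID as its first argument.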

ttamas
by New Contributor III
  • 5661 Views
  • 1 reply
  • 0 kudos

Get the triggering task's name

Hi, I have tasks that depend on each other, and I would like to get variables from task1 that triggers task2. This is how I solved my problem: following the suggestion in https://community.databricks.com/t5/data-engineering/how-to-pass-parameters-to-a-quot-...
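For reference, one common way to hand values between dependent tasks is task values; a minimal sketch, assuming hypothetical task and key names:

# In task1:
dbutils.jobs.taskValues.set(key="triggering_task", value="task1")

# In task2 (taskKey must match the upstream task's name):
name = dbutils.jobs.taskValues.get(taskKey="task1",
                                   key="triggering_task",
                                   default="unknown",
                                   debugValue="task1")
print(name)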

Kjetil
by Contributor
  • 3399 Views
  • 3 replies
  • 2 kudos

Resolved! Autoloader to concatenate CSV files that update regularly into a single parquet dataframe.

I have multiple large CSV files. One or more of these files changes now and then (a few times a day). The changes in the CSV files are both appends (new rows) and updates of old ones. I want to combine all CSV files into a datafr...
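For the append-only part of this, a minimal Auto Loader sketch (paths are placeholders; in-place updates of old rows would additionally need a MERGE or a periodic recompute, which this does not cover):

df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .option("cloudFiles.schemaLocation", "/mnt/checkpoints/schema")  # placeholder
      .option("header", "true")
      .load("/mnt/raw/csv/"))  # placeholder source folder

(df.writeStream
   .format("parquet")
   .option("checkpointLocation", "/mnt/checkpoints/csv_to_parquet")  # placeholder
   .trigger(availableNow=True)  # process available files, then stop
   .start("/mnt/curated/combined_parquet"))  # placeholder target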

Latest Reply
jose_gonzalez
Databricks Employee
  • 2 kudos

Hi @Kjetil, please let us know if you still have an issue, or whether @-werners-'s response can be marked as the best solution. Thank you.

2 More Replies
KSI
by New Contributor II
  • 1563 Views
  • 1 reply
  • 0 kudos

Variant datatype

I'm checking the VARIANT datatype and noted that whenever a JSON string is stored as a VARIANT, it needs to be cast in order to filter on and extract values, i.e. SELECT sum(jsondatavar:Value::double) FROM table WHERE jsondatavar:customer::int = 1000. Here j...

Latest Reply
Mounika_Tarigop
Databricks Employee
  • 0 kudos

Could you please try using SQL functions:

SELECT SUM(CAST(get_json_object(jsondatavar, '$.Value') AS DOUBLE)) AS total_value
FROM table
WHERE CAST(get_json_object(jsondatavar, '$.customer') AS INT) = 1000

Prajwal_082
by New Contributor II
  • 3116 Views
  • 3 replies
  • 0 kudos

Overwriting a delta table using DLT

Hello, we are trying to ingest a bunch of CSV files that we receive on a daily basis using DLT. We chose a streaming table for this purpose, but since a streaming table is append-only, records keep adding up on a daily basis, which will cause multiple rows in downst...

Latest Reply
giuseppegrieco
New Contributor III
  • 0 kudos

In your scenario, if the data loaded on day 2 also includes all the data from day 1, you can still apply a "remove duplicates" logic. For instance, you could compute a hashdiff by hashing all the columns and use this to exclude rows you've already se...
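A hedged sketch of that hashdiff idea in DLT (table and column names are hypothetical):

import dlt
from pyspark.sql import functions as F

@dlt.table(name="silver_deduped")  # hypothetical name
def silver_deduped():
    df = dlt.read_stream("bronze_daily_csv")  # hypothetical upstream table
    # Fingerprint each row by hashing all columns (cast to string first)
    hashed = df.withColumn(
        "hashdiff",
        F.sha2(F.concat_ws("||", *[F.col(c).cast("string") for c in df.columns]), 256),
    )
    # Note: without a watermark, dedup state is kept for the full history
    return hashed.dropDuplicates(["hashdiff"])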

2 More Replies
Kjetil
by Contributor
  • 3334 Views
  • 1 reply
  • 1 kudos

Resolved! Read and process large CSV files that update regularly

I've got a lot of large CSV files (> 1 GB) that update regularly (stored in Data Lake Gen 2). The task is to concatenate these files into a single dataframe that is written to parquet format. However, since these files update very often, I get a rea...

Latest Reply
daniel_sahal
Databricks MVP
  • 1 kudos

@Kjetil Since they are getting updated often, IMO making a copy would make sense. What you could try is to create a Microsoft.Storage.BlobCreated event to replicate the .CSV into a secondary bucket. However, best practice would be to have some kind...

Bazhar
by New Contributor
  • 1220 Views
  • 0 replies
  • 0 kudos

Understanding this IPython-related error in cluster logs

Hi Databricks Community! I'm getting this error in a cluster's logs: [IPKernelApp] ERROR | Exception in control handler: Traceback (most recent call last): File "/databricks/python/lib/python3.10/site-packages/ipykernel/kernelbase.py", line 334, in p...

virementz
by New Contributor II
  • 7571 Views
  • 4 replies
  • 3 kudos

Cluster Failed to Start - Cluster-scoped init script failed: Script exit status is non-zero

I have been using a cluster-scoped init script for around 1 year and everything was working fine. But suddenly, the Databricks cluster has failed to restart since last week Thursday (13th June 2024). It returns this error: "Failed to add 2 container...

(screenshots attached)
Latest Reply
Wojciech_BUK
Valued Contributor III
  • 3 kudos

Just maybe: is there no outbound connection on DEV from the cluster VNet to the URL you are trying to reach? You can spin up an all-purpose cluster and test the connection with the %sh magic command.
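For example, something like this in a notebook cell (the URL is a placeholder for whatever the init script fetches):

%sh
# Test outbound connectivity and DNS from the cluster's VNet
curl -sv --max-time 10 https://example.com/dependency || echo "no outbound route"
nslookup example.com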

3 More Replies
HASSAN_UPPAL123
by New Contributor II
  • 5743 Views
  • 1 reply
  • 0 kudos

Resolved! Getting com.databricks.client.jdbc.Driver is not found error while connecting to databricks

Hi Community, I need help with a class-not-found issue. I'm trying to connect to Databricks in Python via jaydebeapi, providing the proper class name `com.databricks.client.jdbc.Driver` and jar path `databricks-jdbc-2.6.34.jar`, but I'm getting com.data...

Latest Reply
User16502773013
Databricks Employee
  • 0 kudos

Hello @HASSAN_UPPAL123, the class name is correct; for the jar, please try downloading the latest from here. This may also be a classpath issue where the jar is not exported correctly in your client setup. I see similar issues/suggested solutions ...
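For comparison, a minimal jaydebeapi sketch (host, HTTP path, token, and jar path are placeholders; the jars argument is what puts the driver on the classpath):

import jaydebeapi

url = ("jdbc:databricks://<workspace-host>:443/default;"
       "transportMode=http;ssl=1;httpPath=<http-path>;"
       "AuthMech=3;UID=token;PWD=<personal-access-token>")

conn = jaydebeapi.connect(
    "com.databricks.client.jdbc.Driver",
    url,
    jars="/full/path/to/databricks-jdbc-2.6.34.jar",  # must be the real local path
)
cur = conn.cursor()
cur.execute("SELECT 1")
print(cur.fetchall())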

Avinash_Narala
by Databricks Partner
  • 8770 Views
  • 2 replies
  • 0 kudos

Export notebook

Hi, I want to export a notebook programmatically in Python. Is there a way to leverage the Databricks CLI from Python, or any other way to export the notebook to my local PC?
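One way to do this without the CLI is the Workspace API; a minimal sketch, with host, token, and notebook path as placeholders:

import base64
import requests

host = "https://<your-workspace>.cloud.databricks.com"  # placeholder
token = "<personal-access-token>"  # placeholder

resp = requests.get(
    f"{host}/api/2.0/workspace/export",
    headers={"Authorization": f"Bearer {token}"},
    params={"path": "/Users/me@example.com/my_notebook", "format": "SOURCE"},
)
resp.raise_for_status()

# The export endpoint returns the notebook body base64-encoded
with open("my_notebook.py", "wb") as f:
    f.write(base64.b64decode(resp.json()["content"]))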

Latest Reply
Pri-databricks
New Contributor II
  • 0 kudos

Is there a way to export a notebook through Terraform? If so, please provide examples. With terraform-provider-databricks.exe we are able to export all the notebooks from the workspace, but not a single notebook. Any suggestions?

1 More Reply
xiangzhu
by Contributor III
  • 5240 Views
  • 5 replies
  • 1 kudos

Resolved! Retrieve job-level parameters in spark_python_task (not notebooks)

Hello, I would like to use job parameters in a spark_python_task (not a notebook_task). Does anyone know how to retrieve these parameters inside pure Python? I tried: 1/ dbutils.widgets.get("debug"), got error: com.databricks.dbutils_v1.InputWidgetNotD...

(screenshots attached)
Latest Reply
xiangzhu
Contributor III
  • 1 kudos

@daniel_sahal just tested {{job.parameters.[name]}}; it works, thanks again!
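For reference, a hedged sketch of the resulting pattern: the task spec sets "parameters": ["{{job.parameters.debug}}"] (the parameter name is taken from the question; the wiring is an assumption), and the Python file reads the resolved value from argv:

import sys

# argv[1] holds the value resolved from {{job.parameters.debug}}
debug = sys.argv[1] if len(sys.argv) > 1 else "false"
print(f"debug = {debug}")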

4 More Replies
runninsavvy
by New Contributor II
  • 3299 Views
  • 2 replies
  • 0 kudos

Resolved! Cannot pass arrays to spark.sql() using named parameter markers

Hello all, I am attempting to use named parameter markers as shown in this article: https://docs.databricks.com/en/sql/language-manual/sql-ref-parameter-marker.html#named-parameter-markers. I can pass strings and numbers in perfectly fine, but the issue...

Latest Reply
User16502773013
Databricks Employee
  • 0 kudos

Hello @runninsavvy, the following code sample can be used in such a case:

val argArray = Array(1, 2, 3)
val argMap = Map("param" -> argArray.mkString(","))
spark.sql("SELECT 1 IN (SELECT explode(split(:param, ',')))", argMap).show()
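A PySpark equivalent of the same idea, as a sketch (spark.sql accepts an args mapping in Spark 3.4+):

arg_array = [1, 2, 3]
args = {"param": ",".join(str(x) for x in arg_array)}

# Pass the array as one comma-separated string, then split/explode it in SQL
spark.sql("SELECT 1 IN (SELECT explode(split(:param, ',')))", args=args).show()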

1 More Reply
Nastia
by New Contributor III
  • 2345 Views
  • 1 reply
  • 0 kudos

DLT fails with "Queries with streaming sources must be executed with writeStream.start()"

Hi guys! I am having an issue with passing the "streaming flow" between layers of the DLT pipeline. The first layer, "ETD_Bz", is passing through, but then "ETD_Flattened_Bz" is failing with pyspark.errors.exceptions.captured.AnalysisException: Queries with streamin...

Latest Reply
Nastia
New Contributor III
  • 0 kudos

UPDATE: I tried adding writeStream.start() as the error suggested (and as per other posts) and ended up with the following error/code:

@dlt.table(
    name="ETD_Bz",
    temporary=False
)
def Bronze():
    return (spark.readStream
            .format("delta")
            ...
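For what it's worth, the usual DLT pattern is that a downstream table returns a streaming DataFrame read from the upstream table and never calls writeStream.start() itself, since DLT manages the writes; a sketch using the names from the post:

import dlt

@dlt.table(name="ETD_Flattened_Bz")
def etd_flattened_bz():
    # Read the upstream DLT table as a stream; no manual writeStream needed
    df = dlt.read_stream("ETD_Bz")
    return df  # flattening logic would go here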

ata_lh
by New Contributor II
  • 4237 Views
  • 1 reply
  • 0 kudos

Automatic conversion of timestamp to the default timezone

I am encountering an issue when ingesting data from ADLS XML or JSON files and processing them via PySpark (Autoloader or just reading a DataFrame): the timestamp is automatically converted to the default timezone. And I have dynamic timezone values. Did any...

Latest Reply
ata_lh
New Contributor II
  • 0 kudos

Hi @Retired_mod, the point is that for the aim of our project, we need the timestamp attributes to stay as they are in the source system. So basically our aim would be to have the attribute without the timezone conversion. I did the below tests so fa...
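Two hedged options, depending on how strictly "as they are" is meant (paths and column names are placeholders):

# Option 1: pin the session timezone so values are not shifted to the cluster default
spark.conf.set("spark.sql.session.timeZone", "UTC")

# Option 2: ingest the attribute as a plain string to preserve the source text verbatim
df = (spark.read.format("json")
      .schema("event_time STRING, payload STRING")  # keep event_time untyped
      .load("/mnt/raw/events/"))  # placeholder path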
