Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

rt-slowth
by Contributor
  • 4538 Views
  • 2 replies
  • 1 kudos

CRAS in @dlt

The Delta table created from the DataFrame returned by @dlt.create_table is confirmed to be overwritten when checked with the DESCRIBE HISTORY command. I want this to be handled as a CRAS (CREATE OR REPLACE TABLE ... AS SELECT), but how can I do this in python...

Latest Reply
siddhathPanchal
Databricks Employee
  • 1 kudos

Hi @rt-slowth, you can review the open-source Delta code base to learn more about the DeltaTableBuilder implementation in Python: https://github.com/delta-io/delta/blob/master/python/delta/tables.py

1 More Replies
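A minimal sketch of the CRAS the poster is after, done from Python by issuing the SQL statement directly. The catalog, schema, and table names below are placeholders, and a live SparkSession (e.g. inside a Databricks notebook) is assumed for the final `spark.sql()` call:

```python
# Sketch: issuing a CRAS (CREATE OR REPLACE TABLE ... AS SELECT) from Python.
# Table and query names are placeholders; only the statement construction is
# shown here, the spark.sql() call is left as a comment.

def build_cras(table_name: str, select_query: str) -> str:
    """Build a CREATE OR REPLACE TABLE ... AS SELECT statement."""
    return f"CREATE OR REPLACE TABLE {table_name} AS {select_query}"

stmt = build_cras("my_catalog.my_schema.events_clean",
                  "SELECT * FROM my_catalog.my_schema.events_raw")
print(stmt)
# In a notebook you would then run: spark.sql(stmt)
```

The linked tables.py also exposes a builder-style API (DeltaTable.createOrReplace) that achieves the same replace semantics without hand-written SQL.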
NanthakumarYoga
by New Contributor II
  • 1808 Views
  • 1 replies
  • 0 kudos

Parallel Processing: Pool with 8 Cores and a Standard Instance with 28 GB

Hi Team, we need your inputs on designing the pool for our parallel processing. We are processing around 4 to 5 GB files (the process involves adding a row number, removing the header/trailer, and adding an additional 8 columns which calculate over all 104 columns per ...

Latest Reply
siddhathPanchal
Databricks Employee
  • 0 kudos

Hi Nanthakumar, I also agree with the above solution. If it works for you, don't forget to press the 'Accept as Solution' button.

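As a rough illustration of the pool-sizing question: with 8 cores, a worker pool of 8 is a reasonable starting point for fanning file-processing tasks. The sketch below is plain Python with placeholder file names and a stand-in transform for the row-number/header-trailer logic described above, not the Databricks pool configuration itself:

```python
# Sketch: fanning file-processing tasks across a fixed-size pool sized to the
# core count. process_file is a stand-in for the real transform (add row
# number, strip header/trailer, derive the extra 8 columns).

from concurrent.futures import ThreadPoolExecutor

def process_file(name: str) -> str:
    # placeholder for the per-file transform
    return f"{name}:done"

files = [f"part_{i}.csv" for i in range(16)]
with ThreadPoolExecutor(max_workers=8) as pool:   # 8 workers for 8 cores
    results = list(pool.map(process_file, files))
print(len(results))
```

For CPU-bound 4-5 GB files on Databricks, letting Spark partition the files across executors usually beats a hand-rolled pool; the sketch only shows the concurrency shape.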
Erik_L
by Contributor II
  • 2836 Views
  • 2 replies
  • 0 kudos

Microbatching incremental updates Delta Live Tables

I need to create a workflow that pulls recent data from a database every two minutes, then transforms that data in various ways, and appends the results to a final table. The problem is that some of these changes _might_ update existing rows in the f...

Latest Reply
Manisha_Jena
Databricks Employee
  • 0 kudos

Hi @Erik_L, As my colleague mentioned, to ensure continuous operation of the Delta Live Tables pipeline compute during Workflow runs, choosing a prolonged Databricks Job over a triggered Databricks Workflow is a reliable strategy. This extended job w...

1 More Replies
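The core of the question is upsert semantics: each micro-batch must update existing rows on a key and append the rest. A plain-Python illustration of that merge behavior, with a hypothetical key and row shape (in Databricks this maps to a Delta MERGE or the DLT apply_changes API rather than this dict):

```python
# Sketch of upsert (MERGE) semantics per micro-batch: a row that matches an
# existing key updates it; an unmatched row is inserted. Toy in-memory
# "table", not the DLT API.

def merge_batch(target: dict, batch: list, key: str) -> dict:
    """Apply one micro-batch of row dicts to the target table (keyed by `key`)."""
    for row in batch:
        target[row[key]] = row  # matched -> update, not matched -> insert
    return target

table = {1: {"id": 1, "status": "open"}}
merge_batch(table, [{"id": 1, "status": "closed"},   # updates row 1
                    {"id": 2, "status": "open"}],    # inserts row 2
            "id")
print(table)
```

Because the updates mutate existing rows, a plain append-only streaming sink is not enough; the pipeline needs a merge step like the one sketched.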
dbx_deltaSharin
by New Contributor II
  • 3387 Views
  • 2 replies
  • 1 kudos

Resolved! Open sharing protocol in Databricks notebook

Hello, I use an Azure Databricks notebook to access Delta Sharing tables via the open sharing protocol. I've successfully uploaded the 'config.share' file to DBFS. Upon executing the commands:  client = delta_sharing.SharingClient(f"/dbfs/p...

Data Engineering
DELTA SHARING
Latest Reply
Manisha_Jena
Databricks Employee
  • 1 kudos

Hi @dbx_deltaSharin, when querying the individual partitions, the files are read using an S3 access point location, while the actual S3 bucket name is used when reading the table as a whole. This information is fetched from the table metadata it...

1 More Replies
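For context on the open sharing protocol the poster is using: a table is addressed as `<profile-file>#<share>.<schema>.<table>`, and that URL is what the delta-sharing Python package's loaders (e.g. delta_sharing.load_as_pandas) expect. A small sketch with placeholder share/schema/table names:

```python
# Sketch: building the table URL for the Delta Sharing open protocol.
# The profile path and share/schema/table names below are placeholders.

def sharing_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Compose a Delta Sharing table URL: <profile>#<share>.<schema>.<table>."""
    return f"{profile_path}#{share}.{schema}.{table}"

url = sharing_url("/dbfs/path/config.share", "my_share", "my_schema", "my_table")
print(url)
# With the delta-sharing package installed, you would then call:
# df = delta_sharing.load_as_pandas(url)
```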
dowdark
by New Contributor
  • 3158 Views
  • 2 replies
  • 0 kudos

UPDATE or DELETE with pipeline that needs to reprocess in DLT

I'm currently trying to replicate an existing pipeline that uses a standard RDBMS. I have no experience in Databricks at all. I have about 4-5 tables (much like dimensions) with different event types, and I want my pipeline to output a streaming table as the final o...

Latest Reply
Manisha_Jena
Databricks Employee
  • 0 kudos

Hi @dowdark, what is the error you get when the pipeline tries to update the rows instead of performing an insert? That should give us more information about the problem. Please raise a support (SF) case with us with this error and its complete stack trace.

1 More Replies
nbakh
by New Contributor II
  • 14806 Views
  • 3 replies
  • 4 kudos

Insert into a table with an identity column fails

I am trying to insert into a table with an identity column using a select query. However, if I include the identity column or ignore the identity column in my insert, it throws errors. Is there a way to insert into select * from a table if the insert t...

Latest Reply
karan_singh
New Contributor II
  • 4 kudos

Hi, specify the insert columns as below:
%sql
INSERT INTO demo_test (product_type, sales)
SELECT product_type, sales FROM demo

2 More Replies
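The fix in the reply generalizes: with a generated identity column you list only the non-identity columns on both sides of the INSERT ... SELECT. A small sketch that builds such a statement; the table and column names are the reply's example names, and the helper itself is hypothetical:

```python
# Sketch: build an INSERT ... SELECT that excludes the identity column, so
# the table generates the identity values itself.

def insert_excluding_identity(table, all_cols, identity_col, source):
    """Return an INSERT INTO statement that names every column except the identity."""
    cols = [c for c in all_cols if c != identity_col]
    col_list = ", ".join(cols)
    return f"INSERT INTO {table} ({col_list}) SELECT {col_list} FROM {source}"

stmt = insert_excluding_identity("demo_test",
                                 ["id", "product_type", "sales"],
                                 "id", "demo")
print(stmt)
```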
shiv4050
by New Contributor
  • 6042 Views
  • 4 replies
  • 0 kudos

Execute a Databricks notebook from Python source code

Hello, I'm trying to execute a Databricks notebook from Python source code but I'm getting an error. Source code below:

from databricks_api import DatabricksAPI

# Create a Databricks API client
api = DatabricksAPI(host='databrick_host', tok...

Latest Reply
sewl
New Contributor II
  • 0 kudos

The error you are encountering indicates that there is an issue with establishing a connection to the Databricks host specified in your code. Specifically, the error message "getaddrinfo failed" suggests that the hostname or IP address you provided f...

3 More Replies
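As the reply notes, "getaddrinfo failed" means the host string could not be resolved; here the code passes the literal placeholder 'databrick_host' instead of a real workspace URL. A quick sanity check one could run before constructing the client (the expected host shape is an assumption; adjust for your cloud):

```python
# Sketch: detect an unresolvable placeholder host before creating the API
# client. A real workspace host is an https URL with a dotted hostname,
# e.g. https://adb-1234567890123456.7.azuredatabricks.net

from urllib.parse import urlparse

def looks_like_workspace_host(host: str) -> bool:
    """Rough check that `host` is an https URL with a dotted hostname."""
    parsed = urlparse(host)
    return parsed.scheme == "https" and "." in parsed.netloc

print(looks_like_workspace_host("databrick_host"))                        # False
print(looks_like_workspace_host("https://adb-123.11.azuredatabricks.net"))  # True
```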
dataslicer
by Contributor
  • 11736 Views
  • 4 replies
  • 1 kudos

Successfully installed Maven coordinates com.crealytics:spark-excel_2.12:3.2.0_0.16.0 on the Azure DBX 9.1 LTS runtime, but getting an error for a missing dependency: org.apache.commons.io.IOUtils.byteArray(I)

I am using Azure DBX 9.1 LTS and successfully installed the following library on the cluster using Maven coordinates: com.crealytics:spark-excel_2.12:3.2.0_0.16.0. When I executed the following line: excelSDF = spark.read.format("excel").option("dataAdd...

Latest Reply
RamRaju
New Contributor II
  • 1 kudos

Hi @dataslicer, were you able to solve this issue? I am using the 9.1 LTS Databricks version with Spark 3.1.2 and Scala 2.12. I have installed com.crealytics:spark-excel-2.12.17-3.1.2_2.12:3.1.2_0.18.1. It was working fine but I am now facing the same exception a...

3 More Replies
SKC01
by New Contributor II
  • 3901 Views
  • 1 replies
  • 0 kudos

Delta table - version number change on merge

I am running a merge with PySpark on a Delta table in which nothing is getting updated in the target table. Still, the target table version is incremented when I check the table history. Is that expected behavior?

Data Engineering
Delta table
deltatable
history
MERGE
version
Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Yes, this is the expected behavior. In Delta Lake, every operation, including MERGE, is atomic. This means that each operation is a transaction that can either succeed completely or fail; it cannot have partial success. Even if the MERGE operation do...

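The reply's point can be illustrated with a toy commit log: every committed operation appends a history entry and bumps the version, even when it changed zero rows. This is plain Python to show the behavior, not the Delta Lake implementation:

```python
# Sketch: each MERGE is a transaction; a committed transaction adds a new
# table version even if it updated zero rows. Toy commit log only.

class ToyDeltaLog:
    def __init__(self):
        self.history = []  # one entry per committed operation

    def commit(self, operation: str, rows_updated: int):
        self.history.append({"version": len(self.history),
                             "operation": operation,
                             "rows_updated": rows_updated})

    @property
    def version(self):
        return len(self.history) - 1

log = ToyDeltaLog()
log.commit("WRITE", rows_updated=100)
log.commit("MERGE", rows_updated=0)   # no-op merge still commits
print(log.version)                    # version advanced to 1
```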
LiamS
by New Contributor
  • 5494 Views
  • 1 replies
  • 0 kudos

Resolved! Optimize table for joins using identity column

Hi there, I'm new to the Delta table format, so please bear with me if I've missed something obvious! I've migrated data from on-prem SQL to Fabric and stored two related tables as Delta tables. When I query data from these tables and join them based...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi, You mentioned that you have tried Z-ordering but it didn't impact the performance. Z-ordering is a technique that co-locates related information in the same set of files. It works best when the data is filtered by the column specified in the Z-or...

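The data-skipping idea behind the reply can be shown with a toy model: once rows are sorted by a column into files (1-D sorting here as a stand-in for Z-ordering), each file's min/max statistics let a filter on that column skip most files. Plain-Python illustration with made-up numbers:

```python
# Sketch: data skipping via per-file min/max stats after sorting by the
# filter column. Toy model of the effect Z-ordering exploits.

def file_stats(rows, files=4):
    """Sort rows, split into `files` chunks, return (min, max) per chunk."""
    rows = sorted(rows)
    size = len(rows) // files
    chunks = [rows[i * size:(i + 1) * size] for i in range(files)]
    return [(min(c), max(c)) for c in chunks]

stats = file_stats(range(100), files=4)
# A filter like `column == 60` only reads files whose [min, max] covers 60:
hit = [s for s in stats if s[0] <= 60 <= s[1]]
print(stats, len(hit))   # only 1 of 4 files needs to be read
```

This is why Z-ordering helps filtered scans most; a join on an identity column benefits mainly when the join can prune files the same way.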
prawan128
by New Contributor II
  • 1304 Views
  • 1 replies
  • 0 kudos

Triggering a job run on databricks compute cluster

Hi community, how do I set jar_params for the Databricks Jobs API when the jar_params value is greater than 10,000 bytes?

Latest Reply
prawan128
New Contributor II
  • 0 kudos

@Retired_mod I was asking about jar_params as mentioned in https://docs.databricks.com/en/workflows/jobs/jobs-2.0-api.html#request-structure, since for my use case it can be more than 10,000 bytes.

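One common workaround for the 10,000-byte jar_params cap (an assumption here, not an official API feature) is indirection: write the large payload to storage the job can read, such as DBFS, and pass only the short path as the parameter. A sketch of the size check that would trigger that fallback; the upload step is left abstract:

```python
# Sketch: decide whether a jar_params payload fits the Jobs API's documented
# 10,000-byte limit; if not, the caller would upload the payload and pass a
# short path instead (upload not shown).

import json

JAR_PARAMS_LIMIT = 10_000  # bytes, per the Jobs 2.0 API docs

def fits_inline(params: list) -> bool:
    """True if the serialized parameter list stays under the API limit."""
    return len(json.dumps(params).encode("utf-8")) <= JAR_PARAMS_LIMIT

big_param = "x" * 20_000
print(fits_inline(["small"]))    # True  -> pass directly in jar_params
print(fits_inline([big_param]))  # False -> upload payload, pass its path
```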
Mikes
by New Contributor
  • 1528 Views
  • 0 replies
  • 0 kudos

Databricks Unity Catalog: notebook lineage not showing up in table/view lineage or lineage graph

Notebook lineage is not showing up in the table & view lineage or lineage graph. I created two tables and one view from a notebook by following the doc: Capture and explore lineage. All lineages work fine, except the notebook lineage. Lineage graph: Here is m...

Data Engineering
azure
databricks unity catalog
lineage
Notebook