Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

chevichenk
by New Contributor III
  • 1834 Views
  • 1 reply
  • 1 kudos

Resolved! Why do I still see Delta history after a VACUUM with just 5 hours retention?

Hi, everyone! I executed a VACUUM with a 5-hour retention, but I can still see the whole version history, and I can even query those older versions of the table. Plus, when I view the version history, it doesn't start with zero (supposed to be the creation of the t...

Latest Reply
Rom
New Contributor III
  • 1 kudos

Hi! When disk caching is enabled, a cluster might contain data from Parquet files that have been deleted with VACUUM. Therefore, it may be possible to query the data of previous table versions whose files have been deleted. Restarting the cluster will...
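For reference, a minimal sketch of the commands involved, assuming a hypothetical Delta table named events. Vacuuming below the default 7-day window requires disabling Delta's retention safety check, and note that VACUUM removes data files, not transaction-log entries, so DESCRIBE HISTORY will still list old versions even when time travel to them no longer works:

    # Hypothetical table name. VACUUM below the 7-day default requires
    # disabling the retention safety check first.
    spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")
    spark.sql("VACUUM events RETAIN 5 HOURS")

    # History entries survive VACUUM because it deletes data files,
    # not the transaction log.
    spark.sql("DESCRIBE HISTORY events").show()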

MikeK_
by New Contributor II
  • 40661 Views
  • 6 replies
  • 0 kudos

Resolved! SQL Update Join

Hi, I'm importing some data and stored procedures from SQL Server into Databricks. I noticed that updates with joins are not supported in Spark SQL; what's the alternative I can use? Here's what I'm trying to do: update t1 set t1.colB=CASE WHEN t2....
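For the UPDATE ... JOIN pattern, the usual Spark SQL / Delta replacement is MERGE INTO. A hedged sketch, since the original statement is truncated; the join key id and the column names here are illustrative only:

    # Hypothetical join key and columns; MERGE INTO plays the role of
    # SQL Server's UPDATE ... FROM ... JOIN on Delta tables.
    spark.sql("""
        MERGE INTO t1
        USING t2
        ON t1.id = t2.id
        WHEN MATCHED THEN UPDATE SET
          t1.colB = CASE WHEN t2.colA IS NULL THEN t1.colB ELSE t2.colA END
    """)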

Latest Reply
LyderIversen
New Contributor II
  • 0 kudos

Hi! This is way late, but did you ever find a solution to the CROSS APPLY part of your question? Is it possible to do CROSS APPLY in Spark SQL, or is there something you can use instead?
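Spark SQL has no CROSS APPLY keyword, but LATERAL VIEW with a generator function covers the common case of applying a row-expanding expression per row (recent Spark versions also accept LATERAL subqueries in the FROM clause). A sketch with a hypothetical orders table holding an array column:

    # Hypothetical schema: orders(id INT, items ARRAY<STRING>).
    # LATERAL VIEW explode() stands in for CROSS APPLY over a
    # table-valued expression.
    spark.sql("""
        SELECT o.id, item
        FROM orders o
        LATERAL VIEW explode(o.items) AS item
    """).show()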

5 More Replies
andyh
by New Contributor
  • 2686 Views
  • 2 replies
  • 0 kudos

Resolved! Job queue for pool limit

I have a cluster pool with a max capacity limit, to make sure we're not burning too much extra silicon. We use this for some of our less critical workflows/jobs. They still spend a lot of time idle, but sometimes hit this max capacity limit. Is there a way...

Latest Reply
SSundaram
Contributor
  • 0 kudos

Try increasing your max capacity limit, and you might want to bring down the minimum number of nodes the job uses. At the job level, try configuring retries and the time interval between retries.
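For the retry piece, the Jobs API exposes per-task retry settings. A minimal sketch of a task fragment, with the task name and notebook path hypothetical:

    # Fragment of a Jobs API 2.1 task spec; names and path are hypothetical.
    # Retries let a run that failed on pool capacity try again later.
    task = {
        "task_key": "ingest",
        "notebook_task": {"notebook_path": "/Jobs/ingest"},
        "max_retries": 3,
        "min_retry_interval_millis": 120000,  # 2 minutes between attempts
        "retry_on_timeout": True,
    }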

1 More Replies
yutaro_ono1_558
by New Contributor II
  • 11368 Views
  • 2 replies
  • 1 kudos

How to read data from an S3 Access Point with pyspark?

I want to read data from an S3 access point. I successfully accessed the data through the S3 access point using the boto3 client:

    s3 = boto3.resource('s3')
    ap = s3.Bucket('arn:aws:s3:[region]:[aws account id]:accesspoint/[S3 Access Point name]')
    for obj in ap.object...
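On the Spark side, recent hadoop-aws builds can route an S3A bucket alias through an access point ARN via per-bucket configuration; whether the Databricks runtime in use supports this is an assumption, so treat the sketch below (alias myap, placeholder ARN) as a starting point:

    # Assumes a hadoop-aws version with S3 access point support; the alias
    # "myap" and the ARN placeholders are hypothetical.
    spark.conf.set(
        "spark.hadoop.fs.s3a.bucket.myap.accesspoint.arn",
        "arn:aws:s3:[region]:[aws account id]:accesspoint/[S3 Access Point name]",
    )
    df = spark.read.json("s3a://myap/path/to/data/")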

Latest Reply
shrestha-rj
New Contributor II
  • 1 kudos

I'm reaching out to seek assistance as I navigate an issue. Currently, I'm trying to read JSON files from an S3 Multi-Region Access Point using a Databricks notebook. While reading directly from the S3 bucket presents no challenges, I encounter an "j...

1 More Replies
rbricks007
by New Contributor II
  • 3751 Views
  • 1 reply
  • 0 kudos

Trying to use pivot function with pyspark for count aggregate

I'm trying this code but getting the following error:

    testDF = (eventsDF
        .groupBy("user_id")
        .pivot("event_name")
        .count("event_name"))

    TypeError: _api() takes 1 positional argument but 2 were given

Please guide how to fix...

Data Engineering
count
pivot
python
Latest Reply
Krishnamatta
Contributor
  • 0 kudos

Try this:

    from pyspark.sql import functions as F

    testDF = (eventsDF
        .groupBy("user_id")
        .pivot("event_name")
        .agg(F.count("event_name")))
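The original error comes from the pivoted GroupedData: its count() method takes no column argument, so count("event_name") raises the TypeError; routing the aggregation through agg(F.count(...)) is the supported form.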

victor-nj-miami
by New Contributor II
  • 8953 Views
  • 2 replies
  • 2 kudos

Resolved! Cannot create a metastore anymore

Hi Community, I am trying to create a metastore for the Unity Catalog, but I am getting an error saying that there is already a metastore in the region, which is not true, because I deleted all the metastores. I used to have one working properly, but ...

Latest Reply
karthik_p
Esteemed Contributor
  • 2 kudos

@ashu_sama I see your issue got resolved by clearing or purging the revision history; can you mark this thread as resolved?

1 More Replies
Baldur
by New Contributor II
  • 3350 Views
  • 3 replies
  • 1 kudos

Unable to follow H3 Quickstart

Hello, I'm following the H3 quickstart (Databricks SQL) tutorial because I want to do point-in-polygon queries on 21k polygons and 95B points. The volume is pushing me towards using H3. In the tutorial, they use geopandas. According to H3 geospatial functio...
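For the point-in-polygon pattern at that scale, the built-in H3 SQL functions avoid geopandas entirely. A hedged sketch, with table and column names hypothetical; note the cell join is approximate, so boundary cells may need a precise containment check afterwards:

    # Hypothetical tables: points(lon, lat, ...) and polygons(polygon_id, geojson).
    # Index both sides at the same H3 resolution, then join on the cell.
    spark.sql("""
        SELECT p.*, g.polygon_id
        FROM (SELECT *, h3_longlatash3(lon, lat, 10) AS cell FROM points) p
        JOIN (SELECT polygon_id, explode(h3_polyfillash3(geojson, 10)) AS cell
              FROM polygons) g
        ON p.cell = g.cell
    """)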

Data Engineering
Geopandas
H3
Latest Reply
siddhathPanchal
Databricks Employee
  • 1 kudos

Hi @Baldur, I hope the above answer solved your problem. If you have any follow-up questions, please let us know. If you like the solution, please don't forget to press the 'Accept as Solution' button.

2 More Replies
rt-slowth
by Contributor
  • 3932 Views
  • 2 replies
  • 1 kudos

CRAS in @dlt

The Delta table created from the DataFrame returned by @dlt.create_table is confirmed to be overwritten when checked with the DESCRIBE HISTORY command. I want this to be handled as a CRAS (CREATE AS SELECT), but how can I do this in python...

Latest Reply
siddhathPanchal
Databricks Employee
  • 1 kudos

Hi @rt-slowth, you can review the open-source Delta code base to learn more about the DeltaTableBuilder implementation in Python: https://github.com/delta-io/delta/blob/master/python/delta/tables.py
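For reference, the builder pattern in that file looks roughly like this; a minimal sketch with hypothetical table and column names:

    # Minimal DeltaTableBuilder usage from delta-spark; the table name
    # and columns are hypothetical.
    from delta.tables import DeltaTable

    (DeltaTable.createIfNotExists(spark)
        .tableName("my_schema.events")
        .addColumn("id", "BIGINT")
        .addColumn("payload", "STRING")
        .execute())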

1 More Replies
NanthakumarYoga
by New Contributor II
  • 1513 Views
  • 1 reply
  • 0 kudos

Parallel Processing: Pool with 8 Cores and a Standard Instance with 28 GB

Hi Team, we need your input on designing the pool for our parallel processing. We are processing files of around 4 to 5 GB (the process adds a row number, removes the header/trailer, and adds 8 additional columns calculated over all 104 columns per ...

Latest Reply
siddhathPanchal
Databricks Employee
  • 0 kudos

Hi Nanthakumar, I also agree with the above solution. If this solution works for you, don't forget to press the 'Accept as Solution' button.

Erik_L
by Contributor II
  • 2415 Views
  • 2 replies
  • 0 kudos

Microbatching incremental updates Delta Live Tables

I need to create a workflow that pulls recent data from a database every two minutes, then transforms that data in various ways, and appends the results to a final table. The problem is that some of these changes _might_ update existing rows in the f...

Latest Reply
Manisha_Jena
Databricks Employee
  • 0 kudos

Hi @Erik_L, as my colleague mentioned, to ensure continuous operation of the Delta Live Tables pipeline compute during Workflow runs, choosing a prolonged Databricks Job over a triggered Databricks Workflow is a reliable strategy. This extended job w...
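For the appends-that-might-be-updates part specifically, DLT's CDC API upserts instead of blindly appending. A hedged sketch with hypothetical table and column names:

    # Hypothetical source/target names; apply_changes upserts rows keyed
    # by `id`, using `updated_at` to order competing changes.
    # (Older runtimes used create_target_table instead of create_streaming_table.)
    import dlt
    from pyspark.sql import functions as F

    dlt.create_streaming_table("final_table")

    dlt.apply_changes(
        target="final_table",
        source="recent_source_rows",
        keys=["id"],
        sequence_by=F.col("updated_at"),
    )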

1 More Replies
dbx_deltaSharin
by New Contributor II
  • 2892 Views
  • 2 replies
  • 1 kudos

Resolved! Open sharing protocol in Databricks notebook

Hello, I utilize an Azure Databricks notebook to access Delta Sharing tables, employing the open sharing protocol. I've successfully uploaded the 'config.share' file to DBFS. Upon executing the commands: client = delta_sharing.SharingClient(f"/dbfs/p...
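For reference, the open sharing protocol flow generally looks like this; the profile path and share/schema/table names below are hypothetical:

    # Hypothetical profile path and table coordinates.
    import delta_sharing

    profile = "/dbfs/path/to/config.share"
    client = delta_sharing.SharingClient(profile)
    print(client.list_all_tables())  # enumerate tables visible in the share

    # Table URL format: <profile-path>#<share>.<schema>.<table>
    df = delta_sharing.load_as_spark(f"{profile}#my_share.my_schema.my_table")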

Data Engineering
DELTA SHARING
Latest Reply
Manisha_Jena
Databricks Employee
  • 1 kudos

Hi @dbx_deltaSharin, when querying the individual partitions, the files are read using an S3 access point location, while the actual S3 bucket name is used when reading the table as a whole. This information is fetched from the table metadata it...

1 More Replies
dowdark
by New Contributor
  • 2681 Views
  • 2 replies
  • 0 kudos

UPDATE or DELETE with pipeline that needs to reprocess in DLT

I'm currently trying to replicate an existing pipeline that uses a standard RDBMS; I have no experience in Databricks at all. I have about 4-5 tables (much like dimensions) with different event types, and I want my pipeline to output a streaming table as the final o...

Latest Reply
Manisha_Jena
Databricks Employee
  • 0 kudos

Hi @dowdark, what is the error that you get when the pipeline tries to update the rows instead of performing an insert? That should give us more info about the problem. Please raise an SF case with us with this error and its complete stack trace.

1 More Replies
nbakh
by New Contributor II
  • 13379 Views
  • 3 replies
  • 4 kudos

insert into a table with an identity column fails

I am trying to insert into a table with an identity column using a select query. However, whether I include or omit the identity column in my insert, it throws errors. Is there a way to do an INSERT INTO ... SELECT * FROM a table if the insert t...

Latest Reply
karan_singh
New Contributor II
  • 4 kudos

Hi, specify the insert columns as below:

    %sql
    INSERT INTO demo_test (product_type, sales)
    SELECT product_type, sales FROM demo
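Whether the identity column may appear in the column list at all depends on how it was declared; a hedged sketch of the distinction (the table name and schema are hypothetical):

    # GENERATED ALWAYS rejects explicit values for `id`, so inserts must
    # omit it (as above); GENERATED BY DEFAULT would allow explicit values.
    spark.sql("""
        CREATE TABLE demo_test (
          id BIGINT GENERATED ALWAYS AS IDENTITY,
          product_type STRING,
          sales BIGINT
        ) USING DELTA
    """)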

2 More Replies
