Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

erigaud
by Honored Contributor
  • 2105 Views
  • 2 replies
  • 2 kudos

Resolved! DLT - Unity Catalog and volume - Dynamically access volume path

Hello, we're using a DLT pipeline with an autoloader that reads from a volume inside Unity Catalog. The path of the volume is /Volumes/<my-catalog>/... How can I dynamically access the catalog value of the DLT pipeline to use it in the code? I don't w...

Latest Reply
erigaud
Honored Contributor
  • 2 kudos

Works perfectly, thank you! It's a shame the documentation does not detail that use case.
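For reference, one documented way to do this is to pass the catalog name as a pipeline configuration value and read it with spark.conf.get. A minimal sketch, assuming a hypothetical pipeline setting named my_catalog and hypothetical schema/volume names:

import dlt

# Read the catalog name from the pipeline's configuration
# (e.g. set "my_catalog": "dev" in the pipeline's Advanced settings).
catalog = spark.conf.get("my_catalog")
volume_path = f"/Volumes/{catalog}/my_schema/landing"  # hypothetical schema and volume

@dlt.table
def raw_events():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load(volume_path)
    )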

1 More Replies
rockybhai
by New Contributor II
  • 1507 Views
  • 1 replies
  • 3 kudos

Need urgent help

I am bringing 13,000 GB of data from Redshift to Databricks by reading it through Spark and then writing it as a Delta table, so what is the best cluster configuration and worker nodes you can suggest... if I need this to be done in 1 hour?

Data Engineering
clusteconfiguration
Databricks
dataengineering
redhsift
spark
Latest Reply
filipniziol
Esteemed Contributor
  • 3 kudos

Hi @rockybhai, transferring 13 TB of data from Amazon Redshift to Databricks and writing it as a Delta table within 1 hour is a significant task. Key considerations include network bandwidth and data transfer rate: to move 13 TB in 1 hour, you need a sustained d...
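For illustration, the read side of such a transfer is usually a partitioned JDBC (or Redshift connector) read so that every worker pulls data in parallel. A rough sketch, assuming the Redshift JDBC driver is available on the cluster; all connection values and the partition column are hypothetical:

# Partitioned JDBC read: numPartitions parallel queries split on a numeric column.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:redshift://my-cluster.example.us-east-1.redshift.amazonaws.com:5439/dev")
    .option("dbtable", "public.big_table")
    .option("user", "my_user")
    .option("password", "my_password")
    .option("partitionColumn", "id")   # numeric, roughly evenly distributed
    .option("lowerBound", "1")
    .option("upperBound", "1000000000")
    .option("numPartitions", "256")    # tune to the total worker core count
    .load()
)

df.write.format("delta").mode("overwrite").saveAsTable("bronze.big_table")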

Adam_Runarsson
by New Contributor II
  • 1903 Views
  • 3 replies
  • 0 kudos

Autoloader: Backfill on millions of files

Hi all! So I've been using Autoloader with File Notification mode against Azure to great success. Once past all the setup, it's rather seamless to use. I did have some issues in the beginning, which are related to my question. The storage account I'm work...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

The docs are pretty sparse on the backfill process, but I think backfill won't just scan the directory; it will instead read the checkpoint file. That seems logical to me, anyway.
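For what it's worth, file-notification streams can also be told to periodically reconcile against a full directory listing via the documented cloudFiles.backfillInterval option. A minimal sketch with a hypothetical path:

df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useNotifications", "true")
    # Periodically list the source to pick up files the notifications missed.
    .option("cloudFiles.backfillInterval", "1 day")
    .load("abfss://container@account.dfs.core.windows.net/landing")  # hypothetical
)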

2 More Replies
peta
by New Contributor II
  • 4423 Views
  • 3 replies
  • 0 kudos

Slow read using Snowflake connector in Databricks.

Hi, I am trying to read a table from Snowflake with the Databricks native Snowflake JDBC connector. It goes well for a small amount of data (100 rows), but if I add more (even just 1,000 rows) the query does not finish. I was checking if the query fi...

Latest Reply
sri123
New Contributor II
  • 0 kudos

Hi @peta, has your issue been resolved? I'm facing a similar issue. If it has, can you please post the steps you followed to resolve it? Thanks & regards, Sri
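While the thread is unresolved, the usual shape of a read through the built-in Snowflake connector looks like the sketch below (all connection values are hypothetical placeholders); comparing it against the failing code can help narrow things down:

options = {
    "sfUrl": "myaccount.snowflakecomputing.com",
    "sfUser": "my_user",
    "sfPassword": "my_password",
    "sfDatabase": "MY_DB",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "MY_WH",  # an undersized or suspended warehouse can stall larger reads
}

df = (
    spark.read.format("snowflake")
    .options(**options)
    .option("dbtable", "MY_TABLE")
    .load()
)
df.limit(1000).display()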

2 More Replies
DJey
by New Contributor III
  • 26384 Views
  • 6 replies
  • 2 kudos

Resolved! MergeSchema Not Working

Hi all, I have a scenario where my existing Delta table looks like below. Now I have incremental data with an additional column, i.e. owner (DataFrame name: scdDF). Below is the code snippet to merge the incremental DataFrame into targetTable, but the new...

Latest Reply
Amin112
New Contributor II
  • 2 kudos

In Databricks Runtime 15.2 and above, you can specify schema evolution in a merge statement using SQL or Delta table APIs:
MERGE WITH SCHEMA EVOLUTION INTO target
USING source
ON source.key = target.key
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN I...
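Spelled out in full (runnable on DBR 15.2 and above; the target/source table names and key column here are hypothetical), the documented statement is:

spark.sql("""
    MERGE WITH SCHEMA EVOLUTION INTO target
    USING source
    ON source.key = target.key
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")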

5 More Replies
Brad
by Contributor II
  • 1119 Views
  • 2 replies
  • 0 kudos

Why driver memory is capped

Hi team, we are using a job cluster to run Spark with MERGE. Somehow it needs a lot of driver memory. We allocate a 128 GB + 16-core node for the driver, and specify spark.driver.memory=96000m. We can see it is 96000m in the Environment tab of the Spark UI. The config is like: "...

Latest Reply
Brad
Contributor II
  • 0 kudos

Thanks for the response. We are wondering why driver memory cannot be fully used (only 48 GB out of 128 GB is used for the driver). Is this related to repartitioning?
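As a quick sanity check, the effective driver heap can be inspected from the notebook itself; a small sketch using standard Spark/JVM calls (note the JVM reserves part of the node's memory for the OS and off-heap use, so the reported maximum is normally well below the node size):

print(spark.conf.get("spark.driver.memory"))  # the configured value, e.g. 96000m

# Maximum heap the driver JVM will actually use, inspected via py4j.
runtime = spark.sparkContext._jvm.java.lang.Runtime.getRuntime()
print(runtime.maxMemory() / 1024**3, "GiB max JVM heap")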

1 More Replies
leungi
by Contributor
  • 2964 Views
  • 5 replies
  • 1 kudos

Resolved! Unable to read Unity Catalog schema

Recently bumped into this (first-time) error, without a clear message as to the cause. Insights welcomed.

Latest Reply
leungi
Contributor
  • 1 kudos

@DivyaPasumarthi the issue still persists, but I found a workaround: go to the SQL Editor module, expand the Catalog panel on the left, highlight the desired table, then right-click > Open in Catalog Explorer.

4 More Replies
TamD
by Contributor
  • 6901 Views
  • 6 replies
  • 0 kudos

Resolved! SELECT from VIEW to CREATE a table or view

Hi; I'm new to Databricks, so apologies if this is a dumb question. I have a notebook with SQL cells that select data from various Delta tables into temporary views. Then I have a query that joins up the data from these temporary views. I'd lik...

Latest Reply
TamD
Contributor
  • 0 kudos

Thanks, FelixIvy. Just to clarify, the reason you can't use temporary views to load a materialized view is that materialized views (like regular views) must be created using a single query that is saved as part of the view definition. So the sol...
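Concretely, the temp-view definitions get folded into the one defining query as CTEs; a hedged sketch with hypothetical catalog/schema/table names (note that materialized views also require a serverless SQL warehouse or a DLT pipeline):

spark.sql("""
    CREATE OR REPLACE MATERIALIZED VIEW my_catalog.my_schema.combined AS
    WITH a AS (SELECT id, value FROM my_catalog.my_schema.table_a),
         b AS (SELECT id, other_value FROM my_catalog.my_schema.table_b)
    SELECT a.id, a.value, b.other_value
    FROM a JOIN b ON a.id = b.id
""")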

5 More Replies
Dave_Nithio
by Contributor II
  • 1128 Views
  • 1 replies
  • 1 kudos

OAuth U2M AWS Token Failure

I am attempting to generate a manual OAuth token using the instructions for AWS. When attempting to generate the account-level authentication code I run into a localhost error. I have confirmed that all variables and URLs are correct and that I am log...

Latest Reply
Dave_Nithio
Contributor II
  • 1 kudos

After investigating further, the localhost issue was because I was already logged in and did not need to log in again. The returned URL contained the authorization code. I was able to authenticate and run account-level API requests with the generated ...
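For anyone retracing these steps, the code-for-token exchange is roughly as follows, assuming the manual U2M flow from the AWS docs (the account ID, PKCE verifier, and authorization code are placeholders):

import requests

account_id = "<account-id>"  # placeholder
token_url = f"https://accounts.cloud.databricks.com/oidc/accounts/{account_id}/v1/token"

resp = requests.post(
    token_url,
    data={
        "client_id": "databricks-cli",            # public client used by the manual flow
        "grant_type": "authorization_code",
        "scope": "all-apis",
        "redirect_uri": "http://localhost:8020",
        "code_verifier": "<pkce-code-verifier>",  # placeholder
        "code": "<authorization-code>",           # placeholder, taken from the returned URL
    },
)
access_token = resp.json()["access_token"]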

vdeorios
by New Contributor II
  • 5121 Views
  • 5 replies
  • 2 kudos

Resolved! 404 on GET Billing usage data (API)

I'm trying to get my billing usage data from the Databricks API (documentation: https://docs.databricks.com/api/gcp/account/billableusage/download) but I keep getting a 404 error. Code:
import requests
import json
token = dbutils.notebook.entry_point.getDbu...

Latest Reply
Dave_Nithio
Contributor II
  • 2 kudos

Bumping this to see if there is a solution. Per Databricks, basic authentication is no longer allowed. I am unable to authenticate to get access to this endpoint (401 error). Does anyone have a solution for querying this endpoint?
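A hedged sketch of the same call with an account-level OAuth bearer token instead of basic auth (account ID, token, and months are placeholders; endpoint per the linked documentation):

import requests

account_id = "<account-id>"            # placeholder
access_token = "<oauth-access-token>"  # placeholder, e.g. from the U2M flow above
url = f"https://accounts.gcp.databricks.com/api/2.0/accounts/{account_id}/usage/download"

resp = requests.get(
    url,
    headers={"Authorization": f"Bearer {access_token}"},
    params={"start_month": "2024-01", "end_month": "2024-03"},
)
print(resp.status_code)
print(resp.text[:500])  # CSV content on success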

4 More Replies
richakamat130
by New Contributor
  • 1710 Views
  • 4 replies
  • 2 kudos

Change datetime format from one to another without changing datatype in Databricks SQL

Change datetime"2002-01-01T00:00:00.000" to 'MM/dd/yyyy HH:mm:ss' format without changing datatype/ having it in datetime data type

Latest Reply
filipniziol
Esteemed Contributor
  • 2 kudos

Hi @Mister-Dinky, as @szymon_dybczak said, if you have a datetime, then you have a datetime. What you see is just a display format defined in the Databricks UI. Other applications may display it differently depending on defaults, regional formats, etc. If you ...
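To make the distinction concrete, formatting produces a string column while the underlying value keeps its timestamp type; a small sketch with hypothetical column names:

from pyspark.sql import functions as F

df = spark.createDataFrame([("2002-01-01T00:00:00.000",)], ["ts_str"])
df = df.withColumn("ts", F.to_timestamp("ts_str"))                            # timestamp type
df = df.withColumn("ts_display", F.date_format("ts", "MM/dd/yyyy HH:mm:ss"))  # string type
df.printSchema()  # ts: timestamp, ts_display: string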

3 More Replies
ChrisLawford_n1
by Contributor
  • 3413 Views
  • 3 replies
  • 1 kudos

Autoloader configuration for multiple tables from the same directory

I would like to get a recommendation on how to structure ingestion of lots of tables of data. I am currently using Autoloader with the directory searching mode. I have concerns about performance in the future and have a requirement to ensure that data...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

There is an easier way to see what has been processed:
SELECT * FROM cloud_files_state('path/to/checkpoint');
https://docs.databricks.com/en/ingestion/cloud-object-storage/auto-loader/production.html

2 More Replies
KristiLogos
by Contributor
  • 1460 Views
  • 2 replies
  • 0 kudos

Autoloader not ingesting all file data into Delta Table from Azure Blob Container

I have done the following, i.e. created a Delta table where I plan to load the Azure Blob container files that are .json.gz files:
df = spark.read.option("multiline", "true").json(f"{container_location}/*.json.gz")
DeltaTable.create(spark) \
    .addCol...

Latest Reply
gchandra
Databricks Employee
  • 0 kudos

If it's streaming data, space it out with a 10-second trigger: .trigger(processingTime="10 seconds"). Do all the JSON files have the same schema? As your table creation is dynamic (df.schema), if all the JSON files don't have the same schema they may be skipp...
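Putting the reply's two suggestions together, a hedged Auto Loader sketch with an explicit schema and a spaced-out trigger (the path, checkpoint, schema, and table name are all hypothetical):

from pyspark.sql.types import StructType, StructField, StringType

container_location = "abfss://container@account.dfs.core.windows.net/data"  # hypothetical
schema = StructType([
    StructField("id", StringType()),
    StructField("payload", StringType()),
])  # hypothetical; an explicit schema avoids rows being dropped on schema drift

(
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .schema(schema)
    .load(f"{container_location}/*.json.gz")
    .writeStream
    .option("checkpointLocation", f"{container_location}/_checkpoint")
    .trigger(processingTime="10 seconds")
    .toTable("my_catalog.my_schema.raw_json")
)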

1 More Replies
Brad
by Contributor II
  • 941 Views
  • 1 replies
  • 0 kudos

How to set file size for MERGE

Hi team, I use MERGE to merge a source into a target table. The source is read incrementally with a checkpoint on a Delta table. The target is a Delta table without any partitions. If the table is empty, with spark.databricks.delta.optimizeWrite.enabled it can create fil...

Latest Reply
filipniziol
Esteemed Contributor
  • 0 kudos

Hi @Brad, there are a couple of considerations here, the main ones being your runtime version and whether you are using Unity Catalog. Check this document: https://docs.databricks.com/en/delta/tune-file-size.html
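One of the knobs documented there is the per-table target file size; a minimal sketch with a hypothetical table name:

spark.sql("""
    ALTER TABLE my_catalog.my_schema.target_table
    SET TBLPROPERTIES ('delta.targetFileSize' = '128mb')
""")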

Brad
by Contributor II
  • 1437 Views
  • 3 replies
  • 0 kudos

Will MERGE incur a lot of driver memory?

Hi team, we have a job that runs MERGE on a target table with around 220 million rows. We found it needs a lot of driver memory (just for the MERGE itself). From the job metrics we can see the MERGE needs at least 46 GB of memory. Is there some special thing to mak...

Latest Reply
filipniziol
Esteemed Contributor
  • 0 kudos

Hi @Brad, could you try to apply some very standard optimization practices and check the outcome? 1. If your runtime is greater than or equal to 15.2, could you implement liquid clustering on the source and target tables using the JOIN columns? ALTER TABLE <table_name> CL...
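A sketch of the liquid-clustering suggestion from the reply, with hypothetical table and column names (runtime requirement per the reply's note):

spark.sql("ALTER TABLE my_catalog.my_schema.target_table CLUSTER BY (join_key)")
# Rewrite existing data so it follows the new clustering keys.
spark.sql("OPTIMIZE my_catalog.my_schema.target_table")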

2 More Replies