Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Rajdeepak
by New Contributor
  • 1717 Views
  • 1 reply
  • 0 kudos

How to restart failed spark stream job from the failure point

I am setting up an ETL process using PySpark. My input is a Kafka stream and I am writing output to multiple sinks (one into Kafka and another into cloud storage). I am writing checkpoints to cloud storage. The issue I am facing is that, whenever m...

Latest Reply
Retired_mod
Esteemed Contributor III
  • 0 kudos

Hi @Rajdeepak, To address data redundancy issues caused by reprocessing during application restarts, consider these strategies: Ensure proper checkpointing by configuring and protecting your checkpoint directory; manage Kafka offsets correctly by set...
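A minimal Structured Streaming sketch of the checkpointing setup described above; the broker address, topic names, and storage paths are placeholders, not taken from the thread. Each sink gets its own checkpoint location, so a restarted query resumes from the last committed offsets instead of reprocessing from the beginning.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read from Kafka; startingOffsets only applies on the very first run --
# after that, offsets come from the checkpoint.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
    .option("subscribe", "input-topic")                  # placeholder topic
    .option("startingOffsets", "earliest")
    .load()
)

# Sink 1: back to Kafka, with its own checkpoint directory.
kafka_sink = (
    events.selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
    .writeStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("topic", "output-topic")
    .option("checkpointLocation", "s3://my-bucket/checkpoints/kafka-sink/")    # placeholder path
    .start()
)

# Sink 2: cloud storage (Delta), with a separate checkpoint directory.
storage_sink = (
    events.writeStream
    .format("delta")
    .option("checkpointLocation", "s3://my-bucket/checkpoints/storage-sink/")  # placeholder path
    .start("s3://my-bucket/bronze/events/")
)

spark.streams.awaitAnyTermination()
```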

reachrishav
by New Contributor II
  • 1679 Views
  • 1 reply
  • 0 kudos

What is the equivalent of "if exists()" in databricks sql?

What is the equivalent of the below SQL Server syntax in Databricks SQL? There are cases where I need to execute a block of SQL code on certain conditions. I know this can be achieved with spark.sql, but the problem with spark.sql() is it does not p...

Latest Reply
Retired_mod
Esteemed Contributor III
  • 0 kudos

Hi @reachrishav, In Databricks SQL, you can replicate SQL Server's conditional logic using `CASE` statements and `MERGE` operations. Since Databricks SQL doesn't support `IF EXISTS` directly, you can create a temporary view to check your condition an...
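A sketch of the temp-view approach mentioned above; the table names, key, and condition are invented for illustration. The MERGE source is a view that is empty unless the condition holds, so the statement becomes a no-op when the check fails.

```python
# Hypothetical example of gating a MERGE on an existence-style condition.
spark.sql("""
    CREATE OR REPLACE TEMP VIEW pending_orders AS
    SELECT * FROM staging_orders
    WHERE status = 'NEW'            -- the "IF EXISTS" condition
""")

spark.sql("""
    MERGE INTO target_orders AS t
    USING pending_orders AS s
    ON t.order_id = s.order_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```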

ADB0513
by New Contributor III
  • 2669 Views
  • 1 reply
  • 0 kudos

Pass variable from one notebook to another

I have a main notebook where I am setting a Python variable to the name of the catalog I want to work in. I then call another notebook, using %run, which runs an INSERT INTO via a SQL command where I want to specify the catalog using the catalog v...

Latest Reply
Retired_mod
Esteemed Contributor III
  • 0 kudos

Hi @ADB0513, To pass variables between notebooks in Databricks, you can use three main methods: **Widgets**, where you create and retrieve parameters using `dbutils.widgets` in both notebooks; **spark.conf**, where you set and get configuration param...
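A brief sketch of the widgets and `spark.conf` options described above; the notebook structure, catalog name, and table names are assumptions for illustration.

```python
# --- Main notebook ---
catalog = "dev_catalog"                          # hypothetical catalog name
spark.conf.set("my.pipeline.catalog", catalog)   # spark.conf: shared because %run reuses the session
# %run ./child_notebook

# --- Child notebook ---
dbutils.widgets.text("catalog", "")              # widgets: useful when calling via dbutils.notebook.run
catalog = dbutils.widgets.get("catalog") or spark.conf.get("my.pipeline.catalog")
spark.sql(f"INSERT INTO {catalog}.bronze.events SELECT * FROM {catalog}.staging.events")
```

With %run the child notebook also shares the Python namespace, so the plain variable defined in the main notebook is visible there as well.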

semsim
by Contributor
  • 3590 Views
  • 1 reply
  • 1 kudos

List and iterate over files in Databricks workspace

Hi DE Community, I need to be able to list/iterate over a set of files in a specific directory within the Databricks workspace. For example: "/Workspace/SharedFiles/path/to/file_1" ... "/Workspace/SharedFiles/path/to/file_n". Thanks for your direction and ...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @semsim, you can use the file system utility (dbutils.fs):
Databricks Utilities (dbutils) reference | Databricks on AWS
Work with files on Databricks | Databricks on AWS
dbutils.fs.ls("file:/Workspace/Users/<user-folder>/")
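And a small follow-up sketch for the iteration part; the folder path and the filter are placeholders.

```python
# List workspace files and iterate; the file:/ prefix targets the workspace filesystem.
for f in dbutils.fs.ls("file:/Workspace/SharedFiles/path/to/"):
    if f.name.endswith(".csv"):       # filter however you need
        print(f.path, f.size)
```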

Zeruno
by New Contributor II
  • 1810 Views
  • 1 reply
  • 0 kudos

DLT - Get pipeline_id and update_id

I need to insert pipeline_id and update_id in my Delta Live Table (DLT), the point being to know which pipeline created which row. How can I obtain this information? I know you can get job_id and run_id from widgets, but I don't know if these are the s...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Zeruno, those values are rather static. Maybe you can design a process that, as a first step, extracts that information from the List Pipelines API and saves it to a Delta table: List pipelines | Pipelines API | REST API reference | Databricks on AWS. Then in...
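A rough sketch of that first step, assuming a token stored in a secret scope and the GET /api/2.0/pipelines endpoint; the response field names follow the List Pipelines reference as I understand it (verify against the docs), and the target table is hypothetical.

```python
import requests

host = "https://<workspace-url>"                                        # assumed workspace URL
token = dbutils.secrets.get(scope="my-scope", key="databricks-token")   # assumed secret

resp = requests.get(f"{host}/api/2.0/pipelines",
                    headers={"Authorization": f"Bearer {token}"})
resp.raise_for_status()
pipelines = resp.json().get("statuses", [])        # pagination omitted for brevity

rows = [
    (p.get("pipeline_id"),
     p.get("name"),
     (p.get("latest_updates") or [{}])[0].get("update_id"))
    for p in pipelines
]

# Persist a snapshot so DLT code can join pipeline metadata onto its rows.
(spark.createDataFrame(rows, "pipeline_id string, name string, latest_update_id string")
      .write.mode("overwrite").saveAsTable("ops.pipeline_registry"))    # hypothetical table
```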

vadi
by New Contributor
  • 670 Views
  • 2 replies
  • 0 kudos

csv file processing

What's the best possible solution to process a CSV file in Databricks? Please consider scalability, optimization, and QA, and give me the best solution...

Latest Reply
Retired_mod
Esteemed Contributor III
  • 0 kudos

Hi @vadi, Thanks for reaching out! Please review the response and let us know if it answers your question. Your feedback is valuable to us and the community. If the response resolves your issue, kindly mark it as the accepted solution. This will help...

1 More Reply
Shazaamzaa
by New Contributor III
  • 1915 Views
  • 1 reply
  • 0 kudos

Setup dbt-core with Azure Entra ID

Hey team, I'm trying to standardize the development environment setup in our team. I've written up a shell script that I want our devs to run in WSL2 after setup. The shell script does the following: 1. Set up Azure CLI - install and authenticate; 2. Ins...

Latest Reply
Shazaamzaa
New Contributor III
  • 0 kudos

Hey @Retired_mod thanks for the response. I persisted a little more with the logs and the issue appears to be related to WSL2 not having a backend credential manager to handle management of tokens supplied by the OAuth process. To be honest, this is ...

acj1459
by New Contributor
  • 595 Views
  • 0 replies
  • 0 kudos

Azure Databricks Data Load

Hi All, I have 10 tables on an on-prem MS SQL DB and want to load the data from those 10 tables incrementally into Bronze Delta tables as append-only. From Bronze to Silver, I want to use a merge query to load the latest records into the Silver Delta tables. Whatever latest...
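A minimal sketch of that Bronze-to-Silver step; the table names, the `customer_id` key, and the `load_ts` ordering column are assumptions for illustration. It keeps the newest Bronze row per key, then merges it into Silver.

```python
from pyspark.sql import functions as F, Window

bronze = spark.table("bronze.customer")

# Newest row per business key from the appended Bronze data.
w = Window.partitionBy("customer_id").orderBy(F.col("load_ts").desc())
latest = (bronze.withColumn("rn", F.row_number().over(w))
                .filter("rn = 1")
                .drop("rn"))
latest.createOrReplaceTempView("bronze_latest")

spark.sql("""
    MERGE INTO silver.customer AS t
    USING bronze_latest AS s
    ON t.customer_id = s.customer_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```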

MRTN
by New Contributor III
  • 6424 Views
  • 3 replies
  • 2 kudos

Resolved! Configure multiple source paths for auto loader

I am currently using two streams to monitor data in two different containers on an Azure storage account. Is there any way to configure an autoloader to read from two different locations? The schemas of the files are identical.

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Morten Stakkeland: Yes, it's possible to configure an autoloader to read from multiple locations. You can define multiple CloudFiles sources for the autoloader, each pointing to a different container in the same storage account. In your case, since ...
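A minimal sketch of that setup, with two cloudFiles sources unioned into one sink; the container names, paths, and the JSON file format are placeholders, not from the thread.

```python
def read_container(path, schema_loc):
    return (spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "json")            # placeholder file format
            .option("cloudFiles.schemaLocation", schema_loc)
            .load(path))

stream_a = read_container(
    "abfss://container-a@mystorage.dfs.core.windows.net/events/",
    "abfss://meta@mystorage.dfs.core.windows.net/_schemas/events_a",
)
stream_b = read_container(
    "abfss://container-b@mystorage.dfs.core.windows.net/events/",
    "abfss://meta@mystorage.dfs.core.windows.net/_schemas/events_b",
)

# Identical schemas, so a union feeds a single sink with one checkpoint.
(stream_a.unionByName(stream_b)
    .writeStream
    .option("checkpointLocation", "abfss://meta@mystorage.dfs.core.windows.net/_chk/events")
    .toTable("bronze.events"))
```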

2 More Replies
N_M
by Contributor
  • 21737 Views
  • 7 replies
  • 4 kudos

Resolved! use job parameters in scripts

Hi Community, I did some research, but I wasn't lucky, and I'm a bit surprised I can't find anything about it. So, I would simply like to access the job parameters when using Python scripts (not notebooks). My flow doesn't use notebooks, but I still need to dri...

Latest Reply
N_M
Contributor
  • 4 kudos

The only working workaround I found was provided in another thread: Re: Retrieve job-level parameters in Python - Databricks Community - 44720. I will repost it here (thanks @julio_resende). You need to push your parameters down to the task level. E.g.: C...
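A small sketch of what that can look like for a Python script task; the parameter name and the `{{job.parameters.run_date}}` dynamic value reference in the task's parameters list are assumptions, not taken from the thread.

```python
# my_script.py -- a Python script task, with task parameters assumed to be set as:
#   ["--run-date", "{{job.parameters.run_date}}"]
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--run-date", required=True)
args = parser.parse_args()

print(f"Processing data for {args.run_date}")
```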

6 More Replies
Shiva3
by New Contributor III
  • 1313 Views
  • 2 replies
  • 0 kudos

How to know the actual size of Delta and non-Delta tables, and the number of files that actually exist on S3.

I have a set of Delta and non-Delta tables whose data is on AWS S3. I want to know the actual total size of my Delta and non-Delta tables, excluding files belonging to operations like DELETE, VACUUM, etc., and I also need to know how many files each Delta versi...

Latest Reply
Retired_mod
Esteemed Contributor III
  • 0 kudos

Hi @Shiva3, To manage the size of Delta and non-Delta tables on AWS S3, excluding irrelevant files, start by using `DESCRIBE HISTORY` to monitor Delta table metrics and `VACUUM` to clean up old files, setting a retention period as needed. For non-Del...
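A short sketch of pulling those numbers per table (table names are placeholders): `DESCRIBE DETAIL` reports only the current version's files, while `DESCRIBE HISTORY` exposes per-version operation metrics.

```python
tables = ["main.sales.orders", "main.sales.customers"]   # hypothetical table names

for t in tables:
    # Current-version size and file count (removed/vacuumable files are excluded).
    detail = spark.sql(f"DESCRIBE DETAIL {t}").select("numFiles", "sizeInBytes").first()
    print(t, detail.numFiles, "files,", round(detail.sizeInBytes / 1024**3, 2), "GiB")

    # Per-version file counts and sizes live in operationMetrics.
    spark.sql(f"DESCRIBE HISTORY {t}").select("version", "operationMetrics").show(truncate=False)
```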

1 More Reply
a-sky
by New Contributor II
  • 2013 Views
  • 1 reply
  • 1 kudos

Databricks job stalls without error, unable to pinpoint the cause, all compute metrics seem OK

I have a job that gets stuck on "Determining DBIO File fragment" and I have not been able to figure out why this job keeps getting stuck. I monitor the job cluster metrics throughout the job and it doesn't seem like it's hitting any bottlenecks with m...

[Attached screenshots: job cluster metrics]
Latest Reply
Retired_mod
Esteemed Contributor III
  • 1 kudos

Hi @a-sky, This message indicates that Databricks is figuring out which file fragments are cached, which can be slow, especially with frequent cluster scaling. To address this, you can try disabling delta caching with `spark.conf.set("spark.databrick...
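Assuming the truncated setting above refers to the Databricks disk (DBIO) cache, the toggle would look like the sketch below; confirm the exact key against the caching docs for your runtime before relying on it.

```python
# Assumption: the disk/DBIO cache toggle is what the truncated reply refers to.
spark.conf.set("spark.databricks.io.cache.enabled", "false")
```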

DMehmuda
by New Contributor
  • 2114 Views
  • 1 reply
  • 0 kudos

Issue with round off value while loading to delta table

I have a float datatype column in a Delta table, and the data to be loaded should be rounded off to 2 decimal places. I'm casting the column to the DECIMAL(18,10) type and then using the round function from pyspark.sql.functions for rounding off values to 2 decimal p...

Latest Reply
Retired_mod
Esteemed Contributor III
  • 0 kudos

Hi @DMehmuda, The issue arises because floating-point numbers in Delta tables can retain more decimal places than expected. To ensure values are stored with the correct precision, explicitly cast the column to `DECIMAL(18,2)` before writing to the De...
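A small PySpark sketch of that fix (the `amount` column and target table are invented for illustration): round first, then cast so the stored type carries exactly two decimal places.

```python
from pyspark.sql import functions as F

df = spark.createDataFrame([(1, 12.3456789), (2, 0.005)], "id INT, amount DOUBLE")

# Round, then cast to DECIMAL(18,2) so the table column matches the rounded value.
df = df.withColumn("amount", F.round("amount", 2).cast("decimal(18,2)"))

df.write.format("delta").mode("append").saveAsTable("finance.payments")   # hypothetical target
```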

prem14f
by New Contributor II
  • 1340 Views
  • 1 reply
  • 0 kudos

Handling Concurrent Writes to a Delta Table by delta-rs and Databricks Spark Job

Hi @dennyglee, @Retired_mod. If I am writing data into a Delta table using delta-rs and a Databricks job, but I lose some transactions, how can I handle this? Given that Databricks runs a commit service and delta-rs uses DynamoDB for transaction logs, ...

Latest Reply
Retired_mod
Esteemed Contributor III
  • 0 kudos

Hi @prem14f, To manage lost transactions, implement retry logic with automatic retries and ensure idempotent writes to avoid duplication. For concurrent writers, use optimistic concurrency control, which allows for conflict detection and resolution d...
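One possible shape for the retry part, hedged heavily: conflict errors from concurrent Delta writers surface in PySpark as Py4J exceptions, so this sketch matches on the message text, and the DataFrame and table names are invented.

```python
import time

def write_with_retry(df, target_table, max_retries=3):
    """Append to a Delta table, retrying on apparent concurrent-write conflicts."""
    for attempt in range(1, max_retries + 1):
        try:
            df.write.format("delta").mode("append").saveAsTable(target_table)
            return
        except Exception as e:
            if "Concurrent" in str(e) and attempt < max_retries:
                time.sleep(2 ** attempt)     # simple backoff before retrying
            else:
                raise

# write_with_retry(batch_df, "events.landing")   # hypothetical usage
```

For true idempotence, pairing the retry with a MERGE keyed on a batch or transaction id (rather than a blind append) prevents duplicates when a retry lands after a partially applied attempt.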

