Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

vdeorios
by New Contributor II
  • 5123 Views
  • 5 replies
  • 2 kudos

Resolved! 404 on GET Billing usage data (API)

I'm trying to get my billing usage data from the Databricks API (documentation: https://docs.databricks.com/api/gcp/account/billableusage/download) but I keep getting a 404 error. Code: import requests; import json; token = dbutils.notebook.entry_point.getDbu...

Latest Reply
Dave_Nithio
Contributor II
  • 2 kudos

Bumping this to see if there is a solution. Per Databricks, basic authentication is no longer allowed, and I am unable to authenticate to get access to this endpoint (401 error). Does anyone have a solution for querying this endpoint?
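For anyone landing here, a minimal sketch of what the call might look like with bearer-token auth instead of basic auth; the account ID, token, and months are placeholders, and the endpoint is the one from the linked docs:

import requests

ACCOUNT_ID = "<databricks-account-id>"            # placeholder
TOKEN = "<oauth-token-for-a-service-principal>"   # placeholder; a workspace PAT may not work here

resp = requests.get(
    f"https://accounts.gcp.databricks.com/api/2.0/accounts/{ACCOUNT_ID}/usage/download",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"start_month": "2024-01", "end_month": "2024-03"},  # YYYY-MM
)
resp.raise_for_status()   # a 404 often means a wrong account ID or URL
print(resp.text[:500])    # this endpoint returns CSV, not JSON

Note this is an account-level API, so it goes against accounts.gcp.databricks.com, not the workspace URL; hitting the workspace host instead can also produce a 404.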

4 More Replies
richakamat130
by New Contributor
  • 1710 Views
  • 4 replies
  • 2 kudos

Change datetime format from one to another without changing the data type in Databricks SQL

Change the datetime "2002-01-01T00:00:00.000" to 'MM/dd/yyyy HH:mm:ss' format without changing the data type, i.e. keeping it as a datetime.

Latest Reply
filipniziol
Esteemed Contributor
  • 2 kudos

Hi @Mister-Dinky, as @szymon_dybczak said, if you have a datetime, then you have a datetime. What you see is just a display format defined in the Databricks UI. Other applications may display it differently depending on their defaults, regional formats, etc. If you ...
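A minimal sketch of the usual pattern (assuming a Spark session as in a Databricks notebook): keep the column as a timestamp and derive a formatted string only for display:

from pyspark.sql import functions as F

df = spark.createDataFrame([("2002-01-01T00:00:00.000",)], ["raw"])
df = (df.withColumn("ts", F.to_timestamp("raw"))                             # stays a timestamp
        .withColumn("display", F.date_format("ts", "MM/dd/yyyy HH:mm:ss")))  # string, for display only
df.show(truncate=False)

date_format necessarily returns a string; a timestamp column itself carries no display format.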

3 More Replies
ChrisLawford_n1
by Contributor
  • 3414 Views
  • 3 replies
  • 1 kudos

Autoloader configuration for multiple tables from the same directory

I would like to get a recommendation on how to structure the ingestion of a large number of tables. I am currently using Auto Loader in directory-listing mode. I have concerns about performance in the future and have a requirement to ensure that data...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

There is an easier way to see what has been processed: SELECT * FROM cloud_files_state('path/to/checkpoint'). See https://docs.databricks.com/en/ingestion/cloud-object-storage/auto-loader/production.html
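A sketch of that check from a notebook (the checkpoint path is hypothetical):

# List the files Auto Loader has already discovered/processed for one stream
processed = spark.sql(
    "SELECT * FROM cloud_files_state('/mnt/checkpoints/my_table')"
)
processed.show(truncate=False)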

2 More Replies
KristiLogos
by Contributor
  • 1461 Views
  • 2 replies
  • 0 kudos

Autoloader not ingesting all file data into Delta Table from Azure Blob Container

I have done the following, i.e. created a Delta table where I plan to load the Azure Blob Container files, which are .json.gz files: df = spark.read.option("multiline", "true").json(f"{container_location}/*.json.gz") DeltaTable.create(spark) \ .addCol...

Latest Reply
gchandra
Databricks Employee
  • 0 kudos

If it's streaming data, space it out with a 10-second trigger: .trigger(processingTime="10 seconds"). Do all the JSON files have the same schema? As your table creation is dynamic (df.schema), if all the JSON files don't have the same schema, they may be skipp...
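A sketch of what that might look like end to end, assuming Auto Loader with a pinned schema location and rescued data, so files with a divergent schema are kept rather than silently skipped (paths and table name are hypothetical):

df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("multiline", "true")
      .option("cloudFiles.schemaLocation", "/mnt/checkpoints/schema")
      .option("cloudFiles.schemaEvolutionMode", "rescue")   # divergent fields land in _rescued_data
      .load(f"{container_location}/*.json.gz"))

(df.writeStream
   .option("checkpointLocation", "/mnt/checkpoints/my_table")
   .trigger(processingTime="10 seconds")                    # the trigger from the reply
   .toTable("my_delta_table"))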

1 More Replies
Brad
by Contributor II
  • 941 Views
  • 1 reply
  • 0 kudos

How to set file size for MERGE

Hi team, I use MERGE to merge a source into a target table. The source is read incrementally with a checkpoint on a Delta table. The target is a Delta table without any partitions. If the table is empty, with spark.databricks.delta.optimizeWrite.enabled it can create fil...

Latest Reply
filipniziol
Esteemed Contributor
  • 0 kudos

Hi @Brad, there are a couple of considerations here, the main ones being your runtime version and whether you are using Unity Catalog. Check this document: https://docs.databricks.com/en/delta/tune-file-size.html
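From that doc, a sketch of the table-level knobs that usually matter for MERGE-heavy tables (the table name and size are hypothetical; availability depends on your runtime):

spark.sql("""
  ALTER TABLE my_target_table SET TBLPROPERTIES (
    'delta.targetFileSize' = '32mb',             -- steer file size for writes and OPTIMIZE
    'delta.tuneFileSizesForRewrites' = 'true'    -- let Databricks pick smaller files for MERGE-heavy tables
  )
""")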

Brad
by Contributor II
  • 1438 Views
  • 3 replies
  • 0 kudos

Will MERGE incur a lot of driver memory?

Hi team, we have a job that runs MERGE on a target table with around 220 million rows. We found it needs a lot of driver memory (just for the MERGE itself). From the job metrics we can see the MERGE needs at least 46 GB of memory. Is there some special thing to mak...

Latest Reply
filipniziol
Esteemed Contributor
  • 0 kudos

Hi @Brad, could you try applying some very standard optimization practices and check the outcome: 1. If your runtime is greater than or equal to 15.2, could you implement liquid clustering on the source and target tables using the JOIN columns? ALTER TABLE <table_name> CL...
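For reference, a sketch of that first step, assuming DBR 15.2+ and hypothetical table/column names:

spark.sql("ALTER TABLE my_target_table CLUSTER BY (join_key)")  # enable liquid clustering on the MERGE key
spark.sql("OPTIMIZE my_target_table")                           # recluster the existing data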

2 More Replies
hcord
by New Contributor II
  • 1715 Views
  • 1 reply
  • 2 kudos

Resolved! Trigger a workflow from a different databricks environment

Hello everyone, in the company I work for we have a lot of different Databricks environments, and we now need deeper integration of processes between environments X and Y. There's a workflow in Y that runs a process that, when finished, we would like ...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @hcord, you can use the REST API in the last task to trigger a workflow in a different workspace.
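A sketch of that last task, assuming the Jobs API run-now endpoint and a token valid in workspace Y (host, token, and job ID are placeholders):

import requests

HOST = "https://<workspace-y>.cloud.databricks.com"   # placeholder
TOKEN = "<token-valid-in-workspace-y>"                # placeholder
JOB_ID = 123                                          # placeholder

resp = requests.post(
    f"{HOST}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"job_id": JOB_ID},
)
resp.raise_for_status()
print(resp.json()["run_id"])   # ID of the run triggered in the other workspace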

sshynkary
by New Contributor
  • 3049 Views
  • 1 reply
  • 0 kudos

Loading data from a Spark DataFrame directly to SharePoint

Hi guys! I am trying to load data directly from a PySpark DataFrame to a SharePoint folder and I cannot find a solution for it. I wanted to implement a workaround using volumes and Logic Apps, but there are a few issues. I need to partition the df into a few f...

Data Engineering
SharePoint
spark
Latest Reply
ChKing
New Contributor II
  • 0 kudos

One approach could involve using Azure Data Lake as an intermediary. You can partition your PySpark DataFrames and load them into Azure Data Lake, which is optimized for large-scale data storage and integrates well with PySpark. Once the data is in A...
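A sketch of the first half of that approach (the abfss path and partition count are hypothetical); a Logic App can then pick the files up from the lake and push them to the SharePoint folder:

(df.repartition(4)    # the handful of files the post mentions needing
   .write.mode("overwrite")
   .parquet("abfss://staging@myaccount.dfs.core.windows.net/sharepoint_export"))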

dpc
by Contributor II
  • 14726 Views
  • 4 replies
  • 2 kudos

Resolved! Remove Duplicate rows in tables

Hello, I've seen posts that show how to remove duplicates, something like this: MERGE INTO [deltatable] AS target USING (SELECT *, ROW_NUMBER() OVER (PARTITION BY [primary keys] ORDER BY [date] DESC) AS rn FROM [deltatable] QUALIFY rn > 1) AS source ON ...

Latest Reply
filipniziol
Esteemed Contributor
  • 2 kudos

Hi @dpc, if you like using SQL: 1. Test data: # Sample data data = [("1", "A"), ("1", "A"), ("2", "B"), ("2", "B"), ("3", "C")] # Create DataFrame df = spark.createDataFrame(data, ["id", "value"]) # Write to Delta table df.write.format("delta").mode(...
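When every column is duplicated (as in that test data), a shorter sketch also works (table name hypothetical); Delta's snapshot isolation makes it safe to overwrite a table from a read of itself:

spark.sql("INSERT OVERWRITE my_table SELECT DISTINCT * FROM my_table")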

3 More Replies
397973
by New Contributor III
  • 1322 Views
  • 1 reply
  • 0 kudos

First time seeing the "Databricks is experiencing heavy load" message. What does it really mean?

Hi, I just went to run a Databricks PySpark notebook and saw this message (screenshot attached). This is a notebook I've run before, but I never saw this. Is it referring to my cluster? The Databricks infrastructure? My notebook ran normally, just wondering though. Google sea...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

Never saw that message, but my guess is it's not your cluster but the Databricks platform in your region. status.databricks.com perhaps has some info.

MustangR
by New Contributor
  • 2648 Views
  • 2 replies
  • 0 kudos

Delta Table Upsert fails when source attributes are missing

Hi All, I am trying to merge a JSON into a Delta table. Since the JSON is basically from MongoDB, which does not have a schema, there is a chance of missing attributes expected by the Delta table's schema validation. Schema evolution is enabled as well. H...
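One common workaround, sketched under the assumption that automatic schema evolution for MERGE is available on your runtime (names are hypothetical): with updateAll/insertAll, target columns absent from the source are left unchanged on update and inserted as NULL.

from delta.tables import DeltaTable

spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")

target = DeltaTable.forName(spark, "my_target_table")
(target.alias("t")
   .merge(source_df.alias("s"), "t._id = s._id")    # source_df = the parsed MongoDB JSON
   .whenMatchedUpdateAll()
   .whenNotMatchedInsertAll()
   .execute())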

Latest Reply
JohnM256
New Contributor II
  • 0 kudos

How do I set Existing Optional Columns?

1 More Replies
Paul_Poco
by New Contributor II
  • 79764 Views
  • 5 replies
  • 6 kudos

Asynchronous API calls from Databricks

Hi, I have to send thousands of API calls from a Databricks notebook to an API to retrieve some data. Right now, I am using a sequential approach with the Python requests package. As the performance is no longer acceptable, I need to send my API c...

Latest Reply
adarsh8304
New Contributor II
  • 6 kudos

Hey @Paul_Poco, what about using ProcessPoolExecutor or ThreadPoolExecutor from the concurrent.futures module? Have you tried them?
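For the record, a sketch of the thread-pool variant (threads usually beat processes for I/O-bound HTTP calls; the URL list is a placeholder):

import requests
from concurrent.futures import ThreadPoolExecutor, as_completed

URLS = [f"https://api.example.com/items/{i}" for i in range(1000)]   # placeholder

def fetch(url):
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return resp.json()

results = []
with ThreadPoolExecutor(max_workers=32) as pool:
    futures = [pool.submit(fetch, u) for u in URLS]
    for fut in as_completed(futures):
        results.append(fut.result())   # re-raises here if a call failed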

4 More Replies
priyansh
by New Contributor III
  • 2932 Views
  • 3 replies
  • 0 kudos

How does Photon acceleration actually work?

Hey folks! I would like to know how Photon acceleration actually works. I have tested it on samples of 219 MB, 513 MB, 2.7 GB, and 4.1 GB of data, and the difference in seconds between normal and Photon-accelerated compute was not much. So my questi...

Latest Reply
arch_db
New Contributor III
  • 0 kudos

Try checking a MERGE operation on tables over 200 GB.

2 More Replies
EricCournarie
by New Contributor III
  • 1385 Views
  • 2 replies
  • 0 kudos

Metadata on a prepared statement returns uppercase column names

Hello, using the JDBC driver, when I check the metadata of a prepared statement, the column names are all uppercase. This does not happen when running a DESCRIBE on the same select. Any properties to set, or is it a known issue? Or a workaro...

Latest Reply
gchandra
Databricks Employee
  • 0 kudos

Looks like a bug. Can you try using double quotes, SELECT "ColumnName", instead of backticks?

1 More Replies
shsalami
by New Contributor III
  • 1776 Views
  • 2 replies
  • 0 kudos

Sample streaming table fails

Running the following Databricks sample code in the pipeline: CREATE OR REFRESH STREAMING TABLE customers AS SELECT * FROM cloud_files("/databricks-datasets/retail-org/customers/", "csv") I got the error: org.apache.spark.sql.catalyst.ExtendedAnalysisExcep...

Latest Reply
shsalami
New Contributor III
  • 0 kudos

There is no table with that name. Also, in that folder only the following file exists: dbfs:/databricks-datasets/retail-org/customers/customers.csv

1 More Replies