Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

阳光彩虹小白马
by New Contributor
  • 364 Views
  • 3 replies
  • 1 kudos

Databricks overwrite didn't delete previous data

Hi Databricks, we ran into the issue shown in the picture below. We use the PySpark API to store data in ADLS: df.write.partitionBy("xx").option("partitionOverwriteMode","dynamic").mode("overwrite").parquet(xx). However, we're not sure why the second time we overwrite ...

Latest Reply
Panda
Valued Contributor

@阳光彩虹小白马 The issue you're encountering seems to involve inconsistent behavior in partition overwrites using PySpark with ADLS. Can you validate the points below, along with what @Himanshu6 mentioned? Force Spark to refresh the metadata of the data lake director...
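For reference, a minimal sketch of the dynamic partition overwrite pattern under discussion; the ADLS paths and partition column are placeholders, and setting the session config alongside the per-write option is a belt-and-braces assumption rather than something from the thread:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Session-level default; the per-write option below takes precedence anyway.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

df = spark.read.parquet("abfss://src@account.dfs.core.windows.net/input")  # placeholder source

(
    df.write
    .partitionBy("event_date")                    # hypothetical partition column
    .option("partitionOverwriteMode", "dynamic")  # replaces only the partitions present in df
    .mode("overwrite")
    .parquet("abfss://dst@account.dfs.core.windows.net/output")  # placeholder target
)
```

With "dynamic" in effect, partitions absent from df are left alone; with the default "static" mode, the whole target directory is replaced, which is one common source of the inconsistency described above.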

2 More Replies
pavansharma36
by New Contributor III
  • 1466 Views
  • 5 replies
  • 1 kudos

Resolved! Databricks Workspace Import API size limitation

Since libraries on DBFS are deprecated and support is going to be removed, we are moving libs from DBFS to workspace files. But while uploading libraries using the https://docs.databricks.com/api/azure/workspace/workspace/import API, there seems to be a limit ...

Latest Reply
NaraKris_40883
New Contributor II

Can we use this API to upload files to the /Workspace/ location? Any sample curl request? I am using -X PUT https://<HOST_NAME>/api/2.0/fs/files/Workspace/Shared/jars/all.jar and getting {  "error_code" : "BAD_REQUEST",  "message" : "Invalid path:",  "d...
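On the sample-request question, a sketch of the Workspace Import API in Python (the /api/2.0/fs/files endpoint is, as far as I know, for Unity Catalog volume paths, which would explain the "Invalid path" error on a /Workspace target); host, token, and paths are placeholders, and note the body is base64-encoded, which is where the roughly 10 MB request limit bites:

```python
import base64
import requests

HOST = "https://<HOST_NAME>"       # placeholder workspace URL
TOKEN = "<PERSONAL_ACCESS_TOKEN>"  # placeholder

with open("all.jar", "rb") as f:
    payload = base64.b64encode(f.read()).decode("utf-8")

resp = requests.post(
    f"{HOST}/api/2.0/workspace/import",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "path": "/Shared/jars/all.jar",  # workspace path, hypothetical
        "format": "AUTO",                # infer object type from the file extension
        "content": payload,              # base64-encoded file body
        "overwrite": True,
    },
)
resp.raise_for_status()
```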

4 More Replies
Kunal_Mishra
by New Contributor III
  • 620 Views
  • 2 replies
  • 3 kudos

Read GeoJSON file using Sedona Context in Databricks

Hi everyone, I am trying to read a GeoJSON file in Databricks using the syntax from the Apache Sedona official docs (Load GeoJSON Data). I am using Sedona 1.6.1, which supports this feature, but I am getting an error as mentione...

Latest Reply
filipniziol
Contributor III

Hi @Kunal_Mishra, the error you are experiencing with Sedona when trying to read a GeoJSON file in Databricks (java.lang.NoSuchMethodError) often indicates a compatibility issue between the Spark version you're using and the Sedona library. Sedona has...
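As a quick way to confirm that theory, a sketch that prints the runtime's Spark and Scala versions (which must match the installed Sedona artifact, e.g. sedona-spark-shaded-3.5_2.12 for Spark 3.5 / Scala 2.12) and then retries the GeoJSON reader per the Sedona 1.6 docs; the file path is a placeholder:

```python
from sedona.spark import SedonaContext

sedona = SedonaContext.create(spark)

# The Sedona jar must be built for this exact Spark/Scala combination,
# otherwise calls fail at runtime with NoSuchMethodError.
print(spark.version)  # e.g. 3.5.x
print(spark.sparkContext._jvm.scala.util.Properties.versionString())  # e.g. version 2.12.x

df = (
    sedona.read.format("geojson")
    .load("dbfs:/path/to/file.geojson")  # placeholder path
)
df.printSchema()
```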

1 More Replies
Ajay-Pandey
by Esteemed Contributor III
  • 796 Views
  • 6 replies
  • 2 kudos

Databricks Job cluster for continuous run

Hi all, I have a situation where I want to run a job with a continuous trigger using a job cluster, but the cluster terminates and is re-created on every run within the continuous trigger. I just wanted to know if we have any option where I can use the same job cluster...

Latest Reply
Rishabh-Pandey
Esteemed Contributor

@Ajay-Pandey Can't we achieve similar functionality with the help of cluster pools? Why don't you try cluster pools?
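A sketch of what that suggestion could look like in a job definition: pointing the job cluster at an instance pool so each continuous run draws pre-warmed instances instead of provisioning new VMs. All IDs and versions below are hypothetical:

```python
# Fragment of a Jobs API 2.1 payload (hypothetical values throughout).
job_settings = {
    "name": "continuous-ingest",
    "continuous": {"pause_status": "UNPAUSED"},  # continuous trigger
    "job_clusters": [
        {
            "job_cluster_key": "continuous_cluster",
            "new_cluster": {
                "spark_version": "15.4.x-scala2.12",    # placeholder runtime
                "instance_pool_id": "pool-1234567890",  # pre-warmed pool cuts per-run startup
                "num_workers": 2,
            },
        }
    ],
}
```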

5 More Replies
zmsoft
by New Contributor III
  • 355 Views
  • 2 replies
  • 0 kudos

How to load a single-line mode JSON file?

Hi there, the activity log stored in an ADLS Gen2 container is a single-line mode JSON file. How do I load a single-line mode JSON file and save the data to a Delta table? Thanks & regards, zmsoft

Latest Reply
Panda
Valued Contributor

@zmsoft Since the JSON is a single-line file, ensure it is being read correctly. Try setting the multiLine option to false (it defaults to false, but setting it explicitly ensures correct handling): stageDf = ( spark.read.format("json") .opti...
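The snippet above is cut off by the forum preview; a minimal end-to-end sketch of the same idea, with a placeholder path and a hypothetical target table:

```python
# Read a single-line (one JSON record per line) activity log from ADLS Gen2.
stage_df = (
    spark.read.format("json")
    .option("multiLine", "false")  # explicit, though false is the default
    .load("abfss://logs@account.dfs.core.windows.net/activity/")  # placeholder path
)

# Land it in a Delta table.
(
    stage_df.write.format("delta")
    .mode("append")
    .saveAsTable("bronze.activity_log")  # hypothetical table name
)
```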

1 More Replies
SakuraDev1
by New Contributor II
  • 381 Views
  • 2 replies
  • 1 kudos

Resolved! What API calls does Auto Loader make on S3?

Hey guys, I'm trying to estimate costs for an ingestion pipeline that uses Auto Loader on an S3 bucket every 2 minutes. I found the pricing for S3 API consumption, but I am not certain what API calls Auto Loader will make. Talking to ChatGPT, it t...

Latest Reply
filipniziol
Contributor III

Hi @SakuraDev1, LIST and GET make sense. Auto Loader works by monitoring a specified location; when a new file is discovered, it is processed into the bronze table. So a LIST request is needed to check the files in the source directory, and th...
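For context, a minimal Auto Loader sketch in directory-listing mode, which is what drives the LIST calls discussed above; paths and table names are placeholders, and switching cloudFiles.useNotifications to true is the documented way to trade LIST polling for S3 event notifications:

```python
stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useNotifications", "false")  # listing mode: LISTs the prefix each trigger
    .option("cloudFiles.schemaLocation", "s3://bucket/_schemas/")  # placeholder
    .load("s3://bucket/landing/")                                  # placeholder
)

(
    stream.writeStream.format("delta")
    .option("checkpointLocation", "s3://bucket/_checkpoints/bronze/")  # placeholder
    .trigger(processingTime="2 minutes")  # matches the cadence described in the question
    .toTable("bronze.events")             # hypothetical bronze table
)
```

Each discovered file then costs a GET to read its contents, so the bill is roughly LISTs per trigger plus GETs per new file.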

1 More Replies
TheBeacon
by New Contributor
  • 273 Views
  • 1 reply
  • 0 kudos

Exploring Postman Alternatives for API Testing in VSCode?

Has anyone here explored Postman alternatives within VSCode? I’ve seen mentions of Thunder Client and Apidog. Would love to know if they offer a smoother integration or better functionality.

Latest Reply
Stefan-Koch
Contributor III

Hi, I'm using Thunder Client as a VS Code extension: https://www.thunderclient.com/ The functionality in the free version is okay; if you want more features, there are paid plans.

Rutuja_3641
by New Contributor
  • 223 Views
  • 1 reply
  • 0 kudos

Mongo server to Delta Live Tables

I want to fetch data from a MongoDB server and then surface it in a Delta Live Table on GCP.

Latest Reply
Stefan-Koch
Contributor III

Hi @Rutuja_3641, have a look here: https://docs.databricks.com/en/connect/external-systems/mongodb.html I think you can easily adapt the code to DLT.
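A rough sketch of that adaptation, assuming the MongoDB Spark connector (10.x option names) is installed on the pipeline cluster; the connection URI, database, collection, and table name are placeholders:

```python
import dlt  # available inside a Delta Live Tables pipeline


@dlt.table(name="mongo_raw")  # hypothetical DLT table name
def mongo_raw():
    # Batch read from MongoDB; DLT materializes the result as a managed table.
    return (
        spark.read.format("mongodb")
        .option("connection.uri", "mongodb+srv://<user>:<password>@<cluster>/")  # placeholder
        .option("database", "mydb")      # placeholder
        .option("collection", "events")  # placeholder
        .load()
    )
```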

RyoAriyama
by New Contributor II
  • 738 Views
  • 2 replies
  • 0 kudos

Can't create table in Unity Catalog.

Hi all, I have created a Databricks workspace on AWS. I can log into the workspace and successfully perform SELECT operations on files in S3, but I am unable to create tables. The error when creating the table is as follows: "Your request failed with s...

Latest Reply
Stefan-Koch
Contributor III

Hi Ryo, can you share the code showing how you try to create the table?

1 More Replies
ChristianRRL
by Valued Contributor
  • 409 Views
  • 1 reply
  • 0 kudos

Databricks "Preferred" Approaches To Backfilling Single Column In Wide Tables

Hi there, I've tried thinking through this and googling as well, but I'm not sure if there's a better approach that I might be missing. We have *wide* tables with hundreds of columns, and on a day-to-day basis these tables are incrementally filled in ...

Latest Reply
filipniziol
Contributor III

Hi @ChristianRRL, if I understand correctly, you have an API to get historical data for a column. If so, you can use the MERGE clause. You join by key columns with the target table to backfill, and when there is a match you UPDATE SET ta...
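A minimal sketch of that MERGE pattern, updating a single column without rewriting the rest of the wide row; the table, key, and column names are all hypothetical:

```python
# Backfilled values fetched from the API (hypothetical sample data).
source_df = spark.createDataFrame(
    [(1, "corrected"), (2, "corrected")],
    "record_id INT, backfilled_col STRING",
)
source_df.createOrReplaceTempView("backfill_src")

spark.sql("""
    MERGE INTO analytics.wide_table AS t      -- hypothetical wide target table
    USING backfill_src AS s
    ON t.record_id = s.record_id              -- hypothetical key column
    WHEN MATCHED THEN
      UPDATE SET t.backfilled_col = s.backfilled_col  -- only the one column changes
""")
```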

elgeo
by Valued Contributor II
  • 27742 Views
  • 5 replies
  • 2 kudos

SQL Stored Procedure in Databricks

Hello, is there an equivalent of a SQL stored procedure in Databricks? Please note that I need a procedure that allows DML statements, not just the SELECT statements a function provides. Thank you in advance.

Latest Reply
Biswajit
New Contributor II

I recently went through a video from Databricks which says it's possible, but when I tried it, it did not work: https://www.youtube.com/watch?v=f4TxNBfSNqM Was anyone able to create a stored procedure in Databricks?

4 More Replies
RKNutalapati
by Valued Contributor
  • 3893 Views
  • 5 replies
  • 4 kudos

Reading and saving BLOB data from Oracle to S3 via Databricks is slow

I am trying to import a table from Oracle which has around 1.3 million rows, and one of the columns is a BLOB; the total size of the data on Oracle is around 250+ GB. Reading and saving to S3 as a Delta table is taking around 60 min. I tried with parallel (200 thread...

Latest Reply
vinita_mehta
New Contributor II

Any update on this topic? What would be the best option to read from Oracle and write to ADLS?
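The usual baseline is a partitioned JDBC read. In this sketch the connection details, bounds, and column names are placeholders, and a numeric, evenly distributed partitionColumn is assumed so Spark can open parallel Oracle sessions:

```python
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//host:1521/service")  # placeholder connection
    .option("dbtable", "SCHEMA.BLOB_TABLE")                  # placeholder table
    .option("user", "<user>")
    .option("password", "<password>")
    .option("partitionColumn", "ID")   # hypothetical numeric, evenly distributed key
    .option("lowerBound", "1")
    .option("upperBound", "1300000")   # roughly the row count from the question
    .option("numPartitions", "64")     # parallel Oracle sessions; tune to what the DB tolerates
    .option("fetchsize", "1000")       # rows per round trip; large BLOBs may need a smaller value
    .load()
)

df.write.format("delta").mode("overwrite").save(
    "abfss://target@account.dfs.core.windows.net/blob_table"  # placeholder ADLS path
)
```

With BLOBs this size, fetchsize and numPartitions tend to dominate throughput, since each fetched row carries the full binary payload.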

4 More Replies
NLearn
by New Contributor II
  • 515 Views
  • 1 reply
  • 0 kudos

Lakehouse monitoring

I created snapshot and time-series lakehouse monitoring on 2 different tables. After execution, the metrics tables and the dashboard were created, but monitoring data is not populating in the dashboard or metrics tables, even after a refresh of moni...

Latest Reply
mandree
New Contributor II

Did you refresh the metrics? Table -> Quality -> Refresh metrics

Volker
by New Contributor III
  • 826 Views
  • 5 replies
  • 0 kudos

Structured Streaming: How to handle Schema Changes in source and target Delta Table

Hey community, we have a streaming pipeline starting with Auto Loader to ingest data into a bronze table; this data then gets picked up by another streaming job that transforms it and writes into a silver table. Now there are some schema chan...

Latest Reply
Volker
New Contributor III

I adjusted the schema in both bronze and silver so that I do not need schema evolution. The problem is that the DataStreamReader does not pick up the schema changes in bronze. I already figured out that it has something to do with providing also a ...
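One documented mechanism in this area (possibly the option the truncated reply is alluding to, though that is cut off above) is Delta's schema tracking for streaming reads; a sketch with a placeholder path and a hypothetical source table:

```python
stream = (
    spark.readStream.format("delta")
    # Lets the stream record and follow schema changes in the source table,
    # instead of staying pinned to the schema captured when the stream started.
    .option("schemaTrackingLocation", "/checkpoints/silver/_schema")  # placeholder path
    .table("bronze.events")  # hypothetical bronze source table
)
```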

4 More Replies
EricCournarie
by New Contributor III
  • 341 Views
  • 1 reply
  • 0 kudos

Resolved! With the JDBC driver, ResultSet metadata gives the column type as the column name

Hello, using the JDBC driver, when I check the metadata of a result set and retrieve the column name, it's the column type that is given. Here is what it looks like in the metadata. Using version 2.6.40, with the query SELECT "id" string, "year" ...

Latest Reply
EricCournarie
New Contributor III

Forget about it, the SELECT clause is wrong, of course: in Spark SQL, double-quoted "id" is a string literal, so SELECT "id" string selects the literal and aliases it to a column named string, which is exactly what the metadata reported. Too tired, I guess. Sorry!

