Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Kunal_Mishra
by New Contributor III
  • 1107 Views
  • 2 replies
  • 3 kudos

Read Geojson file using Sedona Context in Databricks

Hi Everyone, I am trying to read a GeoJSON file in Databricks using the syntax given in the Apache Sedona official docs (Load GeoJSON Data). I am using Sedona version 1.6.1, which supports this feature, but I am getting an error as mentione...

Latest Reply
filipniziol
Esteemed Contributor

Hi @Kunal_Mishra, The error you are experiencing with Sedona when trying to read a GeoJSON file in Databricks (java.lang.NoSuchMethodError) often indicates a compatibility issue between the Spark version you're using and the Sedona library. Sedona has...
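
A minimal sketch of the setup, assuming Spark 3.5 on Scala 2.12 — the key point is that the Sedona artifact's Spark/Scala suffix must match the cluster runtime; the file path is a placeholder:

```python
# Cluster library (Maven), matching Spark 3.5 / Scala 2.12:
#   org.apache.sedona:sedona-spark-shaded-3.5_2.12:1.6.1
from sedona.spark import SedonaContext

# Register Sedona's types and functions on the session.
config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)

# GeoJSON data source added in Sedona 1.6.1; the path is a placeholder.
df = (
    sedona.read.format("geojson")
    .option("multiLine", "true")
    .load("/Volumes/main/default/geo/myGeojsonData.json")
)
df.printSchema()
```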

1 More Replies
zmsoft
by Contributor
  • 526 Views
  • 2 replies
  • 0 kudos

How to load single line mode json file?

Hi there, The activity log stored in an ADLS Gen2 container is a single-line mode JSON file. How do I load a single-line mode JSON file and save the data to a Delta table? Thanks & Regards, zmsoft

Latest Reply
Panda
Valued Contributor

@zmsoft Since the JSON is a single-line file, ensure it is being read correctly. Try setting the multiLine option to false (it defaults to false, but explicitly setting it ensures correct handling). stageDf = ( spark.read.format("json") .opti...
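
A minimal end-to-end sketch along those lines; the ADLS path and table name are placeholders, and it assumes credentials for the storage account are already configured:

```python
# Read the single-line JSON (one record per line) from ADLS Gen2 ...
stage_df = (
    spark.read.format("json")
    .option("multiLine", "false")  # explicit, though false is the default
    .load("abfss://logs@<storage-account>.dfs.core.windows.net/activity/")
)

# ... and persist it as a Delta table.
(
    stage_df.write.format("delta")
    .mode("append")
    .saveAsTable("main.bronze.activity_log")
)
```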

1 More Replies
SakuraDev1
by New Contributor II
  • 551 Views
  • 2 replies
  • 1 kudos

Resolved! What API calls does Auto Loader make on S3?

Hey guys, I'm trying to estimate the cost of an ingestion pipeline that uses Auto Loader on an S3 bucket every 2 minutes. I found the pricing for S3 bucket API consumption, but I am not certain which API calls Auto Loader will make. Talking to ChatGPT it t...

Latest Reply
filipniziol
Esteemed Contributor

Hi @SakuraDev1, LIST and GET make sense. Auto Loader monitors a specified location, and when a new file is discovered it is processed into the bronze table. So a LIST request is needed to check the files in the source directory, and th...
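
At a 2-minute trigger cadence, file notification mode is worth considering so each run does not re-LIST the directory. A hedged sketch — bucket, region, and paths are placeholders:

```python
# File notification mode sources new-file events from SQS/SNS instead of
# repeatedly listing the directory; files are still downloaded with GET.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useNotifications", "true")  # default is directory listing
    .option("cloudFiles.region", "us-east-1")
    .option("cloudFiles.schemaLocation", "s3://my-bucket/_schemas/ingest")
    .load("s3://my-bucket/landing/")
)
```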

1 More Replies
TheBeacon
by New Contributor
  • 575 Views
  • 1 reply
  • 0 kudos

Exploring Postman Alternatives for API Testing in VSCode?

Has anyone here explored Postman alternatives within VSCode? I’ve seen mentions of Thunder Client and Apidog. Would love to know if they offer a smoother integration or better functionality.

Latest Reply
Stefan-Koch
Valued Contributor II

Hi, I'm using Thunder Client as a VS Code extension: https://www.thunderclient.com/ The functionality in the free version is okay. If you want more features, there are paid plans for a few bucks.

Rutuja_3641
by New Contributor
  • 396 Views
  • 1 reply
  • 0 kudos

Mongo server to Delta Live Tables

I want to fetch data from a MongoDB server and then surface it in a Delta Live Table on GCP.

Latest Reply
Stefan-Koch
Valued Contributor II

Hi @Rutuja_3641, Have a look here: https://docs.databricks.com/en/connect/external-systems/mongodb.html I think you can easily adapt the code to DLT.
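
A minimal sketch of what that DLT adaptation could look like, assuming the MongoDB Spark connector is installed on the pipeline cluster; the URI, database, and collection names are placeholders:

```python
import dlt

@dlt.table(name="mongo_events_raw", comment="Raw snapshot of a MongoDB collection")
def mongo_events_raw():
    # Connector v10+ options; the connection string is a placeholder.
    return (
        spark.read.format("mongodb")
        .option("connection.uri", "mongodb+srv://<user>:<password>@<cluster-host>")
        .option("database", "mydb")
        .option("collection", "events")
        .load()
    )
```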

RyoAriyama
by New Contributor II
  • 1129 Views
  • 2 replies
  • 0 kudos

Can't create table in unity catalog.

Hi all. I have created a Databricks workspace on AWS. I can log into the workspace and successfully perform SELECT operations on files in S3, but I am unable to create tables. The error when creating the table is as follows: "Your request failed with s...

Latest Reply
Stefan-Koch
Valued Contributor II

Hi Ryo, can you share the code for how you try to create the table?

1 More Replies
ChristianRRL
by Valued Contributor
  • 645 Views
  • 1 reply
  • 0 kudos

Databricks "Preferred" Approaches To Backfilling Single Column In Wide Tables

Hi there, I've tried thinking through this and googling as well, but I'm not sure if there's a better approach that I might be missing. We have *wide* tables with hundreds of columns, and on a day-to-day basis these tables are incrementally filled in ...

Latest Reply
filipniziol
Esteemed Contributor

Hi @ChristianRRL, If I understand correctly, you have an API to get historical data for a column. If so, you can use the MERGE clause. You will join on the key columns with the target table to backfill, and when there is a match you will UPDATE SET ta...
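
A minimal sketch of that MERGE, with hypothetical table, key, and column names; only the backfilled column is touched, so the other hundreds of columns stay as they are:

```python
# Hypothetical backfill source pulled from the API.
backfill_df = spark.createDataFrame(
    [(1, "2024-01-01", 42.0)],
    ["device_id", "event_date", "corrected_value"],
)
backfill_df.createOrReplaceTempView("backfill_src")

# Join on the key columns; update only the one column being backfilled.
spark.sql("""
    MERGE INTO main.gold.wide_table AS t
    USING backfill_src AS s
      ON t.device_id = s.device_id AND t.event_date = s.event_date
    WHEN MATCHED THEN
      UPDATE SET t.corrected_value = s.corrected_value
""")
```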

RKNutalapati
by Valued Contributor
  • 4423 Views
  • 5 replies
  • 4 kudos

Read and saving Blob data from oracle to databricks S3 is slow

I am trying to import a table from Oracle which has around 1.3 million rows, and one of the columns is a BLOB; the total size of the data on Oracle is around 250+ GB. Reading and saving to S3 as a Delta table takes around 60 min. I tried with parallel (200 thread...

Latest Reply
vinita_mehta
New Contributor II

Any update on this topic? What should be the best option to read from Oracle and write to ADLS?
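
One option to compare against: a hedged sketch of a partitioned JDBC read written to ADLS as Delta. Host, credentials, bounds, and paths are placeholders, and partitionColumn must be a numeric or date column:

```python
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//<host>:1521/<service>")
    .option("dbtable", "SCHEMA.BLOB_TABLE")
    .option("user", "<user>")
    .option("password", "<password>")
    .option("partitionColumn", "ID")   # numeric surrogate key
    .option("lowerBound", "1")
    .option("upperBound", "1300000")
    .option("numPartitions", "64")     # parallel connections to Oracle
    .option("fetchsize", "1000")       # rows per round trip; tune for BLOB size
    .load()
)

df.write.format("delta").mode("overwrite").save(
    "abfss://bronze@<storage-account>.dfs.core.windows.net/oracle/blob_table"
)
```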

4 More Replies
NLearn
by New Contributor II
  • 694 Views
  • 1 reply
  • 0 kudos

Lakehouse monitoring

I created snapshot and time series Lakehouse Monitoring on 2 different tables. After execution, the metrics tables got created and the dashboard also got created, but monitoring data is not populating in the dashboard and metrics tables even after a refresh of moni...

Latest Reply
mandree
New Contributor II

Did you refresh the metrics? Table -> Quality -> Refresh metrics
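
If the UI refresh doesn't help, a refresh can also be triggered programmatically. A sketch assuming the databricks-sdk package and an existing monitor; the table name is a placeholder:

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
refresh = w.quality_monitors.run_refresh(table_name="main.myschema.monitored_table")
print(refresh.state)  # poll until the refresh finishes, then re-check the dashboard
```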

Volker
by New Contributor III
  • 1961 Views
  • 5 replies
  • 0 kudos

Structured Streaming: How to handle Schema Changes in source and target Delta Table

Hey Community, we have a streaming pipeline, starting with Auto Loader to ingest data into a bronze table, and this data then gets picked up by another streaming job that transforms it and writes into a silver table. Now there are some schema chan...

Latest Reply
Volker
New Contributor III

I adjusted the schema in both bronze and silver, such that I do not need schema evolution. The problem is that the DataStreamReader does not pick up the schema changes in bronze. I already figured out that it has something to do with providing also a ...
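
If the missing piece is Delta's schema-tracking option for streaming reads, a hedged sketch — paths are placeholders, and the tracking location must sit under the stream's checkpoint directory:

```python
# schemaTrackingLocation lets a Delta streaming read follow schema changes
# in the source table instead of pinning the schema at stream start.
df = (
    spark.readStream.format("delta")
    .option("schemaTrackingLocation", "/checkpoints/silver/_schema")
    .load("/tables/bronze")
)

(
    df.writeStream.format("delta")
    .option("checkpointLocation", "/checkpoints/silver")
    .option("mergeSchema", "true")  # allow the silver table to evolve as well
    .start("/tables/silver")
)
```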

4 More Replies
EricCournarie
by New Contributor III
  • 478 Views
  • 1 reply
  • 0 kudos

Resolved! With the JDBC Driver, ResultSet metadata gives the column type as column name

Hello, Using the JDBC driver, when I check the metadata of a result set, retrieving the column name returns the column type instead. Here is what it looks like in the metadata. Using version 2.6.40, with the query SELECT "id" string, "year" ...

Latest Reply
EricCournarie
New Contributor III

Forget about it, the SELECT clause is wrong of course... too tired I guess, sorry.
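
For anyone hitting the same thing: in Databricks SQL, double quotes delimit string literals by default, so SELECT "id" string returns the literal 'id' aliased as string — which is why the "type" words appeared as column names. A sketch with backtick-quoted identifiers; the table name is a placeholder:

```python
# Backticks quote identifiers; double quotes would create string literals.
spark.sql("SELECT `id`, `year` FROM main.myschema.mytable").printSchema()
```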

Vladif1
by New Contributor II
  • 8023 Views
  • 8 replies
  • 1 kudos

Error when reading delta lake files with Auto Loader

Hi, When reading a Delta Lake file (created by Auto Loader) with this code: df = (    spark.readStream    .format('cloudFiles')    .option("cloudFiles.format", "delta")    .option("cloudFiles.schemaLocation", f"{silver_path}/_checkpoint")    .load(bronz...

Latest Reply
Panda
Valued Contributor

@Vladif1 The error occurs because the cloudFiles format in Auto Loader is meant for reading raw file formats like CSV and JSON for ingestion (see the format support docs). For Delta tables, you should use the Delta format directly. #Sample Example bronze...
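
A minimal sketch completing the sample above, reusing the post's bronze_path and silver_path variables:

```python
# Delta tables are streamed with format("delta"); cloudFiles is for raw files.
df = spark.readStream.format("delta").load(bronze_path)

(
    df.writeStream.format("delta")
    .option("checkpointLocation", f"{silver_path}/_checkpoint")
    .start(silver_path)
)
```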

7 More Replies
dataslicer
by Contributor
  • 2831 Views
  • 3 replies
  • 1 kudos

How to export/clone Databricks Notebook without results via web UI?

When a Databricks Notebook exceeds the size limit, it suggests to `clone/export without results`. This is exactly what I want to do, but the current web UI does not provide the ability to bypass/skip the results in either the `clone` or `export` context...

Latest Reply
dataslicer
Contributor

Thank you @Yeshwanth for the response. I am looking for a way to do this without clearing the current outputs. This is necessary because I want to preserve the existing outputs and fork off another notebook instance to run with a few parameter changes and come...
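
One way that should leave the workspace copy (and its outputs) untouched is exporting in SOURCE format via the SDK, which omits cell results. A sketch assuming databricks-sdk; the notebook path is a placeholder:

```python
import base64

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.workspace import ExportFormat

w = WorkspaceClient()
resp = w.workspace.export(
    "/Users/me@example.com/my_notebook",
    format=ExportFormat.SOURCE,  # source only, no cell results
)
with open("my_notebook.py", "wb") as f:
    f.write(base64.b64decode(resp.content))
```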

2 More Replies
dbx_687_3__1b3Q
by New Contributor III
  • 13475 Views
  • 10 replies
  • 4 kudos

Resolved! Databricks Asset Bundle (DAB) from a Git repo?

My earlier question was about creating a Databricks Asset Bundle (DAB) from an existing workspace. I was able to get that working but after further consideration and some experimenting, I need to alter my question. My question is now "how do I create...

Latest Reply
mflyingget
New Contributor II

How can I deploy a custom Git repo to the .bundle workspace?

9 More Replies
shadowinc
by New Contributor III
  • 6795 Views
  • 7 replies
  • 4 kudos

Resolved! Databricks SQL endpoint as Linked Service in Azure Data Factory

We have a special endpoint that grants access to Delta tables, and we want to know if we can use SQL endpoints as a linked service in ADF. If yes, then which ADF linked service would be suitable for this? Appreciate your support on this.

Data Engineering
SQL endpoint
Latest Reply
yashrg
New Contributor III

The Azure Databricks Delta Lake (Dataset) linked service can only connect to an All-Purpose/Interactive cluster. If you want to use the SQL endpoint, you would need a Self-Hosted Integration Runtime for ADF with the Databricks ODBC driver installed...

6 More Replies
