Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

raduq
by Contributor
  • 36560 Views
  • 10 replies
  • 12 kudos

How to efficiently process a 50 GB JSON file and store it in Delta?

Hi, I'm a fairly new user and I am using Azure Databricks to process a ~50 GB JSON file containing real estate data. I uploaded the JSON file to Azure Data Lake Gen2 storage and read the JSON file into a dataframe. df = spark.read.option('multiline', '...

Latest Reply
Renzer
New Contributor II
  • 12 kudos

The Spark connector is super slow. I found that loading the JSON into Azure Cosmos DB and then writing queries to get sections of data out was 25x faster, because Cosmos DB indexes the JSON. You can stream read data from Cosmos DB. You can find Python code sn...

9 More Replies
Fredolebeau80
by New Contributor II
  • 1463 Views
  • 2 replies
  • 1 kudos

Refresh delta

How do I refresh a Delta table with new raw data from a CDC JSON file?

Latest Reply
Vinay_M_R
Databricks Employee
  • 1 kudos

To refresh a Delta table with new raw data from a CDC JSON file, you can use change data capture (CDC) to update tables based on changes in source data. Here are the steps: 1. Create a streaming table using the CREATE OR REFRESH STREAMING TABLE statem...
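
As a rough illustration of what "updating a table based on changes in source data" means, here is a minimal pure-Python sketch. It is not the Databricks API and does not reproduce the steps in the reply above. The event fields (`id`, `op`, `seq`, `data`) and the helper `apply_cdc` are hypothetical names chosen for this example; a real CDC feed will have its own schema.

```python
import json


def apply_cdc(table, json_lines):
    """Apply CDC events (one JSON object per line) to an in-memory table.

    `table` maps a primary key to the latest row. The field names used here
    (id, op, seq, data) are assumptions for illustration only.
    """
    events = [json.loads(line) for line in json_lines.splitlines() if line.strip()]
    # Apply events in sequence order so a later change wins over an earlier one.
    for ev in sorted(events, key=lambda e: e["seq"]):
        if ev["op"] == "delete":
            table.pop(ev["id"], None)   # remove the row if it exists
        else:
            table[ev["id"]] = ev["data"]  # insert and update are both upserts
    return table


# Hypothetical CDC file content: two inserts, one update, one delete.
cdc_file = """
{"id": 1, "op": "insert", "seq": 1, "data": {"price": 100}}
{"id": 2, "op": "insert", "seq": 2, "data": {"price": 250}}
{"id": 1, "op": "update", "seq": 3, "data": {"price": 120}}
{"id": 2, "op": "delete", "seq": 4}
"""

result = apply_cdc({}, cdc_file)
print(result)
```

The key design point is ordering by a sequence column before applying changes, so that out-of-order arrival in the file does not leave stale values in the target.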

1 More Replies
Manasi_Sarang
by New Contributor II
  • 4688 Views
  • 4 replies
  • 1 kudos

Facing an issue while creating a Delta Live Table on top of a CSV file

Hello everyone, I am trying to create a Delta Live Table on top of a CSV file using the syntax below: CREATE OR REFRESH LIVE TABLE employee_bronze_dlt COMMENT "The bronze employee dataset, ingested from /mnt/lakehouse/PoC/DLT/Source/." AS SELECT * FROM csv.`/mnt...

Latest Reply
pvignesh92
Honored Contributor
  • 1 kudos

Hi @Manasi_Sarang, I believe Delta is unable to infer the schema because you are using a SELECT statement to read the entire content of the CSV file, and I think inferSchema won't work here. Instead, you can try to create a temp live table or live view wit...

3 More Replies
Anonymous
by Not applicable
  • 2920 Views
  • 2 replies
  • 0 kudos

INTERNAL ERROR

I have the following query: select customer_id, first(if(name_type = 'Official', name, null), true) official_name, first(if(name_type = 'Preferred', name, null), true) preferred_name from (select customer_id, ifnull(name_type, 'Official...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

I experience similar issues from time to time. What helped was refreshing the browser page. If that does not work, restart the SQL warehouse. The internal error is indeed pretty vague, but my experience is that this is not related to a wrong SQL scrip...
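
For readers unfamiliar with the `first(if(...), true)` pattern in the question's query, here is a minimal pure-Python sketch of what the outer select computes: for each customer, the first non-null name of each type. It covers only the visible part of the query (the inner subquery is truncated in the post), and the helper names `first_non_null` and `pivot_names` and the sample rows are hypothetical. Note also that Spark's `first` depends on row order, which is not guaranteed without explicit ordering; this sketch simply uses list order.

```python
def first_non_null(values):
    """Return the first value that is not None, or None if there is none.

    This mirrors first(expr, true) in Spark SQL, where the second
    argument asks the aggregate to ignore nulls.
    """
    return next((v for v in values if v is not None), None)


def pivot_names(rows):
    """Group (customer_id, name_type, name) rows into one record per customer."""
    grouped = {}
    for customer_id, name_type, name in rows:
        rec = grouped.setdefault(customer_id, {"official": [], "preferred": []})
        # Equivalent of if(name_type = 'Official', name, null), and likewise
        # for 'Preferred': keep the name only when the type matches.
        rec["official"].append(name if name_type == "Official" else None)
        rec["preferred"].append(name if name_type == "Preferred" else None)
    return {
        cid: {
            "official_name": first_non_null(rec["official"]),
            "preferred_name": first_non_null(rec["preferred"]),
        }
        for cid, rec in grouped.items()
    }


# Hypothetical sample data.
rows = [
    (1, "Preferred", "Bob"),
    (1, "Official", "Robert"),
    (2, "Official", "Alice"),
]
names = pivot_names(rows)
print(names)
```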

1 More Replies
Zhudocode
by New Contributor II
  • 12709 Views
  • 1 replies
  • 2 kudos

Resolved! Difference between using DBT and Databricks' lineage tool

My team is using DBT for a lot of data lineage work, but at the data summit it was shown that Databricks also has a similar tool that is in fact better because it tracks lineage at the column level. So what's the main draw of DBT at this point?

Latest Reply
Dk_1802
New Contributor III
  • 2 kudos

DBT (Data Build Tool) remains popular for its extensive templating capabilities, modularity, and open-source nature, which allows for customization and integration with various data platforms. While Databricks may offer more advanced lineage features...

Furro33
by New Contributor
  • 602 Views
  • 0 replies
  • 0 kudos

2023 summit feedback

The event covered everything a data engineer could dream of. My favorite discussions:
- Spark Connect
- AI on top of Unity Catalog
- Delta Live Tables pipelines for streaming
#Summit23

Atius
by New Contributor
  • 409 Views
  • 0 replies
  • 0 kudos

Expo experience

Great partners and SaaS solutions on the expo floor to help you get a jump start.

Sappy
by New Contributor
  • 457 Views
  • 0 replies
  • 0 kudos

Delta Sharing

What are the prerequisites for enabling Delta Sharing between multiple cloud data warehouses?

Deepeshn1988
by New Contributor
  • 519 Views
  • 0 replies
  • 0 kudos

Databricks summit

The Data+AI event was too good. Kudos to everyone involved! Here's a glimpse.

Data Engineering
dataaisummit
Databricks
acho
by New Contributor II
  • 631 Views
  • 0 replies
  • 0 kudos

Data+AI Summit 2023

It was a great opportunity to learn more about Databricks' products, especially Delta Lake, which I am very interested in. Other great features are also worth researching to see whether our company can benefit from them. General exper...

Data Engineering
DataAISummit2023
Chevron
by New Contributor II
  • 563 Views
  • 0 replies
  • 0 kudos

Databricks Summit

Enjoyed all the keynotes and super excited about what LakehouseIQ can bring data engineers in the future.

