cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

SRK
by Contributor III
  • 3332 Views
  • 5 replies
  • 7 kudos

How to handle schema validation for Json file. Using Databricks Autoloader?

Following are the details of the requirement:1.      I am using databricks notebook to read data from Kafka topic and writing into ADLS Gen2 container i.e., my landing layer.2.      I am using Spark code to read data from Kafka and write into landing...

  • 3332 Views
  • 5 replies
  • 7 kudos
Latest Reply
maddy08
New Contributor II
  • 7 kudos

just to clarify, are you reading kafka and writing into adls in json files? like for each message from kafka is 1 json file in adls ?

  • 7 kudos
4 More Replies
MRTN
by New Contributor III
  • 10417 Views
  • 4 replies
  • 3 kudos

Load CSV files with slightly different schemas

I have a set of CSV files generated by a system, where the schema has evolved over the years. Some columns have been added, and at least one column has been renamed in newer files. Is there any way to elegantly load these files into a dataframe? I ha...

  • 10417 Views
  • 4 replies
  • 3 kudos
Latest Reply
MRTN
New Contributor III
  • 3 kudos

For reference - for anybody struggling with the same issues. All online examples using auto loader are written as one block statement on the form: (spark.readStream.format("cloudFiles") .option("cloudFiles.format", "csv") # The schema location di...

  • 3 kudos
3 More Replies
Chris_Konsur
by New Contributor III
  • 1133 Views
  • 0 replies
  • 2 kudos

Schema supported by Autoloader

We do not want to use schema inference with schema evolution in Autoloader. Instead, we want to apply our schema and use the merge option. Our schema is very complex, with multiple nested following levels. When I apply this schema to Autoloader, it r...

  • 1133 Views
  • 0 replies
  • 2 kudos
noimeta
by Contributor III
  • 3556 Views
  • 4 replies
  • 1 kudos

Apply change data with delete and schema evolution

Hi,Currently, I'm using structure streaming to insert/update/delete to a table. A row will be deleted if value in 'Operation' column is 'deleted'. Everything seems to work fine until there's a new column.Since I don't need 'Operation' column in the t...

  • 3556 Views
  • 4 replies
  • 1 kudos
Latest Reply
User16753725469
Contributor II
  • 1 kudos

please go through this documentation https://docs.delta.io/latest/api/python/index.html

  • 1 kudos
3 More Replies
fshimamoto
by New Contributor III
  • 2257 Views
  • 3 replies
  • 2 kudos

What are the best practices for schema drift using Delta Live tables, in a scenario where the main source is a no sql database and we have a lot of ch...

What are the best practices for schema drift using Delta Live tables, in a scenario where the main source is a no sql database and we have a lot of changes in the schema?​

  • 2257 Views
  • 3 replies
  • 2 kudos
Latest Reply
Vartika
Databricks Employee
  • 2 kudos

Hey there @Fernando Martin​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from...

  • 2 kudos
2 More Replies
ilarsen
by Contributor
  • 868 Views
  • 0 replies
  • 1 kudos

Trouble referencing a column that has been added by schema evolution (Auto Loader with Delta Live Tables)

Hi,I have a Delta Live Tables pipeline, using Auto Loader, to ingest from JSON files. I need to do some transformations - in this case, converting timestamps. Except one of the timestamp columns does not exist in every file. This is causing the DLT p...

  • 868 Views
  • 0 replies
  • 1 kudos
Gapy
by New Contributor II
  • 1476 Views
  • 1 replies
  • 1 kudos

Auto Loader Schema-Inference and Evolution for parquet files

Dear all,will (and when) will Auto Loader also support Schema-Inference and Evolution for parquet files, at this point it is only for JSON and CSV supported if i am not mistaken?Thanks and regards,Gapy

  • 1476 Views
  • 1 replies
  • 1 kudos
Latest Reply
Sandeep
Contributor III
  • 1 kudos

@Gasper Zerak​ , This will be available in near future (DBR 10.3 or later). Unfortunately, we don't have an SLA at this moment.

  • 1 kudos
saninanda
by New Contributor II
  • 9286 Views
  • 7 replies
  • 0 kudos

how to read schema from text file stored in cloud storage

I have file a.csv or a.parquet while creating data frame reading we can explictly define schema with struct type. instead of write the schema in the notebook want to create schema lets say for all my csv i have one schema like csv_schema and stored ...

  • 9286 Views
  • 7 replies
  • 0 kudos
Latest Reply
Nakeman
New Contributor II
  • 0 kudos

@shyampsr big thanks, was searching for the solution almost 3 hours _https://luckycanadian.com/

  • 0 kudos
6 More Replies
Labels