Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Prajwal_082
by New Contributor II
  • 968 Views
  • 0 replies
  • 0 kudos

DLT apply_changes_from_snapshot

Is there a way to get an auto-incremented value for the next version parameter, i.e. next version = previous version + 1? In that case, how do I get the previous version value? The code below is from the documentation section "Historical snapshot processing". The APPLY CHANGES API...

[Image attachment: Prajwal_082_0-1721887681729.png]
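Since the question has no replies yet, here is a rough sketch of the documented historical-snapshot pattern, heavily hedged: the `source` parameter name, the snapshot path layout, and the key column are assumptions to verify against the current `apply_changes_from_snapshot` documentation. The version callback receives the last processed version as its argument, so "next version = previous version + 1" can be computed directly from it.

```python
import dlt
from typing import Optional, Tuple
from pyspark.sql import DataFrame

dlt.create_streaming_table("target")

def next_snapshot_and_version(latest_snapshot_version: Optional[int]) -> Optional[Tuple[DataFrame, int]]:
    # The previous version is simply the argument DLT passes in (None on the first run).
    next_version = 1 if latest_snapshot_version is None else latest_snapshot_version + 1
    snapshot_path = f"/Volumes/main/default/snapshots/v{next_version}"  # hypothetical path layout
    try:
        snapshot_df = spark.read.format("parquet").load(snapshot_path)  # `spark` is the notebook session
    except Exception:
        return None  # no newer snapshot available yet, so stop this update
    return (snapshot_df, next_version)

dlt.apply_changes_from_snapshot(
    target="target",
    source=next_snapshot_and_version,  # assumption: check the exact parameter name in the docs
    keys=["id"],                       # hypothetical key column
    stored_as_scd_type=2,
)
```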
aalanis
by New Contributor II
  • 1531 Views
  • 3 replies
  • 2 kudos

Issues reading json files with databricks vs oss pyspark

Hi everyone, I'm currently developing an application in which I read JSON files with a nested structure. I developed my code locally on my laptop using the open-source version of PySpark (3.5.1), using code similar to this: sample_schema: schema = Struct...
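The snippet above is cut off, so as a generic point of comparison, here is a minimal sketch of reading nested JSON with an explicit schema; the schema, path, and multiLine option are assumptions about the files in question. Passing the schema explicitly removes any inference differences between Databricks and OSS PySpark.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, ArrayType

spark = SparkSession.builder.getOrCreate()

# Hypothetical nested schema; replace with the real structure of the files.
sample_schema = StructType([
    StructField("id", StringType(), True),
    StructField("payload", StructType([
        StructField("name", StringType(), True),
        StructField("scores", ArrayType(IntegerType()), True),
    ]), True),
])

df = (
    spark.read
    .schema(sample_schema)        # explicit schema instead of inference
    .option("multiLine", "true")  # only needed if each file is one multi-line JSON document
    .json("/path/to/json/files")  # placeholder path
)
df.printSchema()
```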

Latest Reply
sushmithajk
New Contributor II
  • 2 kudos

Hi, I'd like to try the scenario and find a solution. Would you mind sharing a sample file? 

2 More Replies
Splush
by New Contributor II
  • 2691 Views
  • 1 replies
  • 1 kudos

Resolved! Row Level Security while streaming data with Materialized views

Hey, I have the following problem when trying to add row-level security to some of our materialized views. According to the documentation this feature is still in preview; nevertheless, I am trying to understand why this doesn't work and how it would be sup...

Latest Reply
raphaelblg
Databricks Employee
  • 1 kudos

Hello @Splush, there are two ways to create materialized views at the current moment: 1. Through Databricks SQL: Use Materialized Views in Databricks SQL. These are the limitations. 2. Through DLT: Materialized View (DLT). All DLT tables are subject to ...
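For readers comparing the two options in the reply, a hedged sketch follows; table and view names are placeholders, and row filter support for each flavor should be checked against the linked limitations.

```python
import dlt
from pyspark.sql import functions as F

# Option 2 from the reply: a DLT materialized view. Inside a DLT pipeline, a @dlt.table
# defined over a batch (non-streaming) query is maintained as a materialized view.
@dlt.table(name="orders_by_region", comment="Hypothetical aggregate over an orders table")
def orders_by_region():
    return (
        spark.read.table("main.default.orders")   # placeholder source table
        .groupBy("region")
        .agg(F.sum("amount").alias("total_amount"))
    )

# Option 1 from the reply, the Databricks SQL flavor (run on a SQL warehouse):
# CREATE MATERIALIZED VIEW main.default.orders_by_region AS
# SELECT region, SUM(amount) AS total_amount FROM main.default.orders GROUP BY region;
```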

robertkoss
by New Contributor III
  • 3663 Views
  • 7 replies
  • 1 kudos

Autoloader schema hints are not taken into consideration in the schema file

I am using Autoloader with schema inference to automatically load some data into S3. I have one column that is a Map, which is overwhelming Autoloader (it tries to infer it as a struct -> creating a struct with all keys as properties), so I just use a sc...
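A minimal Auto Loader sketch with a schema hint pinning just the problematic column to a map type while everything else stays inferred; the column name, paths, and target table are placeholders.

```python
# `spark` is the Databricks notebook session; Auto Loader ("cloudFiles") is Databricks-only.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaHints", "properties MAP<STRING, STRING>")      # hint only the Map column
    .option("cloudFiles.schemaLocation", "s3://my-bucket/_schemas/events")   # placeholder
    .load("s3://my-bucket/events/")                                          # placeholder
)

(
    df.writeStream
    .option("checkpointLocation", "s3://my-bucket/_checkpoints/events")      # placeholder
    .trigger(availableNow=True)
    .toTable("main.default.bronze_events")                                   # placeholder
)
```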

Latest Reply
Witold
Honored Contributor
  • 1 kudos

Sorry, I didn't mean that your solution is poorly designed. I was only referring to one of the main definitions of your bronze layer: you want to have a defined and optimized data layout, which is source-driven at the same time. In other words: ...

6 More Replies
phanindra
by New Contributor III
  • 6028 Views
  • 3 replies
  • 5 kudos

Resolved! Support for Varchar data type

In the official documentation for supported data types, Varchar is not listed. But in the product, we are allowed to create a field of varchar data type. We are building an integration with Databricks and we are confused if we should support operatio...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 5 kudos

Hi @phanindra, to be precise, the Delta Lake format is based on Parquet files. For strings, Parquet only has one data type: StringType. So, basically, the varchar(n) data type under the hood is represented as a string with a check constraint on the length of the st...
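A small illustration of the point in the reply, with placeholder table names: VARCHAR(10) is accepted in the DDL, but under the hood the column behaves like a string with a length check, so oversized values are rejected at write time.

```python
spark.sql("CREATE TABLE main.default.varchar_demo (name VARCHAR(10)) USING DELTA")  # placeholder names
spark.sql("INSERT INTO main.default.varchar_demo VALUES ('short')")                 # fits within 10 chars

# This insert exceeds the declared length and fails with a char/varchar length error:
# spark.sql("INSERT INTO main.default.varchar_demo VALUES ('this value is far too long')")

spark.sql("DESCRIBE TABLE main.default.varchar_demo").show()  # column reported as varchar(10)
```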

2 More Replies
MichTalebzadeh
by Valued Contributor
  • 767 Views
  • 0 replies
  • 1 kudos

Navigating the Future: Key Questions for Implementing Production-Quality GenAI

"The Big Book of Generative AI from Databricks"https://lnkd.in/dR4VuEyQprovides a comprehensive guide to understanding and implementing Generative AI (GenAI) effectively within an enterprise. Here are some critical questions that businesses should co...

Data Engineering
FinCrime
GenAI ArtificialIntelligence DataScience AI MachineLearning Innovation Technology EnterpriseAI
sanjay
by Valued Contributor II
  • 16063 Views
  • 13 replies
  • 10 kudos

Spark tasks too slow and not doing parallel processing

Hi, I have a Spark job which is processing a large data set, and it's taking too long to process the data. In the Spark UI, I can see it's running 1 task out of 9 tasks. Not sure how to run this in parallel. I have already mentioned auto scaling and providing up to...
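Without the job details this is only a guess, but "1 task out of 9" usually means the expensive stage sees a single partition. Below is a hedged sketch of the usual checks, using a toy DataFrame in place of the real one; the partition count and column name are placeholders to tune.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(1_000_000)                  # stand-in for the real large DataFrame

print(df.rdd.getNumPartitions())             # if this is 1, tasks cannot run in parallel

df = df.repartition(64)                      # roughly 2-4x the total cluster cores
# or, if one key dominates, repartition on a well-distributed column:
df = df.repartition(64, "id")                # "id" is a placeholder column name

spark.conf.set("spark.sql.shuffle.partitions", "64")  # partitions used by joins/aggregations
```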

Latest Reply
plondon
New Contributor II
  • 10 kudos

Would it be any different, i.e. faster, if using Spark within Azure?

12 More Replies
Yyyyy
by New Contributor III
  • 1987 Views
  • 3 replies
  • 2 kudos

Showing only a limited number of lines from the CSV file

Expected number of lines: 16,400. Showing only 20 records. Script: spark.conf.set("REDACTED", "REDACTED") # File location file_location = "REDACTED" # Read in the data to dataframe df df = spark.read.format("CSV").option("inferSchema",...

Latest Reply
Yyyyy
New Contributor III
  • 2 kudos

Hi, please help me: spark.conf.set("REDACTED", "REDACTED") # File location file_location = "REDACTED" # Read in the data to dataframe df df = spark.read.format("CSV").option("inferSchema", "true").option("header", "true").option("delimiter", ",")...
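One thing worth ruling out first: `show()` prints only 20 rows by default (and notebook `display()` is similarly capped), which matches the symptom exactly even when the full 16,400 lines were loaded. A quick sanity check, reusing the reader from the snippet above with its redacted `file_location`:

```python
df = (
    spark.read.format("csv")
    .option("inferSchema", "true")
    .option("header", "true")
    .option("delimiter", ",")
    .load(file_location)       # the redacted path from the post
)

print(df.count())              # total rows actually loaded; should be close to 16,400
df.show(50, truncate=False)    # show() defaults to 20 rows; ask for more explicitly
```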

2 More Replies
stefano0929
by New Contributor II
  • 552 Views
  • 0 replies
  • 0 kudos

Error 301 Moved Permanently in cells of plotting

Hi, I created a workbook for academic purposes and had completed it... From one moment to the next, all the plot cells of charts (and only those) started returning the following error, and I really don't know how to solve it by today: Failed to store th...

Bhabs
by New Contributor
  • 730 Views
  • 1 replies
  • 0 kudos

Replace one tag in a JSON file in the Databricks table

There is a column (src_json) in emp_table. I need to replace "ages" with "age" in each JSON in the src_json column of emp_table. Can you please suggest the best way to do it?

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Bhabs, you can do it in the following way (assuming that src_json contains a JSON string): from pyspark.sql import SparkSession from pyspark.sql.functions import col, expr spark = SparkSession.builder.appName("Replace JSON Keys").getOrCreate() data = ...
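The reply's code is truncated above; as one simple variant of the same idea, a targeted regex replacement on the key name is often enough (hedged: a schema-aware from_json/to_json round trip is safer if "ages" can also appear inside values). Table and column names follow the question.

```python
from pyspark.sql import functions as F

emp_table = spark.table("emp_table")  # table name from the question

# Rename the "ages" key to "age" inside the JSON string held in src_json.
fixed = emp_table.withColumn(
    "src_json",
    F.regexp_replace(F.col("src_json"), r'"ages"\s*:', '"age":'),
)
fixed.select("src_json").show(truncate=False)
```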

Olaoye_Somide
by New Contributor III
  • 1715 Views
  • 1 replies
  • 0 kudos

AutoLoader File Notification Setup on AWS

I'm encountering issues setting up Databricks AutoLoader in file notification mode. The error seems to be related to UC access to the S3 bucket. I have tried running it on a single-node dedicated cluster, but no luck. Any guidance or assistance on reso...
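For context, a hedged sketch of what a file-notification Auto Loader stream looks like; the region, paths, and format are placeholders, and the S3 path must be covered by a Unity Catalog external location the cluster can access.

```python
# `spark` is the Databricks notebook session.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useNotifications", "true")                           # file notification mode
    .option("cloudFiles.region", "us-east-1")                                # placeholder AWS region
    .option("cloudFiles.schemaLocation", "s3://my-bucket/_schemas/ingest")   # placeholder
    .load("s3://my-bucket/landing/")                                         # placeholder
)
```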

Latest Reply
Olaoye_Somide
New Contributor III
  • 0 kudos

Thanks @Retired_mod. I have reviewed all the steps mentioned, including the IAM policy, as per the setup guide. I believe the permissions granted are sufficient for the setup. To validate the permissions, I used IAM credentials with Admin privileges i...

Sudharsan24
by New Contributor II
  • 1998 Views
  • 2 replies
  • 2 kudos

Job aborted stage failure java.sql.SQLRecoverableException: IO Error: Connection reset by peer

While ingesting data from Oracle to Databricks (writing into ADLS) using JDBC, I am getting a "connection reset by peer" error when ingesting a large table which has millions of rows. I am using Oracle SQL Developer and Azure Databricks. I tried every way li...

Latest Reply
Rishabh-Pandey
Databricks MVP
  • 2 kudos

Try using this code: import pyspark from pyspark.sql import SparkSession # Initialize Spark session spark = SparkSession.builder.appName("OracleToDatabricks").getOrCreate() # Oracle connection properties conn = "jdbc:oracle:thin:@//<host>:<port>/<s...
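Since the code in the reply is cut off, here is a hedged sketch of the usual way to keep a huge Oracle extract from timing out on a single connection: split the read across partitions and raise the fetch size. All connection details, bounds, and paths below are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("OracleToDatabricks").getOrCreate()

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//<host>:<port>/<service>")
    .option("dbtable", "SCHEMA.BIG_TABLE")                  # placeholder table
    .option("user", "<user>")
    .option("password", "<password>")
    .option("driver", "oracle.jdbc.driver.OracleDriver")
    # Parallelize the read so no single connection has to stream millions of rows alone.
    .option("partitionColumn", "ID")                        # placeholder numeric column
    .option("lowerBound", "1")
    .option("upperBound", "10000000")
    .option("numPartitions", "16")
    .option("fetchsize", "10000")                           # fewer round trips per partition
    .load()
)

df.write.format("delta").mode("overwrite").save(
    "abfss://<container>@<account>.dfs.core.windows.net/raw/big_table"  # placeholder ADLS path
)
```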

1 More Replies
Mehdi-LAMRANI
by New Contributor II
  • 6714 Views
  • 2 replies
  • 2 kudos

Resolved! Upload file from local file system to DBFS (2024)

Recent changes to the workspace UI (and the introduction of Unity Catalog) seem to have discreetly sunset the ability to upload data directly to DBFS from the local filesystem using the UI (NOT the CLI). I want to be able to load a raw file (no matter the ...

Latest Reply
pavithra
New Contributor III
  • 2 kudos

not working in community edition 

1 More Replies
