Data Engineering

Forum Posts

by robertkoss • New Contributor III

07-15-2024 12:30:17 PM

4604 Views
7 replies
1 kudos

Autoloader Schema Hint are not taken into consideration in schema file

I am using Autoloader with schema inference to automatically load some data into S3. I have one column that is a Map, which overwhelms Autoloader (it tries to infer it as a struct, creating a struct with all keys as properties), so I just use a sc...

Latest Reply

Witold
Databricks Partner

07-24-2024 6:38:00 AM

1 kudos

Sorry, I didn't mean that your solution is poorly designed. I was only referring to one of the main definitions of the bronze layer: you want to have a defined and optimized data layout that is source-driven at the same time. In other words: ...

by phanindra • New Contributor III

07-24-2024 5:57:44 AM

7619 Views
3 replies
5 kudos

Resolved! Support for Varchar data type

In the official documentation for supported data types, Varchar is not listed, but the product allows us to create a field of the varchar data type. We are building an integration with Databricks and we are unsure whether we should support operatio...

Latest Reply

szymon_dybczak
Esteemed Contributor III

07-24-2024 6:00:37 AM

5 kudos

Hi @phanindra, to be precise, the Delta Lake format is based on Parquet files. For strings, Parquet only has one data type: StringType. So, basically, the varchar(n) data type is represented under the hood as a string with a check constraint on the length of the st...
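
The idea in the reply above, a plain string plus a length check, can be modelled in a few lines. This is only a conceptual sketch in plain Python, not how Delta Lake or Spark implement it; the function name check_varchar and the choice of ValueError are assumptions made for illustration.

```python
from typing import Optional


def check_varchar(value: Optional[str], n: int) -> Optional[str]:
    """Model varchar(n) as an ordinary string plus a length check.

    Hypothetical helper: the stored value is still just a string;
    the only extra rule is that it may not exceed n characters.
    """
    if value is not None and len(value) > n:
        raise ValueError(f"value exceeds varchar({n}) length limit")
    return value


# A value within the limit is passed through unchanged.
name = check_varchar("Ann", 5)

# A value over the limit is rejected.
try:
    check_varchar("Alexander", 5)
    rejected = False
except ValueError:
    rejected = True
```

Under this model an integration can treat a varchar(n) column like a string column and only needs to respect the length limit on writes.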

by MichTalebzadeh • Valued Contributor

07-24-2024 5:57:13 AM

893 Views
0 replies
1 kudos

Navigating the Future: Key Questions for Implementing Production-Quality GenAI

"The Big Book of Generative AI from Databricks" (https://lnkd.in/dR4VuEyQ) provides a comprehensive guide to understanding and implementing Generative AI (GenAI) effectively within an enterprise. Here are some critical questions that businesses should co...

FinCrime

GenAI ArtificialIntelligence DataScience AI MachineLearning Innovation Technology EnterpriseAI

by sanjay • Valued Contributor II

03-30-2023 12:42:50 AM

18014 Views
13 replies
10 kudos

Spark tasks too slow and not doing parallel processing

Hi, I have a Spark job that processes a large data set, and it's taking too long to process the data. In the Spark UI, I can see it's running 1 task out of 9. I'm not sure how to run this in parallel. I have already specified auto scaling and am providing up to...

Latest Reply

plondon
New Contributor II

07-24-2024 4:07:00 AM

10 kudos

Will it be any different, i.e. faster, when using Spark within Azure?

by Yyyyy • New Contributor III

07-24-2024 2:14:11 AM

2533 Views
3 replies
2 kudos

showing only a limited number of lines from the CSV file

Expected number of lines: 16400
Number of records shown: only 20
Script:
spark.conf.set("REDACTED", "REDACTED")
# File location
file_location = "REDACTED"
# Read in the data to dataframe df
df = spark.read.format("CSV").option("inferSchema",...

Latest Reply

Yyyyy
New Contributor III

07-24-2024 3:16:07 AM

2 kudos

Hi, please take a look and help me.
spark.conf.set("REDACTED", "REDACTED")
# File location
file_location = "REDACTED"
# Read in the data to dataframe df
df = spark.read.format("CSV").option("inferSchema", "true").option("header", "true").option("delimiter", ",")...

by stefano0929 • New Contributor II

07-24-2024 2:23:26 AM

658 Views
0 replies
0 kudos

Error 301 Moved Permanently in cells of plotting

Hi, I created a workbook for academic purposes and had completed it... From one moment to the next, all the chart plotting cells (and only those) started returning the following error, and I really don't know how to solve it by today. Failed to store th...

by Bhabs • New Contributor

07-23-2024 12:40:57 PM

886 Views
1 replies
0 kudos

Replace one tag in a JSON file in the Databricks table

There is a column (src_json) in emp_table. I need to replace ages with age in each JSON document in the src_json column of emp_table. Can you please suggest the best way to do it?

Latest Reply

szymon_dybczak
Esteemed Contributor III

07-23-2024 1:55:13 PM

0 kudos

Hi @Bhabs, you can do it in the following way (assuming that src_json contains a JSON string):
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, expr
spark = SparkSession.builder.appName("Replace JSON Keys").getOrCreate()
data = ...
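
Since the reply's code is cut off in this listing, here is a separate, minimal sketch of the renaming step itself. It is plain Python using the standard json module, not the reply's Spark code; the helper names rename_key and rename_key_in_json are hypothetical, and it assumes each src_json value is a valid JSON string. Parsing the JSON and renaming the key avoids accidentally changing the text "ages" where it appears inside a value.

```python
import json


def rename_key(obj, old, new):
    """Recursively rename dictionary key `old` to `new`; values are untouched."""
    if isinstance(obj, dict):
        return {
            (new if key == old else key): rename_key(value, old, new)
            for key, value in obj.items()
        }
    if isinstance(obj, list):
        return [rename_key(item, old, new) for item in obj]
    return obj


def rename_key_in_json(src_json, old="ages", new="age"):
    """Parse a JSON string, rename the key, and serialize it again."""
    return json.dumps(rename_key(json.loads(src_json), old, new))


# The key is renamed; the string value "ages" is left alone.
result = rename_key_in_json('{"name": "Ann", "ages": 30, "note": "ages"}')
print(result)
```

A function like this could be applied per row, for example through a UDF, though a built-in Spark expression may be preferable for large tables.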

by Olaoye_Somide • New Contributor III

07-18-2024 9:05:00 AM

2104 Views
1 replies
0 kudos

AutoLoader File Notification Setup on AWS

I’m encountering issues setting up Databricks AutoLoader in File Notification mode. The error seems to be related to UC access to the S3 bucket. I have tried running it on a single-node dedicated cluster, but no luck. Any guidance or assistance on reso...

Latest Reply

Olaoye_Somide
New Contributor III

07-23-2024 7:03:48 AM

0 kudos

Thanks @Retired_mod. I have reviewed all the steps mentioned, including the IAM policy, as per the setup guide. I believe the permissions granted are sufficient for the setup. To validate the permissions, I used IAM credentials with Admin privileges i...

by Sadam97 • New Contributor III

07-23-2024 4:37:48 AM

1318 Views
0 replies
0 kudos

Error: the Service Account Key in storage credential is not configured correctly

We have Databricks on GCP. Streaming jobs are running 24/7; storage credentials and an external location are created, as we are using a managed Unity Catalog. We get a random error somewhere around midnight (UTC). Here is the error trace: ERROR MicroBatchExe...

by Sudharsan24 • New Contributor II

07-23-2024 2:09:26 AM

2411 Views
2 replies
2 kudos

Job aborted stage failure java.sql.SQLRecoverableException: IO Error: Connection reset by peer

While ingesting data from Oracle into Databricks (writing into ADLS) using JDBC, I am getting a "connection reset by peer" error when ingesting a large table that has millions of rows. I am using Oracle SQL Developer and Azure Databricks. I tried every way li...

Latest Reply

Rishabh-Pandey
Databricks MVP

07-23-2024 2:19:24 AM

2 kudos

Try using this code.
import pyspark
from pyspark.sql import SparkSession
# Initialize Spark session
spark = SparkSession.builder.appName("OracleToDatabricks").getOrCreate()
# Oracle connection properties
conn = "jdbc:oracle:thin:@//<host>:<port>/<s...

by Mehdi-LAMRANI • New Contributor II

05-24-2024 4:02:32 AM

7703 Views
2 replies
2 kudos

Resolved! Upload file from local file system to DBFS (2024)

Recent changes to the workspace UI (and the introduction of Unity Catalog) seem to have discreetly sunset the ability to upload data directly to DBFS from the local filesystem using the UI (NOT the CLI). I want to be able to load a raw file (no matter the ...

Latest Reply

pavithra
New Contributor III

07-23-2024 2:49:59 AM

2 kudos

Not working in Community Edition.

by DatabricksHero • New Contributor II

08-10-2023 5:19:47 AM

3222 Views
2 replies
0 kudos

Unity Catalog 2.1 API Not Returning SQL Function/View Dependencies

Hi all, I have a problem reading responses generated by the Unity Catalog API 2.1, as they are missing fields that are described in the specification:
List functions - The fields routine_dependencies, return_params, and input_params are missi...

API

sql

Unity Catalog

Latest Reply

vyas
Databricks Partner

07-23-2024 2:22:16 AM

0 kudos

Hi @Retired_mod, I have the same issue as @DatabricksHero. View dependencies are not returned. Could you clarify the usage of this API call?

by ashkd7310 • New Contributor II

07-22-2024 7:55:39 PM

2134 Views
2 replies
4 kudos

date type conversion error

Hello, I am trying to convert the date to MM/dd/yyyy format. I first use the date_format function to convert the date into MM/dd/yyyy, so it becomes a string. However, my use case requires the data as a date, so I am again converting the str...

Latest Reply

Rishabh-Pandey
Databricks MVP

07-23-2024 12:27:32 AM

4 kudos

Check whether this method works.
# Convert date to MM/dd/yyyy format (string)
df = df.withColumn("formatted_date", date_format("date", "MM/dd/yyyy"))
# Convert string back to date
df = df.withColumn("converted_date", to_date("formatted_date", "MM/...
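
The round trip in the reply (date to formatted string, then string back to date) can also be illustrated outside Spark. This is a minimal sketch in plain Python, using the standard datetime module as a stand-in for Spark's date_format and to_date; the helper names to_display and from_display are hypothetical. It shows that the MM/dd/yyyy layout belongs to the string only: once parsed back, the value is a date again and no longer carries that layout.

```python
from datetime import date, datetime

# Python strftime/strptime equivalent of the MM/dd/yyyy pattern.
FMT = "%m/%d/%Y"


def to_display(d: date) -> str:
    """Format a date as an MM/dd/yyyy string (the result is text, not a date)."""
    return d.strftime(FMT)


def from_display(s: str) -> date:
    """Parse an MM/dd/yyyy string back into a date value."""
    return datetime.strptime(s, FMT).date()


original = date(2024, 7, 22)
formatted = to_display(original)   # a str in MM/dd/yyyy layout
parsed = from_display(formatted)   # a date again, equal to the original

print(formatted)  # 07/22/2024
print(parsed)     # 2024-07-22
```

If the MM/dd/yyyy layout is needed for display, one option is to keep the column as a date and apply the formatting only at the point where the value is shown or exported.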

by DataEnginerrOO • New Contributor III

07-20-2024 10:48:08 PM

4028 Views
4 replies
2 kudos

Error while trying to install jdbc8.jar

Hi, I am attempting to connect to an Oracle server. I tried to install the ojdbc8.jar library, but I encountered an error: "Library installation attempted on the driver node of cluster 0718-101257-h5k9c5ud failed. Please refer to the following error m...

by prith • New Contributor III

05-28-2024 8:09:44 AM

7612 Views
7 replies
1 kudos

Resolved! Databricks JDK 17 upgrade error

We tried upgrading to JDK 17, using Spark version 3.0.5 and runtime 14.3 LTS. We are getting this exception when using parallelstream(). With Java 17 I am not able to process different partitions in parallel at the same time. This means when there is more than 1 partiti...

Latest Reply

prith
New Contributor III

05-28-2024 4:14:24 PM

1 kudos

Anyway, thanks for your response. We found a workaround for this error, and JDK 17 is actually working; it appears faster than JDK 8.
