Connect with fellow community members to discuss general topics related to the Databricks platform, industry trends, and best practices. Share experiences, ask questions, and foster collaboration within the community.
I would like to know if there is any difference between saving a DataFrame during transformation back to itself, as in the first example, or to a new DataFrame, as in the second example. Thanks.

log_df = log_df.withColumn("process_timestamp", from_utc_timestamp(lit(current_timestamp()), "E...
Hi @alesventus, when saving a DataFrame after a transformation, there is no difference between assigning it back to the same variable or to a new one. Spark DataFrames are immutable, so withColumn returns a new DataFrame either way; both approaches produce the same output.
Sources:
- [Docs: dataframes-python](https://docs.databricks.com/getti...
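Since Spark DataFrames are immutable, the two styles are interchangeable. A minimal sketch of both (the timezone string is assumed, since the original snippet is truncated):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_timestamp, from_utc_timestamp, lit

spark = SparkSession.builder.getOrCreate()
log_df = spark.createDataFrame([(1,), (2,)], ["id"])

# Style 1: reassign to the same name
log_df = log_df.withColumn(
    "process_timestamp",
    from_utc_timestamp(lit(current_timestamp()), "EST"))  # timezone is a guess

# Style 2: bind to a new name -- the resulting plan is the same shape
log_df2 = log_df.withColumn(
    "process_timestamp2",
    from_utc_timestamp(lit(current_timestamp()), "EST"))
```

The only practical difference is readability: reassigning keeps the namespace tidy, while a new name lets you still refer to the untransformed DataFrame afterwards.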
Hi fellas, I am working on Databricks using Iceberg. At first I configured my notebook as below:

spark.conf.set("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkCatalog")
spark.conf.set("spark.sql.catalog.spark_catalog.type", "hadoop")
s...
Hi @Mohsen,
• The exception "RuntimeMetaException: Failed to connect to Hive Metastore" occurs because the Hive metastore cannot find the version information.
• To resolve the issue, follow the steps below:
- Set up a cluster with spark.sql.hive.m...
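The truncated key above is most likely spark.sql.hive.metastore.version. As a hedged sketch, these are cluster-level settings (entered under the cluster's Spark config, not set at runtime); the values below are illustrative only:

```
spark.sql.hive.metastore.version 2.3.9
spark.sql.hive.metastore.jars builtin
```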
I would like to know how to configure an AWS instance connection. A VPC and an EC2 instance were configured, and the instance is allowed to ping the on-premise server. How would it be possible to make that connection from Databricks?
Hi @Kaviana, To configure an AWS instance connection in Databricks, you need to follow these steps:
1. Create an access policy and a user with access keys in the AWS Console:
   - Go to the IAM service.
   - Click the Policies tab in the sidebar.
   - Clic...
I'm using the StarRocks Connector[2] to ingest data into StarRocks on Databricks 13.1 (powered by Spark 3.4.0). The connector runs on community Spark 3.4 but fails on the DBR. The reason is (the full stack trace is attached): java.lang.IncompatibleClass...
Hi @lpf, Based on the information provided, there seems to be a compatibility issue between the StarRocks Connector and Databricks Runtime 13.1 (powered by Spark 3.4.0). The problem arises because the StarRocksWrite class implements both the BatchWri...
Hi, we are using the Databricks JDBC driver https://mvnrepository.com/artifact/com.databricks/databricks-jdbc/2.6.33 and it seems there is a thread leak when getConnection fails. Could anyone advise? It can be reproduced with:

@Test
void databricksThreads() {...
Hi, none of the above suggestions work. We already contacted the Databricks JDBC team; the thread leak was confirmed and fixed in version 2.6.34 (https://mvnrepository.com/artifact/com.databricks/databricks-jdbc/2.6.34). This leak still exists if...
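For anyone hitting the same issue, the takeaway is to bump the driver dependency to the fixed release (coordinates taken from the Maven link above):

```
com.databricks:databricks-jdbc:2.6.34
```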
I am currently working in a Databricks notebook and using an ipywidgets.Output to display a pandas DataFrame. Because a Spark DataFrame cannot be displayed in an ipywidgets.Output widget, I have been using:

import pandas as pd
import numpy as np
import ...
Hi @yhyhy3, Based on the information, the issue you are facing seems related to the ipywidgets library. The ipywidgets package is used to create interactive elements in Databricks notebooks.
However, there might be a compatibility issue with your ve...
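For reference, the pattern the question describes usually looks like the sketch below, assuming the Spark DataFrame (spark_df here, a hypothetical name) is small enough to collect to the driver:

```python
import ipywidgets as widgets
from IPython.display import display

out = widgets.Output()

# Convert to pandas so the table can render inside the Output widget
pdf = spark_df.toPandas()  # spark_df is assumed to already exist

with out:
    display(pdf)

out  # as the last expression in a cell, this renders the widget
```

If this errors out, checking that the installed ipywidgets version is one supported by your Databricks Runtime is a reasonable first step.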
Hi, I need to process nearly 30 files from different locations and insert records to RDS. I am using multi-threading to process these files in parallel, like below. Test data: I have configuration like below based on column 4: If column 4=0:...
Hi @Policepatil, Based on the given information, it seems that the issue occurs when filtering the records based on the record type. The missing documents are inconsistent and can arise from different files or even within the same file but with other...
Hi Databricks Team, I had scheduled my exam on 6th Sep 2023. During the exam, the same pop-up came up, stating that I was looking in some other direction. I told them that my laptop mouse was not working properly, so I was looking at it. But still they suspended ...
Hello Team, I had a pathetic experience while attempting my 1st Databricks certification. Abruptly, the proctor asked me to show my desk; after I showed it, he/she asked multiple times, wasted my time, and then suspended my exam. I want to file a complain...
Sub: My exam Databricks Data Engineer Associate got suspended, need immediate help please (10/09/2023). I had a pathetic experience while attempting my Databricks Data Engineer certification. Abruptly, the proctor asked me to show my desk; after showin...
Hi, I need to process nearly 30 files from different locations and insert records to RDS. I am using multi-threading to process these files in parallel, like below. Test data: I have configuration like below based on column 4: If colum...
Looks like you are comparing to strings like "1", not values like 1, in your filter condition. It's hard to say; some details are missing, such as the rest of the code, the DataFrame schema, and what output you are observing.
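A minimal sketch of the type mismatch being pointed at, assuming column 4 was read as a string (the column and values here are hypothetical):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("a", "0"), ("b", "1"), ("c", " 1")],
                           ["name", "col4"])

df.printSchema()  # confirm the column's actual type before filtering

# String equality: " 1" (note the leading space) does not match "1"
df.filter(F.col("col4") == "1").show()

# An explicit cast makes the intended comparison unambiguous
df.filter(F.col("col4").cast("int") == 1).show()
```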
By default, SHS uses spark.history.fs.numReplayThreads = 25% of available cores (the number of threads the history server will use to process event logs). How can we increase the number of cores for the Spark History Server?
Hi @VMeghraj, To increase the number of replay threads used by the Spark History Server, you can modify the spark.history.fs.numReplayThreads configuration parameter.
You can set the desired number of threads by modifying the value of spark.history.fs.numReplayThreads...
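For a standalone history server this is typically set in spark-defaults.conf before the server starts; the value below is only an example:

```
spark.history.fs.numReplayThreads 16
```

Note that this parameter controls replay threads rather than cores as such; to actually give the SHS more cores, it would also need to run on a host with more of them.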
OK so I'm trying to use Open Source RStudio on Azure Databricks. I'm following the instructions here: https://learn.microsoft.com/en-us/azure/databricks/sparkr/rstudio#install-rstudio-server-open-source-edition
I've installed the necessary init script ...
Hi @meystingray, The error message you're encountering is indicating that the init script path is not absolute. According to the Databricks documentation, init scripts should be stored as workspace files.
Here's how you can do it.
1. Store your ini...
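As a hedged illustration, an absolute workspace-file path in the cluster's init script configuration would look something like the line below (the path is hypothetical):

```
/Users/you@example.com/rstudio-init.sh
```

A relative path such as rstudio-init.sh would trigger the "not absolute" error described above.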
Hi, I need to process nearly 30 files from different locations and insert records to RDS. I am using multi-threading to process these files in parallel, like below:

def process_files(file_path):
    <process files here>

1. Find bad records based on fie...
Hi @Policepatil,
- The approach of processing files in parallel can increase the overall speed of the operation.
- Multi-threading can optimize CPU usage but does not necessarily make I/O operations faster.
- I/O operations like reading and writing files are...
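A minimal sketch of the multi-threaded pattern under discussion, assuming process_files is I/O-bound (the file paths and worker count are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_files(file_path):
    # <read the file, split good/bad records, write good ones to RDS>
    return file_path

paths = [f"/mnt/source/file_{i}.csv" for i in range(30)]

# Threads help here because the work is dominated by I/O waits;
# for CPU-bound parsing, processes (or Spark itself) scale better.
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = {pool.submit(process_files, p): p for p in paths}
    for fut in as_completed(futures):
        print("done:", fut.result())
```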
Scenario: data moves from blob storage to a SQL DB once a week. I have 15 days of data (from the current date to the next 15 days) in blob storage, stored date-wise in Parquet format, and after seven days the next 15 days of data will be inserted. Meaning, till the 7th day t...
Hi @bachan, Based on your scenario, you might consider using Azure Data Factory (ADF) for your data pipeline. Azure Data Factory is a cloud-based data integration service that orchestrates and automates the movement and transformation of data.
Here ...
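If the weekly load were to run as a Databricks job instead of (or inside) an ADF pipeline, a hedged sketch of reading the date-partitioned Parquet and appending to the SQL DB might look like this; the storage path, JDBC URL, and table name are all hypothetical:

```python
from datetime import date, timedelta
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# This week's seven date-wise folders (the layout is illustrative)
start = date.today()
paths = [
    f"wasbs://data@myaccount.blob.core.windows.net/daily/{start + timedelta(days=i):%Y-%m-%d}/"
    for i in range(7)
]

df = spark.read.parquet(*paths)

(df.write
   .format("jdbc")
   .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb")
   .option("dbtable", "dbo.weekly_data")
   .option("user", "<user>")
   .option("password", "<password>")
   .mode("append")
   .save())
```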