- 775 Views
- 1 reply
- 1 kudos
Save dataframe to the same variable
I would like to know if there is any difference between saving a DataFrame during transformation back to itself, as in the first code, or to a new DataFrame, as in the second example. Thanks. log_df = log_df.withColumn("process_timestamp", from_utc_timestamp(lit(current_timestamp()), "E...
Hi @alesventus, When saving a DataFrame after transformation, there is no difference between saving it to itself or a new DataFrame. Both approaches will result in the same output. Sources:- [Docs: dataframes-python](https://docs.databricks.com/getti...
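To make the comparison concrete, here is a minimal PySpark sketch of the two assignment styles; the sample data and the "Europe/Prague" timezone are assumptions standing in for the values truncated in the original post.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_timestamp, from_utc_timestamp

spark = SparkSession.builder.getOrCreate()
log_df = spark.createDataFrame([(1, "start"), (2, "stop")], ["id", "event"])  # stand-in data

# Style 1: reassign the transformed DataFrame to the same variable.
log_df = log_df.withColumn(
    "process_timestamp",
    from_utc_timestamp(current_timestamp(), "Europe/Prague"),  # timezone is an assumed example
)

# Style 2: bind the result to a new variable instead.
log_df_with_ts = log_df.withColumn(
    "process_timestamp",
    from_utc_timestamp(current_timestamp(), "Europe/Prague"),
)

# DataFrames are immutable: withColumn returns a new DataFrame either way, so the
# only difference is which name ends up pointing at the result.
```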
- 1557 Views
- 1 reply
- 0 kudos
Iceberg
Hi fellas, I am working on Databricks using Iceberg. At first I configured my notebook as below: spark.conf.set("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkCatalog") spark.conf.set("spark.sql.catalog.spark_catalog.type", "hadoop") s...
Hi @Mohsen, • The exception "RuntimeMetaException: Failed to connect to Hive Metastore" occurs because the Hive metastore cannot find the version information. • To resolve the issue, follow the steps below: - Set up a cluster with spark.sql.hive.m...
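For reference, a minimal sketch of the session-level catalog configuration quoted in the question, with an assumed warehouse location added where the original snippet is truncated; the Iceberg Spark runtime library still needs to be installed on the cluster for these settings to take effect.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already defined in a Databricks notebook

# Catalog settings from the question; the warehouse path below is an assumption.
spark.conf.set("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkCatalog")
spark.conf.set("spark.sql.catalog.spark_catalog.type", "hadoop")
spark.conf.set("spark.sql.catalog.spark_catalog.warehouse", "/tmp/iceberg/warehouse")
```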
- 1513 Views
- 1 reply
- 0 kudos
How to configure an EC2 instance connection in Databricks
I would like to know how to configure an AWS instance connection. A VPC and an EC2 instance were configured, and the IP is allowed to ping the on-premises server. How would it be possible to configure Databricks so that it can make this connection?
Hi @Kaviana, To configure an AWS instance connection in Databricks, you need to follow these steps: 1. Create an access policy and a user with access keys in the AWS Console: - Go to the IAM service. - Click the Policies tab in the sidebar. - Clic...
- 1663 Views
- 1 reply
- 0 kudos
Changing StreamingWrite API in DBR 13.1 may lead to incompatibility with Spark 3.4
I'm using the StarRocks Connector[2] to ingest data to StarRocks on Databricks 13.1 (powered by Spark 3.4.0). The connector runs on community Spark 3.4 but fails on the DBR. The reason is (the full stack trace is attached): java.lang.IncompatibleClass...
Hi @lpf, Based on the information provided, there seems to be a compatibility issue between the StarRocks Connector and Databricks Runtime 13.1 (powered by Spark 3.4.0). The problem arises because the StarRocksWrite class implements both the BatchWri...
- 1301 Views
- 2 replies
- 1 kudos
Resolved! Thread leakage when getConnection fails
Hi, we are using the Databricks JDBC driver https://mvnrepository.com/artifact/com.databricks/databricks-jdbc/2.6.33 and it seems like there is a thread leakage when getConnection fails. Could anyone advise? It can be reproduced with @Test void databricksThreads() {...
Hi, none of the above suggestions will work... We already contacted the Databricks JDBC team; the thread leakage was confirmed and was fixed in version 2.6.34: https://mvnrepository.com/artifact/com.databricks/databricks-jdbc/2.6.34. This leakage still exists if...
- 835 Views
- 1 reply
- 0 kudos
Missing records while using limit in multithreading
Hi, I need to process nearly 30 files from different locations and insert records into RDS. I am using multi-threading to process these files in parallel, like below. Test data: I have a configuration like below based on column 4: If column 4 = 0:...
Hi @Policepatil, Based on the given information, it seems that the issue occurs when filtering the records based on the record type. The missing documents are inconsistent and can arise from different files or even within the same file but with other...
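To illustrate one way rows can appear to go missing when limit() is involved, here is a hedged PySpark sketch; the file path and column names are assumptions, not details from the original post.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.csv("/path/to/one/file.csv")  # assumed source; columns arrive as _c0, _c1, ...

# limit() without an ordering returns an arbitrary subset of rows, so repeated or
# concurrent calls are not guaranteed to pick the rows you expect.
sample_unordered = df.filter(df["_c3"] == 0).limit(100)

# Sorting on a key first makes the selection deterministic.
sample_ordered = (
    df.filter(df["_c3"] == 0)
      .orderBy("_c0")  # assumed key column
      .limit(100)
)
```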
- 708 Views
- 1 reply
- 0 kudos
Suspension of Data Engineer Professional exam
Hi Databricks Team, I had scheduled my exam for 6th Sep 2023. During the exam, the same pop-up came up, stating that I was looking in some other direction. I told them that my laptop mouse was not working properly, so I was looking at it. But they still suspended ...
Hi @priyakant1, have you received any response from the team? For example, did they reschedule your exam?
- 1960 Views
- 3 replies
- 1 kudos
Resolved! My exam has been suspended, need help urgently (21/08/2023)
Hello Team, I had a very poor experience while attempting my 1st Databricks certification. The proctor abruptly asked me to show my desk; after I showed it, he/she asked multiple more times, wasted my time, and then suspended my exam. I want to file a complain...
Sub: My Databricks Data Engineer Associate exam got suspended, need immediate help please (10/09/2023). I had a very poor experience while attempting my Databricks Data Engineer certification. The proctor abruptly asked me to show my desk; after showin...
- 2342 Views
- 2 replies
- 1 kudos
Resolved! Records are missing while filtering the dataframe in multithreading
Hi, I need to process nearly 30 files from different locations and insert records into RDS. I am using multi-threading to process these files in parallel, like below. Test data: I have a configuration like below based on column 4: If colum...
It looks like you are comparing against strings like "1", not values like 1, in your filter condition. It's hard to say; some details are missing, such as the rest of the code, the DataFrame schema, and what output you are observing.
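A minimal sketch of the string-versus-integer comparison described above, assuming the record-type field (column 4) is read as the string column _c3; all names are illustrative.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()
df = spark.read.csv("/path/to/one/file.csv")  # assumed source; column 4 arrives as string _c3

# Mixing comparisons against "1" (string) and 1 (integer) relies on implicit casts;
# values that cannot be cast compare as NULL and are silently dropped by the filter.
# Casting explicitly makes the comparison unambiguous.
df_typed = df.withColumn("record_type", col("_c3").cast("int"))

type_0 = df_typed.filter(col("record_type") == 0)
type_1 = df_typed.filter(col("record_type") == 1)
```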
- 1213 Views
- 2 replies
- 0 kudos
Increase cores for Spark History Server
By default the SHS uses spark.history.fs.numReplayThreads = 25% of available cores (the number of threads that will be used by the history server to process event logs). How can we increase the number of cores for the Spark History Server?
Hi @VMeghraj, To increase the number of event-log replay threads used by the Spark History Server, you can modify the spark.history.fs.numReplayThreads configuration parameter and set it to the desired value...
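As a sketch, the property can be set in the configuration the History Server reads at startup, for example in spark-defaults.conf; the value below is only an example (the default is 25% of available cores).

```
# spark-defaults.conf read by the Spark History Server
spark.history.fs.numReplayThreads    16
```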
- 1200 Views
- 1 reply
- 0 kudos
Databricks RStudio Init Script Deprecated
OK, so I'm trying to use open-source RStudio on Azure Databricks. I'm following the instructions here: https://learn.microsoft.com/en-us/azure/databricks/sparkr/rstudio#install-rstudio-server-open-source-edition. I've installed the necessary init script ...
Hi @meystingray, The error message you're encountering indicates that the init script path is not absolute. According to the Databricks documentation, init scripts should be stored as workspace files. Here's how you can do it: 1. Store your ini...
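For illustration, a hedged sketch of how a cluster can reference a workspace-file init script in its Clusters API payload; the script path below is hypothetical.

```python
# Fragment of a clusters/create or clusters/edit request body; the destination
# must be an absolute workspace path, and this one is made up for illustration.
cluster_spec = {
    "init_scripts": [
        {"workspace": {"destination": "/Users/someone@example.com/install-rstudio.sh"}}
    ],
}
```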
- 5704 Views
- 1 reply
- 0 kudos
Is it good to process files in multithreading?
Hi, I need to process nearly 30 files from different locations and insert records into RDS. I am using multi-threading to process these files in parallel, like below: def process_files(file_path): <process files here> 1. Find bad records based on fie...
Hi @Policepatil, - Processing files in parallel can increase the overall speed of the operation. - Multi-threading can optimize CPU usage but does not necessarily make I/O operations faster. - I/O operations like reading and writing files are...
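For context, a minimal sketch of the threaded fan-out described in the question, using Python's ThreadPoolExecutor; the file list and the body of process_file are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_file(file_path: str) -> int:
    # Placeholder for the real work: read the file, validate records,
    # and insert the good ones into RDS. Returns a row count here.
    return 0

file_paths = [f"/mnt/source/file_{i}.csv" for i in range(30)]  # assumed locations

# Threads help mainly while the work is I/O-bound (reading files, JDBC inserts);
# CPU-bound parsing still contends for the driver's cores because of the GIL.
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = {pool.submit(process_file, path): path for path in file_paths}
    for future in as_completed(futures):
        print(futures[future], future.result())
```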
- 1339 Views
- 2 replies
- 0 kudos
Data Insertion
Scenario: data moves from blob storage to a SQL DB once a week. I have 15 days of data (from the current date to the next 15 days) in blob storage, stored date-wise in Parquet format, and after seven days the next 15 days of data will be inserted. This means that until the 7th day t...
Hi @bachan, Based on your scenario, you might consider using Azure Data Factory (ADF) for your data pipeline. Azure Data Factory is a cloud-based data integration service that orchestrates and automates the movement and transformation of data. Here ...
- 2894 Views
- 2 replies
- 0 kudos
Server error: OK - Notebook
Hi, I am currently seeing weird notebook behavior. Every time I write, I get the following error. My gut feeling is that it is caused by the auto-save feature. Cheers, Gil
Hi @Gilg, Based on the given information, it seems that the error you are experiencing is related to notebook autosaving. The error message "Failed to save revision: Notebook size exceeds limit" indicates that the notebook size is too large to be a...
- 4980 Views
- 1 reply
- 0 kudos
Databricks Terraform Cluster Issue.
Error: default auth: cannot configure default credentials. Config: token=***. Env: DATABRICKS_TOKEN. On cluster.tf line 27, in data "databricks_spark_version" "latest_lts": 27: data "databricks_spark_version" "latest_lts" {
Hi @Simon_T, Based on the given error message and the provided information, it seems that the default authentication credentials are not properly configured for Databricks. To resolve this issue, you need to set up authentication using a person...
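One common way to resolve this, sketched below, is to export the workspace URL and a personal access token through the environment variables the Databricks Terraform provider checks for default authentication; the values shown are placeholders.

```sh
# Placeholders only; use your own workspace URL and a valid personal access token.
export DATABRICKS_HOST="https://adb-1234567890123456.7.azuredatabricks.net"
export DATABRICKS_TOKEN="dapiXXXXXXXXXXXXXXXXXXXXXXXX"
terraform plan
```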