Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

William_Scardua
by Valued Contributor
  • 2669 Views
  • 1 reply
  • 3 kudos

How to use Pylint to check your PySpark code quality?

Hi guys, I would like to use Pylint to check my PySpark scripts. Do you do that? Thank you!

Latest Reply
developer_lumo
New Contributor II
  • 3 kudos

Currently I am working in Databricks notebooks and have the same issue: I am unable to find a linter that is well integrated with Python, PySpark, and Databricks notebooks. 
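A minimal sketch of one way to run Pylint against a PySpark script, assuming pylint and pyspark are pip-installed; the file name etl_job.py and the chosen options are illustrative, not a Databricks-specific integration:

```python
# Run Pylint programmatically (works in a notebook cell or a plain script).
from pylint.lint import Run

Run(
    [
        "etl_job.py",                               # hypothetical script to lint
        "--ignored-modules=pyspark.sql.functions",  # avoid false-positive no-member errors
        "--disable=missing-module-docstring",       # notebooks rarely have module docstrings
    ],
    exit=False,  # don't sys.exit(), so the notebook cell keeps running
)
```

The --ignored-modules option is the usual workaround for Pylint tripping over PySpark's dynamically generated functions.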

ashraf1395
by Valued Contributor III
  • 373 Views
  • 1 reply
  • 0 kudos

Resolved! Creating notebooks that work both as normal Databricks jobs and as DLT pipelines

We are working on automation of our Databricks ingestion. We want to make our Python scripts or notebooks such that they work in both Databricks jobs and DLT pipelines. When I say Databricks jobs, I mean a normal run without a DLT pipeline. How shall we wo...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hello @ashraf1395, To address your goal of creating Python scripts or notebooks that work both in Databricks Jobs and Delta Live Tables (DLT) pipelines, here are some ideas: Unified Script Approach: Table Creation: As you mentioned, DLT supports two t...
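A minimal sketch of the unified-script idea, assuming the code can branch on whether the dlt module is importable (table and function names are hypothetical; on some newer runtimes dlt may import outside a pipeline, in which case a job parameter is a safer switch):

```python
# Detect the execution context: classic runtimes only expose `dlt`
# inside a DLT pipeline, so the import fails in a plain job.
try:
    import dlt
    IS_DLT = True
except ImportError:
    IS_DLT = False

def read_source():
    return spark.read.table("raw_events")  # hypothetical source table

if IS_DLT:
    @dlt.table(name="bronze_events")
    def bronze_events():
        return read_source()
else:
    # Plain Databricks job: materialize the table directly.
    read_source().write.mode("overwrite").saveAsTable("bronze_events")
```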

Dharinip
by New Contributor III
  • 620 Views
  • 1 reply
  • 1 kudos

Resolved! Create a Delta table with PK and FK constraints for streaming source data

1. How to create a Delta table with PK and FK constraints for streaming source data? 2. When the streaming data in the silver layer gets updated, will the Delta table also be updated? My use case is: we have streaming data in the silver layer as SCD...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

1. You can use primary key and foreign key relationships on fields in Unity Catalog tables. Primary and foreign keys are informational only and are not enforced. Foreign keys must reference a primary key in another table. You can declare primary keys...
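A hedged sketch of declaring these informational constraints on Unity Catalog tables (catalog, schema, and column names are hypothetical; as noted above, the constraints are not enforced):

```python
# Primary key columns must be declared NOT NULL.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.silver.customers (
        customer_id BIGINT NOT NULL,
        name STRING,
        CONSTRAINT customers_pk PRIMARY KEY (customer_id)
    )
""")

# The foreign key must reference a declared primary key in another table.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.silver.orders (
        order_id BIGINT NOT NULL,
        customer_id BIGINT,
        CONSTRAINT orders_pk PRIMARY KEY (order_id),
        CONSTRAINT orders_customers_fk FOREIGN KEY (customer_id)
            REFERENCES main.silver.customers (customer_id)
    )
""")
```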

ashutosh0710
by New Contributor II
  • 1050 Views
  • 4 replies
  • 1 kudos

java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.trees.Origin.<init>

While trying to run spark.sql("CALL iceberg_catalog.system.expire_snapshots(table => 'iceberg_catalog.d11_stitch.rewards_third_party_impact_base_query', older_than => TIMESTAMP '2024-03-06 00:00:00.000')") I'm getting Py4JJavaError: An erro...

Latest Reply
VZLA
Databricks Employee
  • 1 kudos

@ashutosh0710 you can inspect the libraries available on a cluster under /databricks/jars to understand which jars and versions are available, or simply inspect the Spark UI Environment classpath list.
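A small sketch of that jar inspection, run from a notebook on the cluster's driver (the "iceberg" filter string is just an example):

```python
import os

# List cluster jars and surface anything Iceberg-related with its version.
for jar in sorted(os.listdir("/databricks/jars")):
    if "iceberg" in jar.lower():
        print(jar)
```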

3 More Replies
Reply_Domenico
by New Contributor II
  • 537 Views
  • 1 reply
  • 0 kudos

Issue with Updating Dashboards During Assessment Using UCX

Hello everyone, I am using UCX for the migration to Unity, and I've noticed that re-running the assessment does not update the dashboards with jobs that are incompatible with Unity. To get the dashboards updated, I had to uninstall and reinstall UCX, ...

Latest Reply
ckunal_meta
New Contributor II
  • 0 kudos

Hi, I am having a similar issue as Domenico. I have installed UCX in my workspace and ran the UCX assessment job three days ago. It created crawl_table.log, which showed that UCX had scanned through all of my legacy hive_metastore schemas and listed the ta...

DBUser2
by New Contributor III
  • 348 Views
  • 2 replies
  • 0 kudos

COPY INTO size limit

Hi, I'm using the COPY INTO command to ingest data into a Delta table in my Azure Databricks instance. Sometimes I get a timeout error running this command. Is there a limit on the size of the data that can be ingested using "COPY INTO", or a limit on the ...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hello @DBUser2, The COPY INTO command does not have a specific documented limit on the size of the data or the number of files that can be ingested at a time. Timeout errors can occur due to network issues, resource limitations, or long-running opera...
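One hedged workaround for long-running loads is to split the ingestion into smaller COPY INTO runs using a glob PATTERN; since COPY INTO skips files it has already loaded, repeated runs stay idempotent (the table, path, and pattern below are hypothetical):

```python
spark.sql("""
    COPY INTO main.bronze.events
    FROM 'abfss://landing@myaccount.dfs.core.windows.net/events/'
    FILEFORMAT = PARQUET
    PATTERN = '2024/12/*.parquet'  -- load one month at a time
""")
```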

1 More Reply
Dp15
by Contributor
  • 638 Views
  • 4 replies
  • 0 kudos

Executing Python code inside a SQL Function

Hi, I am trying to create a SQL UDF in which I run some Python code involving PySpark. I am not able to create a Spark session inside the Python section of the function. Here is how my code looks: CREATE OR REPLACE FUNCTION test.getValuesFro...

Latest Reply
Dp15
Contributor
  • 0 kudos

Actually this would work if I were using it in a native notebook environment; however, I am trying to create a UDF because I want these queries to be executed from an external JDBC connection, and I don't wish to wait for the cluster to spin up for a noteboo...
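For context, a hedged sketch of a Unity Catalog Python UDF: the Python body runs in a sandbox without a SparkSession, so it can only do plain Python computation; querying other tables from inside the body would need a SQL UDF or a different design (function name and logic below are hypothetical):

```python
spark.sql("""
    CREATE OR REPLACE FUNCTION test.clean_code(code STRING)
    RETURNS STRING
    LANGUAGE PYTHON
    AS $$
        # Plain Python only: no Spark session is available in the sandbox.
        return code.strip().upper() if code else None
    $$
""")

# Once defined, the function is callable from any SQL client over JDBC:
spark.sql("SELECT test.clean_code('  ab12 ') AS cleaned").show()
```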

3 More Replies
skumarrm
by New Contributor II
  • 525 Views
  • 2 replies
  • 0 kudos

DLT PipelineID/PipelineName values from TASK1 should get passed to the TASK2 notebook (non-DLT)

DLT PipelineID/PipelineName values from TASK1 should get passed to the TASK2 notebook (non-DLT). TASK1 (DLT) ---> TASK2 (non-DLT). How do I pass the parameters to TASK2 from TASK1? I need to get the DLT task notebook pipelineID and pipelineName and pass them to TASK2...

Data Engineering
dlt
DLT parameter
Latest Reply
MuthuLakshmi
Databricks Employee
  • 0 kudos

@skumarrm Please try the below:
  • Set Up Task Parameters: In the job configuration, you can set up task parameters to pass values from one task to another. For TASK1 (DLT), ensure it outputs the PipelineID or PipelineName.
  • Use Task Parameters in TASK2:...
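A hedged sketch of the receiving side in the TASK2 notebook, assuming the job configuration passes the values as task parameters (for example via a dynamic value reference such as {{tasks.TASK1.values.pipeline_id}}, if TASK1 can expose them as task values; the parameter names here are hypothetical):

```python
# TASK2 (non-DLT) notebook: read the parameters passed by the job.
pipeline_id = dbutils.widgets.get("pipeline_id")
pipeline_name = dbutils.widgets.get("pipeline_name")

print(f"Upstream DLT pipeline: {pipeline_name} ({pipeline_id})")
```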

1 More Reply
PKD28
by New Contributor II
  • 320 Views
  • 1 replies
  • 0 kudos

Databricks cluster issue

Jobs within the all-purpose DB cluster are failing with "The Spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically reattached." In the event log it says Event_type=DRIVER_NOT_RESPONDING & MESSAGE="Driver is up b...

Latest Reply
MuthuLakshmi
Databricks Employee
  • 0 kudos

@PKD28 The error indicates that the driver memory is not enough to handle the load. Please refer to this doc for more info on how to fix this: https://kb.databricks.com/en_US/jobs/driver-unavailable
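As an illustration of the usual driver-memory culprits (my examples, not from the KB article; table names are hypothetical):

```python
df = spark.read.table("main.bronze.events")

# Avoid materializing large results on the driver:
# rows = df.collect()        # can exhaust driver memory
# pdf = df.toPandas()        # same risk

df.limit(100).show()         # bounded sample for inspection
df.write.mode("overwrite").saveAsTable("main.bronze.events_copy")  # distributed write
```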

abduldjafar
by New Contributor
  • 469 Views
  • 1 reply
  • 0 kudos

Merge takes too long

Hi all, I performed a merge process on approximately 19 million rows using two i3.4xlarge workers. However, the process took around 20 minutes to complete. How can I further optimize this process? I have already implemented the OPTIMIZE command and us...

Latest Reply
MuthuLakshmi
Databricks Employee
  • 0 kudos

@abduldjafar Use this general doc to optimize your workload based on your job analysis: https://www.databricks.com/discover/pages/optimize-data-workloads-guide
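Beyond the guide, two common MERGE speed-ups are worth a look, sketched below with hypothetical table names and date column: a pruning predicate in the match condition so only recent files are rewritten, and deletion vectors so updates avoid full file rewrites:

```python
# Enable deletion vectors on the target (Delta feature; needs a recent DBR).
spark.sql("""
    ALTER TABLE main.gold.target
    SET TBLPROPERTIES ('delta.enableDeletionVectors' = 'true')
""")

spark.sql("""
    MERGE INTO main.gold.target AS t
    USING updates AS s
      ON t.id = s.id
     AND t.event_date >= current_date() - INTERVAL 7 DAYS  -- prunes untouched files
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```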

bhanuteja_1
by New Contributor II
  • 242 Views
  • 1 reply
  • 0 kudos

NoClassDefFoundError: scala/Product Caused by: ClassNotFoundException: scala.Product

NoClassDefFoundError: scala/Product Caused by: ClassNotFoundException: scala.Product at the pre-import step itself. Please suggest something.

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Hi @bhanuteja_1, scala.Product is a core class in the Scala standard library used for tuples, case classes, etc. There seems to be a classloading problem or, more likely, a jar conflict. Are you deploying a job using custom jars or uber jars, and having de...

bhanuteja_1
by New Contributor II
  • 406 Views
  • 1 reply
  • 0 kudos

NoClassDefFoundError: org/apache/spark/sql/SparkSession$

NoClassDefFoundError: org/apache/spark/sql/SparkSession$    at com.microsoft.nebula.common.ConfigProvider.<init>(configProvider.scala:17)    at $linef37a348949c145718a08f6b29642317b35.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Hi @bhanuteja_1, where are you running this from? Based on the short output it looks like a Databricks notebook, but it would be a weird error unless you have some classpath overrides or jar conflicts leading to this error; it is simply ...

Nathant93
by New Contributor III
  • 650 Views
  • 1 reply
  • 0 kudos

(java.util.concurrent.ExecutionException) Boxed Error

Has anyone ever come across the error above? I am trying to get two tables from Unity Catalog and join them; the join is fairly complex, as it is imitating a WHERE NOT EXISTS ... TOP 1 SQL query.

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Hi @Nathant93, does it come with a "Caused by" in the error stack trace? If there isn't any in the Spark logs, perhaps you can provide reproducer code leading to this exception. The stack trace, DBR version, and repro code would help. The (java.util.conc...
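As an aside, the NOT EXISTS / TOP 1 pattern can often be expressed as a left-anti join plus a window, which tends to be easier to debug than a correlated subquery; a hedged sketch with hypothetical tables and columns:

```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window

orders = spark.read.table("main.sales.orders")
shipped = spark.read.table("main.sales.shipments")

# NOT EXISTS part: keep orders with no matching shipment.
unshipped = orders.join(shipped, on="order_id", how="left_anti")

# TOP 1 part: newest remaining row per customer.
w = Window.partitionBy("customer_id").orderBy(F.col("order_ts").desc())
result = (
    unshipped.withColumn("rn", F.row_number().over(w))
             .filter("rn = 1")
             .drop("rn")
)
```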

qwerty1
by Contributor
  • 5803 Views
  • 7 replies
  • 19 kudos

Resolved! When will databricks runtime be released for Scala 2.13?

I see that Spark fully supports Scala 2.13. I wonder why there is no Databricks runtime with Scala 2.13 yet. Any plans on making this available? It would be super useful.

Latest Reply
guersam
New Contributor II
  • 19 kudos

I agree with @777. As Scala 3 is getting mature and there are more real use cases with Scala 3 on Spark now, support for Scala 2.13 will be valuable to users, including us. I think the recent upgrade of the Databricks runtime from JDK 8 to 17 was one of a ...

6 More Replies
sathya08
by New Contributor III
  • 1497 Views
  • 3 replies
  • 0 kudos

Databricks Python function achieving Parallelism

Hello everyone, I have a very basic question about Databricks Spark parallelism. I have a Python function within a for loop, so I believe this is running sequentially. The Databricks cluster is enabled with Photon and Spark (DBR) 15.x; does that mean the driver...
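For reference, Photon and the runtime version won't parallelize a plain Python for loop on the driver; one hedged way to overlap independent iterations is a driver-side thread pool, where each Spark action still runs distributed on the cluster (the function and table names below are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor

tables = ["t1", "t2", "t3"]  # hypothetical work items

def process_table(name: str) -> int:
    # The count() action is still executed by the cluster;
    # threads only overlap the driver-side submissions.
    return spark.read.table(name).count()

with ThreadPoolExecutor(max_workers=4) as pool:
    counts = list(pool.map(process_table, tables))

print(dict(zip(tables, counts)))
```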

Latest Reply
sathya08
New Contributor III
  • 0 kudos

Any help here? Thanks.

2 More Replies
