Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

User16826994223
by Databricks Employee
  • 2640 Views
  • 2 replies
  • 1 kudos

Does Databricks have a data processing agreement?

Latest Reply
liam_noah
New Contributor II
  • 1 kudos

Yes, Databricks typically provides a Data Processing Agreement (DPA) to comply with data protection regulations like GDPR. It's important for businesses to thoroughly review these agreements to ensure alignment with their data privacy policies. You c...

1 More Replies
hadoan
by New Contributor II
  • 2098 Views
  • 3 replies
  • 1 kudos

How to define DLT table with cyclic reference

@dlt.table
def table_A():
    return dlt.read_stream(...)

@dlt.table
def table_join_A_and_C():
    df_A = dlt.read_stream(table_A)
    df_C = dlt.read_stream(table_C)
    return df_A.join(df_C)

@dlt.table
def table_C():
    return ( ...

Latest Reply
dilipdiwakar
New Contributor II
  • 1 kudos

Could you please describe the best approach here? Thanks.

2 More Replies
Dejian
by New Contributor II
  • 1805 Views
  • 3 replies
  • 0 kudos

DLT Append Flow Parameterization

Hi all, I'm currently using a DLT append flow to merge multiple streaming flows into one output. While trying to turn the append flow into a dynamic function for scalability, the append flow seems to hit some errors. stat_table = f"{catalog}.{bronze_s...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

The error you're encountering occurs because Delta Live Tables (DLT) append flows currently do not support streaming aggregations or other transformations on streaming DataFrames unless a watermark is applied properly to handle late data. Based on yo...

2 More Replies
AntonDBUser
by New Contributor III
  • 4922 Views
  • 1 reply
  • 0 kudos

Oracle Lakehouse Federation with CA Certificate

Hi! We have been pulling data from Oracle to Databricks by installing the Oracle driver and certificates directly on the cluster. We are now looking into using Lakehouse Federation for Oracle instead, but it seems like the connection doesn't pick up the c...

Latest Reply
Isi
Honored Contributor III
  • 0 kudos

Hi @AntonDBUser, thanks for sharing your experience; we're looking into using Lakehouse Federation with Oracle too. I haven't tested this myself, but one idea that came to mind is whether switching from a serverless cluster to a standard (Pro) cluster...

mridultuteja
by New Contributor II
  • 4051 Views
  • 6 replies
  • 1 kudos

external table not being written to data lake

I was following a tutorial to learn Databricks from https://youtu.be/7pee6_Sq3VY (great video, btw). I am stuck at 2:52:24. I am trying to create an external table directly in the data lake, but I am facing a weird issue saying no such location exists. I h...

[screenshots attached]
Latest Reply
Isi
Honored Contributor III
  • 1 kudos

Hey @mridultuteja, to register an external location you have to first create a Storage Credential, and then create the External Location. This process allows Databricks to securely access data stored in Azure Data Lake Storage Gen2 (ADLS Gen2), while ...

5 More Replies
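For reference, the two steps the reply describes can be expressed in Databricks SQL. This is a sketch: the credential, container, account, catalog, and table names below are all hypothetical, and it assumes a storage credential was already created (for example via Catalog Explorer, backed by an Azure access connector).

```sql
-- Step 1: register the external location, using an existing storage credential.
CREATE EXTERNAL LOCATION IF NOT EXISTS my_lake_loc
  URL 'abfss://mycontainer@mystorageacct.dfs.core.windows.net/landing'
  WITH (STORAGE CREDENTIAL my_adls_cred);

-- Step 2: once the location exists, an external table can point inside it.
CREATE TABLE my_catalog.my_schema.sales_ext (id INT, amount DOUBLE)
  LOCATION 'abfss://mycontainer@mystorageacct.dfs.core.windows.net/landing/sales';
```

Without the external location (or with a path outside any registered location), the "no such location exists" style of error is expected.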
SeekingSolution
by New Contributor II
  • 661 Views
  • 1 reply
  • 0 kudos

Unity Catalog Enablement

Hello, after scouring documentation yesterday I was finally able to get Unity Catalog enabled and assigned to my workspace. Or so I thought. When I run the current_metastore() command I get the error shown below. However, when I look at my catalog I can see...

[screenshots attached]
Latest Reply
Nivethan
New Contributor III
  • 0 kudos

Hi, please check that the cluster you are using to run the query has also been upgraded to Unity Catalog. Also, follow the best practices outlined here for enablement: https://docs.databricks.com/aws/en/data-governance/unity-catalog/enable-workspaces Best Rega...

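As a quick sanity check, both statements below should succeed on a cluster or SQL warehouse with Unity Catalog access enabled (note the underscore in the function name):

```sql
-- Returns the metastore ID if the compute is UC-enabled
SELECT current_metastore();

-- Lists the catalogs the metastore exposes to this workspace
SHOW CATALOGS;
```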
vaibhavaher2025
by New Contributor
  • 811 Views
  • 1 reply
  • 0 kudos

How to get response from API call made via executor

Hi guys, I'm trying to call multiple APIs from executors using foreachPartition. However, as the API response is returned at the executor level, I'm unable to see whether the response is 200 or 500. I don't want my APIs to execute on the driver, so I'm ...

Latest Reply
sarahbhord
Databricks Employee
  • 0 kudos

Vaibhavaher2025 - I recommend trying the following:
1. Write logs from executors to persistent storage inside process_partition.
2. Use mapPartitions instead of foreachPartition to return responses to the driver as a DataFrame.
3. Check executor log...

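A minimal sketch of the mapPartitions pattern from the reply. The HTTP call is stubbed out (call_api is a hypothetical stand-in for a real request made on the executor); on a cluster, process_partition would be passed to df.rdd.mapPartitions(...) so the statuses flow back to the driver.

```python
def call_api(record):
    # Stand-in for a real HTTP call; a real version would return resp.status_code.
    return 200 if record.get("payload") else 500

def process_partition(rows):
    # Runs once per partition on an executor; yields one result row per
    # input row instead of discarding the response.
    for row in rows:
        yield {"id": row["id"], "status": call_api(row)}

# On Databricks this would look like:
#   result_df = df.rdd.mapPartitions(process_partition).toDF()
#   result_df.filter("status != 200").show()
# Locally, the same function works on any iterator of dicts:
records = [{"id": 1, "payload": "a"}, {"id": 2, "payload": ""}]
statuses = list(process_partition(iter(records)))
print(statuses)  # [{'id': 1, 'status': 200}, {'id': 2, 'status': 500}]
```

Because mapPartitions returns an iterator per partition, failed calls can be filtered or retried from the driver without ever running the requests there.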
anmol-aidora
by New Contributor III
  • 3636 Views
  • 6 replies
  • 0 kudos

Resolved! Serverless: ERROR: Could not install packages due to an OSError: [Errno 13] Permission denied

Hello guys! I am getting this error when running a job: ERROR: Could not install packages due to an OSError: [Errno 13] Permission denied: '/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/some-python-package'. I have lis...

Latest Reply
anmol-aidora
New Contributor III
  • 0 kudos

Thanks for clarifying, Isi - really appreciate it.

5 More Replies
soumiknow
by Contributor II
  • 9741 Views
  • 22 replies
  • 1 kudos

Resolved! BQ partition data deleted fully even though 'spark.sql.sources.partitionOverwriteMode' is DYNAMIC

We have a date (DD/MM/YYYY) partitioned BQ table. We want to update a specific partition's data in 'overwrite' mode using PySpark. To do this, I set 'spark.sql.sources.partitionOverwriteMode' to 'DYNAMIC' as per the Spark BQ connector documentat...

Latest Reply
VZLA
Databricks Employee
  • 1 kudos

@soumiknow, just checking in: are there any further questions, and did my last comment help?

21 More Replies
M_S
by New Contributor II
  • 1471 Views
  • 2 replies
  • 2 kudos

Dataframe is getting empty during execution of daily job with random pattern

Hello, I have a daily ETL job that adds new records to a table for the previous day. However, from time to time it does not produce any output. After investigating, I discovered that one table is sometimes loaded as empty during execution. As a resul...

[screenshot attached]
Latest Reply
M_S
New Contributor II
  • 2 kudos

Thank you very much, @Louis_Frolio, for such a detailed and insightful answer! All tables used in this processing are managed Delta tables loaded through Unity Catalog. I will try running it with spark.databricks.io.cache.enabled set to false just to ...

1 More Replies
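The diagnostic step mentioned in the reply can be done at session level; this only rules the disk cache in or out as the culprit, it is not a permanent fix:

```sql
-- Session-level toggle (SQL); the PySpark equivalent is
-- spark.conf.set("spark.databricks.io.cache.enabled", "false")
SET spark.databricks.io.cache.enabled = false;
```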
5UDO
by New Contributor II
  • 2792 Views
  • 6 replies
  • 4 kudos

Databricks warehouse table optimization

Hi everyone, I just started using Databricks and wanted to evaluate read speeds when using a Databricks warehouse. So I generated a dataset of 100M records containing name, surname, date of birth, phone number, and address. Dat...

Latest Reply
5UDO
New Contributor II
  • 4 kudos

Hi Brahmareddy and AndrewN, thank you for your answers. I first need to apologize, as I accidentally wrote that I got 270 ms by hashing the date of birth, surname, and name and then using Z-ordering. I actually achieved around 290 ms with hashing...

5 More Replies
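For readers following along, the layout discussed in this thread is produced with a standard Delta OPTIMIZE statement; the table name is hypothetical, while the columns come from the thread's description:

```sql
OPTIMIZE people_100m
ZORDER BY (date_of_birth, surname, name);
```

Z-ordering co-locates rows with similar values in these columns in the same files, so selective filters on them can skip more files during reads.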
jtjohnson
by New Contributor II
  • 1654 Views
  • 4 replies
  • 0 kudos

API Definition File

Hello. We are in the process of setting up Azure APIM in front of the Databricks REST APIs. Is there an official definition file available for download? Any help would be greatly appreciated.

Latest Reply
jtjohnson
New Contributor II
  • 0 kudos

Thank you for the feedback. The Postman collection would be ideal, but the link is no longer active.

3 More Replies
harika5991
by New Contributor II
  • 1530 Views
  • 1 reply
  • 0 kudos

Unable to create a metastore for Unity Catalog as I don't have Account Admin rights

Hello guys, I just started learning Databricks. I created a Databricks workspace via the Azure Portal using the Trial (Premium - 14-Days Free DBUs) plan. The workspace name is `easewithdata-adb`. However, I do not currently see the option to create a Un...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 0 kudos

Hi @harika5991, you're right about the root cause of your issue: creating a Unity Catalog metastore requires Account Admin privileges, which is separate from just creating a workspace in Azure. These are options you can try: when you create a Databricks...

Louis_Frolio
by Databricks Employee
  • 7075 Views
  • 4 replies
  • 4 kudos

Resolved! What are your most impactful use cases for schema evolution in Databricks?

  Data Engineers, Share Your Experiences with Delta Lake Schema Evolution! We're calling on all data engineers to share their experiences with the powerful schema evolution feature in Delta Lake. This feature allows for seamless adaptation to changin...

Latest Reply
Louis_Frolio
Databricks Employee
  • 4 kudos

Outstanding!

3 More Replies
flashmav
by New Contributor II
  • 1037 Views
  • 1 reply
  • 0 kudos

Resolved! ConcurrentDeleteDeleteException in liquid cluster table

I am doing a merge into a table in parallel via 2 jobs. The table is a liquid clustered table with the following properties:
delta.enableChangeDataFeed=true
delta.enableDeletionVectors=true
delta.enableRowTracking=true
delta.feature.changeDataFeed=supported...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Hey @flashmav ,  keep in mind that operations in Delta Lake often occur at the file level rather than the row level. For example, if two sessions attempt to update data in the same file (even if they’re not updating the same row), you may encounter a...

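One common mitigation, following the file-level explanation above, is to make the two jobs' MERGE predicates disjoint on the clustering key so they never rewrite the same files. A sketch with hypothetical table and column names:

```sql
-- Job 1 only touches region = 'EU'; job 2 would use region = 'US'.
MERGE INTO events t
USING updates u
  ON t.event_id = u.event_id AND t.region = 'EU'
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```

Where predicates cannot be made disjoint, the usual fallback is to catch the concurrent-modification exception and retry the merge with backoff.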