Data Engineering

Forum Posts

Sorted by:

by nyehia • Contributor

04-26-2023 7:35:36 AM

4354 Views
19 replies
1 kudos

Can not access SQL files in the Shared workspace

Hey,we have an issue in that we can access the SQL files whenever the notebook is in the repo path but whenever the CICD pipeline imports the repo notebooks and SQL files to the shared workspace, we can list the SQL files but can not read them.we cha...

Data Engineering

4354 Views
19 replies
1 kudos

04-26-2023 7:35:36 AM

View Replies

Latest Reply

karthik_p
Esteemed Contributor

04-27-2023 10:29:09 AM

1 kudos

@Nermin Yehia yes, as you are moving files to different location manually , just update as can manage permissions in target and that should take care of everything

1 kudos

04-27-2023 10:29:09 AM

18 More Replies

by kinsun • New Contributor II

04-16-2023 6:50:38 PM

906 Views
3 replies
0 kudos

Resolved! Delta Live Table Service Upgrade

Dear experts, Might I know what will happen to the delta live table pipeline which is in a cancelled state, when there is a runtime service upgrade? Thanks!

Data Engineering

906 Views
3 replies
0 kudos

04-16-2023 6:50:38 PM

View Replies

Latest Reply

Anonymous
Not applicable

04-20-2023 7:21:15 PM

0 kudos

@KS LAU :When a runtime service upgrade occurs in Databricks, any running tasks or pipelines may be temporarily interrupted while the upgrade is being applied. In the case of a cancelled Delta Live Table pipeline, it will not be impacted by the upgr...

0 kudos

04-20-2023 7:21:15 PM

2 More Replies

by GuMart • New Contributor III

04-18-2023 6:34:48 AM

2481 Views
5 replies
2 kudos

Resolved! DLT target schema - get value during run time

Hi,I would like to know if it is possible to get the target schema, programmatically, inside a DLT.In DLT pipeline settings, destination, target schema.I want to run more idempotent pipelines. For example, my target table has the fields: reference_da...

Data Engineering

2481 Views
5 replies
2 kudos

04-18-2023 6:34:48 AM

View Replies

Latest Reply

GuMart
New Contributor III

04-27-2023 10:10:59 AM

2 kudos

Thank you @Suteja Kanuri ,Looks like you solution is working, thank you.Regards,

2 kudos

04-27-2023 10:10:59 AM

4 More Replies

by amitca71 • Contributor II

04-23-2023 12:21:47 PM

1424 Views
2 replies
2 kudos

Resolved! sedona/shapely error Unknown WKB type 16

Hi,i stream data from postgis to s3 using debezium. postgis->debezium->s3->spark(databricks)once read it i decode it and i can see that the binary representation is similiar to what i have in postgis, on a wkb formated column.once i try to read it ei...

Data Engineering

1424 Views
2 replies
2 kudos

04-23-2023 12:21:47 PM

View Replies

Latest Reply

Kaniz
Community Manager

04-24-2023 12:08:22 PM

2 kudos

Hi @Amit Cahanovich, The error message "Unknown WKB type 16" indicates that the WKB data you are trying to read has a geometry type that the library does not recognize. WKB type 16 is not valid in the Simple Feature Access (SFA) standard, the most w...

2 kudos

04-24-2023 12:08:22 PM

1 More Replies

by davben93 • New Contributor II

04-25-2023 5:04:53 PM

1177 Views
3 replies
1 kudos

Does Spark Connect is available in JAVA?

Data Engineering

1177 Views
3 replies
1 kudos

04-25-2023 5:04:53 PM

View Replies

Latest Reply

shan_chandra
Honored Contributor III

04-27-2023 8:50:30 AM

1 kudos

@Kaniz Fatma - can you please assist with this?

1 kudos

04-27-2023 8:50:30 AM

2 More Replies

by ivanychev • Contributor

03-07-2023 11:23:40 AM

2932 Views
7 replies
5 kudos

DBR 12.2: DeltaOptimizedWriter: Resolved attribute(s) missing from in operator

After upgrading from DBR 11.3 LTS to DBR 12.2 LTS we started to observe the following error during "read from parquet and write to delta" piece of logic.AnalysisException: Resolved attribute(s) group_id#72,display_name#73,parent_id#74,path#75,path_li...

Data Engineering

2932 Views
7 replies
5 kudos

03-07-2023 11:23:40 AM

View Replies

Latest Reply

Valtor
New Contributor II

04-27-2023 8:08:16 AM

5 kudos

I can confirm that this issue is resolved for us as well in the latest 12.2 release.

5 kudos

04-27-2023 8:08:16 AM

6 More Replies

by playermanny2 • New Contributor II

04-25-2023 12:51:23 PM

923 Views
2 replies
1 kudos

Reading data in Azure Databricks Delta Lake from AWS Redshift

We have Databricks set up and running on Azure. Now we want to connect it with Redshift (AWS) to perform further downstream analysis for our redshift users.I could find the documentation on how to do it within the same cloud (Either AWS or Azure) but...

Data Engineering

923 Views
2 replies
1 kudos

04-25-2023 12:51:23 PM

View Replies

Latest Reply

Anonymous
Not applicable

04-26-2023 9:52:21 PM

1 kudos

@Manny Cato :To allow Redshift to read data from Delta Lake hosted on Azure, you can use AWS Glue Data Catalog as an intermediary. The Glue Data Catalog is a fully managed metadata catalog that integrates with a variety of data sources, including De...

1 kudos

04-26-2023 9:52:21 PM

1 More Replies

by 405041 • New Contributor II

04-26-2023 2:28:08 AM

868 Views
2 replies
0 kudos

Securing the Account Owner

Hey,As I understand, you cannot enable SSO and MFA for the Account Owner.Is there any way on the Databricks side to secure the Account Owner beyond username/password? Is there a lockout that is set up automatically for this user?What are the best pra...

Data Engineering

868 Views
2 replies
0 kudos

04-26-2023 2:28:08 AM

View Replies

Latest Reply

Anonymous
Not applicable

04-26-2023 9:50:25 PM

0 kudos

@Domonkos Rozsa :You are correct that Databricks does not support SSO and MFA for the Account Owner. However, there are several built-in mechanisms that can help secure the Account Owner account and protect it from unauthorized access:Password polic...

0 kudos

04-26-2023 9:50:25 PM

1 More Replies

by source2sea • Contributor

04-25-2023 7:34:12 AM

1863 Views
1 replies
0 kudos

Resolved! what mode is the deploy-mode when calling spark in databricks/

https://spark.apache.org/docs/latest/submitting-applications.htmlmainly want to know if extra class path could be used or not when i submit a job

Data Engineering

1863 Views
1 replies
0 kudos

04-25-2023 7:34:12 AM

View Replies

Latest Reply

Anonymous
Not applicable

04-26-2023 10:05:18 PM

0 kudos

@min shi :In Databricks, when you run a job, you are submitting a Spark application to run in the cluster. The deploy-mode that is used by default depends on the type of job you are running:For interactive clusters, the deploy-mode is client. This m...

0 kudos

04-26-2023 10:05:18 PM

by Hubert-Dudek • Esteemed Contributor III

04-13-2023 6:48:37 AM

579 Views
2 replies
8 kudos

Databricks has added new metrics to its control panel, replacing the outdated Ganglia tool. These new metrics allow users to monitor the following clu...

Databricks has added new metrics to its control panel, replacing the outdated Ganglia tool. These new metrics allow users to monitor the following cluster performance metrics easily:- CPU utilization- Memory usage- Free filesystem space- Network traf...

Data Engineering

579 Views
2 replies
8 kudos

04-13-2023 6:48:37 AM

View Replies

Latest Reply

jose_gonzalez
Moderator

04-26-2023 2:42:32 PM

8 kudos

Thank you for sharing @Hubert Dudek !!!

8 kudos

04-26-2023 2:42:32 PM

1 More Replies

by source2sea • Contributor

04-26-2023 4:29:24 AM

1677 Views
1 replies
0 kudos

Resolved! ERROR RetryingHMSHandler: NoSuchObjectException(message:There is no database named global_temp)

ERROR RetryingHMSHandler: NoSuchObjectException(message:There is no database named global_temp)should one create it in the work space manually via UI? and how?would it get overwritten if work space is created via terraform?I use 10.4 LTS runtime.

Data Engineering

1677 Views
1 replies
0 kudos

04-26-2023 4:29:24 AM

View Replies

Latest Reply

jose_gonzalez
Moderator

04-26-2023 2:23:09 PM

0 kudos

"global_temp" is a special database used for global temp tables that are shared across spark sessions. This error is harmless. You can ignore it.

0 kudos

04-26-2023 2:23:09 PM

by Erik_L • Contributor II

04-21-2023 10:46:09 AM

4611 Views
2 replies
2 kudos

Joining a big amount of data causes "Out of disk space error", how to ingest?

What I am trying to dodf = None # For all of the IDs that are valid for id in ids: # Get the parts of the data from different sources df_1 = spark.read.parquet(url_for_id) df_2 = spark.read.parquet(url_for_id) ... # Join together the pa...

Data Engineering

4611 Views
2 replies
2 kudos

04-21-2023 10:46:09 AM

View Replies

Latest Reply

Anonymous
Not applicable

04-25-2023 10:16:09 PM

2 kudos

@Erik Louie :There are several strategies that you can use to handle large joins like this in Spark:Use a broadcast join: If one of your dataframes is relatively small (less than 10-20 GB), you can use a broadcast join to avoid shuffling data. A bro...

2 kudos

04-25-2023 10:16:09 PM

1 More Replies

by Khalil • Contributor

04-19-2023 10:38:54 AM

3267 Views
6 replies
5 kudos

Resolved! Pivot a DataFrame in Delta Live Table DLT

I wanna apply a pivot on a dataframe in DLT but I'm having the following warningNotebook:XXXX used `GroupedData.pivot` function that will be deprecated soon. Please fix the notebook.I have the same warning if I use the the function collect.Is it risk...

Data Engineering

3267 Views
6 replies
5 kudos

04-19-2023 10:38:54 AM

View Replies

Latest Reply

Khalil
Contributor

04-26-2023 10:09:52 AM

5 kudos

Thanks @Kaniz Fatma for your support.The solution was to do the pivot outside of views or tables and the warning disappeared.

5 kudos

04-26-2023 10:09:52 AM

5 More Replies

by Tico23 • Contributor

02-28-2023 12:07:22 PM

8605 Views
12 replies
10 kudos

Connecting SQL Server (on-premise) to Databricks via jdbc:sqlserver

Is it possible to connect to SQL Server on-premise (Not Azure) from Databricks?I tried to ping my virtualbox VM (with Windows Server 2022) from within Databricks and the request timed out.%sh ping 122.138.0.14This is what my connection might look l...

Data Engineering

8605 Views
12 replies
10 kudos

02-28-2023 12:07:22 PM

View Replies

Latest Reply

DBXC
Contributor

04-26-2023 7:49:10 AM

10 kudos

You need to setup the VNet and wire up the connection between Databricks and on-prem via VPN or ExpressRoute

10 kudos

04-26-2023 7:49:10 AM

11 More Replies

by moski • New Contributor II

12-22-2022 6:18:54 AM

831 Views
3 replies
1 kudos

How to import a data table from SQLQuery2 into Databricks notebook

Can anyone show me a few commands to import a table, say "mytable2 From: Microsoft SQL Server Into: Databricks Notebook using spark dataframe or at least pandas dataframeCheers!

Data Engineering

831 Views
3 replies
1 kudos

12-22-2022 6:18:54 AM

View Replies

Latest Reply

irfanaziz
Contributor II

12-22-2022 1:28:19 PM

1 kudos

You can read any table from MSSQL. You would need to authenticate to the db, so your would need the connection string:def dbProps(): return { "user" : "db-user", "password" : "your password", "driver" : "com.microsoft.sqlserver.jdbc.SQLServerD...

1 kudos

12-22-2022 1:28:19 PM

2 More Replies

User

Count

1601

736

343

284

247

Databricks

Forum Posts

Can not access SQL files in the Shared workspace

Resolved! Delta Live Table Service Upgrade

Resolved! DLT target schema - get value during run time

Resolved! sedona/shapely error Unknown WKB type 16

Does Spark Connect is available in JAVA?

DBR 12.2: DeltaOptimizedWriter: Resolved attribute(s) missing from in operator

Reading data in Azure Databricks Delta Lake from AWS Redshift

Securing the Account Owner

Resolved! what mode is the deploy-mode when calling spark in databricks/

Databricks has added new metrics to its control panel, replacing the outdated Ganglia tool. These new metrics allow users to monitor the following clu...

Resolved! ERROR RetryingHMSHandler: NoSuchObjectException(message:There is no database named global_temp)

Joining a big amount of data causes "Out of disk space error", how to ingest?

Resolved! Pivot a DataFrame in Delta Live Table DLT

Connecting SQL Server (on-premise) to Databricks via jdbc:sqlserver

How to import a data table from SQLQuery2 into Databricks notebook

DELTA_EXCEED_CHAR_VARCHAR_LIMIT

Not able to set run_as service_principal_name

Pyspark operations slowness in CLuster 14.3LTS as ...

[Databricks Assets Bundles] Workflow trigger on fi...

Addressing Pipeline Error Handling in Databricks b...