Hey, we have an issue: we can access the SQL files whenever the notebook is in the repo path, but whenever the CI/CD pipeline imports the repo notebooks and SQL files to the shared workspace, we can list the SQL files but cannot read them. We cha...
@Nermin Yehia yes, since you are moving the files to a different location manually, just update the Can Manage permission on the target and that should take care of everything.
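If you want to automate that permission update instead of clicking through the UI, here is a minimal sketch using the Databricks Permissions REST API; the workspace URL, token, directory id, and group name are all placeholders/assumptions:

import requests

host = "https://<workspace-url>"      # placeholder
token = "<pat-token>"                 # placeholder
directory_id = 123456                 # placeholder: object id of the target folder

# Grant CAN_MANAGE on the imported directory to a (hypothetical) group
resp = requests.patch(
    f"{host}/api/2.0/permissions/directories/{directory_id}",
    headers={"Authorization": f"Bearer {token}"},
    json={"access_control_list": [
        {"group_name": "data-team", "permission_level": "CAN_MANAGE"}
    ]},
)
resp.raise_for_status()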
Dear experts, might I know what will happen to a Delta Live Tables pipeline that is in a cancelled state when there is a runtime service upgrade? Thanks!
@KS LAU: When a runtime service upgrade occurs in Databricks, any running tasks or pipelines may be temporarily interrupted while the upgrade is being applied. In the case of a cancelled Delta Live Tables pipeline, it will not be impacted by the upgr...
Hi, I would like to know if it is possible to get the target schema programmatically inside a DLT pipeline (the destination/target schema from the DLT pipeline settings). I want to run more idempotent pipelines. For example, my target table has the fields: reference_da...
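One workaround worth noting: DLT pipeline settings accept arbitrary configuration key-value pairs that the notebook can read back with spark.conf.get, so you can mirror the target schema there yourself. A minimal sketch, assuming a hypothetical config key my_pipeline.target_schema set in the pipeline settings:

# Assumes the pipeline settings define a configuration entry
# "my_pipeline.target_schema" (hypothetical key) mirroring the target schema
target_schema = spark.conf.get("my_pipeline.target_schema", "default")
print(f"writing into schema: {target_schema}")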
Hi, I stream data from PostGIS to S3 using Debezium: PostGIS -> Debezium -> S3 -> Spark (Databricks). Once I read the data I decode it, and I can see that the binary representation is similar to what I have in PostGIS in a WKB-formatted column. Once I try to read it ei...
Hi @Amit Cahanovich​, The error message "Unknown WKB type 16" indicates that the WKB data you are trying to read has a geometry type that the library does not recognize. WKB type 16 is not valid in the Simple Feature Access (SFA) standard, the most w...
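To sanity-check the bytes outside of Spark, here is a minimal sketch using Shapely, which also understands the EWKB variant that PostGIS emits; the base64 step and the field name geom_wkb are assumptions about the Debezium payload:

import base64
from shapely import wkb

# Debezium typically serializes geometry bytes as base64 in its JSON payload
raw = base64.b64decode(record["geom_wkb"])   # "geom_wkb" is a hypothetical field name
geom = wkb.loads(raw)                        # parses WKB (and PostGIS EWKB)
print(geom.geom_type, geom.wkt)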
After upgrading from DBR 11.3 LTS to DBR 12.2 LTS we started to observe the following error in a "read from parquet and write to delta" piece of logic: AnalysisException: Resolved attribute(s) group_id#72,display_name#73,parent_id#74,path#75,path_li...
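For what it's worth, this class of AnalysisException often appears when two dataframes derived from the same source are joined, and aliasing usually resolves it. A minimal sketch under that assumption (paths and column names are hypothetical):

from pyspark.sql import functions as F

src = spark.read.parquet("/path/to/parquet")   # hypothetical path
left = src.alias("l")
right = src.alias("r")

# Aliasing disambiguates the attribute ids the error message complains about
joined = left.join(right, F.col("l.parent_id") == F.col("r.group_id"))
joined.write.format("delta").mode("overwrite").save("/path/to/delta")   # hypothetical path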
We have Databricks set up and running on Azure. Now we want to connect it with Redshift (AWS) to perform further downstream analysis for our Redshift users. I could find documentation on how to do it within the same cloud (either AWS or Azure), but...
@Manny Cato: To allow Redshift to read data from Delta Lake hosted on Azure, you can use the AWS Glue Data Catalog as an intermediary. The Glue Data Catalog is a fully managed metadata catalog that integrates with a variety of data sources, including De...
Hey, as I understand it, you cannot enable SSO and MFA for the Account Owner. Is there any way on the Databricks side to secure the Account Owner beyond username/password? Is there a lockout that is set up automatically for this user? What are the best pra...
@Domonkos Rozsa: You are correct that Databricks does not support SSO and MFA for the Account Owner. However, there are several built-in mechanisms that can help secure the Account Owner account and protect it from unauthorized access:
- Password polic...
@min shi: In Databricks, when you run a job, you are submitting a Spark application to run on the cluster. The deploy-mode used by default depends on the type of job you are running:
- For interactive clusters, the deploy-mode is client. This m...
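If you want to confirm which mode a given application is actually running in, spark.submit.deployMode is a standard Spark property you can read back:

# Prints "client" or "cluster" depending on how the application was deployed
print(spark.conf.get("spark.submit.deployMode", "client"))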
Databricks has added new metrics to its control panel, replacing the outdated Ganglia tool. These new metrics allow users to easily monitor the following cluster performance metrics:
- CPU utilization
- Memory usage
- Free filesystem space
- Network traf...
ERROR RetryingHMSHandler: NoSuchObjectException(message:There is no database named global_temp)
Should one create it in the workspace manually via the UI? And how? Would it get overwritten if the workspace is created via Terraform? I use the 10.4 LTS runtime.
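For context, global_temp is the reserved in-memory database Spark uses for global temporary views; it is created on the fly and is not registered in the Hive metastore, which is why this HMS log line is generally harmless. A minimal sketch of how the database comes into play:

# global_temp exists only in memory; no manual creation in the metastore is needed
df = spark.range(5)
df.createOrReplaceGlobalTempView("my_view")            # hypothetical view name
spark.sql("SELECT * FROM global_temp.my_view").show()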
What I am trying to do:
df = None
# For all of the IDs that are valid
for id in ids:
    # Get the parts of the data from different sources
    df_1 = spark.read.parquet(url_for_id)
    df_2 = spark.read.parquet(url_for_id)
    ...
    # Join together the pa...
@Erik Louie: There are several strategies that you can use to handle large joins like this in Spark:
- Use a broadcast join: If one of your dataframes is relatively small (less than 10-20 GB), you can use a broadcast join to avoid shuffling data. A bro...
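As a quick illustration of the broadcast strategy (the paths and the join key are hypothetical):

from pyspark.sql.functions import broadcast

large_df = spark.read.parquet("/path/to/large")   # hypothetical path
small_df = spark.read.parquet("/path/to/small")   # hypothetical path

# broadcast() ships the small dataframe to every executor, avoiding a shuffle
joined = large_df.join(broadcast(small_df), on="id", how="inner")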
I want to apply a pivot on a dataframe in DLT but I'm getting the following warning:
Notebook: XXXX used `GroupedData.pivot` function that will be deprecated soon. Please fix the notebook.
I get the same warning if I use the function collect. Is it risk...
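One common way to avoid GroupedData.pivot is to pivot manually with conditional aggregation, which also keeps the output columns static; a minimal sketch, where the column names status and amount are hypothetical:

from pyspark.sql import functions as F

# Manual pivot: one conditional aggregate per expected category value
pivoted = (df.groupBy("id")
             .agg(F.sum(F.when(F.col("status") == "open", F.col("amount"))).alias("open_amount"),
                  F.sum(F.when(F.col("status") == "closed", F.col("amount"))).alias("closed_amount")))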
Is it possible to connect to an on-premise SQL Server (not Azure) from Databricks? I tried to ping my VirtualBox VM (with Windows Server 2022) from within Databricks and the request timed out.
%sh
ping 122.138.0.14
This is what my connection might look l...
Can anyone show me a few commands to import a table, say "mytable2", from Microsoft SQL Server into a Databricks notebook using a Spark dataframe, or at least a pandas dataframe? Cheers!
You can read any table from MSSQL. You would need to authenticate to the db, so you would need the connection string:
def dbProps():
    return {
        "user": "db-user",
        "password": "your password",
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerD...