Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

JH
by New Contributor II
  • 3904 Views
  • 5 replies
  • 1 kudos

Does Thrift only exist in the Databricks control plane?

Hi all, I'm a user of Azure Databricks. We recently found a Thrift vulnerability (CVE-2020-13949) in Spark Hive. We have tried to fix it on our side. We also found an open issue on the Spark Jira board - https://issues.apache.org/j...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Jimin Hsieh, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so w...

4 More Replies
Kearon
by New Contributor III
  • 8103 Views
  • 11 replies
  • 0 kudos

Process batches in a streaming pipeline - identifying deletes

OK. So I think I'm probably missing the obvious and tying myself in knots here. Here is the scenario:
  • batch datasets arrive in JSON format in an Azure data lake
  • each batch is a complete set of "current" records (the complete table)
  • these are processed us...
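One way to read the scenario above: because each batch is a complete snapshot, a record that appears in the previous batch but not in the current one has been deleted. The following is a minimal plain-Python sketch of that comparison, not the approach taken in the thread; the `id` key column and the sample records are assumptions for illustration.

```python
def find_deleted_keys(previous_batch, current_batch, key="id"):
    """Return keys present in the previous full snapshot but missing from the current one."""
    previous_keys = {record[key] for record in previous_batch}
    current_keys = {record[key] for record in current_batch}
    return previous_keys - current_keys


# Hypothetical sample snapshots: each batch is the complete "current" table.
previous_batch = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
current_batch = [{"id": 2, "name": "b"}, {"id": 3, "name": "c2"}, {"id": 4, "name": "d"}]

deleted = find_deleted_keys(previous_batch, current_batch)
print(sorted(deleted))  # [1]
```

On Spark DataFrames the same comparison is usually expressed as a left anti join, for example `previous_df.join(current_df, on="id", how="left_anti")`, where both DataFrame names are placeholders.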

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Kearon McNicol, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answer...

10 More Replies
nyehia
by Contributor
  • 16047 Views
  • 19 replies
  • 1 kudos

Cannot access SQL files in the Shared workspace

Hey, we have an issue: we can access the SQL files whenever the notebook is in the repo path, but whenever the CI/CD pipeline imports the repo notebooks and SQL files to the shared workspace, we can list the SQL files but cannot read them. We cha...

Latest Reply
karthik_p
Esteemed Contributor
  • 1 kudos

@Nermin Yehia Yes, as you are moving the files to a different location manually, just grant the Can Manage permission on the target and that should take care of everything.

18 More Replies
kinsun
by New Contributor II
  • 2661 Views
  • 3 replies
  • 0 kudos

Resolved! Delta Live Table Service Upgrade

Dear experts, may I know what happens to a Delta Live Tables pipeline that is in a cancelled state when there is a runtime service upgrade? Thanks!

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@KS LAU: When a runtime service upgrade occurs in Databricks, any running tasks or pipelines may be temporarily interrupted while the upgrade is being applied. In the case of a cancelled Delta Live Table pipeline, it will not be impacted by the upgr...

2 More Replies
GuMart
by New Contributor III
  • 5669 Views
  • 4 replies
  • 2 kudos

Resolved! DLT target schema - get value during run time

Hi, I would like to know if it is possible to get the target schema programmatically inside a DLT pipeline, that is, the value set under Destination > Target schema in the DLT pipeline settings. I want to run more idempotent pipelines. For example, my target table has the fields: reference_da...

Latest Reply
GuMart
New Contributor III
  • 2 kudos

Thank you @Suteja Kanuri, looks like your solution is working. Regards,

3 More Replies
amitca71
by Contributor II
  • 3527 Views
  • 1 reply
  • 1 kudos

Resolved! sedona/shapely error Unknown WKB type 16

Hi, I stream data from PostGIS to S3 using Debezium: postgis->debezium->s3->spark(databricks). Once I read it, I decode it and can see that the binary representation is similar to what I have in PostGIS, in a WKB-formatted column. Once I try to read it ei...

ivanychev
by Contributor II
  • 6621 Views
  • 7 replies
  • 5 kudos

DBR 12.2: DeltaOptimizedWriter: Resolved attribute(s) missing from in operator

After upgrading from DBR 11.3 LTS to DBR 12.2 LTS we started to observe the following error during the "read from parquet and write to delta" piece of logic:
AnalysisException: Resolved attribute(s) group_id#72,display_name#73,parent_id#74,path#75,path_li...

Latest Reply
Valtor
New Contributor II
  • 5 kudos

I can confirm that this issue is resolved for us as well in the latest 12.2 release.

6 More Replies
playermanny2
by New Contributor II
  • 2409 Views
  • 2 replies
  • 1 kudos

Reading data in Azure Databricks Delta Lake from AWS Redshift

We have Databricks set up and running on Azure. Now we want to connect it with Redshift (AWS) to perform further downstream analysis for our Redshift users. I could find the documentation on how to do it within the same cloud (either AWS or Azure) but...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Manny Cato: To allow Redshift to read data from Delta Lake hosted on Azure, you can use AWS Glue Data Catalog as an intermediary. The Glue Data Catalog is a fully managed metadata catalog that integrates with a variety of data sources, including De...

1 More Replies
405041
by New Contributor II
  • 1934 Views
  • 2 replies
  • 0 kudos

Securing the Account Owner

Hey, as I understand, you cannot enable SSO and MFA for the Account Owner. Is there any way on the Databricks side to secure the Account Owner beyond username/password? Is there a lockout that is set up automatically for this user? What are the best pra...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Domonkos Rozsa: You are correct that Databricks does not support SSO and MFA for the Account Owner. However, there are several built-in mechanisms that can help secure the Account Owner account and protect it from unauthorized access:
Password polic...

1 More Replies
source2sea
by Contributor
  • 4668 Views
  • 1 reply
  • 0 kudos

Resolved! What is the deploy-mode when calling Spark in Databricks?

https://spark.apache.org/docs/latest/submitting-applications.html
I mainly want to know whether an extra class path can be used when I submit a job.

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@min shi: In Databricks, when you run a job, you are submitting a Spark application to run in the cluster. The deploy-mode that is used by default depends on the type of job you are running:
For interactive clusters, the deploy-mode is client. This m...

Hubert-Dudek
by Esteemed Contributor III
  • 1697 Views
  • 2 replies
  • 8 kudos

Databricks has added new metrics to its control panel, replacing the outdated Ganglia tool. These new metrics allow users to monitor the following clu...

Databricks has added new metrics to its control panel, replacing the outdated Ganglia tool. These new metrics allow users to monitor the following cluster performance metrics easily:
- CPU utilization
- Memory usage
- Free filesystem space
- Network traf...

Latest Reply
jose_gonzalez
Databricks Employee
  • 8 kudos

Thank you for sharing, @Hubert Dudek!

1 More Replies
Erik_L
by Contributor II
  • 9477 Views
  • 2 replies
  • 2 kudos

Joining a big amount of data causes "Out of disk space error", how to ingest?

What I am trying to do:
df = None
# For all of the IDs that are valid
for id in ids:
    # Get the parts of the data from different sources
    df_1 = spark.read.parquet(url_for_id)
    df_2 = spark.read.parquet(url_for_id)
    ...
    # Join together the pa...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Erik Louie: There are several strategies that you can use to handle large joins like this in Spark:
Use a broadcast join: If one of your dataframes is relatively small (less than 10-20 GB), you can use a broadcast join to avoid shuffling data. A bro...
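To illustrate why a broadcast join avoids a shuffle, here is a minimal plain-Python sketch of the mechanism, not Spark's implementation: the small side is turned into an in-memory lookup table once, and each partition of the large side probes it locally. The function name, the `id` key, and the sample rows are assumptions for illustration.

```python
def broadcast_hash_join(large_partitions, small_rows, key):
    """Inner-join every partition of the large side against one shared lookup table."""
    # Build the lookup once from the small side. In Spark, this is the
    # table that would be shipped ("broadcast") to every executor.
    lookup = {}
    for row in small_rows:
        lookup.setdefault(row[key], []).append(row)

    joined = []
    # Each partition is processed independently; no rows of the large
    # side need to move between partitions.
    for partition in large_partitions:
        for row in partition:
            for match in lookup.get(row[key], []):
                joined.append({**match, **row})
    return joined


# Hypothetical data: the large side is split into two partitions.
large_partitions = [
    [{"id": 1, "a": "x"}],
    [{"id": 2, "a": "y"}, {"id": 3, "a": "z"}],
]
small_rows = [{"id": 1, "b": 10}, {"id": 2, "b": 20}]

result = broadcast_hash_join(large_partitions, small_rows, "id")
print(len(result))  # 2
```

In PySpark the corresponding hint is `pyspark.sql.functions.broadcast`, as in `large_df.join(broadcast(small_df), "id")`, where the DataFrame names are placeholders. Whether it helps depends on the small side fitting comfortably in driver and executor memory; Spark only broadcasts automatically below `spark.sql.autoBroadcastJoinThreshold`, which defaults to 10 MB.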

1 More Replies
Khalil
by Contributor
  • 11470 Views
  • 4 replies
  • 4 kudos

Resolved! Pivot a DataFrame in Delta Live Table DLT

I want to apply a pivot on a DataFrame in DLT, but I'm getting the following warning:
Notebook:XXXX used `GroupedData.pivot` function that will be deprecated soon. Please fix the notebook.
I have the same warning if I use the function collect. Is it risk...

Latest Reply
Khalil
Contributor
  • 4 kudos

Thanks @Kaniz Fatma for your support. The solution was to do the pivot outside of views or tables and the warning disappeared.

3 More Replies
moski
by New Contributor II
  • 2417 Views
  • 3 replies
  • 1 kudos

How to import a data table from SQLQuery2 into Databricks notebook

Can anyone show me a few commands to import a table, say "mytable2", from Microsoft SQL Server into a Databricks notebook using a Spark DataFrame, or at least a pandas DataFrame? Cheers!

Latest Reply
irfanaziz
Contributor II
  • 1 kudos

You can read any table from MSSQL. You would need to authenticate to the db, so you would need the connection string:
def dbProps():
    return {
        "user" : "db-user",
        "password" : "your password",
        "driver" : "com.microsoft.sqlserver.jdbc.SQLServerD...
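As a hedged sketch of how the pieces in this reply might fit together, the helpers below build a SQL Server JDBC URL and connection properties and pass them to PySpark's `DataFrameReader.jdbc`. The helper names, host, database, and table are hypothetical; the driver class is the standard Microsoft JDBC driver class name, and `spark` is assumed to be the session a Databricks notebook already provides.

```python
def sqlserver_jdbc_url(host, database, port=1433):
    """Build a SQL Server JDBC URL (1433 is SQL Server's default port)."""
    return f"jdbc:sqlserver://{host}:{port};databaseName={database}"


def db_props(user, password):
    """Connection properties in the shape expected by DataFrameReader.jdbc."""
    return {
        "user": user,
        "password": password,
        # Standard class name of the Microsoft JDBC driver for SQL Server.
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }


def read_table(spark, url, table, props):
    """Read a whole table into a Spark DataFrame over JDBC."""
    return spark.read.jdbc(url=url, table=table, properties=props)


# Hypothetical values; replace with your own server and database.
url = sqlserver_jdbc_url("myserver.example.com", "mydb")
print(url)  # jdbc:sqlserver://myserver.example.com:1433;databaseName=mydb
```

In a notebook this could be used as `df = read_table(spark, url, "dbo.mytable2", db_props(user, password))`, followed by `df.toPandas()` if a pandas DataFrame is wanted for a small result. Rather than hard-coding the password, it is usually better to fetch it from a secret scope with `dbutils.secrets.get`.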

2 More Replies
Data_Analytics_
by New Contributor II
  • 10836 Views
  • 3 replies
  • 3 kudos

Resolved! Connect to SQL Server using Windows authentication

How do I connect to an on-premises SQL Server using Windows authentication from a Databricks notebook?

Latest Reply
User16829050420
Databricks Employee
  • 3 kudos

You need network connectivity from the Databricks VNet to the on-prem SQL Server. Then connect from the Databricks notebook over JDBC using the Windows-authenticated username/password - https://docs.microsoft.com/en-us/azure/databricks/data/data-sourc...

2 More Replies
