Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

ivanychev
by Contributor
  • 3494 Views
  • 7 replies
  • 5 kudos

DBR 12.2: DeltaOptimizedWriter: Resolved attribute(s) missing from in operator

After upgrading from DBR 11.3 LTS to DBR 12.2 LTS, we started to observe the following error in our "read from parquet and write to delta" logic: AnalysisException: Resolved attribute(s) group_id#72,display_name#73,parent_id#74,path#75,path_li...

Latest Reply
Valtor
New Contributor II

I can confirm that this issue is resolved for us as well in the latest 12.2 release.

playermanny2
by New Contributor II
  • 1140 Views
  • 2 replies
  • 1 kudos

Reading data in Azure Databricks Delta Lake from AWS Redshift

We have Databricks set up and running on Azure. Now we want to connect it to Redshift (AWS) to perform further downstream analysis for our Redshift users. I could find the documentation on how to do it within the same cloud (either AWS or Azure), but...

Latest Reply
Anonymous
Not applicable

@Manny Cato: To allow Redshift to read data from Delta Lake hosted on Azure, you can use AWS Glue Data Catalog as an intermediary. The Glue Data Catalog is a fully managed metadata catalog that integrates with a variety of data sources, including De...
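
A hedged sketch of the manifest-based side of this setup (the reply is truncated, so this illustrates a common approach rather than the poster's exact method; the storage path is hypothetical): Delta can emit a symlink-format manifest that external engines such as Redshift Spectrum read, with the Glue Data Catalog holding the external table's metadata.

# A minimal sketch, assuming a Delta table in ADLS; path is hypothetical.
from delta.tables import DeltaTable

delta_path = "abfss://lake@myaccount.dfs.core.windows.net/tables/events"  # hypothetical path

# Generate _symlink_format_manifest files listing the table's current Parquet data files;
# an external table defined over the manifest location can then be queried from Redshift Spectrum.
delta_table = DeltaTable.forPath(spark, delta_path)
delta_table.generate("symlink_format_manifest")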

405041
by New Contributor II
  • 1079 Views
  • 2 replies
  • 0 kudos

Securing the Account Owner

Hey, as I understand it, you cannot enable SSO and MFA for the Account Owner. Is there any way on the Databricks side to secure the Account Owner beyond username/password? Is there a lockout that is set up automatically for this user? What are the best pra...

Latest Reply
Anonymous
Not applicable

@Domonkos Rozsa: You are correct that Databricks does not support SSO and MFA for the Account Owner. However, there are several built-in mechanisms that can help secure the Account Owner account and protect it from unauthorized access: Password polic...

source2sea
by Contributor
  • 2356 Views
  • 1 reply
  • 0 kudos

Resolved! What is the deploy-mode when calling Spark in Databricks?

https://spark.apache.org/docs/latest/submitting-applications.html
Mainly, I want to know whether an extra class path can be used when I submit a job.

Latest Reply
Anonymous
Not applicable

@min shi: In Databricks, when you run a job, you are submitting a Spark application to run in the cluster. The deploy-mode that is used by default depends on the type of job you are running: for interactive clusters, the deploy-mode is client. This m...
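
A quick way to confirm this from a notebook (a sketch; both keys are standard Spark configuration properties):

print(spark.conf.get("spark.submit.deployMode"))           # e.g. "client"
print(spark.conf.get("spark.driver.extraClassPath", ""))   # the extra class path, if one is set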

Hubert-Dudek
by Esteemed Contributor III
  • 790 Views
  • 2 replies
  • 8 kudos

Databricks has added new metrics to its control panel, replacing the outdated Ganglia tool. These new metrics allow users to monitor the following clu...

Databricks has added new metrics to its control panel, replacing the outdated Ganglia tool. These new metrics allow users to monitor the following cluster performance metrics easily:
- CPU utilization
- Memory usage
- Free filesystem space
- Network traf...

Latest Reply
jose_gonzalez
Moderator

Thank you for sharing @Hubert Dudek!!!

Erik_L
by Contributor II
  • 5455 Views
  • 2 replies
  • 2 kudos

Joining a large amount of data causes an "Out of disk space" error; how to ingest?

What I am trying to do:
df = None
# For all of the IDs that are valid
for id in ids:
    # Get the parts of the data from different sources
    df_1 = spark.read.parquet(url_for_id)
    df_2 = spark.read.parquet(url_for_id)
    ...
    # Join together the pa...

Latest Reply
Anonymous
Not applicable

@Erik Louie: There are several strategies that you can use to handle large joins like this in Spark. Use a broadcast join: if one of your dataframes is relatively small (it must fit in memory on every executor; Spark caps broadcast tables at 8 GB, and the auto-broadcast threshold defaults to just 10 MB), you can use a broadcast join to avoid shuffling data. A bro...
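
A minimal sketch of the broadcast-join suggestion (the reply is truncated; paths and the join key below are hypothetical):

from pyspark.sql import functions as F

large_df = spark.read.parquet("/data/large")  # hypothetical paths
small_df = spark.read.parquet("/data/small")  # must comfortably fit in executor memory

# broadcast() ships the small side to every executor, so the large side is
# joined in place with no shuffle of its rows.
joined = large_df.join(F.broadcast(small_df), on="id", how="inner")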

Khalil
by Contributor
  • 4629 Views
  • 6 replies
  • 5 kudos

Resolved! Pivot a DataFrame in Delta Live Tables (DLT)

I want to apply a pivot on a dataframe in DLT, but I'm getting the following warning: "Notebook:XXXX used `GroupedData.pivot` function that will be deprecated soon. Please fix the notebook." I get the same warning if I use the collect function. Is it risk...

Latest Reply
Khalil
Contributor

Thanks @Kaniz Fatma for your support. The solution was to do the pivot outside of views or tables, and the warning disappeared.
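
A sketch of what "pivot outside of views or tables" can look like (an interpretation of the truncated reply; the source table and column names are hypothetical): the pivot runs eagerly at pipeline-definition time, and the DLT dataset just returns the result.

import dlt
from pyspark.sql import functions as F

source_df = spark.read.table("raw.metrics")  # hypothetical source table
pivoted_df = (
    source_df
    .groupBy("device_id")
    .pivot("metric_name")        # the pivot happens here, outside any DLT dataset
    .agg(F.avg("metric_value"))
)

@dlt.table
def metrics_wide():
    # the DLT table simply returns the already-pivoted dataframe
    return pivoted_df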

Tico23
by Contributor
  • 10023 Views
  • 12 replies
  • 10 kudos

Connecting SQL Server (on-premise) to Databricks via jdbc:sqlserver

Is it possible to connect to SQL Server on-premise (not Azure) from Databricks? I tried to ping my VirtualBox VM (with Windows Server 2022) from within Databricks and the request timed out.
%sh
ping 122.138.0.14
This is what my connection might look l...

Latest Reply
DBXC
Contributor

You need to set up the VNet and wire up the connection between Databricks and on-prem via VPN or ExpressRoute.
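
Once that network path exists, a standard JDBC read is enough; a hedged sketch with hypothetical host, database, table, and secret scope:

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://10.0.0.5:1433;databaseName=mydb;encrypt=true;trustServerCertificate=true")
    .option("dbtable", "dbo.mytable")
    .option("user", "sql_user")
    .option("password", dbutils.secrets.get("my-scope", "sql-password"))  # hypothetical secret scope
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .load()
)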

moski
by New Contributor II
  • 1087 Views
  • 3 replies
  • 1 kudos

How to import a data table from SQLQuery2 into Databricks notebook

Can anyone show me a few commands to import a table, say "mytable2", from Microsoft SQL Server into a Databricks notebook, using a Spark dataframe or at least a pandas dataframe? Cheers!

Latest Reply
irfanaziz
Contributor II

You can read any table from MSSQL. You would need to authenticate to the db, so you would need the connection string:
def dbProps():
    return {
        "user": "db-user",
        "password": "your password",
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerD...
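
The snippet above is cut off; a runnable completion of the same idea (hypothetical host, database, and table names):

def db_props():
    return {
        "user": "db-user",
        "password": "your password",  # better: dbutils.secrets.get(...)
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }

url = "jdbc:sqlserver://sqlhost:1433;databaseName=mydb"
df = spark.read.jdbc(url=url, table="dbo.mytable2", properties=db_props())
pdf = df.toPandas()  # only for small results; this collects to the driver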

Data_Analytics_
by New Contributor II
  • 6477 Views
  • 4 replies
  • 3 kudos

Resolved! Connect SQL server using windows authentication

How do I connect to an on-premise SQL Server using Windows authentication from a Databricks notebook?

Latest Reply
User16829050420
New Contributor III

We should have network connectivity set up from the Databricks VNet to the on-prem SQL Server. Then make the connection from the Databricks notebook over JDBC using a Windows-authenticated username/password - https://docs.microsoft.com/en-us/azure/databricks/data/data-sourc...
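
One hedged way to pass Windows credentials over JDBC is NTLM, which the Microsoft driver supports via authenticationScheme=NTLM (mssql-jdbc 7.4+); the host, domain, table, and credentials below are hypothetical:

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://sqlhost:1433;databaseName=mydb;"
                   "integratedSecurity=true;authenticationScheme=NTLM;domain=MYDOMAIN")
    .option("dbtable", "dbo.mytable")
    .option("user", "windows_user")
    .option("password", "windows_password")  # hypothetical; use a secret scope in practice
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .load()
)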

gillzer84
by New Contributor
  • 2872 Views
  • 3 replies
  • 2 kudos

An example how to connect to SQL Server data using windows authentication

We use SQL Server to store data. I would like to connect to SQL Server to pull, manipulate, and sometimes push data back. I've seen some examples online of connecting, but I cannot successfully re-create them.

Latest Reply
Junee
New Contributor III

You can use the jTDS library from Maven; add it to your cluster. Once installed, you can write the below code to connect to your database. Code in Scala will be:
import java.util.Properties

val driverClass = "net.sourceforge.jtds.jdbc.Driver"
val serve...
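
Since the Scala snippet is truncated, here is a hedged Python equivalent of the jTDS approach; jTDS accepts a domain property in the JDBC URL for NTLM, and every name below is a hypothetical example rather than the poster's code:

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:jtds:sqlserver://sqlhost:1433/mydb;domain=MYDOMAIN")
    .option("dbtable", "dbo.mytable")
    .option("user", "windows_user")
    .option("password", "windows_password")  # hypothetical credentials
    .option("driver", "net.sourceforge.jtds.jdbc.Driver")
    .load()
)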

chandra_ym
by New Contributor II
  • 2655 Views
  • 7 replies
  • 2 kudos

Resolved! recommended course ?

Hello, I am new here. Any recommended courses for Databricks Certified Associate Developer for Apache Spark 3.0 - Python? Thank you.

Latest Reply
fabio2352
Contributor

Hi, this post has a practice exam: https://files.training.databricks.com/assessments/practice-exams/PracticeExam-DCADAS3-Python.pdf?_gl=1*1kqf0to*_gcl_aw*R0NMLjE2ODI0NDkyOTcuRUFJYUlRb2JDaE1JNWFTZ2d0ekZfZ0lWSkc1dkJCMVQ2UTJNRUFBWUFpQUFFZ0pOc3ZEX0J3RQ.

uzairm
by New Contributor III
  • 3235 Views
  • 12 replies
  • 3 kudos

Resolved! Concurrent Jobs - The spark driver has stopped unexpectedly!

Hi, I am running concurrent notebooks in concurrent workflow jobs on a c5a.8xlarge job compute cluster with 5-7 worker nodes. Each job has 100 concurrent child notebooks, and there are 10 job instances. 8/10 jobs give the error "the Spark driver has sto...

Latest Reply
Anonymous
Not applicable

Hi @uzair mustafa, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so...

564824
by New Contributor II
  • 791 Views
  • 2 replies
  • 0 kudos

Job webhook alerts are not sending authorization headers

Hi, I have set up a webhook which sends the event to a Lambda in AWS. I validate the event through the credentials given while creating the webhook, but sometimes the event being sent from Databricks does not contain authorization in the h...

Latest Reply
Anonymous
Not applicable

@Muthu Kumaran: If the event being sent from Databricks to your Lambda function sometimes does not contain authorization headers, you may need to modify your webhook configuration or Lambda function code to handle this situation. Here are a few sugg...
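
The suggestions are truncated; as one labeled assumption about the defensive-handler approach, the Lambda can treat the Authorization header as optional input to validate rather than a guaranteed field:

import hmac

EXPECTED_TOKEN = "my-webhook-secret"  # hypothetical; load from env/Secrets Manager in practice

def lambda_handler(event, context):
    headers = event.get("headers") or {}
    # API Gateway may lower-case header names; check both spellings.
    auth = headers.get("authorization") or headers.get("Authorization")
    if auth is None or not hmac.compare_digest(auth, EXPECTED_TOKEN):
        # Reject (or log and retry) instead of crashing on a missing key.
        return {"statusCode": 401, "body": "missing or invalid Authorization header"}
    return {"statusCode": 200, "body": "ok"}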

qwerty1
by Contributor
  • 1586 Views
  • 3 replies
  • 1 kudos

Is there a way to register a scala function that is available to other notebooks?

I am in a situation where I have a notebook that runs in a pipeline and creates a "live streaming table", so I cannot use a language other than SQL in the pipeline. I would like to format a certain column in the pipeline using Scala code (it's a ...

Latest Reply
-werners-
Esteemed Contributor III

No, DLT does not work with Scala, unfortunately. Delta Live Tables are not vanilla Spark. Is Python an option instead of Scala?
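
If Python is an option, one hedged pattern (relying on Databricks' support for calling Python UDFs from SQL within the same DLT pipeline; function, column, and table names are hypothetical) is to register the UDF in a Python notebook attached to the pipeline and call it from the SQL datasets:

from pyspark.sql.types import StringType

def format_path(raw: str) -> str:
    # example formatting logic; replace with the real column transformation
    return raw.strip().lower().replace("//", "/")

# Registered UDFs become callable from the pipeline's SQL notebooks.
spark.udf.register("format_path", format_path, StringType())

# In the SQL notebook of the same pipeline, e.g.:
#   CREATE OR REFRESH STREAMING LIVE TABLE cleaned
#   AS SELECT format_path(path) AS path FROM STREAM(LIVE.raw);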
