Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Erik_L
by Contributor II
  • 11040 Views
  • 2 replies
  • 2 kudos

Joining a big amount of data causes "Out of disk space error", how to ingest?

What I am trying to do:

    df = None
    # For all of the IDs that are valid
    for id in ids:
        # Get the parts of the data from different sources
        df_1 = spark.read.parquet(url_for_id)
        df_2 = spark.read.parquet(url_for_id)
        ...
        # Join together the pa...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Erik Louie: There are several strategies that you can use to handle large joins like this in Spark:

Use a broadcast join: If one of your dataframes is small enough to fit in memory on the driver and on every executor (Spark cannot broadcast a table larger than 8 GB), you can use a broadcast join to avoid shuffling data. A bro...

1 More Replies
Khalil
by Contributor
  • 12062 Views
  • 4 replies
  • 4 kudos

Resolved! Pivot a DataFrame in Delta Live Table DLT

I want to apply a pivot on a DataFrame in DLT, but I'm getting the following warning: Notebook:XXXX used `GroupedData.pivot` function that will be deprecated soon. Please fix the notebook. I get the same warning if I use the function collect. Is it risk...

Latest Reply
Khalil
Contributor
  • 4 kudos

Thanks @Kaniz Fatma for your support. The solution was to do the pivot outside of views or tables, and the warning disappeared.

3 More Replies
moski
by New Contributor II
  • 2839 Views
  • 3 replies
  • 1 kudos

How to import a data table from SQLQuery2 into Databricks notebook

Can anyone show me a few commands to import a table, say "mytable2", from Microsoft SQL Server into a Databricks notebook, using a Spark DataFrame or at least a pandas DataFrame? Cheers!

Latest Reply
irfanaziz
Contributor II
  • 1 kudos

You can read any table from MSSQL. You would need to authenticate to the database, so you would need the connection string:

    def dbProps():
        return {
            "user": "db-user",
            "password": "your password",
            "driver": "com.microsoft.sqlserver.jdbc.SQLServerD...
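As a rough sketch of the JDBC approach this reply describes (not the reply's own code, which is cut off in the excerpt), the reader options could be assembled as follows. The helper name `build_jdbc_options` and the host, database, and table values are hypothetical placeholders. The driver class shown is the standard Microsoft SQL Server JDBC driver, and the sketch assumes it is available on the cluster.

```python
def build_jdbc_options(host, database, table, user, password, port=1433):
    """Return an option dict for Spark's JDBC reader (hypothetical helper)."""
    return {
        "url": f"jdbc:sqlserver://{host}:{port};databaseName={database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }


# Placeholder values; in practice, read credentials from a secret scope
# instead of writing them into the notebook.
options = build_jdbc_options(
    host="myserver.example.com",
    database="mydb",
    table="dbo.mytable2",
    user="db-user",
    password="your password",
)

# In a Databricks notebook, where `spark` is already defined:
# df = spark.read.format("jdbc").options(**options).load()
# pdf = df.toPandas()  # only if the table fits in driver memory
```

The Spark calls are left as comments so the snippet runs outside a notebook; inside one, uncommenting them should give a Spark DataFrame and, optionally, a pandas DataFrame.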

2 More Replies
Data_Analytics_
by New Contributor II
  • 12202 Views
  • 3 replies
  • 3 kudos

Resolved! Connect SQL server using windows authentication

How do I connect to an on-premises SQL Server using Windows authentication from a Databricks notebook?

Latest Reply
User16829050420
Databricks Employee
  • 3 kudos

You need network connectivity set up from the Databricks VNet to the on-prem SQL Server. Then connect from the Databricks notebook over JDBC using the Windows-authenticated username/password - https://docs.microsoft.com/en-us/azure/databricks/data/data-sourc...

2 More Replies
chandra_ym
by New Contributor II
  • 19827 Views
  • 7 replies
  • 2 kudos

Resolved! Recommended course?

Hello, I am new here. Any recommended courses for Databricks Certified Associate Developer for Apache Spark 3.0 - Python? Thank you.

Latest Reply
fabio2352
Contributor
  • 2 kudos

Hi, this link has a practice exam: https://files.training.databricks.com/assessments/practice-exams/PracticeExam-DCADAS3-Python.pdf?_gl=1*1kqf0to*_gcl_aw*R0NMLjE2ODI0NDkyOTcuRUFJYUlRb2JDaE1JNWFTZ2d0ekZfZ0lWSkc1dkJCMVQ2UTJNRUFBWUFpQUFFZ0pOc3ZEX0J3RQ.

6 More Replies
uzairm
by New Contributor III
  • 8981 Views
  • 12 replies
  • 3 kudos

Resolved! Concurrent Jobs - The spark driver has stopped unexpectedly!

Hi, I am running concurrent notebooks in concurrent workflow jobs on a job compute cluster (c5a.8xlarge) with 5-7 worker nodes. Each job has 100 concurrent child notebooks and there are 10 job instances. 8 of the 10 jobs give the error the spark driver has sto...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @uzair mustafa, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so...

11 More Replies
564824
by New Contributor II
  • 2078 Views
  • 2 replies
  • 0 kudos

Job webhook alerts are not sending authorization headers

Hi, I have set up a webhook which will send the event to a Lambda in AWS. I validate the event using the credentials given when creating the webhook, but sometimes the event sent from Databricks does not contain authorization in the h...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Muthu Kumaran: If the event being sent from Databricks to your Lambda function sometimes does not contain authorization headers, you may need to modify your webhook configuration or Lambda function code to handle this situation. Here are a few sugg...
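One way the Lambda side of this could be handled is sketched below. It is an illustration only, not the truncated suggestions from the reply. It assumes the event arrives as a dict with a `headers` mapping (as with an API Gateway proxy integration or a Lambda function URL) and that the webhook was configured with a static credential. `EXPECTED_AUTH` and `get_header` are hypothetical names. The header lookup is case-insensitive because header casing can differ between integrations, which may or may not be relevant to the intermittent failures described.

```python
import hmac
import json

# Assumption: the credential configured on the webhook. Load it from a
# secret store in practice rather than hard-coding it.
EXPECTED_AUTH = "Bearer example-token"


def get_header(headers, name):
    """Look up an HTTP header case-insensitively; return None if absent."""
    for key, value in (headers or {}).items():
        if key.lower() == name.lower():
            return value
    return None


def lambda_handler(event, context):
    """Reject events without a valid Authorization header; accept the rest."""
    headers = event.get("headers") or {}
    auth = get_header(headers, "Authorization")
    if auth is None:
        # Log the header names that did arrive, to help diagnose the gap.
        print("Missing Authorization header; received:", sorted(headers))
        return {"statusCode": 401, "body": json.dumps({"error": "missing authorization"})}
    # Constant-time comparison on bytes avoids leaking timing information.
    if not hmac.compare_digest(auth.encode(), EXPECTED_AUTH.encode()):
        return {"statusCode": 403, "body": json.dumps({"error": "invalid credentials"})}
    return {"statusCode": 200, "body": json.dumps({"status": "accepted"})}
```

Returning 401 for a missing header and 403 for a wrong one keeps the two cases distinguishable in the logs, which helps when the problem only occurs sometimes.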

1 More Replies
qwerty1
by Contributor
  • 9172 Views
  • 3 replies
  • 1 kudos

Is there a way to register a scala function that is available to other notebooks?

I am in a situation where I have a notebook that runs in a pipeline that creates a "live streaming table", so I cannot use a language other than SQL in the pipeline. I would like to format a certain column in the pipeline using Scala code (it's a ...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

No, DLT does not work with Scala, unfortunately. Delta Live Tables are not vanilla Spark. Is Python an option instead of Scala?

2 More Replies
Sushma
by New Contributor
  • 1994 Views
  • 1 replies
  • 0 kudos

Databricks Lakehouse Fundamentals Certificate and Badge not received

I successfully passed the test after completing the course, but I haven't received any certification or badge yet. Any help is much appreciated. @Vidula Khanna

Latest Reply
Vartika
Databricks Employee
  • 0 kudos

Hi @Sushma Rani, thank you for reaching out! Please submit a ticket to our Training Team here: https://help.databricks.com/s/contact-us?ReqType=training and our team will get back to you shortly.

qwerty1
by Contributor
  • 12353 Views
  • 2 replies
  • 2 kudos

Resolved! Doing a join within the same row in SQL

My data is a dump of a JSON response from an API. The schema of the JSON is:

col_name  data_type
data      array<struct<attributes:struct<name: String, age: Int relationships:struct<address:struct<data:array<struct<id: long, type: string>>>>>>>  ...

dalion
by New Contributor III
  • 5804 Views
  • 5 replies
  • 0 kudos

Azure Databricks - ADLS Gen 2.0 Access

Hi all, I have an Azure Databricks setup (non-premium) and an ADLS Gen 2.0 setup. I am trying to access the ADLS Gen 2.0 containers via a simple access key mode for testing. There is no error if the ADLS Gen 2.0 is set to "Enable from all networks". B...

Latest Reply
fabio2352
Contributor
  • 0 kudos

Hi, can you check the two links below:
https://learn.microsoft.com/en-us/azure/databricks/getting-started/connect-to-azure-storage
https://docs.databricks.com/storage/azure-storage.html

4 More Replies
sudhanshu1
by New Contributor III
  • 6229 Views
  • 7 replies
  • 0 kudos

Incremental Data copy from one SQL DB to another DB

Hi all, I have 20 tables in a source SQL DB and we need to create a pipeline to incrementally load data into the target database. Can someone please suggest the best approach to achieve this using Azure Databricks? Should I use MERGE INTO? COPY INTO? o...

Latest Reply
Vartika
Databricks Employee
  • 0 kudos

Hi @SUDHANSHU RAJ, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers...

6 More Replies
ducng
by New Contributor II
  • 10733 Views
  • 1 replies
  • 0 kudos

VScode extension - certificate signature failure

Hi everyone, I'm trying to use the new Databricks extension (v0.3.10) for VS Code (v1.77.3). I face this problem when connecting to our workspace (see the attached image.png). This problem persists when I try to log in through the az CLI with our SSO, or through local config using PAT...

image.png
Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Minh Duc Nguyen: It seems like the error you are facing is due to a failure in verifying the SSL certificate of your Databricks workspace. To resolve this, you need to add the custom CA certificate to your VS Code settings. Here's how you can do it...

k9
by New Contributor II
  • 5569 Views
  • 3 replies
  • 1 kudos

Resolved! Databricks CLI v0.17.6 issue

I have multiple groups created in my Databricks account, and I have the Databricks CLI installed on my Mac. Some of the CLI functions return errors that I cannot find a solution for.

    databricks groups list

Returns:

    Error: b'{"error_code":"INTERNAL_ERROR","...

Latest Reply
karthik_p
Esteemed Contributor
  • 1 kudos

@kenan hasanov, which version of Python do you have installed on your machine? You need Python 3 (3.6 or above) or Python 2 (2.7.9 or above). Please try the latest one, as you are only seeing issues with a few functions. Please raise an issue in case you are still f...
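A quick way to check the local interpreter against the versions mentioned in this reply is sketched below. The function name `meets_cli_requirement` is hypothetical, and the thresholds are taken from the reply as written rather than checked against the CLI's current documentation.

```python
import sys


def meets_cli_requirement(version_info):
    """True for Python 2.7.9+ on the 2.x line, or Python 3.6+ otherwise."""
    version = tuple(version_info[:3])
    if version[0] == 2:
        return version >= (2, 7, 9)
    return version >= (3, 6)


# Report the interpreter that is running this snippet.
status = "OK" if meets_cli_requirement(sys.version_info) else "too old"
print(sys.version.split()[0], status)
```

Run it with the same `python` executable that the CLI was installed under; a different interpreter on the PATH would give a misleading answer.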

2 More Replies
