Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

annetemplon
by New Contributor II
  • 777 Views
  • 3 replies
  • 0 kudos

Explaining the explain plan

Hi All, I am new to Databricks and have recently started exploring Databricks' explain plans to try and understand how queries are executed (and eventually tune them as needed). There are some things that I can somehow "guess" based on what I know ...

Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @annetemplon, There are plenty of resources about this topic, but they are scattered all over the internet. I like the videos below, pretty informative:
https://m.youtube.com/watch?v=99fYi2mopbs
https://www.google.com/url?sa=t&source=web&rct=j&opi=89978449&u...

2 More Replies
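For readers new to plan-reading: in a Databricks notebook you would run `EXPLAIN FORMATTED <query>` or `df.explain(True)`, and the operator names are Spark/Photon-specific. The reading habit itself can be practised anywhere, though; here is a self-contained sketch using SQLite's `EXPLAIN QUERY PLAN` (stdlib only, output format differs from Spark's):

```python
import sqlite3

# Local illustration of reading a query plan. Databricks/Spark EXPLAIN output
# looks different (Catalyst/Photon operators), but the habit is the same:
# find the scans first, then the joins, then where filters/indexes apply.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE INDEX idx_orders_customer ON orders(customer_id);
""")
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT c.name, SUM(o.total)
    FROM orders o JOIN customers c ON o.customer_id = c.id
    GROUP BY c.name
""").fetchall()
for row in plan:
    print(row)  # each row describes one plan step (SCAN, SEARCH, index use)
```

Each output row shows one step of the plan; spotting full scans versus index searches here is the same skill as spotting full scans versus pruned reads in a Spark plan.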
Databricks143
by New Contributor III
  • 20835 Views
  • 14 replies
  • 3 kudos

Recursive CTE in Databricks SQL

Hi Team, How do I write a recursive CTE in Databricks SQL? Please let me know if anyone has a solution for this.

Latest Reply
dlehmann
New Contributor III

Hello @filipniziol, I went with your second suggestion, as I preferred to use views in this case. It works very well as there is a limited depth and I could just write that many unions. Thanks for your response!

13 More Replies
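The accepted workaround implies recursive CTEs weren't available in the asker's Databricks SQL version: when the hierarchy's maximum depth is known, you unroll the recursion into that many explicit UNION ALL levels (one view or subquery per level). SQLite does support `WITH RECURSIVE`, so both shapes can be compared locally on a toy org chart (table and column names illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (id INTEGER, manager_id INTEGER);
    INSERT INTO emp VALUES (1, NULL), (2, 1), (3, 1), (4, 2);
""")

# What a real recursive CTE looks like:
recursive = conn.execute("""
    WITH RECURSIVE chain(id, depth) AS (
        SELECT id, 0 FROM emp WHERE manager_id IS NULL
        UNION ALL
        SELECT e.id, c.depth + 1 FROM emp e JOIN chain c ON e.manager_id = c.id
    )
    SELECT id, depth FROM chain
""").fetchall()

# The fixed-depth workaround: one UNION ALL branch per known level (depth <= 2 here).
unrolled = conn.execute("""
    SELECT id, 0 AS depth FROM emp WHERE manager_id IS NULL
    UNION ALL
    SELECT e1.id, 1 FROM emp e1
      JOIN emp r ON e1.manager_id = r.id WHERE r.manager_id IS NULL
    UNION ALL
    SELECT e2.id, 2 FROM emp e2
      JOIN emp e1 ON e2.manager_id = e1.id
      JOIN emp r  ON e1.manager_id = r.id WHERE r.manager_id IS NULL
""").fetchall()

assert sorted(recursive) == sorted(unrolled)  # same result set
print(sorted(recursive))  # [(1, 0), (2, 1), (3, 1), (4, 2)]
```

The trade-off dlehmann accepted: the unrolled form only works when depth is bounded and known, but each level is a plain join that any SQL engine can run.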
s3
by New Contributor II
  • 13269 Views
  • 4 replies
  • 8 kudos

Resolved! Notebook for SFTP server connectivity without a password

I am trying to develop a script using Python to access an SFTP server without a password, using valid public/private keys, in a notebook. However, I am not finding any such example. All examples have a password in them. Can I get some help?

Latest Reply
Atanu
Databricks Employee

This example looks good to me: https://stackoverflow.com/questions/58562744/how-to-upload-text-file-to-ftp-from-databricks-notebook. Or maybe try using data libs: https://www.cdata.com/kb/tech/sftp-jdbc-azure-databricks.rst

3 More Replies
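For key-based SFTP from a notebook, a common approach is the third-party paramiko library (an assumption here, not named in the thread; install it first, e.g. `%pip install paramiko`). The point is that `connect()` takes `key_filename` instead of `password`. A minimal sketch, with hypothetical host/path names:

```python
def fetch_via_sftp(host, user, key_path, remote_path, local_path):
    """Download a file over SFTP using a private key instead of a password.

    Sketch only: assumes `paramiko` is installed and that the matching public
    key is already in the server's authorized_keys. All names are illustrative.
    """
    import paramiko  # imported lazily so the sketch can be defined without the dependency

    client = paramiko.SSHClient()
    # For production, load a known_hosts file instead of auto-accepting host keys.
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(hostname=host, username=user, key_filename=key_path)
    try:
        sftp = client.open_sftp()
        sftp.get(remote_path, local_path)  # no password anywhere: auth is the key pair
        sftp.close()
    finally:
        client.close()
```

Usage would look like `fetch_via_sftp("sftp.example.com", "svc_user", "/dbfs/keys/id_rsa", "/outbound/data.csv", "/tmp/data.csv")` (paths hypothetical).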
maafsl
by New Contributor II
  • 575 Views
  • 1 reply
  • 2 kudos

Vulnerability in the Guava dependency of the Databricks JDBC driver

Good afternoon. I want to report that the JDBC driver incorporates a version of com.google.guava:guava that has two vulnerabilities. Image attached. Could the dependency be updated?

Latest Reply
maafsl
New Contributor II
Dnirmania
by Contributor
  • 1150 Views
  • 2 replies
  • 1 kudos

Resolved! Dynamic Python UDF in unity catalog

Hi Team, I am trying to create a Python UDF which I want to use for column masking. This function will take 2 input parameters (column name and group name) and return the column value if the user is part of the group, otherwise return a masked value. I wrote fol...

Latest Reply
menotron
Valued Contributor

Hi @Dnirmania, You could achieve something similar using this UDF:
%sql
CREATE OR REPLACE FUNCTION ryanlakehouse.default.column_masking(column_value STRING, groups_str String)
RETURNS STRING
LANGUAGE SQL
COMMENT 'Return the column value if use...

1 More Replies
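The SQL UDF in the reply is truncated, but the masking logic it describes can be mirrored in plain Python for clarity. Names below are illustrative, and in a real Unity Catalog SQL UDF you would check membership with `is_account_group_member(group)` rather than passing the user's groups in as an argument:

```python
# Plain-Python mirror of the masking idea from the thread: return the real
# value only when the current user belongs to one of the allowed groups.
MASK = "****"

def mask_column(column_value, allowed_groups, user_groups):
    """`allowed_groups` is a comma-separated string, matching the UDF's
    (column_value, groups_str) parameter shape; `user_groups` is a set."""
    allowed = {g.strip() for g in allowed_groups.split(",")}
    return column_value if allowed & user_groups else MASK

print(mask_column("555-0100", "admins,hr", {"hr"}))       # member -> real value
print(mask_column("555-0100", "admins,hr", {"finance"}))  # not a member -> masked
```

Once the equivalent SQL UDF exists, it is attached to a column with `ALTER TABLE ... ALTER COLUMN ... SET MASK`, which is how Unity Catalog applies it at query time.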
mvmiller
by New Contributor III
  • 2155 Views
  • 2 replies
  • 1 kudos

Workflow file arrival trigger - does it apply to overwritten files?

I am exploring the use of the "file arrival" trigger for a workflow for a use case I am working on.  I understand from the documentation that it checks every minute for new files in an external location, then initiates the workflow when it detects a ...

Latest Reply
Rajani
Contributor II

Hi @mvmiller, The "file arrival" trigger for a workflow considers the name of the file; when a file with the same name was overwritten, the workflow didn't trigger. Hope I answered your question!

1 More Replies
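Rajani's observation (the trigger keys on file names, so overwriting an existing name does not fire) can be sketched as a toy detector. This mimics the behaviour reported in the thread, not Databricks' actual implementation:

```python
# Toy model of the reported behaviour: a brand-new file name fires the
# workflow, but overwriting an existing name does not, because the trigger
# tracks names it has already seen.
def new_arrivals(seen, listing):
    fresh = [name for name in listing if name not in seen]
    seen.update(fresh)
    return fresh  # only these would trigger a run

seen = set()
print(new_arrivals(seen, ["a.csv", "b.csv"]))  # ['a.csv', 'b.csv'] -> triggers
print(new_arrivals(seen, ["a.csv", "b.csv"]))  # [] -> a.csv overwritten: no trigger
```

The practical consequence for mvmiller's use case: upstream systems that overwrite a fixed file name need a different signal (e.g. timestamped names) for the trigger to fire again.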
Brad
by Contributor II
  • 1072 Views
  • 3 replies
  • 0 kudos

One worker is one executor and one physical node

Hi team, It seems that in Databricks, unlike running Spark jobs on a k8s cluster, when a workflow runs on a Job Compute/Cluster or instance pool, one physical node can only have one executor. Is this understanding right? If that is true, that mean...

Latest Reply
-werners-
Esteemed Contributor III

It can change the number of executors dynamically, if I am not mistaken. But maybe Databricks has hammered the 1-1 ratio in stone; something to test, I'd say.

2 More Replies
MartinIsti
by New Contributor III
  • 4391 Views
  • 5 replies
  • 1 kudos

DLT - runtime parameterisation of execution

I have started to use DLT in a prototype framework and I now face the below challenge, for which any help would be appreciated. First, let me give a brief context: I have metadata sitting in a .json file that I read as the first task and put it into a lo...

Data Engineering
configuration
Delta Live Table
job
parameters
workflow
Latest Reply
data-engineer-d
Contributor

@Retired_mod Can you please provide some reference to the REST API approach? I do not see that available in the docs. TIA

4 More Replies
shivkumar
by New Contributor
  • 434 Views
  • 1 reply
  • 0 kudos

How to write a DataFrame using the API

How to write a DataFrame using the API?

Latest Reply
lucasrocha
Databricks Employee

Hello @shivkumar, I hope this message finds you well. I didn't fully understand your question, but to write a DataFrame you can follow the steps in the link below:
https://docs.databricks.com/en/getting-started/dataframes.html#tutorial-load-and-trans...

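Following the shape of the linked tutorial, the create-then-write pattern looks roughly like the sketch below. This is an illustration, not the tutorial's exact code; `spark` is the notebook's SparkSession and the table name is hypothetical:

```python
def write_dataframe_example(spark):
    """Minimal create-and-write sketch for a Databricks notebook.

    Assumptions: `spark` is the ambient SparkSession and
    `main.default.example_table` is a catalog/schema you can write to.
    """
    df = spark.createDataFrame(
        [("alice", 1), ("bob", 2)],
        schema=["name", "id"],
    )
    # Save as a managed table; Delta is the default format on Databricks.
    df.write.mode("overwrite").saveAsTable("main.default.example_table")
    return df
```

In a notebook you would simply call `write_dataframe_example(spark)` and then query the table with `spark.table(...)` or SQL.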
sahil07
by New Contributor III
  • 2224 Views
  • 8 replies
  • 1 kudos

Resolved! FileNotFoundError while reading PDF file in Databricks from DBFS location

I am trying to read a PDF file from a DBFS location in Databricks using PyPDF2.PdfFileReader, but it's throwing an error that the file doesn't exist. But the file exists at the path; refer to the screenshots below. Can anyone please suggest what is wrong in this?

Latest Reply
Lucas_TBrabo
Databricks Employee

@sahil07, It seems that with your current setup, you can't read from DBFS using vanilla Python. I've run some tests and managed to reproduce the error and solve it by copying the file in DBFS to the local file system of the driver node using dbutils....

7 More Replies
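The fix above is a copy-then-open pattern: in a workspace that copy would be something like `dbutils.fs.cp("dbfs:/FileStore/doc.pdf", "file:/tmp/doc.pdf")` (paths hypothetical), after which vanilla Python libraries such as PyPDF2 can open the local file. Sketched here with the standard library standing in for dbutils so it runs anywhere:

```python
import os
import shutil
import tempfile

# Vanilla Python can't open dbfs:/ paths directly, so the fix is: copy the
# file to the driver's local filesystem first, then open it normally.
# shutil.copy stands in below for dbutils.fs.cp in a real workspace.
src = os.path.join(tempfile.gettempdir(), "doc.pdf")
with open(src, "wb") as f:
    f.write(b"%PDF-1.4 dummy bytes")       # stand-in for the real PDF in DBFS

local = os.path.join(tempfile.gettempdir(), "doc_local.pdf")
shutil.copy(src, local)                    # the dbutils.fs.cp step

with open(local, "rb") as f:               # now PyPDF2 (or anything) can read it
    header = f.read(4)
print(header)  # b'%PDF'
```

The same idea applies to any binary format: the copy step turns a DBFS URI into an ordinary local path.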
ahab
by New Contributor
  • 1844 Views
  • 1 replies
  • 0 kudos

Error deploying DLT DAB: Validation failed for cluster_type, the value must be dlt (is "job")

Hello. I'm getting a cluster validation error while trying to deploy a DLT pipeline via DAB. See attached screenshots for config and error. Hoping someone has run into this before and can guide me. Thanks.

Latest Reply
MohcineRouessi
New Contributor II

Hey @ahab, Were you able to solve this issue? If yes, would you mind sharing your findings? If not: looking at your cluster policy, I see that you are using dlt as the cluster type, which is the correct type for DLT pipelines, but in the resourc...

ivanychev
by Contributor II
  • 2324 Views
  • 3 replies
  • 0 kudos

Resolved! Delta table takes too long to write due to S3 full scan

DBR 14.3, Spark 3.5.0. We use AWS Glue Metastore. On August 20th some of our pipelines started timing out during writes to a Delta table. We're experiencing many hours of the driver executing post-commit hooks. We write dataframes to Delta with `mode=overw...

Latest Reply
ivanychev
Contributor II

The spark.databricks.delta.catalog.update.enabled=true setting helped, but I still don't understand why the problem started to occur.
https://docs.databricks.com/en/archive/external-metastores/external-hive-metastore.html#external-apache-hive-metastore-leg...

2 More Replies
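For anyone landing here with the same symptom: the fix named in the reply is a Spark configuration. Where you set it is up to you (the thread only names the setting; placement below is an assumption). On the cluster, it goes in the Spark config box as:

```
spark.databricks.delta.catalog.update.enabled true
```

or per session in a notebook, `spark.conf.set("spark.databricks.delta.catalog.update.enabled", "true")`.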
bhakti
by New Contributor II
  • 1621 Views
  • 7 replies
  • 0 kudos

Databricks job getting stuck at saveAsTable

I am trying to write logs to a Delta table, but after running for some time the job is getting stuck at saveAsTable.
Traceback (most recent call last):
File "/databricks/spark/python/pyspark/errors/exceptions.py", line 228, in deco
return f(*a, **kw)
File ...

Latest Reply
ivanychev
Contributor II

Hey @bhakti! Please provide the full stack trace / error message. Your log doesn't provide any strong clue; the failure during write might occur for various reasons.

6 More Replies
Amodak91
by New Contributor II
  • 2549 Views
  • 1 reply
  • 4 kudos

Resolved! How to use a Dataframe created in one Notebook from Another, without writing it anywhere ?

I have a large Notebook and want to divide it into multiple Notebooks and use Databricks jobs to run them in parallel. However, one of the notebooks uses a dataframe from another notebook, so it has to be run downstream of the others. Now, since...

Latest Reply
menotron
Valued Contributor

Hi @Amodak91, you could use the %run magic command from within the downstream notebook and call the upstream notebook, thus having it run in the same context and making all its variables accessible, including the dataframe, without needing to persist it....

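In practice the %run approach looks like the fragment below (notebook paths and variable names are illustrative). %run executes the other notebook in the caller's context, so its top-level names become available; the magic must sit alone in its own cell:

```
# upstream notebook: ./prepare_df
df = spark.read.table("main.default.events")

# downstream notebook, first cell (alone in the cell):
%run ./prepare_df

# later cells can use `df` directly, nothing persisted anywhere:
display(df.limit(10))
```

The caveat relative to the job-based split: %run serializes the two notebooks into one run, so it solves the sharing problem but not the parallelism one.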
JaviPA
by New Contributor
  • 657 Views
  • 1 reply
  • 0 kudos

Spark Bigquery Connector Availability

Hello, We are trying to use this library (https://github.com/GoogleCloudDataproc/spark-bigquery-connector) to read BigQuery data from a Databricks cluster on Azure. Could someone confirm whether this library is fully available and supported on Databricks? ...

Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @JaviPA, The documentation refers to the library you're going to use: GitHub - GoogleCloudDataproc/spark-bigquery-connector: BigQuery data source for Apache Spark: read data from BigQuery into DataFrames, write DataFrames into BigQuery tables. Also, it...

