Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by mvmiller (New Contributor III)
  • 1766 Views
  • 2 replies
  • 1 kudos

Workflow file arrival trigger - does it apply to overwritten files?

I am exploring the use of the "file arrival" trigger for a workflow for a use case I am working on.  I understand from the documentation that it checks every minute for new files in an external location, then initiates the workflow when it detects a ...

Latest Reply
Rajani
Contributor II
  • 1 kudos

Hi @mvmiller, the "file arrival" trigger for a workflow keys on the name of the file: when a file with the same name was overwritten, the workflow did not trigger. Hope I answered your question!

1 More Replies
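The behaviour described above lines up with how the Jobs API models file arrival triggers, which are keyed to an external location URL. A minimal settings fragment might look like this (field names per the public Jobs API docs; the bucket path and cadence are placeholders):

```python
# Illustrative Jobs API 2.1 job-settings fragment for a file arrival trigger.
# The URL is a placeholder external location; min_time_between_triggers_seconds
# bounds how often the roughly once-a-minute polling can fire the job.
trigger_settings = {
    "trigger": {
        "file_arrival": {
            "url": "s3://my-bucket/landing/",
            "min_time_between_triggers_seconds": 60,
        },
        "pause_status": "UNPAUSED",
    }
}
```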
by Brad (Contributor II)
  • 802 Views
  • 3 replies
  • 0 kudos

One worker is one executor and one physical node

Hi team, it seems that in Databricks, unlike running Spark jobs on a k8s cluster, when a workflow runs on a Job Compute/Cluster or instance pool, one physical node can only have one executor. Is this understanding right? If that is true, that mean...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

If I am not mistaken, it can change the number of executors dynamically. But maybe Databricks has set the 1:1 ratio in stone; something to test, I'd say.

2 More Replies
by ckough (New Contributor III)
  • 40500 Views
  • 42 replies
  • 25 kudos

Resolved! Cannot sign in at databricks partner-academy portal

Hi there, I used my company email to register an account for customer-academy.databricks.com a while back. Now I need to create an account with partner-academy.databricks.com using my company email too. However, when I register at partner...

Latest Reply
Sanjanavenkat05
New Contributor II
  • 25 kudos

I'm also facing the same problem. My company is a Databricks partner and I accidentally registered on the customer portal instead of the partner one. Could you please help me migrate my account to a partner account? I also raised a ticket and the request...

41 More Replies
by MartinIsti (New Contributor III)
  • 3721 Views
  • 5 replies
  • 1 kudos

DLT - runtime parameterisation of execution

I have started to use DLT in a prototype framework and I now face the challenge below, for which any help would be appreciated. First, let me give brief context: I have metadata sitting in a .json file that I read as the first task and put into a lo...

Data Engineering
configuration
Delta Live Table
job
parameters
workflow
Latest Reply
data-engineer-d
Contributor
  • 1 kudos

@Retired_mod Can you please provide some reference for the REST API approach? I don't see it available in the docs. TIA

4 More Replies
by NagarajuBondala (New Contributor II)
  • 238 Views
  • 0 replies
  • 1 kudos

AI-Suggested Comments Not Appearing for Delta Live Tables Populated Tables

I'm working with Delta Live Tables (DLT) in Databricks and have noticed that AI-suggested comments for columns are not showing up for tables populated using DLT. Interestingly, this feature works fine for tables that are not populated using DLT. Is t...

Data Engineering
AI
Delta Live Tables
dlt
by shivkumar (New Contributor)
  • 309 Views
  • 1 replies
  • 0 kudos

how to write data frame API

how to write data frame API

Latest Reply
lucasrocha
Databricks Employee
  • 0 kudos

Hello @shivkumar, I hope this message finds you well. I didn't fully understand your question, but to write a DataFrame you can follow the steps in the link below: https://docs.databricks.com/en/getting-started/dataframes.html#tutorial-load-and-trans...

by sahil07 (New Contributor III)
  • 1393 Views
  • 8 replies
  • 1 kudos

Resolved! FileNotFoundError while reading PDF file in Databricks from DBFS location

I am trying to read a PDF file from a DBFS location in Databricks using PyPDF2.PdfFileReader, but it throws an error that the file doesn't exist. But the file does exist at that path (see the attached screenshots). Can anyone please suggest what is wrong?

Latest Reply
Lucas_TBrabo
Databricks Employee
  • 1 kudos

@sahil07, it seems that with your current setup you can't read from DBFS using vanilla Python. I've run some tests, managed to reproduce the error, and solved it by copying the file in DBFS to the local file system of the driver node using dbutils....

7 More Replies
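The workaround described in the reply can be sketched roughly as below (Databricks-only: `dbutils` is assumed to be available in the notebook, the paths are placeholders, and PyPDF2 is assumed to be installed):

```python
# Copy the file from DBFS to the driver's local filesystem, where plain Python I/O works.
dbutils.fs.cp("dbfs:/FileStore/docs/report.pdf", "file:/tmp/report.pdf")

# PyPDF2 3.x renamed PdfFileReader to PdfReader; adjust to your installed version.
from PyPDF2 import PdfReader

reader = PdfReader("/tmp/report.pdf")
print(len(reader.pages))
```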
by Karl (New Contributor II)
  • 246 Views
  • 0 replies
  • 0 kudos

DB2 JDBC Connection from Databricks cluster

Has anyone successfully connected to a DB2 database on z/OS from a Databricks cluster using a JDBC connection? I also need to specify an SSL certificate path and am not sure if I need to use an init script on the cluster to do so. Any examples would be ver...

by ahab (New Contributor)
  • 1455 Views
  • 1 replies
  • 0 kudos

Error deploying DLT DAB: Validation failed for cluster_type, the value must be dlt (is "job")

Hello. I'm getting a cluster validation error while trying to deploy a DLT pipeline via DAB. See the attached screenshots for the config and the error. Hoping someone has run into this before and can guide me. Thanks.

Latest Reply
MohcineRouessi
New Contributor II
  • 0 kudos

Hey @ahab, were you able to solve this issue? If yes, would you mind sharing your findings? If not: looking at your cluster policy, I see that you are using dlt as the cluster type, which is the correct type for DLT pipelines, but in the resourc...

by ivanychev (Contributor II)
  • 1751 Views
  • 3 replies
  • 0 kudos

Resolved! Delta table takes too long to write due to S3 full scan

DBR 14.3, Spark 3.5.0. We use AWS Glue Metastore. On August 20th some of our pipelines started timing out during writes to a Delta table. We're experiencing many hours of the driver executing post-commit hooks. We write dataframes to Delta with `mode=overw...

Latest Reply
ivanychev
Contributor II
  • 0 kudos

The spark.databricks.delta.catalog.update.enabled=true setting helped, but I still don't understand why the problem started to occur. https://docs.databricks.com/en/archive/external-metastores/external-hive-metastore.html#external-apache-hive-metastore-leg...

2 More Replies
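For reference, the setting mentioned in the reply can be applied at session scope as sketched below (it can equally be set as a cluster-level Spark conf); treat this as a config fragment rather than a full program:

```python
# Session-scoped Spark conf; `spark` is the active SparkSession on Databricks.
spark.conf.set("spark.databricks.delta.catalog.update.enabled", "true")
```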
by bhakti (New Contributor II)
  • 977 Views
  • 7 replies
  • 0 kudos

Databricks job getting stuck at saveAsTable

I am trying to write logs to a Delta table, but after running for some time the job gets stuck at saveAsTable.

Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/errors/exceptions.py", line 228, in deco
    return f(*a, **kw)
  File ...

Latest Reply
ivanychev
Contributor II
  • 0 kudos

Hey @bhakti! Please provide the full stack trace / error message. Your log doesn't give any strong clue; a failure during a write can occur for various reasons.

6 More Replies
by Amodak91 (New Contributor II)
  • 727 Views
  • 1 replies
  • 4 kudos

Resolved! How to use a Dataframe created in one Notebook from Another, without writing it anywhere ?

I have a large notebook and want to divide it into multiple notebooks and use Databricks Jobs to run them in parallel. However, one of the notebooks uses a dataframe from one of the other notebooks, so it has to run downstream of the others. Now, since...

Latest Reply
menotron
Valued Contributor
  • 4 kudos

Hi @Amodak91, you could use the %run magic command from within the downstream notebook to call the upstream notebook, thus having it run in the same context and making all its variables accessible, including the dataframe, without needing to persist it....

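The pattern from the reply, sketched as two hypothetical notebook cells (the notebook path and variable name are illustrative; %run is notebook magic, not plain Python):

```
# Downstream notebook, cell 1: execute the upstream notebook in the same context.
%run ./prepare_data

# Cell 2: any name defined upstream (e.g. a DataFrame) is now in scope here.
display(df_prepared)
```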
by JaviPA (New Contributor)
  • 391 Views
  • 1 replies
  • 0 kudos

Spark Bigquery Connector Availability

Hello, we are trying to use this library (https://github.com/GoogleCloudDataproc/spark-bigquery-connector) to read BigQuery data from a Databricks cluster on Azure. Could someone confirm whether this library is fully available and supported on Databricks? ...

Latest Reply
szymon_dybczak
Contributor III
  • 0 kudos

Hi @JaviPA, the documentation refers to the library you're going to use: GitHub - GoogleCloudDataproc/spark-bigquery-connector: BigQuery data source for Apache Spark: read data from BigQuery into DataFrames, write DataFrames into BigQuery tables. Also, it...

by RohitKulkarni (Contributor II)
  • 1506 Views
  • 11 replies
  • 6 kudos

Partially upload data of 1.2GB

Hello Team, I have a file in txt format, 1.2 GB in size. I am trying to upload the data into an MS SQL Server database table, but only 10% of the data is able to upload. Example: total records in the file: 51303483; number of records inserted: 10224430. I am usi...

Latest Reply
RohitKulkarni
Contributor II
  • 6 kudos

There were access lines in the document; because of this it loaded only partially. Thanks for the support!

10 More Replies
by ahen (New Contributor)
  • 255 Views
  • 0 replies
  • 0 kudos

Deployed DABs job via Gitlab CICD. It is creating duplicate jobs.

We had an error in a DABs deploy, and subsequent retries resulted in a locked state. As suggested in the logs, we used the --force-lock option and the deploy succeeded. However, it created duplicate jobs for all assets in the bundle instead of updating the...

