Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

William_Scardua
by Valued Contributor
  • 2669 Views
  • 1 reply
  • 3 kudos

How to use Pylint to check your PySpark code quality?

Hi guys, I would like to use Pylint to check my PySpark scripts. Do you do that? Thank you!

Latest Reply
developer_lumo
New Contributor II
  • 3 kudos

Currently I am working in Databricks notebooks and have the same issue: I am unable to find a linter that is well integrated with Python, PySpark, and Databricks notebooks. 
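A minimal sketch of one way to run Pylint against a PySpark script, assuming pylint and pyspark are pip-installed; the file name etl_job.py and the chosen options are illustrative, not a Databricks-specific integration:

```python
# Run Pylint programmatically (works in a notebook cell or a plain script).
from pylint.lint import Run

Run(
    [
        "etl_job.py",                               # hypothetical script to lint
        "--ignored-modules=pyspark.sql.functions",  # avoid false-positive no-member errors
        "--disable=missing-module-docstring",       # notebooks rarely have module docstrings
    ],
    exit=False,  # don't sys.exit(), so the notebook cell keeps running
)
```

The --ignored-modules option is the usual workaround for Pylint tripping over PySpark's dynamically generated functions.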

ashraf1395
by Valued Contributor III
  • 373 Views
  • 1 reply
  • 0 kudos

Resolved! Creating notebooks that work both as normal Databricks jobs and as DLT pipelines

We are working on automation of our Databricks ingestion. We want to make our Python scripts or notebooks such that they work in both Databricks jobs and DLT pipelines. When I say Databricks jobs, I mean a normal run without a DLT pipeline. How shall we wo...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hello @ashraf1395, To address your goal of creating Python scripts or notebooks that work both in Databricks Jobs and Delta Live Tables (DLT) pipelines, here are some ideas: Unified Script Approach: Table Creation: As you mentioned, DLT supports two t...
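A minimal sketch of the unified-script idea, assuming the code can branch on whether the dlt module is importable (table and function names are hypothetical; on some newer runtimes dlt may import outside a pipeline, in which case a job parameter is a safer switch):

```python
# Detect the execution context: classic runtimes only expose `dlt`
# inside a DLT pipeline, so the import fails in a plain job.
try:
    import dlt
    IS_DLT = True
except ImportError:
    IS_DLT = False

def read_source():
    return spark.read.table("raw_events")  # hypothetical source table

if IS_DLT:
    @dlt.table(name="bronze_events")
    def bronze_events():
        return read_source()
else:
    # Plain Databricks job: materialize the table directly.
    read_source().write.mode("overwrite").saveAsTable("bronze_events")
```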

Dharinip
by New Contributor III
  • 620 Views
  • 1 reply
  • 1 kudos

Resolved! Create a Delta table with PK and FK constraints for streaming source data

1. How to create a Delta table with PK and FK constraints for streaming source data? 2. When the streaming data in the silver layer gets updated, will the Delta table also be updated? My use case is: we have streaming data in the silver layer as SCD...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

1. You can use primary key and foreign key relationships on fields in Unity Catalog tables. Primary and foreign keys are informational only and are not enforced. Foreign keys must reference a primary key in another table. You can declare primary keys...
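A hedged sketch of declaring these informational constraints on Unity Catalog tables (catalog, schema, and column names are hypothetical; as noted above, the constraints are not enforced):

```python
# Primary key columns must be declared NOT NULL.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.silver.customers (
        customer_id BIGINT NOT NULL,
        name STRING,
        CONSTRAINT customers_pk PRIMARY KEY (customer_id)
    )
""")

# The foreign key must reference a declared primary key in another table.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.silver.orders (
        order_id BIGINT NOT NULL,
        customer_id BIGINT,
        CONSTRAINT orders_pk PRIMARY KEY (order_id),
        CONSTRAINT orders_customers_fk FOREIGN KEY (customer_id)
            REFERENCES main.silver.customers (customer_id)
    )
""")
```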

ashutosh0710
by New Contributor II
  • 1050 Views
  • 4 replies
  • 1 kudos

java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.trees.Origin.<init>

While trying to run spark.sql("CALL iceberg_catalog.system.expire_snapshots(table => 'iceberg_catalog.d11_stitch.rewards_third_party_impact_base_query', older_than => TIMESTAMP '2024-03-06 00:00:00.000')") I'm getting Py4JJavaError: An erro...

Latest Reply
VZLA
Databricks Employee
  • 1 kudos

@ashutosh0710 you can inspect the libraries available on a cluster under /databricks/jars to understand which jars and versions are available, or simply inspect the Spark UI Environment classpath list.
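A small sketch of that jar inspection, run from a notebook on the cluster's driver (the "iceberg" filter string is just an example):

```python
import os

# List cluster jars and surface anything Iceberg-related with its version.
for jar in sorted(os.listdir("/databricks/jars")):
    if "iceberg" in jar.lower():
        print(jar)
```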

3 More Replies
Reply_Domenico
by New Contributor II
  • 537 Views
  • 1 reply
  • 0 kudos

Issue with Updating Dashboards During Assessment Using UCX

Hello everyone, I am using UCX for the migration to Unity, and I've noticed that re-running the assessment does not update the dashboards with jobs that are incompatible with Unity. To get the dashboards updated, I had to uninstall and reinstall UCX, ...

Latest Reply
ckunal_meta
New Contributor II
  • 0 kudos

Hi, I am having a similar issue as Domenico. I have installed UCX in my workspace and ran the UCX assessment job three days ago. It created crawl_table.log, which showed that UCX had scanned through all of my legacy hive_metastore schemas and listed the ta...

DBUser2
by New Contributor III
  • 348 Views
  • 2 replies
  • 0 kudos

COPY INTO size limit

Hi, I'm using the COPY INTO command to ingest data into a Delta table in my Azure Databricks instance. Sometimes I get a timeout error running this command. Is there a limit on the size of the data that can be ingested using "COPY INTO", or a limit on the ...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hello @DBUser2, The COPY INTO command does not have a specific documented limit on the size of the data or the number of files that can be ingested at a time. Timeout errors can occur due to network issues, resource limitations, or long-running opera...
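One hedged workaround for long-running loads is to split the ingestion into smaller COPY INTO runs using a glob PATTERN; since COPY INTO skips files it has already loaded, repeated runs stay idempotent (the table, path, and pattern below are hypothetical):

```python
spark.sql("""
    COPY INTO main.bronze.events
    FROM 'abfss://landing@myaccount.dfs.core.windows.net/events/'
    FILEFORMAT = PARQUET
    PATTERN = '2024/12/*.parquet'  -- load one month at a time
""")
```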

1 More Reply
Dp15
by Contributor
  • 638 Views
  • 4 replies
  • 0 kudos

Executing Python code inside a SQL Function

Hi, I am trying to create a SQL UDF in which I run some Python code involving PySpark. I am not able to create a Spark session inside the Python section of the function. Here is how my code looks: CREATE OR REPLACE FUNCTION test.getValuesFro...

Latest Reply
Dp15
Contributor
  • 0 kudos

Actually this would work if I were using it in a native notebook environment; however, I am trying to create a UDF because I want these queries to be executed from an external JDBC connection, and I don't wish to wait for the cluster to spin up for a noteboo...
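For context, a hedged sketch of a Unity Catalog Python UDF: the Python body runs in a sandbox without a SparkSession, so it can only do plain Python computation; querying other tables from inside the body would need a SQL UDF or a different design (function name and logic below are hypothetical):

```python
spark.sql("""
    CREATE OR REPLACE FUNCTION test.clean_code(code STRING)
    RETURNS STRING
    LANGUAGE PYTHON
    AS $$
        # Plain Python only: no Spark session is available in the sandbox.
        return code.strip().upper() if code else None
    $$
""")

# Once defined, the function is callable from any SQL client over JDBC:
spark.sql("SELECT test.clean_code('  ab12 ') AS cleaned").show()
```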

3 More Replies
skumarrm
by New Contributor II
  • 525 Views
  • 2 replies
  • 0 kudos

DLT PipelineID/PipelineName values from TASK1 should get passed to the TASK2 notebook (non-DLT)

DLT PipelineID/PipelineName values from TASK1 should get passed to the TASK2 notebook (non-DLT). TASK1 (DLT) ---> TASK2 (non-DLT). How do I pass the parameters to TASK2 from TASK1? I need to get the DLT task notebook pipelineID and pipelineName and pass them to TASK2...

Data Engineering
dlt
DLT parameter
Latest Reply
MuthuLakshmi
Databricks Employee
  • 0 kudos

@skumarrm Please try the below:
  • Set Up Task Parameters: In the job configuration, you can set up task parameters to pass values from one task to another. For TASK1 (DLT), ensure it outputs the PipelineID or PipelineName.
  • Use Task Parameters in TASK2:...
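A hedged sketch of the receiving side in the TASK2 notebook, assuming the job configuration passes the values as task parameters (for example via a dynamic value reference such as {{tasks.TASK1.values.pipeline_id}}, if TASK1 can expose them as task values; the parameter names here are hypothetical):

```python
# TASK2 (non-DLT) notebook: read the parameters passed by the job.
pipeline_id = dbutils.widgets.get("pipeline_id")
pipeline_name = dbutils.widgets.get("pipeline_name")

print(f"Upstream DLT pipeline: {pipeline_name} ({pipeline_id})")
```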

1 More Reply
PKD28
by New Contributor II
  • 320 Views
  • 1 replies
  • 0 kudos

Databricks cluster issue

Jobs within the all-purpose DB cluster are failing with "The Spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically reattached." In the event log it says Event_type=DRIVER_NOT_RESPONDING & MESSAGE="Driver is up b...

Latest Reply
MuthuLakshmi
Databricks Employee
  • 0 kudos

@PKD28 The error indicates that the driver memory is not enough to handle the load. Please refer to this doc for more info on how to fix this: https://kb.databricks.com/en_US/jobs/driver-unavailable
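As an illustration of the usual driver-memory culprits (my examples, not from the KB article; table names are hypothetical):

```python
df = spark.read.table("main.bronze.events")

# Avoid materializing large results on the driver:
# rows = df.collect()        # can exhaust driver memory
# pdf = df.toPandas()        # same risk

df.limit(100).show()         # bounded sample for inspection
df.write.mode("overwrite").saveAsTable("main.bronze.events_copy")  # distributed write
```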

abduldjafar
by New Contributor
  • 469 Views
  • 1 reply
  • 0 kudos

Merge takes too long

Hi all, I performed a merge process on approximately 19 million rows using two i3.4xlarge workers. However, the process took around 20 minutes to complete. How can I further optimize this process? I have already implemented the OPTIMIZE command and us...

Latest Reply
MuthuLakshmi
Databricks Employee
  • 0 kudos

@abduldjafar Use this general doc to optimize your workload based on your job analysis: https://www.databricks.com/discover/pages/optimize-data-workloads-guide
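Beyond the guide, two common MERGE speed-ups are worth a look, sketched below with hypothetical table names and date column: a pruning predicate in the match condition so only recent files are rewritten, and deletion vectors so updates avoid full file rewrites:

```python
# Enable deletion vectors on the target (Delta feature; needs a recent DBR).
spark.sql("""
    ALTER TABLE main.gold.target
    SET TBLPROPERTIES ('delta.enableDeletionVectors' = 'true')
""")

spark.sql("""
    MERGE INTO main.gold.target AS t
    USING updates AS s
      ON t.id = s.id
     AND t.event_date >= current_date() - INTERVAL 7 DAYS  -- prunes untouched files
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```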

bhanuteja_1
by New Contributor II
  • 242 Views
  • 1 reply
  • 0 kudos

NoClassDefFoundError: scala/Product Caused by: ClassNotFoundException: scala.Product

NoClassDefFoundError: scala/Product Caused by: ClassNotFoundException: scala.Product at the pre-import step itself. Please suggest something.

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Hi @bhanuteja_1, scala.Product is a core class in the Scala standard library used for tuples, case classes, etc. There seems to be a classloading problem or, more likely, a jar conflict. Are you deploying a job using custom jars or uber jars, and having de...

bhanuteja_1
by New Contributor II
  • 406 Views
  • 1 reply
  • 0 kudos

NoClassDefFoundError: org/apache/spark/sql/SparkSession$

NoClassDefFoundError: org/apache/spark/sql/SparkSession$    at com.microsoft.nebula.common.ConfigProvider.<init>(configProvider.scala:17)    at $linef37a348949c145718a08f6b29642317b35.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Hi @bhanuteja_1, where are you running this from? Based on the short output it looks like a Databricks notebook, but it would be a weird error unless you have some classpath overrides or jar conflicts leading to this error; it is simply ...

Nathant93
by New Contributor III
  • 650 Views
  • 1 reply
  • 0 kudos

(java.util.concurrent.ExecutionException) Boxed Error

Has anyone ever come across the error above? I am trying to get two tables from Unity Catalog and join them; the join is fairly complex, as it is imitating a WHERE NOT EXISTS ... TOP 1 SQL query.

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Hi @Nathant93, does it come with a "Caused by" in the error stack trace? If there isn't any in the Spark logs, perhaps you can provide reproducer code leading to this exception. The stack trace, DBR version, and repro code would help. The (java.util.conc...
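As an aside, the NOT EXISTS / TOP 1 pattern can often be expressed as a left-anti join plus a window, which tends to be easier to debug than a correlated subquery; a hedged sketch with hypothetical tables and columns:

```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window

orders = spark.read.table("main.sales.orders")
shipped = spark.read.table("main.sales.shipments")

# NOT EXISTS part: keep orders with no matching shipment.
unshipped = orders.join(shipped, on="order_id", how="left_anti")

# TOP 1 part: newest remaining row per customer.
w = Window.partitionBy("customer_id").orderBy(F.col("order_ts").desc())
result = (
    unshipped.withColumn("rn", F.row_number().over(w))
             .filter("rn = 1")
             .drop("rn")
)
```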

qwerty1
by Contributor
  • 5803 Views
  • 7 replies
  • 19 kudos

Resolved! When will databricks runtime be released for Scala 2.13?

I see that Spark fully supports Scala 2.13. I wonder why there is no Databricks runtime with Scala 2.13 yet. Any plans on making this available? It would be super useful.

Latest Reply
guersam
New Contributor II
  • 19 kudos

I agree with @777. As Scala 3 is getting mature and there are more real use cases with Scala 3 on Spark now, support for Scala 2.13 will be valuable to users, including us. I think the recent upgrade of the Databricks runtime from JDK 8 to 17 was one of a ...

6 More Replies
sathya08
by New Contributor III
  • 1497 Views
  • 3 replies
  • 0 kudos

Databricks Python function achieving Parallelism

Hello everyone, I have a very basic question about Databricks Spark parallelism. I have a Python function within a for loop, so I believe this is running sequentially. The Databricks cluster is enabled with Photon and Spark (DBR) 15.x; does that mean the driver...
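For reference, Photon and the runtime version won't parallelize a plain Python for loop on the driver; one hedged way to overlap independent iterations is a driver-side thread pool, where each Spark action still runs distributed on the cluster (the function and table names below are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor

tables = ["t1", "t2", "t3"]  # hypothetical work items

def process_table(name: str) -> int:
    # The count() action is still executed by the cluster;
    # threads only overlap the driver-side submissions.
    return spark.read.table(name).count()

with ThreadPoolExecutor(max_workers=4) as pool:
    counts = list(pool.map(process_table, tables))

print(dict(zip(tables, counts)))
```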

Latest Reply
sathya08
New Contributor III
  • 0 kudos

Any help here? Thanks.

2 More Replies
