Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Dharinip
by Contributor
  • 2798 Views
  • 1 replies
  • 1 kudos

Resolved! Create a Delta Table with PK and FK constraints for a streaming source data

1. How to create a Delta Table with PK and FK constraints for a streaming source data? 2. When the streaming data in the silver layer gets updated, will the delta table also be updated? My use case is: We have a streaming data in the silver layer as SCD...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

1. You can use primary key and foreign key relationships on fields in Unity Catalog tables. Primary and foreign keys are informational only and are not enforced. Foreign keys must reference a primary key in another table. You can declare primary keys...

  • 1 kudos
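
The reply above says primary and foreign keys can be declared on Unity Catalog tables and are informational only. As a minimal sketch of what such declarations could look like, the helpers below only build the SQL text; the table, column, and constraint names are hypothetical, and the NOT NULL step assumes the key columns are still nullable.

```python
# Sketch only: builds Databricks SQL constraint statements as strings.
# All table, column, and constraint names used with it are hypothetical.

def primary_key_ddl(table: str, constraint: str, columns: list[str]) -> list[str]:
    """Return statements that mark the key columns NOT NULL, then declare the PK."""
    statements = [
        f"ALTER TABLE {table} ALTER COLUMN {col} SET NOT NULL" for col in columns
    ]
    column_list = ", ".join(columns)
    statements.append(
        f"ALTER TABLE {table} ADD CONSTRAINT {constraint} PRIMARY KEY ({column_list})"
    )
    return statements


def foreign_key_ddl(
    table: str,
    constraint: str,
    columns: list[str],
    parent_table: str,
    parent_columns: list[str],
) -> str:
    """Return a statement declaring a FK that references the parent table's key."""
    column_list = ", ".join(columns)
    parent_list = ", ".join(parent_columns)
    return (
        f"ALTER TABLE {table} ADD CONSTRAINT {constraint} "
        f"FOREIGN KEY ({column_list}) REFERENCES {parent_table} ({parent_list})"
    )


if __name__ == "__main__":
    for stmt in primary_key_ddl("main.silver.customers", "customers_pk", ["customer_id"]):
        print(stmt)
    print(
        foreign_key_ddl(
            "main.silver.orders", "orders_customers_fk", ["customer_id"],
            "main.silver.customers", ["customer_id"],
        )
    )
```

In a notebook, each generated statement would typically be passed to spark.sql(...). Because the constraints are not enforced, they document the relationship but do not reject duplicate or orphaned rows.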
ashutosh0710
by New Contributor II
  • 2872 Views
  • 4 replies
  • 1 kudos

java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.trees.Origin.<init>

While trying to run spark.sql("CALL iceberg_catalog.system.expire_snapshots(table => 'iceberg_catalog.d11_stitch.rewards_third_party_impact_base_query', older_than => TIMESTAMP '2024-03-06 00:00:00.000')"), I'm getting Py4JJavaError: An erro...

Latest Reply
VZLA
Databricks Employee
  • 1 kudos

@ashutosh0710 you can inspect the libraries available in a cluster at /databricks/jars to understand which jars and versions are available, or similarly simply inspect the Spark UI Environment Classpath list.

  • 1 kudos
3 More Replies
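
The suggestion to inspect /databricks/jars can be scripted. The sketch below lists the jar file names in that directory and filters them by a keyword such as "iceberg" or "catalyst"; the helper names are made up for illustration, and the directory exists only on a Databricks cluster, so elsewhere the listing is simply empty.

```python
from pathlib import Path

# Sketch only: helper names are hypothetical, not part of any Databricks API.

def list_jars(jar_dir: str = "/databricks/jars") -> list[str]:
    """Return the sorted names of .jar files in jar_dir, or [] if it is missing."""
    path = Path(jar_dir)
    if not path.is_dir():
        return []
    return sorted(p.name for p in path.glob("*.jar"))


def filter_jars(jar_names: list[str], keyword: str) -> list[str]:
    """Keep only jar names containing keyword (case-insensitive)."""
    needle = keyword.lower()
    return [name for name in jar_names if needle in name.lower()]


if __name__ == "__main__":
    # Print every jar whose name mentions catalyst, to compare versions by eye.
    for name in filter_jars(list_jars(), "catalyst"):
        print(name)
```

Comparing the Spark and Scala versions embedded in the matching file names against the versions a custom library was built for is one way to spot the kind of mismatch that usually sits behind a NoSuchMethodError.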
Reply_Domenico
by Databricks Partner
  • 1307 Views
  • 1 replies
  • 0 kudos

Issue with Updating Dashboards During Assessment Using UCX

Hello everyone, I am using UCX for the migration to Unity, and I've noticed that re-running the assessment does not update the dashboards with jobs that are incompatible with Unity. To get the dashboards updated, I had to uninstall and reinstall UCX, ...

Latest Reply
ckunal_meta
Databricks Partner
  • 0 kudos

Hi, I am having an issue similar to Domenico's. I have installed UCX in my workspace and ran the UCX_assesment job 3 days ago. It created crawl_table.log, which showed that UCX had scanned through all of my legacy hive_metastore schemas and listed the ta...

  • 0 kudos
DBUser2
by New Contributor III
  • 1358 Views
  • 2 replies
  • 0 kudos

COPY INTO size limit

Hi, I'm using the COPY INTO command to ingest data into a Delta table in my Azure Databricks instance. Sometimes I get a timeout error running this command. Is there a limit on the size of the data that can be ingested using "COPY INTO" or a limit on the ...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hello @DBUser2, The COPY INTO command does not have a specific documented limit on the size of the data or the number of files that can be ingested at a time. Timeout errors can occur due to network issues, resource limitations, or long-running opera...

  • 0 kudos
1 More Replies
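
If long-running operations are what trigger the timeouts, one possible mitigation (an assumption, not something the thread confirms) is to ingest in smaller batches using the FILES clause of COPY INTO. The sketch below only generates the statements; the target table, source path, file names, and batch size are all hypothetical.

```python
# Sketch only: splits a file list into batches and builds one COPY INTO
# statement per batch. Names and paths passed in are hypothetical.

def chunk(items: list[str], size: int) -> list[list[str]]:
    """Split items into consecutive batches of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def copy_into_statements(
    target: str, source: str, file_format: str, files: list[str], batch_size: int
) -> list[str]:
    """Return one COPY INTO statement per batch of file names."""
    statements = []
    for batch in chunk(files, batch_size):
        file_list = ", ".join(f"'{name}'" for name in batch)
        statements.append(
            f"COPY INTO {target} FROM '{source}' "
            f"FILEFORMAT = {file_format} FILES = ({file_list})"
        )
    return statements


if __name__ == "__main__":
    files = ["part-0001.parquet", "part-0002.parquet", "part-0003.parquet"]
    for stmt in copy_into_statements(
        "main.bronze.events", "/Volumes/main/landing/events", "PARQUET", files, 2
    ):
        print(stmt)
```

Each statement is shorter-lived than one large load, so a failure or timeout affects only the batch that was running.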
Dp15
by Contributor
  • 3002 Views
  • 4 replies
  • 0 kudos

Executing Python code inside a SQL Function

Hi, I am trying to create a SQL UDF that runs some Python code involving PySpark, but I am not able to create a Spark session inside the Python section of the function. Here is how my code looks: CREATE OR REPLACE FUNCTION test.getValuesFro...

Latest Reply
Dp15
Contributor
  • 0 kudos

Actually this would work if I were using it in a native notebook environment; however, I am trying to create a UDF because I want these queries to be executed from an external JDBC connection, and I don't wish to wait for the cluster to spin up for a noteboo...

  • 0 kudos
3 More Replies
skumarrm
by Databricks Partner
  • 1637 Views
  • 2 replies
  • 0 kudos

DLT PipelineID/PipelineName values from the TASK1 should get passed to TASK2 notebook (Non-DLT)

DLT PipelineID/PipelineName values from the TASK1 should get passed to TASK2 notebook (Non-DLT). TASK1 (DLT) ---> TASK2 (Non-DLT). How to pass the parameters to TASK2 from TASK1? I need to get the DLT task notebook pipelineID, pipelineName and pass to TASK2...

Data Engineering
dlt
DLT parameter
Latest Reply
MuthuLakshmi
Databricks Employee
  • 0 kudos

@skumarrm Please try the below. Set Up Task Parameters: In the job configuration, you can set up task parameters to pass values from one task to another. For TASK1 (DLT), ensure it outputs the PipelineID or PipelineName. Use Task Parameters in TASK2:...

  • 0 kudos
1 More Replies
PKD28
by New Contributor II
  • 919 Views
  • 1 replies
  • 0 kudos

Databricks Cluster issue

Jobs within the all-purpose DB cluster are failing with "the spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically reattached". In the event log it says "Event_type=DRIVER_NOT_RESPONDING & MESSAGE= "Driver is up b...

Latest Reply
MuthuLakshmi
Databricks Employee
  • 0 kudos

@PKD28 The error indicates that the driver memory is not enough to handle the load. Please refer to this doc for more info on how to fix this: https://kb.databricks.com/en_US/jobs/driver-unavailable

  • 0 kudos
abduldjafar
by New Contributor
  • 1386 Views
  • 1 replies
  • 0 kudos

Merge takes too long

Hi all, I performed a merge process on approximately 19 million rows using two i3.4xlarge workers. However, the process took around 20 minutes to complete. How can I further optimize this process? I have already implemented the OPTIMIZE command and us...

Latest Reply
MuthuLakshmi
Databricks Employee
  • 0 kudos

@abduldjafar Use this general doc to optimize your workload based on your job analysis https://www.databricks.com/discover/pages/optimize-data-workloads-guide

  • 0 kudos
bhanuteja_1
by New Contributor II
  • 691 Views
  • 1 replies
  • 0 kudos

NoClassDefFoundError: scala/Product Caused by: ClassNotFoundException: scala.Product

NoClassDefFoundError: scala/Product Caused by: ClassNotFoundException: scala.Product at the preimport step itself. Please suggest something.

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Hi @bhanuteja_1 The scala.Product is a core class in the Scala std library used for tuples, case classes, etc. There seems to be a classloading problem or more likely a jar conflict. Are you deploying a job using custom jars, uber jars, and having de...

  • 0 kudos
bhanuteja_1
by New Contributor II
  • 1121 Views
  • 1 replies
  • 0 kudos

NoClassDefFoundError: org/apache/spark/sql/SparkSession$

NoClassDefFoundError: org/apache/spark/sql/SparkSession$    at com.microsoft.nebula.common.ConfigProvider.<init>(configProvider.scala:17)    at $linef37a348949c145718a08f6b29642317b35.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Hi @bhanuteja_1 , Where are you running this from? Based on the short output, it looks like from a Databricks Notebook, but it would be a weird error unless you're having some classpath overrides or jar conflicts, leading to this error; it is simply ...

  • 0 kudos
qwerty1
by Contributor
  • 8632 Views
  • 7 replies
  • 19 kudos

Resolved! When will databricks runtime be released for Scala 2.13?

I see that Spark fully supports Scala 2.13. I wonder why there is no Databricks runtime with Scala 2.13 yet. Any plans on making this available? It would be super useful.

Latest Reply
guersam
New Contributor II
  • 19 kudos

I agree with @777. As Scala 3 is getting mature and there are more real use cases with Scala 3 on Spark now, support for Scala 2.13 will be valuable to users including us. I think the recent upgrade of Databricks runtime from JDK 8 to 17 was one of a ...

  • 19 kudos
6 More Replies
sathya08
by New Contributor III
  • 10155 Views
  • 3 replies
  • 1 kudos

Databricks Python function achieving Parallelism

Hello everyone, I have a very basic question wrt Databricks Spark parallelism. I have a Python function within a for loop, so I believe this is running sequentially. The Databricks cluster is enabled with Photon and with Spark 15x, does that mean the driver...

Latest Reply
sathya08
New Contributor III
  • 1 kudos

Any help here? Thanks.

  • 1 kudos
2 More Replies
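
A plain Python for loop on the driver runs its iterations one after another, and Photon speeds up the Spark queries themselves, not the Python loop around them. One common pattern, offered here as a sketch and not taken from the thread's replies, is to submit the calls through a thread pool so that several can be in flight at once. The process function is a hypothetical stand-in for the real per-item work.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")

# Sketch only: run_in_parallel and process are hypothetical names.

def run_in_parallel(
    func: Callable[[T], R], items: Iterable[T], max_workers: int = 4
) -> list[R]:
    """Call func on every item using a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, items))


def process(table_name: str) -> str:
    """Stand-in for the per-item work that the original for loop performs."""
    return f"processed {table_name}"


if __name__ == "__main__":
    print(run_in_parallel(process, ["table_a", "table_b", "table_c"]))
```

Threads suit this case when each call mostly waits on Spark or on I/O. If the function is pure Python computation, the work is usually better expressed as DataFrame operations so that the executors share it.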
TeachingWithDat
by New Contributor II
  • 8270 Views
  • 3 replies
  • 2 kudos

I am getting this error: com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: com.databricks.rpc.UnknownRemoteException: Remote exception occurred:

I am teaching a class for BYU Idaho and every table in every database has been imploded for my class. We keep getting this error: com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: com.databricks.rpc.UnknownRemoteException: ...

Latest Reply
aparna123
New Contributor II
  • 2 kudos

I am facing the same issue when I try to execute code. Error message: com.databricks.rpc.UnknownRemoteException: Remote exception occurred:

  • 2 kudos
2 More Replies
User16685683696
by Databricks Employee
  • 3113 Views
  • 1 replies
  • 2 kudos

Free Training: Databricks Lakehouse Fundamentals The demand for technology roles is only growing – it's projected that over 150 million jobs will ...

Free Training: Databricks Lakehouse Fundamentals. The demand for technology roles is only growing – it's projected that over 150 million jobs will be added in the next five years. Across industries and regions, this is translating to increased demand f...

Latest Reply
Eddie_AZ
New Contributor II
  • 2 kudos

I watched all 4 videos but I am getting an error when I try to take the test. How do I complete the test and get my badge?

  • 2 kudos
Gaurav_Lokhande
by Databricks Partner
  • 4629 Views
  • 7 replies
  • 3 kudos

We are trying to connect to AWS RDS MySQL instance from DBX with PySpark using JDBC

We are trying to connect to AWS RDS MySQL instance from DBX with PySpark using JDBC: jdbc_df = (spark.read.format("jdbc").options(url=f"jdbc:mysql://{creds['host']}:{creds['port']}/{creds['database']}", driver="com.mysql.cj.jdbc.Driver", dbtable="(SE...

Latest Reply
arjun_kr
Databricks Employee
  • 3 kudos

@Gaurav_Lokhande With Spark JDBC usage, connectivity happens between your Databricks VPC (in your AWS account) and the RDS VPC, assuming you are using non-serverless clusters. You may need to ensure this connectivity works (for example, through VPC peering).

  • 3 kudos
6 More Replies
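
Before debugging the JDBC options themselves, it can help to confirm that the cluster can reach the database endpoint at all. The sketch below attempts a plain TCP connection; can_connect is a hypothetical helper, and a successful connection shows only that the network path is open, not that the credentials or the driver are correct.

```python
import socket

# Sketch only: can_connect is a hypothetical helper, not a Databricks API.

def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Run from a notebook on the same cluster, a call such as can_connect(creds['host'], int(creds['port'])) returning False points at routing, peering, or security-group rules instead of the Spark JDBC configuration.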