Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

boitumelodikoko
by Contributor III
  • 8704 Views
  • 7 replies
  • 1 kudos

[RETRIES_EXCEEDED] Error When Displaying DataFrame in Databricks Using Serverless Compute

Hi Databricks Community, I am encountering an issue when trying to display a DataFrame in a Python notebook using serverless compute. The operation seems to fail after several retries, and I get the following error message: [RETRIES_EXCEEDED] The maxim...

Latest Reply
felipediassouza
New Contributor III
  • 1 kudos

I'm also getting the same error. I'm trying to create a CATALOG: %sql CREATE CATALOG IF NOT EXISTS `catalog_sql_databricks`; USE CATALOG `catalog_sql_databricks`; [RETRIES_EXCEEDED] The maximum number of retries has been exceeded.

6 More Replies
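A runnable reproduction of the reply's squashed %sql cell, rewritten with spark.sql; the catalog name is the one quoted in the reply:

```python
# The reply's %sql cell as Python; the catalog name comes from the reply.
# On serverless compute, these statements reportedly fail with
# [RETRIES_EXCEEDED], the same error the original poster hit on display().
spark.sql("CREATE CATALOG IF NOT EXISTS `catalog_sql_databricks`")
spark.sql("USE CATALOG `catalog_sql_databricks`")
```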
whatever
by New Contributor
  • 419 Views
  • 0 replies
  • 0 kudos

broken file API and inconsistent behavior

Since there is no way to file a bug, I'll post it here. Honestly, I haven't seen such a broken and inconsistent API from a production system yet in my life. What is worse, the same issue is in the 'os' module. And their UI (despite actually showing the f...

[3 screenshots attached]
prakashhinduja1
by New Contributor
  • 818 Views
  • 1 reply
  • 2 kudos

Resolved! Prakash Hinduja Geneva (Swiss) Can I use tools like Great Expectations with Databricks?

Hi everyone, I am Prakash Hinduja from Geneva, Switzerland (Swiss), currently exploring ways to improve data quality checks in my Databricks pipelines, and I came across Great Expectations. I’d love to know if anyone here has experience using it with Data...

Latest Reply
Nir_Hedvat
Databricks Employee
  • 2 kudos

Hi Prakash, Yes, Great Expectations integrates well with Databricks and is commonly used to enforce data quality checks in pipelines, for example validating schemas, nulls, ranges, or business rules. You can use it in a few ways: directly in Python n...

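A minimal sketch of the "directly in Python notebooks" option mentioned in the reply, using the legacy SparkDFDataset wrapper (present in Great Expectations releases before 1.0; newer releases use a different API). Table and column names are hypothetical:

```python
from great_expectations.dataset import SparkDFDataset

# Wrap an existing Spark DataFrame; table and column names are hypothetical.
df = spark.read.table("main.default.orders")
gdf = SparkDFDataset(df)

# Declare row-level expectations: no null keys, amounts in a sane range.
gdf.expect_column_values_to_not_be_null("order_id")
gdf.expect_column_values_to_be_between("amount", min_value=0, max_value=1_000_000)

# Evaluate all expectations and fail the pipeline step if any did not pass.
results = gdf.validate()
if not results.success:
    raise ValueError(f"Data quality checks failed: {results}")
```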
lauraxyz
by Contributor
  • 3735 Views
  • 5 replies
  • 1 kudos

Put file into volume within Databricks

Hi! From a Databricks job, I want to copy a workspace file into a volume. How can I do that? I tried `dbutils.fs.cp("/Workspace/path/to/the/file", "/Volumes/path/to/destination")` but got: Public DBFS root is disabled. Access is denied on path: /Workspac...

Latest Reply
fjrodriguez
New Contributor II
  • 1 kudos

I do have one question, and I think this post is the best fit. I want to overwrite wheel files in a Volume I have already created, in my CI/CD process. I have something like this: - ${{if parameters.filesPackages}}: - $...

4 More Replies
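The fix that typically resolves the error in the original post, hedged since both paths below are placeholders: workspace files live on the driver's local filesystem, so the source needs the file:/ scheme, and the destination must be a full three-level volume path:

```python
# Workspace files are local files, not DBFS paths, so prefix with file:/ ;
# the volume path must include catalog, schema, and volume name.
# Both paths here are placeholders.
dbutils.fs.cp(
    "file:/Workspace/path/to/the/file",
    "/Volumes/my_catalog/my_schema/my_volume/the_file",
)
```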
Aneruth
by New Contributor II
  • 415 Views
  • 1 reply
  • 0 kudos

[INTERNAL_ERROR] Cannot refresh quality dashboard

Hi all, I'm encountering an INTERNAL_ERROR issue when refreshing a Databricks Lakehouse Monitoring job. Here's the full error message: `ProfilingError: INTERNAL_ERROR. Please contact the Databricks team for further assistance and include the refresh id...

Latest Reply
Aneruth
New Contributor II
  • 0 kudos

Thank you! I'll modify my query based on your explanation. Currently, I'm manually parsing the custom metrics output data types, which works but isn't ideal. I'll implement proper data type formatting through asset bundles to ensure the UI receives c...

Rainer
by New Contributor
  • 412 Views
  • 0 replies
  • 0 kudos

pyspark.testing.assertSchemaEqual() ignoreColumnOrder parameter exists in 3.5.0 only on Databricks

Hi, I am using the pyspark.testing.assertSchemaEqual() function in my code with the ignoreColumnOrder parameter, which is available since pyspark 4.0.0. https://spark.apache.org/docs/4.0.0/api/python/reference/api/pyspark.testing.assertSchemaEqual.htm...

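A self-contained sketch of the usage the post describes. ignoreColumnOrder is documented for pyspark 4.0.0, so whether the call below runs on a 3.5.x runtime depends on the platform backporting it, which is what the post observes on Databricks:

```python
from pyspark.testing import assertSchemaEqual
from pyspark.sql.types import IntegerType, StringType, StructField, StructType

# Two schemas that differ only in column order.
actual = StructType([StructField("a", IntegerType()), StructField("b", StringType())])
expected = StructType([StructField("b", StringType()), StructField("a", IntegerType())])

# ignoreColumnOrder is documented from pyspark 4.0.0; on stock Spark 3.5.x
# this keyword raises a TypeError unless the runtime has backported it.
assertSchemaEqual(actual, expected, ignoreColumnOrder=True)
```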
san11
by New Contributor II
  • 655 Views
  • 2 replies
  • 0 kudos

Enabled IP access list for Azure Databricks workspace but it is not working

Hi, We enabled the IP access list for an Azure Databricks workspace using the REST API, and we can see the IPs in the allow and block lists, but it is not working: we can still log in to the Web UI from any IP address and run queries. Does this approach not...

Latest Reply
Khaja_Zaffer
Contributor
  • 0 kudos

Hello @san11, what is the error you are getting? I mean, is there any error on the web UI? Can you also share a screenshot of the IPs you allowed in the Azure portal?

1 More Replies
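One cause worth ruling out (an assumption, not confirmed in this thread): IP access lists are only enforced after the workspace-level flag is enabled through the workspace-conf API, so lists can exist and still block nothing. A sketch with placeholder host and token:

```python
import requests

HOST = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder workspace URL
TOKEN = "<personal-access-token>"  # placeholder

# Creating allow/block lists is not enough; enforcement only starts once
# enableIpAccessLists is set to "true" in the workspace configuration.
resp = requests.patch(
    f"{HOST}/api/2.0/workspace-conf",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"enableIpAccessLists": "true"},
)
resp.raise_for_status()

# Verify the flag took effect.
check = requests.get(
    f"{HOST}/api/2.0/workspace-conf",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"keys": "enableIpAccessLists"},
)
print(check.json())  # expect {"enableIpAccessLists": "true"}
```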
HariharaSam
by Contributor
  • 33809 Views
  • 10 replies
  • 4 kudos

Resolved! To get Number of rows inserted after performing an Insert operation into a table

Consider we have two tables A & B. qry = """INSERT INTO Table A SELECT * FROM Table B WHERE Id IS NULL""" spark.sql(qry) I need to get the number of records inserted after running this in Databricks.

Latest Reply
User16653924625
Databricks Employee
  • 4 kudos

In case someone is looking for a purely SQL-based solution (add LIMIT 1 to the query if you are looking for the last operation only): select t.timestamp, t.operation, t.operationMetrics.numOutputRows as numOutputRows from (DESCRIBE HISTORY <catalog>.<schema>....

9 More Replies
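Completing the truncated query from the reply under stated assumptions: the table name is a placeholder, and Delta typically records an INSERT INTO as operation WRITE, with the inserted row count in operationMetrics.numOutputRows:

```python
# Completing the reply's truncated query; the table name is a placeholder.
# Delta typically logs INSERT INTO as operation 'WRITE', and the inserted
# row count lands in operationMetrics.numOutputRows.
last_insert = spark.sql("""
    SELECT t.timestamp,
           t.operation,
           t.operationMetrics.numOutputRows AS numOutputRows
    FROM (DESCRIBE HISTORY main.default.table_a) t
    WHERE t.operation = 'WRITE'
    ORDER BY t.timestamp DESC
    LIMIT 1
""")
last_insert.show()
```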
ajgold
by New Contributor II
  • 390 Views
  • 6 replies
  • 2 kudos

DLT Expectations Alert for Warning

I want to receive an alert via email or Slack when the @Dlt.expect declaration fails the validation check in my DLT pipeline. I only see the option to add an email alert for @Dlt.expect_or_fail failures, but not for warnings.

Latest Reply
RiyazAliM
Honored Contributor
  • 2 kudos

Hey @ajgold, I don't think DLT has this feature yet. You may raise a feature request for Databricks to add it in a future release over here: https://databricks.aha.io/ Cheers!

5 More Replies
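Until such a feature ships, one possible workaround (not from the thread; the struct schema below follows the documented DLT event-log format, and the pipeline id is a placeholder) is to query the pipeline event log for expectation failures and drive your own email/Slack hook:

```python
# Hedged workaround: read expectation results from the DLT event log and
# alert on warnings yourself. event_log() takes a pipeline id (placeholder
# below); the struct schema follows the documented event-log format.
rows = spark.sql("""
    SELECT timestamp,
           explode(from_json(
               details:flow_progress.data_quality.expectations,
               'array<struct<name:string,dataset:string,
                             passed_records:bigint,failed_records:bigint>>')) AS e
    FROM event_log('<pipeline-id>')
    WHERE event_type = 'flow_progress'
""").where("e.failed_records > 0")

# Feed these rows into whatever notification hook you already use.
rows.show(truncate=False)
```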
ande
by New Contributor
  • 1746 Views
  • 2 replies
  • 0 kudos

IP address for accessing external SFTP server

I am trying to pull data into my Databricks workspace from an external SFTP server. I am using Azure for my compute. To access the SFTP server, they need to whitelist my IP address. My IP address in Azure Databricks seems to be constantly changing fro...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Azure Databricks, like many cloud services, does not provide static IP addresses for outbound connections. This is because the compute resources are dynamically allocated and can change over time. One potential workaround could be to use a Virtual N...

1 More Replies
fjrodriguez
by New Contributor II
  • 234 Views
  • 2 replies
  • 0 kudos

Job Preview in ADF

I have one Spark job that is triggered via ADF as a usual "Python" activity. Now I want to move to the "Job" activity, which is in Preview. Normally at the linked service level I have the Spark config and environment needed for the execution of this scri...

Latest Reply
radothede
Valued Contributor II
  • 0 kudos

Hi @fjrodriguez, my understanding is you've already created a cluster for your job. If that's the case, you can put that Spark configuration and those env variables directly on the cluster your job is using. If for some reason that's not possible, then you c...

1 More Replies
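To make that concrete, a sketch with every value a placeholder: when the job defines its own cluster, the Spark conf and env vars that previously sat on the ADF linked service move into the job's new_cluster spec, using Jobs API 2.1 field names:

```python
# Sketch of a Jobs API 2.1 cluster spec; all values are placeholders.
# spark_conf and spark_env_vars replace what the ADF linked service used
# to inject, so the ADF "Job" activity only needs to reference the job.
new_cluster = {
    "spark_version": "15.4.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 2,
    "spark_conf": {"spark.sql.shuffle.partitions": "200"},
    "spark_env_vars": {"ENVIRONMENT": "prod"},
}
```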
jdlogos
by New Contributor III
  • 3514 Views
  • 5 replies
  • 2 kudos

apply_changes_from_snapshot with expectations

Hi, Question: are expectations supposed to function in conjunction with create_streaming_table() and apply_changes_from_snapshot? Our team is investigating Delta Live Tables and we have a working prototype using Autoloader to ingest some files from a m...

Latest Reply
jbrmn
New Contributor II
  • 2 kudos

Also facing the same issue - did you find a solution? Thinking I will have to apply expectations at the next stage of the pipeline until this is worked out.

4 More Replies
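A sketch of the workaround described in the latest reply, applying expectations one stage downstream of the snapshot-applied table; all table and column names are hypothetical:

```python
import dlt

# Hypothetical names throughout. If the snapshot target itself won't honor
# expectations, validate in the next table of the pipeline instead.
@dlt.table(name="customers_validated")
@dlt.expect_or_drop("valid_key", "customer_id IS NOT NULL")
@dlt.expect("recent_update", "updated_at >= '2020-01-01'")
def customers_validated():
    return dlt.read("customers_snapshot_target")
```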
dnz
by New Contributor
  • 968 Views
  • 1 reply
  • 0 kudos

Performance Issue with OPTIMIZE Command for Historical Data Migration Using Liquid Clustering

Hello Databricks Community, I’m experiencing performance issues with the OPTIMIZE command when migrating historical data into a table with liquid clustering. Specifically, I am processing one year’s worth of data at a time. For example: The OPTIMIZE co...

Latest Reply
HimanshuSingh
New Contributor II
  • 0 kudos

Did you get any solution? If yes, please post it.

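Not an answer from this thread, but a pattern relevant to batched backfills into a liquid-clustered table (table name is a placeholder): plain OPTIMIZE is incremental on liquid clustering, while a full recluster is a separate, much heavier statement on recent runtimes:

```python
# Table name is a placeholder. On a liquid-clustered table, plain OPTIMIZE
# only clusters new/unclustered data, so running it once per ingested batch
# keeps each run small.
spark.sql("OPTIMIZE main.default.events_history")

# A full recluster of all existing data (supported on recent DBR versions)
# is a deliberate, expensive one-off rather than part of the batch loop.
spark.sql("OPTIMIZE main.default.events_history FULL")
```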
yuinagam
by New Contributor II
  • 185 Views
  • 2 replies
  • 0 kudos

how can I verify that the result of a dlt will have enough rows before updating the table?

I have a dlt/lakeflow pipeline that creates a table, and I need to make sure that it will only update the resulting materialized view if it will have more than one million records. I've found this, but it seems to only work if I have already updated t...

Latest Reply
yuinagam
New Contributor II
  • 0 kudos

Thank you for the quick reply. Is there a common/recommended/possible way to work around this limitation? I don't mind not using the expectation API if it doesn't support logic that's based on aggregations.

1 More Replies
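Since expectations evaluate per row and cannot express an aggregate such as "at least one million rows", one possible workaround (an assumption, not an official pattern; names are hypothetical) is to compute the count inside the table function and abort the update explicitly:

```python
import dlt

@dlt.table(name="large_mv")
def large_mv():
    # Hypothetical upstream table. Expectations evaluate per row, so the
    # aggregate guard is enforced manually: raising aborts this update and
    # leaves the previously materialized result in place.
    df = dlt.read("upstream_table")
    if df.count() <= 1_000_000:
        raise ValueError("Refusing to refresh: new result has <= 1M rows")
    return df
```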
