Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

gauravchaturved
by New Contributor II
  • 590 Views
  • 1 reply
  • 1 kudos

Resolved! Can I delete a specific partition from a Delta Live Table?

If I have created a Delta Live Table partitioned on a column (let's say a date column) from a stream source, can I delete the partitions for specific date values later to save on cost & keep the table lean? If I can, then: 1- how to do it? 2- do I...

Latest Reply
raphaelblg
Honored Contributor
  • 1 kudos

Hello @gauravchaturved, You can remove the partition by filtering it out in your source code and triggering a full refresh of your pipeline. There is no need to run VACUUM, as DLT has maintenance clusters that perform OPTIMIZE and VACUUM operations on y...
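The approach described in this reply can be sketched roughly as follows. This is a hedged illustration, not the poster's actual pipeline: it only runs inside a DLT pipeline, and the source table name, column, and cutoff date are placeholders.

```python
import dlt
from pyspark.sql import functions as F

# Illustrative cutoff: rows (and hence date partitions) older than this
# are filtered out of the source query. A full refresh then rebuilds
# the table without them; DLT's maintenance clusters handle OPTIMIZE
# and VACUUM afterwards.
CUTOFF_DATE = "2024-01-01"

@dlt.table(partition_cols=["event_date"])
def events():
    return (
        dlt.read_stream("events_raw")  # hypothetical streaming source
           .where(F.col("event_date") >= F.lit(CUTOFF_DATE))
    )
```

After deploying the filtered source, trigger a full refresh so the table is rebuilt without the excluded partitions.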

StephanKnox
by New Contributor II
  • 1150 Views
  • 3 replies
  • 2 kudos

Unit Testing with PyTest in Databricks - ModuleNotFoundError

Dear all, I am following the guide in this article: https://docs.databricks.com/en/notebooks/testing.html However, I am unable to run pytest due to the following error: ImportError while importing test module '/Workspace/Users/deadmanhide@gmail.com/test...

Latest Reply
StephanKnox
New Contributor II
  • 2 kudos

PS: I have restarted the cluster and ran my run_tests notebook again, and now I am getting a different error:
E File "/Workspace/Repos/SBIT/SBIT/test_trans.py", line 36
E   from transform_functions import *
E   ^
E SyntaxError: import * only allowed at mod...
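The SyntaxError in this reply is Python itself rejecting the star-import: `from module import *` is only permitted at module top level, never inside a function body. A minimal, self-contained demonstration (module names here are illustrative; the fix is an explicit module-level import):

```python
# A star-import inside a function body is rejected at compile time.
bad_src = "def f():\n    from math import *\n"
# An explicit import at module level compiles fine.
good_src = "from math import sqrt\n\ndef f():\n    return sqrt(4.0)\n"

def compiles_ok(src: str) -> bool:
    """Return True if the source compiles without a SyntaxError."""
    try:
        compile(src, "<example>", "exec")
        return True
    except SyntaxError:
        return False

print(compiles_ok(bad_src))   # False: "import * only allowed at module level"
print(compiles_ok(good_src))  # True
```

Replacing `from transform_functions import *` with explicit names (e.g. `from transform_functions import some_function`) at the top of the test module avoids the error.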

2 More Replies
Paul92S
by New Contributor III
  • 3226 Views
  • 3 replies
  • 3 kudos

Resolved! DELTA_EXCEED_CHAR_VARCHAR_LIMIT

Hi, I am having an issue loading source data into a Delta table / Unity Catalog. The error we are receiving is the following: grpc_message:"[DELTA_EXCEED_CHAR_VARCHAR_LIMIT] Exceeds char/varchar type length limitation. Failed check: (isnull(\'metric_...

Latest Reply
willflwrs
New Contributor II
  • 3 kudos

Setting this config before issuing the write command solved it for us: spark.conf.set("spark.sql.legacy.charVarcharAsString", True)

2 More Replies
NarenderKumar
by New Contributor III
  • 1426 Views
  • 3 replies
  • 2 kudos

Unable to connect to Databricks Serverless SQL using DBeaver

I am trying to connect to a Databricks serverless SQL pool using DBeaver as mentioned in the documentation below: https://learn.microsoft.com/en-us/azure/databricks/dev-tools/dbeaver I am trying to use browser-based authentication, i.e. (OAuth user-to-...

Latest Reply
binsel
New Contributor III
  • 2 kudos

I'm having the same problem. Any update?

2 More Replies
youcanlearn
by New Contributor III
  • 1018 Views
  • 3 replies
  • 2 kudos

Resolved! Databricks Expectations

In the example in https://docs.databricks.com/en/delta-live-tables/expectations.html#fail-on-invalid-records, it is written that one is able to query the DLT event log for such expectation violations. In Databricks, I can use expectations to fail or drop r...

Latest Reply
brockb
Valued Contributor
  • 2 kudos

That's right, the "reason" would be "x1 is negative" in your example and "valid_max_length" in the example JSON payload that I shared. If you are looking for a descriptive reason, you would name the expectation accordingly, such as: @dlt.expect_or_fail...
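As a hedged sketch of this naming suggestion (DLT-only code; the expectation name, constraint expression, and table are illustrative), the expectation's name is what surfaces as the violation "reason" in the event log:

```python
import dlt

# The first argument is the expectation name that appears in the DLT
# event log as the violation "reason", so make it descriptive.
@dlt.table
@dlt.expect_or_fail("x1_is_non_negative", "x1 >= 0")
def validated_events():
    return dlt.read("raw_events")  # placeholder source
```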

2 More Replies
guizsantos
by New Contributor II
  • 608 Views
  • 2 replies
  • 3 kudos

Resolved! How to obtain a query profile programmatically?

Hi everyone! Does anyone know if there is a way to obtain the data used to create the graph shown in the "Query profile" section? In particular, I am interested in the rows produced by the intermediate query operations. I can see there is a "Download" ...

Latest Reply
guizsantos
New Contributor II
  • 3 kudos

Hey @raphaelblg, thanks for your input! I understand that some info may be obtained via the `EXPLAIN` command; however, the output is not very clear on its meaning and definitely does not provide what is most interesting to us, which is the rows proces...

1 More Replies
Sambit_S
by New Contributor III
  • 1242 Views
  • 9 replies
  • 0 kudos

Databricks Autoloader File Notification Not Working As Expected

Hello everyone, In my project I am using Databricks Auto Loader to incrementally and efficiently process new data files as they arrive in cloud storage. I am using file notification mode with Event Grid and a queue service set up in an Azure storage account...

Latest Reply
matthew_m
New Contributor III
  • 0 kudos

Hi @Sambit_S, I misread inputRows as inputFiles, which aren't the same thing. Given the limitation on Azure queues, if you are already at the limit then you may need to consider switching to an event source such as Kafka or Event Hubs to get b...
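For reference, a hedged sketch of the Auto Loader file-notification setup discussed in this thread (placeholders throughout; it assumes the Event Grid/queue wiring already exists and runs only on Databricks):

```python
# Auto Loader in file-notification mode on Azure; the format and
# storage path are placeholders.
df = (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.useNotifications", "true")
        .load("abfss://<container>@<account>.dfs.core.windows.net/<path>"))
```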

8 More Replies
asingamaneni
by New Contributor II
  • 569 Views
  • 1 reply
  • 0 kudos

Databricks Summit 2023

Databricks Summit 2023 has been fantastic and I got a chance to meet many authors and industry leaders whom I admire in the data engineering community! #DataAISummit

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @asingamaneni, We're thrilled to hear that you had a great experience at DAIS 2023! Your feedback is valuable to us, and we appreciate you taking the time to share it on the community platform. We wanted to let you know that the Databricks Communi...

Tidaldata
by New Contributor
  • 512 Views
  • 1 reply
  • 0 kudos

Loving Databricks Summit

Loving the summit so far: awesome keynote speakers, great trainers, and paid courses. Finished a certification. #databrickslearning

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Tidaldata, We're thrilled to hear that you had a great experience at DAIS 2023! Your feedback is valuable to us, and we appreciate you taking the time to share it on the community platform. We wanted to let you know that the Databricks Community ...

ws4100e
by New Contributor III
  • 2873 Views
  • 8 replies
  • 0 kudos

DLT pipelines with UC

I am trying to run a (very simple) DLT pipeline in which the resulting materialized table is published to a UC schema with a managed storage location defined (within an existing EXTERNAL LOCATION). According to the documentation: Publishing to schemas that speci...

Latest Reply
DataGeek_JT
New Contributor II
  • 0 kudos

Did this get resolved?  I am getting the same issue.

7 More Replies
Phani1
by Valued Contributor
  • 299 Views
  • 1 reply
  • 0 kudos

Databricks Platform Cleanup and baseline activities.

Hi Team, Kindly share the best practices for managing Databricks Platform Cleanup and baseline activities.

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Phani1, Here are some best practices for managing Databricks platform cleanup and baseline activities:
  • Platform Administration: Regularly monitor and manage your Databricks platform to ensure optimal performance.
  • Compute Creation: Choose the ri...

dataslicer
by Contributor
  • 639 Views
  • 2 replies
  • 0 kudos

How to export/clone Databricks Notebook without results via web UI?

When a Databricks Notebook exceeds size limit, it suggests to `clone/export without results`.  This is exactly what I want to do, but the current web UI does not provide the ability to bypass/skip the results in either the `clone` or `export` context...

Latest Reply
dataslicer
Contributor
  • 0 kudos

Thank you @Yeshwanth for the response. I am looking for a way to do this without clearing the current outputs. This is necessary because I want to preserve the existing outputs and fork off another notebook instance to run with a few parameter changes and come...

1 More Replies
Ramana
by Contributor
  • 1267 Views
  • 3 replies
  • 0 kudos

SHOW GROUPS is not giving groups available at the account level

I am trying to capture all the Databricks groups and their mapping to user/AD group(s). I tried to do this by using SHOW GROUPS, SHOW USERS, and SHOW GRANTS, following the examples mentioned in the article below, but the SHOW GROUPS command only fetc...

Latest Reply
Ramana
Contributor
  • 0 kudos

Yes, I can use the REST API, but I am looking for a SQL or programmatic way to do this rather than making the API calls, building the complex-datatype DataFrame, and then saving it as a table. Thanks, Ramana
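Short of a pure-SQL route, the account-level SCIM REST API is the usual way to list groups defined at the account (rather than workspace) level. A hedged sketch using only the standard library; the host, account ID, and token are placeholders, and the endpoint shape assumed here is the Databricks account SCIM 2.0 Groups API:

```python
import json
from urllib.request import Request, urlopen

ACCOUNT_HOST = "https://accounts.cloud.databricks.com"  # placeholder host
ACCOUNT_ID = "<account-id>"                             # placeholder
TOKEN = "<oauth-token>"                                 # placeholder

def groups_url(host: str, account_id: str) -> str:
    # Account-level SCIM 2.0 endpoint for listing groups.
    return f"{host}/api/2.0/accounts/{account_id}/scim/v2/Groups"

def list_account_groups() -> list:
    """Fetch account-level groups; needs real credentials to run."""
    req = Request(groups_url(ACCOUNT_HOST, ACCOUNT_ID),
                  headers={"Authorization": f"Bearer {TOKEN}"})
    with urlopen(req) as resp:  # network call
        return json.load(resp).get("Resources", [])
```

If a table is still the end goal, the returned JSON can be loaded into a DataFrame with spark.createDataFrame and saved from there.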

2 More Replies
kseyser
by New Contributor II
  • 686 Views
  • 2 replies
  • 1 kudos

Predicting compute required to run Spark jobs

I'm working on a project to predict the compute (cores) required to run Spark jobs. Has anyone worked on this or something similar before? How did you get started?

Latest Reply
Yeshwanth
Honored Contributor
  • 1 kudos

@kseyser good day, This documentation might help you in your use-case: https://docs.databricks.com/en/compute/cluster-config-best-practices.html#compute-sizing-considerations Kind regards, Yesh

1 More Replies
Lea
by New Contributor II
  • 5909 Views
  • 1 reply
  • 2 kudos

Resolved! Advice for generic file processing for ingestion of multiple data formats

Hello, We are using Delta Live Tables to ingest data from multiple business groups, each with different input file formats and parsing requirements. The input files are ingested from Azure Blob Storage. Right now, we are only servicing three busines...

Latest Reply
raphaelblg
Honored Contributor
  • 2 kudos

Hello @Lea , I'd like to inform you that our platform does not currently provide a built-in feature for ingesting multiple or interchangeable file formats. However, we highly value your input and encourage you to share your ideas through Databricks' ...
