Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

alau131
by New Contributor
  • 831 Views
  • 2 replies
  • 2 kudos

How to dynamically have the parent notebook call on a child notebook?

Hi! I would like help, please, on how to dynamically call one notebook from another in Databricks and have the parent notebook get the dataframe results from the child notebook. Some background info: I have a main Python notebook and multiple SQ...

  • 831 Views
  • 2 replies
  • 2 kudos
Latest Reply
jameshughes
Contributor II
  • 2 kudos

What you are looking to do is really not the intent of notebooks and you cannot pass complex data types between notebooks. You would need to persist your data frame from the child notebook so your parent notebook could retrieve the results after the ...

  • 2 kudos
1 More Replies
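A minimal sketch of the persist-and-retrieve pattern described in the reply above, assuming a hypothetical child notebook at /Shared/child and a temporary Delta path; all names are illustrative, not the poster's actual setup.

# --- child notebook (e.g. /Shared/child), illustrative ---
# result_df = spark.sql("SELECT ...")                        # build the DataFrame
# result_df.write.format("delta").mode("overwrite").save("/tmp/child_result")
# dbutils.notebook.exit("/tmp/child_result")                 # hand the path back to the parent

# --- parent notebook ---
result_path = dbutils.notebook.run("/Shared/child", 600)     # run child with a 600 s timeout
child_df = spark.read.format("delta").load(result_path)      # read back the persisted result
display(child_df)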
Abel_Martinez
by Contributor
  • 20307 Views
  • 10 replies
  • 10 kudos

Resolved! Why I'm getting connection timeout when connecting to MongoDB using MongoDB Connector for Spark 10.x from Databricks

I'm able to connect to MongoDB using org.mongodb.spark:mongo-spark-connector_2.12:3.0.2 and this code: df = spark.read.format("com.mongodb.spark.sql.DefaultSource").option("uri", jdbcUrl). It works well, but if I install the latest MongoDB Spark Connector ve...

  • 20307 Views
  • 10 replies
  • 10 kudos
Latest Reply
ravisharma1024
New Contributor II
  • 10 kudos

I was facing the same issue; now it is resolved, thanks to @Abel_Martinez. I am using code like the below: df = spark.read.format("mongodb") \ .option('spark.mongodb.read.connection.uri', "mongodb+srv://*****:*****@******/?retryWrites=true&w=majori...

  • 10 kudos
9 More Replies
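For reference, a minimal sketch of the 10.x connector read shown (truncated) in the reply above; the connection string, database, and collection are placeholders, and the workers still need network access to the MongoDB hosts for the read to succeed.

df = (spark.read.format("mongodb")
      .option("spark.mongodb.read.connection.uri",
              "mongodb+srv://<user>:<password>@<cluster-host>/?retryWrites=true&w=majority")
      .option("spark.mongodb.read.database", "<database>")      # placeholder database
      .option("spark.mongodb.read.collection", "<collection>")  # placeholder collection
      .load())
df.show(5)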
vanverne
by New Contributor II
  • 2168 Views
  • 3 replies
  • 1 kudos

Assistance with Capturing Auto-Generated IDs in Databricks SQL

Hello, I am currently working on a project where I need to insert multiple rows into a table and capture the auto-generated IDs for each row. I am using the Databricks SQL connector. Here is a simplified version of my current workflow: I create a temporary...

  • 2168 Views
  • 3 replies
  • 1 kudos
Latest Reply
vanverne
New Contributor II
  • 1 kudos

Thanks for the reply, Alfonso. I noticed you mentioned "Below are a few alternatives...", however, I am not seeing those. Please let me know if I am missing something. Also, do you know if Databricks is working on supporting the RETURNING clause soon...

  • 1 kudos
2 More Replies
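Since Databricks SQL has no RETURNING clause today, one workaround is to tag each inserted batch and select the identity values back afterwards. Below is a minimal sketch using the databricks-sql-connector; the connection details, the table, and its columns (an id GENERATED ALWAYS AS IDENTITY column plus a batch_id tag column) are assumptions for illustration, and named :param query parameters assume a recent connector version.

import uuid
from databricks import sql  # pip install databricks-sql-connector

batch_id = str(uuid.uuid4())  # tag identifying this insert batch

with sql.connect(server_hostname="<workspace-hostname>",
                 http_path="<warehouse-http-path>",
                 access_token="<personal-access-token>") as conn:
    with conn.cursor() as cur:
        # hypothetical table: id BIGINT GENERATED ALWAYS AS IDENTITY, name STRING, batch_id STRING
        cur.execute(
            "INSERT INTO my_catalog.my_schema.my_table (name, batch_id) "
            "VALUES (:n1, :b1), (:n2, :b2)",
            {"n1": "alice", "n2": "bob", "b1": batch_id, "b2": batch_id},
        )
        # read the auto-generated IDs back for just this batch
        cur.execute(
            "SELECT id, name FROM my_catalog.my_schema.my_table WHERE batch_id = :b",
            {"b": batch_id},
        )
        generated_ids = cur.fetchall()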
Yannic
by New Contributor
  • 710 Views
  • 1 replies
  • 0 kudos

Delete a directory in DBFS recursively from Azure

I have Azure storage mounted to DBFS. I want to delete a directory inside it recursively. I tried both dbutils.fs.rm(f"/mnt/data/to/delete", True) and %fs rm -r /mnt/data/to/delete. In both cases I get the following exception: AzureException: hadoop_azur...

  • 710 Views
  • 1 replies
  • 0 kudos
Latest Reply
lingareddy_Alva
Honored Contributor III
  • 0 kudos

Hi @Yannic, Azure Blob Storage doesn't have true directories; it simulates them through blob naming conventions, which can cause issues with recursive deletion operations. Try the approach below: delete the files first, then the directory. def delete_directory_recursive(...

  • 0 kudos
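A sketch of the files-first idea from the (truncated) reply above, using only dbutils.fs; the path is a placeholder and the function assumes the mount itself is healthy.

def delete_directory_recursive(path):
    # Azure Blob Storage simulates directories through blob names, so delete
    # leaf files first, recurse into subdirectories, then remove the
    # directory entry itself.
    for entry in dbutils.fs.ls(path):
        if entry.isDir():
            delete_directory_recursive(entry.path)
        else:
            dbutils.fs.rm(entry.path)
    dbutils.fs.rm(path, True)  # remove the now-empty directory

delete_directory_recursive("/mnt/data/to/delete")  # placeholder path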
Sainath368
by Contributor
  • 648 Views
  • 1 replies
  • 0 kudos

Data Skipping- Partitioned tables

Hi all, I have a question: how can we modify delta.dataSkippingStatsColumns and compute statistics for a partitioned Delta table in Databricks? I want to understand the process and best practices for changing this setting and ensuring accurate statist...

  • 648 Views
  • 1 replies
  • 0 kudos
Latest Reply
paolajara
Databricks Employee
  • 0 kudos

Hi, delta.dataSkippingStatsColumns specifies a comma-separated list of column names for which Delta Lake collects statistics. It improves data-skipping performance on those columns, since it supersedes the default behavior of analyzing the first...

  • 0 kudos
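A minimal sketch of changing the property and recomputing statistics for an existing partitioned Delta table; the table and column names are hypothetical, and the ANALYZE ... COMPUTE DELTA STATISTICS step assumes a runtime recent enough to support it.

# Collect file-level stats only on the columns actually used in filters
spark.sql("""
    ALTER TABLE my_schema.events
    SET TBLPROPERTIES ('delta.dataSkippingStatsColumns' = 'event_date,customer_id')
""")

# Recompute statistics for files written before the property change,
# so data skipping also works on existing data, not only future writes
spark.sql("ANALYZE TABLE my_schema.events COMPUTE DELTA STATISTICS")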
GeKo
by Contributor
  • 3015 Views
  • 8 replies
  • 4 kudos

Resolved! how to specify the runtime version for serverless job

Hello, if I understood correctly, using a serverless cluster always comes with the latest runtime version by default. Now I need to stick to, e.g., runtime version 15.4 for a certain job, which gets deployed via asset bundles. How do I specify/config...

Data Engineering
assetbundle
serverless
  • 3015 Views
  • 8 replies
  • 4 kudos
Latest Reply
GeKo
Contributor
  • 4 kudos

  • 4 kudos
7 More Replies
Avinash_Narala
by Valued Contributor II
  • 3454 Views
  • 9 replies
  • 1 kudos

Redshift Stored Procedure Migration to Databricks

Hi, I want to migrate Redshift SQL stored procedures to Databricks. As Databricks doesn't support the concept of SQL stored procedures, how can I do so?

  • 3454 Views
  • 9 replies
  • 1 kudos
Latest Reply
nayan_wylde
Esteemed Contributor
  • 1 kudos

The Databricks docs show that procedures are in public preview and require Runtime 17.0 and above: https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-ddl-create-procedure

  • 1 kudos
8 More Replies
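On runtimes older than the preview mentioned above, a common migration pattern is to rewrite each Redshift procedure body as a parameterized Python function that runs the equivalent Spark SQL; a rough, illustrative sketch with hypothetical tables and parameters follows.

def upsert_daily_sales(run_date: str) -> None:
    # Rough equivalent of a Redshift stored procedure body (illustrative only)
    spark.sql(f"""
        MERGE INTO analytics.daily_sales AS t
        USING (SELECT * FROM staging.sales WHERE sale_date = DATE'{run_date}') AS s
        ON t.order_id = s.order_id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)

upsert_daily_sales("2025-06-01")

Such functions can then be orchestrated as job tasks in the same order the procedures were called in Redshift.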
JothyGanesan
by New Contributor III
  • 2043 Views
  • 4 replies
  • 1 kudos

Resolved! Streaming data - Merge in Target - DLT

We have streaming inputs coming from streaming tables and also the table from apply_changes. In our target there is only one table, which needs to be merged with all the sources. Each source provides different columns in our target table. Challenge: Ev...

  • 2043 Views
  • 4 replies
  • 1 kudos
Latest Reply
vd1
New Contributor II
  • 1 kudos

Can this cause concurrent write issues? Updating the same table from multiple streams?

  • 1 kudos
3 More Replies
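One pattern for feeding a single DLT target from several streaming sources is to declare the target once and attach multiple flows to it, which avoids concurrent writers because the pipeline manages the table; a minimal, illustrative sketch with hypothetical source and target names. If true upserts rather than appends are needed, each source would instead drive an apply_changes-style flow into the same target, subject to the runtime's support for multiple flows per target.

import dlt

# Declare the target once; several flows append into it
dlt.create_streaming_table("unified_target")

@dlt.append_flow(target="unified_target")
def from_source_a():
    return dlt.read_stream("source_a")  # hypothetical streaming source

@dlt.append_flow(target="unified_target")
def from_source_b():
    return dlt.read_stream("source_b")  # hypothetical streaming source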
RevathiTiger
by New Contributor II
  • 4330 Views
  • 3 replies
  • 1 kudos

Expectations vs Great expectations with Databricks DLT pipelines

Hi All, we are working on creating a DQ framework on DLT pipelines in Databricks. Databricks DLT pipelines read incoming data from Kafka / file sources. Once data is ingested, data validation must happen on top of the ingested data. The customer is evalu...

  • 4330 Views
  • 3 replies
  • 1 kudos
Latest Reply
chanukya-pekala
Contributor III
  • 1 kudos

If you have decided to use DLT, it handles micro-batching and checkpointing for you. But typically you can take more control if you rewrite the logic using Auto Loader or Structured Streaming, with custom checkpointing of the file directory, and maintain yo...

  • 1 kudos
2 More Replies
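For comparison with Great Expectations, the built-in DLT expectations look roughly like this; the table and rule names are illustrative.

import dlt

@dlt.table(name="validated_orders")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")      # drop failing rows
@dlt.expect("reasonable_amount", "amount BETWEEN 0 AND 1000000")   # record violations, keep rows
def validated_orders():
    return dlt.read_stream("raw_orders")  # hypothetical upstream table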
lmu
by New Contributor III
  • 2386 Views
  • 11 replies
  • 3 kudos

Resolved! Write on External Table with Row Level Security fails

Hey, we are experiencing issues with writing to external tables when using Unity Catalog and Row Level Security. As soon as we stop using the serverless compute instance, we receive the following error for writing (overwrite, append, and upsert): E...

  • 2386 Views
  • 11 replies
  • 3 kudos
Latest Reply
lmu
New Contributor III
  • 3 kudos

After further testing, it was found that the dedicated access mode (formerly single user) either does not work or exhibits strange behaviour. In one scenario, the 16.4 cluster with dedicated access mode could write in append mode but not overwrite, a...

  • 3 kudos
10 More Replies
William_Scardua
by Valued Contributor
  • 3462 Views
  • 3 replies
  • 2 kudos

What Data Quality Framework do you use/recommend?

Hi guys, in your opinion, what is the best Data Quality framework (or technique) that you would recommend?

Data Engineering
dataquality
  • 3462 Views
  • 3 replies
  • 2 kudos
Latest Reply
chanukya-pekala
Contributor III
  • 2 kudos

DQ is interesting. There are a lot of options in this space. Soda and Great Expectations are fairly well integrated with the Databricks setup. I personally try to use dataframe abstractions for validation. We used the Deequ tool, which is very simple to use; just p...

  • 2 kudos
2 More Replies
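A minimal PyDeequ sketch of the dataframe-level checks mentioned in the reply above, assuming the pydeequ package and its matching Deequ jar are installed on the cluster; table and column names are illustrative.

from pydeequ.checks import Check, CheckLevel
from pydeequ.verification import VerificationSuite, VerificationResult

df = spark.table("my_schema.customers")  # hypothetical table

check = (Check(spark, CheckLevel.Error, "basic customer checks")
         .isComplete("customer_id")         # no nulls
         .isUnique("customer_id")           # no duplicates
         .isNonNegative("lifetime_value"))  # sanity bound

result = VerificationSuite(spark).onData(df).addCheck(check).run()
display(VerificationResult.checkResultsAsDataFrame(spark, result))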
Samael
by New Contributor II
  • 940 Views
  • 2 replies
  • 1 kudos

Query a "partition metadata logging" enabled external parquet table on Databricks SQL

Hi there, we have a pretty large Hive-partitioned Parquet table on S3, and we followed the documentation to recreate the table with partition metadata logging on Unity Catalog. We're using Databricks Runtime 16.4 LTS, but despite the release note mentioning that...

  • 940 Views
  • 2 replies
  • 1 kudos
Latest Reply
Samael
New Contributor II
  • 1 kudos

Thanks for helping! Setting table properties unfortunately didn't do the trick. We ended up having a view that points to the latest partition, like this, for fast queries: SELECT * FROM parquet.`s3://bucket/prefix/partition_column_date=20250616/`. We haven't f...

  • 1 kudos
1 More Replies
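For reference, the latest-partition view from the reply above would look roughly like this (bucket, prefix, and partition value are placeholders); the view needs to be recreated whenever a newer partition should be served.

spark.sql("""
    CREATE OR REPLACE VIEW my_schema.latest_partition_view AS
    SELECT *
    FROM parquet.`s3://bucket/prefix/partition_column_date=20250616/`
""")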
kenmyers-8451
by Contributor
  • 1049 Views
  • 4 replies
  • 0 kudos

dynamically create file path for sql_task

I am trying to make a reusable workflow where I can run a merge script for any number of tables. The idea is I tell the workflow the table name and/or path to it and it can reference that in the file path field. The simplified idea is below: resource...

  • 1049 Views
  • 4 replies
  • 0 kudos
Latest Reply
jtirila
New Contributor II
  • 0 kudos

Oh, never mind, I got it working. Just using single quotes around the {{  }} part solves it (I guess double quotes would work as well). I think I tried this yesterday but probably ran into another issue with dashes in task names: https://community.d...

  • 0 kudos
3 More Replies
jommo
by New Contributor
  • 4136 Views
  • 2 replies
  • 0 kudos

Exploring Data Quality Frameworks in Databricks

I'm currently investigating solutions for Data Quality (DQ) within the Databricks environment and would love to hear what frameworks or approaches you are using for this purpose. In the past, I've worked with Deequ, but I've noticed that it's not as w...

  • 4136 Views
  • 2 replies
  • 0 kudos
Latest Reply
dataoculus_app
New Contributor III
  • 0 kudos

GE and other DQ tools will fire a lot of SQL queries, increasing cost and adding delays, so it depends on what your requirements are. Happy to discuss more if you are interested, as I am also going to make such a tool available to the Databricks community as well ...

  • 0 kudos
1 More Replies
Pedro1
by New Contributor II
  • 2710 Views
  • 2 replies
  • 0 kudos

databricks_grants fails because it keeps track of a removed principal

Hi all, my Terraform script fails on a databricks_grants resource with the error: "Error: cannot update grants: Could not find principal with name DataUsers". The principal DataUsers does not exist anymore because it was previously deleted by Terraform. Bo...

  • 2710 Views
  • 2 replies
  • 0 kudos
Latest Reply
wkeifenheim-og
New Contributor II
  • 0 kudos

I'm here searching for a similar but different issue, so this is just a suggestion of something to try... Have you tried setting a depends_on argument within your databricks_grants block?

  • 0 kudos
1 More Replies
