Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Alex79 (New Contributor II)
  • 2423 Views
  • 2 replies
  • 0 kudos

Get Job Run output through Rest API call

I have a simple notebook reading a dataframe as input and returning another dataframe, which is as follows: from pyspark.sql import SparkSession; import pandas as pd, json; spark = SparkSession.builder.appName("Pandas to Spark DataFrame Conversion")...

Latest Reply
Vidhi_Khaitan
Databricks Employee
  • 0 kudos

Hi team, the error {"error_code": "INVALID_PARAMETER_VALUE", "message": "Retrieving the output of runs with multiple tasks is not supported..."} means the job you're triggering (job_id = 'my_job_id') is a multi-task job (even if it has only one task). In such cas...

1 More Replies
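As the reply explains, for a multi-task job you fetch the parent run via /api/2.1/jobs/runs/get, read each task's run_id, and then call /api/2.1/jobs/runs/get-output once per task. A minimal sketch, with the workspace host, token, and run IDs as placeholders:

```python
import json
import urllib.request

def task_run_ids(run_info: dict) -> dict:
    """Map task_key -> run_id from a /api/2.1/jobs/runs/get response."""
    return {t["task_key"]: t["run_id"] for t in run_info.get("tasks", [])}

def get_json(host: str, token: str, path: str, **params) -> dict:
    """GET a Jobs API endpoint and decode the JSON response."""
    query = "&".join(f"{k}={v}" for k, v in params.items())
    req = urllib.request.Request(
        f"{host}{path}?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Hypothetical usage (host, token, and run_id are placeholders):
# run = get_json("https://<workspace-url>", "<token>", "/api/2.1/jobs/runs/get", run_id=123)
# for key, rid in task_run_ids(run).items():
#     out = get_json("https://<workspace-url>", "<token>",
#                    "/api/2.1/jobs/runs/get-output", run_id=rid)
#     print(key, out.get("notebook_output"))
```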
cool_cool_cool (New Contributor II)
  • 3128 Views
  • 3 replies
  • 0 kudos

Databricks Workflow is stuck on the first task and doesn't do any workload

Heya, I have a workflow in Databricks with 2 tasks. They are configured to run on the same job cluster, and the second task depends on the first. I have a weird behavior that happened twice now - the job takes a long time (it usually finishes within 30...

Latest Reply
Sri_M
New Contributor II
  • 0 kudos

@cool_cool_cool I am facing the same issue as well. Is this issue resolved for you? If yes, can you please let me know what action you have taken?

2 More Replies
lorenz (New Contributor III)
  • 14901 Views
  • 8 replies
  • 3 kudos

Resolved! Databricks approaches to CDC

I'm interested in learning more about Change Data Capture (CDC) approaches with Databricks. Can anyone provide insights on the best practices and recommendations for utilizing CDC effectively in Databricks? Are there any specific connectors or tools ...

Latest Reply
Deekay
New Contributor II
  • 3 kudos

Hi @jcozar, thank you so much for your response. I have some queries; it would be really helpful if you can share your thoughts. How are you segregating the tables from raw to bronze? Suppose Debezium is capturing CDCs from 100 tables, all changes are ...

7 More Replies
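A common Databricks pattern for the raw-to-bronze step discussed in the replies is deduplicating the change feed to the newest event per key and applying it with a Delta MERGE. A minimal sketch, assuming a staged changes view with columns id, payload, op ('c'/'u'/'d'), and ts; all table, view, and column names are hypothetical, not from the thread:

```python
# Hedged sketch: apply only the newest change per key to a bronze Delta table.
MERGE_SQL = """
MERGE INTO bronze.customers AS t
USING (
  SELECT id, payload, op, ts FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY ts DESC) AS rn
    FROM changes
  ) WHERE rn = 1
) AS s
ON t.id = s.id
WHEN MATCHED AND s.op = 'd' THEN DELETE
WHEN MATCHED THEN UPDATE SET t.payload = s.payload, t.ts = s.ts
WHEN NOT MATCHED AND s.op <> 'd' THEN INSERT (id, payload, ts) VALUES (s.id, s.payload, s.ts)
"""
# In a notebook: spark.sql(MERGE_SQL)
```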
lezwon (Contributor)
  • 1444 Views
  • 2 replies
  • 3 kudos

Resolved! Install custom wheel from dbfs in serverless environment

Hey folks, I have a job that runs on serverless compute. I have also created a wheel file with custom functions, which I require in this job. I see that from here, we cannot install libraries for a task and must use notebook-scoped libraries. So wha...

Latest Reply
loui_wentzel
Databricks Partner
  • 3 kudos

Is your dbfs mounted? Otherwise, try uploading it to your workspace's "Shared" folder - this is a common place to put these sorts of files. dbfs is slowly being phased out and is no longer part of best practices.

1 More Replies
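Following the reply's suggestion, one route is to upload the wheel to a workspace path and install it notebook-scoped; the path below is hypothetical:

```python
# Hedged sketch: install a wheel from a workspace path instead of dbfs.
# In a Databricks notebook cell this would be the magic command:
#   %pip install /Workspace/Shared/libs/myfuncs-0.1.0-py3-none-any.whl
#   %restart_python   # restart so the freshly installed package is importable

def pip_install_command(wheel_path: str) -> str:
    """Build the notebook magic line for a given wheel path."""
    return f"%pip install {wheel_path}"
```

For serverless jobs deployed via asset bundles, the wheel may alternatively be declared as a task environment dependency rather than installed in the notebook.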
pooja_bhumandla (Databricks Partner)
  • 912 Views
  • 3 replies
  • 0 kudos

Auto tuning of file size

Why are maxFileSize and minFileSize different from targetFileSize after optimization? What is the significance of targetFileSize? "numRemovedFiles": "2099", "numRemovedBytes": "29658974681", "p25FileSize": "29701688", "numDeletionVectorsRemoved": "0", "m...

Latest Reply
loui_wentzel
Databricks Partner
  • 0 kudos

There could be several different reasons, but mainly, it's because grouping arbitrary data into some target file size is, well... arbitrary. Imagine I gave you a large container of sand and some empty buckets, and asked you to move the sand from the co...

2 More Replies
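On the significance of targetFileSize: it is a goal that OPTIMIZE bin-packs toward, not a hard bound, which is why the observed min/max file sizes straddle it. A hedged sketch of setting it explicitly; the table name and size are placeholders:

```python
# Assumption: main.sales is a stand-in table name; '128mb' is an example size.
STATEMENTS = [
    "ALTER TABLE main.sales SET TBLPROPERTIES ('delta.targetFileSize' = '128mb')",
    "OPTIMIZE main.sales",
]
# In a notebook: for s in STATEMENTS: spark.sql(s)
# OPTIMIZE groups existing files toward the target, so maxFileSize and
# minFileSize will still differ from targetFileSize after compaction.
```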
SreedharVengala (New Contributor III)
  • 31718 Views
  • 11 replies
  • 7 kudos

PGP Encryption / Decryption in Databricks

Is there a way to Decrypt / Encrypt Blob files in Databricks using Key stored in Key Vault. What libraries need to be used? Any code snippets? Links?

Latest Reply
Junpei_Liang
New Contributor II
  • 7 kudos

Does anyone have an update on this?

10 More Replies
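One commonly used approach (an assumption, not confirmed in the thread) is the third-party python-gnupg package with the key material pulled from a Key-Vault-backed secret scope; the scope, key, and file names below are hypothetical:

```python
# %pip install python-gnupg   # third-party wrapper around the gpg binary

def decrypted_name(path: str) -> str:
    """Strip a trailing .pgp/.gpg suffix to derive the output file name."""
    for ext in (".pgp", ".gpg"):
        if path.endswith(ext):
            return path[: -len(ext)]
    return path + ".decrypted"

# In a Databricks notebook (secret scope/key names are assumptions):
# import gnupg
# gpg = gnupg.GPG()
# gpg.import_keys(dbutils.secrets.get("kv-scope", "pgp-private-key"))
# src = "/dbfs/mnt/blob/data.csv.pgp"
# with open(src, "rb") as f:
#     gpg.decrypt_file(f, passphrase=dbutils.secrets.get("kv-scope", "pgp-passphrase"),
#                      output=decrypted_name(src))
```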
Ramki (New Contributor)
  • 448 Views
  • 1 reply
  • 0 kudos

Lakeflow clarification

Are there options to modify the streaming table after it has been created by the Lakeflow pipeline? In the use case I'm trying to solve, I need to add delta.enableIcebergCompatV2 and delta.universalFormat.enabledFormats to the target streaming table....

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @Ramki, yes, you can modify a streaming table created by a LakeFlow pipeline, especially when the pipeline is in triggered mode (not running continuously). In your case, you want to add the following Delta table properties: TBLPROPERTIES ( 'delta....

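The properties named in the question can also be declared on the streaming table inside the pipeline definition itself, so they persist across pipeline updates. A sketch, with the table and source names as placeholders:

```python
PROPS = {
    "delta.enableIcebergCompatV2": "true",
    "delta.universalFormat.enabledFormats": "iceberg",
}
# Inside the pipeline source file:
# import dlt
#
# @dlt.table(name="target_table", table_properties=PROPS)  # name is hypothetical
# def target_table():
#     return spark.readStream.table("source_table")
#
# For an already-created table, ALTER TABLE ... SET TBLPROPERTIES with the
# same pairs (run while the pipeline is not updating) is the other route.
```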
michelleliu (New Contributor III)
  • 2449 Views
  • 3 replies
  • 2 kudos

Resolved! DLT Performance Issue

I've been seeing patterns in DLT process time in all my pipelines, as in the attached screenshot. Each data point is an "update" that's set to "continuous". The process time keeps increasing until a point and then drops back to the desired level. This w...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 2 kudos

Hi @michelleliu This sawtooth pattern in DLT processing times is actually quite common and typically indicates one of several underlying issues. Here are the most likely causes and solutions. Common causes: 1. Memory pressure & garbage collection. Process...

2 More Replies
alau131 (New Contributor)
  • 1195 Views
  • 2 replies
  • 2 kudos

How to dynamically have the parent notebook call on a child notebook?

Hi! I would like help on how to dynamically call one notebook from another in Databricks and have the parent notebook get the dataframe results from the child notebook. Some background info: I have a main Python notebook and multiple SQ...

Latest Reply
jameshughes
Databricks Partner
  • 2 kudos

What you are looking to do is really not the intent of notebooks and you cannot pass complex data types between notebooks. You would need to persist your data frame from the child notebook so your parent notebook could retrieve the results after the ...

1 More Replies
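Since dbutils.notebook.run() can only return a string, a workaround consistent with the reply is to persist the child's dataframe and hand back the table name; the notebook path and naming scheme below are assumptions:

```python
def handoff_table(job: str, run_id: str) -> str:
    """Per-run table name so concurrent runs don't clobber each other."""
    return f"tmp.{job}_{run_id}"

# --- child notebook ---
# df = spark.sql("SELECT ...")
# df.write.mode("overwrite").saveAsTable(handoff_table("etl", run_id))
# dbutils.notebook.exit(handoff_table("etl", run_id))

# --- parent notebook ---
# name = dbutils.notebook.run("/Shared/child_notebook", 600, {"run_id": run_id})
# result_df = spark.table(name)
```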
Abel_Martinez (Contributor)
  • 22096 Views
  • 10 replies
  • 10 kudos

Resolved! Why am I getting a connection timeout when connecting to MongoDB using MongoDB Connector for Spark 10.x from Databricks

I'm able to connect to MongoDB using org.mongodb.spark:mongo-spark-connector_2.12:3.0.2 and this code: df = spark.read.format("com.mongodb.spark.sql.DefaultSource").option("uri", jdbcUrl). It works well, but if I install the latest MongoDB Spark Connector ve...

Latest Reply
ravisharma1024
New Contributor II
  • 10 kudos

I was facing the same issue; now it is resolved, thanks to @Abel_Martinez. I am using code like the below: df = spark.read.format("mongodb") \ .option('spark.mongodb.read.connection.uri', "mongodb+srv://*****:*****@******/?retryWrites=true&w=majori...

9 More Replies
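For reference, the accepted fix boils down to the renamed source and option key in connector 10.x; the connection string below is a placeholder:

```python
READ_OPTIONS = {
    # 10.x renames the uri option; 3.x used "uri" with com.mongodb.spark.sql.DefaultSource
    "spark.mongodb.read.connection.uri":
        "mongodb+srv://<user>:<password>@<cluster>/?retryWrites=true&w=majority",
    "database": "mydb",       # hypothetical
    "collection": "events",   # hypothetical
}
# df = spark.read.format("mongodb").options(**READ_OPTIONS).load()
```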
vanverne (New Contributor II)
  • 2707 Views
  • 3 replies
  • 1 kudos

Assistance with Capturing Auto-Generated IDs in Databricks SQL

Hello, I am currently working on a project where I need to insert multiple rows into a table and capture the auto-generated IDs for each row. I am using the databricks-sql-connector. Here is a simplified version of my current workflow: I create a temporary...

Latest Reply
vanverne
New Contributor II
  • 1 kudos

Thanks for the reply, Alfonso. I noticed you mentioned "Below are a few alternatives...", however, I am not seeing them. Please let me know if I am missing something. Also, do you know if Databricks is working on supporting the RETURNING clause soon...

2 More Replies
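Until a RETURNING clause exists, one workaround (an assumption, not from the thread) is tagging each inserted batch with a client-generated UUID and selecting the identity values back by that tag; the table and column names are hypothetical:

```python
import uuid

def insert_and_fetch_ids_sql(batch_id: str):
    """Build the tagged INSERT and the follow-up SELECT for one batch."""
    insert = (
        "INSERT INTO main.orders (batch_id, payload) "
        f"SELECT '{batch_id}', payload FROM staging_rows"
    )
    fetch = f"SELECT id FROM main.orders WHERE batch_id = '{batch_id}'"
    return insert, fetch

batch = str(uuid.uuid4())
# With databricks-sql-connector:
# insert_sql, fetch_sql = insert_and_fetch_ids_sql(batch)
# cursor.execute(insert_sql)
# cursor.execute(fetch_sql)
# ids = [row[0] for row in cursor.fetchall()]
```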
Yannic (New Contributor)
  • 994 Views
  • 1 reply
  • 0 kudos

Delete a directory in DBFS recursively from Azure

I have an Azure storage account mounted to DBFS. I want to delete a directory inside it recursively. I tried both dbutils.fs.rm(f"/mnt/data/to/delete", True) and %fs rm -r /mnt/data/to/delete. In both cases I get the following exception: AzureException: hadoop_azur...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @Yannic Azure Blob Storage doesn't have true directories - it simulates them through blob naming conventions, which can cause issues with recursive deletion operations. Try the below: delete files first, then the directory. def delete_directory_recursive(...

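The reply's files-first strategy can be sketched as a depth-first walk; dbutils.fs is injected as a parameter here so the logic is plain Python (directory entries from dbutils.fs.ls carry a trailing slash in their name):

```python
def delete_recursively(fs, path: str) -> None:
    """Delete files first, then the (simulated) directory entries themselves.
    Pass dbutils.fs as `fs` in a notebook."""
    for entry in fs.ls(path):
        if entry.name.endswith("/"):   # directory marker
            delete_recursively(fs, entry.path)
        else:
            fs.rm(entry.path)
    fs.rm(path)

# In a notebook: delete_recursively(dbutils.fs, "/mnt/data/to/delete")
```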
Sainath368 (Contributor)
  • 954 Views
  • 1 reply
  • 0 kudos

Data Skipping- Partitioned tables

Hi all, I have a question - how can we modify delta.dataSkippingStatsColumns and compute statistics for a partitioned Delta table in Databricks? I want to understand the process and best practices for changing this setting and ensuring accurate statist...

Latest Reply
paolajara
Databricks Employee
  • 0 kudos

Hi, delta.dataSkippingStatsColumns specifies a comma-separated list of column names for which Delta Lake collects statistics. It can improve performance by enabling data skipping on those columns, since it supersedes the default behavior of analyzing the first...

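A hedged sketch of the usual sequence (table and column names are placeholders): set the property, then recompute statistics so files written before the change are covered too:

```python
STATEMENTS = [
    "ALTER TABLE main.events SET TBLPROPERTIES "
    "('delta.dataSkippingStatsColumns' = 'event_date,customer_id')",
    "ANALYZE TABLE main.events COMPUTE DELTA STATISTICS",
]
# In a notebook: for s in STATEMENTS: spark.sql(s)
# Partition columns don't need file statistics for pruning; point
# dataSkippingStatsColumns at the non-partition columns you filter on.
```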
GeKo (Contributor)
  • 4889 Views
  • 8 replies
  • 4 kudos

Resolved! how to specify the runtime version for serverless job

Hello, if I understood correctly... using a serverless cluster always comes with the latest runtime version, by default. Now I need to stick to e.g. runtime version 15.4 for a certain job, which gets deployed via asset bundles. How do I specify/config...

Data Engineering
assetbundle
serverless
Latest Reply
GeKo
Contributor
  • 4 kudos

7 More Replies
Avinash_Narala (Databricks Partner)
  • 4060 Views
  • 9 replies
  • 1 kudos

Redshift Stored Procedure Migration to Databricks

Hi, I want to migrate Redshift SQL stored procedures to Databricks. Since Databricks doesn't support the concept of SQL stored procedures, how can I do so?

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 1 kudos

The Databricks docs show that procedures are in Public Preview and require runtime 17.0 and above: https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-ddl-create-procedure

8 More Replies
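For reference, a sketch of the preview syntax the linked docs describe; all names here are hypothetical, and since the feature is in Public Preview the details may change:

```python
CREATE_PROC = """
CREATE OR REPLACE PROCEDURE main.etl.refresh_daily(run_date DATE)
LANGUAGE SQL
AS BEGIN
  DELETE FROM main.etl.daily WHERE d = run_date;
  INSERT INTO main.etl.daily SELECT * FROM main.etl.staging WHERE d = run_date;
END
"""
# On runtime 17.0+:
# spark.sql(CREATE_PROC)
# spark.sql("CALL main.etl.refresh_daily(DATE'2024-01-01')")
```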