Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

DE5
by New Contributor
  • 144 Views
  • 1 reply
  • 1 kudos

Resolved! Unable to see the Assistant-suggested code and current code side by side

Hi, I'm unable to see the Assistant-suggested code and my current code side by side. Previously I was able to see my code and the Assistant's suggested code side by side, which helped me understand the changes. Please suggest if there is any way to do this. T...

Latest Reply
ManojkMohan
Honored Contributor II
  • 1 kudos

@DE5 Some recent updates moved comparison features into the SQL Editor side panel or rely on “Cell Actions,” where you can generate code or format it and then see the differences before applying changes: https://www.databricks.com/blog/introducing-new-sql-...

Dhruv-22
by Contributor II
  • 674 Views
  • 7 replies
  • 6 kudos

Reading empty json file in serverless gives error

I ran a Databricks notebook to do incremental loads from files in the raw layer to bronze-layer tables. Today, I encountered a case where the delta file was empty. I tried running it manually on serverless compute and encountered an error. df = spark....

Latest Reply
K_Anudeep
Databricks Employee
  • 6 kudos

Hello @Dhruv-22 , Can you share the schema of the df? Do you have a _corrupt_record column in your dataframe? If yes, where are you getting it from? Because you said it's an empty file, correct? As per the design, Spark blocks queries that only referen...
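As a hedged illustration of the behavior referenced here (Spark refuses queries that reference only the internal _corrupt_record column), supplying an explicit schema is the usual way to make an empty JSON file come back as an empty DataFrame. Column names and the path below are made up:

```python
# Minimal sketch, assuming a Databricks notebook where `spark` is predefined.
# With an explicit schema, Spark never has to infer from zero records, so an
# empty file yields an empty DataFrame instead of a lone _corrupt_record column.
from pyspark.sql.types import StructType, StructField, StringType, LongType

schema = StructType([                      # hypothetical bronze-layer schema
    StructField("id", LongType(), True),
    StructField("name", StringType(), True),
])

df = spark.read.schema(schema).json("/Volumes/raw/events/")  # illustrative path

if df.isEmpty():                           # available in Spark 3.3+
    print("No new records to load")
```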

6 More Replies
dbdev
by Contributor
  • 327 Views
  • 3 replies
  • 1 kudos

Resolved! Lakehouse Federation - fetch size parameter for optimization

Hi, We use Lakehouse Federation to connect to a database. A performance recommendation is to use 'fetchSize' (Lakehouse Federation performance recommendations - Azure Databricks | Microsoft Learn): SELECT * FROM mySqlCatalog.schema.table WITH ('fetchSiz...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Hello @dbdev, I did some digging and here are some suggestions. The `fetchSize` parameter in Lakehouse Federation is currently only available through SQL syntax using the `WITH` clause, as documented in the performance recommendations. Unfortunately...
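Since the reply notes that `fetchSize` is only exposed through the SQL `WITH` clause, a PySpark caller can still reach it by routing the query through spark.sql(). A minimal sketch; names are illustrative, and the exact WITH syntax should be checked against the linked performance-recommendations page:

```python
# Hedged sketch: use the SQL WITH clause from Python via spark.sql().
df = spark.sql("""
    SELECT *
    FROM mySqlCatalog.schema.table
    WITH ('fetchSize' 10000)  -- syntax per the docs linked above
""")
df.limit(10).show()
```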

2 More Replies
databricksero
by New Contributor II
  • 380 Views
  • 2 replies
  • 3 kudos

Resolved! Databricks Bundle Validation Error After CLI Upgrade (0.274.0 → 0.276.0)

After upgrading the Databricks CLI from version 0.274.0 to 0.276.0, bundle validation is failing with an error indicating that my configuration is formatted for "open-source Spark Declarative Pipelines" while the CLI now only supports "Lakeflow Decla...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 3 kudos

Hi @databricksero, It's a bug. I've checked, and the PR fixing it is already merged to the main branch. Check the GitHub thread below, and once they cut a new release just update the Databricks CLI (they should soon release a version without the bug). Fix oss...

1 More Replies
erigaud
by Honored Contributor
  • 3515 Views
  • 10 replies
  • 8 kudos

Databricks asset bundles and Dashboards - pass parameters depending on bundle target

Hello everyone !Since Databricks Asset Bundles can now be used to deploy dashboards, I'm wondering how to pass parameters so that the queries for the dev dashboard query the dev catalog, and the dashboard in stg query the stg catalog etc.Is there any...

Latest Reply
Coffee77
Contributor III
  • 8 kudos

What I did as a workaround (sketched below): it works pretty well, but you'll need to duplicate the Dashboard JSON code per environment and then replace the catalog names. It is not the perfect solution, but the only way I could find to include these deployments in my Databric...
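A minimal sketch of that duplicate-and-replace step, assuming dashboards are kept as JSON templates with a made-up __CATALOG__ placeholder; file layout and names are hypothetical:

```python
# Hedged sketch: render one dashboard JSON per target by substituting the
# catalog name. Placeholder token, paths, and target map are assumptions.
from pathlib import Path

TARGETS = {"dev": "dev_catalog", "stg": "stg_catalog", "prod": "prod_catalog"}
template = Path("dashboards/sales.lvdash.json.tmpl").read_text()

for target, catalog in TARGETS.items():
    out_dir = Path("dashboards") / target
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "sales.lvdash.json").write_text(template.replace("__CATALOG__", catalog))
```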

9 More Replies
LBISWAS
by New Contributor
  • 175 Views
  • 1 reply
  • 0 kudos

Search result shows presence of text in a notebook, but it's not present in the notebook

Search result shows presence of text in a notebook, but it's not present in the notebook

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

Ah yes, a classic. The search also looks into hidden/collapsed content which is not visible, e.g. results or metadata.

02CSE33
by New Contributor
  • 385 Views
  • 2 replies
  • 0 kudos

Resolved! Migrating SQL Server Tables and Views to Databricks using Lakebridge

We have a requirement to carry out a migration of a few hundred tables from SQL Server to Databricks Delta tables. We intend to explore Lakebridge's capability by carrying out a PoC for this. We also want to migrate a few historic records, say the las...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Migrating several hundred SQL Server tables to Databricks Delta Lake, using Lakebridge for a Proof of Concept (PoC), can be approached with custom pipelines, especially for filtering by a date/time column to migrate only the last two years of data. Of...
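As a hedged sketch of the custom-pipeline idea (not Lakebridge itself), a date-filtered JDBC pull into a Delta table might look like the following; server, secret names, and column names are placeholders:

```python
# Hedged sketch: copy only the last two years of a SQL Server table into Delta.
from datetime import datetime, timedelta

cutoff = (datetime.now() - timedelta(days=730)).strftime("%Y-%m-%d")

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://myhost:1433;databaseName=sales")  # illustrative
    .option("dbtable", f"(SELECT * FROM dbo.orders WHERE order_date >= '{cutoff}') src")
    .option("user", dbutils.secrets.get("my_scope", "sql_user"))       # hypothetical secrets
    .option("password", dbutils.secrets.get("my_scope", "sql_password"))
    .load()
)

df.write.format("delta").mode("overwrite").saveAsTable("bronze.orders")
```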

1 More Replies
gudurusreddy99
by New Contributor II
  • 295 Views
  • 1 reply
  • 1 kudos

Resolved! DLT or DP: How to do full refresh of Delta table from DLT Pipeline to consider all records from Tbl

Requirement: I have a Kafka streaming pipeline that ingests Pixels data. For each incoming record, I need to validate the Pixels key against an existing Delta table (pixel_tracking_data), which contains over 2 billion records accumulated over the past ...

Latest Reply
mark_ott
Databricks Employee
  • 1 kudos

Matching streaming data in real time against a massive, fast-changing Delta table requires careful architectural choices. In your case, latency is high for the most recent records, and the solution only matches against data ≥10 minutes old. This is a...
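One common shape for this kind of validation is a stream-static join, where the static Delta side is re-read every micro-batch, so it lags only by however fresh the lookup table is. A hedged sketch with placeholder topic, table, and column names, not the poster's actual pipeline:

```python
# Hedged sketch: validate streaming Kafka keys against a large Delta table
# via a stream-static left-semi join (static side re-scanned per micro-batch).
stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # illustrative
    .option("subscribe", "pixels")
    .load()
    .selectExpr("CAST(key AS STRING) AS pixel_key", "value")
)

known = spark.read.table("pixel_tracking_data").select("pixel_key")

validated = stream.join(known, on="pixel_key", how="left_semi")

(validated.writeStream
    .format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/pixels")  # illustrative path
    .toTable("bronze.validated_pixels"))
```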

der
by Contributor II
  • 749 Views
  • 10 replies
  • 0 kudos

Rasterio on shared/standard cluster has no access to proj.db

We try to use rasterio on a Databricks shared/standard cluster with DBR 17.1. Rasterio is installed directly on the cluster as a library. Code:
import rasterio
rasterio.show_versions()
Output:
rasterio info:
rasterio: 1.4.3
GDAL: 3.9.3
PROJ: 9.4.1
GEOS: 3.11...

Latest Reply
der
Contributor II
  • 0 kudos

Current workaround: if you select the "Photon" engine on a Standard/Shared cluster, they change the access rights of /databricks/native/proj-data and rasterio works fine. The downside: you pay for "Photon" compute to use a Python library, which does not use S...
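A related workaround sometimes used elsewhere (an assumption, not something confirmed in this thread) is to point PROJ at a readable copy of its data directory before importing rasterio, e.g. the one bundled with pyproj; version compatibility between that proj.db and the cluster's GDAL build would need checking:

```python
# Hedged sketch: redirect PROJ away from the unreadable /databricks/native/proj-data.
# Assumes pyproj is installed and its bundled proj.db matches the PROJ version.
import os
import pyproj

os.environ["PROJ_DATA"] = pyproj.datadir.get_data_dir()  # PROJ >= 9.1 (older: PROJ_LIB)

import rasterio
rasterio.show_versions()
```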

9 More Replies
jano
by New Contributor III
  • 258 Views
  • 2 replies
  • 2 kudos

Resolved! DABs with multi github sources

I want to deploy a DAB that has dev using a GitHub branch and prod using a GitHub release tag. I can't seem to find a way to make this part dynamic based on the target. Things I've tried: setting the git variable in the databricks.yml; making the g...

Latest Reply
jano
New Contributor III
  • 2 kudos

I ended up finding this discussion, which mostly worked. What was not mentioned is that the first resources block should be in the job.yml, while the overwrite parameters mentioned below go in the databricks.yml. You cannot put both in the databric...

1 More Replies
Volker
by Contributor
  • 3872 Views
  • 5 replies
  • 4 kudos

Asset Bundles cannot run job with single node job cluster

Hello community, we are deploying a job using asset bundles and the job should run on a single-node job cluster. Here is the DAB job definition:
resources:
  jobs:
    example_job:
      name: example_job
      tasks:
        - task_key: main_task
          ...

Latest Reply
kunalmishra9
Contributor
  • 4 kudos

In case this is now breaking for anyone (as it is for me), there's an update to follow along with on how to define single-node compute: https://github.com/databricks/databricks-sdk-py/issues/881
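For reference, the long-standing single-node shape (num_workers=0 plus the singleNode profile conf and ResourceClass tag) can be expressed with databricks-sdk-py types like this; whether these exact fields remain required may change with the fix tracked in that issue, so treat this as a sketch:

```python
# Hedged sketch of a single-node cluster spec using databricks-sdk-py.
from databricks.sdk.service import compute

single_node = compute.ClusterSpec(
    spark_version="15.4.x-scala2.12",     # illustrative DBR version
    node_type_id="Standard_DS3_v2",       # illustrative node type
    num_workers=0,
    spark_conf={
        "spark.databricks.cluster.profile": "singleNode",
        "spark.master": "local[*]",
    },
    custom_tags={"ResourceClass": "SingleNode"},
)
```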

4 More Replies
hanspetter
by New Contributor III
  • 66941 Views
  • 21 replies
  • 7 kudos

Resolved! Is it possible to get the Job Run ID of a notebook run by dbutils.notebook.run?

When running a notebook using dbutils.notebook.run from a master notebook, a URL to that running notebook is printed, i.e.: Notebook job #223150, Notebook job #223151. Are there any ways to capture that Job Run ID (#223150 or #223151)? We have 50 or ...

Latest Reply
no2
New Contributor II
  • 7 kudos

Thanks for the response @Manoj5 - I had to use this "safeToJson()" option too, because all of the previous suggestions in this thread were erroring out for me with a message like "py4j.security.Py4JSecurityException: Method public java.lang.String com...
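A hedged sketch of the safeToJson() approach described above; the exact key layout of the returned JSON varies across DBR versions, so the "currentRunId" lookup is an assumption to verify:

```python
# Hedged sketch: read the notebook context as JSON and pull out the run ID.
import json

ctx_json = dbutils.notebook.entry_point.getDbutils().notebook().getContext().safeToJson()
ctx = json.loads(ctx_json)

run_id = ctx.get("attributes", {}).get("currentRunId")  # key name is an assumption
print(run_id)
```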

20 More Replies
Richard_547342
by New Contributor III
  • 3651 Views
  • 2 replies
  • 2 kudos

Resolved! Column comments in DLT python notebook

The SQL API specification in the DLT docs shows an option for adding column comments when creating a table. Is there an equivalent way to do this when creating a DLT pipeline with a python notebook? The Python API specification in the DLT docs does n...

Latest Reply
jonathandbyrd
New Contributor II
  • 2 kudos

This works in a readStream/writeStream scenario for us, but the exact same code fails when put in a DLT pipeline.
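For anyone comparing notes, one way column comments are commonly declared from Python in a DLT pipeline is a DDL schema string with COMMENT clauses on @dlt.table; a hedged sketch with made-up names, to be verified against the current docs:

```python
# Hedged sketch: column comments via a DDL schema string in a DLT Python table.
import dlt

@dlt.table(
    comment="Example table with column comments",
    schema="""
        id BIGINT COMMENT 'surrogate key',
        name STRING COMMENT 'customer display name'
    """,
)
def customers():
    return spark.read.table("raw.customers")  # illustrative source
```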

1 More Replies
alex307
by New Contributor II
  • 315 Views
  • 1 reply
  • 2 kudos

Resolved! How to Stop Driver Node from Overloading When Using ThreadPoolExecutor in Databricks

Hi everyone, I'm using a ThreadPoolExecutor in Databricks to run multiple notebooks at the same time. The problem is that it seems like all the processing happens on the driver node, while the executor nodes are idle. This causes the driver to run out...

Latest Reply
mmayorga
Databricks Employee
  • 2 kudos

Greetings @alex307 and thank you for sending your question. When using ThreadPoolExecutor to run multiple notebooks concurrently in Databricks, the workload is being executed on the driver node rather than distributed across Spark executors. This res...
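A hedged sketch of the usual mitigation: keep the pool small and let each thread only orchestrate a child notebook (whose Spark work runs on the executors), rather than doing pandas or pure-Python work on the driver. Notebook paths below are placeholders:

```python
# Hedged sketch: a small thread pool that only triggers notebooks.
from concurrent.futures import ThreadPoolExecutor

notebooks = ["/Repos/etl/load_a", "/Repos/etl/load_b", "/Repos/etl/load_c"]

def run(path):
    # dbutils.notebook.run blocks this thread, but the child notebook's Spark
    # jobs are scheduled on the cluster's executors, not on this thread.
    return dbutils.notebook.run(path, 3600)

with ThreadPoolExecutor(max_workers=2) as pool:  # small pool eases driver memory pressure
    results = list(pool.map(run, notebooks))
```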

vartyg
by New Contributor
  • 263 Views
  • 2 replies
  • 0 kudos

Scaling Declarative Streaming Pipelines for CDC from On-Prem Database to Lakehouse

We have a scenario where we need to mirror thousands of tables from on-premises Db2 databases to an Azure Lakehouse. The goal is to create mirror Delta tables in the Lakehouse. Since LakeFlow Connect currently does not support direct mirroring from on...

Latest Reply
AbhaySingh
Databricks Employee
  • 0 kudos

Yes, a Databricks Labs project seems perfect for your scenario: https://databrickslabs.github.io/dlt-meta/index.html

1 More Replies
