Data Engineering

Forum Posts

by aav331 • Visitor

2 hours ago

15 Views
1 reply
0 kudos

Unable to install libraries from requirements.txt in a Serverless Job and spark_python_task

I am running into the following error while trying to deploy a serverless job running a spark_python_task with Git as the source for the code. The job was deployed as part of a DAB from a GitHub Actions runner. Run failed with error message Library i...

Latest Reply

Louis_Frolio
Databricks Employee

19m ago

0 kudos

Hey @aav331, here's a focused analysis of the community post's issue and how to fix it. Summary of the problem: The job is a serverless spark_python_task sourced from Git, and it fails to install packages from a requirements.txt because the file i...

by dbdev • Contributor

Tuesday

62 Views
3 replies
0 kudos

Lakehouse Federation - fetch size parameter for optimization

Hi, we use Lakehouse Federation to connect to a database. A performance recommendation is to use 'fetchSize' (Lakehouse Federation performance recommendations - Azure Databricks | Microsoft Learn): SELECT * FROM mySqlCatalog.schema.table WITH ('fetchSiz...

Latest Reply

Louis_Frolio
Databricks Employee

35m ago

0 kudos

Hello @dbdev, I did some digging and here are some suggestions. The `fetchSize` parameter in Lakehouse Federation is currently only available through SQL syntax using the `WITH` clause, as documented in the performance recommendations. Unfortunately...

by hgm251 • New Contributor

6 hours ago

18 Views
1 reply
0 kudos

badrequest: cannot create online table is being deprecated. creating new online table is not allowed

Hello! It seems very sudden that we can no longer create online tables. Is there a workaround that would let us create online tables temporarily, as we need more time to move to synced tables? #online_tables

Latest Reply

nayan_wylde
Esteemed Contributor

5 hours ago

0 kudos

Yes, the Databricks online tables (legacy) are being deprecated, and after January 15, 2026, you will no longer be able to access or create them. https://docs.databricks.com/aws/en/machine-learning/feature-store/migrate-from-online-tables Here are a few ...

by databricksero • New Contributor II

15 hours ago

33 Views
2 replies
3 kudos

Databricks Bundle Validation Error After CLI Upgrade (0.274.0 → 0.276.0)

After upgrading the Databricks CLI from version 0.274.0 to 0.276.0, bundle validation is failing with an error indicating that my configuration is formatted for "open-source Spark Declarative Pipelines" while the CLI now only supports "Lakeflow Decla...

Latest Reply

szymon_dybczak
Esteemed Contributor III

6 hours ago

3 kudos

Hi @databricksero, it's a bug. I've checked, and the PR fixing it is already merged into the main branch. Check the GitHub thread below, and once a new release is built, just update the Databricks CLI (a version without the bug should be released soon). Fix oss...

by Y_WANG • Visitor

8 hours ago

18 Views
1 reply
0 kudos

Want to use DataFrame equality functions but also Numpy >= 2.0

In my team, we have a lot of data science workflows using Spark and Pandas. To ensure the stability of these workflows, we need to implement unit tests. Recently, I found the DataFrame equality test functions introduced in Spark 3.5, which se...

Latest Reply

ManojkMohan
Honored Contributor

6 hours ago

0 kudos

@Y_WANG The root cause of the AttributeError you face when importing assertDataFrameEqual from pyspark.testing in Spark 3.5 is that Spark's code uses the deprecated np.NaN attribute, which was removed in NumPy 2.0 (replaced by np.nan). This break...
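
As a small illustration of the incompatibility described in this reply: since the np.NaN alias was removed in NumPy 2.0, whether it exists depends only on the installed major version. The sketch below is not from the thread. `nan_alias_removed` is a hypothetical helper name, and it takes the version as a string so it can run without NumPy installed.

```python
def nan_alias_removed(numpy_version: str) -> bool:
    """Return True if this NumPy version no longer provides the np.NaN alias.

    The alias was removed in NumPy 2.0; the lowercase np.nan works on
    every version, so code written against np.nan is safe either way.
    """
    major = int(numpy_version.split(".")[0])
    return major >= 2


if __name__ == "__main__":
    # Illustrative version strings only, not taken from the thread.
    for version in ("1.26.4", "2.0.0", "2.1.3"):
        print(version, nan_alias_removed(version))
```

In a real environment you would pass `numpy.__version__` to the helper before deciding whether code that still references np.NaN can be imported.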

by der • Contributor II

Tuesday

103 Views
5 replies
0 kudos

EXCEL_DATA_SOURCE_NOT_ENABLED Excel data source is not enabled in this cluster

I want to read an Excel xlsx file on DBR 17.3. The library dev.mauch:spark-excel_2.13:4.0.0_0.31.2 is installed on the cluster. The V1 implementation works fine: df = spark.read.format("dev.mauch.spark.excel").schema(schema).load(excel_file) display(df) V2...

Latest Reply

mmayorga
Databricks Employee

7 hours ago

0 kudos

Hi @der, first of all, thank you for your patience and for providing more information about your case. Use of ".format("excel")": I replicated your cluster config in Azure. Without installing any library, I was able to run and load the xlsx fil...

by erigaud • Honored Contributor

12-02-2024 7:08:32 AM

2948 Views
10 replies
8 kudos

Databricks asset bundles and Dashboards - pass parameters depending on bundle target

Hello everyone! Since Databricks Asset Bundles can now be used to deploy dashboards, I'm wondering how to pass parameters so that the queries for the dev dashboard query the dev catalog, the queries for the stg dashboard query the stg catalog, etc. Is there any...

Latest Reply

Coffee77
Contributor

7 hours ago

8 kudos

Here is what I did as a workaround. It works fairly well, but you'll need to duplicate the dashboard JSON code per environment and then replace the catalog names. It is not the perfect solution, but it is the only way I could find to include these deployments in my Databric...

by bidek56 • Contributor

a week ago

101 Views
3 replies
0 kudos

Location of spark.scheduler.allocation.file

In DBR 16.4 LTS, I am trying to add the following Spark config: spark.scheduler.allocation.file: file:/Workspace/init/fairscheduler.xml But the all-purpose cluster is throwing this error: Spark error: Driver down cause: com.databricks.backend.daemon.dri...
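
For context, the file that spark.scheduler.allocation.file points to uses Spark's fair scheduler pool format: an `allocations` root with `pool` elements that may set `schedulingMode`, `weight`, and `minShare`. The following is a minimal sketch that generates such a file with the Python standard library. The pool name and values are hypothetical, and the sketch does not address where the file must live on a Databricks cluster, which is the open question in this thread.

```python
import xml.etree.ElementTree as ET


def build_fair_scheduler_xml(pools: dict[str, dict[str, str]]) -> str:
    """Render pool settings in Spark's fair scheduler allocation format."""
    root = ET.Element("allocations")
    for name, settings in pools.items():
        pool = ET.SubElement(root, "pool", name=name)
        # Emit only the settings that were provided for this pool.
        for tag in ("schedulingMode", "weight", "minShare"):
            if tag in settings:
                ET.SubElement(pool, tag).text = str(settings[tag])
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # "etl" is a hypothetical pool name used only for illustration.
    print(build_fair_scheduler_xml({"etl": {"schedulingMode": "FAIR", "weight": "2"}}))
```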

Latest Reply

bidek56
Contributor

9 hours ago

0 kudos

@mark_ott Setting WSFS_ENABLE=false does not affect anything. Thanks.

by LBISWAS • New Contributor

yesterday

20 Views
1 reply
0 kudos

Search result shows presence of a text in notebook, but it's not present in notebook

Latest Reply

-werners-
Esteemed Contributor III

12 hours ago

0 kudos

Ah yes, a classic. The search also looks into hidden/collapsed content that is not visible, for example results or metadata.
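
One way to check where a match actually lives is to export the notebook and inspect it offline. The sketch below is not from the thread and assumes the notebook was exported in Jupyter .ipynb format, where each cell carries `source`, `outputs`, and `metadata` fields. `find_text_locations` is a hypothetical helper, and the match on outputs and metadata is a rough substring search over their JSON text.

```python
import json


def find_text_locations(notebook_json: str, needle: str) -> list[tuple[int, str]]:
    """Return (cell_index, location) pairs where needle occurs.

    location is "source", "outputs", or "metadata".
    """
    notebook = json.loads(notebook_json)
    hits = []
    for index, cell in enumerate(notebook.get("cells", [])):
        source = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if needle in source:
            hits.append((index, "source"))
        # Outputs and metadata are nested; search their serialized form.
        if needle in json.dumps(cell.get("outputs", []), ensure_ascii=False):
            hits.append((index, "outputs"))
        if needle in json.dumps(cell.get("metadata", {}), ensure_ascii=False):
            hits.append((index, "metadata"))
    return hits


if __name__ == "__main__":
    # Hypothetical one-cell notebook: the text appears only in the output.
    sample = json.dumps({
        "cells": [{
            "cell_type": "code",
            "source": ["print(name)"],
            "outputs": [{"output_type": "stream", "text": ["secret_table\n"]}],
            "metadata": {},
        }]
    })
    print(find_text_locations(sample, "secret_table"))
```

A hit reported only under "outputs" or "metadata" would be consistent with the behavior described in the reply: the text is found by search but is not visible in the cell source.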

by 02CSE33 • New Contributor

Monday

69 Views
2 replies
0 kudos

Migrating SQL Server Tables and Views to Databricks using Lakebridge

We have a requirement to migrate a few hundred tables from SQL Server to Databricks Delta tables. We intend to explore Lakebridge's capabilities in a PoC for this. We also want to migrate a few historical records, say las...

Latest Reply

mark_ott
Databricks Employee

12 hours ago

0 kudos

Migrating several hundred SQL Server tables to Databricks Delta Lake, using Lakebridge for a Proof of Concept (PoC), can be approached with custom pipelines—especially for filtering by a date/time column to migrate only the last two years of data. Of...
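
The reply mentions filtering by a date/time column to migrate only the last two years of data. Independent of Lakebridge itself, a minimal sketch of building such a filter query might look like the following. The table and column names are hypothetical, and this is plain string construction, not a Lakebridge API.

```python
from datetime import date


def two_years_before(day: date) -> date:
    """Return the same calendar day two years earlier."""
    try:
        return day.replace(year=day.year - 2)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year; fall back to Feb 28.
        return day.replace(year=day.year - 2, day=28)


def build_extract_query(table: str, date_column: str, today: date) -> str:
    """Build a SELECT that keeps only rows from the last two years.

    Identifiers are interpolated directly, so pass trusted names only.
    """
    cutoff = two_years_before(today).isoformat()
    return f"SELECT * FROM {table} WHERE {date_column} >= '{cutoff}'"


if __name__ == "__main__":
    # "dbo.orders" and "order_date" are placeholders for illustration.
    print(build_extract_query("dbo.orders", "order_date", date.today()))
```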

by gudurusreddy99 • New Contributor II

Monday

48 Views
1 reply
0 kudos

DLT or DP: How to do full refresh of Delta table from DLT Pipeline to consider all records from Tbl

Requirement: I have a Kafka streaming pipeline that ingests Pixels data. For each incoming record, I need to validate the Pixels key against an existing Delta table (pixel_tracking_data), which contains over 2 billion records accumulated over the past ...

Latest Reply

mark_ott
Databricks Employee

12 hours ago

0 kudos

Matching streaming data in real time against a massive, fast-changing Delta table requires careful architectural choices. In your case, latency is high for the most recent records, and the solution only matches against data ≥10 minutes old. This is a...

by der • Contributor II

2 weeks ago

443 Views
10 replies
0 kudos

Rasterio on shared/standard cluster has no access to proj.db

We are trying to use rasterio on a Databricks shared/standard cluster with DBR 17.1. Rasterio is installed directly on the cluster as a library. Code: import rasterio; rasterio.show_versions() Output: rasterio info: rasterio: 1.4.3, GDAL: 3.9.3, PROJ: 9.4.1, GEOS: 3.11...

Latest Reply

der
Contributor II

13 hours ago

0 kudos

Current workaround: If you select the "Photon" engine on a Standard/Shared cluster, the access rights of /databricks/native/proj-data are changed and rasterio works fine. The downside: you pay for "Photon" compute to use a Python library, which does not use S...

by jano • New Contributor III

Monday

78 Views
2 replies
0 kudos

Resolved! DABs with multi github sources

I want to deploy a DAB that has dev using a GitHub branch and prod using a GitHub release tag. I can't seem to find a way to make this part dynamic based on the target. Things I've tried: - Setting the git variable in the databricks.yml - making the g...

Latest Reply

jano
New Contributor III

yesterday

0 kudos

I ended up finding this discussion, which mostly worked. What was not mentioned is that the first resources block should be in the job.yml and the overwrite parameters mentioned below should be in the databricks.yml. You cannot put both in the databric...

by Volker • Contributor

06-07-2024 7:02:19 AM

3258 Views
5 replies
4 kudos

Asset Bundles cannot run job with single node job cluster

Hello community, we are deploying a job using asset bundles and the job should run on a single node job cluster. Here is the DAB job definition: resources: jobs: example_job: name: example_job tasks: - task_key: main_task ...

Latest Reply

kunalmishra9
Contributor

yesterday

4 kudos

In case this is now breaking for anyone (as it is for me), there's an update here to follow along with on how to define single node compute! https://github.com/databricks/databricks-sdk-py/issues/881

by hanspetter • New Contributor III

08-02-2017 12:26:46 AM

65419 Views
21 replies
7 kudos

Resolved! Is it possible to get Job Run ID of notebook run by dbutils.notebook.run?

When running a notebook using dbutils.notebook.run from a master notebook, a URL to that running notebook is printed, i.e.: Notebook job #223150 Notebook job #223151 Is there any way to capture that Job Run ID (#223150 or #223151)? We have 50 or ...

Latest Reply

no2
New Contributor II

yesterday

7 kudos

Thanks for the response @Manoj5 - I had to use this "safeToJson()" option too because all of the previous suggestions in this thread were erroring out for me with a message like "py4j.security.Py4JSecurityException: Method public java.lang.String com...
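
Once the notebook context is available as a JSON string (for example, from the "safeToJson()" option mentioned in this reply), pulling out a run ID is ordinary JSON handling. The sketch below is not from the thread. The key names `currentRunId` and `runId` and the sample payload are assumptions for illustration only, so inspect the real JSON to find the field that holds the run ID.

```python
import json
from typing import Any, Optional


def find_first_key(payload: Any, candidates: tuple[str, ...]) -> Optional[Any]:
    """Depth-first search for the first non-null value under any candidate key."""
    if isinstance(payload, dict):
        for key in candidates:
            if payload.get(key) is not None:
                return payload[key]
        for value in payload.values():
            found = find_first_key(value, candidates)
            if found is not None:
                return found
    elif isinstance(payload, list):
        for item in payload:
            found = find_first_key(item, candidates)
            if found is not None:
                return found
    return None


def extract_run_id(
    context_json: str,
    candidates: tuple[str, ...] = ("currentRunId", "runId"),  # assumed key names
) -> Optional[Any]:
    """Parse a context JSON string and return the first matching run ID, if any."""
    return find_first_key(json.loads(context_json), candidates)


if __name__ == "__main__":
    # Hypothetical payload shape; the real context JSON may differ.
    sample = '{"attributes": {"currentRunId": "223150", "user": "someone"}}'
    print(extract_run_id(sample))
```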
