Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

erigaud
by Honored Contributor
  • 9015 Views
  • 3 replies
  • 3 kudos

Get total number of files of a Delta table

I'm looking to know programmatically how many files a Delta table is made of. I know I can do %sql DESCRIBE DETAIL my_table, but that would only give me the number of files of the current version. I am looking to know the total number of files (basically ...

Latest Reply
gmiguel
Databricks Partner
  • 3 kudos

The best way to get this is to execute the following statement: ANALYZE TABLE [table_name] COMPUTE STORAGE METRICS; Applies to: Databricks Runtime 18.0 and above

2 More Replies
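
For the question above about counting every file that physically belongs to a table (not only those in the current version), a rough cross-check is to walk the table's storage directory. The sketch below is only an illustration under loud assumptions: it assumes the table root is reachable as a local filesystem path (the name `table_root` and any path passed to it are hypothetical), it treats every `.parquet` file outside `_delta_log` as a data file, and it is not what `ANALYZE TABLE ... COMPUTE STORAGE METRICS` does internally.

```python
import os


def count_table_files(table_root: str) -> dict:
    """Count files under a Delta table directory.

    Assumption: table_root is a locally readable path to the table's
    storage location. Files under _delta_log are counted as log files;
    .parquet files elsewhere are counted as data files, whether or not
    the current table version still references them.
    """
    counts = {"data_files": 0, "log_files": 0}
    for dirpath, _dirnames, filenames in os.walk(table_root):
        # Path of this directory relative to the table root, e.g. "_delta_log"
        rel = os.path.relpath(dirpath, table_root)
        in_log = rel.split(os.sep)[0] == "_delta_log"
        for name in filenames:
            if in_log:
                counts["log_files"] += 1
            elif name.endswith(".parquet"):
                counts["data_files"] += 1
    return counts
```

Because this counts what is on disk, files already removed by VACUUM will not appear, and other file types (for example checksum files) are ignored by design.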
plankton
by Visitor
  • 18 Views
  • 0 replies
  • 0 kudos

R plots not rendering

Has anyone been experiencing the issue of R plots not rendering in notebooks, starting a few days ago? It's not related to sparklyr or plotly, or specific data types, or anything. For example, in base R, plot(1:3, 5:7) runs without error, but does ...

AlexM
by Visitor
  • 28 Views
  • 0 replies
  • 0 kudos

Serverless Custom Environment Imaging

Hi, I'm looking at moving from job clusters to serverless environments, ideally to reduce cost and improve start-up time. I can see that it is now possible to specify a custom environment .yaml file and specify Python packages to be installed. Is ther...

flourishingsing
by New Contributor III
  • 43 Views
  • 1 replies
  • 0 kudos

Resolved! How can I retrieve a backfill run parameter in Python?

I'm trying to run a backfill with the following parameter. How can I access this in the Python script? Do I need to change anything in the yml? I usually set task parameters the following way. These are then parsed using the Python argparse module.

Latest Reply
flourishingsing
New Contributor III
  • 0 kudos

Found the following solution:

Add job-level parameters:

    parameters:
      - name: run_timestamp
        default: "some_default_value"

Reference in task-level parameters:

    tasks:
      - task_key: my_task
        spark_python_task:
          python_file: ../../script.py
    ...
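
On the Python side, the original post mentions parsing task parameters with argparse. A minimal sketch of that step might look like the following. It assumes the task passes the value as a `--run_timestamp <value>` pair; the flag name and the helper `parse_args` are assumptions for illustration, since the task-level parameter wiring is cut off in the post above.

```python
import argparse


def parse_args(argv=None):
    """Parse the run_timestamp parameter passed to the script.

    Assumption: the job task forwards the job-level parameter as
    "--run_timestamp <value>"; the flag name is hypothetical.
    """
    parser = argparse.ArgumentParser(description="Backfill-aware script")
    parser.add_argument(
        "--run_timestamp",
        default="some_default_value",  # mirrors the default in the job definition
        help="Timestamp supplied by the job or backfill run",
    )
    # argv=None makes argparse read sys.argv[1:], which is what a job run uses
    return parser.parse_args(argv)


# Example call with an explicit argument list (illustrative value only):
# args = parse_args(["--run_timestamp", "2024-01-01T00:00:00"])
# args.run_timestamp then holds the string that was passed in.
```

Keeping the parsing in a function that accepts an explicit argument list makes it easy to exercise locally before deploying the bundle.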

manish_de
by New Contributor II
  • 377 Views
  • 5 replies
  • 5 kudos

query based connector snapshot feature

In an ingestion pipeline, for a query-based connector there is an option to select batch snapshot instead of a column name in the Cursor column dropdown. If I choose batch snapshot, will the Databricks engine run select * from my source table, say Sql serv...

Latest Reply
michaelfriendly
New Contributor II
  • 5 kudos

@rbtv It may execute something very similar to a `SELECT *` on the source table unless the platform adds its own partitioning or optimisation behind the scenes. From what I've observed, selecting batch snapshot often means the connector handles each ...

4 More Replies
koen_hai
by New Contributor II
  • 82 Views
  • 2 replies
  • 0 kudos

Resolved! Custom and community connectors

Hi, the option to enable custom and community connectors does not seem to be available on the Previews page. How can this be enabled? Feature I'm referencing: Community connectors in Lakeflow Connect - Azure Databricks | Microsoft Learn

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @koen_hai, The Community Connectors feature is controlled from the workspace-level Previews page by a workspace admin. If you don’t see that option there, the workspace likely hasn’t been enrolled for the preview yet. In that case, please contact ...

1 More Replies
RTabur
by New Contributor III
  • 2506 Views
  • 4 replies
  • 2 kudos

[Bug] Orphan storage location

Hello, I'm not able to re-create an external location after removing its owner from the Databricks account. I'm getting the following error: Input path url 'abfss://foo@bar.dfs.core.windows.net/' overlaps with an existing external location within 'CreateEx...

Latest Reply
PL_db
Databricks Employee
  • 2 kudos

Your metastore admin can list all external locations. Your metastore admin can then drop the external location.

3 More Replies
mnissen1337
by New Contributor II
  • 83 Views
  • 1 replies
  • 1 kudos

Resolved! Managing Default Start State for Continuous Streaming Jobs in Databricks Asset Bundles

I’ve created a notebook that uses Spark Structured Streaming and runs continuously, so I’ve deployed the corresponding Databricks job using the continuous trigger mode. What I’d like is for this job to start automatically only in certain environments ...

Latest Reply
mnissen1337
New Contributor II
  • 1 kudos

I figured out that the continuous property has a pause_status as well; not sure why I did not see this. So I think the above is solved!

mnissen1337
by New Contributor II
  • 103 Views
  • 3 replies
  • 0 kudos

Resolved! Best Compute Option for Near-Real-Time Databricks API Ingestion Pipeline

I’ve built an ingestion pipeline in Databricks consisting of two notebooks: The first notebook calls an external API every four minutes to retrieve the latest available data. Each API call returns approximately 109 rows. The API only exposes the most re...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @mnissen1337, I would use serverless for that use case. It takes time for a job cluster to spin up (of course you can use pools, but given that your job needs to run every 5 minutes it doesn't make much sense), so serverless seems to be a great fi...

2 More Replies
Bank_Kirati
by New Contributor III
  • 23 Views
  • 0 replies
  • 0 kudos

Cross-region S3 reads suddenly fail with 400 Bad Request — eu-west-1 metastore to af-south-1 bucket

What changed: A production daily job that has worked unchanged for ~8 months started failing on 2026-05-18 ~23:46 UTC. The notebook does a plain spark.read.json("s3://BUCKET/...") against a bucket in af-south-1. The metastore is in eu-west-1. Same code...

maikel
by Contributor II
  • 134 Views
  • 2 replies
  • 0 kudos

Job tasks monitoring

Hello Community, We have a case in our project that we would like to solve in an elegant and scalable manner. As always, I would really appreciate your suggestions and experience. In short: We have a multi-step job consisting of 4 stages. In one of the ...

Latest Reply
MoJaMa
Databricks Employee
  • 0 kudos

I don't think there is anything native for this in Databricks. The closest match would have been system tables (system.lakeflow.job_run_timeline / job_task_run_timeline) but I don't think it will have the necessary grain for your pattern. There'...

1 More Replies
der
by Valued Contributor
  • 100 Views
  • 4 replies
  • 2 kudos

spark.databricks.sql.excel.enabled false at cluster level

Native Databricks Excel data source is GA: https://www.reddit.com/r/databricks/comments/1t4un82/native_excel_support_is_now_ga/ https://docs.databricks.com/aws/en/query/formats/excel However, as long as it is not possible to read from another address than...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @der, most likely because spark.databricks.sql.excel.enabled is a Databricks SQL/session-level internal config, not a SparkConf setting. This specific key appears to be read from the Spark SQL session config, so setting it after the notebook sessi...

3 More Replies