Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

prajwalpoojary
by Visitor
  • 23 Views
  • 1 reply
  • 0 kudos

Databricks Apps Hosting Backend and Frontend

Hello, I want to host a webapp whose frontend will be on Streamlit and backend running on FastAPI. Currently the Databricks app listens on host 0.0.0.0 and port 8000, and my backend is running on host '127.0.0.1' and port 8080 (if it's available). I want t...

  • 23 Views
  • 1 reply
  • 0 kudos
Latest Reply
stbjelcevic
Databricks Employee
  • 0 kudos

Hi @prajwalpoojary, given you already have Streamlit on 0.0.0.0:8000 and FastAPI on 127.0.0.1:8080, you can keep that split and do server-side calls from Streamlit to http://127.0.0.1:8080/. It’s efficient and avoids cross-origin/auth issues. If you...

  • 0 kudos
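
For reference, a minimal sketch of the server-side call pattern the reply describes; the /items endpoint and BACKEND_URL constant are illustrative, not part of the original thread:

# Streamlit app (the process Databricks Apps exposes on 0.0.0.0:8000)
# calling the FastAPI backend over loopback; /items is a hypothetical endpoint.
import requests
import streamlit as st

BACKEND_URL = "http://127.0.0.1:8080"

st.title("Demo frontend")
if st.button("Fetch items"):
    # The call happens server-side, so the user's browser never needs to
    # reach 127.0.0.1 and no CORS configuration is required.
    resp = requests.get(f"{BACKEND_URL}/items", timeout=10)
    resp.raise_for_status()
    st.json(resp.json())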
Michael_Appiah
by Contributor II
  • 15262 Views
  • 16 replies
  • 11 kudos

Parameterized spark.sql() not working

Spark 3.4 introduced parameterized SQL queries and Databricks also discussed this new functionality in a recent blog post (https://www.databricks.com/blog/parameterized-queries-pyspark). Problem: I cannot run any of the examples provided in the PySpark...

  • 15262 Views
  • 16 replies
  • 11 kudos
Latest Reply
Malthe
Contributor III
  • 11 kudos

@adriennn this has nothing to do with DLT; it's about Databricks providing a different session implementation here than regular Spark.

  • 11 kudos
15 More Replies
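
For context, the named-parameter style from the linked blog post looks roughly like this (PySpark 3.4+, assuming an active SparkSession named spark; the query and threshold are illustrative):

# Parameterized SQL with a named parameter marker (PySpark 3.4+).
df = spark.sql(
    "SELECT * FROM range(10) WHERE id > :threshold",
    args={"threshold": 5},
)
df.show()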
dave_d
by New Contributor II
  • 9217 Views
  • 3 replies
  • 0 kudos

What is the "Columnar To Row" node in this simple Databricks SQL query profile?

I am running a relatively simple SQL query that writes back to a table on a Databricks serverless SQL warehouse, and I'm trying to understand why there is a "Columnar To Row" node in the query profile that is consuming the vast majority of the time s...

  • 9217 Views
  • 3 replies
  • 0 kudos
Latest Reply
Annapurna_Hiriy
Databricks Employee
  • 0 kudos

@dave_d We do not have a document with a list of operations that would bring up a ColumnarToRow node. This node provides a common executor to translate an RDD of ColumnarBatch into an RDD of InternalRow. This is inserted whenever such a transition is de...

  • 0 kudos
2 More Replies
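
One way to check where that transition lands in your own query is to inspect the physical plan; a sketch, assuming an active SparkSession named spark and an illustrative table name:

# Look for a ColumnarToRow node in the formatted physical plan.
df = spark.read.table("main.default.some_table")  # illustrative table name
df.explain(mode="formatted")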
liquibricks
by Contributor
  • 389 Views
  • 8 replies
  • 4 kudos

Declarative Pipeline error: Name 'kdf' is not defined. Did you mean: 'sdf'

We have a Lakeflow Spark Declarative Pipeline using the new PySpark Pipelines API. This was working fine until about 7am (Central European) this morning when the pipeline started failing with a PYTHON.NAME_ERROR: name 'kdf' is not defined. Did you me...

  • 389 Views
  • 8 replies
  • 4 kudos
Latest Reply
liquibricks
Contributor
  • 4 kudos

It turns out this problem was caused by a package that was pip installed using an init script. This package had for some reason started pulling in pandas 3.x (despite the fact that the package itself had not been updated), and our Databricks contact ...

  • 4 kudos
7 More Replies
YuriS
by New Contributor II
  • 29 Views
  • 1 reply
  • 0 kudos

StreamingQueryListener metrics strange behaviour (inputRowsPerSecond metric is set to 0)

After implementing StreamingQueryListener to enable integration with our monitoring solution we have noticed some strange metrics for our DeltaSource streams (based on https://learn.microsoft.com/en-us/azure/databricks/structured-streaming/stream-mon...

  • 29 Views
  • 1 reply
  • 0 kudos
Latest Reply
hasnat_unifeye
New Contributor III
  • 0 kudos

Firstly, let's talk about batch vs trigger. A trigger is the scheduling event that tells Spark when to check for new data (e.g. processingTime, availableNow, once). A batch (micro-batch) is the actual unit of work that processes data, reads input, and...

  • 0 kudos
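
For reference, a minimal listener sketch surfacing the metric in question, assuming a DBR/PySpark version with Python StreamingQueryListener support; the print is a stand-in for a real monitoring sink:

from pyspark.sql.streaming import StreamingQueryListener

class MetricsListener(StreamingQueryListener):
    def onQueryStarted(self, event):
        pass

    def onQueryProgress(self, event):
        # inputRowsPerSecond is 0.0 for micro-batches that ingested no new
        # rows, which can look like a "stuck" stream on a dashboard.
        print(event.progress.name, event.progress.inputRowsPerSecond)

    def onQueryTerminated(self, event):
        pass

spark.streams.addListener(MetricsListener())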
francisix
by New Contributor III
  • 7171 Views
  • 6 replies
  • 9 kudos

Resolved! I haven't received badge for completion

Hi, today I completed the test for Lakehouse Fundamentals with a score of 85%, but I still haven't received the badge through my email francis@intellectyx.com. Kindly let me know please! -Francis

  • 7171 Views
  • 6 replies
  • 9 kudos
Latest Reply
sureshrocks1984
New Contributor II
  • 9 kudos

Hi, I completed the test for Databricks Certified Data Engineer Associate on 17 December 2024, but I still haven't received the badge through my email sureshrocks.1984@hotmail.com. Kindly let me know please! SURESHK

  • 9 kudos
5 More Replies
Danish11052000
by New Contributor III
  • 164 Views
  • 5 replies
  • 9 kudos

How to get read/write bytes per table using Databricks system tables?

I’m working on a data usage use case and want to understand the right way to get read bytes and written bytes per table in Databricks, especially for Unity Catalog tables. What I want: for each table, something like: Date, Table name (catalog.schema.table)...

  • 164 Views
  • 5 replies
  • 9 kudos
Latest Reply
pradeep_singh
New Contributor II
  • 9 kudos

system.access.audit focuses on governance and admin/security events. It doesn’t capture per-table I/O metrics such as read_bytes or written_bytes. Use system.query.history for per-statement I/O metrics (read_bytes, written_bytes, read_rows, written_ro...

  • 9 kudos
4 More Replies
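
A hedged sketch of the lookup the reply describes; the column names follow the reply, and the exact system.query.history schema may differ by release:

# Per-statement I/O from the query history system table.
spark.sql("""
    SELECT date(start_time) AS query_date,
           executed_by,
           read_bytes, written_bytes, read_rows, written_rows
    FROM system.query.history
    WHERE start_time >= current_date() - INTERVAL 7 DAYS
""").show()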
danny_frontgrad
by New Contributor
  • 230 Views
  • 11 replies
  • 3 kudos

Resolved! Question on Ingestion Pipelines

Is there a better way to select source tables than having to manually select them one by one? I have 96 tables and it's a pain. The GUI keeps going back to the schema and I have to search through all the tables again. Is there a way to import the tables using ...

  • 230 Views
  • 11 replies
  • 3 kudos
Latest Reply
pradeep_singh
New Contributor II
  • 3 kudos

So you don't see the option to edit the pipeline? Or once you click on edit pipeline, you don't see the option to Switch to code version (YAML)? Or after you Switch to code version (YAML), you can only view that YAML and can't edit it?

  • 3 kudos
10 More Replies
Ericsson
by New Contributor II
  • 5776 Views
  • 3 replies
  • 1 kudos

SQL week format issue: it's not showing the result as 01 (ww)

Hi folks, I have a requirement to show the week number in ww format. Please see the code below: select weekofyear(date_add(to_date(current_date, 'yyyyMMdd'), +35)). Also, please refer to the screenshot for the result.

  • 5776 Views
  • 3 replies
  • 1 kudos
2 More Replies
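
For reference on the question above: week-based patterns such as 'ww' are rejected by date_format since Spark 3.0, so one workaround is to zero-pad weekofyear(); a sketch, assuming an active SparkSession named spark:

# Zero-pad weekofyear() to get the two-digit week number, e.g. '01'.
spark.sql("""
    SELECT lpad(cast(weekofyear(date_add(current_date(), 35)) AS string),
                2, '0') AS week_ww
""").show()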
kenny_hero
by New Contributor III
  • 444 Views
  • 7 replies
  • 1 kudos

How do I import a python module when deploying with DAB?

Below is how the folder structure of my project looks:
resources/
|- etl_event/
   |- etl_event.job.yml
src/
|- pipeline/
   |- etl_event/
      |- transformers/
         |- transformer_1.py
      |- utils/
         |- logger.py
databricks.ym...

  • 444 Views
  • 7 replies
  • 1 kudos
Latest Reply
pradeep_singh
New Contributor II
  • 1 kudos

You don't need to use wheel files. Use glob as the key instead of file: https://docs.databricks.com/aws/en/dev-tools/bundles/resources#pipelinelibraries. Here is the screenshot.

  • 1 kudos
6 More Replies
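
Based on the linked bundle docs, the reply's suggestion looks roughly like this in the pipeline resource YAML (resource name and path are illustrative and follow the question's folder layout):

# Pipeline resource sketch: a glob entry instead of per-file entries.
resources:
  pipelines:
    etl_event:
      libraries:
        - glob:
            include: src/pipeline/etl_event/**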
Danish11052000
by New Contributor III
  • 168 Views
  • 5 replies
  • 1 kudos

How to incrementally backup system.information_schema.table_privileges (no streaming, no unique keys)

I'm trying to incrementally backup system.information_schema.table_privileges but facing challenges:
  • No streaming support: Is streaming supported: False
  • No unique columns for MERGE: all columns contain common values, no natural key combination
  • No timest...

  • 168 Views
  • 5 replies
  • 1 kudos
Latest Reply
MoJaMa
Databricks Employee
  • 1 kudos

information_schema is not backed by Delta tables, which is why you can't stream from it. These are basically views on top of the information coming straight from the control plane database. Also, your query is actually going to be quite slow/expensive (you prob...

  • 1 kudos
4 More Replies
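
Given those constraints, one common fallback is a dated full snapshot rather than a true incremental backup; a sketch, with an illustrative backup table name and assuming an active SparkSession named spark:

from pyspark.sql import functions as F

# Daily full snapshot of the view, appended with a snapshot date so
# historical grants can still be compared over time.
(spark.table("system.information_schema.table_privileges")
    .withColumn("snapshot_date", F.current_date())
    .write.mode("append")
    .saveAsTable("main.backup.table_privileges_snapshots"))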
aranjan99
by Contributor
  • 133 Views
  • 3 replies
  • 1 kudos

System table missing primary keys?

This simple query takes 50 seconds for me on an X-Small warehouse: select * from SYSTEM.access.workspaces_latest where workspace_id = '442224551661121'. Can the team comment on why querying system tables takes so long? I also don't see any primary keys ...

  • 133 Views
  • 3 replies
  • 1 kudos
Latest Reply
iyashk-DB
Databricks Employee
  • 1 kudos

System tables are a Databricks‑hosted, read‑only analytical store shared to your workspace via Delta Sharing; they aren’t modifiable (no indexes you can add), and the first read can have extra overhead on a very small warehouse. This can make “simple...

  • 1 kudos
2 More Replies
echol
by New Contributor
  • 180 Views
  • 3 replies
  • 1 kudos

Redeploy Databricks Asset Bundle created by others

Hi everyone,Our team is using Databricks Asset Bundles (DAB) with a customized template to develop data pipelines. We have a core team that maintains the shared infrastructure and templates, and multiple product teams that use this template to develo...

  • 180 Views
  • 3 replies
  • 1 kudos
Latest Reply
pradeep_singh
New Contributor II
  • 1 kudos

There is a purpose to development mode; it's not a limitation. It's meant to make sure developers can test their changes individually. If you plan to have this deployed by multiple users, you will have to deploy in production mode.

  • 1 kudos
2 More Replies
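
For reference, a minimal sketch of what switching to production mode looks like in databricks.yml (target name and host are illustrative):

# A production target deploys to a shared location rather than a
# per-user development path.
targets:
  prod:
    mode: production
    workspace:
      host: https://example.cloud.databricks.com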
dpc
by Contributor II
  • 225 Views
  • 6 replies
  • 2 kudos

Using AD groups for object ownership

Databricks has a general issue with object ownership in that only the creator can delete an object. So, if I create a catalog, table, view, schema, etc., I am the only person who can delete it. No good if it's a general table or view and some other developer ...

  • 225 Views
  • 6 replies
  • 2 kudos
Latest Reply
dpc
Contributor II
  • 2 kudos

Hi, so I've just tested this. If I create a schema and somebody else creates a table in that schema, I can drop their table. If they create a schema along with a table in that schema, then grant me ALL PRIVILEGES on the table, I cannot drop it as it says...

  • 2 kudos
5 More Replies
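
For reference, ownership can be reassigned to a group so that membership, not the original creator, determines who can manage an object; a hedged sketch with illustrative names, assuming an active SparkSession named spark:

# Reassign ownership to a group so any current member can manage/drop
# the object, independent of who originally created it.
spark.sql("ALTER TABLE main.sales.orders OWNER TO `data-engineers`")
spark.sql("ALTER SCHEMA main.sales OWNER TO `data-engineers`")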
Fox19
by New Contributor III
  • 180 Views
  • 4 replies
  • 1 kudos

CSV Ingestion using Autoloader with single variant column

I've been working on ingesting CSV files with varying schemas using Auto Loader. The goal is to take the CSVs and ingest them into a bronze table that writes each record as a key-value mapping with only the relevant fields for that record. I also want to ...

  • 180 Views
  • 4 replies
  • 1 kudos
Latest Reply
pradeep_singh
New Contributor II
  • 1 kudos

If I understand the problem correctly, you are getting extra keys for records from files where those keys don't actually exist. I was not able to reproduce this issue; I am getting different key-value pairs and no extra keys with null. Can you share ...

  • 1 kudos
3 More Replies
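
If it helps reproduction, here is a hedged sketch of one way to collapse CSV records with varying schemas into per-record key/value maps, dropping keys that are null for a given record; paths and names are illustrative:

from pyspark.sql import functions as F

# Auto Loader CSV read with schema inference stored in a schema location.
df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .option("cloudFiles.schemaLocation", "/Volumes/main/bronze/_schemas/events")
      .option("header", "true")
      .load("/Volumes/main/landing/events"))

# Build a column-name -> value map, then keep only the keys that are
# actually present (non-null) in each record.
pairs = [x for c in df.columns for x in (F.lit(c), F.col(c).cast("string"))]
kv = df.select(
    F.map_filter(F.create_map(*pairs), lambda k, v: v.isNotNull()).alias("record")
)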