Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

Ganeshch (New Contributor III)
  • 1538 Views
  • 3 replies
  • 0 kudos

No option to create cluster

I don't see any option to create a cluster under Compute in Community Edition. Is it disabled? How do I create a cluster? Please help me.

Latest Reply
Ganeshch
New Contributor III
  • 0 kudos

If I create a notebook and run it, a cluster will not be created explicitly, but one will run in the backend. Am I right?

2 More Replies
ghilage (New Contributor III)
  • 1344 Views
  • 4 replies
  • 0 kudos

Not able to write to dbfs from workflow

Hi all, I am facing the issue below while writing to DBFS. I have PySpark code in which I am writing a DataFrame to DBFS using the code below:
dbfs_path.mkdir(parents=True, exist_ok=True)
my_df.write.format("parquet").mode("overwrite").save(f"{dbfs_path}/my_d...

Latest Reply
ghilage
New Contributor III
  • 0 kudos

Looks like some problem within my DataFrame itself. If I skip some of the expensive field calculations, then it is able to write to DBFS.

3 More Replies
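
A minimal sketch of the write pattern under discussion, assuming my_df from the question and a placeholder output path: Spark creates the target directory itself, so the mkdir call is not needed, and forcing evaluation first helps isolate expensive column calculations from the write step.

    out_path = "dbfs:/tmp/my_data"   # placeholder path

    # Force evaluation first; if this fails, the problem is in the column
    # calculations rather than in writing to DBFS.
    my_df.cache()
    my_df.count()

    # Spark creates the target directory on write; no mkdir is needed.
    my_df.write.format("parquet").mode("overwrite").save(out_path)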
Davila (New Contributor II)
  • 1669 Views
  • 2 replies
  • 2 kudos

Resolved! Asset Bundle Validation Not Completing – Stuck on files_to_sync

I have a Databricks asset bundle with the following structure:
bundle:
  name: <some value here>
  uuid: <some value here>
include:
  - resources/*.yml
variables:
  catalog_bronze: {}
  catalog_silver: {}
  user_name: {}
targets:
  dev:
    mode: ...

Latest Reply
Renu_
Valued Contributor II
  • 2 kudos

Hi @Davila, Validation can be slow if your bundle root includes a large number of files. However, since your bundle contains only a few files, the delay may be due to the root_path pointing to a broader directory structure in the Databricks workspace...

1 More Replies
cloudengineer (New Contributor)
  • 1016 Views
  • 2 replies
  • 0 kudos
Latest Reply
MadhuB
Valued Contributor
  • 0 kudos

@cloudengineer By default, workspace admins can create interactive clusters. Non-admin users should be granted access either to a compute policy or to existing clusters. If there is a requirement to enable cluster crea...

1 More Replies
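
For reference, a hedged sketch of one route for the last point in the reply: granting a specific user the cluster-create entitlement through the workspace SCIM API. The host, token, and user ID below are placeholders.

    import requests

    HOST = "https://<workspace>.cloud.databricks.com"   # placeholder
    TOKEN = "<personal-access-token>"                   # placeholder
    USER_ID = "<scim-user-id>"                          # placeholder

    # Add the allow-cluster-create entitlement to the user.
    resp = requests.patch(
        f"{HOST}/api/2.0/preview/scim/v2/Users/{USER_ID}",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
            "Operations": [{
                "op": "add",
                "path": "entitlements",
                "value": [{"value": "allow-cluster-create"}],
            }],
        },
    )
    resp.raise_for_status()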
ashraf1395 (Honored Contributor)
  • 2540 Views
  • 6 replies
  • 2 kudos

Empty Streaming tables in dlt

I want to create empty streaming tables in DLT with only the schema specified. Is it possible? I want to do it in DLT Python.

Latest Reply
brunoillipronti
New Contributor II
  • 2 kudos

I confirm that ashraf1395's solution works. All approaches I tried for creating an empty table created a materialized view (which you can't merge into). It's disappointing, though, since there is no quick param in a "create_table" command to create a simple...

5 More Replies
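
A minimal sketch of declaring an empty streaming table in DLT Python with only a schema, along the lines of the accepted approach; the table and column names are placeholders, and rows can be added later (for example via apply_changes or an append flow).

    import dlt

    # Declare a streaming table with a schema but no populating query.
    dlt.create_streaming_table(
        name="my_empty_streaming_table",
        schema="id BIGINT, payload STRING, ingested_at TIMESTAMP",
    )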
Zachary_Higgins (Contributor)
  • 15298 Views
  • 9 replies
  • 13 kudos

'ignoreDeletes' option with Delta Live Table streaming source

We have a Delta streaming source in our Delta Live Table pipelines that may have data deleted from time to time. The error message is pretty self-explanatory: ...from streaming source at version 191. This is currently not supported. If you'd like to i...

Latest Reply
IanB_Argento
New Contributor II
  • 13 kudos

I had this same issue whilst doing some POC work. I was able to overcome it as follows:
  • Navigate to Workflows | Jobs & pipelines.
  • Select your pipeline.
  • Click the drop-down next to the Start button.
  • Choose "Full refresh all".
That resets it all and fixes t...

8 More Replies
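
Where a full refresh is too heavy-handed, the error message's own suggestion can also be applied directly; a minimal sketch with a placeholder source path, where ignoreDeletes makes the stream skip commits that only delete data instead of failing:

    df = (
        spark.readStream.format("delta")
        .option("ignoreDeletes", "true")   # skip delete-only commits
        .load("dbfs:/path/to/source_table")
    )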
Pavankumar7 (New Contributor III)
  • 3304 Views
  • 6 replies
  • 4 kudos

Resolved! Error in connecting serverless compute in free edition

I am unable to connect to serverless compute under the Free Edition of Databricks. Also, in the Compute tab I can see only three tabs (SQL warehouses, Vector search, Apps) and am not able to create new compute the way we used to in Community Edition.

(screenshots attached)
Latest Reply
Thomas_W
New Contributor III
  • 4 kudos

@Pavankumar7 - are you experiencing this issue for existing/imported notebooks, or for brand-new notebooks too? If it's the former, the notebook may be using an old serverless environment version. When Databricks updates the serverless environment, ex...

5 More Replies
pacman (New Contributor)
  • 17807 Views
  • 7 replies
  • 0 kudos

How to run a saved query from a Notebook (PySpark)

Hi Team! Noob to Databricks, so apologies if I ask a dumb question. I have created a relatively large series of queries that fetch and organize the data I want. I'm ready to drive all of these from a Notebook (likely PySpark). An example query is save...

Latest Reply
aethorimn_cgr
New Contributor II
  • 0 kudos

@uday_satapathy Hi Uday. Do you know if this method works for multiple users? I might need to share the script so a teammate can use it.

6 More Replies
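
A minimal sketch of the simplest pattern, assuming the saved query's SQL has been copied into the notebook; the query below uses Databricks sample data as a stand-in for your own.

    query_text = """
        SELECT *
        FROM samples.nyctaxi.trips
        LIMIT 10
    """
    df = spark.sql(query_text)   # returns a DataFrame for downstream use
    display(df)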
Pratikmsbsvm (Contributor)
  • 1893 Views
  • 2 replies
  • 2 kudos

Resolved! Data Lakehouse architecture with Azure Databricks and Unity Catalog

I am creating a data lakehouse solution on Azure Databricks. Source: SAP, Salesforce, Adobe. Target: Hightouch (external application), Mad Mobile (external application). The data lakehouse also has transactional records which should be stored in ACID pro...

Latest Reply
KaranamS
Contributor III
  • 2 kudos

Hi @Pratikmsbsvm, from what I understand, you have a lakehouse on Azure Databricks and would like to share this data with another Databricks account or workspace. If Unity Catalog is enabled on your Azure Databricks account, you can leverage Delta S...

1 More Replies
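
A hedged sketch of the Delta Sharing route mentioned in the reply, run as SQL from a notebook; the share, table, and recipient names are all placeholders.

    # Create a share, add a Unity Catalog table to it, and grant a recipient.
    spark.sql("CREATE SHARE IF NOT EXISTS my_share")
    spark.sql("ALTER SHARE my_share ADD TABLE main.default.orders")
    spark.sql("GRANT SELECT ON SHARE my_share TO RECIPIENT my_recipient")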
data_learner1 (New Contributor II)
  • 1320 Views
  • 4 replies
  • 1 kudos

Need to track schema changes/column renames/column drops in Databricks Unity Catalog

Hi Team, we are getting data from a third-party vendor into the Databricks Unity Catalog. They make schema changes frequently and we would like to track that. Just wanted to know if I can do this using the audit table in the system catalog. As we only h...

Latest Reply
CURIOUS_DE
Valued Contributor
  • 1 kudos

@data_learner1 Unity Catalog logs all data access and metadata operations (including schema changes) into the audit logs, which are stored in system catalog tables such as system.access.audit. You mentioned you only have read access, and likely...

3 More Replies
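
A hedged sketch of querying the audit system table for schema changes; the action names below are assumptions and may differ by workspace, so it is worth listing distinct action_name values first.

    changes = spark.sql("""
        SELECT event_time, user_identity.email AS actor,
               action_name, request_params
        FROM system.access.audit
        WHERE service_name = 'unityCatalog'
          AND action_name IN ('createTable', 'updateTable', 'deleteTable')
        ORDER BY event_time DESC
        LIMIT 100
    """)
    display(changes)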
NikosLoutas (Databricks Partner)
  • 2678 Views
  • 2 replies
  • 0 kudos

Resolved! Databricks Full Refresh of DLT Pipeline

Hello, I have a question regarding the full refresh of a DLT pipeline where the data source is an external table. When running the pipeline without a full refresh, the streaming will pull data which are currently present in the external source ...

Latest Reply
seeyesbee
New Contributor II
  • 0 kudos

Hi @paolajara, in your point 5 you mentioned using Delta Lake for tracking changes. Could you point me to any official docs or examples that walk through enabling CDC / row tracking on a Delta table? I pull data from SharePoint via its REST endpoint,...

1 More Replies
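
On the change-tracking question in the reply, a minimal sketch of enabling the change data feed on a Delta table and then reading row-level changes; the table name and starting version are placeholders.

    # Enable the change data feed on an existing table.
    spark.sql("""
        ALTER TABLE main.default.my_table
        SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
    """)

    # Read inserts, updates, and deletes recorded since version 1.
    cdf = (
        spark.read.format("delta")
        .option("readChangeFeed", "true")
        .option("startingVersion", 1)
        .table("main.default.my_table")
    )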
Pratikmsbsvm (Contributor)
  • 1743 Views
  • 2 replies
  • 0 kudos

How to build architecture for Batch as well Stream Data Pipeline in Databricks

Hello, I am planning to create a data lakehouse using Azure and Databricks. Earlier I planned to do it with Azure alone, but the use cases look complex. Can someone please help me with suggestions? Source systems: SAP, Salesforce, SAP CAR, Adobe Clickstream. Consume...

Latest Reply
SP_6721
Honored Contributor II
  • 0 kudos

Hi @Pratikmsbsvm,
The appropriate approach would be:
  • Data ingestion: Ingest data from SAP, SAP CAR, and Salesforce using Azure Data Factory or third-party connectors. For near real-time updates, enable CDC-based ingestion.
  • Data lakehouse storage: Store a...

1 More Replies
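
For the ingestion step, a hedged sketch of Auto Loader covering both batch-style and streaming arrival patterns from a landing zone; all paths and table names are placeholders.

    raw = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", "dbfs:/tmp/schemas/clickstream")
        .load("abfss://landing@<storage-account>.dfs.core.windows.net/clickstream/")
    )

    (raw.writeStream
        .option("checkpointLocation", "dbfs:/tmp/checkpoints/clickstream")
        .trigger(availableNow=True)   # drain new files, then stop (batch-like)
        .toTable("bronze.clickstream_raw"))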
guizsantos (New Contributor II)
  • 4269 Views
  • 3 replies
  • 3 kudos

Resolved! How to obtain a query profile programmatically?

Hi everyone! Does anyone know if there is a way to obtain the data used to create the graph shown in the "Query profile" section? In particular, I am interested in the rows produced by the intermediate query operations. I can see there is a "Download" ...

Latest Reply
artsheiko
Databricks Employee
  • 3 kudos

@guizsantos, the Query History list API provides metrics (see include_metrics), and an executed query's definition can be seen via the query history system table.

2 More Replies
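
A hedged sketch of calling the Query History list API with include_metrics, which returns per-query metrics such as row counts; host and token are placeholders, and field names may vary by API version.

    import requests

    HOST = "https://<workspace>.cloud.databricks.com"   # placeholder
    TOKEN = "<personal-access-token>"                   # placeholder

    resp = requests.get(
        f"{HOST}/api/2.0/sql/history/queries",
        headers={"Authorization": f"Bearer {TOKEN}"},
        params={"include_metrics": "true", "max_results": 25},
    )
    resp.raise_for_status()
    for q in resp.json().get("res", []):
        print(q.get("query_id"), q.get("metrics", {}).get("rows_produced_count"))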
seefoods (Valued Contributor)
  • 1689 Views
  • 1 reply
  • 1 kudos

Resolved! python task

Hello guys, I have defined an asset bundle which has a rule to run a Python task. This task has some parameters, so how can I interact with them using argparse? Cordially,

Latest Reply
SP_6721
Honored Contributor II
  • 1 kudos

Hi @seefoods, in your asset bundle YAML, define the parameters using the named_parameters field, for example like this:
tasks:
  - task_key: python_task
    python_wheel_task:
      entry_point: main
      named_parameters:
        input_path: "/data/input...

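
On the argparse side of the question, a minimal sketch of a wheel entry point that picks up a named parameter from the bundle; the input_path name is a placeholder matching the YAML key above.

    import argparse

    def main():
        parser = argparse.ArgumentParser()
        # named_parameters arrive as --key value pairs on the command line
        parser.add_argument("--input_path", required=True)
        args = parser.parse_args()
        print(f"Reading from {args.input_path}")

    if __name__ == "__main__":
        main()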
mkwparth (Databricks Partner)
  • 1990 Views
  • 4 replies
  • 1 kudos

Spark Failed to start: Driver unresponsive

Hi everyone, I'm encountering an intermittent issue when launching a Databricks pipeline cluster. Error message: com.databricks.pipelines.common.errors.deployment.DeploymentException: Failed to launch pipeline cluster xxxx-xxxxxx-ofgxxxxx: Attempt to la...

Latest Reply
Gopichand_G
Databricks Partner
  • 1 kudos

I have personally witnessed these kinds of issues. As far as I have seen, these failures usually happen because the driver node is unavailable or unresponsive: you might have hit maximum CPU or memory usage, or maybe your cache utili...

3 More Replies