Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

radix
by New Contributor II
  • 1191 Views
  • 1 replies
  • 0 kudos

Pool clusters and init scripts

Hey, just trying out pool clusters and providing the instance_pool_type and driver_instance_pool_id configuration to the Airflow new_cluster field. I also pass the init_scripts field with an s3 link as usual, but in this case of pool clusters it doesn't...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

When using a non-pool cluster, are you able to see the init script being deployed? You could enable init script logging to see whether it is being called at all: https://docs.databricks.com/en/init-scripts/logs.html

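For reference, a minimal sketch of what the Airflow new_cluster payload might look like with both pool settings and init scripts, plus cluster_log_conf so the init-script logs mentioned in the reply land somewhere inspectable. Field names follow the Databricks Jobs API; the pool IDs and S3 paths are placeholders, not real values:

```python
# Hypothetical new_cluster payload for Airflow's DatabricksSubmitRunOperator.
# Pool IDs and S3 destinations below are placeholders.
new_cluster = {
    "spark_version": "15.4.x-scala2.12",
    "num_workers": 2,
    "instance_pool_id": "pool-1234-workers",        # worker pool (placeholder)
    "driver_instance_pool_id": "pool-1234-driver",  # driver pool (placeholder)
    "init_scripts": [
        {"s3": {"destination": "s3://my-bucket/init/install_deps.sh",
                "region": "us-east-1"}}
    ],
    # Ship cluster and init-script logs to S3 so you can verify the script ran.
    "cluster_log_conf": {
        "s3": {"destination": "s3://my-bucket/cluster-logs",
               "region": "us-east-1"}
    },
}

# Sanity check: pool config and init scripts coexist in the same spec.
assert "init_scripts" in new_cluster and "instance_pool_id" in new_cluster
```

With cluster_log_conf set, the init-script stdout/stderr described in the linked docs should appear under the log destination, which makes it easier to tell whether the script ran at all.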
Direo
by Contributor II
  • 1494 Views
  • 1 replies
  • 0 kudos

Managing Secrets for Different Groups in a Databricks Workspace

Hi everyone, I'm looking for some advice on how people are managing secrets within Databricks when you have different groups (or teams) in the same workspace, each requiring access to different sets of secrets. Here's the challenge: we have multiple gro...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Managing secrets within Databricks when you have different groups or teams in the same workspace can be approached in several ways, each with its own advantages. Here are some best practices and methods based on the context provided: Using Azure Key...

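One common pattern (a sketch, not an official recipe) is one secret scope per team, with an ACL granting each group READ on only its own scope. The team and scope names below are made up; the command shapes follow the Databricks CLI `secrets create-scope` / `secrets put-acl` subcommands:

```python
# Hypothetical helper that generates one scope per team plus the CLI commands
# to grant each group READ access to its own scope only. Team names are
# placeholders for your workspace groups.
teams = ["data-eng", "ml", "analytics"]

def acl_commands(teams):
    cmds = []
    for team in teams:
        scope = f"{team}-secrets"
        cmds.append(f"databricks secrets create-scope {scope}")
        cmds.append(f"databricks secrets put-acl {scope} {team} READ")
    return cmds

for cmd in acl_commands(teams):
    print(cmd)
```

Because ACLs are per scope, a notebook run by a member of one group can read only that group's secrets; requests against another team's scope fail with a permission error.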
mjedy78
by New Contributor II
  • 724 Views
  • 3 replies
  • 0 kudos

How to enable AQE in foreachbatch mode

I am processing the daily data from checkpoint to checkpoint every day, using foreachBatch in a streaming job: df.writeStream.format("delta").option("checkpointLocation", "dbfs/loc").foreachBatch(transform_and_upsert).outpu...

Latest Reply
mjedy78
New Contributor II
  • 0 kudos

@MuthuLakshmi any idea?

2 More Replies
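Each foreachBatch invocation runs as an ordinary batch write, so one approach (a sketch, assuming a runtime where AQE applies to batch queries; whether it actually kicks in depends on the DBR version) is to set the AQE confs on the session inside the batch function:

```python
# Standard Spark SQL conf names for AQE; whether AQE takes effect inside
# foreachBatch depends on the runtime version (assumption).
AQE_CONF = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
}

def transform_and_upsert(batch_df, batch_id):
    spark = batch_df.sparkSession  # session driving this micro-batch
    for key, value in AQE_CONF.items():
        spark.conf.set(key, value)
    # ... transform batch_df and MERGE into the target Delta table here ...

# Usage (as in the post, shown for context):
# (df.writeStream.format("delta")
#    .option("checkpointLocation", "dbfs/loc")
#    .foreachBatch(transform_and_upsert)
#    .start())
```

Setting the confs inside the function rather than once on the driver keeps them applied even if the session configuration is reset between batches.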
niruban
by New Contributor II
  • 2541 Views
  • 3 replies
  • 0 kudos

Databricks Asset Bundle to deploy only one workflow

Hello Community - I am trying to deploy only one workflow from my CICD. But whenever I try to deploy one workflow using "databricks bundle deploy - prod", it deletes all the existing workflows in the target environment. Is there any option av...

Data Engineering
CICD
DAB
Databricks Asset Bundle
DevOps
Latest Reply
nvashisth
New Contributor III
  • 0 kudos

Hi Team, a deployment via DAB (Databricks Asset Bundle) reads all the yml files present, and the workflows are generated based on that. In versions of the Databricks CLI prior to 0.236 (or the latest one), it used to delete all the workflows by making dele...

2 More Replies
sangwan
by New Contributor
  • 601 Views
  • 1 replies
  • 0 kudos

Issue: 'Catalog hive_metastore doesn't exist. Create it?' Error When Installing Reconcile

Utility: Remorph (Databricks). Issue: 'Catalog hive_metastore doesn't exist. Create it?' error when installing Reconcile. I am encountering this issue while installing Reconcile on Databricks. Even though the hive_metastore catalog is present by default in the Da...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Hi @sangwan , it's not very clear: does the error come with a stack trace? If so, could you please share it? Also, are there any WARN/ERROR messages in the driver log by any chance?

oliverw
by New Contributor II
  • 1045 Views
  • 3 replies
  • 0 kudos

Structured Streaming QueryProgressEvent Metrics incorrect

Hi All, I've been working on implementing a custom StreamingQueryListener in pyspark to enable integration with our monitoring solution. I've had quite a lot of success with this on multiple different streaming pipelines; however, on the last set I've ...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Hi @oliverw , I believe this will require some logs and information correlation, could you please raise a support ticket for the same? Sharing further details here may expose some sensitive data, hence a ticket would be more appropriate. Looking forw...

2 More Replies
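When debugging listener metrics like this, it can help to dump the raw progress JSON from the listener and diff it against what the monitoring sink receives. A sketch of a pure helper (the field names follow the structured-streaming progress schema; the sample payload is made up):

```python
import json

def extract_metrics(progress_json: str) -> dict:
    """Pick headline metrics out of a StreamingQueryProgress JSON blob,
    e.g. the string from event.progress.json in onQueryProgress."""
    p = json.loads(progress_json)
    return {
        "batchId": p.get("batchId"),
        "numInputRows": p.get("numInputRows"),
        "inputRowsPerSecond": p.get("inputRowsPerSecond"),
    }

# Example progress payload (abridged, made-up values):
sample = '{"batchId": 7, "numInputRows": 1200, "inputRowsPerSecond": 40.0}'
print(extract_metrics(sample))
```

Logging the extracted dict alongside the full JSON per batch makes it easier to spot whether the listener is reporting wrong numbers or the monitoring pipeline is mangling them.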
brokeTechBro
by New Contributor II
  • 574 Views
  • 2 replies
  • 0 kudos

Bug Community Edition Sign Up Error

Please help here. Bug: Community Edition sign-up error - "an error occurred, please try again later". I am frustrated.

Latest Reply
GSam
New Contributor II
  • 0 kudos

@gchandra The issue is still there. Tried it on multiple browsers (incognito and otherwise) and on multiple devices on different networks. Still unable to sign up after 2 days of trying.

1 More Replies
Miasu
by New Contributor II
  • 2036 Views
  • 2 replies
  • 0 kudos

Unable to analyze external table | FileAlreadyExistsException

Hello experts, there's a csv file, "nyc_taxi.csv", saved under users/myfolder on DBFS, and I used this file to create 2 tables: 1. nyc_taxi: created using the UI; it appeared as a managed table saved under dbfs:/user/hive/warehouse/mydatabase.db/nyc...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Did you initially want to create an external or a managed table? Just trying to understand your intent for the file.

1 More Replies
RantoB
by Valued Contributor
  • 25447 Views
  • 8 replies
  • 7 kudos

Resolved! read csv directly from url with pyspark

I would like to load a csv file directly into a Spark DataFrame in Databricks. I tried the following code: url = "https://opendata.reseaux-energies.fr/explore/dataset/eco2mix-national-tr/download/?format=csv&timezone=Europe/Berlin&lang=fr&use_labels_fo...

Latest Reply
anwangari
New Contributor II
  • 7 kudos

Hello, it's the end of 2024 and I still have this issue with Python. As mentioned, the sc method no longer works. Also, working with volumes within "/databricks/driver/" is not supported in Apache Spark. ALTERNATIVE SOLUTION: use requests to download the file fr...

7 More Replies
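The workaround the last reply describes (download first, then read the local copy) can be sketched with the standard library alone; the Spark read is shown commented since it needs a live session, and the temp-path handling is just one illustrative choice:

```python
import os
import tempfile
import urllib.request

def local_path_for(url: str) -> str:
    """Derive a temp-file path from the URL, with the query string stripped."""
    name = os.path.basename(url.split("?", 1)[0]) or "download.csv"
    return os.path.join(tempfile.gettempdir(), name)

def fetch_csv(url: str) -> str:
    """Download the CSV over HTTP and return the driver-local path."""
    dest = local_path_for(url)
    urllib.request.urlretrieve(url, dest)
    return dest

# Then read the driver-local file with Spark, e.g.:
# df = (spark.read.option("header", True)
#            .csv("file://" + fetch_csv(url)))
```

Since the download lands on the driver's local disk, the `file://` prefix is needed so Spark does not interpret the path as DBFS; for large files, copying to cloud storage first scales better.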
abaghel
by New Contributor II
  • 637 Views
  • 2 replies
  • 0 kudos

Azure application insights logging not working after upgrading cluster to databricks runtime 14.x

I have a basic code setup to read a stream from a Delta table and write it into another Delta table. I am using logging to send logs to Application Insights. However, within the foreachBatch function, the logs I write are not being sent to Applicatio...

Latest Reply
abaghel
New Contributor II
  • 0 kudos

@MuthuLakshmi  Thank you for getting back to me. I have read the article and understand that "Any files, modules, or objects referenced in the function must be serializable and available on Spark." However, based on the code provided, can you help me...

1 More Replies
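A common cause (an assumption, in line with the serializability note quoted in the reply) is that a logger configured at module level on the driver does not carry its handlers into the function Spark serializes for foreachBatch. One sketch is to attach the handler inside the batch function itself; the Application Insights handler would go where the placeholder comment is:

```python
import logging

def get_batch_logger(name: str = "stream_batch") -> logging.Logger:
    """Build or fetch a logger inside the batch function, so handler setup
    happens where the code actually runs rather than at import time."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking duplicate handlers per batch
        handler = logging.StreamHandler()  # placeholder: swap in your
        # Application Insights handler here (e.g. an AzureLogHandler)
        handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger

def process_batch(batch_df, batch_id):
    log = get_batch_logger()
    log.info("processing batch %s", batch_id)
    # ... write batch_df to the target Delta table ...
```

The `if not logger.handlers` guard matters because foreachBatch calls the function once per micro-batch; without it, every batch would add another handler and duplicate each log line.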
None123
by New Contributor III
  • 8223 Views
  • 3 replies
  • 3 kudos

Open a Support Ticket

Anyone know how to submit a support ticket? I keep getting into a loop that takes me back to the community page, but I need to submit an urgent ticket. I'm told our company pays a ridiculous sum for this feature, yet it is impossible to find. Thanks ...

Latest Reply
vickytscv
New Contributor II
  • 3 kudos

Hi Team, we are working with the Adobe tool for campaign metrics, which needs to pull data from AEP using the explode option. When we pass a query it takes a long time and performance is also very slow. Is there any better way to pull data from AEP? Please le...

2 More Replies
cltj
by New Contributor III
  • 11743 Views
  • 5 replies
  • 2 kudos

Experiences using managed tables

We are looking into the use of managed tables on Databricks. As this decision won't be easy to reverse, I am reaching out to all of you fine folks to learn more about your experience with using this. If I understand correctly, we don't have to deal with ...

Latest Reply
JimmyEatBrick
Databricks Employee
  • 2 kudos

Databricks recommends ALWAYS using managed tables UNLESS: your tables are not Delta, or you explicitly need to have the table files in a specific location. Managed tables are just better... Databricks manages: the upgrades (Deletion Vectors? Column M...

4 More Replies
PaoloF
by New Contributor II
  • 339 Views
  • 1 replies
  • 0 kudos

Lakehouse Federation roadmap

Hi all, is there a roadmap to increase the number of sources available for Lakehouse Federation? I'm interested to know if and when it will be possible to create a foreign catalog with MariaDB. Thanks

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hello @PaoloF, as of now there is no ETA on adding MariaDB to Lakehouse Federation. I will raise an internal feature request for this source to be considered for implementation. As for the roadmap, it might need to be followed up with the account...

ImranA
by Contributor
  • 1302 Views
  • 5 replies
  • 3 kudos

Resolved! Schema issue when dropping a delta live table

For example, there is a table called "cars". If I remove the table from the DLT pipeline and drop it from the catalog, and now change the schema of the table and create it again using the same table name "cars" through the same pipeline, why...

Latest Reply
gchandra
Databricks Employee
  • 3 kudos

https://docs.databricks.com/en/ingestion/cloud-object-storage/auto-loader/schema.html#how-does-auto-loader-schema-evolution-work https://docs.databricks.com/en/ingestion/cloud-object-storage/auto-loader/patterns.html#enable-easy-etl

4 More Replies
charl-p-botha
by New Contributor III
  • 638 Views
  • 1 replies
  • 1 kudos

Thank you for the "setting up tables" speed-up from dlt release 2024.42.rc0 to 2024.44.rc1

Dear Databricks people, we are currently measuring DLT performance and cost on a medallion architecture with 150 to 300 tables, and we're interested in adding even more tables. I've been doing automated incremental streaming DLT pipelines every 3 hours...

Latest Reply
VZLA
Databricks Employee
  • 1 kudos

Hi @charl-p-botha , thank you for sharing your experience with Delta Live Tables (DLT) performance improvements. We are pleased to hear that the upgrade to dlt:15.4.4-delta-pipelines-dlt-release-2024.44-rc1-commit-1a62345-ima...

