Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by dh (New Contributor)
  • 308 Views
  • 1 reply
  • 0 kudos

Data Lineage without Spark, but with Polars (and Delta Lake) instead

Some context: I am completely new to Databricks; I have heard good stuff, but also some things that worry me. One thing that worries me is the performance (and eventual cost) of running Spark with smaller (sub-1TB) datasets. However, one requirement fr...

Latest Reply
VZLA (Databricks Employee)
  • 0 kudos

Hi @dh, thanks for your question! I believe it’s possible to run Polars with Delta Lake on Databricks, but automatic data lineage tracking is not native outside of Spark jobs. You would likely need to implement custom lineage tracking or integrate ext...
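
A minimal sketch of the Polars-with-Delta approach the reply describes, assuming the polars and deltalake packages are installed on the cluster; the table paths, the "amount" column, and the lineage record are placeholders, not an official lineage API:

    import polars as pl

    src = "/Volumes/main/default/landing/my_table"    # placeholder Delta table path
    dst = "/Volumes/main/default/curated/my_output"   # placeholder Delta table path

    df = pl.read_delta(src)                           # read a Delta table without Spark
    df.filter(pl.col("amount") > 0).write_delta(dst, mode="append")

    # Custom lineage: nothing is tracked automatically outside Spark jobs, so you
    # would record (source, target, timestamp) tuples yourself, e.g. in an audit table.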

by Mithos (New Contributor)
  • 53 Views
  • 1 reply
  • 0 kudos

Z-Cube tags not present in Databricks Delta tables

The design doc for Liquid Clustering for Delta refers to Z-Cubes to enable incremental clustering in batches. This is the link: https://docs.google.com/document/d/1FWR3odjOw4v4-hjFy_hVaNdxHVs4WuK1asfB6M6XEMw/edit?pli=1&tab=t.0. It is also mentioned th...

Latest Reply
VZLA (Databricks Employee)
  • 0 kudos

Hi @Mithos, thanks for the question! This is the OSS version of Liquid Clustering, applicable to OSS Delta. Databricks has a different implementation, so you won't be able to find Z-Cube tags in a liquid table written by DBR.
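
For contrast, a minimal sketch of how Liquid Clustering is enabled on Databricks, where the internal layout metadata (such as the OSS Z-Cube tags) is not surfaced; the table name and columns are placeholders:

    # Declare a liquid-clustered table; DBR manages the clustering layout internally.
    spark.sql("""
      CREATE TABLE IF NOT EXISTS main.default.events (id BIGINT, ts TIMESTAMP)
      CLUSTER BY (id)
    """)
    spark.sql("OPTIMIZE main.default.events")  # OPTIMIZE performs incremental clustering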

by templier2 (New Contributor)
  • 45 Views
  • 3 replies
  • 0 kudos

Log jobs stdout to an Azure Logs Analytics workspace

Hello, I have enabled sending cluster logs through mspnp/spark-monitoring, but I don't see stdout/stderr/log4j logs there. Is this supported?

Latest Reply
VZLA (Databricks Employee)
  • 0 kudos

Hi @templier2, if it works, it's not duct tape and chewing gum; it's a paperclip away from advanced engineering! You're right, I forgot this option is only there for AWS/S3. So yes, I think mount points are the current and only way.
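
A minimal sketch of the mount-point approach via the cluster spec (Clusters API); the destination path is a placeholder, and forwarding from the mount into Log Analytics is a separate step not shown here:

    # Fragment of a cluster definition: deliver stdout/stderr/log4j to a mounted path.
    cluster_log_conf = {
        "dbfs": {"destination": "dbfs:/mnt/loganalytics/cluster-logs"}  # placeholder mount
    }
    # Merge into the cluster spec, e.g.:
    # {"cluster_name": "...", "cluster_log_conf": cluster_log_conf, ...}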

2 More Replies
by Timmes0815 (Visitor)
  • 25 Views
  • 1 reply
  • 0 kudos

Set up Location using widget

I'm struggling to use the Databricks widget to set the location in a SQL CREATE TABLE statement. I tried the following: Step 1: Create a notebook (Notebook1) to define the variable. Location_Path = 'abfss:xxxxx@xxxx.xxx.ne...

Latest Reply
shan_chandra (Databricks Employee)
  • 0 kudos

Hi @Timmes0815 - could you please try the below example and let us know?

    Location_Path = dbutils.widgets.text("Location_Path","")
    Location_Path = dbutils.widgets.getArgument("Location_Path")
    display(Location_Path)
    dbutils.notebook.run("Notebook2", ...
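
Since the snippet above is truncated, here is a minimal runnable sketch of one way to pass the location between notebooks, using the standard dbutils.widgets.get; the storage path, table name, and notebook names are placeholders:

    # Notebook1: pass the value as a notebook parameter.
    Location_Path = "abfss://container@account.dfs.core.windows.net/path"  # placeholder
    dbutils.notebook.run("Notebook2", 60, {"Location_Path": Location_Path})

    # Notebook2: read the parameter via a widget and use it in SQL.
    dbutils.widgets.text("Location_Path", "")
    location = dbutils.widgets.get("Location_Path")
    spark.sql(f"CREATE TABLE IF NOT EXISTS my_table (id INT) USING DELTA LOCATION '{location}'")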

by jkb7 (New Contributor III)
  • 48 Views
  • 6 replies
  • 1 kudos

Resolved! Keep history of task runs in Databricks Workflows while moving a task from one job to another

We are using Databricks Asset Bundles (DAB) to orchestrate multiple workflow jobs, each containing multiple tasks. The execution schedule is managed at the job level, i.e., all tasks within a job start together. We often face the issue of rescheduling...

Latest Reply
Walter_C (Databricks Employee)
  • 1 kudos

You can submit it through https://docs.databricks.com/en/resources/ideas.html#ideas

5 More Replies
by adam_mich (New Contributor)
  • 60 Views
  • 4 replies
  • 0 kudos

How to Pass Data to a Databricks App?

I am developing a Databricks application using the Streamlit package. I was able to get a "hello world" app deployed successfully, but now I am trying to pass data that exists in DBFS on the same instance. I try to read a CSV saved to DBFS bu...

Latest Reply
Walter_C (Databricks Employee)
  • 0 kudos

What happens if you try to list the file using dbutils.fs.ls("dbfs:/mnt/path/to/data")? Does it list it?
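
As a quick check, a minimal sketch of that suggestion run from a notebook; the directory and the CSV file name are placeholders:

    # Confirm the file is actually visible at the expected DBFS path.
    for f in dbutils.fs.ls("dbfs:/mnt/path/to/data"):
        print(f.path, f.size)

    # Then try reading it the usual way.
    df = spark.read.csv("dbfs:/mnt/path/to/data/example.csv", header=True)
    display(df)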

3 More Replies
by vickytscv (New Contributor)
  • 56 Views
  • 3 replies
  • 0 kudos

Adobe query support from Databricks

Hi Team, we are working with the Adobe tool for campaign metrics, which needs to pull data from AEP using the explode option. When we pass a query it takes a long time and performance is also very poor. Is there a better way to pull data from AEP? Please le...

Latest Reply
jodbx (Databricks Employee)
  • 0 kudos

https://github.com/Adobe-Marketing-Cloud/aep-cloud-ml-ecosystem 

2 More Replies
by BenceCzako (Visitor)
  • 32 Views
  • 1 reply
  • 0 kudos

Databricks mount bug

Hello, I have a weird problem in Databricks for which I hope you can suggest a solution. I have an Azure ML blob storage mounted to Databricks with a folder structure that can be accessed from a notebook as /dbfs/mnt/azuremount/foo/bar/something.t...

Latest Reply
Alberto_Umana (Databricks Employee)
  • 0 kudos

Hi @BenceCzako, have you tried detaching and re-attaching the compute on the notebook? And what DBR version are you using?
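
A related quick check, offered as a hedged sketch rather than something from the thread: refreshing the cluster's mount metadata sometimes clears stale /dbfs/mnt paths:

    dbutils.fs.refreshMounts()      # force the cluster to re-read mount information
    display(dbutils.fs.mounts())    # confirm /mnt/azuremount is listed as expected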

by T_I (New Contributor II)
  • 45 Views
  • 4 replies
  • 0 kudos

Connect Databricks to Airflow

Hi, I have Databricks on top of AWS and a Databricks connection in Airflow (MWAA). I am able to connect and execute a Databricks job via Airflow using a personal access token. I believe the best practice is to connect using a service principal. I und...

Latest Reply
Alberto_Umana (Databricks Employee)
  • 0 kudos

Hi @T_I, instead of the PAT token you have to specify the settings below to be able to use the service principal. For workspace-level operations, set the following environment variables: DATABRICKS_HOST, set to the Databricks workspace URL, for exam...
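
A minimal sketch of that setup using the Databricks SDK for Python with OAuth machine-to-machine (service principal) auth; the host, client ID, and secret are placeholders:

    import os
    from databricks.sdk import WorkspaceClient

    os.environ["DATABRICKS_HOST"] = "https://<workspace>.cloud.databricks.com"
    os.environ["DATABRICKS_CLIENT_ID"] = "<service-principal-client-id>"
    os.environ["DATABRICKS_CLIENT_SECRET"] = "<oauth-secret>"

    w = WorkspaceClient()                 # picks up the environment variables above
    print(w.current_user.me().user_name) # sanity check: authenticates as the SP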

3 More Replies
by drag7ter (New Contributor III)
  • 40 Views
  • 3 replies
  • 0 kudos

Disable SSL for a federated connection on Amazon Redshift

Here is a doc on how to set up a connection and foreign catalog, but there is no mention of how to disable SSL for the connection: https://docs.databricks.com/en/query-federation/redshift.html. When I set up the connection and foreign catalog I get this error,...

Latest Reply
Alberto_Umana (Databricks Employee)
  • 0 kudos

It's missing the connection statement; could you please try:

    CREATE FOREIGN CATALOG redshift_catalog
    USING CONNECTION com.databricks.spark.redshift
    OPTIONS (
      dbtable '<table>',
      forward_spark_s3_credentials 'true',
      aws_iam_role 'arn:aws:iam::<your-r...
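
For reference, a hedged sketch of the two-step Lakehouse Federation flow from the linked doc, run as SQL from Python; the endpoint, secret scope, and database name are placeholders:

    # Step 1: create the connection object that holds credentials.
    spark.sql("""
      CREATE CONNECTION IF NOT EXISTS redshift_conn TYPE redshift
      OPTIONS (
        host '<redshift-endpoint>',
        port '5439',
        user secret('redshift_scope', 'user'),
        password secret('redshift_scope', 'password')
      )
    """)
    # Step 2: create the foreign catalog that mirrors the Redshift database.
    spark.sql("""
      CREATE FOREIGN CATALOG IF NOT EXISTS redshift_catalog
      USING CONNECTION redshift_conn
      OPTIONS (database '<database-name>')
    """)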

2 More Replies
by Phani1 (Valued Contributor II)
  • 6603 Views
  • 10 replies
  • 10 kudos

Set Delta Live Table name dynamically

Hi Team, can we pass a Delta Live Table name dynamically [from a configuration file, instead of hardcoding the table name]? We would like to build a metadata-driven pipeline.
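
A minimal sketch of a metadata-driven table name in DLT, assuming the name is supplied through a pipeline configuration key (here "conf.table_name", a placeholder), since dlt.table accepts a name parameter:

    import dlt

    # Read the table name from the pipeline configuration (set in the pipeline settings).
    table_name = spark.conf.get("conf.table_name", "fallback_table")

    @dlt.table(name=table_name)
    def dynamic_table():
        return spark.read.table("source_schema.source_table")  # placeholder source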

Latest Reply
bmhardy (New Contributor II)
  • 10 kudos

Is this post referring to Direct Publishing Mode? As we are multi-tenanted we have to have a separate schema per client, which currently means a single pipeline per client. This is not cost-effective at all, so we are very much reliant on DPM. I believ...

9 More Replies
by maikl (New Contributor III)
  • 54 Views
  • 4 replies
  • 0 kudos

Resolved! DABs job name must start with a letter or underscore

Hi, in the UI I used the pipeline name 00101_source_bronze. I wanted to do the same in Databricks Asset Bundles, but when the configuration is refreshed against the Databricks workspace I see this error. I found that this issue can be connected to Terraform v...

Latest Reply
Walter_C (Databricks Employee)
  • 0 kudos

As mentioned above, this is a limitation directly with Terraform; because of it, our engineering team is limited in the actions that can be taken. You can find more information about this limitation in the Terraform documentation: https://developer.hashic...

3 More Replies
by dcrezee (New Contributor III)
  • 254 Views
  • 1 reply
  • 0 kudos

Workflows: set maximum queued items

Hi all, I have a question regarding Workflows and the queuing of job runs. I'm running into a case where jobs run longer than expected and result in job runs being queued, which is expected and desired. However, in this particular case we only nee...

Latest Reply
Walter_C (Databricks Employee)
  • 0 kudos

Unfortunately, there is no way to control the number of jobs that will be moved to queued status when queueing is enabled.

by Steve_Harrison (New Contributor III)
  • 299 Views
  • 2 replies
  • 0 kudos

Invalid Path when getting Notebook Path

The undocumented feature to get a notebook path is great, but it does not actually return a valid path that can be used in Python, e.g.:

    from pathlib import Path
    print(Path(dbutils.notebook.entry_point.getDbutils().notebook().getContext().notebookPat...

Latest Reply
Steve_Harrison (New Contributor III)
  • 0 kudos

I actually think the major issue is that the above is undocumented and not supported. A supported and documented way of doing this would be much appreciated.
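
One common workaround, offered as a hedged sketch since the feature itself is undocumented: workspace files are exposed under /Workspace on the driver, so prefixing the returned path yields a usable filesystem path:

    from pathlib import Path

    nb_path = (
        dbutils.notebook.entry_point.getDbutils().notebook().getContext()
        .notebookPath().get()
    )                                         # e.g. "/Users/me/my_notebook"
    fs_path = Path("/Workspace") / nb_path.lstrip("/")
    print(fs_path)                            # /Workspace/Users/me/my_notebook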

1 More Replies
by manojpatil04 (New Contributor II)
  • 18 Views
  • 1 reply
  • 0 kudos

External dependency on a serverless job from Airflow is not working with an S3 path or workspace path

I am working on a use case where we have to run a Python script from a serverless job through Airflow. When we try to trigger the serverless job and pass the external dependency as a wheel from an S3 path or a workspace path, it does not work, but with a volume it ...

Latest Reply
Walter_C (Databricks Employee)
  • 0 kudos

It seems that it should be supported. Are you using the following format for the URI: { "whl": "s3://my-bucket/library.whl" }?
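
A hedged sketch of where that dependency would sit in a serverless job definition (Jobs API 2.1 style); the job name, script path, bucket, and environment key are placeholders:

    # Serverless tasks reference a shared environment; the wheel goes in its dependencies.
    job_settings = {
        "name": "serverless_script",
        "tasks": [{
            "task_key": "run_script",
            "spark_python_task": {"python_file": "/Workspace/Users/me/script.py"},
            "environment_key": "default",
        }],
        "environments": [{
            "environment_key": "default",
            "spec": {"client": "1", "dependencies": ["s3://my-bucket/library.whl"]},
        }],
    }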

