Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

Jeewan
by New Contributor
  • 819 Views
  • 0 replies
  • 0 kudos

Partition in Spark with a subquery which includes UNION

I have a SQL query like this: select ... from table1 where id in (select id from table1 where (some condition) UNION select id from table2 where (some condition)). I have made a partition of 200 where the upper bound is 200 and the lower bound is 0 and p...
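Since the excerpt is truncated, here is a minimal sketch of how a UNION subquery can be combined with a partitioned JDBC read; the JDBC URL, table names, and filter conditions below are all hypothetical placeholders, not the poster's actual setup:

```python
# Wrap the UNION filter in a derived table so it can be used as the "dbtable"
# option of a partitioned JDBC read. All names below are illustrative.
subquery = (
    "(SELECT t1.* FROM table1 t1"
    " WHERE t1.id IN ("
    "   SELECT id FROM table1 WHERE status = 'A'"
    "   UNION"
    "   SELECT id FROM table2 WHERE status = 'B')) AS filtered"
)

# Options for spark.read.format("jdbc").options(**jdbc_options).load():
# the read is split into numPartitions parallel queries that stride over
# [lowerBound, upperBound] on partitionColumn.
jdbc_options = {
    "url": "jdbc:postgresql://example-host:5432/demo",  # placeholder URL
    "dbtable": subquery,
    "partitionColumn": "id",
    "lowerBound": "0",
    "upperBound": "200",
    "numPartitions": "200",
}
```

Note that with bounds 0-200 and 200 partitions, each partition covers roughly one id value; the bounds only control the stride, so rows with ids outside the range still land in the first or last partition rather than being filtered out.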

Prashanth24
by New Contributor III
  • 2605 Views
  • 3 replies
  • 3 kudos

Resolved! Databricks workflow each task cost

Suppose we have 4 tasks (3 notebooks and 1 plain Python script) in a workflow; I would like to know the cost incurred by each task in the Databricks workflow. Please let me know if there is any way to find out these details.

Latest Reply
Edthehead
Contributor III
  • 3 kudos

If the tasks share the same cluster then no, you cannot differentiate the costs between the tasks. However, if you set up each task with its own job cluster and pass some custom tags, you can then differentiate/report the costs ...
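As a hedged illustration of the tagging approach described above, a Jobs API-style task definition might attach a custom tag to its own job cluster; the runtime version, node type, and tag names below are placeholders, not a definitive setup:

```python
# One task with its own job cluster; the custom tag later shows up in
# billing/usage data so costs can be grouped per task. Values are illustrative.
task = {
    "task_key": "transform_step",  # hypothetical task name
    "notebook_task": {"notebook_path": "/Workspace/demo/transform"},
    "new_cluster": {
        "spark_version": "15.4.x-scala2.12",  # assumption: pick your runtime
        "node_type_id": "Standard_DS3_v2",    # assumption: cloud-specific
        "num_workers": 2,
        "custom_tags": {"cost_center": "transform_step"},
    },
}
```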

2 More Replies
guangyi
by Contributor III
  • 763 Views
  • 0 replies
  • 0 kudos

Confused about large memory usage of cluster

We set up a demo DLT pipeline with no data involved: @dlt.table( name="demo" ) def sample(): df = spark.sql("SELECT 'silver' as Layer") return df However, when we check the metrics of the cluster, it looks like 10GB of memory has already be...

DBMIVEN
by New Contributor II
  • 1036 Views
  • 0 replies
  • 0 kudos

Ingesting data from SQL Server foreign tables

I have created a connection to a SQL Server DB and set up a catalog for it. I can now view all the tables and query them. I want to ingest some of the tables into our ADLS Gen2 storage that we set up with Unity Catalog. What is the best approach here? Lak...

Data Engineering
Data ingestion
Foreign catalogs
Incremental Data Ingestion
LakeFlow
SQL Server
ayush19
by New Contributor III
  • 1447 Views
  • 1 replies
  • 0 kudos

Running a jar on a Databricks cluster from Airflow

Hello, I have a jar file which is installed on a cluster. I need to run this jar from Airflow using DatabricksSubmitRunOperator. I followed the standard instructions as available in the Airflow docs: https://airflow.apache.org/docs/apache-airflow-providers-...
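For context, a minimal sketch of the `json` payload such an operator call might take for a jar task; the jar path, main class, and cluster spec are hypothetical placeholders:

```python
# Payload in the shape accepted by the Jobs runs/submit API (and by
# DatabricksSubmitRunOperator's `json` argument). All values are placeholders.
submit_run_payload = {
    "run_name": "run-my-jar",
    "new_cluster": {
        "spark_version": "15.4.x-scala2.12",  # assumption
        "node_type_id": "i3.xlarge",          # assumption
        "num_workers": 1,
    },
    "libraries": [{"jar": "dbfs:/FileStore/jars/my_app.jar"}],  # hypothetical path
    "spark_jar_task": {
        "main_class_name": "com.example.Main",  # hypothetical entry point
        "parameters": ["--date", "2024-08-01"],
    },
}
# In the DAG:
# DatabricksSubmitRunOperator(task_id="run_jar", json=submit_run_payload)
```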

ruoyuqian
by New Contributor II
  • 2295 Views
  • 0 replies
  • 0 kudos

dbt writing into a different schema

I have a Unity Catalog that goes like `catalogname.schemaname1` and `catalogname.schemaname2`, and I am trying to write tables into schemaname2 with dbt. The current setup in the dbt profiles.yml is prj_dbt_databricks: outputs: dev: cata...
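For readers hitting the same truncated excerpt, a hypothetical dbt-databricks profiles.yml targeting a specific schema might look like this; the host, http_path, and names are placeholders:

```yaml
# Hypothetical profiles.yml sketch for dbt-databricks; all values are placeholders.
prj_dbt_databricks:
  target: dev
  outputs:
    dev:
      type: databricks
      catalog: catalogname
      schema: schemaname2   # default schema for models in this target
      host: adb-1234567890123456.7.azuredatabricks.net
      http_path: /sql/1.0/warehouses/abc123def456
      token: "{{ env_var('DBT_DATABRICKS_TOKEN') }}"
```

Note that when a model also sets a custom `schema` config, dbt's default `generate_schema_name` macro appends it to the target schema rather than replacing it, which is a common source of tables landing in an unexpected schema.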

Fernando_Messas
by New Contributor II
  • 12634 Views
  • 6 replies
  • 3 kudos

Resolved! Error writing data to Google Bigquery

Hello, I'm facing some problems while writing data to Google BigQuery. I'm able to read data from the same table, but when I try to append data I get the following error. Error getting access token from metadata server at: http://169.254.169.254/compu...

Latest Reply
asif5494
New Contributor III
  • 3 kudos

Sometimes this error occurs when your private key or service account key is not going into the request header. So if you are using Spark or Databricks, you have to configure the JSON key in the Spark config so it will be added to the request header.
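A minimal sketch of that idea, assuming the Spark BigQuery connector's `credentials` option (which takes a base64-encoded service-account key); the project and table names are placeholders:

```python
import base64

def encode_service_account_key(key_json: bytes) -> str:
    # Base64-encode a service-account JSON key for the connector's
    # "credentials" option.
    return base64.b64encode(key_json).decode("utf-8")

# Placeholder options; passing the key explicitly avoids falling back to the
# GCE metadata server (the 169.254.169.254 endpoint in the error above).
bq_options = {
    "table": "my_project.my_dataset.my_table",  # placeholder
    "parentProject": "my_project",              # placeholder
    "credentials": encode_service_account_key(b'{"type": "service_account"}'),
}
# Inside Databricks:
# df.write.format("bigquery").options(**bq_options).mode("append").save()
```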

5 More Replies
colette_chavali
by Databricks Employee
  • 2439 Views
  • 1 replies
  • 6 kudos

Nominations are OPEN for the Databricks Data Team Awards!

Databricks customers - nominate your data team and leaders for one (or more) of the six Data Team Award categories: Data Team Transformation Award, Data Team for Good Award, Data Team Disruptor Award, Data Team Democratization Award, Data Team Visionary Awar...

Data Team Awards
Latest Reply
Sai_Mani
New Contributor II
  • 6 kudos

Hello! Where can I find more details about award nomination requirements, eligibility criteria, application entry and deadline dates for nominations? Judging criteria?

tobi
by New Contributor III
  • 2212 Views
  • 3 replies
  • 2 kudos

Deleted workspace

Hello guys, I have a question. We have Databricks on GCP; we forgot to pay for the subscription and they removed our workspace. I had code notebooks in that workspace. Is there any way to recover this code? Or maybe it automatically saved this cod...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @tobi, Unfortunately, if the data was stored directly within the workspace and not backed up externally, there is not much you can do. Once a Databricks subscription is cancelled, all workspaces associated with that account are deleted and this dele...

2 More Replies
mannepk85
by New Contributor III
  • 760 Views
  • 0 replies
  • 0 kudos

Databricks academy courses are defaulting to hive metastore

So far, I have started 2 Databricks Academy courses. In both courses, the default location where the schema is created is hive-metastore. In my org, hive metastore is blocked and we have been asked to use Unity Catalog. Is there a way the course material in dat...

CaptainJack
by New Contributor III
  • 5505 Views
  • 4 replies
  • 1 kudos

Get taskValue from job as task, and then pass it to next task.

I have a workflow like this. Task 1: a job as a task. Inside this job there is a task which sets parameter x as a taskValue using dbutils.jobs.taskValues.set. Task 2: dependent on the previous job-as-a-task. I would like to access this parameter x. I tried t...
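For reference, the taskValues calls involved look like this — a sketch assuming Databricks' `dbutils` is available, with a hypothetical task key:

```python
# Upstream task (here, inside the child job) publishes a value:
#     dbutils.jobs.taskValues.set(key="x", value="42")
#
# A downstream task in the same job reads it back; debugValue is only used
# when running interactively outside a job:
#     x = dbutils.jobs.taskValues.get(taskKey="upstream_task", key="x", debugValue="42")
#
# Whether a value set inside a "job as a task" is visible to tasks of the
# parent job is exactly the open question in this thread.
```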

Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

I see, I have requested someone else to guide you on this. cc: @Retired_mod

3 More Replies
turtleXturtle
by New Contributor II
  • 1554 Views
  • 1 replies
  • 0 kudos

Delta share existing parquet files in R2

Hi - I have existing parquet files in Cloudflare R2 storage (created outside of Databricks).  I would like to share them via Delta Share, but I keep running into an error.  Is it possible to share existing parquet files without duplicating them?I did...

Latest Reply
turtleXturtle
New Contributor II
  • 0 kudos

Thanks @Retired_mod.  It's currently possible to share a delta table stored in an S3 external location without duplication or doing the `DEEP CLONE` first.  Is it on the roadmap to support this for R2 as well?

MYB24
by New Contributor III
  • 12407 Views
  • 6 replies
  • 0 kudos

Resolved! Error: cannot create mws credentials: invalid Databricks Account configuration

Good evening, I am configuring databricks_mws_credentials through Terraform on AWS. I am getting the following error: Error: cannot create mws credentials: invalid Databricks Account configuration │ with module.databricks.databricks_mws_credentials.t...

Data Engineering
AWS
credentials
Databricks
Terraform
Latest Reply
Alexandre467
New Contributor II
  • 0 kudos

Hello, I'm facing a similar issue. I tried to update my TF with proper authentication and I have this error: ╷ │ Error: cannot create mws credentials: failed visitor: context canceled │ with databricks_mws_credentials.this, │ on main.tf ...

5 More Replies
riccostamendes
by New Contributor II
  • 61474 Views
  • 3 replies
  • 0 kudos

Just a doubt: can we develop a Kedro project in Databricks?

I am asking this because up to now I have only seen examples of deploying a pre-existing Kedro project to Databricks in order to run some pipelines...

Latest Reply
noklam
New Contributor II
  • 0 kudos

Hi! Kedro dev here. You can certainly develop Kedro on Databricks; in fact we have a lot of Kedro projects running on Databricks. In the past there has been some friction, mainly because Kedro is project-based while Databricks focuses a lot on notebooks. T...

2 More Replies
