Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

hpant
by New Contributor III
  • 2617 Views
  • 9 replies
  • 7 kudos

Resolved! Where exactly should I create a Volume in a catalog?

Currently my Databricks looks like this: I want to create a volume to access an external location. Where exactly should I create it? Should I create a new schema in the "poe" catalog and create a volume inside it, or create it in an existing schema? What is the b...

Latest Reply
hpant1
New Contributor III

No, I don't have.  

8 More Replies
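For reference, a minimal sketch of one way to set this up, assuming an external location already exists; only the "poe" catalog name comes from the thread, while the schema, volume, and storage URL are illustrative:

```python
# A minimal sketch, not the accepted answer: create a dedicated schema and an
# external volume inside it (run in a Databricks notebook). Names are illustrative.
spark.sql("CREATE SCHEMA IF NOT EXISTS poe.landing")
spark.sql("""
    CREATE EXTERNAL VOLUME IF NOT EXISTS poe.landing.raw_files
    LOCATION 'abfss://container@storageaccount.dfs.core.windows.net/raw'
""")

# Files in the external location are then addressable under the /Volumes path.
display(dbutils.fs.ls("/Volumes/poe/landing/raw_files/"))
```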
juanicobsider
by New Contributor
  • 763 Views
  • 2 replies
  • 3 kudos

How to parse a VARIANT type column using PySpark syntax?

I'm trying to parse a VARIANT data type column. What is the correct syntax to parse sub-columns using PySpark, and is it possible? I'd like to know how to do it this way (I know how to do it using SQL syntax).

Latest Reply
Witold
Honored Contributor

As an addition to what @szymon_dybczak already said correctly: it's actually not a workaround; it's designed and documented that way. Make sure you understand the difference between `:` and `.`. Regarding PySpark, the API has other variant-relat...

1 More Replies
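For reference, a short PySpark sketch of the `:` path syntax and its function form; the table name and the fields inside the VARIANT column are assumptions, not from the thread:

```python
# A minimal sketch, assuming a table with a VARIANT column `data` holding
# objects like {"device": {"id": 1, "model": "x9"}}. Names are illustrative.
from pyspark.sql import functions as F

df = spark.table("main.demo.events")  # hypothetical table

parsed = df.select(
    # `:` extracts a field from a VARIANT value; `.` is for struct fields
    F.expr("data:device.id::int").alias("device_id"),
    # variant_get() is the function form of the same path extraction
    F.expr("variant_get(data, '$.device.model', 'string')").alias("device_model"),
)
parsed.show()
```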
tramtran
by Contributor
  • 2444 Views
  • 6 replies
  • 7 kudos

Make the job fail if a task fails

Hi everyone, I have a job with 2 tasks running independently. If one of them fails, the remaining task continues to run. I would like the job to fail if any task fails. Is there any way to do that? Thank you!

Latest Reply
Edthehead
Contributor II

Extending what @mhiltner has suggested, let's say you have 2 streaming tasks, streamA and streamB. Create 2 separate tasks, taskA and taskB. Each of these tasks should execute the same notebook, which makes an API call to the CANCEL RUN or CANCEL AL...

5 More Replies
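A rough sketch of the watchdog-notebook idea from the reply, intended to run inside a Databricks notebook (where dbutils is available); the secret scope/keys and the parameter carrying the parent run id are assumptions:

```python
# A minimal sketch: a notebook task that cancels the whole job run via the
# Jobs API when invoked, e.g. from an "on failure" path. Assumes the parent
# run id is passed in as a task parameter such as {{job.run_id}}.
import requests

host = dbutils.secrets.get("demo", "databricks_host")    # hypothetical secret scope/keys
token = dbutils.secrets.get("demo", "databricks_token")
run_id = dbutils.widgets.get("parent_run_id")             # e.g. set to {{job.run_id}}

resp = requests.post(
    f"{host}/api/2.1/jobs/runs/cancel",
    headers={"Authorization": f"Bearer {token}"},
    json={"run_id": int(run_id)},
    timeout=30,
)
resp.raise_for_status()
```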
DanR
by New Contributor II
  • 16697 Views
  • 4 replies
  • 3 kudos

PermissionError: [Errno 1] Operation not permitted: '/Volumes/mycatalog'

We are having intermittent errors where a Job Task cannot access a Catalog through a Volume, with the error: `PermissionError: [Errno 1] Operation not permitted: '/Volumes/mycatalog'`. The Job has 40 tasks running in parallel and every few runs we exp...

Data Engineering
Unity Catalog
Volumes
Latest Reply
NandiniN
Databricks Employee

It appears to be a concurrency limitation. There were fixes in the past, but there is a possibility it may be a new code flow. Adding a retry to the operation can mitigate the issue and work as a workaround, but you can report the issue with Datab...

3 More Replies
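A minimal sketch of the retry workaround mentioned in the reply; the path, attempt count, and backoff are illustrative:

```python
# Wrap the Volume access in a short retry loop to ride out the intermittent error.
import os
import time

def list_volume_with_retry(path="/Volumes/mycatalog/myschema/myvolume",
                           attempts=5, delay_s=2.0):
    for attempt in range(1, attempts + 1):
        try:
            return os.listdir(path)
        except PermissionError:
            if attempt == attempts:
                raise
            time.sleep(delay_s * attempt)  # simple linear backoff
```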
delta_bravo
by New Contributor
  • 6125 Views
  • 2 replies
  • 0 kudos

Cluster termination issue

I am using Databricks as a Community Edition user with a limited cluster (just 1 Driver: 15.3 GB Memory, 2 Cores, 1 DBU). I am trying to run some custom algorithms for continuous calculations and writing results to the delta table every 15 minutes al...

Latest Reply
NandiniN
Databricks Employee

If you set the "Terminate after" setting to 0 minutes during the creation of an all-purpose compute, it means that the auto-termination feature will be turned off. This is because the "Terminate after" setting is used to specify an inactivity period ...

1 More Replies
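For reference, a sketch of where that setting lives when creating a cluster programmatically with the Databricks SDK; the cluster name, runtime, and node type are placeholders, and Community Edition may not expose this API:

```python
# autotermination_minutes=0 disables auto-termination entirely; a non-zero value
# is the inactivity timeout in minutes. Values here are illustrative.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
w.clusters.create(
    cluster_name="long-running-demo",
    spark_version="15.4.x-scala2.12",
    node_type_id="i3.xlarge",
    num_workers=1,
    autotermination_minutes=120,  # set to 0 to turn auto-termination off
)
```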
curiousoctopus
by New Contributor III
  • 3695 Views
  • 4 replies
  • 4 kudos

Run multiple jobs with different source code at the same time with Databricks asset bundles

Hi, I am migrating from dbx to Databricks Asset Bundles. Previously with dbx I could work on different features in separate branches and launch jobs without the issue of one job overwriting the other. Now with Databricks Asset Bundles it seems like I can...

Latest Reply
mo_moattar
New Contributor III

We have the same issue. We might have multiple open PRs on the bundles that are deploying the code, pipelines, jobs, etc. to the same workspace before the merge, and they keep overwriting each other in the workspace. The jobs already have a separate ID ...

3 More Replies
narenderkumar53
by New Contributor II
  • 811 Views
  • 3 replies
  • 2 kudos

Can we parameterize the tags in the job compute?

I want to better monitor the cost of Databricks job computes. I am using tags on the cluster to monitor cost. The tag values are static as of now. Can we parameterize the job cluster compute so that I can pass the tag values during the runtime a...

Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @, if you're using ADF you can look at the article below: Applying Dynamic Tags To Databricks Job Clusters in Azure Data Factory | by Kyle Hale | Medium. If not, I think you can try to write some code that will use the endpoint below. The idea is, before exec...

2 More Replies
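A rough sketch of that idea: call the Jobs 2.1 update endpoint to rewrite the job cluster's custom_tags just before triggering the run. The host, token, job id, cluster spec, and tag values are placeholders:

```python
# A minimal sketch, not a complete solution: patch the job's cluster definition
# so its custom_tags carry run-specific values, then trigger the run as usual.
import requests

host = "https://<workspace-host>"        # placeholder
token = "<personal-access-token>"        # placeholder
job_id = 123                             # hypothetical job id

requests.post(
    f"{host}/api/2.1/jobs/update",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "job_id": job_id,
        "new_settings": {
            "job_clusters": [
                {
                    "job_cluster_key": "main_cluster",
                    "new_cluster": {
                        "spark_version": "15.4.x-scala2.12",
                        "node_type_id": "Standard_DS3_v2",
                        "num_workers": 2,
                        "custom_tags": {"cost_center": "team-a", "run_context": "nightly"},
                    },
                }
            ]
        },
    },
    timeout=30,
).raise_for_status()
```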
Jeewan
by New Contributor
  • 455 Views
  • 0 replies
  • 0 kudos

Partitioning in Spark with a subquery which includes a UNION

I have a SQL query like this: select ... from table1 where id in (select id from table1 where (some condition) UNION select id from table2 where (some condition)) table1. I have made a partition of 200 where the upper bound is 200 and the lower bound is 0 and p...

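Assuming this is a partitioned JDBC read (the lower/upper bound and partition count suggest the JDBC options), a minimal sketch of pushing the whole query down as a subquery; the connection details and filter conditions are placeholders:

```python
# Push the UNION subquery down to the source and let Spark split the read on `id`.
query = """
    (SELECT t1.* FROM table1 t1
     WHERE t1.id IN (SELECT id FROM table1 WHERE /* condition */ 1=1
                     UNION
                     SELECT id FROM table2 WHERE /* condition */ 1=1)) AS src
"""

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://<host>:5432/<db>")  # placeholder URL
    .option("dbtable", query)
    .option("partitionColumn", "id")
    .option("lowerBound", 0)
    .option("upperBound", 200)
    .option("numPartitions", 200)
    .option("user", "<user>")
    .option("password", "<password>")
    .load()
)
```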
Prashanth24
by New Contributor III
  • 1188 Views
  • 3 replies
  • 3 kudos

Resolved! Databricks workflow each task cost

Suppose we have 4 tasks (3 notebooks and 1 normal Python script) in a workflow; I would like to know the cost incurred for each task in the Databricks workflow. Please let me know any way to find out these details.

Latest Reply
Edthehead
Contributor II

If each of the tasks is sharing the same cluster then no, you cannot differentiate the costs between the tasks. However, if you set up each task to have its own job cluster, then pass some custom tags and you can then differentiate/report the costs ...

2 More Replies
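A rough sketch of the reporting side of that suggestion, querying the billing system table grouped by a custom tag; the tag name, job id, and column usage follow the system.billing.usage schema as I understand it, so treat them as assumptions:

```python
# Per-task DBU usage, assuming each task's job cluster was tagged with `task_name`.
per_task_usage = spark.sql("""
    SELECT
        usage_date,
        custom_tags['task_name'] AS task_name,
        SUM(usage_quantity)      AS dbus
    FROM system.billing.usage
    WHERE usage_metadata.job_id = '123456'   -- hypothetical job id
    GROUP BY usage_date, custom_tags['task_name']
    ORDER BY usage_date
""")
per_task_usage.show()
```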
guangyi
by Contributor III
  • 508 Views
  • 0 replies
  • 0 kudos

Confused about large memory usage of cluster

We set up a demo DLT pipeline with no data involved: @dlt.table(name="demo") def sample(): df = spark.sql("SELECT 'silver' as Layer"); return df. However, when we check the metrics of the cluster, it looks like 10GB of memory has already be...

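For readability, the snippet from the post restated as a runnable DLT definition (the decorator is lowercase `dlt`; the forum editor capitalized it):

```python
# The pipeline from the post: a single-row table, no external data is read.
import dlt

@dlt.table(name="demo")
def sample():
    return spark.sql("SELECT 'silver' AS Layer")
```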
DBMIVEN
by New Contributor II
  • 514 Views
  • 0 replies
  • 0 kudos

Ingesting data from SQL Server foreign tables

I have created a connection to a SQL Server DB and set up a catalog for it. I can now view all the tables and query them. I want to ingest some of the tables into our ADLS Gen2 that we set up with Unity Catalog. What is the best approach here? Lak...

Data Engineering
Data ingestion
Foreign catalogs
Incremental Data Ingestion
LakeFlow
SQL Server
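One minimal sketch of the manual approach, reading the foreign-catalog table and writing it to a Unity Catalog table backed by ADLS; all catalog, schema, and table names are illustrative, and Lakeflow/ingestion pipelines are the managed alternative:

```python
# Full-copy snapshot of a foreign-catalog table into a UC-managed table.
src = spark.table("sqlserver_cat.dbo.orders")   # hypothetical foreign catalog table

(
    src.write
    .mode("overwrite")
    .saveAsTable("main.bronze.orders_raw")       # UC table in your ADLS-backed catalog
)
```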
ayush19
by New Contributor III
  • 846 Views
  • 1 replies
  • 0 kudos

Running jar on Databricks cluster from Airflow

Hello, I have a JAR file which is installed on a cluster. I need to run this JAR from Airflow using DatabricksSubmitRunOperator. I followed the standard instructions available in the Airflow docs: https://airflow.apache.org/docs/apache-airflow-providers-...

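For reference, a rough sketch of a DAG using DatabricksSubmitRunOperator with a spark_jar_task; the connection id, cluster id, main class, and JAR path are placeholders, not taken from the post:

```python
# A minimal Airflow DAG that submits a JAR run to Databricks.
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

with DAG("run_databricks_jar", start_date=datetime(2024, 8, 1), schedule=None) as dag:
    run_jar = DatabricksSubmitRunOperator(
        task_id="run_jar",
        databricks_conn_id="databricks_default",
        existing_cluster_id="0801-123456-abcdefgh",               # placeholder cluster id
        spark_jar_task={"main_class_name": "com.example.Main"},   # placeholder class
        libraries=[{"jar": "dbfs:/FileStore/jars/my_job.jar"}],   # placeholder path
    )
```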
ruoyuqian
by New Contributor II
  • 1154 Views
  • 0 replies
  • 0 kudos

dbt writing into a different schema

I have a Unity Catalog and it goes like `catalogname.schemaname1` & `catalogname.schemaname2`, and I am trying to write tables into schemaname2 with dbt. The current setup in the dbt profiles.yml is: prj_dbt_databricks: outputs: dev: cata...

Fernando_Messas
by New Contributor II
  • 10228 Views
  • 6 replies
  • 3 kudos

Resolved! Error writing data to Google BigQuery

Hello, I'm facing some problems while writing data to Google BigQuery. I'm able to read data from the same table, but when I try to append data I get the following error: Error getting access token from metadata server at: http://169.254.169.254/compu...

Latest Reply
asif5494
New Contributor III

Sometimes this error occurs when your private key or your service account key is not going in the request header. If you are using Spark or Databricks, you have to configure the JSON key in the Spark config so it will be added to the request header.

5 More Replies
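A minimal sketch of the fix described in the reply, passing the service-account JSON key to the Spark BigQuery connector; it assumes an existing DataFrame `df` in a Databricks notebook, and the secret names, project, bucket, and table are placeholders:

```python
# Authenticate BigQuery writes by handing the connector the service-account key.
import base64

key_json = dbutils.secrets.get("demo", "bq_service_account_json")  # hypothetical secret
key_b64 = base64.b64encode(key_json.encode("utf-8")).decode("utf-8")

(
    df.write.format("bigquery")
    .option("credentials", key_b64)                  # base64-encoded key accepted by the connector
    .option("parentProject", "my-gcp-project")       # placeholder project
    .option("temporaryGcsBucket", "my-temp-bucket")  # staging bucket for writes
    .option("table", "dataset.table_name")
    .mode("append")
    .save()
)
```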
colette_chavali
by Databricks Employee
  • 1573 Views
  • 1 replies
  • 6 kudos

Nominations are OPEN for the Databricks Data Team Awards!

Databricks customers - nominate your data team and leaders for one (or more) of the six Data Team Award categories: Data Team Transformation Award, Data Team for Good Award, Data Team Disruptor Award, Data Team Democratization Award, Data Team Visionary Awar...

Data Team Awards
Latest Reply
Sai_Mani
New Contributor II

Hello! Where can I find more details about award nomination requirements, eligibility criteria, application entry & deadline dates for nominations? Judging criteria?

