Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ADB0513
by New Contributor III
  • 1364 Views
  • 0 replies
  • 0 kudos

Pass variable from one notebook to another

I have a main notebook where I set a Python variable to the name of the catalog I want to work in. I then call another notebook, using %run, which runs an INSERT INTO SQL command where I want to specify the catalog using the catalog v...
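A minimal sketch of one common way to do this (catalog, schema, and table names are hypothetical): because %run executes the child notebook in the same Python session, a variable defined in the main notebook is visible in the child and can be interpolated into spark.sql.

```python
# Main notebook (illustrative)
catalog_name = "dev_catalog"   # hypothetical catalog
# %run ./insert_notebook       # %run must sit alone in its own cell

# insert_notebook: catalog_name is inherited from the main notebook's session
spark.sql(f"USE CATALOG {catalog_name}")
spark.sql(f"""
    INSERT INTO {catalog_name}.my_schema.my_table   -- hypothetical target table
    SELECT * FROM my_schema.staging_view            -- hypothetical source
""")
```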

copper-carrot
by New Contributor II
  • 997 Views
  • 1 replies
  • 1 kudos

spark.sql() is suddenly giving an error "Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient"

spark.sql() is suddenly giving an error "Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient" on Databricks jobs and Python scripts that worked last month. No local changes on my end. What could be the cause of this and what sh...

neointab
by New Contributor
  • 419 Views
  • 1 replies
  • 0 kudos

How to restrict a group/user from creating unrestricted clusters

We have set up the entitlement, but it doesn't work. I checked the blogs; it also needs to be set up in a cluster policy, but I can't find how to set that up in a cluster policy. Could you give some suggestions?

Latest Reply
antonuzzo96
New Contributor III
  • 0 kudos

Hi, have you checked whether the users are admins inside the workspace? Being an admin can greatly change the policies and restrictions on the clusters.
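As a hedged sketch of the cluster-policy side (policy name, node type, and rules are illustrative): create a policy, grant the group CAN USE on it, and remove the group's unrestricted cluster creation entitlement so its members can only create clusters through that policy.

```python
import json
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Illustrative policy: pin the node type and cap autoscaling.
policy_definition = {
    "node_type_id": {"type": "fixed", "value": "Standard_DS3_v2"},
    "autoscale.max_workers": {"type": "range", "maxValue": 4},
}

policy = w.cluster_policies.create(
    name="restricted-clusters",
    definition=json.dumps(policy_definition),
)
print(policy.policy_id)

# Next (via the UI or the permissions API): grant the target group CAN USE on this
# policy and remove its "Unrestricted cluster creation" entitlement. Note that
# workspace admins bypass cluster policies, which matches the reply above.
```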

hpant1
by New Contributor III
  • 457 Views
  • 1 replies
  • 0 kudos

Does it make sense to create a volume at an external location in a dev environment?

I have created a dev resource group for Databricks which includes a storage account, an access connector and a Databricks workspace. In the storage account I have created a container which is linked to the metastore. This container also contains raw dat...

Latest Reply
antonuzzo96
New Contributor III
  • 0 kudos

Hi, for some use cases we have created external volumes in Databricks because the data needed to be accessed outside of Databricks, directly on the storage account, as the files had to interact with other tools.
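A hedged sketch of that setup (catalog, schema, volume, and storage path are illustrative): an external volume is created on top of an existing external location, so the same files stay reachable both from /Volumes and directly on the storage account.

```python
# Illustrative: the LOCATION must be covered by an existing external location.
spark.sql("""
    CREATE EXTERNAL VOLUME IF NOT EXISTS dev_catalog.raw.landing
    LOCATION 'abfss://raw@mystorageaccount.dfs.core.windows.net/landing'
""")

# The same files are then addressable from Databricks under a /Volumes path,
# while other tools keep reading and writing the container directly.
display(dbutils.fs.ls("/Volumes/dev_catalog/raw/landing"))
```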

hpant1
by New Contributor III
  • 489 Views
  • 1 replies
  • 2 kudos

What is the more optimized way of writing a Delta table in a workflow, "append" or "overwrite"?

What is the more optimized way of writing a Delta table in a workflow that runs every hour, "append" or "overwrite"?

Latest Reply
Witold
Honored Contributor
  • 2 kudos

There's no "optimized way", as these are two different concepts, and the choice depends on your use case: overwrite removes existing data, i.e. replaces it with new data, while append adds new data to your existing table.
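For illustration (all table names are hypothetical), the two modes side by side in an hourly job:

```python
# Append: keep what is already in the target and add this hour's rows.
(spark.table("staging.events_last_hour")          # hypothetical hourly slice
      .write.format("delta")
      .mode("append")
      .saveAsTable("main.analytics.events"))

# Overwrite: replace the target's contents entirely, e.g. a small reference snapshot.
(spark.table("staging.current_dimensions")        # hypothetical full snapshot
      .write.format("delta")
      .mode("overwrite")
      .saveAsTable("main.analytics.dimensions"))
```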

alesventus
by Contributor
  • 779 Views
  • 2 replies
  • 0 kudos

Resolved! How to handle loading 300 tables into Delta Lake

My task is to sync 300 tables from an on-prem SQL Server to Delta Lake. I will load CDC from Raw. The first step is to move the CDC data to bronze with Auto Loader. Then, using a Delta stream, get the changes from bronze, make simple datatype changes and merge this data...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @alesventus, you can apply a metadata/config-driven approach. You can create a control table (or a JSON/YAML file) with all the information required for processing, such as: table name, target table, table primary keys, transformation to apply. And then ...
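A minimal sketch of that config-driven idea (all table names, keys, and paths are hypothetical): loop over the control entries and run a streaming merge from bronze into each silver table.

```python
# Hypothetical control entries; in practice these would come from a control table
# or a JSON/YAML file, as suggested above.
table_configs = [
    {"source": "bronze.cdc_customers", "target": "silver.customers", "keys": ["customer_id"]},
    {"source": "bronze.cdc_orders",    "target": "silver.orders",    "keys": ["order_id"]},
]

def run_merge(cfg):
    changes = spark.readStream.table(cfg["source"])

    def upsert(batch_df, batch_id):
        # Merge each micro-batch into the target on the configured keys.
        batch_df.createOrReplaceTempView("changes_batch")
        on_clause = " AND ".join(f"t.{k} = s.{k}" for k in cfg["keys"])
        batch_df.sparkSession.sql(f"""
            MERGE INTO {cfg['target']} t
            USING changes_batch s
            ON {on_clause}
            WHEN MATCHED THEN UPDATE SET *
            WHEN NOT MATCHED THEN INSERT *
        """)

    (changes.writeStream
            .foreachBatch(upsert)
            .option("checkpointLocation", f"/Volumes/main/ops/checkpoints/{cfg['target']}")  # illustrative
            .trigger(availableNow=True)
            .start())

for cfg in table_configs:
    run_merge(cfg)
```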

1 More Replies
ahmed_zarar
by New Contributor III
  • 1307 Views
  • 2 replies
  • 3 kudos

Resolved! Process a single data set with different JSON schema rows using PySpark in Databricks

Hi, I am getting data from Event Hub and storing it in a Delta table as raw rows. The data I receive is JSON; the problem is that the data has a different schema in each row, but the code I use takes the first row's JSON schema. I am stuck on how to do this, please can anyone gui...

(screenshot attached)
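One common way to handle per-row schema differences (column names and schemas below are hypothetical) is to keep the payload as a raw JSON string, pull out a type discriminator with get_json_object, and then apply the matching schema per subset with from_json:

```python
from pyspark.sql import functions as F

raw = spark.table("bronze.eventhub_raw")   # hypothetical table; "body" holds the JSON string

# Extract the discriminator without needing a full schema.
typed = raw.withColumn("event_type", F.get_json_object("body", "$.eventType"))

# Hypothetical per-type schemas expressed as DDL strings.
schemas = {
    "order":    "orderId STRING, amount DOUBLE, ts TIMESTAMP",
    "customer": "customerId STRING, name STRING, ts TIMESTAMP",
}

# Parse each subset with its own schema and flatten the resulting struct.
parsed = {
    t: (typed.filter(F.col("event_type") == t)
             .withColumn("data", F.from_json("body", ddl))
             .select("event_type", "data.*"))
    for t, ddl in schemas.items()
}

parsed["order"].show()    # each subset can then be written to its own silver table
```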
Latest Reply
ahmed_zarar
New Contributor III
  • 3 kudos

Thank you, I got it.

1 More Replies
hpant
by New Contributor III
  • 2670 Views
  • 9 replies
  • 7 kudos

Resolved! Where exactly should I create a Volume in a catalog?

Currently my Databricks looks like this: I want to create a volume to access an external location. Where exactly should I create it? Should I create a new schema in the "poe" catalog and create a volume inside it, or create it in an existing schema? What is the b...

(screenshot attached)
Latest Reply
hpant1
New Contributor III
  • 7 kudos

No, I don't have.  

8 More Replies
juanicobsider
by New Contributor
  • 780 Views
  • 2 replies
  • 3 kudos

How to parse a VARIANT type column using PySpark syntax?

I am trying to parse a VARIANT data type column. What is the correct syntax to parse sub-columns using PySpark, and is it possible? I'd like to know how to do it this way (I know how to do it using SQL syntax).

(screenshots attached)
Latest Reply
Witold
Honored Contributor
  • 3 kudos

As an addition to what @szymon_dybczak already said correctly: it's actually not a workaround, it's designed and documented that way. Make sure that you understand the difference between `:` and `.`. Regarding PySpark, the API has other variant-relat...
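For reference, a hedged PySpark sketch (table and field names are hypothetical; the dedicated variant helpers require a runtime that ships them):

```python
from pyspark.sql import functions as F

df = spark.table("main.bronze.events")   # hypothetical table with a VARIANT column "payload"

# The SQL path syntax works from PySpark through expr / selectExpr.
df.select(
    F.expr("payload:customer.id::string").alias("customer_id"),
    F.expr("payload:items[0].price::double").alias("first_item_price"),
).show()

# On runtimes that expose the variant helpers in pyspark.sql.functions
# (e.g. variant_get / try_variant_get), the same extraction can skip expr:
df.select(
    F.try_variant_get("payload", "$.customer.id", "string").alias("customer_id")
).show()
```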

1 More Replies
tramtran
by Contributor
  • 2515 Views
  • 6 replies
  • 7 kudos

Make the job fail if a task fails

Hi everyone, I have a job with 2 tasks running independently. If one of them fails, the remaining task continues to run. I would like the job to fail if any task fails. Is there any way to do that? Thank you!

Latest Reply
Edthehead
Contributor II
  • 7 kudos

Extending what @mhiltner has suggested, let's say you have 2 streaming tasks, streamA and streamB. Create 2 separate tasks, taskA and taskB. Each of these tasks should execute the same notebook, which makes an API call to the CANCEL RUN or CANCEL AL...
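A hedged sketch of that pattern (the widget name, helper function, and dynamic value reference are illustrative): pass the parent run id into the task, and if the task's work fails, cancel the whole run so the sibling streaming task stops too.

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Hypothetical task parameter, e.g. wired up in the job as parent_run_id = {{job.run_id}}.
run_id = int(dbutils.widgets.get("parent_run_id"))

try:
    run_my_stream()   # hypothetical function containing this task's streaming work
except Exception:
    # Cancel the entire job run so the other task does not keep running,
    # then re-raise so this task (and the run) is reported as failed.
    w.jobs.cancel_run(run_id=run_id)
    raise
```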

5 More Replies
DanR
by New Contributor II
  • 16880 Views
  • 4 replies
  • 3 kudos

PermissionError: [Errno 1] Operation not permitted: '/Volumes/mycatalog'

We are having intermittent errors where a Job Task cannot access a Catalog through a Volume, with the error: `PermissionError: [Errno 1] Operation not permitted: '/Volumes/mycatalog'`. The Job has 40 tasks running in parallel and every few runs we exp...

Data Engineering
Unity Catalog
Volumes
Latest Reply
NandiniN
Databricks Employee
  • 3 kudos

It appears to be a concurrency limitation. There were fixes in the past, but there is a possibility it may be a new code flow; adding a retry to the operation can mitigate the issue and work as a workaround. But you can report the issue with Datab...
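A minimal sketch of the suggested retry workaround (the path and retry settings are illustrative):

```python
import os
import time

def list_volume_with_retry(path, attempts=5, delay=2.0):
    """Retry access to a /Volumes FUSE path that intermittently raises Errno 1."""
    for attempt in range(1, attempts + 1):
        try:
            return os.listdir(path)
        except PermissionError:
            if attempt == attempts:
                raise
            time.sleep(delay * attempt)   # back off a little longer on each retry

files = list_volume_with_retry("/Volumes/mycatalog/myschema/myvolume")  # illustrative path
```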

3 More Replies
delta_bravo
by New Contributor
  • 6217 Views
  • 2 replies
  • 0 kudos

Cluster termination issue

I am using Databricks as a Community Edition user with a limited cluster (just 1 Driver: 15.3 GB Memory, 2 Cores, 1 DBU). I am trying to run some custom algorithms for continuous calculations and writing results to the delta table every 15 minutes al...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

If you set the "Terminate after" setting to 0 minutes during the creation of an all-purpose compute, it means that the auto-termination feature will be turned off. This is because the "Terminate after" setting is used to specify an inactivity period ...
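On a full (non-Community) workspace, the same "Terminate after" setting corresponds to autotermination_minutes on the cluster spec; a hedged sketch with the Databricks SDK (names and sizes are illustrative):

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Illustrative all-purpose cluster that auto-terminates after 30 idle minutes;
# passing autotermination_minutes=0 would disable auto-termination entirely.
w.clusters.create(
    cluster_name="adhoc-analysis",
    spark_version="14.3.x-scala2.12",
    node_type_id="Standard_DS3_v2",
    num_workers=1,
    autotermination_minutes=30,
)
```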

1 More Replies
curiousoctopus
by New Contributor III
  • 3752 Views
  • 4 replies
  • 4 kudos

Run multiple jobs with different source code at the same time with Databricks asset bundles

Hi, I am migrating from dbx to Databricks asset bundles. Previously with dbx I could work on different features in separate branches and launch jobs without the issue of one job overwriting the other. Now with Databricks asset bundles it seems like I can...

Latest Reply
mo_moattar
New Contributor III
  • 4 kudos

We have the same issue. We might have multiple open PRs on the bundles that deploy the code, pipelines, jobs, etc. to the same workspace before the merge, and they keep overwriting each other in the workspace. The jobs already have a separate ID ...

3 More Replies
narenderkumar53
by New Contributor II
  • 826 Views
  • 3 replies
  • 2 kudos

Can we parameterize the tags in the job compute?

I want to better monitor the cost of the Databricks job computes. I am using tags on the cluster to monitor cost. The tag values are static as of now. Can we parameterize the job cluster compute so that I can pass the tag values at runtime a...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @, if you're using ADF you can look at the article below: Applying Dynamic Tags To Databricks Job Clusters in Azure Data Factory | by Kyle Hale | Medium. If not, I think you can try to write some code that uses the endpoint below. The idea is, before exec...
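A hedged sketch of the API-based idea (host, token handling, job id, cluster key, and tag values are all illustrative): partially update the job's cluster settings with new custom_tags before triggering the run.

```python
import requests

host = "https://<workspace-host>"                      # illustrative
token = dbutils.secrets.get("ops", "databricks_pat")   # hypothetical secret scope/key

new_settings = {
    "job_clusters": [{
        "job_cluster_key": "main_cluster",             # must match the key in the job definition
        "new_cluster": {
            "spark_version": "14.3.x-scala2.12",
            "node_type_id": "Standard_DS3_v2",
            "num_workers": 2,
            "custom_tags": {"cost_center": "analytics", "run_date": "2024-08-06"},
        },
    }]
}

resp = requests.post(
    f"{host}/api/2.1/jobs/update",
    headers={"Authorization": f"Bearer {token}"},
    json={"job_id": 123, "new_settings": new_settings},   # illustrative job id
)
resp.raise_for_status()
```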

2 More Replies
