Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by Ricklen (New Contributor III)
  • 1391 Views
  • 1 reply
  • 1 kudos

VSCode Databricks Extension Performance

Hello everyone! I've been using the Databricks extension in VSCode for a while now and I'm syncing my repository to my Databricks workspace. In the beginning, syncing files to my workspace was basically instant. But now it is starting to take a lot of...

by alm (New Contributor III)
  • 986 Views
  • 1 reply
  • 0 kudos

Define SQL table name using Python

I want to control which schema a notebook writes to. I want it to depend on the user that runs the notebook. For now, the scope is to support the Python and SQL languages. I have written a Python function, `get_path`, that returns the full path of the destina...

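A minimal sketch of one way to do this (not from the thread): derive the schema from the current user and build the fully qualified table name in Python. The catalog, the user-to-schema rule, and the table name are placeholder assumptions, and `spark` is the ambient notebook session.

```python
# A minimal sketch, assuming `spark` is the ambient notebook session; the
# catalog, the user-to-schema mapping, and the table name are placeholders.
user = spark.sql("SELECT current_user()").first()[0]

# Hypothetical rule: some users write to a sandbox schema, everyone else
# to a shared one.
schema = "dev_sandbox" if user.endswith("@example.com") else "shared"
table_name = f"my_catalog.{schema}.events"

df = spark.range(10)  # stand-in for real data
df.write.mode("append").saveAsTable(table_name)

# The same fully qualified name can be handed to SQL as well.
spark.sql(f"SELECT COUNT(*) FROM {table_name}").show()
```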
by rajeevk (New Contributor)
  • 1272 Views
  • 1 reply
  • 0 kudos

Is there a %%capture or equivalent possible in a Databricks notebook?

I want to suppress all output of a cell, including text and chart plots. Is it possible to do this in Databricks? I am able to do the same in other notebook environments, but exactly the same does not work in Databricks. Any insight or even understandab...

Latest Reply by szymon_dybczak (Esteemed Contributor III)

Hi @rajeevk, one way is to use cell hiding: Databricks notebook interface and controls | Databricks on AWS

  • 0 kudos
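Beyond cell hiding, here is a minimal sketch of suppressing a cell's printed output with Python's standard library (an assumption, not from the reply); note it will not hide charts rendered by display().

```python
# A minimal sketch, assuming plain stdout/stderr output; display() charts
# are rendered by the notebook itself and are not captured this way.
import io
import contextlib

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer), contextlib.redirect_stderr(buffer):
    print("this text is captured instead of displayed")

captured = buffer.getvalue()  # inspect the suppressed output later if needed
```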
by Pawanukey12 (New Contributor)
  • 1063 Views
  • 1 reply
  • 0 kudos

How to get the details of a notebook, i.e. who is the owner of a notebook?

I am using Azure Databricks, and we have Git as a version control system along with it. How do I find out who created or owns a particular notebook?

Latest Reply by szymon_dybczak (Esteemed Contributor III)

Hi @Pawanukey12, there is no direct API to get the owner of a notebook using the notebook path in Databricks. However, you can manually check the owner of the notebook by the notebook name. You can manually go to the folder where the notebook is loca...

  • 0 kudos
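A rough sketch of the closest programmatic proxy (an assumption, not a confirmed "owner" API): read the notebook's access control list via the documented Workspace and Permissions REST endpoints; CAN_MANAGE entries are the nearest thing to an owner they expose. Host, token, and path below are placeholders.

```python
# A rough sketch using the documented Workspace and Permissions REST
# endpoints; host, token, and notebook path are placeholders.
import requests

host = "https://<your-workspace>.azuredatabricks.net"
headers = {"Authorization": "Bearer <personal-access-token>"}
notebook_path = "/Users/someone@example.com/my_notebook"

# Resolve the notebook's numeric object id from its workspace path.
status = requests.get(
    f"{host}/api/2.0/workspace/get-status",
    headers=headers,
    params={"path": notebook_path},
).json()

# Fetch the access control list; CAN_MANAGE entries are the closest
# thing to an "owner" this API exposes.
acl = requests.get(
    f"{host}/api/2.0/permissions/notebooks/{status['object_id']}",
    headers=headers,
).json()
for entry in acl.get("access_control_list", []):
    print(entry)
```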
by ruoyuqian (New Contributor II)
  • 1550 Views
  • 1 reply
  • 0 kudos

Resolved! Delta Live Table run outside of pipeline

I have created a notebook for my Delta Live Table pipeline and it runs without errors. However, if I run the notebook alone on my cluster, it says this is not allowed and shows this error. Does it mean I can only run Delta Live Tables in the pipeline and canno...

Latest Reply by Rishabh-Pandey (Databricks MVP)

Hi @ruoyuqian, Delta Live Tables (DLT) have specific execution contexts and dependencies that are managed within their pipeline environment. This is why the code runs successfully only when executed within the pipeline, as DLT creates its own job clus...

  • 0 kudos
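A hedged sketch of one common workaround (not stated in the reply): guard the `dlt` import so the plain transformation logic can still be exercised interactively on a regular cluster, while the DLT table definition only runs inside the pipeline. The table names are hypothetical.

```python
# A hedged sketch: guard the dlt import so the transformation logic can
# still be tested interactively. The table names are hypothetical.
try:
    import dlt  # only importable inside a Delta Live Tables pipeline run
    IN_PIPELINE = True
except ImportError:
    IN_PIPELINE = False

def transform(df):
    # Plain PySpark logic that works in both contexts.
    return df.filter("value IS NOT NULL")

if IN_PIPELINE:
    @dlt.table(name="clean_events")
    def clean_events():
        return transform(spark.read.table("raw_events"))
else:
    # Interactive run on a regular cluster: just inspect the result.
    display(transform(spark.read.table("raw_events")))
```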
by ShankarM (Contributor)
  • 1851 Views
  • 2 replies
  • 0 kudos

Intelligent source to target mapping

I want to implement source-to-target mapping in such a way that source and target columns are auto-mapped using intelligent AI mapping, reducing the mapping effort, especially when there are 100+ columns in a table. Metadata information o...

Latest Reply by ShankarM (Contributor)

Can you please reply to my latest follow-up question?

  • 0 kudos
1 More Replies
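As a hedged illustration of the general idea (much simpler than the AI-driven mapping the post asks about), plain name similarity can auto-map most columns and leave the rest for manual review; the column lists here are hypothetical.

```python
# A hedged sketch using plain name similarity instead of an AI model;
# the column lists are hypothetical.
import difflib

source_cols = ["cust_id", "cust_name", "order_dt", "total_amt"]
target_cols = ["customer_id", "customer_name", "order_date", "total_amount"]

mapping = {}
for src in source_cols:
    # Pick the closest target column name above a similarity cutoff.
    match = difflib.get_close_matches(src, target_cols, n=1, cutoff=0.5)
    mapping[src] = match[0] if match else None  # None -> manual review

print(mapping)
```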
by thiagoawstest (Contributor)
  • 1486 Views
  • 1 reply
  • 0 kudos

Add or change roles

Hello, I have a Databricks environment provisioned on AWS. I would like to know if it is possible to add new roles or change existing ones. In my environment, Admin and User appear. I have the following need: how can I have a group, but the users th...

by copper-carrot (New Contributor II)
  • 1794 Views
  • 1 reply
  • 1 kudos

spark.sql() is suddenly giving an error "Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient"

spark.sql() is suddenly giving an error "Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient" on Databricks jobs and Python scripts that worked last month. No local changes on my end. What could be the cause of this and what sh...

by neointab (New Contributor)
  • 852 Views
  • 1 reply
  • 0 kudos

How to restrict a group/user from creating unrestricted clusters

We have set up the entitlement, but it doesn't work. I checked the blogs; it also needs to be set up in a cluster policy, but I can't find how to set that up in a cluster policy. Could you give some suggestions?

Latest Reply by antonuzzo96 (New Contributor III)

Hi, have you checked if the users are admins inside the workspace? This can greatly change the policies and restrictions on the clusters.

  • 0 kudos
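A rough sketch of the cluster-policy side, using the documented Cluster Policies API; the policy contents, host, and token are assumptions/placeholders. Granting the group access to this policy only, with the unrestricted-cluster-creation entitlement removed, is what prevents free-form cluster creation.

```python
# A rough sketch against the documented Cluster Policies API; the policy
# contents, host, and token are assumptions/placeholders.
import json
import requests

host = "https://<your-workspace>.cloud.databricks.com"
headers = {"Authorization": "Bearer <personal-access-token>"}

definition = {
    # Force an auto-termination window and cap the cluster size.
    "autotermination_minutes": {"type": "range", "maxValue": 60, "defaultValue": 30},
    "num_workers": {"type": "range", "maxValue": 4},
}

resp = requests.post(
    f"{host}/api/2.0/policies/clusters/create",
    headers=headers,
    json={"name": "restricted-teams", "definition": json.dumps(definition)},
)
print(resp.json())  # returns the new policy_id on success
```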
by hpant1 (New Contributor III)
  • 907 Views
  • 1 reply
  • 0 kudos

Does it make sense to create a volume at an external location in a dev environment?

I have created a dev resource group for Databricks which includes a storage account, an access connector, and a Databricks workspace. In the storage account I have created a container which is linked to the metastore. This container also contains raw dat...

Latest Reply by antonuzzo96 (New Contributor III)

Hi, for some use cases we have created external volumes in Databricks because users needed to access them outside of Databricks, directly on the storage account, as the files had to interact with other tools.

  • 0 kudos
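A minimal sketch of creating such an external volume, assuming a Unity Catalog external location already covers the storage path; all names and the URL are placeholders.

```python
# A minimal sketch, assuming a Unity Catalog external location already
# covers this storage path; all names and the URL are placeholders.
spark.sql("""
    CREATE EXTERNAL VOLUME IF NOT EXISTS dev_catalog.raw.landing
    LOCATION 'abfss://raw@devstorage.dfs.core.windows.net/landing'
""")

# Files are then addressable by path from Databricks while other tools
# keep reading and writing the same container directly.
files = dbutils.fs.ls("/Volumes/dev_catalog/raw/landing")
```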
by hpant1 (New Contributor III)
  • 836 Views
  • 1 reply
  • 2 kudos

What is the more optimized way of writing a Delta table in a workflow, "append" or "overwrite"?

What is the more optimized way of writing a Delta table in a workflow that runs every hour, "append" or "overwrite"?

Latest Reply by Witold (Honored Contributor)

There's no "optimized way", as these are two different concepts, and the choice depends on your use case: overwrite removes existing data, i.e. replaces it with new data, while append adds new data to your existing table.

  • 2 kudos
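A short illustration of the two write modes; the table name is hypothetical.

```python
# A short illustration of the two modes; the table name is hypothetical.
df = spark.range(100)

# append: adds the new rows to whatever the table already holds.
df.write.format("delta").mode("append").saveAsTable("main.metrics.hourly")

# overwrite: replaces the table's current contents with the new rows.
df.write.format("delta").mode("overwrite").saveAsTable("main.metrics.hourly")
```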
by alesventus (Contributor)
  • 2014 Views
  • 2 replies
  • 0 kudos

Resolved! How to handle loading 300 tables to Delta Lake

My task is to sync 300 tables from an on-prem SQL Server to Delta Lake. I will load CDC from raw. The first step is to move the CDC data to bronze with Auto Loader. Then, using a Delta stream, get the changes from bronze, make simple datatype changes, and merge this data...

Latest Reply by szymon_dybczak (Esteemed Contributor III)

Hi @alesventus, you can apply a metadata/config-driven approach. You can create a control table (or a JSON/YAML file) with all the information required for processing, like:
- table name
- target table
- table primary keys
- transformation to apply
And then ...

  • 0 kudos
1 More Replies
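A hedged sketch of the config-driven loop the reply describes; the control table's columns, the bronze/silver names, and the cast rules are assumptions.

```python
# A hedged sketch of the config-driven loop; the control table's columns,
# the bronze/silver names, and the cast rules are assumptions.
import json
from delta.tables import DeltaTable
from pyspark.sql import functions as F

# One control row per table: name, comma-separated keys, casts as JSON.
for row in spark.read.table("meta.control_tables").collect():
    df = spark.read.table(f"bronze.{row['table_name']}")

    # Apply the configured simple datatype changes.
    for col_name, dtype in json.loads(row["casts"]).items():
        df = df.withColumn(col_name, F.col(col_name).cast(dtype))

    # Merge into the silver table on the configured primary keys.
    cond = " AND ".join(
        f"t.{k} = s.{k}" for k in row["primary_keys"].split(",")
    )
    (DeltaTable.forName(spark, f"silver.{row['table_name']}")
        .alias("t")
        .merge(df.alias("s"), cond)
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())
```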
by ahmed_zarar (New Contributor III)
  • 2521 Views
  • 2 replies
  • 3 kudos

Resolved! Process a single dataset with different JSON schema rows using PySpark in Databricks

Hi, I am getting data from Event Hub and storing it in a Delta table as raw rows. The data I receive is JSON; the problem is that the data has a different schema in each row, but the code I use takes the first row's JSON schema. I am stuck on how to do this; please can anyone gui...

Latest Reply by ahmed_zarar (New Contributor III)

Thank you, I got it.

  • 3 kudos
1 More Replies
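Since the accepted fix isn't shown in the thread, here is a hedged sketch of one common approach: keep the payload as a string and extract fields per row, which tolerates a different JSON shape in each row. Table and column names are hypothetical.

```python
# A hedged sketch; table and column names are hypothetical. get_json_object
# returns NULL for paths absent in a given row, so each row may carry a
# different JSON shape.
from pyspark.sql import functions as F

raw = spark.read.table("bronze.eventhub_raw")  # string column "body"

parsed = raw.select(
    F.get_json_object("body", "$.event_type").alias("event_type"),
    F.get_json_object("body", "$.user.id").alias("user_id"),
    F.get_json_object("body", "$.payload").alias("payload_json"),
)
parsed.show(truncate=False)
```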
by hpant (New Contributor III)
  • 4977 Views
  • 9 replies
  • 7 kudos

Resolved! Where exactly should I create a volume in a catalog?

Currently my Databricks looks like this: I want to create a volume to access an external location. Where exactly should I create it? Should I create a new schema in the "poe" catalog and create a volume inside it, or create it in an existing schema? What is the b...

Latest Reply by hpant1 (New Contributor III)

No, I don't.

  • 7 kudos
8 More Replies
by juanicobsider (New Contributor)
  • 1767 Views
  • 2 replies
  • 3 kudos

How to parse a VARIANT type column using PySpark syntax?

I am trying to parse a VARIANT data type column. What is the correct syntax to parse sub-columns using PySpark? Is it possible? I'd like to know how to do it this way (I know how to do it using SQL syntax).

Latest Reply by Witold (Honored Contributor)

As an addition to what @szymon_dybczak already said correctly: it's actually not a workaround, it's designed and documented that way. Make sure that you understand the difference between `:` and `.`. Regarding PySpark, the API has other variant-relat...

  • 3 kudos
1 More Replies
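A hedged sketch reaching the SQL variant functions from PySpark via expr(); the table and field names are placeholders, and a runtime with VARIANT support is assumed.

```python
# A hedged sketch reaching the SQL variant functions from PySpark via
# expr(); the table and field names are placeholders, and a runtime with
# VARIANT support is assumed.
from pyspark.sql import functions as F

df = spark.read.table("main.demo.events")  # "data" is a VARIANT column

result = df.select(
    # The third argument is the type the extracted value is cast to.
    F.expr("variant_get(data, '$.device.os', 'string')").alias("os"),
    F.expr("try_variant_get(data, '$.metrics.latency', 'int')").alias("latency"),
)
result.show()
```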
