Data Engineering

Forum Posts

vijay_boopathy
by New Contributor
  • 266 Views
  • 1 reply
  • 0 kudos

Hive vs Delta

I'm curious about your experiences with Hive and Delta Lake. What are the advantages of using Delta over Hive, and in what scenarios would you recommend choosing Delta for data processing tasks? I'd appreciate any insights or recommendations based on...

Latest Reply
Walter_C
Valued Contributor II
  • 0 kudos

Delta Lake offers several advantages over Hive. One of the key benefits is its design for petabyte-scale data lakes with streaming and fast access at the forefront. This makes it more suitable for near-real-time streams, unlike Hive. Delta Lake also ...
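By way of illustration (not from the truncated reply above): a minimal sketch of what "streaming at the forefront" means in practice, treating a Delta table as both a streaming source and sink, which a plain Hive table cannot do. Table names and the checkpoint path are placeholders.

```python
# Minimal sketch: a Delta table as a streaming source and a streaming sink.
# Table names and the checkpoint path are hypothetical.
stream = (
    spark.readStream.format("delta").table("bronze.events")   # incremental reads of new commits
        .writeStream.format("delta")
        .option("checkpointLocation", "/tmp/checkpoints/events")
        .toTable("silver.events")                             # ACID appends to the target table
)
```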

William_Scardua
by Valued Contributor
  • 456 Views
  • 2 replies
  • 0 kudos

Drop array in a struct field

Hi guys, look at my table definition. Well, I need to remove the 'med' array inside that 'equip' field. Any ideas? Thank you

[Screenshot: table definition]
Latest Reply
Sampath_Kumar
New Contributor II
  • 0 kudos

Hi William, there is an array_remove method that can help remove elements from an array. Here, the 'med' array is an element of the 'equip' array. If that isn't helpful, please share some sample data so that we can solve it together. Reference: array_remove. Thanks
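If 'med' is instead a field of the structs inside the 'equip' array (which the screenshot suggests), a sketch using transform plus dropFields may be closer; the schema below is an assumption for illustration.

```python
from pyspark.sql import functions as F
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Assumed shape: equip is array<struct<name string, med array<string>>>.
df = spark.createDataFrame(
    [([("pump", ["a", "b"])],)],
    "equip array<struct<name string, med array<string>>>",
)

# Rebuild each struct in the array without its 'med' field (Spark 3.1+).
df = df.withColumn("equip", F.transform("equip", lambda e: e.dropFields("med")))
df.printSchema()  # equip is now array<struct<name string>>

# If 'med' were a plain element of an array column, array_remove would apply:
# df = df.withColumn("equip", F.array_remove("equip", "med"))
```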

1 More Reply
michael_mehrten
by New Contributor III
  • 15068 Views
  • 27 replies
  • 14 kudos

Resolved! How to use Databricks Repos with a service principal for CI/CD in Azure DevOps?

Databricks Repos best practices recommend using the Repos REST API to update a repo via your git provider. The REST API requires authentication, which can be done one of two ways: a user / personal access token, or a service principal access token. Using a u...

Latest Reply
martindlarsson
New Contributor III
  • 14 kudos

Having the exact same problem. Did you find a solution, @michael_mehrten? In my case I'm using a managed identity, so the solution some topics suggest, generating an access token from an Entra ID service principal, is not applicable.
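For the service-principal (not managed-identity) variant, the client-credentials flow looks roughly like this sketch; all IDs and URLs are placeholders, and the resource ID is the Azure Databricks first-party application.

```python
import requests

TENANT_ID = "<tenant-id>"            # placeholders
CLIENT_ID = "<sp-application-id>"
CLIENT_SECRET = "<sp-secret>"
DATABRICKS_APP = "2ff814a6-3304-4ab8-85cb-047c21bed756"  # Azure Databricks resource id

# Client-credentials flow: get an Entra ID access token for the service principal.
token = requests.post(
    f"https://login.microsoftonline.com/{TENANT_ID}/oauth2/v2.0/token",
    data={
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "scope": f"{DATABRICKS_APP}/.default",
        "grant_type": "client_credentials",
    },
).json()["access_token"]

# Use it as a bearer token against the Repos API, e.g. to pull a branch.
resp = requests.patch(
    "https://<workspace-url>/api/2.0/repos/<repo-id>",
    headers={"Authorization": f"Bearer {token}"},
    json={"branch": "main"},
)
resp.raise_for_status()
```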

26 More Replies
sharma_kamal
by New Contributor III
  • 392 Views
  • 2 replies
  • 1 kudos

Resolved! Getting errors while reading data from URL

I'm encountering some issues while trying to read a public dataset from a URL using Databricks. Here's the code snippet (along with errors) I'm working with. I'm confused about the Delta format error here. When I read data from a URL, how would it have a D...

[Screenshot: code snippet and error output]
Latest Reply
MuthuLakshmi
New Contributor III
  • 1 kudos

@sharma_kamal Please disable the formatCheck in the notebook and check whether you can read the data. The configuration command %sql SET spark.databricks.delta.formatCheck.enabled=false will disable the format check for Delta tables in Databricks. Databrick...
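The same setting in Python form, plus one hedged way of actually pulling a file from a public URL (Spark itself does not read http(s) paths; pandas can). The URL is made up.

```python
import pandas as pd

# Python equivalent of the SQL command above, for the current session.
spark.conf.set("spark.databricks.delta.formatCheck.enabled", "false")

# Spark cannot read http(s) URLs directly; fetch with pandas, then convert.
pdf = pd.read_csv("https://example.com/public_dataset.csv")  # hypothetical URL
df = spark.createDataFrame(pdf)
df.show(5)
```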

1 More Reply
Yuki
by New Contributor
  • 578 Views
  • 3 replies
  • 0 kudos

Can I use Git provider with using Service Principal in job

Hi everyone, I'm trying to use a Git provider in a Databricks job. First, I was using my personal user account for `Run as`. But when I changed `Run as` to a service principal, it failed with a permission error, and I can't find a way to solve it. Could I...

Latest Reply
martindlarsson
New Contributor III
  • 0 kudos

The documentation is lacking in this area, which should be easy to set up. Instead, we are forced to search among community topics such as this one.
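One workaround sometimes suggested for this: register Git credentials on behalf of the service principal via the Python SDK, while authenticated as that principal, so jobs run as it can check out the repo. A sketch, with the provider and tokens as placeholders:

```python
from databricks.sdk import WorkspaceClient

# Authenticates from the environment (host/credentials of the service
# principal itself, not your user account).
w = WorkspaceClient()

w.git_credentials.create(
    git_provider="gitHub",                        # assumption: GitHub as provider
    git_username="<git-username>",                # placeholder
    personal_access_token="<git-provider-pat>",   # placeholder
)
```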

2 More Replies
satishnavik
by New Contributor II
  • 1109 Views
  • 4 replies
  • 0 kudos

How to connect a Databricks database to a Spring Boot application using JPA

We are facing an issue integrating our Spring Boot JPA application with Databricks. Below are the steps and settings we used for the integration. When starting the Spring Boot application, we get a warning: HikariPool-1 - Driver doe...

Latest Reply
SpringBoot
New Contributor II
  • 0 kudos

Thanks @SanjayTS for the response. Unfortunately, none of the approaches seem very promising given the dependencies and effort required to make these changes. In the current scenario, we are looking for some ready-to-use drivers or options to minimize...

3 More Replies
r-goswami
by New Contributor II
  • 163 Views
  • 3 replies
  • 0 kudos

Unable to create/save job of type "python script"

Hi all, we are facing an issue while creating a simple job of type "python script". A Python file in the workspace is selected as the source. No arguments/job parameters are provided. This is strange behavior that just started occurring this morning...

Latest Reply
r-goswami
New Contributor II
  • 0 kudos

Hi Ayushi, how can I call the Reset API? This issue occurs when creating a new job from the Databricks web UI. It looks like that REST API is for resetting the job settings of an existing job. Can this be an issue with the Databricks workspace we are using? A...
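For reference, the Reset endpoint mentioned above does indeed overwrite the settings of an existing job rather than create one; a hedged SDK sketch with made-up IDs and paths:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

# reset() replaces ALL settings of an existing job; it does not create a job.
w.jobs.reset(
    job_id=123456,  # hypothetical existing job id
    new_settings=jobs.JobSettings(
        name="python-script-job",
        tasks=[
            jobs.Task(
                task_key="run_script",
                spark_python_task=jobs.SparkPythonTask(
                    python_file="/Workspace/Users/me@example.com/script.py"  # placeholder
                ),
                existing_cluster_id="<cluster-id>",  # placeholder
            )
        ],
    ),
)
```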

2 More Replies
hyedesign
by New Contributor II
  • 326 Views
  • 3 replies
  • 0 kudos

Getting SparkConnectGrpcException: (java.io.EOFException) error when using foreachBatch

Hello, I am trying to write a simple upsert statement following the steps in the tutorials. Here is what my code looks like: from pyspark.sql import functions as F / def upsert_source_one(self): df_source = spark.readStream.format("delta").table(self.so...

Latest Reply
hyedesign
New Contributor II
  • 0 kudos

Using sample data sets. Here is the full code. This error does seem to be related to runtime version 15: df_source = spark.readStream.format("delta").table("`cat1`.`bronze`.`officer_info`") / df_orig_state = spark.read.format("delta").table("`sample-db`....
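For comparison, a minimal self-contained version of the upsert-in-foreachBatch pattern the post describes; the table names, 'id' key column, and checkpoint path are all assumptions:

```python
from delta.tables import DeltaTable

def upsert_batch(batch_df, batch_id):
    # MERGE each micro-batch into the target table on a hypothetical 'id' key.
    target = DeltaTable.forName(batch_df.sparkSession, "cat1.bronze.officer_info_target")
    (target.alias("t")
        .merge(batch_df.alias("s"), "t.id = s.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

(spark.readStream.format("delta").table("cat1.bronze.officer_info")
    .writeStream
    .foreachBatch(upsert_batch)
    .option("checkpointLocation", "/tmp/checkpoints/officer_upsert")  # placeholder
    .trigger(availableNow=True)
    .start())
```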

2 More Replies
IshaBudhiraja
by New Contributor II
  • 463 Views
  • 3 replies
  • 0 kudos

Migration of Synapse Databricks activity executions from an all-purpose cluster to a new job cluster

Hi, we have been planning to migrate Synapse Databricks activity executions from an 'All-purpose cluster' to a 'New job cluster' to reduce overall cost. We are using Standard_D3_v2 as the cluster node type, which has 4 CPU cores in total. The current quota ...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @IshaBudhiraja, quotas are used for different resource groups, subscriptions, accounts, and scopes. The number of cores for a particular region may be restricted by your subscription. To verify your subscription's usage and quotas, follow these st...

2 More Replies
Husky
by New Contributor II
  • 888 Views
  • 3 replies
  • 0 kudos

Upload file from local file system to Unity Catalog Volume (via databricks-connect)

Context:
  • IDE: IntelliJ 2023.3.2
  • Library: databricks-connect 13.3
  • Python: 3.10
Description: I develop notebooks and Python scripts locally in the IDE, and I connect to the Spark cluster via databricks-connect for a better developer experience. I download a...

Latest Reply
lathaniel
New Contributor II
  • 0 kudos

Late to the discussion, but I too was looking for a way to do this programmatically, as opposed to through the UI. The solution I landed on was using the Python SDK (though you could assuredly do this with an API request instead if you're not in Python): w ...
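The reply's code is cut off; a sketch of the SDK call it is most likely describing, with hypothetical local and volume paths:

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # picks up host/token from the environment or a config profile

# Upload a local file into a Unity Catalog volume (paths are placeholders).
with open("./data/local_file.csv", "rb") as f:
    w.files.upload(
        "/Volumes/my_catalog/my_schema/my_volume/local_file.csv",
        f,
        overwrite=True,
    )
```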

2 More Replies
AxelBrsn
by New Contributor III
  • 524 Views
  • 3 replies
  • 2 kudos

Resolved! Use DLT from another pipeline

Hello, I have a question. Context: I have a Unity Catalog organized with three schemas (bronze, silver, and gold). Logically, I would like to create tables in each schema. I tried to organize my pipelines by layer, which means that I would like to ...

Latest Reply
AxelBrsn
New Contributor III
  • 2 kudos

Hello, thanks for the answers @YuliyanBogdanov, @standup1. So the solution is to use catalog.schema.table, and not LIVE.table; that's the key, you were right, standup! But you won't have visibility of the tables on the bronze pipeline if you are on Si...
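In code, the difference described above looks roughly like this sketch (it assumes it runs inside a DLT pipeline; catalog/schema/table names are placeholders):

```python
import dlt

@dlt.table(name="orders_silver")
def orders_silver():
    # LIVE.orders_bronze only resolves within the pipeline that defines it;
    # a table created by another pipeline is read by its full UC name instead.
    return spark.read.table("my_catalog.bronze.orders_bronze")
```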

2 More Replies
EDDatabricks
by Contributor
  • 446 Views
  • 2 replies
  • 0 kudos

Concurrency issue with append-only writes

Dear all, we have a PySpark streaming job (DBR 14.3) that continuously writes new data to a Delta table (TableA). On this table, a PySpark batch job (DBR 14.3) operates every 15 minutes, and in some cases it may delete some records from ...

Data Engineering
Concurrency
DBR 14.3
delta
MERGE
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @EDDatabricks, thank you for providing the details about your PySpark streaming and batch jobs operating on a Delta table. The concurrency issue you're encountering seems to be related to the deletion of records from your Delta table (TableA) du...
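One common mitigation for this class of conflict (hedged, since the full reply is cut off) is to constrain the batch job's MERGE to an explicit partition predicate so it cannot overlap the files the stream appends to; the names below are hypothetical:

```python
from delta.tables import DeltaTable

target = DeltaTable.forName(spark, "main.default.table_a")
updates = spark.table("main.default.obsolete_keys")  # hypothetical source of deletions

# Including the partition column in the merge condition lets Delta prune files,
# shrinking the window for ConcurrentAppend/ConcurrentDeleteRead conflicts.
(target.alias("t")
    .merge(updates.alias("s"), "t.event_date = s.event_date AND t.id = s.id")
    .whenMatchedDelete()
    .execute())
```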

1 More Reply
maikelos272
by New Contributor II
  • 1054 Views
  • 4 replies
  • 2 kudos

Cannot create storage credential without Contributor role

Hello, I am trying to create a storage credential. I have created the access connector and gave the managed identity "Storage Blob Data Owner" permissions. However, when I want to create a storage credential, I get the following error: Creating a storage...

Latest Reply
Kim3
New Contributor II
  • 2 kudos

Hi @Kaniz, can you elaborate on the error "Refresh token not found for userId"? I have exactly the same problem as described in this thread. I am trying to create a storage credential using a personal access token from a service principal. This results...

3 More Replies
BenDataBricks
by New Contributor
  • 169 Views
  • 1 reply
  • 0 kudos

OAuth U2M Manual token generation failing

I am writing a frontend webpage that will log into Databricks and allow the user to select datasets. I am new to front-end development, so there may be some things I am missing here, but I know that the Databricks SQL connector for JavaScript only wor...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @BenDataBricks,
  • Ensure that the auth_code variable in your Python script contains the correct authorization code obtained from the browser.
  • Verify that the code_verifier you're using matches the one you generated earlier.
  • Confirm that the redirect_...
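A compact sketch of the U2M code-for-token exchange those checks refer to; the /oidc/v1/token endpoint and the public "databricks-cli" client follow the Databricks OAuth docs, everything else is a placeholder:

```python
import base64, hashlib, os
import requests

HOST = "https://<workspace-url>"       # placeholder
CLIENT_ID = "databricks-cli"           # public OAuth client used for U2M flows
REDIRECT_URI = "http://localhost:8020"

# PKCE pair: the challenge goes into the /oidc/v1/authorize URL,
# and the SAME verifier must be sent in the token request below.
verifier = base64.urlsafe_b64encode(os.urandom(32)).rstrip(b"=").decode()
challenge = base64.urlsafe_b64encode(
    hashlib.sha256(verifier.encode()).digest()
).rstrip(b"=").decode()

# ... user authorizes in the browser; the redirect hands back ?code=...

resp = requests.post(
    f"{HOST}/oidc/v1/token",
    data={
        "client_id": CLIENT_ID,
        "grant_type": "authorization_code",
        "code": "<auth-code-from-redirect>",  # placeholder
        "redirect_uri": REDIRECT_URI,
        "code_verifier": verifier,
    },
)
print(resp.json())  # access_token / refresh_token on success
```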

SenthilJ
by New Contributor III
  • 373 Views
  • 1 reply
  • 0 kudos

Resolved! Databricks Deep Clone

Hi, I am working on a DR design for Databricks on Azure. The recommendation from Databricks is to use Deep Clone to clone Unity Catalog tables (within or across catalogs). My design is to ensure that DR is managed across different regions, i.e. pri...

Data Engineering
Disaster Recovery
Unity Catalog
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @SenthilJ, the recommendation from Databricks to use Deep Clone for cloning Unity Catalog (UC) tables is indeed a prudent approach. Deep Clone facilitates the seamless replication of UC objects, including schemas, managed tables, access permission...
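As a concrete illustration of the pattern (catalog and table names are hypothetical), a scheduled job in the primary region can refresh the secondary with an incremental deep clone:

```python
# DEEP CLONE is incremental: re-running this copies only new/changed files
# since the last clone, which suits a periodic DR sync job.
spark.sql("""
    CREATE OR REPLACE TABLE dr_catalog.sales.orders
    DEEP CLONE prod_catalog.sales.orders
""")
```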
