Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by Sachin_ (New Contributor II)
  • 1912 Views
  • 4 replies
  • 0 kudos

The spark context has stopped and the driver is restarting. Your notebook will be automatically reattached.

I am trying to execute a Scala JAR in a notebook. When I execute it explicitly I am able to run the JAR like this: but when I try to run the notebook through a Databricks workflow I get the below error: The spark context has stopped and the driver i...

Data Engineering
dataengineering
Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

Could you share the code you have in your JAR file? How are you creating your Spark context in your JAR file?
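A frequent cause of this error is JAR code that creates or stops its own context instead of reusing the one the cluster already owns. A minimal sketch of the safe pattern, shown in PySpark for illustration (the Scala equivalent is SparkSession.builder.getOrCreate()):

```python
from pyspark.sql import SparkSession

# Reuse the session the Databricks runtime already created for this cluster.
# Constructing a fresh context -- or calling spark.stop() when the JAR exits --
# can tear down the shared driver context and trigger the restart error above.
spark = SparkSession.builder.getOrCreate()

spark.range(10).show()  # placeholder for the JAR's real work
# Do not call spark.stop(); the cluster manages the session lifecycle.
```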

3 More Replies
by Jamie_209389 (New Contributor III)
  • 6030 Views
  • 7 replies
  • 3 kudos

Resolved! In Azure Databricks CLI, how to pass in the parameter notebook_params? Error: Got unexpected extra argument

I am trying to call run-now with notebook_params in the Azure Databricks CLI, following https://learn.microsoft.com/en-us/azure/databricks/dev-tools/cli/jobs-cli and escaping the quotes as stated in the documentation https://learn.microsoft.com/en-us/azure/d...
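If the shell escaping keeps fighting back, one alternative (not from this thread, just a hedged sketch) is the Databricks SDK for Python, where notebook_params is a plain dict and no quoting is involved; the job ID and parameter below are hypothetical:

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # reads auth from env vars or ~/.databrickscfg

# notebook_params is an ordinary dict, so there is nothing to escape.
run = w.jobs.run_now(
    job_id=123,                      # hypothetical job ID
    notebook_params={"env": "dev"},  # hypothetical notebook parameter
).result()                           # blocks until the run finishes
```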

Latest Reply
Vaitheesh
New Contributor II
  • 3 kudos

I have the latest Databricks CLI set up and configured in my Ubuntu VM. When I tried to run a job using the JSON template I generated with databricks jobs get 'xxxjob_idxxx' > orig.json, it throws an unknown error. Databricks CLI v0.216.0 databricks job...

6 More Replies
by SS_RATH (New Contributor)
  • 1015 Views
  • 1 reply
  • 0 kudos

I have a notebook in the workspace; how can I find out which job references it?

I have a notebook in the workspace; how can I find out which job references it?

Latest Reply
Ayushi_Suthar
Honored Contributor
  • 0 kudos

Hi @SS_RATH, Good Day!  I want to inform you that there isn't a direct way to search for a job by the notebook it references. You will have to manually check each job to see which one is using the specific notebook you are interested in.  You can fol...
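Lacking a built-in search, a small script against the Jobs API can do the manual check for you. A hedged sketch using the Databricks SDK for Python (the notebook path is hypothetical):

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
target = "/Workspace/Users/me@example.com/my_notebook"  # hypothetical path

# List every job with its tasks expanded and report those whose notebook
# tasks point at the target path.
for job in w.jobs.list(expand_tasks=True):
    for task in job.settings.tasks or []:
        if task.notebook_task and task.notebook_task.notebook_path == target:
            print(job.job_id, job.settings.name)
```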

by Braxx (Contributor II)
  • 4911 Views
  • 3 replies
  • 3 kudos

Resolved! cluster creation - access mode option

I am a bit lazy and trying to manually recreate a cluster I have in one workspace in another one. The cluster was created some time ago. Looking at the configuration, the access mode field is "custom". When trying to create a new cluster, I do not...
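For reference, when recreating the cluster through the API or SDK you can set the access mode explicitly (the API-side field is data_security_mode), rather than inheriting the legacy "custom" label. A hedged sketch with made-up names and sizes:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.compute import DataSecurityMode

w = WorkspaceClient()

# data_security_mode is the API name of the UI's "access mode" field.
cluster = w.clusters.create(
    cluster_name="recreated-cluster",   # hypothetical
    spark_version="14.3.x-scala2.12",   # hypothetical runtime
    node_type_id="Standard_DS3_v2",     # hypothetical node type
    num_workers=2,
    data_security_mode=DataSecurityMode.SINGLE_USER,
).result()  # waits until the cluster is running
```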

Latest Reply
yatharth
New Contributor III
  • 3 kudos

I am facing the same type of issue, and the option to change back to the old UI is no longer available; because of this issue I am not able to view cluster metrics.

2 More Replies
by vijay_boopathy (New Contributor)
  • 1964 Views
  • 1 reply
  • 0 kudos

Hive vs Delta

I'm curious about your experiences with Hive and Delta Lake. What are the advantages of using Delta over Hive, and in what scenarios would you recommend choosing Delta for data processing tasks? I'd appreciate any insights or recommendations based on...

Latest Reply
Walter_C
Honored Contributor
  • 0 kudos

Delta Lake offers several advantages over Hive. One of the key benefits is its design for petabyte-scale data lakes with streaming and fast access at the forefront. This makes it more suitable for near-real-time streams, unlike Hive. Delta Lake also ...

by William_Scardua (Valued Contributor)
  • 1454 Views
  • 2 replies
  • 0 kudos

Drop array in a struct field

Hi guys, look at my table definition. Well, I need to remove the 'med' array inside that 'equip' field. Have any idea? Thank you

Latest Reply
Sampath_Kumar
New Contributor II
  • 0 kudos

Hi William, there is an array_remove method that can help to remove elements from an array. Here the med array is an element in the equip array. If it is not helpful, please share some sample data so that we can solve it together. Reference: array_remove. Thanks
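A minimal, self-contained illustration of array_remove on a toy schema (the real table nests the array inside a struct, so the actual column reference would differ):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Toy stand-in for the real table: 'equip' is an array column.
df = spark.createDataFrame(
    [(1, ["med", "oxygen", "monitor"])],
    "id INT, equip ARRAY<STRING>",
)

# array_remove drops every element equal to the given value.
df.withColumn("equip", F.array_remove("equip", "med")).show(truncate=False)
```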

1 More Replies
by michael_mehrten (New Contributor III)
  • 23221 Views
  • 27 replies
  • 14 kudos

Resolved! How to use Databricks Repos with a service principal for CI/CD in Azure DevOps?

Databricks Repos best practices recommend using the Repos REST API to update a repo via your Git provider. The REST API requires authentication, which can be done in one of two ways: a user / personal access token, or a service principal access token. Using a u...
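For context, once the service principal is authenticated the repo update itself is a single call; the authentication is the hard part this thread is about. A hedged sketch with the Databricks SDK for Python (repo ID and branch are hypothetical):

```python
from databricks.sdk import WorkspaceClient

# Assumes the environment carries the service principal's credentials
# (e.g. DATABRICKS_HOST plus a client ID/secret or token).
w = WorkspaceClient()

# Pull the repo to the desired branch as a CI/CD step.
w.repos.update(repo_id=456, branch="main")  # hypothetical repo ID
```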

Latest Reply
martindlarsson
New Contributor III
  • 14 kudos

Having the exact same problem. Did you find a solution, @michael_mehrten? In my case I'm using a managed identity, so the solution some topics suggest of generating an access token from an Entra ID service principal is not applicable.

26 More Replies
by sharma_kamal (New Contributor III)
  • 1333 Views
  • 2 replies
  • 1 kudos

Resolved! Getting errors while reading data from URL

I'm encountering some issues while trying to read a public dataset from a URL using Databricks. Here's the code snippet (along with errors) I'm working with: I'm confused about the Delta format error here. When I read data from a URL, how would it have a D...

Latest Reply
MuthuLakshmi
New Contributor III
  • 1 kudos

@sharma_kamal Please disable the formatCheck in the notebook and check if you can read the data. The configuration command %sql SET spark.databricks.delta.formatCheck.enabled=false will disable the format check for Delta tables in Databricks. Databrick...
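The same setting can also be applied from Python for the current session; a one-line equivalent of the %sql command above:

```python
# Session-scoped equivalent of:
#   %sql SET spark.databricks.delta.formatCheck.enabled=false
spark.conf.set("spark.databricks.delta.formatCheck.enabled", "false")
```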

1 More Replies
by Yuki (New Contributor II)
  • 1580 Views
  • 3 replies
  • 1 kudos

Can I use Git provider with using Service Principal in job

Hi everyone, I'm trying to use a Git provider in a Databricks job. First, I was using my personal user account as `Run as`. But when I changed `Run as` to a service principal, it failed because of a permission error. And I can't find a way to solve it. Could I...

Latest Reply
martindlarsson
New Contributor III
  • 1 kudos

The documentation is lacking in this area, which should be easy to set up. Instead we are forced to search among community topics such as this one.
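For anyone landing here, the usual missing piece is that the service principal needs its own Git credential registered; your personal one is not inherited. A hedged sketch (provider, username, and token are hypothetical), run while authenticated as the service principal:

```python
from databricks.sdk import WorkspaceClient

# Authenticate as the service principal so the Git credential is attached
# to its identity, not to your user account.
w = WorkspaceClient()

w.git_credentials.create(
    git_provider="gitHub",            # hypothetical provider value
    git_username="ci-bot",            # hypothetical
    personal_access_token="ghp_xxx",  # hypothetical PAT, store it securely
)
```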

2 More Replies
by r-goswami (New Contributor II)
  • 993 Views
  • 3 replies
  • 0 kudos

Unable to create/save job of type "python script"

Hi all, we are facing an issue while creating a simple job of type "python script". A Python file in the workspace is selected as the source. No arguments/job parameters are provided. This is strange behavior and just started occurring this morning...

Latest Reply
r-goswami
New Contributor II
  • 0 kudos

Hi Ayushi, how can I call the RESET API? This issue occurs when creating a new job from the Databricks web UI. It looks like the REST API is for resetting the job settings of an existing job. Can this be an issue with the Databricks workspace we are using? A...
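To clarify the API in question: jobs/reset replaces the full settings of a job that already exists, so it cannot create one. A hedged sketch via the Databricks SDK for Python (job ID and name are hypothetical):

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import JobSettings

w = WorkspaceClient()

# reset overwrites *all* settings of an existing job -- it does not create
# jobs, so it will not work around a failure in the creation UI itself.
w.jobs.reset(
    job_id=789,  # hypothetical existing job
    new_settings=JobSettings(name="python-script-job"),
)
```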

2 More Replies
by hyedesign (New Contributor II)
  • 1525 Views
  • 3 replies
  • 0 kudos

Getting SparkConnectGrpcException: (java.io.EOFException) error when using foreachBatch

Hello, I am trying to write a simple upsert statement following the steps in the tutorials. Here is what my code looks like: from pyspark.sql import functions as F def upsert_source_one(self): df_source = spark.readStream.format("delta").table(self.so...

Latest Reply
hyedesign
New Contributor II
  • 0 kudos

Using sample data sets. Here is the full code. This error does seem to be related to runtime version 15. df_source = spark.readStream.format("delta").table("`cat1`.`bronze`.`officer_info`") df_orig_state = spark.read.format("delta").table("`sample-db`....
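For readers hitting the same thing, a self-contained sketch of the usual foreachBatch upsert shape, assuming the notebook-provided spark session (the table names follow the thread; the key column and checkpoint path are made up):

```python
from delta.tables import DeltaTable

def upsert_batch(batch_df, batch_id):
    # MERGE each micro-batch into the target table.
    target = DeltaTable.forName(batch_df.sparkSession, "cat1.silver.officer_info")
    (target.alias("t")
           .merge(batch_df.alias("s"), "t.officer_id = s.officer_id")  # hypothetical key
           .whenMatchedUpdateAll()
           .whenNotMatchedInsertAll()
           .execute())

(spark.readStream.format("delta").table("cat1.bronze.officer_info")
      .writeStream
      .foreachBatch(upsert_batch)
      .option("checkpointLocation", "/tmp/checkpoints/officer_info")  # hypothetical
      .start())
```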

2 More Replies
by IshaBudhiraja (New Contributor II)
  • 1249 Views
  • 3 replies
  • 0 kudos

Migration of Synapse Databricks activity executions from all-purpose cluster to new job cluster

Hi, we have been planning to migrate the Synapse Databricks activity executions from 'All-purpose cluster' to 'New job cluster' to reduce overall cost. We are using Standard_D3_v2 as the cluster node type, which has 4 CPU cores in total. The current quota ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @IshaBudhiraja, quotas apply to different resource groups, subscriptions, accounts, and scopes. The number of cores for a particular region may be restricted by your subscription. To verify your subscription's usage and quotas, follow these st...

2 More Replies
by AxelBrsn (New Contributor III)
  • 3556 Views
  • 3 replies
  • 2 kudos

Resolved! Use DLT from another pipeline

Hello, I have a question. Context: I have a Unity Catalog organized with three schemas (bronze, silver, and gold). Logically, I would like to create tables in each schema. I tried to organize my pipelines on the layers, which means that I would like to ...

Latest Reply
AxelBrsn
New Contributor III
  • 2 kudos

Hello, thanks for the answers @YuliyanBogdanov, @standup1. So the solution is to use catalog.schema.table, and not LIVE.table; that's the key, you were right, standup! But you won't have visibility of the tables in the Bronze pipeline if you are in the Si...
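In code form, the distinction looks like this hedged sketch (names are hypothetical): LIVE. resolves only within the current pipeline, so a table owned by another pipeline is read by its full Unity Catalog name.

```python
import dlt  # available inside a Delta Live Tables pipeline

@dlt.table(name="silver_orders")  # hypothetical table
def silver_orders():
    # Full catalog.schema.table name, because the bronze table is produced
    # by a different pipeline; LIVE.orders would only resolve locally.
    return spark.read.table("my_catalog.bronze.orders").where("amount > 0")
```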

2 More Replies
by maikelos272 (New Contributor II)
  • 3232 Views
  • 4 replies
  • 2 kudos

Cannot create storage credential without Contributor role

Hello, I am trying to create a storage credential. I have created the access connector and gave the managed identity "Storage Blob Data Owner" permissions. However, when I want to create a storage credential I get the following error: Creating a storage...

Latest Reply
Kim3
New Contributor II
  • 2 kudos

Hi @Kaniz_Fatma, can you elaborate on the error "Refresh token not found for userId"? I have exactly the same problem as described in this thread. I am trying to create a storage credential using a Personal Access Token from a Service Principal. This r...

3 More Replies
by SenthilJ (New Contributor III)
  • 2055 Views
  • 1 reply
  • 0 kudos

Resolved! Databricks Deep Clone

Hi, I am working on a DR design for Databricks in Azure. The recommendation from Databricks is to use Deep Clone to clone the Unity Catalog tables (within or across catalogs). My design is to ensure that DR is managed across different regions, i.e. pri...

Data Engineering
Disaster Recovery
Unity Catalog
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @SenthilJ, The recommendation from Databricks to use Deep Clone for cloning Unity Catalog (UC) tables is indeed a prudent approach. Deep Clone facilitates the seamless replication of UC objects, including schemas, managed tables, access permission...
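In practice the clone is one SQL statement per table, typically rerun on a schedule from the secondary region's workspace. A hedged sketch (catalog and table names are hypothetical):

```python
# Re-running the same DEEP CLONE statement is incremental: it copies only
# the data files that changed since the previous run.
spark.sql("""
    CREATE OR REPLACE TABLE dr_catalog.finance.transactions
    DEEP CLONE prod_catalog.finance.transactions
""")
```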

