Data Engineering
Forum Posts

hyedesign
by New Contributor II
  • 508 Views
  • 3 replies
  • 0 kudos

Getting SparkConnectGrpcException: (java.io.EOFException) error when using foreachBatch

Hello, I am trying to write a simple upsert statement following the steps in the tutorials. Here is what my code looks like: from pyspark.sql import functions as F; def upsert_source_one(self): df_source = spark.readStream.format("delta").table(self.so...

Latest Reply
hyedesign
New Contributor II
  • 0 kudos

Using sample data sets. Here is the full code. This error does seem to be related to runtime version 15: df_source = spark.readStream.format("delta").table("`cat1`.`bronze`.`officer_info`"); df_orig_state = spark.read.format("delta").table("`sample-db`....

  • 0 kudos
2 More Replies
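For readers following this thread, here is a minimal sketch of the foreachBatch upsert pattern under discussion; the table names, merge key, and checkpoint path are illustrative assumptions, not taken from the thread:

```python
from delta.tables import DeltaTable

def upsert_batch(microbatch_df, batch_id):
    # MERGE each micro-batch into the target Delta table (names assumed)
    target = DeltaTable.forName(spark, "cat1.silver.officer_info")
    (target.alias("t")
        .merge(microbatch_df.alias("s"), "t.id = s.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

(spark.readStream.format("delta").table("cat1.bronze.officer_info")
    .writeStream
    .foreachBatch(upsert_batch)
    .option("checkpointLocation", "/tmp/chk/officer_info")  # assumed path
    .start())
```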
IshaBudhiraja
by New Contributor II
  • 579 Views
  • 3 replies
  • 0 kudos

Migration of Synapse Databricks activity executions from All-purpose cluster to New job cluster

Hi, We have been planning to migrate the Synapse Databricks activity executions from 'All-purpose cluster' to 'New job cluster' to reduce overall cost. We are using Standard_D3_v2 as the cluster node type, which has 4 CPU cores in total. The current quota ...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @IshaBudhiraja, quotas are applied to different resource groups, subscriptions, accounts, and scopes. The number of cores for a particular region may be restricted by your subscription. To verify your subscription’s usage and quotas, follow these st...

2 More Replies
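As context for the quota discussion, a hedged sketch of checking per-region core usage with the Azure SDK for Python; the subscription ID and region are placeholders:

```python
# pip install azure-identity azure-mgmt-compute
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

# Placeholders: substitute your subscription ID and region
client = ComputeManagementClient(DefaultAzureCredential(), "<subscription-id>")
for usage in client.usage.list(location="westeurope"):
    # Each entry reports cores currently in use vs. the subscription limit
    print(f"{usage.name.localized_value}: {usage.current_value}/{usage.limit}")
```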
AxelBrsn
by New Contributor III
  • 832 Views
  • 3 replies
  • 2 kudos

Resolved! Use DLT from another pipeline

Hello, I have a question. Context: I have a Unity Catalog organized with three schemas (bronze, silver, and gold). Logically, I would like to create tables in each schema. I tried to organize my pipelines on the layers, which means that I would like to ...

Latest Reply
AxelBrsn
New Contributor III
  • 2 kudos

Hello, thanks for the answers @YuliyanBogdanov, @standup1. So the solution is to use catalog.schema.table, not LIVE.table; that's the key, you were right, standup! But you won't have visibility of the tables on the Bronze Pipeline if you are on Si...

2 More Replies
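A minimal sketch of the resolution: in the downstream (Silver) pipeline, reference the Bronze table by its fully qualified Unity Catalog name instead of LIVE.&lt;table&gt;, which only resolves tables defined in the same pipeline. The catalog, schema, and table names below are assumptions:

```python
import dlt

@dlt.table(name="officer_info_silver")  # assumed table name
def officer_info_silver():
    # LIVE.<table> only resolves tables defined in this pipeline, so the
    # Bronze pipeline's output is read by its full Unity Catalog name.
    return spark.readStream.table("main.bronze.officer_info")
```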
EDDatabricks
by Contributor
  • 597 Views
  • 2 replies
  • 0 kudos

Concurrency issue with append-only writes

Dear all, We have a PySpark streaming job (DBR 14.3) that continuously writes new data to a Delta table (TableA). On this table, there is a PySpark batch job (DBR 14.3) that runs every 15 minutes and in some cases may delete some records from ...

Data Engineering
Concurrency
DBR 14.3
delta
MERGE
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @EDDatabricks, Thank you for providing the details about your PySpark streaming and batch jobs operating on a Delta Table. The concurrency issue you’re encountering seems to be related to the deletion of records from your Delta Table (TableA) du...

1 More Reply
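One common mitigation for this class of conflict, sketched here with assumed table and column names, is to constrain the batch job's delete to an explicit partition predicate so its transaction is less likely to collide with the stream's appends:

```python
from delta.tables import DeltaTable

# Assumed table name and predicate, for illustration only
table_a = DeltaTable.forName(spark, "main.schema.table_a")

# Scoping the delete to one partition narrows the files the transaction
# touches, so the stream's blind appends are less likely to be seen as
# conflicting writes.
table_a.delete("event_date = '2024-03-31' AND is_expired = true")
```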
maikelos272
by New Contributor II
  • 1527 Views
  • 4 replies
  • 2 kudos

Cannot create storage credential without Contributor role

Hello, I am trying to create a Storage Credential. I have created the access connector and gave the managed identity "Storage Blob Data Owner" permissions. However, when I want to create a storage credential I get the following error: Creating a storage...

Latest Reply
Kim3
New Contributor II
  • 2 kudos

Hi @Kaniz, can you elaborate on the error "Refresh token not found for userId"? I have exactly the same problem as described in this thread. I am trying to create a storage credential using a Personal Access Token from a Service Principal. This results...

3 More Replies
BenDataBricks
by New Contributor
  • 260 Views
  • 1 reply
  • 0 kudos

OAuth U2M Manual token generation failing

I am writing a front-end webpage that will log into Databricks and allow the user to select datasets. I am new to front-end development, so there may be some things I am missing here, but I know that the Databricks SQL connector for JavaScript only wor...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @BenDataBricks, ensure that the auth_code variable in your Python script contains the correct authorization code obtained from the browser. Verify that the code_verifier you’re using matches the one you generated earlier. Confirm that the redirect_...

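For context, a hedged sketch of the U2M authorization-code exchange being debugged; the workspace host, client ID, and redirect URI are placeholders:

```python
import requests

host = "https://<workspace>.azuredatabricks.net"  # placeholder
auth_code = "<code-from-redirect>"                # code returned to the redirect URI
code_verifier = "<pkce-verifier>"                 # must match the original PKCE challenge

resp = requests.post(
    f"{host}/oidc/v1/token",
    data={
        "grant_type": "authorization_code",
        "code": auth_code,
        "code_verifier": code_verifier,
        "client_id": "<client-id>",               # placeholder
        "redirect_uri": "http://localhost:8020",  # must match the registered URI
    },
)
print(resp.json().get("access_token"))
```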
SenthilJ
by New Contributor III
  • 619 Views
  • 1 reply
  • 0 kudos

Resolved! Databricks Deep Clone

Hi, I am working on a DR design for Databricks in Azure. The recommendation from Databricks is to use Deep Clone to clone the Unity Catalog tables (within or across catalogs). My design is to ensure that DR is managed across different regions, i.e. pri...

Data Engineering
Disaster Recovery
Unity Catalog
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @SenthilJ, The recommendation from Databricks to use Deep Clone for cloning Unity Catalog (UC) tables is indeed a prudent approach. Deep Clone facilitates the seamless replication of UC objects, including schemas, managed tables, access permission...

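For reference, a minimal sketch of the Deep Clone operation discussed here, issued from PySpark; the catalog and table names are placeholders:

```python
# DEEP CLONE copies metadata and data, so the target is an independent
# replica suitable for DR. Catalog and table names are placeholders.
spark.sql("""
    CREATE OR REPLACE TABLE dr_catalog.finance.my_table
    DEEP CLONE prod_catalog.finance.my_table
""")
```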
Khaled75
by New Contributor
  • 301 Views
  • 1 reply
  • 0 kudos

Connect databricks

I recently discovered MLflow managed by Databricks, so I'm very new to this and need some help. Can someone clearly explain the steps required to track my runs in the Databricks API? Here are the steps I followed: 1/ Installing Databr...

Data Engineering
Data
tracking_ui
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Khaled75, The specific error message you provided is related to fetching the experiment by name. It’s essential to understand the exact error message. Can you share the complete error text? Confirm that your Databricks authentication is working c...

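A minimal sketch of pointing the MLflow client at a Databricks workspace; the experiment path is a placeholder, and authentication via environment variables or a configured CLI profile is assumed:

```python
import mlflow

# Route the client to the Databricks tracking server; credentials come
# from DATABRICKS_HOST/DATABRICKS_TOKEN or a configured CLI profile.
mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Users/you@example.com/my-experiment")  # placeholder path

with mlflow.start_run():
    mlflow.log_param("alpha", 0.5)
    mlflow.log_metric("rmse", 0.78)
```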
Jiri_Koutny
by New Contributor III
  • 2742 Views
  • 10 replies
  • 3 kudos

Delay in files update on filesystem

Hi, I noticed that there is quite a significant delay (2-10 s) between making a change to a file in Repos via the Databricks file edit window and the propagation of that change to the filesystem. Our engineers and scientists use YAML config files. If the...

Latest Reply
DaniyarZ
New Contributor II
  • 3 kudos

There is a trick: if you execute the "%sh ls" command, it forces an immediate update of the filesystem.

9 More Replies
Debi-Moha
by New Contributor II
  • 347 Views
  • 1 reply
  • 1 kudos

Unable to write to S3 bucket from Databricks using boto3

I am unable to write data from Databricks into an S3 bucket. I have set up the permissions both at the bucket policy level and at the user level (Put, List, and others are added; I have also tried with s3*). Bucket region and workspace region are...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @Debi-Moha, Ensure that the IAM role associated with your Databricks cluster has the necessary permissions to access the S3 bucket. Specifically, it should have permissions for s3:PutObject and s3:ListBucket. Double-check that the IAM role is corr...

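For reference, a minimal boto3 write of the kind described; the bucket and key are placeholders. If this fails with AccessDenied even though the bucket policy looks right, the gap is often in the role or user the cluster actually assumes:

```python
import boto3

s3 = boto3.client("s3")  # picks up the cluster's instance profile credentials
s3.put_object(
    Bucket="my-bucket",        # placeholder
    Key="exports/test.json",   # placeholder
    Body=b'{"hello": "world"}',
)
```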
Kibour
by Contributor
  • 2758 Views
  • 2 replies
  • 2 kudos

Resolved! Import from repo

Hi all, I am trying the new "git folder" feature with a repo that works fine from "Repos". In the new folder location, my imports from my own repo no longer work. Has anyone faced something similar? Thanks in advance for sharing your experience.

Latest Reply
Kibour
Contributor
  • 2 kudos

Hi @Kaniz, Thanks for all the suggested options. I tried again with a brand-new git folder. I just changed the cluster from DBR 14.2 ML to 14.3 ML, and now the imports work as expected. Kind regards

1 More Reply
cosminsanda
by New Contributor III
  • 1377 Views
  • 9 replies
  • 0 kudos

Adding a new column triggers reprocessing of Auto Loader source table

I have a source table A in Unity Catalog. This table is constantly written to and is a streaming table. I also have another table B in Unity Catalog. This is a managed table with liquid clustering. Using Auto Loader I move new data from A to B using a ...

Data Engineering
auto-loader
Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

Change data feed might be a solution for you, perhaps: https://docs.databricks.com/en/delta/delta-change-data-feed.html

8 More Replies
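A hedged sketch of the change data feed suggestion, with assumed table names: enable CDF on the source table once, then stream only its changes instead of reprocessing the whole table:

```python
# One-time: enable change data feed on the source table (name assumed)
spark.sql(
    "ALTER TABLE main.bronze.table_a "
    "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
)

# Then stream only the row-level changes recorded after CDF was enabled
changes = (
    spark.readStream
    .option("readChangeFeed", "true")
    .table("main.bronze.table_a")
)
```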
databrick_usert
by New Contributor
  • 441 Views
  • 1 reply
  • 0 kudos

Workspace client creation error

Hi, We are trying to use the Python SDK and create a workspace client using the following code: %pip install databricks-sdk --upgrade; dbutils.library.restartPython(); from databricks.sdk import WorkspaceClient; w = WorkspaceClient(). Here is the notebook: https:...

Latest Reply
Ayushi_Suthar
Honored Contributor
  • 0 kudos

Hi @databrick_usert, hope you are doing well! Can you check the version of the SDK running in this notebook? If it's not an upgraded version, could you please try upgrading the SDK and then restarting Python after the pip install? %p...

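The upgrade sequence the reply outlines, written out as it might appear in a notebook (the restart must happen before the import so the new version is loaded):

```python
# In a notebook, run these two lines first, in their own cell:
#   %pip install --upgrade databricks-sdk
#   dbutils.library.restartPython()

# Then, in a fresh cell, create the client (uses notebook-native auth):
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
print(w.current_user.me().user_name)  # quick sanity check of auth
```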
MartinIsti
by New Contributor III
  • 387 Views
  • 1 reply
  • 0 kudos

Python UDF in Unity Catalog - spark.sql error

I'm trying to utilise the option to create UDFs in Unity Catalog. That would be a great way to have functions available in a fairly straightforward manner without e.g. putting the function definitions in an extra notebook that I %run to make them ava...

Data Engineering
function
udf
Latest Reply
MartinIsti
New Contributor III
  • 0 kudos

I can see someone has asked a very similar question with the same error message: https://community.databricks.com/t5/data-engineering/unable-to-use-sql-udf/td-p/61957. The OP hasn't yet provided sufficient details about his/her function, so no proper res...

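For reference, a hedged sketch of registering and calling a Unity Catalog Python UDF; the catalog, schema, and function names are placeholders:

```python
# Register a Python UDF in Unity Catalog (names are placeholders)
spark.sql("""
    CREATE OR REPLACE FUNCTION main.default.double_it(x INT)
    RETURNS INT
    LANGUAGE PYTHON
    AS $$
        return x * 2
    $$
""")

# Call it from SQL like any other function
spark.sql("SELECT main.default.double_it(21) AS answer").show()
```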
Kingston
by New Contributor II
  • 360 Views
  • 3 replies
  • 0 kudos

Unable to overwrite table to Azure sql db

Hi, I have a requirement to read a table from Azure SQL DB, update the table in Azure Databricks with transformations, and overwrite the updated table to Azure SQL DB, but due to PySpark's lazy evaluation I'm unable to overwrite the table in Azure SQL ...

Latest Reply
YuliyanBogdanov
New Contributor III
  • 0 kudos

Hi @Kingston, make sure that you have the proper permissions on the SQL server for the user you authenticate with through JDBC, i.e. database reader / database writer. Then your approach can go in two directions: push the data from Databrick...

2 More Replies
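One hedged workaround for the lazy-evaluation trap described here (reading from and overwriting the same JDBC table) is to materialize the transformed result before writing back; the connection details and transformation below are placeholders:

```python
from pyspark.sql import functions as F

jdbc_url = "jdbc:sqlserver://<server>.database.windows.net;databaseName=<db>"  # placeholder
props = {"user": "<user>", "password": "<password>"}                           # placeholders

df = spark.read.jdbc(jdbc_url, "dbo.my_table", properties=props)
transformed = df.withColumn("loaded_at", F.current_timestamp())  # example transform

# Materialize the result first; otherwise the lazy plan would still be
# reading from the very table the overwrite is about to truncate.
transformed.persist()
transformed.count()
transformed.write.jdbc(jdbc_url, "dbo.my_table", mode="overwrite", properties=props)
```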