Data Engineering

Forum Posts

by hyedesign • New Contributor II

04-01-2024 10:58:31 AM

508 Views
3 replies
0 kudos

Getting SparkConnectGrpcException: (java.io.EOFException) error when using foreachBatch

Hello, I am trying to write a simple upsert statement following the steps in the tutorials. Here is what my code looks like: from pyspark.sql import functions as F def upsert_source_one(self df_source = spark.readStream.format("delta").table(self.so...

Data Engineering

pyspark

Latest Reply

hyedesign
New Contributor II

04-02-2024 7:05:30 AM

0 kudos

Using sample data sets. Here is the full code. This error does seem to be related to runtime version 15. df_source = spark.readStream.format("delta").table("`cat1`.`bronze`.`officer_info`") df_orig_state = spark.read.format("delta").table("`sample-db`....

by IshaBudhiraja • New Contributor II

03-28-2024 10:15:06 PM

579 Views
3 replies
0 kudos

Migration of Synapse Databricks activity executions from All-purpose cluster to New job cluster

Hi, we have been planning to migrate the Synapse Databricks activity executions from 'All-purpose cluster' to 'New job cluster' to reduce overall cost. We are using Standard_D3_v2 as cluster node type that has 4 CPU cores in total. The current quota ...

Data Engineering

Latest Reply

Kaniz
Community Manager

03-29-2024 4:02:22 PM

0 kudos

Hi @IshaBudhiraja, Quotas are used for different resource groups, subscriptions, accounts, and scopes. The number of cores for a particular region may be restricted by your subscription. To verify your subscription’s usage and quotas, follow these st...

by AxelBrsn • New Contributor III

03-27-2024 9:06:52 AM

832 Views
3 replies
2 kudos

Resolved! Use DLT from another pipeline

Hello, I have a question. Context: I have a Unity Catalog organized with three schemas (bronze, silver and gold). Logically, I would like to create tables in each schema. I tried to organize my pipelines by layer, which means that I would like to ...

Data Engineering

Latest Reply

AxelBrsn
New Contributor III

04-02-2024 6:07:48 AM

2 kudos

Hello, thanks for the answers @YuliyanBogdanov, @standup1. So the solution is to use catalog.schema.table, and not LIVE.table. That's the key; you were right, standup! But you won't have visibility of the tables in the Bronze pipeline if you are on Si...

by EDDatabricks • Contributor

04-01-2024 2:30:30 AM

597 Views
2 replies
0 kudos

Concurrency issue with append-only writes

Dear all, we have a PySpark streaming job (DBR: 14.3) that continuously writes new data to a Delta table (TableA). On this table, there is a PySpark batch job (DBR: 14.3) that runs every 15 minutes and in some cases it may delete some records from ...

Data Engineering

Concurrency

DBR 14.3

delta

MERGE

Latest Reply

Kaniz
Community Manager

04-02-2024 4:02:49 AM

0 kudos

Hi @EDDatabricks, Thank you for providing the details about your PySpark streaming and batch jobs operating on a Delta Table. The concurrency issue you’re encountering seems to be related to the deletion of records from your Delta Table (TableA) du...

by maikelos272 • New Contributor II

01-16-2024 2:03:28 AM

1527 Views
4 replies
2 kudos

Cannot create storage credential without Contributor role

Hello, I am trying to create a Storage Credential. I have created the access connector and gave the managed identity "Storage Blob Data Owner" permissions. However, when I want to create a storage credential I get the following error: Creating a storage...

Data Engineering

Latest Reply

Kim3
New Contributor II

04-02-2024 5:14:51 AM

2 kudos

Hi @Kaniz, can you elaborate on the error "Refresh token not found for userId"? I have exactly the same problem as described in this thread. I am trying to create a storage credential using a Personal Access Token from a Service Principal. This results...

by BenDataBricks • New Contributor

04-01-2024 11:44:44 AM

260 Views
1 reply
0 kudos

OAuth U2M Manual token generation failing

I am writing a frontend webpage that will log into Databricks and allow the user to select datasets. I am new to front end development, so there may be some things I am missing here, but I know that the Databricks SQL connector for JavaScript only wor...

Data Engineering

Latest Reply

Kaniz
Community Manager

04-02-2024 4:20:26 AM

0 kudos

Hi @BenDataBricks, Ensure that the auth_code variable in your Python script contains the correct authorization code obtained from the browser. Verify that the code_verifier you’re using matches the one you generated earlier. Confirm that the redirect_...

by SenthilJ • New Contributor III

03-31-2024 11:30:39 PM

619 Views
1 reply
0 kudos

Resolved! Databricks Deep Clone

Hi, I am working on a DR design for Databricks in Azure. The recommendation from Databricks is to use Deep Clone to clone the Unity Catalog tables (within or across catalogs). My design is to ensure that DR is managed across different regions i.e. pri...

Data Engineering

Disaster Recovery

Unity Catalog

Latest Reply

Kaniz
Community Manager

04-02-2024 3:54:18 AM

0 kudos

Hi @SenthilJ, The recommendation from Databricks to use Deep Clone for cloning Unity Catalog (UC) tables is indeed a prudent approach. Deep Clone facilitates the seamless replication of UC objects, including schemas, managed tables, access permission...

by Khaled75 • New Contributor

03-31-2024 7:57:24 AM

301 Views
1 reply
0 kudos

Connect databricks

I recently discovered MLflow managed by Databricks, so I'm very new to this and I need some help. Can someone clearly explain the steps needed to track my runs into the Databricks API? Here are the steps I followed: 1/ Installing Databr...

Data Engineering

Data

tracking_ui

Latest Reply

Kaniz
Community Manager

04-02-2024 3:42:54 AM

0 kudos

Hi @Khaled75, The specific error message you provided is related to fetching the experiment by name. It’s essential to understand the exact error message. Can you share the complete error text? Confirm that your Databricks authentication is working c...

by Jiri_Koutny • New Contributor III

11-25-2021 4:47:28 AM

2742 Views
10 replies
3 kudos

Delay in files update on filesystem

Hi, I noticed that there is quite a significant delay (2 - 10s) between making a change to some file in Repos via Databricks file edit window and propagation of such change to the filesystem. Our engineers and scientists use YAML config files. If the...

Data Engineering

Latest Reply

DaniyarZ
New Contributor II

04-02-2024 3:17:35 AM

3 kudos

There is a trick: if you execute the "%sh ls" command, it forces an immediate update of the filesystem.

by Debi-Moha • New Contributor II

03-31-2024 2:40:18 AM

347 Views
1 reply
1 kudos

Unable to write to S3 bucket from Databricks using boto3

I am unable to write data from Databricks into an S3 bucket. I have set up the permissions both on the bucket policy level, and the user level as well (Put, List, and others are added, have also tried with s3*). Bucket region and workspace region are...

Data Engineering

Latest Reply

Kaniz
Community Manager

04-02-2024 3:16:27 AM

1 kudos

Hi @Debi-Moha, Ensure that the IAM role associated with your Databricks cluster has the necessary permissions to access the S3 bucket. Specifically, it should have permissions for s3:PutObject and s3:ListBucket. Double-check that the IAM role is corr...
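As an illustration of the two permissions named in the reply above, the sketch below builds a minimal IAM policy document in Python. The bucket name `my-example-bucket` and the helper `build_s3_write_policy` are hypothetical. The point it shows is that s3:ListBucket is granted on the bucket ARN, while s3:PutObject is granted on the objects inside it (the `/*` resource); mixing these up is a common reason a write is denied even though both actions appear in the policy.

```python
import json


def build_s3_write_policy(bucket: str) -> dict:
    """Build a minimal policy allowing listing a bucket and writing objects.

    Hypothetical helper for illustration; adapt the actions and resources
    to your own setup before attaching the policy to a role.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level action: the resource is the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Object-level action: the resource is the objects in the bucket.
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


policy = build_s3_write_policy("my-example-bucket")
print(json.dumps(policy, indent=2))
```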

by Kibour • Contributor

03-28-2024 8:15:54 AM

2758 Views
2 replies
2 kudos

Resolved! Import from repo

Hi all, I am trying the new "git folder" feature, with a repo that works fine from the "Repos". In the new folder location, my imports from my own repo don't work anymore. Anyone faced something similar? Thanks in advance for sharing your experience.

Data Engineering

Latest Reply

Kibour
Contributor

04-02-2024 1:14:24 AM

2 kudos

Hi @Kaniz, thanks for all the suggested options. I tried again with a brand new git folder. I just changed cluster from DBR 14.2 ML to 14.3 ML, and now the imports work as expected. Kind regards

by cosminsanda • New Contributor III

03-25-2024 9:19:28 AM

1377 Views
9 replies
0 kudos

Adding a new column triggers reprocessing of Auto Loader source table

I have a source table A in Unity Catalog. This table is constantly written to and is a streaming table. I also have another table B in Unity Catalog. This is a managed table with liquid clustering. Using Auto Loader I move new data from A to B using a ...

Data Engineering

auto-loader

Latest Reply

-werners-
Esteemed Contributor III

04-02-2024 12:54:35 AM

0 kudos

Change data feed might be a solution for you: https://docs.databricks.com/en/delta/delta-change-data-feed.html

by databrick_usert • New Contributor

04-01-2024 4:28:37 PM

441 Views
1 reply
0 kudos

Workspace client creation error

Hi, we are trying to use the Python SDK and create a workspace client using the following code:
%pip install databricks-sdk --upgrade
dbutils.library.restartPython()
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
Here is the notebook: https:...

Data Engineering

Latest Reply

Ayushi_Suthar
Honored Contributor

04-01-2024 10:00:23 PM

0 kudos

Hi @databrick_usert, hope you are doing well! Can you check the version of the SDK running in this notebook? If it's not an upgraded version, could you please try upgrading the SDK and then restarting Python after the pip install? %p...

by MartinIsti • New Contributor III

04-01-2024 8:15:02 PM

387 Views
1 reply
0 kudos

Python UDF in Unity Catalog - spark.sql error

I'm trying to utilise the option to create UDFs in Unity Catalog. That would be a great way to have functions available in a fairly straightforward manner without e.g. putting the function definitions in an extra notebook that I %run to make them ava...

Data Engineering

function

udf

Latest Reply

MartinIsti
New Contributor III

04-01-2024 8:22:32 PM

0 kudos

I can see someone has asked a very similar question with the same error message: https://community.databricks.com/t5/data-engineering/unable-to-use-sql-udf/td-p/61957 The OP hasn't yet provided sufficient details about his/her function so no proper res...

by Kingston • New Contributor II

03-30-2024 9:41:28 PM

360 Views
3 replies
0 kudos

Unable to overwrite table to Azure SQL DB

Hi, I have a requirement to read a table from Azure SQL DB, update the table in Azure Databricks with transformations, and overwrite the updated table to the Azure SQL DB, but due to the lazy evaluation of PySpark I'm unable to overwrite the table in Azure SQL ...

Data Engineering

Latest Reply

YuliyanBogdanov
New Contributor III

04-01-2024 4:00:45 AM

0 kudos

Hi @Kingston, make sure that the user you authenticate with through JDBC has the proper permissions on the SQL server, i.e. database reader / database writer. Then your approach can go in two directions, push the data from Databrick...
