Hi, I am relatively new to Databricks, although I am conscious of lazy evaluation, transformations and actions, and persistence. I have a piece of code that I want to run on a folder with around 20 JSON files. The goal is to create a temporary table on ea...
Please give me a kudos if this works. Efficiency in Data Collection: Using .collect() on large datasets can lead to out-of-memory errors, as it collects all rows to the driver node. If the dataset is large, consider alternatives such as extracting only...
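For anyone landing here, a minimal sketch of the kind of alternatives meant above, assuming a DataFrame df read from the JSON folder (all paths below are hypothetical):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.json("/mnt/raw/json-folder")  # hypothetical input path

# Pull only the few rows you actually need onto the driver.
sample_rows = df.limit(10).collect()

# Or iterate rows lazily instead of materializing everything at once.
for row in df.toLocalIterator():
    pass  # process one row at a time on the driver

# Or keep the work on the cluster entirely and persist the result.
df.write.mode("overwrite").format("delta").save("/mnt/curated/output")  # hypothetical output path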
Hi, My scenario is I have an export of a table being dropped in ADLS every day. I would like to load this data into a UC table and then repeat the process every day, replacing the data. This seems to rule out DLT as it is meant for incremental proc...
I would use widgets in the notebook, which will be processed in Jobs. SQL in notebooks can use parameters, as can the SQL in jobs, with parameterized queries now supported.
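As a rough sketch of that approach, assuming a runtime with Spark 3.4+ for the args parameter (widget names and values are hypothetical; spark and dbutils are the notebook globals):

dbutils.widgets.text("table_name", "my_table")
dbutils.widgets.text("load_date", "2024-01-01")

table_name = dbutils.widgets.get("table_name")
load_date = dbutils.widgets.get("load_date")

# Values can be bound as named SQL parameters; an identifier such as the
# table name still has to be substituted in Python.
df = spark.sql(
    f"SELECT * FROM {table_name} WHERE load_date = :load_date",
    args={"load_date": load_date},
)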
So I have created a Delta Live Table which uses spark.sql() to execute a query and uses df.write.mode("append").insertInto() to insert data into the respective table, and at the end I return a dummy table, since this was the requirement. So now I have also ...
I am trying to use Auto Loader to load data from two different blobs within the same account so that Spark will discover the data asynchronously. However, when I try this, it doesn't work and I get the error outlined below. Can anyone point out w...
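In case it helps while this is unanswered: one common workaround is to run one Auto Loader stream per source path, each with its own schema and checkpoint location. A hedged sketch (all paths and table names below are placeholders):

def start_autoloader_stream(source_path, schema_path, checkpoint_path, target_table):
    # One independent Auto Loader stream per source location.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", schema_path)
        .load(source_path)
        .writeStream
        .option("checkpointLocation", checkpoint_path)
        .toTable(target_table)
    )

stream_a = start_autoloader_stream(
    "abfss://container-a@myaccount.dfs.core.windows.net/data",
    "/tmp/schemas/a", "/tmp/checkpoints/a", "bronze.container_a",
)
stream_b = start_autoloader_stream(
    "abfss://container-b@myaccount.dfs.core.windows.net/data",
    "/tmp/schemas/b", "/tmp/checkpoints/b", "bronze.container_b",
)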
If we were to upgrade to ADLS Gen2 but retain the same structure, would there be scope for the method above to be improved (besides moving to notification mode)?
Hi, I want to run an MD5 checksum on a file uploaded to Databricks. I can generate the MD5 on the local file, but how do I generate one on the uploaded file on Databricks using the CLI (command-line interface)? Any help would be appreciated. I tried running databr...
Hi @pshuk, Unfortunately, the databricks fs md5 command is not supported directly.
You can run a Python script to compute the MD5 hash of the uploaded file. If your uploaded file is stored in Azure Blob Storage, you can use the azcopy tool to calcula...
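For the Python route, a short sketch assuming the uploaded file is reachable on the driver through the /dbfs mount (the path is hypothetical):

import hashlib

def md5_of_file(path, chunk_size=8 * 1024 * 1024):
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

print(md5_of_file("/dbfs/FileStore/uploads/myfile.csv"))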
Hi All, on Unity Catalog, what is the best way to add members to groups: API or CLI? API should be the best option, but I thought I'd check with you all.
Hi @Amit_Dass_Chmp, In general, both API and CLI can be used to manage members and groups in the Unity Catalog. The choice between the two often depends on your specific use case and comfort level with each tool.
APIs are often preferred for their...
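As an illustration of the API route, a hedged sketch that adds a user to a group via the SCIM 2.0 Groups endpoint (host, token and IDs are placeholders; account-level groups used by Unity Catalog are managed through the account-level SCIM endpoint rather than the workspace one):

import requests

host = "https://<workspace-url>"
token = "<personal-access-token>"
group_id = "<group-id>"
user_id = "<user-id>"

# SCIM PatchOp request that adds one member to the group.
resp = requests.patch(
    f"{host}/api/2.0/preview/scim/v2/Groups/{group_id}",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
        "Operations": [{"op": "add", "value": {"members": [{"value": user_id}]}}],
    },
)
resp.raise_for_status()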
We have Databricks set up and running on Azure. Now we want to connect it with RDS (AWS) to transfer data from RDS to Azure Data Lake using Databricks. I could find documentation on how to do it within the same cloud (either AWS or Azure) but n...
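In case a sketch helps while this is open: cross-cloud, the usual approach is a plain JDBC read from the RDS endpoint, provided the RDS security group allows traffic from the Azure Databricks cluster. This assumes a PostgreSQL RDS instance; endpoint, credentials and paths are placeholders:

rds_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://<rds-endpoint>:5432/<database>")
    .option("dbtable", "public.my_table")
    .option("user", "<username>")
    .option("password", "<password>")
    .option("driver", "org.postgresql.Driver")
    .load()
)

# Land the data in ADLS as Delta.
rds_df.write.format("delta").mode("overwrite").save(
    "abfss://datalake@<storage-account>.dfs.core.windows.net/rds/my_table"
)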
Hi @Danial Malik, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so ...
Hi, I am using different JSON files of type json-stat2. This kind of JSON file is quite commonly used by national statistics bureaus. It is multi-dimensional with multiple arrays. In a Python environment we can use the pyjstat package to easily transform json...
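A minimal sketch of that, assuming pyjstat is installed on the cluster (e.g. %pip install pyjstat) and a hypothetical file path:

from pyjstat import pyjstat

# Read the JSON-stat 2.0 document and flatten it into a pandas DataFrame.
with open("/dbfs/FileStore/stats/population.json", "r") as f:
    dataset = pyjstat.Dataset.read(f.read())
pdf = dataset.write("dataframe")

# Convert to Spark for further processing and expose it as a temp view.
sdf = spark.createDataFrame(pdf)
sdf.createOrReplaceTempView("population")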
Since Lakehouse Fed uses only one credential per connection to the foreign database, all queries using the connection will see all the data the credential has access to. Would anyone know if Lakehouse Fed will support authorization using the cred...
Spark 3.4 introduced parameterized SQL queries, and Databricks also discussed this new functionality in a recent blog post (https://www.databricks.com/blog/parameterized-queries-pyspark). Problem: I cannot run any of the examples provided in the PySpark...
@Cas Unfortunately I do not have any information on this. However, I have seen that DBR 14.3 and 15.0 introduced some changes to spark.sql(). I have not checked whether those changes resolve the issue outlined here. Your best bet is probably to go ah...
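For reference, the two parameterization styles described in the blog post look roughly like this on a runtime with Spark 3.4+ (the table is the Databricks samples.nyctaxi.trips sample dataset; swap in your own):

# Named parameter markers: values are bound via the args dict.
df1 = spark.sql(
    "SELECT * FROM samples.nyctaxi.trips WHERE trip_distance > :min_distance",
    args={"min_distance": 5},
)

# PySpark string templating with {}: substituted client-side before parsing.
df2 = spark.sql(
    "SELECT count(*) AS n FROM {trips}",
    trips=spark.table("samples.nyctaxi.trips"),
)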
Posting this here too in case anyone else has run into this issue... Trying to set up Auto Loader file notifications but keep getting an "Internal Server Error" message. (Related thread: Failure on Write EventSubscription - Internal error - Microsoft Q&A)
Hi @bradleyjamrozik,
Ensure that your service principal for Event Grid and your storage account have the necessary permissions. Specifically, grant the Contributor role to your service principal on both Event Grid and the storage account.
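Once the roles are in place, a hedged sketch of the notification-mode options Auto Loader expects on Azure (all IDs, secrets and paths are placeholders; the secret should really come from a secret scope):

df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useNotifications", "true")
    .option("cloudFiles.clientId", "<sp-application-id>")
    .option("cloudFiles.clientSecret", "<sp-client-secret>")
    .option("cloudFiles.tenantId", "<tenant-id>")
    .option("cloudFiles.subscriptionId", "<azure-subscription-id>")
    .option("cloudFiles.resourceGroup", "<storage-account-resource-group>")
    .option("cloudFiles.schemaLocation", "/tmp/schemas/events")
    .load("abfss://landing@<storage-account>.dfs.core.windows.net/events")
)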
Hi team, I'm using the Databricks SDK for Python to run SQL queries. I created a variable as below:
param = [{'name': 'a', 'value': 'x'}, {'name': 'b', 'value': 'y'}]
and passed it to the statement as below:
_ = w.statement_execution.execute_statement( warehous...
@Kaniz This does not help resolve the issue. I am experiencing the same issue when following the above pointers. Here is the statement:
response = w.statement_execution.execute_statement(
    statement='ALTER TABLE users ALTER COLUMN :col_name SET NOT...
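For what it's worth, a hedged sketch of how the SDK expects parameters to be passed; note that parameter markers bind values only, so an identifier like a column name in ALTER COLUMN cannot be parameterized this way (warehouse ID, table and values are placeholders):

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.sql import StatementParameterListItem

w = WorkspaceClient()
response = w.statement_execution.execute_statement(
    warehouse_id="<warehouse-id>",
    statement="SELECT * FROM users WHERE country = :a AND status = :b",
    parameters=[
        StatementParameterListItem(name="a", value="x"),
        StatementParameterListItem(name="b", value="y"),
    ],
)
print(response.status.state)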
Hi, I'm running shallow clone for external Delta tables. The shallow clone is failing for source tables where I don't have MODIFY permission. I'm getting the exception below. I don't understand why MODIFY permission on the source table is required. Is there a...
Also check this documentation on access mode: Shallow clone for Unity Catalog tables | Databricks on AWS. When working with Unity Catalog shallow clones in Single User access mode, you must have permissions on the resources for the cloned table source as w...
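For completeness, the clone itself is a single SQL statement that can be run from a notebook; a minimal sketch with hypothetical catalog, schema and table names:

spark.sql("""
    CREATE OR REPLACE TABLE dev_catalog.sales.orders_clone
    SHALLOW CLONE prod_catalog.sales.orders
""")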
Library installation failed for library due to user error for jar: "dbfs:////<<PATH>>/jackson-annotations-2.16.1.jar"
Error messages:
Library installation attempted on the driver node of cluster <<clusterId>> and failed. Please refer to the foll...
@pavansharma36 Thanks for the details. I had a look at the runtime and platform release notes and I can't find anything that could explain a change of behavior. I can only suppose that background changes happened, but guessing is not fact. It's only an opini...