Hi, I am relatively new to Databricks, although I am conscious of lazy evaluation, transformations and actions, and persistence. I have a piece of code that I want to run on a folder with around 20 JSON files. The goal is to create a temporary table on ea...
Please give me a kudos if this works. Efficiency in Data Collection: Using .collect() on large datasets can lead to out-of-memory errors, as it collects all rows to the driver node. If the dataset is large, consider alternatives such as extracting only...
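For anyone landing here, a minimal sketch of the kind of alternatives meant above, assuming a DataFrame df read from the JSON folder (all paths below are hypothetical):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.json("/mnt/raw/json-folder")  # hypothetical input path

# Pull only the few rows you actually need onto the driver.
sample_rows = df.limit(10).collect()

# Or iterate rows lazily instead of materializing everything at once.
for row in df.toLocalIterator():
    pass  # process one row at a time on the driver

# Or keep the work on the cluster entirely and persist the result.
df.write.mode("overwrite").format("delta").save("/mnt/curated/output")  # hypothetical output path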
Hi, My scenario is I have an export of a table being dropped in ADLS every day. I would like to load this data into a UC table and then repeat the process every day, replacing the data. This seems to rule out DLT as it is meant for incremental proc...
I would use widgets in the notebook, which will be processed in Jobs. SQL in notebooks can use parameters, as can the SQL in jobs, with parameterized queries now supported.
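As a rough sketch of that approach, assuming a runtime with Spark 3.4+ for the args parameter (widget names and values are hypothetical; spark and dbutils are the notebook globals):

dbutils.widgets.text("table_name", "my_table")
dbutils.widgets.text("load_date", "2024-01-01")

table_name = dbutils.widgets.get("table_name")
load_date = dbutils.widgets.get("load_date")

# Values can be bound as named SQL parameters; an identifier such as the
# table name still has to be substituted in Python.
df = spark.sql(
    f"SELECT * FROM {table_name} WHERE load_date = :load_date",
    args={"load_date": load_date},
)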
So I have created a Delta Live Table which uses spark.sql() to execute a query and uses df.write.mode("append").insertInto() to insert data into the respective table, and at the end I return a dummy table, since this was the requirement. So now I have also ...
I am trying to use Auto Loader to load data from two different blobs within the same account so that Spark will discover the data asynchronously. However, when I try this, it doesn't work and I get the error outlined below. Can anyone point out w...
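In case it helps while this is unanswered: one common workaround is to run one Auto Loader stream per source path, each with its own schema and checkpoint location. A hedged sketch (all paths and table names below are placeholders):

def start_autoloader_stream(source_path, schema_path, checkpoint_path, target_table):
    # One independent Auto Loader stream per source location.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", schema_path)
        .load(source_path)
        .writeStream
        .option("checkpointLocation", checkpoint_path)
        .toTable(target_table)
    )

stream_a = start_autoloader_stream(
    "abfss://container-a@myaccount.dfs.core.windows.net/data",
    "/tmp/schemas/a", "/tmp/checkpoints/a", "bronze.container_a",
)
stream_b = start_autoloader_stream(
    "abfss://container-b@myaccount.dfs.core.windows.net/data",
    "/tmp/schemas/b", "/tmp/checkpoints/b", "bronze.container_b",
)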
If we were to upgrade to ADLS Gen2 but retain the same structure, would there be scope for the method above to be improved (besides moving to notification mode)?
Hi, I want to run an MD5 checksum on a file uploaded to Databricks. I can generate the MD5 on the local file, but how do I generate one on the uploaded file on Databricks using the CLI (command-line interface)? Any help would be appreciated. I tried running databr...
Hi @pshuk, Unfortunately, the databricks fs md5 command is not supported directly.
You can run a Python script to compute the MD5 hash of the uploaded file. If your uploaded file is stored in Azure Blob Storage, you can use the azcopy tool to calcula...
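For the Python route, a short sketch assuming the uploaded file is reachable on the driver through the /dbfs mount (the path is hypothetical):

import hashlib

def md5_of_file(path, chunk_size=8 * 1024 * 1024):
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

print(md5_of_file("/dbfs/FileStore/uploads/myfile.csv"))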
Hi All, on Unity Catalog, what is the best way to add members to groups: API or CLI? API should be the best option, but I thought I'd check with you all.
Hi @Amit_Dass_Chmp, In general, both API and CLI can be used to manage members and groups in the Unity Catalog. The choice between the two often depends on your specific use case and comfort level with each tool.
APIs are often preferred for their...
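As an illustration of the API route, a hedged sketch that adds a user to a group via the SCIM 2.0 Groups endpoint (host, token and IDs are placeholders; account-level groups used by Unity Catalog are managed through the account-level SCIM endpoint rather than the workspace one):

import requests

host = "https://<workspace-url>"
token = "<personal-access-token>"
group_id = "<group-id>"
user_id = "<user-id>"

# SCIM PatchOp request that adds one member to the group.
resp = requests.patch(
    f"{host}/api/2.0/preview/scim/v2/Groups/{group_id}",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
        "Operations": [{"op": "add", "value": {"members": [{"value": user_id}]}}],
    },
)
resp.raise_for_status()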
We have Databricks set up and running on Azure. Now we want to connect it with RDS (AWS) to transfer data from RDS to Azure Data Lake using Databricks. I could find documentation on how to do it within the same cloud (either AWS or Azure) but n...
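In case a sketch helps while this is open: cross-cloud, the usual approach is a plain JDBC read from the RDS endpoint, provided the RDS security group allows traffic from the Azure Databricks cluster. This assumes a PostgreSQL RDS instance; endpoint, credentials and paths are placeholders:

rds_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://<rds-endpoint>:5432/<database>")
    .option("dbtable", "public.my_table")
    .option("user", "<username>")
    .option("password", "<password>")
    .option("driver", "org.postgresql.Driver")
    .load()
)

# Land the data in ADLS as Delta.
rds_df.write.format("delta").mode("overwrite").save(
    "abfss://datalake@<storage-account>.dfs.core.windows.net/rds/my_table"
)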
Hi @Danial Malik, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so ...
Hi, I am using different JSON files of type json-stat2. This kind of JSON file is quite commonly used by national statistics bureaus. It is multi-dimensional with multiple arrays. In a Python environment we can use the pyjstat package to easily transform json...
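A minimal sketch of that, assuming pyjstat is installed on the cluster (e.g. %pip install pyjstat) and a hypothetical file path:

from pyjstat import pyjstat

# Read the JSON-stat 2.0 document and flatten it into a pandas DataFrame.
with open("/dbfs/FileStore/stats/population.json", "r") as f:
    dataset = pyjstat.Dataset.read(f.read())
pdf = dataset.write("dataframe")

# Convert to Spark for further processing and expose it as a temp view.
sdf = spark.createDataFrame(pdf)
sdf.createOrReplaceTempView("population")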
Since Lakehouse Fed uses only one credential per connection to the foreign database, all queries using the connection will see all the data the credential has access to. Would anyone know if Lakehouse Fed will support authorization using the cred...
Spark 3.4 introduced parameterized SQL queries, and Databricks also discussed this new functionality in a recent blog post (https://www.databricks.com/blog/parameterized-queries-pyspark). Problem: I cannot run any of the examples provided in the PySpark...
@Cas Unfortunately I do not have any information on this. However, I have seen that DBR 14.3 and 15.0 introduced some changes to spark.sql(). I have not checked whether those changes resolve the issue outlined here. Your best bet is probably to go ah...
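For reference, the two parameterization styles described in the blog post look roughly like this on a runtime with Spark 3.4+ (the table is the Databricks samples.nyctaxi.trips sample dataset; swap in your own):

# Named parameter markers: values are bound via the args dict.
df1 = spark.sql(
    "SELECT * FROM samples.nyctaxi.trips WHERE trip_distance > :min_distance",
    args={"min_distance": 5},
)

# PySpark string templating with {}: substituted client-side before parsing.
df2 = spark.sql(
    "SELECT count(*) AS n FROM {trips}",
    trips=spark.table("samples.nyctaxi.trips"),
)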
Posting this here too in case anyone else has run into this issue... Trying to set up Auto Loader file notifications but keep getting an "Internal Server Error" message. (Related thread: Failure on Write EventSubscription - Internal error - Microsoft Q&A)
Hi @bradleyjamrozik,
Ensure that your service principal for Event Grid and your storage account have the necessary permissions. Specifically, grant the Contributor role to your service principal on both Event Grid and the storage account.
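Once the roles are in place, a hedged sketch of the notification-mode options Auto Loader expects on Azure (all IDs, secrets and paths are placeholders; the secret should really come from a secret scope):

df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useNotifications", "true")
    .option("cloudFiles.clientId", "<sp-application-id>")
    .option("cloudFiles.clientSecret", "<sp-client-secret>")
    .option("cloudFiles.tenantId", "<tenant-id>")
    .option("cloudFiles.subscriptionId", "<azure-subscription-id>")
    .option("cloudFiles.resourceGroup", "<storage-account-resource-group>")
    .option("cloudFiles.schemaLocation", "/tmp/schemas/events")
    .load("abfss://landing@<storage-account>.dfs.core.windows.net/events")
)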
Hi team, I'm using the Databricks SDK for Python to run SQL queries. I created a variable as below:
param = [{'name': 'a', 'value': 'x'}, {'name': 'b', 'value': 'y'}]
and passed it to the statement as below:
_ = w.statement_execution.execute_statement( warehous...
@Kaniz This does not help resolve the issue. I am experiencing the same issue when following the above pointers. Here is the statement:
response = w.statement_execution.execute_statement(
    statement='ALTER TABLE users ALTER COLUMN :col_name SET NOT...
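For what it's worth, a hedged sketch of how the SDK expects parameters to be passed; note that parameter markers bind values only, so an identifier like a column name in ALTER COLUMN cannot be parameterized this way (warehouse ID, table and values are placeholders):

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.sql import StatementParameterListItem

w = WorkspaceClient()
response = w.statement_execution.execute_statement(
    warehouse_id="<warehouse-id>",
    statement="SELECT * FROM users WHERE country = :a AND status = :b",
    parameters=[
        StatementParameterListItem(name="a", value="x"),
        StatementParameterListItem(name="b", value="y"),
    ],
)
print(response.status.state)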
Hi, I'm running shallow clone for external Delta tables. The shallow clone is failing for source tables where I don't have MODIFY permission. I'm getting the exception below. I don't understand why MODIFY permission on the source table is required. Is there a...
Also check this documentation on access mode: Shallow clone for Unity Catalog tables | Databricks on AWS. When working with Unity Catalog shallow clones in Single User access mode, you must have permissions on the resources for the cloned table source as w...
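For completeness, the clone itself is a single SQL statement that can be run from a notebook; a minimal sketch with hypothetical catalog, schema and table names:

spark.sql("""
    CREATE OR REPLACE TABLE dev_catalog.sales.orders_clone
    SHALLOW CLONE prod_catalog.sales.orders
""")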
Library installation failed for library due to user error for jar: "dbfs:////<<PATH>>/jackson-annotations-2.16.1.jar"
Error messages:
Library installation attempted on the driver node of cluster <<clusterId>> and failed. Please refer to the foll...
@pavansharma36 Thanks for the details. I had a look at the runtime and platform release notes and I can't find anything that could explain a change of behavior. I can only suppose that background changes happened, but guessing is not fact. It's only an opini...