by Bilal1 • New Contributor III
- 35312 Views
- 7 replies
- 2 kudos
When writing a dataframe to a CSV file in PySpark, a folder is created containing a partitioned CSV file. I then have to rename this file in order to distribute it to my end user. Is there any way I can simply write my data to a CSV file, with the name ...
Latest Reply
I know this post is a little old, but ChatGPT actually put together a very clean and straightforward solution for me (in Scala):
// Set the temporary output directory and the desired final file path
val tempDir = "/tmp/your_file_name"
val finalOutputP...
6 More Replies
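A minimal PySpark sketch of the same approach, assuming a Databricks notebook (where spark and dbutils exist) and hypothetical paths tmp_dir and final_path: write to a temporary folder with a single partition, then copy and rename the lone part file.

```python
# Sketch: write a DataFrame to a single CSV file with a chosen name.
# Paths and the sample DataFrame are placeholders; dbutils is Databricks-only.
tmp_dir = "/tmp/single_csv_output"        # hypothetical temporary directory
final_path = "/mnt/exports/report.csv"    # hypothetical final file path

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# coalesce(1) produces exactly one part-*.csv file inside tmp_dir
df.coalesce(1).write.mode("overwrite").option("header", "true").csv(tmp_dir)

# Locate the part file, copy it to the desired name, and remove the folder
part_file = [f.path for f in dbutils.fs.ls(tmp_dir) if f.name.startswith("part-")][0]
dbutils.fs.cp(part_file, final_path)
dbutils.fs.rm(tmp_dir, True)
```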
- 2575 Views
- 5 replies
- 3 kudos
I am using Delta Live Tables and have my pipeline defined using the code below. My understanding is that a checkpoint is automatically set when using Delta Live Tables. I am using the Unity Catalog and Schema settings in the pipeline as the storage d...
Latest Reply
Hi @ggsmith, if you use Delta Live Tables, then checkpoints are stored under the storage location specified in the DLT settings. Each table gets a dedicated directory under storage_location/checkpoints/<dlt_table_name>.
4 More Replies
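For context, a minimal DLT table definition in Python (the table name, source path, and Auto Loader options below are assumptions, not the poster's actual pipeline); the streaming checkpoint for each such table is managed by DLT under the directory described in the reply.

```python
import dlt
from pyspark.sql.functions import col

SOURCE_PATH = "/Volumes/my_catalog/my_schema/landing/orders"  # hypothetical source

@dlt.table(name="orders_bronze", comment="Raw orders ingested with Auto Loader")
def orders_bronze():
    # DLT creates and manages the streaming checkpoint for this table
    # under <storage_location>/checkpoints/orders_bronze (per the reply above).
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load(SOURCE_PATH)
        .withColumn("source_file", col("_metadata.file_path"))
    )
```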
- 927 Views
- 3 replies
- 1 kudos
I tried to follow the instructions found here: Connect to Azure Data Lake Storage Gen2 and Blob Storage - Azure Databricks | Microsoft Learn. E.g. this code: spark.conf.set("fs.azure.account.key.<storage-account>.dfs.core.windows.net", dbutils.secrets.ge...
Latest Reply
Can you point me to some documentation on how to do that?
2 More Replies
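For reference, the account-key pattern from that Microsoft Learn page looks roughly like the sketch below; the storage account name, container, secret scope, and key name are placeholders that must exist in your workspace before this will run.

```python
# Sketch of the account-key setup from the linked Microsoft Learn page.
# <storage-account>, the container, the secret scope, and the key are placeholders.
storage_account = "<storage-account>"

spark.conf.set(
    f"fs.azure.account.key.{storage_account}.dfs.core.windows.net",
    dbutils.secrets.get(scope="my-scope", key="storage-account-access-key"),
)

# With the key set, data on the account can be read via abfss:// URIs:
df = spark.read.csv(
    f"abfss://my-container@{storage_account}.dfs.core.windows.net/path/to/data.csv",
    header=True,
)
display(df)
```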
by sticky • New Contributor II
- 738 Views
- 2 replies
- 0 kudos
So, I have an R notebook with different cells and a '15.4 LTS ML (includes Apache Spark 3.5.0, Scala 2.12)' cluster. If I select 'Run all', all cells run immediately and the run finishes quickly and fine. But if I would like to run the cells one...
Latest Reply
Today I tried the glm function from the SparkR package, and it initially seemed to solve the problem. However, when you save the result of the glm function in a variable, things seem to go wrong, but only when the variabl...
1 More Replies
- 803 Views
- 2 replies
- 1 kudos
I have an existing Delta Lake table as the target and a small set of records at hand as CURRENT_BATCH. I have a requirement to update the dateTimeUpdated column inside parent2, using the following merge query: MERGE INTO mydataset AS target USING CURRENT_BA...
Latest Reply
Hi @SagarJi, according to the documentation, updates to nested columns are not supported. What you can do is construct the whole struct and update the parent:
MERGE INTO mydataset AS target
USING CURRENT_BATCH AS incoming
ON target.parent1.comp...
1 More Replies
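A hedged sketch of the approach the reply suggests, wrapped in spark.sql; it assumes CURRENT_BATCH is available as a view, and the join key and struct fields (dateTimeUpdated, otherField) are hypothetical names to replace with your actual schema.

```python
# Sketch: update one field inside a struct by rebuilding the whole struct.
# Join condition and struct field names are hypothetical.
spark.sql("""
    MERGE INTO mydataset AS target
    USING CURRENT_BATCH AS incoming
    ON target.parent1.id = incoming.parent1.id
    WHEN MATCHED THEN UPDATE SET
      target.parent2 = named_struct(
        'dateTimeUpdated', current_timestamp(),
        'otherField',      target.parent2.otherField
      )
""")
```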
by Fz1 • New Contributor III
- 9994 Views
- 6 replies
- 3 kudos
I have DLT tables created under the hive_metastore with external data stored in ADLS Gen2. The ADLS blob storage is mounted into /mnt/<storage-account>. The tables are successfully created and accessible from my notebooks, as is the ADLS storage. I have c...
- 724 Views
- 1 reply
- 1 kudos
I have a specific use case where I call another notebook using the dbutils.notebook.run() function. The other notebook does some processing and returns a string in the dbutils.notebook.exit() function to the caller notebook. The returned string has some...
Latest Reply
@jfpatenaude starbuckssecretmenu wrote: I have a specific use case where I call another notebook using the dbutils.notebook.run() function. The other notebook does some processing and returns a string in the dbutils.notebook.exit() function to the caller...
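A common pattern for this use case (a sketch, not necessarily what the reply goes on to recommend) is to serialize the return value as JSON in the child notebook and parse it in the caller, since dbutils.notebook.exit() can only pass back a single string; the notebook path and payload below are hypothetical.

```python
import json

# In the child notebook (hypothetical path /Shared/child_notebook):
#   result = {"status": "ok", "rows_processed": 42}
#   dbutils.notebook.exit(json.dumps(result))

# In the caller notebook:
returned = dbutils.notebook.run("/Shared/child_notebook", 600)  # 600 s timeout
payload = json.loads(returned)                                  # back to a dict
print(payload["status"], payload["rows_processed"])
```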
- 23695 Views
- 2 replies
- 0 kudos
Hi all, does anyone have some code or an example of how to export my Databricks SQL results directly to an existing spreadsheet? Many thanks, Kody_Devl
Latest Reply
There is an add-on that imports a table directly into a spreadsheet: https://workspace.google.com/marketplace/app/bricksheet/979793077657
1 More Replies
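Besides the add-on, one possible route (a sketch, with placeholder paths and names) is to pull the query result into pandas and append a sheet to an existing .xlsx workbook; this assumes openpyxl is installed on the cluster and that the workbook is reachable through a /dbfs path on classic compute.

```python
import pandas as pd  # .xlsx append mode requires openpyxl on the cluster

# Hypothetical query; replace with your own SQL.
pdf = spark.sql("SELECT * FROM my_catalog.my_schema.sales LIMIT 1000").toPandas()

# Append (or replace) a sheet in an existing workbook; path and sheet name
# are placeholders, and /dbfs FUSE paths need classic (non-serverless) compute.
with pd.ExcelWriter(
    "/dbfs/FileStore/exports/report.xlsx",
    engine="openpyxl",
    mode="a",
    if_sheet_exists="replace",
) as writer:
    pdf.to_excel(writer, sheet_name="databricks_results", index=False)
```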
- 698 Views
- 3 replies
- 0 kudos
Hi, I have a delta table under UC, no partition, no liquid clustering. I tried OPTIMIZE foo;
-- OR
ALTER TABLE foo SET TBLPROPERTIES(delta.targetFileSize = '128mb');
OPTIMIZE foo;
I expect to see the files change after the above, but the OP...
Latest Reply
Hi @Brad, Databricks is a big data processing engine. Instead of testing 3 files, try testing 3000 files. OPTIMIZE isn't merging your small files because there may not be enough files or data for it to act upon. Regarding why DESC DETAIL shows 3 files...
2 More Replies
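A sketch of the kind of check the reply implies: create enough small files that OPTIMIZE actually has work to do, then compare DESCRIBE DETAIL before and after (the table name and row counts are arbitrary test values).

```python
# Create a throwaway table with many small files so OPTIMIZE can compact them.
spark.range(0, 1_000_000).repartition(200).write.mode("overwrite") \
    .saveAsTable("my_catalog.my_schema.foo_test")

spark.sql("DESCRIBE DETAIL my_catalog.my_schema.foo_test").select("numFiles").show()

spark.sql("""
    ALTER TABLE my_catalog.my_schema.foo_test
    SET TBLPROPERTIES (delta.targetFileSize = '128mb')
""")
spark.sql("OPTIMIZE my_catalog.my_schema.foo_test").show(truncate=False)

# numFiles should drop once there are enough small files to be worth compacting.
spark.sql("DESCRIBE DETAIL my_catalog.my_schema.foo_test").select("numFiles").show()
```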
- 1277 Views
- 2 replies
- 1 kudos
Hello, for some reason I am not able to see the external locations that we have in our workspace. I am 100% sure that a lot of them exist, but for some reason I am not able to see them. Is there a reason why? Am I missing something? I know other user...
Latest Reply
Hi @kjoudeh, it is due to permissions. For external locations you would need to have BROWSE permissions: https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/manage-privileges/privileges. Ask the metastore admin or a workspace...
1 More Replies
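If you are the metastore admin (or can ask one), a sketch of the grant involved, with placeholder location and group names; check the linked privileges page for exactly which privilege fits your case.

```python
# Grant visibility on an external location to a group.
# "my_external_location" and "data_engineers" are placeholder names.
spark.sql("GRANT BROWSE ON EXTERNAL LOCATION my_external_location TO `data_engineers`")

# Verify what is visible and what has been granted.
spark.sql("SHOW EXTERNAL LOCATIONS").show(truncate=False)
spark.sql("SHOW GRANTS ON EXTERNAL LOCATION my_external_location").show(truncate=False)
```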
- 1349 Views
- 5 replies
- 3 kudos
I am trying to read in a JSON file with this in the SQL Editor and it fails with None.get:
CREATE TEMPORARY VIEW multilineJson
USING json
OPTIONS (path="/Volumes/my_catalog/my_schema/jsondir/test.json", multiline=true);
None.get is all the error it has. Th...
Latest Reply
@filipniziol - Yes, I was on a Serverless SQL Warehouse. It works with "CREATE TABLE ..", thanks! I am surprised that the warehouse type impacts this feature, but I got the SQL from the Databricks documentation: https://docs.databricks.com/en/query/format...
4 More Replies
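For completeness, a sketch of both routes discussed in the thread, run via spark.sql from a notebook; the volume path is the one from the question, the table name is hypothetical, and read_files assumes a recent runtime (DBR 13.3+) where that table-valued function is available.

```python
# Route 1: materialize the JSON as a table (the form the poster says worked).
spark.sql("""
    CREATE TABLE IF NOT EXISTS my_catalog.my_schema.multiline_json
    USING json
    OPTIONS (path '/Volumes/my_catalog/my_schema/jsondir/test.json', multiline 'true')
""")

# Route 2: query the file directly with read_files (no object is created).
spark.sql("""
    SELECT * FROM read_files(
      '/Volumes/my_catalog/my_schema/jsondir/test.json',
      format => 'json',
      multiLine => true
    )
""").show(truncate=False)
```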
- 5902 Views
- 1 reply
- 1 kudos
I want to confirm whether this understanding is correct: to calculate the number of parallel tasks that can be executed in a Databricks PySpark cluster with the given configuration, we need to consider the number of executors that can run on each node a...
Latest Reply
Hi @Retired_mod, this is really fantastic guidance. Will something similar be added to the Databricks docs?
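As a worked example with hypothetical numbers (the original cluster configuration is not shown in the preview): the number of concurrently running tasks is roughly workers x cores per worker available to Spark, divided by spark.task.cpus.

```python
# Hypothetical cluster: 4 workers, 16 cores each, default 1 CPU per task.
workers = 4
cores_per_worker = 16      # cores exposed to Spark on each worker
task_cpus = 1              # spark.task.cpus (default)

parallel_tasks = workers * cores_per_worker // task_cpus
print(parallel_tasks)      # 64 tasks can run at the same time

# On a live cluster, this usually matches the total task slots on the workers:
print(spark.sparkContext.defaultParallelism)
```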
- 829 Views
- 3 replies
- 0 kudos
Hi, we are trying to insert into a table using a CTE (WITH clause query). In the insert we are using the IDENTIFIER function, as the catalog name is retrieved dynamically. This is causing the insert to fail with an error: The table or view `cte_query` ...
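No reply is shown in the preview, but one possible workaround (an assumption, not a confirmed fix) is to inline the CTE as a subquery so that IDENTIFIER() only names the real target and source tables; the catalog, schema, and table names below are placeholders, and parameter markers with spark.sql(..., args=...) need a recent runtime.

```python
# Hypothetical workaround: no CTE, the former WITH-clause body becomes a subquery.
spark.sql(
    """
    INSERT INTO IDENTIFIER(:catalog || '.my_schema.target_table')
    SELECT id, SUM(amount) AS amount
    FROM (
        -- former CTE body, inlined
        SELECT id, amount
        FROM IDENTIFIER(:catalog || '.my_schema.source_table')
    )
    GROUP BY id
    """,
    args={"catalog": "dev_catalog"},  # catalog name resolved dynamically
)
```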
- 2999 Views
- 3 replies
- 1 kudos
The usual way to use Apache Sedona inside PySpark is by first registering Sedona types and functions with SedonaRegistrator.registerAll(spark). We need to have these auto-registered when the cluster starts (to be able, for example, to perform geospatial q...
Latest Reply
Hi @Giovanni Allegri, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answ...
2 More Replies
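One approach (not confirmed in the replies shown) is to install the Sedona jars and Python package as cluster libraries and set the cluster's Spark config so the SQL extension loads at startup; the class names in the comments match Sedona 1.x and should be checked against the version you install.

```python
# Cluster Spark config (Advanced options > Spark) to auto-register Sedona on start,
# assuming Sedona 1.x jars and the apache-sedona Python package are cluster libraries:
#   spark.sql.extensions     org.apache.sedona.sql.SedonaSqlExtensions
#   spark.serializer         org.apache.spark.serializer.KryoSerializer
#   spark.kryo.registrator   org.apache.sedona.core.serde.SedonaKryoRegistrator

# Quick check from any notebook that the functions are registered without
# calling SedonaRegistrator.registerAll(spark) first:
spark.sql("SELECT ST_Point(1.0, 2.0) AS p").show()
```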
by ae20cg • New Contributor III
- 17493 Views
- 17 replies
- 12 kudos
I want to run a block of code in a script, not in a notebook, on Databricks; however, I cannot properly instantiate the Spark context without some error. I have tried `SparkContext.getOrCreate()`, but this does not work. Is there a simple way to do t...
Latest Reply
Is there some solution for this? We got stuck where a cluster with Unity Catalog enabled is not able to get the Spark context. This prevents us from using the distributed nature of Spark in Databricks.
16 More Replies
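A minimal sketch of the usual alternative: build (or attach to) a SparkSession instead of a bare SparkContext; the Databricks Connect variant in the comment applies when the script runs off-cluster or on recent Unity Catalog runtimes.

```python
from pyspark.sql import SparkSession

# Inside a script running on the cluster, this attaches to the existing session.
spark = SparkSession.builder.getOrCreate()
print(spark.range(10).count())

# With Databricks Connect (DBR 13+ / Unity Catalog), the entry point is instead:
# from databricks.connect import DatabricksSession
# spark = DatabricksSession.builder.getOrCreate()
```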