Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by Kody_Devl (New Contributor II)
  • 30763 Views
  • 2 replies
  • 0 kudos

Export to Excel xlsx

Hi all, does anyone have some code or an example of how to export my Databricks SQL results directly to an existing spreadsheet? Many thanks, Kody_Devl

Latest Reply
Emit (New Contributor II)
  • 0 kudos

There is an add-on that imports tables directly into a spreadsheet: https://workspace.google.com/marketplace/app/bricksheet/979793077657

1 More Replies
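Beyond the add-on, a common approach is pandas plus openpyxl. A minimal sketch (not from the thread; the table and workbook paths are illustrative, and openpyxl must be installed, e.g. %pip install openpyxl):

```python
import pandas as pd

# Pull the SQL result into pandas (assumes the ambient `spark` session).
pdf = spark.sql("SELECT * FROM my_catalog.my_schema.my_table").toPandas()

# mode="a" appends to an existing workbook; if_sheet_exists="replace"
# overwrites the target sheet instead of the whole file.
with pd.ExcelWriter(
    "/Volumes/my_catalog/my_schema/exports/report.xlsx",
    engine="openpyxl", mode="a", if_sheet_exists="replace",
) as writer:
    pdf.to_excel(writer, sheet_name="results", index=False)
```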
by Brad (Contributor II)
  • 1462 Views
  • 3 replies
  • 0 kudos

How to control file size with OPTIMIZE

Hi, I have a Delta table under UC, with no partitioning and no liquid clustering. I tried: OPTIMIZE foo; -- OR ALTER TABLE foo SET TBLPROPERTIES (delta.targetFileSize = '128mb'); OPTIMIZE foo; I expected the files to change after the above, but the OP...

Latest Reply
filipniziol (Esteemed Contributor)
  • 0 kudos

Hi @Brad, Databricks is a big data processing engine. Instead of testing with 3 files, try testing with 3000 files. OPTIMIZE isn't merging your small files because there may not be enough files or data for it to act upon. Regarding why DESC DETAIL shows 3 files...

2 More Replies
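A minimal sketch of the sequence the question describes, with a check of the resulting layout (the table name is the question's placeholder):

```python
# Set the desired target file size, then compact.
spark.sql("ALTER TABLE foo SET TBLPROPERTIES (delta.targetFileSize = '128mb')")
spark.sql("OPTIMIZE foo")

# Inspect numFiles / sizeInBytes; with only a handful of small files,
# OPTIMIZE may legitimately leave the layout unchanged.
display(spark.sql("DESCRIBE DETAIL foo"))
```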
by kjoudeh (New Contributor II)
  • 2570 Views
  • 2 replies
  • 1 kudos

External Location not showing up

Hello, for some reason I am not able to see the external locations that we have in our workspace. I am 100% sure that a lot of them exist, but for some reason I am not able to see them. Is there a reason why? Am I missing something? I know other user...

Latest Reply
filipniziol (Esteemed Contributor)
  • 1 kudos

Hi @kjoudeh, it is due to permissions. For external locations you need the BROWSE permission (https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/manage-privileges/privileges). Ask the metastore admin or a workspace...

1 More Replies
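A minimal sketch of the fix the reply points to, to be run by a metastore admin (the location and principal names are placeholders):

```python
# Grant BROWSE so the user can see the external location.
spark.sql("GRANT BROWSE ON EXTERNAL LOCATION my_location TO `user@example.com`")

# The user can then verify what is visible to them:
display(spark.sql("SHOW EXTERNAL LOCATIONS"))
```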
by sathyafmt (New Contributor III)
  • 2964 Views
  • 5 replies
  • 3 kudos

Resolved! Cannot read JSON from /Volumes

I am trying to read in a JSON file with this in the SQL Editor and it fails with None.get: CREATE TEMPORARY VIEW multilineJson USING json OPTIONS (path="/Volumes/my_catalog/my_schema/jsondir/test.json", multiline=true); None.get is all the error it gives. Th...

Latest Reply
sathyafmt (New Contributor III)
  • 3 kudos

@filipniziol - Yes, I was on a serverless SQL warehouse. It works with "CREATE TABLE ..", thanks! I am surprised that the warehouse type affects this feature. But I got the SQL from the Databricks documentation: https://docs.databricks.com/en/query/format...

4 More Replies
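A minimal sketch of the workaround that resolved the thread, creating a table instead of a temporary view (names mirror the question and are illustrative):

```python
spark.sql("""
    CREATE TABLE my_catalog.my_schema.multiline_json
    USING json
    OPTIONS (path '/Volumes/my_catalog/my_schema/jsondir/test.json',
             multiline true)
""")
```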
by manish1987c (New Contributor III)
  • 7682 Views
  • 1 reply
  • 1 kudos

Calculate the number of parallel tasks that can be executed in a Databricks PySpark cluster

I want to confirm whether this understanding is correct: to calculate the number of parallel tasks that can be executed in a Databricks PySpark cluster with the given configuration, we need to consider the number of executors that can run on each node a...

Latest Reply
dylanberry (New Contributor II)
  • 1 kudos

Hi @Retired_mod, this is really fantastic guidance. Will something similar be added to the Databricks docs?

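A worked example of the calculation the question describes (the numbers are illustrative, not from the thread):

```python
# With spark.task.cpus = 1, each task occupies one executor core, so:
# parallel tasks = executors per node * cores per executor * nodes.
node_cores = 16        # cores on each worker node
executor_cores = 4     # spark.executor.cores
num_nodes = 3

executors_per_node = node_cores // executor_cores   # 4 executors per node
parallel_tasks = executors_per_node * executor_cores * num_nodes
print(parallel_tasks)  # 48 tasks can run at the same time
```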
by UdayRPai (New Contributor II)
  • 1696 Views
  • 3 replies
  • 0 kudos

Issue with Combination of INSERT + CTE (with clause) + Dynamic query (IDENTIFIER function)

Hi, we are trying to insert into a table using a CTE (WITH clause query). In the INSERT we are using the IDENTIFIER function, as the catalog name is retrieved dynamically. This is causing the insert to fail with an error: The table or view `cte_query` ...

Latest Reply
UdayRPai (New Contributor II)
  • 0 kudos

Please mark this as resolved.

2 More Replies
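A minimal sketch of one possible workaround (an assumption, not confirmed in the thread): move the WITH clause inside the INSERT, so the CTE is resolved as part of the query rather than colliding with the IDENTIFIER target. Names are placeholders:

```python
catalog = "my_catalog"  # resolved dynamically in practice

spark.sql(f"""
    INSERT INTO IDENTIFIER('{catalog}.my_schema.target_table')
    WITH cte_query AS (
        SELECT * FROM {catalog}.my_schema.source_table
    )
    SELECT * FROM cte_query
""")
```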
by giohappy (New Contributor III)
  • 4553 Views
  • 3 replies
  • 1 kudos

Resolved! SedonaSqlExtensions is not autoregistering types and functions

The usual way to use Apache Sedona inside PySpark is by first registering Sedona types and functions with SedonaRegistrator.registerAll(spark). We need to have these autoregistered when the cluster starts (to be able, for example, to perform geospatial q...

Latest Reply
Anonymous (Not applicable)
  • 1 kudos

Hi @Giovanni Allegri, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answ...

2 More Replies
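A minimal sketch of the usual fix for autoregistration (based on Sedona's documentation, not on the hidden replies): register Sedona as a Spark SQL extension in the cluster's Spark config so it loads at startup:

```python
# Cluster > Advanced options > Spark config:
#   spark.sql.extensions org.apache.sedona.sql.SedonaSqlExtensions
#   spark.serializer org.apache.spark.serializer.KryoSerializer
#   spark.kryo.registrator org.apache.sedona.core.serde.SedonaKryoRegistrator
#
# After a restart, SedonaRegistrator.registerAll(spark) should no longer be
# needed; verify with a Sedona function:
spark.sql("SELECT ST_Point(1.0, 2.0) AS p").show()
```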
by ae20cg (New Contributor III)
  • 23472 Views
  • 17 replies
  • 12 kudos

How to instantiate a Databricks Spark context in a Python script?

I want to run a block of code in a script, not in a notebook, on Databricks; however, I cannot properly instantiate the Spark context without some error. I have tried `SparkContext.getOrCreate()`, but this does not work. Is there a simple way to do t...

Latest Reply
ayush007 (New Contributor II)
  • 12 kudos

Is there some solution for this? We got stuck where a cluster with Unity Catalog enabled is not able to get a Spark context. This prevents using the distributed nature of Spark in Databricks.

16 More Replies
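A minimal sketch, assuming the script runs on the cluster (for example as a spark_python_task in a job): fetch the session rather than the low-level SparkContext, which is restricted on Unity Catalog shared clusters:

```python
from pyspark.sql import SparkSession

# getOrCreate() attaches to the session Databricks has already started.
spark = SparkSession.builder.getOrCreate()

df = spark.range(10)
print(df.count())
```

For code running outside Databricks, databricks-connect's DatabricksSession offers an equivalent entry point.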
by dadrake3 (New Contributor II)
  • 3337 Views
  • 2 replies
  • 0 kudos

Delta Live Tables INSUFFICIENT_PERMISSIONS

I have a Delta Live Tables pipeline which reads from a Delta table, then applies 3 layers of transformations before merging the legs of the pipeline and outputting. I am getting this error when I run my pipeline against Unity Catalog: ```org.apache.spar...

(screenshot attached)
Latest Reply
dadrake3 (New Contributor II)
  • 0 kudos

I don't see how that can be the underlying issue, because: 1. The first step of the pipeline, which reads from Unity Catalog and Azure SQL, is just fine. 2. When I remove the enrichment logic from the second step and just pass the table input as i...

1 More Replies
by alonisser (Contributor II)
  • 10278 Views
  • 7 replies
  • 3 kudos

Resolved! Changing shuffle.partitions with spark.conf in a spark stream - isn't respected even after a checkpoint

A question about Spark checkpoints and offsets in a running stream: when the stream started I needed tons of partitions, so we set spark.sql.shuffle.partitions to 5000 via spark.conf. As expected, the offsets in the checkpoint contain this info and the job used this value. Then we'...

Latest Reply
Leszek (Contributor)
  • 3 kudos

@Jose Gonzalez, thanks for that information! This is super useful. I was struggling to understand why my stream was still using 200 partitions. This is quite a pain for me, because changing the checkpoint will re-insert all data from the source. Do you know where this can...

6 More Replies
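A minimal sketch of the behaviour discussed above: the shuffle partition count is pinned in the checkpoint state, so a changed value only takes effect when the stream starts from a fresh checkpoint location (names and paths are illustrative):

```python
# Must be set before the stream first starts; afterwards the value
# stored in the checkpoint wins.
spark.conf.set("spark.sql.shuffle.partitions", "200")

(spark.readStream.table("source_table")
      .groupBy("key").count()
      .writeStream
      .outputMode("complete")
      .option("checkpointLocation", "/Volumes/cat/sch/checkpoints/v2")  # new path
      .toTable("agg_table"))
```

As the reply notes, the cost of a new checkpoint is that the stream reprocesses the source from the beginning.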
by Dave1967 (New Contributor III)
  • 2626 Views
  • 2 replies
  • 2 kudos

Resolved! Serverless Compute - How to determine programmatically if it is being used

Hi, we use a common notebook for all our "common" settings; this notebook is called in the first cell of each notebook we develop. The issue we are now having is that we need 2 common notebooks: one for normal shared compute and one for serverle...

Latest Reply
filipniziol (Esteemed Contributor)
  • 2 kudos

Hi @Dave1967, if you know any Spark config command that is not supported on serverless, then build your logic around that command using try/except: def is_config_supported(): try: spark.sparkContext.getConf() return True except...

1 More Replies
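A runnable version of the probe sketched in the reply (the function name is the reply's; the behaviour relies on serverless compute not exposing sparkContext):

```python
def is_config_supported() -> bool:
    try:
        spark.sparkContext.getConf()  # raises on serverless compute
        return True
    except Exception:
        return False

# In the shared "common" notebook:
if is_config_supported():
    pass  # classic shared-compute settings
else:
    pass  # serverless-specific settings
```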
by jv_v (Contributor)
  • 1228 Views
  • 1 reply
  • 1 kudos

Assistance Required: Issues Creating External Table in Legacy Hive Metastore

I am currently trying to create an external table with an external location in the legacy Hive metastore. As part of this process, I have also created the necessary secret scope using the steps below, but I am still encountering issues when attempting to...

Latest Reply
gchandra (Databricks Employee)
  • 1 kudos

Did you mount the location using the secrets? https://learn.microsoft.com/en-us/azure/databricks/dbfs/mounts#mount-adls-gen2-or-blob-storage-with-abfs

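A minimal sketch of the mount the reply links to (ADLS Gen2 over ABFS with a service principal); the scope, key names, tenant, and paths are all placeholders:

```python
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id":
        dbutils.secrets.get(scope="my-scope", key="sp-client-id"),
    "fs.azure.account.oauth2.client.secret":
        dbutils.secrets.get(scope="my-scope", key="sp-client-secret"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://mycontainer@myaccount.dfs.core.windows.net/",
    mount_point="/mnt/mydata",
    extra_configs=configs,
)
```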
by cszczotka (New Contributor III)
  • 1962 Views
  • 3 replies
  • 0 kudos

Shared access mode and py4j.security.Py4JSecurityException

Hi, we are getting the exception below on a shared access mode cluster: py4j.security.Py4JSecurityException: Method public java.lang.String com.databricks.backend.common.rpc.CommandContext.toJson() is not whitelisted on class class com.databricks.backend.com...

Latest Reply
cszczotka (New Contributor III)
  • 0 kudos

@gchandra, I'm aware that I can use .safeToJson(), but did you see the code of py4j.security.WhitelistingPy4JSecurityManager? How does it work? What can be whitelisted and what can't? I don't see py4j.security.WhitelistingPy4JSecurityManager on o...

2 More Replies
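A minimal sketch of the whitelisted alternative the reply mentions (an assumption based on the reply, not verified here): on shared access mode clusters, safeToJson() returns the allowed subset of the command context where toJson() is blocked:

```python
ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
print(ctx.safeToJson())  # whitelisted; ctx.toJson() raises Py4JSecurityException
```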
by ccs (New Contributor II)
  • 6070 Views
  • 6 replies
  • 2 kudos

Resolved! What would happen if my dynamic IP changed in the IP access list?

Regarding the IP access lists feature (IP access lists - Azure Databricks | Microsoft Docs): what we observe is that if your IP is not on the access list, you cannot modify the list via the API, since you are not in a trusted location. What if I specify only 1 IP s...

Latest Reply
Ralph_RevoData (New Contributor II)
  • 2 kudos

Curious to learn if somebody has figured out a way to solve the above, as we've encountered this situation and are now locked out...

5 More Replies
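A minimal sketch of updating an access list via the REST API, which only works from a trusted location, hence the lockout risk discussed above (host, token, and list ID are placeholders):

```python
import requests

host = "https://adb-1234567890123456.7.azuredatabricks.net"
headers = {"Authorization": "Bearer <token>"}

# PATCH updates an existing list; allowing a CIDR block rather than a single
# dynamic IP reduces the chance of locking yourself out.
resp = requests.patch(
    f"{host}/api/2.0/ip-access-lists/<list-id>",
    headers=headers,
    json={"ip_addresses": ["203.0.113.0/24"], "enabled": True},
)
print(resp.status_code, resp.text)
```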
by greyamber (New Contributor II)
  • 32594 Views
  • 4 replies
  • 0 kudos

Selecting a job cluster vs. an all-purpose cluster

I have a workflow that needs to run at a 1-minute interval; it is a REST API call. Should I go for an all-purpose cluster or a job cluster to meet the SLA? We need to get the response as soon as it is available.

Latest Reply
kulkpd (Contributor)
  • 0 kudos

@greyamber, an interactive cluster costs two times more than a job cluster. Can you explain the use case: why does the job API need to be invoked, and what is the API doing?

3 More Replies