Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

omsurapu
by New Contributor II
  • 1837 Views
  • 2 replies
  • 0 kudos

Can one workspace connect to multiple AWS accounts/regions?

Hi, I'd like to know if one workspace can be used to connect to multiple accounts (account A and account B) / regions. I know that multiple accounts/regions can't be selected during setup. Is it possible?

Latest Reply
omsurapu
New Contributor II
  • 0 kudos

OK, thanks! There is no official Databricks documentation available for this requirement. I assume it can be done with cross-account IAM roles, but I've never tested it. Any leads?

1 More Replies
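Following the cross-account IAM idea raised in the thread above, here is a minimal, untested sketch of what reading data owned by a second AWS account might look like, assuming the workspace's instance profile in account A has been granted access to the bucket in account B through a cross-account bucket policy or an assumable role. The bucket name and path are hypothetical.

# Untested sketch: relies on cross-account IAM access already being configured
# for the cluster's instance profile (account A) against the bucket in account B.
df = spark.read.parquet("s3://bucket-in-account-b/path/to/data/")  # hypothetical bucket/path
df.show(5)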
SagarJi
by New Contributor II
  • 2637 Views
  • 2 replies
  • 1 kudos

SQL merge to update one of the nested columns

I have an existing Delta Lake table as the target and a small set of records at hand as CURRENT_BATCH. I have a requirement to update the dateTimeUpdated column inside parent2, using the following merge query. ========MERGE INTO mydataset AS target USING CURRENT_BA...

Latest Reply
filipniziol
Esteemed Contributor
  • 1 kudos

Hi @SagarJi, according to the documentation, updates to nested columns are not supported. What you can do is construct the whole struct and update the parent: MERGE INTO mydataset AS target USING CURRENT_BATCH AS incoming ON target.parent1.comp...

1 More Replies
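To make the workaround above concrete, here is a minimal sketch of rebuilding the whole parent2 struct inside the MERGE rather than assigning to the nested field directly. The join condition and the struct field names (child1, dateTimeUpdated) are assumptions for illustration, not taken from the original post.

# Sketch only: rebuild the full struct so the nested field gets the new value.
spark.sql("""
  MERGE INTO mydataset AS target
  USING CURRENT_BATCH AS incoming
    ON target.parent1.compositeKey = incoming.compositeKey      -- hypothetical join key
  WHEN MATCHED THEN UPDATE SET
    parent2 = named_struct(
      'child1',          target.parent2.child1,                 -- keep the existing value
      'dateTimeUpdated', incoming.dateTimeUpdated               -- update only this field
    )
""")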
Fz1
by New Contributor III
  • 11247 Views
  • 6 replies
  • 3 kudos

Resolved! SQL Warehouse Serverless - Not able to access the external tables in the hive_metastore

I have DLT tables created under the hive_metastore, with external data stored in ADLS Gen2. The ADLS blob storage is mounted at /mnt/<storage-account>. The tables are successfully created and accessible from my notebooks, as is the ADLS storage. I have c...

Latest Reply
TjommeV-Vlaio
New Contributor III
  • 3 kudos

Can this be done using Terraform as well?

5 More Replies
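For anyone landing on this thread: serverless SQL warehouses cannot resolve DBFS mount points, so the usual direction is to reach the ADLS path through a Unity Catalog external location instead of /mnt. A hedged sketch follows; the location, credential, catalog/schema/table names and abfss path are placeholders, and it assumes a storage credential already exists and that Delta data is present at the table path.

# Sketch only: expose the ADLS container through Unity Catalog instead of a /mnt mount.
spark.sql("""
  CREATE EXTERNAL LOCATION IF NOT EXISTS my_adls_location
  URL 'abfss://container@storageaccount.dfs.core.windows.net/path'
  WITH (STORAGE CREDENTIAL my_storage_credential)
""")

# Registers existing Delta data at the path as an external table readable from the warehouse.
spark.sql("""
  CREATE TABLE IF NOT EXISTS my_catalog.my_schema.my_external_table
  LOCATION 'abfss://container@storageaccount.dfs.core.windows.net/path/table'
""")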
jfpatenaude
by New Contributor
  • 906 Views
  • 1 reply
  • 1 kudos

MalformedInputException when using extended ASCII characters in dbutils.notebook.exit()

I have a specific use case where I call another notebook using the dbutils.notebook.run() function. The other notebook does some processing and returns a string via the dbutils.notebook.exit() function to the caller notebook. The returned string has some...

Latest Reply
jennie258fitz
New Contributor III
  • 1 kudos

@jfpatenaude wrote: I have a specific use case where I call another notebook using the dbutils.notebook.run() function. The other notebook does some processing and returns a string via the dbutils.notebook.exit() function to the caller...

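One hedged workaround for encoding problems like the one described in this thread (not confirmed as the thread's resolution) is to make the exchanged value ASCII-safe, for example by base64-encoding the UTF-8 bytes before dbutils.notebook.exit() and decoding them in the caller. The notebook path and timeout below are illustrative.

import base64

# In the child notebook: return only ASCII characters.
payload = "résumé – données"                          # string with non-ASCII characters
dbutils.notebook.exit(base64.b64encode(payload.encode("utf-8")).decode("ascii"))

# In the caller notebook: decode back to the original string.
raw = dbutils.notebook.run("./child_notebook", 600)   # hypothetical path and timeout
result = base64.b64decode(raw).decode("utf-8")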
Kody_Devl
by New Contributor II
  • 28717 Views
  • 2 replies
  • 0 kudos

Export to Excel xlsx

Hi all, does anyone have some code or an example of how to export my Databricks SQL results directly to an existing spreadsheet? Many thanks, Kody_Devl

Latest Reply
Emit
New Contributor II
  • 0 kudos

There is an add-on that imports tables directly into a spreadsheet: https://workspace.google.com/marketplace/app/bricksheet/979793077657

1 More Replies
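Besides the add-on linked above, a common notebook-side approach is pandas plus openpyxl. A minimal sketch, with the table name, sheet name, and volume path as placeholders, that appends the query result as a sheet in an existing workbook:

import pandas as pd  # openpyxl must also be installed on the cluster

# Collect the SQL result into pandas (fine for modestly sized results).
pdf = spark.sql("SELECT * FROM my_catalog.my_schema.my_table").toPandas()

# Write into an existing workbook, replacing the sheet if it already exists.
with pd.ExcelWriter("/Volumes/my_catalog/my_schema/exports/report.xlsx",
                    engine="openpyxl", mode="a", if_sheet_exists="replace") as writer:
    pdf.to_excel(writer, sheet_name="sql_results", index=False)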
Brad
by Contributor II
  • 947 Views
  • 3 replies
  • 0 kudos

How to control file size by OPTIMIZE

Hi, I have a Delta table under UC, with no partitioning and no liquid clustering. I tried: OPTIMIZE foo; -- OR ALTER TABLE foo SET TBLPROPERTIES(delta.targetFileSize = '128mb'); OPTIMIZE foo; I expected the files to show some change after the above, but the OP...

Latest Reply
filipniziol
Esteemed Contributor
  • 0 kudos

Hi @Brad, Databricks is a big data processing engine. Instead of testing with 3 files, try testing with 3000 files. OPTIMIZE isn't merging your small files because there may not be enough files or data for it to act upon. Regarding why DESC DETAIL shows 3 files...

2 More Replies
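For reference, a small sketch of the knobs discussed above; note that with only a handful of files OPTIMIZE may legitimately rewrite nothing, so the effect is easiest to observe on a table with many small files (the table name is the one from the post).

# Ask Delta to aim for ~128 MB files, then compact and inspect the file count.
spark.sql("ALTER TABLE foo SET TBLPROPERTIES ('delta.targetFileSize' = '128mb')")
spark.sql("OPTIMIZE foo")

# numFiles / sizeInBytes show whether the layout actually changed.
spark.sql("DESCRIBE DETAIL foo").select("numFiles", "sizeInBytes").show()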
kjoudeh
by New Contributor II
  • 1770 Views
  • 2 replies
  • 1 kudos

External Location not showing up

Hello, for some reason I am not able to see the external locations that we have in our workspace. I am 100% sure that a lot of them exist, but for some reason I am not able to see them. Is there a reason why? Am I missing something? I know other user...

Latest Reply
filipniziol
Esteemed Contributor
  • 1 kudos

Hi @kjoudeh, it is due to permissions. For external locations you would need to have the BROWSE permission: https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/manage-privileges/privileges Ask the metastore admin or a workspace...

1 More Replies
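A short sketch of the permission fix suggested above, to be run by the metastore admin or the external location owner; the location and group names are placeholders.

# Grant browse visibility on a specific external location to a group.
spark.sql("GRANT BROWSE ON EXTERNAL LOCATION `my_external_location` TO `data_engineers`")

# The location should now appear for members of that group.
spark.sql("SHOW EXTERNAL LOCATIONS").show(truncate=False)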
sathyafmt
by New Contributor III
  • 1802 Views
  • 5 replies
  • 3 kudos

Resolved! Cannot read JSON from /Volumes

I am trying to read a JSON file with this in the SQL Editor and it fails with None.get: CREATE TEMPORARY VIEW multilineJson USING json OPTIONS (path="/Volumes/my_catalog/my_schema/jsondir/test.json", multiline=true); None.get is all the error it has. Th...

Latest Reply
sathyafmt
New Contributor III
  • 3 kudos

@filipniziol - Yes, I was on a Serverless SQL Warehouse. It works with "CREATE TABLE ..", thanks! I am surprised that the warehouse type impacts this feature. But I got the SQL from the Databricks documentation - https://docs.databricks.com/en/query/format...

4 More Replies
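Based on what reportedly worked in this thread (creating a table rather than a temporary view on a serverless SQL warehouse), here is a hedged sketch using read_files; the catalog, schema, and table names are placeholders.

# Materialize the multi-line JSON file as a table instead of a temporary view.
spark.sql("""
  CREATE TABLE IF NOT EXISTS my_catalog.my_schema.multiline_json AS
  SELECT *
  FROM read_files(
    '/Volumes/my_catalog/my_schema/jsondir/test.json',
    format => 'json',
    multiLine => true
  )
""")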
manish1987c
by New Contributor III
  • 6575 Views
  • 1 reply
  • 1 kudos

calculate the number of parallel tasks that can be executed in a Databricks PySpark cluster

I want to confirm whether this understanding is correct: to calculate the number of parallel tasks that can be executed in a Databricks PySpark cluster with the given configuration, we need to consider the number of executors that can run on each node a...

Latest Reply
dylanberry
New Contributor II
  • 1 kudos

Hi @Retired_mod, this is really fantastic guidance. Will something similar be added to the Databricks docs?

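The arithmetic being confirmed in this post boils down to multiplying workers by cores per executor. A tiny sketch with illustrative numbers (not the poster's actual cluster configuration):

# Illustrative numbers only.
worker_nodes = 4                 # workers in the cluster
cores_per_worker = 8             # vCPUs available to Spark on each worker
executors_per_worker = 1         # Databricks runs one executor per worker node by default

parallel_tasks = worker_nodes * executors_per_worker * cores_per_worker
print(parallel_tasks)            # 32 tasks can run concurrently under these assumptions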
UdayRPai
by New Contributor II
  • 1077 Views
  • 3 replies
  • 0 kudos

Issue with Combination of INSERT + CTE (with clause) + Dynamic query (IDENTIFIER function)

Hi, we are trying to insert into a table using a CTE (WITH clause query). In the insert we are using the IDENTIFIER function, as the catalog name is retrieved dynamically. This is causing the insert to fail with an error - The table or view `cte_query` ...

Latest Reply
UdayRPai
New Contributor II
  • 0 kudos

Please mark this as resolved.

2 More Replies
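Since the thread's resolution isn't visible here, the following is only a hedged sketch of a pattern that is sometimes suggested for this combination: keep the WITH clause inside the query that follows INSERT INTO, so IDENTIFIER() only has to resolve the dynamically supplied target name. All object names are hypothetical and this is untested against the exact failure reported above.

target_table = "my_catalog.my_schema.target_table"   # resolved dynamically in the real case

spark.sql("""
  INSERT INTO IDENTIFIER(:target_table)
  WITH cte_query AS (
    SELECT id, value FROM my_catalog.my_schema.source_table   -- hypothetical source
  )
  SELECT id, value FROM cte_query
""", args={"target_table": target_table})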
giohappy
by New Contributor III
  • 3545 Views
  • 3 replies
  • 1 kudos

Resolved! SedonaSqlExtensions is not autoregistering types and functions

The usual way to use Apache Sedona inside PySpark is by first registering Sedona types and functions with SedonaRegistrator.registerAll(spark). We need to have these autoregistered when the cluster starts (to be able, for example, to perform geospatial q...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Giovanni Allegri, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answ...

2 More Replies
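For readers of this thread, a hedged sketch of the cluster-level route usually used to auto-register Sedona at start-up, assuming the Sedona jars are installed as cluster libraries; the exact class names vary across Sedona versions.

# Equivalent cluster Spark config (Compute > Advanced options > Spark config):
#   spark.sql.extensions      org.apache.sedona.sql.SedonaSqlExtensions
#   spark.serializer          org.apache.spark.serializer.KryoSerializer
#   spark.kryo.registrator    org.apache.sedona.core.serde.SedonaKryoRegistrator
#
# The same settings expressed on a SparkSession builder:
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.sql.extensions", "org.apache.sedona.sql.SedonaSqlExtensions")
         .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
         .config("spark.kryo.registrator", "org.apache.sedona.core.serde.SedonaKryoRegistrator")
         .getOrCreate())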
ae20cg
by New Contributor III
  • 19798 Views
  • 17 replies
  • 12 kudos

How to instantiate the Databricks Spark context in a Python script?

I want to run a block of code in a script, not in a notebook, on Databricks; however, I cannot properly instantiate the Spark context without an error. I have tried `SparkContext.getOrCreate()`, but this does not work. Is there a simple way to do t...

Latest Reply
ayush007
New Contributor II
  • 12 kudos

Is there some solution for this? We got stuck where a cluster with Unity Catalog enabled is not able to get the Spark context. This prevents us from using the distributed nature of Spark in Databricks.

16 More Replies
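A hedged sketch of the two entry points usually suggested for scripts (as opposed to notebooks). The Databricks Connect variant assumes databricks-connect is installed and configured, and is the route typically pointed to for Unity Catalog shared-access or serverless compute, where sparkContext is restricted.

from pyspark.sql import SparkSession

# For a Python script submitted as a job on a classic cluster:
spark = SparkSession.builder.getOrCreate()
print(spark.range(10).count())

# For UC shared-access or serverless compute, via Databricks Connect (assumes it is configured):
# from databricks.connect import DatabricksSession
# spark = DatabricksSession.builder.getOrCreate()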
dadrake3
by New Contributor II
  • 2201 Views
  • 2 replies
  • 0 kudos

Delta Live Tables INSUFFICIENT_PERMISSIONS

I have a Delta Live Tables pipeline which reads from a Delta table, then applies 3 layers of transformations before merging the legs of the pipeline and writing the output. I am getting this error when I run my pipeline against Unity Catalog: ```org.apache.spar...

[Attachment: Screenshot 2024-10-02 at 1.25.28 PM.png]
Latest Reply
dadrake3
New Contributor II
  • 0 kudos

I don't see how that can be the underlying issue, because: 1. the first step of the pipeline, which reads from Unity Catalog and Azure SQL, is just fine; 2. when I remove the enrichment logic from the second step and just pass the table input as i...

1 More Replies
alonisser
by Contributor II
  • 8429 Views
  • 7 replies
  • 3 kudos

Resolved! Changing shuffle.partitions with spark.conf in a Spark stream isn't respected, even after a checkpoint

Question about Spark checkpoints and offsets in a running stream: when the stream started I needed tons of partitions, so we set it with spark.conf to 5000. As expected, the offsets in the checkpoint contain this info and the job used this value. Then we'...

Latest Reply
Leszek
Contributor
  • 3 kudos

@Jose Gonzalez, thanks for that information! This is super useful. I was struggling to understand why my streaming was still using 200 partitions. This is quite a pain for me, because changing the checkpoint will re-insert all data from the source. Do you know where this can...

6 More Replies
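To make the point in this thread concrete: for a stateful stream, the shuffle-partition count is captured in the checkpointed state, so a new value only takes effect when the query starts against a fresh checkpoint location. A hedged sketch, with illustrative table and path names:

spark.conf.set("spark.sql.shuffle.partitions", "200")   # desired new value

(spark.readStream
      .table("source_table")                             # hypothetical streaming source
      .groupBy("key").count()                            # stateful aggregation uses shuffle partitions
      .writeStream
      .outputMode("complete")
      .option("checkpointLocation", "/tmp/checkpoints/my_query_v2")  # fresh checkpoint location
      .toTable("target_counts"))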
Dave1967
by New Contributor III
  • 1624 Views
  • 2 replies
  • 2 kudos

Resolved! Serverless Compute - How to determine programmatically if it is being used

Hi, we use a common notebook for all our "common" settings; this notebook is called in the first cell of each notebook we develop. The issue we are now having is that we need 2 common notebooks: one for normal shared compute and one for serverle...

Latest Reply
filipniziol
Esteemed Contributor
  • 2 kudos

Hi @Dave1967, if you know any Spark config command that is not supported on serverless, then build your logic around that command using try/except: def is_config_supported(): try: spark.sparkContext.getConf() return True except...

1 More Replies
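A cleaned-up sketch of the probe suggested in the reply above: call a Spark API that serverless compute does not expose and branch on whether it raises. The function name and the follow-up notebook paths are illustrative.

def is_serverless() -> bool:
    """Heuristic: sparkContext is not available on serverless compute."""
    try:
        spark.sparkContext.getConf()
        return False
    except Exception:
        return True

# Pick the matching "common settings" notebook from the caller.
common_notebook = "./common_serverless" if is_serverless() else "./common_shared"
dbutils.notebook.run(common_notebook, 600)   # hypothetical paths and timeout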
