Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

by New Contributor II
  • 394 Views
  • 2 replies
  • 0 kudos

Running a cell with R-script keeps waiting status

So, I have an R notebook with different cells and a '15.4 LTS ML (includes Apache Spark 3.5.0, Scala 2.12)' cluster. If I select 'Run all', all cells run immediately and the run finishes quickly and fine. But if I would like to run the cells one...

Latest Reply
New Contributor II
  • 0 kudos

Today, I tried the glm function from the SparkR package, and it seemed to have initially solved the problem with the glm function. However, when you save the result of the glm function in a variable, things seem to go wrong. But only when the variabl...

1 More Replies
omsurapu
by New Contributor II
  • 223 Views
  • 2 replies
  • 0 kudos

Can one workspace connect to multiple AWS accounts/regions?

Hi, I'd like to know if one workspace can be used to connect to multiple accounts (account A and account B) / regions. I know that multiple accounts/regions can't be selected during setup. Is it possible?

Latest Reply
omsurapu
New Contributor II
  • 0 kudos

OK, thanks! There is no official Databricks documentation available for this requirement. I assume it can be done with cross-account IAM roles, but I have never tested it. Any leads?

1 More Replies
pthaenraj
by New Contributor III
  • 5466 Views
  • 13 replies
  • 8 kudos

Resolved! Databricks Certified Professional Data Scientist Exam Question Types

Hello, I am not seeing a lot of information regarding the Databricks Certified Professional Data Scientist exam. I took the Associate Developer in Apache Spark exam last year and the materials for the exam seemed much more focused than what I found for...

Latest Reply
ivanabaquero
New Contributor II
  • 8 kudos

Hello! I understand your concerns, and having recently cleared the Databricks Certified Professional Data Scientist exam, I can share some insights. From my experience, the exam primarily focuses on machine learning and data science theory. The questi...

12 More Replies
SagarJi
by New Contributor II
  • 385 Views
  • 2 replies
  • 1 kudos

SQL merge to update one of the nested column

I have an existing Delta Lake table as target, and a small set of records at hand as CURRENT_BATCH. I have a requirement to update the dateTimeUpdated column inside parent2, using the following merge query: ======== MERGE INTO mydataset AS target USING CURRENT_BA...

Latest Reply
filipniziol
Contributor
  • 1 kudos

Hi @SagarJi, according to the documentation, updates to nested columns are not supported. What you can do is construct the whole struct and update the parent: MERGE INTO mydataset AS target USING CURRENT_BATCH AS incoming ON target.parent1.comp...
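The rebuild-the-whole-struct workaround described above can be sketched in plain Python. The parent2 and dateTimeUpdated names come from the thread; the id join key and the other field are hypothetical stand-ins for the rest of the struct:

```python
# Rebuild the whole parent2 struct, carrying over untouched fields, instead of
# assigning to target.parent2.dateTimeUpdated directly (which is not supported).
target = {"id": 1, "parent2": {"dateTimeUpdated": "2024-01-01", "other": "x"}}
incoming = {"id": 1, "dateTimeUpdated": "2024-06-01"}

if target["id"] == incoming["id"]:  # plays the role of the MERGE ... ON condition
    target["parent2"] = {
        **target["parent2"],                             # keep existing fields
        "dateTimeUpdated": incoming["dateTimeUpdated"],  # overwrite just one
    }
```

The same shape carries over to SQL: build a named_struct that copies every field of parent2 and substitutes only the one you want to change.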

1 More Replies
Fz1
by New Contributor III
  • 8145 Views
  • 6 replies
  • 3 kudos

Resolved! SQL Warehouse Serverless - Not able to access the external tables in the hive_metastore

I have DLT tables created under the hive_metastore with external data stored in ADLS Gen2. The ADLS blob storage is mounted into /mnt/<storage-account>. The tables are successfully created and accessible from my notebooks, as well as the ADLS storage. I have c...

Latest Reply
TjommeV-Vlaio
New Contributor III
  • 3 kudos

Can this be done using Terraform as well?

5 More Replies
jfpatenaude
by New Contributor
  • 354 Views
  • 1 reply
  • 1 kudos

MalformedInputException when using extended ascii characters in dbutils.notebook.exit()

I have a specific use case where I call another notebook using the dbutils.notebook.run() function. The other notebook does some processing and returns a string via the dbutils.notebook.exit() function to the caller notebook. The returned string has some...
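The MalformedInputException here is a charset mismatch, and the failure mode can be reproduced in plain Python. This is an illustration of the encoding issue, not of the dbutils API itself: a string carrying "extended ASCII" bytes (Latin-1) is not valid UTF-8, so a strict UTF-8 decoder raises:

```python
# "café" encoded as Latin-1 contains the byte 0xE9, which is not valid UTF-8.
payload = "café".encode("latin-1")

try:
    payload.decode("utf-8")       # strict UTF-8 decode, like the JVM reader
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

print(decoded_ok)                  # False: 0xE9 is rejected by UTF-8
print(payload.decode("latin-1"))   # café — decoding with the right charset works
```

The practical takeaway is to make sure both sides of the notebook boundary agree on UTF-8 for the exchanged string.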

Latest Reply
jennie258fitz
New Contributor III
  • 1 kudos

@jfpatenaude wrote: I have a specific use case where I call another notebook using the dbutils.notebook.run() function. The other notebook does some processing and returns a string in the dbutils.notebook.exit() function to the caller...

Kody_Devl
by New Contributor II
  • 20076 Views
  • 2 replies
  • 0 kudos

Export to Excel xlsx

Hi all, does anyone have some code or an example of how to export my Databricks SQL results directly to an existing spreadsheet? Many thanks, Kody_Devl

Latest Reply
Emit
New Contributor II
  • 0 kudos

There is an add-on that imports tables directly to a spreadsheet: https://workspace.google.com/marketplace/app/bricksheet/979793077657

1 More Replies
Brad
by Contributor II
  • 341 Views
  • 3 replies
  • 0 kudos

How to control file size by OPTIMIZE

Hi, I have a Delta table under UC, with no partitioning and no liquid clustering. I tried OPTIMIZE foo; -- OR ALTER TABLE foo SET TBLPROPERTIES(delta.targetFileSize = '128mb'); OPTIMIZE foo; I expected the files to show some change after the above, but the OP...

Latest Reply
filipniziol
Contributor
  • 0 kudos

Hi @Brad, Databricks is a big data processing engine; instead of testing 3 files, try testing 3000 files. OPTIMIZE isn't merging your small files because there may not be enough files or data for it to act upon. Regarding why DESC DETAIL shows 3 files...
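The compaction math behind that suggestion can be sketched as back-of-envelope arithmetic. The file count and average size below are hypothetical, not from the thread; only the 128 MB target comes from the question's TBLPROPERTIES:

```python
small_files = 3000   # hypothetical number of small files in the table
avg_file_mb = 1      # hypothetical average size per small file
target_mb = 128      # delta.targetFileSize = '128mb' from the question

total_mb = small_files * avg_file_mb
expected_files = -(-total_mb // target_mb)   # ceiling division

print(expected_files)  # 24 files of ~128 MB instead of 3000 tiny ones
```

With only 3 files totalling well under one target-size file, OPTIMIZE has nothing worth compacting, which matches the behavior described in the question.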

2 More Replies
kjoudeh
by New Contributor II
  • 345 Views
  • 2 replies
  • 1 kudos

External Location not showing up

Hello, for some reason I am not able to see the external locations that we have in our workspace. I am 100% sure that we have a lot that exist, but for some reason I am not able to see them. Is there a reason why? Am I missing something? I know other user...

Latest Reply
filipniziol
Contributor
  • 1 kudos

Hi @kjoudeh, it is due to permissions. For external locations you need BROWSE permissions: https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/manage-privileges/privileges Ask the metastore admin or a workspace...

1 More Replies
sathyafmt
by New Contributor III
  • 558 Views
  • 5 replies
  • 3 kudos

Resolved! Cannot read JSON from /Volumes

I am trying to read in a JSON file with this in SQL Editor and it fails with None.get: CREATE TEMPORARY VIEW multilineJson USING json OPTIONS (path="/Volumes/my_catalog/my_schema/jsondir/test.json", multiline=true); None.get is all the error it has. Th...

Latest Reply
sathyafmt
New Contributor III
  • 3 kudos

@filipniziol - Yes, I was on a Serverless SQL Warehouse. It works with "CREATE TABLE ..", thanks! I am surprised that the warehouse type impacts this feature. But I got the SQL from the Databricks documentation: https://docs.databricks.com/en/query/format...

4 More Replies
manish1987c
by New Contributor III
  • 3759 Views
  • 1 reply
  • 1 kudos

calculate the number of parallel tasks that can be executed in a Databricks PySpark cluster

I want to confirm if this understanding is correct: To calculate the number of parallel tasks that can be executed in a Databricks PySpark cluster with the given configuration, we need to consider the number of executors that can run on each node a...

Latest Reply
dylanberry
New Contributor II
  • 1 kudos

Hi @Retired_mod, this is really fantastic guidance. Will something similar be added to the Databricks docs?
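For reference, the arithmetic being confirmed in this thread reduces to executors × cores per executor. The cluster shape below is hypothetical, chosen only to make the numbers concrete:

```python
nodes = 4                # hypothetical number of worker nodes
executors_per_node = 1   # Databricks typically runs one executor per worker
cores_per_executor = 16  # hypothetical cores available to each executor

executors = nodes * executors_per_node
parallel_tasks = executors * cores_per_executor  # one task per core at a time

print(parallel_tasks)  # 64 tasks can run concurrently
```

Anything beyond this count queues in the scheduler until a core frees up.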

lprevost
by Contributor
  • 134 Views
  • 0 replies
  • 0 kudos

sampleBy stream in DLT

I would like to create a sampleBy (stratified version of sample) copy/clone of my Delta table. Ideally, I'd like to do this using a DLT. My source table grows incrementally each month as batch files are added and Auto Loader picks them up. Id...
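Outside DLT, DataFrame.sampleBy(col, fractions, seed) is the PySpark API for this; its per-stratum keep-probability logic can be sketched in plain Python (the labels and fractions below are hypothetical):

```python
import random

# Two strata of 100 rows each; keep ~50% of "a" rows and ~10% of "b" rows.
rows = ([{"label": "a", "v": i} for i in range(100)]
        + [{"label": "b", "v": i} for i in range(100)])
fractions = {"a": 0.5, "b": 0.1}

rng = random.Random(42)  # fixed seed for reproducibility, like sampleBy's seed
sample = [r for r in rows if rng.random() < fractions[r["label"]]]

counts = {}
for r in sample:
    counts[r["label"]] = counts.get(r["label"], 0) + 1
```

Note this yields approximate, not exact, stratum sizes; each row is kept independently with its stratum's probability, which is also how sampleBy behaves.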

UdayRPai
by New Contributor II
  • 367 Views
  • 3 replies
  • 0 kudos

Issue with Combination of INSERT + CTE (with clause) + Dynamic query (IDENTIFIER function)

Hi, we are trying to insert into a table using a CTE (WITH clause query). In the insert we are using the IDENTIFIER function, as the catalog name is retrieved dynamically. This is causing the insert to fail with an error - The table or view `cte_query` ...

Latest Reply
UdayRPai
New Contributor II
  • 0 kudos

Please mark this as resolved.

2 More Replies
giohappy
by New Contributor III
  • 2223 Views
  • 3 replies
  • 1 kudos

Resolved! SedonaSqlExtensions is not autoregistering types and functions

The usual way to use Apache Sedona inside PySpark is by first registering Sedona types and functions with SedonaRegistrator.registerAll(spark). We need to have these autoregistered when the cluster starts (to be able, for example, to perform geospatial q...
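One common approach, sketched here from the Apache Sedona setup docs (verify the exact class names against your Sedona version), is to register the SQL extension through the cluster's Spark config, so the types and functions are available at startup without calling SedonaRegistrator in every notebook:

```
spark.sql.extensions org.apache.sedona.sql.SedonaSqlExtensions
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator org.apache.sedona.core.serde.SedonaKryoRegistrator
```

These lines go in the cluster's Spark config (Advanced options), alongside installing the matching Sedona jars and Python package on the cluster.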

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Giovanni Allegri, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answ...

2 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group