Databricks Learning Festival (Virtual): 10 October - 31 October

Join the Databricks Learning Festival (Virtual)! Mark your calendars from 10 October - 31 October 2024! Upskill today across data engineering, data analysis, machine learning, and generative AI. Join the thousands who have elevated their career w...

  • 17225 Views
  • 60 replies
  • 25 kudos
2 weeks ago
Databricks Community Champion - September 2024 - Szymon Dybczak

Meet Szymon Dybczak, a valued member of our community! Szymon is a Senior Data Engineer at Nordcloud. He brings a wealth of knowledge and expertise to the group, and we're thrilled to have him here. We presented him with a range of questions, and be...

  • 255 Views
  • 5 replies
  • 5 kudos
Tuesday
Intelligent Data Engineering: Beyond the AI Hype

Don't Miss Out on "Making AI-Powered Data Engineering Practical"! Join us for this exciting virtual event where we’ll cut through the hype and explore how AI-powered data intelligence is transforming data engineering. AMER: Nov 4 / 10 AM PT; EMEA: ...

  • 1367 Views
  • 0 replies
  • 0 kudos
a week ago
GenAI: The Shift to Data Intelligence

Shifting to customized GenAI that deeply understands your data. AMER: October 8 / 10 AM PT; EMEA: October 9 / 9 AM BST / 10 AM CEST; APJ: October 10 / 12 PM SGT. Click here to check out the agenda and speakers and register now! Why are 9 out of 10 organiz...

  • 454 Views
  • 0 replies
  • 1 kudos
2 weeks ago
Big Book of Data Engineering — 3rd Edition

Get practical guidance, notebooks, and code snippets. You know this better than anyone: the best GenAI models in the world will not succeed without good data. That’s why data engineers are even more critical today. The challenge is staying ahead of the ra...

  • 976 Views
  • 0 replies
  • 2 kudos
2 weeks ago

Community Activity

sheilaL
New Contributor II

File size upload limit through CLI

Does anyone know the size limit for uploading files through the CLI? I'm not finding it in the documentation.

  • 2771 Views
  • 3 replies
  • 0 kudos
Latest Reply
Kaniz_Fatma
Community Manager

Hi @sheilaL, for Databricks the size limit for uploading files through the Command Line Interface (CLI) is 2 GB. If you use local file I/O APIs to read or write files larger than 2 GB, you might see corrupted files. Instead, for files larger than 2 GB,...

  • 0 kudos
2 More Replies
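
For anyone hitting the 2 GB limit described above, here is a minimal sketch of one workaround: stream the upload through the Databricks Python SDK instead of local file I/O. The local file name and DBFS path are placeholders, and auth is assumed to be configured via environment variables or ~/.databrickscfg.

```python
# Hedged sketch: upload a large file to DBFS with the Databricks Python SDK,
# which streams in chunks instead of buffering the whole file (the reply above
# warns that local file I/O can corrupt files larger than 2 GB).
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

with open("big_dataset.parquet", "rb") as f:  # placeholder local file
    w.dbfs.upload("/tmp/big_dataset.parquet", f, overwrite=True)
```
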
turtleXturtle
New Contributor II

Refreshing DELTA external table

I'm having trouble with the REFRESH TABLE command - does it work with DELTA external tables? I'm doing the following steps: Create table: CREATE TABLE IF NOT EXISTS `catalog`.`default`.`table_name` (KEY DOUBLE, CUSTKEY DOUBLE, STATUS STRING, PRICE D...

  • 1 Views
  • 0 replies
  • 0 kudos
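
For reference, a minimal sketch of the command in question, using the table name from the snippet above. This is a sketch of general Delta behavior, not a confirmed resolution of the thread: Delta tables consult the transaction log on each query, so REFRESH TABLE mainly matters when results were cached.

```python
# REFRESH TABLE invalidates cached data/metadata for the table. For Delta
# tables, new commits are normally picked up automatically via the Delta log.
spark.sql("REFRESH TABLE `catalog`.`default`.`table_name`")

# Re-reading should now reflect any files committed to the Delta log:
spark.table("`catalog`.`default`.`table_name`").show(5)
```
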
lprevost
Contributor

sampleBy stream in DLT

I would like to create a sampleBy (stratified version of sample) copy/clone of my delta table. Ideally, I'd like to do this using a DLT. My source table grows incrementally each month as batch files are added and autoloader picks them up. Id...

  • 3 Views
  • 0 replies
  • 0 kudos
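
A hedged sketch of what a stratified sample could look like in a DLT pipeline; since sampleBy is a batch transformation, this is written as a non-streaming table that is recomputed on each pipeline update. The table, column, and fraction values are assumptions.

```python
import dlt

@dlt.table(name="orders_sample", comment="Stratified sample by status")
def orders_sample():
    src = spark.read.table("catalog.schema.orders")  # placeholder source table
    # Keep 10% of 'open' rows and 1% of 'closed' rows; seed makes it repeatable
    return src.sampleBy("status", fractions={"open": 0.10, "closed": 0.01}, seed=42)
```
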
UdayRPai
New Contributor II

Issue with Combination of INSERT + CTE (with clause) + Dynamic query (IDENTIFIER function)

Hi, we are trying to insert into a table using a CTE (WITH clause query). In the insert we are using the IDENTIFIER function, as the catalog name is retrieved dynamically. This is causing the insert to fail with an error - The table or view `cte_query` ...

  • 112 Views
  • 3 replies
  • 0 kudos
Latest Reply
UdayRPai
New Contributor II

Please mark this as resolved.

  • 0 kudos
2 More Replies
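
Since the thread is marked resolved without the fix inline, here is a hedged sketch of one pattern that typically works: keep the WITH clause inside the INSERT's query, and resolve the dynamic names with IDENTIFIER plus named parameters. All catalog/table names are placeholders.

```python
catalog = "my_catalog"  # placeholder, resolved dynamically at runtime

spark.sql(
    """
    INSERT INTO IDENTIFIER(:tgt)
    WITH cte_query AS (
        SELECT id, amount FROM IDENTIFIER(:src) WHERE amount > 0
    )
    SELECT id, amount FROM cte_query
    """,
    args={"tgt": f"{catalog}.sales.target", "src": f"{catalog}.sales.source"},
)
```
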
anonymous_567
New Contributor II

Retrieve file size from azure in databricks

Hello, I am running a job that requires reading in files of different sizes, each one representing a different dataset, and loading them into a delta table. Some files are as big as 100 GiB and others as small as 500 MiB. I want to repartition each fi...

  • 21 Views
  • 0 replies
  • 0 kudos
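
A minimal sketch of one way to size partitions per file, assuming the files are listable with dbutils.fs.ls (whose FileInfo entries carry a size in bytes); the 256 MiB target, path, file format, and table name are all assumptions.

```python
TARGET_PARTITION_BYTES = 256 * 1024 * 1024  # assumed target partition size

for info in dbutils.fs.ls("abfss://container@account.dfs.core.windows.net/raw/"):
    num_partitions = max(1, int(info.size // TARGET_PARTITION_BYTES))
    df = spark.read.parquet(info.path)  # file format is an assumption
    (df.repartition(num_partitions)
       .write.format("delta").mode("append")
       .saveAsTable("catalog.schema.bronze"))  # placeholder target table
```
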
giohappy
New Contributor III

Resolved! SedonaSqlExtensions is not autoregistering types and functions

The usual way to use Apache Sedona inside pySpark is by first registering Sedona types and functions with SedonaRegistrator.registerAll(spark). We need to have these autoregistered when the cluster starts (to be able, for example, to perform geospatial q...

  • 1904 Views
  • 3 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable

Hi @Giovanni Allegri, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answ...

  • 1 kudos
2 More Replies
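
For readers landing here: the usual approach is to register Sedona at cluster startup through the cluster's Spark config rather than per notebook. A hedged sketch follows; the exact keys and classes depend on your Sedona version, so check the Sedona docs.

```python
# Cluster Spark config (Advanced options > Spark), with the Sedona jars attached
# as cluster libraries -- these keys/classes are from Sedona's docs and may
# differ by version:
#
#   spark.sql.extensions   org.apache.sedona.sql.SedonaSqlExtensions
#   spark.serializer       org.apache.spark.serializer.KryoSerializer
#   spark.kryo.registrator org.apache.sedona.core.serde.SedonaKryoRegistrator
#
# Smoke test that the functions were auto-registered at cluster start:
spark.sql("SELECT ST_Point(1.0, 2.0) AS geom").show()
```
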
ae20cg
New Contributor III

How to instantiate Databricks spark context in a python script?

I want to run a block of code in a script, not in a notebook, on Databricks; however, I cannot properly instantiate the Spark context without some error. I have tried `SparkContext.getOrCreate()`, but this does not work. Is there a simple way to do t...

  • 12141 Views
  • 18 replies
  • 12 kudos
Latest Reply
ayush007
New Contributor II

Is there some solution for this? We got stuck where a cluster with Unity Catalog enabled is not able to get a Spark context. This prevents us from using the distributed nature of Spark in Databricks.

  • 12 kudos
17 More Replies
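
Two hedged options, depending on where the script runs; both use standard Spark/Databricks APIs rather than anything confirmed in this thread.

```python
from pyspark.sql import SparkSession

# 1) Script submitted to a Databricks cluster (e.g., as a spark_python_task):
#    the session already exists, so look it up instead of building a context.
spark = SparkSession.builder.getOrCreate()

# 2) Script on a local machine talking to a cluster via Databricks Connect
#    (requires the databricks-connect package and configured auth):
# from databricks.connect import DatabricksSession
# spark = DatabricksSession.builder.getOrCreate()

print(spark.range(5).count())  # smoke test
```
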
jhrcek
New Contributor II

Misleading UNBOUND_SQL_PARAMETER even though parameter specified

Hello. Please forgive me if this is not the right place to ask, but I'm having issues with Databricks' Statement Execution API. I'm developing a Haskell client for this API. I managed to implement most of it, but I'm running into issues with using named...

  • 3609 Views
  • 10 replies
  • 1 kudos
Latest Reply
eriodega
New Contributor III

FYI, I just tested it again yesterday, and with the new runtime version 15.4 LTS, the bug is fixed.

  • 1 kudos
9 More Replies
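
For anyone reproducing this, a hedged sketch of a named-parameter call against the Statement Execution API (host, token, and warehouse ID are placeholders; per the reply above, the UNBOUND_SQL_PARAMETER bug is fixed as of runtime 15.4 LTS).

```python
import requests

resp = requests.post(
    "https://<workspace-host>/api/2.0/sql/statements",
    headers={"Authorization": "Bearer <token>"},
    json={
        "warehouse_id": "<warehouse-id>",
        "statement": "SELECT * FROM samples.nyctaxi.trips "
                     "WHERE trip_distance > :min_dist LIMIT 10",
        "parameters": [{"name": "min_dist", "value": "5", "type": "DOUBLE"}],
    },
    timeout=60,
)
print(resp.json().get("status"))
```
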
roberta_cereda
Visitor

Describe history operationMetrics['materializeSourceTimeMs']

Hi, during some checks on MERGE execution I was running the DESCRIBE HISTORY command, and in the operationMetrics column I noticed this information: operationMetrics['materializeSourceTimeMs']. I haven't found that metric in the documentation, so I...

  • 5 Views
  • 0 replies
  • 0 kudos
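
A minimal sketch for isolating that metric (the table name is a placeholder). The interpretation is an assumption, since the metric is undocumented: it appears to record time spent materializing the MERGE source.

```python
hist = spark.sql("DESCRIBE HISTORY catalog.schema.target_table")
(hist.filter("operation = 'MERGE'")
     .selectExpr(
         "version",
         "operationMetrics['materializeSourceTimeMs'] AS materialize_source_ms")
     .show())
```
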
dadrake3
New Contributor II

Delta Live Tables INSUFFICIENT_PERMISSIONS

I have a Delta Live Tables pipeline which reads from a delta table, then applies 3 layers of transformations before merging the legs of the pipeline and outputting. I am getting this error when I run my pipeline against Unity Catalog: org.apache.spar...

  • 61 Views
  • 2 replies
  • 0 kudos
Latest Reply
dadrake3
New Contributor II

I don't see how that can be the underlying issue because: 1. the first step of the pipeline, which reads from Unity Catalog and Azure SQL, is just fine; 2. when I remove the enrichment logic from the second step and just pass the table input as i...

  • 0 kudos
1 More Replies
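
Without seeing the full error, here is a hedged checklist of the Unity Catalog grants a pipeline's run-as identity typically needs on a source table; the principal and object names are placeholders, and the actual missing privilege depends on the error detail above.

```python
# Run as a catalog/schema owner; each statement grants one required privilege.
for stmt in [
    "GRANT USE CATALOG ON CATALOG src_catalog TO `pipeline-principal`",
    "GRANT USE SCHEMA ON SCHEMA src_catalog.src_schema TO `pipeline-principal`",
    "GRANT SELECT ON TABLE src_catalog.src_schema.src_table TO `pipeline-principal`",
]:
    spark.sql(stmt)
```
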
flamezi2
Visitor

Invalid request when using the Manual generation of an account-level access token

I need to generate an access token using the REST API and was following the guide seen here: manually-generate-an-account-level-access-token. When I try this cURL in Postman, I get an error, but the error description is not helpful. Error: I don't know what I'm missi...

  • 4 Views
  • 0 replies
  • 0 kudos
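
For comparison, a hedged sketch of the documented OAuth client-credentials flow for an account-level token (AWS accounts host shown; the account ID and service-principal credentials are placeholders). A 400 from this endpoint usually means a wrong URL, account ID, or client secret.

```python
import requests

account_id = "<account-id>"
resp = requests.post(
    f"https://accounts.cloud.databricks.com/oidc/accounts/{account_id}/v1/token",
    auth=("<service-principal-client-id>", "<client-secret>"),
    data={"grant_type": "client_credentials", "scope": "all-apis"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["access_token"][:20], "...")
```
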
Mohamednazeer
New Contributor III

Serving endpoint with external model (azure openai) is throwing "Public network access is disabled"

The Databricks serving endpoint is not working as expected, throwing an exception: "Public network access is disabled, create private endpoints". We have created the Azure OpenAI resource with public network access disabled, but we have also created the pr...

  • 44 Views
  • 0 replies
  • 0 kudos
alonisser
Contributor

Resolved! Changing shuffle.partitions with spark.conf in a spark stream - isn't respected even after a checkpoint

Question about Spark checkpoints and offsets in a running stream: when the stream started I needed tons of partitions, so we set it with spark.conf to 5000. As expected, the offsets in the checkpoint contain this info and the job used this value. Then we'...

  • 5093 Views
  • 7 replies
  • 3 kudos
Latest Reply
Leszek
Contributor

@Jose Gonzalez thanks for that information! This is super useful. I was struggling to understand why my stream was still using 200 partitions. This is quite a pain for me, because changing the checkpoint will re-insert all data from the source. Do you know where this can...

  • 3 kudos
6 More Replies
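
To make the resolution concrete: for stateful streaming queries, the shuffle-partition count is captured in the checkpoint's state store, so a new value only takes effect with a new checkpoint location (which reprocesses the source). A hedged sketch with placeholder names:

```python
spark.conf.set("spark.sql.shuffle.partitions", "200")  # was 5000 at first run

(spark.readStream.table("catalog.schema.events")
      .groupBy("user_id").count()  # stateful: partition count pinned by state
      .writeStream
      .option("checkpointLocation", "/chk/events_v2")  # new checkpoint => new value
      .toTable("catalog.schema.event_counts"))
```
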
chethankumar
Visitor

how to assign account level groups to workspace using, Terraform

In the workspace console, when I create groups, the source is shown as "account" - basically, it is an account-level group. But: provider "databricks" { host = var.databricks_host # client_id = "" # client_secret = " account_id = ...

  • 24 Views
  • 1 replies
  • 0 kudos
Latest Reply
jennie258fitz
New Contributor II

@chethankumar wrote: In the workspace console, when I create groups, the source is shown as "account" - basically, it is an account-level group. But: provider "databricks" { host = var.databricks_host # client_id = "" # client_secre...

  • 0 kudos
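
A hedged Terraform sketch of the usual pattern, in the same HCL as the snippet quoted above: create the group with an account-scoped provider, then bind it to the workspace with databricks_mws_permission_assignment. IDs and names are placeholders.

```hcl
provider "databricks" {
  alias      = "account"
  host       = "https://accounts.cloud.databricks.com"
  account_id = var.account_id
}

resource "databricks_group" "data_eng" {
  provider     = databricks.account
  display_name = "data-engineers"
}

# Assigns the account-level group to a specific workspace
resource "databricks_mws_permission_assignment" "data_eng_ws" {
  provider     = databricks.account
  workspace_id = var.workspace_id
  principal_id = databricks_group.data_eng.id
  permissions  = ["USER"]
}
```
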
Dave1967
New Contributor III

Resolved! Serverless Compute - How to determine if being used programatically

Hi, we use a common notebook for all our "common" settings; this notebook is called in the first cell of each notebook we develop. The issue we are now having is that we need 2 common notebooks, one for normal shared compute and one for serverle...

  • 32 Views
  • 2 replies
  • 2 kudos
Latest Reply
filipniziol
New Contributor III

Hi @Dave1967, if you know any Spark config command that is not supported in serverless, then build your logic around this command using try/except: def is_config_supported(): try: spark.sparkContext.getConf() return True except...

  • 2 kudos
1 More Replies
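
Expanding the accepted answer into a runnable sketch (behavior may change across runtime versions; the branch contents are placeholders):

```python
def is_serverless() -> bool:
    try:
        spark.sparkContext.getConf()  # not exposed on serverless compute
        return False
    except Exception:
        return True

# In the shared "common" notebook, branch on the result:
if is_serverless():
    pass  # serverless-safe settings only (no SparkContext, no cluster confs)
else:
    spark.conf.set("spark.sql.shuffle.partitions", "200")  # classic-only example
```
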

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group

Latest from our Blog

SAM2 on Databricks

In this post we’ll walk through getting started with Meta’s latest Segment-Anything-Model 2 (SAM2) on Databricks. We’ll cover experimentation with SAM2 in a Databricks Notebook, expand on the default ...

  • 220 Views
  • 1 kudos