Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

bfridley
by New Contributor II
  • 2253 Views
  • 2 replies
  • 0 kudos

DLT Pipeline Out Of Memory Errors

I have a DLT pipeline that has been running for weeks. Now, trying to rerun the pipeline with the same code and same data fails. I've even tried updating the compute on the cluster to about 3x of what was previously working and it still fails with ou...

(Screenshots attached: bfridley_1-1695328329708.png, bfridley_2-1695328372419.png)
Latest Reply
rajib_bahar_ptg
New Contributor III
  • 0 kudos

I'd focus on understanding the codebase first. It'll help you decide what logic or data asset to keep or not keep when you try to optimize it. If you share the architecture of the application, the problem it solves, and some sample code here, it'll h...

1 More Replies
gkrilis
by New Contributor
  • 4379 Views
  • 1 reply
  • 0 kudos

How to stop SparkSession within a notebook without error

I want to run an ETL job, and when the job ends I would like to stop the SparkSession to free my cluster's resources. By doing this I could avoid restarting the cluster, but when calling spark.stop() the job returns with status failed even though it has f...

Data Engineering
cluster
SparkSession
Latest Reply
PremadasV
New Contributor II
  • 0 kudos

Please refer to this article: Job fails, but Apache Spark tasks finish - Databricks
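For reference, a minimal sketch of the pattern discussed in this thread (the ETL step is hypothetical): explicitly stopping the session frees resources, but because Databricks manages the SparkSession for the notebook, the run can then be reported as failed even though the work completed.

from pyspark.sql import SparkSession

# On Databricks a session already exists; getOrCreate() simply returns it.
spark = SparkSession.builder.getOrCreate()

# ... ETL logic (hypothetical) ...

# Frees the cluster's resources, but may cause the job run to be marked
# as failed, which is the behaviour described in the question above.
spark.stop()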

Martin1
by New Contributor II
  • 8021 Views
  • 3 replies
  • 1 kudos

Referring to Azure Keyvault secrets in spark config

Hi all, in the Spark config for a cluster, it works well to refer to an Azure Key Vault secret in the "value" part of the name/value combo on a config row/setting. For example, this works fine (I've removed the string that is our specific storage account name...

Latest Reply
kp12
New Contributor II
  • 1 kudos

Hello, is there any update on this issue, please? Databricks no longer recommends mounting external locations, so the other way to access Azure storage is to use Spark config as mentioned in this document - https://learn.microsoft.com/en-us/azure/databri...
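For reference, a minimal sketch of both approaches being discussed; the secret scope, secret name, and storage account are hypothetical, and the cluster-level reference uses the {{secrets/...}} syntax:

# In the cluster's Spark config (not Python), a Key Vault-backed secret can be referenced as:
#   fs.azure.account.key.mystorageaccount.dfs.core.windows.net {{secrets/kv-scope/storage-account-key}}
#
# The same secret can also be read in a notebook and applied to the session config:
account_key = dbutils.secrets.get(scope="kv-scope", key="storage-account-key")
spark.conf.set(
    "fs.azure.account.key.mystorageaccount.dfs.core.windows.net",
    account_key,
)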

2 More Replies
marvin1
by New Contributor III
  • 209 Views
  • 0 replies
  • 0 kudos

Bamboolib error

What is the status of bamboolib? I understand that it is in public preview, but I'm unable to find any support references. I am getting the error below. I've tried installing in a notebook, on a cluster, creating a pandas dataframe and running bam, etc. ...
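For context, a minimal sketch of the flow being attempted, assuming the documented bamboolib usage (the DataFrame contents are just an example):

# In a notebook cell, install first: %pip install bamboolib
import bamboolib as bam
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
df  # displaying a pandas DataFrame (or evaluating `bam`) should open the bamboolib UI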

mbvb_py
by New Contributor II
  • 2535 Views
  • 4 replies
  • 0 kudos

Create cluster error: Backend service unavailable

Hello, I'm new to Databricks (Community Edition account) and encountered a problem just now. When creating a new cluster (default 10.4 LTS), it fails with the following error: Backend service unavailable. I've tried a different runtime > same issue. I've ...

Latest Reply
stefnhuy
New Contributor III
  • 0 kudos

Hey mbvb_py, I'm sorry to hear you're facing this "Backend service unavailable" issue with Databricks. I've encountered similar problems in the past, and it can be frustrating. Don't worry; you're not alone in this! From my experience, this error can o...

3 More Replies
DBEnthusiast
by New Contributor III
  • 1609 Views
  • 2 replies
  • 0 kudos

How does a Job Cluster know how many resources to assign to an Application?

Hi All Enthusiasts! As per my understanding, when a user submits an application to a Spark cluster it specifies how much memory, how many executors, etc. it would need. But in Databricks notebooks we never specify that anywhere. If we have submitted the noteboo...

Latest Reply
BilalAslamDbrx
Esteemed Contributor III
  • 0 kudos

@DBEnthusiast great question! Today, with Job Clusters, you have to specify this. As @btafur noted, you do this by setting CPU, memory, etc. We are in early preview of Serverless Job Clusters where you no longer specify this configuration; instead, Data...
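For illustration, a minimal sketch of what specifying those resources can look like as a job cluster definition (field values are hypothetical; the structure follows the Jobs API new_cluster settings):

new_cluster = {
    "spark_version": "13.3.x-scala2.12",  # runtime for the job cluster
    "node_type_id": "Standard_DS3_v2",    # CPU and memory come from the chosen node type
    "num_workers": 4,                     # fixed size; an "autoscale" block can be used instead
}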

1 More Replies
smurug
by New Contributor II
  • 4107 Views
  • 4 replies
  • 1 kudos

Databricks Job scheduling - continuous mode

While scheduling the Databricks job using continuous mode - what will happen if the job is configured to run with a Job cluster? At the end of each run, will the cluster be terminated and re-created again for the next run? The official documentation is n...

Latest Reply
Jo5h
New Contributor II
  • 1 kudos

Hello @youssefmrini, so how is the DBU calculated? As the cluster is reused, the DBU should be calculated per hour on all the jobs run in an hour, correct? Or will it be calculated based on each run? I would like to know the cost calculation when runnin...

3 More Replies
Mado
by Valued Contributor II
  • 25937 Views
  • 4 replies
  • 3 kudos

Resolved! How to set a variable and use it in a SQL query

I want to define a variable and use it in a query, like below:
%sql
SET database_name = "marketing";
SHOW TABLES in '${database_name}';
However, I get the following error:
ParseException: [PARSE_SYNTAX_ERROR] Syntax error at or near '''' (line 1, pos...

Latest Reply
CJS
New Contributor II
  • 3 kudos

Another option is demonstrated by this example:
%sql
SET database_name.var = marketing;
SHOW TABLES in ${database_name.var};
SET database_name.dummy = marketing;
SHOW TABLES in ${database_name.dummy};
Do not use quotes; use a format that is variableName...

3 More Replies
Hubert-Dudek
by Esteemed Contributor III
  • 897 Views
  • 1 reply
  • 1 kudos

Streaming Data Modeling Normalization with Databricks Delta Live Tables

Streamline Data Modeling Normalization with Databricks Delta Live Tables in Just a Few Steps:
- Use the "Apply changes" function to populate tables with slowly changing dimensions using auto-increment IDs.
- Register SQL mapping functions to associate ...

(Screenshots attached: scd1.png, scd2.png, scd3.png)
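As a rough illustration of the first step, a minimal sketch using the Python dlt API (table, source, and column names are hypothetical, and dlt.create_streaming_table assumes a recent DLT release):

import dlt
from pyspark.sql.functions import col

# Target table for the slowly changing dimension.
dlt.create_streaming_table("dim_customer")

# APPLY CHANGES populates the dimension from a change feed.
dlt.apply_changes(
    target="dim_customer",
    source="customer_updates",
    keys=["customer_id"],
    sequence_by=col("updated_at"),
    stored_as_scd_type=2,  # keep history (SCD Type 2)
)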
Latest Reply
jose_gonzalez
Moderator
  • 1 kudos

Thank you for sharing this @Hubert-Dudek !!!

BAZA
by New Contributor II
  • 6336 Views
  • 9 replies
  • 0 kudos

Invisible empty spaces when reading .csv files

When importing a .csv file with leading and/or trailing empty spaces around the separators, the output results in strings that appear to be trimmed on the output table or when using .display() but are not actually trimmed. It is possible to identify t...
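For reference, a minimal sketch of one way to control this behaviour (the file path is hypothetical): Spark's CSV reader keeps the surrounding spaces unless told otherwise.

# The reader preserves spaces around separators by default, so values only
# look trimmed in display(). These options strip them at read time.
df = (
    spark.read
    .option("header", True)
    .option("ignoreLeadingWhiteSpace", True)
    .option("ignoreTrailingWhiteSpace", True)
    .csv("/path/to/file.csv")  # hypothetical path
)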

Latest Reply
Raluka
New Contributor III
  • 0 kudos

I discovered an in-depth article that went beyond the physical aspects of aging and testosterone. It examined the emotional https://misterolympia.shop/buy/injectable-steroids/testosterone/testosterone-cypionate/ and psychological aspects of growing o...

8 More Replies
Nico1
by New Contributor II
  • 8745 Views
  • 11 replies
  • 2 kudos

Resolved! Problems connecting Simba ODBC with a M1 Macbook Pro

Hi, is there a way to make the Simba ODBC Driver work on M1 MacBook Pros? I find myself able to run it on an old Intel MacBook easily, but now every time I even test the connection with the iODBC Manager it fails. Definitely, the issue is around no...

(Screenshot attached: CleanShot 2022-05-15 at 22.50.36@2x)
Latest Reply
kunalmishra9
New Contributor III
  • 2 kudos

Things seem to be mostly working for me now. I've added a bit more detail on my connection steps and process in case it's helpful for anyone on Stack Overflow: https://stackoverflow.com/questions/76407426/connecting-rstudio-desktop-to-databricks-comm...

10 More Replies
zak_k
by New Contributor III
  • 3435 Views
  • 5 replies
  • 1 kudos

com.databricks.spark.safespark.UDFException: UNAVAILABLE: Channel shutdownNow invoked

Trying to determine a root cause of UDFException that occurs when returning a variable length ArrayType. If I hardcode the data returned from the UDF to a fixed length, say 19, the error does not occur. Setup code: split_runs_UDF = udf(split_runs_udf, ...
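For context, a minimal sketch of the kind of setup described (the UDF names come from the post; the body and element type are hypothetical):

from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, IntegerType

def split_runs_udf(values):
    # Hypothetical logic: the returned list's length varies with the input,
    # unlike the fixed-length workaround mentioned above.
    return [v for v in (values or []) if v is not None]

split_runs_UDF = udf(split_runs_udf, ArrayType(IntegerType()))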

Latest Reply
zak_k
New Contributor III
  • 1 kudos

After further investigation, it reproduces slightly differently in single user mode. Single user mode: runs forever. Shared: gives the above message. I've determined that there was a corner case in the dataset which led to the UDF never returning. I am as...

4 More Replies
miiaramo
by New Contributor II
  • 1855 Views
  • 2 replies
  • 1 kudos

DLT current channel uses same runtime as the preview channel

Hi, according to the latest release notes, the current channel of DLT should be using Databricks Runtime 11.3 and the preview channel should be using 12.2. The current channel was still using the correct runtime version 11.3 yesterday morning, but since ...

Latest Reply
adriennn
Contributor
  • 1 kudos

I'm seeing the same issue with 12 current / 13 preview. Updating the channel didn't bump the runtime version and even creating a pipeline with the preview channel uses the current version.

1 More Replies
Databricks143
by New Contributor III
  • 2046 Views
  • 4 replies
  • 0 kudos

Correlated column is not allowed in non predicate in UDF SQL

Hi Team, I am new to Databricks and currently working on creating SQL UDFs in Databricks. In the UDF we are calculating a min date, and that date column is also used in the WHERE clause. While running the UDF we get "Correlated column is not allowed in non predica...

Latest Reply
Noopur_Nigam
Esteemed Contributor III
  • 0 kudos

Could you please provide your full code? I would also like to know which DBR version you are using in your cluster.

3 More Replies
thomann
by New Contributor III
  • 5669 Views
  • 5 replies
  • 6 kudos

Bug? Unity Catalog incompatible with sparklyr in RStudio (on Driver), as well as when used on one cluster from multiple notebooks?

If I start an RStudio Server with an in-cluster init script as described here on a Unity Catalog cluster, the sparklyr connection fails with an error about a missing Credential Scope. I tried it both in 11.3 LTS and 12.0 Beta. I tried it only in a Persona...

Latest Reply
kunalmishra9
New Contributor III
  • 6 kudos

Have run into this issue as well. Let me know if there was any resolution 

4 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group