Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Auto Loader Schema Hint Behavior: Addressing Nested Field Errors
by my_super_name (New Contributor II) • 1036 Views • 2 replies • 3 kudos

Hello, I'm using Auto Loader to stream a table of data and have added schema hints to specify field values. I've observed that when my initial data file is missing fields specified in the schema hint, Auto Loader correctly identifies this and ad...

Latest Reply from my_super_name (New Contributor II) • 3 kudos

Hi @Kaniz_Fatma, thanks for your help! Your solution works for the initial issue, and I've implemented it first in my code, but it creates another problem. When we explicitly define the struct hint as 'bbb STRUCT<ccc: INT>', it works until someone adds mor...
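
For reference, a minimal sketch of the pattern under discussion, assuming a JSON source and placeholder paths; cloudFiles.schemaHints pins the types you already know, while cloudFiles.schemaEvolutionMode lets newly added nested fields surface instead of failing the stream:

    # Sketch only: paths and the source format are assumptions, not from the thread.
    stream = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", "/tmp/_schemas/events")   # hypothetical
        .option("cloudFiles.schemaHints", "bbb STRUCT<ccc: INT>")      # hint from this thread
        .option("cloudFiles.schemaEvolutionMode", "addNewColumns")     # pick up new nested fields
        .load("/tmp/landing/events")                                   # hypothetical
    )
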
Unable to create a record_id column via DLT - Autoloader
by RiyazAli (Valued Contributor) • 1031 Views • 1 reply • 0 kudos

Hi Community, I'm trying to load data from the landing zone to the bronze layer via DLT - Autoloader, and I want to add a record_id column to the bronze table while I fetch my data. I'm also using a file arrival trigger in the workflow to update my table inc...

Latest Reply from RiyazAli (Valued Contributor) • 0 kudos

Hey @Kaniz_Fatma, could you or anybody from the community team help me here, please? I've been stuck for quite some time now.

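One way this is commonly sketched (not the poster's exact code; the table name, landing path, and source format are assumptions) is to attach the column inside the DLT table function:

    import dlt  # available inside a Delta Live Tables pipeline
    from pyspark.sql import functions as F

    @dlt.table(name="bronze_events")                      # hypothetical table name
    def bronze_events():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")          # assumed source format
            .load("/Volumes/landing/zone/raw/")           # hypothetical landing path
            .withColumn("record_id", F.expr("uuid()"))    # one option: a UUID per row
        )

uuid() gives per-row uniqueness; an identity column would be the alternative if monotonically increasing IDs are required.
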
Code Review tools
by Phani1 (Valued Contributor II) • 745 Views • 1 reply • 0 kudos

Could you kindly recommend any code review tools that would be suitable for our Databricks tech stack?

Labels: Data Engineering, code review

Latest Reply from Kaniz_Fatma (Community Manager) • 0 kudos

Hi @Phani1, when it comes to code review tools for your Databricks tech stack, here are some options you might find useful: Built-in Interactive Debugger in the Databricks Notebook: the interactive debugger is available exclusively for Python code withi...

Resolved! Flatten Deep Nested Struct
by Aidonis (New Contributor III) • 14734 Views • 3 replies • 2 kudos

Hi All, I have a deeply nested Spark DataFrame struct, something similar to the below: |-- id: integer (nullable = true) |-- lower: struct (nullable = true) | |-- field_a: integer (nullable = true) | |-- upper: struct (containsNull = true) | | ...

Latest Reply from Praveen-bpk21 (New Contributor II) • 2 kudos

@Aidonis You can try this as well: flatten-spark-dataframe · PyPI. It also allows flattening to a specific level.

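Besides the PyPI package mentioned in the reply, a hand-rolled version of the same idea is short. A sketch that expands one struct level per pass and joins names with underscores (the naming scheme is a choice, not something from the thread):

    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType

    def flatten(df):
        # Repeatedly expand one level of struct columns until none remain,
        # renaming parent.child to parent_child.
        # Note: arrays of structs would additionally need explode() first.
        while any(isinstance(f.dataType, StructType) for f in df.schema.fields):
            cols = []
            for f in df.schema.fields:
                if isinstance(f.dataType, StructType):
                    cols += [F.col(f"{f.name}.{sub.name}").alias(f"{f.name}_{sub.name}")
                             for sub in f.dataType.fields]
                else:
                    cols.append(F.col(f.name))
            df = df.select(cols)
        return df
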
Passing Parameters from Azure Synapse
by SPres (New Contributor) • 843 Views • 1 reply • 0 kudos

Hey Community! Just curious if anyone has tried using Azure Synapse for orchestration and passing parameters from Synapse to a Databricks notebook. My team is testing out Databricks, and I'm replacing Synapse notebooks with Databricks notebooks, but I...

Latest Reply from Ajay-Pandey (Esteemed Contributor III) • 0 kudos

Hi @SPres, you can definitely pass these parameters to a Databricks notebook as well. Please refer to the docs below: Run a Databricks Notebook with the activity - Azure Data Factory | Microsoft Learn

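On the Databricks side, values sent from the Synapse/ADF Notebook activity's base parameters arrive as notebook widgets. A minimal sketch; the parameter name is hypothetical:

    # Declare the widget with a default so the notebook also runs interactively,
    # then read the value passed by the Synapse/ADF activity.
    dbutils.widgets.text("run_date", "1900-01-01")   # "run_date" is a made-up name
    run_date = dbutils.widgets.get("run_date")
    print(f"run_date = {run_date}")
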
Databricks connecting SQL Azure DW - Confused between Polybase and Copy Into
by dilkushpatel (New Contributor II) • 1435 Views • 4 replies • 0 kudos

I see two articles in the Databricks documentation: https://docs.databricks.com/en/archive/azure/synapse-polybase.html#language-python and https://docs.databricks.com/en/connect/external-systems/synapse-analytics.html#service-principal. The Polybase one is legacy o...

Labels: Data Engineering, azure, Copy, help, Polybase, Synapse

Latest Reply from Kaniz_Fatma (Community Manager) • 0 kudos

Hi @dilkushpatel, thank you for raising your confusion regarding PolyBase and the COPY INTO command in Databricks when working with Azure Synapse. PolyBase (Legacy): PolyBase was previously used for data loading and unloading operations in Azure...

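For orientation, a hedged sketch of a write through the dedicated Synapse connector from the second article; every connection value below is a placeholder, and on current runtimes the connector loads data via COPY rather than PolyBase:

    # Sketch only: server, database, storage container, and table are placeholders.
    (df.write
       .format("com.databricks.spark.sqldw")
       .option("url", "jdbc:sqlserver://<server>.database.windows.net;database=<db>")
       .option("tempDir", "abfss://<container>@<account>.dfs.core.windows.net/tmp")
       .option("forwardSparkAzureStorageCredentials", "true")
       .option("dbTable", "dbo.target_table")
       .mode("append")
       .save())
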
Variables passed from ADF to Databricks Notebook Try-Catch are not accessible
by Abhi0607 (New Contributor II) • 967 Views • 2 replies • 0 kudos

Dear Members, I need your help with the scenario below. I am passing a few parameters from an ADF pipeline to a Databricks notebook. If I execute the ADF pipeline to run my Databricks notebook and use these variables as-is in my (Python) code, then it works fine. But as s...

Latest Reply from Ajay-Pandey (Esteemed Contributor III) • 0 kudos

Hi @Abhi0607, can you please help me understand whether you are reading or defining these parameter values outside the try/catch or inside it?

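A common gotcha in this scenario (an assumption on my part, since the post is truncated) is reading the widgets inside the try block, so the names never exist on the except path. Reading them up front avoids that:

    # Read ADF base parameters before the try/except so the names are always in
    # scope; "my_param" and do_work() are hypothetical.
    my_param = dbutils.widgets.get("my_param")

    try:
        result = do_work(my_param)
    except Exception as e:
        # my_param is still defined here and can be logged safely.
        print(f"Processing failed for my_param={my_param}: {e}")
        raise
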
Accidentally removing the service principal that owns the view seems to put the Unity Catalog in an illegal state. Can you fix this?
by fuselessmatt (Contributor) • 7085 Views • 4 replies • 1 kudos

I renamed our service principal in Terraform, which forces a replacement where the old service principal is removed and a new principal with the same permissions is recreated. The Terraform apply succeeds, but when I try to run dbt, which creates tab...

Latest Reply from fuselessmatt (Contributor) • 1 kudos

This is also true for removing groups before unassigning them (removing and unassigning in Terraform): Error: cannot update grants: Could not find principal with name <My Group Name>

Calculate the number of parallel tasks that can be executed in a Databricks PySpark cluster
by manish1987c (New Contributor III) • 2586 Views • 1 reply • 2 kudos

I want to confirm whether this understanding is correct: to calculate the number of parallel tasks that can be executed in a Databricks PySpark cluster with the given configuration, we need to consider the number of executors that can run on each node a...

Latest Reply from Kaniz_Fatma (Community Manager) • 2 kudos

Hi @manish1987c, your understanding is almost correct! Node Configuration: You have 10 nodes in your Databricks PySpark cluster. Each node has 16 CPU cores and 64 GB RAM. Executor Size: Each executor requires 5 CPU cores and 20 GB RAM. Additional...

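The arithmetic being confirmed, written out with the numbers from the reply (ignoring any cores reserved for the driver or OS overhead):

    # 10 nodes with 16 cores / 64 GB each; executors sized at 5 cores / 20 GB.
    nodes, cores_per_node, ram_per_node = 10, 16, 64
    exec_cores, exec_ram = 5, 20

    executors_per_node = min(cores_per_node // exec_cores,   # 16 // 5 = 3
                             ram_per_node // exec_ram)       # 64 // 20 = 3
    total_executors = nodes * executors_per_node              # 10 * 3 = 30
    parallel_tasks = total_executors * exec_cores             # 30 * 5 = 150
    print(parallel_tasks)                                     # 150 tasks at once
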
Optimization failed for timestampNtz
by Jennifer (New Contributor III) • 406 Views • 1 reply • 0 kudos

We have a table using the timestampNtz type for a timestamp column, which is also a clustering key for this table using liquid clustering. I ran OPTIMIZE <table-name> and it failed with the error Unsupported datatype 'TimestampNTZType'. But the failed optimization also broke ...

Latest Reply from Kaniz_Fatma (Community Manager) • 0 kudos

Hi @Jennifer, since TimestampNTZType is not currently supported for optimization, you can try a workaround by converting the timestamp column to a different data type before running the OPTIMIZE command. For example, you could convert the timestampNt...

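Since the reply is truncated, here is one hedged reading of that workaround: materialize a TIMESTAMP copy of the NTZ column, recluster on it, then OPTIMIZE. Table and column names are placeholders; verify the DDL against your runtime before relying on it:

    # Workaround sketch: cluster on a TIMESTAMP copy instead of the NTZ column.
    spark.sql("ALTER TABLE my_table ADD COLUMN ts_tz TIMESTAMP")
    spark.sql("UPDATE my_table SET ts_tz = CAST(ts_ntz AS TIMESTAMP)")
    spark.sql("ALTER TABLE my_table CLUSTER BY (ts_tz)")   # new liquid clustering key
    spark.sql("OPTIMIZE my_table")
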
Databricks-connect OpenSSL Handshake failed on WSL2
by vpacik (New Contributor) • 1087 Views • 1 reply • 0 kudos

When trying to set up databricks-connect on WSL2 against a 13.3 cluster, I receive the following error regarding OpenSSL CERTIFICATE_VERIFY_FAILED. Authentication is done via the SPARK_REMOTE environment variable. E0415 11:24:26.646129568 142172 ssl_transport_sec...

Latest Reply from Kaniz_Fatma (Community Manager) • 0 kudos

Hi @vpacik, one approach to resolve this is to disable SSL certificate verification. However, keep in mind that this approach may compromise security. In your Databricks configuration file (usually located at ~/.databrickscfg), add the following l...

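Besides disabling verification, a less invasive option (my assumption, not something from the truncated reply) is to point gRPC, which databricks-connect 13.x uses as its transport, at the WSL2 CA bundle:

    import os

    # Point gRPC at the system CA bundle before creating the session; the path
    # below is the usual Debian/Ubuntu location under WSL2.
    os.environ["GRPC_DEFAULT_SSL_ROOTS_FILE_PATH"] = "/etc/ssl/certs/ca-certificates.crt"

    from databricks.connect import DatabricksSession
    spark = DatabricksSession.builder.getOrCreate()   # auth still comes from SPARK_REMOTE
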
Selective Overwrite to a Unity Catalog Table
by jp_allard (New Contributor) • 676 Views • 1 reply • 0 kudos

I have been able to perform a selective overwrite using replaceWhere to a hive_metastore table, but when I use the same code for the same table in Unity Catalog, no data is written. Has anyone else had this issue, or are there common mistakes that ar...

Latest Reply from Kaniz_Fatma (Community Manager) • 0 kudos

Hi @jp_allard, Unity Catalog is a newer feature in Databricks, designed to replace the traditional Hive Metastore. When transitioning from Hive Metastore to Unity Catalog, there might be differences in behavior due to underlying architectural ch...

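For comparison, the standard replaceWhere shape against a UC table (the three-level table name and the predicate are placeholders). A silent no-op often comes down to the predicate not matching any rows in the incoming DataFrame:

    # Selective overwrite: only rows matching the predicate are replaced, and
    # every row in df must itself satisfy the predicate.
    (df.write.format("delta")
       .mode("overwrite")
       .option("replaceWhere", "event_date >= '2024-01-01'")   # hypothetical predicate
       .saveAsTable("main.bronze.events"))                     # hypothetical UC table
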
Permission denied listing external volume when using the VS Code Databricks extension
by CDICSteph (New Contributor) • 2033 Views • 5 replies • 0 kudos

Hey, I'm using the Databricks extension for VS Code (Databricks Connect v2). When using dbutils to list an external volume defined in UC, like so: dbutils.fs.ls("/Volumes/dev/bronze/rawdatafiles/"), I get this error: "databricks.sdk.errors.mapping.PermissionD...

Latest Reply from lukasjh (New Contributor II) • 0 kudos

We still face the problem (UC-enabled shared cluster). Is there any resolution, @Kaniz_Fatma?

Help with Identifying and Parsing Varying Date Formats in Spark DataFrame
by JeanT (New Contributor) • 1412 Views • 1 reply • 0 kudos

Hello Spark Community, I'm encountering an issue with parsing dates in a Spark DataFrame due to inconsistent date formats across my datasets. I need to identify and parse dates correctly, irrespective of their format. Below is a brief outline of my p...

Latest Reply from -werners- (Esteemed Contributor III) • 0 kudos

How about not specifying the format? This already matches common formats. When you still have nulls, you can use your list of known exotic formats. Another solution is working with regular expressions, looking for 2-digit numbers not larger than...

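A sketch of the first suggestion: let the default parser cover the common forms, then coalesce in the known exotic formats (the column name and format list are assumptions):

    from pyspark.sql import functions as F

    # Try the default parser first; fall back to explicit patterns for the rest.
    # On ANSI-enabled runtimes, F.try_to_date (Spark 3.5+) avoids errors on mismatches.
    parsed = F.coalesce(
        F.to_date("raw_date"),                  # handles common ISO-like forms
        F.to_date("raw_date", "dd/MM/yyyy"),    # assumed exotic format
        F.to_date("raw_date", "MM-dd-yyyy"),    # assumed exotic format
    )
    df = df.withColumn("parsed_date", parsed)
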
Infer schema eliminating leading zeros
by AnkithP (New Contributor) • 1025 Views • 1 reply • 1 kudos

Upon reading a CSV file with schema inference enabled, I've noticed that a column originally designated as string datatype contains numeric values with leading zeros. However, upon reading the data into a PySpark DataFrame, it undergoes automatic conver...

Latest Reply from -werners- (Esteemed Contributor III) • 1 kudos

If you set .option("inferSchema", "false"), all columns will be read as strings. You will have to cast all the other columns to their appropriate types, though, so passing a schema seems easier to me.

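The explicit-schema version of the reply, as a sketch with hypothetical column names; the zero-padded column stays a string while the rest still get proper types:

    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    # Explicit schema: declaring the column as a string preserves leading zeros.
    schema = StructType([
        StructField("account_code", StringType()),   # e.g. "00042" stays "00042"
        StructField("amount", IntegerType()),
    ])
    df = spark.read.csv("/path/to/file.csv", header=True, schema=schema)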
