Data Engineering

Forum Posts

by g96g • New Contributor III

12-01-2022 1:14:45 AM

2956 Views
8 replies
0 kudos

Resolved! ADF pipeline fails when passing the parameter to databricks

I have a project where I have to read data from NetSuite using its API. The Databricks notebook runs perfectly when I manually insert the table names I want to read from the source. I have a dataset (CSV file) in ADF with all the table names that I need to r...

Latest Reply

mcwir
Contributor

12-01-2022 3:20:52 AM

0 kudos

Have you tried to debug the JSON payload of the ADF trigger? Maybe it conveys the table names incorrectly.

7 More Replies

by Ramabadran • New Contributor II

07-27-2021 6:49:34 PM

9020 Views
6 replies
5 kudos

java.lang.NoClassDefFoundError: scala/Product$class

Hi, I am getting a "java.lang.NoClassDefFoundError: scala/Product$class" error while using Deequ version 1.0.5. Please suggest a fix or a workaround. Error: Py4JJavaError Traceback (most recent call last) <command-2625366351750561> in...

Latest Reply

mcwir
Contributor

12-01-2022 3:27:29 AM

5 kudos

It seems like a Maven issue.

5 More Replies

by tanin • Contributor

10-25-2022 12:42:18 PM

1067 Views
4 replies
7 kudos

Does anybody feel the unit test on Dataset is slow? (much slower than RDD). This is in Scala.

I profiled it, and the slowness seems to come from Spark planning, especially for more complex jobs (e.g. 100+ joins). Is there a way to speed it up (e.g. by disabling certain optimizations)?

Latest Reply

mcwir
Contributor

12-01-2022 1:14:55 AM

7 kudos

I had a similar feeling recently.

3 More Replies

by Merchiv • New Contributor III

11-30-2022 2:13:17 AM

2112 Views
3 replies
1 kudos

Resolved! How to use uuid in SQL merge into statement

I have a MERGE INTO statement that I use to update existing entries or create new entries in a dimension table based on a natural business key. When creating new entries I would also like to create a unique uuid for that entry that I can use to crossr...

Latest Reply

-werners-
Esteemed Contributor III

11-30-2022 5:21:02 AM

1 kudos

You might want to look into an identity column, which is now possible in Delta Lake: https://www.databricks.com/blog/2022/08/08/identity-columns-to-generate-surrogate-keys-are-now-available-in-a-lakehouse-near-you.html

2 More Replies

by KVNARK • Honored Contributor II

11-24-2022 10:29:49 PM

914 Views
3 replies
11 kudos

Is there any limit on the number of SQL queries in a Databricks SQL workspace?

Latest Reply

Rajeev_Basu
Contributor III

11-30-2022 10:10:54 PM

11 kudos

1000 is documented as the default, though I have never verified it.

2 More Replies

by Doug1 • Contributor

10-07-2022 2:46:10 AM

4808 Views
27 replies
58 kudos

Which ETL/ELT tool is used the most within this group?

Latest Reply

Rajeev_Basu
Contributor III

11-30-2022 10:08:11 PM

58 kudos

For Azure, I have used ADF, Azure Databricks and Synapse.

26 More Replies

by Ajay-Pandey • Esteemed Contributor III

11-30-2022 2:53:55 AM

842 Views
2 replies
9 kudos

Kafka integration with Databricks

Hi all, I want to integrate Kafka with Databricks. If anyone can share any docs or code, it will help me a lot. Thanks in advance.

Latest Reply

Hubert-Dudek
Esteemed Contributor III

11-30-2022 4:53:59 AM

9 kudos

This is code that I am using to read from Kafka:
inputDF = (spark
    .readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", host)
    .option("kafka.ssl.endpoint.identification.algorithm", "https")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option("ka...

1 More Replies

by rpshgupta • New Contributor III

06-19-2022 11:56:23 PM

4501 Views
8 replies
6 kudos

Databricks notebook failed with "Caused by: java.io.FileNotFoundException: Operation failed: "The specified path does not exist.", 404, HEAD, https://adls.dfs.core.windows.net/raw/file.csv?upn=false&action=getStatus&timeout=90".

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 458.0 failed 4 times, most recent failure: Lost task 0.3 in stage 458.0 (TID 2247) (172.18.102.75 executor 1): com.databricks.sql.io.FileReadException: Error while rea...

Latest Reply

Vidula
Honored Contributor

08-25-2022 1:51:25 AM

6 kudos

Hi @Rupesh gupta, hope you are well. Just wanted to see if you were able to find an answer to your question, and whether you would like to mark an answer as best. It would be really helpful for the other members too. Cheers!

7 More Replies

by Ryan_Chynoweth • Honored Contributor III

08-06-2021 3:35:32 PM

2336 Views
2 replies
4 kudos

Connecting to Azure SQL from Azure Databricks with firewalls

We are trying to connect to an Azure SQL Server from Azure Databricks using JDBC, but have faced issues because our firewall blocks everything. We decided to whitelist IPs from the SQL Server side and add a public subnet to make the connection work. ...

Latest Reply

Ryan_Chynoweth
Honored Contributor III

08-06-2021 3:39:39 PM

4 kudos

Using subnets for Databricks connectivity is the correct thing to do. This way you ensure the resources (clusters) can connect to the SQL Database. We also recommend using NPIP (No Public IPs) so that there won't be any public IP associated with the...

1 More Replies

by nameziane • New Contributor III

11-30-2022 2:29:35 AM

4932 Views
4 replies
2 kudos

Set version (VERSION AS OF) dynamically from return of a subquery

Hello, we have a business request to compare the evolution of a certain Delta table. We would like to compare the latest version of the table with the previous one using Delta time travel. The main issue we are facing is to retrieve programmatically us...

Latest Reply

apingle
Contributor

11-30-2022 6:27:36 AM

2 kudos

In the docs it says that "Neither timestamp_expression nor version can be subqueries." So it does sound challenging. I also tried playing with widgets to see if one could be populated using SQL, but didn't succeed. With Python it's really easy to do.
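One way the Python route could look is to fetch the version numbers first and then inline the chosen version into the query string as a literal, which sidesteps the subquery restriction. This is only a minimal sketch: the table name `my_db.my_table` and both helper functions are hypothetical, and the version list is a placeholder standing in for the `version` column returned by `DESCRIBE HISTORY`.

```python
from typing import Iterable, Tuple


def latest_two_versions(versions: Iterable[int]) -> Tuple[int, int]:
    """Return (latest, previous) from the version numbers in a table's history."""
    ordered = sorted(set(versions), reverse=True)
    if len(ordered) < 2:
        raise ValueError("need at least two versions to compare")
    return ordered[0], ordered[1]


def version_query(table: str, version: int) -> str:
    """Build a time-travel query with the version inlined as a literal."""
    return f"SELECT * FROM {table} VERSION AS OF {int(version)}"


# In a notebook the versions would come from the table history, e.g.
#   rows = spark.sql("DESCRIBE HISTORY my_db.my_table").collect()
#   versions = [r["version"] for r in rows]
versions = [0, 1, 2, 3]  # placeholder values for illustration only

latest, previous = latest_two_versions(versions)
latest_sql = version_query("my_db.my_table", latest)
previous_sql = version_query("my_db.my_table", previous)
```

Each query string can then be passed to spark.sql(...) and the two resulting DataFrames compared.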

3 More Replies

by cmilligan • Contributor II

11-29-2022 9:25:06 AM

989 Views
3 replies
4 kudos

Resolved! Pass through if a job was run as scheduled or if manual

I have a notebook that sets up parameters for the run based on some job parameters set by the user as well as the current date of the run. I want to supersede some of this logic and just use the manual values if kicked off manually. Is there a way to...

Latest Reply

SS2
Valued Contributor

11-29-2022 12:33:07 PM

4 kudos

You can create a widget with:
dbutils.widgets.text("widgetName", "")
To get the value of that widget:
dbutils.widgets.get("widgetName")
So by using this you can manually create widgets (variables) and run the process by giving the desired valu...
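A widget that defaults to an empty string can double as the "was this set manually?" signal: if the value is empty, fall back to the computed logic. The sketch below is an assumption about how that could be wired up, not something taken from the thread; the widget name `run_date` and the helper `resolve_param` are hypothetical, and the dbutils calls are shown only in comments because they exist only inside a Databricks notebook.

```python
from datetime import date


def resolve_param(manual_value: str, computed_value: str) -> str:
    """Prefer a manually supplied widget value; fall back to the computed one."""
    manual_value = (manual_value or "").strip()
    return manual_value if manual_value else computed_value


# In a notebook, the manual value would come from a widget, e.g.
#   dbutils.widgets.text("run_date", "")
#   manual = dbutils.widgets.get("run_date")
scheduled_default = date(2022, 11, 29).isoformat()  # stands in for the date-based logic

from_schedule = resolve_param("", scheduled_default)          # widget left empty
from_manual = resolve_param("2022-01-15", scheduled_default)  # widget filled in by hand
```

Checking for an empty value keeps the scheduled path unchanged while letting a manual run override it.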

2 More Replies

by rrussell25 • New Contributor

11-29-2022 7:13:57 PM

686 Views
1 reply
0 kudos

Read arguments in a Scala notebook invoked by a job.

In a Scala notebook, how do I read input arguments (e.g. those provided by a job that runs a Scala notebook)? In Python, dbutils.notebook.entry_point.getCurrentBindings() works. How about for Scala?

Latest Reply

UmaMahesh1
Honored Contributor III

11-30-2022 11:04:00 AM

0 kudos

Hi @Robert Russell, you can use dbutils.notebook.getContext.currentRunId in Scala notebooks. Other methods are also available, like:
dbutils.notebook.getContext.jobGroup
dbutils.notebook.getContext.rootRunId
dbutils.notebook.getContext.tags
etc... You ...

by Snuki • New Contributor II

11-30-2022 10:40:15 AM

318 Views
0 replies
0 kudos

Hi folks, could you please tell me why my points are not reflected in the reward store? It is showing 0.

by RajibRajib_Mand • New Contributor III

01-03-2022 3:36:03 AM

1958 Views
2 replies
0 kudos

Reading Password protected excel(.xlsx) file in databricks

I want to read a password-protected Excel file and load the data into a Delta table. Can you please let me know how this can be achieved in Databricks?

Latest Reply

igorsalo22
New Contributor II

11-30-2022 10:11:12 AM

0 kudos

df = spark.read.format("com.crealytics.spark.excel")\
    .option("dataAddress", "'Base'!A1")\
    .option("header", "true")\
    .option("workbookPassword", "test")\
    .load("test.xlsx")
display(df)

1 More Replies

by DK03 • Contributor

11-30-2022 5:13:04 AM

898 Views
2 replies
2 kudos

Is it ok to join on the decimal type fields? How does it affect the performance?

Latest Reply

UmaMahesh1
Honored Contributor III

11-30-2022 10:02:46 AM

2 kudos

As @Werner Stinckens said, it would be OK. But generally, joins on decimal columns are not recommended, as other factors come into play, like the precision, length, etc. Also, when you are joining on decimal columns, be sure to check out the abs value of...

1 More Replies


