Data Engineering

Forum Posts

by KVNARK • Honored Contributor II

11-24-2022 10:29:49 PM

945 Views
3 replies
11 kudos

Is there any limit on the number of SQL queries that can be run in a Databricks SQL workspace?

Latest Reply

Rajeev_Basu
Contributor III

11-30-2022 10:10:54 PM

11 kudos

The documented default is 1000, though I have never verified it.

2 More Replies

by Doug1 • Contributor

10-07-2022 2:46:10 AM

5036 Views
27 replies
58 kudos

Which ETL/ELT tools are used the most within this group?

Latest Reply

Rajeev_Basu
Contributor III

11-30-2022 10:08:11 PM

58 kudos

For Azure, I have used ADF, Azure Databricks and Synapse.

26 More Replies

by Ajay-Pandey • Esteemed Contributor III

11-30-2022 2:53:55 AM

885 Views
2 replies
9 kudos

Kafka integration with Databricks

Hi all, I want to integrate Kafka with Databricks. If anyone can share any docs or code, it will help me a lot. Thanks in advance.

Latest Reply

Hubert-Dudek
Esteemed Contributor III

11-30-2022 4:53:59 AM

9 kudos

This is the code that I am using to read from Kafka:

inputDF = (spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", host)
  .option("kafka.ssl.endpoint.identification.algorithm", "https")
  .option("kafka.sasl.mechanism", "PLAIN")
  .option("ka...
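Since the snippet above is cut off, here is a minimal sketch of how such a reader could be completed. It is not the poster's actual code. The topic name, the credentials, the SASL_SSL protocol, the starting offsets, and the shaded `kafkashaded...PlainLoginModule` class name (commonly used on Databricks runtimes) are all assumptions to adapt to your own cluster.

```python
def kafka_read_options(host, topic, api_key, api_secret):
    """Build the option map for a SASL_SSL/PLAIN Kafka source.

    All values here are illustrative assumptions, not the original poster's settings.
    """
    # Assumption: Databricks runtimes ship a shaded Kafka client, hence the prefix.
    jaas = (
        "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule "
        f'required username="{api_key}" password="{api_secret}";'
    )
    return {
        "kafka.bootstrap.servers": host,
        "kafka.security.protocol": "SASL_SSL",
        "kafka.ssl.endpoint.identification.algorithm": "https",
        "kafka.sasl.mechanism": "PLAIN",
        "kafka.sasl.jaas.config": jaas,
        "subscribe": topic,
        "startingOffsets": "earliest",
    }


def read_kafka_stream(spark, options):
    """Return a streaming DataFrame; `spark` is an active SparkSession."""
    return spark.readStream.format("kafka").options(**options).load()
```

In a notebook this would be called as `read_kafka_stream(spark, kafka_read_options(host, "my_topic", key, secret))`, ideally with the key and secret pulled from a secret scope rather than hard-coded.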

1 More Replies

by rpshgupta • New Contributor III

06-19-2022 11:56:23 PM

4668 Views
8 replies
6 kudos

Databricks notebook failed with "Caused by: java.io.FileNotFoundException: Operation failed: "The specified path does not exist.", 404, HEAD, https://adls.dfs.core.windows.net/raw/file.csv?upn=false&action=getStatus&timeout=90".

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 458.0 failed 4 times, most recent failure: Lost task 0.3 in stage 458.0 (TID 2247) (172.18.102.75 executor 1): com.databricks.sql.io.FileReadException: Error while rea...

Latest Reply

Vidula
Honored Contributor

08-25-2022 1:51:25 AM

6 kudos

Hi @Rupesh gupta, hope you are well. Just wanted to see if you were able to find an answer to your question, and if so, would you like to mark an answer as best? It would be really helpful for the other members too. Cheers!

7 More Replies

by Ryan_Chynoweth • Honored Contributor III

08-06-2021 3:35:32 PM

2399 Views
2 replies
4 kudos

Connecting to Azure SQL from Azure Databricks with firewalls

We are trying to connect to an Azure SQL Server from Azure Databricks using JDBC, but have faced issues because our firewall blocks everything. We decided to whitelist IPs from the SQL Server side and add a public subnet to make the connection work. ...

Latest Reply

Ryan_Chynoweth
Honored Contributor III

08-06-2021 3:39:39 PM

4 kudos

Using subnets for Databricks connectivity is the correct thing to do. This way you ensure the resources (clusters) can connect to the SQL Database. We also recommend using NPIP (No Public IPs) so that there won't be any public IP associated with the...

1 More Replies

by nameziane • New Contributor III

11-30-2022 2:29:35 AM

5105 Views
4 replies
2 kudos

Set version (VERSION AS OF) dynamically from return of a subquery

Hello, we have a business request to compare the evolution of a certain Delta table. We would like to compare the latest version of the table with the previous one using Delta time travel. The main issue we are facing is to retrieve programmatically us...

Latest Reply

apingle
Contributor

11-30-2022 6:27:36 AM

2 kudos

In the docs it says that "Neither timestamp_expression nor version can be subqueries." So it does sound challenging. I also tried playing with widgets to see if one could be populated using SQL, but didn't succeed. With Python it's really easy to do.
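To illustrate the Python route mentioned in this reply, here is a minimal sketch under stated assumptions: the table is a Delta table, `spark` is the notebook's SparkSession, and the helper names are hypothetical. The idea is to read the version numbers first, then interpolate them into the query as literals, since `VERSION AS OF` does not accept a subquery.

```python
def latest_two_versions(spark, table):
    """Return (latest, previous) version numbers from the Delta history."""
    rows = spark.sql(f"DESCRIBE HISTORY {table} LIMIT 2").collect()
    versions = sorted((row["version"] for row in rows), reverse=True)
    if len(versions) < 2:
        raise ValueError(f"{table} has fewer than two versions")
    return versions[0], versions[1]


def version_diff_query(table, latest, previous):
    """SQL for rows present in the latest version but not in the previous one."""
    # int() guards against anything other than a plain version number.
    return (
        f"SELECT * FROM {table} VERSION AS OF {int(latest)} "
        f"EXCEPT SELECT * FROM {table} VERSION AS OF {int(previous)}"
    )
```

A notebook cell could then run `spark.sql(version_diff_query(table, *latest_two_versions(spark, table)))`. Using `EXCEPT` is just one way to compare the two snapshots; a join on the business key may suit other comparisons better.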

3 More Replies

by cmilligan • Contributor II

11-29-2022 9:25:06 AM

1038 Views
3 replies
4 kudos

Resolved! Pass through if a job was run as scheduled or if manual

I have a notebook that sets up parameters for the run based on some job parameters set by the user as well as the current date of the run. I want to supersede some of this logic and just use the manual values if kicked off manually. Is there a way to...

Latest Reply

SS2
Valued Contributor

11-29-2022 12:33:07 PM

4 kudos

You can create a widget with dbutils.widgets.text("widgetName", ""). To get the value of that widget, use dbutils.widgets.get("widgetName"). This way you can manually create widgets (variables) and run the process by giving the desired valu...
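One way to apply the widget suggestion is sketched below. It assumes a widget named `run_date` (a hypothetical name) that is left empty on scheduled runs and filled in with an ISO date on manual runs. It does not detect how the job was triggered; it only lets a manual value take precedence over the computed one.

```python
from datetime import date


def resolve_run_date(manual_value, today):
    """Use the manually supplied ISO date if given, else fall back to `today`."""
    manual_value = (manual_value or "").strip()
    if manual_value:
        return date.fromisoformat(manual_value)
    return today


# In a notebook (dbutils is only available on Databricks):
# dbutils.widgets.text("run_date", "")
# run_date = resolve_run_date(dbutils.widgets.get("run_date"), date.today())
```

Keeping the decision in a plain function makes the override rule easy to test outside Databricks.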

2 More Replies

by rrussell25 • New Contributor

11-29-2022 7:13:57 PM

715 Views
1 replies
0 kudos

Read arguments in a Scala notebook invoked by a job

In a Scala notebook, how do I read input arguments (e.g. those provided by a job that runs a Scala notebook)? In Python, dbutils.notebook.entry_point.getCurrentBindings() works. How about for Scala?

Latest Reply

UmaMahesh1
Honored Contributor III

11-30-2022 11:04:00 AM

0 kudos

Hi @Robert Russell, you can use dbutils.notebook.getContext.currentRunId in Scala notebooks. Other methods are also available, like dbutils.notebook.getContext.jobGroup, dbutils.notebook.getContext.rootRunId, dbutils.notebook.getContext.tags, etc. You ...

by Snuki • New Contributor II

11-30-2022 10:40:15 AM

327 Views
0 replies
0 kudos

Hi folks, could you please tell me why my points are not reflected in the reward store? It is showing 0.

by RajibRajib_Mand • New Contributor III

01-03-2022 3:36:03 AM

2001 Views
2 replies
0 kudos

Reading a password-protected Excel (.xlsx) file in Databricks

I want to read a password-protected Excel file and load the data into a Delta table. Can you please let me know how this can be achieved in Databricks?

Latest Reply

igorsalo22
New Contributor II

11-30-2022 10:11:12 AM

0 kudos

df = spark.read.format("com.crealytics.spark.excel")\
    .option("dataAddress", "'Base'!A1")\
    .option("header", "true")\
    .option("workbookPassword", "test")\
    .load("test.xlsx")
display(df)

1 More Replies

by DK03 • Contributor

11-30-2022 5:13:04 AM

936 Views
2 replies
2 kudos

Is it ok to join on the decimal type fields? How does it affect the performance?

Latest Reply

UmaMahesh1
Honored Contributor III

11-30-2022 10:02:46 AM

2 kudos

As @Werner Stinckens said, it would be OK. But generally, joins on decimal columns are not recommended, as other factors come into play, such as precision and length. Also, when you are joining on decimal columns, be sure to check out the abs value of...

1 More Replies

by fury88 • New Contributor II

11-30-2022 9:04:20 AM

945 Views
1 replies
1 kudos

Does CACHE TABLE/VIEW have a create or replace like view?

I'm trying to cache data/queries that we normally have as temporary views that get replaced when the code is run based on dynamic python. What I'd like to know is will CACHE TABLE get overwritten each time you run it? Is it smart enough to recognize ...

Latest Reply

UmaMahesh1
Honored Contributor III

11-30-2022 9:53:15 AM

1 kudos

Hi @Matt Fury, yes, I guess the cache is overwritten each time you run it, because for me it took nearly the same amount of time to cache 1 million records. However, you can check whether the table is cached or not using the .storageLevel method. E.g. I have...
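If you would rather not rely on how repeated CACHE TABLE calls behave, one option is to make the refresh explicit. The sketch below is an assumption-laden pattern, not a statement of how Databricks resolves the question above: it drops any existing cache entry, replaces the temporary view, and caches it again. The helper names are hypothetical, and `spark` is the notebook's SparkSession.

```python
def recache_statements(name, query):
    """SQL statements that drop any cache entry, replace the view, then cache it."""
    return [
        f"UNCACHE TABLE IF EXISTS {name}",
        f"CREATE OR REPLACE TEMPORARY VIEW {name} AS {query}",
        f"CACHE TABLE {name}",
    ]


def recache(spark, name, query):
    """Run the statements in order and report whether the view is now cached."""
    for statement in recache_statements(name, query):
        spark.sql(statement)
    return spark.catalog.isCached(name)
```

`spark.catalog.isCached(name)` gives a yes/no answer; the `.storageLevel` attribute mentioned in the reply additionally shows which storage level is in use.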

by Durbinar • New Contributor III

11-30-2022 1:12:47 AM

2585 Views
4 replies
4 kudos

Resolved! Azure Databricks Default DNS

My Azure Databricks workspace's default DNS is 168.63.129.16. This DNS doesn't seem to resolve Azure storage accounts that were created a year ago. After tweaking the cluster to use 8.8.8.8, it is able to resolve the desired storage accounts. Is there a d...

Latest Reply

Durbinar
New Contributor III

11-30-2022 8:03:15 AM

4 kudos

IP address 168.63.129.16 is a virtual public IP address that is used to facilitate a communication channel to Azure platform resources. Customers can define any address space for their private virtual network in Azure. Therefore, the Azure platform...

3 More Replies

by 200723 • New Contributor II

10-26-2022 1:08:05 PM

1338 Views
4 replies
4 kudos

"No SRV records" intermittent error when running Databricks Pyspark to connect Mongo Atlas

My Mongo Atlas connection URL is like mongodb+srv://<srv_hostname>. I don't want to use a direct URL like mongodb://<hostname1, hostname2, hostname3....> because our Mongo Atlas global clusters have many hosts; it would be hard to maintain. Our Java programs...

Latest Reply

Noopur_Nigam
Valued Contributor II

11-30-2022 7:39:14 AM

4 kudos

Hi @Raymond Lai, the issue looks to be with the MongoDB connector. The connection is created and maintained by the mongo-spark connector. You can try using the direct MongoDB hosts in the connection string instead of SRV to avoid doing DNS lookups or...
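If you try the direct-host approach, the long URI does not have to be maintained by hand. A minimal sketch, assuming the host list is kept in configuration and that extra parameters such as `tls` are what your cluster requires (both are assumptions; the helper name is hypothetical):

```python
from urllib.parse import quote_plus, urlencode


def direct_mongo_uri(hosts, user, password, params=None):
    """Build a mongodb:// URI that lists every host, so no SRV lookup is needed."""
    if not hosts:
        raise ValueError("at least one host is required")
    host_list = ",".join(hosts)
    # Percent-encode credentials so characters like '@' do not break the URI.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    query = urlencode(params or {})
    uri = f"mongodb://{credentials}@{host_list}/"
    return f"{uri}?{query}" if query else uri
```

For example, `direct_mongo_uri(["h1:27017", "h2:27017"], "app", "p@ss", {"tls": "true"})` produces a single string that can be passed to the connector in place of the `mongodb+srv://` form.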

3 More Replies

by Dicer • Valued Contributor

10-24-2022 7:56:23 AM

4812 Views
5 replies
7 kudos

Is it reasonable for the process "Determining the location of DBIO file fragments." to take me 7 hours?

I only have 1000 columns. Each column has 252 rows, so there are only 252,000 data points. How can routing tasks for the best cached locality take 7 hours?

Latest Reply

Noopur_Nigam
Valued Contributor II

11-30-2022 7:01:42 AM

7 kudos

Hi @Cheuk Hin Christophe Poon, have you optimized your table at any time since its creation? If not, then OPTIMIZE may take some time depending on the number of underlying files. Please try to run OPTIMIZE manually as described in the document below: https://docs....

4 More Replies
