Data Engineering

Forum Posts

KVNARK
by Honored Contributor II
  • 945 Views
  • 3 replies
  • 11 kudos

Is there any limit on the number of SQL queries in a Databricks SQL workspace?


Latest Reply
Rajeev_Basu
Contributor III
  • 11 kudos

The default is documented to be 1000, though I have never verified this myself.

2 More Replies
Ajay-Pandey
by Esteemed Contributor III
  • 885 Views
  • 2 replies
  • 9 kudos

Kafka integration with Databricks

Hi all, I want to integrate Kafka with Databricks. If anyone can share any doc or code, it will help me a lot. Thanks in advance.

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 9 kudos

This is the code that I am using to read from Kafka:

inputDF = (spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", host)
  .option("kafka.ssl.endpoint.identification.algorithm", "https")
  .option("kafka.sasl.mechanism", "PLAIN")
  .option("ka...
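The excerpt above is cut off; a fuller, untested sketch of the same read pattern follows. The broker host, credentials, topic name, and the SASL_SSL/JAAS options are hypothetical placeholders, not the poster's actual configuration.

host = "broker1.example.com:9092"        # hypothetical broker address
api_key, api_secret = "key", "secret"    # hypothetical credentials

inputDF = (spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", host)
  .option("kafka.ssl.endpoint.identification.algorithm", "https")
  .option("kafka.sasl.mechanism", "PLAIN")
  .option("kafka.security.protocol", "SASL_SSL")
  .option("kafka.sasl.jaas.config",
          f'kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule '
          f'required username="{api_key}" password="{api_secret}";')
  .option("subscribe", "my_topic")        # hypothetical topic
  .option("startingOffsets", "earliest")
  .load())

# Kafka delivers key/value as binary; cast to strings before use.
display(inputDF.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)"))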

1 More Replies
rpshgupta
by New Contributor III
  • 4668 Views
  • 8 replies
  • 6 kudos

Databricks notebook failed with "Caused by: java.io.FileNotFoundException: Operation failed: "The specified path does not exist.", 404, HEAD, https://adls.dfs.core.windows.net/raw/file.csv?upn=false&action=getStatus&timeout=90".

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 458.0 failed 4 times, most recent failure: Lost task 0.3 in stage 458.0 (TID 2247) (172.18.102.75 executor 1): com.databricks.sql.io.FileReadException: Error while rea...

Latest Reply
Vidula
Honored Contributor
  • 6 kudos

Hi @Rupesh gupta​ Hope you are well. Just wanted to see if you were able to find an answer to your question, and if so, would you like to mark an answer as best? It would be really helpful for the other members too. Cheers!
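For a 404 like this, a minimal hedged check that the file actually exists before reading it; the abfss path below is a hypothetical reconstruction from the URL in the title:

path = "abfss://raw@adls.dfs.core.windows.net/file.csv"  # hypothetical path
try:
    dbutils.fs.ls(path)  # raises if the path does not exist
    df = spark.read.option("header", "true").csv(path)
except Exception as e:
    print(f"Path missing or inaccessible: {e}")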

7 More Replies
Ryan_Chynoweth
by Honored Contributor III
  • 2399 Views
  • 2 replies
  • 4 kudos

Connecting to Azure SQL from Azure Databricks with firewalls

We are trying to connect to an Azure SQL Server from Azure Databricks using JDBC, but have faced issues because our firewall blocks everything. We decided to whitelist IPs from the SQL Server side and add a public subnet to make the connection work. ...

Latest Reply
Ryan_Chynoweth
Honored Contributor III
  • 4 kudos

Using subnets for Databricks connectivity is the correct thing to do. This way you ensure the resources (clusters) can connect to the SQL Database. We also recommend using NPIP (No Public IPs) so that there won't be any public IP associated with the...
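Once the network path is open, the JDBC read itself is standard; a minimal sketch, with hypothetical server, database, table, and secret-scope names:

user = dbutils.secrets.get(scope="my_scope", key="sql_user")          # hypothetical scope/keys
password = dbutils.secrets.get(scope="my_scope", key="sql_password")

df = (spark.read
  .format("jdbc")
  .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb")
  .option("dbtable", "dbo.my_table")
  .option("user", user)
  .option("password", password)
  .load())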

1 More Replies
nameziane
by New Contributor III
  • 5105 Views
  • 4 replies
  • 2 kudos

Set version (VERSION AS OF) dynamically from return of a subquery

Hello, we have a business request to compare the evolution of a certain Delta table. We would like to compare the latest version of the table with the previous one using Delta time travel. The main issue we are facing is to retrieve programmatically us...

Latest Reply
apingle
Contributor
  • 2 kudos

In the docs it says that "Neither timestamp_expression nor version can be subqueries." So it does sound challenging. I also tried playing with widgets to see if it could be populated using SQL, but didn't succeed. With Python it's really easy to do.
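A minimal Python sketch of what the reply hints at, assuming a Delta table named my_table: fetch the latest version from the table history, then query the previous one with VERSION AS OF.

from delta.tables import DeltaTable

# Most recent operation's version number from the table history.
latest = (DeltaTable.forName(spark, "my_table")
          .history(1)
          .select("version")
          .first()[0])

# Compare the current version against the previous one.
previous_df = spark.sql(f"SELECT * FROM my_table VERSION AS OF {latest - 1}")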

3 More Replies
cmilligan
by Contributor II
  • 1038 Views
  • 3 replies
  • 4 kudos

Resolved! Pass through whether a job was run as scheduled or manually

I have a notebook that sets up parameters for the run based on some job parameters set by the user as well as the current date of the run. I want to supersede some of this logic and just use the manual values if kicked off manually. Is there a way to...

Latest Reply
SS2
Valued Contributor
  • 4 kudos

You can create widgets by using this: dbutils.widgets.text("widgetName", ""). To get the value for that widget: dbutils.widgets.get("widgetName"). So by using this you can manually create widgets (variables) and can run the process by giving desired valu...
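A minimal sketch along those lines, with hypothetical widget names; a scheduled run keeps the defaults, while a manual run overrides them:

import datetime

dbutils.widgets.text("run_mode", "scheduled")  # manual runs set this to "manual"
dbutils.widgets.text("run_date", "")           # manual override; blank by default

if dbutils.widgets.get("run_mode") == "manual":
    run_date = dbutils.widgets.get("run_date")
else:
    run_date = datetime.date.today().isoformat()  # scheduled runs use today's date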

2 More Replies
rrussell25
by New Contributor
  • 715 Views
  • 1 reply
  • 0 kudos

Read arguments in a Scala notebook invoked by a job

In a Scala notebook, how do I read input arguments (e.g. those provided by a job that runs a Scala notebook)? In Python, dbutils.notebook.entry_point.getCurrentBindings() works. How about for Scala?

Latest Reply
UmaMahesh1
Honored Contributor III
  • 0 kudos

Hi @Robert Russell​ You can use dbutils.notebook.getContext.currentRunId in Scala notebooks. Other methods are also available, like dbutils.notebook.getContext.jobGroup, dbutils.notebook.getContext.rootRunId, dbutils.notebook.getContext.tags, etc...You ...

RajibRajib_Mand
by New Contributor III
  • 2001 Views
  • 2 replies
  • 0 kudos

Reading a password-protected Excel (.xlsx) file in Databricks

I want to read a password-protected Excel file and load the data into a Delta table. Can you please let me know how this can be achieved in Databricks?

Latest Reply
igorsalo22
New Contributor II
  • 0 kudos

df = spark.read.format("com.crealytics.spark.excel") \
  .option("dataAddress", "'Base'!A1") \
  .option("header", "true") \
  .option("workbookPassword", "test") \
  .load("test.xlsx")
display(df)
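Note that this assumes the com.crealytics:spark-excel Maven library is installed on the cluster.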

1 More Replies
DK03
by Contributor
  • 936 Views
  • 2 replies
  • 2 kudos
Latest Reply
UmaMahesh1
Honored Contributor III
  • 2 kudos

As @Werner Stinckens​ said, it would be OK. But generally joins on decimal columns are not recommended, as other factors come into play, like precision, length, etc...Also when you are joining on decimal columns, be sure to check out the abs value of...

1 More Replies
fury88
by New Contributor II
  • 945 Views
  • 1 reply
  • 1 kudos

Does CACHE TABLE/VIEW have a create or replace like view?

I'm trying to cache data/queries that we normally have as temporary views, which get replaced when the code is run based on dynamic Python. What I'd like to know is: will CACHE TABLE get overwritten each time you run it? Is it smart enough to recognize ...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 1 kudos

Hi @Matt Fury​ Yes...I guess the cache is overwritten each time you run it, because for me it took nearly the same amount of time for 1 million records to be cached. However, you can check whether the table is cached or not using the .storageLevel method. E.g. I have...
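A hedged sketch of that check, with hypothetical table names:

spark.sql("CACHE TABLE my_cached AS SELECT * FROM source_table")
print(spark.table("my_cached").storageLevel)  # reports the cache level; all flags False when not cached
spark.sql("UNCACHE TABLE my_cached")          # explicitly drop the cache when done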

Durbinar
by New Contributor III
  • 2585 Views
  • 4 replies
  • 4 kudos

Resolved! Azure Databricks Default DNS

My Azure Databricks workspace default DNS is 168.63.129.16. This DNS doesn't seem to resolve Azure storage accounts that were created a year ago; after tweaking the cluster to use 8.8.8.8, I was able to resolve the desired storage accounts. Is there a d...

Latest Reply
Durbinar
New Contributor III
  • 4 kudos

IP address 168.63.129.16 is a virtual public IP address that is used to facilitate a communication channel to Azure platform resources. Customers can define any address space for their private virtual network in Azure. Therefore, the Azure platform...

3 More Replies
200723
by New Contributor II
  • 1338 Views
  • 4 replies
  • 4 kudos

"No SRV records" intermittent error when running Databricks Pyspark to connect Mongo Atlas

My Mongo Atlas connection URL is like mongodb+srv://<srv_hostname>. I don't want to use a direct URL like mongodb://<hostname1, hostname2, hostname3....> because our Mongo Atlas global clusters have many hosts; it would be hard to maintain. Our Java programs...

Latest Reply
Noopur_Nigam
Valued Contributor II
  • 4 kudos

Hi @Raymond Lai​ The issue looks to be in the MongoDB connector. The connection is created and maintained by the mongo-spark connector. You can try using the direct mongodb hosts in the connection string instead of SRV to avoid doing DNS lookups or...
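A hedged sketch of the direct-host connection string with the mongo-spark connector; the hosts, replica set, database, and collection are hypothetical:

uri = "mongodb://host1:27017,host2:27017,host3:27017/mydb.mycoll?replicaSet=rs0"

df = (spark.read
  .format("mongo")  # source alias in mongo-spark connector 3.x; older versions use com.mongodb.spark.sql.DefaultSource
  .option("uri", uri)
  .load())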

3 More Replies
Dicer
by Valued Contributor
  • 4812 Views
  • 5 replies
  • 7 kudos

Is it reasonable for the process "Determining the location of DBIO file fragments." to take me 7 hours?

I only have 1000 columns. Each column has 252 rows, so there are only 252,000 data points. How come it can route tasks for the best-cached locality for 7 hours?

Latest Reply
Noopur_Nigam
Valued Contributor II
  • 7 kudos

Hi @Cheuk Hin Christophe Poon​ have you optimized your table at any time since its creation? If not, then OPTIMIZE may take some time depending on the number of underlying files. Please try to run OPTIMIZE manually as described in the document below: https://docs....
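A minimal sketch of the manual OPTIMIZE run, with a hypothetical table name:

spark.sql("OPTIMIZE my_table")

# Optionally co-locate data on a frequently filtered column:
spark.sql("OPTIMIZE my_table ZORDER BY (event_date)")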

4 More Replies