Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

User16826992666
by Valued Contributor
  • 2567 Views
  • 1 reply
  • 0 kudos
Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 0 kudos

Standard tiers are allowed to have 1000 saved jobs. Premium tiers have a higher limit of 1500. Some clouds have an enterprise tier, which has a saved job limit of 2000. A workspace is limited to 1000 concurrent job runs. A 429 Too Many Requests respon...
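
A minimal sketch of handling that 429 when triggering runs through the Jobs API, assuming a workspace URL in DATABRICKS_HOST and a personal access token in DATABRICKS_TOKEN (both environment variable names are illustrative):

import os
import time
import requests

def run_job_with_backoff(job_id, max_retries=5):
    # Trigger a job run; back off and retry while the workspace is throttling (HTTP 429).
    url = f"{os.environ['DATABRICKS_HOST']}/api/2.1/jobs/run-now"
    headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}
    for attempt in range(max_retries):
        resp = requests.post(url, headers=headers, json={"job_id": job_id})
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()  # contains the run_id on success
        time.sleep(2 ** attempt)  # simple exponential backoff before retrying
    raise RuntimeError("Jobs API still throttled after retries")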

User16826992185
by Databricks Employee
  • 5228 Views
  • 1 reply
  • 0 kudos

Delta vs. Parquet

I'm curious about the benefits of using the Delta file format vs. Parquet. Is there any downside to using Delta?

Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

Not really. You get upsides like transactions, time travel, upsert/merge/deletes. There is some cost to that, as Delta manages that by writing and managing many smaller Parquet files and has to re-read them to recreate the current or past state of th...
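
A quick sketch of the trade-off in a notebook, assuming df is an existing DataFrame and the paths are placeholders:

# Plain Parquet: fast columnar files, but no transaction log.
df.write.format("parquet").mode("overwrite").save("/tmp/demo_parquet")

# Delta: the same Parquet files underneath, plus a _delta_log that enables
# ACID transactions, MERGE/UPDATE/DELETE, and time travel.
df.write.format("delta").mode("overwrite").save("/tmp/demo_delta")

# Time travel: read an earlier version of the Delta table.
old = spark.read.format("delta").option("versionAsOf", 0).load("/tmp/demo_delta")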

sajith_appukutt
by Honored Contributor II
  • 2175 Views
  • 1 reply
  • 0 kudos

Resolved! I have a streaming aggregation query with highly variable micro-batch processing times. Seeing a lot of GC pauses in the logs. Any pointers on how to debug?

Though the data volume is relatively even, the streaming aggregation query is showing highly variable micro-batch processing times.

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

By default, the state data for a streaming aggregation query is maintained in the JVM memory of the executors, and a large number of state objects could put memory pressure on the JVM, causing high GC pauses. If you have stateful operations in your streamin...
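
One common mitigation is to move streaming state out of the JVM heap. A sketch, assuming a Databricks runtime that ships the RocksDB state store provider and that events is a placeholder streaming DataFrame with an event_time column:

from pyspark.sql.functions import window

# Keep streaming state in RocksDB on local disk instead of executor JVM memory,
# which reduces GC pressure from large numbers of state objects.
# (Provider class name as documented for Databricks runtimes; adjust to your runtime.)
spark.conf.set(
    "spark.sql.streaming.stateStore.providerClass",
    "com.databricks.sql.streaming.state.RocksDBStateStoreProvider",
)

# A watermark also bounds how much state the aggregation has to retain.
agg = (events
       .withWatermark("event_time", "30 minutes")
       .groupBy(window("event_time", "10 minutes"), "key")
       .count())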

Anonymous
by Not applicable
  • 2191 Views
  • 1 reply
  • 0 kudos
Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

DBFS is the "Databricks File System", but really it's just a shim/wrapper on top of distributed storage that makes files in S3 or ADLS look like local files under the path /dbfs/... This can be really useful when working with libraries that do not...
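
A small example of what that shim buys you (paths are illustrative):

# Spark and dbutils address the storage through the dbfs:/ scheme...
dbutils.fs.ls("dbfs:/tmp/")

# ...while plain Python libraries that only understand local files can use /dbfs/.
with open("/dbfs/tmp/example.txt", "w") as f:
    f.write("written with ordinary file APIs, stored in cloud storage")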

User16826992666
by Valued Contributor
  • 9484 Views
  • 1 reply
  • 1 kudos

Resolved! When should I choose a different driver type on my cluster vs the worker type?

When creating a cluster, the driver type defaults to the same type as the workers, and this is what I usually choose. But in what kind of situation would I want to choose a different driver type?

Latest Reply
sean_owen
Databricks Employee
  • 1 kudos

Using the same instance type is a fine default. If you know that you need very large workers, but little happens on the driver, maybe you can save money with a smaller driver. Conversely, you may know that some parts of your notebook involve a lot of...
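
For reference, the Clusters API lets you set the driver and worker node types independently; a hedged sketch of the relevant fields (runtime version, instance types, and sizes are just examples):

cluster_spec = {
    "cluster_name": "etl-heavy-workers",
    "spark_version": "13.3.x-scala2.12",  # example runtime
    "node_type_id": "i3.2xlarge",         # large workers for the distributed work
    "driver_node_type_id": "i3.xlarge",   # smaller driver if little runs on it
    "num_workers": 8,
}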

User16826992666
by Valued Contributor
  • 2737 Views
  • 1 reply
  • 0 kudos

Resolved! Is there a limit to the number of data points displayed in notebook visualizations?

I know that when you display the results of queries in notebooks there is a limit to the number of rows that are shown. Is there a similar limit to the results that are displayed in visuals within notebooks?

Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

Yes, still limited to 1000 rows / data points. However, when your visualization involves things like sums or averages of a Spark DataFrame's result, those will be performed on the cluster, so they may involve many more than 1000 data points, even ...
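
For example, an aggregation like the following runs on the cluster over the full dataset, and only the already-small grouped result is subject to the display limit (df and the column names are illustrative):

from pyspark.sql import functions as F

# The groupBy/avg is computed on the cluster over every row of df;
# only the aggregated result (one row per country) hits the 1000-row display limit.
display(df.groupBy("country").agg(F.avg("amount").alias("avg_amount")))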

User16826992666
by Valued Contributor
  • 9240 Views
  • 1 reply
  • 0 kudos

Resolved! When should I use single node clusters vs standard?

I see that single node is a cluster mode option that I have when creating clusters. When should I use this compared to the standard mode?

Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

Single-node, like the name implies, is a single machine. It still has Spark, just a local cluster. This is a good choice if you are running a workload that does not use Spark, or only needs it for data access. One good example is a small deep learnin...

User16826992666
by Valued Contributor
  • 2441 Views
  • 1 reply
  • 0 kudos
Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

You don't have to. If you don't have a huge data set, there may not be much value in Spark ML over anything else. There are also other distributed modeling libraries that work on Spark like xgboost, and Horovod + TF, Keras, Pytorch. Spark ML is a goo...

User16826992666
by Valued Contributor
  • 10401 Views
  • 2 replies
  • 0 kudos

Why do Spark MLlib models only accept a vector column as input?

In other libraries I can just use the feature columns themselves as inputs, why do I need to make a vector out of my features when I use MLlib?

Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

Yeah, it's more a design choice. Rather than have every implementation take column(s) params, this is handled once in VectorAssembler for all of them. One way or the other, most implementations need a vector of inputs anyway. VectorAssembler can do s...
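
A short sketch of that pattern, with made-up column names and a placeholder train_df:

from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

# Collapse the individual feature columns into the single vector column
# that MLlib estimators expect.
assembler = VectorAssembler(inputCols=["age", "income", "tenure"], outputCol="features")
lr = LinearRegression(featuresCol="features", labelCol="label")
model = Pipeline(stages=[assembler, lr]).fit(train_df)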

1 More Replies
User16826992666
by Valued Contributor
  • 3119 Views
  • 1 reply
  • 0 kudos

Resolved! How does cluster autoscaling work?

What determines when the cluster autoscaling activates to add and remove workers? Also, can it be adjusted?

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

> What determines when the cluster autoscaling activates to add and remove workers
During scale-down, the service removes a worker only if it is idle and does not contain any shuffle data. This allows aggressive resizing without killing tasks or recom...
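
As for adjusting it: the autoscaling range is part of the cluster configuration. A hedged sketch of the autoscale block in a Clusters API payload (runtime version and instance type are examples):

cluster_spec = {
    "cluster_name": "autoscaling-etl",
    "spark_version": "13.3.x-scala2.12",  # example runtime
    "node_type_id": "i3.xlarge",
    "autoscale": {"min_workers": 2, "max_workers": 10},  # bounds for scale-up/scale-down
}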

Digan_Parikh
by Valued Contributor
  • 1835 Views
  • 1 reply
  • 0 kudos

Resolved! S3 bucket mount

If you mount an S3 bucket using an AWS instance profile, does that mounted bucket become accessible to just that 1 cluster or to other clusters in that workspace as well?

Latest Reply
Digan_Parikh
Valued Contributor
  • 0 kudos

Mounts are global to all clusters but as a best practice, you can use IAM roles to prevent access to the underlying data. To take this one step further, you can use IAM credential passthrough rather than instance profile because instance profile can ...
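
For reference, a mount created like this (bucket name and mount point are placeholders) is visible from every cluster in the workspace, which is why access control has to come from IAM rather than from the mount itself:

# Mount once; any cluster in the workspace can then read /mnt/my-data,
# subject to the IAM permissions of the instance profile or credentials it runs with.
dbutils.fs.mount(
    source="s3a://my-example-bucket",
    mount_point="/mnt/my-data",
)
display(dbutils.fs.ls("/mnt/my-data"))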

Srikanth_Gupta_
by Databricks Employee
  • 1832 Views
  • 1 reply
  • 0 kudos
Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

Delta cache is an automatic hands-free solution that leverages high read speeds of modern SSDs to transparently create copies of remote files in nodes’ local storage to accelerate data reads. In comparison, you have to choose what and when to cache wit...
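
A sketch of the difference in practice (the table path is illustrative):

# Delta/disk cache: one setting, and remote Parquet/Delta reads are copied
# to the workers' local SSDs automatically.
spark.conf.set("spark.databricks.io.cache.enabled", "true")

# Spark cache: you decide which DataFrame to pin and when.
df = spark.read.format("delta").load("/mnt/my-data/events")
df.cache()
df.count()  # an action materializes the Spark cache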

Digan_Parikh
by Valued Contributor
  • 1533 Views
  • 1 reply
  • 0 kudos

Resolved! Widgets - Way to validate config parameters

Can you use widgets to validate config parameters for notebooks?

Latest Reply
Digan_Parikh
Valued Contributor
  • 0 kudos

For example:
folder = dbutils.widgets.get("Folder")
if folder == "":
    raise Exception("Folder missing")
or to get spark settings you can use:
spark.conf.get("my_property")
Learn more about them here - https://docs.databricks.com/notebooks/widgets.html
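
For completeness, the "Folder" widget that the snippet reads can be created with dbutils.widgets.text:

# Create a text widget named "Folder" with an empty default,
# so the validation above fails fast when no value is supplied.
dbutils.widgets.text("Folder", "", "Folder to process")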

User16826992666
by Valued Contributor
  • 1778 Views
  • 1 reply
  • 0 kudos

Can you use external job scheduling tools to start and schedule Databricks jobs?

I am wondering if I have to use the Databricks jobs scheduler to kick off Databricks jobs. My company already uses another job scheduler for our workflows and it would be useful to add our Databricks jobs to that flow.

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

You could use external tools to schedule jobs in Databricks. Here is a blog post explaining how Databricks could be used along with Azure Data Factory. This blog explains how to use Airflow with Databricks. It is worth noting that a lot of Databricks's f...
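
As a sketch of the Airflow route, using the Databricks provider package (the connection ID and job ID are placeholders):

from datetime import datetime
from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

with DAG("trigger_databricks_job", start_date=datetime(2023, 1, 1),
         schedule_interval="@daily", catchup=False) as dag:
    # Kicks off an existing Databricks job from the external scheduler.
    run_job = DatabricksRunNowOperator(
        task_id="run_databricks_job",
        databricks_conn_id="databricks_default",
        job_id=123,  # illustrative job ID
    )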

Anonymous
by Not applicable
  • 7558 Views
  • 1 reply
  • 0 kudos

Resolved! Scheduling cluster start and stop time

I want to schedule cluster to start in the morning and shut down by evening. How can I achieve that?

Latest Reply
Anonymous
Not applicable
  • 0 kudos

You can call the REST API to schedule cluster starts and stops from a scheduler. See https://docs.databricks.com/dev-tools/api/latest/clusters.html
PRO Tip: Use code generation tools within Postman to generate scripts in the language of your choice.
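
A minimal sketch of those two calls in Python (environment variable names and the cluster ID are placeholders), which an external scheduler or cron job could run in the morning and evening:

import os
import requests

HOST = os.environ["DATABRICKS_HOST"]
HEADERS = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

def start_cluster(cluster_id):
    # Morning: bring the cluster up.
    requests.post(f"{HOST}/api/2.0/clusters/start",
                  headers=HEADERS, json={"cluster_id": cluster_id}).raise_for_status()

def stop_cluster(cluster_id):
    # Evening: terminate it (clusters/delete terminates the cluster; it does not remove the config).
    requests.post(f"{HOST}/api/2.0/clusters/delete",
                  headers=HEADERS, json={"cluster_id": cluster_id}).raise_for_status()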
