Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

User16753725182
by Contributor III
  • 2014 Views
  • 1 reply
  • 0 kudos

How to set up a private git repository in my workspace?

Latest Reply
atulsahu
New Contributor II
  • 0 kudos

As a platform engineer, I would go to the admin console, click on "workspace settings", and start by looking into the settings below. Repos: true, so that Repos integration is possible. The next two settings are important to make the overall experi...
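
For teams that prefer to script this rather than click through the admin console, a minimal sketch using the Git Credentials and Repos REST APIs is below; the workspace URL, tokens, user, and repository names are all hypothetical placeholders:

    import requests

    HOST = "https://<workspace-url>"   # hypothetical workspace URL
    TOKEN = "<databricks-pat>"         # hypothetical Databricks token
    headers = {"Authorization": f"Bearer {TOKEN}"}

    # Register Git credentials for the private provider (Git Credentials API).
    requests.post(f"{HOST}/api/2.0/git-credentials", headers=headers, json={
        "git_provider": "gitHub",
        "git_username": "my-user",                      # hypothetical
        "personal_access_token": "<git-provider-pat>",  # hypothetical
    })

    # Clone the private repository into the workspace (Repos API).
    requests.post(f"{HOST}/api/2.0/repos", headers=headers, json={
        "url": "https://github.com/my-org/my-private-repo.git",  # hypothetical
        "provider": "gitHub",
        "path": "/Repos/my-user/my-private-repo",
    })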

Rnmj
by New Contributor III
  • 12341 Views
  • 3 replies
  • 6 kudos

ConnectException: Connection refused (Connection refused) This is often caused by an OOM error

I am trying to run Python code where a JSON file is flattened to a pipe-separated file. The code works with smaller files, but for huge files of 2.4 GB I get the below error: ConnectException: Connection refused (Connection refused) Error while obtaining a...

Latest Reply
Rnmj
New Contributor III
  • 6 kudos

Hi @Jose Gonzalez, @Werner Stinckens, @Kaniz Fatma, thanks for your responses, I appreciate them a lot. The issue was in the code: it was Python/pandas code running on Spark, so only the driver node was being used. I validated this by increasin...
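
A minimal sketch of the distributed alternative, assuming a hypothetical input path and one level of struct nesting, so the flattening runs on the executors instead of on the driver:

    from pyspark.sql import functions as F

    # Read the large JSON with Spark so it is processed across executors,
    # not loaded into pandas on the driver (the cause of the OOM above).
    df = spark.read.json("/mnt/data/big_file.json")  # hypothetical path

    # Flatten one level of nested structs into top-level columns.
    flat_cols = []
    for field in df.schema.fields:
        if field.dataType.typeName() == "struct":
            flat_cols += [
                F.col(f"{field.name}.{sub.name}").alias(f"{field.name}_{sub.name}")
                for sub in field.dataType.fields
            ]
        else:
            flat_cols.append(F.col(field.name))

    # Write pipe-separated output without collecting to the driver.
    (df.select(flat_cols)
       .write.option("sep", "|").option("header", True)
       .csv("/mnt/data/flattened"))  # hypothetical output path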

2 More Replies
krishnakash
by New Contributor II
  • 3550 Views
  • 4 replies
  • 4 kudos

Resolved! Is there any way of determining the last stage of SparkSQL application execution?

I have created custom UDFs that generate logs. These logs can be flushed by calling another API exposed by an internal layer. However, I want to call this API just after the execution of the UDF comes to an end. Is there any way of d...

Latest Reply
User16763506586
Contributor
  • 4 kudos

@Krishna Kashiv, maybe ExecutorPlugin.java can help. It has all the methods you might require. Let me know whether it works. You need to implement the interface org.apache.spark.api.plugin.SparkPlugin and expose it as spark.plugins = com.abc.Imp...

3 More Replies
Braxx
by Contributor II
  • 1957 Views
  • 1 reply
  • 3 kudos

Retry API request if it fails

I have a simple API request to query a table and retrieve data, which is then loaded into a dataframe. Occasionally it fails for various reasons. How can I retry it, say, 5 times when any kind of error takes place? Here is the API request: d...

Latest Reply
Manoj
Contributor II
  • 3 kudos

@Bartosz Wachocki, use a timeout, a retry interval, recursion, and exception handling; pseudocode below: timeout = 300 def exec_query(query, timeout): try: df = spark.createDataFrame(sf.bulk.MyTable.query(query)) except: if timeout > 0: sleep(60) exec_que...
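
A runnable version of that idea, using a bounded loop instead of recursion; the Salesforce client sf and the table name are hypothetical, carried over from the pseudocode above:

    import time

    def exec_query_with_retry(query, retries=5, wait_seconds=60):
        for attempt in range(1, retries + 1):
            try:
                # sf.bulk.MyTable.query is the hypothetical client call above
                return spark.createDataFrame(sf.bulk.MyTable.query(query))
            except Exception as exc:
                if attempt == retries:
                    raise  # give up after the final attempt
                print(f"Attempt {attempt} failed ({exc}); retrying in {wait_seconds}s")
                time.sleep(wait_seconds)

    df = exec_query_with_retry("SELECT Id, Name FROM MyTable")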

adb-rm
by New Contributor II
  • 1814 Views
  • 2 replies
  • 2 kudos

Resolved! Mail configuration for an Azure Databricks PySpark notebook

Hi all, I am new to Azure Databricks and I am using PySpark. We need to configure mail alerts for when a notebook fails or succeeds. Please can someone help me with mail configuration for Azure Databricks. Thanks

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

The easiest way to schedule notebooks in Azure is to use Data Factory. In Data Factory you can schedule the notebooks and define the alerts you want to send. The other option is the one Hubert mentioned.
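
If the alert has to come from the notebook itself rather than from Data Factory, a minimal sketch using plain SMTP is below; the SMTP host, credentials, addresses, and the run_etl step are all hypothetical:

    import smtplib
    from email.message import EmailMessage

    def send_alert(subject, body):
        msg = EmailMessage()
        msg["Subject"] = subject
        msg["From"] = "alerts@example.com"   # hypothetical sender
        msg["To"] = "team@example.com"       # hypothetical recipient
        msg.set_content(body)
        with smtplib.SMTP("smtp.example.com", 587) as server:  # hypothetical host
            server.starttls()
            server.login("alerts@example.com", "<app-password>")
            server.send_message(msg)

    try:
        run_etl()  # hypothetical notebook workload
        send_alert("Notebook succeeded", "Run completed.")
    except Exception as exc:
        send_alert("Notebook failed", str(exc))
        raise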

1 More Replies
dimoobraznii
by New Contributor III
  • 6347 Views
  • 2 replies
  • 9 kudos

'databricks-connect' is not recognized as an internal or external command, operable program or batch file on Windows

Hello, I've installed databricks-connect on Windows 10: C:\Users\danoshin>pip install -U "databricks-connect==9.1.*" Collecting databricks-connect==9.1.* Downloading databricks-connect-9.1.2.tar.gz (254.6 MB) ...

Latest Reply
-werners-
Esteemed Contributor III
  • 9 kudos

@Dmitry Anoshin, that seems messed up. The best you can do is to remove databricks-connect and also uninstall any PySpark installation, and then follow the installation guide. It should work after following the procedure. I use a Linux VM for this p...

1 More Replies
Greg_Galloway
by New Contributor III
  • 6405 Views
  • 4 replies
  • 3 kudos

Resolved! Use of private endpoints for storage in workspace with EnableNoPublicIP=Yes and VnetInjection=No

We know that Databricks with VNet injection (our own VNet) allows us to connect to ADLS Gen2 over private endpoints. This is what we typically do. We have a customer who created Databricks with EnableNoPublicIP=Yes (secure cluster connectivity) and Vn...

Latest Reply
User16871418122
Contributor III
  • 3 kudos

The managed VNet is locked down and allows very limited config tuning, such as VNet peering, and even that is facilitated by and needs to be done from the Databricks UI. If they want more control over the VNet, they need to migrate to a VNet-injected workspace.

3 More Replies
Manoj
by Contributor II
  • 5028 Views
  • 9 replies
  • 8 kudos

Resolved! Is there a way to persist the delta cache even after the cluster restart?

Hi team, we are planning to connect Power BI directly to Databricks; however, data fetching using DirectQuery isn't giving great performance, even though we are using Z-ordering, partitioning, etc. We decided to use the Delta cache, but the cache tables area...

Latest Reply
Manoj
Contributor II
  • 8 kudos

@Hubert Dudek, I have good news. I agree with @Werner Stinckens: the SQL endpoint is super fast. I tested it on 143 million records with DirectQuery from Power BI, and the result returned in 10-12 seconds. Don't even try doing incremental in Power BI, ...
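
The Delta cache lives on the cluster's local disks, so it cannot survive a restart. One workaround sketch is to warm it from a scheduled notebook right after the cluster starts; the table name and filter here are hypothetical:

    # Ensure the Delta (disk) cache is enabled on this cluster.
    spark.conf.set("spark.databricks.io.cache.enabled", "true")

    # CACHE SELECT eagerly loads the selected files into the Delta cache,
    # so the first Power BI query does not pay the cold-read cost.
    spark.sql("CACHE SELECT * FROM sales WHERE event_date >= '2021-01-01'")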

8 More Replies
pantelis_mare
by Contributor III
  • 13352 Views
  • 7 replies
  • 5 kudos

Resolved! maxPartitionBytes ignored?

Hello all! I'm running a simple read-noop query where I read a specific partition of a Delta table that looks like this: With the default configuration, I read the data in 12 partitions, which makes sense, as the files that are more than 128 MB are split...

Latest Reply
ashish1
New Contributor III
  • 5 kudos

AQE doesn't affect read-time partitioning, only shuffle-time partitioning. It would be better to run OPTIMIZE on the Delta table, which will compact the files to approximately 1 GB each and provide better read-time performance.
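
A short sketch of both knobs discussed here, with a hypothetical Delta table path:

    # Cap how much data Spark packs into one read partition (default 128 MB).
    spark.conf.set("spark.sql.files.maxPartitionBytes", str(64 * 1024 * 1024))

    df = spark.read.format("delta").load("/mnt/delta/events")  # hypothetical path
    print(df.rdd.getNumPartitions())  # expect more, smaller read partitions

    # Compact small files; OPTIMIZE targets roughly 1 GB files by default.
    spark.sql("OPTIMIZE delta.`/mnt/delta/events`")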

6 More Replies
Nickels
by New Contributor II
  • 1876 Views
  • 4 replies
  • 1 kudos

Resolved! Reply on inline runtime commands

I feel like the answer to this question should be simple, but nonetheless I'm struggling. I run Python code that prompts me with the following warning: On my local machine, I can accept this through my terminal and my machine does not run out of memo...

Latest Reply
jose_gonzalez
Moderator
  • 1 kudos

Hi @Nickels Köhling, in Databricks you will only be able to see the output in the driver logs. If you go to your driver logs, you will see three panes displaying the output of "stdout", "stderr", and "log4j". If in your code you do ...

3 More Replies
francescocamuss
by New Contributor III
  • 3755 Views
  • 3 replies
  • 0 kudos

How to make the terminal available when starting a cluster with a Docker image?

Hello everybody, I'm starting a cluster with a Docker image because my team uses a lot of R libraries that would be time-consuming to install in an init script. The thing is that the cluster starts OK, but I can't access the cluster terminal a...

Latest Reply
Prabakar
Esteemed Contributor III
  • 0 kudos

@Francesco Camussoni, are you using databricksruntime/rbase as your base image?

2 More Replies
yatharth29
by New Contributor II
  • 2731 Views
  • 1 reply
  • 2 kudos
Latest Reply
Sajesh
Contributor
  • 2 kudos

Hi @Yatharth Kaushik, you can get the data into a table using the Clusters Events API: https://docs.databricks.com/dev-tools/api/latest/clusters.html#events
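
A minimal sketch of pulling those events into a table with that API; the host, token, cluster ID, and table name are hypothetical:

    import requests

    HOST = "https://<workspace-url>"   # hypothetical workspace URL
    TOKEN = "<personal-access-token>"  # hypothetical token

    resp = requests.post(
        f"{HOST}/api/2.0/clusters/events",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"cluster_id": "1234-567890-abcde123", "limit": 50},
    )
    events = resp.json().get("events", [])

    # Turn the JSON payload into a queryable table.
    df = spark.createDataFrame(
        [(e["cluster_id"], e["type"], e["timestamp"]) for e in events],
        ["cluster_id", "event_type", "timestamp"],
    )
    df.write.mode("append").saveAsTable("cluster_events")  # hypothetical table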

snarfed
by New Contributor II
  • 2167 Views
  • 3 replies
  • 5 kudos

Serverless SQL endpoints on Azure?

Serverless SQL Endpoints sound exciting! Sounds like they've been in preview on AWS for a couple months. Any idea if/when they're coming to Azure?

Latest Reply
-werners-
Esteemed Contributor III
  • 5 kudos

There is always Synapse Serverless muhahaha

2 More Replies
Jreco
by Contributor
  • 3778 Views
  • 2 replies
  • 5 kudos

Resolved! Reference py file from a notebook

Hi all, I'm trying to reference a .py file from a notebook following this documentation: Files in Repos. I downloaded and added the files to my repo, and when I try to run the notebook, the module is not recognized. Any idea why this is happening? Thanks ...

Latest Reply
-werners-
Esteemed Contributor III
  • 5 kudos

In this topic you can find some more info: https://community.databricks.com/s/question/0D53f00001Pp5EhCAJ. The docs are not that clear.
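
On runtimes where the repo root is not put on sys.path automatically, a common sketch is to append it yourself; the user, repo, module, and function names below are hypothetical:

    import sys

    # Make the repo root importable from the notebook.
    sys.path.append("/Workspace/Repos/my-user/my-repo")

    from utils.helpers import my_func  # hypothetical module and function
    my_func()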

1 More Replies
Mec_Mec
by New Contributor II
  • 4371 Views
  • 6 replies
  • 4 kudos

Resolved! Copy a script from the current subscription to a new subscription

I would like to check if there is a process to copy or migrate a script/code from Azure Databricks notebooks in the current subscription to a new Databricks subscription (new notebook).

Latest Reply
Mec_Mec
New Contributor II
  • 4 kudos

How can I quickly move Databricks notebooks from one account to another?
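
One scripted sketch using the Workspace API export/import pair; both workspace URLs, the tokens, and the notebook path are hypothetical:

    import requests

    SRC = {"host": "https://<old-workspace-url>", "token": "<old-pat>"}
    DST = {"host": "https://<new-workspace-url>", "token": "<new-pat>"}

    # Export the notebook source (base64-encoded) from the old workspace.
    exported = requests.get(
        f"{SRC['host']}/api/2.0/workspace/export",
        headers={"Authorization": f"Bearer {SRC['token']}"},
        params={"path": "/Users/me@example.com/my_notebook", "format": "SOURCE"},
    ).json()

    # Import it into the new workspace.
    requests.post(
        f"{DST['host']}/api/2.0/workspace/import",
        headers={"Authorization": f"Bearer {DST['token']}"},
        json={
            "path": "/Users/me@example.com/my_notebook",
            "format": "SOURCE",
            "language": "PYTHON",
            "content": exported["content"],
            "overwrite": True,
        },
    )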

5 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group