Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

PrashantAghara
by New Contributor II
  • 1142 Views
  • 1 replies
  • 0 kudos

org.apache.spark.SparkException: Job aborted due to stage failure when writing to Cosmos

I am writing data to Cosmos DB using Python and Spark on Databricks. I am getting the error below: org.apache.spark.SparkException: Job aborted due to stage failure: Authorized committer (attemptNumber=0, stage=192, partition=105) failed; but task commit suc...

Latest Reply
PrashantAghara
New Contributor II
  • 0 kudos

The cluster configuration is: worker type and driver type: Standard_D16ads_v5; RUs for Cosmos: 1.5L

DC3
by New Contributor II
  • 1898 Views
  • 2 replies
  • 0 kudos

Unable to access unity catalog volume via /Volumes in notebook

I have set up a volume in Unity Catalog in the format catalog/schema/volume, and granted all permissions to all users on the catalog, schema, and volume. From the notebook I can see the /Volumes directory in the root of the file system but am unable to...

Latest Reply
DC3
New Contributor II
  • 0 kudos

Thanks for your comments. The problem turned out to be that the compute resource did not have Unity Catalog enabled.
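A quick way to see whether a volume is reachable from the attached compute is to build its path and probe it. The sketch below is only illustrative: it assumes the usual /Volumes/<catalog>/<schema>/<volume> layout, and the helper names and the catalog, schema, and volume names are hypothetical. It cannot tell why a path is missing; it only reports what this compute can see.

```python
import os
import posixpath


def volume_path(catalog, schema, volume, *parts):
    """Build a /Volumes path from its Unity Catalog components."""
    return posixpath.join("/Volumes", catalog, schema, volume, *parts)


def describe_access(path):
    """Return a short diagnostic string for a path on the local file system."""
    if os.path.isdir(path):
        try:
            count = len(os.listdir(path))
        except OSError as exc:
            # The directory exists but listing it failed (for example, permissions).
            return f"{path}: directory exists but cannot be listed ({exc})"
        return f"{path}: directory is visible with {count} entries"
    if os.path.exists(path):
        return f"{path}: exists but is not a directory"
    return f"{path}: not visible from this compute"


if __name__ == "__main__":
    # Hypothetical names; replace with your own catalog, schema, and volume.
    print(describe_access(volume_path("my_catalog", "my_schema", "my_volume")))
```

If the path is reported as not visible even though permissions are granted, the compute configuration is worth checking next, as it was in this thread.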

1 More Replies
Sagas
by New Contributor II
  • 876 Views
  • 1 replies
  • 0 kudos

SparkR or sparklyr not showing history

Hi, for some reason Azure Databricks doesn't show History if the data is saved with SparkR (2 in the figure below) or sparklyr (3), but it does show it with Data Ingestion (0) or with PySpark (1). Is this a known bug or am I doing something wrong? Is ...

[Attached images: Databricks_history.PNG, SparkR.PNG, Sparklyr.PNG]
Data Engineering
sparklyr
SparkR
patrickw
by New Contributor II
  • 6751 Views
  • 2 replies
  • 0 kudos

connect timed out error - Connecting to SQL Server from Databricks

I am getting a connect timed out error when attempting to access a SQL Server instance. I can successfully ping the server from Databricks. I have used the JDBC connection and the included sqlserver driver, and both result in the same error. I have also attemp...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Can you run the following command in a notebook on the same cluster you are using to connect: %sh nc -vz <hostname> <port>. This test will confirm whether we can communicate with the SQL Server on the port you are using to connect. If...
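If nc is not available on the cluster, roughly the same TCP reachability check can be sketched in Python with the standard socket module. This is not the command from the reply, only an approximation of it; the function name and the default timeout are assumptions chosen for illustration.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, can_connect("<hostname>", 1433) would probe the default SQL Server port. A successful ping with a False result here usually points at a firewall or routing rule on that specific port rather than at the host being down.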

1 More Replies
madrhr
by New Contributor III
  • 2903 Views
  • 3 replies
  • 3 kudos

Resolved! SparkContext lost when running %sh script.py

I need to execute a .py file in Databricks from a notebook (with arguments, which for simplicity I exclude here). For this I am using:
%sh script.py
script.py:
from pyspark import SparkContext
def main():
    sc = SparkContext.getOrCreate()
    print(sc...

Data Engineering
%sh
.py
bash shell
SparkContext
SparkShell
Latest Reply
madrhr
New Contributor III
  • 3 kudos

I eventually got it working with a combination of:
from databricks.sdk.runtime import *
spark.sparkContext.addPyFile("/path/to/your/file")
sys.path.append("path/to/your")
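The sys.path.append part of that fix can be illustrated without a cluster. The sketch below only demonstrates the driver-side import mechanics, under the assumption that the script is imported as a module instead of being launched through %sh; the module name helper_mod_demo and both function names are hypothetical. On Databricks, addPyFile would additionally ship the file to the executors, which this local example does not cover.

```python
import importlib
import os
import sys
import tempfile


def import_from_dir(directory, module_name):
    """Make `directory` importable, then import `module_name` from it."""
    if directory not in sys.path:
        sys.path.append(directory)
    # Make sure the import system notices files created after startup.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


def demo():
    """Write a tiny module to a temp directory, import it, and call it."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "helper_mod_demo.py"), "w") as f:
            f.write("def greet(name):\n    return 'hello ' + name\n")
        try:
            mod = import_from_dir(tmp, "helper_mod_demo")
            return mod.greet("spark")
        finally:
            # Remove the temporary entry so sys.path is left as it was found.
            if tmp in sys.path:
                sys.path.remove(tmp)


if __name__ == "__main__":
    print(demo())
```

Because the module is imported into the notebook's own Python process, it can use the existing Spark session there, whereas a script started with %sh runs in a separate process.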

2 More Replies
NOOR_BASHASHAIK
by Contributor
  • 2285 Views
  • 2 replies
  • 1 kudos

Machine Type for VACUUM operation

Dear all, I have a workflow with 2 tasks: one that does OPTIMIZE, followed by one that does VACUUM. I used a cluster with an F32s driver and 8 F64s workers (auto-scaling enabled). All 8 workers are launched by Databricks as soon as OPTIMIZE starts. As ...

Data Engineering
best practice
F series
optimize
vacuum
Latest Reply
ArturOA
New Contributor III
  • 1 kudos

Hi, were you able to get any useful help on this?

1 More Replies
PrebenOlsen
by New Contributor III
  • 1146 Views
  • 2 replies
  • 0 kudos

How to migrate Git repos with DLT configurations

Hi! I want to migrate all my Databricks-related code from one GitHub repo to another. I knew this wouldn't be straightforward. When I copy my code for one DLT, I get the error org.apache.spark.sql.catalyst.ExtendedAnalysisException: Table 'vessel_batt...

Latest Reply
PrebenOlsen
New Contributor III
  • 0 kudos

Does cloning take considerably less time than recreating the tables? Can I resume append operations to a cloned table?

1 More Replies
Phani1
by Valued Contributor II
  • 946 Views
  • 0 replies
  • 0 kudos

Job cluster configuration for 24/7

Hi Team, We intend to keep the job cluster running around the clock. We are considering the following parameters regarding cost:
- Data volumes
- Client SLA for job completion
- Starting with a small cluster configuration
Please advise on any other options we s...

Anshul_DBX
by New Contributor
  • 1259 Views
  • 1 replies
  • 1 kudos

Masking rules with Delta Sharing

Hi, We tried Delta Sharing to PBI, which worked fine, but we are facing issues while trying to apply row- or column-level filtering or data masking. It fails with an error saying that it's not supported. Can anyone please confirm if Delta Sharing with masking rules works w...

Latest Reply
Yeshwanth
Databricks Employee
  • 1 kudos

Hi @Anshul_DBX good day! The issue you are encountering is due to a limitation in Delta Sharing. As per the provided information, Delta Sharing does not support row-level security or column masks. This means that you cannot apply row and column level...

Yohannes
by New Contributor
  • 3661 Views
  • 1 replies
  • 0 kudos

Databricks cli workflow

Is there a way to set up and configure a Databricks workflow job and its tasks from the Databricks CLI or API tools using Python? Any help would be appreciated. #databricksworkflow #databricks

Latest Reply
steyler-db
Databricks Employee
  • 0 kudos

Hello, and yes, you can set up and configure a Databricks workflow job and tasks using the Databricks CLI or API tools with Python. Here are some resources and steps to guide you: Create and run Databricks Jobs: This document: ( https://docs.databrick...
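As a rough illustration of the API route, the sketch below builds a job definition in the shape accepted by the Jobs API 2.1 create endpoint (POST /api/2.1/jobs/create). It only constructs and prints the JSON payload; the helper names, job name, notebook paths, and cluster ID are placeholders, and sending the request (with a workspace URL and token) is left out on purpose.

```python
import json


def notebook_task(task_key, notebook_path, cluster_id, depends_on=()):
    """Describe one notebook task, optionally depending on earlier tasks."""
    task = {
        "task_key": task_key,
        "notebook_task": {"notebook_path": notebook_path},
        "existing_cluster_id": cluster_id,
    }
    if depends_on:
        task["depends_on"] = [{"task_key": key} for key in depends_on]
    return task


def build_job(name, tasks):
    """Assemble the request body for a multi-task job."""
    return {"name": name, "tasks": list(tasks)}


if __name__ == "__main__":
    # Placeholder values; substitute real notebook paths and a real cluster ID.
    job = build_job(
        "example-workflow",
        [
            notebook_task("ingest", "/Workspace/demo/ingest", "<cluster-id>"),
            notebook_task("transform", "/Workspace/demo/transform", "<cluster-id>",
                          depends_on=["ingest"]),
        ],
    )
    print(json.dumps(job, indent=2))
```

Keeping the payload construction separate from the HTTP call makes the job definition easy to review and version before it is submitted through the CLI, the REST API, or an SDK.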

de-hru
by New Contributor III
  • 2355 Views
  • 2 replies
  • 1 kudos

Address Validation, Correction and Enrichment with Databricks Spark Engine

Hi all! In our project, we're thinking about "Validation, Correction and Enrichment of Postal Addresses" with Databricks. We would certainly need some kind of batch processing, because we have millions of addresses in our system. I'm aware of Address Valida...

Latest Reply
Sam99
New Contributor II
  • 1 kudos

Happy to help. Feel free to reach out https://www.linkedin.com/in/saleh-sultan-143ab036?utm_source=share&utm_campaign=share_via&utm_content=profile&utm_medium=android_app

1 More Replies
Phani1
by Valued Contributor II
  • 963 Views
  • 1 replies
  • 0 kudos

udf in databricks

Hi Team, Is there a particular reason why we should avoid using UDFs and instead convert to DataFrame code? Are there any restrictions or limitations (in terms of performance or governance) when using UDFs in Databricks? Regards, Janga

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Hello, some of the things you need to take into consideration are: UDFs might introduce significant processing bottlenecks into code execution. Databricks uses a number of different optimizers automatically for code written with included Apache Spark...

ande
by New Contributor
  • 1208 Views
  • 1 replies
  • 0 kudos

IP address for accessing external SFTP server

I am trying to pull in data to my Databricks workspace via an external SFTP server. I am using Azure for my compute. To access the SFTP server they need to whitelist my IP address. My IP address in Azure Databricks seems to be constantly changing fro...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Azure Databricks, like many cloud services, does not provide static IP addresses for outbound connections. This is because the compute resources are dynamically allocated and can change over time. One potential workaround could be to use a Virtual N...

User15787040559
by Databricks Employee
  • 32031 Views
  • 2 replies
  • 8 kudos

What's the difference between a Global view and a Temp view?

The difference between Global and Temp is how the lifetime of the view is tied to the application: http://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrame.createOrReplaceTempView.html?highlight=createorreplacetempview#pyspar...

Latest Reply
ScottSmithDB
Databricks Employee
  • 8 kudos

Correct. A Temp View is scoped to the SparkSession and dropped when that session closes. Each notebook runs in its own SparkSession. The Global Temp View is scoped to the cluster and dropped when the cluster restarts or you drop it. ---------------...

1 More Replies
