Data Engineering

Forum Posts

dev_puli
by New Contributor III
  • 2820 Views
  • 3 replies
  • 0 kudos

tracing the history of a workflow

Hi! I use Databricks on Azure and find it inconvenient not knowing a workflow's last modified user and modified time. How can I trace the history of modification times and user details? Also, would it be possible to deploy the workflows into higher environments? Thanks!
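
A hedged sketch of one way to get at this, assuming the workspace has Unity Catalog system tables (audit logs) enabled; the table and columns follow the documented audit-log schema, but verify the names and emitted actions in your own workspace:

```python
# Sketch: query the audit-log system table for job/workflow changes.
# Assumes system.access.audit is enabled; the service and action names
# below are illustrative -- check what your logs actually record.
history = spark.sql("""
    SELECT event_time,
           user_identity.email AS modified_by,
           action_name,
           request_params
    FROM system.access.audit
    WHERE service_name = 'jobs'
      AND action_name IN ('create', 'update', 'reset')
    ORDER BY event_time DESC
""")
display(history)
```

For promoting workflows into higher environments, Databricks Asset Bundles (discussed in another thread on this page) are one supported route.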

Data Engineering
azure
Workflows
Latest Reply
dev_puli
New Contributor III
  • 0 kudos

Sorry! I added another issue at the end without mentioning it was a new issue I encountered. I had challenges changing the owner of a workflow when I created one. I ended up seeking help from another user with admin privileges to change the...

2 More Replies
UtkarshTrehan
by New Contributor
  • 3399 Views
  • 3 replies
  • 1 kudos

Inconsistent Results When Writing to Oracle DB with Spark's dropDuplicates and foreachPartition

It's more of a Spark question than a Databricks question. I'm encountering an issue when writing data to an Oracle database using Apache Spark. My workflow involves removing duplicate rows from a DataFrame and then writing the deduplicated DataFrame to ...
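
A plausible cause is that dropDuplicates is not deterministic: if the stage is recomputed (a task retry, or the plan being re-evaluated during the write), a different "surviving" row can be kept each time. A minimal sketch of one mitigation, materialising the deduplicated result before writing; the source table and key column are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.table("source_table")  # hypothetical source

# Pin one deduplication result so retries or re-evaluation inside the
# write path cannot pick a different surviving row per duplicate key.
deduped = df.dropDuplicates(["id"]).persist()
deduped.count()  # force materialisation before the write

def write_partition(rows):
    # Open one JDBC connection here and batch-insert `rows` into Oracle;
    # connection details are omitted in this sketch.
    for row in rows:
        pass

deduped.foreachPartition(write_partition)
deduped.unpersist()
```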

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @UtkarshTrehan, to help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question? This will also help other community members who may have similar ...

2 More Replies
carlosna
by New Contributor II
  • 8892 Views
  • 2 replies
  • 1 kudos

Resolved! Recover files from previous cluster execution

I saved a file with results by just opening it via fopen("filename.csv", "a"). Once the execution ended (and the cluster shut down) I couldn't retrieve the file. I found that the file was stored in "/databricks/driver", and that folder empties w...
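
For reference, a minimal sketch of the durable alternative, assuming the cluster exposes the DBFS FUSE mount (the path is illustrative): anything under the driver's local disk disappears with the cluster, while /dbfs/ paths are backed by cloud storage.

```python
# Append results under /dbfs/ so the file survives cluster termination;
# /databricks/driver lives on the driver VM and is wiped with it.
with open("/dbfs/tmp/results/filename.csv", "a") as f:  # illustrative path
    f.write("col_a,col_b\n")
    f.write("1,2\n")
```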

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @carlosna , I want to express my gratitude for your effort in selecting the most suitable solution. It's great to hear that your query has been successfully resolved. Thank you for your contribution. 

1 More Replies
Joe1912
by New Contributor III
  • 1518 Views
  • 3 replies
  • 0 kudos

Resolved! Consume 2 kafka topic with different schemas on 1 cluster databricks

Hi everyone, is there any way to read streams from 2 different Kafka topics with 2 different schemas in 1 job or on the same cluster, or do we need to create 2 separate jobs for it? (The job will need to run continuously.)

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Joe1912 , It's certainly reasonable to run a number of concurrent streams per driver node. Each .start() consumes a certain amount of driver resources in Spark. Your limiting factor will be the load on the driver node and its available resour...
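
A minimal sketch of that pattern, with both queries started from the same job; the topic names, broker address, schemas, and target tables are all assumptions:

```python
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import IntegerType, StringType, StructField, StructType

# Hypothetical payload schemas -- one per topic.
schema_a = StructType([StructField("id", IntegerType()), StructField("name", StringType())])
schema_b = StructType([StructField("ts", StringType()), StructField("value", StringType())])

def start_stream(topic, schema, target):
    return (spark.readStream.format("kafka")
            .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
            .option("subscribe", topic)
            .load()
            .select(from_json(col("value").cast("string"), schema).alias("data"))
            .select("data.*")
            .writeStream
            .option("checkpointLocation", f"/tmp/checkpoints/{target}")  # one per query
            .toTable(target))

q1 = start_stream("topic_a", schema_a, "bronze_a")
q2 = start_stream("topic_b", schema_b, "bronze_b")
# Both queries now run concurrently on the same cluster within one job.
```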

2 More Replies
Joe1912
by New Contributor III
  • 777 Views
  • 2 replies
  • 1 kudos

Resolved! Strategy to add new table base on silver data

I have a merge function for streaming foreachBatch, something like: def mergedf(df, i): merge_func_1(df, i); merge_func_2(df, i). Then I want to add a new merge_func_3 into it. Are there any best practices for this case? When the stream is always running, how can I process...
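
A sketch of that fan-out pattern and of how a third merge is commonly added, assuming the source is a Delta table and the merge bodies live elsewhere: stop the stream, deploy the new code, restart from the same checkpoint, and backfill the new target once if it needs history, since merge_func_3 only sees batches from the restart onward.

```python
def merge_func_1(df, batch_id): ...
def merge_func_2(df, batch_id): ...
def merge_func_3(df, batch_id): ...  # the newly added merge

def merge_all(df, batch_id):
    df.persist()                     # reuse the micro-batch across all merges
    merge_func_1(df, batch_id)
    merge_func_2(df, batch_id)
    merge_func_3(df, batch_id)       # only sees batches after the restart
    df.unpersist()

(spark.readStream.table("silver_source")   # hypothetical source table
      .writeStream
      .foreachBatch(merge_all)
      .option("checkpointLocation", "/tmp/checkpoints/merge_all")
      .start())
```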

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @Joe1912 , I want to express my gratitude for your effort in selecting the most suitable solution. It's great to hear that your query has been successfully resolved. Thank you for your contribution. 

1 More Replies
DavidStarosta
by New Contributor II
  • 1665 Views
  • 3 replies
  • 0 kudos

Resolved! Databricks Asset Bundles Jobs Updated instead of Create

Hello, is it possible to just update parameter values in different workspaces? YAML source code taken from workflow jobs always creates a new job. I'd like to just change/update parameter values when I deploy the bundle to different workspaces/environments...
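
A sketch of the usual shape for this in Databricks Asset Bundles, to the best of my understanding: one shared job definition whose parameters come from bundle variables that each target overrides, so redeploying to the same target updates the job in place rather than creating a new one (a separate job per target is expected by design). All names and paths below are illustrative.

```yaml
# databricks.yml (sketch -- names, paths, and hosts are illustrative)
bundle:
  name: my_bundle

variables:
  input_path:
    default: /mnt/dev/input

resources:
  jobs:
    my_job:
      name: my_job
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ./notebooks/main.py
            base_parameters:
              input_path: ${var.input_path}

targets:
  dev:
    default: true
  prod:
    workspace:
      host: https://prod-workspace.cloud.databricks.com  # placeholder
    variables:
      input_path: /mnt/prod/input
```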

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @DavidStarosta , I want to express my gratitude for your effort in selecting the most suitable solution. It's great to hear that your query has been successfully resolved. Thank you for your contribution. 

2 More Replies
JohnJustus
by New Contributor III
  • 8276 Views
  • 3 replies
  • 0 kudos

Resolved! Accessing Excel file from Databricks

Hi, I am trying to access an Excel file stored in Azure Blob Storage via Databricks. In my understanding, it is not possible to access it using PySpark, so accessing it through pandas is the option. Here is my code: %pip install openpyxl, then import pandas as p...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @JohnJustus ,  To resolve the FileNotFoundError when reading from Azure Blob Storage in Databricks, you need to use the "wasbs" protocol for the file path reference instead of the local file system path. Here's a summary of the steps to address th...
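
A minimal sketch of one way this commonly ends up working, assuming the storage account credentials are already configured for the cluster: pandas cannot open wasbs:// URLs directly, so copy the blob somewhere the driver can read first. The container, account, and paths are placeholders.

```python
import pandas as pd  # requires: %pip install openpyxl

# dbutils.fs understands wasbs:// once the account key/SAS is configured;
# file:/tmp is the driver's local disk, which pandas can read directly.
src = "wasbs://<container>@<account>.blob.core.windows.net/path/report.xlsx"
dbutils.fs.cp(src, "file:/tmp/report.xlsx")

df = pd.read_excel("/tmp/report.xlsx", engine="openpyxl")
print(df.head())
```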

2 More Replies
dbuser1234
by New Contributor
  • 993 Views
  • 1 reply
  • 0 kudos

Resolved! How to readstream from multiple sources?

Hi, I am trying to readStream from 2 sources and join them into a target table. How can I do this in PySpark? E.g. t1 + t2 as my bronze tables: I want to readStream from t1 and t2, and merge the changes into t3 (silver table).

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @dbuser1234 , Certainly! To read stream data from two sources, join them, and merge the changes into a target table in PySpark, you can follow these steps:   Read Stream Data from Sources (t1 and t2): Use spark.readStream to read data from both t1...
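
A compact sketch of those steps, assuming t1 and t2 are Delta tables with compatible columns and a shared key column id (all names hypothetical):

```python
from delta.tables import DeltaTable

s1 = spark.readStream.table("t1")
s2 = spark.readStream.table("t2")
combined = s1.unionByName(s2)  # for a true stream-stream join, add watermarks

def upsert_to_t3(batch_df, batch_id):
    # MERGE each micro-batch into the silver table.
    (DeltaTable.forName(spark, "t3").alias("t")
        .merge(batch_df.alias("s"), "t.id = s.id")  # hypothetical key
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

(combined.writeStream
         .foreachBatch(upsert_to_t3)
         .option("checkpointLocation", "/tmp/checkpoints/t3")
         .start())
```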

Abel_Martinez
by Contributor
  • 6525 Views
  • 9 replies
  • 6 kudos

Resolved! Why I'm getting connection timeout when connecting to MongoDB using MongoDB Connector for Spark 10.x from Databricks

I'm able to connect to MongoDB using org.mongodb.spark:mongo-spark-connector_2.12:3.0.2 and this code: df = spark.read.format("com.mongodb.spark.sql.DefaultSource").option("uri", jdbcUrl). It works well, but if I install the latest MongoDB Spark Connector ve...
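
For context, the 10.x connector renamed both the data source and the URI option, so 3.x-style code pointed at a 10.x connector typically fails to connect. A minimal 10.x-style read, with placeholder connection details:

```python
# mongo-spark-connector 10.x: the short name is "mongodb" and the URI
# option is "connection.uri" (the 3.x "uri" option no longer applies).
df = (spark.read.format("mongodb")
      .option("connection.uri", "mongodb://user:pass@host:27017")  # placeholder
      .option("database", "mydb")        # illustrative
      .option("collection", "mycoll")    # illustrative
      .load())
```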

Latest Reply
Kaniz
Community Manager
  • 6 kudos

Hi @Abel_Martinez, I want to express my gratitude for your effort in selecting the most suitable solution. It's great to hear that your query has been successfully resolved. Thank you for your contribution. 

8 More Replies
7cb15
by New Contributor
  • 6196 Views
  • 1 reply
  • 0 kudos

Resolved! org.apache.spark.SparkException: Job aborted due to stage failure while saving to s3

Hello, I am having issues saving a Spark DataFrame generated in a Databricks notebook to an S3 bucket. The DataFrame contains approximately 1.1M rows and 5 columns. The error is as follows: org.apache.spark.SparkException: Job aborted due to stage fa...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @7cb15, I understand you’re encountering issues while saving a Spark DataFrame to an S3 bucket. Let’s troubleshoot this together! Here are some steps and recommendations to address the problem: Check S3 Permissions: Ensure that the IAM role or us...
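
As a baseline for testing those suggestions, a minimal write of this shape, assuming the cluster's instance profile already grants access to the bucket (bucket and path are placeholders):

```python
# Keep output files a manageable size and rule out a single oversized task;
# ~1.1M rows x 5 columns should write comfortably in a handful of partitions.
(df.repartition(8)
   .write.mode("overwrite")
   .parquet("s3://my-bucket/exports/out/"))  # placeholder bucket/path
```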

svrdragon
by New Contributor
  • 719 Views
  • 1 reply
  • 0 kudos

Resolved! optimizeWrite takes too long

Hi, we have a Spark job that writes data into a Delta table for the last 90 date partitions. We have enabled spark.databricks.delta.autoCompact.enabled and delta.autoOptimize.optimizeWrite. The job takes 50 mins to complete. Of that, the logic takes 12 mins and optimizewri...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @svrdragon, It's great you're using Delta Lake features to optimize your Spark job. Let's explore some strategies to potentially reduce the total job time: Optimize Write: You've already enabled delta.autoOptimize.optimizeWrite, which is a go...
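
One commonly discussed trade-off, sketched below with an illustrative table name and partition column: keep optimized writes in the job but take auto-compaction out of the hot path and compact on a schedule instead. Whether this helps depends on the workload, so treat it as an experiment rather than a recommendation.

```python
# Keep optimizeWrite, move compaction off the write path (names illustrative).
spark.sql("""
    ALTER TABLE my_table SET TBLPROPERTIES (
      'delta.autoOptimize.optimizeWrite' = 'true',
      'delta.autoOptimize.autoCompact'   = 'false'
    )
""")

# Compact recent partitions on a separate schedule, outside the 50-min job;
# assumes event_date is the table's partition column.
spark.sql("OPTIMIZE my_table WHERE event_date >= current_date() - INTERVAL 90 DAYS")
```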

Rafal9
by New Contributor II
  • 3074 Views
  • 1 reply
  • 0 kudos

Resolved! Issue during testing SparkSession.sql() with pytest.

Dear Community, I am testing PySpark code via pytest using VS Code and Databricks Connect. The SparkSession is initiated from Databricks Connect: from databricks.connect import DatabricksSession; spark = DatabricksSession.builder.getOrCreate(). I am receiving...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Rafal9, Thank you for reaching out with your issue related to SparkSession.sql() and Databricks Connect.  Let's explore potential solutions.   Environment Configuration: Ensure that your environment variables and configurations are correctly set ...
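
A minimal sketch of a pytest setup along those lines, assuming Databricks Connect credentials are already configured via environment variables or .databrickscfg:

```python
import pytest
from databricks.connect import DatabricksSession

@pytest.fixture(scope="session")
def spark():
    # One remote session shared by the whole test run.
    return DatabricksSession.builder.getOrCreate()

def test_sql_roundtrip(spark):
    assert spark.sql("SELECT 1 AS x").collect()[0].x == 1
```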

anmol_hans_de
by New Contributor
  • 3063 Views
  • 2 replies
  • 0 kudos

Resolved! Exam suspended by proctor

Hi Team, I need urgent support: I was about to submit my exam and was just reviewing the responses, but the proctor suspended it because I did not satisfy the proctoring conditions, even though I was sitting in a room with a clear background and well li...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @anmol_hans_de,  Please let us know if you need any further assistance!  

1 More Replies
Noosphera
by New Contributor III
  • 3170 Views
  • 2 replies
  • 1 kudos

Resolved! How to reinstantiate the Cloudformation template for AWS

Hi everyone! I am new to Databricks and chose to use the CloudFormation template to create my AWS workspace. I regretfully must admit I felt creative in the process and varied the suggested stack name, and that must have created errors which ended...

Data Engineering
AWS
Cloudformation template
Unity Catalog
Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @Noosphera, I want to express my gratitude for your effort in selecting the most suitable solution. It's great to hear that your query has been successfully resolved. Thank you for your contribution. 

1 More Replies
ThomasVanBilsen
by New Contributor III
  • 2957 Views
  • 2 replies
  • 1 kudos

Default Catalog Name setting doesn't work

I've recently started using Unity Catalog and I'm trying to set the default catalog name to something other than hive_metastore for some of my workspaces. According to the documentation (Update an assignment | Metastores API | REST API reference | ...

(two screenshots attached)
Data Engineering
Unity Catalog
Latest Reply
saldroubi
New Contributor II
  • 1 kudos

I found that setting the default catalog in the workspace "Admin Settings" works for SQL warehouses, Spark clusters, and compute policies. Consult this documentation: https://docs.databricks.com/en/data-governance/unity-catalog/create-catalogs.html#view...
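
A quick way to confirm the setting took effect, from any new cluster or SQL warehouse session (this only reads state, so it is safe to run anywhere):

```python
# New sessions should report the configured default, not hive_metastore.
print(spark.sql("SELECT current_catalog()").first()[0])
```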

1 More Replies