Hi! I use Databricks on Azure and I find it inconvenient not knowing the last modified user and modified time. How can I trace the history of modified times and user details? Would it be possible to deploy the workflows into higher environments? Thanks!
Sorry! I added another issue at the end without mentioning it was a new issue I encountered. I had challenges changing the owner of a workflow after I created it. I ended up seeking help from another user with admin privileges to change the...
It's more a Spark question than a Databricks question. I'm encountering an issue when writing data to an Oracle database using Apache Spark. My workflow involves removing duplicate rows from a DataFrame and then writing the deduplicated DataFrame to ...
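A common pitfall with this workflow is that dropDuplicates() is not deterministic across recomputations, so a task retry during the JDBC write can push a different subset of rows to Oracle. A minimal sketch of one defensive pattern, persisting the deduplicated result before writing (the JDBC URL and option names shown are placeholders the caller must supply):

```python
def write_dedup_to_oracle(df, jdbc_url, table, props):
    """Deduplicate a DataFrame and write it to Oracle over JDBC.

    Persisting after dropDuplicates() pins the deduplicated result, so a
    task retry during the write cannot recompute a different row subset
    (dropDuplicates is nondeterministic across recomputations).
    """
    deduped = df.dropDuplicates().persist()
    deduped.count()  # materialize the cache before writing
    (deduped.write
        .format("jdbc")
        .option("url", jdbc_url)   # e.g. jdbc:oracle:thin:@//host:1521/service
        .option("dbtable", table)
        .options(**props)          # user/password/driver, caller-supplied
        .mode("append")
        .save())
    deduped.unpersist()
```

This is a sketch of the pattern, not a diagnosis of the specific error in the post, which is truncated above.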
Hi @UtkarshTrehan, To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question? This will also help other community members who may have similar ...
I saved a file with results by just opening a file via fopen("filename.csv", "a"). Once the execution ended (and the cluster shut down) I couldn't retrieve the file. I found that the file was stored in "/databricks/driver", and that folder empties w...
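The fix implied above is to write under /dbfs instead: relative paths land on the driver's local disk (/databricks/driver) and disappear with the cluster, while paths under /dbfs are backed by DBFS and survive termination. A minimal sketch; the base path is parameterized only so the example can run outside Databricks, and the filename is the one from the post:

```python
import os

def append_results(line, base="/dbfs"):
    """Append one CSV line to a file under `base`.

    On Databricks, files under /dbfs persist after the cluster shuts
    down; bare relative paths are written to the driver's local disk
    and are lost when the cluster terminates.
    """
    path = os.path.join(base, "filename.csv")
    with open(path, "a") as f:
        f.write(line + "\n")
    return path
```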
Hi @carlosna , I want to express my gratitude for your effort in selecting the most suitable solution. It's great to hear that your query has been successfully resolved. Thank you for your contribution.
Hi everyone, I have a question: is there any way to read streams from 2 different Kafka topics, with 2 different streams, in 1 job or on the same cluster? Or do we need to create 2 separate jobs for it? (The job will need to process continually.)
Hi @Joe1912 ,
It's certainly reasonable to run a number of concurrent streams per driver node.
Each .start() consumes a certain amount of driver resources in Spark. Your limiting factor will be the load on the driver node and its available resour...
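The multiple-.start() pattern described above can be sketched as follows; the broker address, topic, checkpoint, and table names are all placeholders. Both queries run concurrently in one job, coordinated by the same driver:

```python
def start_two_topic_streams(spark):
    """Start two independent Kafka streams in a single Spark job.

    Each query needs its own checkpoint location; both run concurrently
    on the same cluster. All names below are placeholders.
    """
    def read_topic(topic):
        return (spark.readStream
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker:9092")
                .option("subscribe", topic)
                .load())

    q1 = (read_topic("topic_a").writeStream
          .option("checkpointLocation", "/chk/topic_a")
          .toTable("bronze_a"))
    q2 = (read_topic("topic_b").writeStream
          .option("checkpointLocation", "/chk/topic_b")
          .toTable("bronze_b"))
    return q1, q2
```

If both topics share one processing path, a single stream can also subscribe to both with .option("subscribe", "topic_a,topic_b").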
I have a merge function for streaming foreachBatch, something like: mergedf(df, i): merge_func_1(df, i); merge_func_2(df, i). Then I want to add a new merge_func_3 into it. Are there any best practices for this case? When streaming always runs, how can I process...
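One common way to structure this is to make mergedf a thin dispatcher over a list of merge steps; adding merge_func_3 then means stopping the query, deploying the new code, and restarting it (the checkpoint preserves the stream's progress, and the new step runs from the next micro-batch onward). The function names mirror the post; the bodies below are stand-ins that just record their calls so the flow is visible:

```python
calls = []  # stand-in for the real MERGE side effects

def merge_func_1(df, batch_id):
    calls.append(("merge_func_1", batch_id))

def merge_func_2(df, batch_id):
    calls.append(("merge_func_2", batch_id))

def merge_func_3(df, batch_id):
    calls.append(("merge_func_3", batch_id))

# the single function handed to .foreachBatch(...)
MERGE_STEPS = [merge_func_1, merge_func_2, merge_func_3]

def mergedf(df, batch_id):
    # run every step sequentially against the same micro-batch
    for step in MERGE_STEPS:
        step(df, batch_id)
```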
Hi @Joe1912 , I want to express my gratitude for your effort in selecting the most suitable solution. It's great to hear that your query has been successfully resolved. Thank you for your contribution.
Hello, is it possible to just update parameter values in different workspaces? YAML source code taken from workflow jobs always creates a new job. I'd like to just change/update parameter values when I deploy a bundle to different workspaces/environments...
Hi @DavidStarosta , I want to express my gratitude for your effort in selecting the most suitable solution. It's great to hear that your query has been successfully resolved. Thank you for your contribution.
Hi, I am trying to access an Excel file that is stored in Azure Blob Storage via Databricks. In my understanding, it is not possible to access it using PySpark, so accessing it through pandas is the option. Here is my code: %pip install openpyxl import pandas as p...
Hi @JohnJustus ,
To resolve the FileNotFoundError when reading from Azure Blob Storage in Databricks, you need to use the "wasbs" protocol for the file path reference instead of the local file system path. Here's a summary of the steps to address th...
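Since pandas cannot open a wasbs:// URL directly, one way to apply the advice above is to build the wasbs path, copy the blob to the driver's local disk with dbutils, and read the local copy with pandas. A sketch, assuming the cluster is already configured with the storage account key or SAS; the container/account/path arguments are placeholders:

```python
def wasbs_url(container, account, blob_path):
    """Build the wasbs:// reference for a blob (not a local filesystem path)."""
    return f"wasbs://{container}@{account}.blob.core.windows.net/{blob_path}"

def read_excel_from_blob(dbutils, container, account, blob_path):
    """Copy an .xlsx from Azure Blob Storage to the driver, then read it
    with pandas. Requires `%pip install openpyxl` for .xlsx support."""
    import pandas as pd

    local = "/tmp/blob_copy.xlsx"
    dbutils.fs.cp(wasbs_url(container, account, blob_path), f"file:{local}")
    return pd.read_excel(local, engine="openpyxl")
```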
Hi, I am trying to readStream from 2 sources and join them into a target table. How can I do this in PySpark? E.g. t1 + t2 as my bronze tables: I want to readStream from t1 and t2, and merge the changes into t3 (silver table).
Hi @dbuser1234 , Certainly! To read stream data from two sources, join them, and merge the changes into a target table in PySpark, you can follow these steps:
Read Stream Data from Sources (t1 and t2):
Use spark.readStream to read data from both t1...
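The steps above can be sketched with one stream per bronze source, each upserting into the silver table via foreachBatch and a Delta MERGE. Table names follow the post; the join key "id" and checkpoint paths are assumptions:

```python
def stream_merge_into_silver(spark):
    """Stream changes from bronze Delta tables t1 and t2 into silver t3.

    One stream per source; each micro-batch is MERGEd into t3 so late
    changes from either source upsert the same target rows.
    """
    from delta.tables import DeltaTable

    def upsert(df, batch_id):
        (DeltaTable.forName(spark, "t3").alias("t")
         .merge(df.alias("s"), "t.id = s.id")   # "id" is an assumed key
         .whenMatchedUpdateAll()
         .whenNotMatchedInsertAll()
         .execute())

    queries = []
    for src in ("t1", "t2"):
        q = (spark.readStream.table(src)
             .writeStream
             .foreachBatch(upsert)
             .option("checkpointLocation", f"/chk/{src}_to_t3")
             .start())
        queries.append(q)
    return queries
```

A true stream-stream join is also possible but requires watermarks on both sides; the per-source MERGE pattern above sidesteps that.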
I'm able to connect to MongoDB using org.mongodb.spark:mongo-spark-connector_2.12:3.0.2 and this code: df = spark.read.format("com.mongodb.spark.sql.DefaultSource").option("uri", jdbcUrl). It works well, but if I install the latest MongoDB Spark Connector ve...
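A likely cause: the 10.x connector changed both the format name and the URI option, so 3.x-style code stops working after the upgrade. A sketch of the 10.x style (database/collection names are placeholders):

```python
def read_mongo(spark, uri, db, coll):
    """Read a collection with mongo-spark-connector 10.x.

    The 10.x connector registers the short format name "mongodb" and
    renames the connection option to "connection.uri"; the 3.x
    com.mongodb.spark.sql.DefaultSource / "uri" pair no longer applies.
    """
    return (spark.read
            .format("mongodb")
            .option("connection.uri", uri)
            .option("database", db)
            .option("collection", coll)
            .load())
```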
Hi @Abel_Martinez, I want to express my gratitude for your effort in selecting the most suitable solution. It's great to hear that your query has been successfully resolved. Thank you for your contribution.
Hello, I am having issues saving a Spark DataFrame generated in a Databricks notebook to an S3 bucket. The DataFrame contains approximately 1.1M rows and 5 columns. The error is as follows: org.apache.spark.SparkException: Job aborted due to stage fa...
Hi @7cb15, I understand you’re encountering issues while saving a Spark DataFrame to an S3 bucket.
Let’s troubleshoot this together!
Here are some steps and recommendations to address the problem:
Check S3 Permissions:
Ensure that the IAM role or us...
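Once permissions check out, stage failures on a write of this size are often memory or skew related, and repartitioning before the write evens out task sizes. A sketch, assuming the cluster has an instance profile (or keys) granting s3:PutObject on the bucket; bucket/prefix and the partition count are placeholders:

```python
def save_df_to_s3(df, bucket, prefix, partitions=16):
    """Write a DataFrame to S3 as Parquet.

    Repartitioning first spreads the rows across evenly sized tasks,
    which avoids single oversized tasks failing the stage.
    """
    (df.repartition(partitions)
       .write
       .mode("overwrite")
       .parquet(f"s3://{bucket}/{prefix}"))
```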
Hi, we have a Spark job that writes data into a Delta table for the last 90 date partitions. We have enabled spark.databricks.delta.autoCompact.enabled and delta.autoOptimize.optimizeWrite. The job takes 50 mins to complete; of that, the logic takes 12 mins and optimizeWri...
Hi @svrdragon, It’s great you’re using Delta Lake features to optimize your Spark job.
Let’s explore some strategies to potentially reduce the total job time:
Optimize Write:
You’ve already enabled delta.autoOptimize.optimizeWrite, which is a go...
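Rather than session-level confs, these optimizations can also be pinned to the table itself so every writer gets them; the property names below are the documented Delta ones, and the table name is a placeholder:

```python
def tune_delta_writes(spark, table):
    """Persist the write-time optimizations discussed above on one table.

    Table properties travel with the table, unlike the session confs
    (spark.databricks.delta.autoCompact.enabled etc.), which must be
    set on every cluster that writes to it.
    """
    spark.sql(f"""
        ALTER TABLE {table} SET TBLPROPERTIES (
          'delta.autoOptimize.optimizeWrite' = 'true',
          'delta.autoOptimize.autoCompact'   = 'true'
        )
    """)
```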
Dear Community, I am testing PySpark code via pytest using VS Code and Databricks Connect. The SparkSession is initiated from Databricks Connect: from databricks.connect import DatabricksSession; spark = DatabricksSession.builder.getOrCreate(). I am receiving...
Hi @Rafal9, Thank you for reaching out with your issue related to SparkSession.sql() and Databricks Connect.
Let's explore potential solutions.
Environment Configuration:
Ensure that your environment variables and configurations are correctly set ...
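A small helper for that first check: Databricks Connect can be configured via environment variables (it also accepts ~/.databrickscfg or explicit builder arguments, so treat the variable names here as the common env-based setup, not the only one). This reports which of them are missing before pytest ever builds the session:

```python
import os

REQUIRED_VARS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN", "DATABRICKS_CLUSTER_ID")

def missing_connect_vars(env=os.environ):
    """Return the Databricks Connect settings absent from the environment.

    Run this in a pytest fixture (or conftest.py) to fail fast with a
    clear message instead of an opaque SparkSession error.
    """
    return [v for v in REQUIRED_VARS if not env.get(v)]
```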
Hi Team, I need urgent support since I was about to submit my exam and was just reviewing the responses, but the proctor suspended it because I did not satisfy the proctoring conditions. Even though I was sitting in a room with a clear background and well li...
Hi Everyone! I am new to Databricks and had chosen to use the CloudFormation template to create my AWS workspace. I regretfully must admit I felt creative in the process and varied the suggested stack name, and that must have created errors which ended...
Hi @Noosphera, I want to express my gratitude for your effort in selecting the most suitable solution. It's great to hear that your query has been successfully resolved. Thank you for your contribution.
I've recently started using Unity Catalog and I'm trying to set the default catalog name to something other than hive_metastore for some of my workspaces. According to the documentation (Update an assignment | Metastores API | REST API reference | ...
I found that setting the default catalog in the workspace "Admin Settings" works for SQL warehouses, Spark clusters and compute policies. Consult this documentation: https://docs.databricks.com/en/data-governance/unity-catalog/create-catalogs.html#view...
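For the API route mentioned in the question, a hedged sketch of calling the metastore-assignment update endpoint is below. The endpoint path and payload follow the "Update an assignment" Metastores API referenced above as I understand it; verify them against the current REST API reference for your workspace before relying on this:

```python
def set_default_catalog(host, token, workspace_id, metastore_id, catalog):
    """Update a workspace's default catalog via the metastore-assignment API.

    `host` is the workspace URL (https://<workspace>.azuredatabricks.net);
    the token needs account/metastore admin rights.
    """
    import requests

    resp = requests.patch(
        f"{host}/api/2.1/unity-catalog/workspaces/{workspace_id}/metastore",
        headers={"Authorization": f"Bearer {token}"},
        json={"metastore_id": metastore_id, "default_catalog_name": catalog},
    )
    resp.raise_for_status()
```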