Hello, I am running a job that depends on the information provided in column storage_sub_directory in system.information_schema.tables, and it worked until 1-2 weeks ago. Now I discovered in the doc that this column is deprecated and always null, ...
In a PySpark application, I am using a set of Python libraries. To handle Python dependencies while running the PySpark application, I am using the approach provided by Spark: Create archive file of Python virtual environment using required set o...
Hi,
I have not tried it, but based on the doc you have to use this approach. ./environment/bin/python must be replaced with the correct path.
import os
from pyspark.sql import SparkSession
os.environ['PYSPARK_PYTHON'] = "./environment/bin/python"
sp...
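For reference, here is a minimal sketch of how the archive alias and the interpreter path relate, assuming the archive is shipped through the spark.archives setting described in Spark's Python package management docs. The archive name pyspark_venv.tar.gz and the helper venv_settings are placeholders for illustration, not taken from the original post.

```python
def venv_settings(archive, alias="environment"):
    """Return the Spark config and interpreter path for a packed virtualenv.

    `archive` is a hypothetical file name; `alias` is the directory name the
    archive is unpacked under on each executor (the part after '#').
    """
    return {
        "spark.archives": f"{archive}#{alias}",
        "PYSPARK_PYTHON": f"./{alias}/bin/python",
    }


settings = venv_settings("pyspark_venv.tar.gz")
print(settings)

# On a cluster this would be applied roughly as follows (needs pyspark):
# import os
# from pyspark.sql import SparkSession
# os.environ["PYSPARK_PYTHON"] = settings["PYSPARK_PYTHON"]
# spark = (SparkSession.builder
#          .config("spark.archives", settings["spark.archives"])
#          .getOrCreate())
```

The point is that the directory in PYSPARK_PYTHON has to match the alias after the '#' in the archive setting; if the alias changes, the path changes with it.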
I am trying to read JSON from AWS S3 using "with open" in a Databricks notebook on a shared cluster. Error message: No such file or directory: '/dbfs/mnt/datalake/input_json_schema.json'. On a single-instance cluster the above error does not occur.
Hi @Nagarathna ,
I just tried it on a shared cluster and did not face any issue. What is the exact error that you are facing? The complete stack trace might help. Just to confirm, are you accessing the "/dbfs/mnt/datalake/input.json" from the same workspac...
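A quick way to narrow this down is to check whether the path is visible to the local file API before calling open(). This is only a diagnostic sketch; the helper name load_json_if_present is hypothetical, and the path you pass would be the one from your own notebook.

```python
import json
import os


def load_json_if_present(path):
    """Return the parsed JSON at `path`, or None if the file is not visible."""
    if not os.path.exists(path):
        parent = os.path.dirname(path)
        # Reporting on the parent helps tell a missing file from a missing mount.
        print(f"Not found: {path} (parent directory exists: {os.path.isdir(parent)})")
        return None
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)
```

Running the same call on both cluster types shows whether the file itself or the whole parent directory is missing on the shared cluster.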
Hello :) As part of deploying an app that previously ran directly on EMR to Databricks, we are running experiments using LTS 9.1 and getting the following error: PythonException: An exception was thrown from a UDF: 'pyspark.serializers.SerializationEr...
Hi @liormayn ,
Are you still facing the issue? This was seen in mid-March and the issue was fixed. It can happen for some pip installs when the libraries are in Workspace. But if you are still facing the issue, I would suggest that you create a support ti...
Hi All, we are executing a Databricks notebook activity inside the child pipeline through ADF. We are getting the child pipeline name in the job name while executing the Databricks job. Is it possible to get the master pipeline name as the job name or customize the job name thr...
I think we should raise a Request/Product Feedback.
I am not sure whether Databricks or Microsoft would own it, but you can submit feedback for Databricks here - https://docs.databricks.com/en/resources/ideas.html
I have encountered a technical issue on Databricks. While executing commands in both Spark and SQL within the Databricks environment, I’ve run into permission-related errors when selecting files from DBFS. "org.apache.spark.SparkSecurityException: [IN...
Hi @MOUNIKASIMHADRI ,
Workspace admins get ANY FILE granted by default. They can explicitly grant it to non-admin users.
Hence, as suggested in the KB, run:
GRANT SELECT ON ANY FILE TO `<user@domain-name>`
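If you need to issue the grant from a notebook for several users, a small helper can build the statement. This is a sketch, not an official API: the function name grant_any_file_sql and the example address are made up, and it assumes the notebook's usual spark session is available for the commented-out call.

```python
def grant_any_file_sql(principal):
    """Build the GRANT statement from the KB for one user or group."""
    if "`" in principal:
        # A backtick would break out of the quoted identifier.
        raise ValueError("principal must not contain a backtick")
    return f"GRANT SELECT ON ANY FILE TO `{principal}`"


statement = grant_any_file_sql("user@example.com")
print(statement)

# In a Databricks notebook, a workspace admin could then run:
# spark.sql(statement)
```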
How do I impersonate a user? I can't find any documentation that explains how to do this or even hint that it's possible. Use case: I perform administrative tasks like assign grants and roles to catalogs, schemas, and tables for the benefit of busines...
Hi dbx_687_3__1b3Q,
I have seen impersonation documented here. Is this what you are looking for? https://docs.gcp.databricks.com/en/dev-tools/google-id-auth.html#step-5-impersonate-the-google-cloud-service-account
After running a SQL script, when downloading the results to a CSV file, the file includes a null string for blank cells (see screenshot). Is there a setting I can change to simply get empty cells instead?
Hi AlexG,
I tested with table content containing null and empty values, and it works as expected in the download option too.
Here is an example:
CREATE TABLE my_table_null_test1 (
id INT,
name STRING
);
INSERT INTO my_table_null_test1 (id, name)...
Hi, I am getting a FilereadException error while reading a JSON file using the REST API Connector. It occurs when the JSON file is large; it cannot handle more than 1 lakh (100,000) records. Error details: org.apache.spark.SparkException: Job aborted due to sta...
Hello @DataBricks_Use1 ,
It would be great if you could add the entire stack trace, as Jose mentioned. There should be a "Caused by:" section further down that gives you an idea of the reason for this failure, and then you can work on that.
fo...
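When the stack trace is long, the "Caused by:" lines can be pulled out with a few lines of Python. This is a minimal sketch; the helper name caused_by_lines is hypothetical, and it assumes you have the full trace as a single string copied from the driver log.

```python
def caused_by_lines(stack_trace):
    """Return every 'Caused by:' line of a Java-style stack trace, in order."""
    return [
        line.strip()
        for line in stack_trace.splitlines()
        if line.strip().startswith("Caused by:")
    ]
```

The last entry in the returned list is usually the innermost cause, so that is a reasonable place to start reading.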
We have to generate over 70 intermediate tables. Should we use temporary tables or dataframes, or should we create delta tables and truncate and reload? Having too many temporary tables could lead to memory problems. In this situation, what is the mo...
Hi Phani1,
The answer depends on the use case, so if possible I would suggest working with the Solution Architect on this, or sharing more details for better guidance.
When I say that, I just want to understand: would we really ne...
Hi, I have cloned a public Git repo into my Databricks account. It's a repo associated with an online training course. I'd like to work through the notebooks, maybe make some changes and updates, etc., but I'd also like to keep a clean copy of it. M...
Hi DavidKxx,
You can clone public remote repositories without Git credentials (a personal access token and a username). To modify a public remote repository or to clone or modify a private remote repository, you must have a Git provider username and...
Hi all, I am using Databricks and created a notebook that I would like to run in a Dashboard. It works correctly. I shared the Dashboard with another user, UserA, with "Can Run" permission. When I log in as UserA and access the Dashboard, then does a...
Hi @Koa, You’ve encountered a security concern related to Databricks and handling JWT tokens within notebooks.
Dashboard State Persistence:
When you share a dashboard with another user (in this case, UserA), any updates made by that user will re...
I'm seeking advice regarding Databricks bundles. In my scenario, I have multiple production environments where I aim to execute the same DLT. To simplify, let's assume the DLT reads data from 'eventhub-region-name,' with this being the only differing...
We have an integration flow where we want to expose Databricks data for querying through OData (web app). For this piece, Databricks SQL API <- Delta tables, I have 2 questions: 1. Can you share a link/documentation on how we can integrate databricks <-delta ...
Hi @Ruby8376 - can you please review these similar posts where the resolution is provided:
https://community.databricks.com/t5/warehousing-analytics/databricks-sql-restful-api-to-query-delta-table/td-p/8617
https://www.databricks.com/blog/2023/03/07/da...
Hi, I cannot install geopandas in my notebook. I've tried all the usual fixes, pip installs, etc., but always get this error: CalledProcessError: Command 'pip --disable-pip-version-check install geopandas' returned non-zero exit status 1.---...
Hi @vbvasa,
The error message indicates that a GDAL API version must be specified. You can address this by providing a path to gdal-config using a GDAL_CONFIG environment variable or by using a GDAL_VERSION environment variable. To set the GDAL_CONF...
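As an illustration of the two options, here is a sketch that prepares the environment for a pip subprocess. The helper name gdal_env is hypothetical, and any path or version number you pass in is your own value, not something from this thread. It also assumes GDAL itself is already installed on the cluster.

```python
import os
import shutil


def gdal_env(gdal_config_path=None, gdal_version=None, base_env=None):
    """Return a copy of the environment with GDAL_CONFIG or GDAL_VERSION set."""
    env = dict(os.environ if base_env is None else base_env)
    if gdal_config_path:
        env["GDAL_CONFIG"] = gdal_config_path
    elif gdal_version:
        env["GDAL_VERSION"] = gdal_version
    else:
        # Fall back to whatever gdal-config is on PATH, if any.
        found = shutil.which("gdal-config")
        if found:
            env["GDAL_CONFIG"] = found
    return env


# Example use (runs pip, so it is left commented out here):
# import subprocess, sys
# subprocess.run([sys.executable, "-m", "pip", "install", "geopandas"],
#                env=gdal_env(), check=True)
```

If gdal-config is not found on PATH and neither argument is given, the environment is returned unchanged, and the install will most likely fail with the same error as before.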