- 258 Views
- 1 replies
- 0 kudos
Hi Databricks team, I am trying to understand the internals of Spark's coalesce code (DefaultPartitionCoalescer) and am going through the Spark source for this. While I understand the coalesce function itself, I am not sure about the complete flow of the code, like where it gets call...
Latest Reply
Hello @subham0611 ,
The coalesce operation triggered from user code can be initiated from either an RDD or a Dataset, with each having distinct codepaths:
RDD: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD...
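The core idea behind DefaultPartitionCoalescer is that it bins the parent RDD's partitions into the requested number of groups (preferring locality where possible), and each bin becomes one coalesced partition that is computed without a shuffle. As a rough illustration only, and not the actual Spark code, the binning step (ignoring locality) looks something like this:

```python
# Simplified, hypothetical sketch of the partition-binning idea behind
# Spark's DefaultPartitionCoalescer. Locality preferences are ignored;
# this is an illustration, not the real implementation.

def coalesce_partitions(num_partitions: int, target: int) -> list[list[int]]:
    """Group `num_partitions` parent partition indices into `target` bins."""
    bins: list[list[int]] = [[] for _ in range(target)]
    for idx in range(num_partitions):
        # Round-robin assignment keeps the bins roughly balanced.
        bins[idx % target].append(idx)
    return bins

# 10 parent partitions coalesced into 3 groups, e.g. [[0, 3, 6, 9], ...]
groups = coalesce_partitions(10, 3)
```

In the real code each bin is wrapped in a CoalescedRDDPartition, and compute() simply iterates over every parent partition in the bin, which is why coalesce (without shuffle=true) avoids a shuffle stage.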
- 341 Views
- 5 replies
- 1 kudos
Hi, I have cloned a public git repo into my Databricks account. It's a repo associated with an online training course. I'd like to work through the notebooks, maybe make some changes and updates, etc., but I'd also like to keep a clean copy of it. M...
Latest Reply
I get your issue, @DavidKxx. Until we do a git push on the command line, we do not see the "Authentication failed" error:
git push origin test
In the Databricks UI, however, we fail early (screenshots below). We require the Databricks GitHub App, as mentioned here, to p...
4 More Replies
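For the command-line case, one common workaround is to authenticate the HTTPS remote with a GitHub personal access token (PAT). This is a hedged sketch assuming an HTTPS remote; the placeholders are not from the original thread:

```shell
# Confirm the remote uses HTTPS, then reproduce the failure:
git remote -v
git push origin test        # fails with "Authentication failed" until
                            # credentials are configured

# One option: supply a GitHub PAT in the remote URL (acceptable for a
# short-lived terminal session; avoid committing it anywhere).
git remote set-url origin https://<user>:<token>@github.com/<user>/<repo>.git
git push origin test
```

Inside the Databricks UI, the linked Databricks GitHub App (or a PAT stored under User Settings > Linked accounts) is the supported path.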
- 23 Views
- 0 replies
- 0 kudos
I need to access the following system tables to generate a DBU consumption report, but I am not seeing these tables in the system schema. Could you please help me access them? system.billing.inventory, system.billing.workspaces, system.billing.job_usage, ...
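Note that the documented billing system tables are system.billing.usage and system.billing.list_prices, and they must be enabled by an account admin before they appear under the system catalog. A DBU report against the documented usage table might look like this (a sketch, assuming the standard usage schema):

```sql
-- DBU consumption per workspace and SKU from the documented
-- system.billing.usage table (requires account-admin enablement).
SELECT
  workspace_id,
  sku_name,
  usage_date,
  SUM(usage_quantity) AS dbus
FROM system.billing.usage
GROUP BY workspace_id, sku_name, usage_date
ORDER BY usage_date DESC;
```

If the table names in the question (inventory, workspaces, job_usage) are not visible, they may not exist in your region or release; the usage table above is the reliable starting point.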
- 1095 Views
- 6 replies
- 0 kudos
I am looking for a way to log my `pyspark.ml.regression.LinearRegression` model with input and signature data. The usual examples I found use sklearn, where they can simply do # Log the model with signature and input example
signature =...
Latest Reply
@Abi105 I wasn't able to make it work, sorry
5 More Replies
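For anyone landing here later: one approach worth trying is mlflow.models.infer_signature on Pandas views of the Spark data, passed to mlflow.spark.log_model. This is an untested sketch (it needs a Spark runtime with MLflow; `train_df` is a hypothetical Spark DataFrame, and older MLflow versions may require wrapping the model in a PipelineModel). Signature inference over a vector `features` column can also be flaky, which may be why the original poster couldn't make it work:

```python
# Hedged sketch: logging a pyspark.ml LinearRegression with a signature.
# Assumes a Spark session, an MLflow tracking setup, and a training
# DataFrame `train_df` with "features" and "label" columns.
import mlflow
from mlflow.models import infer_signature
from pyspark.ml.regression import LinearRegression

lr = LinearRegression(featuresCol="features", labelCol="label")
model = lr.fit(train_df)

preds = model.transform(train_df)
signature = infer_signature(
    train_df.drop("label").toPandas(),        # model input example
    preds.select("prediction").toPandas(),    # model output example
)

with mlflow.start_run():
    mlflow.spark.log_model(
        model,
        artifact_path="lr_model",
        signature=signature,
        input_example=train_df.drop("label").limit(5).toPandas(),
    )
```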
- 123 Views
- 1 replies
- 0 kudos
We have been using runtime 14.2 in shared access mode for our compute cluster in Databricks for quite some time. We are now trying to upgrade to Python 3.11 for some dependency management, thereby requiring us to use runtime 15.1/15.2, as runtime 14.2 only ...
Latest Reply
Hi @Neeraj_Kumar,
Ensure that the necessary libraries are available in the repository used for installation. Verify that the library versions specified are correct and available. Consider installing the library with a different version or from a diffe...
- 83 Views
- 2 replies
- 0 kudos
import os  # needed for os.path.join below
import pandas as pd
from pyspark.sql.types import StringType, IntegerType
from pyspark.sql.functions import col
save_path = os.path.join(base_path, stg_dir, "testCsvEncoding")
d = [{"code": "00034321"}, {"code": "55964445226"}]
df = pd.Data...
Latest Reply
@georgeyjy Try opening the CSV in a text editor. I bet that Excel is automatically trying to detect the schema of the CSV, so it thinks the column is an integer.
1 More Replies
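You can confirm this without Excel: round-trip the codes through a CSV in plain Python and the leading zeros survive, because they are in the file; it is Excel's type inference that drops them on display. A minimal check:

```python
# Leading zeros survive a CSV round-trip when the column is read as a
# string; Excel drops them only because it guesses the column is numeric.
import csv
import io

rows = [{"code": "00034321"}, {"code": "55964445226"}]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["code"])
writer.writeheader()
writer.writerows(rows)

buf.seek(0)
codes = [r["code"] for r in csv.DictReader(buf)]
# codes == ["00034321", "55964445226"] - the zeros are intact
```

In Excel, use Data > From Text/CSV and set the column type to Text, or pre-format the column as Text before pasting.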
- 57 Views
- 2 replies
- 0 kudos
Reading a file like this: Data = spark.sql("SELECT * FROM edge.inv.rm"). Getting this error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 10 in stage 441.0 failed 4 times, most recent failure: Lost task 10.3 in stage 441.0 (TID...
Latest Reply
Hi @Madhawa,
Ensure that the AWS credentials (access key and secret key) are correctly configured in your Spark application. You can set them using spark.conf.set("spark.hadoop.fs.s3a.access.key", "your_access_key") and spark.conf.set("spark.hadoop....
1 More Replies
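Put together, the credential configuration from the reply looks like the sketch below. It only runs inside a Spark environment, and on Databricks an instance profile or Unity Catalog external location is usually preferable to hard-coded keys; the key values are placeholders:

```python
# Hedged sketch: configuring S3A credentials on the Spark session, as
# suggested in the reply above. Prefer instance profiles / UC external
# locations over literal keys in production.
spark.conf.set("spark.hadoop.fs.s3a.access.key", "<your_access_key>")
spark.conf.set("spark.hadoop.fs.s3a.secret.key", "<your_secret_key>")

Data = spark.sql("SELECT * FROM edge.inv.rm")
```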
- 131 Views
- 1 replies
- 0 kudos
I am trying to install a wheel file that is in my volume onto a serverless cluster and getting the below error. @ken @Kaniz Note: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.
WARNING: R...
Latest Reply
Hi @Shravanshibu,
Verify that the wheel file is actually present at the specified location. Double-check the path to ensure there are no typos or missing directories. Remember that Databricks mounts DBFS (Databricks File System) at /dbfs on cluster no...
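On serverless compute, notebook-scoped %pip against the full Unity Catalog volume path is the usual route; a sketch with placeholder catalog/schema/volume names (not from the original thread):

```shell
# Notebook cells on a serverless cluster; replace the placeholders with
# your actual volume path.
%pip install /Volumes/<catalog>/<schema>/<volume>/<package>.whl
%restart_python
```

If the path is correct and the install still fails, the wheel's declared Python version or platform tags may not match the serverless runtime, which the pip error output should indicate.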
- 119 Views
- 1 replies
- 0 kudos
I am relatively new to Databricks, and from my recent experience it appears that at every step in a DLT pipeline, we define each LIVE TABLE (be it streaming or not) to pull data from upstream. I have yet to see an implementation where data from upstream woul...
Latest Reply
Hi @_databreaks, You’re absolutely right!
While the typical approach in Databricks involves pulling data from upstream sources into downstream tables, there are scenarios where a push-based architecture could be beneficial.
Pull-Based Architectu...
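To make the pull-based pattern concrete, here is a minimal sketch of how each DLT table declares what it reads from, letting DLT resolve the dependency graph. It only runs inside a Delta Live Tables pipeline, and the table and path names are made up for illustration:

```python
# Hedged sketch of the standard pull-based DLT pattern: downstream
# tables pull from upstream by name via dlt.read_stream(). Only
# runnable inside a DLT pipeline; names/paths are hypothetical.
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Bronze: pulled from cloud files")
def bronze_events():
    return (spark.readStream.format("cloudFiles")
                 .option("cloudFiles.format", "json")
                 .load("/Volumes/main/raw/events"))

@dlt.table(comment="Silver: pulls from the bronze table")
def silver_events():
    return dlt.read_stream("bronze_events").where(col("event_type").isNotNull())
```

A push-based flow would instead have an upstream job write into a location the pipeline ingests from, which is effectively what the cloudFiles source above models.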
- 294 Views
- 1 replies
- 0 kudos
Hi all. I have a huge data migration project using medallion architecture, UC, notebooks, and workflows. One of the relevant requirements we have is to capture all data dependencies (upstreams and downstreams) using data lineage. I've followed all re...
Latest Reply
Hi @RobsonNLPT,
Consider checking the documentation for any updates or upcoming features related to capturing CTEs as upstreams in your chosen solution.
- 62 Views
- 1 replies
- 0 kudos
I am currently working on a similarity search use case where we need to extract text from PDF files and create a vector index. We have stored our PDF files in a Unity Catalog volume, and I can successfully read these files from the driver node. Here's...
Latest Reply
Hi @devendra_tomar,
Unity Catalog volumes represent logical storage volumes in a cloud object storage location. They allow governance over non-tabular datasets, providing capabilities for accessing, storing, and organizing files. While tables govern ...
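For moving the extraction off the driver, one option is Spark's binaryFile reader, which loads file contents as bytes so parsing can run on executors. A sketch with a hypothetical volume path:

```python
# Hedged sketch: reading PDFs from a UC volume as binary content with
# Spark's binaryFile source, so downstream text extraction can be
# distributed. The volume path is a placeholder.
pdfs = (spark.read.format("binaryFile")
             .option("pathGlobFilter", "*.pdf")
             .load("/Volumes/main/docs/pdfs"))
# Resulting columns: path, modificationTime, length, content (raw bytes)
```

Each row's `content` column can then be fed to a PDF parser inside a UDF before chunking and indexing.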
- 38 Views
- 0 replies
- 0 kudos
Assessment (the assessment job needs to be deployed using Terraform):
1. Install the latest version of UCX.
2. UCX will add the assessment job and queries to the workspace.
3. Run the assessment using a cluster.
How do I write code for this using Terraform? Can anyone he...
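There is no official UCX Terraform module that I am aware of; UCX is normally installed via the Databricks Labs CLI (`databricks labs ucx install`). One pragmatic sketch is to drive that installer from Terraform with a provisioner; the resource wiring below is illustrative, not an endorsed pattern:

```hcl
# Hedged sketch: invoking the UCX installer from Terraform. Assumes the
# Databricks CLI (with the labs extension) is on the machine running
# terraform apply, authenticated via the "my-workspace" profile.
terraform {
  required_providers {
    databricks = {
      source = "databricks/databricks"
    }
  }
}

resource "null_resource" "ucx_install" {
  provisioner "local-exec" {
    command = "databricks labs ucx install --profile my-workspace"
  }
}
```

The installer itself creates the assessment job and queries in the workspace (step 2), after which the assessment job can be run on a cluster (step 3).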
- 582 Views
- 3 replies
- 0 kudos
Latest Reply
I was able to generate the workspace-level token using the Databricks CLI. I set the following details in the Databricks CLI profile (.databrickscfg) file: host = https://myworksapce.azuredatabricks.net/ account_id = (my db account id) client_id = ...
2 More Replies
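Assembled into a profile, the configuration described above looks roughly like this (values are placeholders; this assumes OAuth machine-to-machine auth with a service principal's client ID and secret):

```ini
; Sketch of a ~/.databrickscfg profile for OAuth M2M authentication.
[my-workspace]
host          = https://<your-workspace>.azuredatabricks.net/
account_id    = <databricks-account-id>
client_id     = <service-principal-client-id>
client_secret = <service-principal-secret>
```

With the newer Databricks CLI, `databricks auth token -p my-workspace` can then mint a workspace-level token from that profile.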
- 436 Views
- 1 replies
- 0 kudos
Hi Community Members, I have been using Databricks for a while, but I have only used Workflows. I have a question about the differences between Delta Live Tables and Workflows. Which one should we use in which scenario? Thanks,