Get Started Discussions

Forum Posts

eric2
by New Contributor II
  • 1335 Views
  • 3 replies
  • 0 kudos

Databricks Delta table Insert Data Error

When trying to insert data into the Delta table in Databricks, an error occurs as shown below. [TASK_WRITE_FAILED] Task failed while writing rows to abfss://cont-01@dlsgolfzon001.dfs.core.windows.net/dir-db999_test/D_RGN_INFO_TMP. In SQL, the results ...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

Seems OK to me. Have you tried displaying the data from table A, and also the B/C join?

2 More Replies
ChaseM
by New Contributor II
  • 622 Views
  • 2 replies
  • 0 kudos

How to make distributed predictions with an sklearn model?

So I have an sklearn-style model which predicts on a pandas df. The data to predict on is a Spark df. Simply converting the whole thing at once to pandas and predicting is not an option due to time and memory constraints. Is there a way to chunk a spar...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @ChaseM, You can chunk a Spark DataFrame, convert each chunk to a Pandas DataFrame, and predict each chunk in parallel using worker nodes in Databricks. 
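A minimal sketch of that chunk-and-score approach, assuming an sklearn-style model (anything with a `.predict` method) and Spark 3.x; the `feature_cols` list and helper names below are illustrative, not from the thread:

```python
import pandas as pd

def predict_chunk(model, pdf: pd.DataFrame, feature_cols):
    """Score one pandas chunk with an sklearn-style model."""
    out = pdf.copy()
    out["prediction"] = model.predict(pdf[feature_cols])
    return out

def distributed_predict(spark_df, model, feature_cols):
    """Score a Spark DataFrame chunk-by-chunk on the workers via mapInPandas."""
    # Imported lazily so predict_chunk stays usable without a Spark installation.
    from pyspark.sql.types import DoubleType

    out_schema = spark_df.schema.add("prediction", DoubleType())

    def score(batches):
        for pdf in batches:  # each batch arrives as a pandas DataFrame
            yield predict_chunk(model, pdf, feature_cols)

    return spark_df.mapInPandas(score, schema=out_schema)
```

Note that `mapInPandas` ships the model to every task as part of the closure; for a large model, broadcasting it once (`sc.broadcast(model)`) avoids repeated serialization.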

1 More Replies
sg-vtc
by New Contributor III
  • 1149 Views
  • 1 reply
  • 0 kudos

problem with workspace after metastore deleted

I am completely new to Databricks on AWS and started working on it a week ago. Please excuse me if I ask or do something silly. I created a workspace and a single-node cluster for testing. A metastore was created from the Databricks quickstart and it was automa...

Latest Reply
sg-vtc
New Contributor III
  • 0 kudos

I restarted the compute node and this problem went away: ErrorClass=METASTORE_DOES_NOT_EXIST] Metastore 'b11fb1a0-a462-4dfb-b91b-e0795fde10b0' does not exist. New question: I am testing Databricks with non-AWS S3 object storage. I can access the non-A...

aerofish
by New Contributor II
  • 1266 Views
  • 3 replies
  • 1 kudos

drop duplicates within watermark

Recently we have been using Structured Streaming to ingest data. We want to use a watermark to drop duplicated events, but we encountered some weird behavior and an unexpected exception. Can anyone explain what the expected behavior is and how should ...

Latest Reply
aerofish
New Contributor II
  • 1 kudos

Can any maintainer help me with this question?

2 More Replies
bigt23
by New Contributor II
  • 1721 Views
  • 2 replies
  • 1 kudos

Resolved! Read zstd file from Databricks

I just started to read `zstd` compressed files in Databricks on Azure, Runtime 14.1 on Spark 3.5.0. I've set PySpark commands as follows:
path = f"wasbs://{container}@{storageaccount}.blob.core.windows.net/test-zstd"
schema = "some schema"
df = spark.read...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

The available compression types are format-dependent. For JSON, zstd is not (yet) available, whereas for Parquet it is.

1 More Replies
eimis_pacheco
by Contributor
  • 1627 Views
  • 3 replies
  • 1 kudos

Confused with Databricks Tips and Tricks - Optimizations regarding partitioning

Hello Community, Today I attended the Tips and Tricks - Optimizations webinar and I started to get confused. They said: "Don't partition tables <1TB in size and plan carefully when partitioning. Partitions should be >=1GB." Now my confusion is whether this recommen...

Get Started Discussions
data engineering
performance
Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

That is partitions on disk. Defining the correct number of partitions is not that easy. One would think that more partitions is better because you can process more data in parallel. And that is true if you only have to do local transformations (no shu...

2 More Replies
Abhiqa
by New Contributor
  • 1211 Views
  • 1 reply
  • 0 kudos

How to schedule/refresh databricks alerts using REST API?

Hi, I am deploying Databricks SQL alerts using the REST API, but I can't seem to figure out how to schedule their refresh task. I went through the documentation; it says "Alerts can be scheduled using the sql_task type of the Jobs API, e.g. Jobs/Create". How...

Get Started Discussions
Alerts
REST API
sql query
sql_task
Latest Reply
btafur
Contributor III
  • 0 kudos

What they mention in the API docs is that you can create a job with a sql_task of type Alert. To make it easier, you can try creating the job in the UI first and downloading the JSON config. Here is an example with the main parameters that should ...
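A sketch of what such a Jobs API 2.1 create payload could look like. The alert ID, warehouse ID, job name, and cron expression are placeholders, and the field names are written from memory of the Jobs API, so verify them against the JSON config downloaded from the UI:

```python
import json

# Hypothetical jobs/create body that refreshes one alert every hour.
job_payload = {
    "name": "refresh-my-alert",
    "schedule": {
        "quartz_cron_expression": "0 0 * * * ?",  # hourly, Quartz syntax
        "timezone_id": "UTC",
        "pause_status": "UNPAUSED",
    },
    "tasks": [
        {
            "task_key": "refresh_alert",
            "sql_task": {
                "warehouse_id": "<warehouse-id>",
                "alert": {"alert_id": "<alert-id>"},
            },
        }
    ],
}

print(json.dumps(job_payload, indent=2))
# POST this body to /api/2.1/jobs/create with a bearer token.
```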

naga_databricks
by Contributor
  • 3725 Views
  • 2 replies
  • 1 kudos

Shared access vs Single user access mode

I am running a notebook to get a secret value from GCP Secret Manager. This works well with Single user access mode; however, it fails when I use a cluster with Shared access mode. I have specified the same GCP service account on both of these clust...

Latest Reply
naga_databricks
Contributor
  • 1 kudos

Thanks for your response. I am using a cloud service account (the same account that was used to create the workspace) in the cluster properties of both the single user cluster and the shared cluster. This service account has all the necess...

1 More Replies
alesventus
by New Contributor III
  • 6257 Views
  • 6 replies
  • 0 kudos

Identify bottleneck for Databricks cluster

Hi, I'm trying to find out what the bottleneck on the cluster is when running the loading process. Scenario: loading CDC changes from SQL Server to the Raw zone, merging changes into the Bronze zone, and then merging Bronze into Silver. All is orchestrated in Data Factory as ...

Latest Reply
Debayan
Esteemed Contributor III
  • 0 kudos

stdout and stderr look okay; do you have the log4j logs to share? You can make a doc out of them and share the doc here.

5 More Replies
Ankita1
by New Contributor
  • 680 Views
  • 1 reply
  • 0 kudos

Deleting external table takes 8 hrs

Hi, I am trying to delete the data from an external partitioned table; it has around 3 years of data, and the partition is created on the date column. I am trying to delete each partition first and then the schema of the table, which takes around 8 hrs...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Ankita1, If you need to delete a large amount of data from an external partitioned table, there are a few things you can do to reduce the time it takes. Deleting a large amount of data from an external partitioned table can take signific...
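One common speed-up is to drop many partitions with a single ALTER TABLE statement instead of one statement per partition, then (since the table is external) remove the orphaned files at the storage layer in bulk. A sketch that only builds the SQL; the table and column names are hypothetical:

```python
def batch_drop_partitions_sql(table: str, date_col: str, dates: list[str]) -> str:
    """Build one ALTER TABLE that drops many date partitions at once,
    instead of issuing a separate statement per partition."""
    specs = ", ".join(f"PARTITION ({date_col} = '{d}')" for d in dates)
    return f"ALTER TABLE {table} DROP IF EXISTS {specs}"

sql = batch_drop_partitions_sql("db.events_ext", "event_date",
                                ["2021-01-01", "2021-01-02"])
print(sql)
# spark.sql(sql)  # then delete the leftover files with your cloud storage tooling
```

Spark SQL (like Hive) accepts a comma-separated list of PARTITION specs in one DROP, which saves per-statement metastore round trips.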

smehta_0908
by New Contributor II
  • 1441 Views
  • 2 replies
  • 0 kudos

Resolved! Unable to edit Catalog Owner

I created a Catalog and ownership was assigned to me. I created a Databricks account group on UC, added my user to this account group, and assigned ownership of the catalog to this account group. I then deleted the account group. Now the catalog ownership is showin...

Latest Reply
Debayan
Esteemed Contributor III
  • 0 kudos

Hi, in addition to the previous message, you can refer to https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/index.html#assign-a-metastore-admin for more information on the metastore.

1 More Replies
Data_Analytics1
by Contributor III
  • 1095 Views
  • 3 replies
  • 0 kudos

Merge version data files of Delta table

Hi, I have a CDC-enabled Delta table. At version 256, the table has 50 data files. I want to merge them all into a single file. How can I merge all 50 data files so that when I query version 256, I get 1 data file? Is there any com...

Latest Reply
Debayan
Esteemed Contributor III
  • 0 kudos

Hi, are you talking about merging CSV files? https://community.databricks.com/t5/machine-learning/merge-12-csv-files-in-databricks/td-p/3551#:~:text=Use%20Union()%20method%20to,from%20the%20specified%20set%2Fs.

2 More Replies
THIAM_HUATTAN
by Valued Contributor
  • 976 Views
  • 1 reply
  • 0 kudos

Why does the code below break?

from pyspark.sql import SparkSession
from pyspark.ml.regression import LinearRegression
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml import Pipeline
import numpy as np
# Create a Spark...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @THIAM_HUATTAN, The error message "Output column features already exists" means that the VectorAssembler output column features already exists in the DataFrame before the pipeline.fit() method is called. Here's what you can do to fix the issue: R...
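The usual fixes are to drop the pre-existing features column or pick a non-colliding output name. A sketch of the renaming approach: the pure helper below is this example's own invention, and the VectorAssembler wiring is shown only in comments since it needs a live Spark session:

```python
def safe_output_col(existing_cols, desired="features"):
    """Return an output column name that does not collide with existing columns."""
    name = desired
    while name in existing_cols:
        name = name + "_vec"
    return name

cols = ["age", "income", "features"]  # 'features' already present in the df
out_col = safe_output_col(cols)
print(out_col)  # → features_vec

# With pyspark (sketch):
# assembler = VectorAssembler(inputCols=["age", "income"], outputCol=out_col)
# ...or simply drop the stale column first: df = df.drop("features")
```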

Isolated
by New Contributor
  • 854 Views
  • 2 replies
  • 0 kudos

Having trouble with ARC (Automated Record Connector) Python Notebook

I'm trying to use Databricks ARC (Automated Record Connector) and running into an object issue. I assume I'm missing something rather trivial that's not related to ARC.
#Databricks Python notebook
#CMD1
import AutoLinker
from arc.autolinker import A...

Latest Reply
Debayan
Esteemed Contributor III
  • 0 kudos

https://www.databricks.com/blog/improving-public-sector-decision-making-simple-automated-record-linking and https://github.com/databricks-industry-solutions/auto-data-linkage#databricks-runtime-requirements

1 More Replies
Data_Analytics1
by Contributor III
  • 703 Views
  • 2 replies
  • 0 kudos

Delta Sharing CDF API error: "RESOURCE_LIMIT_EXCEEDED"

Hi, When attempting to read a particular version from the Databricks Delta Sharing CDF (Change Data Feed) API, even when that version contains only one data file, an error occurs due to a timeout with the following message: "errorCode": "RESOURCE_LIMIT_EX...

Latest Reply
MaxGendu
New Contributor II
  • 0 kudos

Hi Data_Analytics1, Use OPTIMIZE on your Delta tables. Refer to https://docs.databricks.com/en/sql/language-manual/delta-optimize.html
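For reference, a minimal sketch of building the OPTIMIZE statement the reply points to (the table name and ZORDER column are placeholders; OPTIMIZE runs on the provider side of the share, and compacting small files also shrinks what the CDF endpoint has to serve):

```python
def optimize_sql(table: str, zorder_cols=None) -> str:
    """Build an OPTIMIZE statement, optionally with ZORDER BY to co-locate data."""
    stmt = f"OPTIMIZE {table}"
    if zorder_cols:
        stmt += " ZORDER BY (" + ", ".join(zorder_cols) + ")"
    return stmt

print(optimize_sql("main.sales.orders"))
print(optimize_sql("main.sales.orders", ["order_date"]))
# spark.sql(optimize_sql("main.sales.orders"))  # compacts small files into larger ones
```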

1 More Replies