- 50 Views
- 0 replies
- 0 kudos
Hi, I implemented a job that should incrementally read all the available data from a Kinesis Data Stream and terminate afterwards. I schedule the job daily. The data retention period of the data stream is 7 days, i.e., there should be enough time to ...
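For a "drain everything available, then stop" job like this, `Trigger.AvailableNow` is the usual pattern. A minimal sketch, assuming the Databricks Kinesis source; the stream name, region, checkpoint path, and target table below are placeholders, not values from the post:

```python
# Sketch only: the "kinesis" streaming source is available on Databricks
# runtimes. All names below are hypothetical.
df = (
    spark.readStream
    .format("kinesis")
    .option("streamName", "my-stream")   # placeholder stream name
    .option("region", "eu-west-1")       # placeholder region
    .load()
)

(
    df.writeStream
    .format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/kinesis-daily")  # placeholder
    .trigger(availableNow=True)  # process all currently available data, then terminate
    .toTable("bronze.kinesis_events")    # placeholder target table
)
```

With a daily schedule and 7-day retention, the checkpoint ensures each run resumes from the previous run's last position.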
- 2604 Views
- 3 replies
- 0 kudos
Hello, recently I tried to upgrade my runtime env to 13.3 LTS ML and found that it breaks my workload during applyInPandas. My job started to hang during applyInPandas execution. A thread dump shows that it hangs on direct memory allocation: ...
Latest Reply
Having a near-identical issue just materializing a dataframe with `.toPandas()`: an operation that now (on 14.3) takes 5 minutes used to take ~30s on 10.4.
2 More Replies
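When `.toPandas()` regresses like this, Arrow-based conversion is usually the first setting to verify; whether it explains the 10.4 → 14.3 slowdown is a separate question. A minimal sketch (the table name is taken from the later `samples.nyctaxi.trips` thread on this page, purely as an example):

```python
# Sketch: Arrow-backed toPandas() conversion is controlled by this conf.
# It is enabled by default on recent runtimes, but worth confirming,
# since a silent fallback to the non-Arrow path is dramatically slower.
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")

pdf = spark.read.table("samples.nyctaxi.trips").limit(100_000).toPandas()
```

If Arrow is already enabled, comparing the Spark UI stages between the two runtimes is the next step.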
by RobinK • New Contributor III
- 586 Views
- 10 replies
- 7 kudos
Hello, since last night none of our ETL jobs in Databricks have been running, although we have not made any code changes. The identical jobs (deployed with Databricks Asset Bundles) run on an all-purpose cluster, but fail on a job cluster. We have no...
Latest Reply
Hello, we are also experiencing the same error message: [NOT_COLUMN] Argument `col` should be a Column, got Column. This occurs when a workflow is run as a task from another workflow, but not when said workflow is run on its own, that is, not triggered by...
9 More Replies
- 73 Views
- 0 replies
- 0 kudos
I'm using the docs here: https://docs.databricks.com/en/data-sharing/read-data-open.html#store-creds. However, I am unable to read the stored file, which is successfully created with the following code: %scala
dbutils.fs.put("dbfs:/FileStore/extraction/con...
by bampo • New Contributor
- 60 Views
- 0 replies
- 0 kudos
Each merge/update on a table with liquid clustering forces the streaming query to read the whole table. Databricks Runtime: 14.3 LTS. Below I prepared a simple script to reproduce the issue: Create schema. %sql
CREATE SCHEMA IF NOT EXISTS test; Create table with si...
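One commonly suggested mitigation for streams re-reading a merged/updated Delta table (the same option appears in a later thread on this page) is `skipChangeCommits`, which makes the stream ignore commits that rewrite existing files. Whether it fits here depends on whether updated rows must be reprocessed. A sketch, with a placeholder table name:

```python
# Sketch: skipChangeCommits ignores commits that modify or delete existing
# data files (e.g. from MERGE/UPDATE), so the stream only processes
# append commits. Updated rows are NOT re-emitted downstream.
df = (
    spark.readStream
    .format("delta")
    .option("skipChangeCommits", "true")
    .table("test.my_table")  # placeholder: the table from the repro script
)
```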
- 80 Views
- 0 replies
- 0 kudos
Hi, using the Databricks CLI I exported the jobs in JSON format from a workspace in Azure. Using the same JSON to create a new job in a workspace on AWS, the error below occurs. To create a job via the Databricks CLI on AWS, do you need to change ...
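Job JSON is generally not portable across clouds as-is: cluster fields such as `node_type_id` (e.g. `Standard_DS3_v2` on Azure vs `i3.xlarge` on AWS) and the `azure_attributes`/`aws_attributes` blocks are cloud specific. A sketch of the round trip with the CLI; the job ID and file name are placeholders:

```shell
# Export an existing job's definition (placeholder job ID)
databricks jobs get 123 > job.json

# Edit job.json before re-creating: replace Azure-only cluster fields
# (node_type_id, azure_attributes) with AWS equivalents
# (node_type_id, aws_attributes), then create the job from the edited file:
databricks jobs create --json @job.json
```

Note that `jobs get` returns a wrapper object (job_id, settings, ...), so the settings object may need to be extracted before passing it to `jobs create`.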
by Dicer • Valued Contributor
- 59 Views
- 0 replies
- 0 kudos
I am using distributed Pandas on Spark, not single-node Pandas. But when I try to run the following code to transform a data frame with 652 x 729803 data points, df_ps_pct = df.pandas_api().pct_change().to_spark(), it returns this error: ...
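For reference, `pct_change` computes the fractional change from the previous row, (x[t] − x[t−1]) / x[t−1], per column. A plain-Python sketch of that per-column arithmetic (illustrating the semantics only, not the distributed pandas-on-Spark implementation):

```python
def pct_change(values):
    """Fractional change from the previous element.

    The first element has no predecessor, so its result is None
    (pandas uses NaN for the same position).
    """
    out = [None]
    for prev, cur in zip(values, values[1:]):
        out.append((cur - prev) / prev)
    return out

print(pct_change([100.0, 110.0, 99.0]))  # [None, 0.1, -0.1]
```

On a 652 x 729803 frame this is a window-style operation over every column, which is where memory pressure can appear in the distributed case.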
- 80 Views
- 0 replies
- 0 kudos
We've been running Delta Live Tables for some time with Unity Catalog and it's as slow as a sloth on a Hawaiian vacation. Anyway, DLT had three consecutive failures (due to the data source being unreliable) and then the logs printed: "MaxRetryThreshol...
- 116 Views
- 1 replies
- 0 kudos
We have a structured streaming job that reads from an external Delta table, defined in the following way: try:
df_silver = (
spark.readStream
.format("delta")
.option("skipChangeCommits", True)
.table(src_location)
...
Latest Reply
Hi, I see you are using `Trigger.AvailableNow`. Is this intended to be a continuous stream, or an incremental batch triggered at an interval with Databricks Workflows? From the docs (https://docs.databricks.com/en/structured-streaming/triggers.html#config...
- 269 Views
- 1 replies
- 0 kudos
Recent changes to the workspace UI (and the introduction of Unity Catalog) seem to have quietly sunset the ability to upload data directly to DBFS from the local filesystem using the UI (NOT the CLI). I want to be able to load a raw file (no matter the ...
Latest Reply
Have you tried using Volumes? https://docs.databricks.com/en/connect/unity-catalog/volumes.html You can do it through the UI, via Catalog Explorer > Add Data. Also, you could double-check whether your workspace admin has disabled DBFS access, b...
by Erik • Valued Contributor II
- 3334 Views
- 8 replies
- 10 kudos
Databricks Connect is a program that allows you to run Spark code locally, while the actual execution happens on a Spark cluster. Notably, it allows you to debug and step through the code locally in your own IDE. Quite useful. But it is now being...
Latest Reply
Thank you all for the interesting and useful information
7 More Replies
by TWib • New Contributor III
- 138 Views
- 0 replies
- 1 kudos
This code fails with the exception [NOT_COLUMN_OR_STR] Argument `col` should be a Column or str, got Column.
File <command-4420517954891674>, line 7
  4 spark = DatabricksSession.builder.getOrCreate()
  6 df = spark.read.table("samples.nyctaxi.trips")
---->...
by mjar • New Contributor
- 161 Views
- 0 replies
- 0 kudos
Recently we ran into an issue using foreachBatch after upgrading our Databricks cluster on Azure to runtime version 14 (Spark 3.5) with shared access mode and Unity Catalog. The issue manifested as a ModuleNotFoundError being throw...
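For context, a minimal foreachBatch sketch. On shared access mode the callable is serialized out to the executors, so one commonly suggested workaround for this kind of ModuleNotFoundError (an assumption worth verifying against your repo layout) is defining the function inline in the notebook rather than importing it from a workspace module. All names below are placeholders:

```python
def upsert_batch(batch_df, batch_id):
    # Defined inline so serialization does not reference a
    # notebook-scoped or repo-local module on the driver.
    batch_df.write.mode("append").saveAsTable("bronze.events")  # placeholder sink

(
    df.writeStream
    .foreachBatch(upsert_batch)
    .option("checkpointLocation", "/tmp/checkpoints/events")  # placeholder
    .start()
)
```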
- 119 Views
- 1 replies
- 0 kudos
Hi All, Introduction: I am trying to register my model on Databricks so that I can serve it as an endpoint. The packages I need are "torch", "mlflow", "torchvision", "numpy" and "git+https://github.com/facebookresearch/detectron2.git". For this, ...
Latest Reply
Found an answer! Basically, pip was somehow installing the dependencies from the git repo first and not following the given order, so to solve this I added the libraries for conda to install:```
conda_env = {
"channels": [
"defa...