- 4 Views
- 0 replies
- 0 kudos
Hi, I implemented a job that should incrementally read all the available data from a Kinesis Data Stream and terminate afterwards. I schedule the job daily. The data retention period of the data stream is 7 days, i.e., there should be enough time to ...
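A "read everything available, then terminate" Kinesis job on Databricks is usually expressed with `Trigger.AvailableNow`. A minimal sketch, assuming placeholder names throughout (stream name, region, checkpoint path, and target table are not from the post), and noting that the `kinesis` source is Databricks-specific and needs a Databricks cluster:

```python
# Sketch of a daily "drain the stream and stop" job.
# Placeholders: stream name, region, checkpoint path, target table.
df = (
    spark.readStream
    .format("kinesis")                        # Databricks-provided Kinesis source
    .option("streamName", "my-stream")
    .option("region", "eu-west-1")
    .option("initialPosition", "trim_horizon")
    .load()
)

(
    df.writeStream
    .option("checkpointLocation", "/Volumes/main/default/chk/kinesis_daily")
    .trigger(availableNow=True)               # process what is available, then stop
    .toTable("main.default.kinesis_bronze")
)
```

With a 7-day retention period and a daily schedule, the checkpoint is what lets each run resume from where the previous one stopped.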
- 2583 Views
- 3 replies
- 0 kudos
Hello, recently I've tried to upgrade my runtime env to 13.3 LTS ML and found that it breaks my workload during applyInPandas. My job started to hang during the applyInPandas execution. A thread dump shows that it hangs on direct memory allocation: ...
Latest Reply
Having a near-identical issue just materializing a DataFrame with `.toPandas()`: an operation that now (on 14.3) takes 5 minutes used to take ~30s on 10.4.
2 More Replies
by RobinK • New Contributor III
- 547 Views
- 10 replies
- 7 kudos
Hello, since last night none of our ETL jobs in Databricks have been running, although we have not made any code changes. The identical jobs (deployed with Databricks asset bundles) run on an all-purpose cluster, but fail on a job cluster. We have no...
Latest Reply
Hello, we are also experiencing the same error message: [NOT_COLUMN] Argument `col` should be a Column, got Column. This occurs when a workflow is run as a task from another workflow, but not when said workflow is run on its own, that is, not triggered by...
9 More Replies
- 48 Views
- 0 replies
- 0 kudos
I'm using the docs here: https://docs.databricks.com/en/data-sharing/read-data-open.html#store-creds
However, I am unable to read the stored file, which is successfully created with the following code: %scala
dbutils.fs.put("dbfs:/FileStore/extraction/con...
by bampo • New Contributor
- 38 Views
- 0 replies
- 0 kudos
Each merge/update on a table with liquid clustering forces the streaming query to read the whole table. Databricks Runtime: 14.3 LTS. Below I prepared a simple script to reproduce the issue. Create the schema: %sql
CREATE SCHEMA IF NOT EXISTS test; Create table with si...
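One knob that can help in exactly this situation (it also appears in another thread on this page) is `skipChangeCommits`, which makes the stream ignore commits that only rewrite existing rows, such as merges and updates. A hedged sketch, with the table name as a placeholder:

```python
# Sketch: skip update/merge commits so the stream only processes appended rows.
# Caveat: updated or deleted rows are then NOT propagated downstream.
df = (
    spark.readStream
    .format("delta")
    .option("skipChangeCommits", "true")
    .table("test.my_clustered_table")   # placeholder table name
)
```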
- 59 Views
- 0 replies
- 0 kudos
Hi, using the Databricks CLI, I exported the jobs in JSON format from a workspace in Azure. When I use the same JSON to create a new job in a workspace on AWS, the error below occurs. To create a job via the Databricks CLI on AWS, do you need to change ...
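A common cause when porting job JSON between clouds is that the cluster spec contains cloud-specific fields: `node_type_id` uses Azure VM sizes (e.g. `Standard_DS3_v2`) that do not exist on AWS, and `azure_attributes` is invalid there. A sketch of the rewrite, where the node-type mapping and the AWS attribute values are assumptions to adjust for your workspace:

```python
import json

# Sketch: rewrite cloud-specific fields when porting a job spec from Azure to AWS.
NODE_TYPE_MAP = {"Standard_DS3_v2": "i3.xlarge"}   # assumed mapping

def port_job_to_aws(job_json: str) -> str:
    job = json.loads(job_json)
    for cluster in job.get("job_clusters", []):
        spec = cluster.get("new_cluster", {})
        if "node_type_id" in spec:
            spec["node_type_id"] = NODE_TYPE_MAP.get(spec["node_type_id"], "i3.xlarge")
        spec.pop("azure_attributes", None)          # Azure-only block is invalid on AWS
        spec.setdefault("aws_attributes", {"availability": "SPOT_WITH_FALLBACK"})
    return json.dumps(job)

azure_job = json.dumps({
    "name": "daily-etl",
    "job_clusters": [{
        "job_cluster_key": "main",
        "new_cluster": {
            "spark_version": "14.3.x-scala2.12",
            "node_type_id": "Standard_DS3_v2",
            "azure_attributes": {"availability": "ON_DEMAND_AZURE"},
            "num_workers": 2,
        },
    }],
})
aws_spec = json.loads(port_job_to_aws(azure_job))["job_clusters"][0]["new_cluster"]
```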
by Dicer • Valued Contributor
- 56 Views
- 0 replies
- 0 kudos
I am using the distributed pandas on Spark, not single-node pandas. But when I try to run the following code to transform a data frame with 652 x 729803 data points, df_ps_pct = df.pandas_api().pct_change().to_spark(), it returns this error: ...
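As a sanity check of the transformation itself, plain pandas `pct_change` (whose semantics the pandas-on-Spark API mirrors) behaves like this on a tiny frame; the numbers are illustrative only, not the poster's data:

```python
import pandas as pd

# pct_change: (current - previous) / previous, NaN for the first row.
df = pd.DataFrame({"a": [100.0, 110.0, 99.0]})
pct = df["a"].pct_change()

assert pd.isna(pct.iloc[0])                  # no previous value
assert abs(pct.iloc[1] - 0.10) < 1e-9        # (110 - 100) / 100
assert abs(pct.iloc[2] + 0.10) < 1e-9        # (99 - 110) / 110
```

On the distributed side the same call has to order and shift the whole frame, which is far heavier than the local case, especially with ~730k columns.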
- 55 Views
- 0 replies
- 0 kudos
We've been running Delta Live Tables for some time with Unity Catalog, and it's as slow as a sloth on a Hawaiian vacation. Anyway, DLT had three consecutive failures (due to the data source being unreliable) and then the logs printed: "MaxRetryThreshol...
- 113 Views
- 1 replies
- 0 kudos
We have structured streaming that reads from an external Delta table, defined in the following way: try:
    df_silver = (
        spark.readStream
        .format("delta")
        .option("skipChangeCommits", True)
        .table(src_location)
    ...
Latest Reply
Hi, I see you are using `Trigger.AvailableNow`. Is this intended to be a continuous stream or an incremental batch trigger at an interval with Databricks Workflows? From the docs (https://docs.databricks.com/en/structured-streaming/triggers.html#config...
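The distinction the reply is drawing can be sketched as follows; `df` and the checkpoint path are placeholders, and both forms need a Spark/Databricks environment:

```python
# Incremental batch: process everything available, then stop
# (a good fit for a job scheduled with Databricks Workflows).
(df.writeStream
   .option("checkpointLocation", "/tmp/chk/events")   # placeholder path
   .trigger(availableNow=True)
   .toTable("bronze.events"))

# Continuous: run micro-batches every 30 seconds until the query is stopped.
(df.writeStream
   .option("checkpointLocation", "/tmp/chk/events")
   .trigger(processingTime="30 seconds")
   .toTable("bronze.events"))
```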
- 260 Views
- 1 replies
- 0 kudos
Recent changes to the workspace UI (and the introduction of Unity Catalog) seem to have discreetly sunset the ability to upload data directly to DBFS from the local filesystem using the UI (NOT the CLI). I want to be able to load a raw file (no matter the ...
Latest Reply
Have you tried using Volumes? https://docs.databricks.com/en/connect/unity-catalog/volumes.html You can do it through the UI, on the Catalog Explorer > Add Data button. Also, you could double check if your workspace admin has disabled DBFS access, b...
by Erik • Valued Contributor II
- 3263 Views
- 8 replies
- 10 kudos
Databricks Connect is a program which allows you to run Spark code locally, while the actual execution happens on a Spark cluster. Notably, it allows you to debug and step through the code locally in your own IDE. Quite useful. But it is now being...
Latest Reply
Thank you all for the interesting and useful information
7 More Replies
by TWib • New Contributor III
- 118 Views
- 0 replies
- 1 kudos
This code fails with the exception: [NOT_COLUMN_OR_STR] Argument `col` should be a Column or str, got Column.
File <command-4420517954891674>, line 7
  4 spark = DatabricksSession.builder.getOrCreate()
  6 df = spark.read.table("samples.nyctaxi.trips")
---->...
by mjar • New Contributor
- 111 Views
- 0 replies
- 0 kudos
Recently we ran into an issue using foreachBatch after upgrading our Databricks cluster on Azure to runtime version 14 with Spark 3.5, with Shared access mode and Unity Catalog. The issue manifested as a ModuleNotFoundError being throw...
- 97 Views
- 1 replies
- 0 kudos
Hi All, introduction: I am trying to register my model on Databricks so that I can serve it as an endpoint. The packages that I need are "torch", "mlflow", "torchvision", "numpy" and "git+https://github.com/facebookresearch/detectron2.git". For this, ...
Latest Reply
Found an answer! Basically, pip somehow installed the dependencies from the git repo first and did not follow the given order, so to solve this I added the libraries for conda to install:```
conda_env = {
"channels": [
"defa...
- 10459 Views
- 4 replies
- 6 kudos
I try to create a table but I get this error: AnalysisException: Cannot create table ('`spark_catalog`.`default`.`citation_all_tenants`'). The associated location ('dbfs:/user/hive/warehouse/citation_all_tenants') is not empty but it's not a Delta t...
Latest Reply
Hi Team, I am facing the same issue. When we try to load data into a table in a production batch, we get an error that the table is not in Delta format. There is no recent change to the table, and we are not trying any create or replace table. This is an existing table in pr...
3 More Replies