Data Engineering

Forum Posts

by Rene • New Contributor

Tuesday

123 Views
2 replies
1 kudos

Can we build an IoT data trading platform using Databricks?

I have an idea for sharing and trading IoT data streamed from many data sources on an incentive platform. I would appreciate it if you could discuss the idea with me. Thank you.

Data Engineering

Latest Reply

betty4920taylor
New Contributor

Tuesday

1 kudos

Hello @Rene, Building an IoT data trading platform using Databricks is indeed a feasible and innovative idea. Databricks provides a unified analytics platform that can handle massive amounts of data processing and advanced analytics, which is essentia...

1 More Replies

by Fresher • New Contributor

yesterday

39 Views
0 replies
0 kudos

Query is taking too long to run

I have two clusters: cluster A (a Spark cluster) and cluster B (a SQL warehouse). Whenever I run a particular query on cluster B, it works fine, but when I run the same query on cluster A, it takes a long time and never shows any output.

Data Engineering

by stevenayers-bge • New Contributor II

yesterday

41 Views
0 replies
0 kudos

Autoloader: Read old version of file. Read modification time is X, latest modification time is X

I'm receiving this error from Auto Loader. It seems to be stuck on this one file. I don't care when it was read and last modified, I just want to ingest it. Any ideas? java.io.IOException: Read old version of file s3a://<file-path>.json. Read modificat...

Data Engineering

by stevenayers-bge • New Contributor II

Thursday

61 Views
1 reply
1 kudos

Bug with enabling UniForm Data Format?

In the documentation for enabling Iceberg compatibility on Delta tables, it states that the minReaderVersion for IcebergCompatV1 and IcebergCompatV2 is 2 (https://docs.databricks.com/en/delta/uniform.html#requirements). However, when you run the REORG...

Data Engineering

Latest Reply

daniel_sahal
Esteemed Contributor

Thursday

1 kudos

@stevenayers-bge I've just checked the Delta source code and you're right: the documentation states that minReaderVersion should be >=2, but the source code upgrades it to 3: https://github.com/delta-io/delta/blob/78970abd96dfc0278e21c04cda442bb05ccde4...

by angel_ba • New Contributor II

Thursday

60 Views
1 reply
0 kudos

Unity Catalog system.access.audit lag

Hello, we have a Unity Catalog-enabled workspace. To get the completion time of a pipeline that runs multiple times a day, I am checking the system.access.audit table. Comparing the completion time of the pipeline compared to other pipeline time I am creat...

Data Engineering

Latest Reply

daniel_sahal
Esteemed Contributor

Thursday

0 kudos

@angel_ba System tables are still in public preview, so there are some limitations, and one of them is a blocker for your use case. Currently no support for real-time monitoring. Data is updated throughout the day. If you don’t see a log for a recent eve...

by Hubert-Dudek • Esteemed Contributor III

Thursday

79 Views
0 replies
1 kudos

How much USD are you spending on Databricks?

Join two system tables and get exactly how much USD you are spending. The short version of the query: SELECT u.usage_date, u.sku_name, SUM(u.usage_quantity * p.pricing.default) AS total_spent, p.currency_code FROM system.billing....
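The arithmetic behind this kind of query can be sketched in plain Python. This is only an illustration under stated assumptions: the rows are made-up sample data, the names `usage`, `prices`, and `total_spent` are hypothetical, and joining on the SKU name alone is an assumption, since the query in the post is truncated before its join condition.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical sample rows standing in for the usage table (not real data).
usage = [
    {"usage_date": "2024-04-01", "sku_name": "SKU_A", "usage_quantity": Decimal("2.5")},
    {"usage_date": "2024-04-01", "sku_name": "SKU_A", "usage_quantity": Decimal("1.5")},
    {"usage_date": "2024-04-01", "sku_name": "SKU_B", "usage_quantity": Decimal("4")},
]

# Hypothetical default list price per SKU (assumption: joined on sku_name only).
prices = {"SKU_A": Decimal("0.50"), "SKU_B": Decimal("0.25")}


def total_spent(usage_rows, price_by_sku):
    """Sum usage_quantity * price, grouped by (usage_date, sku_name)."""
    totals = defaultdict(Decimal)
    for row in usage_rows:
        key = (row["usage_date"], row["sku_name"])
        totals[key] += row["usage_quantity"] * price_by_sku[row["sku_name"]]
    return dict(totals)


totals = total_spent(usage, prices)
for (usage_date, sku_name), amount in sorted(totals.items()):
    print(usage_date, sku_name, amount)
```

Decimal is used here so that currency sums do not pick up binary floating-point rounding error.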

Data Engineering

by John_Rotenstein • New Contributor II

09-14-2023 1:44:25 AM

3476 Views
3 replies
2 kudos

Retrieve job-level parameters in Python

Parameters can be passed to Tasks and the values can be retrieved with:
dbutils.widgets.get("parameter_name")
More recently, we have been given the ability to add parameters to Jobs. However, the parameters cannot be retrieved like Task parameters. Quest...
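A minimal sketch of wrapping the lookup so that a missing parameter falls back to a default instead of raising. The names `get_parameter` and `fake_getter` are hypothetical; in a notebook you would pass `dbutils.widgets.get` as the getter. Whether a job-level parameter is actually visible through that call is the open question in this thread, so the sketch does not assume either way.

```python
def get_parameter(name, getter, default=None):
    """Return getter(name), or default if the lookup raises."""
    try:
        return getter(name)
    except Exception:
        # The getter could not resolve the name; fall back to the default.
        return default


# Stand-in for dbutils.widgets.get so the sketch runs outside a notebook.
_fake_widgets = {"parameter_name": "42"}


def fake_getter(name):
    return _fake_widgets[name]


value = get_parameter("parameter_name", fake_getter)
missing = get_parameter("not_defined", fake_getter, default="fallback")
print(value, missing)
```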

Data Engineering

Latest Reply

cbern
New Contributor

Thursday

2 kudos

@Kaniz This method works for Task parameters. Is there a way to access Job parameters that apply to the entire workflow, set under a heading like this in the UI: I am able to read Job parameters in a different way from Task parameters using dynamic v...

2 More Replies

by sasi2 • New Contributor II

Thursday

200 Views
0 replies
0 kudos

Connecting to MuleSoft from Databricks

Hi, is there an established connectivity pipeline for accessing MuleSoft or Anypoint Exchange data from Databricks? I have seen many options for accessing Databricks data in MuleSoft, but can we read data from MuleSoft into Databricks? Please gi...

Data Engineering

by jenshumrich • New Contributor III

2 weeks ago

284 Views
2 replies
0 kudos

Filter not using partition

I have the following code: spark.sparkContext.setCheckpointDir("dbfs:/mnt/lifestrategy-blob/checkpoints") result_df.repartitionByRange(200, "IdStation") result_df_checked = result_df.checkpoint(eager=True) unique_stations = result_df.select("IdStation...

Data Engineering

Latest Reply

jenshumrich
New Contributor III

Thursday

0 kudos

Thanks a lot for your response. It seems the Filter is not pushed down, no? station_df.explain() == Physical Plan == *(1) Filter (isnotnull(IdStation#2678) AND (IdStation#2678 = 1119844)) +- *(1) Scan ExistingRDD[Date#2718,WindSpeed#2675,Tower_Accele...
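A small sketch for reading a plan like this one programmatically. The plan above scans an existing RDD, which is what a checkpointed DataFrame produces; such a scan carries no file-level partition or filter metadata, so the Filter stays a separate step. File-based scans instead list `PartitionFilters` and `PushedFilters` entries. The function name `classify_plan` is hypothetical, the first sample is abbreviated from the plan in the post, and the second sample is a made-up file-scan line for contrast.

```python
def classify_plan(plan_text):
    """Classify a Spark physical plan string by how its scan handles filters."""
    if "Scan ExistingRDD" in plan_text:
        # RDD-backed scan: filters cannot be pushed into it.
        return "existing_rdd"
    has_partition = (
        "PartitionFilters: [" in plan_text
        and "PartitionFilters: []" not in plan_text
    )
    has_pushed = (
        "PushedFilters: [" in plan_text
        and "PushedFilters: []" not in plan_text
    )
    if has_partition or has_pushed:
        return "filters_in_scan"
    return "no_filters_in_scan"


# Abbreviated from the plan quoted in the post (column list shortened).
posted_plan = (
    "*(1) Filter (isnotnull(IdStation#2678) AND (IdStation#2678 = 1119844))\n"
    "+- *(1) Scan ExistingRDD[Date#2718,WindSpeed#2675]"
)

# Made-up file-scan plan line, for contrast only.
file_plan = (
    "FileScan parquet [IdStation#1] "
    "PartitionFilters: [isnotnull(IdStation#1), (IdStation#1 = 1119844)], "
    "PushedFilters: []"
)

print(classify_plan(posted_plan))
print(classify_plan(file_plan))
```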

1 More Replies

by israelst • New Contributor II

01-15-2024 12:53:49 AM

312 Views
2 replies
0 kudos

DLT can't authenticate with Kinesis using instance profile

When running my notebook on personal compute with an instance profile, I am able to readStream from Kinesis. But adding it as a DLT pipeline with UC, while specifying the same instance profile in the DLT pipeline settings, causes a "MissingAuthenticatio...

Data Engineering

Delta Live Tables

Unity Catalog

Latest Reply

Mathias_Peters
New Contributor II

Thursday

0 kudos

Hi, were you able to solve this problem? If so, what was the solution?

1 More Replies

by nikhilkumawat • New Contributor III

04-27-2023 6:37:46 AM

4925 Views
6 replies
3 kudos

Resolved! Get file information while using "Trigger jobs when new files arrive" https://docs.databricks.com/workflows/jobs/file-arrival-triggers.html

I am currently trying to use the "Trigger jobs when new files arrive" feature in one of my projects. I have an S3 bucket in which files arrive on random days. So I created a job and set the trigger to the "file arrival" type. And within the no...

Data Engineering

Latest Reply

adriennn
Contributor

Thursday

3 kudos

Looks like a major oversight not to be able to get the information on what file(s) have triggered the job. Anyway, the above explanations given by Anon read like the replies of ChatGPT, especially the scenario where a dataframe is passed to a trigger...
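One generic workaround when a trigger does not report which files arrived is to diff the current listing against a record of files already processed. The sketch below is not taken from this thread's accepted answer; the function name `new_files` and the sample paths are hypothetical. In a notebook the listing might come from `dbutils.fs.ls` and the processed set from a tracking table.

```python
def new_files(current_listing, already_processed):
    """Return paths present in the listing but not yet processed, sorted."""
    return sorted(set(current_listing) - set(already_processed))


# Hypothetical paths for illustration only.
listing = ["s3://bucket/in/a.json", "s3://bucket/in/b.json", "s3://bucket/in/c.json"]
processed = ["s3://bucket/in/a.json"]

to_ingest = new_files(listing, processed)
print(to_ingest)
```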

5 More Replies

by BerkerKozan • New Contributor III

Thursday

50 Views
0 replies
0 kudos

Using AAD SPN on AWS Databricks

I use AWS Databricks, which has an SSO and SCIM integration with AAD. I generated an SPN in AAD, synced it to Databricks, and want to use this SPN with AAD client secrets to use the Databricks SDK. But it doesn't work. I don't want to generate another tok...

Data Engineering

by Oliver_Angelil • Valued Contributor II

Thursday

63 Views
0 replies
0 kudos

Append-only table from non-streaming source in Delta Live Tables

I have a DLT pipeline, where all tables are non-streaming (materialized views), except for the last one, which needs to be append-only, and is therefore defined as a streaming table. The pipeline runs successfully on the first run. However on the seco...

Data Engineering

by Anske • New Contributor II

Thursday

55 Views
0 replies
0 kudos

DLT apply_changes applies only deletes and inserts, not updates

Hi, I have a DLT pipeline that applies changes from a source table (cdctest_cdc_enriched) to a target table (cdctest), using the following code: dlt.apply_changes( target = "cdctest", source = "cdctest_cdc_enriched", keys = ["ID"], sequence_by...
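To reason about which change wins for each key, here is a simplified plain-Python model of keyed, sequenced change application. It is not DLT's implementation: the function `apply_changes_model`, the `operation` and `seq` column names, and the sample events are all assumptions for illustration, and the model ignores any sequence values already stored in the target.

```python
def apply_changes_model(target, events, key="ID", sequence_by="seq", delete_op="DELETE"):
    """Apply the latest event per key (by sequence) to a dict keyed by ID."""
    latest = {}
    for event in events:
        k = event[key]
        # Keep only the event with the strictly highest sequence value per key.
        if k not in latest or event[sequence_by] > latest[k][sequence_by]:
            latest[k] = event

    result = dict(target)
    for k, event in latest.items():
        if event["operation"] == delete_op:
            result.pop(k, None)
        else:
            # Upsert, dropping the bookkeeping columns.
            result[k] = {
                col: val
                for col, val in event.items()
                if col not in ("operation", sequence_by)
            }
    return result


# Hypothetical sample data.
target = {1: {"ID": 1, "name": "a"}}
events = [
    {"ID": 1, "name": "b", "operation": "UPDATE", "seq": 2},
    {"ID": 2, "name": "c", "operation": "INSERT", "seq": 1},
    {"ID": 2, "name": "c2", "operation": "UPDATE", "seq": 3},
    {"ID": 3, "name": "d", "operation": "INSERT", "seq": 1},
    {"ID": 3, "name": "d", "operation": "DELETE", "seq": 4},
]

result = apply_changes_model(target, events)
print(result)
```

In this model, an update that shares a sequence value with an earlier event for the same key is ignored, so checking the sequence column for ties may be worth ruling out.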

Data Engineering

Delta Live Tables

by zahra_Khedri • New Contributor

Thursday

95 Views
1 reply
0 kudos

An error occurred when loading Jobs and Workflows App.

Hi, I was trying to open Workflows but got the error "An error occurred when loading Jobs and Workflows App." We need help to understand why it happened and how we can resolve it, please.

Data Engineering

Latest Reply

GeoPer
New Contributor

Thursday

0 kudos

Same... and the weirdest thing is that all of the services look healthy at https://status.databricks.com/
Region: eu-central-1
Provider: AWS
Could anyone provide some info here?

Databricks

Forum Posts

DELTA_EXCEED_CHAR_VARCHAR_LIMIT

Not able to set run_as service_principal_name

Pyspark operations slowness in CLuster 14.3LTS as ...

[Databricks Assets Bundles] Workflow trigger on fi...

Addressing Pipeline Error Handling in Databricks b...