Lakehouse Federation - Databricks. In the world of data, innovation is constant, and the most recent revolution comes with Lakehouse Federation, a fusion of data lakes and data warehouses that takes data manipulation to a new level. This advancement...
I'm using Auto Loader to process any new file or update that arrives in my landing area, and I schedule the job using Databricks Workflows to trigger on file arrival. The issue is that the trigger only executes when new files arrive, not when an existing ...
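For reference, a minimal sketch of the Auto Loader setup described above, assuming it runs in a Databricks notebook where spark is predefined; the paths, file format, and target table name are placeholders rather than details from the post.

# Incrementally pick up new files landing in the source directory.
df = (spark.readStream
      .format("cloudFiles")                                          # Auto Loader source
      .option("cloudFiles.format", "json")                           # format of the landed files
      .option("cloudFiles.schemaLocation", "/mnt/checkpoints/landing_schema")
      .load("/mnt/landing/"))

# Write whatever is new, then stop - a typical pairing with a file-arrival trigger.
(df.writeStream
   .option("checkpointLocation", "/mnt/checkpoints/landing")
   .trigger(availableNow=True)
   .toTable("main.bronze.landing_events"))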
I don't think you can effectively achieve your goal. While it's theoretically somewhat possible, the Databricks documentation says there is no guarantee of correctness - see the Auto Loader FAQ | Databricks on AWS
I am trying to read a CSV file from a storage location using the spark.read function, and I am explicitly passing the schema to the function. However, the data is not loading into the proper columns of the dataframe. Following are the code details: from pyspark....
Hi, I would suggest the approach suggested by Thomaz Rossito, but maybe you can give it a try by swapping the struct field order, like the following: schema = StructType([StructField('DA_RATE', DateType(), True), StructField('CURNCY_F', StringTy...
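To make the suggestion concrete, here is a minimal sketch of reading the CSV with an explicit schema; when a schema is supplied, spark.read maps the StructFields to the file's columns by position, so the field order has to mirror the column order in the file. The file path, the header and date-format options, and the third column are assumptions, not details from the thread.

from pyspark.sql.types import StructType, StructField, StringType, DateType, DoubleType

schema = StructType([
    StructField('DA_RATE', DateType(), True),     # 1st column in the file
    StructField('CURNCY_F', StringType(), True),  # 2nd column in the file
    StructField('RATE', DoubleType(), True),      # hypothetical 3rd column
])

df = (spark.read
      .format('csv')
      .option('header', 'true')
      .option('dateFormat', 'yyyy-MM-dd')  # adjust to the file's actual date format
      .schema(schema)
      .load('/mnt/landing/rates.csv'))     # placeholder path
df.printSchema()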
We run `OPTIMIZE` on our tables every 24 hours as follows:
spark.sql(f'OPTIMIZE {catalog_name}.{schema_name}.`{table_name}`;')
This morning one of our hourly jobs started failing on the call to `OPTIMIZE` with the error:
org.apache.spark.SparkException...
Could you help me understand pools? How do I know the difference in pricing between running clusters and running clusters with a pool, given that a pool saves us the time to start/stop the cluster? And should we keep Min Idle above 0 or equal t...
Databricks pools are a set of idle, ready-to-use instances. When a cluster is attached to a pool, cluster nodes are created using the pool’s idle instances. If the pool has no idle instances, the pool expands by allocating a new instance from the ins...
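A minimal sketch of how this looks in a cluster spec, assuming the Clusters API JSON shape; the pool ID, Spark version, and worker count are placeholders. A cluster that should draw its nodes from a pool references the pool via instance_pool_id instead of specifying a node type directly.

# Hypothetical cluster spec: nodes come from the pool's idle instances.
cluster_spec = {
    "cluster_name": "etl-with-pool",
    "spark_version": "14.3.x-scala2.12",
    "instance_pool_id": "1234-567890-pool123",  # placeholder pool ID
    "num_workers": 4,
}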
Hi Folks - We released a new metrics view for Databricks jobs in Gradient, which helps track and plot the metrics below over time so engineers can understand what's going on with their jobs:
Job cost (DBU + Cloud fees)
Job Runtime
Number of co...
I'm running this code in a Databricks notebook, and I want the table created from the dataframe in the catalog to be created with CDF enabled. When I run the code, the table doesn't exist yet. This code doesn't create a table with CDF enabled. It doesn't add: delta.enableChang...
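A minimal sketch of two ways to end up with delta.enableChangeDataFeed on a table created from a DataFrame; the table name is a placeholder and df stands for the DataFrame in the post.

# 1) Set the session default before the first write, so newly created
#    Delta tables pick up the property automatically.
spark.conf.set("spark.databricks.delta.properties.defaults.enableChangeDataFeed", "true")
df.write.format("delta").saveAsTable("main.my_schema.events")

# 2) Or set the property explicitly once the table exists.
spark.sql("ALTER TABLE main.my_schema.events "
          "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)")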
Hey, I have a repo of notebooks and SQL files. The typical way is to update/create notebooks in the repo, push, and the CI/CD pipeline deploys the notebooks to the Shared workspace. The issue is that I can access the SQL files in the Repo but cannot ...
We are doing a first-time implementation of data streaming, reading from partitioned Pulsar topics into a Delta table managed by UC. We are unable to scale the job beyond about ~40k msgs/sec; beyond that, the job fails. I'd imagine Databric...
I'm trying to set up a Workflow Job Webhook notification to send an alert to the OpsGenie REST API on job failure. We've set up Teams & Email successfully. We've created the Webhook, and when I configure "On Failure" I can see it in the JSON/YAML view. How...
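In case it helps, a minimal sketch of the fragment that should appear in the job's settings once the webhook destination is attached, assuming the Jobs API 2.1 shape and written here as a Python dict; the destination ID is a placeholder.

# Hypothetical job-settings fragment: the webhook destination is referenced by ID.
job_settings_fragment = {
    "webhook_notifications": {
        "on_failure": [
            {"id": "0f7f8a9b-0000-0000-0000-placeholder"}  # ID of the OpsGenie webhook destination
        ]
    }
}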
The Help Center provides an additional forum for this topic. You can request a voucher by submitting a help request; however, vouchers are not provided in all cases. Other ways to obtain a voucher are participation in training events held throughout ...
Hi there, I'm working with a large Delta table (2TB) and I'm looking for the best way to efficiently update it with new data (10GB). I'm particularly interested in using Liquid Clustering for faster queries, but I'm unsure if it supports updates effic...
Liquid Clustering will be a good option. Just make sure to run OPTIMIZE whenever you upsert data. Don't worry, the OPTIMIZE won't be expensive, as it runs only on the latest data in order to cluster it and keep queries fast.
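A minimal sketch of that pattern, with placeholder table and column names: a liquid-clustered table, an upsert of the new batch, then OPTIMIZE so the freshly written files are clustered incrementally.

# Create the target table with liquid clustering on the query key.
spark.sql("""
  CREATE TABLE IF NOT EXISTS main.sales.events (
    event_id BIGINT,
    event_date DATE,
    payload STRING
  ) CLUSTER BY (event_date)
""")

# Upsert the ~10 GB of new data (here exposed as a temp view called `updates`).
spark.sql("""
  MERGE INTO main.sales.events AS t
  USING updates AS s
  ON t.event_id = s.event_id
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *
""")

# Cluster the newly added data; no ZORDER BY is needed with liquid clustering.
spark.sql("OPTIMIZE main.sales.events")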
Hi, I'm getting the error below when I try to create a table shallow clone on DBR 15.0: [CANNOT_SHALLOW_CLONE_NON_UC_MANAGED_TABLE_AS_SOURCE_OR_TARGET] Shallow clone is only supported for the MANAGED table type. The table xxx_clone is not MANAGED tab...
Hi, the source table is an external table in UC and the result table should also be external. I'm running the command CREATE TABLE target_catalog.target_schema.table_clone SHALLOW CLONE source_catalog.source_schema.source_table, but for some reason this doesn't...
When running a notebook using dbutils.notebook.run from a master notebook, a URL to that running notebook is printed, e.g.:
Notebook job #223150
Notebook job #223151
Are there any ways to capture that Job Run ID (#223150 or #223151)? We have 50 or ...
I know this is an old thread, but sharing what is working well for me in Python now, for retrieving the run_id as well and building the entire link to that job run: job_id = dbutils.notebook.entry_point.getDbutils().notebook().getContext().jobId().get...
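Along the same lines, a minimal sketch of pulling both IDs out of the notebook context and assembling the run URL; the tag names and the URL pattern can differ between workspace versions, so treat them as assumptions to verify.

import json

# The notebook context carries tags such as jobId, runId and browserHostName.
ctx = json.loads(
    dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson()
)
tags = ctx.get("tags", {})
job_id = tags.get("jobId")
run_id = tags.get("runId")
host = tags.get("browserHostName")

# Placeholder URL pattern - adjust to what your workspace actually uses.
run_url = f"https://{host}/#job/{job_id}/run/{run_id}"
print(run_url)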
I have a getS3Objects function to get (JSON) objects located in AWS S3:
import com.amazonaws.services.s3.AmazonS3

object client_connect extends Serializable {
  val s3_get_path = "/dbfs/mnt/s3response"
  def getS3Objects(s3ObjectName: String, s3Client: AmazonS3): String = {
    val...
Hey there @Sandesh Puligundla Hope all is well! Just wanted to check in on whether you were able to resolve your issue, and if so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear f...