Data Engineering

Forum Posts

ThomazRossito (New Contributor III)
  • 134 Views
  • 0 replies
  • 0 kudos

Lakehouse Federation - Databricks

In the world of data, innovation is constant, and the most recent revolution comes with Lakehouse Federation, a fusion of data lakes and data warehouses that takes data manipulation to a new level. This advancement...

Data Engineering
data engineer
Lakehouse
SQL Analytics
Jorge3 (New Contributor III)
  • 240 Views
  • 1 reply
  • 0 kudos

Trigger a job on file update

I'm using Auto Loader to process any new file or update that arrives in my landing area, and I schedule the job using Databricks Workflows to trigger on file arrival. The issue is that the trigger only executes when new files arrive, not when an existing ...

Latest Reply
Ivan_Donev (New Contributor III)
  • 0 kudos

I don't think you can effectively achieve your goal. While it's theoretically somewhat possible, Databricks documentation says there is no guarantee for correctness - Auto Loader FAQ | Databricks on AWS

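
For context, here is a minimal sketch of the Auto Loader setup the question describes; the paths, file format, and target table are hypothetical placeholders. Auto Loader tracks files it has already seen, which is why a modified file does not count as a new arrival; the cloudFiles.allowOverwrites option relaxes this, but, as the FAQ cited above notes, without a correctness guarantee.

    # Minimal Auto Loader stream; paths, format, and table are placeholders.
    df = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        # Re-process files whose contents change; per the Auto Loader FAQ
        # this is best-effort and correctness is not guaranteed.
        .option("cloudFiles.allowOverwrites", "true")
        .load("s3://landing-bucket/events/")
    )

    (
        df.writeStream
        .option("checkpointLocation", "s3://landing-bucket/_checkpoints/events")
        .trigger(availableNow=True)
        .toTable("main.bronze.events")
    )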
SwapnilKamle (New Contributor)
  • 333 Views
  • 2 replies
  • 1 kudos

When reading a CSV file with spark.read, the data is not loading in the appropriate columns while passing the schema

I am trying to read a CSV file from a storage location using the spark.read function. Also, I am explicitly passing the schema to the function. However, the data is not loading into the proper columns of the dataframe. Following are the code details: from pyspark....

Latest Reply
sai_sathya (New Contributor III)
  • 1 kudos

Hi, I would suggest the approach suggested by Thomaz Rossito, but maybe you can give it a try by swapping the struct field order, like the following: schema = StructType([StructField('DA_RATE', DateType(), True), StructField('CURNCY_F', StringTy...

1 More Reply
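
To illustrate the pitfall this thread circles around: when you pass an explicit schema, spark.read assigns CSV columns by position, not by header name, so the StructField order must match the file's column order. A minimal sketch, with field names echoing the thread and a hypothetical path and third column:

    from pyspark.sql.types import StructType, StructField, DateType, StringType, DoubleType

    # Field order must match the column order in the CSV file: with an
    # explicit schema, Spark assigns CSV columns positionally, not by name.
    schema = StructType([
        StructField("DA_RATE", DateType(), True),
        StructField("CURNCY_F", StringType(), True),  # names echo the thread
        StructField("RATE", DoubleType(), True),      # hypothetical third column
    ])

    df = (
        spark.read.format("csv")
        .option("header", "true")
        .schema(schema)
        .load("/mnt/landing/rates.csv")  # hypothetical path
    )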
dvmentalmadess (Valued Contributor)
  • 1140 Views
  • 3 replies
  • 0 kudos

Resolved! OPTIMIZE: Exception thrown in awaitResult: / by zero

We run `OPTIMIZE` on our tables every 24 hours as follows: spark.sql(f'OPTIMIZE {catalog_name}.{schema_name}.`{table_name}`;') This morning one of our hourly jobs started failing on the call to `OPTIMIZE` with the error: org.apache.spark.SparkException...

Latest Reply
sh (New Contributor II)
  • 0 kudos

I am getting the same error. Any resolution?

2 More Replies
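
For reference, a sketch of the kind of nightly maintenance job the post describes; the table list and names are hypothetical placeholders.

    # Hypothetical list of tables to compact each night.
    tables = [("main", "analytics", "events"), ("main", "analytics", "orders")]

    for catalog_name, schema_name, table_name in tables:
        # Backtick-quote each part so identifiers with special characters work.
        spark.sql(f"OPTIMIZE `{catalog_name}`.`{schema_name}`.`{table_name}`")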
ksenija (New Contributor III)
  • 3286 Views
  • 1 reply
  • 1 kudos

Resolved! Cluster pools

Could you help me understand pools? How can I tell the difference in pricing between running clusters and running clusters with a pool, since we're saving cluster start/stop time when we have a pool? And should we keep Min Idle above 0 or equal t...

Latest Reply
Walter_C (Valued Contributor II)
  • 1 kudos

Databricks pools are a set of idle, ready-to-use instances. When a cluster is attached to a pool, cluster nodes are created using the pool’s idle instances. If the pool has no idle instances, the pool expands by allocating a new instance from the ins...

drag7ter (New Contributor II)
  • 548 Views
  • 2 replies
  • 0 kudos

Resolved! How to enable CDF when using saveAsTable from PySpark code?

I'm running this code in a Databricks notebook and I want the table created from the dataframe in the catalog to be created with CDF enabled. When I run the code, the table doesn't exist yet. This code doesn't create a table with CDF enabled; it doesn't add: delta.enableChang...

Latest Reply
raphaelblg (New Contributor III)
  • 0 kudos

Hello @drag7ter, I don't see anything wrong with your approach; check my repro:

1 More Reply
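
For anyone landing here, two documented ways to end up with CDF enabled on a table created from a DataFrame; the table names are hypothetical and df is a stand-in DataFrame.

    # Stand-in DataFrame for the sketch.
    df = spark.range(5).toDF("order_id")

    # Option A: default CDF on for tables created in this session, then
    # write as usual (Spark conf documented in the Delta CDF docs).
    spark.conf.set(
        "spark.databricks.delta.properties.defaults.enableChangeDataFeed", "true"
    )
    df.write.format("delta").saveAsTable("main.analytics.orders")

    # Option B: create the table first, then set the property explicitly.
    df.write.format("delta").saveAsTable("main.analytics.orders_b")
    spark.sql(
        "ALTER TABLE main.analytics.orders_b "
        "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
    )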
nyehia (Contributor)
  • 2549 Views
  • 9 replies
  • 0 kudos

Cannot access a SQL file from a notebook

Hey, I have a repo of notebooks and SQL files. The typical way is to update/create notebooks in the repo, then push, and the CI/CD pipeline deploys the notebooks to the Shared workspace. The issue is that I can access the SQL files in the Repo but cannot ...

Latest Reply
ok_1 (New Contributor II)
  • 0 kudos

ok

8 More Replies
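
One pattern that usually works on recent runtimes, sketched below: files in a Repo are exposed under /Workspace paths and can be read with ordinary Python file I/O, so a notebook can load a .sql file and execute it. The repo path and file name are hypothetical.

    # Read a .sql file that lives in the repo and execute it.
    # The /Workspace path is a hypothetical placeholder.
    sql_path = "/Workspace/Repos/user@example.com/my-repo/queries/report.sql"

    with open(sql_path) as f:
        query = f.read()

    result_df = spark.sql(query)
    display(result_df)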
surband (New Contributor II)
  • 914 Views
  • 8 replies
  • 0 kudos

Pulsar Streaming (Read) - Benchmarking Information

We are doing a first-time implementation of streaming reads from partitioned Pulsar topics into a Delta table managed by UC. We are unable to scale the job beyond about ~40k msgs/sec; beyond 40k msgs/sec, the job fails. I'd imagine Databric...

Latest Reply
surband (New Contributor II)
  • 0 kudos

Attached Grafana screenshots

7 More Replies
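
For context, a minimal sketch of the kind of Pulsar-to-Delta stream described here, using the option names from the Databricks Pulsar connector documentation; the broker URL, topic, checkpoint path, and table name are all hypothetical placeholders.

    # Minimal Pulsar -> Delta stream; URL, topic, and table are placeholders.
    df = (
        spark.readStream.format("pulsar")
        .option("service.url", "pulsar://broker.example.com:6650")
        .option("topics", "events-topic")
        .load()
    )

    (
        df.writeStream
        .option("checkpointLocation", "/Volumes/main/default/checkpoints/pulsar")
        .toTable("main.analytics.pulsar_events")
    )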
ChingizK (New Contributor II)
  • 279 Views
  • 0 replies
  • 0 kudos

Workflow Failure Alert Webhooks for OpsGenie

I'm trying to set up a Workflow job webhook notification to send an alert to the OpsGenie REST API on job failure. We've set up Teams & Email successfully. We've created the webhook, and when I configure "On Failure" I can see it in the JSON/YAML view. How...

Data Engineering
jobs
opsgenie
webhooks
Workflows
Deepikamani (New Contributor)
  • 2071 Views
  • 1 reply
  • 0 kudos

Exam voucher

Hi, I am planning to take the Databricks Certified Data Engineer Associate certification. Where can I get the exam voucher?

Latest Reply
TPSteve (New Contributor II)
  • 0 kudos

The Help Center provides an additional forum for this topic. You can request a voucher by submitting a help request; however, vouchers are not provided in all cases. Other ways to obtain a voucher are participation in training events held throughout ...

Raja_Databricks (New Contributor III)
  • 1404 Views
  • 5 replies
  • 7 kudos

Resolved! Liquid Clustering With Merge

Hi there, I'm working with a large Delta table (2 TB) and I'm looking for the best way to efficiently update it with new data (10 GB). I'm particularly interested in using Liquid Clustering for faster queries, but I'm unsure whether it supports updates effic...

Latest Reply
youssefmrini (Honored Contributor III)
  • 7 kudos

Liquid Clustering will be a good option. Just make sure to run OPTIMIZE whenever you upsert data. Don't worry, the OPTIMIZE won't be expensive: it runs only on the latest data in order to cluster it and keep queries fast.

4 More Replies
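
A sketch of the pattern the reply recommends, under hypothetical table and column names: a liquid-clustered table, an upsert via MERGE, then an incremental OPTIMIZE.

    # Liquid clustering: CLUSTER BY instead of partitioning/Z-ORDER.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS main.analytics.big_table (
            id BIGINT, event_date DATE, payload STRING
        ) CLUSTER BY (id, event_date)
    """)

    # Upsert the incoming batch of new data.
    spark.sql("""
        MERGE INTO main.analytics.big_table AS t
        USING main.staging.daily_updates AS s
        ON t.id = s.id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)

    # Incremental: only clusters data written since the last OPTIMIZE.
    spark.sql("OPTIMIZE main.analytics.big_table")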
cszczotka (New Contributor II)
  • 341 Views
  • 3 replies
  • 0 kudos

Not able to create table shallow clone on DBR 15.0

Hi, I'm getting the below error when I try to create a table shallow clone on DBR 15.0: [CANNOT_SHALLOW_CLONE_NON_UC_MANAGED_TABLE_AS_SOURCE_OR_TARGET] Shallow clone is only supported for the MANAGED table type. The table xxx_clone is not MANAGED tab...

Latest Reply
cszczotka (New Contributor II)
  • 0 kudos

Hi, the source table is an external table in UC, and the result table should also be external. I'm running the command CREATE TABLE target_catalog.target_schema.table_clone SHALLOW CLONE source_catalog.source_schema.source_table, but for some reason this doesn't...

2 More Replies
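
For reference, the reply's command as it would run from a notebook. Per the error message quoted above, shallow clone here requires UC MANAGED tables on both sides, which is exactly what an external source or target trips over.

    # The thread's shallow-clone command (names are the thread's placeholders).
    # Per the CANNOT_SHALLOW_CLONE_NON_UC_MANAGED_TABLE_AS_SOURCE_OR_TARGET
    # error, both source and target must be UC MANAGED tables on DBR 15.0.
    spark.sql("""
        CREATE TABLE target_catalog.target_schema.table_clone
        SHALLOW CLONE source_catalog.source_schema.source_table
    """)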
hanspetter (New Contributor III)
  • 36662 Views
  • 19 replies
  • 4 kudos

Resolved! Is it possible to get the Job Run ID of a notebook run by dbutils.notebook.run?

When running a notebook using dbutils.notebook.run from a master notebook, a URL to that running notebook is printed, i.e.: Notebook job #223150, Notebook job #223151. Is there any way to capture that Job Run ID (#223150 or #223151)? We have 50 or ...

Latest Reply
Rodrigo_Mohr (New Contributor II)
  • 4 kudos

I know this is an old thread, but sharing what is working well for me in Python now, for retrieving the run_id as well and building the entire link to that job run: job_id = dbutils.notebook.entry_point.getDbutils().notebook().getContext().jobId().get...

18 More Replies
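
Expanding the reply into a runnable sketch; note that these entry points are internal and undocumented, so the tag names may vary across runtime versions, and the workspace host is a placeholder.

    import json

    # Internal/undocumented context accessor; tag names may vary by DBR version.
    ctx = json.loads(
        dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson()
    )
    job_id = ctx.get("tags", {}).get("jobId")
    run_id = ctx.get("tags", {}).get("runId")

    # Hypothetical workspace URL; substitute your own host.
    print(f"https://<workspace-host>/#job/{job_id}/run/{run_id}")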
Sandesh87 (New Contributor III)
  • 2483 Views
  • 3 replies
  • 2 kudos

Task not serializable: java.io.NotSerializableException: org.apache.spark.sql.streaming.DataStreamWriter

I have a getS3Objects function to get (JSON) objects located in AWS S3: object client_connect extends Serializable { val s3_get_path = "/dbfs/mnt/s3response" def getS3Objects(s3ObjectName: String, s3Client: AmazonS3): String = { val...

Latest Reply
Anonymous (Not applicable)
  • 2 kudos

Hey there @Sandesh Puligundla, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear f...

2 More Replies
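
The usual fix for this class of error, sketched in Python since most of this thread's other snippets are PySpark (the original code is Scala): construct the non-serializable client inside the executor-side function instead of capturing it in the driver's closure. The bucket and keys are hypothetical, and boto3 is assumed to be installed.

    def fetch_s3_objects(keys):
        # Create the client on the executor, inside the function, so the
        # non-serializable client object is never shipped from the driver.
        import boto3  # assumed installed on the cluster
        s3 = boto3.client("s3")
        for key in keys:
            body = s3.get_object(Bucket="my-bucket", Key=key)["Body"].read()
            yield key, body.decode("utf-8")

    # Hypothetical object keys to fetch in parallel.
    keys = spark.sparkContext.parallelize(["a.json", "b.json"])
    responses = keys.mapPartitions(fetch_s3_objects).collect()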