Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by mh_db, New Contributor III
  • 3613 Views
  • 1 reply
  • 1 kudos

How to get different dynamic value for each task in workflow

I created a workflow with two tasks. It runs the first notebook and then waits for it to finish before starting the second notebook. I want to use the dynamic value {{job.start_time.iso_datetime}} as one of the parameters for both tasks. This should gi...

Latest Reply
lucasrocha
Databricks Employee
  • 1 kudos

Hello @mh_db, The dynamic value {{job.start_time.iso_datetime}} you are using in your workflow is designed to capture the start time of the job run, not the individual tasks within the job. This is why you are seeing the same date and time for both ...
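
A minimal sketch of the usual workaround, assuming the goal is a per-task timestamp rather than the job-level start time: compute the value inside each task's notebook, so each task evaluates it independently (the variable name is illustrative).

# Hypothetical sketch: compute a per-task timestamp inside each notebook,
# since {{job.start_time.iso_datetime}} resolves once per job run.
from datetime import datetime, timezone

task_start_iso = datetime.now(timezone.utc).isoformat()
print(task_start_iso)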

by WWoman, Contributor
  • 1202 Views
  • 1 reply
  • 1 kudos

Identifying invalid views

Is there a way to identify all invalid views in a schema or catalog without querying each view to see if it succeeds?

Latest Reply
raphaelblg
Databricks Employee
  • 1 kudos

Hello @WWoman, I don't think there's a feature for that. If you think this would be a cool feature, you could submit an idea in the Databricks Ideas Portal.
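
In the absence of a built-in check, a brute-force sketch is the usual fallback; this one probes each view with a zero-row query so the scan stays cheap (catalog and schema names are illustrative, and it assumes a Python notebook with a `spark` session).

# Hypothetical sketch: collect views in a schema that fail to resolve.
invalid_views = []
for row in spark.sql("SHOW VIEWS IN my_catalog.my_schema").collect():
    fq_name = f"my_catalog.my_schema.{row.viewName}"
    try:
        spark.sql(f"SELECT * FROM {fq_name} LIMIT 0")  # analysis only, no data read
    except Exception as e:
        invalid_views.append((fq_name, str(e)))

for name, err in invalid_views:
    print(name, "->", err[:120])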

by NhanNguyen, Contributor III
  • 2529 Views
  • 3 replies
  • 0 kudos

Resolved! Disk cache for CSV files in Databricks

Dear team, I'm investigating how to improve performance when reading a large CSV file as input and found this: https://learn.microsoft.com/en-us/azure/databricks/optimizations/disk-cache. I just wonder: does the disk cache also apply to CSV files? Thanks!

Latest Reply
NhanNguyen
Contributor III
  • 0 kudos

Thanks @-werners-, that's right. I tried it and got a significant performance improvement.
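
For reference, the disk cache discussed in this thread is toggled by a cluster Spark setting; a minimal sketch (the path is illustrative, and how much CSV benefits versus Parquet should be verified on the actual workload):

# Sketch: enable the Databricks disk cache for the current session.
spark.conf.set("spark.databricks.io.cache.enabled", "true")

df = spark.read.option("header", "true").csv("/mnt/data/large_file.csv")
df.count()  # repeated reads of the same files may now hit the local cache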

2 More Replies
by saichandu_25, New Contributor III
  • 5115 Views
  • 9 replies
  • 0 kudos

Not able to read the file content completely using head

Hi, we want to read the content of a file and encode it into base64. For that we have used the code below:
file_path = "/path/to/your/file.csv"
file_content = dbutils.fs.head(file_path, 512000000)
encode_content = base64.b64encode(file_conten...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

I am curious what the use case is for wanting to load large files into GitHub, which is a code repo. Depending on the file format, different parsing is necessary; you could foresee logic for that in your program.
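
Since dbutils.fs.head caps how many bytes it returns, a common alternative is to stream the file through the /dbfs FUSE mount and encode it in chunks; a hedged sketch (path and chunk size are illustrative, and it assumes the cluster exposes the FUSE mount):

import base64

# Sketch: base64-encode a large file chunk by chunk.
# Chunk size is a multiple of 3 so the encoded chunks concatenate cleanly.
chunk_size = 3 * 1024 * 1024
encoded_parts = []
with open("/dbfs/path/to/your/file.csv", "rb") as f:
    while chunk := f.read(chunk_size):
        encoded_parts.append(base64.b64encode(chunk))

encoded_content = b"".join(encoded_parts)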

8 More Replies
by DataEngineer, New Contributor II
  • 2240 Views
  • 2 replies
  • 0 kudos

AWS email sending challenge from Databricks with Unity Catalog and multi-node cluster

Hi, I have implemented Unity Catalog with a multi-node cluster in Databricks. The workspace instance profile with EC2 access is also created in IAM, but we are still having a challenge sending emails from Databricks using the SES service. The same is working ...

Latest Reply
Babu_Krishnan
Contributor
  • 0 kudos

Hi @DataEngineer, were you able to resolve the issue? We are having the same issue when we try to use a multi-node cluster for Unity Catalog. Email functionality was working fine with a single-node cluster. We are getting "ConnectionRefusedError: [Errno 111]...
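
A connection-refused error could indicate the SMTP endpoint isn't reachable from the worker nodes; one option worth trying is calling the SES API over HTTPS instead. A hedged boto3 sketch (region and addresses are illustrative, and it assumes the instance profile grants ses:SendEmail):

import boto3

# Sketch: send mail via the SES API rather than SMTP.
ses = boto3.client("ses", region_name="us-east-1")
ses.send_email(
    Source="sender@example.com",
    Destination={"ToAddresses": ["recipient@example.com"]},
    Message={
        "Subject": {"Data": "Test from Databricks"},
        "Body": {"Text": {"Data": "Hello from a multi-node cluster."}},
    },
)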

1 More Reply
by Harispap, New Contributor
  • 1314 Views
  • 0 replies
  • 0 kudos

Different result between manual and automated task run

I have a notebook where I fetch metadata about a previous task run from the API ".... /jobs/runs/get". The response should be a dictionary that contains information such as task key, run ID, run page URL, etc. When I run the notebook as part of ...
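
For context, the call referenced above looks roughly like this against the documented Jobs API (host, token, and run_id are placeholders):

import requests

# Sketch: fetch run metadata for a given run_id via the Jobs API.
host = "https://<workspace-host>"      # placeholder
token = "<personal-access-token>"      # placeholder
resp = requests.get(
    f"{host}/api/2.1/jobs/runs/get",
    headers={"Authorization": f"Bearer {token}"},
    params={"run_id": 123456},         # placeholder run id
)
resp.raise_for_status()
run_info = resp.json()                 # task keys, run page URL, etc.
print(run_info.get("run_page_url"))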

by stevenayers-bge, Contributor
  • 3162 Views
  • 4 replies
  • 2 kudos

Bug: Shallow Clone `create or replace` causing [TABLE_OR_VIEW_NOT_FOUND]

I am having an issue where, when I do a shallow clone using:
create or replace table `catalog_a_test`.`schema_a`.`table_a` shallow clone `catalog_a`.`schema_a`.`table_a`
I get: [TABLE_OR_VIEW_NOT_FOUND] The table or view catalog_a_test.schema_a.table_a...

Latest Reply
Omar_hamdan
Databricks Employee
  • 2 kudos

Hi Steven, this is really a strange issue. First, let's exclude some possible causes. We need to check the following: the permissions on table A and catalog B; take a look at the following link to check which permissions are needed: https://docs.d...
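
As a starting point for that permission check, a hedged sketch of the Unity Catalog grants typically involved in a cross-catalog clone (names mirror the post; the principal is illustrative):

# Sketch: grants commonly needed before a cross-catalog shallow clone.
spark.sql("GRANT USE CATALOG ON CATALOG catalog_a TO `some_principal`")
spark.sql("GRANT USE SCHEMA ON SCHEMA catalog_a.schema_a TO `some_principal`")
spark.sql("GRANT SELECT ON TABLE catalog_a.schema_a.table_a TO `some_principal`")
spark.sql("GRANT CREATE TABLE ON SCHEMA catalog_a_test.schema_a TO `some_principal`")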

3 More Replies
by gauravchaturved, New Contributor II
  • 1942 Views
  • 1 reply
  • 1 kudos

Resolved! Can I delete a specific partition from a Delta Live Table?

If I have created a Delta Live Table partitioned on a column (let's say a date column) from a stream source, can I delete the partitions for specific date values later to save on cost and keep the table lean? If I can, then: 1- how to do it? 2- do I...

Latest Reply
raphaelblg
Databricks Employee
  • 1 kudos

Hello @gauravchaturved, You can remove the partition by filtering it out in your source code and triggering a full refresh of your pipeline. There is no need to run VACUUM, as DLT has maintenance clusters that perform OPTIMIZE and VACUUM operations on y...
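
A minimal sketch of that approach, assuming a Python DLT pipeline (table name, source, partition column, and cutoff date are all illustrative); after deploying the filter, trigger a full refresh so the excluded partitions are rebuilt away:

import dlt
from pyspark.sql import functions as F

# Sketch: exclude old partitions at the source; a full refresh then
# rebuilds the table without them.
@dlt.table(partition_cols=["event_date"])
def events():
    return (
        spark.readStream.table("source_catalog.source_schema.raw_events")
        .where(F.col("event_date") >= "2024-01-01")
    )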

by NarenderKumar, New Contributor III
  • 4035 Views
  • 3 replies
  • 2 kudos

Unable to connect to Databricks serverless SQL using DBeaver

I am trying to connect to a Databricks serverless SQL warehouse using DBeaver, as described in the documentation below: https://learn.microsoft.com/en-us/azure/databricks/dev-tools/dbeaver I am trying to use browser-based authentication, i.e. (OAuth user-to-...

Latest Reply
binsel
New Contributor III
  • 2 kudos

I'm having the same problem. Any update?
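
If it helps anyone hitting the same wall: the browser-based OAuth (U2M) flow is selected through the JDBC URL the driver uses, along these documented lines (hostname and HTTP path are placeholders, and the exact parameters should be checked against the driver version):

jdbc:databricks://<server-hostname>:443;httpPath=<http-path>;AuthMech=11;Auth_Flow=2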

2 More Replies
by youcanlearn, New Contributor III
  • 3550 Views
  • 3 replies
  • 2 kudos

Resolved! Databricks Expectations

In the example in https://docs.databricks.com/en/delta-live-tables/expectations.html#fail-on-invalid-records, it states that one is able to query the DLT event log for such expectation violations. In Databricks, I can use expectations to fail or drop r...

Latest Reply
brockb
Databricks Employee
  • 2 kudos

That's right, the "reason" would be "x1 is negative" in your example and "valid_max_length" in the example JSON payload that I shared. If you are looking for a descriptive reason, you would name the expectation accordingly, such as: @Dlt.expect_or_fail...
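
To make that concrete, a minimal sketch of a descriptively named expectation (table and column names are illustrative):

import dlt

# Sketch: the expectation name doubles as the human-readable "reason"
# surfaced in the DLT event log when a record violates the condition.
@dlt.table
@dlt.expect_or_fail("x1_is_not_negative", "x1 >= 0")
def validated_data():
    return spark.read.table("source_schema.raw_data")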

2 More Replies
by Sambit_S, New Contributor III
  • 5370 Views
  • 8 replies
  • 0 kudos

Databricks Autoloader File Notification Not Working As Expected

Hello everyone, in my project I am using Databricks Auto Loader to incrementally and efficiently process new data files as they arrive in cloud storage. I am using file notification mode with Event Grid and a queue service set up in an Azure storage account...

Latest Reply
matthew_m
Databricks Employee
  • 0 kudos

Hi @Sambit_S, I misread inputRows as inputFiles, which aren't the same thing. Considering the limitation on the Azure queue, if you are already at the limit then you may need to consider switching to an event source such as Kafka or Event Hub to get b...
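
For reference, the file notification setup described in the post corresponds roughly to this Auto Loader sketch (container, path, and format are illustrative):

# Sketch: Auto Loader in file notification mode on Azure; Databricks
# manages the Event Grid subscription and queue when permissions allow.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useNotifications", "true")
    .load("abfss://container@storageaccount.dfs.core.windows.net/landing/")
)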

7 More Replies
by Ramana, Valued Contributor
  • 3376 Views
  • 3 replies
  • 0 kudos

SHOW GROUPS does not return groups available at the account level

I am trying to capture all the Databricks groups and their mappings to user/AD group(s). I tried to do this by using SHOW GROUPS, SHOW USERS, and SHOW GRANTS, following the examples mentioned in the article below, but the SHOW GROUPS command only fetc...

Latest Reply
Ramana
Valued Contributor
  • 0 kudos

Yes, I can use the REST API, but I am looking for a SQL or programmatic way to do this rather than making the API calls, building the complex-datatype DataFrame, and then saving it as a table. Thanks, Ramana
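
Absent a SQL surface for account-level groups, the programmatic route still ends up at the SCIM Groups API; a hedged sketch against the workspace-level endpoint (host and token are placeholders):

import requests

# Sketch: list groups via the SCIM API and print their members.
host = "https://<workspace-host>"     # placeholder
token = "<personal-access-token>"     # placeholder
resp = requests.get(
    f"{host}/api/2.0/preview/scim/v2/Groups",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
for group in resp.json().get("Resources", []):
    members = [m.get("display") for m in group.get("members", [])]
    print(group.get("displayName"), "->", members)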

2 More Replies
by kseyser, New Contributor II
  • 2302 Views
  • 2 replies
  • 1 kudos

Predicting compute required to run Spark jobs

I'm working on a project to predict the compute (cores) required to run Spark jobs. Has anyone worked on this or something similar before? How did you get started?

Latest Reply
Yeshwanth
Databricks Employee
  • 1 kudos

@kseyser, good day. This documentation might help with your use case: https://docs.databricks.com/en/compute/cluster-config-best-practices.html#compute-sizing-considerations Kind regards, Yesh

1 More Reply
