Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

WWoman
by Contributor
  • 972 Views
  • 1 reply
  • 1 kudos

Identifying invalid views

Is there a way to identify all invalid views in a schema or catalog without querying each view to see if it succeeds?

Latest Reply
raphaelblg
Databricks Employee
  • 1 kudos

Hello @WWoman, I don't think there's a feature for that. If you think this would be a cool feature, you could submit an idea in the Databricks Ideas Portal.
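Since there is no built-in check, the brute-force probe the poster was hoping to avoid would look roughly like the sketch below; it assumes Unity Catalog, a notebook Spark session, and placeholder catalog/schema names.

    from pyspark.sql.utils import AnalysisException

    # Sketch: list views from the information schema, then probe each one.
    # my_catalog / my_schema are placeholders.
    views = [
        r["table_name"]
        for r in spark.sql(
            "SELECT table_name FROM my_catalog.information_schema.views "
            "WHERE table_schema = 'my_schema'"
        ).collect()
    ]

    invalid = []
    for v in views:
        try:
            # Analysis alone surfaces broken references; nothing is scanned.
            spark.sql(f"SELECT * FROM my_catalog.my_schema.`{v}` LIMIT 0")
        except AnalysisException as err:
            invalid.append((v, str(err).splitlines()[0]))

    for name, reason in invalid:
        print(name, "->", reason)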

NhanNguyen
by Contributor III
  • 1932 Views
  • 3 replies
  • 0 kudos

Resolved! Disk cache for csv file in Databricks

Dear team, I'm investigating how to improve performance when reading a large CSV file as input, and I found this: https://learn.microsoft.com/en-us/azure/databricks/optimizations/disk-cache. I just wonder: does the disk cache also apply to CSV files? Thanks!

Latest Reply
NhanNguyen
Contributor III
  • 0 kudos

Thanks @-werners-, that's right. I tried it and got a significant performance improvement.
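The thread is truncated, so the exact change isn't shown; one common pattern that matches the linked doc (the disk cache accelerates Parquet-based reads, which CSV is not) is to land the CSV as a Delta table once and read that instead. A rough sketch with placeholder paths and names:

    # One-time conversion: CSV -> Delta (Parquet under the hood).
    raw = (
        spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv("/mnt/landing/big_file.csv")
    )
    raw.write.format("delta").mode("overwrite").saveAsTable("my_catalog.my_schema.big_table")

    # Subsequent reads hit Parquet files and can be served from the disk cache.
    spark.table("my_catalog.my_schema.big_table").count()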

2 More Replies
saichandu_25
by New Contributor III
  • 4027 Views
  • 9 replies
  • 0 kudos

Not able to read the file content completely using head

Hi, we want to read the content of a file and encode it into base64. For that we have used the code below:
file_path = "/path/to/your/file.csv"
file_content = dbutils.fs.head(file_path, 512000000)
encode_content = base64.b64encode(file_conten...
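The excerpt is cut off, but a common way around the size limit of dbutils.fs.head is to read the file with plain Python instead. A sketch, assuming the file is reachable through the /dbfs/ FUSE mount (adjust the path for Volumes); the path is a placeholder:

    import base64

    # /dbfs/ is the driver-local mount of DBFS.
    local_path = "/dbfs/path/to/your/file.csv"

    with open(local_path, "rb") as f:   # binary read, not subject to head()'s cap
        file_content = f.read()

    encoded_content = base64.b64encode(file_content).decode("utf-8")
    print(len(encoded_content))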

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

I am curious what the use case is for wanting to load large files into GitHub, which is a code repo. Depending on the file format, different parsing is necessary; you could foresee logic for that in your program.

8 More Replies
DataEngineer
by New Contributor II
  • 1767 Views
  • 2 replies
  • 0 kudos

AWS Email sending challenge from Databricks with UNITY CATALOG and Multinode cluster

Hi, I have implemented Unity Catalog with a multi-node cluster in Databricks. The workspace instance profile with EC2 access is also created in IAM, but we are still having a challenge sending emails from Databricks using the SES service. The same is working ...

Latest Reply
Babu_Krishnan
Contributor
  • 0 kudos

Hi @DataEngineer, were you able to resolve the issue? We are having the same issue when we try to use a multi-node cluster for Unity Catalog. Email functionality was working fine with a single-node cluster. We are getting "ConnectionRefusedError: [Errno 111]...
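The root cause isn't shown in the thread, but a "Connection refused" error can point at blocked SMTP traffic rather than credentials; a hedged alternative is to call the SES API over HTTPS with boto3, which authenticates through the instance profile. The region, addresses, and the required ses:SendEmail permission are assumptions:

    import boto3

    # Sketch: send through the SES API instead of SMTP. Assumes the cluster's
    # instance profile allows ses:SendEmail and the sender is verified in SES.
    ses = boto3.client("ses", region_name="us-east-1")

    response = ses.send_email(
        Source="sender@example.com",
        Destination={"ToAddresses": ["recipient@example.com"]},
        Message={
            "Subject": {"Data": "Test from Databricks"},
            "Body": {"Text": {"Data": "Hello from a multi-node Unity Catalog cluster."}},
        },
    )
    print(response["MessageId"])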

1 More Replies
Harispap
by New Contributor
  • 1108 Views
  • 0 replies
  • 0 kudos

Different result between manual and automated task run

I have a notebook where I fetch metadata about a previous task run from the API ".... /jobs/runs/get". The response should be a dictionary that contains information such as task key, run ID, run page URL, etc. When I run the notebook as part of ...
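For reference, a minimal call to that endpoint looks roughly like the sketch below; the host, token, and run_id are placeholders, and in a scheduled job the run_id would normally be passed in as a parameter rather than hard-coded:

    import requests

    host = "https://<workspace-host>"
    token = "<access-token>"
    run_id = 123456789   # placeholder

    resp = requests.get(
        f"{host}/api/2.1/jobs/runs/get",
        headers={"Authorization": f"Bearer {token}"},
        params={"run_id": run_id},
        timeout=30,
    )
    resp.raise_for_status()
    run = resp.json()
    print(run.get("run_page_url"), [t.get("task_key") for t in run.get("tasks", [])])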

stevenayers-bge
by Contributor
  • 2529 Views
  • 4 replies
  • 2 kudos

Bug: Shallow Clone `create or replace` causing [TABLE_OR_VIEW_NOT_FOUND]

I am having an issue where, when I do a shallow clone using: create or replace table `catalog_a_test`.`schema_a`.`table_a` shallow clone `catalog_a`.`schema_a`.`table_a`, I get: [TABLE_OR_VIEW_NOT_FOUND] The table or view catalog_a_test.schema_a.table_a...

Latest Reply
Omar_hamdan
Databricks Employee
  • 2 kudos

Hi Steven, this is really a strange issue. First, let's exclude some possible causes. We need to check the following: the permissions on table A and catalog B; take a look at the following link to check what permission is needed: https://docs.d...
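One quick thing to rule out alongside the permission check (a sketch, not a confirmed fix for this bug report) is that the target catalog and schema actually exist before the clone runs; the names below mirror the post and are placeholders:

    spark.sql("CREATE CATALOG IF NOT EXISTS catalog_a_test")
    spark.sql("CREATE SCHEMA IF NOT EXISTS catalog_a_test.schema_a")

    spark.sql("""
        CREATE OR REPLACE TABLE catalog_a_test.schema_a.table_a
        SHALLOW CLONE catalog_a.schema_a.table_a
    """)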

3 More Replies
gauravchaturved
by New Contributor II
  • 1569 Views
  • 1 reply
  • 1 kudos

Resolved! Can I delete specific partition from a Delta Live Table?

If I have created a Delta Live Table partitioned on a column (let's say a date column) from a stream source, can I delete the partition for specific date values later to save on cost and keep the table lean? If I can, then: 1- how do I do it? 2- do I...

Latest Reply
raphaelblg
Databricks Employee
  • 1 kudos

Hello @gauravchaturved , You can remove the partition by filtering it in your source code and triggering a full refresh in your pipeline. There is no need to run vacuum, as DLT has maintenance clusters that perform OPTIMIZE and VACUUM operations on y...
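A rough sketch of that approach, with placeholder table and column names: filter the unwanted dates out of the DLT source query, then trigger a full refresh so the table is rebuilt without those partitions.

    import dlt
    from pyspark.sql import functions as F

    @dlt.table(name="events_clean", partition_cols=["event_date"])
    def events_clean():
        # Rows for excluded dates disappear after a full refresh of the pipeline.
        return (
            spark.readStream.table("my_catalog.my_schema.events_raw")
            .where(F.col("event_date") >= "2024-01-01")
        )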

NarenderKumar
by New Contributor III
  • 3357 Views
  • 3 replies
  • 2 kudos

Unable to connect with Databricks Serverless SQL using Dbeaver

I am trying to connect to a Databricks serverless SQL warehouse using DBeaver as mentioned in the documentation below: https://learn.microsoft.com/en-us/azure/databricks/dev-tools/dbeaver. I am trying to use the browser-based authentication, i.e. (OAuth user-to-...
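For reference, the JDBC URL shape the Databricks JDBC driver expects for browser-based OAuth (user-to-machine) is roughly the following; the hostname and HTTP path are placeholders, and the exact flags are worth double-checking against the driver documentation:

    jdbc:databricks://<server-hostname>:443;httpPath=/sql/1.0/warehouses/<warehouse-id>;AuthMech=11;Auth_Flow=2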

Latest Reply
binsel
New Contributor III
  • 2 kudos

I'm having the same problem. Any update?

2 More Replies
youcanlearn
by New Contributor III
  • 2688 Views
  • 3 replies
  • 2 kudos

Resolved! Databricks Expectations

In the example at https://docs.databricks.com/en/delta-live-tables/expectations.html#fail-on-invalid-records, it says that one is able to query the DLT event log for such expectation violations. In Databricks, I can use expectations to fail or drop r...

Latest Reply
brockb
Databricks Employee
  • 2 kudos

That's right, the "reason" would be "x1 is negative" in your example and "valid_max_length" in the example JSON payload that I shared. If you are looking for a descriptive reason, you would name the expectation accordingly, such as: @Dlt.expect_or_fail...
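A minimal sketch of that naming idea (table and source names are placeholders): the expectation's name is what surfaces in the event log, so make it descriptive.

    import dlt

    @dlt.table(name="orders_validated")
    @dlt.expect_or_fail("x1_is_not_negative", "x1 >= 0")
    def orders_validated():
        return spark.read.table("my_catalog.my_schema.orders_raw")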

2 More Replies
guizsantos
by New Contributor II
  • 2299 Views
  • 2 replies
  • 3 kudos

Resolved! How to obtain a query profile programmatically?

Hi everyone! Does anyone know if there is a way to obtain the data used to create the graph shown in the "Query profile" section? Particularly, I am interested in the rows produced by the intermediate query operations. I can see there is a "Download" ...

Latest Reply
guizsantos
New Contributor II
  • 3 kudos

Hey @raphaelblg, thanks for your input! I understand that some info may be obtained from the `EXPLAIN` command; however, the output is not very clear on its meaning and definitely does not provide what is most interesting to us, which is the rows proces...
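For what it's worth, query-level metrics (not per-operator row counts) can be pulled programmatically from the SQL query history endpoint; a sketch with placeholder host and token, and with response fields worth verifying against the API docs:

    import requests

    host = "https://<workspace-host>"
    token = "<access-token>"

    resp = requests.get(
        f"{host}/api/2.0/sql/history/queries",
        headers={"Authorization": f"Bearer {token}"},
        params={"include_metrics": "true", "max_results": 10},
        timeout=30,
    )
    resp.raise_for_status()
    for q in resp.json().get("res", []):
        print(q.get("query_id"), q.get("metrics"))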

1 More Replies
Sambit_S
by New Contributor III
  • 4141 Views
  • 8 replies
  • 0 kudos

Databricks Autoloader File Notification Not Working As Expected

Hello everyone, in my project I am using Databricks Auto Loader to incrementally and efficiently process new data files as they arrive in cloud storage. I am using file notification mode with Event Grid and a queue service set up in an Azure storage account...
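For context, a minimal file-notification Auto Loader stream on ADLS looks roughly like the sketch below; the storage paths, file format, and target table are placeholders, and the Event Grid/queue wiring described in the post is assumed to already be in place:

    df = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.useNotifications", "true")
        .option("cloudFiles.schemaLocation",
                "abfss://container@account.dfs.core.windows.net/_schemas/events")
        .load("abfss://container@account.dfs.core.windows.net/landing/events")
    )

    (df.writeStream
       .option("checkpointLocation",
               "abfss://container@account.dfs.core.windows.net/_checkpoints/events")
       .trigger(availableNow=True)
       .toTable("my_catalog.my_schema.events_bronze"))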

Latest Reply
matthew_m
Databricks Employee
  • 0 kudos

Hi @Sambit_S, I misread inputRows as inputFiles, which aren't the same thing. Considering the limitation on Azure queues, if you are already at the limit then you may need to consider switching to an event source such as Kafka or Event Hub to get b...

7 More Replies
Ramana
by Contributor III
  • 2718 Views
  • 3 replies
  • 0 kudos

SHOW GROUPS is not giving groups available at the account level

I am trying to capture all the Databricks groups and their mapping to user/AD group(s). I tried to do this by using SHOW GROUPS, SHOW USERS, and SHOW GRANTS, following the examples mentioned in the article below, but the SHOW GROUPS command only fetc...

Latest Reply
Ramana
Contributor III
  • 0 kudos

Yes, I can use the REST API, but I am looking for a SQL or programming way to do this rather than making the API calls, building the complex-datatype DataFrame, and then saving it as a table. Thanks, Ramana
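One middle ground (still REST underneath, but without hand-rolled calls or response plumbing) is the Databricks Python SDK; a sketch with a placeholder target table, noting that account-level groups would need AccountClient rather than WorkspaceClient:

    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()  # picks up notebook authentication automatically

    rows = []
    for g in w.groups.list(attributes="displayName,members"):
        for m in (g.members or []):
            rows.append((g.display_name, m.display))

    (spark.createDataFrame(rows, "group_name string, member string")
         .write.mode("overwrite")
         .saveAsTable("my_catalog.my_schema.group_memberships"))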

2 More Replies
kseyser
by New Contributor II
  • 1802 Views
  • 2 replies
  • 1 kudos

Predicting compute required to run Spark jobs

I'm working on a project to predict the compute (cores) required to run Spark jobs. Has anyone worked on this or something similar before? How did you get started?

Latest Reply
Yeshwanth
Databricks Employee
  • 1 kudos

@kseyser good day, this documentation might help you with your use case: https://docs.databricks.com/en/compute/cluster-config-best-practices.html#compute-sizing-considerations Kind regards, Yesh

1 More Replies
