Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by deng_dev, New Contributor III
  • 7060 Views
  • 1 reply
  • 0 kudos

py4j.protocol.Py4JJavaError: An error occurred while calling o359.sql. : java.util.NoSuchElementException

Hi! We are creating a table in a streaming job every micro-batch, using the spark.sql('create or replace table ... using delta as ...') command. The query combines data from multiple tables. Sometimes our job fails with the error: py4j.Py4JException: An e...
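A minimal sketch of the pattern described in the question, assuming a foreachBatch sink; the table and column names are hypothetical, since the original job is not shown:

```python
# Hypothetical reconstruction of the reported pattern: each micro-batch
# recreates a Delta table from a multi-table join via spark.sql.
def recreate_table(batch_df, batch_id):
    batch_df.createOrReplaceTempView("incoming")
    batch_df.sparkSession.sql("""
        CREATE OR REPLACE TABLE events_joined USING DELTA AS
        SELECT i.uuid, i.actor_uuid, a.actor_name
        FROM incoming i
        JOIN actors a ON i.actor_uuid = a.uuid
        WHERE i.uuid IS NOT NULL AND i.actor_uuid IS NOT NULL
    """)

(spark.readStream.table("events_raw")
    .writeStream
    .foreachBatch(recreate_table)
    .option("checkpointLocation", "/tmp/checkpoints/events_joined")
    .start())
```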

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @deng_dev, The error message you’re encountering, java.util.NoSuchElementException: key not found: Filter (isnotnull(uuid#42326735) AND isnotnull(actor_uuid#42326740)), indicates that there’s an issue with the query execution. Let’s address thi...

by oosterhuisf, New Contributor II
  • 1059 Views
  • 2 replies
  • 0 kudos

Break production using a shallow clone

Hi, If you create a shallow clone using the latest LTS and drop the clone using a SQL warehouse (either current or preview), the source table is broken beyond repair. Data reads and writes still work, but vacuum will remain forever broken. I've attac...
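A minimal sketch of the reported reproduction, with hypothetical table names (the poster's attachment is not included in this digest):

```python
# Delta SQL for a shallow clone of a production table.
spark.sql("CREATE TABLE main.default.source_clone SHALLOW CLONE main.default.source")

# Per the report: dropping the clone from a SQL warehouse afterwards leaves
# VACUUM on the source permanently failing, while reads and writes still work.
spark.sql("DROP TABLE main.default.source_clone")
spark.sql("VACUUM main.default.source")  # reported to fail from this point on
```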

Latest Reply
oosterhuisf
New Contributor II
  • 0 kudos

To add to that: the manual does not state that this might happen

1 More Reply
by Michael_Galli, Contributor II
  • 745 Views
  • 1 reply
  • 1 kudos

Resolved! Many dbutils.notebook.run iterations in a workflow -> Failed to checkout GitHub repository error

Hi all, I have a workflow that runs one single notebook with dbutils.notebook.run() and different parameters in one long loop. At some point, I get random git errors in the notebook run: com.databricks.WorkflowException: com.databricks.NotebookExecut...
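A hedged sketch of the looped pattern described above, with a simple retry around the transient checkout failures; the notebook path, parameters, and retry policy are all hypothetical:

```python
import time

# Hypothetical parameter list and notebook path.
runs = [{"run_date": "2024-01-01"}, {"run_date": "2024-01-02"}]

for params in runs:
    for attempt in range(3):
        try:
            dbutils.notebook.run("/Repos/project/etl_notebook", 3600, params)
            break
        except Exception:
            if attempt == 2:
                raise  # give up after three attempts
            time.sleep(30 * (attempt + 1))  # back off before retrying
```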

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Michael_Galli, It appears that you’re encountering GitHub-related issues during your notebook runs in Databricks. Let’s address this step by step: GitHub API Limit: Databricks enforces rate limits for all REST API calls, including those rela...

by Anotech, New Contributor II
  • 4707 Views
  • 2 replies
  • 1 kudos

How can I fix this error? ExecutionError: An error occurred while calling o392.mount: java.lang.NullPointerException

Hello, I'm trying to mount my Databricks to my Azure Gen2 data lake to read in data from the container, but I get an error when executing this line of code: dbutils.fs.mount( source = "abfss://resumes@choisysresume.dfs.core.windows.net/", mount_poin...

Latest Reply
WernerS
New Contributor III
  • 1 kudos

I checked it with my mount script and it is exactly the same, except that I do not put a '/' after dfs.core.windows.net. You might want to try that. Also, is Unity Catalog enabled? Because Unity Catalog does not allow mounts.
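For comparison, a minimal sketch of a mount call without the trailing '/'. The OAuth settings and secret scope are assumptions; only the container and account names come from the question:

```python
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": dbutils.secrets.get("my-scope", "client-id"),
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get("my-scope", "client-secret"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://resumes@choisysresume.dfs.core.windows.net",  # no trailing '/'
    mount_point="/mnt/resumes",
    extra_configs=configs,
)
```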

1 More Reply
by FabriceDeseyn, Contributor
  • 4907 Views
  • 5 replies
  • 6 kudos

Resolved! What does Auto Loader's cloudFiles.backfillInterval do?

I'm using Auto Loader directory listing mode (without incremental file listing) and sometimes new files are not picked up and found in the cloud_files-listing. I have found that using the 'cloudFiles.backfillInterval' option can resolve the detection ...

Latest Reply
Kiranrathod
New Contributor III
  • 6 kudos

Hi @Lakshay Goel, where can I set the cloudFiles.backfillInterval property in the code? Do you have any sample code for this use case?
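The option is set on the Auto Loader reader. A minimal sketch, with hypothetical paths and file format:

```python
df = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    # Trigger a periodic backfill listing so files missed between regular
    # listings are eventually picked up.
    .option("cloudFiles.backfillInterval", "1 day")
    .option("cloudFiles.schemaLocation", "/mnt/schemas/events")
    .load("/mnt/landing/events"))
```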

4 More Replies
by grazie, Contributor
  • 1948 Views
  • 3 replies
  • 1 kudos

Azure Databricks: migrating Delta table data with CDF on

We are on Azure Databricks over ADLS Gen2 and have a set of tables and workflows that process data from and between those tables, using change data feeds. (We are not yet using Unity Catalog, and also not the Hive metastore, just accessing Delta tables f...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @grazie, Moving data between Azure storage accounts while preserving timestamps and ensuring efficient processes can indeed be a challenge. Let’s explore some options to achieve this without resorting to manual, error-prone steps: Azure Databri...
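The rest of the reply is truncated. One option commonly suggested for this scenario (an assumption here, not confirmed by the reply) is Delta DEEP CLONE, which copies data and metadata across storage accounts in a single operation; the paths below are hypothetical:

```python
spark.sql("""
    CREATE OR REPLACE TABLE delta.`abfss://data@newaccount.dfs.core.windows.net/tables/events`
    DEEP CLONE delta.`abfss://data@oldaccount.dfs.core.windows.net/tables/events`
""")
```

Note that a clone starts its own transaction history, which matters when downstream consumers track change data feed versions on the source table.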

2 More Replies
by hafeez, New Contributor III
  • 1546 Views
  • 2 replies
  • 1 kudos

Resolved! Hive metastore table access control End of Support

Hello, We are using Databricks with the Hive metastore and not Unity Catalog. We would like to know if there is any end of support for table access control with Hive, as this link states that it is legacy: https://docs.databricks.com/en/data-governance/tab...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @hafeez, Hive metastore table access control is a legacy data governance model within Databricks. While it is still available, Databricks strongly recommends using Unity Catalog instead. Unity Catalog offers a more straightforward and acco...

1 More Reply
by Remit, New Contributor III
  • 2057 Views
  • 2 replies
  • 0 kudos

Resolved! Merge error in streaming case

I have a streaming case where I stream from two sources: source1 and source2. I write two separate streams to pick the data up from the landing area (step 1). Then I write two extra streams to apply some transformations in order to give them the same schem...
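Since MERGE is not available as a direct streaming sink, the usual pattern is to run it inside foreachBatch; a hedged sketch, with hypothetical stream, table, and key names:

```python
from delta.tables import DeltaTable

def merge_batch(batch_df, batch_id):
    target = DeltaTable.forName(batch_df.sparkSession, "silver.events")
    (target.alias("t")
        .merge(batch_df.alias("s"), "t.id = s.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

# unified_stream stands in for the stream produced by the two
# transformation steps that align source1 and source2 on one schema.
(unified_stream.writeStream
    .foreachBatch(merge_batch)
    .option("checkpointLocation", "/mnt/checkpoints/silver_events")
    .start())
```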

Labels: Data Engineering, MERGE, streaming
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Remit, I want to express my gratitude for your effort in selecting the most suitable solution. It's great to hear that your query has been successfully resolved. Thank you for your contribution.

1 More Reply
by geertvanhove, New Contributor III
  • 3174 Views
  • 7 replies
  • 0 kudos

Transform a dataframe column into a concatenated string

Hello, I have a single-column dataframe and I want to transform the content into one string. E.g. a df with the rows abc, def, xyz should become "abc, def, xyz". Thanks

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @geertvanhove, I gave you the code with a screenshot.
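The screenshot is not preserved in this digest; a minimal sketch of one common approach, assuming a single column named value:

```python
from pyspark.sql import functions as F

df = spark.createDataFrame([("abc",), ("def",), ("xyz",)], ["value"])

# Collect the single column into an array and join it with ", ".
# Note: row order is not guaranteed without an explicit ordering.
result = df.agg(F.concat_ws(", ", F.collect_list("value")).alias("joined"))
result.show(truncate=False)  # abc, def, xyz
```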

6 More Replies
by Sangram, New Contributor III
  • 1359 Views
  • 1 reply
  • 0 kudos

Unable to mount ADLS Gen2 to the Databricks file system

I am unable to mount an ADLS Gen2 storage path into the Databricks file system. It throws the error: unsupported azure scheme: abfss. May I know the reason? Below are the steps that I followed: 1. Create a service principal. 2. Store the service principal's s...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Sangram, Certainly! Let’s troubleshoot the issue with mounting Azure Data Lake Storage Gen2 (ADLS Gen2) into Databricks. Azure Key Vault Permissions: Ensure that the Azure Databricks application has the necessary permissions on the Azure Key Vau...

by Erik, Valued Contributor II
  • 1141 Views
  • 1 reply
  • 0 kudos

Why not enable "decommissioning" in Spark?

You can enable "decommissioning" in Spark, which causes it to remove work from a worker when it gets a notification from the cloud that the instance is going away (e.g. spot instances). This is disabled by default, but it seems like such a no-brainer to...
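For reference, the relevant settings go in the cluster's Spark config; a sketch of the Spark 3.1+ decommissioning options (whether to enable each depends on workload and cloud):

```
spark.decommission.enabled true
spark.storage.decommission.enabled true
spark.storage.decommission.rddBlocks.enabled true
spark.storage.decommission.shuffleBlocks.enabled true
```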

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Erik, Enabling decommissioning in Spark is valuable, especially when dealing with cloud environments and transient instances like spot. Let’s delve into the reasons behind its default state and potential downsides: Why Not Enabled by Defaul...

by Erik, Valued Contributor II
  • 1776 Views
  • 3 replies
  • 0 kudos

Run driver on spot instance

The traditional advice seems to be to run the driver on "on demand", and optionally the workers on spot. And this is indeed what happens if one chooses to run with spot instances in Databricks. But I am interested in what happens if we run with a dr...

Latest Reply
Erik
Valued Contributor II
  • 0 kudos

Thanks for your answer @Kaniz_Fatma! Good overview, and I understand that "driver on-demand and the rest on spot" is good general advice. But I am still considering using spot instances for both, and I am left with two concrete questions: 1: Can w...

2 More Replies
by hold_my_samosa, New Contributor II
  • 5962 Views
  • 3 replies
  • 0 kudos

Delta Partition File on Azure ADLS Gen2 Migration

Hello, I am working on a migration project and I am facing an issue while migrating Delta tables from Azure ADLS Gen1 to Gen2. So, as per the Microsoft migration prerequisites: file or directory names with only spaces or tabs, ending with a ., containing ...

Labels: Data Engineering, azure, datalake, delta, databricks
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @hold_my_samosa, Could you please explain what exactly the issue is now? What works and what doesn't?

2 More Replies
by The_Demigorgan, New Contributor
  • 1015 Views
  • 1 reply
  • 0 kudos

Auto Loader issue

I'm trying to ingest data from Parquet files using Auto Loader. Now, I have my custom schema and I don't want to infer the schema from the Parquet files. During readStream everything is fine, but during writeStream it is somehow inferring the schema from...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @The_Demigorgan, Certainly! When using Auto Loader in Databricks for ingesting data from Parquet files, you can enforce your custom schema and avoid schema inference. Let’s address this issue: Schema Enforcement: Auto Loader allows you to expli...
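A hedged sketch of enforcing a custom schema with Auto Loader over Parquet; the schema, paths, and target table are hypothetical. The schema is passed to the reader so no inference takes place:

```python
from pyspark.sql.types import StructType, StructField, StringType, LongType

custom_schema = StructType([
    StructField("id", LongType(), True),
    StructField("name", StringType(), True),
])

df = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .schema(custom_schema)  # enforce the declared schema; no inference
    .load("/mnt/landing/parquet"))

(df.writeStream
    .option("checkpointLocation", "/mnt/checkpoints/parquet_ingest")
    .toTable("bronze.events"))
```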

by Balazs, New Contributor III
  • 5445 Views
  • 1 reply
  • 0 kudos

Unity Catalog Volume as Spark checkpoint location

Hi, I tried to set the Spark checkpoint location in a notebook to a folder in a Unity Catalog Volume, with the following command: sc.setCheckpointDir("/Volumes/catalog_name/schema_name/volume_name/folder_name"). Unfortunately I receive the following err...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Balazs, Databricks volumes are Unity Catalog objects that represent logical volumes of storage in a cloud object storage location. They provide capabilities for accessing, storing, governing, and organizing files. While tables govern tabular dat...
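If the runtime rejects a Volume path for sc.setCheckpointDir, a common workaround (an assumption, with hypothetical paths) is to point it at cloud storage or DBFS instead:

```python
# Direct cloud storage path (requires storage access configured on the cluster):
sc.setCheckpointDir("abfss://checkpoints@myaccount.dfs.core.windows.net/rdd-checkpoints")

# Or a DBFS path, where DBFS is allowed:
sc.setCheckpointDir("dbfs:/tmp/rdd-checkpoints")
```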

