Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

grazie
by Contributor
  • 2417 Views
  • 3 replies
  • 1 kudos

Azure Databricks, migrating delta table data with CDF on.

We are on Azure Databricks over ADLS Gen2 and have a set of tables and workflows that process data from and between those tables, using change data feeds. (We are not yet using Unity Catalog, and also not Hive metastore, just accessing delta tables f...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @grazie, Moving data between Azure storage accounts while preserving timestamps and ensuring efficient processes can indeed be a challenge. Let’s explore some options to achieve this without resorting to manual, error-prone steps: Azure Databri...

2 More Replies
hafeez
by New Contributor III
  • 1942 Views
  • 2 replies
  • 1 kudos

Resolved! Hive metastore table access control End of Support

Hello, We are using Databricks with Hive metastore and not Unity Catalog. We would like to know if there is any End of Support for Table Access Control with Hive, as this link states that it is legacy: https://docs.databricks.com/en/data-governance/tab...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @hafeez, Hive metastore table access control is a legacy data governance model within Databricks. While it is still available, Databricks strongly recommends using the Unity Catalog instead. The Unity Catalog offers a more straightforward and acco...

1 More Replies
Remit
by New Contributor III
  • 2512 Views
  • 2 replies
  • 0 kudos

Resolved! Merge error in streaming case

I have a streaming case where I stream from two sources: source1 and source2. I write two separate streams to pick the data up from the landing area (step 1). Then I write 2 extra streams to apply some transformations in order to give them the same schem...

Data Engineering
MERGE
streaming
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Remit , I want to express my gratitude for your effort in selecting the most suitable solution. It's great to hear that your query has been successfully resolved. Thank you for your contribution. 

1 More Replies
geertvanhove
by New Contributor III
  • 4198 Views
  • 7 replies
  • 0 kudos

transform a dataframe column as concatenated string

Hello, I have a single-column dataframe and I want to transform the content into a string. E.g. a df containing the rows abc, def, xyz should become "abc, def, xyz". Thanks

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @geertvanhove, I gave you the code with a screenshot.

6 More Replies
Sangram
by New Contributor III
  • 1635 Views
  • 1 reply
  • 0 kudos

Unable to mount ADLS gen2 to databricks file system

I am unable to mount an ADLS Gen2 storage path into the Databricks file system. It is throwing the error unsupported azure scheme: abfss. May I know the reason? Below are the steps that I followed: 1. Create a service principal. 2. Store the service principal's s...

[attachment: Sangram_0-1700274947304.png]
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Sangram , Certainly! Let’s troubleshoot the issue with mounting Azure Data Lake Storage Gen2 (ADLS Gen2) into Databricks. Azure Key Vault Permissions: Ensure that the Azure Databricks application has the necessary permissions on the Azure Key Vau...

Erik
by Valued Contributor II
  • 1586 Views
  • 1 reply
  • 0 kudos

Why not enable "decommissioning" in spark?

You can enable "decommissioning" in spark, which causes it to remove work from a worker when it gets a notification from the cloud that the instance goes away (e.g. SPOT instances). This is disabled by default, but it seems like such a no-brainer to...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Erik ,  Enabling decommissioning in Spark is valuable, especially when dealing with cloud environments and transient instances like SPOT. Let’s delve into the reasons behind its default state and potential downsides: Why Not Enabled by Defaul...

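For reference, decommissioning is controlled by a handful of Spark configs (available since Spark 3.1). A sketch of what one might put in the cluster's Spark config to try it, with RDD and shuffle block migration enabled so in-flight data survives the node going away:

```
spark.decommission.enabled true
spark.storage.decommission.enabled true
spark.storage.decommission.rddBlocks.enabled true
spark.storage.decommission.shuffleBlocks.enabled true
```

Block migration needs surviving executors with enough capacity to receive the migrated data, which is one practical reason this is not on by default.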
Erik
by Valued Contributor II
  • 2269 Views
  • 3 replies
  • 0 kudos

Run driver on spot instance

The traditional advice seems to be to run the driver "on demand", and optionally the workers on spot. And this is indeed what happens if one chooses to run with spot instances in Databricks. But I am interested in what happens if we run with a dr...

Latest Reply
Erik
Valued Contributor II
  • 0 kudos

Thanks for your answer @Kaniz_Fatma! Good overview, and I understand that "driver on-demand and the rest on spot" is good general advice. But I am still considering using spot instances for both, and I am left with two concrete questions: 1: Can w...

2 More Replies
hold_my_samosa
by New Contributor II
  • 7276 Views
  • 3 replies
  • 0 kudos

Delta Partition File on Azure ADLS Gen2 Migration

Hello, I am working on a migration project and I am facing an issue while migrating delta tables from Azure ADLS Gen1 to Gen2. So, as per the Microsoft migration prerequisites: File or directory names with only spaces or tabs, ending with a ., containing ...

Data Engineering
azure
datalake
delta
databricks
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @hold_my_samosa , Could you please explain what exactly is the issue now? What works and what doesn't?  

2 More Replies
The_Demigorgan
by New Contributor
  • 1146 Views
  • 1 reply
  • 0 kudos

Autoloader issue

I'm trying to ingest data from Parquet files using Autoloader. I have my own custom schema and don't want to infer the schema from the Parquet files. During readStream everything is fine, but during writeStream it is somehow inferring the schema from...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @The_Demigorgan, Certainly! When using Autoloader in Databricks for ingesting data from Parquet files, you can enforce your custom schema and avoid schema inference.    Let’s address this issue:   Schema Enforcement: Autoloader allows you to expli...

eric-cordeiro
by New Contributor II
  • 1502 Views
  • 1 reply
  • 0 kudos

Insufficient Permission when writing to AWS Redshift

I'm trying to write a table in AWS Redshift using the following code:try:    (df_source.write        .format("redshift")        .option("dbtable", f"{redshift_schema}.{table_name}")        .option("tempdir", tempdir)        .option("url", url)       ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @eric-cordeiro, 1. Ensure that the user has the USAGE privilege on the schema where the table resides. You can grant this privilege using the following SQL command: GRANT USAGE ON SCHEMA <schema_name> TO <schema_user>; 2. Since you mentioned havi...

Hoping
by New Contributor
  • 2198 Views
  • 1 reply
  • 0 kudos

Size of each partitioned file (partitioned by default)

When I try a DESCRIBE DETAIL I get the number of files the delta table is partitioned into. How can I check the size of each of these files that make up my entire table? Will I be able to query each partitioned file to understand how they have b...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Hoping, Certainly! Let’s explore how you can check the size of each partitioned file in a Delta table and understand how they are split:   Partitioning in Delta Tables: Delta tables can be partitioned by a specific column. The most commonly used ...

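Note that `DESCRIBE DETAIL` only reports aggregates (`numFiles`, `sizeInBytes`). To see individual file sizes, one option is to list the data files under the table directory. A minimal local sketch, assuming direct filesystem access to the table path (e.g. via a `/dbfs/...` path on Databricks):

```python
import os

def delta_file_sizes(table_path):
    """Map each parquet data file under a Delta table directory to its size
    in bytes, skipping the _delta_log transaction log folder."""
    sizes = {}
    for root, _dirs, files in os.walk(table_path):
        if "_delta_log" in root.split(os.sep):
            continue  # the transaction log is not table data
        for name in files:
            if name.endswith(".parquet"):
                full = os.path.join(root, name)
                sizes[os.path.relpath(full, table_path)] = os.path.getsize(full)
    return sizes
```

One caveat: a plain directory listing also includes files no longer referenced by the current table version (until they are vacuumed), so the sum can exceed `sizeInBytes` from `DESCRIBE DETAIL`.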
Kayla
by Valued Contributor
  • 1310 Views
  • 1 reply
  • 0 kudos

External Table From BigQuery

I'm working on implementing Unity Catalog, and part of that is determining how to handle our BigQuery tables. We need to utilize them to connect to another application, or else we'd stay within regular delta tables on Databricks.The page https://docs...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Kayla, Certainly! Let’s discuss how Unity Catalog can help you manage your data and analytics assets, including BigQuery tables:   What is Unity Catalog? Unity Catalog is Databricks’ unified data, analytics, and AI governance solution on the lake...

amruth
by New Contributor
  • 2186 Views
  • 4 replies
  • 0 kudos

How do I retrieve timestamp data from history in Databricks SQL without using a DELTA table? The data is coming from SAP

I am not using delta tables; my data is from SAP. How do I retrieve the timestamp (history) dynamically from a SAP table using Databricks SQL?

Latest Reply
Dribka
New Contributor III
  • 0 kudos

@amruth If you're working with data from SAP in Databricks and want to retrieve timestamps dynamically from a SAP table, you can utilize Databricks SQL to achieve this. You'll need to identify the specific SAP table that contains the timestamp or his...

3 More Replies
IonFreeman_Pace
by New Contributor III
  • 3551 Views
  • 4 replies
  • 1 kudos

Resolved! First notebook in ML course fails with wrong runtime

Help! I'm trying to run the first notebook in the Scalable MachIne LEarning (SMILE) course: https://github.com/databricks-academy/scalable-machine-learning-with-apache-spark-english/blob/published/ML%2000a%20-%20Spark%20Review.py It fails on the first...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

It means your cluster type has to be an ML runtime. When you create a cluster in Databricks, you can choose between different runtimes. These have different versions (Spark version), but also different types. For your case you need to select the ML menu o...

3 More Replies
pgruetter
by Contributor
  • 1339 Views
  • 2 replies
  • 0 kudos

Streaming problems after Vacuum

Hi all, To read from a large Delta table I'm using readStream, but with a trigger(availableNow=True) as I only want to run it daily. This worked well for an initial load and then incremental loads after that. At some point though, I received an error fro...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @pgruetter , Certainly! Let’s delve into the behavior of readStream in the context of Delta tables and address your questions.   Delta Table Streaming with readStream: When you use readStream to read from a Delta table, it operates in an increment...

1 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group