Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Erik
by Valued Contributor II
  • 2885 Views
  • 1 reply
  • 3 kudos

Resolved! How to combine medallion architecture and delta live-tables nicely?

Like many of you, we have implemented a "medallion architecture" (raw/bronze/silver/gold layers), with each layer stored on a separate storage account. We only create proper Hive tables for the gold-layer tables, so our Power BI users connecting to the da...

Latest Reply
merca
Valued Contributor II
  • 3 kudos

I can answer the first question: You can define data storage by setting the `path` parameter for tables. The "storage path" in pipeline settings will then only hold checkpoints (and some other pipeline stuff) and data will be stored in the correct acc...

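As a rough sketch of the workaround merca describes, a DLT table can pin its data to a specific storage account so that the pipeline's "storage path" holds only checkpoints. The table name and abfss path below are hypothetical, not taken from the thread:

```sql
-- Hypothetical gold-layer table; the LOCATION path is an assumption.
-- Data lands in the named storage account, while the pipeline's own
-- storage path keeps only checkpoints and pipeline state.
CREATE OR REFRESH LIVE TABLE gold_orders
LOCATION 'abfss://gold@mystorageaccount.dfs.core.windows.net/gold_orders'
AS SELECT * FROM LIVE.silver_orders;
```

In the Python API, the equivalent is the `path` argument to the table decorator, as the reply notes.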
Kasi
by New Contributor II
  • 579 Views
  • 0 replies
  • 0 kudos

Unable to execute 6.1 and 6.2 examples

Hi All, I am unable to execute the "Classroom-Setup-06.1" and "Classroom-Setup-06.2" setups in the Data Engineering course. On checking, I found that the "DA = DBAcademyHelper()" statement is not executing in the include section of the code. I am using the community ...

User16790091296
by Contributor II
  • 13257 Views
  • 6 replies
  • 1 kudos

How to delete from a temp view or equivalent in spark sql databricks?

I need to delete from a temp view in Databricks, but it looks like I can only merge, select, and insert. Maybe I missed something, but I did not find any documentation on this.

Latest Reply
crazy_horse
New Contributor II
  • 1 kudos

What about:

%sql
DROP TABLE IF EXISTS xxxxx

5 More Replies
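Since a temp view only supports read-style operations, a common workaround (sketched below with made-up view and table names) is to redefine the view without the rows you want gone, rather than deleting from it:

```sql
-- Hypothetical names. A temp view cannot be deleted from directly,
-- but it can be re-created minus the unwanted rows.
CREATE OR REPLACE TEMP VIEW my_view AS
SELECT *
FROM my_view_backing_table
WHERE status <> 'stale';
```

The `DROP TABLE IF EXISTS` suggestion in the reply removes the whole object; for a temp view, `DROP VIEW IF EXISTS` is the matching statement.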
Bin
by New Contributor
  • 1077 Views
  • 0 replies
  • 0 kudos

How to do an "overwrite" output mode using spark structured streaming without deleting all the data and the checkpoint

I have this Delta lake in ADLS as a sink for Spark Structured Streaming. We usually append new data from our data source to our Delta lake, but there are some cases when we find errors in the data and need to reprocess everything. So what ...

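One common pattern for the full-reprocess case described above (table and source names here are illustrative, not from the post) is a one-off batch overwrite of the Delta table, followed by restarting the stream against a fresh checkpoint location so old offsets are not replayed:

```sql
-- Illustrative names. Batch-replace the table contents in place,
-- keeping the table's history and identity:
INSERT OVERWRITE delta_sink
SELECT * FROM corrected_source;

-- Afterwards, restart the streaming write with a NEW checkpointLocation;
-- reusing the old checkpoint would replay stale progress state.
```

This avoids deleting the table itself; only the stream's checkpoint directory needs to change.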
mp
by New Contributor II
  • 2279 Views
  • 4 replies
  • 6 kudos

Resolved! How can I convert a parquet into delta table?

I am looking to migrate my legacy warehouse data. How can I convert a parquet into delta table?

Latest Reply
Kaniz_Fatma
Community Manager
  • 6 kudos

Hi @Manish P, you have three options for converting a Parquet table to a Delta table. Convert the files to Delta Lake format and then create a Delta table:

CONVERT TO DELTA parquet.`/data-pipeline/`
CREATE TABLE events USING DELTA LOCATION '/data-pipelin...

3 More Replies
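The reply above is cut off; for reference, the documented conversion has roughly this shape (the paths are the placeholders used in the reply, not real locations):

```sql
-- Convert the Parquet files in place to Delta format (placeholder path):
CONVERT TO DELTA parquet.`/data-pipeline/`;

-- Then register a table over the converted location:
CREATE TABLE events USING DELTA LOCATION '/data-pipeline/';
```

`CONVERT TO DELTA` rewrites only the transaction log, not the data files, so it is typically cheap relative to rewriting the table.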
ilarsen
by Contributor
  • 707 Views
  • 0 replies
  • 1 kudos

Trouble referencing a column that has been added by schema evolution (Auto Loader with Delta Live Tables)

Hi, I have a Delta Live Tables pipeline, using Auto Loader, to ingest from JSON files. I need to do some transformations - in this case, converting timestamps. Except one of the timestamp columns does not exist in every file. This is causing the DLT p...

serg-v
by New Contributor III
  • 1753 Views
  • 3 replies
  • 0 kudos

Running large window spark structured streaming aggregations with small slide duration

I want to run aggregations on large windows (90 days) with a small slide duration (5 minutes). The straightforward solution leads to giant state, around hundreds of gigabytes, which doesn't look acceptable. Are there any best practices for doing this? Now I conside...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Sergey Volkov, thanks for your question. Here are some fantastic articles on EWMA and event-time aggregation in Apache Spark's Structured Streaming. Please have a look, and let us know if that helps. https://towardsdatascience.com/time-series-from-s...

2 More Replies
SailajaB
by Valued Contributor III
  • 1590 Views
  • 2 replies
  • 8 kudos

Resolved! How to restrict Azure users from using "Launch Workspace" to log in to the ADB workspace as admin when the user has the Owner or Contributor role

Hi, is there any way to disable the "Launch Workspace" option in the Azure portal for ADB? We grant user access at the resource-group level, so we need to restrict users who are part of the Owner or Contributor role from launching the ADB workspace as admin. Thank you.

Latest Reply
none_ranjeet
New Contributor III
  • 8 kudos

Deny Assignments don't block a subscription Contributor from launching the workspace and becoming admin. Actually, I haven't found any way to block that after many tries of different methods.

1 More Replies
Malcoln_Dandaro
by New Contributor
  • 1524 Views
  • 0 replies
  • 0 kudos

Is there any way to navigate/access cloud files using the direct abfss URI (no mount) with default python functions/libs like open() or os.listdir()?

Hello, today on our workspace we access everything via mount points. We plan to change to "abfss://" for security, governance, and performance reasons. The problem is that sometimes we interact with files using "Python only" code, and apparently ...

danny_edm
by New Contributor
  • 584 Views
  • 0 replies
  • 0 kudos

collect_set weird result when Photon enabled

Cluster: DBR 10.4 LTS with Photon
Sample schema: seq_no (decimal), type (string)
Sample data (seq_no, type): (1, A), (1, A), (2, A), (2, B), (2, B)
Command: F.size(F.collect_set(F.col("type")).over(Window.partitionBy("seq_no")))...

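The expected semantics of the expression in the post can be sanity-checked with a plain-Python stand-in (the sample rows below are taken from the post's data; only the window expression is being imitated):

```python
from collections import defaultdict

# Plain-Python stand-in for F.size(F.collect_set("type")).over(Window.partitionBy("seq_no")):
# each row receives the number of distinct `type` values within its seq_no partition.
rows = [(1, "A"), (1, "A"), (2, "A"), (2, "B"), (2, "B")]

distinct_per_seq = defaultdict(set)
for seq_no, typ in rows:
    distinct_per_seq[seq_no].add(typ)

# Every row of seq_no 1 should see a count of 1; every row of seq_no 2, a count of 2.
result = [(seq_no, typ, len(distinct_per_seq[seq_no])) for seq_no, typ in rows]
print(result)
```

If the Photon-enabled cluster returns anything other than these per-partition distinct counts, that would support the bug report.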
Mamdouh_Dabjan
by New Contributor III
  • 3386 Views
  • 6 replies
  • 2 kudos

Importing a large csv file into databricks free

Basically, I have a large CSV file that does not fit in a single worksheet; I can only use it in Power Query. I am trying to import this file into my Databricks notebook. I imported it and created a table using that file. But when I saw the table, i...

Latest Reply
weldermartins
Honored Contributor
  • 2 kudos

Hello, if you manually open one of the parts of the CSV file, does the view look different?

5 More Replies
yannickmo
by New Contributor III
  • 5497 Views
  • 8 replies
  • 14 kudos

Resolved! Adding JAR from Azure DevOps Artifacts feed to Databricks job

Hello, we have some Scala code which is compiled and published to an Azure DevOps Artifacts feed. The issue is we're now trying to add this JAR to a Databricks job (through Terraform) to automate the creation. To do this I'm trying to authenticate using ...

Latest Reply
alexott
Valued Contributor II
  • 14 kudos

As of right now, Databricks can't use non-public Maven repositories, as the resolving of Maven coordinates happens in the control plane. That's different from the R and Python libraries. As a workaround, you may try to install libraries via an init script or ...

7 More Replies
User16752245312
by New Contributor III
  • 4546 Views
  • 2 replies
  • 2 kudos

How can I automatically capture the heap dump on the driver and executors in the event of an OOM error?

If you have a job that repeatedly runs into an out-of-memory (OOM) error, either on the driver or the executors, automatically capturing the heap dump on the OOM event will help you debug the memory issue and identify the cause of the error. Spark config: spark.execu...

Latest Reply
John_360
New Contributor II
  • 2 kudos

Is it necessary to use exactly that HeapDumpPath? I find I'm unable to get driver heap dumps with a different path but otherwise the same configuration. I'm using spark_version 10.4.x-cpu-ml-scala2.12.

1 More Replies
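The Spark config quoted in the post is truncated; the usual shape of this setting is the standard JVM heap-dump flags passed through the extra-Java-options properties. The dump path below is an assumption (any directory writable by the JVM), and as John_360's reply suggests, the exact path can matter on Databricks:

```
spark.driver.extraJavaOptions -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dbfs/heapDumps
spark.executor.extraJavaOptions -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dbfs/heapDumps
```

These go in the cluster's Spark config; the JVM then writes an `.hprof` file to the given path when an OutOfMemoryError is thrown.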

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group