Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

MadelynM
by Databricks Employee
  • 3198 Views
  • 2 replies
  • 4 kudos

Resolved! Why isn't my notebook search function working?

My search function is broken. I can't search for notebook contents.

Latest Reply
lizou
Contributor II
  • 4 kudos

Here is a tool available: elsevierlabs-os/NotebookDiscovery, a Notebook Discovery Tool for Databricks notebooks (github.com). See also the post "How to Catalog and Discover Your Databricks Notebooks Faster" on the Databricks Blog.

1 More Reply
Prabakar
by Databricks Employee
  • 2210 Views
  • 0 replies
  • 2 kudos

Accessing the regions that are disabled by default in AWS from Databricks

Accessing the regions that are disabled by default in AWS from Databricks. In AWS we have 4 regions that are disabled by default. You must first enable them before you can create and manage resources. The following Regions are disabled by default: Africa...

Jreco
by Contributor
  • 13726 Views
  • 13 replies
  • 3 kudos

Event hub streaming improve processing rate

Hi all, I'm working with Event Hubs and Databricks to process and enrich data in real time. Doing a "simple" test, I'm getting some weird values (input rate vs processing rate) and I think I'm losing data: as you can see, there is a peak with 5k record...

Latest Reply
jose_gonzalez
Databricks Employee
  • 3 kudos

Hi @Jhonatan Reyes​, how many Event Hubs partitions are you reading from? Your micro-batch takes a few milliseconds to complete, which I think is a good time, but I would like to understand better what you are trying to improve here. Also, in this case ...

12 More Replies
BigJay
by New Contributor II
  • 5261 Views
  • 5 replies
  • 5 kudos

Capture num_affected_rows in notebooks

If I run some code, say for an ETL process to migrate data from bronze to silver storage, when a cell executes it reports num_affected_rows in a table format. I want to capture that and log it in my logger. Is it stored in a variable or syslogged som...

Latest Reply
-werners-
Esteemed Contributor III
  • 5 kudos

Good one Dan! I never thought of using the Delta API for this, but there you go.
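
For reference, a minimal sketch of that Delta API approach: reading operationMetrics from the table's commit history (PySpark; the table name is hypothetical, and a Databricks notebook with the ambient spark session is assumed).

```
from delta.tables import DeltaTable

# Hypothetical target table for illustration.
tbl = DeltaTable.forName(spark, "silver.events")

# history(1) describes the most recent commit; operationMetrics is a
# map column with counts such as numOutputRows (for writes) or
# numTargetRowsInserted/Updated/Deleted (for MERGE).
row = tbl.history(1).select("operation", "operationMetrics").collect()[0]
metrics = row["operationMetrics"]

print(f"{row['operation']} wrote {metrics.get('numOutputRows')} rows")
```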

4 More Replies
xiaozy
by New Contributor
  • 1541 Views
  • 1 reply
  • 1 kudos
Latest Reply
Prabakar
Databricks Employee
  • 1 kudos

Hi @xiaojun wang​, please check the blog and let us know if this helps you: https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html
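
Since the original question text isn't shown, for context: the linked post introduces window functions in Spark SQL. A minimal PySpark sketch (the data and column names are made up for illustration):

```
from pyspark.sql import functions as F
from pyspark.sql.window import Window

df = spark.createDataFrame(
    [("A", 1, 10.0), ("A", 2, 20.0), ("B", 1, 5.0), ("B", 2, 15.0)],
    ["category", "seq", "revenue"],
)

# Running total per category, ordered by seq.
w = Window.partitionBy("category").orderBy("seq")
df.withColumn("running_total", F.sum("revenue").over(w)).show()
```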

Frankooo
by New Contributor III
  • 7488 Views
  • 8 replies
  • 7 kudos

How to optimize exporting dataframe to delta file?

Scenario: I have a dataframe that has 5 billion records/rows and 100+ columns. Is there a way to write this in Delta format efficiently? I have tried to export it but cancelled it after 2 hours (the write didn't finish) as this processing time is not ...

Latest Reply
jose_gonzalez
Databricks Employee
  • 7 kudos

Hi @Franco Sia​, I would recommend avoiding repartition(50); instead, enable optimized writes on your Delta table. You can find more details here. Enable optimized writes and auto compaction on your Delta table. Use AQE (docs here) to have eno...
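
A rough sketch of those suggestions (the table name is hypothetical; the Delta table properties and AQE settings are the documented ones):

```
# Enable optimized writes and auto compaction on the target Delta table
# instead of hand-tuning repartition():
spark.sql("""
    ALTER TABLE my_db.my_table SET TBLPROPERTIES (
        'delta.autoOptimize.optimizeWrite' = 'true',
        'delta.autoOptimize.autoCompact'   = 'true'
    )
""")

# Let Adaptive Query Execution size shuffle partitions at runtime.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
```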

7 More Replies
dbu_spark
by New Contributor III
  • 7688 Views
  • 10 replies
  • 6 kudos

Older Spark Version loaded into the spark notebook

I have the Databricks runtime for a job set to the latest 10.0 Beta (includes Apache Spark 3.2.0, Scala 2.12). In the notebook when I check the Spark version, I see version 3.1.0 instead of version 3.2.0. I need Spark version 3.2 to process workloads a...

Latest Reply
jose_gonzalez
Databricks Employee
  • 6 kudos

Hi @Dhaivat Upadhyay​, good news: DBR 10 was released yesterday, October 20th. You can find more details on the release notes website.

9 More Replies
D3nnisd
by New Contributor III
  • 20988 Views
  • 15 replies
  • 6 kudos

Resolved! BufferHolder Exceeded on Json flattening

On Databricks, we use the following code to flatten JSON in Python. The data is from a REST API:
```
df = spark.read.format("json").option("header", "true").option("multiline", "true").load(SourceFileFolder + sourcetable + "*.json")
df2 = df.select(psf....
```

Latest Reply
Dan_Z
Databricks Employee
  • 6 kudos

@Dennis D​, what's happening here is that more than 2 GB (2147483648 bytes) is being loaded into a single column value. This is a hard limit for serialization. This KB article addresses it. The solution would be to find some way to have this loaded ...
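
One way to spread that payload over separate rows, sketched under the assumption that each file wraps one huge top-level array in a field (here called records, which is hypothetical): exploding the array gives each element its own row instead of one oversized column value.

```
from pyspark.sql import functions as psf

df = (spark.read.format("json")
      .option("multiline", "true")
      .load(SourceFileFolder + sourcetable + "*.json"))

# Explode the (assumed) top-level array so each element becomes its own
# row, then promote the struct fields to columns.
df2 = df.select(psf.explode("records").alias("r")).select("r.*")
```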

14 More Replies
Erik
by Valued Contributor III
  • 1905 Views
  • 4 replies
  • 3 kudos

Feature request: show Databricks SQL database and table comments in Power BI

Feature request: It is possible to add comments to both Databricks SQL databases and tables. It would be really useful if these comments could show up (when provided) in Power BI when one connects to the Databricks SQL endpoint, e.g. in this w...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

Nice idea!

3 More Replies
tarente
by New Contributor III
  • 3797 Views
  • 6 replies
  • 5 kudos

Resolved! How to implement the where not exists pattern in scala?

I have a dataframe with the following columns: Key1, Key2, Y_N_Col, Col1, Col2. For the key tuple (Key1, Key2), I have rows with Y_N_Col = "Y" and Y_N_Col = "N". I need a new dataframe with all rows with Y_N_Col = "Y" (regardless of the key tuple), plus all Y_N_...

  • 3797 Views
  • 6 replies
  • 5 kudos
Latest Reply
-werners-
Esteemed Contributor III
  • 5 kudos

I'd use a left-anti join. Create a df with all the Y rows, then a df with all the N rows, and do a left_anti join (on Key1 and Key2) against the Y df. Then union those two, as in the sketch below.
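
A sketch of that approach (shown in PySpark for brevity; the same left_anti join and union calls exist on the Scala Dataset API):

```
from pyspark.sql import functions as F

# df has columns Key1, Key2, Y_N_Col, Col1, Col2 (from the question).
df_y = df.filter(F.col("Y_N_Col") == "Y")
df_n = df.filter(F.col("Y_N_Col") == "N")

# Keep only the N rows whose (Key1, Key2) never appears among the Y
# rows, then union them with all the Y rows.
df_n_only = df_n.join(df_y, on=["Key1", "Key2"], how="left_anti")
result = df_y.unionByName(df_n_only)
```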

5 More Replies
Programming_Sch
by New Contributor
  • 717 Views
  • 0 replies
  • 0 kudos


What is the future of AWS? The future of AWS is very promising. So, if you are thinking of a cloud career or want to switch your position to something related to the cloud, I would highly recommend going for AWS training. No matter what field you ...

User16826992666
by Valued Contributor
  • 1194 Views
  • 1 reply
  • 0 kudos

If data from a Delta table is cached in Databricks SQL and the table is altered in the backend, does it invalidate the cache?

Basically I'm worried about the scenario where data that gets cached on Databricks SQL endpoints becomes out of sync with the source Delta table. If that were to happen and data was read from the cache it would be out of date/incorrect. Is this a con...

Latest Reply
mathan_pillai
Databricks Employee
  • 0 kudos

There are 3 types of caching: (1) Databricks SQL UI caching, (2) query results caching, and (3) Delta caching. (1) does not get invalidated; it's like your BI dashboard, which needs to be manually refreshed. (2) and (3) get auto-invalidated. Pls check...

nlee
by New Contributor
  • 3392 Views
  • 1 reply
  • 1 kudos

Resolved! How to create a temporary file with sql

What are the commands to create a temporary file with SQL?

Latest Reply
mathan_pillai
Databricks Employee
  • 1 kudos

In Spark SQL, you could use commands like "INSERT OVERWRITE DIRECTORY" that indirectly create a temporary file with the data: https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-syntax-dml-insert-overwrite-directory.html#example...
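
A minimal sketch of the linked syntax, run from a notebook (the path, file format, and source table are hypothetical):

```
spark.sql("""
    INSERT OVERWRITE DIRECTORY '/tmp/my_export'
    USING parquet
    SELECT * FROM my_db.my_table
""")
```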

