Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Saf4Databricks
by Contributor
  • 1139 Views
  • 8 replies
  • 0 kudos

Alternative of spark.sql.globalTempDatabase

Question: Since I'm using Databricks Free Edition, which only offers serverless compute, I cannot use spark.sql.globalTempDatabase in my code below. What's an alternative solution for the Caller_Notebook below? The following error occurred in the second line...

Latest Reply
balajij8
Contributor III
  • 0 kudos

Hi @Saf4Databricks, you can use the approach below.
Called_Notebook:
spark.range(5).toDF("MyCol").createOrReplaceTempView("MyView")
Caller_Notebook:
%run "./Notebook"
display(table("MyView"))
Output (column MyCol): 0, 1, 2, 3, 4

  • 0 kudos
7 More Replies
Subash
by New Contributor III
  • 7844 Views
  • 9 replies
  • 6 kudos

DBFS File browser option is not visible in Databricks Community Edition

I am not able to see the DBFS File Browser option, so I cannot enable it. Can someone help me get that option?

Latest Reply
kekanedbricks
New Contributor II
  • 6 kudos

DBFS File browser is not visible. Kindly help to enable it please. kekanedbricks@gmail.com and kekaneshrikant@gmail.com account.

  • 6 kudos
8 More Replies
SethuSrinivasan
by New Contributor II
  • 41220 Views
  • 1 replies
  • 2 kudos

Requesting support for "SELECT TOP n from Table"

In a notebook, it looks like I can rely on the "LIMIT" keyword when I need to select the top N rows. It would be nice if you could support "TOP" as well. The current approach to select 10 rows: select * from table1 LIMIT 10. Requesting TOP support: SELECT TOP 10 *...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

@SethuSrinivasan , Looks like this one slipped through the cracks — apologies if you've long since moved on, but posting anyway in case it helps someone hitting the same wall. In Databricks SQL, SELECT TOP n doesn't exist. You get the same result wit...

  • 2 kudos
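
The LIMIT versus TOP difference discussed in this thread can be tried locally. The sketch below uses Python's built-in sqlite3 module purely as a stand-in, on the assumption that SQLite behaves like the thread describes for Databricks SQL on this one point (LIMIT is accepted, TOP is not). The table name table1 comes from the question; the id column and its contents are made up, and the exact error raised by Databricks will differ.

```python
import sqlite3

# In-memory SQLite database used only as a local stand-in (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?)", [(i,) for i in range(20)])

# LIMIT caps the number of returned rows; ORDER BY makes "top" well defined.
limited = conn.execute("SELECT * FROM table1 ORDER BY id LIMIT 10").fetchall()

# The T-SQL style TOP clause is rejected by the parser.
try:
    conn.execute("SELECT TOP 10 * FROM table1")
    top_supported = True
except sqlite3.OperationalError:
    top_supported = False

print(len(limited), top_supported)
```

Pairing LIMIT with an ORDER BY is what makes the result a deterministic "top n" rather than an arbitrary n rows.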
RisabhRawat
by New Contributor
  • 1033 Views
  • 2 replies
  • 1 kudos

Resolved! Spark Streaming – Old file not processed with new checkpoint and new output path

Hi everyone, I am a Data Engineer and currently practicing Spark Streaming in Databricks. I am trying to understand how file streaming behaves with checkpoints and how Spark detects new files. My setup: Source folder: /Volumes/workspace/streaming/streamI...

Latest Reply
balajij8
Contributor III
  • 1 kudos

The checkpoint tracks the structured streaming information including state information and processed records. When you change to a new checkpoint location, the next run begins fresh. You can create a different Delta file with a new checkpoint & new o...

  • 1 kudos
1 More Replies
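
The checkpoint behaviour described in the reply above can be pictured with a toy model in plain Python. This is only a sketch of the idea that a checkpoint remembers what was already processed; it is not how Spark stores checkpoints, it ignores source options, and the file names and checkpoint paths are invented for illustration.

```python
from typing import Dict, List, Set

# Toy stand-in for checkpoint storage: checkpoint path -> files already processed.
checkpoints: Dict[str, Set[str]] = {}


def run_stream(source_files: List[str], checkpoint_path: str) -> List[str]:
    """Return the files this run would process, then record them as processed."""
    processed = checkpoints.setdefault(checkpoint_path, set())
    new_files = [name for name in source_files if name not in processed]
    processed.update(new_files)
    return new_files


files = ["day1.csv"]
first_run = run_stream(files, "/chk/a")   # nothing recorded yet
files.append("day2.csv")
second_run = run_stream(files, "/chk/a")  # same checkpoint: only the new file
fresh_run = run_stream(files, "/chk/b")   # new checkpoint: starts from scratch

print(first_run, second_run, fresh_run)
```

In this model the second run with the same checkpoint skips day1.csv, while the run against a new checkpoint path has no memory and sees both files.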
MikeGo
by Valued Contributor
  • 7434 Views
  • 2 replies
  • 0 kudos

How Databricks assigns memory and cores

Hi team, we are using a job cluster with node type 128G memory + 16 cores for a workflow. From the documentation we know that one worker is one node and is one executor. From the Spark UI environment tab we can see that spark.executor.memory is 24G, and from metrics we can see the m...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Databricks allocates resources to executors on a node based on several factors, and it appears that your cluster configuration is using default settings since no specific Spark configurations were provided. Executor Memory Allocation: The spark.exec...

  • 0 kudos
1 More Replies
sandeephenkel23
by New Contributor III
  • 2909 Views
  • 2 replies
  • 1 kudos

StringIndexer method fails with shared compute

Dear Team, the StringIndexer method of the mlflow library works when the code runs on a Databricks cluster in No Isolation Shared access mode, but it fails on a Unity Catalog enabled Databricks cluster in Shared access mode. Here is the library name: from...

Latest Reply
vivadiva1981
New Contributor II
  • 1 kudos

So can we not run Spark ML in the Databricks Free Edition, which uses only serverless compute?

  • 1 kudos
1 More Replies
hdu
by New Contributor III
  • 541 Views
  • 2 replies
  • 0 kudos

Resolved! Apply expectations conditionally in SLDP

Hi folks, the following code runs as expected, and all three rules are validated.
    @dp.view(name=f"v_validate_source_{table}")
    @dp.expect_all_or_drop({"201-Data row":"row_cnt > 0"})
    @dp.expect_all_or_drop({
        "101-One footer row" : "foo...

Latest Reply
mauriciofh
New Contributor III
  • 0 kudos

Great question. With decorators, you cannot place them inside an if block the way you wrote. Decorators are applied when the function is defined. The clean way is to build the expectations dictionary first, then apply one decorator: rules = { "101-...

  • 0 kudos
1 More Replies
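
The "build the dictionary first, then apply one decorator" pattern from the reply above can be sketched in plain Python. Everything here is illustrative: expect_all_or_drop is a hypothetical stand-in that only records the rules it receives and is not the pipelines API, and the include_optional flag and the "900-Optional rule" entry are made up. Only the "201-Data row" rule is taken from the question.

```python
from typing import Callable, Dict


def expect_all_or_drop(rules: Dict[str, str]) -> Callable:
    """Hypothetical stand-in decorator: attaches the rules to the function."""
    def decorator(func: Callable) -> Callable:
        func.rules = dict(rules)  # record which expectations were applied
        return func
    return decorator


def build_rules(include_optional: bool) -> Dict[str, str]:
    """Assemble the expectations before any function is defined."""
    rules = {"201-Data row": "row_cnt > 0"}
    if include_optional:
        # The condition lives in the dictionary, not around a decorator.
        rules["900-Optional rule"] = "optional_cnt = 1"  # made-up rule
    return rules


def make_view(include_optional: bool) -> Callable:
    # The decorator runs once, here, at the moment validate_source is defined.
    @expect_all_or_drop(build_rules(include_optional))
    def validate_source() -> str:
        return "source rows"
    return validate_source


with_optional = make_view(include_optional=True)
without_optional = make_view(include_optional=False)
print(sorted(with_optional.rules))
print(sorted(without_optional.rules))
```

Because the decorator argument is an ordinary expression evaluated at definition time, any branching can happen while the dictionary is built, and the function itself keeps a single decorator.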
ItalSess_5094
by New Contributor II
  • 729 Views
  • 3 replies
  • 2 kudos

Resolved! Photon not used for the filter step (falls back to COLUMNAR_TO_ROW → FILTER_EXEC in JVM)

We have a custom python notebook used to handle data loading. In this case, it's for a full overwrite of specific partitions. The notebook determines columns to use for the update based on incoming data. It creates a replace condition like this: repl...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

Thanks for the feedback @ItalSess_5094, do me a favor and click on "Accept as Solution" if you are satisfied with my response. It will be helpful to others here in the community. Thanks, Louis.

  • 2 kudos
2 More Replies
Saf4Databricks
by Contributor
  • 992 Views
  • 6 replies
  • 1 kudos

Resolved! Why my calling notebook is not receiving the value of a variable in called notebook?

Remarks: I thought you could use the %run command to make variables defined in one notebook available in another. The %run command executes the specified notebook inline within the current notebook's session, so all functions, variables, and DataFrames def...

Latest Reply
Saf4Databricks
Contributor
  • 1 kudos

Hi @Ashwin_DSA, thank you for pointing out the cause of the error. This post can now be locked/closed.

  • 1 kudos
5 More Replies
Dhruv-22
by Contributor III
  • 2419 Views
  • 10 replies
  • 0 kudos

Merge with schema evolution fails because of upper case columns

The following is a minimal reproducible example of what I'm facing right now.
%sql
CREATE OR REPLACE TABLE edw_nprd_aen.bronze.test_table ( id INT );
INSERT INTO edw_nprd_aen.bronze.test_table VALUES (1);
SELECT * FROM edw_nprd_aen.bronze.test_tab...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @Dhruv-22, I did check with our product teams and they agree with what I wrote above. They suggest that, if you have a support contract, you open a ticket about it. They are aware of this behavior and the workaround needed. However, they haven't seen this af...

  • 0 kudos
9 More Replies
cdn_yyz_yul
by Contributor II
  • 2545 Views
  • 13 replies
  • 4 kudos

Resolved! unionbyname several streaming dataframes of different sources

Is the following type of union safe with Spark Structured Streaming: a union of multiple streaming DataFrames, each from a different source? Is there a better solution? For example, df1 = spark.readStream.table(f"{bronze_catalog}.{bronze_schema}.table1") ...

Latest Reply
cdn_yyz_yul
Contributor II
  • 4 kudos

Thanks @Kirankumarbs and @SteveOstrowski. You have provided very useful information.

  • 4 kudos
12 More Replies
ChrisHunt
by New Contributor III
  • 1499 Views
  • 2 replies
  • 1 kudos

Resolved! How to stop Databricks adding quotes to multi-line selections

I'm using a query to generate some YML code from my tables, and running into an annoying behaviour. Here's a simplified example...
Run this query in a notebook or the SQL editor:
SELECT 'foo\nbar' FROM system.information_schema.tables LIMIT 10
You get a...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @ChrisHunt, As Ale_Armillotta mentioned, any field that contains a newline is wrapped in double quotes so that each row still represents a single CSV record. I think that is a logical and expected behaviour. There is currently no setting to turn t...

  • 1 kudos
1 More Replies
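
The quoting rule described in the reply above is a general CSV convention, and it can be reproduced with Python's standard csv module. This sketch shows that convention only; it makes no claim about how Databricks produces its output. The second field, "plain", is added here just for contrast.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer)  # default dialect, minimal quoting

# One record with a multi-line field and an ordinary field.
writer.writerow(["foo\nbar", "plain"])

raw = buffer.getvalue()
print(repr(raw))

# Reading it back: the quotes let the parser keep the newline inside one field.
rows = list(csv.reader(io.StringIO(raw, newline="")))
print(rows)
```

Only the field containing the newline is wrapped in double quotes. Without them, a reader could not tell whether the line break ends the record or belongs to the value.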
CodeInYellow
by New Contributor II
  • 722 Views
  • 2 replies
  • 2 kudos

Resolved! Pool Max Capacity and Cluster Creation

Hello, I have a theoretical question for which I have not been able to find a clear answer in the documentation. When a cluster is created using an instance pool, what exactly is checked when the pool is asked to provide nodes? More specifically, does t...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

Greetings @CodeInYellow , I did some research and here is what I found.  Your first scenario is correct: the pool checks actual current usage, not possible future usage across attached clusters. Using your example with a pool max capacity of 23: Clu...

  • 2 kudos
1 More Replies
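
The rule stated in the reply above, that the pool checks actual current usage rather than possible future usage, can be modelled in a few lines of Python. This is a toy model under stated assumptions: the Pool class and its request method are hypothetical, the request sizes are invented, and idle instances and autoscaling limits are not modelled. Only the max capacity of 23 comes from the thread.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Hypothetical model of an instance pool with a hard capacity cap."""
    max_capacity: int
    in_use: int = 0  # instances currently handed out to clusters

    def request(self, nodes: int) -> bool:
        """Grant the request only if current usage plus the request fits."""
        if self.in_use + nodes > self.max_capacity:
            return False
        self.in_use += nodes
        return True


pool = Pool(max_capacity=23)

# Invented request sizes; each check looks only at what is in use right now.
first = pool.request(10)   # 0 + 10 <= 23
second = pool.request(10)  # 10 + 10 <= 23
third = pool.request(5)    # 20 + 5 > 23, rejected, usage unchanged
fourth = pool.request(3)   # 20 + 3 <= 23

print(first, second, third, fourth, pool.in_use)
```

In this model a cluster's potential maximum size never enters the check, so a request succeeds as long as the nodes fit at the moment they are asked for.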
maikel
by Contributor III
  • 1056 Views
  • 6 replies
  • 4 kudos

Resolved! Job description

Hello! Is there a way to add a job description with some information, e.g. about what the parameters mean? Or can only the notebook that is the source of the job be used for that? Thank you!

Latest Reply
maikel
Contributor III
  • 4 kudos

OK! I found it:
resources:
  jobs:
    example_job:
      name: example_job${bundle.target}
      description: "y description"
Thanks a lot!

  • 4 kudos
5 More Replies
drag7ter
by Contributor
  • 13789 Views
  • 8 replies
  • 4 kudos

Resolved! foreachBatch doesn't work in structured streaming

I'm trying to print out the number of rows in the batch, but it doesn't seem to work properly. I have a 1-node compute-optimized cluster and run this code in a notebook: # Logging the row count using a streaming-friendly approach def log_row_count(batch_df, ba...

Latest Reply
Malthe
Valued Contributor II
  • 4 kudos

@szymon_dybczak in my testing, the print output does not appear anywhere. There is no trace of it, neither in the notebook nor in the driver logs.

  • 4 kudos
7 More Replies