Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

jenshumrich
by Contributor
  • 2655 Views
  • 3 replies
  • 1 kudos

Filter not using partition

I have the following code: spark.sparkContext.setCheckpointDir("dbfs:/mnt/lifestrategy-blob/checkpoints") result_df.repartitionByRange(200, "IdStation") result_df_checked = result_df.checkpoint(eager=True) unique_stations = result_df.select("IdStation...

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

It seems that a filter is being applied, according to this: Filter (isnotnull(IdStation#2678) AND (IdStation#2678 = 1119844)). I would like to share the following notebook, which covers this topic in detail, in case you would like to check it out h...

2 More Replies
EhsanSaba
by New Contributor
  • 6693 Views
  • 1 reply
  • 0 kudos

RocksDB results in empty stream/stream joins dataframe

Since we enabled RocksDB in our spark.conf, stream-to-stream joins/unions result in an empty DataFrame. Does anyone else have the same experience? It is on AWS. spark.conf.set("spark.sql.streaming.stateStore.providerClass","com.databricks.sql.streaming...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Did you also update the checkpoint? You might need to use a new checkpoint after you enable the RocksDB state store.

Brammer88
by New Contributor III
  • 2674 Views
  • 5 replies
  • 2 kudos

Trying to run Databricks Academy labs, but execution fails because the method to clear the cache is not whitelisted

Hi there, I'm trying to run DE 2.1 - Querying Files Directly on my workspace with the default cluster configuration found below, but I cannot seem to run this file (or any other labs) as it gives me this error message: Resetting the learning environme...

Latest Reply
Brammer88
New Contributor III
  • 2 kudos

Hi @Retired_mod and Databricks team, have you found another solution for this yet? Thanks, Bram

4 More Replies
chloeh
by New Contributor II
  • 830 Views
  • 0 replies
  • 0 kudos

Chaining window aggregations in SQL

In my SQL data transformation pipeline, I'm doing chained/cascading window aggregations: for example, I want to do average over the last 5 minutes, then compute average over the past day on top of the 5 minute average, so that my aggregations are mor...
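
The chaining described in this post can be illustrated outside SQL. The following is a minimal Python sketch, assuming tumbling (non-overlapping) windows and in-memory `(timestamp_seconds, value)` pairs; the function names and sample readings are hypothetical and do not come from the post. It shows that a daily average computed on top of 5-minute averages can differ from a direct daily average of the raw values.

```python
from collections import defaultdict
from statistics import mean

FIVE_MINUTES = 5 * 60        # window size in seconds
ONE_DAY = 24 * 60 * 60       # window size in seconds


def tumbling_average(points, window_seconds):
    """Average (timestamp, value) pairs per tumbling window.

    Returns a sorted list of (window_start, average) pairs.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        window_start = ts - (ts % window_seconds)
        buckets[window_start].append(value)
    return sorted((start, mean(vals)) for start, vals in buckets.items())


def chained_daily_average(points):
    """First average per 5 minutes, then average those results per day."""
    five_minute_avgs = tumbling_average(points, FIVE_MINUTES)
    return tumbling_average(five_minute_avgs, ONE_DAY)


# Hypothetical readings: two in the first 5-minute window, one in the
# second window, and one on the following day.
readings = [(0, 10.0), (60, 20.0), (300, 30.0), (86400, 50.0)]
print(chained_daily_average(readings))
```

In this sample, the first day has 5-minute averages of 15.0 and 30.0, so the chained daily average is 22.5, whereas averaging the three raw values directly would give 20.0. Whether that weighting is desirable depends on the pipeline.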

Fresher
by New Contributor II
  • 1018 Views
  • 0 replies
  • 0 kudos

Users are deleted/unsynced from Azure AD to Databricks

In Azure AD, it shows that users are synced to Databricks. But in Databricks, it shows that the user is not part of the group. The user is missing from only one group; he is still part of the remaining groups. All the syncing worked fine until yesterday. I don't know ...

Darian
by New Contributor II
  • 1352 Views
  • 2 replies
  • 0 kudos

Delta Live Tables pipeline gets a garbage collection error after running for a few days

Hi, I am using Delta Live Tables in continuous mode for a real-time streaming data pipeline. After running the pipeline for 2-3 days, I get this garbage collection error: Driver/10.15.0.73 paused the JVM process 68 seconds during the past 120 se...

Latest Reply
Darian
New Contributor II
  • 0 kudos

Here are the metrics: [image] The size/type: [image] Thanks!

1 More Replies
al_joe
by Contributor
  • 10942 Views
  • 5 replies
  • 3 kudos

Resolved! Split a code cell at cursor position? Add a cell above/below?

In JupyterLab notebooks: in edit mode, you can press Ctrl+Shift+Minus to split the current cell into two at the cursor position; in command mode, you can press A or B to add a cell above or below the current cell. Are there equivalent shortcuts...

Latest Reply
DavidKxx
Contributor
  • 3 kudos

What's the status of the ctrl-alt-minus shortcut for splitting a cell?  That keyboard combination does absolutely nothing in my interface (running Databricks via Chrome on GCP).

4 More Replies
Lazloo
by New Contributor III
  • 16596 Views
  • 6 replies
  • 4 kudos

databricks-connect version 13: spark-class2.cmd not found

I installed the newest version, "databricks-connect==13.0.0". Now I get this issue: Command C:\Users\Y\AppData\Local\pypoetry\Cache\virtualenvs\X-py3.9\Lib\site-packages\pyspark\bin\spark-class2.cmd"" not found (could not be found). Traceback...

Latest Reply
Susumu_Asaga
New Contributor II
  • 4 kudos

Use this code:
from databricks.connect import DatabricksSession
spark = DatabricksSession.builder.getOrCreate()

5 More Replies
Phani1
by Valued Contributor II
  • 1087 Views
  • 0 replies
  • 0 kudos

Databricks cell-level code parallel execution through the Python threading library

Hi Team, we are currently planning to implement Databricks cell-level parallel code execution through the Python threading library. We would like to understand how resources are consumed and allocated from the cluster. Are there any pot...
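
As a rough illustration of the approach this post asks about, here is a minimal sketch that uses `concurrent.futures.ThreadPoolExecutor` (a standard-library wrapper built on `threading`, swapped in here for brevity). `run_task` and the task names are hypothetical stand-ins for the work of individual cells; nothing below comes from the post itself. The threads all live in the single Python process that runs the notebook, so `max_workers` bounds how many tasks are in flight at once.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_task(name):
    # Hypothetical stand-in for the work one notebook cell would do,
    # for example triggering a query and waiting for its result.
    return f"{name}: done"


def run_in_parallel(task_names, max_workers=4):
    """Run run_task for every name on a thread pool and collect the results."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the task name that produced it.
        futures = {pool.submit(run_task, name): name for name in task_names}
        for future in as_completed(futures):
            # result() re-raises any exception thrown inside the task.
            results[futures[future]] = future.result()
    return results


print(run_in_parallel(["load_orders", "load_customers"]))
```

Collecting results through `future.result()` means a failure in one task surfaces in the calling code instead of being silently lost in a background thread.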

jitesh
by New Contributor
  • 801 Views
  • 0 replies
  • 0 kudos

Code reusability for silver table transformations

How should Databricks notebooks be organized, and how many should be created, to populate multiple silver Delta tables that all have different and complex transformations? What's the best practice: 1. create one notebook per silver table? 2. push SQL transformation logic ...

Ruby8376
by Valued Contributor
  • 1176 Views
  • 1 reply
  • 0 kudos

Databricks SQL warehouse has Serverless compute as a public preview.

There is a risk from an infosec perspective, as it is processed in the control plane shared with other Azure clients. Is there any control to mitigate the risk?

Latest Reply
PL_db
Databricks Employee
  • 0 kudos

You can find more information on that topic here. "With Databricks, your serverless workloads are protected by multiple layers of security. These security layers form the foundation of Databricks’ commitment to providing a secure and reliable environ...

astrobil
by New Contributor II
  • 940 Views
  • 1 reply
  • 0 kudos

Tab Stops Indenting in SQL Editor

I am utilizing Databricks via Azure, and I've been consistently experiencing an issue with the SQL Editor. The tab button, instead of indenting, redirects my cursor to seemingly random parts of the page. This problem has persisted since I began using...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Which DBR version are you using? Which web browser are you using?

kartikmnc
by New Contributor
  • 1074 Views
  • 1 reply
  • 1 kudos

Exam suspended midway without any reason

Hi Team, my Databricks Certified Data Engineer Associate exam was suspended on 17th December and is still in the in-progress state. I was continuously in front of the camera when an alert suddenly appeared, and the support person asked me to show the desk and ...

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Adding @Retired_mod for visibility on this request

Hubert-Dudek
by Esteemed Contributor III
  • 938 Views
  • 1 reply
  • 1 kudos

How much USD are you spending on Databricks?

Join two system tables and get exactly how much USD you are spending. The short version of the query: SELECT u.usage_date, u.sku_name, SUM(u.usage_quantity * p.pricing.default) AS total_spent, p.currency_code FROM system.billing....
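
The query preview above is cut off, so the following is only a sketch of the arithmetic it describes: multiply each usage quantity by a per-SKU price and sum per date and SKU. It runs on plain Python dictionaries; the rows, SKU names, and prices are made up for illustration, and the single price per SKU is a simplifying assumption that may not match the real join conditions.

```python
from collections import defaultdict


def total_spent(usage_rows, price_by_sku):
    """Sum usage_quantity * price, grouped by (usage_date, sku_name)."""
    totals = defaultdict(float)
    for row in usage_rows:
        price = price_by_sku[row["sku_name"]]  # stand-in for the join on SKU
        key = (row["usage_date"], row["sku_name"])
        totals[key] += row["usage_quantity"] * price
    return dict(totals)


# Hypothetical usage rows and list prices (not real billing data).
usage = [
    {"usage_date": "2024-01-01", "sku_name": "SKU_A", "usage_quantity": 2.0},
    {"usage_date": "2024-01-01", "sku_name": "SKU_A", "usage_quantity": 3.0},
    {"usage_date": "2024-01-02", "sku_name": "SKU_B", "usage_quantity": 4.0},
]
prices = {"SKU_A": 0.5, "SKU_B": 0.25}

print(total_spent(usage, prices))
```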

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Thank you for sharing this information @Hubert-Dudek 

Fresher
by New Contributor II
  • 791 Views
  • 1 reply
  • 0 kudos

Query is taking too long to run

I have two clusters: Cluster A (Spark cluster) and Cluster B (SQL warehouse). Whenever I run a particular query on Cluster B, it works fine, but whenever I run the same query on Cluster A, it takes a long time and never shows the output.

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Check the physical query plan of the query you are running. Also, check the Spark UI to identify where it is taking time and why.
