Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

pjp94
by Contributor
  • 11023 Views
  • 3 replies
  • 4 kudos

Resolved! Difference between DBFS and Delta Lake?

I would like a deeper dive/explanation into the difference. When I write to a table with the following code:
spark_df.write.mode("overwrite").saveAsTable("db.table")
The table is created and can be viewed in the Data tab. It can also be found in some DBF...

Latest Reply
-werners-
Esteemed Contributor III
  • 4 kudos

Tables in Spark, Delta Lake-backed or not, are basically just semantic views on top of the actual data. On Databricks, the data itself is stored in DBFS, which is an abstraction layer on top of the actual storage (like S3, ADLS, etc.). This can be parq...

MRH
by New Contributor II
  • 2621 Views
  • 4 replies
  • 4 kudos

Resolved! Simple Question

Does Spark SQL have both materialized and non-materialized views? With materialized views, it reads from cache for unchanged data, and only from the table for new/changed rows since the view was last accessed? Thanks!

Latest Reply
Anonymous
Not applicable
  • 4 kudos

AWESOME!

MichaelO
by New Contributor III
  • 13149 Views
  • 2 replies
  • 2 kudos

Resolved! Transfer files saved in filestore to either the workspace or to a repo

I built a machine learning model:
lr = LinearRegression()
lr.fit(X_train, y_train)
which I can save to the FileStore by:
filename = "/dbfs/FileStore/lr_model.pkl"
with open(filename, 'wb') as f:
    pickle.dump(lr, f)
Ideally, I wanted to save the model ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

The workspace and Repos are not fully available via DBFS because they have separate access rights. It is better to use MLflow for your models, as it is like Git but for ML. I think that with MLOps you can then also put your model into Git.
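As a minimal sketch of the pickle round trip described in the question, the snippet below saves a fitted model object to a file and loads it back. It assumes a tiny stand-in class, `MeanModel` (hypothetical, used in place of scikit-learn's `LinearRegression` so the example needs only the standard library), and writes to a temporary directory instead of `/dbfs/FileStore`. For tracked, versioned models, the MLflow route suggested in the reply is likely the better fit.

```python
import os
import pickle
import tempfile


class MeanModel:
    """Hypothetical stand-in for a fitted estimator: always predicts the mean of y_train."""

    def fit(self, X_train, y_train):
        self.mean_ = sum(y_train) / len(y_train)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def save_model(model, filename):
    # Serialize the fitted object to disk, as the question does with pickle.dump.
    with open(filename, "wb") as f:
        pickle.dump(model, f)


def load_model(filename):
    # Deserialize it again; only unpickle files you trust.
    with open(filename, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    lr = MeanModel().fit([[1], [2], [3]], [2.0, 4.0, 6.0])
    with tempfile.TemporaryDirectory() as tmp:
        # Assumed local path; the question used /dbfs/FileStore/lr_model.pkl on Databricks.
        filename = os.path.join(tmp, "lr_model.pkl")
        save_model(lr, filename)
        restored = load_model(filename)
        print(restored.predict([[10], [20]]))  # [4.0, 4.0]
```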

BorislavBlagoev
by Valued Contributor III
  • 5575 Views
  • 8 replies
  • 4 kudos

Resolved! Spark data limits

How much data is too much for Spark, and what is the best strategy to partition 2 GB of data?

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 4 kudos

2 GB is quite small, so the default settings are usually best (in most cases you get a better result by not setting anything like repartition and leaving everything to the Catalyst optimizer). If you want to set custom partitioning, please remember about avoidi...
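To make "2 GB is quite small" concrete, here is a back-of-the-envelope sketch that estimates how many input partitions a given data size splits into at a target partition size. The 128 MiB default is an assumption based on Spark's `spark.sql.files.maxPartitionBytes` setting for file sources; real split counts also depend on file layout, compression, and cluster parallelism, so treat the result as a rough estimate only.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3


def estimate_partitions(total_bytes, target_partition_bytes=128 * MIB):
    """Rough partition count if the data is cut into chunks of at most the target size.

    The 128 MiB default mirrors spark.sql.files.maxPartitionBytes (an assumption
    of this sketch); Spark's actual split logic has more inputs.
    """
    if total_bytes <= 0:
        return 0
    return math.ceil(total_bytes / target_partition_bytes)


if __name__ == "__main__":
    print(estimate_partitions(2 * GIB))             # 16
    print(estimate_partitions(2 * GIB, 256 * MIB))  # 8
```

Around 16 partitions of roughly 128 MiB each is consistent with the reply's advice: at this size there is usually little to gain from manual repartitioning unless skew or a specific output layout calls for it.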

User16844513407
by New Contributor III
  • 631 Views
  • 0 replies
  • 0 kudos

Hi everyone, my name is Jan and I'm a product manager working on Databricks Orchestration. We are excited to work with you to build the best Airfl...

Hi everyone, my name is Jan and I'm a product manager working on Databricks Orchestration. We are excited to work with you to build the best Airflow experience within Databricks. Feel free to ask or discuss anything around this integration!

Yogita
by New Contributor
  • 1272 Views
  • 1 replies
  • 0 kudos

Haven't received Databricks Certified Associate Developer for Apache Spark 3.0 certification yet?

I took my Spark 3.0 Associate Developer certification exam via the Webassessor site on 30th Dec 2021, and it said Passed, but I am still waiting to get the certificate and badge details from Databricks. Could you please look into this and provide me with the c...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hello, @Yogita Nesargi! We have an answer for you. Would you please check out these announcements? https://community.databricks.com/s/question/0D53f00001ebiUOCAY/databricks-courses https://community.databricks.com/s/feed/0D53f00001dq6W6CAI

caroline123
by New Contributor III
  • 6139 Views
  • 10 replies
  • 1 kudos

Resolved! Haven't received any updates on my certificate after more than one week

Hi team, I took the exam on Jan 14th and passed with a 91.66% score. I got an email right after the exam saying I should receive the certificate within one week. But it has been more than 10 days and I haven't heard anything from Databricks te...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hello, @Luwei Lei - We have an answer now. Please check out these announcements. https://community.databricks.com/s/question/0D53f00001ebiUOCAY/databricks-courses https://community.databricks.com/s/feed/0D53f00001dq6W6CAI

All_users_grou1
by New Contributor II
  • 1648 Views
  • 2 replies
  • 2 kudos

Resolved! Haven't received Databricks Certified Associate Developer for Apache Spark 3.0 certification yet

I took the exam on 04-01-2022 and passed with 80%, but I haven't received my certification yet. I had also raised a query regarding this; is there an update on request #00129864?

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Ayush Kumar Singh - We have an answer for you. Please check out these announcements. https://community.databricks.com/s/question/0D53f00001ebiUOCAY/databricks-courses https://community.databricks.com/s/feed/0D53f00001dq6W6CAI

CleverAnjos
by New Contributor III
  • 6429 Views
  • 5 replies
  • 3 kudos

Resolved! Best way of loading several CSV files into a table

What would be the best way of loading several files like these into a single table to be consumed? https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2019-10.csv https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2019-11.csv https://s3.amazonaws...

Latest Reply
CleverAnjos
New Contributor III
  • 3 kudos

Thanks, Kaniz. I already have the files; I was asking about the best way to load them.
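In PySpark, one common approach is to pass all the paths to a single reader call, for example `spark.read.csv(paths, header=True)`, and save the resulting DataFrame as one table. The standard-library sketch below shows the same idea in miniature: it reads several CSV sources that share a header and appends their rows into one table, refusing to mix files whose columns differ. The `combine_csv` helper and the in-memory sample data (including the column names) are hypothetical.

```python
import csv
import io


def combine_csv(sources):
    """Append rows from several CSV file-like objects that share one header (hypothetical helper)."""
    header = None
    rows = []
    for src in sources:
        reader = csv.reader(src)
        file_header = next(reader)
        if header is None:
            header = file_header
        elif file_header != header:
            # Mismatched schemas would silently misalign columns, so fail loudly.
            raise ValueError(f"header mismatch: {file_header} != {header}")
        rows.extend(reader)  # each remaining row is a list of strings
    return header, rows


if __name__ == "__main__":
    # Hypothetical stand-ins for two monthly trip files.
    oct_file = io.StringIO("vendor_id,fare\n1,10.5\n2,7.0\n")
    nov_file = io.StringIO("vendor_id,fare\n1,12.25\n")
    header, rows = combine_csv([oct_file, nov_file])
    print(header)  # ['vendor_id', 'fare']
    print(rows)    # [['1', '10.5'], ['2', '7.0'], ['1', '12.25']]
```

Checking the header up front plays the role that an explicit schema plays in Spark: it keeps files with drifting columns from being merged into one table unnoticed.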

BilalAslamDbrx
by Databricks Employee
  • 587 Views
  • 0 replies
  • 0 kudos

Hi everyone, I am a product manager in Amsterdam. I work on Jobs and Databricks SQL. In my spare time, I cook and send emojis...

Hi everyone, I am a product manager in Amsterdam. I work on Jobs and Databricks SQL. In my spare time, I cook and send emojis. Looking forward to working with this community!

Tarviha
by New Contributor
  • 672 Views
  • 0 replies
  • 0 kudos

Hi. I have been trying to sign up for Databricks for the past 3 hours, but it's giving me this error every time. I am not using any ad blocker or private mo...

Hi. I have been trying to sign up for Databricks for the past 3 hours, but it's giving me this error every time. I am not using any ad blocker or private mode. I tried to sign up using different web browsers and experienced the same issue. Is it still possi...

Databricks Failure
GoldenTuna
by New Contributor II
  • 4413 Views
  • 5 replies
  • 2 kudos

Resolved! Mounting an Azure Storage Account in a cluster init script?

We are trying to configure our environment so that when our cluster starts up, it checks whether we have mounted our Azure storage account container and, if it is not mounted, mounts it. We can do this fine in a notebook; however, we have had no luck doing this through an in...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@David Kruetzkamp - Would you be happy to mark whichever answer helped the most as best? That will help other members find the solution more quickly.
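The check-then-mount logic from the question can be written as an idempotent helper. This is a sketch under assumptions: in a Databricks notebook the two callables would be `dbutils.fs.mounts` (whose entries expose a `mountPoint` attribute) and `dbutils.fs.mount`, but here they are passed in as parameters so the logic runs without a cluster. The container URL and mount point are placeholders, and the sketch targets the notebook context the question says already works; it does not show how to run this from an init script.

```python
from collections import namedtuple


def ensure_mounted(mount_point, source, extra_configs, list_mounts, mount):
    """Mount `source` at `mount_point` only if nothing is mounted there yet.

    Assumption: in a notebook, list_mounts/mount would be dbutils.fs.mounts and
    dbutils.fs.mount; they are injected here so the logic is testable locally.
    Returns True if a new mount was created, False if it already existed.
    """
    if any(m.mountPoint == mount_point for m in list_mounts()):
        return False  # already mounted: do nothing
    mount(source=source, mount_point=mount_point, extra_configs=extra_configs)
    return True


if __name__ == "__main__":
    # Fake stand-in for dbutils.fs mount state, for illustration only.
    MountInfo = namedtuple("MountInfo", ["mountPoint", "source"])
    mounts = []

    def fake_mount(source, mount_point, extra_configs):
        mounts.append(MountInfo(mount_point, source))

    src = "wasbs://<container>@<account>.blob.core.windows.net"  # placeholder URL
    print(ensure_mounted("/mnt/data", src, {}, lambda: mounts, fake_mount))  # True
    print(ensure_mounted("/mnt/data", src, {}, lambda: mounts, fake_mount))  # False
```

Because the second call sees the existing mount point and returns early, running the same code on every cluster start would not fail on an "already mounted" error.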

Mirko
by Contributor
  • 4480 Views
  • 6 replies
  • 1 kudos

Resolved! Group vs User rights

I have a small question: how does the combination of group and user rights work? Is it like in Azure, that if I have, for example, Databricks SQL access through a (Databricks) group I am a member of, but Databricks SQL is not enabled for my personal account...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Mirko Ludewig - Good morning (or evening, depending on where you hail from), would you be happy to mark whichever answer resolved the problem for you as best? That helps other members find the solutions more quickly.
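The question assumes entitlements combine the way they do in Azure, where access granted through any group you belong to counts alongside what is granted to your own account. Under that additive assumption (not confirmed by the replies shown here), a user's effective entitlements are the union of their direct grants and their groups' grants, as in this sketch. The function, user, group, and entitlement names are all hypothetical.

```python
def effective_entitlements(user, direct_grants, group_members, group_grants):
    """Union of a user's direct entitlements and those of every group they belong to.

    Assumes a purely additive model with no deny rules; this is an illustration,
    not Databricks' documented resolution logic.
    """
    result = set(direct_grants.get(user, ()))
    for group, members in group_members.items():
        if user in members:
            result |= set(group_grants.get(group, ()))
    return result


if __name__ == "__main__":
    direct = {"user_a": {"workspace-access"}}         # hypothetical direct grant
    members = {"analysts": {"user_a", "user_b"}}      # hypothetical group membership
    grants = {"analysts": {"databricks-sql-access"}}  # hypothetical group grant
    print(sorted(effective_entitlements("user_a", direct, members, grants)))
    # ['databricks-sql-access', 'workspace-access']
```

In this model, `user_a` gets SQL access through the group even though it is not granted on the personal account, which is the behavior the question is asking about.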

KKDataEngineer
by New Contributor III
  • 1385 Views
  • 0 replies
  • 2 kudos

Spark Structured Streaming: an aggregation DataFrame with a watermark, written in append mode to a Delta table, is not writing the most recent aggregation to the Delta table even after crossing the watermark boundary. This is causing data loss

Team, I am struggling with a unique issue. I am not sure whether my understanding is wrong or this is a bug in Spark. I am reading a stream from Event Hubs (Extract), then pivoting and aggregating the above DataFrame (Transformation). This is a WATERMARKED...
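One likely explanation for the behavior described is how append mode works with watermarks: a window's aggregate is only emitted once the watermark (the maximum event time seen so far minus the allowed delay) moves past the end of that window, and the watermark only advances when newer events arrive. If the stream goes quiet, the last window stays buffered in state rather than being written. The simplified simulation below illustrates this. It is an assumption-laden model, not Spark's implementation: it treats each event as its own micro-batch, uses integer timestamps and tumbling windows, emits a window once the watermark reaches its end, and does not drop late events, so Spark's exact boundaries and batch timing will differ.

```python
def simulate_append_mode(event_times, window=10, delay=5):
    """Count events per tumbling window, emitting a window only once the watermark passes it.

    Simplified model: each event is its own micro-batch, and the watermark is
    max(event time seen) - delay. Late events are not dropped here.
    Returns (emitted windows as (start, count) pairs, still-buffered state).
    """
    state = {}  # window start -> running count
    emitted = []
    max_time = None
    for t in event_times:
        start = t - t % window
        state[start] = state.get(start, 0) + 1
        max_time = t if max_time is None else max(max_time, t)
        watermark = max_time - delay
        for s in sorted(state):  # sorted() copies the keys, so popping is safe
            if s + window <= watermark:
                emitted.append((s, state.pop(s)))
    return emitted, state


if __name__ == "__main__":
    # Window [10, 20) stays buffered: no later event pushes the watermark to 20.
    print(simulate_append_mode([1, 3, 12, 14, 16]))      # ([(0, 2)], {10: 3})
    # One newer event (t=25, so watermark 20) finally releases it.
    print(simulate_append_mode([1, 3, 12, 14, 16, 25]))  # ([(0, 2), (10, 3)], {20: 1})
```

If this matches the symptom, the final window's rows should appear after newer events advance the watermark, rather than being lost.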

