Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

gbrueckl
by Contributor II
  • 3581 Views
  • 6 replies
  • 4 kudos

Resolved! CREATE FUNCTION from Python file

Is it somehow possible to create an SQL external function using Python code? The examples only show how to use JARs: https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-syntax-ddl-create-function.html Something like: CREATE TEMPORAR...

Latest Reply
pts
New Contributor II
  • 4 kudos

As a user of your code, I'd find it a less pleasant API because I'd have to call some_module.some_func.some_func() rather than just some_module.some_func(). No reason to have "some_func" exist twice in the hierarchy. It's kind of redundant. If some_func is ...
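A minimal sketch of the naming point above (the package and function names are the hypothetical ones from the thread, simulated with SimpleNamespace so it runs outside any real package):

```python
from types import SimpleNamespace

# Suppose some_module/some_func.py defines a function also named some_func.
def some_func():
    return "result"

# Without a re-export, the caller sees the redundant path:
nested = SimpleNamespace(some_func=SimpleNamespace(some_func=some_func))
print(nested.some_func.some_func())  # -> result

# Re-exporting in some_module/__init__.py ("from .some_func import some_func")
# flattens the call to some_module.some_func():
flat = SimpleNamespace(some_func=some_func)
print(flat.some_func())  # -> result
```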

5 More Replies
pjp94
by Contributor
  • 7824 Views
  • 5 replies
  • 4 kudos

Resolved! Difference between DBFS and Delta Lake?

Would like a deeper dive/explanation into the difference. When I write to a table with the following code: spark_df.write.mode("overwrite").saveAsTable("db.table") The table is created and can be viewed in the Data tab. It can also be found in some DBF...

Latest Reply
-werners-
Esteemed Contributor III
  • 4 kudos

Tables in Spark, Delta Lake-backed or not, are basically just semantic views on top of the actual data. On Databricks, the data itself is stored in DBFS, which is an abstraction layer on top of the actual storage (like S3, ADLS, etc.). This can be parq...

4 More Replies
MichaelO
by New Contributor III
  • 10820 Views
  • 3 replies
  • 2 kudos

Resolved! Transfer files saved in filestore to either the workspace or to a repo

I built a machine learning model: lr = LinearRegression() lr.fit(X_train, y_train) which I can save to the filestore by: filename = "/dbfs/FileStore/lr_model.pkl" with open(filename, 'wb') as f: pickle.dump(lr, f) Ideally, I wanted to save the model ...

Latest Reply
Kaniz
Community Manager
  • 2 kudos

Hi @Michael Okelola, when you store the file in DBFS (/FileStore/...), it's in your account (data plane), while notebooks etc. are in the Databricks account (control plane). By design, you can't import non-code objects into a workspace. But Repos ...
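As a sketch of the DBFS side of this, pickling any Python object to a file path works the same way; on Databricks the path would start with /dbfs/FileStore/ as in the question, while here a temporary directory and a dict stand in for the mount point and the fitted model so the example runs anywhere:

```python
import os
import pickle
import tempfile

# Stand-in for the fitted LinearRegression model from the question.
model = {"coef": [0.5, 1.2], "intercept": 0.1}

# On Databricks this would be: path = "/dbfs/FileStore/lr_model.pkl"
path = os.path.join(tempfile.mkdtemp(), "lr_model.pkl")

with open(path, "wb") as f:
    pickle.dump(model, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == model)  # -> True
```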

2 More Replies
GC-James
by Contributor II
  • 1409 Views
  • 4 replies
  • 3 kudos

Resolved! Working locally then moving to databricks

Hello Databricks, struggling with a workflow issue and wondering if anyone can help. I am developing my project in R and sometimes Python locally on my laptop, and committing the files to a git repo. I can then clone that repo in Databricks and *see*...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

This is a separate script which then needs to be run from a notebook (or job). I am not using R, but in Python and Scala it works the same. In Python I just import it in the notebook ("from folder_structure import myClass"); in R it is probably similar. There ...
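The import pattern described above can be sketched as follows; the file name and class name are the hypothetical ones from the reply, and the script file is created here by hand so the sketch is self-contained (in Databricks Repos the repo root is usually already on sys.path):

```python
import os
import sys
import tempfile

# Stand-in for a plain .py file committed to the repo next to the notebooks.
repo_dir = tempfile.mkdtemp()
with open(os.path.join(repo_dir, "folder_structure.py"), "w") as f:
    f.write("class myClass:\n    def greet(self):\n        return 'hello'\n")

# Locally we add the repo directory to sys.path by hand.
sys.path.insert(0, repo_dir)

from folder_structure import myClass  # the same line you would use in a notebook

print(myClass().greet())  # -> hello
```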

3 More Replies
al_joe
by Contributor
  • 1493 Views
  • 4 replies
  • 3 kudos

Resolved! Execute a notebook cell with a SINGLE mouse-click?

Currently it takes two mouse-clicks to execute each cell in a DB notebook. I know there is a keyboard shortcut (Ctrl+Enter) to execute the current cell. But is there a way to execute a cell with a single mouse-click? I could use a Greasemonkey script or ...

Latest Reply
-werners-
Esteemed Contributor III
  • 3 kudos

Simple answer: no.

3 More Replies
Mirko
by Contributor
  • 1611 Views
  • 3 replies
  • 0 kudos

Resolved! Location for DB and for specific tables in DB

The following situation: I am creating a database with a location somewhere in my Azure Data Lake Gen2. CREATE SCHEMA IF NOT EXISTS curated LOCATION 'somelocation' Then I want a specific table in curated to be in a subfolder of 'somelocation': CREATE TABLE IF...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Mirko Ludewig - Thanks for letting us know. I don't like strange all that much, but I do like working as desired!

2 More Replies
Balaramya
by New Contributor II
  • 977 Views
  • 3 replies
  • 1 kudos

Hi Team, I have taken the Databricks Apache Spark 3.0 (Scala) exam on 25th January 2022 (IST 9AM to 11AM) and have passed it but still have not received m...

Hi Team, I have taken the Databricks Apache Spark 3.0 (Scala) exam on 25th January 2022 (IST 9AM to 11AM) and have passed it but still have not received my badge. I have contacted the support team twice but still no response. @Kaniz Fatma, kindly help to m...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @Balaramya Kadiyala, please go through the announcement below: https://community.databricks.com/s/feed/0D53f00001dq6W6CAI

2 More Replies
Mirko
by Contributor
  • 9056 Views
  • 12 replies
  • 2 kudos

Resolved! strange error with dbutils.notebook.run(...)

The situation is as follows: I have a scheduled job which uses dbutils.notebook.run(path, timeout). During the last week everything worked smoothly. During the weekend the job began to fail at the dbutils.notebook.run(path, timeout) command. I get th...

Latest Reply
User16753724663
Valued Contributor
  • 2 kudos

Hi @Florent POUSSEROT, apologies for the delay. Could you please confirm if you are still facing the issue?

11 More Replies
Jan_A
by New Contributor III
  • 3384 Views
  • 3 replies
  • 3 kudos

Resolved! How to include notebook dashboards in repos (github)?

Goal: I would like dashboards in notebooks to be added to repos (GitHub). When I commit and push changes to GitHub, the dashboard part is not included. Is there a way to include the dashboard in the repo? When I later pull, only the notebook code is...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

There is an API to get dashboards, so you would need to build a custom CI/CD deployment with a step that fetches the dashboard and dashboard elements through the API and then saves the returned JSON to git. You could also deploy a script to an Azure Function or AWS Lambda to d...

2 More Replies
MRH
by New Contributor II
  • 1485 Views
  • 4 replies
  • 4 kudos

Resolved! Simple Question

Does Spark SQL have both materialized and non-materialized views? With materialized views, it reads from cache for unchanged data, and only from the table for new/changed rows since the view was last accessed? Thanks!

Latest Reply
Anonymous
Not applicable
  • 4 kudos

AWESOME!

3 More Replies
BorislavBlagoev
by Valued Contributor III
  • 2841 Views
  • 8 replies
  • 4 kudos

Resolved! Spark data limits

How much data is too much for Spark, and what is the best strategy to partition 2GB of data?

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 4 kudos

2GB is quite small, so usually the default settings are best (in most cases the better result is not to set anything like repartition and to leave everything to the Catalyst optimizer). If you want to set custom partitioning: please remember about avoidi...
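As a back-of-envelope sketch of the sizing point above (the ~128 MB-per-partition target is a common rule of thumb, not a fixed Databricks constant):

```python
# Rough partition-count estimate for a 2 GB dataset.
data_size_mb = 2 * 1024        # 2 GB expressed in MB
target_partition_mb = 128      # rule-of-thumb target size per partition

num_partitions = max(1, data_size_mb // target_partition_mb)
print(num_partitions)  # -> 16
```

With only ~16 partitions' worth of data, the default parallelism is usually sufficient, which matches the advice to leave everything to the optimizer.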

7 More Replies
User16844513407
by New Contributor III
  • 347 Views
  • 0 replies
  • 0 kudos

Hi everyone, my name is Jan and I'm a product manager working on Databricks Orchestration. We are excited to work with you to build the best Airfl...

Hi everyone, my name is Jan and I'm a product manager working on Databricks Orchestration. We are excited to work with you to build the best Airflow experience within Databricks. Feel free to ask or discuss anything around this integration!
