Data Engineering

Forum Posts

boomerangairpla
by New Contributor
  • 196 Views
  • 0 replies
  • 0 kudos


Liftndrift is a paper airplane blog that helps the world learn paper airplanes through easy, simple illustrated instructions; we specialize in teaching how to make a paper airplane, including the world-record paper airplane.

wyzer
by Contributor II
  • 1088 Views
  • 3 replies
  • 2 kudos

Resolved! Are we using the advantage of "Map & Reduce" ?

Hello, we are new to Databricks and would like to know if our working method is good. Currently, we are working like this: spark.sql("CREATE TABLE Temp (SELECT avg(***), sum(***) FROM aaa LEFT JOIN bbb WHERE *** >= ***)") With this method, are we us...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Spark will handle the map/reduce for you. So as long as you use Spark-provided functions, be it in Scala, Python or SQL (or even R), you will be using distributed processing. You just care about what you want as a result. And afterwards when you are more...
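A minimal PySpark sketch of the point above: the same aggregation expressed in SQL and in the DataFrame API, both of which Spark distributes (table and column names are placeholders, not from the thread):

from pyspark.sql import functions as F

# SQL version, as in the original question (placeholder names)
df_sql = spark.sql("""
    SELECT avg(amount) AS avg_amount, sum(amount) AS sum_amount
    FROM aaa LEFT JOIN bbb ON aaa.id = bbb.id
    WHERE amount >= 100
""")

# Equivalent DataFrame API version -- also runs as distributed Spark jobs
df_api = (
    spark.table("aaa")
    .join(spark.table("bbb"), "id", "left")
    .where(F.col("amount") >= 100)
    .agg(F.avg("amount").alias("avg_amount"),
         F.sum("amount").alias("sum_amount"))
)

Either form compiles down to a distributed plan; df_sql.explain() and df_api.explain() would show comparable physical plans.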

2 More Replies
Databricks_7045
by New Contributor III
  • 1261 Views
  • 4 replies
  • 0 kudos

Resolved! Encapsulate Databricks Pyspark/SparkSql code

Hi All, I have custom code (PySpark & Spark SQL notebooks) which I want to deploy at a customer location and encapsulate so that end customers don't see the actual code. Currently we have all code in notebooks (PySpark/Spark SQL). Could you please l...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

With notebooks that is not possible. You can write your code in Scala/Java and build a jar, which you then run with spark-submit (example). Or use Python and deploy a wheel (example). This can become quite complex when you have dependencies. Also: a jar et...
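A minimal sketch of the wheel route the reply mentions (package and module names here are hypothetical):

# setup.py at the root of the project
from setuptools import setup, find_packages

setup(
    name="my_etl",               # hypothetical package name
    version="0.1.0",
    packages=find_packages(),    # picks up the my_etl/ package with the PySpark code
)

# Build the wheel:   python setup.py bdist_wheel
# Install it on the cluster, then the notebook shrinks to a thin entry point:
#   from my_etl.jobs import run_daily_load   # hypothetical module/function
#   run_daily_load(spark)

Note that a wheel still contains the .py source by default, so it obscures the code from casual view rather than truly hiding it.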

3 More Replies
ghiet
by New Contributor II
  • 1872 Views
  • 7 replies
  • 6 kudos

Resolved! Cannot sign up to Databricks community edition - CAPTCHA error

Hello. I cannot sign up to have access to the community edition. I always get an error message "CAPTCHA error... contact our sales team". I do not have this issue if I try to create a trial account for Databricks hosted on AWS. However, I do not have...

Latest Reply
joao_hoffmam
New Contributor III
  • 6 kudos

Hi @Guillaume Hiet, I was facing the same issue. Try signing up using your mobile phone; it worked for me!

6 More Replies
data_scientist
by New Contributor II
  • 1249 Views
  • 2 replies
  • 2 kudos

Resolved! how to load a .w2v format saved model in databricks

Hi, I am trying to load a pre-trained word2vec model which has been saved in .w2v format in Databricks. I am not able to load this file. Could you help me with the correct command?
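A hedged sketch of one common way to load such a file, assuming the .w2v file is in the standard word2vec vector format that gensim reads (the path and binary flag are assumptions, not from the thread):

from gensim.models import KeyedVectors

# Load pre-trained vectors from DBFS via the local /dbfs mount (hypothetical path)
vectors = KeyedVectors.load_word2vec_format(
    "/dbfs/models/pretrained.w2v",
    binary=True,   # set to False if the file is in the text format
)
print(vectors.most_similar("data"))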

Latest Reply
Kaniz
Community Manager
  • 2 kudos

Hi @sonam de, To save models, use the MLflow functions log_model and save_model. You can also save models using their native APIs onto Databricks File System (DBFS). For MLlib models, use ML Pipelines. To export models for serving individual predict...
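A minimal sketch of the MLflow pattern the reply describes (the pipeline_model object and paths are placeholders):

import mlflow

with mlflow.start_run():
    # log_model records the model as a tracked artifact of the run
    mlflow.spark.log_model(pipeline_model, artifact_path="model")

# save_model writes it to a plain path instead, e.g. on DBFS
mlflow.spark.save_model(pipeline_model, path="/dbfs/models/word2vec_pipeline")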

1 More Replies
tusworten
by New Contributor II
  • 3810 Views
  • 5 replies
  • 4 kudos

Spark SQL Group by duplicates, collect_list in array of structs and evaluate rows in each group.

I'm a beginner working with Spark SQL in the Java API. I have a dataset with duplicate clients grouped by ENTITY and DOCUMENT_ID, like this: .withColumn("ROWNUMBER", row_number().over(Window.partitionBy("ENTITY", "ENTITY_DOC").orderBy("ID"))) I added a ROWN...
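A hedged PySpark sketch of the grouping pattern in the question: collect each group's rows into an array of structs so the group can be evaluated as a whole (column names follow the snippet above; df is a placeholder):

from pyspark.sql import functions as F

grouped = (
    df.groupBy("ENTITY", "ENTITY_DOC")
      .agg(F.collect_list(F.struct("ID", "ROWNUMBER")).alias("rows"))
)

# Each group can then be evaluated, e.g. flagging groups with duplicates
grouped.select(
    "ENTITY", "ENTITY_DOC",
    (F.size("rows") > 1).alias("has_duplicates"),
).show()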

Latest Reply
tusworten
New Contributor II
  • 4 kudos

Hi @Kaniz Fatma, her answer didn't solve my problem, but it was useful for learning more about UDFs, which I did not know about.

4 More Replies
Constantine
by Contributor III
  • 2181 Views
  • 3 replies
  • 2 kudos

Resolved! OPTIMIZE throws an error after doing MERGE on the table

I have a table on which I do an upsert, i.e. MERGE INTO table_name ... After that I run OPTIMIZE table_name, which throws an error: java.util.concurrent.ExecutionException: io.delta.exceptions.ConcurrentDeleteReadException: This transaction attempted to read...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

You can try to change the isolation level: https://docs.microsoft.com/en-us/azure/databricks/delta/optimizations/isolation-level. In a merge it is good to specify all partitions in the merge conditions. It can also happen when the script is running concurrently.
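A hedged sketch of both suggestions (table, column, and partition names are placeholders):

# 1) Relax the table's isolation level so OPTIMIZE and MERGE conflict less often
spark.sql("""
    ALTER TABLE table_name
    SET TBLPROPERTIES ('delta.isolationLevel' = 'WriteSerializable')
""")

# 2) Pin the MERGE to explicit partitions so it does not touch files
#    that a concurrent OPTIMIZE may be rewriting
spark.sql("""
    MERGE INTO table_name t
    USING updates u
    ON t.id = u.id AND t.event_date = '2022-01-01'   -- partition predicate
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")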

2 More Replies
Smart_City_Laho
by New Contributor
  • 230 Views
  • 0 replies
  • 0 kudos

sigmaproperties.com.pk

smart city lahore, lahore smart city location, lahore smart city payment plan, smart city lahore location, capital smart city lahore

al_joe
by Contributor
  • 1081 Views
  • 1 replies
  • 0 kudos

Resolved! Unable to do Developer Foundation Capstone. Where should I get/put "Registration ID"?

I am trying to do the "Developer Foundations Capstone". The first step, as per the video, asks us to get the "Registration ID" from the LMS email and plug it into the notebook once you import the DBC. Two problems: #1 - I cannot locate any Registration ID at al...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Al Jo, I've informed the relevant team about the observation you mention. I'm sure they'll hop in here soon to fix your problem. We'd like to thank you again for taking the time to write this review of our new LMS. All fee...

ST
by New Contributor II
  • 1543 Views
  • 2 replies
  • 2 kudos

Resolved! Convert Week of Year to Month in SQL?

Hi all, I was wondering if there is any built-in function or code that I could use to convert a single week-of-year integer (i.e. 1 to 52) into a value representing the month (i.e. 1-12)? The assumption is that a week starts on a Monday and ends on a...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

We need the old parser, as the new one doesn't support weeks. Then we can map what we need using w (week of year) and u (first day of the week): spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY") spark.sql(""" SELECT extract( month from to_date...
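The snippet in the reply is cut off; here is a hedged sketch of how the pattern might be completed under the legacy-parser setting (the year, table, and column names are placeholders):

# Legacy parser is needed because the new parser rejects week patterns
spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")

spark.sql("""
    SELECT week_of_year,
           extract(
               month FROM to_date(concat('2022-', week_of_year, '-1'), 'yyyy-w-u')
           ) AS month
    FROM weeks_table   -- placeholder table with an integer week_of_year column
""").show()

The year is hardcoded purely for illustration; the week-to-month mapping depends on which year is chosen.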

1 More Replies
AmarK
by New Contributor III
  • 5124 Views
  • 3 replies
  • 1 kudos

Resolved! Is there a way to programatically retrieve a workspace name ?

Is there a spark command in databricks that will tell me what databricks workspace I am using? I’d like to parameterise my code so that I can update delta lake file paths automatically depending on the workspace (i.e. it picks up the dev workspace na...

Latest Reply
AmarK
New Contributor III
  • 1 kudos

Thanks Navya! But this doesn't work for me on a High Concurrency cluster. It seems that toJson() isn't whitelisted.
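A hedged alternative sketch: on recent Databricks runtimes the workspace URL is exposed as a Spark conf, which may also work where toJson() is blocked (conf availability varies by runtime, so treat this as an assumption; the path scheme is hypothetical):

# Read the workspace URL from the Spark conf (falls back to None if absent)
workspace_url = spark.conf.get("spark.databricks.workspaceUrl", None)

# Derive an environment tag from the URL and parameterise Delta paths with it
env = "dev" if workspace_url and "dev" in workspace_url else "prod"
base_path = f"/mnt/datalake/{env}/delta"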

2 More Replies
Personal1
by New Contributor
  • 954 Views
  • 2 replies
  • 2 kudos
Latest Reply
Kaniz
Community Manager
  • 2 kudos

Hi @Abhishek Pradhan! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first; otherwise I will get back to you soon. Than...

1 More Replies
gbrueckl
by Contributor II
  • 2911 Views
  • 6 replies
  • 4 kudos

Resolved! CREATE FUNCTION from Python file

Is it somehow possible to create an SQL external function using Python code? The examples only show how to use JARs: https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-syntax-ddl-create-function.html Something like: CREATE TEMPORAR...

Latest Reply
pts
New Contributor II
  • 4 kudos

As a user of your code, I'd find it a less pleasant API because I'd have to call some_module.some_func.some_func() rather than just some_module.some_func(). There's no reason to have "some_func" exist twice in the hierarchy; it's redundant. If some_func is ...
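A small sketch of the layout the reply argues for: re-export the function from the package's __init__.py so callers get the flatter name (the names mirror the reply's placeholders):

# some_module/some_func.py
def some_func():
    return 42

# some_module/__init__.py
from .some_func import some_func   # lifts the function to package level

# caller:
#   import some_module
#   some_module.some_func()        # instead of some_module.some_func.some_func()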

5 More Replies
pjp94
by Contributor
  • 6862 Views
  • 5 replies
  • 4 kudos

Resolved! Difference between DBFS and Delta Lake?

I would like a deeper dive/explanation into the difference. When I write to a table with the following code: spark_df.write.mode("overwrite").saveAsTable("db.table") the table is created and can be viewed in the Data tab. It can also be found in some DBF...

Latest Reply
-werners-
Esteemed Contributor III
  • 4 kudos

Tables in Spark, Delta Lake-backed or not, are basically just semantic views on top of the actual data. On Databricks, the data itself is stored in DBFS, which is an abstraction layer on top of the actual storage (like S3, ADLS, etc.). This can be parq...
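A minimal sketch of how to see this for yourself: for a Delta table, DESCRIBE DETAIL reports the format and the underlying storage location (the table name is a placeholder):

detail = spark.sql("DESCRIBE DETAIL db.table")
detail.select("format", "location").show(truncate=False)

# Managed tables typically resolve to a DBFS path under the metastore root,
# e.g. dbfs:/user/hive/warehouse/db.db/table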

4 More Replies