Data Engineering
Forum Posts

SailajaB
by Valued Contributor III
  • 2298 Views
  • 4 replies
  • 6 kudos

Resolved! how to create a nested(unflatten) json from flatten json

Hi, is there any function in PySpark which can convert flattened JSON to nested JSON? Ex: if an attribute in the flattened form is a_b_c: 23, then unflattened it should be {"a":{"b":{"c":23}}}. Thank you

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 6 kudos

As @Chuck Connell said, can you share more of your source JSON? That example is not JSON. Additionally, flattening usually means changing something like {"status": {"A": 1,"B": 2}} to {"status.A": 1, "status.B": 2}, which can be done easily with Spark da...

3 More Replies
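There is no single built-in PySpark function for this, but the reverse (unflatten) step the question asks about can be sketched in plain Python, e.g. for use inside a UDF or on collected rows. The separator `_` and the helper name `unflatten` are assumptions for illustration:

```python
def unflatten(flat, sep="_"):
    """Rebuild a nested dict from flattened keys like 'a_b_c'."""
    nested = {}
    for key, value in flat.items():
        parts = key.split(sep)
        node = nested
        # Walk/create intermediate dicts for all but the last key part
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested

print(unflatten({"a_b_c": 23}))  # {'a': {'b': {'c': 23}}}
```

Note this simple sketch assumes no key prefix is both a leaf and a branch (e.g. `a_b` and `a_b_c` together would conflict).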
irfanaziz
by Contributor II
  • 1697 Views
  • 3 replies
  • 1 kudos

Resolved! What is the difference between passing the schema in the options or using the .schema() function in pyspark for a csv file?

I have observed very strange behavior with some of our integration pipelines. This week one of the CSV files was getting broken when read with the read function given below. def ReadCSV(files,schema_struct,header,delimiter,timestampformat,encode="utf8...

Latest Reply
jose_gonzalez
Moderator
  • 1 kudos

Hi @nafri A, what is the error you are getting? Can you share it, please? Like @Hubert Dudek mentioned, both will call the same APIs.

2 More Replies
wyzer
by Contributor II
  • 1063 Views
  • 3 replies
  • 2 kudos

Resolved! Are we using the advantage of "Map & Reduce" ?

Hello, we are new to Databricks and we would like to know if our working method is good. Currently, we are working like this: spark.sql("CREATE TABLE Temp (SELECT avg(***), sum(***) FROM aaa LEFT JOIN bbb WHERE *** >= ***)") With this method, are we us...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Spark will handle the map/reduce for you. So as long as you use Spark-provided functions, be it in Scala, Python, or SQL (or even R), you will be using distributed processing. You just care about what you want as a result. And afterwards, when you are more...

2 More Replies
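For intuition, what Spark distributes for an aggregate like the avg/sum query above is an ordinary map/reduce. A plain-Python, single-machine sketch of the same idea (the data is hypothetical; Spark does the equivalent per partition and then merges partial results):

```python
from functools import reduce

rows = [{"amount": 10}, {"amount": 20}, {"amount": 30}]

# "map": extract the value of interest from each row
mapped = [r["amount"] for r in rows]

# "reduce": combine results pairwise (Spark merges per-partition sums the same way)
total = reduce(lambda a, b: a + b, mapped)
avg = total / len(mapped)

print(total, avg)  # 60 20.0
```

The point of the reply stands: with spark.sql or the DataFrame API you never write this loop yourself; the engine plans and distributes it.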
Databricks_7045
by New Contributor III
  • 1234 Views
  • 4 replies
  • 0 kudos

Resolved! Encapsulate Databricks Pyspark/SparkSql code

Hi All, I have custom code (PySpark & SparkSQL notebooks) which I want to deploy at a customer location, encapsulated so that end customers don't see the actual code. Currently we have all code in notebooks (PySpark/Spark SQL). Could you please l...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

With notebooks that is not possible. You can write your code in Scala/Java and build a jar, which you then run with spark-submit (example). Or use Python and deploy a wheel (example). This can become quite complex when you have dependencies. Also: a jar et...

3 More Replies
ghiet
by New Contributor II
  • 1826 Views
  • 7 replies
  • 6 kudos

Resolved! Cannot sign up to Databricks community edition - CAPTCHA error

Hello. I cannot sign up for access to the Community Edition. I always get an error message "CAPTCHA error... contact our sales team". I do not have this issue if I try to create a trial account for Databricks hosted on AWS. However, I do not have...

Latest Reply
joao_hoffmam
New Contributor III
  • 6 kudos

Hi @Guillaume Hiet, I was facing the same issue. Try signing up using your mobile phone; it worked for me!

6 More Replies
data_scientist
by New Contributor II
  • 1229 Views
  • 2 replies
  • 2 kudos

Resolved! how to load a .w2v format saved model in databricks

Hi, I am trying to load a pre-trained word2vec model which has been saved in .w2v format in Databricks. I am not able to load this file. Help me with the correct command.

Latest Reply
Kaniz
Community Manager
  • 2 kudos

Hi @sonam de, to save models, use the MLflow functions log_model and save_model. You can also save models using their native APIs onto Databricks File System (DBFS). For MLlib models, use ML Pipelines. To export models for serving individual predict...

1 More Replies
tusworten
by New Contributor II
  • 3716 Views
  • 5 replies
  • 4 kudos

Spark SQL Group by duplicates, collect_list in array of structs and evaluate rows in each group.

I'm a beginner working with Spark SQL in the Java API. I have a dataset with duplicate clients grouped by ENTITY and DOCUMENT_ID, like this: .withColumn("ROWNUMBER", row_number().over(Window.partitionBy("ENTITY", "ENTITY_DOC").orderBy("ID"))) I added a ROWN...

Latest Reply
tusworten
New Contributor II
  • 4 kudos

Hi @Kaniz Fatma, her answer didn't solve my problem, but it was useful to learn more about UDFs, which I did not know.

4 More Replies
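For reference, the grouping this question describes (collect the rows for each ENTITY/DOCUMENT_ID key into a list, then evaluate each group) can be sketched in plain Python with itertools.groupby. The field names mirror the question; the data and the duplicate-detection rule are hypothetical:

```python
from itertools import groupby
from operator import itemgetter

rows = [
    {"ENTITY": "E1", "ENTITY_DOC": "D1", "ID": 1, "NAME": "ana"},
    {"ENTITY": "E1", "ENTITY_DOC": "D1", "ID": 2, "NAME": "ana maria"},
    {"ENTITY": "E2", "ENTITY_DOC": "D9", "ID": 3, "NAME": "bob"},
]

key = itemgetter("ENTITY", "ENTITY_DOC")
# groupby needs the input sorted by the same key
groups = {k: list(g) for k, g in groupby(sorted(rows, key=key), key=key)}

# Each group plays the role of collect_list's array of structs;
# evaluate the rows in each group however the business rule requires.
dup_keys = [k for k, g in groups.items() if len(g) > 1]
print(dup_keys)  # [('E1', 'D1')]
```

In Spark the same shape is groupBy("ENTITY", "ENTITY_DOC") with collect_list(struct(...)), after which each array can be processed with built-in higher-order functions or a UDF.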
Constantine
by Contributor III
  • 2136 Views
  • 3 replies
  • 2 kudos

Resolved! OPTIMIZE throws an error after doing MERGE on the table

I have a table on which I do an upsert, i.e. MERGE INTO table_name ... after which I run OPTIMIZE table_name, which throws an error: java.util.concurrent.ExecutionException: io.delta.exceptions.ConcurrentDeleteReadException: This transaction attempted to read...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

You can try changing the isolation level: https://docs.microsoft.com/en-us/azure/databricks/delta/optimizations/isolation-level In a merge it is good to specify all partitions in the merge conditions. It can also happen when the script is running concurrently.

2 More Replies
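For reference, the isolation-level change the reply suggests is a Delta table property; assuming Delta Lake, it is set along these lines (the table name is a placeholder, and which level suits a workload depends on the concurrency pattern described in the linked docs):

```sql
-- Delta supports 'Serializable' (strictest) and 'WriteSerializable' (the default)
ALTER TABLE table_name
SET TBLPROPERTIES ('delta.isolationLevel' = 'WriteSerializable');
```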
al_joe
by Contributor
  • 5259 Views
  • 3 replies
  • 3 kudos

Resolved! Split a code cell at cursor position? Add a cell above/below?

In JupyterLab notebooks, we can: in edit mode, press Ctrl+Shift+Minus to split the current cell into two at the cursor position; in command mode, press A or B to add a cell above or below the current cell. Are there equivalent shortcuts...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

If you click the keyboard icon in the top right of the notebook, it brings up all the hotkeys.

2 More Replies
al_joe
by Contributor
  • 1071 Views
  • 1 reply
  • 0 kudos

Resolved! Unable to do Developer Foundation Capstone. Where should I get/put "Registration ID"?

I am trying to do the "Developer Foundations Capstone". The first step, as per the video, asks us to get the "Registration ID" from the LMS email and plug it into the notebook once you import the DBC. Two problems: #1 - I cannot locate any Registration ID at al...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Al Jo, I've informed the concerned team to take action on the mentioned observation. I'm sure they'll hop in here anytime soon to fix your problem. We'd like to thank you again for taking the time to write this review of our new LMS. All fee...

ST
by New Contributor II
  • 1507 Views
  • 2 replies
  • 2 kudos

Resolved! Convert Week of Year to Month in SQL?

Hi all, I was wondering if there is any built-in function or code that I could use to convert a week-of-year integer (i.e. 1 to 52) into a value representing the month (i.e. 1-12)? The assumption is that a week starts on a Monday and ends on a...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

We need the old parser, as the new one doesn't support weeks. Then we can map what we need using w (week of year) and u (first day of the week): spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY") spark.sql(""" SELECT extract( month from to_date...

1 More Replies
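If pure Python is an option, an alternative to the legacy-parser approach is the standard library's ISO-calendar support, where weeks start on Monday (matching the question's assumption). A year must be supplied alongside the week number, since a week number alone does not determine the month; the years below are only examples:

```python
from datetime import date

def week_to_month(year, week):
    """Month (1-12) containing the Monday that starts the given ISO week."""
    # date.fromisocalendar(year, week, 1) is the Monday of that ISO week
    return date.fromisocalendar(year, week, 1).month

print(week_to_month(2022, 1))  # 1  (week 1 of 2022 starts Mon 2022-01-03)
print(week_to_month(2022, 9))  # 2  (week 9 of 2022 starts Mon 2022-02-28)
```

The same logic could be wrapped in a PySpark UDF if the conversion has to run on a DataFrame column.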
AmarK
by New Contributor III
  • 5001 Views
  • 3 replies
  • 1 kudos

Resolved! Is there a way to programatically retrieve a workspace name ?

Is there a Spark command in Databricks that will tell me what Databricks workspace I am using? I'd like to parameterise my code so that I can update Delta Lake file paths automatically depending on the workspace (i.e. it picks up the dev workspace na...

Latest Reply
AmarK
New Contributor III
  • 1 kudos

Thanks Navya! But this doesn't work for me on a High Concurrency cluster. It seems that toJson() isn't whitelisted.

2 More Replies