Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Volkan_Gumuskay
by New Contributor III
  • 7106 Views
  • 6 replies
  • 3 kudos

Resolved! Is there a way to run a single or selected lines in a notebook?

Assume we have a given cell:
print('A')
print('B')
print('C')
I want to run only the line below:
print('B')
Obviously, I can separate the cell into three and run the one I want, but this is time-consuming. This is a feature I use so often (e.g. in PyCharm) and wo...

Latest Reply
Tharun-Kumar
Databricks Employee
  • 3 kudos

@Volkan_Gumuskay This is also available as an option in the notebook run options.

5 More Replies
shelly
by New Contributor
  • 2597 Views
  • 3 replies
  • 0 kudos

take() operation throwing index out of range error

x = [1,2,3,4,5,6,7]
rdd = sc.parallelize(x)
print(rdd.take(2))
Traceback (most recent call last):
  File "/usr/local/spark/python/pyspark/serializers.py", line 458, in dumps
    return cloudpickle.dumps(obj, pickle_protocol)
           ^^^^^^^^^^^^^^^^^^...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Shelly Bhardwaj, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Th...

2 More Replies
Callum
by New Contributor II
  • 11612 Views
  • 3 replies
  • 2 kudos

Pyspark Pandas column or index name appears to persist after being dropped or removed.

So, I have this code for merging dataframes with pyspark pandas. And I want the index of the left dataframe to persist throughout the joins. So following suggestions from others wanting to keep the index after merging, I set the index to a column bef...

Latest Reply
Serlal
New Contributor III
  • 2 kudos

Hi! I tried debugging your code, and I think the error you get is simply because the column exists in two instances of your dataframe within your loop. I tried adding some extra debug lines in your merge_dataframes function: and after executing that...

2 More Replies
tanjil
by New Contributor III
  • 2564 Views
  • 2 replies
  • 2 kudos

print(flush = True) not working

Hello, I have the following minimal working example using multiprocessing:
from multiprocessing import Pool
files_list = [('bla', 1, 3, 7), ('spam', 12, 4, 8), ('eggs', 17, 1, 3)]
def f(t):
    print('Hello from child process', flush = Tr...

Latest Reply
tanjil
New Contributor III
  • 2 kudos

No errors are generated. The code executes successfully, but the print statement for "Hello from child process" does not produce any output.
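One possible workaround, sketched here under the assumption that the child processes' standard output is simply not routed to the notebook cell, is to return the message from the worker and print it in the parent process. The tuple layout and the summed value are placeholders, since the body of `f` is truncated in the original post; `run_with_pool` is a hypothetical helper name.

```python
from multiprocessing import Pool

files_list = [('bla', 1, 3, 7), ('spam', 12, 4, 8), ('eggs', 17, 1, 3)]


def f(t):
    """Worker: build the message and return it instead of printing it."""
    name, a, b, c = t
    message = f"Hello from child process: {name}"
    # Placeholder computation; the real work in the original post is unknown.
    return message, a + b + c


def run_with_pool(items, processes=2):
    """Run f over items in a pool and print the messages from the parent."""
    with Pool(processes=processes) as pool:
        results = pool.map(f, items)
    for message, _ in results:
        # This print runs in the parent process, so it uses the parent's stdout.
        print(message, flush=True)
    return results
```

Calling `run_with_pool(files_list)` (inside an `if __name__ == "__main__":` guard when run as a script) prints one line per tuple from the parent. Whether this matches the poster's environment is not confirmed by the thread.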

1 More Replies
RohitKulkarni
by Contributor II
  • 5044 Views
  • 2 replies
  • 1 kudos

Salesforce to Databricks

Hello Team, I am trying to run an extraction from Salesforce, and I am facing the issue below:
SOURCE_SYSTEM_NAME = 'Salesforce'
TABLE_NAME = 'XY'
desc = eval("sf." + TABLE_NAME + ".describe()")
print(desc)
for field in desc['fields']...
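As a side note on the snippet above, the `eval` call can usually be replaced with `getattr`, which looks up the attribute by name without evaluating a string as code. The sketch below uses stand-in classes (`FakeClient`, `FakeTable`, both hypothetical) because the real `sf` client and the actual error are not shown in the post; it only assumes that `sf` exposes objects as attributes, as the original `eval` string implies.

```python
class FakeTable:
    """Stand-in for a Salesforce object handle (hypothetical)."""

    def __init__(self, name):
        self.name = name

    def describe(self):
        # Made-up metadata, shaped like the desc['fields'] access in the post.
        return {"name": self.name, "fields": [{"name": "Id"}, {"name": "Name"}]}


class FakeClient:
    """Stand-in for the `sf` client: any attribute name yields a table handle."""

    def __getattr__(self, name):
        return FakeTable(name)


sf = FakeClient()
TABLE_NAME = "XY"

# Equivalent to eval("sf." + TABLE_NAME + ".describe()"), without eval.
desc = getattr(sf, TABLE_NAME).describe()
field_names = [field["name"] for field in desc["fields"]]
print(field_names)
```

This does not address the error the poster hit, which is cut off in the preview; it only avoids building code from strings.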

Latest Reply
Vidula
Honored Contributor
  • 1 kudos

Hi @Rohit Kulkarni, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Tha...

1 More Replies
Data_Bricks1
by New Contributor III
  • 3778 Views
  • 7 replies
  • 0 kudos

data from 10 BLOB containers and multiple hierarchical folders(every day and every hour folders) in each container to Delta lake table in parquet format - Incremental loading for latest data only insert no updates

I am able to load data for a single container by hard-coding it, but I am not able to load from multiple containers. I used a for loop, but the data frame ends up holding only the record from the last container's last folder. One more issue is that I have to flatten the data, when I ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

For sure, the function (def) should be declared outside the loop; move it to just after the library imports. The logic is a bit complicated, so you need to debug it using display(Flatten_df2) (or .show()) and validate the JSON after each iteration (using break or sleep, etc.).
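The advice above can be sketched roughly as follows. This is a plain-Python illustration, not the poster's Spark code: `flatten_record`, `load_all`, and the sample data are hypothetical, and lists of dicts stand in for data frames. The point is that the helper is defined once, outside the loop, and that each iteration appends to an accumulator instead of overwriting it, which is one common reason only the last container's data survives.

```python
def flatten_record(record, parent="", sep="_"):
    """Flatten nested dicts into one level (hypothetical helper, defined once)."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten_record(value, name, sep))
        else:
            flat[name] = value
    return flat


def load_all(containers):
    """Collect flattened rows from every container into one list."""
    all_rows = []
    for container_name, records in containers.items():
        for record in records:
            row = flatten_record(record)
            row["source_container"] = container_name
            # Append to the accumulator; reassigning it here would keep
            # only the rows from the final iteration.
            all_rows.append(row)
    return all_rows


# Made-up sample data standing in for two blob containers.
sample = {
    "container_a": [{"id": 1, "payload": {"temp": 20}}],
    "container_b": [{"id": 2, "payload": {"temp": 25}}],
}
rows = load_all(sample)
print(rows)
```

In Spark, the same idea would typically mean collecting one DataFrame per container and combining them with a union before writing, rather than reassigning a single DataFrame variable in each iteration.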

6 More Replies