Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

shelly
by New Contributor
  • 2767 Views
  • 3 replies
  • 0 kudos

take() operation throwing index out of range error

x = [1, 2, 3, 4, 5, 6, 7]
rdd = sc.parallelize(x)
print(rdd.take(2))

Traceback (most recent call last):
  File "/usr/local/spark/python/pyspark/serializers.py", line 458, in dumps
    return cloudpickle.dumps(obj, pickle_protocol)
    ^^^^^^^^^^^^^^^^^^...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Shelly Bhardwaj​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Th...

2 More Replies
hitesh22
by New Contributor II
  • 4323 Views
  • 5 replies
  • 0 kudos
Latest Reply
Debayan
Databricks Employee
  • 0 kudos

Hi, I am not sure if this helps: https://www.databricks.com/blog/2020/12/15/python-autocomplete-improvements-for-databricks-notebooks.html Also, please tag @Debayan​ in your next response, which will notify me. Thank you!

4 More Replies
fuselessmatt
by Contributor
  • 8328 Views
  • 3 replies
  • 0 kudos

Omitting columns in an INSERT statement does not seem to work despite meeting the requirements

We want to use the INSERT INTO command with specific columns as specified in the official documentation. The only requirements for this are Databricks SQL warehouse version 2022.35 or higher, or Databricks Runtime 11.2 and above, and the behaviour shou...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Fusselmanwog​ Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers ...

2 More Replies
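For reference, a column-list INSERT on a supported runtime looks like the sketch below (table and column names are hypothetical); any column omitted from the list must be nullable, have a DEFAULT, or be a generated column, otherwise the statement fails.

```sql
-- Hypothetical table: 'note' has no value supplied, so it is filled with NULL.
CREATE TABLE demo_target (id INT, name STRING, note STRING);

INSERT INTO demo_target (id, name) VALUES (1, 'alice');
```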
MetaRossiVinli
by Contributor
  • 6822 Views
  • 2 replies
  • 4 kudos

Resolved! Can you use the Secrets API 2.0 in a Delta Live Tables configuration?

Is the Secrets API 2.0 not applied to Delta Live Tables configurations? I understand that the Secrets API 2.0 is in public preview and this use case may not be supported yet. I tried the following and both do not work for the stated reasons. In a DLT...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

@Kevin Rossi​: As a workaround, you can use the code you provided to load the secret in a cell in a DLT notebook and set it in the Spark configuration. This will allow you to use the secret in your DLT code. Another workaround could be to store the c...

1 More Replies
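The reply above is truncated; one documented pattern is to reference the secret from the pipeline settings rather than calling the Secrets API from DLT code. A minimal sketch of a pipeline-settings fragment (the scope, key, and property names here are hypothetical):

```json
{
  "configuration": {
    "my_app.api_token": "{{secrets/my_scope/my_token}}"
  }
}
```

Inside the DLT notebook the value can then be read back with `spark.conf.get("my_app.api_token")`, assuming the pipeline has permission to read that secret scope.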
mshettar
by New Contributor II
  • 1749 Views
  • 2 replies
  • 2 kudos

newAPIHadoopRDD Spark API doesn't retrieve unflushed data written to Hbase table

Records in an HBase table that haven't been persisted (flushed) to HDFS don't show up when read from Spark, even with a few hundred of them. However, the records become visible after a forced flush via the HBase shell or a system-triggered flush (when the size of the Memstore cro...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Manjunath Shettar​: It seems that the issue is related to the fact that the records in the HBase table have not been flushed to HDFS and are still stored in the Memstore. Spark's newAPIHadoopRDD API reads data from the HBase table through HBase's Ta...

1 More Replies
Osky_Rosky
by New Contributor II
  • 11168 Views
  • 2 replies
  • 0 kudos

Combine Python + R in data manipulation in Databricks Notebook

Want to combine Py + R:

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("CreateDataFrame").getOrCreate()
# Create a sample DataFrame
data = [("Alice", 25), ("Bob", 30), ("Charlie", 35), ("Oscar", 36), ("Hiromi", 41), ("Alejandro", ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Oscar CENTENO MORA​: To combine Py and R in a Databricks notebook, you can use the magic commands %python and %r to switch between Python and R cells. Here's an example of how to create a Spark DataFrame in Python and then use it in R: from pyspark.sq...

1 More Replies
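The truncated reply describes the usual handoff pattern: register the Python DataFrame as a temporary view, then read it from an R cell via SparkR. A sketch of the two notebook cells (the view name "people" is hypothetical):

```
%python
df.createOrReplaceTempView("people")

%r
library(SparkR)
people <- sql("SELECT * FROM people")
head(people)
```

Both cells run against the same Spark session, so the temp view is the shared handoff point; no data is copied through the driver.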
TheRealJimShady
by New Contributor
  • 10166 Views
  • 7 replies
  • 0 kudos

Resolved! Email destination not appearing in Job's System Notification list.

On job failure I need to send an email with a custom subject line. I have configured the email address as a destination with the subject that I need, but I don't see it as an option that I can choose in the 'System Notification' dialog in the job set...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @James Smith​ Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so w...

6 More Replies
Hunter1604
by New Contributor II
  • 10405 Views
  • 5 replies
  • 0 kudos

How to remove checkpoints from DeltaLake table ?

How do I remove checkpoints from a Delta Lake table? I see that a few checkpoints exist on my Delta table and I want to remove the oldest one. Its existence seems to be blocking removal of the oldest _delta_log entries.

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Pawel Woj​ Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we ...

4 More Replies
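For context: checkpoint files and _delta_log JSON commits are not meant to be deleted by hand; Delta removes them automatically once they fall outside the table's log retention window, at the next checkpoint write. A sketch of shortening that window (the table name is hypothetical, and a shorter window reduces time-travel history):

```sql
-- Old _delta_log entries (JSON commits and their checkpoints) become
-- eligible for automatic cleanup after the retention interval.
ALTER TABLE my_schema.my_table
SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 7 days');
```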
646901
by New Contributor II
  • 6210 Views
  • 6 replies
  • 0 kudos

Using databricks as an application database?

Is Databricks suitable to be used as an application database? I have been asked to build a fairly large CRM-type app, and Databricks will use its data for analysis. I am thinking that if I built the application database inside of Databricks then I c...

Latest Reply
646901
New Contributor II
  • 0 kudos

@Vigneshraja Palaniraj​ Latency under 50-100 ms to the database would be ideal; once we start adding 2-5 queries in a request, the time really starts to compound and add up. Concurrency: the number of users initially will be 5-10 users bu...

5 More Replies
JonD
by New Contributor III
  • 3413 Views
  • 3 replies
  • 0 kudos

Resolved! Why does my Single Node cluster automatically resize num_workers?

Hi community, We have set up a Databricks cluster as Single node with num_workers=0. Sometimes the cluster automatically resizes to e.g. 10 workers. When I subsequently edit the cluster, it gives an error that num_workers is not allowed for Single node...

Latest Reply
JonD
New Contributor III
  • 0 kudos

I think the issue is solved, at least it didn't occur in the last month. We monitored this via Azure Log Analytics. Maybe it was solved due to some patch/update, thanks anyway!

2 More Replies
pvignesh92
by Honored Contributor
  • 7670 Views
  • 6 replies
  • 2 kudos

Resolved! Optimizing Writes from Databricks to Snowflake

My job, after doing all the processing in the Databricks layer, writes the final output to Snowflake tables using the df.write API and the Spark Snowflake connector. I often see that even a small dataset (16 partitions with 20k rows in each partition) takes a...

Latest Reply
pvignesh92
Honored Contributor
  • 2 kudos

There are a few options I tried out which gave me better performance. Caching the intermediate or final results so that the dataframe computation does not repeat while writing. Coalescing the results into partitions 1x or 0.5x your number...

5 More Replies
ahmedE_
by New Contributor II
  • 4852 Views
  • 6 replies
  • 0 kudos

How to install AI library aif360 on databricks notebook

Hello, I'm trying to install a library called aif360 on a Databricks notebook. However, I get an error that tkinter is not installed. I tried installing tk and tk-tools, but the issue remains. Any idea on what solution we can use? I also tried ins...

Latest Reply
Vartika
Databricks Employee
  • 0 kudos

Hi @Ahmed Elghareeb​ Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. This will ...

5 More Replies
Sandesh87
by New Contributor III
  • 1826 Views
  • 3 replies
  • 0 kudos

parse and combine multiple datasets within a single file

An application receives messages from event hub. Below is a message received from event hub and loaded into a dataframe with one column:

name,gender,id
sam,m,001
-----
time,x,y,z,long,lat
160,22,45,51,83,56
230,82,95,48,18,26
-----
event,a,b,c
034,1,5,6
073,4,2...

Latest Reply
Vartika
Databricks Employee
  • 0 kudos

Hi @Sandesh Puligundla​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you...

2 More Replies
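The message in the question is several CSV sections glued together with "-----" separators. One way to split it, sketched in plain Python (the sample payload below mirrors the post's data; the function name and section keying are my own), is to cut on the separator and parse each piece as its own CSV before handing the records to something like spark.createDataFrame:

```python
import csv
import io

# Sample payload mirroring the layout from the post: three CSV
# sections separated by '-----' lines.
raw = """name,gender,id
sam,m,001
-----
time,x,y,z,long,lat
160,22,45,51,83,56
230,82,95,48,18,26
-----
event,a,b,c
034,1,5,6
073,4,2,9"""

def split_sections(text):
    """Split the combined payload on '-----' separators and parse each
    section as its own CSV, keyed by its first header column."""
    sections = {}
    for chunk in text.split("-----"):
        chunk = chunk.strip()
        if not chunk:
            continue
        rows = list(csv.reader(io.StringIO(chunk)))
        header, records = rows[0], rows[1:]
        sections[header[0]] = [dict(zip(header, rec)) for rec in records]
    return sections

parsed = split_sections(raw)
print(sorted(parsed))           # ['event', 'name', 'time']
print(parsed["name"][0]["id"])  # 001
```

Each per-section list of dicts can then become its own DataFrame, which avoids forcing the three differently-shaped datasets into one schema.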
KVNARK
by Honored Contributor II
  • 2329 Views
  • 4 replies
  • 2 kudos

Azure SQL date function conversion to Databricks SQL.

I need to convert the Azure SQL DATE_ADD function below to Databricks SQL, but I am not getting the expected output. Can anyone suggest what can be done for this? DATE_ADD(Hour, (SELECT t1.SLA FROM SLA t1 WHERE t1.Stage_Id = 2 AND t1.RNK = 1)

Latest Reply
Vartika
Databricks Employee
  • 2 kudos

Hi @KVNARK​. Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks!

3 More Replies
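Databricks SQL has a dateadd(unit, value, expr) function with the same argument order as the T-SQL original, so the subquery can stay in the value slot. A sketch, with current_timestamp() standing in for the base-timestamp argument that is truncated in the post:

```sql
-- Hedged sketch: the third argument here is a placeholder for
-- whatever timestamp column the original query used.
SELECT dateadd(
         HOUR,
         (SELECT t1.SLA FROM SLA t1 WHERE t1.Stage_Id = 2 AND t1.RNK = 1),
         current_timestamp());
```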
Anonymous
by Not applicable
  • 819 Views
  • 1 replies
  • 2 kudos

Join our Community Social Group and Never Miss a Beat! Are you looking to connect with like-minded individuals and stay on top of the latest news and ...

Join our Community Social Group and Never Miss a Beat! Are you looking to connect with like-minded individuals and stay on top of the latest news and events in your community? Look no further than our special group on Community called the "Community S...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 2 kudos

@Rishabh Pandey​ 


Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.
