Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

James_209101
by New Contributor II
  • 5915 Views
  • 2 replies
  • 5 kudos

Using large dataframe in-memory (data not allowed to be "at rest") results in driver crash and/or out of memory

I'm having trouble working on Databricks with data that we are not allowed to save off or persist in any way. The data comes from an API (which returns a JSON response). We have a Scala package on our cluster that makes the queries (almost 6k queries...
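A minimal sketch of one way to keep API results purely in memory: collect the raw JSON strings and build the DataFrame from an in-memory RDD, so nothing is written to storage. The endpoint, fetch_page helper, and page count are hypothetical stand-ins for the poster's Scala package, not their actual code.

    import requests
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical helper -- stands in for the poster's Scala package.
    def fetch_page(page: int) -> str:
        resp = requests.get("https://api.example.com/data", params={"page": page})
        resp.raise_for_status()
        return resp.text  # one JSON document per response

    # Build the DataFrame from an in-memory RDD of JSON strings, so the
    # raw data is never written "at rest".
    json_strings = [fetch_page(p) for p in range(6000)]
    df = spark.read.json(spark.sparkContext.parallelize(json_strings))

Note that all ~6k responses still sit in driver memory at once here, which matches the crash described; fetching in smaller batches and unioning the resulting DataFrames is one way to relieve driver pressure.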

Latest Reply
Anonymous
Not applicable

Hi @James Held, hope all is well! Just wanted to check in: were you able to resolve your issue, and if so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

1 More Replies
Mado
by Valued Contributor II
  • 1460 Views
  • 2 replies
  • 3 kudos

What is the default location when using "writeStream"?

Hi, assume that I want to write a table with "writeStream". Where is the default location on DBFS where the table is saved? Sample code: spark.table("TEMP_SILVER").writeStream.option("checkpointLocation", "dbfs:/user/AAA@gmail.com")...
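For context, a hedged sketch of the two cases, with illustrative table and path names: a managed table written with .toTable() lands in the metastore's default location (typically dbfs:/user/hive/warehouse/<table> for the legacy Hive metastore), while .start(path) writes exactly where you point it.

    # Managed table: data goes to the metastore's default location.
    (spark.table("TEMP_SILVER").writeStream
        .option("checkpointLocation", "dbfs:/tmp/checkpoints/temp_gold")
        .toTable("TEMP_GOLD"))

    # Explicit path: data goes exactly to the path given to start().
    (spark.table("TEMP_SILVER").writeStream
        .format("delta")
        .option("checkpointLocation", "dbfs:/tmp/checkpoints/temp_gold2")
        .start("dbfs:/tmp/tables/temp_gold"))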

Latest Reply
Anonymous
Not applicable

Hi @Mohammad Saber, hope all is well! Just wanted to check in: were you able to resolve your issue, and if so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

1 More Replies
AkilK
by New Contributor II
  • 937 Views
  • 2 replies
  • 3 kudos

Community Edition workspace password reset issue

I am not able to reset my Community Edition workspace password. It keeps processing and the password does not get reset.

Latest Reply
Anonymous
Not applicable

Hi @Akil Kapasi, hope all is well! Just wanted to check in: were you able to resolve your issue, and if so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

1 More Replies
logan0015
by Contributor
  • 1059 Views
  • 1 reply
  • 3 kudos

How to move the "__apply_changes_storage_mytablename" table when creating a streaming live table?

As the title suggests, whenever I create a streaming live table it creates an __apply_changes_storage_<mytablename> table in the database on Databricks. Is there a way to specify a different cloud location for these files?
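Not a confirmed answer, but one knob worth checking: a Delta Live Tables pipeline's storage setting controls where the pipeline keeps checkpoints and backing data, which is where the __apply_changes_storage_* objects live. A hedged sketch of pipeline settings (JSON expressed as a Python dict, as it would be sent to the pipelines API), with placeholder names and paths:

    # "storage" is where the pipeline keeps its backing data, including
    # __apply_changes_storage_<table>; the bucket and paths are placeholders.
    pipeline_settings = {
        "name": "my_cdc_pipeline",
        "storage": "s3://my-bucket/dlt/my_cdc_pipeline",
        "target": "my_database",
        "libraries": [{"notebook": {"path": "/Repos/etl/dlt_cdc_notebook"}}],
    }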

Latest Reply
Anonymous
Not applicable

Hi @Logan Nicol, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer first; otherwise, Bricksters will get back to you soon. Thanks!

farefin
by New Contributor II
  • 2749 Views
  • 2 replies
  • 5 kudos

Need help with PySpark code in Databricks to calculate a new measure column.

Details of the requirement are as below: I have a table with the below structure, and I have to write code in PySpark to calculate a new column. The logic for the new column is the sum of Magnitude for different Categories divided by the total Magnitude, and it should b...

Sample Data
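A minimal sketch of the calculation as described, assuming hypothetical column names Category and Magnitude since the actual structure is in the attached sample data:

    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    df = spark.table("my_table")  # placeholder for the poster's table

    by_category = Window.partitionBy("Category")
    overall = Window.partitionBy()  # empty partition spec = whole table

    # Share of the total magnitude contributed by each row's category.
    df = df.withColumn(
        "measure",
        F.sum("Magnitude").over(by_category) / F.sum("Magnitude").over(overall),
    )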
Latest Reply
Anonymous
Not applicable

Hi @Faizan Arefin, hope all is well! Just wanted to check in: were you able to resolve your issue, and if so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

1 More Replies
tum
by New Contributor II
  • 3885 Views
  • 3 replies
  • 4 kudos

Create new job API error "MALFORMED_REQUEST"

Hi, I'm trying to test the "create a new job" API (v2.1) with Python, but I got this error: { 'error_code': 'MALFORMED_REQUEST', 'message': 'Invalid JSON given in the body of the request - expected a map' }. How do I validate the JSON body before posting? This is my js...
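"Expected a map" generally means the request body did not deserialize to a JSON object at the top level (for example, a double-encoded string or a bare list). A hedged sketch of validating and posting the body with the standard json and requests libraries; the host, token, and job settings are placeholders:

    import json
    import requests

    job_spec = {
        "name": "my-test-job",
        "tasks": [{
            "task_key": "main",
            "notebook_task": {"notebook_path": "/Repos/etl/my_notebook"},
            "existing_cluster_id": "1234-567890-abcde123",  # placeholder
        }],
    }

    # Round-tripping confirms the payload is valid JSON whose top level is
    # a map (dict), which is what the error message is asking for.
    assert isinstance(json.loads(json.dumps(job_spec)), dict)

    resp = requests.post(
        "https://<workspace-host>/api/2.1/jobs/create",
        headers={"Authorization": "Bearer <token>"},
        json=job_spec,  # let requests serialize; don't json.dumps() it yourself too
    )
    print(resp.status_code, resp.json())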

Latest Reply
Anonymous
Not applicable

Hi @tum m, hope all is well! Just wanted to check in: were you able to resolve your issue, and if so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

2 More Replies
numersoz
by New Contributor III
  • 3787 Views
  • 5 replies
  • 10 kudos

Is ZORDER required after table overwrite?

Hi, after appending new values to a Delta table, I need to delete duplicate rows. After deleting duplicate rows using PySpark, I overwrite the table (keeping the schema). My question is: do I have to do ZORDER again? Another question: is there another wa...
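For context, a hedged sketch of the usual reasoning: an overwrite rewrites the table's data files, so any prior Z-order clustering is lost and OPTIMIZE ... ZORDER BY needs to be run again; dropDuplicates before the write is one common dedup pattern. Table and column names below are placeholders:

    # Deduplicate, then overwrite the Delta table (schema preserved).
    deduped = spark.table("my_db.events").dropDuplicates(["event_id"])
    deduped.write.format("delta").mode("overwrite").saveAsTable("my_db.events")

    # The overwrite produced fresh, unclustered files, so re-run Z-ordering
    # if your queries rely on it.
    spark.sql("OPTIMIZE my_db.events ZORDER BY (event_id)")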

Latest Reply
DeepakMakwana74
New Contributor III

Hi @Nurettin Ersoz, try to use an incremental load of the data so it will avoid duplicates, and you can use a full load once if your data has updates.

4 More Replies
Milind
by New Contributor III
  • 4689 Views
  • 7 replies
  • 23 kudos

Resolved! Is there a syllabus change in the self-paced Data Engineering with Databricks course video?

Is there a syllabus change in the self-paced Data Engineering with Databricks course video? Last week I started that video lecture, but today I found that everything has changed. https://partner-academy.databricks.com/learn/course/62/data-engineering-with-datab...

Latest Reply
DeepakMakwana74
New Contributor III

Hi @Milind Singh, yes, the syllabus keeps being updated, so the self-paced course is required to be updated as well.

6 More Replies
Sagar1
by New Contributor III
  • 3875 Views
  • 3 replies
  • 4 kudos

How to identify or determine how many jobs will be performed if I submit code

I'm not able to find a source that explains how to determine how many jobs a written piece of PySpark code will trigger. Can you please help me here? About stages, I know that the number of shuffles equals the number of stages.
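As a rough illustration rather than an official rule: each action triggers a job, and within a job each shuffle boundary starts a new stage, so a single job typically has (number of shuffles + 1) stages. A small sketch with made-up data:

    df = spark.range(1_000_000)

    # Transformations are lazy: no job runs yet.
    grouped = df.withColumn("bucket", df.id % 10).groupBy("bucket").count()

    grouped.count()    # action -> job 1; the groupBy shuffle splits it into stages
    grouped.collect()  # action -> job 2
    grouped.write.mode("overwrite").parquet("/tmp/bucket_counts")  # action -> job 3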

Latest Reply
Anonymous
Not applicable

Hi @sagar Varma, hope all is well! Just wanted to check in: were you able to resolve your issue, and if so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

2 More Replies
Kash
by Contributor III
  • 1410 Views
  • 2 replies
  • 6 kudos

Will Vacuum delete previous folders of data if we z-ordered by as_of_date each day?

Hi there, I've had horrible experiences vacuuming tables in the past and losing tons of data, so I wanted to confirm a few things about VACUUM and Z-ORDER. Background: each day we run an ETL job that appends data to a table and stores the data in S3 b...
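For reference, a hedged sketch of how the two interact (table name and retention are placeholders): OPTIMIZE ... ZORDER BY rewrites data files and leaves the superseded ones in place for time travel; VACUUM then deletes only files that are no longer referenced by the current table version and are older than the retention window, so daily Z-ordering does create vacuum-eligible files, but the current version is never touched.

    # Daily Z-ordering rewrites files; the old ones stay for time travel.
    spark.sql("OPTIMIZE my_db.daily_events ZORDER BY (as_of_date)")

    # VACUUM removes only unreferenced files older than the retention
    # period, so a generous retention keeps time travel safe.
    spark.sql("VACUUM my_db.daily_events RETAIN 168 HOURS")  # 7 days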

Latest Reply
Anonymous
Not applicable

Hi @Avkash Kana, hope all is well! Just wanted to check in: were you able to resolve your issue, and if so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

1 More Replies
Sweta
by New Contributor II
  • 3871 Views
  • 6 replies
  • 7 kudos

Can Delta Lake completely host a data warehouse and replace Redshift?

Our use case is simple: to store our PB-scale data, transform it, and use it for BI, reporting, and analytics. As my title says, I am trying to eliminate expenditure on Redshift, as we are starting greenfield. I know I have designed/used just Delta lak...

Latest Reply
Anonymous
Not applicable

Hi @Swetha Marakani, hope all is well! Just wanted to check in: were you able to resolve your issue, and if so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

5 More Replies
Carlton
by Contributor
  • 5031 Views
  • 5 replies
  • 14 kudos

I would like to know why CROSS JOIN fails to recognize columns

Whenever I apply a CROSS JOIN to my Databricks SQL query, I get a message letting me know that a column does not exist, but I'm not sure if the issue is with the CROSS JOIN. For example, the code should identify characters such as http, https, ://, / ...

Latest Reply
Shalabh007
Honored Contributor

@CARLTON PATTERSON, since you have given the alias "tt" to your table "basecrmcbreport.organizations", you will have to access the corresponding columns in the format tt.<column_name>. In your code on line #4, try accessing the column 'homepage_u...
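A hedged sketch of the alias point; the column name homepage_url is an assumption based on the truncated reply, and the joined subquery is a stand-in since the original query is only in the screenshot:

    spark.sql("""
        SELECT tt.homepage_url,            -- must be qualified with the alias
               SUBSTRING_INDEX(tt.homepage_url, '://', -1) AS stripped_url
        FROM basecrmcbreport.organizations AS tt
        CROSS JOIN (SELECT 1 AS n) AS d    -- placeholder for the real right side
    """).show()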

4 More Replies
RyanD-AgCountry
by Contributor
  • 3704 Views
  • 5 replies
  • 7 kudos

Resolved! Azure Create Metastore button not available

With Unity Catalog now GA on Azure, we are working through initial tests for setup within Databricks and Azure. However, we are not seeing the "Create Metastore" button available as indicated in the documentation. We're also not seeing any additional pr...

Latest Reply
Addi1
New Contributor II

I'm facing the same issue listed above; the "Create Metastore" button is unavailable for me as well.

4 More Replies
andreiten
by New Contributor II
  • 4572 Views
  • 1 reply
  • 3 kudos

Is there any example or guideline on how to pass JSON parameters to a pipeline in a Databricks workflow?

I used this source https://docs.databricks.com/workflows/jobs/jobs.html#:~:text=You%20can%20use%20Run%20Now,different%20values%20for%20existing%20parameters.&text=next%20to%20Run%20Now%20and,on%20the%20type%20of%20task. But there is no example of how...

Latest Reply
UmaMahesh1
Honored Contributor III

Hi @Andre Ten, that's exactly how you specify the JSON parameters in a Databricks workflow. I have been doing it in the same format and it works for me. I removed the parameters as they are a bit sensitive, but I hope you get the point. Cheers.
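For anyone landing on this thread, a hedged sketch of the shape such parameters take when triggering a run through the Jobs 2.1 API; the keys and values below are placeholders, not the redacted ones:

    import requests

    payload = {
        "job_id": 123,  # placeholder
        # notebook_params is a flat map of strings; nested JSON is typically
        # passed as a serialized string and parsed inside the notebook.
        "notebook_params": {
            "env": "dev",
            "config": '{"start_date": "2023-01-01", "retries": 3}',
        },
    }

    resp = requests.post(
        "https://<workspace-host>/api/2.1/jobs/run-now",
        headers={"Authorization": "Bearer <token>"},
        json=payload,
    )
    print(resp.json())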


Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group