Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

MaximS
by New Contributor
  • 2171 Views
  • 1 replies
  • 1 kudos

OPTIMIZE command failed to complete on partitioned dataset

I am trying to optimize a Delta table with the following stats:
size: 212,848 blobs, 31,162,417,246,985 bytes
command: OPTIMIZE <table> ZORDER BY (X, Y, Z)
In the Spark UI I can see all the work divided into batches, and each batch starts with 400 tasks to collect data. But ...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III

Can you share some sample datasets for this, so that we can debug and help you accordingly? Thanks, Aviral

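
A quick back-of-the-envelope check of the stats in the question gives a sense of how much data the OPTIMIZE ... ZORDER BY command has to process. The sketch below only converts the two figures quoted in the post (file count and total bytes) into more readable units; the helper name table_size_summary is hypothetical.

```python
# Figures copied from the question; the helper only does unit conversion.
TOTAL_BYTES = 31_162_417_246_985
FILE_COUNT = 212_848


def table_size_summary(total_bytes: int, file_count: int) -> dict:
    """Return total size in TiB (2**40 bytes) and average file size in MB (10**6 bytes)."""
    return {
        "total_tib": total_bytes / 1024**4,
        "avg_file_mb": total_bytes / file_count / 10**6,
    }


summary = table_size_summary(TOTAL_BYTES, FILE_COUNT)
print(f"total: {summary['total_tib']:.2f} TiB, average file: {summary['avg_file_mb']:.1f} MB")
```

This prints roughly 28.34 TiB in total with files averaging about 146.4 MB, which is useful context when asking why each batch of the job is large.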
auser85
by New Contributor III
  • 5774 Views
  • 1 replies
  • 1 kudos

How to incorporate these GC options into my Databricks cluster? (spark.executor.extraJavaOptions)

I want to try incorporating these options into my Databricks cluster:
spark.driver.extraJavaOptions -XX:+UseG1GC -XX:+G1SummarizeConcMark
spark.executor.extraJavaOptions -XX:+UseG1GC -XX:+G1SummarizeConcMark
If I put them under Compute -> Cluster -> Co...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III

Hey @Andrew Fogarty, I think this is only for the spark-submit command, not for the cluster UI. Please have a look at this doc: http://progexc.blogspot.com/2014/12/spark-configuration-mess-solved.html
spark.executor.extraJavaOptions: A string of extra JVM...

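
The question lists the options as space-separated key/value pairs. A minimal sketch, assuming the cluster's Spark config field takes one "key value" pair per line, of how that text could be assembled so that the driver and the executors receive the same GC flags. The helper spark_config_lines is hypothetical, and the sketch does not settle whether a given cluster honors these properties when set this way.

```python
from typing import Dict, List

# The two JVM flags quoted in the question.
GC_FLAGS: List[str] = ["-XX:+UseG1GC", "-XX:+G1SummarizeConcMark"]


def spark_config_lines(props: Dict[str, List[str]]) -> str:
    """Join each property's JVM flags with spaces and emit one 'key value' line per property."""
    return "\n".join(f"{key} {' '.join(flags)}" for key, flags in props.items())


config_text = spark_config_lines({
    "spark.driver.extraJavaOptions": GC_FLAGS,
    "spark.executor.extraJavaOptions": GC_FLAGS,
})
print(config_text)
```

Keeping the flags in one list avoids the driver and executor settings drifting apart when one of them is edited.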
RajibRajib_Mand
by New Contributor III
  • 3384 Views
  • 3 replies
  • 2 kudos

Multiple Databricks cluster in same workspace

Hi all, I have created three clusters (dev, qa, prod) in the same workspace to isolate data for different environments. How do we differentiate environments when running a job, so that a job run on dev updates only the dev environment's data? Regards, Rajib

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III

Hey @Rajib Rajib Mandal, this is very easy; I have done it multiple times. You can segregate data using the IAM role attached to the cluster, known as an instance profile: give the dev role access only to dev data, and the s...

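
The instance-profile approach in the reply controls what each cluster is allowed to access; the job itself still has to decide which environment's data to write. A minimal sketch, assuming the environment name reaches the job as a parameter and that each environment has its own storage root. The bucket names in ENV_ROOTS and the table_path helper are hypothetical placeholders.

```python
# Hypothetical storage roots, one per environment; the bucket names are placeholders.
ENV_ROOTS = {
    "dev": "s3://example-dev-bucket/data",
    "qa": "s3://example-qa-bucket/data",
    "prod": "s3://example-prod-bucket/data",
}


def table_path(env: str, table: str) -> str:
    """Build the storage path for a table, refusing environments that are not configured."""
    if env not in ENV_ROOTS:
        raise ValueError(f"unknown environment {env!r}; expected one of {sorted(ENV_ROOTS)}")
    return f"{ENV_ROOTS[env]}/{table}"


# In a real job, env would come from a job or notebook parameter instead of being hard-coded.
print(table_path("dev", "sales"))
```

Failing loudly on an unknown environment name is deliberate: silently falling back to a default could send dev writes to prod.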
SIRIGIRI
by Databricks Partner
  • 1320 Views
  • 1 replies
  • 1 kudos


Sorting in Spark: How to sort null values first and last in a Spark DataFrame? Please find the answers here: https://medium.com/@sharikrishna26/sorting-in-spark-a57db245ecd4

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III

Yeah, this is a really good post. Keep it up, man!

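
In PySpark, null placement is chosen per column with Column methods such as asc_nulls_first(), asc_nulls_last(), desc_nulls_first() and desc_nulls_last(). To show the semantics without a Spark session, here is a plain-Python sketch that mimics them on a list where None plays the role of null; the function sort_with_nulls is hypothetical and is not a Spark API.

```python
from typing import List, Optional


def sort_with_nulls(values: List[Optional[int]], descending: bool = False,
                    nulls_first: bool = True) -> List[Optional[int]]:
    """Sort the non-null values, then put all None values at the start or at the end."""
    non_null = sorted((v for v in values if v is not None), reverse=descending)
    nulls = [v for v in values if v is None]
    return nulls + non_null if nulls_first else non_null + nulls


data = [3, None, 1, None, 2]
print(sort_with_nulls(data))                                      # like asc_nulls_first()
print(sort_with_nulls(data, nulls_first=False))                   # like asc_nulls_last()
print(sort_with_nulls(data, descending=True, nulls_first=False))  # like desc_nulls_last()
```

Separating the nulls before sorting sidesteps the fact that Python cannot compare None with integers.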
Aviral-Bhardwaj
by Esteemed Contributor III
  • 2730 Views
  • 0 replies
  • 31 kudos

Understanding Cluster Pools

Sometimes we want to run our Databricks code without any delay because the reports are urgent; for example, the upstream team wants to save as much cluster start-up time as possible. In that case we can use a pool of cluste...

Aviral-Bhardwaj
by Esteemed Contributor III
  • 2919 Views
  • 0 replies
  • 31 kudos

Databricks New Runtime Version Is Available Now

PySpark memory profiling: Memory profiling is now enabled for PySpark user-defined functions. This provides information on memory increment, memory usage, and number of occurrences for each line of code...

ahana
by New Contributor III
  • 3616 Views
  • 1 replies
  • 2 kudos

Error: too large report

Hi, I am trying to pull data from Quick Base, but it gives me the error "too large report". Below is the code I used:
1) %python
df = quickbasePull('b5zj8k_pbz5_0_cd5h4wbb77n4nvp95b4u','bq2nq8jm7',4)
2) I tried the code below, but it's not displaying in correc...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III

Hey @ahana ahana, this code is not working.

joakon
by New Contributor III
  • 15860 Views
  • 6 replies
  • 6 kudos
Latest Reply
huyd
New Contributor III

Check your read cell: "Delimeter".

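
The original question in this thread is not shown, so this is only an illustration of the symptom the reply points at: when the delimiter setting does not take effect (for example because the option name is misspelled), the reader falls back to its default separator and each line comes back as a single field. The sketch uses Python's standard csv module rather than Spark; in Spark's CSV reader the corresponding option is delimiter (or its alias sep). The sample data and the field_counts helper are made up for illustration.

```python
import csv
import io

# Made-up pipe-delimited sample data.
RAW = "id|name|city\n1|Ana|Lisbon\n2|Ben|Oslo\n"


def field_counts(text: str, delimiter: str = ",") -> list:
    """Parse the text and return the number of fields found on each row."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [len(row) for row in reader]


print(field_counts(RAW))                 # default "," does not match the data: one field per row
print(field_counts(RAW, delimiter="|"))  # matching delimiter: three fields per row
```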
Deiry
by Databricks Partner
  • 3294 Views
  • 1 replies
  • 2 kudos


Hey fellow co-workers! I have been doing the Apache Spark Programming course in Databricks Academy and realized the hyperlinks here don't work. Spark session: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql.html#spark-session-apis

Latest Reply
labtech
Valued Contributor II

@Deiry Navas Could you share the repo link used to execute that notebook? I'll check on my side.

fury88
by New Contributor II
  • 1972 Views
  • 1 replies
  • 0 kudos

Why are the get..Id() functions returning 'Some(123456)' instead of just the ID?

Hey fellow users, I've successfully retrieved the notebook context during job runs, and there are several getId calls. For some reason, when the IDs are returned, they are wrapped in Some() instead of being just the number. Does anyone know why this is the...

Latest Reply
fury88
New Contributor II

Well, my post is irrelevant for me now! I just stumbled across this beauty, which saves me from having to do any of this and deal with odd return values: How to get the Job ID and Run ID and save into a database (databricks.com). Are the braces {{job_id}} n...

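
The Some(...) wrapper is how a Scala Option prints: the notebook-context getters are most likely Scala methods that return Option values, so their string form is Some(value) when a value is present and None when it is not. If you stay with the context getters rather than the {{job_id}} approach, one workaround is to unwrap the printed value. The helper unwrap_option_string below is a hypothetical, purely string-based sketch that assumes you already have the printed form.

```python
import re
from typing import Optional

# Matches the printed form of a Scala Some, e.g. "Some(123456)" or "Some(RunId(5))".
_OPTION_RE = re.compile(r"^Some\((.*)\)$")


def unwrap_option_string(text: str) -> Optional[str]:
    """Return the inner value of a printed Scala Option.

    "Some(<value>)" gives "<value>", "None" gives None, and any other string is returned unchanged.
    """
    text = text.strip()
    if text == "None":
        return None
    match = _OPTION_RE.match(text)
    return match.group(1) if match else text


print(unwrap_option_string("Some(123456)"))
```

The greedy match anchored at both ends keeps nested parentheses intact, so a value such as Some(RunId(5)) unwraps to RunId(5).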
RohitKulkarni
by Contributor II
  • 9964 Views
  • 6 replies
  • 6 kudos

External table format issue in Databricks

I am new to Databricks. I am trying to create an external table in Databricks with the format below:
CREATE EXTERNAL TABLE Salesforce.Account(
  Id string,
  IsDeleted bigint,
  Name string,
  Type string,
  RecordTypeId string,
  ParentId string,
  ShippingSt...

Latest Reply
AmitA1
Databricks Partner

Databricks is awesome if you have SQL knowledge. I just ran into a problem in my project, and Databricks helped me a lot, for example by using a low watermark to hold the load success date.

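
The reply mentions keeping a low watermark that holds the last successful load point. A minimal plain-Python sketch of that incremental-load pattern, assuming each source row carries an updated_at timestamp and that the watermark is persisted somewhere between runs (persistence is left out). Here the watermark advances to the newest loaded timestamp, a common variant of storing the load success date; incremental_load and the column names are hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def incremental_load(rows: List[Dict], watermark: datetime) -> Tuple[List[Dict], datetime]:
    """Select rows newer than the watermark and return them with the advanced watermark."""
    new_rows = [r for r in rows if r["updated_at"] > watermark]
    # Only move the watermark forward when something was actually loaded.
    new_watermark = max((r["updated_at"] for r in new_rows), default=watermark)
    return new_rows, new_watermark


source = [
    {"id": 1, "updated_at": datetime(2023, 1, 1)},
    {"id": 2, "updated_at": datetime(2023, 1, 5)},
    {"id": 3, "updated_at": datetime(2023, 1, 9)},
]
loaded, wm = incremental_load(source, datetime(2023, 1, 3))
print([r["id"] for r in loaded], wm)
```

Running the function again with the returned watermark loads nothing, which is what makes reruns after a successful load safe.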
jt
by New Contributor III
  • 4494 Views
  • 2 replies
  • 3 kudos

Collapse partial code in a large cell?

In Databricks notebooks, we have SQL cells that are over 700 lines long. Is there a way to collapse a portion of the code instead of scrolling? I'm looking for something similar to what exists in Netezza, "--region" and "--end region", where anything between those...

Latest Reply
Anonymous
Not applicable

Hi @james t, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

KrishZ
by Contributor
  • 8210 Views
  • 4 replies
  • 1 kudos

How to print the path of a .py file or a notebook?

I have stored test.py in DBFS at the location below:
"/dbfs/FileStore/shared_uploads/krishna@company.com/Project_Folder/test.py"
I have a print statement in test.py:
print(os.getcwd())
and it prints:
'/databricks/drive...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III

Hey @Krishna Zanwar, please use the code below; it will work. Since you want the specific location, you can write custom code and format the path using a Python formatter; that will give you the desired result.

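
The code the reply refers to is not included in this excerpt. As a separate minimal sketch: os.getcwd() returns the process's working directory, not the location of the script, whereas a Python file that is imported or run directly can use its own __file__ attribute to find where it was loaded from. Notebooks typically do not define __file__, so this covers only the .py case. The helper script_location is hypothetical; the path in the example is the one from the question.

```python
import os
from typing import Tuple


def script_location(file_path: str) -> Tuple[str, str]:
    """Split a script's path into (absolute directory, file name)."""
    absolute = os.path.abspath(file_path)
    return os.path.dirname(absolute), os.path.basename(absolute)


# Inside test.py you would call script_location(__file__): __file__ is the path the
# module was loaded from, whereas os.getcwd() is the process's working directory.
print(script_location("/dbfs/FileStore/shared_uploads/krishna@company.com/Project_Folder/test.py"))
```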