Data Engineering
Forum Posts

raman
by New Contributor II
  • 577 Views
  • 2 replies
  • 0 kudos

Spark pushdown filter not being respected on dbfs

I have parquet files with a column g1 with schema StructField(g1, IntegerType, true). Now I have a query with a filter on g1. What's weird in the SQL viewer is that Spark is loading all the rows from that file, even though in the physical plan I can see th...

Latest Reply
raman
New Contributor II
  • 0 kudos

Thanks @Ajay Pandey, please find attached the physical plan. Query: Select identityMap, segmentMembership, _repo, workEmail, person, homePhone, workPhone, workAddress, personalEmail, homeAddress from final_segment_index_table_v2 where (g1 >= 128 AND g1 <...

1 More Replies
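For anyone hitting the same thing, here is a minimal sketch of how to confirm whether the filter is actually pushed to the Parquet scan; the DBFS path and predicate below are illustrative, not taken from the thread:

```python
# Minimal sketch: check Parquet filter pushdown in the physical plan.
# The DBFS path and predicate below are illustrative placeholders.
df = spark.read.parquet("dbfs:/mnt/data/final_segment_index_table_v2")

filtered = df.filter("g1 >= 128 AND g1 < 256")

# Look for a "PushedFilters: [GreaterThanOrEqual(g1,128), LessThan(g1,256)]"
# entry on the FileScan node in the output.
filtered.explain(mode="formatted")
```

Note that a pushed filter only prunes Parquet row groups via their min/max statistics; if every row group's g1 range overlaps the predicate, the scan can still read all rows even though the filter shows up as pushed.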
Kopal
by New Contributor II
  • 2790 Views
  • 3 replies
  • 3 kudos

Resolved! Data Engineering - CTAS - External Tables - Limitations of CTAS for external tables - can or cannot use options and location

Data Engineering - CTAS - External Tables. Can someone help me understand why, in chapter 3.3, we cannot directly use CTAS with OPTIONS and LOCATION to specify the delimiter and location of a CSV? Or have I misunderstood? Details: In Data Engineering with Databri...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

The second statement's CTAS will not be able to parse the CSV in any manner, because it's just the FROM statement that points to a file. It's more of a traditional SQL statement with SELECT and FROM. It will create a Delta table. This just happens to b...

2 More Replies
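For readers following the same course material, a hedged sketch of the workaround the thread is describing: CTAS itself takes no CSV parsing OPTIONS, so the CSV is first exposed through a temporary view that does, and the CTAS then reads from that view. Table and view names, path, and delimiter are made up for illustration:

```python
# Sketch: CTAS cannot parse CSV with OPTIONS, so register the CSV source
# with its options first, then CTAS from it. Names and paths are placeholders.
spark.sql("""
  CREATE OR REPLACE TEMPORARY VIEW sales_csv_vw
  USING CSV
  OPTIONS (path = '/mnt/raw/sales', header = 'true', delimiter = '|')
""")

# The CTAS reads the parsed rows from the view and writes a Delta table.
spark.sql("""
  CREATE OR REPLACE TABLE sales
  AS SELECT * FROM sales_csv_vw
""")
```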
jt
by New Contributor III
  • 979 Views
  • 2 replies
  • 1 kudos

Table of Contents consistency

When I click on the header "STEP 3" in the table of contents, it takes me to the correct section. However, when I click on "STEP 2", the table of contents stays on "STEP 3". This sometimes causes confusion. For consistency, is there any way to highligh...

Latest Reply
jt
New Contributor III
  • 1 kudos

If you click on cell "Command-4", does the table of contents (on the left) highlight "Command-4"?

1 More Replies
User16869510359
by Esteemed Contributor
  • 902 Views
  • 1 replies
  • 2 kudos
Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 2 kudos

Because your driver is not able to talk to your nodes. For this you can add configuration to increase the Databricks heartbeat interval, and you can also increase the RPC max message size; this will also help. You can explore cluster configuration from here - htt...

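If you follow that suggestion, the usual place for these settings is the cluster's Spark config (Compute -> cluster -> Advanced options); a sketch of the relevant keys, with example values only:

```python
# Sketch: spark_conf block of a cluster spec with the settings mentioned above.
# Values are examples to tune, not recommendations.
spark_conf = {
    "spark.executor.heartbeatInterval": "60s",  # default 10s
    "spark.network.timeout": "600s",            # keep well above the heartbeat interval
    "spark.rpc.message.maxSize": "256",         # MiB, default 128
}
```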
spott_submittab
by New Contributor II
  • 563 Views
  • 1 replies
  • 0 kudos

A Job "pool"? (or task pool)

I'm trying to run a single job multiple times with different parameters, where the number of concurrent jobs is less than the number of parameters. I have a job (or task...) J that takes parameter set p. I have 100 p values I want to run, however I onl...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 0 kudos

This is something new and an interesting question. Try reaching out to the Databricks support team; maybe they have some good ideas here.

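One common workaround, rather than an official "job pool" feature, is to fan the parameter sets out from a driver notebook with a bounded thread pool, so only N notebook runs are in flight at a time. A sketch, where the notebook path, parameter name, and pool size are assumptions:

```python
# Sketch: run one notebook for 100 parameter sets with bounded concurrency.
# Notebook path, parameter name, and pool size are illustrative.
from concurrent.futures import ThreadPoolExecutor

params = [{"p": str(i)} for i in range(100)]

def run_one(args):
    # Each call launches a separate run of the same notebook with its own parameters.
    return dbutils.notebook.run("/Repos/jobs/process_p", 3600, args)

with ThreadPoolExecutor(max_workers=10) as pool:  # at most 10 concurrent runs
    results = list(pool.map(run_one, params))
```

If each parameter set is instead submitted as its own run of a job, the job's max_concurrent_runs setting is another lever for capping concurrency.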
aka1
by New Contributor II
  • 931 Views
  • 1 replies
  • 3 kudos

dbx - run unit test error (java.lang.NoSuchMethodError)

I am setting up dbx for the first time on Windows 10, strictly following https://dbx.readthedocs.io/en/latest/guides/python/python_quickstart/. OpenJDK is installed (conda install -c conda-forge openjdk=11.0.15), winutils.exe for Hadoop 3 is downloaded, pat...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 3 kudos

This seems to be a code issue only.

MaximS
by New Contributor
  • 843 Views
  • 1 replies
  • 1 kudos

OPTIMIZE command failed to complete on partitioned dataset

Trying to optimize a Delta table with the following stats: size: 212,848 blobs, 31,162,417,246,985 bytes; command: OPTIMIZE <table> ZORDER BY (X, Y, Z). In the Spark UI I can see all work divided into batches, and each batch starts with 400 tasks to collect data. But ...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 1 kudos

Can you share a sample dataset for this, so that we can debug and help you accordingly? Thanks, Aviral

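On a table of that size it often helps to narrow each OPTIMIZE run to a slice of partitions rather than the whole table; a sketch, where the table name and partition column are assumptions:

```python
# Sketch: OPTIMIZE one partition range at a time instead of the whole ~31 TB table.
# Table name and partition column ("event_date") are placeholders.
spark.sql("""
  OPTIMIZE my_db.my_table
  WHERE event_date >= '2023-01-01' AND event_date < '2023-02-01'
  ZORDER BY (X, Y, Z)
""")
```

The WHERE clause of OPTIMIZE can only reference partition columns, so this only applies to the partitioned layout mentioned in the title.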
auser85
by New Contributor III
  • 2308 Views
  • 1 replies
  • 1 kudos

How to incorporate these GC options into my Databricks cluster? (spark.executor.extraJavaOptions)

I want to try incorporating these options into my Databricks cluster: spark.driver.extraJavaOptions -XX:+UseG1GC -XX:+G1SummarizeConcMark and spark.executor.extraJavaOptions -XX:+UseG1GC -XX:+G1SummarizeConcMark. If I put them under Compute -> Cluster -> Co...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 1 kudos

Hey @Andrew Fogarty, I think this is only for the spark-submit command, not for the cluster UI. Please have a look at this doc - http://progexc.blogspot.com/2014/12/spark-configuration-mess-solved.html. spark.executor.extraJavaOptions: A string of extra JVM...

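If you still want to try them on an interactive cluster, the Spark config box (or the spark_conf block of a cluster spec) is the usual place; a sketch, noting that -XX:+G1SummarizeConcMark is a diagnostic flag that newer JVMs may refuse, which would keep executors from starting:

```python
# Sketch: cluster spark_conf entries for the JVM options from the question.
# -XX:+G1SummarizeConcMark may be rejected by newer JVMs, so it is left out here.
spark_conf = {
    "spark.driver.extraJavaOptions": "-XX:+UseG1GC",
    "spark.executor.extraJavaOptions": "-XX:+UseG1GC",
}
```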
RajibRajib_Mand
by New Contributor III
  • 1076 Views
  • 3 replies
  • 2 kudos

Multiple Databricks clusters in the same workspace

Hi All, I have created three clusters (dev, qa, prod) in the same workspace to isolate data for different environments. How do we differentiate environments while running a job, so that when using dev it updates data only for the dev environment? Regards, Rajib

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 2 kudos

Hey @Rajib Rajib Mandal, this is very easy; I have done it multiple times. You can segregate data using the IAM role that is attached to the cluster, known as an instance profile. You can give the dev data access only to the dev role, and the s...

2 More Replies
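Alongside instance profiles, a simple pattern is to pass the environment name into the job and branch on it in the notebook; a sketch with an assumed widget name and bucket paths:

```python
# Sketch: choose dev/qa/prod targets from a job parameter.
# Widget name and paths are assumptions for illustration.
dbutils.widgets.text("env", "dev")
env = dbutils.widgets.get("env")  # set to dev / qa / prod per job

output_paths = {
    "dev":  "s3://my-bucket-dev/output/",
    "qa":   "s3://my-bucket-qa/output/",
    "prod": "s3://my-bucket-prod/output/",
}
output_path = output_paths[env]
```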
SIRIGIRI
by Contributor
  • 372 Views
  • 1 replies
  • 1 kudos

medium.com

Sorting In Spark: How do you sort null values first and last in a Spark DataFrame? Please find the answer here: https://medium.com/@sharikrishna26/sorting-in-spark-a57db245ecd4

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 1 kudos

Yeah, this is a really good post. Keep it up!

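For quick reference, the DataFrame API exposes null ordering directly; a minimal sketch:

```python
# Sketch: control where nulls land when sorting a Spark DataFrame.
from pyspark.sql import functions as F

df = spark.createDataFrame([(1,), (None,), (3,)], "amount INT")

df.orderBy(F.col("amount").asc_nulls_first()).show()   # nulls sort first
df.orderBy(F.col("amount").desc_nulls_last()).show()   # nulls sort last
```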
Aviral-Bhardwaj
by Esteemed Contributor III
  • 657 Views
  • 0 replies
  • 31 kudos

Understanding Cluster Pools

Understanding Cluster Pools. Sometimes we want to run our Databricks code without any delay, because reports are urgent and the upstream team wants to save as much time as they can on cluster startup. In that case we can use a pool of cluste...

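For context, once a pool exists a cluster opts into it simply by referencing the pool ID in its spec; a sketch where the pool ID, runtime version, and sizes are placeholders:

```python
# Sketch: fragment of a cluster spec (Clusters API) that draws nodes from a pool.
# Pool ID, runtime version, and worker count are placeholders.
cluster_spec = {
    "cluster_name": "reporting-cluster",
    "spark_version": "11.3.x-scala2.12",
    "instance_pool_id": "0123-456789-pool00",  # idle pool instances cut startup time
    "num_workers": 4,
}
```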
Aviral-Bhardwaj
by Esteemed Contributor III
  • 890 Views
  • 0 replies
  • 31 kudos

Databricks New Runtime Version is Available Now

Databricks New Runtime Version is Available Now. PySpark memory profiling: memory profiling is now enabled for PySpark user-defined functions. This provides information on memory increment, memory usage, and the number of occurrences for each line of code...

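A hedged sketch of trying the feature: in open-source PySpark the UDF memory profiler is gated behind the spark.python.profile.memory config, which has to be set when the cluster starts; the exact switch may differ by runtime version, so treat this as an assumption:

```python
# Sketch: memory-profile a Python UDF. Assumes the cluster was started with
# "spark.python.profile.memory" set to "true"; the exact config may vary by runtime.
from pyspark.sql.functions import udf

@udf("int")
def add_one(x):
    return x + 1 if x is not None else None

spark.range(10).select(add_one("id")).collect()

# Per-line memory usage for the UDF is printed when profiling is enabled.
sc.show_profiles()
```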
ahana
by New Contributor III
  • 1251 Views
  • 1 replies
  • 2 kudos

error too large report

Hi, I am trying to pull data from Quickbase but it is giving me the error: "too large report". Below is the code I used: %python df = quickbasePull('b5zj8k_pbz5_0_cd5h4wbb77n4nvp95b4u','bq2nq8jm7',4) 2) I tried the code below but it's not displaying in correc...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 2 kudos

Hey @ahana ahana, this code is not working.

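The "too large report" error generally means the query returns more rows than a single call allows, so the usual fix is to page through the results. A heavily hedged sketch against the Quickbase JSON API with requests; the realm hostname, user token, and field IDs are placeholders, and quickbasePull in the post is the poster's own helper, not shown here:

```python
# Sketch: page through a Quickbase query instead of pulling everything at once.
# Realm hostname, user token, and field IDs are placeholders; the table ID is
# taken from the post above.
import requests

headers = {
    "QB-Realm-Hostname": "myrealm.quickbase.com",
    "Authorization": "QB-USER-TOKEN <your-user-token>",
}

records, skip, page_size = [], 0, 1000
while True:
    body = {
        "from": "bq2nq8jm7",                        # table ID from the post
        "select": [3, 6, 7],                        # placeholder field IDs
        "options": {"skip": skip, "top": page_size},
    }
    resp = requests.post("https://api.quickbase.com/v1/records/query",
                         headers=headers, json=body).json()
    rows = resp.get("data", [])
    records.extend(rows)
    if len(rows) < page_size:
        break
    skip += page_size
```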
rammy
by Contributor III
  • 4574 Views
  • 6 replies
  • 5 kudos

How can I read the job id, run id, and parameters in a Python cell?

I have tried the following ways to get job parameters, but none of them are working: runId='{{run_id}}' jobId='{{job_id}}' filepath='{{filepath}}' print(runId," ",jobId," ",filepath) r1=dbutils.widgets.get('{{run_id}}') f1=dbutils.widgets.get('{{file...

Latest Reply
rammy
Contributor III
  • 5 kudos

Thanks for your response. I found the solution. The code below gives me all the job parameters: all_args = dbutils.notebook.entry_point.getCurrentBindings(); print(all_args). Thanks for your support.

5 More Replies
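To round off the accepted answer, a small sketch of both approaches mentioned in the thread; which keys appear depends on the parameters the job actually defines:

```python
# Sketch: read the parameters passed to a notebook job task.
# The keys present depend on the job's configured parameters.
all_args = dbutils.notebook.entry_point.getCurrentBindings()
print(all_args)

# A single named parameter can also be read as a widget, e.g. if the job
# defines a "filepath" parameter:
# filepath = dbutils.widgets.get("filepath")
```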