Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

aka1 (New Contributor II)
  • 1246 Views
  • 1 reply
  • 3 kudos

dbx - run unit test error (java.lang.NoSuchMethodError)

I am setting up dbx for the first time on Windows 10, strictly following https://dbx.readthedocs.io/en/latest/guides/python/python_quickstart/. OpenJDK is installed (conda install -c conda-forge openjdk=11.0.15), winutils.exe for Hadoop 3 is downloaded, pat...

Latest Reply: Aviral-Bhardwaj (Esteemed Contributor III) • 3 kudos

This seems to be a code issue only.

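A java.lang.NoSuchMethodError in a local dbx/PySpark test run usually points to a JVM or Hadoop-binaries mismatch rather than dbx itself. A minimal sketch for sanity-checking the Windows setup the quickstart assumes (OpenJDK 11 plus winutils.exe; the environment variable names are the conventional ones, not taken from the thread):

    import os, subprocess

    # Confirm which JVM the tests will pick up (OpenJDK 11 per the quickstart).
    print("JAVA_HOME =", os.environ.get("JAVA_HOME"))
    subprocess.run(["java", "-version"])

    # On Windows, winutils.exe must live under %HADOOP_HOME%\bin.
    hadoop_home = os.environ.get("HADOOP_HOME", "")
    print("HADOOP_HOME =", hadoop_home)
    print("winutils present:", os.path.exists(os.path.join(hadoop_home, "bin", "winutils.exe")))
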
MaximS (New Contributor)
  • 1077 Views
  • 1 reply
  • 1 kudos

OPTIMIZE command failed to complete on partitioned dataset

Trying to optimize a Delta table with the following stats: size: 212,848 blobs, 31,162,417,246,985 bytes; command: OPTIMIZE <table> ZORDER BY (X, Y, Z). In the Spark UI I can see all the work divided into batches, and each batch starts with 400 tasks to collect data. But ...

Latest Reply: Aviral-Bhardwaj (Esteemed Contributor III) • 1 kudos

Can you share some sample datasets for this so that we can debug and help you accordingly? Thanks, Aviral

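On a table this large (~31 TB), one common workaround is to scope OPTIMIZE to one partition at a time so each command compacts a manageable slice; Delta allows a WHERE on partition columns together with ZORDER BY. A hedged sketch, assuming a hypothetical table my_table partitioned by a date column:

    # Compact partition by partition instead of the whole table at once.
    partitions = [r.date for r in spark.sql("SELECT DISTINCT date FROM my_table").collect()]
    for p in partitions:
        spark.sql(f"OPTIMIZE my_table WHERE date = '{p}' ZORDER BY (X, Y, Z)")
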
auser85 (New Contributor III)
  • 2774 Views
  • 1 reply
  • 1 kudos

How to incorporate these GC options into my Databricks cluster? (spark.executor.extraJavaOptions)

I want to try incorporating these options into my Databricks cluster: spark.driver.extraJavaOptions -XX:+UseG1GC -XX:+G1SummarizeConcMark and spark.executor.extraJavaOptions -XX:+UseG1GC -XX:+G1SummarizeConcMark. If I put them under Compute -> Cluster -> Co...

Latest Reply: Aviral-Bhardwaj (Esteemed Contributor III) • 1 kudos

Hey @Andrew Fogarty​, I think this is only for the spark-submit command, not for the cluster UI. Please have a look at this doc: http://progexc.blogspot.com/2014/12/spark-configuration-mess-solved.html. spark.executor.extraJavaOptions: a string of extra JVM...

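Whichever way the options are supplied, it is easy to check from a notebook whether they actually reached the running cluster. A small sketch using the standard Spark property keys from the question:

    # Print what the live session actually sees; "not set" means the
    # options were not applied at cluster start.
    for key in ("spark.driver.extraJavaOptions", "spark.executor.extraJavaOptions"):
        print(key, "=", spark.conf.get(key, "not set"))
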
RajibRajib_Mand (New Contributor III)
  • 1482 Views
  • 3 replies
  • 2 kudos

Multiple Databricks cluster in same workspace

Hi all, I have created three clusters (dev, qa, prod) in the same workspace to isolate data for different environments. How do we differentiate environments while running a job? Using dev, it should update data for the dev environment. Regards, Rajib

Latest Reply: Aviral-Bhardwaj (Esteemed Contributor III) • 2 kudos

Hey @Rajib Rajib Mandal​, this is very easy; I have done this multiple times. You can segregate data using the IAM role attached to the cluster, known as an instance profile. You can give dev data access only to the dev role, and the s...

2 More Replies
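
Complementing the instance-profile approach above, jobs themselves can be parameterized by environment so the same notebook only touches its own environment's data. A hypothetical sketch (the widget name, bucket naming scheme, and table paths are illustrative, not from the thread):

    # "env" arrives as a job parameter: dev | qa | prod.
    dbutils.widgets.text("env", "dev")
    env = dbutils.widgets.get("env")

    # Hypothetical environment-specific storage layout.
    base_path = f"s3://company-data-{env}"
    df = spark.read.format("delta").load(f"{base_path}/sales")
    summary = df.groupBy("region").count()   # hypothetical transform
    summary.write.format("delta").mode("overwrite").save(f"{base_path}/sales_summary")
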
SIRIGIRI (Contributor)
  • 559 Views
  • 1 reply
  • 1 kudos

medium.com

Sorting in Spark: how do you sort null values first and last in a Spark DataFrame? Please find the answer here: https://medium.com/@sharikrishna26/sorting-in-spark-a57db245ecd4

Latest Reply: Aviral-Bhardwaj (Esteemed Contributor III) • 1 kudos

Yeah, this is a really good post. Keep it up, man!

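For readers who don't follow the link: Spark exposes null ordering directly on Column via variants of asc/desc. A quick self-contained example:

    from pyspark.sql import functions as F

    df = spark.createDataFrame([(1,), (None,), (3,)], ["x"])
    df.orderBy(F.col("x").asc_nulls_first()).show()   # nulls sort before 1, 3
    df.orderBy(F.col("x").desc_nulls_last()).show()   # nulls sort after 3, 1
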
Aviral-Bhardwaj (Esteemed Contributor III)
  • 919 Views
  • 0 replies
  • 31 kudos

Understanding Cluster Pools

Sometimes we want to run our Databricks code without any delay; reports can be urgent, and the upstream team wants to save as much time as possible on cluster startup. In that case we can use a pool of cluste...

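For reference, attaching a job cluster to a pre-warmed pool is a one-field change in the cluster spec. A hedged sketch of a Jobs API new_cluster payload (the pool id and runtime version are placeholders):

    # Workers come from the pool's idle instances, so startup skips
    # cloud-instance provisioning.
    new_cluster = {
        "spark_version": "11.3.x-scala2.12",        # placeholder runtime
        "num_workers": 2,
        "instance_pool_id": "1234-567890-pool123",  # placeholder pool id
    }
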
Aviral-Bhardwaj (Esteemed Contributor III)
  • 1138 Views
  • 0 replies
  • 31 kudos

Databricks New Runtime Version is Available Now

PySpark memory profiling: memory profiling is now enabled for PySpark user-defined functions. This provides information on memory increment, memory usage, and number of occurrences for each line of code...

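A hedged sketch of trying the profiler. It assumes the cluster was started with spark.python.profile.memory set to true in its Spark config, since profiler flags are read at context creation:

    from pyspark.sql.functions import udf

    @udf("long")
    def plus_one(v):
        return v + 1

    spark.range(10).select(plus_one("id")).collect()
    sc.show_profiles()   # per-line memory stats for the UDF's Python code
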
ahana (New Contributor III)
  • 1636 Views
  • 1 reply
  • 2 kudos

"Too large report" error

Hi, I am trying to pull data from Quick Base, but it is giving me the error "too large report". Below is the code I used (%python): df = quickbasePull('b5zj8k_pbz5_0_cd5h4wbb77n4nvp95b4u','bq2nq8jm7',4). 2) I tried the code below, but it's not displaying in correc...

Latest Reply: Aviral-Bhardwaj (Esteemed Contributor III) • 2 kudos

Hey @ahana ahana​, this code is not working.

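"Too large report" limits are usually worked around by paging. Since quickbasePull is the poster's own helper, here is a hypothetical sketch directly against the Quickbase JSON API using skip/top paging (the realm, token, and page size are placeholders; the table id is the one from the post):

    import requests

    headers = {
        "QB-Realm-Hostname": "myrealm.quickbase.com",   # placeholder realm
        "Authorization": "QB-USER-TOKEN my_token",      # placeholder token
    }
    records, skip, top = [], 0, 1000
    while True:
        body = {"from": "bq2nq8jm7", "options": {"skip": skip, "top": top}}
        resp = requests.post("https://api.quickbase.com/v1/records/query",
                             json=body, headers=headers).json()
        records.extend(resp["data"])
        if len(resp["data"]) < top:   # last page
            break
        skip += top
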
rammy (Contributor III)
  • 5945 Views
  • 6 replies
  • 5 kudos

How can I read the job id, run id, and parameters in a Python cell?

I have tried the following ways to get job parameters, but none of them work: runId='{{run_id}}'; jobId='{{job_id}}'; filepath='{{filepath}}'; print(runId," ",jobId," ",filepath); r1=dbutils.widgets.get('{{run_id}}'); f1=dbutils.widgets.get('{{file...

Latest Reply: rammy (Contributor III) • 5 kudos

Thanks for your response. I found the solution. The code below gives me all the job parameters: all_args = dbutils.notebook.entry_point.getCurrentBindings(); print(all_args). Thanks for your support.

5 More Replies
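
Pulling the thread's answer together with a context-based variant (the tag name "jobId" is runtime-dependent, so treat this as a sketch):

    # All notebook widgets/parameters, as confirmed later in this thread.
    all_args = dbutils.notebook.entry_point.getCurrentBindings()
    print(all_args)

    # Job/run ids come from the notebook context rather than widgets.
    ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
    print(ctx.currentRunId())          # Some(RunId(...)) when run as a job
    print(ctx.tags().apply("jobId"))   # throws when not running as a job
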
joakon (New Contributor III)
  • 6910 Views
  • 7 replies
  • 6 kudos
Latest Reply: huyd (New Contributor III) • 6 kudos

Check the "Delimiter" option in your read cell.

6 More Replies
Deiry (New Contributor III)
  • 995 Views
  • 2 replies
  • 2 kudos

spark.apache.org

Hey fellow co-workers!! I have been doing the Apache Spark programming course in Databricks Academy, and I realized the hyperlinks in it don't work. Spark session: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql.html#spark-session-apis

Latest Reply: labtech (Valued Contributor II) • 2 kudos

@Deiry Navas​ Could you share the repo link used to execute that notebook? I'll check on my side.

1 More Reply
fury88 (New Contributor II)
  • 718 Views
  • 1 reply
  • 0 kudos

Why are the get..Id() functions returning 'some(123456)' instead of just the id?

Hey fellow users, I've successfully retrieved the notebook context during job runs and there are several getId calls. For some reason when the ids are returned, they are wrapped in a some() instead of just the number. Does anyone know why this is the...

Latest Reply: fury88 (New Contributor II) • 0 kudos

Well, my post is irrelevant for me now!! I just stumbled across this beauty, which avoids me having to do any of this and deal with odd return values: "How to get the Job ID and Run ID and save into a database" (databricks.com). Are the braces {{job_id}} n...

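The some(...) wrapper is Scala's Option type: the context getters return Option values because a notebook may not be running inside a job, and py4j hands the wrapper back to Python unchanged. A sketch of unwrapping it explicitly (isDefined/get are standard Scala Option methods):

    ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
    run_id_opt = ctx.currentRunId()   # a Scala Option, e.g. Some(123456)
    run_id = run_id_opt.get() if run_id_opt.isDefined() else None
    print(run_id)
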
vs_29 (New Contributor II)
  • 1709 Views
  • 2 replies
  • 3 kudos

Custom Log4j logs are not being written to the DBFS storage.

I used a custom Log4j appender to write custom logs through an init script, and I can see the custom log file in the driver logs, but Databricks is not writing those custom logs to DBFS. I have configured the logging destination in the Advanced sec...

Latest Reply: Kaniz_Fatma (Community Manager) • 3 kudos

Hi @VIjeet Sharma​, we haven't heard from you since the last response from @Debayan Mukherjee​, and I was checking back to see if his suggestions helped you. Otherwise, if you have any solution, please share it with the community, as it can be helpful to...

1 More Reply
RohitKulkarni (Contributor)
  • 5802 Views
  • 6 replies
  • 6 kudos

External table format issue in Databricks

I am new to Databricks. I am trying to create an external table in Databricks with the format below: CREATE EXTERNAL TABLE Salesforce.Account (Id string, IsDeleted bigint, Name string, Type string, RecordTypeId string, ParentId string, ShippingSt...

Latest Reply: AmitA1 (Contributor) • 6 kudos

Databricks is awesome if you have SQL knowledge. I just came across one of the problems in my project, and Databricks helped me a lot, like using a low watermark to hold the load success date.

5 More Replies
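
On the original question: in Databricks SQL, the usual way to get an external (unmanaged) table is CREATE TABLE ... USING <format> LOCATION rather than Hive's CREATE EXTERNAL TABLE. A sketch with a shortened column list and a hypothetical storage path:

    location = "/mnt/salesforce/account"   # hypothetical mount path

    spark.sql(f"""
        CREATE TABLE IF NOT EXISTS salesforce.account (
            Id STRING,
            IsDeleted BIGINT,
            Name STRING,
            Type STRING,
            RecordTypeId STRING,
            ParentId STRING
        )
        USING DELTA              -- or PARQUET/CSV to match the files
        LOCATION '{location}'
    """)
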
jt (New Contributor III)
  • 1893 Views
  • 3 replies
  • 2 kudos

Collapse partial code in a large cell?

In a Databricks notebook, we have SQL cells that are over 700 lines long. Is there a way to collapse a portion of the code instead of scrolling? Looking for something similar to what exists in Netezza: "--region" and "--end region", where anything between those...

Latest Reply: Anonymous (Not applicable) • 2 kudos

Hi @james t​, hope all is well! Just wanted to check in to see if you were able to resolve your issue. Would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

2 More Replies