Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

SIRIGIRI
by Contributor
  • 674 Views
  • 1 reply
  • 1 kudos


Sorting In Spark

How do you sort null values first or last in a Spark DataFrame? Please find the answer here: https://medium.com/@sharikrishna26/sorting-in-spark-a57db245ecd4
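For reference, a minimal PySpark sketch of the built-in null-ordering helpers (my own illustration, not taken from the linked post):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1,), (None,), (3,)], "value int")

df.orderBy(F.col("value").asc_nulls_first()).show()   # nulls sorted first
df.orderBy(F.col("value").desc_nulls_last()).show()   # nulls sorted last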

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 1 kudos

Yeah, this is a really good post. Keep it up, man!

Aviral-Bhardwaj
by Esteemed Contributor III
  • 1092 Views
  • 0 replies
  • 31 kudos

Understanding Cluster Pools

Sometimes we want to run our Databricks code without any delay because reports are urgent, and the upstream team wants to save as much time as they can on cluster startup. In that case, we can use a pool of cluste...
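A minimal sketch of creating a pool via the Instance Pools REST API (the workspace URL, token, and node type below are placeholders, not from the post):

import requests

host = "https://<your-workspace>.cloud.databricks.com"   # placeholder
token = "<personal-access-token>"                       # placeholder

resp = requests.post(
    f"{host}/api/2.0/instance-pools/create",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "instance_pool_name": "reporting-pool",
        "node_type_id": "i3.xlarge",               # example node type
        "min_idle_instances": 2,                   # warm instances kept ready
        "idle_instance_autotermination_minutes": 30,
    },
)
print(resp.json())  # returns the new instance_pool_id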

Aviral-Bhardwaj
by Esteemed Contributor III
  • 1421 Views
  • 0 replies
  • 31 kudos

Databricks New Runtime Version is Available Now

PySpark memory profiling: memory profiling is now enabled for PySpark user-defined functions. This provides information on memory increment, memory usage, and the number of occurrences for each line of code...
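A minimal sketch of how this is typically enabled in a Databricks notebook (where spark and sc are predefined; the config name comes from the Spark UDF profiler docs):

from pyspark.sql.functions import udf

spark.conf.set("spark.python.profile.memory", "true")  # turn on UDF memory profiling

@udf("long")
def add_one(x):
    return x + 1

spark.range(10).select(add_one("id")).collect()
sc.show_profiles()  # prints per-line memory stats for the UDF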

ahana
by New Contributor III
  • 1897 Views
  • 1 reply
  • 2 kudos

Error: too large report

Hi, I am trying to pull data from Quickbase but it is giving me the error: too large report. Below is the code I used:

%python
df = quickbasePull('b5zj8k_pbz5_0_cd5h4wbb77n4nvp95b4u','bq2nq8jm7',4)

2) I tried the code below but it's not displaying in correc...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 2 kudos

Hey @ahana, this code is not working.

rammy
by Contributor III
  • 6921 Views
  • 6 replies
  • 5 kudos

How can I read the job id, run id, and parameters in a Python cell?

I have tried the following ways to get job parameters, but none of them are working:

runId='{{run_id}}'
jobId='{{job_id}}'
filepath='{{filepath}}'
print(runId," ",jobId," ",filepath)
r1=dbutils.widgets.get('{{run_id}}')
f1=dbutils.widgets.get('{{file...

  • 6921 Views
  • 6 replies
  • 5 kudos
Latest Reply
rammy
Contributor III
  • 5 kudos

Thanks for your response. I found the solution. The code below gives me all the job parameters:

all_args = dbutils.notebook.entry_point.getCurrentBindings()
print(all_args)

Thanks for your support.
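For the job and run ids specifically, a commonly used sketch (an illustration, not code from this thread) reads them from the notebook context tags:

import json

ctx = json.loads(
    dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson()
)
tags = ctx.get("tags", {})
print(tags.get("jobId"), tags.get("runId"))  # present when run as a job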

5 More Replies
joakon
by New Contributor III
  • 8725 Views
  • 7 replies
  • 6 kudos
Latest Reply
huyd
New Contributor III
  • 6 kudos

Check your read cell: the option is spelled "delimiter", not "Delimeter".
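For reference, a minimal sketch of setting the delimiter when reading a CSV (the path is a placeholder):

df = (spark.read
      .option("header", "true")
      .option("delimiter", "|")    # note the spelling: "delimiter"
      .csv("/path/to/file.csv"))   # placeholder path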

6 More Replies
Deiry
by New Contributor III
  • 1226 Views
  • 2 replies
  • 2 kudos


Hey fellow co-workers! I have been doing the Apache Spark programming course in Databricks Academy, and I realized the hyperlinks in it don't work. Spark session: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql.html#spark-session-apis

Latest Reply
labtech
Valued Contributor II
  • 2 kudos

@Deiry Navas, could you share the repo link used to execute that notebook? I'll check on my side.

1 More Replies
fury88
by New Contributor II
  • 914 Views
  • 1 reply
  • 0 kudos

Why are the get..Id() functions returning 'some(123456)' instead of just the id?

Hey fellow users, I've successfully retrieved the notebook context during job runs and there are several getId calls. For some reason when the ids are returned, they are wrapped in a some() instead of just the number. Does anyone know why this is the...

Latest Reply
fury88
New Contributor II
  • 0 kudos

Well, my post is irrelevant now! I just stumbled across this beauty, which avoids me having to do any of this and deal with odd return values: How to get the Job ID and Run ID and save into a database (databricks.com). Are the braces {{job_id}} n...
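On the original question: those context getters are Scala methods that return an Option, which is why the ids print as Some(123456). A minimal sketch of unwrapping one (assuming the usual notebook-context accessors):

ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
run_id_opt = ctx.currentRunId()          # a Scala Option, prints as Some(...)
run_id = run_id_opt.get() if run_id_opt.isDefined() else None
print(run_id)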

vs_29
by New Contributor II
  • 2264 Views
  • 2 replies
  • 3 kudos

Custom Log4j logs are not being written to the DBFS storage.

I used a custom Log4j appender to write custom logs through the init script, and I can see the custom log file in the driver logs, but Databricks is not writing those custom logs to DBFS. I have configured the logging destination in the Advanced sec...

Latest Reply
Kaniz_Fatma
Community Manager
  • 3 kudos

Hi @VIjeet Sharma, we haven't heard from you since the last response from @Debayan Mukherjee, and I was checking back to see if his suggestions helped you. Otherwise, if you have a solution, please share it with the community, as it can be helpful to...

1 More Replies
RohitKulkarni
by Contributor II
  • 6378 Views
  • 6 replies
  • 6 kudos

External table format issue in Databricks

I am new to Databricks. I am trying to create an external table in Databricks with the below format:

CREATE EXTERNAL TABLE Salesforce.Account (
  Id string,
  IsDeleted bigint,
  Name string,
  Type string,
  RecordTypeId string,
  ParentId string,
  ShippingSt...
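Not from the thread, but for context: on Databricks an external (unmanaged) table is usually declared with CREATE TABLE ... USING ... LOCATION. A minimal sketch with a placeholder path:

spark.sql("""
    CREATE TABLE IF NOT EXISTS Salesforce.Account (
        Id STRING,
        IsDeleted BIGINT,
        Name STRING
    )
    USING DELTA
    LOCATION '/mnt/salesforce/account'  -- placeholder external location
""")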

Latest Reply
AmitA1
Contributor
  • 6 kudos

Databricks is awesome if you have SQL knowledge. I just came across a problem in my project and Databricks helped me a lot, like using a low watermark to hold the load success date.

5 More Replies
jt
by New Contributor III
  • 2332 Views
  • 3 replies
  • 3 kudos

Collapse partial code in a large cell?

In a Databricks notebook, we have SQL cells that are over 700 lines long. Is there a way to collapse a portion of the code instead of scrolling? I'm looking for something similar to what exists in Netezza, "--region" and "--end region", where anything between those...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @james t, hope all is well! Just wanted to check in to see if you were able to resolve your issue; if so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

2 More Replies
KrishZ
by Contributor
  • 3988 Views
  • 4 replies
  • 1 kudos

How to print the path of a .py file or a notebook?

I have stored test.py in DBFS at the location "/dbfs/FileStore/shared_uploads/krishna@company.com/Project_Folder/test.py". I have a print statement in test.py:

print( os.getcwd() )

and it prints '/databricks/drive...

  • 3988 Views
  • 4 replies
  • 1 kudos
Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 1 kudos

Hey @Krishna Zanwar, please use the below code; this will work. Since you want the specific location, you can write custom code and format the path using a Python formatter, and it will give you the desired result.
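The reply's code is not shown in the preview above; one standard Python approach (an assumption, not necessarily what the reply contained) is to use __file__, since os.getcwd() returns the driver's working directory rather than the file's location:

import os

# /databricks/driver is just the working directory of the driver process.
# When test.py is imported or run as a script, its own path is in __file__:
print(os.path.abspath(__file__))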

3 More Replies
cmilligan
by Contributor II
  • 2365 Views
  • 1 reply
  • 2 kudos

Resolved! org.apache.http.conn.ConnectTimeoutException: What does this mean, and how can we resolve it?

My team has been running into this error pretty frequently on one of our larger jobs. I've set our retry policy to 5 and that seems to fix it and keep the job going. It seems like it's unable to pick up the task immediately but can after it's complete...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 2 kudos

Hey @Coleman Milligan, I have also faced this type of issue many times. You can add the below configuration to your cluster and it should work:

spark.executor.heartbeatInterval 60s
spark.network.timeout 120s

For more details, you can explore this doc: https...
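These are cluster-level Spark configs; a minimal sketch of where they sit in a job cluster spec (runtime version, node type, and worker count are placeholders):

new_cluster = {
    "spark_version": "11.3.x-scala2.12",    # placeholder runtime
    "node_type_id": "i3.xlarge",            # placeholder node type
    "num_workers": 4,
    "spark_conf": {
        "spark.executor.heartbeatInterval": "60s",
        "spark.network.timeout": "120s",    # keep this larger than the heartbeat interval
    },
}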

