Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

data_boy_2022
by New Contributor III
  • 13669 Views
  • 7 replies
  • 3 kudos

Data ingest of csv files from S3 using Autoloader is slow

I have 150k small CSV files (~50Mb) stored in S3 that I want to load into a Delta table. All CSV files are stored in the following structure in S3:
bucket/folder/name_00000000_00000100.csv
bucket/folder/name_00000100_00000200.csv
This is the code I use ...
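
For readers unfamiliar with Auto Loader, a minimal sketch of a CSV ingest from S3 into a Delta table might look like the following. The poster's own code is cut off in this excerpt, so this is not their code. The paths, table name, header setting, and the `maxFilesPerTrigger` value are placeholders, and nothing here is claimed to fix the slowness described.

```python
def autoloader_options(schema_location, max_files_per_trigger=1000):
    """Build Auto Loader (cloudFiles) reader options for CSV input.

    All values are illustrative assumptions, not tuned recommendations.
    """
    return {
        "cloudFiles.format": "csv",
        "cloudFiles.schemaLocation": schema_location,
        "cloudFiles.maxFilesPerTrigger": str(max_files_per_trigger),
        "header": "true",  # assumption: the CSV files have a header row
    }


def start_ingest(spark, source_path, table_name, checkpoint_location, options):
    """Start an Auto Loader stream that writes the discovered files to a Delta table.

    Expects a Databricks SparkSession; the cloudFiles source is Databricks-specific.
    """
    stream = (
        spark.readStream.format("cloudFiles")
        .options(**options)
        .load(source_path)
    )
    return (
        stream.writeStream
        .option("checkpointLocation", checkpoint_location)
        .trigger(availableNow=True)  # process the current backlog, then stop
        .toTable(table_name)
    )
```

In a Databricks notebook, where `spark` is predefined, this could be called as `start_ingest(spark, "s3://bucket/folder/", "my_table", "<checkpoint path>", autoloader_options("<schema path>"))`, with the bracketed values replaced by real locations.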

Latest Reply
Vidula
Honored Contributor
  • 3 kudos

Hi @Jan R, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

6 More Replies
Nid
by New Contributor
  • 930 Views
  • 1 replies
  • 0 kudos

Badge not received for Databricks Lakehouse Fundamentals Accreditation

Hi, I have cleared the assessment for the Databricks Lakehouse Fundamentals Accreditation but have yet to receive a badge. Kindly assist me with this.

Latest Reply
Vidula
Honored Contributor
  • 0 kudos

Hi @Nidhi kawale, thank you for reaching out! Let us look into this for you, and we will get back to you with an update. Kindly share your email ID with community@databricks.com.

Bit-Warrior
by New Contributor
  • 650 Views
  • 0 replies
  • 0 kudos

Installing SystemML on the cluster

I am trying to install the systemml package from Maven. I ignored the libraries log4j:log4j, com:sun.jdmk, com:sun.jmx, javax:jms. But when I run a single systemml command, Spark/Databricks can no longer select from tables, effectively breaking somet...

parthsalvi
by Contributor
  • 1481 Views
  • 0 replies
  • 0 kudos

Some Spark APIs not working in DBR 11.2 and 10.4 LTS Shared mode (custom VPC), e.g. df.tail, df.rdd.map

We're trying to use DBR 11.2 and 10.4 LTS in Shared mode on a customer-managed VPC, but we're running into the following issues. Is this related to our customer-managed VPC setup, or is it specific to DBR 11.2? The same issue is also seen in DBR 11.1 and 10.4 L...

nancy_g
by New Contributor III
  • 4483 Views
  • 4 replies
  • 5 kudos
Latest Reply
Rostislaw
New Contributor III
  • 5 kudos

Right now the feature seems to be publicly available. It is possible to schedule jobs with ADLS passthrough enabled without having to provide service principal credentials. However, I wonder how that works behind the scenes. When working interactive...

3 More Replies
amit
by New Contributor II
  • 1058 Views
  • 2 replies
  • 0 kudos

www.databricks.com

Hi @Lindsay Olson, I attended the virtual instructor-led training on 23-08-2022 (https://www.databricks.com/p/webinar/databricks-lakehouse-fundamentals-learning-plan). As per the conditions mentioned, I have completed all of the steps for getti...

Latest Reply
amit
New Contributor II
  • 0 kudos

Thanks @Lindsay Olson. Yes, the issue has been resolved.

1 More Replies
BradSheridan
by Valued Contributor
  • 2234 Views
  • 1 replies
  • 0 kudos

Using a UDF in a window function

I have created a UDF using:
%sql
CREATE OR REPLACE FUNCTION f_timestamp_max()....
And I've confirmed it works with:
%sql
select f_timestamp_max()
But when I try to use it in a window function (lead over partition), I get:
AnalysisException: Using SQL functi...

Latest Reply
Debayan
Databricks Employee
  • 0 kudos

Hi, as of now Spark SQL supports three kinds of window functions: ranking functions, analytic functions, and aggregate functions. Please refer to: https://docs.databricks.com/sql/language-manual/sql-ref-window-functions.html#parameters

Haima
by New Contributor
  • 616 Views
  • 0 replies
  • 0 kudos

FileNotFoundError: [Errno 2] /dbfs/fileone.csv

I'm trying to transfer my CSV file from Databricks to SFTP, but I'm getting a file-not-found error. Here is my code:
file_size = sftp.stat("/dbfs/fileone.csv").st_size
with open("/dbfs/fileone.csv", "rb") as fl:
    return self.putfo(fl, Destinationpath, file_s...
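
One possible cause, assuming `sftp` is a paramiko `SFTPClient` (the excerpt does not say): `sftp.stat()` looks the path up on the remote SFTP server, not on the Databricks driver, so a path that exists only locally under `/dbfs` would raise `FileNotFoundError`. A minimal sketch that reads the size from the local filesystem instead is shown here. The helper name `upload_local_file` and its parameters are hypothetical.

```python
import os


def upload_local_file(sftp, local_path, remote_path):
    """Upload a local file via a client exposing a paramiko-style putfo().

    `sftp` is assumed to behave like paramiko.SFTPClient.
    """
    # os.path.getsize reads the local filesystem, whereas sftp.stat()
    # would query the remote server for the same path.
    file_size = os.path.getsize(local_path)
    with open(local_path, "rb") as fl:
        return sftp.putfo(fl, remote_path, file_size=file_size)
```

With a connected client this might be called as `upload_local_file(sftp, "/dbfs/fileone.csv", destination_path)`, where `destination_path` is the target path on the SFTP server.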

brickster_2018
by Databricks Employee
  • 6254 Views
  • 3 replies
  • 0 kudos

Resolved! How many notebooks/jobs can I run in parallel on a Databricks cluster?

Is there a limit on it and is the limit configurable?

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

There is a hard limit of 145 active execution contexts on a Cluster. This is to ensure the cluster is not overloaded with too many parallel threads starving for resources. The limit is not configurable. If there are more than 145 parallel jobs to be ...
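
One way to stay under such a limit when launching work from a driver program is to cap concurrency on the client side. The following is a minimal sketch using Python's standard `concurrent.futures`. `run_job` is a hypothetical stand-in for whatever actually launches a notebook or job, and `MAX_PARALLEL` is an arbitrary value chosen below the 145 figure quoted above. Whether each submitted unit of work consumes exactly one execution context depends on how it is launched, which this excerpt does not cover.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 100  # arbitrary cap, chosen to sit below the quoted limit of 145


def run_job(job_id):
    """Hypothetical placeholder for launching one notebook or job."""
    return f"job-{job_id}-done"


def run_all(job_ids, max_parallel=MAX_PARALLEL):
    """Run every job with at most max_parallel in flight; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_job, job_ids))
```

Jobs beyond the cap simply wait in the executor's queue until a worker thread frees up, so the caller can submit any number of IDs.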

2 More Replies
data_serf
by New Contributor
  • 5155 Views
  • 3 replies
  • 1 kudos

Resolved! How to integrate Java 11 code in Databricks

Hi all, we're trying to attach Java libraries that are compiled/packaged using Java 11. After doing some research, it looks like even the most recent runtimes use Java 8, which can't run the Java 11 code ("wrong version 55.0, should be 52.0" errors). Is t...
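
The numbers in that error are Java class-file major versions: 52 corresponds to Java 8 and 55 to Java 11. A small sketch for checking what a compiled `.class` file targets reads its header, which is a 4-byte magic number `0xCAFEBABE` followed by a 2-byte minor and a 2-byte major version, all big-endian. The function names are illustrative.

```python
import struct


def class_file_major_version(header: bytes) -> int:
    """Return the major version stored in the first 8 bytes of a .class file."""
    if len(header) < 8:
        raise ValueError("need at least 8 bytes of class-file header")
    magic, _minor, major = struct.unpack(">IHH", header[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major


def java_release(major: int) -> int:
    """Map a class-file major version to a Java release number (52 -> 8, 55 -> 11)."""
    return major - 44
```

For a real file, pass the first eight bytes: `class_file_major_version(open("Foo.class", "rb").read(8))`, where `Foo.class` is a placeholder name.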

Latest Reply
matthewrj
New Contributor II
  • 1 kudos

I have tried setting JNAME=zulu11-ca-amd64 under Cluster > Advanced options > Spark > Environment variables, but it doesn't seem to work. I still get errors indicating Java 8 is the JRE, and in the Spark UI under "Environment" I still see: Java Home: /u...

2 More Replies
齐木木
by New Contributor III
  • 1760 Views
  • 1 replies
  • 3 kudos

Resolved! The case class reports an error when running in the notebook

As shown in the figure, the case class and the json string are converted through fasterxml.jackson, but an unexpected error occurred during the running of the code. I think this problem may be related to the loading principle of the notebook. Because...

Latest Reply
齐木木
New Contributor III
  • 3 kudos

code:
var str="{\"app_type\":\"installed-app\"}"
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule
val mapper = new ObjectMapper()
mapper.registerModule(DefaultScalaModule)
...

Deepak_Kandpal
by New Contributor III
  • 5498 Views
  • 3 replies
  • 2 kudos

Resolved! Enable credential passthrough Option is not available in new UI for Job Cluster

Hi all, I am trying to add a new workflow that requires credential passthrough, but when I try to create a new Job Cluster from Workflow -> Jobs -> My Job, the option "Enable credential passthrough" is not available. Is there any other way t...

Latest Reply
Rostislaw
New Contributor III
  • 2 kudos

Assuming your Excel file is located on ADLS, you can add a service principal to the cluster configuration. See: https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/azure-storage#--access-azure-data-lake-storage-gen2-or-blob-stora...

2 More Replies
vamsi0132
by New Contributor II
  • 1105 Views
  • 0 replies
  • 2 kudos

BUG in TIME ZONE EST function

Hi, I found a bug in the "from_utc_timestamp" function when converting a UTC timestamp to an EST timestamp. Below is the query:
select trim(current_timestamp()) as Current, trim(from_utc_timestamp(current_timestamp(),'EST')) as EST, trim(from...

