Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

MatthewHo
by New Contributor
  • 7461 Views
  • 4 replies
  • 0 kudos

"Importing" functions from other notebooks

For the sake of organization, I would like to define a few functions in notebook A, and have notebook B have access to those functions in notebook A. Having everything in one notebook makes it look very cluttered. Is this possible?

3 More Replies
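
Yes — the usual pattern for this is Databricks' %run magic, which executes another notebook in the current session and brings its definitions into scope. A minimal sketch (notebook names are hypothetical; both notebooks are assumed to sit in the same folder):

# Notebook A, saved as "helpers": define the shared functions
def square(x):
    return x * x

# Notebook B: %run must sit alone in its own cell; afterwards the definitions are in scope
%run ./helpers
print(square(4))  # prints 16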
RaymondXie
by New Contributor
  • 6758 Views
  • 1 reply
  • 0 kudos

How to union multiple dataframes in pyspark within a Databricks notebook

I have 4 DFs: Avg_OpenBy_Year, AvgHighBy_Year, AvgLowBy_Year and AvgClose_By_Year, all of which share a common 'Year' column. I want to join them together to get a final df like: `Year, Open, High, Low, Close`. At the moment I have to use the ugly...

Latest Reply
thiago_matos
New Contributor II
  • 0 kudos

Import reduce function in this way: from functools import reduce
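
Applied to the question above, a minimal sketch (assuming the four DataFrames share the common Year column as described; final_df is a hypothetical name):

from functools import reduce

# Pairwise-join the yearly aggregates on their shared 'Year' column
dfs = [Avg_OpenBy_Year, AvgHighBy_Year, AvgLowBy_Year, AvgClose_By_Year]
final_df = reduce(lambda left, right: left.join(right, on="Year"), dfs)

For identically-schemed frames, a true union works the same way: reduce(DataFrame.union, dfs).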

McKayHarris
by New Contributor II
  • 22669 Views
  • 17 replies
  • 3 kudos

ExecutorLostFailure: Remote RPC Client Disassociated

This is an expensive and long-running job that gets about halfway done before failing. The stack trace is included below, but here is the salient part: Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 4881 in stage...

Latest Reply
RodrigoDe_Freit
New Contributor II
  • 3 kudos

According to https://docs.databricks.com/jobs.html#jar-job-tips: "Job output, such as log output emitted to stdout, is subject to a 20MB size limit. If the total output has a larger size, the run will be canceled and marked as failed." That was my prob...

16 More Replies
dtr
by New Contributor
  • 5756 Views
  • 1 reply
  • 0 kudos

PicklingError: Could not serialize object: Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers.

I am trying to write a function in Azure Databricks. I would like to use spark.sql inside the function, but it looks like I cannot use it on worker nodes.
def SEL_ID(value, index):
    # some processing on value here
    ans = spark.sql("SELECT id FRO...

Latest Reply
MartinhoAzevedo
New Contributor II
  • 0 kudos

Hi there. I guess I'm a bit late, but do you remember how, and if, you fixed this issue? I'm getting the exact same problem. @dtr
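
For anyone landing here: the error means spark.sql (which needs the driver-side SparkContext) is being called inside code that runs on the workers. The usual fix is to run the query once on the driver and express the per-row lookup as a join. A sketch with hypothetical table and column names:

# Run the query once on the driver, then join instead of querying per row on workers
lookup = spark.sql("SELECT id, value FROM some_table")    # hypothetical table
result = input_df.join(lookup, on="value", how="left")    # input_df is hypothetical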

HarisKhan
by New Contributor
  • 10311 Views
  • 2 replies
  • 0 kudos

Escape Backslash(/) while writing spark dataframe into csv

I am using Spark version 2.4.0. I know that backslash is the default escape character in Spark, but I am still facing the issue below. I am reading a csv file into a Spark dataframe (using the pyspark language) and writing the dataframe back to csv. I have so...

Latest Reply
sean_owen
Honored Contributor II
  • 0 kudos

I'm confused - you say the escape is backslash, but you show forward slashes in your data. Don't you want the escape to be forward slash?
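
Either way, the writer's escape and quote characters can be set explicitly rather than relying on the defaults. A sketch (the output path is hypothetical; swap "/" for "\\" depending on which character the data actually needs escaped):

(df.write
   .option("quote", '"')
   .option("escape", "/")
   .csv("/mnt/out/escaped"))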

1 More Reply
rlgarris
by Contributor
  • 15144 Views
  • 12 replies
  • 0 kudos

Resolved! How do I create a single CSV file from multiple partitions in Databricks / Spark?

Using spark-csv to write data to dbfs, which I plan to move to my laptop via standard s3 copy commands. The default for spark csv is to write output into partitions. I can force it to a single partition, but would really like to know if there is a ge...

Latest Reply
ChristianHomber
New Contributor II
  • 0 kudos

Without access to bash, it would be highly appreciated if an option existed within Databricks (e.g. via dbfsutils).
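
A pure-notebook workaround that needs no bash access, sketched with hypothetical paths: write a single partition, then copy the lone part file to a stable name with dbutils.

out_dir = "/mnt/tmp/report"    # hypothetical path
df.coalesce(1).write.mode("overwrite").option("header", "true").csv(out_dir)
# Spark names the single output file part-00000-...; copy it to a friendly name
part = [f.path for f in dbutils.fs.ls(out_dir) if f.name.startswith("part-")][0]
dbutils.fs.cp(part, "/mnt/tmp/report.csv")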

11 More Replies
tunguyen90
by New Contributor
  • 8783 Views
  • 3 replies
  • 1 kudos

How to change line separator for csv file exported from dataframe in databricks

Hello, currently I'm facing a problem with the line separator inside csv files exported from a data frame in Azure Databricks (Spark 2.4.3) to Azure Blob storage. All those csv files contain LF as the line separator. I need to have CRLF (\r\n...

Latest Reply
Nikhila
New Contributor II
  • 1 kudos

Hi, have you found a solution to the above problem? Kindly let me know.
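
One workaround, since Spark 2.4's CSV writer offers no CRLF option: convert the result to pandas and let it write the terminator. A sketch (the path is hypothetical; reasonable only when the result fits on the driver):

pdf = df.toPandas()                       # pulls the result to the driver
pdf.to_csv("/dbfs/mnt/out/export.csv",    # the /dbfs/ prefix reaches DBFS from local file APIs
           index=False,
           line_terminator="\r\n")        # renamed lineterminator in pandas >= 1.5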

2 More Replies
tismith1_558848
by New Contributor
  • 8126 Views
  • 2 replies
  • 0 kudos

Resolved! Change size or aspect ratio of ggplot visualizations

I understand that plots in R notebooks are captured by a png graphics device. Is there a way to set the size or the aspect ratio of the canvas? I understand that I can resize the rendered .png by dragging the handle in the notebook, but that means I...

Latest Reply
kassandra
New Contributor II
  • 0 kudos

Hi @sdaza, the answer above somehow didn't change the size, or perhaps I was putting it in the wrong place? I entered it in a new cell before the %sql cell with the plot chart.

1 More Reply
bhosskie
by New Contributor
  • 12802 Views
  • 9 replies
  • 0 kudos

How to merge two data frames column-wise in Apache Spark

I have the following two data frames, each with just one column and the exact same number of rows. How do I merge them so that I get a new data frame with the two columns and all rows from both data frames? For example, df1: +-----+...

Latest Reply
AmolZinjade
New Contributor II
  • 0 kudos

@bhosskie
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Spark SQL basic example").enableHiveSupport().getOrCreate()
sc = spark.sparkContext
sqlDF1 = spark.sql("select count(*) as Total FROM user_summary")
sqlDF2 = sp...
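
The usual column-wise recipe is to attach a positional index to each frame and join on it. A sketch (df1 and df2 as in the question; note that Spark rows carry no inherent order, so this assumes the frames are deterministically ordered):

from pyspark.sql.functions import monotonically_increasing_id, row_number
from pyspark.sql.window import Window

# Number the rows of each frame, then join on the row number
w = Window.orderBy(monotonically_increasing_id())
df1_i = df1.withColumn("_idx", row_number().over(w))
df2_i = df2.withColumn("_idx", row_number().over(w))
merged = df1_i.join(df2_i, on="_idx").drop("_idx")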

8 More Replies
cfregly
by Contributor
  • 39275 Views
  • 9 replies
  • 0 kudos
8 More Replies
nmud19
by New Contributor II
  • 63703 Views
  • 8 replies
  • 6 kudos

how to delete a folder in databricks mnt?

I have a folder at location dbfs:/mnt/temp that I need to delete. I tried using %fs rm mnt/temp and dbutils.fs.rm("mnt/temp"). Could you please help me out with what I am doing wrong?

Latest Reply
amitca71
Contributor II
  • 6 kudos

Use this (note the recursive delete sits inside the for loop, not inside the if):
def delete_mounted_dir(dirname):
    files = dbutils.fs.ls(dirname)
    for f in files:
        if f.isDir():
            delete_mounted_dir(f.path)
        dbutils.fs.rm(f.path, recurse=True)
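
For the original question, note that the built-in recursive delete also works once the path carries its leading slash:

dbutils.fs.rm("/mnt/temp", recurse=True)   # recursively deletes dbfs:/mnt/temp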

7 More Replies
PraveenKumarB
by New Contributor
  • 7323 Views
  • 5 replies
  • 0 kudos

java.io.IOException: No FileSystem for scheme: null

Getting the error when trying to load the uploaded file in a Python notebook.
# File location and type
file_location = "//FileStore/tables/data/d1.csv"
file_type = "csv"
# CSV options
infer_schema = "true"
first_row_is_header = "false"
delimiter = ","
# The app...

Latest Reply
DivyanshuBhatia
New Contributor II
  • 0 kudos

@naughtonelad if your issue is solved, please let me know, as I am facing the same problem.
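
For anyone else hitting this: the doubled slash in "//FileStore/..." leaves the path without a recognizable scheme. A sketch of the corrected load, reusing the options from the question:

file_location = "/FileStore/tables/data/d1.csv"   # or "dbfs:/FileStore/tables/data/d1.csv"
df = (spark.read.format("csv")
      .option("inferSchema", "true")
      .option("header", "false")
      .option("sep", ",")
      .load(file_location))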

4 More Replies
PrithwisMukerje
by New Contributor II
  • 86514 Views
  • 4 replies
  • 4 kudos

Resolved! How to download a file from dbfs to my local computer filesystem?

I have run the WordCount program and saved the output into a directory as follows: counts.saveAsTextFile("/users/data/hobbit-out1"). Subsequently I check that the output directory contains the expected number of files: %fs ls /users/data/hobbit-ou...

Latest Reply
Eve
New Contributor III
  • 4 kudos

Or simply via the CLI? DBFS CLI
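
For reference, with the legacy Databricks CLI installed and configured on the laptop, the copy looks roughly like this (a sketch; the local destination is hypothetical):

databricks fs cp --recursive dbfs:/users/data/hobbit-out1 ./hobbit-out1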

3 More Replies
nthomas
by New Contributor
  • 7142 Views
  • 5 replies
  • 0 kudos

Tips for properly using large broadcast variables?

I'm using a broadcast variable about 100 MB pickled in size, which I'm approximating with:
>>> data = list(range(int(10*1e6)))
>>> import cPickle as pickle
>>> len(pickle.dumps(data))
98888896
Running on a cluster with 3 c3.2xlarge executors, ...

4 More Replies
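
For the question above, the core pattern is to broadcast the structure once, reference it via .value inside tasks, and release it explicitly when done. A minimal sketch:

data = list(range(int(10 * 1e6)))     # ~100 MB pickled, as in the question
bc = sc.broadcast(data)               # shipped once per executor, not once per task
sample = sc.parallelize([0, 5, 42]).map(lambda i: bc.value[i]).collect()
bc.unpersist()                        # frees executor memory when finished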
JulioManuelNava
by New Contributor
  • 6606 Views
  • 2 replies
  • 0 kudos

[pyspark] foreach + print produces no output

The following code produces no output. It seems as if the print(x) is not being executed for each "words" element:
words = sc.parallelize(["scala", "java", "hadoop", "spark", "akka", "spark vs hadoop", "pyspark", "pysp...

1 More Reply
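
For the question above: foreach runs on the executors, so print output lands in the executor logs, not in the notebook. To see the values on the driver, collect first. A sketch:

words = sc.parallelize(["scala", "java", "hadoop", "spark", "akka"])
for x in words.collect():   # brings the elements back to the driver
    print(x)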

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group