Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

tismith1_558848
by New Contributor
  • 7065 Views
  • 2 replies
  • 0 kudos

Resolved! Change size or aspect ratio of ggplot visualizations

I understand that plots in R notebooks are captured by a png graphics device. Is there a way to set the size or the aspect ratio of the canvas? I understand that I can resize the rendered .png by dragging the handle in the notebook, but that means I...

Latest Reply
kassandra
New Contributor II

Hi @sdaza, the answer above somehow didn't change the size, or perhaps I was putting it in the wrong place? I entered it in a new cell before the %sql cell with the plot chart.

  • 0 kudos
1 More Replies
bhosskie
by New Contributor
  • 10210 Views
  • 9 replies
  • 0 kudos

How to merge two data frames column-wise in Apache Spark

I have the following two data frames which have just one column each and have exact same number of rows. How do I merge them so that I get a new data frame which has the two columns and all rows from both the data frames. For example, df1: +-----+...

Latest Reply
AmolZinjade
New Contributor II

@bhosskie

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("Spark SQL basic example")
             .enableHiveSupport()
             .getOrCreate())
    sc = spark.sparkContext

    sqlDF1 = spark.sql("select count(*) as Total FROM user_summary")
    sqlDF2 = sp...

  • 0 kudos
8 More Replies
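As a self-contained reference for this thread, here is a minimal sketch of a column-wise merge, assuming both frames have the same row count; the frame contents and column names below are hypothetical stand-ins for the question's df1 and df2. A dense positional index is generated in each frame and used as the join key; note that a window without partitionBy pulls all rows into one partition, so this is suited to modest data sizes:

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.window import Window

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical one-column frames standing in for df1 and df2.
    df1 = spark.createDataFrame([(1,), (2,), (3,)], ["value"])
    df2 = spark.createDataFrame([("a",), ("b",), ("c",)], ["label"])

    # monotonically_increasing_id() alone is not contiguous across partitions,
    # so row_number over it builds a dense 1..N index in each frame.
    w = Window.orderBy(F.monotonically_increasing_id())
    df1_idx = df1.withColumn("idx", F.row_number().over(w))
    df2_idx = df2.withColumn("idx", F.row_number().over(w))

    # Join on the generated index to pair rows positionally.
    merged = df1_idx.join(df2_idx, "idx").drop("idx")
    merged.show()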
cfregly
by Contributor
  • 29057 Views
  • 9 replies
  • 0 kudos
Latest Reply
MichaelHuntsber
New Contributor II

I have 8 GB of internal memory, of which only several MB are free, but I also have an additional 8 GB memory card. Anyway, there is not enough space, even though the memory card is completely empty.

  • 0 kudos
8 More Replies
nmud19
by New Contributor II
  • 60150 Views
  • 8 replies
  • 6 kudos

how to delete a folder in databricks mnt?

I have a folder at location dbfs:/mnt/temp that I need to delete. I tried using %fs rm mnt/temp and dbutils.fs.rm("mnt/temp"). Could you please help me out with what I am doing wrong?

Latest Reply
amitca71
Contributor II

Use this (note that the last line belongs inside the for loop, not inside the if):

    def delete_mounted_dir(dirname):
        files = dbutils.fs.ls(dirname)
        for f in files:
            if f.isDir():
                delete_mounted_dir(f.path)
            dbutils.fs.rm(f.path, recurse=True)

  • 6 kudos
7 More Replies
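For the simple case in the original question, the likely fix is that the path needs a leading slash (or an explicit dbfs: scheme) and the recursive flag; a minimal sketch:

    # Recursively delete the mounted folder from the question.
    dbutils.fs.rm("dbfs:/mnt/temp", recurse=True)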
PraveenKumarB
by New Contributor
  • 6476 Views
  • 5 replies
  • 0 kudos

java.io.IOException: No FileSystem for scheme: null

Getting the error when trying to load the uploaded file in a Python notebook.

    # File location and type
    file_location = "//FileStore/tables/data/d1.csv"
    file_type = "csv"
    # CSV options
    infer_schema = "true"
    first_row_is_header = "false"
    delimiter = ","
    # The app...

Latest Reply
DivyanshuBhatia
New Contributor II

@naughtonelad, if your issue is solved, please let me know, as I am facing the same problem.

  • 0 kudos
4 More Replies
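For reference, a hedged sketch of the usual fix for this error: the leading "//" in the path makes Hadoop parse an empty URI scheme, hence "No FileSystem for scheme: null". A single leading slash (or an explicit dbfs:/ scheme) avoids it:

    # Same options as in the question, with the double slash removed.
    file_location = "/FileStore/tables/data/d1.csv"  # or "dbfs:/FileStore/tables/data/d1.csv"

    df = (spark.read.format("csv")
          .option("inferSchema", "true")
          .option("header", "false")
          .option("sep", ",")
          .load(file_location))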
nthomas
by New Contributor
  • 6023 Views
  • 5 replies
  • 0 kudos

Tips for properly using large broadcast variables?

I'm using a broadcast variable about 100 MB pickled in size, which I'm approximating with:

    >>> data = list(range(int(10*1e6)))
    >>> import cPickle as pickle
    >>> len(pickle.dumps(data))
    98888896

Running on a cluster with 3 c3.2xlarge executors, ...

Latest Reply
dragoncity
New Contributor II

The Facebook credit can be used by gamers to purchase the pearls. The other route is to complete various sorts of Dragons in the Dragon Book. There are various kinds of Dragons; one is amazing, and then you have the fund...

  • 0 kudos
4 More Replies
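For reference, a minimal sketch of the broadcast pattern the question describes, assuming a SparkContext named sc; the key points are to wrap the large object once with sc.broadcast and to reference it only through .value inside tasks:

    # Hypothetical lookup table; the question's real variable is ~100 MB pickled.
    lookup = {i: i * 2 for i in range(1000)}
    b_lookup = sc.broadcast(lookup)

    rdd = sc.parallelize(range(10))
    # Go through .value inside tasks; capturing `lookup` directly would
    # re-serialize it with every task instead of once per executor.
    result = rdd.map(lambda x: b_lookup.value.get(x, 0)).collect()

    b_lookup.unpersist()  # release executor-side copies when finished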
JulioManuelNava
by New Contributor
  • 5982 Views
  • 2 replies
  • 0 kudos

[pyspark] foreach + print produces no output

The following code produces no output. It seems as if the print(x) is not being executed for each "words" element: words = sc.parallelize ( ["scala", "java", "hadoop", "spark", "akka", "spark vs hadoop", "pyspark", "pysp...

Latest Reply
john_nicholas
New Contributor II

Epson WF-3640 error code 0x97 is a common error code that may occur on most printers; in order to resolve it, a printer guide for all printer users is available.

  • 0 kudos
1 More Replies
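The usual explanation for the question's behavior: with foreach, print runs inside the executor processes, so its output goes to the executors' stdout logs rather than the notebook. A minimal sketch of how to actually see the elements (collect() assumes the data fits in driver memory):

    words = sc.parallelize(["scala", "java", "hadoop", "spark", "akka"])

    # Bring the data back to the driver, then print there.
    for w in words.collect():
        print(w)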
dchokkadi1_5588
by New Contributor II
  • 13374 Views
  • 8 replies
  • 0 kudos

Resolved! graceful dbutils mount/unmount

Is there a way to indicate to dbutils.fs.mount not to throw an error if the mount is already mounted? And vice versa, for unmount not to throw an error if it is already unmounted? I am trying to run my notebook as a job and it has an init section that...

Latest Reply
Mariano_IrvinLo
New Contributor II

If you use Scala to mount a Gen2 data lake, you could try something like this:

    // Gather relevant keys
    var ServicePrincipalID = ""
    var ServicePrincipalKey = ""
    var DirectoryID = ""

    // Create configurations for our connection
    var configs = Map(...

  • 0 kudos
7 More Replies
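A minimal sketch of the idempotent pattern the question asks for, with hypothetical helper names: check dbutils.fs.mounts() before mounting or unmounting so the init section can be re-run without throwing:

    def mount_if_needed(source, mount_point, configs):
        # Only mount if the mount point is not already in use.
        if not any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
            dbutils.fs.mount(source=source, mount_point=mount_point, extra_configs=configs)

    def unmount_if_mounted(mount_point):
        # Only unmount if the mount point actually exists.
        if any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
            dbutils.fs.unmount(mount_point)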
Barb
by New Contributor III
  • 5184 Views
  • 6 replies
  • 0 kudos

SQL charindex function?

Hi all, I need to use the SQL charindex function, but I'm getting a Databricks error that this doesn't exist. That can't be true, right? Thanks for any ideas about how to make this work! Barb

Latest Reply
Traveller
New Contributor II

The best option I found to replace CHARINDEX was LOCATE; examples from the Spark documentation below:

    > SELECT locate('bar', 'foobarbar', 5);
     7
    > SELECT POSITION('bar' IN 'foobarbar');
     4

  • 0 kudos
5 More Replies
SatheeshSathees
by New Contributor
  • 6024 Views
  • 1 reply
  • 0 kudos

how to dynamically explode array type column in pyspark or scala

Hi, I have a parquet file with complex column types, with nested structs and arrays. I am using the script from the link below to flatten my parquet file. https://docs.microsoft.com/en-us/azure/synapse-analytics/how-to-analyze-complex-schema I am able ...

Latest Reply
shyam_9
Valued Contributor

Hello, please check out the docs and notebook below, which have similar examples:
https://docs.microsoft.com/en-us/azure/synapse-analytics/how-to-analyze-complex-schema
https://docs.microsoft.com/en-us/azure/databricks/_static/notebooks/transform-comple...

  • 0 kudos
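As a self-contained starting point, here is a hedged sketch of the generic flatten pattern those docs describe: expand every struct into top-level columns and explode_outer every array, repeating until no complex types remain (the function name and column aliases are my own):

    from pyspark.sql import functions as F
    from pyspark.sql.types import ArrayType, StructType

    def flatten(df):
        # Find remaining struct/array columns; loop until none are left.
        complex_cols = {f.name: f.dataType for f in df.schema.fields
                        if isinstance(f.dataType, (ArrayType, StructType))}
        while complex_cols:
            name, dtype = next(iter(complex_cols.items()))
            if isinstance(dtype, StructType):
                # Promote each struct field to a top-level column.
                expanded = [F.col(name + "." + f.name).alias(name + "_" + f.name)
                            for f in dtype.fields]
                df = df.select("*", *expanded).drop(name)
            else:
                # One row per array element; keeps empty arrays as null rows.
                df = df.withColumn(name, F.explode_outer(name))
            complex_cols = {f.name: f.dataType for f in df.schema.fields
                            if isinstance(f.dataType, (ArrayType, StructType))}
        return df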
zachary_jones
by New Contributor
  • 3365 Views
  • 3 replies
  • 0 kudos

Resolved! Python logging: 'Operation not supported' after upgrading to DBRT 6.1

My organization has an S3 bucket mounted to the Databricks filesystem under /dbfs/mnt. When using Databricks Runtime 5.5 and below, the following logging code works correctly:

    log_file = '/dbfs/mnt/path/to/my/bucket/test.log'
    logger = logging.getLogg...

Latest Reply
lycenok
New Contributor II

Probably it's worth trying to rewrite the emit method ... https://docs.python.org/3/library/logging.html#handlers This works for me:

    class OurFileHandler(logging.FileHandler):
        def emit(self, record):
            # copied from https://github.com/python/cpython/bl...

  • 0 kudos
2 More Replies
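A common workaround for this thread's error, as a hedged sketch: newer runtimes do not support random writes through the /dbfs FUSE mount, so write the log to the driver's local disk and copy it to DBFS afterwards (the local path is hypothetical):

    import logging

    local_log = "/tmp/test.log"  # local driver disk, where random writes are fine

    logger = logging.getLogger("example")
    logger.addHandler(logging.FileHandler(local_log))
    logger.warning("something happened")

    # Copy the finished log out to the mounted bucket.
    dbutils.fs.cp("file:/tmp/test.log", "dbfs:/mnt/path/to/my/bucket/test.log")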
DimitrisMpizos
by New Contributor
  • 25127 Views
  • 16 replies
  • 0 kudos

Exporting data from databricks

I couldn't find in documentation a way to export an RDD as a text file to a local folder by using python. Is it possible?

Latest Reply
Manu1
New Contributor II

To export a file to the local desktop, the workaround is basically to do a "Create a table in notebook" with DBFS. The steps are:
  • Click on the "Data" icon
  • Click the "Add Data" button
  • Click the "DBFS" button
  • Click the "FileStore" folder icon in the 1st pane
  • "Sele...

  • 0 kudos
15 More Replies
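A minimal programmatic sketch of the same idea, assuming an RDD named rdd and a hypothetical export folder name: anything written under /FileStore can be downloaded from the /files/ path of the workspace URL:

    # coalesce(1) produces a single part file for easy download.
    rdd.coalesce(1).saveAsTextFile("dbfs:/FileStore/my_export")
    # Then fetch it in a browser from:
    # https://<databricks-instance>/files/my_export/part-00000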
MarcoMistroni
by New Contributor II
  • 13063 Views
  • 4 replies
  • 0 kudos

pandas.read_csv

Hi all, I have uploaded a file on my cluster at location /FileStore/tables/qmwxhxvi1505337108590/PastHires.csv. However, whenever I try to read it using pandas with df = pd.read_csv('dbfs:/FileStore/tables/qmwxhxvi1505337108590/PastHires.csv'), I alwas...

Latest Reply
rohitshah
New Contributor II

I am also having the same issue. I have uploaded a file to DBFS and it gives some default code which itself is not working. Has anyone solved this issue?

  • 0 kudos
3 More Replies
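For reference, the usual fix, as a hedged sketch: pandas does not understand the dbfs:/ URI scheme, but the same file is visible through the local FUSE mount at /dbfs:

    import pandas as pd

    # Same file as in the question, addressed via the /dbfs mount point.
    df = pd.read_csv("/dbfs/FileStore/tables/qmwxhxvi1505337108590/PastHires.csv")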
olisch
by New Contributor
  • 17393 Views
  • 3 replies
  • 0 kudos

Spark: How to simultaneously read from and write to the same parquet file

How can I read a DataFrame from a parquet file, do transformations, and write the modified DataFrame back to the same parquet file? If I attempt to do so, I get an error, understandably because Spark reads from the source and one cannot writ...

Latest Reply
saravananraju
New Contributor II

Hi, you can use insertInto instead of save. It will overwrite the target, so there is no need to cache or persist your dataframe. Note that insertInto takes a table name rather than a file path (the name below is a placeholder):

    df.write.format("parquet").mode("overwrite").insertInto("target_table")

~Saravanan

  • 0 kudos
2 More Replies
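A hedged alternative sketch for the original question: break the read/write cycle with a temporary path, then overwrite the source (paths and the transformation are hypothetical):

    from pyspark.sql import functions as F

    src = "dbfs:/data/events.parquet"
    tmp = "dbfs:/data/events_tmp.parquet"

    # Transform and persist to a different location first.
    df = spark.read.parquet(src)
    df.withColumn("loaded_at", F.current_timestamp()) \
      .write.mode("overwrite").parquet(tmp)

    # Re-read from the temp copy, then overwrite the original path.
    spark.read.parquet(tmp).write.mode("overwrite").parquet(src)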
DineshKumar
by New Contributor III
  • 8267 Views
  • 3 replies
  • 0 kudos

How to convert the first row as column from an existing dataframe.

I have a dataframe like the one below. I want to use the first row as the column names for this dataframe. How can I do this? Is there a way to convert it directly (without using df.first)? usdata.show() -----+---+------------+------------+-------------------...

Latest Reply
User16857282152
Contributor

My point was that you are asking for column names from what you consider to be the "first row", and I am telling you that at scale, or if the data volume grows, what you consider to be the "first row" may no longer actually be the "first row" unless ...

  • 0 kudos
2 More Replies
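With the caveat above in mind, a hedged sketch of the common pattern for small data, assuming the frame is named usdata and its first row really is the header (helper names are my own):

    # Take the first row as the header, drop it from the data, and rename.
    header = usdata.first()
    df = (usdata.rdd
          .filter(lambda row: row != header)   # assumes the header row is unique
          .toDF([str(c) for c in header]))
    df.show()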