Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Personal1
by New Contributor II
  • 3464 Views
  • 3 replies
  • 2 kudos

Resolved! Understanding Partitions in Spark Local Mode

I have a few fundamental questions about Spark 3 while running a simple Spark app on my local Mac (with 6 cores in total). Please help. `local[*]` runs my Spark application in local mode with all the cores present on my Mac, correct? It also means tha...

Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @Abhishek Pradhan, just a friendly follow-up. Do you still need help, or did @Werner Stinckens's response help you find the solution? Please let us know.
2 More Replies
Frankooo
by New Contributor III
  • 6355 Views
  • 9 replies
  • 7 kudos

How to optimize exporting dataframe to delta file?

Scenario: I have a dataframe that has 5 billion records/rows and 100+ columns. Is there a way to write this in Delta format efficiently? I tried to export it but cancelled after 2 hours (the write didn't finish), as this processing time is not ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 7 kudos

Hi @Franco Sia, just a friendly follow-up. Do you still need help, or did the above responses help you find the solution? Please let us know.
8 More Replies
Sam
by New Contributor III
  • 1364 Views
  • 2 replies
  • 0 kudos

Can Admins enable Table Download on Sample but not on Full Dataset?

Is it possible to allow table download on a sampled dataset but not on the full dataset? In the configuration settings, it seems like you have to allow both. Notwithstanding the fact that people could loop through the sample download, it seems like a prud...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Sam H, just a friendly follow-up. Do you still need help, or did @Arjun Kaimaparambil Rajan's response help you find the solution? Please let us know.
1 More Replies
yitao
by New Contributor III
  • 2846 Views
  • 6 replies
  • 11 kudos

Resolved! How to make sparklyr extension work with Databricks runtime?

Hello. I'm the current maintainer of sparklyr (an R interface for Apache Spark) and a few sparklyr extensions such as sparklyr.flint. Sparklyr was fortunate to receive some contributions from Databricks folks, which enabled R users to run `spark_connect...

Latest Reply
Kaniz_Fatma
Community Manager
  • 11 kudos

Hi @yitao, just a friendly follow-up. Do you still need help, or does the above response help you to find the solution? Please let us know.
5 More Replies
Hubert-Dudek
by Esteemed Contributor III
  • 2505 Views
  • 5 replies
  • 18 kudos

Resolved! Azure: Permanently purge cluster logs

Is there any way to purge logs via the API instead of clicking this option daily:

image.png
Latest Reply
Kaniz_Fatma
Community Manager
  • 18 kudos

Hi @Hubert Dudek, just a friendly follow-up. Do you still need help, or did @Prabakar Ammeappin's response help you find the solution? Please let us know.
4 More Replies
BorislavBlagoev
by Valued Contributor III
  • 3368 Views
  • 3 replies
  • 5 kudos

Resolved! Get package from Nexus repo.

I want to fetch a package from a Nexus repo in both a notebook and a job. If anyone has experience with this, please answer me here!

Latest Reply
Kaniz_Fatma
Community Manager
  • 5 kudos

Hi @Borislav Blagoev, just a friendly follow-up. Do you still need help, or does the above response help you to find the solution? Please let us know.
2 More Replies
soundari
by New Contributor
  • 1949 Views
  • 3 replies
  • 1 kudos

Resolved! Identify the partitionValues written yesterday from delta

We have streaming data written into Delta. We do not write to all the partitions every day, so I am thinking of running a compaction Spark job only on the partitions that were modified yesterday. Is it possible to query the partitionsValues wr...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Gnanasoundari Soundarajan, just a friendly follow-up. Do you still need help, or did @Deepak Bhutada's response help you find the solution? Please let us know.
2 More Replies
narek_margaryan
by New Contributor II
  • 2510 Views
  • 3 replies
  • 3 kudos

Resolved! Do Spark nodes read data from storage in a sequence?

I'm new to Spark and trying to understand how some of its components work. I understand that once the data is loaded into the memory of separate nodes, they process partitions in parallel, within their own memory (RAM). But I'm wondering whether the in...

Latest Reply
Kaniz_Fatma
Community Manager
  • 3 kudos

Hi @Narek Margaryan, just a friendly follow-up. Do you still need help, or does the above response help you to find the solution? Please let us know.
2 More Replies
brendan-b
by New Contributor II
  • 9280 Views
  • 4 replies
  • 3 kudos

spark-xml not working with Databricks Connect and Pyspark

Hi all, I currently have a cluster configured in Databricks with spark-xml (version com.databricks:spark-xml_2.12:0.13.0), which was installed using Maven. The spark-xml library itself works fine with PySpark when I am using it in a notebook within th...

Latest Reply
Kaniz_Fatma
Community Manager
  • 3 kudos

Hi @Brendan Banfield, this article describes how to read and write an XML file as an Apache Spark™ data source.
3 More Replies
User16783855534
by New Contributor III
  • 7854 Views
  • 6 replies
  • 5 kudos
Latest Reply
Kaniz_Fatma
Community Manager
  • 5 kudos

Hi @Neil Patel, just a friendly follow-up. Do you still need help, or do the above responses help you find the solution? Please let us know.
5 More Replies
dataslicer
by Contributor
  • 2668 Views
  • 3 replies
  • 2 kudos

Resolved! upgraded R package rlang to 0.4.11 on DBR 8.3 SC, but sessionInfo() still shows rlang as 0.4.9

I am using Azure Databricks Runtime (DBR) 8.3 ML with a Python notebook and R cells together. I want to use "tidyverse", and one of its dependencies is rlang >= 0.4.10, while the base DBR 8.3 ML provides rlang 0.4.9. I successfully upgraded the R package t...

Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @Jim Huang, just a friendly follow-up. Do you still need help, or did the above responses help you find the solution? Please let us know.
2 More Replies
delta_lake
by New Contributor
  • 5012 Views
  • 3 replies
  • 1 kudos

Delta Lake Python

I have set up a virtual environment inside my existing Hadoop cluster. Since the current cluster does not have Spark >3, I installed delta spark using a virtual environment. While trying to access HDFS, which is Kerberized, I am getting the below error...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Vasanth P, just a friendly follow-up. Do you still need help, or did the above responses help you find the solution? Please let us know.
2 More Replies
IkramMecheri
by New Contributor II
  • 10915 Views
  • 5 replies
  • 2 kudos

ImportError: No module named 'bs4'

Hi, I would like to do some web scraping; however, I am unable to import the libraries I traditionally use for that task: `import requests` and `from bs4 import BeautifulSoup`.

Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @Ikram Mecheri, just a friendly follow-up. Do you still need help, or do the above responses help you find the solution? Please let us know.
4 More Replies
User16868770416
by Contributor
  • 1748 Views
  • 4 replies
  • 2 kudos
Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @Will Block, just a friendly follow-up. Do you still need help, or did the above responses help you find the solution? Please let us know.
3 More Replies
Zen
by New Contributor III
  • 4372 Views
  • 9 replies
  • 2 kudos

Resolved! How do I run a scala script from the Terminal

Hello, how do I run a Scala script from a terminal on Databricks (Web Terminal) or from a cell with %sh? Just running `scala -nc script.scala` is not working. Thanks,

Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @Zen), just a friendly follow-up. Do you still need help, or did @DARSHAN BARGAL's response help you find the solution? Please let us know.
8 More Replies
