Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Artem_Y
by Databricks Employee
  • 2786 Views
  • 1 reply
  • 4 kudos

Embed Google Slides (PowerPoint) into Databricks Interactive Notebooks

Use the following code to embed your slides: slide_id = '1CYEVsDqsdfg343fwg42MtXqGd68gffP-Y16CR59c' slide_number = 'id.p9' displayHTML(f''' <iframe src="https://docs.google.com/...
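
The post's code is cut off, so here is a minimal sketch of the same idea rather than the author's exact snippet. The helper name `build_slides_iframe`, the width and height defaults, and the embed URL pattern (`/presentation/d/<id>/embed?slide=<slide>`) are assumptions, not recovered from the truncated text. Only `slide_id`, `slide_number`, and `displayHTML` come from the post.

```python
from html import escape


def build_slides_iframe(slide_id: str, slide_number: str,
                        width: int = 960, height: int = 569) -> str:
    """Build an <iframe> snippet for a Google Slides deck.

    The URL pattern is an assumption; the original post is truncated
    before the full iframe source is shown.
    """
    src = (
        "https://docs.google.com/presentation/d/"
        f"{escape(slide_id, quote=True)}/embed"
        f"?slide={escape(slide_number, quote=True)}"
    )
    return (
        f'<iframe src="{src}" frameborder="0" '
        f'width="{width}" height="{height}" allowfullscreen="true"></iframe>'
    )


# Placeholder ID for illustration; replace with your own presentation ID.
html = build_slides_iframe("YOUR_SLIDE_ID", "id.p9")
print(html)
```

In a Databricks notebook, the resulting string would be passed to `displayHTML(html)`, as the post does.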

Latest Reply
Anonymous
Not applicable
  • 4 kudos

@Artem Yevtushenko​ - Thank you for sharing this solution.

  • 4 kudos
narek_margaryan
by New Contributor II
  • 3618 Views
  • 1 reply
  • 3 kudos

Resolved! Do Spark nodes read data from storage in a sequence?

I'm new to Spark and trying to understand how some of its components work. I understand that once the data is loaded into the memory of separate nodes, they process partitions in parallel, within their own memory (RAM). But I'm wondering whether the in...

Latest Reply
-werners-
Esteemed Contributor III
  • 3 kudos

@Narek Margaryan, normally the reading is done in parallel because the underlying file system is already distributed (if you use HDFS-based storage or something similar, such as a data lake). The number of partitions in the file itself also matters. This l...

  • 3 kudos
Kotofosonline
by New Contributor III
  • 7012 Views
  • 2 replies
  • 3 kudos

Resolved! Query with distinct sort and alias produces error column not found

I'm trying to use a SQL query on Azure Databricks with DISTINCT, a sort, and aliases: SELECT DISTINCT album.ArtistId AS my_alias FROM album ORDER BY album.ArtistId. The problem is that if I add an alias, then I cannot use the non-aliased name in the ORDER BY ...

Latest Reply
Kotofosonline
New Contributor III
  • 3 kudos

The code above worked in both cases.
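
The accepted fix is not visible in this excerpt. One common workaround for this kind of error is to sort by the alias instead of the original column name. The sketch below shows that pattern using Python's built-in `sqlite3` as a stand-in engine, with made-up sample rows. It is not Spark SQL, so behaviour on Databricks may differ.

```python
import sqlite3

# In-memory database with a tiny, invented `album` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE album (AlbumId INTEGER, ArtistId INTEGER)")
conn.executemany(
    "INSERT INTO album VALUES (?, ?)",
    [(1, 3), (2, 1), (3, 3), (4, 2)],
)

# Sort by the alias that appears in the SELECT list, not by album.ArtistId.
rows = conn.execute(
    "SELECT DISTINCT album.ArtistId AS my_alias "
    "FROM album ORDER BY my_alias"
).fetchall()
conn.close()

print(rows)  # [(1,), (2,), (3,)]
```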

  • 3 kudos
1 More Replies
dataslicer
by Contributor
  • 3694 Views
  • 1 reply
  • 1 kudos

Resolved! upgraded R package rlang to 0.4.11 on DBR 8.3 SC, but sessionInfo() still shows rlang as 0.4.9

I am using Azure Databricks Runtime (DBR) 8.3 ML with a Python notebook and R cells together. I want to use "tidyverse", and one of its dependencies is rlang >= 0.4.10, while the base DBR 8.3 ML provides rlang @ 0.4.9. I successfully upgraded the R package t...

Latest Reply
Sivaprasad1
Databricks Employee
  • 1 kudos

@Jim Huang: Could you please restart the session and try to run tidyverse again? It looks like the older version of rlang is still loaded in the session. Error : package or namespace load failed for ‘tidyverse’ in loadNamespace(i, c(lib.loc, .libPaths()), versionC...

  • 1 kudos
amichel
by New Contributor III
  • 4375 Views
  • 2 replies
  • 5 kudos

Resolved! Is there a stable, ideally official JMS/ActiveMQ connector for Spark?

We're delivering pipelines that are mostly based on Databricks Spark Streaming, Delta Lake and Azure Event Hubs, and there's a requirement to integrate with AMQ/JMS endpoints (Request and Response queues in ActiveMQ). Is there a proven way to integrat...

Latest Reply
Anonymous
Not applicable
  • 5 kudos

@amichel We have a feature request to add Structured Streaming support for Tibco EMS and JMS. Unfortunately, it has yet to be prioritized for the roadmap. I would ask you to file a feature request in our ideas portal https://ideas.databricks.com/ideas/...

  • 5 kudos
1 More Replies
vasanthvk
by New Contributor III
  • 12000 Views
  • 7 replies
  • 3 kudos

Resolved! Is there a way to automate table creation in Databricks SQL based on an ADLS storage location that contains multiple Parquet files?

We have an ADLS container location that contains several (100+) different data subject folders, each containing Parquet files with a partition column, and we want to expose each data subject folder as a table in Databricks SQL. Is there any way to au...

Latest Reply
User16857282152
Databricks Employee
  • 3 kudos

Updating dazfuller's suggestion to include code for one level of partitioning. Of course, if you have deeper partitions, you will have to write a function and make a recursive call to reach the final directory containing the Parquet files. Parquet wil...
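
The reply's code is not shown in this excerpt. As a rough sketch of the general approach (one table per data subject folder), the helper below generates `CREATE TABLE ... USING PARQUET LOCATION ...` statements from a list of folder names. The function name, the `default` schema, and the example path are hypothetical. On Databricks the folder names would typically come from `dbutils.fs.ls(base_path)` and each statement would be run with `spark.sql(...)`. Partition discovery may need an extra step that is not covered here.

```python
def build_create_table_statements(base_path, folder_names, schema="default"):
    """Return one CREATE TABLE statement per data subject folder.

    Table names are derived from folder names (hypothetical convention:
    lower-case, hyphens replaced by underscores).
    """
    statements = []
    for name in folder_names:
        folder = name.strip("/")
        table = folder.replace("-", "_").lower()
        location = f"{base_path.rstrip('/')}/{folder}"
        statements.append(
            f"CREATE TABLE IF NOT EXISTS {schema}.{table} "
            f"USING PARQUET LOCATION '{location}'"
        )
    return statements


# Hypothetical container, account, and folder names for illustration.
base = "abfss://container@account.dfs.core.windows.net/subjects"
stmts = build_create_table_statements(base, ["customers/", "orders/"])
for stmt in stmts:
    print(stmt)  # on Databricks: spark.sql(stmt)
```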

  • 3 kudos
6 More Replies
User16783852686
by Databricks Employee
  • 5895 Views
  • 4 replies
  • 2 kudos

Resolved! Slow first time run, jar based jobs

When running a jar-based job, I've noticed that the first run always takes extra time to complete, and consecutive runs take less time to finish. This behavior is reproducible on an interactive cluster. What's causing this? Is this e...

Latest Reply
User16783852686
Databricks Employee
  • 2 kudos

@Sandeep Katta, this is a fat jar that does read-transform-write. @DD Sharma's response matches the intuition that @Werner Stinckens and I had: the second job was more efficient because the jar was already loaded. I would not have noticed this had job run...

  • 2 kudos
3 More Replies
BorislavBlagoev
by Valued Contributor III
  • 3598 Views
  • 4 replies
  • 7 kudos

Resolved! Visualization of Structured Streaming in job.

Does Databricks have a feature or a good pattern for visualizing data from Structured Streaming? Something like display in the notebook.

Latest Reply
BorislavBlagoev
Valued Contributor III
  • 7 kudos

I didn't know about that. Thanks!

  • 7 kudos
3 More Replies
User16752246002
by Databricks Employee
  • 3247 Views
  • 2 replies
  • 6 kudos

Resolved! New Bill Inmon Book, What are your thoughts?

Have you checked out the new Bill Inmon Book, Building the Data Lakehouse? https://dbricks.co/3uxCXjO What were your thoughts if you read it?

Latest Reply
-werners-
Esteemed Contributor III
  • 6 kudos

The quality of the book depends on the audience IMO. For people who have no background in data warehousing it will be interesting to read. For the others the book is too general and descriptive. The 'HOW do you do x' is missing.

  • 6 kudos
1 More Replies
IkramMecheri
by New Contributor II
  • 15429 Views
  • 3 replies
  • 1 kudos

ImportError: No module named 'bs4'

Hi, I would like to do some web scraping, but I am unable to import the libraries I traditionally use for that task: `import requests` and `from bs4 import BeautifulSoup`.

Latest Reply
Hayley
Databricks Employee
  • 1 kudos

Did you try `%pip install bs4`? `requests` is standard in the Databricks runtime, so you don't have to install it.
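
Before or after running the install, it can help to check which modules the current Python environment can actually see. The sketch below uses only the standard library. The helper name `missing_modules` is made up for illustration, and the result depends on the environment it runs in.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(["requests", "bs4"])
if missing:
    # In a Databricks notebook, something like `%pip install bs4`
    # in its own cell would address this.
    print("Not importable yet:", ", ".join(missing))
else:
    print("requests and bs4 are both importable.")
```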

  • 1 kudos
2 More Replies
WillBlock
by Databricks Employee
  • 2687 Views
  • 2 replies
  • 2 kudos
Latest Reply
BilalAslamDbrx
Databricks Employee
  • 2 kudos

@Werner Stinckens​ you can think about the Databricks Runtime as a contract. It does and will change over time. However, we offer Long Term Support versions of the runtime which offer multi-year support. If you have production jobs, I would definitel...

  • 2 kudos
1 More Replies
Zen
by New Contributor III
  • 6118 Views
  • 4 replies
  • 2 kudos

Resolved! How do I run a Scala script from the Terminal?

Hello, how do I run a Scala script from a terminal on Databricks, either the Web Terminal or a cell with %sh? Just running `scala -nc script.scala` is not working. Thanks,

Latest Reply
User16753724663
Databricks Employee
  • 2 kudos

Hi @Zen, the web terminal is used for shell commands only and is specific to the driver node. You can install Scala on the driver node from the web terminal with the command below and then use it: % sudo apt install scala Please let me know if thi...

  • 2 kudos
3 More Replies
FMendez
by New Contributor III
  • 17856 Views
  • 3 replies
  • 6 kudos

Resolved! How can you mount an Azure Data Lake (gen2) using abfss and Shared Key?

I wanted to mount ADLS Gen2 on Databricks and take advantage of the abfss driver, which should be better for large analytical workloads (is that even true in the context of DB?). Setting up OAuth is a bit of a pain, so I wanted to take the simpler approac...

Latest Reply
User16753724663
Databricks Employee
  • 6 kudos

Hi @Fernando Mendez, the document below will help you mount ADLS Gen2 using abfss: https://docs.databricks.com/data/data-sources/azure/adls-gen2/azure-datalake-gen2-get-started.html Could you please check if this helps?
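
The excerpt does not show how the question was finally resolved. As a hedged illustration of the naming conventions involved, the sketch below builds an `abfss://` URI and the Spark configuration key used for account-key (Shared Key) access. The helper names and the container, account, and path values are hypothetical. The sketch covers direct access via a session-level setting, not a mount.

```python
def abfss_uri(container: str, account: str, path: str = "") -> str:
    """Build an abfss:// URI for a path inside an ADLS Gen2 container."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def account_key_conf(account: str) -> str:
    """Name of the Spark config key that holds the storage account key."""
    return f"fs.azure.account.key.{account}.dfs.core.windows.net"


# Hypothetical names for illustration only.
uri = abfss_uri("raw", "mystorageacct", "/sales/2021")
conf_key = account_key_conf("mystorageacct")
print(uri)
print(conf_key)
```

In a notebook, the key would typically be set with `spark.conf.set(conf_key, <account key read from a secret scope>)` before reading from `uri`. Whether a mount with Shared Key behaves the same way is not confirmed by this excerpt.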

  • 6 kudos
2 More Replies
shan_chandra
by Databricks Employee
  • 7646 Views
  • 1 reply
  • 4 kudos
Latest Reply
shan_chandra
Databricks Employee
  • 4 kudos

Please refer to the widget example below, which uses SQL: %sql DROP VIEW IF EXISTS tempTable; CREATE TEMPORARY VIEW tempTable AS SELECT 'APPLE' AS a UNION ALL SELECT 'ORANGE' AS a UNION ALL SELECT 'BANANA' AS a; CREATE WIDGET DROPDOWN fruits DEFAULT 'ORAN...

  • 4 kudos
