Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by narek_margaryan (New Contributor II)
  • 3327 Views
  • 1 reply
  • 3 kudos

Resolved! Do Spark nodes read data from storage in a sequence?

I'm new to Spark and trying to understand how some of its components work. I understand that once the data is loaded into the memory of separate nodes, they process partitions in parallel, within their own memory (RAM). But I'm wondering whether the in...

Latest Reply
-werners-
Esteemed Contributor III
  • 3 kudos

@Narek Margaryan, normally the reading is done in parallel because the underlying file system is already distributed (if you use HDFS-based storage or something like a data lake, for example). The number of partitions in the file itself also matters. This l...
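To illustrate why file layout affects read parallelism, here is a deliberately simplified Python model of how a file-based Spark reader might carve input into read tasks: each splittable file is cut into chunks of at most `max_split_bytes`, and each chunk can be read by a different task. The 128 MiB default mirrors `spark.sql.files.maxPartitionBytes`, but this sketch ignores Spark's packing of small files (driven by `spark.sql.files.openCostInBytes`), so treat the counts as an approximation rather than Spark's actual planner.

```python
import math

# Default value of spark.sql.files.maxPartitionBytes (128 MiB).
# Simplified model: Spark's real planner also packs small files together
# using spark.sql.files.openCostInBytes, which this sketch ignores.
DEFAULT_MAX_SPLIT_BYTES = 128 * 1024 * 1024


def estimate_splits(file_sizes, max_split_bytes=DEFAULT_MAX_SPLIT_BYTES, splittable=True):
    """Return an approximate number of read tasks for file sizes given in bytes."""
    total = 0
    for size in file_sizes:
        if size == 0:
            continue  # empty files contribute no data to read
        if splittable:
            # A splittable file (e.g. Parquet, uncompressed CSV) is cut into
            # chunks of at most max_split_bytes, each readable by its own task.
            total += math.ceil(size / max_split_bytes)
        else:
            # A non-splittable file (e.g. gzip-compressed CSV) is read by one task.
            total += 1
    return total


if __name__ == "__main__":
    mib = 1024 * 1024
    files = [1024 * mib, 300 * mib, 10 * mib]
    print(estimate_splits(files))                    # 8 + 3 + 1 = 12
    print(estimate_splits(files, splittable=False))  # one task per file = 3
```

The takeaway matches the reply: a single large splittable file can still be read in parallel, while many tasks are only possible for a non-splittable format if the data is spread over many files.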

by Kotofosonline (New Contributor III)
  • 6416 Views
  • 2 replies
  • 3 kudos

Resolved! Query with DISTINCT, sort, and alias produces a 'column not found' error

I’m trying to use a SQL query on Azure Databricks with DISTINCT, sorting, and aliases:
`SELECT DISTINCT album.ArtistId AS my_alias FROM album ORDER BY album.ArtistId`
The problem is that if I add an alias, I cannot use the non-aliased name in the ORDER BY ...

Latest Reply
Kotofosonline
New Contributor III
  • 3 kudos

The code above worked in both cases.

1 More Replies
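A common way around this kind of error is to sort by the output alias (or by column position) rather than by the original qualified column name; in standard SQL, `ORDER BY` on a `SELECT DISTINCT` query is expected to reference the output columns. The sketch below uses Python's built-in `sqlite3` purely as a stand-in engine to show the rewritten query shape. SQLite is more permissive than Spark SQL, so it does not reproduce the original error, and the `album` rows are made-up sample data.

```python
import sqlite3

# Stand-in engine: SQLite from the standard library, NOT Spark SQL.
# The table contents are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE album (AlbumId INTEGER, ArtistId INTEGER)")
conn.executemany(
    "INSERT INTO album VALUES (?, ?)",
    [(1, 3), (2, 1), (3, 3), (4, 2)],
)

# Sort by the output alias instead of the original qualified column name.
by_alias = conn.execute(
    "SELECT DISTINCT album.ArtistId AS my_alias FROM album ORDER BY my_alias"
).fetchall()

# Sorting by ordinal position (1 = first output column) is another option.
by_position = conn.execute(
    "SELECT DISTINCT album.ArtistId AS my_alias FROM album ORDER BY 1"
).fetchall()

print(by_alias)     # [(1,), (2,), (3,)]
print(by_position)  # [(1,), (2,), (3,)]
```

Both forms avoid referring to a column that is no longer visible after the DISTINCT projection; Spark SQL also accepts ordinals in `ORDER BY` by default.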
by dataslicer (Contributor)
  • 3453 Views
  • 1 reply
  • 1 kudos

Resolved! Upgraded R package rlang to 0.4.11 on DBR 8.3 SC, but sessionInfo() still shows rlang as 0.4.9

I am using Azure Databricks Runtime (DBR) 8.3 ML with a Python notebook and R cells together. I want to use "tidyverse", and one of its dependencies is rlang >= 0.4.10, but the base DBR 8.3 ML provides rlang 0.4.9. I successfully upgraded the R package t...

Latest Reply
Sivaprasad1
Valued Contributor II
  • 1 kudos

@Jim Huang: Could you please try restarting the session and then running tidyverse? It looks like the older version of rlang is still loaded in the session.
Error : package or namespace load failed for ‘tidyverse’ in loadNamespace(i, c(lib.loc, .libPaths()), versionC...
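The failure in this thread is a minimum-version check: tidyverse needs rlang >= 0.4.10, and the 0.4.9 still loaded in the session does not satisfy it. Such checks compare version components as numbers, which plain string comparison gets wrong here ("0.4.9" sorts after "0.4.10"). A small Python sketch of that check, assuming purely numeric dotted versions; the helper names are hypothetical, and the versions are the ones quoted in the thread.

```python
def parse_version(version):
    """Split a dotted version like '0.4.10' into a tuple of ints, e.g. (0, 4, 10).

    Assumes purely numeric, dot-separated components.
    """
    return tuple(int(part) for part in version.split("."))


def satisfies_minimum(installed, minimum):
    """True if `installed` >= `minimum`, comparing numeric components in order."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    print("0.4.9" >= "0.4.10")                    # True  (string comparison: misleading)
    print(satisfies_minimum("0.4.9", "0.4.10"))   # False (the loaded rlang is too old)
    print(satisfies_minimum("0.4.11", "0.4.10"))  # True  (the upgraded rlang is fine)
```

This is why the upgrade to 0.4.11 only helps once the session actually loads the new version, which is what the restart suggestion addresses.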

by amichel (New Contributor III)
  • 3965 Views
  • 2 replies
  • 5 kudos

Resolved! Is there a stable, ideally official JMS/ActiveMQ connector for Spark?

We're delivering pipelines that are mostly based on Databricks Spark Streaming, Delta Lake and Azure Event Hubs, and there's a requirement to integrate with AMQ/JMS endpoints (Request and Response queues in ActiveMQ). Is there a proven way to integrat...

Latest Reply
Anonymous
Not applicable
  • 5 kudos

@amichel We have a feature request to add Structured Streaming support for Tibco EMS and JMS. Unfortunately, it has yet to be prioritized on the roadmap. I would suggest filing a feature request in our ideas portal https://ideas.databricks.com/ideas/...

1 More Replies
by vasanthvk (New Contributor III)
  • 11107 Views
  • 7 replies
  • 3 kudos

Resolved! Is there a way to automate table creation in Databricks SQL based on an ADLS storage location which contains multiple Parquet files?

We have an ADLS container location that contains several (100+) different data-subject folders, which contain Parquet files with a partition column, and we want to expose each data-subject folder as a table in Databricks SQL. Is there any way to au...

Latest Reply
User16857282152
Contributor
  • 3 kudos

Updating dazfuller's suggestion, but including code for one level of partitioning. Of course, if you have deeper partitions, you will have to write a function and make a recursive call to get to the final directory containing the Parquet files. Parquet wil...

6 More Replies
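The approach described in the reply (one table per data-subject folder, with one level of `key=value` partition folders) can be sketched as a generator of Spark SQL DDL. This sketch walks a local directory with `pathlib` so that it runs anywhere; on Databricks you would list the ADLS location with `dbutils.fs.ls` instead and pass each statement to `spark.sql`. The table-naming rule, the placeholder `abfss://` prefix, and the use of `MSCK REPAIR TABLE` to register existing partitions are assumptions for illustration, not the code from the thread.

```python
import re
import tempfile
from pathlib import Path


def table_name_for(folder_name):
    """Hypothetical naming rule: non-alphanumeric runs become '_', then lowercase."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", folder_name).strip("_").lower()


def ddl_for_subjects(root, location_prefix):
    """Yield (table_name, [sql_statements]) for each data-subject folder under root.

    Only one level of `column=value` partition folders is detected, as in the
    reply; deeper layouts would need a recursive walk.
    """
    for subject in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        table = table_name_for(subject.name)
        location = f"{location_prefix.rstrip('/')}/{subject.name}"
        statements = [
            f"CREATE TABLE IF NOT EXISTS {table} USING PARQUET LOCATION '{location}'"
        ]
        # A `key=value` sub-folder suggests a partitioned layout; existing
        # partitions are then registered in the metastore with MSCK REPAIR TABLE.
        if any(c.is_dir() and "=" in c.name for c in subject.iterdir()):
            statements.append(f"MSCK REPAIR TABLE {table}")
        yield table, statements


if __name__ == "__main__":
    # Local stand-in for the ADLS container layout (hypothetical folder names).
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "Sales-Orders" / "year=2021").mkdir(parents=True)
        (Path(tmp) / "customers").mkdir()
        prefix = "abfss://<container>@<account>.dfs.core.windows.net/data"
        for table, statements in ddl_for_subjects(tmp, prefix):
            for sql in statements:
                print(sql)  # on Databricks: spark.sql(sql)
```

Generating the statements first, rather than executing them inside the walk, makes it easy to review the DDL for 100+ folders before running it.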
by User16783852686 (Databricks Employee)
  • 4270 Views
  • 4 replies
  • 2 kudos

Resolved! Slow first-time run, jar-based jobs

When running a jar-based job, I've noticed that the first run always takes extra time to complete, while consecutive runs take less time to finish. This behavior is reproducible on an interactive cluster. What's causing this? Is this e...

Latest Reply
User16783852686
Databricks Employee
  • 2 kudos

@Sandeep Katta, this is a fat jar that does read-transform-write. @DD Sharma's response matches the intuition @Werner Stinckens and I had: the second job was more efficient because the jar was already loaded. I would not have noticed this had the job run...

3 More Replies
by BorislavBlagoev (Valued Contributor III)
  • 3258 Views
  • 4 replies
  • 7 kudos

Resolved! Visualization of Structured Streaming in job.

Does Databricks have a feature or a good pattern for visualizing the data from Structured Streaming? Something like display in the notebook.

Latest Reply
BorislavBlagoev
Valued Contributor III
  • 7 kudos

I didn't know about that. Thanks!

3 More Replies
by User16752246002 (Databricks Employee)
  • 2768 Views
  • 2 replies
  • 6 kudos

Resolved! New Bill Inmon Book, What are your thoughts?

Have you checked out the new Bill Inmon book, Building the Data Lakehouse? https://dbricks.co/3uxCXjO What were your thoughts if you read it?

Latest Reply
-werners-
Esteemed Contributor III
  • 6 kudos

The quality of the book depends on the audience IMO. For people who have no background in data warehousing it will be interesting to read. For the others the book is too general and descriptive. The 'HOW do you do x' is missing.

1 More Replies
by IkramMecheri (New Contributor II)
  • 14401 Views
  • 3 replies
  • 1 kudos

ImportError: No module named 'bs4'

Hi, I would like to do some web scraping; however, I am unable to import the libraries I traditionally use for that task: `import requests` and `from bs4 import BeautifulSoup`.

Latest Reply
Hayley
Databricks Employee
  • 1 kudos

Did you try `%pip install bs4`? `requests` is standard in the Databricks Runtime, so you don't have to install it.

2 More Replies
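Before installing anything, it can help to check which of the needed modules the notebook's Python environment can already import. A minimal sketch using only the standard library's `importlib.util.find_spec`; the module-to-package mapping is an assumption for this example. The PyPI distribution that provides the `bs4` module is `beautifulsoup4` (the `bs4` name on PyPI is a placeholder that pulls it in), so the suggested command uses that name.

```python
import importlib.util

# Module name -> PyPI distribution that provides it (assumed mapping for this example).
REQUIRED = {"requests": "requests", "bs4": "beautifulsoup4"}


def missing_modules(required=REQUIRED):
    """Return the PyPI names of modules that cannot be found in this environment."""
    return [dist for module, dist in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        # In a Databricks notebook, run this as a %pip magic in its own cell.
        print("%pip install " + " ".join(missing))
    else:
        print("All modules importable")
```

`find_spec` only locates the module without importing it, so the check has no side effects on the running session.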
by WillBlock (Contributor)
  • 2401 Views
  • 2 replies
  • 2 kudos
Latest Reply
BilalAslamDbrx
Databricks Employee
  • 2 kudos

@Werner Stinckens, you can think about the Databricks Runtime as a contract. It does and will change over time. However, we offer Long Term Support (LTS) versions of the runtime, which come with multi-year support. If you have production jobs, I would definitel...

1 More Replies
by Zen (New Contributor III)
  • 5702 Views
  • 4 replies
  • 2 kudos

Resolved! How do I run a Scala script from the terminal?

Hello, how do I run a Scala script from a terminal on Databricks (the web terminal, or a cell with %sh)? Just doing `scala -nc script.scala` is not working. Thanks,

Latest Reply
User16753724663
Valued Contributor
  • 2 kudos

Hi @Zen, the web terminal is meant for shell commands only and runs on the driver node only. You can install Scala on the driver node from the web terminal with the command below and then use it:
`sudo apt install scala`
Please let me know if thi...

3 More Replies
by FMendez (New Contributor III)
  • 16607 Views
  • 3 replies
  • 6 kudos

Resolved! How can you mount an Azure Data Lake (gen2) using abfss and Shared Key?

I wanted to mount an ADLS Gen2 account on Databricks and take advantage of the abfss driver, which should be better for large analytical workloads (is that even true in the context of DB?). Setting up OAuth is a bit of a pain, so I wanted to take the simpler approac...

Latest Reply
User16753724663
Valued Contributor
  • 6 kudos

Hi @Fernando Mendez, the document below will help you mount ADLS Gen2 using abfss: https://docs.databricks.com/data/data-sources/azure/adls-gen2/azure-datalake-gen2-get-started.html
Could you please check if this helps?

2 More Replies
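For the simpler shared-key route, a common pattern is to skip the mount and set the storage account key as a Spark configuration, then read `abfss://` paths directly. The helpers below only build the two strings involved (the ABFS URI and the account-key configuration name); the `spark.conf.set` and `dbutils.secrets.get` calls are shown in comments because they need a Databricks cluster. The account, container, and secret-scope names are placeholders, and which credential types a mount accepts should be checked against the linked documentation.

```python
def abfss_uri(container, account, path=""):
    """Build an ABFS URI: abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def account_key_conf(account):
    """Configuration key that holds the shared key for an ADLS Gen2 account."""
    return f"fs.azure.account.key.{account}.dfs.core.windows.net"


if __name__ == "__main__":
    account, container = "mystorageaccount", "raw"  # placeholder names
    print(account_key_conf(account))
    print(abfss_uri(container, account, "/events/2021"))
    # On a Databricks cluster (not runnable here), the pieces would be used like:
    # spark.conf.set(account_key_conf(account),
    #                dbutils.secrets.get(scope="my-scope", key="storage-key"))
    # df = spark.read.parquet(abfss_uri(container, account, "events/2021"))
```

Keeping the key in a secret scope rather than in notebook source avoids exposing it in revision history.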
by shan_chandra (Databricks Employee)
  • 7190 Views
  • 1 reply
  • 4 kudos
Latest Reply
shan_chandra
Databricks Employee
  • 4 kudos

Please refer to the widget example below, which uses SQL:
%sql
DROP VIEW IF EXISTS tempTable;
CREATE temporary view tempTable AS SELECT 'APPLE' as a UNION ALL SELECT 'ORANGE' as a UNION ALL SELECT 'BANANA' as a;
CREATE WIDGET DROPDOWN fruits DEFAULT 'ORAN...

by User16789201666 (Databricks Employee)
  • 1833 Views
  • 2 replies
  • 0 kudos

What are some guidelines for migrating to DBR 7/Spark 3?

What are some guidelines for migrating to DBR 7/Spark 3?

Latest Reply
shan_chandra
Databricks Employee
  • 0 kudos

Please refer to the references below for switching to DBR 7.x. We have extended our DBR 6.4 support until December 2021.
DBR 6.4 extended support release notes: https://docs.databricks.com/release-notes/runtime/6.4x.html
Migration guide to DBR 7.x: htt...

1 More Replies
by MGH1 (New Contributor III)
  • 6996 Views
  • 5 replies
  • 3 kudos

Resolved! How to log the KerasClassifier model in a sklearn pipeline in MLflow?

I have a set of pre-processing stages in a sklearn `Pipeline` and an estimator which is a `KerasClassifier` (`from tensorflow.keras.wrappers.scikit_learn import KerasClassifier`). My overall goal is to tune and log the whole sklearn pipeline in `mlflo...

Latest Reply
shan_chandra
Databricks Employee
  • 3 kudos

Could you please share the full error stack trace?

4 More Replies
