Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

WillBlock
by Databricks Employee
  • 2702 Views
  • 2 replies
  • 2 kudos
Latest Reply
BilalAslamDbrx
Databricks Employee
  • 2 kudos

@Werner Stinckens you can think of the Databricks Runtime as a contract: it does and will change over time. However, we offer Long Term Support (LTS) versions of the runtime, which provide multi-year support. If you have production jobs, I would definitel...
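A sketch of what pinning to an LTS runtime looks like in practice: a job cluster definition that fixes the runtime via `spark_version` (the version string shown is the DBR 7.3 LTS identifier; the node type and worker count are placeholder values):

```json
{
  "new_cluster": {
    "spark_version": "7.3.x-scala2.12",
    "node_type_id": "<your-node-type>",
    "num_workers": 2
  }
}
```

Jobs pinned this way keep the same runtime behavior until you deliberately move to a newer LTS.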

1 More Replies
Zen
by New Contributor III
  • 6152 Views
  • 4 replies
  • 2 kudos

Resolved! How do I run a scala script from the Terminal

Hello, how do I run a Scala script from a terminal on Databricks (the web terminal, or from a cell with %sh)? Just doing `scala -nc script.scala` is not working. Thanks,

Latest Reply
User16753724663
Databricks Employee
  • 2 kudos

Hi @Zen, the web terminal is basically for shell commands only, and it is specific to the driver node. You can install Scala on the driver node from the web terminal with the command below and then use it: `sudo apt install scala`. Please let me know if thi...
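The reply's steps, sketched out as commands (this assumes an Ubuntu-based driver image; the package name and script path are illustrative, and anything installed this way disappears when the cluster restarts):

```shell
# In the Databricks web terminal (runs on the driver node only)
sudo apt-get update
sudo apt-get install -y scala   # install the Scala runner on the driver
scala script.scala              # then run the script directly
```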

3 More Replies
FMendez
by New Contributor III
  • 17919 Views
  • 3 replies
  • 6 kudos

Resolved! How can you mount an Azure Data Lake (gen2) using abfss and Shared Key?

I wanted to mount an ADLS Gen2 on Databricks and take advantage of the abfss driver, which should be better for large analytical workloads (is that even true in the context of DB?). Setting up OAuth is a bit of a pain, so I wanted to take the simpler approac...

Latest Reply
User16753724663
Databricks Employee
  • 6 kudos

Hi @Fernando Mendez, the document below will help you mount ADLS Gen2 using abfss:
https://docs.databricks.com/data/data-sources/azure/adls-gen2/azure-datalake-gen2-get-started.html
Could you please check if this helps?
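For reference, a minimal sketch of the shared-key mount the question asks about. The helper just assembles the arguments; the actual `dbutils.fs.mount` call only works inside a Databricks notebook, and the container, account, and secret-scope names are placeholders:

```python
def abfss_mount_args(container, account, key):
    """Build the dbutils.fs.mount arguments for an ADLS Gen2 shared-key mount."""
    return {
        "source": f"abfss://{container}@{account}.dfs.core.windows.net/",
        "mount_point": f"/mnt/{container}",
        "extra_configs": {
            # Shared-key auth: the abfss driver reads the account key from this config
            f"fs.azure.account.key.{account}.dfs.core.windows.net": key,
        },
    }

# In a Databricks notebook (scope and key names are placeholders):
# dbutils.fs.mount(**abfss_mount_args("raw", "mystorageacct",
#                                     dbutils.secrets.get("my-scope", "storage-key")))
```

Keeping the key in a secret scope, as sketched in the comment, avoids hard-coding it in the notebook.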

2 More Replies
shan_chandra
by Databricks Employee
  • 7681 Views
  • 1 reply
  • 4 kudos
Latest Reply
shan_chandra
Databricks Employee
  • 4 kudos

Please refer to the widget example below using SQL:
%sql
DROP VIEW IF EXISTS tempTable;
CREATE TEMPORARY VIEW tempTable AS
  SELECT 'APPLE' AS a UNION ALL
  SELECT 'ORANGE' AS a UNION ALL
  SELECT 'BANANA' AS a;
CREATE WIDGET DROPDOWN fruits DEFAULT 'ORAN...

User16789201666
by Databricks Employee
  • 2032 Views
  • 2 replies
  • 0 kudos

What are some guidelines for migrating to DBR 7/Spark 3?


Latest Reply
shan_chandra
Databricks Employee
  • 0 kudos

Please refer to the references below for switching to DBR 7.x. We have extended our DBR 6.4 support until December 2021. DBR 6.4 extended support release notes: https://docs.databricks.com/release-notes/runtime/6.4x.html
Migration guide to DBR 7.x: htt...

1 More Replies
MGH1
by New Contributor III
  • 7565 Views
  • 5 replies
  • 3 kudos

Resolved! how to log the KerasClassifier model in a sklearn pipeline in mlflow?

I have a set of pre-processing stages in a sklearn `Pipeline` and an estimator which is a `KerasClassifier` (`from tensorflow.keras.wrappers.scikit_learn import KerasClassifier`). My overall goal is to tune and log the whole sklearn pipeline in `mlflo...

Latest Reply
shan_chandra
Databricks Employee
  • 3 kudos

Could you please share the full error stack trace?

4 More Replies
brij
by New Contributor III
  • 10031 Views
  • 7 replies
  • 3 kudos

Resolved! Databricks snowflake dataframe.toPandas() taking more space and time

I have two tables that are exactly the same (rows and schema). One table resides in an Azure SQL Server database and the other is in a Snowflake database. Now we have some existing code that we want to migrate from Azure SQL to Snowflake, but when we are trying to create a panda...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@Brijan Elwadhi - That's wonderful. Thank you for sharing your solution.

6 More Replies
krishnachaitany
by New Contributor II
  • 1700 Views
  • 1 reply
  • 2 kudos

Spot Instances in Azure Databricks

The above screenshot is from an AWS Databricks cluster. Similarly, in Azure Databricks, is there a specific way to determine how many worker nodes are using spot instances versus on-demand instances while a job is running or after it has completed? Likewise, ...

Compute level spot instances and on demand instances
Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hello! My name is Piper and I'm one of the community moderators. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first; otherwise I will follow up with the team. Thanks for your p...

Databricks2005
by New Contributor II
  • 5897 Views
  • 3 replies
  • 3 kudos

Resolved! Cosine similarity between all rows pairwise on a dataset of 100million rows

Hello everyone, I am facing a performance issue while calculating cosine similarity in PySpark on a dataframe with around 100 million records. I am trying to do a cross self-join on the dataframe to calculate it. The executors all have the same number ...

Latest Reply
Sonal
New Contributor II
  • 3 kudos

Is there a way to hash the record attributes so that the Cartesian join can be avoided? I work on record similarity and fuzzy matching, and we use a learning-based blocking algorithm which hashes the records into smaller buckets, and then the hashes are ...
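A pure-Python sketch of the blocking idea in this reply: random-hyperplane hashing groups vectors so that only records sharing a bucket are compared, instead of the full cross join. All names and parameters here are illustrative; in PySpark itself, `pyspark.ml.feature.BucketedRandomProjectionLSH` or `MinHashLSH` with `approxSimilarityJoin` provides the same pattern at scale.

```python
import math
import random

def hyperplane_signature(vec, planes):
    """Sign pattern of vec against each random hyperplane -> bucket key."""
    return tuple(1 if sum(v * p for v, p in zip(vec, plane)) >= 0 else 0
                 for plane in planes)

def cosine(u, v):
    """Plain cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def bucketed_pairs(vectors, n_planes=8, seed=42):
    """Group vectors by LSH signature; yield (i, j, similarity) within buckets only."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]
    buckets = {}
    for i, vec in enumerate(vectors):
        buckets.setdefault(hyperplane_signature(vec, planes), []).append(i)
    for ids in buckets.values():
        for a in range(len(ids)):
            for b in range(a + 1, len(ids)):
                yield ids[a], ids[b], cosine(vectors[ids[a]], vectors[ids[b]])
```

With 100 million rows the win comes from bucket sizes: the quadratic pair generation is confined to each bucket, at the cost of missing some similar pairs that hash apart.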

2 More Replies
Quan
by New Contributor III
  • 27443 Views
  • 8 replies
  • 5 kudos

Resolved! How to properly load Unicode (UTF-8) characters from table over JDBC connection using Simba Spark Driver

Hello all, I'm trying to pull table data from Databricks tables that contain foreign-language characters in UTF-8 into an ETL tool using a JDBC connection. I'm using the latest Simba Spark JDBC driver available from the Databricks website. The issue i...

Latest Reply
Anonymous
Not applicable
  • 5 kudos

Can you try setting UseUnicodeSqlCharacterTypes=1 in the driver, and also make sure 'file.encoding' is set to UTF-8 in the JVM, and see if the issue still persists?
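Sketched together, the two settings from this reply might look like the following (host and HTTP path are placeholders; `UseUnicodeSqlCharacterTypes` is a Simba driver connection option):

```
# JVM option for the ETL tool's process, so Java string conversion defaults to UTF-8
-Dfile.encoding=UTF-8

# Connection string with the driver option appended
jdbc:spark://<workspace-host>:443/default;transportMode=http;ssl=1;httpPath=<http-path>;UseUnicodeSqlCharacterTypes=1
```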

7 More Replies
Abhendu
by New Contributor II
  • 2386 Views
  • 2 replies
  • 0 kudos

Resolved! CICD Databricks

Hi team, I was wondering if there is a document or step-by-step process to promote code in CI/CD across various environments of a code repository (Git/GitHub/Bitbucket/GitLab) with DBx support? [Without involving the code repository's merging capability of the ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Please refer to this related thread on CI/CD in Databricks: https://community.databricks.com/s/question/0D53f00001GHVhMCAX/what-are-some-best-practices-for-cicd

1 More Replies
Karankaran_alan
by New Contributor
  • 1826 Views
  • 1 reply
  • 0 kudos

cluster not getting created, timing out

Hello - I've been using the Databricks notebook (for PySpark or Scala/Spark development), and recently have had issues wherein cluster creation takes a long time and often times out. Any ideas on how to resolve this?

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Hi Karankaran.alang, what is the error message you are getting? Did you get this error while creating/starting a cluster in CE? Sometimes these errors are intermittent and go away after a few retries. Thank you.

RajaLakshmanan
by New Contributor
  • 4487 Views
  • 2 replies
  • 1 kudos

Resolved! Spark StreamingQuery not processing all data from source directory

Hi, I have set up a streaming process that consumes files from an HDFS staging directory and writes them to a target location. The input directory continuously gets files from another process. Let's say the file producer produces 5 million records and sends them to the HDFS sta...

Latest Reply
User16763506586
Databricks Employee
  • 1 kudos

If it helps, you can try running a left-anti join on the source and sink to identify missing records, and then see whether each missing record matches the schema provided.
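The reply's check, sketched in plain Python for illustration (in Spark the equivalent is `source_df.join(sink_df, on=key_cols, how="left_anti")`; the key name below is an assumption):

```python
def missing_from_sink(source_rows, sink_rows, key):
    """Left-anti join: source rows whose key never made it into the sink."""
    sink_keys = {row[key] for row in sink_rows}
    return [row for row in source_rows if row[key] not in sink_keys]
```

Each returned row can then be checked against the expected schema to see whether a mismatch explains why the stream dropped it.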

1 More Replies
User15787040559
by Databricks Employee
  • 2838 Views
  • 1 reply
  • 1 kudos

How can I get Databricks notebooks to stop cutting off the explain plans?

(Since Spark 3.0) `Dataset.queryExecution.debug.toFile` will dump the full plan to a file, without concatenating the output as a fully materialized Java string in memory.

Latest Reply
dazfuller
Contributor III
  • 1 kudos

Notebooks really aren't the best method of viewing large files. Two methods you could employ are: save the file to DBFS and then use the Databricks CLI to download it, or use the web terminal. In the web terminal you can do something like "cat my_lar...
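The first option in this reply, sketched with the Databricks CLI (file paths are placeholders):

```shell
# In the notebook, point toFile at a DBFS path, e.g. /dbfs/tmp/plan.txt,
# then from a machine with the Databricks CLI configured:
databricks fs cp dbfs:/tmp/plan.txt ./plan.txt
less plan.txt
```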

