Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Anonymous
by Not applicable
  • 1897 Views
  • 3 replies
  • 19 kudos

Resolved! Welcome back! Please introduce yourself to the community. :)

Hello everyone! My name is Piper and I'm one of the community moderators for Databricks. I'd like to take this opportunity to welcome you to the new Databricks community! I'd also like to ask you to introduce yourself in this thread. We are here to h...

Latest Reply
cconnell
Contributor II
  • 19 kudos

I work mostly with health and medical data, on a contract or project basis. I am located in Bedford MA and Ogunquit Maine. I formerly worked at Blue Metal / Insight, which is where I got my start on Databricks. Languages -- Python, PySpark, Koalas. http...

manugarri
by New Contributor II
  • 11267 Views
  • 10 replies
  • 1 kudos

Fuzzy text matching in Spark

I have a list of client-provided data: a list of company names. I have to match those names against an internal database of company names. The client list can fit in memory (it's about 10k elements) but the internal dataset is on HDFS and we use Spark ...

Latest Reply
Sonal
New Contributor II
  • 1 kudos

You can use Zingg, a Spark-based open source tool, for this: https://github.com/zinggAI/zingg

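For readers who want a concrete starting point, here is a minimal sketch of the broadcast-plus-edit-distance approach often used for this kind of matching, built on Spark's built-in levenshtein function. The DataFrame names, column names, path, and threshold below are all placeholder assumptions; for serious entity resolution, a dedicated tool such as Zingg (linked above) is the more complete option.

import org.apache.spark.sql.functions.{broadcast, col, levenshtein, lower, trim}
import spark.implicits._ // `spark` is the notebook's SparkSession

// Placeholder inputs: the ~10k client names fit in memory; the internal
// company list is the large dataset read from HDFS.
val clientDF = Seq("Acme Corp", "Globex LLC").toDF("client_name")
val internal = spark.read.parquet("/path/to/internal/company_names")

// Broadcast the small side so the cross join stays cheap, then keep only
// pairs whose edit distance falls under a threshold (3 is arbitrary; tune it).
val matches = internal
  .crossJoin(broadcast(clientDF))
  .withColumn("dist", levenshtein(lower(trim(col("company_name"))),
                                  lower(trim(col("client_name")))))
  .filter(col("dist") <= 3)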
Sam
by New Contributor III
  • 1389 Views
  • 0 replies
  • 0 kudos

Can Admins enable Table Download on Sample but not on Full Dataset?

Is it possible to allow table download on a sampled dataset but not the full dataset? In the configuration settings it seems like you have to allow both? Notwithstanding the fact people could loop through the sample download, it seems like a prud...

saniafatimi
by New Contributor II
  • 1311 Views
  • 1 reply
  • 1 kudos

Need guidance on migrating Power BI reports to Databricks

Hi All, I want to import an existing database/tables (say AdventureWorks) to Databricks. After importing the tables, I want to develop reports on top. I need guidance on this. Can someone give me resources that could help me in doing things end to en...

Latest Reply
Chris_Shehu
Valued Contributor III
  • 1 kudos

@sania fatimi There are several different ways to do this, and it's really going to depend on what your current need is. You could, for example, load the data into Databricks Delta Lake and use the Databricks Power BI connector to query the data fr...

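To make that concrete, here is a hedged sketch of the path Chris describes: pull the source tables over JDBC, land them in Delta, and point Power BI at Databricks through its connector. The host, credentials, and target names are placeholders; SalesLT.Customer is one of the AdventureWorksLT sample tables.

// Placeholder connection details for a SQL Server hosting AdventureWorksLT.
val jdbcUrl = "jdbc:sqlserver://<host>:1433;databaseName=AdventureWorksLT"

val customers = spark.read
  .format("jdbc")
  .option("url", jdbcUrl)
  .option("dbtable", "SalesLT.Customer")
  .option("user", "<user>")
  .option("password", "<password>")
  .load()

// Land the data in Delta; Power BI then queries this table through the
// Databricks connector instead of hitting the source system directly.
customers.write.format("delta").mode("overwrite").saveAsTable("adventureworks.customer")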
User16830818524
by New Contributor II
  • 1732 Views
  • 3 replies
  • 0 kudos

Resolved! Libraries in Databricks Runtimes

Is it possible to easily determine which libraries and versions are included in a specific DBR version?

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hello. My name is Piper and I'm one of the community moderators. One of the team members sent this information to me. This should be the correct path to check libraries installed with DBRs. https://docs.databricks.com/release-notes/runtime/8.3ml.html?_...

Rodrigo_Brandet
by New Contributor
  • 3548 Views
  • 3 replies
  • 4 kudos

Resolved! Upload CSV files on Databricks by code (not UI)

Hello everyone. I have a process on Databricks where I need to upload a CSV file manually every day. I would like to know if there is a way to import this data (as pandas in Python, for example) without needing to upload this file manually every day util...

Latest Reply
-werners-
Esteemed Contributor III
  • 4 kudos

Autoloader is indeed a valid option, or use some kind of ETL tool that fetches the file and puts it somewhere on your cloud provider, like Azure Data Factory or AWS Glue, etc.

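As a concrete illustration of the Auto Loader suggestion, here is a minimal sketch; the landing path, schema and checkpoint locations, and target table are placeholder assumptions. Auto Loader picks up whatever new files have arrived, so the daily manual upload becomes a scheduled job.

import org.apache.spark.sql.streaming.Trigger

// Placeholder paths: the daily CSV is assumed to land in cloud storage.
val df = spark.readStream
  .format("cloudFiles")
  .option("cloudFiles.format", "csv")
  .option("cloudFiles.schemaLocation", "/mnt/landing/_schemas/daily_csv")
  .option("header", "true")
  .load("/mnt/landing/daily_csv/")

// AvailableNow (needs a recent runtime) processes everything that has
// arrived and then stops, which suits a once-a-day scheduled job.
df.writeStream
  .format("delta")
  .option("checkpointLocation", "/mnt/landing/_checkpoints/daily_csv")
  .trigger(Trigger.AvailableNow())
  .toTable("staging.daily_upload")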
Zen
by New Contributor III
  • 3736 Views
  • 2 replies
  • 3 kudos

Resolved! ssh onto Cluster as root

Hello, I'm following the instructions here: https://docs.databricks.com/clusters/configure.html?_ga=2.17611385.1712747127.1631209439-1615211488.1629573963#ssh-access-to-clusters to ssh onto the Driver node, and it's working perfectly when I ssh on as `...

Latest Reply
cconnell
Contributor II
  • 3 kudos

I am 99% sure that logging into a Databricks node as root will not be allowed.

Nyarish
by Contributor
  • 11180 Views
  • 17 replies
  • 18 kudos

Resolved! How to connect Neo4j aura to a cluster

Please help resolve this error: org.neo4j.driver.exceptions.SecurityException: Failed to establish secured connection with the server. This occurs when I try to establish a connection from my cluster to Neo4j Aura. Thank you.

Latest Reply
Anonymous
Not applicable
  • 18 kudos

@Werner Stinckens and @Nyaribo Maseru - You two are awesome! Thank you for working so hard together.

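For anyone landing on this thread with the same SecurityException: a frequent cause is connecting to Aura without the TLS-carrying neo4j+s:// URI scheme. Below is a hedged sketch using the Neo4j Spark connector (assumed installed on the cluster); the URI, credentials, and node label are placeholders.

// Aura only accepts TLS connections, so the URI must use the
// `neo4j+s://` scheme rather than plain `neo4j://` or `bolt://`.
val people = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("url", "neo4j+s://<dbid>.databases.neo4j.io")
  .option("authentication.basic.username", "neo4j")
  .option("authentication.basic.password", "<password>")
  .option("labels", "Person") // placeholder node label
  .load()

people.show()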
Anonymous
by Not applicable
  • 1486 Views
  • 2 replies
  • 0 kudos

Resolved! What are the advantages of using Delta if I am using MLflow? How is Delta useful for DS/ML use cases?

I am already using MLflow. What benefit would Delta provide me, since I am not really working on data engineering workloads?

Latest Reply
Sebastian
Contributor
  • 0 kudos

The most important aspect is that your experiment can track the version of the data table, so during audits you will be able to trace back why a specific prediction was made.

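Sebastian's point rests on Delta time travel: every write produces a numbered table version that can be re-read later. A minimal sketch, with path and version as placeholders; recording the version alongside an MLflow run (for example as a tag) is what makes the audit trail possible.

// Re-read the table exactly as it was at version 12 (placeholder values).
val snapshot = spark.read
  .format("delta")
  .option("versionAsOf", 12)
  .load("/mnt/features/training_set")

// Timestamp-based time travel works the same way:
val asOfDate = spark.read
  .format("delta")
  .option("timestampAsOf", "2021-09-01")
  .load("/mnt/features/training_set")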
brickster_2018
by Databricks Employee
  • 2718 Views
  • 2 replies
  • 3 kudos

Resolved! What is the best file format for a temporary table?

As part of my ETL process, I create intermediate/staging temporary tables. These tables are read at a later point in the ETL and finally cleaned up. Should I use Delta? Using Delta creates the overhead of running optimize jobs, which would de...

Latest Reply
Sebastian
Contributor
  • 3 kudos

Agreed. Intermediate Delta tables help, since they bring reliability to the pipeline.

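A small sketch of the staging pattern under discussion, with placeholder paths and a placeholder DataFrame name; Delta's ACID writes mean a failed stage can simply be rerun, and tables this short-lived rarely need OPTIMIZE before they are cleaned up.

// `stagingDF` stands in for the intermediate result of an ETL step.
stagingDF.write.format("delta").mode("overwrite").save("/tmp/etl/stage_orders")

// Later steps read the staged data back...
val staged = spark.read.format("delta").load("/tmp/etl/stage_orders")

// ...and the table is removed once the pipeline finishes.
dbutils.fs.rm("/tmp/etl/stage_orders", recurse = true)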
Nyarish
by Contributor
  • 797 Views
  • 0 replies
  • 0 kudos

How to connect Neo4j aura to Databricks connection Error

I get this error: org.neo4j.driver.exceptions.SecurityException: Failed to establish secured connection with the server. I have tried to read through the documentation and tried the suggested solution, but I can't seem to crack this problem. Kindly help. ...

Zircoz
by New Contributor II
  • 12743 Views
  • 2 replies
  • 6 kudos

Resolved! Can we access the variables created in Python in Scala's code or notebook ?

If I have a dict created in Python in a Scala notebook (using the magic word, of course): %python d1 = {1: "a", 2: "b", 3: "c"}. Can I access this d1 in Scala? I tried the following and it returns "d1 not found": %scala println(d1)

Latest Reply
cpm1
New Contributor II
  • 6 kudos

Martin is correct. We could only access the external files and objects. In most of our cases, we just use temporary views to pass data between R & Python. https://docs.databricks.com/notebooks/notebooks-use.html#mix-languages

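As a concrete version of the temp-view handoff cpm1 links to: variables themselves never cross the language boundary, but a registered view does, because all cells share one SparkSession. The view and column names below are placeholders.

// In a %python cell, register the dict as a temp view:
//   d1 = {1: "a", 2: "b", 3: "c"}
//   spark.createDataFrame(list(d1.items()), ["k", "v"]) \
//        .createOrReplaceTempView("d1_view")

// In a %scala cell, read it back through the shared SparkSession:
val d1 = spark.table("d1_view")
d1.show()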
Anonymous
by Not applicable
  • 2576 Views
  • 1 reply
  • 2 kudos

Are there any costs or quotas associated with the Databricks managed Hive metastore?

When using the default Hive metastore that is managed within the Databricks control plane, are there any associated costs? I.e., if I switched to an external metastore, would I expect to see any reduction in my Databricks cost (ignoring total costs)? Do ...

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 2 kudos

There are no costs associated with using the Databricks-managed Hive metastore directly. Databricks pricing is based on compute consumption, not on data storage or access. The only real cost would be the compute used to access the data. I would not expe...

Techmate
by New Contributor
  • 1185 Views
  • 1 reply
  • 0 kudos

Populating an array of date tuples in Scala

Hi Friends, I am trying to pass a list of date ranges that needs to be in the below format: val predicates = Array("2021-05-16" -> "2021-05-17", "2021-05-18" -> "2021-05-19", "2021-05-20" -> "2021-05-21"). I am then using map to create a range of conditions that...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

So basically this can be done by generating 2 lists which are then zipped. One list contains the first dates of the tuples, so these are in your case 2 days apart. The other list contains the 2nd dates of the tuples, also 2 days apart. Now we need a function ...

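Here is one way to write out the zip approach -werners- sketches; the start date and range count are placeholders matching the example in the question.

import java.time.LocalDate

// First dates of each tuple, two days apart (placeholder start and count).
val start = LocalDate.parse("2021-05-16")
val firstDates = (0 until 3).map(i => start.plusDays(2L * i))
// Second dates: each one day after its partner.
val secondDates = firstDates.map(_.plusDays(1))

// Zip the two lists into the Array of (from, to) pairs the JDBC predicates expect.
val predicates = firstDates.zip(secondDates)
  .map { case (a, b) => a.toString -> b.toString }
  .toArray
// Array(("2021-05-16","2021-05-17"), ("2021-05-18","2021-05-19"), ("2021-05-20","2021-05-21"))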
