Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

saniafatimi
by New Contributor II
  • 3564 Views
  • 1 reply
  • 1 kudos

Need guidance on migrating Power BI reports to Databricks

Hi All, I want to import an existing database/tables (say AdventureWorks) into Databricks, and after importing the tables, I want to develop reports on top of them. I need guidance on this. Can someone give me resources that could help me in doing things end to en...

Latest Reply
Chris_Shehu
Valued Contributor III
  • 1 kudos

@saniafatimi There are several different ways to do this, and it's really going to depend on what your current need is. You could, for example, load the data into the Databricks Delta Lake and use the Databricks Power BI connector to query the data fr...
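
A minimal sketch of that first option, assuming the AdventureWorks source is reachable over JDBC; the host, credentials, and table names below are placeholders:

    # Land one AdventureWorks table in Delta, then point the Power BI
    # connector at the resulting table. Connection details are placeholders.
    df = (spark.read
          .format("jdbc")
          .option("url", "jdbc:sqlserver://<host>:1433;databaseName=AdventureWorks")
          .option("dbtable", "SalesLT.Customer")
          .option("user", "<user>")
          .option("password", "<password>")
          .load())

    df.write.format("delta").mode("overwrite").saveAsTable("adventureworks.customer")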

User16830818524
by Databricks Employee
  • 2730 Views
  • 3 replies
  • 0 kudos

Resolved! Libraries in Databricks Runtimes

Is it possible to easily determine which libraries, and which versions, are included in a specific DBR version?

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hello. My name is Piper and I'm one of the community moderators. One of the team members sent this information to me. This should be the correct path to check libraries installed with DBRs: https://docs.databricks.com/release-notes/runtime/8.3ml.html?_...
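
If you'd rather inspect a running cluster directly than read the release notes, here is a minimal standard-library sketch that lists the installed Python packages and their versions from a notebook:

    import importlib.metadata as md

    # Print every installed distribution and its version, sorted by name
    for dist in sorted(md.distributions(), key=lambda d: (d.metadata["Name"] or "").lower()):
        print(dist.metadata["Name"], dist.version)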

2 More Replies
Rodrigo_Brandet
by New Contributor
  • 5139 Views
  • 3 replies
  • 4 kudos

Resolved! Upload CSV files on Databricks by code (not UI)

Hello everyone. I have a process on Databricks where I need to upload a CSV file manually every day. I would like to know if there is a way to import this data (as pandas in Python, for example) without the need to upload this file manually every day util...

Latest Reply
-werners-
Esteemed Contributor III
  • 4 kudos

Auto Loader is indeed a valid option, or use some kind of ETL tool which fetches the file and puts it somewhere on your cloud provider, like Azure Data Factory or AWS Glue, etc.
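
A minimal Auto Loader sketch of the first option; the landing path, schema/checkpoint locations, and target table are placeholders:

    # Incrementally pick up new CSV files as they land, no manual upload needed
    df = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "csv")
          .option("cloudFiles.schemaLocation", "/mnt/landing/_schema")
          .option("header", "true")
          .load("/mnt/landing/csv/"))

    (df.writeStream
       .option("checkpointLocation", "/mnt/landing/_checkpoint")
       .trigger(once=True)  # run as a scheduled batch instead of continuously
       .toTable("bronze.daily_csv"))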

2 More Replies
Zen
by New Contributor III
  • 5979 Views
  • 2 replies
  • 3 kudos

Resolved! ssh onto Cluster as root

Hello, I'm following the instructions here: https://docs.databricks.com/clusters/configure.html?_ga=2.17611385.1712747127.1631209439-1615211488.1629573963#ssh-access-to-clusters to ssh onto the Driver node, and it's working perfectly when I ssh on as `...

Latest Reply
cconnell
Contributor II
  • 3 kudos

I am 99% sure that logging into a Databricks node as root will not be allowed.
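
For reference, the supported path in the linked doc is to SSH in as the ubuntu user on port 2200, not as root; a minimal example with a placeholder host and key file:

    ssh ubuntu@<driver-public-ip> -p 2200 -i <private-key-file>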

1 More Replies
Anonymous
by Not applicable
  • 2286 Views
  • 2 replies
  • 0 kudos

Resolved! What are the advantages of using Delta if I am using MLflow? How is Delta useful for DS/ML use cases?

I am already using MLflow. What benefit would Delta provide me, since I am not really working on data engineering workloads?

Latest Reply
Sebastian
Contributor
  • 0 kudos

The most important aspect is that your experiment can track the version of the data table, so during audits you will be able to trace back why a specific prediction was made.
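
A minimal sketch of that idea using Delta time travel; the table path and version number are placeholders:

    import mlflow

    data_path = "/delta/customer_features"   # placeholder table path
    version = 12                             # placeholder Delta version

    # Train against a pinned snapshot of the data...
    df = (spark.read.format("delta")
          .option("versionAsOf", version)
          .load(data_path))

    # ...and record exactly which snapshot the run used
    with mlflow.start_run():
        mlflow.log_param("data_path", data_path)
        mlflow.log_param("data_version", version)
        # train and log the model here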

1 More Replies
brickster_2018
by Databricks Employee
  • 3430 Views
  • 2 replies
  • 3 kudos

Resolved! What is the best file format for a temporary table?

As part of my ETL process, I create intermediate/staging temporary tables. These tables are read at a later point in the ETL and finally cleaned up. Should I use Delta? Using Delta creates the overhead of running optimize jobs, which would de...

Latest Reply
Sebastian
Contributor
  • 3 kudos

Agreed: intermediate Delta tables help, since they bring reliability to the pipeline.
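
A minimal sketch of the pattern, with a placeholder staging table that is dropped once the run finishes:

    staging = "etl.stage_orders"   # placeholder staging table name
    df_raw = spark.range(100)      # stand-in for the upstream extract

    # Write the intermediate result as Delta for reliable downstream reads
    df_raw.write.format("delta").mode("overwrite").saveAsTable(staging)

    # ... later ETL steps read it back ...
    df_stage = spark.table(staging)

    # Clean up once the pipeline has succeeded
    spark.sql(f"DROP TABLE IF EXISTS {staging}")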

1 More Replies
Nyarish
by Contributor
  • 1252 Views
  • 0 replies
  • 0 kudos

How to connect Neo4j Aura to Databricks: connection error

I get this error: org.neo4j.driver.exceptions.SecurityException: Failed to establish secured connection with the server. I have tried to read through the documentation and tried the solution suggested, but I can't seem to hack this problem. Kindly help. ...

Zircoz
by New Contributor II
  • 15007 Views
  • 2 replies
  • 6 kudos

Resolved! Can we access variables created in Python from Scala code in the same notebook?

If I have a dict created in Python in a Scala notebook (using the magic word, of course): %python d1 = {1: "a", 2: "b", 3: "c"} — can I access this d1 in Scala? I tried the following and it returns d1 not found: %scala println(d1)

Latest Reply
cpm1
New Contributor II
  • 6 kudos

Martin is correct. We could only access external files and objects. In most of our cases, we just use temporary views to pass data between R & Python. https://docs.databricks.com/notebooks/notebooks-use.html#mix-languages
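
A minimal sketch of that temp-view approach applied to the d1 example above (cell magics shown inline; the view name is a placeholder):

    %python
    d1 = {1: "a", 2: "b", 3: "c"}
    spark.createDataFrame(list(d1.items()), ["key", "value"]).createOrReplaceTempView("d1_view")

    %scala
    // Read the view back and rebuild the map on the Scala side
    val d1 = spark.table("d1_view").collect().map(r => r.getLong(0) -> r.getString(1)).toMap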

1 More Replies
Anonymous
by Not applicable
  • 3896 Views
  • 1 reply
  • 2 kudos

Are there any costs or quotas associated with the Databricks managed Hive metastore?

When using the default Hive metastore that is managed within the Databricks control plane, are there any associated costs? I.e., if I switched to an external metastore, would I expect to see any reduction in my Databricks cost (ignoring total costs)? Do ...

Latest Reply
Ryan_Chynoweth
Databricks Employee
  • 2 kudos

There are no costs associated with using the Databricks managed Hive metastore directly. Databricks pricing is based on compute consumption, not on data storage or access. The only real cost would be the compute used to access the data. I would not expe...

Techmate
by New Contributor
  • 1831 Views
  • 1 reply
  • 0 kudos

Populating an array of date tuples in Scala

Hi Friends, I am trying to pass a list of date ranges that needs to be in the below format: val predicates = Array("2021-05-16" -> "2021-05-17", "2021-05-18" -> "2021-05-19", "2021-05-20" -> "2021-05-21"). I am then using map to create a range of conditions that...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

So basically this can be done by generating 2 lists which are then zipped. One list contains the first dates of the tuples, so these are in your case 2 days apart. The other list is the 2nd dates of the tuples, also 2 days apart. Now we need a function ...
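
A minimal Scala sketch of that zip approach; dateRange is a hypothetical helper, not a library function:

    import java.time.LocalDate

    // Hypothetical helper: inclusive date sequence stepping by `step` days
    def dateRange(start: LocalDate, end: LocalDate, step: Int): Seq[LocalDate] =
      Iterator.iterate(start)(_.plusDays(step)).takeWhile(!_.isAfter(end)).toSeq

    val starts = dateRange(LocalDate.parse("2021-05-16"), LocalDate.parse("2021-05-20"), 2)
    val ends   = dateRange(LocalDate.parse("2021-05-17"), LocalDate.parse("2021-05-21"), 2)

    // Zip the two lists into the (from, to) tuples the predicates expect
    val predicates = starts.zip(ends).map { case (s, e) => s.toString -> e.toString }.toArray
    // Array(("2021-05-16","2021-05-17"), ("2021-05-18","2021-05-19"), ("2021-05-20","2021-05-21"))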

dlevy
by Databricks Employee
  • 1843 Views
  • 1 reply
  • 1 kudos
Latest Reply
gbrueckl
Contributor II
  • 1 kudos

I think this was added in Databricks Runtime 8.2: https://docs.databricks.com/release-notes/runtime/8.2.html

alphaRomeo
by New Contributor
  • 5771 Views
  • 2 replies
  • 0 kudos

Resolved! Databricks with MySQL data source?

I have an existing data pipeline which looks like this: a small MySQL data source (around 250 GB), and data passes through Debezium / Kafka / a custom data redactor -> to Glue ETL jobs, and finally lands on Redshift, but the scale of the data is too sm...

Latest Reply
Dan_Z
Databricks Employee
  • 0 kudos

There is a lot in this question, so generally speaking I suggest you reach out to the sales team at Databricks. You can talk to a solutions architect who can get into more detail. Here are my general thoughts having seen a lot of customer arch: Generally,...
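
For the read side specifically, a minimal sketch of pulling the MySQL source into Delta over JDBC, with a partitioned read for parallelism on a table of that size; connection details, table names, and bounds are placeholders:

    df = (spark.read
          .format("jdbc")
          .option("url", "jdbc:mysql://<host>:3306/<database>")
          .option("dbtable", "orders")
          .option("user", "<user>")
          .option("password", "<password>")
          .option("driver", "com.mysql.cj.jdbc.Driver")
          .option("partitionColumn", "id")    # numeric column to split reads on
          .option("lowerBound", "1")
          .option("upperBound", "100000000")
          .option("numPartitions", "64")
          .load())

    df.write.format("delta").mode("overwrite").saveAsTable("bronze.orders")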

1 More Replies
EvandroLippert_
by New Contributor
  • 2336 Views
  • 1 reply
  • 0 kudos

Conflict with Bitbucket and GitHub credentials

I'm migrating my files from Bitbucket to GitHub, but every time that I need to clone something from Bitbucket and send it to GitHub, I need to create a new token to integrate the tools. It seems that when you save a GitHub credential, it overrides t...

Latest Reply
alexott
Databricks Employee
  • 0 kudos

Cross-posting my answer from StackOverflow: Unfortunately, right now it works only with a single Git provider. It looks like you're linking individual notebooks into a Git repository. You can simplify things by cloning the Bitbucket repository(-ies)...

Alex_G
by New Contributor II
  • 2819 Views
  • 1 reply
  • 4 kudos

Resolved! Databricks Feature Store in MLflow run CLI command

Hello! I am attempting to move some machine learning code from a Databricks notebook into an MLflow git repository. I am utilizing the Databricks Feature Store to load features that have been processed. Currently I cannot get the databricks library to ...

Latest Reply
sean_owen
Databricks Employee
  • 4 kudos

Hm, what error do you get? I believe you won't be able to specify the feature store library as a dependency, as it's not externally published yet, but code that uses it should run on Databricks ML runtimes, as it already exists there.
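
For reference, a minimal sketch of using the client on a Databricks ML runtime (where the library is preinstalled); the feature table name is hypothetical:

    from databricks.feature_store import FeatureStoreClient

    fs = FeatureStoreClient()
    # Read a published feature table back as a Spark DataFrame
    features = fs.read_table("ml.customer_features")  # hypothetical table name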

