Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Zen
by New Contributor III
  • 5602 Views
  • 2 replies
  • 3 kudos

Resolved! ssh onto Cluster as root

Hello, I'm following the instructions here: https://docs.databricks.com/clusters/configure.html?_ga=2.17611385.1712747127.1631209439-1615211488.1629573963#ssh-access-to-clusters to ssh onto the Driver node, and it's working perfectly when I ssh on as `...

Latest Reply
cconnell
Contributor II
  • 3 kudos

I am 99% sure that logging into a Databricks node as root will not be allowed.

Anonymous
by Not applicable
  • 2118 Views
  • 2 replies
  • 0 kudos

Resolved! What are the advantages of using Delta if I am using MLflow? How is Delta useful for DS/ML use cases?

I am already using MLflow. What benefit would Delta provide me, since I am not really working on data engineering workloads?

Latest Reply
Sebastian
Contributor
  • 0 kudos

The most important aspect is that your experiment can track the version of the data table, so during audits you will be able to trace back why a specific prediction was made.
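As a minimal sketch of that idea (not from the thread), one way to capture the Delta table version alongside an MLflow run is to read it from the table history and log it as a run parameter; the table name below is hypothetical:

```python
import mlflow
from delta.tables import DeltaTable

table_name = "ml.customer_features"  # hypothetical table name

# Look up the current Delta version of the training table
version = (
    DeltaTable.forName(spark, table_name)
    .history(1)                 # most recent commit only
    .select("version")
    .collect()[0]["version"]
)

with mlflow.start_run():
    # Record which table and which version the model was trained on
    mlflow.log_param("training_table", table_name)
    mlflow.log_param("delta_table_version", version)
    # ... train and log the model as usual ...
```

With the version logged, the exact training snapshot can later be reproduced with a Delta time-travel read (VERSION AS OF).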

brickster_2018
by Databricks Employee
  • 3242 Views
  • 2 replies
  • 3 kudos

Resolved! What is the best file format for a temporary table?

As part of my ETL process, I create intermediate/staging temporary tables. These tables are read at a later point in the ETL and finally cleaned up. Should I use Delta? Using Delta creates the overhead of running optimize jobs, which would de...

Latest Reply
Sebastian
Contributor
  • 3 kudos

Agreed. Intermediate Delta tables help since they bring reliability to the pipeline.
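A minimal sketch of that staging pattern, assuming a hypothetical path and DataFrame name:

```python
staging_path = "/mnt/etl/staging/orders_enriched"  # hypothetical location

# Write the intermediate result as Delta so later ETL steps get
# transactional, schema-enforced reads
(
    df_enriched.write
    .format("delta")
    .mode("overwrite")
    .save(staging_path)
)

# A later step in the pipeline reads it back
df_staged = spark.read.format("delta").load(staging_path)

# Final cleanup once the run has finished
dbutils.fs.rm(staging_path, True)
```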

Nyarish
by Contributor
  • 1167 Views
  • 0 replies
  • 0 kudos

How to connect Neo4j Aura to Databricks: connection error

I get this error: org.neo4j.driver.exceptions.SecurityException: Failed to establish secured connection with the server. I have tried to read through the documentation and tried the solution suggested, but I can't seem to crack this problem. Kindly help. ...

Zircoz
by New Contributor II
  • 14693 Views
  • 2 replies
  • 6 kudos

Resolved! Can we access variables created in Python from Scala code in the same notebook?

If I have a dict created in Python in a Scala notebook (using the magic command, of course):
%python
d1 = {1: "a", 2: "b", 3: "c"}
Can I access this d1 in Scala? I tried the following and it returns that d1 is not found:
%scala
println(d1)

Latest Reply
cpm1
New Contributor II
  • 6 kudos

Martin is correct. We can only access external files and objects. In most of our cases, we just use temporary views to pass data between R & Python. https://docs.databricks.com/notebooks/notebooks-use.html#mix-languages
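A minimal sketch of that temp-view pattern for the dict in this thread (the Scala side is shown as a comment, since it would live in a separate %scala cell):

```python
# %python cell: expose the dict as a temp view
d1 = {1: "a", 2: "b", 3: "c"}
spark.createDataFrame(list(d1.items()), ["key", "value"]) \
     .createOrReplaceTempView("d1_view")

# A separate %scala cell can then read the same view:
#   %scala
#   val d1 = spark.table("d1_view")
#   d1.show()
```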

Anonymous
by Not applicable
  • 3685 Views
  • 1 reply
  • 2 kudos

Are there any costs or quotas associated with the Databricks managed Hive metastore?

When using the default Hive metastore that is managed within the Databricks control plane, are there any associated costs? I.e., if I switched to an external metastore, would I expect to see any reduction in my Databricks cost (ignoring total costs)? Do ...

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 2 kudos

There are no costs associated with using the Databricks managed Hive metastore directly. Databricks pricing is based on compute consumption, not on data storage or access. The only real cost would be the compute used to access the data. I would not expe...

Techmate
by New Contributor
  • 1695 Views
  • 1 reply
  • 0 kudos

Populating an array of date tuples in Scala

Hi friends, I am trying to pass a list of date ranges that needs to be in the format below: val predicates = Array("2021-05-16" -> "2021-05-17", "2021-05-18" -> "2021-05-19", "2021-05-20" -> "2021-05-21") I am then using map to create a range of conditions that...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

So basically this can be done by generating two lists which are then zipped. One list contains the first dates of the tuples, which in your case are two days apart. The other list contains the second dates of the tuples, also two days apart. Now we need a function ...
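The thread is Scala, but as an illustration of the zip-two-lists idea described above, here is a sketch in Python that reproduces the predicates from the question (the start date and window count are taken from the example; the same pattern applies in Scala with Range, map, and zip):

```python
from datetime import date, timedelta

start = date(2021, 5, 16)   # first date in the question's example
windows = 3                 # number of (from, to) pairs wanted

# First dates of each tuple, two days apart
starts = [start + timedelta(days=2 * i) for i in range(windows)]
# Second dates of each tuple, one day after the corresponding start
ends = [d + timedelta(days=1) for d in starts]

# Zip the two lists into (from, to) pairs
predicates = [(s.isoformat(), e.isoformat()) for s, e in zip(starts, ends)]
# [('2021-05-16', '2021-05-17'), ('2021-05-18', '2021-05-19'), ('2021-05-20', '2021-05-21')]
```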

dlevy
by New Contributor II
  • 1677 Views
  • 1 reply
  • 1 kudos
Latest Reply
gbrueckl
Contributor II
  • 1 kudos

I think this was added in Databricks Runtime 8.2: https://docs.databricks.com/release-notes/runtime/8.2.html

alphaRomeo
by New Contributor
  • 5331 Views
  • 2 replies
  • 0 kudos

Resolved! Databricks with a MySQL data source?

I have an existing data pipeline that looks like this: a small MySQL data source (around 250 GB); data passes through Debezium / Kafka / a custom data redactor to Glue ETL jobs and finally lands in Redshift, but the scale of the data is too sm...

Latest Reply
Dan_Z
Databricks Employee
  • 0 kudos

There is a lot in this question, so generally speaking I suggest you reach out to the sales team at Databricks. You can talk to a solutions architect who can get into more detail. Here are my general thoughts, having seen a lot of customer architectures: Generally,...
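As a concrete starting point for the MySQL-to-Databricks piece (not from the reply), a hedged sketch of a JDBC read landed as Delta; the host, secret scope, table, and paths below are placeholders:

```python
jdbc_url = "jdbc:mysql://mysql-host:3306/mydb"  # placeholder host/database

df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "orders")                                        # placeholder table
    .option("user", dbutils.secrets.get("jdbc", "mysql-user"))          # placeholder secrets
    .option("password", dbutils.secrets.get("jdbc", "mysql-password"))
    .option("driver", "com.mysql.cj.jdbc.Driver")
    .load()
)

# Land the raw data as Delta for downstream transformations
df.write.format("delta").mode("overwrite").save("/mnt/bronze/orders")
```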

EvandroLippert_
by New Contributor
  • 2202 Views
  • 1 reply
  • 0 kudos

Conflict with Bitbucket and GitHub credentials

I'm migrating my files from Bitbucket to GitHub, but every time I need to clone something from Bitbucket and send it to GitHub, I need to create a new token to integrate the tools. It seems that when you save a GitHub credential, it overrides t...

Latest Reply
alexott
Databricks Employee
  • 0 kudos

Cross-posting my answer from Stack Overflow: Unfortunately, right now it works only with a single Git provider. It looks like you're linking individual notebooks into a Git repository. You can simplify things by cloning the Bitbucket repository(-ies)...

Alex_G
by New Contributor II
  • 2668 Views
  • 1 reply
  • 4 kudos

Resolved! Databricks Feature Store in MLFlow run CLI command

Hello! I am attempting to move some machine learning code from a Databricks notebook into an MLflow Git repository. I am utilizing the Databricks Feature Store to load features that have been processed. Currently I cannot get the databricks library to ...

Latest Reply
sean_owen
Databricks Employee
  • 4 kudos

Hm, what error do you get? I believe you won't be able to specify the Feature Store library as a dependency, as it's not externally published yet, but code that uses it should run on Databricks ML runtimes, as it already exists there.
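A minimal sketch of reading features with the Feature Store client on a Databricks ML runtime (the table name is hypothetical, and per the reply the library is only available on those runtimes rather than via pip at the time):

```python
from databricks.feature_store import FeatureStoreClient

fs = FeatureStoreClient()

# Hypothetical feature table created earlier with fs.create_table(...)
features_df = fs.read_table("ml.customer_features")

# ... join with labels and train the model as usual ...
```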

irfanaziz
by Contributor II
  • 2424 Views
  • 2 replies
  • 3 kudos

Does anyone know why the optimize does not complete?

I feel there is some issue with a few partitions of the Delta file. The OPTIMIZE runs fine and completes within a few minutes for other partitions, but for this particular partition the OPTIMIZE keeps running forever. OPTIMIZE delta.`/mnt/prod-abc/Ini...
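For context, a hedged sketch of scoping OPTIMIZE to a single partition so only the problematic one is compacted; the path and partition column below are placeholders, not the truncated values from the question:

```python
# Placeholder path and partition column for illustration
spark.sql("""
  OPTIMIZE delta.`/mnt/<path-to-delta-table>`
  WHERE partition_date = '2021-09-01'
""")
```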

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@nafri A - Thank you for letting us know.

User16137833804
by Databricks Employee
  • 4474 Views
  • 3 replies
  • 1 kudos
Latest Reply
Sebastian
Contributor
  • 1 kudos

The best solution is to store the .whl locally and do a pip install of the local wheel while the server boots up. This will freeze the library version. If you install from PyPI, it might impact your production work.
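A minimal sketch of that pattern, assuming hypothetical DBFS paths and package name: write a cluster init script that pip-installs the local wheel at boot.

```python
# Hypothetical wheel location and init-script path
init_script = """#!/bin/bash
/databricks/python/bin/pip install /dbfs/FileStore/libs/mypkg-1.0.0-py3-none-any.whl
"""

dbutils.fs.put(
    "dbfs:/databricks/init-scripts/install-mypkg.sh",
    init_script,
    True,  # overwrite if it already exists
)

# Reference dbfs:/databricks/init-scripts/install-mypkg.sh as a cluster init script
# so the pinned local wheel is installed every time the cluster boots.
```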

User16856693631
by New Contributor II
  • 2288 Views
  • 1 reply
  • 0 kudos

Can you create Clusters via a REST API?

Yes, you can. See here: https://docs.databricks.com/dev-tools/api/latest/clusters.html The JSON payload would look as follows: { "cluster_name": "my-cluster", "spark_version": "7.3.x-scala2.12", "node_type_id": "i3.xlarge", "spark_conf": { ...

Latest Reply
ManishPatil
New Contributor II
  • 0 kudos

One can create cluster(s) using the Clusters API at https://docs.databricks.com/dev-tools/api/latest/clusters.html#create. However, REST API 2.0 doesn't provide certain features like "Enable Table Access Control", which has been introduced after REST API ...
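A minimal sketch of calling the Clusters API 2.0 create endpoint from Python, mirroring the payload shown earlier in the thread; the workspace URL, token, and num_workers value are assumptions/placeholders:

```python
import requests

host = "https://<databricks-instance>"   # placeholder workspace URL
token = "<personal-access-token>"        # placeholder PAT

payload = {
    "cluster_name": "my-cluster",
    "spark_version": "7.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 2,                    # assumed worker count
}

resp = requests.post(
    f"{host}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json()["cluster_id"])   # ID of the newly created cluster
```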

