Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

inpefess
by New Contributor II
  • 3430 Views
  • 4 replies
  • 3 kudos

Does Databricks need GCP VMs for a workspace with no clusters in it?

Hi! I'm using GCP. Does a Databricks workspace always need two e2-highmem-2 instances running as soon as I create a workspace? I see them in my VM list in the GCP console no matter what (I can stop or remove a cluster, but these two machines are always th...

Latest Reply
abagshaw
Databricks Employee
  • 3 kudos

To clarify, Databricks on GCP will automatically delete the underlying GKE cluster after 5 days of inactivity (no cluster launches or non-empty instance pools) in the workspace. You can contact Databricks support if you want to shorten the idle TTL for th...

3 More Replies
MichaelO
by New Contributor III
  • 1881 Views
  • 0 replies
  • 0 kudos

gateway.create route for open source models

Am I able to use gateway.create_route in MLflow for open source LLM models? I'm aware of the syntax for proprietary models, like for OpenAI: from mlflow import gateway gateway.create_route( name=OpenAI_embeddings_route_name...

Data Engineering
llm
mlflow
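The MLflow AI Gateway (since deprecated in favor of MLflow Deployments) could point routes at self-hosted open-source models, not only proprietary providers. As a minimal sketch, the snippet below only assembles a create_route-style payload as plain data; the route name, provider key, model name, and server URL are all placeholder assumptions, not values from the post, and a running gateway would still be needed to actually register the route.

```python
import json

def build_route_config(name: str, model_name: str, server_url: str) -> dict:
    """Assemble a create_route-style payload for a self-hosted open-source LLM.

    The provider key and config shape are illustrative; check the gateway
    version you run for the exact provider names it supports.
    """
    return {
        "name": name,
        "route_type": "llm/v1/completions",
        "model": {
            "name": model_name,
            # Placeholder provider for a self-hosted model server.
            "provider": "mlflow-model-serving",
            "config": {"model_server_url": server_url},
        },
    }

config = build_route_config(
    "oss-completions", "mistral-7b-instruct", "http://localhost:5000"
)
print(json.dumps(config, indent=2))
```

The same dict could then be passed field-by-field to `gateway.create_route(**config)` on a gateway-enabled MLflow installation.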
Faisal
by Contributor
  • 1775 Views
  • 1 reply
  • 0 kudos

DLT bronze tables

I am trying to ingest incremental Parquet file data into a bronze streaming table. How much history data should ideally be retained in the bronze layer as a general best practice, considering I will only be using bronze to ingest source data and move it to s...

Latest Reply
MuthuLakshmi
Databricks Employee
  • 0 kudos

The amount of history data that should be retained in the bronze layer depends on your specific use case and requirements. As a general best practice, you should retain enough history data to support your downstream analytics and machine learning wor...

CaptainJack
by New Contributor III
  • 6569 Views
  • 1 reply
  • 0 kudos

Workspace API

Hello friends. I am having a problem with the Workspace API. I have many folders inside my /Workspace (200+) into which I would like to copy my Program folder, which includes 20 Spark scripts as Databricks notebooks. I tried the Workspace API and I ...

Latest Reply
CaptainJack
New Contributor III
  • 0 kudos

I am using this as api = /api/2.0/workspace/import

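The `/api/2.0/workspace/import` endpoint expects the notebook source base64-encoded in the request body. A minimal sketch of building that payload (the target path and source bytes are illustrative; the field names follow the documented Workspace API):

```python
import base64

def build_import_payload(source: bytes, target_path: str) -> dict:
    """JSON body for POST /api/2.0/workspace/import.

    The content field must be base64-encoded; the request would be sent
    with an `Authorization: Bearer <token>` header to the workspace host.
    """
    return {
        "path": target_path,
        "format": "SOURCE",
        "language": "PYTHON",
        "content": base64.b64encode(source).decode("ascii"),
        "overwrite": True,
    }

# One payload per destination; copying into 200+ folders would reuse the
# same source bytes with a different target path on each iteration.
payload = build_import_payload(b"print('hello')", "/Workspace/TeamA/Program/script01")
```

For whole-folder copies, an alternative worth considering is exporting the folder once (`/api/2.0/workspace/export` with `format=DBC`) and importing the archive per destination, instead of importing the 20 notebooks one by one.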
Immassive
by New Contributor II
  • 2782 Views
  • 1 reply
  • 0 kudos

Reading information_schema tables through JDBC connection

Hi, I am using Unity Catalog as storage for data. I have an external system that establishes a connection to Unity Catalog via JDBC using the Databricks driver: Configure the Databricks ODBC and JDBC drivers - Azure Databricks | Microsoft L...

Latest Reply
Immassive
New Contributor II
  • 0 kudos

Note: I can see the tables of system.information_schema in the Databricks UI and read them there.

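Since `information_schema` is addressed with a full three-level name over a SQL warehouse connection, the JDBC side mostly comes down to the connection URL and a fully qualified query. A sketch of both strings, assuming token authentication (hostname, HTTP path, and token are placeholders; the URL format follows the Databricks JDBC driver documentation):

```python
def build_jdbc_url(host: str, http_path: str) -> str:
    """Databricks JDBC URL using token auth (AuthMech=3, UID=token)."""
    return (
        f"jdbc:databricks://{host}:443/default;"
        f"transportMode=http;ssl=1;httpPath={http_path};"
        "AuthMech=3;UID=token;PWD=<personal-access-token>"
    )

# information_schema must be referenced with its catalog prefix; the
# system catalog's copy spans all catalogs the principal can see.
QUERY = """
SELECT table_catalog, table_schema, table_name
FROM system.information_schema.tables
WHERE table_type = 'MANAGED'
"""

url = build_jdbc_url(
    "adb-1234567890123456.7.azuredatabricks.net",
    "/sql/1.0/warehouses/abc123def456",
)
```

If the tables are visible in the UI but not over JDBC, the usual suspects are the warehouse's access mode and whether the connecting principal has `USE CATALOG`/`USE SCHEMA` grants on the catalog being queried.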
JonLaRose
by New Contributor III
  • 7233 Views
  • 2 replies
  • 0 kudos

Resolved! Max amount of tables

Hi! What is the maximum number of tables that it is possible to create in a Unity Catalog? Is there any difference between managed and external tables? If so, what is the limit for external tables? Thanks, Jonathan.

Latest Reply
JonLaRose
New Contributor III
  • 0 kudos

The answer is here: https://docs.databricks.com/en/data-governance/unity-catalog/index.html#resource-quotas

1 More Replies
coltonflowers
by New Contributor III
  • 3604 Views
  • 0 replies
  • 0 kudos

MLFlow Spark UDF Error

After trying to run spark_udf = mlflow.pyfunc.spark_udf(spark, model_uri=logged_model, env_manager="virtualenv"), we get the following error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 145.0 failed 4 times, most re...

alj_a
by New Contributor III
  • 1625 Views
  • 1 reply
  • 0 kudos

source db and target db in DLT

Hi, thanks in advance. I am new to DLT. The scenario is: I need to read the data from cloud storage (ADLS) and load it into my bronze table, then read it from the bronze table -> do some DQ checks and load the cleaned data into my silver table. Finally, populat...

marianopenn
by New Contributor III
  • 3707 Views
  • 2 replies
  • 1 kudos

Databricks VSCode Extension Sync Timeout

I am using the Databricks VSCode extension to sync my local repository to Databricks Workspaces. I have everything configured such that smaller syncs work fine, but a full sync of my repository leads to the following error: Sync Error: Post "https://<...

Data Engineering
dbx sync
Repos
VSCode
Workspaces
Latest Reply
kimongrigorakis
New Contributor II
  • 1 kudos

Same issue here..... Can someone please help??

1 More Replies
Phani1
by Databricks MVP
  • 1149 Views
  • 0 replies
  • 0 kudos

Unity catalog accounts

Hi Team, we have a requirement to keep metadata (Unity Catalog) in one AWS account and data storage (Delta tables under data) in another account. Is it possible to do that? Would we face any technical/security issues?

278875
by New Contributor
  • 21928 Views
  • 4 replies
  • 1 kudos

How do I figure out the cost breakdown for Databricks

I'm trying to figure out the cost breakdown of Databricks usage for my team. When I go into the Databricks administration console and click Usage, selecting to show the usage By SKU just displays the type of cluster but not its name. ...

Latest Reply
MuthuLakshmi
Databricks Employee
  • 1 kudos

Please check the docs below for usage-related information. The Billable Usage Logs: https://docs.databricks.com/en/administration-guide/account-settings/usage.html You can filter them using tags for the more precise information you are looking for...

3 More Replies
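Once a billable usage export is downloaded, breaking cost down per cluster is a small scripting job on the CSV. A sketch with made-up column names and numbers (`clusterName` and `dbus` here are illustrative; check the header of the actual export, which also carries tags when custom tags are configured):

```python
import csv
import io
from collections import defaultdict

# Stand-in for a downloaded billable usage export.
SAMPLE = """\
clusterName,sku,dbus
etl-nightly,PREMIUM_JOBS_COMPUTE,12.5
etl-nightly,PREMIUM_JOBS_COMPUTE,7.5
adhoc-sql,PREMIUM_SQL_COMPUTE,3.0
"""

def dbus_by_cluster(csv_text: str) -> dict:
    """Sum DBUs per cluster name across all rows of the export."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["clusterName"]] += float(row["dbus"])
    return dict(totals)

print(dbus_by_cluster(SAMPLE))  # {'etl-nightly': 20.0, 'adhoc-sql': 3.0}
```

Multiplying each cluster's DBU total by the per-SKU rate then gives the dollar breakdown the By SKU view doesn't show per cluster.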
dave_d
by New Contributor II
  • 9191 Views
  • 2 replies
  • 0 kudos

What is the "Columnar To Row" node in this simple Databricks SQL query profile?

I am running a relatively simple SQL query that writes back to a table on a Databricks serverless SQL warehouse, and I'm trying to understand why there is a "Columnar To Row" node in the query profile that is consuming the vast majority of the time s...

[screenshot attachment: dave_d_0-1696974904324.png]
Latest Reply
Annapurna_Hiriy
Databricks Employee
  • 0 kudos

 @dave_d We do not have a document with a list of operations that would bring up the ColumnarToRow node. This node provides a common executor to translate an RDD of ColumnarBatch into an RDD of InternalRow. It is inserted whenever such a transition is de...

1 More Replies
Rafal9
by New Contributor III
  • 9189 Views
  • 0 replies
  • 0 kudos

Issue during testing SparkSession.sql() with pytest.

Dear Community, I am testing PySpark code via pytest using VS Code and Databricks Connect. The SparkSession is initiated from Databricks Connect: from databricks.connect import DatabricksSession; spark = DatabricksSession.builder.getOrCreate(). I am receiving...

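When pytest trips over the Databricks Connect session itself, one common workaround is to keep the session out of the unit under test entirely: inject `spark` as a parameter and substitute a stub in tests. The sketch below is pure Python and runs with no cluster attached; `FakeSession` is a test double invented for illustration, not part of any Databricks API.

```python
# Code under test only touches the session through spark.sql(), so any
# object with a compatible surface can stand in for it.

def row_count(spark, table: str) -> int:
    """Works with a real SparkSession or any stub exposing .sql()/.collect()."""
    return spark.sql(f"SELECT COUNT(*) AS n FROM {table}").collect()[0].n

class _FakeRow:
    def __init__(self, n):
        self.n = n

class FakeSession:
    """Minimal stand-in that records the SQL it receives."""
    def __init__(self, n):
        self._n = n
        self.queries = []

    def sql(self, query):
        self.queries.append(query)
        return self  # the real API returns a DataFrame; the stub returns itself

    def collect(self):
        return [_FakeRow(self._n)]

# Inside pytest this body would live in a `def test_row_count():` function.
fake = FakeSession(42)
assert row_count(fake, "events") == 42
assert fake.queries == ["SELECT COUNT(*) AS n FROM events"]
```

With this split, only a thin integration test needs `DatabricksSession.builder.getOrCreate()`; the rest of the suite stays fast and independent of the Connect client's state.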