Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

greyamber
by New Contributor II
  • 28846 Views
  • 4 replies
  • 0 kudos

Select job cluster vs all purpose cluster

I have a workflow that needs to run at a 1-minute interval; it makes a REST API call. Should I go for an all-purpose cluster or a job cluster to meet the SLA? We need to get the results as soon as they are available.

Latest Reply
kulkpd
Contributor
  • 0 kudos

@greyamber An interactive cluster costs about twice as much as a job cluster. Can you explain the use case: why the API needs to be invoked and what the API is doing?

3 More Replies
BricksGuy
by New Contributor III
  • 536 Views
  • 2 replies
  • 0 kudos

Extract DLT Pipeline Logs to a delta table

I want to export the DLT pipeline run details into a Delta table. I want a table with data like this.

[Attachment: BricksGuy_0-1727861395256.png]
Latest Reply
BricksGuy
New Contributor III
  • 0 kudos

Is there any way I can use an event hook to log into my own Delta table? If anyone has a working example, that would be great.
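Rather than an event hook, one common approach (sketched below as an assumption, not a confirmed answer from this thread) is to read the pipeline's event log and append it to your own Delta table. This assumes the pipeline was created with a storage location, where the event log lives under <storage>/system/events; the storage path and target table name are hypothetical placeholders.

from pyspark.sql import functions as F

pipeline_storage = "/pipelines/my_dlt_pipeline"   # hypothetical pipeline storage location
events = spark.read.format("delta").load(f"{pipeline_storage}/system/events")

(events
    .select("timestamp", "event_type", "level", "message", "details")
    .withColumn("ingested_at", F.current_timestamp())
    .write.mode("append")
    .saveAsTable("monitoring.dlt_pipeline_events"))   # hypothetical target table

Run from a regular (non-DLT) notebook or job, this can be scheduled after each pipeline run to keep the monitoring table current.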

1 More Replies
Tito
by New Contributor II
  • 781 Views
  • 2 replies
  • 0 kudos

VS Code Databricks Connect Cluster Configuration

I am currently setting up the VSCode extension for Databricks Connect, and it’s working fine so far. However, I have a question about cluster configurations. I want to access Unity Catalog from VSCode through the extension, and I’ve noticed that I ca...
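For context, a minimal Databricks Connect sketch, assuming Databricks Connect v2 (the databricks-connect package for DBR 13+) and a cluster whose access mode supports Unity Catalog; the host, token, cluster ID, and table name are hypothetical placeholders.

from databricks.connect import DatabricksSession

spark = (DatabricksSession.builder
         .remote(host="https://adb-1234567890123456.7.azuredatabricks.net",
                 token="dapiXXXXXXXXXXXXXXXX",
                 cluster_id="0123-456789-abcdefgh")
         .getOrCreate())

# Query a Unity Catalog table using the three-level namespace.
spark.sql("SELECT * FROM main.default.sample_table LIMIT 10").show()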

Latest Reply
ElvaCummings
New Contributor II
  • 0 kudos

Thank you

1 More Replies
aleknandrius
by New Contributor
  • 594 Views
  • 1 replies
  • 0 kudos

# Databricks notebook source throws FileNotFoundError: [Errno 2] No such file with PyCharm plugin

I started using the Databricks plugin in PyCharm. If I have this first line in my code: # Databricks notebook source ... Running such a notebook with the plugin on a cluster fails with the message: FileNotFoundError: [Errno 2] No such file or directory: '/Workspace/U...

Latest Reply
Brahmareddy
Honored Contributor III
  • 0 kudos

Hi @aleknandrius, how are you doing today? As per my understanding, it seems that the # Databricks notebook source line is causing confusion when running your PyCharm code on Databricks. This line is usually added to identify notebook cells, but in a P...
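For reference, a minimal example of the markers being discussed (the print statements are only placeholders): the first line is what Databricks writes at the top of a notebook exported in source format, and cells are separated by the COMMAND comment.

# Databricks notebook source

print("hello from the first cell")

# COMMAND ----------

print("hello from the second cell")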

AsfandQ
by New Contributor III
  • 18147 Views
  • 7 replies
  • 6 kudos

Resolved! Delta tables: Cannot set default column mapping mode to "name" in Python for delta tables

Hello, I am trying to write Delta files for some CSV data. When I do csv_dataframe.write.format("delta").save("/path/to/table.delta") I get: AnalysisException: Found invalid character(s) among " ,;{}()\n\t=" in the column names of your schema. Having look...

Latest Reply
Personal1
New Contributor II
  • 6 kudos

I still get the error when I try any method. The column names with spaces are throwing the error [DELTA_INVALID_CHARACTERS_IN_COLUMN_NAMES] Found invalid character(s) among ' ,;{}()\n\t=' in the column names of your schema. df1.write.format("delta") \ .mo...
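A minimal sketch of one commonly suggested workaround, offered as an assumption rather than a confirmed fix from this thread: set session-level defaults so that newly created Delta tables use column mapping mode "name", which allows spaces and other special characters in column names (column mapping requires reader version 2 and writer version 5 at minimum). The output path is a hypothetical placeholder.

spark.conf.set("spark.databricks.delta.properties.defaults.columnMapping.mode", "name")
spark.conf.set("spark.databricks.delta.properties.defaults.minReaderVersion", "2")
spark.conf.set("spark.databricks.delta.properties.defaults.minWriterVersion", "5")

(df1.write
    .format("delta")
    .mode("overwrite")
    .save("/path/to/table_with_spaces"))   # hypothetical path; creates a new table with the defaults above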

6 More Replies
PB-Data
by New Contributor III
  • 2209 Views
  • 5 replies
  • 0 kudos

Web Terminal

How can I use the web terminal within my Azure Databricks workspace if the workspace is provisioned with private endpoints, i.e. Allow Public Network Access is disabled? I have tried accessing the web terminal from the Apps tab and the bottom panel of a notebook. Th...

Latest Reply
gchandra
Databricks Employee
  • 0 kudos

Gotcha. See whether this link is helpful. https://learn.microsoft.com/en-us/azure/databricks/connect/storage/tutorial-azure-storage#grant-your-azure-databricks-workspace-access-to-azure-data-lake-storage-gen2

4 More Replies
NanthakumarYoga
by New Contributor II
  • 11971 Views
  • 2 replies
  • 2 kudos

Partition in Spark

Hi Community, I need your help understanding the topics below. I have a huge transaction file (20 GB), a parquet file partitioned by the transaction_date column. The data is evenly distributed (no skew). There are 10 days of data and we have 10 partition f...
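A quick way to inspect what actually drives the read-side partition count here, as a sketch with a hypothetical path: the number of input partitions depends mainly on file sizes and spark.sql.files.maxPartitionBytes, not on the number of transaction_date folders.

df = spark.read.parquet("/mnt/data/transactions")            # hypothetical path, partitioned by transaction_date
print(df.rdd.getNumPartitions())                             # input partitions Spark created for the scan
print(spark.conf.get("spark.sql.files.maxPartitionBytes"))   # max bytes per input split (128 MB by default)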

Latest Reply
Personal1
New Contributor II
  • 2 kudos

I read a .zip file in Spark and get unreadable data when I run show() on the data frame. When I check the number of partitions using df.rdd.getNumPartitions(), I get 8 (the number of cores I am using). Shouldn't the partition count be just 1 as I read...

1 More Replies
FedericoRaimond
by New Contributor III
  • 4120 Views
  • 10 replies
  • 3 kudos

Azure Databricks Workflows with Git Integration

Hello, I receive a very weird error when attempting to connect my workflow tasks to a remote Git repo on Azure DevOps. As per the documentation: "For a Git repository, the path relative to the repository root." So I directly use the name of the notebook file...

Latest Reply
nicole_lu_PM
Databricks Employee
  • 3 kudos

Hi Federico, The error in Error 1.png didn't look right. Since you already selected the git source for the job, you should be able to use a relative path. If you continue to run into this issue, can you please submit a support ticket if you have a Su...

9 More Replies
databrickser
by New Contributor
  • 1175 Views
  • 2 replies
  • 0 kudos

Updating records with auto loader

I want to ingest JSON files from an S3 bucket into a Databricks table using Auto Loader. A job runs every few hours to write the combined JSON data to the table. Some records might be updates to existing records, identifiable by a specific key. I want...

Latest Reply
filipniziol
Esteemed Contributor
  • 0 kudos

Hi @databrickser, in theory it is possible to use Auto Loader with foreachBatch to update existing records. The code snippet below shows a working solution: from datetime import datetime from pyspark.sql import DataFrame from pyspark.sql.functions impo...
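For readers of this thread, a minimal sketch along the lines the reply describes; the bucket paths, target table name, and merge key are hypothetical placeholders.

from delta.tables import DeltaTable
from pyspark.sql import DataFrame

def upsert_batch(batch_df: DataFrame, batch_id: int) -> None:
    # Merge each micro-batch into the target table on the business key.
    target = DeltaTable.forName(spark, "main.bronze.events")
    (target.alias("t")
        .merge(batch_df.alias("s"), "t.record_key = s.record_key")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

(spark.readStream.format("cloudFiles")                                   # Auto Loader source
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "s3://my-bucket/_schemas/events")
    .load("s3://my-bucket/landing/events/")
    .writeStream
    .foreachBatch(upsert_batch)                                          # apply the MERGE per micro-batch
    .option("checkpointLocation", "s3://my-bucket/_checkpoints/events")
    .trigger(availableNow=True)                                          # fits the "runs every few hours" pattern
    .start())

If a single batch can contain several updates for the same key, deduplicate batch_df on that key (keeping the latest record) before the merge.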

1 More Replies
varonis_evgeniy
by New Contributor
  • 550 Views
  • 2 replies
  • 0 kudos

Single task job that runs SQL notebook, can't retrieve results

Hello, we are integrating Databricks and I need to run a job with a single task that runs a notebook with a SQL query in it. I can only use a SQL warehouse, not a cluster. I need to retrieve the result of the notebook task but I can't see the results. Is...

Data Engineering
dbutils
Notebook
sql
Latest Reply
adriennn
Valued Contributor
  • 0 kudos

> I need to retrieve a result of the notebook task
If you want to know whether the task run succeeded or not, you can enable the "lakeflow" system schema and you'll find the logs of job and task runs. You could then use the above info to execute a...
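A hedged sketch of that suggestion: once the lakeflow system schema is enabled, job and task run history can be queried like any other table. The table and column names below reflect the system.lakeflow schema as I understand it and should be verified in your workspace; the job ID is a hypothetical placeholder, and note this surfaces run status, not the notebook's query output.

runs = spark.sql("""
    SELECT run_id, result_state, period_start_time, period_end_time
    FROM system.lakeflow.job_run_timeline
    WHERE job_id = 123456789
    ORDER BY period_end_time DESC
    LIMIT 10
""")
runs.show(truncate=False)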

1 More Replies
sungsoo
by New Contributor
  • 370 Views
  • 1 replies
  • 0 kudos

AWS Role of NACL outbound 3306 port

When using Databricks on AWS, I need to open port 3306 in the NACL outbound rules of the subnet where the endpoint is located. I understand this is to communicate with the Databricks metastore on the instance. Am I right to understand this? If not, please let me ...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

You are correct, the intention of this port is to connect to the Hive metastore 

Brad
by Contributor II
  • 2347 Views
  • 1 replies
  • 0 kudos

How databricks assign memory and cores

Hi team, we are using a job cluster with a 128 GB memory + 16 core node type for a workflow. From the documentation we know one worker is one node and is one executor. From the Spark UI Environment tab we can see that spark.executor.memory is 24G, and from metrics we can see the m...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Databricks allocates resources to executors on a node based on several factors, and it appears that your cluster configuration is using default settings since no specific Spark configurations were provided. Executor Memory Allocation: The spark.exec...
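As a quick sanity check (not a tuning recipe), the values the cluster actually granted can be inspected from a notebook attached to it; the "24g" noted in the comment is simply what this thread reports for that node type.

print(spark.conf.get("spark.executor.memory"))                    # e.g. 24g, as reported in this thread's Spark UI
print(spark.conf.get("spark.executor.cores", "(cluster default)"))
print(spark.sparkContext.defaultParallelism)                      # total task slots across the workers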

AxelM
by New Contributor
  • 478 Views
  • 1 replies
  • 0 kudos

Asset Bundles from Workspace for CI/CD

Hello there, I am exploring the possibilities for CI/CD from a DEV workspace to PROD. Besides the notebooks (which can easily be handled by the Git provider), I am mainly interested in the deployment of Jobs/Clusters/DDL... I can't find a tutorial anywhere ...

Latest Reply
datastones
Contributor
  • 0 kudos

I think the DAB MLOps Stacks template is pretty helpful re: how to bundle, schedule and trigger custom jobs: https://docs.databricks.com/en/dev-tools/bundles/mlops-stacks.html. You can bundle init it locally and it should give you the skeleton of how to bu...

balwantsingh24
by New Contributor II
  • 1733 Views
  • 3 replies
  • 0 kudos

Resolved! java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMeta

Please help me solve this issue; I need it on a very urgent basis.

[Attachment: Screenshot 2024-09-27 133729.png]
Latest Reply
saikumar246
Databricks Employee
  • 0 kudos

Hi @balwantsingh24, Internal Metastore: internal metastores are managed by Databricks and are typically used to store metadata about databases, tables, views, and user-defined functions (UDFs). This metadata is essential for operations like the SHOW...

2 More Replies
Frustrated_DE
by New Contributor III
  • 839 Views
  • 4 replies
  • 0 kudos

Delta live tables multiple .csv diff schemas

Hi all, I have a fairly straightforward task whereby I am looking to ingest six .csv files, all with different names, schemas and blob locations, into individual tables in one bronze schema. I have the files in my landing zone under different fol...

Latest Reply
Frustrated_DE
New Contributor III
  • 0 kudos

The code follows a similar pattern to the one below to load the different tables:
import dlt
import re
import pyspark.sql.functions as F
landing_zone = '/Volumes/bronze_dev/landing_zone/'
source = 'addresses'
@dlt.table(comment="addresses snapshot", name="addresses")
de...
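That per-table pattern generalizes to all six sources by creating the tables in a loop; a minimal sketch, where the folder names other than 'addresses' are hypothetical placeholders.

import dlt

landing_zone = "/Volumes/bronze_dev/landing_zone/"

def make_bronze_table(source: str):
    @dlt.table(name=source, comment=f"{source} snapshot")
    def _bronze():
        return (spark.readStream.format("cloudFiles")
                .option("cloudFiles.format", "csv")
                .option("cloudFiles.inferColumnTypes", "true")
                .load(f"{landing_zone}{source}"))

for src in ["addresses", "customers", "orders"]:   # hypothetical list of the six source folders
    make_bronze_table(src)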

3 More Replies
