Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

RyanHager
by Contributor
  • 4 Views
  • 0 replies
  • 0 kudos

Liquid Clustering and S3 Performance

Are there any performance concerns when using liquid clustering with AWS S3? I believe all the Parquet files go into the same folder (prefix, in AWS S3 terms) versus one folder per partition when using "partition by". And there is this note on S3 performa...

libpekin
by New Contributor II
  • 58 Views
  • 1 reply
  • 1 kudos

Resolved! Databricks Free Edition - Accessing files in S3

Hello, attempting to read/write files from S3 but I got the error below. I am on the Free Edition (serverless by default). I'm using access_key and secret_key. Has anyone done this successfully? Thanks! Directly accessing the underlying Spark driver JVM us...

Latest Reply
Sanjeeb2024
Contributor
  • 1 kudos

I do not think you can read files from S3 in the Free Edition. The best way to work with the Free Edition is to upload the data into volumes and develop your pipelines against those (a sketch follows below).

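A minimal sketch of the volume-based approach Sanjeeb2024 describes, assuming a CSV has already been uploaded to a Unity Catalog volume; the catalog, schema, volume, and file names are placeholders, not from the thread:

```python
# Read a file that was uploaded to a Unity Catalog volume (works on the Free Edition).
# /Volumes/<catalog>/<schema>/<volume>/<file> -- all names below are placeholders.
volume_path = "/Volumes/main/default/landing/sample_data.csv"

df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv(volume_path)
)
display(df)  # `spark` and `display` are provided in Databricks notebooks
```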
Gaurav_784295
by New Contributor III
  • 3431 Views
  • 3 replies
  • 0 kudos

pyspark.sql.utils.AnalysisException: Non-time-based windows are not supported on streaming DataFrames/Datasets

pyspark.sql.utils.AnalysisException: Non-time-based windows are not supported on streaming DataFrames/Datasets. Getting this error while writing; can anyone please tell me how to resolve it?

Latest Reply
preetmdata
New Contributor II
  • 0 kudos

Hi @Gaurav_784295, in Spark Structured Streaming, please use a time-based column in the window function (a sketch follows this thread). In streaming we can't say "last 10 rows" or "limit 10", because a stream never ends. So when you use a window, please don't use columns lik...

2 More Replies
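A minimal sketch of preetmdata's suggestion, assuming a streaming source with an `event_time` timestamp column; the table and column names are placeholders:

```python
from pyspark.sql import functions as F

# Non-time-based windows (e.g. row_number() over an ORDER BY) fail on streams;
# a time-based window over a timestamp column is supported.
events = spark.readStream.table("main.default.events")  # placeholder streaming source

windowed = (
    events
    .withWatermark("event_time", "10 minutes")
    .groupBy(F.window("event_time", "5 minutes"))
    .agg(F.count("*").alias("event_count"))
)
```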
espenol
by New Contributor III
  • 27279 Views
  • 11 replies
  • 13 kudos

input_file_name() not supported in Unity Catalog

Hey, so our notebooks that read a bunch of JSON files from storage typically use input_file_name() when moving from raw to bronze, but after upgrading to Unity Catalog we get an error message: AnalysisException: [UC_COMMAND_NOT_SUPPORTED] input_file_n...

Latest Reply
ramanpreet
Visitor
  • 13 kudos

The reason input_file_name() is not supported is that this function was only available in older Databricks Runtime versions; it was deprecated from Databricks Runtime 13.3 LTS onwards.

10 More Replies
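On newer runtimes the commonly recommended replacement is the `_metadata` column exposed by file-based sources; a minimal sketch, with the source path as a placeholder:

```python
from pyspark.sql import functions as F

# Capture the source file path without input_file_name().
raw = (
    spark.read.format("json")
    .load("/Volumes/main/default/raw/events/")  # placeholder path
    .select("*", F.col("_metadata.file_path").alias("source_file"))
)
```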
mydefaultlogin
by New Contributor II
  • 847 Views
  • 2 replies
  • 0 kudos

Inconsistent PYTHONPATH, Git folders vs DAB

Hello Databricks Community, I'm encountering an issue related to Python paths when working with notebooks in Databricks. I have the following structure in my project: my_notebooks - my_notebook.py /my_package - __init__.py - hello.py databricks.yml...

Latest Reply
kenny_hero
Visitor
  • 0 kudos

I have a related question. I'm new to the Databricks platform and struggle with the same PYTHONPATH issue the original poster raised. I understand that using sys.path.append(...) is one approach for notebooks (sketched below). This is acceptable for an ad-hoc interactive session, but thi...

1 More Reply
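A minimal sketch of the sys.path.append workaround kenny_hero mentions, assuming the notebook sits one folder below the project root that contains my_package (adjust the relative path to your layout):

```python
import os
import sys

# Make the package in the repo importable from a notebook.
project_root = os.path.abspath("..")  # placeholder: the folder containing my_package/
if project_root not in sys.path:
    sys.path.append(project_root)

from my_package import hello  # module names taken from the original post's structure
```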
bsr
by New Contributor II
  • 92 Views
  • 2 replies
  • 1 kudos

Resolved! DBR 17.3.3 introduced unexpected DEBUG logs from ThreadMonitor – how to disable?

After upgrading from DBR 17.3.2 to DBR 17.3.3, we started seeing a flood of DEBUG logs like this in job outputs:```DEBUG:ThreadMonitor:Logging python thread stack frames for MainThread and py4j threads: DEBUG:ThreadMonitor:Logging Thread-8 (run) stac...

Latest Reply
bsr
New Contributor II
  • 1 kudos

Thanks for the quick response!

1 More Reply
kALYAN5
by New Contributor
  • 112 Views
  • 4 replies
  • 2 kudos

Service Principal

Can two service principals have the same name but unique IDs?

Latest Reply
emma_s
Databricks Employee
  • 2 kudos

Hi @kALYAN5, here is an explanation of why service principals can share a name while their IDs stay unique (a sketch for listing them follows this thread). Names are for human readability: organizations use human-friendly names like "automation-batch-job" or "databricks-ci-cd" to make it easy for admins to re...

3 More Replies
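A minimal sketch using the databricks-sdk (assumed installed and authenticated against the workspace) to show that display names can repeat while IDs stay unique:

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # picks up workspace auth from the environment

# Two service principals may share display_name, but application_id and id are unique.
for sp in w.service_principals.list():
    print(f"name={sp.display_name!r}  application_id={sp.application_id}  id={sp.id}")
```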
Ligaya
by New Contributor II
  • 57316 Views
  • 7 replies
  • 2 kudos

ValueError: not enough values to unpack (expected 2, got 1)

Code: Writer.jdbc_writer("Economy", economy, conf=CONF.MSSQL.to_dict(), modified_by=JOB_ID['Economy']). The problem arises when I try to run the code in the specified Databricks notebook; an error of "ValueError: not enough values to unpack (expected 2, ...

Latest Reply
mukul1409
New Contributor
  • 2 kudos

The error happens because the function expects the table name to include both schema and table separated by a dot. Inside the function it splits the table name using a dot and tries to assign two values. When you pass only Economy, the split returns ...

6 More Replies
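A minimal illustration of the failure mode mukul1409 describes; the real jdbc_writer is not shown in the thread, so this helper is hypothetical:

```python
def split_table_name(table_name: str):
    # Expects "schema.table". Passing just "Economy" yields a single element,
    # so the two-variable unpack raises:
    # ValueError: not enough values to unpack (expected 2, got 1)
    schema, table = table_name.split(".")
    return schema, table

print(split_table_name("dbo.Economy"))  # ('dbo', 'Economy')
# print(split_table_name("Economy"))   # raises ValueError
```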
ripa1
by New Contributor
  • 153 Views
  • 4 replies
  • 4 kudos

Has anyone gotten this up and working? Federating Snowflake-managed Iceberg tables into Azure Databricks

I'm federating Snowflake-managed Iceberg tables into Azure Databricks Unity Catalog to query the same data from both platforms without copying it. I am getting a weird error message when querying the table from Databricks, and I have tried to put it all nicely in...

Data Engineering
azure
Iceberg
snowflake
unity-catalog
Latest Reply
ripa1
New Contributor
  • 4 kudos

Thanks Hubert. I did check the Iceberg metadata location and Databricks can list the files, but the issue is that Snowflake’s Iceberg metadata.json contains paths like abfss://…@<acct>.blob.core.windows.net/..., and on UC Serverless Databricks then t...

3 More Replies
Askenm
by New Contributor
  • 1125 Views
  • 6 replies
  • 4 kudos

Docker tab missing in create compute

I am running Databricks Premium and looking to create a compute resource running conda. It seems that the best way to do this is to boot the compute from a Docker image. However, in ```create_compute > advanced``` I cannot see the Docker option, nor ca...

Data Engineering
conda
Docker
Latest Reply
mukul1409
New Contributor
  • 4 kudos

Hi @Askenm In Databricks Premium, the Docker option for custom images is not available on all compute types and is not controlled by user level permissions. Custom Docker images are only supported on Databricks clusters that use the legacy VM based c...

5 More Replies
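A minimal sketch of pinning a custom container on a classic (VM-based) cluster via the Clusters API, assuming Databricks Container Services is enabled for the workspace; the image URL, node type, and runtime version are placeholders:

```python
# Request body for the Clusters API "create" call (sketch only, not executed here).
create_cluster_payload = {
    "cluster_name": "conda-docker-cluster",
    "spark_version": "15.4.x-scala2.12",   # placeholder runtime
    "node_type_id": "i3.xlarge",           # placeholder node type
    "num_workers": 2,
    "docker_image": {
        "url": "my-registry.example.com/conda-runtime:latest",  # placeholder image
        "basic_auth": {
            "username": "<registry-user>",
            "password": "<registry-token>",  # prefer a secret reference in practice
        },
    },
}
```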
CHorton
by New Contributor
  • 146 Views
  • 3 replies
  • 2 kudos

Resolved! Calling a function with parameters via Spark ODBC driver

Hi all, I am having an issue calling a Databricks SQL user-defined function with parameters from my client application using the Spark ODBC driver. I have been able to execute a straight SQL statement using parameters, e.g. SELECT * FROM Customer W...

Latest Reply
iyashk-DB
Databricks Employee
  • 2 kudos

Hi @CHorton The Databricks SQL engine does not support positional (?) parameters inside SQL UDF calls.  When Spark SQL parses GetCustomerData(?), the parameter is unresolved at analysis time, so you get [UNBOUND_SQL_PARAMETER]. This is not an ODBC bu...

2 More Replies
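A minimal, hedged sketch of one workaround consistent with the explanation above: bind the value on the client and call the UDF with a literal instead of a ? marker. The DSN, the three-part function name, and the parameter value are placeholders; only inline values your application has validated:

```python
import pyodbc

conn = pyodbc.connect("DSN=Databricks", autocommit=True)  # placeholder DSN
cursor = conn.cursor()

customer_id = 42  # validated application-side value, never raw user input
cursor.execute(f"SELECT * FROM main.sales.GetCustomerData({int(customer_id)})")
for row in cursor.fetchall():
    print(row)
```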
Harun
by Honored Contributor
  • 12658 Views
  • 2 replies
  • 3 kudos

How to change the number of executor instances in Databricks

I know that Databricks runs one executor per worker node. Can I change the number of executors by adding params (spark.executor.instances) in the cluster's advanced options? And can I also pass this parameter when I schedule a task, so that particular task wi...

Latest Reply
RandiMacGyver
New Contributor II
  • 3 kudos

In Databricks, the executor model is largely managed by the platform itself. On Databricks clusters, each worker node typically runs a single Spark executor, and this behavior is intentional.

1 More Reply
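A minimal sketch of the practical implication of RandiMacGyver's point: parallelism is sized through the worker count (optionally per job task via its own cluster spec) rather than spark.executor.instances; the values below are placeholders:

```python
# Job-task cluster spec (Jobs API "new_cluster" shape, sketch only).
job_cluster_spec = {
    "spark_version": "15.4.x-scala2.12",  # placeholder runtime
    "node_type_id": "i3.xlarge",          # placeholder node type
    "num_workers": 8,                     # roughly 8 workers -> 8 executors
    "spark_conf": {
        # per-executor tuning is still possible; executor *count* follows num_workers
        "spark.sql.shuffle.partitions": "64",
    },
}
```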
liquibricks
by Contributor
  • 125 Views
  • 3 replies
  • 3 kudos

Resolved! Spark version errors in "Build an ETL pipeline with Lakeflow Spark Declarative Pipelines"

I'm trying to define a job for a pipeline using the Asset Bundle Python SDK. I created the pipeline first (using the SDK) and I'm now trying to add the job. The DAB validates and deploys successfully, but when I run the job I get an error: UNAUTHORIZ...

Latest Reply
mukul1409
New Contributor
  • 3 kudos

This happens because the job is not actually linked to the deployed pipeline and the pipeline id is None at runtime. When using Asset Bundles, the pipeline id is only resolved after deployment, so referencing my_pipeline.id in code does not work. Ins...

2 More Replies
mukul1409
by New Contributor
  • 170 Views
  • 3 replies
  • 1 kudos

Resolved! Iceberg interoperability between Databricks and external catalogs

I would like to understand the current approach for Iceberg interoperability in Databricks. Databricks supports Iceberg using Unity Catalog, but many teams also use Iceberg tables managed outside Databricks. Are there recommended patterns today for s...

Latest Reply
Yogesh_Verma_
Contributor II
  • 1 kudos

Great

2 More Replies
