Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Magesh_Kumar
by New Contributor II
  • 781 Views
  • 3 replies
  • 0 kudos

[CONFIG_NOT_AVAILABLE] Configuration spark.sql.legacy.timeParserPolicy is not available. SQLSTATE:

Running dbt in the development environment, QA, and PROD. The same config works in QA and PROD, but in dev we are facing this issue: [CONFIG_NOT_AVAILABLE] Configuration spark.sql.legacy.timeParserPolicy is not available. SQLSTATE: and the compute type is...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 0 kudos

  SET legacy_time_parser_policy = legacy;
  https://docs.databricks.com/aws/en/sql/language-manual/parameters/legacy_time_parser_policy
  https://docs.getdbt.com/docs/core/connect-data-platform/databricks-setup
  your_profile_name:
    target: dev
    outputs...

2 More Replies
GJ2
by New Contributor II
  • 17612 Views
  • 14 replies
  • 2 kudos

Install the ODBC Driver 17 for SQL Server

Hi, I am not a data engineer. I want to connect to SSAS. It looks like it can be connected through pyodbc; however, it looks like I need to install "ODBC Driver 17 for SQL Server" using the following command. How do I install the driver on the cluster an...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 2 kudos

SQL Server support is included in the Lakehouse Federation driver, which is built into Databricks. Install your own driver only if you need a different version or the built-in one is not working.

13 More Replies
iFoxz17
by Databricks Partner
  • 436 Views
  • 3 replies
  • 1 kudos

Databricks academy error setup - Free Edition with Serverless Compute

Databricks is transitioning from the Community Edition to the Free Edition, which I am currently using. When executing the Includes/Classroom-setup notebooks, the following exception is raised: [CONFIG_NOT_AVAILABLE] Configuration dbacademy.deprecation.loggi...

Latest Reply
iFoxz17
Databricks Partner
  • 1 kudos

@ManojkMohan, as mentioned in the first post, I already used dict(spark.conf.getAll()).get(key, default) where possible. However, the problem remains when importing modules, like:
- from dbacademy import dbgems
- from dbacademy.dbhelper import DBAcademyHelp...
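The defensive lookup mentioned in the reply can be sketched in plain Python (the conf dict and the key below are illustrative stand-ins for spark.conf.getAll() and the dbacademy config key):

```python
# Stand-in for spark.conf.getAll(): a plain dict of (key, value) pairs.
conf_items = {"spark.sql.shuffle.partitions": "200"}

def get_conf(items, key, default=None):
    """Return the config value if present, otherwise fall back to a default
    instead of raising CONFIG_NOT_AVAILABLE."""
    return dict(items).get(key, default)

print(get_conf(conf_items, "dbacademy.deprecation.logging", "disabled"))
```

As the reply notes, this only helps for direct lookups in your own code; it cannot patch spark.conf.get() calls made inside imported modules.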

2 More Replies
JUMAN4422
by Databricks Partner
  • 3301 Views
  • 8 replies
  • 0 kudos

DELTA LIVE TABLE -Parallel processing

How can we process multiple tables in parallel within a Delta Live Tables pipeline, passing the table names as parameters?

Latest Reply
Hubert-Dudek
Databricks MVP
  • 0 kudos

A better approach may be to use dlt-meta: https://databrickslabs.github.io/dlt-meta/index.html
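Independent of dlt-meta, the usual pattern for parameterizing a pipeline over table names is a Python function factory, one generated function per table (a sketch; in a real pipeline each generated function would carry a @dlt.table decorator and return a DataFrame — the table names below are hypothetical):

```python
# Hypothetical list of tables to process in one pipeline.
table_names = ["orders", "customers", "payments"]

def make_loader(name):
    # In a real DLT pipeline this inner function would be decorated with
    # @dlt.table(name=name) and return spark.readStream(...); here it only
    # records which table it builds.
    def load():
        return f"loading {name}"
    load.__name__ = f"load_{name}"
    return load

# Registering one function per table lets the pipeline treat them as
# independent (and therefore parallelizable) flows.
loaders = [make_loader(n) for n in table_names]
print([fn() for fn in loaders])
```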

7 More Replies
new_user_ptl
by New Contributor II
  • 1487 Views
  • 5 replies
  • 1 kudos

Resolved! Create external table using iceberg not working

I'm trying to create an external Iceberg table on top of files I have in S3, created by a Spark job that runs outside of Databricks using a Hadoop catalog. Create External table table_name using iceberg location 's3://myicebergbucket/iceberg_table_path...

Latest Reply
iyashk-DB
Databricks Employee
  • 1 kudos

Databricks doesn’t support the LOCATION clause for Iceberg in Unity Catalog; Iceberg tables must be created as managed tables without a path, or read via a foreign catalog. Trying USING iceberg LOCATION 's3://...' triggers “Managed Iceberg tables do ...
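Per the reply, a minimal sketch of the supported form (a managed Iceberg table with no LOCATION clause; catalog, schema, and column names are placeholders):

```sql
-- Managed Iceberg table in Unity Catalog: no LOCATION clause is allowed.
CREATE TABLE main.default.my_iceberg_table (id BIGINT, payload STRING)
USING ICEBERG;
```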

4 More Replies
elgeo
by Valued Contributor II
  • 9545 Views
  • 9 replies
  • 10 kudos

Clean up _delta_log files

Hello experts. We are trying to clarify how to clean up the large number of files accumulating in the _delta_log folder (JSON, CRC, and checkpoint files). We went through the related posts in the forum and followed the below: SET spark.da...

Latest Reply
iyashk-DB
Databricks Employee
  • 10 kudos

Delta Lake does automatically clean up _delta_log files (JSON, CHECKPOINT, CRC), but only when two conditions are met: the retention durations are respected. By default: delta.logRetentionDuration = 30 days, delta.deletedFileRetentionDuration = 7 day...
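The retention settings named in the reply are Delta table properties; a minimal sketch of adjusting them (the table name is a placeholder, and the values shown are the defaults the reply cites):

```sql
ALTER TABLE main.default.my_table SET TBLPROPERTIES (
  'delta.logRetentionDuration' = 'interval 30 days',
  'delta.deletedFileRetentionDuration' = 'interval 7 days'
);
```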

8 More Replies
osas
by New Contributor II
  • 3654 Views
  • 7 replies
  • 3 kudos

databricks academy setup error -data engineering

I am trying to run the setup notebook "_COMMON" for my Academy data engineering course and am getting the below error: "Configuration dbacademy.deprecation.logging is not available."

Latest Reply
iFoxz17
Databricks Partner
  • 3 kudos

Databricks is transitioning from the Community Edition to the Free Edition, which I am currently using. Inspecting the code, the problem seems to be related to the spark.conf.get() method, which is declared as follows in the documentation:...

6 More Replies
Hatter1337
by New Contributor III
  • 5429 Views
  • 6 replies
  • 4 kudos

Resolved! Write Spark DataFrame into OpenSearch

Hi Databricks Community,I'm trying to read an index from OpenSearch or write a DataFrame into an OpenSearch index using the native Spark OpenSearch connector:host = dbutils.secrets.get(scope="opensearch", key="host") port = dbutils.secrets.get(scope=...

Latest Reply
SayedAbdallah
New Contributor II
  • 4 kudos

Hi, I am getting the same error, and I was also able to connect using opensearch-py. I also found in this doc https://github.com/opensearch-project/opensearch-hadoop/blob/main/README.md#requirements that I need to have some JARs; I already added them witho...

5 More Replies
Oumeima
by New Contributor III
  • 2916 Views
  • 5 replies
  • 3 kudos

Resolved! I can't use my own .whl package in Databricks app with databricks asset bundles

I am building a Databricks app using Databricks Asset Bundles. I need to use a helpers package that I built as an artifact and that I use in other resources outside the app. The only way to use it is to have the built package inside the app source code f...

Latest Reply
nk-five1
New Contributor III
  • 3 kudos

Thank you very much. I hope to translate your tips to my case, which does not use asset bundles.

4 More Replies
Rohit_hk
by New Contributor
  • 422 Views
  • 2 replies
  • 1 kudos

DLT Autoloader schemaHints from JSON file instead of inline list?

Hi @Witold, @Hubert-Dudek, I'm using a DLT pipeline to ingest real-time data from Parquet files in S3 into Delta tables using Auto Loader. The pipeline is written in SQL notebooks. Problem: sometimes decimal columns in the Parquet files get inferred as I...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 1 kudos

- DLT automatically uses cloudFiles.schemaLocation, so the schema is stored automatically and in many cases will be stable, but it does not ...
- Keep using cloudFiles.schemaHints, but just load the JSON into a variable and pass that variable (I guess you w...
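The second suggestion can be sketched in Python: parse the hints from JSON and render the comma-separated "column TYPE" string that cloudFiles.schemaHints expects (the JSON content and column names below are hypothetical):

```python
import json

# Hypothetical hints, as they might sit in a JSON file in the repo.
hints_json = '{"amount": "DECIMAL(18,2)", "quantity": "BIGINT"}'

def render_hints(raw_json):
    """Turn {"col": "TYPE", ...} into the "col TYPE, col TYPE" string
    that cloudFiles.schemaHints expects."""
    mapping = json.loads(raw_json)
    return ", ".join(f"{col} {dtype}" for col, dtype in mapping.items())

print(render_hints(hints_json))
```

In a Python pipeline this would be passed as .option("cloudFiles.schemaHints", render_hints(...)); for a SQL notebook, the rendered string could be supplied through a pipeline configuration parameter instead.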

1 More Replies
cdn_yyz_yul
by Contributor II
  • 489 Views
  • 3 replies
  • 2 kudos

Resolved! how to avoid extra column after retry upon UnknownFieldException

With Auto Loader's .option("cloudFiles.schemaEvolutionMode", "addNewColumns"), I have retried after getting org.apache.spark.sql.catalyst.util.UnknownFieldException: [UNKNOWN_FIELD_EXCEPTION.NEW_FIELDS_IN_FILE] Encountered unknown fields during par...

Latest Reply
cdn_yyz_yul
Contributor II
  • 2 kudos

Hi @Hubert-Dudek, the input is CSV. readStream reads the CSV with .option("cloudFiles.inferColumnTypes", "true"); then df.toDF() is called to rename the columns. The original CSV header has spaces, which is why the error message has "test 1_2 Prime". The r...

2 More Replies
Garrus990
by New Contributor II
  • 2655 Views
  • 5 replies
  • 2 kudos

How to run a python task that uses click for CLI operations

Hey, in my application I am using click to facilitate CLI operations. It works locally, in notebooks, and when scripts are run locally, but it fails in Databricks. I defined a task that, as an entrypoint, accepts the file where the click-decorated functio...

Latest Reply
Garrus990
New Contributor II
  • 2 kudos

Hey guys, I think I managed to find a workaround. I will leave it here for everyone seeking the same answers, including future me. What I did is basically this piece of code:

def main():
    try:
        assign_variants(standalone_mode=False)
    ...
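A self-contained sketch of that pattern (assign_variants here is a stub standing in for the click-decorated command; the real one would be invoked the same way, with standalone_mode=False so click returns instead of exiting the interpreter):

```python
def assign_variants(standalone_mode=True):
    # Stub for the click-decorated command. With standalone_mode=True, click
    # normally terminates the process via sys.exit() when the command finishes.
    if standalone_mode:
        raise SystemExit(0)
    return "done"

def main():
    try:
        # standalone_mode=False makes click return the command's result
        # instead of terminating the Python process.
        return assign_variants(standalone_mode=False)
    except SystemExit as exc:
        # Defensive: swallow clean exits, re-raise real failures.
        if exc.code not in (0, None):
            raise
        return None

print(main())
```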

4 More Replies
tnyein_99
by New Contributor II
  • 648 Views
  • 4 replies
  • 6 kudos

Resolved! ONLY PNG format is available for databricks dashboard table download

I couldn't download the data straight from Databricks dashboards in CSV format starting from last night (the night of Dec 1st, 2025). The only format that is available right now is PNG. I've tried downloading the data on multiple browsers, but only the PN...

Latest Reply
random_user77
New Contributor II
  • 6 kudos

Hey @Advika, you say a quick workaround is to right-click and download the CSV from there. What do you mean? Where? I am right-clicking all over my dashboard widget and don't see a CSV download option. Can you be more specific?

3 More Replies
a_user12
by Contributor
  • 851 Views
  • 7 replies
  • 3 kudos

Resolved! Declarative Pipelines: set Merge Schema to False

Dear Team! I want to prevent the schema of a certain table from being automatically updated. With plain structured streaming I can do the following:

silver_df.writeStream \
    .format("delta") \
    .option("mergeSchema", "false") \
    .option("checkpoi...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 3 kudos

It is automatic in DLT. If there are significant schema changes, you need a full refresh. Maybe consider storing everything (the whole JSON) in a single VARIANT column and unpacking only what is necessary later - this way you will have it under cont...
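A minimal sketch of the VARIANT suggestion, assuming a runtime with VARIANT support (table and field names are placeholders):

```sql
-- Land the whole payload in one VARIANT column...
CREATE TABLE bronze_events (raw VARIANT);

-- ...and unpack only the fields you need downstream.
SELECT raw:customer.id::string AS customer_id
FROM bronze_events;
```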

6 More Replies
dpc
by Contributor III
  • 576 Views
  • 3 replies
  • 2 kudos

Resolved! API Call to return more than 100 jobs

Hello, I have around 150 jobs and this is likely to increase. I use this call to get all the jobs and write them into a list called json. My logic here is to match a name to a job ID and run the job using the job ID: response = requests.get(hostHTTPS, j...

Latest Reply
dpc
Contributor III
  • 2 kudos

Looping using next_page_token works well, thanks @bianca_unifeye 
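For reference, the next_page_token loop can be sketched as follows (fetch_page is a stand-in for requests.get against the Jobs list endpoint with a page_token parameter; the page data is fabricated for illustration):

```python
# Fake two-page response, standing in for the Jobs API list endpoint,
# which caps how many jobs one call returns.
PAGES = {
    None: {"jobs": [{"job_id": 1}, {"job_id": 2}], "next_page_token": "t1"},
    "t1": {"jobs": [{"job_id": 3}]},
}

def fetch_page(token=None):
    # Stand-in for: requests.get(f"{host}/api/2.1/jobs/list",
    #                            params={"page_token": token}, ...).json()
    return PAGES[token]

def list_all_jobs():
    """Accumulate jobs across pages until no next_page_token is returned."""
    jobs, token = [], None
    while True:
        page = fetch_page(token)
        jobs.extend(page.get("jobs", []))
        token = page.get("next_page_token")
        if not token:
            return jobs

print([j["job_id"] for j in list_all_jobs()])
```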

2 More Replies