Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

dataminion01
by New Contributor II
  • 772 Views
  • 1 replies
  • 0 kudos

create streaming table using variable file path

Is it possible to use a variable for the file path based on dates? Files are stored in folders in this format: yyyy/mm

CREATE OR REFRESH STREAMING TABLE test AS
SELECT * FROM STREAM read_files(
    "/Volumes/.....",
    format => "parquet"
);

Latest Reply
Rishabh-Pandey
Databricks MVP
  • 0 kudos

@dataminion01  Yes, it is possible to use a variable or dynamic file path based on dates in some data processing frameworks, but not directly in static SQL DDL statements like CREATE OR REFRESH STREAMING TABLE unless the environment you're working e...
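A minimal sketch of the interpolation approach this reply hints at, assuming the path only varies by year/month; the volume root and table name below are placeholders, not the original poster's values:

```python
from datetime import date

# Hypothetical helper: builds the yyyy/mm folder path under a volume root.
def dated_volume_path(root: str, day: date) -> str:
    """Return the monthly folder path, e.g. <root>/2024/05."""
    return f"{root}/{day:%Y/%m}"

path = dated_volume_path("/Volumes/my_catalog/my_schema/my_volume", date(2024, 5, 1))
print(path)  # /Volumes/my_catalog/my_schema/my_volume/2024/05

# The path can then be interpolated into the DDL (sketch, not run here):
# spark.sql(f"""
#   CREATE OR REFRESH STREAMING TABLE test AS
#   SELECT * FROM STREAM read_files('{path}', format => 'parquet')
# """)
```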

Akshay_Petkar
by Valued Contributor
  • 1796 Views
  • 1 replies
  • 1 kudos

How to Convert MySQL SELECT INTO OUTFILE and LOAD DATA INFILE to Databricks SQL?

Hi Community,
I have some existing MySQL code:
SELECT * FROM [table_name]
INTO OUTFILE 'file_path'
FIELDS TERMINATED BY '\t'
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';
LOAD DATA INFILE 'file_path' REPLACE INTO TABLE [database].[table_name]
FIELDS ...

Latest Reply
krishnakhadka28
New Contributor II
  • 1 kudos

Databricks SQL does not directly support MySQL’s SELECT INTO OUTFILE or LOAD DATA INFILE syntax. However, equivalent functionality can be achieved using Databricks features like saving to and reading from external locations such as DBFS, S3, etc. I have ...
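The reply above describes the general pattern (export to files, then reload). A rough sketch of what the equivalent Databricks SQL text could look like, built as strings so the mapping is explicit; the table and path names are made up, and the exact option names (e.g. `sep`) should be checked against the docs:

```python
# Hypothetical helpers that translate the two MySQL statements into
# Databricks SQL text. They only build strings; nothing is executed here.
def export_sql(table: str, path: str) -> str:
    # Rough stand-in for SELECT ... INTO OUTFILE: dump the table as
    # tab-separated files under an external location / volume path.
    return (
        f"INSERT OVERWRITE DIRECTORY '{path}' "
        "USING CSV OPTIONS (sep '\\t') "
        f"SELECT * FROM {table}"
    )

def import_sql(table: str, path: str) -> str:
    # Rough stand-in for LOAD DATA INFILE ... REPLACE: rebuild the table
    # from the exported files.
    return (
        f"CREATE OR REPLACE TABLE {table} AS "
        f"SELECT * FROM read_files('{path}', format => 'csv', sep => '\\t')"
    )

print(export_sql("my_db.my_table", "/Volumes/my_catalog/my_schema/staging/export"))
```

In a notebook these strings would be run with `spark.sql(...)`; the same effect is often simpler on the DataFrame side with `df.write.csv(...)`.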

erigaud
by Honored Contributor
  • 10754 Views
  • 3 replies
  • 2 kudos

Dynamically specify pivot column in SQL

Hello everyone! I am looking for a way to dynamically specify pivot columns in a SQL query, so it can be used in a view. However, we don't want to hard-code the values that need to become columns, and would rather extract them from another table. I've se...

Latest Reply
lprevost
Contributor III
  • 2 kudos

I agree - it's not clear how to do this. I'm thinking of using the pandas API.
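Since the pivot values come from another table, one workaround is two steps: read the distinct values first, then build the PIVOT statement text from them. A sketch with made-up table and column names (note this can't live inside a static view; the view would need to be recreated when the value set changes):

```python
# Hypothetical helper: builds a Spark SQL PIVOT query from a list of
# pivot values that was fetched from a configuration table.
def build_pivot_sql(table: str, key_col: str, val_col: str, values: list[str]) -> str:
    in_list = ", ".join(f"'{v}'" for v in values)
    return (
        f"SELECT * FROM {table} "
        f"PIVOT (SUM({val_col}) FOR {key_col} IN ({in_list}))"
    )

# In a notebook the values would come from another table, e.g.:
# values = [r[0] for r in spark.sql("SELECT DISTINCT category FROM cfg.pivot_cols").collect()]
print(build_pivot_sql("sales", "category", "amount", ["food", "toys"]))
```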

2 More Replies
Gecofer
by Contributor II
  • 1385 Views
  • 2 replies
  • 1 kudos

Resolved! Inconsistent query results between dbt ETL run and SQL editor in Databricks

Hi everyone, I’m running into a strange issue in one of my ETL pipelines using dbt on Databricks, and I’d appreciate any insights or ideas. I have a query that is part of my dbt model. When I run the ETL process, the results from this query are incorr...

Latest Reply
Gecofer
Contributor II
  • 1 kudos

Hi @Isi Thanks so much for your insight! It turned out to be a combination of the two things you mentioned: there was a data masking policy applied to one of the columns, and while I had permissions to view the unmasked data, the service principal runn...

1 More Replies
Debashisrajib
by New Contributor II
  • 911 Views
  • 1 replies
  • 0 kudos

Resolved! 65 technical questions.

I recently took the Databricks Data Engineer Professional exam and got 65 really lengthy technical questions. These questions are different from statistical questions. 65 lengthy technical questions for 120 minutes is too much, and this number is not men...

Latest Reply
Advika
Community Manager
  • 0 kudos

Hello @Debashisrajib! I’m sorry to hear that. As outlined in the exam details on Webassessor, Databricks certification exams include 60 scored questions, along with additional unscored questions that appear like regular ones but do not affect your fin...

Abhimanyu
by Databricks Partner
  • 1179 Views
  • 2 replies
  • 0 kudos

Why does df.dropna(how="all") fail when there is a . in a column name?

I'm working in a Databricks notebook and using Spark to query a Delta table. Here's the code I ran:
df = spark.sql("select * from catalog.schema.table")
df = df.dropna(how="all")
display(df)
This works fine unless the DataFrame has a column name that ...

Latest Reply
MujtabaNoori
New Contributor III
  • 0 kudos

Hi @Abhimanyu, yes. In Spark, a '.' (dot) in a column name is used for StructType references, through which nested objects are accessed. But you can definitely rename whichever columns have '.' in them dynamically. Attached a few screenshots for your reference tha...
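For anyone hitting this: the dot makes Spark parse the name as a struct field path, which is likely why dropna's column lookup trips. A small sketch of the renaming workaround the reply suggests; the column names are invented:

```python
# Hypothetical helper: replace dots in column names so plain (unquoted)
# column references no longer look like struct field paths.
def sanitize(names: list[str]) -> list[str]:
    return [n.replace(".", "_") for n in names]

print(sanitize(["id", "user.name", "addr.city"]))  # ['id', 'user_name', 'addr_city']

# Applied to a DataFrame (not run here):
# df = df.toDF(*sanitize(df.columns))
# df = df.dropna(how="all")
# Alternatively, keep the names and quote them with backticks where Spark
# accepts that, e.g. df.select("`user.name`").
```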

1 More Replies
chsoni12
by New Contributor II
  • 1275 Views
  • 1 replies
  • 0 kudos

Resolved! Limitation in Managed Volumes Recovery — UNDROP Should Be Supported

Hello Databricks Community,
While reviewing the Databricks official documentation and performing a POC on managed volumes, I observed that volumes cannot be recovered using the UNDROP command if accidentally deleted, unlike managed tables. Technically...

Latest Reply
Vidhi_Khaitan
Databricks Employee
  • 0 kudos

Thank you for highlighting this issue! Databricks is already working on implementing this in the future.

farazahmad372
by New Contributor II
  • 2719 Views
  • 3 replies
  • 0 kudos

TypeError: 'JavaPackage' object is not callable

from pyspark.sql import *

if __name__ == "__main__":
    spark = SparkSession.builder \
        .appName("hello Spark") \
        .master("local[2]") \
        .getOrCreate()
    data_list = [("Ravi", 28), ("David", 45), ("Abd...

Latest Reply
nikhilj0421
Databricks Employee
  • 0 kudos

@farazahmad372 May I know the DBR version and type of cluster? Are you using serverless?

2 More Replies
adurand-accure
by Databricks Partner
  • 3739 Views
  • 5 replies
  • 2 kudos

Serverless job error - spark.rpc.message.maxSize

Hello, I am facing this error when moving a Workflow to serverless mode:
ERROR: SparkException: Job aborted due to stage failure: Serialized task 482:0 was 269355219 bytes, which exceeds max allowed: spark.rpc.message.maxSize (268435456 bytes). Consid...

Latest Reply
adurand-accure
Databricks Partner
  • 2 kudos

Hello PiotrMi,
We found out that the problem was caused by a collect() and managed to fix it by changing some code. Thanks for your quick replies.
Best regards,
Antoine

4 More Replies
SakthiGanesh
by New Contributor II
  • 2902 Views
  • 1 replies
  • 0 kudos

Unable to run python script from Azure DevOps git repo in Databricks Workflow job

Hi, I'm getting an issue while running a Python script from an Azure DevOps git repo in a Databricks Workflow job task. The error states an internal commit path issue. But I set the Source to Azure DevOps Services and I gave the branch name when settin...

Latest Reply
niteshm
New Contributor III
  • 0 kudos

@SakthiGanesh This is a known type of issue when running Databricks Workflows with Azure DevOps Git-backed repos. Did you try a workspace path instead of the internal Git path? If possible, use a .ipynb notebook-based task rather than a raw .py script; note...

AgusBudianto
by Contributor
  • 2404 Views
  • 8 replies
  • 1 kudos

Resolved! Is it possible for Stored Procedures to be in Unity Catalog Databricks

I got information that the latest release of Unity Catalog already supports stored procedures, but several sources I have searched say that Unity Catalog does not support stored procedures, according to the following post: https://community.databricks.c...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Yes, you can attend virtually and it is free. What I don't know is exactly what is free; I believe the keynotes and some sessions are. You should definitely register and check it out.

7 More Replies
CJOkpala
by New Contributor II
  • 1635 Views
  • 4 replies
  • 1 kudos

Databricks DLT execution issue

I am having an issue when trying to do a full refresh of a DLT pipeline. I am getting the following error: com.databricks.sql.managedcatalog.UnityCatalogServiceException: [RequestId=97d4fe52-b185-4757-b0b1-113cb96ae0bb ErrorClass=TABLE_ALREADY_...

Latest Reply
nikhilj0421
Databricks Employee
  • 1 kudos

Are you facing the same issue if you give a different name in the dlt decorator for the table?

3 More Replies
oneill
by New Contributor II
  • 4667 Views
  • 3 replies
  • 0 kudos

SQL - Dynamic overwrite + overwrite schema

Hello,
Let's say we have an empty table S that represents the schema we want to keep:
A | B | C | D | E
We have another table T partitioned by column A, with a schema that depends on the file we have loaded into it. Say:
A | B | C | F
1 | b1 | c1 | f1
2 | b2 | c2 | f2
Now to make T having the same schema...

Latest Reply
oneill
New Contributor II
  • 0 kudos

Hi, thanks for the reply. I've already looked at the documentation on this point, which actually states that dynamic overwrite doesn't work with schema overwrite, while the instructions described above seem to indicate the opposite.

2 More Replies
andreapeterson
by Contributor
  • 619 Views
  • 1 replies
  • 0 kudos

Question about which tags appear in drop down

Hi there, I have a question regarding the appearance of tags in the drop down when adding a tag to a resource (catalog, schema, table, column - level). When does a tag get populated in a drop down? I noticed when I created a column level tag, and wan...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hello @andreapeterson Yes, your understanding of Databricks tag behavior is correct. In Databricks Unity Catalog, tags follow a hierarchical inheritance pattern. Downward inheritance: tags applied at higher levels (catalog → schema → table) become ava...

sparklez
by New Contributor III
  • 2061 Views
  • 3 replies
  • 2 kudos

Resolved! Creating Cluster configuration with library dependency using DABS

I am trying to create a cluster configuration using DABS and defining library dependencies. My yaml file looks like this:
resources:
  clusters:
    project_Job_Cluster:
      cluster_name: "Project Cluster"
      spark_version: "16.3.x-cpu-ml-scala2.12"
      node_type_id: ...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 2 kudos

Hi @sparklez You're encountering this issue because the libraries field is not valid in the cluster configuration. Libraries need to be specified at the job level, not the cluster level.
Option 1: Job-Level Libraries (Recommended)
Move the libraries sec...
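A hedged sketch of what the job-level placement could look like in a bundle file; the resource names, notebook path, package, and node type are all placeholders, not the original project's values:

```yaml
resources:
  jobs:
    project_job:
      name: "Project Job"
      tasks:
        - task_key: main
          job_cluster_key: project_cluster
          notebook_task:
            notebook_path: ./notebooks/main.py
          # Libraries attach to the task, not to the cluster definition.
          libraries:
            - pypi:
                package: "some-package==1.0.0"
      job_clusters:
        - job_cluster_key: project_cluster
          new_cluster:
            spark_version: "16.3.x-cpu-ml-scala2.12"
            node_type_id: "Standard_DS3_v2"
```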

2 More Replies