Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

SaraCorralLou
by New Contributor III
  • 19660 Views
  • 3 replies
  • 2 kudos

Resolved! Differences between lit(None) and lit(None).cast('string')

I want to define a column with null values in my dataframe using PySpark. This column will later be used for other calculations. What is the difference between creating it in these two ways? df.withColumn("New_Column", lit(None)) vs. df.withColumn...

Latest Reply
shadowinc
New Contributor III
  • 2 kudos

For me, df.withColumn("New_Column", lit(None).cast(StringType())) didn't work. I used df.withColumn("New_Column", lit(null).cast(StringType)) instead.

2 More Replies
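For illustration, a minimal PySpark sketch of the difference discussed in this thread (assumes a Spark session; in a Databricks notebook, spark already exists):

from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

spark = SparkSession.builder.getOrCreate()
df = spark.range(1)

# Without a cast, the new column is typed as NullType (shown as "void"),
# which some writers and downstream operations reject.
df.withColumn("New_Column", lit(None)).printSchema()

# Casting pins a concrete, nullable type, so the column behaves like any
# other string column in later writes and merges.
df.withColumn("New_Column", lit(None).cast("string")).printSchema()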
jeremy98
by Honored Contributor
  • 755 Views
  • 5 replies
  • 1 kudos

Set serverless compute environment for a task of a job

Hi Community, I want to set the environment of a task inside a job using DABs, but I got this error. I can achieve my goal if I manually set the task's environment to version 2, because I need to use Python 3.11. How can I do it through DABs?

[attachment: jeremy98_0-1738149373540.png]
Latest Reply
jeremy98
Honored Contributor
  • 1 kudos

Hi, it seems this can be set for spark_python_task:

resources:
  jobs:
    New_Job_Jan_29_2025_at_11_48_AM:
      name: New Job Jan 29, 2025 at 11:48 AM
      tasks:
        - task_key: test-py-version2
          spark_python_task:
            pyth...

4 More Replies
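For reference, a hedged sketch of how a serverless environment version can be pinned per task in a bundle. The environments/environment_key fields follow the Jobs API; the job and task names here are illustrative, and client: "2" (the serverless environment version that ships Python 3.11) should be verified against the current DABs docs:

resources:
  jobs:
    my_job:
      name: my_job
      environments:
        - environment_key: py311
          spec:
            client: "2"  # serverless environment version 2 (Python 3.11)
      tasks:
        - task_key: my_task
          environment_key: py311
          spark_python_task:
            python_file: ../src/main.py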
panganibana
by New Contributor II
  • 404 Views
  • 1 reply
  • 0 kudos

Resolved! Inconsistency in DataFrame queried from an External Data Source

We have a Catalog pointing to an External Data Source (Google BigQuery). 1) In a notebook, create a cell that runs a query to populate a DataFrame, and display the results. 2) Create another cell below and display the same DataFrame. 3) I get different resu...

Data Engineering
externaldata
Latest Reply
crystal548
New Contributor III
  • 0 kudos

@panganibana wrote: We have a Catalog pointing to an External Data Source (Google BigQuery). 1) In a notebook, create a cell that runs a query to populate a DataFrame, and display the results. 2) Create another cell below and display the same DataFrame. 3) I...

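Worth noting: Spark DataFrames are lazily evaluated, so each display() re-runs the federated query against BigQuery; if the source changes between cells (or the query returns rows in no guaranteed order), the two cells can differ. A minimal sketch of pinning the result once, with an illustrative table name:

df = spark.read.table("bq_catalog.dataset.events")  # illustrative federated table

df = df.cache()  # or .persist()
df.count()       # force one materialization of the query

display(df)      # cell 1
display(df)      # cell 2 now reads the same cached copy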
markbaas
by New Contributor III
  • 9569 Views
  • 9 replies
  • 0 kudos

DBFS_DOWN

I have an Azure Databricks workspace with Unity Catalog set up, using VNet and private endpoints. Serverless works great; however, the regular clusters have problems showing large results: Failed to store the result. Try rerunning the command. Failed ...

Latest Reply
markbaas
New Contributor III
  • 0 kudos

The DBFS (dbstorage) resource in the managed Azure resource group needs to have private endpoints into your virtual network. You can create those manually or through IaC (Bicep/Terraform).

8 More Replies
sdes10
by New Contributor II
  • 975 Views
  • 3 replies
  • 0 kudos

DLT apply_as_deletes not working on existing data with full refresh

I have an existing DLT pipeline that works on a modified medallion architecture. Data is sent from Debezium to Kafka and lands in a bronze table. From the bronze table, it goes to a silver table where it is schematized. Finally to a gold table where I ...

Latest Reply
sdes10
New Contributor II
  • 0 kudos

@Sidhant07 how do I use skipChangeCommits? The idea is that I have bronze, silver, and gold tables already built. Now I am enabling deletes on the gold table in the apply_changes API. The silver table has an added operation column (values c, u, r, d). I di...

2 More Replies
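For illustration, a hedged sketch of how skipChangeCommits and apply_as_deletes can fit together in a DLT pipeline; the table, key, and column names are assumptions for the example, not the poster's actual code:

import dlt
from pyspark.sql.functions import col, expr

@dlt.view()
def silver_changes():
    # skipChangeCommits makes the stream ignore commits that rewrite
    # existing files (updates/deletes) in the source instead of failing.
    return (spark.readStream
            .option("skipChangeCommits", "true")
            .table("silver"))

dlt.create_streaming_table("gold")

dlt.apply_changes(
    target="gold",
    source="silver_changes",
    keys=["id"],                               # illustrative key column
    sequence_by=col("_commit_timestamp"),      # illustrative ordering column
    apply_as_deletes=expr("operation = 'd'"),  # the c/u/r/d column from silver
)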
osas
by New Contributor II
  • 1926 Views
  • 6 replies
  • 3 kudos

Databricks Academy setup error - Data Engineering

I am trying to run the setup notebook "_COMMON" for my Academy Data Engineering course and am getting the below error: "Configuration dbacademy.deprecation.logging is not available."

Latest Reply
Luipiu
New Contributor III
  • 3 kudos

I reported the solution here: Setup learning environment failed: Configuration d... - Databricks Community - 82441

5 More Replies
Abdurrahman
by New Contributor II
  • 674 Views
  • 3 replies
  • 0 kudos

How can I save a large Spark table (~88.3M rows) to a Delta Lake table

I am trying to add a column to an existing Delta Lake table and save the result as a new table, but the Spark driver is getting overloaded. I am working in a Databricks notebook (with decent compute as well, g5.12xlarge) and have...

Latest Reply
Amit_Dass
New Contributor II
  • 0 kudos

Hi @Abdurrahman, in addition to Sidhant07's reply: I assume you are adding this new column and may be using it in queries, so use both ZORDER and OPTIMIZE. ZORDER (highly recommended): even more important than OPTIMIZE alone for adding columns eff...

2 More Replies
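As an illustration of that advice, a hedged sketch that adds the column as metadata only (avoiding a full rewrite through the driver), backfills it, then compacts and Z-orders; the table and column names are invented for the example:

# Adding the column as metadata avoids pulling ~88M rows through the driver.
spark.sql("ALTER TABLE main.sales.orders ADD COLUMNS (region STRING)")

# Backfill the new column with a distributed UPDATE, not a driver-side loop.
spark.sql("UPDATE main.sales.orders SET region = upper(country)")

# Compact small files and co-locate data on the column queries filter by.
spark.sql("OPTIMIZE main.sales.orders ZORDER BY (region)")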
Puspak
by New Contributor II
  • 360 Views
  • 1 reply
  • 0 kudos

DLT behaves differently with Python syntax vs SQL syntax when reading CDF

I was trying to read CDF data of a table as a DLT materialized view. It works fine with SQL syntax, reading all the columns of the source table along with the 3 CDF columns (_change_type, _commit_timestamp, _commit_version): @dlt.table() def change_table(): ...

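For context, a hedged sketch of the Python form of such a materialized view over a change feed, using the standard readChangeFeed batch-read options (the source table name is illustrative):

import dlt

@dlt.table()  # a DLT materialized view over the table's change feed
def change_table():
    return (spark.read
            .option("readChangeFeed", "true")
            .option("startingVersion", 0)
            .table("source_table"))  # illustrative name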
jeremy98
by Honored Contributor
  • 3166 Views
  • 4 replies
  • 0 kudos

Resolved! Concurrent Writes to the same DELTA TABLE

Hi Community, my team and I have written some workflows that write to the same table. One of my workflows performs a MERGE operation on the table, while another workflow performs an append. However, these operations can occur simultaneously, leading t...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

To resolve the issue of concurrent write conflicts, specifically the ConcurrentAppendException: [DELTA_CONCURRENT_APPEND], you can consider the following strategies: 1. Isolation Levels: WriteSerializable: this is the default isolation lev...

3 More Replies
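Beyond isolation levels, a common mitigation is to narrow the MERGE condition with partition predicates so concurrent writers touch disjoint files, and to retry when a conflict still occurs. A minimal retry sketch; the caller supplies the MERGE as a function, which should be idempotent:

import time
from delta.exceptions import ConcurrentAppendException

def merge_with_retry(run_merge, max_attempts=5):
    # Delta uses optimistic concurrency control, so retrying the losing
    # transaction after a backoff is safe for an idempotent MERGE.
    for attempt in range(1, max_attempts + 1):
        try:
            return run_merge()
        except ConcurrentAppendException:
            if attempt == max_attempts:
                raise
            time.sleep(2 ** attempt)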
Nes_Hdr
by New Contributor III
  • 3868 Views
  • 12 replies
  • 1 kudos

Limitations for Unity Catalog on single user access mode clusters

Hello! According to Databricks documentation on Azure: "On Databricks Runtime 15.3 and below, fine-grained access control on single user compute is not supported. Specifically: You cannot access a table that has a row filter or column mask. You cannot ...

[attachment: Nes_Hdr_0-1732872787713.png]
Latest Reply
MuthuLakshmi
Databricks Employee
  • 1 kudos

@Nes_Hdr Single user compute uses fine-grained access control to access tables with RLS/CLM enabled. There are no specific details about OPTIMIZE being supported in single user mode. Under this doc, the limitations of FGAC mention that "No support for...

11 More Replies
clentin
by Contributor
  • 3011 Views
  • 6 replies
  • 0 kudos

Import Py File

How do I import a .py file in the Databricks environment? Any help will be appreciated.

Latest Reply
fifata
New Contributor II
  • 0 kudos

@filipniziol @tejaswi24 Sorry to bring this up again, but I'm facing a similar problem. We have a Databricks Repo that is a copy of a GitHub repository. The GitHub repo contains only .py files, but when copied to Databricks, they all get converted to ...

5 More Replies
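For anyone landing here, a minimal sketch of the usual import pattern on recent runtimes with workspace files; the path, module, and function names are placeholders:

import sys

# Make the directory containing the .py file importable (path is illustrative).
sys.path.append("/Workspace/Repos/me/my_repo/src")

import my_module  # imports my_module.py from the directory above
my_module.main()  # assumes the module exposes a main() function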
Splush_
by New Contributor III
  • 4314 Views
  • 5 replies
  • 6 kudos

Cannot cast Decimal to Double

Hey, I'm trying to save the contents of a database table to a Databricks Delta table. The schema straight from the database returns the number fields as decimal(38, 10). At least one of the values is too large for this data type, so I try to convert it usi...

Latest Reply
Splush_
New Contributor III
  • 6 kudos

Hey guys, thanks a lot for your help. Since this has been taking days already, I have asked the application owners of the database to delete these values for me. Apparently they are weights in grams for whatever products, so the problematic rows are heav...

4 More Replies
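One way to sidestep such an overflow is to override the inferred decimal type at read time with Spark's JDBC customSchema option, trading exactness for range. A hedged sketch; the URL, table, and column names are placeholders:

# Read the oversized number column as double instead of decimal(38, 10).
jdbc_url = "jdbc:postgresql://db-host:5432/mydb"  # illustrative
df = (spark.read.format("jdbc")
      .option("url", jdbc_url)
      .option("dbtable", "products")
      .option("customSchema", "weight_grams DOUBLE")
      .load())

df.write.format("delta").mode("overwrite").saveAsTable("main.silver.products")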
susanne
by New Contributor III
  • 946 Views
  • 2 replies
  • 2 kudos

Resolved! Views in DLT with Private Preview feature Direct Publish

Hi everyone, I am building a DLT pipeline and am using the Direct Publish feature, which as of now is still in Private Preview. While it works well to create streaming tables and write them to a schema other than the DLT default schema, I ge...

Latest Reply
susanne
New Contributor III
  • 2 kudos

Hi Sidhant, thanks a lot for your reply; it works very well to write materialized views to a different schema than the default schema. Thanks for your guidance! Best regards, Susanne

1 More Replies
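For illustration, a hedged sketch of what that looks like with direct publishing: a fully qualified name on the @dlt.table decorator targets a schema other than the pipeline default. The catalog, schema, and table names are invented, and the feature was Private Preview at the time of this thread:

import dlt

@dlt.table(name="main.reporting.daily_mv")  # fully qualified target (illustrative)
def daily_mv():
    return spark.read.table("main.raw.events")  # illustrative source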
AlexVB
by New Contributor III
  • 2246 Views
  • 2 replies
  • 0 kudos

Catalogue global UDFs

The current UDF implementation stores UDFs in a catalogue.schema location. This requires referencing/calling said UDF location, for example `select my_catalogue.my_schema.my_udf()`, or having the SQL execute from that schema. In Snowflake, UDFs are globally a...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi @AlexVB , The current UDF implementation in Databricks requires referencing the UDF location with select my_catalogue.my_schema.my_udf() or executing SQL from that schema because Databricks organizes database objects using a three-tier hierarchy: ...

1 More Replies
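One workaround, sketched below: unqualified UDF calls resolve against the session's current catalog and schema, so setting those once approximates Snowflake-style global resolution (names reuse the illustrative ones from the post):

# Set the session defaults once, then call the UDF unqualified.
spark.sql("USE CATALOG my_catalogue")
spark.sql("USE SCHEMA my_schema")
spark.sql("SELECT my_udf()").show()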
messiah
by New Contributor II
  • 1373 Views
  • 3 replies
  • 0 kudos

Unable to Read Data from S3 in Databricks (AWS Free Trial)

Hey Community, I recently signed up for a Databricks free trial on AWS and created a workspace using the quickstart method. After setting up my cluster and opening a notebook, I tried to read a Parquet file from S3 using: spark.read.parquet("s3://<bu...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi @messiah, this occurs due to missing AWS credentials or an IAM role needed to access the S3 bucket. Can you please check the AWS credentials, IAM roles, and IAM permissions? Make sure the IAM role associated with the instance profile has...

2 More Replies
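For completeness, a minimal sketch of the working setup once credentials are in place; the bucket path is a placeholder and the permissions named are the usual minimum, not a verified list:

# Assumes the cluster has an instance profile (IAM role) granting at least
# s3:GetObject and s3:ListBucket on the bucket.
df = spark.read.parquet("s3://my-bucket/path/to/data/")
display(df)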
