Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

jeremy98
by Honored Contributor
  • 1371 Views
  • 5 replies
  • 1 kudos

For each task field

Hi community, I was wondering: after passing a list of dicts between tasks using the .taskValues.set() method, how do I maintain the same data types in each task? Because it seems that when I use a for loop and get, via the parameters, each element of the ...
Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

Yeah, to ensure that the data types are maintained, you can convert the values to the desired types after deserialization. This is necessary because JSON does not distinguish between integers and floats, and all numbers are deserialized as floats. The ...
4 More Replies
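The type-coercion advice in that reply can be sketched outside Databricks: task values travel between tasks as JSON, so any types the consumer does not get back as expected should be restored explicitly after reading them. A minimal sketch; the record shape is hypothetical, and in a real job the producer would call `dbutils.jobs.taskValues.set()` and the consumer `dbutils.jobs.taskValues.get()`.

```python
import json

# Hypothetical records passed between tasks; the JSON round trip below
# simulates what taskValues.set()/get() effectively does to the data.
records = [{"id": 1, "score": 2.5}, {"id": 2, "score": 3.0}]

payload = json.dumps(records)   # what crosses the task boundary
received = json.loads(payload)  # what the downstream task sees

# Restore the intended types explicitly rather than trusting the
# deserializer, as the reply suggests.
restored = [{"id": int(r["id"]), "score": float(r["score"])} for r in received]
```

The same cast step works for datetimes or Decimals, which JSON cannot represent at all and which must be stringified on the way in and parsed on the way out.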
VJ3
by Contributor
  • 2701 Views
  • 3 replies
  • 0 kudos

Databricks Upload local files (Create/Modify table)

Hello Team, I believe Databricks recently came out with a feature to create or modify a table using a file upload of less than 2 GB (CSV, TSV, JSON, Avro, Parquet, or text files, to create or overwrite a managed Delta Lake table) on Self Se...
Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

For sharing a CSV file containing PII data with another user who should not have access to the PII data elements: you can use Databricks' Unity Catalog to manage and govern access to data. Unity Catalog allows you to define fine-grained access controls a...
2 More Replies
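A minimal sketch of the Unity Catalog approach that reply describes, assuming a hypothetical `main.sales.customers` table, an `email` PII column, and a `pii_readers` group; inside Databricks you would run each statement with `spark.sql(...)`.

```python
# Hypothetical names throughout; a Unity Catalog column mask lets you share
# the table while hiding the PII column from users outside a privileged group.
mask_fn = """
CREATE OR REPLACE FUNCTION main.sales.mask_email(email STRING)
RETURNS STRING
RETURN CASE WHEN is_account_group_member('pii_readers')
            THEN email ELSE '***REDACTED***' END
"""

apply_mask = """
ALTER TABLE main.sales.customers
ALTER COLUMN email SET MASK main.sales.mask_email
"""

# On Databricks: spark.sql(mask_fn); spark.sql(apply_mask)
```

The mask is evaluated per query, so the same table can be granted to both groups without maintaining a second, scrubbed copy of the CSV data.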
alpar
by New Contributor II
  • 6517 Views
  • 4 replies
  • 4 kudos

Merge operation to delta table with new column starting with upper case seems to be not working

Hello, I have a simple Spark dataframe saved to a Delta table:
data = [(1, "John", "Doe"), (2, "Jane", "Smith"), (3, "Mike", "Johnson"), (4, "Emily", "Davis")]
columns = ["Id", "First_name", "Last_name"]
df = spark.createDataFrame(data, sche...
Latest Reply
hari-prasad
Valued Contributor II
  • 4 kudos

I assume you are facing the error reported on the GitHub issues page referenced below; you can follow it there, as they have released a fix for it. [BUG][Spark] issue when merge using autoMerge property · Issue #3336 · delta-io/delta · GitHub
3 More Replies
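For readers hitting the same thing, a sketch of the merge pattern involved (Databricks/Delta only; `people` and `updates_df` are hypothetical names). Until the fix from delta-io/delta issue #3336 reaches your runtime, reported workarounds include keeping new source columns lower-case or adding them with `ALTER TABLE ... ADD COLUMN` before merging.

```python
# Sketch only; actually running this requires a Databricks/Delta environment.
def upsert(spark, updates_df, target_table="people"):
    from delta.tables import DeltaTable  # deferred: Delta-only import

    # autoMerge lets MERGE add source columns missing from the target --
    # the setting involved in delta-io/delta issue #3336.
    spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")

    target = DeltaTable.forName(spark, target_table)
    (target.alias("t")
           .merge(updates_df.alias("s"), "t.Id = s.Id")
           .whenMatchedUpdateAll()
           .whenNotMatchedInsertAll()
           .execute())
```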
REM1992
by New Contributor
  • 1205 Views
  • 1 replies
  • 0 kudos

Alert monitoring, not running in schedule

Hello, I think the alert I set is not running on its schedule (every day at 9 AM JST). It shows up as if it is running, with the running symbol animating, but it says "since 2025/1/7" when it should have run at 2025/1/8 9:00 AM ...
Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

The date shown there is the date it was first executed. If you check the job runs, do you see jobs running every day in that period of time?
mh7
by New Contributor II
  • 3156 Views
  • 3 replies
  • 0 kudos

Spark throws error while using RDD: [NOT_IMPLEMENTED] rdd is not implemented.

I am running code on 15.4 LTS and it works fine on an all-purpose cluster: processed_counts = df.rdd.mapPartitions(process_partition).reduce(lambda x, y: x + y). When I run the same code using a job cluster, it throws the error below. I verified the cluster setti...
Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

OK, but your all-purpose cluster is set up in Single User mode, which does support the RDD API. Can you confirm your job cluster is also created in Single User mode?
2 More Replies
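The underlying point is that `df.rdd` only works in Single User (dedicated) access mode; on standard/shared clusters a common alternative is to keep the per-partition style with `mapInPandas`. A sketch under the assumption that `process_partition` was counting rows per partition (its body is not shown in the thread).

```python
import pandas as pd

# Per-partition row counting without touching df.rdd; each call receives
# an iterator of pandas DataFrames covering one Spark partition.
def count_partition(batches):
    total = sum(len(pdf) for pdf in batches)
    yield pd.DataFrame({"n": [total]})

def processed_counts(df):
    # Databricks/Spark only: replaces
    # df.rdd.mapPartitions(process_partition).reduce(lambda x, y: x + y)
    counts = df.mapInPandas(count_partition, schema="n long")
    return counts.agg({"n": "sum"}).collect()[0][0]
```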
Databricks_-Dat
by New Contributor II
  • 9947 Views
  • 4 replies
  • 1 kudos

Databricks workflows, sample script/method to deploy jobs.json to other workspace

Could someone point me in the right direction for deploying Jobs from one workspace to another using a JSON file in a DevOps CI/CD pipeline? Thanks in advance.
Latest Reply
yuvapraveen_k
New Contributor III
  • 1 kudos

You are welcome. There is a feature Databricks released to link the workflow definition to Git automatically. Please refer to the link below: https://www.databricks.com/blog/2022/06/21/build-reliable-production-data-and-ml-pipelines-with-git...
3 More Replies
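A sketch of the JSON-file approach for CI/CD, using the Jobs 2.1 REST API; host, token, and file path are placeholders you would inject from pipeline secrets/variables, and Databricks Asset Bundles are the newer packaged way to do the same thing.

```python
import json

def load_job_spec(path):
    # The job definition lives in the repo as JSON, e.g. copied from the
    # source workspace's "View JSON" or from /api/2.1/jobs/get.
    with open(path) as f:
        return json.load(f)

def deploy_job(host, token, spec):
    import requests  # deferred so the sketch imports without requests installed
    resp = requests.post(
        f"{host}/api/2.1/jobs/create",
        headers={"Authorization": f"Bearer {token}"},
        json=spec,
    )
    resp.raise_for_status()
    return resp.json()["job_id"]
```

In the pipeline you would call `deploy_job("https://<target-workspace>", token, load_job_spec("jobs/nightly.json"))`, after rewriting any environment-specific fields in the spec (cluster IDs, notebook paths) for the target workspace.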
Deepak_Goldwyn
by New Contributor III
  • 9661 Views
  • 5 replies
  • 2 kudos

Resolved! Create Jobs and Pipelines in Workflows using API

I am trying to create Databricks Jobs and Delta Live Tables (DLT) pipelines using the Databricks API. I would like to keep the JSON code of the Jobs and DLT pipelines in the repository (to configure the code per environment) and execute the Databricks API by passing...
Latest Reply
Deepak_Goldwyn
New Contributor III
  • 2 kudos

Hi Jose, yes, it answered my question. I am indeed using a JSON file to create Jobs and pipelines. Thanks.
4 More Replies
Ru
by Databricks Partner
  • 2422 Views
  • 6 replies
  • 2 kudos

Resolved! DLT Databricks Runtime version for the CURRENT channel doesn't match what's in release 2024.49

I'm expecting the Databricks Runtime for the DLT pipeline (CURRENT) to match the 2024.49 release notes. However, this is not the case. We are seeing CURRENT DLT pipelines still using Databricks Runtime 14. Our code depends on Databricks Runtime 15.4,...

Latest Reply
Alberto_Umana
Databricks Employee
  • 2 kudos

Hi @Ru, is it still not on DBR 15.4? If not, do you have a support plan so you can open a case?
5 More Replies
Dilorom
by New Contributor
  • 7543 Views
  • 2 replies
  • 0 kudos

How to connect to Dynamics CRM server in Databricks.

Currently I have access to Dynamics CRM backend server via AAD, and I can query tables via XRM tool. I am trying to connect to Dynamics CRM backend server in Databricks, and I am not sure how the connection needs to be set up or if any other access n...

Latest Reply
arijitm
Databricks Employee
  • 0 kudos

Hi @Dilorom @sheridan06 I was wondering if you were able to successfully connect and have some guidance or best practices around this.

1 More Replies
amarnathpal
by New Contributor III
  • 2244 Views
  • 4 replies
  • 0 kudos

Inquiry About Adding Filters on Notebook Dashboard

Hi, I have recently created some visuals in a notebook and added them to the notebook dashboard. However, I am unable to find a way to add filters to the dashboard. I have looked through the avai...
Latest Reply
hari-prasad
Valued Contributor II
  • 0 kudos

Then you can use an alternative approach: Databricks widgets, which you can leverage to parameterize your notebook and queries. Any change to a parameter will automatically trigger the code block in the cell where the parameter is read. Also, you can set ac...
3 More Replies
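A sketch of the widget-based filtering that reply describes; the widget names and choices are hypothetical, and `dbutils` only exists inside a Databricks notebook, so it is passed in here to keep the sketch importable.

```python
def read_filters(dbutils):
    # Renders a dropdown and a text box at the top of the notebook;
    # cells that call .get() re-run when the widget values change
    # (depending on the notebook's widget execution setting).
    dbutils.widgets.dropdown("region", "EMEA", ["EMEA", "AMER", "APAC"])
    dbutils.widgets.text("min_views", "1000")
    return dbutils.widgets.get("region"), int(dbutils.widgets.get("min_views"))
```

The returned values can then feed the filters of whatever DataFrame or query backs each visual on the dashboard.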
Brianben
by New Contributor III
  • 5408 Views
  • 7 replies
  • 4 kudos

Resolved! Data archive with delta tables in UC enable environment

Hi all, I am new to Databricks and Delta tables. Recently I have been researching data archiving, as we have a data retention policy within the company. I have studied the documentation (Archival support in Databricks | Databricks on AWS) and I am exploring th...
Latest Reply
VZLA
Databricks Employee
  • 4 kudos

@Brianben Correct, and apologies if my previous response introduced further confusion. I had to go back to Azure's documentation to get more context; initially I was considering timestamps inside the files instead of the files' metadata timestamps....
6 More Replies
geckopher
by New Contributor
  • 6377 Views
  • 1 replies
  • 0 kudos

Best Practices for Managing Schema Changes and Metadata Lineage in Delta Tables

Hello Databricks Community, we are working with Airflow DAGs to trigger Databricks jobs that use Delta tables to perform upsert operations via a MERGE statement. The job was initially designed to perform a merge upsert with predefined Delta tables...
Latest Reply
hari-prasad
Valued Contributor II
  • 0 kudos

Hi @geckopher, to address your concerns about managing schema evolution, tracking metadata lineage, and efficiently updating schemas in Delta tables, here are some best practices and strategies. Tracking schema changes via Delta table history: utilize Delt...
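Two small pieces that complement that advice: appending with schema evolution enabled (Databricks/Delta only; the table name is hypothetical), plus a tiny pure-Python helper for reporting which columns an incoming batch would add, useful for the audit trail the question asks about.

```python
def append_with_evolution(df, table="catalog.schema.events"):
    # mergeSchema lets the append add new source columns to the Delta table;
    # DESCRIBE HISTORY <table> then shows the schema-changing commits.
    (df.write.format("delta")
       .mode("append")
       .option("mergeSchema", "true")
       .saveAsTable(table))

def new_columns(target_cols, incoming_cols):
    # Pure-Python audit helper: which incoming columns are not yet in the target?
    existing = set(target_cols)
    return [c for c in incoming_cols if c not in existing]
```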
ChristianRRL
by Honored Contributor
  • 4573 Views
  • 5 replies
  • 0 kudos

CREATE view USING json and *include* _metadata, _rescued_data

Title may be self-explanatory. Basically, I'm curious whether it's possible (and if so, how) to add `_metadata` and `_rescued_data` fields to a view "using json", e.g.:
%sql CREATE OR REPLACE VIEW entity_view USING json OPTIONS (path="/.../.*json",mu...
Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

I am able to perform the operation below for a Delta table: SELECT *, _metadata.file_name FROM anytable WHERE condition. See https://docs.databricks.com/en/ingestion/file-metadata-column.html. You can use something like: df = spark.read \ .format("json")...
4 More Replies
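A sketch of the reader-based alternative that reply outlines: read the files with the DataFrame reader, request `_rescued_data` explicitly, and select from the hidden `_metadata` column (Databricks only; the path is a placeholder).

```python
def json_with_metadata(spark, path):
    # _metadata must be selected explicitly -- it is hidden from SELECT *.
    return (spark.read.format("json")
                 .option("rescuedDataColumn", "_rescued_data")
                 .load(path)
                 .selectExpr("*", "_metadata.file_name AS source_file"))

# On Databricks:
# df = json_with_metadata(spark, "/mnt/raw/entity/*.json")
# df.createOrReplaceTempView("entity_view")
```

Registering the result as a temp view (or writing it to a table) then gives downstream SQL the same per-file lineage the question was after.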
dsnde49
by New Contributor
  • 918 Views
  • 1 replies
  • 0 kudos

Unable to locate saved data

Hi, I have been trying to save some processed data using pandas from my Databricks notebook. I have tried two versions, using CSV and XLSX. The code for both runs without any error, but I'm unable to find the location of the saved data. for tabl...
Latest Reply
hari-prasad
Valued Contributor II
  • 0 kudos

Hi @dsnde49, when using plain Python instead of Spark in Databricks, the data you write is stored on the driver's local storage. To avoid this, you can utilize the spark-excel jar from Crealytics (Maven Repository: com.crealytics » spark-excel). This to...
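The driver-local storage point is the usual cause here: a relative path like "report.csv" lands on the driver's ephemeral disk and disappears with the cluster. A sketch of writing somewhere durable instead; the /Volumes path is a hypothetical Unity Catalog volume, and /dbfs/... works similarly for DBFS.

```python
import pandas as pd

def save_report(df: pd.DataFrame, path: str) -> str:
    # On Databricks, prefer a UC volume ("/Volumes/cat/schema/vol/report.csv")
    # or DBFS ("/dbfs/tmp/report.csv") over a bare relative filename, which
    # resolves to the driver's local working directory.
    df.to_csv(path, index=False)
    return path
```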