Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

jwilliam
by Contributor
  • 12935 Views
  • 3 replies
  • 3 kudos

Resolved! Differences between Spark Cluster Manager and Databricks Cluster Manager?

I couldn't find any documentation on the Databricks Cluster Manager. Could anyone give me some resources on this topic?

Latest Reply
User16752242622
Databricks Employee
  • 3 kudos

Hi @John William, Databricks clusters use Spark's Standalone cluster manager. Each Databricks cluster has its own standalone Master and Worker processes, which run inside LXC containers and share a lifecycle with the cluster. Each cluster has a singl...

2 More Replies
weldermartins
by Honored Contributor
  • 4593 Views
  • 4 replies
  • 11 kudos

Resolved! Databricks pyspark - Find columns in xls file.

Hello everyone, every day I extract data into xls files, but the column positions change from day to day. Is there any way to find these columns within the file? Here's a snippet of my code. df = spark.read.format("com.crealytics.spark.excel")\ .option("hea...
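One common way to cope with columns that move around is to look them up by header name instead of by position. The following is only a minimal plain-Python sketch of that idea, assuming the first row of each extract holds the column names; the helper names `find_column_positions` and `select_columns` and the sample data are hypothetical and do not come from the original post.

```python
from typing import Dict, List, Sequence


def find_column_positions(header: Sequence[str], wanted: Sequence[str]) -> Dict[str, int]:
    """Map each wanted column name to its index in the header row.

    Matching ignores case and surrounding whitespace (an assumption,
    not something the original post specifies).
    """
    normalized = {name.strip().lower(): idx for idx, name in enumerate(header)}
    missing = [w for w in wanted if w.strip().lower() not in normalized]
    if missing:
        raise KeyError(f"columns not found in header: {missing}")
    return {w: normalized[w.strip().lower()] for w in wanted}


def select_columns(rows: Sequence[Sequence], wanted: Sequence[str]) -> List[list]:
    """Return the wanted columns, in the requested order, for every data row."""
    header, *data = rows
    positions = find_column_positions(header, wanted)
    return [[row[positions[w]] for w in wanted] for row in data]


# Hypothetical extracts from two different days: same columns, different order.
monday = [["id", "name", "amount"], [1, "a", 10]]
tuesday = [["amount", "id", "name"], [20, 2, "b"]]

print(select_columns(monday, ["id", "amount"]))
print(select_columns(tuesday, ["id", "amount"]))
```

If the file is read into a Spark DataFrame with its header row used for column names, selecting by name (for example `df.select("id", "amount")`) should give the same position-independent behavior.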

Latest Reply
Vidula
Honored Contributor
  • 11 kudos

Hi @welder martins, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Tha...

3 More Replies
kfoster
by Contributor
  • 3071 Views
  • 2 replies
  • 3 kudos

DLT Event Log

I am trying to use the event log that DLT keeps updated, and I noticed that some of the fields are consistently empty/null. In the event log, located at ".../storage/system/events", I see the field "origin", and there are nested fields within it which are empty/n...

Latest Reply
jose_gonzalez
Databricks Employee
  • 3 kudos

Hi @Kristian Foster, the following docs provide more details on the event log schema. Please refer to this link: https://docs.databricks.com/workflows/delta-live-tables/delta-live-tables-event-log.html#monitor-pipelines-with-the-delta-live-tables...

1 More Replies
LadislavSulak
by New Contributor II
  • 2584 Views
  • 2 replies
  • 2 kudos

What is the long-term plan for the official Databricks Docker Containers?

Hi, I'd like to ask how many resources you plan to dedicate to the maintenance and development of the official Databricks Docker images. Do you have a view on the longer-term plan for these Docker images? It seems to be maintained, but i...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Curious too, but I have a feeling it is not a priority.

1 More Replies
parthsalvi
by Contributor
  • 8621 Views
  • 6 replies
  • 3 kudos

GRANT permission not working on Storage Credential using DBR 11.2 Shared mode

We were trying to update the permissions of a storage credential using DBR 11.2 in Shared mode, but we are running into the following issue: Operation not allowed: GRANT(line 1, pos 0). Please find the complete error in the attached file. Is the above issue with cluster permission or changing permi...

Latest Reply
Debayan
Databricks Employee
  • 3 kudos

Hi @Parth Salvi, could you please paste the full error here? It would also be better to create a support case with Databricks that includes the whole error.

5 More Replies
anibose
by New Contributor III
  • 8192 Views
  • 3 replies
  • 7 kudos

Resolved! Hands-On exercise material

Hi friends, I am following the Databricks Customer Academy training material. I created a Databricks service in an Azure trial account and was able to launch a single-node cluster there. Could you please guide me on how to do all the hands-on exercises?

Latest Reply
anibose
New Contributor III
  • 7 kudos

Thanks Doug, I was able to locate the .dbc file. I appreciate your response. Best regards, Anindya

2 More Replies
benydc
by New Contributor II
  • 1639 Views
  • 0 replies
  • 2 kudos

Is it possible to connect to IPython Kernel from local or client outside databricks cluster?

When looking at the standard output of a notebook run on a cluster, we get this message: "To connect another client to this kernel, use: /databricks/kernel-connections-dj8dj93d3d3.json" Is it possible to connect to the Databricks IPython kernel and ma...

dataexplorer
by New Contributor III
  • 11450 Views
  • 6 replies
  • 5 kudos

Resolved! COPY INTO generating duplicate rows in Delta table

Hello everyone, I'm trying to bulk load tables from a SQL Server database into ADLS as Parquet files and then load these files into Delta tables (raw/bronze). I did a one-off history/base load, but my subsequent incremental loads (which had a d...

Latest Reply
dataexplorer
New Contributor III
  • 5 kudos

Thanks for the guidance!

5 More Replies
User16826994223
by Databricks Employee
  • 12802 Views
  • 2 replies
  • 3 kudos

How to prevent duplicate entries from entering a Delta Lake table on Azure Storage

I have a DataFrame stored in Delta format in ADLS. When I try to append new, updated rows to that Delta table, is there any way I can delete the old existing record in Delta and add the new, updated record? There is a uni...

Latest Reply
Ryan_Chynoweth
Databricks Employee
  • 3 kudos

You should use a MERGE command on this table to match records on the unique column. Delta Lake does not enforce primary keys, so if you only append, the duplicate ids will appear. MERGE will provide the functionality you need. https://docs.databr...
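The MERGE suggestion above can be sketched roughly as follows. This is an illustrative sketch, not the poster's actual solution: the helper `build_merge_sql`, the pure-Python `upsert` that mimics the intended semantics, and the table and column names (`bronze.customers`, `updates`, `id`) are all hypothetical.

```python
from typing import Dict, List


def build_merge_sql(target: str, source: str, key: str) -> str:
    """Build a Delta Lake MERGE statement that upserts `source` into `target`.

    Identifiers are interpolated as-is, so only pass trusted table
    and column names.
    """
    return (
        f"MERGE INTO {target} AS t\n"
        f"USING {source} AS s\n"
        f"ON t.{key} = s.{key}\n"
        "WHEN MATCHED THEN UPDATE SET *\n"
        "WHEN NOT MATCHED THEN INSERT *"
    )


def upsert(existing: List[Dict], updates: List[Dict], key: str) -> List[Dict]:
    """Plain-Python picture of the merge semantics: rows whose key already
    exists are replaced, rows with a new key are added."""
    merged = {row[key]: row for row in existing}
    for row in updates:
        merged[row[key]] = row
    return list(merged.values())


# Hypothetical names; in a notebook the statement could be run with
# spark.sql(statement) once `updates` is registered as a view or table.
statement = build_merge_sql("bronze.customers", "updates", "id")
print(statement)

current = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
incoming = [{"id": 2, "v": "B"}, {"id": 3, "v": "c"}]
print(upsert(current, incoming, "id"))
```

Because each key is matched before writing, re-running the same incoming batch updates the existing rows rather than appending duplicates, which is the behavior the question asks for.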

1 More Replies
vanessafvg
by New Contributor III
  • 7097 Views
  • 4 replies
  • 5 kudos
Latest Reply
Anonymous
Not applicable
  • 5 kudos

We're always here, even for newbie errors, @Vanessa Van Gelder! Thanks for posting, and thanks @Hubert Dudek for always being so helpful.

3 More Replies
db-avengers2rul
by Contributor II
  • 6247 Views
  • 1 replies
  • 2 kudos

Resolved! unable to replace null with 0 in dataframe using Pyspark databricks notebook (community edition)

Hello Experts, I am unable to replace nulls with 0 in a DataFrame; please refer to the screenshot. from pyspark.sql.functions import col emp_csv_df = emp_csv_df.na.fill(0).withColumn("Total_Sal",col('sal')+col('comm')) display(emp_csv_df) error desired ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

I bet that it is not a real null but the string "null". Please check what is in the source and try replacing it.
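To illustrate the difference the reply points at, here is a small plain-Python sketch, assuming the source file stores the literal text "null" in some cells. The function `fill_nulls`, the set of null markers, and the sample rows are hypothetical; only the column names `sal` and `comm` come from the question.

```python
from typing import Dict, Iterable, List, Optional, Sequence


def fill_nulls(
    rows: Iterable[Dict[str, Optional[str]]],
    columns: Sequence[str],
    default: float = 0.0,
    null_markers: Sequence[str] = ("null", ""),
) -> List[Dict]:
    """Replace real nulls (None) and null-like strings with `default`,
    converting every other value in `columns` to float."""
    cleaned = []
    for row in rows:
        new_row = dict(row)
        for col in columns:
            value = new_row.get(col)
            is_marker = isinstance(value, str) and value.strip().lower() in null_markers
            new_row[col] = default if value is None or is_marker else float(value)
        cleaned.append(new_row)
    return cleaned


# Hypothetical rows as they might arrive from a CSV read without a schema:
# every value is text, and one "null" is a string rather than a real null.
raw = [
    {"sal": "1000", "comm": "null"},
    {"sal": "2000", "comm": None},
    {"sal": "1500", "comm": "300"},
]

totals = [r["sal"] + r["comm"] for r in fill_nulls(raw, ["sal", "comm"])]
print(totals)
```

In PySpark, `na.fill(0)` with a numeric value only applies to numeric columns, so a string column holding "null" would be left untouched; converting such markers to real nulls and casting the column to a numeric type before filling is one way to get the expected result.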

