Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

pramalin
by New Contributor
  • 3573 Views
  • 3 replies
  • 2 kudos
Latest Reply
shan_chandra
Databricks Employee
  • 2 kudos

@prudhvi ramalingam - Please refer to the below example code.
import org.apache.spark.sql.functions.expr
val person = Seq(
  (0, "Bill Chambers", 0, Seq(100)),
  (1, "Matei Zaharia", 1, Seq(500, 250, 100)),
  (2, "Michael Armbrust", 1, Seq(250,...
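The Scala snippet above is cut off, so only the sample data and the expr import are visible. Below is a minimal PySpark sketch under that caveat: the column names are assumptions, the last array is completed arbitrarily, and the array_contains query is only a guess at what expr is used for.

from pyspark.sql import SparkSession
from pyspark.sql.functions import expr

spark = SparkSession.builder.getOrCreate()

# Sample data mirroring the truncated Scala snippet; column names are assumed.
person = spark.createDataFrame(
    [
        (0, "Bill Chambers", 0, [100]),
        (1, "Matei Zaharia", 1, [500, 250, 100]),
        (2, "Michael Armbrust", 1, [250, 100]),  # last array completed arbitrarily
    ],
    ["id", "name", "graduate_program", "spark_status"],
)

# Query the array column with a SQL expression, e.g. rows whose
# spark_status array contains the value 250.
person.where(expr("array_contains(spark_status, 250)")).show()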

  • 2 kudos
2 More Replies
KVNARK
by Honored Contributor II
  • 1824 Views
  • 2 replies
  • 2 kudos

Encrypt in Azure SQL DB and decrypt in Power BI

Some columns are encrypted in Azure SQL DB, and I need to decrypt them in Power BI. Are there any prerequisites to implement this?

Latest Reply
Nhan_Nguyen
Valued Contributor
  • 2 kudos

Could you describe your case in more detail?

1 More Replies
LidorAbo
by New Contributor II
  • 2489 Views
  • 1 replies
  • 0 kudos

Databricks can write to S3 bucket through pandas but not from Spark

Hey, I have a problem with access to an S3 bucket using cross-account bucket permissions; I got the following error. Steps to reproduce: Checking the role associated with the EC2 instance: { "Version": "2012-10-17", "Statement": [ { ...

Latest Reply
Nhan_Nguyen
Valued Contributor
  • 0 kudos

Could you try mapping the S3 bucket location to the Databricks File System (DBFS) and then writing the output to this new location instead of writing directly to the S3 location?
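A rough sketch of that suggestion: mount the bucket with dbutils.fs.mount and write through the mount point. This assumes a Databricks notebook (where spark and dbutils are predefined) and uses placeholder bucket and mount names; cross-account access still requires the appropriate instance profile or credentials.

# Placeholder names; adjust to your bucket and desired mount point.
dbutils.fs.mount(
    source="s3a://my-cross-account-bucket",
    mount_point="/mnt/cross-account-bucket",
)

# Write to the mounted path instead of the s3a:// URI directly.
df = spark.range(10)  # example DataFrame
df.write.mode("overwrite").parquet("/mnt/cross-account-bucket/output/")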

sedat
by New Contributor II
  • 2331 Views
  • 2 replies
  • 2 kudos

Hi, is there any documentation for Databricks about performance tuning and reporting?

Hi, I need to analyse performance issues for Databricks. Is there any documentation or monitoring tool to run to see what is happening in Databricks? I am very new to Databricks. Best

Latest Reply
Nhan_Nguyen
Valued Contributor
  • 2 kudos

You could try some courses at https://customer-academy.databricks.com/: "What's New In Apache Spark 3.0" and "Optimizing Apache Spark on Databricks".

1 More Replies
Callum
by New Contributor II
  • 13408 Views
  • 3 replies
  • 2 kudos

Pyspark Pandas column or index name appears to persist after being dropped or removed.

So, I have this code for merging dataframes with pyspark pandas. And I want the index of the left dataframe to persist throughout the joins. So following suggestions from others wanting to keep the index after merging, I set the index to a column bef...

Latest Reply
Serlal
New Contributor III
  • 2 kudos

Hi! I tried debugging your code and I think that the error you get is simply because the column exists in two instances of your dataframe within your loop. I tried adding some extra debug lines in your merge_dataframes function: and after executing that...
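For reference, a minimal pyspark.pandas sketch of the "set the index to a column before merging" approach the poster describes; the frames and column names here are hypothetical, not the poster's actual merge_dataframes code.

import pyspark.pandas as ps

left = ps.DataFrame({"key": [1, 2], "a": [10, 20]}, index=["r1", "r2"])
right = ps.DataFrame({"key": [1, 2], "b": [100, 200]})

# merge() does not keep a non-default index, so materialise it as a column,
# merge, then restore it. Re-using the same helper column name across repeated
# merges in a loop is what can trigger "column already exists" style errors.
left = left.reset_index().rename(columns={"index": "row_id"})
merged = left.merge(right, on="key", how="left").set_index("row_id")
print(merged)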

2 More Replies
sonalitotade
by New Contributor II
  • 2143 Views
  • 2 replies
  • 0 kudos

Capture events such as Start, Stop and Terminate of cluster.

Hi, I am using Databricks with AWS. I need to capture events such as Start, Stop, and Terminate of a cluster and perform some other action based on the events that happened on the cluster. Is there a way I can achieve this in Databricks?

Latest Reply
sonalitotade
New Contributor II
  • 0 kudos

Hi Daniel, thanks for the response. I would like to know if we can capture the event logs as shown in the image below when an event occurs on the cluster.
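If the UI event log is not enough, one hedged option is to poll the Clusters API events endpoint from a job or external service and react to lifecycle events programmatically; the workspace URL, token, and cluster ID below are placeholders.

import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                        # placeholder
CLUSTER_ID = "<cluster-id>"                              # placeholder

# Ask the Clusters API for lifecycle events of one cluster.
resp = requests.post(
    f"{HOST}/api/2.0/clusters/events",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "cluster_id": CLUSTER_ID,
        "event_types": ["STARTING", "RESTARTING", "TERMINATING"],
        "limit": 50,
    },
)
resp.raise_for_status()
for event in resp.json().get("events", []):
    print(event["timestamp"], event["type"])  # react to the event here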

1 More Replies
KVNARK
by Honored Contributor II
  • 16430 Views
  • 2 replies
  • 5 kudos

Resolved! pyspark optimizations and best practices

What can we implement to attain the maximum optimization, and what are the best practices for using PySpark end to end?

Latest Reply
daniel_sahal
Esteemed Contributor
  • 5 kudos

@KVNARK - This video is cool: https://www.youtube.com/watch?v=daXEp4HmS-E

1 More Replies
Gandham
by New Contributor II
  • 4429 Views
  • 3 replies
  • 2 kudos

Maven Libraries are failing on restarting the cluster.

I have installed the "com.databricks:spark-xml_2.12:0.16.0" Maven library on a cluster. The installation was successful, but when I restart the cluster, even this successful installation fails. This happens with all Maven libraries. Here is th...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 2 kudos

It is an intermittent issue; we also faced this issue earlier. Try upgrading the DBR version.
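For reference, one way to reinstall the same coordinate programmatically after a restart is the Libraries API; this is a hedged sketch with placeholder host, token, and cluster ID, and it works around rather than fixes the intermittent failure.

import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                        # placeholder

resp = requests.post(
    f"{HOST}/api/2.0/libraries/install",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "cluster_id": "<cluster-id>",  # placeholder
        "libraries": [{"maven": {"coordinates": "com.databricks:spark-xml_2.12:0.16.0"}}],
    },
)
resp.raise_for_status()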

2 More Replies
Therdpong
by New Contributor III
  • 2171 Views
  • 2 replies
  • 0 kudos

How to check which job clusters have expanded their disk.

We would like to know how to check which job clusters have had to expand their disk.

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

You can check the cluster's event logs. Type "disk" in the search box, and you will see all the related events there.
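To do the same check programmatically across clusters, a hedged sketch using the Clusters API is below; the host and token are placeholders, and the disk-related event type names are assumptions to verify against the Clusters API reference.

import requests

HOST = "https://<your-workspace>.cloud.databricks.com"         # placeholder
HEADERS = {"Authorization": "Bearer <personal-access-token>"}  # placeholder

clusters = requests.get(f"{HOST}/api/2.0/clusters/list", headers=HEADERS).json().get("clusters", [])
for c in clusters:
    events = requests.post(
        f"{HOST}/api/2.0/clusters/events",
        headers=HEADERS,
        json={
            "cluster_id": c["cluster_id"],
            # Event type names are assumptions; check the API docs.
            "event_types": ["EXPANDED_DISK", "FAILED_TO_EXPAND_DISK", "DID_NOT_EXPAND_DISK"],
            "limit": 25,
        },
    ).json().get("events", [])
    if events:
        print(c.get("cluster_name", c["cluster_id"]), "->", [e["type"] for e in events])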

1 More Replies
SS2
by Valued Contributor
  • 2173 Views
  • 2 replies
  • 1 kudos

Spark out of memory error.

Sometimes in Databricks you can see an out-of-memory error; in that case, you can change the cluster size as required to resolve the issue.

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Hi @S S, could you provide more details on your issue? For example, error stack traces, code snippets, etc. We will be able to help you if you share more details.

1 More Replies
rocky5
by New Contributor III
  • 2647 Views
  • 1 replies
  • 2 kudos

Cannot create delta live table

I created a simple Delta Live Table definition, something like: CREATE OR REFRESH STREAMING LIVE TABLE customers_silver AS SELECT * FROM STREAM(LIVE.customers_bronze) But I am getting an error when running the pipeline: com.databricks.sql.transaction.tahoe.De...

Latest Reply
jose_gonzalez
Databricks Employee
  • 2 kudos

You might need to execute the following on your tables to avoid this error message: ALTER TABLE <table_name> SET TBLPROPERTIES ( 'delta.minReaderVersion' = '2', 'delta.minWriterVersion' = '5', 'delta.columnMapping.mode' = 'name' ) Docs: https...
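A minimal way to apply those properties from a notebook, assuming a Databricks notebook where spark is predefined; the table name below is a placeholder for the table the pipeline reads from.

# Placeholder table name; replace with the table referenced by the pipeline.
spark.sql("""
    ALTER TABLE my_schema.customers_bronze SET TBLPROPERTIES (
        'delta.minReaderVersion' = '2',
        'delta.minWriterVersion' = '5',
        'delta.columnMapping.mode' = 'name'
    )
""")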

BL
by New Contributor III
  • 5344 Views
  • 4 replies
  • 3 kudos

Error reading in Parquet file

I am trying to read a .parquet file from an ADLS Gen2 location in Azure Databricks, but I am facing the below error: spark.read.parquet("abfss://............/..._2023-01-14T08:01:29.8549884Z.parquet") org.apache.spark.SparkException: Job aborted due to stag...

Latest Reply
jose_gonzalez
Databricks Employee
  • 3 kudos

Can you access the executor logs? When your cluster is up and running, you can access the executors' logs. For example, the error shows: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent ...

3 More Replies
jagac
by New Contributor
  • 1451 Views
  • 2 replies
  • 0 kudos

Cannot log into Community Edition.

Hi there, I recently made an account on the Community Edition and cannot seem to log in. The error says the following: "Invalid email address or password. Note: Emails/usernames are case-sensitive." So I tried to reset my password and still could not log in. I ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @jagac petrovic, thank you for reaching out, and we’re sorry to hear about this log-in issue! We have this Community Edition login troubleshooting post on Community. Please take a look, and follow the troubleshooting steps. If the steps do not res...

1 More Replies
User16835756816
by Valued Contributor
  • 3925 Views
  • 3 replies
  • 1 kudos

How can I optimize my data pipeline?

Delta Lake provides optimizations that can help you accelerate your data lake operations. Here’s how you can improve query speed by optimizing the layout of data in storage. There are two ways you can optimize your data pipeline: 1) Notebook Optimizat...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Some tips from me: Look for data skew; some partitions can be huge and some small because of incorrect partitioning. You can use the Spark UI to do that, but also debug your code a bit (getNumPartitions()); SQL especially can divide it unequally to parti...
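A small PySpark sketch of the skew check described above, assuming a Databricks notebook with spark predefined; the table path and the repartition column are placeholders.

from pyspark.sql.functions import spark_partition_id, count

df = spark.read.format("delta").load("/mnt/data/events")  # placeholder path

# Number of partitions and rows per partition; a few huge partitions next to
# many tiny ones usually indicates skew.
print("partitions:", df.rdd.getNumPartitions())
(df.groupBy(spark_partition_id().alias("pid"))
   .agg(count("*").alias("rows"))
   .orderBy("rows", ascending=False)
   .show(10))

# Rebalance on a well-distributed key before expensive joins or writes.
df = df.repartition(200, "customer_id")  # placeholder column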

2 More Replies
Arun_Kumar
by New Contributor II
  • 4346 Views
  • 4 replies
  • 1 kudos

List of Databricks tables created by a user

Hi team, could you please confirm the below clarifications:
1. How can we get the list of tables created by a user in a particular workspace?
2. How can we get the list of tables created by a user from multiple workspaces? (The same user has access to 10 workspace...

Latest Reply
ashish1
New Contributor III
  • 1 kudos

Hi Arun, I hope your query is answered. Please select the best answer, or let us know if you have any further questions.
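If the workspaces use Unity Catalog, one hedged approach is to query the information schema from each workspace; the system catalog and the created_by column below are assumptions to verify against your environment, and the email is a placeholder.

# Placeholder user email; run per workspace (or against a shared metastore).
tables_by_user = spark.sql("""
    SELECT table_catalog, table_schema, table_name, created
    FROM system.information_schema.tables
    WHERE created_by = 'user@example.com'
""")
tables_by_user.show(truncate=False)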

3 More Replies
