Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Anonymous
by Not applicable
  • 5205 Views
  • 5 replies
  • 0 kudos

Resolved! How to use libraries installed in Databricks DBR from a standalone Spark JAR running from IntelliJ IDEA?

Hello, I tried without success to use several libraries that we installed on the Databricks 9.1 cluster (not provided by default in DBR) from a standalone Spark application run from IntelliJ IDEA. For instance, for connecting to Redshift it works onl...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Unfortunately, I did not find any solution. We have to package the JAR and run it as a Databricks job to test and debug. This is not efficient, but no solution for remote debugging has been found or provided.

  • 0 kudos
4 More Replies
Vibhor
by Contributor
  • 8439 Views
  • 5 replies
  • 13 kudos

Resolved! ADF Pipeline - Notebook Run time

In an ADF pipeline, can we specify that a notebook should exit and the pipeline proceed to another notebook after some threshold, such as 15 minutes? For example, I have a pipeline with notebooks scheduled in sequence, and want the pipeline to keep running that notebook for a cert...

Latest Reply
jose_gonzalez
Databricks Employee
  • 13 kudos

Hi @Vibhor Sethi, there is a global timeout in Azure Data Factory (ADF) that you can use to stop the pipeline. In addition, you can use the notebook timeout if you want to control it from your Databricks job.

  • 13 kudos
4 More Replies
pantelis_mare
by Contributor III
  • 9429 Views
  • 2 replies
  • 1 kudos

Resolved! Dynamic Partition Pruning override

Hello everybody, I have another strange issue, and I would like to confirm whether this is a bug or expected behaviour: I'm joining a large dataset with a dimension table and, as expected, DPP is activated. I was trying to deactivate the feature as it change...

Latest Reply
pantelis_mare
Contributor III
  • 1 kudos

Hello @Kaniz Fatma, thank you for taking the time to answer. The issue in this case was that spark.databricks.optimizer.deltaTableFilesThreshold was activating DPP even if it was formally deactivated by setting all available "enabled" properties to f...

  • 1 kudos
1 More Replies
chrisreve89
by New Contributor II
  • 1864 Views
  • 1 replies
  • 2 kudos

Resolved! Databricks Spark Certification

Hello, I have been preparing for the exam for a while. I have seen here that the exam is mostly about remembering syntax details and some general understanding of Spark's internal architecture. I am just wondering if there are some exa...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 2 kudos

I recommend the practice tests on Udemy. There is also a practice exam available from Databricks training. I haven't found others.

  • 2 kudos
Mahalakshmi
by New Contributor II
  • 1902 Views
  • 1 replies
  • 1 kudos

Resolved! Spark UI is not working for completed jobs

Spark UI is not working for completed jobs

Latest Reply
Hubert-Dudek
Databricks MVP
  • 1 kudos

Jobs executed through the Jobs API or Azure Data Factory, for example, are not available in the Spark management console. It can also be an issue with Community Edition or with Spark settings.

  • 1 kudos
lprevost
by Contributor III
  • 3491 Views
  • 1 replies
  • 1 kudos

Resolved! Schema inference CSV picks up \r carriage returns

I'm using: frame = spark.read.csv(path=bucket+folder, inferSchema=True, header=True, multiLine=True) to read in a series of CSV ...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 1 kudos

Files saved on the Windows operating system contain a carriage return and a line feed at the end of every line. Adding the following option may help: .option("ignoreTrailingWhiteSpace", true)
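The effect described in this reply can be sketched in plain Python, with no Spark involved. This is only an illustration of why a CRLF file leaves a stray carriage return on the last field of each row, and why trimming trailing whitespace removes it. The sample text and the parse_rows helper are hypothetical, and the sketch does not claim to reproduce how Spark's CSV reader works internally.

```python
# Plain-Python illustration (no Spark): a Windows-style CRLF file leaves
# a stray carriage return on the last field of every row when lines are
# split on the line feed only.
crlf_text = "id,name\r\n1,alice\r\n2,bob\r\n"  # hypothetical sample data


def parse_rows(text, strip_trailing=False):
    """Split on LF only, the way a parser unaware of CR would."""
    rows = []
    for line in text.split("\n"):
        if line == "":
            continue  # skip the empty piece after the final line feed
        fields = line.split(",")
        if strip_trailing:
            # CR counts as whitespace, so rstrip() removes it
            fields = [field.rstrip() for field in fields]
        rows.append(fields)
    return rows


raw = parse_rows(crlf_text)
clean = parse_rows(crlf_text, strip_trailing=True)

print(repr(raw[0]))    # last header field still carries the carriage return
print(repr(clean[0]))  # same row after trimming trailing whitespace
```

In the raw result the last column name carries the carriage return, which is the kind of symptom the question title describes.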

  • 1 kudos
missyT
by New Contributor III
  • 3770 Views
  • 1 replies
  • 4 kudos

Resolved! How to distinguish arrow-key from escape character with getch in C?

I want to know whether an arrow key or the escape character has been pressed. But in order to check which arrow key has been pressed I need to do multiple blocking getch calls, because the arrow-key sequence is longer than one char. This is a problem when I c...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 4 kudos

For arrow keys, getch() returns a sequence of key codes rather than a single one: '\033', then '[', then a letter from A to D (up, down, right, left). So the code will be something like: if (getch() == '\033') { getch(); // [ value switch(getch()) { ...
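The distinction asked about in the question can be sketched as a small classifier. It is written in Python rather than C, purely for illustration, and it works on characters that have already been read, so it sidesteps the blocking-read problem instead of solving it. The names ARROWS and classify_key are hypothetical; the letter-to-direction mapping is the one given in the reply.

```python
# Mapping taken from the reply: ESC, '[', then A-D for up, down, right, left.
ARROWS = {"A": "up", "B": "down", "C": "right", "D": "left"}


def classify_key(chars):
    """Classify an already-read character sequence (hypothetical helper)."""
    if not chars or chars[0] != "\033":
        return "other"
    if len(chars) >= 3 and chars[1] == "[" and chars[2] in ARROWS:
        return ARROWS[chars[2]]
    if len(chars) == 1:
        return "escape"  # a lone ESC with nothing following it
    return "other"


print(classify_key("\033[A"))
print(classify_key("\033"))
```

Deciding that nothing follows a lone ESC still requires a non-blocking or timed read in the real program, which this sketch assumes has already happened.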

  • 4 kudos
sarvesh
by Contributor III
  • 5874 Views
  • 3 replies
  • 4 kudos

Resolved! Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot modify the value of a Spark config: spark.executor.memory;

I am trying to read a 16 MB Excel file and was getting a "GC overhead limit exceeded" error. To resolve that, I tried to increase my executor memory with spark.conf.set("spark.executor.memory", "8g"), but I got the following stack: Using Spark's default l...

Latest Reply
Prabakar
Databricks Employee
  • 4 kudos

On the cluster configuration page, go to the advanced options. Click it to expand the field. There you will find the Spark tab and you can set the values there in the "Spark config".

  • 4 kudos
2 More Replies
sarvesh
by Contributor III
  • 9587 Views
  • 9 replies
  • 8 kudos

Resolved! Getting null values in place of data that was manually removed from an Excel file (solved)

I was reading an Excel file with one column:

country
india
India
india
India
india

The dataframe I got from this data, via df.show():

+-------+
|country|
+-------+
| india |
| India |
| india |
| India |
| india |
+-------+

In the next step I removed the last value ...

Latest Reply
Anonymous
Not applicable
  • 8 kudos

@sarvesh singh​ - Thank you for letting us know. Would you be happy to mark the best answer so others can find the solution easily?

  • 8 kudos
8 More Replies
mdavidallen
by New Contributor II
  • 5275 Views
  • 4 replies
  • 2 kudos

Resolved! How to transfer ownership of a Databricks cloud standard account?

My email address is the owner of an account in a particular standard plan tenancy. I would like to transfer ownership to another user so they can change billing details, and take admin access going forward. How can this be accomplished?

Latest Reply
Prabakar
Databricks Employee
  • 2 kudos

Hi @David Allen, to transfer account owner rights, contact your Databricks account representative. This is applicable for both legacy and E2 accounts. https://docs.databricks.com/administration-guide/account-settings/account-console.html#access-the-ac...

  • 2 kudos
3 More Replies
Chris_Shehu
by Valued Contributor III
  • 3874 Views
  • 4 replies
  • 3 kudos
Latest Reply
Anonymous
Not applicable
  • 3 kudos

You may have noticed that the local SQL endpoint is not listed in the options for getting started with APEX. The local SQL endpoint is an extremely useful feature for getting ADO.NET web services started. I say check this uk-dissertation.com review f...

  • 3 kudos
3 More Replies
Confused
by New Contributor III
  • 6001 Views
  • 6 replies
  • 1 kudos

Hi guys, is there any documentation on where the /databricks-datasets/ mount is actually served from? We are looking at locking down where our workspace...

Hi guys, is there any documentation on where the /databricks-datasets/ mount is actually served from? We are looking at locking down where our workspace can reach out to via the internet, and as it currently stands we are unable to reach this. I did look ...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hello Mat, Thanks for letting us know. Would you be happy to mark your answer as best if that will solve the problem for others? That way, members will be able to find the solution more easily.

  • 1 kudos
5 More Replies
MadelynM
by Databricks Employee
  • 4025 Views
  • 2 replies
  • 1 kudos

2021-08-Best-Practices-for-Your-Data-Architecture-v3-OG-1200x628

Thanks to everyone who joined the Best Practices for Your Data Architecture session on Getting Workloads to Production using CI/CD. You can access the on-demand session recording here, and the code in the Databricks Labs CI/CD Templates Repo. Posted ...

Latest Reply
MadelynM
Databricks Employee
  • 1 kudos

Here's the embedded links list!
Jobs scheduling and orchestration
Built-in job scheduling: https://docs.databricks.com/jobs.html#schedule-a-job
Periodic scheduling of the jobs
Execute notebook / jar / Python script / Spark-submit
Multitask Jobs
Execute no...

  • 1 kudos
1 More Replies
raymund
by New Contributor III
  • 5606 Views
  • 7 replies
  • 5 kudos

Resolved! Why did adding the package 'org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1' fail on runtime 9.1.x-scala2.12 but succeed on runtime 8.2.x-scala2.12?

Using a Databricks spark-submit job, setting a new cluster:
1] "spark_version": "8.2.x-scala2.12" => OK, works fine
2] "spark_version": "9.1.x-scala2.12" => FAIL, with errors
Exception in thread "main" java.lang.ExceptionInInitializerError at com.databricks...

Latest Reply
raymund
New Contributor III
  • 5 kudos

This has been resolved by adding the following spark_conf (not through --conf):
"spark.hadoop.fs.file.impl": "org.apache.hadoop.fs.LocalFileSystem"
Example:
"new_cluster": { "spark_version": "9.1.x-scala2.12", ... "spark_conf": { "spar...

  • 5 kudos
6 More Replies
antoooks
by New Contributor III
  • 4033 Views
  • 2 replies
  • 4 kudos

Resolved! display() function always returns "connection refused" when tunneling, despite successfully retrieving the schema

Hi everyone, I am using SSH tunnelling with SSHTunnelForwarder to reach a target AWS RDS PostgreSQL database. The connection got through; however, when I tried to display the retrieved data frame, it always throws a "connection refused" error. Please see ...

Latest Reply
jose_gonzalez
Databricks Employee
  • 4 kudos

Hi @Kurnianto Trilaksono Sutjipto, this seems like a connectivity issue with the URL you are trying to connect to. It fails during the display() command because read is a lazy transformation and will not be executed right away. On the other hand,...
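The laziness described in this reply can be illustrated with a rough plain-Python analogy. This is not the Spark API: fetch_rows, lazy_read and display are hypothetical stand-ins, and the only point is that defining a lazy pipeline can succeed while the failure surfaces later, when something finally pulls rows through it.

```python
def fetch_rows():
    # Hypothetical stand-in for a query sent through a broken tunnel.
    raise ConnectionRefusedError("connection refused")


def lazy_read(fetch):
    """Return a generator; fetch() is not called until rows are requested."""
    def rows():
        yield from fetch()
    return rows()


def display(rows):
    """Stand-in for an action: forces the pipeline to run."""
    return list(rows)


df = lazy_read(fetch_rows)  # succeeds: nothing has been fetched yet
try:
    display(df)
    outcome = "ok"
except ConnectionRefusedError as err:
    outcome = f"failed at display: {err}"

print(outcome)
```

The call to lazy_read returns without error, and the exception only appears inside display, which mirrors the symptom in the question.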

  • 4 kudos
1 More Replies
