Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

giacomosachs
by New Contributor
  • 918 Views
  • 0 replies
  • 0 kudos

apt-get install texlive error 404

Hi everybody, I'm trying to install texlive-full on a cluster (Azure Databricks, DBR 7.3 LTS) using apt-get install texlive-full in an init script. The issue is that, most of the time (not always), I get a 404 when downloading packages from security.u...

aimas
by New Contributor III
  • 4687 Views
  • 8 replies
  • 5 kudos

Resolved! error creating tables using UI

Hi, I try to create a table using the UI, but I keep getting the error "error creating table <table name> create a cluster first" even when I have a cluster already running. What is the problem?

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 5 kudos

Be sure that a cluster is selected (the arrow next to the database) and that at least the default database exists.

7 More Replies
BorislavBlagoev
by Valued Contributor III
  • 3577 Views
  • 5 replies
  • 4 kudos

Resolved! Databricks writeStream checkpoint

I'm trying to execute this writeStream: data_frame.writeStream.format("delta") \ .option("checkpointLocation", checkpoint_path) \ .trigger(processingTime="1 second") \ .option("mergeSchema", "true") \ .o...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 4 kudos

You can remove that folder so it will be recreated automatically. Additionally, every new job run should have a new (or just empty) checkpoint location. You can add this to your code before starting the stream: dbutils.fs.rm(checkpoint_path, True). Additionally you...

4 More Replies
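The accepted reply boils down to deleting the checkpoint directory so the stream starts fresh. A minimal local-filesystem sketch of that pattern is below; on Databricks the reply's equivalent call is dbutils.fs.rm(checkpoint_path, True), and the /tmp path here is purely illustrative.

```python
import shutil
from pathlib import Path

def reset_checkpoint(checkpoint_path: str) -> None:
    """Delete a streaming checkpoint directory so the next run starts fresh.

    Local-filesystem sketch; on Databricks the reply's equivalent is
    dbutils.fs.rm(checkpoint_path, True).
    """
    shutil.rmtree(checkpoint_path, ignore_errors=True)

# Hypothetical local checkpoint directory, for illustration only.
ckpt = Path("/tmp/demo_checkpoint")
ckpt.mkdir(parents=True, exist_ok=True)
(ckpt / "offsets").mkdir(exist_ok=True)

reset_checkpoint(str(ckpt))
print(ckpt.exists())  # False
```

As the reply notes, each distinct streaming job should get its own (or an empty) checkpoint location; reusing a checkpoint from an incompatible earlier run is what forces this cleanup.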
marchello
by New Contributor III
  • 4216 Views
  • 9 replies
  • 3 kudos

Resolved! error on connecting to Snowflake

Hi team, I'm getting a weird error in one of my jobs when connecting to Snowflake. All my other jobs (I've got plenty) work fine. The current one also works fine when I have only one coding step (except installing needed libraries in my very first step...

Latest Reply
Dan_Z
Honored Contributor
  • 3 kudos

@marchello​ I suggest you contact Snowflake to move forward on this one.

8 More Replies
Kotofosonline
by New Contributor III
  • 3623 Views
  • 3 replies
  • 3 kudos

Resolved! Query with distinct sort and alias produces error column not found

I'm trying to use a SQL query on azure-databricks with DISTINCT, sort, and aliases: SELECT DISTINCT album.ArtistId AS my_alias FROM album ORDER BY album.ArtistId. The problem is that if I add an alias then I cannot use the non-aliased name in the ORDER BY ...

Latest Reply
Kotofosonline
New Contributor III
  • 3 kudos

The code from above now works in both cases.

2 More Replies
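The usual fix for this class of error is to ORDER BY the alias (or an expression that appears in the SELECT list) whenever DISTINCT is used. A small sketch using Python's built-in sqlite3 with hypothetical data shows the corrected query shape; Spark SQL enforces the same rule more strictly than SQLite does.

```python
import sqlite3

# Toy schema mirroring the thread's example table (data is made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE album (ArtistId INTEGER, Title TEXT)")
conn.executemany(
    "INSERT INTO album VALUES (?, ?)",
    [(3, "A"), (1, "B"), (2, "C"), (1, "D")],
)

# With SELECT DISTINCT, sort on the alias rather than the original
# qualified column name, which is no longer visible after projection.
rows = conn.execute(
    "SELECT DISTINCT album.ArtistId AS my_alias FROM album ORDER BY my_alias"
).fetchall()
print(rows)  # [(1,), (2,), (3,)]
```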
Nyarish
by Contributor
  • 5958 Views
  • 17 replies
  • 18 kudos

Resolved! How to connect Neo4j aura to a cluster

Please help resolve this error: org.neo4j.driver.exceptions.SecurityException: Failed to establish secured connection with the server. This occurs when I try to establish a connection from my cluster to Neo4j Aura. Thank you.

Latest Reply
Anonymous
Not applicable
  • 18 kudos

@Werner Stinckens​ and @Nyaribo Maseru​ - You two are awesome! Thank you for working so hard together.

16 More Replies
Kotofosonline
by New Contributor III
  • 817 Views
  • 1 replies
  • 0 kudos

Bug Report: Date type with year less than 1000 (years 1-999) in spark sql where [solved]

Hi, I noticed unexpected behavior for the Date type. If the year value is less than 1000, filtering does not work. Steps: create table test (date Date); insert into test values ('0001-01-01'); select * from test where date = '0001-01-01' Returns 0 rows....

Latest Reply
Kotofosonline
New Contributor III
  • 0 kudos

Hm, seems to work now.

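One detail worth noting from the repro: dates in years 1-999 are perfectly representable, but the literal must be zero-padded to four digits to match a DATE value. A quick check with Python's datetime (an illustration, not the Spark code path) shows the expected format:

```python
from datetime import date

# Python's proleptic Gregorian calendar represents year 1 directly,
# and isoformat() zero-pads the year to four digits.
d = date(1, 1, 1)
print(d.isoformat())  # 0001-01-01

# A literal like the thread's WHERE clause must use the same zero-padded
# form ('0001-01-01', not '1-01-01') to round-trip cleanly.
assert d == date.fromisoformat("0001-01-01")
```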
brickster_2018
by Esteemed Contributor
  • 949 Views
  • 1 replies
  • 0 kudos
Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

The issue can happen if the Hive syntax for table creation is used instead of the Spark syntax. Read more here: https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-syntax-ddl-create-table-hiveformat.html The issue mentioned in t...

User16789201666
by Contributor II
  • 2416 Views
  • 1 replies
  • 0 kudos

What does this common Hyperopt error mean: "There are no evaluation tasks, cannot return argmin of task losses"?

This means that no trial completed successfully. This almost always means that there is a bug in the objective function, and every invocation is resulting in an error. See the error output in the logs for details. In Databricks, the underlying error ...

Latest Reply
tj-cycyota
New Contributor III
  • 0 kudos

The objective function passed to fmin should be of the form: def evaluate_hyperparams(params): """ This method will be passed to `hyperopt.fmin()`. It fits and evaluates the model using the given hyperparameters to get the validation loss. :param params: This d...

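The contract behind both the question and the reply is that the objective must return a dict with a 'loss' and a 'status'; if every trial raises instead, fmin has no losses to take the argmin of. A toy sketch of that shape is below. A quadratic "validation loss" stands in for fitting a real model, and STATUS_OK is inlined as the string "ok" (its value in hyperopt) so the sketch runs without hyperopt installed.

```python
# hyperopt's STATUS_OK constant is the string "ok"; inlined here so this
# sketch has no external dependency.
STATUS_OK = "ok"

def evaluate_hyperparams(params):
    """Objective of the form expected by hyperopt.fmin().

    A toy quadratic stands in for training and scoring a model; the key
    contract is returning {'loss': ..., 'status': STATUS_OK}. If every
    invocation raises instead, fmin reports "There are no evaluation
    tasks, cannot return argmin of task losses".
    """
    loss = (params["x"] - 3.0) ** 2  # pretend validation loss
    return {"loss": loss, "status": STATUS_OK}

result = evaluate_hyperparams({"x": 2.0})
print(result)  # {'loss': 1.0, 'status': 'ok'}
```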
McKayHarris
by New Contributor II
  • 17553 Views
  • 17 replies
  • 3 kudos

ExecutorLostFailure: Remote RPC Client Disassociated

This is an expensive and long-running job that gets about halfway done before failing. The stack trace is included below, but here is the salient part: Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 4881 in stage...

Latest Reply
RodrigoDe_Freit
New Contributor II
  • 3 kudos

According to https://docs.databricks.com/jobs.html#jar-job-tips: "Job output, such as log output emitted to stdout, is subject to a 20MB size limit. If the total output has a larger size, the run will be canceled and marked as failed." That was my prob...

16 More Replies
RahulMukherjee
by New Contributor
  • 20268 Views
  • 1 replies
  • 1 kudos

I am trying to load a delta table from a dataframe, but it's giving me an error.

Code: from pyspark.sql.functions import * acDF = spark.read.format('csv').options(header='true', inferschema='true').load("/mnt/rahulmnt/Insurance_Info1.csv"); acDF.write.option("overwriteSchema", "true").format("delta").mode("overwrite").save("/delt...

Latest Reply
AbhaKhanna
New Contributor II
  • 1 kudos

1. Using the Spark SQL context in Python or Scala notebooks: sql("SET spark.databricks.delta.formatCheck.enabled=false") 2. In SQL notebooks: SET spark.databricks.delta.formatCheck.enabled=false

JigaoLuo
by New Contributor
  • 4333 Views
  • 3 replies
  • 0 kudos

OPTIMIZE error: org.apache.spark.sql.catalyst.parser.ParseException: mismatched input 'OPTIMIZE'

Hi everyone. I am trying to learn the OPTIMIZE keyword from this notebook using Scala: https://docs.databricks.com/delta/optimizations/optimization-examples.html#delta-lake-on-databricks-optimizations-scala-notebook. But my local Spark seems not able t...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi Jigao, OPTIMIZE isn't in the open-source Delta API, so it won't run on your local Spark instance - https://docs.delta.io/latest/api/scala/io/delta/tables/index.html?search=optimize

2 More Replies