Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Constantine
by Contributor III
  • 1993 Views
  • 2 replies
  • 3 kudos

Resolved! Can't view files of different types in databricks

I am reading Kafka input using Spark Streaming on Databricks and trying to deserialize it. The input is in Thrift format. I want to create a .thrift file to provide the schema but am unable to do so. Even if I create the file locally and...

Latest Reply
jose_gonzalez
Databricks Employee
  • 3 kudos

Hi @John Constantine, just checking whether you still need help. If you do, please share as many details and logs as possible so that we can help better.

1 More Replies
KKo
by Contributor III
  • 2269 Views
  • 3 replies
  • 7 kudos

Resolved! ETL in Databricks

I use Azure Databricks for ETL. I read/write data from and to raw/stage/curate folders. I write a dataframe to a path (e.g., /mnt/datalake/curated/....). In the final step I read data from the path, convert that to a dataframe and write it to the Azure SQL DB/...

Latest Reply
jose_gonzalez
Databricks Employee
  • 7 kudos

Hi @Kris Koirala, just checking whether you still have any follow-up questions. Please let us know.

2 More Replies
Jreco
by Contributor
  • 5054 Views
  • 4 replies
  • 1 kudos

Resolved! Method iterableAsScalaIterable does not exist Pydeequ

Hello, I'm using Databricks and pydeequ to build a QA step in structured streaming. One of the Analyzers that I need to use is Uniqueness. If I add another one like Completeness, it works properly, but if I add Uniqueness I get an error: py4j....

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

I think it is because you did not attach the libraries to the cluster. When you work with a notebook, the SparkSession is already created. To add libraries, you should install them on the cluster (in the Compute tab) using e.g. PyPI, Maven, etc.

3 More Replies
wgsing
by New Contributor
  • 3836 Views
  • 3 replies
  • 0 kudos

Resolved! Databricks Cluster create fail

I am facing a problem creating a cluster in Databricks. The error is as below: Message: Cluster terminated. Reason: Unexpected launch failure. An unexpected error was encountered while setting up the cluster. Please retry and contact Databricks if the proble...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Hi @Giin Sing Wong, just a friendly follow-up. Is this issue still happening, or were you able to resolve it by increasing your account's quota? Please let us know.

2 More Replies
knight007
by New Contributor II
  • 4700 Views
  • 7 replies
  • 5 kudos

Containerized Databricks/Spark database

Hello. I'm fairly new to Databricks and Spark. I have a requirement to connect to Databricks using JDBC, and that works perfectly using the driver I downloaded from the Databricks website ("com.simba.spark.jdbc.Driver"). What I would like to do now is ha...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 5 kudos

@Gurps Bassi, "running instance of a database in docker" - that is the Hive metastore, so it is just a mapping to data that usually resides physically on the data lake. Databricks is so cloud-oriented that setting up a metastore locally doesn't make sense. Inste...

6 More Replies
Constantine
by Contributor III
  • 3811 Views
  • 1 reply
  • 5 kudos

Resolved! Unable to create a partitioned table on s3 data

I write data to S3 like data.write.format("delta").mode("append").option("mergeSchema", "true").save(s3_location) and create a partitioned table like CREATE TABLE IF NOT EXISTS demo_table USING DELTA PARTITIONED BY (column_a) LOCATION {s3_location}; whi...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 5 kudos

@John Constantine, in CREATE TABLE you need to specify the fields: CREATE TABLE IF NOT EXISTS demo_table (column_a STRING, number INT) USING DELTA PARTITIONED BY (column_a) LOCATION {s3_location}; and when you save data before creating ...
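
The statement suggested in the reply can be assembled programmatically, which makes it harder to forget the column list or to partition by a column that is not in the schema. The following is only a minimal sketch: the helper name build_create_table and the location s3://example-bucket/demo are hypothetical, and the column names are taken from the reply. Passing the resulting string to spark.sql is left out so that the example runs without a cluster.

```python
def build_create_table(table, columns, partition_by, location):
    """Return a CREATE TABLE statement with an explicit column list.

    columns is a dict mapping column name to SQL type; partition_by is a
    list of column names that must all appear in columns.
    """
    missing = [c for c in partition_by if c not in columns]
    if missing:
        raise ValueError(f"partition columns not in schema: {missing}")

    column_list = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    partition_list = ", ".join(partition_by)
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ({column_list}) "
        f"USING DELTA PARTITIONED BY ({partition_list}) "
        f"LOCATION '{location}'"
    )


# Hypothetical location; replace with the real s3_location.
ddl = build_create_table(
    "demo_table",
    {"column_a": "STRING", "number": "INT"},
    ["column_a"],
    "s3://example-bucket/demo",
)
print(ddl)
```

In a notebook the string would then be run with spark.sql(ddl). Note that the location is quoted in the generated statement.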
Constantine
by Contributor III
  • 2485 Views
  • 1 reply
  • 5 kudos

Resolved! Delta Table created on s3 has all null values

I have data in a Spark DataFrame and I write it to an S3 location. It has some complex datatypes like structs etc. When I create the table on top of the S3 location by using CREATE TABLE IF NOT EXISTS table_name USING DELTA LOCATION 's3://.../...'; Th...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 5 kudos

@John Constantine, try to load it as a DataFrame (spark.read.delta(path)) and validate what is loading. It could be easier to mount the S3 location as a folder to ensure that all data is there (dbutils or %fs to check) and that the connection is workin...

keunsoop
by New Contributor
  • 62026 Views
  • 7 replies
  • 2 kudos

Resolved! Run stored bash in Databricks with %sh

Hi, I made a bash file in Databricks and I can see that the file is stored, as shown in the following picture. I was supposed to run this bash file through a %sh cell, but as you can see in the following picture, I could not find the bash file, which I could find through d...

Latest Reply
jose_gonzalez
Databricks Employee
  • 2 kudos

Hi @keunsoop, are you able to run your code using an init script? I would like to share some docs in case you have any questions: https://docs.databricks.com/clusters/init-scripts.html

6 More Replies
Transcarent
by New Contributor II
  • 6548 Views
  • 6 replies
  • 0 kudos

Error: ConnectionError: HTTPSConnectionPool(host='https', port=443): Max retries exceeded with url: /api/2.0/workspace/list?path=%2F (Caused b...

Error: ConnectionError: HTTPSConnectionPool(host='https', port=443): Max retries exceeded with url: /api/2.0/workspace/list?path=%2F (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Hi @prakash reddy, is this an intermittent error, or are you able to reproduce it? Please let us know.

5 More Replies
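
The host='https' part of the message above suggests that the client ended up using the literal string "https" as the hostname, which can happen when the configured workspace host is malformed (for example a doubled scheme, or a missing colon after the scheme). The thread excerpt does not confirm the cause, so the following is only a sketch of a sanity check one could run on the configured value before calling the API. The function name normalize_host and the host example.cloud.databricks.com are hypothetical.

```python
from urllib.parse import urlparse


def normalize_host(raw):
    """Return 'https://<hostname>' or raise ValueError for a malformed host.

    Any path or port in the input is dropped in this sketch.
    """
    value = raw.strip()
    # Add a scheme if none was given so that urlparse fills in the netloc.
    if "://" not in value:
        value = "https://" + value

    parsed = urlparse(value)
    hostname = parsed.hostname
    # A hostname of 'http' or 'https' means the scheme was parsed as the host.
    if parsed.scheme != "https" or not hostname or hostname in ("http", "https"):
        raise ValueError(f"malformed workspace host: {raw!r}")
    return f"https://{hostname}"


# Hypothetical workspace host, used for illustration only.
print(normalize_host("example.cloud.databricks.com/"))
```

A value such as "https://https://example.cloud.databricks.com" is rejected by this check, because its parsed hostname is "https", which matches the host shown in the error.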
s_plank
by New Contributor III
  • 4425 Views
  • 6 replies
  • 5 kudos

Resolved! Databricks-Connect shows different partitions than Databricks for the same delta table

Hello, here is a small code snippet:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('example_app').getOrCreate()
spark.sql('SHOW PARTITIONS database.table').show()
The output inside the Databricks notebook: +-------------+--...

Latest Reply
s_plank
New Contributor III
  • 5 kudos

Hi @Jose Gonzalez, yes, the SQL Connector works fine. Thank you!

5 More Replies
JEAG
by New Contributor III
  • 40128 Views
  • 12 replies
  • 4 kudos

Error writing parquet files

Hi, we are having this chain of errors every day in different files and processes: An error occurred while calling o11255.parquet.: org.apache.spark.SparkException: Job aborted. Caused by: org.apache.spark.SparkException: Job aborted due to stage failu...

Latest Reply
databircks
New Contributor II
  • 4 kudos

Hi all, I am also looking for a resolution to the same error. We are using DBR "9.1 LTS ML (includes Apache Spark 3.1.2, Scala 2.12)" and getting this error. We are reading and writing data from the same path but there are partitions inside the folder...

11 More Replies
Krishscientist
by New Contributor III
  • 1372 Views
  • 1 reply
  • 0 kudos

Resolved! AutoML : data set for problem type "Classification"

Hi, I am working on an AutoML experiment. Could you please help me with a data set for the problem type "Classification"? Regards.

Latest Reply
Anonymous
Not applicable
  • 0 kudos

There are a lot of datasets available in /databricks-datasets/ that you can look through. You'll have to turn them into a table so that you can access them in AutoML. There are datasets associated with Spark: The Definitive Guide and Learning Spark ...

Rex
by New Contributor III
  • 6157 Views
  • 4 replies
  • 3 kudos

Resolved! Cannot use prepared statements with date functions

We are using PHP and the Databricks SQL ODBC driver and cannot run a query that uses DATE functions with prepared statements. Sample script/docker setup here: https://github.com/rlorenzo/databricks_php/blob/odbc_prepare_error/test_connection.php For e...

Latest Reply
Rex
New Contributor III
  • 3 kudos

@Bilal Aslam We tried CAST and CONVERT and are still getting the same error.

3 More Replies
TS
by New Contributor III
  • 813 Views
  • 0 replies
  • 1 kudos

Is there a better way for this matching?

I have an array: var arg = condColumnsKeys with the elements arg: Array[String] = Array(LOT_PREFIX, PS_NAME_BOOK_TEMPLATE_NAME, PS_NAME_PAGE_NAME, PS_NAME_FIELD_NAME). The desired outcome is to get the string "LOT_PREFIX" and store it in var ccLotPrefix. My fir...

Taha_Hussain
by Databricks Employee
  • 1548 Views
  • 1 reply
  • 1 kudos

Databricks Office Hours Our next Office Hours session is scheduled for April 27 2022 - 8:00 am PT. Do you have questions about how to set up or use Da...

Databricks Office Hours: Our next Office Hours session is scheduled for April 27 2022 - 8:00 am PT. Do you have questions about how to set up or use Databricks? Do you want to learn more about the best practices for deploying your use case or tips on da...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Just registered. Thank you and happy weekend.

