Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Sandesh87
by New Contributor III
  • 4532 Views
  • 2 replies
  • 2 kudos

Resolved! create a dataframe with all the responses from the api requests within foreachPartition

I am trying to execute an API call to get an object (JSON) from Amazon S3, and I am using foreachPartition to execute multiple calls in parallel: df.rdd.foreachPartition(partition => { //Initialize list buffer var buffer_accounts1 = new ListBuffer[St...

Latest Reply
jose_gonzalez
Databricks Employee
  • 2 kudos

Hi @Sandesh Puligundla, thank you for sharing the solution. We will mark it as the "best" response so that, if another user has the same question in the future, they will be able to find the solution right away.

1 More Replies
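
The accepted solution is not visible in this listing. As a minimal sketch of one common approach (an assumption, not necessarily what the poster did): foreachPartition returns nothing, so responses gathered inside it never reach the driver, whereas mapPartitions returns the yielded values as a new RDD that can be turned back into a DataFrame. The per-partition function below is plain Python so it can be tried without a cluster; `fake_fetch` and the column name `key` are hypothetical stand-ins for the real S3 call and schema.

```python
from typing import Callable, Iterable, Iterator, Tuple


def fetch_partition(
    keys: Iterable[str],
    fetch_object: Callable[[str], str],
) -> Iterator[Tuple[str, str]]:
    """Call fetch_object for every key in one partition and yield (key, body) pairs."""
    for key in keys:
        yield key, fetch_object(key)


def fake_fetch(key: str) -> str:
    # Hypothetical stand-in for the real API/S3 request.
    return '{"key": "%s"}' % key


# Local check, with a list standing in for one partition.
responses = list(fetch_partition(["a", "b"], fake_fetch))

# On a cluster (not run here), the same function could be used roughly as:
#   rdd = df.rdd.mapPartitions(
#       lambda rows: fetch_partition((r["key"] for r in rows), fake_fetch))
#   result_df = rdd.toDF(["key", "body"])
```

Because the function is a generator, each partition streams its responses instead of building a full buffer in memory first.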
Constantine
by Contributor III
  • 2656 Views
  • 2 replies
  • 3 kudos

Resolved! Can't view files of different types in databricks

I am reading Kafka input using Spark Streaming on Databricks and trying to deserialize it. The input is in Thrift format. I want to create a .thrift file to provide the schema but am unable to do so. Even if I create the file locally and...

Latest Reply
jose_gonzalez
Databricks Employee
  • 3 kudos

Hi @John Constantine, just checking whether you still need help. If you do, please share as many details and logs as possible so that we can help you better.

1 More Replies
KKo
by Contributor III
  • 3035 Views
  • 3 replies
  • 7 kudos

Resolved! ETL in Databricks

I use Azure Databricks for ETL. I read and write data from and to raw/stage/curate folders. I write a dataframe to a path (e.g., /mnt/datalake/curated/....). In the final step, I read data from the path, convert it to a dataframe, and write it to the Azure SQL DB/...

Latest Reply
jose_gonzalez
Databricks Employee
  • 7 kudos

Hi @Kris Koirala, just checking whether you still have any follow-up questions. Please let us know.

2 More Replies
Jreco
by Contributor
  • 6808 Views
  • 4 replies
  • 1 kudos

Resolved! Method iterableAsScalaIterable does not exist Pydeequ

Hello, I'm using Databricks and pydeequ to build a QA step in structured streaming. One of the Analyzers that I need to use is Uniqueness. If I add another one, such as Completeness, it works properly, but if I add Uniqueness I get an error: py4j....

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

I think it is because you did not attach the libraries to the cluster. When you work with a notebook, the SparkSession is already created. To add libraries, you should install them on the cluster (in the Compute tab), for example from PyPI or Maven.

3 More Replies
wgsing
by New Contributor
  • 4704 Views
  • 3 replies
  • 0 kudos

Resolved! Databricks Cluster create fail

I am facing a problem creating a cluster in Databricks. The error is as follows: Message: Cluster terminated. Reason: Unexpected launch failure. An unexpected error was encountered while setting up the cluster. Please retry and contact Databricks if the proble...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Hi @Giin Sing Wong, just a friendly follow-up. Is this issue still happening, or were you able to resolve it by increasing your account's quota? Please let us know.

2 More Replies
knight007
by New Contributor II
  • 6047 Views
  • 7 replies
  • 5 kudos

Containerized Databricks/Spark database

Hello. I'm fairly new to Databricks and Spark. I have a requirement to connect to Databricks using JDBC, and that works perfectly using the driver I downloaded from the Databricks website ("com.simba.spark.jdbc.Driver"). What I would like to do now is ha...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 5 kudos

@Gurps Bassi, "running instance of a database in docker": that is the Hive metastore, so it is just a mapping to data that usually lives physically on the data lake. Databricks is so cloud-centric that setting up a metastore locally doesn't make sense. Inste...

6 More Replies
Constantine
by Contributor III
  • 4549 Views
  • 1 replies
  • 5 kudos

Resolved! Unable to create a partitioned table on s3 data

I write data to S3 like data.write.format("delta").mode("append").option("mergeSchema", "true").save(s3_location) and create a partitioned table like CREATE TABLE IF NOT EXISTS demo_table USING DELTA PARTITIONED BY (column_a) LOCATION {s3_location}; whi...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 5 kudos

@John Constantine, in CREATE TABLE you need to specify the fields: CREATE TABLE IF NOT EXISTS demo_table (column_a STRING, number INT) USING DELTA PARTITIONED BY (column_a) LOCATION {s3_location}; and when you save data before creating ...

Constantine
by Contributor III
  • 2978 Views
  • 1 replies
  • 5 kudos

Resolved! Delta Table created on s3 has all null values

I have data in a Spark DataFrame and I write it to an S3 location. It has some complex datatypes, such as structs. When I create the table on top of the S3 location by using CREATE TABLE IF NOT EXISTS table_name USING DELTA LOCATION 's3://.../...'; Th...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 5 kudos

@John Constantine, try to load it as a DataFrame (spark.read.delta(path)) and validate what is loading. It could be easier to mount the S3 location as a folder to ensure that all the data is there (use dbutils or %fs to check) and that the connection is workin...

keunsoop
by New Contributor
  • 68286 Views
  • 7 replies
  • 2 kudos

Resolved! Run stored bash in Databricks with %sh

Hi, I made a bash file in Databricks, and I can see that the file is stored, as shown in the following picture. I wanted to run this bash file through a %sh cell, but as you can see in the following picture, I could not find the bash file, which I could find through d...

Latest Reply
jose_gonzalez
Databricks Employee
  • 2 kudos

Hi @keunsoop, are you able to run your code using an init script? I would like to share some docs in case you have any questions: https://docs.databricks.com/clusters/init-scripts.html

6 More Replies
Transcarent
by New Contributor II
  • 8162 Views
  • 6 replies
  • 0 kudos

Error: ConnectionError: HTTPSConnectionPool(host='https', port=443): Max retries exceeded with url: /api/2.0/workspace/list?path=%2F (Caused b...

Error: ConnectionError: HTTPSConnectionPool(host='https', port=443): Max retries exceeded with url: /api/2.0/workspace/list?path=%2F (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Hi @prakash reddy, is this an intermittent error, or are you able to reproduce it? Please let us know.

5 More Replies
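
In the error above, the connection is attempted to host='https', which suggests the configured workspace URL was parsed so that the scheme ended up as the hostname (for example, a doubled `https://` prefix or a missing `//`). That cause is an assumption; the thread excerpt does not confirm it. A small sketch of normalizing a host value before it is used follows; `normalize_host` is a hypothetical helper and the domain in the usage line is a placeholder.

```python
import re


def normalize_host(raw: str) -> str:
    """Return 'https://<hostname>' from a loosely formatted workspace URL."""
    value = raw.strip()
    # Drop any number of leading scheme prefixes, e.g. 'https://https://...'.
    value = re.sub(r"^(?:https?:/*)+", "", value, flags=re.IGNORECASE)
    # Keep only the hostname, discarding any path or trailing slash.
    hostname = value.split("/", 1)[0]
    if not hostname:
        raise ValueError(f"no hostname found in {raw!r}")
    return f"https://{hostname}"


# Placeholder domain, used only for illustration.
fixed = normalize_host("https://https://example.cloud.databricks.com/")
```

Checking the stored host value this way before retrying can help distinguish a configuration typo from a genuine DNS or network failure.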
s_plank
by New Contributor III
  • 5636 Views
  • 6 replies
  • 5 kudos

Resolved! Databricks-Connect shows different partitions than Databricks for the same delta table

Hello, here is a small code snippet:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('example_app').getOrCreate()
spark.sql('SHOW PARTITIONS database.table').show()
The output inside the Databricks notebook: +-------------+--...

Latest Reply
s_plank
New Contributor III
  • 5 kudos

Hi @Jose Gonzalez, yes, the SQL Connector works fine. Thank you!

5 More Replies
Krishscientist
by New Contributor III
  • 1800 Views
  • 1 replies
  • 0 kudos

Resolved! AutoML : data set for problem type "Classification"

Hi, I am working on an AutoML experiment. Could you please help me with a data set for the problem type "Classification"? Regards.

Latest Reply
Anonymous
Not applicable
  • 0 kudos

There are a lot of datasets available in /databricks-datasets/ that you can look through. You'll have to turn them into a table so that you can access them in AutoML. There are datasets associated with the Spark definitive guide and Learning Spark ...

Rex
by New Contributor III
  • 7295 Views
  • 4 replies
  • 3 kudos

Resolved! Cannot use prepared statements with date functions

We are using PHP and the Databricks SQL ODBC driver and cannot run a query that uses DATE functions with prepared statements. Sample script/docker setup here: https://github.com/rlorenzo/databricks_php/blob/odbc_prepare_error/test_connection.php For e...

Latest Reply
Rex
New Contributor III
  • 3 kudos

@Bilal Aslam We tried CAST and CONVERT and are still getting the same error.

3 More Replies
TS
by New Contributor III
  • 1048 Views
  • 0 replies
  • 1 kudos

Is there a better way for this matching?

I have an array: var arg = condColumnsKeys with the elements arg: Array[String] = Array(LOT_PREFIX, PS_NAME_BOOK_TEMPLATE_NAME, PS_NAME_PAGE_NAME, PS_NAME_FIELD_NAME). The desired outcome is to get the string "LOT_PREFIX" and store it in var ccLotPrefix. My fir...

Taha_Hussain
by Databricks Employee
  • 1848 Views
  • 1 replies
  • 1 kudos

Databricks Office Hours Our next Office Hours session is scheduled for April 27 2022 - 8:00 am PT. Do you have questions about how to set up or use Da...

Databricks Office Hours: Our next Office Hours session is scheduled for April 27, 2022 - 8:00 am PT. Do you have questions about how to set up or use Databricks? Do you want to learn more about the best practices for deploying your use case or tips on da...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 1 kudos

Just registered. Thank you and happy weekend.
