Data Engineering

Forum Posts

by Constantine (Contributor III)
  • 1862 Views
  • 2 replies
  • 5 kudos

Resolved! Unable to create a partitioned table on S3 data

I write data to S3 with data.write.format("delta").mode("append").option("mergeSchema", "true").save(s3_location) and create a partitioned table with CREATE TABLE IF NOT EXISTS demo_table USING DELTA PARTITIONED BY (column_a) LOCATION {s3_location}; whi...
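
A minimal sketch of one common approach, assuming the cause is that the Delta data at s3_location was written without partitioning. For an existing Delta location, the partition columns are read from the table's transaction log, so they are normally set when the data is written (or rewritten) with .partitionBy("column_a"), and the CREATE TABLE statement then just points at the path. LOCATION expects a quoted string literal. The helper name create_table_ddl is hypothetical.

```python
def create_table_ddl(table_name: str, location: str) -> str:
    """Build a CREATE TABLE statement that registers an existing Delta path.

    Partition columns are deliberately not repeated here: for an existing
    Delta location they come from the transaction log, so they are assumed
    to have been set at write time, e.g. (PySpark, not executed here):
        data.write.format("delta").mode("append").partitionBy("column_a").save(s3_location)
    """
    # LOCATION takes a string literal; reject paths that would break the quoting
    # instead of guessing at Spark SQL escaping rules.
    if "'" in location:
        raise ValueError("location must not contain a single quote")
    return f"CREATE TABLE IF NOT EXISTS {table_name} USING DELTA LOCATION '{location}'"
```

In a notebook the result would be passed to spark.sql(create_table_ddl("demo_table", s3_location)).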

Latest reply by Kaniz (Community Manager):

Hi @John Constantine, did the above suggestions provided by @Hubert Dudek help your case?

by Constantine (Contributor III)
  • 1278 Views
  • 2 replies
  • 5 kudos

Resolved! Delta table created on S3 has all null values

I have data in a Spark DataFrame and I write it to an S3 location. It has some complex data types, such as structs. When I create the table on top of the S3 location by using CREATE TABLE IF NOT EXISTS table_name USING DELTA LOCATION 's3://.../...'; Th...

Latest reply by Kaniz (Community Manager):

Hi @John Constantine, did you try the above suggestions?

by Krishscientist (New Contributor III)
  • 1095 Views
  • 4 replies
  • 2 kudos

Resolved! PySpark vs. pandas code difference

Hi, can you help me understand why the pandas code is not working but the PySpark code is? import pandas as pd; pdf = pd.read_csv('/FileStore/tables/new.csv', sep=',') fails with Error: No such file exists. The following worked: df = spark.read.csv("/FileStore/tables/new.csv", sep=",", ...
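
A likely explanation: Spark resolves /FileStore/... against DBFS, while pandas uses ordinary local file I/O, which on Databricks sees DBFS through the /dbfs mount on the driver (where that mount is available on the cluster). A minimal sketch of converting the path, with a hypothetical helper name to_local_path:

```python
import posixpath

# Local mount point through which DBFS is exposed to local file APIs on the driver.
DBFS_FUSE_ROOT = "/dbfs"


def to_local_path(spark_path: str) -> str:
    """Convert a DBFS path as used with Spark into the path pandas can open."""
    # Strip an explicit dbfs: scheme, e.g. "dbfs:/FileStore/..." -> "/FileStore/..."
    if spark_path.startswith("dbfs:"):
        spark_path = spark_path[len("dbfs:"):]
    # Already a local mount path: leave it unchanged.
    if spark_path == DBFS_FUSE_ROOT or spark_path.startswith(DBFS_FUSE_ROOT + "/"):
        return spark_path
    if not spark_path.startswith("/"):
        raise ValueError(f"expected an absolute DBFS path, got {spark_path!r}")
    return posixpath.join(DBFS_FUSE_ROOT, spark_path.lstrip("/"))
```

Usage: pdf = pd.read_csv(to_local_path('/FileStore/tables/new.csv'), sep=','), which reads /dbfs/FileStore/tables/new.csv.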

Latest reply by Kaniz (Community Manager):

Hi @Rafael Rockenbach and @Hubert Dudek, it was so nice to have your responses. Thank you for the time you put into our community. We really appreciate it.

by kdkoa (New Contributor III)
  • 1418 Views
  • 4 replies
  • 2 kudos

Resolved! Random SMTP authentication failures to Office 365 (Exchange)

Hey all, I have a Python script running in a Databricks notebook that uses smtplib to connect and send email via our Exchange Online server. At random times it starts getting authentication failures, and I can't figure out why. I've confirmed that ...
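
If the failures turn out to be transient on the Exchange side, one workaround (it does not address the root cause) is to open a fresh connection and retry the login with exponential backoff. A minimal sketch using only smtplib; the host smtp.office365.com on port 587 with STARTTLS is an assumption about the tenant's configuration, and send_with_retry and office365_connection are hypothetical names:

```python
import smtplib
import time
from email.message import EmailMessage
from typing import Callable

OFFICE365_HOST = "smtp.office365.com"  # assumption: Exchange Online SMTP submission host
OFFICE365_PORT = 587                   # assumption: submission port using STARTTLS


def office365_connection() -> smtplib.SMTP:
    """Open a new (not yet authenticated) SMTP connection."""
    return smtplib.SMTP(OFFICE365_HOST, OFFICE365_PORT, timeout=30)


def send_with_retry(
    msg: EmailMessage,
    username: str,
    password: str,
    connect: Callable[[], smtplib.SMTP] = office365_connection,
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Send msg, retrying on SMTPAuthenticationError with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            # Use a fresh connection for every attempt so a rejected session is not reused.
            with connect() as server:
                server.starttls()
                server.login(username, password)
                server.send_message(msg)
            return
        except smtplib.SMTPAuthenticationError:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            # Wait 2s, 4s, 8s, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing connect and sleep as parameters keeps the retry logic testable without a real mail server.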

Latest reply by Hubert-Dudek (Esteemed Contributor III):

If the message is 'bad username or password.', my guess is that the problem is on the Exchange side.

by athjain (New Contributor III)
  • 3015 Views
  • 5 replies
  • 7 kudos

Resolved! How to query Delta tables stored in S3 through a Databricks SQL endpoint?

After ETL, the Delta tables are stored in S3 in CSV or Parquet format, so the question now is how to allow a Databricks SQL endpoint to run queries over the files saved in S3.
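
For context: a Delta table directory holds Parquet data files plus a _delta_log directory of commit files, whereas a plain CSV or Parquet folder has no log. Either kind can be registered as a table with CREATE TABLE ... USING <format> LOCATION 's3://...' so the SQL endpoint can query it, assuming the endpoint already has access to the bucket (for example through an instance profile set up by an admin). A minimal sketch that picks the format from an S3 key listing; the helper names and the example keys are hypothetical:

```python
from typing import Iterable


def detect_table_format(keys: Iterable[str], prefix: str) -> str:
    """Return the USING format for files stored under an S3 prefix.

    keys are object keys, e.g. as returned by listing the prefix in S3.
    """
    prefix = prefix.rstrip("/") + "/"
    relative = [k[len(prefix):] for k in keys if k.startswith(prefix)]
    # A Delta table always has a _delta_log directory next to its data files.
    if any(r.startswith("_delta_log/") for r in relative):
        return "DELTA"
    if any(r.endswith(".parquet") for r in relative):
        return "PARQUET"
    if any(r.endswith(".csv") for r in relative):
        return "CSV"
    raise ValueError(f"no Delta, Parquet, or CSV files found under {prefix!r}")


def register_table_sql(table_name: str, bucket: str, prefix: str, keys: Iterable[str]) -> str:
    """Build the statement to run once (in a notebook or the SQL editor)."""
    fmt = detect_table_format(keys, prefix)
    location = f"s3://{bucket}/{prefix.strip('/')}"
    return f"CREATE TABLE IF NOT EXISTS {table_name} USING {fmt} LOCATION '{location}'"
```

For CSV folders, options such as a header flag usually need to be added to the generated statement as well.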

Latest reply by Anonymous:

Hey @Athlestan Jain, how are you doing? Thanks for posting your question. Were you able to resolve the issue? We'd love to hear from you.

by _Orc (New Contributor)
  • 1565 Views
  • 2 replies
  • 1 kudos

Resolved! Checkpoint is getting created even though the micro-batch append has failed

Use case: Read data from a source table using Spark Structured Streaming (round the clock). Apply transformation logic, etc., and finally merge the DataFrame into the target table. If there is any failure during transformation or merge, the Databricks job should...
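
Some background that may help here: Structured Streaming records a micro-batch's offsets in the checkpoint before running it and marks the batch as committed only after it finishes, so a failed batch is re-run on restart. If the function passed to foreachBatch catches an exception and returns normally, however, Spark sees the batch as successful and commits it. A minimal sketch of a handler that logs and re-raises; transform and merge_into_target are hypothetical callables standing in for the poster's logic:

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("stream_merge")


def make_batch_handler(
    transform: Callable[[Any], Any],
    merge_into_target: Callable[[Any, int], None],
) -> Callable[[Any, int], None]:
    """Build a foreachBatch callback that never swallows failures."""

    def process_batch(batch_df: Any, batch_id: int) -> None:
        try:
            merge_into_target(transform(batch_df), batch_id)
        except Exception:
            # Log for diagnosis, then re-raise so the query fails instead of
            # the micro-batch being marked as committed in the checkpoint.
            logger.exception("micro-batch %s failed", batch_id)
            raise

    return process_batch
```

It would be wired in with stream_df.writeStream.foreachBatch(make_batch_handler(my_transform, my_merge)).option("checkpointLocation", ...).start(), so that a job-level retry restarts the query from the last committed batch.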

Latest reply by Anonymous:

Hi @Om Singh, hope you are doing well. Just wanted to check in and see if you were able to find a solution to your question. Cheers

by Databricks_7045 (New Contributor III)
  • 2214 Views
  • 2 replies
  • 4 kudos

Resolved! Connecting to Delta tables from external tools

Hi Team, To access SQL tables we use tools like TOAD and SQL Server Management Studio (SSMS). Is there any tool to connect to and access Databricks Delta tables? Please let us know. Thank you

Latest reply by Anonymous:

Hi @Rajesh Vinukonda, hope you are doing well. Thanks for sending in your question. Were you able to find a solution to your query?

by RRO (Contributor)
  • 21024 Views
  • 7 replies
  • 7 kudos

Resolved! Performance of PySpark DataFrame is very slow after using a @pandas_udf

Hello, I am currently working on time series forecasting with FBProphet. Since I have data with many time series groups (~3000), I use a @pandas_udf to parallelize the training. @pandas_udf(schema, PandasUDFType.GROUPED_MAP) def forecast_netprofit(pr...

Latest reply by RRO (Contributor):

Thank you for the answers. Unfortunately, this did not solve the performance issue. What I did now is save the results into a table: results.write.mode("overwrite").saveAsTable("db.results") This is probably not the best solution, but after I do that ...

by sarvesh242 (Contributor)
  • 915 Views
  • 2 replies
  • 2 kudos

Resolved! java.lang.NoSuchMethodError in Databricks

I have created a package. Now I am calling a method from this package in my notebook, but it throws java.lang.NoSuchMethodError in Databricks. The method exists in the package. Can you please guide me? Thanks!

Latest reply by sarvesh242 (Contributor):

Hi! I am sharing the error stack with you. I can't share the code because it is confidential. Can you please guide me? java.lang.NoSuchMethodError: com.iig.utils.common.IIGCommonConstants$.flowProperties()Ljava/lang/String; at com.ii...

by DavideCagnoni (Contributor)
  • 2485 Views
  • 8 replies
  • 3 kudos

Resolved! How to force pandas_on_spark plots to use all dataframe data?

When I load a table as a `pandas_on_spark` DataFrame and try to, e.g., scatter-plot two columns, I get only a subset of the desired points. For example, if I try to plot two columns from a table with 1,000,000 rows, I only see some of the data - i...

Latest reply by Kaniz (Community Manager):

Hi @Davide Cagnoni, the Ideas Portal lets you influence the Databricks product roadmap by providing feedback directly to the product team. Use the Ideas Portal to: enter feature requests; view, comment on, and vote up other users’ requests; monitor the p...

by BeginnerBob (New Contributor III)
  • 2328 Views
  • 4 replies
  • 4 kudos

Resolved! Bronze, silver, and gold layers

Is there a best-practice guide for setting up Delta Lake for these three layers? I'm looking for documents or scripts that will assist me.

Latest reply by Kaniz (Community Manager):

Hi @Lloyd Vickery, did you find any of the above answers helpful? Please let us know if you solved it.

by michaelh (New Contributor III)
  • 2902 Views
  • 3 replies
  • 2 kudos

Resolved! Databricks Runtime from Docker Hub image

Hello, are the Databricks runtimes from Docker Hub (https://hub.docker.com/r/databricksruntime/standard) the same as the actual runtimes inside Databricks? I mean, when we make our own Docker image from databricksruntime/standard, will there be the same dependencies...


Latest reply by jose_gonzalez (Moderator):

Hi @michael henzl, just checking whether you still need help with this. Please let us know.

by Sandesh87 (New Contributor III)
  • 1570 Views
  • 2 replies
  • 2 kudos

Resolved! Create a DataFrame with all the responses from the API requests within foreachPartition

I am trying to execute an API call to get an object (JSON) from Amazon S3, and I am using foreachPartition to execute multiple calls in parallel: df.rdd.foreachPartition(partition => { //Initialize list buffer var buffer_accounts1 = new ListBuffer[St...

Latest reply by jose_gonzalez (Moderator):

Hi @Sandesh Puligundla, thank you for sharing the solution. We will mark it as the "best" response so that, in the future, if another user has the same question, they will be able to find the solution right away.

by Constantine (Contributor III)
  • 829 Views
  • 2 replies
  • 3 kudos

Resolved! Can't view files of different types in Databricks

I am reading a Kafka input using Spark Streaming on Databricks and trying to deserialize it. The input is in Thrift format. I want to create a .thrift file to provide the schema but am unable to do it. Even if I create the file locally and...

Latest reply by jose_gonzalez (Moderator):

Hi @John Constantine, just checking whether you still need help. If you do, please share as many details and logs as possible so we can help better.
