Data Engineering

Forum Posts

by Constantine • Contributor III

04-10-2022 10:56:12 PM

1862 Views
2 replies
5 kudos

Resolved! Unable to create a partitioned table on s3 data

I write data to S3 like data.write.format("delta").mode("append").option("mergeSchema", "true").save(s3_location) and create a partitioned table like CREATE TABLE IF NOT EXISTS demo_table USING DELTA PARTITIONED BY (column_a) LOCATION {s3_location}; whi...

Latest Reply

Kaniz
Community Manager

04-13-2022 3:01:47 AM

5 kudos

Hi @John Constantine, did the suggestions above from @Hubert Dudek help in your case?

1 More Replies

by Constantine • Contributor III

04-11-2022 12:54:25 PM

1278 Views
2 replies
5 kudos

Resolved! Delta Table created on s3 has all null values

I have data in a Spark DataFrame and I write it to an S3 location. It has some complex datatypes like structs etc. When I create the table on top of the S3 location by using CREATE TABLE IF NOT EXISTS table_name USING DELTA LOCATION 's3://.../...'; Th...

Latest Reply

Kaniz
Community Manager

04-13-2022 2:37:19 AM

5 kudos

Hi @John Constantine, did you try the above suggestions?

1 More Replies

by Krishscientist • New Contributor III

04-12-2022 5:52:43 AM

1095 Views
4 replies
2 kudos

Resolved! Py Spark Pandas Code diff

Hi, can you help me understand why the pandas code is not working while the PySpark code is? import pandas as pd; pdf = pd.read_csv('/FileStore/tables/new.csv', sep=',') Error: No such file exists. The code below worked: df = spark.read.csv("/FileStore/tables/new.csv", sep=",", ...
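A likely explanation, offered as an assumption since the replies are not shown here: pandas opens files through the driver's local filesystem, while Spark resolves `/FileStore/...` against DBFS. On clusters where DBFS is exposed under the local `/dbfs` mount, prefixing the path is the usual workaround. A minimal sketch; the helper name `to_local_dbfs_path` is hypothetical.

```python
def to_local_dbfs_path(path: str) -> str:
    """Map a DBFS-style path to the local /dbfs mount that pandas can open."""
    # Drop the "dbfs:" scheme if the caller passed a URI-style path.
    if path.startswith("dbfs:/"):
        path = path[len("dbfs:"):]
    # Already a local mount path: leave it alone.
    if path.startswith("/dbfs/"):
        return path
    return "/dbfs/" + path.lstrip("/")


# Usage on a Databricks cluster (assumes the /dbfs mount is available):
# import pandas as pd
# pdf = pd.read_csv(to_local_dbfs_path("/FileStore/tables/new.csv"), sep=",")
```

The Spark call in the post needs no change, because `spark.read.csv` already resolves the path against DBFS.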

Latest Reply

Kaniz
Community Manager

04-13-2022 1:48:54 AM

2 kudos

Hi @Rafael Rockenbach and @Hubert Dudek, it was so nice to have your responses. Thank you for the time you put into our community. I really want you to know how much we appreciate that.

3 More Replies

by kdkoa • New Contributor III

03-17-2022 11:20:58 AM

1418 Views
4 replies
2 kudos

Resolved! Random SMTP authentication failures to Office 365 (Exchange)

Hey all, I have a Python script running in a Databricks notebook which uses smtplib to connect and send email via our Exchange Online server. At random times, it will start getting authentication failures and I can't figure out why. I've confirmed that ...

Latest Reply

Hubert-Dudek
Esteemed Contributor III

03-17-2022 12:02:11 PM

2 kudos

If the message is "bad username or password", my guess is that the problem is on the Exchange side.
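If the failures really are intermittent on the server side, one client-side mitigation (not taken from this thread, just a sketch) is to retry the send with exponential backoff. Here `send` is assumed to be any zero-argument callable that opens the connection, logs in, and sends the message; the name `send_with_retry` and its parameters are hypothetical.

```python
import smtplib
import time


def send_with_retry(send, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send(); on SMTPAuthenticationError, wait and retry with backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except smtplib.SMTPAuthenticationError:
            if attempt == attempts:
                raise  # give up after the final attempt
            # Delay doubles each time: base_delay, 2 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

A retry only hides transient failures; a persistent "bad username or password" would still surface after the last attempt.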

3 More Replies

by athjain • New Contributor III

03-07-2022 12:31:19 AM

3015 Views
5 replies
7 kudos

Resolved! How to query deltatables stored in s3 through databricks SQL Endpoint?

The Delta tables produced by the ETL are stored in S3 in CSV or Parquet format, so the question now is how to allow a Databricks SQL endpoint to run queries over the files saved in S3.

Latest Reply

Anonymous
Not applicable

04-12-2022 9:37:26 AM

7 kudos

Hey @Athlestan Jain, how are you doing? Thanks for posting your question. Do you think you were able to resolve the issue? We'd love to hear from you.

4 More Replies

by _Orc • New Contributor

03-02-2022 12:19:52 PM

1565 Views
2 replies
1 kudos

Resolved! Checkpoint is getting created even though the microbatch append has failed

Use case: Read data from the source table using Spark Structured Streaming (round the clock). Apply transformation logic, etc., and finally merge the dataframe into the target table. If there is any failure during transformation or merge, the Databricks job should...

Latest Reply

Anonymous
Not applicable

04-12-2022 9:34:32 AM

1 kudos

Hi @Om Singh, hope you are doing well. Just wanted to check in and see if you were able to find a solution to your question? Cheers

1 More Replies

by Databricks_7045 • New Contributor III

03-03-2022 9:45:31 AM

2214 Views
2 replies
4 kudos

Resolved! Connecting Delta Tables from any Tools

Hi Team, to access SQL tables we use tools like TOAD and SQL Server Management Studio (SSMS). Is there any tool to connect to and access Databricks Delta tables? Please let us know. Thank you

Latest Reply

Anonymous
Not applicable

04-12-2022 9:29:43 AM

4 kudos

Hi @Rajesh Vinukonda, hope you are doing well. Thanks for sending in your question. Were you able to find a solution to your query?

1 More Replies

by RRO • Contributor

03-31-2022 3:12:14 AM

21024 Views
7 replies
7 kudos

Resolved! Performance for pyspark dataframe is very slow after using a @pandas_udf

Hello, I am currently working on time series forecasting with FBProphet. Since I have data with many time series groups (~3000), I use a @pandas_udf to parallelize the training. @pandas_udf(schema, PandasUDFType.GROUPED_MAP) def forecast_netprofit(pr...

Latest Reply

RRO
Contributor

04-12-2022 8:01:24 AM

7 kudos

Thank you for the answers. Unfortunately this did not solve the performance issue. What I did now is save the results into a table: results.write.mode("overwrite").saveAsTable("db.results") This is probably not the best solution but after I do that ...

6 More Replies

by sarvesh242 • Contributor

04-12-2022 1:13:50 AM

915 Views
2 replies
2 kudos

Resolved! java.lang.NoSuchMethodError in databricks

I have created a package. Now I am calling a method from this package in my notebook, but it throws java.lang.NoSuchMethodError in Databricks. The method exists in the package. Can you please guide me on this? Thanks!

Latest Reply

sarvesh242
Contributor

04-12-2022 3:14:10 AM

2 kudos

Hi! I am sharing the error stack with you. I can't share the code because it is confidential. Can you please guide me? java.lang.NoSuchMethodError: com.iig.utils.common.IIGCommonConstants$.flowProperties()Ljava/lang/String; at com.ii...

1 More Replies

by DavideCagnoni • Contributor

02-11-2022 1:09:31 AM

2485 Views
8 replies
3 kudos

Resolved! How to force pandas_on_spark plots to use all dataframe data?

When I load a table as a `pandas_on_spark` dataframe, and try to e.g. scatterplot two columns, what I obtain is a subset of the desired points. For example, if I try to plot two columns from a table with 1000000 rows, I only see some of the data - i...

Latest Reply

Kaniz
Community Manager

04-11-2022 11:43:04 PM

3 kudos

Hi @Davide Cagnoni, the Ideas Portal lets you influence the Databricks product roadmap by providing feedback directly to the product team. Use the Ideas Portal to: Enter feature requests. View, comment, and vote up other users’ requests. Monitor the p...

7 More Replies

by BeginnerBob • New Contributor III

03-15-2022 10:52:29 AM

2328 Views
4 replies
4 kudos

Resolved! Bronze silver gold layers

Is there a best-practice guide for setting up Delta Lake with these three layers? I'm looking for documentation or scripts that will assist me.

Latest Reply

Kaniz
Community Manager

04-11-2022 10:24:06 PM

4 kudos

Hi @Lloyd Vickery, did you find any of the above answers helpful? Would you like to tell us if you solved it?

3 More Replies

by spaz • New Contributor II

02-13-2022 1:03:22 PM

1679 Views
5 replies
1 kudos

Resolved! Convert table in nested JSON

What is the easiest way to convert a table to a nested JSON?
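One way to picture the conversion, as a minimal sketch in plain Python rather than Spark: group the flat rows by a key, nest the remaining columns under it, then serialize. The column names (`customer`, `order_id`, `amount`) and the helper `rows_to_nested_json` are made up for illustration; in PySpark the same shape is typically built with `struct`, `collect_list`, and `to_json` from `pyspark.sql.functions`.

```python
import json
from collections import defaultdict


def rows_to_nested_json(rows, key, child_name):
    """Group flat row dicts by `key` and nest the other columns under `child_name`."""
    grouped = defaultdict(list)
    for row in rows:
        # Everything except the grouping key becomes part of the nested record.
        child = {k: v for k, v in row.items() if k != key}
        grouped[row[key]].append(child)
    nested = [{key: k, child_name: children} for k, children in grouped.items()]
    return json.dumps(nested)


rows = [
    {"customer": "a", "order_id": 1, "amount": 10.0},
    {"customer": "a", "order_id": 2, "amount": 5.5},
    {"customer": "b", "order_id": 3, "amount": 7.0},
]
print(rows_to_nested_json(rows, key="customer", child_name="orders"))
```

This only works for data that fits in driver memory; for a large table the grouping would have to stay in Spark.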

Latest Reply

Anonymous
Not applicable

03-06-2022 4:08:12 PM

1 kudos

@Sergio Paz - How's it going? Are you able to give us more information?

4 More Replies

by michaelh • New Contributor III

03-11-2022 3:09:23 AM

2902 Views
3 replies
2 kudos

Resolved! Databricks runtime from docker hub image

Hello, are the Databricks runtimes on Docker Hub ( https://hub.docker.com/r/databricksruntime/standard ) the same as the actual runtimes inside Databricks? I mean, when we build our own Docker image from databricksruntime/standard, will there be the same dependencies...

Latest Reply

jose_gonzalez
Moderator

04-11-2022 2:20:35 PM

2 kudos

Hi @michael henzl, just checking whether you still need help with this. Please let us know.

2 More Replies

by Sandesh87 • New Contributor III

03-08-2022 9:53:56 AM

1570 Views
2 replies
2 kudos

Resolved! create a dataframe with all the responses from the api requests within foreachPartition

I am trying to execute an API call to get an object (JSON) from Amazon S3, and I am using foreachPartition to execute multiple calls in parallel: df.rdd.foreachPartition(partition => { //Initialize list buffer var buffer_accounts1 = new ListBuffer[St...

Latest Reply

jose_gonzalez
Moderator

04-11-2022 2:08:20 PM

2 kudos

Hi @Sandesh Puligundla, thank you for sharing the solution. We will mark it as the "best" response so that, in the future, if another user has the same question, they will be able to find the solution right away.

1 More Replies

by Constantine • Contributor III

03-06-2022 10:49:56 AM

829 Views
2 replies
3 kudos

Resolved! Can't view files of different types in databricks

I am reading a Kafka input using Spark Streaming on databricks and trying to deserialize it. The input is in the form of thrift. I want to create a file of .thrift format to provide schema but am unable to do it. Even if I create the file locally and...

Latest Reply

jose_gonzalez
Moderator

04-11-2022 1:59:33 PM

3 kudos

Hi @John Constantine, just checking whether you still need help. If you do, please share as many details and logs as possible so that we can help better.

1 More Replies
