Data Engineering

Forum Posts

by Sri_H • New Contributor III

07-27-2022 7:53:09 AM

3796 Views
2 replies
1 kudos

Databricks Academy - Access to training recording attended during Data & AI Summit 2022

Hi All, I attended a 2-day ML training during the Data & AI Summit 2022 and received an email from the events team (ataaisummit@typeaevents.com) saying that the recordings for the training and related material will be available in my Databricks Academy...

Latest Reply

Anonymous
Not applicable

08-03-2022 11:10:43 AM

1 kudos

Hi @Sri H! I am checking on this for you, hang tight! I'll try to get an update ASAP from the Academy Team.

by Data_Engineer3 • Contributor III

06-11-2022 7:01:55 AM

8953 Views
3 replies
3 kudos

Resolved! Unable to read file from dbfs location in databricks.

When I try to read a file from DBFS, it throws an error: Caused by: FileReadException: Error while reading file dbfs:/.......................parquet is not a Parquet file. Expected magic number at tail [80, 65, 82, 49] but found [105, 108, 101, 115]. Bu...
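The byte values in this error message can be decoded directly. Parquet files start and end with the 4-byte marker `PAR1`, which is the expected `[80, 65, 82, 49]`; the bytes actually found decode to plain ASCII text, which suggests the file's tail is not a Parquet footer. A minimal sketch in plain Python, assuming the file is reachable as a local path (on Databricks this is often the `/dbfs/...` mount, which is an assumption here); the helper names `decode_magic` and `has_parquet_magic` are hypothetical and used only for illustration:

```python
import os

PARQUET_MAGIC = b"PAR1"  # 4-byte marker at the start and end of a Parquet file


def decode_magic(values):
    """Turn a list of byte values from the error message into ASCII text."""
    return bytes(values).decode("ascii")


def has_parquet_magic(path):
    """Return True if the file at `path` ends with the Parquet magic bytes."""
    if os.path.getsize(path) < len(PARQUET_MAGIC):
        return False
    with open(path, "rb") as f:
        # Jump to the last 4 bytes of the file and compare them to the marker.
        f.seek(-len(PARQUET_MAGIC), os.SEEK_END)
        return f.read() == PARQUET_MAGIC


print(decode_magic([80, 65, 82, 49]))      # PAR1
print(decode_magic([105, 108, 101, 115]))  # iles
```

If `has_parquet_magic` returns False for a file with a `.parquet` extension, the file may have been written in another format (for example CSV or text) or may be incomplete; the check alone does not say which.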

by Daniel3 • New Contributor II

08-11-2022 5:52:16 AM

2063 Views
0 replies
0 kudos

Unable to load table in Community Edition using COPY INTO through DBFS

I uploaded a sample CSV file using the DBFS tab under the Data module. After creating the table, I am unable to load the uploaded CSV data into it. I get the error below; please let me know the solution. Error in SQL statement: UnsupportedOperationExceptio...

by AJ270990 • Contributor III

08-09-2022 11:54:30 PM

22367 Views
3 replies
0 kudos

Resolved! I am getting ParseException: error while running the spark SQL query

I am using the code below to create the Spark session and load the CSV file. Creating the session and loading the CSV both work. However, the SQL query raises a ParseException.
%python
from pyspark.sql import SparkSession
# Create a SparkSessio...

Latest Reply

AJ270990
Contributor III

08-10-2022 10:49:33 PM

0 kudos

This is resolved. The query below works fine now:
sqldf = spark.sql("select sum(cast(enrollment as float)), sum(cast(growth as float)),`plan type`,`Parent Organization`,state,`Special Needs Plan`,`Plan Name Sec A`, CASE when `Plan ID` between '800' and '89...

by Jhaji • New Contributor

08-10-2022 2:13:56 PM

1327 Views
0 replies
0 kudos

The REFRESH TABLE command doesn't seem to invalidate the local cache. Am I missing something?

Hi Team, as part of the "Data Engineering with Databricks" course, section "DE 4.2 - Providing Options for External Sources", I read the total number of records in the sales_csv table as 10510. The append command in Cmd17 is supposed to increase this number 2x,...

by 159312 • New Contributor III

06-30-2022 10:19:50 AM

4239 Views
3 replies
0 kudos

When trying to ingest parquet files with autoloader I get an error stating that schema inference is not supported, but the parquet files have schema data. No inference should be necessary. Is this right?

When trying to ingest Parquet files with Auto Loader using the following code:
df = (spark
  .readStream
  .format("cloudFiles")
  .option("cloudfiles.format","parquet")
  .load(filePath))
I get the following error:
java.lang.UnsupportedOperationException:...

Latest Reply

Noopur_Nigam
Databricks Employee

07-25-2022 3:40:28 AM

0 kudos

Hi @Ben Bogart, this is supported in DBR 11.1 and above. The document below says the same: https://docs.databricks.com/ingestion/auto-loader/schema.html#schema-inference-and-evolution-in-auto-loader
Please try DBR 11.1 and let us know if y...
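On runtimes older than the one named in this reply, a commonly used alternative is to pass the schema explicitly so that Auto Loader has nothing to infer. The following is a minimal sketch under that assumption, not a confirmed fix for this thread; the function name `read_parquet_stream` and the example column names are hypothetical, and `spark` is assumed to be the session that Databricks notebooks provide:

```python
def read_parquet_stream(spark, file_path, schema_ddl):
    """Build an Auto Loader stream with an explicit schema (no inference)."""
    return (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "parquet")
        # DDL string describing the columns, e.g. "id BIGINT, name STRING"
        # (hypothetical columns; use the real schema of the Parquet files).
        .schema(schema_ddl)
        .load(file_path)
    )
```

In a notebook this could be called as `df = read_parquet_stream(spark, filePath, "id BIGINT, name STRING")`, where the DDL string must match the columns actually present in the files.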

by Zair • New Contributor III

08-06-2022 2:15:58 PM

2965 Views
2 replies
2 kudos

How to handle 100+ tables ETL through spark structured streaming?

I am writing a streaming job that will perform ETL for more than 130 tables. I would like to know whether there is a better way to do this. Another solution I am considering is to write a separate streaming job for each table. Source data is coming...

Latest Reply

artsheiko
Databricks Employee

08-07-2022 6:10:46 AM

2 kudos

Hi, to answer your question it would help to have more details on what you're trying to achieve and the bottleneck you are encountering now. Indeed, handling the processing of 130 tables in one monolith could be challenging as the business rul...

by AJ270990 • Contributor III

04-19-2022 5:02:45 AM

11792 Views
3 replies
4 kudos

Resolved! How to bold text?

I have searched for several ways to make text bold but have been unable to achieve it. I added '\033[1m' before my text and '\033[0m' after it, but the text does not appear bold. I need to apply bold to the header "Ocean" in the image below, which is i...

Latest Reply

AJ270990
Contributor III

08-09-2022 11:55:22 PM

4 kudos

I used plt.text() to make the text bold.

by RaymondLC92 • New Contributor II

08-05-2022 1:03:18 PM

4130 Views
2 replies
1 kudos

Resolved! How to obtain run_id without using dbutils in python?

We would like to get the run_id in a job run, and we have the unfortunate restriction that we cannot use dbutils. Is there a way to get it in Python? I know that the Job ID can be retrieved from the environment variables.

Latest Reply

artsheiko
Databricks Employee

08-07-2022 6:15:28 AM

1 kudos

Hi, please refer to the following thread: https://community.databricks.com/s/question/0D58Y00008pbkj9SAA/how-to-get-the-job-id-and-run-id-and-save-into-a-database
Hope this helps.

by Rahul_Samant • Contributor

01-19-2022 2:20:31 AM

10217 Views
4 replies
5 kudos

Resolved! High Concurrency Pass Through Cluster : pyarrow optimization not working while converting to pandasdf

I need to convert a Spark DataFrame to a pandas DataFrame with Arrow optimization:
spark.conf.set("spark.sql.execution.arrow.enabled", "true")
data_df=df.toPandas()
but I randomly get one of the errors below while doing so: Exception: arrow is not support...

Latest Reply

AlexanderBij
New Contributor II

08-09-2022 5:42:26 AM

5 kudos

Can you confirm this is a known issue? I am running into the same issue; here is an example to test in one cell.
# using Arrow fails on HighConcurrency-cluster with PassThrough in runtime 10.4 (and 10.5 and 11.0)
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled",...

by NickMendes • Databricks Partner

08-02-2022 1:12:28 PM

3788 Views
3 replies
1 kudos

Resolved! Databricks SQL duplicates alert e-mail

Hi everyone, I've been working in DBC SQL and creating a few e-mail alerts, and I have noticed that when an alert triggers, the e-mail notification is duplicated. I've run lots of tests in different situations; however, it keeps duplicating in my...

Latest Reply

NickMendes
Databricks Partner

08-09-2022 5:11:31 AM

1 kudos

After lots of testing, I've finally figured out one solution. I've changed "Notifications" settings to "When triggered, send notification At most every 1 day" and "Refresh" to "Refresh every 1 day". Now it is working perfectly.

by harsha4u • New Contributor II

06-09-2022 9:32:19 AM

1612 Views
1 reply
2 kudos

Any suggestions around automating sizing of clusters and best practices around it? Other than enabling auto scaling, are there any other practices aro...

Any suggestions for automating the sizing of clusters, and best practices around it? Other than enabling autoscaling, are there any other practices for right-sizing driver and worker nodes?

Latest Reply

User16766737456
Databricks Employee

08-09-2022 3:17:58 AM

2 kudos

Autoscaling should help in sizing the clusters according to the workload. You may want to consider the recommendations here: https://docs.databricks.com/clusters/cluster-config-best-practices.html#cluster-sizing-considerations

by valiro21 • Contributor

08-08-2022 7:43:41 AM

3306 Views
3 replies
0 kudos

Resolved! docs.databricks.com

Is there any example that contains at least one widget that works with the Databricks SQL Create Dashboard API? I tried the following simple dashboard:
{ "name": "Delta View", "dashboard_filters_enabled": false, "widgets": [ { ...

Latest Reply

Debayan
Databricks Employee

08-09-2022 12:31:44 AM

0 kudos

@Valentin Rosca, right now Databricks does not recommend creating new widgets via the queries and dashboards API (https://docs.databricks.com/sql/api/queries-dashboards.html#operation/sql-analytics-create-dashboard). Also, copying a dashboard fr...

by Isaac_Low • New Contributor II

08-04-2022 10:26:10 PM

3025 Views
2 replies
3 kudos

Resolved! In the navigation pane, I want to look for Databricks Repos, but this is not available in the Community Edition. Can anyone point me in the right direction?

Latest Reply

Isaac_Low
New Contributor II

08-08-2022 5:09:24 PM

3 kudos

All good. I just imported the training material manually using the dbc link. Didn't need repos for that.

by davidvb • New Contributor II

08-08-2022 2:26:19 AM

3555 Views
2 replies
1 kudos

I have a big problem creating a community account

It is impossible for me to create a community account. I enter my data on the website, and in the next step, when the website shows me the 3 options (Google, Amazon, etc.) and I click "Get started with community account", the website shows me this. I have try...

Latest Reply

jose_gonzalez
Databricks Employee

08-08-2022 2:40:24 PM

1 kudos

Hi @david vazquez, it seems the website was down for maintenance. Next time, you can check the status page to see why the website is down: https://status.databricks.com/
