Data Engineering

Forum Posts

by elgeo • Valued Contributor II

11-16-2022 2:27:20 AM

824 Views
3 replies
2 kudos

Resolved! Disable auto-complete (tab button)

Hello. How can we disable the autocomplete that appears when pressing the Tab key? Thank you.

Latest Reply

elgeo
Valued Contributor II

11-18-2022 12:30:22 AM

2 kudos

Thank you @Kaniz Fatma

2 More Replies

by sharonbjehome • New Contributor

11-16-2022 4:17:29 AM

742 Views
1 replies
1 kudos

Structured Streaming from MongoDB Atlas not parsing JSON correctly

Hi all, I have a table in MongoDB Atlas that I am trying to read continuously into memory and will eventually write out to a file. However, when I look at the in-memory table, it doesn't have the correct schema. Code here: from pyspark.sql.types impo...

Latest Reply

Debayan
Esteemed Contributor III

11-17-2022 11:36:04 PM

1 kudos

Hi @sharonbjehome, this needs to be checked thoroughly via a support ticket. Did you follow this guide: https://docs.databricks.com/external-data/mongodb.html Also, could you please check with MongoDB support? Was this working before?

by dara • New Contributor

11-16-2022 4:23:16 PM

507 Views
1 replies
1 kudos

How to count DelayCategories?

I would like to know the count of each category in each year. When I run count, it doesn't work.

Latest Reply

Debayan
Esteemed Contributor III

11-17-2022 11:29:28 PM

1 kudos

Hi @Dara Tourt, when you say it does not work, what is the error? You can run the count aggregate function: https://docs.databricks.com/sql/language-manual/functions/count.html Please let us know if this helps.
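
To illustrate the count aggregate suggested in this reply, here is a minimal sketch of counting rows per category per year with GROUP BY. The table name `flights`, the column names `Year` and `DelayCategory`, and the sample rows are assumptions made up for the example (the original post does not show its schema), and Python's built-in sqlite3 stands in for Spark SQL so that the snippet runs anywhere. The same SELECT ... GROUP BY shape should carry over to a Databricks SQL cell.

```python
import sqlite3

# Hypothetical sample data: (Year, DelayCategory) pairs.
rows = [
    (2021, "short"),
    (2021, "short"),
    (2021, "long"),
    (2022, "short"),
    (2022, "long"),
    (2022, "long"),
]

# In-memory database used only as a stand-in for a Spark SQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (Year INTEGER, DelayCategory TEXT)")
conn.executemany("INSERT INTO flights VALUES (?, ?)", rows)

# One output row per (Year, DelayCategory) combination, with its row count.
counts = conn.execute(
    """
    SELECT Year, DelayCategory, COUNT(*) AS n
    FROM flights
    GROUP BY Year, DelayCategory
    ORDER BY Year, DelayCategory
    """
).fetchall()
conn.close()

for year, category, n in counts:
    print(year, category, n)
```

Grouping by both columns is what yields a count per category within each year; grouping by the category alone would collapse the years together.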

by 547284 • New Contributor II

11-17-2022 12:49:35 PM

344 Views
1 replies
1 kudos

How to read in CSVs from an S3 directory with different columns

I can read all CSVs under an S3 URI by doing:
files = dbutils.fs.ls('s3://example-path')
df = spark.read.options(header='true', encoding='iso-8859-1', dateFormat='yyyyMMdd', ignoreLeadingWhiteSpace='true', i...

Latest Reply

Debayan
Esteemed Contributor III

11-17-2022 10:59:49 PM

1 kudos

Hi @Anthony Wang, as of now, I think that's the only way. Please refer to: https://docs.databricks.com/external-data/csv.html#pitfalls-of-reading-a-subset-of-columns Please let us know if this helps.
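
The question is truncated, so the exact approach being discussed is not visible. As an illustration of the general idea only (read each file with its own header, then align rows by column name and fill the gaps), here is a plain-Python sketch using the standard csv module. It does not use Spark; the function name `union_rows` and the sample data are hypothetical. In PySpark, `DataFrame.unionByName(other, allowMissingColumns=True)` (Spark 3.1 and later) performs a similar alignment across DataFrames.

```python
import csv
import io


def union_rows(csv_texts):
    """Merge CSV documents with differing headers into one column set.

    Returns (columns, rows); a column absent from a file is filled with None.
    """
    columns = []
    rows = []
    for text in csv_texts:
        reader = csv.DictReader(io.StringIO(text))
        # Record columns in first-seen order across all files.
        for name in reader.fieldnames or []:
            if name not in columns:
                columns.append(name)
        rows.extend(reader)
    # Re-key every row against the full column list.
    aligned = [{c: row.get(c) for c in columns} for row in rows]
    return columns, aligned


# Hypothetical inputs: two files that share only the "id" column.
file_a = "id,name\n1,a\n"
file_b = "id,price\n2,9.5\n"

columns, merged = union_rows([file_a, file_b])
print(columns)
for row in merged:
    print(row)
```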

by sage5616 • Valued Contributor

07-05-2022 11:01:48 AM

4363 Views
4 replies
6 kudos

Saving PySpark standard out and standard error logs to cloud object storage

I am running my PySpark data pipeline code on a standard Databricks cluster. I need to save all Python/PySpark standard output and standard error messages into a file in an Azure BLOB account. When I run my Python code locally I can see all messages i...

Latest Reply

sage5616
Valued Contributor

07-08-2022 8:28:18 AM

6 kudos

This is the approach I am currently taking. It is documented here: https://stackoverflow.com/questions/62774448/how-to-capture-cells-output-in-databricks-notebook
from IPython.utils.capture import CapturedIO
capture = CapturedIO(sys.stdout, sys.st...
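
The snippet in this reply is cut off, so as a rough, self-contained illustration of the same goal, here is a sketch that swaps in the standard library's `contextlib.redirect_stdout` and `contextlib.redirect_stderr` in place of IPython's `CapturedIO`. The helper `run_and_capture`, the function `noisy_step`, and the log file name are hypothetical. Copying the resulting file to Azure storage is left out, since that depends on how the storage account is mounted or accessed.

```python
import contextlib
import io
import sys
import tempfile
from pathlib import Path


def run_and_capture(func):
    """Run func() and return (stdout_text, stderr_text) written by Python code."""
    out, err = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        func()
    return out.getvalue(), err.getvalue()


def noisy_step():
    # Hypothetical pipeline step that writes to both streams.
    print("pipeline started")
    print("something went wrong", file=sys.stderr)


stdout_text, stderr_text = run_and_capture(noisy_step)

# Write both streams to one local log file; uploading it is a separate step.
log_path = Path(tempfile.gettempdir()) / "pipeline_output.log"
log_path.write_text("STDOUT\n" + stdout_text + "STDERR\n" + stderr_text)
print(log_path)
```

This only captures text written through Python's `sys.stdout` and `sys.stderr`; output produced by subprocesses or by the JVM side of Spark is not redirected by these context managers.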

3 More Replies

by flora2408 • New Contributor II

11-17-2022 8:53:53 AM

566 Views
2 replies
2 kudos

I have passed the Fundamentals Accreditation but I haven't received my badge and certificate.

I have just passed the Fundamentals Accreditation, but I don't have the badge.

Latest Reply

LandanG
Honored Contributor

11-17-2022 10:26:36 AM

2 kudos

Hi @FRANCISCO LORA, @Kaniz Fatma knows more than me, but you could probably submit a ticket to Databricks' Training Team here: https://help.databricks.com/s/contact-us?ReqType=training They will get back to you shortly.

1 More Replies

by ajithkaythottil • New Contributor

11-17-2022 8:53:41 PM

318 Views
0 replies
0 kudos

usedlaptopcalicut.in

We Are Among The Most Reliable Used Laptop Sellers In Calicut. A Wide Variety Of Laptops From Different Brands To Suit Different Budgets Are Available At Us. The used laptops are in good condition and cost a fraction of what a brand-new laptop would....

by Anonymous • Not applicable

11-13-2022 11:29:12 PM

2410 Views
11 replies
34 kudos

Resolved! Couldn't create new catalog?

I used DBR version 11.0

Latest Reply

Anonymous
Not applicable

11-14-2022 12:25:07 AM

34 kudos

Thank you. I followed all the steps from Ales's suggestion, but it is still not working.

10 More Replies

by Rahul_Tiwary • New Contributor II

11-04-2022 4:20:41 AM

3587 Views
2 replies
4 kudos

Getting error "java.lang.NoSuchMethodError: org.apache.spark.sql.AnalysisException" while writing data to Event Hub for streaming. It works fine if I write to another Databricks table.

import org.apache.spark.sql._
import scala.collection.JavaConverters._
import com.microsoft.azure.eventhubs._
import java.util.concurrent._
import scala.collection.immutable._
import org.apache.spark.eventhubs._
import scala.concurrent.Future
import scala.c...

Latest Reply

Gepap
New Contributor II

11-17-2022 8:47:24 AM

4 kudos

The dataframe to write needs to have the following schema:
Column | Type
----------------------------------------------
body (required) | string or binary
partitionId (*optional) | string
partitionKey...

1 More Replies

by 196083 • New Contributor II

11-04-2022 9:05:04 AM

805 Views
2 replies
2 kudos

IPython shell `set_next_input` not working

I'm running on 11.3 LTS. Expected Behavior: Databricks Notebook Behavior (it does nothing): You can also do `shell.set_next_input("test", replace=True)` to replace the current cell content, which also doesn't work on Databricks. `set_next_input` stores...

Latest Reply

Kaniz
Community Manager

11-17-2022 4:06:45 AM

2 kudos

Hi @Ryan Eakman, Can you try the DBR version 11.2?

1 More Replies

by horatiug • New Contributor III

10-05-2022 9:31:44 AM

2169 Views
8 replies
3 kudos

Create workspace in Databricks deployed in Google Cloud using Terraform

In the documentation https://registry.terraform.io/providers/databricks/databricks/latest/docs https://docs.gcp.databricks.com/dev-tools/terraform/index.html I could not find documentation on how to provision Databricks workspaces in GCP. Only cre...

Latest Reply

Anonymous
Not applicable

11-13-2022 8:01:04 PM

3 kudos

Hi @horatiu guja, does @Debayan Mukherjee's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Otherwise, we can help you with more details.

7 More Replies

by Arumugam • New Contributor II

11-17-2022 3:12:07 AM

1892 Views
5 replies
1 kudos

DLT pipeline failed to start due to "The Execution Contained atleast one disallowed language"

Hi, I'm trying to set up a DLT pipeline. It's a basic pipeline for testing purposes. I'm facing an issue while starting the pipeline; any help is appreciated. Code:
@dlt.table(name="dlt_bronze_cisco_hardware")
def dlt_cisco_networking_bronze_hardware(): ret...

Latest Reply

Vivian_Wilfred
Honored Contributor

11-17-2022 5:30:45 AM

1 kudos

Hi @Arumugam Ramachandran, it seems like you have a Spark config set on your DLT job cluster that allows only Python and SQL code. Check the Spark config (cluster policy). In any case, the Python code should work. Verify the notebook's default language, ...

4 More Replies

by sreedata • New Contributor III

03-31-2022 6:47:42 AM

2504 Views
5 replies
12 kudos

Resolved! Date field getting changed when reading from Excel file to dataframe

The date field is getting changed while reading data from the source .xls file into the dataframe. In the source Excel file all columns are strings, but I am not sure why the date column alone behaves differently. In the source file the date is 1/24/2022. In the dataframe it is ...

Latest Reply

Pradeep_Namani
New Contributor III

11-17-2022 6:56:19 AM

12 kudos

Hi Team, @Merca Ovnerud, I am also facing the same issue. Below is the code snippet I am using:
df=spark.read.format("com.crealytics.spark.excel").option("header","true").load("/mnt/dataplatform/Tenant_PK/Results.xlsx")
I have a couple of date colum...

4 More Replies

by Anonymous • Not applicable

06-11-2021 8:02:20 AM

1020 Views
2 replies
0 kudos

Cluster Modes

Given that there are three different kinds of cluster modes, when is it appropriate to use each one?

Latest Reply

User16826994223
Honored Contributor III

06-14-2021 5:45:52 AM

0 kudos

Standard clusters
A Standard cluster is recommended for a single user. Standard clusters can run workloads developed in any language: Python, SQL, R, and Scala.
High Concurrency clusters
A High Concurrency cluster is a managed cloud resource. The key be...

1 More Replies

by am777 • New Contributor

11-16-2022 7:04:11 PM

2565 Views
1 replies
1 kudos

I am new to Databricks and SQL. My CASE statement is not working and I cannot figure out why. Below is my code and the error message I'm receiving. Grateful for any and all suggestions. I'm trying to put yrs_to_mat into buckets.

SELECT *, yrs_to_mat, CASE WHEN < 3 THEN "under3" WHEN => 3 AND < 5 THEN "3to5" WHEN => 5 AND < 10 THEN "5to10" WHEN => 10 AND < 15 THEN "10to15" WHEN => 15 THEN "over15" ELSE null END AS maturity_bucket FROM mat...

Latest Reply

Pat
Honored Contributor III

11-17-2022 4:40:44 AM

1 kudos

Hi @Anne-Marie Wood, I think it's more of a general SQL issue: you are not comparing any value to `< 3`. It should be something like: WHEN X < 3 THEN "under3"
SELECT *, yrs_to_mat, CASE WHEN X < 3 THEN "under3" WHEN X >= 3 AND <...
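
To make the point in this reply concrete, here is a minimal runnable sketch of the bucketing CASE expression with the column named in every comparison. It also uses the standard `>=` operator where the posted query has `=>`, and single-quoted string literals. The table name `bonds` and the sample values are assumptions for illustration (the original FROM clause is truncated), and Python's built-in sqlite3 stands in for Spark SQL.

```python
import sqlite3

# In-memory database with a hypothetical table holding only yrs_to_mat.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bonds (yrs_to_mat REAL)")
conn.executemany(
    "INSERT INTO bonds VALUES (?)",
    [(1.5,), (3.0,), (7.2,), (12.0,), (20.0,)],
)

# Every WHEN branch repeats the column; a bare "< 3" has nothing to compare.
buckets = conn.execute(
    """
    SELECT yrs_to_mat,
           CASE
               WHEN yrs_to_mat < 3 THEN 'under3'
               WHEN yrs_to_mat >= 3 AND yrs_to_mat < 5 THEN '3to5'
               WHEN yrs_to_mat >= 5 AND yrs_to_mat < 10 THEN '5to10'
               WHEN yrs_to_mat >= 10 AND yrs_to_mat < 15 THEN '10to15'
               WHEN yrs_to_mat >= 15 THEN 'over15'
               ELSE NULL
           END AS maturity_bucket
    FROM bonds
    ORDER BY yrs_to_mat
    """
).fetchall()
conn.close()

for yrs, bucket in buckets:
    print(yrs, bucket)
```

Because CASE evaluates its branches in order, the lower bounds could be dropped (for example `WHEN yrs_to_mat < 5 THEN '3to5'` after the first branch); they are kept here to mirror the shape of the original query.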
