Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

sage5616
by Valued Contributor
  • 6186 Views
  • 4 replies
  • 6 kudos

Saving PySpark standard out and standard error logs to cloud object storage

I am running my PySpark data pipeline code on a standard Databricks cluster. I need to save all Python/PySpark standard output and standard error messages into a file in an Azure Blob Storage account. When I run my Python code locally I can see all messages i...

Latest Reply
sage5616
Valued Contributor
  • 6 kudos

This is the approach I am currently taking. It is documented here: https://stackoverflow.com/questions/62774448/how-to-capture-cells-output-in-databricks-notebook
from IPython.utils.capture import CapturedIO
capture = CapturedIO(sys.stdout, sys.st...
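For readers who want to try the general idea outside a notebook, here is a minimal sketch that uses the standard library's `contextlib.redirect_stdout` and `contextlib.redirect_stderr` instead of the IPython `CapturedIO` class quoted above. The names `run_and_capture`, `demo_job`, and `pipeline_run.log` are hypothetical, and the sketch writes to a local temporary file; copying that file to Azure Blob Storage is a separate step not shown here. It only captures what Python code writes to `sys.stdout` and `sys.stderr`, so JVM-side Spark logs would not be included.

```python
import contextlib
import io
import sys
import tempfile
from pathlib import Path


def run_and_capture(job):
    """Run job() and return the text it wrote to stdout and stderr."""
    out, err = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        job()
    return out.getvalue(), err.getvalue()


def demo_job():
    # Hypothetical stand-in for the real pipeline code.
    print("pipeline started")
    print("something went wrong", file=sys.stderr)


captured_out, captured_err = run_and_capture(demo_job)

# Persist both streams to one local log file (assumed name and location).
log_path = Path(tempfile.gettempdir()) / "pipeline_run.log"
log_path.write_text("STDOUT\n" + captured_out + "STDERR\n" + captured_err)
```

Keeping the two streams in separate buffers makes it easy to write them to separate files instead, if that suits the downstream tooling better.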

3 More Replies
flora2408
by New Contributor II
  • 1008 Views
  • 2 replies
  • 2 kudos

I have passed the Fundamentals Accreditation but I haven't received my badge and certificate.

I have just passed the Fundamentals Accreditation, but I don't have the badge.

Latest Reply
LandanG
Honored Contributor
  • 2 kudos

Hi @FRANCISCO LORA, @Kaniz Fatma knows more than I do, but you could probably submit a ticket to Databricks' Training Team, who will get back to you shortly: https://help.databricks.com/s/contact-us?ReqType=training

1 More Replies
ajithkaythottil
by New Contributor
  • 529 Views
  • 0 replies
  • 0 kudos

usedlaptopcalicut.in

We Are Among The Most Reliable Used Laptop Sellers In Calicut. A Wide Variety Of Laptops From Different Brands To Suit Different Budgets Are Available At Us. The used laptops are in good condition and cost a fraction of what a brand-new laptop would....

used laptop in calicut
Rahul_Tiwary
by New Contributor II
  • 5361 Views
  • 2 replies
  • 4 kudos

Getting error "java.lang.NoSuchMethodError: org.apache.spark.sql.AnalysisException" while writing data to Event Hubs for streaming. It works fine if I write to another Databricks table.

import org.apache.spark.sql._
import scala.collection.JavaConverters._
import com.microsoft.azure.eventhubs._
import java.util.concurrent._
import scala.collection.immutable._
import org.apache.spark.eventhubs._
import scala.concurrent.Future
import scala.c...

Latest Reply
Gepap
New Contributor II
  • 4 kudos

The dataframe to write needs to have the following schema:

Column | Type
----------------------------------------------
body (required) | string or binary
partitionId (*optional) | string
partitionKey...

1 More Replies
196083
by New Contributor II
  • 1413 Views
  • 2 replies
  • 2 kudos

IPython shell `set_next_input` not working

I'm running on 11.3 LTS. Expected Behavior: Databricks Notebook Behavior (it does nothing): You can also do `shell.set_next_input("test", replace=True)` to replace the current cell content, which also doesn't work on Databricks. `set_next_input` stores...

Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

 Hi @Ryan Eakman​, Can you try the DBR version 11.2?

1 More Replies
horatiug
by New Contributor III
  • 3765 Views
  • 8 replies
  • 3 kudos

Create workspace in Databricks deployed in Google Cloud using terraform

In the documentation (https://registry.terraform.io/providers/databricks/databricks/latest/docs and https://docs.gcp.databricks.com/dev-tools/terraform/index.html) I could not find instructions on how to provision Databricks workspaces in GCP. Only cre...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @horatiu guja, does @Debayan Mukherjee's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Otherwise, we can help you with more details.

7 More Replies
Arumugam
by New Contributor II
  • 3293 Views
  • 5 replies
  • 1 kudos

DLT pipeline failed to start due to "The Execution Contained at least one disallowed language"

Hi, I'm trying to set up a DLT pipeline. It's a basic pipeline for testing purposes, and I'm facing an issue while starting it. Any help is appreciated. Code:
@dlt.table(name="dlt_bronze_cisco_hardware")
def dlt_cisco_networking_bronze_hardware():
    ret...

Latest Reply
Vivian_Wilfred
Honored Contributor
  • 1 kudos

Hi @Arumugam Ramachandran, it seems you have a Spark config set on your DLT job cluster that allows only Python and SQL code. Check the Spark config (cluster policy). In any case, the Python code should work. Verify the notebook's default language, ...

4 More Replies
sreedata
by New Contributor III
  • 4095 Views
  • 5 replies
  • 12 kudos

Resolved! Date field getting changed when reading from excel file to dataframe

The date field is getting changed while reading data from the source .xls file into the dataframe. In the source Excel file all columns are strings, but I am not sure why the date column alone behaves differently. In the source file the date is 1/24/2022. In the dataframe it is ...

Latest Reply
Pradeep_Namani
New Contributor III
  • 12 kudos

Hi Team, @Merca Ovnerud, I am also facing the same issue. Below is the code snippet I am using:
df=spark.read.format("com.crealytics.spark.excel").option("header","true").load("/mnt/dataplatform/Tenant_PK/Results.xlsx")
I have a couple of date colum...

4 More Replies
Anonymous
by Not applicable
  • 2086 Views
  • 2 replies
  • 0 kudos

Cluster Modes

Given that there are three different kinds of cluster modes, when is it appropriate to use each one?

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

Standard clusters
A Standard cluster is recommended for a single user. Standard clusters can run workloads developed in any language: Python, SQL, R, and Scala.
High Concurrency clusters
A High Concurrency cluster is a managed cloud resource. The key be...

1 More Replies
am777
by New Contributor
  • 4558 Views
  • 1 replies
  • 1 kudos

I am new to Databricks and SQL. My CASE statement is not working and I cannot figure out why. Below is my code and the error message I'm receiving. Grateful for any and all suggestions. I'm trying to put yrs_to_mat into buckets.

SELECT *, yrs_to_mat, CASE WHEN < 3 THEN "under3" WHEN => 3 AND < 5 THEN "3to5" WHEN => 5 AND < 10 THEN "5to10" WHEN => 10 AND < 15 THEN "10to15" WHEN => 15 THEN "over15" ELSE null END AS maturity_bucket FROM mat...

Latest Reply
Pat
Honored Contributor III
  • 1 kudos

Hi @Anne-Marie Wood, I think it's more of a general SQL issue: you are not comparing any value to `< 3`. It should be something like: WHEN X < 3 THEN "under3"
SELECT *, yrs_to_mat, CASE WHEN X < 3 THEN "under3" WHEN X => 3 AND <...
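To make the point concrete, here is a small self-contained sketch of a bucketing CASE expression in which every comparison names the column. It runs against an in-memory SQLite database purely as a stand-in for Databricks SQL, and the table name `bonds` and its sample rows are hypothetical. It also assumes the intended operator is `>=` (greater than or equal) rather than the `=>` spelling in the question, and uses single quotes for string literals.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for Databricks SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bonds (name TEXT, yrs_to_mat REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO bonds VALUES (?, ?)",
    [("a", 1.5), ("b", 3.0), ("c", 7.2), ("d", 12.0), ("e", 20.0)],
)

# Each WHEN branch repeats the column name on both sides of AND.
query = """
SELECT name,
       yrs_to_mat,
       CASE
           WHEN yrs_to_mat < 3 THEN 'under3'
           WHEN yrs_to_mat >= 3 AND yrs_to_mat < 5 THEN '3to5'
           WHEN yrs_to_mat >= 5 AND yrs_to_mat < 10 THEN '5to10'
           WHEN yrs_to_mat >= 10 AND yrs_to_mat < 15 THEN '10to15'
           WHEN yrs_to_mat >= 15 THEN 'over15'
           ELSE NULL
       END AS maturity_bucket
FROM bonds
ORDER BY yrs_to_mat
"""
buckets = [row[2] for row in conn.execute(query)]
```

Because CASE branches are evaluated in order, the lower bounds (`>= 3`, `>= 5`, and so on) are redundant once the earlier branches have failed; they are kept here to mirror the structure of the original query.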

maxutil
by New Contributor II
  • 13187 Views
  • 1 replies
  • 3 kudos

Invalid Characters in Column Names " ,;{}()\n\t="

I'm reading data into a dataframe with
df = spark.read.json("s3://somepath/")
I've tried first creating a delta table using the DeltaTable API with:
DeltaTable.createIfNotExists(spark)\
    .location(target_path)\
    .addColumns(df.sche...

Latest Reply
Kaniz_Fatma
Community Manager
  • 3 kudos

Hi @Chris Chung, can you re-check by trying the code below?
df.write.format("delta").option("delta.columnMapping.mode", "name").save("s3://anotherpath")
Now you can load it into a Spark dataframe:
SELECT * FROM new_table;
delta_df = spark.read.format("d...
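A different workaround from the column mapping option suggested in the reply is to rename the offending columns before writing. The sketch below is plain Python and only illustrates the renaming step: it replaces each of the characters listed in the thread title (space, `,`, `;`, `{`, `}`, `(`, `)`, newline, tab, `=`) with an underscore. The helper names `sanitize_column_name` and `sanitize_all` are hypothetical, and the collision handling is deliberately simple.

```python
import re

# The characters from the error message: space , ; { } ( ) newline tab =
INVALID_CHARS = re.compile(r"[ ,;{}()\n\t=]")


def sanitize_column_name(name: str) -> str:
    """Replace every invalid character with an underscore."""
    return INVALID_CHARS.sub("_", name)


def sanitize_all(names):
    """Sanitize a list of names, adding a numeric suffix when two names collide."""
    seen = {}
    result = []
    for name in names:
        clean = sanitize_column_name(name)
        count = seen.get(clean, 0)
        seen[clean] = count + 1
        result.append(clean if count == 0 else f"{clean}_{count}")
    return result


renamed = sanitize_all(["order id", "price(usd)", "a b", "a,b"])
```

In PySpark the result could be applied with `df.toDF(*sanitize_all(df.columns))`, which renames top-level columns only; field names nested inside structs would need separate handling.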

cmilligan
by Contributor II
  • 2621 Views
  • 1 replies
  • 9 kudos

Resolved! Catch when a notebook fails and terminate command in threaded parallel notebook run

I have a command that is running notebooks in parallel using threading. I want the command to fail whenever one of the running notebooks fails. Right now it just continues to run the command. Below is the command line that I'm currently ru...

Latest Reply
Kaniz_Fatma
Community Manager
  • 9 kudos

Hi @Coleman Milligan, you can run multiple Azure Databricks notebooks in parallel by using the dbutils library. Here is Python code based on the sample code from the Azure Databricks documentation on running notebooks concurrently and on Notebook wo...
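The code from the reply is cut off above, so the following is not that code. It is a minimal sketch of one way to make a threaded run fail as soon as any task fails, using `concurrent.futures` from the standard library. `run_all_or_fail` and `fake_run` are hypothetical names; on Databricks, the `run_notebook` argument would presumably be a small wrapper around `dbutils.notebook.run`, which is an assumption and is not exercised here.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_all_or_fail(run_notebook, paths, max_workers=4):
    """Run run_notebook(path) for each path in parallel; raise on the first failure."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_notebook, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:
                # Cancel tasks that have not started yet; tasks that are
                # already running cannot be interrupted and will finish.
                for other in futures:
                    other.cancel()
                raise RuntimeError(f"notebook {path} failed") from exc
    return results


def fake_run(path):
    # Hypothetical stand-in for a real notebook run.
    if path == "bad":
        raise ValueError("boom")
    return f"{path}: ok"
```

Calling `run_all_or_fail(fake_run, ["a", "bad", "b"])` raises `RuntimeError`, so the surrounding command fails instead of continuing silently. Note that the executor still waits for already-running tasks to finish before the exception propagates.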

LukaszJ
by Contributor III
  • 3408 Views
  • 5 replies
  • 4 kudos

Resolved! Mount Azure Blob Storage with Cluster access control

Hello. I want to mount a container from Azure Blob Storage (it could be simple Blob Storage or Azure Data Lake Storage Gen2) and share it with one group. But I am not able to do it because I am using a cluster with Table Access Control. This is my cod...

Latest Reply
LukaszJ
Contributor III
  • 4 kudos

I have a good solution to the problem: I am using a Python library. There is some documentation. Topic to be closed. Best regards, Łukasz

4 More Replies
HashStudioz
by New Contributor
  • 403 Views
  • 0 replies
  • 0 kudos

Rs 485 IoT Gateway

RS-485 IoT Gateway is used for transmitting data from one device to another usually far away by using a wired LAN or a Wi-Fi. HashStudioz Technologies Inc. provides Smart IoT Gateway Solutions for Businesses like Pharma industries Etc. Our IoT Gatewa...

logan0015
by Contributor
  • 1502 Views
  • 3 replies
  • 3 kudos

How do you access a streaming live table's snapshots?

I have read that Delta Live Tables will keep a history of 7 days. However, after creating a streaming live table and using the dlt.apply_changes function with this code:
def run_pipeline(table_name,keys,sequence_by):
    lower_table_name = table_name.l...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Logan Nicol, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks...

2 More Replies
