cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Pradeep_Namani
by New Contributor III
  • 3162 Views
  • 5 replies
  • 2 kudos

Date field getting changed when reading from excel file to dataframe in pyspark

The date field is getting changed while reading data from source .xls file to the dataframe. In the source xl file all columns are strings but i am not sure why date column alone behaves differentlyIn Source file date is 1/24/1947.In pyspark datafram...

  • 3162 Views
  • 5 replies
  • 2 kudos
Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

how about using inferschema one single time to create a correct DF, then create a schema from the df-schema.something like this f.e.from pyspark.sql.types import StructType   # Save schema from the original DataFrame into json: schema_json = df.s...

  • 2 kudos
4 More Replies
JordanYaker
by Contributor
  • 3769 Views
  • 7 replies
  • 8 kudos

Resolved! Is anyone else experiencing intermittent "Failure starting REPL" errors with PySpark Jobs?

I have a Multi-Task Job that is running a bunch of PySpark notebooks and about 30-60% of the time, my jobs fail with the following error:I haven't seen any consistency with this error. I've had as many as all of the tasks in the job giving this error...

image.png
  • 3769 Views
  • 7 replies
  • 8 kudos
Latest Reply
James_Cole
New Contributor III
  • 8 kudos

Hi. Did you ever got a resolution to this problem outside of rolling back to 10.4? I have recently moved some workloads over to runtime 11.3 and am experiencing intermittent "repl did not start in 30 seconds." errors.I have increased the repl timeout...

  • 8 kudos
6 More Replies
Ajay-Pandey
by Esteemed Contributor III
  • 2055 Views
  • 3 replies
  • 15 kudos

Resolved! MongoDB Connection

How can we connect databricks to MongoDB?Any code reference will help

  • 2055 Views
  • 3 replies
  • 15 kudos
Latest Reply
Kaniz_Fatma
Community Manager
  • 15 kudos

Hi @Ajay Pandey​  (Customer)​ ​, It would mean a lot if you could select the "Best Answer" to help others find the correct answer faster.This makes that answer appear right after the question, so it's easier to find within a thread.It also helps us m...

  • 15 kudos
2 More Replies
andreas9898
by New Contributor II
  • 2494 Views
  • 3 replies
  • 5 kudos

Getting error with spark-sftp, no such file

In a databricks cluster with Scala 2.1.1 I am trying to read a file into a spark data frame using the following code.val df = spark.read .format("com.springml.spark.sftp") .option("host", "*") .option("username", "*") .option("password", "*")...

  • 2494 Views
  • 3 replies
  • 5 kudos
Latest Reply
Anonymous
Not applicable
  • 5 kudos

Hi @Andreas P​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Thanks!

  • 5 kudos
2 More Replies
KVNARK
by Honored Contributor II
  • 3694 Views
  • 8 replies
  • 28 kudos

Resolved! Can we use Databricks or code in data bricks without learning Pyspark in depth which is used for ETL purpose and data engineering perspective.

Can we use Databricks or code in data bricks without learning Pyspark in depth which is used for ETL purpose and data engineering perspective. can someone throw some light on this. Currently learning Pyspark (basics of Pythion in handling the data) a...

  • 3694 Views
  • 8 replies
  • 28 kudos
Latest Reply
KVNARK
Honored Contributor II
  • 28 kudos

Thanks All for your valuable suggestions!

  • 28 kudos
7 More Replies
alxsbn
by New Contributor III
  • 640 Views
  • 0 replies
  • 2 kudos

Terraform x Databricks error INVALID_STATE subscription disabled

Hello,I just bootstrap a new Databricks EC2 on an AWS account with Terraform. Priori dependencies seems OK on my side (network, root storage, credentials configuration). I'm referring mainly to this guide and of course pages related to each Databrick...

  • 640 Views
  • 0 replies
  • 2 kudos
VN11111
by New Contributor III
  • 6929 Views
  • 5 replies
  • 6 kudos

Resolved! ERROR: Some streams terminated before this command could finish!

I have a databricks notebook which is to read stream from Azure Event Hub.My code does the following:1.Configure path for Eventhubs2.Read Streamdf_read_stream = (spark.readStream .format("eventhubs") .options(**conf)...

  • 6929 Views
  • 5 replies
  • 6 kudos
Latest Reply
guru1
New Contributor II
  • 6 kudos

I am also facing same issue , using Cluster11.3 LTS (includes Apache Spark 3.3.0, Scala 2.12) liberary : com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.21Please help me for sameconf = {}conf["eventhubs.connectionString"] = "Endpoint=sb://xxxx.ser...

  • 6 kudos
4 More Replies
huyd
by New Contributor III
  • 920 Views
  • 0 replies
  • 4 kudos

Optimizing a batch load process, reading with the JDBC driver

I am doing a batch load, using the JDBC driver from a database table. I am noticing in Sparkui, that there is both memory and disk spill, but only on one executor. I am also, noticing that when trying to use the JDBC parallel read, it seems to run sl...

  • 920 Views
  • 0 replies
  • 4 kudos
ckwan48
by New Contributor III
  • 2760 Views
  • 4 replies
  • 4 kudos

Date schema issues with pyspark dataframe creation

I'm having some issues with creating a dataframe with a date column. Could I know what is wrong?from pyspark.sql import SparkSession from pyspark.sql.types import StructType from pyspark.sql.types import DateType, FloatType spark = SparkSession.bui...

  • 2760 Views
  • 4 replies
  • 4 kudos
Latest Reply
ckwan48
New Contributor III
  • 4 kudos

Hi @Kaniz Fatma​,I actually changed the date format to 'M/d/Y' and it didn't throw any errors. I found in my csv file that it had dates like '3/1/2022'. Could that be the issue? But some dates also were like '12/1/2022. So I'm kind of confused.

  • 4 kudos
3 More Replies
magnus778
by New Contributor III
  • 1555 Views
  • 2 replies
  • 4 kudos

Resolved! Error writing parquet to specific container in Azure Data Lake

I'm retrieving two files from container1, transforming them and merging before writing to a container2 within the same Storage Account in Azure. I'm mounting container1, unmouting and mounting countainer2 before writing. My code for writing the parqu...

  • 1555 Views
  • 2 replies
  • 4 kudos
Latest Reply
Pat
Honored Contributor III
  • 4 kudos

Hi @Magnus Asperud​ ,1 mounting container12 you should persist the data somewhere, creating df doesnt mean that you are reading data from container and have it accessible after unmounting. Make sure to store this merged data somewhere. Not sure if th...

  • 4 kudos
1 More Replies
Constantino
by New Contributor III
  • 1642 Views
  • 3 replies
  • 4 kudos

Resolved! cannot list all tokens with account admin

I'm trying to list all tokens (both user and service principal) for a given workspace; using an Account level admin I've tried both the CLI as well as the API endpoint to list tokens, however each time, only the admin's tokens are returned.I've confi...

  • 1642 Views
  • 3 replies
  • 4 kudos
Latest Reply
Pat
Honored Contributor III
  • 4 kudos

Great that I could help

  • 4 kudos
2 More Replies
Join 100K+ Data Experts: Register Now & Grow with Us!

Excited to expand your horizons with us? Click here to Register and begin your journey to success!

Already a member? Login and join your local regional user group! If there isn’t one near you, fill out this form and we’ll create one for you to join!

Labels