Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Venky
by New Contributor III
  • 74675 Views
  • 18 replies
  • 19 kudos

Resolved! I am trying to read a CSV file using Databricks and I am getting an error like: FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/FileStore/tables/world_bank.csv'

Latest Reply
Alexis
New Contributor III
  • 19 kudos

Hi, you can try:
my_df = spark.read.format("csv")
    .option("inferSchema", "true")   # to get the types from your data
    .option("sep", ",")              # if your file is using "," as separator
    .option("header", "true")        # if you...
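The forum truncates the snippet; a complete version of that pattern might look like the following sketch (the path is the one from the original question, and note that spark.read takes the dbfs:/ URI, while the /dbfs/... form in the error is the local fuse path used by non-Spark readers):

my_df = (
    spark.read.format("csv")
    .option("inferSchema", "true")   # infer column types from the data
    .option("sep", ",")              # "," is the field separator
    .option("header", "true")        # first row holds the column names
    .load("dbfs:/FileStore/tables/world_bank.csv")
)
my_df.show(5)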

17 More Replies
RantoB
by Valued Contributor
  • 21405 Views
  • 7 replies
  • 4 kudos

Resolved! read csv directly from url with pyspark

I would like to load a CSV file directly into a Spark DataFrame in Databricks. I tried the following code:
url = "https://opendata.reseaux-energies.fr/explore/dataset/eco2mix-national-tr/download/?format=csv&timezone=Europe/Berlin&lang=fr&use_labels_fo...

Latest Reply
MartinIsti
New Contributor III
  • 4 kudos

I know it's a 2-year-old thread, but I needed to find a solution to this very thing today. I had one notebook using SparkContext:
from pyspark import SparkFiles
from pyspark.sql.functions import *
sc.addFile(url)
But according to the runtime 14 release n...
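The reply is truncated, but the SparkFiles pattern it refers to is a known one; a sketch (the URL is shortened to a placeholder, the basename passed to get() is assumed, and sc requires a runtime where SparkContext is still exposed, which the reply says changed around runtime 14):

from pyspark import SparkFiles

url = "https://example.org/data.csv"      # placeholder for the opendata URL above
sc.addFile(url)                           # download and distribute the file
local_path = "file://" + SparkFiles.get("data.csv")   # basename of the URL
df = spark.read.csv(local_path, header=True, inferSchema=True)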

6 More Replies
Jayanth746
by New Contributor III
  • 14963 Views
  • 9 replies
  • 4 kudos

Kafka unable to read client.keystore.jks.

Below is the error we have received when trying to read the stream:
Caused by: kafkashaded.org.apache.kafka.common.KafkaException: Failed to load SSL keystore /dbfs/FileStore/Certs/client.keystore.jks
Caused by: java.nio.file.NoSuchFileException: /dbfs...

Latest Reply
mwoods
New Contributor III
  • 4 kudos

Ok, scrub that - the problem in my case was that I was using the 14.0 Databricks Runtime, which appears to have a bug relating to abfss paths here. Switching back to the 13.3 LTS release resolved it for me. So if you're in the same boat finding abfss...
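For reference, a minimal structured-streaming Kafka read with an SSL keystore looks roughly like this (a sketch: broker, topic, and password are placeholders, the keystore path is the one from the question, and the kafka.-prefixed options are passed through to the Kafka client):

keystore_password = "changeit"  # placeholder secret
df = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker-1:9093")   # placeholder broker
    .option("subscribe", "my_topic")                      # placeholder topic
    .option("kafka.security.protocol", "SSL")
    .option("kafka.ssl.keystore.location", "/dbfs/FileStore/Certs/client.keystore.jks")
    .option("kafka.ssl.keystore.password", keystore_password)
    .load()
)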

8 More Replies
CrisCampos
by New Contributor II
  • 3168 Views
  • 1 reply
  • 1 kudos

How to load a "pickle/joblib" file on Databricks

Hi Community, I am trying to load a joblib file on Databricks, but it doesn't seem to be working. I'm getting an error message: "Incompatible format detected". Any idea how to load this type of file on Databricks? Thanks!

Latest Reply
tapash-db
Databricks Employee
  • 1 kudos

You can import the joblib/joblibspark packages to load joblib files.
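A minimal sketch of that route (the path is a placeholder; the "Incompatible format detected" message usually means a Spark/Delta reader was pointed at the file, whereas a joblib artifact should be opened with joblib itself through the /dbfs fuse path):

import joblib

# Load a joblib artifact via the local /dbfs fuse mount
# (placeholder path; the file must have been written with joblib.dump).
model = joblib.load("/dbfs/FileStore/models/my_model.joblib")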

carlosjrestr
by New Contributor III
  • 3034 Views
  • 1 reply
  • 1 kudos

Does Unity Catalog on Azure require premium blob storage tier?

From the docs I read: "Create a storage container where the metastore's managed table data will be stored. This storage container must be in a Premium performance Azure Data Lake Storage Gen2 account in the same region as the workspaces you want to us...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Carlos Restrepo, we haven't heard from you since the last response from @Kaniz Fatma, and I was checking back to see if her suggestions helped you. Or else, if you have any solution, please share it with the community, as it can be helpful to o...

Tracy_
by New Contributor II
  • 9310 Views
  • 5 replies
  • 0 kudos

Incorrect reading of CSV with inferSchema

Hi All, there is a CSV with a column ID (format: 8 digits & "D" at the end). When trying to read the CSV with .option("inferSchema", "true"), it returns the ID as a double and trims the "D". Is there any idea (apart from inferSchema=False) to get the correct ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @tracy ng, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your...
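The accepted fix isn't quoted in this digest; a common workaround for the underlying question is to bypass inference for that column and declare it as a string in an explicit schema, for example (path and column names are assumed):

from pyspark.sql.types import StructType, StructField, StringType

# Keep the ID column as text so the trailing "D" survives; declare the
# remaining columns with their real types in place of the comment below.
schema = StructType([
    StructField("ID", StringType(), True),
    # StructField(..., ..., True),
])
df = spark.read.csv("dbfs:/path/to/data.csv", header=True, schema=schema)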

4 More Replies
ima94
by New Contributor II
  • 4693 Views
  • 1 reply
  • 1 kudos

read cdm error: java.util.NoSuchElementException: None.get

Hi all, I'm trying to read a CDM file and get the error in the image (I replaced the names with uppercase). Any ideas on how to solve it? Thank you!

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @imma marra, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers yo...

Fred_F
by New Contributor III
  • 6892 Views
  • 5 replies
  • 5 kudos

JDBC connection timeout on workflow cluster

Hi there, I've a batch process configured in a workflow which fails due to a JDBC timeout on a Postgres DB. I checked the JDBC connection configuration and it seems to work when I query a table and do a df.show() in the process, and it displays th...

Latest Reply
RKNutalapati
Valued Contributor
  • 5 kudos

Hi @Fred Foucart, the above code looks good to me. Can you try the below code as well?
spark.read \
    .format("jdbc") \
    .option("url", f"jdbc:postgresql://{host}/{database}") \
    .option("driver", "org.postgresql.Driver") \
    .option("user", username) ...
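For the timeout itself, the PostgreSQL JDBC driver accepts connectTimeout and socketTimeout parameters on the URL, so a fuller version of the snippet might be (a sketch; host, database, credentials, and table are placeholders):

host, database = "db-host", "mydb"        # placeholders
username, password = "user", "secret"     # placeholders

df = (
    spark.read.format("jdbc")
    .option("url", f"jdbc:postgresql://{host}/{database}?connectTimeout=60&socketTimeout=600")
    .option("driver", "org.postgresql.Driver")
    .option("user", username)
    .option("password", password)
    .option("dbtable", "public.my_table")  # placeholder table
    .load()
)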

4 More Replies
BkP
by Contributor
  • 2109 Views
  • 2 replies
  • 3 kudos

Scala Connectivity to Databricks Bronze Layer Raw Data from a Non-Databricks Spark environment

Hi All, we are developing a new Scala/Java program which needs to read & process, in parallel, the raw data stored in the source ADLS (which is a Databricks environment), as the volume of the source data is very high (GBs & TBs). What kind of connection ...

Latest Reply
BkP
Contributor
  • 3 kudos

Hello experts, any advice on this question? Tagging some folks from whom I have received answers before. Please help with this requirement or tag someone who can help: @Kaniz Fatma, @Vartika Nain, @Bilal Aslam
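No answer is recorded in this digest; one commonly used pattern for the requirement is to read the bronze Delta files straight from ADLS with open-source Spark plus the delta-spark package (sketched in Python to match the rest of this page, though the same options apply from Scala; account, container, path, and credentials are placeholders):

from pyspark.sql import SparkSession

storage_key = "<access-key>"  # placeholder credential
spark = (
    SparkSession.builder.appName("bronze-reader")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .config("fs.azure.account.key.myaccount.dfs.core.windows.net", storage_key)
    .getOrCreate()
)
df = spark.read.format("delta").load(
    "abfss://bronze@myaccount.dfs.core.windows.net/raw/my_table"  # placeholder path
)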

1 More Replies
tanjil
by New Contributor III
  • 6728 Views
  • 6 replies
  • 6 kudos

Resolved! Read and transform CSVs in parallel.

I need to read and transform several CSV files and then append them to a single data frame. I am able to do this in Databricks using simple for loops, but I would like to speed this up. Below is the rough structure of my code:
for filepath in all_file...

Latest Reply
Vidula
Honored Contributor
  • 6 kudos

Hi @tanjil, hope all is well! Just wanted to check in on whether you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks!
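The accepted solution isn't quoted in this digest, but the usual way to avoid the Python for-loop is to hand spark.read the whole list of paths at once, so Spark reads the files in parallel into a single DataFrame (a sketch; all_files stands in for the list from the question):

from pyspark.sql.functions import input_file_name

all_files = ["dbfs:/data/a.csv", "dbfs:/data/b.csv"]  # placeholder list
# One call reads every file; Spark parallelizes across files and partitions.
df = spark.read.csv(all_files, header=True, inferSchema=True)
# Optional: record which file each row came from.
df = df.withColumn("source_file", input_file_name())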

5 More Replies
LearningDatabri
by Contributor II
  • 6837 Views
  • 7 replies
  • 2 kudos

Resolved! Unable to read file from S3

I tried to read a file from S3, but got the below error:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 53.0 failed 4 times, most recent failure: Lost task 0.3 in stage 53.0 (TID 82, xx.xx.xx.xx, executor 0): com...

Latest Reply
Sivaprasad1
Valued Contributor II
  • 2 kudos

Which DBR version are you using? Could you please test it with a different DBR version, probably DBR 9.x?

6 More Replies
sannycse
by New Contributor II
  • 3589 Views
  • 4 replies
  • 6 kudos

Resolved! Read the CSV file as shown in the description

Project_Details.csv:
ProjectNo|ProjectName|EmployeeNo
100|analytics|1
100|analytics|2
101|machine learning|3
101|machine learning|1
101|machine learning|4
Find, for each project, the list of employees working on it?
Output:
ProjectNo|employeeNo
100|[1,2]
101|...

Latest Reply
User16764241763
Honored Contributor
  • 6 kudos

@SANJEEV BANDRU, you can simply do this. Just change the file path:
CREATE TEMPORARY VIEW readcsv
USING CSV
OPTIONS (path "dbfs:/docs/test.csv", header "true", delimiter "|", mode "FAILFAST");
SELECT ProjectNo, collect_list(EmployeeNo) Employees
FROM re...
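The SQL is truncated above; an equivalent PySpark version of the same idea is (a sketch using the path and column names from the thread):

from pyspark.sql.functions import collect_list

df = spark.read.csv("dbfs:/docs/test.csv", header=True, sep="|")
result = df.groupBy("ProjectNo").agg(collect_list("EmployeeNo").alias("Employees"))
result.show()  # expected: 100 -> [1, 2], 101 -> [3, 1, 4]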

3 More Replies
Orianh
by Valued Contributor II
  • 7255 Views
  • 4 replies
  • 2 kudos

Resolved! Read JSON with backslash.

Hello guys, I'm trying to read a JSON file which contains backslashes, and I've failed to read it via PySpark. I've tried a lot of options but haven't solved this yet. I thought to read all the JSON as text and replace all "\" with "/", but PySpark fails to read it as te...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@orian hindi - Would you be happy to post the solution you came up with and then mark it as best? That will help other members.
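No solution is quoted in this digest; one workaround consistent with the idea in the question is to load the file as plain text, normalize the backslashes, and parse the cleaned strings as JSON (a sketch; the path is a placeholder, and whether it helps depends on why the text read failed for the poster):

from pyspark.sql.functions import regexp_replace

raw = spark.read.text("dbfs:/path/to/data.json")   # one JSON record per line assumed
cleaned = raw.select(regexp_replace("value", r"\\", "/").alias("value"))
df = spark.read.json(cleaned.rdd.map(lambda row: row.value))   # parse repaired strings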

3 More Replies
User16783853906
by Contributor III
  • 2494 Views
  • 2 replies
  • 0 kudos

VACUUM during read/write

Is it safe to run VACUUM on a Delta Lake table while data is being added to it at the same time? Will it impact the job result/performance?

Latest Reply
User16783853906
Contributor III
  • 0 kudos

In the vast majority of cases, yes, it is safe to run VACUUM while data is concurrently being appended or updated in the same table. This is because VACUUM deletes data files no longer referenced by a Delta table's transaction log and does not affect...
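For reference, a typical call looks like this (a sketch with a placeholder table name; the RETAIN clause is optional, and by default Delta refuses to vacuum files younger than the 7-day retention window, which is what protects concurrent readers and writers):

# VACUUM only removes files no longer referenced and older than the retention
# window, so jobs working against recent table versions are unaffected.
spark.sql("VACUUM my_schema.my_table RETAIN 168 HOURS")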

1 More Replies