Data Engineering

Forum Posts

RantoB
by Valued Contributor
  • 16691 Views
  • 8 replies
  • 4 kudos

Resolved! read csv directly from url with pyspark

I would like to load a CSV file directly into a Spark DataFrame in Databricks. I tried the following code: url = "https://opendata.reseaux-energies.fr/explore/dataset/eco2mix-national-tr/download/?format=csv&timezone=Europe/Berlin&lang=fr&use_labels_fo...

Latest Reply
MartinIsti
New Contributor III
  • 4 kudos

I know it's a two-year-old thread, but I needed to find a solution to this very thing today. I had one notebook using SparkContext:
from pyspark import SparkFiles
from pyspark.sql.functions import *
sc.addFile(url)
But according to the runtime 14 release n...

7 More Replies
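
One driver-side alternative to `sc.addFile`, offered as a minimal sketch and not as the fix from the thread: download the file with Python's standard library and parse it before handing the rows to Spark. The helper name `fetch_csv_rows`, the `sep` default, and the example URL in the comments are assumptions for illustration. The dataset URL in the question is truncated, so it is not reproduced here.

```python
import csv
import io
import urllib.request


def fetch_csv_rows(url, sep=",", encoding="utf-8"):
    """Download a CSV on the driver and return (header, rows) as lists of strings."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode(encoding)
    reader = csv.reader(io.StringIO(text), delimiter=sep)
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    return header, rows


# Hypothetical usage in a notebook where `spark` already exists:
# header, rows = fetch_csv_rows("https://example.com/data.csv", sep=";")
# df = spark.createDataFrame(rows, schema=header)  # every column arrives as a string
```

The whole file is held in driver memory, so this only suits files that comfortably fit there.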
Jayanth746
by New Contributor III
  • 10054 Views
  • 10 replies
  • 4 kudos

Kafka unable to read client.keystore.jks.

Below is the error we received when trying to read the stream:
Caused by: kafkashaded.org.apache.kafka.common.KafkaException: Failed to load SSL keystore /dbfs/FileStore/Certs/client.keystore.jks
Caused by: java.nio.file.NoSuchFileException: /dbfs...

Latest Reply
mwoods
New Contributor III
  • 4 kudos

OK, scrub that. The problem in my case was that I was using the 14.0 Databricks runtime, which appears to have a bug relating to abfss paths here. Switching back to the 13.3 LTS release resolved it for me. So if you're in the same boat finding abfss...

9 More Replies
Venky
by New Contributor III
  • 50121 Views
  • 19 replies
  • 20 kudos

Resolved! I am trying to read a CSV file using Databricks and I get the error FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/FileStore/tables/world_bank.csv'

I am trying to read a CSV file using Databricks, and I am getting this error: FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/FileStore/tables/world_bank.csv'

Latest Reply
Alexis
New Contributor III
  • 20 kudos

Hi, you can try:
my_df = spark.read.format("csv")
      .option("inferSchema","true")  # to get the types from your data
      .option("sep",",")            # if your file is using "," as separator
      .option("header","true")       # if you...

18 More Replies
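
The `FileNotFoundError` in the question is raised by Python's local file API, which sees DBFS through the `/dbfs/...` mount, whereas `spark.read` addresses the same storage with `dbfs:/...` URIs. The following is a minimal sketch of that difference, assuming the file really was uploaded to `FileStore/tables`. The helper names `to_spark_path` and `read_csv` are made up for illustration, and `spark` is the session a notebook already provides.

```python
def to_spark_path(local_path):
    """Convert a '/dbfs/...' local-file path to the 'dbfs:/...' form used with spark.read."""
    prefix = "/dbfs/"
    if local_path.startswith(prefix):
        return "dbfs:/" + local_path[len(prefix):]
    return local_path  # already a Spark-style path, leave untouched


def read_csv(spark, path):
    """Read a CSV with the options from the reply above; `spark` is an existing SparkSession."""
    return (
        spark.read.format("csv")
        .option("inferSchema", "true")  # derive column types from the data
        .option("sep", ",")             # field separator
        .option("header", "true")       # first row holds column names
        .load(to_spark_path(path))
    )


# Hypothetical usage:
# my_df = read_csv(spark, "/dbfs/FileStore/tables/world_bank.csv")
```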
CrisCampos
by New Contributor II
  • 2075 Views
  • 1 replies
  • 1 kudos

How to load a "pickle/joblib" file on Databricks

Hi Community, I am trying to load a joblib file on Databricks, but it doesn't seem to be working. I get an error message: "Incompatible format detected". Any idea how to load this type of file on Databricks? Thanks!

Latest Reply
User16859817863
New Contributor II
  • 1 kudos

You can import the joblib/joblibspark package to load joblib files.

carlosjrestr
by New Contributor III
  • 1870 Views
  • 2 replies
  • 1 kudos

Does Unity Catalog on Azure require premium blob storage tier?

From the docs I read: Create a storage container where the metastore’s managed table data will be stored. This storage container must be in a Premium performance Azure Data Lake Storage Gen2 account in the same region as the workspaces you want to us...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Carlos Restrepo, we haven't heard from you since the last response from @Kaniz Fatma, and I was checking back to see if her suggestions helped you. Otherwise, if you have any solution, please share it with the community, as it can be helpful to o...

1 More Replies
Tracy_
by New Contributor II
  • 5316 Views
  • 5 replies
  • 0 kudos

Incorrect reading csv format with inferSchema

Hi all, there is a CSV with a column ID (format: 8 digits followed by "D"). When reading the CSV with .option("inferSchema", "true"), Spark returns the ID as a double and trims the "D". Is there any way (apart from inferSchema=False) to get correct ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @tracy ng, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your...

4 More Replies
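
One way to keep the ID intact without switching every column to a string is to pass an explicit schema for the read, so that Spark does not infer a type for ID at all. The following is a minimal sketch, not the answer from the thread. The helper name `read_with_string_id` and the second column `amount` are hypothetical, since only the ID column is described in the question, and `spark` is an existing SparkSession.

```python
def read_with_string_id(spark, path, schema_ddl="ID STRING, amount DOUBLE"):
    """Read a CSV with an explicit schema so Spark does not infer a type for ID.

    `amount` is a made-up second column; list the real columns in file order.
    """
    return (
        spark.read
        .option("header", "true")
        .schema(schema_ddl)  # DDL string; used instead of inferSchema for this read
        .csv(path)
    )


# Hypothetical usage:
# df = read_with_string_id(spark, "dbfs:/FileStore/tables/ids.csv")
```

Unlike inferSchema=False, which leaves every column as a string, this keeps the other columns typed while the ID stays a string.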
ima94
by New Contributor II
  • 2617 Views
  • 2 replies
  • 1 kudos

read cdm error: java.util.NoSuchElementException: None.get

Hi all, I'm trying to read a CDM file and get the error in the image (I replaced the names in uppercase). Any ideas on how to solve it? Thank you!

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @imma marra, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers yo...

1 More Replies
Fred_F
by New Contributor III
  • 4496 Views
  • 7 replies
  • 5 kudos

JDBC connection timeout on workflow cluster

Hi there, I have a batch process configured in a workflow which fails due to a JDBC timeout on a Postgres DB. I checked the JDBC connection configuration and it seems to work when I query a table and do a df.show() in the process, and it displays th...

Latest Reply
Kaniz
Community Manager
  • 5 kudos

Hi @Fred Foucart, we haven’t heard from you since the last response from @Rama Krishna N, and I was checking back to see if his suggestions helped you. Otherwise, if you have any solution, please share it with the community, as it can be helpful to ...

6 More Replies
BkP
by Contributor
  • 1361 Views
  • 3 replies
  • 3 kudos

Scala Connectivity to Databricks Bronze Layer Raw Data from a Non-Databricks Spark environment

Hi All, We are developing a new Scala/Java program which needs to read & process the raw data stored in source ADLS (which is a Databricks Environment) in parallel as the volume of the source data is very high (in GBs & TBs). What kind of connection ...

Latest Reply
BkP
Contributor
  • 3 kudos

Hello experts, any advice on this question? Tagging some folks from whom I have received answers before. Please help with this requirement or tag someone who can: @Kaniz Fatma, @Vartika Nain, @Bilal Aslam

2 More Replies
tanjil
by New Contributor III
  • 4183 Views
  • 6 replies
  • 6 kudos

Resolved! Read and transform CSVs in parallel.

I need to read and transform several CSV files and then append them to a single data frame. I am able to do this in Databricks using simple for loops, but I would like to speed this up. Below is the rough structure of my code: for filepath in all_file...

Latest Reply
Vidula
Honored Contributor
  • 6 kudos

Hi @tanjil, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

5 More Replies
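
A common alternative to looping over files on the driver is to hand Spark the whole list of paths in one read, so the files are loaded as a single distributed job. This is a minimal sketch under two assumptions that the truncated question does not confirm: the files share one schema, and the per-file transformation can be expressed as column operations on the combined DataFrame. The helper name `read_all_csvs` is made up for illustration, and `spark` is an existing SparkSession.

```python
def read_all_csvs(spark, filepaths):
    """Read many CSV files in one distributed job instead of a Python for loop."""
    return (
        spark.read
        .option("header", "true")
        .csv(list(filepaths))  # DataFrameReader.csv accepts a list of paths
    )


# Hypothetical usage, with `all_paths` standing in for the list from the question:
# combined = read_all_csvs(spark, all_paths)
```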
LearningDatabri
by Contributor II
  • 4524 Views
  • 7 replies
  • 2 kudos

Resolved! Unable to read file from S3

I tried to read a file from S3, but I am facing the error below: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 53.0 failed 4 times, most recent failure: Lost task 0.3 in stage 53.0 (TID 82, xx.xx.xx.xx, executor 0): com...

Latest Reply
Sivaprasad1
Valued Contributor II
  • 2 kudos

Which DBR version are you using? Could you please test it with a different DBR version, probably DBR 9.x?

6 More Replies
sannycse
by New Contributor II
  • 2334 Views
  • 6 replies
  • 6 kudos

Resolved! read the csv file as shown in description

Project_Details.csv
ProjectNo|ProjectName|EmployeeNo
100|analytics|1
100|analytics|2
101|machine learning|3
101|machine learning|1
101|machine learning|4
Find each employee in the form of list working on each project?
Output:
ProjectNo|employeeNo
100|[1,2]
101|...

Latest Reply
Kaniz
Community Manager
  • 6 kudos

Hi @SANJEEV BANDRU​ , Just a friendly follow-up. Do you still need help? Please let us know.

5 More Replies
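
The grouping asked for above can be sketched in plain Python to make the expected result concrete. This is an illustration of the logic and not the accepted answer from the thread. The function name `employees_per_project` is made up, and the rows are copied from the question. In PySpark the same idea is usually written as a `groupBy("ProjectNo")` with `collect_list("EmployeeNo")` after reading the file with `sep="|"`.

```python
import csv
import io
from collections import defaultdict

# Rows from the question, one record per line, pipe-delimited.
PROJECT_DETAILS = """\
ProjectNo|ProjectName|EmployeeNo
100|analytics|1
100|analytics|2
101|machine learning|3
101|machine learning|1
101|machine learning|4
"""


def employees_per_project(text):
    """Return {ProjectNo: [EmployeeNo, ...]} preserving the order of appearance."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text), delimiter="|"):
        grouped[int(row["ProjectNo"])].append(int(row["EmployeeNo"]))
    return dict(grouped)


print(employees_per_project(PROJECT_DETAILS))  # {100: [1, 2], 101: [3, 1, 4]}
```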
Orianh
by Valued Contributor II
  • 5187 Views
  • 7 replies
  • 3 kudos

Resolved! Read JSON with backslash.

Hello guys. I'm trying to read a JSON file which contains backslashes and failed to read it via PySpark. I tried a lot of options but haven't solved this yet. I thought to read all the JSON as text and replace all "\" with "/", but PySpark fails to read it as te...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@orian hindi​ - Would you be happy to post the solution you came up with and then mark it as best? That will help other members.

6 More Replies
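
The replace-then-parse idea from the question can be sketched with Python's standard library alone. This is an illustration of the approach the asker describes, not the solution they eventually reached, and the sample record and the function name `parse_with_slashes` are hypothetical.

```python
import json

# Hypothetical record: a Windows-style path whose backslashes are not valid JSON escapes.
raw = r'{"path": "C:\Users\data\input.csv"}'


def parse_with_slashes(text):
    """Replace every backslash with a forward slash, then parse the JSON."""
    return json.loads(text.replace("\\", "/"))


try:
    json.loads(raw)
except json.JSONDecodeError as err:
    print("strict parse failed:", err.msg)

print(parse_with_slashes(raw))
```

A blanket replacement also rewrites legitimate escapes such as `\"` or `\n`, so it is only safe when the backslashes are known to be path separators.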
User16783853906
by Contributor III
  • 1667 Views
  • 2 replies
  • 0 kudos

VACUUM during read/write

Is it safe to run VACUUM on a Delta Lake table while data is being added to it at the same time?  Will it impact the job result/performance?

Latest Reply
User16783853906
Contributor III
  • 0 kudos

In the vast majority of cases, yes, it is safe to run VACUUM while data is concurrently being appended or updated to the same table. This is because VACUUM deletes data files no longer referenced by a Delta table's transaction log and does not effect...

1 More Replies