cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

yubin-apollo
by New Contributor II
  • 1199 Views
  • 4 replies
  • 0 kudos

COPY INTO skipRows FORMAT_OPTIONS does not work

Based on the COPY INTO documentation, it seems I can use `skipRows` to skip the first `n` rows. I am trying to load a CSV file where I need to skip a few first rows in the file. I have tried various combinations, e.g. setting header parameter on or ...

  • 1199 Views
  • 4 replies
  • 0 kudos
Latest Reply
karthik-kobai
New Contributor II
  • 0 kudos

@yubin-apollo: My bad - I had the skipRows in the COPY_OPTIONS and not in the FORMAT_OPTIONS. It works, please ignore my previous comment. Thanks

  • 0 kudos
3 More Replies
Bilal1
by New Contributor III
  • 15483 Views
  • 6 replies
  • 2 kudos

Resolved! Simply writing a dataframe to a CSV file (non-partitioned)

When writing a dataframe in Pyspark to a CSV file, a folder is created and a partitioned CSV file is created. I have then rename this file in order to distribute it my end user.Is there any way I can simply write my data to a CSV file, with the name ...

  • 15483 Views
  • 6 replies
  • 2 kudos
Latest Reply
Bilal1
New Contributor III
  • 2 kudos

Thanks for confirming that that's the only way

  • 2 kudos
5 More Replies
prapot
by New Contributor II
  • 5132 Views
  • 2 replies
  • 2 kudos

Resolved! How to write a Spark DataFrame to CSV file with our .CRC in Azure Databricks?

val spark:SparkSession = SparkSession.builder() .master("local[3]") .appName("SparkByExamples.com") .getOrCreate()//Spark Read CSV Fileval df = spark.read.option("header",true).csv("address.csv")//Write DataFrame to address directorydf.write...

  • 5132 Views
  • 2 replies
  • 2 kudos
Latest Reply
Nw2this
New Contributor II
  • 2 kudos

Will your csv have the name prefix 'part-' or can you name it whatever you like?

  • 2 kudos
1 More Replies
nikhilkumawat
by New Contributor III
  • 4658 Views
  • 5 replies
  • 3 kudos

Resolved! Get file information while using "Trigger jobs when new files arrive" https://docs.databricks.com/workflows/jobs/file-arrival-triggers.html

I am currently trying to use this feature of "Trigger jobs when new file arrive" in one of my project. I have an s3 bucket in which files are arriving on random days. So I created a job to and set the trigger to "file arrival" type. And within the no...

  • 4658 Views
  • 5 replies
  • 3 kudos
Latest Reply
Rik
New Contributor III
  • 3 kudos

This solution doesn't answer the question...It seems that no additional parameters are passed to the job for file arrivals as described here (https://learn.microsoft.com/en-us/azure/databricks/workflows/jobs/file-arrival-triggers).Any plans on adding...

  • 3 kudos
4 More Replies
Michael42
by New Contributor III
  • 3900 Views
  • 4 replies
  • 7 kudos

Resolved! Want to load a high volume of CSV rows in the fastest way possible (in excess of 5 billion rows). I want the best approach, in terms of speed, for loading into the bronze table.

My source can only deliver CSV format (pipe delimited).My source has the ability to generate multiple CSV files and transfer them to a single upload folder.All rows must go to the same target bronze delta table.I do not care about the order in which ...

  • 3900 Views
  • 4 replies
  • 7 kudos
Latest Reply
Anonymous
Not applicable
  • 7 kudos

Hi @Michael Popp​ Thank you for posting your question in our community! We are happy to assist you.To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers ...

  • 7 kudos
3 More Replies
Tim_T
by New Contributor
  • 521 Views
  • 1 replies
  • 0 kudos

Are training/ecommerce data tables available as CSVs?

The course "Apache Spark™ Programming with Databricks" requires data sources such as training/ecommerce/events/events.parquet. Are these available as CSV files? My company's databricks configuration does not allow me to mount to such repositories, bu...

  • 521 Views
  • 1 replies
  • 0 kudos
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Tim Tremper​, The specific dataset you mentioned, "training/ecommerce/events/events.parquet", is in Parquet format, but you can easily convert it into a CSV format using Apache Spark™ on Databricks.Here's a step-by-step guide to convert the Parqu...

  • 0 kudos
System1999
by New Contributor III
  • 2304 Views
  • 7 replies
  • 0 kudos

My 'Data' menu item shows 'No Options' for Databases. How can I fix?

Hi, I'm new to Databricks and I've signed up for the Community edition.First, I've noticed that I cannot return to a previously created cluster, as I get the message telling me that restarting a cluster is not available to me. Ok, inconvenient, but I...

error
  • 2304 Views
  • 7 replies
  • 0 kudos
Latest Reply
System1999
New Contributor III
  • 0 kudos

Hi @Suteja Kanuri​ ,I get the error message under Data before I've created a cluster. Then I still get it when I've created a cluster and a notebook (having attached the notebook to the cluster). Thanks.

  • 0 kudos
6 More Replies
MRTN
by New Contributor III
  • 3617 Views
  • 4 replies
  • 3 kudos

Load CSV files with slightly different schemas

I have a set of CSV files generated by a system, where the schema has evolved over the years. Some columns have been added, and at least one column has been renamed in newer files. Is there any way to elegantly load these files into a dataframe? I ha...

  • 3617 Views
  • 4 replies
  • 3 kudos
Latest Reply
MRTN
New Contributor III
  • 3 kudos

For reference - for anybody struggling with the same issues. All online examples using auto loader are written as one block statement on the form: (spark.readStream.format("cloudFiles") .option("cloudFiles.format", "csv") # The schema location di...

  • 3 kudos
3 More Replies
Tracy_
by New Contributor II
  • 4541 Views
  • 5 replies
  • 0 kudos

Incorrect reading csv format with inferSchema

Hi All,There is a CSV with a column ID (format: 8-digits & "D" at the end).When trying to read a csv with .option("inferSchema", "true"), it returns the ID as double and trim the "D". Is there any idea (apart from inferSchema=False) to get correct ...

image.png
  • 4541 Views
  • 5 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @tracy ng​ Thank you for posting your question in our community! We are happy to assist you.To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your...

  • 0 kudos
4 More Replies
tech2cloud
by New Contributor II
  • 1195 Views
  • 2 replies
  • 0 kudos

Databricks Autoloader streamReader does not include the partition column as part of output.

I have folder structure at source such as/transaction/date_=2023-01-20/hr_=02/tras01.csv/transaction/date_=2023-01-20/hr_=03/tras02.csvWhere 'date_' and 'hr_' are my partitions and present in the dataset as well. But the streamReader does not read th...

image
  • 1195 Views
  • 2 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Ravi Vishwakarma​ Thank you for posting your question in our community! We are happy to assist you.To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answ...

  • 0 kudos
1 More Replies
Chhaya
by New Contributor III
  • 1891 Views
  • 6 replies
  • 2 kudos

Using great expectations with autolaoder

Hi everyone ,I have implemented a data pipeline using autoloader bronze-->silver-->gold .now while I do this I want to perform some data quality checks , and for that I'm using great expectations library.However I'm stuck with below error when trying...

  • 1891 Views
  • 6 replies
  • 2 kudos
Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Chhaya Vishwakarma​ Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs.Please help us select the best solution by clicking on "Select As Best" if it does.Your fe...

  • 2 kudos
5 More Replies
kll
by New Contributor III
  • 2772 Views
  • 1 replies
  • 1 kudos

Resolved! OSError: Invalid argument when attempting to save a pandas dataframe to csv

I am attempting to save a pandas DataFrame to as csv to a directory I created in Databricks workspace or in the `cwd`. import pandas as pd   import os   df.to_csv("data.csv", index=False)   df.to_csv(str(os.getcwd()) + "/data.csv", index=False)      ...

  • 2772 Views
  • 1 replies
  • 1 kudos
Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 1 kudos

Hi @Keval Shah​ ,You can save your dataframe to csv in dbfs storage.Please refer below code that might help you-df = pd.read_csv(StringIO(data), sep=',') #print(df) df.to_csv('/dbfs/FileStore/ajay/file1.txt')

  • 1 kudos
rammy
by Contributor III
  • 1253 Views
  • 2 replies
  • 3 kudos

How can we save a data frame in Docx format using pyspark?

  I am trying to save a data frame into a document but it returns saying that the below errorjava.lang.ClassNotFoundException: Failed to find data source: docx. Please find packages at http://spark.apache.org/third-party-projects.htm   #f_d...

  • 1253 Views
  • 2 replies
  • 3 kudos
Latest Reply
jose_gonzalez
Moderator
  • 3 kudos

Hi,You cannot do it from Pyspark, but you can try to use Pandas to save to Excell. There is no Docx

  • 3 kudos
1 More Replies
sp1
by New Contributor II
  • 7978 Views
  • 6 replies
  • 4 kudos

Resolved! Pass date value as parameter in Databricks SQL notebook

I want to pass yesterday date (In the example 20230115*.csv) in the csv file. Don't know how to create parameter and use it here.CREATE OR REPLACE TEMPORARY VIEW abc_delivery_logUSING CSVOPTIONS ( header="true", delimiter=",", inferSchema="true", pat...

  • 7978 Views
  • 6 replies
  • 4 kudos
Latest Reply
Anonymous
Not applicable
  • 4 kudos

SQL has a function for current_timestamp, which you can cast as a date to get todays date. You don't need to pass anything to the notebook for this logic to work.

  • 4 kudos
5 More Replies
SQL_DB
by New Contributor II
  • 1162 Views
  • 2 replies
  • 2 kudos

Sharing CSV export from a dashboard

Is it possible to schedule refresh and share a csv format of a table visual in a dashboard? Also, is it possible to share only one visual in a dashboard when there are more than one?

  • 1162 Views
  • 2 replies
  • 2 kudos
Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Sujitha Bommayan​ Hope everything is going great.Does @Kaniz Fatma​  response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly?We'd love to hear from you.Thanks!

  • 2 kudos
1 More Replies
Labels