How to write a dataframe to a CSV file with a sheet name using Spark Scala

I am trying to write the dataframe to a CSV file with the option sheetName, but it is not working for me.
df13.coalesce(1).write
  .option("delimiter", ",")
  .mode(SaveMode.Overwrite)
  .option("sheetName", "Info")
  .option("header", "true")
  .option("escape", "")
  .option("quote", "")
  .csv("path")
Can anyone help me with that?

I don't think a CSV file actually has a sheet name; effectively, the filename acts as the sheet name for a CSV file. Can you try changing to Excel instead?
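For example, with the third-party spark-excel connector (the com.crealytics:spark-excel package, which would have to be added as a dependency; that package and the output path below are assumptions, not something shown in the question), the sheet name travels in the dataAddress option. A minimal sketch:

df13.write
  .format("com.crealytics.spark.excel")   // requires the spark-excel package on the classpath
  .option("dataAddress", "'Info'!A1")     // sheet named "Info", data starting at cell A1
  .option("header", "true")
  .mode("overwrite")
  .save("path/output.xlsx")               // placeholder output path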

Spark can't do this directly while writing a CSV; there is no sheetName option. The output path is the path you pass to .csv("path").
Spark uses Hadoop's file format, which is partitioned into multiple part files under the output path, 1 part file in your case. Also, do not repartition to 1 unless you really need to.
One thing you can do is write the dataframe without repartitioning and use the Hadoop API to merge those many small part files into a single one, as sketched below.
Here is more detail: Write single CSV file using spark-csv
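A rough sketch of that approach, assuming a Hadoop 2.x cluster (FileUtil.copyMerge was removed in Hadoop 3) and made-up output paths:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

// Write normally (no coalesce), which produces several part files under out_dir.
df13.write.mode("overwrite").csv("out_dir")

// Concatenate all part files under out_dir into one CSV file.
val conf = new Configuration()
val fs = FileSystem.get(conf)
FileUtil.copyMerge(fs, new Path("out_dir"), fs, new Path("merged/result.csv"), false, conf, null)

Note that copyMerge simply concatenates the part files, so writing with header set to true would repeat the header row once per part file; that is why the sketch writes without a header.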

A CSV file can have only one (default) sheet; if we want multiple sheets, then we should write the dataframe in Excel format instead of the CSV file format.
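With the spark-excel connector mentioned above (again an assumption, and df1, df2, the sheet names and report.xlsx are placeholders), each dataframe can go to its own sheet of one workbook, for example:

// First write creates the workbook with a sheet named "Info".
df1.write
  .format("com.crealytics.spark.excel")
  .option("dataAddress", "'Info'!A1")
  .option("header", "true")
  .mode("overwrite")
  .save("report.xlsx")

// Second write appends a sheet named "Summary" to the same workbook.
df2.write
  .format("com.crealytics.spark.excel")
  .option("dataAddress", "'Summary'!A1")
  .option("header", "true")
  .mode("append")
  .save("report.xlsx")

Appending this way makes the library read the existing workbook back into memory, so it is only practical for reasonably small files.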

Related

Reading CSV file using pyspark

While reading a CSV file with a customized schema definition, the column count changes, whereas with inferSchema the column count is different. Can anyone help me understand why this happens?

Convert bulk .xlsx files to .csv (UTF-8) in Pentaho

I am new to Pentaho. I am trying to build a transformation that can convert a bunch of .xlsx files to .csv (utf-8).
I tried Get File Names and Text File Output, but it saves a single file as CSV and the content of that file is the file properties.
I also tried Microsoft Excel Input and Microsoft Excel Output and that did not work either.
Any help will be appreciated. TIA!
I have prepared a SOLUTION for you. I have made my solution fully dynamic. For that reason the solution is a combination of 6 pieces (transformations & jobs). You only need to define the following 2 things:
Source folder location
Destination folder location
Everything else works dynamically.
Also, I have learned a lot with this solution.
Would you like to generate a separate CSV for each Excel file?
It is better to do it like this:
Using the Get File Names component, read the list of Excel files from the folder.
Then call Execute Transformation, and pass the name of the file.
A separate transformation will then be performed for each file, and a separate CSV will be generated for each Excel file.

Is there a way to write a Spark dataframe to a .dat file?

I tried this, but it didn't work. From my understanding, Spark does not support the .dat file format. I do not want to write the file as a .csv or .json, then convert via a shell script later.
a.write.format("dat").save(outputPath)
Spark's format function does not accept "dat" as an argument. You can find much more information in the documentation: https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html
I'm sorry, but the easiest thing you can do is create a CSV and convert it to .dat later.
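A rough sketch of that workaround done from Spark itself (the pipe delimiter and the paths are only examples; a is the dataframe from the question), using the Hadoop FileSystem API to rename the single part file to a .dat extension:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Write the dataframe as delimited text into a temporary directory.
a.coalesce(1)
  .write
  .option("delimiter", "|")
  .mode("overwrite")
  .csv("tmp_out")

// Rename the single part file (only one exists because of coalesce(1)) to a .dat file.
val fs = FileSystem.get(new Configuration())
val part = fs.globStatus(new Path("tmp_out/part-*"))(0).getPath
fs.mkdirs(new Path("output"))
fs.rename(part, new Path("output/data.dat"))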

Write different DFs recursively to an xlsx file without overwriting, using Jupyter

I have 100 CSV files to analyse, which I open recursively, analyse the data, and then I would like to write the results to an xlsx file.
I want to write all the results to the same xlsx file, without overwriting the results from the previous files, and putting the data one below the other. (I cannot put the results in a DF and then append all of them due to memory issues.)
Briefly, the idea is as follows:
for file in folder:
    open the csv
    analyse the data
    Results into a DF
    Results into Xlsx (starting from the first free row)
Any suggestions?
Thanks

How to extract output from a unix script to .xls/.xlsx file

Previously I extracted the output from my Unix SQL script to a .csv file, but it seems to cause an issue. The master script should be able to cleanly scan and append these spreadsheets into one table, but the .csv file is creating an issue.
When I extracted the output from SQL Developer to an XLS or XLSX file, there were no issues.
Is there any way I can extract it in the same format as SQL Developer does?
Yes, it's true that sqlplus cannot extract the data as an Excel spreadsheet. But I have a workaround: if you open that CSV file and save it as xls/xlsx, then any ETL tool can read the file as expected.