EPPlus: how to know the format of a worksheet cell

I am using EPPlus to read Excel sheets, load the data into a DataTable for further processing, and then save the modified data back to the Excel file.
The code below checks whether the cell value is a float and, if so, converts it to a DateTime.
This works when the cell really holds a date (e.g. Invoice Date = 42009), but it also converts non-date values (e.g. Amount = 10) to a date.
Is there any way in EPPlus to determine the format of a cell (General, Date, Number, etc.)?
float floatValue;
if (float.TryParse(Convert.ToString(oSheet.Cells[i, j].Value), out floatValue)
    && Convert.ToString(oSheet.Cells[i, j].Style.Numberformat.Format).Contains("[$-409]d\\-mmm\\-yy;#"))
{
    dr[j - 1] = String.Format("{0:d-MMM-yy}", DateTime.FromOADate(floatValue));
}
else
{
    DateTime date;
    if (DateTime.TryParse(Convert.ToString(oSheet.Cells[i, j].Value), out date))
    {
        dr[j - 1] = String.Format("{0:d-MMM-yy}", date);
    }
    else
    {
        dr[j - 1] = Convert.ToString(oSheet.Cells[i, j].Value).Trim();
    }
}

The short answer is that if the data in Excel is formatted as a "proper" date field (its format is set to Date and it does not have the triangle in the cell corner), EPPlus will detect that and automatically convert it to a date in the cell store.
Excel stores this information in the XLSX XML files, e.g. sheet1.xml and styles.xml, which you can see by changing the file extension to .ZIP and opening the file as a compressed folder. EPPlus reads the XML flags and converts the field values to DateTime automatically. If you want to see the gory details, download the EPPlus source code and look at the function private void SetValueFromXml(XmlTextReader xr, string type, int styleID, int row, int col) in the ExcelWorksheet class.
I created a sheet, added a date value in B1, and copy-pasted the value to cell B2 but then set its format to Number. I then copied B2 to B3 but made the value in B3 a string by putting an ' in front of it.
When I run this unit test, everything passes:
[TestMethod]
public void Date_Number_Test()
{
    //http://stackoverflow.com/questions/28591763/epplus-how-to-know-the-format-of-the-worksheet-cell
    var existingFile = new FileInfo(@"c:\temp\datetest.xlsx");
    using (var package2 = new ExcelPackage(existingFile))
    {
        var ws = package2.Workbook.Worksheets["Sheet1"];
        // B1: formatted as a date, so EPPlus stores a DateTime
        var cell = ws.Cells["B1"];
        Assert.IsTrue(cell.Value is DateTime);
        // B2: same value formatted as a number
        cell = ws.Cells["B2"];
        Assert.IsTrue(cell.Value is double);
        // B3: same value entered as text
        cell = ws.Cells["B3"];
        Assert.IsTrue(cell.Value is string);
    }
}
But if you are dealing with a sheet whose formatting is outside your control, then I think your approach is correct: you will have to determine the "intent" of the numbers yourself rather than relying on what EPPlus (really Excel) reports them as.

If you want to know the current cell format, why not use the ExcelNumberFormat class?
var style = oSheet.Cells[i, j].Style;
string format = style.Numberformat.Format;
This gives you the cell's current format string, e.g. "YYYY-MM-dddd", "%#.##", and so on.
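Once the format string is available, one way to use it is to treat a numeric cell as a date only when the format contains date or time tokens. The sketch below shows that idea in JavaScript so it can run without EPPlus; `looksLikeDateFormat`, `fromOADate`, and `readCell` are hypothetical helper names, and the token check is a heuristic, not EPPlus's own logic.

```javascript
// Heuristic: does an Excel number-format string describe a date or time?
function looksLikeDateFormat(format) {
  if (!format || format === 'General') return false;
  const stripped = format
    .replace(/\[[^\]]*\]/g, '') // drop bracketed sections such as [$-409] or [Red]
    .replace(/"[^"]*"/g, '')    // drop quoted literal text
    .replace(/\\./g, '');       // drop escaped characters such as \-
  return /[dmyhs]/i.test(stripped);
}

// Convert an OLE Automation serial number to a JavaScript Date (UTC).
// Serial 25569 corresponds to 1970-01-01.
function fromOADate(serial) {
  return new Date(Math.round((serial - 25569) * 86400 * 1000));
}

// Return an ISO date string for date-formatted numbers, plain text otherwise.
function readCell(value, format) {
  if (typeof value === 'number' && looksLikeDateFormat(format)) {
    return fromOADate(value).toISOString().slice(0, 10);
  }
  return String(value).trim();
}

console.log(readCell(42009, '[$-409]d\\-mmm\\-yy;#')); // date-formatted cell
console.log(readCell(10, 'General'));                  // plain number stays a number
```

In the C# code from the question, the equivalent change would be to replace the exact `Contains("[$-409]d\\-mmm\\-yy;#")` match with a broader check of this kind, so that other date formats are recognised too.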

Uploading multiple records via Excel upload to a database using Spring Boot in Java

How to insert multiple records into a database using an Excel upload?
While working on a project, I had a requirement to upload an Excel file with, let's say, up to 50000 records per sheet. During the upload certain validations need to be performed, and if any record fails, an error message needs to be displayed on the page for the respective Excel row and none of the records should be saved to the database table.
Validations could be in any sense, for example:
The age of all employees needs to be >= 18 and <= 60
The age column should not have a non-numeric value
There were certain columns that were mandatory, if the value is not filled in that column of the Excel then the user should be notified of the error. Like the name of the Employee should not be null.
Checking that a date value is in the format of DD/MM/YYYY
and so on
When uploading this Excel file, if the preceding validations are successfully satisfied then the data needs to be saved in the database against a specific batch id that is basically a unique number to identify the records inserted within that specific batch and when they all were inserted. Also, special attention needs to be provided to the Excel column headings that we provided to the user for data entry purposes. Let's say there are 10 columns in the Excel and if the user changes the order of the Excel column heading or if they delete some column or change the name of the column, this validation also needs to be performed against the uploaded file.
For performing this validation we have made use of an ExcelStructure table that will have Excel fields/columns stored with their respective sizes, types, parameters, and whether these fields/columns are mandatory or not. The following is the structure of the ExcelStructure table. Remember we are using Entity Framework with the code-first approach. But just for demo/looking purposes, our SQL table will have the following query.
Can somebody help? When I upload an Excel file in Spring Boot, I want each Excel column whose name matches a database column name to have its value saved to that database column.
public static List<Hday> excelToDatabase(InputStream inputStream) {
try {
//Workbook workbook = new XSSFWorkbook(inputStream);
Workbook workbook =WorkbookFactory.create(inputStream);
System.out.println("workbook : " + workbook);
Sheet sheet = workbook.getSheetAt(0);
System.out.println("sheet : " + sheet);
Iterator<Row> rows = sheet.iterator();
List<Hday> hdayList = new ArrayList<Hday>();
int rowNumber = 0;
while (rows.hasNext()) {
Row currentRow = rows.next();
// skip header
if (rowNumber == 0) {
rowNumber++;
continue;
}
Iterator<Cell> cellsInRow = currentRow.iterator();
Hday refhd = new Hday();
int idx = 0;
while (cellsInRow.hasNext()) {
Cell cells = cellsInRow.next();
switch (idx) {
case 0:
if (cells.getCellType() == CellType.STRING) {
String inId = cells.getStringCellValue();
refhd.setinId(Long.parseLong(inId));
} else {
refhd.setinId((long) cells.getNumericCellValue());
}
break;
case 1:
if (cells .getCellType() == CellType.NUMERIC) {
long week = (long) cells .getNumericCellValue();
refhd.setWeek(Long.toString(week));
} else {
refhd.setWeek(cells .getStringCellValue());
}
break;
case 2:
if (cells .getCellType() == CellType.STRING) {
String temphdDate = cells .getStringCellValue();
Date hdayDate;
try {
hdayDate = new SimpleDateFormat("dd/MM/yyyy").parse(temphdDate);
} catch (ParseException e) {
throw new BadRequestException(e.toString());
}
refhd.setHolidayDate(hdayDate);
} else {
refhd.setHolidayDate(cells .getDateCellValue());
}
break;
case 3:
if (cells .getCellType() == CellType.STRING) {
refhd.setHolidayName(cells .getStringCellValue());
} else {
throw new BadRequestException("Cell Value Is Not a String");
}
break;
default:
break;
}
idx++;
}
hdayList.add(refhd);
}
workbook.close();
return hdayList;
} catch (IOException e) {
throw new RuntimeException("fail to parse Excel file: " + e.getMessage());
}
}
The code above relies on the column order: whatever names I use in the Excel header, values are read by position.
If I upload a file whose columns are in a different order, the values are read correctly but are stored in the wrong database columns. That is the issue.
Excel input:
id week date HolidayName
1 1st week 5/10/2021
With the columns in the same order as the code expects, this is what ends up in the database:
id week date HolidayName
1 1stweek 5/10/2021 null
My second input:
id date
1 6/10/2021
Here I get the error "Cell Value Is Not a String".
This is the output I want in the database for that case:
id week date HolidayName
1 null 6/10/2021 null
In short, I want to match each Excel column name to the database column name.
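One possible approach is to read the header row first, resolve each column index to a field name, and then fill the entity by name instead of by position. The sketch below illustrates this in JavaScript on plain arrays so it runs without POI; `COLUMN_TO_FIELD` and `mapRowsByHeader` are hypothetical names, and the field names are assumptions based on the setters in the question.

```javascript
// Hypothetical mapping from lower-cased header text to entity field names.
const COLUMN_TO_FIELD = new Map([
  ['id', 'id'],
  ['week', 'week'],
  ['date', 'holidayDate'],
  ['holidayname', 'holidayName'],
]);

function mapRowsByHeader(headerRow, dataRows) {
  // Resolve each column index to a field name once, from the header row.
  const fieldByIndex = headerRow.map(
    (h) => COLUMN_TO_FIELD.get(String(h).trim().toLowerCase()) || null
  );
  return dataRows.map((cells) => {
    // Start with every known field set to null so missing columns stay null.
    const record = {};
    for (const field of COLUMN_TO_FIELD.values()) record[field] = null;
    cells.forEach((value, i) => {
      const field = fieldByIndex[i];
      if (field && value !== undefined && value !== '') record[field] = value;
    });
    return record;
  });
}

// Second input from the question: only "id" and "date" columns are present.
console.log(mapRowsByHeader(['id', 'date'], [[1, '6/10/2021']]));
```

In the Java loop, the same idea would mean building the index-to-name lookup from the first row (instead of skipping it) and switching on the header name rather than on `idx`.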

How to get NPOI Excel RichStringCellValue?

I am using DotNetCore.NPOI (1.2.1) in order to read an MS Excel file.
Some of the cells are of type text and contain formatted strings (e.g. some words in bold).
How do I get the formatted cell value? My final goal: Retrieve the cell text as HTML.
I tried
var cell = row.GetCell(1);
var richStringCellValue = cell.RichStringCellValue;
But this won't let me access the formatted string (just the plain string without formattings).
Does anybody have an idea or solution?
I think you'll have to take the longer route in this case. First preserve the formatting of the cell value (date, currency, etc.), then extract the style from the cell value and wrap the value in the corresponding markup.
The best option is to write an extension method that returns the format and style values.
For the format, see the linked question "How to get the value of cell containing a date and keep the original formatting using NPOI".
For styling, you first have to find the exact style of each run of text and then return the value inside the matching HTML tag. The method below should give you an idea of how to extract the styling from a cell value. The code is untested, and you may have to add missing library references.
public void GetStyleOfCellValue()
{
XSSFWorkbook wb = new XSSFWorkbook("YourFile.xlsx");
ISheet sheet = wb.GetSheetAt(0);
ICell cell = sheet.GetRow(0).GetCell(0);
XSSFRichTextString richText = (XSSFRichTextString)cell.RichStringCellValue;
int formattingRuns = cell.RichStringCellValue.NumFormattingRuns;
for (int i = 0; i < formattingRuns; i++)
{
int startIdx = richText.GetIndexOfFormattingRun(i);
int length = richText.GetLengthOfFormattingRun(i);
Console.WriteLine("Text: " + richText.String.Substring(startIdx, length)); // C# Substring takes (start, length)
if (i == 0)
{
short fontIndex = cell.CellStyle.FontIndex;
IFont font = wb.GetFontAt(fontIndex);
Console.WriteLine("Bold: " + (font.IsBold)); // return string <b>my string</b>.
Console.WriteLine("Italics: " + font.IsItalic + "\n"); // return string <i>my string</i>.
Console.WriteLine("UnderLine: " + font.Underline + "\n"); // return string <u>my string</u>.
}
else
{
IFont fontFormat = richText.GetFontOfFormattingRun(i);
Console.WriteLine("Bold: " + (fontFormat.IsBold)); // return string <b>my string</b>.
Console.WriteLine("Italics: " + fontFormat.IsItalic + "\n");// return string <i>my string</i>.
}
}
}
Font formatting in XLSX files is stored according to the schema http://schemas.openxmlformats.org/spreadsheetml/2006/main, which has no direct relationship to HTML tags. Therefore your task is not straightforward.
style = cell.getCellStyle();
font = style.getFont(); // or style.getFont(workBook);
// use Font object to query font attributes. E.g. font.IsItalic
Then you will have to build the HTML by appending relevant HTML tags.
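The tag-appending step can be sketched as follows, assuming the formatting runs have already been extracted into plain objects (for example from the start index, length, and font of each run, as in the C# method above). `richTextToHtml` and the run shape `{ start, length, bold, italic, underline }` are hypothetical; the runs are assumed to be sorted and non-overlapping.

```javascript
// Escape characters that have a special meaning in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Wrap each formatting run in <b>/<i>/<u> tags; text outside runs is copied as-is.
function richTextToHtml(text, runs) {
  let html = '';
  let pos = 0;
  for (const run of runs) {
    // Text between the previous run and this one has no run-level formatting.
    html += escapeHtml(text.slice(pos, run.start));
    let piece = escapeHtml(text.slice(run.start, run.start + run.length));
    if (run.underline) piece = `<u>${piece}</u>`;
    if (run.italic) piece = `<i>${piece}</i>`;
    if (run.bold) piece = `<b>${piece}</b>`;
    html += piece;
    pos = run.start + run.length;
  }
  return html + escapeHtml(text.slice(pos));
}

console.log(richTextToHtml('Hello bold world', [{ start: 6, length: 4, bold: true }]));
```

Escaping the text before wrapping it keeps characters such as `<` in the cell value from being interpreted as markup.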

How to write into a particular cell using xlsx npm package

I have to write a value to a particular cell (say the D4 cell) in my xlsm file. I can see the option of
XLSX.writeFile(workbook, 'out.xlsx');
in the XLSX package documentation (writing functions)
But I am not seeing anything to write a value to a particular cell (where should the values which needs to be written passed?). Or, it is not as clear as the example provided to read a particular cell value.
Would be glad if someone could provide me a simple example of snippet.
This is how I read a particular cell value:
if(typeof require !== 'undefined') XLSX = require('C:\\Program Files\\nodejs\\node_modules\\npm\\node_modules\\xlsx');
var workbook = XLSX.readFile('xlsm');
var first_sheet_name = workbook.SheetNames[0];
var address_of_cell = 'D5';
var worksheet = workbook.Sheets[first_sheet_name];
var desired_cell = worksheet[address_of_cell];
desired_value = (desired_cell ? desired_cell.v : undefined);
console.log('Cell Value is: '+ desired_value);
To write to a specific cell in a given sheet (let's say the first sheet), you can do:
const XLSX = require('xlsx');
// read from a XLS file
let workbook = XLSX.readFile('test.xls');
// get first sheet
let first_sheet_name = workbook.SheetNames[0];
let worksheet = workbook.Sheets[first_sheet_name];
// read value in D4
let cell = worksheet['D4'].v;
console.log(cell)
// modify value in D4
worksheet['D4'].v = 'NEW VALUE from NODE';
// set the value when D4 is undefined / does not exist yet
XLSX.utils.sheet_add_aoa(worksheet, [['NEW VALUE from NODE']], {origin: 'D4'});
// write to new file
// formatting from OLD file will be lost!
XLSX.writeFile(workbook, 'test2.xls');
Modifying the value in D4 like this:
worksheet['D4'].v = 'NEW VALUE from NODE';
works only if the cell is already defined in the file, but sometimes you will want to write to a new, undefined cell.
The solution I found for that is to add the value at the new cell D4:
XLSX.utils.sheet_add_aoa(worksheet, [['NEW VALUE from NODE']], {origin: 'D4'});
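The two cases can be combined into one helper. The sketch below works directly on the worksheet object, which in this package is a plain object keyed by A1 addresses with the used range stored under `!ref`; `setCell` and the address helpers are hypothetical names, not part of the xlsx API, and formulas and other cell properties are not handled.

```javascript
// Convert column letters to a 0-based index: A -> 0, Z -> 25, AA -> 26.
function colToIndex(letters) {
  let n = 0;
  for (const ch of letters) n = n * 26 + (ch.charCodeAt(0) - 64);
  return n - 1;
}

// Convert a 0-based column index back to letters.
function indexToCol(index) {
  let s = '';
  for (let n = index + 1; n > 0; n = Math.floor((n - 1) / 26)) {
    s = String.fromCharCode(65 + ((n - 1) % 26)) + s;
  }
  return s;
}

function parseAddress(a1) {
  const m = /^([A-Z]+)(\d+)$/.exec(a1);
  if (!m) throw new Error('Not an A1 address: ' + a1);
  return { c: colToIndex(m[1]), r: Number(m[2]) - 1 };
}

// Update an existing cell or create a new one, then grow '!ref' if needed.
function setCell(worksheet, a1, value) {
  const type = typeof value === 'number' ? 'n' : typeof value === 'boolean' ? 'b' : 's';
  const cell = worksheet[a1] || (worksheet[a1] = {});
  cell.t = type;
  cell.v = type === 's' ? String(value) : value;
  delete cell.w; // cached formatted text would otherwise be stale

  const target = parseAddress(a1);
  const [startA1, endA1 = startA1] = (worksheet['!ref'] || a1).split(':');
  const s = parseAddress(startA1);
  const e = parseAddress(endA1);
  const minC = Math.min(s.c, target.c), minR = Math.min(s.r, target.r);
  const maxC = Math.max(e.c, target.c), maxR = Math.max(e.r, target.r);
  worksheet['!ref'] = `${indexToCol(minC)}${minR + 1}:${indexToCol(maxC)}${maxR + 1}`;
  return cell;
}

// Stand-in worksheet object; with the real package this would be workbook.Sheets[name].
const demoSheet = { '!ref': 'A1:B2', A1: { t: 's', v: 'x' } };
setCell(demoSheet, 'D4', 'NEW VALUE from NODE');
console.log(demoSheet['!ref'], demoSheet.D4);
```

Updating `!ref` matters because a cell outside the recorded range may be ignored when the sheet is written out.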

Workbook cell style in POI/NPOI doesn't work properly with multiple styles in workbook

I'm running into a strange problem with the .NET version of the POI library for Excel spreadsheets. I'm converting text files to Excel 97-2003 documents and I'd like to add some formatting programmatically, depending on some values gathered at the beginning of the program.
At first, in the same method where I created a new cell from a given value, I also created a new workbook CellStyle. That was wrong, because I was running out of styles very quickly (or at least I thought that was the cause of the problem).
Constructor of the class responsible for Excel Workbook:
public OldExcelWriter(TextWriter logger) : base(logger)
{
_workbook = new HSSFWorkbook();
_sheetData = _workbook.CreateSheet("sheet1");
_creationHelper = _workbook.GetCreationHelper();
}
Method that is calling all the chains of operations:
public void Write(string path, Data data)
{
FillSpreadSheetWithData(data, _sheetData);
SaveSpreadSheet(_workbook, path);
}
Long story short, in FillSpreadSheetWithData I have a method for creating a row, inside which I loop over each cell. So basically I'm iterating through every column, passing the IRow reference, the column value, the index, and the formatting information like this:
for (int j = 0; j < column.Count; j++)
{
CreateCell(row, column[j], j, data.Formatting[j]);
}
When creating new styles this way (as a first attempt I was passing some date-time values), the rewritten Excel file looked like this: [screenshot of Excel workbook]
So the formatting was passed correctly (also the horizontal alignment etc.), but it gets ugly after the 15th row (always the same number).
DateTime dataCell = DateTime.MaxValue;
var cell = row.CreateCell(columnIndex);
_cellStyle = _workbook.CreateCellStyle();
switch (format.Type)
{
case DataType.Date:
_cellStyle.DataFormat = _creationHelper.CreateDataFormat().GetFormat("m/dd/yyyy");
if (value.Replace("\n", "") != string.Empty)
{
dataCell = DateTime.ParseExact(value.Replace("\n", ""), "m/dd/yyyy",
System.Globalization.CultureInfo.InvariantCulture);
}
break;
}
switch (format.HorizontalAlignment)
{
case Enums.HorizontalAlignment.Left:
_cellStyle.Alignment = HorizontalAlignment.LEFT;
break;
case Enums.HorizontalAlignment.Center:
_cellStyle.Alignment = HorizontalAlignment.CENTER;
break;
}
if (dataCell != DateTime.MaxValue)
{
cell.CellStyle = _cellStyle;
cell.SetCellValue(dataCell);
dataCell = DateTime.MaxValue;
}
else
{
cell.CellStyle = _cellStyle;
cell.SetCellValue(value);
}
(It's not the cleanest code, but I will refactor it after getting this to work.)
After running into this issue, I thought I could create the _cellStyle variable in the constructor and only change its values depending on the case, because it's assigned to the new cell anyway, and while debugging I can see that the object values are correct.
But that didn't make things any better. The styles were overridden by the last value of the style, and the dates are still spoiled, just later: [screenshot of Excel workbook after creating one instance of cell style]
I'm running out of ideas. Maybe I should create every combination of cell styles up front (I'm using only a few data formats and alignments), but before I do that I wonder what you think should be done here.
cell format is set to custom with date type
I am using this code to create my custom style and format. It's for the XSSF format of Excel sheets, but it will work for the HSSF format with some modification.
XSSFFont defaultFont = (XSSFFont)workbook.CreateFont();
defaultFont.FontHeightInPoints = (short)10;
defaultFont.FontName = "Arial";
defaultFont.Color = IndexedColors.Black.Index;
defaultFont.IsBold = false;
defaultFont.IsItalic = false;
XSSFCellStyle dateCellStyle = (XSSFCellStyle)workbook.CreateCellStyle();
XSSFDataFormat dateDataFormat = (XSSFDataFormat)workbook.CreateDataFormat();
dateCellStyle.SetDataFormat(dateDataFormat.GetFormat("m/d/yy h:mm")); //Replace format by m/dd/yyyy. try similar approach for phone number etc.
dateCellStyle.FillBackgroundColor = IndexedColors.LightYellow.Index;
//dateCellStyle.FillPattern = FillPattern.NoFill;
dateCellStyle.FillForegroundColor = IndexedColors.LightTurquoise.Index;
dateCellStyle.FillPattern = FillPattern.SolidForeground;
dateCellStyle.Alignment = HorizontalAlignment.Left;
dateCellStyle.VerticalAlignment = VerticalAlignment.Top;
dateCellStyle.BorderBottom = BorderStyle.Thin;
dateCellStyle.BorderTop = BorderStyle.Thin;
dateCellStyle.BorderLeft = BorderStyle.Thin;
dateCellStyle.BorderRight = BorderStyle.Thin;
dateCellStyle.SetFont(defaultFont);
//Apply your style to column
_sheetData.SetDefaultColumnStyle(columnIndex, dateCellStyle);
// Or you can also apply the style cell by cell, like this
var row = _sheetData.CreateRow(0);
for (int cellIndex = 0; cellIndex < TotalHeaderCount; cellIndex++)
{
// A newly created row has no cells yet, so create each cell before styling it
row.CreateCell(cellIndex).CellStyle = dateCellStyle;
}
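A pattern that fits the question's situation is to create each distinct style once and reuse it: a style object is shared by every cell it is assigned to, so mutating one shared instance changes all of those cells, while creating a style per cell can exhaust the workbook's style limit. The sketch below shows the caching idea in JavaScript with a stand-in workbook; `createStyleCache` and `fakeWorkbook` are hypothetical, and in NPOI the factory call would be the `_workbook.CreateCellStyle()` from the question.

```javascript
// Return a lookup function that creates one style per (format, alignment) pair.
function createStyleCache(workbook) {
  const cache = new Map();
  return function getStyle(dataFormat, alignment) {
    const key = `${dataFormat}|${alignment}`;
    if (!cache.has(key)) {
      const style = workbook.createCellStyle(); // called once per combination
      style.dataFormat = dataFormat;
      style.alignment = alignment;
      cache.set(key, style);
    }
    return cache.get(key);
  };
}

// Stand-in workbook that only counts how many styles were created.
const fakeWorkbook = {
  created: 0,
  createCellStyle() {
    this.created += 1;
    return {};
  },
};

const getStyle = createStyleCache(fakeWorkbook);
for (let row = 0; row < 1000; row++) {
  getStyle('m/dd/yyyy', 'LEFT');  // date column
  getStyle('General', 'CENTER');  // text column
}
console.log(fakeWorkbook.created); // number of distinct styles actually created
```

With only a few data formats and alignments in use, the cache stays small no matter how many rows are written.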

SSIS import a Flat File to SQL with the first row as header and last row as a total

I receive a text file that I have to import into a SQL table. I need to build an SSIS package for it because I will receive the flat file every day. The first row is the Customer_ID, then come the invoice details, and the last row is the total of the invoice.
Example :
30303
0000109291700080190432737000005Name of the product
0000210291700080190432737000010Name of the product
0000309291700080190432737000000Name of the product
003 000145
So let me explain:
The first row, 30303, is the customer number.
The other rows are invoice details:
00001 -> ROWID, 092917 -> DATE, 000801904327 -> PROD, 370 -> Trans, 00010 -> AMOUNT, followed by the name of the product.
The last row:
003 ==> total rows, 000145 ==> total of the invoice.
Any clue?
I would use a Script Component as a source in a Data Flow Task. You can then use C# or VB.net to read the file, e.g., by using System.IO.StreamReader, in any way you wish. You can read a line at a time, store values in variables to write to every row (e.g., the customer number), etc. It's extremely flexible for complex files.
Here is an example script (C#) based on your data:
public override void CreateNewOutputRows()
{
System.IO.StreamReader reader = null;
try
{
bool line1Read = false;
int customerNumber = 0;
reader = new System.IO.StreamReader(Variables.FilePath); // this refers to a package variable that contains the file path
while (!reader.EndOfStream)
{
string line = reader.ReadLine();
if (!line1Read)
{
customerNumber = Convert.ToInt32(line);
line1Read = true;
}
else if (!reader.EndOfStream)
{
Output0Buffer.AddRow();
Output0Buffer.CustomerNumber = customerNumber;
Output0Buffer.RowID = Convert.ToInt32(line.Substring(0, 5));
Output0Buffer.Date = DateTime.ParseExact(line.Substring(5, 6), "MMddyy", System.Globalization.CultureInfo.CurrentCulture);
Output0Buffer.Prod = line.Substring(11, 12);
Output0Buffer.Trans = Convert.ToInt32(line.Substring(23, 3));
Output0Buffer.Amount = Convert.ToInt32(line.Substring(26, 5));
Output0Buffer.ProductName = line.Substring(31);
}
}
}
finally
{
// Always release the file handle, whether or not reading succeeded
if (reader != null)
{
reader.Dispose();
}
}
}
The columns in 'Output 0' of the Script Component are configured as follows:
Name DataType Length
==== ======== ======
CustomerNumber four-byte signed integer [DT_I4]
RowID four-byte signed integer [DT_I4]
Date database date [DT_DBDATE]
Prod string [DT_STR] 12
Trans four-byte signed integer [DT_I4]
Amount four-byte signed integer [DT_I4]
ProductName string [DT_STR] 255
To implement this:
1. Create a string variable called 'FilePath' with your file path in it for the script to reference.
2. Create a Data Flow Task.
3. Add a Script Component to the Data Flow Task. You'll be asked what type it should be; select 'Source'.
4. Right-click the Script Component and click 'Edit'.
5. On the 'Script' pane, add the 'FilePath' variable to the 'ReadOnlyVariables' section.
6. On the 'Inputs and Outputs' pane, expand 'Output 0' and add columns to the 'Output Columns' section as per the above table.
7. On the 'Script' pane, click 'Edit Script', and then paste my code over the public override void CreateNewOutputRows() method (replacing it).
Your Script Component source is now configured, and you'll be able to use it like any other data source component. To write this data to a SQL Server table, add an OLEDB Destination to the Data Flow Task, and link the Script Component to that, configuring the columns appropriately.
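To check the fixed-width offsets before wiring them into the package, the same header/detail/trailer logic can be tried on the sample data outside SSIS. The sketch below does this in JavaScript; `parseInvoiceFile` and `toIsoDate` are hypothetical names, the two-digit year is assumed to mean 20xx, and the trailer is assumed to be whitespace-separated as in the example.

```javascript
// Convert "MMddyy" to "yyyy-MM-dd" (assumption: all years are 20xx).
function toIsoDate(mmddyy) {
  const mm = mmddyy.slice(0, 2);
  const dd = mmddyy.slice(2, 4);
  const yy = mmddyy.slice(4, 6);
  return `20${yy}-${mm}-${dd}`;
}

function parseInvoiceFile(text) {
  const lines = text.split(/\r?\n/).filter((l) => l.trim() !== '');
  if (lines.length < 2) throw new Error('Expected a header and a trailer line');

  // First line: customer number. Last line: row count and invoice total.
  const customerNumber = Number(lines[0].trim());
  const [countText, totalText] = lines[lines.length - 1].trim().split(/\s+/);
  const trailer = { rowCount: Number(countText), total: Number(totalText) };

  // Everything in between is a fixed-width detail row.
  const rows = lines.slice(1, -1).map((line) => ({
    customerNumber,
    rowId: Number(line.slice(0, 5)),
    date: toIsoDate(line.slice(5, 11)),
    prod: line.slice(11, 23),
    trans: Number(line.slice(23, 26)),
    amount: Number(line.slice(26, 31)),
    productName: line.slice(31).trim(),
  }));

  if (trailer.rowCount !== rows.length) {
    throw new Error(`Trailer says ${trailer.rowCount} rows, found ${rows.length}`);
  }
  return { customerNumber, rows, trailer };
}

const sample = [
  '30303',
  '0000109291700080190432737000005Name of the product',
  '0000210291700080190432737000010Name of the product',
  '0000309291700080190432737000000Name of the product',
  '003 000145',
].join('\n');

console.log(parseInvoiceFile(sample));
```

The offsets match the `Substring` calls in the C# script; the row-count check against the trailer is an extra validation that the script above does not perform.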