Case:
TestCompany Corporation sends us expenditure data in a CSV file every month. The data is a date-expenditure value pair for each serviceId. A monthly file may also contain corrections to data sent in previous months; the value in the more recent file is the authoritative one. We have to design an ingestion process, with a detailed data model and data flow diagram, showing how to store the dates and expenditures for each serviceId while keeping the records traceable. Any subsequent file may contain updates.
2) Input: CSV file with the following structure:
Sr No. Header
Column 1 serviceId
Column 2 month
Column 3 d1
Column 4 d2
Column 5 d3
Column 6 d4
Column 7 d5
Column 8 d6
Column 9 d7
Column 10 d8
Column 11 d9
Column 12 d10
Column 13 d11
Column 14 d12
Column 15 d13
Column 16 d14
Column 17 d15
Column 18 d16
Column 19 d17
Column 20 d18
Column 21 d19
Column 22 d20
Column 23 d21
Column 24 d22
Column 25 d23
Column 26 d24
Column 27 d25
Column 28 d26
Column 29 d27
Column 30 d28
Column 31 d29
Column 32 d30
Column 33 d31
Note:
a. The date corresponding to the first non-null value has to be considered the starting date.
b. The date corresponding to the last non-null value has to be considered the closing date.
c. NULL in the CSV between the starting and closing dates must be considered 0.00, for calculation purposes only.
Sample Input:
serviceId,month,d1,d2,d3,d4,d5,d6,d7,d8,d9,d10,d11,d12,d13,d14,d15,d16,d17,d18,d19,d20,d21,d22,d23,d24,d25,d26,d27,d28,d29,d30,d31
FEUSA0002V,200107,,,,,,,,,,,,,,,,,26.2866666667,,,,,,,25.5166666667,25.3333333333,25.7,25.8333333333,,,25.8333333333,26.1666666667
The month column holds the year and month, e.g. 201707 represents 2017-07. Each day value is identified by its column name (d1 of 201707 is 2017-07-01, d2 of 201707 is 2017-07-02, and so on).
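To make the mapping concrete, here is a minimal Python sketch of how a (month, d#) pair could be turned into a calendar date. The function name day_column_to_date is only illustrative, and returning None for days that do not exist (such as d31 in a 30-day month) is an assumption, not something the case description specifies.

```python
from datetime import date
from typing import Optional

def day_column_to_date(month: str, column: str) -> Optional[date]:
    """Map e.g. ("201707", "d2") to date(2017, 7, 2); None if that day does not exist."""
    year, mon = int(month[:4]), int(month[4:6])
    day = int(column[1:])  # strip the leading "d"
    try:
        return date(year, mon, day)
    except ValueError:
        # e.g. d31 of a 30-day month: there is no such calendar date
        return None

print(day_column_to_date("201707", "d1"))   # 2017-07-01
print(day_column_to_date("201706", "d31"))  # None
```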
4) Problem Statement:
a. For each serviceId, find out the dates for which the value data is missing and prepare a ‘|’ separated list of the dates so that we can revert to fetch data from TestCompany Corporation.
b. Store the transformed data as given in 5.b. Total is sum of all the available values.
5) Desired Output CSV:
a.
serviceId,missing_dates
FEUSA0002V, 2001-07-18|2001-07-19|2001-07-20|2001-07-21|2001-07-22|2001-07-23
b.
serviceId,StartDate,EndDate,Total
FEUSA0002V, 2017-07-01,2017-10-31,369.1458
Sample of INPUT FEED (Multiple rows with same serviceid)
ServiceId,month,d1,d2,d3,d4,d5,d6,d7,d8,d9,d10,d11,d12,d13,d14,d15,d16,d17,d18,d19,d20,d21,d22,d23,d24,d25,d26,d27,d28,d29,d30,d31
F0CAN062AH,201706,,31.55,,,31.48,31.39,31.42,31.42,31.46,,,31.29,,31.12,31.13,,,,31.33,,31.31,,31.6,,,31.65,31.46,31.64,31.34,,
F0CAN062AH,201707,,,,31.31,,,,,,31.09,,,31.43,,,,31.23,,,31.39,,,,,31.29,31.1,31.0,30.88,,,30.87
FEUSA04ABQ,200304,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,26.98,
F0CAN05N3F,201612,,,,,,,,,,,,,,,,,,,,24.78,24.77,24.8,24.82,,,,,,,,
F0CAN05N3F,201701,,,24.75,,24.96,24.93,,,,24.9,24.96,,24.91,,,24.94,,,24.93,25.12,,,25.0,25.1,,,,,,,25.23
F0CAN05N3F,201702,25.29,25.22,25.27,,,25.29,25.35,,,25.8,,,25.87,,26.02,,,,,,,26.3,,25.93,,,25.77,,,,
For question a:
One way of doing it is to normalize the CSV file, so that the resulting flow has 4 columns: ServiceId, month, day and value. The parameters would be pretty annoying to write if you weren't helped by the "Get fields" button.
You then have to compute the date from the month and d# fields. I would do it in a JavaScript step, which at the same time lets you put the date in ISO format. The JavaScript is:
var date = new Date(month.substr(0,4), month.substr(4,2)-1, day.substr(1));
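Then keep only the ServiceId and dates whose value is null, since those are the missing ones. Per note c of the question, only the nulls between the first and the last non-null date count as missing.
If you need the dates sorted, sort them by ServiceId, date.
Group the flow by ServiceId, and Concatenate strings separated by "|", with subject = date, and name the result "missing_dates".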
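Outside of the ETL tool, the same steps for question a can be sketched in plain Python. This is only an illustration under stated assumptions: the names normalize and missing_dates are made up, the sample row is hypothetical, and a day is treated as missing when it lies strictly between the first and last non-null date of a serviceId and has no value (including days of a month whose row is absent).

```python
import csv
import io
from collections import defaultdict
from datetime import date, timedelta

def normalize(csv_text):
    """Yield (serviceId, date, value-or-None) for every real calendar day in the feed."""
    for row in csv.reader(io.StringIO(csv_text)):
        if not row or row[0].lower() == "serviceid":
            continue  # skip blank lines and the header
        service_id, month = row[0].strip(), row[1].strip()
        for day, cell in enumerate(row[2:33], start=1):
            try:
                d = date(int(month[:4]), int(month[4:6]), day)
            except ValueError:
                continue  # e.g. d31 of a 30-day month
            yield service_id, d, (float(cell) if cell.strip() else None)

def missing_dates(csv_text):
    """Return {serviceId: 'YYYY-MM-DD|YYYY-MM-DD|...'} for gaps between first and last value."""
    by_service = defaultdict(dict)
    for service_id, d, value in normalize(csv_text):
        by_service[service_id][d] = value
    result = {}
    for service_id, values in by_service.items():
        present = sorted(d for d, v in values.items() if v is not None)
        if not present:
            continue  # no data at all, so no start or end date to work with
        gaps, d = [], present[0]
        while d <= present[-1]:
            if values.get(d) is None:
                gaps.append(d.isoformat())
            d += timedelta(days=1)
        result[service_id] = "|".join(gaps)
    return result

# Hypothetical row: values on d2 and d5 only, everything else empty.
header = "serviceId,month," + ",".join(f"d{i}" for i in range(1, 32))
row = "SVC1,201707,,10,,,12.5" + "," * 26
print(missing_dates(header + "\n" + row))  # {'SVC1': '2017-07-03|2017-07-04'}
```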
For question b.
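Do the same normalization and date computation, keeping only the rows with a non-null value, except that the last step is a Group by ServiceId, where
StartDate is the first non-null value of date
EndDate is the last non-null value of date
Total is the sum of value.
Note that, in this case, the filter on non-null values is optional for the Total, since nulls add nothing to the sum.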
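The Group by step for question b can be sketched the same way in Python. The sketch assumes the flow has already been normalized into (serviceId, date, value) rows; the function name summarize and the sample rows are hypothetical.

```python
from datetime import date

def summarize(rows):
    """rows: iterable of (service_id, date, value or None).

    Returns {service_id: (start_date, end_date, total)} over the non-null values.
    """
    summary = {}
    for service_id, d, value in rows:
        if value is None:
            continue  # nulls define neither the date range nor the total
        if service_id not in summary:
            summary[service_id] = (d, d, value)
        else:
            start, end, total = summary[service_id]
            summary[service_id] = (min(start, d), max(end, d), total + value)
    return summary

rows = [
    ("SVC1", date(2017, 7, 1), None),
    ("SVC1", date(2017, 7, 2), 10.0),
    ("SVC1", date(2017, 7, 3), None),
    ("SVC1", date(2017, 7, 5), 12.5),
]
for service_id, (start, end, total) in summarize(rows).items():
    print(f"{service_id},{start.isoformat()},{end.isoformat()},{total:.4f}")
# SVC1,2017-07-02,2017-07-05,22.5000
```

Skipping the null rows before aggregating is what makes StartDate and EndDate land on the first and last dates that actually carry a value.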
I need to automatically set a value of 8.45 hours work for the winter schedule (01 October till 15 June) and 6 hours work for a summer schedule (16 June till 30 September) for a time sheet done in Microsoft Excel.
The equation I am trying is the following:
=IF(AND(DATE($G$1,6, DAY(15))>=(DATE($G$1-1, 10, DAY(1))));(DATE($G$1,6, DAY(15))<A8;8.45;"")
But this keeps on returning formula errors and this still omits the rate value for the summer schedule.
$G$1 is the year that is manually inputted for the yearly time sheet.
A8 is the current date.
Any guidance on this equation would be appreciated.
With best regards Fab
Edit
Thanks DirkReichel, Scott Craner, Alex Bell, Michael Uray for your great intervention.
I tried all the suggestions, but some returned a #VALUE! error and some did not return the winter schedule from 1 October onwards.
This is the correct equation:
=IF(AND(DATE($G$1,9,30)>=A8,DATE($G$1,6,15)<=A8),6,8.45)
The equation takes the current date in A8 and checks whether it falls within the summer period (date range). If it does, the formula returns 6; if the current date falls outside the summer period, it returns 8.45.
Thanks to all that guided.
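For anyone who wants to cross-check the sheet outside Excel, the schedule from the question can be sketched in Python. The function name daily_hours is made up for illustration. The sketch follows the schedule as stated in the question (summer from 16 June to 30 September inclusive), so unlike the formula above, which uses DATE($G$1,6,15)<=A8, it does not count 15 June as summer.

```python
from datetime import date

def daily_hours(day: date) -> float:
    """Return 6 hours from 16 June to 30 September (inclusive), 8.45 hours otherwise."""
    summer_start = date(day.year, 6, 16)
    summer_end = date(day.year, 9, 30)
    return 6.0 if summer_start <= day <= summer_end else 8.45

print(daily_hours(date(2016, 6, 15)))  # 8.45
print(daily_hours(date(2016, 6, 16)))  # 6.0
print(daily_hours(date(2016, 10, 1)))  # 8.45
```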
Depending on your regional settings, you may need to use a comma "," instead of a semicolon ";", as shown in the following example:
=IF(AND(NOW()>DATEVALUE("6/15/2016"), NOW()<DATEVALUE("9/1/2016")),6,8.45)
Hope this may help.
Try this:
=IF(AND(DATE($G$1,6,15)>=A8,DATE($G$1-1,10, 1)<=A8),8.45,6)
The following solution should work for you:
A1 is the date which gets checked, the formula is placed in B1.
This way you can fill down a list of dates and formulas in your sheet.
=IF(AND(A1>DATE(YEAR(A1),6,15),A1<DATE(YEAR(A1),8,1)),6,8.45)
It checks whether the date is in the range YYYY-06-15 to YYYY-08-01 and then sets the output to 6, or to 8.45 if it is not in this range.
I tested it with a German Excel version using the following formula, and then translated it manually in Notepad into the English formula version.
=WENN(UND(A1>DATUM(JAHR(A1);6;15);A1<DATUM(JAHR(A1);8;1));6;8,45)
I hope my formula translation will work for you in the English Excel version.