Merge multiple excel to single worksheet with options - vba

I have two sheets in one Excel file. The first one is:
Sheet: Person
Code date start end
2301 12/08/1993 08:02 08:17
4221 12/08/1993 09:04 09:25
2312 12/08/1993 10:02 10:28
1284 19/09/1994 11:02 11:21
2312 19/09/1994 15:57 16:20
1284 23/06/1995 17:12 17:35
2312 22/06/1996 13:14 13:32
4221 22/06/1996 15:53 16:13
4221 05/05/1999 08:06 08:22
2418 05/05/1999 08:10 08:33
2301 05/05/1999 09:12 09:37
2301 05/05/1999 09:28 10:28
2301 05/05/1999 13:28 13:38
It is a list of a company's employees, each identified by a badge number (the Code column). What I want is to split the data by code into a custom sheet per person. For example, the person with badge number 2301 has his own sheet called B2301, and based on the first sheet, "Person", I want to import that person's data, grouped by his code, like this:
sheet B2301
date Period(min)
12/08/1987 12
.... ...
The period (in minutes) is calculated from the start and end columns.
I tried this formula, but it isn't working for me:
=IFERROR(INDEX(Sheet1!A$2:A$14,SMALL(IF(Sheet1!$A$2:$A$14=INT(RIGHT(CELL("filename",A1),LEN(CELL("filename",A1))-FIND("]",CELL("filename",A1)))),ROW(Sheet1!A$2:A$14)-ROW(Sheet1!A$2)+1),ROWS(Sheet1!A$2:A2))),"")
Any Idea?

This will require a lot of research on your part. You'll need to:
1. Create a VBA macro.
2. Define variables and create a loop to go through your main sheet.
3. Build a sheet name from the code.
4. Check whether that sheet already exists and, if not, create it.
5. Copy the values from the first sheet to the "code" sheet.
6. Once all values are processed, go through each sheet, loop through its values and calculate the periods.
This is not a trivial amount of code. Research these six items and write the code. When you have it, post it and we can give you more direction.
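The steps above are meant for a VBA macro, but the core logic (one output sheet per badge, period = end minus start) can be sketched in Python to make the target result concrete. This is a minimal sketch, not a macro: it works on an in-memory copy of the Person rows, represents each B&lt;code&gt; sheet as a dictionary entry, leaves reading and writing the workbook out entirely, and assumes start and end always fall on the same day. The names `period_minutes` and `split_by_badge` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Rows copied from the "Person" sheet: (code, date, start, end).
PERSON_ROWS = [
    (2301, "12/08/1993", "08:02", "08:17"),
    (4221, "12/08/1993", "09:04", "09:25"),
    (2312, "12/08/1993", "10:02", "10:28"),
    (1284, "19/09/1994", "11:02", "11:21"),
    (2312, "19/09/1994", "15:57", "16:20"),
    (1284, "23/06/1995", "17:12", "17:35"),
    (2312, "22/06/1996", "13:14", "13:32"),
    (4221, "22/06/1996", "15:53", "16:13"),
    (4221, "05/05/1999", "08:06", "08:22"),
    (2418, "05/05/1999", "08:10", "08:33"),
    (2301, "05/05/1999", "09:12", "09:37"),
    (2301, "05/05/1999", "09:28", "10:28"),
    (2301, "05/05/1999", "13:28", "13:38"),
]


def period_minutes(start: str, end: str) -> int:
    """Whole minutes between two HH:MM times (assumes both are on the same day)."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def split_by_badge(rows):
    """Group rows into one 'sheet' per badge, keyed like the B2301 sheet names."""
    sheets = defaultdict(list)  # a missing key is created on first use ("create the sheet")
    for code, date, start, end in rows:
        sheet_name = f"B{code}"  # build the sheet name from the code
        sheets[sheet_name].append((date, period_minutes(start, end)))
    return dict(sheets)


if __name__ == "__main__":
    for name, entries in split_by_badge(PERSON_ROWS).items():
        print(name, entries)
```

Using a `defaultdict` stands in for the "check if the sheet exists, otherwise create it" step; in VBA the equivalent check would be done against the workbook's Worksheets collection.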

To populate the dates, put this in A2:
=IFERROR(INDEX(Sheet1!$B$2:$B$14,MATCH(SMALL(IF(--MID(MID(CELL("filename",A1),FIND("]",CELL("filename",A1))+1,255),2,999) = Sheet1!$A$2:$A$14,Sheet1!$B$2:$B$14),ROW()-1),IF(--MID(MID(CELL("filename",A1),FIND("]",CELL("filename",A1))+1,255),2,999) = Sheet1!$A$2:$A$14,Sheet1!$B$2:$B$14),0)),"")
To populate the period, put this in B2:
=IFERROR(TEXT(INDEX(Sheet1!$D$2:$D$14,MATCH(SMALL(IF(--MID(MID(CELL("filename",A1),FIND("]",CELL("filename",A1))+1,255),2,999) = Sheet1!$A$2:$A$14,IF(Sheet1!$B$2:$B$14=A2,Sheet1!$C$2:$C$14)),COUNTIF($A$1:$A2,A2)),IF(--MID(MID(CELL("filename",A1),FIND("]",CELL("filename",A1))+1,255),2,999) = Sheet1!$A$2:$A$14,IF(Sheet1!$B$2:$B$14=A2,Sheet1!$C$2:$C$14)),0))-INDEX(Sheet1!$C$2:$C$14,MATCH(SMALL(IF(--MID(MID(CELL("filename",A1),FIND("]",CELL("filename",A1))+1,255),2,999) = Sheet1!$A$2:$A$14,IF(Sheet1!$B$2:$B$14=A2,Sheet1!$C$2:$C$14)),COUNTIF($A$1:$A2,A2)),IF(--MID(MID(CELL("filename",A1),FIND("]",CELL("filename",A1))+1,255),2,999) = Sheet1!$A$2:$A$14,IF(Sheet1!$B$2:$B$14=A2,Sheet1!$C$2:$C$14)),0)),"[m]"),"")
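Both are array formulas and must be confirmed with Ctrl+Shift+Enter. Then copy both down as far as needed. They assume the source data is on a sheet named Sheet1 (adjust the references if it is called Person) and that each target sheet is named B followed by the badge code, since the formulas read the code from the sheet's own name.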
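The part of both formulas that finds the badge code is the nested `--MID(MID(CELL("filename",A1),FIND("]",CELL("filename",A1))+1,255),2,999)`. A minimal Python sketch of the same string steps, assuming a hypothetical saved workbook path; `badge_from_cell_filename` is an illustrative name, not an Excel function:

```python
def badge_from_cell_filename(cell_filename: str) -> int:
    """Mirror the formula's steps for a CELL("filename") result like 'C:\\dir\\[Book.xlsx]B2301'."""
    # MID(text, FIND("]", text) + 1, 255): everything after "]" is the sheet name.
    sheet_name = cell_filename[cell_filename.index("]") + 1:]
    # MID(sheet_name, 2, 999): Excel's MID is 1-based, so this drops the first
    # character (the "B" prefix) and keeps the rest.
    code_text = sheet_name[1:]
    # The double unary "--" coerces the remaining text to a number.
    return int(code_text)


print(badge_from_cell_filename(r"C:\Users\me\[Badges.xlsx]B2301"))
```

Note that CELL("filename") returns empty text for a workbook that has never been saved; FIND then fails and IFERROR turns both formulas into blanks. The sketch raises `ValueError` in the same situation.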

Related

Import/Insert Excel Range and SSIS variables into SQL table?

I have an SSIS package that ingests a number of Excel files with similar structures but irregular names and imports them into a SQL table. Along with the data from the Excel files, I have a number of variables that are set per file (User::ExcelFileName, User::VarMonth, User::VarProgram, User::VarYear, etc.). All of the table data from the Excel files goes to the same destination table, but alongside each row of Excel data I also want to insert one column per variable, passed through into SQL. An example of my dataset is below:
Excel:
ID | Name | Foo | Bar
111 | Bob | 88yu | 117
112 | Jim | JKL | A TU
113 | George | FTD | 19900
SSIS Variables (set during execution)
User::ExcelFileName = c:\temp\excelfile1.xlsx
User::VarMonth = Jan
User::VarProgram = Daily
User::VarYear = 2023
Desired SQL Destination:
ExcelFileName | VarMonth | VarProgram | VarYear | ID | Name | Foo | Bar
c:\temp\excelfile1.xlsx | Jan | Daily | 2023 | 111 | Bob | 88yu | 117
c:\temp\excelfile1.xlsx | Jan | Daily | 2023 | 112 | Jim | JKL | A TU
c:\temp\excelfile1.xlsx | Jan | Daily | 2023 | 113 | George | FTD | 19900
I've tried a few configurations and I've referenced this post for piping in variable data into SQL, but I haven't gotten a working model yet.
Worth noting: the Excel connection is dynamic and runs inside a Foreach Loop container that iterates through my Excel sources. Any advice or guidance would be appreciated!
It sounds like you want a Derived Column transformation in your Data Flow.
In it, add one new column per variable, set each column's expression to the variable (e.g. @[User::VarMonth]), and map the new columns to the destination.
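What the Derived Column does to the data flow can be illustrated outside SSIS: the per-file variable values become constant columns added to every row coming from the Excel source. This Python sketch is only an illustration of that effect, not SSIS code; the values come from the question's example, while `add_variable_columns` is a hypothetical helper and the Bar values are kept as text because that column mixes numbers and strings.

```python
from typing import Dict, List, Sequence, Tuple

# Per-file values from the question; in the package they come from
# User::ExcelFileName, User::VarMonth, User::VarProgram and User::VarYear.
VARIABLES: Dict[str, object] = {
    "ExcelFileName": r"c:\temp\excelfile1.xlsx",
    "VarMonth": "Jan",
    "VarProgram": "Daily",
    "VarYear": 2023,
}

EXCEL_COLUMNS = ["ID", "Name", "Foo", "Bar"]
EXCEL_ROWS = [
    (111, "Bob", "88yu", "117"),
    (112, "Jim", "JKL", "A TU"),
    (113, "George", "FTD", "19900"),
]


def add_variable_columns(rows: Sequence[Sequence[object]],
                         columns: List[str],
                         variables: Dict[str, object]) -> Tuple[List[str], List[tuple]]:
    """Prepend the same variable values to every row, like a Derived Column
    whose expressions are just the package variables."""
    header = list(variables) + list(columns)
    constants = tuple(variables.values())  # dicts keep insertion order
    return header, [constants + tuple(row) for row in rows]


header, output_rows = add_variable_columns(EXCEL_ROWS, EXCEL_COLUMNS, VARIABLES)
print(header)
for row in output_rows:
    print(row)
```

Because the variables are re-evaluated on each iteration of the Foreach Loop, every file's rows would carry that file's own values.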

How to collapse multiple unique observations into one and find a mean?

Data: https://www.dropbox.com/s/c2yef22u96dd3s5/female_mentions_centrality_1.xlsx?dl=0
Data set screenshot:
I have a data set which looks like the picture above. It has multiple (unique) observations for the same Movie Name. For example, there are 3 unique observations for the movie Aan Milo Sajna and 2 for Aap Ke Saath.
I want that wherever there are multiple observations for a given Movie Name, they get collapsed into a single observation such that each variable value is the mean of the multiple observations.
For example, see below.
Transformed data set screenshot:
The Movie Names that had single observations remain untouched. But the three observations for Aan Milo Sajna and the two observations for Aap Ke Saath get collapsed into single observations, and each variable value is changed to the mean of the multiple observations, as shown in the picture.
How can I accomplish this?
df_mean = df.groupby('MOVIE NAME', as_index=False).mean(numeric_only=True)
MOVIE NAME FEMALE MENTIONS TOTAL FEMALE CENTRALITY FEMALE COUNT AVERAGE FEMALE CENTRALITY
0 1920 19.000 258.417 140.500 1.669
1 100 Days 18.600 435.320 153.000 3.427
2 13B 2.333 74.289 23.333 1.259
3 1920 London 14.500 926.183 152.500 3.118
4 1942: A Love Story 11.000 398.500 78.000 5.109
... ... ... ... ... ...
2029 Zindagi 5.000 119.667 45.667 2.506
2030 Zindagi Na Milegi Dobara 13.000 265.750 135.000 1.865
2031 Zindagi Tere Naam 2.500 57.500 21.250 3.689
2032 Zubeidaa 0.000 1260.122 101.000 14.421
2033 Zulmi 1.000 5.333 4.000 1.333
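To show what the pandas call computes for each Movie Name, here is a dependency-free Python sketch of the same collapse: group records by the key, then replace each other field with the mean of the group's values. The records below are made-up numbers for illustration only, not values from the linked file, and `collapse_by_key` is a hypothetical name. Unlike `groupby`, which sorts the keys by default, this sketch keeps the order in which each name first appears.

```python
from collections import defaultdict
from statistics import fmean

# Made-up example records (not from the dataset).
SAMPLE_RECORDS = [
    {"MOVIE NAME": "Movie A", "FEMALE MENTIONS": 3, "FEMALE COUNT": 10},
    {"MOVIE NAME": "Movie A", "FEMALE MENTIONS": 5, "FEMALE COUNT": 20},
    {"MOVIE NAME": "Movie A", "FEMALE MENTIONS": 7, "FEMALE COUNT": 30},
    {"MOVIE NAME": "Movie B", "FEMALE MENTIONS": 2, "FEMALE COUNT": 8},
    {"MOVIE NAME": "Movie B", "FEMALE MENTIONS": 4, "FEMALE COUNT": 12},
    {"MOVIE NAME": "Movie C", "FEMALE MENTIONS": 1, "FEMALE COUNT": 4},
]


def collapse_by_key(records, key):
    """Collapse records sharing `key` into one record whose other fields are
    group means; single-record groups keep their values (as floats)."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    collapsed = []
    for name, members in groups.items():
        merged = {key: name}
        for field in members[0]:
            if field != key:
                merged[field] = fmean([m[field] for m in members])
        collapsed.append(merged)
    return collapsed


for row in collapse_by_key(SAMPLE_RECORDS, "MOVIE NAME"):
    print(row)
```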

Get MAX value per year in Apache Pig

I have been trying to get the max temperature per year using the data below.
The actual data looks like this, but I am only interested in the first column (the year) and the 4th column (the temperature):
2016-11-03 12:00:00.000 +0100,Mostly Cloudy,rain,10.594444444444443,10.594444444444443,0.73,13.2664,174.0,10.1913,0.0,1019.74,Partly cloudy throughout the day.
2016-11-03 13:00:00.000 +0100,Mostly Cloudy,rain,11.072222222222223,11.072222222222223,0.72,13.1698,176.0,12.4131,0.0,1019.45,Partly cloudy throughout the day.
2016-11-03 14:00:00.000 +0100,Mostly Cloudy,rain,11.172222222222222,11.172222222222222,0.71,12.654600000000002,175.0,10.835300000000002,0.0,1019.16,Partly cloudy throughout the day.
2016-11-03 15:00:00.000 +0100,Mostly Cloudy,rain,10.911111111111111,10.911111111111111,0.72,11.753,170.0,10.867500000000001,0.0,1018.94,Partly cloudy throughout the day.
2016-11-03 16:00:00.000 +0100,Mostly Cloudy,rain,10.350000000000001,10.350000000000001,0.72,10.6582,161.0,11.592,0.0,1018.81,Partly cloudy throughout the day.
DUMP B looks like this:
(2014,12.038889)
(2014,21.055555)
(2016,29.905556)
(2016,30.605556)
(2016,29.95)
(2016,29.972221)
The code I have written is below, but it throws an error at D. I have also tried the ToDate function, but that doesn't seem to work either.
A = load 'file.csv' using PigStorage(',')......
B = foreach A GENERATE SUBSTRING(year,0,4) as year1, Atemp;
C = group B by year1;
D = foreach C GENERATE group,MAX(Atemp);
The error I get:
Invalid field projection. Projected field [year1] does not exist in schema: group:chararray,B:bag{:tuple(year1:chararray,Atemp:float)}.
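I figured it out myself right after posting the question on Stack Overflow :) I wonder why!
Instead of D = foreach C GENERATE group,MAX(Atemp);
I used D = foreach C GENERATE group, MAX(B.Atemp) as max;
and it works! After GROUP, each record of C has the schema shown in the error, (group, B), where B is a bag holding the original tuples, so Atemp can only be reached through the bag as B.Atemp.
If anyone wants me to delete the post, I'm happy to do so; kindly let me know.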
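The schema in the error message explains the fix: grouping turns each year into one record holding a bag of whole B tuples, so MAX has to be applied to the Atemp field inside that bag. A small Python sketch of the same three steps, as an analogy rather than Pig code; the sample lines are the question's rows trimmed to their first four columns, and `project` and `max_per_year` are hypothetical names.

```python
from collections import defaultdict

# The question's sample rows, trimmed to the first four CSV columns.
SAMPLE_LINES = [
    "2016-11-03 12:00:00.000 +0100,Mostly Cloudy,rain,10.594444444444443",
    "2016-11-03 13:00:00.000 +0100,Mostly Cloudy,rain,11.072222222222223",
    "2016-11-03 14:00:00.000 +0100,Mostly Cloudy,rain,11.172222222222222",
    "2016-11-03 15:00:00.000 +0100,Mostly Cloudy,rain,10.911111111111111",
    "2016-11-03 16:00:00.000 +0100,Mostly Cloudy,rain,10.350000000000001",
]


def project(lines):
    """Like B = FOREACH A GENERATE SUBSTRING(year,0,4) AS year1, Atemp;"""
    pairs = []
    for line in lines:
        fields = line.split(",")
        # SUBSTRING(year, 0, 4) is 0-based with an exclusive end, i.e. [:4].
        pairs.append((fields[0][:4], float(fields[3])))
    return pairs


def max_per_year(pairs):
    """Like C = GROUP B BY year1; D = FOREACH C GENERATE group, MAX(B.Atemp);"""
    # After grouping, each key maps to a bag of whole B tuples ...
    bags = defaultdict(list)
    for year1, atemp in pairs:
        bags[year1].append((year1, atemp))
    # ... so MAX has to reach into the bag for the Atemp field (B.Atemp).
    return {group: max(atemp for _, atemp in bag) for group, bag in bags.items()}


print(max_per_year(project(SAMPLE_LINES)))
```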

How do I retrieve Google Finance Historical data in Google Sheets using the API?

To reproduce this problem, create a new sheet.
In cell A1, insert this formula:
=GoogleFinance("AMZN", "all", "1/1/2018", "2/1/2018")
The output (formatting doesn't matter for this problem):
Date Open High Low Close Volume
1/2/2018 16:00:00 1172 1190 1170.51 1189.01 2694494
1/3/2018 16:00:00 1188.3 1205.49 1188.3 1204.2 3108793
1/4/2018 16:00:00 1205 1215.8699 1204.66 1209.59 3022089
1/5/2018 16:00:00 1217.51 1229.14 1210 1229.14 3544743
1/8/2018 16:00:00 1236 1253.079 1232.03 1246.87 4279475
1/9/2018 16:00:00 1256.9 1259.33 1241.76 1252.7 3661316
1/10/2018 16:00:00 1245.15 1254.33 1237.23 1254.33 2686017
1/11/2018 16:00:00 1259.74 1276.77 1256.46 1276.68 3125048
1/12/2018 16:00:00 1273.3925 1305.76 1273.3925 1305.2 5443730
1/16/2018 16:00:00 1323 1339.94 1292.3 1304.86 7220701
1/17/2018 16:00:00 1312.24 1314 1280.88 1295 5253754
1/18/2018 16:00:00 1293.95 1304.6 1284.02 1293.32 4026915
1/19/2018 16:00:00 1312 1313 1292.99 1294.58 4578536
1/22/2018 16:00:00 1297.17 1327.45 1296.6636 1327.31 4140061
1/23/2018 16:00:00 1338.09 1364.9 1337.34 1362.54 5169306
1/24/2018 16:00:00 1374.82 1388.16 1338 1357.51 6807457
1/25/2018 16:00:00 1368 1378.34 1357.62 1377.95 4753012
1/26/2018 16:00:00 1392.01 1402.53 1380.91 1402.05 4857310
1/29/2018 16:00:00 1409.18 1431.39 1400.44 1417.68 5701898
1/30/2018 16:00:00 1403.17 1439.25 1392 1437.82 5871942
1/31/2018 16:00:00 1451.3 1472.58 1450.04 1450.89 6424693
All is well so far. BUT when I try to retrieve this data via the API (and I want to point out that all OTHER data is retrieved just fine), I simply get "#N/A" in cell A1 and no values in any of the other cells, as if they were empty. I've tried retrieving just the values and also the full grid data; in both cases it's as if these values were never there.
Things I've tried:
Setting a cell to "=TODAY()" before reading (some comments say that triggers a refresh of the data).
Looping with a delay trying to read the data, but that runs forever.
All this works flawlessly in the browser, but not over API calls
Other cells referencing this data fail with "#N/A".
If I add the add-on CryptoFinance, and insert this function "=CRYPTOFINANCE("COINMARKETCAP")", a range of cells are populated, similar to GoogleFinance. However, these cells CAN be retrieved via API calls. Works like a charm.
Any single-cell GoogleFinance formula works fine: if I add the formula below in cell I1, THAT value is retrieved just fine:
=GoogleFinance("AMZN", "price")
It simply seems that the data is not there when HISTORICAL data from Google Finance is read through the API.
I have not posted my code, since it seems this is more a case of googlefinance not working than anything else, and specifically when returning a 'range' of data instead of single cell values.
Thoughts?
Duh.
Tucked at the very bottom of the Google Finance Documentation I finally noticed this little gem:
Historical data cannot be downloaded or accessed via the Sheets API or Apps Script. If you attempt to do so, you will see a #N/A error in place of the values in the corresponding cells of your spreadsheet.

Need help creating rules for a .clr to export some data to an Excel file

Basically, this data is in a report that I can view using 'Unicenter Output Management Document Viewer'. Once I see the results, I am able to export them using the following parameters:
Export Type = Worksheet
Worksheet Format = .XLS
I am also able to load a Column Rule Name file that includes the rules I would like applied, e.g. 'COLRULE BLANKLINE DISCARD'.
Below is an example of how the data is presented. I have changed some values for privacy reasons, but I have made no changes to format or spacing.
XXX - DI tape DATE: 25SEP2017 Page: 0001
Header: File ID: CN15 Processing Date: 20170923
Detail
SL Agt Reason Cd Policy No. Prem Amt. Comm Rate Comm Amt. Agt Shr
Comm Yr Ins Name Paid To Date Cov Code GW Agt SL Agt Name Annual Prem Amt
001234 CSERF 0012345678 3.92 5.0000 0.20- 100.000
17 EVESO 20171028 00220 652348 ABCDE 47.04
001234 CSERF 0012345678 70.30 5.0000 3.52- 100.000
17 EVESO 20171028 30086 652348 ABCDE 843.60
001234 CSERF 0012345678 14.83 5.0000 0.74- 100.000
17 EVESO 20171028 30015 652348 ABCDE 177.96
001234 CSERF 0012345678 26.28 5.0000 1.31- 100.000
17 EVESO 20171028 30086 652348 ABCDE 315.36
Since each record seems to wrap automatically onto a second line, I have no idea what kind of rule can put the headers on row 1 and then read the data properly.
If anyone has an idea please help.
Thank you in advance.
Cheers
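The .clr rule syntax itself is not covered here. As a different technique, if the detail lines can be obtained as plain text (for example copied out of the report), each two-line record could be joined into one row in a post-processing step and written as CSV, which Excel opens directly. This is a minimal Python sketch under stated assumptions: the 14 column names are read off the two header lines in the order shown, no field contains internal spaces (true for the sample above but not guaranteed for real data, e.g. names), and `join_record_pairs` and `to_csv` are hypothetical helpers.

```python
import csv
import io

# Column names as read from the two header lines of the report (assumed split points).
HEADER_LINE_1 = ["SL Agt", "Reason Cd", "Policy No.", "Prem Amt.", "Comm Rate", "Comm Amt.", "Agt Shr"]
HEADER_LINE_2 = ["Comm Yr", "Ins Name", "Paid To Date", "Cov Code", "GW Agt", "SL Agt Name", "Annual Prem Amt"]

# Detail lines from the sample in the question.
DETAIL_LINES = [
    "001234 CSERF 0012345678 3.92 5.0000 0.20- 100.000",
    "17 EVESO 20171028 00220 652348 ABCDE 47.04",
    "001234 CSERF 0012345678 70.30 5.0000 3.52- 100.000",
    "17 EVESO 20171028 30086 652348 ABCDE 843.60",
    "001234 CSERF 0012345678 14.83 5.0000 0.74- 100.000",
    "17 EVESO 20171028 30015 652348 ABCDE 177.96",
    "001234 CSERF 0012345678 26.28 5.0000 1.31- 100.000",
    "17 EVESO 20171028 30086 652348 ABCDE 315.36",
]


def join_record_pairs(lines):
    """Join each two-line record into one row (assumes no field contains spaces)."""
    if len(lines) % 2:
        raise ValueError("expected an even number of detail lines")
    # Pair line 0 with 1, 2 with 3, and so on.
    return [first.split() + second.split() for first, second in zip(lines[0::2], lines[1::2])]


def to_csv(rows):
    """Write the combined header on row 1 followed by one row per record."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(HEADER_LINE_1 + HEADER_LINE_2)
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(join_record_pairs(DETAIL_LINES)))
```

Fields such as 0012345678 are kept as text here; Excel may still strip leading zeros when it opens the CSV unless the column is imported as text.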