How would you split this given NSString into an NSDictionary? - objective-c

I have some data I acquire from a Linux box and want to put it into an NSDictionary for later processing.
How would you get this NSString into an NSDictionary like the following?
data
(
    bytes
    (
        60 ( 1370515694 )
        48 ( 812 )
        49 ( 300 )
        ...
    )
    pkt
    (
        60 ( 380698 )
        59 ( 8 )
        58 ( 412 )
        ...
    )
    block
    (
        60 ( 5 )
        48 ( 4 )
        49 ( 7 )
        ...
    )
    drop
    (
        60 ( 706 )
        48 ( 2 )
        49 ( 4 )
        ...
    )
    session
    (
        60 ( 3 )
        48 ( 1 )
        49 ( 2 )
        ...
    )
)
The data string looks like:
//time bytes pkt block drop session
60 1370515694 380698 5 706 3
48 812 8 4 2 1
49 300 412 7 4 2
50 0 0 0 0 0
51 87 2 0 0 0
52 87 2 0 0 0
53 0 0 0 0 0
54 0 0 0 0 0
55 0 0 0 0 0
56 0 0 0 0 0
57 812 8 0 0 0
58 812 8 0 0 0
59 0 0 0 0 0
0 0 0 0 0 0
1 2239 12 2 0 0
2 0 0 0 0 0
3 0 0 0 0 0
4 0 0 0 0 0
5 0 0 0 0 0
6 0 0 0 0 0
7 2882 19 2 0 0
8 4906 29 4 0 0
9 1844 15 11 0 0
10 4210 29 17 0 0
11 3370 18 4 0 0
12 3370 18 4 0 0
13 1184 7 3 0 0
14 0 0 0 0 0
15 4046 19 3 0 0
16 4956 23 3 0 0
17 2960 18 2 0 0
18 2960 18 2 0 0
19 1088 6 2 0 0
20 0 0 0 0 0
21 3261 17 3 0 0
22 3261 17 3 0 0
23 1228 6 2 0 0
24 1228 6 2 0 0
25 2628 17 2 0 0
26 4688 26 3 0 0
27 1752 13 5 0 0
28 3062 21 5 0 0
29 174 2 2 0 0
30 96 1 1 0 0
31 4351 23 5 0 0
32 0 0 0 0 0
33 4930 23 7 0 0
34 6750 31 7 0 0
35 1241 6 2 0 0
36 1241 6 2 0 0
37 3571 29 2 0 0
38 0 0 0 0 0
39 1010 5 1 0 0
40 1010 5 1 0 0
41 88859 72 3 0 1
42 90783 81 4 0 1
43 2914 19 3 0 0
44 0 0 0 0 0
45 2157 17 1 0 0
46 2157 17 1 0 0
47 78 1 1 0 0
Time (first column) should be the key for the sub-sub-dictionaries.
The idea is that I can later randomly access the PKT value at a given TIME x, the BLOCK amount at TIME y, the SESSION value at TIME z, and so on.
Thanks in advance

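You probably don't want a dictionary but an array containing one dictionary per data entry. The simplest way to parse something like this in Objective-C is with NSString's componentsSeparatedByString: method. Note that the header comment line, blank lines and runs of spaces must be skipped, or objectAtIndex: will go out of bounds: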
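NSString *dataString = /* <Your Data String> */; // Assumes the entries are separated by newlines
NSArray *lines = [dataString componentsSeparatedByString:@"\n"];
NSMutableArray *dataDictionaries = [NSMutableArray array];
NSPredicate *nonEmpty = [NSPredicate predicateWithFormat:@"length > 0"];
for (NSString *line in lines) {
    // Skip the "//time bytes pkt ..." header comment
    if ([line hasPrefix:@"//"]) continue;
    // Split on whitespace and drop the empty tokens produced by repeated spaces
    NSArray *elements = [[line componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceCharacterSet]]
                         filteredArrayUsingPredicate:nonEmpty];
    // Ignore blank or malformed lines so the indexing below cannot go out of bounds
    if (elements.count < 6) continue;
    NSDictionary *entry = @{
        @"time":    elements[0],
        @"bytes":   elements[1],
        @"pkt":     elements[2],
        @"block":   elements[3],
        @"drop":    elements[4],
        @"session": elements[5]
    };
    [dataDictionaries addObject:entry];
}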
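If you do want the nested layout from the question (metric, then time, then value) so that lookups by time are direct, the idea can be sketched in Python for illustration. This is not the answer's Objective-C code; the function name parse_stats and the short sample string are hypothetical, and it assumes whitespace-separated lines with exactly six integer columns.

```python
def parse_stats(text):
    """Build {metric: {time: value}} from 'time bytes pkt block drop session' lines."""
    columns = ["bytes", "pkt", "block", "drop", "session"]
    result = {name: {} for name in columns}
    for line in text.splitlines():
        fields = line.split()
        # Skip the //header comment, blank lines and anything malformed
        if len(fields) != 6 or line.lstrip().startswith("//"):
            continue
        t, *values = (int(f) for f in fields)
        for name, value in zip(columns, values):
            result[name][t] = value
    return result


sample = """//time bytes pkt block drop session
60 1370515694 380698 5 706 3
48 812 8 4 2 1
49 300 412 7 4 2"""

data = parse_stats(sample)
print(data["pkt"][60])      # 380698
print(data["block"][48])    # 4
print(data["session"][49])  # 2
```

In Objective-C the same shape would be an NSDictionary of NSMutableDictionary objects keyed by the time value.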

Related

How to sum values in DataFrames based on index match

I have about 16 DataFrames representing weekly users' clickstream data. Samples of the weekly tables are shown below. I want to build new DataFrames cumulatively: for example, for week 2, w2 = w0 + w1 + w2, and for week 3, w3 = w0 + w1 + w2 + w3. As you can see, the datasets do not have identical id_user values, because a user who is inactive in a certain week has no row for that week. All DataFrames have the same columns, but their indexes are not exactly the same. How can I add them so that rows with matching indexes are summed?
id_user c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11
43284 1 8 0 8 5 0 0 0 2 3 1
45664 0 16 0 4 0 0 0 0 5 16 2
52014 0 0 0 5 4 0 0 0 0 2 2
53488 1 37 0 19 0 0 3 0 3 23 6
60135 0 124 0 87 3 0 24 0 8 19 14
id_user c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11
40419 0 8 0 3 4 0 6 0 1 6 0
43284 1 4 0 14 26 2 0 0 2 4 2
45664 0 9 0 15 11 0 0 0 1 6 14
52014 0 0 0 8 9 0 8 0 2 2 1
53488 0 2 0 4 0 0 4 0 0 0 0
Use concat, then groupby and sum:
out = pd.concat([df1,df2]).groupby('id_user',as_index=False).sum()
Out[147]:
id_user c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11
0 40419 0 8 0 3 4 0 6 0 1 6 0
1 43284 2 12 0 22 31 2 0 0 4 7 3
2 45664 0 25 0 19 11 0 0 0 6 22 16
3 52014 0 0 0 13 13 0 8 0 2 4 3
4 53488 1 39 0 23 0 0 7 0 3 23 6
5 60135 0 124 0 87 3 0 24 0 8 19 14
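The answer sums two frames; the running totals the question asks for (w0, w0+w1, w0+w1+w2, ...) can be sketched by repeating the same concat/groupby step in a loop. The helper name cumulative_weeks and the tiny frames w0, w1, w2 are made up for illustration, and each weekly frame is assumed to have an id_user column plus numeric count columns.

```python
import pandas as pd


def cumulative_weeks(weekly_frames):
    """Return [w0, w0+w1, w0+w1+w2, ...] with rows summed per id_user."""
    totals = []
    running = None
    for frame in weekly_frames:
        # Stack the running total on top of this week, then sum rows that share an id_user
        stacked = frame if running is None else pd.concat([running, frame])
        running = stacked.groupby("id_user", as_index=False).sum()
        totals.append(running)
    return totals


w0 = pd.DataFrame({"id_user": [43284, 45664], "c1": [1, 0], "c2": [8, 16]})
w1 = pd.DataFrame({"id_user": [40419, 43284], "c1": [0, 1], "c2": [8, 4]})
w2 = pd.DataFrame({"id_user": [43284], "c1": [3], "c2": [1]})

cum = cumulative_weeks([w0, w1, w2])
print(cum[2])
```

Going through concat and groupby (rather than DataFrame.add) keeps integer columns as integers, since no NaN is introduced for users missing from a week.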

How to turn a list of events into a matrix to display in pandas

I have a list of events and I want to display on a graph how many happen per hour on each day of the week, as shown below:
Example of the graph I want
(each line is a day, the x axis is the time of day, the y axis is the number of events)
As I am new to pandas, I am not sure what the best way to do it is, but here is my approach:
x = [(rts[k].getDay(), rts[k].getHour(), 1) for k in rts]
df = pd.DataFrame(x[:30]) # Subset of 30 events
dfGrouped = df.groupby([0, 1]).sum() # Group them by day and hour
#Format to display
pd.DataFrame(np.random.randn(24, 7), index=range(0,24), columns=['Mo', 'Tu', 'We', 'Th', 'Fr', 'Sa', 'Su'])
My question is: how can I go from my DataFrame with the grouped data to the 24x7 matrix required for display?
I tried as_matrix, but that gives me only a one-dimensional array, while I want the index of my DataFrame to become the index of my matrix.
print(df)
         2
0 1       
0 19     1
  23     1
1 10     2
  18     3
  22     1
2 17     1
3 8      2
  9      3
  11     3
  13     1
  19     1
4 7      1
  9      1
  14     1
  15     1
  18     1
5 1      2
  7      1
  13     1
  19     1
6 12     1
Thanks for your help :)
Antoine
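I think you need unstack to reshape the data, then rename the column names with a dict and, if necessary, add the missing hours to the index with reindex (older versions of this answer used reindex_axis, which was removed in pandas 1.0):
# With the default integer column names 0, 1, 2:
df1 = df.groupby([0, 1])[2].sum().unstack(0, fill_value=0)
# Or name the columns first, which makes the result easier to read:
df = pd.DataFrame(x[:30], columns=['days', 'hours', 'val'])
d = {0: 'Mo', 1: 'Tu', 2: 'We', 3: 'Th', 4: 'Fr', 5: 'Sa', 6: 'Su'}
df1 = df.groupby(['days', 'hours'])['val'].sum().unstack(0, fill_value=0)
# Rename day numbers to names and add any missing hours as all-zero rows
df1 = df1.rename(columns=d).reindex(range(24), fill_value=0)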
print (df1)
days Mo Tu We Th Fr Sa Su
hours
0 0 0 0 0 0 0 0
1 0 0 0 0 0 2 0
2 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0
7 0 0 0 0 1 1 0
8 0 0 0 2 0 0 0
9 0 0 0 3 1 0 0
10 0 2 0 0 0 0 0
11 0 0 0 3 0 0 0
12 0 0 0 0 0 0 1
13 0 0 0 1 0 1 0
14 0 0 0 0 1 0 0
15 0 0 0 0 1 0 0
16 0 0 0 0 0 0 0
17 0 0 1 0 0 0 0
18 0 3 0 0 1 0 0
19 1 0 0 1 0 1 0
20 0 0 0 0 0 0 0
21 0 0 0 0 0 0 0
22 0 1 0 0 0 0 0
23 1 0 0 0 0 0 0
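A self-contained version of the same reshape is sketched below, with a few hypothetical (day, hour) events standing in for the question's rts objects. It counts rows with size() instead of summing a constant 1 column, and also reindexes the columns so that days with no events still appear, giving a full 24x7 grid.

```python
import pandas as pd

# Hypothetical sample: one (day, hour) pair per event, day 0 = Monday
events = [(0, 19), (0, 23), (1, 10), (1, 10), (3, 8)]
df = pd.DataFrame(events, columns=["days", "hours"])

day_names = {0: "Mo", 1: "Tu", 2: "We", 3: "Th", 4: "Fr", 5: "Sa", 6: "Su"}
matrix = (
    df.groupby(["days", "hours"]).size()  # number of events per (day, hour)
    .unstack(0, fill_value=0)             # days become columns, hours stay as rows
    .reindex(index=range(24), columns=range(7), fill_value=0)  # full 24x7 grid
    .rename(columns=day_names)
)
print(matrix.shape)  # (24, 7)
```

With matplotlib installed, matrix.plot() then draws one line per day, with hours on the x axis.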

How to populate columns depending on a found value?

I have a pandas DataFrame with customer IDs and columns for months (1, 2, 3, ...).
I have a column with the number of months since the last purchase.
I am using the following to populate the relevant month columns:
dt.loc[dt.month == 1, '1'] = 1
dt.loc[dt.month == 2, '2'] = 1
dt.loc[dt.month == 3, '3'] = 1
etc.
How can I populate the columns in a better way, without writing 12 statements?
pd.get_dummies
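pd.get_dummies(dt.month, dtype=int)
(dtype=int keeps 0/1 values; since pandas 2.0 the default is True/False.)
Consider the DataFrame dt:
import numpy as np
import pandas as pd
dt = pd.DataFrame(dict(
a=range(10),
month=np.random.randint(1, 13, (10))
))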
a month
0 0 8
1 1 3
2 2 8
3 3 11
4 4 3
5 5 4
6 6 1
7 7 5
8 8 3
9 9 11
Add the columns like this:
dt.join(pd.get_dummies(dt.month, dtype=int))
a month 1 3 4 5 8 11
0 0 8 0 0 0 0 1 0
1 1 3 0 1 0 0 0 0
2 2 8 0 0 0 0 1 0
3 3 11 0 0 0 0 0 1
4 4 3 0 1 0 0 0 0
5 5 4 0 0 1 0 0 0
6 6 1 1 0 0 0 0 0
7 7 5 0 0 0 1 0 0
8 8 3 0 1 0 0 0 0
9 9 11 0 0 0 0 0 1
If you want the column names to be strings:
dt.join(pd.get_dummies(dt.month, dtype=int).rename(columns='month {}'.format))
a month month 1 month 3 month 4 month 5 month 8 month 11
0 0 8 0 0 0 0 1 0
1 1 3 0 1 0 0 0 0
2 2 8 0 0 0 0 1 0
3 3 11 0 0 0 0 0 1
4 4 3 0 1 0 0 0 0
5 5 4 0 0 1 0 0 0
6 6 1 1 0 0 0 0 0
7 7 5 0 0 0 1 0 0
8 8 3 0 1 0 0 0 0
9 9 11 0 0 0 0 0 1
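get_dummies only creates columns for months that actually occur, while the question's loop fills a fixed set of columns named '1', '2', '3', and so on. A sketch that guarantees all twelve columns with string names, assuming months run from 1 to 12 (the customer data here is hypothetical):

```python
import pandas as pd

dt = pd.DataFrame({"customer": [101, 102, 103], "month": [1, 3, 3]})

# One 0/1 column per month value present in the data
dummies = pd.get_dummies(dt["month"], dtype=int)
# Guarantee all twelve months exist as columns, then use string labels '1'..'12'
dummies = dummies.reindex(columns=range(1, 13), fill_value=0)
dummies.columns = [str(m) for m in dummies.columns]

result = dt.join(dummies)
print(result)
```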

How to combine rows or data into one row

I am working on a project to see how many units of each category customers order, and this is my SELECT clause:
SELECT
d2.customer_id
, ( CASE WHEN d2.category = 100 THEN d2.units ELSE 0 END ) AS produce_units
, ( CASE WHEN d2.category = 200 THEN d2.units ELSE 0 END ) AS meat_units
, ( CASE WHEN d2.category = 300 THEN d2.units ELSE 0 END ) AS seafood_units
, SUM (d2.units) AS total_units
My result looks like this, where 62779 is the customer ID and the last column is the total units:
62779 0 0 0 0 20 0 0 0 0 0 0 20
62779 0 0 0 0 0 0 0 0 52 0 0 52
62779 0 6 0 0 0 0 0 0 0 0 0 6
62779 0 0 0 0 0 0 0 0 0 22 0 22
62779 0 0 0 0 0 14 0 0 0 0 0 14
62779 0 0 0 0 0 0 0 20 0 0 0 20
62779 0 0 0 8 0 0 0 0 0 0 0 8
62779 64 0 0 0 0 0 0 0 0 0 0 64
However, I want my result to look like this:
62779 64 6 0 8 20 14 0 20 52 22 0 206
Please advise. Thanks :)
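No answer survives in this thread. One common approach is conditional aggregation: wrap each CASE expression in SUM and GROUP BY the customer, so every customer collapses into one row. The sketch below runs it against an in-memory SQLite table; the table name d2 reuses the question's alias, and the three columns and sample rows are hypothetical stand-ins for the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE d2 (customer_id INTEGER, category INTEGER, units INTEGER)")
# Hypothetical rows: one customer with several orders in different categories
conn.executemany(
    "INSERT INTO d2 VALUES (?, ?, ?)",
    [(62779, 100, 64), (62779, 200, 6), (62779, 300, 8), (62779, 100, 20)],
)

query = """
SELECT
    d2.customer_id
  , SUM(CASE WHEN d2.category = 100 THEN d2.units ELSE 0 END) AS produce_units
  , SUM(CASE WHEN d2.category = 200 THEN d2.units ELSE 0 END) AS meat_units
  , SUM(CASE WHEN d2.category = 300 THEN d2.units ELSE 0 END) AS seafood_units
  , SUM(d2.units) AS total_units
FROM d2
GROUP BY d2.customer_id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(62779, 84, 6, 8, 98)]
```

If the original query has a GROUP BY that includes category, removing category from it is what lets the rows merge.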

Filter a table for abundance in at least 20% of the samples

I have a huge tab-separated table like the one below.
The first row is the list of subjects, while the other rows are my counts.
KEGGAnnotation a b c d e f g h i l m n o p q r s t u v z w ee wr ty yu im
K01824 0 0 1 5 0 0 0 0 0 0 0 0 0 0 14 6 0 0 0 0 0 0 0 0 0 0 0
K03924 17302 15372 19601 18732 17180 18094 23560 20516 14280 24187 19642 20521 20330 20843 22948 17124 19557 18319 16608 19463 18334 21022 14325 10819 13342 16876 16979
K13730 0 0 1 5 0 0 0 0 0 0 0 0 0 0 14 6 0 0 0 0 0 0 0 0 0 0 0
K13735 5360 463 12516 7235 5051 2022 2499 2778 5392 1220 6460 9490 1169 6556 14862 9657 7360 6837 7810 4368 2186 12474 7810 9755 1401 12867 4431
K07279 0 0 1 5 0 0 0 0 0 0 0 0 0 0 14 6 0 0 0 0 0 0 0 0 0 0 0
K14194 4499 2216 2322 2031 2763 2219 704 1647 2536 876 2692 4196 687 2958 3207 2153 2266 1974 370 2867 1110 5372 3637 9828 2038 2812 3472
K11494 0 0 1 10 0 0 0 0 11 0 0 0 0 0 14 6 0 0 0 0 0 0 0 0 0 0 0
K03332 0 0 1 5 0 0 0 0 0 0 0 0 0 0 14 6 0 0 0 0 0 0 0 0 0 0 0
K01317 3 1 6 0 1 3 0 14 11 0 21 8 0 20 0 263 0 0 6 3 5 0 0 41 0 0 2
I would like to grep only the lines in which counts > 100 are present in at least 20% of the samples (that is, in at least 6 samples).
For example, K03924 would be kept but K03332 would not.
Increment a counter for each value greater than the threshold, then print the line if the counter is at least 20% of the fields checked. This also prints the header line, because awk compares the non-numeric header fields with 100 as strings.
awk '{c=0; for(i=2;i<=NF;i++) c+=($i>100); if(c>=0.2*(NF-1)) print $0}' input
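The same filter can be sketched in Python for comparison. The function name filter_abundant is hypothetical; it treats the first line as the header, splits on any whitespace (tabs included), and compares with integer arithmetic so that "20% of N" is not subject to floating-point rounding.

```python
def filter_abundant(lines, threshold=100, min_percent=20):
    """Keep the header plus rows whose count exceeds threshold in >= min_percent% of samples."""
    header, *rows = lines
    kept = [header]
    for row in rows:
        _, *counts = row.split()
        hits = sum(int(c) > threshold for c in counts)
        # hits / len(counts) >= min_percent / 100, rearranged to stay in integers
        if hits * 100 >= min_percent * len(counts):
            kept.append(row)
    return kept


table = [
    "KEGGAnnotation a b c d e f g h i l m n o p q r s t u v z w ee wr ty yu im",
    "K03924 17302 15372 19601 18732 17180 18094 23560 20516 14280 24187 19642 20521 20330 20843 22948 17124 19557 18319 16608 19463 18334 21022 14325 10819 13342 16876 16979",
    "K03332 0 0 1 5 0 0 0 0 0 0 0 0 0 0 14 6 0 0 0 0 0 0 0 0 0 0 0",
    "K01317 3 1 6 0 1 3 0 14 11 0 21 8 0 20 0 263 0 0 6 3 5 0 0 41 0 0 2",
]

for line in filter_abundant(table):
    print(line.split()[0])  # KEGGAnnotation, then K03924
```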