I am having problems with some Excel 2010 VBA code.
I am trying to read data from a spreadsheet in Excel 2010. I have a set of data (see below), and I am trying to write code that shows an input box asking for the name I am looking for, e.g. "Name 1", from the list of column names. I then want to set a criterion: if the number in that column is equal to zero, and the number in a second column ("Name 5") is also equal to zero, then highlight in red any number in columns "Name 8" and "Name 9" that is greater than, say, 30 (just a random example). The important thing is that the red highlight in columns "Name 8"/"Name 9" must only occur if the numbers in columns "Name 1" and "Name 5" of that row are equal to zero.
I have already done this, but only with fixed column references, e.g. A1:A5. Now I need to use the column names, because I want to use the code on different Excel spreadsheets in which the columns are in different positions. If I use the names, I will always find the right column and can set the criteria no matter where that column sits.
The criterion for "Name 1"/"Name 5" will always be = 0 or = 1, but the program has to ask me to choose which one when I search.
In the example below, the red highlights are where the criterion of = 0 is met for Name 1 and Name 5 and the number in Name 8/9 is greater than 30. When the number is not greater than 30 but the row still meets the criterion, it is highlighted blue in the Excel spreadsheet example. All other names must be ignored.
See the example below:
Name 1 Name 2 Name 3 Name 4 Name 5 Name 6 Name 7 Name 8 Name 9 Name 10
0 0 1 0 0 1 58 35 14 19
0 0 0 0 0 1 41 45 68 74
1 0 1 0 1 0 23 18 98 87
0 0 1 0 0 1 65 36 52 89
0 0 0 0 1 1 24 95 47 75
1 1 1 0 1 0 58 87 59 14
0 1 0 0 0 0 74 41 84 32
1 1 0 0 1 0 96 25 74 96
0 0 0 0 0 0 87 35 15 53
0 0 1 0 0 1 57 49 48 47
1 0 1 0 1 1 63 84 23 65
0 1 0 0 0 0 21 54 69 12
0 0 1 0 0 0 54 23 54 54
1 1 0 0 1 1 88 34 77 88
0 0 1 0 0 0 78 48 68 69
1 0 1 0 0 1 96 87 14 65
1 0 0 0 1 0 21 96 54 25
0 1 0 0 0 0 54 72 78 29
0 1 1 0 0 1 62 38 22 78
0 0 0 0 0 0 21 49 65 54
1 0 1 0 1 1 17 65 98 99
0 0 0 0 0 0 59 15 56 70
0 1 1 0 0 0 36 12 29 54
1 0 0 0 1 0 29 49 55 54
Code:
Private Sub CommandButton21_Click()
Cells.Interior.ColorIndex = 0
For Each rw In Range("A1:V22").Rows
If Application.Sum(rw.Resize(, 4)) = 0 Then
cll.Interior.ColorIndex = 3
For Each cll In rw.Offset(, 4).Resize(, 18).Cells
If cll.Value > 50 Then cll.Interior.ColorIndex = 3
Next cll
End If
Next rw
End Sub
If I'm reading what you want correctly, you could try this. It asks you to input the name and then runs your loops using that particular column as the range. Is that what you are after?
Also, I've changed
If Application.Sum(rw.Resize(, 4)) = 0 Then
cll.Interior.ColorIndex = 3
to rw.Interior.ColorIndex = 3, as I'm guessing this was an error (cll is used before the loop that assigns it).
Private Sub CommandButton21_Click()
    Dim searchString As String
    Dim found As Range, rw As Range, cll As Range
    Dim col As Long, lastRow As Long

    searchString = InputBox("Input name?")
    ' Look for the name in the header row (row 1)
    Set found = Rows(1).Find(What:=searchString, LookIn:=xlValues, LookAt:=xlWhole, SearchOrder:=xlByRows, SearchDirection:=xlNext, MatchCase:=False)
    If found Is Nothing Then
        MsgBox "Name not found"
        Exit Sub
    End If
    ' Keep the column number separate from the Range returned by Find
    col = found.Column
    lastRow = Cells(2, col).CurrentRegion.Rows.Count

    Cells.Interior.ColorIndex = 0
    ' Walk down the data cells of the chosen column
    For Each rw In Range(Cells(2, col), Cells(lastRow, col))
        If Application.Sum(rw.Resize(, 4)) = 0 Then
            rw.Interior.ColorIndex = 3
            For Each cll In rw.Offset(, 4).Resize(, 18).Cells
                If cll.Value > 50 Then cll.Interior.ColorIndex = 3
            Next cll
        End If
    Next rw
End Sub
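The rule described in the question (look the columns up by header name, check the flag columns, then colour the target columns) can also be written out independently of Excel. The following is only a language-neutral sketch in Python, not a replacement for the VBA above: the function name classify_cells and its parameters are hypothetical, and the three sample rows are copied from the example data in the question.

```python
def classify_cells(header, rows, flag_names, flag_value, target_names, threshold):
    """Return {(row_index, column_name): colour} for cells that should be highlighted."""
    # Map each header name to its position, so the column order does not matter.
    col = {name: i for i, name in enumerate(header)}
    flag_idx = [col[n] for n in flag_names]
    target_idx = [col[n] for n in target_names]

    marks = {}
    for r, row in enumerate(rows):
        # Only rows where every flag column equals the chosen value qualify.
        if all(row[i] == flag_value for i in flag_idx):
            for i in target_idx:
                marks[(r, header[i])] = "red" if row[i] > threshold else "blue"
    return marks


header = [f"Name {i}" for i in range(1, 11)]
rows = [
    [0, 0, 1, 0, 0, 1, 58, 35, 14, 19],
    [1, 0, 1, 0, 1, 0, 23, 18, 98, 87],
    [0, 0, 0, 0, 1, 1, 24, 95, 47, 75],
]
marks = classify_cells(header, rows, ["Name 1", "Name 5"], 0, ["Name 8", "Name 9"], 30)
print(marks)
```

Only the first row qualifies here, because it is the only one where both "Name 1" and "Name 5" are 0.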
I have a database with around 120,000 entries and I need to do substring comparisons (where ... like 'test%') for an autocomplete function. The database won't change.
I have a column called "relevance", and I want my search results to be ordered by relevance DESC. I noticed that as soon as I add "ORDER BY relevance DESC" to my queries, the execution time roughly doubles. Since my queries already take around 100ms on average, this causes significant lag.
Does it make sense to re-order the whole database by relevance once so I can remove the ORDER BY? Can I be certain that when searching through the table with SQL it will always go through the database in the order that I added the rows?
This is what my query looks like right now:
select *
from hao2_dict
where definitions like 'ba%'
or searchable_pinyin like 'ba%'
ORDER BY relevance DESC
LIMIT 100
UPDATE: For context, here is my DB structure:
And some time measurements:
Using an Index (relevance DESC) for the search term 'b%' gives me 50ms, which is faster than not using an Index. But the search term 'banana%' takes over 1700ms which is way slower than not using an Index. These are the results from 'explain':
b%:
0 Init 0 27 0 0
1 Noop 1 11 0 0
2 Integer 100 1 0 0
3 OpenRead 0 5 0 9 0
4 OpenRead 2 4223 0 k(2,-,) 0
5 Rewind 2 26 2 0 0
6 DeferredSeek 2 0 0 0
7 Column 0 6 4 0
8 Function 1 3 2 like(2) 0
9 If 2 13 0 0
10 Column 0 4 6 0
11 Function 1 5 2 like(2) 0
12 IfNot 2 25 1 0
13 IdxRowid 2 7 0 0
14 Column 0 1 8 0
15 Column 0 2 9 0
16 Column 0 3 10 0
17 Column 0 4 11 0
18 Column 0 5 12 0
19 Column 0 6 13 0
20 Column 0 7 14 0
21 Column 2 0 15 0
22 RealAffinity 15 0 0 0
23 ResultRow 7 9 0 0
24 DecrJumpZero 1 26 0 0
25 Next 2 6 0 1
26 Halt 0 0 0 0
27 Transaction 0 0 10 0 1
28 String8 0 3 0 b% 0
29 String8 0 5 0 b% 0
30 Goto 0 1 0 0
banana%:
0 Init 0 27 0 0
1 Noop 1 11 0 0
2 Integer 100 1 0 0
3 OpenRead 0 5 0 9 0
4 OpenRead 2 4223 0 k(2,-,) 0
5 Rewind 2 26 2 0 0
6 DeferredSeek 2 0 0 0
7 Column 0 6 4 0
8 Function 1 3 2 like(2) 0
9 If 2 13 0 0
10 Column 0 4 6 0
11 Function 1 5 2 like(2) 0
12 IfNot 2 25 1 0
13 IdxRowid 2 7 0 0
14 Column 0 1 8 0
15 Column 0 2 9 0
16 Column 0 3 10 0
17 Column 0 4 11 0
18 Column 0 5 12 0
19 Column 0 6 13 0
20 Column 0 7 14 0
21 Column 2 0 15 0
22 RealAffinity 15 0 0 0
23 ResultRow 7 9 0 0
24 DecrJumpZero 1 26 0 0
25 Next 2 6 0 1
26 Halt 0 0 0 0
27 Transaction 0 0 10 0 1
28 String8 0 3 0 banana% 0
29 String8 0 5 0 banana% 0
30 Goto 0 1 0 0
Can I be certain that when searching through the table with SQL it will always go through the database in the order that I added the rows?
No. SQL results have no inherent order. They might come out in the order you inserted them, but there is no guarantee.
Instead, put an index on the column. Indexes keep their values in order.
However, this only deals with the sorting. The query above still has to search the whole table for rows with matching definitions and searchable_pinyin. In general, a database will only use one index per table at a time; trying to use two is usually inefficient. So you need one multi-column index so that the query does not have to search the whole table and still gets its results in sorted order. Make sure relevance comes first: the index columns need to be in the same order as your ORDER BY.
An index on (relevance, definitions, searchable_pinyin) will let that query use only the index for searching and sorting. Adding an index on (relevance, searchable_pinyin) as well will handle searching by definitions, searchable_pinyin, or both.
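A minimal sketch of the multi-column index, assuming the database is SQLite (the EXPLAIN listing in the question resembles SQLite bytecode) and using Python's built-in sqlite3 module. Only the table and column names come from the question; the column types, the sample rows, and the index name idx_relevance_cover are made up for illustration. The ORDER BY stays in the query: an index may let the engine avoid a separate sort step, but only ORDER BY guarantees the order of the results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hao2_dict ("
    "id INTEGER PRIMARY KEY, definitions TEXT, searchable_pinyin TEXT, relevance REAL)"
)
# Made-up sample rows: (definitions, searchable_pinyin, relevance)
sample_rows = [
    ("banana", "xiangjiao", 0.2),
    ("bar", "ba", 0.9),
    ("cat", "mao", 0.7),
    ("eight", "ba", 0.5),
]
conn.executemany(
    "INSERT INTO hao2_dict (definitions, searchable_pinyin, relevance) VALUES (?, ?, ?)",
    sample_rows,
)
# Hypothetical index name; relevance comes first, as suggested above.
conn.execute(
    "CREATE INDEX idx_relevance_cover "
    "ON hao2_dict (relevance, definitions, searchable_pinyin)"
)

query = """
SELECT definitions, searchable_pinyin, relevance
FROM hao2_dict
WHERE definitions LIKE ? OR searchable_pinyin LIKE ?
ORDER BY relevance DESC
LIMIT 100
"""
results = conn.execute(query, ("ba%", "ba%")).fetchall()
# Ask SQLite how it plans to run the query (output varies by SQLite version).
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("ba%", "ba%")).fetchall()

print(results)
print(plan)
```

Comparing the plan and the timings with and without the index on the real data is the only reliable way to tell whether it helps for a given search term.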
I have a dataframe that looks like this:
user_id : identifies the user
question_id : the question number
user_answer : the option (A, B, C or D) the user chose for that question
correct_answer : the correct answer for that question
correct : 1.0 means the user's answer is right
elapsed_time : the time in minutes the user took to answer that question
timestamp : Unix timestamp of each interaction
real_date : a column I added, converting timestamp to a human-readable date and time
user_iD  question_id  user_answer  correct_answer  correct  elapsed_time  solving_id  bundle_id  timestamp      real_date
1        1            A            A               1.0      5.00          1           b1         1547794902000  Friday, January 18, 2019 7:01:42 AM
1        2            D            D               1.0      3.00          2           b2         1547795130000  Friday, January 18, 2019 7:05:30 AM
1        5            C            C               1.0      7.00          5           b5         1547795370000  Friday, January 18, 2019 7:09:30 AM
2        10           C            C               1.0      5.00          10          b10        1547806170000  Friday, January 18, 2019 10:09:30 AM
2        1            B            B               1.0      15.0          1           b1         1547802150000  Friday, January 18, 2019 9:02:30 AM
2        15           A            A               1.0      2.00          15          b15        1547803230000  Friday, January 18, 2019 9:20:30 AM
2        7            C            C               1.0      5.00          7           b7         1547802730000  Friday, January 18, 2019 9:12:10 AM
3        12           A            A               1.0      1.00          25          b12        1547771110000  Friday, January 18, 2019 12:25:10 AM
3        10           C            C               1.0      2.00          10          b10        1547770810000  Friday, January 18, 2019 12:20:10 AM
3        3            D            D               1.0      5.00          3           b3         1547770390000  Friday, January 18, 2019 12:13:10 AM
104      6            C            C               1.0      6.00          6           b6         1553040610000  Wednesday, March 20, 2019 12:10:10 AM
104      4            A            A               1.0      5.00          4           b4         1553040547000  Wednesday, March 20, 2019 12:09:07 AM
104      1            A            A               1.0      2.00          1           b1         1553040285000  Wednesday, March 20, 2019 12:04:45 AM
I need to do some encoding, but I don't know which encoding to use or how to do it.
I need the new dataframe to look like this:
user_id  b1  b2  b3  b4  b5  b6  b7  b8  b9  b10  b11  b12  b13  b14  b15
1        1   2   0   0   3   0   0   0   0   0    0    0    0    0    0
2        1   0   0   0   0   0   0   0   0   2    0    0    0    0    3
3        0   0   1   0   0   0   0   0   0   2    0    3    0    0    0
104      1   0   0   2   0   3   0   0   0   0    0    0    0    0    0
As you can see from timestamp and real_date, the question_id values of each user are not sorted.
The new dataframe should show which bundles each user has interacted with, numbered in time order.
First create the final value for each bundle element using groupby and cumcount then pivot your dataframe. Finally reindex it to get all columns:
# Expected bundle columns b1..b15
bundle = [f'b{i}' for i in range(1, 16)]
# Rank each user's interactions by time (1 = earliest)
values = df.sort_values('timestamp').groupby('user_iD').cumcount().add(1)
out = (
    df.assign(value=values)
      .pivot_table('value', 'user_iD', 'bundle_id', fill_value=0)
      .reindex(bundle, axis=1, fill_value=0)
)
Output:
>>> out
bundle_id b1 b2 b3 b4 b5 b6 b7 b8 b9 b10 b11 b12 b13 b14 b15
user_iD
1 1 2 0 0 3 0 0 0 0 0 0 0 0 0 0
2 1 0 0 0 0 0 2 0 0 4 0 0 0 0 3
3 0 0 1 0 0 0 0 0 0 2 0 3 0 0 0
104 1 0 0 2 0 3 0 0 0 0 0 0 0 0 0
>>> out.reset_index().rename_axis(columns=None)
user_iD b1 b2 b3 b4 b5 b6 b7 b8 b9 b10 b11 b12 b13 b14 b15
0 1 1 2 0 0 3 0 0 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0 0 2 0 0 4 0 0 0 0 3
2 3 0 0 1 0 0 0 0 0 0 2 0 3 0 0 0
3 104 1 0 0 2 0 3 0 0 0 0 0 0 0 0 0
Lacking more Pythonish experience, I'm proposing the following (partially commented) code snippet which is not optimized in any way, being based merely on elementary pandas.DataFrame API reference.
import pandas as pd
import io
import sys
data_string = '''
user_iD;question_id;user_answer;correct_answer;correct;elapsed_time;solving_id;bundle_id;timestamp
1;1;A;A;1.0;5.00;1;b1;1547794902000
1;2;D;D;1.0;3.00;2;b2;1547795130000
1;5;C;C;1.0;7.00;5;b5;1547795370000
2;10;C;C;1.0;5.00;10;b10;1547806170000
2;1;B;B;1.0;15.0;1;b1;1547802150000
2;15;A;A;1.0;2.00;15;b15;1547803230000
2;7;C;C;1.0;5.00;7;b7;1547802730000
3;12;A;A;1.0;1.00;25;b12;1547771110000
3;10;C;C;1.0;2.00;10;b10;1547770810000
3;3;D;D;1.0;5.00;3;b3;1547770390000
104;6;C;C;1.0;6.00;6;b6;1553040610000
104;4;A;A;1.0;5.00;4;b4;1553040547000
104;1;A;A;1.0;2.00;1;b1;1553040285000
'''
df = pd.read_csv( io.StringIO(data_string), sep=";", encoding='utf-8')
# get only necessary columns ordered by timestamp
df_aux = df[['user_iD','bundle_id','correct', 'timestamp']].sort_values(by=['timestamp'])
# hard coded new headers (possible to build from real 'bundle_id's)
df_new_headers = ['b{}'.format(x+1) for x in range(15)]
df_new_headers.insert(0, 'user_iD')
dict_answered = {}
# create a new dataframe (I'm sure that there is a more Pythonish solution)
df_new_data = []
user_ids = sorted(set( [x for label, x in df_aux.user_iD.items()]))
for user_id in user_ids:
    dict_answered[user_id] = 0
    if len( sys.argv) > 1 and sys.argv[1]:
        # supplied arg in the next line for better result readability
        df_new_values = [sys.argv[1].strip('"').strip("'")
                         for x in range(len(df_new_headers)-1)]
    else:
        # zeroes (original assignment)
        df_new_values = [0 for x in range(len(df_new_headers)-1)]
    df_new_values.insert(0, user_id)
    df_new_data.append(df_new_values)
df_new = pd.DataFrame(data=df_new_data, columns=df_new_headers)
# fill the new dataframe using values from the original one
for aux in df_aux.itertuples(index=True, name=None):
    if aux[3] == 1.0:
        # add 1 to number of already answered questions for current user
        dict_answered[aux[1]] += 1
        df_new.loc[ df_new["user_iD"] == aux[1], aux[2]] = dict_answered[aux[1]]
print( df_new)
Output examples
Example: .\SO\70751715.py
user_iD b1 b2 b3 b4 b5 b6 b7 b8 b9 b10 b11 b12 b13 b14 b15
0 1 1 2 0 0 3 0 0 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0 0 2 0 0 4 0 0 0 0 3
2 3 0 0 1 0 0 0 0 0 0 2 0 3 0 0 0
3 104 1 0 0 2 0 3 0 0 0 0 0 0 0 0 0
Example: .\SO\70751715.py .
user_iD b1 b2 b3 b4 b5 b6 b7 b8 b9 b10 b11 b12 b13 b14 b15
0 1 1 2 . . 3 . . . . . . . . . .
1 2 1 . . . . . 2 . . 4 . . . . 3
2 3 . . 1 . . . . . . 2 . 3 . . .
3 104 1 . . 2 . 3 . . . . . . . . .
Example: .\SO\70751715.py ''
user_iD b1 b2 b3 b4 b5 b6 b7 b8 b9 b10 b11 b12 b13 b14 b15
0 1 1 2 3
1 2 1 2 4 3
2 3 1 2 3
3 104 1 2 3
I think you are looking for LabelEncoder. First import the library:
#Common Model Helpers
from sklearn.preprocessing import LabelEncoder
Then you should be able to convert objects to category:
#CONVERT: convert objects to category
#code categorical data
label = LabelEncoder()
dataset['question_id'] = label.fit_transform(dataset['question_id'])
dataset['user_answer'] = label.fit_transform(dataset['user_answer'])
dataset['correct_answer'] = label.fit_transform(dataset['correct_answer'])
Or just use below:
dataset.apply(LabelEncoder().fit_transform)
I have about 16 dataframes representing users' weekly clickstream data. The photos show samples for weeks 0-3. I want to build each new dataframe cumulatively: for example, for w=2 the new dataframe is w2 = w0+w1+w2, and for w=3 it is w3 = w0+w1+w2+w3. As you can see, the datasets do not have identical id_users, since a user may not appear in a given week. All dataframes have the same columns, but the indexes are not exactly the same. So how can I add them so that rows are summed where the indexes match?
id_user c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11
43284 1 8 0 8 5 0 0 0 2 3 1
45664 0 16 0 4 0 0 0 0 5 16 2
52014 0 0 0 5 4 0 0 0 0 2 2
53488 1 37 0 19 0 0 3 0 3 23 6
60135 0 124 0 87 3 0 24 0 8 19 14
id_user c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11
40419 0 8 0 3 4 0 6 0 1 6 0
43284 1 4 0 14 26 2 0 0 2 4 2
45664 0 9 0 15 11 0 0 0 1 6 14
52014 0 0 0 8 9 0 8 0 2 2 1
53488 0 2 0 4 0 0 4 0 0 0 0
id_user c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11
40419 0 8 0 3 4 0 6 0 1 6 0
43284 1 4 0 14 26 2 0 0 2 4 2
45664 0 9 0 15 11 0 0 0 1 6 14
52014 0 0 0 8 9 0 8 0 2 2 1
53488 0 2 0 4 0 0 4 0 0 0 0
Use concat, then groupby and sum:
out = pd.concat([df1,df2]).groupby('id_user',as_index=False).sum()
Out[147]:
id_user c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11
0 40419 0 8 0 3 4 0 6 0 1 6 0
1 43284 2 12 0 22 31 2 0 0 4 7 3
2 45664 0 25 0 19 11 0 0 0 6 22 16
3 52014 0 0 0 13 13 0 8 0 2 4 3
4 53488 1 39 0 23 0 0 7 0 3 23 6
5 60135 0 124 0 87 3 0 24 0 8 19 14
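For the full set of weekly frames, the same concat/groupby/sum idea can be applied to every prefix of the list. This is a minimal sketch, assuming the frames are collected in week order in a list called weekly (a hypothetical name). w0 and w1 reuse a few values from the samples above, while w2 is invented purely for illustration.

```python
import pandas as pd

# Small stand-ins for the weekly frames (two columns only, to keep it short).
w0 = pd.DataFrame({"id_user": [43284, 45664], "c1": [1, 0], "c2": [8, 16]})
w1 = pd.DataFrame({"id_user": [40419, 43284], "c1": [0, 1], "c2": [8, 4]})
w2 = pd.DataFrame({"id_user": [45664, 40419], "c1": [2, 1], "c2": [1, 1]})  # invented

weekly = [w0, w1, w2]

# cumulative[k] holds w0 + w1 + ... + wk, matched on id_user.
cumulative = []
for week in range(len(weekly)):
    total = (
        pd.concat(weekly[: week + 1])
          .groupby("id_user", as_index=False)
          .sum()
    )
    cumulative.append(total)

print(cumulative[2])
```

Users missing from a given week simply contribute nothing for that week, so no explicit index alignment is needed. Re-concatenating each prefix repeats some work, which should be negligible for around 16 frames.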
I have a dataframe like the one below. I would like to add one to all of the values in each row. I am new to this forum and to Python, so I can't work out how to do this. I intend to use Bayesian probability, and without adding 1 the posterior probability will be 0 when I multiply the values. PS: I am also new to probability, but others have applied the same method. I am using pandas to do this. Thanks in advance for your help.
Disease Gene1 Gene2 Gene3 Gene4
D1 0 0 25 0
D2 0 0 0 0
D3 0 17 0 16
D4 24 0 0 0
D5 0 0 0 0
D6 0 32 0 11
D7 0 0 0 0
D8 4 0 0 0
With this being your dataframe:
df = pd.DataFrame({
"Disease":[f"D{i}" for i in range(1,9)],
"Gene1":[0,0,0,24,0,0,0,4],
"Gene2":[0,0,17,0,0,32,0,0],
"Gene3":[25,0,0,0,0,0,0,0],
"Gene4":[0,0,16,0,0,11,0,0]})
Disease Gene1 Gene2 Gene3 Gene4
0 D1 0 0 25 0
1 D2 0 0 0 0
2 D3 0 17 0 16
3 D4 24 0 0 0
4 D5 0 0 0 0
5 D6 0 32 0 11
6 D7 0 0 0 0
7 D8 4 0 0 0
The easiest way to do this would be
df += 1
However, since you have a string column (the Disease column), this will not work.
But we can conveniently set the Disease column to be the index, like this:
df.set_index('Disease', inplace=True)
Now your dataframe looks like this:
Gene1 Gene2 Gene3 Gene4
Disease
D1 0 0 25 0
D2 0 0 0 0
D3 0 17 0 16
D4 24 0 0 0
D5 0 0 0 0
D6 0 32 0 11
D7 0 0 0 0
D8 4 0 0 0
And if we do df += 1 now, we get:
Gene1 Gene2 Gene3 Gene4
Disease
D1 1 1 26 1
D2 1 1 1 1
D3 1 18 1 17
D4 25 1 1 1
D5 1 1 1 1
D6 1 33 1 12
D7 1 1 1 1
D8 5 1 1 1
because the plus operation only acts on the data columns, not on the index.
You can also do this on column basis, like this:
df["Gene1"] = df["Gene1"] + 1
You can select only the columns whose underlying dtype is not 'object':
In [110]:
numeric_cols = [col for col in df if df[col].dtype.kind != 'O']
numeric_cols
Out[110]:
['Gene1', 'Gene2', 'Gene3', 'Gene4']
In [111]:
df[numeric_cols] += 1
df
Out[111]:
Disease Gene1 Gene2 Gene3 Gene4
0 D1 1 1 26 1
1 D2 1 1 1 1
2 D3 1 18 1 17
3 D4 25 1 1 1
4 D5 1 1 1 1
5 D6 1 33 1 12
6 D7 1 1 1 1
7 D8 5 1 1 1
EDIT
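It looks like your df possibly holds strings instead of numeric types. You can convert the dtype to numeric using convert_objects (note that convert_objects was deprecated and later removed from pandas; in current versions use pd.to_numeric instead):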
df = df.convert_objects(convert_numeric=True)
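On a recent pandas version, the conversion and the increment could look like the following. This is a minimal sketch on a cut-down, made-up frame in which Gene1 is deliberately stored as strings; the name gene_cols is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Disease": ["D1", "D2", "D3"],
    "Gene1": ["0", "0", "24"],   # strings on purpose, to mimic the suspected problem
    "Gene2": [0, 17, 0],
})

# Every column except the label column should hold counts.
gene_cols = [c for c in df.columns if c != "Disease"]

# Convert to numeric; anything unparseable becomes NaN instead of raising.
df[gene_cols] = df[gene_cols].apply(pd.to_numeric, errors="coerce")

# Add one to every count, leaving the Disease column untouched.
df[gene_cols] = df[gene_cols] + 1
print(df)
```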
I have some data that I acquire from a Linux box and want to put it into an NSDictionary for later processing.
How would you get this NSString into an NSDictionary like the following?
data
(
    bytes
    (
        60 ( 1370515694 )
        48 ( 812 )
        49 ( 300 )
        ...
    )
    pkt
    (
        60 ( 380698 )
        59 ( 8 )
        58 ( 412 )
        ...
    )
    block
    (
        60 ( 5 )
        48 ( 4 )
        49 ( 7 )
        ...
    )
    drop
    (
        60 ( 706 )
        48 ( 2 )
        49 ( 4 )
        ...
    )
    session
    (
        60 ( 3 )
        48 ( 1 )
        49 ( 2 )
        ...
    )
)
The data string looks like:
//time bytes pkt block drop session
60 1370515694 380698 5 706 3
48 812 8 4 2 1
49 300 412 7 4 2
50 0 0 0 0 0
51 87 2 0 0 0
52 87 2 0 0 0
53 0 0 0 0 0
54 0 0 0 0 0
55 0 0 0 0 0
56 0 0 0 0 0
57 812 8 0 0 0
58 812 8 0 0 0
59 0 0 0 0 0
0 0 0 0 0 0
1 2239 12 2 0 0
2 0 0 0 0 0
3 0 0 0 0 0
4 0 0 0 0 0
5 0 0 0 0 0
6 0 0 0 0 0
7 2882 19 2 0 0
8 4906 29 4 0 0
9 1844 15 11 0 0
10 4210 29 17 0 0
11 3370 18 4 0 0
12 3370 18 4 0 0
13 1184 7 3 0 0
14 0 0 0 0 0
15 4046 19 3 0 0
16 4956 23 3 0 0
17 2960 18 2 0 0
18 2960 18 2 0 0
19 1088 6 2 0 0
20 0 0 0 0 0
21 3261 17 3 0 0
22 3261 17 3 0 0
23 1228 6 2 0 0
24 1228 6 2 0 0
25 2628 17 2 0 0
26 4688 26 3 0 0
27 1752 13 5 0 0
28 3062 21 5 0 0
29 174 2 2 0 0
30 96 1 1 0 0
31 4351 23 5 0 0
32 0 0 0 0 0
33 4930 23 7 0 0
34 6750 31 7 0 0
35 1241 6 2 0 0
36 1241 6 2 0 0
37 3571 29 2 0 0
38 0 0 0 0 0
39 1010 5 1 0 0
40 1010 5 1 0 0
41 88859 72 3 0 1
42 90783 81 4 0 1
43 2914 19 3 0 0
44 0 0 0 0 0
45 2157 17 1 0 0
46 2157 17 1 0 0
47 78 1 1 0 0
.
Time (first column) should be the key for the sub-sub-dictionaries.
The idea behind all this is that I can later randomly access the PKT value at a given TIME x, the BLOCK amount at TIME y, the SESSION value at TIME z, and so on.
Thanks in advance
You probably don't want a dictionary but an array containing dictionaries, one per data entry. The simplest way to parse something like this in Objective-C is to use the componentsSeparatedByString: method of NSString:
NSString* dataString = <Your Data String>; // Assumes the items are separated by newlines
NSArray* items = [dataString componentsSeparatedByString:@"\n"];
NSMutableArray* dataDictionaries = [NSMutableArray array];
for (NSString* item in items) {
    NSArray* elements = [item componentsSeparatedByString:@" "];
    // Skip the "//" header line, blank lines, and any row without all six fields
    if ([item hasPrefix:@"//"] || elements.count < 6) continue;
    NSDictionary* entry = @{
        @"time": [elements objectAtIndex:0],
        @"bytes": [elements objectAtIndex:1],
        @"pkt": [elements objectAtIndex:2],
        @"block": [elements objectAtIndex:3],
        @"drop": [elements objectAtIndex:4],
        @"session": [elements objectAtIndex:5],
    };
    [dataDictionaries addObject: entry];
}
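If the nested layout from the question (column name, then time, then value) is preferred over an array of entries, the parsing logic is a small variation. The following is only a language-neutral sketch written in Python rather than Objective-C; the function name parse_stats is hypothetical, and it assumes whitespace-separated rows with exactly six integer fields. In Objective-C the same shape would presumably be an NSMutableDictionary of NSMutableDictionary objects.

```python
def parse_stats(text):
    """Return {column_name: {time: value}} from whitespace-separated rows."""
    columns = ["bytes", "pkt", "block", "drop", "session"]
    data = {name: {} for name in columns}
    for line in text.splitlines():
        fields = line.split()
        # Skip the "//" header line, blank lines, and anything malformed.
        if len(fields) != 6 or line.lstrip().startswith("//"):
            continue
        time = int(fields[0])
        for name, value in zip(columns, fields[1:]):
            data[name][time] = int(value)
    return data


sample = """//time bytes pkt block drop session
60 1370515694 380698 5 706 3
48 812 8 4 2 1
49 300 412 7 4 2
.
"""
data = parse_stats(sample)
print(data["pkt"][60])
```

With this shape, data["pkt"][x], data["block"][y] and data["session"][z] give the random access by time described in the question.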