Boundary Value Analysis - When to use two value or three value? - testing

I had a few questions regarding boundary value analysis that I was hoping someone could help me with. I am learning this for a university exam based on the ISTQB spec, not for real-world application at the moment.
The first is: when should you use the two-value method of BVA and when the three-value method? Is there any actual distinction as to when you should use one or the other, or does it depend on the specific question asked (in terms of an examination), in which case I would simply need to know how to use both? Is there a default method to use?
Secondly, consider this possible question:
A smart home app measures the average temperature in the
house and provides feedback to the occupants for different
average temperature ranges:
• Up to 10 C – Icy cool!
• 11 C to 15 C - Chilled out!
• 16 C to 19 C - Cool man!
• 20 C to 22 C – Too warm!
• Above 22 C - Hot and Sweaty!
Apply 3 point BVA to the above temperature ranges. How many test cases would you consider?
The answer that was provided was that:
For 10 - we would test 9, 10, 11.
For 11 - we would test 10 (already tested), 11 (already tested), 12
For 16 - we would test 15, 16, 17
For 20 - we would test 19, 20, 21
For 22 - we would test 21 (already tested), 22, 23
This would result in a total of 12 test cases.
Can somebody please explain why we would test the upper boundary in the first partition (10 degrees), but not the upper boundaries in the other valid partitions (such as 15, which would lead to us testing 14, 15, 16, or 19, leading to testing 18, 19, 20)? Is this because the boundary of 10 is the only boundary within that partition, as the lower boundary is open?
As a follow on, assume that the boundaries were instead:
• 0 C to 10 C – Icy cool!
• 11 C to 15 C - Chilled out!
• 16 C to 19 C - Cool man!
• 20 C to 22 C –Too warm!
• 23 C to 40 C - Hot and Sweaty!
Would the values that are tested then change to the below? Would we still need to test the upper boundary in the first partition, or would we now ignore this as we have a lower boundary value?
For the lower invalid partition: -2, -1, 0
For the first valid partition: -1, 0, 1
For the second valid partition: 10, 11, 12
For the third valid partition: 15, 16, 17
For the fourth valid partition: 19, 20, 21
For the fifth valid partition: 22, 23, 24
For the upper invalid partition: 40, 41, 42
Thank you in advance - I think I have been overcomplicating things in my head and wracking my brain over this!

3-value boundaries are not strictly required. E.g. in your example 9 and 10 are from the same equivalence class. Personally I'd either skip 9 completely or randomize values across the whole partition.
In your "provided answers" the division "which cases relate to which partition" is arbitrary. Is 11 a "negative" boundary for ∞-10 range or a "positive" boundary for 11-15 range? There's no right answer. In practice you'll simply write these 8 checks w/o labelling which one is for which boundary:
* 10 - Icy
* 11, 15 - Chilled out
* 16, 19 - Cool man
* 20, 22 - Too warm
* 23 - Hot and Sweaty
If you decide to go with the "3-value method" and check a non-boundary value for each range, you'll get an additional 5 cases, because there are 5 ranges.
Interestingly, your "provided answer" is inconsistent. Every equivalence class has a non-boundary value checked except for "Hot and Sweaty" - it should have had 24 on the list, resulting in 13 cases.
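To make the two styles concrete, here is a minimal Python sketch - the feedback() function is my own stand-in for the app, not anything from the question - with the 8 boundary checks above plus the 5 non-boundary values:

def feedback(temp):
    # Hypothetical stand-in for the smart home app's behaviour.
    if temp <= 10:
        return "Icy cool!"
    elif temp <= 15:
        return "Chilled out!"
    elif temp <= 19:
        return "Cool man!"
    elif temp <= 22:
        return "Too warm!"
    else:
        return "Hot and Sweaty!"

# The 8 boundary checks: the edges of each partition.
boundary_cases = {
    10: "Icy cool!",
    11: "Chilled out!", 15: "Chilled out!",
    16: "Cool man!", 19: "Cool man!",
    20: "Too warm!", 22: "Too warm!",
    23: "Hot and Sweaty!",
}

# 5 extra non-boundary values, one per range, for the "3-value" style described above.
mid_cases = {5: "Icy cool!", 13: "Chilled out!", 17: "Cool man!",
             21: "Too warm!", 30: "Hot and Sweaty!"}

for temp, expected in {**boundary_cases, **mid_cases}.items():
    assert feedback(temp) == expected, (temp, expected)
print("all 13 checks passed")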
Disclaimer: this is an answer of a practitioner. I haven't read ISTQB, so if you need this info to pass a test (as opposed to using it in practice) - use it with caution.

Jordan -
You have a point and I think the authors of the question you quote made a mistake. Following the ISTQB definition (CTFL 2018, Section 4.2.2): "three boundary values per boundary: the values before, at, and just over the boundary."
The problem is that almost everywhere you look, the examples given are of a single valid range, with invalid ranges on both sides ("almost everywhere" includes Jorgensen's book, "Software Testing: A Craftsman's Approach", which the ISTQB syllabus quotes as a reference in this case).
Following the ISTQB definition above, for EACH boundary value, you need to repeat the "before, at, and just over" analysis. For the first example, this would give you:
For 10 - we would test 9, 10, 11.
For 11 - we would test 10 (already tested), 11 (already tested), 12
For 15 - we would test 14, 15, 16
For 16 - we would test 15 (already tested), 16 (already tested), 17
For 19 - we would test 18, 19, 20
For 20 - we would test 19 (already tested), 20 (already tested), 21
For 22 - we would test 21 (already tested), 22, 23
This would result in a total of **14** test cases.
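To see where the 14 comes from, here is a minimal Python sketch (my own, not from the syllabus) that applies "the values before, at, and just over" to each of the seven boundary values and removes the duplicates:

# The seven boundary values named above.
boundaries = [10, 11, 15, 16, 19, 20, 22]

# 3-value BVA per the ISTQB definition: the value before, at, and just over each boundary.
values = sorted({v for b in boundaries for v in (b - 1, b, b + 1)})

print(values)        # [9, 10, 11, 12, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
print(len(values))   # 14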
There is one way to explain the 12 tests in the answer, and for that you need to read "The Boundary Value Fallacy" by René Tuinhout, in the Sep 2018 issue of "Testing Experience" Magazine (good luck finding it... I tried a bit but the places I found all asked me to create an account).
Following that article, the values 12, 14 and the values 17, 18 can be "merged" (that is, testing with one of them is sufficient to find the type of errors that are possible in code that defines boundaries). But I will be very surprised if the writers of the question (which I understand is aimed at the Foundation level) were thinking of this. It is definitely an advanced topic.


Why do these 2 for loops over sequences differ?

First:
$ raku -e "for 1...6, 7...15 { .say }"
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Now:
$ raku -e "for 1...3, 7...15 { .say }"
1
2
3
7
11
15
I would expect this case to print 1,2,3,7,8,... 15.
What's happening here?
I think you might want the raku Range operator .. (two dots) and not the raku Sequence operator ... (three dots).
Here's how your examples look with the Range operator instead:
> raku -e 'for 1..6, 7..15 { .say }'
1..6
7..15
Oh, that's not good ... looks like for is just iterating over the two things 1..6 and 7..15 and stringifying them.
We can use a Slip | to fix that:
> raku -e 'for |(1..6), |(7..15) { .say }'
1
2
... (all the numbers)
14
15
And then:
raku -e 'for |(1..3), |(7..15) { .say }'
1
2
3
7
8
9
10
11
12
13
14
15
With the Sequence operator, you have made something like:
> raku -e 'for 3,7...15 { .say }'
3
7
11
15
That is Raku for "make a sequence that starts with 3, then 7, then all the values until you get to the last at 15" ... and since the gap from 3 to 7 is 4, Raku will count up in steps of 4. Then you began it with 1...3. ;-)
~p6steve
It's because it is two deductive sequences.
1...3
Is obviously a sequence where you add 1 to each successive value.
1, 2, 3
And since 7 is 4 more than 3, this is a sequence where you add 4 to each successive value.
3, 7 ... 15
3, 7, 11, 15
To get what you want, you could use a flattened Range.
1...3, |(7..15)
Or even a flattened Sequence.
1...3, |(7...15)
TL;DR This answer focuses on addressing what you originally asked (which was about "sequences") and precisely what the code you wrote is doing, rather than providing a solution (using ranges instead).
This is a work in progress dealing with something that seems both poorly documented and hard to fathom (which may explain part though not all of the doc situation). Please bear with me! (And I may just end up deleting this answer.)
1 ... 3, 7 ... 15 ≡ 1 ... (3, 7) ... 15
In the absence of parentheses, operators within an expression are applied according to rules of "precedence" and "associativity".
Infix , has a higher precedence than infix ....¹ The following two lines of code thus produce the same result (1␤2␤3␤7␤11␤15␤):
for 1 ... 3, 7 ... 15 { .say } # Operator evaluation by precedence
for 1 ... (3, 7) ... 15 { .say } # Operator evaluation by parentheses
That said, while the result is what, given a glance at the code, I would expect based on my own "magical" DWIM ("Do What I Mean") thinking, I must say I don't yet know what the precise Raku(do) rule(s) are that lead to it DWIMing.
The doc for infix ... says:
If the endpoint is not *, it's smartmatched against each generated element and the sequence is terminated when the smartmatch succeeded.
But that seems overly simple. What if the endpoint of one sequence is another sequence? (As, at least taking a naive view, appears to be the case in your code.)
Also, as @MustafaAydin has noted:
how does your post explain the irregular last step size (of 2) instead of 3? I mean 4, 7 ... 15 alone produces (4, 7, 10, 13). But 1... 4, 7...15 now produces 7, 10, 13, 15 in the tail. Why is 15 included? Maybe i'm missing something idk
I'm at least as confused as Mustafa.
Indeed, I'm confused about several things. How come Raku(do) flattens the two sequences? [D'oh. Because the infix comma is higher precedence than the infix ....] Why doesn't it repeat the 3 in the final combined list? [Perhaps because multiple infix ...s are smart about what to do when there's an expression that's the endpoint of one sequence and the start of another?]
I'm going to go read the old design docs and/or spelunk roast and/or the Rakudo compiler code to see if I can see what's supposedly/actually going on. But not tonight.
Footnotes
¹ There's a table of operators in the current official operator doc. Supposedly this table:
summarizes the precedence levels offered by Raku, listing them in order from high to low precedence.
Unfortunately, at the time of writing this, the central operator table in the Operators page is profoundly wrong #4071.
Until that's fixed, here are "official" and "unofficial" options for determining the precedence of operators:
"official" Use in page search to search the official doc operator page for the operator of interest. Skip to the match in the entries on the left hand side of that same page. As you'll see, infix ,' is one level higher precedence than infix ...`:
Comma operator precedence
infix ,
infix :
List infix precedence
infix Z
infix X
infix ...
"unofficial" Look at the corresponding page of a staging site for an improved doc site. (I don't know how up to date it is, but the central table appears to list operators by precedence order as it claims.)

Creating a Nested/Loop Calculation in Vertica (?)

So maybe I'm just way over-thinking things, but is there any way to replicate a nested/loop calculation in Vertica with just SQL syntax?
Explanation -
In column AP I have remaining values per month by an attribute key; in column CHANGE_1M I have an attribution value to apply.
The goal is to fill in the future AP values: each future row's AP is the preceding row's AP multiplied by that row's CHANGE_1M, added back to the preceding row's AP, carried forward row by row within each partition.
For reference, I have 15,000 keys per period and 60 periods per year in the full data set.
Sample Calculation
Period 5 =
(Period4_AP * Period5_CHANGE_1M)+Period4_AP
Period 6 =
(((Period4_AP * Period5_CHANGE_1M)+Period4_AP)*Period6_CHANGE_1M)
+
((Period4_AP * Period5_CHANGE_1M)+Period4_AP)
etc.
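Spelled out, each period's AP is the previous AP plus the previous AP times that period's CHANGE_1M, i.e. AP_n = AP_(n-1) * (1 + CHANGE_1M_n). A tiny Python sketch of the loop the SQL needs to emulate (the starting value and change factors below are made up):

# Carry AP forward period by period: AP_n = AP_(n-1) * (1 + CHANGE_1M_n)
ap = 1000.0                              # Period 4 AP (known starting value)
changes = {5: 0.10, 6: -0.05, 7: 0.02}   # CHANGE_1M per future period (illustrative)

for period in sorted(changes):
    ap = ap * changes[period] + ap       # same as ap * (1 + changes[period])
    print(period, round(ap, 2))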
Sample Data on Top
Expected Results below
Vertica does not have (yet?) the RECURSIVE WITH clause, which you would need for the recursive calculation you seem to need here.
The only possible workaround would be tedious: write (or generate, using Perl or Python, for example) as many nested queries as you need iterations.
I'll only detail this if you want to go down that path.
Long time no see - I should have returned to answer this question earlier.
I got so stuck on thinking of a programmatic way to solve this issue that I forgot it is a math equation, and where you have math functions you have solutions.
Basically this question revolves around doing table multiplication.
The solution is to simply use LOG/LN functions to multiply and convert back using EXP.
Snippet of the simple solve.
Hope this helps other lost souls: don't forget your math background, or you'll spiral into a whirlpool of self-defeat.
EXP(SUM(LN(DEGREDATION)) OVER (ORDER BY PERIOD_NUMBER ASC ROWS UNBOUNDED PRECEDING)) AS DEGREDATION_RATE
** Add a PARTITION BY clause for whatever factors/attributes you need the data stratified by.
Basically, instead of starting from the retention PX/P0, I back into it with the period-over-period degradation P1/P0, P2/P1, etc.
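The reason this works: a running product can be rewritten as the exponential of a running sum of logs, since ln(a*b) = ln(a) + ln(b). A quick Python check of the identity (standard library only; the factors below are arbitrary):

import math

degradation = [1.0, 0.5772, 0.6071, 0.7084, 0.7659]   # arbitrary period-over-period factors

running_product, running_log_sum = 1.0, 0.0
for d in degradation:
    running_product *= d
    running_log_sum += math.log(d)
    # exp of the cumulative sum of logs equals the cumulative product
    assert math.isclose(running_product, math.exp(running_log_sum))
    print(round(running_product, 6), round(math.exp(running_log_sum), 6))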
PERIOD_NUMBER   DEGRADATION   DEGREDATION_RATE   DEGREDATION_RATE x 100000
0               100.00%       100.00%            100000.00
1                57.72%        57.72%             57715.18
2                60.71%        35.04%             35036.59
3                70.84%        24.82%             24820.66
4                76.59%        19.01%             19009.17
5                79.29%        15.07%             15071.79
6                83.27%        12.55%             12550.59
7                82.08%        10.30%             10301.94
8                86.49%         8.91%              8910.59
9                89.60%         7.98%              7984.24
10               86.03%         6.87%              6868.79
11               86.00%         5.91%              5907.16
12               90.52%         5.35%              5347.00
13               91.89%         4.91%              4913.46
14               89.86%         4.41%              4414.99
15               91.96%         4.06%              4060.22
16               89.36%         3.63%              3628.28
17               90.63%         3.29%              3288.13
18               92.45%         3.04%              3039.97
19               94.95%         2.89%              2886.43
20               92.31%         2.66%              2664.40
21               92.11%         2.45%              2454.05
22               93.94%         2.31%              2305.32
23               89.66%         2.07%              2066.84
24               94.12%         1.95%              1945.26
25               95.83%         1.86%              1864.21
26               92.31%         1.72%              1720.81
27               96.97%         1.67%              1668.66
28               90.32%         1.51%              1507.18
29               90.00%         1.36%              1356.46
30               94.44%         1.28%              1281.10
31               94.12%         1.21%              1205.74
32              100.00%         1.21%              1205.74
33               90.91%         1.10%              1096.13
34               90.00%         0.99%               986.52
35               94.44%         0.93%               931.71
36              100.00%         0.93%               931.71

Regex - isolating string from larger word

The following regex within DB2 SQL works pretty well to get extra elements out of an address (i.e. not the street name or number). I'm limiting myself to two cases (UNIT or GATE) to keep the example simple; HAD1 is the field containing the first line of a street address:
select HAD1,
regexp_substr(HAD1,'(UNITS?|GATES?)\s[0-9A-Z]{1,}')
from ECH
where regexp_like(HAD1,'(UNIT|GATE)')
and length(trim(HAD1)) > 12
I get this:
Ship To Address Line 1          REGEXP_SUBSTR
UNIT 4, 117 MONTGOMORIE RD      UNIT 4
END OF WAINUI RD, HIGHGATE      -
UNIT 3, 37 TE ROTO DRIVE        UNIT 3
GATE 6 52 MAHIA ROAD            GATE 6
UNIT B 11 LANGSTONE LANE        UNIT B
ASHBURTON FITTINGS GATE 2       GATE 2
GOODS: PLACEMAKERS - WESTGATE   -
UNIT 3, 37 TE ROTO DRIVE        UNIT 3
ASHBURTON FITTINGS GATE 2       GATE 2
SH 8A TARRAS-LUGGATE HIGHWAY    GATE HIGHWAY
Which is very encouraging. It correctly didn't pick up HIGHGATE or WESTGATE because they weren't followed by a space then something else.
But it did pick up LUGGATE (last line), which I don't want. So, I'd like to be able to require that my target strings are not preceded by any character.
As you may guess I'm an absolute beginner with regex, so thank you for your patience.
Edit
Now I have my most excellent regex like so:
\b(GATE|LEVEL|DOOR|UNITS?)\s[\dA-Z]{1,}
Using it over a larger data set I notice the occasional unwanted match where, for instance, GATE is followed by an ordinary English word:
THE THIRD GATE ON THE LEFT = GATE ON
The gates, levels, doors and units that I'm looking for will always be followed by one of the following: (a) A number of up to 6 digits (b) One letter (c) A number and one letter, possibly with a dash
Examples:
UNIT 7A
GATE 6
GATE 31113
UNIT B
LEVEL B2
LEVEL 2B
UNIT D06
So, my follow-up question is: can I limit the number of letters in the second part of the expression to 0 or 1, but allow up to six digits?
I've played around with the numbers in curly brackets but they seem to affect only how many characters are returned rather than how many characters must be present.
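Not a full answer, but one way to tighten the second part is to spell out the allowed shapes as alternatives instead of a single [\dA-Z]{1,} class. Here is a rough sketch, tested with Python's re purely as a harness (the pattern should carry over to DB2's regexp functions, but verify it there; the word list mixes the original and the edited patterns):

import re

# Second part: digits (1-6), or letter+digits, or a single letter, with an optional dash;
# the trailing \b stops it from swallowing a following English word ("GATE ON").
pattern = re.compile(r'\b(?:UNITS?|GATES?|LEVEL|DOOR)\s(?:\d{1,6}-?[A-Z]?|[A-Z]-?\d{1,6}|[A-Z])\b')

tests = ["UNIT 7A", "GATE 6", "GATE 31113", "UNIT B", "LEVEL B2",
         "LEVEL 2B", "UNIT D06", "THE THIRD GATE ON THE LEFT",
         "SH 8A TARRAS-LUGGATE HIGHWAY"]
for t in tests:
    m = pattern.search(t)
    print(t, "->", m.group(0) if m else "-")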

Composite indexing using Redis in a hierarchical data model

I have a data model like this:
Fields:
counter number (e.g. 00888, 00777, 00123 etc)
counter code (e.g. XA, XD, ZA, SI etc)
start date (e.g. 2017-12-31 ...)
end date (e.g. 2017-12-31 ...)
Other counter date (e.g. xxxxx)
Current data structure organization is like this (root with multiple children format):
counter_num + counter_code
---> start_date + end_date --> xxxxxxxx
---> start_date + end_date --> xxxxxxxx
---> start_date + end_date --> xxxxxxxx
Example:
00888 + XA
---> Jan 10 + Jan 20 --> xxxxxxxx
---> Jan 21 + Jan 31 --> xxxxxxxx
---> Feb 01 + Dec 31 --> xxxxxxxx
00888 + ZI
---> Jan 09 + Feb 24 --> xxxxxxxx
---> Feb 25 + Dec 31 --> xxxxxxxx
00777 + XA
---> Jan 09 + Feb 24 --> xxxxxxxx
---> Feb 25 + Dec 31 --> xxxxxxxx
Today the retrieval happens in 2 ways:
//Fetch unique counter data using all the composite keys
counter_number + counter_code + date (start_date <= date <= end_date)
//Fetch all the counter codes and corresponding data matching the below conditions
counter_number + date (start_date <= date <= end_date)
What's the best way to model this in Redis, given that I need to cache some of the frequently hit data? I feel sorted sets should do this somehow, but I am unable to model it.
UPDATE:
Just to remove the confusion, the ask here is not for an SQL "BETWEEN"-like query, because I don't know what the start_date and end_date values are. Think of them as just column names.
What I don't want is
SELECT * FROM redis_db
WHERE counter_num AND
date_value BETWEEN start_date AND end_date
What I want is
SELECT * FROM redis_db
WHERE counter_num AND
start_date <= specifc_date AND end_date >= specific_date
NOTE: The requirement is pretty close to the 2D indexing proposed in the Redis multi-dimensional indexing document:
https://redis.io/topics/indexes#multi-dimensional-indexes
I understood the concept but am unable to digest the implementation detail that is given.
I'm unlikely to get this done in time for the bounty, but what the hell...
This sounds like a job for geohashing. Geohashing is what you do when you want to index a 2-dimensional (or higher) dataset. For example, if you have a database of cities and you want to be able to quickly respond to queries like "find all the cities within 50km of X", you use geohashing.
For the purposes of this question, you can think of start_date and end_date as x and y coordinates. Normally in geohashing you're searching for points in your dataset near a particular point in space, or in a certain bounded region of space. In this case you just have a lower bound on one of the coordinates and an upper bound on the other one. But I suppose in practice the whole dataset is bounded anyway, so that's not a problem.
It would be nice if there was a library for doing this in Redis. There probably is, if you look hard enough. The newer versions of Redis have built-in geohashing functionality. See the commands starting with GEO. But it doesn't claim to be very accurate, and it's designed for the surface of a sphere rather than a flat surface.
So as far as I can see you have 3 options:
Map your search space to a small part of the sphere, preferably near the equator. Use the Redis GEO commands. To search, use GEORADIUS on a circle covering the triangle you're trying to search, taking into account the inbuilt inaccuracy and the distortion you get by mapping onto the sphere, then filter the results to get the ones that are actually inside the triangle.
Find some 3rd-party geohashing client for Redis which works on flat space and is more accurate than GEO.
Read the rest of this answer, or some other primer on geohashing, then implement it yourself on top of Redis. This is the hardest (but most educational) option.
If you have a database that indexes data using a numerical ordering, such that you can do queries like "find all the rows/records for which z is between a and b", you can build a geohash index on top of it. Suppose the coordinates are (non-negative) integers x and y. Then you add an integer-valued column z, and index by z. To calculate z, write x and y in binary, then take alternate digits from each. Example:
x = 969 = 0 1 1 1 1 0 0 1 0 0 1
y = 1130 = 1 0 0 0 1 1 0 1 0 1 0
z = 1750214 = 0110101011010011000110
Note that the index allows you to find, for example, all records positioned with z between 0101100000000000000000 and 0101101111111111111111 inclusive. In other words, all records for which z starts with 010110. Or to put it another way, you can find all records for which x starts with 001 and y starts with 110. This set of records corresponds to a square in the 2-dimensional space we are trying to search.
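Here is a minimal Python sketch of that interleaving (the function name and the fixed 11-bit width are my own choices), reproducing the example above:

def interleave(x: int, y: int, bits: int = 11) -> int:
    """Build z by alternating the bits of x and y, most significant first."""
    z = 0
    for i in reversed(range(bits)):
        z = (z << 1) | ((x >> i) & 1)   # next bit of x
        z = (z << 1) | ((y >> i) & 1)   # next bit of y
    return z

print(interleave(969, 1130))        # 1750214
print(bin(interleave(969, 1130)))   # 0b110101011010011000110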
Not all squares can be searched in this way. We'll call these ones searchable squares. Suppose the client sends a request for all records for which (x,y) is inside a particular rectangle. (Or a circle, or some other reasonable geometric shape.) Then you need to find a set of searchable squares which cover the rectangle. Then, for each of these squares you've chosen, query the database for records inside that square and send the results to the client. (But you'll have to filter the results, because not all the records in the square are actually in the original rectangle.)
There's a balance to be struck. If you choose a small number of large searchable squares, you'll probably end up covering a much larger area of the map than you need; the query to the database will return lots of extra results that you'll have to filter out. Alternatively, if you use lots of little searchable squares, you'll be doing lots of queries to the database, many of which will return no results.
I said above that x and y could be start_time and end_time. But actually the distribution of your dataset won't be as symmetrical as in most uses of geohashing. So the performance might be better (or worse) if you use x = end_time + start_time and y = end_time - start_time.
Because your question remains a bit vague on how you want to query your data, it is unclear how best to answer it. With that in mind, however, here are my thoughts on how I might model your data:
Updated answer, detailing how to use SORTED SET
I have edited this answer to show how to store your values in a way that lets you query by dynamic date ranges. This edit assumes that your database values are timestamps, i.e. each value is for a single point in time, not two, as in your current setup.
Yes, you are correct that using Sorted Sets will be able to accomplish this. I suggest that you always use a Unix timestamp value for the score component in these sorted sets.
In case you are not already familiar with Redis, let's explain its indexing limitations. Redis is a simple key-value store designed to quickly retrieve values by a key. Because of this design, it does not contain many features of a traditional DBMS, like indexing a column, for instance.
In Redis, you accomplish indexing by using a key, and the most nested key-like structures are available in HASH and SORTED SET, but you only get two key-like levels. In a HASH, you have the key (same as any data type) and an inner hash key, which can take the form of any string.
In a SORTED SET, you have the key (same as any data type), and a numeric value.
A HASH is nice to use to keep a grouped data organized.
A SORTED SET is nice if you want to query by a range of values. This could be a good fit for your data.
Your SORTED SET would look like the following:
key
00888:XA =>
score (date value)          value
1452427200 (2016-01-10)     xxxxxxxx
1452859200 (2016-01-15)     yyyyxxxx
1453291200 (2016-01-20)     zzzzxxxx
Let's use a more intuitive example, the 2017 Juventus roster:
To produce the SORTED SET in the table below, issue this command in your redis client:
ZADD JUVENTUS 32 "Emil Audero" 1 "Gianluigi Buffon" 42 "Mattia Del Favero" 36 "Leonardo Loria" 25 "Neto" 15 "Andrea Barzagli" 4 "Medhi Benatia" 19 "Leonardo Bonucci" 3 "Giorgio Chiellini" 40 "Luca Coccolo" 29 "Paolo De Ceglie" 26 "Stephan Lichtsteiner" 12 "Alex Sandro" 24 "Daniele Rugani" 43 "Alessandro Semprini" 23 "Dani Alves" 22 "Kwadwo Asamoah" 7 "Juan Cuadrado" 6 "Sami Khedira" 18 "Mario Lemina" 46 "Mehdi Leris" 38 "Rolando Mandragora" 8 "Claudio Marchisio" 14 "Federico Mattiello" 45 "Simone Muratore" 20 "Marko Pjaca" 5 "Miralem Pjanic" 28 "Tomás Rincón" 27 "Stefano Sturaro" 21 "Paulo Dybala" 9 "Gonzalo Higuaín" 34 "Moise Kean" 17 "Mario Mandzukic"
Jersey Name Jersey Name
32 Emil Audero 23 Dani Alves
1 Gianluigi Buffon 42 Mattia Del Favero
36 Leonardo Loria 25 Neto
15 Andrea Barzagli 4 Medhi Benatia
19 Leonardo Bonucci 3 Giorgio Chiellini
40 Luca Coccolo 29 Paolo De Ceglie
26 Stephan Lichtsteiner 12 Alex Sandro
24 Daniele Rugani 43 Alessandro Semprini
22 Kwadwo Asamoah 7 Juan Cuadrado
6 Sami Khedira 18 Mario Lemina
46 Mehdi Leris 38 Rolando Mandragora
8 Claudio Marchisio 14 Federico Mattiello
45 Simone Muratore 20 Marko Pjaca
5 Miralem Pjanic 28 Tomás Rincón
27 Stefano Sturaro 21 Paulo Dybala
9 Gonzalo Higuaín 34 Moise Kean
17 Mario Mandzukic
To query the roster by a range of jersey numbers:
ZRANGEBYSCORE JUVENTUS 1 5
Output:
1) "Gianluigi Buffon"
2) "Giorgio Chiellini"
3) "Medhi Benatia"
4) "Miralem Pjanic"
Note that the scores are not returned; however, the ZRANGEBYSCORE command orders the results in ASC order by score.
To add the scores, append "WITHSCORES" to the command, like so: ZRANGEBYSCORE JUVENTUS 1 5 WITHSCORES
By using ZRANGEBYSCORE, you should be able to query any key (counter number + counter code) with a date range,
producing the values in that range.
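Mapping that back onto the counter data, a rough sketch with the redis-py client (the key name, timestamps and payloads below are illustrative only, and this follows the single-timestamp assumption above):

import redis

r = redis.Redis()

# One sorted set per counter_number:counter_code; score = Unix timestamp of the entry.
r.zadd("00888:XA", {"xxxxxxxx": 1452427200,
                    "yyyyxxxx": 1452859200,
                    "zzzzxxxx": 1453291200})

# Fetch everything for 00888:XA within a date range (inclusive).
start_ts, end_ts = 1452427200, 1452900000
print(r.zrangebyscore("00888:XA", start_ts, end_ts, withscores=True))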
Original: Below is my original answer, recommending HASH
Based on your examples, I recommend you use a HASH.
With a hash, you would have a main key to find the hash (Ex. 00888:XA). Then within the hash, you have key -> value pairs (Ex. 2017-01-10:2017-01-20 -> xxxxxxxx). I prefer to delimit or tokenize my keys' components with the colon char :, but you can use any delimiter.
HASH follows your example data structure very well:
key
00888:XA =>
hashkey value
2017-01-10:2017-01-20 xxxxxxxx
2017-01-21:2017-01-31 yyyyxxxx
2016-02-01:2016-12-31 zzzzxxxx
key
00888:ZI =>
hashkey value
2017-01-10:2017-01-20 xxxxxxxx
2017-01-21:2017-01-31 xxxxyyyy
2016-02-01:2016-12-31 xxxxzzzz
When querying for data, instead of GET key, you would query with HGET key hashkey. Same for setting values, instead of SET key value, use HSET key hashkey value.
Example commands
HSET 00777:XA 2017-01-10:2017-01-20 xxxxxxxx
HSET 00777:XA 2017-01-21:2017-01-31 yyyyyyyy
HSET 00777:XA 2016-02-01:2016-12-31 zzzzzzzz
(Note: there is also an HMSET command to simplify this into a single command)
Then:
HGET 00777:XA 2017-01-21:2017-01-31
Would return yyyyyyyy
Unless there is some specific performance consideration, or other goal for your data, I think Hashes will work great for your system.
It's also very convenient if you want to get all hashkeys or all values for a given hash, using commands like HKEYS, HVALS, or HGETALL.

How to find the "lexical file" in WordNet?

If you look at the original WordNet search and select "Display options: Show Lexical File Info", you'll see an extremely useful classification of words called the lexical file. E.g. for "filling" we have:
<noun.substance>S: (n) filling, fill (any material that fills a space or container)
<noun.process>S: (n) filling (flow into something (as a container))
<noun.food>S: (n) filling (a food mixture used to fill pastry or sandwiches etc.)
<noun.artifact>S: (n) woof, weft, filling, pick (the yarn woven across the warp yarn in weaving)
<noun.artifact>S: (n) filling ((dentistry) a dental appliance consisting of ...)
<noun.act>S: (n) filling (the act of filling something)
The first thing in brackets is the "lexical file". Unfortunately I have not been able to find a SPARQL endpoint that provides this info.
The latest RDF translation of Wordnet 3.0 points to two things:
Talis SPARQL endpoint. Use eg this query to check there's no such info:
DESCRIBE <http://purl.org/vocabularies/princeton/wn30/synset-chair-noun-1>
W3C's mapping description. Appendix D "Conversion details" describes something useful: wn:classifiedByTopic.
But it's not the same as lexical file, and is quite incomplete. Eg "chair" has nothing, while one of the senses of "completion" is in the topic "American Football"
DESCRIBE <http://purl.org/vocabularies/princeton/wn30/synset-completion-noun-1> ->
<j.1:classifiedByTopic rdf:resource="http://purl.org/vocabularies/princeton/wn30/synset-American_football-noun-1"/>
The question: is there a public Wordnet query API, or a database, that provides the lexical file information?
Using the Python NLTK interface:
from nltk.corpus import wordnet as wn

for synset in wn.synsets('can'):
    print(synset.lexname())   # prints the lexical file ("lexname"), e.g. noun.artifact
I don't think you can find it in the RDF/OWL Representation of WordNet. It's in the WordNet distribution though: dict/lexnames. Here is the content of the file as of WordNet 3.0:
00 adj.all 3
01 adj.pert 3
02 adv.all 4
03 noun.Tops 1
04 noun.act 1
05 noun.animal 1
06 noun.artifact 1
07 noun.attribute 1
08 noun.body 1
09 noun.cognition 1
10 noun.communication 1
11 noun.event 1
12 noun.feeling 1
13 noun.food 1
14 noun.group 1
15 noun.location 1
16 noun.motive 1
17 noun.object 1
18 noun.person 1
19 noun.phenomenon 1
20 noun.plant 1
21 noun.possession 1
22 noun.process 1
23 noun.quantity 1
24 noun.relation 1
25 noun.shape 1
26 noun.state 1
27 noun.substance 1
28 noun.time 1
29 verb.body 2
30 verb.change 2
31 verb.cognition 2
32 verb.communication 2
33 verb.competition 2
34 verb.consumption 2
35 verb.contact 2
36 verb.creation 2
37 verb.emotion 2
38 verb.motion 2
39 verb.perception 2
40 verb.possession 2
41 verb.social 2
42 verb.stative 2
43 verb.weather 2
44 adj.ppl 3
For each entry of dict/data.*, the second number is the lexical file number. For example, this filling entry contains the number 13, which is noun.food.
07883031 13 n 01 filling 0 002 # 07882497 n 0000 ~ 07883156 n 0000 | a food mixture used to fill pastry or sandwiches etc.
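If you want to resolve that number programmatically, here is a minimal sketch (the dict path below is an assumption; point it at your own WordNet installation):

from pathlib import Path

dict_dir = Path("/usr/local/WordNet-3.0/dict")   # hypothetical install location

# Build lex_filenum -> lexical file name from dict/lexnames.
lexnames = {}
for line in (dict_dir / "lexnames").read_text().splitlines():
    num, name, _pos = line.split()
    lexnames[int(num)] = name

# The second field of a data.* entry is the lexical file number.
entry = "07883031 13 n 01 filling 0 002 ..."
print(lexnames[int(entry.split()[1])])   # noun.food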
It can be done through MIT JWI (MIT Java Wordnet Interface), a Java API to query WordNet. There's a topic in this link showing how to implement a Java class to access the lexicographic file information.
This is what worked for me:
Synset[] synsets = database.getSynsets(wordStr);
ReferenceSynset referenceSynset = (ReferenceSynset) synsets[i];
int lexicalCode = referenceSynset.getLexicalFileNumber();
Then use the table above to deduce the "lexnames", e.g. noun.time.
If you're on Windows, chances are it is in your AppData directory. To get there, open your file browser, go to the address bar at the top, and type in %appdata%
Next click on Roaming, and then find the nltk_data directory. In there, you will have your corpora directory. The full path is something like:
C:\Users\yourname\AppData\Roaming\nltk_data\corpora
and lexnames will be present under
C:\Users\yourname\AppData\Roaming\nltk_data\corpora\wordnet.