I currently have a system which based on certain answers provided by a user certain sections are shown. currently I had it designed as:
Id---SectionId---QuestionId---Answer
for example, if User answered Y to question 12 I would enable Section 21. If they answered Y to both 13 AND 14 then Section 22 would be enabled.
1 --- 21 --- 12 --- Y
2 --- 22 --- 13 --- Y
2 --- 22 --- 14 --- Y
This was working fine, recently I started receiving request which are based on an OR condition. I was thinking of changing the table to the below design, but wanted feedback, if there is a better way to approach this:
Id---SectionId---ConditionalId
1---21---12
2---21---13
ConditionalId---QuestionId---Answer
12---4---Y
12---5---N
13---6---Y
13---7---Y
So this way for Section 21 to display the system would have to verify either ConditiondalId 12 or 13, if either is valid then display.
You should not save ConditionalId with the answer of the user.
UserAnswers table:
UserID - QuestionId - Answer
Question Table:
QuestionId - NextLevelId - ChildLevelCondition (0 for any, 1 for all)
Each question Id can tell from ChildLevelCondition how it should check the level below for Y and N.
If at the child you don't have as many Y as question and ChildLevelCondition = 1, it is not allowed. With at least 1 Y and ChildLevelCondition = 0, you can go to this level
You can use a group by and a case
Related
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 1 year ago.
Improve this question
So, let's say I have 5 items, A, B, C, D and E. Item A comes in sizes 1 and 2, item B comes in sizes 2 and 3, C comes in 1 and 3, D comes in 1 and E comes in 3. Now, I am considering 2 table options, as follow:
Table 1
Name
Size
A
1
A
2
B
2
B
3
C
1
C
3
D
1
E
3
Another option is Table 2, as follows:
Name
A1
A2
B2
B3
C1
C3
D1
E3
Now, which of these 2 tables is actually a better option? What are the advantages and disadvantages (if any) of each of the 2 tables above? One thing that I can think of is that, if I use table 1, I can easily extract all items by size, no matter what item I want. So, for instance, if I want to analyze this month's sales of items of size 1, it's easy to do it with Table 1. I can't seem to see the same advantage if I use table 2. What do you guys think? Please kindly enlighten me on this matter. Thank you in advance for your kind assistance, everyone. Cheers! :)
I don't even understand why you have the second table option - what purpose does it have or how does it help you? Plain and simple you have a one to many relationship. That is an item comes in 1 or more different sizes. You just saying that sentence should scream ONLY option 1. Option 2 will make your life a living hell because you are going against normalization guidelines by taking 2 datatypes into 1, and it has no real benefit.
Option 1 says I have an item and it can have one or more sizes associated with it.
Item Size
A 1
A 2
A 3
B 1
C 1
C 2
Then you can do simple queries like give me all items that have more then 1 size. Give me any item that only has 1 size. Give me all the sizes of item with item id A, etc.
I have a column using bits to record status of every mission. The index of bits represents the number of mission while 1/0 indicates if this mission is successful and all bits are logically isolated although they are put together.
For instance: 1010 is stored in decimal means a user finished the 2nd and 4th mission successfully and the table looks like:
uid status
a 1100
b 1111
c 1001
d 0100
e 0011
Now I need to calculate: for every mission, how many users passed this mission. E.g.: for mission1: it's 0+1+1+0+1 = 5 while for mission2, it's 0+1+0+0+1 = 2.
I can use a formula FLOOR(status%POWER(10,n)/POWER(10,n-1)) to get the bit of every mission of every user, but actually this means I need to run my query by n times and now the status is 64-bit long...
Is there any elegant way to do this in one query? Any help is appreciated....
The obvious approach is to normalise your data:
uid mission status
a 1 0
a 2 0
a 3 1
a 4 1
b 1 1
b 2 1
b 3 1
b 4 1
c 1 1
c 2 0
c 3 0
c 4 1
d 1 0
d 2 0
d 3 1
d 4 0
e 1 1
e 2 1
e 3 0
e 4 0
Alternatively, you can store a bitwise integer (or just do what you're currently doing) and process the data in your application code (e.g. a bit of PHP)...
uid status
a 12
b 15
c 9
d 4
e 3
<?php
$input = 15; // value comes from a query
$missions = array(1,2,3,4); // not really necessary in this particular instance
for( $i=0; $i<4; $i++ ) {
$intbit = pow(2,$i);
if( $input & $intbit ) {
echo $missions[$i] . ' ';
}
}
?>
Outputs '1 2 3 4'
Just convert the value to a string, remove the '0's, and calculate the length. Assuming that the value really is a decimal:
select length(replace(cast(status as char), '0', '')) as num_missions as num_missions
from t;
Here is a db<>fiddle using MySQL. Note that the conversion to a string might look a little different in Hive, but the idea is the same.
If it is stored as an integer, you can use the the bin() function to convert an integer to a string. This is supported in both Hive and MySQL (the original tags on the question).
Bit fiddling in databases is usually a bad idea and suggests a poor data model. Your data should have one row per user and mission. Attempts at optimizing by stuffing things into bits may work sometimes in some programming languages, but rarely in SQL.
I have a data model like this:
Fields:
counter number (e.g. 00888, 00777, 00123 etc)
counter code (e.g. XA, XD, ZA, SI etc)
start date (e.g. 2017-12-31 ...)
end date (e.g. 2017-12-31 ...)
Other counter date (e.g. xxxxx)
Current Datastructure organization is like this (root and multiple child format):
counter_num + counter_code
---> start_date + end_date --> xxxxxxxx
---> start_date + end_date --> xxxxxxxx
---> start_date + end_date --> xxxxxxxx
Example:
00888 + XA
---> Jan 10 + Jan 20 --> xxxxxxxx
---> Jan 21 + Jan 31 --> xxxxxxxx
---> Feb 01 + Dec 31 --> xxxxxxxx
00888 + ZI
---> Jan 09 + Feb 24 --> xxxxxxxx
---> Feb 25 + Dec 31 --> xxxxxxxx
00777 + XA
---> Jan 09 + Feb 24 --> xxxxxxxx
---> Feb 25 + Dec 31 --> xxxxxxxx
Today the retrieval happens in 2 ways:
//Fetch unique counter data using all the composite keys
counter_number + counter_code + date (start_date <= date <= end_date)
//Fetch all the counter codes and corresponding data matching the below conditions
counter_number + date (start_date <= date <= end_date)
What's the best way to model this in redis as I need to cache some of the frequently hit data. I feel sorted sets should do this somehow, but unable to model it.
UPDATE:
Just to remove the confusion, the ask here is not for an SQL "BETWEEN" like query. 'Coz I don't know what the start_date and end_date values are. Think they are just column names.
What I don't want is
SELECT * FROM redis_db
WHERE counter_num AND
date_value BETWEEN start_date AND end_date
What I want is
SELECT * FROM redis_db
WHERE counter_num AND
start_date <= specifc_date AND end_date >= specific_date
NOTE: The requirement is pretty much close to 2D indexing of what is proposed in Redis multi-dimensional indexing document
https://redis.io/topics/indexes#multi-dimensional-indexes
I understood the concept but unable to digest the implementation detail that is given.
I'm unlikely to get this done in time for the bounty, but what the hell...
This sounds like a job for geohashing. Geohashing is what you do when you want to index a 2-dimensional (or higher) dataset. For example, if you have a database of cities and you want to be able to quickly respond to queries like "find all the cities within 50km of X", you use geohashing.
For the purposes of this question, you can think of start_date and end_date as x and y coordinates. Normally in geohashing you're searching for points in your dataset near a particular point in space, or in a certain bounded region of space. In this case you just have a lower bound on one of the coordinates and an upper bound on the other one. But I suppose in practice the whole dataset is bounded anyway, so that's not a problem.
It would be nice if there was a library for doing this in Redis. There probably is, if you look hard enough. The newer versions of Redis have built-in geohashing functionality. See the commands starting with GEO. But it doesn't claim to be very accurate, and it's designed for the surface of a sphere rather than a flat surface.
So as far as I can see you have 3 options:
Map your search space to a small part of the sphere, preferably near the equator. Use the Redis GEO commands. To search, use GEOSPHERE on a circle covering the triangle you're trying to search, taking into account the inbuilt inaccuracy and the distortion you get by mapping onto the sphere, then filter the results to get the ones that are actually inside the triangle.
Find some 3rd-party geohashing client for Redis which works on flat space and is more accurate than GEO.
Read the rest of this answer, or some other primer on geohashing, then implement it yourself on top of Redis. This is the hardest (but most educational) option.
If you have a database that indexes data using a numerical ordering, such that you can do queries like "find all the rows/records for which z is between a and b", you can build a geohash index on top of it. Suppose the coordinates are (non-negative) integers x and y. Then you add an integer-valued column z, and index by z. To calculate z, write x and y in binary, then take alternate digits from each. Example:
x = 969 = 0 1 1 1 1 0 0 1 0 0 1
y = 1130 = 1 0 0 0 1 1 0 1 0 1 0
z = 1750214 = 0110101011010011000110
Note that the index allows you to find, for example, all records positioned with z between 0101100000000000000000 and 0101101111111111111111 inclusive. In other words, all records for which z starts with 010110. Or to put it another way, you can find all records for which x starts with 001 and y starts with 110. This set of records corresponds to a square in the 2-dimensional space we are trying to search.
Not all squares can be searched in this way. We'll call these ones searchable squares. Suppose the client sends a request for all records for which (x,y) is inside a particular rectangle. (Or a circle, or some other reasonable geometric shape.) Then you need to find a set of searchable squares which cover the rectangle. Then, for each of these squares you've chosen, query the database for records inside that square and send the results to the client. (But you'll have to filter the results, because not all the records in the square are actually in the original rectangle.)
There's a balance to be struck. If you choose a small number of large special squares, you'll probably end up covering a much larger area of the map than you need; the query to the database will return lots of extra results that you'll have to filter out. Alternatively, if you use lots of little special squares, you'll be doing lots of queries to the database, many of which will return no results.
I said above that x and y could be start_time and end_time. But actually the distribution of your dataset won't be as symmetrical as in most uses of geohashing. So the performance might be better (or worse) if you use x = end_time + start_time and y = end_time - start_time.
Because your question remains a bit vague on how you desire to query your data, it remains unclear on how to solve your question. With that in mind, however, here are my thoughts on how I might model your data:
Updated answer, detailing how to use SORTED SET
I have edited this answer to be able to store your values in a way that you can query by dynamic date ranges. This edit assumes that your database values are timestamps, as in the value is for a single time, not 2, as in your current setup.
Yes, you are correct that using Sorted Sets will be able to accomplish this. I suggest that you always use a Unix timestamp value for the score component in these sorted sets.
In case you were not already familiar with redis, let's explain indexing limitations. Redis is a simple key-value designed to quickly retrieve values by a key. Because of this design, it does not contain many features of your traditional DBMS, like indexing a column for instance.
In redis, you accomplish indexing by using a key, and the most nested key-like structures are available in HASH and SORTED SET, but you only get 2 key-like structures. In a HASH, you have the key (same as any data type), and a inner hash key, which can take the form of any string.
In a SORTED SET, you have the key (same as any data type), and a numeric value.
A HASH is nice to use to keep a grouped data organized.
A SORTED SET is nice if you want to query by a range of values. This could be a good fit for your data.
Your SORTED SET would look like the following:
key
00888:XA =>
score (date value) value
1452427200 (2016-01-10) xxxxxxxx
1452859200 (2016-01-10) yyyyxxxx
1453291200 (2016-01-10) zzzzxxxx
Let's use a more intuitive example, the 2017 Juventus roster:
To produce the SORTED SET in the table below, issue this command in your redis client:
ZADD JUVENTUS 32 "Emil Audero" 1 "Gianluigi Buffon" 42 "Mattia Del Favero" 36 "Leonardo Loria" 25 "Neto" 15 "Andrea Barzagli" 4 "Medhi Benatia" 19 "Leonardo Bonucci" 3 "Giorgio Chiellini" 40 "Luca Coccolo" 29 "Paolo De Ceglie" 26 "Stephan Lichtsteiner" 12 "Alex Sandro" 24 "Daniele Rugani" 43 "Alessandro Semprini" 23 "Dani Alves" 22 "Kwadwo Asamoah" 7 "Juan Cuadrado" 6 "Sami Khedira" 18 "Mario Lemina" 46 "Mehdi Leris" 38 "Rolando Mandragora" 8 "Claudio Marchisio" 14 "Federico Mattiello" 45 "Simone Muratore" 20 "Marko Pjaca" 5 "Miralem Pjanic" 28 "Tomás Rincón" 27 "Stefano Sturaro" 21 "Paulo Dybala" 9 "Gonzalo Higuaín" 34 "Moise Kean" 17 "Mario Mandzukic"
Jersey Name Jersey Name
32 Emil Audero 23 Dani Alves
1 Gianluigi Buffon 42 Mattia Del Favero
36 Leonardo Loria 25 Neto
15 Andrea Barzagli 4 Medhi Benatia
19 Leonardo Bonucci 3 Giorgio Chiellini
40 Luca Coccolo 29 Paolo De Ceglie
26 Stephan Lichtsteiner 12 Alex Sandro
24 Daniele Rugani 43 Alessandro Semprini
22 Kwadwo Asamoah 7 Juan Cuadrado
6 Sami Khedira 18 Mario Lemina
46 Mehdi Leris 38 Rolando Mandragora
8 Claudio Marchisio 14 Federico Mattiello
45 Simone Muratore 20 Marko Pjaca
5 Miralem Pjanic 28 Tomás Rincón
27 Stefano Sturaro 21 Paulo Dybala
9 Gonzalo Higuaín 34 Moise Kean
17 Mario Mandzukic
To query the roster by a range of jersey numbers:
ZRANGEBYSCORE JUVENTUS 1 5
Output:
1) "Gianluigi Buffon"
2) "Giorgio Chiellini"
3) "Medhi Benatia"
4) "Miralem Pjanic"
Note that the scores are not returned, however ZRANGEBYSCORE command orders the results in ASC order by score.
To add the scores, append "WITHSCORES" to the command, like so: ZRANGEBYSCORE JUVENTUS 1 5 WITHSCORES
By using ZRANGEBYSCORE, you should be able to query any key (counter number + counter code) with a date range,
producing the values in that range.
Original: Below is my original answer, recommending HASH
Based on your examples, I recommend you use a HASH.
With a hash, you would have a main key to find the hash (Ex. 00888:XA). Then within the hash, you have key -> value pairs (Ex. 2017-01-10:2017-01-20 -> xxxxxxxx). I prefer to delimit or tokenize my keys' components with the colon char :, but you can use any delimiter.
HASH follows your example data structure very well:
key
00888:XA =>
hashkey value
2017-01-10:2017-01-20 xxxxxxxx
2017-01-21:2017-01-31 yyyyxxxx
2016-02-01:2016-12-31 zzzzxxxx
key
00888:ZI =>
hashkey value
2017-01-10:2017-01-20 xxxxxxxx
2017-01-21:2017-01-31 xxxxyyyy
2016-02-01:2016-12-31 xxxxzzzz
When querying for data, instead of GET key, you would query with HGET key hashkey. Same for setting values, instead of SET key value, use HSET key hashkey value.
Example commands
HSET 00777:XA 2017-01-10:2017-01-20 xxxxxxxx
HSET 00777:XA 2017-01-21:2017-01-31 yyyyyyyy
HSET 00777:XA 2016-02-01:2016-12-31 zzzzzzzz
(Note: there is also a HMSET to simplify this into a single command)
Then:
HGET 00777:XA 2017-01-21:2017-01-31
Would return yyyyyyyy
Unless there is some specific performance consideration, or other goal for your data, I think Hashes will work great for your system.
It's also very convenient if you want to get all hashkeys or all values for a given hash, using commands like HKEYS, HVALS, or HGETALL.
I wasnt able to find anything like this yet... but here is what i need to do:
I have a query result like this:
ID Data1 Data2 Data3 Data4 ... Data7
1 12 13 15 1 ... 12
2 12 13 15 1 ... 12
3 12 13 15 1 ... 12
4 12 13 15 1 ... 12
I need to make a BarChart With 2 Values, 1 is the first row (ID=1) one is the last row (ID=4). The column headers DataX is what i need the series to be paired by.
Example:
ID Insured Uninsured Rejected
1 12 3 0
4 16 9 2
In the BarChart i need to see the number of insured or ID=1 and ID=2 next to each other, the number of Uninsured and rejected the same.
I feel like i have tried all ways possible but was not able to get anything besides a BarChart where all values of ID=1 where displayed and then all values for ID=2 where displayed next to each other.
Im sure this was a very confusing way to describe it, but i hope someone can understand what i am looking for.
NOTE: I tried to do this in Excel, and it worked within 2 minutes. I set the filter: Series on the 2 rows that i wanted, and set the Categories to the dataX Columns as described, and everything looked great. When i tried to translate this into SSRS i was able to do all the same things in the Series and Categories, but then i had to put in values and that screwed everything up.
PLEASE HELP!
I bet you need to add a grouping to your values by a spanning factor.
If you look at the original Wordnet search and select "Display options: Show Lexical File Info", you'll see an extremely useful classification of words called lexical file. Eg for "filling" we have:
<noun.substance>S: (n) filling, fill (any material that fills a space or container)
<noun.process>S: (n) filling (flow into something (as a container))
<noun.food>S: (n) filling (a food mixture used to fill pastry or sandwiches etc.)
<noun.artifact>S: (n) woof, weft, filling, pick (the yarn woven across the warp yarn in weaving)
<noun.artifact>S: (n) filling ((dentistry) a dental appliance consisting of ...)
<noun.act>S: (n) filling (the act of filling something)
The first thing in brackets is the "lexical file". Unfortunately I have not been able to find a SPARQL endpoint that provides this info
The latest RDF translation of Wordnet 3.0 points to two things:
Talis SPARQL endpoint. Use eg this query to check there's no such info:
DESCRIBE <http://purl.org/vocabularies/princeton/wn30/synset-chair-noun-1>
W3C's mapping description. Appendix D "Conversion details" describes something useful: wn:classifiedByTopic.
But it's not the same as lexical file, and is quite incomplete. Eg "chair" has nothing, while one of the senses of "completion" is in the topic "American Football"
DESCRIBE <http://purl.org/vocabularies/princeton/wn30/synset-completion-noun-1> ->
<j.1:classifiedByTopic rdf:resource="http://purl.org/vocabularies/princeton/wn30/synset-American_football-noun-1"/>
The question: is there a public Wordnet query API, or a database, that provides the lexical file information?
Using the Python NLTK interface:
from nltk.corpus import wordnet as wn
for synset in wn.synsets('can'):
print synset.lexname
I don't think you can find it in the RDF/OWL Representation of WordNet. It's in the WordNet distribution though: dict/lexnames. Here is the content of the file as of WordNet 3.0:
00 adj.all 3
01 adj.pert 3
02 adv.all 4
03 noun.Tops 1
04 noun.act 1
05 noun.animal 1
06 noun.artifact 1
07 noun.attribute 1
08 noun.body 1
09 noun.cognition 1
10 noun.communication 1
11 noun.event 1
12 noun.feeling 1
13 noun.food 1
14 noun.group 1
15 noun.location 1
16 noun.motive 1
17 noun.object 1
18 noun.person 1
19 noun.phenomenon 1
20 noun.plant 1
21 noun.possession 1
22 noun.process 1
23 noun.quantity 1
24 noun.relation 1
25 noun.shape 1
26 noun.state 1
27 noun.substance 1
28 noun.time 1
29 verb.body 2
30 verb.change 2
31 verb.cognition 2
32 verb.communication 2
33 verb.competition 2
34 verb.consumption 2
35 verb.contact 2
36 verb.creation 2
37 verb.emotion 2
38 verb.motion 2
39 verb.perception 2
40 verb.possession 2
41 verb.social 2
42 verb.stative 2
43 verb.weather 2
44 adj.ppl 3
For each entry of dict/data.*, the second number is the lexical file info. For example, this filling entry contains the number 13, which is noun.food.
07883031 13 n 01 filling 0 002 # 07882497 n 0000 ~ 07883156 n 0000 | a food mixture used to fill pastry or sandwiches etc.
It can be done through MIT JWI (MIT Java Wordnet Interface) a Java API to query Wordnet. There's a topic in this link showing how to implement a java class to access lexicographic
This is what worked for me,
Synset[] synsets = database.getSynsets(wordStr);
ReferenceSynset referenceSynset = (ReferenceSynset) synsets[i];
int lexicalCode =referenceSynset.getLexicalFileNumber();
Then use above table to deduce "lexnames" e.g. noun.time
If you're on Windows, chances are it is in your appdata, in the local directory. To get there, you will want to open your file browser, go to the top, and type in %appdata%
Next click on roaming, and then find the nltk_data directory. In there, you will have your corpora file. The full path is something like:
C:\Users\yourname\AppData\Roaming\nltk_data\corpora
and lexnames will present under
C:\Users\yourname\AppData\Roaming\nltk_data\corpora\wordnet.