Below I have summarized how my outputVar should behave with DecisionVar:
Entity  EntityValue  DecisionVar1  OutputVar1   DecisionVar2  OutputVar2
A       5            1             5 (base)     1             5 (base)
B       2            1             5 (=prev)    1             5 (=prev)
C       3            1             5 (=prev)    2             5 (3+2)
D       4            2             9 (4+3+2)    2             5 (=prev)
Scenario 1:
Since A, B, and C are all allotted 1, each of their outputVar values is set to the base (=5), while D's is the sum of the remaining values (4+3+2 = 9).
Scenario 2:
Since A and B are allotted 1, their outputVar is set to the base, the same as A; for C, the value is the sum of the previous remaining values (3+2), and since D is allotted the same as C, its outputVar is set the same as C's.
Context: if certain entities are grouped together, we are trying to constrain the time allotted to process those entities. For a group of entities (other than the base), the time remaining is the time from the first entity of the previous group to the first entity of the current group.
One of the easier ways to do this is to redefine your problem in terms of the groups of entities and model it as a set partitioning / set covering problem.
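As a rough sketch in my own notation: with a binary variable x_g selecting each candidate group g from a pool G, and c_g the processing time (cost) of group g, a set partitioning model over the entity set E is

\min \sum_{g \in G} c_g x_g \quad \text{s.t.} \quad \sum_{g \ni e} x_g = 1 \;\;\forall e \in E, \qquad x_g \in \{0,1\} \;\;\forall g \in G

Relaxing the equality to >= 1 gives the set covering variant; the group costs c_g would be where your time-remaining rule is encoded.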
I have a table whose structure looks like the following:
k | i | p | v
Notice that the key (k) is not unique; there is no primary key or uniqueness constraint at all. Each key can have multiple attributes (i = 0, 1, 2, ...) which can be of different types (p) and have different values (v). One attribute type may also appear multiple times (p(i-1) = p(i)).
What I want to do is pick certain attribute types and their corresponding values and place them in the same row. For example I want to have:
k | attr_name1 | attr_name2
I have managed to make a query that does this and works for all keys (k) for which attr_name1 and attr_name2 appear in the column p of the initial table:
SELECT DISTINCT ON (key) fn.k AS key, fn.v AS attr_name1, a.v AS attr_name2
FROM Table fn
LEFT JOIN Table a ON fn.k = a.k
AND a.p = 'attr_name2'
WHERE fn.p = 'attr_name1'
I would like, however, to take into account the case where a certain key has no attribute named attr_name1 and insert a NULL value into the corresponding column of the new table. I am not sure how to achieve that. I have no issue using multiple queries or intermediate tables etc, but there are quite a lot of rows in the table and I need something that scales to millions of rows.
Any help would be appreciated.
Example:
k i p v
1 0 a 10
1 1 b 12
1 2 c 34
1 3 d 44
1 4 e 09
2 0 a 11
2 1 b 13
2 2 d 22
2 3 f 34
Would turn into (assuming I am only interested in columns a, b, c):
k a b c
1 10 12 34
2 11 13 NULL
I would use conditional aggregation. That is, an aggregate function around a CASE expression.
SELECT
k,
MAX(CASE WHEN p='a' THEN v END) AS a,
MAX(CASE WHEN p='b' THEN v END) AS b,
MAX(CASE WHEN p='c' THEN v END) AS c
FROM
your_table
GROUP BY
k
This presumes that (k, p) is unique. If there are duplicates, it will simply return the single highest v for each (k, p).
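If this is PostgreSQL (the DISTINCT ON in the question's query suggests it), the same conditional aggregation can also be written with the FILTER clause; a minimal sketch against the example table:
SELECT
k,
MAX(v) FILTER (WHERE p = 'a') AS a,
MAX(v) FILTER (WHERE p = 'b') AS b,
MAX(v) FILTER (WHERE p = 'c') AS c
FROM
your_table
GROUP BY
k
It behaves the same as the CASE version, including returning NULL for keys that have no row with the given attribute type.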
As a general rule this kind of pivoting makes the data harder to process in SQL. This is often done for display purposes because humans find this easier to read. However, from a software engineering perspective, such formatting should not be done in the data layer; be careful that by doing this you don't actually make your future life harder.
I was thinking about a simple way of reordering rows in a relational database table.
I would like to avoid the method described here:
How can I reorder rows in sql database
My simple idea was to use a ListOrder column of type double-precision 64-bit IEEE 754 floating point.
When inserting a row between two existing rows, we calculate its listOrder value as the average of the two sibling elements.
Example:
1. Starting state:
value, listOrder
a 1
b 2
c 3
d 4
e 5
f 6
2. Moving "e" two rows up
One simple SQL update on the e row: update mytable set listorder=2.5 where value='e'
value, listOrder
a 1
b 2
e 2.5
c 3
d 4
f 6
3. Moving "a" one position down
value, listOrder
b 2
a 2.25
e 2.5
c 3
d 4
f 6
I have a question: how many insertions can I perform (in the worst case) and still have a properly ordered list?
For a 64-bit integer there are fewer than 64 insertions in the same place.
Do floating point types allow more insertions?
Are there other problems with the described approach?
Do you see any patches/adjustments to make this idea safe and usable in applications?
This is similar to a lexical order, which can also be done with varchar columns:
A
B
C
D
E
F
becomes
A
B
BM
C
D
F
becomes
B
BF
BM
C
D
F
I prefer the two-step process, where you first update every row in the table at or after the position you are moving to, making each one larger, and then move the row itself. SQL is efficient about this; updating the rows following a change is not as bad as it seems. You preserve something that is more human readable, the storage size of your ordinal value grows logarithmically rather than linearly with your data size, and you don't risk reaching a point where you don't have enough precision to fit an item between two values.
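A minimal sketch of that two-step update, using the table and column names from the question and an illustrative target position of 3:
-- step 1: make room by shifting every row at or after the target position down by one
update mytable set listorder = listorder + 1 where listorder >= 3;
-- step 2: drop the moved row into the freed slot
update mytable set listorder = 3 where value = 'e';
Running both statements in one transaction keeps the ordering consistent for concurrent readers.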
I am trying to solve this question which has to do with candidate keys in a relation.
This is the question:
Consider table R with attributes A, B, C, D, and E. What is the largest number of
candidate keys that R could simultaneously have?
The answer is 10, but I have no clue how it was derived, nor how the word "simultaneously" comes into play when calculating the answer.
Simultaneous candidate keys must be sets of attributes none of which is a subset of another.
For example, {A,B} and {A,B,C} can't be candidate keys simultaneously, because {A,B} is a subset of {A,B,C}.
Combinations of 2 attributes or 3 attributes generate the maximum number of simultaneous candidate keys.
See how the 3-attribute sets are actually complements of the 2-attribute sets, e.g. {C,D,E} is the complement of {A,B}.
2-attribute sets - 3-attribute sets
1. {A,B} - {C,D,E}
2. {A,C} - {B,D,E}
3. {A,D} - {B,C,E}
4. {A,E} - {B,C,D}
-
5. {B,C} - {A,D,E}
6. {B,D} - {A,C,E}
7. {B,E} - {A,C,D}
-
8. {C,D} - {A,B,E}
9. {C,E} - {A,B,D}
-
10. {D,E} - {A,B,C}
If I took sets of a single attribute, I would have only 5 options:
{A},{B},{C},{D},{E}
Any set with more than 1 element would contain one of the above and therefore would not qualify.
If I took sets of 4 attributes, I would have only 5 options:
{A,B,C,D},{A,B,C,E},{A,B,D,E},{A,C,D,E},{B,C,D,E}
Any set with more than 4 elements would contain one of the above and therefore would not qualify.
Any set with fewer than 4 elements would be contained in one of the above and therefore would not qualify.
etc.
For 5 attributes, it is probably best to do this by brute force. Understanding the ideas is more important than the calculation (DuDu/David gives a good example of 10 candidate keys, showing that a set of 10 keys is possible, so the maximum is at least that large).
What is the idea? A candidate key is a combination of attributes that is unique. So, if A is unique, then A with any other column is also unique. One set of candidate keys is simply:
A
B
C
D
E
If each of these is unique, then any combination of attributes is going to contain at least one of them, and that combination will also be unique. Hence, the uniqueness of these five would imply the uniqueness of any other combination.
5 is not the largest number of candidate keys with this property.
It gets a bit more complicated. If {A, B, C, D, E} is unique (and no subset is a candidate key), then there is exactly 1 candidate key. Rearranging the columns doesn't change the set (sets are unordered).
One thing we might postulate is that the biggest set of candidate keys has keys all of the same length. This is in fact true. Why? Well, if we have a set of keys that are of different lengths, we can lengthen the shorter ones by adding arbitrary attributes and still have a maximal set.
So, you only need to consider sets of keys where every key has exactly 1, 2, 3, 4, or 5 attributes. When you work it out, you will find that the maximum numbers are:
5 10 10 5 1
You can add a "1" to the beginning and you may recognize the pattern. This is a row from Pascal's Triangle. This observation (well, and the related proof) actually makes it easy to determine the maximum value for any given n.
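For completeness, this is Sperner's theorem: the largest family of subsets of an n-element set in which no set contains another (which is exactly the "simultaneously" condition on candidate keys) has size

\binom{n}{\lfloor n/2 \rfloor}, \qquad \text{e.g. for } n = 5:\ \binom{5}{2} = \binom{5}{3} = 10.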
Incidentally, the sets of length 3 are:
A B C
A B D
A B E
A C D
A C E
A D E
B C D
B C E
B D E
C D E
MSSQL: I have this example data:
NAME AValue BValue
A 1 11
B 1 11
C 2 11
D 2 21
E 3 21
F 3 21
G 4 31
H 4 31
I 5 41
J 5 NULL
...
I am looking for an algorithm which finds all the Names that are closed over by values from the different seeds (AValue and BValue; in this case the seed is 2 for AValue and 3 for BValue, but values can be skipped, supplied later, and so on, so it is not just a matter of finding the smallest common multiple). In this case the output should be 1,2,3,4,11,21,31 as the first group/result. Then all the Names with these values can be updated, etc.
I need to find all the Names in a "closed circle" of values produced by the different seeds.
EDIT:
(attempt at a simpler example)
Imagine that you have a list of names. Each name is given two numbers. In most cases these numbers are given by some seed (in this example each AValue is given twice and each BValue three times), but some numbers can be skipped, so you cannot just count the smallest common multiple of the different seeds (in this case it would be 2x3, i.e. every 6 names you would have a closed group where no Name contains an AValue or BValue from the next/different group). For example, Name A has 1 and 11. 1 is given to A and B, and 11 to A, B, C. These Names have 1, 2, 11, 21. So you check for 2 and 21, and then you get E and F in addition, and then the loop of checking should continue; once no more Names are pulled in, the output should be 1,2,3,11,21. A "closed circle".
I have two datasets, data1 and data2:
data data1;
input sn id $;
datalines;
1 a
2 a
3 a
;
run;
data data2;
input id $ sales x $;
datalines;
a 10 x
a 20 y
a 30 z
a 40 q
;
run;
I am merging them from below code:
data join;
merge data1(in=a) data2(in=b);
by id;
if a and b;
run;
Result (I was expecting an inner join result, which is not the case):
1 a 10 x
2 a 20 y
3 a 30 z
3 a 40 q
Result from proc sql inner join.
proc sql;
select data1.id, sn, sales, x from data2 inner join data1 on data1.id = data2.id;
quit;
Result: (As expected from an inner join)
a 1 10 x
a 1 20 y
a 1 30 z
a 1 40 q
a 2 10 x
a 2 20 y
a 2 30 z
a 2 40 q
a 3 10 x
a 3 20 y
a 3 30 z
a 3 40 q
I want to know the concept and the step-by-step working of the MERGE statement in SAS with IN=, and how it produces the above result.
PS: I have read this, and it says
An obvious use for these variables is to control what kind of 'merge'
will occur, using if statements. For example, if
ThisRecordIsFromYourData and ThisRecordIsFromOtherData; will make SAS
only include rows that match on the by variables from both input data
sets (like an inner join).
which I guess is not always the case, i.e. the result is not always like an inner join.
Basically, this is a result of the difference in how the SAS data step and SQL process their respective join/merges.
SQL creates a separate record for each possible combination of keys. This is a Cartesian Product (at the key level).
SAS data step, however, process merges very differently. MERGE is really nothing more than a special case of SET. It still processes rows iteratively, one at a time - it never goes back, and never has more than one row from any dataset in the PDV at once. Thus, it cannot create a Cartesian product in its normal process - that would require random access, which the SAS datastep doesn't do normally.
What it does:
For each unique BY value
Take the next record from the left side dataset, if one exists with that BY value
Take the next record from the right side dataset, if one exists with that BY value
Output a row
Continue until both datasets are exhausted for that BY value
With BY values that yield unique records per value on either side (or both), it is effectively identical to SQL. However, with BY values that yield duplicates on BOTH sides, you get what you have there: a side-by-side merge, and if one runs out before the other, the values from the last row of the shorter dataset (for that by value) are more-or-less copied down. (They're actually RETAINED, so if you overwrite them with changes, they will not reset on new records from the longer dataset).
So, if left has 3 records and right has 4 records for key value a, like in your example, then you get data from the following records (assuming you don't alter the data after):
left right
1 1
2 2
3 3
3 4
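If you actually want the SQL-style, key-level Cartesian behaviour inside a data step, one sketch (using the dataset and variable names from the question) is to re-read the right-hand dataset with POINT= for every left-hand row; this is exactly the random access that a normal MERGE avoids:
data join_cartesian;
    set data1;                                  /* one sequential pass over the left dataset */
    do _p = 1 to _nobs;                         /* re-read every row of the right dataset    */
        set data2(rename=(id=id2)) point=_p nobs=_nobs;
        if id = id2 then output;                /* keep only rows whose keys match           */
    end;
    drop id2;
run;
For the example data this produces the same 12 matching combinations as the PROC SQL inner join.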