PostgreSQL data transformation - Turn rows into columns - sql

I have a table whose structure looks like the following:
k | i | p | v
Notice that the key (k) is not unique, there are no keys, nothing. Each key can have multiple attributes (i = 0, 1, 2, ...) which can be of different types (p) and have different values (v). One attribute type may also appear multiple times (p(i-1) = p(i)).
What I want to do is pick certain attribute types and their corresponding values and place them in the same row. For example I want to have:
k | attr_name1 | attr_name2
I have managed to make a query that does this and works for all keys (k) for which attr_name1 and attr_name2 appear in the column p of the initial table:
SELECT DISTINCT ON (key) fn.k AS key, fn.v AS attr_name1, a.v AS attr_name2
FROM Table fn
LEFT JOIN Table a ON fn.k = a.k
AND a.p = 'attr_name2'
WHERE fn.p = 'attr_name1'
I would like, however, to take into account the case where a certain key has no attribute named attr_name1 and insert a NULL value into the corresponding column of the new table. I am not sure how to achieve that. I have no issue using multiple queries or intermediate tables etc, but there are quite a lot of rows in the table and I need something that scales to millions of rows.
Any help would be appreciated.
Example:
k i p v
1 0 a 10
1 1 b 12
1 2 c 34
1 3 d 44
1 4 e 09
2 0 a 11
2 1 b 13
2 2 d 22
2 3 f 34
Would turn into (assuming I am only interested in columns a, b, c):
k a b c
1 10 12 34
2 11 13 NULL

I would use conditional aggregation. That is, an aggregate function around a CASE expression.
SELECT
k,
MAX(CASE WHEN p='a' THEN v END) AS a,
MAX(CASE WHEN p='b' THEN v END) AS b,
MAX(CASE WHEN p='c' THEN v END) AS c
FROM
your_table
GROUP BY
k
This presumes that (k, p) is unique. If there are duplicate keys, this will clearly find the one v with the highest value (for each (k,p))
As a general rule this kind of pivoting makes the data harder to process in SQL. This is often done for display purposes because humans find this easier to read. However, from a software engineering perspective, such formatting should not be done in the data layer; be careful that by doing this you don't actually make your future life harder.

Related

SQL dealing every bit without run query repeatedly

I have a column using bits to record status of every mission. The index of bits represents the number of mission while 1/0 indicates if this mission is successful and all bits are logically isolated although they are put together.
For instance: 1010 is stored in decimal means a user finished the 2nd and 4th mission successfully and the table looks like:
uid status
a 1100
b 1111
c 1001
d 0100
e 0011
Now I need to calculate: for every mission, how many users passed this mission. E.g.: for mission1: it's 0+1+1+0+1 = 5 while for mission2, it's 0+1+0+0+1 = 2.
I can use a formula FLOOR(status%POWER(10,n)/POWER(10,n-1)) to get the bit of every mission of every user, but actually this means I need to run my query by n times and now the status is 64-bit long...
Is there any elegant way to do this in one query? Any help is appreciated....
The obvious approach is to normalise your data:
uid mission status
a 1 0
a 2 0
a 3 1
a 4 1
b 1 1
b 2 1
b 3 1
b 4 1
c 1 1
c 2 0
c 3 0
c 4 1
d 1 0
d 2 0
d 3 1
d 4 0
e 1 1
e 2 1
e 3 0
e 4 0
Alternatively, you can store a bitwise integer (or just do what you're currently doing) and process the data in your application code (e.g. a bit of PHP)...
uid status
a 12
b 15
c 9
d 4
e 3
<?php
$input = 15; // value comes from a query
$missions = array(1,2,3,4); // not really necessary in this particular instance
for( $i=0; $i<4; $i++ ) {
$intbit = pow(2,$i);
if( $input & $intbit ) {
echo $missions[$i] . ' ';
}
}
?>
Outputs '1 2 3 4'
Just convert the value to a string, remove the '0's, and calculate the length. Assuming that the value really is a decimal:
select length(replace(cast(status as char), '0', '')) as num_missions as num_missions
from t;
Here is a db<>fiddle using MySQL. Note that the conversion to a string might look a little different in Hive, but the idea is the same.
If it is stored as an integer, you can use the the bin() function to convert an integer to a string. This is supported in both Hive and MySQL (the original tags on the question).
Bit fiddling in databases is usually a bad idea and suggests a poor data model. Your data should have one row per user and mission. Attempts at optimizing by stuffing things into bits may work sometimes in some programming languages, but rarely in SQL.

SQL find all rows with assigned values

MSSQL: i have this example data:
NAME AValue BValue
A 1 11
B 1 11
C 2 11
D 2 21
E 3 21
F 3 21
G 4 31
H 4 31
I 5 41
J 5 NULL
...
I am looking for algorhitm which looks for all the Names closed by values by different seed (AValue and Bvalue, in this case seed is given by 2 for AValue and by 3 for Bvalue, but this can be skipped and given later and so on, not only looking for smallest multiple). In this case output should be 1,2,3,4,11,21,31 as a first group/result. Then all the Names with these values can be updated etc.
I need to find out all the Names in "closed circle" of values by different seed.
EDIT:
(try of simplier example)
Imagine that you have list of names. Each name is given two numbers. In most cases these numbers are given by some seed (in this example AValue is given twice, BValue three times) but some numbers can be skipped, so you cannot just count smallest multiple of these different seeds(in this case it would be 2x3, ever 6 names you have closed group where no Name contains AValue or BValue from next/different group). For example Name A have 1 and 11. 1 is given for A and B, 11 for A, B, C. These Names have 1,2,11,21. So you check for 2 and 21 and then you get E and F in addition and then the loop of checking should continue, but as long as no more Names are contained there should be output 1,2,3,11,21. "Closed circle"

Working of Merge in SAS (with IN=)

I have two dataset data1 and data2
data data1;
input sn id $;
datalines;
1 a
2 a
3 a
;
run;
data data2;
input id $ sales x $;
datalines;
a 10 x
a 20 y
a 30 z
a 40 q
;
run;
I am merging them from below code:
data join;
merge data1(in=a) data2(in=b);
by id;
if a and b;
run;
Result: (I was expecting an Inner Join result which is not the case)
1 a 10 x
2 a 20 y
2 a 30 z
2 a 40 w
Result from proc sql inner join.
proc sql;
select data1.id,sn,sales,x from data2 inner join data1 on data1.hh_id;
quit;
Result: (As expected from an inner join)
a 1 10 x
a 1 20 y
a 1 30 z
a 1 40 w
a 2 10 x
a 2 20 y
a 2 30 z
a 2 40 w
b 3 10 x
b 3 20 y
b 3 30 z
b 3 40 w
I want to know the concept and STEP BY STEP working of merge statement in SAS with In= and proving the above result.
PS: I have read this, and it says
An obvious use for these variables is to control what kind of 'merge'
will occur, using if statements. For example, if
ThisRecordIsFromYourData and ThisRecordIsFromOtherData; will make SAS
only include rows that match on the by variables from both input data
sets (like an inner join).
which I guess, (like an Inner Join) is not always the case.
Basically, this is a result of the difference in how the SAS data step and SQL process their respective join/merges.
SQL creates a separate record for each possible combination of keys. This is a Cartesian Product (at the key level).
SAS data step, however, process merges very differently. MERGE is really nothing more than a special case of SET. It still processes rows iteratively, one at a time - it never goes back, and never has more than one row from any dataset in the PDV at once. Thus, it cannot create a Cartesian product in its normal process - that would require random access, which the SAS datastep doesn't do normally.
What it does:
For each unique BY value
Take the next record from the left side dataset, if one exists with that BY value
Take the next record from the right side dataset, if one exists with that BY value
Output a row
Continue until both datasets are exhausted for that BY value
With BY values that yield unique records per value on either side (or both), it is effectively identical to SQL. However, with BY values that yield duplicates on BOTH sides, you get what you have there: a side-by-side merge, and if one runs out before the other, the values from the last row of the shorter dataset (for that by value) are more-or-less copied down. (They're actually RETAINED, so if you overwrite them with changes, they will not reset on new records from the longer dataset).
So, if left has 3 records and right has 4 records for key value a, like in your example, then you get data from the following records (assuming you don't alter the data after):
left right
1 1
2 2
3 3
3 4

Check if value is already in a query field to change the value of another

I'll clarify this: I have a data result with the twist that the two PK's (A and B) are the same, and field C doesn't.
Example:
A B C D
> 14 20 1 null
> 14 20 2 1
> 15 20 2 0
As you can see, D field has a null and a 0.
What I have to do is to change D's null value to 1 whenever A fields are the same, and there's more than 1 record with those, not touching the 0's in D.
I tried initially with NVLs and DECODEs, like this:
DECODE(migr.A,NULL,(NVL(C,1)),D) AS D
but I'm not getting all the records, only the D-1's.
I really don't want to relate to an extra table/step for validation, as my query result can be easily over 1 million records, but if that's the best, I'm ok.
Many thanks.

How do I insert data with SQLite?

Total newbie here, regarding sqlite, so don't flame too hard :)
I have a table:
index name length L breadth B height H
1 M-1234 10 5 2
2 M-2345 20 10 3
3 ....
How do I put some tabular data (let' say ten x,y values) corresponding to index 1, then another table to index 2, and then another, etc. In short, so that I have a table of x and y values that is "connected" to first row, then another that is connected to second row.
I'm reading some tutorials on sqlite3 (which I'm using), but am having trouble finding this. If anyone knows a good newbie tutorial or a book dealing with sqlite3 (CLI) I'm all ears for that too :)
You are just looking for information on joins and the concept of foreign key, that although SQLite3 doesn't enforce, is what you need. You can go without it, anyway.
In your situation you can either add two "columns" to your table, being one x and another y, or create a new table with 3 "columns": foreign_index, x and y. Which one to use depends on what you are trying to accomplish, performance and maintainability.
If you go the linked table route, you'd end up with two tables, like this:
MyTable
index name length L breadth B height H
1 M-1234 10 5 2
2 M-2345 20 10 3
3 ....
XandY
foreign_index x y
1 12 9
2 8 7
3 ...
When you want the x and y values of your element, you just use something like SELECT x, y FROM XandY WHERE foreign_index = $idx;
To get all the related attributes, you just do a JOIN:
SELECT index, name, length, breadth, height, x, y FROM MyTable INNER JOIN XandY ON MyTable.index = XandY.foreign_index;