Group key-value columns into a single row - sql

I'm trying to extract data from a SQLite table that stores key-value pairs in dual columns. For example, with the keys foo, bar, man, and row, the table would look like:
| _id | external_id | key | value |
|-----|-------------|------|-------|
| 1 | 12345 | foo | cow |
| 2 | 12345 | bar | moo |
| 3 | 12345 | man | hole |
| 4 | 12345 | row | boat |
| 5 | 67890 | foo | abc |
| 6 | 67890 | bar | def |
| 7 | 67890 | man | ghi |
| 8 | 67890 | row | jkl |
I want to perform a query that gives me each external_id in a row with the keys as the columns and the values as the rows. Like this:
| external_id | foo | bar | man | row |
|-------------|-----|------|------|------|
| 12345 | cow | moo | hole | boat |
| 67890 | abc | def | ghi | jkl |
The only solution I've been able to come up with is a join for each key:
SELECT a.external_id, b.foo, c.bar, d.man, e.row
FROM myTable AS a
LEFT JOIN
(SELECT external_id, value AS foo
FROM myTable
WHERE key = 'foo') AS b
ON a.external_id = b.external_id
...
LEFT JOIN
(SELECT external_id, value AS row
FROM myTable
WHERE key = 'row') AS e
ON a.external_id = e.external_id
GROUP BY a.external_id
Is there a better way to do this?

The other available option is to use conditional aggregation:
SELECT external_id,
MAX(CASE WHEN key = 'foo' THEN value END) AS foo,
MAX(CASE WHEN key = 'bar' THEN value END) AS bar,
MAX(CASE WHEN key = 'man' THEN value END) AS man,
... etc
FROM mytable
GROUP BY external_id
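
To make the conditional aggregation above concrete, here is a minimal sketch that runs it against the question's sample data using Python's built-in sqlite3 module. The in-memory database and the variable names (conn, pivot_sql, pivot_rows) are illustrative assumptions, and the "row" alias is quoted only as a precaution because ROW is a keyword in SQLite.

```python
import sqlite3

# In-memory database holding the sample key-value rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable ("
    "_id INTEGER PRIMARY KEY, external_id INTEGER, key TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO myTable (external_id, key, value) VALUES (?, ?, ?)",
    [
        (12345, "foo", "cow"), (12345, "bar", "moo"),
        (12345, "man", "hole"), (12345, "row", "boat"),
        (67890, "foo", "abc"), (67890, "bar", "def"),
        (67890, "man", "ghi"), (67890, "row", "jkl"),
    ],
)

# One MAX(CASE ...) per key: CASE yields NULL for non-matching rows and
# MAX ignores NULLs, so each group keeps the single matching value.
pivot_sql = """
SELECT external_id,
       MAX(CASE WHEN key = 'foo' THEN value END) AS foo,
       MAX(CASE WHEN key = 'bar' THEN value END) AS bar,
       MAX(CASE WHEN key = 'man' THEN value END) AS man,
       MAX(CASE WHEN key = 'row' THEN value END) AS "row"
FROM myTable
GROUP BY external_id
ORDER BY external_id
"""
pivot_rows = conn.execute(pivot_sql).fetchall()
for r in pivot_rows:
    print(r)
```

The table is scanned once, whatever the number of keys, whereas the join approach adds one subquery per key.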

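You can also use a collect() aggregate, if your engine provides one. SQLite has no built-in collect(), so this only applies to engines (or UDF libraries) that offer a map-building aggregate:
select external_id,
collected_objects['foo'] as foo,
collected_objects['bar'] as bar,
collected_objects['man'] as man,
collected_objects['row'] as row
from (
select external_id,
collect(key, value) as collected_objects
from myTable
group by external_id) t1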

Related

Selecting the two most common attribute pairings from a Entity-Attribute Table?

I have a simple Entity-Attribute table in my database that records whether an Entity has some Attribute by the existence of a row consisting of (Entity, Attribute).
I want to find out, of all the Entities with two and only two Attributes, what the most common Attribute pairs are.
For example, if my table looked like:
+--------+-----------+
| Entity | Attribute |
+--------+-----------+
| Bob | A |
| Sally | B |
| Terry | C |
| Bob | B |
| Sally | A |
| Terry | D |
| Larry | C |
+--------+-----------+
I would want it to return
+-------------+-------------+-------+
| Attribute-1 | Attribute-2 | Count |
+-------------+-------------+-------+
| A | B | 2 |
| C | D | 1 |
+-------------+-------------+-------+
I currently have a short query that looks like:
WITH TwoAtts AS (
SELECT entity
FROM table
GROUP BY entity
HAVING COUNT(att) = 2
)
SELECT t1.att, t2.att, COUNT(t1.entity)
FROM table t1
JOIN table t2
ON t1.entity = t2.entity
WHERE t1.entity IN (SELECT * FROM TwoAtts)
AND t1.att != t2.att
GROUP BY t1.att, t2.att
ORDER BY COUNT(t1.entity) DESC
but it is only capable of producing "duplicate" results like
+-------------+-------------+-------+
| Attribute-1 | Attribute-2 | Count |
+-------------+-------------+-------+
| A | B | 2 |
| B | A | 2 |
| D | C | 1 |
| C | D | 1 |
+-------------+-------------+-------+
In a sense, I would like to run an unordered DISTINCT / set operator over the two attribute columns, but I am not sure how to achieve this in SQL.
Hmmm, I think you want two levels of aggregation, with some filtering:
select attribute_1, attribute_2, count(*)
from (select min(ea.attribute) as attribute_1, max(ea.attribute) as attribute_2
from entity_attribute ea
group by entity
having count(*) = 2
) aa
group by attribute_1, attribute_2;
Here is a db<>fiddle
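
As a rough check of the two-level aggregation above, the following sketch loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module. The table name entity_attribute comes from the answer; the variable names and the cnt alias are assumptions made for illustration.

```python
import sqlite3

# Sample Entity-Attribute rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entity_attribute (entity TEXT, attribute TEXT)")
conn.executemany(
    "INSERT INTO entity_attribute VALUES (?, ?)",
    [
        ("Bob", "A"), ("Sally", "B"), ("Terry", "C"), ("Bob", "B"),
        ("Sally", "A"), ("Terry", "D"), ("Larry", "C"),
    ],
)

# Inner query: one row per entity with exactly two attributes, with the
# pair normalised as (smaller, larger) so (A, B) and (B, A) coincide.
# Outer query: count how many entities share each normalised pair.
pair_sql = """
SELECT attribute_1, attribute_2, COUNT(*) AS cnt
FROM (SELECT MIN(attribute) AS attribute_1,
             MAX(attribute) AS attribute_2
      FROM entity_attribute
      GROUP BY entity
      HAVING COUNT(*) = 2) AS aa
GROUP BY attribute_1, attribute_2
ORDER BY cnt DESC, attribute_1
"""
pair_rows = conn.execute(pair_sql).fetchall()
for r in pair_rows:
    print(r)
```

Taking MIN and MAX per entity is what removes the mirrored duplicates: each entity contributes exactly one ordered pair, and Larry (one attribute) is filtered out by the HAVING clause.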

How can I subtract two row's values within same column using sql query in access?

(query access)
This is the table structure:
+-----+--------+--------+
| id | name | sub1 |
+-----+--------+--------+
| 1 | ABC | 6.27% |
| 2 | ABC | 7.47% |
| 3 | PQR | 3.39% |
| 4 | PQR | 2.21% |
+-----+--------+--------+
I want to subtract the previous row's sub1 from the current row's sub1, within the same name.
Output should be:
+-----+--------+---------+------------------------------------+
| id | name | sub1 | |
+-----+--------+---------+------------------------------------+
| 1 | ABC | 6.27% | 0 First Rec no need Subtract |
| 2 | ABC | 7.47% | 1.2% <=(7.47-6.27) |
| 3 | PQR | 3.39% | 0 First Rec no need Subtract |
| 4 | PQR | 2.21% | -1.18% <=(2.21-3.39) |
+-----+--------+---------+------------------------------------+
Thank you so much.
If you can guarantee consecutive id values, then the following presents an alternative:
select t.*, nz(t.sub1-u.sub1,0) as sub2
from YourTable t left join YourTable u on t.name = u.name and t.id = u.id+1
Change YourTable to the name of your table.
This is painful, but you can do:
select t.*,
(select top 1 t2.sub1
from t as t2
where t2.name = t.name and t2.id < t.id
order by t2.id desc
) as prev_sub1
from t;
This gives the previous value or NULL for the first row. You can just use - for the subtraction.
An index on (name, id) would help a bit with performance. However, if you can upgrade to a better database, you can then just use lag().
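
For comparison, here is a sketch of the lag() approach mentioned above, run in SQLite via Python's sqlite3 module rather than in Access. It assumes sub1 is stored as a plain number (not text with a % sign) and that the bundled SQLite is 3.25 or newer, which is when window functions were added. The variable names and the diff alias are illustrative.

```python
import sqlite3

# Sample rows from the question, with sub1 stored as plain numbers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT, sub1 REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "ABC", 6.27), (2, "ABC", 7.47), (3, "PQR", 3.39), (4, "PQR", 2.21)],
)

# LAG(sub1) returns the previous sub1 within the same name (ordered by id),
# or NULL for the first row of each name; COALESCE turns that NULL into 0.
# ROUND only hides floating-point noise in the difference.
lag_sql = """
SELECT id, name, sub1,
       COALESCE(
           ROUND(sub1 - LAG(sub1) OVER (PARTITION BY name ORDER BY id), 2),
           0) AS diff
FROM t
ORDER BY id
"""
lag_rows = conn.execute(lag_sql).fetchall()
for r in lag_rows:
    print(r)
```

Unlike the self-join on id + 1, this does not depend on the id values being consecutive.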

Over Partition to find duplicates and remove them based on criteria SQL

I hope everyone is doing well. I have a dilemma that I cannot quite figure out.
I am trying to find a unique value for a field that is not a duplicate.
For example:
Table 1
|Col1 | Col2| Col3 |
| 123 | A | 1 |
| 123 | A | 2 |
| 12 | B | 1 |
| 12 | B | 2 |
| 12 | C | 3 |
| 12 | D | 4 |
| 1 | A | 1 |
| 2 | D | 1 |
| 3 | D | 1 |
Col1 is the field that would have the duplicate values. Col2 is the owner of the value in Col1. Col3 uses the ROW_NUMBER() OVER (PARTITION BY ...) syntax to number the rows in ascending order.
The goal I am trying to accomplish is to remove the value in Col1 if it is not truly unique when looking at Col2.
Example:
Col1 has the value 123 and Col2 has the value A. Although there are two instances of 123 being owned by A, I can determine that it is indeed unique.
Now look at the Col1 value 12, which has Col2 values of B, C, and D.
Value 12 is associated with three different owners, which eliminates 12 from our result list.
So in the end I would like to see a result table such as this:
|Col1 | Col2|
| 123 | A |
| 1 | A |
| 2 | D |
| 3 | D |
To summarize, I would like to first use the partition numbers to identify whether the value in Col1 is repeated. From there I want to verify that the values in Col2 are the same. If so, the Col1 and Col2 values remain as one single entry. However, if the values in Col2 do not match, all records for that Col1 value are removed.
I will provide the syntax code for my query if needed.
Update:
I failed to mention that table 1 is the result of inner joining two tables.
So Col1 comes from table a and Col2 comes from table b.
The values in table a for col2 are hard to interpret, so I had to make sense of them and assign them proper name values.
The join query I used to combine the two is:
Select a.Col1, B.Col2 FROM Table a INNER JOIN Table b on a.Colx = b.Colx
Update:
Table a:
|Col1 | Colx| Col3 |
| 123 | SMS | 1 |
| 123 | S9W | 2 |
| 12 | NAV | 1 |
| 12 | NFR | 2 |
| 12 | ABC | 3 |
| 12 | DEF | 4 |
| 1 | SMS | 1 |
| 2 | DEF | 1 |
| 3 | DES | 1 |
Table b:
|Colx | Col2|
| SMS | A |
| S9W | A |
| DEF | D |
| DES | D |
| NAV | B |
| NFR | B |
| ABC | C |
Above are sample data for both tables that get joined in order to create the first table displayed in this body.
Thank you all so much!
The NOT EXISTS operator can be used for this task:
SELECT distinct Col1 , Col2
FROM table t
WHERE NOT EXISTS(
SELECT 1 FROM table t1
WHERE t.col1=t1.col1 AND t.col2 <> t1.col2
)
If I understand correctly, you want:
select col1, min(col2) as col2
from t
group by col1
having min(col2) = max(col2);
I think the third column is confusing you. It doesn't seem to play any role in the logic you want.
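
As a quick sanity check of the GROUP BY / HAVING idea, the sketch below runs it on the question's joined sample data in an in-memory SQLite database via Python's sqlite3 module. The table name t follows the answer; the variable names are assumptions for illustration.

```python
import sqlite3

# The joined sample data (Table 1) from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 TEXT, col3 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (123, "A", 1), (123, "A", 2),
        (12, "B", 1), (12, "B", 2), (12, "C", 3), (12, "D", 4),
        (1, "A", 1), (2, "D", 1), (3, "D", 1),
    ],
)

# A col1 value has a single owner exactly when the smallest and largest
# col2 in its group are equal; groups with mixed owners are dropped.
unique_sql = """
SELECT col1, MIN(col2) AS col2
FROM t
GROUP BY col1
HAVING MIN(col2) = MAX(col2)
ORDER BY col1
"""
unique_rows = conn.execute(unique_sql).fetchall()
for r in unique_rows:
    print(r)
```

The value 12 disappears because its owners span B, C, and D, while 123 survives as one row even though it appears twice.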

Merging multiple rows according to an order

Suppose there are the following rows
| Id | MachineName | WorkerName | MachineState |
|----------------------------------------------|
| 1 | Alpha | Young | RUNNING |
| 1 | Beta | | STOPPED |
| 1 | Gamma | Foo | READY |
| 1 | Zeta | Zatta | |
| 2 | Guu | Niim | RUNNING |
| 2 | Yuu | Jaam | STOPPED |
| 2 | Nuu | | READY |
| 2 | Faah | Siim | |
| 3 | Iem | | RUNNING |
| 3 | Nyt | Fish | READY |
| 3 | Qwe | Siim | |
We want to merge these rows according to following priority :
STOPPED > RUNNING > READY > (null or empty)
If a row has a value for greatest priority, then value from that row should be used (only if it is not null). If it is null, a value from any other row should be used. The rows should be grouped by id
The correct output for the above input is :
| Id | MachineName | WorkerName | MachineState |
|----------------------------------------------|
| 1 | Beta | Foo | STOPPED |
| 2 | Yuu | Jaam | STOPPED |
| 3 | Iem | Fish | RUNNING |
What would be a good sql query to accomplish this? I tried using joins, but it did not work out.
You can view this as a case of the group-wise maximum problem, provided you can obtain a suitable ordering over your MachineState column—e.g. by using a CASE expression:
SELECT a.Id,
COALESCE(a.MachineName, t.MachineName) MachineName,
COALESCE(a.WorkerName , t.WorkerName ) WorkerName,
a.MachineState
FROM myTable a JOIN (
SELECT Id,
MIN(MachineName) AS MachineName,
MIN(WorkerName ) AS WorkerName,
MAX(CASE MachineState
WHEN 'READY' THEN 1
WHEN 'RUNNING' THEN 2
WHEN 'STOPPED' THEN 3
END) AS MachineState
FROM myTable
GROUP BY Id
) t ON t.Id = a.Id AND t.MachineState = CASE a.MachineState
WHEN 'READY' THEN 1
WHEN 'RUNNING' THEN 2
WHEN 'STOPPED' THEN 3
END
See it on sqlfiddle:
| id | machinename | workername | machinestate |
|----|-------------|------------|--------------|
| 1 | Beta | Foo | STOPPED |
| 2 | Yuu | Jaam | STOPPED |
| 3 | Iem | Fish | RUNNING |
You could save yourself the pain of using CASE if MachineState was an ENUM type column (defined in the appropriate order). It so happens in this case that a simple lexicographic ordering over the string value will yield the same result, but that's a coincidence on which you really shouldn't rely as it's bound to slip under the radar when someone tries to maintain this code in the future.
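
A minimal sketch of the group-wise maximum query above, run on the question's sample rows in SQLite through Python's sqlite3 module. It assumes the blank cells in the question are NULLs (empty strings would need NULLIF first); the variable names are illustrative.

```python
import sqlite3

# Sample rows from the question; blank cells are stored as NULL (None).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable ("
    "Id INTEGER, MachineName TEXT, WorkerName TEXT, MachineState TEXT)"
)
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?, ?)",
    [
        (1, "Alpha", "Young", "RUNNING"), (1, "Beta", None, "STOPPED"),
        (1, "Gamma", "Foo", "READY"), (1, "Zeta", "Zatta", None),
        (2, "Guu", "Niim", "RUNNING"), (2, "Yuu", "Jaam", "STOPPED"),
        (2, "Nuu", None, "READY"), (2, "Faah", "Siim", None),
        (3, "Iem", None, "RUNNING"), (3, "Nyt", "Fish", "READY"),
        (3, "Qwe", "Siim", None),
    ],
)

# The subquery finds the highest-priority state per Id (as a number) plus
# fallback values; the join picks the row holding that state, and COALESCE
# fills any NULL column on that row from the fallbacks.
merge_sql = """
SELECT a.Id,
       COALESCE(a.MachineName, t.MachineName) AS MachineName,
       COALESCE(a.WorkerName,  t.WorkerName)  AS WorkerName,
       a.MachineState
FROM myTable a
JOIN (
    SELECT Id,
           MIN(MachineName) AS MachineName,
           MIN(WorkerName)  AS WorkerName,
           MAX(CASE MachineState WHEN 'READY' THEN 1
                                 WHEN 'RUNNING' THEN 2
                                 WHEN 'STOPPED' THEN 3 END) AS MachineState
    FROM myTable
    GROUP BY Id
) t ON t.Id = a.Id
   AND t.MachineState = CASE a.MachineState WHEN 'READY' THEN 1
                                            WHEN 'RUNNING' THEN 2
                                            WHEN 'STOPPED' THEN 3 END
ORDER BY a.Id
"""
merged_rows = conn.execute(merge_sql).fetchall()
for r in merged_rows:
    print(r)
```

Note that the fallback is simply the alphabetically smallest non-NULL value in the group, which satisfies "a value from any other row" but is otherwise arbitrary.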
This is a prioritization query. One method uses variables. Another uses union all . . . this works if the states are not repeated for a given id:
select t.*
from table t
where machinestate = 'STOPPED'
union all
select t.*
from table t
where machinestate = 'RUNNING' and
not exists (select 1 from table t2 where t2.id = t.id and t2.machinestate in ('STOPPED'))
union all
select t.*
from table t
where machinestate = 'READY' and
not exists (select 1 from table t2 where t2.id = t.id and t2.machinestate in ('STOPPED', 'RUNNING'));
Alternatively, change MachineState to an ENUM:
`MachineState` enum('READY','RUNNING','STOPPED') DEFAULT NULL
and the SQL becomes simple:
select t.id, state.machinename, state.workername, t.mstate
from state,
(select id, max(MachineState) as mstate from state group by id) t
where t.mstate = state.machinestate and t.id = state.id;

How to apply a SUM operation without grouping the results in SQL?

I have a table like this one:
+----+---------+----------+
| id | group | value |
+----+---------+----------+
| 1 | GROUP A | 0.641028 |
| 2 | GROUP B | 0.946927 |
| 3 | GROUP A | 0.811552 |
| 4 | GROUP C | 0.216978 |
| 5 | GROUP A | 0.650232 |
+----+---------+----------+
If I perform the following query:
SELECT `id`, SUM(`value`) AS `sum` FROM `test` GROUP BY `group`;
I, obviously, get:
+----+-------------------+
| id | sum |
+----+-------------------+
| 1 | 2.10281205177307 |
| 2 | 0.946927309036255 |
| 4 | 0.216977506875992 |
+----+-------------------+
But I need a table like this one:
+----+-------------------+
| id | sum |
+----+-------------------+
| 1 | 2.10281205177307 |
| 2 | 0.946927309036255 |
| 3 | 2.10281205177307 |
| 4 | 0.216977506875992 |
| 5 | 2.10281205177307 |
+----+-------------------+
Where summed rows are explicitly repeated.
Is there a way to obtain this result without using multiple (nested) queries?
It depends on your database. In Postgres/Oracle I'd use window functions. MySQL only supports them from version 8.0 onward; on older versions it's not possible AFAIK.
Perhaps you can fake it like this:
SELECT a.id, SUM(b.value) AS `sum`
FROM test AS a
JOIN test AS b ON a.`group` = b.`group`
GROUP BY a.id, b.`group`;
No, there isn't AFAIK. You will have to use a join like:
SELECT t.`id`, tsum.`sum` AS `sum`
FROM `test` AS t
JOIN (SELECT `group`, SUM(`value`) AS `sum` FROM `test` GROUP BY `group`) AS tsum
ON tsum.`group` = t.`group`
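
For engines that do have window functions, the approach mentioned earlier in this thread can be sketched as follows. This runs in SQLite (3.25 or newer assumed) through Python's sqlite3 module; the same SUM(...) OVER (PARTITION BY ...) syntax should also work in MySQL 8.0+ with backtick quoting. The group_sum alias and variable names are assumptions for illustration.

```python
import sqlite3

# Sample rows from the question; "group" is quoted because it is a keyword.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE test (id INTEGER PRIMARY KEY, "group" TEXT, value REAL)'
)
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [
        (1, "GROUP A", 0.641028), (2, "GROUP B", 0.946927),
        (3, "GROUP A", 0.811552), (4, "GROUP C", 0.216978),
        (5, "GROUP A", 0.650232),
    ],
)

# SUM(...) OVER (PARTITION BY ...) computes the per-group total but keeps
# every input row, so the total is repeated on each row of the group.
window_sql = """
SELECT id, SUM(value) OVER (PARTITION BY "group") AS group_sum
FROM test
ORDER BY id
"""
sum_rows = conn.execute(window_sql).fetchall()
for r in sum_rows:
    print(r)
```

This avoids both the self-join and the derived table, at the cost of requiring a database version with window-function support.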