How to get the count of similar columns from the same table - sql

I have a table with columns id and value. I want to select the records for which there exist other records in the same table with a lower id but the same value. I also need the count of those other records. For example, if I have this table
id | value
---+------
1 | 1
2 | 2
3 | 1
4 | 3
5 | 2
6 | 1
I need the answer
id | value | count
---+-------+------
3 | 1 | 1 // 1 other row with value 1 and a lower id
5 | 2 | 1 // 1 other row with value 2 and a lower id
6 | 1 | 2 // 2 other rows with value 1 and a lower id.
I can get the first two columns by doing
select t1.id, t1.value from t as t1 where exists
(select 1 from t as t2
where t2.value = t1.value and t2.id < t1.id);
However I can't work out how to get the count. Should I use having or group by to get the count?

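You can use row_number() for this. Numbering the rows within each value by id and subtracting 1 gives the count of earlier rows with the same value: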
select t.*
from (select t.*,
row_number() over (partition by value order by id) - 1 as prev_values
from t
) t
where prev_values > 0;
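To check the row_number() approach against the sample data from the question, here is a minimal sketch that runs it through Python's built-in sqlite3 module. The function name rows_with_earlier_duplicates and the subquery alias numbered are made up for illustration, and it assumes an SQLite build with window-function support (3.25 or later).

```python
import sqlite3

def rows_with_earlier_duplicates(rows):
    """Return (id, value, prev_values) for rows that repeat an earlier value."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, value INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    query = """
        SELECT id, value, prev_values
        FROM (SELECT id, value,
                     -- position within the value group, minus the row itself
                     ROW_NUMBER() OVER (PARTITION BY value ORDER BY id) - 1
                         AS prev_values
              FROM t) AS numbered
        WHERE prev_values > 0
        ORDER BY id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Sample table from the question
sample = [(1, 1), (2, 2), (3, 1), (4, 3), (5, 2), (6, 1)]
for row in rows_with_earlier_duplicates(sample):
    print(row)
```

With the sample rows this should list ids 3, 5 and 6 with counts 1, 1 and 2, matching the table the question asks for.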

Related

pulling data from max field

I have a table structure with columns similar to the following:
ID | line | value
1 | 1 | 10
1 | 2 | 5
2 | 1 | 6
3 | 1 | 7
3 | 2 | 4
Ideally, I'd like to pull the following:
ID | value
1 | 5
2 | 6
3 | 4
One solution would be to do something like the following:
select a.ID, a.value
from
myTable a
inner join (select id, max(line) as line from myTable group by id) b
on a.id = b.id and a.line = b.line
Given the size of the table and that this is just a part of a larger pull, I'd like to see if there's a more elegant / simpler way of pulling this directly.
This is a task for OLAP-functions:
select *
from myTable a
qualify
rank() -- assign a rank for each id
over (partition by id
order by line desc) = 1
This might return multiple rows per id if they share the same max line. If you want to return only one of them, add another column to the order by to make it unique, or switch to row_number to get an indeterminate row.
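QUALIFY is not standard SQL and many engines do not support it. On those, the same filter can be written by computing the window function in a subquery and filtering outside it. The following is a sketch only, run through Python's sqlite3 module (SQLite has no QUALIFY); the function name last_line_per_id and the alias ranked are hypothetical, and it uses row_number so that exactly one row per ID comes back.

```python
import sqlite3

def last_line_per_id(rows):
    """Return (ID, value) taken from the row with the highest line per ID."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE myTable (ID INTEGER, line INTEGER, value INTEGER)")
    conn.executemany("INSERT INTO myTable VALUES (?, ?, ?)", rows)
    query = """
        SELECT ID, value
        FROM (SELECT ID, value,
                     -- rn = 1 marks the highest line within each ID
                     ROW_NUMBER() OVER (PARTITION BY ID ORDER BY line DESC) AS rn
              FROM myTable) AS ranked
        WHERE rn = 1
        ORDER BY ID
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Sample table from the question
sample = [(1, 1, 10), (1, 2, 5), (2, 1, 6), (3, 1, 7), (3, 2, 4)]
print(last_line_per_id(sample))
```

The table is read once here, whereas the join against a grouped subquery reads it twice; whether that is faster in practice depends on the engine and indexes.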

sqlite query - select data which prefers some value

I'm trying to create an sqlite query but I'm having some problems.
Let's say the table has three columns: id, foreign_id and value.
I need to select one row for each distinct foreign_id with a given value; however, that value may not exist for every foreign_id.
In that case, a row where value is set to some fallback value must be selected for that foreign_id (such a row always exists).
I apologize for my English, since I'm not a native English speaker.
Here is an example:
Table:
id | foreign_id | value
------------------------
1 | 1 | 1
2 | 1 | 2
3 | 1 | 3
4 | 2 | 1
5 | 2 | 3
If desired value is 2 and fallback value is 1 then the query should return
id | foreign_id | value
------------------------
2 | 1 | 2
4 | 2 | 1
It returns the row with id 2 because it has the desired value 2 for foreign_id 1.
And it returns the row with id 4 because for foreign_id 2 a row with value 2 does not exist, so it selects the row with the fallback value of 1.
Hope that clears up my question a bit.
You might be able to do it with a Union... something like:
SELECT DISTINCT (foreign_id), value
FROM TABLE
WHERE value = 2
UNION
SELECT DISTINCT (foreign_id), '1' as value
FROM TABLE
WHERE foreign_id NOT IN (
SELECT foreign_id
FROM TABLE
WHERE value = 2
)
Every foreign_id that has a row with value 2 is returned with value 2, and every other foreign_id is returned with value 1.
(I haven't tested this query, so you might have to do some tweaking.)
This is the solution that I produced based on Seth's answer.
SELECT DISTINCT (foreign_id), value, id FROM testTable
WHERE value = 2
UNION SELECT DISTINCT (foreign_id), value, id FROM testTable
WHERE value = 1
AND foreign_id NOT IN
(
SELECT foreign_id
FROM testTable
WHERE value = 2
)
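For reference, here is a small sketch that runs this kind of query with Python's sqlite3 module against the sample table, with the desired and fallback values passed as parameters. The function name preferred_rows and the parameter names wanted and fallback are made up for illustration, and it assumes foreign_id is never NULL (NOT IN behaves unexpectedly when the subquery returns NULLs).

```python
import sqlite3

def preferred_rows(rows, wanted, fallback):
    """Per foreign_id, return the row with `wanted`, else the row with `fallback`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE testTable (id INTEGER PRIMARY KEY, foreign_id INTEGER, value INTEGER)"
    )
    conn.executemany("INSERT INTO testTable VALUES (?, ?, ?)", rows)
    query = """
        SELECT id, foreign_id, value
        FROM testTable
        WHERE value = :wanted
        UNION ALL
        SELECT id, foreign_id, value
        FROM testTable
        WHERE value = :fallback
          -- only fall back when the wanted value is missing for this foreign_id
          AND foreign_id NOT IN (SELECT foreign_id
                                 FROM testTable
                                 WHERE value = :wanted)
        ORDER BY foreign_id
    """
    result = conn.execute(query, {"wanted": wanted, "fallback": fallback}).fetchall()
    conn.close()
    return result

# Sample table from the question
sample = [(1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 2, 1), (5, 2, 3)]
print(preferred_rows(sample, wanted=2, fallback=1))
```

UNION ALL is used because the two branches cannot overlap, so there are no duplicates for a plain UNION to remove.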

SQL advanced query counting the max value of a group

I want to create a query that will count the number of times the following condition is met.
I have a table that consists of multiple records with a matching foreign key. For each foreign key group, I want to check whether the highest value of another column occurs more than once within that group. If it does, that group adds one to the count.
Data
--------------------------
ID | Foreign Key | Value
--------------------------
1 | 1 | 1
2 | 1 | 2
3 | 1 | 2
4 | 2 | 0
5 | 2 | 2
6 | 2 | 1
7 | 3 | 0
8 | 3 | 1
9 | 3 | 1
The query I want should return the number 2. This is because the maximum value in group 1(Foreign Key) occurs twice, the value is 2. In group 2 the maximum value is 2 but only occurs once so this will not up the count. Then in group 3 the maximum value is 1 which occurs twice which will up the count. The count therefore ends up as two.
All credit goes to the comment from @Bob, but here is the SQL that solved this problem.
SELECT Count(1)
FROM (SELECT DISTINCT foreign_key
FROM (SELECT foreign_key,
Count(1)
FROM data
WHERE ( foreign_key, value ) IN (SELECT foreign_key,
Max(value)
FROM data
GROUP BY foreign_key)
GROUP BY foreign_key
HAVING Count(1) > 1) AS data) AS data;
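As a sanity check, the following sketch runs a slightly condensed form of that query against the sample data using Python's sqlite3 module. The function name count_groups_with_tied_max and the alias tied are hypothetical, and it assumes an SQLite version that supports row values in IN (3.15 or later).

```python
import sqlite3

def count_groups_with_tied_max(rows):
    """Count foreign-key groups whose maximum value occurs more than once."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data (id INTEGER PRIMARY KEY, foreign_key INTEGER, value INTEGER)"
    )
    conn.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)
    query = """
        SELECT COUNT(*)
        FROM (SELECT foreign_key
              FROM data
              -- keep only the rows that hold their group's maximum
              WHERE (foreign_key, value) IN (SELECT foreign_key, MAX(value)
                                             FROM data
                                             GROUP BY foreign_key)
              GROUP BY foreign_key
              HAVING COUNT(*) > 1) AS tied
    """
    (count,) = conn.execute(query).fetchone()
    conn.close()
    return count

# Sample table from the question
sample = [(1, 1, 1), (2, 1, 2), (3, 1, 2),
          (4, 2, 0), (5, 2, 2), (6, 2, 1),
          (7, 3, 0), (8, 3, 1), (9, 3, 1)]
print(count_groups_with_tied_max(sample))
```

Because GROUP BY already yields one row per foreign_key, the extra SELECT DISTINCT layer is dropped in this sketch.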
Another approach uses a window function to find each group's maximum, then counts the groups where that maximum appears more than once:
select count(*)
from (select fk
from (select t.*, max(val) over (partition by fk) as max_val_by_grp
from tbl t) x
where val = max_val_by_grp
group by fk
having count(*) > 1) y

Count rows in each 'partition' of a table

Disclaimer: I don't mean partition in the window function sense, nor table partitioning; I mean it in the more general sense, i.e. to divide up.
Here's a table:
id | y
----+------------
1 | 1
2 | 1
3 | 1
4 | 2
5 | 2
6 | null
7 | 2
8 | 2
9 | null
10 | null
I'd like to partition by checking equality on y, such that I end up with counts of the number of times each value of y appears contiguously, when sorted on id (i.e. in the order shown).
Here's the output I'm looking for:
y | count
-----+----------
1 | 3
2 | 2
null | 1
2 | 2
null | 2
So reading down the rows in that output we have:
The first partition of three 1's
The first partition of two 2's
The first partition of a null
The second partition of two 2's
The second partition of two nulls
Try:
SELECT y, count(*)
FROM (
SELECT y,
sum( xyz ) OVER (
ORDER BY id
rows between unbounded preceding
and current row
) qwe
FROM (
SELECT *,
case
when y is null and
lag(y) OVER ( ORDER BY id ) is null
then 0
when y = lag(y) OVER ( ORDER BY id )
then 0
else 1 end xyz
FROM table1
) alias
) alias
GROUP BY qwe, y
ORDER BY qwe;
demo: http://sqlfiddle.com/#!15/b1794/12
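The same idea (flag the start of each run, then take a running sum of the flags to number the runs) can be tried locally. Below is a sketch using Python's sqlite3 module with the sample rows; the names is_new_run, run_id, flagged and numbered are illustrative replacements for xyz, qwe and alias, and it assumes SQLite 3.25 or later for window functions.

```python
import sqlite3

def contiguous_counts(rows):
    """Return (y, run_length) for each contiguous run of y when ordered by id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, y INTEGER)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)
    query = """
        SELECT y, COUNT(*) AS run_length
        FROM (SELECT id, y,
                     -- running sum of the flags numbers the runs
                     SUM(is_new_run) OVER (ORDER BY id
                                           ROWS BETWEEN UNBOUNDED PRECEDING
                                                    AND CURRENT ROW) AS run_id
              FROM (SELECT id, y,
                           -- 1 when this row starts a new run, 0 otherwise
                           CASE
                               WHEN y IS NULL
                                    AND LAG(y) OVER (ORDER BY id) IS NULL THEN 0
                               WHEN y = LAG(y) OVER (ORDER BY id) THEN 0
                               ELSE 1
                           END AS is_new_run
                    FROM table1) AS flagged) AS numbered
        GROUP BY run_id, y
        ORDER BY run_id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Sample table from the question (None stands for SQL NULL)
sample = [(1, 1), (2, 1), (3, 1), (4, 2), (5, 2),
          (6, None), (7, 2), (8, 2), (9, None), (10, None)]
for y, run_length in contiguous_counts(sample):
    print(y, run_length)
```

NULLs need the separate first branch because y = lag(y) evaluates to NULL, not true, when either side is NULL.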

SQL - min() gets the lowest value, max() the highest, what if I want the 2nd (or 5th or nth) lowest value?

The problem I'm trying to solve is that I have a table like this:
a and b refer to points in a different table; distance is the distance between the points.
| id | a_id | b_id | distance | delete |
| 1 | 1 | 1 | 1 | 0 |
| 2 | 1 | 2 | 0.2345 | 0 |
| 3 | 1 | 3 | 100 | 0 |
| 4 | 2 | 1 | 1343.2 | 0 |
| 5 | 2 | 2 | 0.45 | 0 |
| 6 | 2 | 3 | 110 | 0 |
....
The important column I'm looking at is a_id. If I wanted to keep the closest b for each a, I could do something like this:
update mytable set delete = 1 from (select a_id, min(distance) as dist from mytable group by a_id) as x where mytable.a_id = x.a_id and distance > dist;
delete from mytable where delete = 1;
Which would give me a result table like this:
| id | a_id | b_id | distance | delete |
| 1 | 1 | 1 | 1 | 0 |
| 5 | 2 | 2 | 0.45 | 0 |
....
i.e. I need one row for each value of a_id, and that row should have the lowest value of distance for each a_id.
However, I want to keep the 10 closest points for each a_id. I could do this with a plpgsql function, but I'm curious whether there is a more SQL-y way.
min() and max() return the smallest and largest values. If there were an aggregate function like nth(), which returned the nth largest/smallest value, then I could do this in a similar manner to the above.
I'm using PostgreSQL.
Try this:
SELECT *
FROM (
SELECT a_id, (
SELECT b_id
FROM mytable mib
WHERE mib.a_id = ma.a_id
ORDER BY
distance
LIMIT 1 OFFSET s
) AS b_id
FROM (
SELECT DISTINCT a_id
FROM mytable mia
) ma, generate_series (0, 9) s
) ab
WHERE b_id IS NOT NULL
Checked on PostgreSQL 8.3
I love postgres, so I took it as a challenge the second I saw this question.
So, for the table:
Table "pg_temp_29.foo"
Column | Type | Modifiers
--------+---------+-----------
value | integer |
With the values:
SELECT value FROM foo ORDER BY value;
value
-------
0
1
2
3
4
5
6
7
8
9
14
20
32
(13 rows)
You can do a:
SELECT value FROM foo ORDER BY value DESC LIMIT 1 OFFSET X
Where X = 0 for the highest value, 1 for the second highest, 2... And so forth.
This can be further embedded in a subquery to retrieve the value needed. So, to use the dataset provided in the original question, we can get the rows with the ten lowest distances for each a_id by doing:
SELECT a_id, distance FROM mytable t1
WHERE id IN
(SELECT id FROM mytable t2 WHERE t1.a_id = t2.a_id
ORDER BY distance LIMIT 10)
ORDER BY a_id, distance;
a_id | distance
------+----------
1 | 0.2345
1 | 1
1 | 100
2 | 0.45
2 | 110
2 | 1343.2
If your PostgreSQL version has the analytic function rank() (window functions arrived in 8.4), try:
select a_id, b_id, distance
from
( select a_id, b_id, distance, rank() over (partition by a_id order by distance) rnk
from mytable
) ranked where rnk <= 10;
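A window-function version of "keep the n closest rows per a_id" can be tried with Python's sqlite3 module, as sketched below. The function name closest_per_a and the alias ranked are hypothetical. The sketch uses row_number rather than rank and adds id as a tie-breaker, so that at most n rows come back per a_id even when distances tie; that is a design choice of this example, not something the question specifies.

```python
import sqlite3

def closest_per_a(rows, n):
    """Return (a_id, b_id, distance) for the n smallest distances per a_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (id INTEGER PRIMARY KEY, a_id INTEGER, "
        "b_id INTEGER, distance REAL)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT a_id, b_id, distance
        FROM (SELECT a_id, b_id, distance,
                     -- 1 = closest b for this a_id, 2 = next closest, ...
                     ROW_NUMBER() OVER (PARTITION BY a_id
                                        ORDER BY distance, id) AS rn
              FROM mytable) AS ranked
        WHERE rn <= ?
        ORDER BY a_id, distance
    """
    result = conn.execute(query, (n,)).fetchall()
    conn.close()
    return result

# The six rows shown in the question: (id, a_id, b_id, distance)
sample = [(1, 1, 1, 1.0), (2, 1, 2, 0.2345), (3, 1, 3, 100.0),
          (4, 2, 1, 1343.2), (5, 2, 2, 0.45), (6, 2, 3, 110.0)]
for row in closest_per_a(sample, 2):
    print(row)
```

To delete everything else, as in the question, the same subquery could select the ids with rn above the cutoff and feed a DELETE ... WHERE id IN (...) statement.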
This SQL should find the rows with the Nth lowest distance, and should work in SQL Server, MySQL, DB2, Oracle, Teradata, and almost any other RDBMS (note: performance is poor because of the correlated subquery):
SELECT * /*This is the outer query part */
FROM mytable tbl1
WHERE (N-1) = ( /* Subquery starts here */
SELECT COUNT(DISTINCT(tbl2.distance))
FROM mytable tbl2
WHERE tbl2.distance < tbl1.distance)
The most important thing to understand in the query above is that the subquery is evaluated each and every time a row is processed by the outer query. In other words, the inner query cannot be processed independently of the outer query, since the inner query uses the tbl1 value as well.
In order to find the Nth lowest value, we just find the value that has exactly N-1 values lower than itself.
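The counting rule can be checked on the small foo table from the earlier answer. This is a sketch using Python's sqlite3 module; the function name nth_lowest is made up for illustration, and N is passed as a query parameter instead of being written into the SQL.

```python
import sqlite3

def nth_lowest(values, n):
    """Return the n-th lowest distinct value, or None if there are fewer than n."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE foo (value INTEGER)")
    conn.executemany("INSERT INTO foo VALUES (?)", [(v,) for v in values])
    query = """
        SELECT DISTINCT tbl1.value
        FROM foo AS tbl1
        -- keep the value that has exactly n - 1 distinct values below it
        WHERE ? - 1 = (SELECT COUNT(DISTINCT tbl2.value)
                       FROM foo AS tbl2
                       WHERE tbl2.value < tbl1.value)
    """
    row = conn.execute(query, (n,)).fetchone()
    conn.close()
    return row[0] if row else None

# Values from the foo table shown earlier
values = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 14, 20, 32]
print(nth_lowest(values, 2))
print(nth_lowest(values, 11))
```

Because the subquery counts distinct values, duplicates do not shift the ranking: the second lowest of 5, 5, 7 is 7.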