Group By Retrieve 4 Values - sql

I have the following query:
SELECT Cod ,
MIN(Id) AS id_Min,
-- retrieve value min in the middle as id_Min_Middle,
-- retrieve value max in the middle as id_Max_Middle,
MAX(Id) AS id_Max,
COUNT(*) AS Tot
FROM [Table] a WITH (NOLOCK)
GROUP BY Cod
HAVING COUNT(*)=4
How can I retrieve the values between the min and the max, as I have done for the min and max themselves?
If I use SUM(Id) - (MIN(Id) + MAX(Id)), I get the sum of the two middle values, but not the individual values I want.
EXAMPLES
Cod      | Id
Stack    | 10
Stack    | 15
Stack    | 11
Stack    | 40
Overflow | 1
Overflow | 120
Overflow | 15
Overflow | 100
Required output
Cod      | Min | Min_In_The_Middle | Max_In_The_Middle | Max
Stack    | 10  | 11                | 15                | 40
Overflow | 1   | 15                | 100               | 120

This needs just one scan (a Table Scan, Index Scan or Clustered Index Scan, depending on your indexes):
SELECT pvt.Cod,
pvt.[1] AS MinValue,
pvt.[2] AS MinInterValue,
pvt.[3] AS MaxInterValue,
pvt.[4] AS MaxValue
FROM
(
SELECT x.Cod, x.ID, x.RowNumAsc
FROM
(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY t.Cod ORDER BY t.ID ASC) RowNumAsc,
ROW_NUMBER() OVER(PARTITION BY t.Cod ORDER BY t.ID DESC) RowNumDesc
FROM MyTable t
) x
WHERE x.RowNumAsc = 1 AND x.RowNumDesc = 4
OR x.RowNumAsc = 2 AND x.RowNumDesc = 3
OR x.RowNumAsc = 3 AND x.RowNumDesc = 2
OR x.RowNumAsc = 4 AND x.RowNumDesc = 1
) y
PIVOT ( MAX(y.ID) FOR y.RowNumAsc IN ([1], [2], [3], [4]) ) pvt;
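To try the row-numbering idea locally, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server. It assumes SQLite 3.25 or newer, which is needed for window functions. SQLite has no PIVOT operator, so MAX(CASE ...) does the pivoting instead. The four OR conditions are condensed into RowNumAsc + RowNumDesc = 5, which admits exactly the same (1,4), (2,3), (3,2), (4,1) pairs. The function name four_values is hypothetical.

```python
import sqlite3

# Sample rows from the question: (Cod, Id)
ROWS = [("Stack", 10), ("Stack", 15), ("Stack", 11), ("Stack", 40),
        ("Overflow", 1), ("Overflow", 120), ("Overflow", 15), ("Overflow", 100)]

QUERY = """
SELECT Cod,
       MAX(CASE WHEN RowNumAsc = 1 THEN Id END) AS MinValue,
       MAX(CASE WHEN RowNumAsc = 2 THEN Id END) AS MinInterValue,
       MAX(CASE WHEN RowNumAsc = 3 THEN Id END) AS MaxInterValue,
       MAX(CASE WHEN RowNumAsc = 4 THEN Id END) AS MaxValue
FROM (
    SELECT Cod, Id,
           ROW_NUMBER() OVER (PARTITION BY Cod ORDER BY Id ASC)  AS RowNumAsc,
           ROW_NUMBER() OVER (PARTITION BY Cod ORDER BY Id DESC) AS RowNumDesc
    FROM MyTable
) x
-- Same filter as the four OR conditions: only (1,4), (2,3), (3,2), (4,1) sum to 5
WHERE RowNumAsc + RowNumDesc = 5
GROUP BY Cod
ORDER BY Cod
"""

def four_values(rows):
    """Return (Cod, min, min-middle, max-middle, max) for each 4-row group."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE MyTable (Cod TEXT, Id INTEGER)")
    con.executemany("INSERT INTO MyTable VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

print(four_values(ROWS))
```

When the Id values within a group are distinct, a group of n rows always has RowNumAsc + RowNumDesc = n + 1. The filter therefore drops every group that does not have exactly 4 rows.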

Try this; best of luck:
WITH temp AS
(SELECT cod, MIN (ID) min_id, MAX (ID) max_id
FROM tab
GROUP BY cod
HAVING COUNT (ID) = 4)
SELECT temp.cod, temp.min_id,
(SELECT MIN (ID)
FROM tab
WHERE cod = temp.cod AND ID NOT IN (temp.min_id)
GROUP BY cod) min_mid_id,
(SELECT MAX (ID)
FROM tab
WHERE cod = temp.cod AND ID NOT IN (temp.max_id)
GROUP BY cod) max_mid_id, temp.max_id
FROM temp;

I'm not sure what it means for your question to be tagged plsql and sql-server. But I'll assume you're working with a database system that supports CTEs and window functions.
To generalize what you've been trying to do, first assign row numbers to the rows, then use whatever technique you want to achieve the pivot:
;WITH OrderedValues as (
SELECT Cod,Id,ROW_NUMBER() OVER (PARTITION BY Cod ORDER BY Id) as rn,
COUNT(*) OVER (PARTITION BY Cod) as Cnt
FROM [Table] WITH (NOLOCK)
), With4Values as (
SELECT * from OrderedValues where Cnt=4
)
SELECT Cod,
--However you want to do the pivot. Here I'll use MAX/CASE
MAX(CASE WHEN rn=1 THEN Id END) as Value1,
MAX(CASE WHEN rn=2 THEN Id END) as Value2,
MAX(CASE WHEN rn=3 THEN Id END) as Value3,
MAX(CASE WHEN rn=4 THEN Id END) as Value4
FROM
With4Values
GROUP BY
Cod
Hopefully you can see that this extends to more columns more easily than answers tailored to your overly specific questions about 3 rows or 4 rows. But if you need to deal with an arbitrary number of columns, you'll have to switch to dynamic SQL.
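One way to sketch the dynamic-SQL route is to generate the MAX(CASE ...) columns in application code. The sketch below is in Python, with sqlite3 standing in for the real database. build_pivot_sql and pivot are hypothetical helpers. The table is named T because TABLE is a reserved word. Only an integer n is interpolated into the SQL, never user-supplied text.

```python
import sqlite3

def build_pivot_sql(n):
    """Build a query returning exactly n ordered Id values per Cod."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # One MAX(CASE ...) column per position; n is an int, so formatting is safe.
    cols = ",\n       ".join(
        f"MAX(CASE WHEN rn = {i} THEN Id END) AS Value{i}" for i in range(1, n + 1)
    )
    return f"""
WITH OrderedValues AS (
    SELECT Cod, Id,
           ROW_NUMBER() OVER (PARTITION BY Cod ORDER BY Id) AS rn,
           COUNT(*) OVER (PARTITION BY Cod) AS Cnt
    FROM T
)
SELECT Cod,
       {cols}
FROM OrderedValues
WHERE Cnt = {n}
GROUP BY Cod
ORDER BY Cod"""

def pivot(rows, n):
    """Run the generated query against an in-memory table built from rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE T (Cod TEXT, Id INTEGER)")
    con.executemany("INSERT INTO T VALUES (?, ?)", rows)
    result = con.execute(build_pivot_sql(int(n))).fetchall()
    con.close()
    return result

print(pivot([("a", 3), ("a", 1), ("a", 2), ("b", 5), ("b", 4)], 3))
```

In SQL Server itself, the same string-building would typically happen in T-SQL and be run with sp_executesql. The Python version only shows the shape of the generated statement.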

I understand that you want to exclude the extreme values and find the min and max of the rest.
This is what I came up with, but I haven't had a chance to run and test it:
WITH Extremes AS ( SELECT Cod, MAX(ID) AS Id_Max, MIN(ID) AS Id_Min
FROM [Table] a GROUP BY Cod HAVING COUNT(*) = 4)
SELECT
e.Cod,
e.Id_Min,
MIN(a.Id) AS id_Min_Middle,
MAX(a.Id) AS id_Max_Middle,
e.Id_Max
FROM Extremes e
LEFT JOIN [Table] a ON a.Cod = e.Cod AND a.Id > e.Id_Min AND a.Id < e.Id_Max
GROUP BY e.Cod, e.Id_Min, e.Id_Max
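The extremes-based query can be checked quickly with a Python sketch, using sqlite3 as an assumed stand-in for SQL Server. It includes the HAVING COUNT(*) = 4 filter and the full GROUP BY. The helper name middle_values is hypothetical. Because the join uses strict inequalities, the query assumes the Id values within each Cod are distinct.

```python
import sqlite3

QUERY = """
WITH Extremes AS (
    SELECT Cod, MAX(Id) AS Id_Max, MIN(Id) AS Id_Min
    FROM T
    GROUP BY Cod
    HAVING COUNT(*) = 4
)
SELECT e.Cod, e.Id_Min,
       MIN(a.Id) AS id_Min_Middle,
       MAX(a.Id) AS id_Max_Middle,
       e.Id_Max
FROM Extremes e
-- only rows strictly between the extremes take part in the middle MIN/MAX
LEFT JOIN T a ON a.Cod = e.Cod AND a.Id > e.Id_Min AND a.Id < e.Id_Max
GROUP BY e.Cod, e.Id_Min, e.Id_Max
ORDER BY e.Cod
"""

def middle_values(rows):
    """Return (Cod, min, min-middle, max-middle, max) per 4-row group."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE T (Cod TEXT, Id INTEGER)")
    con.executemany("INSERT INTO T VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

SAMPLE = [("Stack", 10), ("Stack", 15), ("Stack", 11), ("Stack", 40),
          ("Overflow", 1), ("Overflow", 120), ("Overflow", 15), ("Overflow", 100)]
print(middle_values(SAMPLE))
```

Repeated extremes expose the limitation. For Ids 5, 5, 7, 9 this returns 7 in both middle columns, whereas the ROW_NUMBER-based answers return 5 and 7.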

Related

select value based on max of other column

I have a few questions about a table I'm trying to make in Postgres.
The following table is my input:
id | area | count | function
1  | 100  | 20    | living
1  | 200  | 30    | industry
2  | 400  | 10    | living
2  | 400  | 10    | industry
2  | 400  | 20    | education
3  | 150  | 1     | industry
3  | 150  | 1     | education
I want to group by id, summing area and count, and get the dominant function based on the maximum area. When areas are equal, it should be based on the maximum count; when both area and count are equal, it should be based on a priority order of functions (I still have to decide whether education takes priority over industry or vice versa). So the result should be:
id | area | count | function
1  | 300  | 50    | industry
2  | 1200 | 40    | education
3  | 300  | 2     | industry
I have tried a lot of things, and maybe it's easy, but I don't get it. Can someone help me with the right SQL?
One method uses row_number() and conditional aggregation:
select id, sum(area) as area, sum(count) as count,
max(function) filter (where seqnum = 1) as function
from (select t.*,
row_number() over (partition by id order by area desc, count desc, function desc) as seqnum
from t
) t
group by id;
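Here is a minimal Python sketch that runs the row_number() approach against the sample data, with sqlite3 standing in for Postgres (an assumption). For portability, MAX(CASE ...) replaces the FILTER clause. The window's ORDER BY encodes the stated tie-breakers: area, then count, then function descending. That last rule picks industry over education for id 3. dominant_function is a hypothetical name.

```python
import sqlite3

ROWS = [(1, 100, 20, "living"), (1, 200, 30, "industry"),
        (2, 400, 10, "living"), (2, 400, 10, "industry"), (2, 400, 20, "education"),
        (3, 150, 1, "industry"), (3, 150, 1, "education")]

QUERY = """
SELECT id,
       SUM(area)    AS area,
       SUM("count") AS "count",
       MAX(CASE WHEN seqnum = 1 THEN "function" END) AS "function"
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY id
                              ORDER BY area DESC, "count" DESC, "function" DESC) AS seqnum
    FROM t
) s
GROUP BY id
ORDER BY id
"""

def dominant_function(rows):
    """Return (id, total area, total count, dominant function) per id."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE t (id INTEGER, area INTEGER, "count" INTEGER, "function" TEXT)')
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

print(dominant_function(ROWS))
```

If education should win the final tie instead, change "function" DESC to a CASE expression that ranks the functions explicitly.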
Another method uses distinct on:
select distinct on (t.id) t.id,
sum(t.area) over (partition by t.id) as area,
sum(t.count) over (partition by t.id) as count,
t.function
from t
order by t.id, t.area desc, t.count desc, t.function desc;
Use a scalar sub-query for "function".
select t.id, sum(t.area), sum(t.count),
(
select "function"
from the_table
where id = t.id
order by area desc, count desc, "function" desc
limit 1
) as "function"
from the_table as t
group by t.id order by t.id;
You can use sum as a window function:
select distinct on (t.id)
t.id,
sum(t.area) over (partition by t.id) as area,
sum(t.count) over (partition by t.id) as count,
( select function from tbl_test where tbl_test.id = t.id order by area desc, count desc, function desc limit 1 ) as function
from tbl_test t
order by t.id;
This is how you get the function for each group based on id:
select yt1.id, yt1.function
from yourtable yt1
left join yourtable yt2
on yt1.id = yt2.id and yt1.area < yt2.area
where yt2.id is null;
(This ensures that no yt2 row exists with the same id but a higher area.)
This works nicely, but several rows may share the maximum area with different function values. To cope with this issue, let's ensure that exactly one is chosen:
select yt1.id, max(yt1.function) as function
from yourtable yt1
left join yourtable yt2
on yt1.id = yt2.id and yt1.area < yt2.area
where yt2.id is null
group by yt1.id;
Now, let's join this to our main table:
select yourtable.id, sum(yourtable.area), sum(yourtable.count), t.function
from yourtable
join (
select yt1.id, max(yt1.function) as function
from yourtable yt1
left join yourtable yt2
on yt1.id = yt2.id and yt1.area < yt2.area
where yt2.id is null
group by yt1.id
) t
on yourtable.id = t.id
group by yourtable.id, t.function;
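The anti-join above compares only area. For id 2, all three rows tie at area 400, so max(function) returns living rather than the desired education. One possible extension adds count as a second tie-breaker in the join condition and lets max(function) settle any remaining ties. The sketch below uses sqlite3 as a stand-in for Postgres; best_function is a hypothetical helper.

```python
import sqlite3

ROWS = [(1, 100, 20, "living"), (1, 200, 30, "industry"),
        (2, 400, 10, "living"), (2, 400, 10, "industry"), (2, 400, 20, "education"),
        (3, 150, 1, "industry"), (3, 150, 1, "education")]

# A row survives only if no row with the same id beats it on area, then on count.
QUERY = """
SELECT yt1.id, MAX(yt1."function") AS "function"
FROM t yt1
LEFT JOIN t yt2
       ON yt2.id = yt1.id
      AND (yt2.area > yt1.area
           OR (yt2.area = yt1.area AND yt2."count" > yt1."count"))
WHERE yt2.id IS NULL
GROUP BY yt1.id
ORDER BY yt1.id
"""

def best_function(rows):
    """Return (id, function) pairs; max() breaks any remaining area/count tie."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE t (id INTEGER, area INTEGER, "count" INTEGER, "function" TEXT)')
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

print(best_function(ROWS))
```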

select two grouped by columns but only select row with the highest COUNT()

I have a table that consists of three columns: UPC, ATTRIBUTE, STORE_NUM. I have 10 stores, and each store carries 2 UPCs with different ATTRIBUTEs.
Every store has either attribute X or Y. I group by UPC and ATTRIBUTE and get the count of stores:
SELECT [UPC], [ATTRIBUTE], COUNT([STORE_NUM]) AS [COUNT]
FROM TABLEA
GROUP BY [UPC], [ATTRIBUTE]
Yields this:
UPC ATTRIBUTE COUNT
1 X 8
1 Y 2
2 X 1
2 Y 9
And I want to select UPC and ATTRIBUTE with the highest count. My desired output would be this:
UPC ATTRIBUTE
1 X
2 Y
I can't figure out how to reach this desired outcome.
You can use window functions with aggregation:
SELECT *
FROM (SELECT [UPC], [ATTRIBUTE], COUNT(*) AS [COUNT],
ROW_NUMBER() OVER (PARTITION BY UPC ORDER BY COUNT(*) DESC) as seqnum
FROM TABLEA
GROUP BY [UPC], [ATTRIBUTE]
) x
WHERE seqnum = 1;
Use RANK() if you want duplicates in the event of ties.
Use row_number and a subquery:
SELECT UPC, ATTRIBUTE
FROM (
SELECT UPC, ATTRIBUTE, ROW_NUMBER() OVER (PARTITION BY UPC ORDER BY a_count DESC) as rn
FROM ( SELECT [UPC],[ATTRIBUTE],COUNT([STORE_NUM]) AS [a_COUNT]
FROM TABLEA
GROUP BY [UPC],[ATTRIBUTE]
) t
) q
WHERE q.rn = 1
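The difference between ROW_NUMBER() and RANK() on ties shows up in a small sketch, with sqlite3 standing in for SQL Server (an assumption). It rebuilds the store data from the counts in the question and adds a hypothetical UPC 3 split 5/5. It uses the nested form from the answer above: aggregate first, then rank. make_rows and top_attribute are hypothetical helpers.

```python
import sqlite3

def make_rows():
    """Rebuild (UPC, ATTRIBUTE, STORE_NUM) rows from per-attribute store counts."""
    counts = {(1, "X"): 8, (1, "Y"): 2, (2, "X"): 1, (2, "Y"): 9,
              (3, "X"): 5, (3, "Y"): 5}  # UPC 3 is a hypothetical tie
    rows = []
    for (upc, attr), n in counts.items():
        # Stores 0..9: X stores take the low numbers, Y stores the high ones
        stores = range(n) if attr == "X" else range(10 - n, 10)
        rows.extend((upc, attr, s) for s in stores)
    return rows

def top_attribute(rows, rank_fn="ROW_NUMBER"):
    """Return the most common ATTRIBUTE per UPC; rank_fn is ROW_NUMBER or RANK."""
    if rank_fn not in ("ROW_NUMBER", "RANK"):
        raise ValueError(rank_fn)
    sql = f"""
SELECT UPC, ATTRIBUTE
FROM (
    SELECT UPC, ATTRIBUTE,
           {rank_fn}() OVER (PARTITION BY UPC ORDER BY a_count DESC) AS rn
    FROM (SELECT UPC, ATTRIBUTE, COUNT(STORE_NUM) AS a_count
          FROM TABLEA
          GROUP BY UPC, ATTRIBUTE) t
) q
WHERE q.rn = 1
ORDER BY UPC, ATTRIBUTE"""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE TABLEA (UPC INTEGER, ATTRIBUTE TEXT, STORE_NUM INTEGER)")
    con.executemany("INSERT INTO TABLEA VALUES (?, ?, ?)", rows)
    result = con.execute(sql).fetchall()
    con.close()
    return result

print(top_attribute(make_rows()))          # one row per UPC; UPC 3 picked arbitrarily
print(top_attribute(make_rows(), "RANK"))  # both tied attributes for UPC 3
```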

Finding top count of a value in a table using SQL

I'm looking for a way to find the most frequent value of a column using SQL.
If for example this is my data
id type
----------
1 A
1 B
1 A
2 C
2 D
2 D
I would like the result to be:
1 A
2 D
I'm looking for a way to do it without grouping by the column I count (type in the example).
Thanks
Statistically, this is called the "mode". You can calculate it using window functions:
select id, type, cnt
from (select id, type, count(*) as cnt,
row_number() over (partition by id order by count(*) desc) as seqnum
from t
group by id, type
) t
where seqnum = 1;
If there are ties, then an arbitrary value is chosen from among the ties.
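Here is a minimal sqlite3 sketch of the window-function mode query on the sample data. SQLite stands in for the asker's database, which is an assumption. The counting happens in an inner query and ROW_NUMBER() runs over its result. This sketch also adds type to the window's ORDER BY so that ties resolve alphabetically rather than arbitrarily. That is a small departure from the answer above. mode_per_id is a hypothetical name.

```python
import sqlite3

ROWS = [(1, "A"), (1, "B"), (1, "A"), (2, "C"), (2, "D"), (2, "D")]

# Counts are computed in an inner query, then ROW_NUMBER picks the top per id.
# "type" in the window ORDER BY makes tie-breaking deterministic (alphabetical).
QUERY = """
SELECT id, type, cnt
FROM (SELECT id, type, cnt,
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY cnt DESC, type) AS seqnum
      FROM (SELECT id, type, COUNT(*) AS cnt FROM t GROUP BY id, type) g
     ) r
WHERE seqnum = 1
ORDER BY id
"""

def mode_per_id(rows):
    """Return (id, most frequent type, its count) for each id."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER, type TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

print(mode_per_id(ROWS))
```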
You are looking for the statistical mode (the most frequently occurring value); Oracle, for example, provides the STATS_MODE aggregate:
select id, stats_mode(type)
from mytable
group by id
order by id;
Not all DBMSs support this, however. Check your documentation to see whether this function, or a similar one, is available in your DBMS.
Just GROUP BY id, type and keep the rows whose count equals the highest count for the same id:
select t1.id, t1.type
from tablename t1
group by t1.id, t1.type
having count(*) = (
select count(*) from tablename t2 where t2.id = t1.id group by t2.type order by count(*) desc limit 1
)
Or
select t1.id, t1.type
from tablename t1
group by t1.id, t1.type
having count(*) = (
select max(c.counter) from (select id, count(*) counter from tablename group by id, type) c where c.id = t1.id
)

SQL query to get recent items

I have a sql table
id item date
A apple 2017-09-17
A banana 2017-08-10
A orange 2017-10-01
B banana 2015-06-17
B apple 2014-06-18
How do I write an SQL query so that, for each id, I get the two most recent items based on date? For example:
id recent second_recent
a orange apple
b banana apple
You can use row_number() and conditional aggregation:
select id,
max(case when seqnum = 1 then item end) as most_recent,
max(case when seqnum = 2 then item end) as most_recent_but_one
from (select t.*,
row_number() over (partition by id order by date desc) as seqnum
from t
) t
group by id;
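A minimal sketch of the conditional-aggregation answer on the sample rows, using Python's sqlite3 as a stand-in database (an assumption). Dates are stored as ISO-8601 text, which sorts chronologically. The date column is quoted only to keep it clearly a column name. two_most_recent is a hypothetical helper.

```python
import sqlite3

ROWS = [("A", "apple", "2017-09-17"), ("A", "banana", "2017-08-10"),
        ("A", "orange", "2017-10-01"), ("B", "banana", "2015-06-17"),
        ("B", "apple", "2014-06-18")]

QUERY = """
SELECT id,
       MAX(CASE WHEN seqnum = 1 THEN item END) AS most_recent,
       MAX(CASE WHEN seqnum = 2 THEN item END) AS most_recent_but_one
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY "date" DESC) AS seqnum
      FROM t) s
GROUP BY id
ORDER BY id
"""

def two_most_recent(rows):
    """Return (id, most recent item, second most recent item) per id."""
    con = sqlite3.connect(":memory:")
    # ISO-8601 date strings sort chronologically, so TEXT is enough here
    con.execute('CREATE TABLE t (id TEXT, item TEXT, "date" TEXT)')
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

print(two_most_recent(ROWS))
```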
As described in "SQL: Group by minimum value in one field while selecting distinct rows", you can use a GROUP BY to get the extreme date per ID and join it back to the table. Since you want the most recent item, use MAX rather than MIN:
SELECT mt.*
FROM MyTable mt INNER JOIN
(
SELECT ID, MAX(date) AS MaxDate
FROM MyTable
GROUP BY ID
) t ON mt.ID = t.ID AND mt.date = t.MaxDate
I think you can do something similar with an ORDER BY to get two values instead of one.
You can use PIVOT (SQL Server syntax shown):
SELECT first_column AS <first_column_alias>,
[pivot_value1], [pivot_value2], ... [pivot_value_n]
FROM
(<source_table>) AS <source_table_alias>
PIVOT
(
aggregate_function(<aggregate_column>)
FOR <pivot_column> IN ([pivot_value1], [pivot_value2], ... [pivot_value_n])
) AS <pivot_table_alias>;

PostgreSQL - column value changed - select query optimization

Say we have a table:
CREATE TABLE p
(
id serial NOT NULL,
val boolean NOT NULL,
PRIMARY KEY (id)
);
Populated with some rows:
insert into p (val)
values (true),(false),(false),(true),(true),(true),(false);
ID VAL
1 1
2 0
3 0
4 1
5 1
6 1
7 0
I want to determine when the value has been changed. So the result of my query should be:
ID VAL
2 0
4 1
7 0
I have a solution with joins and subqueries:
select min(id) id, val from
(
select p1.id, p1.val, max(p2.id) last_prev
from p p1
join p p2
on p2.id < p1.id and p2.val != p1.val
group by p1.id, p1.val
) tmp
group by val, last_prev
order by id;
But it is very inefficient and will run extremely slowly on tables with many rows.
I believe there could be a more efficient solution using PostgreSQL window functions?
This is how I would do it with an analytic:
SELECT id, val
FROM ( SELECT id, val
,LAG(val) OVER (ORDER BY id) AS prev_val
FROM p ) x
WHERE val <> COALESCE(prev_val, val)
ORDER BY id
Update (some explanation):
Analytic functions operate as a post-processing step. The query result is broken into groupings (partition by) and the analytic function is applied within the context of a grouping.
In this case, the query is a selection from p. The analytic function being applied is LAG. Since there is no partition by clause, there is only one grouping: the entire result set. This grouping is ordered by id. LAG returns the value of the previous row in the grouping using the specified order. The result is each row having an additional column (aliased prev_val) which is the val of the preceding row. That is the subquery.
Then we look for rows where the val does not match the val of the previous row (prev_val). The COALESCE handles the special case of the first row which does not have a previous value.
Analytic functions may seem a bit strange at first, but a search for analytic functions turns up plenty of examples walking through how they work. For example: http://www.cs.utexas.edu/~cannata/dbms/Analytic%20Functions%20in%20Oracle%208i%20and%209i.htm Just remember that they are a post-processing step: you can't filter (or do anything similar) on the value of an analytic function unless you wrap it in a subquery.
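The LAG() query can be run locally with a minimal sketch, assuming Python's sqlite3 as a stand-in for PostgreSQL. SQLite has no separate boolean type, so true and false are stored as 1 and 0. value_changes is a hypothetical helper name.

```python
import sqlite3

# true/false stored as 1/0, since SQLite has no separate boolean type
ROWS = [(1, 1), (2, 0), (3, 0), (4, 1), (5, 1), (6, 1), (7, 0)]

QUERY = """
SELECT id, val
FROM (SELECT id, val, LAG(val) OVER (ORDER BY id) AS prev_val FROM p) x
WHERE val <> COALESCE(prev_val, val)  -- first row: val is compared with itself
ORDER BY id
"""

def value_changes(rows):
    """Return the (id, val) rows whose val differs from the previous row's."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE p (id INTEGER PRIMARY KEY, val INTEGER NOT NULL)")
    con.executemany("INSERT INTO p VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

print(value_changes(ROWS))
```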
Window function
Instead of calling COALESCE, you can provide a default from the window function lag() directly. A minor detail in this case since all columns are defined NOT NULL. But this may be essential to distinguish "no previous row" from "NULL in previous row".
SELECT id, val
FROM (
SELECT id, val, lag(val, 1, val) OVER (ORDER BY id) <> val AS changed
FROM p
) sub
WHERE changed
ORDER BY id;
Compute the result of the comparison immediately, since the previous value is not of interest per se, only a possible change. Shorter and may be a tiny bit faster.
If you consider the first row to be "changed" (unlike your demo output suggests), you need to observe NULL values - even though your columns are defined NOT NULL. Basic lag() returns NULL in case there is no previous row:
SELECT id, val
FROM (
SELECT id, val, lag(val) OVER (ORDER BY id) IS DISTINCT FROM val AS changed
FROM p
) sub
WHERE changed
ORDER BY id;
Or employ the additional parameters of lag() once again:
SELECT id, val
FROM (
SELECT id, val, lag(val, 1, NOT val) OVER (ORDER BY id) <> val AS changed
FROM p
) sub
WHERE changed
ORDER BY id;
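To see the first row counted as "changed", here is a sketch with sqlite3 standing in for PostgreSQL (an assumption). In SQLite, IS NOT works as a null-safe "not equal", playing the role of IS DISTINCT FROM, so the missing previous value of the first row counts as a change. first_row_changes is a hypothetical name.

```python
import sqlite3

ROWS = [(1, 1), (2, 0), (3, 0), (4, 1), (5, 1), (6, 1), (7, 0)]

# IS NOT treats NULL as a comparable value: NULL IS NOT 1 is true,
# so the first row (whose LAG is NULL) is reported as changed.
QUERY = """
SELECT id, val
FROM (SELECT id, val, LAG(val) OVER (ORDER BY id) IS NOT val AS changed FROM p) sub
WHERE changed
ORDER BY id
"""

def first_row_changes(rows):
    """Like the LAG query, but the first row counts as a change."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE p (id INTEGER PRIMARY KEY, val INTEGER NOT NULL)")
    con.executemany("INSERT INTO p VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

print(first_row_changes(ROWS))
```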
Recursive CTE
As proof of concept. :)
Performance won't keep up with posted alternatives.
WITH RECURSIVE cte AS (
SELECT id, val
FROM p
WHERE NOT EXISTS (
SELECT 1
FROM p p0
WHERE p0.id < p.id
)
UNION ALL
SELECT p.id, p.val
FROM cte
JOIN p ON p.id > cte.id
AND p.val <> cte.val
WHERE NOT EXISTS (
SELECT 1
FROM p p0
WHERE p0.id > cte.id
AND p0.val <> cte.val
AND p0.id < p.id
)
)
SELECT * FROM cte;
With an improvement from @wildplasser.
Can even be done without window functions.
SELECT * FROM p p0
WHERE EXISTS (
SELECT * FROM p ex
WHERE ex.id < p0.id
AND ex.val <> p0.val
AND NOT EXISTS (
SELECT * FROM p nx
WHERE nx.id < p0.id
AND nx.id > ex.id
)
);
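The NOT EXISTS version can be sketched with sqlite3 as a stand-in for PostgreSQL (an assumption), with an ORDER BY added for stable output. The second test uses data with id 3 deleted. Because the inner NOT EXISTS asks for "no row in between" rather than for id - 1, gaps in id do not break it. previous_row_changes is a hypothetical name.

```python
import sqlite3

ROWS = [(1, 1), (2, 0), (3, 0), (4, 1), (5, 1), (6, 1), (7, 0)]

# ex is the immediately preceding row (no nx lies between ex and p0),
# and p0 qualifies when that preceding row has a different val.
QUERY = """
SELECT * FROM p p0
WHERE EXISTS (
    SELECT * FROM p ex
    WHERE ex.id < p0.id
      AND ex.val <> p0.val
      AND NOT EXISTS (SELECT * FROM p nx WHERE nx.id < p0.id AND nx.id > ex.id)
)
ORDER BY p0.id
"""

def previous_row_changes(rows):
    """Return (id, val) rows that differ from the row just before them."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE p (id INTEGER PRIMARY KEY, val INTEGER NOT NULL)")
    con.executemany("INSERT INTO p VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

print(previous_row_changes(ROWS))
```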
UPDATE: Self-joining a non-recursive CTE (could also be a subquery instead of a CTE)
WITH drag AS (
SELECT id
, rank() OVER (ORDER BY id) AS rnk
, val
FROM p
)
SELECT d1.*
FROM drag d1
JOIN drag d0 ON d0.rnk = d1.rnk -1
WHERE d1.val <> d0.val
;
This nonrecursive CTE approach is surprisingly fast, although it needs an implicit sort.
This is also possible with the usual "gaps and islands" SQL technique, using two row_number() computations (this could be useful if you can't use the lag() window function for some reason):
with cte1 as (
select
*,
row_number() over(order by id) as rn1,
row_number() over(partition by val order by id) as rn2
from p
)
select *, rn1 - rn2 as g
from cte1
order by id
So this query gives you all the islands:
ID VAL RN1 RN2 G
1 1 1 1 0
2 0 2 1 1
3 0 3 2 1
4 1 4 2 2
5 1 5 3 2
6 1 6 4 2
7 0 7 3 4
You can see how the G field can be used to group these islands together:
with cte1 as (
select
*,
row_number() over(order by id) as rn1,
row_number() over(partition by val order by id) as rn2
from p
)
select
min(id) as id,
val
from cte1
group by val, rn1 - rn2
order by 1
So you'll get
ID VAL
1 1
2 0
4 1
7 0
The only thing left is to remove the first record, which can be done with the min(...) over () window function:
with cte1 as (
...
), cte2 as (
select
min(id) as id,
val,
min(min(id)) over() as mid
from cte1
group by val, rn1 - rn2
)
select id, val
from cte2
where id <> mid
And results:
ID VAL
2 0
4 1
7 0
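The full gaps-and-islands pipeline can be tried with a Python sketch, using sqlite3 as an assumed stand-in for PostgreSQL. As a portability choice for this sketch, min(min(id)) over () is replaced by a scalar subquery over the grouped islands; the result is the same. island_starts is a hypothetical helper.

```python
import sqlite3

ROWS = [(1, 1), (2, 0), (3, 0), (4, 1), (5, 1), (6, 1), (7, 0)]

QUERY = """
WITH cte1 AS (
    SELECT id, val,
           ROW_NUMBER() OVER (ORDER BY id) AS rn1,
           ROW_NUMBER() OVER (PARTITION BY val ORDER BY id) AS rn2
    FROM p
), islands AS (
    -- rows of one island share both val and rn1 - rn2
    SELECT MIN(id) AS id, val
    FROM cte1
    GROUP BY val, rn1 - rn2
)
SELECT id, val
FROM islands
WHERE id <> (SELECT MIN(id) FROM islands)  -- drop the very first island
ORDER BY id
"""

def island_starts(rows):
    """Return the first (id, val) of every island except the first one."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE p (id INTEGER PRIMARY KEY, val INTEGER NOT NULL)")
    con.executemany("INSERT INTO p VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

print(island_starts(ROWS))
```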
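A simple inner join can do it, as long as the id values are consecutive with no gaps (a serial column can have gaps, for example after deletes or rolled-back inserts):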
select p2.id, p2.val
from
p p1
inner join
p p2 on p2.id = p1.id + 1
where p2.val != p1.val
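The sketch below illustrates the no-gaps caveat, with sqlite3 standing in for PostgreSQL (an assumption). It runs the id + 1 join against data where id 3 is missing, next to a variant that joins on ROW_NUMBER() instead of the raw id, in the spirit of the rank()-based CTE above. changes_by_join is a hypothetical helper.

```python
import sqlite3

# id 3 is missing, as can happen with a serial column
GAPPY = [(1, 1), (2, 0), (4, 1), (5, 1), (6, 1), (7, 0)]

PLAIN = """
SELECT p2.id, p2.val
FROM p p1 JOIN p p2 ON p2.id = p1.id + 1
WHERE p2.val <> p1.val
ORDER BY p2.id
"""

# Dense row numbers make "the next row" well defined even with gaps in id
NUMBERED = """
WITH n AS (SELECT id, val, ROW_NUMBER() OVER (ORDER BY id) AS rn FROM p)
SELECT n2.id, n2.val
FROM n n1 JOIN n n2 ON n2.rn = n1.rn + 1
WHERE n2.val <> n1.val
ORDER BY n2.id
"""

def changes_by_join(rows, use_row_number):
    """Self-join each row to its successor and report changed values."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE p (id INTEGER PRIMARY KEY, val INTEGER NOT NULL)")
    con.executemany("INSERT INTO p VALUES (?, ?)", rows)
    result = con.execute(NUMBERED if use_row_number else PLAIN).fetchall()
    con.close()
    return result

print(changes_by_join(GAPPY, False))  # misses the change at id 4
print(changes_by_join(GAPPY, True))
```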