Exclude columns from Group by - sql

I have a table like this
My current query
Select team,
stat_id,
max(statsval) as statsval
from tbl
group by team,
stat_id
Issue:
I also need season in the select list. Obviously I would then have to add it to the group by, but that gives me unexpected results. I can't change my group by: I need to group by team and stat_id only, so I can't group by season. I need the season of the max() record. Can someone help me with this?
I even tried
Select team,
stat_id,
max (seasonid),
max(statsval) as statsval
from tbl
group by team,
stat_id
But this returns the max season, which is not necessarily the season of the row holding the max statsval.
Expected result
+--------+--------+-------+---------+---------+
| season | team | round | stat_id | statval |
+--------+--------+-------+---------+---------+
| 2004 | 500146 | 3 | 1 | 5 |
| 2007 | 500147 | 1 | 1 | 4 |
+--------+--------+-------+---------+---------+

Depending on your version of SQL Server (FIRST_VALUE requires SQL Server 2012 or later), this can be done with window functions only:
SELECT DISTINCT team
, stat_id
, max(statsval) OVER (PARTITION BY team, stat_id) statsval
, FIRST_VALUE(season_id) OVER (PARTITION BY team, stat_id ORDER BY statsval DESC)
FROM tbl
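As a minimal sketch of the FIRST_VALUE approach, the snippet below runs the same query shape in SQLite through Python's sqlite3 module. The sample rows, the column name season_id, the alias max_statsval, and the helper season_of_max are assumptions made for illustration (the original table was not shown), and SQLite 3.25 or later is assumed for window-function support.

```python
import sqlite3

# Hypothetical sample rows: (season_id, team, stat_id, statsval).
SAMPLE_ROWS = [
    (2004, 500146, 1, 5),
    (2005, 500146, 1, 3),
    (2006, 500147, 1, 2),
    (2007, 500147, 1, 4),
]

QUERY = """
SELECT DISTINCT
       t.team,
       t.stat_id,
       MAX(t.statsval) OVER (PARTITION BY t.team, t.stat_id) AS max_statsval,
       -- season of the row with the highest statsval in each partition
       FIRST_VALUE(t.season_id) OVER (
           PARTITION BY t.team, t.stat_id
           ORDER BY t.statsval DESC
       ) AS season_id
FROM tbl AS t
ORDER BY t.team, t.stat_id
"""


def season_of_max(rows):
    """Return one (team, stat_id, max_statsval, season_id) row per group."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tbl (season_id INT, team INT, stat_id INT, statsval INT)"
    )
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(season_of_max(SAMPLE_ROWS))
```

If two rows in a group tie on statsval, which season is returned is not deterministic unless a tie-breaker is added to the window's ORDER BY.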

Try this using window functions:
Select distinct team,
statid,
max(statsval) OVER(PARTITION BY team, statid) as statsval,
-- season of the row with the highest statsval, not the highest season
first_value(seasonid) OVER(PARTITION BY team, statid ORDER BY statsval DESC) as seasonid
from tbl

Try this and look up the season after the grouping is done:
;with tmp as
(
select team,
stat_id,
max(statsval) as statsval
from tbl
group by team,
stat_id
)
select tmp.*,
tbl.seasonid
from tmp join tbl
on tmp.team = tbl.team and tmp.stat_id = tbl.stat_id
-- match on the value too, otherwise every season of the group is returned
and tmp.statsval = tbl.statsval;

If you want the complete row, you can simply use a correlated subquery:
Select t.*
from tbl t
where t.season = (select max(t2.season)
from tbl t2
where t2.team = t.team and t2.statsval = t.statsval
);
With an index on tbl(team, statsval, season), this probably has as good as or better performance than other options.
A fun method that has worse performance (even with the index) is:
select top (1) with ties t.*
from tbl t
order by row_number() over (partition by team, statsval order by season desc);

Related

Is there a way to calculate average based on distinct rows without using a subquery?

If I have data like so:
+----+-------+
| id | value |
+----+-------+
| 1 | 10 |
| 1 | 10 |
| 2 | 20 |
| 3 | 30 |
| 2 | 20 |
+----+-------+
How do I calculate the average based on the distinct id WITHOUT using a subquery (i.e. querying the table directly)?
For the above example it would be (10+20+30)/3 = 20
I tried to do the following:
SELECT AVG(IF(id = LAG(id) OVER (ORDER BY id), NULL, value)) AS avg
FROM table
Basically I was thinking that if I order by id and check the previous row to see if it has the same id, the value should be NULL and thus it would not be counted into the calculation, but unfortunately I can't put analytical functions inside aggregate functions.
As far as I know, you can't do this without a subquery. I would use:
SELECT AVG(avg_value)
FROM
(
SELECT AVG(value) AS avg_value
FROM yourTable
GROUP BY id
) t;
-- Alternative for dialects that support QUALIFY (e.g. Teradata, Snowflake)
WITH ranked AS (
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) AS rn
FROM yourTable t
QUALIFY rn = 1
)
SELECT AVG(value)
FROM ranked;
"The outer query will have other parameters that need to access all the data in the table"
I interpret this comment as asking for the average on every row rather than an aggregation. If so, you can use window functions:
select t.*,
avg(case when seqnum = 1 then value end) over () as overall_avg
from (select t.*,
row_number() over (partition by id order by id) as seqnum
from t
) t;
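Yes, there is a way, provided that no two different ids share the same value: simply use DISTINCT inside the AVG function, as below:
select avg(distinct value) from tab;
Note that DISTINCT deduplicates on value, not on id, so two different ids with the same value would be counted only once.
http://sqlfiddle.com/#!4/9d156/2/0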
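To check how AVG(DISTINCT value) compares with averaging one row per id, here is a small sketch using Python's sqlite3 module. The data is hypothetical and deliberately differs from the question's: ids 2 and 3 share the value 20, which is the case where the two approaches disagree. The table name tab follows the answer above; the variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id INT, value INT)")
# Hypothetical data: ids 2 and 3 both carry the value 20.
conn.executemany(
    "INSERT INTO tab VALUES (?, ?)",
    [(1, 10), (1, 10), (2, 20), (3, 20), (2, 20)],
)

# DISTINCT works on values, so the two ids with value 20 collapse into one:
# (10 + 20) / 2
avg_distinct_value = conn.execute(
    "SELECT AVG(DISTINCT value) FROM tab"
).fetchone()[0]

# Reduce to one row per id first, then average: (10 + 20 + 20) / 3
avg_per_id = conn.execute(
    """
    SELECT AVG(value)
    FROM (SELECT id, MAX(value) AS value FROM tab GROUP BY id)
    """
).fetchone()[0]

conn.close()
print(avg_distinct_value, avg_per_id)
```

On the question's own data (values 10, 20, 30) both queries agree, which is why the shortcut looks correct there.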

SELECT only rows when count=1 - without additional SELECT or/ and having

I wonder if there is a way to build a query without joins and/or a HAVING clause that returns the same result as the query below. I already found a similar question (select and count rows) but didn't find the answer there.
SELECT s.ID, s.CATEGORY, s.PRODUCT, s.[DESC]
FROM SALES s
JOIN (SELECT ID, COUNT(CATEGORY) AS cnt
FROM SALES
GROUP by ID
HAVING count(CATEGORY)=1) S2 ON S.ID=S2.ID;
So the table looks like
ID | Country | Product | DESC
1 | USA | Cream | Super cream
1 | Canada | Toothpaste| Great Toothpaste
2 | Germany | Beer | Tasty Beer
and the result I would like to get is
ID | Country | Product | DESC
2 | Germany | Beer | Tasty Beer
because id=1 has 2 different countries assigned
I'm using SQL Server
In general I'm interested in the 'fastest' solution. The table is huge and I just wonder if there is a way to do it smarter.
You may want to consider this query:
select t2.id, t2.category, t2.product, t2.[desc] from (
select id, category, product,
(select count(1) from sales where id = t1.id) as ct
,[desc]
from sales t1) t2 where t2.ct = 1
You can try this query:
SELECT ID, CATEGORY, PRODUCT, [DESC]
FROM SALES s
WHERE 1 = (
SELECT COUNT(*)
FROM SALES x
WHERE x.ID = s.ID
);
One method uses window functions:
SELECT ID, CATEGORY, PRODUCT, [DESC]
FROM (SELECT s.*, COUNT(*) OVER (PARTITION BY ID) as cnt
FROM SALES s
) s
WHERE cnt = 1;
However, the fastest solution would require a unique id and an index. That would be:
select s.*
from sales s
where not exists (select 1
from sales s2
where s2.id = s.id and
s2.<unique key> <> s.<unique key>
);
This can take advantage of an index on (id, <unique key>).
Note: These formulations count rows rather than non-NULL category values, so they match the original query only if category is never NULL.
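The window-function and NOT EXISTS variants can be compared side by side with a small sketch in SQLite via Python's sqlite3 module. Several things here are assumptions for illustration: the column is called descr because DESC is a reserved word, SQLite's implicit rowid stands in for the unique key placeholder, and the helper name unique_id_rows is made up. SQLite 3.25 or later is assumed for the window function.

```python
import sqlite3

# Sample rows from the question: (id, category, product, descr).
SALES = [
    (1, "USA", "Cream", "Super cream"),
    (1, "Canada", "Toothpaste", "Great Toothpaste"),
    (2, "Germany", "Beer", "Tasty Beer"),
]

WINDOW_QUERY = """
SELECT id, category, product, descr
FROM (SELECT s.*, COUNT(*) OVER (PARTITION BY id) AS cnt
      FROM sales AS s)
WHERE cnt = 1
ORDER BY id
"""

NOT_EXISTS_QUERY = """
SELECT s.id, s.category, s.product, s.descr
FROM sales AS s
WHERE NOT EXISTS (SELECT 1
                  FROM sales AS s2
                  WHERE s2.id = s.id
                    AND s2.rowid <> s.rowid)  -- rowid plays the unique key
ORDER BY s.id
"""


def unique_id_rows(query, rows=SALES):
    """Run one of the queries above against an in-memory copy of the data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (id INT, category TEXT, product TEXT, descr TEXT)"
    )
    # Index on id so the correlated lookup can avoid a full scan per row.
    conn.execute("CREATE INDEX idx_sales_id ON sales (id)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(unique_id_rows(WINDOW_QUERY))
print(unique_id_rows(NOT_EXISTS_QUERY))
```

Both queries return only the id = 2 row for this data; which one is faster on a large table depends on the engine and the available indexes, so it is worth measuring on the real data.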

Getting proper count for longest user streaks

I'm having a difficult time getting the correct counts for longest user streaks. Streaks are consecutive days with check-ins for each user.
Any help would be greatly appreciated. Here's a fiddle with my script and sample data: http://sqlfiddle.com/#!17/d2825/1/0
check_ins table:
+---------+----------+---------------------+
| user_id | goal_id  | check_in_date       |
+---------+----------+---------------------+
| colt    | 40365fa0 | 2019-01-07 15:35:53 |
| colt    | d31efe70 | 2019-01-11 15:35:52 |
| berry   | be2fcd50 | 2019-01-12 15:35:51 |
| colt    | e754d050 | 2019-01-13 15:17:16 |
| colt    | 9c87a7f0 | 2019-01-14 15:35:54 |
| colt    | ucgtdes0 | 2019-01-15 12:30:59 |
+---------+----------+---------------------+
PostgreSQL script:
WITH dates(DATE) AS
(SELECT DISTINCT Cast(check_in_date AS DATE),
user_id
FROM check_ins),
GROUPS AS
(SELECT Row_number() OVER (
ORDER BY DATE) AS rn, DATE - (Row_number() OVER (ORDER BY DATE) * interval '1' DAY) AS grp, DATE, user_id
FROM dates)
SELECT Count(*) AS streak,
user_id
FROM GROUPS
GROUP BY grp,
user_id
ORDER BY 1 DESC;
Here's what I get when I run the code above:
streak user_id
--------------
4 colt
1 colt
1 berry
What it should be (I'd also like to get only the longest streak for each user):
streak user_id
--------------
3 colt
1 berry
In Postgres, you can write this as:
select distinct on (user_id) user_id, count(distinct check_in_date::date) as num_days
from (select ci.*,
dense_rank() over (partition by user_id order by check_in_date::date) as seq
from check_ins ci
) ci
group by user_id, check_in_date::date - seq * interval '1 day'
order by user_id, num_days desc;
Here is a db<>fiddle.
This follows similar logic to your approach, but your query seems more complicated than necessary. This does use the Postgres distinct on functionality, which is handy to avoid an additional subquery.
First, thanks for the fiddle script and sample data.
Your row_number is not set up correctly for a gaps-and-islands problem; it should look like the query below. On top of that, to get only the longest streak per user, you need DISTINCT ON after grouping by the group number (grp in your query; I called it seq).
I assume you want only one entry per day for each user, so I changed the WITH clause slightly to reflect that.
SELECT * FROM (
WITH check_ins_dt AS
( SELECT DISTINCT check_in_date::DATE as check_in_date,
user_id
FROM check_ins)
SELECT DISTINCT ON (user_id) COUNT(*) AS streak,user_id
FROM (
SELECT c.*,
-- date minus the per-user row number is constant within a run of consecutive days
check_in_date - CAST(ROW_NUMBER() OVER(
PARTITION BY user_id
ORDER BY check_in_date
) AS int) AS seq
FROM check_ins_dt c
) s
GROUP BY user_id,
seq
ORDER BY user_id,
COUNT(*) DESC ) q order
by streak desc;
Demo
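The gaps-and-islands idea behind both answers can be sketched outside Postgres as well. The version below uses SQLite through Python's sqlite3 module, so DISTINCT ON is replaced by a plain MAX per user and the date arithmetic uses JULIANDAY; those substitutions, the CTE names, and the helper longest_streaks are assumptions for illustration. SQLite 3.25 or later is assumed for ROW_NUMBER.

```python
import sqlite3

# Sample rows from the question: (user_id, goal_id, check_in_date).
CHECK_INS = [
    ("colt", "40365fa0", "2019-01-07 15:35:53"),
    ("colt", "d31efe70", "2019-01-11 15:35:52"),
    ("berry", "be2fcd50", "2019-01-12 15:35:51"),
    ("colt", "e754d050", "2019-01-13 15:17:16"),
    ("colt", "9c87a7f0", "2019-01-14 15:35:54"),
    ("colt", "ucgtdes0", "2019-01-15 12:30:59"),
]

QUERY = """
WITH days AS (
    -- one row per user per calendar day
    SELECT DISTINCT user_id, DATE(check_in_date) AS day
    FROM check_ins
),
grouped AS (
    -- day number minus per-user row number is constant inside a streak
    SELECT user_id,
           CAST(JULIANDAY(day) AS INTEGER)
             - ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY day) AS grp
    FROM days
),
streaks AS (
    SELECT user_id, COUNT(*) AS streak
    FROM grouped
    GROUP BY user_id, grp
)
SELECT user_id, MAX(streak) AS longest_streak
FROM streaks
GROUP BY user_id
ORDER BY longest_streak DESC, user_id
"""


def longest_streaks(rows):
    """Return (user_id, longest_streak) pairs, longest streak first."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE check_ins (user_id TEXT, goal_id TEXT, check_in_date TEXT)"
    )
    conn.executemany("INSERT INTO check_ins VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(longest_streaks(CHECK_INS))
```

The key point is that the row number is partitioned by user_id; numbering all users together is what inflates colt's streak to 4 in the question's output.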

order by count(catid) without group

I want to count how many rows use the same catid and order the query by that total.
id | catid | name
0 | 1 | foo
1 | 1 | bar
2 | 2 | paint
I've tried COUNT(catid) but this requires a GROUP BY, and I do not want to compress rows.
How may I do this?
Do you want window functions?
select t.*, count(*) over (partition by catid) as cat_cnt
from t
order by cat_cnt, catid;
I should note that if you don't want to see the total, you can put the window function in the order by:
select *
from t
order by count(*) over (partition by catid), catid
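A quick way to try the window-function answer is the sketch below, which runs it in SQLite via Python's sqlite3 module on the question's three rows. Sorting the largest categories first (DESC) and adding id as a final tie-breaker are assumptions, since the question does not state the direction; the helper name rows_by_category_size is illustrative. SQLite 3.25 or later is assumed.

```python
import sqlite3

# Sample rows from the question: (id, catid, name).
ITEMS = [(0, 1, "foo"), (1, 1, "bar"), (2, 2, "paint")]

QUERY = """
SELECT t.*, COUNT(*) OVER (PARTITION BY catid) AS cat_cnt
FROM t
ORDER BY cat_cnt DESC, catid, id  -- every row is kept, only the order changes
"""


def rows_by_category_size(rows):
    """Return all rows with their category count, biggest categories first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INT, catid INT, name TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(rows_by_category_size(ITEMS))
```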
Maybe you could run the GROUP BY as a separate SELECT, then JOIN?
E.g.
select orig.*, summ.totals
from t orig
join (select catid, count(catid) as totals
from t
group by catid) summ
on orig.catid = summ.catid
order by summ.totals;

Selecting compared pairs from table

I don't really know how to describe it. I have a table:
ID | Name | Date
-------------------------
1 | Mike | 01.01.2016
1 | Michael | 02.03.2016
2 | Samuel | 23.12.2015
2 | Sam | 05.03.2015
3 | Tony | 02.04.2012
I want to select pairs of IDs and Names with latest dates in each pair. The result here should be:
ID | Name | Date
-------------------------
1 | Michael | 02.03.2016
2 | Samuel | 23.12.2015
3 | Tony | 02.04.2012
How do I achieve this?
Oracle Database 11g
You can do it using the ROW_NUMBER() analytic function:
SELECT id, name, "date"
FROM (
SELECT t.*,
ROW_NUMBER() OVER ( PARTITION BY id ORDER BY "date" DESC ) rn
FROM table_name t
)
WHERE rn = 1
This requires only a single table scan (it does not have a self-join or correlated sub-query - i.e. IN (...) or EXISTS(...)).
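For readers without an Oracle instance at hand, the ROW_NUMBER pattern can be tried with the sketch below, which uses SQLite via Python's sqlite3 module. The dates are stored as ISO strings (YYYY-MM-DD) so that they sort chronologically, and the column is named dt to sidestep the reserved word; both choices, along with the helper latest_per_id, are assumptions for illustration. SQLite 3.25 or later is assumed.

```python
import sqlite3

# Rows from the question, with dates rewritten as ISO strings: (id, name, dt).
PEOPLE = [
    (1, "Mike", "2016-01-01"),
    (1, "Michael", "2016-03-02"),
    (2, "Samuel", "2015-12-23"),
    (2, "Sam", "2015-03-05"),
    (3, "Tony", "2012-04-02"),
]

QUERY = """
SELECT id, name, dt
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY dt DESC) AS rn
    FROM table_name AS t
)
WHERE rn = 1   -- keep only the newest row of each id
ORDER BY id
"""


def latest_per_id(rows):
    """Return the most recent (id, name, dt) row for every id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_name (id INT, name TEXT, dt TEXT)")
    conn.executemany("INSERT INTO table_name VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(latest_per_id(PEOPLE))
```

If the real column holds text in DD.MM.YYYY form rather than a DATE, it would need converting before ordering, since that format does not sort chronologically as a string.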
Have a sub-select that returns each id and its max date:
select * from table_name
where (id, "date") in (select id, max("date") from table_name group by id)
You can use NOT EXISTS():
SELECT * FROM YourTable t
WHERE NOT EXISTS(SELECT 1 FROM YourTable s
WHERE t.id = s.id and s."date" > t."date")
Possibly the most efficient method is:
select t.*
from table_name t
where t."date" = (select max(t2."date") from table_name t2 where t2.id = t.id);
along with an index on table_name(id, "date").
This version should scan the table and look up the correct value in the index.
Or, if there are only three columns, you can use keep:
select id, max("date") as "date",
max(name) keep (dense_rank first order by "date" desc) as name
from table_name
group by id;
I have found that this version works very well in Oracle.