Grouping rows only when all values are equal - SQL

I'm trying to group some data. This is the situation:
I'm doing this select:
select
min(id) id
,week
,percentage
from table1
group by week,percentage
Could anyone help me out? I want it to group rows only if all of their values are equal. If any percentage value differs, the rows should not be grouped; for example, id 3 should not be grouped for week 3. I'm using SQL Server 2012.
Thanks.

You don't really want aggregation. You want to remove certain ids.
The following concatenates all the week/percentage values for each id into a single identifier that is used to detect duplicates. Despite what I just said, it then uses aggregation for "filtering" to get the first one. STRING_AGG requires SQL Server 2017 or later and cannot be used with OVER, so the concatenation is done per id in a grouped subquery that is joined back:
select min(t1.id) as id, t1.week, t1.percentage
from table1 t1 join
     (select id,
             string_agg(concat(week, ':', percentage), ',') within group (order by week) as id_week_percentages
      from table1
      group by id
     ) s
     on s.id = t1.id
group by s.id_week_percentages, t1.week, t1.percentage;
The aggregation just allows this to be written without an extra subquery. You could do something similar as:
select top (1) with ties t1.id, t1.week, t1.percentage
from table1 t1 join
     (select id,
             string_agg(concat(week, ':', percentage), ',') within group (order by week) as id_week_percentages
      from table1
      group by id
     ) s
     on s.id = t1.id
order by row_number() over (partition by s.id_week_percentages, t1.week, t1.percentage order by t1.id);
Or use a separate subquery to pull the first id for each week/percentage combination.
EDIT:
SQL Server 2012 does not have STRING_AGG, so here is a version with FOR XML PATH:
select top (1) with ties id, week, percentage
from (select t1.*,
(select concat(week, ':', percentage, ',')
from table1 tt1
where tt1.id = t1.id
order by week, percentage
for xml path ('')
) as id_week_percentages
from table1 t1
) t1
order by row_number() over (partition by id_week_percentages, week, percentage order by id);
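The same "keep an id only if its whole set of rows is new" idea can be tried out locally. This is a minimal sketch that runs SQL through Python's sqlite3 module on hypothetical sample data (ids 1 and 2 identical, id 3 different in week 3). It swaps the string-concatenation signature for a set-equality check, because SQLite's group_concat does not guarantee an order; it assumes each id has at most one row per week.

```python
import sqlite3

# Hypothetical sample data, not the asker's real table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, week INTEGER, percentage INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, 1, 10), (1, 2, 20), (1, 3, 30),
     (2, 1, 10), (2, 2, 20), (2, 3, 30),
     (3, 1, 10), (3, 2, 20), (3, 3, 50)],
)

query = """
WITH sizes AS (              -- number of rows per id
    SELECT id, COUNT(*) AS n FROM table1 GROUP BY id
),
same_set AS (                -- (id, other_id) pairs whose full row sets match
    SELECT a.id AS id, b.id AS other_id
    FROM table1 a
    JOIN table1 b ON a.week = b.week AND a.percentage = b.percentage
    JOIN sizes sa ON sa.id = a.id
    JOIN sizes sb ON sb.id = b.id
    WHERE sa.n = sb.n
    GROUP BY a.id, b.id, sa.n
    HAVING COUNT(*) = sa.n   -- every row of a.id has a match in b.id
)
SELECT t.id, t.week, t.percentage
FROM table1 t
-- keep only the smallest id among ids with an identical row set
WHERE t.id = (SELECT MIN(s.other_id) FROM same_set s WHERE s.id = t.id)
ORDER BY t.id, t.week
"""
result = conn.execute(query).fetchall()
print(result)
```

Id 2 collapses into id 1, while id 3 keeps all of its rows because week 3 differs.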

Related

Display Prev and Current value based on a ID - SQL

I am not sure if a similar question has been posted. I was unable to find one.
I have the following table:
What I am trying to get is the following:
Any advice will be appreciated.
Thanks in advance,
Sam
This worked in both Oracle and Snowflake:
SELECT t.ID,
t.prev_dt,
t.current_dt,
t.prev_code,
t.curr_code
FROM (
SELECT id,
order_dt,
LAG(order_dt, 1) OVER (PARTITION by id ORDER BY id, order_dt) prev_dt,
upd_dt current_dt,
LAG(code, 1) OVER (PARTITION by id ORDER BY id, upd_dt) prev_code,
code curr_code
FROM t111
) t
INNER JOIN (
SELECT id,
MAX(order_dt) max_date
FROM t111
GROUP BY id
) idm
ON idm.id=t.id AND t.order_dt=idm.max_date
You seem to want the window function lag():
select
id,
lag(order_dt) over(partition by id order by order_by_id) prev_dt,
order_dt current_dt,
lag(code) over(partition by id order by order_by_id) prev_code,
code curr_code
from mytable
Note that the above query does not filter the records of the table. When there is no preceding record, lag() returns null. If you want to filter out the first record per group, and assuming that this record is identified by order_by_id = 1, you can do:
select *
from (
select
id,
lag(order_dt) over(partition by id order by order_by_id) prev_dt,
order_dt current_dt,
lag(code) over(partition by id order by order_by_id) prev_code,
code curr_code,
order_by_id
from mytable
) t
where order_by_id > 1
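To see what the filtered lag() query returns, here is a small sketch that runs it in SQLite (window functions need SQLite 3.25+) via Python's sqlite3. The mytable rows are hypothetical; the column names follow the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER, order_by_id INTEGER, order_dt TEXT, code TEXT)"
)
# Hypothetical rows: id 1 has two orders, id 2 has three.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [(1, 1, "2020-01-01", "A"), (1, 2, "2020-02-01", "B"),
     (2, 1, "2020-01-15", "C"), (2, 2, "2020-03-01", "D"), (2, 3, "2020-04-01", "E")],
)

query = """
SELECT *
FROM (
    SELECT id,
           LAG(order_dt) OVER (PARTITION BY id ORDER BY order_by_id) AS prev_dt,
           order_dt AS current_dt,
           LAG(code) OVER (PARTITION BY id ORDER BY order_by_id) AS prev_code,
           code AS curr_code,
           order_by_id
    FROM mytable
) t
WHERE order_by_id > 1   -- drop the first row per id, whose lag() values are NULL
ORDER BY id, order_by_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Every later row of a group is paired with its immediate predecessor, so id 2 yields two output rows.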
Window functions might be the best approach. But you could also use join:
select t1.id, t1.order_dt as prev_dt, t2.upd_dt as curr_date,
t1.code as prev_code, t2.code as curr_code
from t t1 join
t t2
on t1.id = t2.id and t1.order_by_id = 1 and t2.order_by_id = 2;
In Snowflake, I simply do not know whether this would perform better than, worse than, or about the same as window functions.
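A quick way to check the self-join is to run it on a few hypothetical rows. This sketch uses SQLite through Python's sqlite3, with a made-up table t holding order_by_id, order_dt, upd_dt and code. It also shows a property of the inner join: an id without both an order_by_id = 1 and an order_by_id = 2 row disappears from the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, order_by_id INTEGER, order_dt TEXT, upd_dt TEXT, code TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [(1, 1, "2020-01-01", "2020-01-02", "A"), (1, 2, "2020-02-01", "2020-02-02", "B"),
     (2, 1, "2020-03-01", "2020-03-02", "C"), (2, 2, "2020-04-01", "2020-04-02", "D"),
     (3, 1, "2020-05-01", "2020-05-02", "E")],  # id 3 has no second row
)

query = """
SELECT t1.id, t1.order_dt AS prev_dt, t2.upd_dt AS curr_date,
       t1.code AS prev_code, t2.code AS curr_code
FROM t t1
JOIN t t2
  ON t1.id = t2.id AND t1.order_by_id = 1 AND t2.order_by_id = 2
ORDER BY t1.id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Unlike lag(), this only ever compares rows 1 and 2, so it fits data with exactly two rows per id.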

Show entire record from table with minimum timestamp in a group

I have been trying for about three hours to solve this problem but cannot find the solution.
How would I show the entire row (all 20 columns) for the first occurrence (minimum time) of each name in my table?
For example, I would like to do something like this, which does not work:
SELECT name, MIN(time), col1, col2, col3, col4
FROM table
GROUP BY name;
You have to first get the minimum time for each name, and then join back to your original table where the name/time matches.
To get the minimum time:
SELECT name, MIN(time) AS minTime
FROM myTable
GROUP BY name;
Then, get all columns:
SELECT m.*
FROM myTable m
JOIN(
SELECT name, MIN(time) AS minTime
FROM myTable
GROUP BY name) tmp ON tmp.name = m.name AND tmp.minTime = m.time;
Most databases support ANSI standard window functions. With these, you can just do:
select t.*
from (select t.*, row_number() over (partition by name order by time) as seqnum
from table t
) t
where seqnum = 1;
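The two answers differ when several rows share the minimum time: the join returns all of them, while row_number() returns exactly one. This sketch shows both in SQLite via Python's sqlite3 on a hypothetical myTable. The table is renamed from `table` because that is a reserved word, and col1 is added as a tie-breaker so the row_number() pick is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (name TEXT, time INTEGER, col1 TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [("a", 1, "x"), ("a", 1, "y"), ("a", 5, "z"), ("b", 3, "w"), ("b", 2, "v")],
)

# Join back to the per-name minimum: keeps every row tied at the minimum.
join_rows = conn.execute("""
    SELECT m.*
    FROM myTable m
    JOIN (SELECT name, MIN(time) AS minTime FROM myTable GROUP BY name) tmp
      ON tmp.name = m.name AND tmp.minTime = m.time
    ORDER BY m.name, m.col1
""").fetchall()

# row_number(): exactly one row per name, ties broken by col1.
first_rows = conn.execute("""
    SELECT name, time, col1
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY name ORDER BY time, col1) AS seqnum
          FROM myTable t) ranked
    WHERE seqnum = 1
    ORDER BY name
""").fetchall()

print(join_rows)
print(first_rows)
```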

Create a running subtotal for SQL Server

I am trying to get a running subtotal (understanding this is different from subtotals for groups, and the rollup approach).
Tried using
Row_Number() over (order by ID_Number) as Row_Count
and nesting it in select statements and using a LEFT OUTER JOIN on itself (which just churns).
What I am trying to get is this:
if ROW_COUNT > 1 THEN RUNNINGTOTAL = Volume_Category + (RUNNINGTOTAL for ID_Number where ROW_COUNT= ROW_COUNT(for this ID_Number*)-1)
I have a table with a list of unique "ID-Numbers" which are the focus here.
Unless you are using SQL Server 2012 or later, the easiest way to do a cumulative sum is with a correlated subquery. Here is the template for the code:
select t.*,
(select sum(val) from t t2 where t2.ordercol <= t.ordercol) as cumesum
from t
In SQL Server 2012 and later, you can do:
select t.*,
sum(val) over (order by ordercol) as cumesum
from t
In both these, val is the column you want to sum and ordercol is how the ordering is specified.
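Here is a small check that the two templates agree, run in SQLite via Python's sqlite3 on hypothetical data. The rows include a tie on ordercol because both forms give tied rows the same total: `<=` counts all peers, and `sum() over (order by ...)` defaults to a RANGE frame. A third query adds an id column as a tie-breaker and a ROWS frame for a strict row-by-row total. The id column is an assumption not in the template.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, ordercol INTEGER, val INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 2, 20), (3, 2, 5), (4, 3, 1)],  # ids 2 and 3 tie on ordercol
)

correlated = conn.execute("""
    SELECT t.id,
           (SELECT SUM(t2.val) FROM t t2 WHERE t2.ordercol <= t.ordercol) AS cumesum
    FROM t ORDER BY t.id
""").fetchall()

windowed = conn.execute("""
    SELECT id, SUM(val) OVER (ORDER BY ordercol) AS cumesum
    FROM t ORDER BY id
""").fetchall()

# Explicit ROWS frame plus a unique tie-breaker: tied rows get distinct totals.
row_by_row = conn.execute("""
    SELECT id,
           SUM(val) OVER (ORDER BY ordercol, id
                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumesum
    FROM t ORDER BY id
""").fetchall()

print(correlated, windowed, row_by_row, sep="\n")
```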
Try this:
SELECT
T1.Id,
SUM(T2.Amount) Total
FROM tbl T1
JOIN Tbl T2
ON T1.Id>= T2.Id
GROUP BY T1.Id
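The self-join works because each T1 row is matched with every T2 row whose Id is less than or equal to it, so the join produces roughly n²/2 rows before grouping. This sketch runs it in SQLite via Python's sqlite3 on hypothetical tbl rows and compares it with a window sum. It assumes Id is unique and defines the running order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (Id INTEGER PRIMARY KEY, Amount INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?, ?)", [(1, 100), (2, 50), (3, 25)])

self_join = conn.execute("""
    SELECT T1.Id, SUM(T2.Amount) AS Total
    FROM tbl T1
    JOIN tbl T2 ON T1.Id >= T2.Id   -- every earlier-or-equal row contributes
    GROUP BY T1.Id
    ORDER BY T1.Id
""").fetchall()

window = conn.execute(
    "SELECT Id, SUM(Amount) OVER (ORDER BY Id) AS Total FROM tbl ORDER BY Id"
).fetchall()

print(self_join)
```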

PostgreSQL MAX and GROUP BY

I have a table with id, year and count.
I want to get the MAX(count) for each id and keep the year when it happens, so I make this query:
SELECT id, year, MAX(count)
FROM table
GROUP BY id;
Unfortunately, it gives me an error:
ERROR: column "table.year" must appear in the GROUP BY clause or be
used in an aggregate function
So I try:
SELECT id, year, MAX(count)
FROM table
GROUP BY id, year;
But then, it doesn't do MAX(count), it just shows the table as it is. I suppose because when grouping by year and id, it gets the max for the id of that specific year.
So, how can I write that query? I want to get each id's MAX(count) and the year when it happens.
The shortest (and possibly fastest) query would be with DISTINCT ON, a PostgreSQL extension of the SQL standard DISTINCT clause:
SELECT DISTINCT ON (1)
id, count, year
FROM tbl
ORDER BY 1, 2 DESC, 3;
The numbers refer to ordinal positions in the SELECT list. You can spell out column names for clarity:
SELECT DISTINCT ON (id)
id, count, year
FROM tbl
ORDER BY id, count DESC, year;
The result is ordered by id etc. which may or may not be welcome. It's better than "undefined" in any case.
It also breaks ties (when multiple years share the same maximum count) in a well-defined way: it picks the earliest year. If you don't care, drop year from the ORDER BY. Or pick the latest year with year DESC.
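DISTINCT ON is PostgreSQL-specific, so it cannot be run as-is in most other engines. As a stand-in, this sketch reproduces its semantics in SQLite (via Python's sqlite3) with row_number() and the same ordering: count DESC, then year, so ties go to the earliest year. The data is hypothetical, and "count" is quoted as an identifier to keep it distinct from the aggregate function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE tbl (id INTEGER, "count" INTEGER, year INTEGER)')
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(1, 5, 2019), (1, 9, 2020), (1, 9, 2018),  # id 1: max count 9 in two years
     (2, 3, 2021), (2, 7, 2017)],
)

query = """
SELECT id, "count", year
FROM (
    SELECT id, "count", year,
           -- same ordering as DISTINCT ON (id) ... ORDER BY id, count DESC, year
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY "count" DESC, year) AS rn
    FROM tbl
) ranked
WHERE rn = 1
ORDER BY id
"""
rows = conn.execute(query).fetchall()
print(rows)
```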
For many rows per id, other query techniques are (much) faster. See:
Select first row in each GROUP BY group?
Optimize GROUP BY query to retrieve latest row per user
select *
from (
select id,
year,
thing,
max(thing) over (partition by id) as max_thing
from the_table
) t
where thing = max_thing
or:
select t1.id,
t1.year,
t1.thing
from the_table t1
where t1.thing = (select max(t2.thing)
from the_table t2
where t2.id = t1.id);
or
select t1.id,
t1.year,
t1.thing
from the_table t1
join (
select id, max(t2.thing) as max_thing
from the_table t2
group by id
) t on t.id = t1.id and t.max_thing = t1.thing
or (same as the previous with a different notation)
with max_stuff as (
select id, max(t2.thing) as max_thing
from the_table t2
group by id
)
select t1.id,
t1.year,
t1.thing
from the_table t1
join max_stuff t2
on t1.id = t2.id
and t1.thing = t2.max_thing
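Unlike DISTINCT ON, all of these "equal to the maximum" variants return every year that ties for the maximum. This sketch checks two of them in SQLite via Python's sqlite3 on a hypothetical the_table where id 1 reaches its maximum in two years.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (id INTEGER, year INTEGER, thing INTEGER)")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?)",
    [(1, 2019, 5), (1, 2020, 9), (1, 2021, 9), (2, 2018, 3)],
)

window_rows = conn.execute("""
    SELECT id, year, thing
    FROM (SELECT id, year, thing,
                 MAX(thing) OVER (PARTITION BY id) AS max_thing
          FROM the_table) t
    WHERE thing = max_thing
    ORDER BY id, year
""").fetchall()

correlated_rows = conn.execute("""
    SELECT t1.id, t1.year, t1.thing
    FROM the_table t1
    WHERE t1.thing = (SELECT MAX(t2.thing) FROM the_table t2 WHERE t2.id = t1.id)
    ORDER BY t1.id, t1.year
""").fetchall()

print(window_rows)
```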

Query to find the FIRST AND SECOND largest value from a group

I have a query like this:
SELECT
DATEPART(year,some_date),
DATEPART(month,some_date),
MAX(some_value) max_value
FROM
some_table
GROUP BY
DATEPART(year,some_date),
DATEPART(month,some_date)
This returns a table with: year, month, the largest value for the month.
I would like to modify the query so that I could obtain:
year, month, the largest value for the month, the second largest value for the month in each row.
It seems to me that the well-known solutions like "TOP 2", "NOT IN TOP 1" or a subselect won't work here.
(To be really specific, I am using SQL Server 2008.)
It seems to me that the question calls for a query that would return best, and second best in the same row for each month and year, like so:
month, year, best, second best
...
...
and not two rows for the same month and year containing best and second best value.
This is the solution that I came up with, so if anyone has a simpler way of achieving this, I would like to know.
with ranks as (
select
year(entrydate) as [year],
month(entrydate) as [month],
views,
rank() over (partition by year(entrydate), month(entrydate) order by views desc) as [rank]
from product
)
select
t1.year,
t1.month,
t1.views as [best],
t2.views as [second best]
from ranks t1
inner join ranks t2
on t1.year = t2.year
and t1.month = t2.month
and t1.rank = 1
and t2.rank = 2
EDIT: Just out of curiosity, I did a bit more testing and ended up with a simpler variation on Stephanie Page's answer that doesn't use an additional subquery. I also changed the rank() function to row_number(), because rank() gives tied values the same rank: when the two largest values are equal, both get rank 1 and there is no row with rank 2.
with ranks as (
select
year(entrydate) as [year],
month(entrydate) as [month],
views,
row_number() over (partition by year(entrydate), month(entrydate) order by views desc) as [rank]
from product
)
select
t1.year,
t1.month,
max(case when t1.rank = 1 then t1.views else 0 end) as [best],
max(case when t1.rank = 2 then t1.views else 0 end) as [second best]
from
ranks t1
where
t1.rank in (1,2)
group by
t1.year, t1.month
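The rank() versus row_number() difference mentioned in the EDIT can be reproduced with the same conditional-aggregation query. This sketch runs it in SQLite via Python's sqlite3 on hypothetical product rows where January has two equal top values. Because SQLite has no YEAR()/MONTH(), it uses strftime() instead, and the columns are renamed yr, mon and rnk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (entrydate TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [("2011-01-05", 100), ("2011-01-20", 100), ("2011-01-25", 40),
     ("2011-02-01", 70), ("2011-02-10", 30)],
)

def best_two(ranking_function):
    """Pivot the top two values per month using RANK or ROW_NUMBER."""
    sql = f"""
    WITH ranks AS (
        SELECT strftime('%Y', entrydate) AS yr,
               strftime('%m', entrydate) AS mon,
               views,
               {ranking_function}() OVER (
                   PARTITION BY strftime('%Y', entrydate), strftime('%m', entrydate)
                   ORDER BY views DESC) AS rnk
        FROM product
    )
    SELECT yr, mon,
           MAX(CASE WHEN rnk = 1 THEN views ELSE 0 END) AS best,
           MAX(CASE WHEN rnk = 2 THEN views ELSE 0 END) AS second_best
    FROM ranks
    WHERE rnk IN (1, 2)
    GROUP BY yr, mon
    ORDER BY yr, mon
    """
    return conn.execute(sql).fetchall()

with_row_number = best_two("ROW_NUMBER")
with_rank = best_two("RANK")  # tied 100s both get rank 1, so no rank 2 in January
print(with_row_number)
print(with_rank)
```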
RANK() is maybe the thing you are looking for...
http://msdn.microsoft.com/en-us/library/ms176102.aspx
To do this without joins (I'll show the Oracle version with DECODE; in SQL Server, use CASE instead, e.g. CASE WHEN [rank] = 1 THEN t1.views ELSE 0 END):
with ranks as (
select
year(entrydate) as [year],
month(entrydate) as [month],
views,
rank() over (partition by year(entrydate), month(entrydate) order by views desc) as [rank]
from product
)
SELECT [year], [month], Max([best]), Max([second best])
FROM
( select
t1.year,
t1.month,
Decode([rank],1,t1.views,0) as [best],
Decode([rank],2,t1.views,0) as [second best]
from ranks t1
where t1.rank <= 2 ) x
GROUP BY [year], [month]
This is a bit old-school, but TOP and a subquery will work if you use ORDER BY. Try this:
SELECT TOP 2
DATEPART(year,some_date),
DATEPART(month,some_date),
(SELECT MAX(st1.some_value) FROM some_table AS st1
WHERE DATEPART(month,some_date) = DATEPART(month,st1.some_date)) AS max_value
FROM
some_table
GROUP BY
DATEPART(year,some_date),
DATEPART(month,some_date)
ORDER BY DATEPART(month,some_date) DESC
That will give you the two rows with the "highest" month values and the added subselect should give you the max from each grouping.
You can use a CTE with the ranking functions in SQL Server 2005 and up:
;WITH TopValues AS
(
SELECT
YEAR(some_date) AS 'Year',
MONTH(some_date) AS 'Month',
Some_Value,
ROW_NUMBER() OVER(PARTITION BY YEAR(some_date),MONTH(some_date)
ORDER BY Some_Value DESC) AS 'RowNumber'
FROM
dbo.some_table
)
SELECT
Year, Month, Some_Value
FROM
TopValues
WHERE
RowNumber <= 2
This will "partition" (i.e. group) your data by month/year, order inside each group by Some_Value descending (largest first), and then you can select the first two of each group from that CTE.
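Here is the same "partition, order, keep the first two" pattern, run in SQLite via Python's sqlite3 on hypothetical some_table rows. SQLite has no YEAR()/MONTH(), so strftime() is used, and the output columns are renamed yr and mon. The result has one row per value (two rows for January), not best and second best side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (some_date TEXT, some_value INTEGER)")
conn.executemany(
    "INSERT INTO some_table VALUES (?, ?)",
    [("2010-01-03", 5), ("2010-01-09", 8), ("2010-01-20", 2), ("2010-02-11", 4)],
)

query = """
WITH top_values AS (
    SELECT strftime('%Y', some_date) AS yr,
           strftime('%m', some_date) AS mon,
           some_value,
           ROW_NUMBER() OVER (
               PARTITION BY strftime('%Y', some_date), strftime('%m', some_date)
               ORDER BY some_value DESC) AS row_num
    FROM some_table
)
SELECT yr, mon, some_value
FROM top_values
WHERE row_num <= 2          -- largest and second largest per month
ORDER BY yr, mon, some_value DESC
"""
rows = conn.execute(query).fetchall()
print(rows)
```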
RANK() works as well (I most often use ROW_NUMBER), but it produces slightly different results when values tie, so it really depends on what your needs are.
Hmm, it's a bit of a hack, but you can do this with subqueries: instead of using MAX, I'd select the some_value rows that have the matching year and month, with row_number() = 1 and row_number() = 2 respectively, ordered by some_value DESC.
The inability to use OFFSET / LIMIT like you can in SQLite is one of my dislikes about SQL Server 2008 (OFFSET ... FETCH was only added in SQL Server 2012).
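For comparison, here is what the LIMIT/OFFSET approach looks like where it is available. This is a sketch in SQLite via Python's sqlite3 on hypothetical data. A correlated subquery picks the second-largest value per month with LIMIT 1 OFFSET 1, which puts best and second best in the same row. In SQL Server 2012 and later, the equivalent clause would be ORDER BY ... OFFSET 1 ROWS FETCH NEXT 1 ROWS ONLY. strftime() stands in for DATEPART.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (some_date TEXT, some_value INTEGER)")
conn.executemany(
    "INSERT INTO some_table VALUES (?, ?)",
    [("2010-01-03", 5), ("2010-01-09", 8), ("2010-01-20", 2), ("2010-02-11", 4)],
)

query = """
WITH months AS (
    SELECT strftime('%Y', some_date) AS yr,
           strftime('%m', some_date) AS mon,
           MAX(some_value) AS max_value
    FROM some_table
    GROUP BY yr, mon
)
SELECT m.yr, m.mon, m.max_value,
       (SELECT s.some_value
        FROM some_table s
        WHERE strftime('%Y', s.some_date) = m.yr
          AND strftime('%m', s.some_date) = m.mon
        ORDER BY s.some_value DESC
        LIMIT 1 OFFSET 1) AS second_value   -- NULL if the month has one row
FROM months m
ORDER BY m.yr, m.mon
"""
rows = conn.execute(query).fetchall()
print(rows)
```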