I have a group of records per id, ordered by date ('Date'). I want to sum the amounts in two groups: one for the rows before any number appears in 'Condition', and another for the rows from the first number onward. It doesn't matter if a 0 appears after a number: sum everything before the first number appears, and everything after it.
You can use conditional aggregation. For example:
select
id,
sum(case when s = 0 then amount else 0 end) as amount_before,
sum(case when s <> 0 then amount else 0 end) as amount_after
from (
select t.*,
sum(abs(condition)) over(partition by id order by date) as s
from t
) x
group by id
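To see how the running sum splits the rows, here is a minimal sketch that runs the query above against made-up data. It assumes SQLite (3.25 or later, for window functions) driven from Python's sqlite3 module; the sample rows are hypothetical and not from the original question.

```python
import sqlite3

# Hypothetical sample data: (id, date, condition, amount).
rows = [
    (1, "2020-01-01", 0, 10),
    (1, "2020-01-02", 0, 20),
    (1, "2020-01-03", 5, 30),  # first non-zero condition: "after" starts here
    (1, "2020-01-04", 0, 40),  # a later 0 still counts as "after"
    (2, "2020-01-01", 0, 7),
    (2, "2020-01-02", 0, 8),   # id 2 never gets a number
]

conn = sqlite3.connect(":memory:")
conn.execute('create table t (id int, "date" text, "condition" int, amount int)')
conn.executemany("insert into t values (?, ?, ?, ?)", rows)

query = """
select id,
       sum(case when s = 0 then amount else 0 end)  as amount_before,
       sum(case when s <> 0 then amount else 0 end) as amount_after
from (
    -- s stays 0 until the first non-zero condition, then never returns to 0
    select t.*,
           sum(abs("condition")) over (partition by id order by "date") as s
    from t
) x
group by id
order by id
"""
result = conn.execute(query).fetchall()
print(result)
```

Note that the row carrying the first number lands in amount_after, because the running sum is already non-zero on that row.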
I have a table such as:
date
id
value
2020/4/4
1
a
2020/4/4
1
a
2020/4/4
1
b
2020/4/4
2
t
2020/4/4
2
u
2020/5/4
3
u
I want to find out how many IDs have more than one unique value at a particular date.
So this is what I should get for the table from above:
2020/4/4: 1 (=> only ID=1 has more than one unique value (a+b))
2020/5/4: 0
I tried to get it with:
SELECT date, SUM(CASE WHEN COUNT(DISTINCT value)>1 THEN 1 ELSE 0 END)
FROM table
GROUP BY date, id
But it did not work. How do I do it right?
Some databases will let you count "tuples", allowing this...
SELECT
date,
CASE WHEN COUNT(DISTINCT (id, value)) < COUNT(*) THEN 1 ELSE 0 END AS has_duplicate_values
FROM
table
GROUP BY
date
Otherwise your style of approach works, but you need to aggregate twice using sub-queries.
SELECT date, SUM(has_duplicate_values) AS num_ids_with_duplicates
FROM
(
SELECT date, id, CASE WHEN COUNT(DISTINCT value) < COUNT(*) THEN 1 ELSE 0 END AS has_duplicate_values
FROM table
GROUP BY date, id
)
AS date_id_aggregate
GROUP BY date
Modify your query as follows:
SELECT date, CASE WHEN COUNT(DISTINCT value)>1 THEN 1 ELSE 0 END FROM `table` GROUP BY date
I want to find out how many IDs have more than one unique value at a particular date.
You can aggregate twice:
SELECT date,
SUM(CASE WHEN num > num_distinct_values THEN 1 ELSE 0 END) as num_ids_with_duplicates
FROM (SELECT date, id,
COUNT(DISTINCT value) as num_distinct_values,
COUNT(*) as num
FROM table
GROUP BY date, id
) di
GROUP BY date;
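As a quick check, the two-level aggregation can be run against the sample rows from the question. This is a sketch assuming SQLite through Python's sqlite3 module; the table name obs is a stand-in chosen here, since table is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('create table obs ("date" text, id int, value text)')
conn.executemany(
    "insert into obs values (?, ?, ?)",
    [
        ("2020/4/4", 1, "a"),
        ("2020/4/4", 1, "a"),
        ("2020/4/4", 1, "b"),
        ("2020/4/4", 2, "t"),
        ("2020/4/4", 2, "u"),
        ("2020/5/4", 3, "u"),
    ],
)

query = """
select "date",
       sum(case when num > num_distinct_values then 1 else 0 end) as num_ids_with_duplicates
from (
    -- inner level: one row per (date, id)
    select "date", id,
           count(distinct value) as num_distinct_values,
           count(*) as num
    from obs
    group by "date", id
) di
group by "date"   -- outer level: one row per date
order by "date"
"""
result = conn.execute(query).fetchall()
print(result)
```

With this data only ID 1 on 2020/4/4 has more rows than distinct values, so the counts match the output expected in the question.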
Given this table, we have users, the product that they used, and the first date that they used the product (I have also created a simple rank by user with a window function). Note that each user has at least 0 rows, if they used nothing before, and at most 2 rows, if they used both products. There are only 2 products: cigars and beers.
How can I create a new view where each row is one user, the next column shows the first product, the next column shows the second product, and the last column shows the lead time between the first dates of use?
One method is conditional aggregation with row_number():
select user,
max(case when seqnum = 1 then product end) as product_1,
max(case when seqnum = 2 then product end) as product_2,
(max(case when seqnum = 2 then time_used end) -
max(case when seqnum = 1 then time_used end)
) as dif
from (select t.*,
row_number() over (partition by user order by time_used) as seqnum
from t
) t
group by user;
Date/time functions vary significantly across different databases. Not all support a simple -, so you might need to adjust for your database.
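As one concrete instance of that adjustment: SQLite stores dates as text, so a plain - does not yield a day count there, and julianday() is used instead. The sketch below assumes SQLite 3.25 or later via Python's sqlite3 module; the table name product_use, the column user_id, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table product_use (user_id text, product text, time_used text)")
conn.executemany(
    "insert into product_use values (?, ?, ?)",
    [
        ("u1", "cigars", "2020-01-01"),
        ("u1", "beers", "2020-01-11"),
        ("u2", "beers", "2020-02-01"),  # u2 used only one product
    ],
)

query = """
select user_id,
       max(case when seqnum = 1 then product end) as product_1,
       max(case when seqnum = 2 then product end) as product_2,
       -- julianday() turns the ISO date text into a day number
       max(case when seqnum = 2 then julianday(time_used) end)
     - max(case when seqnum = 1 then julianday(time_used) end) as dif_days
from (
    select u.*,
           row_number() over (partition by user_id order by time_used) as seqnum
    from product_use u
) u
group by user_id
order by user_id
"""
result = conn.execute(query).fetchall()
print(result)
```

Users with a single product get NULL (None in Python) for the second product and for the difference.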
Subtracting dates with a minus sign may not work in every database:
select
c1.user_id,
c1.first_product_used,
c2.second_product_used,
COALESCE(CAST((Cast(c2.second_date AS DATE) - Cast(c1.first_date AS DATE)) AS VARCHAR(20)), 'n/a') AS "leadtime_days"
from
(
select
user_id,
product_used AS first_product_used,
time_used AS first_date
from
check2
where
rank_of_use = 1
)c1
LEFT OUTER JOIN
(
select
user_id,
product_used AS second_product_used,
time_used AS second_date
from
check2
where
rank_of_use = 2
)c2
ON
c1.user_id = c2.user_id
I have a query that pulls data from a table storing customer data along with their feedback. However, the same customer (cust_id) can have more than one entry. How could I modify this to return only the first row (based on timestamp) and ignore all other records?
I am using Amazon redshift.
with q1 as
(select cust_id,
sum(case when response <= 6 then 1 else 0 end) as bad,
sum(case when response between 7 and 8 then 1 else 0 end) as good
from customers
group by cust_id
order by 1 DESC ,last_visit_datetime desc),
q2 as (select cust_id,rating as neg_rating,response as neg_response from customers
where rating is not null
order by neg_rating asc, last_visit_datetime desc )
select DISTINCT q1.cust_id,q1.good,q1.bad,q2.neg_response,q2.neg_rating
from q1 join q2 on q1.cust_id = q2.cust_id
Could anyone assist? Thanks.
Use row_number to get one row per cust_id and then do the aggregation.
select cust_id,
sum(case when response <= 6 then 1 else 0 end) as bad,
sum(case when response between 7 and 8 then 1 else 0 end) as good
from (select c.*,row_number() over(partition by cust_id order by last_visit_datetime desc) as rnum
from customers c
) c
where rnum=1
group by cust_id
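A small sketch of the row_number() filter, assuming SQLite 3.25 or later through Python's sqlite3 module rather than Redshift, with made-up rows. Ordering by last_visit_datetime desc keeps the most recent row per customer; switching to asc would keep the earliest one instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table customers (cust_id int, response int, last_visit_datetime text)"
)
conn.executemany(
    "insert into customers values (?, ?, ?)",
    [
        (1, 3, "2021-01-01 10:00:00"),
        (1, 8, "2021-02-01 10:00:00"),  # latest row for customer 1
        (2, 5, "2021-01-15 09:00:00"),  # latest row for customer 2
        (2, 9, "2021-01-10 09:00:00"),
    ],
)

query = """
select cust_id,
       sum(case when response <= 6 then 1 else 0 end) as bad,
       sum(case when response between 7 and 8 then 1 else 0 end) as good
from (
    select c.*,
           row_number() over (partition by cust_id
                              order by last_visit_datetime desc) as rnum
    from customers c
) c
where rnum = 1          -- exactly one row per cust_id survives
group by cust_id
order by cust_id
"""
result = conn.execute(query).fetchall()
print(result)
```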
I want to find repeated items only when they satisfy two conditions. In this example, I want to count the repeats of each item type per customer_id only when one instance has order size "Big" and its date is before the other instances. The first condition and the repeat count can be achieved with this code:
Select Customer_id, Item_Type, COUNT(*)
from table
group by Customer_id, Item_Type
having count(*) > 1 and sum(case when Order_Size = 'Big' then 1 else 0 end) > 0;
How do I include the date condition as well?
I would do this as:
select t.customer_id, t.item_type, count(*)
from (select t.*,
min(case when Order_Size = 'Big' then date end) over (partition by customer_id, item_type) as min_big
from t
) t
where date > min_big
group by t.customer_id, t.item_type;
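To illustrate the idea, the sketch below computes the earliest 'Big' date per group and counts only the rows after it. It assumes SQLite 3.25 or later via Python's sqlite3 module; the table name orders, the column order_date, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table orders (customer_id int, item_type text, order_size text, order_date text)"
)
conn.executemany(
    "insert into orders values (?, ?, ?, ?)",
    [
        (1, "A", "Small", "2020-01-01"),  # before the first Big: not counted
        (1, "A", "Big", "2020-01-05"),    # earliest Big for (1, A)
        (1, "A", "Small", "2020-01-10"),  # counted
        (1, "A", "Small", "2020-01-20"),  # counted
        (2, "B", "Small", "2020-01-01"),  # no Big at all for (2, B)
        (2, "B", "Small", "2020-01-02"),
    ],
)

query = """
select o.customer_id, o.item_type, count(*) as repeats_after_big
from (
    select o.*,
           min(case when order_size = 'Big' then order_date end)
               over (partition by customer_id, item_type) as min_big
    from orders o
) o
where order_date > min_big   -- NULL min_big filters the whole group out
group by o.customer_id, o.item_type
order by o.customer_id, o.item_type
"""
result = conn.execute(query).fetchall()
print(result)
```

Groups without any 'Big' order drop out entirely, because comparing against a NULL min_big is never true.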
I believe you could use a window function in a subquery to decide which rows to count, then count them in your main query. Something like:
Select
customer_id, item_type, sum(count_pass) as Count
FROM
(
Select Customer_id,
Item_Type,
CASE
WHEN Order_Size = 'Big' THEN 0
WHEN MIN(Order_Size) OVER (PARTITION BY Customer_ID, Item_Type ORDER BY DateField ASC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) = 'Big' THEN 1
ELSE 0
END as count_pass
FROM table
) subqry
GROUP BY 1,2
That big case statement breaks down like:
If this record is 'Big' then ignore it
If you order all the records by date for each group of customer_id/item_type and look at all the records that precede this record, and the min(order_size) in that group of records (sorted lexicographically) is 'Big' then you have a preceding date with big and can count this record
Otherwise... you can't count it. Which would just be records with order_size='small' without a preceding 'big'.
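The three cases above can be traced on a few rows. This is a sketch assuming SQLite 3.25 or later through Python's sqlite3 module; the table name orders, the column order_date, and the data are made up, and it relies on 'Big' sorting before every other order size present.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table orders (customer_id int, item_type text, order_size text, order_date text)"
)
conn.executemany(
    "insert into orders values (?, ?, ?, ?)",
    [
        (1, "A", "Small", "2020-01-01"),  # empty preceding frame -> 0
        (1, "A", "Big", "2020-01-05"),    # Big itself -> 0
        (1, "A", "Small", "2020-01-10"),  # a Big precedes it -> 1
        (1, "A", "Small", "2020-01-20"),  # a Big precedes it -> 1
        (2, "B", "Small", "2020-01-01"),  # never preceded by a Big -> 0
        (2, "B", "Small", "2020-01-02"),
    ],
)

query = """
select customer_id, item_type, sum(count_pass) as cnt
from (
    select customer_id, item_type,
           case
               when order_size = 'Big' then 0
               when min(order_size) over (
                        partition by customer_id, item_type
                        order by order_date
                        rows between unbounded preceding and 1 preceding
                    ) = 'Big' then 1
               else 0
           end as count_pass
    from orders
) subqry
group by 1, 2
order by 1, 2
"""
result = conn.execute(query).fetchall()
print(result)
```

Unlike a version that filters rows in a WHERE clause, this form still returns groups that never had a 'Big' order, with a count of 0.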
I am calling a stored procedure that returns a table with two columns, an ident (integer) and a score (float4). The integer column will be unique values. I now want to know how many rows in the table have a larger/smaller score than the ident with a given value. I'm struggling to figure out how to do that in SQL. If it were something like PHP I'd sort the returned data by score, find the index of the row that has the ident I am looking for, and then subtract that from the total # of rows, for example. In PostgreSQL 9.1.15, I'm not sure how to do that.
SELECT COUNT(*)
FROM my_stored_proc()
WHERE score > *Score of person with given ident*
ORDER BY score;
If you only care about ident = 2, you can do:
select sum(case when t.score < t2.score then 1 else 0 end) as LessThan,
sum(case when t.score > t2.score then 1 else 0 end) as GreaterThan
from table t cross join
(select * from table where ident = 2) t2;
If you only want to reference the table once (as you would if accessing it were expensive), you could do the above with a CTE or you could do:
select sum(case when score < score2 then 1 else 0 end) as LessThan,
sum(case when score > score2 then 1 else 0 end) as GreaterThan
from (select t.*,
max(case when ident = 2 then score end) over () as score2
from table t
) t
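Here is a minimal sketch of the single-pass version on made-up scores. It assumes SQLite 3.25 or later via Python's sqlite3 module, and uses a plain table named scores as a stand-in for the result of my_stored_proc().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table scores (ident int, score real)")
conn.executemany(
    "insert into scores values (?, ?)",
    [(1, 0.5), (2, 1.5), (3, 2.5), (4, 3.5)],
)

query = """
select sum(case when score < score2 then 1 else 0 end) as LessThan,
       sum(case when score > score2 then 1 else 0 end) as GreaterThan
from (
    -- the empty over () copies ident 2's score onto every row
    select s.*,
           max(case when ident = 2 then score end) over () as score2
    from scores s
) s
"""
result = conn.execute(query).fetchall()
print(result)
```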
Use window functions:
SELECT worse, better
FROM (
SELECT
ident,
COUNT(*) OVER (ORDER BY score ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) worse,
COUNT(*) OVER (ORDER BY score ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) better
FROM my_stored_proc()
) t
WHERE ident = 2; -- replace with the "ident" you care about
This will simply count the number of rows in the result set that are above or below the current row if ordered by score.
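The frame-based counts can be checked the same way. This sketch assumes SQLite 3.25 or later via Python's sqlite3 module, with a hypothetical scores table standing in for my_stored_proc() and made-up, distinct scores.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table scores (ident int, score real)")
conn.executemany(
    "insert into scores values (?, ?)",
    [(1, 0.5), (2, 1.5), (3, 2.5), (4, 3.5)],
)

query = """
select worse, better
from (
    select ident,
           -- rows strictly before the current row in score order
           count(*) over (order by score
                          rows between unbounded preceding and 1 preceding) as worse,
           -- rows strictly after the current row in score order
           count(*) over (order by score
                          rows between 1 following and unbounded following) as better
    from scores
) t
where ident = 2
"""
result = conn.execute(query).fetchall()
print(result)
```

Because the frames are defined by row position, rows with tied scores would still be counted as before or after the current row.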
Anyway, Gordon's solution might be slightly better, as it takes into account the possibility of an ident being returned more than once from your my_stored_proc(), and takes each ident's maximum score.