counting groups which contain an element with a certain attribute

I have a table with three integer columns:
CREATE TABLE routes (
start integer NOT NULL,
end integer NOT NULL,
random integer NOT NULL);
I want to count the number of groups (GROUP BY start, end) that have more than one element and contain at least one element with random > 42. But as far as I know, HAVING can only be used with aggregate functions.
My current attempt:
SELECT count(*) FROM (
SELECT count(*) FROM routes
GROUP BY start,end
HAVING random > 42
AND count(*) > 1);
results in
no such column: random
What would be the most efficient way to solve this problem?

SELECT count(*)
FROM (SELECT SUM(case when random > 42 then 1 else 0 end) as cnt
FROM routes
GROUP BY start,end
HAVING count(*) > 1) as t
WHERE cnt > 0

Include the condition in the aggregate function itself, like below (use SUM here, because COUNT would also count the rows that yield 0):
SELECT sum(case when random > 42 then 1 else 0 end) as computed_column
FROM routes
GROUP BY start, end;
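Your query can also be rewritten as follows. Note that filtering in WHERE counts only groups with more than one row having random > 42, which is stricter than requiring a single such row: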
SELECT COUNT(*) FROM
(
SELECT count(*) as Count_Route
FROM routes
WHERE random > 42
GROUP BY start, end
HAVING count(*) > 1
) XXX;
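
As a rough check of the conditional-aggregation idea above, here is a minimal sketch that runs the query against an in-memory SQLite database from Python. The sample rows and the helper name count_qualifying_groups are assumptions made for illustration; the identifiers start and end are quoted because END is an SQL keyword.

```python
import sqlite3

# Hypothetical sample rows: (start, end, random)
SAMPLE_ROUTES = [
    (1, 2, 50), (1, 2, 10),             # 2 rows, one with random > 42 -> counted
    (3, 4, 50),                         # single row -> not counted
    (5, 6, 1), (5, 6, 2),               # 2 rows, none with random > 42 -> not counted
    (7, 8, 43), (7, 8, 44), (7, 8, 0),  # 3 rows, two with random > 42 -> counted
]


def count_qualifying_groups(rows):
    """Count (start, end) groups with more than one row and at least one random > 42."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE routes ("start" INTEGER NOT NULL, '
        '"end" INTEGER NOT NULL, random INTEGER NOT NULL)'
    )
    conn.executemany("INSERT INTO routes VALUES (?, ?, ?)", rows)
    (n,) = conn.execute(
        """
        SELECT count(*) FROM (
            SELECT 1
            FROM routes
            GROUP BY "start", "end"
            HAVING count(*) > 1
               AND sum(CASE WHEN random > 42 THEN 1 ELSE 0 END) > 0
        )
        """
    ).fetchone()
    conn.close()
    return n


print(count_qualifying_groups(SAMPLE_ROUTES))  # prints 2
```

Both conditions sit in HAVING because each is an aggregate over the group, so no outer WHERE on the subquery is needed.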

Related

Redshift SQL statement that will return 1 or 0 if the select statement returns any rows

I have the following select statement in Redshift that returns rows with certain values if the condition inside is met. I want to transform this into a DQ check that returns 1 (True) if no rows are returned or 0 if any row is returned, but I do not know where I should apply the case statement.
Here is the select statement:
select * from (select brand,calendar_dt, product,
count(account) count from revenue_base
where player_days = 0 and volume_loc >0 group by brand,calendar_dt, product)
where count > 1000 and calendar_dt >='2020-07-12'
and calendar_dt < '2020-07-13'
Can you please offer me some ideas for this?

You may try using exists logic here:
select
case when not exists (
select 1 from
(
select brand, calendar_dt, product, count(account) as count
from revenue_base
where player_days = 0 and volume_loc > 0
group by brand, calendar_dt, product
) t
where calendar_dt >= '2020-07-12' and calendar_dt < '2020-07-13' and
count > 1000
)
then 1 else 0 end as result;

First, Redshift supports booleans, so case is not needed. Second, do the filtering on the date before the aggregation. This is usually faster.
Then, you can filter by the count using a having clause, so no subquery is needed:
select not exists (select 1
from revenue_base
where player_days = 0 and volume_loc > 0 and
calendar_dt >= '2020-07-12' and calendar_dt < '2020-07-13'
group by brand, calendar_dt, product
having count(*) > 1000
) as result
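
The NOT EXISTS pattern can be tried locally. The sketch below uses SQLite through Python rather than Redshift, so it only illustrates the shape of the query. The column types, the helper name dq_check and the threshold parameter are assumptions for the example; SQLite returns the boolean as the integer 1 or 0.

```python
import sqlite3


def dq_check(rows, threshold=1000):
    """Return 1 if no (brand, calendar_dt, product) group exceeds threshold, else 0."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE revenue_base (brand TEXT, calendar_dt TEXT, product TEXT, "
        "account TEXT, player_days INTEGER, volume_loc INTEGER)"
    )
    conn.executemany("INSERT INTO revenue_base VALUES (?, ?, ?, ?, ?, ?)", rows)
    (result,) = conn.execute(
        """
        SELECT NOT EXISTS (
            SELECT 1
            FROM revenue_base
            WHERE player_days = 0 AND volume_loc > 0
              AND calendar_dt >= '2020-07-12' AND calendar_dt < '2020-07-13'
            GROUP BY brand, calendar_dt, product
            HAVING count(account) > ?
        )
        """,
        (threshold,),
    ).fetchone()
    conn.close()
    return result


# Three hypothetical accounts in one brand/date/product group.
rows = [("b1", "2020-07-12", "p1", f"acct{i}", 0, 5) for i in range(3)]
print(dq_check(rows, threshold=2))     # a group of 3 exceeds 2 -> check fails
print(dq_check(rows, threshold=1000))  # no group exceeds 1000 -> check passes
```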

How to use SQL (postgresql) query to conditionally change value within each group?

I am pretty new to postgresql (or sql), and have not learned how to deal with such "within group" operation. My data is like this:
p_id number
97313 4
97315 10
97315 10
97325 0
97325 15
97326 4
97335 0
97338 0
97338 1
97338 2
97344 5
97345 14
97349 0
97349 5
p_id is not unique and can be viewed as a grouping variable. I would like to change number within each p_id as follows:
If, for a given p_id, one of the values is 0 and any other number for that p_id is > 2, then set the 0 to NULL. For example, p_id 97325 has the values 0 and 15, so I replace the 0 with NULL and keep the 15 unchanged.
But for p_id 97338, the three rows have the numbers 0, 1 and 2, so I do not replace the 0 with NULL.
The final data should be like:
p_id number
97313 4
97315 10
97315 10
97325 NULL
97325 15
97326 4
97335 0
97338 0
97338 1
97338 2
97344 5
97345 14
97349 NULL
97349 5
Thank you very much for the help!

A CASE in a COUNT OVER in a CASE:
SELECT
p_id,
(CASE
WHEN number = 0 AND COUNT(CASE WHEN number > 2 THEN number END) OVER (PARTITION BY p_id) > 0
THEN NULL
ELSE number
END) AS number
FROM yourtable

Works for PostgreSQL 10:
SELECT p_id, CASE WHEN number = 0 AND maxnum > 2 AND counts >= 2 THEN NULL ELSE number END AS number
FROM
(
SELECT a.p_id AS p_id, a.number AS number, b.maxnum AS maxnum, b.counts AS counts
FROM trans a
LEFT JOIN
(
SELECT p_id, MAX(number) AS maxnum, COUNT(1) AS counts
FROM trans
GROUP BY p_id
) b
ON a.p_id = b.p_id
) a1

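Use CASE WHEN together with a window function, so that the test looks at the other numbers in the same p_id:
select p_id,
case when number = 0 and max(number) over (partition by p_id) > 2 then null else number end as number
from yourtable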
http://sqlfiddle.com/#!17/898c3/1

I would express this as:
SELECT p_id,
(CASE WHEN number <> 0 OR MAX(number) OVER (PARTITION BY p_id) <= 2
THEN number
END) as number
FROM t;

If the fate of a record depends on the existence of other records within (the same or another) table, you could use EXISTS(...) :
UPDATE ztable zt
SET number = NULL
WHERE zt.number = 0
AND EXISTS ( SELECT *
FROM ztable x
WHERE x.p_id = zt.p_id
AND x.number > 2
);
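
To check the EXISTS idea against the sample data from the question, here is a small sketch using SQLite from Python. It applies the same correlated EXISTS test in a SELECT instead of an UPDATE, so the table is left unchanged. The helper name null_out_zeros is an assumption for the example, and the rows are ordered by insertion (rowid) only so the output lines up with the question.

```python
import sqlite3

# Sample data from the question: (p_id, number)
ROWS = [
    (97313, 4), (97315, 10), (97315, 10), (97325, 0), (97325, 15),
    (97326, 4), (97335, 0), (97338, 0), (97338, 1), (97338, 2),
    (97344, 5), (97345, 14), (97349, 0), (97349, 5),
]


def null_out_zeros(rows):
    """Return rows with 0 replaced by None when the same p_id has a number > 2."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ztable (p_id INTEGER, number INTEGER)")
    conn.executemany("INSERT INTO ztable VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT zt.p_id,
               CASE WHEN zt.number = 0
                     AND EXISTS (SELECT 1
                                 FROM ztable x
                                 WHERE x.p_id = zt.p_id
                                   AND x.number > 2)
                    THEN NULL
                    ELSE zt.number
               END AS number
        FROM ztable zt
        ORDER BY zt.rowid
        """
    ).fetchall()
    conn.close()
    return result


for p_id, number in null_out_zeros(ROWS):
    print(p_id, number)
```

With this data, only the zeros for p_id 97325 and 97349 come back as None, which matches the expected result in the question.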

How to compare a number with count result then use it in limit statement in redshift/sql

I have a table with two columns id and flag.
The data is very imbalanced. Only a few rows have flag = 1 and the others have flag = 0.
id flag
1 0
2 0
3 0
4 0
5 1
6 1
7 0
Now I want to create a balanced table. Therefore, I want to get a subset of the flag = 0 rows whose size equals the number of records where flag = 1. Also, I don't want that number to be greater than 1000.
I am thinking of code like this:
select *
from table
where flag = 0
order by random()
limit (least(1000,
select count(*)
from table
where flag = 1));
Expected result (only two records have flag = 1, so I get two records with flag = 0; if more than 1000 records had flag = 1, I would get only 1000):
id flag
2 0
7 0

If you want a balanced sample:
select t.*
from (select t.*, row_number() over (partition by flag order by flag) as seqnum,
sum(case when flag = 1 then 1 else 0 end) over () as cnt_1
from t
) t
where seqnum <= cnt_1;
If you want an overall maximum, you can change this to:
where seqnum <= least(cnt_1, 1000)

You can use row_number to simulate LIMIT:
select * from (
select column1, column2, row_number() OVER() AS rownum
from table
where flag = 0 ) t
where rownum <= 1000
If I’ve made a bad assumption please comment and I’ll refocus my answer.
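
If a single statement is not required, another option is to split the work into two steps in client code: count the flag = 1 rows first, then pass the capped count as a bound LIMIT parameter. The sketch below does this with SQLite from Python; it is not Redshift-specific. The table name samples and the helper name balanced_zero_sample are assumptions for the example.

```python
import sqlite3

# Sample data from the question: (id, flag)
ROWS = [(1, 0), (2, 0), (3, 0), (4, 0), (5, 1), (6, 1), (7, 0)]


def balanced_zero_sample(rows, cap=1000):
    """Randomly pick as many flag = 0 rows as there are flag = 1 rows, at most cap."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, flag INTEGER NOT NULL)")
    conn.executemany("INSERT INTO samples VALUES (?, ?)", rows)
    # Step 1: how many positive rows are there?
    (ones,) = conn.execute("SELECT count(*) FROM samples WHERE flag = 1").fetchone()
    # Step 2: use the capped count as the LIMIT of a random-order query.
    limit = min(ones, cap)
    picked = conn.execute(
        "SELECT id, flag FROM samples WHERE flag = 0 ORDER BY random() LIMIT ?",
        (limit,),
    ).fetchall()
    conn.close()
    return picked


sample = balanced_zero_sample(ROWS)
print(len(sample))  # two flag = 1 rows exist, so two flag = 0 rows are picked
```

The rows that are picked differ from run to run because of ORDER BY random(); only the number of rows is fixed.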

Check whether an employee is present on three consecutive days

I have a table called tbl_A with the following schema:
After insert, I have the following data in tbl_A:
Now the question is how to write a query for the following scenario:
Put (1) next to any employee who was present on three consecutive days
Put (0) next to any employee who was not present on three consecutive days
The output screenshot:
I think we should use a CASE expression, but I am not able to check for three consecutive days from the date. I would appreciate any help.
Thank you

select name, case when max(cons_days) >= 3 then 1 else 0 end as presence
from (
select name, count(*) as cons_days
from tbl_A, (values (0),(1),(2)) as a(dd)
group by name, adate + dd
)x
group by name

With a self-join on name and available = 'Y', we pair each date of an employee with that employee's other dates and, for each date adate, count the entries that fall on adate, adate + 1 or adate + 2. If all three entries are present, the count is 3, and the outer query sets the flag to 1 for such names (this assumes at most one row per employee per day). Try the query below:
SELECT Z.NAME,
CASE WHEN MAX(Z.CONSEQ_AVAIL) >= 3 THEN 1 ELSE 0 END AS YOUR_FLAG
FROM
(
SELECT A.NAME, A.ADATE,
SUM(CASE WHEN B.ADATE >= A.ADATE AND B.ADATE <= A.ADATE + 2 THEN 1 ELSE 0 END) AS CONSEQ_AVAIL
FROM
TBL_A A INNER JOIN TBL_A B
ON A.NAME = B.NAME AND A.AVAILABLE = 'Y' AND B.AVAILABLE = 'Y'
GROUP BY A.NAME, A.ADATE
) Z
GROUP BY Z.NAME;
Due to the complexity of the problem, I have not been able to test it out. If something is really wrong, please let me know and I will be happy to take down my answer.

-- Below is my approach
select Name,
Case WHen Max_Count>=3 Then 1 else 0 end as Presence
from
(
Select Name,MAx(Coun) as Max_Count
from
(
select Name, (count(*) over (partition by Name,Ref_Date)) as Coun from
(
select Name,adate + row_number() over (partition by Name order by Adate desc) as Ref_Date
from temp
where available='Y'
)
) group by Name
);

Compare each date with the date two rows later for the same employee; if they are exactly two days apart, the three rows are consecutive days (assuming one row per employee per day):
select name as employee, max(case when datediff(day, Adate, next2) = 2 then 1 else 0 end) as presence
from
(select name, Adate,
lead(Adate, 2) over (partition by name order by Adate) as next2
from table_A
where Available = 'Y') A
group by name;
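
Since the schema and data were posted as images that did not survive, the sketch below assumes a table tbl_A(name, adate, available) with ISO date strings, which matches the column names used in the answers. It uses SQLite from Python and a plain self-join: an employee is flagged when some 'Y' day has 'Y' rows on each of the following two days. The sample rows and the helper name three_day_presence are hypothetical.

```python
import sqlite3

# Hypothetical attendance rows: (name, adate, available)
ATTENDANCE = [
    ("Ann", "2020-01-01", "Y"), ("Ann", "2020-01-02", "Y"), ("Ann", "2020-01-03", "Y"),
    ("Bob", "2020-01-01", "Y"), ("Bob", "2020-01-02", "Y"),
    ("Bob", "2020-01-03", "N"), ("Bob", "2020-01-04", "Y"),
    ("Cy", "2020-01-31", "Y"), ("Cy", "2020-02-01", "Y"), ("Cy", "2020-02-02", "Y"),
]


def three_day_presence(rows):
    """Return (name, 1 or 0) per employee: 1 if present on three consecutive days."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl_A (name TEXT, adate TEXT, available TEXT)")
    conn.executemany("INSERT INTO tbl_A VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT e.name,
               EXISTS (
                   SELECT 1
                   FROM tbl_A a
                   JOIN tbl_A b ON b.name = a.name AND b.available = 'Y'
                               AND b.adate = date(a.adate, '+1 days')
                   JOIN tbl_A c ON c.name = a.name AND c.available = 'Y'
                               AND c.adate = date(a.adate, '+2 days')
                   WHERE a.name = e.name AND a.available = 'Y'
               ) AS presence
        FROM (SELECT DISTINCT name FROM tbl_A) e
        ORDER BY e.name
        """
    ).fetchall()
    conn.close()
    return result


print(three_day_presence(ATTENDANCE))
```

Starting from the list of distinct names keeps employees with no qualifying streak in the output with a 0, rather than dropping them.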

force a ceiling to count(*) in sql query

I am using a subquery to return a count as an integer value to my main query. This query is used to rebind an ASP.NET DataGrid, and I have only two characters of width available for this column. So I want to return 99 whenever the count exceeds 99. I can't figure out a way to do this, and I can't see how to apply a CASE expression here.
SELECT
MEMB_ID,
MEMB_Name,
SELECT COUNT(*)
FROM SessionOrder
WHERE SessionOrder.SORD_MEMB_ID = m.MEMB_ID
And SessionOrder.SORD_NumberCompleteDownloads <> 0
As MEMB_Downloads,
MEMB_JoinDate
FROM Member
How can this be done?

Replace
COUNT(*)
With
CASE WHEN COUNT(*) > 99 THEN 99 ELSE COUNT(*) END AS YourColumnName

The CASE expression can look like this:
CASE WHEN COUNT(*) > 99 THEN 99 ELSE COUNT(*) END
There appear to be a couple of errors with your existing query (for example m is not defined). With these errors corrected and the above change made the resulting query could look like this:
SELECT
MEMB_ID,
MEMB_Name,
(
SELECT CASE WHEN COUNT(*) > 99 THEN 99 ELSE COUNT(*) END
FROM SessionOrder
WHERE SessionOrder.SORD_MEMB_ID = MEMB_ID
AND SessionOrder.SORD_NumberCompleteDownloads <> 0
) AS MEMB_Downloads,
MEMB_JoinDate
FROM Member

This might be a bit more efficient, as it can stop scanning rows once the 99th is reached.
SELECT MEMB_ID ,
MEMB_Name,
( SELECT COUNT(*)
FROM (
SELECT TOP 99 *
FROM SessionOrder
WHERE SessionOrder.SORD_MEMB_ID = MEMB_ID
AND SessionOrder.SORD_NumberCompleteDownloads <> 0
)
Top99
) AS MEMB_Downloads,
MEMB_JoinDate
FROM Member

Rather than change the COUNT(*) result, better count at most 99:
SELECT
MEMB_ID,
MEMB_Name,
(SELECT COUNT(*)
FROM (
SELECT TOP(99) *
FROM SessionOrder
WHERE SessionOrder.SORD_MEMB_ID = m.MEMB_ID
And SessionOrder.SORD_NumberCompleteDownloads <> 0)
as TOP99_Downloads)
As MEMB_Downloads,
MEMB_JoinDate
FROM Member m;
This way you avoid counting all the downloads when you'll only display 99 anyway. Of course, one could ask what the point is of displaying a value that is incorrect to start with, and why not make the UI layer capable of displaying 'more than 99'.

It should be CASE ...

or a double UNION, with one branch for counts up to 99 and one for counts above 99:
SELECT
m.MEMB_ID,
m.MEMB_Name,
COUNT(s.SORD_MEMB_ID) AS MEMB_Downloads,
m.MEMB_JoinDate
FROM Member m
LEFT JOIN SessionOrder s
ON s.SORD_MEMB_ID = m.MEMB_ID
AND s.SORD_NumberCompleteDownloads <> 0
GROUP BY m.MEMB_ID, m.MEMB_Name, m.MEMB_JoinDate
HAVING COUNT(s.SORD_MEMB_ID) <= 99
UNION ALL
SELECT
m.MEMB_ID,
m.MEMB_Name,
99 AS MEMB_Downloads,
m.MEMB_JoinDate
FROM Member m
INNER JOIN SessionOrder s
ON s.SORD_MEMB_ID = m.MEMB_ID
AND s.SORD_NumberCompleteDownloads <> 0
GROUP BY m.MEMB_ID, m.MEMB_Name, m.MEMB_JoinDate
HAVING COUNT(*) > 99
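
For a quick local check of the CASE-based ceiling, here is a sketch using SQLite from Python. The cut-down schema (only the columns the query touches), the helper name capped_download_counts and the cap parameter are assumptions for the example; the cap is made configurable only so a small test set can exercise it.

```python
import sqlite3


def capped_download_counts(members, orders, cap=99):
    """Return (MEMB_ID, MEMB_Name, downloads) with downloads capped at cap."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Member (MEMB_ID INTEGER PRIMARY KEY, MEMB_Name TEXT)")
    conn.execute(
        "CREATE TABLE SessionOrder "
        "(SORD_MEMB_ID INTEGER, SORD_NumberCompleteDownloads INTEGER)"
    )
    conn.executemany("INSERT INTO Member VALUES (?, ?)", members)
    conn.executemany("INSERT INTO SessionOrder VALUES (?, ?)", orders)
    result = conn.execute(
        """
        SELECT m.MEMB_ID,
               m.MEMB_Name,
               (SELECT CASE WHEN COUNT(*) > :cap THEN :cap ELSE COUNT(*) END
                FROM SessionOrder s
                WHERE s.SORD_MEMB_ID = m.MEMB_ID
                  AND s.SORD_NumberCompleteDownloads <> 0) AS MEMB_Downloads
        FROM Member m
        ORDER BY m.MEMB_ID
        """,
        {"cap": cap},
    ).fetchall()
    conn.close()
    return result


# Hypothetical data: member 1 has five completed orders, member 2 has one, member 3 none.
members = [(1, "a"), (2, "b"), (3, "c")]
orders = [(1, 1)] * 5 + [(2, 1), (2, 0)]
print(capped_download_counts(members, orders, cap=3))
```

With cap=3, member 1 is reported as 3 instead of 5, while the smaller counts pass through unchanged.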
