How to use an SQL (PostgreSQL) query to conditionally change a value within each group?

I am pretty new to PostgreSQL (or SQL) and have not learned how to deal with this kind of "within group" operation. My data looks like this:
p_id number
97313 4
97315 10
97315 10
97325 0
97325 15
97326 4
97335 0
97338 0
97338 1
97338 2
97344 5
97345 14
97349 0
97349 5
p_id is not unique and can be viewed as a grouping variable. I would like to change the number values within each p_id as follows:
If, for a given p_id, one of the values is 0 but any other "number" for that p_id is > 2, then set the 0 to NULL. For example, p_id 97325 has the values 0 and 15, so I replace the 0 with NULL and keep the 15 unchanged.
But p_id 97338 has three rows with the numbers 0, 1 and 2, none of which is greater than 2, so I do not replace its 0 with NULL.
The final data should look like this:
p_id number
97313 4
97315 10
97315 10
97325 NULL
97325 15
97326 4
97335 0
97338 0
97338 1
97338 2
97344 5
97345 14
97349 NULL
97349 5
Thank you very much for the help!

A CASE in a COUNT OVER in a CASE:
SELECT
p_id,
(CASE
WHEN number = 0 AND COUNT(CASE WHEN number > 2 THEN number END) OVER (PARTITION BY p_id) > 0
THEN NULL
ELSE number
END) AS number
FROM yourtable
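The window-function answer above can be tried without a PostgreSQL server. Below is a minimal sketch that runs the same CASE / COUNT(...) OVER (PARTITION BY p_id) query on the question's sample rows in an in-memory SQLite database through Python's standard sqlite3 module. It assumes SQLite 3.25 or later (needed for window functions); SQLite is only a stand-in for PostgreSQL here, and the id column is a hypothetical addition that exists only to keep the original row order.

```python
import sqlite3

# Sample rows copied from the question: (p_id, number)
ROWS = [
    (97313, 4), (97315, 10), (97315, 10), (97325, 0), (97325, 15),
    (97326, 4), (97335, 0), (97338, 0), (97338, 1), (97338, 2),
    (97344, 5), (97345, 14), (97349, 0), (97349, 5),
]

# Same logic as the answer: a 0 becomes NULL only when its p_id group
# contains at least one number greater than 2.
QUERY = """
SELECT
  p_id,
  CASE
    WHEN number = 0
     AND COUNT(CASE WHEN number > 2 THEN number END) OVER (PARTITION BY p_id) > 0
    THEN NULL
    ELSE number
  END AS number
FROM yourtable
ORDER BY id
"""


def null_out_zeros(rows):
    conn = sqlite3.connect(":memory:")
    # Hypothetical id column: only used to return rows in insertion order.
    conn.execute(
        "CREATE TABLE yourtable (id INTEGER PRIMARY KEY, p_id INTEGER, number INTEGER)"
    )
    conn.executemany("INSERT INTO yourtable (p_id, number) VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for p_id, number in null_out_zeros(ROWS):
        print(p_id, number)
```

COUNT ignores the NULLs produced by the inner CASE, so it counts only the values above 2 in each p_id group.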

Works for PostgreSQL 10:
SELECT p_id, CASE WHEN number = 0 AND maxnum > 2 AND counts >= 2 THEN NULL ELSE number END AS number
FROM
(
SELECT a.p_id AS p_id, a.number AS number, b.maxnum AS maxnum, b.counts AS counts
FROM trans a
LEFT JOIN
(
SELECT p_id, MAX(number) AS maxnum, COUNT(1) AS counts
FROM trans
GROUP BY p_id
) b
ON a.p_id = b.p_id
) a1

Use CASE WHEN:
select p_id,
case when number = 0 and max(number) over (partition by p_id) > 2 then null else number end as number
from yourtable
http://sqlfiddle.com/#!17/898c3/1

I would express this as:
SELECT p_id,
(CASE WHEN number <> 0 OR MAX(number) OVER (PARTITION BY p_id) <= 2
THEN number
END) as number
FROM t;

If the fate of a record depends on the existence of other records within (the same or another) table, you could use EXISTS(...):
UPDATE ztable zt
SET number = NULL
WHERE zt.number = 0
AND EXISTS ( SELECT *
FROM ztable x
WHERE x.p_id = zt.p_id
AND x.number > 2
);
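A minimal sketch of the EXISTS-based UPDATE, again using an in-memory SQLite database via Python's sqlite3 module as a stand-in for PostgreSQL. Two assumptions: the hypothetical id column only preserves row order for checking, and the target table is referenced by its own name (ztable) inside the subquery rather than through the alias zt, so the sketch does not depend on UPDATE alias support.

```python
import sqlite3

# Sample rows copied from the question: (p_id, number)
ROWS = [
    (97313, 4), (97315, 10), (97315, 10), (97325, 0), (97325, 15),
    (97326, 4), (97335, 0), (97338, 0), (97338, 1), (97338, 2),
    (97344, 5), (97345, 14), (97349, 0), (97349, 5),
]


def apply_update(rows):
    conn = sqlite3.connect(":memory:")
    # Hypothetical id column: only used to read rows back in insertion order.
    conn.execute(
        "CREATE TABLE ztable (id INTEGER PRIMARY KEY, p_id INTEGER, number INTEGER)"
    )
    conn.executemany("INSERT INTO ztable (p_id, number) VALUES (?, ?)", rows)
    # Same predicate as the answer: zero out a row only if another row with the
    # same p_id has a number greater than 2. Inside the subquery, ztable.p_id
    # refers to the row being updated, because the inner table is aliased as x.
    conn.execute(
        """
        UPDATE ztable
        SET number = NULL
        WHERE number = 0
          AND EXISTS (SELECT 1
                      FROM ztable x
                      WHERE x.p_id = ztable.p_id
                        AND x.number > 2)
        """
    )
    conn.commit()
    result = conn.execute("SELECT p_id, number FROM ztable ORDER BY id").fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for p_id, number in apply_update(ROWS):
        print(p_id, number)
```

Unlike the SELECT-based answers, this changes the stored data, so the zeros are gone once the statement commits.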

Related

How to compare a number with count result then use it in limit statement in redshift/sql

I have a table with two columns id and flag.
The data is very imbalanced: only a few rows have flag = 1 and the rest are 0.
id flag
1 0
2 0
3 0
4 0
5 1
6 1
7 0
Now I want to create a balanced table, so I want to take a random subset of the flag = 0 rows whose size equals the number of records where flag = 1. Also, I don't want that number to be greater than 1000.
I am thinking about code like this:
select *
from table
where flag = 0
order by random()
limit (least(1000,
select count(*)
from table
where flag = 1));
Expected result (only two records have flag = 1, so I get two records with flag = 0; if more than 1000 records have flag = 1, I only want 1000):
id flag
2 0
7 0
If you want a balanced sample:
select t.*
from (select t.*, row_number() over (partition by flag order by flag) as seqnum,
sum(case when flag = 1 then 1 else 0 end) over () as cnt_1
from t
) t
where seqnum <= cnt_1;
You can change the last line to:
where seqnum <= least(cnt_1, 1000)
if you also want to cap each group at 1000 rows.
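The balanced-sample idea can be sketched on the question's seven rows with Python's sqlite3 module. This is a SQLite stand-in for Redshift with two substitutions: SQLite has no least(), so its two-argument scalar min() takes its place, and the window is ordered by random() so that the flag = 0 rows are picked at random. The cap parameter is a hypothetical name for the 1000-row limit, and window functions assume SQLite 3.25 or later.

```python
import sqlite3

# Sample rows from the question: (id, flag)
ROWS = [(1, 0), (2, 0), (3, 0), (4, 0), (5, 1), (6, 1), (7, 0)]

# Rows are numbered in random order within each flag, and both groups are kept
# only up to the number of flag = 1 rows (never more than :cap).
# SQLite has no LEAST(); the two-argument scalar min() does the same job here.
QUERY = """
SELECT id, flag
FROM (
  SELECT t.*,
         ROW_NUMBER() OVER (PARTITION BY flag ORDER BY random()) AS seqnum,
         SUM(CASE WHEN flag = 1 THEN 1 ELSE 0 END) OVER () AS cnt_1
  FROM t
) AS s
WHERE seqnum <= min(cnt_1, :cap)
"""


def balanced_sample(rows, cap=1000):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, flag INTEGER)")
    conn.executemany("INSERT INTO t (id, flag) VALUES (?, ?)", rows)
    result = conn.execute(QUERY, {"cap": cap}).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(balanced_sample(ROWS))
```

Because the sample is random, the two flag = 0 ids differ from run to run; only the group sizes are fixed.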
You can use row_number to simulate LIMIT.
select * from (
select column1, column2, row_number() OVER() AS rownum
from table
where flag = 0 ) t
where rownum <= 1000
If I’ve made a bad assumption please comment and I’ll refocus my answer.

Counting if data exists in a row

I have the sample data below, which I want to query.
MemberID AGEQ1 AGEQ2 AGEQ3
-----------------------------------------------------------------
1217 2 null null
58458 3 2 null
58459 null null null
58457 null 5 null
299576 6 5 7
What I need to do is look up the table and, for each row, count how many of the AGEx columns contain data (are not NULL).
Results example:
for memberID 1217 the count would be 1
for memberID 58458 the count would be 2
for memberID 58459 the count would be 0 or null
for memberID 58457 the count would be 1
for memberID 299576 the count would be 3
If I query the entire table, the result should look like this:
1 Children - 2
2 Children - 1
3 Children - 1
0 Children - 1
So far I have been doing it with the following query, which isn't very efficient and gives incorrect tallies, because there are multiple combinations in which people can answer the AGE questions. I also have to write multiple queries and switch IS NULL to IS NOT NULL depending on how many children I am counting.
select COUNT (*) as '1 Children' from Member
where AGEQ1 is not null
and AGEQ2 is null
and AGEQ3 is null
The above query only gives me an answer of 1 (the one-child case), but I want to be able to count the data in the other columns as well.
I hope this is clear, and thank you in advance.
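If all of the columns are integers, you can take advantage of integer math: dividing a column by itself yields 1, unless the value is NULL, in which case COALESCE converts the resulting NULL to 0. This assumes no AGE column ever holds 0, since 0 / 0 raises a divide-by-zero error under SQL Server's default settings.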
SELECT
MemberID,
COALESCE(AGEQ1 / AGEQ1, 0)
+ COALESCE(AGEQ2 / AGEQ2, 0)
+ COALESCE(AGEQ3 / AGEQ3, 0)
+ COALESCE(AGEQ4 / AGEQ4, 0)
+ COALESCE(AGEQ5 / AGEQ5, 0)
+ COALESCE(AGEQ6 / AGEQ6, 0) AS ChildCount
FROM dbo.table_name;
Then, to get the number of people with each count of children:
;WITH y(y) AS
(
SELECT TOP (7) rn = ROW_NUMBER() OVER
(ORDER BY [object_id]) - 1 FROM sys.objects
),
x AS
(
SELECT
MemberID,
x = COALESCE(AGEQ1 / AGEQ1, 0)
+ COALESCE(AGEQ2 / AGEQ2, 0)
+ COALESCE(AGEQ3 / AGEQ3, 0)
+ COALESCE(AGEQ4 / AGEQ4, 0)
+ COALESCE(AGEQ5 / AGEQ5, 0)
+ COALESCE(AGEQ6 / AGEQ6, 0)
FROM dbo.table_name
)
SELECT
NumberOfChildren = y.y,
NumberOfPeopleWithThatMany = COUNT(x.x)
FROM y LEFT OUTER JOIN x ON y.y = x.x
GROUP BY y.y ORDER BY y.y;
I'd look at using UNPIVOT. That will turn your wide columns into rows. Since you don't care what value was in a column, just whether a value is present, this generates one row per non-null column (UNPIVOT drops NULLs).
The trick then becomes mashing that into the desired output format. It could probably have been done more cleanly, but I'm a fan of "showing my work" so that others can adapt it to their needs.
-- Using the above logic
WITH HadAges AS
(
-- Find everyone and determine number of rows
SELECT
UP.MemberID
, count(1) AS rc
FROM
dbo.Member AS M
UNPIVOT
(
ColumnValue for ColumnName in (AGEQ1, AGEQ2, AGEQ3)
) AS UP
GROUP BY
UP.MemberID
)
, NoAge AS
(
-- Account for those that didn't show up
SELECT M.MemberID
FROM
dbo.Member AS M
EXCEPT
SELECT
H.MemberID
FROM
HadAges AS H
)
, NUMBERS AS
(
-- Allowable range is 1-6
SELECT TOP 6
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS TheCount
FROM
sys.all_columns AS SC
)
, COMBINATION AS
(
-- Link those with rows to their count
SELECT
N.TheCount AS ChildCount
, H.MemberID
FROM
NUMBERS AS N
LEFT OUTER JOIN
HadAges AS H
ON H.rc = N.TheCount
UNION ALL
-- Deal with the unlinked
SELECT
0
, NA.MemberID
FROM
NoAge AS NA
)
SELECT
C.ChildCount
, COUNT(C.MemberID) AS Instances
FROM
COMBINATION AS C
GROUP BY
C.ChildCount;
Try this:
select id, a+b+c+d+e+f
from ( select id,
case when age1 is null then 0 else 1 end a,
case when age2 is null then 0 else 1 end b,
case when age3 is null then 0 else 1 end c,
case when age4 is null then 0 else 1 end d,
case when age5 is null then 0 else 1 end e,
case when age6 is null then 0 else 1 end f
from ages
) as t
See it in this fiddle: http://sqlfiddle.com/#!3/88020/1
To get the number of people with each number of children:
select childs, count(*) as ct
from (
select id, a+b+c+d+e+f childs
from
(
select
id,
case when age1 is null then 0 else 1 end a,
case when age2 is null then 0 else 1 end b,
case when age3 is null then 0 else 1 end c,
case when age4 is null then 0 else 1 end d,
case when age5 is null then 0 else 1 end e,
case when age6 is null then 0 else 1 end f
from ages ) as t
) ct
group by childs
order by 1
See it in this fiddle: http://sqlfiddle.com/#!3/88020/24
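To check the CASE-based count, here is a minimal sketch using Python's sqlite3 module with the question's five sample rows. It assumes only the three AGEQ columns shown in the question (the answer above uses six columns named age1 to age6), uses SQLite as a stand-in for SQL Server, and computes both the per-member count and the number of people per count.

```python
import sqlite3

# Sample rows from the question: (MemberID, AGEQ1, AGEQ2, AGEQ3); None = NULL.
ROWS = [
    (1217, 2, None, None),
    (58458, 3, 2, None),
    (58459, None, None, None),
    (58457, None, 5, None),
    (299576, 6, 5, 7),
]

# One CASE per column turns "has data" into 1 and NULL into 0; summing them
# gives the number of answered AGE columns per member.
PER_MEMBER = """
SELECT MemberID,
       CASE WHEN AGEQ1 IS NULL THEN 0 ELSE 1 END
     + CASE WHEN AGEQ2 IS NULL THEN 0 ELSE 1 END
     + CASE WHEN AGEQ3 IS NULL THEN 0 ELSE 1 END AS childs
FROM Member
"""

# Group the per-member counts to get how many people have each count.
HISTOGRAM = (
    "SELECT childs, COUNT(*) AS ct FROM (" + PER_MEMBER + ") AS t "
    "GROUP BY childs ORDER BY childs"
)


def child_counts(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Member (MemberID INTEGER PRIMARY KEY, "
        "AGEQ1 INTEGER, AGEQ2 INTEGER, AGEQ3 INTEGER)"
    )
    conn.executemany("INSERT INTO Member VALUES (?, ?, ?, ?)", rows)
    per_member = dict(conn.execute(PER_MEMBER).fetchall())
    histogram = conn.execute(HISTOGRAM).fetchall()
    conn.close()
    return per_member, histogram


if __name__ == "__main__":
    per_member, histogram = child_counts(ROWS)
    print(per_member)
    print(histogram)
```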

Can I get the minimum of 2 columns which is greater than a given value using only one scan of a table

This is my example data (there are no indexes and I do not want to create any):
CREATE TABLE tblTest ( a INT , b INT );
INSERT INTO tblTest ( a, b ) VALUES
( 1 , 2 ),
( 5 , 1 ),
( 1 , 4 ),
( 3 , 2 );
I want the minimum value across both column a and column b that is greater than a given value. For example, if the given value is 3, I want 4 to be returned.
This is my current solution:
SELECT MIN (subMin) FROM
(
SELECT MIN (a) as subMin FROM tblTest
WHERE a > 3 -- Returns 5
UNION
SELECT MIN (b) as subMin FROM tblTest
WHERE b > 3 -- Returns 4
)
This searches the table twice: once to get min(a) and once to get min(b).
I believe it should be faster to do this with just one pass. Is this possible?
You want to use conditional aggregation for this:
select min(case when a > 3 then a end) as minA,
min(case when b > 3 then b end) as minB
from tblTest;
To get the minimum of both values, you can use SQLite's multi-argument min() function, which returns the smallest of its arguments:
select min(min(case when a > 3 then a end),
min(case when b > 3 then b end)
)
from tblTest
The only issue is that the min will return NULL if either argument is NULL. You can fix this by doing:
select coalesce(min(min(case when a > 3 then a end),
min(case when b > 3 then b end)
),
min(case when a > 3 then a end),
min(case when b > 3 then b end)
)
from tblTest
This version will return the minimum value, subject to your conditions. If one of the conditions has no rows, it will still return the minimum of the other value.
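The single-pass query can be checked in Python with the stdlib sqlite3 module, which uses SQLite and therefore supports the multi-argument min() this answer relies on. The sketch below uses the question's four rows and binds the threshold as a named parameter :v (a hypothetical name) instead of hard-coding 3.

```python
import sqlite3

# Sample rows from the question: (a, b)
ROWS = [(1, 2), (5, 1), (1, 4), (3, 2)]

# One scan: each conditional MIN() ignores values that fail its test (the CASE
# yields NULL for them). The outer two-argument min() is SQLite's scalar min;
# COALESCE falls back to whichever column matched when the other is NULL.
QUERY = """
SELECT COALESCE(min(min(CASE WHEN a > :v THEN a END),
                    min(CASE WHEN b > :v THEN b END)),
                min(CASE WHEN a > :v THEN a END),
                min(CASE WHEN b > :v THEN b END))
FROM tblTest
"""


def min_above(rows, v):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tblTest (a INT, b INT)")
    conn.executemany("INSERT INTO tblTest (a, b) VALUES (?, ?)", rows)
    (result,) = conn.execute(QUERY, {"v": v}).fetchone()
    conn.close()
    return result


if __name__ == "__main__":
    print(min_above(ROWS, 3))
```

With a threshold of 4, only column a has a qualifying value, so the COALESCE fallback returns 5; with 5, nothing qualifies and the result is NULL (None in Python).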
Off the top of my head, you could modify the table and add a column that stores the minimum value of the two columns, then query that column.
Or you can do this:
select min(val)
from
(
select min(col1, col2) as val
from table1
)
where
val > 3
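The outer SELECT reads the derived result, not the table itself. Note, however, that the filter is applied after taking the row-wise minimum, so a row such as (1, 4) contributes 1 rather than 4; with the sample data this returns NULL instead of 4.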

SQL (TSQL) - Select values in a column where another column is not null?

I will keep this simple: I would like to know if there is a good way to select all the values in one column when another column never has a NULL for them. For example:
A B
----- -----
1 7
2 7
NULL 7
4 9
1 9
2 9
From the above set I would just want 9 from B, and not 7, because 7 has a NULL in A. Obviously I could wrap this as a subquery and use the IN clause, etc., but this is already part of a pretty unique set and I am looking to keep it efficient.
I should note that for my purposes this would only be a one-way comparison... I would only be returning values in B and examining A.
I imagine there is an easy way to do this that I am missing, but being in the thick of things I don't see it right now.
You can do something like this:
select *
from t
where t.b not in (select b from t where a is null);
If you want only distinct b values, then you can do:
select b
from t
group by b
having sum(case when a is null then 1 else 0 end) = 0;
And, finally, you could use window functions:
select a, b
from (select t.*,
sum(case when a is null then 1 else 0 end) over (partition by b) as NullCnt
from t
) t
where NullCnt = 0;
The query below outputs only one column. The records are grouped by column B, and each record is tested for whether A is NULL: every NULL in A adds 1 to the group's sum. The HAVING clause keeps only the groups whose sum is 0.
SELECT B
FROM TableName
GROUP BY B
HAVING SUM(CASE WHEN A IS NULL THEN 1 ELSE 0 END) = 0
If you want all the matching rows, you can use a join:
SELECT a.*
FROM TableName a
INNER JOIN
(
SELECT B
FROM TableName
GROUP BY B
HAVING SUM(CASE WHEN A IS NULL THEN 1 ELSE 0 END) = 0
) b ON a.b = b.b
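A minimal sketch, assuming SQLite via Python's sqlite3 module rather than SQL Server, that runs the GROUP BY / HAVING query and the join variant on the sample rows from the question.

```python
import sqlite3

# Sample rows from the question: (A, B); None = NULL.
ROWS = [(1, 7), (2, 7), (None, 7), (4, 9), (1, 9), (2, 9)]

# B values whose group contains no NULL in A.
GROUPED = """
SELECT B
FROM TableName
GROUP BY B
HAVING SUM(CASE WHEN A IS NULL THEN 1 ELSE 0 END) = 0
"""

# All rows belonging to those B values, via a join on the grouped result.
JOINED = (
    "SELECT a.A, a.B FROM TableName a INNER JOIN (" + GROUPED + ") b "
    "ON a.B = b.B ORDER BY a.A"
)


def run(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TableName (A INTEGER, B INTEGER)")
    conn.executemany("INSERT INTO TableName (A, B) VALUES (?, ?)", rows)
    b_values = [b for (b,) in conn.execute(GROUPED)]
    matching_rows = conn.execute(JOINED).fetchall()
    conn.close()
    return b_values, matching_rows


if __name__ == "__main__":
    print(run(ROWS))
```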

Sqlite 3: select and count together with group by and without group by

Following on from this question:
Sqlite 3 Insert and Replace fails on more than 1 unique column
I have a table with this schema:
CREATE TABLE tbl_poll (
id INTEGER PRIMARY KEY AUTOINCREMENT,
poll_id STRING NOT NULL,
ip_address STRING NOT NULL,
opt STRING NULL,
CONSTRAINT 'unique_vote_per_poll_per_ip_address' UNIQUE ( poll_id, ip_address ) ON CONFLICT REPLACE
);
When I run
select opt,count(opt) as count from tbl_poll where poll_id = 'jsfw' group by opt
the result is:
opt count
0 4
2 2
3 2
That is, 4 users selected option 0, and options 2 and 3 were each selected by 2 users.
Is there a way to get a result like the following?
opt count percent
0 4 0.5
2 2 0.25
3 2 0.25
where percent = count / total count
If I can get the total count, i.e. 4 + 2 + 2 = 8, that will solve my problem too.
I have tried this:
select opt,count(opt) as count from tbl_poll where poll_id = 'jsfw'
but it doesn't work, as the number of columns is not the same.
SELECT opt
, COUNT(*) AS count
, ROUND(CAST(COUNT(*) AS REAL)/total, 2) AS percent
FROM tbl_poll
CROSS JOIN
( SELECT COUNT(*) AS total
FROM tbl_poll
WHERE poll_id = 'jsfw'
) AS t
WHERE poll_id = 'jsfw'
GROUP BY opt ;
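A minimal sketch of the CROSS JOIN approach using Python's sqlite3 module. The vote rows and IP addresses are hypothetical, chosen only to reproduce the question's counts (4, 2 and 2) for poll 'jsfw', plus one vote for another poll to show that it is excluded from both counts. The schema is simplified: TEXT replaces STRING, and the AUTOINCREMENT and UNIQUE constraint are omitted.

```python
import sqlite3

# Hypothetical votes: (poll_id, ip_address, opt). They reproduce the question's
# counts for 'jsfw'; the 'other' vote should not affect any number below.
VOTES = [
    ("jsfw", "ip1", "0"), ("jsfw", "ip2", "0"), ("jsfw", "ip3", "0"), ("jsfw", "ip4", "0"),
    ("jsfw", "ip5", "2"), ("jsfw", "ip6", "2"),
    ("jsfw", "ip7", "3"), ("jsfw", "ip8", "3"),
    ("other", "ip1", "2"),
]

# The derived table t has exactly one row (the poll total), so cross joining
# it simply attaches that total to every group.
QUERY = """
SELECT opt,
       COUNT(*) AS count,
       ROUND(CAST(COUNT(*) AS REAL) / total, 2) AS percent
FROM tbl_poll
CROSS JOIN (SELECT COUNT(*) AS total
            FROM tbl_poll
            WHERE poll_id = 'jsfw') AS t
WHERE poll_id = 'jsfw'
GROUP BY opt
ORDER BY opt
"""


def poll_percentages(votes):
    conn = sqlite3.connect(":memory:")
    # Simplified schema (assumption): TEXT columns, no UNIQUE/ON CONFLICT clause.
    conn.execute(
        "CREATE TABLE tbl_poll (id INTEGER PRIMARY KEY, poll_id TEXT NOT NULL, "
        "ip_address TEXT NOT NULL, opt TEXT)"
    )
    conn.executemany(
        "INSERT INTO tbl_poll (poll_id, ip_address, opt) VALUES (?, ?, ?)", votes
    )
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in poll_percentages(VOTES):
        print(row)
```

The CAST to REAL matters: without it, COUNT(*) / total is integer division in SQLite and every percentage would come out as 0.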
If you know all of the possible values of opt, you can use a CASE WHEN expression:
SELECT COUNT(opt) as total, SUM(CASE WHEN opt = '0' OR opt IS NULL OR TRIM(opt) = '' THEN 1 ELSE 0 END) as total0, SUM(CASE WHEN opt = '1' THEN 1 ELSE 0 END) as total1, ... FROM tbl_poll WHERE poll_id = 'jsfw'
This will give you:
total total0 total1 ...
8 4 0 ...
Let me know if this isn't a closed set of options.
The syntax is:
CASE WHEN condition THEN result_for_true ELSE result_for_false END
CASE WHEN condition1 THEN result_for_1 WHEN condition2 THEN result_for_2 ELSE result_for_false_on_all END