How to compare a number with a count result and then use it in a LIMIT clause in Redshift/SQL

I have a table with two columns, id and flag.
The data is very imbalanced: only a few rows have flag = 1, and the rest have flag = 0.
id flag
1 0
2 0
3 0
4 0
5 1
6 1
7 0
Now I want to create a balanced table, so I want to take a subset of the flag = 0 rows whose size is the number of records where flag = 1. I also don't want that number to be greater than 1000.
I am thinking of code like this:
select *
from table
where flag = 0
order by random()
limit (least(1000,
select count(*)
from table
where flag = 1));
Expected result (only two records have flag = 1, so I get two records with flag = 0; if more than 1000 records had flag = 1, I would get only 1000):
id flag
2 0
7 0

If you want a balanced sample:
select t.*
from (select t.*, row_number() over (partition by flag order by flag) as seqnum,
sum(case when flag = 1 then 1 else 0 end) over () as cnt_1
from t
) t
where seqnum <= cnt_1;
If you want an overall maximum, change the filter to:
where seqnum <= least(cnt_1, 1000)
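For anyone who wants to try the row_number idea locally, here is a minimal sketch that runs the same query shape through Python's sqlite3 module instead of Redshift. It assumes the table is named t as in the query above, orders each partition by random() so the flag = 0 rows are picked at random, and uses SQLite's two-argument MIN in place of LEAST. The balanced_sample helper and its cap parameter are made up for illustration, and window functions need SQLite 3.25 or newer.

```python
import sqlite3


def balanced_sample(conn, cap=1000):
    """Return up to min(count of flag = 1, cap) rows for each flag value."""
    sql = """
        SELECT id, flag
        FROM (
            SELECT id, flag,
                   ROW_NUMBER() OVER (PARTITION BY flag ORDER BY random()) AS seqnum,
                   SUM(CASE WHEN flag = 1 THEN 1 ELSE 0 END) OVER () AS cnt_1
            FROM t
        ) AS s
        WHERE seqnum <= MIN(cnt_1, ?)  -- two-argument MIN stands in for LEAST
        ORDER BY id
    """
    return conn.execute(sql, (cap,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, flag INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 0), (2, 0), (3, 0), (4, 0), (5, 1), (6, 1), (7, 0)],
)

rows = balanced_sample(conn)
print(rows)  # two flag = 1 rows plus two randomly chosen flag = 0 rows
```

On Redshift you would keep LEAST(cnt_1, 1000) as in the answer; only the random ordering inside the window is an addition here.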

You can use row_number to simulate LIMIT:
select * from (
select column1, column2, row_number() OVER() AS rownum
from table
where flag = 0 ) t
where rownum <= 1000
If I’ve made a bad assumption please comment and I’ll refocus my answer.

Related

Find items in table with 2 specific sizes

I have an items table where the item code repeats because each item comes in different sizes and variants.
I want to find the items that have two specific kinds of size, i.e. a size in M/Y and a size in Euro.
Items table:
Id size
1 0
1 2Y
1 EU-15
2 2M
2 4M
3 0
3 2M-4M
3 EU-12
4 EU-11
4 EU-15
Required result: the query should return item ids 1 and 3.
I tried SUM() with CASE but could not figure it out because it involves the LIKE operator (Size like '[^EU]%' and Size like 'EU%').
Update:
With a little hint, I could do it in two queries using a temp table. It would be nice to see it in a single query.
1st Query.
select id,
case when size like '[^EU]%' then 'S'
when size like 'EU%' then 'EU' END as size
into #t from table
2nd Query.
select id, size from table
where id in
( select id from #t
group by id
having count(distinct(size))>1)
order by id, size
Thanks.
I think you want the Ids that have both an EU% size and a non-EU% size:
select t.Id
from tbl t
group by t.Id
having count(distinct case when size like 'EU%' then 1 else 2 end) = 2
You can use the analytical function as follows:
select * from
(select t.*,
count(case when Size like '%M' OR Size like '%Y' then 1 end)
over (partition by id) cnt1,
count(case when Size like 'EU%' then 1 end)
over (partition by id) cnt2
from your_Table t) t
where cnt1 > 0 AND cnt2 > 0
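As a quick check of the idea behind both answers, here is a small sketch using Python's sqlite3 module with the sample rows from the question. It is a variant, not either answer verbatim: it uses conditional aggregation in a HAVING clause, and the table name items is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, size TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [
        (1, "0"), (1, "2Y"), (1, "EU-15"),
        (2, "2M"), (2, "4M"),
        (3, "0"), (3, "2M-4M"), (3, "EU-12"),
        (4, "EU-11"), (4, "EU-15"),
    ],
)

# Keep ids that have at least one M/Y size and at least one EU size.
query = """
    SELECT id
    FROM items
    GROUP BY id
    HAVING SUM(CASE WHEN size LIKE '%M' OR size LIKE '%Y' THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN size LIKE 'EU%' THEN 1 ELSE 0 END) > 0
    ORDER BY id
"""
ids_with_both = [row[0] for row in conn.execute(query)]
print(ids_with_both)  # [1, 3]
```

Unlike the count(distinct ...) version, this does not treat the size 0 as a non-EU size, so an item with only 0 and EU sizes would not match.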

SQL: Update every entry with value from another entry that share same column value

I have the following table trn_ReceiptLog
I am wondering whether it's possible to update the amount of entry #1 to match entry #2 if the amount of entry #1 is 0.
I have over 5000 of these entries that need to be updated, basically something like:
UPDATE trn_ReceiptLog SET amount = (SELECT amount FROM trn_ReceiptLog WHERE receipt_type = 0) WHERE amount = 0
But I am not sure how to do it for each entry individually. Do I need some sort of loop?
Condition 1: The entry the amount is taken from will always have receipt_type 0.
Condition 2: person_id will always be identical across the two entries.
Condition 3 (optional): Only perform this update if there is exactly one receipt_type = 9 entry (sometimes there are 3 or 4 entries with the same person_id and receipt_type 9).
You can use window functions to calculate the information needed for the conditions. Then the logic is simple:
with toupdate as (
select t.*,
max(case when receipt_type = 0 then amount end) over (partition by person_id) as amount_0,
sum(case when receipt_type = 9 then 1 else 0 end) over (partition by person_id) as num_9s
from trn_ReceiptLog t
)
update toupdate
set amount = amount_0
where receipt_type = 9 and amount = 0 and num_9s = 1;
With a self join:
update t
set t.amount = tt.amount
from trn_ReceiptLog t inner join trn_ReceiptLog tt
on tt.person_id = t.person_id
where t.receipt_type = 9 and tt.receipt_type = 0 and t.amount = 0
and not exists (
select 1 from trn_ReceiptLog
where entry_id <> t.entry_id and person_id = t.person_id and receipt_type = 9
)
The last part of the WHERE clause, AND NOT EXISTS ..., implements the third (optional) condition.
See a simplified demo.
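Since the original table screenshot is not available here, the following is only a sketch of the same logic on invented rows, run through Python's sqlite3 module. The column names come from the queries above; the data is made up. It uses a correlated subquery instead of UPDATE ... FROM so that it also runs on older SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE trn_ReceiptLog (
        entry_id INTEGER PRIMARY KEY,
        person_id INTEGER,
        receipt_type INTEGER,
        amount INTEGER
    )
""")
# Invented sample rows, one scenario per person_id.
conn.executemany(
    "INSERT INTO trn_ReceiptLog VALUES (?, ?, ?, ?)",
    [
        (1, 100, 9, 0), (2, 100, 0, 50),                  # single type-9 row with amount 0
        (3, 200, 9, 0), (4, 200, 9, 0), (5, 200, 0, 30),  # two type-9 rows: skipped
        (6, 300, 9, 5), (7, 300, 0, 20),                  # type-9 amount is not 0: skipped
    ],
)

conn.execute("""
    UPDATE trn_ReceiptLog
    SET amount = (SELECT src.amount
                  FROM trn_ReceiptLog AS src
                  WHERE src.person_id = trn_ReceiptLog.person_id
                    AND src.receipt_type = 0)
    WHERE receipt_type = 9
      AND amount = 0
      -- only touch rows that have a type-0 row to copy from
      AND EXISTS (SELECT 1
                  FROM trn_ReceiptLog AS src
                  WHERE src.person_id = trn_ReceiptLog.person_id
                    AND src.receipt_type = 0)
      -- optional condition 3: no other type-9 row for this person
      AND NOT EXISTS (SELECT 1
                      FROM trn_ReceiptLog AS other
                      WHERE other.person_id = trn_ReceiptLog.person_id
                        AND other.receipt_type = 9
                        AND other.entry_id <> trn_ReceiptLog.entry_id)
""")

amounts = dict(conn.execute("SELECT entry_id, amount FROM trn_ReceiptLog"))
print(amounts[1])  # 50
```

No loop is needed: the single UPDATE statement handles every person_id in one pass.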

How to use SQL (postgresql) query to conditionally change value within each group?

I am pretty new to postgresql (or sql), and have not learned how to deal with such "within group" operation. My data is like this:
p_id number
97313 4
97315 10
97315 10
97325 0
97325 15
97326 4
97335 0
97338 0
97338 1
97338 2
97344 5
97345 14
97349 0
97349 5
p_id is not unique and can be viewed as a grouping variable. I would like to change number within each p_id as follows:
If, for a given p_id, one of the values is 0 but any other number for that p_id is > 2, then set the 0 value to NULL. For example, p_id 97325 has "0" and "15" associated with it, so I replace the 0 with NULL and keep the 15 unchanged.
But for p_id 97338, the three rows have the numbers "0", "1", and "2", so I do not replace the 0 with NULL.
The final data should be like:
p_id number
97313 4
97315 10
97315 10
97325 NULL
97325 15
97326 4
97335 0
97338 0
97338 1
97338 2
97344 5
97345 14
97349 NULL
97349 5
Thank you very much for the help!
A CASE in a COUNT OVER in a CASE:
SELECT
p_id,
(CASE
WHEN number = 0 AND COUNT(CASE WHEN number > 2 THEN number END) OVER (PARTITION BY p_id) > 0
THEN NULL
ELSE number
END) AS number
FROM yourtable
Test it here on rextester.
Works for PostgreSQL 10:
SELECT p_id, CASE WHEN number = 0 AND maxnum > 2 AND counts >= 2 THEN NULL ELSE number END AS number
FROM
(
SELECT a.p_id AS p_id, a.number AS number, b.maxnum AS maxnum, b.counts AS counts
FROM trans a
LEFT JOIN
(
SELECT p_id, MAX(number) AS maxnum, COUNT(1) AS counts
FROM trans
GROUP BY p_id
) b
ON a.p_id = b.p_id
) a1
Use CASE WHEN with a window MAX over each p_id:
select p_id,
case when number = 0 and max(number) over (partition by p_id) > 2 then null else number end as number
from yourtable
http://sqlfiddle.com/#!17/898c3/1
I would express this as:
SELECT p_id,
(CASE WHEN number <> 0 OR MAX(number) OVER (PARTITION BY p_id) <= 2
THEN number
END) as number
FROM t;
If the fate of a record depends on the existence of other records within (the same or another) table, you could use EXISTS(...) :
UPDATE ztable zt
SET number = NULL
WHERE zt.number = 0
AND EXISTS ( SELECT *
FROM ztable x
WHERE x.p_id = zt.p_id
AND x.number > 2
);
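To see the EXISTS version in action, here is a small sketch that loads the question's sample data into an in-memory SQLite database through Python's sqlite3 module and runs the same UPDATE. The table name ztable is taken from the answer above; reading the rows back by rowid is only a convenience for this demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ztable (p_id INTEGER, number INTEGER)")
conn.executemany(
    "INSERT INTO ztable VALUES (?, ?)",
    [
        (97313, 4), (97315, 10), (97315, 10), (97325, 0), (97325, 15),
        (97326, 4), (97335, 0), (97338, 0), (97338, 1), (97338, 2),
        (97344, 5), (97345, 14), (97349, 0), (97349, 5),
    ],
)

# Null out a 0 only when the same p_id also has a number greater than 2.
conn.execute("""
    UPDATE ztable
    SET number = NULL
    WHERE number = 0
      AND EXISTS (SELECT 1
                  FROM ztable AS x
                  WHERE x.p_id = ztable.p_id
                    AND x.number > 2)
""")

result = conn.execute("SELECT p_id, number FROM ztable ORDER BY rowid").fetchall()
for p_id, number in result:
    print(p_id, number)
```

Only the 0 rows for 97325 and 97349 become NULL (None in Python); 97335 and 97338 keep their 0 because no other row in those groups exceeds 2.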

Query to find ranges of consecutive rows

I have a file that contains a dump of a SQL table with 2 columns: int ID (auto-increment identity field) and bit Flag. Flag = 0 means a record is good and Flag = 1 means a record is bad (contains an error). The goal is to find all blocks of consecutive bad records (Flag value of 1) with 1,000 or more rows. The solution shouldn't use cursors or while loops; it should use set-based queries only (selects, joins, etc.).
We would like to see the actual queries used and the results in the following format:
StartID – EndID NumberOfErrorsInTheBlock
StartID – EndID NumberOfErrorsInTheBlock
……………………….
StartID – EndID NumberOfErrorsInTheBlock
For example if our data were only 30 records and we were looking for blocks with 5 or more records then the results would look as follows (see the screenshot below, the errors blocks that met the criteria are highlighted) :
[ID Range].....[Number of errors in the block]
11-15..... 5
19-25..... 7
sql file containing sample rows, dropbox
T-SQL Solution for SQL Server 2012 and Above
IF OBJECT_ID('tempdb..#tbl_ranges') IS NOT NULL
DROP TABLE #tbl_ranges;
CREATE TABLE #tbl_ranges
(
row_num INT PRIMARY KEY,
ID INT,
Flag BIT,
Label TINYINT
);
WITH cte_yourTable
AS
(
SELECT Id,
Flag,
CASE
--label min
WHEN Flag != LAG(flag,1) OVER (ORDER BY ID) THEN 1
--inner
WHEN Flag = LAG(flag,1) OVER (ORDER BY ID) AND Flag = LEAD(flag,1) OVER (ORDER BY ID) THEN 2
--end
WHEN Flag = LAG(flag,1) OVER (ORDER BY ID) AND Flag != LEAD(flag,1) OVER (ORDER BY ID) THEN 3
END label
FROM yourTable
)
INSERT INTO #tbl_ranges
SELECT ROW_NUMBER() OVER (ORDER BY ID) row_num,
ID,
Flag,
label
FROM cte_yourTable
WHERE label != 2;
SELECT A.ID ID_start,
B.ID ID_end,
B.ID - A.ID range_cnt
FROM #tbl_ranges A
INNER JOIN #tbl_ranges B
ON A.row_num = B.row_num - 1
AND A.Flag = B.Flag;
IF OBJECT_ID('tempdb..#tbl_ranges') IS NOT NULL
DROP TABLE #tbl_ranges;
Abbreviated Results:
ID_start ID_end range_cnt
----------- ----------- -----------
2 3 1
5 8 3
9 10 1
11 35 24
36 356 320
357 358 1
359 406 47
...
Without a temp table, this can be solved with chained CTEs (one CTE reading from another):
With Evaluation (ID,Flag,Evaluate)
as
(select ID,Flag,Evaluate = ID-row_number() over (order by Flag,ID)
from [dbo].[SqltestRecordsNew]
where Flag = 1
),
Evaluation_Final (StartingRecordID,EndRecordID,Flag,cnt)
as
(
select min(ID) as StartingRecordID,max(ID) as EndRecordID,
Flag, cnt = count(*)
from Evaluation
group by Evaluate, Flag
)
select Concat(StartingRecordID,' - ', EndRecordID) as 'StartingRecordID - EndRecordId',
cnt as GroupItemCnt from Evaluation_Final
where cnt > 999
order by Concat(StartingRecordID,' - ', EndRecordID)
-- Test results Case 1
Select ID,Flag,
Case when Flag=1 then 'Success'
else 'Defect Data'
End as TestResults
from SqltestRecordsNew
where ID between 1494363 and 1495559
-- Test results Case 2
Select ID,Flag,
Case when Flag=1 then 'Success'
else 'Defect Data'
End as TestResults from SqltestRecordsNew
where ID between 1498409 and 1503899
-- Test results Case 3
Select ID,Flag,
Case when Flag=1 then 'Success'
else 'Defect Data'
End as TestResults from SqltestRecordsNew
where ID between 1548257 and 1550489
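The trick in the CTE answer is that ID minus row_number() is constant within a run of consecutive bad IDs, so grouping by that difference yields one row per block. Here is a minimal sketch of it on the 30-row example from the question, using Python's sqlite3 module (SQLite 3.25+ for window functions). The exact positions of the bad rows outside the two expected blocks are invented, since the screenshot is not available; the table name records and the find_blocks helper are hypothetical.

```python
import sqlite3


def find_blocks(conn, min_rows):
    """Return (start_id, end_id, count) for runs of consecutive flag = 1 rows."""
    sql = """
        WITH evaluation AS (
            SELECT id,
                   id - ROW_NUMBER() OVER (ORDER BY id) AS grp
            FROM records
            WHERE flag = 1
        )
        SELECT MIN(id), MAX(id), COUNT(*)
        FROM evaluation
        GROUP BY grp
        HAVING COUNT(*) >= ?
        ORDER BY MIN(id)
    """
    return conn.execute(sql, (min_rows,)).fetchall()


# Invented layout: short runs at 3 and 28-29, qualifying blocks at 11-15 and 19-25.
bad_ids = {3, 28, 29} | set(range(11, 16)) | set(range(19, 26))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, flag INTEGER)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)",
    [(i, 1 if i in bad_ids else 0) for i in range(1, 31)],
)

blocks = find_blocks(conn, 5)
for start_id, end_id, cnt in blocks:
    print(f"{start_id}-{end_id}..... {cnt}")
```

For the real data you would pass 1000 as the threshold instead of 5.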

Get the distinct count of values from a table with multiple where clauses

My table structure is this
id last_mod_dt nr is_u is_rog is_ror is_unv
1 x uuid1 1 1 1 0
2 y uuid1 1 0 1 1
3 z uuid2 1 1 1 1
I want the count of rows with:
is_ror=1 or is_rog =1
is_u=1
is_unv=1
All in a single query. Is it possible?
The problem I am facing is that there can be same values for nr as is the case in the table above.
CASE statements provide mondo flexibility...
SELECT
sum(case
when is_ror = 1 or is_rog = 1 then 1
else 0
end) FirstCount
,sum(case
when is_u = 1 then 1
else 0
end) SecondCount
,sum(case
when is_unv = 1 then 1
else 0
end) ThirdCount
from MyTable
You can use UNION ALL to get multiple results, e.g.
select count(*) from table where is_ror=1 or is_rog =1
union all
select count(*) from table where is_u=1
union all
select count(*) from table where is_unv=1
Then the result set will contain three rows each with one of the counts.
Sounds pretty simple if "all in a single query" does not disqualify subselects;
SELECT
(SELECT COUNT(DISTINCT nr) FROM table1 WHERE is_ror=1 OR is_rog=1) cnt_ror_reg,
(SELECT COUNT(DISTINCT nr) FROM table1 WHERE is_u=1) cnt_u,
(SELECT COUNT(DISTINCT nr) FROM table1 WHERE is_unv=1) cnt_unv;
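If the repeated nr values are the concern, the CASE approach and the DISTINCT subselects can be combined into a single pass with COUNT(DISTINCT CASE ...). This is a sketch on the three sample rows from the question, run through Python's sqlite3 module; the table name table1 follows the query above, and the plain row counts are included only for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE table1 (
        id INTEGER PRIMARY KEY, last_mod_dt TEXT, nr TEXT,
        is_u INTEGER, is_rog INTEGER, is_ror INTEGER, is_unv INTEGER
    )
""")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "x", "uuid1", 1, 1, 1, 0),
        (2, "y", "uuid1", 1, 0, 1, 1),
        (3, "z", "uuid2", 1, 1, 1, 1),
    ],
)

# CASE yields nr when the condition holds and NULL otherwise;
# COUNT(DISTINCT ...) ignores the NULLs and collapses repeated nr values.
distinct_counts = conn.execute("""
    SELECT COUNT(DISTINCT CASE WHEN is_ror = 1 OR is_rog = 1 THEN nr END),
           COUNT(DISTINCT CASE WHEN is_u = 1 THEN nr END),
           COUNT(DISTINCT CASE WHEN is_unv = 1 THEN nr END)
    FROM table1
""").fetchone()

# Plain row counts, for comparison.
row_counts = conn.execute("""
    SELECT SUM(CASE WHEN is_ror = 1 OR is_rog = 1 THEN 1 ELSE 0 END),
           SUM(CASE WHEN is_u = 1 THEN 1 ELSE 0 END),
           SUM(CASE WHEN is_unv = 1 THEN 1 ELSE 0 END)
    FROM table1
""").fetchone()

print(distinct_counts)  # (2, 2, 2)
print(row_counts)       # (3, 3, 2)
```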
how about something like
SELECT
SUM(IF(is_u > 0 AND is_rog > 0, 1, 0)) AS count_something,
...
from table
group by nr
I think this will do the trick.
I am not sure exactly what you want, but I believe you can adapt this logic to produce your desired result.