Querying a subset - sql

I want to write an SQL query that finds records containing particular values in a column and, from that subset, keeps only the records that don't contain some other value. How do you write a query for that?
cid  id2  attribute
--------------------------------
1    100  delete
1    100  payment
1    100  void
2    100  delete
2    102  payment
2    102  void
3    102  delete
3    103  payment
In the example above, I want to list each cid for which the payment and delete attributes exist but the void attribute doesn't. So it should return 3, because cid 3 has no void attribute.
I forgot to mention that there could be more attributes. However, I need to list the records for which delete and payment exist, regardless of other attributes, but void doesn't.

I call this a "set-within-sets" query, because you are looking for particular sets of attributes within each cid.
I would express this with group by and conditions in the having:
select cid
from t
group by cid
having sum(case when attribute = 'payment' then 1 else 0 end) > 0 and
sum(case when attribute = 'delete' then 1 else 0 end) > 0 and
sum(case when attribute = 'void' then 1 else 0 end) = 0 ;
In some databases, you can simplify this with string aggregation -- assuming there are no duplicate attributes for cids. For instance, using the MySQL function:
select cid
from t
where attribute in ('payment', 'delete', 'void')
group by cid
having group_concat(attribute order by attribute) = 'delete,payment';
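The conditional-aggregation query from this answer can be tried against the question's sample data. Below is a minimal sketch that runs it through Python's built-in sqlite3 module. The table name t comes from the answer; cid 4 with a 'refund' attribute is a hypothetical extra row, added only to show that other attributes are ignored.

```python
import sqlite3

# Sample rows from the question, plus a hypothetical cid 4 that carries an
# extra attribute ('refund') to show that other attributes are ignored.
ROWS = [
    (1, 100, "delete"), (1, 100, "payment"), (1, 100, "void"),
    (2, 100, "delete"), (2, 102, "payment"), (2, 102, "void"),
    (3, 102, "delete"), (3, 103, "payment"),
    (4, 104, "delete"), (4, 104, "payment"), (4, 104, "refund"),
]

# The HAVING clause counts matching rows per cid: at least one 'payment',
# at least one 'delete', and no 'void'.
QUERY = """
select cid
from t
group by cid
having sum(case when attribute = 'payment' then 1 else 0 end) > 0 and
       sum(case when attribute = 'delete' then 1 else 0 end) > 0 and
       sum(case when attribute = 'void' then 1 else 0 end) = 0
order by cid
"""


def cids_without_void(rows):
    """Load rows into an in-memory table and return the matching cids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (cid integer, id2 integer, attribute text)")
    conn.executemany("insert into t values (?, ?, ?)", rows)
    result = [cid for (cid,) in conn.execute(QUERY)]
    conn.close()
    return result


print(cids_without_void(ROWS))
```

With only the question's original eight rows, the query returns cid 3; the hypothetical cid 4 is also returned because its extra attribute does not affect any of the three sums.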

You can use conditional aggregation:
select cid
from tablename
where attribute in ('delete', 'payment', 'void')
group by cid
having
count(distinct attribute) = 2
and
sum(
case attribute
when 'void' then 1
else 0
end
) = 0
If there are no attributes other than these three, you can omit the WHERE clause.
Results:
| cid |
| --- |
| 3 |

I'm assuming that there are only three attributes, so the logic behind this query is:
First COUNT the attributes per cid (GROUP BY cid), then LEFT JOIN the original table on attribute = 'void'. Keep each cid that has exactly 2 attributes and no void row.
The original table is named temp:
SELECT
subq2.result_cid
FROM (
SELECT
*
FROM (
SELECT
T.cid AS result_cid,
COUNT(T.attribute) AS count
FROM
temp AS T
GROUP BY
T.cid
) AS subq
LEFT OUTER JOIN temp AS T2 ON subq.result_cid = T2.cid AND T2.attribute = 'void'
) AS subq2
WHERE subq2.count = 2 AND subq2.id2 IS NULL

Use a correlated subquery with NOT EXISTS:
select t1.* from tablename t1
where not exists( select 1 from tablename t2
where t1.cid=t2.cid and attribute='void'
)
and exists ( select 1 from tablename t2
where t1.cid=t2.cid
having count(distinct attribute)=2
)
and attribute in ('payment','delete')

Related

Get a particular record based on a condition in SQL

My requirement is to get the id for a missing status from an SQL table. Each id has a list of statuses, say A, B, C, D. In this scenario, I have to check whether status B exists. The table gets updated every day, and each time a new Id is created.
Conditions:
1. If status A exists and other statuses such as C and D do not exist, then I don't need to get the id.
2. If statuses A and B exist and other statuses such as C or D do not exist, then I don't need to get the id.
3. If status A exists, B does not exist, and other statuses such as C or D exist, then I should get the id of that record.
4. If statuses A and B exist and other statuses such as C or D exist (all statuses exist), then I don't need to get the id of that record.
Table1:
Id StatusCode
1 A
1 C
2 A
2 B
2 C
3 A
3 C
3 D
How do I get Id 1 and 3 using an SQL query? It seems simple, but as I am new to SQL I was not able to work it out.
The select statement in this screenshot works fine when there is only one id, but it fails with multiple ids. I tried many other ways, but without success.
Try this
SELECT DISTINCT ID
FROM T1
WHERE Statuscode = 'A' AND ID NOT IN (SELECT ID FROM T1 WHERE Statuscode = 'B' )
AND (ID IN (SELECT ID FROM T1 WHERE Statuscode = 'C' ) OR ID IN (SELECT ID FROM T1 WHERE Statuscode = 'D' ))
Also, to correct Gordon Linoff's answer, we need to add one more condition to the HAVING clause:
SELECT Id
FROM T1
GROUP BY Id
HAVING SUM(CASE WHEN Statuscode = 'A' THEN 1 ELSE 0 END) > 0 AND
SUM(CASE WHEN Statuscode = 'B' THEN 1 ELSE 0 END) = 0 AND
SUM(CASE WHEN Statuscode IN ('C', 'D') THEN 1 ELSE 0 END) > 0;
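As a quick check, the corrected HAVING query can be run against the question's Table1 data. This is a minimal sketch using Python's built-in sqlite3 module; the table name T1 follows the answer, and ids 4 and 5 in the second call are hypothetical rows added to exercise conditions 1 and 2.

```python
import sqlite3

# Rows from the question's Table1.
ROWS = [
    (1, "A"), (1, "C"),
    (2, "A"), (2, "B"), (2, "C"),
    (3, "A"), (3, "C"), (3, "D"),
]

# Keep ids that have A, do not have B, and have at least one of C or D.
QUERY = """
SELECT Id
FROM T1
GROUP BY Id
HAVING SUM(CASE WHEN StatusCode = 'A' THEN 1 ELSE 0 END) > 0 AND
       SUM(CASE WHEN StatusCode = 'B' THEN 1 ELSE 0 END) = 0 AND
       SUM(CASE WHEN StatusCode IN ('C', 'D') THEN 1 ELSE 0 END) > 0
ORDER BY Id
"""


def ids_missing_b(rows):
    """Return ids where B is missing although a later status exists."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T1 (Id INTEGER, StatusCode TEXT)")
    conn.executemany("INSERT INTO T1 VALUES (?, ?)", rows)
    result = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return result


print(ids_missing_b(ROWS))
# Hypothetical ids: 4 has only A, 5 has A and B; neither should be returned.
print(ids_missing_b(ROWS + [(4, "A"), (5, "A"), (5, "B")]))
```

Both calls return ids 1 and 3, which matches the result the question asks for.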
This answers the original version of the question.
I think you can use aggregation:
select id
from t
group by id
having sum(case when status = 'A' then 1 else 0 end) > 0 and
sum(case when status in ('C', 'D') then 1 else 0 end) > 0;
SELECT id
FROM t
GROUP BY
Id
HAVING MAX(status) <> CHAR(64 + COUNT(*))
--char(64+1) = A, char(64+2) = B etc
The logic behind this is that it counts the rows for each id. If an id has 3 rows, its statuses should be A, B, C; if it has 4 rows, they should be A, B, C, D. In general, when no status is missing, the max status is the letter at the position given by the number of rows, so an id whose max status differs from that letter has a gap.
This holds, of course, only if there are no duplicate (id, status code) pairs.
select distinct id from t where t.statuscode = 'C' or t.statuscode = 'D' group by t.id

Selecting a group with or without certain conditions across many rows in SQL

I have data like this:
ID SomeVar
123 0
123 1
123 2
234 1
234 2
234 3
456 3
567 0
567 1
I'm trying to group by my ID to return all of the IDs that do not have a record with the value 0. That is, my selection would look like this:
ID
234
456
Is there an easy way to do this without creating a subset table with all records not containing 0 then joining it back to the full data set where the tables don't match?
I generally try to avoid subqueries, but you could use one for this case. Do the same group by, and check that the id isn't in a subquery of ids that have 0 for SomeVar. In this case, distinct will do the same and more efficiently, so I'll do that first:
SELECT DISTINCT ID
FROM [table_name]
WHERE ID NOT IN (
SELECT ID FROM [table_name] WHERE SomeVar = 0
);
And if you want to get other information by using a GROUP BY:
SELECT ID, max(SomeVar), count(*), sum(SomeVar)
FROM [table_name]
WHERE ID NOT IN (
SELECT ID FROM [table_name] WHERE SomeVar = 0
)
GROUP BY ID;
You can use aggregation and having:
select id
from t
group by id
having min(somevar) > 0;
This assumes that somevar is never negative. If that is a possibility, then you can use the slightly more verbose:
select id
from t
group by id
having sum(case when somevar = 0 then 1 else 0 end) = 0;
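To make the difference between the two HAVING clauses concrete, here is a small sketch that runs both through Python's built-in sqlite3 module. The table name t follows the answer; ID 678 with a negative value is a hypothetical row, added only to show where min(somevar) > 0 and the sum(case ...) form disagree.

```python
import sqlite3

# Rows from the question, plus a hypothetical ID 678 that has a negative
# value but no 0, to show where the two queries differ.
ROWS = [
    (123, 0), (123, 1), (123, 2),
    (234, 1), (234, 2), (234, 3),
    (456, 3),
    (567, 0), (567, 1),
    (678, -1), (678, 2),
]

# Shorter form: only correct when somevar is never negative.
MIN_QUERY = """
select id from t group by id
having min(somevar) > 0
order by id
"""

# More verbose form: counts the rows that are exactly 0.
SUM_QUERY = """
select id from t group by id
having sum(case when somevar = 0 then 1 else 0 end) = 0
order by id
"""


def run_query(query, rows):
    """Load rows into an in-memory table and return the ids selected."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (id integer, somevar integer)")
    conn.executemany("insert into t values (?, ?)", rows)
    result = [row[0] for row in conn.execute(query)]
    conn.close()
    return result


print(run_query(MIN_QUERY, ROWS))
print(run_query(SUM_QUERY, ROWS))
```

On this data the min form drops the hypothetical ID 678 even though it has no 0 record, while the sum form keeps it; on the question's original rows both return 234 and 456.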
Use case statement with count or sum aggregation, filter by count using having:
select ID
from
(
select ID, count(case when SomeVar=0 then 1 end) cnt
from mytable
group by ID having count(case when SomeVar=0 then 1 end) = 0
) s
;

identify rows with not null values in sql

How can I retrieve all rows that have a value in the Status column (not null), grouped by the Id column?
Id Name Status
1394 Test 1 Y
1394 Test 2 null
1394 Test 3 null
1395 Test 4 Y
1395 Test 5 Y
I wrote something like select * from table where status = 'Y'. It brings back 3 records; how do I add a condition to bring back only the last 2? ID 1394 has 2 other records whose status is null.
If you want to select groups where the status is only 'Y', you can do:
select t.*
from t
where not exists (select 1
from t t2
where t2.id = t.id and
(t2.Status <> 'Y' or t2.status is null)
);
If you only want the ids, I would use group by and having:
select id
from t
group by id
having min(status) = 'Y' and max(status) = 'Y' and count(*) = count(status);
The last condition checks for no NULL values.
You could also write:
having min(case when status = 'Y' then 1 else 0 end) = 1
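The two queries from this answer can be checked against the question's sample rows. This is a minimal sketch using Python's built-in sqlite3 module, with NULL represented as None; the table name t follows the answer, and the ORDER BY clauses are additions to make the output order predictable.

```python
import sqlite3

# Rows from the question; None is stored as SQL NULL.
ROWS = [
    (1394, "Test 1", "Y"),
    (1394, "Test 2", None),
    (1394, "Test 3", None),
    (1395, "Test 4", "Y"),
    (1395, "Test 5", "Y"),
]

# Full rows of groups in which every status is 'Y' (no other value, no NULL).
ROW_QUERY = """
select t.id, t.name, t.status
from t
where not exists (select 1
                  from t t2
                  where t2.id = t.id and
                        (t2.status <> 'Y' or t2.status is null))
order by t.name
"""

# Ids only: count(*) = count(status) holds when the group has no NULL status.
ID_QUERY = """
select id
from t
group by id
having min(status) = 'Y' and max(status) = 'Y' and count(*) = count(status)
order by id
"""


def run_query(query, rows):
    """Load rows into an in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (id integer, name text, status text)")
    conn.executemany("insert into t values (?, ?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(run_query(ROW_QUERY, ROWS))
print(run_query(ID_QUERY, ROWS))
```

Both queries exclude ID 1394, because two of its rows have a NULL status, and keep only ID 1395.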
A simple way is:
select * from mytable
where status = 'Y'
and id not in (select id from mytable where status is null)
The existing condition "where status = 'Y'" already excludes NULL values by definition.
If you are trying to get grouped results, a "GROUP BY id" clause will achieve this, which will also require putting id in the select explicitly instead of "*".
Example: SELECT id, COUNT(id) from table where status = 'Y' GROUP BY id
If I am reading this correctly, you want the IDs of groups that never have a NULL status value.
I would use a subquery with NOT IN. First find the IDs that have a NULL status:
SELECT DISTINCT ID FROM mytable WHERE status IS NULL;
Then keep only the rows whose ID is not in that list:
SELECT * FROM mytable WHERE id NOT IN (SELECT DISTINCT ID FROM mytable WHERE status IS NULL);
Here are some possible solutions, because I am unclear on exactly what you want as output:
Select Id, Name, Status from table where status is not null;
results in 3 rows:
Id Name Status
1394 Test 1 Y
1395 Test 4 Y
1395 Test 5 Y
Select Id, count(*) as anAmt from table where status is not null group by Id;
/* only retrieves counts per Id */
results in 1 row for each Id:
Id anAmt
1394 1
1395 2

Replace NULL with values

Here is my challenge:
I have a log table that adds a new record every time a record is changed, but puts a NULL value in every field that did not change. In other words, only the changed value is set; the unchanged fields in each row simply hold NULL.
Now I would like to replace each NULL value with the nearest non-NULL value above it, as shown below:
Source table: Task_log
ID Owner Status Flag
1 Bob Registrar T
2 Sue NULL NULL
3 NULL NULL F
4 Frank Admission T
5 NULL NULL F
6 NULL NULL T
Desired output table: Task_log
ID Owner Status Flag
1 Bob Registrar T
2 Sue Registrar T
3 Sue Registrar F
4 Frank Admission T
5 Frank Admission F
6 Frank Admission T
How do I write a query which will generate the desired output table?
One of the new window functions in SQL Server 2012 is FIRST_VALUE. As the name suggests, it returns the first value in a window, and the window can be partitioned through the OVER clause. Before using it, you need to divide each column into blocks of rows, where a block for a column begins at each row in which that column has a value.
With Block As (
Select ID
, Owner
, OBlockID = SUM(Case When Owner Is Null Then 0 Else 1 End)
OVER (ORDER BY ID)
, Status
, SBlockID = SUM(Case When Status Is Null Then 0 Else 1 End)
OVER (ORDER BY ID)
, Flag
, FBlockID = SUM(Case When Flag Is Null Then 0 Else 1 End)
OVER (ORDER BY ID)
From Task_log
)
Select ID
, Owner = FIRST_VALUE(Owner) OVER (PARTITION BY OBlockID ORDER BY ID)
, Status = FIRST_VALUE(Status) OVER (PARTITION BY SBlockID ORDER BY ID)
, Flag = FIRST_VALUE(Flag) OVER (PARTITION BY FBlockID ORDER BY ID)
FROM Block
The UPDATE query is easily derived
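The block-numbering SELECT from this answer can be checked against the Task_log sample data. The sketch below ports it to SQLite through Python's built-in sqlite3 module, which is an assumption on two counts: SQLite needs version 3.25 or later for window functions, and the T-SQL "alias = expression" form is rewritten as "expression AS alias".

```python
import sqlite3

# Source rows from the question; None is stored as SQL NULL.
ROWS = [
    (1, "Bob", "Registrar", "T"),
    (2, "Sue", None, None),
    (3, None, None, "F"),
    (4, "Frank", "Admission", "T"),
    (5, None, None, "F"),
    (6, None, None, "T"),
]

# Step 1 (CTE): a running count of non-NULL values per column gives every
# row the number of the block it belongs to.
# Step 2: FIRST_VALUE within each block is the value that opened the block.
QUERY = """
WITH block AS (
    SELECT id, owner, status, flag,
           SUM(CASE WHEN owner IS NULL THEN 0 ELSE 1 END)
               OVER (ORDER BY id) AS o_block,
           SUM(CASE WHEN status IS NULL THEN 0 ELSE 1 END)
               OVER (ORDER BY id) AS s_block,
           SUM(CASE WHEN flag IS NULL THEN 0 ELSE 1 END)
               OVER (ORDER BY id) AS f_block
    FROM task_log
)
SELECT id,
       FIRST_VALUE(owner) OVER (PARTITION BY o_block ORDER BY id) AS owner,
       FIRST_VALUE(status) OVER (PARTITION BY s_block ORDER BY id) AS status,
       FIRST_VALUE(flag) OVER (PARTITION BY f_block ORDER BY id) AS flag
FROM block
ORDER BY id
"""


def fill_down(rows):
    """Return the rows with each NULL replaced by the last non-NULL above."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_log (id INTEGER, owner TEXT, status TEXT, flag TEXT)"
    )
    conn.executemany("INSERT INTO task_log VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in fill_down(ROWS):
    print(row)
```

The printed rows should match the desired output table in the question. Note that a leading NULL (one with no non-NULL value above it) stays NULL, since its block number is 0 and that block has no opening value.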
As I mentioned in my comment, I would try to fix the process that is creating the records rather than fixing the junk data. If that is not an option, the code below should get you pointed in the right direction.
UPDATE t1
set t1.owner = COALESCE(t1.owner, t2.owner),
t1.Status = COALESCE(t1.status, t2.status),
t1.Flag = COALESCE(t1.flag, t2.flag)
FROM Task_log as t1
INNER JOIN Task_log as t2
ON t1.id = (t2.id + 1)
where t1.owner is null
OR t1.status is null
OR t1.flag is null
I can think of several approaches.
You could use a combination of COALESCE with an array aggregate function. Unfortunately it doesn't look like SQL Server supports array_agg natively (although some nice people have developed some workarounds).
You could also use a subselect for each column.
SELECT id,
(SELECT TOP 1 owner FROM ... WHERE id <= outer_id AND owner IS NOT NULL ORDER BY id DESC) AS owner,
-- other columns
You could probably do something with window functions, too.
A vanilla solution would be:
select id
, owner
, coalesce(owner, ( select owner from t t2
where id = (select max(id) from t t3
where id < t1.id and owner is not null))
) as new_owner
, flag
, coalesce(flag, ( select flag from t t2
where id = (select max(id) from t t3
where id < t1.id and flag is not null))
) as new_flag
from t t1
This is rather inefficient, but it should work on most DBMSs.

How do I modify this query without increasing the number of rows returned?

I've got a sub-select in a query that looks something like this:
left outer join
(select distinct ID from OTHER_TABLE) as MYJOIN
on BASE_OBJECT.ID = MYJOIN.ID
It's pretty straightforward. It checks whether a certain relation exists between the main object being queried for and the object represented by OTHER_TABLE, by testing whether MYJOIN.ID is null on the row in question.
But now the requirements have changed a little. There's another column in OTHER_TABLE that can have a value of 1 or 0, and the query needs to know whether a relation exists between the primary object and a 1-value row, and also whether one exists for a 0-value row. The obvious solution is to put:
left outer join
(select distinct ID, TYPE_VALUE from OTHER_TABLE) as MYJOIN
on BASE_OBJECT.ID = MYJOIN.ID
But that would be wrong because if 0-type and 1-type objects both exist for the same ID, it will increase the number of rows returned by the query, which isn't acceptable. So what I need is some sort of subselect that will return 1 row for each distinct ID, with a "1-type exists" column and a "0-type exists" column. And I have no idea how to code that in SQL.
For example, for the following table,
ID | TYPE_VALUE
_________________
1 | 1
3 | 0
3 | 1
4 | 0
I'd like to see a result set like this:
ID | HAS_TYPE_0 | HAS_TYPE_1
______________________________
1 | 0 | 1
3 | 1 | 1
4 | 1 | 0
Anyone know how I could set up a query to do this? Hopefully with a minimum of ugly hacks?
In the general case, you would use EXISTS:
SELECT DISTINCT ID,
CASE WHEN EXISTS (
SELECT * FROM Table1 y
WHERE y.TYPE_VALUE = 0 AND ID = x.ID)
THEN 1
ELSE 0 END AS HAS_TYPE_0,
CASE WHEN EXISTS (
SELECT * FROM Table1 y
WHERE y.TYPE_VALUE = 1 AND ID = x.ID)
THEN 1
ELSE 0 END AS HAS_TYPE_1
FROM Table1 x;
If you have a very large number of elements in the table, this won't perform well; those nested subselects are often a kiss of death when it comes to performance.
For your specific case, you could also use GROUP BY and MAX() and MIN() to speed things up:
SELECT
ID,
CASE WHEN MIN(TYPE_VALUE) = 0 THEN 1 ELSE 0 END AS HAS_TYPE_0,
CASE WHEN MAX(TYPE_VALUE) = 1 THEN 1 ELSE 0 END AS HAS_TYPE_1
FROM Table1
GROUP BY ID;
Instead of select distinct ID, TYPE_VALUE from OTHER_TABLE
use
select ID,
MAX(CASE WHEN TYPE_VALUE = 0 THEN 1 ELSE 0 END) as has_type_0,
MAX(CASE WHEN TYPE_VALUE = 1 THEN 1 ELSE 0 END) as has_type_1
from OTHER_TABLE
GROUP BY ID;
You can do the same using the PIVOT operator...
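To show that the grouped subselect keeps the row count unchanged when it is joined back, here is a minimal sketch using Python's built-in sqlite3 module. OTHER_TABLE's rows come from the question; the contents of BASE_OBJECT (ids 1 to 4) are hypothetical, and the COALESCE calls are an added assumption so that ids with no match report 0 instead of NULL.

```python
import sqlite3

# Rows of OTHER_TABLE from the question.
OTHER_ROWS = [(1, 1), (3, 0), (3, 1), (4, 0)]
# Hypothetical BASE_OBJECT ids; id 2 has no match in OTHER_TABLE.
BASE_IDS = [1, 2, 3, 4]

# The subselect returns exactly one row per id, so the left join cannot
# multiply rows of base_object.
QUERY = """
select b.id,
       coalesce(m.has_type_0, 0) as has_type_0,
       coalesce(m.has_type_1, 0) as has_type_1
from base_object b
left outer join (
    select id,
           max(case when type_value = 0 then 1 else 0 end) as has_type_0,
           max(case when type_value = 1 then 1 else 0 end) as has_type_1
    from other_table
    group by id
) as m on b.id = m.id
order by b.id
"""


def type_flags(base_ids, other_rows):
    """Return (id, has_type_0, has_type_1) for every base_object id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table base_object (id integer)")
    conn.execute("create table other_table (id integer, type_value integer)")
    conn.executemany("insert into base_object values (?)",
                     [(i,) for i in base_ids])
    conn.executemany("insert into other_table values (?, ?)", other_rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in type_flags(BASE_IDS, OTHER_ROWS):
    print(row)
```

The result has one row per BASE_OBJECT id, including id 3, which has both a 0-type and a 1-type row in OTHER_TABLE.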