SQL - Tags searching query

I have the following table in my database:
ID name
1 x
2 x
3 y
1 y
1 z
Now I want to select only the objects (IDs) that have both 'x' and 'y' as tag names. In this case that is only the record with ID = 1, because the sought set of values ('x' and 'y') is a subset of that ID's set of names ('x', 'y' and 'z').
How do I write this SQL query?
Thanks for the help :)

One method uses aggregation:
select id
from t
where name in ('x', 'y')
group by id
having count(*) = 2; -- use count(distinct name) if an id can carry the same tag twice
If you care about performance you might want to compare this to:
select tx.id
from t tx join
t ty
on tx.id = ty.id and tx.name = 'x' and ty.name = 'y';
The first version is easier to generalize to more tags. Under some circumstances, the second might have better performance.
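To compare the two answers on the question's sample rows, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in database (the asker's engine is not stated). Two details go beyond the answer and are assumptions: the aggregation uses COUNT(DISTINCT name), which stays correct if the same (id, name) pair can appear twice, and the join selects tx.id, since an unqualified id would be ambiguous.

```python
import sqlite3

# Sample rows from the question, in a table named t as in the answer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, "x"), (2, "x"), (3, "y"), (1, "y"), (1, "z")])

# Aggregation: keep only the wanted tags, then require all of them per id.
aggregation_sql = """
    SELECT id
    FROM t
    WHERE name IN ('x', 'y')
    GROUP BY id
    HAVING COUNT(DISTINCT name) = 2
"""

# Self-join: one copy of the table per required tag.
self_join_sql = """
    SELECT tx.id
    FROM t tx
    JOIN t ty ON tx.id = ty.id
    WHERE tx.name = 'x' AND ty.name = 'y'
"""

by_aggregation = [row[0] for row in conn.execute(aggregation_sql)]
by_self_join = [row[0] for row in conn.execute(self_join_sql)]
print(by_aggregation, by_self_join)  # [1] [1]
```

Adding a third tag means one more value in the IN list and `= 3` for the first query, but one more self-join for the second, which is why the aggregation generalizes more easily.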

Related

How do I check if a certain value exists?

I have a historization table called CUR_VALID. This table looks something like this:
ID CUR_VALID
1 N
1 N
1 Y
2 N
2 Y
3 Y
For every ID there needs to be exactly one Y. If there is no Y, or more than one Y, something is wrong. I already have the statement for checking whether there are multiple Ys. Now I only need to check, for every ID, whether a Y exists. I'm just not sure how to do that. This is what I have so far. So how do I check whether the value 'Y' exists?
SELECT Count(1) [Number of N]
,MAX(CUR_VALID = 'N')
,[BILL_ID]
,[BILL_MONTH]
,[BILL_SRC_ID]
FROM db.dbo.table
GROUP BY [BILL_ID]
,[BILL_MONTH]
,[BILL_SRC_ID]
Having MAX(CUR_VALID = 'N') > 1
Why are you fiddling with 'N' when you are interested in 'Y'?
Use conditional aggregation to get the count of the value you are interested in.
SELECT
COUNT(*) AS number_of_all,
COUNT(CASE WHEN cur_valid = 'Y' THEN 1 END) AS number_of_y,
COUNT(CASE WHEN cur_valid = 'N' THEN 1 END) AS number_of_n,
bill_id,
bill_month,
bill_src_id
FROM db.dbo.table
GROUP BY bill_id, bill_month, bill_src_id;
Add a HAVING clause in order to get only valid
HAVING COUNT(CASE WHEN cur_valid = 'Y' THEN 1 END) = 1
or invalid
HAVING COUNT(CASE WHEN cur_valid = 'Y' THEN 1 END) <> 1
bills.
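The HAVING variants above can be checked with a small sketch, again using Python's sqlite3 module as a stand-in for SQL Server. The question's rows are all valid, so two hypothetical rows are added (id 4 with no 'Y', id 5 with two) to give the invalid filter something to find; the table is simplified to the id and CUR_VALID columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cur_valid (id INTEGER, cur_valid TEXT)")
rows = [(1, "N"), (1, "N"), (1, "Y"), (2, "N"), (2, "Y"), (3, "Y"),
        # Hypothetical rows, not in the question: id 4 has no 'Y', id 5 has two.
        (4, "N"), (5, "Y"), (5, "Y")]
conn.executemany("INSERT INTO cur_valid VALUES (?, ?)", rows)

# COUNT ignores NULLs, so the CASE counts only the 'Y' rows of each group.
invalid_sql = """
    SELECT id, COUNT(CASE WHEN cur_valid = 'Y' THEN 1 END) AS number_of_y
    FROM cur_valid
    GROUP BY id
    HAVING COUNT(CASE WHEN cur_valid = 'Y' THEN 1 END) <> 1
    ORDER BY id
"""
invalid = conn.execute(invalid_sql).fetchall()
print(invalid)  # [(4, 0), (5, 2)]
```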
The following query gives you the list of IDs for which your integrity condition is not met ("For every ID there needs to be one Y. If there is no Y or multiple Y there is something wrong"):
select distinct T1.id from CUR_VALID T1 where (select count(*) from CUR_VALID T2 where T2.id=T1.id and T2.CUR_VALID='Y')!=1
This query returns both the IDs that have no 'Y' value and the IDs that have more than one.
First count the Y values for each ID, then select the IDs whose count is not 1:
select * from (
select ID, SUM(case when CUR_VALID = 'Y' then 1 else 0 end) as CNT
from CUR_VALID
group by ID
) b where b.CNT <> 1
As I understand it, you want to get all the IDs for which your integrity check passes, where the check means that there is exactly one row with CUR_VALID equal to 'Y' in the CUR_VALID table.
This can be achieved with a GROUP BY clause:
select id from CUR_VALID
where CUR_VALID.CUR_VALID = 'Y'
group by id
having count(CUR_VALID.CUR_VALID) = 1;

SQL query to find a count with a condition and the total count in the same query

Here is a sample table I have:
Logs
user_id, session_id, search_query, action
1, 100, dog, A
1, 100, dog, B
2, 101, cat, A
3, 102, ball, A
3, 102, ball, B
3, 102, kite, A
4, 103, ball, A
5, 104, cat, A
where
miss = for the same user_id and the same session_id, if action A is not followed by action B, it's termed a miss.
Note: action B can happen only after action A has happened.
I am able to find the count of misses for each unique search_query across all users and sessions.
SELECT l1.search_query, count(l1.*) as misses
FROM logs l1
WHERE NOT EXISTS
(SELECT NULL FROM logs l2
WHERE l1.user_id = l2.user_id
AND l1.session_id = l2.session_id
AND l1.session_id != ''
AND l2.action = 'B'
AND l1.action = 'A')
AND l1.action='A'
AND l1.search_query != ''
GROUP BY l1.search_query
order by misses desc;
I am trying to find the value of miss_percentage=(number of misses/total number of rows)*100 for each unique search_query. I couldn't figure out how to find the count with a condition and count without that condition in the same query. Any help would be great.
expected output:
cat 100
kite 100
ball 50
One way to do it is to move the EXISTS into the count:
SELECT l1.search_query, count(case when NOT EXISTS
(SELECT 1 FROM logs l2
WHERE l1.user_id = l2.user_id
AND l1.session_id = l2.session_id
AND l1.search_query = l2.search_query
AND l2.action = 'B'
AND l1.action = 'A') then 1 else null end
)*100.0/count(*) as misses
FROM logs l1
WHERE l1.action='A'
AND l1.search_query != ''
GROUP BY l1.search_query
order by misses desc;
This produces the desired results, but it also returns zeros for queries with no misses. These rows can be removed with a HAVING clause or in post-processing.
Note that I also added the missing clause l1.search_query = l2.search_query; without it, kite was counted as a success because there is a row with action B in the same session.
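A sketch of the same NOT EXISTS idea, run through Python's sqlite3 module on the question's sample rows. It is a variant, not the answer's exact query: the miss flag is computed per row in a derived table and aggregated outside it, which keeps the subquery out of the aggregate (SQL Server, for instance, rejects a subquery inside COUNT). Like the answer, it correlates on search_query and returns 0 for dog.

```python
import sqlite3

# Sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (user_id INTEGER, session_id INTEGER, "
             "search_query TEXT, action TEXT)")
conn.executemany("INSERT INTO logs VALUES (?, ?, ?, ?)", [
    (1, 100, "dog", "A"), (1, 100, "dog", "B"), (2, 101, "cat", "A"),
    (3, 102, "ball", "A"), (3, 102, "ball", "B"), (3, 102, "kite", "A"),
    (4, 103, "ball", "A"), (5, 104, "cat", "A"),
])

# Inner query: flag each 'A' row with no 'B' for the same user, session and query.
# Outer query: misses / all 'A' rows per query; 100.0 avoids integer division.
miss_sql = """
    SELECT search_query,
           100.0 * SUM(is_miss) / COUNT(*) AS miss_percentage
    FROM (
        SELECT l1.search_query,
               CASE WHEN NOT EXISTS (
                   SELECT 1 FROM logs l2
                   WHERE l2.user_id = l1.user_id
                     AND l2.session_id = l1.session_id
                     AND l2.search_query = l1.search_query
                     AND l2.action = 'B'
               ) THEN 1 ELSE 0 END AS is_miss
        FROM logs l1
        WHERE l1.action = 'A' AND l1.search_query <> ''
    ) AS flagged
    GROUP BY search_query
    ORDER BY miss_percentage DESC, search_query
"""
misses = conn.execute(miss_sql).fetchall()
print(misses)  # [('cat', 100.0), ('kite', 100.0), ('ball', 50.0), ('dog', 0.0)]
```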
I think you just need to use CASE expressions here. If I have understood your problem correctly, the solution would be something like this:
WITH summary
AS (
SELECT user_id
,session_id
,search_query
,count(1) AS total_views
,sum(CASE
WHEN action = 'A'
THEN 1
ELSE 0
END) AS action_a
,sum(CASE
WHEN action = 'B'
THEN 1
ELSE 0
END) AS action_b
FROM logs l
GROUP BY user_id
,session_id
,search_query
)
SELECT search_query
,100.0 * sum(action_a - action_b) / sum(action_a) AS miss_percentage -- 100.0 avoids integer division
FROM summary
GROUP BY search_query;
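A quick check of the CTE approach in Python's sqlite3 module. It shows why the arithmetic matters: in engines where integer / integer truncates (SQLite, PostgreSQL and SQL Server among them), (sum(...) / sum(...)) * 100 yields 0 for ball, while multiplying by 100.0 first gives 50. The original_formula column is included only for comparison, and total_views is dropped because nothing uses it.

```python
import sqlite3

# Sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (user_id INTEGER, session_id INTEGER, "
             "search_query TEXT, action TEXT)")
conn.executemany("INSERT INTO logs VALUES (?, ?, ?, ?)", [
    (1, 100, "dog", "A"), (1, 100, "dog", "B"), (2, 101, "cat", "A"),
    (3, 102, "ball", "A"), (3, 102, "ball", "B"), (3, 102, "kite", "A"),
    (4, 103, "ball", "A"), (5, 104, "cat", "A"),
])

summary_sql = """
    WITH summary AS (
        -- One row per (user, session, query) with its A and B counts.
        SELECT user_id, session_id, search_query,
               SUM(CASE WHEN action = 'A' THEN 1 ELSE 0 END) AS action_a,
               SUM(CASE WHEN action = 'B' THEN 1 ELSE 0 END) AS action_b
        FROM logs
        GROUP BY user_id, session_id, search_query
    )
    SELECT search_query,
           (SUM(action_a - action_b) / SUM(action_a)) * 100 AS original_formula,
           100.0 * SUM(action_a - action_b) / SUM(action_a) AS miss_percentage
    FROM summary
    GROUP BY search_query
    ORDER BY search_query
"""
summary = conn.execute(summary_sql).fetchall()
print(summary)
```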
You can always create two queries and combine them into one with a join. Then you can do the calculations in the bridging (or joining) SQL statement.
In MS-SQL compatible SQL this would be:
SELECT ActiontypeA,countedA,isNull(countedB,0) as countedB,
(countedA-isNull(countedB,0))*100/CountedA as missed
FROM (SELECT search_query as actionTypeA, count(*) as countedA
FROM logs WHERE Action='A' GROUP BY search_query
) as TpA
LEFT JOIN
(SELECT search_query as actionTypeB, count(*) as countedB
FROM logs WHERE Action='B' GROUP BY search_query
) as TpB
ON TpA.ActionTypeA = TpB.ActiontypeB
The LEFT JOIN is required to select all activities (search_query) from the 'A' results, and join them to only those from the 'B' results where a B is available.
Since this is very basic SQL (and well optimized by SQL engines), I'd suggest avoiding WHERE EXISTS as much as possible. IsNull() is an MS-SQL function that replaces a NULL value with the given value (0 here) so that it can be used in a calculation.
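Finally, you could filter on
WHERE missed>0
to get the final result. Because missed is a column alias, this filter has to go in an outer query that wraps the one above; SQL Server does not allow an alias from the SELECT list in the WHERE clause of the same query.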
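A sketch of the two-aggregate LEFT JOIN in Python's sqlite3 module. SQLite has no ISNULL(), so the standard COALESCE() is swapped in, and the query is wrapped in a derived table so that missed > 0 can be applied. Note that this approach counts B rows per search_query across all sessions; it matches the expected output on this data but does not check that a B belongs to the same session as its A.

```python
import sqlite3

# Sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (user_id INTEGER, session_id INTEGER, "
             "search_query TEXT, action TEXT)")
conn.executemany("INSERT INTO logs VALUES (?, ?, ?, ?)", [
    (1, 100, "dog", "A"), (1, 100, "dog", "B"), (2, 101, "cat", "A"),
    (3, 102, "ball", "A"), (3, 102, "ball", "B"), (3, 102, "kite", "A"),
    (4, 103, "ball", "A"), (5, 104, "cat", "A"),
])

join_sql = """
    SELECT action_type_a, counted_a, counted_b, missed
    FROM (
        SELECT tpa.action_type_a,
               tpa.counted_a,
               COALESCE(tpb.counted_b, 0) AS counted_b,
               (tpa.counted_a - COALESCE(tpb.counted_b, 0)) * 100 / tpa.counted_a AS missed
        FROM (SELECT search_query AS action_type_a, COUNT(*) AS counted_a
              FROM logs WHERE action = 'A' GROUP BY search_query) AS tpa
        LEFT JOIN
             (SELECT search_query AS action_type_b, COUNT(*) AS counted_b
              FROM logs WHERE action = 'B' GROUP BY search_query) AS tpb
          ON tpa.action_type_a = tpb.action_type_b
    ) AS per_query
    WHERE missed > 0  -- allowed here because missed is a column of the derived table
    ORDER BY action_type_a
"""
joined = conn.execute(join_sql).fetchall()
print(joined)  # [('ball', 2, 1, 50), ('cat', 2, 0, 100), ('kite', 1, 0, 100)]
```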

Update column in set of records only if multiple rows exist with a given value in a different field?

My_Table would be something like this:
user_id shared_field bool_field
------- ------------ ----------
1 abc null
2 def null
3 ghi Y
4 ghi null
5 ghi null
6 abc Y
7 jkl null
If the bool_field changes for a user who shares the same shared_field with other users (such as user_id 3, 4, and 5 above), only that one user should have a 'Y'. The rest should have null values in the bool_field column. For example, if user_id 4 should now have the 'Y', I have to change user_id 4's bool_field to 'Y', and ensure that user_id 3 and 5 have bool_field values of null.
If the user doesn't share a shared_field value with anyone else, then that bool_field should be null (as in user_id 1 and 2 above).
Update: added a couple of lines to show that multiple user_ids could share a given shared_field (e.g., 1 and 6 both have 'abc'; 3, 4, and 5 all have 'ghi'). Only one 'abc' user should have a 'Y', only one 'ghi' user should have a 'Y', and so on, while the rest have null in their bool_field column; user_ids that don't share a shared_field value, such as user_ids 2 and 7, should all have null in their bool_field column. Clear as mud, right? ;)
This statement works:
UPDATE my_table
SET bool_field = (CASE
WHEN user_id = 4 THEN 'Y'
ELSE NULL
END)
WHERE shared_field = 'ghi'
AND (SELECT COUNT(shared_field)
FROM my_table
WHERE shared_field = 'ghi') > 1;
The question: is there some way that I can accomplish this same thing without knowing the shared_field value in advance? For example (and this doesn't work, of course) - Update: "of course" means I know this doesn't work because it is not correct Oracle syntax! The point is to give an idea of what I'm trying to do.
UPDATE my_table
SET bool_field = (CASE
WHEN user_id = 4 THEN 'Y'
ELSE NULL
END)
WHERE shared_field = (SELECT shared_field FROM my_table WHERE user_id = 4) as sharedVal
AND (SELECT COUNT(shared_field)
FROM my_table
WHERE shared_field = sharedVal) > 1;
Update: this is a regular SQL statement - I can't use a stored procedure.
First, rather than saying that something "doesn't work", it is generally helpful to tell us how it doesn't work. The query you posted, for example, appears to have syntax errors (as sharedVal is invalid because you can't assign an alias to a subquery in the WHERE clause). But it's not clear whether "doesn't work" means that you're getting syntax errors (which we can relatively easily debug with the error message) or that the query runs but doesn't do what you want (which I would expect the query to do if the syntax errors were corrected), in which case knowing how the query isn't doing what you want would be helpful.
I would expect something like
UPDATE my_table a
SET bool_field = (CASE WHEN user_id = 4
THEN 'Y'
ELSE NULL
END)
WHERE shared_field = (SELECT shared_field
FROM my_table b
WHERE b.user_id = 4)
AND EXISTS( SELECT 1
FROM my_table c
WHERE a.shared_field = c.shared_field
AND a.user_id != c.user_id )
to work assuming that user_id is the primary key.
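A sketch of the correlated UPDATE, run in Python's sqlite3 module rather than Oracle. Two adaptations are assumptions: the outer table is referenced by its name inside the EXISTS instead of through an alias, and the user id is bound as a parameter (the ? placeholders) instead of the literal 4, so the shared_field value never has to be known in advance. set_flag_owner is a hypothetical helper name.

```python
import sqlite3

# Sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (user_id INTEGER PRIMARY KEY, "
             "shared_field TEXT, bool_field TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", [
    (1, "abc", None), (2, "def", None), (3, "ghi", "Y"), (4, "ghi", None),
    (5, "ghi", None), (6, "abc", "Y"), (7, "jkl", None),
])

def set_flag_owner(conn, user_id):
    """Hypothetical helper: give user_id the 'Y' and clear it for the rest of its
    shared group, but only if that group has more than one member."""
    conn.execute("""
        UPDATE my_table
        SET bool_field = CASE WHEN user_id = ? THEN 'Y' ELSE NULL END
        WHERE shared_field = (SELECT b.shared_field FROM my_table b
                              WHERE b.user_id = ?)
          AND EXISTS (SELECT 1 FROM my_table c
                      WHERE c.shared_field = my_table.shared_field
                        AND c.user_id <> my_table.user_id)
    """, (user_id, user_id))

set_flag_owner(conn, 4)
flags = dict(conn.execute("SELECT user_id, bool_field FROM my_table"))
print(flags)  # {1: None, 2: None, 3: None, 4: 'Y', 5: None, 6: 'Y', 7: None}
```

Calling it for a user who shares nothing (user 7 here) leaves the table unchanged, because the EXISTS finds no other member of the group.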
It's been a while since I've worked with Oracle, so check that this query returns the list of records you want to update:
SELECT *
FROM my_table m
WHERE EXISTS (SELECT 1
FROM my_table
WHERE shared_field = m.shared_field
HAVING COUNT(shared_field) > 1);
If so, then this should work:
UPDATE my_table m
SET bool_field = (CASE
WHEN user_id = 4 THEN 'Y'
ELSE NULL
END)
WHERE EXISTS (SELECT 1
FROM my_table
WHERE shared_field = m.shared_field
HAVING COUNT(shared_field) > 1);
You could also try this WHERE clause:
WHERE shared_field IN (SELECT shared_field
FROM my_table
GROUP BY shared_field
HAVING COUNT(shared_field) > 1);
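Running the IN variant against the sample rows (Python's sqlite3 module as a stand-in for Oracle) shows which rows these filters match. They match every shared group, including 'abc' (users 1 and 6), so an UPDATE that uses only this filter would also clear user 6's 'Y'; adding the shared_field = (SELECT shared_field ... WHERE user_id = 4) condition narrows it to one group.

```python
import sqlite3

# Sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (user_id INTEGER PRIMARY KEY, "
             "shared_field TEXT, bool_field TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", [
    (1, "abc", None), (2, "def", None), (3, "ghi", "Y"), (4, "ghi", None),
    (5, "ghi", None), (6, "abc", "Y"), (7, "jkl", None),
])

# Rows whose shared_field value appears on more than one row.
in_sql = """
    SELECT user_id FROM my_table
    WHERE shared_field IN (SELECT shared_field FROM my_table
                           GROUP BY shared_field
                           HAVING COUNT(shared_field) > 1)
    ORDER BY user_id
"""
matched = [row[0] for row in conn.execute(in_sql)]
print(matched)  # [1, 3, 4, 5, 6]
```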

How do I determine if a group of data exists in a table, given the data that should appear in the group's rows?

I am writing data to a table and allocating a "group-id" for each batch of data that is written. To illustrate, consider the following table.
GroupId Value
------- -----
1 a
1 b
1 c
2 a
2 b
3 a
3 b
3 c
3 d
In this example, there are three groups of data, each with similar but varying values.
How do I query this table to find a group that contains a given set of values? For instance, if I query for (a,b,c) the result should be group 1. Similarly, a query for (b,a) should result in group 2, and a query for (a, b, c, e) should result in the empty set.
I can write a stored procedure that performs the following steps:
select distinct GroupId from Groups -- and store locally
for each distinct GroupId: perform a set-difference (except) between the input and table values (for the group), and vice versa
return the GroupId if both set-difference operations produced empty sets
This seems a bit excessive, and I am hoping to leverage some other SQL features to simplify it. Is there a simpler way to perform a set comparison in this context, or to select the group ID that contains exactly the input values?
This is a set-within-sets query. I like to solve it using group by and having:
select groupid
from GroupValues gv
group by groupid
having sum(case when value = 'a' then 1 else 0 end) > 0 and
sum(case when value = 'b' then 1 else 0 end) > 0 and
sum(case when value = 'c' then 1 else 0 end) > 0 and
sum(case when value not in ('a', 'b', 'c') then 1 else 0 end) = 0;
The first three conditions in the HAVING clause check that each element exists. The last condition checks that there are no other values. This method is quite flexible and accommodates various inclusion and exclusion conditions on the values you are looking for.
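The HAVING approach generalizes by generating one "contains" condition per wanted value plus one "no other values" condition. A minimal sketch in Python's sqlite3 module; find_exact_group is a hypothetical helper, and the values are bound as parameters rather than concatenated into the SQL.

```python
import sqlite3

# Sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GroupValues (groupid INTEGER, value TEXT)")
conn.executemany("INSERT INTO GroupValues VALUES (?, ?)", [
    (1, "a"), (1, "b"), (1, "c"), (2, "a"), (2, "b"),
    (3, "a"), (3, "b"), (3, "c"), (3, "d"),
])

def find_exact_group(conn, values):
    """Hypothetical helper: group ids whose values are exactly `values`."""
    # One "contains this value" condition per wanted value...
    contains = ["SUM(CASE WHEN value = ? THEN 1 ELSE 0 END) > 0"] * len(values)
    # ...plus one "no other values" condition.
    placeholders = ", ".join("?" * len(values))
    no_others = f"SUM(CASE WHEN value NOT IN ({placeholders}) THEN 1 ELSE 0 END) = 0"
    sql = ("SELECT groupid FROM GroupValues GROUP BY groupid HAVING "
           + " AND ".join(contains + [no_others]) + " ORDER BY groupid")
    params = list(values) + list(values)
    return [row[0] for row in conn.execute(sql, params)]

print(find_exact_group(conn, ["a", "b", "c"]))  # [1]
print(find_exact_group(conn, ["b", "a"]))       # [2]
```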
EDIT:
If you want to pass in a list, you can use:
with thelist as (
select 'a' as value union all
select 'b' union all
select 'c'
)
select groupid
from GroupValues gv left outer join
thelist
on gv.value = thelist.value
group by groupid
having count(distinct gv.value) = (select count(*) from thelist) and
count(distinct (case when gv.value = thelist.value then gv.value end)) = count(distinct gv.value);
Here the having clause counts the number of matching values and makes sure that this is the same size as the list.
EDIT:
The query failed to compile because of a missing table alias; updated with the correct alias.
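A sketch of the list-based version in Python's sqlite3 module, with the CTE built from bound parameters. find_group_from_list is a hypothetical helper, and it assumes the input list has no duplicate values, since the HAVING clause compares the number of distinct group values against COUNT(*) of the list.

```python
import sqlite3

# Sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GroupValues (groupid INTEGER, value TEXT)")
conn.executemany("INSERT INTO GroupValues VALUES (?, ?)", [
    (1, "a"), (1, "b"), (1, "c"), (2, "a"), (2, "b"),
    (3, "a"), (3, "b"), (3, "c"), (3, "d"),
])

def find_group_from_list(conn, values):
    """Hypothetical helper; `values` must not contain duplicates."""
    # Builds "SELECT ? AS value UNION ALL SELECT ? AS value ..." for the list.
    thelist = " UNION ALL ".join(["SELECT ? AS value"] * len(values))
    sql = f"""
        WITH thelist AS ({thelist})
        SELECT gv.groupid
        FROM GroupValues gv
        LEFT OUTER JOIN thelist ON gv.value = thelist.value
        GROUP BY gv.groupid
        -- same size as the list, and every group value matched a list value
        HAVING COUNT(DISTINCT gv.value) = (SELECT COUNT(*) FROM thelist)
           AND COUNT(DISTINCT CASE WHEN gv.value = thelist.value THEN gv.value END)
               = COUNT(DISTINCT gv.value)
        ORDER BY gv.groupid
    """
    return [row[0] for row in conn.execute(sql, values)]

print(find_group_from_list(conn, ["a", "b", "c"]))  # [1]
```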
This is kind of ugly, but it works. On larger datasets I'm not sure what performance would look like, but the nested instances of #GroupValues key off GroupID in the main table so I think as long as you have a good index on GroupID it probably wouldn't be too horrible.
If Object_ID('tempdb..#GroupValues') Is Not Null Drop Table #GroupValues
Create Table #GroupValues (GroupID Int, Val Varchar(10));
Insert #GroupValues (GroupID, Val)
Values (1,'a'),(1,'b'),(1,'c'),(2,'a'),(2,'b'),(3,'a'),(3,'b'),(3,'c'),(3,'d');
If Object_ID('tempdb..#FindValues') Is Not Null Drop Table #FindValues
Create Table #FindValues (Val Varchar(10));
Insert #FindValues (Val)
Values ('a'),('b'),('c');
Select Distinct gv.GroupID
From (Select Distinct GroupID
From #GroupValues) gv
Where Not Exists (Select 1
From #FindValues fv2
Where Not Exists (Select 1
From #GroupValues gv2
Where gv.GroupID = gv2.GroupID
And fv2.Val = gv2.Val))
And Not Exists (Select 1
From #GroupValues gv3
Where gv3.GroupID = gv.GroupID
And Not Exists (Select 1
From #FindValues fv3
Where gv3.Val = fv3.Val))
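The same double NOT EXISTS (relational division) can be tried in Python's sqlite3 module. Ordinary tables stand in for the #temp tables, and exact_groups is a hypothetical helper that reloads FindValues for each search.

```python
import sqlite3

# Sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GroupValues (groupid INTEGER, val TEXT)")
conn.execute("CREATE TABLE FindValues (val TEXT)")
conn.executemany("INSERT INTO GroupValues VALUES (?, ?)", [
    (1, "a"), (1, "b"), (1, "c"), (2, "a"), (2, "b"),
    (3, "a"), (3, "b"), (3, "c"), (3, "d"),
])

def exact_groups(conn, values):
    """Hypothetical helper: groups containing exactly `values`."""
    conn.execute("DELETE FROM FindValues")
    conn.executemany("INSERT INTO FindValues (val) VALUES (?)", [(v,) for v in values])
    sql = """
        SELECT gv.groupid
        FROM (SELECT DISTINCT groupid FROM GroupValues) gv
        -- no wanted value is missing from the group...
        WHERE NOT EXISTS (SELECT 1 FROM FindValues fv2
                          WHERE NOT EXISTS (SELECT 1 FROM GroupValues gv2
                                            WHERE gv2.groupid = gv.groupid
                                              AND gv2.val = fv2.val))
        -- ...and no group value is missing from the wanted list.
          AND NOT EXISTS (SELECT 1 FROM GroupValues gv3
                          WHERE gv3.groupid = gv.groupid
                            AND NOT EXISTS (SELECT 1 FROM FindValues fv3
                                            WHERE fv3.val = gv3.val))
        ORDER BY gv.groupid
    """
    return [row[0] for row in conn.execute(sql)]

print(exact_groups(conn, ["a", "b", "c"]))  # [1]
```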

sql ORDER BY multiple values in specific order?

OK, I have a table with an indexed key and a non-indexed field.
I need to find all records with a certain value and return the row.
I would like to know if I can order by multiple values.
Example:
id x_field
-- -----
123 a
124 a
125 a
126 b
127 f
128 b
129 a
130 x
131 x
132 b
133 p
134 p
135 i
pseudo: would like the results to be ordered like this, where ORDER BY x_field = 'f', 'p', 'i', 'a'
SELECT *
FROM table
WHERE id NOT IN (126)
ORDER BY x_field 'f', 'p', 'i', 'a'
So the results would be:
id x_field
-- -----
127 f
133 p
134 p
135 i
123 a
124 a
125 a
129 a
The syntax is valid but when I execute the query it never returns any results, even if I limit it to 1 record. Is there another way to go about this?
Think of x_field as test results; I need to validate all the records that meet the condition. I wanted to order the test results with the failed values first and the passed values after, so that I could validate the failed values first and then the passed values using ORDER BY.
What I can't do:
GROUP BY, as I need to return the specific record values
WHERE x_field IN('f', 'p', 'i', 'a'), I need all the values as I'm trying to use one query for several validation tests. And x_field values are not in DESC/ASC order
After writing this question I'm starting to think that I need to rethink this, LOL!
...
WHERE
x_field IN ('f', 'p', 'i', 'a') ...
ORDER BY
CASE x_field
WHEN 'f' THEN 1
WHEN 'p' THEN 2
WHEN 'i' THEN 3
WHEN 'a' THEN 4
ELSE 5 --needed only if there is no IN clause above, e.g. when x_field = 'b'
END, id
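A sketch of the CASE ordering in Python's sqlite3 module, using the question's rows in a table given the hypothetical name test_results (table is a reserved word). It drops the IN filter, so the ELSE 5 branch is what pushes b and x to the end; id breaks ties.

```python
import sqlite3

# Rows from the question, in a hypothetically named table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_results (id INTEGER PRIMARY KEY, x_field TEXT)")
conn.executemany("INSERT INTO test_results VALUES (?, ?)", [
    (123, "a"), (124, "a"), (125, "a"), (126, "b"), (127, "f"), (128, "b"),
    (129, "a"), (130, "x"), (131, "x"), (132, "b"), (133, "p"), (134, "p"),
    (135, "i"),
])

case_sql = """
    SELECT id
    FROM test_results
    WHERE id NOT IN (126)
    ORDER BY CASE x_field
                 WHEN 'f' THEN 1
                 WHEN 'p' THEN 2
                 WHEN 'i' THEN 3
                 WHEN 'a' THEN 4
                 ELSE 5          -- every other code sorts last
             END,
             id
"""
case_order = [row[0] for row in conn.execute(case_sql)]
print(case_order)
```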
Try:
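ORDER BY x_field='f' DESC, x_field='p' DESC, x_field='i' DESC, x_field='a' DESC
You were on the right track, but by putting x_field only on the 'f' value, the other three were treated as constants and not compared against anything in the dataset. Each comparison evaluates to false (0) or true (1), and false sorts first, so DESC is needed to bring the matching rows to the top.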
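The comparison-based ORDER BY can be checked the same way in Python's sqlite3 module. In SQLite (as in MySQL and PostgreSQL) a comparison evaluates to false/0 or true/1 and false sorts first, so each term needs DESC to bring matching rows to the top; the ascending version is included for contrast. test_results is a hypothetical table name.

```python
import sqlite3

# Rows from the question, in a hypothetically named table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_results (id INTEGER PRIMARY KEY, x_field TEXT)")
conn.executemany("INSERT INTO test_results VALUES (?, ?)", [
    (123, "a"), (124, "a"), (125, "a"), (126, "b"), (127, "f"), (128, "b"),
    (129, "a"), (130, "x"), (131, "x"), (132, "b"), (133, "p"), (134, "p"),
    (135, "i"),
])

def ordered_ids(direction):
    # Each comparison is 0 or 1; `direction` decides whether matches go first or last.
    terms = ", ".join(f"x_field = '{code}' {direction}" for code in "fpia")
    sql = f"SELECT id FROM test_results WHERE id <> 126 ORDER BY {terms}, id"
    return [row[0] for row in conn.execute(sql)]

desc_order = ordered_ids("DESC")
asc_order = ordered_ids("ASC")
print(desc_order)  # f, p, i, a first, then the other codes
print(asc_order)   # the groups come out reversed: f ends up last
```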
You can use a LEFT JOIN with VALUES ('f',1),('p',2),('i',3),('a',4) and use the second column in your ORDER BY expression. Postgres can then use a hash join, which is likely to be much faster than a huge CASE if you have a lot of values. It is also easier to autogenerate.
If this ordering information is fixed, then it should have its own table.
I found a much cleaner solution for this:
ORDER BY array_position(ARRAY['f', 'p', 'i', 'a']::varchar[], x_field)
Note: array_position needs Postgres v9.5 or higher.
Use a case switch to translate the codes into numbers that can be sorted:
ORDER BY
case x_field
when 'f' then 1
when 'p' then 2
when 'i' then 3
when 'a' then 4
else 5
end
The CASE and ORDER BY suggestions should all work, but I'm going to suggest a horse of a different color. Assuming that there are only a reasonable number of values for x_field and you already know what they are, create an enumerated type with F, P, I, and A as the values, in that order (plus whatever other possible values apply). Enums sort in the order in which their values are listed in the CREATE statement. Also, you can use meaningful value names (your real application probably does, and you have just masked them for confidentiality) without wasted space, since only the ordinal position is stored.
For someone who is new to ORDER BY with CASE, this may be useful:
ORDER BY
CASE WHEN GRADE = 'A' THEN 0
WHEN GRADE = 'B' THEN 1
ELSE 2 END
bobflux's answer is great. I would like to extend it by adding a complete query that uses the proposed approach.
select tt.id, tt.x_field
from target_table as tt
-- Here we join our target_table with order_table to specify custom ordering.
left join
(values ('f', 1), ('p', 2), ('i', 3), ('a', 4)) as order_table (x_field, order_num)
on order_table.x_field = tt.x_field
order by
order_table.order_num, -- Here we order values by our custom order.
tt.x_field; -- Other values can be ordered alphabetically, for example.
Since I don't have enough reputation to write this as a comment, I added it as a new answer.
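A sketch of the VALUES join in Python's sqlite3 module. SQLite does not accept a column list on a derived-table alias such as order_table (x_field, order_num), so the ordering list is declared as a CTE instead. Unlisted values get a NULL order_num; PostgreSQL sorts NULLs last in ascending order but SQLite sorts them first, so order_num IS NULL is added to push them to the end explicitly, with id as a final tie-breaker.

```python
import sqlite3

# Rows from the question, in the answer's target_table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target_table (id INTEGER PRIMARY KEY, x_field TEXT)")
conn.executemany("INSERT INTO target_table VALUES (?, ?)", [
    (123, "a"), (124, "a"), (125, "a"), (126, "b"), (127, "f"), (128, "b"),
    (129, "a"), (130, "x"), (131, "x"), (132, "b"), (133, "p"), (134, "p"),
    (135, "i"),
])

values_sql = """
    WITH order_table(x_field, order_num) AS (
        VALUES ('f', 1), ('p', 2), ('i', 3), ('a', 4)
    )
    SELECT tt.id, tt.x_field
    FROM target_table AS tt
    LEFT JOIN order_table ON order_table.x_field = tt.x_field
    ORDER BY order_table.order_num IS NULL,  -- unlisted codes last
             order_table.order_num,          -- custom order
             tt.x_field,                     -- then alphabetically
             tt.id
"""
values_order = [row[0] for row in conn.execute(values_sql)]
print(values_order)
```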
You can add ASC or DESC to each expression in the ORDER BY clause:
ORDER BY x_field='A' ASC, x_field='I' DESC, x_field='P' DESC, x_field='F' ASC
This puts I first, P second, F second to last and A last.
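The mixed ASC/DESC claim can be checked with Python's sqlite3 module. The sketch uses lowercase codes because SQLite's = is case-sensitive on text by default, restricts the rows to the four codes (as the claim assumes), and uses the hypothetical table name test_results.

```python
import sqlite3

# Rows from the question, in a hypothetically named table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_results (id INTEGER PRIMARY KEY, x_field TEXT)")
conn.executemany("INSERT INTO test_results VALUES (?, ?)", [
    (123, "a"), (124, "a"), (125, "a"), (126, "b"), (127, "f"), (128, "b"),
    (129, "a"), (130, "x"), (131, "x"), (132, "b"), (133, "p"), (134, "p"),
    (135, "i"),
])

mixed_sql = """
    SELECT id
    FROM test_results
    WHERE x_field IN ('f', 'p', 'i', 'a')
    ORDER BY x_field = 'a' ASC,   -- 'a' rows (1) after everything else (0)
             x_field = 'i' DESC,  -- then 'i' first among the rest
             x_field = 'p' DESC,  -- then 'p'
             x_field = 'f' ASC,   -- 'f' after any remaining codes
             id
"""
mixed_order = [row[0] for row in conn.execute(mixed_sql)]
print(mixed_order)  # [135, 133, 134, 127, 123, 124, 125, 129]
```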
You can order by a selected column or by other expressions.
Here is an example of ordering by the result of a CASE expression:
SELECT col1
, col2
FROM tbl_Bill
WHERE col1 = 0
ORDER BY -- order by case-statement
CASE WHEN tbl_Bill.IsGen = 0 THEN 0
WHEN tbl_Bill.IsGen = 1 THEN 1
ELSE 2 END
The result will be a list starting with the "IsGen = 0" rows, followed by the "IsGen = 1" rows, with all other rows at the end.
You could add more order-parameters at the end:
SELECT col1
, col2
FROM tbl_Bill
WHERE col1 = 0
ORDER BY -- order by case-statement
CASE WHEN tbl_Bill.IsGen = 0 THEN 0
WHEN tbl_Bill.IsGen = 1 THEN 1
ELSE 2 END,
col1,
col2
If you are using MySQL 4.0 or later, consider using FIELD(). It returns the index position of the first argument within the list of remaining arguments (0 if it is not found, so unlisted values sort first), and it is case-sensitive.
ORDER BY FIELD(x_field, 'f', 'p', 'i', 'a')
You can use position(substring in string) in ORDER BY to order by a fixed sequence of codes.
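SQLite has no position(... in ...) syntax; its instr(string, substring) plays the same role, and both return 0 when the value is not found. A sketch with Python's sqlite3 module, assuming single-character codes and the hypothetical table name test_results; the leading instr(...) = 0 term moves unlisted codes (position 0) to the end.

```python
import sqlite3

# Rows from the question, in a hypothetically named table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_results (id INTEGER PRIMARY KEY, x_field TEXT)")
conn.executemany("INSERT INTO test_results VALUES (?, ?)", [
    (123, "a"), (124, "a"), (125, "a"), (126, "b"), (127, "f"), (128, "b"),
    (129, "a"), (130, "x"), (131, "x"), (132, "b"), (133, "p"), (134, "p"),
    (135, "i"),
])

# instr('fpia', x_field) is 1 for f, 2 for p, 3 for i, 4 for a, 0 otherwise.
instr_sql = """
    SELECT id
    FROM test_results
    WHERE id <> 126
    ORDER BY instr('fpia', x_field) = 0,
             instr('fpia', x_field),
             id
"""
instr_order = [row[0] for row in conn.execute(instr_sql)]
print(instr_order)
```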