What SQL query can answer "Do these rows exist?" - sql

Here is the code to create the database:
CREATE TABLE foo (
id TEXT PRIMARY KEY,
value TEXT
);
INSERT INTO foo VALUES(1, 10), (2, 20), (3, 30), (5, 50);
Now I have a set of rows and I want back 0 if the row doesnt exist, 1 if the row exists but is not the same, and 2 if the row exists exactly.
So the result of the query on (1, 11), (2, 20), (4, 40) should be 1, 2, 0.
The reason I want this is to know what query to use to insert the data into the database. If it is a 0, I do a normal insert, if it is a 1 I do an update, and if it is a 2 I skip the row. I know that INSERT OR REPLACE will result in nearly the same rows, but the problem is that it doesnt trigger the correct triggers (it will always trigger an on insert trigger instead of an update trigger or no trigger if the row exists exactly).
Also, I want to do one query with all of the rows, not one query per row.

The idea is to use an aggregation query. Count the number of times that the id matches. If there are none, then return 0. Then check the value to distinguish between 1 and 2:
select (case when max(id = 1) = 0 then 0
when max(id = 1 and value = 11) = 0 then 1
else 2
end) as flag
from table t;
You need to plug the values into the query.
EDIT:
If you want to match a bunch of rows, do something like this:
select testvalue.id,
(case when max(t.id = testvalue.id) = 0 then 0
when max(t.id = testvalue.id and t.value = testvalue.value) = 0 then 1
else 2
end) as flag
from table t cross join
(select 1 as id 10 as value union all
select 2, 20 union all
select 4, 40
) as testvalues
group by testvalues.id;

You can use the EXISTS argument in Transact-SQL. MSDN Documentation.
This returns true if a row exists. You can then use an If statement within that to check if the row is the same or different, and if true, use the RETURN argument with your specified values. MSDN Documentation.

This is based off of Gordon Linoff's answer so upvote him. I just wanted to share what I actually went with:
select testvalues.id,
(case when t.id != testvalues.id then 0
when t.value != testvalues.value then 1
else 2
end) as flag
from (select 1 as id, 11 as entity union all
select 2, 20 union all
select 4, 40
) as testvalues
LEFT OUTER JOIN foo t on testvalues.id=t.id
This prevents the full memory usage of a cross join and group by clauses.

Related

SQL Update all rows for one column with a list of values

How can I set all rows of a column by manually typing the values?
My table has 3 rows, I want the column named mycolumn to have 3 values a, b and c (currently those values are NULL):
update mytable set mycolumn = ('a','b','c')
ORA-00907 missing right parenthesis
EDIT: my table is very simple, I have one column ID INT NOT NULL with values 1, 2, 3 and another column mycolumn with all NULL values and I want those values to become 'a' where ID = 1, 'b' where ID=2 etc.
EDIT2: I might have a huge amount of rows, so I want to avoid typing every single ID value where to replace mycolumn. Isn't it possible to match the ID values of 1 to 3 to the values 'a', 'b', 'c' in an automatic way, something like match(ID, ('a','b','c')) perhaps
I just want to replace all values of mycolumn by increasing order of ID. ID being strictly equivalent to what I call a row number in a matrix
EDIT3: I'd like a solution which would work in a general case with all sorts of values, not only the letters of the alphabet given here for simplicity. What if for example my values to replace in mycolumn are ('oefaihfoiashfe', 'fiaohoawdihoiwahopah', 'aefohdfaohdao')? However the ID row numbers will always be a sequence from 1 to N by 1.
Obviously, you should do this in a single update. Like this:
update mytable
set mycolumn = case id when 1 then 'a' when 2 then 'b' when 3 then 'c' end
;
More compact (but also more cryptic, and only works in Oracle, while case expressions are in the SQL standard):
update mytable
set mycolumn = decode(id, 1, 'a', 2, 'b', 3, 'c')
;
Note - this only works if there really are only three rows. If you have many more rows, make sure to add where id in (1, 2, 3) at the end. Otherwise all the OTHER values (in the other rows) will be updated to null!
You can try an update like the one below. This will update 1 > a, 2 > b, 3 > c, 4 > d, etc. When you reach ID 27, since there are no more letters, it will begin at a again and continue down the alphabet.
UPDATE mytable
SET mycolumn = CASE MOD (id, 26)
WHEN 0 THEN 'z'
ELSE CHR (MOD (id, 26) + 96)
END;
Update
To update based on any list of values, you can try an update statement like the one below. If you add a 4th item to the comma delimited list, ID 4 in mytable will be set to whatever you specified as the 4th value.
UPDATE mytable
SET mycolumn =
(SELECT COLUMN_VALUE
FROM (SELECT ROWNUM AS row_num, t.COLUMN_VALUE
FROM TABLE (
sys.odcivarchar2list ('oefaihfoiashfe',
'fiaohoawdihoiwahopah',
'aefohdfaohdao')) t)
WHERE row_num = id);
Hmmm . . . A row can only have one value. Perhaps something like this to assign random values:
update mytable
set mycolumn = (case floor(dbms_random.random * 3)
case 0 then 'a' case 1 then 'b' else 'c'
end)
if you want the 3 rows to have different values a, b and c then you will have to write 3 update statements.
update mytable set mycolumn = 'a' where id = 1;
update mytable set mycolumn = 'b' where id = 2;
update mytable set mycolumn = 'c' where id = 3;

Create a new table with columns with case statements and max function

I have some problems in creating a new table from an old one with new columns defined by case statements.
I need to add to a new table three columns, where I compute the maximum based on different conditions. Specifically,
if time is between 1 and 3, I define a variable max_var_1_3 as max((-1)*var),
if time is between 1 and 6, I define a variable max_var_1_6 as max((-1)*var),
if time is between 1 and 12, I define a variable max_var_1_12 as max((-1)*var),
The max function needs to take the maximum value of the variable var in the window between 1 and 3, 1 and 6, 1 and 12 respectively.
I wrote this
create table new as(
select t1.*,
(case when time between 1 and 3 then MAX((-1)*var)
else var
end) as max_var_1_3,
(case when time between 1 and 6 then MAX((-1)*var)
else var
end) as max_var_1_6,
(case when time between 1 and 12 then MAX((-1)*var)
else var
end) as max_var_1_12
from old_table t1
group by time
) with data primary index time
but unfortunately it is not working. The old_table has already some columns, and I would like to import all of them and then compare the old table with the new one. I got an error that says that should be something between ) and ',', but I cannot understand what. I am using Teradata SQL.
Could you please help me?
Many thanks
The problem is that you have GROUP BY time in your query while trying to return all the other values with your SELECT t1.*. To make your query work as-is, you'd need to add each column from t1.* to your GROUP BY clause.
If you want to find the MAX value within the different time ranges AND also return all the rows, then you can use a window function. Something like this:
CREATE TABLE new AS (
SELECT
t1.*,
CASE
WHEN t1.time BETWEEN 1 AND 3 THEN (
MAX(CASE WHEN t1.time BETWEEN 1 AND 3 THEN (-1 * t1.var) ELSE NULL END) OVER()
)
ELSE t1.var
END AS max_var_1_3,
CASE
WHEN t1.time BETWEEN 1 AND 6 THEN (
MAX(CASE WHEN t1.time BETWEEN 1 AND 6 THEN (-1 * t1.var) ELSE NULL END) OVER()
)
ELSE t1.var
END AS max_var_1_6,
CASE
WHEN t1.time BETWEEN 1 AND 12 THEN (
MAX(CASE WHEN t1.time BETWEEN 1 AND 12 THEN (-1 * t1.var) ELSE NULL END) OVER()
)
ELSE t1.var
END AS max_var_1_12,
FROM old_table t1
) WITH DATA PRIMARY INDEX (time)
;
Here's the logic:
check if a row falls in the range
if it does, return the desired MAX value for rows in that range
otherwise, just return that given row's default value (var)
return all rows along with the three new columns
If you have performance issues, you could also move the max_var calculations to a CTE, since they only need to be calculated once. Also to avoid confusion, you may want to explicitly specify the values in your SELECT instead of using t1.*.
I don't have a TD system to test, but try it out and see if that works.
I cannot help with the CREATE TABLE AS, but the query you want is this:
SELECT
t.*,
(SELECT MAX(-1 * var) FROM old_table WHERE time BETWEEN 1 AND 3) AS max_var_1_3,
(SELECT MAX(-1 * var) FROM old_table WHERE time BETWEEN 1 AND 6) AS max_var_1_6,
(SELECT MAX(-1 * var) FROM old_table WHERE time BETWEEN 1 AND 12) AS max_var_1_12
FROM old_table t;

SQL selecting values with Null and 0 values from range null, 0 & 1

One of the columns returned by a T-SQL query has one of these three values
NULL, 0, 1
I am trying to filter out the 1 values, but when I use these clause:
Query:
select foo
from dbo.table1
where ..
Clause:
where foo <> 1: returns only the data with 0 values in the column
Where foo is null: returns only the null values in the column
where foo in (Null, 0): returns only the data with 0 values in the column
What is the correct method to filter out the data?
Since there are only 3 values, this is enough:
where isnull(foo, 0) <> 1
or:
where isnull(foo, 0) = 0
In SQL Server, you can do:
where foo <> 1 or foo is null
Or, if you know that there really are only three values, something like:
where coalesce(foo, -1) <> 1
SQL Server does not have a NULL-safe comparison. Standard SQL supports:
where foo is distinct from 1
but that is not available in SQL Server.

Sql query to find count with a difference condition and total count in the same query

Here is a sample table I have
Logs
user_id, session_id, search_query, action
1, 100, dog, A
1, 100, dog, B
2, 101, cat, A
3, 102, ball, A
3, 102, ball, B
3, 102, kite, A
4, 103, ball, A
5, 104, cat, A
where
miss = for the same user_id and same session id , if action A is not followed by action B its termed a miss.
Note: action B can happen only after action A has happened.
I am able to find the count of misses for each unique search_query across all users and sessions.
SELECT l1.search_query, count(l1.*) as misses
FROM logs l1
WHERE NOT EXISTS
(SELECT NULL FROM logs l2
WHERE l1.user_id = l2.user_id
AND l1.session_id = l2.session_id
AND l1.session_id != ''
AND l2.action = 'B'
AND l1.action = 'A')
AND l1.action='A'
AND l1.search_query != ''
GROUP BY v1.search_query
order by misses desc;
I am trying to find the value of miss_percentage=(number of misses/total number of rows)*100 for each unique search_query. I couldn't figure out how to find the count with a condition and count without that condition in the same query. Any help would be great.
expected output:
cat 100
kite 100
ball 50
One way to do it is to move the EXISTS into the count
SELECT l1.search_query, count(case when NOT EXISTS
(SELECT 1 FROM logs l2
WHERE l1.user_id = l2.user_id
AND l1.session_id = l2.session_id
AND l1.search_query = l2.search_query
AND l2.action = 'B'
AND l1.action = 'A') then 1 else null end
)*100.0/count(*) as misses
FROM logs l1
WHERE l1.action='A'
AND l1.search_query != ''
GROUP BY l1.search_query
order by misses desc;
This produces the desired results, but also zeros if no misses were found. This can be removed with a HAVING clause, or postprocessing.
Note I also added the clause l1.search_query = l2.search_query that was missing, since otherwise it was counting kite as succeeded, since there is a row with B in the same session.
I think you just need to use case statements here. If I have understood your problem correctly .. then the solution would be something like this -
WITH summary
AS (
SELECT user_id
,session_id
,search_query
,count(1) AS total_views
,sum(CASE
WHEN action = 'A'
THEN 1
ELSE 0
END) AS action_a
,sum(CASE
WHEN action = 'B'
THEN 1
ELSE 0
END) AS action_b
FROM logs l
GROUP BY user_id
,session_id
,search_query
)
SELECT search_query
,(sum(action_a - action_b) / sum(action_a)) * 100 AS miss_percentage
FROM summary
GROUP BY search_query;
You can allways create two queries, and combine them into one with a join. Then you can do the calculations in the bridging (or joining) SQL statement.
In MS-SQL compatible SQL this would be:
SELECT ActiontypeA,countedA,isNull(countedB,0) as countedB,
(countedA-isNull(countedB,0))*100/CountedA as missed
FROM (SELECT search_query as actionTypeA, count(*) as countedA
FROM logs WHERE Action='A' GROUP BY actionType
) as TpA
LEFT JOIN
(SELECT search_query as actionTypeB, count(*) as countedB
FROM logs WHERE Action='B' GROUP BY actionType
) as TpB
ON TpA.ActionTypeA = TpB.ActiontypeB
The LEFT JOIN is required to select all activities (search_query) from the 'A' results, and join them to only those from the 'B' results where a B is available.
Since this is very basic SQL (and well optimized by SQL engines) I'd suggest to prevent WHERE EXISTS as much as possible. The IsNull() function is an MS-SQL function to force a NULL value into the int(0) value which can be used in a calculation.
Finally you could filter on
WHERE missed>0
to get the final result.

How do I determine if a group of data exists in a table, given the data that should appear in the group's rows?

I am writing data to a table and allocating a "group-id" for each batch of data that is written. To illustrate, consider the following table.
GroupId Value
------- -----
1 a
1 b
1 c
2 a
2 b
3 a
3 b
3 c
3 d
In this example, there are three groups of data, each with similar but varying values.
How do I query this table to find a group that contains a given set of values? For instance, if I query for (a,b,c) the result should be group 1. Similarly, a query for (b,a) should result in group 2, and a query for (a, b, c, e) should result in the empty set.
I can write a stored procedure that performs the following steps:
select distinct GroupId from Groups -- and store locally
for each distinct GroupId: perform a set-difference (except) between the input and table values (for the group), and vice versa
return the GroupId if both set-difference operations produced empty sets
This seems a bit excessive, and I hoping to leverage some other commands in SQL to simplify. Is there a simpler way to perform a set-comparison in this context, or to select the group ID that contains the exact input values for the query?
This is a set-within-sets query. I like to solve it using group by and having:
select groupid
from GroupValues gv
group by groupid
having sum(case when value = 'a' then 1 else 0 end) > 0 and
sum(case when value = 'b' then 1 else 0 end) > 0 and
sum(case when value = 'c' then 1 else 0 end) > 0 and
sum(case when value not in ('a', 'b', 'c') then 1 else - end) = 0;
The first three conditions in the having clause check that each elements exists. The last condition checks that there are no other values. This method is quite flexible, for various exclusions and inclusion conditions on the values you are looking for.
EDIT:
If you want to pass in a list, you can use:
with thelist as (
select 'a' as value union all
select 'b' union all
select 'c'
)
select groupid
from GroupValues gv left outer join
thelist
on gv.value = thelist.value
group by groupid
having count(distinct gv.value) = (select count(*) from thelist) and
count(distinct (case when gv.value = thelist.value then gv.value end)) = count(distinct gv.value);
Here the having clause counts the number of matching values and makes sure that this is the same size as the list.
EDIT:
query compile failed because missing the table alias. updated with right table alias.
This is kind of ugly, but it works. On larger datasets I'm not sure what performance would look like, but the nested instances of #GroupValues key off GroupID in the main table so I think as long as you have a good index on GroupID it probably wouldn't be too horrible.
If Object_ID('tempdb..#GroupValues') Is Not Null Drop Table #GroupValues
Create Table #GroupValues (GroupID Int, Val Varchar(10));
Insert #GroupValues (GroupID, Val)
Values (1,'a'),(1,'b'),(1,'c'),(2,'a'),(2,'b'),(3,'a'),(3,'b'),(3,'c'),(3,'d');
If Object_ID('tempdb..#FindValues') Is Not Null Drop Table #FindValues
Create Table #FindValues (Val Varchar(10));
Insert #FindValues (Val)
Values ('a'),('b'),('c');
Select Distinct gv.GroupID
From (Select Distinct GroupID
From #GroupValues) gv
Where Not Exists (Select 1
From #FindValues fv2
Where Not Exists (Select 1
From #GroupValues gv2
Where gv.GroupID = gv2.GroupID
And fv2.Val = gv2.Val))
And Not Exists (Select 1
From #GroupValues gv3
Where gv3.GroupID = gv.GroupID
And Not Exists (Select 1
From #FindValues fv3
Where gv3.Val = fv3.Val))