SQL Query to Update Object Count Based on Event Name

Imagine I own many bookstores. I keep a database of all events that occur in all of my bookstores. Two events of note are "Book Added" and "Book Removed", for when a book is added to the inventory of a store and when it is sold from a store. An example schema would be bookstore_id, event_name, time.
Now say I have a second table, which maintains the current state of each bookstore, so the schema would be bookstore_id, num_books.
I want to be able to use the first table to get the count of all the "Book Added" events per bookstore, subtract the count of all the "Book Removed" events per bookstore, and then update the number of books in each bookstore in the second table.
The only way I can think to do it requires using a cursor, but I'm assuming there's a more "SQL-esque" way to do it that is more set-based and doesn't require a cursor.

You can count the events by using a GROUP BY clause.
If you create two derived tables, one counting the added books and one counting the removed books, you can simply subtract the results and update the parent table. Because a store may have no "Book Removed" events at all, join the removed counts with a LEFT JOIN and treat a missing count as 0:
UPDATE b
SET b.num_books = AddedBooks.BooksAdded - COALESCE(RemovedBooks.BooksRemoved, 0)
FROM dbo.Books b
INNER JOIN (SELECT be.bookstore_id, COUNT(*) AS BooksAdded
            FROM dbo.BookEvents be
            WHERE be.event_name = 'Book Added'
            GROUP BY be.bookstore_id) AS AddedBooks
    ON b.bookstore_id = AddedBooks.bookstore_id
-- LEFT JOIN so that stores with no removals are still updated
LEFT JOIN (SELECT be.bookstore_id, COUNT(*) AS BooksRemoved
           FROM dbo.BookEvents be
           WHERE be.event_name = 'Book Removed'
           GROUP BY be.bookstore_id) AS RemovedBooks
    ON b.bookstore_id = RemovedBooks.bookstore_id;

select bookstore_id
, sum(case when event_name = 'Book Removed' then -1 else 1 end) as "num books"
from bookstores
group by bookstore_id
If there are more than two event types, name each one explicitly; any other event falls through to NULL, which SUM ignores:
select bookstore_id
, sum(case when event_name = 'Book Removed' then -1
           when event_name = 'Book Added' then 1
      end) as "num books"
from bookstores
group by bookstore_id
I would just make it a view, unless you run into performance issues.
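As a rough illustration of the view suggestion, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table name bookstore_events, the view name bookstore_counts, and the extra 'Store Opened' event are hypothetical. It shows the explicit two-branch CASE keeping unrelated events out of the total.

```python
import sqlite3

# In-memory database; all table, view, and event names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bookstore_events (bookstore_id INTEGER, event_name TEXT);
INSERT INTO bookstore_events (bookstore_id, event_name) VALUES
    (1, 'Book Added'), (1, 'Book Added'), (1, 'Book Removed'),
    (2, 'Book Added'),
    (3, 'Book Added'), (3, 'Book Removed'), (3, 'Store Opened');

-- Events other than the two listed yield NULL, and SUM skips NULLs.
CREATE VIEW bookstore_counts AS
SELECT bookstore_id,
       SUM(CASE WHEN event_name = 'Book Added' THEN 1
                WHEN event_name = 'Book Removed' THEN -1
           END) AS num_books
FROM bookstore_events
GROUP BY bookstore_id;
""")

# A plain (non-materialized) view is recomputed on every read.
counts = dict(conn.execute(
    "SELECT bookstore_id, num_books FROM bookstore_counts ORDER BY bookstore_id"
).fetchall())
print(counts)  # {1: 1, 2: 1, 3: 0}
```

The trade-off is that every read re-aggregates the event table, which is why a stored count is only worth it once that becomes slow.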

We can use CTEs to compute the added and removed counts separately and then combine them.
WITH CTE_Add AS
(
    SELECT Bkstr_ID, COUNT(*) AS Added FROM temp WHERE event_name = 'Added' GROUP BY Bkstr_ID
), CTE_Rem AS
(
    SELECT Bkstr_ID, COUNT(*) AS Removed FROM temp WHERE event_name = 'Removed' GROUP BY Bkstr_ID
)
SELECT A.Bkstr_ID, A.Added - COALESCE(R.Removed, 0) AS num_books
FROM CTE_Add A
LEFT JOIN CTE_Rem R ON A.Bkstr_ID = R.Bkstr_ID
This gives you each bookstore ID and its count.
Instead of the final SELECT, you can use an INSERT statement.
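The COALESCE matters because a store with additions but no removals has no row in CTE_Rem, so the LEFT JOIN leaves Removed as NULL and the subtraction becomes NULL. The sketch below checks this with Python's sqlite3 module; the table name events and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (Bkstr_ID INTEGER, event_name TEXT);
INSERT INTO events VALUES
    (1, 'Added'), (1, 'Added'), (1, 'Removed'),
    (2, 'Added');  -- store 2 has no 'Removed' rows
""")

QUERY = """
WITH CTE_Add AS (
    SELECT Bkstr_ID, COUNT(*) AS Added FROM events
    WHERE event_name = 'Added' GROUP BY Bkstr_ID
), CTE_Rem AS (
    SELECT Bkstr_ID, COUNT(*) AS Removed FROM events
    WHERE event_name = 'Removed' GROUP BY Bkstr_ID
)
SELECT A.Bkstr_ID, {expr} AS num_books
FROM CTE_Add A
LEFT JOIN CTE_Rem R ON A.Bkstr_ID = R.Bkstr_ID
ORDER BY A.Bkstr_ID
"""

# Plain subtraction: the missing Removed count turns the result into NULL (None).
raw = conn.execute(QUERY.format(expr="A.Added - R.Removed")).fetchall()
# COALESCE treats a missing Removed count as 0.
fixed = conn.execute(QUERY.format(expr="A.Added - COALESCE(R.Removed, 0)")).fetchall()
print(raw)    # [(1, 1), (2, None)]
print(fixed)  # [(1, 1), (2, 1)]
```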

I'd use SUM(CASE WHEN ...). Below is an example.
If object_id('tempdb..#BookStores') Is Not Null Drop Table #BookStores
Create Table #BookStores (bookstore_id int, num_books int)
/* We have 3 stores */
Insert #BookStores (bookstore_id, num_books)
Values (1, 0), (2, 0), (3, 0)
If object_id('tempdb..#Events') Is Not Null Drop Table #Events
Create Table #Events (bookstore_id int, event_name varchar(10), time dateTime Default(GetDate()) )
Insert #Events (bookstore_id, event_name)
Values
(1, 'Added'), (1, 'Added'), (1, 'Added'), (1, 'Added'), -- Added 4 books to store 1
(2, 'Added'), (2, 'Added'), (2, 'Added'),               -- Added 3 books to store 2
(3, 'Added'), (3, 'Added'),                             -- Added 2 books to store 3
/* Removed 2 books from each store */
(1, 'Removed'), (1, 'Removed'),
(2, 'Removed'), (2, 'Removed'),
(3, 'Removed'), (3, 'Removed')
/* Calculate adds and removes, then update the results */
;With Tmp As (
    Select E.bookstore_id,
           Sum(Case When E.event_name = 'Added' Then 1 Else 0 End) As AddCount,
           Sum(Case When E.event_name = 'Removed' Then 1 Else 0 End) As RemoveCount
    From #Events E
    Group By E.bookstore_id
)
Update BS Set num_books = T.AddCount - T.RemoveCount
From #BookStores BS
Inner Join Tmp T On T.bookstore_id = BS.bookstore_id
/* Check the results */
Select * From #BookStores BS
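For readers without SQL Server, here is a rough SQLite port of the same idea, run through Python's sqlite3 module. UPDATE ... FROM only arrived in SQLite 3.33.0, so this sketch swaps in a correlated subquery instead, which is standard SQL. The table names book_stores and events are hypothetical, and a fourth store with no events is added to show why the COALESCE is there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book_stores (bookstore_id INTEGER PRIMARY KEY, num_books INTEGER);
INSERT INTO book_stores VALUES (1, 0), (2, 0), (3, 0), (4, 0);
CREATE TABLE events (bookstore_id INTEGER, event_name TEXT);
INSERT INTO events VALUES
    (1, 'Added'), (1, 'Added'), (1, 'Added'), (1, 'Added'),
    (2, 'Added'), (2, 'Added'), (2, 'Added'),
    (3, 'Added'), (3, 'Added'),
    (1, 'Removed'), (1, 'Removed'),
    (2, 'Removed'), (2, 'Removed'),
    (3, 'Removed'), (3, 'Removed');

-- One set-based UPDATE: each store's count comes from a correlated subquery.
-- SUM over zero rows is NULL, so COALESCE keeps store 4 at 0.
UPDATE book_stores
SET num_books = COALESCE((
    SELECT SUM(CASE WHEN e.event_name = 'Added' THEN 1
                    WHEN e.event_name = 'Removed' THEN -1
                    ELSE 0 END)
    FROM events e
    WHERE e.bookstore_id = book_stores.bookstore_id
), 0);
""")

result = conn.execute(
    "SELECT bookstore_id, num_books FROM book_stores ORDER BY bookstore_id"
).fetchall()
print(result)  # [(1, 2), (2, 1), (3, 0), (4, 0)]
```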

Something like this will get you in the ballpark. Similar logic could be used for INSERT.
UPDATE tableA
SET tableA.num_books = tableB.num_books
FROM secondTable AS TableA
INNER JOIN (
    SELECT bookstore_id,
           SUM(CASE WHEN event_name = 'Book Added' THEN 1 ELSE 0 END)
         - SUM(CASE WHEN event_name = 'Book Removed' THEN 1 ELSE 0 END) AS num_books
    FROM firstTable
    GROUP BY bookstore_id
) TableB ON TableA.bookstore_id = tableB.bookstore_id

You can try a query like the following:
update t1
set num_books = coalesce(inventory, 0)
FROM bs t1 LEFT JOIN
(select bookstore_id, SUM(case when event_name = 'Book Added' then 1 when event_name = 'Book Removed' then -1 else NULL end) as inventory
from bse
group by bookstore_id) t2
on t1.bookstore_id = t2.bookstore_id

UPDATE bsc
SET bsc.num_books = bse.num_books
FROM bookstorecounts bsc
JOIN (SELECT bookstore_id,
SUM(CASE event_name
WHEN 'Book Removed' THEN -1
WHEN 'Book Added' THEN 1
END) AS num_books
FROM bookstoreevents
GROUP BY bookstore_id
) bse ON bsc.bookstore_id = bse.bookstore_id

Related

SQL query to get a repeating column value whose other columns meet a certain condition

Let's say we have a table with the following schema:
create table result
(
id int,
task_id int,
test_name string,
test_result string
);
And the data in this table looks like this:
insert into result
values (1, 1, 'test_a', 'pass'),
(2, 1, 'test_b', 'fail'),
(3, 1, 'test_c', 'pass'),
(4, 1, 'test_d', 'pass'),
(5, 2, 'test_a', 'pass'),
(6, 2, 'test_b', 'pass'),
(7, 2, 'test_c', 'pass'),
(8, 2, 'test_d', 'pass');
Basically, a single task has multiple test result entries. I want to retrieve the task_id that has test_b failing but all the other tests passing. So in this example it should return only task_id 1.
I've tried EXISTS and HAVING, but they don't seem to work in this case. I'm new to SQL. How can I implement it?
I would just use aggregation with a having clause:
select task_id
from result
group by task_id
having sum(case when test_name = 'test_b' and test_result = 'fail' then 1 else 0 end) = 1 and
sum(case when test_result = 'pass' then 1 else 0 end) = count(*) - 1;
The first condition validates that test_b failed. The second counts the number of passes, which should be one less than the number of rows for the task.
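A minimal sketch of the HAVING approach, using Python's sqlite3 with the question's sample data plus a hypothetical task 3 in which both test_b and test_a fail, so it must not be returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE result (id INTEGER, task_id INTEGER, test_name TEXT, test_result TEXT);
INSERT INTO result VALUES
    (1, 1, 'test_a', 'pass'), (2, 1, 'test_b', 'fail'),
    (3, 1, 'test_c', 'pass'), (4, 1, 'test_d', 'pass'),
    (5, 2, 'test_a', 'pass'), (6, 2, 'test_b', 'pass'),
    (7, 2, 'test_c', 'pass'), (8, 2, 'test_d', 'pass'),
    -- hypothetical task 3: test_b fails, but so does test_a
    (9, 3, 'test_a', 'fail'), (10, 3, 'test_b', 'fail');
""")

rows = conn.execute("""
SELECT task_id
FROM result
GROUP BY task_id
-- exactly one failing test_b, and every other row is a pass
HAVING SUM(CASE WHEN test_name = 'test_b' AND test_result = 'fail' THEN 1 ELSE 0 END) = 1
   AND SUM(CASE WHEN test_result = 'pass' THEN 1 ELSE 0 END) = COUNT(*) - 1
ORDER BY task_id
""").fetchall()
task_ids = [r[0] for r in rows]
print(task_ids)  # [1]
```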
If your database supports EXCEPT (or MINUS), you can use set-based operations:
select task_id
from result
where test_name = 'test_b' and test_result = 'fail'
except
select task_id
from result
where test_name <> 'test_b' and test_result = 'fail'
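SQLite supports EXCEPT, so the set-based version can be checked with Python's sqlite3 on the same data. Task 3 is again a hypothetical addition where test_a fails alongside test_b; the trailing ORDER BY is added only to make the result order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE result (id INTEGER, task_id INTEGER, test_name TEXT, test_result TEXT);
INSERT INTO result VALUES
    (1, 1, 'test_a', 'pass'), (2, 1, 'test_b', 'fail'),
    (3, 1, 'test_c', 'pass'), (4, 1, 'test_d', 'pass'),
    (5, 2, 'test_a', 'pass'), (6, 2, 'test_b', 'pass'),
    (7, 2, 'test_c', 'pass'), (8, 2, 'test_d', 'pass'),
    (9, 3, 'test_a', 'fail'), (10, 3, 'test_b', 'fail');
""")

# Tasks where test_b failed, minus tasks where any other test failed.
rows = conn.execute("""
SELECT task_id FROM result WHERE test_name = 'test_b' AND test_result = 'fail'
EXCEPT
SELECT task_id FROM result WHERE test_name <> 'test_b' AND test_result = 'fail'
ORDER BY task_id
""").fetchall()
task_ids = [r[0] for r in rows]
print(task_ids)  # [1]
```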
Maybe select the distinct task IDs that have a fail result:
select distinct [task_id], [test_result]
from [result]
where [test_result] = 'fail'
Note that this query will scan the entire table unless there is an index on test_result.
The following code first counts, per task, the number of results, the number of failed 'test_b' results, and the number of passes. The outer select ensures that 'test_b' failed and all the other tests passed.
select task_id from (
select
task_id,
count(test_result) numberoftakers,
sum(case when test_result<>'pass' AND test_name='test_b' then 1 else 0 end) numberoffailb,
sum(case when test_result='pass' then 1 else 0 end) numberofallpasses
from result
group by task_id) a
where numberoftakers=numberoffailb+numberofallpasses and numberoffailb=1
Assuming that (task_id, test_name) is a unique key of your table, you can indeed use (NOT) EXISTS, along with a correlated subquery which ensures that no other record with the same task_id failed.
select task_id
from result r
where
test_name = 'test_b'
and test_result = 'fail'
and not exists (
select 1
from result r1
where
r1.task_id = r.task_id
and r1.id != r.id
and r1.test_result = 'fail'
)
The LEFT JOIN anti-join pattern also comes to mind:
select r.task_id
from result r
left join result r1
on r1.task_id = r.task_id
and r1.id != r.id
and r1.test_result = 'fail'
where
r.test_name = 'test_b'
and r.test_result = 'fail'
and r1.id is null
Both queries return:
| task_id |
| :------ |
| 1 |
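Both anti-join forms can be reproduced with Python's sqlite3, again using a hypothetical task 3 whose failing test_b is accompanied by a failing test_a. The ORDER BY clauses are additions for a deterministic result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE result (id INTEGER, task_id INTEGER, test_name TEXT, test_result TEXT);
INSERT INTO result VALUES
    (1, 1, 'test_a', 'pass'), (2, 1, 'test_b', 'fail'),
    (3, 1, 'test_c', 'pass'), (4, 1, 'test_d', 'pass'),
    (5, 2, 'test_a', 'pass'), (6, 2, 'test_b', 'pass'),
    (7, 2, 'test_c', 'pass'), (8, 2, 'test_d', 'pass'),
    (9, 3, 'test_a', 'fail'), (10, 3, 'test_b', 'fail');
""")

# NOT EXISTS: no other failing row may share the task_id.
not_exists = [r[0] for r in conn.execute("""
SELECT task_id FROM result r
WHERE test_name = 'test_b' AND test_result = 'fail'
  AND NOT EXISTS (SELECT 1 FROM result r1
                  WHERE r1.task_id = r.task_id AND r1.id != r.id
                    AND r1.test_result = 'fail')
ORDER BY task_id
""")]

# LEFT JOIN anti-join: keep rows whose join partner is missing (NULL).
left_join = [r[0] for r in conn.execute("""
SELECT r.task_id FROM result r
LEFT JOIN result r1
  ON r1.task_id = r.task_id AND r1.id != r.id AND r1.test_result = 'fail'
WHERE r.test_name = 'test_b' AND r.test_result = 'fail' AND r1.id IS NULL
ORDER BY r.task_id
""")]
print(not_exists, left_join)  # [1] [1]
```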

SQL select statement that returns records where a set of events have occurred more than once on the same ID

Say I have a table:
CREATE TABLE births
(
childid INT,
momid INT,
eclampsia VARCHAR(1),
preeclampsia VARCHAR(1),
hypertension VARCHAR(1)
);
Insert records:
INSERT INTO BIRTHS (CHILDID, MOMID, ECLAMPSIA)
VALUES (654321, 123456, 'Y'),
       (431265, 123456, 'Y');
INSERT INTO BIRTHS (CHILDID, MOMID, HYPERTENSION)
VALUES (987652, 465468, 'Y'),
       (987987, 465468, 'Y');
INSERT INTO BIRTHS (CHILDID, MOMID)
VALUES (687765, 465468);
INSERT INTO BIRTHS (CHILDID, MOMID, PREECLAMPSIA)
VALUES (649870, 846587, 'Y');
INSERT INTO BIRTHS (CHILDID, MOMID)
VALUES (787463, 846587);
I want to return records for all mothers who have had more than one child and have had one of these three diagnoses in more than one pregnancy.
My expected results are:
childid  momid   eclampsia  preeclampsia  hypertension
------------------------------------------------------
654321   123456  Y
431265   123456  Y
987652   465468                           Y
987987   465468                           Y
How would I write this?
I have a sloppy query that does not quite do what I want. It works to some degree, but still gives me records where the momid has had a diagnosis only for one pregnancy.
select distinct
a.*, b.eclampsia, b.preeclampsia, b.hypertension
from
births a
join
births b on a.momid = b.momid
where
a.childid != b.childid
and a.eclampsia = 'y'
and (b.eclampsia = 'y' or b.preeclampsia = 'y' or b.hypertension = 'y')
or a.preeclampsia = 'y'
and (b.preeclampsia = 'y' or b.eclampsia = 'Y' or b.hypertension = 'y')
or a.hypertension = 'y'
and (b.hypertension = 'y' or b.eclampsia = 'y' or b.preeclampsia = 'y')
order by
momid
I would solve your problem with this query:
SELECT * FROM births
WHERE momid IN(
SELECT momid FROM births GROUP BY momid
HAVING COUNT(1) >1 AND
SUM(CASE WHEN eclampsia = 'Y' THEN 1 WHEN preeclampsia = 'Y' THEN 1 WHEN hypertension = 'Y' THEN 1 ELSE 0 END) > 1)
AND (eclampsia = 'Y' OR preeclampsia = 'Y' OR hypertension = 'Y')
Basically, you filter the momids by grouping and formulating your conditions in the HAVING clause, and then use this list of momids to build your desired output.
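A sketch of this query in Python's sqlite3, using the question's sample data with the second child of mom 123456 taken as 431265, as in the expected results. Only childid and momid are selected, and an ORDER BY is added, to keep the check short and deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE births (childid INTEGER, momid INTEGER,
                     eclampsia TEXT, preeclampsia TEXT, hypertension TEXT);
INSERT INTO births (childid, momid, eclampsia) VALUES
    (654321, 123456, 'Y'), (431265, 123456, 'Y');
INSERT INTO births (childid, momid, hypertension) VALUES
    (987652, 465468, 'Y'), (987987, 465468, 'Y');
INSERT INTO births (childid, momid) VALUES (687765, 465468);
INSERT INTO births (childid, momid, preeclampsia) VALUES (649870, 846587, 'Y');
INSERT INTO births (childid, momid) VALUES (787463, 846587);
""")

rows = conn.execute("""
SELECT childid, momid FROM births
WHERE momid IN (
    -- moms with more than one birth and more than one complicated birth
    SELECT momid FROM births GROUP BY momid
    HAVING COUNT(1) > 1 AND
           SUM(CASE WHEN eclampsia = 'Y' THEN 1 WHEN preeclampsia = 'Y' THEN 1
                    WHEN hypertension = 'Y' THEN 1 ELSE 0 END) > 1)
  -- keep only the complicated births themselves
  AND (eclampsia = 'Y' OR preeclampsia = 'Y' OR hypertension = 'Y')
ORDER BY momid, childid
""").fetchall()
print(rows)
# [(431265, 123456), (654321, 123456), (987652, 465468), (987987, 465468)]
```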
This is one way of doing it. For each mother, it counts the records in the births table that show one of the diagnoses, and it displays a record when that count is greater than 1, as long as the record itself also shows one of the diagnoses:
SELECT childid, momid,
COALESCE(eclampsia, '') AS eclampsia,
COALESCE(preeclampsia, '') AS preeclampsia,
COALESCE(hypertension, '') AS hypertension
FROM births b1
WHERE (SELECT COUNT(*) FROM births b2 WHERE b2.momid = b1.momid AND
(ECLAMPSIA = 'Y' OR PREECLAMPSIA = 'Y' OR HYPERTENSION = 'Y')
GROUP BY momid) > 1 AND
(ECLAMPSIA = 'Y' OR PREECLAMPSIA = 'Y' OR HYPERTENSION = 'Y')
Output
childid  momid   eclampsia  preeclampsia  hypertension
654321   123456  Y
431265   123456  Y
987652   465468                           Y
987987   465468                           Y
First get the number of pregnancies with complications for each mom using a CTE and a CASE expression, then join the CTE with the births table on momid, and filter the moms who have more than one such pregnancy. Something like this:
;WITH BirthCTE as(
Select momid,
SUM(CASE WHEN ECLAMPSIA = 'Y' OR PREECLAMPSIA = 'Y' OR HYPERTENSION = 'Y' THEN 1 ELSE 0 END) As TotalComl
FROM births
GROUP BY momid
)
select b.* from births b
inner join BirthCTE cte on b.momid = cte.momid
Where TotalComl > 1 -- More than one complication
and (ECLAMPSIA = 'Y' OR PREECLAMPSIA = 'Y' OR HYPERTENSION = 'Y') -- at least one complication
This is the wrong data structure. You want a table of births with no complications. Then you want a table birthComplications with one row per complication, if any.
You can restructure the data on the fly and then aggregate:
select b.momid
from births b outer apply
     (select v.complication
      from (values ('eclampsia', b.eclampsia), ('hypertension', b.hypertension), ('preeclampsia', b.preeclampsia)
           ) v(complication, flag)
      where v.flag = 'Y'
     ) c
group by b.momid
having count(distinct b.childid) > 1 and -- more than one pregnancy
       count(distinct case when c.complication is not null then b.childid end) > 1;
Actually, you can simplify the logic to moms who have had complications in more than one pregnancy. This looks like:
select b.momid
from births b cross apply -- only keep pregnancies with complications
     (select v.complication
      from (values ('eclampsia', b.eclampsia), ('hypertension', b.hypertension), ('preeclampsia', b.preeclampsia)
           ) v(complication, flag)
      where v.flag = 'Y'
     ) c
group by b.momid
having count(distinct b.childid) > 1;
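SQLite and some other databases have no APPLY operator. A portable substitute, swapped in here rather than taken from the answer, is to unpivot with UNION ALL in a CTE and then group. The sketch runs it through Python's sqlite3 on the question's sample data (second child of mom 123456 assumed to be 431265); the CTE name complications is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE births (childid INTEGER, momid INTEGER,
                     eclampsia TEXT, preeclampsia TEXT, hypertension TEXT);
INSERT INTO births (childid, momid, eclampsia) VALUES
    (654321, 123456, 'Y'), (431265, 123456, 'Y');
INSERT INTO births (childid, momid, hypertension) VALUES
    (987652, 465468, 'Y'), (987987, 465468, 'Y');
INSERT INTO births (childid, momid) VALUES (687765, 465468);
INSERT INTO births (childid, momid, preeclampsia) VALUES (649870, 846587, 'Y');
INSERT INTO births (childid, momid) VALUES (787463, 846587);
""")

rows = conn.execute("""
WITH complications AS (
    -- one row per (birth, complication); births without any drop out
    SELECT childid, momid, 'eclampsia' AS complication FROM births WHERE eclampsia = 'Y'
    UNION ALL
    SELECT childid, momid, 'hypertension' FROM births WHERE hypertension = 'Y'
    UNION ALL
    SELECT childid, momid, 'preeclampsia' FROM births WHERE preeclampsia = 'Y'
)
SELECT momid
FROM complications
GROUP BY momid
HAVING COUNT(DISTINCT childid) > 1  -- complications in more than one pregnancy
ORDER BY momid
""").fetchall()
moms = [r[0] for r in rows]
print(moms)  # [123456, 465468]
```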
Try this:
select momid, count(*) as "children",
count(eclampsia) as "eclampsia",
count(preeclampsia) as "preeclampsia",
count(hypertension) as "hypertension"
from births
group by momid
having count(*) > 1 and
(
count(eclampsia) > 1 or
count(preeclampsia) > 1 or
count(hypertension) > 1
);
This returns one row per qualifying mother, with her number of children and the count of each diagnosis.

SQL - most efficient way to find if a pair of row does NOT exist

I can't seem to find a situation similar to mine online. I have a table for orders called Order, and a table for the details of those orders, called 'order detail'. An order is of a certain type if it has either of two pairs of order details (Value-Unit pairs). So my order detail table might look like this:
order_id | detail
---------|-------
1 | X
1 | Y
1 | Z
2 | X
2 | Z
2 | B
3 | A
3 | Z
3 | B
The two pairs that go together are (X & Y) and (A & B). What is an efficient way of retrieving only those order_ids that do NOT contain either of these pairs? For example, for the table above, I should receive only order_id 2.
The only solution I can come up with is essentially to use two subqueries, each with a self join:
select distinct o.order_id
from orders o
where o.order_id not in (
select distinct order_id
from order_detail od1 where od1.detail=X
join order_detail od2 on od2.order_id = od1.order_id and od2.detail=Y
)
and o.order_id not in (
select distinct order_id
from order_detail od1 where od1.detail=A
join order_detail od2 on od2.order_id = od1.order_id and od2.detail=B
)
The problem is that performance is an issue, my order_detail table is HUGE, and I am quite inexperienced in query languages. Is there a faster way to do this with a lower cardinality? I also have zero control over the schema of the tables, so I can't change anything there.
First and foremost I'd like to emphasise that finding the most efficient query is a combination of a good query and a good index. Far too often I see questions here where people look for magic to happen in only one or the other.
For example, of the various solutions I tried, yours is the slowest (after fixing syntax errors) when there are no indexes, but it is quite a bit better with an index on (detail, order_id).
Please also note that you have the actual data and table structures. You'll need to experiment with various combinations of queries and indexes to find what works best; not least because you haven't indicated what platform you're using and results are likely to vary between platforms.
[/rant-off]
Query
Without further ado, Gordon Linoff has provided some good suggestions. There's another option likely to offer similar performance. You said you can't control the schema; but you can use a sub-query to transform the data into a 'friendlier structure'.
Specifically, if you:
pivot the data so you have a row per order_id
and columns for each detail you want to check
and each cell holds a count of how many times that order has that detail...
Then your query is simply: where (x=0 or y=0) and (a=0 or b=0). The following uses SQL Server's temporary tables to demonstrate with sample data. The queries below work regardless of duplicate id, val pairs.
/*Set up sample data*/
create table #t (
id int,
val char(1)
)
insert #t(id, val)
values (1, 'x'), (1, 'y'), (1, 'z'),
(2, 'x'), (2, 'z'), (2, 'b'),
(3, 'a'), (3, 'z'), (3, 'b')
/*Option 1 manual pivoting*/
select t.id
from (
select o.id,
sum(case when o.val = 'a' then 1 else 0 end) as a,
sum(case when o.val = 'b' then 1 else 0 end) as b,
sum(case when o.val = 'x' then 1 else 0 end) as x,
sum(case when o.val = 'y' then 1 else 0 end) as y
from #t o
group by o.id
) t
where (x = 0 or y = 0) and (a = 0 or b = 0)
/*Option 2 using Sql Server PIVOT feature*/
select t.id
from (
select id ,[a],[b],[x],[y]
from (select id, val from #t) src
pivot (count(val) for val in ([a],[b],[x],[y])) pvt
) t
where (x = 0 or y = 0) and (a = 0 or b = 0)
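Option 1 uses only SUM and CASE, so it ports directly to other databases. Here is a minimal check with Python's sqlite3 on the answer's sample data; the table is renamed from #t to t because the # prefix is SQL Server syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER, val TEXT);
INSERT INTO t VALUES
    (1, 'x'), (1, 'y'), (1, 'z'),
    (2, 'x'), (2, 'z'), (2, 'b'),
    (3, 'a'), (3, 'z'), (3, 'b');
""")

# Manual pivot: one row per id, one count column per detail of interest.
rows = conn.execute("""
SELECT p.id
FROM (
    SELECT o.id,
           SUM(CASE WHEN o.val = 'a' THEN 1 ELSE 0 END) AS a,
           SUM(CASE WHEN o.val = 'b' THEN 1 ELSE 0 END) AS b,
           SUM(CASE WHEN o.val = 'x' THEN 1 ELSE 0 END) AS x,
           SUM(CASE WHEN o.val = 'y' THEN 1 ELSE 0 END) AS y
    FROM t o
    GROUP BY o.id
) p
-- An id qualifies when each pair is missing at least one member.
WHERE (p.x = 0 OR p.y = 0) AND (p.a = 0 OR p.b = 0)
ORDER BY p.id
""").fetchall()
ids = [r[0] for r in rows]
print(ids)  # [2]
```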
It's interesting to note that the query plans for options 1 and 2 above are slightly different. This suggests the possibility of different performance characteristics over large data sets.
Indexes
Note that the above will likely process the whole table. So there is little to be gained from indexes. However, if the table has "long rows", an index on only the 2 columns you're working with means that less data needs to be read from disk.
The query structure you provided is likely to benefit from an index such as (detail, order_id). This is because the server can more efficiently check the NOT IN subquery conditions. How beneficial it is will depend on the distribution of data in your table.
As a side note I tested various query options including a fixed version of yours and Gordon's. (Only a small data size though.)
Without the above index, your query was slowest in the batch.
With the above index, Gordon's second query was slowest.
Alternative Queries
Your query (fixed):
select distinct o.id
from #t o
where o.id not in (
select od1.id
from #t od1
inner join #t od2 on
od2.id = od1.id
and od2.val='Y'
where od1.val= 'X'
)
and o.id not in (
select od1.id
from #t od1
inner join #t od2 on
od2.id = od1.id
and od2.val='a'
where od1.val= 'b'
)
A mixture of Gordon's first and second queries. It fixes the duplicate issue in the first and the performance issue in the second:
select id
from #t od
group by id
having ( sum(case when val in ('X') then 1 else 0 end) = 0
or sum(case when val in ('Y') then 1 else 0 end) = 0
)
and( sum(case when val in ('A') then 1 else 0 end) = 0
or sum(case when val in ('B') then 1 else 0 end) = 0
)
Using INTERSECT and EXCEPT:
select id
from #t
except
(
select id
from #t
where val = 'a'
intersect
select id
from #t
where val = 'b'
)
except
(
select id
from #t
where val = 'x'
intersect
select id
from #t
where val = 'y'
)
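The same set logic can be checked in SQLite via Python's sqlite3. SQLite does not accept parenthesized operands in a compound SELECT, so this sketch wraps each INTERSECT in a derived table (the aliases ab and xy are hypothetical). SQLite groups compound operators from left to right, which gives the intended result here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER, val TEXT);
INSERT INTO t VALUES
    (1, 'x'), (1, 'y'), (1, 'z'),
    (2, 'x'), (2, 'z'), (2, 'b'),
    (3, 'a'), (3, 'z'), (3, 'b');
""")

# Evaluated as: (all ids EXCEPT ids with both a and b) EXCEPT ids with both x and y.
rows = conn.execute("""
SELECT id FROM t
EXCEPT
SELECT id FROM (SELECT id FROM t WHERE val = 'a'
                INTERSECT
                SELECT id FROM t WHERE val = 'b') AS ab
EXCEPT
SELECT id FROM (SELECT id FROM t WHERE val = 'x'
                INTERSECT
                SELECT id FROM t WHERE val = 'y') AS xy
ORDER BY id
""").fetchall()
ids = [r[0] for r in rows]
print(ids)  # [2]
```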
I would use aggregation and having:
select order_id
from order_detail od
group by order_id
having sum(case when detail in ('X', 'Y') then 1 else 0 end) < 2 and
sum(case when detail in ('A', 'B') then 1 else 0 end) < 2;
This assumes that orders do not have duplicate rows with the same detail. If that is possible:
select order_id
from order_detail od
group by order_id
having count(distinct case when detail in ('X', 'Y') then detail end) < 2 and
count(distinct case when detail in ('A', 'B') then detail end) < 2;
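To see why the duplicate caveat matters, the sketch below runs both HAVING variants in Python's sqlite3 against the question's data plus one hypothetical duplicate (2, 'X') row. The helper function order_ids is also hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_detail (order_id INTEGER, detail TEXT);
INSERT INTO order_detail VALUES
    (1, 'X'), (1, 'Y'), (1, 'Z'),
    (2, 'X'), (2, 'Z'), (2, 'B'),
    (3, 'A'), (3, 'Z'), (3, 'B'),
    (2, 'X');  -- hypothetical duplicate row for order 2
""")

def order_ids(having):
    """Return the order_ids that satisfy the given HAVING condition."""
    sql = f"SELECT order_id FROM order_detail GROUP BY order_id HAVING {having} ORDER BY order_id"
    return [r[0] for r in conn.execute(sql)]

# Counts rows, so the duplicate X makes order 2 look as if it has both X and Y.
naive = order_ids(
    "SUM(CASE WHEN detail IN ('X', 'Y') THEN 1 ELSE 0 END) < 2 AND "
    "SUM(CASE WHEN detail IN ('A', 'B') THEN 1 ELSE 0 END) < 2"
)
# Counts distinct details, so duplicates no longer matter.
distinct = order_ids(
    "COUNT(DISTINCT CASE WHEN detail IN ('X', 'Y') THEN detail END) < 2 AND "
    "COUNT(DISTINCT CASE WHEN detail IN ('A', 'B') THEN detail END) < 2"
)
print(naive)     # []
print(distinct)  # [2]
```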

SQL query to get count based on filtered status

I have a table which has two columns, CustomerId & Status (A, B, C).
A customer can have multiple statuses in different rows.
I need to get the count of the different statuses based on the following rules:
If the status of a customer is A & B, he should be counted in Status A.
If status is both B & C, it should be counted in Status B.
If status is all three, it will fall in status A.
What I need is a table with status and count.
Could someone please help?
I know that someone will ask me to write my query first, but I couldn't work out how to implement this logic in a query.
You could play with different variations of this:
select customerId,
case when HasA+HasB+HasC = 3 then 'A'
when HasA+HasB = 2 then 'A'
when HasB+HasC = 2 then 'B'
when HasA+HasC = 2 then 'A'
when HasA is null and HasB is null and HasC is not null then 'C'
when HasB is null and HasC is null and HasA is not null then 'A'
when HasC is null and HasA is null and HasB is not null then 'B'
end as overallStatus
from
(
select customerId,
max(case when Status = 'A' then 1 end) HasA,
max(case when Status = 'B' then 1 end) HasB,
max(case when Status = 'C' then 1 end) HasC
from tableName
group by customerId
) as t;
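The query above returns one status per customer. To get the requested table of statuses and counts, it can be wrapped in an outer GROUP BY. Below is a sketch in Python's sqlite3; the table name customer_status and the four customers are hypothetical, with the fourth having only status C.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_status (customerId INTEGER, Status TEXT);
INSERT INTO customer_status VALUES
    (1, 'A'), (1, 'B'),
    (2, 'B'), (2, 'C'),
    (3, 'A'), (3, 'B'), (3, 'C'),
    (4, 'C');
""")

rows = conn.execute("""
SELECT overallStatus, COUNT(*) AS total
FROM (
    SELECT customerId,
           CASE WHEN HasA + HasB + HasC = 3 THEN 'A'
                WHEN HasA + HasB = 2 THEN 'A'
                WHEN HasB + HasC = 2 THEN 'B'
                WHEN HasA + HasC = 2 THEN 'A'
                WHEN HasA IS NULL AND HasB IS NULL AND HasC IS NOT NULL THEN 'C'
                WHEN HasB IS NULL AND HasC IS NULL AND HasA IS NOT NULL THEN 'A'
                WHEN HasC IS NULL AND HasA IS NULL AND HasB IS NOT NULL THEN 'B'
           END AS overallStatus
    FROM (
        -- 1 when the customer has the status, NULL otherwise
        SELECT customerId,
               MAX(CASE WHEN Status = 'A' THEN 1 END) AS HasA,
               MAX(CASE WHEN Status = 'B' THEN 1 END) AS HasB,
               MAX(CASE WHEN Status = 'C' THEN 1 END) AS HasC
        FROM customer_status
        GROUP BY customerId
    ) AS t
) AS s
GROUP BY overallStatus
ORDER BY overallStatus
""").fetchall()
print(rows)  # [('A', 2), ('B', 1), ('C', 1)]
```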
I like to use Cross Apply for this type of query as it allows for use of the calculated status in the Group By clause.
Here's my solution with some sample data.
Create Table #Table (Customerid int, Stat varchar(1))
INSERT INTO #Table (Customerid, Stat )
VALUES
(1, 'a'),
(1 , 'b'),
(2, 'b'),
(2 , 'c'),
(3, 'a'),
(3 , 'b'),
(3, 'c')
SELECT
ca.StatusGroup
, COUNT(DISTINCT Customerid) as Total
FROM
#Table t
CROSS APPLY
(VALUES
(
CASE WHEN
EXISTS
(SELECT 1 FROM #Table x where x.Customerid = t.CustomerID and x.Stat = 'a')
AND EXISTS
(SELECT 1 FROM #Table x where x.Customerid = t.CustomerID and x.Stat = 'b')
THEN 'A'
WHEN
EXISTS
(SELECT 1 FROM #Table x where x.Customerid = t.CustomerID and x.Stat = 'b')
AND EXISTS
(SELECT 1 FROM #Table x where x.Customerid = t.CustomerID and x.Stat = 'c')
THEN 'B'
ELSE t.stat
END
)
) ca (StatusGroup)
GROUP BY ca.StatusGroup
I edited this to handle customers who have only one status; in that case it returns A, B or C depending on the customer's status.

SQL statement for maximum common element in a set

I have a table like this:
id contact value
1 A 2
2 A 3
3 B 2
4 B 3
5 B 4
6 C 2
Now I would like to get the common maximum value for a given set of contacts.
For example:
if my contact set was {A,B} it would return 3;
for the set {A,C} it would return 2
for the set {B} it would return 4
What SQL statement(s) can do this?
Try this:
SELECT value, count(distinct contact) as cnt
FROM my_table
WHERE contact IN ('A', 'C')
GROUP BY value
HAVING cnt = 2
ORDER BY value DESC
LIMIT 1
This is MySQL syntax and may differ for your database. The number (2) in the HAVING clause is the number of elements in the set.
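Because the HAVING count must match the size of the set, it is convenient to build the IN list and the count together. The sketch below does this with Python's sqlite3 and bound parameters. The function name max_common_value is hypothetical, and it puts COUNT(DISTINCT contact) directly in HAVING rather than relying on the cnt alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (id INTEGER, contact TEXT, value INTEGER);
INSERT INTO my_table VALUES
    (1, 'A', 2), (2, 'A', 3), (3, 'B', 2), (4, 'B', 3), (5, 'B', 4), (6, 'C', 2);
""")

def max_common_value(contacts):
    """Return the largest value shared by every contact in the set, or None."""
    unique = sorted(set(contacts))
    placeholders = ", ".join("?" for _ in unique)
    # Only the placeholder list is formatted in; the values themselves are bound.
    sql = f"""
        SELECT value
        FROM my_table
        WHERE contact IN ({placeholders})
        GROUP BY value
        HAVING COUNT(DISTINCT contact) = ?
        ORDER BY value DESC
        LIMIT 1
    """
    row = conn.execute(sql, [*unique, len(unique)]).fetchone()
    return row[0] if row else None

print(max_common_value(["A", "B"]))  # 3
print(max_common_value(["A", "C"]))  # 2
print(max_common_value(["B"]))       # 4
```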
SELECT max(value) FROM table WHERE contact IN ('A', 'C')
Edit: max common
create table #contacts ( contact nchar(10) )
insert into #contacts values ('a')
insert into #contacts values ('b')
select MAX(value)
from MyTable
where (select COUNT(*) from #contacts) =
(select COUNT(*)
from MyTable t
join #contacts c on c.contact = t.contact
where t.value = MyTable.value)
Most will tell you to use:
SELECT MAX(x.value)
FROM (SELECT t.value
      FROM my_table t
      WHERE t.contact IN ('A', 'C')
      GROUP BY t.value
      HAVING COUNT(DISTINCT t.contact) = 2) x
Couple of caveats:
The DISTINCT is key; otherwise two rows with t.contact = 'A' and the same value could satisfy the count on their own.
The number compared with COUNT(DISTINCT t.contact) has to equal the number of values specified in the IN clause.
My preference is to use JOINs:
SELECT MAX(t.value)
FROM my_table t
JOIN my_table t2 ON t2.value = t.value AND t2.contact = 'C'
WHERE t.contact = 'A'
The downside to this is that you have to do a self join (join to the same table) for every criterion (contact value in this case).