problem with Update in MySQL - sql

According to the documentation, joins, when used with the update statement, work in the same way as when used in selects.
For example, if we have these two tables:
mysql> SELECT * FROM orders;
+---------+------------+
| orderid | customerid |
+---------+------------+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 1 |
+---------+------------+
mysql> SELECT * FROM customers;
+------------+------------+
| customerid | ordercount |
+------------+------------+
| 1 | 9 |
| 2 | 3 |
| 3 | 8 |
| 4 | 5 |
| 5 | 7 |
+------------+------------+
using this select statement:
SELECT orders.customerid
FROM orders
JOIN customers ON (customers.customerid = orders.customerid)
returns:
+------------+
| customerid |
+------------+
| 1 |
| 1 |
| 2 |
| 3 |
+------------+
So, I was expecting the statement below:
UPDATE orders
JOIN customers ON (customers.customerid = orders.customerid)
SET ordercount = ordercount + 1
to update ordercount for customer #1 (customerid = 1) to 11, but that is not what happens. Here are the results after the update:
mysql> SELECT * FROM customers;
+------------+------------+
| customerid | ordercount |
+------------+------------+
| 1 | 10 |
| 2 | 4 |
| 3 | 9 |
| 4 | 5 |
| 5 | 7 |
+------------+------------+
As you can see, it was only incremented once, even though customer #1 occurs twice in the orders table and the select statement returns both rows.
Is this a bug in MySQL, or am I doing something wrong? I'm trying to avoid using group by for performance reasons, hence my interest in understanding what's going on.
Thanks in advance

Yes, MySQL updates each record in a joined table at most once.
I cannot find it in the documentation, but practice says so.
I'll probably post it as a bug, so they at least add it to documentation:
CREATE TABLE updater (value INT NOT NULL);
INSERT
INTO updater
VALUES (1);
SELECT *
FROM updater;
value
---
1
UPDATE updater u
JOIN (
SELECT 1 AS newval
UNION ALL
SELECT 2
) q
SET u.value = u.value + newval;
SELECT *
FROM updater;
value
---
2
(expected 4).
SQL Server, by the way, behaves the same way in a multiple-table UPDATE.
You can use:
UPDATE customers c
SET ordercount = ordercount +
(
SELECT COUNT(*)
FROM orders o
WHERE o.customerid = c.customerid
)
which should perform about the same as long as you have an index on orders (customerid).
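As a rough check of the correlated-subquery approach, here is a minimal sketch that runs against SQLite through Python's sqlite3 module rather than MySQL (an assumption made only so the example is self-contained). It updates customers.ordercount by the number of matching rows in orders, using the sample data from the question; the index name idx_orders_customerid is hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL tables in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (orderid INTEGER PRIMARY KEY, customerid INTEGER);
    CREATE TABLE customers (customerid INTEGER PRIMARY KEY, ordercount INTEGER);
    INSERT INTO orders VALUES (1, 1), (2, 2), (3, 3), (4, 1);
    INSERT INTO customers VALUES (1, 9), (2, 3), (3, 8), (4, 5), (5, 7);
    -- Hypothetical index on the column the subquery filters by.
    CREATE INDEX idx_orders_customerid ON orders (customerid);
""")

# Each customer row is updated once, but the subquery counts every matching order.
conn.execute("""
    UPDATE customers
    SET ordercount = ordercount + (
        SELECT COUNT(*) FROM orders o WHERE o.customerid = customers.customerid
    )
""")

updated_counts = conn.execute(
    "SELECT customerid, ordercount FROM customers ORDER BY customerid"
).fetchall()
print(updated_counts)  # [(1, 11), (2, 4), (3, 9), (4, 5), (5, 7)]
```

Customer #1 ends up at 11 because the subquery sees both of its orders, while customers without orders are left unchanged.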

Related

Getting a distinct value from one column if all rows matches a certain criteria

I'm trying to find a performant and easy-to-read query to get a distinct value from one column, if all rows for that value match a certain criterion.
I have a table that tracks e-commerce orders and whether they're delivered on time, with contents and schema as follows:
> select * from orders;
+----+--------------------+-------------+
| id | delivered_on_time | customer_id |
+----+--------------------+-------------+
| 1 | 1 | 9 |
| 2 | 0 | 9 |
| 3 | 1 | 10 |
| 4 | 1 | 10 |
| 5 | 0 | 11 |
+----+--------------------+-------------+
I would like to get all distinct customer_id's which have had all their orders delivered on time. I.e. I would like an output like this:
+-------------+
| customer_id |
+-------------+
| 10 |
+-------------+
What's the best way to do this?
I've found a solution, but it's a bit hard to read and I doubt it's the most efficient way to do it (using double CTE's):
> with hits_all as (
select customer_id,count(*) as count from orders group by customer_id
),
hits_true as
(select customer_id,count(*) as count from orders where delivered_on_time = true group by customer_id)
select
*
from
hits_true
inner join
hits_all on
hits_all.customer_id = hits_true.customer_id
and hits_all.count = hits_true.count;
+-------------+-------+-------------+-------+
| customer_id | count | customer_id | count |
+-------------+-------+-------------+-------+
| 10 | 2 | 10 | 2 |
+-------------+-------+-------------+-------+
You can use group by and having as follows:
select customer_id
from orders
group by customer_id
having sum(delivered_on_time) = count(*)
This works because an on-time delivery is identified by delivered_on_time = 1, so the sum of delivered_on_time equals the number of records for a customer only when every one of that customer's orders was on time.
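To see the sum-equals-count check on the sample rows, here is a small sketch using SQLite through Python's sqlite3 module (an assumption; the question does not say which database is in use). The table mirrors the orders data shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        delivered_on_time INTEGER,
        customer_id INTEGER
    );
    INSERT INTO orders VALUES (1, 1, 9), (2, 0, 9), (3, 1, 10), (4, 1, 10), (5, 0, 11);
""")

# A customer qualifies only when the number of on-time orders (the sum of the
# 0/1 flag) equals the total number of orders for that customer.
on_time_customers = [
    row[0]
    for row in conn.execute("""
        SELECT customer_id
        FROM orders
        GROUP BY customer_id
        HAVING SUM(delivered_on_time) = COUNT(*)
        ORDER BY customer_id
    """)
]
print(on_time_customers)  # [10]
```

Customer 9 has one late order out of two and customer 11 has a single late order, so only customer 10 passes.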
You can also use aggregation and having. Since delivered_on_time is either 0 or 1, its minimum is 1 only when every order was on time:
select customer_id
from orders
group by customer_id
having min(delivered_on_time) = 1;

Is it possible to select subqueries

Consider the query below:
SELECT DISTINCT ON (ser.id) *
FROM server ser
LEFT JOIN subscription sub ON ser.id = sub.server_id
WHERE (
COUNT(SELECT err.id FROM error err WHERE ser.id = err.id) > 0
OR SUM(SELECT pay.amount FROM payment pay WHERE ser.id = pay.id) > 0
);
Here, a list of unique servers that have a subscription and that have errors or payments is returned.
However, instead of returning all server columns (*), I want to return the server id, the number of errors and the sum of payments. For example, the initial selection should look like this:
SELECT DISTINCT ON (ser.id) ser.id, countErrors, sumPayments
Selecting ser.id is straightforward, but how can countErrors and sumPayments be selected from the aggregate functions "count" and "sum" (considering that they are conditions in a WHERE clause)?
I imagined the "where" conditions would look something like this:
COUNT(SELECT err.id FROM error err WHERE ser.id = err.id) AS countErrors > 0
OR SUM(SELECT pay.amount FROM payment pay WHERE ser.id = pay.id) AS sumPayments > 0
Is it possible to do this? If so, how can it be achieved?
Test data is shown below:
server
+----+
| id |
+----+
| 1 |
+----+
| 2 |
+----+
| 3 |
+----+
| 4 |
+----+
subscription
+----+-----------+
| id | server_id |
+----+-----------+
| 1 | 1 |
+----+-----------+
| 2 | 2 |
+----+-----------+
| 3 | 2 |
+----+-----------+
| 4 | 3 |
+----+-----------+
| 5 | 3 |
+----+-----------+
error
+----+-----------+
| id | server_id |
+----+-----------+
| 1 | 1 |
+----+-----------+
| 3 | 4 |
+----+-----------+
payment
+----+-----------+--------+
| id | server_id | amount |
+----+-----------+--------+
| 1 | 1 | 200 |
+----+-----------+--------+
| 2 | 2 | 200 |
+----+-----------+--------+
| 3 | 2 | 100 |
+----+-----------+--------+
Wanted result from test data:
+-----------+-------------+-------------+
| server_id | countErrors | sumPayments |
+-----------+-------------+-------------+
| 1 | 1 | 200 |
+-----------+-------------+-------------+
| 2 | 0 | 300 |
+-----------+-------------+-------------+
Server#4 has no subscription, so it should be left out.
Server#3 has a subscription, but no errors or payments, so should be left out.
Server#1 and server#2 both have subscription and payments and/or errors.
Unless I'm missing something, I would just write your query as follows. Perform the aggregation of errors and payments in separate bona fide subqueries, and join to them. Also, there is a join to the subscription table, but this only exists to filter off servers having no subscription. Finally, the WHERE clause removes any servers which do not either have some errors or payments.
SELECT
s.id AS server_id,
COALESCE(e.countErrors, 0) AS countErrors,
COALESCE(p.sumPayments, 0) AS sumPayments
FROM server s
INNER JOIN
(
SELECT DISTINCT server_id
FROM subscription
) su
ON s.id = su.server_id
LEFT JOIN
(
SELECT server_id, COUNT(*) AS countErrors
FROM error
GROUP BY server_id
) e
ON s.id = e.server_id
LEFT JOIN
(
SELECT server_id, SUM(amount) AS sumPayments
FROM payment
GROUP BY server_id
) p
ON s.id = p.server_id
WHERE
p.sumPayments > 0 OR
e.countErrors > 0
ORDER BY
s.id;
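The query above can be tried against the question's test data with the following sketch. It assumes SQLite through Python's sqlite3 module instead of PostgreSQL, and the column types are guesses, since the question only shows the values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE server (id INTEGER PRIMARY KEY);
    CREATE TABLE subscription (id INTEGER PRIMARY KEY, server_id INTEGER);
    CREATE TABLE error (id INTEGER PRIMARY KEY, server_id INTEGER);
    CREATE TABLE payment (id INTEGER PRIMARY KEY, server_id INTEGER, amount INTEGER);
    INSERT INTO server VALUES (1), (2), (3), (4);
    INSERT INTO subscription VALUES (1, 1), (2, 2), (3, 2), (4, 3), (5, 3);
    INSERT INTO error VALUES (1, 1), (3, 4);
    INSERT INTO payment VALUES (1, 1, 200), (2, 2, 200), (3, 2, 100);
""")

# Aggregate errors and payments per server in derived tables, then join them.
server_totals = conn.execute("""
    SELECT s.id AS server_id,
           COALESCE(e.countErrors, 0) AS countErrors,
           COALESCE(p.sumPayments, 0) AS sumPayments
    FROM server s
    INNER JOIN (SELECT DISTINCT server_id FROM subscription) su
        ON s.id = su.server_id
    LEFT JOIN (SELECT server_id, COUNT(*) AS countErrors
               FROM error GROUP BY server_id) e
        ON s.id = e.server_id
    LEFT JOIN (SELECT server_id, SUM(amount) AS sumPayments
               FROM payment GROUP BY server_id) p
        ON s.id = p.server_id
    WHERE p.sumPayments > 0 OR e.countErrors > 0
    ORDER BY s.id
""").fetchall()
print(server_totals)  # [(1, 1, 200), (2, 0, 300)]
```

Server 3 drops out because both aggregates are NULL for it, and server 4 drops out at the inner join because it has no subscription.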
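The mistake here is putting the COUNT outside of the SELECT; the aggregate needs to go inside the subquery. Going by the test data, the subqueries should also match on server_id rather than id:
(SELECT COUNT(err.id) FROM error err WHERE ser.id = err.server_id) > 0
OR (SELECT SUM(pay.amount) FROM payment pay WHERE ser.id = pay.server_id) > 0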
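One way to both select and filter on such subqueries is to compute them as scalar subqueries in an inner SELECT and apply the condition in an outer query. The sketch below assumes SQLite through Python's sqlite3 module and assumes the subqueries match on server_id, as the test data suggests; the derived-table alias t is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE server (id INTEGER PRIMARY KEY);
    CREATE TABLE subscription (id INTEGER PRIMARY KEY, server_id INTEGER);
    CREATE TABLE error (id INTEGER PRIMARY KEY, server_id INTEGER);
    CREATE TABLE payment (id INTEGER PRIMARY KEY, server_id INTEGER, amount INTEGER);
    INSERT INTO server VALUES (1), (2), (3), (4);
    INSERT INTO subscription VALUES (1, 1), (2, 2), (3, 2), (4, 3), (5, 3);
    INSERT INTO error VALUES (1, 1), (3, 4);
    INSERT INTO payment VALUES (1, 1, 200), (2, 2, 200), (3, 2, 100);
""")

# The inner query names the scalar subqueries; the outer query filters on them.
subquery_totals = conn.execute("""
    SELECT id, countErrors, sumPayments
    FROM (
        SELECT ser.id,
               (SELECT COUNT(*) FROM error err
                WHERE err.server_id = ser.id) AS countErrors,
               (SELECT COALESCE(SUM(pay.amount), 0) FROM payment pay
                WHERE pay.server_id = ser.id) AS sumPayments
        FROM server ser
        WHERE EXISTS (SELECT 1 FROM subscription sub WHERE sub.server_id = ser.id)
    ) t
    WHERE countErrors > 0 OR sumPayments > 0
    ORDER BY id
""").fetchall()
print(subquery_totals)  # [(1, 1, 200), (2, 0, 300)]
```

The EXISTS test replaces the join to subscription, so no DISTINCT is needed to collapse servers with several subscriptions.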

how to bake in a record count in a sql query

I have a query that looks like this:
select id, extension, count(distinct(id)) from publicids group by id,extension;
This is what the results look like:
id | extension | count
-------------+-------------------------+-------
18459154909 | 12333 | 1
18459154909 | 9891114 | 1
18459154919 | 43244 | 1
18459154919 | 8776232 | 1
18766145025 | 12311 | 1
18766145025 | 1122111 | 1
18766145201 | 12422 | 1
18766145201 | 14141 | 1
But what I really want is for the results to look like this:
id | extension | count
-------------+-------------------------+-------
18459154909 | 12333 | 2
18459154909 | 9891114 | 2
18459154919 | 43244 | 2
18459154919 | 8776232 | 2
18766145025 | 12311 | 2
18766145025 | 1122111 | 2
18766145201 | 12422 | 2
18766145201 | 14141 | 2
I'm trying to get the count field to show the total number of records that have the same id.
Any suggestions would be appreciated
I think you want to count distinct extensions, not ids.
Run this query:
select id
, extension
, (select count(*) from publicids p1 where p.id = p1.id ) distinct_id_count
from publicids p
group by id,extension;
This is more or less the same as Pastor's answer. Depending on what the optimizer does, it might be faster on source tables with many records.
select p.id, p.extension, p2.id_count
from publicids p
inner join (
select id, count(*) as id_count
from publicids group by id
) as p2 on p.id = p2.id
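A quick way to confirm that the derived-table join attaches the per-id total to every row is the sketch below. It assumes SQLite through Python's sqlite3 module and integer columns, neither of which is stated in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE publicids (id INTEGER, extension INTEGER);
    INSERT INTO publicids VALUES
        (18459154909, 12333), (18459154909, 9891114),
        (18459154919, 43244), (18459154919, 8776232),
        (18766145025, 12311), (18766145025, 1122111),
        (18766145201, 12422), (18766145201, 14141);
""")

# Count rows per id once in a derived table, then join the count back to each row.
rows_with_counts = conn.execute("""
    SELECT p.id, p.extension, p2.id_count
    FROM publicids p
    INNER JOIN (
        SELECT id, COUNT(*) AS id_count
        FROM publicids
        GROUP BY id
    ) AS p2 ON p.id = p2.id
    ORDER BY p.id, p.extension
""").fetchall()

for row in rows_with_counts:
    print(row)
```

Every one of the eight rows comes back with a count of 2, matching the desired output in the question.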

Doing a market basket analysis on the order details

I have a table that looks (abbreviated) like:
| order_id | item_id | amount | qty | date |
|---------- |--------- |-------- |----- |------------ |
| 1 | 1 | 10 | 1 | 10-10-2014 |
| 1 | 2 | 20 | 2 | 10-10-2014 |
| 2 | 1 | 10 | 1 | 10-12-2014 |
| 2 | 2 | 20 | 1 | 10-12-2014 |
| 2 | 3 | 45 | 1 | 10-12-2014 |
| 3 | 1 | 10 | 1 | 9-9-2014 |
| 3 | 3 | 45 | 1 | 9-9-2014 |
| 4 | 2 | 20 | 1 | 11-11-2014 |
I would like to run a query that would calculate the list of items
that most frequently occur together.
In this case the result would be:
|items|frequency|
|-----|---------|
|1,2 |2 |
|1,3 |1 |
|2,3 |1 |
|2 |1 |
Ideally, first presenting orders with more than one item, then presenting
the most frequently ordered single items.
Could anyone please provide an example for how to structure this SQL?
This query generates all of the requested output for the cases where 2 items occur together. It doesn't include the last item of the requested output, since a single value (2) technically doesn't occur together with anything, although you could easily add a UNION query to include items that occur alone.
This is written for PostgreSQL 9.3
create table orders(
order_id int,
item_id int,
amount int,
qty int,
date timestamp
);
INSERT INTO ORDERS VALUES(1,1,10,1,'10-10-2014');
INSERT INTO ORDERS VALUES(1,2,20,2,'10-10-2014');
INSERT INTO ORDERS VALUES(2,1,10,1,'10-12-2014');
INSERT INTO ORDERS VALUES(2,2,20,1,'10-12-2014');
INSERT INTO ORDERS VALUES(2,3,45,1,'10-12-2014');
INSERT INTO ORDERS VALUES(3,1,10,1,'9-9-2014');
INSERT INTO ORDERS VALUES(3,3,45,1,'9-9-2014');
INSERT INTO ORDERS VALUES(4,2,20,1,'11-11-2014');
with order_pairs as (
select (pg1.item_id, pg2.item_id) as items, pg1.date
from
(select distinct item_id, date
from orders) as pg1
join
(select distinct item_id, date
from orders) as pg2
ON
(
pg1.date = pg2.date AND
pg1.item_id != pg2.item_id AND
pg1.item_id < pg2.item_id
)
)
SELECT items, count(*) as frequency
FROM order_pairs
GROUP by items
ORDER by items;
output
items | frequency
-------+-----------
(1,2) | 2
(1,3) | 2
(2,3) | 1
(3 rows)
Market basket analysis with a self-join.
Join orders to itself on order_id, keeping only the pairs where x.item_id < y.item_id, so that each pair of items sold together appears exactly once per order. Then group by the item pair and count the number of rows for each combination.
select items,count(*) as 'Freq' from
(select concat(x.item_id,',',y.item_id) as items from orders x
JOIN orders y ON x.order_id = y.order_id and
x.item_id != y.item_id and x.item_id < y.item_id) A
group by A.items order by A.items;
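The self-join on order_id can be checked against the sample rows with this sketch. It assumes SQLite through Python's sqlite3 module, so the pair label is built with the || operator instead of concat, and the date column is stored as plain text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id INTEGER, item_id INTEGER, amount INTEGER, qty INTEGER, date TEXT
    );
    INSERT INTO orders VALUES
        (1, 1, 10, 1, '10-10-2014'), (1, 2, 20, 2, '10-10-2014'),
        (2, 1, 10, 1, '10-12-2014'), (2, 2, 20, 1, '10-12-2014'),
        (2, 3, 45, 1, '10-12-2014'),
        (3, 1, 10, 1, '9-9-2014'),   (3, 3, 45, 1, '9-9-2014'),
        (4, 2, 20, 1, '11-11-2014');
""")

# Pair every item with each higher-numbered item in the same order,
# then count how many orders contain each pair.
pair_counts = conn.execute("""
    SELECT x.item_id || ',' || y.item_id AS items, COUNT(*) AS frequency
    FROM orders x
    JOIN orders y
        ON x.order_id = y.order_id AND x.item_id < y.item_id
    GROUP BY x.item_id, y.item_id
    ORDER BY frequency DESC, items
""").fetchall()
print(pair_counts)  # [('1,2', 2), ('1,3', 2), ('2,3', 1)]
```

The condition x.item_id < y.item_id already excludes equal ids, so a separate inequality test is not needed. Order 4 contains a single item and therefore contributes no pairs.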

PostgreSQL select all from one table and join count from table relation

I have two tables, post_categories and posts. I'm trying to select * from post_categories;, but also return a temporary column with the count for each time a post category is used on a post.
Posts
| id | name | post_category_id |
| 1 | test | 1 |
| 2 | nest | 1 |
| 3 | vest | 2 |
| 4 | zest | 3 |
Post Categories
| id | name |
| 1 | cat_1 |
| 2 | cat_2 |
| 3 | cat_3 |
Basically, I'm trying to do this without subqueries and with joins instead. Something like this, but in real psql.
select * from post_categories some-type-of-join posts, count(*)
Resulting in this, ideally.
| id | name | count |
| 1 | cat_1 | 2 |
| 2 | cat_2 | 1 |
| 3 | cat_3 | 1 |
Your help is greatly appreciated :D
You can use a derived table that contains the counts per post_category_id and left join it to the post_categories table:
select p.*, coalesce(t1.p_count,0)
from post_categories p
left join (
select post_category_id, count(*) p_count
from posts
group by post_category_id
) t1 on t1.post_category_id = p.id
select post_categories.id, post_categories.name , count(posts.id)
from post_categories
inner join posts
on post_category_id = post_categories.id
group by post_categories.id, post_categories.name
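The join type matters when a category has no posts. The sketch below compares an inner join with a left join that uses no subquery, assuming SQLite through Python's sqlite3 module; the extra category cat_4 is hypothetical and was added only to show the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post_categories (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, name TEXT, post_category_id INTEGER);
    INSERT INTO post_categories VALUES (1, 'cat_1'), (2, 'cat_2'), (3, 'cat_3');
    -- Hypothetical category with no posts, not part of the question's data.
    INSERT INTO post_categories VALUES (4, 'cat_4');
    INSERT INTO posts VALUES (1, 'test', 1), (2, 'nest', 1), (3, 'vest', 2), (4, 'zest', 3);
""")

QUERY = """
    SELECT pc.id, pc.name, COUNT(p.id) AS post_count
    FROM post_categories pc
    {join_type} JOIN posts p ON p.post_category_id = pc.id
    GROUP BY pc.id, pc.name
    ORDER BY pc.id
"""

# INNER JOIN keeps only categories that have at least one post.
inner_counts = conn.execute(QUERY.format(join_type="INNER")).fetchall()
# LEFT JOIN keeps every category; COUNT(p.id) ignores the NULL from unmatched rows.
left_counts = conn.execute(QUERY.format(join_type="LEFT")).fetchall()

print(inner_counts)  # [(1, 'cat_1', 2), (2, 'cat_2', 1), (3, 'cat_3', 1)]
print(left_counts)   # [(1, 'cat_1', 2), (2, 'cat_2', 1), (3, 'cat_3', 1), (4, 'cat_4', 0)]
```

Counting p.id rather than * is what makes the empty category report 0 instead of 1 under the left join.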