Calculating Win-rates - sql

I have to calculate win-rates of the players from a table (bots_match_history) which has data in the format:
id | username | sub_level_id | bot_type | match_result | system_win_balance | created_at | analyzed | stakes
------+-----------------+--------------+----------+--------------+--------------------+----------------------------+----------+--------
5487 | ashishish | 5 | hard | l | -831 | 2017-11-29 06:26:13.288267 | f | 18
5486 | dilip.kumar | 3 | hard | l | -821 | 2017-11-29 06:25:09.106075 | f | 50
5485 | abhinav.garg | 5 | hard | w | -791 | 2017-11-29 06:24:07.589281 | f | 18
I need to use only entries that haven't been analyzed yet (analyzed=false), and only for levels that have more than 3 such entries.
This is the query I wrote, but for some entries it somehow returns a win-rate of more than 100%:
WITH total AS (
SELECT COUNT(b.match_result) AS total_matches, b.bot_type, sl.level_id, b.stakes
FROM bots_match_history b
JOIN sub_levels sl ON b.sub_level_id = sl.id
WHERE b.analyzed=FALSE
GROUP BY b.bot_type, sl.level_id, b.stakes
HAVING COUNT(b.match_result) >=3
)
SELECT total.bot_type, total.level_id, total.stakes, round(cast(((
SELECT COUNT(b2.*)
FROM bots_match_history b2
JOIN sub_levels sl2 ON b2.sub_level_id = sl2.id
WHERE b2.match_result='w' AND b2.analyzed=FALSE
AND b2.bot_type = total.bot_type AND sl2.level_id = total.level_id
)::FLOAT * 100.0 / total.total_matches) AS NUMERIC), 2)::FLOAT AS win_percentage
FROM total, bots_match_history b3
JOIN sub_levels sl3 ON sl3.id = b3.sub_level_id
WHERE b3.bot_type = total.bot_type AND sl3.level_id=total.level_id
GROUP BY total.bot_type, total.level_id, total.stakes, total.total_matches;
What is wrong in this query that it is returning a win-rate of more than 100%?

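You are grouping total by stakes, but you do not correlate on stakes later. The subquery that counts wins (b2) matches only on bot_type and level_id, so it counts won matches across all stakes, while total_matches counts the matches for a single stakes value. Whenever a bot type and level has been played at several stakes, the numerator can exceed the denominator and the win rate goes above 100%.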
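A minimal way to see this is to rebuild the query shape in SQLite from Python. This is a sketch under assumptions: the table name `matches`, its rows and the helper `win_rates` are hypothetical, `sub_levels` is dropped and `level_id` is stored directly (as in the fiddle), and the `analyzed` filter is left out. With one win at stakes 18 and three wins at stakes 50, the uncorrelated win count is 4 for both groups of 3 matches, so both rates come out above 100%.

```python
import sqlite3

# Hypothetical miniature of bots_match_history: level_id is stored directly
# instead of joining sub_levels, and the analyzed filter is omitted.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matches (bot_type TEXT, level_id INTEGER,"
    " stakes INTEGER, match_result TEXT)"
)
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?, ?)",
    [
        ("hard", 5, 18, "w"), ("hard", 5, 18, "l"), ("hard", 5, 18, "l"),
        ("hard", 5, 50, "w"), ("hard", 5, 50, "w"), ("hard", 5, 50, "w"),
    ],
)


def win_rates(correlate_on_stakes):
    """Hypothetical helper: total_matches per (bot_type, level_id, stakes),
    wins from a correlated subquery that may or may not match on stakes."""
    stakes_filter = "AND m2.stakes = t.stakes" if correlate_on_stakes else ""
    sql = f"""
        WITH total AS (
            SELECT bot_type, level_id, stakes, COUNT(*) AS total_matches
            FROM matches
            GROUP BY bot_type, level_id, stakes
            HAVING COUNT(*) >= 3
        )
        SELECT t.stakes,
               ROUND((SELECT COUNT(*) FROM matches m2
                      WHERE m2.match_result = 'w'
                        AND m2.bot_type = t.bot_type
                        AND m2.level_id = t.level_id
                        {stakes_filter}) * 100.0 / t.total_matches, 2)
        FROM total t
        ORDER BY t.stakes
    """
    return conn.execute(sql).fetchall()


print(win_rates(False))  # numerator counts wins at every stakes value
print(win_rates(True))   # numerator restricted to the row's own stakes
```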
I have made a fiddle, excluding the sub_levels table and using sub_level_id instead for grouping and joins: http://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=b3558b9d3cddc63a47d3f29a4a6c08f6
Fixed SQL should look like this:
WITH total AS (
SELECT COUNT(b.match_result) AS total_matches, b.bot_type, sl.level_id, b.stakes
FROM bots_match_history b
JOIN sub_levels sl ON b.sub_level_id = sl.id
WHERE b.analyzed=FALSE
GROUP BY b.bot_type, sl.level_id, b.stakes
HAVING COUNT(b.match_result) >=3
)
SELECT total.bot_type, total.level_id, total.stakes, round(cast(((
SELECT COUNT(b2.*)
FROM bots_match_history b2
JOIN sub_levels sl2 ON b2.sub_level_id = sl2.id
WHERE b2.match_result='w' AND b2.analyzed=FALSE
AND b2.bot_type = total.bot_type AND sl2.level_id = total.level_id
AND b2.stakes = total.stakes
)::FLOAT * 100.0 / total.total_matches) AS NUMERIC), 2)::FLOAT AS win_percentage
FROM total, bots_match_history b3
JOIN sub_levels sl3 ON sl3.id = b3.sub_level_id
WHERE b3.bot_type = total.bot_type AND sl3.level_id=total.level_id
AND b3.stakes=total.stakes
GROUP BY total.bot_type, total.level_id, total.stakes, total.total_matches;
Also, you might get (IMO) misleading results if you limit yourself to unanalyzed rows only. I saw a 40% win rate for one set of parameters, then checked the actual rows and found 3 won and 3 lost games. That left me puzzled for a while until I noticed that one of the won games was already "analyzed", so only 2 wins out of 5 games were counted.
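An alternative worth considering (a different technique from the fix above, not the answerer's) is to compute wins and totals in the same GROUP BY with conditional aggregation, so the numerator and denominator always come from the same rows and the rate cannot exceed 100%. The sketch below runs in SQLite from Python; the `matches` table and its rows are hypothetical, `analyzed` is stored as 0/1 because SQLite has no separate boolean type, and `sub_levels` is again left out.

```python
import sqlite3

# Hypothetical table; analyzed is 0/1 here instead of a PostgreSQL boolean.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matches (bot_type TEXT, level_id INTEGER, stakes INTEGER,"
    " match_result TEXT, analyzed INTEGER)"
)
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?, ?, ?)",
    [
        ("hard", 5, 18, "w", 0), ("hard", 5, 18, "l", 0),
        ("hard", 5, 18, "l", 0), ("hard", 5, 18, "w", 0),
        ("hard", 5, 50, "w", 0), ("hard", 5, 50, "l", 1),
    ],
)

# Wins and totals are aggregated over the same group of unanalyzed rows.
sql = """
    SELECT bot_type, level_id, stakes,
           COUNT(*) AS total_matches,
           SUM(CASE WHEN match_result = 'w' THEN 1 ELSE 0 END) AS wins,
           ROUND(100.0 * SUM(CASE WHEN match_result = 'w' THEN 1 ELSE 0 END)
                 / COUNT(*), 2) AS win_percentage
    FROM matches
    WHERE analyzed = 0
    GROUP BY bot_type, level_id, stakes
    HAVING COUNT(*) >= 3
"""
rows = conn.execute(sql).fetchall()
print(rows)  # stakes 50 has only one unanalyzed match, so HAVING drops it
```

In PostgreSQL 9.4 and later, the CASE expression can also be written as `COUNT(*) FILTER (WHERE match_result = 'w')`.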

Related

SQL union / join / intersect multiple select statements

I have two select statements. One gets a list (if any) of logged voltage data in the past 60 seconds and related chamber names, and one gets a list (if any) of logged arc event data in the past 5 minutes. I am trying to append the arc count data as new columns to the voltage data table. I cannot figure out how to do this.
Note that there may or may not be arc count rows for a given chamber name in the voltage data table. If there are no rows, I want to set the arc count column value to zero.
Any ideas on how to accomplish this?
Voltage Data:
SELECT DISTINCT dbo.CoatingChambers.Name,
AVG(dbo.CoatingGridVoltage_Data.ChanA_DCVolts) AS ChanADC,
AVG(dbo.CoatingGridVoltage_Data.ChanB_DCVolts) AS ChanBDC,
AVG(dbo.CoatingGridVoltage_Data.ChanA_RFVolts) AS ChanARF,
AVG(dbo.CoatingGridVoltage_Data.ChanB_RFVolts) AS ChanBRF FROM
dbo.CoatingGridVoltage_Data LEFT OUTER JOIN dbo.CoatingChambers ON
dbo.CoatingGridVoltage_Data.CoatingChambersID =
dbo.CoatingChambers.CoatingChambersID WHERE
(dbo.CoatingGridVoltage_Data.DT > DATEADD(second, - 60,
SYSUTCDATETIME())) GROUP BY dbo.CoatingChambers.Name
Returns
Name | ChanADC | ChanBDC | ChanARF | ChanBRF
-----+-------------------+--------------------+---------------------+------------------
OX2 | 2.9099999666214 | -0.485000004371007 | 0.344801843166351 | 0.49748428662618
S2 | 0.100000001490116 | -0.800000016887983 | 0.00690172302226226 | 0.700591623783112
S3 | 4.25666658083598 | 0.5 | 0.96554297208786 | 0.134956782062848
Arc count table:
SELECT CoatingChambers.Name,
SUM(ArcCount) as ArcCount
FROM CoatingChambers
LEFT JOIN CoatingArc_Data
ON dbo.[CoatingArc_Data].CoatingChambersID = dbo.CoatingChambers.CoatingChambersID
where EventDT > DATEADD(mi,-5, GETDATE())
Group by Name
Returns
Name | ArcCount
-----+---------
L1 | 283
L4 | 0
L6 | 1
S2 | 55
To be clear, I want this table (with added arc count column), given the two tables above:
Name | ChanADC | ChanBDC | ChanARF | ChanBRF | ArcCount
-----+-------------------+--------------------+---------------------+-------------------+---------
OX2 | 2.9099999666214 | -0.485000004371007 | 0.344801843166351 | 0.49748428662618 | 0
S2 | 0.100000001490116 | -0.800000016887983 | 0.00690172302226226 | 0.700591623783112 | 55
S3 | 4.25666658083598 | 0.5 | 0.96554297208786 | 0.134956782062848 | 0
You can treat the select statements as virtual tables and just join them together:
select
x.Name,
x.ChanADC,
x.ChanBDC,
x.ChanARF,
x.ChanBRF,
isnull( y.ArcCount, 0 ) ArcCount
from
(
select
cc.Name,
AVG(cgv.ChanA_DCVolts) AS ChanADC,
AVG(cgv.ChanB_DCVolts) AS ChanBDC,
AVG(cgv.ChanA_RFVolts) AS ChanARF,
AVG(cgv.ChanB_RFVolts) AS ChanBRF
from
dbo.CoatingGridVoltage_Data cgv
left outer join
dbo.CoatingChambers cc
on
cgv.CoatingChambersID = cc.CoatingChambersID
where
cgv.DT > dateadd(second, - 60, sysutcdatetime())
group by
cc.Name
) as x
left outer join
(
select
cc.Name,
sum(ac.ArcCount) as ArcCount
from
dbo.CoatingChambers cc
left outer join
dbo.CoatingArc_Data ac
on
ac.CoatingChambersID = cc.CoatingChambersID
where
ac.EventDT > dateadd(mi, -5, getdate())
group by
cc.Name
) as y
on
x.Name = y.Name
Also, it's worthwhile to shorten your names with aliases and to format the queries for readability, which I shamelessly took a stab at.
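The same pattern (two aggregated derived tables combined with a LEFT OUTER JOIN, with missing counts defaulted to zero) can be tried end to end in SQLite. This is a sketch with hypothetical, heavily simplified tables and made-up sample values: only one voltage channel is kept, the time-window filters are omitted, and COALESCE stands in for SQL Server's ISNULL (SQL Server supports both).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for CoatingChambers, CoatingGridVoltage_Data
# and CoatingArc_Data; the values are illustrative only.
conn.executescript("""
    CREATE TABLE chambers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE voltage_data (chamber_id INTEGER, chan_a_dc REAL);
    CREATE TABLE arc_data (chamber_id INTEGER, arc_count INTEGER);
    INSERT INTO chambers VALUES (1, 'OX2'), (2, 'S2'), (3, 'S3'), (4, 'L1');
    INSERT INTO voltage_data VALUES (1, 2.0), (1, 4.0), (2, 0.5), (3, 1.0);
    INSERT INTO arc_data VALUES (2, 50), (2, 5), (4, 283);
""")

sql = """
    SELECT x.name, x.chan_a_dc, COALESCE(y.arc_count, 0) AS arc_count
    FROM (
        -- x: one row per chamber that has voltage readings
        SELECT c.name, AVG(v.chan_a_dc) AS chan_a_dc
        FROM voltage_data v
        JOIN chambers c ON c.id = v.chamber_id
        GROUP BY c.name
    ) AS x
    LEFT OUTER JOIN (
        -- y: total arcs per chamber; chambers without arcs are absent
        SELECT c.name, SUM(a.arc_count) AS arc_count
        FROM arc_data a
        JOIN chambers c ON c.id = a.chamber_id
        GROUP BY c.name
    ) AS y ON x.name = y.name
    ORDER BY x.name
"""
rows = conn.execute(sql).fetchall()
print(rows)
```

Because x drives the outer join, a chamber like L1 that has arcs but no voltage readings is left out, which matches the desired output in the question.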

SQL select only highest date

For a project I want to generate a price list.
I want to get only the latest prices from each supplier for each article.
There are just these two tables:
Table articles
ARTNR | TXT | ACTIVE | SUPPLIER
------------------------------------------
10 | APPLE | Y | 10
20 | ORANGE | Y | 10
30 | KEYBOARD | N | 20
40 | ORANGE | Y | 20
50 | BANANA | Y | 10
60 | CHERRY | Y | 10
Table prices
ARTNR | PRCGRP | PRCDAT | PRICE
--------------------------------------
10 | 10 | 01-Aug-10 | 2.1
10 | 10 | 05-Aug-11 | 2.2
10 | 10 | 21-Aug-12 | 2.5
20 | 0 | 01-Aug-10 | 2.1
20 | 10 | 09-Aug-12 | 2.3
10 | 10 | 14-Aug-13 | 2.7
This is what I have so far:
SELECT
ARTICLES.[ARTNR], ARTICLES.[TXT], ARTICLES.[ACTIVE], ARTICLES.[SUPPLIER], PRICES.PRCGRP, PRICES.PRCDAT, PRICES.PRICE
FROM
ARTICLES INNER JOIN PRICES ON ARTICLES.ARTNR = PRICES.ARTNR
WHERE
(
(ARTICLES.[ACTIVE]="Y") AND
(ARTICLES.[SUPPLIER]=10) AND
(PRICES.PRCGRP=0) AND
(PRICES.PRCDAT=(SELECT MAX(PRCDAT) FROM PRICES as art WHERE art.ARTNR = PRICES.artnr) )
)
ORDER BY ARTICLES.ARTNR
;
It is okay to choose just one supplier each time, but I want the max price.
The problem is that lots of articles do not show up with the query above, but I cannot figure out what is wrong.
I can see that they should be in the resultset when I leave out the subselect on max prcdat.
What is wrong?
Your subquery for the latest price does not take the other conditions into account: it returns the article's latest price date in any price group. When that date belongs to a price in another group, no row in your filtered list (which only contains prices in a single price group) has that date, so the article drops out. For example, with PRCGRP=0 the latest price date for article 20 is 09-Aug-12, which belongs to price group 10, while its only group-0 price is dated 01-Aug-10.
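The effect can be reproduced with the question's sample data in SQLite. This sketch assumes dates are stored as ISO 'YYYY-MM-DD' text so that MAX orders them chronologically; the helper `price_list` is hypothetical and only swaps the filter inside the subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample data from the question, with dates rewritten as ISO strings.
conn.executescript("""
    CREATE TABLE articles (artnr INTEGER, txt TEXT, active TEXT, supplier INTEGER);
    CREATE TABLE prices (artnr INTEGER, prcgrp INTEGER, prcdat TEXT, price REAL);
    INSERT INTO articles VALUES
        (10, 'APPLE', 'Y', 10), (20, 'ORANGE', 'Y', 10), (30, 'KEYBOARD', 'N', 20),
        (40, 'ORANGE', 'Y', 20), (50, 'BANANA', 'Y', 10), (60, 'CHERRY', 'Y', 10);
    INSERT INTO prices VALUES
        (10, 10, '2010-08-01', 2.1), (10, 10, '2011-08-05', 2.2),
        (10, 10, '2012-08-21', 2.5), (20, 0, '2010-08-01', 2.1),
        (20, 10, '2012-08-09', 2.3), (10, 10, '2013-08-14', 2.7);
""")


def price_list(subquery_filters):
    """Hypothetical helper: the question's query shape, with extra
    conditions optionally added inside the MAX(prcdat) subquery."""
    return conn.execute(f"""
        SELECT a.artnr, p.prcgrp, p.prcdat, p.price
        FROM articles a JOIN prices p ON a.artnr = p.artnr
        WHERE a.active = 'Y' AND a.supplier = 10 AND p.prcgrp = 0
          AND p.prcdat = (SELECT MAX(p2.prcdat) FROM prices p2
                          WHERE p2.artnr = p.artnr {subquery_filters})
        ORDER BY a.artnr
    """).fetchall()


# Original shape: for article 20 the subquery finds the group-10 date,
# which never equals the date of its group-0 row.
print(price_list(""))
# With the price-group condition repeated inside the subquery.
print(price_list("AND p2.prcgrp = 0"))
```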
You need to either duplicate your conditions inside the subquery or, better, move them there, so that it finds the latest price under those same conditions. I can't test against Access, but something like this should be possible if its SQL is not too limited:
SELECT a.artnr, a.txt, a.active, a.supplier, p.prcgrp, p.prcdat, p.price
FROM (articles a INNER JOIN prices p ON a.artnr = p.artnr)
INNER JOIN (
SELECT a2.artnr, MAX(p2.prcdat) AS maxdat
FROM articles a2 INNER JOIN prices p2 ON a2.artnr = p2.artnr
WHERE a2.active='Y' AND a2.supplier=10 AND p2.prcgrp=10
GROUP BY a2.artnr) AS z
ON (a.artnr = z.artnr AND p.prcdat = z.maxdat)
ORDER BY a.artnr;
If Access won't allow a join with a subquery, you can instead move the conditions inside your existing subquery, something like:
SELECT a.artnr, a.txt, a.active, a.supplier, p.prcgrp, p.prcdat, p.price
FROM articles a INNER JOIN prices p ON a.ARTNR = p.ARTNR
WHERE p.prcdat = (
SELECT MAX(p2.prcdat)
FROM articles a2 INNER JOIN prices p2 ON a2.artnr = p2.artnr
WHERE a.artnr = a2.artnr AND a2.active='Y' AND a2.supplier=10 AND p2.prcgrp=10
)
ORDER BY a.ARTNR;
Note that because prices has no primary key to identify a unique price, the queries may return duplicates if several prices for the same article share the same prcdat. If that's a problem, you'll probably need to duplicate your conditions outside the subquery too.
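The derived-table version can be checked against the question's sample data in SQLite. Assumptions: dates are stored as ISO 'YYYY-MM-DD' text so that MAX orders them chronologically, and SQLite is used instead of Access, so the parenthesised join nesting Access expects is optional here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample data from the question, with dates rewritten as ISO strings.
conn.executescript("""
    CREATE TABLE articles (artnr INTEGER, txt TEXT, active TEXT, supplier INTEGER);
    CREATE TABLE prices (artnr INTEGER, prcgrp INTEGER, prcdat TEXT, price REAL);
    INSERT INTO articles VALUES
        (10, 'APPLE', 'Y', 10), (20, 'ORANGE', 'Y', 10), (30, 'KEYBOARD', 'N', 20),
        (40, 'ORANGE', 'Y', 20), (50, 'BANANA', 'Y', 10), (60, 'CHERRY', 'Y', 10);
    INSERT INTO prices VALUES
        (10, 10, '2010-08-01', 2.1), (10, 10, '2011-08-05', 2.2),
        (10, 10, '2012-08-21', 2.5), (20, 0, '2010-08-01', 2.1),
        (20, 10, '2012-08-09', 2.3), (10, 10, '2013-08-14', 2.7);
""")

# z holds, per article, the latest date among prices that satisfy all
# conditions; joining on it keeps only that latest price row.
sql = """
    SELECT a.artnr, a.txt, p.prcgrp, p.prcdat, p.price
    FROM articles a
    INNER JOIN prices p ON a.artnr = p.artnr
    INNER JOIN (
        SELECT a2.artnr, MAX(p2.prcdat) AS maxdat
        FROM articles a2 INNER JOIN prices p2 ON a2.artnr = p2.artnr
        WHERE a2.active = 'Y' AND a2.supplier = 10 AND p2.prcgrp = 10
        GROUP BY a2.artnr
    ) AS z ON a.artnr = z.artnr AND p.prcdat = z.maxdat
    ORDER BY a.artnr
"""
rows = conn.execute(sql).fetchall()
print(rows)
```

Articles 50 and 60 have no prices at all, so they do not appear in either version.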

Select multiple (non-aggregate function) columns with GROUP BY

I am trying to select the max value from one column, while grouping by another non-unique id column which has multiple duplicate values. The original database looks something like:
mukey | comppct_r | name | type
65789 | 20 | a | 7n
65789 | 15 | b | 8m
65789 | 1 | c | 1o
65790 | 10 | a | 7n
65790 | 26 | b | 8m
65790 | 5 | c | 1o
...
This works just fine using:
SELECT c.mukey, Max(c.comppct_r) AS ComponentPercent
FROM c
GROUP BY c.mukey;
Which returns a table like:
mukey | ComponentPercent
65789 | 20
65790 | 26
65791 | 50
65792 | 90
I want to add other columns, such as name and type, to the output without affecting the GROUP BY, so that the output table looks like:
mukey | comppct_r | name | type
65789 | 20 | a | 7n
65790 | 26 | b | 8m
65791 | 50 | c | 7n
65792 | 90 | d | 7n
but it always outputs an error saying I need to use an aggregate function with select statement. How should I go about doing this?
You have yourself a greatest-n-per-group problem. This is one of the possible solutions:
select yt.mukey, yt.comppct_r, yt.name, yt.type
from c yt
inner join (
select mukey, max(comppct_r) as comppct_r
from c
group by mukey
) ss on yt.mukey = ss.mukey and yt.comppct_r = ss.comppct_r;
Another possible approach, same output:
select c1.*
from c c1
left outer join c c2
on (c1.mukey = c2.mukey and c1.comppct_r < c2.comppct_r)
where c2.mukey is null;
There's a comprehensive and explanatory answer on the topic here: SQL Select only rows with Max Value on a Column
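Both approaches can be compared on the question's sample rows in SQLite. This is a minimal sketch; the table is named `c` as in the question, and only the six sample rows are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE c (mukey INTEGER, comppct_r INTEGER, name TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO c VALUES (?, ?, ?, ?)",
    [
        (65789, 20, "a", "7n"), (65789, 15, "b", "8m"), (65789, 1, "c", "1o"),
        (65790, 10, "a", "7n"), (65790, 26, "b", "8m"), (65790, 5, "c", "1o"),
    ],
)

# Approach 1: join each row back to its group's maximum.
join_max = """
    SELECT yt.mukey, yt.comppct_r, yt.name, yt.type
    FROM c yt
    INNER JOIN (SELECT mukey, MAX(comppct_r) AS comppct_r
                FROM c GROUP BY mukey) ss
        ON yt.mukey = ss.mukey AND yt.comppct_r = ss.comppct_r
    ORDER BY yt.mukey
"""
# Approach 2: keep rows for which no larger comppct_r exists in the group.
anti_join = """
    SELECT c1.mukey, c1.comppct_r, c1.name, c1.type
    FROM c c1
    LEFT OUTER JOIN c c2
        ON c1.mukey = c2.mukey AND c1.comppct_r < c2.comppct_r
    WHERE c2.mukey IS NULL
    ORDER BY c1.mukey
"""
by_join = conn.execute(join_max).fetchall()
by_anti_join = conn.execute(anti_join).fetchall()
print(by_join)
print(by_anti_join)
```

If two rows in a group tie for the maximum, both approaches return both rows.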
Any non-aggregate column must appear in the GROUP BY clause. Why?
t1
x1 y1 z1
1 2 5
2 2 7
Now suppose you write a query like:
select x1,y1,max(z1) from t1 group by y1;
This query returns only one row, but what should the value of x1 be: 1 or 2? It is undefined, so standard SQL (and most databases) reject the query with an error.
You can either apply an aggregate function to x1 or add x1 to the GROUP BY; which one depends on your requirements.
If you want all rows together with the maximum of z1 per y1, you can use a correlated subquery:
select x1, y1, (select max(z1) from t1 where t1.y1 = tt.y1)
from t1 tt;
This will produce a result like:
x1 y1 max(z1)
1 2 7
2 2 7
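The correlated subquery can be run as-is on the two example rows of t1 in SQLite. This is a minimal sketch; the column alias `max_z1` is added only for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (x1 INTEGER, y1 INTEGER, z1 INTEGER);
    INSERT INTO t1 VALUES (1, 2, 5), (2, 2, 7);
""")

# For each outer row tt, the subquery scans t1 for rows with the same y1
# and returns their maximum z1, so no GROUP BY is needed in the outer query.
rows = conn.execute("""
    SELECT x1, y1, (SELECT MAX(z1) FROM t1 WHERE t1.y1 = tt.y1) AS max_z1
    FROM t1 tt
    ORDER BY x1
""").fetchall()
print(rows)
```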
Try using a derived table as follows:
SELECT vt.*, c.name
FROM (
SELECT c.mukey, MAX(c.comppct_r) AS ComponentPercent
FROM c
GROUP BY c.mukey
) AS vt, c
WHERE vt.mukey = c.mukey AND vt.ComponentPercent = c.comppct_r;
You can't just add additional columns without adding them to the GROUP BY or applying an aggregate function. The reason is that the values of a column can differ within one group. For example, you could have two rows:
mukey | comppct_r | name | type
65789 | 20 | a | 7n
65789 | 20 | b | 9f
What should the aggregated group look like for the name and type columns?
If name and type are always the same within a group, just add them to the GROUP BY clause:
SELECT c.mukey, c.name, c.type, MAX(c.comppct_r) AS ComponentPercent
FROM c
GROUP BY c.mukey, c.name, c.type;
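Use a HAVING clause. This only runs on databases that accept non-aggregated columns outside the GROUP BY (for example MySQL with ONLY_FULL_GROUP_BY disabled), and even there the c.comppct_r compared in the HAVING comes from an indeterminate row of each group, so it can silently drop groups: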
SELECT *
FROM c
GROUP BY c.mukey
HAVING c.comppct_r = Max(c.comppct_r);

join on three tables? Error in phpMyAdmin

I'm trying to use a join on three tables query I found in another post (post #5 here). When I try to use this in the SQL tab of one of my tables in phpMyAdmin, it gives me an error:
#1066 - Not unique table/alias: 'm'
The exact query I'm trying to use is:
select r.*,m.SkuAbbr, v.VoucherNbr from arrc_RedeemActivity r, arrc_Merchant m, arrc_Voucher v
LEFT OUTER JOIN arrc_Merchant m ON (r.MerchantID = m.MerchantID)
LEFT OUTER JOIN arrc_Voucher v ON (r.VoucherID = v.VoucherID)
I'm not entirely certain it will do what I need it to do or that I'm using the right kind of join (my grasp of SQL is pretty limited at this point), but I was hoping to at least see what it produced.
(What I'm trying to do, if anyone cares to assist, is get all columns from arrc_RedeemActivity, plus SkuAbbr from arrc_Merchant where the merchant IDs match in those two tables, plus VoucherNbr from arrc_Voucher where VoucherIDs match in those two tables.)
Edited to add table samples
Table arrc_RedeemActivity
RedeemID | VoucherID | MerchantID | RedeemAmt
----------------------------------------------
1 | 2 | 3 | 25
2 | 6 | 5 | 50
Table arrc_Merchant
MerchantID | SkuAbbr
---------------------
3 | abc
5 | def
Table arrc_Voucher
VoucherID | VoucherNbr
-----------------------
2 | 12345
6 | 23456
So ideally, what I'd like to get back would be:
RedeemID | VoucherID | MerchantID | RedeemAmt | SkuAbbr | VoucherNbr
-----------------------------------------------------------------------
1 | 2 | 3 | 25 | abc | 12345
2 | 6 | 5 | 50 | def | 23456
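The problem is that you had duplicate table references: arrc_Merchant m and arrc_Voucher v appear both in the comma-separated FROM list and in the LEFT OUTER JOINs. Referencing the same table twice would work on its own, but each reference needs a unique alias, which is why MySQL reports #1066 - Not unique table/alias: 'm'.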
If you want to only see rows where there are supporting records in both tables, use:
SELECT r.*,
m.SkuAbbr,
v.VoucherNbr
FROM arrc_RedeemActivity r
JOIN arrc_Merchant m ON m.merchantid = r.merchantid
JOIN arrc_Voucher v ON v.voucherid = r.voucherid
This will show NULL for the m and v references that don't have a match based on the JOIN criteria:
SELECT r.*,
m.SkuAbbr,
v.VoucherNbr
FROM arrc_RedeemActivity r
LEFT JOIN arrc_Merchant m ON m.merchantid = r.merchantid
LEFT JOIN arrc_Voucher v ON v.voucherid = r.voucherid
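Both versions can be tried on the question's sample tables in SQLite. To show the difference, this sketch adds one hypothetical redemption, (3, 7, 9, 10), whose merchant and voucher do not exist: the inner-join version drops it, while the LEFT JOIN version keeps it with NULL (None in Python) for SkuAbbr and VoucherNbr.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE arrc_RedeemActivity (RedeemID INTEGER, VoucherID INTEGER,
                                      MerchantID INTEGER, RedeemAmt INTEGER);
    CREATE TABLE arrc_Merchant (MerchantID INTEGER, SkuAbbr TEXT);
    CREATE TABLE arrc_Voucher (VoucherID INTEGER, VoucherNbr INTEGER);
    -- Rows 1 and 2 come from the question; row 3 is hypothetical.
    INSERT INTO arrc_RedeemActivity VALUES (1, 2, 3, 25), (2, 6, 5, 50), (3, 7, 9, 10);
    INSERT INTO arrc_Merchant VALUES (3, 'abc'), (5, 'def');
    INSERT INTO arrc_Voucher VALUES (2, 12345), (6, 23456);
""")

query = """
    SELECT r.*, m.SkuAbbr, v.VoucherNbr
    FROM arrc_RedeemActivity r
    {kind} JOIN arrc_Merchant m ON m.MerchantID = r.MerchantID
    {kind} JOIN arrc_Voucher v ON v.VoucherID = r.VoucherID
    ORDER BY r.RedeemID
"""
inner_rows = conn.execute(query.format(kind="INNER")).fetchall()
left_rows = conn.execute(query.format(kind="LEFT")).fetchall()
print(inner_rows)
print(left_rows)
```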

MS Access Pass Through Query find duplicates using multiple tables

I'm trying to find all coverage_set_id values that have more than one benefit_id attached to summary_attribute 2004687.
The query seems to work fine without the GROUP BY and HAVING parts, but once I add those lines (for the COUNT) my results are incorrect. I'm just trying to get the duplicate coverage_set_id values.
Pass-through query via ODBC:
SELECT DISTINCT
b.coverage_set_id,
COUNT (b.coverage_set_id) AS "COUNT"
FROM
coverage_set_detail_view a
JOIN contracts_by_sub_group_view b ON b.coverage_set_id = a.coverage_set_id
JOIN request c ON c.request_id = b.request_id
WHERE
b.valid_from_date BETWEEN to_date('10/01/2010','mm/dd/yyyy')
AND to_date('12/01/2010','mm/dd/yyyy')
AND c.request_status = 1463
AND summary_attribute = 2004687
AND benefit_id <> 1092333
GROUP BY
b.coverage_set_id
HAVING
COUNT (b.coverage_set_id) > 1
My results look like this:
-----------------------
COVERAGE_SET_ID | COUNT
-----------------------
4193706 | 8
4197052 | 8
4193926 | 112
4197078 | 96
4174168 | 8
I'm expecting all the COUNTs to be 2.
::EDIT::
Solution:
SELECT
c.coverage_set_id AS "COVERAGE SET ID",
c1.description AS "Summary Attribute",
count(d.benefit_id) AS "COUNT"
FROM (
SELECT DISTINCT coverage_set_id
FROM contracts_by_sub_group_view
WHERE
valid_from_date BETWEEN '01-OCT-2010' AND '01-DEC-2010'
AND request_id IN (
SELECT request_id
FROM request
WHERE request_status = 1463)
) a
JOIN coverage_set_master e ON e.coverage_set_id = a.coverage_set_id
JOIN coverage_set_detail c ON c.coverage_set_id = a.coverage_set_id
JOIN benefit_summary d ON d.benefit_id = c.benefit_id
AND d.coverage_type = e.coverage_type
JOIN codes c1 ON c1.code_id = d.summary_attribute
WHERE
d.summary_attribute IN (2004687, 2004688)
AND summary_structure = 1000217
GROUP BY c.coverage_set_id, c1.description
HAVING COUNT(d.benefit_id) > 1
ORDER BY c.coverage_set_id, c1.description
And these were the results:
COVERAGE SET ID | SUMMARY ATTRIBUTE | COUNT
-------------------------------------------------
4174168 | INPATIENT | 2
4174172 | INPATIENT | 2
4191828 | INPATIENT | 2
4191832 | INPATIENT | 2
4191833 | INPATIENT | 2
4191834 | INPATIENT | 2
4191838 | INPATIENT | 2
4191842 | INPATIENT | 2
4191843 | INPATIENT | 2
4191843 | OUTPATIENT | 2
4191844 | INPATIENT | 2
4191844 | OUTPATIENT | 2
In both the HAVING clause and the COUNT in the SELECT list, count benefit_id instead of coverage_set_id.
Since benefit_id is also in table a, you can do the following:
SELECT
a.coverage_set_id,
COUNT (a.benefit_id) AS "COUNT"
FROM
coverage_set_detail_view a
WHERE
a.coverage_set_id in (
SELECT b.coverage_set_id
FROM contracts_by_sub_group_view b
INNER JOIN request c on c.request_id = b.request_id
WHERE b.valid_from_date BETWEEN to_date('10/01/2010','mm/dd/yyyy') AND to_date('12/01/2010','mm/dd/yyyy')
AND c.request_status = 1463)
AND ?.summary_attribute = 2004687
AND a.benefit_id <> 1092333
GROUP BY
a.coverage_set_id
HAVING
COUNT (a.benefit_id) > 1
This removes the row multiplication that the JOINs in the FROM clause were causing: those tables are not needed to retrieve coverage_set_id and benefit_id, and joining them repeated each benefit row once per matching contract/request row, which inflated the counts. The only remaining need for the other 2 tables is to filter data based on criteria, which is done in the WHERE clause.
I'm not sure which table summary_attribute lives in, but it would follow a similar pattern to valid_from_date, request_status, or benefit_id.
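The fan-out can be reproduced in SQLite with a hypothetical, simplified schema: `detail` stands in for coverage_set_detail_view, `contracts` for contracts_by_sub_group_view, and `request` keeps only the status column. One coverage set with 2 benefits and 4 matching contract rows counts 2 × 4 = 8 when joined, but 2 when the other tables are only used for filtering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in tables and values, not the real views.
conn.executescript("""
    CREATE TABLE detail (coverage_set_id INTEGER, benefit_id INTEGER);
    CREATE TABLE contracts (coverage_set_id INTEGER, request_id INTEGER);
    CREATE TABLE request (request_id INTEGER, request_status INTEGER);
    INSERT INTO detail VALUES (1, 100), (1, 200);
    INSERT INTO contracts VALUES (1, 10), (1, 11), (1, 12), (1, 13);
    INSERT INTO request VALUES (10, 1463), (11, 1463), (12, 1463), (13, 1463);
""")

# Joining repeats each detail row once per matching contract row.
joined = conn.execute("""
    SELECT a.coverage_set_id, COUNT(a.benefit_id)
    FROM detail a
    JOIN contracts b ON b.coverage_set_id = a.coverage_set_id
    JOIN request c ON c.request_id = b.request_id
    WHERE c.request_status = 1463
    GROUP BY a.coverage_set_id
    HAVING COUNT(a.benefit_id) > 1
""").fetchall()

# Filtering with IN keeps exactly one row per detail row.
filtered = conn.execute("""
    SELECT a.coverage_set_id, COUNT(a.benefit_id)
    FROM detail a
    WHERE a.coverage_set_id IN (
        SELECT b.coverage_set_id
        FROM contracts b
        JOIN request c ON c.request_id = b.request_id
        WHERE c.request_status = 1463)
    GROUP BY a.coverage_set_id
    HAVING COUNT(a.benefit_id) > 1
""").fetchall()
print(joined, filtered)
```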