SQL union / join / intersect multiple select statements

I have two select statements. One gets a list (if any) of logged voltage data in the past 60 seconds along with the related chamber names, and one gets a list (if any) of logged arc event data in the past 5 minutes. I am trying to append the arc count data as a new column to the voltage data table, and I cannot figure out how to do it.
Note that there may or may not be arc count rows for a given chamber name in the voltage data table. If there are none, I want the arc count column value to be zero.
Any ideas on how to accomplish this?
Voltage Data:
SELECT DISTINCT dbo.CoatingChambers.Name,
    AVG(dbo.CoatingGridVoltage_Data.ChanA_DCVolts) AS ChanADC,
    AVG(dbo.CoatingGridVoltage_Data.ChanB_DCVolts) AS ChanBDC,
    AVG(dbo.CoatingGridVoltage_Data.ChanA_RFVolts) AS ChanARF,
    AVG(dbo.CoatingGridVoltage_Data.ChanB_RFVolts) AS ChanBRF
FROM dbo.CoatingGridVoltage_Data
LEFT OUTER JOIN dbo.CoatingChambers
    ON dbo.CoatingGridVoltage_Data.CoatingChambersID = dbo.CoatingChambers.CoatingChambersID
WHERE dbo.CoatingGridVoltage_Data.DT > DATEADD(second, -60, SYSUTCDATETIME())
GROUP BY dbo.CoatingChambers.Name
Returns
Name | ChanADC | ChanBDC | ChanARF | ChanBRF
-----+-------------------+--------------------+---------------------+------------------
OX2 | 2.9099999666214 | -0.485000004371007 | 0.344801843166351 | 0.49748428662618
S2 | 0.100000001490116 | -0.800000016887983 | 0.00690172302226226 | 0.700591623783112
S3 | 4.25666658083598 | 0.5 | 0.96554297208786 | 0.134956782062848
Arc count table:
SELECT CoatingChambers.Name,
SUM(ArcCount) as ArcCount
FROM CoatingChambers
LEFT JOIN CoatingArc_Data
ON dbo.[CoatingArc_Data].CoatingChambersID = dbo.CoatingChambers.CoatingChambersID
where EventDT > DATEADD(mi,-5, GETDATE())
Group by Name
Returns
Name | ArcCount
-----+---------
L1 | 283
L4 | 0
L6 | 1
S2 | 55
To be clear, I want this table (with added arc count column), given the two tables above:
Name | ChanADC | ChanBDC | ChanARF | ChanBRF | ArcCount
-----+-------------------+--------------------+---------------------+-------------------+---------
OX2 | 2.9099999666214 | -0.485000004371007 | 0.344801843166351 | 0.49748428662618 | 0
S2 | 0.100000001490116 | -0.800000016887983 | 0.00690172302226226 | 0.700591623783112 | 55
S3 | 4.25666658083598 | 0.5 | 0.96554297208786 | 0.134956782062848 | 0

You can treat the select statements as virtual tables and just join them together:
select
x.Name,
x.ChanADC,
x.ChanBDC,
x.ChanARF,
x.ChanBRF,
isnull( y.ArcCount, 0 ) ArcCount
from
(
select distinct
cc.Name,
AVG(cgv.ChanA_DCVolts) AS ChanADC,
AVG(cgv.ChanB_DCVolts) AS ChanBDC,
AVG(cgv.ChanA_RFVolts) AS ChanARF,
AVG(cgv.ChanB_RFVolts) AS ChanBRF
from
dbo.CoatingGridVoltage_Data cgv
left outer join
dbo.CoatingChambers cc
on
cgv.CoatingChambersID = cc.CoatingChambersID
where
cgv.DT > dateadd(second, - 60, sysutcdatetime())
group by
cc.Name
) as x
left outer join
(
select
cc.Name,
sum(ac.ArcCount) as ArcCount
from
dbo.CoatingChambers cc
left outer join
dbo.CoatingArc_Data ac
on
ac.CoatingChambersID = cc.CoatingChambersID
where
EventDT > dateadd(mi,-5, getdate())
group by
Name
) as y
on
x.Name = y.Name
Also, it's worthwhile to simplify your names with aliases and format the queries for readability...which I shamelessly took a stab at.
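The same pattern can also be written with CTEs instead of inline derived tables, which some find easier to read; a minimal sketch using the same tables and logic as above:
with volts as (
    select
        cc.Name,
        avg(cgv.ChanA_DCVolts) as ChanADC,
        avg(cgv.ChanB_DCVolts) as ChanBDC,
        avg(cgv.ChanA_RFVolts) as ChanARF,
        avg(cgv.ChanB_RFVolts) as ChanBRF
    from dbo.CoatingGridVoltage_Data cgv
    left outer join dbo.CoatingChambers cc
        on cgv.CoatingChambersID = cc.CoatingChambersID
    where cgv.DT > dateadd(second, -60, sysutcdatetime())
    group by cc.Name
),
arcs as (
    select cc.Name, sum(ac.ArcCount) as ArcCount
    from dbo.CoatingChambers cc
    left outer join dbo.CoatingArc_Data ac
        on ac.CoatingChambersID = cc.CoatingChambersID
    where ac.EventDT > dateadd(mi, -5, getdate())
    group by cc.Name
)
select v.Name, v.ChanADC, v.ChanBDC, v.ChanARF, v.ChanBRF,
    isnull(a.ArcCount, 0) as ArcCount
from volts v
left outer join arcs a
    on a.Name = v.Name;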

Related

How to integrate over segments using SQL

I have a table with columns t_b; t_e; x, where [t_b, t_e) denotes a period during which x resources were used. I want to compute a table where, for each hour h, I have the amount of resources that were used during the [h, h+1) period.
So far my only idea was to generate multiple rows from each input row, one per hour (I use an extension of SQL with UDFs), and then simply group by hour, but I'm afraid this may be too slow considering the large amount of data at hand.
Say for example I have a table with two rows:
+-----+-----+---+
| t_b | t_e | x |
+-----+-----+---+
| 1 | 3.5 | a |
| 0.5 | 4 | b |
+-----+-----+---+
Then resulting table should be:
+---+-------------+
| h | x |
+---+-------------+
| 0 | 0*a + 0.5*b |
| 1 | 1*a + 1*b |
| 2 | 1*a + 1*b |
| 3 | 0.5*a + 1*b |
+---+-------------+
You can have a trigger on insert into the stats table that also adds to the aggregate table (the per-hour sums).
If you also need to convert the existing data, you need to run over every row of your current table, split it into amounts/hours and add to the aggregate table.
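A minimal sketch of such a trigger, assuming hypothetical names dbo.Usage (t_b, t_e, x) for the raw table and dbo.HourlyUsage (h, usage) for the aggregate table:
create trigger trg_Usage_Insert on dbo.Usage
after insert
as
begin
    set nocount on;
    -- split every inserted [t_b, t_e) interval into per-hour contributions
    -- and accumulate them into the per-hour sums
    merge dbo.HourlyUsage as tgt
    using (
        select h.eoh - 1 as h,
               sum(i.x * (case when i.t_e < h.eoh then i.t_e else h.eoh end
                        - case when i.t_b > h.eoh - 1 then i.t_b else h.eoh - 1 end)) as usage
        from (select top(24) row_number() over(order by (select null)) as eoh
              from sys.all_objects) as h
        join inserted as i on i.t_e > h.eoh - 1 and h.eoh > i.t_b
        group by h.eoh
    ) as src on tgt.h = src.h
    when matched then update set tgt.usage = tgt.usage + src.usage
    when not matched then insert (h, usage) values (src.h, src.usage);
end;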
Here is a SQL Server example of the set-based calculation, using numeric values in place of a and b:
with h as (
    -- your hours tally here
    select top(24) row_number() over(order by (select null)) as eoh
    from sys.all_objects
), myTable as (
    select 1 t_b, 3.5 t_e, 20 v union all
    select 0.5, 4, 40
)
select eoh - 1 as h_start
    , sum(v * (case when t_e < eoh then t_e else eoh end
             - case when t_b > eoh - 1 then t_b else eoh - 1 end)) as usage
from h
left join myTable t on t_e > eoh - 1 and eoh > t_b -- [..) intersection with [..)
group by eoh;
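As a quick check against the expected output above (with a = 20 and b = 40): for h = 0 only the second interval overlaps, over [0.5, 1), contributing 0.5 * 40 = 20; for h = 1 both intervals cover the whole of [1, 2), giving 1 * 20 + 1 * 40 = 60.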

Columns to Rows Two Tables in Cross Apply

Using SQL Server, I have two tables (samples below). Table #T1 has well over a million rows in the database; Table #T2 has 100 rows. Both tables hold the sizes as columns, and I need to pivot them to rows and join the two.
Can I get it all in one query with CROSS APPLY and remove the CTE?
This is my code. I get the correct output, but is this the most efficient way to do it, considering the number of rows?
with cte_sizes
as
(
select SizeRange,Size,ColumnPosition
from #T2
cross apply (
values(Sz1,1),(Sz2,2),(Sz3,3),(Sz4,4)
) X (Size,ColumnPosition)
)
select a.ProductID,a.SizeRange,c.Size,isnull(x.Qty,0) as Qty
from #T1 a
cross apply (
values(a.Sale1,1),(a.Sale2,2),(a.Sale3,3),(a.Sale4,4)
) X (Qty,ColumnPosition)
inner join cte_sizes c
on c.SizeRange = a.SizeRange
and c.ColumnPosition = x.ColumnPosition
I have also written and considered this, but is the CROSS APPLY above a better method?
with cte_sizes
as
(
select 1 as SizePos
union all
select SizePos + 1 as SizePos
from cte_sizes
where SizePos < 4
)
select a.ProductID
,a.SizeRange
,(case when b.SizePos = 1 then c.Sz1
when b.SizePos = 2 then c.Sz2
when b.SizePos = 3 then c.Sz3
when b.SizePos = 4 then c.Sz4 end
) as Size
,isnull((case when b.SizePos = 1 then a.Sale1
when b.SizePos = 2 then a.Sale2
when b.SizePos = 3 then a.Sale3
when b.SizePos = 4 then a.Sale4 end
),0) as Qty
from #T1 a
inner join #T2 c on c.SizeRange = a.SizeRange
cross join cte_sizes b
This is wild guessing, but my magic crystal ball told me that you might be looking for something like this:
For this we do not need your table #TS at all.
WITH Unpivoted2 AS
(
SELECT t2.SizeRange,A.* FROM #t2 t2
CROSS APPLY(VALUES(1,t2.Sz1)
,(2,t2.Sz2)
,(3,t2.Sz3)
,(4,t2.Sz4)) A(SizePos,Size)
)
SELECT t1.ProductID
,Unpivoted2.SizeRange
,Unpivoted2.Size
,Unpivoted1.Qty
FROM #t1 t1
CROSS APPLY(VALUES(1,t1.Sale1)
,(2,t1.Sale2)
,(3,t1.Sale3)
,(4,t1.Sale4)) Unpivoted1(SizePos,Qty)
LEFT JOIN Unpivoted2 ON Unpivoted1.SizePos=Unpivoted2.SizePos AND t1.SizeRange=Unpivoted2.SizeRange
ORDER BY t1.ProductID,Unpivoted2.SizeRange;
The result:
+-----------+-----------+------+------+
| ProductID | SizeRange | Size | Qty  |
+-----------+-----------+------+------+
| 123       | S-XL      | S    | 1    |
| 123       | S-XL      | M    | 12   |
| 123       | S-XL      | L    | 13   |
| 123       | S-XL      | XL   | 14   |
| 456       | 8-14      | 8    | 2    |
| 456       | 8-14      | 10   | 22   |
| 456       | 8-14      | 12   | NULL |
| 456       | 8-14      | 14   | 24   |
| 789       | S-L       | S    | 3    |
| 789       | S-L       | M    | NULL |
| 789       | S-L       | L    | 33   |
| 789       | S-L       | XL   | NULL |
+-----------+-----------+------+------+
The idea in short:
The cte returns your #T2 in an unpivoted structure. Each name-numbered column (something you should avoid) is returned as a single row with an index indicating its position.
The SELECT does the same with #T1 and joins the cte against this set.
UPDATE: After a lot of comments...
If I get this (and the changes to the initial question) correctly, the approach above works perfectly well, but you want to know what performs best.
The first answer to "What is the fastest approach?" is Race your horses by Eric Lippert.
Good to know 1: A CTE is nothing more than syntactic sugar. It lets you type a sub-query once and use it like a table, but it has no effect on how the engine processes the query.
Good to know 2: There is a huge difference between APPLY and JOIN. The first calls the sub-source once per row, using the current row's values. The second has to create two sets first and then join them on some condition. There is no general answer to "which is better"...
For your issue: as there is one very big set and one very small set, everything depends on how early you reduce the big set with any kind of filter. The earlier, the better.
And most important: name numbering (something like phone1, phone2, phoneX) is in any case a sign of a bad structure. The most expensive work will be transforming your four name-numbered columns into dedicated rows. This data should be stored in a normalized format...
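A hypothetical sketch of what that normalized storage could look like (the table and column names here are assumptions, not from the question):
-- Hypothetical normalized layout: one row per product/size position
-- replaces the name-numbered Sale1..Sale4 columns.
create table ProductSaleQty (
    ProductID int not null,
    SizeRange varchar(10) not null,
    SizePos   tinyint not null, -- 1..4, formerly the column-name suffix
    Qty       int null,
    primary key (ProductID, SizeRange, SizePos)
);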
If you still need help, I'd ask you to start a new question.

Calculating Win-rates

I have to calculate win-rates of the players from a table (bots_match_history) which has data in the format:
id | username | sub_level_id | bot_type | match_result | system_win_balance | created_at | analyzed | stakes
------+-----------------+--------------+----------+--------------+--------------------+----------------------------+----------+--------
5487 | ashishish | 5 | hard | l | -831 | 2017-11-29 06:26:13.288267 | f | 18
5486 | dilip.kumar | 3 | hard | l | -821 | 2017-11-29 06:25:09.106075 | f | 50
5485 | abhinav.garg | 5 | hard | w | -791 | 2017-11-29 06:24:07.589281 | f | 18
I need to use only the entries which haven't been analyzed yet (analyzed = false), and only those groups that have more than 3 entries for a particular level.
This is the query that I had written; somehow, for some entries, it is returning a win-rate of > 100%.
WITH total AS (
SELECT COUNT(b.match_result) AS total_matches, b.bot_type, sl.level_id, b.stakes
FROM bots_match_history b
JOIN sub_levels sl ON b.sub_level_id = sl.id
WHERE b.analyzed=FALSE
GROUP BY b.bot_type, sl.level_id, b.stakes
HAVING COUNT(b.match_result) >=3
)
SELECT total.bot_type, total.level_id, total.stakes, round(cast(((
SELECT COUNT(b2.*)
FROM bots_match_history b2
JOIN sub_levels sl2 ON b2.sub_level_id = sl2.id
WHERE b2.match_result='w' AND b2.analyzed=FALSE
AND b2.bot_type = total.bot_type AND sl2.level_id = total.level_id
)::FLOAT * 100.0 / total.total_matches) AS NUMERIC), 2)::FLOAT AS win_percentage
FROM total, bots_match_history b3
JOIN sub_levels sl3 ON sl3.id = b3.sub_level_id
WHERE b3.bot_type = total.bot_type AND sl3.level_id=total.level_id
GROUP BY total.bot_type, total.level_id, total.stakes, total.total_matches;
What is wrong in this query that it is returning a win-rate of more than 100%?
You are grouping total by stakes, but do not JOIN by it later, so one row from total (limited to certain stakes) can be matched against more rows than went into computing that total (how much depends on how many of those rows are wins).
I have made a fiddle, excluding the sub_levels table and substituting sub_level_id for it in the grouping and joins: http://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=b3558b9d3cddc63a47d3f29a4a6c08f6
The fixed SQL should look like this:
WITH total AS (
SELECT COUNT(b.match_result) AS total_matches, b.bot_type, sl.level_id, b.stakes
FROM bots_match_history b
JOIN sub_levels sl ON b.sub_level_id = sl.id
WHERE b.analyzed=FALSE
GROUP BY b.bot_type, sl.level_id, b.stakes
HAVING COUNT(b.match_result) >=3
)
SELECT total.bot_type, total.level_id, total.stakes, round(cast(((
SELECT COUNT(b2.*)
FROM bots_match_history b2
JOIN sub_levels sl2 ON b2.sub_level_id = sl2.id
WHERE b2.match_result='w' AND b2.analyzed=FALSE
AND b2.bot_type = total.bot_type AND sl2.level_id = total.level_id
AND b2.stakes = total.stakes
)::FLOAT * 100.0 / total.total_matches) AS NUMERIC), 2)::FLOAT AS win_percentage
FROM total, bots_match_history b3
JOIN sub_levels sl3 ON sl3.id = b3.sub_level_id
WHERE b3.bot_type = total.bot_type AND sl3.level_id=total.level_id
AND b3.stakes=total.stakes
GROUP BY total.bot_type, total.level_id, total.stakes, total.total_matches;
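As an aside, the per-group win rate can also be computed in a single pass with conditional aggregation, avoiding the correlated subquery entirely; a minimal sketch (assuming PostgreSQL 9.4+ for the FILTER clause):
SELECT b.bot_type, sl.level_id, b.stakes,
       round(100.0 * count(*) FILTER (WHERE b.match_result = 'w') / count(*), 2) AS win_percentage
FROM bots_match_history b
JOIN sub_levels sl ON b.sub_level_id = sl.id
WHERE b.analyzed = FALSE
GROUP BY b.bot_type, sl.level_id, b.stakes
HAVING count(*) >= 3;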
Also, you might get (IMO) incorrect results if you limit yourself to non-analyzed rows only. I saw a 40% win-rate for one set of parameters, then checked the actual rows: there were in fact 3 won and 3 lost games, which left me puzzled for a while, until I noticed that one of the won games was already marked as analyzed.

Select multiple (non-aggregate function) columns with GROUP BY

I am trying to select the max value from one column, while grouping by another non-unique id column which has multiple duplicate values. The original database looks something like:
mukey | comppct_r | name | type
65789 | 20 | a | 7n
65789 | 15 | b | 8m
65789 | 1 | c | 1o
65790 | 10 | a | 7n
65790 | 26 | b | 8m
65790 | 5 | c | 1o
...
This works just fine using:
SELECT c.mukey, Max(c.comppct_r) AS ComponentPercent
FROM c
GROUP BY c.mukey;
Which returns a table like:
mukey | ComponentPercent
65789 | 20
65790 | 26
65791 | 50
65792 | 90
I want to be able to add other columns, such as name and type, without affecting the GROUP BY, so the output table looks like:
mukey | comppct_r | name | type
65789 | 20 | a | 7n
65790 | 26 | b | 8m
65791 | 50 | c | 7n
65792 | 90 | d | 7n
but it always outputs an error saying I need to use an aggregate function in the select statement. How should I go about doing this?
You have yourself a greatest-n-per-group problem. This is one of the possible solutions:
select c.mukey, c.comppct_r, c.name, c.type
from c
inner join(
select c.mukey, max(c.comppct_r) comppct_r
from c
group by c.mukey
) ss on c.mukey = ss.mukey and c.comppct_r= ss.comppct_r
Another possible approach, same output:
select c1.*
from c c1
left outer join c c2
on (c1.mukey = c2.mukey and c1.comppct_r < c2.comppct_r)
where c2.mukey is null;
There's a comprehensive and explanatory answer on the topic here: SQL Select only rows with Max Value on a Column
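For completeness: on engines with window-function support, the same greatest-n-per-group result can be obtained with ROW_NUMBER(); a minimal sketch (ties are resolved arbitrarily here):
select mukey, comppct_r, name, type
from (
    select c.*, row_number() over (partition by mukey order by comppct_r desc) as rn
    from c
) ranked
where rn = 1;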
Why should every non-aggregate column be in the GROUP BY clause? Consider this table:
t1
x1 y1 z1
1 2 5
2 2 7
Now you are trying to write a query like:
select x1,y1,max(z1) from t1 group by y1;
This query should result in only one row, but what should the value of x1 be? That is essentially undefined behaviour, so SQL rejects the query with an error.
Coming to the point: you can either choose an aggregate function for x1 or add x1 to the GROUP BY. Which is right depends on your requirement.
If you want all rows, with z1 aggregated per group of y1, you may use a correlated subquery:
select x1, y1, (select max(z1) from t1 where t1.y1 = tt.y1)
from t1 tt;
This will produce a result like:
t1
x1 y1 max(z1)
1 2 7
2 2 7
Try using a virtual table as follows:
SELECT vt.*, c.name
FROM (
    SELECT c.mukey, Max(c.comppct_r) AS ComponentPercent
    FROM c
    GROUP BY c.mukey
) AS vt, c
WHERE vt.mukey = c.mukey
You can't just add additional columns without adding them to the GROUP BY or applying an aggregate function. The reason for that is, that the values of a column can be different inside one group. For example, you could have two rows:
mukey | comppct_r | name | type
65789 | 20 | a | 7n
65789 | 20 | b | 9f
How should the aggregated group look like for the columns name and type?
If name and type is always the same inside a group, just add it to the GROUP BY clause:
SELECT c.mukey, Max(c.comppct_r) AS ComponentPercent
FROM c
GROUP BY c.mukey, c.name, c.type;
Use a correlated subquery rather than a bare 'Having' clause (HAVING cannot compare a non-grouped column against an aggregate):
SELECT *
FROM c
WHERE c.comppct_r = (SELECT Max(c2.comppct_r) FROM c c2 WHERE c2.mukey = c.mukey);

MS Access Pass Through Query find duplicates using multiple tables

I'm trying to find all coverage_set_id values with more than one benefit_id attached to a summary_attribute (value = 2004687).
The query seems to be working fine without the GROUP BY & HAVING parts, but once I add those lines in (for the COUNT) my results are incorrect. I'm just trying to get the duplicate coverage_set_ids.
Pass-through query via an ODBC database:
SELECT DISTINCT
b.coverage_set_id,
COUNT (b.coverage_set_id) AS "COUNT"
FROM
coverage_set_detail_view a
JOIN contracts_by_sub_group_view b ON b.coverage_set_id = a.coverage_set_id
JOIN request c ON c.request_id = b.request_id
WHERE
b.valid_from_date BETWEEN to_date('10/01/2010','mm/dd/yyyy')
AND to_date('12/01/2010','mm/dd/yyyy')
AND c.request_status = 1463
AND summary_attribute = 2004687
AND benefit_id <> 1092333
GROUP BY
b.coverage_set_id
HAVING
COUNT (b.coverage_set_id) > 1
My results look like this:
-----------------------
COVERAGE_SET_ID | COUNT
-----------------------
4193706 | 8
4197052 | 8
4193926 | 112
4197078 | 96
4174168 | 8
I'm expecting all the COUNTs to be 2.
::EDIT::
Solution:
SELECT
c.coverage_set_id AS "COVERAGE SET ID",
c1.description AS "Summary Attribute",
count(d.benefit_id) AS "COUNT"
FROM (
SELECT DISTINCT coverage_set_id
FROM contracts_by_sub_group_view
WHERE
valid_from_date BETWEEN '01-OCT-2010' AND '01-DEC-2010'
AND request_id IN (
SELECT request_id
FROM request
WHERE request_status = 1463)
) a
JOIN coverage_set_master e ON e.coverage_set_id = a.coverage_set_id
JOIN coverage_set_detail c ON c.coverage_set_id = a.coverage_set_id
JOIN benefit_summary d ON d.benefit_id = c.benefit_id
AND d.coverage_type = e.coverage_type
JOIN codes c1 ON c1.code_id = d.summary_attribute
WHERE
d.summary_attribute IN (2004687, 2004688)
AND summary_structure = 1000217
GROUP BY c.coverage_set_id, c1.description
HAVING COUNT(d.benefit_id) > 1
ORDER BY c.coverage_set_id, c1.description
And these were the results:
COVERAGE SET ID | SUMMARY ATTRIBUTE | COUNT
-------------------------------------------------
4174168 | INPATIENT | 2
4174172 | INPATIENT | 2
4191828 | INPATIENT | 2
4191832 | INPATIENT | 2
4191833 | INPATIENT | 2
4191834 | INPATIENT | 2
4191838 | INPATIENT | 2
4191842 | INPATIENT | 2
4191843 | INPATIENT | 2
4191843 | OUTPATIENT | 2
4191844 | INPATIENT | 2
4191844 | OUTPATIENT | 2
The coverage_set_id in both the HAVING and count part of the SELECT should be benefit_id.
Since benefit_id is also in table a you can do the following
SELECT
a.coverage_set_id,
COUNT (a.benefit_id) AS "COUNT"
FROM
coverage_set_detail_view a
WHERE
a.coverage_set_id in (
SELECT b.coverage_set_id
FROM contracts_by_sub_group_view b
WHERE b.valid_from_date BETWEEN to_date('10/01/2010','mm/dd/yyyy') AND to_date('12/01/2010','mm/dd/yyyy'))
AND a.coverage_set_id in (
SELECT b2.coverage_set_id
FROM contracts_by_sub_group_view b2
INNER JOIN request c on c.request_id=b2.request_id
WHERE c.request_status = 1463)
AND ?.summary_attribute = 2004687
AND a.benefit_id <> 1092333
GROUP BY
a.coverage_set_id
HAVING
COUNT (a.benefit_id) > 1
This removes the JOIN magnification that was occurring in the FROM clause, since those tables are not needed to pull coverage_set_id and benefit_id. The only remaining use for the other 2 tables is to filter out data based on criteria, which is handled in the WHERE clause.
I'm not sure what table summary_attribute lives in but it would follow a similar pattern to valid_from_date, request_status, or benefit_id.
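To illustrate the magnification with made-up multiplicities: if one coverage_set_id matched 2 qualifying benefit rows in coverage_set_detail_view and 4 contract rows in contracts_by_sub_group_view, the joined result would contain 2 × 4 = 8 rows, so the grouped COUNT would report 8 instead of the expected 2.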