Total Count in Grouped TSQL Query - sql

I have a performance-heavy query that filters out many unwanted records based on data in other tables.
I am averaging a column, and also returning the count for each average group. This is all working fine.
However, I would also like to include the percentage of the TOTAL count.
Is there any way of getting this total count without rerunning the whole query, or increasing the performance load significantly?
I would also prefer if I didn't need to completely restructure the sub query (e.g. by getting the total count outside of it), but can do if necessary.
SELECT
data.EquipmentId,
AVG(MeasureValue) AS AverageValue,
COUNT(data.*) AS BinCount,
COUNT(data.*)/ ???TotalCount??? AS BinCountPercentage
FROM
(SELECT * FROM MultipleTablesWithJoins) data
GROUP BY data.EquipmentId

See Window functions.
SELECT
data.EquipmentId,
AVG(MeasureValue) AS AverageValue,
COUNT(*) AS BinCount,
COUNT(*)/ cast (cnt as float) AS BinCountPercentage
FROM
(SELECT *,
-- Here is total count of records
count(*) over() cnt
FROM MultipleTablesWithJoins) data
GROUP BY data.EquipmentId, cnt
EDIT: forgot to actually divide the numbers.
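For anyone who wants to try the window-function approach without the original tables, here is a minimal sketch. It runs the same query shape from Python against an in-memory SQLite database (3.25 or newer, for window functions) rather than SQL Server, and the `measurements` table and its four rows are made up for illustration. Multiplying by 100.0 instead of casting to float is my own choice, to get a percentage rather than a fraction.

```python
import sqlite3

# Hypothetical stand-in for the filtered, joined subquery in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (EquipmentId INTEGER, MeasureValue REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [(1, 10.0), (1, 20.0), (1, 30.0), (2, 5.0)],
)

query = """
SELECT data.EquipmentId,
       AVG(data.MeasureValue)      AS AverageValue,
       COUNT(*)                    AS BinCount,
       100.0 * COUNT(*) / data.cnt AS BinCountPercentage
FROM (SELECT *,
             COUNT(*) OVER () AS cnt   -- total rows of the derived table
      FROM measurements) AS data
GROUP BY data.EquipmentId, data.cnt    -- cnt is constant, so groups are unchanged
ORDER BY data.EquipmentId
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because `cnt` has the same value on every row, adding it to the GROUP BY does not split any group; it only makes the column legal to select.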

Another approach:
with data as
(
SELECT * FROM MultipleTablesWithJoins
)
,grand as
(
select count(*) as cnt from data
)
SELECT
data.EquipmentId,
AVG(MeasureValue) AS AverageValue,
COUNT(*) AS BinCount,
COUNT(*) / CAST(grand.cnt AS float) AS BinCountPercentage
FROM data cross join grand
GROUP BY data.EquipmentId, grand.cnt

Related

counts' division doesn't work in full code

I have a problem with a task: my division gives a different value when I run it on its own than when I use it inside the full query. Let's say I run this code:
SELECT (count(paimta))::numeric / count(distinct paimta) as average
FROM Stud.Egzempliorius;
and the number I get is 2.(6)7, but when I use it in the full query, which is:
SELECT Stud.Egzempliorius.Paimta, COUNT(PAIMTA) as PaimtaKnyga
FROM Stud.Skaitytojas, Stud.Egzempliorius
WHERE Stud.Skaitytojas.Nr=Stud.Egzempliorius.Skaitytojas
GROUP BY Stud.Egzempliorius.Paimta
HAVING count(paimta) > (count(paimta))::numeric / count(distinct paimta);
its value changes because the division no longer works, so that instead of
HAVING count(paimta) > (count(paimta))::numeric / count(distinct paimta);
my code effectively turns into
HAVING count(paimta) > (count(paimta))::numeric;
and these values are equal, so I can't get the final answer. This is the database I use: https://klevas.mif.vu.lt/~baronas/dbvs/biblio/show-table.php?table=Stud.Egzempliorius
I have been struggling with this for 10 hours now and have finally lost my patience. So my question is: what do I have to do so that this code:
SELECT (count(paimta))::numeric / count(distinct paimta) as average
FROM Stud.Egzempliorius;
value doesn't change in full code?
Your solution fails because the two queries operate on different groups of rows. The first query computes over the whole dataset, while the second one groups by paimta, so within each group count(distinct paimta) is at most 1 and the division just returns the group's own count.
One option would have been to use window functions, but as far as I know Postgres does not support count(distinct) as a window function.
I think that the simplest approach is to use a subquery:
select e.paimta, count(paimta) as paimtaknyga
from stud.skaitytojas s
inner join stud.egzempliorius e on s.nr = e.skaitytojas
group by e.paimta
having count(paimta) > (
select (count(paimta))::numeric / count(distinct paimta) from stud.egzempliorius
)
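To see the difference between the two forms on something small, here is a rough sketch using Python's sqlite3 module instead of Postgres. The table is reduced to two columns and the seven rows are invented; the join to skaitytojas is left out, and `CAST(... AS REAL)` stands in for `::numeric`, which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical version of Stud.Egzempliorius.
conn.execute("CREATE TABLE egzempliorius (nr INTEGER, paimta TEXT)")
conn.executemany(
    "INSERT INTO egzempliorius VALUES (?, ?)",
    [(1, "2024-01-01"), (2, "2024-01-01"), (3, "2024-01-01"),
     (4, "2024-01-02"),
     (5, "2024-01-03"), (6, "2024-01-03"),
     (7, None)],  # a copy that is not currently borrowed
)

# Average computed over the whole table: 6 borrowed copies / 3 distinct dates.
avg_query = ("SELECT CAST(COUNT(paimta) AS REAL) / COUNT(DISTINCT paimta) "
             "FROM egzempliorius")
average = conn.execute(avg_query).fetchone()[0]

# Same expression inside HAVING: it is evaluated per group, so it equals
# the group's own count and the comparison never succeeds.
broken = conn.execute("""
SELECT paimta, COUNT(paimta) AS paimtaknyga
FROM egzempliorius
GROUP BY paimta
HAVING COUNT(paimta) > CAST(COUNT(paimta) AS REAL) / COUNT(DISTINCT paimta)
""").fetchall()

# Scalar subquery: evaluated over the whole table, independent of the grouping.
fixed = conn.execute(f"""
SELECT paimta, COUNT(paimta) AS paimtaknyga
FROM egzempliorius
GROUP BY paimta
HAVING COUNT(paimta) > ({avg_query})
""").fetchall()

print(average, broken, fixed)
```

With this sample data the per-group version returns no rows at all, while the subquery version keeps only the date whose count is above the overall average.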

Google Big Query: New Column of Aggregate Based On Condition of Current Row

Using the Google Big Query database bigquery-public-data.crypto_ethereum_classic.transactions as reference.
For each transaction row, I want to calculate the count of all transactions to the same address that occurred before that transaction, and sum of the gas usage of them. I am sure I can do this with a join as I have tried and Google accepts my old query, but since there is so much data as a result of the (inner) join, there is almost always a "quota limit exceeded" error. At the same time, I think a subquery solution is inefficient, as it is querying almost the same thing in both aggregate functions.
In a perfect world the query would use something like a join to create a temporary table with all columns I need (transaction_hash, receipt_gas_used, to_address, block_timestamp), according to the conditions (where to_address = table_1.to_address and block_timestamp < table_1.block_timestamp), where I can then perform the aggregate functions on the columns of that table.
What I have so far and what I'm looking for is something like...:
SELECT
table_1.*,
COUNT(
DISTINCT IF(block_timestamp < table_1.block_timestamp and to_address = table_1.to_address, `hash`, NULL)
) as txn_count,
SUM(
IF(block_timestamp < table_1.block_timestamp and to_address = table_1.to_address, `receipt_gas_used`, NULL)
) as total_gas_used
from
`bigquery-public-data.crypto_ethereum_classic.transactions` as table_1
where block_number >= 3000000 and block_number <= 3500000 #just to subset the data a bit
I think you want window functions:
select t.*,
row_number() over (partition by to_address order by block_timestamp) as txn_seqnum,
sum(receipt_gas_used) over (partition by to_address order by block_timestamp) as total_gas_used
from `bigquery-public-data.crypto_ethereum_classic.transactions` as t
where block_number >= 3000000 and block_number <= 3500000 #just to subset the data a bit
If you really have ties and need the distinct, then use dense_rank() instead of row_number().
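Note that the question asks for transactions that occurred before the current one, while a default window frame includes the current row. One way to exclude it is an explicit frame that ends at `1 PRECEDING`. The sketch below tries this with Python's sqlite3 module (SQLite 3.25+) on a made-up four-row table instead of the BigQuery dataset; only the column names are borrowed from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Tiny hypothetical sample with the columns used in the question.
conn.execute("""
CREATE TABLE transactions (
    hash TEXT, to_address TEXT, block_timestamp INTEGER, receipt_gas_used INTEGER
)""")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [("h1", "A", 1, 100), ("h2", "A", 2, 200), ("h3", "A", 3, 300),
     ("h4", "B", 1, 50)],
)

query = """
SELECT hash,
       COUNT(*)              OVER w AS txn_count,      -- earlier rows only
       SUM(receipt_gas_used) OVER w AS total_gas_used  -- NULL when there are none
FROM transactions
WINDOW w AS (
    PARTITION BY to_address
    ORDER BY block_timestamp
    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING   -- stop before the current row
)
ORDER BY to_address, block_timestamp
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

A `ROWS` frame counts physical rows, so two transactions with the same timestamp would still be ordered arbitrarily relative to each other; if that matters, a `RANGE` frame over a numeric timestamp is worth looking at instead.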

SQL: How to use sum in group by

SELECT idteam,
job,
price,
COUNT('X') as INFORMS,
SUM(COUNT('X') * price) as TOTAL
FROM REP
JOIN COSTS ON (job = categ AND to_number(to_char(REP,'YYYY')) = year)
GROUP BY idteam, job, price, TOTAL
ORDER BY IDTEAM;
I don't know why, but when I write TOTAL in GROUP BY, SQL returns an error: invalid identifier.
I don't know how to resolve this.
Thanks.
The column "TOTAL" is an alias for SUM(COUNT('X') * price).
It cannot be used as a column identifier in the GROUP BY clause, because "TOTAL" is unknown (not a column) at the time of grouping. An aggregate expression cannot be grouped on either, so the fix is to remove TOTAL from the GROUP BY list rather than to spell the expression out there.
After grouping, you can refer to "TOTAL" in the ORDER BY clause; some SQL dialects also accept the alias in HAVING.
In any case, the version/type of SQL you are using doesn't allow it.
Additionally, why are you COUNTing 'X'? That X is a fixed value, and does not depend on any of your columns. If you would like to count each row, just use Count(1) or Count(*). Also, you don't need to SUM a COUNT. A COUNT is already summed.
You should post the structure of both REP and COSTS. Your linked image doesn't have enough info to support the query you wrote.
select
idteam,
-- job, /* not selected since it would need to be grouped*/
sum(price) as 'theSUM'
from REP
join COSTS
on REP.categ = COSTS.job
and COSTS.year = 2016
group by idteam
order by idteam

SQL Percentage of Occurrences

I'm working on some SQL code as part of my University work. The data is fictitious, just to be clear. I'm trying to count the occurrences of 1 & 0 in the SQL table Fact_Stream; this is stored in the Free_Stream column/attribute as a Boolean/bit value.
As calculations can't be made on bit values (at least in the way I'm trying), I've converted the value to an integer. The table contains information on a streaming company's streams: a 1 indicates the stream was free of charge, a 0 indicates the stream was paid for. My code:
SELECT Fact_Stream.Free_Stream, ((CAST(Free_Stream AS INT)) / COUNT(*) * 100) As 'Percentage of Streams'
FROM Fact_Stream
GROUP BY Free_Stream
The result/output is nearly where I want it to be, but it doesn't display the percentage correctly.
Output:
Using MS SQL Management Studio | MS SQL Server 2012 (I believe)
The percentage should be based on all rows, so you need to divide the count per 1/0 by a count of all rows. The easiest way to get this is utilizing a Windowed Aggregate Function:
SELECT Fact_Stream.Free_Stream,
100.0 * COUNT(*) -- count per bit
/ SUM(COUNT(*)) OVER () -- sum of those counts = count of all rows
As "Percentage of Streams"
FROM Fact_Stream
GROUP BY Free_Stream
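Here is a small sketch of that windowed aggregate, run through Python's sqlite3 module (SQLite 3.25+) rather than SQL Server; the four sample rows are invented. The third column is my addition and shows what happens when the integer division runs before the multiplication.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample: three free streams (1) and one paid stream (0).
conn.execute("CREATE TABLE Fact_Stream (Free_Stream INTEGER)")
conn.executemany("INSERT INTO Fact_Stream VALUES (?)", [(1,), (1,), (1,), (0,)])

query = """
SELECT Free_Stream,
       100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS pct,        -- decimal arithmetic
       COUNT(*) / SUM(COUNT(*)) OVER () * 100   AS truncated   -- integer division first
FROM Fact_Stream
GROUP BY Free_Stream
ORDER BY Free_Stream
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The window function is evaluated after grouping, so `SUM(COUNT(*)) OVER ()` adds up the per-group counts, which is the total row count.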
Both the divisor and the dividend are INTs, so the result is also an INT. Make one of them a decimal (notice that I multiply by 100.0 first). You should also divide the count of elements in each group by the total count of rows in the table:
select Free_Stream,
100.0 * count(*) / (select count(*) from Fact_Stream) as 'Percentage of Streams'
from Fact_Stream
group by Free_Stream
Your equation is dividing the identifier (1 or 0) by the number of streams for each one, instead of dividing the count of free or paid by the total count. One way to do this is to get the total count first, then use it in your query:
declare @totalcount real;
select @totalcount = count(*) from Fact_Stream;
SELECT Fact_Stream.Free_Stream,
(Cast(Count(*) as real) / @totalcount)*100 AS 'Percentage of Streams'
FROM Fact_Stream
group by Fact_Stream.Free_Stream

Calculate percents inline in SQL query

SELECT User, COUNT(*) as count FROM Tests WHERE Release = '1.0' GROUP by User;
The above query returns a count per user. However, I would like to convert each count to a percentage of the total number of records, where the total takes the WHERE clause into account.
SELECT R1.user, COUNT(*)/R2.COUNT_ALL AS Expr1
FROM Tests R1,
(SELECT COUNT(*) As COUNT_ALL FROM Tests WHERE Release = '1.0') R2
WHERE R1.Release = '1.0'
GROUP BY R1.user, R2.COUNT_ALL
Here's another approach that uses a single SELECT:
SELECT
Tests.User,
Count(IIf([Tests].[Release]='1.0', 1, Null)) / Count(*) AS Percentage
FROM
Tests
GROUP BY
Tests.User
Unlike the approaches suggested earlier, this one will, for better or worse, return records for users having no records in Tests where Release is "1.0". If you don't want these records, you could add a HAVING clause to eliminate them.
Hope this helps.
Like Brian's answer but in standard SQL instead of MS Access only. (One can also use COUNT instead of SUM with NULL where I have 0.)
SELECT
Tests.User,
Sum(CASE WHEN Release='1.0' THEN 1 ELSE 0 END) / Count(*) AS Percentage
FROM
Tests
GROUP BY
Tests.User
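A quick way to check what the conditional-aggregation form returns is to run it on a few rows. The sketch below uses Python's sqlite3 module with invented data; the columns are renamed to `user_name` and `release_tag` (hypothetical names) to stay clear of reserved words, and the `1.0 *` factor is my assumption for engines such as SQLite where dividing two integers truncates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample: ann has two 1.0 tests out of three, bob has none.
conn.execute("CREATE TABLE Tests (user_name TEXT, release_tag TEXT)")
conn.executemany(
    "INSERT INTO Tests VALUES (?, ?)",
    [("ann", "1.0"), ("ann", "1.0"), ("ann", "2.0"), ("bob", "2.0")],
)

query = """
SELECT user_name,
       1.0 * SUM(CASE WHEN release_tag = '1.0' THEN 1 ELSE 0 END) / COUNT(*)
           AS fraction   -- share of this user's tests that are on release 1.0
FROM Tests
GROUP BY user_name
ORDER BY user_name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

As mentioned above, users without any matching records (bob here) still show up, with a value of 0. Also note that the denominator is each user's own row count, not the number of records for the release.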