Summing numbers don't match up? - sql

I have a table with data like this (acc_v in query):
value
id
100
1
300
2
200
1
As you can see there are multiple values per id. I want to sum all of the values across the ids and end up with this:
value
id
300
1
300
2
;with accV as (SELECT
d_id
,[period_end_date]
,max(a_value) as value
,id
FROM bh
where period_end_date = '2021-6-30'
group by d_id, period_end_date, id)
SELECT bh.id, sum(value)
FROM bh join accV on accV.id = bh.id
group by bh.id
For some reason the total values are adding up to significantly larger amounts than they should be. I verified this by taking the original values and summing them in excel. If anyone knows what I am doing wrong the help is much appreciated.

You can use window functions:
select id, sum(sum(value)) over () as total_sum
from t
group by id;

Related

SQL: Checking value counts of a column

I'd like to check if a column in a table has values with a small number of value counts.
Consider the following table as an example:
RowID |Product
1 | A
2 | A
3 | B
...
200.000 | C
the following table is aggregated of the table above:
Product |Count
A |204
B |682
C |553
D |1402
E |30855
F |357
G |1
H |542
What I'd like to know of the column Product of my table is, whether or not a Product has a count that is less than 5%. And if so, the SQL statement should return: 'Some values of this field have a small number of value counts'
In other words: IF [MinValueCount]/[Count] <= .05 then 'Some values of this field have a small number of value counts' else 'null'
With the example above, I should get: 'Some values of this field have a small number of value counts'
as product G is less than 5% of the total count of products.
how should the SQL statement look like?
With kind regards,
Lazzanova
Use two levels of aggregation. You can get the total using window functions:
select max( 'Some values of this field have a small number of value counts')
from (select product, count(*) as cnt,
sum(count(*)) over () as total_cnt
from t
) t
where cnt < 0.05 * total_cnt;
The use of max() in the outer query is just to return one row. You could also use fetch or a similar clause (whatever your database supports):
select 'Some values of this field have a small number of value counts'
from (select product, count(*) as cnt,
sum(count(*)) over () as total_cnt
from t
) t
where cnt < 0.05 * total_cnt
fetch first 1 row only;

SQL - Count Results of 2 Columns

I have the following table which contains ID's and UserId's.
ID UserID
1111 11
1111 300
1111 51
1122 11
1122 22
1122 3333
1122 45
I'm trying to count the distinct number of 'IDs' so that I get a total, but I also need to get a total of ID's that have also seen the that particular ID as well... To get the ID's, I've had to perform a subquery within another table to get ID's, I then pass this into the main query... Now I just want the results to be displayed as follows.
So I get a Total No for ID and a Total Number for Users ID - Also would like to add another column to get average as well for each ID
TotalID Total_UserID Average
2 7 3.5
If Possible I would also like to get an average as well, but not sure how to calculate that. So I would need to count all the 'UserID's for an ID add them altogether and then find the AVG. (Any Advice on that caluclation would be appreciated.)
Current Query.
SELECT DISTINCT(a.ID)
,COUNT(b.UserID)
FROM a
INNER JOIN b ON someID = someID
WHERE a.ID IN ( SELECT ID FROM c WHERE GROUPID = 9999)
GROUP BY a.ID
Which then Lists all the IDs and COUNT's all the USERID.. I would like a total of both columns. I've tried warpping the query in a
SELECT COUNT(*) FROM (
but this only counts the ID's which is great, but how do I count the USERID column as well
You seem to want this:
SELECT COUNT(DISTINCT a.ID), COUNT(b.UserID),
COUNT(b.UserID) * 1.0 / COUNT(DISTINCT a.ID)
FROM a INNER JOIN
b
ON someID = someID
WHERE a.ID IN ( SELECT ID FROM c WHERE GROUPID = 9999);
Note: DISTINCT is not a function. It applies to the whole row, so it is misleading to put an expression in parentheses after it.
Also, the GROUP BY is unnecessary.
The 1.0 is because SQL Server does integer arithmetic and this is a simple way to convert a number to a decimal form.
You can use
SELECT COUNT(DISTINCT a.ID) ...
to count all distinct values
Read details here
I believe you want this:
select TotalID,
Total_UserID,
sum(Total_UserID+TotalID) as Total,
Total_UserID/TotalID as Average
from (
SELECT (DISTINCT a.ID) as TotalID
,COUNT(b.UserID) as Total_UserID
FROM a
INNER JOIN b ON someID = someID
WHERE a.ID IN ( SELECT ID FROM c WHERE GROUPID = 9999)
) x

SQL counting query

Sorry if this is a basic question.
Basically, I have a table that is as follows, below is a basic sample
store-ProdCode-result
13p I10x 5
13p I20x 7
13p I30x 8
14a K38z 23
17a K38z 23
my data set has nearly 100,000 records.
What I'm trying to do is, for every store find the top 10 prodCode.
I am unsure of how to do this but what I tried was:
select s_code as store, prod_code,count (prod_code)
from top10_secondary
where prod_code is not null
group by store,prod_code
order by count(prod_code) desc limit 10
this is giving me something completely different and i'm unsure on how I go about achieving my final result.
All help is appreciated.
Thanks
The expected output should be: for every store(s_code) display the top 10 prodcode
so:
store--prodcode--result
1a abc 5
1a abd 4
2a dgf 1
2a ldk 6
.(10 times until next store code)
You can use the table twice in the FROM clause, once for the data, and once to get a count of how many records have fewer results for that store.
SELECT a.s_code, a.prod_code, count(*)
FROM top10_secondary a
LEFT OUTER JOIN top10_secondary b
ON a.s_code = b.s_code
AND b.result < a.result
GROUP BY a.s_code, a.prod_code
HAVING count(*) < 10
With this technique though, you may get more than 10 records per store if the 10th result value exists multiple times. Because the limit rule is simply "include record as long as there are less than 10 records with result values than mine"
It looks like in your case, "result" is a ranking, so they would not be duplicated per store.
This is a good case for Window functions.
SELECT
s_code,
prod_code,
prod_count
FROM
(
SELECT
s_code,
prod_code,
prod_count,
RANK() OVER (PARTITION BY s_code ORDER BY prod_Count DESC) as prod_rank
FROM
(SELECT s_code as store, prod_code, count(prod_Code) prod_count FROM table GROUP BY s_code, prod_code) t1
) t2
WHERE prod_rank <= 10
The inner most query gets the count of each product at the store. The second inner more query determines the rank for those products for each store based on that count. Then the outer most query limits the results based on that rank.
o

SQL query with grouping and MAX

I have a table that looks like the following but also has more columns that are not needed for this instance.
ID DATE Random
-- -------- ---------
1 4/12/2015 2
2 4/15/2015 2
3 3/12/2015 2
4 9/16/2015 3
5 1/12/2015 3
6 2/12/2015 3
ID is the primary key
Random is a foreign key but i am not actually using table it points to.
I am trying to design a query that groups the results by Random and Date and select the MAX Date within the grouping then gives me the associated ID.
IF i do the following query
select top 100 ID, Random, MAX(Date) from DateBase group by Random, Date, ID
I get duplicate Randoms since ID is the primary key and will always be unique.
The results i need would look something like this
ID DATE Random
-- -------- ---------
2 4/15/2015 2
4 9/16/2015 3
Also another question is there could be times where there are many of the same date. What will MAX do in that case?
You can use NOT EXISTS() :
SELECT * FROM YourTable t
WHERE NOT EXISTS(SELECT 1 FROM YourTable s
WHERE s.random = t.random
AND s.date > t.date)
This will select only those who doesn't have a bigger date for corresponding random value.
Can also be done using IN() :
SELECT * FROM YourTable t
WHERE (t.random,t.date) in (SELECT s.random,max(s.date)
FROM YourTable s
GROUP BY s.random)
Or with a join:
SELECT t.* FROM YourTable t
INNER JOIN (SELECT s.random,max(s.date) as max_date
FROM YourTable s
GROUP BY s.random) tt
ON(t.date = tt.max_date and s.random = t.random)
In SQL Server you could do something like the following,
select a.* from DateBase a inner join
(select Random,
MAX(dt) as dt from DateBase group by Random) as x
on a.dt =x.dt and a.random = x.random
This method will work in all versions of SQL as there are no vendor specifics (you'll need to format the dates using your vendor specific syntax)
You can do this in two stages:
The first step is to work out the max date for each random:
SELECT MAX(DateField) AS MaxDateField, Random
FROM Example
GROUP BY Random
Now you can join back onto your table to get the max ID for each combination:
SELECT MAX(e.ID) AS ID
,e.DateField AS DateField
,e.Random
FROM Example AS e
INNER JOIN (
SELECT MAX(DateField) AS MaxDateField, Random
FROM Example
GROUP BY Random
) data
ON data.MaxDateField = e.DateField
AND data.Random = e.Random
GROUP BY DateField, Random
SQL Fiddle example here: SQL Fiddle
To answer your second question:
If there are multiples of the same date, the MAX(e.ID) will simply choose the highest number. If you want the lowest, you can use MIN(e.ID) instead.

MS Access equivalent for using dense_rank in select

In MS Access, I have a table with 2 million account records/rows with various columns of data. I wish to apply a sequence number to every account record. (i.e.- 1 for the first account record ABC111, 2 for the second account record DEF222..., etc.)
Then, I would like to assign a batch number sequence for every 5 distinct account number. (i.e - record 1 with account number ABC111 being associated with batch number 101, record 2 with account number DEF222 being associated with batch number of 101)
This is how I would do it with a sql server query:
select distinct(p.accountnumber),FLOOR(((50 + dense_rank() over(order by
p.accountnumber)) - 1)/5) + 100 As BATCH from
db2inst1.account_table p
Raw Data:
AccountNumber
ABC111
DEF222
GHI333
JKL444
MNO555
PQR666
STU777
Resulting Data:
RecordNumber AccountNumber BatchNumber
1 ABC111 101
2 DEF222 101
3 GHI333 101
4 JKL444 101
5 MNO555 101
6 PQR666 102
7 STU777 102
I tried to make a query that uses SELECT as well as DENSE_RANK but I couldn't figure out how to make it work.
Thanks for reading my question
Something like this would probably work.
I'd first create a temporary table to hold the distinct account numbers, then I'd do an update query to assign the ranking.
CREATE TABLE tmpAccountRank
(AccountNumber TEXT(10)
CONSTRAINT PrimaryKey PRIMARY KEY,
AccountRank INTEGER NULL);
Then I'd use this table to generate the account ranking.
DELETE FROM tmpAccountRank;
INSERT INTO tmpAccountRank(AccountNumber)
SELECT DISTINCT AccountNumber FROM db2inst1.account_table;
UPDATE tmpAccountRank
SET AccountRank =
DCOUNT('AccountNumber', 'tmpAccountRank',
'AccountNumber < ''' + AccountNumber + '''') \ 5 + 101
I use DCOUNT and integer division (\ 5) to generate the ranking. This probably will have terrible performance but I think it's the way you would do it in MS Access.
If you want to skip the temp table, you can do it all in a nested subquery, but I don't think it's a great practice to do too much in a single query, especially in MS Access.
SELECT AccountNumber,
(SELECT COUNT(*) FROM
(SELECT DISTINCT AccountNumber
FROM db2inst1.account_table
WHERE AccountNumber < t.AccountNumber) q)) \ 5 + 101
FROM db2inst1.account_table t
Actually, this won't work in MS Access; apparently you can't reference tables outside of multiple levels of nesting in a subquery.
You can do dense_rank() with a correlated subquery. The logic is:
select a.*,
(select count(distinct a2.accountnumber)
from db2inst1.account_table as a2
where a2.accountnumber <= a.accountnumber
) as dense_rank
from db2inst1.account_table as a;
Then, you can use this for getting the batch number. Unfortunately, I don't follow the logic in your question (dense_rank() produces a number but your batch number is not numeric). However, this should answer your question.
EDIT:
Oh, that's right. In MS Access you need nested subqueries:
select a.*,
(select count(*)
from (select distinct a2.accountnumber
from db2inst1.account_table as a2
) as a2
where a2.accountnumber <= a.accountnumber
) as dense_rank
from db2inst1.account_table as a;