Multi-level aggregation from a table - sql

I have a table metrics with the following columns and rows:
stage name
---------------------
new member
new member
old member
new visitor
old visitor
I can find out how many new or old members there are by running a query like this:
select stage, count(*) from metrics where name = 'member' group by stage;
This will give me the following result:
stage count
-----------
new 2
old 1
But along with this I want output like this :
total stage count
------------------
3 new 2
3 old 1
Total is the count of all rows satisfying the where clause above. How do I need to modify my previous query to get the result I need? Thanks.

You can do something like this:
with t as
(select stage from metrics where name = 'member')
select
(select count(*) from t) as total,
stage, count(*)
from t
group by stage
Check it: http://sqlfiddle.com/#!15/b97a4/9
This is a compact variant that includes the 'member' constant only once.
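To try the CTE approach without a database server, here is a minimal sketch that runs the same query shape against an in-memory SQLite database through Python's sqlite3 module. The sample rows are copied from the question; the alias stage_count and the ORDER BY are additions for readability, and SQLite stands in for the PostgreSQL instance behind the fiddle link.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (stage TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO metrics (stage, name) VALUES (?, ?)",
    [
        ("new", "member"),
        ("new", "member"),
        ("old", "member"),
        ("new", "visitor"),
        ("old", "visitor"),
    ],
)

# The CTE filters once; the scalar subquery and the GROUP BY both read from it.
cte_query = """
WITH t AS (SELECT stage FROM metrics WHERE name = 'member')
SELECT (SELECT count(*) FROM t) AS total,
       stage,
       count(*) AS stage_count
FROM t
GROUP BY stage
ORDER BY stage
"""
cte_rows = conn.execute(cte_query).fetchall()
for row in cte_rows:
    print(row)
# (3, 'new', 2)
# (3, 'old', 1)
```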

A variant using a window function:
with member as (
select stage, count(*)
from metrics where name = 'member'
group by stage
)
select sum(count) over () as total, member.*
from member
http://sqlfiddle.com/#!15/b97a4/18
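The window-function variant can be checked the same way. This sketch assumes the SQLite library bundled with Python is version 3.25 or newer, since older versions lack window functions. The explicit alias stage_count is an addition: PostgreSQL names the aggregate column count by default, but SQLite does not.

```python
import sqlite3

# Assumes SQLite 3.25+ (window function support).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (stage TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO metrics (stage, name) VALUES (?, ?)",
    [
        ("new", "member"),
        ("new", "member"),
        ("old", "member"),
        ("new", "visitor"),
        ("old", "visitor"),
    ],
)

# Aggregate per stage first, then sum the per-stage counts over the whole
# result set with an empty OVER () window.
window_query = """
WITH member AS (
    SELECT stage, count(*) AS stage_count
    FROM metrics
    WHERE name = 'member'
    GROUP BY stage
)
SELECT sum(stage_count) OVER () AS total, stage, stage_count
FROM member
ORDER BY stage
"""
window_rows = conn.execute(window_query).fetchall()
for row in window_rows:
    print(row)
# (3, 'new', 2)
# (3, 'old', 1)
```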

This can do what you want:
SELECT t2.totalCount,
t1.stage,
t1.stageCount
FROM
(SELECT stage,
COUNT(*) stageCount
FROM metrics
WHERE name = 'member'
GROUP BY stage
) t1,
(SELECT COUNT(*) AS totalCount FROM metrics WHERE name = 'member'
) t2;
See sqlfiddle http://sqlfiddle.com/#!2/0240b/5.
A possible advantage of this approach over the scalar subquery is that the total is computed once in its own derived table, rather than potentially being re-run for each row of the per-stage counts.

Try this code:
select (select count(*) from metrics where name = 'member') as total, stage, count(*) from metrics where name = 'member' group by stage;

Related

How to include column not included in Group By

I have the table DirectCosts with the following columns:
DetailsID (unique)
InvoiceNumber
ProjectID
PayableID
I need to find the duplicate combinations of payableid and invoicenumber.
How can I adjust the following query so that it checks the combination AND displays the list of duplicates instead of the count?
SELECT sinvoicenumber, count(*)
FROM exportdirectcostdetails where iprocoreprojectid = 1187294
GROUP BY sinvoicenumber
HAVING COUNT(*) > 2
Is there a way it can display all columns?
Original question: Why do I get the error that ed2 should have a column name defined?
You are using a derived table, so every column in the derived table needs a name.
select ed1.sinvoicenumber,
ed1.ipayableid,
ed2.sinvoicenumber
from ExportDirectCostDetails ed1
inner join
(
SELECT sinvoicenumber, count(sinvoicenumber) AS InvoiceNumberCount
FROM exportdirectcostdetails
where iprocoreprojectid = 1187294
GROUP BY sinvoicenumber
HAVING COUNT(*) > 2
) ed2
on ed1.sinvoicenumber = ed2.sinvoicenumber
Updated question: How to return all columns
Count the rows per combination with a window function using PARTITION BY, then filter on that count in an outer query:
SELECT t.* FROM
(SELECT *, count(*) OVER(PARTITION BY ipayableid, sinvoicenumber) AS InvoiceCount
FROM exportdirectcostdetails where iprocoreprojectid = 1187294) as t
WHERE InvoiceCount > 1
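A small sketch of this pattern on made-up data, run through Python's sqlite3 module (SQLite 3.25+ assumed for window functions). The column names ipayableid and sinvoicenumber are taken from the earlier join query, detailsid follows the column list in the question, and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exportdirectcostdetails ("
    "detailsid INTEGER PRIMARY KEY, sinvoicenumber TEXT, "
    "iprocoreprojectid INTEGER, ipayableid INTEGER)"
)
conn.executemany(
    "INSERT INTO exportdirectcostdetails VALUES (?, ?, ?, ?)",
    [
        (1, "INV-1", 1187294, 10),
        (2, "INV-1", 1187294, 10),  # same payable + invoice as row 1
        (3, "INV-1", 1187294, 11),  # same invoice, different payable
        (4, "INV-2", 1187294, 10),  # same payable, different invoice
        (5, "INV-1", 999, 10),      # other project, filtered out
    ],
)

# The window count is attached to every row, so the outer query can keep
# whole rows (all columns) whose combination occurs more than once.
dup_query = """
SELECT t.detailsid, t.sinvoicenumber, t.ipayableid, t.invoicecount
FROM (SELECT *,
             count(*) OVER (PARTITION BY ipayableid, sinvoicenumber) AS invoicecount
      FROM exportdirectcostdetails
      WHERE iprocoreprojectid = 1187294) AS t
WHERE t.invoicecount > 1
ORDER BY t.detailsid
"""
dup_rows = conn.execute(dup_query).fetchall()
for row in dup_rows:
    print(row)
# (1, 'INV-1', 10, 2)
# (2, 'INV-1', 10, 2)
```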

Merging two query results in a materialized view

I'm trying to merge two SELECT results into one view.
The first query returns the IDs of all registered users.
The second query goes through an entire table, counts how many victories each player has, and returns the player's ID and number of wins.
What I'm trying to do now is merge these two results, so that if a user has wins the view shows how many, and if not it shows 0.
I tried doing it like this:
SELECT profile.user_id
FROM profile
FULL JOIN ( SELECT player_game_data.user_id,
count(player_game_data.user_id) AS wins
FROM player_game_data
WHERE player_game_data.is_winner = 1
GROUP BY player_game_data.user_id) t2 ON profile.user_id::text = t2.user_id::text;
But in the end it only returns the IDs of the players, and there isn't a count column.
What am I doing wrong?
Is this what you want?
select p.*,
(select count(*)
from player_game_data pg
where pg.user_id = p.user_id and pg.is_winner = 1
) as num_wins
from profile p;
Or, if all users have played at least one game, you can use conditional aggregation:
select pg.user_id,
count(*) filter (where pg.is_winner = 1)
from player_game_data pg
group by pg.user_id;
Or, if is_winner only takes on the values of 0 and 1:
select pg.user_id, sum(pg.is_winner)
from player_game_data pg
group by pg.user_id;
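The two aggregation-only queries read from player_game_data alone, so a profile with no games would not appear at all. As a hedged sketch of one way to keep such users with a count of 0, the following uses a LEFT JOIN from profile with COALESCE; this is a swapped-in variant, not one of the queries from the answer, and it assumes is_winner only holds 0 and 1. The sample rows are hypothetical and run against in-memory SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profile (user_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE player_game_data (user_id TEXT, is_winner INTEGER)")
conn.executemany("INSERT INTO profile VALUES (?)", [("a",), ("b",), ("c",)])
conn.executemany(
    "INSERT INTO player_game_data VALUES (?, ?)",
    [("a", 1), ("a", 1), ("a", 0), ("b", 0)],  # user c has no games
)

# LEFT JOIN keeps every profile row; SUM over an all-NULL group is NULL,
# which COALESCE turns into 0.
wins_query = """
SELECT p.user_id, COALESCE(SUM(pg.is_winner), 0) AS wins
FROM profile p
LEFT JOIN player_game_data pg ON pg.user_id = p.user_id
GROUP BY p.user_id
ORDER BY p.user_id
"""
wins_rows = conn.execute(wins_query).fetchall()
for row in wins_rows:
    print(row)
# ('a', 2)
# ('b', 0)
# ('c', 0)
```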
Thanks for the help Gordon. I've got it to work now.
The final query looks like this :
SELECT p.user_id,
( SELECT count(*) AS count
FROM player_game_data pg
WHERE pg.user_id::text = p.user_id::text AND pg.is_winner = 1) AS wins,
( SELECT count(*) AS count
FROM player_game_data pg
WHERE pg.user_id::text = p.user_id::text AND pg.is_winner = 0) AS losses,
( SELECT count(*) AS count
FROM player_game_data pg
WHERE pg.user_id::text = p.user_id::text) AS games_played
FROM profile p;
And when I run it I get the result that I wanted.

SQL - When result is duplicated on 2 fields remove all

When I run this query:
SELECT
DT.CONTRACT_NUMBER,
DT.ROLE,
DT.TAX_ID,
DT.EFFECTIVE_DATE
FROM DATA_TABLE DT
I get this result.
I'd like to remove results where the TAX ID appears more than once for each contract.
That is, every row for such a TAX ID should be gone, whether it appears twice or three times.
I think window functions might be the way to go:
SELECT DT.CONTRACT_NUMBER, DT.ROLE, DT.TAX_ID, DT.EFFECTIVE_DATE
FROM (SELECT DT.CONTRACT_NUMBER, DT.ROLE, DT.TAX_ID, DT.EFFECTIVE_DATE,
COUNT(*) OVER (PARTITION BY TAX_ID) as cnt
FROM DATA_TABLE DT
WHERE DT.CONTRACT_NUMBER = '551000280'
) DT
WHERE CNT = 1;
If you actually want to keep one row per tax id, then use row_number() instead of count(*).
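The difference between the two options can be sketched on a few hypothetical rows, again using in-memory SQLite (3.25+ assumed) from Python. Ordering row_number() by EFFECTIVE_DATE, so that the earliest row per tax id is the one kept, is an assumption; the answer does not say which row should survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_table ("
    "contract_number TEXT, role TEXT, tax_id TEXT, effective_date TEXT)"
)
conn.executemany(
    "INSERT INTO data_table VALUES (?, ?, ?, ?)",
    [
        ("551000280", "OWNER", "111", "2020-01-01"),
        ("551000280", "PAYER", "111", "2021-01-01"),  # tax id 111 repeats
        ("551000280", "OWNER", "222", "2020-06-01"),
    ],
)

# count(*) = 1 drops every row of a repeated tax id.
only_unique = conn.execute("""
SELECT contract_number, role, tax_id, effective_date
FROM (SELECT dt.*, count(*) OVER (PARTITION BY tax_id) AS cnt
      FROM data_table dt
      WHERE dt.contract_number = '551000280') AS counted
WHERE cnt = 1
ORDER BY tax_id
""").fetchall()

# row_number() = 1 keeps exactly one row per tax id (here: the earliest date).
one_per_tax_id = conn.execute("""
SELECT contract_number, role, tax_id, effective_date
FROM (SELECT dt.*,
             row_number() OVER (PARTITION BY tax_id
                                ORDER BY effective_date) AS rn
      FROM data_table dt
      WHERE dt.contract_number = '551000280') AS ranked
WHERE rn = 1
ORDER BY tax_id
""").fetchall()

print(only_unique)
print(one_per_tax_id)
```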

Count query with timestamp value

I would like to create a count query (in Postgres) which counts data.data_name dependent on data.todb_date.
What I want is for the query to count all the rows whose date is later than the one required in the WHERE clause. I tried Count(data.data_name) and Count(*) but they didn't work.
My planned result looks like this:
todb_date: 2016-01-01
data.data_name : test1
count: 150
todb_date: 2017-01-01
data.data_name : test1
count: 130
This is the query I have tried:
SELECT data.data_name, parentdata.data_id,
data.data_id, parentdata.todb_date,
COUNT (data.data_name)
FROM parentdata, data
WHERE parentdata.data_id = data.data_id
AND parentdata.todb_date > '2016-01-01'
GROUP BY parentdata.data_id, data.data_id, data.data_name, parentdata.todb_date
As @Usagi Miyamoto suggested, you should use the date_trunc() function to group your results by a time increment (here: per year):
SELECT d.data_name nam, date_trunc('year',p.todb_date) yr, COUNT(*) cnt
FROM parentdata p
INNER JOIN data d ON p.data_id = d.data_id AND p.todb_date > '2016-01-01'
GROUP BY d.data_name,date_trunc('year',p.todb_date)
ORDER BY nam, yr
If you replace 'year' with 'day' you will get daily counts.
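The grouping idea can be tried out locally with a rough stand-in. SQLite has no date_trunc(), so this sketch groups by strftime('%Y', ...) instead, which yields the year as text rather than a truncated timestamp; in PostgreSQL the date_trunc() form from the answer is the one to use. Table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parentdata (data_id INTEGER, todb_date TEXT)")
conn.execute("CREATE TABLE data (data_id INTEGER, data_name TEXT)")
conn.executemany(
    "INSERT INTO parentdata VALUES (?, ?)",
    [(1, "2016-03-01"), (2, "2016-07-15"), (3, "2017-02-01"), (4, "2015-12-31")],
)
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [(1, "test1"), (2, "test1"), (3, "test1"), (4, "test1")],
)

# strftime('%Y', ...) is SQLite's substitute here for date_trunc('year', ...).
yearly_query = """
SELECT d.data_name AS nam,
       strftime('%Y', p.todb_date) AS yr,
       COUNT(*) AS cnt
FROM parentdata p
INNER JOIN data d ON p.data_id = d.data_id
WHERE p.todb_date > '2016-01-01'
GROUP BY d.data_name, strftime('%Y', p.todb_date)
ORDER BY nam, yr
"""
yearly_rows = conn.execute(yearly_query).fetchall()
for row in yearly_rows:
    print(row)
# ('test1', '2016', 2)
# ('test1', '2017', 1)
```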

SQL query count divided by a distinct count of same query

Having some trouble with some SQL.
Take the following result for instance:
LOC_CODE CHANNEL
------------ --------------------
3ATEST-01 CHAN2
3ATEST-01 CHAN3
3ATEST-02 CHAN4
What I need to do is get a count of the above query, grouped by channel, but I want that count to be divided by the number of times the "LOC_CODE" appears.
Example of the result I am after is:
CHANNEL COUNT
---------------- ----------
CHAN2 0.5
CHAN3 0.5
CHAN4 1
To explain: CHAN2 appears next to "3ATEST-01", but the LOC_CODE "3ATEST-01" appears twice, so the count should be divided by 2.
I know I can do this by basically duplicating the query with a distinct count, but the underlying query is quite complex and I don't really want to harm performance.
Please let me know if you would like more information!
Try:
select channel,
count(*) over (partition by channel, loc_code)
/ count(*) over (partition by loc_code) as count_ratio
from my_table
SELECT t.CHANNEL, COUNT(*) / gr.TotalCount
FROM my_table t JOIN (
SELECT LOC_CODE, COUNT(*) TotalCount
FROM my_table
GROUP BY LOC_CODE
) gr USING(LOC_CODE)
GROUP BY t.LOC_CODE, t.CHANNEL
Create an index on (LOC_CODE, CHANNEL).
If there are no duplicate channels, replace COUNT(*) / gr.TotalCount with 1 / gr.TotalCount and remove the outer GROUP BY clause.
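A minimal sketch of the join-based query on the sample rows from the question, run against in-memory SQLite from Python. One addition to be aware of: the count is multiplied by 1.0, because SQLite (like PostgreSQL) performs integer division when both operands are integers, which would turn 1 / 2 into 0. The join is written with ON instead of USING; the result is the same here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (loc_code TEXT, channel TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("3ATEST-01", "CHAN2"), ("3ATEST-01", "CHAN3"), ("3ATEST-02", "CHAN4")],
)

# Inner query: how often each LOC_CODE appears.
# Outer query: rows per (LOC_CODE, CHANNEL) divided by that total.
ratio_query = """
SELECT t.channel, COUNT(*) * 1.0 / gr.totalcount AS ratio
FROM my_table t
JOIN (SELECT loc_code, COUNT(*) AS totalcount
      FROM my_table
      GROUP BY loc_code) gr ON gr.loc_code = t.loc_code
GROUP BY t.loc_code, t.channel
ORDER BY t.channel
"""
ratio_rows = conn.execute(ratio_query).fetchall()
for row in ratio_rows:
    print(row)
# ('CHAN2', 0.5)
# ('CHAN3', 0.5)
# ('CHAN4', 1.0)
```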
First, find a query that gets you the correct results. Then, see if it can be optimised. My guess is that it's hard to optimise as you require two different groupings, one per Channel and one per Loc_Code.
I'm not even sure that this fits your description:
SELECT t.CHANNEL
, COUNT(*) / SUM(grp.TotalCount)
FROM my_table t
JOIN
( SELECT LOC_CODE
, COUNT(*) TotalCount --- or is it perhaps?:
--- COUNT(DISTINCT CHANNEL)
FROM my_table
GROUP BY LOC_CODE
) grp
ON grp.LOC_CODE = t.LOC_CODE
GROUP BY t.CHANNEL
Your requirements are still a bit unclear to me when it comes to duplicate CHANNELs, but this should work if you want to group on both CHANNEL and LOC_CODE and sum up later:
SELECT L1.CHANNEL, 1/COUNT(L2.LOC_CODE)
FROM Locations L1
LEFT JOIN Locations L2 ON L1.LOC_CODE = L2.LOC_CODE
GROUP BY L1.CHANNEL, L1.LOC_CODE