PostgreSQL: Using COUNT to form statistical results

I have a few tables that make up a media catalog of live/studio music, where each media item has zero or more show dates, CDs and vinyl records associated with it. The query I have at the moment pulls out statistics as a tabular result covering all the media items available. I'm now having trouble extending the query to include finer-grained statistics on each associated table.
Schema:
media(id , title)
cd(media_fk, type)
vinyl(media_fk)
gig(id, date)
media_gigs(gig_fk, media_fk)
Query I have thus far:
SELECT m.id, m.title, COUNT(DISTINCT c.id) as cds, COUNT(DISTINCT v.id) as vinyl, gig.id as gid, gig.date as gdate
FROM media m
LEFT JOIN cd c on m.id = c.media
LEFT JOIN vinyl v on m.id = v.media
LEFT JOIN media_gigs g on m.id = g.media
LEFT JOIN gig gig on g.gig = gig.id
GROUP BY m.id, gig.id;
Which produces:
id | title | cds | vinyl | gid | gdate
---+---------+-----+-------+--------------------------+------------
1 | title 1 | 5 | 1 | may-11-1989-kawasaki | 1989-05-11
1 | title 1 | 5 | 1 | may-13-1989-tokyo | 1989-05-13
2 | title 2 | 6 | 0 | apr-29-1998-nagoya | 1998-04-29
2 | title 2 | 6 | 0 | may-6-1998-tokyo | 1998-05-06
2 | title 2 | 6 | 0 | may-7-1998-tokyo | 1998-05-07
3 | title 3 | 6 | 2 | dec-1-1986-new-york-city | 1986-12-01
3 | title 3 | 6 | 2 | dec-5-1986-quebec-city | 1986-12-05
3 | title 3 | 6 | 2 | nov-19-1986-tokyo | 1986-11-19
3 | title 3 | 6 | 2 | nov-20-1986-tokyo | 1986-11-20
cd.type is an enum type of [silver, cdr, pro-cdr] that I want to add to the results. So the end goal is to have 3 additional columns, each a count of the CDs of one type associated with each media item. I've not found the correct syntax, using COUNT or otherwise, to aggregate the CDs by type, so I'm looking for a push in the right direction. I'm fairly new to SQL, so what I have so far may be a bit naive.
Using PG 9.3.

You can use a CASE expression to test the cd type and do a SUM based on the result, as below:
SELECT
    m.id,
    m.title,
    COUNT(DISTINCT c.id) as cds,
    COUNT(DISTINCT v.id) as vinyl,
    gig.id as gid, gig.date as gdate,
    -- c is the alias given to the cd table in the FROM clause
    SUM(case c.type
            when 'silver' then 1
            else 0
        end) silver,
    SUM(case c.type
            when 'cdr' then 1
            else 0
        end) cdr,
    SUM(case c.type
            when 'pro-cdr' then 1
            else 0
        end) pro_cdr
FROM media m
LEFT JOIN cd c on m.id = c.media
LEFT JOIN vinyl v on m.id = v.media
LEFT JOIN media_gigs g on m.id = g.media
LEFT JOIN gig gig on g.gig = gig.id
GROUP BY m.id, gig.id;
Be aware that the join to vinyl repeats every cd row once per vinyl row, so these sums are inflated for media with more than one vinyl; COUNT(DISTINCT CASE WHEN c.type = 'silver' THEN c.id END) is not affected.
References:
Conditional Expressions on PostgreSQL 9.3 Manual
Enumerated Types on PostgreSQL 9.3 Manual
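To see how the per-type counts behave when several one-to-many joins are combined, here is a minimal sketch. It uses Python's sqlite3 with an in-memory SQLite database as a stand-in for PostgreSQL, and it assumes cd and vinyl each have an id column and a media column (as the question's query implies). The sample rows are made up.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; table layout and rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE media (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE cd    (id INTEGER PRIMARY KEY, media INTEGER, type TEXT);
    CREATE TABLE vinyl (id INTEGER PRIMARY KEY, media INTEGER);
    INSERT INTO media VALUES (1, 'title 1');
    INSERT INTO cd (media, type) VALUES (1, 'silver'), (1, 'silver'), (1, 'cdr');
    INSERT INTO vinyl (media) VALUES (1), (1);
""")

# Two vinyl rows make every cd row appear twice in the joined result.
row = conn.execute("""
    SELECT m.id,
           COUNT(DISTINCT c.id) AS cds,
           SUM(CASE c.type WHEN 'silver' THEN 1 ELSE 0 END) AS silver_sum,
           COUNT(DISTINCT CASE WHEN c.type = 'silver' THEN c.id END) AS silver_distinct
    FROM media m
    LEFT JOIN cd c    ON m.id = c.media
    LEFT JOIN vinyl v ON m.id = v.media
    GROUP BY m.id
""").fetchone()

print(row)  # (1, 3, 4, 2): the SUM counts each silver cd twice, the DISTINCT count does not
```

With a single vinyl row per media item both columns agree, so the difference only shows up once a second one-to-many join fans the rows out.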

As the other poster mentioned, you can do this with a SUM(CASE WHEN <cond1> THEN 1 ELSE 0 END) construction on the c.type column.
There are some other problems with your SQL I would like to mention:
Incorrect use of LEFT JOIN
You group on a value that might be NULL: gig.id. This is probably because of incorrect use of the LEFT JOIN. Only use left join if you want to keep rows in the result set that have no match in the joining table.
So on the cd table a left join is correct, because you also want to be able to show that there are 0 CDs. On the media_gigs and gig tables you probably want an INNER JOIN, because there always has to be a match.
Edit: It's possible that I mistakenly thought this was incorrect. I assumed from the sample data that you don't want to display media for which there is no gig.
Non-grouping, non-aggregate columns
In your query you select columns that you neither group on nor wrap in an aggregate function (like SUM or COUNT). While some database dialects may accept this, it is bad practice. For instance, take the following query:
SELECT x, y, SUM(z) FROM t
GROUP BY x;
If y is not functionally dependent on x, that is, if there can be different values of y for one value of x, it is not clear which of these values should be displayed. Therefore you should always write it like this:
SELECT x, y, SUM(z) FROM t
GROUP BY x, y;

Related

Oracle SQL: Optimizing LEFT OUTER JOIN of two similar select statements to be smaller and/or more efficient

So I have this Oracle SQL query:
SELECT man.Toilet_Type, NVL(man.manual_PORTA_POTTY, 0) MANUAL, NVL(reg.regular_PORTA_POTTY, 0) REGULAR FROM (
SELECT A.Visitor Toilet_Type, COUNT(A.Toilet_ID) MANUAL_PORTA_POTTY FROM
BORE.EnragedPotty A,
BORE.SemiEnragedPotty B,
BORE.ManualPotty C
WHERE B.SemiEnragedPotty_ID = C.SemiEnragedPotty_ID
AND B.Toilet_ID = A.Toilet_ID
GROUP BY Visitor
ORDER BY Visitor ASC) man
LEFT OUTER JOIN
(SELECT A.Visitor Toilet_Type, COUNT(B.Toilet_ID) REGULAR_PORTA_POTTY FROM
BORE.EnragedPotty A,
BORE.RegularPotty B
WHERE B.Toilet_ID = A.Toilet_ID
GROUP BY Visitor
ORDER BY Visitor ASC) reg ON man.Toilet_Type = reg.Toilet_Type
This gives two table results. The first query, man, gives me the following output:
+===============+========+
| Toilet_Type | Manual |
+===============+========+
| Portable | 234 |
+---------------+--------+
| Home | 10 |
+---------------+--------+
| Assassination | 2 |
+---------------+--------+
The second query, reg, gives me the same output as above, but with REGULAR instead of MANUAL.
What I want to do is query the databases in a more efficient manner. I want the output to be formatted like so:
+===============+========+=========+
| Toilet_Type | Manual | Regular |
+===============+========+=========+
| Portable | 234 | 444 |
+---------------+--------+---------+
| Home | 10 | 222 |
+---------------+--------+---------+
| Assassination | 2 | 111 |
+---------------+--------+---------+
Surely this can be done in a single query without using a LEFT OUTER JOIN?
This is untested, as I didn't have any sample data, but I think something similar to this might get it done in one query:
SELECT
E.Visitor Toilet_Type,
SUM(case when SE.SemiEnragedPotty_ID is not null and
M.Toilet_ID is not null then 1 else 0 end) MANUAL_PORTA_POTTY,
SUM(case when R.Toilet_ID is not null then 1 else 0 end) REGULAR_PORTA_POTTY
FROM
BORE.EnragedPotty E,
BORE.SemiEnragedPotty SE,
BORE.ManualPotty M,
BORE.RegularPotty R
WHERE
E.SemiEnragedPotty_ID = SE.SemiEnragedPotty_ID (+) AND
E.Toilet_ID = M.Toilet_ID (+) AND
E.Toilet_ID = R.Toilet_ID (+)
GROUP BY Visitor
ORDER BY Visitor ASC
I may have some of the details off -- I had to rename your aliases to follow which table was which, so it wouldn't shock me if I misplaced one of them.
If you need to pull from the same dataset twice, you should consider using subquery factoring.
WITH
some_result_you_dont_want_to_repeat AS (
-- Chunk of SQL goes here
)
SELECT
-- More SQL here
FROM some_result_you_dont_want_to_repeat once
JOIN some_result_you_dont_want_to_repeat twice
ON ...
In your case, it appears that your A-B table join can be factored out.
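As a rough illustration of subquery factoring, the sketch below defines one WITH block and references it twice. It runs on SQLite through Python's sqlite3 rather than Oracle, and the potty table with its visitor column is a simplified, hypothetical stand-in for the tables in the question.

```python
import sqlite3

# Hypothetical, simplified table; SQLite stands in for Oracle here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE potty (toilet_id INTEGER PRIMARY KEY, visitor TEXT);
    INSERT INTO potty (visitor)
    VALUES ('Portable'), ('Portable'), ('Portable'), ('Home');
""")

cte_rows = conn.execute("""
    WITH counts AS (                 -- factored-out subquery, written once
        SELECT visitor, COUNT(*) AS n
        FROM potty
        GROUP BY visitor
    )
    SELECT c.visitor, c.n, t.total
    FROM counts c                                        -- first reference
    CROSS JOIN (SELECT SUM(n) AS total FROM counts) t    -- second reference
    ORDER BY c.visitor
""").fetchall()

print(cte_rows)  # [('Home', 1, 4), ('Portable', 3, 4)]
```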

(SQL / PostgreSQL) In a query, how can I translate a field's value into a more human-readable value using another table as a lookup?

I have postgresql tables with values like:
Table region_data:
region_name | population | region_code
------------+------------+-------------
Region 1 | 120000 | A
Region 2 | 200000 | A
Region 3 | -1 | B
Region 4 | -2 | -1
Where some data may not be available (i.e., the -1 and -2 values)
And tables that contain translations for those values:
Table data_codes:
code | meaning
------+-----------------------
-1 | 'Data not available'
-2 | 'Insufficient data'
...
and
Table region_types:
type | meaning
------+---------------
A | Mountain
B | Grassland
...
I want to make a query (actually a view) that returns the human-readable translations provided by the data_code and region_types tables. For instance, the view would return:
Region Name | Population | Region Type
------------+--------------------+-------------
Region 1 | 120000 | Mountain
Region 2 | 200000 | Mountain
Region 3 | Data Not Available | Grassland
Region 4 | Insufficient Data | Data Not Available
I've tried doing some sub-queries, but they return a lot of duplicate rows where the code doesn't match to anything in the data_code table.
Please help? Thanks!
Assuming there is no conflict between the data codes and the region codes, then I see two challenges. One is the data type problem on the population column (the value is an integer but the data meaning requires a string). The other is combining the region codes with the data codes:
select rd.region_name,
       (case when rd.population >= 0 then cast(rd.population as varchar(255))
             else p.meaning
        end) as population,
       r.meaning
from region_data rd left outer join
     (select type, meaning from region_types
      union all
      -- cast the integer code so both branches of the union have the same type
      select cast(code as varchar(255)), meaning from data_codes
     ) r
     on rd.region_code = r.type left outer join
     data_codes p
     on rd.population < 0 and rd.population = p.code;
select
r.region_name,
coalesce(d1.meaning, r.population::text) as population,
coalesce(d2.meaning, rt.meaning, r.region_code) as region_code
from region_data as r
left outer join data_codes as d1 on d1.code = r.population
left outer join data_codes as d2 on d2.code::text = r.region_code
left outer join region_types as rt on rt.type = r.region_code
order by r.region_name
=> sql fiddle demo
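The COALESCE approach can be tried against the sample data from the question. The sketch below is an adaptation, not the original fiddle: it uses SQLite through Python's sqlite3, so the PostgreSQL ::text casts are written as CAST(... AS TEXT), and the column types are assumptions.

```python
import sqlite3

# SQLite stand-in for PostgreSQL; column types are assumed from the sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE region_data  (region_name TEXT, population INTEGER, region_code TEXT);
    CREATE TABLE data_codes   (code INTEGER, meaning TEXT);
    CREATE TABLE region_types (type TEXT, meaning TEXT);
    INSERT INTO region_data VALUES
        ('Region 1', 120000, 'A'), ('Region 2', 200000, 'A'),
        ('Region 3', -1, 'B'),     ('Region 4', -2, '-1');
    INSERT INTO data_codes VALUES (-1, 'Data not available'), (-2, 'Insufficient data');
    INSERT INTO region_types VALUES ('A', 'Mountain'), ('B', 'Grassland');
""")

region_rows = conn.execute("""
    SELECT r.region_name,
           COALESCE(d1.meaning, CAST(r.population AS TEXT)) AS population,
           COALESCE(d2.meaning, rt.meaning, r.region_code)  AS region_type
    FROM region_data AS r
    LEFT JOIN data_codes   AS d1 ON d1.code = r.population
    LEFT JOIN data_codes   AS d2 ON CAST(d2.code AS TEXT) = r.region_code
    LEFT JOIN region_types AS rt ON rt.type = r.region_code
    ORDER BY r.region_name
""").fetchall()

for region_row in region_rows:
    print(region_row)
```

Each lookup table matches at most one row per region, so the left joins return exactly one row per region and no duplicates appear.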
Maybe you can post the query. Duplicated rows when using joins usually mean INNER JOIN instead of LEFT JOIN.

SQL Order By Within A Count(Distinct)

I have the following tables:
filetype1
F1_ID | F1_ORDR | FILENAME
1 | 1 | file1.txt
2 | 2 | file2.txt
3 | 3 | file3.txt
4 | 2 | file4.txt
5 | 4 | file5.txt
filetype2
F2_ID | F2_ORDR | FILENAME
1 | 1 | file6.txt
2 | 2 | file7.txt
3 | 4 | file8.txt
ordr
OR_ID | OR_VENDOR
1 | 1
2 | 1
3 | 1
4 | 1
vendor
VE_ID | VE_NAME
1 | Company1
My goal is to have a list of vendors and a count of the number of orders where a file is connected for each type. For example, the end result of this data should be:
VENDOR | OR_CT | F1_CT | F2_CT
Company1 | 4 | 4 | 3
Because at least 1 type1 file was attached to 4 distinct orders and at least 1 type2 file was attached to 3 distinct orders. Currently my SQL code looks like this:
SELECT vendor.ve_id, vendor.ve_name,
(SELECT COUNT(or_id)
FROM ordr
WHERE ordr.or_vendor = vendor.ve_id) as OR_COUNT,
(SELECT COUNT(DISTINCT f1_order)
FROM filetype1 INNER JOIN ordr ON filetype1.f1_ordr = ordr.or_id
WHERE ordr.or_vendor = vendor.ve_id) as F1_CT,
(SELECT COUNT(DISTINCT f2_ordr)
FROM filetype2 INNER JOIN ordr ON filetype2.f2_ordr = ordr.or_id
WHERE ordr.or_vendor = vendor.ve_id) as F2_CT
FROM vendor
ORDER BY vendor.ve_name
Unfortunately this yields the following results:
VENDOR | OR_COUNT | F1_COUNT | F2_COUNT
Company1 | 4 | 5 | 3
My only guess is that because I'm using COUNT(DISTINCT), the COUNT is automatically assuming the DISTINCT is ordering by F1_ID instead of by F1_ORDR.
If anyone can assist me on how to tell the COUNT(DISTINCT) to order by F1_ORDR, that would be most helpful. I have searched the vast internet for a solution, but it's hard to explain what I want to a search engine, forums, etc. My database uses Microsoft SQL Server. My knowledge of database management is almost completely self-taught, so I'm just glad I made it this far on my own. My expertise is in web design. Thank you for your time.
Your SQL yields the result you want for me.
Two pieces of advice
Order is a bad name for a table - it conflicts with a reserved word, and will cause you no end of hassle
You should join your tables like so
FROM filetype1
inner join [order]
on filetype1.f1_order = or_id
rather than using a where clause
Perhaps try this instead
select
vendor.VE_ID, vendor.VE_NAME,
count(distinct or_id),
count(distinct f1_order),
count(distinct f2_order)
from
vendor
left join [order]
on vendor.VE_ID = [order].OR_VENDOR
inner join filetype1
on [order].OR_ID = filetype1.F1_ORDER
left join filetype2
on [order].OR_ID = filetype2.F2_ORDER
group by
vendor.VE_ID, vendor.VE_NAME
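The claim that the original correlated subqueries already give the wanted counts can be checked against the sample data. This is a sketch on SQLite through Python's sqlite3, not SQL Server, using the table and column names from the question's tables (ordr, f1_ordr, f2_ordr). The extra f1_rows column is added here only for comparison.

```python
import sqlite3

# SQLite stand-in for SQL Server, loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vendor    (ve_id INTEGER, ve_name TEXT);
    CREATE TABLE ordr      (or_id INTEGER, or_vendor INTEGER);
    CREATE TABLE filetype1 (f1_id INTEGER, f1_ordr INTEGER, filename TEXT);
    CREATE TABLE filetype2 (f2_id INTEGER, f2_ordr INTEGER, filename TEXT);
    INSERT INTO vendor VALUES (1, 'Company1');
    INSERT INTO ordr VALUES (1, 1), (2, 1), (3, 1), (4, 1);
    INSERT INTO filetype1 VALUES (1, 1, 'file1.txt'), (2, 2, 'file2.txt'),
        (3, 3, 'file3.txt'), (4, 2, 'file4.txt'), (5, 4, 'file5.txt');
    INSERT INTO filetype2 VALUES (1, 1, 'file6.txt'), (2, 2, 'file7.txt'),
        (3, 4, 'file8.txt');
""")

counts = conn.execute("""
    SELECT v.ve_name,
           (SELECT COUNT(o.or_id) FROM ordr o
             WHERE o.or_vendor = v.ve_id) AS or_ct,
           (SELECT COUNT(DISTINCT f.f1_ordr)
              FROM filetype1 f JOIN ordr o ON f.f1_ordr = o.or_id
             WHERE o.or_vendor = v.ve_id) AS f1_ct,
           (SELECT COUNT(DISTINCT f.f2_ordr)
              FROM filetype2 f JOIN ordr o ON f.f2_ordr = o.or_id
             WHERE o.or_vendor = v.ve_id) AS f2_ct,
           -- same subquery without DISTINCT, for comparison only
           (SELECT COUNT(f.f1_ordr)
              FROM filetype1 f JOIN ordr o ON f.f1_ordr = o.or_id
             WHERE o.or_vendor = v.ve_id) AS f1_rows
    FROM vendor v
    ORDER BY v.ve_name
""").fetchone()

print(counts)  # ('Company1', 4, 4, 3, 5)
```

COUNT(DISTINCT f1_ordr) counts distinct order numbers and involves no ordering. The value 5 only appears when every filetype1 row is counted, which is what a plain COUNT does.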
Try this:
SELECT
vdr.VE_NAME
,COUNT(DISTINCT OR_ID) AS OR_ID
,COUNT(DISTINCT ft1.F1_ORDER) AS FT1_COUNT
,COUNT(DISTINCT ft2.F2_ORDER) AS FT2_COUNT
FROM
vendor vdr
LEFT OUTER JOIN [order] odr
ON vdr.VE_ID = odr.OR_VENDOR
INNER JOIN filetype1 ft1
ON odr.OR_ID = ft1.F1_ORDER
LEFT OUTER JOIN filetype2 ft2
ON odr.OR_ID = ft2.F2_ORDER
GROUP BY
vdr.VE_ID
,vdr.VE_NAME
I would propose this:
Merge the filetype1 and filetype2 tables into one table (filetype) and add another field, named f_type for instance, of type INT or TINYINT to store the file type (1 or 2). This has the benefit of making it painless to add another file type later.
Now the query will look something like this:
SELECT
    vendor.ve_name,
    count(DISTINCT filetype.f_order),
    filetype.f_type
FROM
    filetype
    INNER JOIN [order]
        ON filetype.f_order = [order].or_id
    INNER JOIN vendor
        ON [order].or_vendor = vendor.ve_id
GROUP BY vendor.ve_id, vendor.ve_name, filetype.f_type
This will give the count of orders for each file type.
For the total orders just add another query:
SELECT count(*) FROM [order]

join on three tables? Error in phpMyAdmin

I'm trying to use a join on three tables query I found in another post (post #5 here). When I try to use this in the SQL tab of one of my tables in phpMyAdmin, it gives me an error:
#1066 - Not unique table/alias: 'm'
The exact query I'm trying to use is:
select r.*,m.SkuAbbr, v.VoucherNbr from arrc_RedeemActivity r, arrc_Merchant m, arrc_Voucher v
LEFT OUTER JOIN arrc_Merchant m ON (r.MerchantID = m.MerchantID)
LEFT OUTER JOIN arrc_Voucher v ON (r.VoucherID = v.VoucherID)
I'm not entirely certain it will do what I need it to do or that I'm using the right kind of join (my grasp of SQL is pretty limited at this point), but I was hoping to at least see what it produced.
(What I'm trying to do, if anyone cares to assist, is get all columns from arrc_RedeemActivity, plus SkuAbbr from arrc_Merchant where the merchant IDs match in those two tables, plus VoucherNbr from arrc_Voucher where VoucherIDs match in those two tables.)
Edited to add table samples
Table arrc_RedeemActivity
RedeemID | VoucherID | MerchantID | RedeemAmt
----------------------------------------------
1 | 2 | 3 | 25
2 | 6 | 5 | 50
Table arrc_Merchant
MerchantID | SkuAbbr
---------------------
3 | abc
5 | def
Table arrc_Voucher
VoucherID | VoucherNbr
-----------------------
2 | 12345
6 | 23456
So ideally, what I'd like to get back would be:
RedeemID | VoucherID | MerchantID | RedeemAmt | SkuAbbr | VoucherNbr
-----------------------------------------------------------------------
1 | 2 | 3 | 25 | abc | 12345
2 | 6 | 5 | 50 | def | 23456
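The problem is that arrc_Merchant and arrc_Voucher are each referenced twice: once in the comma-separated FROM list and once in a JOIN. Duplicate table references would be allowed on their own, but here both references use the same alias (m and v), which causes the "Not unique table/alias" error.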
If you want to only see rows where there are supporting records in both tables, use:
SELECT r.*,
m.SkuAbbr,
v.VoucherNbr
FROM arrc_RedeemActivity r
JOIN arrc_Merchant m ON m.merchantid = r.merchantid
JOIN arrc_Voucher v ON v.voucherid = r.voucherid
This will show NULL for the m and v references that don't have a match based on the JOIN criteria:
SELECT r.*,
m.SkuAbbr,
v.VoucherNbr
FROM arrc_RedeemActivity r
LEFT JOIN arrc_Merchant m ON m.merchantid = r.merchantid
LEFT JOIN arrc_Voucher v ON v.voucherid = r.voucherid
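Both variants can be tried on the sample tables from the question. The sketch below uses SQLite through Python's sqlite3 instead of MySQL. The third arrc_RedeemActivity row (RedeemID 3) is hypothetical, added only so that the two join types give different results.

```python
import sqlite3

# SQLite stand-in for MySQL; RedeemID 3 is a made-up row with no matching merchant or voucher.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE arrc_RedeemActivity
        (RedeemID INTEGER, VoucherID INTEGER, MerchantID INTEGER, RedeemAmt INTEGER);
    CREATE TABLE arrc_Merchant (MerchantID INTEGER, SkuAbbr TEXT);
    CREATE TABLE arrc_Voucher  (VoucherID INTEGER, VoucherNbr INTEGER);
    INSERT INTO arrc_RedeemActivity VALUES (1, 2, 3, 25), (2, 6, 5, 50), (3, 9, 9, 10);
    INSERT INTO arrc_Merchant VALUES (3, 'abc'), (5, 'def');
    INSERT INTO arrc_Voucher  VALUES (2, 12345), (6, 23456);
""")

# The same query text is used for both join types; only the join keyword changes.
query = """
    SELECT r.*, m.SkuAbbr, v.VoucherNbr
    FROM arrc_RedeemActivity r
    {kind} arrc_Merchant m ON m.MerchantID = r.MerchantID
    {kind} arrc_Voucher  v ON v.VoucherID  = r.VoucherID
    ORDER BY r.RedeemID
"""
inner_rows = conn.execute(query.format(kind="JOIN")).fetchall()
left_rows = conn.execute(query.format(kind="LEFT JOIN")).fetchall()

print(inner_rows)  # only the two rows with a matching merchant and voucher
print(left_rows)   # also includes RedeemID 3, with None for SkuAbbr and VoucherNbr
```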

Deleting similar columns in SQL

In PostgreSQL 8.3, let's say I have a table called widgets with the following:
id | type | count
--------------------
1 | A | 21
2 | A | 29
3 | C | 4
4 | B | 1
5 | C | 4
6 | C | 3
7 | B | 14
I want to remove duplicates based upon the type column, leaving only those with the highest count column value in the table. The final data would look like this:
id | type | count
--------------------
2 | A | 29
3 | C | 4 /* `id` for this record might be '5' depending on your query */
7 | B | 14
I feel like I'm close, but I can't seem to wrap my head around a query that works to get rid of the duplicate columns.
count is an SQL keyword, so it may need to be quoted. In PostgreSQL, identifiers are quoted with double quotes (square brackets are SQL Server syntax). In any case, the following should theoretically work (but I didn't actually test it):
delete from widgets where id not in (
    select max(w2.id) from widgets as w2 inner join
    (select max(w1."count") as "count", type from widgets as w1 group by w1.type) as sq
    on sq."count" = w2."count" and sq.type = w2.type group by w2.type
);
There is a slightly simpler answer than Asaph's, using the SQL EXISTS operator:
DELETE FROM widgets AS a
WHERE EXISTS
(SELECT * FROM widgets AS b
WHERE (a.type = b.type AND b.count > a.count)
OR (b.id > a.id AND a.type = b.type AND b.count = a.count))
The EXISTS operator returns TRUE if the subquery that follows it returns at least one row.
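The EXISTS-based delete can be tried on the sample widgets table. The sketch below runs on SQLite through Python's sqlite3 rather than PostgreSQL 8.3. As a precaution it refers to the outer table by name instead of giving the DELETE target an alias, which is a deviation from the statement above.

```python
import sqlite3

# SQLite stand-in for PostgreSQL, loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE widgets (id INTEGER PRIMARY KEY, type TEXT, "count" INTEGER);
    INSERT INTO widgets VALUES
        (1, 'A', 21), (2, 'A', 29), (3, 'C', 4), (4, 'B', 1),
        (5, 'C', 4),  (6, 'C', 3),  (7, 'B', 14);
""")

# Delete a row when another row of the same type has a higher count,
# or the same count and a higher id (the tie-breaker).
conn.execute("""
    DELETE FROM widgets
    WHERE EXISTS (
        SELECT 1 FROM widgets AS b
        WHERE b.type = widgets.type
          AND (b."count" > widgets."count"
               OR (b."count" = widgets."count" AND b.id > widgets.id))
    )
""")

remaining = conn.execute(
    'SELECT id, type, "count" FROM widgets ORDER BY id'
).fetchall()
print(remaining)  # [(2, 'A', 29), (5, 'C', 4), (7, 'B', 14)]
```

The id comparison is what breaks the tie between ids 3 and 5, so exactly one row per type is left.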
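According to your requirements, it seems to me that this should work:
DELETE
FROM widgets
WHERE (type, count) NOT IN
(
    SELECT type, MAX(count)
    FROM widgets
    GROUP BY type
)
Note that rows tied for the highest count within a type (ids 3 and 5 in the sample) are both kept.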