How to count() and aggregate values without using group by? - sql

I'm having trouble obtaining data and categorizing it within a select without filtering it through the group by clause.
I have a table populated with the results of a customer satisfaction poll, which includes the office at which the client was served, the type of service they received, and the score given by the client.
I need to calculate the NPS scores and group them by business (the service column) and office.
(ID column not included because I can barely fit it, but it's there)
service    office    score
Rental     office 1  1
New Cars   office 1  6
New Cars   office 2  5
Rental     office 2  10
Rental     office 3  9
New Cars   office 3  8
The thing is, I need to count the number of scores between 0 and 6 (detractors), between 7 and 8 (passives), and between 9 and 10 (promoters), so the result looks something like this:
service    office    qty_detractors  qty_passives  qty_promoters  qty_answers  NPS_score
Rental     office 1  1               0             0              1
Rental     office 2  0               0             1              1
Rental     office 3  0               0             1              1
New Cars   office 1  1               0             0              1
New Cars   office 2  1               0             1              2
New Cars   office 3  0               1             0              2
The NPS score is calculated as: nps_score = (qty_promoters / qty_answers) - (qty_detractors / qty_answers)
I thought of counting the scores to get the total number of answers for every segment, but I don't know how to then split that count into the three categories.
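As a quick sanity check of the formula, here is a small Python sketch; the function names are illustrative only, and the score ranges are the ones given in the question (0-6, 7-8, 9-10). Note that the formula simplifies to (promoters - detractors) / answers, so the result always lies between -1 and 1.

```python
from __future__ import annotations


def classify(score: int) -> str:
    """Bucket a 0-10 poll score using the ranges from the question."""
    if 0 <= score <= 6:
        return "detractor"
    if 7 <= score <= 8:
        return "passive"
    if 9 <= score <= 10:
        return "promoter"
    raise ValueError(f"score out of range: {score}")


def nps_score(scores: list[int]) -> float:
    """(promoters / answers) - (detractors / answers) for one service/office group."""
    if not scores:
        raise ValueError("need at least one answer")
    kinds = [classify(s) for s in scores]
    answers = len(kinds)
    promoters = kinds.count("promoter")
    detractors = kinds.count("detractor")
    return promoters / answers - detractors / answers


print(nps_score([1]))            # a single detractor: -1.0
print(nps_score([10, 9, 8, 3]))  # 2 promoters, 1 detractor, 4 answers: 0.25
```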
select
business,
office,
count(score) as total
from dbo.poll_results
group by business,
office
I thought about creating a temporary table in which I could store the IDs of each answer and classify the results into the 3 categories.
select id,
score,
"type" =
case
when score <= 6 then 'detractor'
when score > 6 and score < 9 then 'passive'
when score >= 9 then 'promoter'
end
into #tmp_NPS_scores
from dbo.poll_results
And then joining the two, but I'm very confused about how this is going to turn out and so far I've been hitting walls trying to structure the query.
I've been looking into PIVOT(), but I don't think I quite understand it yet; I tried playing with it, but I only got errors in return. Should I pursue this as a solution?
I hope I've made the problem as understandable as possible. I'm working with over ten columns, but I believe this is enough for anyone to understand it.
Thank you in advance.
Oh, and in case anyone wonders: I know this can be done with BI tools. In fact, I need to build a dashboard out of this in Tableau, but I was given the task of taking some of the load off Tableau by doing the NPS calculation inside the table.

with breakout as (
select service, office,
count(case when score between 0 and 6 then 1 end) as qty_detractors,
count(case when score between 7 and 8 then 1 end) as qty_passives,
count(case when score between 9 and 10 then 1 end) as qty_promoters,
count(*) as qty_answers
from dbo.poll_results
group by service, office
)
select *, (qty_promoters - qty_detractors) * 1.0 / qty_answers as NPS_score -- * 1.0 avoids integer division
from breakout
This is a lot more efficient than running separate queries, combining them all with joins, and then having to deal with NULL values where there should be zeroes.
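A sketch for trying the conditional-aggregation query locally: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, so the table is a plain poll_results instead of dbo.poll_results, and it is loaded with only the six sample rows shown in the question. The * 1.0 matters in both engines, because int / int truncates in SQLite just as it does in SQL Server.

```python
import sqlite3

# In-memory SQLite stand-in for dbo.poll_results (SQLite has no "dbo" schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE poll_results (id INTEGER PRIMARY KEY, service TEXT, office TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO poll_results (service, office, score) VALUES (?, ?, ?)",
    [
        ("Rental", "office 1", 1),
        ("New Cars", "office 1", 6),
        ("New Cars", "office 2", 5),
        ("Rental", "office 2", 10),
        ("Rental", "office 3", 9),
        ("New Cars", "office 3", 8),
    ],
)

query = """
with breakout as (
    select service, office,
           count(case when score between 0 and 6 then 1 end) as qty_detractors,
           count(case when score between 7 and 8 then 1 end) as qty_passives,
           count(case when score between 9 and 10 then 1 end) as qty_promoters,
           count(*) as qty_answers
    from poll_results
    group by service, office
)
-- multiply by 1.0 so the division is not integer division
select *, (qty_promoters - qty_detractors) * 1.0 / qty_answers as NPS_score
from breakout
order by service, office
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

count() ignores the NULL that a CASE without ELSE produces, which is why each count only includes the rows in its own range.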

You can use CTEs (common table expressions, i.e. named temporary result sets) to split the count query into the 3 groups you need and then join the results together.
This is a readable and straightforward solution.
;WITH detractors AS
(
SELECT service, office, COUNT(score) AS cnt
FROM poll
WHERE score <= 6
GROUP BY service, office
),
passives AS
(
SELECT service, office, COUNT(score) AS cnt
FROM poll
WHERE score > 6 AND score <= 8
GROUP BY service, office
),
promoters AS
(
SELECT service, office, COUNT(score) AS cnt
FROM poll
WHERE score >= 9
GROUP BY service, office
),
totals AS
(
SELECT service, office, COUNT(score) AS cnt
FROM poll
GROUP BY service, office
)
-- start from totals so each service/office pair appears once;
-- COALESCE turns a missing group into 0, and * 1.0 avoids integer division
SELECT tot.service, tot.office,
COALESCE(pro.cnt, 0) AS promoters,
COALESCE(det.cnt, 0) AS detractors,
COALESCE(pas.cnt, 0) AS passives,
(COALESCE(pro.cnt, 0) * 1.0 / tot.cnt) - (COALESCE(det.cnt, 0) * 1.0 / tot.cnt) AS nps_score
FROM totals tot
LEFT JOIN promoters pro ON tot.office = pro.office AND tot.service = pro.service
LEFT JOIN passives pas ON tot.office = pas.office AND tot.service = pas.service
LEFT JOIN detractors det ON tot.office = det.office AND tot.service = det.service
UPD: The conditional-aggregation answer achieves the same result more simply, also using a CTE.

Related

SQL - Finding out how to write a case statement depending on how many times a name shows up in a column

I need some help with an SQL exercise that has me completely stumped. For an assignment, I have a question asking me to print a list of names with different descriptions attached depending on how many times they appear in a pre-existing list. The question verbatim is posted below for reference:
SI schema contains customers that have bought one or more vehicles. They can be classified using the following criteria:
Customers that have bought only one car (one-time buyer)
Customers that have bought two cars (two-time buyer)
Customers that have bought more than two cars (frequent buyer)
Using a SINGLE SELECT statement, display a list of customers with their names and what type of buyer they are for all those customers that have bought Jaguar car makes.
The code I have written is posted here:
use si;
select saleinv.custname,
count(case
when saleinv.custname = 1
then 'One-Time Purchaser'
when saleinv.custname = 2
then 'Two-Time Purchaser'
else
'Frequent Purchaser'
end) as "purchtype"
from si.saleinv
inner join si.car
on car.custname = saleinv.custname
where (car.carmake like 'JAGUAR');
That is just what I have currently. I am constantly taking things out, adding things and rearranging things, and nothing seems to work; I am being met with error after error. I have been trying to follow along with any CASE statement resources I can find, including the ones provided by my instructor, but nothing seems to help. There are plenty of resources on working with directly assigned values, but nothing about using CASE to count how many times items appear in a list. No matter how closely I follow example code, my IDE just doesn't like what I am putting in.
I don't want any outright changes to my code, I just want somebody to point out what I'm doing wrong because at the moment, I have no clue whatsoever.
I am brand new to StackOverflow (in terms of actually posting content on the site), so I may not know how to navigate replies and posts and such, but I'll do my best.
Thank you all.
As I don't have a complete picture of your database structure, I am assuming you have 3 tables: Customers, Purchases and Cars.
You can use a subquery in the FROM clause (a derived table) to compute the aggregate first and then apply CASE to its result.
SELECT t1.CustomerID,
CASE WHEN t1.TotalPurchases = 1 THEN 'one-time-buyer'
     WHEN t1.TotalPurchases = 2 THEN 'two-times-buyer'
     WHEN t1.TotalPurchases > 2 THEN 'frequent-buyer'
END AS PurchaserType
FROM
(SELECT CustomerID, COUNT(*) AS TotalPurchases FROM Purchases
GROUP BY CustomerID) AS t1
You can add the rest of your joins to get the other required columns.
I'll demonstrate with R's mtcars dataset. cyl is a reasonable field to test on, so I'll treat it as your "customer name". One change for this example: I'll use ranges of values instead of simple equality.
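A runnable sketch of the derived-table approach, using Python's sqlite3 module. The Purchases table, its columns and its rows are hypothetical (the real si schema was not shown), and filtering on CarMake = 'JAGUAR' inside the subquery is an assumption about how the Jaguar requirement should be applied.

```python
import sqlite3

# Hypothetical Purchases table; the real "si" schema was not shown in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Purchases (PurchaseID INTEGER PRIMARY KEY, CustomerID INTEGER, CarMake TEXT)"
)
conn.executemany(
    "INSERT INTO Purchases (CustomerID, CarMake) VALUES (?, ?)",
    [(1, "JAGUAR"), (2, "JAGUAR"), (2, "JAGUAR"),
     (3, "JAGUAR"), (3, "JAGUAR"), (3, "JAGUAR"), (4, "VOLVO")],
)

# Aggregate first in the derived table t1, then apply CASE to the aggregated column.
query = """
SELECT t1.CustomerID,
       CASE WHEN t1.TotalPurchases = 1 THEN 'one-time-buyer'
            WHEN t1.TotalPurchases = 2 THEN 'two-times-buyer'
            ELSE 'frequent-buyer'
       END AS PurchaserType
FROM (SELECT CustomerID, COUNT(*) AS TotalPurchases
      FROM Purchases
      WHERE CarMake = 'JAGUAR'   -- assumption: count only Jaguar purchases
      GROUP BY CustomerID) AS t1
ORDER BY t1.CustomerID
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Customer 4 never bought a Jaguar, so it does not appear at all.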
with counts as (
select cyl, count(*) as n
from mtcars
group by cyl
)
select c.cyl, c.n,
case when c.n <= 8 then 'small'
when 8 < c.n and c.n <= 12 then 'medium'
when 12 < c.n then 'large'
end as something
from counts c
cyl n something
4 11 medium
6 7 small
8 14 large
I think you intend to join it back onto some data, likely from another table. I'll just join it back onto itself (odd, perhaps), but the method is effectively the same.
with counts as (
select cyl, count(*) as n
from mtcars
group by cyl
)
select mt.cyl, mt.disp, c.n,
case when c.n <= 8 then 'small'
when 8 < c.n and c.n <= 12 then 'medium'
when 12 < c.n then 'large'
end as something
from mtcars mt
inner join counts c on mt.cyl = c.cyl
cyl disp n something
6 160.0 7 small
6 160.0 7 small
4 108.0 11 medium
6 258.0 7 small
8 360.0 14 large
6 225.0 7 small
8 360.0 14 large
...truncated
(This was done in SQLite, though it should perform the same in other DBMSs.)
Data
"car","mpg","cyl","disp","hp","drat","wt","qsec","vs","am","gear","carb"
"Mazda RX4",21,6,160,110,3.9,2.62,16.46,0,1,4,4
"Mazda RX4 Wag",21,6,160,110,3.9,2.875,17.02,0,1,4,4
"Datsun 710",22.8,4,108,93,3.85,2.32,18.61,1,1,4,1
"Hornet 4 Drive",21.4,6,258,110,3.08,3.215,19.44,1,0,3,1
"Hornet Sportabout",18.7,8,360,175,3.15,3.44,17.02,0,0,3,2
"Valiant",18.1,6,225,105,2.76,3.46,20.22,1,0,3,1
"Duster 360",14.3,8,360,245,3.21,3.57,15.84,0,0,3,4
"Merc 240D",24.4,4,146.7,62,3.69,3.19,20,1,0,4,2
"Merc 230",22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2
"Merc 280",19.2,6,167.6,123,3.92,3.44,18.3,1,0,4,4
"Merc 280C",17.8,6,167.6,123,3.92,3.44,18.9,1,0,4,4
"Merc 450SE",16.4,8,275.8,180,3.07,4.07,17.4,0,0,3,3
"Merc 450SL",17.3,8,275.8,180,3.07,3.73,17.6,0,0,3,3
"Merc 450SLC",15.2,8,275.8,180,3.07,3.78,18,0,0,3,3
"Cadillac Fleetwood",10.4,8,472,205,2.93,5.25,17.98,0,0,3,4
"Lincoln Continental",10.4,8,460,215,3,5.424,17.82,0,0,3,4
"Chrysler Imperial",14.7,8,440,230,3.23,5.345,17.42,0,0,3,4
"Fiat 128",32.4,4,78.7,66,4.08,2.2,19.47,1,1,4,1
"Honda Civic",30.4,4,75.7,52,4.93,1.615,18.52,1,1,4,2
"Toyota Corolla",33.9,4,71.1,65,4.22,1.835,19.9,1,1,4,1
"Toyota Corona",21.5,4,120.1,97,3.7,2.465,20.01,1,0,3,1
"Dodge Challenger",15.5,8,318,150,2.76,3.52,16.87,0,0,3,2
"AMC Javelin",15.2,8,304,150,3.15,3.435,17.3,0,0,3,2
"Camaro Z28",13.3,8,350,245,3.73,3.84,15.41,0,0,3,4
"Pontiac Firebird",19.2,8,400,175,3.08,3.845,17.05,0,0,3,2
"Fiat X1-9",27.3,4,79,66,4.08,1.935,18.9,1,1,4,1
"Porsche 914-2",26,4,120.3,91,4.43,2.14,16.7,0,1,5,2
"Lotus Europa",30.4,4,95.1,113,3.77,1.513,16.9,1,1,5,2
"Ford Pantera L",15.8,8,351,264,4.22,3.17,14.5,0,1,5,4
"Ferrari Dino",19.7,6,145,175,3.62,2.77,15.5,0,1,5,6
"Maserati Bora",15,8,301,335,3.54,3.57,14.6,0,1,5,8
"Volvo 142E",21.4,4,121,109,4.11,2.78,18.6,1,1,4,2

Grouping together query counts into as few queries as possible in SQLite?

I'm not a database expert but I've inherited this SQLite database I have to work with. It contains tags, images and events. An event contains a number of images and an image contains a number of tags (the tags describe the image content e.g. coffee, phone, laptop, etc.).
The table structure looks something like this:
row_id tags image_id event_id
1 computer 1 1
2 desk 1 1
3 chair 1 1
4 computer 2 1
5 coffee 2 1
6 desk 2 1
7 dog 3 2
8 phone 3 2
etc. etc. etc. etc. // many thousands of rows
The users of our system used to search for images by choosing some tags and we had a very simple query which returned a ranked list favoring images containing the most tags. It looked like this:
SELECT image_id
FROM TagsTable
WHERE tags
IN ('computer', 'desk', 'chair') -- user variables
GROUP BY image_id
ORDER BY COUNT(image_id) DESC
But now we want to return a list of the events (which I need to rank) instead of individual images. I can achieve this by doing many queries in a loop but it's very slow. Ideally I'm trying to produce the following information in as few queries as possible.
So if the user searched for 'computer', 'desk' and 'chair', you would get...
event_id computer_count desk_count chair_count event_image_count
1 12 15 9 56
2 22 0 13 24
3 14 7 0 32
etc. etc. etc. etc. etc.
// no results if all tag counts are 0
So at a glance we can see event 1 contains a total of 56 images and the tag 'computer' appears 12 times, 'desk' appears 15 times and 'chair' appears 9 times.
Is this possible using just SQL or do I need to perform multiple queries? Please note I am using SQLite.
You can answer this specific question using conditional aggregation:
SELECT event_id,
SUM(CASE WHEN tags = 'computer' THEN 1 ELSE 0 END) as computer_count,
SUM(CASE WHEN tags = 'desk' THEN 1 ELSE 0 END) as desk_count,
SUM(CASE WHEN tags = 'chair' THEN 1 ELSE 0 END) as chair_count,
COUNT(DISTINCT image_id) as image_count
FROM TagsTable
WHERE tags IN ('computer', 'desk', 'chair')
GROUP BY event_id;
EDIT:
To add an "average" column:
SELECT . . .
SUM(CASE WHEN tags IN ('computer', 'desk', 'chair') THEN 1.0 ELSE 0 END) / 3 as tag_average
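A runnable sketch with Python's sqlite3 module, using the sample rows from the question plus two hypothetical rows (row_id 9 and 10). It is a variation on the answer, not the answer itself: the tag filter moves from WHERE into HAVING, so COUNT(DISTINCT image_id) counts every image in the event (the question's event_image_count) while events where all tag counts are 0 are still dropped. With the original WHERE clause, event 1 would report 2 images instead of 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TagsTable (row_id INTEGER PRIMARY KEY, tags TEXT, image_id INTEGER, event_id INTEGER)")
conn.executemany(
    "INSERT INTO TagsTable VALUES (?, ?, ?, ?)",
    [
        (1, "computer", 1, 1), (2, "desk", 1, 1), (3, "chair", 1, 1),
        (4, "computer", 2, 1), (5, "coffee", 2, 1), (6, "desk", 2, 1),
        (7, "dog", 3, 2), (8, "phone", 3, 2),
        (9, "coffee", 4, 1),   # hypothetical: image 4 has no searched tag
        (10, "chair", 5, 3),   # hypothetical: a second matching event
    ],
)

query = """
SELECT event_id,
       SUM(CASE WHEN tags = 'computer' THEN 1 ELSE 0 END) AS computer_count,
       SUM(CASE WHEN tags = 'desk' THEN 1 ELSE 0 END) AS desk_count,
       SUM(CASE WHEN tags = 'chair' THEN 1 ELSE 0 END) AS chair_count,
       COUNT(DISTINCT image_id) AS event_image_count   -- all images in the event
FROM TagsTable
GROUP BY event_id
-- drop events where every tag count is 0
HAVING SUM(CASE WHEN tags IN ('computer', 'desk', 'chair') THEN 1 ELSE 0 END) > 0
-- rank events by how many searched tags they contain
ORDER BY SUM(CASE WHEN tags IN ('computer', 'desk', 'chair') THEN 1 ELSE 0 END) DESC, event_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Without the WHERE filter the whole table is scanned, so on a large table the original query may be faster; which trade-off is acceptable depends on whether the total image count is really needed.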

Access SQL: How to specify which record to return based on the "more important" condition?

I have 2 tables (MS ACCESS):
Table "Orders"
OrderID Product Product_Group Client Client_Group Revenue
1 Cars Vehicles Men People 10 000
2 Houses NC_Assets Women People 15 000
3 Houses NC_Assets Partnersh Companies 12 000
4 Cars Vehicles Corps Companies 3 000
Table "Grouping"
Product Product_Group Client Client_Group Tax rate
Cars Companies Taxable 30%
Vehicles Companies Taxable 15%
Houses People Taxable 13%
Houses Women Taxable 15%
I want to join these tables to see which orders will fall into which taxable group. As you can see, some products/clients are mapped differently from their groups. If that is the case, the query should return only one record for this pair and exclude any pairing that involves their groups. In pseudo-code:
If there's product-client grouping, return this record Else
If there's product-client grouping ---//----- else
If there's product group - client ----///-----else
If there's product group-client group ---///----
End if * 4
In that order.
Now my query (pseudo):
SELECT [Orders].*, [Grouping].* FROM [Orders] LEFT JOIN [Grouping] ON
(([Orders].Product = [Grouping].Product OR [Orders].Product_Group = [Grouping].Product_Group) AND
([Orders].Client = [Grouping].Client OR [Orders].Client_Group = [Grouping].Client_Group))
Returns both Cars-Companies and Vehicles-Companies. I'm out of ideas on how to set it up to return only the most granular record for each combination. UNION? NOT EXISTS?
Any help appreciated.
I want to join these tables to see how many orders qualify as good, mediocre etc.
It sounds like you want counts of particular conditions. Assuming Access supports SUM and CASE (I haven't written queries for MS Access in about 10 years), here's some pseudo-code that should get you started:
SELECT SUM(CASE WHEN {mediocre-conditions} THEN 1 ELSE 0 END) AS MediocreCount,
SUM(CASE WHEN {good-conditions} THEN 1 ELSE 0 END) AS GoodCount,
SUM(CASE WHEN {great-conditions} THEN 1 ELSE 0 END) AS GreatCount
FROM [Orders] LEFT JOIN [Grouping] ON (([Orders].Product = [Grouping].Product OR [Orders].Product_Group = [Grouping].Product_Group) AND ([Orders].Client = [Grouping].Client OR [Orders].Client_Group = [Grouping].Client_Group))
[update] I don't like giving bad answers, so I took a quick look. Based on this link: Does MS Access support "CASE WHEN" clause if connect with ODBC?, it appears you may be able to do:
SELECT SUM(IIF({mediocre-conditions},1,0)) AS MediocreCount,
SUM(IIF({good-conditions},1,0)) AS GoodCount,
SUM(IIF({great-conditions},1,0)) AS GreatCount

Double "group by" without join?

I have user data:
user store item cost
1 10 100 5
1 10 101 3
1 11 102 7
2 10 101 3
2 12 103 4
2 12 104 5
I want a table which will tell me for each user how much he bought from each store and how much he bought in total:
user store cost_this_store cost_total
1 10 8 15
1 11 7 15
2 10 3 12
2 12 9 12
I can do this with two group by and a join:
select s.user, s.store, s.cost_this_store, u.cost_total
from (select user, store, sum(cost) as cost_this_store
from my_data
group by user, store) s
join (select user, sum(cost) as cost_total
from my_data
group by user) u
on s.user = u.user
However, this is definitely not how I would do this if I were writing this in any other language (join is clearly avoidable, and the two group by are not independent).
Is it possible to avoid the join in sql?
PS. I need the solution to work in hive.
You can do this with window functions, which Hive supports as of version 0.11:
select distinct
user,
store,
sum(cost) over (partition by user, store) as cost_this_store,
sum(cost) over (partition by user) as cost_total
from my_data
However, I'd argue that there wasn't anything glaringly wrong with your original implementation. You've essentially got two different sets of data, which you're combining through a JOIN.
The duplication might look like a code smell in a different language, but this isn't necessarily the wrong approach in SQL, and often you'll have to take approaches such as this that duplicate a portion of a query between two intermediate result sets for performance reasons.
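SQLite also supports window functions (since version 3.25.0), so the windowed query can be tried locally with Python's sqlite3 module on the question's data. This is only a sketch of the SQL, not a Hive run; "user" is double-quoted as a precaution in case it collides with a keyword in some engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE my_data ("user" INTEGER, store INTEGER, item INTEGER, cost INTEGER)')
conn.executemany(
    "INSERT INTO my_data VALUES (?, ?, ?, ?)",
    [(1, 10, 100, 5), (1, 10, 101, 3), (1, 11, 102, 7),
     (2, 10, 101, 3), (2, 12, 103, 4), (2, 12, 104, 5)],
)

# Window sums are computed per row; DISTINCT then collapses duplicate (user, store) rows.
query = """
select distinct
       "user",
       store,
       sum(cost) over (partition by "user", store) as cost_this_store,
       sum(cost) over (partition by "user") as cost_total
from my_data
order by "user", store
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```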

Progressive count using a query?

I use this query to
SELECT userId, submDate, COUNT(submId) AS nSubms
FROM submissions
GROUP BY userId, submDate
ORDER BY userId, submDate
obtain the total number of submissions per user per date.
However I need to have the progressive count for every user so I can see how their submissions accumulate over time.
Is this possible to implement in a query?
EDIT: The resulting table looks like this:
userId submDate nSubms
1 2-Feb 1
1 4-Feb 7
2 1-Jan 4
2 2-Jan 2
2 18-Jan 1
I want to produce this:
userId submDate nSubms progressive
1 2-Feb 1 1
1 4-Feb 7 8
2 1-Jan 4 4
2 2-Jan 2 6
2 18-Jan 1 7
EDIT 2: Sorry for not mentioning it earlier, but I am not allowed to use:
Stored procedure calls
Update/Delete/Insert/Create queries
Unions
DISTINCT keyword
as I am using a tool that doesn't allow those.
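You can use a self-join on the per-date counts to pick up every row for the same user with a date on or before the current row's date, and then sum those counts:
SELECT s0.userId, s0.submDate, s0.nSubms, SUM(s1.nSubms) AS progressive
FROM (SELECT userId, submDate, COUNT(submId) AS nSubms
      FROM submissions GROUP BY userId, submDate) AS s0
JOIN (SELECT userId, submDate, COUNT(submId) AS nSubms
      FROM submissions GROUP BY userId, submDate) AS s1
  ON s1.userId = s0.userId AND s1.submDate <= s0.submDate
GROUP BY s0.userId, s0.submDate, s0.nSubms
ORDER BY s0.userId, s0.submDate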
This is going to force the database to do a load of pointless work counting all the same rows again and again though. It would be better to just add up the nSubms as you go down in whatever script is calling the query, or in an SQL variable, if that's available in your environment.
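As suggested above, the running total can also be added by the calling script. A minimal Python sketch, assuming the rows arrive already ordered by userId, submDate (the ORDER BY in the question's query guarantees this); the function name is illustrative only.

```python
from itertools import accumulate, groupby
from operator import itemgetter

# Rows as returned by the question's GROUP BY query: (userId, submDate, nSubms).
rows = [
    (1, "2-Feb", 1),
    (1, "4-Feb", 7),
    (2, "1-Jan", 4),
    (2, "2-Jan", 2),
    (2, "18-Jan", 1),
]


def with_progressive(rows):
    """Append a per-user running total of nSubms to each row.

    groupby only merges adjacent rows, so the input must be sorted by userId.
    """
    result = []
    for _user_id, user_rows in groupby(rows, key=itemgetter(0)):
        user_rows = list(user_rows)
        totals = accumulate(r[2] for r in user_rows)
        result.extend(r + (total,) for r, total in zip(user_rows, totals))
    return result


for row in with_progressive(rows):
    print(row)
```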
The best solution for this is to do it on the client.
It's the right tool for the job; databases are not well suited to this kind of task.
Select S.userId, S.submDate, Count(*) As nSubms
, (Select Count(*)
From submissions As S1
Where S1.userid = S.userId
And S1.submDate <= S.submDate) As TotalSubms
From submissions As S
Group By S.userid, S.submDate
Order By S.userid, S.submDate
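A sketch for checking the correlated subquery with Python's sqlite3 module. The submissions rows are generated from the per-date counts in the question; the dates are written as ISO strings with a hypothetical year so that the <= comparison sorts chronologically (strings like '2-Feb' would not).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE submissions (submId INTEGER PRIMARY KEY, userId INTEGER, submDate TEXT)")

# (userId, submDate, number of submissions) from the question; the year is hypothetical.
counts = [(1, "2024-02-02", 1), (1, "2024-02-04", 7),
          (2, "2024-01-01", 4), (2, "2024-01-02", 2), (2, "2024-01-18", 1)]
conn.executemany(
    "INSERT INTO submissions (userId, submDate) VALUES (?, ?)",
    [(user, day) for user, day, n in counts for _ in range(n)],
)

# For each (user, date) group, the subquery counts that user's rows up to and including the date.
query = """
Select S.userId, S.submDate, Count(*) As nSubms,
       (Select Count(*)
        From submissions As S1
        Where S1.userId = S.userId
          And S1.submDate <= S.submDate) As TotalSubms
From submissions As S
Group By S.userId, S.submDate
Order By S.userId, S.submDate
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```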