Grouping together query counts into as few queries as possible in SQLite? - sql

I'm not a database expert but I've inherited this SQLite database I have to work with. It contains tags, images and events. An event contains a number of images and an image contains a number of tags (the tags describe the image content e.g. coffee, phone, laptop, etc.).
The table structure looks something like this:
row_id tags image_id event_id
1 computer 1 1
2 desk 1 1
3 chair 1 1
4 computer 2 1
5 coffee 2 1
6 desk 2 1
7 dog 3 2
8 phone 3 2
etc. etc. etc. etc. // many 1000's
The users of our system used to search for images by choosing some tags and we had a very simple query which returned a ranked list favoring images containing the most tags. It looked like this:
SELECT image_id
FROM TagsTable
WHERE tags
IN ('computer', 'desk', 'chair') -- user variables
GROUP BY image_id
ORDER BY COUNT(image_id) DESC
But now we want to return a list of the events (which I need to rank) instead of individual images. I can achieve this by doing many queries in a loop but it's very slow. Ideally I'm trying to produce the following information in as few queries as possible.
So if the user searched for 'computer', 'desk' and 'chair', you would get...
event_id computer_count desk_count chair_count event_image_count
1 12 15 9 56
2 22 0 13 24
3 14 7 0 32
etc. etc. etc. etc. etc.
// no results if all tag counts are 0
So at a glance we can see event 1 contains a total of 56 images and the tag 'computer' appears 12 times, 'desk' appears 15 times and 'chair' appears 9 times.
Is this possible using just SQL or do I need to perform multiple queries? Please note I am using SQLite.

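You can answer this specific question using conditional aggregation. Filtering with HAVING rather than WHERE keeps every image of an event in the distinct count, so the last column is the event's total image count, and events with no matching tags are still dropped:
SELECT event_id,
SUM(CASE WHEN tags = 'computer' THEN 1 ELSE 0 END) as computer_count,
SUM(CASE WHEN tags = 'desk' THEN 1 ELSE 0 END) as desk_count,
SUM(CASE WHEN tags = 'chair' THEN 1 ELSE 0 END) as chair_count,
COUNT(DISTINCT image_id) as event_image_count
FROM TagsTable
GROUP BY event_id
HAVING SUM(CASE WHEN tags IN ('computer', 'desk', 'chair') THEN 1 ELSE 0 END) > 0;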
EDIT:
To add an "average" column:
SELECT . . .
SUM(CASE WHEN tags IN ('computer', 'desk', 'chair') THEN 1.0 ELSE 0 END) / 3 as tag_average
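As a minimal sketch of the conditional aggregation above, the following Python script runs it against an in-memory SQLite database. The sample rows are the ones from the question; the `event_tag_counts` helper and the idea of generating one CASE column per user-selected tag are illustrative assumptions. It filters events with HAVING rather than WHERE so that the image count covers every image in the event, which is an assumption about what event_image_count is meant to show.

```python
import sqlite3

def event_tag_counts(conn, tags):
    """Return one row per matching event: event_id, one count per tag, total images."""
    # One SUM(CASE ...) column per requested tag; tag values are bound as parameters.
    case_cols = ", ".join(
        "SUM(CASE WHEN tags = ? THEN 1 ELSE 0 END)" for _ in tags
    )
    placeholders = ", ".join("?" for _ in tags)
    sql = f"""
        SELECT event_id, {case_cols},
               COUNT(DISTINCT image_id) AS event_image_count
        FROM TagsTable
        GROUP BY event_id
        HAVING SUM(CASE WHEN tags IN ({placeholders}) THEN 1 ELSE 0 END) > 0
        ORDER BY event_id
    """
    # The tag list is bound twice: once for the CASE columns, once for HAVING.
    return conn.execute(sql, list(tags) + list(tags)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TagsTable "
    "(row_id INTEGER PRIMARY KEY, tags TEXT, image_id INTEGER, event_id INTEGER)"
)
conn.executemany(
    "INSERT INTO TagsTable (tags, image_id, event_id) VALUES (?, ?, ?)",
    [
        ("computer", 1, 1), ("desk", 1, 1), ("chair", 1, 1),
        ("computer", 2, 1), ("coffee", 2, 1), ("desk", 2, 1),
        ("dog", 3, 2), ("phone", 3, 2),
    ],
)

rows = event_tag_counts(conn, ["computer", "desk", "chair"])
print(rows)  # [(1, 2, 2, 1, 2)]
```

Event 2 is absent from the result because none of its images carry a requested tag. Building the column list from the tag list keeps this to a single query however many tags the user picks.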

Related

SQL SUM counting every instance and not if data is present

I have a piece of a SQL statement below:
SELECT WOCUSTFIELD.WORKORDERID, WORKORDER.ACTUALFINISHDATE,
CASE WHEN wocustfield.custfieldname LIKE '%bags%'
THEN (REPLACE( wocustfield.custfieldname, 'bags:', ''))
ELSE REPLACE( wocustfield.custfieldname, 'boxes:', '')
END AS Facility,
For each custfieldname that has a number greater than 0 entered for Bags or Boxes, I would like to count it as a Facility (a visit to the site).
Currently, it counts every custfieldname as a visit, regardless of whether the field holds 0 or a larger number, and totals all custfieldnames for each day.
for example, if the data look like the following for Jan 1st 2023:
Station       Bags  Boxes
Dogstation 1  3     4
Dogstation 2  0     0
Dogstation 3  5     1
Dogstation 4  2     0
Dogstation 5  0     0
I would like to have 3 Facility (visits) stations and not 5. I hope this makes sense. thanks for any help given
The COUNT aggregate counts 1 for every non-null value. Zero (0) is not null, so it counts as 1. You want to do something like this:
SUM(CASE WHEN bags > 0 THEN 1 ELSE 0 END) AS stores_with_bags
Just add a WHERE clause. I'm not exactly sure how your table is formatted, but for example:
WHERE Bags > 0 OR Boxes > 0
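To make the difference between counting rows and counting visits concrete, here is a small sketch in Python with SQLite. The `station_day` table and its columns are hypothetical, since the real layout of the work order tables is not shown; only the five sample rows come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened table: one row per station per day.
conn.execute(
    "CREATE TABLE station_day (station TEXT, day TEXT, bags INTEGER, boxes INTEGER)"
)
conn.executemany(
    "INSERT INTO station_day VALUES (?, ?, ?, ?)",
    [
        ("Dogstation 1", "2023-01-01", 3, 4),
        ("Dogstation 2", "2023-01-01", 0, 0),
        ("Dogstation 3", "2023-01-01", 5, 1),
        ("Dogstation 4", "2023-01-01", 2, 0),
        ("Dogstation 5", "2023-01-01", 0, 0),
    ],
)

# COUNT(*) counts every row, zeros included; the SUM(CASE ...) only
# counts stations where at least one of the two quantities is positive.
row = conn.execute(
    """
    SELECT COUNT(*) AS all_rows,
           SUM(CASE WHEN bags > 0 OR boxes > 0 THEN 1 ELSE 0 END) AS visits
    FROM station_day
    WHERE day = '2023-01-01'
    """
).fetchone()
print(row)  # (5, 3)
```

Putting the condition inside the SUM keeps the other aggregates in the same query unfiltered; moving it to a WHERE clause is simpler when visits are the only thing being counted.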

How to count() and aggregate values without using group by?

I'm having trouble obtaining data and categorizing it within a select without filtering it through the group by clause.
I have a table populated with the results of a customer satisfaction poll, which includes the office at which the client was served, the type of service he was provided, and the score given by the client.
I need to know the NPS scores and group them by business and office.
(ID column not included because I can barely fit it, but it's there)
service   office    score
Rental    office 1  1
New Cars  office 1  6
New Cars  office 2  5
Rental    office 2  10
Rental    office 3  9
New Cars  office 3  8
The thing is, I need to count the number of scores between 0 and 6 (detractors), between 7 and 8 (passives), and between 9 and 10 (promoters). So it becomes something like this:
service   office    qty_detractors  qty_passives  qty_promoters  qty_answers  NPS_score
Rental    office 1  1               0             0              1
Rental    office 2  0               0             1              1
Rental    office 3  0               0             1              1
New Cars  office 1  1               0             0              1
New Cars  office 2  1               0             1              2
New Cars  office 3  0               1             0              2
The NPS score is calculated as nps_score = (qty_promoters/qty_answers) - (qty_detractors/qty_answers).
I thought of doing a count of scores to get the total amount of answers from every segment, but I don't know how to then spread the data according to my needs.
select
business,
office,
count(score) as total
from dbo.poll_results
group by business,
office
I thought about making a temporary table in which I could store the id's of each answer and classify the results into the 3 categories.
select id,
score,
"type" =
case
when score <= 6 then 'detractor'
when score > 6 and score < 9 then 'passive'
when score >= 9 then 'promoter'
end
into #tmp_NPS_scores
from dbo.poll_results
And then joining the two, but I'm very confused about how this is going to turn out and so far I've been hitting walls trying to structure the query.
I've been looking into PIVOT(), but I don't think I quite get it yet; I tried playing with it but only got errors in return. Should I pursue this as a solution?
I hope I could make the problem as understandable as possible, I'm working with over ten columns and I believe this is enough for anyone to comprehend.
Thank you in advance.
Oh and just in case anyone wonders, I know this can be done with BI tools, in fact I need to make a dashboard out of this in Tableau, but I was given the task to relieve some of the load off Tableau by making the NPS calculation inside the table.
with breakout as (
select service, office,
count(case when score between 0 and 6 then 1 end) as qty_detractors,
count(case when score between 7 and 8 then 1 end) as qty_passives,
count(case when score between 9 and 10 then 1 end) as qty_promoters,
count(*) as qty_answers
from dbo.poll_results
group by service, office
)
select *, (qty_promoters - qty_detractors) * 1.0 / qty_answers as NPS_score -- * 1.0 avoids integer division
from breakout
This is going to be a lot more efficient than executing separate queries, combining them all via joins, and then having to deal with null values where there were zeroes...
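The single-pass breakout can be tried out with the sketch below, which uses Python and an in-memory SQLite database as a stand-in for SQL Server (an assumption; the `dbo.` schema prefix is dropped). The first six rows are the sample data from the question, and the last two are hypothetical extra answers added so that one group has more than one response. It computes the score both with plain integer division and with a `* 1.0` factor, because dividing two integer counts truncates in SQLite as it does in SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE poll_results (service TEXT, office TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO poll_results VALUES (?, ?, ?)",
    [
        ("Rental", "office 1", 1), ("New Cars", "office 1", 6),
        ("New Cars", "office 2", 5), ("Rental", "office 2", 10),
        ("Rental", "office 3", 9), ("New Cars", "office 3", 8),
        # Hypothetical extra answers so one group has several rows.
        ("Rental", "office 1", 9), ("Rental", "office 1", 10),
    ],
)

query = """
    WITH breakout AS (
        SELECT service, office,
               COUNT(CASE WHEN score BETWEEN 0 AND 6 THEN 1 END) AS qty_detractors,
               COUNT(CASE WHEN score BETWEEN 7 AND 8 THEN 1 END) AS qty_passives,
               COUNT(CASE WHEN score BETWEEN 9 AND 10 THEN 1 END) AS qty_promoters,
               COUNT(*) AS qty_answers
        FROM poll_results
        GROUP BY service, office
    )
    SELECT service, office,
           (qty_promoters - qty_detractors) / qty_answers AS nps_int,
           (qty_promoters - qty_detractors) * 1.0 / qty_answers AS nps_score
    FROM breakout
"""

# Map (service, office) -> (integer-division result, floating-point result).
nps = {
    (service, office): (nps_int, nps_score)
    for service, office, nps_int, nps_score in conn.execute(query)
}
for key in sorted(nps):
    print(key, nps[key])
```

For Rental at office 1 (one detractor, two promoters, three answers) the integer version yields 0 while the floating-point version yields one third, which is why the multiplication by 1.0 matters.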
You can try using CTEs (common table expressions), which are temporary result sets, to split the count query into the three groups you need and then combine the results.
This is a readable and straightforward solution.
;WITH detractors AS
(
SELECT service, office, COUNT(score) AS cnt
FROM poll
WHERE score <= 6
GROUP BY service, office
),
passives AS
(
SELECT service, office, COUNT(score) AS cnt
FROM poll
WHERE score > 6 AND score < 9
GROUP BY service, office
),
promoters AS
(
SELECT service, office, COUNT(score) AS cnt
FROM poll
WHERE score >= 9
GROUP BY service, office
),
totals AS
(
SELECT service, office, COUNT(score) AS cnt
FROM poll
GROUP BY service, office
)
-- Start from totals (one row per service/office); COALESCE turns missing groups into 0
SELECT tot.service, tot.office,
COALESCE(pro.cnt, 0) AS promoters, COALESCE(det.cnt, 0) AS detractors, COALESCE(pas.cnt, 0) AS passives,
(COALESCE(pro.cnt, 0) - COALESCE(det.cnt, 0)) * 1.0 / tot.cnt AS nps_score
FROM totals tot
LEFT JOIN promoters pro ON tot.office = pro.office AND tot.service = pro.service
LEFT JOIN passives pas ON tot.office = pas.office AND tot.service = pas.service
LEFT JOIN detractors det ON tot.office = det.office AND tot.service = det.service
UPD: The other answer here, which also uses a CTE, achieves the same result more simply.

How do I select this grouped data correctly?

First of all, sorry for the generic title, I don't know how exactly to word the question I have.
I am working with a legacy database and can't make changes to the schema, so I'm forced to work with what I've got.
Table setup (columns):
Id (a normal int id)
UniqueId (a column that holds the uniqueIds for each location)
status (a varchar column that can contain one of three statuses: 'completed', 'failed', 'Attention')
Count (an int column that represents how many users fell into each status)
Example data :
UniqueId Status Count
679FCE83-B245-E511-A42C-90B11C2CD708 completed 64
679FCE83-B245-E511-A42C-90B11C2CD708 Attention 1
679FCE83-B245-E511-A42C-90B11C2CD708 failed 101
4500990D-F516-E411-BB09-90B11C2CD708 completed 100
4500990D-F516-E411-BB09-90B11C2CD708 Attention 17
4500990D-F516-E411-BB09-90B11C2CD708 failed 516
557857BD-6B46-E511-A42C-90B11C2CD708 completed 67
557857BD-6B46-E511-A42C-90B11C2CD708 Attention 4
557857BD-6B46-E511-A42C-90B11C2CD708 failed 103
What I am trying to do is select all of the records, grouped by uniqueId, with a separate column for each of the statuses containing their individual counts. The results would look something like this...
UniqueId, count(completed), count(failed), count(Attention)
679FCE83-B245-E511-A42C-90B11C2CD708 64 101 1
4500990D-F516-E411-BB09-90B11C2CD708 100 516 17
557857BD-6B46-E511-A42C-90B11C2CD708 67 103 4
I'm sure I'm missing something basic with this, but I can't seem to find the words to Google my way out of this one.
Could someone push me in the right direction?
You can use conditional aggregation:
select uniqueid,
sum(case when status = 'completed' then count else 0 end) as completed,
sum(case when status = 'failed' then count else 0 end) as failed,
sum(case when status = 'Attention' then count else 0 end) as attention
from t
group by uniqueid;
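The pivot above can be checked with a short sketch in Python and SQLite, using the sample rows from the question. SQLite stands in for the legacy database here, which is an assumption. Because SQLite compares text case-sensitively by default, the sketch lowercases the status before comparing, and it quotes the `Count` column so it cannot be confused with the aggregate function of the same name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE t (Id INTEGER PRIMARY KEY, UniqueId TEXT, Status TEXT, "Count" INTEGER)'
)
conn.executemany(
    'INSERT INTO t (UniqueId, Status, "Count") VALUES (?, ?, ?)',
    [
        ("679FCE83-B245-E511-A42C-90B11C2CD708", "completed", 64),
        ("679FCE83-B245-E511-A42C-90B11C2CD708", "Attention", 1),
        ("679FCE83-B245-E511-A42C-90B11C2CD708", "failed", 101),
        ("4500990D-F516-E411-BB09-90B11C2CD708", "completed", 100),
        ("4500990D-F516-E411-BB09-90B11C2CD708", "Attention", 17),
        ("4500990D-F516-E411-BB09-90B11C2CD708", "failed", 516),
        ("557857BD-6B46-E511-A42C-90B11C2CD708", "completed", 67),
        ("557857BD-6B46-E511-A42C-90B11C2CD708", "Attention", 4),
        ("557857BD-6B46-E511-A42C-90B11C2CD708", "failed", 103),
    ],
)

# One output row per UniqueId, one column per status.
pivot = {
    uid: (completed, failed, attention)
    for uid, completed, failed, attention in conn.execute(
        """
        SELECT UniqueId,
               SUM(CASE WHEN LOWER(Status) = 'completed' THEN "Count" ELSE 0 END),
               SUM(CASE WHEN LOWER(Status) = 'failed' THEN "Count" ELSE 0 END),
               SUM(CASE WHEN LOWER(Status) = 'attention' THEN "Count" ELSE 0 END)
        FROM t
        GROUP BY UniqueId
        """
    )
}
for uid, counts in pivot.items():
    print(uid, counts)
```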

Converting Column Headers to Row elements

I have 2 tables I am combining and that works but I think I designed the second table wrong as I have a column for each item of what really is a multiple choice question. The query is this:
select Count(n.ID) as MemCount, u.Pay1Click, u.PayMailCC, u.PayMailCheck, u.PayPhoneACH, u.PayPhoneCC, u.PayWuFoo
from name as n inner join
UD_Demo_ORG as u on n.ID = u.ID
where n.MEMBER_TYPE like 'ORG_%' and n.CATEGORY not like '%_2' and
(u.Pay1Click = '1' or u.PayMailCC = '1' or u.PayMailCheck = '1' or u.PayPhoneACH = '1' or u.PayPhoneCC = '1' or u.PayWuFoo = '1')
group by u.Pay1Click, u.PayMailCC, u.PayMailCheck, u.PayPhoneACH, u.PayPhoneCC, u.PayWuFoo
The results come up like this:
Count Pay1Click PayMailCC PayMailCheck PayPhoneACH PayPhoneCC PayWuFoo
8 0 0 0 0 0 1
25 0 0 0 0 1 0
8 0 0 0 1 0 0
99 0 0 1 0 0 0
11 0 1 0 0 0 0
So the question is, how can I get this to 2 columns, Count and then the headers of the next 6 headers so the results look like this:
Count PaymentType
8 PayWuFoo
25 PayPhoneCC
8 PayPhoneACH
99 PayMailCheck
11 PayMailCC
Thanks.
Try this one
SELECT Count,
CASE WHEN Pay1Click = 1 THEN 'Pay1Click'
WHEN PayMailCC = 1 THEN 'PayMailCC'
WHEN PayMailCheck = 1 THEN 'PayMailCheck'
WHEN PayPhoneACH = 1 THEN 'PayPhoneACH'
WHEN PayPhoneCC = 1 THEN 'PayPhoneCC'
WHEN PayWuFoo = 1 THEN 'PayWuFoo'
END AS PaymentType
FROM ......
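One way to apply the CASE expression above directly to the raw flag columns, rather than to an already grouped result, is sketched below in Python with SQLite. The `member_pay` table and its rows are hypothetical, and the flags are stored as integers here although the question compares them to the string '1'. The CASE is evaluated in a CTE first so that the outer query can group by the resulting label.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the joined name / UD_Demo_ORG rows.
conn.execute(
    "CREATE TABLE member_pay (ID INTEGER PRIMARY KEY, Pay1Click INTEGER, "
    "PayMailCC INTEGER, PayMailCheck INTEGER, PayPhoneACH INTEGER, "
    "PayPhoneCC INTEGER, PayWuFoo INTEGER)"
)
conn.executemany(
    "INSERT INTO member_pay (Pay1Click, PayMailCC, PayMailCheck, PayPhoneACH, "
    "PayPhoneCC, PayWuFoo) VALUES (?, ?, ?, ?, ?, ?)",
    [
        (0, 0, 0, 0, 0, 1), (0, 0, 0, 0, 0, 1),
        (0, 0, 0, 0, 1, 0), (0, 0, 0, 0, 1, 0), (0, 0, 0, 0, 1, 0),
        (0, 0, 1, 0, 0, 0),
        (0, 0, 0, 0, 0, 0),  # no payment type selected, excluded below
    ],
)

rows = conn.execute(
    """
    WITH typed AS (
        SELECT CASE WHEN Pay1Click = 1 THEN 'Pay1Click'
                    WHEN PayMailCC = 1 THEN 'PayMailCC'
                    WHEN PayMailCheck = 1 THEN 'PayMailCheck'
                    WHEN PayPhoneACH = 1 THEN 'PayPhoneACH'
                    WHEN PayPhoneCC = 1 THEN 'PayPhoneCC'
                    WHEN PayWuFoo = 1 THEN 'PayWuFoo'
               END AS PaymentType
        FROM member_pay
    )
    SELECT COUNT(*) AS MemCount, PaymentType
    FROM typed
    WHERE PaymentType IS NOT NULL
    GROUP BY PaymentType
    ORDER BY PaymentType
    """
).fetchall()
print(rows)  # [(1, 'PayMailCheck'), (3, 'PayPhoneCC'), (2, 'PayWuFoo')]
```

Note that a CASE returns only the first matching branch, so a member with two flags set would be counted under one payment type only.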
I think you did indeed make a mistake in the structure of the second table. Instead of creating a column for each option of the multiple choice question, I would suggest replacing all those columns with a single 'answer' column, so that the name of the chosen option is the value stored in that column.
For this, though, you have to change the structure of your tables and the way the table is populated: you should take the name of the option that was checked and store it in your table.
Beyond this, you should watch out for repetitive data in your table, since writing the same string over and over again makes the table grow larger.
If other information in the UD_Demo_ORG table depends on the answer, then you can normalize the table by creating a payment_dimension table or something similar and giving your options an ID, such as
ID PaymentType OtherInfo(description, etc)...
1 PayWuFoo ...
2 PayPhoneCC ...
3 PayPhoneACH ...
4 PayMailCheck ...
5 PayMailCC ...
This is called a dimension table; your records would then hold the ID of the payment type, and not the information you don't need.
So instead of a big result set, you could greatly simplify your query and have just
Count PaymentId
8 1
25 2
8 3
99 4
11 5
as a result set. It would make the query faster too, and if you need other information, you can then join the dimension table to get it.
BUT if the only field you would have is the name, perhaps you could use the paymentType as the "id" in this case... just consider it. Separating it into a dimension table keeps the design scalable.
Some references for further reading:
http://beginnersbook.com/2015/05/normalization-in-dbms/ "Normalization in DBMS"
http://searchdatamanagement.techtarget.com/answer/What-are-the-differences-between-fact-tables-and-dimension-tables-in-star-schemas "Differences between fact tables and dimensions tables"

Conditional SELECT depending on a set of rules

I need to get data from different columns depending on a set of rules and I don't see how to do it. Let me illustrate this with an example. I have a table:
ID ELEM_01 ELEM_02 ELEM_03
---------------------------------
1 0.12 0 100
2 0.14 5 200
3 0.16 10 300
4 0.18 15 400
5 0.20 20 500
And I have a set of rules which look something like this:
P1Z: ID=2 and ELEM_01
P2Z: ID=4 and ELEM_03
P3Z: ID=4 and ELEM_02
P4Z: ID=3 and ELEM_03
I'm trying to output the following:
P1Z P2Z P3Z P4Z
------------------------
0.14 400 15 300
I'm used to much simpler queries and this is a bit above my level. I'm getting mixed up by this problem and I don't see a straightforward solution. Any pointers would be appreciated.
EDIT Logic behind the rules: the table contains data about different aspects of a piece of equipment. Each combination of ID/ELEM_** represents the value of one aspect of the piece of equipment. The table contains all values of all aspects, but we want a row containing data on only a specific subset of aspects, so that we can output in a single table the values of a specific subset of aspects for all pieces of equipment.
Assuming that each column is numeric and ID is unique, you could do:
SELECT
SUM(CASE WHEN ID = 2 THEN ELEM_01 END) AS P1Z,
SUM(CASE WHEN ID = 4 THEN ELEM_03 END) AS P2Z,
SUM(CASE WHEN ID = 4 THEN ELEM_02 END) AS P3Z,
SUM(CASE WHEN ID = 3 THEN ELEM_03 END) AS P4Z
...
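As a rough sketch of how the rule set could drive this query, the Python script below generates one SUM(CASE ...) column per rule and runs it against an in-memory SQLite database. The `equipment` table name, the `RULES` mapping and the `build_query` helper are hypothetical; the data and the four rules are taken from the question. Column names cannot be bound as SQL parameters, so the helper checks them against a fixed allow-list before interpolating them.

```python
import sqlite3

# Output alias -> (ID, column), as listed in the question.
RULES = {
    "P1Z": (2, "ELEM_01"),
    "P2Z": (4, "ELEM_03"),
    "P3Z": (4, "ELEM_02"),
    "P4Z": (3, "ELEM_03"),
}
ALLOWED_COLUMNS = {"ELEM_01", "ELEM_02", "ELEM_03"}

def build_query(rules):
    """Build one SELECT with a SUM(CASE ...) column per rule."""
    parts = []
    for alias, (row_id, column) in rules.items():
        # Identifiers are interpolated, so validate them first.
        if column not in ALLOWED_COLUMNS or not alias.isidentifier():
            raise ValueError(f"unexpected rule: {alias} -> {column}")
        parts.append(
            f"SUM(CASE WHEN ID = {int(row_id)} THEN {column} END) AS {alias}"
        )
    return "SELECT " + ", ".join(parts) + " FROM equipment"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE equipment "
    "(ID INTEGER PRIMARY KEY, ELEM_01 REAL, ELEM_02 INTEGER, ELEM_03 INTEGER)"
)
conn.executemany(
    "INSERT INTO equipment VALUES (?, ?, ?, ?)",
    [(1, 0.12, 0, 100), (2, 0.14, 5, 200), (3, 0.16, 10, 300),
     (4, 0.18, 15, 400), (5, 0.20, 20, 500)],
)

row = conn.execute(build_query(RULES)).fetchone()
result = dict(zip(RULES, row))
print(result)
```

Since ID is unique, each CASE matches exactly one row and SUM simply passes that value through; MAX or MIN would work equally well here.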