counting values in two columns at the same time? SQL/SQLITE - sql

The problem I am attemping to do is find which two people go on the most number of trips together. I created a table where you have the name of a person, the name of somebody that went on a hike with them, the name of the peak, and the date. I want to be able to count all of the values of name1 and name2. Ex in my dataset below I want to count how many times 'Mary' and 'Patricia' appear side by side. I tried to use a COUNT(name1,name2) as numPairs and using group by (name1,name2) but SQLLITE says count() can only take in one parameter.
If my query looks off it is mainly due to the fact I am more comfortable using relational algebra selections/projections to get my data. am open to any other solutions that may help me
mary,patricia
brad,steven
cherry,rick
brad,steven
mary,patricia
mary,patricia
| 2
brad,steven | 2
cherry,rick | 1

Do you simply want aggregation?
select name1, name2, count(*)
from t
group by name1, name2;
If you want distinct pairs, so the ordering doesn't matter, you can use least() and greatest():
select least(name1, name2), greatest(name1, name2), count(*)
from t
group by least(name1, name2), greatest(name1, name2);

You need to use nested query as below
select count(*) as Num_of_times,both_names, peak1 from (select concat(name1,",",name2)
as both_names, peak1 from your_table) as tbl group by both_names;

Related

Grouping a percentage calculation in postgres/redshift

I keep running in to the same problem over and over again, hoping someone can help...
I have a large table with a category column that has 28 entries for donkey breed, then I'm counting two specific values grouped by each of those categories in subqueries like this:
WITH totaldonkeys AS (
SELECT donkeybreed,
COUNT(*) AS total
FROM donkeytable1
GROUP BY donkeybreed
)
,
sickdonkeys AS (
SELECT donkeybreed,
COUNT(*) AS totalsick
FROM donkeytable1
JOIN donkeyhealth on donkeytable1.donkeyid = donkeyhealth.donkeyid
WHERE donkeyhealth.sick IS TRUE
GROUP BY donkeybreed
)
,
It's my goal to end up with a table that has primarily the percentage of sick donkeys for each breed but I always end up struggling like hell with the problem of not being able to group by without using an aggregate function which I cannot do here:
SELECT (CAST(sickdonkeys.totalsick AS float) / totaldonkeys.total) * 100 AS percentsick,
totaldonkeys.donkeybreed
FROM totaldonkeys, sickdonkeys
GROUP BY totaldonkeys.donkeybreed
When I run this I end up with 28 results for each breed of donkey, one correct I believe but obviously hundreds of useless datapoints.
I know I'm probably being really dumb here but I keep hitting in to this same problem again and again with new donkeydata, I should obviously be structuring the whole thing a new way because you just can't do this final query without an aggregate function, I think I must be missing something significant.
You can easily count the proportion that are sick in the donkeyhealth table
SELECT d.donkeybreed,
AVG( (dh.sick)::int ) AS proportion_sick
FROM donkeytable1 d JOIN
donkeyhealth dh
ON d.donkeyid = dh.donkeyid
GROUP BY d.donkeybreed

Group By Using Wildcards in Big Query

I have this query:
SELECT SomeTableA.*
FROM SomeTableB
LEFT JOIN SomeTableA USING (XYZ)
GROUP BY SomeTableA.*
I know that I cannot do the GROUP BY part with wildcards. At the same time, I don't really like listing all the columns (can be up to 20) manually.
Could this be added as new feature? Or is there any way how to easily get the list of all 20 columns from SomeTableA for the GROUP BY part?
If you really have the exact query shown in your question - then try below instead - no grouping required
#standardSQL
SELECT DISTINCT *
FROM `project.dataset.tableA`
WHERE xyz IN (SELECT xyz FROM `project.dataset.tableB`)
As of Group By Using Wildcards in Big Query this sounds more like grouping by struct which is not supported so you can submit feature request if you want - https://issuetracker.google.com/issues/new?component=187149&template=0

SQL Nested Query with distinct count

I have a dilemma, and I'm hoping someone will be able to help me out. I am attempting to work on some made up problems from an old text book of mine, this isn't a question from the book, but the data is, I just wanted to see if I could still work in SQL, so here goes. When this code is executed,
SELECT COUNT(code_description) "Number of Different Crimes", last, first,
code_description
FROM
(
SELECT criminal_id, last, first, crime_code, code_description
FROM criminals
JOIN crimes USING (criminal_id)
JOIN crime_charges USING (crime_id)
JOIN crime_codes USING (crime_code)
ORDER BY criminal_id
)
WHERE criminal_id = 1020
GROUP BY last, first, code_description;
I am provided with these results:
Number of Different Crimes LAST FIRST CODE_DESCRIPTION
1 Phelps Sam Agg Assault
1 Phelps Sam Drug Offense
Inevitably, I would like the number of different crimes to be 2 for each line since this criminal has two unique crimes charged to him. I would like it to be displayed something like:
Number of Different Crimes LAST FIRST CODE_DESCRIPTION
2 Phelps Sam Agg Assault
2 Phelps Sam Drug Offense
Not to push my luck but I would also like to get rid of the follow line also:
WHERE criminal_id = 1020
to something a little more elegant to represent any criminal with more than 1 crime type associated with them, for this case, Sam Phelps is the only one in this data set.
As #sgeddes said in a comment, you can use an analytic count, which doesn't need a subquery if you're specifying the criminal ID:
SELECT COUNT(code_description) OVER (PARTITION BY first, last) AS "Number of Different Crimes",
last, first, code_description
FROM criminals
JOIN crimes USING (criminal_id)
JOIN crime_charges USING (crime_id)
JOIN crime_codes USING (crime_code)
WHERE criminal_id = 1020;
If you want to look for anyone with multiple crimes then you do need a subquery so you can filter on the analytic result:
SELECT charge_count AS "Number of Different Crimes",
last, first, code_description
FROM (
SELECT COUNT(DISTINCT code_description) OVER (PARTITION BY first, last) AS charge_count,
criminal_id, last, first, code_description
FROM criminals
JOIN crimes USING (criminal_id)
JOIN crime_charges USING (crime_id)
JOIN crime_codes USING (crime_code)
)
WHERE charge_count > 1
ORDER BY criminal_id, code_description;
SQL Fiddle demo.
If the charges are across multiple crimes, but duplicated, then the distinct count still works, but you might want to make add a distinct to the overall result set - unless you want to show other crime-specific info - otherwise you get something like this.

Count number of occurrences of keyword in comma separated column?

I have a column which stores data like this:
Product:
product1,product2,product5
product5,product7
product1
What I would like to do is count the number of occurrences there are of product1, product2, etc. but where the record contains multiple products I want it to double count them.
So for the above example the totals would be:
product1: 2
product2: 1
product5: 2
product7: 1
How can I achieve this?
I was trying something like this:
select count(case when prodcolumn like '%product1%' then 'product1' end) from myTable
This gets me the count for product1 appears but how do I extend this to go through each product?
I also tried something like this:
select new_productvalue, count(new_productvalue) from OpportunityExtensionBase
group by new_ProductValue
But that lists all different combinations of the products which were found and how many times they were found...
These products don't change so hard coding it is ok...
EDIT: here is what worked for me.
WITH Product_CTE (prod) AS
(SELECT
n.q.value('.', 'varchar(50)')
FROM (SELECT cast('<r>'+replace(new_productvalue, ';', '</r><r>')+'</r>' AS xml) FROM table) AS s(XMLCol)
CROSS APPLY s.XMLCol.nodes('r') AS n(q)
WHERE n.q.value('.', 'varchar(50)') <> '')
SELECT prod, count(*) AS [Num of Opps.] FROM Product_CTE GROUP BY prod
You have a lousy, lousy data structure, but sometimes one must make do with that. You should have a separate table storing each pair product/whatever pair -- that is the relational way.
with prodref as (
select 'product1' as prod union all
select 'product2' as prod union all
select 'product5' as prod union all
select 'product7' as prod
)
select p.prod, count(*)
from prodref pr left outer join
product p
on ','+p.col+',' like '%,'+pr.prod+',%'
group by p.prod;
This will be quite slow on a large table. And, the query cannot make use of standard indexes. But, it should work. If you can restructure the data, then you should.
Nevermind all you need if one split function
SQL query to split column data into rows
hope after this you can manage .

SQL Query (incorrect use of joins?)

I'm trying to write an (Oracle) SQL query that, given an "agent_id", would give me a list of questions that agent has answered during an assessment, as well as an average score over all of the times that agent has answered those questions.
Note: I tried to design the query such that it would support multiple employees (so we can query at the store level), hence the "IN" condition in the where clause.
Here's what I have so far:
select question.question_id as questionId,
((sum(answer.answer) / count(answer.answer)) * 100) as avgScore
from SPMADMIN.SPM_QC_ASSESSMENT_ANSWER answer
join SPMADMIN.SPM_QC_QUESTION question
on answer.question_id = question.question_id
join SPMADMIN.SPM_QC_ASSESSMENT assessment
on answer.assessment_id = assessment.assessment_id
join SPMADMIN.SPM_QC_SUB_GROUP_TYPE sub_group
on question.sub_group_type_id = sub_group.sub_group_id
join SPMADMIN.SPM_QC_GROUP_TYPE theGroup
on sub_group.group_id = theGroup.group_id
where question.question_id in (select distinct question2.question_id
from SPMADMIN.SPM_QC_QUESTION question2
)
and question.bool_yn_active_flag = 'Y'
and assessment.agent_id in (?)
and answer.answer is not null
order by theGroup.page_order asc,
sub_group.group_order asc,
question.sub_group_order asc
Basically I would want to see:
|questionId|avgScore|
| 1 | 100 |
| 2 | 50 |
| 3 | 75 |
Such that every question that employee has ever answered is in the list of question indexes with their average score over all of the times they've answered it.
When I run it as is, I'm given a "ORA-00937: not a single-group group function" error. Any sort of combination of a "group by" clause I've added hasn't helped in the least.
When I run it removing the question.question_id as questionId, part of the select, it runs fine, but it shows their average score over all questions. I need it broken down by question.
Any help or pointers would be greatly appreciated.
When you have an aggregate function in the SELECT list (SUM and COUNT are aggregate functions), then any other columns in the SELECT list need to be in a GROUP BY clause. For example:
SELECT fi, COUNT(fo)
FROM fum
GROUP BY fi
The COUNT(fo) expression is an aggregate, the fi column is a non-aggregate. If you were to add another non-aggregate to the SELECT list, it would also need to be included in the GROUP BY. For example
SELECT TRUNC(fee), fi, COUNT(fo)
FROM fum
GROUP BY TRUNC(fee), fi
To be a little more precise, rather than say "columns in the SELECT list", we should actually say "all non-aggregate expressions in the SELECT list" will need to be included in the GROUP BY clause.
It's not your joins but your use of GROUP BY.
When you use a GROUP BY in SQL, the things you GROUP BY are the things which define the groups. Everything else you have in your SELECT have to be in aggregates which operate over the group.
You can also do aggregates over the entire set without a GROUP BY, but then every column will need to be within an aggregate function.