SQL: Displaying results from conditional loop - sql

I'm not even sure if this is possible using SQL only but here goes...
I have a list of football results in one table; each row is a match and contains all the data from that match. I want to cycle through each match, get the home team, check their last 6 matches, and display only the matches where that team scored 2 or more goals in at least 50% of those 6 matches.
So far I have this; I just don't know how to stitch it together...
Create list of all games, returning only the home team:
SELECT Date, Home
FROM [FDATA].[dbo].[Goals]
ORDER BY Date
Get last 6 games of that team:
SELECT TOP 6 *
FROM [FDATA].[dbo].[Goals]
WHERE Home = 'home from first query' AND Date <= 'date from first query'
ORDER BY Date DESC
Then check if the team scored 2 or more goals in >= 50% of the 6 games returned and output the row from the first query if true:
SELECT *
FROM last query
WHERE HomeGoals >= 2
ORDER BY Date DESC
Apologies for the crudeness of this question but I'm a bit of a novice.

You only need two queries, one nested inside the other:
SELECT Home, COUNT(1) AS cnt
FROM
(
    SELECT TOP 6 G1.HomeGoals, G1.Home
    FROM [FDATA].[dbo].[Goals] AS G1
    LEFT OUTER JOIN [FDATA].[dbo].[Goals] AS G2
        ON G1.Home = G2.Home AND G1.Date <= G2.Date
    ORDER BY G1.Date DESC
) AS last6  -- SQL Server requires an alias on a derived table
WHERE HomeGoals >= 2
GROUP BY Home
HAVING COUNT(1) >= 3
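The "last 6 matches per home team" check from the question can also be written with a window function, so that every match row is evaluated against its own team's previous games. Below is a minimal sketch, not the answerer's query: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the function name, the parameter names, and the rule that a match only qualifies once the team has a full 6 home games are all assumptions made for illustration. As in the question's `Date <=` condition, the match itself counts as one of the 6.

```python
import sqlite3


def matches_with_scoring_form(rows, n_games=6, min_goals=2, min_share=0.5):
    """Return (date, home) for matches where the home team scored at least
    `min_goals` in at least `min_share` of its last `n_games` home matches,
    counting the match itself (mirroring `Date <=` in the question).

    rows: iterable of (date, home, home_goals), dates as ISO strings.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Goals (Date TEXT, Home TEXT, HomeGoals INTEGER)")
    conn.executemany("INSERT INTO Goals VALUES (?, ?, ?)", rows)
    # The frame size is formatted in as an integer; other values are bound.
    sql = f"""
        SELECT Date, Home
        FROM (
            SELECT Date, Home,
                   COUNT(*) OVER w AS games,
                   SUM(HomeGoals >= :min_goals) OVER w AS good_games
            FROM Goals
            WINDOW w AS (PARTITION BY Home ORDER BY Date
                         ROWS BETWEEN {int(n_games) - 1} PRECEDING AND CURRENT ROW)
        )
        WHERE games = :n_games              -- assumption: need a full window
          AND good_games >= :min_share * games
        ORDER BY Date, Home
    """
    params = {"n_games": n_games, "min_goals": min_goals, "min_share": min_share}
    result = conn.execute(sql, params).fetchall()
    conn.close()
    return result
```

In SQL Server the same idea would use `SUM(CASE WHEN HomeGoals >= 2 THEN 1 ELSE 0 END) OVER (...)`, since a comparison cannot be summed directly there.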

how can I count some values for data in a table based on same key in another table in Bigquery?

I have one table like the one below. Each id is unique.
id     | times_of_going_out
-------+-------------------
fef666 | 2
S335gg | 1
9a2c50 | 1
and another table like this one ↓. In this second table the "id" is not unique; there are different "category_name" values for a single id.
id     | category_name         | city
-------+-----------------------+-----
S335gg | Games & Game Supplies | tk
9a2c50 | Telephone Companies   | os
9a2c50 | Recreation Centers    | ky
fef666 | Recreation Centers    | ky
I want to find the difference between the destinations (category_name) of people who go out often (times_of_going_out > 5) and people who don't go out often (times_of_going_out <= 5).
** Both tables are a small sample of large tables.
 ・ Where do people who go out twice often go?
 ・ Where do people who go out 6 times often go?
Thank you
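The expected result could be something like
less than 5                                                                      | more than 5
---------------------------------------------------------------------------------+---------------------------------------------------------------------------------
top ten “category_name” for uid’s with "times_of_going_out" less than 5 times | top ten “category_name” for uid’s with "times_of_going_out" more than 5 times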
Steps:
1. Combine the data and aggregate the total times_of_going_out per category.
2. Create the categories that you need: less than or equal to 5, and more than 5. If you don't want "equal to 5" in the first group, you can adjust the code.
3. Rank both categories with dense_rank(). This produces a rank starting from 1 within each category, based on the total times of going out.
4. Filter so that only the top 10 values of each category are kept.
with main as (
    select
        category_name,
        sum(coalesce(times_of_going_out, 0)) as total_time_per_category
    from table1 as t1
    left join table2 as t2
        on t1.id = t2.id
    group by 1
),
category as (
    select
        *,
        -- strictly greater than 5, so that exactly 5 falls in 'less than equal to 5'
        if(total_time_per_category > 5, 'more than 5', 'less than equal to 5') as is_more_than_5_times
    from main
),
ranking_ as (
    select *,
        case when is_more_than_5_times = 'more than 5' then
            dense_rank() over (partition by is_more_than_5_times order by total_time_per_category desc)
        else NULL
        end AS rank_more_than_5,
        case when is_more_than_5_times = 'less than equal to 5' then
            dense_rank() over (partition by is_more_than_5_times order by total_time_per_category)
        else NULL
        end AS rank_less_than_equal_5
    from category
)
select
    is_more_than_5_times,
    string_agg(category_name, ',') as list
from ranking_
where rank_less_than_equal_5 <= 10 or rank_more_than_5 <= 10
group by 1
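A different reading of the question is to bucket the people first (by their own times_of_going_out) and only then count which categories each bucket visits. The sketch below illustrates that reading; it is not the query above. It runs on Python's built-in sqlite3 rather than BigQuery, and the function name, the 'often'/'rarely' labels, and the choice to rank categories by number of distinct ids are assumptions for illustration.

```python
import sqlite3


def top_categories_by_group(table1_rows, table2_rows, threshold=5, top_n=10):
    """Label each id 'often' (times_of_going_out > threshold) or 'rarely',
    then return the top_n categories per label by number of distinct ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id TEXT PRIMARY KEY, times_of_going_out INTEGER)")
    conn.execute("CREATE TABLE table2 (id TEXT, category_name TEXT, city TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", table1_rows)
    conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)", table2_rows)
    sql = """
        WITH counted AS (
            SELECT CASE WHEN t1.times_of_going_out > :threshold
                        THEN 'often' ELSE 'rarely' END AS grp,
                   t2.category_name,
                   COUNT(DISTINCT t1.id) AS users
            FROM table1 AS t1
            JOIN table2 AS t2 ON t2.id = t1.id
            GROUP BY grp, t2.category_name
        ),
        ranked AS (
            SELECT grp, category_name, users,
                   DENSE_RANK() OVER (PARTITION BY grp ORDER BY users DESC) AS rnk
            FROM counted
        )
        SELECT grp, category_name, users
        FROM ranked
        WHERE rnk <= :top_n
        ORDER BY grp, users DESC, category_name
    """
    result = conn.execute(sql, {"threshold": threshold, "top_n": top_n}).fetchall()
    conn.close()
    return result
```

Because dense_rank() keeps ties, a bucket can return more than top_n rows when several categories share a count.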

How to consecutively count everything greater than or equal to itself in SQL?

Let's say I have a table that contains the Equipment IDs for each Equipment Type and Equipment Age. How can I do a Count Distinct of the Equipment IDs that have at least a given Equipment Age?
For example, let's say this is all the data we have:
equipment_type | equipment_id | equipment_age
---------------+--------------+--------------
Screwdriver    | A123         | 1
Screwdriver    | A234         | 2
Screwdriver    | A345         | 2
Screwdriver    | A456         | 2
Screwdriver    | A567         | 3
I would like the output to be:
equipment_type | equipment_age | count_of_equipment_at_least_this_age
---------------+---------------+-------------------------------------
Screwdriver    | 1             | 5
Screwdriver    | 2             | 4
Screwdriver    | 3             | 1
Reason is there are 5 screwdrivers that are at least 1 day old, 4 screwdrivers at least 2 days old and only 1 screwdriver at least 3 days old.
So far I have only been able to count the equipment that falls within each equipment_age (as in the query shown below), not the equipment with "at least that equipment_age".
SELECT
equipment_type,
equipment_age,
COUNT(DISTINCT equipment_id) as count_of_equipments
FROM equipment_table
GROUP BY 1, 2
Consider below join-less solution
select distinct
    equipment_type,
    equipment_age,
    count(*) over equipment_at_least_this_age as count_of_equipment_at_least_this_age
from equipment_table
window equipment_at_least_this_age as (
    partition by equipment_type
    order by equipment_age
    range between current row and unbounded following
)
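If applied to the sample data in your question, the output is
equipment_type | equipment_age | count_of_equipment_at_least_this_age
---------------+---------------+-------------------------------------
Screwdriver    | 1             | 5
Screwdriver    | 2             | 4
Screwdriver    | 3             | 1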
Use a self join approach:
SELECT
    e1.equipment_type,
    e1.equipment_age,
    -- DISTINCT is needed: each e2 row is joined once per e1 row of the same age
    COUNT(DISTINCT e2.equipment_id) AS count_of_equipments
FROM equipment_table e1
INNER JOIN equipment_table e2
    ON e2.equipment_type = e1.equipment_type AND
       e2.equipment_age >= e1.equipment_age
GROUP BY 1, 2
ORDER BY 1, 2;
GROUP BY restricts the scope of COUNT to the rows in the group, i.e. it will not let you reach other rows (rows with equipment_age greater than that of the current group). So you need a subquery or windowing functions to get those. One way:
SELECT
    equipment_type,
    equipment_age,
    (SELECT COUNT(*)
     FROM equipment_table cnt
     WHERE cnt.equipment_type = a.equipment_type
       AND cnt.equipment_age >= a.equipment_age
    ) AS count_of_equipments
FROM equipment_table a
GROUP BY 1, 2  -- group only by the two plain columns, not by the subquery
I am not sure if your environment supports this syntax, though. If not, let us know and we will find another way.
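The difference between counting within each age group and counting "at least this age" can be checked on the sample data from the question. This is a minimal sketch using Python's built-in sqlite3 as a stand-in for the asker's database; the function names are made up for illustration, and the second query is a correlated-subquery variant written over the distinct (type, age) pairs.

```python
import sqlite3

SAMPLE = [
    ("Screwdriver", "A123", 1),
    ("Screwdriver", "A234", 2),
    ("Screwdriver", "A345", 2),
    ("Screwdriver", "A456", 2),
    ("Screwdriver", "A567", 3),
]


def _load(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE equipment_table "
        "(equipment_type TEXT, equipment_id TEXT, equipment_age INTEGER)"
    )
    conn.executemany("INSERT INTO equipment_table VALUES (?, ?, ?)", rows)
    return conn


def count_per_age(rows):
    """Plain GROUP BY: only sees the rows inside each (type, age) group."""
    sql = """
        SELECT equipment_type, equipment_age, COUNT(DISTINCT equipment_id)
        FROM equipment_table
        GROUP BY equipment_type, equipment_age
        ORDER BY 1, 2
    """
    return _load(rows).execute(sql).fetchall()


def count_at_least_age(rows):
    """Correlated subquery: for each (type, age), count ids of that age or older."""
    sql = """
        SELECT a.equipment_type, a.equipment_age,
               (SELECT COUNT(DISTINCT cnt.equipment_id)
                FROM equipment_table AS cnt
                WHERE cnt.equipment_type = a.equipment_type
                  AND cnt.equipment_age >= a.equipment_age)
        FROM (SELECT DISTINCT equipment_type, equipment_age
              FROM equipment_table) AS a
        ORDER BY 1, 2
    """
    return _load(rows).execute(sql).fetchall()
```

On SAMPLE, count_per_age gives counts 1, 3, 1 for ages 1, 2, 3, while count_at_least_age gives 5, 4, 1, which is the output the question asks for.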

Postgres OFFSET the results of subqueries group by

I'm calculating weekly scores for a game where certain weeks have bonuses, and when totaling your score the lowest two scores are dropped.
id | name     | week | score
---+----------+------+------
1  | Player A | 1    | 10
2  | Player A | 2    | 20
3  | Player A | 3    | 30
4  | Player A | 4    | 40
5  | Player B | 1    | 5
6  | Player B | 2    | 10
7  | Player B | 3    | 15
8  | Player B | 4    | 20
Let's say in week 2 your score should be doubled,
So A's scores should be [10,40,30,40] and B [5,20,15,20]
With the rules of removing the two lowest scores
A [40,40] total 80
B [20,20] total 40
If I run this query
select name, sum(special_scores) as total_score
from (
    select
        name,
        case
            when week = 2 then score * 2
            else score
        end special_scores
    from public.standings
    where name = 'Player A'
    order by special_scores
    offset 2
) s
group by name
order by total_score desc;
I see the expected result of totaling the score column and omitting the last two results, so I believe my sub query is correct.
However if I remove the where clause from the subquery
select name, sum(special_scores) as total_score
from (
    select
        name,
        case
            when week = 2 then score * 2
            else score
        end special_scores
    from public.standings
    order by special_scores
    offset 2
) s
group by name
order by total_score desc
The table will populate but will not omit the two lowest scores.
So I'm getting something like
name     | total_score
---------+------------
Player A | 120
Player B | 60
Could someone help as to why the offset in the second query is not removing the scores before totaling?
Break the problem into subproblems.
First, create a subquery that computes the actual scores (doubling them where required). Keep id, week and the actual score.
Then use a window function (row_number()) to drop the bottom two scores.
Then sum the results by id.
Finally, join the id to this result table to get the player name too.
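The reason the second query does not behave as hoped is that OFFSET is applied once to the whole ordered subquery result, so it skips the two lowest rows overall rather than the two lowest rows of each player. The steps above can be sketched as follows, using Python's built-in sqlite3 as a stand-in for Postgres (the SQL itself is standard and should carry over). One assumption to flag: in the sample table id is a per-row key, so this sketch partitions by name to identify the player; the function and parameter names are also made up for illustration.

```python
import sqlite3

STANDINGS = [
    (1, "Player A", 1, 10), (2, "Player A", 2, 20),
    (3, "Player A", 3, 30), (4, "Player A", 4, 40),
    (5, "Player B", 1, 5), (6, "Player B", 2, 10),
    (7, "Player B", 3, 15), (8, "Player B", 4, 20),
]


def totals_dropping_lowest(rows, bonus_week=2, n_drop=2):
    """Double the bonus week, drop each player's n_drop lowest scores, sum the rest."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE standings (id INTEGER, name TEXT, week INTEGER, score INTEGER)"
    )
    conn.executemany("INSERT INTO standings VALUES (?, ?, ?, ?)", rows)
    sql = """
        WITH adjusted AS (      -- step 1: compute the actual scores
            SELECT name, week,
                   CASE WHEN week = :bonus_week THEN score * 2 ELSE score END
                       AS special_score
            FROM standings
        ),
        ranked AS (             -- step 2: number scores per player, lowest first
            SELECT name, special_score,
                   ROW_NUMBER() OVER (PARTITION BY name
                                      ORDER BY special_score, week) AS rn
            FROM adjusted
        )
        SELECT name, SUM(special_score) AS total_score   -- step 3: sum the rest
        FROM ranked
        WHERE rn > :n_drop
        GROUP BY name
        ORDER BY total_score DESC
    """
    result = conn.execute(sql, {"bonus_week": bonus_week, "n_drop": n_drop}).fetchall()
    conn.close()
    return result
```

On the sample data this yields 80 for Player A and 40 for Player B, matching the totals worked out by hand in the question.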

SQL obtaining items ranked by their count(*)

I have been attempting the following query for a while- not sure how to approach this issue I'm having.
I need to obtain bands that cover the second most styles of music - including all equal bands if there is a tie for second. For example for the table band_style,
Band_id | Style
---------------------
1 Rock
2 Pop
1 Punk
3 Classical
1 Metal
2 Rock
4 Pop
4 Rap
The returned result should be
Band_id | Num_styles
2 2
4 2
My initial attempt at a solution:
SELECT band_id, COUNT(*) AS num_styles FROM band_style
GROUP BY band_id HAVING COUNT(*) <
(SELECT MAX(c) FROM
(SELECT COUNT(band_id) AS c
FROM band_style
GROUP BY band_id));
So this gives me the count of all the bands with less styles than the maximum. Now, I'd like to take ALL rows which have the maximum value of this query. I do not want to use rownum or limit because from what I've experienced this doesn't work too well in the case of ties. I am also wondering if there is a way to wrap this in another MAX function, but I don't really see how.
Any help with this issue would be appreciated- also think this would be useful to know to see if it can be applied to 3rd, 4th highest, etc.
(Using Oracle/SQLPlus)
Assuming this is a large data file and we do not necessarily know what the "second highest count" is.
UPDATE: this almost works- gets all bands with less than max number of styles. But calling MAX doesn't seem to be working, as the table returned still has all values of NUM except the max..
WITH data AS (
SELECT band_id, COUNT(*) AS NUM FROM band_style GROUP BY band_id HAVING COUNT (*) <
(SELECT MAX(c) FROM
(SELECT COUNT(band_id) AS c
FROM band_style
GROUP BY band_id)))
SELECT data.band_id, data.NUM FROM data
INNER JOIN ( SELECT band_id m, MAX(NUM) n
FROM data GROUP BY band_id
) t
ON t.band_id = data.band_id
AND t.NUM = data.NUM;
If you have to stick with MySQL, this query will be much more difficult. But if you can switch to MariaDB or Oracle, this should work:
with data as (
    select
        band_id, count(*) styles,
        dense_rank() over (order by count(*) desc) place
    from table1
    group by band_id
)
select * from data where place = 2
http://sqlfiddle.com/#!4/dc3f6/12
Your friend here is the window function dense_rank.
The output is:
BAND_ID STYLES PLACE
2 2 2
4 2 2
To avoid a misunderstanding, since in that example place 2 happens to coincide with a style count of 2, here is a second example:
http://sqlfiddle.com/#!4/2be32/3
Now the styles count is different from the place id.
BAND_ID STYLES PLACE
4 3 2
This illustrates that you do not need to know the second-highest count value beforehand; dense_rank works it out.
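The question also asks whether the approach extends to 3rd, 4th highest, and so on. Since dense_rank assigns consecutive places with ties sharing a place, only the filter value changes. Here is a minimal sketch using Python's built-in sqlite3 as a stand-in for Oracle; the function name and the `place` parameter are made up for illustration, and the counting and ranking are split into two CTEs purely for readability.

```python
import sqlite3

BAND_STYLE = [
    (1, "Rock"), (2, "Pop"), (1, "Punk"), (3, "Classical"),
    (1, "Metal"), (2, "Rock"), (4, "Pop"), (4, "Rap"),
]


def bands_in_place(rows, place=2):
    """Return (band_id, num_styles) for every band tied at the given place."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE band_style (band_id INTEGER, style TEXT)")
    conn.executemany("INSERT INTO band_style VALUES (?, ?)", rows)
    sql = """
        WITH counts AS (
            SELECT band_id, COUNT(*) AS num_styles
            FROM band_style
            GROUP BY band_id
        ),
        ranked AS (
            SELECT band_id, num_styles,
                   DENSE_RANK() OVER (ORDER BY num_styles DESC) AS place
            FROM counts
        )
        SELECT band_id, num_styles
        FROM ranked
        WHERE place = :place
        ORDER BY band_id
    """
    result = conn.execute(sql, {"place": place}).fetchall()
    conn.close()
    return result
```

With the band_style sample from the question, place=2 returns bands 2 and 4 (2 styles each), and place=3 returns band 3.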

sql - return max value for a time period in two tables

Here is a simplified version of my problem:
I have two tables:
Students:
ST_STUDENT_ID  NAME   ST_DATE_TAKEN
-------------  -----  -------------
1              Jim    2011-01-01
2              Fred   2011-01-02
3              Sarah  2011-01-03
4              Nancy  2001-02-04
SCORES:
SC_STUDENT_ID  SC_SCORE
-------------  --------
1              97
2              97
3              95
4              97
I need to pull the student with the highest score for a month (say January). However, I only want one student even if multiple students received that score, and that score could also exist outside my focus month, which complicates my query. The only way I could figure out to do it was to repeat all my criteria in each nested sub-query. Is there a better way? It isn't too terrible here, but in my actual problem the WHERE criteria are much more complicated and joined across many tables, so duplicating them is a pain, and the cost gets quite large.
SELECT ST_STUDENT_ID, ST_TIMESTAMP, SC_SCORE
FROM STUDENTS
JOIN SCORES
    ON ST_STUDENT_ID = SC_STUDENT_ID
WHERE ST_STUDENT_ID = (
    SELECT MAX(ST_STUDENT_ID)
    FROM STUDENTS
    JOIN SCORES
        ON ST_STUDENT_ID = SC_STUDENT_ID
    WHERE ST_TIMESTAMP > '2011-01-01'
      AND ST_TIMESTAMP < '2011-02-01'
      AND SC_SCORE IS NOT NULL
      AND SC_SCORE = (
        SELECT MAX(SC_SCORE)
        FROM STUDENTS
        JOIN SCORES
            ON ST_STUDENT_ID = SC_STUDENT_ID
        WHERE ST_TIMESTAMP > '2011-01-01'
          AND ST_TIMESTAMP < '2011-02-01'))
If you only want one score, and your time period will be passed explicitly into the query, what about something like this?
SELECT TOP 1 ST_STUDENT_ID, ST_TIMESTAMP, SC_SCORE
FROM STUDENTS
JOIN SCORES ON ST_STUDENT_ID = SC_STUDENT_ID
WHERE ST_TIMESTAMP >= '2011-01-01'
AND ST_TIMESTAMP <= '2011-02-01'
ORDER BY SC_SCORE DESC, ST_STUDENT_ID DESC
That syntax should work for MS SQL Server - different RDBMSs have slightly different syntaxes for the "TOP 1" concept.
[I see from your later comment that you're using DB2 - in which case the syntax apparently is FETCH FIRST 1 ROWS ONLY.]
Note that I'm following the logic in your example, which implies the student with the highest ID takes precedence. Good incentive to register late for class ;-)
(Assuming SQL Server 2005 or later, or another RDBMS that supports CTEs and window functions)
Something like:
;With OrderedScores as (
    SELECT
        ST_STUDENT_ID,
        ST_TIMESTAMP,
        SC_SCORE,
        ROW_NUMBER() OVER (ORDER BY SC_SCORE desc, newid()) as rn /* Ordered randomly within same score */
    FROM
        STUDENTS
        join
        SCORES
            on ST_STUDENT_ID = SC_STUDENT_ID
    WHERE
        ST_TIMESTAMP >= '20110101' and
        ST_TIMESTAMP < '20110201'
)
select * from OrderedScores where rn = 1
Obviously, you can play with the criteria within the ORDER BY of the window function to determine which student you pick when ties exist (in the above it's random; again assuming SQL Server. In another RDBMS, newid() should be replaced with something else).
In addition, I think I've got your date criteria correct here. In your original query, you have one set of criteria that uses > and < (so excluding Jim), and in the other, you use <= and >=, which could include a student who tested on 1st Feb.
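To make the date-boundary point concrete, here is a minimal sketch of the ROW_NUMBER approach with a half-open range (start inclusive, end exclusive), run on Python's built-in sqlite3 as a stand-in for the asker's database. Some assumptions: the tie-break is the deterministic "highest student ID wins" rule from the original query rather than newid(), dates are stored as ISO strings, the date column is called ST_TIMESTAMP as in the queries (the sample table labels it ST_DATE_TAKEN), and the function and parameter names are made up for illustration.

```python
import sqlite3

STUDENTS = [
    (1, "Jim", "2011-01-01"),
    (2, "Fred", "2011-01-02"),
    (3, "Sarah", "2011-01-03"),
    (4, "Nancy", "2001-02-04"),  # date kept exactly as in the question's sample
]
SCORES = [(1, 97), (2, 97), (3, 95), (4, 97)]


def top_student(students, scores, start_incl, end_excl):
    """Return (id, timestamp, score) of the top scorer with
    start_incl <= ST_TIMESTAMP < end_excl, or None if nobody qualifies."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE STUDENTS (ST_STUDENT_ID INTEGER, NAME TEXT, ST_TIMESTAMP TEXT)"
    )
    conn.execute("CREATE TABLE SCORES (SC_STUDENT_ID INTEGER, SC_SCORE INTEGER)")
    conn.executemany("INSERT INTO STUDENTS VALUES (?, ?, ?)", students)
    conn.executemany("INSERT INTO SCORES VALUES (?, ?)", scores)
    sql = """
        WITH OrderedScores AS (
            SELECT ST_STUDENT_ID, ST_TIMESTAMP, SC_SCORE,
                   ROW_NUMBER() OVER (ORDER BY SC_SCORE DESC,
                                      ST_STUDENT_ID DESC) AS rn
            FROM STUDENTS
            JOIN SCORES ON ST_STUDENT_ID = SC_STUDENT_ID
            WHERE ST_TIMESTAMP >= :start_incl   -- includes the 1st of the month
              AND ST_TIMESTAMP <  :end_excl     -- excludes the 1st of the next
        )
        SELECT ST_STUDENT_ID, ST_TIMESTAMP, SC_SCORE
        FROM OrderedScores
        WHERE rn = 1
    """
    row = conn.execute(sql, {"start_incl": start_incl, "end_excl": end_excl}).fetchone()
    conn.close()
    return row
```

The filter criteria appear only once, which is the duplication the question was trying to avoid. For January 2011 the sample data yields Fred (ID 2), since he ties with Jim on 97 and has the higher ID.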