SQL AVG value of one column where another column equals a specific value

I'm given data on pitchers: the pitch type and the pitch speed.
| day | inning | pitcher | pitch_type | pitch_speed |
|-----|--------|---------|------------|-------------|
| 1   | 1      | AE1     | fastball   | 97          |
| 1   | 1      | AE1     | fastball   | 94          |
| 1   | 1      | AE1     | slider     | 83          |
| 1   | 2      | AE1     | fastball   | 96          |
| 1   | 2      | AE1     | slider     | 86          |
| 1   | 2      | AE1     | fastball   | 97          |
Is there a way to query the data to get the average pitch speed for a specific pitch type?
I.e., a way to return fastball_speed = 96 and slider_speed = 84.5 (the averages).

What about this?
select pitch_type, avg(pitch_speed) from your_table group by pitch_type
BTW, when providing sample data, please use a CTE to make things easier for answerers:
#standardSql
with t as (
select 1 as day, 1 as inning, 'AE1' as pitcher, 'fastball' as pitch_type, 97 as pitch_speed union all
select 1 as day, 1 as inning, 'AE1' as pitcher, 'fastball' as pitch_type, 94 as pitch_speed union all
select 1 as day, 1 as inning, 'AE1' as pitcher, 'slider' as pitch_type, 83 as pitch_speed union all
select 1 as day, 2 as inning, 'AE1' as pitcher, 'fastball' as pitch_type, 96 as pitch_speed union all
select 1 as day, 2 as inning, 'AE1' as pitcher, 'slider' as pitch_type, 86 as pitch_speed union all
select 1 as day, 2 as inning, 'AE1' as pitcher, 'fastball' as pitch_type, 97 as pitch_speed
)
select pitch_type, avg(pitch_speed) from t group by pitch_type
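The question asks for fastball_speed and slider_speed as separate values, which GROUP BY returns as rows rather than columns. A minimal sketch of one way to get them as columns is conditional aggregation, shown here through Python's sqlite3 module so it can be run as-is. The table name pitches is an assumption for illustration; the sample rows are the ones from the question.

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
# The table name "pitches" is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pitches (day INT, inning INT, pitcher TEXT, "
    "pitch_type TEXT, pitch_speed INT)"
)
conn.executemany(
    "INSERT INTO pitches VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "AE1", "fastball", 97),
        (1, 1, "AE1", "fastball", 94),
        (1, 1, "AE1", "slider", 83),
        (1, 2, "AE1", "fastball", 96),
        (1, 2, "AE1", "slider", 86),
        (1, 2, "AE1", "fastball", 97),
    ],
)

# AVG ignores NULLs, so a CASE without ELSE restricts each average
# to the rows of one pitch type.
row = conn.execute(
    """
    SELECT AVG(CASE WHEN pitch_type = 'fastball' THEN pitch_speed END) AS fastball_speed,
           AVG(CASE WHEN pitch_type = 'slider'   THEN pitch_speed END) AS slider_speed
    FROM pitches
    """
).fetchone()
print(row)
```

The GROUP BY form above scales better when the set of pitch types is not known in advance; the CASE form is only convenient for a small, fixed list.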

Related

Vertica SQL: Disregard rows, based on two conditions

I need a WHERE condition that applies the following across an entire table:
If a 0 exists for an ID (in column d), exclude everything that is > 0; if no 0 exists but there is a row where d = a, exclude everything before that row.
In Example 1 (Case 1) I want to disregard rows 1 & 2; in Example 2 (Case 2) I want to disregard rows 1, 2 & 3.
Currently I have: where d <= 0 or d = a, but in Case 1 this also returns row nr 2, which I do not want.
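Example 1 (Case 1):
| row nr | ID | d    | a  |
|--------|----|------|----|
| 1      | 1  | 180  | 78 |
| 2      | 1  | 78   | 78 |
| 3      | 1  | 0    | 78 |
| 4      | 1  | -67  | 78 |
| 5      | 1  | -121 | 78 |
Example 2 (Case 2):
| row nr | ID | d    | a   |
|--------|----|------|-----|
| 1      | 2  | 180  | 148 |
| 2      | 2  | 171  | 148 |
| 3      | 2  | 170  | 148 |
| 4      | 2  | 148  | 148 |
| 5      | 2  | -67  | 148 |
| 6      | 2  | -121 | 148 |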
This is a bit more complex than you might expect. You need a nested query with an OLAP (window) function to detect, for each row in a partition (defined by the value of id), whether at least one row in that partition has a value of 0 for d. Then, outside that nested query, filter on that flag and on d being 0 or less. That's case 1.
For the other case, use the same nested query to keep only rows from partitions that have no row with d equal to 0. From there, the easiest way is to use Vertica's MATCH clause to select the pattern of rows that consists of a row with d equal to a, followed by zero or more occurrences of any row, which I describe in the query with the pattern (d_equal_a anyrow*).
Here goes:
WITH
-- YOUR INPUT, don't use in query
indata(row_nr,ID,d,a) AS (
SELECT 1,1,180,78
UNION ALL SELECT 2,1,78,78
UNION ALL SELECT 3,1,0,78
UNION ALL SELECT 4,1,-67,78
UNION ALL SELECT 5,1,-121,78
UNION ALL SELECT 1,2,180,148
UNION ALL SELECT 2,2,171,148
UNION ALL SELECT 3,2,170,148
UNION ALL SELECT 4,2,148,148
UNION ALL SELECT 5,2,-67,148
UNION ALL SELECT 6,2,-121,148
)
-- end of your input, real query starts here, replace following comma with "WITH"
,
min_abs_d_eq_0 AS (
-- nested query with OLAP expression returning Boolean
SELECT
*
, (MIN(ABS(d)) OVER (PARTITION BY id) = 0) AS min_abs_d_eq_0
FROM indata
)
,
case1 AS (
SELECT
row_nr
, id
, d
, a
, 'no match clause' AS event_name -- these are based on the
, 0 AS pattern_id -- MATCH clause coming from
, 0 AS match_id -- the next CTE, "case2"
FROM min_abs_d_eq_0
WHERE min_abs_d_eq_0 AND d <= 0
)
,
case2 AS (
SELECT
row_nr
, id
, d
, a
, event_name()
, pattern_id()
, match_id()
FROM min_abs_d_eq_0
WHERE NOT min_abs_d_eq_0
MATCH (
PARTITION BY id ORDER BY row_nr
DEFINE
d_equal_a AS d = a
, anyrow AS true
PATTERN p AS (d_equal_a anyrow*)
)
)
SELECT * FROM case1
UNION ALL
SELECT * FROM case2
ORDER BY id,row_nr;
-- out row_nr | id | d | a | event_name | pattern_id | match_id
-- out --------+----+------+-----+-----------------+------------+----------
-- out 3 | 1 | 0 | 78 | no match clause | 0 | 0
-- out 4 | 1 | -67 | 78 | no match clause | 0 | 0
-- out 5 | 1 | -121 | 78 | no match clause | 0 | 0
-- out 4 | 2 | 148 | 148 | d_equal_a | 1 | 1
-- out 5 | 2 | -67 | 148 | anyrow | 1 | 2
-- out 6 | 2 | -121 | 148 | anyrow | 1 | 3
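For engines without Vertica's MATCH clause, the same two cases can probably be expressed with plain window functions. The following is a rough sketch, not the answer's method: it runs against SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later), and the names flagged, has_zero and first_eq_row are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE indata (row_nr INT, id INT, d INT, a INT)")
conn.executemany(
    "INSERT INTO indata VALUES (?, ?, ?, ?)",
    [
        (1, 1, 180, 78), (2, 1, 78, 78), (3, 1, 0, 78),
        (4, 1, -67, 78), (5, 1, -121, 78),
        (1, 2, 180, 148), (2, 2, 171, 148), (3, 2, 170, 148),
        (4, 2, 148, 148), (5, 2, -67, 148), (6, 2, -121, 148),
    ],
)

rows = conn.execute(
    """
    WITH flagged AS (
        SELECT row_nr, id, d, a,
               -- 1 if any row of this id has d = 0, else 0
               MIN(ABS(d)) OVER (PARTITION BY id) = 0 AS has_zero,
               -- first row number of this id where d = a (NULL if none)
               MIN(CASE WHEN d = a THEN row_nr END)
                   OVER (PARTITION BY id) AS first_eq_row
        FROM indata
    )
    SELECT row_nr, id, d, a
    FROM flagged
    WHERE (has_zero AND d <= 0)                        -- case 1
       OR (NOT has_zero AND row_nr >= first_eq_row)    -- case 2
    ORDER BY id, row_nr
    """
).fetchall()
for r in rows:
    print(r)
```

An id with neither a zero nor a row where d = a returns nothing here, because the comparison with a NULL first_eq_row is never true.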

SQL - For each ID, values in other columns should be repeated

The table I am trying to create should look like this
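| ID | Timeframe | Value |
|----|-----------|-------|
| 1  | 60        | 15    |
| 1  | 60        | 30    |
| 1  | 90        | 45    |
| 2  | 60        | 15    |
| 2  | 60        | 30    |
| 2  | 90        | 45    |
| 3  | 60        | 15    |
| 3  | 60        | 30    |
| 3  | 90        | 45    |
So for each ID, the timeframes 60, 60, 90 and the values 15, 30, 45 should be repeated.
Could anyone help me with the code? :)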
You are looking for a cross join. The basic idea is something like this:
select i.id, tv.timeframe, tv.value
from (values (1), (2), (3)) i(id) cross join
(values (60, 15), (60, 30), (90, 45)) tv(timeframe, value)
order by i.id, tv.value;
Not all databases support the values() table constructor. In those databases, you would need to use the appropriate syntax.
So you have this table: ...
id
1
2
3
and you have this table: ...
timeframe value
60 15
60 30
90 45
Then try this:
WITH
-- the ID table...
id(id) AS (
SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3
)
,
-- the values table:
vals(timeframe,value) AS (
SELECT 60,15
UNION ALL SELECT 60,30
UNION ALL SELECT 90,45
)
SELECT
id
, timeframe
, value
FROM id CROSS JOIN vals
ORDER BY id, timeframe;
-- out id | timeframe | value
-- out ----+-----------+-------
-- out 1 | 60 | 30
-- out 1 | 60 | 15
-- out 1 | 90 | 45
-- out 2 | 60 | 30
-- out 2 | 60 | 15
-- out 2 | 90 | 45
-- out 3 | 60 | 30
-- out 3 | 60 | 15
-- out 3 | 90 | 45
-- out (9 rows)
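In the output above, the rows with timeframe 60 come back as 30 before 15, because ORDER BY id, timeframe leaves ties between equal timeframes in an arbitrary order. A small sketch of the same cross join with value added as a tie-breaker, run against SQLite through Python's sqlite3 module; the table names ids and vals are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ids (id INT)")
conn.execute("CREATE TABLE vals (timeframe INT, value INT)")
conn.executemany("INSERT INTO ids VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO vals VALUES (?, ?)", [(60, 15), (60, 30), (90, 45)])

# CROSS JOIN pairs every id with every (timeframe, value) row: 3 x 3 = 9 rows.
# Sorting by value as well makes the order within equal timeframes deterministic.
rows = conn.execute(
    """
    SELECT i.id, v.timeframe, v.value
    FROM ids i CROSS JOIN vals v
    ORDER BY i.id, v.timeframe, v.value
    """
).fetchall()
for r in rows:
    print(r)
```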

Select Top 20 Distinct Rows in Each Category

I have a database table in the following format.
Product | Date | Score
A | 01/01/18 | 99
B | 01/01/18 | 98
C | 01/01/18 | 97
--------------------------
A | 02/01/18 | 99
B | 02/01/18 | 98
C | 02/01/18 | 97
--------------------------
D | 03/01/18 | 99
A | 03/01/18 | 98
B | 03/01/18 | 97
C | 03/01/18 | 96
I want to pick the first from every month such that there are no repeat products. For example, the output of the above table should be
Product | Date | Score
A | 01/01/18 | 99
B | 02/01/18 | 98
D | 03/01/18 | 99
How do I get this result with a single sql query? The actual table is much bigger than this and I want top 20 from every month without repetition.
This is a hard problem -- a type of subgraph problem that isn't really suitable to SQL. There is a brute force approach:
with jan as (
select *
from t
where date = '2018-01-01'
order by score desc
limit 1
),
feb as (
select *
from t
where date = '2018-02-01' and
product not in (select product from jan)
order by score desc
limit 1
),
mar as (
select *
from t
where date = '2018-03-01' and
product not in (select product from jan) and
product not in (select product from feb)
order by score desc
limit 1
)
select *
from jan
union all
select *
from feb
union all
select *
from mar;
You can generalize this with additional CTEs. But there is no guarantee that a month will have a product -- even when it could have had one.
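Because each month depends on what earlier months already picked, the brute-force idea is often easier to write procedurally. Below is a minimal greedy sketch in Python, assuming the rows have already been fetched as (product, month, score) tuples with ISO-formatted dates so that months sort correctly. The function name top_n_per_month is hypothetical, and like the CTE version it gives no guarantee that every month ends up with a full set of products.

```python
from collections import defaultdict


def top_n_per_month(rows, n):
    """Greedy pick: walk months in order, take the n best-scoring
    products of each month that no earlier month has used."""
    by_month = defaultdict(list)
    for product, month, score in rows:
        by_month[month].append((product, score))

    used = set()    # products already picked in an earlier month
    picked = []
    for month in sorted(by_month):
        count = 0
        # Highest score first within the month.
        for product, score in sorted(by_month[month], key=lambda ps: -ps[1]):
            if count == n:
                break
            if product in used:
                continue
            used.add(product)
            picked.append((product, month, score))
            count += 1
    return picked


sample = [
    ("A", "2018-01-01", 99), ("B", "2018-01-01", 98), ("C", "2018-01-01", 97),
    ("A", "2018-02-01", 99), ("B", "2018-02-01", 98), ("C", "2018-02-01", 97),
    ("D", "2018-03-01", 99), ("A", "2018-03-01", 98), ("B", "2018-03-01", 97),
    ("C", "2018-03-01", 96),
]
result = top_n_per_month(sample, 1)
print(result)
```

With n = 1 this reproduces the three rows the question expects; passing n = 20 gives the top 20 per month on the real table.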
It is possible by using row_number.
select * from (
select row_Number() over(partition by Product order by Product ) as rno,* from
Products
) as t where t.rno<=20
I think you want the top 20 records for every month without repeating products, in which case the solution below should work.
select *
into #temp
from
(values
('A','01/01/18','99')
,('B','01/01/18','98')
,('C','01/01/18','97')
,('A','02/01/18','99')
,('B','02/01/18','98')
,('C','02/01/18','97')
,('D','03/01/18','99')
,('A','03/01/18','98')
,('B','03/01/18','97')
,('C','03/01/18','96')
) AS VTE (Product ,Date, Score )
select * from
(
select * , ROW_NUMBER() over (partition by date,product order by score ) as rn
from #TEMP
)
A where rn <= 20

Sql Query Compare and Sum

I have this problem: I need to check whether the sum of a column matches the final total of the invoice, by invoice number (I am working on a query to do it).
Example
| Invoice No | Line _no | Total Line | Invoice total | Field I will create |
|------------|----------|------------|---------------|---------------------|
| 45         | 1        | 145        | 300           | 145                 |
| 45         | 2        | 165        | 300           | 300 Match           |
| 46         | 1        | 200        | 200           | 200 Match           |
| 47         | 1        | 100        | 300           | 100                 |
| 47         | 2        | 100        | 300           | 200                 |
| 47         | 3        | 100        | 300           | 300 Match           |
You want a cumulative sum. In SQL Server 2012+, just do:
select e.*,
(case when InvoiceTotal = sum(TotalLine) over (partition by invoice_no order by line_no)
then 'Match'
end)
from example e;
In earlier versions of SQL Server, I would be inclined to do it with a correlated subquery:
select e.*,
(case when InvoiceTotal = (select sum(TotalLine)
from example e2
where e2.Invoice_no = e.invoice_no and
e2.line_no <= e.line_no
)
then 'Match'
end)
from example e;
You can also do this with a cross apply as M Ali suggests.
EDIT:
Now that I think about the problem, you don't need a cumulative sum. That was just how I originally thought of the problem. So, this will work in SQL Server 2008:
select e.*,
(case when InvoiceTotal = sum(TotalLine) over (partition by invoice_no)
then 'Match'
end)
from example e;
You can't get the cumulative sum out (the second to last column) without more manipulation, but the match column is not hard.
SQL Fiddle
MS SQL Server 2008 Schema Setup:
CREATE TABLE TEST(InvoiceNo INT, Line_no INT, TotalLine INT, InvoiceTotal INT)
INSERT INTO TEST VALUES
(45 ,1 ,145 ,300),
(45 ,2 ,165 ,300),
(46 ,1 ,200 ,200),
(47 ,1 ,100 ,300),
(47 ,2 ,100 ,300),
(47 ,3 ,100 ,300)
Query 1:
SELECT t.[InvoiceNo]
,t.[Line_no]
,t.[TotalLine]
,t.[InvoiceTotal]
,C.Grand_Total
,CASE WHEN C.Grand_Total = t.[InvoiceTotal]
THEN 'Match' ELSE '' END AS [Matched]
FROM TEST t
CROSS APPLY (SELECT SUM([TotalLine]) AS Grand_Total
FROM TEST
WHERE [InvoiceNo] = t.[InvoiceNo]
AND [Line_no] <= t.[Line_no]) C
Results:
| INVOICENO | LINE_NO | TOTALLINE | INVOICETOTAL | GRAND_TOTAL | MATCHED |
|-----------|---------|-----------|--------------|-------------|---------|
| 45 | 1 | 145 | 300 | 145 | |
| 45 | 2 | 165 | 300 | 310 | |
| 46 | 1 | 200 | 200 | 200 | Match |
| 47 | 1 | 100 | 300 | 100 | |
| 47 | 2 | 100 | 300 | 200 | |
| 47 | 3 | 100 | 300 | 300 | Match |
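On engines with windowed running totals, the cumulative sum and the match flag can be produced in one pass. A minimal sketch using the fiddle's schema and data, run against SQLite (3.25 or later) through Python's sqlite3 module rather than SQL Server; the output column names running_total and matched are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (InvoiceNo INT, Line_no INT, TotalLine INT, InvoiceTotal INT)"
)
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?)",
    [
        (45, 1, 145, 300), (45, 2, 165, 300),
        (46, 1, 200, 200),
        (47, 1, 100, 300), (47, 2, 100, 300), (47, 3, 100, 300),
    ],
)

# SUM ... OVER (PARTITION BY ... ORDER BY ...) is a running total per invoice:
# each row sees the sum of TotalLine for its own line and all earlier lines.
rows = conn.execute(
    """
    SELECT InvoiceNo, Line_no, TotalLine, InvoiceTotal,
           SUM(TotalLine) OVER (PARTITION BY InvoiceNo ORDER BY Line_no) AS running_total,
           CASE WHEN SUM(TotalLine) OVER (PARTITION BY InvoiceNo ORDER BY Line_no)
                     = InvoiceTotal
                THEN 'Match' ELSE '' END AS matched
    FROM test
    ORDER BY InvoiceNo, Line_no
    """
).fetchall()
for r in rows:
    print(r)
```

Invoice 45 does not match with this data, because its lines add up to 310 rather than 300, which agrees with the fiddle results.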
Is this what you're looking for? I think a subquery is what you're asking about, but I'm guessing at the end result you want.
select t."Invoice No", t."Line _no", t."Invoice total",
calcTotals.lineTotal as calcSum, case when t."Invoice total" = calcTotals.lineTotal then 'matched' else 'not matched' end
from [table] t
inner join (
select "Invoice No" as invoiceNumber,
sum("Total Line") as lineTotal
from [table]
group by "Invoice No"
) calcTotals on t."Invoice No" = calcTotals.invoiceNumber

Group Date column based on hours

I have a table in an SQLite database where I store data about call logs. As an example, assume that my table looks like this:
| Calls_count | Calls_duration | Time_slice | Time_stamp |
| 10 | 500 | 21 | 1399369269 |
| 2 | 88 | 22 | 1399383668 |
Here:
Calls_count is the number of calls made since the last observation
Calls_duration is the duration of calls in ms since the last observation
Time_slice represents a portion of the week. Every day is divided into 4 portions of 6 hours each, such that
|     | 06:00-11:59 | 12:00-17:59 | 18:00-23:59 | 00:00-05:59 |
|-----|-------------|-------------|-------------|-------------|
| Mon | 11          | 12          | 13          | 14          |
| Tue | 21          | 22          | 23          | 24          |
| Wed | 31          | 32          | 33          | 34          |
| Thu | 41          | 42          | 43          | 44          |
| Fri | 51          | 52          | 53          | 54          |
| Sat | 61          | 62          | 63          | 64          |
| Sun | 71          | 72          | 73          | 74          |
Time_stamp is the Unix epoch time when the observation was made / the record was inserted into the database
Now I want to create a query so that, if I specify the time_stamp for the start and the end of a week, the result is 168 rows of data giving me the sum of calls grouped by hour, i.e. 24 rows for each day of the week. This is an hourly breakdown of calls in a week.
SUM_CALLS | Time_Slice | Hour_of_Week |
10 | 11 | 1 |
0 | 11 | 2 |
....
7 | 74 | 167 |
4 | 74 | 168 |
In the above example of intended result,
1st row is Monday 06:00-06:59
2nd row is Monday 07:00-07:59
Last row is Sunday 05:00-05:59
Since version 3.8.3, SQLite supports common table expressions, and this is a possible solution:
WITH RECURSIVE
hours(x,y) AS (SELECT CAST(STRFTIME('%s',STRFTIME('%Y-%m-%d %H:00:00', '2014-05-05 00:00:00')) AS INTEGER),
CAST(STRFTIME('%s',STRFTIME('%Y-%m-%d %H:59:59', '2014-05-05 00:00:00')) AS INTEGER)
UNION ALL
SELECT x+3600,y+3600 FROM hours LIMIT 168)
SELECT
COALESCE(SUM(Calls_count),0) AS SUM_CALLS,
CASE CAST(STRFTIME('%w',x,'unixepoch') AS INTEGER)
WHEN 0 THEN 7 ELSE STRFTIME('%w',x,'unixepoch') END
||
CASE
WHEN STRFTIME('%H:%M:%S',x,'unixepoch') BETWEEN '06:00:00' AND '11:59:59' THEN 1
WHEN STRFTIME('%H:%M:%S',x,'unixepoch') BETWEEN '12:00:00' AND '17:59:59' THEN 2
WHEN STRFTIME('%H:%M:%S',x,'unixepoch') BETWEEN '18:00:00' AND '23:59:59' THEN 3
WHEN STRFTIME('%H:%M:%S',x,'unixepoch') BETWEEN '00:00:00' AND '05:59:59' THEN 4
END AS Time_Slice,
((x-(SELECT MIN(x) FROM hours))/3600)+1 AS Hour_of_Week
FROM hours LEFT JOIN call_logs
ON call_logs.time_stamp >= hours.x AND call_logs.time_stamp <= hours.y
GROUP BY Hour_of_Week
ORDER BY Hour_of_Week
;
This was tested with SQLite version 3.7.13, without CTEs:
DROP VIEW IF EXISTS digit;
CREATE TEMPORARY VIEW digit AS SELECT 0 AS d UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION
SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
;
DROP VIEW IF EXISTS hours;
CREATE TEMPORARY VIEW hours AS SELECT STRFTIME('%s','2014-05-05 00:00:00') + s AS x,
STRFTIME('%s','2014-05-05 00:00:00') + s+3599 AS y
FROM (SELECT (a.d || b.d || c.d) * 3600 AS s FROM digit a, digit b, digit c LIMIT 168)
;
SELECT
COALESCE(SUM(Calls_count),0) AS SUM_CALLS,
CASE CAST(STRFTIME('%w',x,'unixepoch') AS INTEGER)
WHEN 0 THEN 7 ELSE STRFTIME('%w',x,'unixepoch') END
||
CASE
WHEN STRFTIME('%H:%M:%S',x,'unixepoch') BETWEEN '06:00:00' AND '11:59:59' THEN 1
WHEN STRFTIME('%H:%M:%S',x,'unixepoch') BETWEEN '12:00:00' AND '17:59:59' THEN 2
WHEN STRFTIME('%H:%M:%S',x,'unixepoch') BETWEEN '18:00:00' AND '23:59:59' THEN 3
WHEN STRFTIME('%H:%M:%S',x,'unixepoch') BETWEEN '00:00:00' AND '05:59:59' THEN 4
END AS Time_Slice,
((x-(SELECT MIN(x) FROM hours))/3600)+1 AS Hour_of_Week
FROM hours LEFT JOIN call_logs
ON call_logs.time_stamp >= hours.x AND call_logs.time_stamp <= hours.y
GROUP BY Hour_of_Week
ORDER BY Hour_of_Week
;
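To check the hourly bucketing on the two sample rows from the question, here is a reduced sketch of the recursive-CTE approach, run through Python's sqlite3 module. It leaves out the Time_Slice column and only produces the 168 hourly sums. As in the answers above, the week is assumed to start at 2014-05-05 00:00:00 UTC; the CTE column names n and x and the half-open interval test are choices made for this illustration.

```python
import sqlite3
from datetime import datetime, timezone

# Assumed week start: Monday 2014-05-05 00:00:00 UTC, as in the answers above.
WEEK_START = int(datetime(2014, 5, 5, tzinfo=timezone.utc).timestamp())

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE call_logs (Calls_count INT, Calls_duration INT, "
    "Time_slice INT, Time_stamp INT)"
)
conn.executemany(
    "INSERT INTO call_logs VALUES (?, ?, ?, ?)",
    [(10, 500, 21, 1399369269), (2, 88, 22, 1399383668)],
)

# hours(n, x): n is the hour of the week (1..168), x the epoch second
# at which that hour starts. The LEFT JOIN keeps hours without calls.
rows = conn.execute(
    """
    WITH RECURSIVE hours(n, x) AS (
        SELECT 1, :start
        UNION ALL
        SELECT n + 1, x + 3600 FROM hours WHERE n < 168
    )
    SELECT COALESCE(SUM(c.Calls_count), 0) AS sum_calls,
           h.n AS hour_of_week
    FROM hours h
    LEFT JOIN call_logs c
           ON c.Time_stamp >= h.x AND c.Time_stamp < h.x + 3600
    GROUP BY h.n
    ORDER BY h.n
    """,
    {"start": WEEK_START},
).fetchall()

busy = [r for r in rows if r[0] > 0]
print(len(rows), busy)
```

The two sample observations fall on Tuesday 09:41 and 13:41 UTC, so they land in hours 34 and 38 of a week counted from Monday midnight, which is consistent with their stored Time_slice values of 21 and 22.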