Efficient query for average count in SQL - sql

I have a table:
id age tag
1 26 green
1 26 blue
2 28 yellow
3 45 red
4 40 blue
4 40 green
5 50 red
I need to find both the average and standard deviation of the number of distinct tags per age group (I have coded this with CASE WHEN).
So to get for example:
average_age average_tags sd_tags age_group
26.7 1.5 0.1 25-34
40 2 0 35-44
45 1 0.01 44+
Is there a way to more efficiently calculate the average count of tags and also to avoid repeating the CASE code in the sub-query for efficiency / readability?
SELECT
AVG(age) as average_age,
AVG(countField) as average_spread,
STDDEV(countField) as sd_spread,
CASE
WHEN v.age BETWEEN 25 AND 34 THEN '25-34'
WHEN v.age BETWEEN 35 AND 44 THEN '35-44'
ELSE '44+'
END AS age_group,
FROM vid v
(
SELECT AVG(v.age),
COUNT(DISTINCT(p.tag)) AS countField,
CASE
WHEN v.age BETWEEN 25 AND 34 THEN '25-34'
WHEN v.age BETWEEN 35 AND 44 THEN '35-44'
ELSE '44+'
END AS age_group,
FROM vid v
GROUP BY age_group
) as q WHERE age IS NOT NULL
GROUP BY age_group ORDER BY age_group;

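You can use this query. The CASE expression is written only once, in the subquery, and the outer query groups by its alias. The subquery counts distinct tags per id, so that two ids with the same age are not merged:
select AVG(T.Age) as average_age, AVG(countField) as average_spread, STDDEV(countField) as sd_spread, age_group
from (select v.id, v.Age, COUNT(DISTINCT v.Tag) as countField, CASE
WHEN v.age BETWEEN 25 AND 34 THEN '25-34'
WHEN v.age BETWEEN 35 AND 44 THEN '35-44'
ELSE '44+' END AS age_group
from VID v
group by v.id, v.Age) T
WHERE age IS NOT NULL
GROUP BY age_group ORDER BY age_group;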
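To check the single-CASE approach against the sample table, here is a minimal sketch that runs it in SQLite through Python's sqlite3 module. It rests on a few assumptions: SQLite has no built-in STDDEV, so the PopulationStdDev class is a hypothetical stand-in that computes the population standard deviation (your database's STDDEV may use the sample form instead), and the subquery alias per_id is made up for the example.

```python
import math
import sqlite3


class PopulationStdDev:
    """Hypothetical STDDEV aggregate (population form); SQLite has none built in."""

    def __init__(self):
        self.values = []

    def step(self, value):
        if value is not None:
            self.values.append(value)

    def finalize(self):
        if not self.values:
            return None
        mean = sum(self.values) / len(self.values)
        return math.sqrt(sum((x - mean) ** 2 for x in self.values) / len(self.values))


# Sample rows from the question: (id, age, tag).
rows = [
    (1, 26, "green"), (1, 26, "blue"), (2, 28, "yellow"), (3, 45, "red"),
    (4, 40, "blue"), (4, 40, "green"), (5, 50, "red"),
]

conn = sqlite3.connect(":memory:")
conn.create_aggregate("STDDEV", 1, PopulationStdDev)
conn.execute("CREATE TABLE vid (id INTEGER, age INTEGER, tag TEXT)")
conn.executemany("INSERT INTO vid VALUES (?, ?, ?)", rows)

# The CASE appears once, in the per-id subquery; the outer query reuses its alias.
query = """
SELECT AVG(age) AS average_age,
       AVG(countField) AS average_tags,
       STDDEV(countField) AS sd_tags,
       age_group
FROM (
    SELECT id, age,
           COUNT(DISTINCT tag) AS countField,
           CASE
               WHEN age BETWEEN 25 AND 34 THEN '25-34'
               WHEN age BETWEEN 35 AND 44 THEN '35-44'
               ELSE '44+'
           END AS age_group
    FROM vid
    WHERE age IS NOT NULL
    GROUP BY id, age
) AS per_id
GROUP BY age_group
ORDER BY age_group
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

On the sample data the 25-34 group averages 1.5 tags (ids 1 and 2 have 2 and 1 distinct tags), which matches the average_tags value the question expects for that group.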

Related

Hive Summing up data in the table based on the date range

I have a table with the following schema and data:
ID HITS MISS DDATE
1 10 3 20180101
1 33 21 20180122
1 84 11 20180901
1 11 2 20180405
1 54 23 20190203
1 33 43 20190102
4 54 22 20170305
4 56 88 20180115
5 87 22 20180809
5 66 48 20180617
5 91 53 20170606
DataTypes:
ID INT
HITS INT
MISS INT
DDATE STRING
The requirement is to calculate the totals of HITS and MISS on a yearly basis, i.e. for 2017, 2018, 2019, ...
I have written the following query:
SELECT ID,
SUM(HITS) AS HITS,SUM(MISS) AS MISS,
CASE
WHEN DDATE BETWEEN '201701' AND '201712' THEN '2017' ELSE
'NOTHING' END AS TTL_YR17_DATA
CASE
WHEN DDATE BETWEEN '201801' AND '201812' THEN '2018' ELSE
'NOTHING' END AS TTL_YR18_DATA
CASE
WHEN DDATE BETWEEN '201901' AND '201912' THEN '2019' ELSE
'NOTHING' END AS TTL_YR19_DATA
FROM
HST_TABLE
WHERE
DDATE BETWEEN '201801' AND '201812'
GROUP BY
ID,DDATE;
But the query does not return the expected result.
Actual O/P:
1 10 3 2018
1 33 21 2018
1 84 11 2018
1 11 2 2018
1 54 23 2019
1 33 43 2019
4 54 22 2017
4 56 88 2018
5 87 22 2018
5 66 48 2018
5 91 53 2017
Expected O/P:
1 138 37 2018
4 56 88 2018
5 153 70 2018
1 87 66 2019
5 91 53 2017
Another related question:
Is there a way that I can avoid passing the DDATE range in the query? As this should be given by the user and shouldn't be hardcoded.
Any help/advice to achieve the above two requirements will be really helpful.
OK, it's easy to implement this with the substring function in Hive, as below:
select
substring(dddate,0,4) as the_year,
id,
sum(hits) as hits_num,
sum(miss) as miss_num
from
hst_table
group by
substring(dddate,0,4),
id
order by
the_year,
id
The answer above by @Shawn.X has the right approach, but the column name is misspelled (dddate instead of ddate). Below is the corrected query:
select
substring(ddate,0,4) as the_year,
id,
sum(hits) as hits_num,
sum(miss) as miss_num
from
hst_table
group by
substring(ddate,0,4),
id
order by
the_year,
id;
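The question also asks how to avoid hardcoding the date range. One possible approach is to pass the year as a query parameter. The sketch below is only an illustration of that idea: it uses SQLite through Python's sqlite3 module rather than Hive, so substr is 1-based here, and the function name yearly_totals and the :year parameter are hypothetical.

```python
import sqlite3

# Sample rows from the question: (ID, HITS, MISS, DDATE as a yyyymmdd string).
rows = [
    (1, 10, 3, "20180101"), (1, 33, 21, "20180122"), (1, 84, 11, "20180901"),
    (1, 11, 2, "20180405"), (1, 54, 23, "20190203"), (1, 33, 43, "20190102"),
    (4, 54, 22, "20170305"), (4, 56, 88, "20180115"), (5, 87, 22, "20180809"),
    (5, 66, 48, "20180617"), (5, 91, 53, "20170606"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hst_table (id INTEGER, hits INTEGER, miss INTEGER, ddate TEXT)"
)
conn.executemany("INSERT INTO hst_table VALUES (?, ?, ?, ?)", rows)


def yearly_totals(conn, year=None):
    """Sum hits and misses per (year, id); restrict to one year if given."""
    sql = """
        SELECT substr(ddate, 1, 4) AS the_year,
               id,
               SUM(hits) AS hits_num,
               SUM(miss) AS miss_num
        FROM hst_table
        WHERE (:year IS NULL OR substr(ddate, 1, 4) = :year)
        GROUP BY substr(ddate, 1, 4), id
        ORDER BY the_year, id
    """
    return conn.execute(sql, {"year": year}).fetchall()


all_years = yearly_totals(conn)          # every year found in the data
only_2018 = yearly_totals(conn, "2018")  # year supplied by the caller
print(all_years)
print(only_2018)
```

Because the year is taken from the data itself, nothing needs to be hardcoded for the full report, and the optional parameter covers the case where the user picks a single year.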

Aggregate result from query by quarter SQL

Let's say I have a table in a Microsoft SQL Server database that holds all exports going back some time:
Name:
ExportTable
Columns:
id - numeric(18)
exportdate - datetime
In order to get the number of exports per week I can run the following query:
SELECT DATEPART(ISO_WEEK,[exportdate]) as 'exportdate', count(exportdate) as 'totalExports'
FROM [ExportTable]
Group By DATEPART(ISO_WEEK,[exportdate])
order by exportdate;
Returns:
exportdate totalExports
---------- ------------
27 13
28 12
29 15
30 8
31 17
32 10
33 7
34 15
35 4
36 18
37 10
38 14
39 14
40 21
41 19
Would it be possible to aggregate the week results by quarter so the output becomes something like the table below?
UPDATE
Sorry for not being crystal clear. I would like each week's result to be added to the previous results, with the running total starting over at each new quarter.
Note that week 41 contains 21+19 = 40.
Week 39 contains 157 (13+12+15+8+17+10+7+15+4+18+10+14+14).
exportdate totalExports Quarter
---------- ------------ -------
27 13 3
28 25 3
29 40 3
30 48 3
31 65 3
32 75 3
33 82 3
34 97 3
35 101 3
36 119 3
37 129 3
38 143 3
39 157 3 -- Sum of 3 Quarter values.
40 21 4 -- New Quarter show current week value
41 40 4 -- (21+19)
You can use this.
SELECT
DATEPART(ISO_WEEK,[exportdate]) as 'exportdate'
, SUM( count(exportdate) ) OVER ( PARTITION BY DATEPART(QUARTER,MIN([exportdate])) ORDER BY DATEPART(ISO_WEEK,[exportdate]) ROWS UNBOUNDED PRECEDING ) as 'totalExports'
, DATEPART(QUARTER,MIN([exportdate])) [Quarter]
FROM [ExportTable]
Group By DATEPART(ISO_WEEK,[exportdate])
order by exportdate;
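The running total that restarts each quarter can be tried out on the weekly counts from the question. This is a minimal sketch, not the SQL Server query itself: it uses SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later), starts from already aggregated weekly counts because SQLite has no DATEPART, and assigns weeks 27-39 to quarter 3 and weeks 40-41 to quarter 4 as in the expected output. The table name weekly_exports and its columns are hypothetical.

```python
import sqlite3

# Weekly counts from the question: (ISO week, number of exports).
weekly = [
    (27, 13), (28, 12), (29, 15), (30, 8), (31, 17), (32, 10), (33, 7), (34, 15),
    (35, 4), (36, 18), (37, 10), (38, 14), (39, 14), (40, 21), (41, 19),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weekly_exports (iso_week INTEGER, quarter INTEGER, exports INTEGER)"
)
# Assumption taken from the expected output: weeks up to 39 are Q3, later weeks Q4.
conn.executemany(
    "INSERT INTO weekly_exports VALUES (?, ?, ?)",
    [(week, 3 if week <= 39 else 4, count) for week, count in weekly],
)

# PARTITION BY quarter restarts the sum; the ROWS frame makes it cumulative.
running = conn.execute(
    """
    SELECT iso_week,
           SUM(exports) OVER (
               PARTITION BY quarter
               ORDER BY iso_week
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS total_exports,
           quarter
    FROM weekly_exports
    ORDER BY iso_week
    """
).fetchall()
for row in running:
    print(row)
```

The partition is what makes week 40 start again at 21 instead of continuing from 157.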
You could use a CASE expression to separate the dates into quarters, e.g.
CASE
WHEN DATEPART(month, [exportdate]) BETWEEN 1 AND 3 THEN 1
WHEN DATEPART(month, [exportdate]) BETWEEN 4 AND 6 THEN 2
WHEN DATEPART(month, [exportdate]) BETWEEN 7 AND 9 THEN 3
ELSE 4
END AS [Quarter]
It's just an example, but you get the idea.
In SQL Server, DATEPART(quarter, ...) returns the quarter directly, so you can select it and group by it:
SELECT DATEPART(ISO_WEEK,[exportdate]) as 'exportdate', count(exportdate) as 'totalExports', DATEPART(quarter,[exportdate]) as quarter
FROM [ExportTable]
Group By DATEPART(ISO_WEEK,[exportdate]), DATEPART(quarter,[exportdate])
order by exportdate;

T-SQL Group by day date but i want show query full date

I am grouping by day, but I want the result to show the full date, and I cannot get that to work.
My query:
SELECT DAY(T1.UI_CreateDate) AS DATEDAY, SUM(1) AS TOTALCOUNT
FROM mydb.dbo.LP_UseImpression T1 WHERE T1.UI_BR_BO_ID = 45
GROUP BY DAY(T1.UI_CreateDate)
Result:
DATEDAY TOTALCOUNT
----------- -----------
15 186
9 1
3 2
26 481
21 297
27 342
18 18
30 14
4 183
25 553
13 8
22 469
16 1
17 28
20 331
28 90
14 33
8 1
But I want to show the full date.
Example result:
DATEDAY TOTALCOUNT
----------- -----------
15/06/2015 186
9/06/2015 1
3/06/2015 2
26/06/2015 481
21/06/2015 297
27/06/2015 342
18/06/2015 18
30/06/2015 14
4/06/2015 183
25/06/2015 553
13/06/2015 8
22/06/2015 469
16/06/2015 1
17/06/2015 28
20/06/2015 331
28/06/2015 90
14/06/2015 33
8/06/2015 1
I have not been able to get this kind of result.
How can I do it?
Thanks!
How about just casting to date to remove any time component:
SELECT CAST(T1.UI_CreateDate as DATE) AS DATEDAY, COUNT(*) AS TOTALCOUNT
FROM mydb.dbo.LP_UseImpression T1
WHERE T1.UI_BR_BO_ID = 45
GROUP BY CAST(T1.UI_CreateDate as DATE)
ORDER BY DATEDAY;
SUM(1) does work for calculating the count. However, since SQL has the COUNT(*) function, it seems a bit awkward.
You can group by DAY(T1.UI_CreateDate) or by the full date, but these are different: the dates '2015-04-15' and '2015-12-15' both give the same DAY value of 15.
Assuming you want to group on the day of the month rather than the full date, please try the version below:
SELECT DISTINCT
T1.UI_CreateDate as DATEDAY,
count(1) over (PARTITION BY DAY(T1.UI_CreateDate) ) AS TOTALCOUNT
FROM mydb.dbo.LP_UseImpression T1 WHERE T1.UI_BR_BO_ID = 45
sql fiddle for demo: http://sqlfiddle.com/#!6/c3337/1
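The difference between grouping by day of the month and grouping by the full date is easy to see on a few rows. This is a minimal sketch using SQLite through Python's sqlite3 module, not T-SQL: strftime('%d', ...) stands in for DAY(...) and date(...) stands in for CAST(... AS DATE). The impressions table and its timestamps are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE impressions (created TEXT)")

# Hypothetical timestamps: two on 15 April, one on 15 December, one on 16 December.
stamps = [
    "2015-04-15 08:00:00",
    "2015-04-15 17:30:00",
    "2015-12-15 09:15:00",
    "2015-12-16 10:00:00",
]
conn.executemany("INSERT INTO impressions VALUES (?)", [(s,) for s in stamps])

# Grouping by day of the month merges 15 April and 15 December.
by_day_of_month = conn.execute(
    """
    SELECT CAST(strftime('%d', created) AS INTEGER) AS d, COUNT(*)
    FROM impressions
    GROUP BY d
    ORDER BY d
    """
).fetchall()

# Grouping by the date part keeps each calendar day separate.
by_full_date = conn.execute(
    """
    SELECT date(created) AS d, COUNT(*)
    FROM impressions
    GROUP BY d
    ORDER BY d
    """
).fetchall()

print(by_day_of_month)
print(by_full_date)
```

If the data only ever covers one month, both groupings give the same counts; the full-date version is the one that stays correct once more months arrive.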

Oracle SQL weeks sales SUM

I have the sales data in terms of week:
ITEM LOC WEEK SALES
111 39 16/05/2015 10
222 39 16/05/2015 23
111 39 09/05/2015 13
222 39 09/05/2015 33
I want the sum of SALES column for the last 4 weeks.
So it comes like:
ITEM LOC 4-WEEKS-SALES
111 39 23
222 39 56
Just filter for the last four weeks and aggregate:
select ITEM, LOC,sum(SALES)
from theTable
where WEEK > SYSDATE - ( 7 * 4 )
group by ITEM,LOC
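Try this (SQL Server syntax; Oracle has no DATEPART or GETDATE). The week is left out of the GROUP BY so that the four weeks are summed into one row per item and location:
select ITEM,LOC,sum(SALES) as [4-WEEKS-SALES]
from tablename
where Datepart(wk, WEEK)>=(Datepart(wk, Getdate())-4)
Group by ITEM,LOC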
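The filter-then-aggregate idea can be sketched as follows. This is an illustration in SQLite through Python's sqlite3 module, not Oracle SQL: dates are stored as ISO strings, date(:today, '-28 days') plays the role of SYSDATE - (7 * 4), and the reference date is passed in so the result is reproducible. The function name last_four_weeks, the reference date 2015-05-20, and the extra row from March are all hypothetical additions.

```python
import sqlite3

# Rows from the question, with WEEK rewritten as ISO dates, plus one
# hypothetical older row that should fall outside the four-week window.
rows = [
    (111, 39, "2015-05-16", 10),
    (222, 39, "2015-05-16", 23),
    (111, 39, "2015-05-09", 13),
    (222, 39, "2015-05-09", 33),
    (111, 39, "2015-03-07", 99),  # hypothetical, not in the question
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item INTEGER, loc INTEGER, week TEXT, sales INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)


def last_four_weeks(conn, today):
    """Sum sales per item and location for the 28 days up to `today`."""
    sql = """
        SELECT item, loc, SUM(sales) AS four_week_sales
        FROM sales
        WHERE week > date(:today, '-28 days') AND week <= :today
        GROUP BY item, loc
        ORDER BY item, loc
    """
    return conn.execute(sql, {"today": today}).fetchall()


totals = last_four_weeks(conn, "2015-05-20")  # assumed reference date
print(totals)
```

Filtering on the date itself, rather than on a week number, also keeps working across a year boundary.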

how to get the unique records with min and max for each user

I have the following table:
id gender age highest weight lowest weight abc
a f 30 90 70 1.3
a f 30 90 65 null
a f 30 null null 1.3
b m 40 100 86 2.5
b m 40 null 80 2.5
c f 50 105 95 6.4
I need this result in sql server. What I need is the minimum of the weight and maximum of the weight and one record per user.
id gender age highest weight lowest weight abc
a f 30 90 65 1.3
b m 40 100 80 2.5
c f 50 105 95 6.4
Just do a grouping:
select id,
max(gender),
max(age),
max([highest weight]),
min([lowest weight]),
max(abc)
from SomeTable
group by id
You can do this using grouping:
select id, gender, max([highest weight]), min([lowest weight]) from student
group by id, gender
But you need to define the rule for the other fields whose values vary, like abc.
Can you post more information?
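The grouping approach can be checked against the sample rows. This is a minimal sketch using SQLite through Python's sqlite3 module rather than SQL Server; the table name some_table and the underscore column names are assumptions made for the example. MAX and MIN skip NULLs, which is why the rows with missing weights do not disturb the result.

```python
import sqlite3

# Rows from the question; None stands for null.
rows = [
    ("a", "f", 30, 90, 70, 1.3),
    ("a", "f", 30, 90, 65, None),
    ("a", "f", 30, None, None, 1.3),
    ("b", "m", 40, 100, 86, 2.5),
    ("b", "m", 40, None, 80, 2.5),
    ("c", "f", 50, 105, 95, 6.4),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE some_table (
        id TEXT, gender TEXT, age INTEGER,
        highest_weight INTEGER, lowest_weight INTEGER, abc REAL
    )
    """
)
conn.executemany("INSERT INTO some_table VALUES (?, ?, ?, ?, ?, ?)", rows)

# One row per id: aggregates ignore NULLs, so gaps in a column are harmless.
per_user = conn.execute(
    """
    SELECT id,
           MAX(gender),
           MAX(age),
           MAX(highest_weight),
           MIN(lowest_weight),
           MAX(abc)
    FROM some_table
    GROUP BY id
    ORDER BY id
    """
).fetchall()
for row in per_user:
    print(row)
```

Using MAX on gender, age and abc only works here because each id has a single non-null value for them; if they could differ within an id, you would need to decide which value to keep.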