Counts on changed data - SQL

I have data where I need to count, per year, the rows for which the confidence code changed.
Input data:
ID Date ReMatchConfidence OrgMatch
1 2017 101 45
2 2017 101 88
3 2017 103 35
4 2016 104 66
5 2016 104 66
6 2017 104 66
7 2016 88 14
8 2017 88 25
Output:
Data 2017 2016
Change from 45 to 101 1 0
Change from 88 to 101 1 0
Change from 35 to 103 1 0
Change from 66 to 104 1 2
Change from 14 to 88 0 1
Change from 25 to 88 1 0

Try this (MySQL syntax; IF() is a MySQL function):
SELECT CONCAT('Change from ', OrgMatch, ' to ', ReMatchConfidence) AS Data,
Count(IF(Date = '2017', 1, NULL)) as '2017',
Count(IF(Date = '2016', 1, NULL)) as '2016'
FROM tables GROUP BY OrgMatch, ReMatchConfidence;

A small modification of Daria's query that works in SQL Server (2012 or later, since it uses CONCAT): MySQL's IF() is replaced by a standard CASE expression, and the year aliases are bracket-quoted.
SELECT CONCAT('Change from ', OrgMatch, ' to ', ReMatchConfidence) AS Data,
Count(CASE WHEN [Date] = '2017' THEN 1 END) AS [2017],
Count(CASE WHEN [Date] = '2016' THEN 1 END) AS [2016]
FROM t1 GROUP BY OrgMatch, ReMatchConfidence;
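As a quick check of the conditional-aggregation idea, here is a minimal sketch that runs it against the question's sample rows using Python's built-in sqlite3 module. SQLite is only a stand-in chosen because it runs anywhere: it concatenates with || instead of CONCAT() and has no IF(), so the COUNT(CASE ...) form is used. The table name t, the function name change_counts and the y2017/y2016 aliases are hypothetical.

```python
import sqlite3

# Sample rows copied from the question: (ID, Date, ReMatchConfidence, OrgMatch).
ROWS = [
    (1, 2017, 101, 45), (2, 2017, 101, 88), (3, 2017, 103, 35),
    (4, 2016, 104, 66), (5, 2016, 104, 66), (6, 2017, 104, 66),
    (7, 2016, 88, 14), (8, 2017, 88, 25),
]


def change_counts(rows):
    """Count rows per OrgMatch -> ReMatchConfidence change, split by year."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (ID INTEGER, Date INTEGER, "
                "ReMatchConfidence INTEGER, OrgMatch INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    # Conditional aggregation: a CASE without ELSE yields NULL for other years,
    # and COUNT() skips NULLs, so each column counts only its own year.
    sql = """
        SELECT 'Change from ' || OrgMatch || ' to ' || ReMatchConfidence AS Data,
               COUNT(CASE WHEN Date = 2017 THEN 1 END) AS y2017,
               COUNT(CASE WHEN Date = 2016 THEN 1 END) AS y2016
        FROM t
        GROUP BY OrgMatch, ReMatchConfidence
        ORDER BY ReMatchConfidence, OrgMatch
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result


for row in change_counts(ROWS):
    print(row)
```

The ELSE NULL branch in the SQL Server query is optional for the same reason: a CASE with no matching branch already returns NULL.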


Hive: summing up data in a table based on a date range

I have a table with the following schema, and the data in it looks like this:
ID HITS MISS DDATE
1 10 3 20180101
1 33 21 20180122
1 84 11 20180901
1 11 2 20180405
1 54 23 20190203
1 33 43 20190102
4 54 22 20170305
4 56 88 20180115
5 87 22 20180809
5 66 48 20180617
5 91 53 20170606
DataTypes:
ID INT
HITS INT
MISS INT
DDATE STRING
The requirement is to calculate the totals of HITS and MISS per ID on a yearly basis, i.e. for 2017, 2018, 2019, ...
I wrote the following query:
SELECT ID,
SUM(HITS) AS HITS,SUM(MISS) AS MISS,
CASE
WHEN DDATE BETWEEN '201701' AND '201712' THEN '2017' ELSE
'NOTHING' END AS TTL_YR17_DATA
CASE
WHEN DDATE BETWEEN '201801' AND '201812' THEN '2018' ELSE
'NOTHING' END AS TTL_YR18_DATA
CASE
WHEN DDATE BETWEEN '201901' AND '201912' THEN '2019' ELSE
'NOTHING' END AS TTL_YR19_DATA
FROM
HST_TABLE
WHERE
DDATE BETWEEN '201801' AND '201812'
GROUP BY
ID,DDATE;
But the query does not return the expected result.
Actual output:
1 10 3 2018
1 33 21 2018
1 84 11 2018
1 11 2 2018
1 54 23 2019
1 33 43 2019
4 54 22 2017
4 56 88 2018
5 87 22 2018
5 66 48 2018
5 91 53 2017
Expected output:
1 138 37 2018
4 56 88 2018
5 153 70 2018
1 87 66 2019
5 91 53 2017
4 54 22 2017
Another related question: is there a way to avoid hard-coding the DDATE range in the query? The range should be supplied by the user.
Any help or advice on achieving these two requirements would be really helpful.
OK, it's easy to implement this with the substring function in Hive:
select
substring(dddate,0,4) as the_year,
id,
sum(hits) as hits_num,
sum(miss) as miss_num
from
hst_table
group by
substring(dddate,0,4),
id
order by
the_year,
id
The answer above by @Shawn.X has the right approach but misspells the column as dddate. Below is the corrected version (Hive string positions are 1-based, so the year is taken with substring(ddate,1,4)):
select
substring(ddate,1,4) as the_year,
id,
sum(hits) as hits_num,
sum(miss) as miss_num
from
hst_table
group by
substring(ddate,1,4),
id
order by
the_year,
id;
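The sketch below covers both requirements: it groups by the year prefix of DDATE and takes the year range as bound parameters instead of hard-coded literals. It uses Python's sqlite3 as a stand-in for Hive (an assumption, not Hive itself); yearly_totals is a hypothetical name. Comparing four-character year prefixes also avoids a pitfall in the original query: with 8-character strings, every '201712dd' sorts after '201712', so BETWEEN '201701' AND '201712' silently drops all December dates. In Hive itself, variable substitution (for example --hiveconf on the command line with ${hiveconf:...} in the script) can play the role of the bound parameters.

```python
import sqlite3

# Sample rows from the question: (ID, HITS, MISS, DDATE as 'YYYYMMDD' text).
ROWS = [
    (1, 10, 3, "20180101"), (1, 33, 21, "20180122"), (1, 84, 11, "20180901"),
    (1, 11, 2, "20180405"), (1, 54, 23, "20190203"), (1, 33, 43, "20190102"),
    (4, 54, 22, "20170305"), (4, 56, 88, "20180115"), (5, 87, 22, "20180809"),
    (5, 66, 48, "20180617"), (5, 91, 53, "20170606"),
]


def yearly_totals(rows, first_year, last_year):
    """Sum HITS and MISS per (year, ID) for years in [first_year, last_year]."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE hst_table (id INTEGER, hits INTEGER, "
                "miss INTEGER, ddate TEXT)")
    con.executemany("INSERT INTO hst_table VALUES (?, ?, ?, ?)", rows)
    # substr(ddate, 1, 4) extracts the year (1-based positions, as in Hive).
    # The user-supplied range is passed as parameters, not spliced into the SQL.
    sql = """
        SELECT substr(ddate, 1, 4) AS the_year, id,
               SUM(hits) AS hits_num, SUM(miss) AS miss_num
        FROM hst_table
        WHERE substr(ddate, 1, 4) BETWEEN ? AND ?
        GROUP BY substr(ddate, 1, 4), id
        ORDER BY the_year, id
    """
    result = con.execute(sql, (str(first_year), str(last_year))).fetchall()
    con.close()
    return result


for row in yearly_totals(ROWS, 2017, 2019):
    print(row)
```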

SELECT TOP 1 doesn't return a value in case of a tie (ex aequo)

I have this query:
SELECT *
FROM tbl c
WHERE C.dep = (select top 1 dep
from tbl cc
where cc.yea = c.yea
and cc.mon = mon
group by mon, yea, dep, n
order by n desc)
OR C.dep =( select top 1 dep
from tbl cc
where cc.yea = c.yea
and cc.mon = mon
group by mon, yea, dep, n
order by n asc
)
ORDER BY yea, mon, n
That should return, for each (month, year), the best (lowest n) and the worst (highest n) dep. The query works for months 1, 2, 4, 5 and 7 but not for months 3 and 6. The only difference is that in months 3 and 6 two deps have the same score (1). How can I return one of them instead of nothing?
This is my output:
n yea mon dep
----------- ----------- ----------- ----------
1 2017 1 50
48 2017 1 36
58 2017 2 36
85 2017 3 36
1 2017 4 50
39 2017 4 36
1 2017 5 50
39 2017 5 36
19 2017 6 36
3 2017 7 50
17 2017 7 36
And this is what I expected:
n yea mon dep
----------- ----------- ----------- ----------
1 2017 1 50
48 2017 1 36
58 2017 2 36
85 2017 3 36
1 2017 3 49 (or 67)
1 2017 4 50
39 2017 4 36
1 2017 5 50
39 2017 5 36
1 2017 6 50 (or 13)
19 2017 6 36
3 2017 7 50
17 2017 7 36
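Use row_number(). It numbers the rows within each (yea, mon) partition and breaks ties arbitrarily, so exactly one row gets 1 in each direction even when two deps share the same n. (Note also that in your query cc.mon = mon compares cc.mon with itself, because the unqualified mon resolves to the subquery's own table, so the month is never actually correlated.)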
select t.*
from (select t.*,
row_number() over (partition by yea, mon order by n asc) as seqnum_asc,
row_number() over (partition by yea, mon order by n desc) as seqnum_desc
from tbl t
) t
where 1 in (seqnum_asc, seqnum_desc);
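Here is a small sketch of the row_number() approach run through Python's sqlite3 (window functions need SQLite 3.25 or later). The question's data link is not available, so the rows are hypothetical, built to reproduce a month-3 tie at n = 1. Adding dep as a secondary sort key is an assumption made here so the tie-break is deterministic rather than arbitrary; best_and_worst is a hypothetical name.

```python
import sqlite3

# Hypothetical data: (n, yea, mon, dep). Month 3 has a tie at n = 1
# between deps 49 and 67, like the case described in the question.
ROWS = [
    (1, 2017, 1, 50), (20, 2017, 1, 13), (48, 2017, 1, 36),
    (85, 2017, 3, 36), (1, 2017, 3, 49), (1, 2017, 3, 67),
]


def best_and_worst(rows):
    """Return the lowest-n and the highest-n row for each (yea, mon)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl (n INTEGER, yea INTEGER, mon INTEGER, dep INTEGER)")
    con.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", rows)
    # Each ROW_NUMBER() gives exactly one row the number 1 per partition,
    # so ties can no longer make the month disappear.
    sql = """
        SELECT n, yea, mon, dep
        FROM (SELECT t.*,
                     ROW_NUMBER() OVER (PARTITION BY yea, mon
                                        ORDER BY n ASC, dep) AS seqnum_asc,
                     ROW_NUMBER() OVER (PARTITION BY yea, mon
                                        ORDER BY n DESC, dep) AS seqnum_desc
              FROM tbl t) t
        WHERE 1 IN (seqnum_asc, seqnum_desc)
        ORDER BY yea, mon, n
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result


for row in best_and_worst(ROWS):
    print(row)
```

If a month had only one row, it would be returned once, because the WHERE clause tests both sequence numbers in a single condition.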

PROC SQL statement to sum values in rows that match a condition

I have a data table like the one below:
Table 1:
ROWID PERSONID YEAR pidDifference TIMETOEVENT DAYSBETVISIT
10 111 2009 . 100 .
110 120 2009 9 10 .
231 120 2009 0 20 10
222 120 2010 0 40 20
221 222 2009 102 10 30
321 222 2009 0 30 20
213 222 2009 0 10 20
432 321 2009 99 10 0
211 432 2009 111 20 10
212 432 2009 0 20 0
I want to sum the DAYSBETVISIT column for each PERSONID, but only over rows where pidDifference is 0. So I wrote the following PROC SQL statement:
proc sql;
create table table5 as
(
select rowid, YEAR, PERSONID, pidDifference, TIMETOEVENT, DAYSBETVISIT,
SUM(CASE WHEN PIDDifference = 0 THEN DaysBetVisit ELSE 0 END)
from WORK.Table4_1
group by PERSONID,TIMETOEVENT, YEAR
);
quit;
However, the result did not sum the DAYSBETVISIT values over the rows where pidDifference = 0 within the same PERSONID; it just output the DAYSBETVISIT value of that specific row.
The column I need is sumdays, which I don't get with the statement above; the column the statement actually produces is shown as OUT:
ROWID PERSONID YEAR pidDifference TIMETOEVENT DAYSBETVISIT sumdays OUT
10 111 2009 . 100 . 0 0
110 120 2009 9 10 . 0 0
231 120 2009 0 20 10 30 10
222 120 2010 0 40 20 30 20
221 222 2009 102 10 30 0 0
321 222 2009 0 30 20 40 20
213 222 2009 0 10 20 40 20
432 321 2009 99 10 0 0 0
211 432 2009 111 20 10 0 0
212 432 2009 0 20 0 0 0
I do not know what I am doing wrong.
I am using SAS EG Version 7.15, Base SAS version 9.4.
For your example data it looks like you just need two CASE expressions: one to define which values to SUM(), and another to decide whether to report the sum on a given row. Grouping only by PERSONID lets SAS remerge the per-person sum back onto each detail row.
proc sql ;
select personid, piddifference, daysbetvisit, sumdays
, case when piddifference = 0
then sum(case when piddifference=0 then daysbetvisit else 0 end)
else 0 end as WANT
from expect
group by personid
;
quit;
Results
PERSONID pidDifference DAYSBETVISIT sumdays WANT
--------------------------------------------------------
111 . . 0 0
120 0 10 30 30
120 0 20 30 30
120 9 . 0 0
222 0 20 40 40
222 0 20 40 40
222 102 30 0 0
321 99 0 0 0
432 0 0 0 0
432 111 10 0 0
SAS PROC SQL doesn't support window functions, and I find its remerging of aggregates a bit difficult to use except in the obvious cases. So use a subquery (or join) with GROUP BY:
proc sql;
create table table5 as
select t.rowid, t.YEAR, t.PERSONID, t.pidDifference, t.TIMETOEVENT, t.DAYSBETVISIT,
/* report the per-person sum only on rows where pidDifference = 0 */
case when t.pidDifference = 0 then coalesce(tt.sum_DaysBetVisit, 0)
else 0 end as sumdays
from WORK.Table4_1 t left join
(select personid, sum(DaysBetVisit) as sum_DaysBetVisit
from WORK.Table4_1
where pidDifference = 0 /* sum only the rows with no PID difference */
group by personid
) tt
on tt.personid = t.personid;
quit;
Note: In SAS a missing pidDifference (.) never equals 0, so rows with missing values are excluded from the sum and get sumdays = 0.
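A runnable sketch of the subquery-and-join idea, using Python's sqlite3 in place of SAS (an assumption; SAS missing values are modeled as NULL). The subquery sums DAYSBETVISIT only over rows with pidDifference = 0, and the outer CASE reports that sum only on those rows, which reproduces the sumdays column from the question. The column is named row_id because rowid has a special meaning in SQLite; sumdays_by_row is a hypothetical name.

```python
import sqlite3

# Rows from the question's Table 1:
# (ROWID, PERSONID, YEAR, pidDifference, TIMETOEVENT, DAYSBETVISIT); '.' -> None.
ROWS = [
    (10, 111, 2009, None, 100, None), (110, 120, 2009, 9, 10, None),
    (231, 120, 2009, 0, 20, 10), (222, 120, 2010, 0, 40, 20),
    (221, 222, 2009, 102, 10, 30), (321, 222, 2009, 0, 30, 20),
    (213, 222, 2009, 0, 10, 20), (432, 321, 2009, 99, 10, 0),
    (211, 432, 2009, 111, 20, 10), (212, 432, 2009, 0, 20, 0),
]


def sumdays_by_row(rows):
    """Map each row_id to the per-person sum of DAYSBETVISIT over zero-difference rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table4_1 (row_id INTEGER, personid INTEGER, "
                "year INTEGER, piddifference INTEGER, timetoevent INTEGER, "
                "daysbetvisit INTEGER)")
    con.executemany("INSERT INTO table4_1 VALUES (?, ?, ?, ?, ?, ?)", rows)
    # NULL = 0 is not true, so rows with a missing difference fall to ELSE 0.
    sql = """
        SELECT t.row_id,
               CASE WHEN t.piddifference = 0
                    THEN COALESCE(tt.sum_days, 0) ELSE 0 END AS sumdays
        FROM table4_1 t
        LEFT JOIN (SELECT personid, SUM(daysbetvisit) AS sum_days
                   FROM table4_1
                   WHERE piddifference = 0
                   GROUP BY personid) tt
          ON tt.personid = t.personid
    """
    result = dict(con.execute(sql).fetchall())
    con.close()
    return result


print(sumdays_by_row(ROWS))
```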

Get SUM for each combination of values from two tables

I have two tables:
1. #Forecast_Premiums
Syndicate_Key Durg_Key Currency_Key Year_Of_Account Forecast_Premium CUML_EPI_Amount
NULL NULL NULL UNKNOWN 0 6
3 54 46 2000 109105 0
3 54 46 2001 128645 128646
5 47 80 2002 117829 6333
6 47 80 2002 125471 NULL
6 60 80 2003 82371 82371
10 98 215 2006 2093825 77888
10 98 215 2007 11111938 4523645
2. #Forecast_Claims
Syndicate_Key Durg_Key Currency_Key Year_Of_Account Contract_Ref Forecast_Claims Ultimate_Profit_Comission
NULL NULL NULL UNKNOWN UNKNOWN 0 -45
5 47 80 2002 AB00ZZ021M12 -9991203 NULL
5 47 80 2002 AB00ZZ021M13 -4522 -74412
9 60 215 2006 AC04ZZ021M13 -2340299 -895562
10 98 46 2007 FAC0ZZ021M55 -2564123 -851298
The task:
Using the #Forecast_Premiums and #Forecast_Claims tables, write a query to find the total amount of Pure Premium, Cumulative EPI Amount, Forecast_Claims and Ultimate_Profit_Comission received for each combination of Syndicate_Key, Durg_Key, Currency_Key and Year_Of_Account.
Note: if a key is NULL, report it as 'UNKNOWN'; if an amount is NULL, treat it as 0.
My solution:
SELECT
ISNULL(CAST(FP.Syndicate_key AS VARCHAR(20)), 'UNKNOWN') AS 'Syndicate_key',
ISNULL(CAST(FP.Durg_Key AS VARCHAR(20)), 'UNKNOWN') AS 'Durg_Key',
ISNULL(CAST(FP.Currency_Key AS VARCHAR(20)), 'UNKNOWN') AS 'Currency_Key',
fp.Year_Of_Account,
SUM(ISNULL(FP.Forecast_Premium,0)) AS 'Pure_Premium',
SUM(ISNULL(FP.CUML_EPI_Amount,0)) AS 'Cuml_Amount',
SUM(ISNULL(dc.Forecast_Claims,0)) AS 'Total_Claims',
SUM(ISNULL(dc.Ultimate_Profit_Comission,0)) AS 'Total_Comission'
FROM #FORECAST_PREMIUMS fp
left join #FORECAST_Claims dc
ON
(FP.Year_Of_Account = dc.Year_Of_Account AND
FP.Syndicate_Key = dc.Syndicate_Key AND
FP.Currency_Key = dc.Currency_Key AND
FP.Durg_Key = dc.Durg_Key)
GROUP BY fp.Syndicate_Key, fp.Durg_Key,fp.Currency_Key,fp.Year_Of_Account
Issue:
It returns the Forecast_Claims SUM and Ultimate_Profit_Comission SUM only for one combination of keys and year: 5 47 80 2002.
Moreover, it returns 8 rows when it should return 10.
Eight result rows is correct, because there are eight distinct combinations of Syndicate_Key, Durg_Key, Currency_Key and Year_Of_Account in #Forecast_Premiums, and the LEFT JOIN keeps exactly those.
As for the Forecast_Claims sum, this is also correct: 5 47 80 2002 is the only combination that has a match in #Forecast_Claims.
One question, though: are you supposed to match the two records with NULL keys? Your query doesn't, because NULL = NULL is never true (it evaluates to UNKNOWN); only NULL IS NULL is true. You would have to write something like
(
(FP.Syndicate_Key = dc.Syndicate_Key)
OR
(FP.Syndicate_Key IS NULL AND dc.Syndicate_Key IS NULL)
) AND ...
for each key column to make these records match. Or, assuming -1 never occurs as a real key value:
ISNULL(FP.Syndicate_Key, -1) = ISNULL(dc.Syndicate_Key, -1) AND ...
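To see the NULL-matching behavior concretely, here is a reduced sketch in Python's sqlite3 with a single key column and hypothetical amounts (not the question's full tables). SQLite's IFNULL() stands in for SQL Server's ISNULL(); the -1 sentinel assumes no real key is -1, and the table and function names are hypothetical.

```python
import sqlite3

# Hypothetical, simplified rows: (syndicate_key, amount); None is a NULL key.
PREMIUMS = [(None, 0), (3, 109105), (5, 117829)]
CLAIMS = [(None, -45), (5, -4522)]


def totals(null_safe):
    """Sum premiums and claims per key, optionally matching NULL keys to each other."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE fp (syndicate_key INTEGER, premium INTEGER)")
    con.execute("CREATE TABLE dc (syndicate_key INTEGER, claims INTEGER)")
    con.executemany("INSERT INTO fp VALUES (?, ?)", PREMIUMS)
    con.executemany("INSERT INTO dc VALUES (?, ?)", CLAIMS)
    # Plain equality never matches NULL to NULL; mapping NULL to a sentinel does.
    # Only these two fixed strings are spliced into the SQL, never user input.
    cond = ("IFNULL(fp.syndicate_key, -1) = IFNULL(dc.syndicate_key, -1)"
            if null_safe else "fp.syndicate_key = dc.syndicate_key")
    sql = f"""
        SELECT IFNULL(CAST(fp.syndicate_key AS TEXT), 'UNKNOWN') AS syndicate_key,
               SUM(IFNULL(fp.premium, 0)) AS pure_premium,
               SUM(IFNULL(dc.claims, 0)) AS total_claims
        FROM fp LEFT JOIN dc ON {cond}
        GROUP BY fp.syndicate_key
        ORDER BY fp.syndicate_key
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result


print(totals(null_safe=False))
print(totals(null_safe=True))
```

Only the UNKNOWN row changes between the two runs: with plain equality its claims total stays 0, while the null-safe condition picks up the matching NULL-key claim.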

SQL Pivot Table isn't working

SQL Server 2005
I have a temp table:
Year PercentMale PercentFemale PercentHmlss PercentEmployed TotalSrvd
2008 100 0 0 100 1
2009 55 40 0 80 20
2010 64 35 0 67 162
2011 69 27 0 34 285
2012 56 43 10 1 58
and I want to create a query to display the data like this:
2008 2009 2010 2011 2012
PercentMale 100 55 64 69 56
PercentFemale - 40 35 27 43
PercentHmlss - - - - 10
PercentEmployed 100 80 67 34 1
TotalSrvd 1 20 162 285 58
Can I use a pivot table to accomplish this? If so, how? I've tried using a pivot but have found no success.
select PercentHmlss,PercentMale,Percentfemale,
PercentEmployed,[2008],[2009],[2010],[2011],[2012] from
(select PercentHmlss,PercentMale, Percentfemale, PercentEmployed,
TotalSrvd,year from #TempTable)as T
pivot (sum (TotalSrvd) for year
in ([2008],[2009],[2010],[2011],[2012])) as pvt
This is the result:
PercentHmlss PercentMale Percentfemale PercentEmployed [2008] [2009] [2010] [2011] [2012]
0 55 40 80 NULL 20 NULL NULL NULL
0 64 35 67 NULL NULL 162 NULL NULL
0 69 27 34 NULL NULL NULL 285 NULL
0 100 0 100 1 NULL NULL NULL NULL
10 56 43 1 NULL NULL NULL NULL 58
Thanks.
For this to work, you need to perform an UNPIVOT and then a PIVOT:
SELECT *
from
(
select year, quantity, type
from
(
select year, percentmale, percentfemale, percenthmlss, percentemployed, totalsrvd
from #TempTable
) x
UNPIVOT
(
quantity for type
in
([percentmale]
, [percentfemale]
, [percenthmlss]
, [percentemployed]
, [totalsrvd])
) u
) x1
pivot
(
sum(quantity)
for Year in ([2008], [2009], [2010], [2011], [2012])
) p
Edit: further explanation:
You were close with the PIVOT query you tried, in that you got the Year values into columns as you wanted. However, since you want the values that were originally in the percentmale, percentfemale, etc. columns to appear as rows, you need to unpivot the data first.
Basically, you take the original data and place it all in rows keyed by year. The UNPIVOT places your data in this format:
Year Quantity Type
2008 100 percentmale
2008 0 percentfemale
etc
Once you have transformed the data into this format, then you can perform the PIVOT to get the result you want.
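SQLite has no PIVOT or UNPIVOT operators, so this sketch (Python's sqlite3, chosen only for portability) swaps in the classic manual equivalents: UNION ALL for the unpivot step and one conditional SUM per year for the pivot step. It reproduces the table the question asked for. The temp_table name, the Pos ordering column, the y2008... aliases and the function name transpose are hypothetical.

```python
import sqlite3

# Rows from the question's temp table.
ROWS = [
    (2008, 100, 0, 0, 100, 1), (2009, 55, 40, 0, 80, 20),
    (2010, 64, 35, 0, 67, 162), (2011, 69, 27, 0, 34, 285),
    (2012, 56, 43, 10, 1, 58),
]
MEASURES = ["PercentMale", "PercentFemale", "PercentHmlss",
            "PercentEmployed", "TotalSrvd"]
YEARS = [2008, 2009, 2010, 2011, 2012]


def transpose(rows):
    """Turn measure columns into rows and years into columns."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE temp_table (Year INTEGER, PercentMale INTEGER, "
                "PercentFemale INTEGER, PercentHmlss INTEGER, "
                "PercentEmployed INTEGER, TotalSrvd INTEGER)")
    con.executemany("INSERT INTO temp_table VALUES (?, ?, ?, ?, ?, ?)", rows)
    # Step 1 (unpivot): one SELECT per measure column, stacked with UNION ALL,
    # giving (Year, Quantity, Type) rows; Pos keeps the original column order.
    unpivot = " UNION ALL ".join(
        f"SELECT Year, {m} AS Quantity, '{m}' AS Type, {i} AS Pos FROM temp_table"
        for i, m in enumerate(MEASURES))
    # Step 2 (pivot): one conditional SUM per year turns years into columns.
    year_cols = ", ".join(
        f"SUM(CASE WHEN Year = {y} THEN Quantity END) AS y{y}" for y in YEARS)
    sql = (f"SELECT Type, {year_cols} FROM ({unpivot}) u "
           "GROUP BY Type, Pos ORDER BY Pos")
    result = con.execute(sql).fetchall()
    con.close()
    return result


for row in transpose(ROWS):
    print(row)
```

Zero values come back as 0 rather than the '-' shown in the desired layout; replacing them with a dash would be a presentation step on top of this.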