Create and classify a row based on column values - sql

I am trying to assign a classification to each row of data based on which values exist. I have gotten stuck at this point, using the sample code below.
proc sql;
create table test
(id char(4),
task char(4),
id2 char(4),
status char(10),
seconds num);
insert into test
values('1','A','1','COMP',15)
values('1','B','2','WORK',20)
values('1','C','3','COMP',50)
values('1','D','3','COMP',null)
values('2','A','1','COMP',15)
values('2','B','2','COMP',520)
values('2','C','2','COMP',NULL)
values('2','D','3','COMP',221)
values('2','E','3','COMP',null)
values('2','F','3','COMP',null);
quit;
proc sql;
create table test2 as
select
ID,
ID2,
STATUS,
SUM(SECONDS) AS SECONDS,
sum(case when task='A' THEN 1 ELSE 0 END) AS A,
sum(case when task='B' THEN 1 ELSE 0 END) AS B,
sum(case when task='C' THEN 1 ELSE 0 END) AS C,
sum(case when task='D' THEN 1 ELSE 0 END) AS D,
sum(case when task='E' THEN 1 ELSE 0 END) AS E,
sum(case when task='F' THEN 1 ELSE 0 END) AS F
from
test
GROUP BY
ID,
ID2,
STATUS
;
quit;
Ultimately, I would like each row created in the second step (test2) to get a new column that looks at the values in the lettered columns (A-F) and labels the row accordingly. When a row has a 1 in column A only, it should be labeled 'A'. When a row has a 1 in multiple columns, such as D, E and F, it should be labeled 'D_E_F'.

The best way to do this is in a DATA step:
data test3;
length classifier $32;
set test2;
array vars[6] A B C D E F;
classifier = "";
do i=1 to 6;
if vars[i] then
classifier = catx("_",classifier,vname(vars[i]));
end;
drop i;
run;
Create a character variable CLASSIFIER with a length of 32.
Define an array that groups the columns A through F, so the loop can step through them easily.
Initialize CLASSIFIER to blank.
Loop over the array. If the value is nonzero and not missing, append the variable's name to CLASSIFIER.
CATX(delim, str1, str2) concatenates str1 and str2 with delim between them. It also strips leading and trailing blanks from each argument and does not add a delimiter for blank arguments, so the first name is added without a leading underscore.
VNAME(array[i]) returns the name of the variable that array[i] refers to.
Finally, drop the loop variable i unless you want it in the output.
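The same loop-over-columns logic can be checked outside SAS with a short Python sketch. It is not SAS code. Each test2 row is assumed to be a dict that maps column names to the summed flags, and `classify` is a hypothetical helper. As with the SAS `if vars[i]` test, zero and missing (`None`) values are skipped.

```python
from typing import Mapping, Optional, Sequence

FLAG_COLUMNS = ("A", "B", "C", "D", "E", "F")


def classify(row: Mapping[str, Optional[int]], columns: Sequence[str] = FLAG_COLUMNS) -> str:
    """Join the names of the nonzero flag columns with '_', like the CATX/VNAME loop."""
    # row.get() returns None for absent columns; None and 0 are both falsy and are skipped,
    # mirroring how SAS treats 0 and missing (.) as false in an IF test.
    return "_".join(name for name in columns if row.get(name))


# Nonzero flags of the test2 rows produced from the question's sample data (zero flags omitted).
test2_flags = [
    {"A": 1},
    {"B": 1},
    {"C": 1, "D": 1},
    {"A": 1},
    {"B": 1, "C": 1},
    {"D": 1, "E": 1, "F": 1},
]

for flags in test2_flags:
    print(classify(flags))
```

Joining with `"_".join` over only the selected names plays the same role as CATX skipping blanks: no leading or doubled underscores appear.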

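I know it is ugly, but you can do it with CASE expressions that build up the desired label in another column. This answer uses SQL Server syntax, where + concatenates strings and LEN returns the string length.
Note that the concatenation can be empty if no flag is set. In that case, check for the empty string before calling SUBSTRING, because LEN(...) - 1 would be negative and SQL Server raises an error for a negative length argument.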
select
ID,
ID2,
STATUS,
SUM(SECONDS) AS SECONDS,
sum(case when task='A' THEN 1 ELSE 0 END) AS A,
sum(case when task='B' THEN 1 ELSE 0 END) AS B,
sum(case when task='C' THEN 1 ELSE 0 END) AS C,
sum(case when task='D' THEN 1 ELSE 0 END) AS D,
sum(case when task='E' THEN 1 ELSE 0 END) AS E,
sum(case when task='F' THEN 1 ELSE 0 END) AS F,
substring(
case when sum(case when task='A' THEN 1 ELSE 0 END) = 1 then '_A' else '' end
+ case when sum(case when task='B' THEN 1 ELSE 0 END) = 1 then '_B' else '' end
+ case when sum(case when task='C' THEN 1 ELSE 0 END) = 1 then '_C' else '' end
+ case when sum(case when task='D' THEN 1 ELSE 0 END) = 1 then '_D' else '' end
+ case when sum(case when task='E' THEN 1 ELSE 0 END) = 1 then '_E' else '' end
+ case when sum(case when task='F' THEN 1 ELSE 0 END) = 1 then '_F' else '' end,
2, len(case when sum(case when task='A' THEN 1 ELSE 0 END) = 1 then '_A' else '' end
+ case when sum(case when task='B' THEN 1 ELSE 0 END) = 1 then '_B' else '' end
+ case when sum(case when task='C' THEN 1 ELSE 0 END) = 1 then '_C' else '' end
+ case when sum(case when task='D' THEN 1 ELSE 0 END) = 1 then '_D' else '' end
+ case when sum(case when task='E' THEN 1 ELSE 0 END) = 1 then '_E' else '' end
+ case when sum(case when task='F' THEN 1 ELSE 0 END) = 1 then '_F' else '' end) - 1) as wantedOutput
from
test
GROUP BY
ID,
ID2,
STATUS
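You can try the CASE-concatenation approach end to end with Python's built-in sqlite3 module. This sketch adapts the answer rather than reproducing its exact T-SQL:

- SQLite concatenates with `||` instead of `+`.
- `substr(x, 2)` with no length argument returns everything after the leading underscore, so an empty label just yields an empty string.
- It tests `> 0` rather than `= 1`, so a task that appears more than once in a group is still labeled.

The sample rows are the ones from the question, and the derived-table layout is an assumption.

```python
import sqlite3

TASKS = "ABCDEF"

# Sample rows from the question; None stands for the missing seconds values.
rows = [
    ("1", "A", "1", "COMP", 15),
    ("1", "B", "2", "WORK", 20),
    ("1", "C", "3", "COMP", 50),
    ("1", "D", "3", "COMP", None),
    ("2", "A", "1", "COMP", 15),
    ("2", "B", "2", "COMP", 520),
    ("2", "C", "2", "COMP", None),
    ("2", "D", "3", "COMP", 221),
    ("2", "E", "3", "COMP", None),
    ("2", "F", "3", "COMP", None),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test (id TEXT, task TEXT, id2 TEXT, status TEXT, seconds INTEGER)")
con.executemany("INSERT INTO test VALUES (?, ?, ?, ?, ?)", rows)

# One count column per task, as in test2. The task letters are fixed constants,
# so interpolating them into the SQL text is safe here.
flag_cols = ", ".join(f"SUM(CASE WHEN task = '{t}' THEN 1 ELSE 0 END) AS {t}" for t in TASKS)

# Append '_X' for every task present, then drop the leading '_' with substr(..., 2).
label_expr = " || ".join(f"(CASE WHEN {t} > 0 THEN '_{t}' ELSE '' END)" for t in TASKS)

query = f"""
SELECT id, id2, status, seconds, substr({label_expr}, 2) AS classifier
FROM (
    SELECT id, id2, status, SUM(seconds) AS seconds, {flag_cols}
    FROM test
    GROUP BY id, id2, status
)
ORDER BY id, id2, status
"""

result = con.execute(query).fetchall()
for r in result:
    print(r)
```

Computing the flags once in a derived table avoids repeating each SUM(CASE ...) inside the label expression.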

Related

String loop in T-SQL

How can I loop over a string? I want to count the adjacent digit pairs where the first digit is exactly 1 greater than the second. For example:
'321' (2-1 counts as 1, 3-2 counts as 1): result 2
'320244434321' (2-1 counts as 1, 3-2 counts as 1, and 4-3 counts as 1): result 3
'00321881' (2-1 counts as 1, 3-2 counts as 1): result 2
If I understand correctly, you want the number of adjacent digit pairs that decrease by 1, without counting the same pair twice. So you can use brute force:
select ( (case when str like '%10%' then 1 else 0 end) +
(case when str like '%21%' then 1 else 0 end) +
(case when str like '%32%' then 1 else 0 end) +
(case when str like '%43%' then 1 else 0 end) +
(case when str like '%54%' then 1 else 0 end) +
(case when str like '%65%' then 1 else 0 end) +
(case when str like '%76%' then 1 else 0 end) +
(case when str like '%87%' then 1 else 0 end) +
(case when str like '%98%' then 1 else 0 end)
)
from t;
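The brute-force query counts how many of the nine patterns `10` through `98` occur anywhere in the string. A pair that appears twice, such as `32` in `320244434321`, is therefore counted once.

Here is a minimal Python sketch of the same rule. It is useful for checking expected results against the question's examples; the function name is hypothetical.

```python
def count_distinct_descending_pairs(s: str) -> int:
    """Count distinct adjacent digit pairs where the second digit is one less than the first."""
    pairs = {
        s[i : i + 2]
        for i in range(len(s) - 1)
        if s[i].isdigit() and s[i + 1].isdigit() and int(s[i]) - int(s[i + 1]) == 1
    }
    # Using a set mirrors the LIKE approach: each pattern contributes at most 1.
    return len(pairs)


for example in ("321", "320244434321", "00321881"):
    print(example, count_distinct_descending_pairs(example))
```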

Counting columns with a where clause

Is there a way in Hive to count, for each row, the number of columns that have a particular value?
My data looks like the input, and I want to count how many columns have the value 'a' and how many have the value 'b', producing output like 'Output'.
Is there a way to do this with a Hive query?
One method in Hive is:
select ( (case when cl_1 = 'a' then 1 else 0 end) +
(case when cl_2 = 'a' then 1 else 0 end) +
(case when cl_3 = 'a' then 1 else 0 end) +
(case when cl_4 = 'a' then 1 else 0 end) +
(case when cl_5 = 'a' then 1 else 0 end)
) as count_a,
( (case when cl_1 = 'b' then 1 else 0 end) +
(case when cl_2 = 'b' then 1 else 0 end) +
(case when cl_3 = 'b' then 1 else 0 end) +
(case when cl_4 = 'b' then 1 else 0 end) +
(case when cl_5 = 'b' then 1 else 0 end)
) as count_b
from t;
To get the total count, I would suggest using a subquery and adding count_a and count_b.
Alternatively, use a lateral view with explode() to turn the columns into rows, then aggregate over them:
select id
,sum(cast(col='a' as int)) as cnt_a
,sum(cast(col='b' as int)) as cnt_b
,sum(cast(col in ('a','b') as int)) as cnt_total
from tbl
lateral view explode(array(ci_1,ci_2,ci_3,ci_4,ci_5)) lv as col
group by id
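Both Hive queries compute, for each row, how many of the listed columns equal 'a' and how many equal 'b'. The second does it by exploding the columns into rows first.

The following is a small Python sketch of that unpivot-then-aggregate idea. The column names `cl_1` to `cl_5`, the `id` column, and the sample rows are hypothetical, because the question's input is not shown.

```python
from collections import Counter

COLUMNS = ["cl_1", "cl_2", "cl_3", "cl_4", "cl_5"]  # hypothetical column names

# Illustrative rows only; they are not the question's data.
table = [
    {"id": 1, "cl_1": "a", "cl_2": "b", "cl_3": "a", "cl_4": "c", "cl_5": "b"},
    {"id": 2, "cl_1": "b", "cl_2": "b", "cl_3": "c", "cl_4": "a", "cl_5": "c"},
]


def count_values(row, columns=COLUMNS):
    # "Explode" the row's columns into a sequence of values, then count them.
    counts = Counter(row[c] for c in columns)
    return {
        "id": row["id"],
        "cnt_a": counts["a"],
        "cnt_b": counts["b"],
        "cnt_total": counts["a"] + counts["b"],
    }


for row in table:
    print(count_values(row))
```

Counting after unpivoting means adding a new target value only touches the output dict, not every column expression.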

Sum data for many different results for same field

I am trying to find a better way to write this SQL Server 2008 code. It works, and the data is accurate. I ask because I will be asked to do this for several other reports, and I want to reduce the amount of code to maintain.
How can I count the Yes, No and - (dash) values in each field without writing an individual SUM for each one, as I do in this code? Each table holds one month of detail data, which I sum in a CTE. I change the table name for each month and use UNION ALL to put the data together. Is there a better way to do this? This is a small sample of the code. Thanks for the help.
WITH H AS (
SELECT 'August' AS Month_Name
, SUM(CASE WHEN G.FFS = '-' THEN 1 ELSE 0 END) AS FFS_Dash
, SUM(CASE WHEN G.FFS = 'Yes' THEN 1 ELSE 0 END) AS FFS_Yes
, SUM(CASE WHEN G.FFS = 'No' THEN 1 ELSE 0 END) AS FFS_No
, SUM(CASE WHEN G.DNA = '-' THEN 1 ELSE 0 END) AS DNA_Dash
, SUM(CASE WHEN G.DNA = 'Yes' THEN 1 ELSE 0 END) AS DNA_Yes
, SUM(CASE WHEN G.DNA = 'No' THEN 1 ELSE 0 END) AS DNA_No
FROM table08 G )
, G AS (
SELECT 'July' AS Month_Name
, SUM(CASE WHEN G.FFS = '-' THEN 1 ELSE 0 END) AS FFS_Dash
, SUM(CASE WHEN G.FFS = 'Yes' THEN 1 ELSE 0 END) AS FFS_Yes
, SUM(CASE WHEN G.FFS = 'No' THEN 1 ELSE 0 END) AS FFS_No
, SUM(CASE WHEN G.DNA = '-' THEN 1 ELSE 0 END) AS DNA_Dash
, SUM(CASE WHEN G.DNA = 'Yes' THEN 1 ELSE 0 END) AS DNA_Yes
, SUM(CASE WHEN G.DNA = 'No' THEN 1 ELSE 0 END) AS DNA_No
FROM table07 G )
select * from H
UNION ALL
select * from G
How about:
SELECT Month_Name,
SUM(CASE WHEN G.FFS = '-' THEN 1 ELSE 0 END) AS FFS_Dash,
SUM(CASE WHEN G.FFS = 'Yes' THEN 1 ELSE 0 END) AS FFS_Yes,
SUM(CASE WHEN G.FFS = 'No' THEN 1 ELSE 0 END) AS FFS_No,
SUM(CASE WHEN G.DNA = '-' THEN 1 ELSE 0 END) AS DNA_Dash,
SUM(CASE WHEN G.DNA = 'Yes' THEN 1 ELSE 0 END) AS DNA_Yes,
SUM(CASE WHEN G.DNA = 'No' THEN 1 ELSE 0 END) AS DNA_No
FROM ((select 'July' as Month_Name, G.*
from table07 G
) union all
(select 'August', H.*
from table08 H
)
) gh
GROUP BY Month_Name;
However, having multiple tables with the same structure (one per month) is usually a sign of poor database design. You should have a single table with a column that represents the month.
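With a single detail table that has a month column, one GROUP BY replaces the per-month CTEs. The repetitive SUM(CASE ...) columns can also be generated from a list of fields and values, so adding a field means editing one list.

The sketch below uses Python's sqlite3 module. The table name `detail`, the `month_name` column, and the sample rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical single table: one row per record, with a month column instead of one table per month.
con.execute("CREATE TABLE detail (month_name TEXT, FFS TEXT, DNA TEXT)")
con.executemany(
    "INSERT INTO detail VALUES (?, ?, ?)",
    [("July", "Yes", "No"), ("July", "-", "No"), ("August", "Yes", "Yes"), ("August", "No", "-")],
)

FIELDS = ["FFS", "DNA"]
VALUES = {"-": "Dash", "Yes": "Yes", "No": "No"}  # value -> column-name suffix

# Generate one SUM(CASE ...) column per field/value pair instead of writing each by hand.
columns = ",\n  ".join(
    f"SUM(CASE WHEN {field} = '{value}' THEN 1 ELSE 0 END) AS {field}_{suffix}"
    for field in FIELDS
    for value, suffix in VALUES.items()
)
query = f"SELECT month_name,\n  {columns}\nFROM detail\nGROUP BY month_name\nORDER BY month_name"

result = con.execute(query).fetchall()
for r in result:
    print(r)
```

The generated SELECT has the same shape as the hand-written one: FFS_Dash, FFS_Yes, FFS_No, DNA_Dash, DNA_Yes, DNA_No.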

How do I correctly use a CASE WHEN statement?

Hi,
I need help with a CASE WHEN statement in SQL Server.
Basically, I have three products. When the sum is equal to 2, I want it to be counted as 1, else 0. Is the logic in this code right, or can it be improved?
case when sum(hase=1 OR hasd=1 OR hasf=1)=2 then 1 else 0 end as Xavc
What I am trying to do with this code: the customer might not have all three products. However, if the customer has two or all three products, the result should be 1.
Something like this?
SELECT
CASE
WHEN hase + hasd + hasf = 2 THEN 1
ELSE 0
END AS Xavc
I think you are trying to do something like this...
CASE WHEN SUM(CASE WHEN hase=1 THEN 1 ELSE 0 END)
+ SUM(CASE WHEN hasd=1 THEN 1 ELSE 0 END)
+ SUM(CASE WHEN hasf=1 THEN 1 ELSE 0 END) = 2
THEN 1 ELSE 0 END AS Xavc
In that case, try this:
CASE WHEN SUM(CASE WHEN hase=1 THEN 1 ELSE 0 END) + SUM(CASE WHEN hasd=1 THEN 1 ELSE 0 END) = 2
OR SUM(CASE WHEN hasd=1 THEN 1 ELSE 0 END) + SUM(CASE WHEN hasf=1 THEN 1 ELSE 0 END) = 2
OR SUM(CASE WHEN hase=1 THEN 1 ELSE 0 END) + SUM(CASE WHEN hasf=1 THEN 1 ELSE 0 END) = 2
THEN 1 ELSE 0 END AS Xavc
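The stated requirement is "two or all three products". Comparing the total with `>= 2` covers both cases, while `= 2` returns 0 for a customer who has all three.

Here is a minimal Python sketch of that per-row rule, assuming `hase`, `hasd` and `hasf` are 0/1 flags. The function name is hypothetical.

```python
def has_two_or_more(hase: int, hasd: int, hasf: int) -> int:
    """Return 1 when at least two of the three product flags are set, else 0."""
    # Only a value of exactly 1 counts as "owned", like CASE WHEN hasX = 1.
    owned = sum(1 for flag in (hase, hasd, hasf) if flag == 1)
    return 1 if owned >= 2 else 0


for flags in [(1, 0, 0), (1, 1, 0), (1, 1, 1), (0, 0, 0)]:
    print(flags, has_two_or_more(*flags))
```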

How to add cases across columns that meet a condition, within rows?

I have data on household ownership of appliances, with one appliance per column and values of Y or N. I want to generate a new column with the number of appliances owned per household. When I run the following script (SQLite), I get a syntax error near ")". Please help; I've tried all sorts of syntax.
SELECT Household,
SUM((CASE WHEN Stove="Y" THEN 1 ELSE 0) +
(CASE WHEN Fridge="Y" THEN 1 ELSE 0) +
(CASE WHEN TV="Y" THEN 1 ELSE 0) +
(CASE WHEN Video="Y" THEN 1 ELSE 0) +
(CASE WHEN SatDish="Y" THEN 1 ELSE 0) +
(CASE WHEN Radio="Y" THEN 1 ELSE 0) +
(CASE WHEN FixPhone="Y" THEN 1 ELSE 0) END)
AS Appliances
FROM Assets
You need an END for every CASE expression.
Try this:
SELECT Household,
SUM((CASE WHEN Stove='Y' THEN 1 ELSE 0 END) +
(CASE WHEN Fridge='Y' THEN 1 ELSE 0 END) +
(CASE WHEN TV='Y' THEN 1 ELSE 0 END) +
(CASE WHEN Video='Y' THEN 1 ELSE 0 END) +
(CASE WHEN SatDish='Y' THEN 1 ELSE 0 END) +
(CASE WHEN Radio='Y' THEN 1 ELSE 0 END) +
(CASE WHEN FixPhone='Y' THEN 1 ELSE 0 END))
AS Appliances
FROM Assets
Group By Household
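You can check the corrected query with Python's built-in sqlite3 module. In this sketch the sample households are made up. String literals use single quotes, which is standard SQL; SQLite only treats a double-quoted token as a string when no column of that name exists.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE Assets (Household TEXT, Stove TEXT, Fridge TEXT, TV TEXT, "
    "Video TEXT, SatDish TEXT, Radio TEXT, FixPhone TEXT)"
)
# Illustrative rows only.
con.executemany(
    "INSERT INTO Assets VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("H1", "Y", "Y", "Y", "N", "N", "Y", "N"),
        ("H2", "N", "N", "Y", "N", "N", "N", "N"),
    ],
)

appliances = ["Stove", "Fridge", "TV", "Video", "SatDish", "Radio", "FixPhone"]
# Every CASE gets its own END; SUM(...) with GROUP BY also totals multiple rows per household.
row_total = " + ".join(f"(CASE WHEN {col} = 'Y' THEN 1 ELSE 0 END)" for col in appliances)
query = (
    f"SELECT Household, SUM({row_total}) AS Appliances "
    "FROM Assets GROUP BY Household ORDER BY Household"
)

result = con.execute(query).fetchall()
print(result)
```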