Count the values of multiple columns in a Hive table - hive

Below is the data set for household members (eight members per household) by their type:
h1 h2 h3 h4 h5 h6 h7 h8
U U P U Y null Y U
U H U U Y Y P P
U U U H U null Y null
null null H H U null null null
P U U U Y null Z P
Y P null H Y P U H
U null U null P U Z Y
null null null null null null null null
In the above data set I want to count the total number of each type: H = head of household, P = parent of household, U = adult, Y = wife of household, null = no match. I used the code below. It gives me the right count for each household member type, but for null I am not getting the proper count. Can anyone tell me why that is happening and how to resolve it? Here is my code:
select sum( Head_cnt) as H,
sum( parent_cnt) as P,
sum( adult_cnt) as U,
sum(spouce_cnt) as Y,
sum( nomatch_cnt) as Nomatch
from(
select length(regexp_replace(row_concatenated, '[^U]', '')) as adult_cnt,
length(regexp_replace(row_concatenated, '[^H]', '')) as head_cnt,
length(regexp_replace(row_concatenated, '[^P]', '')) as parent_cnt,
length(regexp_replace(row_concatenated, '[^Y]', '')) as spouce_cnt,
length(regexp_replace(row_concatenated, '[null]', '')) as nomatch_cnt
from(select concat_ws(',',h1,h2,h3,h4,h5,h6,h7,h8) as row_concatenated
from table_name)s
)s;
Please give me a solution for the null value in the code. I am getting the proper count for all the values except null. Just remember that this is not a NULL value; here null means NO MATCH.
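To see where the null count goes wrong, here is a small Python sketch that imitates the counting expression on one row, using `re.sub` in place of Hive's `regexp_replace` (the bracket classes behave the same way for these simple patterns). It covers both readings of the question: "null" stored as text, and real NULLs, which Hive's `concat_ws` skips.

```python
import re

# Row 1 of the sample, assuming "null" is stored as the literal text "null".
row_text = "U,U,P,U,Y,null,Y,U"

# '[^U]' deletes every character except 'U', so the remaining length is the U count.
u_count = len(re.sub("[^U]", "", row_text))

# '[null]' is a character class, not the word "null": it deletes each 'n', 'u'
# and 'l' character, and the remaining length counts everything else.
class_len = len(re.sub("[null]", "", row_text))

# Counting whole "null" tokens instead: remove the word, then divide the
# length difference by the word's length.
null_tokens = (len(row_text) - len(row_text.replace("null", ""))) // len("null")

# If the values are real NULLs, concat_ws skips them (mimicked here),
# so nothing is left in the concatenated string to count.
row_values = ["U", "U", "P", "U", "Y", None, "Y", "U"]
joined = ",".join(v for v in row_values if v is not None)

print(u_count, class_len, null_tokens, joined)  # 4 14 1 U,U,P,U,Y,Y,U
```

So `'[null]'` returns 14 for this row instead of 1, and with real NULLs the value never reaches the string at all.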

If I understand you correctly, you need to count the occurrences of each letter across all the columns. One way to do it is to union all the columns, then group by and count:
with union_table as (
select h1 as h from your_table
union all
select h2 as h from your_table
union all
... (up to h8)
select h8 as h from your_table
)
select h, count(*) as cnt
from union_table
group by h
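GROUP BY collects all NULLs into a single group, and count(*) counts those rows too, which gives the no-match count. count(h), by contrast, ignores NULLs and returns 0 for that group.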
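A minimal sketch of the union-all approach, run in SQLite from Python rather than Hive. It assumes the "null"/"nuLL" entries are real NULLs (None), builds the eight `SELECT ... FROM your_table` branches in a loop, and returns both `COUNT(*)` and `COUNT(h)` so the difference for the NULL group is visible.

```python
import sqlite3

# Sample data from the question; None stands for NULL (an assumption).
rows = [
    ("U", "U", "P", "U", "Y", None, "Y", "U"),
    ("U", "H", "U", "U", "Y", "Y", "P", "P"),
    ("U", "U", "U", "H", "U", None, "Y", None),
    (None, None, "H", "H", "U", None, None, None),
    ("P", "U", "U", "U", "Y", None, "Z", "P"),
    ("Y", "P", None, "H", "Y", "P", "U", "H"),
    ("U", None, "U", None, "P", "U", "Z", "Y"),
    (None,) * 8,
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (h1, h2, h3, h4, h5, h6, h7, h8)")
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)

# One SELECT per column, each with its own FROM, glued together with UNION ALL.
union_sql = " UNION ALL ".join(f"SELECT h{i} AS h FROM your_table" for i in range(1, 9))

query = f"""
WITH union_table AS ({union_sql})
SELECT h, COUNT(*) AS cnt_all, COUNT(h) AS cnt_non_null
FROM union_table
GROUP BY h
"""
result = {h: (cnt_all, cnt_non_null) for h, cnt_all, cnt_non_null in conn.execute(query)}
print(result[None])  # (20, 0): COUNT(*) counts the NULL group, COUNT(h) does not
print(result["U"])   # (19, 19)
```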

Convert null values to some character, say 'N' for No match:
select sum( Head_cnt) as H,
sum( parent_cnt) as P,
sum( adult_cnt) as U,
sum(spouce_cnt) as Y,
sum( nomatch_cnt) as Nomatch
from(
select length(regexp_replace(row_concatenated, '[^U]', '')) as adult_cnt,
length(regexp_replace(row_concatenated, '[^H]', '')) as head_cnt,
length(regexp_replace(row_concatenated, '[^P]', '')) as parent_cnt,
length(regexp_replace(row_concatenated, '[^Y]', '')) as spouce_cnt,
length(regexp_replace(row_concatenated, '[^N]', '')) as nomatch_cnt
from
(
select concat_ws(',',nvl(h1,'N'),nvl(h2,'N'),nvl(h3,'N'),nvl(h4,'N'),nvl(h5,'N'),nvl(h6,'N'),nvl(h7,'N'),nvl(h8,'N')) as row_concatenated
from table_name)s
)s;
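The nvl approach can be checked outside Hive with a short Python sketch over all eight sample rows. It assumes the "null"/"nuLL" entries are real NULLs (modelled as None) and imitates `nvl`, `concat_ws` and `length(regexp_replace(...))` with plain Python and `re.sub`; the Z entries are not one of the requested codes, so they are not counted.

```python
import re

# The eight sample rows; None stands for NULL ("no match").
rows = [
    ["U", "U", "P", "U", "Y", None, "Y", "U"],
    ["U", "H", "U", "U", "Y", "Y", "P", "P"],
    ["U", "U", "U", "H", "U", None, "Y", None],
    [None, None, "H", "H", "U", None, None, None],
    ["P", "U", "U", "U", "Y", None, "Z", "P"],
    ["Y", "P", None, "H", "Y", "P", "U", "H"],
    ["U", None, "U", None, "P", "U", "Z", "Y"],
    [None] * 8,
]

def nvl(value, default):
    """Like Hive's nvl(): return default when value is NULL."""
    return default if value is None else value

def count_char(s, ch):
    """Like length(regexp_replace(s, '[^ch]', ''))."""
    return len(re.sub(f"[^{ch}]", "", s))

totals = {code: 0 for code in "HPUYN"}
for row in rows:
    # concat_ws(',', nvl(h1,'N'), ..., nvl(h8,'N')): no NULLs are left to skip.
    row_concatenated = ",".join(nvl(v, "N") for v in row)
    for code in totals:
        totals[code] += count_char(row_concatenated, code)

print(totals)  # {'H': 6, 'P': 8, 'U': 19, 'Y': 9, 'N': 20}
```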

Related

Case statement with four columns, i.e. attributes

I have a table with values "1", "0" or "". The table has four columns: p, q, r and s.
I need help creating a case statement that returns values when the attribute is equal to 1.
For ID 5 the case statement should return "p s".
For ID 14 the case statement should return "s".
For ID 33 the case statement should return "p r s". And so on.
Do I need to come up with a case statement that has every possible combination? Or is there a simpler way? Below is what I have come up with so far.
case
when p = 1 and q =1 then "p q"
when p = 1 and r =1 then "p r"
when p = 1 and s =1 then "p s"
when r = 1 then r
when q = 1 then q
when r = 1 then r
when s = 1 then s
else ''
end
One solution is the following, which uses a CASE for each attribute to return the matching letter, uses regexp_replace to add a space after each letter, and wraps the result in TRIM to remove the trailing space.
with tbl(id, p, q, r, s) as (
select 5,1,0,0,1 from dual union all
select 14,0,0,0,1 from dual
)
select id,
trim(regexp_replace(case p when 1 then 'p' end ||
case q when 1 then 'q' end ||
case r when 1 then 'r' end ||
case s when 1 then 's' end, '(.)', '\1 '))
from tbl;
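The per-attribute CASE idea can be tried in SQLite from Python. This is a sketch, not Oracle code: Oracle treats NULL as an empty string in `||`, which is why the CASE expressions above need no ELSE, while SQLite follows standard SQL (`NULL || 'x'` is NULL), so each CASE gets `ELSE ''`. SQLite also has no built-in regexp_replace, so the space is appended directly and TRIM removes the last one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, p INTEGER, q INTEGER, r INTEGER, s INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?, ?)",
                 [(5, 1, 0, 0, 1), (14, 0, 0, 0, 1), (33, 1, 0, 1, 1)])

query = """
SELECT id,
       TRIM(CASE p WHEN 1 THEN 'p ' ELSE '' END ||
            CASE q WHEN 1 THEN 'q ' ELSE '' END ||
            CASE r WHEN 1 THEN 'r ' ELSE '' END ||
            CASE s WHEN 1 THEN 's ' ELSE '' END) AS attrs
FROM tbl
ORDER BY id
"""
result = dict(conn.execute(query).fetchall())
print(result)  # {5: 'p s', 14: 's', 33: 'p r s'}
```

Four independent CASE expressions cover all 16 combinations, so no combination-by-combination CASE is needed.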
The real solution would be to fix the database design. This design technically violates fourth normal form in that it contains more than one independent attribute. The fact that an ID "has" or "is part of" attribute p, q, etc. should be split out. This design should be three tables: the main table with the ID; a lookup table containing information about the attributes an ID could have (p, q, r or s); and an associative table that joins the two where appropriate (assuming an ID can have more than one attribute and an attribute can belong to more than one ID). This is how a many-to-many relationship is modeled.
main_tbl
ID   col1  col2
5
14

main_attr
main_id  attr_id
5        1
5        4
14       4

attribute_lookup
attr_id  attr_desc
1        p
2        q
3        r
4        s
Then it would be simple to query this model to build your list, easy to maintain if an attribute description changes (only 1 place to change it), etc.
Select from it like this:
select m.ID, m.col1, listagg(al.attr_desc, ' ') within group (order by al.attr_desc) as attr_desc
from main_tbl m
join main_attr ma
on m.ID = ma.main_id
join attribute_lookup al
on ma.attr_id = al.attr_id
group by m.id, m.col1;
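A sketch of the three-table model in SQLite, using the table and column names from the answer. SQLite has no LISTAGG, and ordering inside group_concat is only supported in recent releases, so this version sorts in SQL and does the concatenation with `itertools.groupby`; col1 and col2 are left NULL because the answer shows no values for them.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE main_tbl (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
CREATE TABLE attribute_lookup (attr_id INTEGER PRIMARY KEY, attr_desc TEXT);
CREATE TABLE main_attr (
    main_id INTEGER REFERENCES main_tbl(id),
    attr_id INTEGER REFERENCES attribute_lookup(attr_id),
    PRIMARY KEY (main_id, attr_id)
);
INSERT INTO main_tbl (id) VALUES (5), (14);
INSERT INTO attribute_lookup VALUES (1, 'p'), (2, 'q'), (3, 'r'), (4, 's');
INSERT INTO main_attr VALUES (5, 1), (5, 4), (14, 4);
""")

rows = conn.execute("""
SELECT m.id, al.attr_desc
FROM main_tbl m
JOIN main_attr ma ON m.id = ma.main_id
JOIN attribute_lookup al ON ma.attr_id = al.attr_id
ORDER BY m.id, al.attr_desc
""").fetchall()

# Equivalent of listagg(attr_desc, ' ') within group (order by attr_desc).
result = {id_: " ".join(desc for _, desc in grp)
          for id_, grp in groupby(rows, key=lambda r: r[0])}
print(result)  # {5: 'p s', 14: 's'}
```

Adding a new attribute is now an INSERT into attribute_lookup rather than a new column.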
You can use concatenations with decode() functions
select id, decode(p,1,'p','')||decode(q,1,'q','')
||decode(r,1,'r','')||decode(s,1,'s','') as "String"
from t;
If you need spaces between letters, consider using:
with t(id,p,q,r,s) as
(
select 5,1,0,0,1 from dual union all
select 14,0,0,0,1 from dual union all
select 31,null,0,null,1 from dual union all
select 33,1,0,1,1 from dual
), t2 as
(
select id, decode(p,1,'p','')||decode(q,1,'q','')
||decode(r,1,'r','')||decode(s,1,'s','') as str
from t
), t3 as
(
select id, substr(str,level,1) as str, level as lvl
from t2
connect by level <= length(str)
and prior id = id
and prior sys_guid() is not null
)
select id, listagg(str,' ') within group (order by lvl) as "String"
from t3
group by id;
In my opinion, it's bad practice to use columns for relationships.
You should have two tables, one called arts and another called mapping. arts looks like this:
ID - ART
1 - p
2 - q
3 - r
4 - s
...
and mapping maps your base IDs to your art IDs and looks like this:
MYID - ARTID
5 - 1
5 - 4
Afterwards, you can make use of Oracle's PIVOT operator; it is more dynamic.

Mapping of Multi-valued column from one table to another table using join in DB2

I am creating a report using SQL in DB2 and I have a field in one of the tables which stores multiple values. Now, I need to reference another table to get the description of these multiple values as shown.
Table A
Item_No| R_code
---------------
X R01,R03,R04
Y R02,R03
Z R04
Table B
R_code| Description
------------------
R01 Missing info
R02 Invalid info
R03 Invalid Account
R04 Missing Address
How do I get the following result if I join Table A and Table B
Final Result
Item_no| R_code | Description
---------------------------
X R01,R03,R04 Missing info,Invalid Account,Missing Address
Y R02,R03 Invalid info,Invalid Account
Z R04 Missing Address
WITH unpivot (lvl, item_no, r_code, tail) AS (
SELECT 1, item_no,
CASE WHEN LOCATE(',',r_code) > 0
THEN TRIM(LEFT(r_code, LOCATE(',',r_code)-1))
ELSE TRIM(r_code)
END,
CASE WHEN LOCATE(',',r_code) > 0
THEN SUBSTR(r_code, LOCATE(',',r_code)+1)
ELSE ''
END
FROM tableA
UNION ALL
SELECT lvl + 1, item_no,
CASE WHEN LOCATE(',', tail) > 0
THEN TRIM(LEFT(tail, LOCATE(',', tail)-1))
ELSE TRIM(tail)
END,
CASE WHEN LOCATE(',', tail) > 0
THEN SUBSTR(tail, LOCATE(',', tail)+1)
ELSE ''
END
FROM unpivot
WHERE lvl < 100 AND tail != ''),
get_desc AS
(
SELECT up.item_no, up.r_code, tb.description
FROM unpivot up
INNER JOIN tableB tb
ON up.r_code = tb.r_code -- join on the code; tableB has no item_no column
)
SELECT item_no,
LISTAGG(r_code, ',') WITHIN GROUP(ORDER BY r_code ASC) AS r_code,
LISTAGG(description, ',') WITHIN GROUP(ORDER BY r_code ASC) AS description -- same order as the codes
FROM get_desc
GROUP BY item_no
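The same split-then-join idea can be run in SQLite from Python, using the tables from the question. This is a sketch, not DB2 code: SQLite uses `instr`/`substr` where DB2 uses LOCATE/LEFT/SUBSTR, and needs `WITH RECURSIVE`. Because group_concat only accepts an ORDER BY in recent SQLite releases, the final LISTAGG step is done in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableA (item_no TEXT, r_code TEXT);
CREATE TABLE tableB (r_code TEXT, description TEXT);
INSERT INTO tableA VALUES ('X', 'R01,R03,R04'), ('Y', 'R02,R03'), ('Z', 'R04');
INSERT INTO tableB VALUES ('R01', 'Missing info'), ('R02', 'Invalid info'),
                          ('R03', 'Invalid Account'), ('R04', 'Missing Address');
""")

# Peel off the text before the first comma, keep the rest in "tail",
# and recurse until tail is empty (lvl < 100 guards against runaway recursion).
split_sql = """
WITH RECURSIVE unpivot(lvl, item_no, r_code, tail) AS (
    SELECT 1, item_no,
           CASE WHEN instr(r_code, ',') > 0
                THEN trim(substr(r_code, 1, instr(r_code, ',') - 1))
                ELSE trim(r_code) END,
           CASE WHEN instr(r_code, ',') > 0
                THEN substr(r_code, instr(r_code, ',') + 1)
                ELSE '' END
    FROM tableA
    UNION ALL
    SELECT lvl + 1, item_no,
           CASE WHEN instr(tail, ',') > 0
                THEN trim(substr(tail, 1, instr(tail, ',') - 1))
                ELSE trim(tail) END,
           CASE WHEN instr(tail, ',') > 0
                THEN substr(tail, instr(tail, ',') + 1)
                ELSE '' END
    FROM unpivot
    WHERE lvl < 100 AND tail != ''
)
SELECT up.item_no, up.r_code, tb.description
FROM unpivot up
JOIN tableB tb ON up.r_code = tb.r_code
ORDER BY up.item_no, up.r_code
"""

# LISTAGG step: collect codes and descriptions per item, in r_code order.
grouped = {}
for item_no, code, desc in conn.execute(split_sql):
    codes, descs = grouped.setdefault(item_no, ([], []))
    codes.append(code)
    descs.append(desc)
final = {k: (",".join(c), ",".join(d)) for k, (c, d) in grouped.items()}
print(final["X"])  # ('R01,R03,R04', 'Missing info,Invalid Account,Missing Address')
```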

SQL query to get the count by applying group by

I want to get the below result:
Source table:
Cnt A B
4 ABC YU/FGH
5 ABC YU/DFE
5 ABC KL
2 LKP BN/ER
4 JK RE
Result:
Cnt A B
9 ABC YU
5 ABC KL
2 LKP BN
4 JK RE
Here I want the total Cnt grouped by 'A' and 'B', with 'B' displayed only up to the special character (/).
Basically, you have to strip out everything from the "/" symbol onward and then apply a SUM and a GROUP BY. The inner query removes the unwanted part of the string, and the outer query does the SUM and the GROUP BY:
SELECT SUM(t.Cnt), t.A, t.B
FROM (
SELECT Cnt,
A,
CASE
WHEN CHARINDEX('/', B) > 0 THEN SUBSTRING(B, 0, CHARINDEX('/', B))
ELSE B
END AS B
FROM #Tab
) t
GROUP BY t.A, t.B
ORDER BY t.A
You can see this working here -> http://rextester.com/IQJ79191
Hope this helps!!!
You can get your string up to '/' by using SUBSTRING. Appending '/' to B makes CHARINDEX also work for values that contain no slash:
select
sum(Cnt) as Cnt,
A,
SUBSTRING(B, 1, charindex('/', B + '/') - 1) as B
from source_table
group by A, SUBSTRING(B, 1, charindex('/', B + '/') - 1);
Solution for Oracle: substr(B, 1, instr(B || '/', '/') - 1) B
Put this in both the SELECT and the GROUP BY.
I suggest a query like this:
select
sum(Cnt) Cnt,
A,
left(B, charindex('/',B+'/',0)-1) B -- appending '/' guarantees charindex finds a match
from
t
group by
A,
left(B, charindex('/',B+'/',0)-1);
By using the SUBSTRING and CHARINDEX string functions:
;WITH SourceTable(Cnt,A,B) AS
(
SELECT 4,'ABC','YU/FGH' UNION ALL
SELECT 5,'ABC','YU/DFE' UNION ALL
SELECT 5,'ABC','KL' UNION ALL
SELECT 2,'LKP','BN/ER' UNION ALL
SELECT 4,'JK','RE'
)
SELECT SUM(Cnt) AS Cnt,A,CASE WHEN CHARINDEX('/',B) = 0 THEN B
ELSE SUBSTRING(B,0,CHARINDEX('/',B)) END AS [B] FROM SourceTable
GROUP BY A,CASE WHEN CHARINDEX('/',B) = 0 THEN B
ELSE SUBSTRING(B,0,CHARINDEX('/',B)) END
ORDER BY Cnt DESC
Try this query:
SELECT SUM(Cnt) AS [COUNT]
,A
,CASE
WHEN CHARINDEX('/', B) > 0
THEN SUBSTRING(B, 1, (CHARINDEX('/', B) - 1))
ELSE B
END AS B
FROM tblSample
GROUP BY A
,CASE
WHEN CHARINDEX('/', B) > 0
THEN SUBSTRING(B, 1, (CHARINDEX('/', B) - 1))
ELSE B
END
ORDER BY A
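The sentinel trick used in several answers (appending '/' so the search always finds a match) can be tried in SQLite from Python. This is a sketch, not SQL Server code: SQLite spells CHARINDEX as `instr` with the arguments swapped (haystack first) and concatenates with `||` instead of `+`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_table (Cnt INTEGER, A TEXT, B TEXT)")
conn.executemany("INSERT INTO source_table VALUES (?, ?, ?)", [
    (4, "ABC", "YU/FGH"), (5, "ABC", "YU/DFE"), (5, "ABC", "KL"),
    (2, "LKP", "BN/ER"), (4, "JK", "RE"),
])

# Inner query: keep B up to the first '/'; B || '/' means 'KL' -> 'KL/' still matches.
# Outer query: SUM and GROUP BY on the shortened value.
query = """
SELECT SUM(Cnt) AS Cnt, A, prefix AS B
FROM (SELECT Cnt, A, substr(B, 1, instr(B || '/', '/') - 1) AS prefix
      FROM source_table)
GROUP BY A, prefix
ORDER BY A, prefix
"""
result = conn.execute(query).fetchall()
print(result)  # [(5, 'ABC', 'KL'), (9, 'ABC', 'YU'), (4, 'JK', 'RE'), (2, 'LKP', 'BN')]
```

Grouping on the shortened value, not on the original B, is what merges YU/FGH and YU/DFE into one row with Cnt 9.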

MySQL enumeration: @rownum, odd and even records

I asked a question about creating temporary/virtual IDs for query results:
mysql & php: temporary/ virtual ids for query results?
I nearly got what I wanted with this link:
http://craftycodeblog.com/2010/09/13/rownum-simulation-with-mysql/
I have managed to enumerate each row:
SELECT
u.pg_id AS ID,
u.pg_url AS URL,
u.pg_title AS Title,
u.pg_content_1 AS Content,
@rownum:=@rownum+1 AS rownum
FROM (
SELECT pg_id, pg_url,pg_title,pg_content_1
FROM root_pages
WHERE root_pages.parent_id = '7'
AND root_pages.pg_id != '7'
AND root_pages.pg_cat_id = '2'
AND root_pages.pg_hide != '1'
ORDER BY pg_created DESC
) u,
(SELECT @rownum:=0) r
Result:
ID URL Title Content rownum
53 a x x 1
52 b x x 2
43 c x x 3
41 d x x 4
But how can I take it a bit further? I want to display only the odd or only the even records, like the ones below. Is it possible?
Odd records:
ID URL Title Content rownum
53 a x x 1
43 c x x 3
Even records:
ID URL Title Content rownum
52 b x x 2
41 d x x 4
thank you.
P.S. I don't quite understand the SQL query, even though I almost got the answer. For instance, what do the 'u' and 't' mean?
what do the 'u' and 't' mean?
They are table aliases, so you don't have to spell out the entire table name whenever you need to reference it.
To get only the odd numbered records, use:
SELECT x.*
FROM (SELECT u.pg_id AS ID,
u.pg_url AS URL,
u.pg_title AS Title,
u.pg_content_1 AS Content,
@rownum := @rownum + 1 AS rownum
FROM (SELECT pg_id, pg_url, pg_title, pg_content_1
FROM root_pages
WHERE parent_id = '7'
AND pg_id != '7'
AND pg_cat_id = '2'
AND pg_hide != '1'
ORDER BY pg_created DESC) u -- sort first, then number the sorted rows, as in the question's query
JOIN (SELECT @rownum := 0) r) x
WHERE x.rownum % 2 != 0
To get the even numbered records, use:
SELECT x.*
FROM (SELECT u.pg_id AS ID,
u.pg_url AS URL,
u.pg_title AS Title,
u.pg_content_1 AS Content,
@rownum := @rownum + 1 AS rownum
FROM (SELECT pg_id, pg_url, pg_title, pg_content_1
FROM root_pages
WHERE parent_id = '7'
AND pg_id != '7'
AND pg_cat_id = '2'
AND pg_hide != '1'
ORDER BY pg_created DESC) u -- sort first, then number the sorted rows, as in the question's query
JOIN (SELECT @rownum := 0) r) x
WHERE x.rownum % 2 = 0
Explanation
The % is the modulus operator in MySQL: it returns the remainder of a division. For example, 1 % 2 is 1, while 2 % 2 is 0, so odd row numbers leave a remainder of 1 and even ones a remainder of 0. The WHERE clause uses this to filter the rows displayed.
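The number-then-filter idea can be mimicked in a few lines of Python. The rows are the four from the question, assumed to be already sorted by pg_created DESC, and only ID and URL are kept for brevity.

```python
# Rows as returned by the inner query, assumed already sorted by pg_created DESC.
rows = [(53, "a"), (52, "b"), (43, "c"), (41, "d")]

# @rownum := @rownum + 1: number the rows 1, 2, 3, ... in that order.
numbered = [(rownum, id_, url) for rownum, (id_, url) in enumerate(rows, start=1)]

# WHERE x.rownum % 2 != 0  /  WHERE x.rownum % 2 = 0
odd = [r for r in numbered if r[0] % 2 != 0]
even = [r for r in numbered if r[0] % 2 == 0]

print([r[1] for r in odd])   # [53, 43]
print([r[1] for r in even])  # [52, 41]
```

Numbering has to happen after sorting; otherwise the odd/even split follows whatever order the rows happened to be read in.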

SQL: how to get all the distinct characters in a column, across all rows

Is there an elegant way in SQL Server to find all the distinct characters in a single varchar(50) column, across all rows?
Bonus points if it can be done without cursors :)
For example, say my data contains 3 rows:
productname
-----------
product1
widget2
nicknack3
The distinct inventory of characters would be "productwigenka123"
Here's a query that returns each character as a separate row, along with the number of occurrences, assuming your table is called 'Products':
WITH ProductChars(aChar, remain) AS (
SELECT LEFT(productName,1), RIGHT(productName, LEN(productName)-1)
FROM Products WHERE LEN(productName)>0
UNION ALL
SELECT LEFT(remain,1), RIGHT(remain, LEN(remain)-1) FROM ProductChars
WHERE LEN(remain)>0
)
SELECT aChar, COUNT(*) FROM ProductChars
GROUP BY aChar
To combine them all into a single row (as asked in the question), change the final SELECT to:
SELECT aChar AS [text()] FROM
(SELECT DISTINCT aChar FROM ProductChars) base
FOR XML PATH('')
The above uses a well-known FOR XML PATH('') hack that emulates MySQL's GROUP_CONCAT.
The first level of recursion is unrolled so that the query doesn't return empty strings in the output.
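The same recursive split runs in SQLite, which makes it easy to try from Python. This is a sketch rather than the SQL Server query: SQLite needs `WITH RECURSIVE` and uses `substr`/`length` instead of LEFT/RIGHT/LEN, and its default BINARY collation is case-sensitive, unlike many SQL Server collations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (productName TEXT)")
conn.executemany("INSERT INTO Products VALUES (?)",
                 [("product1",), ("widget2",), ("nicknack3",)])

# Anchor: first character + remainder of each name (first level unrolled).
# Recursive step: peel the next character off the remainder until it is empty.
query = """
WITH RECURSIVE ProductChars(aChar, remain) AS (
    SELECT substr(productName, 1, 1), substr(productName, 2)
    FROM Products WHERE length(productName) > 0
    UNION ALL
    SELECT substr(remain, 1, 1), substr(remain, 2)
    FROM ProductChars WHERE length(remain) > 0
)
SELECT aChar, COUNT(*) FROM ProductChars GROUP BY aChar ORDER BY aChar
"""
counts = dict(conn.execute(query).fetchall())
distinct = "".join(counts)  # dict keeps the ORDER BY order
print(distinct)  # 123acdegiknoprtuw
```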
Use this (the recursive CTE should work on most CTE-capable RDBMSs; SELECT ... INTO and OPTION (MAXRECURSION 0) are SQL Server-specific):
select x.v into prod from (values('product1'),('widget2'),('nicknack3')) as x(v);
Test Query:
with a as
(
select v, '' as x, 0 as n from prod
union all
select v, substring(v,n+1,1) as x, n+1 as n from a where n < len(v)
)
select v, x, n from a -- where n > 0
order by v, n
option (maxrecursion 0)
Final Query:
with a as
(
select v, '' as x, 0 as n from prod
union all
select v, substring(v,n+1,1) as x, n+1 as n from a where n < len(v)
)
select distinct x from a where n > 0
order by x
option (maxrecursion 0)
Oracle version:
with a(v,x,n) as
(
select v, '' as x, 0 as n from prod
union all
select v, substr(v,n+1,1) as x, n+1 as n from a where n < length(v)
)
select distinct x from a where n > 0
Given that your column is varchar, it can only store characters with codes 0 to 255 in whatever code page you use. If you only use codes 32 to 128 (roughly the printable ASCII range), you can simply check for each of those characters, one by one. The following query does that, looking in sys.objects.name:
with cteDigits as (
select 0 as Number
union all select 1 as Number
union all select 2 as Number
union all select 3 as Number
union all select 4 as Number
union all select 5 as Number
union all select 6 as Number
union all select 7 as Number
union all select 8 as Number
union all select 9 as Number)
, cteNumbers as (
select U.Number + T.Number*10 + H.Number*100 as Number
from cteDigits U
cross join cteDigits T
cross join cteDigits H)
, cteChars as (
select CHAR(Number) as Char
from cteNumbers
where Number between 32 and 128)
select cteChars.Char as [*]
from cteChars
cross apply (
select top(1) *
from sys.objects
where CHARINDEX(cteChars.Char, name, 0) > 0) as o
for xml path('');
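The probe-every-candidate approach, sketched in Python against the sample names from the question instead of sys.objects. Note that Python's `in` is case-sensitive, whereas CHARINDEX follows the collation (often case-insensitive in SQL Server), so the SQL version may also report upper-case letters.

```python
names = ["product1", "widget2", "nicknack3"]

# cteChars: every character with a code between 32 and 128.
candidates = [chr(code) for code in range(32, 129)]

# CROSS APPLY (SELECT TOP(1) ... WHERE CHARINDEX(...) > 0): keep a candidate
# as soon as one name contains it; any() stops at the first match, like TOP(1).
found = "".join(ch for ch in candidates if any(ch in name for name in names))
print(found)  # 123acdegiknoprtuw
```

The cost grows with the size of the candidate set rather than with string length, which is why this only makes sense for a small, known character range.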
If you have a Numbers or Tally table which contains a sequential list of integers you can do something like:
Select Distinct '' + Substring(Products.ProductName, N.Value, 1)
From dbo.Numbers As N
Cross Join dbo.Products
Where N.Value <= Len(Products.ProductName)
For Xml Path('')
If you are using SQL Server 2005 and beyond, you can generate your Numbers table on the fly using a CTE:
With Numbers As
(
Select Row_Number() Over ( Order By c1.object_id ) As Value
From sys.columns As c1
Cross Join sys.columns As c2
)
Select Distinct '' + Substring(Products.ProductName, N.Value, 1)
From Numbers As N
Cross Join dbo.Products
Where N.Value <= Len(Products.ProductName)
For Xml Path('')
Building on mdma's answer, this version also gives you a single string, but decodes the entities that FOR XML produces, such as &amp; -> &.
WITH ProductChars(aChar, remain) AS (
SELECT LEFT(productName,1), RIGHT(productName, LEN(productName)-1)
FROM Products WHERE LEN(productName)>0
UNION ALL
SELECT LEFT(remain,1), RIGHT(remain, LEN(remain)-1) FROM ProductChars
WHERE LEN(remain)>0
)
SELECT (
SELECT N'' + aChar AS [text()]
FROM (SELECT DISTINCT aChar FROM ProductChars) base
ORDER BY aChar
-- TYPE + .value() turns the XML back into text, decoding entities such as &amp;
FOR XML PATH(''), TYPE).value(N'.[1]', N'nvarchar(max)') AS DistinctChars
-- Allow for a lot of recursion. Set to 0 for infinite recursion
OPTION (MAXRECURSION 365)