hive transpose rows to columns - hive

I need to transpose rows to columns.
Input Data
I have a predefined set of expected columns. If a record is present for a column, the corresponding column should be populated with yes; otherwise it defaults to no.
The expected columns are: Col_A, Col_D, Col_X, Col_T, Col_M, Col_E
Output Data
Let me know if you have any questions.

Table transpose (Input data)
+--------+---------+
| col_id | col_val |
+--------+---------+
| axc | col_x |
| bdf | col_f |
| cde | col_x |
| yhc | col_b |
| idx | col_a |
| dft | col_y |
+--------+---------+
Hive query to transpose col_val:
SELECT a.col_id,
       IF(array_contains(collect_list(a.map_values['col_x']), '1'), 'Y', 'N') AS col_x,
       IF(array_contains(collect_list(a.map_values['col_y']), '1'), 'Y', 'N') AS col_y,
       IF(array_contains(collect_list(a.map_values['col_a']), '1'), 'Y', 'N') AS col_a,
       IF(array_contains(collect_list(a.map_values['col_b']), '1'), 'Y', 'N') AS col_b,
       IF(array_contains(collect_list(a.map_values['col_f']), '1'), 'Y', 'N') AS col_f
FROM (
    SELECT col_id,
           col_val,
           map(col_val, '1') AS map_values
    FROM transpose
) a
GROUP BY a.col_id;
Result
+--------+-------+-------+-------+-------+-------+
| col_id | col_x | col_y | col_a | col_b | col_f |
+--------+-------+-------+-------+-------+-------+
| axc | Y | N | N | N | N |
| bdf | N | N | N | N | Y |
| cde | Y | N | N | N | N |
| dft | N | Y | N | N | N |
| idx | N | N | Y | N | N |
| yhc | N | N | N | Y | N |
+--------+-------+-------+-------+-------+-------+
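For anyone who wants to try the same pivot without a Hive cluster, here is a minimal sketch using Python's built-in sqlite3 module. It is an assumption-laden stand-in, not the Hive query above: it uses conditional aggregation (MAX over a CASE expression) instead of map and collect_list, and the names pivot_flags, ROWS and EXPECTED are made up for illustration. The column list is built from a fixed list of names, so only pass trusted identifiers.

```python
import sqlite3

# Sample rows copied from the "transpose" input table above.
ROWS = [
    ("axc", "col_x"), ("bdf", "col_f"), ("cde", "col_x"),
    ("yhc", "col_b"), ("idx", "col_a"), ("dft", "col_y"),
]
# Hypothetical list of expected columns (the ones used in the answer).
EXPECTED = ["col_x", "col_y", "col_a", "col_b", "col_f"]


def pivot_flags(rows, expected):
    """Return one row per col_id with a Y/N flag for each expected column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transpose (col_id TEXT, col_val TEXT)")
    conn.executemany("INSERT INTO transpose VALUES (?, ?)", rows)
    # 'Y' sorts after 'N', so MAX yields 'Y' if any row in the group matches.
    cases = ", ".join(
        f"MAX(CASE WHEN col_val = '{c}' THEN 'Y' ELSE 'N' END) AS {c}"
        for c in expected
    )
    sql = f"SELECT col_id, {cases} FROM transpose GROUP BY col_id ORDER BY col_id"
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in pivot_flags(ROWS, EXPECTED):
        print(row)
```

The same MAX(CASE ...) pattern should carry over to Hive, since it relies only on GROUP BY and string comparison, but it has not been run there.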

Related

MS Access SQL Group By values of one column, select maximum value in the second column and value in front of it in the third column

I have a table with 3 columns:
| A | B | C |
--------------
| 1 | C1 | N |
| 1 | C1 | N |
| 1 | C1 | N |
| 2 | C1 | N |
| 2 | C1 | Y |
| 2 | C1 | N |
| 3 | C1 | N |
| 3 | C2 | N |
| 3 | C3 | N |
| 4 | C1 | Y |
| 4 | C2 | N |
| 4 | C2 | N |
| 4 | C3 | N |
I want to group by A as below:
| A | B | C |
--------------
| 1 | C1 | N |
| 2 | C1 | Y |
| 3 | C3 | N |
| 4 | C3 | N |
For each unique A, keep the row with the maximum B and the C that goes with that B. If all B values for a unique A are the same but one of the rows has C = "Y", then keep that row.
The code is in MS Access, so I need the actual SQL query.
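To pin down the rule before translating it to Access SQL, here is a small Python sketch of the selection logic as I read it from the example: per A, take the largest B, and among rows with that B prefer C = "Y". The function name keep_max_rows is hypothetical, and this is not an Access query.

```python
def keep_max_rows(rows):
    """For each A keep the row with the largest B, preferring C == 'Y' on ties."""
    best = {}
    for a, b, c in rows:
        # Compare on B first; the boolean breaks ties in favour of 'Y'.
        key = (b, c == "Y")
        if a not in best or key > best[a][0]:
            best[a] = (key, (a, b, c))
    return [best[a][1] for a in sorted(best)]


# Input table from the question.
TABLE = [
    (1, "C1", "N"), (1, "C1", "N"), (1, "C1", "N"),
    (2, "C1", "N"), (2, "C1", "Y"), (2, "C1", "N"),
    (3, "C1", "N"), (3, "C2", "N"), (3, "C3", "N"),
    (4, "C1", "Y"), (4, "C2", "N"), (4, "C2", "N"), (4, "C3", "N"),
]

if __name__ == "__main__":
    for row in keep_max_rows(TABLE):
        print(row)
```

In SQL terms this corresponds to taking Max(B) per A and then Max(C) among the rows with that B, since "Y" sorts after "N".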

Hive LIMIT changes GROUP BY result

SELECT `col1`
, `col2`
, count(*)
FROM `tab1`
GROUP BY `col1`
, `col2`
limit 10;
+-------+-------+--------+
| col1 | col2 | _c2 |
+-------+-------+--------+
| A | A | 1 |
| A | B | 34241 |
| A | C | 12345 |
| A | D | 145 |
| A | E | 26 |
| A | F | 224547 |
| B | A | 1429 |
| B | B | 25 |
| B | C | 94 |
| B | D | 1 |
+-------+-------+--------+
If I take one of the results from that, and do a specific query for that combination, the result changes.
SELECT `col1`
, `col2`
, count(*)
FROM `tab1`
WHERE `col1`='A'
AND `col2`='B'
GROUP BY `col1`
, `col2`;
+-------+-------+--------+
| col1 | col2 | _c2 |
+-------+-------+--------+
| A | B | 38944 |
+-------+-------+--------+
If I run set hive.map.aggr=true; then I get a different count, somewhere in between the two.
Any ideas why this happens or how to fix it?
If I run the same query with LIMIT 20, it gives the right count. Or rather, the same count as the WHERE query; I haven't counted the rows myself to check that it is correct!

SQL insert all data to bridge table, (many to many)

I have two tables like the ones below:
X table:
+---+----------+
| id| value |
+---+----------+
| 1 | x value1 |
+---+----------+
| 2 | x value2 |
+---+----------+
| 3 | x value3 |
+---+----------+
Y table:
+---+----------+
| id| value |
+---+----------+
| 1 | y value1 |
+---+----------+
| 2 | y value2 |
+---+----------+
| 3 | y value3 |
+---+----------+
I have also created a new table (x_y) that has foreign keys to the x and y tables.
I want to add every combination of x and y rows to the new table, like below:
x_y table
+----+------+------+
| id | x_id | y_id |
+----+------+------+
| 1 | 1 | 1 |
+----+------+------+
| 2 | 1 | 2 |
+----+------+------+
| 3 | 1 | 3 |
+----+------+------+
| 4 | 2 | 1 |
+----+------+------+
| 5 | 2 | 2 |
+----+------+------+
| 6 | 2 | 3 |
+----+------+------+
| 7 | 3 | 1 |
+----+------+------+
| 8 | 3 | 2 |
+----+------+------+
| 9 | 3 | 3 |
+----+------+------+
How can I add values like this to the third table in a PostgreSQL script?
This can be done with a cross join and a row_number that generates the ids.
select row_number() over (order by x.id, y.id) as id,
       x.id as x_id,
       y.id as y_id
from x
cross join y;
Presumably, the new table is defined with id as a serial column. If so, you would insert the data by doing:
insert into x_y (x_id, y_id)
select x.id, y.id
from x cross join
y
order by x.id, y.id;
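As a quick sanity check of the cross-join insert, here is a minimal sketch that runs the same statement against an in-memory SQLite database from Python. SQLite stands in for PostgreSQL here, and INTEGER PRIMARY KEY AUTOINCREMENT stands in for a serial column; both are assumptions for illustration, as is the function name fill_bridge.

```python
import sqlite3


def fill_bridge(x_ids, y_ids):
    """Populate x_y with every (x, y) pair and return the resulting rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE x (id INTEGER PRIMARY KEY, value TEXT);
        CREATE TABLE y (id INTEGER PRIMARY KEY, value TEXT);
        CREATE TABLE x_y (
            id   INTEGER PRIMARY KEY AUTOINCREMENT,  -- stand-in for serial
            x_id INTEGER REFERENCES x(id),
            y_id INTEGER REFERENCES y(id)
        );
        """
    )
    conn.executemany("INSERT INTO x VALUES (?, ?)",
                     [(i, f"x value{i}") for i in x_ids])
    conn.executemany("INSERT INTO y VALUES (?, ?)",
                     [(i, f"y value{i}") for i in y_ids])
    # Same statement as in the answer: the id is generated by the table.
    conn.execute(
        "INSERT INTO x_y (x_id, y_id) "
        "SELECT x.id, y.id FROM x CROSS JOIN y ORDER BY x.id, y.id"
    )
    rows = conn.execute("SELECT id, x_id, y_id FROM x_y ORDER BY id").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in fill_bridge([1, 2, 3], [1, 2, 3]):
        print(row)
```

With three rows in each table this produces nine bridge rows, one per pair.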

Create Sequence only for flag N

I want to create a row number only when flag = 'N'; alternatively, I am okay with sorting the data by the flags and ranking the rows.
For example:
+-----+-------+-------+
| ID | Flag1 | Flag2 |
+-----+-------+-------+
| 100 | N | N |
| 100 | N | N |
| 100 | Y | N |
| 100 | N | Y |
| 101 | N | N |
| 101 | N | Y |
+-----+-------+-------+
Output:
+---------+-----+-------+-------+
| Seq_num | ID | flag1 | flag2 |
+---------+-----+-------+-------+
| 1 | 100 | N | N |
| 2 | 100 | N | N |
| 3 | 100 | Y | N |
| 4 | 100 | N | Y |
| 1 | 101 | N | N |
| 2 | 101 | N | Y |
+---------+-----+-------+-------+
I have written a query using row_number() and partition by, but it does not check the flags.
Basically, I need to sort the data by the flags first, and if either or both of the flags are Y, sort those rows last.
How can I do this?
You are on the right track with row_number() and partition by; the following query should work:
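-- Table variable holding the sample data (table variables are named with @, not #)
declare @tmp table(ID int, Flag1 char(1), Flag2 char(1))
insert into @tmp values
(100, 'N','N')
,(100, 'N','N')
,(100, 'Y','N')
,(100, 'N','Y')
,(101, 'N','N')
,(101, 'N','Y')
-- Number the rows within each ID; 'N' sorts before 'Y', so rows with a Y flag come last
select row_number() over(partition by ID order by ID, flag2, flag1) as Seq_num,
ID,
flag1,
flag2
from @tmp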
Results:

Access Query for Ranking/Assigning Priority Values

I am doing a data conversion from a previous system whose data was keyed in without validation rules. I am working with a table of Emergency Contacts and trying to assign a primary contact (Y/N) when the field is blank or duplicated (i.e. when someone has put Y or N for multiple contacts, I want to assign the primary arbitrarily). I will also add a new column with an alphabetic sequence (a, b, c, etc.) based on the priority designated in the other column.
Every ID must have exactly one contact with pri_cont = 'Y'.
Current Table:
+--------+---------+----------+
| id | fname | pri_cont |
+--------+---------+----------+
| 001000 | Rox | Y |
| 001000 | Dan | N |
| 001002 | May | Y |
| 001007 | Lee | Y |
| 001007 | Clive | Y |
| 001008 | Max | Y |
| 001008 | Kim | N |
| 001013 | Sam | Y |
| 001013 | Ann | |
| 001014 | Nat | Y |
| 001018 | Bruce | Y |
| 001018 | Mel | |
| 001020 | Wilson | Y |
| 001022 | Goi | Y |
| 001022 | Adele | N |
| 001022 | Gary | N |
+--------+---------+----------+
What I want:
+--------+---------+----------+----------+
| id | fname | pri_cont | priority |
+--------+---------+----------+----------+
| 001000 | Rox | Y | a |
| 001000 | Dan | N | b |
| 001002 | May | Y | a |
| 001007 | Lee | Y | a |
| 001007 | Clive | N | b |
| 001008 | Max | Y | a |
| 001008 | Kim | N | b |
| 001013 | Sam | Y | a |
| 001013 | Ann | N | b |
| 001014 | Nat | Y | a |
| 001018 | Bruce | Y | a |
| 001018 | Mel | N | b |
| 001020 | Wilson | Y | a |
| 001022 | Goi | Y | a |
| 001022 | Adele | N | b |
| 001022 | Gary | N | c |
+--------+---------+----------+----------+
How can I do that?
As I see it, your cleanup requires several queries (note that the queries assume the Emergency Contacts table has a unique autonumber column, dbID):
First, one select query counts the Y and N instances. The same query can also calculate the Priority column by using Chr to convert numbers to letters (Chr(97) is 'a', hence the + 96). The update queries further down refer to this query as EmergContPrCount:
SELECT t1.ID, t1.fname, t1.pri_cont,
(SELECT Count(*)
FROM EmergContacts t2
WHERE t1.dbID >= t2.dbID AND t1.ID = t2.ID
AND t1.pri_cont = t2.pri_cont AND t1.pri_cont = 'Y') AS YCount,
(SELECT Count(*)
FROM EmergContacts t3
WHERE t1.dbID >= t3.dbID AND t1.ID = t3.ID
AND t1.pri_cont = t3.pri_cont AND t1.pri_cont = 'N') AS NCount,
(SELECT Chr(Count(t2.ID) + 96)
FROM EmergContacts t2
WHERE t1.dbID >= t2.dbID AND t1.ID = t2.ID) AS Priority
FROM EmergContacts AS t1;
With output such as below:
ID | fname | pri_cont | YCount | NCount | Priority
1000 | Rox | Y | 1 | 0 | a
1000 | Dan | N | 0 | 1 | b
1002 | May | Y | 1 | 0 | a
1007 | Lee | Y | 1 | 0 | a
1007 | Clive | Y | 2 | 1 | b
1008 | Max | Y | 1 | 0 | a
1008 | Kim | N | 0 | 1 | b
1013 | Sam | Y | 1 | 0 | a
1013 | Ann | | 0 | 1 | b
1014 | Nat | Y | 1 | 0 | a
1018 | Bruce | Y | 1 | 0 | a
1018 | Mel | | 0 | 1 | b
1020 | Wilson | Y | 1 | 0 | a
1022 | Goi | Y | 1 | 0 | a
1022 | Adele | N | 0 | 1 | b
1022 | Gary | N | 0 | 2 | c
From there you run three update queries:
To clean up Nulls:
UPDATE EmergContacts
SET pri_cont = 'N'
WHERE pri_cont Is Null;
To clean up IDs with more than one Y:
UPDATE EmergContacts
SET pri_cont = 'N'
WHERE ID IN (SELECT ID FROM EmergContPrCount WHERE YCount > 1)
AND fName IN (SELECT fName FROM EmergContPrCount WHERE YCount > 1);
And to clean up IDs with no Ys:
UPDATE EmergContacts
SET pri_cont = 'Y'
WHERE (ID IN (SELECT ID FROM EmergContPrCount WHERE YCount = 0)
AND fName IN (SELECT Max(fName) FROM EmergContPrCount WHERE YCount = 0));
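To make the target of the cleanup concrete, here is a rough Python sketch of the rules as described in the question: the first 'Y' per id stays primary, blanks and extra 'Y' values become 'N', and letters are assigned with chr in the same way Chr(n + 96) is used above. It is not a translation of the Access queries. The function name clean_contacts is hypothetical, and promoting the first contact when an id has no 'Y' is an arbitrary assumption (the update above uses Max(fName) instead).

```python
from itertools import groupby

# "Current Table" from the question; "" stands for a blank pri_cont.
CONTACTS = [
    ("001000", "Rox", "Y"), ("001000", "Dan", "N"), ("001002", "May", "Y"),
    ("001007", "Lee", "Y"), ("001007", "Clive", "Y"), ("001008", "Max", "Y"),
    ("001008", "Kim", "N"), ("001013", "Sam", "Y"), ("001013", "Ann", ""),
    ("001014", "Nat", "Y"), ("001018", "Bruce", "Y"), ("001018", "Mel", ""),
    ("001020", "Wilson", "Y"), ("001022", "Goi", "Y"),
    ("001022", "Adele", "N"), ("001022", "Gary", "N"),
]


def clean_contacts(rows):
    """Return (id, fname, pri_cont, priority) with exactly one 'Y' per id."""
    cleaned = []
    # groupby expects rows for the same id to be adjacent, as in the sample.
    for cid, group in groupby(rows, key=lambda r: r[0]):
        group = list(group)
        # First 'Y' wins; with no 'Y' at all, fall back to the first contact.
        primary = next((i for i, r in enumerate(group) if r[2] == "Y"), 0)
        ordered = [group[primary]] + group[:primary] + group[primary + 1:]
        for pos, (_, fname, _) in enumerate(ordered):
            flag = "Y" if pos == 0 else "N"
            # chr(97) is 'a', so position 0 -> 'a', 1 -> 'b', and so on.
            cleaned.append((cid, fname, flag, chr(97 + pos)))
    return cleaned


if __name__ == "__main__":
    for row in clean_contacts(CONTACTS):
        print(row)
```

Running this over the sample data gives the rows from the "What I want" table, which can serve as a reference when checking the output of the Access queries.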