Hive - how to perform operation on state change

I have two columns in my Hive table. col2 contains the values 'Y' and 'N'. I would like to loop through the table and, every time there is a state change (from N to Y or from Y to N), insert a new row into another table. How do I do that using HiveQL? Thanks guys!

insert into my_target
select ...
from (select col2
,lag (col2) over (order by ...) as prev_col2
from my_source
) t
where t.col2 <> prev_col2
;
Demo
with my_source as (select posexplode(split('YYNYNNNYYY','\\B')) as (col1,col2))
select * from my_source;
+----------------+----------------+
| my_source.col1 | my_source.col2 |
+----------------+----------------+
| 0 | Y |
+----------------+----------------+
| 1 | Y |
+----------------+----------------+
| 2 | N |
+----------------+----------------+
| 3 | Y |
+----------------+----------------+
| 4 | N |
+----------------+----------------+
| 5 | N |
+----------------+----------------+
| 6 | N |
+----------------+----------------+
| 7 | Y |
+----------------+----------------+
| 8 | Y |
+----------------+----------------+
| 9 | Y |
+----------------+----------------+
with my_source as (select posexplode(split('YYNYNNNYYY','\\B')) as (col1,col2))
select *
from (select col1
,col2
,lag (col2) over (order by col1) as prev_col2
from my_source
) t
where t.col2 <> prev_col2
;
+--------+--------+-------------+
| t.col1 | t.col2 | t.prev_col2 |
+--------+--------+-------------+
| 2 | N | Y |
+--------+--------+-------------+
| 3 | Y | N |
+--------+--------+-------------+
| 4 | N | Y |
+--------+--------+-------------+
| 7 | Y | N |
+--------+--------+-------------+
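To try the full insert end to end without a Hive cluster, here is a minimal sketch that runs the same LAG pattern in an in-memory SQLite database from Python. SQLite (3.25 or later, for window functions) is only a stand-in for Hive, and the three-column layout of my_target is an assumption made for illustration; the source data is the demo string from above.

```python
import sqlite3

# Assumption: SQLite stands in for Hive; my_target's columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_source (col1 INTEGER, col2 TEXT)")
conn.execute("CREATE TABLE my_target (col1 INTEGER, col2 TEXT, prev_col2 TEXT)")

# Same data as the demo: position in the string becomes col1.
conn.executemany(
    "INSERT INTO my_source VALUES (?, ?)",
    list(enumerate("YYNYNNNYYY")),
)

# LAG yields NULL for the first row, and NULL <> 'Y' is not true,
# so the first row is never reported as a state change.
conn.execute("""
    INSERT INTO my_target
    SELECT col1, col2, prev_col2
    FROM (SELECT col1,
                 col2,
                 LAG(col2) OVER (ORDER BY col1) AS prev_col2
          FROM my_source) t
    WHERE t.col2 <> t.prev_col2
""")

changes = conn.execute("SELECT * FROM my_target ORDER BY col1").fetchall()
for row in changes:
    print(row)
```

The rows that land in my_target should match the four rows of the demo result above. Note that the window needs a real ordering column (col1 here); without one, "previous row" is not well defined.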

Related

Autonumber rows in select SQL based on column changes

I use a select all statement to retrieve all values from table A. Table A sample is the following:
+---+----+---+
| a | 23 | X |
+---+----+---+
| a | 23 | Y |
+---+----+---+
| a | 24 | X |
+---+----+---+
| a | 24 | Y |
+---+----+---+
| b | 24 | X |
+---+----+---+
| b | 24 | Y |
+---+----+---+
| b | 25 | X |
+---+----+---+
| b | 25 | Y |
+---+----+---+
| b | 25 | Z |
+---+----+---+
For use at a later stage of this query, I would like to have a record number for each unique combination of columns 1 and 2. For example:
+---+----+---+---+
| a | 23 | X | 1 |
+---+----+---+---+
| a | 23 | Y | 2 |
+---+----+---+---+
| a | 24 | X | 1 |
+---+----+---+---+
| a | 24 | Y | 2 |
+---+----+---+---+
| b | 24 | X | 1 |
+---+----+---+---+
| b | 24 | Y | 2 |
+---+----+---+---+
| b | 25 | X | 1 |
+---+----+---+---+
| b | 25 | Y | 2 |
+---+----+---+---+
| b | 25 | Z | 3 |
+---+----+---+---+
Is this possible to do with SQL and how?
The description of your problem would use dense_rank():
select t.*, dense_rank() over (order by col1, col2)
from t;
Your sample data suggests dense_rank() with partition by:
select t.*,
dense_rank() over (partition by col1, col2 order by col3) as seqnum
from t;
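To see how the two variants differ, here is a small sketch that runs both on the sample data in an in-memory SQLite database from Python. SQLite is only a stand-in for whatever database the question uses, and the column names col1, col2, col3 and the aliases group_no and seqnum are assumptions.

```python
import sqlite3

# Assumption: SQLite (3.25+) as a stand-in; column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT, col2 INTEGER, col3 TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("a", 23, "X"), ("a", 23, "Y"),
    ("a", 24, "X"), ("a", 24, "Y"),
    ("b", 24, "X"), ("b", 24, "Y"),
    ("b", 25, "X"), ("b", 25, "Y"), ("b", 25, "Z"),
])

numbered = conn.execute("""
    SELECT col1, col2, col3,
           -- one number per (col1, col2) combination
           DENSE_RANK() OVER (ORDER BY col1, col2) AS group_no,
           -- numbering that restarts inside each combination
           DENSE_RANK() OVER (PARTITION BY col1, col2 ORDER BY col3) AS seqnum
    FROM t
    ORDER BY col1, col2, col3
""").fetchall()

for row in numbered:
    print(row)
```

The seqnum column reproduces the numbering in the expected output of the question, while group_no matches its wording (one number per unique combination).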
I believe all you need is
SELECT Distinct
ROW_NUMBER() OVER(ORDER BY Col1 ASC,Col2 ASC) AS Row_num,
Col1, Col2
FROM Table1

hive transpose rows to columns

I need to transpose rows to columns.
Input Data
I have a predefined set of expected columns. If a matching record is present, the corresponding column should be populated with yes; otherwise it defaults to no.
The expected columns are: Col_A, Col_D, Col_X, Col_T, Col_M, Col_E
Output Data
Let me know if you have any questions.
Table transpose (Input data)
+--------+---------+
| col_id | col_val |
+--------+---------+
| axc | col_x |
| bdf | col_f |
| cde | col_x |
| yhc | col_b |
| idx | col_a |
| dft | col_y |
+--------+---------+
Hive Query to transpose col_val :
SELECT a.col_id,
IF(array_contains(collect_list(a.map_values['col_x']),'1'),'Y','N') AS col_x,
IF(array_contains(collect_list(a.map_values['col_y']),'1'),'Y','N') AS col_y,
IF(array_contains(collect_list(a.map_values['col_a']),'1'),'Y','N') AS col_a,
IF(array_contains(collect_list(a.map_values['col_b']),'1'),'Y','N') AS col_b,
IF(array_contains(collect_list(a.map_values['col_f']),'1'),'Y','N') AS col_f
FROM (
SELECT col_id,
col_val,
map(col_val, '1') map_values
FROM transpose) a
GROUP BY a.col_id;
Result
+--------+-------+-------+-------+-------+-------+
| col_id | col_x | col_y | col_a | col_b | col_f |
+--------+-------+-------+-------+-------+-------+
| axc | Y | N | N | N | N |
| bdf | N | N | N | N | Y |
| cde | Y | N | N | N | N |
| dft | N | Y | N | N | N |
| idx | N | N | Y | N | N |
| yhc | N | N | N | Y | N |
+--------+-------+-------+-------+-------+-------+
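For comparison, the same Y/N flags can also be produced with plain conditional aggregation, which is a different technique from the map/collect_list query above. A minimal sketch, run in an in-memory SQLite database from Python as a stand-in for Hive (an assumption; the MAX(CASE ...) pattern itself is standard SQL), with the list of expected columns taken from the answer:

```python
import sqlite3

# Assumption: SQLite as a stand-in for Hive; table name follows the answer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transpose (col_id TEXT, col_val TEXT)")
conn.executemany("INSERT INTO transpose VALUES (?, ?)", [
    ("axc", "col_x"), ("bdf", "col_f"), ("cde", "col_x"),
    ("yhc", "col_b"), ("idx", "col_a"), ("dft", "col_y"),
])

# Predefined output columns; one MAX(CASE ...) expression per column.
# 'Y' sorts after 'N', so MAX yields 'Y' if any row in the group matches.
expected_cols = ["col_x", "col_y", "col_a", "col_b", "col_f"]
flag_exprs = ",\n       ".join(
    f"MAX(CASE WHEN col_val = '{c}' THEN 'Y' ELSE 'N' END) AS {c}"
    for c in expected_cols
)
query = (
    f"SELECT col_id,\n       {flag_exprs}\n"
    "FROM transpose\nGROUP BY col_id\nORDER BY col_id"
)

pivot = conn.execute(query).fetchall()
for row in pivot:
    print(row)
```

An expected column with no matching rows simply comes out as 'N' everywhere, which is the default the question asks for.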

Round down to nearest of Multiple of N

I have a SQL table as follows:
+------+------+------+------+
| col1 | col2 | col3 | col4 |
+------+------+------+------+
| a    | 3    | d1   | 10   |
| a    | 6    | d2   | 15   |
| b    | 2    | d2   | 8    |
| b    | 30   | d1   | 50   |
+------+------+------+------+
I would like to transform the above table into the one below, where the transformation is
col4 = col4 - (col4 % min(col2) group by col1)
+------+------+------+------+
| col1 | col2 | col3 | col4 |
+------+------+------+------+
| a    | 3    | d1   | 9    |
| a    | 6    | d2   | 15   |
| b    | 2    | d2   | 8    |
| b    | 30   | d1   | 50   |
+------+------+------+------+
I could read the above table in application code and do the transformation manually, but I was wondering whether it is possible to offload the transformation to SQL.
Just run a simple select query for this:
select col1, col2, col3,
col4 - (col4 % min(col2) over (partition by col1)) as col4
from t;
There is no need to actually modify the table.
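As a quick check of the arithmetic, here is a small sketch that runs the window-function select on the sample data in an in-memory SQLite database from Python. SQLite (3.25 or later) is an assumed stand-in, since the question does not name a database.

```python
import sqlite3

# Assumption: SQLite as a stand-in; table name t follows the answer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT, col2 INTEGER, col3 TEXT, col4 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    ("a", 3, "d1", 10),
    ("a", 6, "d2", 15),
    ("b", 2, "d2", 8),
    ("b", 30, "d1", 50),
])

# MIN(col2) per col1 group is 3 for 'a' and 2 for 'b';
# subtracting the remainder rounds col4 down to a multiple of it.
rounded = conn.execute("""
    SELECT col1, col2, col3,
           col4 - (col4 % MIN(col2) OVER (PARTITION BY col1)) AS col4
    FROM t
    ORDER BY col1, col2
""").fetchall()

for row in rounded:
    print(row)
```

For group a the minimum is 3, so 10 becomes 9 and 15 stays 15; for group b the minimum is 2, so 8 and 50 are already multiples and stay unchanged.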
You can use a multi-table UPDATE to achieve your desired result, joining your table to a table of MIN(col2) values:
UPDATE table1
SET col4 = col4 - (col4 % t2.col2min)
FROM (SELECT col1, MIN(col2) AS col2min
FROM table1
GROUP BY col1) t2
WHERE table1.col1 = t2.col1
Output:
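+------+------+------+------+
| col1 | col2 | col3 | col4 |
+------+------+------+------+
| a    | 3    | d1   | 9    |
| a    | 6    | d2   | 15   |
| b    | 2    | d2   | 8    |
| b    | 30   | d1   | 50   |
+------+------+------+------+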

SQL insert all data to bridge table, (many to many)

I have two tables like below:
X table:
+---+----------+
| id| value |
+---+----------+
| 1 | x value1 |
+---+----------+
| 2 | x value2 |
+---+----------+
| 3 | x value3 |
+---+----------+
Y table:
+---+----------+
| id| value |
+---+----------+
| 1 | y value1 |
+---+----------+
| 2 | y value2 |
+---+----------+
| 3 | y value3 |
+---+----------+
I have created a new table (x_y table) which has foreign keys to the x and y tables.
I want to add every combination of x and y rows to the new table, like below:
x_y table
+----+------+------+
| id | x_id | y_id |
+----+------+------+
| 1 | 1 | 1 |
+----+------+------+
| 2 | 1 | 2 |
+----+------+------+
| 3 | 1 | 3 |
+----+------+------+
| 4 | 2 | 1 |
+----+------+------+
| 5 | 2 | 2 |
+----+------+------+
| 6 | 2 | 3 |
+----+------+------+
| 7 | 3 | 1 |
+----+------+------+
| 8 | 3 | 2 |
+----+------+------+
| 9 | 3 | 3 |
+----+------+------+
How can I add values like this to the third table with a PostgreSQL script?
This can be done with a cross join and a row_number that generates the ids.
select row_number() over(order by x.id,y.id) as id,x.id as x_id,y.id as y_id
from x
cross join y
Presumably, the new table is defined with id as a serial column. If so, you would insert the data by doing:
insert into x_y (x_id, y_id)
select x.id, y.id
from x cross join
y
order by x.id, y.id;
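Here is a minimal sketch of the insert above, run in an in-memory SQLite database from Python. SQLite is only a stand-in for PostgreSQL, and INTEGER PRIMARY KEY AUTOINCREMENT is used as an assumed substitute for a serial column.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; AUTOINCREMENT replaces serial.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (id INTEGER PRIMARY KEY, value TEXT);
    CREATE TABLE y (id INTEGER PRIMARY KEY, value TEXT);
    CREATE TABLE x_y (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,
        x_id INTEGER REFERENCES x(id),
        y_id INTEGER REFERENCES y(id)
    );
""")
conn.executemany("INSERT INTO x VALUES (?, ?)",
                 [(i, f"x value{i}") for i in (1, 2, 3)])
conn.executemany("INSERT INTO y VALUES (?, ?)",
                 [(i, f"y value{i}") for i in (1, 2, 3)])

# The cross join pairs every x row with every y row (3 x 3 = 9 rows);
# the id column is filled in by the database as rows are inserted.
conn.execute("""
    INSERT INTO x_y (x_id, y_id)
    SELECT x.id, y.id
    FROM x CROSS JOIN y
    ORDER BY x.id, y.id
""")

bridge = conn.execute("SELECT id, x_id, y_id FROM x_y ORDER BY id").fetchall()
for row in bridge:
    print(row)
```

If the generated ids must follow a specific order regardless of how the database processes the insert, the row_number() variant from the first answer makes that order explicit.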

MS-Access: Merge two tables "below" each other

I have two tables in my Access-database. They look something like this:
Table1
+--------------+----------+----------+----------+
| Kabelnummer | Column1 | Column2 | Column3 |
+--------------+----------+----------+----------+
| 1 | x | x | x |
+--------------+----------+----------+----------+
| 2 | x | x | x |
+--------------+----------+----------+----------+
| 3 | x | x | x |
+--------------+----------+----------+----------+
| 4 | x | x | x |
+--------------+----------+----------+----------+
table2
+--------------+----------+----------+----------+
| Kabelnummer | Column1 | Column2 | Column3 |
+--------------+----------+----------+----------+
| 1 | x | x | x |
+--------------+----------+----------+----------+
| 2 | x | x | x |
+--------------+----------+----------+----------+
| 3 | x | x | x |
+--------------+----------+----------+----------+
| 4 | x | x | x |
+--------------+----------+----------+----------+
I need a query that gives me 1 table with the data from table1 added to the data from table2:
TableTotal
+--------------+----------+----------+----------+
| Kabelnummer | Column1 | Column2 | Column3 |
+--------------+----------+----------+----------+
| 1 | x | x | x |
+--------------+----------+----------+----------+
| 2 | x | x | x |
+--------------+----------+----------+----------+
| 3 | x | x | x |
+--------------+----------+----------+----------+
| 4 | x | x | x |
+--------------+----------+----------+----------+
| 1 | x | x | x |
+--------------+----------+----------+----------+
| 2 | x | x | x |
+--------------+----------+----------+----------+
| 3 | x | x | x |
+--------------+----------+----------+----------+
| 4 | x | x | x |
+--------------+----------+----------+----------+
The names "Column1", "Column2" and "Column3" are the same in both tables
SELECT *
FROM Table1
UNION
SELECT *
FROM table2;
The question asks for non-distinct values, while the current answers return distinct values. The query below keeps duplicates:
SELECT *
FROM Table1
UNION ALL
SELECT *
FROM table2;
This is often more efficient than the UNION method, particularly with large data sets, because it does not have to compute the distinct rows.
If your goal is to append the second table to the first one, it can be achieved this way:
INSERT INTO TABLE1 SELECT * FROM TABLE2;
The caveat with the other queries is that, while they do the job, they produce a third, combined result rather than appending the data to Table1.
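To make the difference between the three approaches concrete, here is a small sketch using an in-memory SQLite database from Python. SQLite is an assumed stand-in for Access, and the sample rows mirror the tables in the question (identical rows in both tables).

```python
import sqlite3

# Assumption: SQLite stands in for Access; both tables hold identical rows.
conn = sqlite3.connect(":memory:")
for name in ("Table1", "table2"):
    conn.execute(
        f"CREATE TABLE {name} "
        "(Kabelnummer INTEGER, Column1 TEXT, Column2 TEXT, Column3 TEXT)"
    )
    conn.executemany(
        f"INSERT INTO {name} VALUES (?, 'x', 'x', 'x')",
        [(n,) for n in range(1, 5)],
    )

# UNION removes duplicate rows, so the identical rows collapse.
union_rows = conn.execute(
    "SELECT * FROM Table1 UNION SELECT * FROM table2"
).fetchall()

# UNION ALL keeps every row from both tables.
union_all_rows = conn.execute(
    "SELECT * FROM Table1 UNION ALL SELECT * FROM table2"
).fetchall()

# INSERT ... SELECT appends table2's rows to Table1 itself.
conn.execute("INSERT INTO Table1 SELECT * FROM table2")
appended_count = conn.execute("SELECT COUNT(*) FROM Table1").fetchone()[0]

print(len(union_rows), len(union_all_rows), appended_count)
```

With the sample data, only UNION ALL and the INSERT give the eight rows shown in TableTotal; UNION returns four because every row in table2 duplicates a row in Table1.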