How to transpose data to hive


This is my Hive table:
dept_id  emp_cnt  emp_cnt_prev_yr  sales_cnt  sales_cnt_prev_yr
1        10       8                10000      5000
2        15       9                20000      12000
3        12       10               30000      15000
4        6        12               40000      20000
I want to store the data in another Hive table, like below:
dept_id  metric_nm  metric_val  metric_val_prev_yr
1        emp_cnt    10          8
2        emp_cnt    15          9
3        emp_cnt    12          10
4        emp_cnt    6           12
1        sales_cnt  10000       5000
2        sales_cnt  20000       12000
3        sales_cnt  30000       15000
4        sales_cnt  40000       20000
What I tried so far:
SELECT dept_id,
       metric_nm,
       Substr(metrics, 1, Locate('#', metrics, 1) - 1) AS metric_val,
       Substr(metrics, Locate('#', metrics, 1) + 1) AS metric_val_prev_yr
FROM (
    SELECT dept_id,
           Map('emp_cnt', Concat(emp_cnt, '#', emp_cnt_prev_yr),
               'sales_cnt', Concat(sales_cnt, '#', sales_cnt_prev_yr)) AS metric
    FROM <TABLE>
) a
LATERAL VIEW explode(metric) ext AS metric_nm, metrics;

Use UNION ALL to combine the two per-metric datasets into a single one:
insert overwrite table table_name
select dept_id, 'emp_cnt' as metric_nm,
emp_cnt as metric_val, emp_cnt_prev_yr as metric_val_prev_yr
from your_table
UNION ALL
select dept_id, 'sales_cnt' as metric_nm,
sales_cnt as metric_val, sales_cnt_prev_yr as metric_val_prev_yr
from your_table;
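As a quick sanity check of the UNION ALL reshaping, here is a minimal Python sketch that runs the same SELECT against an in-memory SQLite database instead of Hive. Using SQLite is an assumption made purely for illustration, and the Hive-specific INSERT OVERWRITE part is left out; only the row reshaping is shown.

```python
import sqlite3

# In-memory SQLite stands in for Hive here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table ("
    "dept_id INTEGER, emp_cnt INTEGER, emp_cnt_prev_yr INTEGER, "
    "sales_cnt INTEGER, sales_cnt_prev_yr INTEGER)"
)
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, 8, 10000, 5000),
        (2, 15, 9, 20000, 12000),
        (3, 12, 10, 30000, 15000),
        (4, 6, 12, 40000, 20000),
    ],
)

# One SELECT per metric, glued together with UNION ALL.
UNPIVOT_SQL = """
SELECT dept_id, 'emp_cnt' AS metric_nm,
       emp_cnt AS metric_val, emp_cnt_prev_yr AS metric_val_prev_yr
FROM your_table
UNION ALL
SELECT dept_id, 'sales_cnt' AS metric_nm,
       sales_cnt AS metric_val, sales_cnt_prev_yr AS metric_val_prev_yr
FROM your_table
"""

rows = conn.execute(UNPIVOT_SQL).fetchall()
# Sort in Python so the printed order does not depend on the engine.
rows.sort(key=lambda r: (r[1], r[0]))
for row in rows:
    print(row)
```

Each source row appears once per metric, so the 4 departments yield 8 rows in the shape of the desired table.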
Another approach: cross join with a stack() of metric_nm values to multiply each row by the number of metric_nm values. This CROSS JOIN will be executed as a map join:
--configuration
set hive.cli.print.header=true;
set hive.execution.engine=tez;
set hive.mapred.reduce.tasks.speculative.execution=false;
set mapred.reduce.tasks.speculative.execution=false;
set hive.exec.parallel=true;
set hive.exec.parallel.thread.number=36;
set hive.vectorized.execution.enabled=true;
set hive.vectorized.execution.reduce.enabled=true;
set hive.auto.convert.join=true; --this enables map-join
select dept_id, s.metric_nm,
case s.metric_nm when 'emp_cnt' then emp_cnt
when 'sales_cnt' then sales_cnt
--add more cases
end as metric_val,
case s.metric_nm when 'emp_cnt' then emp_cnt_prev_yr
when 'sales_cnt' then sales_cnt_prev_yr
--add more cases
end as metric_val_prev_yr
from your_table
cross join
(select stack (2, --number of values, add more
'sales_cnt',
'emp_cnt'
--add more values
) as metric_nm
)s

Related

Value Calculation on oracle sql

I have this table:
CREATE TABLE TEST
(
TITLE VARCHAR2(199 BYTE),
AMOUNT NUMBER,
VALUE NUMBER
)
and this INSERT statement:
INSERT INTO TEST (TITLE, AMOUNT, VALUE)
VALUES ('Switch', 3000, 12);
COMMIT;
We have amount = 3000 and value = 12, and we need to calculate the amount multiplied by each number from 1 up to 12.
So:
3000 multiplied by 1 = 3000
3000 multiplied by 2 = 6000
3000 multiplied by 3 = 9000
3000 multiplied by 4 = 12000
3000 multiplied by 5 = 15000
3000 multiplied by 6 = 18000
3000 multiplied by 7 = 21000
3000 multiplied by 8 = 24000
3000 multiplied by 9 = 27000
3000 multiplied by 10 = 30000
3000 multiplied by 11 = 33000
3000 multiplied by 12 = 36000
Regards
Output is needed in the following format.
Title Amount Total
Switch 30000 3000 6000 9000 12000 15000 18000 21000 24000 27000 30000 33000 36000 231000
plug
board
Can somebody help me get this output in SQL?

You can use a recursive query:
WITH data (title, amount, value, idx) AS (
SELECT title, amount, value, 1
FROM test
UNION ALL
SELECT title, amount, value, idx + 1
FROM data
WHERE idx < value
) SEARCH DEPTH FIRST BY title SET order_num
SELECT title, amount * idx AS value
FROM data;
Or a correlated hierarchical query:
SELECT t.title, t.amount * l.idx AS value
FROM test t
CROSS JOIN LATERAL (
SELECT LEVEL AS idx FROM DUAL CONNECT BY LEVEL <= t.value
) l;
Which, for the sample data:
CREATE TABLE TEST ( TITLE VARCHAR2(199 BYTE), AMOUNT NUMBER, VALUE NUMBER );
INSERT INTO TEST ( TITLE, AMOUNT, VALUE ) VALUES ( 'Switch', 3000, 12);
Both output:
TITLE   VALUE
Switch  3000
Switch  6000
Switch  9000
Switch  12000
Switch  15000
Switch  18000
Switch  21000
Switch  24000
Switch  27000
Switch  30000
Switch  33000
Switch  36000
Or for your output format:
WITH data (title, amount, value, idx) AS (
SELECT title, amount, value, 1
FROM test
UNION ALL
SELECT title, amount, value, idx + 1
FROM data
WHERE idx < value
) SEARCH DEPTH FIRST BY title SET order_num
SELECT title,
LISTAGG(amount * idx, ' ') WITHIN GROUP (ORDER BY idx) AS amounts,
SUM(amount*idx) AS total
FROM data
GROUP BY title;
or
SELECT t.title,
l.amounts,
t.amount * t.value * (t.value + 1) / 2 AS total
FROM test t
CROSS JOIN LATERAL (
SELECT LISTAGG(LEVEL * t.amount, ' ') WITHIN GROUP (ORDER BY LEVEL) AS amounts
FROM DUAL CONNECT BY LEVEL <= t.value
) l;
Which both output:
TITLE   AMOUNTS                                                               TOTAL
Switch  3000 6000 9000 12000 15000 18000 21000 24000 27000 30000 33000 36000  234000
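The closed form used for total in the second query is the arithmetic-series identity amount * (1 + 2 + ... + value) = amount * value * (value + 1) / 2. A small Python check, a sketch with hypothetical function names, confirms that it agrees with summing the individual products for the sample row:

```python
def multiples(amount, value):
    """Return amount*1, amount*2, ..., amount*value."""
    return [amount * i for i in range(1, value + 1)]


def closed_form_total(amount, value):
    """Sum of the multiples via the arithmetic-series formula.

    value * (value + 1) is always even, so integer division is exact.
    """
    return amount * value * (value + 1) // 2


amounts = multiples(3000, 12)
print(" ".join(str(a) for a in amounts))
print(sum(amounts), closed_form_total(3000, 12))
```

For amount 3000 and value 12 both computations give 234000, which matches the TOTAL column above.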

Try it like this:
WITH
tbl AS
(
Select 'Switch' "TITLE", 3000 "AMOUNT", 12 "VAL" From Dual
)
--
Select TITLE, AMOUNT, TOTAL
From (Select LEVEL "ID", TITLE "TITLE", Sum(AMOUNT * LEVEL) OVER() "TOTAL",
LISTAGG(AMOUNT * LEVEL, ' ') WITHIN GROUP (ORDER BY LEVEL) OVER() "AMOUNT"
From tbl
Connect By LEVEL <= VAL )
Where ID = 1
--
-- R e s u l t :
-- TITLE AMOUNT TOTAL
-- ------- --------------------------------------------------------------------- ------
-- Switch 3000 6000 9000 12000 15000 18000 21000 24000 27000 30000 33000 36000 234000

Dear experts,
Thanks for your cooperation.
I have Oracle 10g, which does not support LISTAGG. What do I need to do in 10g?
Thanks and regards

Oracle SQL: I want to select the TOP 3 records [duplicate]

This question already has answers here:
How do I limit the number of rows returned by an Oracle query after ordering?
I want to select the top 3 records ordered by 'cnt' descending.
These are the top 4 rows:
a   b     c                      cnt
99  YC    purchased part defect  3
99  LY    oil leak               2
99  QT16  other abnormality      2
99  JGSH  mechanism damage       1
Then I ran:
select * from () where rownum <= 3 order by cnt desc
and got:
99  YC    purchased part defect  3
99  LY    oil leak               2
99  JGSH  mechanism damage       1
I want to get:
99  YC    purchased part defect  3
99  LY    oil leak               2
99  QT16  other abnormality      2

Try this:
SELECT T.a, T.b, T.c, T.cnt
FROM
(
    -- Oracle does not accept a bare * next to other expressions, so qualify it
    SELECT x.*, RANK() OVER (PARTITION BY a ORDER BY cnt DESC) RNK
    FROM TEST_TBL x
) T
WHERE T.RNK <= 3

It looks like you want to keep "duplicates" (in the cnt column) in the result.
In that case, I'd say the row_number analytic function is what helps:
Sample data:
SQL> with test (a, b, cnt) as
2 (select 99, 'yc' , 3 from dual union all
3 select 99, 'ly' , 2 from dual union all
4 select 99, 'qt16', 2 from dual union all
5 select 99, 'jgsh', 1 from dual union all
6 --
7 select 99, 'abc' , 2 from dual --> yet another row with CNT = 2
8 ),
Query begins here: first rank rows (line #11), and then return the top 3 (line #15):
9 temp as
10 (select a, b, cnt,
11 row_number() over (partition by a order by cnt desc) rnk
12 from test
13 )
14 select * from temp
15 where rnk <= 3;
         A B           CNT        RNK
---------- ---- ---------- ----------
        99 yc            3          1
        99 ly            2          2
        99 abc           2          3
SQL>
If you use the rank analytic function instead (as Hana suggested), you might get more than the desired 3 rows (see the rnk column's values), depending on the data you work with. rank works with the data you posted, but if more rows share the same cnt value, it no longer returns exactly 3 rows:
<snip>
9 temp as
10 (select a, b, cnt,
11 rank() over (partition by a order by cnt desc) rnk
12 from test
13 )
14 select * from temp
15 where rnk <= 3;
         A B           CNT        RNK
---------- ---- ---------- ----------
        99 yc            3          1
        99 ly            2          2
        99 abc           2          2
        99 qt16          2          2
SQL>
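To compare the two functions side by side outside Oracle, here is a minimal Python sketch that loads the same five sample rows into an in-memory SQLite database and applies the rnk <= 3 filter with each function. SQLite is an assumption for illustration only (window functions need SQLite 3.25 or later), and top3 is a hypothetical helper name.

```python
import sqlite3

# In-memory SQLite stands in for Oracle (assumption; needs SQLite 3.25+).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (a INTEGER, b TEXT, cnt INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [(99, "yc", 3), (99, "ly", 2), (99, "qt16", 2), (99, "jgsh", 1), (99, "abc", 2)],
)


def top3(func):
    """Return the b values whose ranking, computed by func, is at most 3."""
    sql = f"""
        SELECT b FROM (
            SELECT b, cnt,
                   {func}() OVER (PARTITION BY a ORDER BY cnt DESC) AS rnk
            FROM test
        )
        WHERE rnk <= 3
    """
    return sorted(r[0] for r in conn.execute(sql))


by_row_number = top3("ROW_NUMBER")
by_rank = top3("RANK")
print(len(by_row_number), by_row_number)
print(len(by_rank), by_rank)
```

ROW_NUMBER always yields exactly three rows, although which of the tied cnt = 2 rows make the cut is not deterministic without an extra ORDER BY column. RANK gives all three tied rows the rank 2, so four rows pass the filter.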

Generating distributed amount of records

I have some code that generates parent records and a random number of child records for each parent. I want there to be 5 or more child records for each parent and fewer than 20.
I ran this several times and I seem to be getting none or very few child records in the range of 5-13.
Can someone please explain how I can get a more evenly distributed number of child records?
If you run the last query below, you will see that few or no parents have a count(*) in the range 6-15.
No doubt I have a problem with my logic, but I can't seem to find it. I'm also open to any new code that accomplishes the same task and produces a distributed number of child records with an INSERT ALL statement.
My goal is to generate a huge amount of test data to examine the application queries. For now I'm only generating 30 days' worth.
CREATE TABLE emp_info
(
empid INTEGER,
empname VARCHAR2(50)
);
CREATE TABLE emp_attendance
(
empid INTEGER,
start_date DATE,
end_date DATE
);
INSERT ALL
-- WHEN rn=1 insert the parent record.
-- 1 will always =1 always insert a
-- child record.
WHEN rn = 1 then into emp_info (empid, empname) values (id, name)
WHEN 1 = 1 then into emp_attendance (empid, start_date, end_date)
VALUES(id, d1, d1 + DBMS_RANDOM.value (0, .75))
SELECT *
FROM
(
-- get the highest empid as start
-- so this can be run more than once.
-- if never run before start WITH 0.
WITH t AS ( SELECT nvl(max(empid), 0) maxid FROM emp_info )
SELECT CEIL(maxid + level/20) id,
CASE MOD(maxid + level, 20) WHEN 1 THEN 1 END rn,
-- create an alpha name from 3-15
-- characters in length.
DBMS_RANDOM.string('U', DBMS_RANDOM.value(3, 15)) name,
-- set the start date any where from
-- today + 30 days
TRUNC(sysdate) + DBMS_RANDOM.value (1, 30) d1,
CASE WHEN ROW_NUMBER() OVER (PARTITION BY CEIL(maxid + level/20) ORDER BY level) > 5 THEN
-- Ensure there is a minimum of
-- 5 child and a max of 20 records
-- for each parent.
--
-- Exclude first 5 records and then
-- for 6-20 records, generating
-- random number between 5-20.
-- We can then compare with any
-- number between 5-20 so that it
-- can give us any number of
-- records.
DBMS_RANDOM.value(5, 20) ELSE 5 END AS random_val
FROM t
CONNECT BY level <= 20 * 1000
)
WHERE random_val <= 19;
-- why is this where clause needed?
SELECT empid, COUNT(*)
FROM emp_attendance
GROUP BY empid
ORDER BY empid;
EMPID COUNT(*)
1 20
2 20
3 20
4 18
5 19
6 20
7 20
8 19
9 20
10 20
11 19
……
50 20

Something like this should get you going.
with
emps as
  ( select level empid, dbms_random.value(5,20) children from dual connect by level <= 20 ),
empatt as
  ( select e.empid, x.start_date, x.start_date + dbms_random.value(0,0.75) end_date
    from emps e,
         lateral(
           select trunc(sysdate) + dbms_random.value(1,30) start_date
           from dual
           connect by level <= e.children
         ) x
  )
select empid, count(*)
from empatt
group by empid
order by 1;
     EMPID   COUNT(*)
---------- ----------
         1          5
         2         14
         3         17
         4          6
         5         10
         6         18
         7         12
         8         13
         9         16
        10         11
        11          7
        12         14
        13          7
        14          7
        15          7
        16         13
        17         18
        18          9
        19          9
        20         12
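One likely reason the original INSERT ALL clusters near 20 (this is my reading of the query, not something confirmed in the thread): rows 6-20 of each parent are each kept independently whenever DBMS_RANDOM.value(5, 20) <= 19, which happens about 14 times out of 15, so almost every row survives. Drawing one random child count per parent, as the query above does, avoids that. Below is a rough Python simulation of both strategies, with Python's random module standing in for DBMS_RANDOM and hypothetical function names.

```python
import random
from collections import Counter


def per_row_filter(rng):
    """Original idea: 5 guaranteed rows, then keep each of rows 6-20
    independently when a uniform draw from [5, 20) is <= 19."""
    return 5 + sum(1 for _ in range(15) if rng.uniform(5, 20) <= 19)


def per_parent_count(rng):
    """Answer's idea: draw one count per parent. Flooring a uniform draw
    from [5, 20) gives 5..19, mirroring CONNECT BY level <= children."""
    return min(int(rng.uniform(5, 20)), 19)


rng = random.Random(42)  # fixed seed so runs are repeatable
filtered = Counter(per_row_filter(rng) for _ in range(10_000))
direct = Counter(per_parent_count(rng) for _ in range(10_000))

print("per-row filter  :", sorted(filtered.items()))
print("per-parent count:", sorted(direct.items()))
```

The per-row filter piles up at 18 to 20 children, while the per-parent draw spreads roughly evenly over 5 to 19.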

INSERT ALL
WHEN attendid = 1
THEN INTO emp_info (empid, empname) VALUES
(empid, dbms_random.string ( 'U', dbms_random.value (3, 15))
)
WHEN attendid <= attend_cnt
THEN INTO emp_attendance (empid, start_date, end_date) VALUES
(empid, start_date, start_date + dbms_random.value (0, .75))
WITH got_maxid AS
(
SELECT NVL (MAX (empid), 0) AS maxid
FROM emp_info
)
, new_emps AS
(
SELECT maxid + CEIL (LEVEL / 50) AS empid
, MOD (LEVEL, 50) + 1 AS attendid,
CASE
WHEN MOD (LEVEL, 50) = 0
THEN dbms_random.value (5, 50+ 1)
END AS attend_cnt0
FROM got_maxid
CONNECT BY LEVEL <= 2000
)
SELECT n.*,
MIN (attend_cnt0) OVER (PARTITION BY empid) AS attend_cnt,
TRUNC (SYSDATE) + dbms_random.value (5, 30) AS start_date
FROM new_emps n;

Incorrect first value when using dense_rank with union all

I want to 'batch' my result (coming from a union of several queries) with a predefined 'batch size' but I can't figure out why the first batch is always incorrect?
For instance with the following code:
DECLARE @BATCHSIZE AS INT = 2;
DECLARE @TEMPTABLE TABLE (ITEMID VARCHAR(10));
INSERT INTO @TEMPTABLE VALUES ('100'),('200'),('300'),('400'),('500'),('600'),('700'),('800'),('900'),('1000'),('1100'),('1200'),('1300'),('1400'),('1500');
WITH TEMP AS
(
    SELECT * FROM @TEMPTABLE
)
SELECT *, BatchId = (dense_rank() over (order by ITEMID) / @BatchSize + 1)
FROM (
    SELECT * FROM TEMP
    UNION ALL
    SELECT * FROM TEMP
) AS temptable
I get a result:
100 1
100 1
1000 2
1000 2
1100 2
1100 2
1200 3
1200 3
1300 3
1300 3
1400 4
1400 4
1500 4
1500 4
200 5
200 5
300 5
300 5
400 6
400 6
500 6
500 6
600 7
600 7
700 7
700 7
800 8
800 8
900 8
900 8
It seems like they're all OK except for batch 1, which only consists of itemid 100?
I must be doing something wrong here.

dense_rank() starts at 1. Shift it to start at 0:
...
SELECT *, BatchId = (dense_rank() over (order by ITEMID) - 1) / @BatchSize + 1
...
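The off-by-one is easy to see with plain integer arithmetic. The sketch below, in Python with hypothetical helper names, applies both formulas to dense ranks 1 through 6 with a batch size of 2; Python's // plays the role of T-SQL integer division here.

```python
BATCH_SIZE = 2


def batch_unshifted(rank, size=BATCH_SIZE):
    """Formula from the question: rank / size + 1."""
    return rank // size + 1


def batch_shifted(rank, size=BATCH_SIZE):
    """Corrected formula: (rank - 1) / size + 1."""
    return (rank - 1) // size + 1


ranks = list(range(1, 7))  # dense_rank() values start at 1
print([batch_unshifted(r) for r in ranks])  # [1, 2, 2, 3, 3, 4]
print([batch_shifted(r) for r in ranks])    # [1, 1, 2, 2, 3, 3]
```

With the unshifted formula only rank 1 lands in batch 1, which is exactly the lone itemid 100 in the question's output. Note also that ITEMID is a VARCHAR there, so the ranks follow string order ('1000' sorts right after '100').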

select first 10 rows, then decide which rows to include afterwards

I have 2 columns in a table. The 1st column holds the names of different companies and the 2nd column holds the count of products.
COL 1  COL 2
CompA  2323
CompB  2320
CompC  1999
CompD  1598
CompE  1400
... and so on
What I want to do is display the first 10 rows, showing the first 10 company names and their counts in descending order.
Then I want to compare the 10th company's count with the 11th company's count. If they match, display the 11th company's name and count as well. If they don't match, display the first 10 records only.
I have read-only access, so I can't update or insert records in the table.
How can this be done?

If what you want to do is display the top ten counts, including ties, this is simple to solve with an analytic function such as RANK() or DENSE_RANK():
select * from
  ( select
      ename
    , sal
    , rank() over (order by sal desc) sal_rank
    from emp
  )
where sal_rank <= 10
/
ENAME             SAL   SAL_RANK
---------- ---------- ----------
QUASSNOI         6500          1
SCHNEIDER        5000          2
FEUERSTEIN       4500          3
VERREYNNE        4000          4
LIRA             3750          5
PODER            3750          5
KESTELYN         3500          7
TRICHLER         3500          7
GASPAROTTO       3000          9
ROBERTSON        2990         10
RIGBY            2990         10
11 rows selected.
Note that if RIGBY had had the same salary as GASPAROTTO, their SAL_RANK would have been 9, ROBERTSON's would have been 11 and the result set would have comprised ten rows.
DENSE_RANK() differs from RANK() in that it leaves no gaps in the ranking after ties, so the filter always returns the rows holding the top ten distinct values:
select * from
  ( select
      ename
    , sal
    , dense_rank() over (order by sal desc) sal_rank
    from emp
  )
where sal_rank <= 10
/
ENAME             SAL   SAL_RANK
---------- ---------- ----------
QUASSNOI         6500          1
SCHNEIDER        5000          2
FEUERSTEIN       4500          3
VERREYNNE        4000          4
LIRA             3750          5
PODER            3750          5
KESTELYN         3500          6
TRICHLER         3500          6
GASPAROTTO       3000          7
ROBERTSON        2990          8
RIGBY            2990          8
SPENCER          2850          9
BOEHMER          2450         10
13 rows selected.
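The RANK() filter amounts to "keep every row whose value is at least the Nth highest value". Here is a minimal Python sketch of that rule, with a hypothetical function name; the company names follow the question, but the counts are made up so that a tie occurs at the cutoff.

```python
def top_n_with_ties(rows, n):
    """Return the n highest-count rows plus any rows tied with the nth one.

    rows is a list of (name, count) pairs; the result mirrors
    WHERE RANK() OVER (ORDER BY count DESC) <= n.
    """
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    if len(ordered) <= n:
        return ordered
    cutoff = ordered[n - 1][1]  # count of the nth row
    return [r for r in ordered if r[1] >= cutoff]


# Illustrative data only: CompD is given the same count as CompC.
companies = [("CompA", 2323), ("CompB", 2320), ("CompC", 1999),
             ("CompD", 1999), ("CompE", 1400)]
print(top_n_with_ties(companies, 3))
print(top_n_with_ties(companies, 2))
```

With n = 3 the tie between CompC and CompD pulls in a fourth row; with n = 2 there is no tie at the cutoff, so exactly two rows come back.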

Try this:
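SELECT col1, col2
FROM (
    SELECT col1, col2
    FROM Table -- placeholder for your table name
    -- MIN over the first 10 ordered rows is the 10th company's count;
    -- ROWNUM = 10 on its own never matches in Oracle
    WHERE col2 >= (SELECT MIN(col2)
                   FROM (SELECT col2 FROM Table ORDER BY col2 DESC)
                   WHERE ROWNUM <= 10)
    ORDER BY col2 DESC)
WHERE ROWNUM <= 11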

With Top10Co As
(
Select Col1 As CompanyName, Col2 As Cnt
, Row_Number() Over ( Order By Col2 Desc ) As Num
From MyTable
)
Select CompanyName, Cnt
From Top10Co
Where Num <= 10
Union All
Select Col1, Col2
From MyTable
Where Exists (
Select 1
From Top10Co As T2
Where T2.Num = 10
And T2.CompanyName <> MyTable.Col1
And T2.Cnt = MyTable.Col2
)

I think what you are describing can be done with a PL/SQL block around a top-N query.
Something like this should help:
DECLARE
  v_count      PLS_INTEGER := 0;
  v_tenth_cnt  companies.col_2%TYPE;
  -- top 11 rows by count, highest first
  CURSOR c_companies IS
    SELECT col_1, col_2
    FROM (SELECT col_1, col_2 FROM companies ORDER BY col_2 DESC)
    WHERE ROWNUM <= 11;
BEGIN
  FOR r IN c_companies LOOP
    v_count := v_count + 1;
    IF v_count <= 10 THEN
      -- always show the first 10; after row 10 this holds the 10th count
      v_tenth_cnt := r.col_2;
      DBMS_OUTPUT.PUT_LINE('company name ' || r.col_1 || ' company count ' || r.col_2);
    ELSIF r.col_2 = v_tenth_cnt THEN
      -- show the 11th row only when it ties with the 10th
      DBMS_OUTPUT.PUT_LINE('company name ' || r.col_1 || ' company count ' || r.col_2);
    END IF;
  END LOOP;
END;
/
I am not sure about the syntax, but it has to be something like this :)
