Hive: Populate other columns based on unique value in a particular column - sql

I have two tables in Hive, as shown below.
Table 1:
id name value
1 abc stack
3 abc overflow
4 abc foo
6 abc bar
Table 2:
id name value
5 xyz overflow
9 xyz stackoverflow
3 xyz foo
23 xyz bar
I need to keep only the rows whose value occurs exactly once across both tables, regardless of the id and name columns.
Expected output is
id name value
1 abc stack
9 xyz stackoverflow
I tried this, which works in other databases but not in Hive:
select id,name,value from
(SELECT id,name,value FROM table1
UNION ALL
SELECT id,name,value FROM table2) t
group by value having count(value) = 1;
Hive expects the group by clause to list every selected column, as shown below.
select id,name,value from
(SELECT id,name,value FROM table1
UNION ALL
SELECT id,name,value FROM table2) t
group by id,name,value having count(value) = 1;
and gives the output
id name value
1 abc stack
3 abc overflow
4 abc foo
6 abc bar
5 xyz overflow
9 xyz stackoverflow
3 xyz foo
23 xyz bar
Hive requires every column used in the select clause to appear in the group by clause. But when I list them all, the grouping considers all the columns and the result is different from what I expected.

Calculate the analytic count(*) over(partition by value) and keep only the rows where that count is 1.
Testing with your data example:
with
table1 as (
select stack (4,
1,'abc','stack',
3,'abc','overflow',
4,'abc','foo',
6,'abc','bar'
) as (id, name, value)
),
table2 as (
select stack (4,
5, 'xyz','overflow',
9, 'xyz','stackoverflow',
3, 'xyz','foo',
23, 'xyz','bar'
) as (id, name, value)
)
select id, name, value
from(
select id, name, value, count(*) over(partition by value) value_cnt
from
(SELECT id,name,value FROM table1
UNION ALL
SELECT id,name,value FROM table2) s
)s where value_cnt=1;
Result:
OK
id name value
1 abc stack
9 xyz stackoverflow
Time taken: 55.423 seconds, Fetched: 2 row(s)

You can try the query below:
SELECT a.id, a.name, a.value FROM table1 a LEFT JOIN table2 b ON a.value = b.value
WHERE b.value IS NULL
UNION ALL
SELECT a.id, a.name, a.value FROM table2 a LEFT JOIN table1 b ON a.value = b.value
WHERE b.value IS NULL
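Another way around the group by restriction is to group by value alone and wrap the other columns in an aggregate. The sketch below is an illustration, not one of the answers above: it uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for Hive (an assumption made only so the example is runnable), and MIN() as the aggregate. Because a group with COUNT(*) = 1 contains a single row, MIN(id) and MIN(name) are just that row's id and name.

```python
import sqlite3

# In-memory SQLite stands in for Hive here (assumption, for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT, value TEXT);
    CREATE TABLE table2 (id INTEGER, name TEXT, value TEXT);
    INSERT INTO table1 VALUES
        (1, 'abc', 'stack'), (3, 'abc', 'overflow'), (4, 'abc', 'foo'), (6, 'abc', 'bar');
    INSERT INTO table2 VALUES
        (5, 'xyz', 'overflow'), (9, 'xyz', 'stackoverflow'), (3, 'xyz', 'foo'), (23, 'xyz', 'bar');
""")

# Group by value only. Every other selected column is aggregated, so the
# "select columns must be grouped or aggregated" rule is satisfied.
unique_rows = conn.execute("""
    SELECT MIN(id) AS id, MIN(name) AS name, value
    FROM (SELECT id, name, value FROM table1
          UNION ALL
          SELECT id, name, value FROM table2) t
    GROUP BY value
    HAVING COUNT(*) = 1
    ORDER BY id
""").fetchall()

print(unique_rows)  # [(1, 'abc', 'stack'), (9, 'xyz', 'stackoverflow')]
```

The same SELECT should carry over to Hive largely unchanged, though that has not been verified here.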

Related

Create a duplicate row on top of Select statement

Table TEST:
id Name
1 abc
2 xyz
Normally I get the records with the query below:
SELECT id, name FROM TEST;
id Name
1 abc
2 xyz
But now I want my select query to return a duplicate of each row.
Expected output (please suggest how I can achieve a result like the one below):
id Name
1 abc
1 abc
2 xyz
2 xyz
You may cross join your table with a sequence table containing however many copies you want. Here is an example using an inline sequence table:
SELECT t1.id, t1.Name
FROM yourTable t1
CROSS JOIN (
SELECT 1 AS seq FROM dual UNION ALL
SELECT 2 FROM dual UNION ALL
SELECT 3 FROM dual
) t2
WHERE t2.seq <= 2
ORDER BY t1.id;
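The cross-join idea can be tried out without an Oracle instance. The sketch below is a hedged illustration using Python's sqlite3 module with an in-memory database (an assumption, not the asker's environment). Instead of an inline UNION ALL list it generates the sequence with a recursive CTE, which is a swapped-in technique so that the number of copies can be passed as a parameter; the name `copies` is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for Oracle here (assumption, for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (id INTEGER, name TEXT);
    INSERT INTO test VALUES (1, 'abc'), (2, 'xyz');
""")

copies = 2  # hypothetical parameter: how many times each row should appear

# seq yields the numbers 1..copies; the cross join pairs every row of test
# with every number, so each row comes back `copies` times.
duplicated = conn.execute("""
    WITH RECURSIVE seq(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM seq WHERE n < ?
    )
    SELECT t.id, t.name
    FROM test t CROSS JOIN seq
    ORDER BY t.id, seq.n
""", (copies,)).fetchall()

print(duplicated)  # [(1, 'abc'), (1, 'abc'), (2, 'xyz'), (2, 'xyz')]
```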
To me, UNION (ALL) set operator seems to be quite simple.
Sample data:
SQL> select * from test;
ID NAME
---------- ----
1 abc
2 xyz
UNION ALL:
SQL> select * from test
2 union all
3 select * from test;
ID NAME
---------- ----
1 abc
2 xyz
1 abc
2 xyz
SQL>
CREATE table test(
id integer,
name VARCHAR2(4)
);
INSERT into test (id, name) VALUES (1,'ABC');
INSERT into test (id, name) VALUES (2,'XYZ');
with data as (select level l from dual connect by level <= 2)
select *
from test, data
order by id, l
/
One more option is LATERAL
SELECT t.*
FROM test
, LATERAL (
SELECT id, name FROM DUAL
union all
SELECT id, name FROM DUAL
) t
One option is using a self-join along with ROW_NUMBER analytic function such as
WITH t AS
(
SELECT t1.id, t1.name, ROW_NUMBER() OVER (PARTITION BY t1.id ORDER BY 0) AS rn
FROM test t1,
test t2
)
SELECT id, name
FROM t
WHERE rn <= 2

return 0 in select count when no record found

I'm trying to get an id and a count in the same query result.
The problem is that when the record doesn't exist, the count returns null instead of 0.
This is the query:
SELECT DISTINCT Id
,(
SELECT count(*)
FROM table1
WHERE reference_id = 300000009798620
)
FROM table1
WHERE reference_id = 300000009798620;
Just use:
SELECT max(id) as id, count(*)
FROM table1
WHERE reference_id = 300000009798620
Please try the below modified query
SELECT 300000009798620 as reference_id, count(*)
FROM table1 WHERE reference_id = 300000009798620
Would this do?
The table1 CTE represents your table; I simplified REFERENCE_ID. Both rows share the same reference_id value (1), with two different ID column values (1 and 2).
temp_1 selects the distinct IDs for the par_reference_id parameter value.
temp_2 counts the rows for par_reference_id.
The first execution returns rows because par_reference_id = 1 exists (count is 2):
SQL> with
2 table1 (id, reference_id) as
3 -- this represents your TABLE1 (but reference_id is way simpler)
4 (select 1, 1 from dual union all
5 select 2, 1 from dual),
6 temp_1 as
7 -- distinct IDs per desired reference_id
8 (select distinct id
9 from table1
10 where reference_id = &&par_reference_id
11 ),
12 temp_2 as
13 -- number of rows for that reference_id
14 (select count(*) cnt
15 from table1
16 where reference_id = &&par_reference_id
17 )
18 -- and finally:
19 select b.id, a.cnt
20 from temp_2 a left join temp_1 b on 1 = 1;
Enter value for par_reference_id: 1
ID CNT
---------- ----------
1 2
2 2
Let's try some other reference_id value (which doesn't exist in table1), e.g. 100: query doesn't return any ID, but count = 0 (as you wanted):
SQL> undefine par_reference_id
SQL> /
Enter value for par_reference_id: 100
ID CNT
---------- ----------
0
SQL>
You can use DUAL to always get a row back with your ID, then a subquery to get the count.
SELECT 300000009798620 AS id,
(SELECT COUNT (*)
FROM table1
WHERE reference_id = 300000009798620) AS amt
FROM DUAL;
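The reason the aggregate-only answers above return 0 is that an aggregate query without GROUP BY always produces exactly one row, even when the WHERE clause matches nothing. A minimal sketch of that behaviour follows, using Python's sqlite3 module with an in-memory database as a stand-in for Oracle; the two sample rows and the helper name `id_and_count` are made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for Oracle; the sample rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, reference_id INTEGER);
    INSERT INTO table1 VALUES (1, 300000009798620), (2, 300000009798620);
""")

def id_and_count(reference_id):
    """Return (max id, row count) for a reference_id; always exactly one row."""
    return conn.execute(
        "SELECT MAX(id) AS id, COUNT(*) AS cnt FROM table1 WHERE reference_id = ?",
        (reference_id,),
    ).fetchone()

existing = id_and_count(300000009798620)
missing = id_and_count(42)  # no such reference_id in the table

print(existing)  # (2, 2)
print(missing)   # (None, 0): the id is NULL, but the count is 0 rather than absent
```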

How to count distinct rows and get data of the row and count of it as a second column

Let's say I have this data:
ID
AAA
ABB
ABC
BDS
BRD
CXD
DCU
ETS
I would like to group the rows by their first letter and show, to the right, how many rows start with each letter. Sorry, I know I am not very good with technical language, but I am new to SQL and English is not my first language.
So the script should return
ID Total
A 3
B 2
C 1
D 1
E 1
I have tried
select left(id,1), count(left(id,1) as Total
from Places
group by Id
order by Total desc;
but it didn't work. Your help will be greatly appreciated.
select left(id,1), count(*) as Total
from Places
group by left(id,1)
order by Total desc;
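The key change is grouping by the same expression that is selected, rather than by the whole Id. A runnable sketch of that fix follows, assuming an in-memory SQLite database via Python's sqlite3 module. SQLite has no LEFT() function, so substr(id, 1, 1) is used in its place; the extra sort on the letter is added only to make the order of ties deterministic.

```python
import sqlite3

# In-memory SQLite stands in for the asker's database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Places (id TEXT);
    INSERT INTO Places VALUES
        ('AAA'), ('ABB'), ('ABC'), ('BDS'), ('BRD'), ('CXD'), ('DCU'), ('ETS');
""")

# Group by the first letter, the same expression that appears in the select
# list. substr(id, 1, 1) plays the role of left(id, 1) here.
letter_counts = conn.execute("""
    SELECT substr(id, 1, 1) AS letter, COUNT(*) AS total
    FROM Places
    GROUP BY substr(id, 1, 1)
    ORDER BY total DESC, letter
""").fetchall()

print(letter_counts)  # [('A', 3), ('B', 2), ('C', 1), ('D', 1), ('E', 1)]
```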
Is this what you need?
declare @t table(val varchar(10))
insert into @t
select 'AAA' union all
select 'ABB' union all
select 'ABC' union all
select 'BDS' union all
select 'BRD' union all
select 'CXD' union all
select 'DCU' union all
select 'ETS'
select left(t1.val,1) as id ,count(t1.val) as total from @t as t1 left join
(
select distinct right(val,1) as val from @t
) as t2 on t1.val =t2.val
group by left(t1.val,1)
Result is
id total
---- -----------
A 3
B 2
C 1
D 1
E 1

a sql query from oracle

I am using an Oracle 10g EE database. I have one table, mytable, which has two columns and the data shown below.
Note: I want to find rows based on the same value in the 2nd column only; it does not matter whether the first column holds the same or different values.
10 is repeated 3 times (for A, A and B), and these 3 rows are required in the output.
Similarly, 20 is repeated 2 times (for C and D), and these rows are also required in the output.
column1 column2
-------------- ---------------
A 10 //required
A 10 //required
B 10 //required
C 20//required
D 20//required
E 30--------not required as 30 is only here and not duplicated
F 40--------not required as 40 is only here and not duplicated
The following output is required, i.e. rows that share a value in the 2nd column, whether their 1st column values are the same or different:
column1 column2
-------------- ---------------
A 10
A 10
B 10
C 20
D 20
SELECT column1,
column2
FROM <table> t1
WHERE column2 IN (SELECT column2
FROM <table> t2
GROUP BY column2
HAVING count(*) > 1);
It sounds like you want
SELECT *
FROM table_name t1
WHERE column2 IN( SELECT column2
FROM table_name t2
GROUP BY column2
HAVING COUNT(*) > 1 )
This appears to work with your sample data
SQL> with table_name as (
2 select 'A' column1, 10 column2 from dual union all
3 select 'A', 10 from dual union all
4 select 'B', 10 from dual union all
5 select 'C', 20 from dual union all
6 select 'D', 30 from dual)
7 SELECT *
8 FROM table_name t1
9 WHERE column2 IN( SELECT column2
10 FROM table_name t2
11 GROUP BY column2
12 HAVING COUNT(*) > 1 );
C COLUMN2
- ----------
B 10
A 10
A 10
select * from table where column2 in ( select column2 from table group by column2 having count(*)>1);
should work for you.
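To check the IN ... GROUP BY ... HAVING pattern against the full sample data from the question (including the C 20 and D 20 rows), here is a small sketch using Python's sqlite3 module. The in-memory SQLite database is an assumption standing in for Oracle 10g, and the ORDER BY is added only so the output order is predictable.

```python
import sqlite3

# In-memory SQLite stands in for Oracle 10g (assumption, for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (column1 TEXT, column2 INTEGER);
    INSERT INTO mytable VALUES
        ('A', 10), ('A', 10), ('B', 10), ('C', 20), ('D', 20), ('E', 30), ('F', 40);
""")

# The subquery finds column2 values that occur more than once; the outer
# query returns every row carrying one of those values.
duplicate_rows = conn.execute("""
    SELECT column1, column2
    FROM mytable
    WHERE column2 IN (SELECT column2
                      FROM mytable
                      GROUP BY column2
                      HAVING COUNT(*) > 1)
    ORDER BY column2, column1
""").fetchall()

print(duplicate_rows)  # [('A', 10), ('A', 10), ('B', 10), ('C', 20), ('D', 20)]
```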

In Oracle, how do I get a page of distinct values from sorted results?

I have 2 columns in a one-to-many relationship. I want to sort on the "many" and return the first occurrence of the "one". I need to page through the data so, for example, I need to be able to get the 3rd group of 10 unique "one" values.
I have a query like this:
SELECT id, name
FROM table1
INNER JOIN table2 ON table2.fkid = table1.id
ORDER BY name, id;
There can be multiple rows in table2 for each row in table1.
The results of my query look like this:
id | name
----------------
2 | apple
23 | banana
77 | cranberry
23 | dark chocolate
8 | egg
2 | yak
19 | zebra
I need to page through the result set with each page containing n unique ids. For example, if start=1 and n=4 I want to get back
2
23
77
8
in the order they were sorted on (i.e., name), where id is returned in the position of its first occurrence. Likewise if start=3 and n=4 and order = desc I want
8
23
77
2
I tried this:
SELECT * FROM (
SELECT id, ROWNUM rnum FROM (
SELECT DISTINCT id FROM (
SELECT id, name
FROM table1
INNER JOIN table2 ON table2.fkid = table1.id
ORDER BY name, id)
WHERE ROWNUM <= 4)
WHERE rnum >=1)
which gave me the ids in numerical order, instead of being ordered as the names would be.
I also tried:
SELECT * FROM (
SELECT DISTINCT id, ROWNUM rnum FROM (
SELECT id FROM (
SELECT id, name
FROM table1
INNER JOIN table2 ON table2.fkid = table1.id
ORDER BY name, id)
WHERE ROWNUM <= 4)
WHERE rnum >=1)
but that gave me duplicate values.
How can I page through the results of this data? I just need the ids, nothing from the "many" table.
update
I suppose I'm getting closer with changing my inner query to
SELECT id, name, rank() over (order by name, id)
FROM table1
INNER JOIN table2 ON table2.fkid = table1.id
...but I'm still getting duplicate ids.
You may need to debug it a little, but it will be something like this:
SELECT id FROM (
SELECT id, name FROM (
SELECT id, name, row_number() over (partition by id order by name) rn
FROM table1
INNER JOIN table2 ON table2.fkid = table1.id
) WHERE rn=1 ORDER BY name, id
) WHERE rownum>=1 and rownum<=4;
It's a bit convoluted (and I suspect it could be simplified), but it should work. You can put whatever start and end position you want in the WHERE clause. Here start=2 and n=4 are pulled from a separate subquery, x, but you could simplify things by using a couple of parameters instead.
SQL> ed
Wrote file afiedt.buf
1 with t as (
2 select 2 id, 'apple' name from dual union all
3 select 23, 'banana' from dual union all
4 select 77, 'cranberry' from dual union all
5 select 23, 'dark chocolate' from dual union all
6 select 8, 'egg' from dual union all
7 select 2, 'yak' from dual union all
8 select 19, 'zebra' from dual
9 ),
10 x as (
11 select 2 start_pos, 4 n from dual
12 )
13 select *
14 from (
15 select distinct
16 id,
17 dense_rank() over (order by min_id_rnk) outer_rnk
18 from (
19 select id,
20 min(rnk) over (partition by id) min_id_rnk
21 from (
22 select id,
23 name,
24 rank() over (order by name) rnk
25 from t
26 )
27 )
28 )
29 where outer_rnk between (select start_pos from x) and (select start_pos+n-1 from x)
30* order by outer_rnk
SQL> /
ID OUTER_RNK
---------- ----------
23 2
77 3
8 4
19 5
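For comparison, the same paging can be sketched more compactly by grouping on id and ordering by each id's first name. This is a different technique from the ranking query above, shown only as an illustration: it runs on an in-memory SQLite database through Python's sqlite3 module, uses LIMIT/OFFSET (not available in older Oracle versions), and the table name `joined` and function `page_of_ids` are hypothetical. The table holds the joined result from the question.

```python
import sqlite3

# In-memory SQLite stands in for Oracle; `joined` holds the already-joined
# (id, name) rows from the question (assumption, for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE joined (id INTEGER, name TEXT);
    INSERT INTO joined VALUES
        (2, 'apple'), (23, 'banana'), (77, 'cranberry'), (23, 'dark chocolate'),
        (8, 'egg'), (2, 'yak'), (19, 'zebra');
""")

def page_of_ids(start, n):
    """Return n distinct ids beginning at 1-based position `start`,
    ordered by the first (alphabetically smallest) name of each id."""
    rows = conn.execute("""
        SELECT id
        FROM joined
        GROUP BY id
        ORDER BY MIN(name), id
        LIMIT ? OFFSET ?
    """, (n, start - 1))
    return [row[0] for row in rows]

print(page_of_ids(1, 4))  # [2, 23, 77, 8]
print(page_of_ids(2, 4))  # [23, 77, 8, 19]
```

With start=2 and n=4 this returns the same ids as the ranking query's output above.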