HiveQL equivalent of !> in SQL


I am trying to extract the values from one table that do not exist in another table. However, because the joining column contains null values, the not in, not exists and left join options do not seem to work.
Is there a way to apply a 'not greater than' (!>) condition in HiveQL?
For reference, this is the not in query that I ran (the not exists and left join versions behave similarly):
with date_prob as
(
select distinct visit
from t1
where dt=20161124
and dt1!=orig_ts
),
ev_data as
(
select distinct visit
from t1
where dt=20161124
and visit is not null
and origts is not null
and uid is not null
),
fin_data as
(
select x.visit
from ev_data x
where x.visit not in
(
select distinct visit
from date_prob
where visit is not null
)
)
The query that I ran for a left join -
with date_prob as
(
select distinct id
from t1
where dt1='2016-11-24'
and dt1!=orig_ts
and (datediff(dt1,orig_ts) not in ('1','-1'))
),
ev_data as
(
select distinct id
from t1
where dt1='2016-11-24'
and id is not null
)
select x.id
from ev_data x
left join date_prob y
where y.id is null
;
The Data Example -
id dt1 orig_ts
1 2016-11-24 2016-11-10
2 2016-11-24 2016-11-24
3 2016-11-24 2010-01-01
4 2016-11-24 2017-01-01
5 2016-11-24 2016-11-24
6 2016-11-24 2016-11-25
7 2016-11-23 2016-11-23
From this table I want to remove the ids where dt1 and orig_ts differ by more than one day. The query should therefore return only ids 2, 5 and 6.

If you want to extract the values from a table that do not exist in another table, then you can use a left join and filter where second_table_key is null.
This works even if there are NULLs in the keys:
--this query will return records from table a that do not exist in b
select a.id
from a left join b on a.id=b.id
where b.id is null; --only not joined
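To see why the left join form is safer than not in when keys can be NULL, here is a minimal sketch that runs the two queries side by side. It uses Python's built-in sqlite3 module rather than Hive, and the tables a and b with their contents are made up for illustration. SQLite follows the standard three-valued logic here; Hive's behaviour should be checked separately.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1), (2), (3), (NULL);
    INSERT INTO b VALUES (2), (NULL);
""")

# NOT IN against a list containing NULL never evaluates to TRUE,
# so no row of a survives the filter.
not_in_rows = conn.execute(
    "SELECT a.id FROM a WHERE a.id NOT IN (SELECT id FROM b) ORDER BY a.id"
).fetchall()

# The anti-join keeps every row of a that found no partner in b,
# including the row whose key is NULL.
anti_join_rows = conn.execute(
    "SELECT a.id FROM a LEFT JOIN b ON a.id = b.id "
    "WHERE b.id IS NULL ORDER BY a.id"
).fetchall()

print(not_in_rows)     # []
print(anti_join_rows)  # [(None,), (1,), (3,)]
```

If the NULL key from a should not appear in the result, add an explicit a.id is not null filter to the anti-join.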
I have fixed your example (the left join was missing its on condition) and it works:
drop table if exists t1;
create table t1 (id int,dt1 string, orig_ts string );
insert overwrite table t1
select 1 id, '2016-11-24' dt1, '2016-11-10' orig_ts union all
select 2 id, '2016-11-24' dt1, '2016-11-24' orig_ts union all
select 3 id, '2016-11-24' dt1, '2010-01-01' orig_ts union all
select 4 id, '2016-11-24' dt1, '2017-01-01' orig_ts union all
select 5 id, '2016-11-24' dt1, '2016-11-24' orig_ts union all
select 6 id, '2016-11-24' dt1, '2016-11-25' orig_ts union all
select 7 id, '2016-11-23' dt1, '2016-11-23' orig_ts;
with date_prob as
(
select distinct id
from t1
where dt1='2016-11-24'
and dt1!=orig_ts
and (datediff(dt1,orig_ts) not in ('1','-1'))
),
ev_data as
(
select distinct id
from t1
where dt1='2016-11-24'
and id is not null
)
select x.id
from ev_data x
left join date_prob y on x.id=y.id
where y.id is null
;
OK
2
5
6
Time taken: 14.166 seconds, Fetched: 3 row(s)
hive>
Works as expected

Related

Oracle SQL - left join record with closest datetime

I have 2 tables.
table1:
item end_time
1 2022-11-23 08:12:00
1 2022-11-23 09:12:00
2 2022-11-22 13:12:00
3 2022-11-22 14:12:00
table2:
item value last_dt
1 11 2022-11-23 09:12:00
1 12 2022-11-23 08:30:00
1 13 2022-11-24 08:30:00
2 21 2022-11-22 13:12:00
3 31 2022-11-22 14:12:00
3 32 2022-11-22 14:30:00
I would like to left join table1 to table2 by comparing table1's end_time with table2's last_dt and picking the closest one.
Below is the expected result.
item end_time value
1 2022-11-23 08:12:00 12
1 2022-11-23 09:12:00 11
2 2022-11-22 13:12:00 21
3 2022-11-22 14:12:00 31

You may use a lateral join with fetch first row only to select the closest value per row. However, it will effectively perform a nested loop, which is fast when there is an index on (item, last_dt) and table1 is small.
select *
from table1
left join lateral (
select value
from table2
where table2.item = table1.item
order by abs(table2.last_dt - table1.end_time) asc
fetch first row only
) val
on 1 = 1
Alternatively, you may use the first aggregate function (keep ... dense_rank first) ordered by the time difference. It also works in old Oracle versions (at least from 10g).
select
table1.item,
table1.end_time,
max(table2.value) keep(dense_rank first order by abs(table2.last_dt - table1.end_time)) as value
from table1
left join table2
on table2.item = table1.item
group by
table1.item,
table1.end_time
For your sample data both will return this result:
ITEM END_TIME VALUE
1 2022-11-23 08:12:00 12
2 2022-11-22 13:12:00 21
3 2022-11-22 14:12:00 31

Preparing
-- ms-sql-syntax
create table table1(item# int, end_time datetime);
insert table1 select 1, '2022-11-23T08:12:00';
insert table1 select 2, '2022-11-22T13:12:00';
insert table1 select 3, '2022-11-22T14:12:00';
create table table2 (item# int, value int, end_time datetime);
insert table2 select 1, 11, '2022-11-23T09:12:00';
insert table2 select 1, 12, '2022-11-23T08:30:00';
insert table2 select 1, 13, '2022-11-24T08:30:00';
insert table2 select 2, 21, '2022-11-22T13:12:00';
insert table2 select 3, 31, '2022-11-22T14:12:00';
insert table2 select 3, 32, '2022-11-22T14:30:00';
expected result
item# end_time value
1 2022-11-23 08:12:00 12
2 2022-11-22 13:12:00 21
3 2022-11-22 14:12:00 31
Second
You do not need a "left join". You need "outer apply" or "cross apply" (available in Oracle from 12c):
https://oracle-base.com/articles/12c/lateral-inline-views-cross-apply-and-outer-apply-joins-12cr1#cross-apply-join
It should be something like this:
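-- intended as Oracle 12c+ syntax; not tested
-- Oracle does not accept AS before a table alias and has no DATEDIFF;
-- subtracting two DATE values gives the difference in days.
-- ROWNUM = 1 would be applied before ORDER BY, so FETCH FIRST is used instead.
SELECT
t1.item#, t1.end_time, t2.value
FROM table1 t1
CROSS APPLY (
SELECT t2ca.value
FROM table2 t2ca
ORDER BY ABS(t2ca.end_time - t1.end_time)
FETCH FIRST 1 ROW ONLY
) t2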
-- ms-sql-syntax. exactly
SELECT
t1.item#, t1.end_time, t2.value
FROM table1 AS t1
CROSS APPLY (
SELECT top 1 value
FROM table2 AS t2ca
ORDER BY ABS(DATEDIFF(second, t2ca.end_time, t1.end_time))
) AS t2
TOP 1 WITH TIES -- ms-sql-syntax
SELECT TOP 1 WITH TIES
t1.item#
, t1.end_time
, t2.value
FROM table1 AS t1
CROSS JOIN table2 AS t2
ORDER BY
ROW_NUMBER() OVER (
PARTITION BY t1.item#, t1.end_time
ORDER BY ABS(DATEDIFF(second, t2.end_time, t1.end_time))
);
SUBQUERY and window-ROW_NUMBER() -- ms-sql-syntax
SELECT
item#
, end_time
, value
FROM (
SELECT
t1.item#
, t1.end_time
, t2.value
, ROW_NUMBER() OVER (
PARTITION BY t1.item#, t1.end_time
ORDER BY ABS(DATEDIFF (second, t2.end_time, t1.end_time))
) AS __rn__
FROM table1 AS t1
CROSS JOIN table2 AS t2
) AS ordering
WHERE __rn__ = 1

Find exactly equal rows in 2 tables, both in terms of value and number

I have two tables, each with two fields (provinceid, cityid).
I want to find the provinceids that have exactly the same set of cityids in both tables.
For example, I have these tables:
table1:
provinceid cityid
1 1
1 2
2 3
2 4
3 6
table2:
provinceid cityid
1 1
1 5
2 3
2 4
3 6
3 7
I want a query that returns only provinceid = 2 with cityid = 3 and 4.
I tried this query and it gives the right result, but I would like a better one:
select t1.provinceid, t1.cityid
from t1
left join t2 on t1.provinceid=t2.provinceid and t1.cityid=t2.cityid
where t2.provinceid is not null and t2.cityid is not null
and t1.provinceid not in (select t2.provinceid
from t2
left join t1 on t1.provinceid=t2.provinceid and t1.cityid=t2.cityid
where t1.provinceid is not null and t1.cityid is not null)
thank you

Try this :
select t1.provinceid ,t1.cityid
from table1 t1 join table2 t2
on t1.provinceid=t2.provinceid
and t1.cityid=t2.cityid
and t1.provinceid in (
select distinct(t1.provinceid)
from
(select provinceid, count(provinceid) as cnt from table1 group by provinceid) as t1
cross join
(select provinceid ,count(provinceid) as cnt from table2 group by provinceid) as t2
where t1.cnt = t2.cnt);
Output:
provinceid cityid
1 1
2 3
2 4

The simplest method for an exact match is to use string aggregation. The exact syntax varies by database, but in Standard SQL this looks like:
select t1.provinceid, t2.provinceid
from (select provinceid,
listagg(cityid, ',') within group (order by cityid) as cities
from t1
group by provinceid
) t1 join
(select provinceid,
listagg(cityid, ',') within group (order by cityid) as cities
from t2
group by provinceid
) t2
on t1.cities = t2.cities;
If you want the provinceids to be the same as well, just add t1.provinceid = t2.provinceid to the on clause.
Or, if you want the provinceids to be the same, you can use full join instead:
select provinceid
from t1 full join
t2
using (provinceid, cityid)
group by provinceid
having count(*) = count(t1.cityid) and count(*) = count(t2.cityid);
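Not every engine supports full join or ordered string aggregation, so here is a hedged sketch of the same "all three counts agree" idea written with plain inner joins and run through Python's sqlite3 module. The CTE names a, b and m are invented for this example, and it assumes neither table contains duplicate (provinceid, cityid) rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (provinceid INTEGER, cityid INTEGER);
    CREATE TABLE table2 (provinceid INTEGER, cityid INTEGER);
    INSERT INTO table1 VALUES (1, 1), (1, 2), (2, 3), (2, 4), (3, 6);
    INSERT INTO table2 VALUES (1, 1), (1, 5), (2, 3), (2, 4), (3, 6), (3, 7);
""")

# a: cities per province in table1, b: cities per province in table2,
# m: cities per province present in BOTH tables.
# A province matches exactly when all three counts are equal.
exact = conn.execute("""
    WITH a AS (SELECT provinceid, COUNT(*) AS n FROM table1 GROUP BY provinceid),
         b AS (SELECT provinceid, COUNT(*) AS n FROM table2 GROUP BY provinceid),
         m AS (SELECT t1.provinceid, COUNT(*) AS n
                 FROM table1 t1
                 JOIN table2 t2
                   ON t2.provinceid = t1.provinceid
                  AND t2.cityid = t1.cityid
                GROUP BY t1.provinceid)
    SELECT t.provinceid, t.cityid
      FROM table1 t
      JOIN a ON a.provinceid = t.provinceid
      JOIN b ON b.provinceid = t.provinceid
      JOIN m ON m.provinceid = t.provinceid
     WHERE a.n = m.n AND b.n = m.n
     ORDER BY t.provinceid, t.cityid
""").fetchall()

print(exact)  # [(2, 3), (2, 4)]
```

Province 1 drops out because only one of its two table1 cities is shared, and province 3 drops out because table2 has an extra city for it.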

Besides matching on provid and cityid, we are also looking for exactly matching sets of records. There are many ways to do this. I prefer to compare the list of cities for each provid as a string, in addition to the provid and cityid match clause. The string comparison removes the provid and cityid pairs that exist in both tables but belong to sets that do not match exactly.
WITH table1 AS(
SELECT 1 AS PROVID, 1 AS CITYID FROM DUAL UNION ALL
SELECT 1 AS PROVID, 2 AS CITYID FROM DUAL UNION ALL
SELECT 2 AS PROVID, 3 AS CITYID FROM DUAL UNION ALL
SELECT 2 AS PROVID, 4 AS CITYID FROM DUAL UNION ALL
SELECT 3 AS PROVID, 6 AS CITYID FROM DUAL
),
table2 AS (
SELECT 1 AS PROVID, 1 AS CITYID FROM DUAL UNION ALL
SELECT 1 AS PROVID, 5 AS CITYID FROM DUAL UNION ALL
SELECT 2 AS PROVID, 3 AS CITYID FROM DUAL UNION ALL
SELECT 2 AS PROVID, 4 AS CITYID FROM DUAL UNION ALL
SELECT 3 AS PROVID, 6 AS CITYID FROM DUAL UNION ALL
SELECT 3 AS PROVID, 7 AS CITYID FROM DUAL
),
listed_table1 AS (
SELECT
a.provid,
listagg(cityid,',') within GROUP (ORDER BY cityid) list_city
FROM table1 a
GROUP BY a.provid
),
listed_table2 AS (
SELECT
a.provid,
listagg(cityid,',') within GROUP (ORDER BY cityid) list_city
FROM table2 a
GROUP BY a.provid
)
SELECT
t1.provid, t1.cityid
FROM
(SELECT x.*, x1.list_city FROM table1 x, listed_table1 x1 WHERE x.provid = x1.provid) t1,
(SELECT y.*, y1.list_city FROM table2 y, listed_table2 y1 WHERE y.provid = y1.provid) t2
WHERE t1.provid = t2.provid AND t1.cityid = t2.cityid AND t1.list_city = t2.list_city
;

You can use (union ...) except (inner join ...) to detect non-matches. Step by step:
with u12 as (
select PROVID, CITYID from table1
union
select PROVID, CITYID from table2
),
c12 as (
select t1.PROVID, t2.CITYID
from table1 t1
join table2 t2 on t1.PROVID=t2.PROVID and t1.CITYID=t2.CITYID
),
nonMatch as (
select distinct PROVID
from (
select PROVID, CITYID from u12
except
select PROVID, CITYID from c12
) t
)
select *
from table1 t
where not exists (
select 1
from nonMatch n
where n.PROVID = t.PROVID);
If the number of duplicates matters, count them first:
with t1 as (
select PROVID, CITYID, count(*) n
from table1
group by PROVID, CITYID
),
t2 as (
select PROVID, CITYID, count(*) n
from table2
group by PROVID, CITYID
),
u12 as (
select PROVID, CITYID, n from t1
union
select PROVID, CITYID, n from t2
),
c12 as (
select t1.PROVID, t1.CITYID, t1.n
from t1
join t2 on t1.PROVID = t2.PROVID and t1.CITYID = t2.CITYID and t1.n = t2.n
),
nonMatch as (
select distinct PROVID
from (
select PROVID, CITYID, n from u12
except
select PROVID, CITYID, n from c12
) t
)
select *
from table1 t
where not exists (
select 1
from nonMatch n
where n.PROVID = t.PROVID)

ParentChild to get correct ID in T-SQL

I have two tables, toplevel and parentchild. The parentchild table holds a tree of related rows. The tree looks like this:
TREE
1
-11
2
-12
--13
3
-14
--15
---16
drop table #TopLevel
create table #TopLevel
(
TopLevelID INT,
createdate DateTime
)
insert into #TopLevel
(TopLevelID,createdate
)
select 1,'2013-03-01 00:00:00' union all
select 2,'2013-03-07 00:00:00' union all
select 3,'2013-03-06 00:00:00' union all
select 4,'2013-03-03 00:00:00' union all
select 5,'2013-03-08 00:00:00' union all
select 6,'2013-03-09 00:00:00' union all
select 7,'2013-03-10 00:00:00'
drop table #parentchild
create table #parentchild
(
parentchildID INT,Parent INT,Child INT
)
insert into #parentchild
(
parentchildID,Parent, Child
)
select 1,1,11 union all
select 2,12,13 union all
select 4,15,16 union all
select 5,14,15 union all
select 3,2,12 union all
select 6,3,14
;with abc as
(
select * From #parentchild
left outer join #TopLevel on #parentchild.Parent=#TopLevel.TopLevelID
)
select * from abc
I need to find the toplevelid for each row in the #parentchild table. For example, in the #parentchild table, parent=12 is not in the #toplevel table because it is itself a child. If we then look at the row where child=12, its parent is 2, which is in the #toplevel table.
Please help. Thanks.
The data should look like this for the #parentchild table. The values marked with * are ones I added manually.
parentchildID Parent Child TopLevelID createdate
1 1 11 1 2013-03-01 00:00:00.000
2 12 13 *2 *2013-03-07 00:00:00.000
4 15 16 *3 *2013-03-06 00:00:00.000
5 14 15 *3 *2013-03-06 00:00:00.000
3 2 12 2 2013-03-07 00:00:00.000
6 3 14 3 2013-03-06 00:00:00.000
Where am I going wrong?
;with abc as
(
select ParentChildID,Parent,Child,TopLevelID,CreateDate From #parentchild
left outer join #TopLevel on #parentchild.Parent=#TopLevel.TopLevelID
)
,xyz as
(
select ParentChildID,Parent,Child,TopLevelID,CreateDate from abc where TopLevelID IS NULL
union all
select a.ParentChildID,a.Parent,a.Child,a.TopLevelID,a.CreateDate from abc a
inner join abc e on e.TopLevelID=a.Parent
)
select * from xyz
This CTE works on the #parentchild table alone, but it returns nothing.
;with abc as
(
select ParentChildID,Parent,Child from #parentchild
where parent is null
union all
select a.ParentChildID,a.Parent,a.Child from #parentchild a
inner join abc e on e.child=a.parent
)
select * from abc

I think the recursive common table expression will be your friend here. http://www.codeproject.com/Articles/683011/How-to-use-recursive-CTE-calls-in-T-SQL
Try this. It was a bit of a rush but feel free to clean it up and play around with it. There's probably a more elegant way of doing this than using the left function.
With CTE as (
select parentchildID,Parent,Child,1 as lev,cast(parent as varchar(20)) as heirarchy--,parent as TopParent
from #parentchild pc
join #TopLevel tl on tl.TopLevelID = pc.Parent
union all
select pc2.parentchildID,pc2.Parent,pc2.Child,cte.lev + 1,cast(cte.heirarchy + '\'+ cast(pc2.Child as varchar(3)) as varchar(20))
from #parentchild pc2
join CTE on cte.child = pc2.Parent
where pc2.parent not in (select TopLevelID from #TopLevel)
),
NoDate as
(
select parentchildID,parent,Child,heirarchy,
case left(heirarchy,charindex('\',heirarchy)) when '' then heirarchy else left(heirarchy,charindex('\',heirarchy)-1) end as TopLevelParent
from CTE
)
select nd.parentchildID,nd.Parent,nd.Child,nd.TopLevelParent,tl.createdate,nd.heirarchy
from NoDate nd
join #TopLevel tl on tl.TopLevelID = nd.TopLevelParent
order by parentchildID;
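As a possible alternative to parsing the hierarchy string, the top-level id can be carried down through the recursion as its own column. The sketch below tries that idea with Python's sqlite3 module instead of SQL Server. The table names without the # prefix and the column name toplevelid in the CTE are assumptions of this example, and the data is taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE toplevel (toplevelid INTEGER, createdate TEXT);
    CREATE TABLE parentchild (parentchildid INTEGER, parent INTEGER, child INTEGER);
    INSERT INTO toplevel VALUES
        (1, '2013-03-01 00:00:00'), (2, '2013-03-07 00:00:00'),
        (3, '2013-03-06 00:00:00'), (4, '2013-03-03 00:00:00'),
        (5, '2013-03-08 00:00:00'), (6, '2013-03-09 00:00:00'),
        (7, '2013-03-10 00:00:00');
    INSERT INTO parentchild VALUES
        (1, 1, 11), (2, 12, 13), (4, 15, 16),
        (5, 14, 15), (3, 2, 12), (6, 3, 14);
""")

rows = conn.execute("""
    WITH RECURSIVE tree (parentchildid, parent, child, toplevelid) AS (
        -- anchor: rows whose parent is itself a top-level id
        SELECT pc.parentchildid, pc.parent, pc.child, tl.toplevelid
          FROM parentchild pc
          JOIN toplevel tl ON tl.toplevelid = pc.parent
        UNION ALL
        -- step: a row hangs under the tree if its parent is a known child;
        -- it inherits the top-level id of the row above it
        SELECT pc.parentchildid, pc.parent, pc.child, tree.toplevelid
          FROM parentchild pc
          JOIN tree ON tree.child = pc.parent
         WHERE pc.parent NOT IN (SELECT toplevelid FROM toplevel)
    )
    SELECT tree.parentchildid, tree.parent, tree.child,
           tree.toplevelid, tl.createdate
      FROM tree
      JOIN toplevel tl ON tl.toplevelid = tree.toplevelid
     ORDER BY tree.parentchildid
""").fetchall()

for row in rows:
    print(row)
```

On the question's data every #parentchild row is resolved to top-level id 1, 2 or 3, in line with the table the asker filled in by hand.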

SQL Monthly Summary

I have a table that contains a startdate for each item, for example:
ID - Startdate
1 - 2011-01-01
2 - 2011-02-01
3 - 2011-04-01
...
I need a query that gives me the count of items in each month, as a full 12-month report. I tried simply grouping by Month(StartDate), but this doesn't give me a zero for months with no rows (March, in the example above).
So I would like the output to be along the lines of:
Month - Count
1 20
2 14
3 0
...
Any ideas?
Thanks.

SELECT A.Month, ISNULL(B.countvalue,0) Count
FROM (SELECT 1 AS MONTH
UNION
SELECT 2
UNION
SELECT 3
UNION
SELECT 4
UNION
SELECT 5
UNION
SELECT 6
UNION
SELECT 7
UNION
SELECT 8
UNION
SELECT 9
UNION
SELECT 10
UNION
SELECT 11
UNION
SELECT 12 ) A LEFT JOIN (SELECT datepart(month,Startdate) AS Month, Count(ID) as countvalue FROM yourTable GROUP BY datepart(month,Startdate))B
ON A.month = B.month
Hope this helps

Another way to do this is with a recursive CTE, shown here in SQL Server 2005+ syntax (Oracle would need some adjustments).
SQL Statement
;WITH q (Month) AS (
SELECT 1
UNION ALL
SELECT Month + 1
FROM q
WHERE q.Month < 12
)
SELECT q.Month
, COUNT(i.ID)
FROM q
LEFT OUTER JOIN Input i ON MONTH(i.StartDate) = q.Month
GROUP BY
q.Month
Test script
;WITH Input (ID, StartDate) AS (
SELECT 1, '2011-01-01'
UNION ALL SELECT 2, '2011-02-01'
UNION ALL SELECT 3, '2011-04-01'
)
, q (Month) AS (
SELECT 1
UNION ALL
SELECT Month + 1
FROM q
WHERE q.Month < 12
)
SELECT q.Month
, COUNT(i.ID)
FROM q
LEFT OUTER JOIN Input i ON MONTH(i.StartDate) = q.Month
GROUP BY
q.Month

SQL - Finding differences in row order of two tables

I have two tables of ids and dates. I want to order both tables by date and see which ids are not in the same order.
e.g.
table_1
id | date
------------
A 01/01/09
B 02/01/09
C 03/01/09
table_2
id | date
------------
A 01/01/09
B 03/01/09
C 02/01/09
and get the results
B
C
Now admittedly I could just dump the results of an order by query and diff them, but I was wondering if there is an SQL-y way of getting the same results.
Edit to clarify, the dates are not necessarily the same between tables, it's just there to determine an order
Thanks

If the dates are different in TABLE_1 and TABLE_2, you will have to join both tables on their rank. For example (Oracle):
WITH table_1 AS (
SELECT 'A' ID, DATE '2009-01-01' dt FROM dual UNION ALL
SELECT 'B', DATE '2009-01-02' FROM dual UNION ALL
SELECT 'C', DATE '2009-01-03' FROM dual
), table_2 AS (
SELECT 'A' ID, DATE '2009-01-01' dt FROM dual UNION ALL
SELECT 'C', DATE '2009-01-02' FROM dual UNION ALL
SELECT 'B', DATE '2009-01-03' FROM dual
)
SELECT t1.ID
FROM (SELECT ID, row_number() over(ORDER BY dt) rn FROM table_1) t1
WHERE (ID, rn) NOT IN (SELECT ID,
row_number() over(ORDER BY dt) rn
FROM table_2);
ID
--
B
C
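The same rank comparison can be tried outside Oracle. The sketch below uses Python's sqlite3 module and, to stay compatible with older SQLite builds that lack window functions, swaps row_number() for a correlated COUNT(*) that computes each row's position. That substitution is an assumption of this example and only gives a proper rank when the dates within a table are distinct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id TEXT, dt TEXT);
    CREATE TABLE table_2 (id TEXT, dt TEXT);
    INSERT INTO table_1 VALUES
        ('A', '2009-01-01'), ('B', '2009-01-02'), ('C', '2009-01-03');
    INSERT INTO table_2 VALUES
        ('A', '2009-01-01'), ('C', '2009-01-02'), ('B', '2009-01-03');
""")

# r1 / r2: position of each id when its table is ordered by date,
# computed as "number of rows with a date less than or equal to mine".
out_of_order = conn.execute("""
    WITH r1 AS (
        SELECT t.id,
               (SELECT COUNT(*) FROM table_1 x WHERE x.dt <= t.dt) AS rn
          FROM table_1 t),
         r2 AS (
        SELECT t.id,
               (SELECT COUNT(*) FROM table_2 x WHERE x.dt <= t.dt) AS rn
          FROM table_2 t)
    SELECT r1.id
      FROM r1
      JOIN r2 ON r2.id = r1.id
     WHERE r1.rn <> r2.rn
     ORDER BY r1.id
""").fetchall()

print(out_of_order)  # [('B',), ('C',)]
```

Because the join is on id only, the actual date values may differ between the two tables; only the relative positions are compared.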

Isn't it just a case of joining on the date and checking whether the ids are the same? This assumes that table_1 is the master sequence.
SELECT table_1.id
FROM
table_1
INNER JOIN table_2
on table_1.[date] = table_2.[date]
WHERE table_1.id <> table_2.id
ORDER BY table_1.id

ehm select id from table_1, table_2 where table_1.id = table_2.id and table_1.date <> table_2.date ?
