Can the result of this HQL be reused? - hive

As known, HQL is not SQL and will be executed using JAVA, so I have 2 view regarding the following HQL and could anyone tell me which one is correct?
The content inside the c will be executed initially and then its result will be stored somewhere and consequently reused when c is called;
c is just the short name of the HQL inside from and will be executed each time when c is called.
HQL:
from(
select
b.un_connect_id,
b.imp_list_no
from
(select a.*,
row_number() over(partition by list_no order by op_day desc, imp_list_no desc, un_connect_id desc) rno
from sssss a
) b
where b.rno = 1
) c
insert overwrite table yyyyyyyyy partition(tmp = 'TMP',channel,business)
select c.un_connect_id,
c.business
insert overwrite table xxxxxx
select c.list_no,
c.customer_no,
c.party_no,
'${nominal_format_date}' as op_day

Related

Printing the highest value in a table using group by

First of all, sorry for the confusing title, I don't know how to describe it any better, it's complicated.
I have a table that looks like this:
send_org
rec_org
partecipants
a
b
1
a
c
2
b
d
2
b
c
3
b
f
3
and so on.
What I'm trying to print, for each send, is the row with the highest partecipants number (I don't care about duplicates, I need just one row with the highest number); so, in this case, I'm expecting something like
a c 2
b c 3
With MySQL, my query would be
SELECT send_org, receive_org, partecipants
FROM (
SELECT *
FROM tab
ORDER BY partecipants DESC) p
GROUP BY send_org;
and it works.
Hive gives me errors about the keys not in the GROUP BY statement, so I tried to switch to collection_set(), with something like this
SELECT send_org, collect_set(receive_org)[0], max(partecipants) partecipants
FROM tab
GROUP BY send_org
ORDER BY partecipants;
But the collection_set()[0] returns the first value in the column rec (correctly grouped), not the value related to the partecipants number.
Do you have any suggestion?
If you need a better view of the SQL version, it is here.
You may use row_number to determine the "row with the highest partecipants number" eg.
SELECT send_org, receive_org, partecipants
FROM (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY send_org
ORDER BY partecipants DESC
) rn
FROM tab
) p
where rn=1

SQL Updating with random subselect everytime, giving me same value for all rows

I have a column with 90% null values, I am trying to use the remaining 10% to pick a value randomly everytime and use it to fill the missing vlues in the exact same table.
this is my table
I am using this query to do it on PostgreSQL :
UPDATE public."Assure"
SET "codePostal" = (
SELECT "codePostal"
FROM public."Assure"
WHERE "codePostal" is not NULL
ORDER BY random() * public."Assure"."CodeAssure_id"
LIMIT 1 )
WHERE "codePostal" is NULL
result :
You can use window functions:
update assure a
set codePostal = a2.codePostal
from (select a2.*,
row_number() over (order by random()) as seqnum,
count(*) over () as cnt
from assure a2
where codePostal is not null
) a2 join
(select a3.*,
row_number() over (order by random()) as seqnum
from assure a3
where codePostal is null
) a3
on (a3.seqnum % a2.cnt) = a2.seqnum - 1
where a3.codeassure_id = a.codeassure_id;
In addition to doing what you really want, another advantage is that this does not require sorting all the non-NULL values for every row that gets assigned a value.
Here is a db<>fiddle illustrating working code.
Your code will be fine if you just set aliases for the table that is updated and the table in the subquery:
UPDATE Assure a1
SET codePostal = (
SELECT a2.codePostal
FROM Assure a2
WHERE a2.codePostal IS NOT NULL
ORDER BY RANDOM() * a1.CodeAssure_id
LIMIT 1)
WHERE a1.codePostal IS NULL
See the demo.

Oracle LIMIT and 1000 column restriction

I have a SQL query that looks like this:
SELECT foo "c0",
bar "c1",
baz "c2",
...
FROM some_table
WHERE ...
In order to apply a limit, and only return a subset of records from this query, I use the following wrapper SQL:
SELECT *
FROM (
SELECT t.*,
ROW_NUMBER() OVER (ORDER BY ...) rnum
FROM (
... original SQL goes here ...
) t
)
WHERE rnum BETWEEN 1 AND 10
My problem is that the original query is selecting over 1000 columns across a large number of joins to other tables. Oracle has an internal limit of 1000 columns per table or view, and apparently the wrapper SQL I'm using to limit the result set is creating a temporary view to which this limit is applied, causing the whole thing to fail.
Is there another method of pagination that doesn't create such a view, or wouldn't otherwise be affected by the 1000 column limit?
I'm not interested in suggestions to break the work up into chunks, not select > 1000 columns, etc., as I'm already fully aware of all of these methods.
Okay, this will perform worse than what you were planning, but my point is that you could try pagination this way:
WITH CTE AS
(
... original SQL goes here ...
)
SELECT A.*
FROM CTE A
INNER JOIN (SELECT YourKey,
ROW_NUMBER() OVER (ORDER BY ...) rnum
FROM CTE) B
ON A.YourKey = B.YourKey
WHERE rnum BETWEEN 1 AND 10;
you cant have a view with 1000+ columns, so cheat a little.
select *
from foo f, foo2 f2
where (f.rowid, f2.rowid) in (select r, r2
from (select r, r2, rownum rn
from (select /*+ first_rows */ f.rowid r, f2.rowid r2
from foo f, foo2 f2
where f.c1 = f2.a1
and f.c2 = '1'
order by f.c1))
where rn >= AAA
and rownum <= BBB)
order by whatever;
now put any where clauses in the innermost bit (eg i put f.c1 = '1').
BBB = pagesize.
AAA = start point
Is the problem pagination or just returning the first 10 rows? If it is the latter, you can do:
SELECT foo "c0",
bar "c1",
baz "c2",
...
FROM some_table
WHERE ... and
rownum <= 10

Update based on subquery fails

I am trying to do the following update in Oracle 10gR2:
update
(select voyage_port_id, voyage_id, arrival_date, port_seq,
row_number() over (partition by voyage_id order by arrival_date) as new_seq
from voyage_port) t
set t.port_seq = t.new_seq
Voyage_port_id is the primary key, voyage_id is a foreign key. I'm trying to assign a sequence number based on the dates within each voyage.
However, the above fails with ORA-01732: data manipulation operation not legal on this view
What is the problem and how can I avoid it ?
Since you can't update subqueries with row_number, you'll have to calculate the row number in the set part of the update. At first I tried this:
update voyage_port a
set a.port_seq = (
select
row_number() over (partition by voyage_id order by arrival_date)
from voyage_port b
where b.voyage_port_id = a.voyage_port_id
)
But that doesn't work, because the subquery only selects one row, and then the row_number() is always 1. Using another subquery allows a meaningful result:
update voyage_port a
set a.port_seq = (
select c.rn
from (
select
voyage_port_id
, row_number() over (partition by voyage_id
order by arrival_date) as rn
from voyage_port b
) c
where c.voyage_port_id = a.voyage_port_id
)
It works, but more complex than I'd expect for this task.
You can update some views, but there are restrictions and one is that the view must not contain analytic functions. See SQL Language Reference on UPDATE and search for first occurence of "analytic".
This will work, provided no voyage visits more than one port on the same day (or the dates include a time component that makes them unique):
update voyage_port vp
set vp.port_seq =
( select count(*)
from voyage_port vp2
where vp2.voyage_id = vp.voyage_id
and vp2.arrival_date <= vp.arrival_date
)
I think this handles the case where a voyage visits more than 1 port per day and there is no time component (though the sequence of ports visited on the same day is then arbitrary):
update voyage_port vp
set vp.port_seq =
( select count(*)
from voyage_port vp2
where vp2.voyage_id = vp.voyage_id
and (vp2.arrival_date <= vp.arrival_date)
or ( vp2.arrival_date = vp.arrival_date
and vp2.voyage_port_id <= vp.voyage_port_id
)
)
Don't think you can update a derived table, I'd rewrite as:
update voyage_port
set port_seq = t.new_seq
from
voyage_port p
inner join
(select voyage_port_id, voyage_id, arrival_date, port_seq,
row_number() over (partition by voyage_id order by arrival_date) as new_seq
from voyage_port) t
on p.voyage_port_id = t.voyage_port_id
The first token after the UPDATE should be the name of the table to update, then your columns-to-update. I'm not sure what you are trying to achieve with the select statement where it is, but you can' update the result set from the select legally.
A version of the sql, guessing what you have in mind, might look like...
update voyage_port t
set t.port_seq = (<select statement that generates new value of port_seq>)
NOTE: to use a select statement to set a value like this you must make sure only 1 row will be returned from the select !
EDIT : modified statement above to reflect what I was trying to explain. The question has been answered very nicely by Andomar above

ORDER BY the IN value list

I have a simple SQL query in PostgreSQL 8.3 that grabs a bunch of comments. I provide a sorted list of values to the IN construct in the WHERE clause:
SELECT * FROM comments WHERE (comments.id IN (1,3,2,4));
This returns comments in an arbitrary order which in my happens to be ids like 1,2,3,4.
I want the resulting rows sorted like the list in the IN construct: (1,3,2,4).
How to achieve that?
You can do it quite easily with (introduced in PostgreSQL 8.2) VALUES (), ().
Syntax will be like this:
select c.*
from comments c
join (
values
(1,1),
(3,2),
(2,3),
(4,4)
) as x (id, ordering) on c.id = x.id
order by x.ordering
In Postgres 9.4 or later, this is simplest and fastest:
SELECT c.*
FROM comments c
JOIN unnest('{1,3,2,4}'::int[]) WITH ORDINALITY t(id, ord) USING (id)
ORDER BY t.ord;
WITH ORDINALITY was introduced with in Postgres 9.4.
No need for a subquery, we can use the set-returning function like a table directly. (A.k.a. "table-function".)
A string literal to hand in the array instead of an ARRAY constructor may be easier to implement with some clients.
For convenience (optionally), copy the column name we are joining to ("id" in the example), so we can join with a short USING clause to only get a single instance of the join column in the result.
Works with any input type. If your key column is of type text, provide something like '{foo,bar,baz}'::text[].
Detailed explanation:
PostgreSQL unnest() with element number
Just because it is so difficult to find and it has to be spread: in mySQL this can be done much simpler, but I don't know if it works in other SQL.
SELECT * FROM `comments`
WHERE `comments`.`id` IN ('12','5','3','17')
ORDER BY FIELD(`comments`.`id`,'12','5','3','17')
With Postgres 9.4 this can be done a bit shorter:
select c.*
from comments c
join (
select *
from unnest(array[43,47,42]) with ordinality
) as x (id, ordering) on c.id = x.id
order by x.ordering;
Or a bit more compact without a derived table:
select c.*
from comments c
join unnest(array[43,47,42]) with ordinality as x (id, ordering)
on c.id = x.id
order by x.ordering
Removing the need to manually assign/maintain a position to each value.
With Postgres 9.6 this can be done using array_position():
with x (id_list) as (
values (array[42,48,43])
)
select c.*
from comments c, x
where id = any (x.id_list)
order by array_position(x.id_list, c.id);
The CTE is used so that the list of values only needs to be specified once. If that is not important this can also be written as:
select c.*
from comments c
where id in (42,48,43)
order by array_position(array[42,48,43], c.id);
I think this way is better :
SELECT * FROM "comments" WHERE ("comments"."id" IN (1,3,2,4))
ORDER BY id=1 DESC, id=3 DESC, id=2 DESC, id=4 DESC
Another way to do it in Postgres would be to use the idx function.
SELECT *
FROM comments
ORDER BY idx(array[1,3,2,4], comments.id)
Don't forget to create the idx function first, as described here: http://wiki.postgresql.org/wiki/Array_Index
In Postgresql:
select *
from comments
where id in (1,3,2,4)
order by position(id::text in '1,3,2,4')
On researching this some more I found this solution:
SELECT * FROM "comments" WHERE ("comments"."id" IN (1,3,2,4))
ORDER BY CASE "comments"."id"
WHEN 1 THEN 1
WHEN 3 THEN 2
WHEN 2 THEN 3
WHEN 4 THEN 4
END
However this seems rather verbose and might have performance issues with large datasets.
Can anyone comment on these issues?
To do this, I think you should probably have an additional "ORDER" table which defines the mapping of IDs to order (effectively doing what your response to your own question said), which you can then use as an additional column on your select which you can then sort on.
In that way, you explicitly describe the ordering you desire in the database, where it should be.
sans SEQUENCE, works only on 8.4:
select * from comments c
join
(
select id, row_number() over() as id_sorter
from (select unnest(ARRAY[1,3,2,4]) as id) as y
) x on x.id = c.id
order by x.id_sorter
SELECT * FROM "comments" JOIN (
SELECT 1 as "id",1 as "order" UNION ALL
SELECT 3,2 UNION ALL SELECT 2,3 UNION ALL SELECT 4,4
) j ON "comments"."id" = j."id" ORDER BY j.ORDER
or if you prefer evil over good:
SELECT * FROM "comments" WHERE ("comments"."id" IN (1,3,2,4))
ORDER BY POSITION(','+"comments"."id"+',' IN ',1,3,2,4,')
And here's another solution that works and uses a constant table (http://www.postgresql.org/docs/8.3/interactive/sql-values.html):
SELECT * FROM comments AS c,
(VALUES (1,1),(3,2),(2,3),(4,4) ) AS t (ord_id,ord)
WHERE (c.id IN (1,3,2,4)) AND (c.id = t.ord_id)
ORDER BY ord
But again I'm not sure that this is performant.
I've got a bunch of answers now. Can I get some voting and comments so I know which is the winner!
Thanks All :-)
create sequence serial start 1;
select * from comments c
join (select unnest(ARRAY[1,3,2,4]) as id, nextval('serial') as id_sorter) x
on x.id = c.id
order by x.id_sorter;
drop sequence serial;
[EDIT]
unnest is not yet built-in in 8.3, but you can create one yourself(the beauty of any*):
create function unnest(anyarray) returns setof anyelement
language sql as
$$
select $1[i] from generate_series(array_lower($1,1),array_upper($1,1)) i;
$$;
that function can work in any type:
select unnest(array['John','Paul','George','Ringo']) as beatle
select unnest(array[1,3,2,4]) as id
Slight improvement over the version that uses a sequence I think:
CREATE OR REPLACE FUNCTION in_sort(anyarray, out id anyelement, out ordinal int)
LANGUAGE SQL AS
$$
SELECT $1[i], i FROM generate_series(array_lower($1,1),array_upper($1,1)) i;
$$;
SELECT
*
FROM
comments c
INNER JOIN (SELECT * FROM in_sort(ARRAY[1,3,2,4])) AS in_sort
USING (id)
ORDER BY in_sort.ordinal;
select * from comments where comments.id in
(select unnest(ids) from bbs where id=19795)
order by array_position((select ids from bbs where id=19795),comments.id)
here, [bbs] is the main table that has a field called ids,
and, ids is the array that store the comments.id .
passed in postgresql 9.6
Lets get a visual impression about what was already said. For example you have a table with some tasks:
SELECT a.id,a.status,a.description FROM minicloud_tasks as a ORDER BY random();
id | status | description
----+------------+------------------
4 | processing | work on postgres
6 | deleted | need some rest
3 | pending | garden party
5 | completed | work on html
And you want to order the list of tasks by its status.
The status is a list of string values:
(processing, pending, completed, deleted)
The trick is to give each status value an interger and order the list numerical:
SELECT a.id,a.status,a.description FROM minicloud_tasks AS a
JOIN (
VALUES ('processing', 1), ('pending', 2), ('completed', 3), ('deleted', 4)
) AS b (status, id) ON (a.status = b.status)
ORDER BY b.id ASC;
Which leads to:
id | status | description
----+------------+------------------
4 | processing | work on postgres
3 | pending | garden party
5 | completed | work on html
6 | deleted | need some rest
Credit #user80168
I agree with all other posters that say "don't do that" or "SQL isn't good at that". If you want to sort by some facet of comments then add another integer column to one of your tables to hold your sort criteria and sort by that value. eg "ORDER BY comments.sort DESC " If you want to sort these in a different order every time then... SQL won't be for you in this case.