SQL querying the same table twice with criteria

I have one table that contains something like:
ID, parent_item, Comp_item
1, 123, a
2, 123, b
3, 123, c
4, 456, a
5, 456, b
6, 456, d
7, 789, b
8, 789, c
9, 789, d
10, a, a
11, b, b
12, c, c
13, d, d
I need to return only the parent_items that have a Comp_item of both a and b
so I should only get:
123
456

Here is a canonical way to do this:
SELECT parent_item
FROM yourTable
WHERE Comp_item IN ('a', 'b')
GROUP BY parent_item
HAVING COUNT(DISTINCT Comp_item) = 2
The idea here is to aggregate by parent_item, restricting to only records having a Comp_item of a or b, then asserting that the distinct number of Comp_item values is 2.
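The same pattern extends to any required set; a rough sketch using the table from the question, where the HAVING count just has to match the size of the IN list (with the sample data this would return only 123, since only that parent has all of a, b and c):
SELECT parent_item
FROM yourTable
WHERE Comp_item IN ('a', 'b', 'c')   -- the required set
GROUP BY parent_item
HAVING COUNT(DISTINCT Comp_item) = 3 -- size of the required set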

Alternatively you could use INTERSECT:
select parent_item from my_table where comp_item = 'a'
intersect
select parent_item from my_table where comp_item = 'b';

If you have a parent item table, the most efficient method is possibly:
select p.*
from parent_items p
where exists (select 1 from t1 where t1.parent_id = p.parent_id and t1.comp_item = 'a') and
exists (select 1 from t1 where t1.parent_id = p.parent_id and t1.comp_item = 'b');
For optimal performance, you want an index on t1(parent_id, comp_item).
I should emphasize that I very much like the aggregation solution by Tim. I bring this up because performance was brought up in a comment. Both intersect and group by expend effort aggregating (in the first case to remove duplicates, in the second explicitly). An approach like this does not incur that cost -- assuming that a table with unique parent ids is available.
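For reference, the index mentioned above could be created along these lines (a sketch; t1 and the column names follow the query above, and the index name is arbitrary):
CREATE INDEX t1_parent_comp_idx ON t1 (parent_id, comp_item);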

Related

BigQuery recursively join based on links between 2 ID columns

Given a table representing a many-to-many join between IDs like the following:
WITH t AS (
SELECT 1 AS id_1, 'a' AS id_2,
UNION ALL SELECT 2, 'a'
UNION ALL SELECT 2, 'b'
UNION ALL SELECT 3, 'b'
UNION ALL SELECT 4, 'c'
UNION ALL SELECT 5, 'c'
UNION ALL SELECT 6, 'd'
UNION ALL SELECT 6, 'e'
UNION ALL SELECT 7, 'f'
)
SELECT * FROM t
| id_1 | id_2 |
|------|------|
| 1    | a    |
| 2    | a    |
| 2    | b    |
| 3    | b    |
| 4    | c    |
| 5    | c    |
| 6    | d    |
| 6    | e    |
| 7    | f    |
I would like to be able to recursively join, then aggregate, rows in order to find each disconnected sub-graph represented by these links - that is, each collection of IDs that are linked together:
The desired output for the example above would look something like this:
| id_1_coll | id_2_coll |
|-----------|-----------|
| 1, 2, 3   | a, b      |
| 4, 5      | c         |
| 6         | d, e      |
| 7         | f         |
where each row contains all the other IDs one could reach following the links in the table.
Note that 1 links to b even though there is no explicit link row, because we can follow the path 1 --> a --> 2 --> b using the links in the first 3 rows.
One potential approach is to remodel the relationships between id_1 and id_2 so that we get all the links from id_1 to itself, then use a recursive common table expression to traverse all the possible paths between id_1 values, and finally aggregate (somewhat arbitrarily) to the lowest such value that can be reached from each id_1.
Explanation
Our steps are:
1. Remodel the relationship into a series of self-joins for id_1
2. Map each id_1 to the lowest id_1 that it is linked to via a recursive CTE
3. Aggregate the recursive CTE using the lowest id_1s as the GROUP BY column, grabbing all the linked id_1 and id_2 values via the ARRAY_AGG() function
We can use something like this to remodel the relationships into a self join (1.):
SELECT
a.id_1, a.id_2, b.id_1 AS linked_id
FROM t as a
INNER JOIN t as b
ON a.id_2 = b.id_2
WHERE a.id_1 != b.id_1
Next - to set up the recursive table expression (2.) we can tweak the query above to also give us the lowest (LEAST) of the values for id_1 at each link then use this as the base iteration:
WITH RECURSIVE base_iter AS (
SELECT
a.id_1, b.id_1 AS linked_id, LEAST(a.id_1, b.id_1) AS lowest_linked_id
FROM t as a
INNER JOIN t as b
ON a.id_2 = b.id_2
WHERE a.id_1 != b.id_1
)
We can also grab the lowest id_1 value at this stage, giving:
| id_1 | linked_id | lowest_linked_id |
|------|-----------|------------------|
| 1    | 2         | 1                |
| 2    | 1         | 1                |
| 2    | 3         | 2                |
| 3    | 2         | 2                |
| 4    | 5         | 4                |
| 5    | 4         | 4                |
For our recursive loop, we want to maintain an ARRAY of linked ids and join each new iteration such that the id_1 value of the n+1th iteration is equal to the linked_id value of the nth iteration AND the nth linked_id value is not in the array of previously linked ids.
We can code this as follows:
recursive_loop AS (
SELECT id_1, linked_id, lowest_linked_id, [linked_id] AS linked_ids
FROM base_iter
UNION ALL
SELECT
prev_iter.id_1, prev_iter.linked_id,
iter.lowest_linked_id,
ARRAY_CONCAT(iter.linked_ids, [prev_iter.linked_id])
FROM base_iter AS prev_iter
JOIN recursive_loop AS iter
ON iter.id_1 = prev_iter.linked_id
AND iter.lowest_linked_id < prev_iter.lowest_linked_id
AND prev_iter.linked_id NOT IN UNNEST(iter.linked_ids)
)
Giving us the following results:
|id_1|linked_id|lowest_linked_id|linked_ids|
|----|---------|------------|---|
|3|2|1|[1,2]|
|2|3|1|[1,2,3]|
|4|5|4|[5]|
|1|2|1|[2]|
|5|4|4|[4]|
|2|3|2|[3]|
|2|1|1|[1]|
|3|2|2|[2]|
which we can now link back to the original table for the id_2 values, then aggregate (3.), as shown in the complete query below.
Solution
WITH RECURSIVE t AS (
SELECT 1 AS id_1, 'a' AS id_2,
UNION ALL SELECT 2, 'a'
UNION ALL SELECT 2, 'b'
UNION ALL SELECT 3, 'b'
UNION ALL SELECT 4, 'c'
UNION ALL SELECT 5, 'c'
UNION ALL SELECT 6, 'd'
UNION ALL SELECT 6, 'e'
UNION ALL SELECT 7, 'f'
),
base_iter AS (
SELECT
a.id_1, b.id_1 AS linked_id, LEAST(a.id_1, b.id_1) AS lowest_linked_id
FROM t as a
INNER JOIN t as b
ON a.id_2 = b.id_2
WHERE a.id_1 != b.id_1
),
recursive_loop AS (
SELECT id_1, linked_id, lowest_linked_id, [linked_id] AS linked_ids
FROM base_iter
UNION ALL
SELECT
prev_iter.id_1, prev_iter.linked_id,
iter.lowest_linked_id,
ARRAY_CONCAT(iter.linked_ids, [prev_iter.linked_id])
FROM base_iter AS prev_iter
JOIN recursive_loop AS iter
ON iter.id_1 = prev_iter.linked_id
AND iter.lowest_linked_id < prev_iter.lowest_linked_id
AND prev_iter.linked_id NOT IN UNNEST(iter.linked_ids)
),
link_back AS (
SELECT
t.id_1, IFNULL(lowest_linked_id, t.id_1) AS lowest_linked_id, t.id_2
FROM t
LEFT JOIN recursive_loop
ON t.id_1 = recursive_loop.id_1
),
by_id_1 AS (
SELECT
id_1,
MIN(lowest_linked_id) AS grp
FROM link_back
GROUP BY 1
),
by_id_2 AS (
SELECT
id_2,
MIN(lowest_linked_id) AS grp
FROM link_back
GROUP BY 1
),
result AS (
SELECT
by_id_1.grp,
ARRAY_AGG(DISTINCT id_1 ORDER BY id_1) AS id1_coll,
ARRAY_AGG(DISTINCT id_2 ORDER BY id_2) AS id2_coll,
FROM
by_id_1
INNER JOIN by_id_2
ON by_id_1.grp = by_id_2.grp
GROUP BY grp
)
SELECT grp, TO_JSON(id1_coll) AS id1_coll, TO_JSON(id2_coll) AS id2_coll
FROM result ORDER BY grp
Giving us the required output:
| grp | id1_coll | id2_coll |
|-----|----------|----------|
| 1   | [1,2,3]  | [a,b]    |
| 4   | [4,5]    | [c]      |
| 6   | [6]      | [d,e]    |
| 7   | [7]      | [f]      |
Limitations/Issues
Unfortunately this approach is inefficient (we have to traverse every single pathway before aggregating it back together) and fails in the real-world case, where we have several million join rows. When trying to execute on this data, BigQuery runs up a huge "Slot time consumed" and then eventually errors out with:
Resources exceeded during query execution: Your project or organization exceeded the maximum disk and memory limit available for shuffle operations. Consider provisioning more slots, reducing query concurrency, or using more efficient logic in this job.
I hope there might be a better way of doing the recursive join such that pathways can be merged/aggregated as we go (if we have an id_1 value AND a linked_id already in the list of linked_ids, we don't need to check it further).
Using ROW_NUMBER(), the query is as follows:
WITH RECURSIVE
t AS (
SELECT 1 AS id_1, 'a' AS id_2,
UNION ALL SELECT 2, 'a'
UNION ALL SELECT 2, 'b'
UNION ALL SELECT 3, 'b'
UNION ALL SELECT 4, 'c'
UNION ALL SELECT 5, 'c'
UNION ALL SELECT 6, 'd'
UNION ALL SELECT 6, 'e'
UNION ALL SELECT 7, 'f'
),
t1 AS (
SELECT ROW_NUMBER() OVER(ORDER BY t.id_1) n, t.id_1, t.id_2 FROM t
),
t2 AS (
SELECT n, [n] n_arr, [id_1] arr_1, [id_2] arr_2, id_1, id_2 FROM t1
WHERE n IN (SELECT MIN(n) FROM t1 GROUP BY id_1)
UNION ALL
SELECT t2.n, ARRAY_CONCAT(t2.n_arr, [t1.n]),
CASE WHEN t1.id_1 NOT IN UNNEST(t2.arr_1)
THEN ARRAY_CONCAT(t2.arr_1, [t1.id_1])
ELSE t2.arr_1 END,
CASE WHEN t1.id_2 NOT IN UNNEST(t2.arr_2)
THEN ARRAY_CONCAT(t2.arr_2, [t1.id_2])
ELSE t2.arr_2 END,
t1.id_1, t1.id_2
FROM t2 JOIN t1 ON
t2.n < t1.n AND
t1.n NOT IN UNNEST(t2.n_arr) AND
(t2.id_1 = t1.id_1 OR t2.id_2 = t1.id_2) AND
(t1.id_1 NOT IN UNNEST(t2.arr_1) OR t1.id_2 NOT IN UNNEST(t2.arr_2))
),
t3 AS (
SELECT
n,
ARRAY_AGG(DISTINCT id_1 ORDER BY id_1) arr_1,
ARRAY_AGG(DISTINCT id_2 ORDER BY id_2) arr_2
FROM t2
WHERE n IN (SELECT MIN(n) FROM t2 GROUP BY id_1)
GROUP BY n
)
SELECT n, TO_JSON(arr_1), TO_JSON(arr_2) FROM t3 ORDER BY n
t1: append row numbers to t.
t2: extract rows matching either id_1 or id_2 via the recursive query.
t3: build arrays of id_1 and id_2 with ARRAY_AGG().
However, it may not help with your Limitations/Issues.
The way this question is phrased makes it appear you want "show me distinct groups from a presorted list, unchained to a previous group". For that, something like this should suffice (assuming an auto-incrementing order, with one or both IDs moving to the next value):
SELECT GrpNr,
STRING_AGG(DISTINCT CAST(id_1 as STRING), ',') as id_1_coll,
STRING_AGG(DISTINCT CAST(id_2 as STRING), ',') as id_2_coll
FROM
(
SELECT id_1, id_2,
SUM(CASE WHEN a.id_1 <> a.previous_id_1 and a.id_2 <> a.previous_id_2 THEN 1 ELSE 0 END)
OVER (ORDER BY RowNr) as GrpNr
FROM
(
SELECT *,
ROW_NUMBER() OVER () as RowNr,
LAG(t.id_1, 1) OVER (ORDER BY 1) AS previous_id_1,
LAG(t.id_2, 1) OVER (ORDER BY 1) AS previous_id_2
FROM t
) a
ORDER BY RowNr
) a
GROUP BY GrpNr
ORDER BY GrpNr
I don't think this is the question you mean to ask. This seems to be a graph-walking problem as referenced in the other answers, and in the response from @GordonLinoff to the question here, which I tested (and presume works for BigQuery).
This can also be done using sequential updates as done by @RomanPekar here (which I also tested). The main consideration seems to be performance. I'd assume DBMSs have gotten better at recursion since that was posted.
Rolling it up in either case should be fairly easy using String_Agg() as given above or as you have.
I'd be curious to see a more accurate representation of the data. If there is some consistency to how the data is stored/limitations to levels of nesting/other group structures there may be a shortcut approach other than recursion or iterative updates.

How to get a recursive tree for a single table element

I have a table of this type
| id | parent_id | title |
parent_id refers to the id of the same table
I need to get a recursive tree for an element knowing only its parent.
It will be clearer what I mean in the picture:
In the picture I need to get the recursive parent tree for element E (C's id is known): I need to get the A - C - E tree without B, D and the other elements, only for my element E.
The nesting can be even deeper; I just need to get all the parents in order, without unnecessary elements.
This is needed for breadcrumbs on my website.
How can I do this in PostgreSQL?
Use a recursive query:
with recursive rec(id,parent_id, title) as (
select id,parent_id, title from t
where title = 'E'
union all
select t.*
from rec
join t on t.id = rec.parent_id
)
select * from rec
id|parent_id|title|
--+---------+-----+
5| 3|E |
3| 1|C |
1| |A |
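Since this is for breadcrumbs, one possible follow-up (a sketch, not part of the original answer) is to carry a depth counter in the same recursive query and collapse the chain into a single path string with string_agg:
with recursive rec(id, parent_id, title, depth) as (
select id, parent_id, title, 0 from t
where title = 'E'
union all
select t.id, t.parent_id, t.title, rec.depth + 1
from rec
join t on t.id = rec.parent_id
)
select string_agg(title, ' > ' order by depth desc) as breadcrumb
from rec;
For the rows above this should give 'A > C > E'.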
Join your table to itself:
SELECT t1.title, t2.title as parent, t3.title as great_parent, ...
FROM my_table t1
JOIN my_table t2 on t1.parent_id = t2.id
JOIN my_table t3 on t2.parent_id = t3.id
...
WHERE t1.title = 'curent'
If you don't know how many parents you have, use LEFT JOIN and add as many columns as needed.
Thanks to Marmite Bomber, and with a small improvement to show the kinship level:
--drop table if exists recursive_test;
create table recursive_test (id_parent integer, id integer, title varchar);
insert into recursive_test (id_parent, id, title) values
 (1, 2, 'A')
,(2, 3, 'B')
,(2, 4, 'C')
,(4, 5, 'D')
,(3, 6, 'E')
,(3, 7, 'F')
,(6, 8, 'G')
,(6, 9, 'H')
,(4, 10, 'I')
,(4, 11, 'J');
WITH RECURSIVE search_tree(id, id_parent, title, step) AS (
SELECT t.id, t.id_parent, t.title, 1
FROM recursive_test t
WHERE title = 'I'
UNION ALL
SELECT t.id, t.id_parent, t.title, st.step + 1
FROM recursive_test t, search_tree st
WHERE t.id = st.id_parent
)
SELECT * FROM search_tree ORDER BY step DESC;
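For the sample rows above, this should return:
id|id_parent|title|step|
--+---------+-----+----+
 2|        1|A    |   3|
 4|        2|C    |   2|
10|        4|I    |   1|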

Oracle SQL Query - Element containing every element in subquery

I have 3 tables like so:
Document(ID:integer, Title:string)
Keywords(ID:integer, Name:string)
Document_Keywords(DocumentID:integer, KeywordID:integer)
Document_Keywords.DocumentID referencing Document.ID
Document_Keywords.KeywordID referencing Keywords.ID
A document contains [0, n] keywords.
I want to get every Document whose Keywords contain all of another Document's Keywords. Like so:
Foo, Bar and Fred -> Documents
Foo's keywords: {1, 2, 3}
Bar's keywords: {1, 2, 3, 4}
Fred's keywords: {1, 3, 5}
If we search for all the documents whose keywords contain Foo's keywords, we get Bar but not Fred.
Here is the query I have so far:
SELECT KeywordID
FROM Document_Keywords DK
JOIN Document D ON D.ID = DK.DocumentID
WHERE D.title = 'Foo'
MINUS
SELECT KeywordID
FROM Document_Keywords
WHERE DocumentID = 1;
It returns an empty result if the keywords of the Document with ID = 1 contain every keyword of Foo's.
I can't find any other way to solve this problem, as I can only use Oracle SQL to answer it.
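One way to extend that MINUS idea to check every document at once (relational division) is to wrap it in NOT EXISTS; a rough sketch, assuming the schema above, which with the sample data should return only Bar:
SELECT D1.ID, D1.Title
FROM Document D1
WHERE D1.Title != 'Foo'
AND NOT EXISTS (
  SELECT DKF.KeywordID
  FROM Document_Keywords DKF
  JOIN Document DF ON DF.ID = DKF.DocumentID
  WHERE DF.Title = 'Foo'
  MINUS
  SELECT DK1.KeywordID
  FROM Document_Keywords DK1
  WHERE DK1.DocumentID = D1.ID
);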
If you want to get keywords with documents:
SELECT KeywordID, D1.ID DOC_ID, D1.Title
FROM Document_Keywords DK1
JOIN Document D1
on DK1.DocumentID = D1.ID
WHERE exists
(select 1
from Document D2
join Document_Keywords DK2
on D2.ID = DK2.DocumentID
where D2.title = 'Foo'
and DK1.KeywordID=DK2.KeywordID
and D1.ID!= D2.ID
);
Full test case with test data and results:
with
Document(ID, Title) as (
select 1, 'Foo' from dual union all
select 2, 'Bar' from dual union all
select 3, 'Fred' from dual
)
,Keywords(ID, Name) as (
select level, 'Key'||level from dual connect by level<=5
)
,Document_Keywords(DocumentID, KeywordID) as (
select 1, column_value from table(sys.odcinumberlist(1,2,3)) union all -- Foo's keywords: {1, 2, 3}
select 2, column_value from table(sys.odcinumberlist(1,2,3,4)) union all -- Bar's keywords: {1, 2, 3, 4}
select 3, column_value from table(sys.odcinumberlist(1,3,5)) -- Fred's keywords: {1, 3, 5}
)
SELECT KeywordID, D1.ID DOC_ID, D1.Title
FROM Document_Keywords DK1
JOIN Document D1
on DK1.DocumentID = D1.ID
WHERE exists
(select 1
from Document D2
join Document_Keywords DK2
on D2.ID = DK2.DocumentID
where D2.title = 'Foo'
and DK1.KeywordID=DK2.KeywordID
and D1.ID!= D2.ID
);
KEYWORDID DOC_ID TITLE
---------- ---------- -----
1 2 Bar
1 3 Fred
2 2 Bar
3 2 Bar
3 3 Fred
If you want it without the documents, just a list of keywords:
SELECT distinct KeywordID
FROM Document_Keywords DK1
WHERE exists
(select 1
from Document D2
join Document_Keywords DK2
on D2.ID = DK2.DocumentID
where D2.title = 'Foo'
and DK1.KeywordID=DK2.KeywordID
and DK1.DocumentID!= D2.ID
);
Full test case with the results:
with
Document(ID, Title) as (
select 1, 'Foo' from dual union all
select 2, 'Bar' from dual union all
select 3, 'Fred' from dual
)
,Keywords(ID, Name) as (
select level, 'Key'||level from dual connect by level<=5
)
,Document_Keywords(DocumentID, KeywordID) as (
select 1, column_value from table(sys.odcinumberlist(1,2,3)) union all -- Foo's keywords: {1, 2, 3}
select 2, column_value from table(sys.odcinumberlist(1,2,3,4)) union all -- Bar's keywords: {1, 2, 3, 4}
select 3, column_value from table(sys.odcinumberlist(1,3,5)) -- Fred's keywords: {1, 3, 5}
)
SELECT distinct KeywordID
FROM Document_Keywords DK1
WHERE exists
(select 1
from Document D2
join Document_Keywords DK2
on D2.ID = DK2.DocumentID
where D2.title = 'Foo'
and DK1.KeywordID=DK2.KeywordID
and DK1.DocumentID!= D2.ID
);
KEYWORDID
----------
1
2
3
If I have this right, you want documents whose keywords contain all of another document's keywords as a submultiset.
Setup (building on Sayan's example):
create or replace type number_tt as table of number;
create table documents(id, title) as
select 1, 'Foo' from dual union all
select 2, 'Bar' from dual union all
select 3, 'Fred' from dual;
create table document_keywords(documentid, keywordid) as
select 1, column_value from table(number_tt(1,2,3)) union all
select 2, column_value from table(number_tt(1,2,3,4)) union all
select 3, column_value from table(number_tt(1,3,5));
Query:
with document_keywords_agg(documentid, title, keywordlist, keywordids) as (
select d.id, d.title
, listagg(dk.keywordid, ', ') within group (order by dk.keywordid)
, cast(collect(dk.keywordid) as number_tt)
from documents d
join document_keywords dk on dk.documentid = d.id
group by d.id, d.title
)
select dk1.documentid, dk1.title, dk1.keywordlist
, dk2.title as subset_title
, dk2.keywordlist as subset_keywords
from document_keywords_agg dk1
join document_keywords_agg dk2
on dk2.keywordids submultiset of dk1.keywordids
where dk2.documentid <> dk1.documentid;
Results:
DOCUMENTID
TITLE
KEYWORDLIST
SUBSET_TITLE
SUBSET_KEYWORDS
2
Bar
1, 2, 3, 4
Foo
1, 2, 3
To extend the example a little, let's add another document 'Dino' containing keywords {1,3,5,9}:
insert all
when rownum = 1 then into documents values (docid, 'Dino')
when 1=1 then into document_keywords values (docid, kw)
select 4 as docid, column_value as kw from table(number_tt(1,3,5,9));
Now the results are:
DOCUMENTID
TITLE
KEYWORDLIST
SUBSET_TITLE
SUBSET_KEYWORDS
2
Bar
1, 2, 3, 4
Foo
1, 2, 3
4
Dino
1, 3, 5, 9
Fred
1, 3, 5
(Add a filter to the where clause if you just want to check one document.)
SQL Fiddle
So, inner joining Document_Keywords to itself on KeywordID gives you the raw materials for what you are looking for, no?
. . .
From Document_Keywords A Inner Join Document_Keywords B On A.KeywordID=B.KeywordID
And A.DocumentID<>B.DocumentID
. . .
Granted, if the same Keyword is in multiple other documents you will get multiple occurrences of A.*, but you can summarize those out with a Group By, or possibly a Distinct clause.
If you need text-y results, you can add Document and Keywords table joins to this on the table A keys.
A query that delivers results in the format you specified above would be:
Select Title, ListAgg(KeywordID,',') Within Group (Order By KeywordID) as KeyWord_IDs
From (
Select D.Title,D.ID,A.KeywordID
From Document_Keywords A Inner Join Document_Keywords B On A.KeywordID=B.KeywordID
And A.DocumentID<>B.DocumentID
Inner Join Document D on D.ID=A.DocumentID
Group By D.Title, D.ID, A.KeywordID
)
Group By Title,ID

How to use Pivot SQL

For example,
select A,B,C,D,E,YEAR
FROM t1
where t1.year = 2018
UNION ALL
select A,B,C,D,E,YEAR
FROM t2
where t2.year = 2017
executes like this:
A --- B----C----D----E----YEAR
2 --- 4----6----8----10---2018
1 --- 3----5----7----9----2017
I would like to have a result like this:
2018 2017
A 2 1
B 4 3
C 6 5
D 8 7
E 10 9
I know I should use pivot, and I have googled around, but I cannot figure out how to write the code to get a result like the above.
Thanks
Assuming you are using Oracle 11.1 or above, you can use the pivot and unpivot operators. In your problem, the data is already "pivoted" one way, but you want it pivoted the other way; so you must un-pivot first, and then re-pivot the way you want it. In the solution below, the data is read from the table (I use a WITH clause to generate the test data, but you don't need the WITH clause, you can start at SELECT and use your actual table and column names). The data is fed through unpivot and then immediately to pivot - you don't need subqueries or anything like that.
Note about column names: don't use year, it is an Oracle keyword and will cause confusion, if not (much) worse. And in the output, you can't have 2018 and such as column names - identifiers must begin with a letter. You can get around these limitations using names in double quotes; that is a very poor practice though, best left just to the Oracle parser and not used by us humans. You will see I called the input column yr and the output columns y2018 and such.
with
inputs ( a, b, c, d, e, yr ) as (
select 2, 4, 6, 8, 10, 2018 from dual union all
select 1, 3, 5, 7, 9, 2017 from dual
)
select col, y2018, y2017
from inputs
unpivot ( val for col in (a as 'A', b as 'B', c as 'C', d as 'D', e as 'E') )
pivot ( min(val) for yr in (2018 as y2018, 2017 as y2017) )
order by col -- if needed
;
COL Y2018 Y2017
--- ---------- ----------
A 2 1
B 4 3
C 6 5
D 8 7
E 10 9
ADDED:
Here is how this used to be done (before the pivot and unpivot were introduced in Oracle 11.1). Unpivoting was done with a cross join to a small helper table, with a single column and as many rows as there were columns to unpivot in the base table - in this case, five columns, a, b, c, d, e need to be unpivoted, so the helper table has five rows. And pivoting was done with conditional aggregation. Both can be combined into a single query - there is no need for subqueries (other than to create the helper "table" or inline view).
Note, importantly, that the base table is read just once. Other methods of unpivoting are much more inefficient, because they require reading the base table multiple times.
select decode(lvl, 1, 'A', 2, 'B', 3, 'C', 4, 'D', 5, 'E') as col,
max(case yr when 2018 then decode(lvl, 1, a, 2, b, 3, c, 4, d, 5, e) end) as y2018,
max(case yr when 2017 then decode(lvl, 1, a, 2, b, 3, c, 4, d, 5, e) end) as y2017
from inputs cross join ( select level as lvl from dual connect by level <= 5 )
group by decode(lvl, 1, 'A', 2, 'B', 3, 'C', 4, 'D', 5, 'E')
order by decode(lvl, 1, 'A', 2, 'B', 3, 'C', 4, 'D', 5, 'E')
;
This looks worse than it is; the same decode() function is called three times, but with exactly the same arguments, so it is calculated only once, the value is cached and it is reused in the other places. (It is calculated for group by and then reused for select and for order by.)
To test, you can use the same WITH clause as above - or your actual data.
decode() is proprietary to Oracle, but the same can be written with case expressions (essentially identical to the decode() approach, just different syntax) and it will work in most other database products.
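For reference, here is a sketch of the same conditional aggregation written with CASE expressions instead of decode(), assuming the same inputs data as in the WITH clause above. The inline view is only there for readability, and the dual / connect by row generator is still Oracle syntax; swap in whatever your database uses to produce five rows.
select col,
       max(case when yr = 2018 then val end) as y2018,
       max(case when yr = 2017 then val end) as y2017
from (
  select case lvl when 1 then 'A' when 2 then 'B' when 3 then 'C'
                  when 4 then 'D' when 5 then 'E' end as col,
         case lvl when 1 then a when 2 then b when 3 then c
                  when 4 then d when 5 then e end as val,
         yr
  from inputs
  cross join ( select level as lvl from dual connect by level <= 5 )
)
group by col
order by col;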
This is a bit tricky -- unpivoting and repivoting. Here is one way:
select col,
max(case when year = 2018 then val end) as val_2018,
max(case when year = 2017 then val end) as val_2017
from ((select 'A' as col, A as val, YEAR from t1 where year = 2018) union all
(select 'B' as col, B as val, YEAR from t1 where year = 2018) union all
(select 'C' as col, C as val, YEAR from t1 where year = 2018) union all
(select 'D' as col, D as val, YEAR from t1 where year = 2018) union all
(select 'E' as col, E as val, YEAR from t1 where year = 2018) union all
(select 'A' as col, A as val, YEAR from t2 where year = 2017) union all
(select 'B' as col, B as val, YEAR from t2 where year = 2017) union all
(select 'C' as col, C as val, YEAR from t2 where year = 2017) union all
(select 'D' as col, D as val, YEAR from t2 where year = 2017) union all
(select 'E' as col, E as val, YEAR from t2 where year = 2017)
) tt
group by col;
You don't specify the database, but this is pretty database independent.

Idiomatic equivalent to map structure

My analytics involve the need to aggregate rows and to store the number of occurrences of each distinct value of a field someField across all the rows.
Sample data structure
[someField, someKey]
I'm trying to GROUP BY someKey and then be able to know, for each of the results, how many times each someField value occurred
Example:
[someField: a, someKey: 1],
[someField: a, someKey: 1],
[someField: b, someKey: 1],
[someField: c, someKey: 2],
[someField: d, someKey: 2]
What I would like to achieve:
[someKey: 1, fields: {a: 2, b: 1}],
[someKey: 2, fields: {c: 1, d: 1}],
Does it work for you?
WITH data AS (
select 'a' someField, 1 someKey UNION all
select 'a', 1 UNION ALL
select 'b', 1 UNION ALL
select 'c', 2 UNION ALL
select 'd', 2)
SELECT
someKey,
ARRAY_AGG(STRUCT(someField, freq)) fields
FROM(
SELECT
someField,
someKey,
COUNT(someField) freq
FROM data
GROUP BY 1, 2
)
GROUP BY 1
It won't give exactly the results you are looking for, but it should serve the same queries your previous result would. As you said, for each key you can retrieve how many times (the freq column) each someField value occurred.
I've been looking for a way to aggregate structs and couldn't find one, but retrieving the results as an ARRAY of STRUCTs turned out to be quite straightforward.
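If a literal map-like value per key is wanted, one option (a sketch, not from the query above) is to render the per-key counts as a JSON-style string with STRING_AGG and FORMAT:
WITH data AS (
select 'a' someField, 1 someKey UNION all
select 'a', 1 UNION ALL
select 'b', 1 UNION ALL
select 'c', 2 UNION ALL
select 'd', 2)
SELECT
  someKey,
  '{' || STRING_AGG(FORMAT('"%s": %d', someField, freq), ', ' ORDER BY someField) || '}' AS fields
FROM (
  SELECT someField, someKey, COUNT(*) AS freq
  FROM data
  GROUP BY 1, 2
)
GROUP BY someKey
ORDER BY someKey
This should produce {"a": 2, "b": 1} for someKey 1 and {"c": 1, "d": 1} for someKey 2.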
There's probably a smarter way to do this (and get it in the format you want e.g. using an Array for the 2nd column), but this might be enough for you:
with sample as (
select 'a' as someField, 1 as someKey UNION all
select 'a' as someField, 1 as someKey UNION ALL
select 'b' as someField, 1 as someKey UNION ALL
select 'c' as someField, 2 as someKey UNION ALL
select 'd' as someField, 2 as someKey)
SELECT
someKey,
SUM(IF(someField = 'a', 1, 0)) AS a,
SUM(IF(someField = 'b', 1, 0)) AS b,
SUM(IF(someField = 'c', 1, 0)) AS c,
SUM(IF(someField = 'd', 1, 0)) AS d
FROM
sample
GROUP BY
someKey order by somekey asc
Results:
someKey a b c d
---------------------
1 2 1 0 0
2 0 0 1 1
This is a well-used technique in BigQuery (see here).
I'm trying to GROUP BY someKey and then be able to know, for each of the results, how many times each someField value occurred
#standardSQL
SELECT
someKey,
someField,
COUNT(someField) freq
FROM yourTable
GROUP BY 1, 2
-- ORDER BY someKey, someField
What I would like to achieve:
[someKey: 1, fields: {a: 2, b: 1}],
[someKey: 2, fields: {c: 1, d: 1}],
This is different from what you expressed in words - it is called pivoting, and based on your comment - "The a, b, c, and d keys are potentially infinite" - it is most likely not what you need. At the same time, pivoting is easily doable too (if you have some finite number of field values), and you can find plenty of related posts.