What would be the Pig script equivalent of the SQL query below?
SELECT fld1, fld2, fld3, SUM(fld4)
FROM Table1
GROUP BY fld1, fld2, fld3;
For Table1:
A B C 2 X Y Z
A B C 3 X Y Z
A B D 2 X Y Z
A C D 2 X Y Z
A C D 2 X Y Z
A C D 2 X Y Z
OUTPUT:
A B C 5
A B D 2
A C D 6
Ref: https://pig.apache.org/docs/r0.11.1/basic.html#GROUP, where you can find a multi-group example.
For your use case, the code below should suffice:
A = load 'input.csv' using PigStorage(',') AS (fld1:chararray,fld2:chararray,fld3:chararray,fld4:long,fld5:chararray,fld6:chararray,fld7:chararray);
B = FOREACH(GROUP A BY (fld1,fld2,fld3)) GENERATE FLATTEN(group) AS (fld1,fld2,fld3), SUM(A.fld4) AS fld4_aggr;
DUMP B;
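To sanity-check the expected output, here is a minimal sketch that runs the original SQL query against the sample rows with Python's built-in sqlite3 module. It is not Pig; the in-memory database and the column names fld5 to fld7 for the trailing X Y Z values are assumptions made only to confirm what the Pig script should produce.

```python
import sqlite3

# Sample rows from the question; fld5..fld7 hold the trailing X, Y, Z values.
rows = [
    ("A", "B", "C", 2, "X", "Y", "Z"),
    ("A", "B", "C", 3, "X", "Y", "Z"),
    ("A", "B", "D", 2, "X", "Y", "Z"),
    ("A", "C", "D", 2, "X", "Y", "Z"),
    ("A", "C", "D", 2, "X", "Y", "Z"),
    ("A", "C", "D", 2, "X", "Y", "Z"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table1 (fld1 TEXT, fld2 TEXT, fld3 TEXT, fld4 INTEGER, "
    "fld5 TEXT, fld6 TEXT, fld7 TEXT)"
)
conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Same query as in the question, with ORDER BY added for a stable row order.
grouped = conn.execute(
    "SELECT fld1, fld2, fld3, SUM(fld4) FROM Table1 "
    "GROUP BY fld1, fld2, fld3 ORDER BY fld1, fld2, fld3"
).fetchall()

for row in grouped:
    print(*row)
```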
How to print the letters A to Z in a query without using a table
For Oracle, here's one option:
SQL> select chr(level + 64) letter
2 from dual
3 connect by level <= ascii('Z') - ascii('A') + 1;
LETTER
----------
A
B
C
D
E
F
<snip>
X
Y
Z
26 rows selected.
SQL>
For Postgres:
select chr(code)
from generate_series(ascii('A'), ascii('Z')) as t(code)
order by code;
Or:
select chr(generate_series(65,90));
output
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
Z
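Both approaches rely on the same inclusive range of character codes, 65 ('A') through 90 ('Z'). A small Python sketch of that range, given only to illustrate the bounds (the variable name `letters` is made up for this example):

```python
# Inclusive range of code points from 'A' (65) to 'Z' (90).
letters = [chr(code) for code in range(ord("A"), ord("Z") + 1)]

print(len(letters), letters[0], letters[-1])
```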
Count duplicate records using LINQ
Col1 col2
x a
x a
x b
x b
y c
y c
y d
y d
z e
z e
z f
Now I want the counts as follows:
x a 2
x b 2
y c 2
y d 2
How can I do this in LINQ? Please assist.
table
    .GroupBy(x => new { x.col1, x.col2 })
    // One result row per distinct (col1, col2) pair, with its row count.
    .Select(g => new { g.Key.col1, g.Key.col2, count = g.Count() });
var Result =
from t in table
group t by new
{
t.col1,
t.col2,
} into gt
select new
{
col1 = gt.Key.col1,
col2 = gt.Key.col2,
count = gt.Count(),
};
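For comparison, the same group-and-count idea can be sketched outside LINQ. The Python below is an illustration only, using the sample rows from the question; the `duplicates` filter (count greater than 1) is an assumption about what "duplicate" should mean here and is not part of the LINQ answers.

```python
from collections import Counter

# (col1, col2) pairs from the question's sample table.
rows = [
    ("x", "a"), ("x", "a"), ("x", "b"), ("x", "b"),
    ("y", "c"), ("y", "c"), ("y", "d"), ("y", "d"),
    ("z", "e"), ("z", "e"), ("z", "f"),
]

# Group by the (col1, col2) pair and count the rows in each group.
counts = Counter(rows)

# Keep only the pairs that occur more than once.
duplicates = {pair: n for pair, n in counts.items() if n > 1}

for (col1, col2), n in counts.items():
    print(col1, col2, n)
```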
My problem is easiest to explain with an example.
There are 2 columns:
X Y
A 2
A 1
A 3
B 3
C 2
A 1
D 2
B 1
B 3
C 1
A 1
D 3
D 1
Now I would like to select only the rows whose X value has at least one row where Y is 2.
So my output should look like:
X Y
A 2
A 1
A 3
C 2
A 1
D 2
C 1
A 1
D 3
D 1
B is excluded because no row with X=B has Y=2 in the main table.
What is the query for this operation? I tried something with CASE WHEN, but it didn't work for me.
Try:
SELECT X, Y FROM Table WHERE X IN (SELECT X FROM Table WHERE Y=2)
Or try:
SELECT t1.X, t1.Y FROM Table t1
INNER JOIN Table t2 ON t1.X = t2.X
WHERE t2.Y = 2
Try a subquery:
SELECT X, Y FROM table WHERE X IN (SELECT X FROM table WHERE Y = 2);
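A quick way to check the subquery approach is to run it on the sample data. The sketch below uses Python's sqlite3 module; the table name `t` is hypothetical (chosen because TABLE is a reserved word), and Y is selected alongside X so the result can be compared with the expected output in the question.

```python
import sqlite3

# (X, Y) rows from the question, in their original order.
data = [
    ("A", 2), ("A", 1), ("A", 3), ("B", 3), ("C", 2), ("A", 1), ("D", 2),
    ("B", 1), ("B", 3), ("C", 1), ("A", 1), ("D", 3), ("D", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (X TEXT, Y INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", data)

# Keep every row whose X appears in at least one row with Y = 2.
result = conn.execute(
    "SELECT X, Y FROM t "
    "WHERE X IN (SELECT X FROM t WHERE Y = 2) "
    "ORDER BY rowid"
).fetchall()

for x, y in result:
    print(x, y)
```

On this data the B rows drop out, since none of them has Y = 2.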
I'm building matching rules for data reconciliation systems and need your advice on adjusting my SQL, as it currently doesn't return what I need.
There are 2 source tables:
Table X                  Table Y
---------------------    ----------------------
Exec_ID  From  To        Exec_ID  From  To
1        A     B         1        B     C
2        A     B         2        B     C
3        A     B         3        B     C
4        A     B         5        B     C
Matching conditions are:
X.To = Y.From
X.Exec_ID = Y.Exec_ID
If there is A -> B and then B -> C, it should return A -> C in the end.
If there is only A -> B and no further B -> C, it should return A -> B.
So the output should be the following.
From To
---------
A C
A C
A C
A B
The SQL I'm using is:
select X.From, Y.To
from x
left outer join y on
x.To = Y.From
and x.Exec_ID = y.Exec_ID
It returns the following values:
A C
A C
A C
A Null
So the last record is incorrect, as it should be A B. Please help me adjust the query.
Check for null?
select X.From, [To] = COALESCE(Y.To, X.To)
from x
left outer join y on
x.To = Y.From
and x.Exec_ID = y.Exec_ID
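The fallback can be verified on the sample data. This is a sketch using Python's sqlite3 module, under two assumptions: the columns From and To are renamed to the hypothetical `src` and `dst` (both original names are reserved words), and Table Y is read as holding Exec_ID 1, 2, 3 and 5, which is what the four-row expected output implies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (exec_id INTEGER, src TEXT, dst TEXT)")
conn.execute("CREATE TABLE y (exec_id INTEGER, src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO x VALUES (?, ?, ?)",
    [(1, "A", "B"), (2, "A", "B"), (3, "A", "B"), (4, "A", "B")],
)
conn.executemany(
    "INSERT INTO y VALUES (?, ?, ?)",
    [(1, "B", "C"), (2, "B", "C"), (3, "B", "C"), (5, "B", "C")],
)

# COALESCE falls back to x.dst when the left join finds no matching y row.
matched = conn.execute(
    """
    SELECT x.src, COALESCE(y.dst, x.dst)
    FROM x
    LEFT OUTER JOIN y
      ON x.dst = y.src
     AND x.exec_id = y.exec_id
    ORDER BY x.exec_id
    """
).fetchall()

for src, dst in matched:
    print(src, dst)
```

Exec_ID 4 has no partner in y, so its row keeps the original destination B instead of NULL.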
I have written two queries against the table below, both self-joining it: one uses a plain SELECT and the other uses WITH. The two queries return different results. Why does this happen?
The table name is test_cur:
ID_SOURCE_CUR ID_TARGET_CUR
------------- --------------
A B
B C
C D
D E
A Z
G A
K A
Q A
J J
K K
K L
L K
B A
Z A
So why do the two queries below return different results?
SELECT *
FROM test_cur tu, test_cur fu
WHERE tu.id_target_cur = 'A'
AND fu.id_source_cur = 'A'
AND tu.id_source_cur <> fu.id_target_cur;
returns 8 rows.
ID_SOURCE_CUR ID_TARGET_CUR ID_SOURCE_CUR_1 ID_TARGET_CUR_1
-------------- -------------- -------------- --------------
G A A B
K A A B
Q A A B
Z A A B
G A A Z
K A A Z
Q A A Z
B A A Z
And:
WITH qry1 AS
(SELECT *
FROM test_cur)
SELECT *
FROM qry1 tu, qry1 fu
WHERE tu.id_target_cur = 'A'
AND fu.id_target_cur = 'A'
AND tu.id_source_cur <> fu.id_target_cur;
returns 25 rows.
ID_SOURCE_CUR ID_TARGET_CUR ID_SOURCE_CUR_1 ID_TARGET_CUR_1
-------------- -------------- -------------- --------------
G A G A
G A K A
G A Q A
G A B A
G A Z A
K A G A
K A K A
K A Q A
K A B A
K A Z A
Q A G A
Q A K A
Q A Q A
Q A B A
Q A Z A
B A G A
B A K A
B A Q A
B A B A
B A Z A
Z A G A
Z A K A
Z A Q A
Z A B A
Z A Z A
Why?
Your second query is different: it has a different WHERE clause. The first WHERE is:
WHERE tu.id_target_cur = 'A'
AND fu.id_source_cur = 'A'
AND tu.id_source_cur <> fu.id_target_cur;
The second is:
WHERE tu.id_target_cur = 'A'
AND fu.id_target_cur = 'A' -- this line is different, it should be fu.id_source_cur = 'A'
AND tu.id_source_cur <> fu.id_target_cur;
Change that line and both queries return exactly the same results.
The WHERE clauses are different: fu.id_source_cur = 'A' vs. fu.id_target_cur = 'A'
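The row counts can be reproduced with a short sketch using Python's sqlite3 module and the test_cur rows from the question. The helper `count_rows` is hypothetical; it runs the WITH form of the query and varies only the column used in the second predicate, which shows that the CTE itself is not what changes the result.

```python
import sqlite3

# (id_source_cur, id_target_cur) rows from the question.
pairs = [
    ("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("A", "Z"),
    ("G", "A"), ("K", "A"), ("Q", "A"), ("J", "J"), ("K", "K"),
    ("K", "L"), ("L", "K"), ("B", "A"), ("Z", "A"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_cur (id_source_cur TEXT, id_target_cur TEXT)")
conn.executemany("INSERT INTO test_cur VALUES (?, ?)", pairs)


def count_rows(fu_column):
    """Count self-join rows, filtering fu on the given column = 'A'."""
    # The column name comes from a fixed whitelist, never from user input.
    if fu_column not in ("id_source_cur", "id_target_cur"):
        raise ValueError(fu_column)
    sql = f"""
        WITH qry1 AS (SELECT * FROM test_cur)
        SELECT COUNT(*)
        FROM qry1 tu, qry1 fu
        WHERE tu.id_target_cur = 'A'
          AND fu.{fu_column} = 'A'
          AND tu.id_source_cur <> fu.id_target_cur
    """
    return conn.execute(sql).fetchone()[0]


source_count = count_rows("id_source_cur")  # predicate from the first query
target_count = count_rows("id_target_cur")  # predicate from the second query
print(source_count, target_count)
```

With the fu.id_source_cur predicate the WITH query yields 8 rows, matching the plain SELECT; with fu.id_target_cur it yields 25.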