Count aggregation not working in Pig

I am new to Apache Pig and I'm facing a problem. I want to count the number of employees reporting to each manager, but I don't think this code gives the right output. I need your help.
Here's the source data file:
7369,SMITH,CLERK,7902,1980-12-17,800.00,NULL,20
7499,ALLEN,SALESMAN,7698,1981-02-20,1600.00,300.00,30
7521,WARD,SALESMAN,7698,1981-02-22,1250.00,500.00,30
7566,JONES,MANAGER,7839,1981-04-02,2975.00,NULL,20
7654,MARTIN,SALESMAN,7698,1981-09-28,1250.00,1400.00,30
7698,BLAKE,MANAGER,7839,1981-05-01,2850.00,NULL,30
7782,CLARK,MANAGER,7839,1981-06-09,2450.00,NULL,10
7788,SCOTT,ANALYST,7566,1982-12-09,3000.00,NULL,20
7839,KING,PRESIDENT,NULL,1981-11-17,5000.00,NULL,10
7844,TURNER,SALESMAN,7698,1981-09-08,1500.00,0.00,30
7876,ADAMS,CLERK,7788,1983-01-12,1100.00,NULL,20
7900,JAMES,CLERK,7698,1981-12-03,950.00,NULL,30
7902,FORD,ANALYST,7566,1981-12-03,3000.00,NULL,20
7934,MILLER,CLERK,7782,1982-01-23,1300.00,NULL,10
Here's the code:
data_mgr = load '/users/Desktop/Employees.rtf' using
PigStorage(',') as (empno:int, empname:chararray, job:chararray,
mgr:int, hiredate:chararray, sal:float, comm:float, dept:int);
data_emp = load '/users/Desktop/Employees.rtf' using
PigStorage(',') as
(empno:int, empname:chararray, job:chararray, mgr:int,
hiredate:chararray, sal:float, comm:float, dept:int);
joined = join data_mgr by mgr, data_emp by empno;
select1 = foreach joined generate data_mgr::empno as mgrid,
data_mgr::empname as mgrname, data_emp::empno as empno;
grouped = group select1 by ($0, $1);
select2 = foreach grouped generate group, COUNT(select1) as
no_of_reportees;
ordered = order select2 by no_of_reportees desc;
dump ordered;

Try this:
emp_data = LOAD '/users/Desktop/Employees.rtf' USING PigStorage(',') AS (empno:int, empname:chararray, job:chararray, mgrid:int, hiredate:chararray, sal:float, comm:float, dept:int);
mgr_group = GROUP emp_data BY mgrid;
emp_count = FOREACH mgr_group GENERATE group AS mgr_id, COUNT(emp_data) AS Count;
emp_count_ordered = ORDER emp_count BY Count DESC;
DUMP emp_count_ordered;
Note: you can then JOIN the result with the original dataset to fetch the manager's name.
Do you mean something like this? (I have not tested it.)
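To sanity-check what the script above should print, here is a minimal Python sketch of the same GROUP BY mgrid / COUNT logic over the sample rows. It is not Pig; the data is embedded as a string, and mapping the literal NULL to None is an assumption meant to mirror Pig casting that field to null (Pig groups null keys together, so KING's row forms its own group).

```python
from collections import Counter

# Sample rows from the question: empno,empname,job,mgr,hiredate,sal,comm,dept
DATA = """\
7369,SMITH,CLERK,7902,1980-12-17,800.00,NULL,20
7499,ALLEN,SALESMAN,7698,1981-02-20,1600.00,300.00,30
7521,WARD,SALESMAN,7698,1981-02-22,1250.00,500.00,30
7566,JONES,MANAGER,7839,1981-04-02,2975.00,NULL,20
7654,MARTIN,SALESMAN,7698,1981-09-28,1250.00,1400.00,30
7698,BLAKE,MANAGER,7839,1981-05-01,2850.00,NULL,30
7782,CLARK,MANAGER,7839,1981-06-09,2450.00,NULL,10
7788,SCOTT,ANALYST,7566,1982-12-09,3000.00,NULL,20
7839,KING,PRESIDENT,NULL,1981-11-17,5000.00,NULL,10
7844,TURNER,SALESMAN,7698,1981-09-08,1500.00,0.00,30
7876,ADAMS,CLERK,7788,1983-01-12,1100.00,NULL,20
7900,JAMES,CLERK,7698,1981-12-03,950.00,NULL,30
7902,FORD,ANALYST,7566,1981-12-03,3000.00,NULL,20
7934,MILLER,CLERK,7782,1982-01-23,1300.00,NULL,10
"""


def parse_mgr(field):
    # Assumption: the literal NULL means "no manager" (KING).
    return None if field == "NULL" else int(field)


def reportee_counts(text):
    """Count rows per manager id, like GROUP BY mgrid + COUNT(emp_data)."""
    counts = Counter(parse_mgr(line.split(",")[3]) for line in text.strip().splitlines())
    # Equivalent of ORDER BY Count DESC.
    return counts.most_common()


if __name__ == "__main__":
    for mgr_id, count in reportee_counts(DATA):
        print(mgr_id, count)
```

BLAKE (7698) should come out on top with 5 reportees, followed by KING (7839) with 3.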
data_emp = load '/users/Desktop/Employees.rtf' using PigStorage(',') as (empno:int, empname:chararray, job:chararray, mgrid:int, hiredate:chararray, sal:float, comm:float, dept:int);
data_mgr = load '/users/Desktop/Employees.rtf' using PigStorage(',') as (empno:int, empname:chararray, job:chararray, mgrid:int, hiredate:chararray, sal:float, comm:float, dept:int);
-- data_emp plays the manager, data_mgr the reportee: match each reportee's mgrid to a manager's empno
emp_mgr_join = join data_emp by empno, data_mgr by mgrid;
-- take the manager's id and name from the manager side of the join
emp_mgr_join_sub = foreach emp_mgr_join generate data_emp::empno as mgrid, data_emp::empname as mgrname, data_mgr::empno as empno;
-- group by id and name together so the name comes out as a scalar, not a bag
emp_mgr_grouped = group emp_mgr_join_sub by (mgrid, mgrname);
emp_mgr_count = foreach emp_mgr_grouped generate FLATTEN(group) as (mgr_id, mgr_name), COUNT(emp_mgr_join_sub) as no_of_reportees;
ordered = order emp_mgr_count by no_of_reportees desc;
dump ordered;
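The self-join version can be cross-checked the same way. This Python sketch (not Pig) joins reportee.mgrid to manager.empno and counts per (manager id, manager name); as with an inner join, KING's row drops out because his manager id is null.

```python
from collections import Counter

# Sample rows from the question: empno,empname,job,mgr,hiredate,sal,comm,dept
DATA = """\
7369,SMITH,CLERK,7902,1980-12-17,800.00,NULL,20
7499,ALLEN,SALESMAN,7698,1981-02-20,1600.00,300.00,30
7521,WARD,SALESMAN,7698,1981-02-22,1250.00,500.00,30
7566,JONES,MANAGER,7839,1981-04-02,2975.00,NULL,20
7654,MARTIN,SALESMAN,7698,1981-09-28,1250.00,1400.00,30
7698,BLAKE,MANAGER,7839,1981-05-01,2850.00,NULL,30
7782,CLARK,MANAGER,7839,1981-06-09,2450.00,NULL,10
7788,SCOTT,ANALYST,7566,1982-12-09,3000.00,NULL,20
7839,KING,PRESIDENT,NULL,1981-11-17,5000.00,NULL,10
7844,TURNER,SALESMAN,7698,1981-09-08,1500.00,0.00,30
7876,ADAMS,CLERK,7788,1983-01-12,1100.00,NULL,20
7900,JAMES,CLERK,7698,1981-12-03,950.00,NULL,30
7902,FORD,ANALYST,7566,1981-12-03,3000.00,NULL,20
7934,MILLER,CLERK,7782,1982-01-23,1300.00,NULL,10
"""


def parse(text):
    """Return (empno, empname, mgrid) tuples; the literal NULL becomes None."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split(",")
        mgrid = None if fields[3] == "NULL" else int(fields[3])
        rows.append((int(fields[0]), fields[1], mgrid))
    return rows


def reportees_with_names(rows):
    """Self-join reportee.mgrid = manager.empno, then count per (mgr id, mgr name)."""
    names = {empno: name for empno, name, _ in rows}  # the "manager" copy of the data
    # An inner join drops reportees whose mgrid is null or has no matching empno.
    counts = Counter((mgrid, names[mgrid]) for _, _, mgrid in rows if mgrid in names)
    # ORDER BY no_of_reportees DESC; sorted() keeps ties in their original order.
    return sorted(((mid, mname, n) for (mid, mname), n in counts.items()),
                  key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for row in reportees_with_names(parse(DATA)):
        print(row)
```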

Related

How do I turn a Postgres query into CodeIgniter Query Builder format?

I want to select data from several tables. Here is the query.
How do I turn this:
SELECT a.*, b.penyusun, c.keywords
FROM cb_monograf a
INNER JOIN (
select row_number() over (order by id_monograf) nomer, id_monograf, string_agg(id_penyusun::varchar,'-') penyusun
from cb_penyusun_monograf
group by id_monograf
) b ON a.id_monograf = b.id_monograf
INNER JOIN (
select row_number() over (order by id_monograf) nomer, id_monograf, string_agg(id_keywords::varchar,'-') keywords
from cb_keywords_monograf
group by id_monograf
) c ON a.id_monograf = c.id_monograf
WHERE a.jenis = 'buku'
into something like this format?
$this->db->select('a.*,b.penyusun')
->from('cb_monograf a')
->join("($subquery1) b","a.id_monograf = b.id_monograf","inner")
->where('jenis', $param_type);
You're not far off - try this:
// Just to keep code a bit clearer
$db = $this->db;
// Firstly build the selects
$db->select("row_number() over (order by id_monograf) nomer, id_monograf, string_agg(id_penyusun::varchar, '-') penyusun", FALSE); // FALSE: don't let CodeIgniter escape the expressions
$db->group_by('id_monograf');
$q1 = $db->get_compiled_select('cb_penyusun_monograf');
$db->select("row_number() over (order by id_monograf) nomer, id_monograf, string_agg(id_keywords::varchar, '-') keywords", FALSE);
$db->group_by('id_monograf');
$q2 = $db->get_compiled_select('cb_keywords_monograf');
// Final query
$db->select('a.*, b.penyusun, c.keywords');
$db->join("($q1) b",'a.id_monograf = b.id_monograf','inner');
$db->join("($q2) c",'a.id_monograf = c.id_monograf','inner');
$db->where('a.jenis','buku');
$data = $db->get('cb_monograf a')->result_array(); // or row_array()
Note: since PostgreSQL supports USING, the joins can be simplified to the following (a plain JOIN is already an inner join, so 'inner' can probably be dropped as well):
$db->select('a.*, b.penyusun, c.keywords');
$db->join("($q1) b",'id_monograf'); // inner may also not be required
$db->join("($q2) c",'id_monograf'); // inner may also not be required
$db->where('a.jenis','buku');
$data = $db->get('cb_monograf a')->result_array(); // or row_array()
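The core idea of the answer is to compile each aggregate as SQL text first and then join those strings as derived tables. Here is a minimal sketch of that pattern using Python's sqlite3 instead of CodeIgniter/PostgreSQL. Assumptions: SQLite has no ::varchar cast, so CAST(... AS TEXT) and group_concat() stand in for string_agg(); the unused nomer column is omitted; the joins use ON; and the judul column and all sample rows are hypothetical.

```python
import sqlite3

# Hypothetical minimal schema: only id_monograf, jenis, id_penyusun and
# id_keywords come from the question; judul and the rows are made up.
SCHEMA = """
CREATE TABLE cb_monograf (id_monograf INTEGER, jenis TEXT, judul TEXT);
CREATE TABLE cb_penyusun_monograf (id_monograf INTEGER, id_penyusun INTEGER);
CREATE TABLE cb_keywords_monograf (id_monograf INTEGER, id_keywords INTEGER);
INSERT INTO cb_monograf VALUES (1, 'buku', 'A'), (2, 'jurnal', 'B'), (3, 'buku', 'C');
INSERT INTO cb_penyusun_monograf VALUES (1, 10), (1, 11), (2, 12);
INSERT INTO cb_keywords_monograf VALUES (1, 5), (2, 6);
"""


def compiled_subquery(table, column, alias):
    # Plays the role of get_compiled_select(): returns SQL text, runs nothing.
    return (f"SELECT id_monograf, group_concat(CAST({column} AS TEXT), '-') AS {alias} "
            f"FROM {table} GROUP BY id_monograf")


def build_query():
    q1 = compiled_subquery("cb_penyusun_monograf", "id_penyusun", "penyusun")
    q2 = compiled_subquery("cb_keywords_monograf", "id_keywords", "keywords")
    # Same shape as the final query: each compiled string becomes a derived table.
    return ("SELECT a.*, b.penyusun, c.keywords FROM cb_monograf a "
            f"JOIN ({q1}) b ON a.id_monograf = b.id_monograf "
            f"JOIN ({q2}) c ON a.id_monograf = c.id_monograf "
            "WHERE a.jenis = ?")


def run(jenis):
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    rows = con.execute(build_query(), (jenis,)).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(run("buku"))
```

Monograph 3 has no authors or keywords, so the inner joins exclude it, just as INNER JOIN does in the original query. The order of the values inside each group_concat string is not guaranteed.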

How to fetch other column values along with the GROUP BY columns

How can I fetch the values of columns other than the GROUP BY columns, as below?
SELECT DIS.iDataItemID,
DIS.iDataItemCurrentStatusID,
COUNT(*) ,
DI.iDataTypeID,
DI.iEmployeeID,
DI.iEmployerID
FROM DataItem DI INNER JOIN DataItemStatus DIS
ON DI.iDataItemID = DIS.iDataItemID Where iDataItemCurrentStatusID = 1
GROUP BY DIS.iDataItemID, DIS.iDataItemCurrentStatusID
The query works fine if I remove the three columns from the SELECT (i.e. iDataTypeID, iEmployeeID, iEmployerID).
Please let me know how to fetch the result with these values.
When using GROUP BY, for your SQL to be valid every selected column must either be inside an aggregate (COUNT, MAX, ...) or appear in the GROUP BY list (MySQL is an exception when the ONLY_FULL_GROUP_BY SQL mode is disabled).
In short, just add the three extra columns you want to see to the end of your current GROUP BY clause. If each iDataItemID has exactly one iDataTypeID, iEmployeeID and iEmployerID, this does not change the counts:
SELECT DIS.iDataItemID,
DIS.iDataItemCurrentStatusID,
COUNT(*) ,
DI.iDataTypeID,
DI.iEmployeeID,
DI.iEmployerID
FROM DataItem DI INNER JOIN DataItemStatus DIS
ON DI.iDataItemID = DIS.iDataItemID Where iDataItemCurrentStatusID = 1
GROUP BY DIS.iDataItemID, DIS.iDataItemCurrentStatusID, DI.iDataTypeID,
DI.iEmployeeID, DI.iEmployerID
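Here is a small sketch of that query running against Python's built-in sqlite3, to show the shape of the result. The tables are cut down to the columns in the question and the rows are hypothetical. SQLite is lenient about non-grouped columns, so this illustrates the output, not the error a stricter DBMS would raise.

```python
import sqlite3

# Hypothetical data: DataItem has one row per iDataItemID, so the extra
# DI columns are functionally dependent on the grouping key.
SCHEMA = """
CREATE TABLE DataItem (iDataItemID INTEGER PRIMARY KEY, iDataTypeID INTEGER,
                       iEmployeeID INTEGER, iEmployerID INTEGER);
CREATE TABLE DataItemStatus (iDataItemID INTEGER, iDataItemCurrentStatusID INTEGER);
INSERT INTO DataItem VALUES (1, 100, 7, 70), (2, 200, 8, 80);
INSERT INTO DataItemStatus VALUES (1, 1), (1, 1), (1, 2), (2, 1);
"""

QUERY = """
SELECT DIS.iDataItemID, DIS.iDataItemCurrentStatusID, COUNT(*),
       DI.iDataTypeID, DI.iEmployeeID, DI.iEmployerID
FROM DataItem DI INNER JOIN DataItemStatus DIS
  ON DI.iDataItemID = DIS.iDataItemID
WHERE iDataItemCurrentStatusID = 1
GROUP BY DIS.iDataItemID, DIS.iDataItemCurrentStatusID,
         DI.iDataTypeID, DI.iEmployeeID, DI.iEmployerID
ORDER BY DIS.iDataItemID
"""


def grouped_rows():
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in grouped_rows():
        print(row)
```

Item 1 has two status rows with status 1, so its count is 2. Adding the DI columns to the GROUP BY does not split the groups.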
Most DBMSs support window functions:
SELECT DIS.iDataItemID, DIS.iDataItemCurrentStatusID,
COUNT(*) OVER (PARTITION BY DIS.iDataItemID, DIS.iDataItemCurrentStatusID),
DI.iDataTypeID,
DI.iEmployeeID,
DI.iEmployerID
FROM DataItem DI INNER JOIN
DataItemStatus DIS
ON DI.iDataItemID = DIS.iDataItemID
WHERE iDataItemCurrentStatusID = 1;
This assumes your DBMS is one of those that support window functions. Note that, unlike GROUP BY, this returns one row per joined row rather than one row per group.
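To make the difference from GROUP BY concrete, here is a sketch of the window-function query using Python's sqlite3 (this assumes the bundled SQLite is 3.25 or newer, which added window functions). The data is hypothetical. Adding DISTINCT collapses the repeated rows, because window functions are evaluated before DISTINCT.

```python
import sqlite3

# Hypothetical data: item 1 has two status rows with status 1.
SCHEMA = """
CREATE TABLE DataItem (iDataItemID INTEGER PRIMARY KEY, iDataTypeID INTEGER,
                       iEmployeeID INTEGER, iEmployerID INTEGER);
CREATE TABLE DataItemStatus (iDataItemID INTEGER, iDataItemCurrentStatusID INTEGER);
INSERT INTO DataItem VALUES (1, 100, 7, 70), (2, 200, 8, 80);
INSERT INTO DataItemStatus VALUES (1, 1), (1, 1), (1, 2), (2, 1);
"""


def window_rows(distinct=False):
    keyword = "DISTINCT " if distinct else ""
    query = f"""
        SELECT {keyword}DIS.iDataItemID, DIS.iDataItemCurrentStatusID,
               COUNT(*) OVER (PARTITION BY DIS.iDataItemID, DIS.iDataItemCurrentStatusID),
               DI.iDataTypeID, DI.iEmployeeID, DI.iEmployerID
        FROM DataItem DI INNER JOIN DataItemStatus DIS
          ON DI.iDataItemID = DIS.iDataItemID
        WHERE iDataItemCurrentStatusID = 1
        ORDER BY DIS.iDataItemID
    """
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    rows = con.execute(query).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(window_rows())               # one row per joined row, count repeated
    print(window_rows(distinct=True))  # duplicates removed
```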

How to convert a SQL SELECT statement to LINQ

Hi, I'm new to SQL and LINQ and I need to convert this SQL statement to LINQ:
select r.*,
(Select name From org_table where org_ID= r.org_ID ) org,
(Select name From Depart_table where Depart_ID= r.Depart_ID)depart
from reserve_table r
where name like '
Try it this way:
var result = from r in reserve_table
             where r.name.Contains("something")
             select new {
                 // FirstOrDefault() returns a single name, or null when there is no match
                 org = (from o in org_table where o.Org_ID == r.Org_ID select o.name).FirstOrDefault(),
                 depart = (from d in Depart_table where d.Depart_ID == r.Depart_ID select d.name).FirstOrDefault()
                 // add the rest
             };
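For reference, here is the SQL side of the question (with the missing comma restored) run through Python's sqlite3. The table contents are hypothetical and the truncated LIKE filter is left out. It shows that a correlated scalar subquery yields NULL when nothing matches, which is what FirstOrDefault() mirrors with null.

```python
import sqlite3

# Hypothetical rows; reservation 2 points at an org_ID with no org_table row.
SCHEMA = """
CREATE TABLE org_table (org_ID INTEGER, name TEXT);
CREATE TABLE Depart_table (Depart_ID INTEGER, name TEXT);
CREATE TABLE reserve_table (id INTEGER, name TEXT, org_ID INTEGER, Depart_ID INTEGER);
INSERT INTO org_table VALUES (1, 'Org A');
INSERT INTO Depart_table VALUES (10, 'Sales');
INSERT INTO reserve_table VALUES (1, 'room 1', 1, 10), (2, 'room 2', 99, 10);
"""

# Unqualified names inside each subquery resolve to the subquery's own table first.
QUERY = """
SELECT r.*,
       (SELECT name FROM org_table WHERE org_ID = r.org_ID) AS org,
       (SELECT name FROM Depart_table WHERE Depart_ID = r.Depart_ID) AS depart
FROM reserve_table r
ORDER BY r.id
"""


def reservations():
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in reservations():
        print(row)
```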

How can you write the following Rails Query in SQL?

I have the following Rails Query:
Org.find(:all).each do |org|
Org.update_counters org.id, :users_count => org.users.length
end
For various reasons, such as performance, I need to write this in SQL so I can execute it directly instead of using Rails to make the update. Any ideas how to combine the Rails logic into SQL?
Thanks
This:
Org.update_counters org.id, :users_count => org.users.length
Basically does this:
update orgs
set users_count = coalesce(users_count, 0) + #{org.users.length}
where id = #{org.id}
Unrolling one step:
update orgs
set users_count = coalesce(users_count, 0)
+ (select count(*) from org_users where org_id = #{org.id})
where id = #{org.id}
Now you've wrapped that in an Org.find(:all).each so we just have to push the iteration into the SQL and deal with #{org.id}:
update orgs o
set users_count = coalesce(users_count, 0)
+ (select count(*) from org_users ou where ou.org_id = o.id)
And if you really mean to set the users_count values rather than increment them:
update orgs o
set users_count = (select count(*) from org_users ou where ou.org_id = o.id)
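A quick way to see the difference between the two set-based statements is to run them against a scratch database. This sketch uses Python's sqlite3. The org_users join table comes from the answer's own assumption and the rows are hypothetical. It references orgs.id directly instead of an alias on the UPDATE target.

```python
import sqlite3

# Hypothetical rows: org 1 has a NULL counter, org 2 starts at 5.
SCHEMA = """
CREATE TABLE orgs (id INTEGER PRIMARY KEY, users_count INTEGER);
CREATE TABLE org_users (org_id INTEGER, user_id INTEGER);
INSERT INTO orgs VALUES (1, NULL), (2, 5), (3, 0);
INSERT INTO org_users VALUES (1, 100), (1, 101), (2, 102);
"""

# Increment, like update_counters: existing value (NULL treated as 0) plus the count.
INCREMENT = """
UPDATE orgs SET users_count = coalesce(users_count, 0)
    + (SELECT count(*) FROM org_users ou WHERE ou.org_id = orgs.id)
"""

# Overwrite the counter with the real count.
RESET = """
UPDATE orgs SET users_count =
    (SELECT count(*) FROM org_users ou WHERE ou.org_id = orgs.id)
"""


def counts_after(statement):
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute(statement)
    rows = con.execute("SELECT id, users_count FROM orgs ORDER BY id").fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(counts_after(INCREMENT))
    print(counts_after(RESET))
```

Running the increment version twice would double-count, which is why the second statement is the one to use if the goal is to resynchronise the counters.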

Translating from SQL to Linq when having group by and order by

Is it possible to transform this SQL code into LINQ?
SELECT top 3 t.tProductIDF, Count(t.tQty)as QtyBought, p.pName, v.vName
From Transactions t, Product p, Vendor v
where t.tProductIDF = p.pID and p.pdeleted = 0 and p.pVendorIDF = v.vID and v.vActive = 1
Group By t.tProductIDF, p.pName, v.vName
Order By QtyBought desc;
This is what I have so far:
var topProds = (from t in this._entities.Transactions
group t by t.tProductIDF into g
orderby g.Count() descending
select new {g.Key }).Take(3);
But since I cannot access t in the select part, I don't know how to get pName and vName.
var topProds = (from t in this._entities.Transactions
group t by t.tProductIDF into g
orderby g.Count() descending
select new { ProductIDF = g.Key , Qty= g.Sum(cc=>cc.tQty) , Vname = g.Select(cc=>cc.vName).FirstOrDefault() }).Take(3);
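g holds every row in the group, so you can aggregate over it (g.Count(), g.Sum(...)) or read values from its rows. This assumes a transaction row can reach vName, for example through a navigation property.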
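Another option, closer to the original SQL, is to put pName and vName into the grouping key itself, as the SQL's GROUP BY t.tProductIDF, p.pName, v.vName does. In LINQ that would mean joining Product and Vendor first and grouping by an anonymous-type key. The following Python sketch illustrates only the composite-key idea; the in-memory tables, their tuple layouts and the sample rows are hypothetical.

```python
from collections import Counter

# Hypothetical data mirroring the three tables in the question.
PRODUCTS = {  # pID -> (pName, pVendorIDF, pdeleted)
    1: ("Pen", 10, False), 2: ("Ink", 20, False),
    3: ("Pad", 10, True), 4: ("Clip", 30, False),
}
VENDORS = {10: ("Acme", True), 20: ("Inkco", True), 30: ("OldCo", False)}  # vID -> (vName, vActive)
TRANSACTIONS = [(1, 5), (1, 1), (2, 3), (3, 9), (4, 2), (2, 1), (1, 2)]  # (tProductIDF, tQty)


def top_products(n=3):
    """Return (tProductIDF, QtyBought, pName, vName) for the n most-bought products."""
    counts = Counter()
    for product_id, _qty in TRANSACTIONS:
        p_name, vendor_id, deleted = PRODUCTS[product_id]  # join on tProductIDF = pID
        v_name, active = VENDORS[vendor_id]                # join on pVendorIDF = vID
        if deleted or not active:
            continue  # p.pdeleted = 0 and v.vActive = 1
        counts[(product_id, p_name, v_name)] += 1  # composite group key
    # most_common(n) plays the role of ORDER BY QtyBought DESC + TOP n.
    return [(pid, qty, pn, vn) for (pid, pn, vn), qty in counts.most_common(n)]


if __name__ == "__main__":
    print(top_products())
```

Because the names are part of the key, they are available directly when building each result row, with no need to reach back into the group's members.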