APACHE PIG - error Projected field [Units_Sold] does not exist in schema: group:chararray,D2:bag{:tuple(Item_Type:chararray,Units_Sold:int)} - apache-pig

Good afternoon - I have a Sales Dataset and am trying to see which Item has the most units sold.
Here is my code:
Country:chararray,
Item_Type:chararray,
Sales_Channel:chararray,
Order_Priority_site:chararray,
Order_Date:chararray,
Order_ID:chararray,
Ship_Date:chararray,
Units_Sold:int,
Unit_Price: int,
Unit_Cost: int,
Total_Revenue: int,
Total_Cost: int,
Total_Profit:int);
D2 = FOREACH data GENERATE Item_Type, Units_Sold;
D3 = GROUP D2 BY Item_Type;
D4 = FOREACH D3 GENERATE group, SUM(Units_Sold);
DUMP D4;
However, I get the error:
<file D, line 20, column 36> Invalid field projection. Projected field [Units_Sold] does not exist in schema: group:chararray,D2:bag{:tuple(Item_Type:chararray,Units_Sold:int)}.
Does anybody know how to fix this? Let me know if you need more info; this is the first question I have posted here.

SUM is expecting a bag. The error shows you the schema:
D2:bag{:tuple(Item_Type:chararray,Units_Sold:int)}
Therefore change SUM to:
SUM(D2.Units_Sold)
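
For reference, here is a minimal sketch of the whole script with that change applied. Only the SUM(D2.Units_Sold) fix comes from the answer above. The file name 'sales.csv', the comma delimiter, the aliases D5 and D6, and the ORDER/LIMIT step for picking the top item are assumptions added for illustration.

```pig
-- Assumed path and delimiter; the schema is the one from the question
data = LOAD 'sales.csv' USING PigStorage(',') AS (
    Country:chararray, Item_Type:chararray, Sales_Channel:chararray,
    Order_Priority_site:chararray, Order_Date:chararray, Order_ID:chararray,
    Ship_Date:chararray, Units_Sold:int, Unit_Price:int, Unit_Cost:int,
    Total_Revenue:int, Total_Cost:int, Total_Profit:int);
D2 = FOREACH data GENERATE Item_Type, Units_Sold;
D3 = GROUP D2 BY Item_Type;
-- Each row of D3 is (group, D2), where D2 is a bag of the grouped tuples,
-- so the field has to be addressed through the bag as D2.Units_Sold
D4 = FOREACH D3 GENERATE group AS Item_Type, SUM(D2.Units_Sold) AS total_units;
-- Assumed extra step: sort descending and keep the first row
D5 = ORDER D4 BY total_units DESC;
D6 = LIMIT D5 1;
DUMP D6;
```

If you only need the per-item totals, DUMP D4 is enough and the last three statements can be dropped.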

Related

How to get month number or name in cds view

I am creating a CDS view in HANA Studio where I want to get the month number or name from a date (YYYYMMDD) in a report, but I am unable to find any function like month or anything similar.
Please help.
You can join table t247 that has the required information:
@AbapCatalog.sqlViewName: 'ZDD_DATE_T'
@AccessControl.authorizationCheck: #NOT_REQUIRED
define view zdd_date_test
with parameters p_date:abap.dats(8)
as select from demo_expressions left outer join t247 as date_information on date_information.spras = $session.system_language {
key mandt,
key id,
num1,
num2,
date_information.ltx as long_text
} where date_information.mnr = substring(:p_date, 5, 2);
This will return the following data from table demo_expressions:
id,num1,num2,long_text
0,90,18,November
1,19,99,November
2,83,82,November
3,87,92,November
4,15,56,November
5,29,4,November
6,38,87,November
7,74,13,November
8,26,99,November
9,35,50,November
substring(:p_date, 5, 2) extracts the two-digit month number from the date parameter; the WHERE clause then matches it against t247.mnr to pick up the month name (ltx) from table t247.

Pig: Summing Fields

I have some census data in which each line has a number denoting the county and fields for the number of people in a certain age range (eg, 5 and under, 5 to 17, etc.). After some initial processing in which I removed the unneeded columns, I grouped the labeled data as follows (labeled_data is of the schema {county: chararray,pop1: int,pop2: int,pop3: int,pop4: int,pop5: int,pop6: int,pop7: int,pop8: int}):
grouped_data = GROUP filtered_data BY county;
So grouped_data is of the schema
{group: chararray,filtered_data: {(county: chararray,pop1: int,pop2: int,pop3: int,pop4: int,pop5: int,pop6: int,pop7: int,pop8: int)}}
Now I would like to sum up all of the pop fields for each county, yielding the total population of each county. I'm pretty sure the command to do this will be of the form
pop_sums = FOREACH grouped_data GENERATE group, SUM(something about the pop fields);
but I've been unable to get this to work. Thanks in advance!
I don't know if this is helpful, but the following is a representative entry of grouped_data:
(147,{(147,385,1005,283,468,649,738,933,977),(147,229,655,178,288,394,499,579,481)})
Note that the 147 entries are actually county codes, not populations. They are therefore of type chararray.
Can you try the below approach?
Sample input:
147,1,1,1,1,1,1,1,1
147,2,2,2,2,2,2,2,2
145,5,5,5,5,5,5,5,5
PigScript:
A = LOAD 'input' USING PigStorage(',') AS(country:chararray,pop1:int,pop2:int,pop3:int,pop4:int,pop5:int,pop6:int,pop7:int,pop8:int);
B = GROUP A BY country;
C = FOREACH B GENERATE group,(SUM(A.pop1)+SUM(A.pop2)+SUM(A.pop3)+SUM(A.pop4)+SUM(A.pop5)+SUM(A.pop6)+SUM(A.pop7)+SUM(A.pop8)) AS totalPopulation;
DUMP C;
Output:
(145,40)
(147,24)
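
A possible variation, sketched under the same assumed input file and delimiter as above: add the eight fields per row first, then group and call SUM once. The aliases and the row_total field name are made up for illustration.

```pig
-- Same sample input as above: 'input', comma-separated
A = LOAD 'input' USING PigStorage(',') AS (county:chararray, pop1:int, pop2:int, pop3:int, pop4:int, pop5:int, pop6:int, pop7:int, pop8:int);
-- Add up the age-range columns within each row
B = FOREACH A GENERATE county, (pop1 + pop2 + pop3 + pop4 + pop5 + pop6 + pop7 + pop8) AS row_total;
C = GROUP B BY county;
-- After GROUP, the rows of B sit in a bag named B, so the field is B.row_total
D = FOREACH C GENERATE group AS county, SUM(B.row_total) AS totalPopulation;
DUMP D;
```

For the three sample rows this should give the same totals as the script above, since it only changes the order in which the additions happen.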

PIG: sum and division, creating an object

I am writing a Pig program that loads a file that separates its entries with tabs
ex: name TAB year TAB count TAB...
file = LOAD 'file.csv' USING PigStorage('\t') as (type: chararray, year: chararray,
match_count: float, volume_count: float);
-- Group by type
grouped = GROUP file BY type;
-- Flatten
by_type = FOREACH grouped GENERATE FLATTEN(group) AS (type, year, match_count, volume_count);
group_operat = FOREACH by_type GENERATE
SUM(match_count) AS sum_m,
SUM(volume_count) AS sum_v,
(float)sum_m/sm_v;
DUMP group_operat;
The issue lies in the group operations object I am trying to create. I'm wanting to sum all the match counts, sum all the volume counts and divide the match counts by volume counts.
What am I doing wrong in my arithmetic operations/object creation?
An error I receive is line 7, column 11> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1031: Incompatable schema: left is "type:NULL,year:NULL,match_count:NULL,volume_count:NULL", right is "group:chararray"
Thank you.
Try it like this; it returns each type with SUM(match_count) divided by SUM(volume_count).
Updated with the working code:
input.txt
A 2001 10 2
A 2002 20 3
B 2003 30 4
B 2004 40 1
PigScript:
file = LOAD 'input.txt' USING PigStorage() AS (type: chararray, year: chararray,
match_count: float, volume_count: float);
grouped = GROUP file BY type;
group_operat = FOREACH grouped {
sum_m = SUM(file.match_count);
sum_v = SUM(file.volume_count);
GENERATE group,(float)(sum_m/sum_v) as sum_mv;
}
DUMP group_operat;
Output:
(A,6.0)
(B,14.0)
Try this:
file = LOAD 'file.csv' USING PigStorage('\t') as (type: chararray, year: chararray,
match_count: float, volume_count: float);
grouped = GROUP file BY (type,year);
group_operat = FOREACH grouped GENERATE group,
SUM(file.match_count) AS sum_m,
SUM(file.volume_count) AS sum_v,
(float)(SUM(file.match_count)/SUM(file.volume_count)) as sum_mv;
The script above groups the results by type and year. If you want to group by type only, remove year from the GROUP key:
grouped = GROUP file BY type;
group_operat = FOREACH grouped GENERATE group,file.year,
SUM(file.match_count) AS sum_m,
SUM(file.volume_count) AS sum_v,
(float)(SUM(file.match_count)/SUM(file.volume_count)) as sum_mv;

Pig - Calculating percentage of total for a field

I am trying to calculate the % of total for a value in a field.
For example, for data (name, ct)
(john, 1000)
(Dan, 2000)
(liz, 2000)
I want the output to be (name, % of ct to the total)
(john, .2)
(Dan, .4)
(liz, .4)
data = load 'fakedata.txt' as (name:chararray,sqr:chararray,ct:int);
A = foreach data generate name, ct;
A = FILTER A by ct is not null;
B = group A all;
C = foreach B generate SUM(A.ct) as tot;
D = foreach A generate name, ct/(double)C.tot;
dump D;
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Invalid alias: C in {name: bytearray,ct: int}
I am following exactly the example code in the "Casting Relations to Scalars" section of http://pig.apache.org/docs/r0.10.0/basic.html
If I say Dump C, then the output is correctly generated as 5000. So there is a problem in the D. Any help is greatly appreciated.
The below works for me without any error. This is basically same as what you have. Not sure why you are getting this error. Which version of pig are you using?
data = load 'StackData' as (name:chararray, marks:int);
grp = GROUP data all;
allcount = foreach grp generate SUM(data.marks) as total;
perc = foreach data generate name, marks/(double)allcount.total;
dump perc;
In Relation D, you are looping over Relation A again, and it knows nothing about C.
I'd suggest calculating the SUM, then doing JOIN so each entry contains the sum. That way you'll be able to calculate the % total for each entry.
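
A rough sketch of that idea, using the relations from the question. Since C holds a single row and there is no key to join on, this version uses CROSS rather than JOIN to attach the total to every row; the aliases A1, AC and pct are made up for illustration.

```pig
data = LOAD 'fakedata.txt' AS (name:chararray, sqr:chararray, ct:int);
A = FOREACH data GENERATE name, ct;
A1 = FILTER A BY ct IS NOT NULL;
B = GROUP A1 ALL;
-- C has exactly one row holding the grand total
C = FOREACH B GENERATE SUM(A1.ct) AS tot;
-- CROSS pairs every row of A1 with that single row
AC = CROSS A1, C;
-- Fields are disambiguated with the alias:: prefix after the CROSS
D = FOREACH AC GENERATE A1::name AS name, (double)A1::ct / C::tot AS pct;
DUMP D;
```

Because C has only one row, the CROSS produces exactly one output row per input row, so it does not multiply the data.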

Create view query not working

This is the problem in my book that I am trying to solve. I need to create this report:
A list of the programs on all channels for a specific day showing the channel number, supplier, package, program name, rating code, and show time. This will be similar to a program guide, only not package specific. This is a date-driven report, therefore it should only display programs for a single date specified.
I tried this so far..
CREATE VIEW PROG_LINEUP AS
SELECT DISTINCT
PC.PROGTIME AS `SHOWTIME`,
P.PROGNAME AS `PROGRAM TITLE`,
C.CHID AS `CHANNEL #`,
SU.SUPNAME AS `SUPPLIER`,
R.RATING AS `RATING`
FROM
PROG_CHAN PC,
CHANNELS C,
SUPPLIERS SU,
PROGRAM P,
CHANNEL_PACKAGE CP,
RATING R
WHERE
PC.SHOWDATE = '18-DEC-10'
AND P.PROGID = PC.PROGID
AND CP.CHID = PC.CHID
AND R.RATINGID = P.RATINGID
AND C.CHID = PC.CHID
AND SU.SUPID = P.SUPID
ORDER BY PC.CHID;
But it's giving this error when the table Prog_chan exists! I checked.. What is wrong?
Please tell me if any table script is required. Please help...
WHERE PC.SHOWDATE = '18-DEC-10' AND
*
ERROR at line 13:
ORA-00903: invalid table name
I can't figure out what is wrong, since the Prog_chan table exists and has values in it too.
SQL> desc prog_chan;
Name                                      Null?    Type
----------------------------------------- -------- ----------------------------
CHANID                                    NOT NULL NUMBER(5)
PROGID                                    NOT NULL NUMBER(5)
SHOWDATE                                  NOT NULL DATE
STARTTIME                                 NOT NULL DATE
@Jeff -
I removed that comma, but now the error is this:
CHANNEL_PACKAGE CP,
*
ERROR at line 11:
ORA-00942: table or view does not exist
You have an erroneous extra comma before the WHERE clause.
RATING R,
WHERE PC.SHOWDATE = '18-DEC-10' AND