Conditional statements in PIG - apache-pig

I have the input below in a text file and need to generate output in another file based on the following logic.
Here is my input file:
customerid|Dateofsubscription|Customercode|CustomerType|CustomerText
1001|2017-05-23|455|CODE|SPRINT56
1001|2017-05-23|455|DESC|Unlimited Plan
1001|2017-05-23|455|DATE|2017-05-05
1002|2017-05-24|455|CODE|SPRINT56
1002|2017-05-24|455|DESC|Unlimited Plan
1002|2017-05-24|455|DATE|2017-05-06
Logic:
If Customercode = 455
if( CustomerType = "CODE" )
Val= CustomerText
if( CustomerType = "DESC" )
Description = CustomerText
if( CustomerType = "DATE" )
Date = CustomerText
Output:
customerid|Val|Description|Date
1001|SPRINT56|Unlimited Plan|2017-05-05
1002|SPRINT56|Unlimited Plan|2017-05-06
Could you please help me with this?

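-- 'data' is a placeholder path; the schema follows the header line of the input file
rawData = LOAD 'data' USING PigStorage('|') AS (customerId:chararray, Dateofsubscription:chararray, Customercode:int, CustomerType:chararray, CustomerText:chararray);
filteredData = FILTER rawData BY (Customercode == 455);
--Extract and set Val/Description/Date from CustomerText based on CustomerType, and null otherwise
ExtractedData = FOREACH filteredData GENERATE
customerId,
(CustomerType == 'CODE' ? CustomerText : null) AS Val,
(CustomerType == 'DESC' ? CustomerText : null) AS Description,
(CustomerType == 'DATE' ? CustomerText : null) AS Date;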
groupedData = GROUP ExtractedData BY customerId;
--While taking MAX, all 'nulls' will be ignored
finalData = FOREACH groupedData GENERATE
group as CustomerId,
MAX($1.Val) AS Val,
MAX($1.Description) AS Description,
MAX($1.Date) AS Date;
DUMP finalData;
I have specified the core logic. Loading, formatting and storage should be straight-forward.
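For readers who want to check the filter, extract, and group steps outside Pig, the same logic can be sketched in Python. This is only an illustration under assumptions: the function name `pivot` and the `COLUMN_FOR_TYPE` mapping are made up for the example, and where the Pig script takes MAX per group, this sketch simply keeps the last value seen for each column.

```python
from collections import defaultdict

# Sample rows from the question: (customerid, Dateofsubscription, Customercode, CustomerType, CustomerText)
rows = [
    ("1001", "2017-05-23", 455, "CODE", "SPRINT56"),
    ("1001", "2017-05-23", 455, "DESC", "Unlimited Plan"),
    ("1001", "2017-05-23", 455, "DATE", "2017-05-05"),
    ("1002", "2017-05-24", 455, "CODE", "SPRINT56"),
    ("1002", "2017-05-24", 455, "DESC", "Unlimited Plan"),
    ("1002", "2017-05-24", 455, "DATE", "2017-05-06"),
]

# Hypothetical mapping from CustomerType to the output column it fills.
COLUMN_FOR_TYPE = {"CODE": "Val", "DESC": "Description", "DATE": "Date"}


def pivot(rows, wanted_code=455):
    """Filter by Customercode, then collect one output row per customer."""
    per_customer = defaultdict(dict)
    for customer_id, _sub_date, code, ctype, text in rows:
        if code != wanted_code or ctype not in COLUMN_FOR_TYPE:
            continue  # same role as the FILTER step and the null branches
        per_customer[customer_id][COLUMN_FOR_TYPE[ctype]] = text
    return [
        (cid, cols.get("Val"), cols.get("Description"), cols.get("Date"))
        for cid, cols in sorted(per_customer.items())
    ]


result = pivot(rows)
for row in result:
    print("|".join(str(value) for value in row))
```

Columns that never receive a value come back as None, which plays the role of the nulls in the Pig version.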

Filter the input where Customercode = 455, generate the two required columns, group by CustomerId, and then use BagToString.
B = FILTER A BY Customercode == 455;
C = FOREACH B GENERATE $0 as CustomerId,$4 as CustomerText;
D = GROUP C BY CustomerId;
E = FOREACH D GENERATE group AS CustomerId, BagToString(C.CustomerText, '|');
-- Note: this generates 1001,SPRINT56|Unlimited Plan|2017-05-05, so you have to concat the first field with '|'
-- and then concat the result with the second field, which is already delimited by '|'.
F = FOREACH E GENERATE CONCAT(CONCAT($0,'|'),$1);
DUMP F;

Related

DAX expression for ROW_NUMBER() PARTITION BY ORDER BY equivalent

I have a SQL statement like this:
(ROW_NUMBER() OVER (PARTITION BY a.[market], [MEASURE_TYPE]
ORDER BY AM, REP, ORDER_KEY)) AS ORDER_KEY
I want to write a DAX expression that implements the SQL statement above.
This is not as simple in DAX as in SQL. Here is an example:
Order Key Within Partition =
VAR CurrentMarket = [Market]
VAR CurrentMeasureType = [MeasureType]
VAR CurrentAM = [AM]
VAR CurrentREP = [REP]
VAR CurrentOrderKey = [OrderKey]
VAR CurrentPartition = FILTER (
a, -- the table name
[Market] = CurrentMarket
&& [MeasureType] = CurrentMeasureType
)
RETURN SUMX (
CurrentPartition,
IF (
ISONORAFTER (
CurrentAM, [AM], ASC,
CurrentREP, [REP], ASC,
CurrentOrderKey, [OrderKey], ASC
),
1
)
)
EDIT: Power Query would be better to achieve this.
let
/* Steps so far */
Source = ...,
...
a = ...,
/* End of steps so far */
/* Add steps below to add Order Key Within Partition column */
Partitions = Table.Group(
a,
{"Market", "MeasureType"}, {"Partition", each _}
)[Partition],
AddedOrderKeys = List.Transform(
Partitions,
each Table.AddIndexColumn(
Table.Sort(_, {"AM", "REP", "OrderKey"}),
"Order Key Within Partition",
1
)
),
Result = Table.Combine(AddedOrderKeys)
in
Result
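The group, sort, and index steps used above can also be sketched outside Power BI to check the expected numbering. The Python below is a minimal illustration, not a translation of either snippet: `add_row_numbers` and the sample rows are hypothetical, and the column names are only borrowed from the question.

```python
from itertools import groupby


def add_row_numbers(rows, partition_keys, order_keys, out_name="OrderKeyWithinPartition"):
    """Number rows 1..n inside each partition, following the given sort order."""

    def key_of(row, keys):
        return tuple(row[k] for k in keys)

    # Sort by partition first so groupby sees each partition as one contiguous run,
    # then by the ordering columns inside the partition.
    ordered = sorted(rows, key=lambda r: key_of(r, partition_keys) + key_of(r, order_keys))
    numbered = []
    for _, partition in groupby(ordered, key=lambda r: key_of(r, partition_keys)):
        for n, row in enumerate(partition, start=1):
            numbered.append({**row, out_name: n})
    return numbered


# Made-up sample data for illustration only.
sample = [
    {"Market": "US", "MeasureType": "A", "AM": "x", "REP": "r2", "OrderKey": 2},
    {"Market": "US", "MeasureType": "A", "AM": "x", "REP": "r1", "OrderKey": 5},
    {"Market": "EU", "MeasureType": "A", "AM": "y", "REP": "r1", "OrderKey": 1},
]
numbered = add_row_numbers(sample, ["Market", "MeasureType"], ["AM", "REP", "OrderKey"])
for row in numbered:
    print(row["Market"], row["REP"], row["OrderKeyWithinPartition"])
```

Like the Table.Group version, this returns the rows regrouped by partition rather than in their original order.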
Here is an alternative solution to the RANKX approach. The answer containing the Power Query is the correct one because it avoids using calculated columns.
Sales[Sequence by Customer] =
VAR CurrentDate = Sales[Date]
VAR CurrentTime = Sales[Time]
RETURN COUNTROWS (
    FILTER (
        CALCULATETABLE (
            Sales,
            ALLEXCEPT ( Sales, Sales[Customer] )
        ),
        Sales[Date] < CurrentDate
          || ( Sales[Date] = CurrentDate
               && Sales[Time] <= CurrentTime )
    )
)

Filter Data based on priority in LINQ

The fields in my Rate Table are Route, VehicleMasterId, VehicleType, Material, Client, UnitRate etc.
The priority order on which I have to fetch a single row is : VehicleNo > Route > Client > VehicleType, Material
Suppose I have 2 rows with the same data, except that one has Client and VehicleType and the other has VehicleNo. Then, based on my priority, I should pick the rate of the row with VehicleNo.
To execute this in LINQ, I first pick all the rows with matching data. Here is my code.
public RateMasterDataModel GetRateMasterforCn(Consignment cn){
// I will always pass all the above fields in cn
var rateMaster = (from rate in Context.RateMaster
where rate.FromDate <= cn.Cndate
&& rate.ToDate >= cn.Cndate
&& (rate.VehicleTypeId != null ? rate.VehicleTypeId == cn.VehicleTypeId : true)
&& (rate.VehicleMasterId != null ? rate.VehicleMasterId == cn.VehicleMasterId : true)
&& (rate.ClientId != null ? rate.ClientId == cn.ClientId : true)
&& (rate.RouteId != null ? rate.RouteId == cn.RouteId : true)
&& (rate.MaterialMasterId != null ? rate.MaterialMasterId == cn.MaterialMasterId : true)
select new RateMasterDataModel
{
RateMasterId = rate.RateMasterId,
FromDate = rate.FromDate,
ToDate = rate.ToDate,
ClientId = rate.ClientId ,
VehicleMasterId = rate.VehicleMasterId,
VehicleTypeId = rate.VehicleTypeId,
MaterialMasterId = rate.MaterialMasterId,
UnitRate = rate.UnitRate,
LoadTypeId = rate.LoadTypeId,
LoadingPointId = rate.RouteId,
CalculationMasterId = rate.CalculationMasterId
}).ToList();
}
Please suggest how to filter after this.
You can use the code below to get records ordered by VehicleNo > Route:
.OrderBy(v=>v.VehicleNo).ThenBy(r=>r.RouteId)
Add further .ThenBy() clauses for each remaining column you need to sort by.
Do you mean that if a row does not have VehicleNo filled in, then the row that has Route must be selected? Is that correct?
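One way to read the priority rule is "prefer the row whose most important field is filled in". The Python sketch below illustrates that reading on already-matched rows. It rests on assumptions not stated in the question: that VehicleNo corresponds to VehicleMasterId, that VehicleType ranks ahead of Material, and that an empty field is represented as None. The names `PRIORITY` and `pick_rate` are hypothetical.

```python
# Most important field first. Mapping VehicleNo to VehicleMasterId and ranking
# VehicleTypeId ahead of MaterialMasterId are assumptions made for this sketch.
PRIORITY = ["VehicleMasterId", "RouteId", "ClientId", "VehicleTypeId", "MaterialMasterId"]


def pick_rate(candidates):
    """Return the candidate whose filled-in fields rank highest, or None if there are none."""
    if not candidates:
        return None
    # Tuples of booleans compare left to right, and True sorts above False,
    # so a row with VehicleMasterId set beats any row without it.
    return max(candidates, key=lambda row: tuple(row.get(f) is not None for f in PRIORITY))


# Made-up rows: one has Client and VehicleType, the other has only the vehicle.
rate_a = {"RateMasterId": 1, "VehicleMasterId": None, "RouteId": None,
          "ClientId": 7, "VehicleTypeId": 3, "MaterialMasterId": None, "UnitRate": 100}
rate_b = {"RateMasterId": 2, "VehicleMasterId": 42, "RouteId": None,
          "ClientId": None, "VehicleTypeId": None, "MaterialMasterId": None, "UnitRate": 120}

chosen = pick_rate([rate_a, rate_b])
print(chosen["RateMasterId"])
```

The same idea can be expressed in LINQ by ordering on "field is not null" flags in priority order and taking the first row, rather than ordering on the field values themselves.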

join, group by and count sql to linq conversion

Query:
SELECT COUNT(*) as Avalied
FROM LeaveMaster as lm
JOIN LeaveSubsidary as ls
ON lm.TransMasterId = ls.TransMasterId
WHERE lm.SystemCode = 'abc123' AND lm.EmployeeCode = '0014' AND lm.Status ='Approved'
Group by ls.Leave_Type
How would you convert the above sql query to linq? Thanks :)
Edit :
from p in db.PY_LeaveTransactionMasterTAB.AsEnumerable()
join e in db.PY_LeaveTransactionSubsidaryTAB.AsEnumerable() on p.LV_TransMasterId equals e.LV_TransMasterId into grp
where p.SystemCode == SysCode && p.EmployeeCode == EmpCode && p.Status == "Approved"
group grp by e.Leave_Type // This line is invalid Why?
The above is what I have tried so far, why can't I do e.Leave_Type or what is the right way to do group grp by e.Leave_Type?
You could try this:
(from p in db.PY_LeaveTransactionMasterTAB
join e in db.PY_LeaveTransactionSubsidaryTAB on p.LV_TransMasterId equals e.LV_TransMasterId
where p.SystemCode == SysCode && p.EmployeeCode == EmpCode && p.Status == "Approved"
group new {p,e} by new {e.Leave_Type } into gr
select new
{
Leave_Type = gr.Key.Leave_Type ,
Count = gr.Count()
}).ToList();
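To sanity-check what the grouped query should return, the join, filter, and count can be sketched in plain Python. This is an illustration only: `count_by_leave_type` and the sample records are made up, and the sketch assumes TransMasterId is unique in the master table, so that a membership test behaves like the inner join.

```python
from collections import Counter


def count_by_leave_type(masters, subsidiaries, system_code, employee_code):
    """Count subsidiary rows per Leave_Type for approved master rows of one employee."""
    approved_ids = {
        m["TransMasterId"]
        for m in masters
        if m["SystemCode"] == system_code
        and m["EmployeeCode"] == employee_code
        and m["Status"] == "Approved"
    }
    # Keeping only subsidiaries whose master row passed the filter mirrors the JOIN + WHERE.
    return Counter(s["Leave_Type"] for s in subsidiaries if s["TransMasterId"] in approved_ids)


# Made-up sample data for illustration only.
masters = [
    {"TransMasterId": 1, "SystemCode": "abc123", "EmployeeCode": "0014", "Status": "Approved"},
    {"TransMasterId": 2, "SystemCode": "abc123", "EmployeeCode": "0014", "Status": "Pending"},
]
subsidiaries = [
    {"TransMasterId": 1, "Leave_Type": "Sick"},
    {"TransMasterId": 1, "Leave_Type": "Sick"},
    {"TransMasterId": 1, "Leave_Type": "Casual"},
    {"TransMasterId": 2, "Leave_Type": "Sick"},
]
counts = count_by_leave_type(masters, subsidiaries, "abc123", "0014")
print(dict(counts))
```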

Linq union all equivalent of sql code

I have the following queries:
var majorClients = from maj in dbContext.MajorClients
where (maj.startdate > startDate)
where (maj.status == "Active")
select new Client{EntityPK = maj.mjPrimaryKey,Name = maj.name, Type = "Maj"};
var minorClients = from min in dbContext.MinorClients
where (min.startdate > startDate)
where (min.status == "Active" || min.status== "Inactive")
select new Client{EntityPK = min.mnPrimaryKey,Name = min.name, Type = "Min"};
There are clients that could appear in both major and minor tables. I would like to return a list of all occurrences of clients in both tables, however if there are matching clients by name, then I would only want to return the matching record from the majorClients table.
I have written a sql query to return the results:
SELECT mjPrimaryKey AS EntityPK,name,'Maj' AS TYPE
FROM majorClients
WHERE status = 'Active' AND startDate > #startDate
UNION ALL
SELECT mnPrimaryKey,name,'Min' FROM minorClients
WHERE status IN ('Active','Inactive') AND startDate > #startDate
AND name NOT IN (SELECT name FROM majorClients WHERE status = 'Active' AND startDate > #startDate)
How would I represent this query in linq?
Try this LINQ. To exclude duplicates from minorClients, I've used the Contains method. To combine all objects, use the Concat method, which is the equivalent of UNION ALL (Union would also remove duplicate rows, like SQL UNION):
var majorClients = from maj in dbContext.MajorClients
where maj.startdate > startDate
&& maj.status == "Active"
select new Client
{
EntityPK = maj.mjPrimaryKey,
Name = maj.name,
Type = "Maj"
};
var minorClients = from min in dbContext.MinorClients
where min.startdate > startDate
// Parentheses are required: && binds tighter than ||
&& (min.status == "Active" || min.status == "Inactive")
&& !(from maj in dbContext.MajorClients
where maj.startdate > startDate
&& maj.status == "Active"
select maj.name).Contains(min.name)
select new Client
{
EntityPK = min.mnPrimaryKey,
Name = min.name,
Type = "Min"
};
var allClients = majorClients.Concat(minorClients);
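The intended result (all major clients, plus only those minor clients whose name does not appear among the major ones) can be checked with a small Python sketch. The function `combine_clients` and the sample records are hypothetical, and both lists are assumed to be already filtered by status and start date.

```python
def combine_clients(major, minor):
    """Return every major client, followed by minor clients with no major client of the same name."""
    major_names = {client["Name"] for client in major}
    # List concatenation keeps duplicates, like UNION ALL; only the name check removes rows.
    return major + [client for client in minor if client["Name"] not in major_names]


# Made-up sample data for illustration only.
major = [
    {"EntityPK": 1, "Name": "Acme", "Type": "Maj"},
    {"EntityPK": 2, "Name": "Globex", "Type": "Maj"},
]
minor = [
    {"EntityPK": 10, "Name": "Acme", "Type": "Min"},
    {"EntityPK": 11, "Name": "Initech", "Type": "Min"},
]
all_clients = combine_clients(major, minor)
for client in all_clients:
    print(client["Name"], client["Type"])
```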

Convert difficult ordered SQL statement to LINQ (edmx)

I need help converting the following statement to LINQ:
SELECT * FROM comments
WHERE good_id = '19' AND allow = '1'
ORDER BY IF(parent_id = 0, id, parent_id) DESC, id ASC
This statement shows the comments in the following order:
--Comment
--Subcomment (if parent_id != 0)
--Subcomment (if parent_id != 0)
--Comment
--Subcomment (if parent_id != 0)
etc.
But I don't know how to implement it on LINQ. Any ideas?
I have not compiled this, but I think I am on the right path.
var comments = from c in db.Comments
where c.good_id == 19 && c.allow == "1"
// The conditional is written inline because LINQ to Entities cannot translate a call to a custom C# method
orderby (c.parent_id == 0 ? c.id : c.parent_id) descending, c.id ascending
select c;
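The effect of the two-part sort key can be checked on a tiny data set. The Python sketch below uses made-up comments and a hypothetical `thread_order` helper; it assumes one level of nesting, as the SQL in the question does, since a reply to a reply would be keyed on its direct parent rather than on the top-level comment.

```python
def thread_order(comments):
    """Sort so each top-level comment is followed by its replies, newest thread first."""

    def sort_key(comment):
        # A top-level comment is keyed by its own id, a reply by its parent's id.
        thread_id = comment["id"] if comment["parent_id"] == 0 else comment["parent_id"]
        # Negating thread_id gives descending thread order; id stays ascending within a thread.
        return (-thread_id, comment["id"])

    return sorted(comments, key=sort_key)


# Made-up sample data for illustration only.
comments = [
    {"id": 1, "parent_id": 0},
    {"id": 2, "parent_id": 1},
    {"id": 3, "parent_id": 0},
    {"id": 4, "parent_id": 1},
    {"id": 5, "parent_id": 3},
]
ordered_ids = [c["id"] for c in thread_order(comments)]
print(ordered_ids)
```

The parent always comes before its replies because a reply's id is larger than the id of the comment it answers.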
If models have relationships set up correctly then you don't need to do anything fancy
var comments = dataContext.Comments
.Where(c => c.GoodId == 19 &&
c.Allow == 1 &&
c.ParentId == 0) //Get top level comments
.OrderBy(c => c.Id);
foreach(var comment in comments){
Console.WriteLine("Comment:" + comment.Text);
GetSubComments(comment);
}
Then use the following
public void GetSubComments(Comment comment){
foreach(var subComment in comment.Children){ //You can apply any sub ordering to Children
Console.WriteLine("Sub comment:" + subComment.Text);
GetSubComments(subComment);
}
}