Conditional statements in PIG

I have the input below in a text file and need to generate output in another file based on the logic that follows.
Here is my input file:
customerid|Dateofsubscription|Customercode|CustomerType|CustomerText
1001|2017-05-23|455|CODE|SPRINT56
1001|2017-05-23|455|DESC|Unlimited Plan
1001|2017-05-23|455|DATE|2017-05-05
1002|2017-05-24|455|CODE|SPRINT56
1002|2017-05-24|455|DESC|Unlimited Plan
1002|2017-05-24|455|DATE|2017-05-06
Logic:
If Customercode = 455
if( CustomerType = "CODE" )
Val= CustomerText
if( CustomerType = "DESC" )
Description = CustomerText
if( CustomerType = "DATE" )
Date = CustomerText
Output:
customerid|Val|Description|Date
1001|SPRINT56|Unlimited Plan|2017-05-05
1002|SPRINT56|Unlimited Plan|2017-05-06
Could you please help me with this?

-- 'data' stands in for the input path; the schema follows the header line of the input file
rawData = LOAD 'data' USING PigStorage('|')
    AS (customerId:int, Dateofsubscription:chararray, Customercode:int,
        CustomerType:chararray, CustomerText:chararray);
filteredData = FILTER rawData BY (Customercode == 455);
-- Extract and set Val/Description/Date based on CustomerType, and null otherwise
ExtractedData = FOREACH filteredData GENERATE
    customerId,
    (CustomerType == 'CODE' ? CustomerText : null) AS Val,
    (CustomerType == 'DESC' ? CustomerText : null) AS Description,
    (CustomerType == 'DATE' ? CustomerText : null) AS Date;
groupedData = GROUP ExtractedData BY customerId;
-- While taking MAX, all nulls will be ignored
finalData = FOREACH groupedData GENERATE
    group AS CustomerId,
    MAX($1.Val) AS Val,
    MAX($1.Description) AS Description,
    MAX($1.Date) AS Date;
DUMP finalData;
I have specified the core logic. Loading, formatting and storage should be straight-forward.
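
For readers who want to check the logic outside Pig, here is a minimal Python sketch of the same filter, extract and group steps. It is only an illustration of the expected result, not Pig code; the names ROWS, COLUMN_FOR_TYPE and pivot are made up for this example, and the sample rows are copied from the question.

```python
from collections import defaultdict

# Sample rows copied from the question: id|date|code|type|text
ROWS = """\
1001|2017-05-23|455|CODE|SPRINT56
1001|2017-05-23|455|DESC|Unlimited Plan
1001|2017-05-23|455|DATE|2017-05-05
1002|2017-05-24|455|CODE|SPRINT56
1002|2017-05-24|455|DESC|Unlimited Plan
1002|2017-05-24|455|DATE|2017-05-06
"""

# Maps each CustomerType to the output column it fills (hypothetical helper).
COLUMN_FOR_TYPE = {"CODE": "Val", "DESC": "Description", "DATE": "Date"}


def pivot(lines, wanted_code="455"):
    """Collapse the per-type rows of each customer into one output line."""
    records = defaultdict(dict)
    for line in lines:
        customer_id, _date, code, ctype, text = line.split("|")
        # Same role as the FILTER step: keep only the wanted Customercode.
        if code != wanted_code or ctype not in COLUMN_FOR_TYPE:
            continue
        records[customer_id][COLUMN_FOR_TYPE[ctype]] = text
    return [
        "|".join([cid, rec.get("Val", ""), rec.get("Description", ""), rec.get("Date", "")])
        for cid, rec in sorted(records.items())
    ]


result = pivot(ROWS.splitlines())
for line in result:
    print(line)
```

A column that never appears for a customer comes out as an empty string here, which plays the role of the null that MAX leaves behind in the Pig version.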

Filter the input where Customercode = 455, generate the two required columns, group by CustomerId, and then use BagToString.
B = FILTER A BY Customercode == 455;
C = FOREACH B GENERATE $0 AS CustomerId, $4 AS CustomerText;
D = GROUP C BY CustomerId;
-- Note: this generates 1001,SPRINT56|Unlimited Plan|2017-05-05, so you have to concat the
-- first field with '|' and then concat the result with the second field, which is already
-- delimited by '|'. A bag is unordered, so the order of the three values is not guaranteed.
E = FOREACH D GENERATE group AS CustomerId, BagToString(C.CustomerText, '|');
F = FOREACH E GENERATE CONCAT(CONCAT($0,'|'),$1);
DUMP F;

Related

DAX expression for ROW_NUMBER() PARTITION BY ORDER BY equivalent

I have a SQL statement like this:
(ROW_NUMBER() OVER (PARTITION BY a.[market], [MEASURE_TYPE]
ORDER BY AM, REP, ORDER_KEY)) AS ORDER_KEY
I want to write a DAX expression that implements the SQL statement above.

This is not as simple in DAX as in SQL. Here is an example:
Order Key Within Partition =
VAR CurrentMarket = [Market]
VAR CurrentMeasureType = [MeasureType]
VAR CurrentAM = [AM]
VAR CurrentREP = [REP]
VAR CurrentOrderKey = [OrderKey]
VAR CurrentPartition =
    FILTER (
        a, -- the table name
        [Market] = CurrentMarket
            && [MeasureType] = CurrentMeasureType
    )
RETURN
    SUMX (
        CurrentPartition,
        IF (
            ISONORAFTER (
                CurrentAM, [AM], ASC,
                CurrentREP, [REP], ASC,
                CurrentOrderKey, [OrderKey], ASC
            ),
            1
        )
    )
EDIT: Power Query would be better to achieve this.
let
    /* Steps so far */
    Source = ...,
    ...
    a = ...,
    /* End of steps so far */
    /* Add steps below to add Order Key Within Partition column */
    Partitions = Table.Group(
        a,
        {"Market", "MeasureType"}, {"Partition", each _}
    )[Partition],
    AddedOrderKeys = List.Transform(
        Partitions,
        each Table.AddIndexColumn(
            Table.Sort(_, {"AM", "REP", "OrderKey"}),
            "Order Key Within Partition",
            1
        )
    ),
    Result = Table.Combine(AddedOrderKeys)
in
    Result
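
Both versions do the same thing as ROW_NUMBER(): split the rows into partitions, sort each partition, and number the rows from 1. A minimal Python sketch of that idea follows, in case it helps to see it in a general-purpose language. The function add_row_number, the output column name RowNumber and the sample rows are all made up for illustration.

```python
from itertools import groupby


def add_row_number(rows, partition_keys, order_keys, name="RowNumber"):
    """Number rows from 1 within each partition, in order_keys order."""

    def part(row):
        return tuple(row[k] for k in partition_keys)

    def order(row):
        return tuple(row[k] for k in order_keys)

    # Sort by partition first so groupby sees each partition as one run,
    # then by the ordering columns inside the partition.
    ordered = sorted(rows, key=lambda r: (part(r), order(r)))
    numbered = []
    for _, group in groupby(ordered, key=part):
        for n, row in enumerate(group, start=1):
            numbered.append({**row, name: n})
    return numbered


# Hypothetical sample data using the column names from the question.
rows = [
    {"Market": "East", "MeasureType": "Units", "AM": "Kim", "REP": "Lee", "OrderKey": 2},
    {"Market": "East", "MeasureType": "Units", "AM": "Kim", "REP": "Lee", "OrderKey": 1},
    {"Market": "West", "MeasureType": "Units", "AM": "Ann", "REP": "Bo", "OrderKey": 5},
    {"Market": "East", "MeasureType": "Units", "AM": "Ann", "REP": "Zed", "OrderKey": 9},
]

numbered = add_row_number(rows, ["Market", "MeasureType"], ["AM", "REP", "OrderKey"])
for row in numbered:
    print(row["Market"], row["AM"], row["OrderKey"], row["RowNumber"])
```

Unlike SQL, this sketch returns the rows sorted by partition; if the original row order matters, carry an index column through and sort on it afterwards.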

Here is an alternative solution to RANKX. The answer containing the Power Query is the better one, because it avoids calculated columns.
Sales[Sequence by Customer] =
VAR CurrentDate = Sales[Date]
VAR CurrentTime = Sales[Time]
RETURN COUNTROWS (
    FILTER (
        CALCULATETABLE (
            Sales,
            ALLEXCEPT ( Sales, Sales[Customer] )
        ),
        Sales[Date] < CurrentDate
          || ( Sales[Date] = CurrentDate
               && Sales[Time] <= CurrentTime )
    )
)

Filter Data based on priority in LINQ

The fields in my Rate table are Route, VehicleMasterId, VehicleType, Material, Client, UnitRate, etc.
The priority order in which I have to fetch a single row is: VehicleNo > Route > Client > VehicleType, Material.
Suppose I have two rows with the same data, except that one has Client and VehicleType and the other has VehicleNo. Then, based on my priority, I should pick the rate of the row with VehicleNo.
To do this in LINQ, I have first picked all the rows with matching data. Here is my code:
public RateMasterDataModel GetRateMasterforCn(Consignment cn)
{
    // I will always pass all the above fields in cn
    var rateMaster = (from rate in Context.RateMaster
                      where rate.FromDate <= cn.Cndate
                          && rate.ToDate >= cn.Cndate
                          && (rate.VehicleTypeId != null ? rate.VehicleTypeId == cn.VehicleTypeId : true)
                          && (rate.VehicleMasterId != null ? rate.VehicleMasterId == cn.VehicleMasterId : true)
                          && (rate.ClientId != null ? rate.ClientId == cn.ClientId : true)
                          && (rate.RouteId != null ? rate.RouteId == cn.RouteId : true)
                          && (rate.MaterialMasterId != null ? rate.MaterialMasterId == cn.MaterialMasterId : true)
                      select new RateMasterDataModel
                      {
                          RateMasterId = rate.RateMasterId,
                          FromDate = rate.FromDate,
                          ToDate = rate.ToDate,
                          ClientId = rate.ClientId,
                          VehicleMasterId = rate.VehicleMasterId,
                          VehicleTypeId = rate.VehicleTypeId,
                          MaterialMasterId = rate.MaterialMasterId,
                          UnitRate = rate.UnitRate,
                          LoadTypeId = rate.LoadTypeId,
                          LoadingPointId = rate.RouteId,
                          CalculationMasterId = rate.CalculationMasterId
                      }).ToList();
}
Please suggest how to filter after this.

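You can use the code below to order the matched rows by VehicleNo > Route. Sorting on whether each field is filled in, rather than on its value, puts the rows that have the higher-priority field first (a plain ascending OrderBy on a nullable column would put the null rows first):
.OrderByDescending(r => r.VehicleMasterId != null).ThenByDescending(r => r.RouteId != null)
Add further .ThenByDescending() clauses for the remaining columns as your priority requires, then take the first row with .FirstOrDefault().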
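
To make the priority rule concrete, here is a small Python sketch of picking one row out of the already-matched candidates. It assumes the priority VehicleMasterId > RouteId > ClientId > VehicleTypeId > MaterialMasterId (the question lists VehicleType and Material together, so their relative order here is a guess); pick_rate, PRIORITY and the sample rows are invented for illustration.

```python
# Fields from most to least important; the order of the last two is an assumption.
PRIORITY = ("VehicleMasterId", "RouteId", "ClientId", "VehicleTypeId", "MaterialMasterId")


def pick_rate(candidates):
    """Return the matched row whose filled-in fields rank highest, or None."""

    def rank(rate):
        # A tuple of booleans compares left to right, and True beats False,
        # so a row with the more important field filled in wins.
        return tuple(rate.get(field) is not None for field in PRIORITY)

    return max(candidates, key=rank, default=None)


# Hypothetical candidates that already passed the date and match filters.
candidates = [
    {"RateMasterId": 1, "ClientId": 7, "VehicleTypeId": 3, "UnitRate": 100},
    {"RateMasterId": 2, "VehicleMasterId": 42, "UnitRate": 120},
]

best = pick_rate(candidates)
print(best["RateMasterId"], best["UnitRate"])
```

The same comparison is what a chain of OrderByDescending / ThenByDescending on "field is not null" expresses in LINQ, followed by taking the first row.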

Do you mean that if a row does not have VehicleNo filled in, then the row that has Route must be selected? Is that correct?

join, group by and count sql to linq conversion

Query:
SELECT COUNT(*) as Avalied
FROM LeaveMaster as lm
JOIN LeaveSubsidary as ls
ON lm.TransMasterId = ls.TransMasterId
WHERE lm.SystemCode = 'abc123' AND lm.EmployeeCode = '0014' AND lm.Status ='Approved'
Group by ls.Leave_Type
How would you convert the above sql query to linq? Thanks :)
Edit :
from p in db.PY_LeaveTransactionMasterTAB.AsEnumerable()
join e in db.PY_LeaveTransactionSubsidaryTAB.AsEnumerable() on p.LV_TransMasterId equals e.LV_TransMasterId into grp
where p.SystemCode == SysCode && p.EmployeeCode == EmpCode && p.Status == "Approved"
group grp by e.Leave_Type // This line is invalid Why?
The above is what I have tried so far, why can't I do e.Leave_Type or what is the right way to do group grp by e.Leave_Type?

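You could try this. Writing "into grp" turns the join into a group join, after which the range variable e is no longer in scope, which is why your group line is invalid. Without it, e stays available for grouping. Dropping AsEnumerable() also lets the database do the filtering and grouping instead of loading both tables into memory.
(from p in db.PY_LeaveTransactionMasterTAB
 join e in db.PY_LeaveTransactionSubsidaryTAB on p.LV_TransMasterId equals e.LV_TransMasterId
 where p.SystemCode == SysCode && p.EmployeeCode == EmpCode && p.Status == "Approved"
 group new { p, e } by new { e.Leave_Type } into gr
 select new
 {
     Leave_Type = gr.Key.Leave_Type,
     Count = gr.Count()
 }).ToList();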
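
If it helps to see what the query computes, here is a minimal Python sketch of the same join, filter, group and count on in-memory data. It assumes TransMasterId is unique in the master table, so a set lookup behaves like the inner join; the function name and the sample rows are made up for illustration.

```python
from collections import Counter


def count_by_leave_type(masters, subs, system_code, employee_code):
    """Count subsidiary rows per Leave_Type for one employee's approved leave."""
    # WHERE clause: keep the ids of the approved master rows for this employee.
    approved_ids = {
        m["TransMasterId"]
        for m in masters
        if m["SystemCode"] == system_code
        and m["EmployeeCode"] == employee_code
        and m["Status"] == "Approved"
    }
    # JOIN + GROUP BY + COUNT(*): one tally per Leave_Type.
    return Counter(s["Leave_Type"] for s in subs if s["TransMasterId"] in approved_ids)


# Hypothetical sample rows.
masters = [
    {"TransMasterId": 1, "SystemCode": "abc123", "EmployeeCode": "0014", "Status": "Approved"},
    {"TransMasterId": 2, "SystemCode": "abc123", "EmployeeCode": "0014", "Status": "Approved"},
    {"TransMasterId": 3, "SystemCode": "abc123", "EmployeeCode": "0014", "Status": "Pending"},
    {"TransMasterId": 4, "SystemCode": "abc123", "EmployeeCode": "0099", "Status": "Approved"},
]
subs = [
    {"TransMasterId": 1, "Leave_Type": "Sick"},
    {"TransMasterId": 1, "Leave_Type": "Casual"},
    {"TransMasterId": 2, "Leave_Type": "Sick"},
    {"TransMasterId": 3, "Leave_Type": "Sick"},
    {"TransMasterId": 4, "Leave_Type": "Casual"},
]

counts = count_by_leave_type(masters, subs, "abc123", "0014")
print(dict(counts))
```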

Linq union all equivalent of sql code

I have the following queries:
var majorClients = from maj in dbContext.MajorClients
                   where (maj.startdate > startDate)
                   where (maj.status == "Active")
                   select new Client { EntityPK = maj.mjPrimaryKey, Name = maj.name, Type = "Maj" };
var minorClients = from min in dbContext.MinorClients
                   where (min.startdate > startDate)
                   where (min.status == "Active" || min.status == "Inactive")
                   select new Client { EntityPK = min.mnPrimaryKey, Name = min.name, Type = "Min" };
There are clients that could appear in both major and minor tables. I would like to return a list of all occurrences of clients in both tables, however if there are matching clients by name, then I would only want to return the matching record from the majorClients table.
I have written a sql query to return the results:
SELECT mjPrimaryKey AS EntityPK,name,'Maj' AS TYPE
FROM majorClients
WHERE status = 'Active' AND startDate > #startDate
UNION ALL
SELECT mnPrimaryKey,name,'Min' FROM minorClients
WHERE status IN ('Active','Inactive') AND startDate > #startDate
AND name NOT IN (SELECT name FROM majorClients WHERE status = 'Active' AND startDate > #startDate)
How would I represent this query in linq?

Try this LINQ. To exclude the duplicates from minorClients I've used the Contains method. To combine the two result sets I've used Concat, which keeps every row the way UNION ALL does (Union would remove duplicates, like a plain SQL UNION). The two status checks are wrapped in parentheses because && binds more tightly than ||.
var majorClients = from maj in dbContext.MajorClients
                   where maj.startdate > startDate
                       && maj.status == "Active"
                   select new Client
                   {
                       EntityPK = maj.mjPrimaryKey,
                       Name = maj.name,
                       Type = "Maj"
                   };
var minorClients = from min in dbContext.MinorClients
                   where min.startdate > startDate
                       && (min.status == "Active" || min.status == "Inactive")
                       && !(from maj in dbContext.MajorClients
                            where maj.startdate > startDate
                                && maj.status == "Active"
                            select maj.name).Contains(min.name)
                   select new Client
                   {
                       EntityPK = min.mnPrimaryKey,
                       Name = min.name,
                       Type = "Min"
                   };
var allClients = majorClients.Concat(minorClients);
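
The underlying rule is "all major clients, plus the minor clients whose name is not among them". A minimal Python sketch of just that rule, assuming both lists have already been filtered by status and start date; combine_clients and the sample records are hypothetical.

```python
def combine_clients(major, minor):
    """Return every major client plus the minor clients with no major namesake."""
    major_names = {client["Name"] for client in major}
    # NOT IN (...): drop minor clients whose name also appears in major.
    kept_minor = [client for client in minor if client["Name"] not in major_names]
    # UNION ALL: plain concatenation, nothing is de-duplicated.
    return major + kept_minor


# Hypothetical, already-filtered inputs.
major = [
    {"EntityPK": 1, "Name": "Acme", "Type": "Maj"},
    {"EntityPK": 2, "Name": "Globex", "Type": "Maj"},
]
minor = [
    {"EntityPK": 10, "Name": "Acme", "Type": "Min"},
    {"EntityPK": 11, "Name": "Initech", "Type": "Min"},
]

all_clients = combine_clients(major, minor)
for client in all_clients:
    print(client["Name"], client["Type"])
```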

Convert difficult ordered SQL statement to LINQ (edmx)

I need help converting the following statement to LINQ:
SELECT * FROM comments
WHERE good_id = '19' AND allow = '1'
ORDER BY IF(parent_id = 0, id, parent_id) DESC, id ASC
This statement returns the comments in the following order:
--Comment
--Subcomment (if parent_id != 0)
--Subcomment (if parent_id != 0)
--Comment
--Subcomment (if parent_id != 0)
etc.
But I don't know how to implement it on LINQ. Any ideas?

I have not compiled this, but I think I am on the right path. The conditional is written inline because LINQ to Entities cannot translate a call to a custom C# method into SQL.
var comments = from c in db.Comments
               where c.good_id == 19 && c.allow == "1"
               orderby (c.parent_id == 0 ? c.id : c.parent_id) descending, c.id ascending
               select c;

If the models have their relationships set up correctly, then you don't need to do anything fancy:
var comments = dataContext.Comments
    .Where(c => c.GoodId == 19 &&
                c.Allow == 1 &&
                c.ParentId == 0) // Get top-level comments
    .OrderBy(c => c.Id);
foreach (var comment in comments)
{
    Console.WriteLine("Comment:" + comment.Text);
    GetSubComments(comment);
}
Then use the following:
public void GetSubComments(Comment comment)
{
    foreach (var subComment in comment.Children) // You can apply any sub-ordering to Children
    {
        Console.WriteLine("Sub comment:" + subComment.Text);
        GetSubComments(subComment);
    }
}
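
As a quick check of what the original ORDER BY produces, here is a minimal Python sketch of the same sort key: comments are grouped under their thread id (their own id for a top-level comment, otherwise parent_id), newest thread first, and ordered by id inside the thread. The function thread_order and the sample comments are made up for illustration, and the sketch assumes only one level of nesting, as in the SQL.

```python
def thread_order(comments):
    """Sort like ORDER BY IF(parent_id = 0, id, parent_id) DESC, id ASC."""

    def key(comment):
        thread_id = comment["id"] if comment["parent_id"] == 0 else comment["parent_id"]
        # Negate the thread id for descending order; id stays ascending.
        return (-thread_id, comment["id"])

    return sorted(comments, key=key)


# Hypothetical comments: 1 and 3 are top-level, the rest are replies.
comments = [
    {"id": 1, "parent_id": 0},
    {"id": 2, "parent_id": 1},
    {"id": 3, "parent_id": 0},
    {"id": 4, "parent_id": 3},
    {"id": 5, "parent_id": 1},
]

ordered = thread_order(comments)
print([c["id"] for c in ordered])
```

A parent sorts ahead of its replies only because its id is lower than theirs, which holds when replies are always created after the comment they answer.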
