I have this table as per below:
Table A:
B_NUMBER_COUNTRY OUTGOING_CARRIER MINUTES
CAN A 1,045.71
CAN B 7.98
CAN C 973.52
FRA A 566.19
FRA B 521.52
FRA C 27.03
FRA D 549.14
FRA E 0.21
USA A 32.57
USA B 303.17
USA C 9,837.53
USA D 3.91
USA E 0.07
USA F 2,469.00
USA G 67.68
USA H 0.37
USA I 933.72
I need to rank b_number_country based on the sum of minutes.
In the above case, the total minutes are roughly 13.6K for USA, 2K for CAN and 1.7K for FRA, so the ranking should be USA - 1, CAN - 2 and FRA - 3. With the rank column added, the table should look like this:
Table A (rank):
B_NUMBER_COUNTRY OUTGOING_CARRIER MINUTES RANK
CAN A 1,045.71 2
CAN B 7.98 2
CAN C 973.52 2
FRA A 566.19 3
FRA B 521.52 3
FRA C 27.03 3
FRA D 549.14 3
FRA E 0.21 3
USA A 32.57 1
USA B 303.17 1
USA C 9,837.53 1
USA D 3.91 1
USA E 0.07 1
USA F 2,469.00 1
USA G 67.68 1
USA H 0.37 1
USA I 933.72 1
I am unable to write the right query for this. In every attempt, the ranking ends up being computed over both b_number_country and outgoing_carrier.
Edited based on comment:
You need two steps: first calculate the sum of the minutes per country, then rank the countries by that sum:
SELECT ...,
   DENSE_RANK()
   OVER (ORDER BY sumMinutes DESC) -- must be DENSE_RANK
FROM
 (
   SELECT b_number_country, interval_of_day, outgoing_carrier,
      SUM(call_duration)/60 AS Minutes,
      SUM(call_count) AS attempt,
      SUM(answered_count) AS answered,
      SUM(seizure_count) AS seizure,
      SUM(start_call_count) AS Count_X,
      SUM(ner_count) AS NER_COUNT,
      SUM(SUM(call_duration)/60)
      OVER (PARTITION BY B_NUMBER_COUNTRY) AS sumMinutes
   FROM bm_archived_cdr
   WHERE call_direction = 'O'
     AND call_date = DATE '2016-04-21'
   GROUP BY b_number_country, interval_of_day, outgoing_carrier
 ) dt;
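To try the two-step idea on the Table A figures from the question, here is a minimal sketch that runs the same pattern in SQLite from Python. It assumes the minutes are already aggregated per country and carrier (as in Table A), so the inner query only needs a windowed SUM. The table name table_a and the column alias country_rank are made up for this example, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Table A from the question (minutes already summed per country and carrier).
rows = [
    ("CAN", "A", 1045.71), ("CAN", "B", 7.98), ("CAN", "C", 973.52),
    ("FRA", "A", 566.19), ("FRA", "B", 521.52), ("FRA", "C", 27.03),
    ("FRA", "D", 549.14), ("FRA", "E", 0.21),
    ("USA", "A", 32.57), ("USA", "B", 303.17), ("USA", "C", 9837.53),
    ("USA", "D", 3.91), ("USA", "E", 0.07), ("USA", "F", 2469.00),
    ("USA", "G", 67.68), ("USA", "H", 0.37), ("USA", "I", 933.72),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_a ("
    "b_number_country TEXT, outgoing_carrier TEXT, minutes REAL)"
)
conn.executemany("INSERT INTO table_a VALUES (?, ?, ?)", rows)

query = """
SELECT b_number_country, outgoing_carrier, minutes,
       -- step 2: rank countries by their total, same rank for every carrier row
       DENSE_RANK() OVER (ORDER BY sum_minutes DESC) AS country_rank
FROM (
    -- step 1: attach the per-country total to every row
    SELECT b_number_country, outgoing_carrier, minutes,
           SUM(minutes) OVER (PARTITION BY b_number_country) AS sum_minutes
    FROM table_a
) AS dt
ORDER BY b_number_country, outgoing_carrier
"""
ranked = conn.execute(query).fetchall()

# Collapse to one rank per country for a quick check.
rank_by_country = {country: rank for country, _carrier, _minutes, rank in ranked}
print(rank_by_country)
```

Every carrier row of a country carries the same rank, which is what the question's second table asks for.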
Trying to learn pandas using English football scores.
Here is part of a list of football matches in date order.
"FTR" is the Full Time Result: "A" - win for the away team, "H" - win for the home team, "D" - a draw.
I created columns "HTWTD" - home team wins to date, and "ATWTD" - away team wins to date, to hold the number of wins the home and away teams have had up until that point. I populated the columns with 0s then put a 1 in the HTWTD when the FTR was H, and a 1 in the ATWTD where the FTR was A. This obviously only produces correct data for the first time each team plays.
When we get to row 9, Leeds wins a match having already won one in row 2. The HTWTD in row 9 should read 2, i.e. at this point Leeds has won 2 games.
To my untrained mind the process should be...
Look at the row above, if Leeds features, get the corresponding HTWTD or ATWTD score, add 1 to it and put it in the current row HTWTD or ATWTD column. If Leeds doesn't feature (and you are not at the first row) go up one row.
Having googled around I haven't found anything about how to select only rows above current row, then alter entry in current row depending on test on selected rows.
I could probably write a little python function to do this, but is there a pandas way to go about it?
Row  Date        HomeTeam          AwayTeam     FTR  HTWTD  ATWTD
0    12/09/2020  Fulham            Arsenal      A    0      1
1    12/09/2020  Crystal Palace    Southampton  H    1      0
2    12/09/2020  Liverpool         Leeds        H    0      1
3    12/09/2020  West Ham          Newcastle    A    0      1
4    13/09/2020  West Brom         Leicester    A    0      1
5    13/09/2020  Tottenham         Everton      A    0      1
6    14/09/2020  Brighton          Chelsea      A    0      1
7    14/09/2020  Sheffield United  Wolves       A    0      1
8    19/09/2020  Everton           West Brom    H    1      0
9    19/09/2020  Leeds             Fulham       H    1      0
IIUC, you can use .eq() to return a boolean series of True or False for the condition and then use .cumsum() to cumulatively get the sum of the True values per HomeTeam and AwayTeam group result with a .groupby:
df['home_wins'] = df['FTR'].eq('H')
df['away_wins'] = df['FTR'].eq('A')
df['HTWTD'] = df.groupby('HomeTeam')['home_wins'].cumsum()
df['ATWTD'] = df.groupby('AwayTeam')['away_wins'].cumsum()
df.drop(['home_wins', 'away_wins'], axis=1)
Out[1]:
Row Date HomeTeam AwayTeam FTR HTWTD ATWTD
0 0 12/09/2020 Fulham Arsenal A 0 1
1 1 12/09/2020 Crystal Palace Southampton H 1 0
2 2 12/09/2020 Liverpool Leeds H 1 0
3 3 12/09/2020 West Ham Newcastle A 0 1
4 4 13/09/2020 West Brom Leicester A 0 1
5 5 13/09/2020 Tottenham Everton A 0 1
6 6 14/09/2020 Brighton Chelsea A 0 1
7 7 14/09/2020 Sheffield United Wolves A 0 1
8 8 19/09/2020 Everton West Brom H 1 0
9 9 19/09/2020 Leeds Fulham H 1 0
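Note that the groupby in the answer above tallies home wins per home team and away wins per away team separately. If the goal is each team's total wins to date regardless of venue (that is an assumption about what the question is after), a plain-Python running tally is a small sketch of the idea. The helper name wins_to_date is hypothetical, and the match list is copied from the question's table.

```python
from collections import defaultdict

# (HomeTeam, AwayTeam, FTR) in date order, copied from the question.
matches = [
    ("Fulham", "Arsenal", "A"),
    ("Crystal Palace", "Southampton", "H"),
    ("Liverpool", "Leeds", "H"),
    ("West Ham", "Newcastle", "A"),
    ("West Brom", "Leicester", "A"),
    ("Tottenham", "Everton", "A"),
    ("Brighton", "Chelsea", "A"),
    ("Sheffield United", "Wolves", "A"),
    ("Everton", "West Brom", "H"),
    ("Leeds", "Fulham", "H"),
]


def wins_to_date(matches):
    """Return one (HTWTD, ATWTD) pair per match, counting this match too."""
    wins = defaultdict(int)  # team name -> wins so far, home or away
    result = []
    for home, away, ftr in matches:
        if ftr == "H":
            wins[home] += 1
        elif ftr == "A":
            wins[away] += 1
        # A draw ("D") changes nothing for either team.
        result.append((wins[home], wins[away]))
    return result


result = wins_to_date(matches)
for row, pair in enumerate(result):
    print(row, pair)
```

With this data, row 8 (Everton at home) comes out with an HTWTD of 2, because Everton's away win in row 5 is counted as well.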
I’m trying to combine table1 (with columns “id”, “state”, “cost”) and table2 (with same columns “id”, “state”, “cost”) such that all rows are combined into a single table3. There may be duplicates for “id” and/or “state” but all duplicates should be retained.
Table1:
Id  State  Cost
1   IL     50
3   CA     10
2   WY     70
Table2:
Id  State  Cost
4   NY     100
3   PA     15
6   FL     5
Goal table3:
Id  State  Cost
1   IL     50
3   CA     10
2   WY     70
4   NY     100
3   PA     15
6   FL     5
This would be a simple rbind using R but I’m probably overthinking it in Teradata.
Thanks!
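The SQL counterpart of stacking rows like rbind is UNION ALL, which keeps duplicates (plain UNION would remove them). Here is a minimal sketch of that, run against SQLite from Python rather than Teradata, so the CREATE TABLE ... AS syntax is SQLite's and may need adjusting for Teradata. Table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER, state TEXT, cost INTEGER);
CREATE TABLE table2 (id INTEGER, state TEXT, cost INTEGER);
INSERT INTO table1 VALUES (1, 'IL', 50), (3, 'CA', 10), (2, 'WY', 70);
INSERT INTO table2 VALUES (4, 'NY', 100), (3, 'PA', 15), (6, 'FL', 5);

-- UNION ALL appends the rows of table2 to table1 and keeps duplicates.
CREATE TABLE table3 AS
    SELECT id, state, cost FROM table1
    UNION ALL
    SELECT id, state, cost FROM table2;
""")

table3 = conn.execute("SELECT id, state, cost FROM table3").fetchall()
print(table3)
```

Row order is not guaranteed without an ORDER BY, so sort explicitly if the order matters.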
I am trying to combine two tables but my results are showing incorrect data from one table.
For example, my first table has the following row:
PositionCode AcctUnit BudgetFTE
JDFF HRT 2
My second table does not contain any record for PositionCode=JDFF and AcctUnit=HRT; however, I get the following result when running the query below:
PositionCode AcctUnit BudgetFTE ActualFTE VarianceFTE
JDFF HRT 2 1 -1
sql code:
SELECT
b.PositionCode,
b.AcctUnit,
e.POSITION,
e.HM_ACCT_UNIT,
b.FTE AS BudgetFTE,
sum(e.FTE) AS ActualFTE,
sum(e.FTE) - b.FTE AS VarianceFTE
FROM
QryBudgetRollUp AS b
LEFT JOIN ActiveEmployees AS e ON
( b.PositionCode = e.POSITION ) AND
( e.HM_ACCT_UNIT = b.AcctUnit )
GROUP BY
b.PositionCode,
b.AcctUnit,
e.POSITION,
e.HM_ACCT_UNIT,
b.FTE,
e.FTE - b.FTE
sample data:
QryBudgetRollup
PositionCode AcctUnit FTE
JDFF HRT 2
VIPP HRT 1
HROPSA CMP 1
ActiveEmployees
C E CE LAST_NAME FIRST_NAME EMP PROCESS HM_ACCT_UNIT D POSITION D;P' Date SC Expr1013 AP PG HC FTE PE
2 2343 22343 Doe John FT CHRE CMP CMP HROPSA CMP;HROPSA 2/15/1999 H $20.00 $40,000.00 4 1 1 $4,000.00
2 2515 22515 Jetson George PT CHRE CMP CMP HROPSA CMP;HROPSA 4/22/2014 H $10.00 $20,000.00 2 1 0.5 $2,000.00
4 18 418 Doe Jane FT CSIS HRT HRT VIPP HRT;VIPP 11/1/2002 S $40.00 $80,000.00 7 1 1 $8,000.00
Desired Query Results
PositionCode AcctUnit BudgetFTE ActualFTE VarianceFTE
JDFF HRT 2 0 2
VIPP HRT 1 1 0
HROPSA CMP 1.5 2 -0.5
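One way to get an ActualFTE of 0 (rather than NULL) for budget rows with no matching employees is to group only by the budget columns and wrap the sum in COALESCE. The following is a rough sketch under several assumptions: it runs in SQLite from Python, it keeps only the three ActiveEmployees columns the query uses, and VarianceFTE follows the query's actual-minus-budget formula, so its sign may not match the desired results listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE QryBudgetRollUp (PositionCode TEXT, AcctUnit TEXT, FTE REAL);
-- Trimmed to the columns used in the join and the sum.
CREATE TABLE ActiveEmployees (POSITION TEXT, HM_ACCT_UNIT TEXT, FTE REAL);
INSERT INTO QryBudgetRollUp VALUES
    ('JDFF', 'HRT', 2), ('VIPP', 'HRT', 1), ('HROPSA', 'CMP', 1);
INSERT INTO ActiveEmployees VALUES
    ('HROPSA', 'CMP', 1), ('HROPSA', 'CMP', 0.5), ('VIPP', 'HRT', 1);
""")

query = """
SELECT b.PositionCode,
       b.AcctUnit,
       b.FTE AS BudgetFTE,
       COALESCE(SUM(e.FTE), 0) AS ActualFTE,          -- 0 when nobody matches
       COALESCE(SUM(e.FTE), 0) - b.FTE AS VarianceFTE
FROM QryBudgetRollUp AS b
LEFT JOIN ActiveEmployees AS e
       ON b.PositionCode = e.POSITION
      AND b.AcctUnit = e.HM_ACCT_UNIT
GROUP BY b.PositionCode, b.AcctUnit, b.FTE
"""
rows = conn.execute(query).fetchall()

# PositionCode -> (BudgetFTE, ActualFTE, VarianceFTE)
result = {row[0]: row[2:] for row in rows}
for code, values in result.items():
    print(code, values)
```

Grouping by the employee columns or by e.FTE - b.FTE, as the original query does, produces one output row per distinct employee value instead of one row per budget line.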
I have the following dataset:
Name Address Bank_Account Ph_NO IP_Address Chargeoff
AJ 12 ABC Street 1234 369 12.12.34 0
CK 12 ABC Street 1234 450 12.12.34 1
DN 15 JMP Street 3431 569 13.8.09 1
MO 39 link street 8421 450 05.67.89 1
LN 12 ABC Street 1234 340 14.75.06 1
ST 15 JMP Street 8421 569 13.8.09 0
Using this dataset I want to create the below view in SAS:
Name CountOFAddr CountBankacct CountofPhone CountOfIP CountCharegeoff
AJ 3 3 1 2 2
CK 3 3 2 2 3
DN 2 1 2 2 1
MO 1 2 2 1 2
LN 3 3 1 1 2
ST 2 2 2 2 2
The output variables mean the following:
-CountOfAddr: For AJ, CountOFAddr is 3, which means that AJ shares its address with itself, CK and LN.
-CountBankAcct: For MO, CountBankacct is 2, which means that MO shares its bank account number with itself and ST. The same logic applies to CountofPhone and CountOfIP.
-CountChargeoff: This one is a little tricky. AJ is linked to CK and LN through its address, and both CK and LN have been charged off, so the CountChargeoff for AJ is 2.
For CK the CountChargeoff is 3, because CK is linked with itself, with MO through the phone number, and with LN and AJ through the street address. So the total charge-off in CK's network is 3 (CO count of AJ + CO count of CK + CO count of MO + CO count of LN).
I currently work as a Risk Analyst in a Financial Service Firm and the code for this problem may help us to significantly reduce funding of fraudulent accounts.
Thanks.
SQL Fiddle Demo
SELECT
Name,
(SELECT Count(Address)
FROM dataset d2
WHERE d1.Address = d2.Address
) CountOFAddr,
(SELECT Count(Bank_Account)
FROM dataset d2
WHERE d1.Bank_Account = d2.Bank_Account
) CountBankacct,
(SELECT Count(Ph_NO)
FROM dataset d2
WHERE d1.Ph_NO = d2.Ph_NO
) CountofPhone,
(SELECT Count(IP_Address)
FROM dataset d2
WHERE d1.IP_Address = d2.IP_Address
) CountOfIP,
(SELECT count(d2.Chargeoff)
FROM dataset d2
WHERE d1.name <> d2.name
and ( d1.Address = d2.Address
or d1.Bank_Account = d2.Bank_Account
or d1.Ph_NO = d2.Ph_NO
or d1.IP_Address = d2.IP_Address
)
) CountCharegeoff
FROM dataset d1
I included the charge-off calculation.
It brings in every row d2 with d2.name <> d1.name that has any field in common with d1, then counts those rows.
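The query above counts linked rows, which happens to match the expected output for this sample. Read literally, the question describes a sum of Chargeoff over the whole network including the account itself. Here is a hedged sketch of that variant, run in SQLite from Python with the sample data from the question. Treating all columns except Chargeoff as text is an assumption.

```python
import sqlite3

# Sample rows from the question.
rows = [
    ("AJ", "12 ABC Street", "1234", "369", "12.12.34", 0),
    ("CK", "12 ABC Street", "1234", "450", "12.12.34", 1),
    ("DN", "15 JMP Street", "3431", "569", "13.8.09", 1),
    ("MO", "39 link street", "8421", "450", "05.67.89", 1),
    ("LN", "12 ABC Street", "1234", "340", "14.75.06", 1),
    ("ST", "15 JMP Street", "8421", "569", "13.8.09", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dataset (Name TEXT, Address TEXT, Bank_Account TEXT, "
    "Ph_NO TEXT, IP_Address TEXT, Chargeoff INTEGER)"
)
conn.executemany("INSERT INTO dataset VALUES (?, ?, ?, ?, ?, ?)", rows)

query = """
SELECT d1.Name,
       -- Sum charge-offs over every row sharing at least one field with d1.
       -- d1 itself always matches, so its own charge-off is included.
       (SELECT SUM(d2.Chargeoff)
        FROM dataset AS d2
        WHERE d1.Address = d2.Address
           OR d1.Bank_Account = d2.Bank_Account
           OR d1.Ph_NO = d2.Ph_NO
           OR d1.IP_Address = d2.IP_Address) AS CountChargeoff
FROM dataset AS d1
"""
chargeoffs = dict(conn.execute(query).fetchall())
print(chargeoffs)
```

Both variants only look at direct links; accounts connected through an intermediate account are not followed.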
I have 4 tables:
Teams
codTeam: 1
year: 1995
codYears: 1
codType: 1
name: FCP
points: 3
codTeam: 2
year: 1990
codYears: 1
codType: 1
name: SLB
points: 3
codTeam: 3
year: 1995
codYears: 3
codType: 2
name: BCP
points: 0
Trainers (People who train a team)
codTrainer: 1
name: Peter
street: Ghost street
cellphone: 252666337
birthdayDate: 1995-02-01
BI: 11111111
number: 121212121
codTrainer: 2
name: Pan
street: Ghost street Remade
cellphone: 253999666
birthdayDate: 1995-01-01
BI: 22222222
number: 212121212
TeamsTrainers (In which team is someone training)
codTeamTrainer: 1
codTeam: 1
codTrainer: 2
dataInicio: 1998-05-05
codTeamTrainer: 2
codTeam: 2
codTrainer: 2
dataInicio: 1998-06-07
codTeamTrainer: 3
codTeam: 2
codTrainer: 1
dataInicio: 1999-09-09
Games
codGame: 1
date: 2015-02-12 13:00:00
codTeamHome: 1
codTeamAgainst: 2
goalsHome: 3
goalsAgainst: 2
codTypeGame: 1
codGame: 2
date: 2015-02-12 15:00:00
codTeamHome: 2
codTeamAgainst: 1
goalsHome: 1
goalsAgainst: 2
codTypeGame: 3
So basically I want to:
Get the table Games and show:
Team Name | Trainer Name | Goals Home | Goals Against | Points | Amount of Games from the Home Team
I have the following code for that in SQLQuery:
SELECT Teams.name, Trainers.name, Games.goalsHome,
Games.goalsAgainst, Teams.points, COUNT(*)
FROM Teams, Trainers, Games, TeamsTrainers
WHERE Games.codTeamHome = Teams.codTeam AND
TeamsTrainers.codTeam = Teams.codTeam AND
TeamsTrainers.codTrainer = Trainers.codTrainer
GROUP BY Teams.name, Trainers.name, Games.goalsHome,
Games.goalsAgainst, Teams.points
(May have some errors as I translated)
Yet the COUNT only shows 1 (probably because the WHERE clause only matches on the home team, so each game is counted once). If that is the cause, how do I fix it?
Result:
FCP | Pan | 3 | 2 | 3 | 1 (Count)
SLB | Peter | 1 | 2 | 3 | 1 (Count)
SLB | Pan | 1 | 2 | 3 | 1 (Count)
The count should be 2 for each row.
Any idea?
The reason you get the wrong result is the type of join. You should use the appropriate explicit join (left, right or inner) instead of joining the data through the WHERE clause. Your data model has 1-to-N relationships, so you should pick the join type that fits.
See: http://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins
EDIT
SELECT Te.name, Tr.name, Ga.goalsHome, Ga.goalsAgainst, Te.points,
(SELECT COUNT(*)
FROM Games
WHERE codTeamHome = Te.codTeam OR codTeamAgainst = Te.codTeam)
AS CountOfGames
FROM TeamsTrainers AS Tt
LEFT JOIN Teams AS Te ON Tt.codTeam = Te.codTeam
LEFT JOIN Trainers AS Tr ON Tt.codTrainer = Tr.codTrainer
LEFT JOIN Games AS Ga ON Ga.codTeamHome = Te.codTeam
SQL Fiddle
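To check the query above against the sample data, here is a small sketch that loads a trimmed version of the four tables into SQLite from Python. Only the columns the query touches are created, and Pan is taken to be codTrainer 2, as the TeamsTrainers rows and the result listing in the question imply.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Trimmed schemas: only the columns used by the query.
CREATE TABLE Teams (codTeam INTEGER, name TEXT, points INTEGER);
CREATE TABLE Trainers (codTrainer INTEGER, name TEXT);
CREATE TABLE TeamsTrainers (codTeamTrainer INTEGER, codTeam INTEGER,
                            codTrainer INTEGER);
CREATE TABLE Games (codGame INTEGER, codTeamHome INTEGER,
                    codTeamAgainst INTEGER, goalsHome INTEGER,
                    goalsAgainst INTEGER);
INSERT INTO Teams VALUES (1, 'FCP', 3), (2, 'SLB', 3), (3, 'BCP', 0);
INSERT INTO Trainers VALUES (1, 'Peter'), (2, 'Pan');
INSERT INTO TeamsTrainers VALUES (1, 1, 2), (2, 2, 2), (3, 2, 1);
INSERT INTO Games VALUES (1, 1, 2, 3, 2), (2, 2, 1, 1, 2);
""")

query = """
SELECT Te.name, Tr.name, Ga.goalsHome, Ga.goalsAgainst, Te.points,
       -- Count games where the team played on either side.
       (SELECT COUNT(*)
        FROM Games
        WHERE codTeamHome = Te.codTeam OR codTeamAgainst = Te.codTeam)
       AS CountOfGames
FROM TeamsTrainers AS Tt
LEFT JOIN Teams AS Te ON Tt.codTeam = Te.codTeam
LEFT JOIN Trainers AS Tr ON Tt.codTrainer = Tr.codTrainer
LEFT JOIN Games AS Ga ON Ga.codTeamHome = Te.codTeam
"""
rows = conn.execute(query).fetchall()
for row in sorted(rows):
    print(row)
```

The count comes from a correlated subquery rather than from the join, so it is not affected by the join on codTeamHome and needs no GROUP BY.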
You can change your WHERE clause by saying
[what you have] OR (Games.codTeamAgainst = Teams.codTeam AND ...)
However, this probably causes other problems because you probably care about whether a particular team scores the goals, not whether the home team scores the goals in games that team plays on either side.
You might not notice the other problems for a while because your GROUP BY clause is probably pretty far from what you want, and you might want to be selecting aggregate functions for a much simpler grouping.