SQL Query using Pivot

I have a table as follows:
PriorityText Priority LoRes Partial Unknown N_A HiRes
------------------------------------------------------------------
Very High 5 0.0612 0.0000 0.0612 0.0612 0.2041
High 4 0.1429 0.0000 0.1633 0.0000 0.1633
Medium 3 0.0000 0.0000 0.1020 0.0000 0.0408
Low-Medium 2 0.0000 0.0000 0.0000 0.0000 0.0000
Low 1 0.0000 0.0000 0.0000 0.0000 0.0000
I am trying to transpose the table into this:
PriorityText Low Low-Medium Medium High Very High
--------------------------------------------------------
Priority 1 2 3 4 5
LoRes 0 0 0 0.1429 0.0612
Partial 0 0 0 0 0
Unknown 0 0 0.102 0.1633 0.0612
N_A 0 0 0 0 0.0612
HiRes 0 0 0.0408 0.1633 0.2041
I am using SQL Server 2008. I am having trouble coming up with the SQL syntax to perform a pivot on the data.
Can someone please share a SQL snippet that will solve this for me?
I have used the following to successfully pivot one row, but I do not know how to make it do all my rows.
SELECT VeryHigh AS VeryHigh,
       High AS High,
       Medium AS Medium,
       [Low-Medium] AS [Low-Medium],
       Low AS Low
FROM (SELECT [PriorityText], [LoRes] FROM #tbTemp) p
PIVOT (SUM(LoRes) FOR [PriorityText] IN ([VeryHigh], [High], [Medium], [Low-Medium], [Low])) pvt
My test data in my table is as follows:
Priority PriorityText LoRes Partial Unknown N_A HiRes
1 VeryHigh 0.05 11 54 0 9
2 High 0.14 22 54 0 3
3 Medium 0.07 33 65 0 7
4 Low-Medium 0.01 44 87 0 4
5 Low 0 55 9 0 0
NULL NULL NULL NULL NULL NULL NULL
Thanks for any help!!

Not sure how hardcoded you can make this, but what you're wanting to do is transpose the rows with the columns.
SELECT p.[Type] AS 'PriorityText', Low, [Low-Medium], Medium, High, [VeryHigh]
FROM (
    -- unpivot the measure columns into (Type, Holder) pairs
    SELECT PriorityText, [Holder], [Type]
    FROM (SELECT PriorityText, Priority, LoRes, [Partial], Unknown, N_A, HiRes
          FROM Test) AS sq_source
    UNPIVOT ([Holder] FOR [Type] IN
        (Priority, LoRes, [Partial], Unknown, N_A, HiRes)) AS sq_up
) AS sq
PIVOT (
    -- pivot the PriorityText values back out as columns
    MIN([Holder])
    FOR PriorityText IN (VeryHigh, High, Medium, [Low-Medium], Low)
) AS p
ORDER BY CASE p.[Type] WHEN 'Priority' THEN 1
                       WHEN 'LoRes' THEN 2
                       WHEN 'Partial' THEN 3
                       WHEN 'Unknown' THEN 4
                       WHEN 'N_A' THEN 5
                       ELSE 6 END ASC;
This should get you what you need. One thing to note is this only works if the columns:
Priority
LoRes
Partial
Unknown
N_A
HiRes
are of the same data type (in my test it was decimal(5,4)). If yours are different, you will need to do an initial select that converts them to a common data type, and use that as the sq_source.
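A minimal sketch of that initial conversion, assuming the source table is named Test as above and that decimal(10,4) is wide enough for every column (adjust both to your data); this subquery would take the place of sq_source in the query above:
-- hypothetical pre-conversion: bring every measure column to one data type
-- so that UNPIVOT accepts them, then use this in place of sq_source
SELECT PriorityText,
       CAST(Priority  AS decimal(10,4)) AS Priority,
       CAST(LoRes     AS decimal(10,4)) AS LoRes,
       CAST([Partial] AS decimal(10,4)) AS [Partial],
       CAST(Unknown   AS decimal(10,4)) AS Unknown,
       CAST(N_A       AS decimal(10,4)) AS N_A,
       CAST(HiRes     AS decimal(10,4)) AS HiRes
FROM Test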

Reindex kmeans clustered dataframe in ascending order of values

I have created a set of 4 clusters using kmeans, but I'd like to reorder the clusters in an ascending manner to have a predictable way of outputting an analysis every time the script is executed.
The resulting df with the clusters is something like:
customer_id recency frequency monetary_value recency_cluster \
0 44792907512250289 21 1 43.76 0
1 4277896431638207047 443 1 73.13 1
2 1509512561185834874 559 1 37.50 1
3 -8259919882769629944 437 1 34.38 1
4 8269311313560571571 133 2 324.78 0
5 6521698907264712834 311 1 6.32 3
6 9102795320443090762 340 1 174.99 3
7 6203217338400763719 39 1 77.50 0
8 7633758030510673403 625 1 95.26 2
9 -2417721548925747504 644 1 76.84 2
frequency_cluster monetary_value_cluster
0 1 0
1 1 0
2 1 0
3 1 0
4 0 1
5 1 0
6 1 1
7 1 0
8 1 0
9 1 0
The recency clusters are not sorted by the data; I'd like, for example, recency cluster 0 to be the one with the minimum value = 1.0 (currently recency cluster 1).
recency_cluster count mean std min 25% 50% 75% max
0 17609.0 700.900960 56.895995 609.0 651.0 697.0 749.0 807.0
1 16458.0 102.692672 62.952229 1.0 47.0 101.0 159.0 210.0
2 17166.0 515.971746 56.592490 418.0 466.0 517.0 567.0 608.0
3 18634.0 317.599227 58.852980 211.0 269.0 319.0 367.0 416.0
Using something like:
rfm_df.groupby('recency_cluster')['recency'].transform('min')
Will return a column with the min value of each cluster:
0 1
1 418
2 418
3 418
4 1
...
69862 609
69863 1
69864 211
69865 609
69866 211
I guess there's got to be a way to convert these categories [1, 211, 418, 609] into [0, 1, 2, 3] in order to get the desired result, but I can't come up with a solution.
Or maybe there's a better approach to the problem.
Edit: I did this and I think it's working:
rfm_df['recency_normalized_cluster'] = rfm_df.groupby('recency_cluster')['recency'].transform('min').astype('category').cat.codes

Sklearn only predicts one class while dataset is fairly balanced (±80/20 split)

I am trying to come up with a way to check what the most influential factors are for a person not paying back a loan (defaulting). I have worked with the sklearn library quite intensively, but I feel like I am missing something quite trivial...
The dataframe looks like this:
0 7590-VHVEG Female Widowed Electronic check Outstanding loan 52000 20550 108 0.099 288.205374 31126.180361 0 No Employed No Dutch No 0
1 5575-GNVDE Male Married Bank transfer Other 42000 22370 48 0.083 549.272708 26365.089987 0 Yes Employed No Dutch No 0
2 3668-QPYBK Male Registered partnership Bank transfer Study 44000 24320 25 0.087 1067.134272 26678.356802 0 No Self-Employed No Dutch No 0
The distribution of the "DefaultInd" column (target variable) is this:
0 0.835408
1 0.164592
Name: DefaultInd, dtype: float64
I have label encoded the data to make it look like this:
CustomerID Gender MaritalStatus PaymentMethod SpendingTarget EstimatedIncome CreditAmount TermLoanMonths YearlyInterestRate MonthlyCharges TotalAmountPayments CurrentLoans SustainabilityIndicator EmploymentStatus ExistingCustomer Nationality BKR_Registration DefaultInd
0 7590-VHVEG 0 4 2 2 52000 20550 108 0.099 288.205374 31126.180361 0 0 0 0 5 0 0
1 5575-GNVDE 1 1 0 1 42000 22370 48 0.083 549.272708 26365.089987 0 1 0 0 5 0 0
2 3668-QPYBK 1 2 0 4 44000 24320 25 0.087 1067.134272 26678.356802 0 0 2 0 5 0
After that I removed NaNs and cleaned it up some more (removing capitalization, punctuation, etc.).
After that, I try to run this cell:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
y = df['DefaultInd']
X = df.drop(['CustomerID', 'DefaultInd'], axis=1)
X = X.astype(float)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
logreg = LogisticRegression()
logreg.fit(X_train, y_train)
y_pred = logreg.predict(X_test)
print(classification_report(y_test, y_pred))
Which results in this:
precision recall f1-score support
0 0.83 1.00 0.91 1073
1 0.00 0.00 0.00 213
accuracy 0.83 1286
macro avg 0.42 0.50 0.45 1286
weighted avg 0.70 0.83 0.76 1286
As you can see, the "1" class does not get predicted a single time, and I am wondering whether or not this behaviour is to be expected (I think it is not). I tried to use class_weight='balanced', but that resulted in an average f1 score of 0.59 (instead of 0.76).
I feel like I am missing something, or is this kind of behaviour expected and should I rebalance the dataset before fitting? I feel like the division is not that skewed (±80/20), there should not be this big of a problem.
Any help would be more than appreciated :)

Select query when one column value is equal with column name

So I have this query where I need to select the difference between the parent and child values for the code columns.
For example, let's say in the Container table we have a parent value of 0.39 and a child value of 0.7; then the value from the selection would be -0.31.
Now I need to multiply this value (-0.31) with the value of the Quality column, which is found in another table. Then I need the top 3 values, which means ordering descending, of course.
But of course it should only be multiplied when NetNames is equal to the BetNames name and the Codes column value is equal to one of the columns in the Container table (Q_1, Q_2, Q_3).
I'm lost here guys.
Below is info about my tables.
/*Table Container*/
BetNamesID | Parent_B_Id | Year | Month | Q_1 | Q_2 | Q_3
1 null 2020 5 0.36 0.3 0.21
6 2 2020 8 0.39 0.64 1.0
7 1 2020 9 0.76 0.65 0.29
8 3 2020 13 0.62 0.34 0.81
9 2 2020 2 0.28 0.8 1.0
/*Table Configuration*/
NetNames | Codes | Quality
Test 1 Q_1 5
Test 2 Q_5 7
Test 3 Q_2 24
Test 4 Q_3 98
Test 5 Q_4 22
/*Table BetNames Info*/
ID | Parent_B_Id | Name
1 null Test 1
6 2 Test 2
7 1 Test 3
8 3 Test 4
9 2 Test 5
What I have done until now is this query:
SELECT
    child.[BetNamesID],
    child.[Parent_B_Id],
    child.[Q_1] - parent.[Q_1] AS Q_1,
    child.[Q_2] - parent.[Q_2] AS Q_2,
    child.[Q_3] - parent.[Q_3] AS Q_3,
    -- this is just a test case; this is how it is supposed to look in my mind:
    -- (child.[Q_3] - parent.[Q_3]) * qualityvalue AS Q_3,
    n.name
FROM [dbo].[Container] child
JOIN [dbo].[Container] parent ON child.Parent_B_Id = parent.BetNamesID
JOIN dbo.NetNames n ON n.id = parent.Parent_B_Id -- with this I get the names for BetNamesID
And this is the result of my query until now:
BetNamesID | Parent_B_Id | Q_1 | Q_2 | Q_3
3 2 0.21 -0.3 -0.1
5 4 -0.39 0.64 -0.9
8 5 0.99 0.65 0.59
What I need now is to multiply the values of the Q_1, Q_2, Q_3 columns with the values found in the Configuration table (Quality column), but only when the BetNames name is equal to NetNames and the Codes row value matches the Q_1, Q_2, or Q_3 column.
These are the expected values.
BetNamesID | Parent_B_Id | Q_1 | Q_2 | Q_3
3            2             1.05 (0.21 * 5)    -7.2 (-0.3 * 24)     -9.8 (-0.1 * 98)
5            4             1.95 (0.39 * 5)    15.36 (0.64 * 24)    -88.2 (-0.9 * 98)
How does the new table come into play? How can I join it? How does the WHERE condition work in this case?
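For illustration only, here is a minimal sketch of one way that join could look. It assumes the lookup table is dbo.Configuration with the NetNames/Codes/Quality columns shown above, and that the multiplier depends only on the code value (add an AND cfg1.NetNames = ... condition to each ON clause if the name match is also required); which scaled column drives the TOP (3) / ORDER BY DESC is a guess to swap for whatever you actually rank by:
SELECT TOP (3)
       child.[BetNamesID],
       child.[Parent_B_Id],
       (child.[Q_1] - parent.[Q_1]) * cfg1.Quality AS Q_1,   -- Quality = 5 in the sample data
       (child.[Q_2] - parent.[Q_2]) * cfg2.Quality AS Q_2,   -- Quality = 24
       (child.[Q_3] - parent.[Q_3]) * cfg3.Quality AS Q_3    -- Quality = 98
FROM [dbo].[Container] child
JOIN [dbo].[Container] parent ON child.Parent_B_Id = parent.BetNamesID
JOIN [dbo].[Configuration] cfg1 ON cfg1.Codes = 'Q_1'
JOIN [dbo].[Configuration] cfg2 ON cfg2.Codes = 'Q_2'
JOIN [dbo].[Configuration] cfg3 ON cfg3.Codes = 'Q_3'
ORDER BY (child.[Q_1] - parent.[Q_1]) * cfg1.Quality DESC;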

SUM in SQL Server with PARTITION BY clause

I have the following table
QuotationId QuotationDetailId DriverId RangeFrom RangeTo FixedAmount UnitAmount
-------------------------------------------------------------------------------------------
10579 7 1 1 1 1154.00 0.00
10579 7 2 2 2 1731.00 0.00
10579 11 1 0 10 0.00 88.53
10579 11 2 11 24 885.30 100.50
10579 11 3 25 34 2292.30 88.53
I need to write a query in SQL Server with the following logic,
The grouping is QuotationId + QuotationDetailId.
For each of these blocks, from the second line on, I need to add the previous line's FixedAmount + UnitAmount * RangeFrom to the FixedAmount of the current row.
So in this case the resulting output should be
QuotationId QuotationDetailId DriverId RangeFrom RangeTo FixedAmount UnitAmount
10579 7 1 1 1 1154.00 0.00
10579 7 2 2 2 2885.00 0.00
10579 11 1 0 10 0.00 88.53
10579 11 2 11 24 1770.60 100.50
10579 11 3 25 34 7174.90 88.53
I've tried several queries but without success; can someone suggest a way to do that?
Best regards
Fabrizio
In SQL Server 2012+, you can do a cumulative sum. I'm not sure exactly what logic you want, but this seems reasonable given the data set:
select t.*,
sum(FixedAmount*UnitAmount) over (partition by QuotationId, QuotationDetailId
order by DriverId
) as running_sum
from t;
You can use a subquery: your 'amount' column would appear in the list of columns as a query in brackets,
SELECT ...fields...,
(SELECT SUM(A.unitAmount * A.RangeFrom + A.fixedAmount)
From YourTable A
WHERE A.QuotationId = B.QuotationId
AND A.QuotationDetailId = B.QuotationDetailId
AND A.DriverId <= B.DriverId) AS Amount
From YourTable B
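If the goal is the cumulative variant described in the question (carrying each earlier row's FixedAmount + UnitAmount * RangeFrom into the current row's FixedAmount), a window-function sketch along these lines may also be worth trying; YourTable is the placeholder name from the answer above, and the exact expression inside the SUM is an assumption to adapt:
SELECT t.QuotationId,
       t.QuotationDetailId,
       t.DriverId,
       t.RangeFrom,
       t.RangeTo,
       -- current FixedAmount plus the accumulated amounts of all earlier rows in the block
       t.FixedAmount
         + COALESCE(SUM(t.FixedAmount + t.UnitAmount * t.RangeFrom) OVER (
               PARTITION BY t.QuotationId, t.QuotationDetailId
               ORDER BY t.DriverId
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS FixedAmount,
       t.UnitAmount
FROM YourTable t;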

Convert rows into columns with an Informix query

I want to convert
inpvacart inpvapvta inpvapvt1 inpvapvt2 inpvapvt3 inpvapvt4
CS-279 270.4149 0.0000 0.0000 0.0000 0.0000
AAA5030 1.9300 1.9300 1.6212 0.0000 0.0000
Query
select
inpvacart,
inpvapvta,
inpvapvt1,
inpvapvt2,
inpvapvt3,
inpvapvt4
from inpva;
into this
inpvacart line value
CS-279 1 270.4149
CS-279 2 0.00000
CS-279 3 0.00000
CS-279 4 0.00000
CS-279 5 0.00000
AAA5030 1 1.9300
AAA5030 2 1.9300
AAA5030 3 1.6212
AAA5030 4 0.0000
AAA5030 5 0.0000
I have tried this
select s.inpvacart,l.lista,l.resultados
from inpva as s,
table(values(1,s.inpvapvta),
(2,s.inpvapvt1),
(3,s.inpvapvt2),
(4,s.inpvapvt3),
(5,s.inpvapvt4))
)as l(lista,resultados);
But it does not work in Informix 9.
Is there a way to transpose rows to columns?
Thank You
I don't think Informix has an unpivot operator to transpose columns to rows like, for instance, MSSQL does, but one way to do this is to transpose the columns manually and then use UNION ALL to create a single result set, like this:
select inpvacart, 1 as line, inpvapvta as value from inpva
union all
select inpvacart, 2 as line, inpvapvt1 as value from inpva
union all
select inpvacart, 3 as line, inpvapvt2 as value from inpva
union all
select inpvacart, 4 as line, inpvapvt3 as value from inpva
union all
select inpvacart, 5 as line, inpvapvt4 as value from inpva
order by inpvacart, line;
It's not very pretty but it should work.