SQL Server 2008 pivot without aggregate - sql

I have a table of test score data that I need to pivot, and I am stuck on how to do it.
I have the data as this:
grade listening speaking reading writing
0 0.0 0.0 0.0 0.0
1 399.4 423.8 0.0 0.0
2 461.6 508.4 424.2 431.5
3 501.0 525.9 492.8 491.3
4 521.9 517.4 488.7 486.7
5 555.1 581.1 547.2 538.2
6 562.7 545.5 498.2 530.2
7 560.5 525.8 545.3 562.0
8 580.9 548.7 551.4 560.3
9 602.4 550.2 586.8 564.1
10 623.4 581.1 589.9 568.5
11 633.3 578.3 598.1 568.2
12 626.0 588.8 600.5 564.8
But I need it like this:
gr0 gr1 gr2 gr3 gr4 gr5 gr6 gr7 ...
listening 0.0 399.4 461.6 501.0 521.9 555.1 562.7 560.5 580.9...
speaking 0.0 423.8...
reading 0.0 0.0 424.2...
writing 0.0 0.0 431.5...
I don't need to aggregate anything, just pivot the data.

The following is one way to solve the problem, but I am not sure if it is the most efficient.
DECLARE @PivotData table(grade int, listening float, speaking float, reading float, writing float)
INSERT into @PivotData
SELECT 0, 0.0, 0.0, 0.0, 0.0 UNION ALL
SELECT 1, 399.4, 423.8, 0.0, 0.0 UNION ALL
SELECT 2, 461.6, 508.4, 424.2, 431.5 UNION ALL
SELECT 3, 501.0, 525.9, 492.8, 491.3;
-- Unpivot to (grade, TestType, score) rows, then pivot the grades into columns.
SELECT TestType, [0] As gr0, [1] as gr1, [2] as gr2, [3] as gr3
FROM
(
    SELECT grade, TestType, score
    FROM
    (
        SELECT grade, listening, speaking, reading, writing from @PivotData
    ) PivotData
    UNPIVOT
    (
        score for TestType IN (listening, speaking, reading, writing)
    ) as initialUnPivot
) as PivotSource
PIVOT
(
    -- Each (TestType, grade) pair has exactly one score, so max() just returns it.
    max(score) FOR grade IN ([0], [1], [2], [3])
) as PivotedData
Basically, I first unpivoted the data to get a table that contains the grade, test type, and score each in its own column, and then pivoted that data to get the answer you want. Because the unpivoted source data contains the TestType column, each combination of grade and test type returns a single score, so any aggregate will simply return that score for the combination and will not actually combine anything.
I have only done it for the first 4 grades, but I am pretty sure you can tell what you need to add to have it work for all 13 grades.
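As a quick sanity check of the claim that the choice of aggregate does not matter here, the sketch below reproduces the unpivot-then-pivot idea in SQLite through Python's sqlite3 module. SQLite has no PIVOT or UNPIVOT operators, so UNION ALL and conditional aggregation stand in for them; the table name pivot_data and the helper pivot() are assumptions made for illustration, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pivot_data "
    "(grade INTEGER, listening REAL, speaking REAL, reading REAL, writing REAL)"
)
conn.executemany(
    "INSERT INTO pivot_data VALUES (?, ?, ?, ?, ?)",
    [
        (0, 0.0, 0.0, 0.0, 0.0),
        (1, 399.4, 423.8, 0.0, 0.0),
        (2, 461.6, 508.4, 424.2, 431.5),
        (3, 501.0, 525.9, 492.8, 491.3),
    ],
)

# Stand-in for UNPIVOT: one (grade, test_type, score) row per source cell.
UNPIVOTED = """
    SELECT grade, 'listening' AS test_type, listening AS score FROM pivot_data
    UNION ALL SELECT grade, 'speaking', speaking FROM pivot_data
    UNION ALL SELECT grade, 'reading', reading FROM pivot_data
    UNION ALL SELECT grade, 'writing', writing FROM pivot_data
"""


def pivot(aggregate):
    """Stand-in for PIVOT: one conditional aggregate per grade column."""
    columns = ", ".join(
        f"{aggregate}(CASE WHEN grade = {g} THEN score END) AS gr{g}"
        for g in range(4)
    )
    sql = (
        f"SELECT test_type, {columns} FROM ({UNPIVOTED}) AS src "
        "GROUP BY test_type ORDER BY test_type"
    )
    return conn.execute(sql).fetchall()


pivot_max = pivot("MAX")
pivot_min = pivot("MIN")
pivot_sum = pivot("SUM")
for row in pivot_max:
    print(row)
```

Since every group holds exactly one non-NULL score, MAX, MIN and SUM should all give the same pivoted rows, which is why the aggregate in the PIVOT clause is only a formality.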

Here is a solution. The code below uses Oracle's dual table to create a dummy table for the areas (listening, speaking, etc.); for SQL Server, I believe you can simply drop the 'from dual' clause from each select in the union. The query performs a Cartesian product to turn the column-oriented scores into a normalized structure (columns skill, grade, and score), which is then pivoted in the usual manner. I also added a "rank" column so the data can be sorted in the order you specified.
select skill, rank
, max(case grade when 0 then score else null end) gr0
, max(case grade when 1 then score else null end) gr1
, max(case grade when 2 then score else null end) gr2
from (
select skill, rank, grade
, case skill when 'listening' then listening
when 'speaking' then speaking
when 'reading' then reading
when 'writing' then writing end score
from tmp_grade t, (
select 'listening' skill, 1 rank from dual
union (select 'speaking', 2 from dual)
union (select 'reading', 3 from dual)
union (select 'writing', 4 from dual)
) area1
)
group by skill, rank
order by rank;
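For anyone who wants to try the cross-join approach without an Oracle instance, here is a minimal sketch of the same query shape run against SQLite from Python. It is not SQL Server, so treat it only as a check of the logic: the 'from dual' clauses are dropped, the derived tables get aliases, and the "rank" column is renamed to sort_key (my own choice, to stay clear of the RANK function name). Only grades 0 to 2 are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tmp_grade
        (grade INTEGER, listening REAL, speaking REAL, reading REAL, writing REAL);
    INSERT INTO tmp_grade VALUES
        (0, 0.0, 0.0, 0.0, 0.0),
        (1, 399.4, 423.8, 0.0, 0.0),
        (2, 461.6, 508.4, 424.2, 431.5);
""")

rows = conn.execute("""
    SELECT skill,
           MAX(CASE grade WHEN 0 THEN score END) AS gr0,
           MAX(CASE grade WHEN 1 THEN score END) AS gr1,
           MAX(CASE grade WHEN 2 THEN score END) AS gr2
    FROM (
        -- Cartesian product: every grade row is paired with every skill name,
        -- and the CASE picks the matching score column.
        SELECT area.skill, area.sort_key, t.grade,
               CASE area.skill
                   WHEN 'listening' THEN t.listening
                   WHEN 'speaking'  THEN t.speaking
                   WHEN 'reading'   THEN t.reading
                   WHEN 'writing'   THEN t.writing
               END AS score
        FROM tmp_grade AS t
        CROSS JOIN (
            SELECT 'listening' AS skill, 1 AS sort_key
            UNION ALL SELECT 'speaking', 2
            UNION ALL SELECT 'reading', 3
            UNION ALL SELECT 'writing', 4
        ) AS area
    ) AS normalized
    GROUP BY skill, sort_key
    ORDER BY sort_key
""").fetchall()

for row in rows:
    print(row)
```

The sort_key column is what keeps the rows in the listening, speaking, reading, writing order from the question rather than alphabetical order.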

Related

How to perform aggregation without considering certain value from a column and calculating mathematical metrics under certain condition on that

Call_ID  UUID  Intent_Product
A        123   Loan_BankAccount
A        234   StopCheque
A        789   Request_Agent_phone_number
B        900   Loan_BankAccount
B        787   Request_Agent_BankAcc
I have the above table, where "Call_ID" identifies a call, "UUID" is a unique key for a turn within that call (for example, call A here has the turns 123, 234 and 789), and "Intent_Product" describes the query.
The expected output is :
Intent_Product  Resolved_Count  Contained_Turns  Contained_Calls
Loan_BankAcc    2               1                0.5
Stop_Cheque     1               0                0
Conditions :
Resolution_Count: the total number of queries that have been resolved (here, for example, "Loan_BankAccount" = 2 and "StopCheque" = 1). Rows where "Intent_Product" is like "Request_Agent" have to be ignored, as those are not resolved.
Contained_Turns: the total number of queries that have been contained, ignoring any query whose successor has an "Intent_Product" like "Request_Agent" (here, for example, the containment count is 1 for "Loan_BankAccount" and 0 for "Stop_Cheque").
Contained_Calls: this is equal to (Contained_Turns) / (Resolution_Count).
WITH
successor AS
(
SELECT
your_data.*,
LEAD(intent_product)
OVER (
PARTITION BY call_id
ORDER BY uuid
)
AS successor_intent_product
FROM
your_data
),
aggregate AS
(
SELECT
intent_product,
COUNT(*) AS turns,
COUNT(CASE WHEN successor_intent_product LIKE 'Request_Agent_%' THEN NULL ELSE 1 END) AS no_request
FROM
successor
WHERE
intent_product NOT LIKE 'Request_Agent_%'
GROUP BY
intent_product
)
SELECT
*,
no_request * 1.0 / turns AS ratio
FROM
aggregate
https://dbfiddle.uk/TY7LhjfF
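Here is a minimal sketch of the same LEAD-based logic, run in SQLite (3.25 or later, for window functions) from Python so it can be checked locally. One assumption is labelled loudly: I added a hypothetical turn_no column holding the order in which the rows are listed in the question and ordered by that, because ordering call B by uuid would put 787 before 900 and the Request_Agent row would then no longer be the successor of Loan_BankAccount.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- turn_no is a hypothetical column: the listing order from the question.
    CREATE TABLE your_data
        (turn_no INTEGER, call_id TEXT, uuid INTEGER, intent_product TEXT);
    INSERT INTO your_data VALUES
        (1, 'A', 123, 'Loan_BankAccount'),
        (2, 'A', 234, 'StopCheque'),
        (3, 'A', 789, 'Request_Agent_phone_number'),
        (4, 'B', 900, 'Loan_BankAccount'),
        (5, 'B', 787, 'Request_Agent_BankAcc');
""")

rows = conn.execute("""
    WITH successor AS (
        -- Attach to each turn the intent of the next turn in the same call.
        SELECT intent_product,
               LEAD(intent_product) OVER (
                   PARTITION BY call_id ORDER BY turn_no
               ) AS successor_intent_product
        FROM your_data
    ),
    aggregate AS (
        SELECT intent_product,
               COUNT(*) AS turns,
               -- COUNT skips NULLs, so turns followed by Request_Agent drop out.
               COUNT(CASE WHEN successor_intent_product LIKE 'Request_Agent_%'
                          THEN NULL ELSE 1 END) AS no_request
        FROM successor
        WHERE intent_product NOT LIKE 'Request_Agent_%'
        GROUP BY intent_product
    )
    SELECT intent_product, turns, no_request, no_request * 1.0 / turns AS ratio
    FROM aggregate
    ORDER BY intent_product
""").fetchall()

for row in rows:
    print(row)
```

With that ordering the result lines up with the expected output in the question; the last turn of a call has a NULL successor and is counted as contained.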

How can I find the variation in strings in a single column using Snowflake SQL?

Say I have a table like this:
Person1  Person2
Dave     Fred
Dave     Dave
Dave     Mike
Fred     Dave
Dave     Mike
Dave     Jeff
In column 'Person1' clearly Dave is the most popular input, so I'd like to produce a 'similarity score' or 'variation within column' score that would reflect that in SQL (Snowflake).
In contrast, for the column 'Person2' there is more variation between the strings and so the similarity score would be lower, or variation within column higher. So you might end up with a similarity score output as something like: 'Person1': 0.9, 'Person2': 0.4.
If this is just row-wise Levenshtein Distance (LD), how can I push EDITDISTANCE across these to get a score for each column please? At the moment I can only see how to get the LD between 'Person1' and 'Person2', rather than within 'Person1' and 'Person2'.
Many thanks
Your proposed values of 0.9 and 0.4 look like ratios of sameness, so they can be calculated with a count and ratio_to_report like so:
with data(person1, person2) as (
select * from values
('Dave','Fred'),
('Dave','Dave'),
('Dave','Mike'),
('Fred','Dave'),
('Dave','Mike'),
('Dave','Jeff')
), p1 as (
select
person1
,count(*) as c_p1
,ratio_to_report(c_p1) over () as q
from data
group by 1
qualify row_number() over(order by c_p1 desc) = 1
), p2 as (
select
person2
,count(*) as c_p2
,ratio_to_report(c_p2) over () as q
from data
group by 1
qualify row_number() over(order by c_p2 desc) = 1
)
select
p1.q as p1_same,
p2.q as p2_same
from p1
cross join p2
;
giving:
P1_SAME   P2_SAME
0.833333  0.333333
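To make it clear what that ratio measures, here is the same calculation as a small Python sketch outside Snowflake: the share of rows taken by the single most frequent value in each column. The helper name top_value_share is my own, not a Snowflake function.

```python
from collections import Counter

rows = [
    ("Dave", "Fred"),
    ("Dave", "Dave"),
    ("Dave", "Mike"),
    ("Fred", "Dave"),
    ("Dave", "Mike"),
    ("Dave", "Jeff"),
]


def top_value_share(values):
    """Share of rows taken by the single most frequent value."""
    counts = Counter(values)
    top_count = counts.most_common(1)[0][1]
    return top_count / len(values)


p1_same = top_value_share([person1 for person1, _ in rows])
p2_same = top_value_share([person2 for _, person2 in rows])
print(round(p1_same, 6), round(p2_same, 6))
```

Note that this only looks at exact matches, so "Dave" and "Dav" would count as completely different values.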
Editdistance:
So using a full cross join, we can calculate the editdistance of all values, and find the ratio of this to the total count:
with data(person1, person2) as (
select * from values
('Dave','Fred'),
('Dave','Dave'),
('Dave','Mike'),
('Fred','Dave'),
('Dave','Mike'),
('Dave','Jeff')
), combo as (
select
editdistance(da.person1, db.person1) as p1_dist
,editdistance(da.person2, db.person2) as p2_dist
from data as da
cross join data as db
)
select count(*) as c
,sum(p1_dist) as s_p1_dist
,sum(p2_dist) as s_p2_dist
,c / s_p1_dist as p1_same
,c / s_p2_dist as p2_same
from combo
;
But given editdistance gives a result of zero for same and positive value for difference, the scaling of these does not align with the desired result...
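One possible way to rescale this, sketched below in Python, is to turn each pairwise distance into a similarity with 1 - distance / max_length and average that over the full cross join. To be clear, this normalized Levenshtein similarity is a technique I am swapping in for illustration; it is not what the query above computes, and the function names are my own.

```python
from itertools import product


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


def mean_pairwise_similarity(values):
    """Average of 1 - distance / max_length over every ordered pair."""
    scores = [
        1.0 - levenshtein(a, b) / max(len(a), len(b), 1)
        for a, b in product(values, repeat=2)
    ]
    return sum(scores) / len(scores)


person1 = ["Dave", "Dave", "Dave", "Fred", "Dave", "Dave"]
person2 = ["Fred", "Dave", "Mike", "Dave", "Mike", "Jeff"]

p1_sim = mean_pairwise_similarity(person1)
p2_sim = mean_pairwise_similarity(person2)
print(round(p1_sim, 4), round(p2_sim, 4))
```

Identical strings score 1 and completely different strings of equal length score 0, so a column with less variation ends up with the higher average, which is the direction the question asks for.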
JAROWINKLER_SIMILARITY:
Given that the Jaro-Winkler similarity result is already scaled between 0 and 100, it makes more sense to average it:
select
avg(JAROWINKLER_SIMILARITY(da.person1, db.person1)/100) as p1_dist
,avg(JAROWINKLER_SIMILARITY(da.person2, db.person2)/100) as p2_dist
from data as da
cross join data as db;
P1_DIST         P2_DIST
0.861111111111  0.527777777778

SQL Server : percentage calculation with new records as output

I have the below table as an output of a SQL query
ID Car Type Units Sold
---------------------------
1 Sedan 250
2 SUV 125
3 Total 375
I want a SQL query / procedure to produce below output
ID Car Type Units Sold
--------------------------
1 Sedan 250
2 SUV 125
3 Total 375
4 Sedan_Pct 66.67 (250/375)
5 SUV_Pct 33.33 (125/375)
Please note that more car types will be added in the future, and I want the percentage of each car type appended to the current table with the suffix '_Pct'.
Typically we might expect to see the percentages as a separate column, not as separate rows. That being said, we can generate the output you want using grouping sets in SQL Server:
WITH cte AS (
SELECT ID, CarType, SUM (UnitsSold) AS UnitsSold
FROM yourTable
GROUP BY
GROUPING SETS((ID, CarType), (CarType), ())
)
SELECT
ID,
COALESCE(CarType, 'Total') AS CarType,
CASE WHEN ID IS NOT NULL OR CarType IS NULL
THEN UnitsSold
ELSE 100.0 * UnitsSold /
SUM(CASE WHEN ID IS NOT NULL THEN UnitsSold END) OVER () END AS PctUnitsSold
FROM cte
ORDER BY
ID DESC,
CASE WHEN CarType IS NULL THEN 0 ELSE 1 END,
CarType;
Demo
A simpler solution will be using Union
SELECT CarType, UnitSold FROM Car
UNION
SELECT 'Total' CarType, SUM(UnitSold) UnitSold FROM Car
UNION
SELECT CarType + '_Pct' AS CarType, UnitSold * 100.0 / (SELECT SUM(UnitSold) FROM Car) AS UnitSold FROM Car
Might not be ideal in the long run
query for mysql server
use union all in view or stored procedure -
declare @totalunitsold numeric(15,0);
set @totalunitsold = (select unitsold from car where cartype='total')
select cartype,unitsold from car
union all
select cartype + '_pct', (unitsold/@totalunitsold) as pct from car
this may help you
SELECT CarType+'_Pct', UnitSold/TotalSale*100
FROM Car cross join (select sum(UnitSold) TotalSale from Car) X
you can union that with your Table
Don't do it! Just add an additional column, not new rows:
select t.*,
t.units_sold * 100.0 / sum(case when t.car_type = 'Total' then units_sold end) over () as ratio
from (<your query here>) t;
One fundamental reason why you want a different column is because ratio has a different type from units_sold. Everything in a column (even in a result set) should be a similar attribute.
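Here is a minimal sketch of the extra-column approach, run in SQLite (3.25 or later) from Python rather than SQL Server, so take it as a check of the logic only. The table name car_sales and the rounding to two decimals are my assumptions; the window expression is the same one as in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE car_sales (id INTEGER, car_type TEXT, units_sold INTEGER);
    INSERT INTO car_sales VALUES
        (1, 'Sedan', 250),
        (2, 'SUV', 125),
        (3, 'Total', 375);
""")

rows = conn.execute("""
    SELECT t.id,
           t.car_type,
           t.units_sold,
           -- The windowed SUM picks out the 'Total' row and repeats it on every
           -- row; multiplying by 100.0 first avoids integer division.
           ROUND(
               t.units_sold * 100.0
               / SUM(CASE WHEN t.car_type = 'Total' THEN t.units_sold END) OVER (),
               2
           ) AS ratio
    FROM car_sales AS t
    ORDER BY t.id
""").fetchall()

for row in rows:
    print(row)
```

New car types need no change to the query, and units_sold stays an integer column while ratio is a separate decimal column.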

How do I aggregate numbers from a string column in SQL

I am dealing with a poorly designed database column which has values like this
ID cid Score
1 1 3 out of 3
2 1 1 out of 5
3 2 3 out of 6
4 3 7 out of 10
I want the aggregate sum and percentage of Score column grouped on cid like this
cid sum percentage
1 4 out of 8 50
2 3 out of 6 50
3 7 out of 10 70
How do I do this?
You can try this way :
select
t.cid
, cast(sum(s.a) as varchar(5)) +
' out of ' +
cast(sum(s.b) as varchar(5)) as sum
, ((cast(sum(s.a) as decimal))/sum(s.b))*100 as percentage
from MyTable t
inner join
(select
id
, cast(left(score, charindex(' ', score) - 1) as int) a
, cast(substring(score,charindex('out of', score)+7,len(score)) as int) b
from MyTable
) s on s.id = t.id
group by t.cid
[SQLFiddle Demo]
Redesign the table, but on-the-fly as a CTE. Here's a solution that's not as short as you could make it, but that takes advantage of the handy SQL Server function PARSENAME. You may need to tweak the percentage calculation if you want to truncate rather than round, or if you want it to be a decimal value, not an int.
In this or most any solution, you have to count on the column values for Score to be in the very specific format you show. If you have the slightest doubt, you should run some other checks so you don't miss or misinterpret anything.
with
P(ID, cid, Score2Parse) as (
select
ID,
cid,
replace(Score,space(1),'.')
from scores
),
S(ID,cid,pts,tot) as (
select
ID,
cid,
cast(parsename(Score2Parse,4) as int),
cast(parsename(Score2Parse,1) as int)
from P
)
select
cid, cast(round(100e0*sum(pts)/sum(tot),0) as int) as percentage
from S
group by cid;
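If you want to try the parsing logic outside SQL Server, the sketch below does the same split in SQLite from Python. SQLite has no PARSENAME, so instr and substr are used instead to cut the string around ' out of '; like the solutions above, it assumes every Score value has exactly the format shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scores (id INTEGER, cid INTEGER, score TEXT);
    INSERT INTO scores VALUES
        (1, 1, '3 out of 3'),
        (2, 1, '1 out of 5'),
        (3, 2, '3 out of 6'),
        (4, 3, '7 out of 10');
""")

rows = conn.execute("""
    WITH parsed AS (
        SELECT cid,
               -- Everything before ' out of ' is the points scored ...
               CAST(substr(score, 1, instr(score, ' out of ') - 1) AS INTEGER) AS pts,
               -- ... and everything after its 8 characters is the maximum.
               CAST(substr(score, instr(score, ' out of ') + 8) AS INTEGER) AS tot
        FROM scores
    )
    SELECT cid,
           SUM(pts) || ' out of ' || SUM(tot) AS total,
           CAST(ROUND(100.0 * SUM(pts) / SUM(tot)) AS INTEGER) AS percentage
    FROM parsed
    GROUP BY cid
    ORDER BY cid
""").fetchall()

for row in rows:
    print(row)
```

Unlike a fixed-width substring, this handles multi-digit numbers on both sides of ' out of '.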

SQL Percentage of True columns

I have a table where each row has a description field as well as a boolean value. I'm trying to write a query where I can group by each respective description, and see the percentage of times that the boolean was true.
Example table:
PID Gender SeniorCitizen
1 M 1
2 M 1
3 F 0
4 F 1
5 M 0
And I want a query that will return this:
Gender SeniorPct
M .66
F .50
I've got to the point where I have a query that will calculate the individual percentages for a male or female - but I want to see both results at once
SELECT Gender, COUNT(*) * 1.0 /
(SELECT COUNT(*) FROM MyTable WHERE Gender='M')
FROM MyTable WHERE Gender='M' and SeniorCitizen=1;
I've been trying to include a "GROUP BY Gender" statement in my outer SELECT above, but I can't seem to figure out how to tweak the inner SELECT to get the correct results after tweaking the outer SELECT as such.
(I tested this under MySQL, please check if the same idea can be applied to the SQLite.)
To find the number of seniors (per gender), we can treat the bits as numbers and simply sum them up:
SELECT
Gender,
SUM(SeniorCitizen) Seniors
FROM MyTable
GROUP BY Gender
GENDER SENIORS
M 2
F 1
Based on that, we can easily calculate percentages:
SELECT
Gender,
SUM(SeniorCitizen) / COUNT(*) * 100 SeniorsPct
FROM MyTable
GROUP BY Gender
GENDER SENIORSPCT
M 66.6667
F 50
You can play with it in this SQL Fiddle.
UPDATE: Very similar idea works under SQLite as well. Please take a look at another SQL Fiddle.
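One thing to watch when carrying this over from MySQL: in SQLite (and in SQL Server with integer columns) dividing an integer SUM by COUNT(*) is integer division, so the expression has to be nudged into floating point. The sketch below, run in SQLite from Python, shows the truncated result next to two variants that should give the expected ratio; the column aliases are my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (PID INTEGER, Gender TEXT, SeniorCitizen INTEGER);
    INSERT INTO MyTable VALUES
        (1, 'M', 1), (2, 'M', 1), (3, 'F', 0), (4, 'F', 1), (5, 'M', 0);
""")

rows = conn.execute("""
    SELECT Gender,
           SUM(SeniorCitizen) / COUNT(*)       AS int_ratio,  -- integer division
           SUM(SeniorCitizen) * 1.0 / COUNT(*) AS ratio,      -- forced to float
           AVG(SeniorCitizen)                  AS avg_ratio   -- mean of the 0/1 flag
    FROM MyTable
    GROUP BY Gender
    ORDER BY Gender
""").fetchall()

for row in rows:
    print(row)
```

Taking the average of a 0/1 flag is the same thing as the share of rows where it is 1, which makes AVG a compact alternative where the dialect returns a non-integer average.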
Try the following:
CREATE TABLE #MyTable
(
PID INT,
Gender VARCHAR(1),
SeniorCitizen BIT
)
INSERT INTO #MyTable
(
PID,
Gender,
SeniorCitizen
)
SELECT 1, 'M', 1 UNION
SELECT 2, 'M', 1 UNION
SELECT 3, 'F', 0 UNION
SELECT 4, 'F', 1 UNION
SELECT 5, 'M', 0
SELECT
Gender,
COUNT(CASE WHEN SeniorCitizen = 1 THEN 1 END), -- Count of SeniorCitizens grouped by Gender
COUNT(1), -- Count of all users grouped by Gender
CONVERT(DECIMAL(3, 2), -- Optional rounding; DECIMAL(3, 2) leaves room for a ratio of 1.00
COUNT(CASE WHEN SeniorCitizen = 1 THEN 1 END) * 1.0 / COUNT(1) -- Your ratio
)
FROM
#MyTable
GROUP BY
Gender