Calculate Count as Percentage - sql

I have looked around but I just can't seem to understand the logic. I think a good response is here, but like I said, it doesn't make sense, so a more specific explanation would be greatly appreciated.
So I want to show how often customers of each ethnicity are using an credit card. There are different types of credit cards, but if the CardID = 1, they used cash (hence the not equal to 1 statement).
I want to Group By ethnicity and show the count of transactions, but as a percentage.
SELECT Ethnicity, COUNT(distinctCard.TransactionID) AS CardUseCount
FROM (SELECT DISTINCT TransactionID, CustomerID FROM TransactionT WHERE CardID <> 1)
AS distinctCard INNER JOIN CustomerT ON distinctCard.CustomerID = CustomerT.CustomerID
GROUP BY Ethnicity
ORDER BY COUNT(distinctCard.TransactionID) ASC
So for example, this is what it comes up with:
Ethnicity | CardUseCount
0 | 100
1 | 200
2 | 300
3 | 400
But I would like this:
Ethnicity | CardUsePer
0 | 0.1
1 | 0.2
2 | 0.3
3 | 0.4

If you need the percentage of card-transaction per ethnicity, you have to divide the cardtransactions per ethnicity by the total transactions of the same ethnicity. You don't need a sub query for that:
SELECT Ethnicity, sum(IIF(CardID=1,0,1))/count(1) AS CardUsePercentage
FROM TransactionT
INNER JOIN CustomerT
ON TransactionT.CustomerID = CustomerT.CustomerID
GROUP BY Ethnicity

From your posted sample result to me it looks like you just wanted to divide the count by 1000 like
SELECT Ethnicity,
COUNT(distinctCard.TransactionID) / 1000 AS CardUseCount
FROM <rest part of query>

SELECT Ethnicity, COUNT(distinctCard.TransactionID) / (SELECT COUNT(1) FROM TransactionT WHERE CardID <> 1) AS CardUsePer
FROM (SELECT DISTINCT TransactionID, CustomerID FROM TransactionT WHERE CardID <> 1)
AS distinctCard INNER JOIN CustomerT ON distinctCard.CustomerID = CustomerT.CustomerID
GROUP BY Ethnicity
ORDER BY COUNT(distinctCard.TransactionID) ASC

I think the answer you posted is your answer. As they said in your comments , you just count the transactions, you need to divide it by the number of total transactions. As stated in the answer, you need to divide the count(...) by the total number. This would be done as follows:
SELECT Ethnicity, COUNT(distinctCard.TransactionID)/(SELECT COUNT(TransactionT.TransactionID)
FROM TransactionT WHERE CardID <> 1)
AS CardUsePercent
FROM (SELECT DISTINCT TransactionID, CustomerID FROM TransactionT WHERE CardID <> 1)
AS distinctCard INNER JOIN CustomerT ON distinctCard.CustomerID = CustomerT.CustomerID
GROUP BY Ethnicity
ORDER BY COUNT(distinctCard.TransactionID) ASC
This will give the result you want.
EDIT: This may be wrong, as i dont know the exact format of your tables, but i was assuming that the TransactionID field is Unique in the table. Else use the DISTINCT keyword, or the PK of your table , depending on your actual implemetation

Related

How to query data from two tables and group by two columns

I am using SQL Server and I have two huge tables containing all sorts of data. I would like to retrieve data on how many Latvian or Russian people live in each city.
The language column contains more than two languages but I would like to query only "Latvian" and "Russian"
Table1 (columns worth mentioning):
ID
ProjectID
Phone_nr
City
Table2 (Columns worth mentioning):
ID
ProjectID
Phone_nr
Language
I want the query to retrieve information something like this:
City1(RU) | Amount of Russians
City1(LT) | Amount of Latvians
City2(RU) | Amount of Russians
City2(LT) | Amount of Latvians
.. etc
Or something like this:
City1 | Amount of Russians | Amount of Latvians | Total amount of people
City2 | Amount of Russians | Amount of Latvians | Total amount of people
City3 | Amount of Russians | Amount of Latvians | Total amount of people
.. etc
I am wondering what would be the best solution to this? Should I use join or union or a simple select?
I came up with a query like this:
SELECT DISTINCT top 100 t.city, count(t.city) as 'Total amount of nr in city', count(*), l.language
FROM table1 l, table2 t
WHERE l.phone = t.phone and l.projectID = t.projektID
group by t.city, l.language
I believe the where clause is correct because both tables have phone numbers and Project IDs, it is important that the query selects with this where clause.
Unfortunately this doesn't quite work. It returns rows in this format:
City1 | Amount of y | total amount of numbers in this language
City1 | Amount of x | total amount of numbers in that language
It's close but it's not good enough. Note: I am using select top 100 just for testing, I will select everything once I have the query done right.
Can anyone help or point me in the right direction? Thank you
You can try using conditional aggregation -
SELECT t.city,
count(case when l.language='Russians' then 1 end) as 'Amount of Russians',
count(case when l.language='Latvians' then 1 end) as 'Amount of Latvians',
count(*) as 'Total amount of nr in city'
FROM table1 l inner join table2 t
on l.phone = t.phone and l.projectID = t.projektID
group by t.city
Note: It's always best to use join explicitly.
The logic of #Fahmi is correct. There is one more approach of using SUM instead of COUNT. I am adding the same for additional option to consider.
SELECT t.city,
SUM(case when l.language='Russians' then 1 else 0 end) as 'Amount of Russians',
SUM(case when l.language='Latvians' then 1 else 0 end) as 'Amount of Latvians',
count(*) as 'Total amount of nr in city'
FROM table1 l inner join table2 t
on l.phone = t.phone and l.projectID = t.projektID
group by t.city

Case Statement for multiple criteria

I would like to ignore some of the results of my query as for all intents and purposes, some of the results are a duplicate, but based on the way the request was made, we need to use this hierarchy and although we are seeing different 'Company_Name' 's, we need to ignore one of the results.
Query:
SELECT
COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM
some_table AS A12
GROUP BY
2
ORDER BY
3 ASC, 2 ASC
This code omits half a doze joins and where statements that are not germane to this question.
Results:
Customer_Name_Count Company_Name Total_Sales
-------------------------------------------------------------
1 3 Blockbuster 1,000
2 6 Jimmy's Bar 1,500
3 6 Jimmy's Restaurant 1,500
4 9 Impala Hotel 2,000
5 12 Sports Drink 2,500
In the above set, we can see that numbers 2 & 3 have the same count and the same total_sales number and similar company names. Is there a way to create a case statement that takes these 3 factors into consideration and then drops one or the other for Jimmy's enterprises? The other issue is that this has to be variable as there are other instances where this happens. And I would only want this to happen if the count and sales number match each other with a similar name in the company name.
Desired result:
Customer_Name_Count Company_Name Total_Sales
--------------------------------------------------------------
1 3 Blockbuster 1,000
2 6 Jimmy's Bar 1,500
3 9 Impala Hotel 2,000
4 12 Sports Drink 2,500
Looks like other answers are accurate based on assumption that Company_IDs are the same for both.
If Company_IDs are different for both Jimmy's Bar and Jimmy's Restaurant then you can use something like this. I suggest you get functional users involved and do some data clean-up else you'll be maintaining this every time this issue arise:
SELECT
COUNT(DISTINCT CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END) AS Customer_Name_Count
,CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END AS Company_Name
,SUM(A12.Total_Sales) AS Total_Sales
FROM some_table er
GROUP BY CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END
Your problem is that the joins you are using are multiplying the number of rows. Somewhere along the way, multiple names are associated with exactly the same entity (which is why the numbers are the same). You can fix this by aggregating by the right id:
SELECT COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
MAX(Company_Name) as Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM some_table AS A12
GROUP BY Company_id -- I'm guessing the column is something like this
ORDER BY 3 ASC, 2 ASC;
This might actually overstate the sales (I don't know). Better would be fixing the join so it only returned one name. One possibility is that it is a type-2 dimension, meaning that there is a time component for values that change over time. You may need to restrict the join to a single time period.
You need to have function to return a common name for the companies and then use DISTINCT:
SELECT DISTINCT
Customer_Name_Count,
dbo.GetCommonName(Company_Name) as Company_Name,
Total_Sales
FROM dbo.theTable
You can try to use ROW_NUMBER with window function to make row number by Customer_Name_Count and Total_Sales then get rn = 1
SELECT * FROM (
SELECT *,ROW_NUMBER() OVER(PARTITION BY Customer_Name_Count,Total_Sales ORDER BY Company_Name) rn
FROM (
SELECT
COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM
some_table AS A12
GROUP BY
Company_Name
)t1
)t1
WHERE rn = 1

take Duplicated ID's out and Identify a new columns

I Joined 6 table together to gather all information that I need.
I want all Id's, Names, Birthdays, and Ethnicity.
Some Ids have 2 or more Ethnicity and that will cause a id be duplicated.
I am thinking of writing a sub query or can I just use a case statement since I have tried case statement before and works for another case but I can not apply it in this case.
what I have is:
ID NAME Birthdays Ethnicity
4000 Pedram 11/11/1999 Middle East
4001 Carlos 11/11/1920 Spanish
4001 Carlos 11/11/1920 Native American
4002 Asia 11/22/1986 Polish
4002 Asia 11/22/1986 Native American
4002 Asia 11/22/1986 White/caucassian
I want to say if any Id duplicated and ethnicity is different <> just give me this:
ID NAME Birthdays Ethnicity
4000 Pedram 11/11/1999 Middle East
4001 Carlos 11/11/1920 Multiracial
4002 Asia 11/22/1986 multiracial
PS : ethnicity is in a different table and I joined it to Person_table
PS : to be able to join ethnicity table to Person_table I needed to join 3 more tables that have pr keys that can related to each other.
PS : I tried CASE WHEN Count (Id) > 1 THEN 'Multiracial' ELSE Ethnicity END AS Ethnicity_2
and it Identify all ethnicity as Multiracial.
Any help Or thought will be appreciate.
You can use this:
WITH CTE AS
(
SELECT *,
N = COUNT(*) OVER(PARTITION BY ID),
RN = ROW_NUMBER() OVER(PARTITION BY ID ORDER BY Ethnicity)
FROM dbo.YourTable
)
SELECT ID,
NAME,
Birthdays,
CASE WHEN N > 1 THEN 'Multiracial' ELSE Ethnicity END Ethnicity
FROM CTE
WHERE RN = 1;
This one might not be the most efficient but it works. Just substitute your derived table for t below:
SELECT DISTINCT t.id, t.name,
CASE WHEN cnt = 1 THEN ethnicity
ELSE 'Multiracial' END AS ethnicity
FROM t
INNER JOIN
(SELECT id, COUNT(DISTINCT ethnicity) AS cnt
FROM t
GROUP BY id) sub
ON t.id = sub.id
Tested here: http://sqlfiddle.com/#!9/7473f/6
SELECT
id, name, Birthdays,
IIF(COUNT(DISTINCT Ethnicity) > 1, 'Multiracial', MIN(Ethnicity)) as Ethnicity
FROM
Table
GROUP BY
id, name, Birthdays
SELECT
id, name, Birthdays,
CASE WHEN COUNT(DISTINCT Ethnicity) > 1 THEN 'Multiracial' ELSE MIN(Ethnicity) END as Ethnicity
FROM
Table
GROUP BY
id, name, Birthdays

SQL Server Amount Split

I have below 2 tables in SQL Server database.
Customer Main Expense Table
ReportID CustomerID TotalExpenseAmount
1000 1 200
1001 2 600
Attendee Table
ReportID AttendeeName
1000 Mark
1000 Sam
1000 Joe
There is no amount at attendee level. I have need to manually calculate individual attendee amount as mentioned below. (i.e split TotalExpenseAmount based on number of attendees and ensure individual split figures round to 2 decimals and sums up to the TotalExpenseAmount exactly)
The final report should look like:
ReportID CustID AttendeeName TotalAmount AttendeeAmount
1000 1 Mark 200 66.66
1000 1 Sam 200 66.66
1000 1 Joe 200 66.68
The final report will have about 1,50,000 records. If you notice the attendee amount I have rounded the last one in such a way that the totals match to 200. What is the best way to write an efficient SQL query in this scenario?
You can do this using window functions:
select ReportID, CustID, AttendeeName, TotalAmount,
(case when seqnum = 1
then TotalAmount - perAttendee * (cnt - 1)
else perAttendee
end) as AttendeeAmount
from (select a.ReportID, a.CustID, a.AttendeeName, e.TotalAmount,
row_number() over (partition by reportId order by AttendeeName) as seqnum,
count(*) over (partition by reportId) as cnt,
cast(TotalAmount * 1.0 / count(*) over (partition by reportId) as decimal(10, 2)) as perAttendee
from attendee a join
expense e
on a.ReportID = e.ReportID
) ae;
The perAttendee amount is calculated in the subquery. This is rounded down by using cast() (only because floor() doesn't accept a decimal places argument). For one of the rows, the amount is the total minus the sum of all the other attendees.
Doing something similar to #Gordon's answer but using a CTE instead.
with CTECount AS (
select a.ReportId, a.AttendeeName,
ROW_NUMBER() OVER (PARTITION BY A.ReportId ORDER BY A.AttendeeName) [RowNum],
COUNT(A.AttendeeName) OVER (PARTITION BY A.ReportId) [AttendeeCount],
CAST(c.TotalExpenseAmount / (COUNT(A.AttendeeName) OVER (PARTITION BY A.ReportId)) AS DECIMAL(10,2)) [PerAmount]
FROM #Customer C INNER JOIN #Attendee A ON A.ReportId = C.ReportID
)
SELECT CT.ReportID, CT.CustomerId, AT.AttendeeName,
CASE WHEN CC.RowNum = 1 THEN CT.TotalExpenseAmount - CC.PerAmount * (CC.AttendeeCount - 1)
ELSE CC.PerAmount END [AttendeeAmount]
FROM #Customer CT INNER JOIN #Attendee AT
ON CT.ReportID = AT.ReportId
INNER JOIN CTECount CC
ON CC.ReportId = CT.ReportID AND CC.AttendeeName = AT.AttendeeName
I like the CTE because it allows me to separate the different aspects of the query. The cool thing that #Gordon used was the Case statement and the inner calculation to have the lines total correctly.

return count 0 with mysql group by

database table like this
============================
= suburb_id | value
= 1 | 2
= 1 | 3
= 2 | 4
= 3 | 5
query is
SELECT COUNT(suburb_id) AS total, suburb_id
FROM suburbs
where suburb_id IN (1,2,3,4)
GROUP BY suburb_id
however, while I run this query, it doesn't give COUNT(suburb_id) = 0 when suburb_id = 0
because in suburbs table, there is no suburb_id 4, I want this query to return 0 for suburb_id = 4, like
============================
= total | suburb_id
= 2 | 1
= 1 | 2
= 1 | 3
= 0 | 4
A GROUP BY needs rows to work with, so if you have no rows for a certain category, you are not going to get the count. Think of the where clause as limiting down the source rows before they are grouped together. The where clause is not providing a list of categories to group by.
What you could do is write a query to select the categories (suburbs) then do the count in a subquery. (I'm not sure what MySQL's support for this is like)
Something like:
SELECT
s.suburb_id,
(select count(*) from suburb_data d where d.suburb_id = s.suburb_id) as total
FROM
suburb_table s
WHERE
s.suburb_id in (1,2,3,4)
(MSSQL, apologies)
This:
SELECT id, COUNT(suburb_id)
FROM (
SELECT 1 AS id
UNION ALL
SELECT 2 AS id
UNION ALL
SELECT 3 AS id
UNION ALL
SELECT 4 AS id
) ids
LEFT JOIN
suburbs s
ON s.suburb_id = ids.id
GROUP BY
id
or this:
SELECT id,
(
SELECT COUNT(*)
FROM suburb
WHERE suburb_id = id
)
FROM (
SELECT 1 AS id
UNION ALL
SELECT 2 AS id
UNION ALL
SELECT 3 AS id
UNION ALL
SELECT 4 AS id
) ids
This article compares performance of the two approaches:
Aggregates: subqueries vs. GROUP BY
, though it does not matter much in your case, as you are querying only 4 records.
Query:
select case
when total is null then 0
else total
end as total_with_zeroes,
suburb_id
from (SELECT COUNT(suburb_id) AS total, suburb_id
FROM suburbs
where suburb_id IN (1,2,3,4)
GROUP BY suburb_id) as dt
#geofftnz's solution works great if all conditions are simple like in this case. But I just had to solve a similar problem to generate a report where each column in the report is a different query. When you need to combine results from several select statements, then something like this might work.
You may have to programmatically create this query. Using left joins allows the query to return rows even if there are no matches to suburb_id with a given id. If your db supports it (which most do), you can use IFNULL to replace null with 0:
select IFNULL(a.count,0), IFNULL(b.count,0), IFNULL(c.count,0), IFNULL(d.count,0)
from (select count(suburb_id) as count from suburbs where id=1 group by suburb_id) a,
left join (select count(suburb_id) as count from suburbs where id=2 group by suburb_id) b on a.suburb_id=b.suburb_id
left join (select count(suburb_id) as count from suburbs where id=3 group by suburb_id) c on a.suburb_id=c.suburb_id
left join (select count(suburb_id) as count from suburbs where id=4 group by suburb_id) d on a.suburb_id=d.suburb_id;
The nice thing about this is that (if needed) each "left join" can use slightly different (possibly fairly complex) query.
Disclaimer: for large data sets, this type of query might have not perform very well (I don't write enough sql to know without investigating further), but at least it should give useful results ;-)