How to query data from two tables and group by two columns - sql

I am using SQL Server and I have two huge tables containing all sorts of data. I would like to retrieve data on how many Latvian or Russian people live in each city.
The language column contains more than two languages but I would like to query only "Latvian" and "Russian"
Table1 (columns worth mentioning):
ID
ProjectID
Phone_nr
City
Table2 (Columns worth mentioning):
ID
ProjectID
Phone_nr
Language
I want the query to retrieve information something like this:
City1(RU) | Amount of Russians
City1(LT) | Amount of Latvians
City2(RU) | Amount of Russians
City2(LT) | Amount of Latvians
.. etc
Or something like this:
City1 | Amount of Russians | Amount of Latvians | Total amount of people
City2 | Amount of Russians | Amount of Latvians | Total amount of people
City3 | Amount of Russians | Amount of Latvians | Total amount of people
.. etc
I am wondering what would be the best solution to this? Should I use join or union or a simple select?
I came up with a query like this:
SELECT DISTINCT top 100 t.city, count(t.city) as 'Total amount of nr in city', count(*), l.language
FROM table1 l, table2 t
WHERE l.phone = t.phone and l.projectID = t.projektID
group by t.city, l.language
I believe the where clause is correct because both tables have phone numbers and Project IDs, it is important that the query selects with this where clause.
Unfortunately this doesn't quite work. It returns rows in this format:
City1 | Amount of y | total amount of numbers in this language
City1 | Amount of x | total amount of numbers in that language
It's close but it's not good enough. Note: I am using select top 100 just for testing, I will select everything once I have the query done right.
Can anyone help or point me in the right direction? Thank you

You can use conditional aggregation:
SELECT t.City,
count(case when l.Language = 'Russian' then 1 end) as 'Amount of Russians',
count(case when l.Language = 'Latvian' then 1 end) as 'Amount of Latvians',
count(*) as 'Total amount of nr in city'
FROM table1 t inner join table2 l
on l.Phone_nr = t.Phone_nr and l.ProjectID = t.ProjectID
group by t.City
Note: Prefer explicit JOIN syntax over comma-separated tables in the FROM clause.
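A minimal sketch of the conditional aggregation above, run against an in-memory SQLite database from Python rather than SQL Server. The column names follow the table listing in the question; the sample rows and city names are made up for illustration, and the output aliases are simplified to plain identifiers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; only the column names come from the question.
conn.executescript("""
CREATE TABLE table1 (ID INTEGER, ProjectID INTEGER, Phone_nr TEXT, City TEXT);
CREATE TABLE table2 (ID INTEGER, ProjectID INTEGER, Phone_nr TEXT, Language TEXT);
INSERT INTO table1 VALUES
  (1, 10, '111', 'Riga'), (2, 10, '222', 'Riga'),
  (3, 10, '333', 'Riga'), (4, 20, '444', 'Liepaja');
INSERT INTO table2 VALUES
  (1, 10, '111', 'Russian'), (2, 10, '222', 'Latvian'),
  (3, 10, '333', 'English'), (4, 20, '444', 'Latvian');
""")

# One row per city: CASE yields NULL for other languages and COUNT skips NULLs.
wide_rows = conn.execute("""
SELECT t.City,
       COUNT(CASE WHEN l.Language = 'Russian' THEN 1 END) AS russians,
       COUNT(CASE WHEN l.Language = 'Latvian' THEN 1 END) AS latvians,
       COUNT(*) AS total
FROM table1 AS t
INNER JOIN table2 AS l
        ON l.Phone_nr = t.Phone_nr AND l.ProjectID = t.ProjectID
GROUP BY t.City
ORDER BY t.City
""").fetchall()

for row in wide_rows:
    print(row)
```

With this sample data the total for Riga is 3 because COUNT(*) also counts the English speaker; add a WHERE l.Language IN ('Russian', 'Latvian') filter if the total should cover only those two languages.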

@Fahmi's logic is correct. Another option is to use SUM instead of COUNT; I am adding it here as an alternative to consider.
SELECT t.City,
SUM(case when l.Language = 'Russian' then 1 else 0 end) as 'Amount of Russians',
SUM(case when l.Language = 'Latvian' then 1 else 0 end) as 'Amount of Latvians',
count(*) as 'Total amount of nr in city'
FROM table1 t inner join table2 l
on l.Phone_nr = t.Phone_nr and l.ProjectID = t.ProjectID
group by t.City
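The question also asked for a layout with one row per city and language. A possible sketch of that variant is a plain GROUP BY on both columns with a filter on the two languages of interest. It is shown here with Python's sqlite3 module and invented sample rows; only the column names are taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; only the column names come from the question.
conn.executescript("""
CREATE TABLE table1 (ID INTEGER, ProjectID INTEGER, Phone_nr TEXT, City TEXT);
CREATE TABLE table2 (ID INTEGER, ProjectID INTEGER, Phone_nr TEXT, Language TEXT);
INSERT INTO table1 VALUES
  (1, 10, '111', 'Riga'), (2, 10, '222', 'Riga'),
  (3, 10, '333', 'Riga'), (4, 20, '444', 'Liepaja');
INSERT INTO table2 VALUES
  (1, 10, '111', 'Russian'), (2, 10, '222', 'Latvian'),
  (3, 10, '333', 'English'), (4, 20, '444', 'Latvian');
""")

# One row per (city, language) pair, restricted to the two languages.
long_rows = conn.execute("""
SELECT t.City, l.Language, COUNT(*) AS people
FROM table1 AS t
INNER JOIN table2 AS l
        ON l.Phone_nr = t.Phone_nr AND l.ProjectID = t.ProjectID
WHERE l.Language IN ('Russian', 'Latvian')
GROUP BY t.City, l.Language
ORDER BY t.City, l.Language
""").fetchall()

for row in long_rows:
    print(row)
```

Note that a city with no speakers of a given language simply has no row in this layout, whereas the conditional aggregation form reports a 0 for it.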

Related

postgresql total column sum

SELECT pp.id, TO_CHAR(pp.created_dt::date, 'dd.mm.yyyy') AS "Date", CAST(pp.created_dt AS time(0)) AS "Time",
au.username AS "User", ss.name AS "Service", pp.amount, REPLACE(pp.status, 'SUCCESS', ' ') AS "Status",
pp.account AS "Props", pp.external_id AS "External", COALESCE(pp.external_status, null, 'indefined') AS "External status"
FROM payment AS pp
INNER JOIN auth_user AS au ON au.id = pp.creator_id
INNER JOIN services_service AS ss ON ss.id = pp.service_id
WHERE pp.created_dt::date = (CURRENT_DATE - INTERVAL '1' day)::date
AND ss.name = 'Some Name' AND pp.status = 'SUCCESS'
id | Date | Time | Service |amount | Status |
------+-----------+-----------+------------+-------+--------+---
9 | 2021.11.1 | 12:20:01 | some serv | 100 | stat |
10 | 2021.12.1 | 12:20:01 | some serv | 89 | stat |
------+-----------+-----------+------------+-------+--------+-----
Total | | | | 189 | |
I have a SELECT like this. I need to get something like the one shown above. That is, I need to get the total of one column. I've tried a lot of things already, but nothing works out for me.
If I understand correctly, you want an extra row with the aggregated value appended after the result of the original query. You can achieve this in several ways:
1. (recommended) The simplest way is probably to union your original query with a helper query:
with t(id,other_column1,other_column2,amount) as (values
(9,'some serv','stat',100),
(10,'some serv','stat',89)
)
select t.id::text, t.other_column1, t.other_column2, t.amount from t
union all
select 'Total', null, null, sum(amount) from t
2. You can also use the group by rollup clause, whose purpose is exactly this. Your case makes it harder because your query contains many columns that are not involved in the aggregation. Hence it is better to compute the aggregation separately and join the remaining columns afterwards:
with t(id,other_column1,other_column2,amount) as (values
(9,'some serv','stat',100),
(10,'some serv','stat',89)
)
select case when t.id is null then 'Total' else t.id::text end as id
, t.other_column1
, t.other_column2
, case when t.id is null then ext.sum else t.amount end as amount
from (
select t.id, sum(amount) as sum
from t
group by rollup(t.id)
) ext
left join t on ext.id = t.id
order by ext.id
3. For completeness, here is what it takes to avoid the join. The group by clause has to list all columns except amount (to preserve the original rows) plus the empty grouping (to get the sum row), so the grouping sets clause with 2 sets is handy. (The rollup clause is a special case of grouping sets, after all.) The obvious drawback is repeating the case grouping(...) expression for each column not involved in the aggregation.
with t(id,other_column1,other_column2,amount) as (values
(9,'some serv','stat',100),
(10,'some serv2','stat',89)
)
select case grouping(t.id) when 0 then t.id::text else 'Total' end as id
, case grouping(t.id) when 0 then t.other_column1 end as other_column1
, case grouping(t.id) when 0 then t.other_column2 end as other_column2
, sum(t.amount) as amount
from t
group by grouping sets((t.id, t.other_column1, t.other_column2), ())
order by t.id
(To be frank, I can hardly imagine any purpose other than plain reporting where a column mixes id of number type with label Total of text type.)
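As a rough illustration of the union approach (option 1), the sketch below runs it on SQLite through Python, since SQLite has no rollup or grouping sets. The payment table here is a hypothetical, cut-down stand-in for the real query, and the is_total and sort_id helper columns are my own addition: without an ORDER BY, SQL does not guarantee that the total row comes last.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for the joined payment query.
conn.executescript("""
CREATE TABLE payment (id INTEGER, service TEXT, amount INTEGER);
INSERT INTO payment VALUES (9, 'some serv', 100), (10, 'some serv', 89);
""")

# Detail rows plus one total row; the helper columns pin the total to the
# end and keep ids in numeric (not text) order.
rows = conn.execute("""
SELECT label, service, amount
FROM (
    SELECT CAST(id AS TEXT) AS label, service, amount,
           0 AS is_total, id AS sort_id
    FROM payment
    UNION ALL
    SELECT 'Total', NULL, SUM(amount), 1, NULL
    FROM payment
) AS combined
ORDER BY is_total, sort_id
""").fetchall()

for row in rows:
    print(row)
```

The id is cast to text because the label column has to hold both the numeric ids and the word Total, which is the type mixing the answer remarks on.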

Sum multiple rows in a joined query

I'm not even sure how to ask this. So here goes. I have two tables I am joining, and am needing to sum one column of data, easy enough, but the data that needs to be summed is dependent on a certain character associated with the same job number from a different table.
Table1
JobNumber CostType Amount
1 A 10
1 B 20
1 C 50
1 C 50
3 C 75
Table 2
JobNumber Status Value
1 A 100
2 I 50
3 A 75
Okay, so some of the jobs will have multiple lines for CostType 'C'. I'm trying to display all JobNumbers with the total of any amounts for CostType C, BUT only for jobs that have the Status of 'A'. Here's my query so far:
SELECT Table1.JobNumber
,Table1.Amount
,Table2.Value
FROM DB.Table1, DB.Table2
WHERE Table1.JobNumber = Table2.JobNumber and Table1.CostType = 'C' and Table2.Status = 'A'
GROUP BY Table1.JobNumber, Table1.Amount, Table2.Value
ORDER BY Table1.JobNumber ASC
It's giving me the list of job numbers, their amounts, and the contract value, and only for CostType 'C' and with the Status of 'A'. But each separate CostType 'C' amount has its own row. Is there a way to combine them and display the total Amount along with the Value for each JobNumber, like this?
JobNumber CostTypeCTotal Value
1 100 100
3 75 75
Hmmm . . . Try aggregating table1 before joining the tables:
SELECT t2.JobNumber, t1.c_Amount, t2.Value
FROM DB.Table2 t2 JOIN
(SELECT JobNumber,
SUM(Amount) as c_Amount
FROM DB.Table1
WHERE CostType = 'C'
GROUP BY JobNumber
) t1
ON t1.JobNumber = t2.JobNumber
WHERE t2.Status = 'A';
Note: Learn to use proper, explicit, standard, readable JOIN syntax. Never use commas in the FROM clause. Such use of , is archaic syntax that has been out of date since the 1990s.
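To check the aggregate-then-join idea against the sample data from the question, here is a small sketch using Python's sqlite3 module. The DB. schema prefix is dropped because the in-memory database has no such schema, and the CostTypeCTotal alias is borrowed from the desired output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample rows copied from the question.
conn.executescript("""
CREATE TABLE Table1 (JobNumber INTEGER, CostType TEXT, Amount INTEGER);
CREATE TABLE Table2 (JobNumber INTEGER, Status TEXT, Value INTEGER);
INSERT INTO Table1 VALUES
  (1, 'A', 10), (1, 'B', 20), (1, 'C', 50), (1, 'C', 50), (3, 'C', 75);
INSERT INTO Table2 VALUES (1, 'A', 100), (2, 'I', 50), (3, 'A', 75);
""")

# Sum the CostType 'C' amounts per job first, then join to the jobs
# whose Status is 'A'.
rows = conn.execute("""
SELECT t2.JobNumber, t1.CostTypeCTotal, t2.Value
FROM Table2 AS t2
JOIN (SELECT JobNumber, SUM(Amount) AS CostTypeCTotal
      FROM Table1
      WHERE CostType = 'C'
      GROUP BY JobNumber) AS t1
  ON t1.JobNumber = t2.JobNumber
WHERE t2.Status = 'A'
ORDER BY t2.JobNumber
""").fetchall()

for row in rows:
    print(row)
```

Because Table1 is collapsed to one row per job before the join, the Value column is never duplicated and does not need to appear in a GROUP BY.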

Using Count distinct case in sql and group by multiple columns

I have a query that works great (listed below). The issue is that we have run into a patient who has had events on two different days, and because I am counting distinct PATNUM values, that patient is only counted once.
How can I get it to count 1 for each distinct combination of PATNUM and SCHDT?
Example:
PATNUM SCHDT
12345 30817
12345 30817
54321 30817
54321 30717
PATNUM 12345 should only count once while PATNUM 54321 should count twice.
My count statement is this:
SELECT ph.*, pi.*,
COUNT(DISTINCT CASE WHEN `SERVTYPE` IN ('INPT','INPFOP','INFOBS','IP') AND Complete ='7' THEN pi.PATNUM ELSE NULL END) AS count1,
COUNT(DISTINCT CASE WHEN `SERVTYPE` IN ('INPT','INPFOP','INFOBS','IP') AND Complete ='8' THEN pi.PATNUM ELSE NULL END) AS count2
FROM patientinfo as pi
INNER JOIN physicians as ph ON pi.SURGEON=ph.PName
WHERE PID NOT IN ('1355','988','767','1289','484','2784')
GROUP BY SURGEON
ORDER BY Dept,SURGEON ASC
Which columns do you want to see?
You can adjust your GROUP BY:
SELECT
ph.pname,
ph.specialty,
SUM(CASE WHEN complete = 7 THEN 1 ELSE 0 END) count1,
SUM(CASE WHEN complete = 8 THEN 1 ELSE 0 END) count2
FROM
(
SELECT
DISTINCT
surgeon,
patnum,
schdt,
complete,
servtype
FROM patientinfo
WHERE complete IN (7,8)
AND servtype IN ('INPT','INPFOP','INFOBS','IP')
AND pid NOT IN ('1355','988','767','1289','484','2784')
) pisub
INNER JOIN physicians ph ON pisub.surgeon = ph.pname
GROUP BY ph.pname, ph.specialty
ORDER BY ph.pname, ph.specialty;
Also, I would make a few suggestions:
If you give your tables an alias, use the alias when referring to any column in your query. I've guessed which table some of your columns come from (e.g. dept), so feel free to change it if that is not correct.
You don't need to select all columns from both tables if you don't need them.
Most databases won't run the query if you don't GROUP BY all the non-aggregated columns you're selecting. I've written about this for Oracle and SQL in general. MySQL does run it when the ONLY_FULL_GROUP_BY SQL mode is disabled, but the values returned for the ungrouped columns are then arbitrary.
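The core of the answer is the SELECT DISTINCT subquery: de-duplicate on (surgeon, patnum, schdt) first, then count the remaining rows. A stripped-down sketch with the example rows from the question, run on SQLite via Python, could look like this. The surgeon name is invented, and the servtype, complete and pid filters are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# PATNUM/SCHDT pairs from the question; the surgeon value is hypothetical.
conn.executescript("""
CREATE TABLE patientinfo (PATNUM TEXT, SCHDT TEXT, SURGEON TEXT);
INSERT INTO patientinfo VALUES
  ('12345', '30817', 'Dr A'), ('12345', '30817', 'Dr A'),
  ('54321', '30817', 'Dr A'), ('54321', '30717', 'Dr A');
""")

# Count each distinct (PATNUM, SCHDT) pair once per patient.
per_patient = conn.execute("""
SELECT PATNUM, COUNT(*) AS visits
FROM (SELECT DISTINCT PATNUM, SCHDT FROM patientinfo) AS d
GROUP BY PATNUM
ORDER BY PATNUM
""").fetchall()

# Same idea rolled up per surgeon, as in the answer's query.
per_surgeon = conn.execute("""
SELECT SURGEON, COUNT(*) AS visits
FROM (SELECT DISTINCT SURGEON, PATNUM, SCHDT FROM patientinfo) AS d
GROUP BY SURGEON
""").fetchall()

print(per_patient)
print(per_surgeon)
```

Patient 12345 counts once and patient 54321 counts twice, so the surgeon total is 3 rather than the 2 that COUNT(DISTINCT PATNUM) would give.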

Calculate Count as Percentage

I have looked around but I just can't seem to understand the logic. I think a good response is here, but like I said, it doesn't make sense, so a more specific explanation would be greatly appreciated.
So I want to show how often customers of each ethnicity are using a credit card. There are different types of credit cards, but if the CardID = 1, they used cash (hence the not-equal-to-1 condition).
I want to Group By ethnicity and show the count of transactions, but as a percentage.
SELECT Ethnicity, COUNT(distinctCard.TransactionID) AS CardUseCount
FROM (SELECT DISTINCT TransactionID, CustomerID FROM TransactionT WHERE CardID <> 1)
AS distinctCard INNER JOIN CustomerT ON distinctCard.CustomerID = CustomerT.CustomerID
GROUP BY Ethnicity
ORDER BY COUNT(distinctCard.TransactionID) ASC
So for example, this is what it comes up with:
Ethnicity | CardUseCount
0 | 100
1 | 200
2 | 300
3 | 400
But I would like this:
Ethnicity | CardUsePer
0 | 0.1
1 | 0.2
2 | 0.3
3 | 0.4
If you need the percentage of card transactions per ethnicity, you have to divide the card transactions per ethnicity by the total transactions of the same ethnicity. You don't need a subquery for that:
SELECT Ethnicity, sum(IIF(CardID=1,0,1))/count(1) AS CardUsePercentage
FROM TransactionT
INNER JOIN CustomerT
ON TransactionT.CustomerID = CustomerT.CustomerID
GROUP BY Ethnicity
From your posted sample result, it looks to me like you just want to divide the count by 1000, like this:
SELECT Ethnicity,
COUNT(distinctCard.TransactionID) / 1000 AS CardUseCount
FROM <rest part of query>
SELECT Ethnicity, COUNT(distinctCard.TransactionID) / (SELECT COUNT(1) FROM TransactionT WHERE CardID <> 1) AS CardUsePer
FROM (SELECT DISTINCT TransactionID, CustomerID FROM TransactionT WHERE CardID <> 1)
AS distinctCard INNER JOIN CustomerT ON distinctCard.CustomerID = CustomerT.CustomerID
GROUP BY Ethnicity
ORDER BY COUNT(distinctCard.TransactionID) ASC
I think the answer you posted is your answer. As the comments said, you only count the transactions; you need to divide that count by the total number of transactions. This would be done as follows:
SELECT Ethnicity, COUNT(distinctCard.TransactionID)/(SELECT COUNT(TransactionT.TransactionID)
FROM TransactionT WHERE CardID <> 1)
AS CardUsePercent
FROM (SELECT DISTINCT TransactionID, CustomerID FROM TransactionT WHERE CardID <> 1)
AS distinctCard INNER JOIN CustomerT ON distinctCard.CustomerID = CustomerT.CustomerID
GROUP BY Ethnicity
ORDER BY COUNT(distinctCard.TransactionID) ASC
This will give the result you want.
EDIT: This may be wrong, as I don't know the exact format of your tables, but I was assuming that the TransactionID field is unique in the table. Otherwise, use the DISTINCT keyword or the PK of your table, depending on your actual implementation.
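A small sketch of the share-of-total calculation, assuming TransactionID is unique (as the EDIT above does) and using made-up data scaled down to 1, 2, 3 and 4 card transactions per ethnicity. It runs on SQLite through Python. One caveat that applies to engines with integer division (SQLite here, SQL Server as well): dividing one COUNT by another truncates to 0, so the count is multiplied by 1.0 first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CustomerT (CustomerID INTEGER PRIMARY KEY, Ethnicity INTEGER);
CREATE TABLE TransactionT (
    TransactionID INTEGER PRIMARY KEY, CustomerID INTEGER, CardID INTEGER);
INSERT INTO CustomerT VALUES (1, 0), (2, 1), (3, 2), (4, 3);
""")

# Hypothetical data: customer n makes n card transactions (CardID 2),
# and customer 1 also makes one cash transaction (CardID 1).
tx = [(cust, 2) for cust in (1, 2, 3, 4) for _ in range(cust)]
tx.append((1, 1))
conn.executemany(
    "INSERT INTO TransactionT (CustomerID, CardID) VALUES (?, ?)", tx)

# Share of all card transactions per ethnicity; * 1.0 forces real division.
rows = conn.execute("""
SELECT c.Ethnicity,
       COUNT(*) * 1.0 /
       (SELECT COUNT(*) FROM TransactionT WHERE CardID <> 1) AS CardUsePer
FROM TransactionT AS t
INNER JOIN CustomerT AS c ON c.CustomerID = t.CustomerID
WHERE t.CardID <> 1
GROUP BY c.Ethnicity
ORDER BY c.Ethnicity
""").fetchall()

for ethnicity, share in rows:
    print(ethnicity, round(share, 2))
```

The shares sum to 1 across ethnicities, which matches the 0.1 / 0.2 / 0.3 / 0.4 pattern in the desired output. The first answer's formula measures something different: the fraction of each ethnicity's own transactions that used a card.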

SQL Inner Join query

I have following table structures,
cust_info
cust_id
cust_name
bill_info
bill_id
cust_id
bill_amount
bill_date
paid_info
paid_id
bill_id
paid_amount
paid_date
Now my output should display the records between two bill_date values (1 Jan 2013 to 1 Feb 2013), one row per bill, as follows:
cust_name | bill_id | bill_amount | tpaid_amount | bill_date | balance
where tpaid_amount is total paid for particular bill_id
For example,
for bill id abcd, bill_amount is 10000 and user pays 2000 one time and 3000 second time
means, paid_info table contains two entries for same bill_id
bill_id | paid_amount
abcd 2000
abcd 3000
so, tpaid_amount = 2000 + 3000 = 5000 and balance = 10000 - tpaid_amount = 10000 - 5000 = 5000
Is there any way to do this with single query (inner joins)?
You'd want to join the 3 tables, then group them by bill ids and other relevant data, like so.
-- the select line, as well as getting your columns to display, is where you'll work
-- out your computed columns, or what are called aggregate functions, such as tpaid and balance
SELECT c.cust_name, p.bill_id, b.bill_amount, SUM(p.paid_amount) AS tpaid, b.bill_date, b.bill_amount - SUM(p.paid_amount) AS balance
-- joining up the 3 tables here on the id columns that point to the other tables
FROM cust_info c INNER JOIN bill_info b ON c.cust_id = b.cust_id
INNER JOIN paid_info p ON p.bill_id = b.bill_id
-- between pretty much does what it says
WHERE b.bill_date BETWEEN '2013-01-01' AND '2013-02-01'
-- in group by, we not only need to join rows together based on which bill they're for
-- (bill_id), but also any column we want to select in SELECT.
GROUP BY c.cust_name, p.bill_id, b.bill_amount, b.bill_date
A quick overview of group by: It will take your result set and smoosh rows together, based on where they have the same data in the columns you give it. Since each bill will have the same customer name, amount, date, etc, we are fine to group by those as well as the bill id, and we'll get a record for each bill. If we wanted to group it by p.paid_amount, though, since each payment would have a different one of those (possibly), you'd get a record for each payment as opposed to for each bill, which isn't what you'd want. Once group by has smooshed these rows together, you can run aggregate functions such as SUM(column). In this example, SUM(p.paid_amount) totals up all the payments that have that bill_id to work out how much has been paid. For more information, please look at W3Schools chapter on group by in their SQL tutorials.
Hope I've understood this correctly and that this helps you.
This will do the trick:
select
cust_info.cust_name,
bill_info.bill_id,
bill_info.bill_amount,
sum(paid_info.paid_amount),
bill_info.bill_date,
bill_info.bill_amount - sum(paid_info.paid_amount)
from
cust_info
left outer join bill_info
left outer join paid_info
on bill_info.bill_id=paid_info.bill_id
on cust_info.cust_id=bill_info.cust_id
where
bill_info.bill_date between X and Y
group by
cust_info.cust_name,
bill_info.bill_id,
bill_info.bill_amount,
bill_info.bill_date
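Both answers can be tried on the example from the question (bill abcd, 10000 billed, payments of 2000 and 3000). The sketch below uses Python's sqlite3 module. The customer name, the second unpaid bill, and the out-of-range bill are invented, and the COALESCE around the sum is my own addition so that a bill with no payments shows a paid total of 0 instead of NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table layout from the question; all rows except bill 'abcd' and its
# two payments are hypothetical.
conn.executescript("""
CREATE TABLE cust_info (cust_id INTEGER, cust_name TEXT);
CREATE TABLE bill_info (bill_id TEXT, cust_id INTEGER,
                        bill_amount INTEGER, bill_date TEXT);
CREATE TABLE paid_info (paid_id INTEGER, bill_id TEXT,
                        paid_amount INTEGER, paid_date TEXT);
INSERT INTO cust_info VALUES (1, 'Alice');
INSERT INTO bill_info VALUES
  ('abcd', 1, 10000, '2013-01-10'),
  ('efgh', 1, 4000, '2013-01-20'),
  ('old', 1, 500, '2012-12-01');
INSERT INTO paid_info VALUES
  (1, 'abcd', 2000, '2013-01-15'), (2, 'abcd', 3000, '2013-01-25');
""")

# LEFT JOIN keeps unpaid bills; COALESCE turns their NULL sum into 0.
rows = conn.execute("""
SELECT c.cust_name, b.bill_id, b.bill_amount,
       COALESCE(SUM(p.paid_amount), 0) AS tpaid_amount,
       b.bill_date,
       b.bill_amount - COALESCE(SUM(p.paid_amount), 0) AS balance
FROM cust_info AS c
INNER JOIN bill_info AS b ON b.cust_id = c.cust_id
LEFT JOIN paid_info AS p ON p.bill_id = b.bill_id
WHERE b.bill_date BETWEEN '2013-01-01' AND '2013-02-01'
GROUP BY c.cust_name, b.bill_id, b.bill_amount, b.bill_date
ORDER BY b.bill_id
""").fetchall()

for row in rows:
    print(row)
```

With an INNER JOIN to paid_info, as in the first answer, the unpaid bill efgh would drop out of the result entirely; whether that is wanted depends on the report.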