SQL Query to count the records - sql

I am writing a SQL query that gets all the transaction types from one table and, from the other table, counts how often each transaction type occurs.
My query is this:
with CTE as
(
select a.trxType,a.created,b.transaction_key,b.description,a.mode
FROM transaction_data AS a with (nolock)
RIGHT JOIN transaction_types b with (nolock) ON b.transaction_key = a.trxType
)
SELECT COUNT (trxType) AS Frequency, description as trxType,mode
from CTE where created >='2017-04-11' and created <= '2018-04-13'
group by trxType ,description,mode
The transaction_types table contains all the types of transactions only and transaction_data contains the transactions which have occurred.
The problem I am facing is that, even though it is a RIGHT JOIN, the query does not return all the records from the transaction_types table.
I need to select all the transactions from the transaction_types table and show the number of counts for each transaction, even if it's 0.
Please help.

LEFT JOIN is so much easier to follow.
I think you want:
select tt.transaction_key, tt.description, t.mode, count(t.trxType)
from transaction_types tt left join
transaction_data t
on tt.transaction_key = t.trxType and
t.created >= '2017-04-11' and t.created <= '2018-04-13'
group by tt.transaction_key, tt.description, t.mode;
Notes:
Use reasonable table aliases! a and b mean nothing. t and tt are abbreviations of the table name, so they are easier to follow.
t.mode will be NULL for non-matching rows.
The condition on dates needs to be in the ON clause. Otherwise, the outer join is turned into an inner join.
LEFT JOIN is easier to follow (at least for people whose native language reads left-to-right) because it means "keep all the rows in the table you have already read".
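The point about the ON clause can be checked on a toy dataset. The following is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server, so the nolock hints are dropped; the sample rows are invented for illustration, and mode is left out of the grouping to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transaction_types (transaction_key INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE transaction_data (trxType INTEGER, created TEXT, mode TEXT);
    -- Hypothetical sample data, not from the original question.
    INSERT INTO transaction_types VALUES (1, 'deposit'), (2, 'withdrawal'), (3, 'transfer');
    INSERT INTO transaction_data VALUES
        (1, '2017-06-01', 'online'),
        (1, '2017-07-15', 'online'),
        (2, '2016-01-01', 'branch');  -- outside the date range
""")

# Date filter in WHERE: rows where t.created is NULL or out of range are
# discarded after the join, so the outer join behaves like an inner join.
where_rows = conn.execute("""
    SELECT tt.description, COUNT(t.trxType)
    FROM transaction_types tt
    LEFT JOIN transaction_data t ON tt.transaction_key = t.trxType
    WHERE t.created >= '2017-04-11' AND t.created <= '2018-04-13'
    GROUP BY tt.transaction_key, tt.description
    ORDER BY tt.transaction_key
""").fetchall()

# Date filter in ON: every transaction type is kept, with a count of 0
# when no transaction falls inside the range.
on_rows = conn.execute("""
    SELECT tt.description, COUNT(t.trxType)
    FROM transaction_types tt
    LEFT JOIN transaction_data t
      ON tt.transaction_key = t.trxType
     AND t.created >= '2017-04-11' AND t.created <= '2018-04-13'
    GROUP BY tt.transaction_key, tt.description
    ORDER BY tt.transaction_key
""").fetchall()

print(where_rows)  # [('deposit', 2)]
print(on_rows)     # [('deposit', 2), ('withdrawal', 0), ('transfer', 0)]
```

COUNT(t.trxType) counts only non-NULL values, which is why the unmatched types report 0 rather than 1.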

Related

confused with INNER JOIN and FULL JOIN with temporary table

WITH
longest_used_bike AS (
SELECT
bikeid,
SUM(duration_minutes) AS trip_duration
FROM
`bigquery-public-data.austin_bikeshare.bikeshare_trips`
GROUP BY
bikeid
ORDER BY
trip_duration DESC
LIMIT 1
)
-- find station at which longest_used bike leaves most often
SELECT
trips.start_station_id,
COUNT(*) AS trip_ct
FROM
longest_used_bike AS longest
INNER JOIN
`bigquery-public-data.austin_bikeshare.bikeshare_trips` AS trips
ON longest.bikeid = trips.bikeid
GROUP BY
trips.start_station_id
ORDER BY
trip_ct DESC
LIMIT 1
This query gives a result of 2575, but why does the result change to 3798 when you use FULL JOIN instead of INNER JOIN? I am trying to figure that out, but I am not sure what to think.
A full join will include all entries from the trips table - regardless of whether or not they are joinable to the longest_used_bike ID (they will have a NULL value for the columns in longest)
A tip: If you encounter things like these try to look at the queries unaggregated (omit the GROUP BY clause and the COUNT function) - you would then notice here that you'll suddenly have more (unwanted) rows in the FULL JOIN query.
An INNER JOIN will return only rows where the JOIN condition is satisfied, that is, only rows where there is a match in both tables.
A FULL JOIN will return ALL rows from the left and all rows from the right, with NULL values in the fields where there is no match.
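The tip about looking at the unaggregated rows can be tried on a small stand-in for the trips table. This is a sketch in Python's sqlite3 with invented data; the tables trips and longest are simplified, hypothetical versions of the ones in the question. Because FULL JOIN is only available in newer SQLite versions, it is emulated here with a LEFT JOIN plus the unmatched rows from the other side, which is an assumption of this sketch and not how BigQuery executes it.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical miniature of the bikeshare data.
    CREATE TABLE trips (bikeid INTEGER, start_station_id INTEGER);
    INSERT INTO trips VALUES (1, 10), (1, 10), (1, 20), (2, 20), (2, 20), (3, 20);
    -- Stand-in for the longest_used_bike CTE: a single bike id.
    CREATE TABLE longest (bikeid INTEGER);
    INSERT INTO longest VALUES (1);
""")

# INNER JOIN: only trips made by the selected bike survive.
inner_rows = conn.execute("""
    SELECT trips.start_station_id, trips.bikeid, longest.bikeid
    FROM longest INNER JOIN trips ON longest.bikeid = trips.bikeid
""").fetchall()

# FULL JOIN emulation: all trips (matched or not) plus any unmatched
# rows from longest. Unmatched trips carry NULL in longest.bikeid.
full_rows = conn.execute("""
    SELECT trips.start_station_id, trips.bikeid, longest.bikeid
    FROM trips LEFT JOIN longest ON longest.bikeid = trips.bikeid
    UNION ALL
    SELECT trips.start_station_id, trips.bikeid, longest.bikeid
    FROM longest LEFT JOIN trips ON longest.bikeid = trips.bikeid
    WHERE trips.bikeid IS NULL
""").fetchall()

# Same aggregation as in the question: busiest start station and its count.
inner_top = Counter(r[0] for r in inner_rows).most_common(1)[0]
full_top = Counter(r[0] for r in full_rows).most_common(1)[0]

print(len(inner_rows), inner_top)  # 3 (10, 2)
print(len(full_rows), full_top)    # 6 (20, 4)
```

With the full join, trips by every other bike are counted too, so both the winning station and its count can change.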

Sum not selecting the values with Zero

I have two tables, CDMachine and Transaction.
CDMachine Table with columns CDMachineID, CDMachineName, InstallationDate
Transaction table with columns TransactionID,CDMachineID,TransactionTime,Amount
I am calculating revenue using the below query but it eliminates the machine without any transaction
SELECT CDMachine.CDMachineName,
SUM(Transaction.Amount)
FROM CDMachine
LEFT JOIN TRANSACTION ON CDMachine.CDMachineID = Transaction.CDMachineID
WHERE Transaction.TransactionTime BETWEEN '2019-01-01' AND '2019-01-31'
GROUP BY CDMachine.CDMachineName
ORDER BY 2
Move the WHERE condition to the ON clause:
select m.CDMachineName, sum(t.Amount)
from CDMachine m left join
Transaction t
on m.CDMachineID = t.CDMachineID and
t.TransactionTime between '2019-01-01' and '2019-01-31'
group by m.CDMachineName
order by 2;
The WHERE clause turns the outer join to an inner join -- meaning that you are losing the values that do not match.
If you want 0 rather than NULL for the sum, then use:
select m.CDMachineName, coalesce(sum(t.Amount), 0)
Even though you are using a LEFT JOIN, the fact that you have a filter on a column from the joined table causes rows that don't meet the join condition to be removed from the result set.
You need to apply the filter on transaction time to the transactions table, before joining it or as part of the join condition. I would do it like this:
SELECT CDMachine.CDMachineName,
SUM(Transaction.Amount)
FROM CDMachine
LEFT JOIN (
SELECT * FROM TRANSACTION
WHERE Transaction.TransactionTime BETWEEN '2019-01-01' AND '2019-01-31'
) AS Transaction
ON CDMachine.CDMachineID = Transaction.CDMachineID
GROUP BY CDMachine.CDMachineName
ORDER BY 2
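To see the filtered derived table and the COALESCE suggestion together, here is a small sketch in Python's sqlite3. The machine names and amounts are made up, and the transaction table is called Txn in this sketch only because TRANSACTION is a keyword in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CDMachine (CDMachineID INTEGER PRIMARY KEY, CDMachineName TEXT);
    -- Named Txn here (hypothetical name) because TRANSACTION is an SQLite keyword.
    CREATE TABLE Txn (TransactionID INTEGER PRIMARY KEY, CDMachineID INTEGER,
                      TransactionTime TEXT, Amount REAL);
    INSERT INTO CDMachine VALUES (1, 'Lobby'), (2, 'Garage');
    INSERT INTO Txn VALUES (1, 1, '2019-01-10', 5.0),
                           (2, 1, '2019-01-20', 7.5),
                           (3, 2, '2019-03-01', 4.0);  -- outside January
""")

# Filter the transactions first, then outer join, so machines without
# January transactions are kept. SUM over no rows is NULL; COALESCE
# turns that NULL into 0.
rows = conn.execute("""
    SELECT m.CDMachineName,
           SUM(t.Amount)              AS raw_sum,
           COALESCE(SUM(t.Amount), 0) AS revenue
    FROM CDMachine m
    LEFT JOIN (
        SELECT * FROM Txn
        WHERE TransactionTime BETWEEN '2019-01-01' AND '2019-01-31'
    ) AS t ON m.CDMachineID = t.CDMachineID
    GROUP BY m.CDMachineName
    ORDER BY revenue
""").fetchall()

print(rows)  # [('Garage', None, 0), ('Lobby', 12.5, 12.5)]
```

The Garage machine stays in the result with a NULL raw sum and a revenue of 0.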

How to avoid duplicates in left table where primary key is not unique in joined table

I am having SUM issues when joining 2 tables where the primary key is unique in the left table but can be duplicated in the right table. In my scenario, a case_id may have, for example, a payment of £100 in the left table, which is then broken down at a lower level into 2 £50 payments in the right table. This causes the left table payment to be counted twice when joining, as the case_id exists twice in the right table.
I have tried a number of different variations of the query but have so far been unsuccessful. I have also searched this website but have been unable to find a scenario that fits mine.
select distinct
t1.[r_code],
t1.[parent_case_id],
sum(t1.[total_redress_value]),
sum(t2.[payment_amount])
from
[SomeTable1] t1
left join
[SomeTable2] t2 on t1.[case_id] = t2.[case_id]
group by
t1.[r_code], t1.[parent_case_id]
I expect the SUM of total_redress_value and the SUM of payment_amount to be 100 each, but the SUM of total_redress_value comes out as 200 because the join duplicates the case_id row. Any help greatly appreciated.
Group your right table by the primary key of the left table.
SELECT DISTINCT
t1.[r_code],
t1.[parent_case_id],
SUM(t1.[total_redress_value]),
SUM(t2.[payment_amount])
FROM [SomeTable1] t1
LEFT JOIN
(
SELECT case_id,
MIN(payment_amount) AS payment_amount --or sum etc - whatever fits your logic
FROM [SomeTable2]
GROUP BY case_id
) AS t2
ON t1.[case_id] = t2.[case_id]
GROUP BY t1.[r_code],
t1.[parent_case_id];
Unfortunately, this type of hierarchical calculation is a little complicated. You can pre-aggregate t2 before joining:
select t1.[r_code], t1.[parent_case_id],
sum(t1.[total_redress_value]),
sum(t2.[payment_amount])
from [SomeTable1] as t1 left join
(select t2.case_id, sum(t2.payment_amount) as payment_amount
from [SomeTable2] as t2
group by t2.case_id
) as t2
on t1.[case_id] = t2.[case_id]
group by t1.[r_code], t1.[parent_case_id]
Note that select distinct is almost never needed with group by. And it is certainly not needed in this case.
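The double counting and the effect of pre-aggregation can be reproduced with the £100 / 2 x £50 example from the question. This is a sketch in Python's sqlite3; the r_code and parent_case_id values are invented, and the derived table is aliased p here (a name chosen for this sketch) to keep it distinct from the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SomeTable1 (case_id INTEGER PRIMARY KEY, r_code TEXT,
                             parent_case_id INTEGER, total_redress_value INTEGER);
    CREATE TABLE SomeTable2 (case_id INTEGER, payment_amount INTEGER);
    -- One case worth 100, paid as two payments of 50.
    INSERT INTO SomeTable1 VALUES (1, 'R1', 100, 100);
    INSERT INTO SomeTable2 VALUES (1, 50), (1, 50);
""")

# Direct join: the single t1 row is repeated once per payment,
# so total_redress_value is summed twice.
naive = conn.execute("""
    SELECT t1.r_code, t1.parent_case_id,
           SUM(t1.total_redress_value), SUM(t2.payment_amount)
    FROM SomeTable1 t1
    LEFT JOIN SomeTable2 t2 ON t1.case_id = t2.case_id
    GROUP BY t1.r_code, t1.parent_case_id
""").fetchall()

# Pre-aggregated join: SomeTable2 is collapsed to one row per case_id
# before joining, so each t1 row appears exactly once.
pre_aggregated = conn.execute("""
    SELECT t1.r_code, t1.parent_case_id,
           SUM(t1.total_redress_value), SUM(p.payment_amount)
    FROM SomeTable1 t1
    LEFT JOIN (SELECT case_id, SUM(payment_amount) AS payment_amount
               FROM SomeTable2
               GROUP BY case_id) AS p
      ON t1.case_id = p.case_id
    GROUP BY t1.r_code, t1.parent_case_id
""").fetchall()

print(naive)           # [('R1', 100, 200, 100)]
print(pre_aggregated)  # [('R1', 100, 100, 100)]
```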

SQL query returning more than one result

This is my SQL query
SELECT
room.*, reservation.rn, reservation.cin
FROM
room, reservation
But it is returning 4 instances of each row.
I just want to get the corresponding reservation from another table where they have same room number and at the same time display the remaining rooms
Room Table
Id,r_num,r_price,in_date,out_date
reservation_table
id,r_num,cIn,cOut
You have a cross join between the two tables as there is no join condition.
Assuming the reservation table has room_id FK which references id from room table, you can join like this:
select r.*,
s.rn,
s.cin
from room r
join reservation s on r.id = s.room_id
Always use proper explicit join syntax instead of comma based joins.
Your select statement results in a cross join / cartesian product.
Use a WHERE-clause!
SELECT room.*,reservation.rn,reservation.cin FROM room,reservation WHERE room.room_no = reservation.room_no
For more complicated joins I recommend using the explicit syntax with the appropriate keywords, although your implicit join is perfectly fine for this case (and performance-wise, implicit and explicit joins are the same).
To display unreserved rooms as well (so to keep results that do not satisfy the where-clause) you'll have to use an OUTER JOIN (LEFT or RIGHT depending on what you want to keep) like this:
SELECT room.*,reservation.rn,reservation.cin
FROM room LEFT OUTER JOIN reservation
ON room.room_no = reservation.room_no
For the first part you need a join, and for the second condition you need a union:
select r.Id,r.r_num,r.r_price,r.in_date,r.out_date,s.id as resId,s.r_num,s.cIn,s.cOut
from room r
join reservation s on r.r_num = s.r_num
union
select r.Id,r.r_num,r.r_price,r.in_date,r.out_date,null as resId, null as r_num, null as cIn,null as cOut
from room r
where r.in_date is null
UNION removes duplicate rows by default; if you need repeated rows, use UNION ALL.
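The difference between the comma join and an outer join on the room number shows up clearly on a few rows. This is a sketch in Python's sqlite3 using the r_num column from the table listing in the question; the rooms, prices and dates are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE room (id INTEGER PRIMARY KEY, r_num INTEGER, r_price INTEGER);
    CREATE TABLE reservation (id INTEGER PRIMARY KEY, r_num INTEGER,
                              cIn TEXT, cOut TEXT);
    -- Hypothetical sample data: three rooms, two of them reserved.
    INSERT INTO room VALUES (1, 101, 80), (2, 102, 90), (3, 103, 120);
    INSERT INTO reservation VALUES (1, 101, '2020-01-01', '2020-01-03'),
                                   (2, 103, '2020-02-01', '2020-02-05');
""")

# Comma join without a condition: every room is paired with every
# reservation (3 rooms x 2 reservations = 6 rows).
cross = conn.execute(
    "SELECT room.r_num, reservation.cIn FROM room, reservation"
).fetchall()

# LEFT OUTER JOIN on the room number: one row per matching reservation,
# and unreserved rooms are kept with NULL reservation columns.
left = conn.execute("""
    SELECT room.r_num, reservation.cIn
    FROM room LEFT OUTER JOIN reservation
      ON room.r_num = reservation.r_num
    ORDER BY room.r_num
""").fetchall()

print(len(cross))  # 6
print(left)        # [(101, '2020-01-01'), (102, None), (103, '2020-02-01')]
```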

How to find the most frequent value in a select statement as a subquery?

I am trying to get the most frequent Zip_Code for the Location_ID from table B. Table A (Transaction) has one A.Zip_Code per transaction, but table B (Location) has multiple Zip_Code values for one area or city. I am trying to get the most frequent B.Zip_Code for the account using Location_ID, which is present in both tables. I have simplified my code and changed the names of the columns for easy understanding, but this is the logic of the query I have so far. Any help would be appreciated. Thanks in advance.
Select
A.Account_Number,
A.Utility_Type,
A.Sum(usage),
A.Sum(Cost),
A.Zip_Code,
( select B.zip_Code from B where A.Location_ID= B.Location_ID having count(*)= max(count(B.Zip_Code)) as Location_Zip_Code,
A.Transaction_Date
From
Transaction_Table as A Left Join
Location Table as B On A.Location_ID= B.Location_ID
Group By
A.Account_Number,
A.Utility_Type,
A.Zip_Code,
A.Transaction_Date
This is what I came up with:
Select tt.Account_Number, tt.Utility_Type, Sum(tt.usage), Sum(tt.Cost),
tt.Zip_Code,
(select TOP 1 l.zip_Code
from Location_Table l
where tt.Location_ID = l.Location_ID
group by l.zip_code
order by count(*) desc
) as Location_Zip_Code,
tt.Transaction_Date
From Transaction_Table tt
Group By tt.Account_Number, tt.Utility_Type, tt.Zip_Code, tt.Transaction_Date;
Notes:
Table aliases are a good thing. However, they should be abbreviations for the tables referenced, rather than arbitrary letters.
The table alias qualifies the column name, not the function. Hence sum(tt.usage) rather than tt.sum(usage).
There is no need for a join in the outer query. You are doing all the work in the subquery.
An order by with top seems the way to go to get the most common zip code (which, incidentally, is called the mode in statistics).
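The "order by count, take the first row" idea can be tried outside SQL Server as well. The sketch below uses Python's sqlite3, where LIMIT 1 plays the role of TOP 1. The tables are cut down to a few columns, the data is invented, and the outer query groups by Location_ID as well so that the correlated subquery refers to a grouped column; that last choice is an assumption of this sketch, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Transaction_Table (Account_Number INTEGER, Location_ID INTEGER,
                                    Cost REAL);
    CREATE TABLE Location_Table (Location_ID INTEGER, Zip_Code TEXT);
    -- Hypothetical sample data.
    INSERT INTO Transaction_Table VALUES (1, 10, 5.0), (1, 10, 7.0), (2, 20, 3.0);
    INSERT INTO Location_Table VALUES (10, '78701'), (10, '78702'), (10, '78702'),
                                      (20, '73301');
""")

# For each account, the correlated subquery groups the location's zip
# codes, orders them by frequency, and keeps the most common one.
rows = conn.execute("""
    SELECT tt.Account_Number,
           SUM(tt.Cost),
           (SELECT l.Zip_Code
            FROM Location_Table l
            WHERE l.Location_ID = tt.Location_ID
            GROUP BY l.Zip_Code
            ORDER BY COUNT(*) DESC
            LIMIT 1) AS Location_Zip_Code
    FROM Transaction_Table tt
    GROUP BY tt.Account_Number, tt.Location_ID
    ORDER BY tt.Account_Number
""").fetchall()

print(rows)  # [(1, 12.0, '78702'), (2, 3.0, '73301')]
```

If two zip codes tie for the highest count, which one is returned is not defined unless a second ORDER BY key is added.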