Return column sorted by 2 other columns with a condition - sql

I have a DB which is simply a table with 3 columns:
viewer_id, movie_id, Ranking
(INTEGER) (INTEGER) (TEXT)
Where a row in this table represents that a certain viewer has watched a certain movie.
If the viewer hasn't ranked the movie yet- the Ranking value is NULL.
If the viewer did rank the movie, then the Ranking value is
LIKE or DISLIKE.
I need to write a query that returns the IDs of the top 10 viewers by view count and rating count, ordered by view count
and then by rating count (both in descending order).
So far, I wrote a query that returns a table with viewer_id, movie_watch_count
with the correct information.
SELECT viewer_id , count(*) AS movies_watched
FROM viewers_movies_rankings
Group By viewer_id
I tried adding another column to this table - "ranking_count" which will count for each viewer_id the number of rows where the Ranking value != null.
(So that I will get also the number of movies the viewer ranked,
So the only thing after this to do is to sort by those columns )
but everything I wrote didn't work.
For example (not working)
SELECT viewer_id , count(*) AS movies_watched,
COUNT(*) AS movies_watched
HAVING ranking != null
FROM viewers_movies_rankings
Group By viewer_id
Can you help me?

You would seem to want:
SELECT viewer_id , count(*) AS movies_watched,
COUNT(ranking) as movies_ranked
FROM viewers_movies_rankings
GROUP BY viewer_id ;
COUNT(<expression>) counts the number of times the expression is not NULL -- and that is exactly what you want to do.
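To cover the "top 10, sorted by both counts" part of the question, here is a minimal sketch that runs the same query with an ORDER BY and a row limit. It assumes SQLite driven from Python's sqlite3 module (the question does not name a DBMS), and the sample rows are made up for illustration. LIMIT is the SQLite/MySQL/PostgreSQL spelling; SQL Server would use TOP instead.

```python
import sqlite3

# Hypothetical sample data: (viewer_id, movie_id, ranking); None means "not ranked yet".
rows = [
    (1, 10, "LIKE"), (1, 11, None), (1, 12, "DISLIKE"),
    (2, 10, None), (2, 11, None), (2, 12, None),
    (3, 10, "LIKE"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE viewers_movies_rankings "
    "(viewer_id INTEGER, movie_id INTEGER, ranking TEXT)"
)
conn.executemany("INSERT INTO viewers_movies_rankings VALUES (?, ?, ?)", rows)

top_viewers = conn.execute(
    """
    SELECT viewer_id,
           COUNT(*)       AS movies_watched,  -- every row for the viewer
           COUNT(ranking) AS movies_ranked    -- only rows where ranking IS NOT NULL
    FROM viewers_movies_rankings
    GROUP BY viewer_id
    ORDER BY movies_watched DESC, movies_ranked DESC
    LIMIT 10
    """
).fetchall()

print(top_viewers)
```

Viewers 1 and 2 both watched three movies, so the second sort key (movies_ranked) decides which of them comes first.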

Is this what you want?
SELECT
viewer_id,
COUNT(*) AS movies_watched,
COUNT(ranking) AS ranking_count
FROM viewers_movies_rankings
GROUP BY viewer_id
COUNT(expression) only counts the rows where the expression is not NULL, whereas COUNT(*) counts every row in the group.

Related

How to combine multiple rows with similar ID but varying attributes into a single row using SQL?

Context: I have 4 different song charts which show the current trending songs on their respective platforms. I want to aggregate the charts to produce a single chart with a "combined" ranking, but I want the weight I give a rank from a particular chart to be configurable. So, I may have a unique song (uniqueness denoted by ISRC) that shows up on all 4 charts. I want to merge all 4 charts first, then aggregate the ranking.
Question: Currently, I've merged two tables but am unsure how I can merge two rows with the same ISRC into a single row while adding their ranks up? My code so far:
SELECT * FROM ((SELECT rank,
isrc,
song_name,
dataset_datetime,
'applemusic' AS SOURCE
FROM "myTable"
WHERE chart_country ='US')
UNION
(SELECT rank,
isrc,
song_name,
dataset_datetime,
'spotify' AS SOURCE
FROM "myTable2"
WHERE chart_country ='US'))
ORDER BY cast(rank as int);
Please let me know if any further clarification is required
You can use a group by clause to reduce the rows and perform aggregation(s) of rank.
SELECT isrc
, song_name
, count(*) as num_charts
, sum(cast(rank AS INT)) as sum_rank
, avg(cast(rank AS decimal(10, 2))) as avg_rank
FROM (
(
SELECT rank
, isrc
, song_name
FROM `myTable`
WHERE chart_country = 'US'
)
UNION ALL -- use "union all" if summing or counting the resultset
(
SELECT rank
, isrc
, song_name
FROM `myTable2`
WHERE chart_country = 'US'
)
) AS combined
GROUP BY
isrc
, song_name
ORDER BY
sum_rank DESC
, avg_rank DESC
Note: I'm not sure why you order by casting rank to an integer. If that column is not defined as numeric, you open up the possibility of a conversion error, but I have included the casts "just in case".
Please note that "union", when used by itself removes duplicate rows, hence it has the potential to upset accurate sums, counts or averages etc. So instead use "union all" which does not attempt to remove duplicate rows (and can be a bit faster because of this).
I have removed 2 columns as they won't summarise well although you could use GROUP_CONCAT(SOURCE) (as noted by PM-77-1) and you could use MIN() or MAX() for the date column if that would be of use.
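The difference between "union" and "union all" can be seen in a small sketch. It assumes SQLite driven from Python's sqlite3 module; the table names chart_a and chart_b, the column name chart_rank, and the sample rows are all hypothetical stand-ins for the real charts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE chart_a (chart_rank TEXT, isrc TEXT, song_name TEXT);
    CREATE TABLE chart_b (chart_rank TEXT, isrc TEXT, song_name TEXT);
    -- Song X holds rank 1 on both charts, so its two rows are identical.
    INSERT INTO chart_a VALUES ('1', 'ISRC-X', 'Song X'), ('2', 'ISRC-Y', 'Song Y');
    INSERT INTO chart_b VALUES ('1', 'ISRC-X', 'Song X'), ('3', 'ISRC-Y', 'Song Y');
    """
)


def summed_ranks(set_operator):
    """Aggregate ranks per ISRC after combining both charts with the given operator."""
    query = f"""
        SELECT isrc,
               COUNT(*) AS num_charts,
               SUM(CAST(chart_rank AS INTEGER)) AS sum_rank
        FROM (
            SELECT chart_rank, isrc, song_name FROM chart_a
            {set_operator}
            SELECT chart_rank, isrc, song_name FROM chart_b
        ) AS combined
        GROUP BY isrc
        ORDER BY isrc
    """
    return conn.execute(query).fetchall()


with_union = summed_ranks("UNION")          # duplicate Song X row is collapsed
with_union_all = summed_ranks("UNION ALL")  # both Song X rows are kept

print(with_union)
print(with_union_all)
```

With "union", Song X is counted on one chart only because its two identical rows are merged before the aggregation runs; "union all" keeps both.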

Group by question in SQL Server, migration from MySQL

Failed finding a solution to my problem, would love your help.
~~ Post has been edited to have only one question ~~-
Group by one query while selecting multiple columns.
In MySQL you can group by any column and still select all the others. For example, say I want to select the newest 100 transactions, grouped by Email (keeping only the last transaction for each email).
In MySQL I would do this:
SELECT * FROM db.transactionlog
group by Email
order by TransactionLogId desc
LIMIT 100;
In SQL Server it's not possible. Googling a bit suggested wrapping each column I want in an aggregate as a hack, but couldn't that cause a mix of values (mixing columns between the grouped rows)?
For example:
SELECT TOP(100)
Email,
MAX(ResultCode) as 'ResultCode',
MAX(Amount) as 'Amount',
MAX(TransactionLogId) as 'TransactionLogId'
FROM [db].[dbo].[transactionlog]
group by Email
order by TransactionLogId desc
TransactionLogId is the primary key, which is an identity column; I order by it to get the last inserted row.
I just want to know whether the ResultCode and Amount that I get from such a query will come from the last inserted row, and not be the highest values among the grouped rows.
~Edit~
Sample data -
row1:
Email: test@email.com
ResultCode: 100
Amount: 27
TransactionLogId: 1
row2:
Email: test@email.com
ResultCode: 50
Amount: 10
TransactionLogId: 2
Using the sample data above, my goal is to get the row details of
TransactionLogId = 2.
But what actually happens is that I get mixed values from the two rows: I do get TransactionLogId = 2, but with the ResultCode and Amount of the first row.
How do I avoid that?
Thanks.
You should first find out which is the latest transaction log by each email, then join back against the same table to retrieve the full record:
;WITH MaxTransactionByEmail AS
(
SELECT
Email,
MAX(TransactionLogId) as LatestTransactionLogId
FROM
[db].[dbo].[transactionlog]
group by
Email
)
SELECT
T.*
FROM
[db].[dbo].[transactionlog] AS T
INNER JOIN MaxTransactionByEmail AS M ON T.TransactionLogId = M.LatestTransactionLogId
You are currently getting mixed results because each aggregate function, such as MAX(), is evaluated independently over all rows that share a particular value of Email. So the MAX() value for the Amount column between values 10 and 27 is 27, even though it belongs to the row with the lower transaction log id.
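A minimal sketch of both behaviours, using the two sample rows from the question. It assumes SQLite driven from Python's sqlite3 module rather than SQL Server, and uses a placeholder email address; the join-back is written with a derived table instead of a CTE, which is equivalent here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE transactionlog (
        TransactionLogId INTEGER PRIMARY KEY,
        Email TEXT,
        ResultCode INTEGER,
        Amount INTEGER
    );
    INSERT INTO transactionlog VALUES (1, 'user@example.com', 100, 27);
    INSERT INTO transactionlog VALUES (2, 'user@example.com', 50, 10);
    """
)

# Each MAX() picks its own winner, so the output row mixes values from both rows.
mixed = conn.execute(
    """
    SELECT Email, MAX(ResultCode), MAX(Amount), MAX(TransactionLogId)
    FROM transactionlog
    GROUP BY Email
    """
).fetchall()

# Find the latest id per email first, then join back to fetch that whole row.
latest = conn.execute(
    """
    SELECT T.Email, T.ResultCode, T.Amount, T.TransactionLogId
    FROM transactionlog AS T
    INNER JOIN (
        SELECT Email, MAX(TransactionLogId) AS LatestTransactionLogId
        FROM transactionlog
        GROUP BY Email
    ) AS M ON T.TransactionLogId = M.LatestTransactionLogId
    """
).fetchall()

print(mixed)
print(latest)
```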
Another solution is using a ROW_NUMBER() window function to get a row-ranking by each Email, then just picking the first row:
;WITH TransactionsRanking AS
(
SELECT
T.*,
MostRecentTransactionLogRanking = ROW_NUMBER() OVER (
PARTITION BY
T.Email -- Start a different ranking for each different value of Email
ORDER BY
T.TransactionLogId DESC) -- Order the rows by the TransactionLogID descending
FROM
[db].[dbo].[transactionlog] AS T
)
SELECT
T.*
FROM
TransactionsRanking AS T
WHERE
T.MostRecentTransactionLogRanking = 1
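The ROW_NUMBER() approach can be tried out with a small sketch. It assumes SQLite 3.25 or later (the first version with window functions) driven from Python's sqlite3 module, not SQL Server; the sample rows and email addresses are made up. It also adds the "newest 100" ordering from the question, which is an addition to the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE transactionlog (
        TransactionLogId INTEGER PRIMARY KEY,
        Email TEXT,
        ResultCode INTEGER,
        Amount INTEGER
    );
    INSERT INTO transactionlog VALUES
        (1, 'a@example.com', 100, 27),
        (2, 'a@example.com', 50, 10),
        (3, 'b@example.com', 0, 5);
    """
)

newest_per_email = conn.execute(
    """
    WITH TransactionsRanking AS (
        SELECT T.*,
               ROW_NUMBER() OVER (
                   PARTITION BY T.Email               -- restart numbering per email
                   ORDER BY T.TransactionLogId DESC   -- newest row gets number 1
               ) AS MostRecentTransactionLogRanking
        FROM transactionlog AS T
    )
    SELECT TransactionLogId, Email, ResultCode, Amount
    FROM TransactionsRanking
    WHERE MostRecentTransactionLogRanking = 1
    ORDER BY TransactionLogId DESC
    LIMIT 100  -- SQL Server would use SELECT TOP (100) instead
    """
).fetchall()

print(newest_per_email)
```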

Determine the number of times a null value occurs in column B for a distinct value in column A, SQL table

I have a SQL table with "name" as one column, date as another, and location as a third. The location column supports null values.
I am trying to write a query to determine the number of times a null value occurs in the location column for each distinct value in the name column.
Can someone please assist?
One method uses conditional aggregation:
select name, sum(case when location is null then 1 else 0 end)
from t
group by name;
Another method that involves slightly less typing is:
select name, count(*) - count(location)
from t
group by name;
Use COUNT along with a filter, since you only need the NULL occurrences:
select name, count(*) occurrences
from mytable
where location is null
group by name
From your question, you'll want to get a distinct list of all different 'name' rows, and then you would like a count of how many NULLs there are per each name.
The following will achieve this:
SELECT name, count(*) as null_counts
FROM mytable
WHERE location IS NULL
GROUP BY name
The WHERE clause will only retrieve records that have NULL as their location, so a name with no NULL locations will not appear in the result at all.
The GROUP BY will group the remaining rows by name.
The SELECT will give you the name, and the COUNT(*) of the number of records, per name.
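The approaches above differ in one respect that a small sketch makes visible: the two aggregate-only queries report a 0 for names that have no NULL location, while the WHERE-filtered query leaves those names out. The sketch assumes SQLite driven from Python's sqlite3 module; the table name t comes from the first answer, and the column name event_date and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, event_date TEXT, location TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("ann", "2020-01-01", None),
        ("ann", "2020-01-02", "NYC"),
        ("ann", "2020-01-03", None),
        ("bob", "2020-01-01", "LA"),  # bob never has a NULL location
    ],
)

conditional = conn.execute(
    "SELECT name, SUM(CASE WHEN location IS NULL THEN 1 ELSE 0 END) "
    "FROM t GROUP BY name ORDER BY name"
).fetchall()

count_difference = conn.execute(
    "SELECT name, COUNT(*) - COUNT(location) FROM t GROUP BY name ORDER BY name"
).fetchall()

where_filtered = conn.execute(
    "SELECT name, COUNT(*) FROM t WHERE location IS NULL GROUP BY name ORDER BY name"
).fetchall()

print(conditional)
print(count_difference)
print(where_filtered)
```

Which form is preferable depends on whether names with zero NULLs should be listed.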

Get unique records from table avoiding all duplicates based on two key columns

I have a table Trial_tb with columns p_id,t_number and rundate.
Sample values:
p_id | t_number | rundate
=========================
111  | 333      | 1/7/2016
111  | 333      | 1/1/2016
222  | 888      | 1/8/2016
222  | 444      | 1/2/2016
666  | 888      | 1/6/2016
555  | 777      | 1/5/2016
p_id and t_number are the key columns. I need to fetch rows such that the result does not contain any record whose p_id/t_number combination is duplicated. For example, 111|333 is duplicated and hence not valid. The query should fetch all records other than the first two.
I wrote the script below, but it fetches only the last record. :(
select rundate,p_id,t_number from
(
select rundate,p_id,t_number,
count(p_id) over (partition by p_id) PCnt,
count(t_number) over (partition by t_number) TCnt
from trialtb
)a
where a.PCnt=1 and a.TCnt=1
The having clause is ideal for this job. Having allows you to filter on aggregated records.
-- Finding unique combinations.
SELECT
p_id,
t_number
FROM
trialtb
GROUP BY
p_id,
t_number
HAVING
COUNT(*) = 1
;
This query returns combinations of p_id and t_number that occur only once.
If you want to include rundate you could add MAX(rundate) AS rundate to the select clause. Because you are only looking at unique occurrences the max or min would always be the same.
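As a quick check, here is a minimal sketch that runs the HAVING query (with MAX(rundate) added) against the sample rows from the question. It assumes SQLite driven from Python's sqlite3 module and stores rundate as plain text; the ORDER BY is only there to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trialtb (p_id INTEGER, t_number INTEGER, rundate TEXT)")
conn.executemany(
    "INSERT INTO trialtb VALUES (?, ?, ?)",
    [
        (111, 333, "1/7/2016"),
        (111, 333, "1/1/2016"),
        (222, 888, "1/8/2016"),
        (222, 444, "1/2/2016"),
        (666, 888, "1/6/2016"),
        (555, 777, "1/5/2016"),
    ],
)

unique_rows = conn.execute(
    """
    SELECT p_id, t_number, MAX(rundate) AS rundate
    FROM trialtb
    GROUP BY p_id, t_number
    HAVING COUNT(*) = 1        -- keep only combinations that occur exactly once
    ORDER BY p_id, t_number
    """
).fetchall()

print(unique_rows)
```

The duplicated 111|333 pair is dropped, while 222 and 888 are kept even though each appears twice on its own, because only the combination has to be unique.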
Do you mean:
select
p_id,t_number
from
trialtb
group by
p_id,t_number
having
count(*) = 1
or do you need the run date too?
select
p_id,t_number,max(rundate)
from
trialtb
group by
p_id,t_number
having
count(*) = 1
Seeing as you are only looking at items with one result, using max or min should work fine.

how can I calculate the sum of my top n records in crystal report?

I'm using Report tab -> Group Sort Expert -> Top N to get the top n records, but the sum in the report footer is computed over all records.
I want only the sum of the values of the top n records...
In the image below I have selected the top 3 records, but it gives the sum of all records.
The group sort expert (and the record sort expert too) intervenes in your final result after the total summary is calculated. It is unable to filter and remove rows, in the same way that an ORDER BY clause in SQL cannot affect the SELECT's count result (that is a job for the WHERE clause). As a result, your summary will always be computed for all rows of your detail section and, of course, for all your group sums.
If you have in mind a specific way to exclude specific rows so that the appropriate sum appears, then you can use the Select Expert of Crystal Reports to remove rows.
Alternatively (and I believe this is the best way), I would make all the necessary calculations in the SQL command and send to the report only the Top 3 group sums (then you can get what you want with a simple total summary of these 3 records).
Something like this:
CREATE TABLE #TEMP
(
DEP_NAME varchar(50),
MINVAL int,
RMAVAL int,
NETVAL int
)
INSERT INTO #TEMP
SELECT TOP 3
T.DEP_NAME ,T.MINVAL,T.RMAVAL,T.NETVAL
FROM
(SELECT DEP_NAME AS DEP_NAME,SUM(MINVAL) AS MINVAL,SUM(RMAVAL) AS
RMAVAL,SUM(NETVAL) AS NETVAL
FROM YOURTABLE
GROUP BY DEP_NAME) AS T
ORDER BY MINVAL DESC
SELECT * FROM #TEMP
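The idea of limiting to the top 3 group sums before totalling can be sketched outside Crystal Reports. The example below assumes SQLite driven from Python's sqlite3 module, so it uses LIMIT where SQL Server uses TOP; the sample rows are made up and only the MINVAL column is kept for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (dep_name TEXT, minval INTEGER)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?)",
    [("A", 10), ("A", 20), ("B", 5), ("C", 50), ("D", 1), ("D", 1)],
)

# Total over every row: what the report footer shows when nothing is filtered.
total_all = conn.execute("SELECT SUM(minval) FROM yourtable").fetchone()[0]

# Total over the three largest group sums only.
total_top3 = conn.execute(
    """
    SELECT SUM(group_minval)
    FROM (
        SELECT dep_name, SUM(minval) AS group_minval
        FROM yourtable
        GROUP BY dep_name
        ORDER BY group_minval DESC
        LIMIT 3
    ) AS top_groups
    """
).fetchone()[0]

print(total_all, total_top3)
```

Because the row limit is applied inside the subquery, the outer SUM only ever sees the three groups that survive it.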