How to combine multiple rows with similar ID but varying attributes into a single row using SQL? - sql

Context: I have 4 different song charts which show the current trending songs on their respective platforms. I want to aggregate the charts to produce a single chart with a "combined" ranking, but I want the weight I give a rank from a particular chart to be configurable. So, I may have a unique song (uniqueness denoted by ISRC) that shows up on all 4 charts. I want to merge all 4 charts first, then aggregate the ranking.
Question: Currently, I've merged two tables but am unsure how I can merge two rows with the same ISRC into a single row while adding their ranks up? My code so far:
SELECT * FROM ((SELECT rank,
isrc,
song_name,
dataset_datetime,
'applemusic' AS SOURCE
FROM "myTable"
WHERE chart_country ='US')
UNION
(SELECT rank,
isrc,
song_name,
dataset_datetime,
'spotify' AS SOURCE
FROM "myTable2"
WHERE chart_country ='US'))
ORDER BY cast(rank as int);
Please let me know if any further clarification is required

You can use a group by clause to reduce the rows and perform aggregation(s) of rank.
SELECT isrc
, song_name
, count(*) as num_charts
, sum(cast(rank AS INT)) as sum_rank
, avg(cast(rank AS decimal(10, 2))) as avg_rank
FROM (
(
SELECT rank
, isrc
, song_name
FROM `myTable`
WHERE chart_country = 'US'
)
UNION ALL -- use "union all" if summing or counting the result set
(
SELECT rank
, isrc
, song_name
FROM `myTable2`
WHERE chart_country = 'US'
)
) AS charts -- a derived table needs an alias in MySQL
GROUP BY
isrc
, song_name
ORDER BY
sum_rank DESC
, avg_rank DESC
Note: I'm not sure why you order by casting rank to an integer. If that column is not defined as numeric, you open up the possibility of a conversion error, but I have included casts "just in case".
Please note that "union", when used by itself, removes duplicate rows, so it has the potential to upset accurate sums, counts, averages, etc. Instead, use "union all", which does not attempt to remove duplicate rows (and can be a bit faster because of this).
I have removed 2 columns as they won't summarise well although you could use GROUP_CONCAT(SOURCE) (as noted by PM-77-1) and you could use MIN() or MAX() for the date column if that would be of use.
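The question also asks for configurable weights per chart, which the query above does not cover. One possible approach, shown here only as a sketch, is to attach a weight to each branch of the UNION ALL and sum rank times weight. The demo table names, the chart_rank column, the sample rows, and the weights 0.6 and 0.4 are all invented for illustration.

```sql
-- Hypothetical demo tables and weights, invented for illustration only.
CREATE TABLE apple_chart_demo   (isrc varchar(12), song_name varchar(100), chart_rank int);
CREATE TABLE spotify_chart_demo (isrc varchar(12), song_name varchar(100), chart_rank int);

INSERT INTO apple_chart_demo   VALUES ('ISRC0001', 'Song A', 1), ('ISRC0002', 'Song B', 2);
INSERT INTO spotify_chart_demo VALUES ('ISRC0001', 'Song A', 3), ('ISRC0003', 'Song C', 1);

SELECT isrc
     , MIN(song_name)           AS song_name      -- one representative name per ISRC
     , COUNT(*)                 AS num_charts     -- how many charts the song appears on
     , SUM(chart_rank * weight) AS weighted_rank  -- each rank scaled by its chart's weight
FROM (
    SELECT isrc, song_name, chart_rank, 0.6 AS weight FROM apple_chart_demo
    UNION ALL
    SELECT isrc, song_name, chart_rank, 0.4 AS weight FROM spotify_chart_demo
) AS charts
GROUP BY isrc
ORDER BY weighted_rank;
```

Be aware that a song missing from a chart contributes nothing to this sum, so a plain weighted sum favours songs that appear on fewer charts. Whether to add a penalty rank for missing charts depends on what the combined ranking is meant to express.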

Related

How to remove duplicate data from microsoft sql database(on the result only)

The code column contains duplicate values, and I want to remove those duplicates from the result.
For example, I want to keep a single row per code value. It doesn't matter if the other columns contain duplicates; the deduplication should be based on the code column only. What SQL query can I use? Thank you.
This is the table I am working with.
As you can see, some rows have a value of 1 in the isdeleted column. I only want the records with a value of 0.
Here is a sample record: row 1 has an isdeleted value of 1, which means that this record is deleted, so I only need row 2 for this code.
You could use the window function ROW_NUMBER() to single out the latest entry per code, as in:
SELECT code, shortdesc, longdesc, isobsolete, effectivefromdate
FROM (
SELECT ROW_NUMBER() OVER(PARTITION BY code ORDER BY effectivefromdate DESC) AS rn, *
FROM CodingSuite_STG
WHERE isobsolete=1 AND isdeleted=0
) AS cs
WHERE rn=1
ORDER BY effectivefromdate
Explanation:
The core of the operation is a "sub-query": a "table-like" expression formed by surrounding a SELECT statement with parentheses and following it with a table alias, like:
( SELECT * FROM CodingSuite_STG WHERE isobsolete=1 ) AS cs
For the outer SELECT it will appear like a table with the name "cs".
Within this sub-query I placed a special function (a "window function") consisting of two parts:
ROW_NUMBER() OVER ( PARTITION BY code ORDER BY effectivefromdate DESC) AS rn
The ROW_NUMBER() function returns a sequential number for a certain "window" of records defined by the immediately following OVER ( ... ) clause. The PARTITION BY inside it defines a group division scheme (similar to GROUP BY), so the row numbers start from 1 for each partitioned group. ORDER BY determines the numbering order within each group. So, with entries having the same code value ROW_NUMBER() will supply the number sequence 1, 2, 3... for each record, with 1 being assigned to the record with the highest value of effectivefromdate because of ORDER BY effectivefromdate DESC.
All we need to do in the outer SELECT clause is to pick up those records from the sub-query cs that have an rn-value of 1 and we're done!
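To make the numbering concrete, here is a small sketch in SQL Server syntax that shows the rn values before the outer filter is applied. The temporary table and its rows are invented for illustration; only the column names follow the query above.

```sql
-- Hypothetical sample data; the values are invented.
CREATE TABLE #CodingSuite_demo (
    code              varchar(10),
    shortdesc         varchar(50),
    effectivefromdate date,
    isdeleted         bit
);

INSERT INTO #CodingSuite_demo (code, shortdesc, effectivefromdate, isdeleted) VALUES
    ('A01', 'old entry',     '2019-01-01', 0),
    ('A01', 'current entry', '2020-06-01', 0),
    ('A01', 'deleted entry', '2021-01-01', 1),
    ('B02', 'only entry',    '2018-03-15', 0);

-- Numbering restarts at 1 for every code; the newest non-deleted row gets rn = 1.
SELECT code, shortdesc, effectivefromdate,
       ROW_NUMBER() OVER (PARTITION BY code ORDER BY effectivefromdate DESC) AS rn
FROM #CodingSuite_demo
WHERE isdeleted = 0
ORDER BY code, rn;
-- A01 | current entry | 2020-06-01 | 1
-- A01 | old entry     | 2019-01-01 | 2
-- B02 | only entry    | 2018-03-15 | 1
```

Wrapping this in a sub-query and keeping only rn = 1 leaves exactly one row per code.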

SQL: Joining two table based on certain description

I have two tables:
I want to add the GTIN from table 2 to table 1 based on brand name. However, I can't use = or LIKE because, as you can see in the highlighted rows, the names do not match exactly.
For example, the second row in table 1 is supposed to get the first GTIN from table 2 because both are Ziagen 300mg tablet. However, everything I tried failed to match all rows correctly.
Postgres has a pg_trgm module that provides trigram-based similarity functions. Start with a cross join of both tables and calculate similarity(t1.brand, t2.brand), which returns a real number between 0 and 1.
Next, filter the results using a heuristic threshold. Then narrow them down to the single best match using the row_number() window function.
The results might not be accurate; you could improve them by taking the similarity of the generic names into account as well.
with cross_similarity(generic1,brand1,gtin,brand2,generic2,sim) as (
select *, similarity(t1.brand, t2.brand) as sim
from t1,
t2
where similarity(t1.brand, t2.brand) > 0
)
, max_similarity as (
select *,
row_number() over (partition by gtin order by sim desc) as best_match_rank
from cross_similarity
)
select * from max_similarity where best_match_rank =1;
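The suggestion to take the generic name into account could look roughly like the sketch below. It assumes pg_trgm can be installed in the database. The demo tables, their columns (guessed from the column list in the CTE above), the sample rows, and the 0.3 threshold are assumptions, not part of the original answer. It also ranks candidates per table 1 row, on the assumption that each row of table 1 should receive one GTIN.

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;  -- needs sufficient privileges

-- Hypothetical demo tables and rows, invented for illustration.
CREATE TEMP TABLE t1_demo (generic text, brand text);
CREATE TEMP TABLE t2_demo (gtin text, brand text, generic text);

INSERT INTO t1_demo VALUES ('abacavir', 'Ziagen 300mg tablet');
INSERT INTO t2_demo VALUES ('0001', 'ZIAGEN 300 mg tablets', 'abacavir'),
                           ('0002', 'Zinnat 250 mg tablets', 'cefuroxime');

WITH scored AS (
    SELECT a.generic AS generic1, a.brand AS brand1,
           b.gtin, b.brand AS brand2,
           -- combined score: brand similarity plus generic-name similarity
           similarity(a.brand, b.brand) + similarity(a.generic, b.generic) AS score
    FROM t1_demo a
    CROSS JOIN t2_demo b
    WHERE similarity(a.brand, b.brand) > 0.3   -- heuristic threshold, tune as needed
), ranked AS (
    SELECT *,
           row_number() OVER (PARTITION BY generic1, brand1
                              ORDER BY score DESC) AS best_match_rank
    FROM scored
)
SELECT brand1, gtin, brand2, score
FROM ranked
WHERE best_match_rank = 1;
```

The threshold and the way the two similarities are combined are tuning choices; a weighted sum or a separate minimum on each similarity may suit real data better.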

Return column sorted by 2 other columns with a condition

I have a DB which is simply a table with 3 columns:
viewer_id, movie_id, Ranking
(INTEGER) (INTEGER) (TEXT)
Where a row in this table represents that a certain viewer has watched a certain movie.
If the viewer hasn't ranked the movie yet, the Ranking value is NULL.
If the viewer did rank the movie, then the Ranking value is either LIKE or DISLIKE.
I need to write a query that returns the IDs of the top 10 viewers with the highest view and rating counts, ordered by view count and then by rating count (both in descending order).
So far, I have written a query that returns a table with viewer_id and movie_watch_count with the correct information.
SELECT viewer_id , count(*) AS movies_watched
FROM viewers_movies_rankings
Group By viewer_id
I tried adding another column to this table, "ranking_count", which counts for each viewer_id the number of rows where the Ranking value is not NULL.
(That way I would also get the number of movies the viewer ranked, so the only thing left to do would be to sort by those columns.)
But nothing I wrote worked.
For example (not working):
SELECT viewer_id , count(*) AS movies_watched,
COUNT(*) AS movies_watched
HAVING ranking != null
FROM viewers_movies_rankings
Group By viewer_id
Can you help me?
You would seem to want:
SELECT viewer_id , count(*) AS movies_watched,
COUNT(ranking) as movies_ranked
FROM viewers_movies_rankings
GROUP BY viewer_id ;
COUNT(<expression>) counts the number of times the expression is not NULL -- and that is exactly what you want to do.
Is this what you want?
SELECT
viewer_id,
COUNT(*) AS movies_watched,
COUNT(ranking) AS ranking_count
FROM viewers_movies_rankings
GROUP BY viewer_id
COUNT(expression) only counts non-NULL values, whereas COUNT(*) counts every row.
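The queries above produce both counts but do not yet sort or limit the result as the question requires. A sketch of the complete query follows, using a demo table with invented rows; LIMIT is assumed to be available (SQLite, PostgreSQL, MySQL), whereas SQL Server would use TOP instead. As an aside, a test such as ranking != NULL never matches any row, because comparing anything with NULL yields unknown; IS NOT NULL is the working form.

```sql
-- Hypothetical demo table and rows, invented for illustration.
CREATE TABLE viewers_movies_rankings_demo (
    viewer_id INTEGER,
    movie_id  INTEGER,
    ranking   TEXT
);

INSERT INTO viewers_movies_rankings_demo VALUES
    (1, 10, 'LIKE'), (1, 11, NULL),  (1, 12, 'DISLIKE'),
    (2, 10, NULL),   (2, 11, NULL),  (2, 12, 'LIKE'),
    (3, 10, 'LIKE');

SELECT viewer_id,
       COUNT(*)       AS movies_watched,  -- every row is a watched movie
       COUNT(ranking) AS movies_ranked    -- NULL rankings are skipped
FROM viewers_movies_rankings_demo
GROUP BY viewer_id
ORDER BY movies_watched DESC, movies_ranked DESC
LIMIT 10;
-- 1 | 3 | 2
-- 2 | 3 | 1
-- 3 | 1 | 1
```

Viewers 1 and 2 tie on movies watched, so the second sort key (movies ranked) decides their order.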

how can I calculate the sum of my top n records in crystal report?

I am using Report tab -> Group Sort Expert -> Top N to get the top n records, but the report footer shows the sum of the values for all records.
I want only the sum of the values of the top n records.
In the image below I have selected the top 3 records, but it gives the sum of all records.
The Group Sort Expert (and the Record Sort Expert too) is applied to your final result after the total summary is calculated. It cannot filter out rows, in the same way that an ORDER BY clause in SQL cannot affect a SELECT's count result (that is a job for the WHERE clause). As a result, your summary will always be computed over all rows of your detail section and, of course, over all your group sums.
If you have a specific way in mind to exclude particular rows so that the appropriate sum appears, then you can use the Select Expert of Crystal Reports to remove rows.
Alternatively (and I believe this is the best way), I would make all the necessary calculations in the SQL command and send only the top 3 group sums to the report (then you can get what you want with a simple total summary of these 3 records).
Something like this:
CREATE TABLE #TEMP
(
DEP_NAME varchar(50),
MINVAL int,
RMAVAL int,
NETVAL int
)
INSERT INTO #TEMP
SELECT TOP 3
T.DEP_NAME ,T.MINVAL,T.RMAVAL,T.NETVAL
FROM
(SELECT DEP_NAME AS DEP_NAME,SUM(MINVAL) AS MINVAL,SUM(RMAVAL) AS
RMAVAL,SUM(NETVAL) AS NETVAL
FROM YOURTABLE
GROUP BY DEP_NAME) AS T
ORDER BY MINVAL DESC
SELECT * FROM #TEMP
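If only the grand total of the top 3 groups is needed, it can also be computed directly in SQL. The following is a sketch in SQL Server syntax; the temporary table, its rows, and the DEP_TOTAL alias are invented for illustration.

```sql
-- Hypothetical sample data, invented for illustration.
CREATE TABLE #SALES_DEMO (DEP_NAME varchar(50), NETVAL int);

INSERT INTO #SALES_DEMO (DEP_NAME, NETVAL) VALUES
    ('A', 100), ('A', 50), ('B', 200), ('C', 30), ('D', 120);

-- Inner query: one sum per department, keeping only the 3 largest.
-- Outer query: the total over those 3 group sums.
SELECT SUM(T.DEP_TOTAL) AS TOP3_TOTAL
FROM (
    SELECT TOP 3 DEP_NAME, SUM(NETVAL) AS DEP_TOTAL
    FROM #SALES_DEMO
    GROUP BY DEP_NAME
    ORDER BY DEP_TOTAL DESC
) AS T;
-- Group sums: A = 150, B = 200, C = 30, D = 120, so TOP3_TOTAL = 200 + 150 + 120 = 470
```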

How to set/serialize values based on results from multiple rows / multiple columns in postgresql

I have a table in which I want to calculate the values of two columns based on results from multiple rows and columns. The primary key is set on the first two columns (tag, qid).
I would like to set the values of two fields (serial and total).
The "serial" column value is unique for each (tag, qid), so if I have 2 records with the same tag, record one must have serial 1, record two serial 2, and so on. The serial must be assigned according to the priority field, with higher priority values serialized first.
The "total" column is the total number of records for each tag in the table.
I would like to do this in plain SQL instead of creating stored procedures, cursors, etc.
The table below shows a full set of valid values.
                                 
 +----+----+--------+-------+-----+  
 |tag |qid |priority|serial |total|  
 +--------------------------------+  
 |abc | 87 |  99    |  1    |  2  |  
 +--------------------------------+  
 |abc | 56 |  11    |  2    |  2  |  
 +--------------------------------+  
 |xyz | 89 |  80    |  1    |  1  |  
 +--------------------------------+  
 |pfm | 28 |  99    |  1    |  3  |  
 +--------------------------------+  
 |pfm | 17 |  89    |  2    |  3  |  
 +--------------------------------+  
 |pfm | 64 |  79    |  3    |  3  |  
 +----+----+--------+-------+-----+  
  
Many Thanks
You can readily return a result set with this information using window functions:
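select tag, qid, priority,
-- partition by tag only: (tag, qid) is unique, so partitioning on both would make every serial and total equal to 1
row_number() over (partition by tag order by priority desc) as serial,
count(*) over (partition by tag) as total
from your_table t;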
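Since the question asks to set the two columns rather than only select them, the window functions can also feed an UPDATE. Below is a sketch in PostgreSQL syntax using the sample rows from the question; the table name tag_demo is invented. Note that the partition is on tag only: because (tag, qid) is the primary key, partitioning on both columns would put every row in its own group.

```sql
-- Hypothetical demo table holding the rows from the question.
CREATE TEMP TABLE tag_demo (
    tag      text,
    qid      int,
    priority int,
    serial   int,
    total    int,
    PRIMARY KEY (tag, qid)
);

INSERT INTO tag_demo (tag, qid, priority) VALUES
    ('abc', 87, 99), ('abc', 56, 11),
    ('xyz', 89, 80),
    ('pfm', 28, 99), ('pfm', 17, 89), ('pfm', 64, 79);

-- Compute serial and total per tag, then write them back by primary key.
UPDATE tag_demo AS d
SET serial = s.serial,
    total  = s.total
FROM (
    SELECT tag, qid,
           row_number() OVER (PARTITION BY tag ORDER BY priority DESC) AS serial,
           count(*)     OVER (PARTITION BY tag)                        AS total
    FROM tag_demo
) AS s
WHERE d.tag = s.tag
  AND d.qid = s.qid;

SELECT * FROM tag_demo ORDER BY tag, serial;
```

With these rows, the stored values match the table in the question, for example (abc, 87) gets serial 1 and total 2, and (pfm, 64) gets serial 3 and total 3.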