Select Query for Repeated Records in SQLite - sql

This problem is a generalization of this question. Rather than finding all the games with specific players playing against others, I want to be able to find all the games where the same players played against each other.
Here is sample data:
1,ChrisEveret,1
1,BillieJeanKing,1
1,RogerFederer,0
1,TomasMuster,0
2,RogerFederer,1
2,SallieMae,1
2,NovakDjokovic,0
2,JimCourier,0
3,ChrisEveret,0
3,BillieJeanKing,0
3,RogerFederer,1
3,TomasMuster,1
The desired output is
1,ChrisEveret,1
1,BillieJeanKing,1
1,RogerFederer,0
1,TomasMuster,0
3,ChrisEveret,0
3,BillieJeanKing,0
3,RogerFederer,1
3,TomasMuster,1
The actual data has only about two thousand rows, so performance is not a concern. I have come up with the following remarkably convoluted and inexact partial solution:
CREATE TABLE sets (gameid int, player text ,winloss int);
.import data.csv sets
select * from sets where gameid in
(select gameid from (select gameid,mo from
(select gameid,mo,count(*) from
(select gameid,group_concat(player) as mo from
(select gameid,player from sets order by gameid,player)
group by gameid)
group by gameid)
where mo in
(select mo from (select gameid,mo,count(*) from
(select gameid,group_concat(player) as mo from
(select gameid,player from sets order by gameid,player)
group by gameid)
group by mo
having count(*)>1))));
This returns all matches where the same four people played together, but not necessarily those in which the teams were the same. I do not know if there is a solution to this problem that does not involve using group_concat(). That is the only way I was able to make even this limited progress on it, however. I also am not sure that the method used to order the group_concat results for aggregation will always work.

SQLite does not guarantee the order in which group_concat() concatenates its values, and before version 3.44 there was no way to control it. So you have to use more cumbersome methods.
You can get the pairs of games with the same players using:
with s as (
select s.*, count(*) over (partition by gameid) as num_players
from sets s
)
select s1.gameid, s2.gameid
from s s1 join
s s2
on s1.player = s2.player and s1.num_players = s2.num_players
group by s1.gameid, s2.gameid
having count(*) = max(s1.num_players);
You can then use this logic if you want to get the players in each game (or just use group_concat() for that).
EDIT:
Window functions were introduced in SQLite version 3.25. In earlier versions, try this:
with s as (
select s.*, ss.num_players
from sets s join
(select gameid, count(*) as num_players
from sets s
group by gameid
) ss
on ss.gameid = s.gameid
)
select s1.gameid, s2.gameid
from s s1 join
s s2
on s1.player = s2.player and s1.num_players = s2.num_players
group by s1.gameid, s2.gameid
having count(*) = max(s1.num_players);
Here is a db<>fiddle that shows all pairs of games that have the same players (note that this includes each game paired with itself).
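The queries above pair games that share the same players; the question also asks that the teams be the same. Here is a minimal sketch of one way to add that check, assuming every game has exactly two teams encoded as 0/1 in winloss. It is wrapped in Python's sqlite3 module only to make it runnable. Game 4 is a hypothetical row set that is not in the question, added to show a case with the same four players on different teams. Two games have the same teams when every shared player's winloss either matches in both games or is flipped in both games.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sets (gameid INT, player TEXT, winloss INT)")
rows = [
    (1, "ChrisEveret", 1), (1, "BillieJeanKing", 1),
    (1, "RogerFederer", 0), (1, "TomasMuster", 0),
    (2, "RogerFederer", 1), (2, "SallieMae", 1),
    (2, "NovakDjokovic", 0), (2, "JimCourier", 0),
    (3, "ChrisEveret", 0), (3, "BillieJeanKing", 0),
    (3, "RogerFederer", 1), (3, "TomasMuster", 1),
    # Hypothetical extra game (not in the question): same players, different teams.
    (4, "ChrisEveret", 1), (4, "RogerFederer", 1),
    (4, "BillieJeanKing", 0), (4, "TomasMuster", 0),
]
conn.executemany("INSERT INTO sets VALUES (?, ?, ?)", rows)

SAME_TEAMS = """
WITH s AS (
    SELECT sets.*, ss.num_players
    FROM sets
    JOIN (SELECT gameid, COUNT(*) AS num_players
          FROM sets
          GROUP BY gameid) ss
      ON ss.gameid = sets.gameid
)
SELECT s1.gameid, s2.gameid
FROM s s1
JOIN s s2
  ON s1.player = s2.player
 AND s1.num_players = s2.num_players
 AND s1.gameid < s2.gameid            -- report each pair once, skip self-pairs
GROUP BY s1.gameid, s2.gameid
HAVING COUNT(*) = MAX(s1.num_players)  -- every player is shared
   AND SUM(s1.winloss = s2.winloss) IN (0, COUNT(*))  -- all same or all flipped
ORDER BY s1.gameid, s2.gameid
"""

same_team_pairs = conn.execute(SAME_TEAMS).fetchall()
print(same_team_pairs)
```

Joining the resulting game ids back to sets would give the rows in the format of the desired output.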

Related

Is there a way to use DISTINCT and COUNT(*) together to bulletproof your code against DUPLICATE entries?

I got help yesterday with a query that correctly counts multiple items in a column based on multiple criteria/columns. However, I would like to know whether there is a way to get the DISTINCT count of all the entries in the table based on an aggregated GROUP BY statement.
SELECT TIME = ap.day,
acms.tenantId,
acms.CallingService,
policyList = ltrim(sp.value),
policyInstanceList = ltrim(dp.value),
COUNT(*) AS DISTINCTCount
FROM dbo.acms_data acms
CROSS APPLY string_split(acms.policyList, ',') sp
CROSS APPLY string_split(acms.policyInstanceList, ',') dp
CROSS APPLY (select day = convert(date, acms.[Time])) ap
GROUP BY ap.day, acms.tenantId, sp.value, dp.value, acms.CallingService
I would just like to know whether there is a workaround for using DISTINCT and COUNT(*) together, and whether it would change my results, so that this query is robust against duplicate entries.
The reason I have to use COUNT(*) is that I am aggregating on every column in the table, not just one or a few specific columns.
DISTINCT can be used together with COUNT, as in this example:
USE AdventureWorks2012
GO
-- This query shows 290 JobTitle
SELECT COUNT(JobTitle) Total_JobTitle
FROM [HumanResources].[Employee]
GO
-- This query shows only 67 JobTitle
SELECT COUNT( DISTINCT JobTitle) Total_Distinct_JobTitle
FROM [HumanResources].[Employee]
GO
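COUNT(DISTINCT col) only covers a single column. When the goal is to ignore fully duplicated rows, one option is to de-duplicate in a derived table first and then apply COUNT(*). Here is a minimal sketch of that idea, using Python's sqlite3 module as a stand-in for SQL Server; the events table and its columns are hypothetical and not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the real data.
conn.execute("CREATE TABLE events (tenant TEXT, service TEXT, policy TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("t1", "svcA", "p1"),
        ("t1", "svcA", "p1"),  # exact duplicate of the previous row
        ("t1", "svcA", "p2"),
        ("t2", "svcB", "p1"),
    ],
)

# COUNT(*) counts every row in the group, duplicates included.
raw = conn.execute(
    "SELECT tenant, COUNT(*) FROM events GROUP BY tenant ORDER BY tenant"
).fetchall()

# De-duplicating whole rows first makes COUNT(*) count distinct rows only.
deduped = conn.execute(
    """
    SELECT tenant, COUNT(*)
    FROM (SELECT DISTINCT tenant, service, policy FROM events) AS d
    GROUP BY tenant
    ORDER BY tenant
    """
).fetchall()

print(raw)
print(deduped)
```

Note that if the GROUP BY already lists every column, each distinct row forms its own group, so the de-duplicated count is 1 for every group by construction.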

Modify my SQL Server query -- returns too many rows sometimes

I need to update the following query so that it only returns one child record (remittance) per parent (claim).
Table Remit_To_Activate contains exactly one date/timestamp per claim, which is what I wanted.
But when I join the full Remittance table to it, since some claims have multiple remittances with the same date/timestamps, the outermost query returns more than 1 row per claim for those claim IDs.
SELECT * FROM REMITTANCE
WHERE BILLED_AMOUNT>0 AND ACTIVE=0
AND REMITTANCE_UUID IN (
SELECT REMITTANCE_UUID FROM Claims_Group2 G2
INNER JOIN Remit_To_Activate t ON (
(t.ClaimID = G2.CLAIM_ID) AND
(t.DATE_OF_LATEST_REGULAR_REMIT = G2.CREATE_DATETIME)
)
where ACTIVE=0 and BILLED_AMOUNT>0
)
I believe the problem would be resolved if I included REMITTANCE_UUID as a column in Remit_To_Activate. That's the REAL issue. This is how I created the Remit_To_Activate table (trying to get the most recent remittance for a claim):
SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
MAX(claim_id) AS ClaimID
INTO Latest_Remit_To_Activate
FROM Claims_Group2
WHERE BILLED_AMOUNT>0
GROUP BY Claim_ID
ORDER BY Claim_ID
Claims_Group2 contains these fields:
REMITTANCE_UUID,
CLAIM_ID,
BILLED_AMOUNT,
CREATE_DATETIME
The 2 rows that are currently giving me the problem (image not included) are both remits for the SAME CLAIM, with the SAME TIMESTAMP. I only want one of them in the Remit_To_Activate table, so only ONE remittance will be "activated" per Claim.
You can change your query like this:
SELECT
p.*, latest_remit.DATE_OF_LATEST_REMIT
FROM
Remittance AS p inner join
(SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
claim_id
FROM Claims_Group2
WHERE BILLED_AMOUNT>0
GROUP BY Claim_ID) as latest_remit
on latest_remit.claim_id = p.claim_id;
This is meant to return only one row per claim. It is untested, so please run it and adjust as needed.
Without having more information on the structure of your database -- especially the structure of Claims_Group2 and REMITTANCE, and the relationship between them -- it's not really possible to advise you on how to introduce a remittance UUID into Remit_To_Activate.
Since you are using SQL Server, however, it is possible to use a window function to introduce a synthetic means to choose among remittances having the same timestamp. For example, it looks like you could approach the problem something like this:
select *
from (
select
r.*,
row_number() over (partition by cg2.claim_id order by cg2.create_datetime desc) as rn
from
remittance r
join claims_group2 cg2
on r.remittance_uuid = cg2.remittance_uuid
where
r.active = 0
and r.billed_amount > 0
and cg2.active = 0
and cg2.billed_amount > 0
) t
where t.rn = 1
Note that this does not depend on your Remit_To_Activate table at all, it having been subsumed into the inline view. Note also that this will introduce one extra column into your results, though you could avoid that by enumerating the columns of table remittance in the outer select clause.
It also seems odd to be filtering on two sets of active and billed_amount columns, but that appears to follow from what you were doing in your original queries. In that vein, I urge you to check the results carefully, as lifting the filter conditions on cg2 columns up to the level of the join to remittance yields a result that may return rows that the original query did not (but never more than one per claim_id).
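When two remittances share the same claim and timestamp, ROW_NUMBER() picks one of them arbitrarily unless the ORDER BY contains a tie-breaker. Here is a minimal sketch of that variation, using Python's sqlite3 module (SQLite 3.25 or later) as a stand-in for SQL Server. The table contents and the uuid-a to uuid-d values are hypothetical, and breaking ties on remittance_uuid is an assumption; any unique column would do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claims_group2 (remittance_uuid TEXT, claim_id INT, "
    "billed_amount REAL, create_datetime TEXT)"
)
conn.executemany(
    "INSERT INTO claims_group2 VALUES (?, ?, ?, ?)",
    [
        ("uuid-a", 1, 100.0, "2018-08-15 13:07:50.933"),
        ("uuid-b", 1, 100.0, "2018-08-15 13:07:50.933"),  # same claim and timestamp
        ("uuid-c", 1, 100.0, "2018-01-01 09:00:00.000"),
        ("uuid-d", 2, 50.0, "2017-12-31 10:00:00.000"),
    ],
)

latest = conn.execute(
    """
    SELECT claim_id, remittance_uuid, create_datetime
    FROM (
        SELECT cg2.*,
               ROW_NUMBER() OVER (
                   PARTITION BY claim_id
                   ORDER BY create_datetime DESC, remittance_uuid  -- tie-breaker
               ) AS rn
        FROM claims_group2 cg2
        WHERE billed_amount > 0
    ) t
    WHERE rn = 1
    ORDER BY claim_id
    """
).fetchall()

print(latest)
```

With the tie-breaker in place, the same remittance is chosen on every run, so the set of "activated" rows is reproducible.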
A co-worker offered me this elegant demonstration of a solution. I'd never used "over" or "partition" before. Works great! Thank you John and Gaurasvsa for your input.
if OBJECT_ID('tempdb..#t') is not null
drop table #t
select *, ROW_NUMBER() over (partition by CLAIM_ID order by CLAIM_ID) as ROW_NUM
into #t
from
(
select '2018-08-15 13:07:50.933' as CREATE_DATE, 1 as CLAIM_ID, NEWID() as REMIT_UUID
union select '2018-08-15 13:07:50.933', 1, NEWID()
union select '2017-12-31 10:00:00.000', 2, NEWID()
) x
select *
from #t
order by CLAIM_ID, ROW_NUM
select CREATE_DATE, MAX(CLAIM_ID), MAX(REMIT_UUID)
from #t
where ROW_NUM = 1
group by CREATE_DATE

Optimization of multiple aggregate sorting in SQL

I have a postgres query written for the Spree Commerce store that sorts all of its products in the following order: In stock (then first available), Backorder (then first available), Sold out (then first available).
In order to chain it with rails scopes I had to put it in the order by clause as opposed to anywhere else. The query itself works, and is fairly performant, but complex. I was curious if anyone with a bit more knowledge could discuss a better way to do it? I'm interested in performance, but also different ways to approach the problem.
ORDER BY (
SELECT
CASE
WHEN tt.count_on_hand > 0
THEN 2
WHEN zz.backorderable = true
THEN 1
ELSE 0
END
FROM (
SELECT
row_number() OVER (dpartition),
z.id,
bool_or(backorderable) OVER (dpartition) as backorderable
FROM (
SELECT DISTINCT ON (spree_variants.id) spree_products.id, spree_stock_items.backorderable as backorderable
FROM spree_products
JOIN "spree_variants" ON "spree_variants"."product_id" = "spree_products"."id" AND "spree_variants"."deleted_at" IS NULL
JOIN "spree_stock_items" ON "spree_stock_items"."variant_id" = "spree_variants"."id" AND "spree_stock_items"."deleted_at" IS NULL
JOIN "spree_stock_locations" ON spree_stock_locations.id=spree_stock_items.stock_location_id
WHERE spree_stock_locations.active = true
) z window dpartition as (PARTITION by id)
) zz
JOIN (
SELECT
row_number() OVER (dpartition),
t.id,
sum(count_on_hand) OVER (dpartition) as count_on_hand
FROM (
SELECT DISTINCT ON (spree_variants.id) spree_products.id, spree_stock_items.count_on_hand as count_on_hand
FROM spree_products
JOIN "spree_variants" ON "spree_variants"."product_id" = "spree_products"."id" AND "spree_variants"."deleted_at" IS NULL
JOIN "spree_stock_items" ON "spree_stock_items"."variant_id" = "spree_variants"."id" AND "spree_stock_items"."deleted_at" IS NULL
) t window dpartition as (PARTITION by id)
) tt ON tt.row_number = 1 AND tt.id = spree_products.id
WHERE zz.row_number = 1 AND zz.id=spree_products.id
) DESC, available_on DESC
The FROM shown above determines whether or not a product is backorderable, and the JOIN shown above determines the stock in inventory. Note that these are very similar queries, except that I need to determine if something is backorderable based on a location's ability to support backorders and its state, WHERE spree_stock_locations.active=true.
Thanks for any advice!

select only last try

Consider the following table structure for an imaginary table named score:
player_name |player_lastname |try |score
primary key: (player_name,player_lastname,try)
(don't discuss the table schema, it's just an example)
This table holds the scores of all players - every player should be able to play either one OR two times. Now, how could I fetch data about every player's last try only (i.e. first tries should be ignored for those who played more than once)?
An example of what I'm trying to achieve:
player_name,player_lastname,try,score
=====================================
bart,simpson,1,250
lisa,simpson,1,150
lisa,simpson,2,250
homer,simpson,1,300
homer,simpson,2,350
maggi,simpson,1,50
The result should be:
player_name,player_lastname,try,score
=====================================
bart,simpson,1,250
lisa,simpson,2,250
homer,simpson,2,350
maggi,simpson,1,50
One option is to JOIN the table to itself using a subquery with MAX:
select s.*
from score s
join (
select max(try) maxtry, player_name, player_lastname
from score
group by player_name, player_lastname
) s2 on s.player_name = s2.player_name
and s.player_lastname = s2.player_lastname
and s.try = s2.maxtry
SQL Fiddle Demo
Depending on your database, you may be able to take advantage of analytic functions such as ROW_NUMBER(), which would make this easier. Here is another fiddle to demonstrate.
Since you are using PostgreSQL, you should be able to use the analytic ROW_NUMBER() function. This should work as well:
select *
from (
select try, player_name, player_lastname, score,
Row_Number() Over (Partition By player_name, player_lastname order by try desc) rn
from score
) s
where rn = 1
BTW -- I'd consider adding a player_id as a primary key.
This will probably have the best performance
select distinct on (player_name, player_lastname)
player_name, player_lastname, try, score
from score
order by 1, 2, 3 desc
A Rank function can solve this:
SELECT player_name,player_lastname,TRY,score
FROM (SELECT player_name,player_lastname,TRY,score,RANK() OVER (PARTITION BY player_name, Player_Lastname ORDER BY TRY DESC)AS try_rank
FROM score
)sub
WHERE try_rank = 1
I'm assuming 'try' is the number that can be 1/2.
Edit, forgot Partition BY
SELECT player_name,player_lastname,try,score
FROM score sc
WHERE NOT EXISTS (
SELECT *
FROM score nx
WHERE nx.player_name = sc.player_name
AND nx.player_lastname = sc.player_lastname
AND nx.try > sc.try
);
Try this out:
select player_name,
player_lastname,
try,
score
from score where try = 2 or
(try = 1 and
(player_name,player_lastname) not in
(select player_name,player_lastname from score where try=2));
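As a quick check, two of the approaches above (the self-join on MAX(try) and the NOT EXISTS anti-join) can be run against the sample data from the question. This is a minimal sketch using Python's sqlite3 module as a stand-in for PostgreSQL; the ORDER BY is added only so the two results can be compared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE score (player_name TEXT, player_lastname TEXT, try INT, "
    "score INT, PRIMARY KEY (player_name, player_lastname, try))"
)
conn.executemany(
    "INSERT INTO score VALUES (?, ?, ?, ?)",
    [
        ("bart", "simpson", 1, 250),
        ("lisa", "simpson", 1, 150),
        ("lisa", "simpson", 2, 250),
        ("homer", "simpson", 1, 300),
        ("homer", "simpson", 2, 350),
        ("maggi", "simpson", 1, 50),
    ],
)

# Approach 1: join each row to its player's maximum try.
via_join = conn.execute(
    """
    SELECT s.*
    FROM score s
    JOIN (SELECT MAX(try) AS maxtry, player_name, player_lastname
          FROM score
          GROUP BY player_name, player_lastname) s2
      ON s.player_name = s2.player_name
     AND s.player_lastname = s2.player_lastname
     AND s.try = s2.maxtry
    ORDER BY s.player_name
    """
).fetchall()

# Approach 2: keep a row only if the same player has no later try.
via_not_exists = conn.execute(
    """
    SELECT player_name, player_lastname, try, score
    FROM score sc
    WHERE NOT EXISTS (
        SELECT 1
        FROM score nx
        WHERE nx.player_name = sc.player_name
          AND nx.player_lastname = sc.player_lastname
          AND nx.try > sc.try
    )
    ORDER BY player_name
    """
).fetchall()

print(via_join)
```

Both queries work for any number of tries per player, not only one or two.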

how to create this query

How do I create a query if I need two aggregate functions in the SELECT list, and each function needs different GROUP BY and WHERE conditions?
In my example I need to return the playerName, how many times the player won (a win is a row in table game with result = first), and how many times he played,
but I do not know how to deal with two aggregate functions.
Simply put, I want to join the results of these two queries:
1.
select playerName,count(*)
from player,game
where player.playerId=game.playerId and result='first'
group by game.playerId
2.
select count(*)
from game, player
where game.playerId=player.playerId
group by game.playerId
the set of attributes for table game are
playerId , result
the set of attributes for table player are
playerName,playerId
any idea???
Use:
SELECT p.playername,
SUM(CASE WHEN g.result = 'first' THEN 1 ELSE 0 END),
COUNT(*)
FROM PLAYER p
JOIN GAME g ON g.playerid = p.playerid
GROUP BY p.playername
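Here is a minimal runnable sketch of the conditional-aggregation query above, using Python's sqlite3 module. The player names and game rows are hypothetical sample data, and the wins and played aliases are added for readability. Grouping by playerId as well as playerName is an assumption that keeps two players with the same name apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE player (playerId INT PRIMARY KEY, playerName TEXT)")
conn.execute("CREATE TABLE game (playerId INT, result TEXT)")
# Hypothetical sample data.
conn.executemany("INSERT INTO player VALUES (?, ?)", [(1, "ann"), (2, "bob")])
conn.executemany(
    "INSERT INTO game VALUES (?, ?)",
    [(1, "first"), (1, "second"), (1, "first"), (2, "third")],
)

stats = conn.execute(
    """
    SELECT p.playerName,
           SUM(CASE WHEN g.result = 'first' THEN 1 ELSE 0 END) AS wins,
           COUNT(*) AS played
    FROM player p
    JOIN game g ON g.playerId = p.playerId
    GROUP BY p.playerId, p.playerName
    ORDER BY p.playerName
    """
).fetchall()

print(stats)
```

Because this is an inner join, players with no rows in game do not appear in the result at all.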
Along with solutions proposed by OMG Ponies and Bnjmn, you can also get desired results by using WITH ROLLUP
select result, count(*)
from game, player
where game.playerId=player.playerId
group by game.playerId, result WITH ROLLUP
Then, on the client side, find the records where result equals 'first' and the records where result is null (the latter give the number of games played).