How to get unique values in a self join and how to set the LIMIT number dynamically in PostgreSQL - sql

Hi, I am just learning databases and practicing my skills on the table shown below:
 id  | name              | wins | matches
-----+-------------------+------+---------
 205 | Twilight Sparkle  |    0 |       0
 206 | Fluttershy        |    0 |       0
 207 | Applejack         |    0 |       0
 208 | Pinkie Pie        |    0 |       0
 209 | Rarity            |    0 |       0
 210 | Rainbow Dash      |    0 |       0
 211 | Princess Celestia |    0 |       0
 212 | Princess Luna     |    0 |       0
My job here is to return a list of pairs of players for the next round of a match.
Assuming that there is an even number of players registered, each player
appears exactly once in the pairings. Each player is paired with another
player with an equal or nearly-equal win record, that is, a player adjacent to him or her in the standings.
Returns:
A list of tuples, each of which contains (id1, name1, id2, name2)
id1: the first player's unique id
name1: the first player's name
id2: the second player's unique id
name2: the second player's name
To achieve those goals I have self joined that table and written something like this:
SELECT a.id, a.name, b.id, b.name
FROM results AS a, results AS b
WHERE a.id > b.id and a.wins = b.wins
LIMIT COUNT(a.id)/2;
It does not seem to work. Please help me deal with this.
Thanks.

You can sequence them based on their wins, then join on the sequence, so each pair has the same or the next-closest number of wins:
WITH seq_results AS
(
SELECT
id,
name,
ROW_NUMBER() OVER(ORDER BY wins DESC) AS seq
FROM
results
)
SELECT
r1.id,
r1.name,
r2.id,
r2.name
FROM
seq_results r1
JOIN
seq_results r2
ON (r1.seq = (r2.seq - 1))
AND (r2.seq % 2 = 0);
Per your request, here is some information on how this works. I highly recommend that you visit the documentation for PostgreSQL - it really is some of the best documentation out there: http://www.postgresql.org/docs/current/static/
The first part is a common-table expression (CTE). It essentially lets me create an in-memory table for use in subsequent queries. You could just as easily create a temp table, but a CTE doesn't have to be dropped afterwards.
See: http://www.postgresql.org/docs/current/static/queries-with.html
WITH seq_results AS
(
SELECT
id,
name,
ROW_NUMBER() OVER(ORDER BY wins DESC) AS seq
FROM
results
)
In this CTE, I am sequencing/sequentially numbering each record using a window function. I will use these numbers later in my join. See: http://www.postgresql.org/docs/current/static/functions-window.html
SELECT
r1.id,
r1.name,
r2.id,
r2.name
FROM
seq_results r1
JOIN
seq_results r2
ON (r1.seq = (r2.seq - 1))
AND (r2.seq % 2 = 0);
Above I am joining the CTE to itself using the sequence. I "offset" the sequence of the second instance of the CTE r2 by -1, essentially joining two sequential records together.
Had I only specified that condition in the join, I would return more than the 4 records expected. I needed to make sure that the ids and names on the "left" are not also on the "right", so I decided to include only the odd-numbered sequenced records on the left and the evens on the right. To do this, I used the modulus operator % to ensure that r2 only returned records where the sequence was even.
Lastly, because the join was an inner join (JOIN is the same as INNER JOIN), any even-numbered sequences in r1 are not returned.
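As an aside, about the dynamic LIMIT in your attempt: an aggregate such as COUNT() cannot appear directly in a LIMIT clause, but PostgreSQL does accept a scalar subquery there. A minimal sketch of that technique (the pairing query above does not need it):
SELECT a.id, a.name, b.id, b.name
FROM results AS a
JOIN results AS b ON a.id > b.id AND a.wins = b.wins
LIMIT (SELECT COUNT(*) / 2 FROM results);
This alone would still not guarantee unique pairings, which is why the sequencing approach is the real fix.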

Related

Return count of total group membership when providers are part of a group

TABLE A: Pre-joined table - Holds a list of providers who belong to a group and the group the provider belongs to. Columns are something like this:
ProviderID (PK, FK) | ProviderName | GroupID | GroupName
1234 | LocalDoctor | 987 | LocalDoctorsUnited
5678 | Physican82 | 987 | LocalDoctorsUnited
9012 | Dentist13 | 153 | DentistryToday
0506 | EyeSpecial | 759 | OphtaSpecialist
TABLE B: Another pre-joined table, holds a list of providers and their demographic information. Columns as such:
ProviderID (PK,FK) | ProviderName | G_or_I | OtherColumnsThatArentInUse
1234 | LocalDoctor | G | Etc.
5678 | Physican82 | G | Etc.
9012 | Dentist13 | I | Etc.
0506 | EyeSpecial | I | Etc.
The expected result is something like this:
ProviderID | ProviderName | ProviderStatus | GroupCount
1234 | LocalDoctor | Group | 2
5678 | Physican82 | Group | 2
9012 | Dentist13 | Individual | N/A
0506 | EyeSpecial | Individual | N/A
The goal is to determine whether or not a provider belongs to a group or operates individually, by the G_or_I column. If the provider belongs to a group, I need to include an additional column that provides the count of total providers in that group.
The Group/Individual portion is relatively easy - I've done something like this:
SELECT DISTINCT
A.ProviderID,
A.ProviderName,
CASE
WHEN B.G_or_I = 'G'
THEN 'Group'
WHEN B.G_or_I = 'I'
THEN 'Individual' END AS ProviderStatus
FROM
TableA A
LEFT OUTER JOIN TableB B
ON A.ProviderID = B.ProviderID;
So far so good, this returns the expected results based on the G_or_I flag.
However, I can't seem to wrap my head around how to complete the COUNT portion. I feel like I may be overthinking it and am stuck in a loop of errors. Some things I've tried:
Adding a second CASE statement:
CASE
WHEN B.G_or_I = 'G'
THEN (
SELECT CountedGroups
FROM (
SELECT ProviderID, count(GroupID) AS CountedGroups
FROM TableA
WHERE A.ProviderID = B.ProviderID
GROUP BY ProviderID --originally had this as ORDER BY, but that was a mis-type on my part
)
)
ELSE 'N/A' END
This returns an error stating that a single-row sub-query is returning more than one row. If I limit the number of rows returned to 1, the CountedGroups column returns 1 for every row. This makes me think that it's not performing the count function as I expect it to.
I've also tried including a direct count of TableA as a factored sub-query:
WITH CountedGroups AS
( SELECT ProviderID, count(GroupID) AS GroupSum
FROM TableA
GROUP BY ProviderID --originally had this as ORDER BY, but that was a mis-type on my part
) --This as a standalone query works just fine
SELECT DISTINCT
A.ProviderID,
A.ProviderName,
CASE
WHEN B.G_or_I = 'G'
THEN 'Group'
WHEN B.G_or_I = 'I'
THEN 'Individual' END AS ProviderStatus,
CASE
WHEN B.G_or_I = 'G'
THEN GroupSum
ELSE 'N/A' END
FROM
CountedGroups CG
JOIN TableA A
ON CG.ProviderID = A.ProviderID
LEFT OUTER JOIN TableB B
ON A.ProviderID = B.ProviderID
This returns either null or completely incorrect column values.
Other attempts have been a number of variations of this, with a mix of bad results or Oracle errors. As I mentioned above, I'm probably way overthinking it and the solution could be rather simple. Apologies if the information is confusing or I've not provided enough detail. The real tables have a lot of private medical information, and I tried to translate the essence of the issue as best I could.
Thank you.
You can use CASE..WHEN and the analytic function COUNT as follows:
SELECT
A.PROVIDERID,
A.PROVIDERNAME,
CASE
WHEN B.G_OR_I = 'G' THEN 'Group'
ELSE 'Individual'
END AS PROVIDERSTATUS,
CASE
WHEN B.G_OR_I = 'G' THEN TO_CHAR(COUNT(1) OVER(
PARTITION BY A.GROUPID
))
ELSE 'N/A'
END AS GROUPCOUNT
FROM
TABLE_A A
JOIN TABLE_B B ON A.PROVIDERID = B.PROVIDERID;
TO_CHAR is needed on COUNT because all result expressions in a CASE..WHEN must be of the same data type.
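For completeness, the single-row subquery error from the question could also be avoided with a correlated scalar subquery that counts by group rather than by provider; COUNT(*) always returns exactly one row. A sketch, with the same TO_CHAR caveat (column names assumed from the question):
SELECT
A.ProviderID,
A.ProviderName,
CASE
WHEN B.G_or_I = 'G'
THEN TO_CHAR((SELECT COUNT(*) FROM TableA A2 WHERE A2.GroupID = A.GroupID))
ELSE 'N/A'
END AS GroupCount
FROM TableA A
LEFT OUTER JOIN TableB B
ON A.ProviderID = B.ProviderID;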
Your problem seems to be that you are missing a column. You need to add group name, otherwise you won't be able to differentiate rows for the same practitioner who works under multiple business entities (groups). This is probably why you have a DISTINCT on your query. Things looked like duplicates which weren't. Once you've done that, just use an analytic function to figure out the rest:
SELECT ta.providerid,
       ta.providername,
       DECODE(tb.g_or_i, 'G', 'Group', 'I', 'Individual') AS ProviderStatus,
       ta.groupname,
       CASE
           WHEN tb.g_or_i = 'G' THEN TO_CHAR(COUNT(DISTINCT ta.providerid) OVER (PARTITION BY ta.groupid))
           ELSE 'N/A'
       END AS GROUP_COUNT
FROM table_a ta
INNER JOIN table_b tb ON ta.providerid = tb.providerid
Is it possible that your LEFT JOIN was going the wrong direction? It makes more sense that your base demographic table would have all practitioners in it and then the Group table might be missing some records. For instance if the solo prac was operating under their own SSN and Type I NPI without applying for a separate Type II NPI or TIN.
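A sketch of that flipped direction, hypothetical but using the same column names as above, so solo practitioners with no group row still appear:
SELECT tb.providerid,
       tb.providername,
       DECODE(tb.g_or_i, 'G', 'Group', 'I', 'Individual') AS ProviderStatus,
       ta.groupname,
       CASE
           WHEN tb.g_or_i = 'G' THEN TO_CHAR(COUNT(ta.providerid) OVER (PARTITION BY ta.groupid))
           ELSE 'N/A'
       END AS GROUP_COUNT
FROM table_b tb
LEFT OUTER JOIN table_a ta ON ta.providerid = tb.providerid;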

Best practice for joining 2 tables using the LIKE operator, or a better approach

I have 2 tables that have to be processed once a day in a data warehouse.
MessageTable
Id integer primary key
Message varchar(max)
Example:
Id | Message
1 | Hi! This is the first message.
2 | the last message.
PartTable
PartId integer primary key
Words varchar(100)
Example:
PartId | Words
1 | This
2 | message, first
3 | last
MessageTable contains messages to be compared with PartTable in order to determine which parts each message belongs to.
So the above example should return this:
Id | MessageId | PartId
1 | 1 | 1
2 | 1 | 2
3 | 2 | 3
Because message 1 contains the keyword "This" as well as "message" and "first", it belongs to parts 1 and 2.
When the keywords in a part are separated by commas, all of the keywords need to be found in the message, irrespective of their order.
The stored procedure I roughly made for this process looks like this:
INSERT INTO ResultTable(MessageId, PartId)
SELECT m.Id AS MessageId, p.PartId AS PartId
FROM MessageTable m, PartTable p
WHERE
(SELECT COUNT(VALUE) FROM STRING_SPLIT(p.Words, ',') WHERE CHARINDEX(CONCAT(' ', VALUE, ' '), m.Message) > 0) = (SELECT COUNT(VALUE) FROM STRING_SPLIT(p.Words, ','))
This SQL statement seems to work, even though I haven't confirmed it thoroughly. But it doesn't look like good practice.
Should I just use a more relational approach for PartTable, like below? Then all the word rows for a part would have to be found in a message to determine that the message belongs to the part.
Id | PartId | Word
1 | 1 | This
2 | 2 | message
3 | 2 | last
I can create this table using STRING_SPLIT on PartTable, or PartTable can be refactored. But I don't see a way to join this table with MessageTable. Also, I expect there will be a lot of rows in MessageTable.
Can anyone give me any help on this?
Thanks,
Hmmmm . . . You can combine all parts and messages and split the parts into words. A where clause can be used for filtering, so only matches are included. A final aggregation and counting returns the message/part pairs where all words match:
select m.id, pt.partid
from messagetable m cross join
     parttable pt cross apply
     string_split(pt.words, ',') s
where m.message like '%' + s.value + '%'
group by m.id, pt.partid
having count(*) = (select count(*)
                   from parttable pt2 cross apply
                        string_split(pt2.words, ',') s2
                   where pt2.partid = pt.partid
                  );
This is not efficient and it is very hard to optimize in SQL Server given your data structure.
A better structure for the parttable would be an improvement for the query:
select m.id, ptn.partid
from messagetable m join
     (select ptn.*, count(*) over (partition by partid) as cnt
      from parttablenormalized ptn
     ) ptn
     on m.message like '%' + ptn.word + '%'
group by m.id, ptn.partid, cnt
having count(*) = cnt;
However, performance might not change much. You would need to denormalize message as well for a speedier query.
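If you do go the normalized route, the lookup table can be built once from the existing PartTable. A sketch, assuming SQL Server 2016+ for STRING_SPLIT; the LTRIM/RTRIM strips the space after the comma in values like 'message, first':
select pt.partid, ltrim(rtrim(s.value)) as word
into parttablenormalized
from parttable pt cross apply
     string_split(pt.words, ',') s;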

Get specific row from each group

My question is very similar to this, except I want to be able to filter by some criteria.
I have a table "DOCUMENT" which looks something like this:
| ID | CONFIG_ID | STATE     | MAJOR_REV | MODIFIED_ON | ELEMENT_ID |
+----+-----------+-----------+-----------+-------------+------------+
|  1 | 1234      | Published | 2         | 2019-04-03  | 98762      |
|  2 | 1234      | Draft     | 1         | 2019-01-02  | 98762      |
|  3 | 5678      | Draft     | 3         | 2019-01-02  | 24244      |
|  4 | 5678      | Published | 2         | 2017-10-04  | 24244      |
|  5 | 5678      | Draft     | 1         | 2015-05-04  | 24244      |
It's actually a few more columns, but I'm trying to keep this simple.
For each CONFIG_ID, I would like to select the latest (MAX(MAJOR_REV) or MAX(MODIFIED_ON)) - but I might want to filter by additional criteria, such as state (e.g., the latest published revision of a document) and/or date (the latest revision, published or not, as of a specific date; or: all documents where a revision was published/modified within a specific date interval).
To make things more interesting, there are some other tables I want to join in.
Here's what I have so far:
SELECT
allDocs.ID,
d.CONFIG_ID,
d.[STATE],
d.MAJOR_REV,
d.MODIFIED_ON,
d.ELEMENT_ID,
f.ID FILE_ID,
f.[FILENAME],
et.COLUMN1,
e.COLUMN2
FROM DOCUMENT allDocs -- Get all document revisions
CROSS APPLY ( -- Then for each config ID, only look at the latest revision
SELECT TOP 1
ID,
MODIFIED_ON,
CONFIG_ID,
MAJOR_REV,
ELEMENT_ID,
[STATE]
FROM DOCUMENT
WHERE CONFIG_ID=allDocs.CONFIG_ID
ORDER BY MAJOR_REV desc
) as d
LEFT OUTER JOIN ELEMENT e ON e.ID = d.ELEMENT_ID
LEFT OUTER JOIN ELEMENT_TYPE et ON e.ELEMENT_TYPE_ID=et.ID
LEFT OUTER JOIN TREE t ON t.NODE_ID = d.ELEMENT_ID
OUTER APPLY ( -- This is another optional 1:1 relation, but it's wrongfully implemented as m:n
SELECT TOP 1
FILE_ID
FROM DOCUMENT_FILE_RELATION
WHERE DOCUMENT_ID=d.ID
ORDER BY MODIFIED_ON DESC
) as df -- There should never be more than 1, but we're using TOP 1 just in case, to avoid duplicates
LEFT OUTER JOIN [FILE] f on f.ID=df.FILE_ID
WHERE
allDocs.CONFIG_ID = '5678' -- Just for testing purposes
and d.[STATE] = 'Published' -- One possible filter criterion, there may be others
It looks like the results are correct, but multiple identical rows are returned.
My guess is that for documents with 4 revisions, the same values are found 4 times and returned.
A simple SELECT DISTINCT would solve this, but I'd prefer to fix my query.
This would be a classic row_number & partition by question I think.
;with rows as
(
select <your-columns>,
row_number() over (partition by config_id order by <whatever you want>) as rn
from document
join <anything else>
where <whatever>
)
select * from rows where rn=1
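Applied to the DOCUMENT table from the question, the sketch might look like this (any filter criteria, such as state, go inside the CTE so the latest row is picked after filtering):
;with rows as
(
select d.*,
row_number() over (partition by d.CONFIG_ID order by d.MAJOR_REV desc) as rn
from DOCUMENT d
where d.[STATE] = 'Published' -- optional filter criteria
)
select * from rows where rn = 1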

Assign a random order to each group

I want to expand each row in TableA into 4 rows. The result holds all the columns from TableA and two additional columns: SetID, ranging from 0 to 3 and unique within each group of rows generated from one TableA row, and Random, a random permutation of SetID within the same group.
I use SQLite and would prefer a pure SQL solution.
Table A:
Description
-----------
A
B
Desired output:
Description | SetID | Random
------------|-------|-------
A           | 0     | 2
A           | 1     | 0
A           | 2     | 3
A           | 3     | 1
B           | 0     | 3
B           | 1     | 2
B           | 2     | 0
B           | 3     | 1
My attempt so far creates 4 rows for each row in TableA but doesn't get the permutation right. wrong contains a random number ranging from 0 to 3, but I need exactly one 0, 1, 2, and 3 for each unique value in Description, and their order should be random.
SELECT
Description,
SetID,
abs(random()) % 4 AS wrong
FROM
TableA
LEFT JOIN
TableB
ON
1 = 1
Table B:
SetID
-----
0
1
2
3
Use a cross join:
SELECT Description,
SetID,
abs(random()) % 4 AS wrong
FROM TableA
CROSS JOIN TableB
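As given, abs(random()) % 4 can still repeat values within a group. If your SQLite is 3.25 or newer, a window function can produce the permutation directly. A sketch, assuming window-function support:
SELECT Description,
       SetID,
       ROW_NUMBER() OVER (PARTITION BY Description ORDER BY random()) - 1 AS Random
FROM TableA
CROSS JOIN TableB;
Each Description group receives the numbers 0 to 3 exactly once, in random order.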
Consider a solution in your specialty, R. As you know, R maintains excellent database packages, one of which is RSQLite. Additionally, R can run commands via the connection without the need to import very large datasets.
Your solution is essentially a random sampling without replacement. Simply have R run the sampling and concatenate list items into an SQL string.
The code below creates a table in the SQLite database by having R send the CREATE TABLE command to the SQL engine. No data is imported or exported. Should you need a new permutation for every four rows, run an iterative loop in a defined function that outputs the SQL string. For append queries, change the CREATE TABLE AS to an INSERT INTO ... SELECT statement.
library(RSQLite)
sqlite <- dbDriver("SQLite")
conn <- dbConnect(sqlite,"C:\\Path\\To\\Database\\File\\newexample.db")
# SAMPLE WITHOUT REPLACEMENT
randomnums <- as.list(sample(0:3, 4, replace=F))
# SQL CONCATENATION
sql <- sprintf("CREATE TABLE PermutationsTable AS
SELECT a.Description, b.SetID,
(select %d from TableB WHERE TableB.SetID = b.SetID AND TableB.SetID=0
union select %d from TableB WHERE TableB.SetID = b.SetID AND TableB.SetID=1
union select %d from TableB WHERE TableB.SetID = b.SetID AND TableB.SetID=2
union select %d from TableB WHERE TableB.SetID = b.SetID AND TableB.SetID=3)
As RandomNumber
from TableA a, TableB b;",
randomnums[[1]], randomnums[[2]],
randomnums[[3]], randomnums[[4]])
# RUN QUERY
dbSendQuery(conn, sql)
dbDisconnect(conn)
You will notice a nested union subquery. This is used to achieve the inline random numbers for each row. Also, to return all possible combinations from both tables, no join statements are needed; simply list the tables in the FROM clause (a comma-separated cross join).

How to efficiently get a value from the last row in bulk on SQL Server

I have a table like so
Id | Type  | Value
------------------
0  | Big   | 2
1  | Big   | 3
2  | Small | 3
3  | Small | 3
I would like to get a table like this
Type  | Last Value
------------------
Small | 3
Big   | 3
How can I do this? I understand there is a SQL Server function called LAST_VALUE(...) OVER (...), but I can't get it to work with GROUP BY.
I've also tried using SELECT MAX(ID) and SELECT TOP 1, but this seems a bit inefficient since there would be a subquery for each value. The queries take too long when the table has a few million rows in it.
Is there a way to quickly get the last value for these, perhaps using LAST_VALUE?
You can do it using ROW_NUMBER():
select
type,
value
from
(
select
type,
value,
row_number() over (partition by type order by id desc) as RN
from yourtable -- table name placeholder; the question doesn't name it
) TMP
where RN = 1
Can't test this now since SQL Fiddle doesn't seem to work, but hopefully that's ok.
The most efficient method might be not exists, which uses an anti-join for the underlying operator:
select type, value
from likeso l
where not exists (select 1 from likeso l2 where l2.type = l.type and l2.id > l.id)
For performance, you want an index on likeso(type, id).
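In case it's useful, that index would look something like this (the index name is arbitrary):
create index likeso_type_id on likeso(type, id);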
I really wonder if there is a more efficient solution, but I use the following query for such needs:
Select Id, Type, Value
From ( Select *, Max (Id) Over (Partition By Type) As LastId
From #Table) T
Where Id = LastId