I have a table like this:

userid | points | position
1      | 100    | NULL
2      | 89     | NULL
3      | 107    | NULL

I need a query to update the position column, ordering by points descending. Example result:

userid | points | position
1      | 100    | 2
2      | 89     | 3
3      | 107    | 1
I would not use a physical column that depends on values in other rows; otherwise you have to update the entire table every time one row changes. Use a view or some other mechanism to calculate the position on the fly.
The query to calculate "position" would look something like:
SELECT
    userid,
    points,
    RANK() OVER (ORDER BY points DESC) AS position
FROM {table_name}
However, if you have to make it an UPDATE, then you could use something like:
UPDATE a
SET a.position = b.position
FROM {table_name} a
INNER JOIN
(
    SELECT
        userid,
        RANK() OVER (ORDER BY points DESC) AS position
    FROM {table_name}
) b
ON a.userid = b.userid
but keep in mind that you will need to run the update every time the table changes, so performance may be an issue if it's a decently sized table that gets updated a lot.
Also consider using DENSE_RANK() instead of RANK() if you want the 'position' ranking to increment by exactly 1 as the 'points' change. RANK() will do what you want, but it will create gaps in the number sequence according to how many rows are tied on 'points' (if that's ever the case in your spec).
Refer to this answer for the difference between them.
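To see the difference concretely, here is a small runnable sketch using Python's built-in sqlite3 module (SQLite 3.25+ supports window functions); the scores table and the extra tied user are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scores (userid INTEGER, points INTEGER);
-- userid 4 is added to create a tie on points = 100
INSERT INTO scores VALUES (1, 100), (2, 89), (3, 107), (4, 100);
""")
rows = conn.execute("""
SELECT userid, points,
       RANK()       OVER (ORDER BY points DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY points DESC) AS dense_rnk
FROM scores
ORDER BY points DESC, userid
""").fetchall()
for r in rows:
    print(r)
# (3, 107, 1, 1)
# (1, 100, 2, 2)
# (4, 100, 2, 2)
# (2, 89, 4, 3)   <- RANK skips 3 after the tie; DENSE_RANK does not
```

After the two rows tied at rank 2, RANK() jumps to 4 while DENSE_RANK() continues with 3.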
You can do something like this:
UPDATE t
SET position = t2.position
FROM {table_name} t
JOIN (
    SELECT
        userid,
        points,
        RANK() OVER (ORDER BY points DESC) AS position
    FROM {table_name}
) t2 ON t2.userid = t.userid
So let's say I have a table named Class with the following fields: userid, time, and score. The table looks like this:
+--------+------------+-------+
| userid | time | score |
+--------+------------+-------+
| 1 | 08-20-2018 | 75 |
| 1 | 10-25-2018 | 50 |
| 1 | 02-01-2019 | 88 |
| 2 | 04-23-2019 | 98 |<remove
| 2 | 04-23-2019 | 86 |
| 3 | 06-05-2019 | 71 |<remove
| 3 | 06-05-2019 | 71 |
+--------+------------+-------+
However, I would like to remove records where the userid and the time is the same (since it doesn't make sense for someone to give another score on the same day). This would also take care of the records where the userid, time, and score are the same. So in this table, rows 4 and 6 should be removed.
The following query gives me a list of the duplicated records:
select userid, time
FROM class
GROUP BY userid, time
HAVING count(*)>1;
However, how do I remove the duplicates while still keeping the userid, time, and score column in the outcome?
You can use the row_number() window function to assign a number to each record in the order of score for each userid and time and then select only the rows where this number is equal to one.
SELECT userid,
time,
score
FROM (SELECT userid,
time,
score,
row_number() OVER (PARTITION BY userid,
time
ORDER BY score) rn
FROM class) x
WHERE rn = 1;
First, you need some criterion to distinguish between two rows that have different scores (unless you want to choose between the two at random). E.g., you could pick the highest score (like the SATs) or the lowest.
Assuming you want the highest score per day, you can do this:
SELECT distinct on (userid, time)
userid, time, score
from class
order by userid, time, score desc
Some key things: you have to have the same columns from your distinct on in the left-most positions of your order by, but the magic is in the field that comes next in the order by: it'll pick the first row among dupes of (userid, time) when ordered by score desc.
You have a real problem with your data model. This is easy enough to fix in a select query, as the other answers suggest (I would recommend distinct on for this).
For actually deleting the rows, you can use ctid (as mentioned in a comment). The approach is:
delete from t
where exists (select 1
              from t t2
              where t2.userid = t.userid and t2.time = t.time and
                    t2.ctid < t.ctid
             );
That is, delete any row where there is a smaller ctid for the same userid/time combination.
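As a runnable illustration of the same idea, SQLite's implicit rowid can stand in for Postgres's ctid (this is a sketch using Python's sqlite3; note that which duplicate survives depends only on physical/rowid order, so combine it with row_number() if you need a specific score to win):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE class (userid INTEGER, time TEXT, score INTEGER);
INSERT INTO class VALUES
  (1, '2018-08-20', 75),
  (1, '2018-10-25', 50),
  (2, '2019-04-23', 98),
  (2, '2019-04-23', 86),
  (3, '2019-06-05', 71),
  (3, '2019-06-05', 71);
""")
# Delete any row that shares (userid, time) with a row that has a smaller
# rowid -- the SQLite analogue of the ctid trick. The earliest-inserted
# duplicate survives.
conn.execute("""
DELETE FROM class
WHERE EXISTS (SELECT 1 FROM class c2
              WHERE c2.userid = class.userid
                AND c2.time   = class.time
                AND c2.rowid  < class.rowid)
""")
remaining = conn.execute(
    "SELECT userid, time, score FROM class ORDER BY userid, time").fetchall()
print(remaining)
# [(1, '2018-08-20', 75), (1, '2018-10-25', 50),
#  (2, '2019-04-23', 98), (3, '2019-06-05', 71)]
```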
I have a set of ordered results from a Postgres table, where every group of 4 rows represents a set of related data. I want to process this set of results further, so that every group of 4 rows are collapsed into 1 row with aliased column names where the value for each column is based on that row's position in the group - I'm close, but I can't quite get the query right (nor am I confident that I'm approaching this in the optimal manner). Here's the scenario:
I am collecting survey results - each survey has 4 questions, but each answer is stored in a separate row in the database. However, they are associated with each other by a submission event_id, and the results are guaranteed to be returned in a fixed order. A set of survey_results will look something like:
event_id | answer
----------------------------
a | 10
a | foo
a | 9
a | bar
b | 2
b | baz
b | 4
b | zip
What I would like to be able to do is query this result so that the final output comes out with each set of 4 results on their own line, with aliased column names.
event_id | score_1 | reason_1 | score_2 | reason_2
----------------------------------------------------------
a | 10 | foo | 9 | bar
b | 2 | baz | 4 | zip
The closest that I've been able to get is
SELECT survey_answers.event_id,
(SELECT survey_answers.answer FROM survey_answers FETCH NEXT 1 ROWS ONLY) AS score_1,
(SELECT survey_answers.answer FROM survey_answers OFFSET 1 ROWS FETCH NEXT 1 ROWS ONLY) AS reason_1,
(SELECT survey_answers.answer FROM survey_answers OFFSET 2 ROWS FETCH NEXT 1 ROWS ONLY) AS score_2,
(SELECT survey_answers.answer FROM survey_answers OFFSET 3 ROWS FETCH NEXT 1 ROWS ONLY) AS reason_2
FROM survey_answers
GROUP BY survey_answers.event_id
But this, understandably, returns the correct number of rows, but with the same values (other than event_id):
event_id | score_1 | reason_1 | score_2 | reason_2
----------------------------------------------------------
a | 10 | foo | 9 | bar
b | 10 | foo | 9 | bar
How can I structure my query so that it applies the OFFSET/FETCH behaviors every batch of 4 rows, or, maybe more accurately, within every unique set of event_ids?
demo: db<>fiddle
First of all, this looks like a very bad design:
There is no guaranteed order! Databases store their data in arbitrary order and may return it in arbitrary order. You really need an order column. In this small case it might work by accident.
You should generate two columns, one for score and one for reason. Mixing up the types is not a good idea.
Nevertheless, for this simple and short example this could be a solution (remember this is not recommended for production tables):
WITH data AS (
SELECT
*,
row_number() OVER (PARTITION BY event_id) -- 1
FROM
survey_results
)
SELECT
event_id,
MAX(CASE WHEN row_number = 1 THEN answer END) AS score_1, -- 2
MAX(CASE WHEN row_number = 2 THEN answer END) AS reason_1,
MAX(CASE WHEN row_number = 3 THEN answer END) AS score_2,
MAX(CASE WHEN row_number = 4 THEN answer END) AS reason_2
FROM
data
GROUP BY event_id
The row_number() window function adds a row count for each event_id, in this case from 1 to 4. This can be used to identify the types of answer (see the intermediate step in the fiddle). In production code you should use some order column to ensure the order; then the window function would look like PARTITION BY event_id ORDER BY order_column.
This is a simple pivot on event_id and the type id (row_number), which does exactly what you expect.
You need a column that specifies the ordering. In your case, that should probably be a serial column, which is guaranteed to be increasing for each insert. I would call such a column survey_result_id.
With such a column, you can do:
select event_id,
max(case when seqnum = 1 then answer end) as score_1,
max(case when seqnum = 2 then answer end) as reason_1,
max(case when seqnum = 3 then answer end) as score_2,
max(case when seqnum = 4 then answer end) as reason_2
from (select sr.*,
row_number() over (partition by event_id order by survey_result_id) as seqnum
from survey_results sr
) sr
group by event_id;
Without such a column, you cannot reliably do what you want, because SQL tables represent unordered sets.
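The pivot above can be verified end to end with a small sketch using Python's sqlite3 (the survey_result_id PRIMARY KEY stands in for the assumed ordering column):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE survey_results (
    survey_result_id INTEGER PRIMARY KEY,  -- assumed ordering column
    event_id TEXT, answer TEXT);
INSERT INTO survey_results (event_id, answer) VALUES
  ('a','10'),('a','foo'),('a','9'),('a','bar'),
  ('b','2'),('b','baz'),('b','4'),('b','zip');
""")
# Number the answers 1..4 within each event_id, then pivot with
# conditional aggregation: MAX() ignores the NULLs from non-matching rows.
rows = conn.execute("""
SELECT event_id,
       MAX(CASE WHEN seqnum = 1 THEN answer END) AS score_1,
       MAX(CASE WHEN seqnum = 2 THEN answer END) AS reason_1,
       MAX(CASE WHEN seqnum = 3 THEN answer END) AS score_2,
       MAX(CASE WHEN seqnum = 4 THEN answer END) AS reason_2
FROM (SELECT sr.*,
             ROW_NUMBER() OVER (PARTITION BY event_id
                                ORDER BY survey_result_id) AS seqnum
      FROM survey_results sr) numbered
GROUP BY event_id
ORDER BY event_id
""").fetchall()
print(rows)
# [('a', '10', 'foo', '9', 'bar'), ('b', '2', 'baz', '4', 'zip')]
```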
We are trying to remove and rank data in tables that are provided in a daily feed to our system. The example data of course isn't the actual product, but it clearly represents the concept.
Daily inserts:
data is imported daily into tables that continually update the status of the products
the daily status updates tell us when products were listed, whether they are currently listed, and the last date they were listed
after a period of {X} time, we can normalize the data
Cleanup & ranking:
we are now trying to remove duplicate records for values in a group that fall in-between the first and last values
we also want to set identifiers for the records that represent the first and last occurrence of those unique values in that group
Sample data:
I've found that the photo is the easiest way to show the data, show what's needed and not needed - I hope this makes it easier and not obtuse.
In the sample data:
"ridgerapp" we want to keep the records for 03/12/17 & 06/12/17.
"ridgerapp" we want to delete the records that fall between the dates above.
"ridgerapp" we want to also set/update the records for 03/12/17 & 06/12/17 as the first and last occurrence - something like -
update table set 03/12/17 = 0 (first), 06/12/17 = 1 (last)
"sierra" is just another expanded data sample, and we want to keep the records for 12/06/16 and 12/11/16.
"sierra" delete the records that fall between 12/06/16 and 12/11/16.
"sierra" update the status/rank for the 12/06/16 and 12/11/16 records as the first and last occurrence.
update table set 12/06/16 = 0 (first), 12/11/16 = 1 (last).
Conclusion:
Using pseudo code, this is the overall objective:
select distinct records in table (using id,name,color,value as unique identifiers)
for the records in each group look at the history and find the top and bottom dates
delete records between top and bottom dates for each group
update the history with a status/rank (field name is rank) of 0 and 1 for values in each group
using the sample data, the results would end up as follows
Updated table values:
23 ridgerapp blue 25 03/12/17 0
23 ridgerapp blue 25 06/12/17 1
57 sierra red 15 12/06/16 0
57 sierra red 15 12/11/16 1
I'd use a CTE with the row_number() window function to find the first and last rows for each group, and then update it.
You didn't specify what makes a group a group, so I only based this off the ID. If you want the group to be a set of columns, i.e. ID, Color, and Value, then just add those columns to the partition by list. For the sample data the result would be the same, but different sample data would have different outcomes.
Notice I didn't include the exact rows for the sierra group, because I wanted to show you how it'd handle duplicate history dates.
declare @table table (id int, [name] varchar(64), color varchar(16), [value] int, history date)
insert into @table
values
(23,'ridgerapp','blue',25,'20170312'),
(23,'ridgerapp','blue',25,'20170325'),
(23,'ridgerapp','blue',25,'20170410'),
(23,'ridgerapp','blue',25,'20170610'),
(23,'ridgerapp','blue',25,'20170612'),
(57,'sierra','red',15,'20161206'),
(57,'sierra','red',15,'20161208'),
(57,'sierra','red',15,'20161210'),
(57,'sierra','red',15,'20161210') --notice this is a duplicate row
;with cte as(
select
*
,fst = row_number() over (partition by id order by history asc)
,lst = row_number() over (partition by id order by history desc)
from @table
)
delete from cte
where fst !=1 and lst !=1
select
*
,flag = case when row_number() over (partition by id order by history asc) = 1 then 0 else 1 end
from @table
RETURNS
+----+-----------+-------+-------+------------+------+
| id | name | color | value | history | flag |
+----+-----------+-------+-------+------------+------+
| 23 | ridgerapp | blue | 25 | 2017-03-12 | 0 |
| 23 | ridgerapp | blue | 25 | 2017-06-12 | 1 |
| 57 | sierra | red | 15 | 2016-12-06 | 0 |
| 57 | sierra | red | 15 | 2016-12-10 | 1 |
+----+-----------+-------+-------+------------+------+
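The same keep-first-and-last pattern can be sketched outside SQL Server with Python's sqlite3 (SQLite cannot DELETE through a CTE the way T-SQL can, so the non-first, non-last rowids are collected in a subquery instead; table and column names mirror the sample):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER, name TEXT, color TEXT,
                       value INTEGER, history TEXT);
INSERT INTO products VALUES
  (23,'ridgerapp','blue',25,'2017-03-12'),
  (23,'ridgerapp','blue',25,'2017-03-25'),
  (23,'ridgerapp','blue',25,'2017-04-10'),
  (23,'ridgerapp','blue',25,'2017-06-12'),
  (57,'sierra','red',15,'2016-12-06'),
  (57,'sierra','red',15,'2016-12-08'),
  (57,'sierra','red',15,'2016-12-11');
""")
# Number each group's rows from both ends; a row that is neither
# first (fst = 1) nor last (lst = 1) is an in-between row to delete.
conn.execute("""
DELETE FROM products WHERE rowid IN (
  SELECT rowid FROM (
    SELECT rowid,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY history ASC)  AS fst,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY history DESC) AS lst
    FROM products) x
  WHERE fst <> 1 AND lst <> 1)
""")
# Flag the survivors: 0 for the first occurrence, 1 for the last.
rows = conn.execute("""
SELECT id, name, history,
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY id ORDER BY history) = 1
            THEN 0 ELSE 1 END AS flag
FROM products ORDER BY id, history
""").fetchall()
print(rows)
```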
I have two tables:
Table0:
| ID | TYPE | TIME | SITE |
|----|------|-------|------|
| aa | 1 | 12-18 | 100 |
| aa | 1 | 12-10 | 101 |
| bb | 2 | 12-10 | 102 |
| cc | 1 | 12-09 | 100 |
| cc | 2 | 12-12 | 103 |
| cc | 2 | 12-01 | 109 |
| cc | 1 | 12-07 | 101 |
| dd | 1 | 12-08 | 100 |
and
Table1:
| ID |
|----|
| aa |
| cc |
| cc |
| dd |
| dd |
I'm trying to output results where:
ID must exist in both tables.
TYPE must be the maximum for each ID.
TIME must be the minimum value for the maximum TYPE for each ID.
SITE should be the value from the same row as the minimum TIME value.
Given my sample data, my results should look like this:
| ID | TYPE | TIME | SITE |
|----|------|-------|------|
| aa | 1 | 12-10 | 101 |
| cc | 2 | 12-01 | 109 |
| dd | 1 | 12-08 | 100 |
I've tried these statements:
INSERT INTO "NuTable"
SELECT DISTINCT(QTS."ID"), "SITE",
CASE WHEN MAS.MAB=1 THEN 'B'
WHEN MAS.MAB=2 THEN 'F'
ELSE NULL END,
"TIME"
FROM (SELECT DISTINCT("ID") FROM TABLE1) AS QTS,
TABLE0 AS MA,
(SELECT "ID", MAX("TYPE") AS MASTY, MIN("TIME") AS MASTM
FROM TABLE0
GROUP BY "ID") AS MAS,
WHERE QTS."ID" = MA."ID"
AND QTS."ID" = MAS."ID"
AND MSD.MASTY =MA."TYPE"
...which generates a syntax error
INSERT INTO "NuTable"
SELECT DISTINCT(QTS."ID"), "SITE",
CASE WHEN MAS.MAB=1 THEN 'B'
WHEN MAS.MAB=2 THEN 'F'
ELSE NULL END,
"TIME"
FROM (SELECT DISTINCT("ID") FROM TABLE1) AS QTS,
TABLE0 AS MA,
(SELECT "ID", MAX("TYPE") AS MAB
FROM TABLE0
GROUP BY "ID") AS MAS,
((SELECT "ID", MIN("TIME") AS MACTM, MIN("TYPE") AS MACTY
FROM TABLE0
WHERE "TYPE" = 1
GROUP BY "ID")
UNION
(SELECT "ID", MIN("TIME"), MAX("TYPE")
FROM TABLE0
WHERE "TYPE" = 2
GROUP BY "ID")) AS MACU
WHERE QTS."ID" = MA."ID"
AND QTS."ID" = MAS."ID"
AND MACU."ID" = QTS."ID"
AND MA."TIME" = MACU.MACTM
AND MA."TYPE" = MACU.MACTB
... which is getting the wrong results.
Answering your direct question "how to avoid...":
You get this error when you specify a column in the SELECT part of a statement that isn't present in the GROUP BY section and isn't wrapped in an aggregating function like MAX, MIN, or AVG.
With your data, I cannot say:
SELECT
ID, site, min(time)
FROM
table
GROUP BY
id
Here I haven't said what to do with SITE; it must either be a key of the group (in which case I'll get every unique combination of ID, site, and the min time in each) or it should be aggregated (e.g. max site per ID).
These are ok:
SELECT
ID, max(site), min(time)
FROM
table
GROUP BY
id
SELECT
ID, site, min(time)
FROM
table
GROUP BY
id,site
You cannot simply leave unspecified what to do with it: what should the database return in such a case? (If you're still struggling, tell me in the comments what you think the db should do, and I'll better understand your thinking so I can tell you why it can't do that.) The programmer of the database cannot make this decision for you; you must make it.
Usually people ask this when they want to identify:
The min time per ID, and get all the other row data as well. eg "What is the full earliest record data for each id?"
In this case you have to write a query that identifies the min time per id and then join that subquery back to the main data table on id=id and time=mintime. The db runs the subquery, builds a list of min time per id, then that effectively becomes a filter of the main data table
SELECT * FROM
(
SELECT
ID, min(time) as mintime
FROM
table
GROUP BY
id
) findmin
INNER JOIN table t ON t.id = findmin.id and t.time = findmin.mintime
What you cannot do is start putting the other data you want into the query that does the grouping, because you either have to group by the columns you add (which makes the group more fine-grained - not what you want) or you have to aggregate them (and then a value doesn't necessarily come from the same row as the other aggregated columns: min time is from row 1, min site is from row 3 - not what you want).
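The join-back pattern above can be demonstrated with a runnable sketch using Python's sqlite3 (the sample rows are a subset of the question's table0):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id TEXT, type INTEGER, time TEXT, site INTEGER);
INSERT INTO t VALUES
  ('aa',1,'12-18',100),('aa',1,'12-10',101),
  ('cc',1,'12-09',100),('cc',2,'12-12',103),
  ('cc',2,'12-01',109),('cc',1,'12-07',101);
""")
# Step 1: build the list of min time per id (the subquery/filter).
# Step 2: join it back on id AND time to recover the full matching row.
rows = conn.execute("""
SELECT t.*
FROM (SELECT id, MIN(time) AS mintime FROM t GROUP BY id) findmin
JOIN t ON t.id = findmin.id AND t.time = findmin.mintime
ORDER BY t.id
""").fetchall()
print(rows)
# [('aa', 1, '12-10', 101), ('cc', 2, '12-01', 109)]
```

Each output row carries the site and type that actually belong to the min-time row, which grouping alone could not guarantee.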
Looking at your actual problem:
The ID value must exist in both tables.
The Type value must be the largest in each id group.
The Time value must be the smallest within that largest-type group.
Leaving out a solution that involves HAVING or analytic functions for now, so you can get to grips with the theory here:
You need to find the max type per id, join it back to the table to get the other relevant data (time is needed) for that id/maxtype, and then, on this new filtered data set, take the id and min time.
SELECT t.id,min(t.time) FROM
(
SELECT
ID, max(type) as maxtype
FROM
table
GROUP BY
id
) findmax
INNER JOIN table t ON t.id = findmax.id and t.type = findmax.maxtype
GROUP BY t.id
If you can't see why, let me know
demo:db<>fiddle
SELECT DISTINCT ON (t0.id)
t0.id,
type,
time,
first_value(site) OVER (PARTITION BY t0.id ORDER BY time) as site
FROM table0 t0
JOIN table1 t1 ON t0.id = t1.id
ORDER BY t0.id, type DESC, time
ID must exist in both tables
This can be achieved by joining both tables against their ids. The result of inner joins are rows that exist in both tables.
SITE should be the value from the same row as the minimum TIME value.
This is the same as "Give me the first value of each group of ids ordered by time". This can be done by using the first_value() window function. Window functions can group your data set (PARTITION BY), so you are getting groups of ids which can be ordered separately. first_value() gives the first value of these ordered groups.
TYPE must be the maximum for each ID.
To get the maximum type per id you'll first have to ORDER BY id, type DESC. You are getting the maximum type as first row per id...
TIME must be the minimum value for the maximum TYPE for each ID.
... Then you can order this result by time additionally to assure this condition.
Now you have an ordered data set: For each id, the row with the maximum type and its minimum time is the first one.
DISTINCT ON gives you exactly the first row of each group. In this case the group you defined is (id). The result is your expected one.
I would write this using distinct on and in/exists:
select distinct on (t0.id) t0.*
from table0 t0
where exists (select 1 from table1 t1 where t1.id = t0.id)
order by t0.id, type desc, time asc;
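Since DISTINCT ON is Postgres-specific, here is a portable sketch of the same result using ROW_NUMBER() in Python's sqlite3, with the question's sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table0 (id TEXT, type INTEGER, time TEXT, site INTEGER);
CREATE TABLE table1 (id TEXT);
INSERT INTO table0 VALUES
  ('aa',1,'12-18',100),('aa',1,'12-10',101),('bb',2,'12-10',102),
  ('cc',1,'12-09',100),('cc',2,'12-12',103),('cc',2,'12-01',109),
  ('cc',1,'12-07',101),('dd',1,'12-08',100);
INSERT INTO table1 VALUES ('aa'),('cc'),('cc'),('dd'),('dd');
""")
# Emulate DISTINCT ON (id): order each id's rows by type DESC, time ASC,
# so the row with the max type and, within it, the min time is numbered 1;
# EXISTS keeps only ids present in table1 (this excludes 'bb').
rows = conn.execute("""
SELECT id, type, time, site
FROM (SELECT t0.*,
             ROW_NUMBER() OVER (PARTITION BY t0.id
                                ORDER BY t0.type DESC, t0.time) AS rn
      FROM table0 t0
      WHERE EXISTS (SELECT 1 FROM table1 t1 WHERE t1.id = t0.id)) ranked
WHERE rn = 1
ORDER BY id
""").fetchall()
print(rows)
# [('aa', 1, '12-10', 101), ('cc', 2, '12-01', 109), ('dd', 1, '12-08', 100)]
```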
I have an Azure SQL Database table which is filled by importing XML-files.
The order of the files is random so I could get something like this:
ID | Name | DateFile | IsCorrection | Period | Other data
1 | Mr. A | March, 1 | false | 3 | Foo
20 | Mr. A | March, 1 | true | 2 | Foo
13 | Mr. A | Apr, 3 | true | 2 | Foo
4 | Mr. B | Feb, 1 | false | 2 | Foo
This table is joined with another table, which is also joined with a 3rd table.
I need to get the join of these 3 tables for the person with the newest data, based on Period, DateFile and Correction.
In my above example, Id=1 is the original data for Period 3, I need this record.
But in the same file was also a correction for Period 2 (Id=20) and in the file of April, the data was corrected again (Id=13).
So for Period 3, I need Id=1, for Period 2 I need Id=13 because it has the last corrected data and I need Id=4 because it is another person.
I would like to do this in a view, but using a stored procedure would not be a problem.
I have no idea how to solve this. Any pointers will be much appreciated.
EDIT:
My datamodel is of course much more complex than this sample. DateFile and Period are DateTime types in the table. Actually Period is two DateTime columns: StartPeriod and EndPeriod.
Well, looking at your data, I believe we can disregard the IsCorrection column and just pick the latest row for each user/period.
Let's start by ordering the rows, placing the latest on top:
SELECT ROW_NUMBER() OVER (PARTITION BY Period, Name ORDER by DateFile DESC), *
And from this result you select all with row number 1:
;with numberedRows as (
SELECT ROW_NUMBER() OVER (PARTITION BY Period, Name ORDER by DateFile DESC) as rowIndex, *
FROM {table_name}
)
select * from numberedRows where rowIndex=1
The PARTITION BY tells ROW_NUMBER() to reset the counter whenever it encounters a change in the columns Period and Name. The ORDER BY tells ROW_NUMBER() that we want the newest row to be number 1 and older rows afterwards. We only need the latest row.
The WITH declares a "common table expression", which is a kind of subquery or temporary table.
Not knowing your exact data, I might recommend something wrong, but you should be able to join the last query with your other tables to get your desired result.
Something like:
;with numberedRows as (
SELECT ROW_NUMBER() OVER (PARTITION BY Period, Name ORDER by DateFile DESC) as rowIndex, *
FROM {table_name}
)
select * from numberedRows a
JOIN periods b on b.empId = a.Id
JOIN msg c on b.msgId = c.Id
where a.rowIndex=1
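The core latest-row-per-group step can be checked with a runnable sketch in Python's sqlite3 (the reports table name and the translated dates are invented; the IDs and expected survivors match the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reports (ID INTEGER, Name TEXT, DateFile TEXT,
                      IsCorrection INTEGER, Period INTEGER);
INSERT INTO reports VALUES
  (1, 'Mr. A', '2019-03-01', 0, 3),
  (20, 'Mr. A', '2019-03-01', 1, 2),
  (13, 'Mr. A', '2019-04-03', 1, 2),
  (4, 'Mr. B', '2019-02-01', 0, 2);
""")
# Number rows per (Period, Name), newest DateFile first, then keep row 1:
# the latest record for each person/period.
rows = conn.execute("""
WITH numberedRows AS (
  SELECT ROW_NUMBER() OVER (PARTITION BY Period, Name
                            ORDER BY DateFile DESC) AS rowIndex, *
  FROM reports
)
SELECT ID, Name, Period FROM numberedRows
WHERE rowIndex = 1
ORDER BY ID
""").fetchall()
print(rows)
# [(1, 'Mr. A', 3), (4, 'Mr. B', 2), (13, 'Mr. A', 2)]
```

As expected, Id 1 survives for Period 3, Id 13 (the latest correction) for Mr. A's Period 2, and Id 4 for Mr. B.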