SQL - Set field value based on count of previous rows' values

I have the following table structure in Microsoft SQL:
ID Name Number
1 John
2 John
3 John
4 Mark
5 Mark
6 Anne
7 Anne
8 Luke
9 Rachael
10 Rachael
I am looking to set the 'Number' field to the number of times the 'Name' field has appeared previously, using SQL.
Desired output as follows:
ID Name Number
1 John 1
2 John 2
3 John 3
4 Mark 1
5 Mark 2
6 Anne 1
7 Anne 2
8 Luke 1
9 Rachael 1
10 Rachael 2
The table is ordered by 'Name', so there is no worry of 'John' appearing under ID 11 again, using my example.
Any help would be appreciated. I'm not sure if I can do this with a simple SELECT statement, or whether I will need an UPDATE statement, or something more advanced.

Use ROW_NUMBER:
SELECT ID, Name,
ROW_NUMBER() OVER (PARTITION BY Name
ORDER BY ID) AS Number
FROM mytable
There is no need to add a field for this, as the value can be easily calculated using window functions.
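If you do decide to persist the value in the Number column, one option (a sketch, assuming SQL Server and the mytable name used above) is to update through a CTE that carries the ROW_NUMBER:
;WITH numbered AS (
    SELECT Number,
           ROW_NUMBER() OVER (PARTITION BY Name ORDER BY ID) AS rn
    FROM mytable
)
UPDATE numbered
SET Number = rn;
Updating through the CTE is allowed here because it selects from a single base table.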

You should be able to use the ROW_NUMBER() function in SQL Server to partition each group (by its Name value) and number the individual rows in each partition:
SELECT ID,
Name,
ROW_NUMBER() OVER (PARTITION BY Name ORDER BY ID) AS Number
FROM YourTable
ORDER BY ID

If your system doesn't support OVER (PARTITION BY ...), you can use the following code:
SELECT
    ID,
    Name,
    (
        SELECT SUM(counterTable.nameCount)
        FROM mytable innerTable
        JOIN (SELECT 1 AS nameCount) AS counterTable
        WHERE innerTable.ID <= outerTable.ID
          AND outerTable.Name = innerTable.Name
    ) AS cumulative_sum
FROM mytable outerTable
ORDER BY outerTable.ID
This is the CREATE TABLE statement I used and then filled with your data:
CREATE TABLE `mytable` (
`ID` INT(11) NULL DEFAULT NULL,
`Name` VARCHAR(50) NULL DEFAULT NULL
);
This should work with databases that don't support window functions (OVER (PARTITION BY ...)), such as older versions of MySQL and MariaDB.
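Equivalently, the running count can be written as a plain correlated COUNT(*), which also avoids window functions (a sketch against the mytable definition above):
SELECT o.ID,
       o.Name,
       (SELECT COUNT(*)
        FROM mytable i
        WHERE i.Name = o.Name
          AND i.ID <= o.ID) AS Number
FROM mytable o
ORDER BY o.ID;
Both variants run one subquery per outer row, so an index on (Name, ID) helps on larger tables.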


How to update a column by repositioning the values in a random order

Okay, so this table will work as an example of what I am working with. This table consists of the name of someone and the order they are in compared to others:
NAME ORDER
ZAC  1
JEFF 2
BART 3
KATE 4
My goal is to take the numbers in ORDER and reposition them randomly and update that into the table, keeping the NAME records in the same position that they were in originally.
Example of the desired result:
NAME ORDER
ZAC  3
JEFF 1
BART 4
KATE 2
Using the table above, I have tried the following solutions:
#1
Update TEST_TABLE
Set ORDER = dbms_random.value(1,4);
This resulted in random numbers between 1 and 4 inclusive, but the numbers could repeat, so ORDER could end up with the same number multiple times.
Example of the attempted solution:
NAME ORDER
ZAC  3
JEFF 1
BART 3
KATE 2
#2
Update TEST_TABLE
Set ORDER = (Select dbms_random.value(1,4) From dual);
This resulted in the same random number being copied into each ORDER record, so if the number came out at 3, then it would change them all to 3.
Example of the attempted solution:
NAME ORDER
ZAC  3
JEFF 3
BART 3
KATE 3
This is my first time posting to StackOverflow, and I am relatively new to Oracle, so hopefully I proposed this question properly.
How about this?
Sample data:
SQL> select * from test order by rowid;
NAME C_ORDER
---- ----------
Zac 1
Jeff 2
Bart 3
Kate 4
The table is updated based on the value acquired by the row_number analytic function, which sorts the data randomly; matches are found by the rowid value:
SQL> merge into test a
2 using (with counter (cnt) as
3 (select count(*) from test)
4 select t.rowid rid,
5 row_number() over(order by dbms_random.value(1, c.cnt)) rn
6 from counter c cross join test t
7 ) b
8 on (a.rowid = b.rid)
9 when matched then update set
10 a.c_order = b.rn;
4 rows merged.
Result:
SQL> select * from test order by rowid;
NAME C_ORDER
---- ----------
Zac 3
Jeff 4
Bart 1
Kate 2
SQL>
How about this?
MERGE INTO test d USING
(SELECT rownum AS new_order,
name
FROM (SELECT *
FROM test
ORDER BY dbms_random.value)) s
ON (d.name = s.name)
WHEN matched THEN
UPDATE
SET d.sort_order = s.new_order;
The new order is built by simply sorting the original data by random values and using rownum to number those random records from 1 to N.
I use NAME to match the records, but you should use the primary key or rowid as in Littlefoot's answer, or at least an indexed column that uniquely identifies a row (for speed, when the table contains a lot of data).
The simplest is to sort the data randomly and join on the "name" column:
merge into data dst
using (
select rownum as rn, name from (
select name from data order by dbms_random.value()
)
) src
on (src.name = dst.name)
when matched then
update set ord = src.rn
;
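Whichever MERGE you use, a quick sanity check that the shuffle produced a valid permutation with no duplicates (a sketch against the test table and c_order column from the session above):
-- Any row returned here means the shuffle duplicated a value
SELECT c_order, COUNT(*)
FROM test
GROUP BY c_order
HAVING COUNT(*) > 1;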

Select distinct value and bring only the latest one

I have a table that stores the different statuses of each transaction. Each transaction can have multiple statuses (pending, rejected, approved, etc.).
I need to build a query that returns only the last status of each transaction.
The definition for the table that stores the statuses is:
[dbo].[Cuotas_Estado]
ID int (PK)
IdCuota int (references table dbo.Cuotas - FK)
IdEstado int (references table dbo.Estados - FK)
Running a simple SELECT statement on table dbo.Cuotas_Estado returns every status row:
SELECT
*
FROM [dbo].[Cuotas_Estado] [E]
But the result I need is:
IdCuota | IdEstado
2 | 1
3 | 2
9 | 3
10 | 3
11 | 4
I'm running the following select statement:
SELECT
DISTINCT([E].[IdEstado]),
[E].[IdCuota]
FROM [dbo].[Cuotas_Estado] [E]
ORDER BY
[E].[IdCuota] ASC;
This brings duplicate entries: entry 9 and entry 11 each appear more than once, one row per status. I need the query to return only the latest IdEstado for each of them (3 for entry 9 and 4 for entry 11).
Can you try this? It assumes the table also has a date column, fecha, that orders the statuses:
with cte as (
    select IdEstado, IdCuota,
           row_number() over (partition by IdCuota order by fecha desc) as RowNum
    from [dbo].[Cuotas_Estado]
)
select IdEstado, IdCuota
from cte
where RowNum = 1
You can use a correlated subquery:
SELECT e.*
FROM [dbo].[Cuotas_Estado] e
WHERE e.IdEstado = (SELECT MAX(e2.IdEstado)
FROM [dbo].[Cuotas_Estado] e2
WHERE e2.IdCuota = e.IdCuota
);
With an index on Cuotas_Estado(IdCuota, IdEstado) this is probably the most efficient method.
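For reference, that supporting index could be created like this (just a sketch; the index name is an arbitrary example):
CREATE INDEX IX_Cuotas_Estado_IdCuota_IdEstado
    ON dbo.Cuotas_Estado (IdCuota, IdEstado);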

Complex SQL query or queries

I looked at other examples, but I don't know enough about SQL to adapt it to my needs. I have a table that looks like this:
ID Month NAME COUNT First LAST TOTAL
------------------------------------------------------
1 JAN2013 fred 4
2 MAR2013 fred 5
3 APR2014 fred 1
4 JAN2013 Tom 6
5 MAR2014 Tom 1
6 APR2014 Tom 1
This could be in separate queries, but I need 'First' to equal the first month that a particular name is used, so every row with fred would have JAN2013 in the 'First' field, for example. I need the 'Last' column to equal the month of the last record of each name, and finally I need the 'Total' column to be the sum of all the counts for each name, so in each row that had fred the total would be 10 in this sample data. This is over my head. Can one of you assist?
This is crude but should do the trick. I renamed your fields a bit because you are using a bunch of RESERVED SQL words, and that is bad form.
;WITH cte AS
(
    SELECT
        [NAME]
        ,[nmCOUNT]
        ,[txtMONTH]
        -- note: txtMONTH holds text such as 'JAN2013', so ASC/DESC here sorts
        -- alphabetically; order by a real date column if months must sort chronologically
        ,ROW_NUMBER() OVER (PARTITION BY [NAME] ORDER BY [txtMONTH] ASC)  AS FirstMonth
        ,ROW_NUMBER() OVER (PARTITION BY [NAME] ORDER BY [txtMONTH] DESC) AS LastMonth
        ,SUM([nmCOUNT]) OVER (PARTITION BY [NAME]) AS TotNameCount
    FROM [Table]
)
,cteFirst AS
(
    SELECT
        [NAME]
        ,[nmCOUNT]
        ,[TotNameCount]
        ,[txtMONTH] AS ansFirst
    FROM cte
    WHERE FirstMonth = 1
)
,cteLast AS
(
    SELECT
        [NAME]
        ,[txtMONTH] AS ansLast
    FROM cte
    WHERE LastMonth = 1
)
SELECT c.[NAME], c.nmCOUNT, c.ansFirst, l.ansLast, c.TotNameCount
FROM cteFirst c
LEFT JOIN cteLast l ON c.[NAME] = l.[NAME]
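If you are on SQL Server 2012 or later, a more compact alternative is to compute all three values in a single pass with window functions. This is only a sketch, assuming the renamed columns above and that ID increases in chronological order so it can stand in for the date:
SELECT ID,
       [txtMONTH],
       [NAME],
       [nmCOUNT],
       FIRST_VALUE([txtMONTH]) OVER (PARTITION BY [NAME] ORDER BY ID) AS [First],
       LAST_VALUE([txtMONTH])  OVER (PARTITION BY [NAME] ORDER BY ID
                                     ROWS BETWEEN UNBOUNDED PRECEDING
                                              AND UNBOUNDED FOLLOWING) AS [Last],
       SUM([nmCOUNT]) OVER (PARTITION BY [NAME]) AS Total
FROM [Table];
The ROWS clause on LAST_VALUE is needed because the default window frame stops at the current row.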

Trouble performing Postgres group by non-ID column to get ID containing max value

I'm attempting to perform a GROUP BY on a join table table. The join table essentially looks like:
CREATE TABLE user_foos (
id SERIAL PRIMARY KEY,
user_id INT NOT NULL,
foo_id INT NOT NULL,
effective_at TIMESTAMP NOT NULL
);
ALTER TABLE user_foos
ADD CONSTRAINT user_foos_uniqueness
UNIQUE (user_id, foo_id, effective_at);
I'd like to query this table to find all records where the effective_at is the max value for any pair of user_id, foo_id given. I've tried the following:
SELECT "user_foos"."id",
"user_foos"."user_id",
"user_foos"."foo_id",
max("user_foos"."effective_at")
FROM "user_foos"
GROUP BY "user_foos"."user_id", "user_foos"."foo_id";
Unfortunately, this results in the error:
column "user_foos.id" must appear in the GROUP BY clause or be used in an aggregate function
I understand that the problem relates to "id" not being used in an aggregate function and that the DB doesn't know what to do if it finds multiple records with differing IDs, but I know this could never happen due to my three-column unique constraint across (user_id, foo_id, and effective_at).
To work around this, I also tried a number of other variants such as using the first_value window function on the id:
SELECT first_value("user_foos"."id"),
"user_foos"."user_id",
"user_foos"."foo_id",
max("user_foos"."effective_at")
FROM "user_foos"
GROUP BY "user_foos"."user_id", "user_foos"."foo_id";
and:
SELECT first_value("user_foos"."id")
FROM "user_foos"
GROUP BY "user_foos"."user_id", "user_foos"."foo_id"
HAVING "user_foos"."effective_at" = max("user_foos"."effective_at")
Unfortunately, these both result in a different error:
window function call requires an OVER clause
Ideally, my goal is to fetch ALL matching id's so that I can use it in a subquery to fetch the legitimate full row data from this table for matching records. Can anyone provide insight on how I can get this working?
Postgres has a very nice feature called distinct on, which can be used in this case:
SELECT DISTINCT ON (uf."user_id", uf."foo_id") uf.*
FROM "user_foos" uf
ORDER BY uf."user_id", uf."foo_id", uf."effective_at" DESC;
It returns the first row in a group, based on the values in parentheses. The order by clause needs to include these values as well as a third column for determining which is the first row in the group.
Try:
SELECT *
FROM (
    SELECT t.*,
           row_number() OVER (PARTITION BY user_id, foo_id ORDER BY effective_at DESC) AS x
    FROM user_foos t
) sub
WHERE x = 1
If you don't want to use a subquery based on a composite of all three keys, then you need to create a "dense rank" window function field that orders the subsets of id, user_id and foo_id by effective date, producing a rank order field. Then subquery that and take the records where rank_order = 1. Since the rank ordering was by effective date, you get all fields of the record with the highest effective date for each foo and user.
DATASET (id, user_id, foo_id, effective_at)
1  1  1  01/01/2001
2  1  1  01/01/2002
3  1  1  01/01/2003
4  1  2  01/01/2001
5  2  1  01/01/2001
DATASET WITH RANK ORDER (id, rank_order, user_id, foo_id, effective_at), PARTITIONED BY FOO_ID, USER_ID, ORDERED BY DATE DESC
1  3  1  1  01/01/2001
2  2  1  1  01/01/2002
3  1  1  1  01/01/2003
4  1  1  2  01/01/2001
5  1  2  1  01/01/2001
SELECT * FROM QUERY ABOVE WHERE RANK_ORDER = 1
3  1  1  1  01/01/2003
4  1  1  2  01/01/2001
5  1  2  1  01/01/2001
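If you specifically want the list of winning ids so you can feed them to a subquery that fetches the full rows, as the question mentions, here is a sketch reusing the row_number ranking from above:
SELECT *
FROM user_foos
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               row_number() OVER (PARTITION BY user_id, foo_id
                                  ORDER BY effective_at DESC) AS rn
        FROM user_foos
    ) ranked
    WHERE rn = 1
);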

How to get Unique Records based on multiple columns from a table

Consider the following table:
primaryKey id activity template creator created
1 1 3 5 x 2011-10-13
2 2 4 2 y 2011-10-15
3 2 4 7 z 2011-10-24
4 2 4 7 u 2011-10-29
From here I want to retrieve the records which have unique combinations of id, activity and template. In case two or more records share the same combination of those fields, I want to take the first one of them.
As an example for above table data the output that I need is
primaryKey id activity template creator created
1 1 3 5 x 2011-10-13
2 2 4 2 y 2011-10-15
3 2 4 7 z 2011-10-24
(since records 3 and 4 have the same combination, I want to take just record 3 because it is the first occurrence)
Can I do this using a single SQL statement?
SELECT primarykey, id, activity, template, creator, created FROM (
SELECT *, row_number() OVER (partition BY id, activity, template ORDER BY created) as rn FROM table
) a
WHERE rn = 1
This is for MS SQL Server.
Updated, as I made a little mistake!
SELECT DISTINCT
ROW_NUMBER() OVER (ORDER BY
id
, activity
, template
, creator
, created ) PrimaryKey
, id
, activity
, template
, creator
, created
FROM
[TABLE_NAME]
GROUP BY
id
, activity
, template
, creator
, created
I think this should work -
SELECT *
FROM TABLE
WHERE
primaryKey in
(
SELECT min(primaryKey) from TABLE
group by id, activity, template
)
Here, the distinct combinations of the required columns are first obtained in the inner query by doing a group by. Then the min of the primary key of each distinct combination is used to fetch all the columns in the outer query.
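If "first occurrence" should be driven by the created date rather than by the lowest primary key (the two coincide in the sample data), a hedged variant of the same idea, using the [TABLE_NAME] placeholder from the earlier answer:
SELECT t.*
FROM [TABLE_NAME] t
WHERE t.created = (SELECT MIN(t2.created)
                   FROM [TABLE_NAME] t2
                   WHERE t2.id = t.id
                     AND t2.activity = t.activity
                     AND t2.template = t.template);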