SQL: Shifting 2 columns a row down with rollover

I am currently trying to manipulate a table in SQL Server.
The following table is an example:
+----+-----------+------------------+-------------+------------------+
| Id | PrimaryId | PrimaryMail      | SecondaryId | SecondaryMail    |
+----+-----------+------------------+-------------+------------------+
| 1  | 1         | email1#something | 5           | email2#something |
| 2  | 2         | email3#something | 6           | email4#something |
| 3  | 3         | email5#something | 7           | email6#something |
| 4  | 4         | email7#something | 8           | email8#something |
+----+-----------+------------------+-------------+------------------+
I want it to become the following:
+----+-----------+------------------+-------------+------------------+
| Id | PrimaryId | PrimaryMail      | SecondaryId | SecondaryMail    |
+----+-----------+------------------+-------------+------------------+
| 1  | 1         | email1#something | 6           | email4#something |
| 2  | 2         | email3#something | 7           | email6#something |
| 3  | 3         | email5#something | 8           | email8#something |
| 4  | 4         | email7#something | 5           | email2#something |
+----+-----------+------------------+-------------+------------------+
I found a similar question on here, but that person did not want to use joins, and the real thing I am after is the rollover mechanic: depending on which direction you shift the columns, either the first row becomes the last row or the last row becomes the first.
I already have the following code, which makes the items in those columns move up by one row. But the last row currently stays the same, and I lose the data from the first row.
--Calculate the height of the table
DECLARE @CouplesCount AS INT;
SELECT @CouplesCount = COUNT(*) FROM Couples;

--Shift the Secondary users up one row
UPDATE c1
SET c1.SecondaryId = c2.SecondaryId, c1.SecondaryMail = c2.SecondaryMail
FROM Couples c1
JOIN Couples c2
    ON c2.Id = c1.Id + 1
WHERE c1.Id <= @CouplesCount - 1;

SELECT * FROM Couples;
Thanks in advance to anyone who can help me.
P.S. I cannot use MySQL code.

If the version of SQL Server you use is SQL Server 2012 or later, you can use the window functions LEAD and FIRST_VALUE.
LEAD returns the values from the "next" row (as defined by the ORDER BY clause). It returns NULL for the last row, because there is no "next" row there.
So, for the last row we need to return the first row instead, which can be done with FIRST_VALUE.
Sample data
DECLARE @Couples TABLE (
    Id int, PrimaryId int, PrimaryMail nvarchar(100),
    SecondaryId int, SecondaryMail nvarchar(100));

INSERT INTO @Couples VALUES
(1, 1, 'email1#something', 5, 'email2#something'),
(2, 2, 'email3#something', 6, 'email4#something'),
(3, 3, 'email5#something', 7, 'email6#something'),
(4, 4, 'email7#something', 8, 'email8#something');
Query
SELECT
    C.Id
    ,C.PrimaryId
    ,C.PrimaryMail
    ,ISNULL(LEAD(C.SecondaryId) OVER (ORDER BY C.Id),
        FIRST_VALUE(C.SecondaryId) OVER (ORDER BY C.Id))
        AS SecondaryId
    ,ISNULL(LEAD(C.SecondaryMail) OVER (ORDER BY C.Id),
        FIRST_VALUE(C.SecondaryMail) OVER (ORDER BY C.Id))
        AS SecondaryMail
FROM
    @Couples AS C
;
Result
+----+-----------+------------------+-------------+------------------+
| Id | PrimaryId | PrimaryMail | SecondaryId | SecondaryMail |
+----+-----------+------------------+-------------+------------------+
| 1 | 1 | email1#something | 6 | email4#something |
| 2 | 2 | email3#something | 7 | email6#something |
| 3 | 3 | email5#something | 8 | email8#something |
| 4 | 4 | email7#something | 5 | email2#something |
+----+-----------+------------------+-------------+------------------+
Update
If you need to actually change the table itself, that is easy to do as well: just wrap the query in a CTE and update the CTE.
WITH
CTE
AS
(
    SELECT
        C.Id
        ,C.PrimaryId
        ,C.PrimaryMail
        ,C.SecondaryId
        ,C.SecondaryMail
        ,ISNULL(LEAD(C.SecondaryId) OVER (ORDER BY C.Id),
            FIRST_VALUE(C.SecondaryId) OVER (ORDER BY C.Id))
            AS NewSecondaryId
        ,ISNULL(LEAD(C.SecondaryMail) OVER (ORDER BY C.Id),
            FIRST_VALUE(C.SecondaryMail) OVER (ORDER BY C.Id))
            AS NewSecondaryMail
    FROM
        @Couples AS C
)
UPDATE CTE
SET
    SecondaryId = NewSecondaryId
    ,SecondaryMail = NewSecondaryMail
;

SELECT * FROM @Couples ORDER BY Id;
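As an aside (my addition, not part of the answer above): if the shift distance ever needs to be more than one row, the same rollover can be written with ROW_NUMBER and modulo arithmetic. A rough sketch against the same @Couples data, where @Shift is a hypothetical parameter:

DECLARE @Shift int = 2;  -- hypothetical: how many rows to rotate the Secondary columns

WITH Numbered AS (
    SELECT *,
        ROW_NUMBER() OVER (ORDER BY Id) AS rn,
        COUNT(*) OVER () AS cnt
    FROM @Couples
)
SELECT
    n1.Id, n1.PrimaryId, n1.PrimaryMail,
    n2.SecondaryId, n2.SecondaryMail
FROM Numbered n1
JOIN Numbered n2
    -- wrap around: row rn takes the Secondary values of row ((rn - 1 + @Shift) mod cnt) + 1
    ON n2.rn = ((n1.rn - 1 + @Shift) % n1.cnt) + 1;

With @Shift = 1 this reproduces the LEAD/FIRST_VALUE output above.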

Related

How to pivot column data into a row where a maximum qty total cannot be exceeded?

Introduction:
I have come across an unexpected challenge. I'm hoping someone can help, and I am interested in the best method for manipulating the data to deal with this problem.
Scenario:
I need to combine column data associated to two different ID columns. Each row that I have associates an item_id and the quantity for this item_id. Please see below for an example.
+-------+-------+-------+---+
|cust_id|pack_id|item_id|qty|
+-------+-------+-------+---+
| 1 | A | 1 | 1 |
| 1 | A | 2 | 1 |
| 1 | A | 3 | 4 |
| 1 | A | 4 | 0 |
| 1 | A | 5 | 0 |
+-------+-------+-------+---+
I need to manipulate the data shown above so that the 24 rows (for 24 item_ids) are combined into a single row. In the example above I have chosen 5 items to keep things simple. The desired selection format, assuming 5 item_ids, can be seen below.
+---------+---------+---+---+---+---+---+
| cust_id | pack_id | 1 | 2 | 3 | 4 | 5 |
+---------+---------+---+---+---+---+---+
| 1 | A | 1 | 1 | 4 | 0 | 0 |
+---------+---------+---+---+---+---+---+
However, here's the condition that is making this troublesome. The maximum total quantity for each row must not exceed 5. If the total quantity exceeds 5 a new row associated to the cust_id and pack_id must be created for the rest of the item_id quantities. Please see below for the desired output.
+---------+---------+---+---+---+---+---+
| cust_id | pack_id | 1 | 2 | 3 | 4 | 5 |
+---------+---------+---+---+---+---+---+
| 1 | A | 1 | 1 | 3 | 0 | 0 |
| 1 | A | 0 | 0 | 1 | 0 | 0 |
+---------+---------+---+---+---+---+---+
Notice how the quantities of item_ids 1, 2 and 3 summed together equal 6, which exceeds the maximum total quantity of 5 per row. The remainder is carried into the second row; in this case only item_id 3 has a single unit left over.
Note that if a second row needs to be created, the total quantity displayed in that row also cannot exceed 5. There is a known item_id limit of 24, but there is no known limit on the quantity associated with each item_id.
Here's an approach which comes a bit from left field.
One approach would have been a recursive CTE, building the rows one by one.
Instead, I've taken an approach where I:
1. Create a new (virtual) table with 1 row per item unit (so if there are 6 units, there will be 6 rows)
2. Group those rows into batches of 5 (I've called these rn_batch)
3. Pivot those (based on counts per item per rn_batch)
For these, the processing is relatively simple:
Creating one row per unit is done with an INNER JOIN to a numbers table, with n <= the relevant quantity.
The grouping then just assigns rn_batch = 1 to the first 5 units, rn_batch = 2 to the next 5 units, etc., until there are no units left for that order (based on cust_id/pack_id).
Here is the code:
/* Data setup */
CREATE TABLE #Order (cust_id int, pack_id varchar(1), item_id int, qty int, PRIMARY KEY (cust_id, pack_id, item_id))
INSERT INTO #Order (cust_id, pack_id, item_id, qty) VALUES
(1, 'A', 1, 1),
(1, 'A', 2, 1),
(1, 'A', 3, 4),
(1, 'A', 4, 0),
(1, 'A', 5, 0);
/* Pivot results */
WITH Nums(n) AS
(SELECT (c * 100) + (b * 10) + (a) + 1 AS n
FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) A(a)
CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) B(b)
CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) C(c)
),
ItemBatches AS
(SELECT cust_id, pack_id, item_id,
FLOOR((ROW_NUMBER() OVER (PARTITION BY cust_id, pack_id ORDER BY item_id, N.n)-1) / 5) + 1 AS rn_batch
FROM #Order O
INNER JOIN Nums N ON N.n <= O.qty
)
SELECT *
FROM (SELECT cust_id, pack_id, rn_batch, 'Item_' + LTRIM(STR(item_id)) AS item_desc
FROM ItemBatches
) src
PIVOT
(COUNT(item_desc) FOR item_desc IN ([Item_1], [Item_2], [Item_3], [Item_4], [Item_5])) pvt
ORDER BY cust_id, pack_id, rn_batch;
And here are the results:
cust_id  pack_id  rn_batch  Item_1  Item_2  Item_3  Item_4  Item_5
1        A        1         1       1       3       0       0
1        A        2         0       0       1       0       0
Here's a db<>fiddle with additional data in the #Order table, the answer above, and the processing with each step separated.
Notes
This approach (with the virtual numbers table) assumes a maximum quantity of 1,000 for a given item in an order. If you need more, you can easily extend that numbers table by adding additional CROSS JOINs, as sketched below.
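For instance, a sketch of one extra tier (my illustration), taking the range to 10,000:

WITH Nums(n) AS
(SELECT (d * 1000) + (c * 100) + (b * 10) + (a) + 1 AS n
FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) A(a)
CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) B(b)
CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) C(c)
CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) D(d)
)
SELECT COUNT(*) AS total FROM Nums;  -- 10,000 rows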
While I am in awe of the coders who made SQL Server and how it determines execution plans in milliseconds, for larger datasets I give SQL Server zero chance of accurately predicting how many rows will be in each step. As such, for performance, it may work better to split the code up into parts (including temp tables), similar to the db<>fiddle example.
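A rough sketch of that split, with an assumed temp table name #ItemBatches (not in the original answer): materialize the per-unit rows first, so the later PIVOT runs against real row counts and statistics.

WITH Nums(n) AS
(SELECT (c * 100) + (b * 10) + (a) + 1 AS n
FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) A(a)
CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) B(b)
CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) C(c)
)
SELECT cust_id, pack_id, item_id,
    FLOOR((ROW_NUMBER() OVER (PARTITION BY cust_id, pack_id ORDER BY item_id, N.n) - 1) / 5) + 1 AS rn_batch
INTO #ItemBatches  -- hypothetical temp table
FROM #Order O
INNER JOIN Nums N ON N.n <= O.qty;

-- The PIVOT step can then select FROM #ItemBatches instead of the CTE.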

PostgreSQL CTE UPDATE-FROM query skips rows

2 tables
table_1 rows (NOTE: id 2 has two rows):
-----------------------
| id | counts | track |
-----------------------
| 1 | 10 | 1 |
| 2 | 10 | 2 |
| 2 | 10 | 3 |
-----------------------
table_2 rows
---------------
| id | counts |
---------------
| 1 | 0 |
| 2 | 0 |
---------------
Query:
with t1_rows as (
select id, sum(counts) as counts, track
from table_1
group by id, track
)
update table_2 set counts = (coalesce(table_2.counts, 0) + t1.counts)::float
from t1_rows t1
where table_2.id = t1.id;
select * from table_2;
When I ran the above query, I got this table_2 output:
---------------
| id | counts |
---------------
| 1 | 10 |
| 2 | 10 | (expected counts as 20 but got 10)
---------------
I noticed that the above update query considers only the first match and skips the rest.
I can make it work by changing the query as below. Now table_2 updates as expected, since there are no duplicate ids coming from table_1.
But I would like to know why my previous query is not working. Is there anything wrong with it?
with t1_rows as (
select id, sum(counts) as counts, array_agg(track) as track
from table_1
group by id
)
update table_2 set counts = (coalesce(table_2.counts, 0) + t1.counts)::float
from t1_rows t1
where table_2.id = t1.id;
Schema
CREATE TABLE IF NOT EXISTS table_1(
id varchar not null,
counts integer not null,
track integer not null
);
CREATE TABLE IF NOT EXISTS table_2(
id varchar not null,
counts integer not null
);
insert into table_1(id, counts, track) values('1', 10, 1), ('2', 10, 2), ('2', 10, 3);
insert into table_2(id, counts) values('1', 0), ('2', 0);
The problem is that an UPDATE in PostgreSQL creates a new version of the row rather than changing the row in place, but the new row version is not visible in the snapshot of the current query. So from the point of view of the query, the row “vanishes” when it is updated the first time.
The documentation says:
When a FROM clause is present, what essentially happens is that the target table is joined to the tables mentioned in the from_list, and each output row of the join represents an update operation for the target table. When using FROM you should ensure that the join produces at most one output row for each row to be modified. In other words, a target row shouldn't join to more than one row from the other table(s). If it does, then only one of the join rows will be used to update the target row, but which one will be used is not readily predictable.
So if I read your question correctly, you expected rows 2 and 3 from table_1 to get added together? If so, the reason your first approach didn't work is that it grouped by id, track.
Since rows 2 and 3 have different values in the track column, they didn't get added together by the GROUP BY clause.
Your second approach worked because it grouped only by id.
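As a general sanity check (my addition, using the schema above), you can detect this situation before running UPDATE ... FROM by counting how many source rows would match each target id; any id returned here would be updated unpredictably:

with t1_rows as (
    select id, sum(counts) as counts, track
    from table_1
    group by id, track
)
select id, count(*) as matching_rows
from t1_rows
group by id
having count(*) > 1;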

Counting the total number of rows with SELECT DISTINCT ON without using a subquery

I have been performing some queries using PostgreSQL's SELECT DISTINCT ON syntax. I would like the query to return the total number of rows alongside every result row.
Assume I have a table my_table like the following:
CREATE TABLE my_table(
id int,
my_field text,
id_reference bigint
);
I then have a couple of values:
id | my_field | id_reference
----+----------+--------------
1 | a | 1
1 | b | 2
2 | a | 3
2 | c | 4
3 | x | 5
Basically my_table contains some versioned data. The id_reference is a reference to a global version of the database. Every change to the database increases the global version number; changes always add new rows to the tables (instead of updating or deleting values) and record the new version number.
My goal is to perform a query that will only retrieve the latest values in the table, alongside with the total number of rows.
For example, in the above case I would like to retrieve the following output:
| total | id | my_field | id_reference |
+-------+----+----------+--------------+
| 3 | 1 | b | 2 |
+-------+----+----------+--------------+
| 3 | 2 | c | 4 |
+-------+----+----------+--------------+
| 3 | 3 | x | 5 |
+-------+----+----------+--------------+
My attempt is the following:
select distinct on (id)
count(*) over () as total,
*
from my_table
order by id, id_reference desc
This returns almost the correct output, except that total is the number of rows in my_table instead of being the number of rows of the resulting query:
total | id | my_field | id_reference
-------+----+----------+--------------
5 | 1 | b | 2
5 | 2 | c | 4
5 | 3 | x | 5
(3 rows)
As you can see it has 5 instead of the expected 3.
I can fix this by using a subquery and count as an aggregate function:
with my_values as (
select distinct on (id)
*
from my_table
order by id, id_reference desc
)
select count(*) over (), * from my_values
Which produces my expected output.
My question: is there a way to avoid using this subquery and have something similar to count(*) over () return the result I want?
You are looking at my_table 3 ways:
to find the latest id_reference for each id
to find my_field for the latest id_reference for each id
to count the distinct number of ids in the table
I therefore prefer this solution:
select
c.id_count as total,
a.id,
a.my_field,
b.max_id_reference
from
my_table a
join
(
select
id,
max(id_reference) as max_id_reference
from
my_table
group by
id
) b
on
a.id = b.id and
a.id_reference = b.max_id_reference
join
(
select
count(distinct id) as id_count
from
my_table
) c
on true;
This is a bit longer (especially with the long, thin way I write SQL), but it makes it clear what is happening. If you come back to it in a few months' time (somebody usually does), it will take less time to understand what is going on.
The "on true" at the end is a deliberate Cartesian product: there can only ever be exactly one result row from subquery "c", and you do want a Cartesian product with that.
There is nothing necessarily wrong with subqueries.

SQL Query Count and Label if first

I'm using SQL Server Management Studio 2010.
I need my query to populate a column if the record contains the first occurrence of a value.
The table my query returns is huge, so I'll just use pretend columns to get my point across. My query currently returns a table that looks like this:
| ROW | ItemNumber | DateOpen | Status |
| 1 | 10045 | 5/5/2005 | Open |
| 2 | 10045 | 5/5/2005 | Open |
| 3 | 10046 | 5/5/2005 | Open |
| 4 | 10046 | 5/5/2005 | Open |
| 5 | 10046 | 5/5/2005 | Open |
I've already added the row indicator to the query, thinking it would help identify the first occurrence of an ItemNumber. I need a new column that marks an X if the record is the first occurrence.
I have this so far:
DECLARE @ItemData TABLE (itemRow BIGINT, itemNumber BIGINT, DateOpen VARCHAR(15), status VARCHAR(15))

INSERT INTO @ItemData (itemRow, itemNumber, DateOpen, status)
SELECT Row_Number() OVER(ORDER BY Cm_ItemNumber) AS 'ROW'
    ,Cm_ItemNumber AS ItemNumber
    ,Dates_DateOpen AS DateOpen
    ,St_Status AS Status
FROM db_Items
JOIN db_Dates ON Dates_Item = Cm_ItemID
JOIN db_Status ON St_ID = Cm_StatusID

SELECT * FROM @ItemData
The reason it's thrown into a table and then selected separately is that there's actually a union and a lot more stuff in the query. When I realized I needed a uniqueness check added as a column, I figured the easiest way would be an "after the fact" approach, done in the SELECT * FROM @ItemData portion.
I haven't tested this, but it's along the lines of what I'd be playing with...
Using your existing query as the first of two CTEs:
With AllData as
(
SELECT Row_Number() OVER(ORDER BY Cm_ItemNumber) AS 'ROW'
,Cm_ItemNumber AS ItemNumber
,Dates_DateOpen AS DateOpen
,St_Status AS Status
FROM db_Items
JOIN db_Dates ON Dates_Item = Cm_ItemID
JOIN db_Status ON St_ID = Cm_StatusID
),
FirstRows as
(
SELECT Min(ROW) as Row, ItemNumber
FROM AllData
GROUP BY ItemNumber
)
SELECT
ad.*,
Case When fr.Row IS NULL then '' else 'X' end as X_Col
FROM AllData ad LEFT JOIN FirstRows fr
ON ad.ROW=fr.Row
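For what it's worth, an untested alternative sketch (my variation, same assumed column names): a ROW_NUMBER partitioned by ItemNumber can mark the first occurrence directly, without the second CTE:

With AllData as
(
    SELECT Row_Number() OVER(ORDER BY Cm_ItemNumber) AS 'ROW'
        ,Cm_ItemNumber AS ItemNumber
        ,Dates_DateOpen AS DateOpen
        ,St_Status AS Status
    FROM db_Items
    JOIN db_Dates ON Dates_Item = Cm_ItemID
    JOIN db_Status ON St_ID = Cm_StatusID
)
SELECT
    ad.*,
    -- first row within each ItemNumber gets the X
    Case When Row_Number() OVER (PARTITION BY ad.ItemNumber ORDER BY ad.[ROW]) = 1
         Then 'X' Else '' End as X_Col
FROM AllData ad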

SQL: Find rows where field value differs

I have a database table structured like this (irrelevant fields omitted for brevity):
rankings
------------------
(PK) indicator_id
(PK) alternative_id
(PK) analysis_id
rank
All fields are integers; the first three (labeled "(PK)") are a composite primary key. A given "analysis" has multiple "alternatives", each of which will have a "rank" for each of many "indicators".
I'm looking for an efficient way to compare an arbitrary number of analyses whose ranks for any alternative/indicator combination differ. So, for example, if we have this data:
analysis_id | alternative_id | indicator_id | rank
----------------------------------------------------
1 | 1 | 1 | 4
1 | 1 | 2 | 6
1 | 2 | 1 | 3
1 | 2 | 2 | 9
2 | 1 | 1 | 4
2 | 1 | 2 | 7
2 | 2 | 1 | 4
2 | 2 | 2 | 9
...then the ideal method would identify the following differences:
analysis_id | alternative_id | indicator_id | rank
----------------------------------------------------
1 | 1 | 2 | 6
2 | 1 | 2 | 7
1 | 2 | 1 | 3
2 | 2 | 1 | 4
I came up with a query that does what I want for 2 analysis IDs, but I'm having trouble generalizing it to find differences between an arbitrary number of analysis IDs (i.e. the user might want to compare 2, or 5, or 9, or whatever, and find any rows where at least one analysis differs from any of the others). My query is:
declare @analysisId1 int, @analysisId2 int;
select @analysisId1 = 1, @analysisId2 = 2;

select
    r1.indicator_id,
    r1.alternative_id,
    r1.[rank] as Analysis1Rank,
    r2.[rank] as Analysis2Rank
from rankings r1
inner join rankings r2
    on r1.indicator_id = r2.indicator_id
    and r1.alternative_id = r2.alternative_id
    and r2.analysis_id = @analysisId2
where
    r1.analysis_id = @analysisId1
    and r1.[rank] != r2.[rank]
(It puts the analysis values into additional fields instead of rows. I think either way would work.)
How can I generalize this query to handle many analysis ids? (Or, alternatively, come up with a different, better query to do the job?) I'm using SQL Server 2005, in case it matters.
If necessary, I can always pull all the data out of the table and look for differences in code, but a SQL solution would be preferable since often I'll only care about a few rows out of thousands and there's no point in transferring them all if I can avoid it. (However, if you have a compelling reason not to do this in SQL, say so--I'd consider that a good answer too!)
This will return your desired data set. Now you just need a way to pass the required analysis ids to the query, or potentially just filter this data inside your application; a sketch of the first option follows the query below.
select r.* from rankings r
inner join
(
select alternative_id, indicator_id
from rankings
group by alternative_id, indicator_id
having count(distinct rank) > 1
) differ on r.alternative_id = differ.alternative_id
and r.indicator_id = differ.indicator_id
order by r.alternative_id, r.indicator_id, r.analysis_id, r.rank
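For example, a rough sketch of passing the ids (the @wanted table variable is hypothetical), restricting both the difference check and the output to the chosen analyses. Separate INSERT statements are used because SQL Server 2005 lacks multi-row VALUES constructors:

declare @wanted table (analysis_id int);
insert into @wanted values (1);  -- the analyses to compare
insert into @wanted values (2);

select r.* from rankings r
inner join
(
    select alternative_id, indicator_id
    from rankings
    where analysis_id in (select analysis_id from @wanted)
    group by alternative_id, indicator_id
    having count(distinct rank) > 1
) differ on r.alternative_id = differ.alternative_id
and r.indicator_id = differ.indicator_id
where r.analysis_id in (select analysis_id from @wanted)
order by r.alternative_id, r.indicator_id, r.analysis_id;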
I don't know which database you are using; in SQL Server I would go like this:
-- STEP 1, create a temporary table with all the alternative_id, indicator_id combinations that have more than one rank:
select alternative_id, indicator_id
into #results
from rankings
group by alternative_id, indicator_id
having count(distinct rank) > 1

-- STEP 2, retrieve the data
select a.* from rankings a, #results b
where a.alternative_id = b.alternative_id
and a.indicator_id = b.indicator_id
order by alternative_id, indicator_id, analysis_id
BTW, the other answers given here need the count(distinct rank)!
I think this is what you're trying to do:
select
r.analysis_id,
r.alternative_id,
rm.indicator_id_max,
rm.rank_max
from rankings r
join (
select
analysis_id,
alternative_id,
max(indicator_id) as indicator_id_max,
max(rank) as rank_max
from rankings
group by analysis_id,
alternative_id
having count(*) > 1
) as rm
on r.analysis_id = rm.analysis_id
and r.alternative_id = rm.alternative_id
Your example differences seem wrong. You say you want analyses whose ranks for any alternative/indicator combination differ, but example rows 3 and 4 don't satisfy this criterion. A correct result according to your requirement is:
analysis_id | alternative_id | indicator_id | rank
----------------------------------------------------
1 | 1 | 2 | 6
2 | 1 | 2 | 7
1 | 2 | 1 | 3
2 | 2 | 1 | 4
One query you could try is this:
with distinct_ranks as (
select alternative_id
, indicator_id
, rank
, count(*) as cnt
from rankings
group by alternative_id
, indicator_id
, rank
having count(*) = 1)
select r.analysis_id
, r.alternative_id
, r.indicator_id
, r.rank
from rankings r
join distinct_ranks d on r.alternative_id = d.alternative_id
and r.indicator_id = d.indicator_id
and r.rank = d.rank
You have to realize that with multiple analyses, the criterion you have is ambiguous. What if analyses 1, 2 and 3 have rank 1, and analyses 4, 5 and 6 have rank 2, for alternative/indicator 1/1? The set (1,2,3) is 'different' from the set (4,5,6), but inside each set there is no difference. What is the behavior you desire in that case; should those analyses show up or not? My query finds all records whose rank differs from every other analysis for the same alternative/indicator, but it is not clear whether that is what your requirement calls for.
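To make that ambiguity concrete (my illustration, using an assumed temp table #demo rather than the real rankings table):

-- Analyses 1-3 have rank 1 and analyses 4-6 have rank 2 for the same combination
create table #demo (analysis_id int, alternative_id int, indicator_id int, [rank] int);
insert into #demo select 1, 1, 1, 1;
insert into #demo select 2, 1, 1, 1;
insert into #demo select 3, 1, 1, 1;
insert into #demo select 4, 1, 1, 2;
insert into #demo select 5, 1, 1, 2;
insert into #demo select 6, 1, 1, 2;

-- The distinct-rank test matches nothing: each rank value appears three times,
-- so having count(*) = 1 filters both groups out and no analysis is reported.
select alternative_id, indicator_id, [rank]
from #demo
group by alternative_id, indicator_id, [rank]
having count(*) = 1;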