I would like to convert this SQL query into ANSI SQL. I am having trouble wrapping my head around the logic of this query.
I use Snowflake Data Warehouse, but it does not understand this query because of the 'delete' statement right before join, so I am trying to break it down. From my understanding the row number column is giving me the order from 1 to N based on timestamp and placing it in C. Then C is joined against itself on the rows other than the first row (based on id) and placed in C1. Then C1 is deleted from the overall data, which leaves only the first row.
I may be understanding the logic incorrectly, but I am not used to seeing the 'delete' statement right before a join. Let me know if I got the logic right, or point me in the right direction.
This query was copied from a Stack Overflow question that covers the exact situation I am trying to solve, but mine is on a much larger scale.
with C as
(
select ID,
row_number() over(order by DT) as rn
from YourTable
)
delete C1
from C as C1
inner join C as C2
on C1.rn = C2.rn-1 and
C1.ID = C2.ID
The specific problem I am trying to solve is this. Let's assume I have this table. I need to partition the rows by primary key combinations (primKey 1 & 2) while maintaining timestamp order.
ID primKey1 primKey2 checkVar1 checkVar2 theTimestamp
100 1 2 302 423 2001-07-13
101 3 6 506 236 2005-10-25
100 1 2 302 423 2002-08-15
101 3 6 506 236 2008-12-05
101 3 6 300 100 2010-06-10
100 1 2 407 309 2005-09-05
100 1 2 302 423 2012-05-09
100 1 2 302 423 2003-07-24
Once the rows are partitioned and ordered by timestamp within each partition, I need to delete the rows whose checkVar combination (checkVar 1 & 2) repeats the previous row's, up to the next change. That leaves only the earliest row of each run of identical values. The rows marked with asterisks are the duplicates that need to be removed.
ID primKey1 primKey2 checkVar1 checkVar2 theTimestamp
100 1 2 302 423 2001-07-13
*100 1 2 302 423 2002-08-15
*100 1 2 302 423 2003-07-24
100 1 2 407 309 2005-09-05
100 1 2 302 423 2012-05-09
101 3 6 506 236 2005-10-25
*101 3 6 506 236 2008-12-05
101 3 6 300 100 2010-06-10
This is the final result. As you can see for ID=100, even though the 1st and 3rd record are the same, the checkVar combination changed in between, which is fine. I am only removing the duplicates until the values change.
ID primKey1 primKey2 checkVar1 checkVar2 theTimestamp
100 1 2 302 423 2001-07-13
100 1 2 407 309 2005-09-05
100 1 2 302 423 2012-05-09
101 3 6 506 236 2005-10-25
101 3 6 300 100 2010-06-10
If you want to keep the earliest row for each id, then you can use:
delete from yourtable yt
where yt.dt > (select min(yt2.dt)
               from yourtable yt2
               where yt2.id = yt.id
              );
Your query would not do this, if that is your intent.
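For the situation described in the question (drop a row only when its checkVar pair repeats the previous row's within the same primKey partition), one possible sketch uses LAG. It is run here through Python's sqlite3 module so it can be tried end to end; it assumes SQLite 3.25 or later for window functions and that the checkVar columns are never NULL. The LAG logic is ordinary window-function SQL, but the rowid-based DELETE is SQLite-specific and would need a different row identifier on Snowflake.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE YourTable (
        ID INTEGER, primKey1 INTEGER, primKey2 INTEGER,
        checkVar1 INTEGER, checkVar2 INTEGER, theTimestamp TEXT
    )
""")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?, ?, ?)",
    [
        (100, 1, 2, 302, 423, "2001-07-13"),
        (101, 3, 6, 506, 236, "2005-10-25"),
        (100, 1, 2, 302, 423, "2002-08-15"),
        (101, 3, 6, 506, 236, "2008-12-05"),
        (101, 3, 6, 300, 100, "2010-06-10"),
        (100, 1, 2, 407, 309, "2005-09-05"),
        (100, 1, 2, 302, 423, "2012-05-09"),
        (100, 1, 2, 302, 423, "2003-07-24"),
    ],
)

# Delete every row whose checkVar pair equals that of the previous row
# (by timestamp) in the same primKey1/primKey2 partition.
# The first row of a partition has NULL "previous" values and is never matched.
conn.execute("""
    DELETE FROM YourTable
    WHERE rowid IN (
        SELECT rid
        FROM (
            SELECT rowid AS rid, checkVar1, checkVar2,
                   LAG(checkVar1) OVER (PARTITION BY primKey1, primKey2
                                        ORDER BY theTimestamp) AS prev1,
                   LAG(checkVar2) OVER (PARTITION BY primKey1, primKey2
                                        ORDER BY theTimestamp) AS prev2
            FROM YourTable
        )
        WHERE checkVar1 = prev1 AND checkVar2 = prev2
    )
""")

remaining = conn.execute("""
    SELECT ID, checkVar1, checkVar2, theTimestamp
    FROM YourTable
    ORDER BY ID, theTimestamp
""").fetchall()

for row in remaining:
    print(row)
```

On the sample data this leaves the five rows listed as the final result in the question.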
MESSAGE_ID GROUP_ID REV_NO
100 200 1
101 201 1
102 202 1
103 203 1
104 204 1
105 200 2
106 201 2
107 202 2
108 203 2
109 204 2
110 205 2
First I want to select all group IDs and their corresponding lowest revision number.
Then I want to select the first X message IDs (X is a controllable input), with the condition that the result contains all the revisions of any selected group. For example, if I select the first 5 messages by rownum, then not all revisions of group_id 200 are selected.
Hope I made it clear.
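The first step (each group ID with its lowest revision number) can be sketched with a plain GROUP BY. The example below runs it through Python's sqlite3 module; the table name messages is an assumption, since the question does not name the table. Only this first step is covered, because the selection rule for the second step is open to more than one reading.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "messages" is a hypothetical table name; the columns follow the question.
conn.execute(
    "CREATE TABLE messages (message_id INTEGER, group_id INTEGER, rev_no INTEGER)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        (100, 200, 1), (101, 201, 1), (102, 202, 1), (103, 203, 1),
        (104, 204, 1), (105, 200, 2), (106, 201, 2), (107, 202, 2),
        (108, 203, 2), (109, 204, 2), (110, 205, 2),
    ],
)

# One row per group with the smallest revision number seen for that group.
lowest_revisions = conn.execute("""
    SELECT group_id, MIN(rev_no) AS min_rev
    FROM messages
    GROUP BY group_id
    ORDER BY group_id
""").fetchall()

for row in lowest_revisions:
    print(row)
```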
I have a table like following
ID  DATE      ACCT  TYPE  AMOUNT  SEQ  CHK#  TRC
1   6/5/2014  1234  C     10,000  1          1001
2   6/5/2014  3333        3,000   2    123   1002
3   6/5/2014  4444        5,000   3    234   1003
4   6/5/2014  5555        2,000   4    345   1004
5   6/5/2014  2345  C     3,000   1          1007
6   6/5/2014  5555        2,500   2    255   1008
7   6/5/2014  7777        500     3    277   1009
8   6/6/2014  1234  C     5,000   1          2001
9   6/6/2014  7777        3,000   2    278   2002
10  6/6/2014  8888        2,000   3    301   2003
The rows with TYPE = C are parent rows to the child rows that follow sequentially.
The parent rows do not have a CHK# and the child rows do. Each parent row has SEQ = 1 and its child rows have sequential numbers (if it matters). In the table above, row ID 1 is the parent of the rows with IDs 2 to 4. The AMOUNT values on the child rows add up to the parent row's amount.
When querying for the transaction dated '6/5/2014' on account # 2345 with the amount of 3,000, the result should be the rows with ID 6 and 7.
Is such a query possible using MS-SQL 2008? If so, could you let me know?
Well, based on the data that you have, you can use the id column to find the rows that you want. First, look for the parent row that has the check for that account and date. Then look for the subsequent ids in the same group. How do you define the group? That is easy. Take the difference between id and seq. This difference is constant for the parent and its child rows.
So, here it goes:
select t.*
from table t
where (t.id - t.seq) = (select t2.id - t2.seq
from table t2
where t2.type = 'C' and
t2.acct = '2345' and
t2.date = '6/5/2014'
) and
t.type is null;
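To check the id - seq idea against the sample data, here is a minimal sketch run through Python's sqlite3 module rather than SQL Server. The table name transactions and the column name chk are assumptions (table is a reserved word and CHK# is not a plain identifier), and blank TYPE and CHK# cells are stored as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "transactions" and "chk" are hypothetical names standing in for the
# question's unnamed table and its CHK# column.
conn.execute("""
    CREATE TABLE transactions (
        id INTEGER, date TEXT, acct TEXT, type TEXT,
        amount INTEGER, seq INTEGER, chk TEXT, trc INTEGER
    )
""")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "6/5/2014", "1234", "C", 10000, 1, None, 1001),
        (2, "6/5/2014", "3333", None, 3000, 2, "123", 1002),
        (3, "6/5/2014", "4444", None, 5000, 3, "234", 1003),
        (4, "6/5/2014", "5555", None, 2000, 4, "345", 1004),
        (5, "6/5/2014", "2345", "C", 3000, 1, None, 1007),
        (6, "6/5/2014", "5555", None, 2500, 2, "255", 1008),
        (7, "6/5/2014", "7777", None, 500, 3, "277", 1009),
        (8, "6/6/2014", "1234", "C", 5000, 1, None, 2001),
        (9, "6/6/2014", "7777", None, 3000, 2, "278", 2002),
        (10, "6/6/2014", "8888", None, 2000, 3, "301", 2003),
    ],
)

# id - seq is the same for a parent and all of its children, so it serves
# as a group key. The subquery finds the key of the requested parent row.
child_ids = [
    row[0]
    for row in conn.execute("""
        SELECT t.id
        FROM transactions t
        WHERE (t.id - t.seq) = (SELECT t2.id - t2.seq
                                FROM transactions t2
                                WHERE t2.type = 'C'
                                  AND t2.acct = '2345'
                                  AND t2.date = '6/5/2014')
          AND t.type IS NULL
        ORDER BY t.id
    """)
]

print(child_ids)
```

This relies on the ids being gap-free within each parent/child group, as they are in the sample data.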
Well, I made a query that is not working.
I have a table like this:
_id - E1
-----------
1 - 100
2 - 335
3 - 420
4 - 440
5 - 500
6 - 514
7 - 524
8 - 534
9 - 544
10 - 552
11 - 559
12 - 607
13 - 615
14 - 623
15 - 631
16 - 639
and the query that i made:
SELECT * FROM
(SELECT * FROM Table WHERE E1 > 633 AND _sentido = 'V'
UNION
SELECT * FROM Table) LIMIT 3
when i execute this i get
_id - E1
-----------
1 - 100
2 - 335
3 - 420
but what i really want is
_id - E1
-----------
1 - 639
2 - 100
3 - 335
If the matching rows are at the end of the table and there are fewer than 3 of them, the result should be completed with rows from the start of the table to reach 3.
Always 3 rows!
I hope you can help me, John
You have a clever approach, but it is not going to work. The ordering of subqueries is not guaranteed. What you want is to order by your condition first and then fill out with the rest. Try this:
SELECT *
FROM table
ORDER BY (case when E1 > 633 AND _sentido = 'V' then 1 else 2 end)
LIMIT 3;
This puts the records you are interested in first. The limit 3 will retrieve those records (up to 3) and then pad remaining rows with the rest of the records.
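A small sketch of this ordering, run through Python's sqlite3 module. The table name horarios is hypothetical, and every row is assumed to have _sentido = 'V', since the question does not show that column. A secondary sort on _id is added here (it is not part of the query above) so that the padding rows come back in a predictable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "horarios" is a hypothetical table name; _sentido = 'V' on every row
# is an assumption, because the question only lists _id and E1.
conn.execute("CREATE TABLE horarios (_id INTEGER, E1 INTEGER, _sentido TEXT)")
e1_values = [100, 335, 420, 440, 500, 514, 524, 534,
             544, 552, 559, 607, 615, 623, 631, 639]
conn.executemany(
    "INSERT INTO horarios VALUES (?, ?, 'V')",
    list(enumerate(e1_values, start=1)),
)

# Matching rows sort first (key 1), all others after (key 2);
# _id breaks ties so the fill-in rows start from the top of the table.
rows = conn.execute("""
    SELECT _id, E1
    FROM horarios
    ORDER BY (CASE WHEN E1 > 633 AND _sentido = 'V' THEN 1 ELSE 2 END), _id
    LIMIT 3
""").fetchall()

for row in rows:
    print(row)
```

Here only the row with E1 = 639 matches, so it comes first and the rows with E1 = 100 and 335 fill the remaining two places.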
I would like to transfer some existing data into new data table.
I have table with substitutions:
- ID
- currentItemId
- formerItemId
- contentId
The same combination of currentItemId and formerItemId can appear in multiple entries, each with a different contentId.
Here is how the data looks now:
ID_T1 currentItemId formerItemId contentId
1 100 200 300
2 100 200 301
3 100 200 302
4 105 201 303
5 105 201 304
6 110 205 320
7 111 206 321
8 120 204 322
9 130 208 323
10 130 208 324
Now, I would like to select TOP ID for each combination formerItemId and currentItemId:
ID ID_T1 contentId
1 1 300
2 1 301
3 1 302
4 4 303
5 4 304
6 6 320
7 7 321
8 8 322
9 9 323
10 9 324
Both tables also contain a timestamp and some other data, which I have left out to keep the example easier to follow.
I tried self join (no success), nested select (gives me right value for the original combination, but it doesn't repeat, it gives me NULL on ID for other records), but nothing seems to work. Tried something like:
SELECT di1.ID,
(SELECT TOP(1) di1.ID
FROM TABLE
WHERE
di1.currentItemtId = di2.currentItemtId AND di1.formerItemId = di1.formerItemId
) AS repeat
,di2.deleteItemId
,di1.currentitemtId
,di1.formerItemId
,di1.contentId
FROM Table di1
LEFT JOIN
Table di2 ON di1.ID = di2.ID
But this way ID doesn't repeat - I get same values for ID as in ordinary select.
I am using SQL server 2008.
Any help would be greatly appreciated.
Please try:
SELECT
MIN(ID) OVER (PARTITION BY currentItemId, formerItemId) ID,
currentItemId,
formerItemId,
contentId
FROM YourTable;
Or, to return the columns in the layout shown in the question:
SELECT
ID,
MIN(ID) OVER (PARTITION BY currentItemId, formerItemId) ID_T1,
contentId
FROM YourTable
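As a quick check of the windowed MIN against the sample data, here is a sketch run through Python's sqlite3 module instead of SQL Server (it assumes SQLite 3.25 or later for window functions). The table name substitutions is hypothetical, and an ORDER BY is added so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "substitutions" is a hypothetical name for the question's source table.
conn.execute("""
    CREATE TABLE substitutions (
        ID INTEGER, currentItemId INTEGER,
        formerItemId INTEGER, contentId INTEGER
    )
""")
conn.executemany(
    "INSERT INTO substitutions VALUES (?, ?, ?, ?)",
    [
        (1, 100, 200, 300), (2, 100, 200, 301), (3, 100, 200, 302),
        (4, 105, 201, 303), (5, 105, 201, 304), (6, 110, 205, 320),
        (7, 111, 206, 321), (8, 120, 204, 322), (9, 130, 208, 323),
        (10, 130, 208, 324),
    ],
)

# MIN(ID) OVER (PARTITION BY ...) repeats the smallest ID of each
# currentItemId/formerItemId combination on every row of that combination.
result = conn.execute("""
    SELECT ID,
           MIN(ID) OVER (PARTITION BY currentItemId, formerItemId) AS ID_T1,
           contentId
    FROM substitutions
    ORDER BY ID
""").fetchall()

for row in result:
    print(row)
```

Unlike GROUP BY, the window function keeps every input row, which is why the lowest ID repeats instead of collapsing each combination to one row.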
I have a table with an ID and multiple informative columns. Sometimes however, I can have multiple data for an ID, so I added a column called "Sequence". Here is a shortened example:
ID Sequence Name Tel Date Amount
124 1 Bob 873-4356 2001-02-03 10
124 2 Bob 873-4356 2002-03-12 7
124 3 Bob 873-4351 2006-07-08 24
125 1 John 983-4568 2007-02-01 3
125 2 John 983-4568 2008-02-08 13
126 1 Eric 345-9845 2010-01-01 18
So, I would like to obtain only these lines:
124 3 Bob 873-4351 2006-07-08 24
125 2 John 983-4568 2008-02-08 13
126 1 Eric 345-9845 2010-01-01 18
Could anyone give me a hand with building a SQL query to do this?
Thanks !
You can calculate the maximum sequence using group by. Then you can use join to get only the maximum in the original data.
Assuming your table is called t:
select t.*
from t join
(select id, MAX(sequence) as maxs
from t
group by id
) tmax
on t.id = tmax.id and
t.sequence = tmax.maxs
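The same join can be tried against the sample rows with Python's sqlite3 module. This is only a sketch: the table is called t as in the answer above, and an ORDER BY is added so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE t (
        id INTEGER, sequence INTEGER, name TEXT,
        tel TEXT, date TEXT, amount INTEGER
    )
""")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)",
    [
        (124, 1, "Bob", "873-4356", "2001-02-03", 10),
        (124, 2, "Bob", "873-4356", "2002-03-12", 7),
        (124, 3, "Bob", "873-4351", "2006-07-08", 24),
        (125, 1, "John", "983-4568", "2007-02-01", 3),
        (125, 2, "John", "983-4568", "2008-02-08", 13),
        (126, 1, "Eric", "345-9845", "2010-01-01", 18),
    ],
)

# The derived table tmax holds the highest sequence per id; joining it
# back to t keeps only the row carrying that sequence.
latest = conn.execute("""
    SELECT t.*
    FROM t
    JOIN (SELECT id, MAX(sequence) AS maxs
          FROM t
          GROUP BY id) tmax
      ON t.id = tmax.id AND t.sequence = tmax.maxs
    ORDER BY t.id
""").fetchall()

for row in latest:
    print(row)
```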