MySQL - How to simplify this query?

I have a query which I want to simplify:
select
sequence,
1 added
from scoredtable
where score_timestamp=1292239056000
and sequence
not in (select sequence from scoredtable where score_timestamp=1292238452000)
union
select
sequence,
0 added
from scoredtable
where score_timestamp=1292238452000
and sequence
not in (select sequence from scoredtable where score_timestamp=1292239056000);
Any ideas? Basically I want to extract from the same table all the sequences that are different between two timestamp values, with a column "added" which indicates whether a row is new or has been deleted.
Source table:
score_timestamp sequence
1292239056000 0
1292239056000 1
1292239056000 2
1292238452000 1
1292238452000 2
1292238452000 3
Example between (1292239056000, 1292238452000)
Query result (2 rows):
sequence added
3 1
0 0
Example between (1292238452000, 1292239056000)
Query result (2 rows):
sequence added
0 1
3 0
Example between (1292239056000, 1292239056000)
Query result (0 rows):
sequence added

This query gets all sequences that appear only once across the two timestamps, and checks whether that single occurrence belongs to the first or to the second timestamp.
SELECT
sequence,
CASE WHEN MIN(score_timestamp) = 1292239056000 THEN 0 ELSE 1 END AS added
FROM scoredtable
WHERE score_timestamp IN ( 1292239056000, 1292238452000 )
AND ( 1292239056000 <> 1292238452000 ) -- No rows, when timestamp is the same
GROUP BY sequence
HAVING COUNT(*) = 1
It returns your desired result:
sequence added
3 1
0 0

Given two timestamps:
SET @ts1 := 1292239056000;
SET @ts2 := 1292238452000;
you can get your additions and deletes with:
SELECT s1.sequence AS sequence, 0 AS added
FROM scoredtable s1 LEFT JOIN
scoredtable s2 ON
s2.score_timestamp = @ts2 AND
s1.sequence = s2.sequence
WHERE
s1.score_timestamp = @ts1 AND
s2.score_timestamp IS NULL
UNION ALL
SELECT s2.sequence, 1
FROM scoredtable s1 RIGHT JOIN
scoredtable s2 ON s1.score_timestamp = @ts1 AND
s1.sequence = s2.sequence
WHERE
s2.score_timestamp = @ts2 AND
s1.score_timestamp IS NULL
Depending on the number of rows and the statistics, the above query might perform better than the GROUP BY / HAVING COUNT(*) = 1 version (I think that one will always need a full table scan, while the above union should be able to do two anti-joins, which might fare better).
If you have a substantial data set, do let us know which is faster (test with SQL_NO_CACHE for comparable results).
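For reference, a minimal sketch of how the comparison could be run with the query cache bypassed, reusing the @ts1/@ts2 variables assumed above; the same SQL_NO_CACHE hint can be prepended to the first SELECT of the UNION ALL version:
-- GROUP BY / HAVING variant, with the query cache bypassed for a fair timing
SELECT SQL_NO_CACHE
sequence,
CASE WHEN MIN(score_timestamp) = @ts1 THEN 0 ELSE 1 END AS added
FROM scoredtable
WHERE score_timestamp IN (@ts1, @ts2)
AND @ts1 <> @ts2 -- no rows when both timestamps are the same
GROUP BY sequence
HAVING COUNT(*) = 1;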

Related

How can I replace the LAST() function in MS Access with proper ordering on a rather large table?

I have an MS Access database with two tables, Asset and Transaction. The schema looks like this:
Table ASSET
Key Date1 AType FieldB FieldC ...
A 2023.01.01 T1
B 2022.01.01 T1
C 2023.01.01 T2
.
.
TABLE TRANSACTION
Date2 Key TType1 TType2 TType3 FieldOfInterest ...
2022.05.31 A 1 1 1 10
2022.08.31 A 1 1 1 40
2022.08.31 A 1 2 1 41
2022.09.31 A 1 1 1 30
2022.07.31 A 1 1 1 30
2022.06.31 A 1 1 1 20
2022.10.31 A 1 1 1 45
2022.12.31 A 2 1 1 50
2022.11.31 A 1 2 1 47
2022.05.23 B 2 1 1 30
2022.05.01 B 1 1 1 10
2022.05.12 B 1 2 1 20
.
.
.
The ASSET table has a PK (Key).
The TRANSACTION table has a composite key that is (Key, Date2, TType1, TType2, TType3).
Given the above tables let's see an example:
Input1 = 2022.04.01
Input2 = 2022.08.31
Desired result:
Key FieldOfInterest
A 41
because if the Transactions in scope were ordered by Date2, TType1, TType2, TType3, all ascending, then the record having FieldOfInterest = 41 would be the last one.
Note that Asset B is not in scope due to Asset.Date1 < Input1, neither is Asset C because AType != T1. Ultimately I am curious about the SUM(FieldOfInterest) of all the last transactions belonging to an Asset that is in scope determined by the input variables.
The following query has so far provided the right results but after upgrading to a newer MS Access version, the LAST() operation is no longer reliably returning the row which is the latest addition to the Transaction table.
I have several input values, but the most important ones are two dates; let's call them InputDate1 and InputDate2.
This is how it worked so far:
SELECT Asset.AType, Last(FieldOfInterest) AS CurrentValue ,Asset.Key
FROM Transaction
INNER JOIN Asset ON Transaction.Key = Asset.Key
WHERE Transaction.Date2 <= InputDate2 And Asset.Date1 >= InputDate1
GROUP BY Asset.Key, Asset.AType
HAVING Asset.AType='T1'
It is known that the grouped records are not guaranteed to be in any order. Obviously it is a mistake to rely on the assumption that the GROUP BY operation will always keep the original table order, but let's just ignore this for now.
I have been struggling to come up with the right way to do the following:
join the Asset and Transaction tables on Asset.Key = Transaction.Key
filter by Asset.Date1 >= InputDate1 AND Transaction.Date2 <= InputDate2
then I need to select one record for each Transaction.Key where Date2, TType1, TType2 and TType3 have the highest values (this represents the actual last record for a given Key)
As far as I know there is no way to order records within a GROUP BY clause, which is unfortunate.
I have tried ranking, but the Transaction table is large (800k rows) and the performance was very slow; I need something faster than this. The following is an example of three saved queries that I wrote and chained together, but the performance is very disappointing, probably due to the ranking step.
-- Saved query step_1
SELECT Asset.*, Transaction.*
FROM Transaction
INNER JOIN Asset ON Transaction.Key = Asset.Key
WHERE Transaction.Date2 <= 44926
AND Asset.Date1 >= 44562
AND Asset.aType = 'T1'
-- Saved query step_2
SELECT tr.FieldOfInterest, (SELECT Count(*) FROM
(SELECT tr2.Transaction.Key, tr2.Date2, tr2.Transaction.tType1, tr2.tType2, tr2.tType3 FROM step_1 AS tr2) AS tr1
WHERE (tr1.Date2 > tr.Date2 OR
(tr1.Date2 = tr.Date2 AND tr1.tType1 > tr.Transaction.tType1) OR
(tr1.Date2 = tr.Date2 AND tr1.tType1 = tr.Transaction.tType1 AND tr1.tType2 > tr.tType2) OR
(tr1.Date2 = tr.Date2 AND tr1.tType1 = tr.Transaction.tType1 AND tr1.tType2 = tr.tType2 AND tr1.tType3 > tr.tType3))
AND tr1.Key = tr.Transaction.Key)+1 AS Rank
FROM step_1 AS tr
-- Saved query step_3
SELECT SUM(FieldOfInterest) FROM step_2
WHERE Rank = 1
I hope I am being clear enough so that I can get some useful recommendations. I've been stuck with this for weeks now and really don't know what to do about it. I am open to any suggestions.
Reading the following specification:
then I need to select one record for each Transaction.Key where Date2, TType1, TType2 and TType3 have the highest values (this represents the actual last record for a given Key)
consider a simple aggregation for step 2 to retrieve the max values, then in step 3 join all those fields back to the first query.
Step 1 (rewritten to avoid name collision and too many columns)
SELECT a.[Key] AS Asset_Key, a.Date1, a.AType,
t.[Key] AS Transaction_Key, t.Date2,
t.TType1, t.TType2, t.TType3, t.FieldOfInterest
FROM Transaction t
INNER JOIN Asset a ON t.[Key] = a.[Key]
WHERE t.Date2 <= 44926
AND a.Date1 >= 44562
AND a.AType = 'T1'
Step 2
SELECT Transaction_Key,
MAX(Date2) AS Max_Date2,
MAX(TType1) AS Max_TType1,
MAX(TType2) AS Max_TType2,
MAX(TType3) AS Max_TType3
FROM step_1
GROUP BY Transaction_Key
Step 3
SELECT s1.*
FROM step_1 s1
INNER JOIN step_2 s2
ON s1.Transaction_Key = s2.Transaction_Key
AND s1.Date2 = s2.Max_Date2
AND s1.TType1 = s2.Max_TType1
AND s1.TType2 = s2.Max_TType2
AND s1.TType3 = s2.Max_TType3
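If the end goal is the SUM(FieldOfInterest) from the question's step_3, a possible final wrap-up is sketched below (this assumes the Step 3 query above is saved as a query named step_3):
-- Total FieldOfInterest over the last transaction of each asset in scope
SELECT SUM(s3.FieldOfInterest) AS TotalFieldOfInterest
FROM step_3 AS s3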

How to update half of the records in a table

I have a table in PostgreSQL and I need to split the N entries in the table into two halves: the first half should have the partner_id field filled with the value 1 and the second half with partner_id = 2.
I tried:
update USERS_TABLE set user_rule_id = 1;
update USERS_TABLE set user_rule_id = 2 where USERS_TABLE.id > count(*)/2;
It depends a lot on how precise the number of users updated with 1 or 2 has to be.
The following would be quite imprecise, as it doesn't take into account the exact number of users that actually exist: after deleting some rows the numbers don't fit anymore.
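For illustration, an id-based update along these lines (a sketch; the exact statement is an assumption, splitting on half of the row count with integer division) yields the result below:
UPDATE USERS_TABLE
SET user_rule_id = CASE WHEN id <= (SELECT count(*) FROM USERS_TABLE) / 2 THEN 1
                        ELSE 2 END;  -- gaps in id (from deleted rows) make this split inexact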
SELECT * FROM USERS_TABLE

 id | user_rule_id
----+--------------
  1 | 1
  2 | 1
  3 | 2
  4 | 2
  5 | 2

SELECT 5
If you have a lot of deleted rows and still want half of the users, you can choose the following approach, which does not rely on the id but on the actual row number:
UPDATE USERS_TABLE1
set user_rule_id = CASE WHEN rn <= (SELECT count(*) FROM USERS_TABLE1)/ 2 then 1
ELSE 2 END
FROM (SELECT id, ROW_NUMBER() OVER( ORDER BY id) rn FROM USERS_TABLE1) t
WHERE USERS_TABLE1.id = t.id;
UPDATE 5
SELECT * FROM USERS_TABLE1

 id | user_rule_id
----+--------------
  1 | 1
  2 | 1
  3 | 2
  4 | 2
  5 | 2

SELECT 5
fiddle
In the sample case it is the same result, but when you have a lot of rows and a bunch of deleted users, the second approach will still give you quite a good result.

How to write a multiple Not equal AND clause in TSQL?

I am using T-SQL to write a query and my WHERE clause seems to be returning too many results. For some reason it seems to be treating my Not equal AND clause as an OR clause.
SELECT *
FROM [dbo].TestTable
WHERE
([SomeID] <> 0)
AND
([BlahID] <> 0)
AND
([TestID] <> 0)
AND
([AnotherID] <> 0)
ORDER By ID
I need it to return only rows where all four columns are not equal to zero. Right now it is removing rows where any of the columns are equal to zero.
My Table:
ID SomeID BlahID TestID AnotherID
1 0 1 4 0
2 0 4 4 8
3 0 0 4 0
4 0 1 4 0
5 6 1 4 3
What I want it to return
ID SomeID BlahID TestID AnotherID
1 0 1 4 0
2 0 4 4 8
4 0 1 4 0
5 6 1 4 3
What it Returns
ID SomeID BlahID TestID AnotherID
5 6 1 4 3
It keeps not returning any row that has a zero in any of these columns, not all of these columns.
That's exactly what your query asks for: it returns rows with no 0 in any of the columns you've listed.
That's because if any of the AND-ed parts evaluates to false, the entire condition is false as well. So if e.g. TestID = 0, the entire WHERE clause is false and the row is not returned.
To get rows where at least one of these columns has a non-zero value, use OR.
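For reference, a sketch of the same query with the ANDs swapped for ORs, as suggested here:
SELECT *
FROM [dbo].TestTable
WHERE
([SomeID] <> 0)
OR
([BlahID] <> 0)
OR
([TestID] <> 0)
OR
([AnotherID] <> 0)
ORDER BY ID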
I need it to return only rows where all four columns are not equal to
zero. Right now it is removing rows where any of the columns are equal
to zero.
Based on your wording and revised update, what you do want is to use OR instead of AND. Not sure if the link will work, but you can check out this SQL Fiddle.
Easier to read (for me, at least):
Records that never have an id = 0
SELECT *
FROM [dbo].TestTable
WHERE 0 NOT IN ([SomeID],[BlahID],[TestID],[AnotherID])
Records that have at least one id <> 0
SELECT * FROM TestTable
WHERE EXISTS (
SELECT 1
FROM (VALUES(SomeID),(BlahID),(TestID),(AnotherID)) t(id)
WHERE id <> 0
)
Use
SELECT * FROM [dbo].TestTable WHERE [BlahID] <> 0
The problem isn't your query, it's the results you expect. Every row in the table has a non-zero value in at least one column, so the query returns the whole table.

SQL random number that doesn't repeat within a group

Suppose I have a table:
HH SLOT RN
--------------
1 1 null
1 2 null
1 3 null
--------------
2 1 null
2 2 null
2 3 null
I want to set RN to be a random number between 1 and 10. It's ok for the number to repeat across the entire table, but it's bad to repeat the number within any given HH. E.g.,:
HH SLOT RN_GOOD RN_BAD
--------------------------
1 1 9 3
1 2 4 8
1 3 7 3 <--!!!
--------------------------
2 1 2 1
2 2 4 6
2 3 9 4
This is on Netezza if it makes any difference. This one is a real head-scratcher for me. Thanks in advance!
To get a random number between 1 and the number of rows in the hh, you can use:
select hh, slot, row_number() over (partition by hh order by random()) as rn
from t;
The larger range of values is a bit more challenging. The following calculates a table (called randoms) with numbers and a random position in the same range. It then uses slot to index into the position and pull the random number from the randoms table:
with nums as (
select 1 as n union all select 2 union all select 3 union all select 4 union all select 5 union all
select 6 union all select 7 union all select 8 union all select 9
),
randoms as (
select n, row_number() over (order by random()) as pos
from nums
)
select t.hh, t.slot, hnum.n
from (select hh, randoms.n, randoms.pos
from (select distinct hh
from t
) t cross join
randoms
) hnum join
t
on t.hh = hnum.hh and
t.slot = hnum.pos;
Here is a SQLFiddle that demonstrates this in Postgres, which I assume is close enough to Netezza to have matching syntax.
I am not an expert on SQL, but I would probably do something like this:
Initialize a counter CNT=1
Create a table such that you sample 1 row randomly from each group and a count of null RN, say C_NULL_RN.
With probability C_NULL_RN/(10-CNT+1) for each row, assign CNT as RN
Increment CNT and go to step 2
Well, I couldn't get a slick solution, so I did a hack:
Created a new integer field called rand_inst.
Assign a random number to each empty slot.
Update rand_inst to be the instance number of that random number within this household. E.g., if I get two 3's, then the second 3 will have rand_inst set to 2.
Update the table to assign a different random number anywhere that rand_inst>1.
Repeat assignment and update until we converge on a solution.
Here's what it looks like. Too lazy to anonymise it, so the names are a little different from my original post:
/* Iterative hack to fill 6 slots with a random number between 1 and 13.
A random number *must not* repeat within a household_id.
*/
update c3_lalfinal a
set a.rand_inst = b.rnum
from (
select household_id
,slot_nbr
,row_number() over (partition by household_id,rnd order by null) as rnum
from c3_lalfinal
) b
where a.household_id = b.household_id
and a.slot_nbr = b.slot_nbr
;
update c3_lalfinal
set rnd = CAST(0.5 + random() * (13-1+1) as INT)
where rand_inst>1
;
/* Repeat until this query returns 0: */
select count(*) from (
select household_id from c3_lalfinal group by 1 having count(distinct(rnd)) <> 6
) x
;

Returning several rows from a single query, based on a value of a column

Let's say I have this table:
Fld  Number
1 5
2 2
And I want to write a select that returns each Fld repeated as many times as its Number value indicates:
Fld
1
1
1
1
1
2
2
How can I achieve this? I was thinking about making a temporary table and inserting data based on the Number, but I was wondering if this could be done with a single SELECT statement.
PS: I'm new to SQL
You can join with a numbers table:
SELECT Fld
FROM yourtable
JOIN Numbers
ON yourtable.Number <= Numbers.Number
A numbers table is just a table with a list of numbers:
Number
1
2
3
etc...
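If you don't already have one, a minimal sketch of creating and filling such a numbers table (the table and column names here just match the ones assumed above):
-- One row per integer, up to the largest value expected in yourtable.Number
CREATE TABLE Numbers (Number INT PRIMARY KEY);
INSERT INTO Numbers (Number) VALUES (1);
INSERT INTO Numbers (Number) VALUES (2);
INSERT INTO Numbers (Number) VALUES (3);
INSERT INTO Numbers (Number) VALUES (4);
INSERT INTO Numbers (Number) VALUES (5);
-- ...and so on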
Not a great solution (since you still query your table twice), but maybe you can work from it:
SELECT t1.fld, t1.number
FROM table t1, (
SELECT ROWNUM number FROM dual
CONNECT BY LEVEL <= (SELECT MAX(number) FROM t1)) t2
WHERE t2.number<=t1.number
It generates the maximum number of rows needed and then filters them against each row.
I don't know if your RDBMS version supports it (although I rather suspect it does), but here is a recursive version:
WITH remaining (fld, times) as (SELECT fld, 1
FROM <table>
UNION ALL
SELECT a.fld, a.times + 1
FROM remaining as a
JOIN <table> as b
ON b.fld = a.fld
AND b.number > a.times)
SELECT fld
FROM remaining
ORDER BY fld
Given your source data table, it outputs this (count included for verification):
fld times
=============
1 1
1 2
1 3
1 4
1 5
2 1
2 2