I have a table of transactions in Microsoft Access that contains many transactions for many vendors. I need to identify whether there is sequential transaction numbering for each vendor. I don't know what the sequence will be or the number of transactions per vendor. I need to write SQL that identifies sequential numbering for vendors and sets a field to '1' if present. I was thinking of running nested loops that first determine the number of transactions per vendor and then loop through those transactions comparing the transaction numbers. Can anybody help me with this?
To find one sequential set (2 records where one transaction number follows the other):
SELECT transactionId FROM tbl WHERE EXISTS
(SELECT * FROM tbl as t WHERE tbl.vendorId = t.vendorId
AND tbl.transactionId+1 = t.transactionId)
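To try the EXISTS query outside Access, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for Jet. The sample rows and the helper name find_pair_starts are made up for illustration, and vendorId is added to the select list only to make the output easier to read.

```python
import sqlite3


def find_pair_starts(rows):
    """Return (vendorId, transactionId) rows whose transactionId + 1 also
    exists for the same vendor, using the correlated EXISTS query above."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (vendorId TEXT, transactionId INTEGER)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT vendorId, transactionId FROM tbl WHERE EXISTS
          (SELECT * FROM tbl AS t WHERE tbl.vendorId = t.vendorId
             AND tbl.transactionId + 1 = t.transactionId)
        ORDER BY vendorId, transactionId
        """
    ).fetchall()
    conn.close()
    return result


# Hypothetical sample data: A and C have consecutive numbers, B does not.
sample = [("A", 10), ("A", 11), ("A", 12), ("B", 5), ("B", 9), ("C", 7), ("C", 8)]
print(find_pair_starts(sample))  # [('A', 10), ('A', 11), ('C', 7)]
```

Each returned row is the first half of a consecutive pair, so a vendor appearing here has at least one sequential pair, not necessarily a fully sequential run.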
I'm not sure this is the most straightforward approach, but I think it could work. Apologies for using multiple steps, but Jet 4.0 kind of forces one to do so.**
I've assumed all transactionId values are positive integers and that a sequence is a set of evenly spaced transactionId values by vendorId. I further assume there is a key on (vendorId, transactionId).
First step: eliminate invalid rows, e.g. we need at least three rows to be able to determine a sequence (do all other rows pass or fail?); you may want to filter other junk out here too (e.g. rows/groups with NULL values):
CREATE VIEW tbl1
AS
SELECT T1.vendorId, T1.transactionId
FROM tbl AS T1
WHERE EXISTS (
SELECT T2.vendorId
FROM tbl AS T2
WHERE T2.vendorId = T1.vendorId
GROUP
BY T2.vendorId
HAVING COUNT(*) > 2
);
Find the lowest value for each vendor (comes in handy later):
CREATE VIEW tbl2
AS
SELECT vendorId, MIN(transactionId) AS transactionId_min
FROM tbl1
GROUP
BY vendorId;
Make all sequences start at zero (transactionId_base_zero) by subtracting the lowest value for each vendor:
CREATE VIEW tbl3
AS
SELECT T1.vendorId, T1.transactionId,
T1.transactionId - T2.transactionId_min AS transactionId_base_zero
FROM tbl1 AS T1
INNER JOIN tbl2 AS T2
ON T1.vendorId = T2.vendorId;
Predict the step value (difference between adjacent sequence values) based on the MAX, MIN and COUNT set values for each vendor:
CREATE VIEW tbl4
AS
SELECT vendorId,
MAX(transactionId_base_zero) / (COUNT(*) - 1)
AS transactionId_predicted_step
FROM tbl3
GROUP
BY vendorId;
Test that the predicted step value holds true for each sequence value, i.e. (pseudocode) this_transactionId - step_value = prior_transactionId (omit the lowest transactionId because it doesn't have a prior value!):
SELECT DISTINCT T.vendorId
FROM tbl3 AS T
WHERE T.transactionId_base_zero > 0
AND NOT EXISTS (
SELECT *
FROM tbl3 AS T3
INNER JOIN tbl4 AS T4
ON T3.vendorId = T4.vendorId
WHERE T.vendorId = T3.vendorId
AND T.transactionId_base_zero
- T4.transactionId_predicted_step
= T3.transactionId_base_zero
);
The above query should return the vendorId of vendors whose transactionId values are not sequential.
** In my defense, I ran into a couple of bugs in Jet 4.0 that I had to work around. Yes, I do know the bugs are in Jet 4.0 (or its OLE DB provider) because a) I double-checked the results using SQL Server and b) they defy logic! (even SQL's own strange 3VL logic :)
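The view pipeline can be exercised end to end with Python's sqlite3 module standing in for Jet 4.0. This is a sketch under a few assumptions: the sample vendors and the helper name non_sequential_vendors are hypothetical, tbl4 is given the GROUP BY vendorId that its aggregate needs, and `1.0 *` forces floating-point division because SQLite's `/` truncates integers while Access's `/` does not.

```python
import sqlite3

VIEWS = [
    # tbl1: keep only vendors with more than two rows.
    """CREATE VIEW tbl1 AS
       SELECT T1.vendorId, T1.transactionId FROM tbl AS T1
       WHERE EXISTS (SELECT T2.vendorId FROM tbl AS T2
                     WHERE T2.vendorId = T1.vendorId
                     GROUP BY T2.vendorId HAVING COUNT(*) > 2)""",
    # tbl2: lowest transactionId per vendor.
    """CREATE VIEW tbl2 AS
       SELECT vendorId, MIN(transactionId) AS transactionId_min
       FROM tbl1 GROUP BY vendorId""",
    # tbl3: shift every vendor's sequence so it starts at zero.
    """CREATE VIEW tbl3 AS
       SELECT T1.vendorId, T1.transactionId,
              T1.transactionId - T2.transactionId_min AS transactionId_base_zero
       FROM tbl1 AS T1 INNER JOIN tbl2 AS T2 ON T1.vendorId = T2.vendorId""",
    # tbl4: predicted step = MAX / (COUNT - 1), per vendor.
    """CREATE VIEW tbl4 AS
       SELECT vendorId,
              1.0 * MAX(transactionId_base_zero) / (COUNT(*) - 1)
                AS transactionId_predicted_step
       FROM tbl3 GROUP BY vendorId""",
]

# A vendor fails if some non-first value has no predecessor one step below it.
NOT_SEQUENTIAL = """
    SELECT DISTINCT T.vendorId FROM tbl3 AS T
    WHERE T.transactionId_base_zero > 0
      AND NOT EXISTS (
        SELECT * FROM tbl3 AS T3
        INNER JOIN tbl4 AS T4 ON T3.vendorId = T4.vendorId
        WHERE T.vendorId = T3.vendorId
          AND T.transactionId_base_zero - T4.transactionId_predicted_step
              = T3.transactionId_base_zero)
    ORDER BY T.vendorId
"""


def non_sequential_vendors(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (vendorId TEXT, transactionId INTEGER, "
                 "PRIMARY KEY (vendorId, transactionId))")
    conn.executemany("INSERT INTO tbl VALUES (?, ?)", rows)
    for ddl in VIEWS:
        conn.execute(ddl)
    result = [row[0] for row in conn.execute(NOT_SEQUENTIAL)]
    conn.close()
    return result


# Hypothetical data: A steps by 3, D by 1, B has a gap, C has too few rows.
sample = [("A", 100), ("A", 103), ("A", 106), ("A", 109),
          ("B", 1), ("B", 2), ("B", 4),
          ("C", 50), ("C", 51),
          ("D", 7), ("D", 8), ("D", 9)]
print(non_sequential_vendors(sample))  # ['B']
```

Vendor C is filtered out by tbl1 and therefore neither passes nor fails, which is the open question the answer raises about groups with fewer than three rows.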
I would use a query that finds gaps in numbering for any vendor, and if that returns any records, then you do not have sequential numbering for all vendors.
SELECT *
FROM tblTransaction As T1
WHERE (
SELECT TOP 1 T2.transactionID
FROM tblTransaction As T2
WHERE T1.vendorID = T2.vendorID AND
T1.transactionID < T2.transactionID
ORDER BY T2.transactionID
) - T1.transactionID > 1
What this does is, for each record in the table, look for the lowest transactionID among the records for the same vendor that have a higher transactionID than the first one. If the transactionID value of that record is more than one higher than the value in the first record, that represents a gap in numbering for the vendor.
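Here is a quick way to check the gap query, sketched with Python's sqlite3 module. SQLite has no TOP, so this version swaps in MIN() for the "TOP 1 ... ORDER BY" subquery, which yields the same next-higher transactionID. The sample rows and the helper name find_gaps are hypothetical.

```python
import sqlite3

# MIN() replaces Access's TOP 1 ... ORDER BY ascending; same value, portable syntax.
GAP_QUERY = """
    SELECT T1.vendorID, T1.transactionID
    FROM tblTransaction AS T1
    WHERE (SELECT MIN(T2.transactionID)
           FROM tblTransaction AS T2
           WHERE T1.vendorID = T2.vendorID
             AND T1.transactionID < T2.transactionID) - T1.transactionID > 1
    ORDER BY T1.vendorID, T1.transactionID
"""


def find_gaps(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tblTransaction (vendorID TEXT, transactionID INTEGER)")
    conn.executemany("INSERT INTO tblTransaction VALUES (?, ?)", rows)
    gaps = conn.execute(GAP_QUERY).fetchall()
    conn.close()
    return gaps


rows = [("A", 1), ("A", 2), ("A", 3), ("B", 10), ("B", 11), ("B", 14), ("B", 15)]
print(find_gaps(rows))  # [('B', 11)]
```

Each returned row is the last transaction before a gap. The highest transaction of each vendor has no successor, so the subquery yields NULL, the comparison is not true, and that row is correctly excluded.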
Ok...so what I'm trying to do is have a query (I can't use PL/SQL, as the query is used by an application that can't handle PL/SQL) that simply queries a table and, if a particular condition isn't met, creates a record for that condition in the returned results (without actually creating a record in a table).
To set this up, imagine there is only one table with the columns ID, TEST, and SPEC, which may have data like the following:
ID   TEST        SPEC
---- ----------  ---------------
1234 LIMIT_TEST  Total of limits
4321 LIMIT_TEST  Total of limits
5678 LIMIT_TEST  Etha
8765 LIMIT_TEST  Metha
The SPEC column is produced by a case, when, then statement that pulls expressions out of a SPECIFICATION column.
So you'll see there are actually 3 LIMIT_TESTs:
Total of limits
Etha
Metha
However, for ID 1234, there is only "Total of limits". What I need to have the query return is something like:
1234 LIMIT_TEST Total of limits
1234 LIMIT_TEST null Etha
1234 LIMIT_TEST null Metha
(Imagine that the case statement adds a column showing what the nulls stand for.)
Any ideas are appreciated.
You could form a UNION between your main query and another which includes a static NULL in its SELECT clause, and uses a NOT EXISTS in its WHERE clause to determine the absence of Etha and Metha.
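A minimal sketch of that UNION idea, run through Python's sqlite3 module instead of Oracle. The table name t, the helper name add_missing_specs, and the extra "missing" column (the one the question asks us to imagine) are assumptions for illustration.

```python
import sqlite3

ROWS = [
    (1234, "LIMIT_TEST", "Total of limits"),
    (4321, "LIMIT_TEST", "Total of limits"),
    (5678, "LIMIT_TEST", "Etha"),
    (8765, "LIMIT_TEST", "Metha"),
]

# Real rows (missing = NULL) plus one synthetic row per spec an id lacks.
UNION_QUERY = """
    SELECT id, test, spec, NULL AS missing FROM t
    UNION ALL
    SELECT i.id, i.test, NULL, s.spec
    FROM (SELECT DISTINCT id, test FROM t) AS i
    JOIN (SELECT DISTINCT test, spec FROM t) AS s ON s.test = i.test
    WHERE NOT EXISTS (SELECT * FROM t AS x
                      WHERE x.id = i.id AND x.test = i.test AND x.spec = s.spec)
    ORDER BY id, missing
"""


def add_missing_specs(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, test TEXT, spec TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = conn.execute(UNION_QUERY).fetchall()
    conn.close()
    return result


for row in add_missing_specs(ROWS):
    if row[0] == 1234:
        print(row)
```

For id 1234 this yields the real "Total of limits" row followed by synthetic rows for Etha and Metha. UNION ALL is used because the two halves can never produce the same row, so the duplicate-removal work of plain UNION is unnecessary.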
select id, test, decode(spec, ms, spec) spec, nullif(ms, spec) missing
from (select id, test, spec, ms,
row_number() over (partition by id, ms order by decode(spec, ms, 1)) rn
from t cross join (select distinct spec ms from t) dt )
where rn = 1
In the SQLFiddle I added one row for id=1234, spec='Etha' to check the scenario where two specs exist for one id. The table name is T (not creative).
Explanation:
select distinct spec - obvious step
cross join the distinct specs with our table - this probably has to be done somehow in any solution (union, exists, etc.)
enumerate the rows so that rows where spec equals ms get priority - this is done by row_number()
take only rows with rn = 1; the rest is a matter of presentation (functions decode and nullif).
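The four steps can be tried with Python's sqlite3 module (SQLite 3.25 or later is needed for ROW_NUMBER()). This is a translation sketch, not the Oracle original: decode() becomes CASE, and because Oracle sorts NULLs last in ascending order while SQLite sorts them first, the priority expression is spelled out with ELSE 2. The helper name fill_missing and the sample rows are hypothetical.

```python
import sqlite3

ROWS = [
    (1234, "LIMIT_TEST", "Total of limits"),
    (4321, "LIMIT_TEST", "Total of limits"),
    (5678, "LIMIT_TEST", "Etha"),
    (8765, "LIMIT_TEST", "Metha"),
]

ROW_NUMBER_QUERY = """
    SELECT id, test,
           CASE WHEN spec = ms THEN spec END AS spec,
           NULLIF(ms, spec) AS missing
    FROM (SELECT id, test, spec, ms,
                 ROW_NUMBER() OVER (
                     PARTITION BY id, ms
                     -- rows whose spec matches ms sort first (Oracle's
                     -- decode(spec, ms, 1) relies on NULLS LAST instead)
                     ORDER BY CASE WHEN spec = ms THEN 1 ELSE 2 END) AS rn
          FROM t CROSS JOIN (SELECT DISTINCT spec AS ms FROM t) AS dt)
    WHERE rn = 1
    ORDER BY id, ms
"""


def fill_missing(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, test TEXT, spec TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = conn.execute(ROW_NUMBER_QUERY).fetchall()
    conn.close()
    return result


# Also check the SQLFiddle scenario: id 1234 with a second spec.
with_extra = ROWS + [(1234, "LIMIT_TEST", "Etha")]
print([r for r in fill_missing(with_extra) if r[0] == 1234])
```

With the extra row, the Etha partition for id 1234 picks the real Etha row, so Etha is no longer reported as missing.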
This will do it...
select
c.id, c.test, d.spec, case when d.spec is null then c.spec else null end as missing_spec
from
(select a.id, a.test, b.spec from TABLE_NAME a, (select distinct spec from TABLE_NAME) b) c,
TABLE_NAME d
where c.id = d.id (+) and c.test = d.test (+) and c.spec = d.spec (+)
order by c.id, c.spec;
Assumption: There will only ever be one record in the table for each unique combination of id, test, and spec.
1) Cartesian join the source table with a distinct list of the spec values. This will provide a base result list having a record for each unique combination of all possible ids, tests, and spec values.
2) Left outer join the source table. This will allow you to identify which of all the possible unique combinations are actually present in the source table.
3) Add a CASE expression to the select clause for the final result column that displays null when the combination is found and the spec value when it is missing.
If it is possible for the source table to have multiple records for a single combination of id, test, and spec, then you would want to add distinct before the a.id in line 4 (as mentioned by Ponder Stibbons).
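The same three steps in ANSI join syntax, sketched with Python's sqlite3 module. Oracle's `(+)` markers on the d columns correspond to a LEFT OUTER JOIN to d; the table name t, the helper name cross_join_fill, and the sample rows are assumptions, and DISTINCT is included as suggested for tables that may hold duplicate combinations.

```python
import sqlite3

ROWS = [
    (1234, "LIMIT_TEST", "Total of limits"),
    (4321, "LIMIT_TEST", "Total of limits"),
    (5678, "LIMIT_TEST", "Etha"),
    (8765, "LIMIT_TEST", "Metha"),
]

LEFT_JOIN_QUERY = """
    SELECT c.id, c.test, d.spec,
           CASE WHEN d.spec IS NULL THEN c.spec END AS missing_spec
    FROM (SELECT DISTINCT a.id, a.test, b.spec             -- step 1: every combination
          FROM t AS a CROSS JOIN (SELECT DISTINCT spec FROM t) AS b) AS c
    LEFT OUTER JOIN t AS d                                  -- step 2: which ones exist
      ON c.id = d.id AND c.test = d.test AND c.spec = d.spec
    ORDER BY c.id, c.spec
"""


def cross_join_fill(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, test TEXT, spec TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = conn.execute(LEFT_JOIN_QUERY).fetchall()  # step 3 is the CASE above
    conn.close()
    return result


for row in cross_join_fill(ROWS):
    print(row)
```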
I want to return the last report of a given range of units. The last report will be identified by its time of creation. Therefore, the result would be a collection of last reports for a given range of units. I do not want to use a bunch of SELECT statements e.g.:
SELECT * FROM reports WHERE unit_id = 9999 ORDER BY time desc LIMIT 1
SELECT * FROM reports WHERE unit_id = 9998 ORDER BY time desc LIMIT 1
...
I initially tried this (but already knew it wouldn't work because it will only return 1 report):
'SELECT reports.* FROM reports INNER JOIN units ON reports.unit_id = units.id WHERE units.account_id IS NOT NULL AND units.account_id = 4 ORDER BY time desc LIMIT 1'
So I am looking for some kind of solution using subqueries or derived tables, but I just can't seem to figure out how to do it properly:
SELECT reports.* FROM reports
WHERE id IN
(
SELECT id FROM reports
INNER JOIN units ON reports.unit_id = units.id
ORDER BY time desc
LIMIT 1
)
Any solution to do this with subqueries or derived tables?
The simple way to do this in Postgres uses distinct on:
select distinct on (unit_id) r.*
from reports r
order by unit_id, time desc;
This construct is specific to Postgres and databases that use its code base. The expression distinct on (unit_id) says "I want to keep only one row for each unit_id". The row chosen is the first row encountered with that unit_id based on the order by clause.
EDIT:
Your original query would be, assuming that id increases along with the time field:
SELECT r.*
FROM reports r
WHERE id IN (SELECT max(id)
FROM reports
GROUP BY unit_id
);
You might also try this with not exists:
select r.*
from reports r
where not exists (select 1
from reports r2
where r2.unit_id = r.unit_id and
r2.time > r.time
);
I would expect distinct on to perform well. This last version (and maybe the previous one) would really benefit from an index on reports(unit_id, time).
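SQLite has no DISTINCT ON, but the two portable variants can be compared there. This sketch uses Python's sqlite3 module with hypothetical report rows in which id grows with time, the precondition the max(id) version depends on; the helper name latest_report_ids is made up.

```python
import sqlite3

REPORTS = [
    (1, 9999, "2015-01-01 10:00"),
    (2, 9998, "2015-01-01 11:00"),
    (3, 9999, "2015-01-02 09:00"),
    (4, 9998, "2015-01-03 08:00"),
    (5, 9997, "2015-01-01 12:00"),
]


def latest_report_ids(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reports (id INTEGER PRIMARY KEY, unit_id INTEGER, time TEXT)")
    # The index suggested above for the NOT EXISTS version.
    conn.execute("CREATE INDEX reports_unit_time ON reports (unit_id, time)")
    conn.executemany("INSERT INTO reports VALUES (?, ?, ?)", rows)
    # Only correct if id increases along with time.
    by_max_id = [r[0] for r in conn.execute(
        "SELECT r.id FROM reports r WHERE id IN "
        "(SELECT max(id) FROM reports GROUP BY unit_id) ORDER BY r.id")]
    # Keep a report when no later report exists for the same unit.
    by_not_exists = [r[0] for r in conn.execute(
        """SELECT r.id FROM reports r
           WHERE NOT EXISTS (SELECT 1 FROM reports r2
                             WHERE r2.unit_id = r.unit_id AND r2.time > r.time)
           ORDER BY r.id""")]
    conn.close()
    return by_max_id, by_not_exists


print(latest_report_ids(REPORTS))  # ([3, 4, 5], [3, 4, 5])
```

If two reports for one unit share the same latest time, the NOT EXISTS version returns both, whereas the max(id) version returns one.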
I need to update 2 columns in a table with values from another table
UPDATE transakcje t SET s_dzien = s_dzien0, s_cena = s_cena0
FROM
(SELECT c.price AS s_cena0, c.dzien AS s_dzien0 FROM ciagle c
WHERE c.dzien = t.k_dzien ORDER BY s_cena0 DESC LIMIT 1) AS zza;
But I got an error:
plan should not reference subplan's variable.
DB structure is as simple as possible: transakcje has k_dzien, k_cena, s_dzien, s_cena and ciagle has fields price, dzien.
I'm running PostgreSQL 9.3.
Edit
I want to update all records from transakcje.
For each row I must find one row from ciagle with same dzien and maximum price and save this price and dzien into transakcje.
In ciagle there are many rows with the same dzien (column is not distinct).
Problem
The form you had:
UPDATE tbl t
SET ...
FROM (SELECT ... WHERE col = t.col LIMIT 1) sub
... is illegal to begin with. As the error message tells you, a subquery cannot reference the table in the UPDATE clause. Items in the FROM list generally cannot reference other items on the same level (except with LATERAL in Postgres 9.3 or later). And the table in the UPDATE clause can never be referenced by subqueries in the FROM clause (and that hasn't changed in Postgres 9.3).
Even if that were possible, the result would be nonsense for two reasons:
The subquery with LIMIT 1 produces exactly one row (total), while you obviously want a specific value per dzien:
one row from ciagle with same dzien
Once you amend that and compute one price per dzien, you would end up with something like a cross join unless you add a WHERE condition to unambiguously join the result from the subquery to the table to be updated. Quoting the manual on UPDATE:
In other words, a target row shouldn't join to more than one row from
the other table(s). If it does, then only one of the join rows will be
used to update the target row, but which one will be used is not readily predictable.
Solution
All of this taken into account your query could look like this:
UPDATE transakcje t
SET s_dzien = c.dzien
, s_cena = c.price
FROM (
SELECT DISTINCT ON (dzien)
dzien, price
FROM ciagle
ORDER BY dzien, price DESC
) c
WHERE t.k_dzien = c.dzien
AND (t.s_dzien IS DISTINCT FROM c.dzien OR
t.s_cena IS DISTINCT FROM c.price)
Get the highest price for every dzien in ciagle in a subquery with DISTINCT ON. Details:
Select first row in each GROUP BY group?
As @wildplasser commented, if all you need is the highest price, you could also use the aggregate function max() instead of DISTINCT ON:
...
FROM (
SELECT dzien, max(price) AS price
FROM ciagle
GROUP BY dzien
) c
...
transakcje ends up with the same value in s_dzien and k_dzien where related rows are present in ciagle.
The added WHERE clause prevents empty updates, which you probably don't want: only cost and no effect (except for exotic special cases with triggers et al.) - a common oversight.
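To see the max() variant and the empty-update guard in action without a Postgres server, here is a sketch with Python's sqlite3 module. It is a swapped-in, portable formulation rather than the Postgres statement itself: the UPDATE ... FROM join is replaced by correlated subqueries (s_dzien can simply copy k_dzien, as noted above), and SQLite's null-safe `IS NOT` plays the role of `IS DISTINCT FROM`. The sample rows and the helper name refresh_best_prices are hypothetical.

```python
import sqlite3


def refresh_best_prices():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transakcje "
                 "(k_dzien TEXT, k_cena INTEGER, s_dzien TEXT, s_cena INTEGER)")
    conn.execute("CREATE TABLE ciagle (price INTEGER, dzien TEXT)")
    conn.executemany("INSERT INTO ciagle (dzien, price) VALUES (?, ?)",
                     [("2014-01-01", 10), ("2014-01-01", 15), ("2014-01-02", 7)])
    conn.executemany("INSERT INTO transakcje (k_dzien, k_cena) VALUES (?, ?)",
                     [("2014-01-01", 1), ("2014-01-02", 2), ("2014-01-03", 3)])
    update = """
        UPDATE transakcje
        SET s_dzien = k_dzien,
            s_cena = (SELECT max(c.price) FROM ciagle c
                      WHERE c.dzien = transakcje.k_dzien)
        WHERE EXISTS (SELECT 1 FROM ciagle c WHERE c.dzien = transakcje.k_dzien)
          -- skip rows that already hold these values (null-safe comparison)
          AND (s_dzien IS NOT k_dzien
               OR s_cena IS NOT (SELECT max(c.price) FROM ciagle c
                                 WHERE c.dzien = transakcje.k_dzien))
    """
    first = conn.execute(update).rowcount
    second = conn.execute(update).rowcount  # same statement again: nothing to change
    rows = conn.execute(
        "SELECT k_dzien, s_dzien, s_cena FROM transakcje ORDER BY k_dzien").fetchall()
    conn.close()
    return first, second, rows


print(refresh_best_prices())
```

The first run touches the two rows that have matching ciagle rows; the second run touches none, which is exactly the wasted work the extra WHERE condition avoids.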
I have my table (CTE) definitions and result set here.
The CTE may look strange, but it has been tested and returns the correct results in the most efficient manner that I've found yet. The query below should find the number of person IDs (patid) who are taking two or more drugs at the same time. Currently, the query works insofar as it returns the patIDs of the people taking both drugs, but it does not check that they take both drugs at the same time. Taking both drugs at the same time is indicated by the fillDate of one drug falling before the scriptEndDate of another drug.
You can see in this partial result set that on line 18 the scriptFillDate is 2009-07-19, which is between the fillDate and scriptEndDate of the same patID from row 2. What constraint do I need to add so I can filter out these unneeded results?
--PatientDrugList is a CTE because eventually parameters might be passed to it
--to alter the selection population
;with PatientDrugList(patid, filldate, scriptEndDate,drugName,strength)
as
(
select rx.patid,rx.fillDate,rx.scriptEndDate,rx.drugName,rx.strength
from rx
),
--the row constructor here will eventually be parameters for a stored procedure
DrugList (drugName)
as
(
select x.drugName
from (values ('concerta'),('fentanyl'))
as x(drugName)
where x.drugName is not null
)
--the row number here is so that I can find the largest date range
--(the largest datediff means the person was on a given drug for a larger
--amount of time; obviously not an optimal solution)
--celko inspired relational division!
select distinct row_number() over(partition by pd.patid, drugname order by datediff(day,pd.fillDate,pd.scriptEndDate)desc) as rn
,pd.patid
,pd.drugname
,pd.fillDate
,pd.scriptEndDate
from PatientDrugList as pd
where not exists
(select * from DrugList
where not exists
(select * from PatientDrugList as pd2
where(pd.patid=pd2.patid)
and (pd2.drugName = DrugList.drugName)))
and exists
(select *
from DrugList
where DrugList.drugName=pd.drugName
)
group by pd.patid, pd.drugName,pd.filldate,pd.scriptEndDate
Wrap your original query in a CTE, or better yet, for performance and for stability of the query plan and results, store it in a temp table.
The query below (assuming CTE option) will give you the overlapping times when both drugs are being taken.
;with tmp as (
.. your query producing the columns shown ..
)
select *
from tmp a
join tmp b on a.patid = b.patid and a.drugname <> b.drugname
where a.filldate < b.scriptenddate
and b.filldate < a.scriptenddate;
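The overlap test can be checked on a few hand-made prescriptions with Python's sqlite3 module, using a plain table named tmp in place of the CTE. The sample rows and the helper name overlapping_drugs are hypothetical, and the overlap_start/overlap_end columns are an addition that shows the shared window (SQLite's multi-argument max()/min() are scalar functions).

```python
import sqlite3

PRESCRIPTIONS = [
    (1, "concerta", "2009-06-01", "2009-07-31"),
    (1, "fentanyl", "2009-07-19", "2009-08-18"),  # overlaps concerta
    (2, "concerta", "2009-01-01", "2009-01-31"),
    (2, "fentanyl", "2009-03-01", "2009-03-31"),  # no overlap
]


def overlapping_drugs(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tmp "
                 "(patid INTEGER, drugname TEXT, filldate TEXT, scriptenddate TEXT)")
    conn.executemany("INSERT INTO tmp VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT a.patid, a.drugname, b.drugname,
               max(a.filldate, b.filldate) AS overlap_start,
               min(a.scriptenddate, b.scriptenddate) AS overlap_end
        FROM tmp a
        JOIN tmp b ON a.patid = b.patid AND a.drugname <> b.drugname
        WHERE a.filldate < b.scriptenddate   -- a starts before b ends
          AND b.filldate < a.scriptenddate   -- b starts before a ends
        ORDER BY a.patid, a.drugname
        """
    ).fetchall()
    conn.close()
    return result


for row in overlapping_drugs(PRESCRIPTIONS):
    print(row)
```

Because the join condition is `a.drugname <> b.drugname`, each overlapping pair appears twice, once in each order; `a.drugname < b.drugname` would keep just one.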
I have a table with this data:
Id Qty
-- ---
A 1
A 2
A 3
B 112
B 125
B 109
But I'm supposed to keep only the max value for each id. The max value for A is 3 and for B is 125. How can I isolate (and delete) the other values?
The final table should look like this:
Id Qty
-- ---
A 3
B 125
Running MySQL 4.1
Oh wait. I've got a simpler solution:
I'll select all the max values (group by id), export the data, flush the table, and reimport only the max values.
CREATE TABLE tabletemp LIKE `table`;
INSERT INTO tabletemp SELECT id,MAX(qty) FROM `table` GROUP BY id;
DROP TABLE `table`;
RENAME TABLE tabletemp TO `table`;
Thanks to all!
Try this in SQL Server:
delete o
from tbl o
left outer join
(Select max(qty) anz , id
from tbl i
group by i.id) k on o.id = k.id and k.anz = o.qty
where k.id is null
Revision 2 for MySQL... Can anyone check this one?
delete from tbl
where concat(id,qty) not in
(select concat(id,anz) from (Select max(qty) anz , id
from tbl i
group by i.id) as k)
Explanation:
Since I was not supposed to use joins (see the comments about MySQL support for joins in DELETE/UPDATE/INSERT), I moved the subquery into an IN (a, b, c) clause.
Inside an IN clause I can use a subquery, but that query is only allowed to return one field. So to filter out all elements that are not the maximum, I need to concat both fields into a single one so I can return it inside the IN clause. Basically, my query inside the IN returns only the ID+QTY of each group's maximum. To compare it with the main table I also need to apply concat on the outside, so the data for both fields match.
Basically the IN clause contains:
("A3","B125")
Disclaimer: The above query is "evil!" since it applies a function (concat) to the fields it compares. This makes any index on those fields almost useless. You should never formulate a query that runs on a regular basis that way. I only wanted to try to bend it so it works on MySQL.
Example of this "bad construct":
(Get all orders from the last 2 weeks)
select ... from orders where orderday + interval 14 day > now()
You should always do:
select ... from orders where orderday > now() - interval 14 day
The difference is subtle: version 2 only has to do the math once and can use an index on orderday, while version 1 has to do the math for every single row in the orders table, so you can forget about index usage.
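The index effect in the orderday example can be observed with SQLite's EXPLAIN QUERY PLAN, sketched here with Python's sqlite3 module. SQLite stands in for MySQL, so the date math uses SQLite's date() modifiers instead of INTERVAL; the orders rows, the index name idx_orders_orderday, the helper name compare_date_filters, and the fixed AS_OF "today" are all hypothetical.

```python
import sqlite3
from datetime import date, timedelta

AS_OF = date(2024, 1, 31)  # hypothetical "today", fixed so results are repeatable


def compare_date_filters():
    """Run both forms of the 'last 14 days' filter; return (ids, plan) for each."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, orderday TEXT, note TEXT)")
    conn.execute("CREATE INDEX idx_orders_orderday ON orders (orderday)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(1, "2024-01-11", "a"), (2, "2024-01-21", "b"), (3, "2024-01-30", "c")])
    queries = {
        # Version 1: the date math wraps the column, so it runs once per row.
        "v1": ("SELECT * FROM orders WHERE date(orderday, '+14 days') > ?",
               AS_OF.isoformat()),
        # Version 2: the date math is done once, on the constant side.
        "v2": ("SELECT * FROM orders WHERE orderday > ?",
               (AS_OF - timedelta(days=14)).isoformat()),
    }
    out = {}
    for name, (sql, param) in queries.items():
        ids = sorted(row[0] for row in conn.execute(sql, (param,)))
        # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
        plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, (param,))]
        out[name] = (ids, plan)
    conn.close()
    return out


for name, (ids, plan) in compare_date_filters().items():
    print(name, ids, plan)
```

Both versions return the same orders, but only version 2's plan should search idx_orders_orderday; version 1 has to scan the whole table. The exact plan wording differs between SQLite releases.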
I'd try this:
delete from T
where exists (
select * from T as T2
where T2.Id = T.Id
and T2.Qty > T.Qty
);
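SQLite, unlike MySQL, lets a DELETE reference its own table in a subquery, so the EXISTS version can be run as-is there. This sketch uses Python's sqlite3 module with the question's data plus a hypothetical tied group C; the helper name keep_max_rows is also made up.

```python
import sqlite3

ROWS = [("A", 1), ("A", 2), ("A", 3),
        ("B", 112), ("B", 125), ("B", 109),
        ("C", 7), ("C", 7)]  # C: hypothetical tie for the maximum


def keep_max_rows(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T (Id TEXT, Qty INTEGER)")
    conn.executemany("INSERT INTO T VALUES (?, ?)", rows)
    # Delete every row for which a larger Qty exists under the same Id.
    conn.execute("""
        DELETE FROM T
        WHERE EXISTS (SELECT * FROM T AS T2
                      WHERE T2.Id = T.Id AND T2.Qty > T.Qty)
    """)
    remaining = conn.execute("SELECT Id, Qty FROM T ORDER BY Id, Qty").fetchall()
    conn.close()
    return remaining


print(keep_max_rows(ROWS))  # [('A', 3), ('B', 125), ('C', 7), ('C', 7)]
```

Rows tied for the maximum are all kept, since neither has a strictly larger Qty than the other.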
For those who might have a similar question in the future: this might be supported some day (it already is in SQL Server 2005 and later).
It doesn't require a join, and it has advantages over using a temporary table if the table has dependencies.
with Tranked(Id,Qty,rk) as (
select
Id, Qty,
rank() over (
partition by Id
order by Qty desc
)
from T
)
delete from Tranked
where rk > 1;
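SQLite cannot delete through a CTE the way SQL Server can, but the same ranking idea works if the CTE carries each row's rowid. This is a sketch using Python's sqlite3 module (SQLite 3.25+ for window functions). The helper name keep_top_ranked, its ranking switch, and the tied group C are hypothetical additions.

```python
import sqlite3

ROWS = [("A", 1), ("A", 2), ("A", 3),
        ("B", 112), ("B", 125), ("B", 109),
        ("C", 7), ("C", 7)]  # C: hypothetical tie for the maximum


def keep_top_ranked(rows, ranking="rank"):
    # Only these two names are interpolated into the SQL below.
    if ranking not in ("rank", "row_number"):
        raise ValueError("ranking must be 'rank' or 'row_number'")
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T (Id TEXT, Qty INTEGER)")
    conn.executemany("INSERT INTO T VALUES (?, ?)", rows)
    # Rank rows within each Id, highest Qty first, then delete by rowid.
    conn.execute(f"""
        WITH Tranked AS (
            SELECT rowid AS rid,
                   {ranking}() OVER (PARTITION BY Id ORDER BY Qty DESC) AS rk
            FROM T
        )
        DELETE FROM T WHERE rowid IN (SELECT rid FROM Tranked WHERE rk > 1)
    """)
    remaining = conn.execute("SELECT Id, Qty FROM T ORDER BY Id, Qty").fetchall()
    conn.close()
    return remaining


print(keep_top_ranked(ROWS))                # ties kept: both ('C', 7) rows survive
print(keep_top_ranked(ROWS, "row_number"))  # exactly one row per Id
```

rank() matches the answer and keeps every row tied for the maximum; row_number() breaks ties arbitrarily and keeps exactly one row per Id.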
You'll have to go via another table (among other things, what makes a single DELETE statement quite impossible here in MySQL is that you can't delete from a table and use the same table in a subquery).
BEGIN;
create temporary table tmp_del select id,max(qty) as qty from the_tbl group by id;
delete the_tbl from the_tbl,tmp_del where
the_tbl.id=tmp_del.id and the_tbl.qty<tmp_del.qty;
drop table tmp_del;
COMMIT;
MySQL 4.0 and later supports a simple multi-table syntax for DELETE:
DELETE t1 FROM MyTable t1 JOIN MyTable t2 ON t1.id = t2.id AND t1.qty < t2.qty;
This produces a join of each row with a given id to all other rows with the same id, and deletes only the row with the lesser qty in each pairing. After this is all done, the row with the greatest qty in each id group is the one left undeleted.
If you only have one row with a given id, it still works because a single row is naturally the one with the greatest value.
FWIW, I just tried my solution using MySQL 5.0.75 on a 2.40GHz MacBook Pro. I inserted 1 million rows of synthetic data, with different numbers of rows per "group":
2 rows per id completes in 26.78 sec.
5 rows per id completes in 43.18 sec.
10 rows per id completes in 1 min 3.77 sec.
100 rows per id completes in 6 min 46.60 sec.
1000 rows per id didn't complete before I terminated it.