Why does this MySQL query give me garbage results?

I'm trying to get the total number of points a user has, as well as the current month's points. When a user earns a point, it gets logged in the points table with a timestamp. The total ignores the timestamp, while the current month's count only includes points with a qualifying timestamp (from the first day of the month onward).
SELECT user_id, user_name, sum(tpoints.point_points) as total_points, sum(mpoints.point_points) as month_points
FROM users
LEFT JOIN points tpoints
ON users.user_id = tpoints.point_userid
LEFT JOIN points mpoints
ON (users.user_id = mpoints.point_userid AND mpoints.point_date > '$this_month')
WHERE user_id = 1
GROUP BY user_id
points table structure
CREATE TABLE IF NOT EXISTS `points` (
`point_userid` int(11) NOT NULL,
`point_points` int(11) NOT NULL,
`point_date` int(11) NOT NULL,
KEY `point_userid` (`point_userid`),
KEY `point_date` (`point_date`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
This results in a very large number that is equal to the sum of all points multiplied by the number of rows that match the query.
I need to achieve this without the use of subqueries or multiple queries.

Try:
SELECT user_id, user_name, sum(point_points) as total_points, sum( case when point_date > '$this_month' then point_points else 0 end ) as month_points
FROM users
LEFT JOIN points
ON users.user_id = points.point_userid
WHERE user_id = 1
GROUP BY user_id, user_name
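The conditional-aggregation fix can be sketched end to end with SQLite (a minimal sketch; the table contents and the `$this_month` cutoff value here are made up for illustration):

```python
import sqlite3

# Minimal sketch with hypothetical data: user 1 has three point rows,
# only one of which falls after the month cutoff.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (user_id INTEGER, user_name TEXT);
    CREATE TABLE points (point_userid INTEGER, point_points INTEGER, point_date INTEGER);
    INSERT INTO users  VALUES (1, 'alice');
    INSERT INTO points VALUES (1, 10, 100), (1, 20, 200), (1, 5, 300);
""")

this_month = 250  # stand-in for the $this_month timestamp

# One pass over points: SUM everything for the total, while the CASE
# routes only rows after the cutoff into month_points.
row = conn.execute("""
    SELECT u.user_id, u.user_name,
           SUM(p.point_points) AS total_points,
           SUM(CASE WHEN p.point_date > ? THEN p.point_points ELSE 0 END) AS month_points
    FROM users u
    LEFT JOIN points p ON u.user_id = p.point_userid
    WHERE u.user_id = 1
    GROUP BY u.user_id, u.user_name
""", (this_month,)).fetchone()

print(row)  # (1, 'alice', 35, 5)
```

Because the points table is joined only once, each row is counted exactly once, avoiding the row-multiplication that the double join causes.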

SELECT user_id, user_name,
(
SELECT SUM(points.point_points)
FROM points
WHERE points.point_userid = users.user_id
) AS total_points,
(
SELECT SUM(points.point_points)
FROM points
WHERE points.point_userid = users.user_id
AND points.point_date > '$this_month'
) AS month_points
FROM users
WHERE user_id = 1

Related

My query is returning duplicates

I have written an SQL query to filter on a number of conditions, and have used DISTINCT to find only unique records.
Specifically, I need only the AccountID field to be unique; there are multiple AddressClientIDs for each AccountID.
The query works, but it is producing some duplicates.
Further caveats are:
There are multiple afs_trans records for each AccountID
There can be afs_trans records with both Y and N for an AccountID
I only want to return AccountIDs that have transactions with statuses other than the two specified, hence the NOT IN, as I do not want those two statuses.
I would like to find only unique values for the AccountID column.
If anyone could help refine the query below, it would be much appreciated.
SELECT AFS_Account.AddressClientID
,afs_transunit.AccountID
,SUM(afs_transunit.Units)
FROM AFS_TransUnit
,AFS_Account
WHERE afs_transunit.AccountID IN (
-- Gets accounts which only have non post statuses
SELECT DISTINCT accountid
FROM afs_trans
WHERE accountid NOT IN (
SELECT accountid
FROM afs_trans
WHERE STATUS IN (
'POSTPEND'
,'POSTWAIT'
)
)
-- This gets the unique accountIDs which only have transactions with Y status,
-- and removes any which have both Y and N.
AND AccountID IN (
SELECT DISTINCT accountid
FROM afs_trans
WHERE IsAllocated = 'Y'
AND accountid NOT IN (
SELECT DISTINCT AccountID
FROM afs_trans
WHERE IsAllocated = 'N'
)
)
)
AND AFS_TransUnit.AccountID = AFS_Account.AccountID
GROUP BY afs_transunit.AccountID
,AFS_Account.AddressClientID
HAVING SUM(afs_transunit.Units) > 100
Thanks.
Since you confirmed that you have a one-to-many relationship across the two tables on the AccountID column, you could use the MAX value of AccountID to get distinct values:
SELECT afa.AddressClientID
,MAX(aft.AccountID)
,SUM(aft.Units)
FROM AFS_TransUnit aft
INNER JOIN AFS_Account afa ON aft.AccountID = afa.AccountID
GROUP BY afa.AddressClientID
HAVING SUM(aft.Units) > 100
AND MAX(aft.AccountID) IN (
-- Gets accounts which only have non post statuses
-- This gets the unique accountIDs which only have transactions with Y status,
-- and removes any which have both Y and N.
SELECT DISTINCT accountid
FROM afs_trans a
WHERE [STATUS] NOT IN ('POSTPEND','POSTWAIT')
AND a.accountid IN (
SELECT t.accountid
FROM (
SELECT accountid
,max(isallocated) AS maxvalue
,min(isallocated) AS minvalue
FROM afs_trans
GROUP BY accountid
) t
WHERE t.maxvalue = 'Y'
AND t.minvalue = 'Y'
)
)
SELECT AFS_Account.AddressClientID
,afs_transunit.AccountID
,SUM(afs_transunit.Units)
FROM AFS_TransUnit
INNER JOIN AFS_Account ON AFS_TransUnit.AccountID = AFS_Account.AccountID
INNER JOIN afs_trans ON afs_trans.accountid = afs_transunit.accountid
WHERE afs_trans.STATUS NOT IN ('POSTPEND','POSTWAIT')
-- AND afs_trans.isallocated = 'Y'
GROUP BY afs_transunit.AccountID
,AFS_Account.AddressClientID
HAVING SUM(afs_transunit.Units) > 100
and max(afs_trans.isallocated) = 'Y'
and min(afs_trans.isallocated) = 'Y'
This rewrites your query with ANSI SQL join syntax. Since the tables are joined directly, you only need to specify the conditions, without the sub-queries you had.
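The MIN/MAX trick used above (keeping only accounts whose isallocated values are all 'Y') can be sketched in isolation; the sample rows here are hypothetical:

```python
import sqlite3

# Sketch of the MIN/MAX trick: an account whose isallocated values are all 'Y'
# has MIN(isallocated) = MAX(isallocated) = 'Y'; mixed Y/N accounts fail the test.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE afs_trans (accountid INTEGER, isallocated TEXT);
    INSERT INTO afs_trans VALUES
        (1, 'Y'), (1, 'Y'),   -- all Y  -> keep
        (2, 'Y'), (2, 'N'),   -- mixed  -> drop
        (3, 'N');             -- all N  -> drop
""")

rows = conn.execute("""
    SELECT accountid
    FROM afs_trans
    GROUP BY accountid
    HAVING MIN(isallocated) = 'Y' AND MAX(isallocated) = 'Y'
""").fetchall()

print(rows)  # [(1,)]
```

This replaces the nested `IN ... NOT IN` pair with a single grouped scan, which is why the rewritten query can drop the sub-queries.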

Getting a subquery to run N times

I'm trying to write a query that scans a table with multiple status entries for each date, for each test, for each area in a system. The goal is to get the newest status for each date, for each test, in ONE given area. This would give me a broad overview of a system, so I can determine where the majority of tests are failing.
Below is the basic table structure, but I've created this SQLFiddle for ease of use.
CREATE TABLE area (
area_id integer NOT NULL,
area_name character varying(100)
);
CREATE TABLE test (
test_id integer NOT NULL,
test_name character varying(100) NOT NULL,
area_id integer NOT NULL,
test_isvisible boolean DEFAULT true
);
CREATE TABLE status (
status_date bigint NOT NULL,
test_id integer NOT NULL,
process_state_id integer NOT NULL,
process_step_id integer NOT NULL,
status_iteration integer DEFAULT 1 NOT NULL,
status_time bigint NOT NULL
);
CREATE TABLE process_state (
process_state_id integer NOT NULL,
process_state_name character varying(100)
);
CREATE TABLE process_step (
process_step_id integer NOT NULL,
process_step_name character varying(100)
);
The query I currently have gets the furthest point of test processing for one single test for every date that is available. I would like to figure out a way to get that same type of information but instead pass the id of a given area, so that I can get that same data for each test in that area.
I.E. in the SQLFiddle, where I have information from dates July 2 - 10 for test1, I would also like the query to return the same set of information for test2, thus returning 18 rows instead of 9.
The main problem I'm having is that when I try to just join the area table and get all of the tests that way, I end up getting only 9 days of data like I did with one test, but just a mix-and-match of data from different tests.
Let me know if you need any more information, and I will post back here if I manage to figure it out before someone here does.
EDIT
As was pointed out in the comments, this trial data does not have keys (primary or foreign) simply because it saved time and wasn't necessary for the problem at hand. It is important to note though, that these keys are 100% necessary in real world application, as the larger the dataset becomes, the more unruly and time consuming it would be to run queries against your tables.
Lesson: Don't do drugs, do keys.
After a couple more hours, I found a different way to think about it, and finally got the data I was looking for.
I realized that my main problem with my previous attempts had been the use of GROUP BY, since I would have to group every selected column if I grouped any of them. So I first wrote a query that just got me the test_id/test_name along with each date that there was data for, since I knew I could group all of these no problem:
SELECT t.test_name AS test_name,
to_char( to_timestamp(s.status_date)::TIMESTAMP, 'MM/DD/YYYY' ) AS event_date,
s.status_date
FROM status s
INNER JOIN test t ON t.test_id = s.test_id
INNER JOIN area a ON a.area_id = t.area_id
INNER JOIN process_step step ON s.process_step_id = step.process_step_id
INNER JOIN process_state state ON s.process_state_id = state.process_state_id
WHERE a.area_id = 12
GROUP BY t.test_id, s.status_date, t.test_name;
This didn't give me any information about how far each test actually got (completed, failed, running). So I then wrote a separate query that simply got the test status when given a test_id and a status_date:
SELECT
CASE WHEN state.process_state_name = 'FAILURE' OR state.process_state_name = 'WAITING' OR state.process_state_name = 'VOLUME' THEN state.process_state_name
WHEN step.process_step_name = 'COMPLETE' AND (state.process_state_name = 'SUCCESS' OR state.process_state_name = 'APPROVED') THEN 'Complete'
ELSE 'Running'
END AS process_state
FROM status s
INNER JOIN process_step step ON s.process_step_id = step.process_step_id
INNER JOIN process_state state ON s.process_state_id = state.process_state_id
WHERE s.test_id = 290
AND s.status_date = 1404273600
AND s.status_iteration = (SELECT MAX(s.status_iteration)
FROM status s
WHERE s.test_id = 290
AND s.status_date = 1404273600)
ORDER BY s.status_time DESC, s.process_step_id DESC
LIMIT 1;
So this query worked for one single test and date, which I recognized would work perfectly for a subquery in my original query, since it would bypass the GROUP BY logic. So with that in mind, I proceeded to merge the two queries to get this one final query:
SELECT t.test_name AS test_name,
to_char( to_timestamp(status.status_date)::TIMESTAMP, 'MM/DD/YYYY' ) AS event_date,
(
SELECT
CASE WHEN state.process_state_name = 'FAILURE' OR state.process_state_name = 'WAITING' OR state.process_state_name = 'VOLUME' THEN state.process_state_name
WHEN step.process_step_name = 'COMPLETE' AND (state.process_state_name = 'SUCCESS' OR state.process_state_name = 'APPROVED') THEN 'Complete'
ELSE 'Running'
END AS process_state
FROM status s
INNER JOIN process_step step ON s.process_step_id = step.process_step_id
INNER JOIN process_state state ON s.process_state_id = state.process_state_id
WHERE s.test_id = t.test_id
AND s.status_date = status.status_date
AND s.status_iteration = (SELECT MAX(s.status_iteration)
FROM status s
WHERE s.test_id = t.test_id
AND s.status_date = status.status_date)
ORDER BY s.status_time DESC, s.process_step_id DESC
LIMIT 1
) AS process_status
FROM status status
INNER JOIN test t ON t.test_id = status.test_id
INNER JOIN area a ON a.area_id = t.area_id
WHERE a.area_id = 12
GROUP BY t.test_id, status.status_date, t.test_name
ORDER BY 1, 2;
And all of this can be seen in action in my revised SQLFiddle.
Let me know if you have questions about what I did, hopefully this helps future developers.
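The core pattern here (a correlated scalar subquery with ORDER BY ... LIMIT 1 that bypasses the GROUP BY restrictions) can be sketched with a stripped-down status table; the columns and sample data are simplified stand-ins for the schema above:

```python
import sqlite3

# Each (test_id, status_date) group has several status rows; the correlated
# subquery picks the one with the highest status_time per group.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE status (test_id INTEGER, status_date INTEGER,
                         status_time INTEGER, state TEXT);
    INSERT INTO status VALUES
        (1, 100, 10, 'Running'), (1, 100, 20, 'Complete'),
        (2, 100, 10, 'Running'), (2, 100, 30, 'FAILURE');
""")

rows = conn.execute("""
    SELECT s.test_id, s.status_date,
           (SELECT s2.state
            FROM status s2
            WHERE s2.test_id = s.test_id
              AND s2.status_date = s.status_date
            ORDER BY s2.status_time DESC
            LIMIT 1) AS latest_state
    FROM status s
    GROUP BY s.test_id, s.status_date
    ORDER BY s.test_id, s.status_date
""").fetchall()

print(rows)  # [(1, 100, 'Complete'), (2, 100, 'FAILURE')]
```

The outer query only groups the key columns; the scalar subquery re-scans the group for its latest row, exactly as the merged query above does per test and date.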

How to simplify nested SQL cross join?

I'm using Postgres 9.3 and have the following four tables to allow maximum flexibility regarding price and/or tax rate changes in the future (see below for more details):
CREATE TABLE main.products
(
id serial NOT NULL,
"productName" character varying(255) NOT NULL,
"productStockAmount" real NOT NULL,
)
CREATE TABLE main."productPrices"
(
id serial NOT NULL,
product_id integer NOT NULL,
"productPriceValue" real NOT NULL,
"productPriceValidFrom" timestamp without time zone NOT NULL,
)
CREATE TABLE main."productTaxes"
(
id serial NOT NULL,
product_id integer NOT NULL,
"productTaxValidFrom" timestamp without time zone NOT NULL,
"taxRate_id" integer NOT NULL,
)
CREATE TABLE main."taxRateValues"
(
id integer NOT NULL,
"taxRate_id" integer NOT NULL,
"taxRateValueValidFrom" timestamp without time zone NOT NULL,
"taxRateValue" real,
)
I built a view based on the following query to get the currently relevant values:
SELECT p.id, p."productName", p."productStockAmount",
       sub."productPriceValue",
       CHR(64 + sub3."taxRate_id") AS taxRateId,
       sub3."taxRateValue"
FROM main."products" p
CROSS JOIN LATERAL (SELECT * FROM main."productPrices" pp2 WHERE pp2."product_id"=p."id" AND pp2."productPriceValidFrom" <= NOW() ORDER BY pp2."productPriceValidFrom" DESC LIMIT 1) AS sub
CROSS JOIN LATERAL (SELECT * FROM main."productTaxes" pt WHERE pt."product_id"=p."id" AND pt."productTaxValidFrom" <= NOW() ORDER BY pt."productTaxValidFrom" DESC LIMIT 1) AS sub2
CROSS JOIN LATERAL (SELECT * FROM main."taxRateValues" trv WHERE trv."taxRate_id"=sub2."taxRate_id" AND trv."taxRateValueValidFrom" <= NOW() ORDER BY trv."taxRateValueValidFrom" DESC LIMIT 1) AS sub3
This works fine and gives me the correct results, but I expect performance problems once several thousand products, price changes, etc. are in the database.
Is there anything I can do to simplify the statement or the overall database design?
To use words to describe the needed flexibility:
Prices can be changed and I have to record which price is valid to which time (archival, so not only the current price is needed)
Applied tax rates for products can be changed (e.g. due to changes by law) - archival also needed
Tax rates in general can be changed (also by law, but not related to a single product but all products with this identifier)
Some examples of things that can happen:
Product X changes price from 100 to 200 at 2014-05-09
Product X changes tax rate from A to B at 2014-07-01
Tax rate value for tax rate A changes from 16 to 19 at 2014-09-01
As long as you fetch all rows or more than a few percent of all rows, it will be substantially faster to first aggregate once per table, and then join.
I suggest DISTINCT ON to pick the latest valid row per id:
SELECT p.id, p."productName", p."productStockAmount"
,pp."productPriceValue"
,CHR(64 + tr."taxRate_id") AS "taxRateId", tr."taxRateValue"
FROM main.products p
LEFT JOIN (
SELECT DISTINCT ON (product_id)
product_id, "productPriceValue"
FROM main."productPrices"
WHERE "productPriceValidFrom" <= now()
ORDER BY product_id, "productPriceValidFrom" DESC
) pp ON pp.product_id = p.id
LEFT JOIN (
SELECT DISTINCT ON (product_id)
product_id, "taxRate_id"
FROM main."productTaxes"
WHERE "productTaxValidFrom" <= now()
ORDER BY product_id, "productTaxValidFrom" DESC
) pt ON pt.product_id = p.id
LEFT JOIN (
SELECT DISTINCT ON ("taxRate_id") *
FROM main."taxRateValues"
WHERE "taxRateValueValidFrom" <= now()
ORDER BY "taxRate_id", "taxRateValueValidFrom" DESC
) tr ON tr."taxRate_id" = pt."taxRate_id";
Using LEFT JOIN to be on the safe side. Not every product might have entries in all sub-tables.
And I subscribe to what @Clodoaldo wrote about double-quoted identifiers: I never use anything but legal, lower-case names. It makes your life with Postgres easier.
Detailed explanation for DISTINCT ON:
Select first row in each GROUP BY group?
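DISTINCT ON is PostgreSQL-specific; in engines without it, the same "latest valid row per key" selection can be emulated with ROW_NUMBER(). A minimal sketch against a simplified productPrices table (column names shortened for illustration; requires SQLite >= 3.25 for window functions):

```python
import sqlite3

# Emulating DISTINCT ON (product_id) ... ORDER BY valid_from DESC:
# number the rows per product, newest first, and keep row 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE productPrices (product_id INTEGER, value REAL, valid_from INTEGER);
    INSERT INTO productPrices VALUES
        (1, 100.0, 10), (1, 200.0, 20),   -- price change: 200 is current
        (2, 50.0, 15);
""")

rows = conn.execute("""
    SELECT product_id, value
    FROM (SELECT product_id, value,
                 ROW_NUMBER() OVER (PARTITION BY product_id
                                    ORDER BY valid_from DESC) AS rn
          FROM productPrices
          WHERE valid_from <= 30) AS latest   -- "<= now()" stand-in
    WHERE rn = 1
    ORDER BY product_id
""").fetchall()

print(rows)  # [(1, 200.0), (2, 50.0)]
```

Each of the three aggregated sub-tables in the answer above follows this shape: filter to valid rows, keep the newest per key, then join.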
Do not create quoted identifiers. Once you do, you are forever stuck with them and will have to quote them and remember the casing everywhere. You can use camel case whenever you want if you don't quote the identifier at creation time.
I don't understand why you need the CROSS JOIN LATERAL. I think it can be just:
select
p.id,
p."productName",
p."productStockAmount",
pp2."productPriceValue",
chr(64 + trv."taxRate_id") as "taxRateId",
trv."taxRateValue"
from
main."products" p
left join (
select *
from main."productPrices"
where "productPriceValidFrom" <= now()
order by "productPriceValidFrom" desc
limit 1
) pp2 on pp2."product_id" = p."id"
left join (
select "product_id", "taxRate_id"
from main."productTaxes"
where "productTaxValidFrom" <= now()
order by "productTaxValidFrom" desc
limit 1
) pt on pt."product_id" = p."id"
left join (
select *
from main."taxRateValues"
where "taxRateValueValidFrom" <= now()
order by "taxRateValueValidFrom" desc
limit 1
) trv on trv."taxRate_id" = pt."taxRate_id"

Link subsequent patient visits from same table in SQL

I have a table containing records for patient admissions to a group of hospitals.
I would like to link each record to the patient's most recent previous record, if there is one, or return a null field if there is no previous record.
Further to this, I would like to place some criteria on the linked records, e.g. previous visit to the same hospital only, or previous visit less than 7 days before.
The data looks something like this (with a whole lots of other fields)
Record  PatientID  Hospital  AdmitDate  DischargeDate
1       1          A         1/2/12     3/2/12
2       2          A         1/2/12     4/2/12
3       1          B         4/3/12     4/3/12
My thinking was a self join but I can't figure out how to join to the record where the difference between the admit date and the patient's previous discharge date is the minimum.
Thanks!
You could use row_number() to assign increasing numbers to records for each patient. Then you can left join to the previous record:
; with numbered_records as
(
select row_number() over (partition by PatientID, Hospital
order by Record desc) as rn
, *
from YourTable
)
select *
from numbered_records cur
left join
numbered_records prev
on prev.PatientID = cur.PatientID
and prev.Hospital = cur.Hospital
and prev.DischargeDate >= dateadd(day, -7, getdate())
and prev.rn = cur.rn + 1
To select only the latest row per patient, add:
where cur.rn = 1
at the end of the query.
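The ROW_NUMBER self-join idea can be sketched with a minimal visits table (simplified from the data above; requires SQLite >= 3.25 for window functions):

```python
import sqlite3

# Number each patient's visits newest-first, then LEFT JOIN row n to
# row n+1 so every admission is paired with the previous one (or NULL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visits (record INTEGER, patient_id INTEGER, admit TEXT, discharge TEXT);
    INSERT INTO visits VALUES
        (1, 1, '2012-02-01', '2012-02-03'),
        (2, 2, '2012-02-01', '2012-02-04'),
        (3, 1, '2012-03-04', '2012-03-04');
""")

rows = conn.execute("""
    WITH numbered AS (
        SELECT v.*, ROW_NUMBER() OVER (PARTITION BY patient_id
                                       ORDER BY admit DESC) AS rn
        FROM visits v
    )
    SELECT cur.record, prev.record AS prev_record
    FROM numbered cur
    LEFT JOIN numbered prev
           ON prev.patient_id = cur.patient_id
          AND prev.rn = cur.rn + 1
    ORDER BY cur.record
""").fetchall()

print(rows)  # [(1, None), (2, None), (3, 1)]
```

Only record 3 (patient 1's second visit) picks up a previous record; the extra criteria (same hospital, within 7 days) would become additional conditions on the join, as in the answer above.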
This will give you the first two records for each patient. If you want the same hospital, add another check on Hospital along with PatientID; you can add the date as well.
SELECT * FROM T1 t
WHERE (2 >= (SELECT Count(*) FROM T1 tmp
WHERE t.PatientID = tmp.PatientID
AND t.Record <= tmp.Record))
It will bring back only one record if there is only one entry.
Note that I used DATE as the data type. A patient might visit one hospital before noon and another in the afternoon; you would use DATETIME in that case. Sorting within the partition uses dt_admit before record_id, to allow data to be entered in any order.
CREATE TABLE #hdata(
record_id INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
patient_id INT NOT NULL,
hospital_id INT NOT NULL,
dt_admit DATE NOT NULL,
dt_discharge DATE NULL
);
INSERT INTO #hdata(
patient_id,
hospital_id,
dt_admit,
dt_discharge
)
VALUES (
1,
1,
'2012-02-01',
'2012-02-03'
), (
2,
1,
'2012-02-01',
'2012-02-04'
), (
1,
2,
'2012-03-04',
'2012-03-04'
);
-- 1/ link each record to the previous record for each patient, NULL if none
SELECT
record_id,
patient_id,
ROW_NUMBER() OVER (PARTITION BY patient_id ORDER BY dt_admit,record_id) AS visit_seq_id
INTO
#visit_sequence
FROM
#hdata;
SELECT
v1.record_id,
v1.patient_id,
v2.record_id AS previous_record_id
FROM
#visit_sequence AS v1
LEFT JOIN #visit_sequence AS v2 ON
v2.patient_id=v1.patient_id AND
v2.visit_seq_id=v1.visit_seq_id-1
ORDER BY
v1.record_id;
DROP TABLE #visit_sequence;
-- 2/ criteria on linked records: same hospital, previous visit < 7 days
SELECT
record_id,
patient_id,
hospital_id,
dt_admit,
ROW_NUMBER() OVER (PARTITION BY patient_id,hospital_id ORDER BY dt_admit,record_id) AS visit_seq_id
INTO
#visit_sequence_elab
FROM
#hdata;
SELECT
v1.record_id,
v1.patient_id,
v2.record_id AS previous_record_id
FROM
#visit_sequence_elab AS v1
LEFT JOIN #visit_sequence_elab AS v2 ON
v2.patient_id=v1.patient_id AND
v2.hospital_id=v1.hospital_id AND
v2.visit_seq_id=v1.visit_seq_id-1 AND
DATEDIFF(DAY,v2.dt_discharge,v1.dt_admit)<7
ORDER BY
v1.record_id;
DROP TABLE #visit_sequence_elab;
DROP TABLE #hdata;

UPDATE FROM subquery using the same table in subquery's WHERE

I have two integer fields in a users table: leg_count and leg_length. The first stores the number of legs a user has, and the second their total length.
Each leg that belongs to a user is stored in a separate table, since a typical internet user can have anywhere from zero to infinity legs:
CREATE TABLE legs (
user_id int not null,
length int not null
);
I want to recalculate the statistics for all users in one query, so I try:
UPDATE users SET
leg_count = subquery.count, leg_length = subquery.length
FROM (
SELECT COUNT(*) as count, SUM(length) as length FROM legs WHERE legs.user_id = users.id
) AS subquery;
and get "subquery in FROM cannot refer to other relations of same query level" error.
So I have to do
UPDATE users SET
leg_count = (SELECT COUNT(*) FROM legs WHERE legs.user_id = users.id),
leg_length = (SELECT SUM(length) FROM legs WHERE legs.user_id = users.id)
which makes the database perform two SELECTs for each row, although the required data could be calculated in one SELECT:
SELECT COUNT(*), SUM(length) FROM legs;
Is it possible to optimize my UPDATE query to use only one SELECT subquery?
I use PostgreSQL, but I believe a solution exists for any SQL dialect.
TIA.
I would do:
WITH stats AS
( SELECT COUNT(*) AS cnt
, SUM(length) AS totlength
, user_id
FROM legs
GROUP BY user_id
)
UPDATE users
SET leg_count = cnt, leg_length = totlength
FROM stats
WHERE stats.user_id = users.id
You could use PostgreSQL's extended update syntax:
update users as u
set leg_count = aggr.cnt
, leg_length = aggr.length
from (
select legs.user_id
, count(*) as cnt
, sum(length) as length
from legs
group by
legs.user_id
) as aggr
where u.user_id = aggr.user_id
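The underlying idea in both answers (aggregate legs once per user_id, then write both values back in one pass) can be sketched portably with SQLite; here executemany stands in for the server-side UPDATE ... FROM shown above, and the sample data is made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, leg_count INTEGER, leg_length INTEGER);
    CREATE TABLE legs  (user_id INTEGER, length INTEGER);
    INSERT INTO users VALUES (1, 0, 0), (2, 0, 0);
    INSERT INTO legs  VALUES (1, 40), (1, 42), (2, 38);
""")

# One grouped SELECT computes both aggregates per user...
stats = conn.execute(
    "SELECT COUNT(*), SUM(length), user_id FROM legs GROUP BY user_id"
).fetchall()

# ...and one batched UPDATE writes them back. In Postgres the
# UPDATE ... FROM (subquery GROUP BY user_id) form does both steps
# in a single server-side statement.
conn.executemany("UPDATE users SET leg_count = ?, leg_length = ? WHERE id = ?", stats)

result = conn.execute(
    "SELECT id, leg_count, leg_length FROM users ORDER BY id"
).fetchall()
print(result)  # [(1, 2, 82), (2, 1, 38)]
```

Either way, the legs table is scanned only once, instead of twice per user as with the two correlated subqueries.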