SQL query to select multiple values - sql

I'm creating a traffic report for my company, but I'm really stuck on this piece of code.
I have a report number and a type of accident denoted by a number, i.e. 1-slight, 2-serious, 3-fatal and 4-not injured.
Usually an accident report contains more than one injury code. For instance,
report number 2014123 has a driver with a serious injury ('2') and a passenger who is not injured ('4').
So when I run the select query with report number = 2014123, I get two records, one with injury code '2' and the other with '4'.
In the above scenario, the accident is treated as 'Serious' since it contains '2'. '4' is not considered since the passenger is not injured. The injury codes for slight, serious and fatal rank higher than not injured.
How can I generate the report showing the serious injury (i.e. code '2') with an injury count of 2 (since there are two records)?
Methods I have tried:
I tried with the SQL Case statement:
(Case WHEN InjuryCode IN ('1','2','4') THEN 'Serious'
WHEN InjuryCode IN ('1','2','3','4') THEN 'FATAL'
WHEN InjuryCode IN ('1','4') THEN 'SLIGHT'
ELSE 'UNKNOWN'
END) AS ACCIDENT_STATUS
but all I got was duplicated and incorrect data.
The injury codes take precedence in the following order:
3>2>1>4
e.g. an accident that contains these injury codes is classified as:
1,4- slight
2,4- serious
3,4- fatal
1,2,3,4-fatal (because 3 is the highest injury code) etc etc..
I hope this isn't too confusing; I was totally confused at the beginning myself. I'm starting to get the picture now, but I still have no solution. Please help!
EDIT (the full query from the comment):
SELECT REPORTNUMBER, INJURYCODE,
(CASE WHEN InjuryCode IN ('1','2','4') THEN 'Serious'
WHEN INJURYCODE IN ('1','2','3','4') THEN 'FATAL'
WHEN INJURYCODE IN ('1','4') THEN 'SLIGHT' ELSE 'UNKNOWN'
END) AS ACCIDENT_STATUS
FROM ACCIDENTS
WHERE REPORTNUMBER=20140302

The use of injuryCode suggests that you need an Injury table (if you don't have one already). Ideally, this includes some sort of severity column that you could order by - something like this:
CREATE TABLE Injury (injuryCode CHAR(1),
severity INTEGER,
description VARCHAR(20));
INSERT INTO Injury VALUES ('1', 1, 'Slight'),
('2', 2, 'Serious'),
('3', 3, 'Fatal'),
('4', 0, 'Not Injured');
Strictly speaking, what you were attempting before was sorting based on an apparent id of the injury - the only thing ids should be used for is joins, and should otherwise be considered random/undefined values (that is, the actual value is unimportant - it's whether there's anything connected to it that's important). The fact that these happen to be numerical codes (apparently stored as character data - this is perfectly acceptable) is immaterial.
Regardless, with a sorting table defined, we can now safely query the data:
SELECT AggregateAccident.reportNumber, Injury.injuryCode, Injury.description,
AggregateAccident.victimCount
FROM (SELECT Accidents.reportNumber, MAX(Injury.severity) as severity,
COUNT(*) as victimCount
FROM Accidents
JOIN Injury
ON Injury.injuryCode = Accidents.injuryCode
GROUP BY Accidents.reportNumber) AggregateAccident
JOIN Injury
ON Injury.severity = AggregateAccident.severity
ORDER BY AggregateAccident.reportNumber
(And SQL Fiddle example. Thanks to Turophile for the skeleton. Using SQL Server, but this should work on any RDBMS).
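To see the lookup-table query produce one row per report, here is a minimal sketch using Python's built-in sqlite3 module, with SQLite standing in for the real database. The rows in Accidents are made up for illustration (the actual table is not shown in the question), and the derived table is aliased `agg` only to keep the lines short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Injury (injuryCode CHAR(1), severity INTEGER, description VARCHAR(20));
INSERT INTO Injury VALUES ('1', 1, 'Slight'), ('2', 2, 'Serious'),
                          ('3', 3, 'Fatal'),  ('4', 0, 'Not Injured');
-- Hypothetical sample data: one row per person involved in an accident.
CREATE TABLE Accidents (reportNumber INTEGER, injuryCode CHAR(1));
INSERT INTO Accidents VALUES (2014123, '2'), (2014123, '4'),
                             (2014124, '1'), (2014124, '3'), (2014124, '4');
""")

# Inner query: highest severity and head count per report.
# Outer join: translate that severity back to its code and description.
rows = conn.execute("""
SELECT agg.reportNumber, Injury.injuryCode, Injury.description, agg.victimCount
FROM (SELECT Accidents.reportNumber, MAX(Injury.severity) AS severity,
             COUNT(*) AS victimCount
      FROM Accidents
      JOIN Injury ON Injury.injuryCode = Accidents.injuryCode
      GROUP BY Accidents.reportNumber) agg
JOIN Injury ON Injury.severity = agg.severity
ORDER BY agg.reportNumber
""").fetchall()

for row in rows:
    print(row)
```

Report 2014123 (codes 2 and 4) comes back as Serious with a count of 2, and report 2014124 (codes 1, 3 and 4) as Fatal with a count of 3.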
EDIT:
If you can't create a permanent table, you can create a temporary one:
WITH Injury AS (SELECT a AS injuryCode, b AS severity, c AS description
FROM (VALUES ('1', 1, 'Slight'),
('2', 2, 'Serious'),
('3', 3, 'Fatal'),
('4', 0, 'Not Injured')) I(a, b, c))
SELECT AggregateAccident.reportNumber, Injury.injuryCode, Injury.description,
AggregateAccident.victimCount
FROM (SELECT Accidents.reportNumber, MAX(Injury.severity) as severity,
COUNT(*) as victimCount
FROM Accidents
JOIN Injury
ON Injury.injuryCode = Accidents.injuryCode
GROUP BY Accidents.reportNumber) AggregateAccident
JOIN Injury
ON Injury.severity = AggregateAccident.severity
ORDER BY AggregateAccident.reportNumber
(And updated SQL Fiddle)
The WITH clause constructs what's known as a Common Table Expression (CTE), and is basically an inline view or temporary table definition. This could also be done with a subquery, but as I reference Injury twice, using a CTE means I only have to write the contained information once (in cases where the CTE is the result of some other query, this may help performance, too). Most recent/current RDBMSs support this functionality (notably, MySQL before version 8.0 does not), including DB2.

Something like this? (Not tested).
SELECT REPORTNUMBER,
CASE INJURYCODE
WHEN 1 THEN 'SLIGHT'
WHEN 2 THEN 'Serious'
WHEN 3 THEN 'FATAL'
ELSE 'UNKNOWN'
END ACCIDENT_STATUS,
INJURYCOUNT
FROM (
SELECT REPORTNUMBER,
MAX(CASE INJURYCODE WHEN 4 THEN 0 ELSE INJURYCODE END) INJURYCODE,
COUNT(1) INJURYCOUNT
FROM ACCIDENTS
GROUP BY REPORTNUMBER
) AGGREGATED

This is not an answer, but an attempt to improve your skills.
If you stored the injury code as a number like this:
0 = Not injured
1 = Slight
2 = Serious
3 = Fatal
Then you could use this clear and simple SQL:
select reportnumber, max(injurycode) as injurycode, count(*) as involved
from accidents
group by reportnumber
Here is a fiddle to illustrate it: http://sqlfiddle.com/#!2/87faf/1

Related

SELECT DISTINCT to return at most one row

Given the following db structure:
Regions:
id    name
1     EU
2     US
3     SEA
Customers:
id    name     region
1     peter    1
2     henry    1
3     john     2
There is also a PL/pgSQL function in place, defined as sendShipment() which takes (among other things) a sender and a receiver customer ID.
There is a business constraint around this which requires us to verify that both sender and receiver sit in the same region - and we need to do this as part of sendShipment(). So from within this function, we need to query the customer table for both the sender and receiver ID and verify that their region IDs are identical. We will also need the region ID itself for further processing down the line.
So maybe something like this:
SELECT DISTINCT region FROM customers WHERE id IN (?, ?)
The problem with this is that the result will be either an array (if the customers are not within the same region) or a single value.
Is there a more elegant way of solving this constraint? I was thinking of using SELECT INTO with a temporary table, or I could SELECT COUNT(DISTINCT region) and then do another SELECT for the actual value if the count is less than 2, but I'd like to avoid the performance hit if possible.
This query should work:
WITH q AS (
SELECT
COUNT( * ) AS CountCustomers,
COUNT( DISTINCT c.Region ) AS CountDistinctRegions,
-- MIN( c.Region ) AS MinRegion
FIRST_VALUE( c.Region ) OVER ( ORDER BY c.Region ) AS MinRegion
FROM
Customers AS c
WHERE
c.CustomerId = $senderCustomerId
OR
c.CustomerId = $receiverCustomerId
)
SELECT
CASE WHEN q.CountCustomers = 2 AND q.CountDistinctRegions = 1 THEN 'OK' ELSE 'BAD' END AS "Status",
CASE WHEN q.CountDistinctRegions = 1 THEN q.MinRegion END AS SingleRegion
FROM
q
The above query will always return a single row with 2 columns: Status and SingleRegion.
SQL doesn't have a "SINGLE( col )" aggregate function (i.e. a function that is NULL unless the aggregation group has a single row), but we can abuse MIN (or MAX) with a CASE WHEN COUNT() in a CTE or derived-table as an equivalent operation.
Alternatively, windowing-functions could be used, but annoyingly they don't work in GROUP BY queries despite being so similar, argh.
Once again, this is the ISO SQL committee's fault, not PostgreSQL's.
As your Region column is UUID you cannot use it with MIN, but I understand it should work with FIRST_VALUE( c.Region ) OVER ( ORDER BY c.Region ) AS MinRegion.
As for the columns:
The Status column is either 'OK' or 'BAD' based on those business-constraints you mentioned. You might want to change it to a bit column instead of a textual one, though.
The SingleRegion column will be non-NULL (with a valid region) if CountDistinctRegions = 1, regardless of CountCustomers, but feel free to change that, just in case you still want that info.
For anybody else who's interested in a simple solution, I finally came up with the (kind of obvious) way to do it:
SELECT
r.region
FROM
customers s
INNER JOIN customers r ON
s.region = r.region
WHERE s.id = 'sender_id' and r.id = 'receiver_id';
Huge credit to the author of the other answer, who helped me out a lot on this and also posted a viable solution.
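The self-join returns exactly one row when both customers share a region and no row otherwise. A small sketch of that behaviour, using Python's built-in sqlite3 module rather than PL/pgSQL so it can be run as-is; the helper name `shared_region` is made up for this example, and the data is the sample from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region INTEGER);
INSERT INTO customers VALUES (1, 'peter', 1), (2, 'henry', 1), (3, 'john', 2);
""")

def shared_region(sender_id, receiver_id):
    """Return the common region id, or None if the regions differ
    (or if either customer does not exist)."""
    row = conn.execute(
        """SELECT r.region
           FROM customers s
           INNER JOIN customers r ON s.region = r.region
           WHERE s.id = ? AND r.id = ?""",
        (sender_id, receiver_id),
    ).fetchone()
    return row[0] if row else None

print(shared_region(1, 2))  # peter and henry are both in region 1
print(shared_region(1, 3))  # peter and john are in different regions
```

Because id is the primary key, the join can match at most one sender row and one receiver row, so the result never has more than one row.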

CASE Statement - An expression services limit has been reached

I'm getting the following error:
An expression services limit has been reached. Please look for potentially complex expressions in your query, and try to simplify them.
I'm attempting to run the query below, but it appears there is one line too many in my CASE expression: when I remove the "London" line (or "Scotland", for example), it works perfectly.
I can't think of the best way to split this statement.
If i split it into 2 queries and union all, it does work. however the ELSE 'No Region' becomes a problem. Everything which is included in the first part of the query shows as "No Region" for the second part of the query, and vice versa.
(My end goal is essentially to create a list of customers per region) I can then use this as the foundation of a regional sales report.
Many Thanks
Andy
SELECT T0.CardCode, T0.CardName, T0.PostCode,
CASE
WHEN T0.PostCodeABR IN ('DG','KW','IV','PH','AB','DD','PA','FK','KY','G','EH','ML','KA','TD') THEN 'Scotland'
WHEN T0.PostCodeABR IN ('BT') THEN 'Ireland'
WHEN T0.PostCodeABR IN ('CA','NE','DH','SR','TS','DL','LA','BD','HG','YO','HX','LS','FY','PR','BB','L','WN','BL','OL') THEN 'North M62'
WHEN T0.PostCodeABR IN ('CH','WA','CW','SK','M','HD','WF','DN','HU','DE','NG','LN','S') THEN 'South M62'
WHEN T0.PostCodeABR IN ('LL','SY','LD','SA','CF','NP') THEN 'Wales'
WHEN T0.PostCodeABR IN ('NR','IP','CB') THEN 'East Anglia'
WHEN T0.PostCodeABR IN ('SN','BS','BA','SP','BH','DT','TA','EX','TQ','PL','TR') THEN 'South West'
WHEN T0.PostCodeABR IN ('LU','AL','HP','SG','SL','RG','SO','GU','PO','BN','RH','TN','ME','CT','SS','CM','CO') THEN 'South East'
WHEN T0.PostCodeABR IN ('ST','TF','WV','WS','DY','B','WR','HR','GL','OX','CV','NN','MK','PE','LE') THEN 'Midlands'
WHEN T0.PostCodeABR IN ('WD','EN','HA','N','NW','UB','W','WC','EC','E','IG','RM','DA','BR','CR','SM','KT','TW','SW') THEN 'London'
ELSE 'No Region'
END AS 'Region'
FROM [dbo].[REPS-PostcodeABBR] T0
As I mentioned in the comment, I would suggest you create a "lookup" table for the post codes, then all you need to do is JOIN to the table, and not have a "messy" and large CASE expression (T-SQL doesn't support Case (Switch) statements).
So your lookup table would look a little like this:
CREATE TABLE dbo.PostcodeRegion (Postcode varchar(2),
Region varchar(20));
GO
--Sample data
INSERT INTO dbo.PostcodeRegion (Postcode,Region)
VALUES('DG','Scotland'),
('BT','Ireland'),
('LL','Wales');
And then your query would just do a LEFT JOIN:
SELECT RPA.CardCode,
RPA.CardName,
RPA.PostCode,
COALESCE(PR.Region,'No Region') AS Region
FROM [dbo].[REPS-PostcodeABBR] RPA --T0 is a poor choice of an alias, there is no T nor 0 in "REPS-PostcodeABBR"
LEFT JOIN dbo.PostcodeRegion PR ON RPA.PostCodeABR = PR.Postcode;
Note you would likely want to INDEX the table as well, and/or apply a UNIQUE CONSTRAINT or PRIMARY KEY to the PostCode column.
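As a quick check of the lookup-table approach, here is a sketch using Python's built-in sqlite3 module (SQLite stands in for SQL Server). The table `Reps` and its three rows are hypothetical stand-ins for [dbo].[REPS-PostcodeABBR]; only the three sample lookup rows from above are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PostcodeRegion (Postcode VARCHAR(2) PRIMARY KEY, Region VARCHAR(20));
INSERT INTO PostcodeRegion VALUES ('DG', 'Scotland'), ('BT', 'Ireland'), ('LL', 'Wales');
-- Hypothetical stand-in for [dbo].[REPS-PostcodeABBR], with made-up rows.
CREATE TABLE Reps (CardCode TEXT, CardName TEXT, PostCode TEXT, PostCodeABR TEXT);
INSERT INTO Reps VALUES ('C001', 'Acme',  'DG1 1AA', 'DG'),
                        ('C002', 'Beta',  'BT1 1AA', 'BT'),
                        ('C003', 'Gamma', 'ZZ1 1AA', 'ZZ');
""")

# LEFT JOIN keeps customers whose prefix has no lookup row;
# COALESCE then supplies the 'No Region' fallback for them.
regions = conn.execute("""
SELECT RPA.CardCode, COALESCE(PR.Region, 'No Region') AS Region
FROM Reps RPA
LEFT JOIN PostcodeRegion PR ON RPA.PostCodeABR = PR.Postcode
ORDER BY RPA.CardCode
""").fetchall()

for row in regions:
    print(row)
```

The unmatched prefix 'ZZ' falls through to 'No Region', which is what the ELSE branch of the original CASE expression did.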
Thanks for the help... I tried multiple ways mentioned above, and they all did work, however the most efficient seemed to be this way.
Created a lookup table within SAP; This table included PostCodeFrom, PostCodeTo, PostCodeABR, Region
This would look like; TS00, TS99, TS, North M62
I then ran:
SELECT T0.ZipCode, PCLOOKUP.Region, PCLOOKUP.PostCodeABR FROM OCRD T0 LEFT OUTER JOIN PCLOOKUP ON T0.ZipCode >= PCLOOKUP.PostCodeFrom AND T0.ZipCode <= PCLOOKUP.PostCodeTo
Basically, if the postcode is between PostCodeFrom and PostCodeTo, display the abbreviation and region.

Comparing SQL Queries

I'm considering two SQL queries (Oracle) and I shall state the difference between them by showing examples. The queries are as follows:
/* Query 1 */
SELECT DISTINCT countryCode
FROM Member M
WHERE NOT EXISTS(
(SELECT organisation FROM Member
WHERE countryCode = 'US')
MINUS
(SELECT organisation FROM Member
WHERE countryCode = M.countryCode ) )
/* Query 2 */
SELECT DISTINCT M1.countryCode
FROM Member M1, Member M2
WHERE M2.countryCode = 'US' AND M1.organisation = M2.organisation
GROUP BY M1.countryCode
HAVING COUNT(M1.organisation) = (
SELECT COUNT(M3.organisation)
FROM Member M3 WHERE M3.countryCode = 'US' )
As far as I get it, these queries give back the countries which are members of the same organisations as the United States. The scheme of Member is (countryCode, organisation, type) with bold ones as primary key. Example: ('US', 'UN', 'member'). The member table contains only a few tuples and is not complete, so when executing (1) and (2) both yield the same result (e.g. Germany, since here only 'UN' and 'G7' are in the table).
So how can I show that these queries can actually return different results?
That means how can I create an example table instance of Member such that the queries yield different results based on that table instance?
Thanks for your time and effort.
Both queries return the country codes of countries that are members of at least all the organisations the US is a member of (they may be members of other organisations as well).
I've finally found an example to show that they can actually output different values based on the same Member instance. This is actually the case when Member contains duplicates. For query 1 this is not a problem, but for query 2 it actually affects the result, since here the number of memberships is crucial. So, if you have e.g. ('FR', 'UN', 'member') twice in Member, the HAVING COUNT(M1.organisation) will return a different value than SELECT COUNT(M3.organisation) and 'FR' would not be part of the output.
Thanks to all for your constructive suggestions, that helped me a lot.
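The first query returns every country whose memberships include all the organisations the US belongs to; the country may belong to more. It also returns every country when the US has no memberships at all, because the MINUS result is then empty.
The second one compares counts: the number of a country's memberships shared with the US must equal the number of US memberships. With no US rows the join is empty, so it returns nothing.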
As for creating an example with real data, start with an empty table and add this row:
insert into Member (countryCode, organisation)
values ('Elbonia', 'League of Fictitious Nations')
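Both ways of separating the two queries (duplicate rows, and a table with no US rows) can be checked with a short sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for Oracle here: it spells MINUS as EXCEPT and does not accept parentheses around the two halves of a compound select, so query 1 is adapted accordingly. The helper `run_both` and the sample rows are made up for this example.

```python
import sqlite3

QUERY_1 = """
SELECT DISTINCT countryCode FROM Member M
WHERE NOT EXISTS (
    SELECT organisation FROM Member WHERE countryCode = 'US'
    EXCEPT  -- SQLite spelling of Oracle's MINUS
    SELECT organisation FROM Member WHERE countryCode = M.countryCode)
"""

QUERY_2 = """
SELECT DISTINCT M1.countryCode
FROM Member M1, Member M2
WHERE M2.countryCode = 'US' AND M1.organisation = M2.organisation
GROUP BY M1.countryCode
HAVING COUNT(M1.organisation) = (
    SELECT COUNT(M3.organisation) FROM Member M3 WHERE M3.countryCode = 'US')
"""

def run_both(rows):
    """Load (countryCode, organisation) pairs and run both queries."""
    conn = sqlite3.connect(":memory:")
    # No primary key is declared, so duplicate rows are possible (assumption).
    conn.execute("CREATE TABLE Member (countryCode TEXT, organisation TEXT, type TEXT)")
    conn.executemany("INSERT INTO Member VALUES (?, ?, 'member')", rows)
    q1 = sorted(r[0] for r in conn.execute(QUERY_1))
    q2 = sorted(r[0] for r in conn.execute(QUERY_2))
    conn.close()
    return q1, q2

# Case 1: ('FR', 'UN') is stored twice.
duplicates = [("US", "UN"), ("US", "G7"), ("DE", "UN"), ("DE", "G7"),
              ("FR", "UN"), ("FR", "UN"), ("FR", "G7")]
# Case 2: the single Elbonia row, with no US rows at all.
elbonia_only = [("Elbonia", "League of Fictitious Nations")]

print(run_both(duplicates))
print(run_both(elbonia_only))
```

In the first case query 1 still lists FR while query 2 drops it, because FR's three matching rows are compared against the two US memberships. In the second case query 1 returns Elbonia and query 2 returns nothing.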
By the way a full outer join would let you characterize the difference symmetrically:
select
max(mo.countryCode) || ' ' ||
case
when count(case when mu.organisation is null then 1 else null end) > 0
and count(case when mo.organisation is null then 1 else null end) > 0
then 'and US both have individual memberships that they do not have in common.'
when count(case when mu.organisation is null then 1 else null end) > 0
then 'is a member of some organisations that US is not a member of.'
when count(case when mo.organisation is null then 1 else null end) > 0
then 'is not a member of some organisations that US is a member of.'
else 'has identical membership as US.'
end
from
(select * from Member where countryCode = 'US') mu
full outer join
(select * from Member where countryCode = '??') mo
on mo.organisation = mu.organisation
Please forgive the dangling prepositions.
As a side note: although duplicate rows are not allowed in normalized data, this query has no problem with them.

SQL sub query logic

I am trying to calculate values in a column called Peak, but I need to apply different calculations dependent on the 'ChargeCode'.
Below is kind of what I am trying to do, but it results in 3 columns called Peak - Which I know is what I asked for :)
Can anyone help with the correct syntax, so that I end up with one column called Peak?
Use Test
Select Chargecode,
(SELECT 1 Where Chargecode='1') AS [Peak],
(SELECT 1 Where Chargecode='1242') AS [Peak],
Peak*2 AS [Peak],
CallType
from Daisy_March2014
Thanks
You want a case statement. I think this is what you are looking for:
Select Chargecode,
(case when chargecode = '1' then 1
when chargecode = '1242' then 2
else 2 * Peak
end) as Peak,
CallType
from Daisy_March2014;
Thanks Gordon, I have marked your response as Answered. Here is the final working code:
(case when chargecode in ('1') then 1 when chargecode in ('1264') then 2 else Peak*2 end) as Peak,
Since it depends on your charge code, I'm going to make a wild assumption that this might be an ongoing thing where new charge codes / rules could be added. Why not store this as metadata either in the charge code table or in a new table? You could generate the initial data with this:
SELECT ChargeCode,
Multiplier
INTO ChargeMeta
FROM (
Select 1 AS ChargeCode,
1 AS Multiplier
UNION ALL
SELECT 1242 AS ChargeCode,
1 AS Multiplier
UNION ALL
SELECT DISTINCT ChargeCode,
2 AS Multiplier
FROM Daisy_March2014
WHERE ChargeCode NOT IN (1,1242)
) SQ
Then just join to your original data.
SELECT a.ChargeCode,
a.Peak*b.Multiplier AS Peak
FROM Daisy_March2014 a
JOIN ChargeMeta b
ON a.ChargeCode = b.ChargeCode
If you do not want to maintain all charge code multipliers, you could maintain your non-standard ones, and store the standard one in the SQL. This would be about the same as a case statement, but it may still add benefit to store the overrides in a table. At the very least, it makes it easier to re-use elsewhere. No need to check all the queries that deal with Peak values and make them consistent, if ChargeCode 42 needs to have a new multiplier set.
If you want to store the default in the table, you could use two joins instead of one, storing the default charge code under a value that will never be used. (-1?)
SELECT a.ChargeCode,
a.Peak*COALESCE(b.Multiplier,c.Multiplier) AS Peak
FROM Daisy_March2014 a
LEFT JOIN ChargeMeta b ON a.ChargeCode = b.ChargeCode
LEFT JOIN ChargeMeta c ON c.ChargeCode = -1
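A minimal sketch of the two-join default, using Python's built-in sqlite3 module with SQLite standing in for SQL Server. The three data rows and the charge code 77 are made up for illustration; the default multiplier is stored under the otherwise unused code -1, as suggested above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical sample rows; only the column names come from the question.
CREATE TABLE Daisy_March2014 (ChargeCode INTEGER, Peak INTEGER, CallType TEXT);
INSERT INTO Daisy_March2014 VALUES (1, 10, 'A'), (1242, 10, 'B'), (77, 10, 'C');

-- Overrides for codes 1 and 1242, plus the default stored under -1.
CREATE TABLE ChargeMeta (ChargeCode INTEGER PRIMARY KEY, Multiplier INTEGER);
INSERT INTO ChargeMeta VALUES (1, 1), (1242, 1), (-1, 2);
""")

# b matches a code-specific override when one exists;
# c always matches the single default row, used when b is NULL.
peaks = conn.execute("""
SELECT a.ChargeCode,
       a.Peak * COALESCE(b.Multiplier, c.Multiplier) AS Peak
FROM Daisy_March2014 a
LEFT JOIN ChargeMeta b ON a.ChargeCode = b.ChargeCode
LEFT JOIN ChargeMeta c ON c.ChargeCode = -1
ORDER BY a.ChargeCode
""").fetchall()

for row in peaks:
    print(row)
```

Codes 1 and 1242 keep their Peak of 10, while the unlisted code 77 picks up the default multiplier and comes back as 20.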

How would I write this SQL query?

I have the following tables:
PERSON_T     DISEASE_T             DRUG_T
=========    ==================    ===============
PERSON_ID    DISEASE_ID            DRUG_ID
GENDER       PERSON_ID             PERSON_ID
NAME         DISEASE_START_DATE    DRUG_START_DATE
             DISEASE_END_DATE      DRUG_END_DATE
I want to write a query that takes an input of a disease id and returns one row for each person in the database with a column for the gender, a column for whether or not they have ever had the disease, and a column for each drug which specifies if they took the drug before contracting the disease. I.E. true would mean drug_start_date < disease_start_date. False would mean drug_start_date>disease_start_date or the person never took that particular drug.
We currently pull all of the data from the database and use Java to create a 2D array with all of these values. We are investigating moving this logic into the database. Is it possible to create a query that will return the result set as I want it or would I have to create a stored procedure? We are using Postgres, but I assume an SQL answer for another database will easily translate to Postgres.
Based on the info provided:
SELECT p.name,
p.gender,
CASE WHEN d.disease_id IS NULL THEN 'N' ELSE 'Y' END AS had_disease,
dt.drug_id
FROM PERSON_T p
LEFT JOIN DISEASE_T d ON d.person_id = p.person_id
AND d.disease_id = ?
LEFT JOIN DRUG_T dt ON dt.person_id = p.person_id
AND dt.drug_start_date < d.disease_start_date
..but there's going to be a lot of rows that will look duplicate except for the drug_id column.
You're essentially looking to create a cross-tab query with the drugs. While there are plenty of OLAP tools out there that can do this sort of thing (among all sorts of other slicing and dicing of the data), doing something like this in traditional SQL is not easy (and, in general, impossible to do without some sort of procedural syntax in all but the simplest scenarios).
You essentially have two options when doing this with SQL (well, more accurately, you have one option, and another more complicated but flexible option that derives from it):
Use a series of CASE statements in your query to produce columns that are representative of each individual drug. This requires knowing the list of variable values (i.e. drugs) ahead of time
Use a procedural SQL language, such as T-SQL, to dynamically construct a query that uses case statements as described above, but along with obtaining that list of values from the data itself.
The two options essentially do the same thing, you're just trading simplicity and ease of maintenance for flexibility in the second option.
For example, using option 1:
select
p.NAME,
p.GENDER,
(case when d.DISEASE_ID is null then 0 else 1 end) as HAD_DISEASE,
(case when sum(case when dr.DRUG_ID = 1 then 1 else 0 end) > 0 then 1 else 0 end) as TOOK_DRUG_1,
(case when sum(case when dr.DRUG_ID = 2 then 1 else 0 end) > 0 then 1 else 0 end) as TOOK_DRUG_2,
(case when sum(case when dr.DRUG_ID = 3 then 1 else 0 end) > 0 then 1 else 0 end) as TOOK_DRUG_3
from PERSON_T p
left join DISEASE_T d on d.PERSON_ID = p.PERSON_ID and d.DISEASE_ID = #DiseaseId
left join DRUG_T dr on dr.PERSON_ID = p.PERSON_ID and dr.DRUG_START_DATE < d.DISEASE_START_DATE
group by p.PERSON_ID, p.NAME, p.GENDER, d.DISEASE_ID
As you can tell, this gets a little laborious as you get outside of just a few potential values.
The other option is to construct this query dynamically. I don't know PostgreSQL and what, if any, procedural capabilities it has, but the overall procedure would be this:
Gather list of potential DRUG_ID values along with names for the columns
Prepare three string values: the SQL prefix (everything before the first drug-related CASE statement), the SQL suffix (everything after the last drug-related CASE statement), and the dynamic portion
Construct the dynamic portion by combining drug CASE statements based upon the previously retrieved list
Combine them into a single (hopefully valid) SQL statement and execute
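The four steps above can be sketched as follows, using Python's built-in sqlite3 module in place of a procedural SQL language. The function name `build_pivot_query`, the TOOK_DRUG_n column names and all sample rows are assumptions for illustration; the sketch also assumes integer drug ids, ISO-formatted date strings, and at most one DISEASE_T row per person for the chosen disease.

```python
import sqlite3

def build_pivot_query(conn):
    # Step 1: gather the drug ids that will become columns.
    drug_ids = [r[0] for r in conn.execute(
        "SELECT DISTINCT DRUG_ID FROM DRUG_T ORDER BY DRUG_ID")]
    # Step 2: fixed prefix and suffix around the drug columns.
    prefix = ("SELECT p.NAME, p.GENDER, "
              "CASE WHEN d.DISEASE_ID IS NULL THEN 0 ELSE 1 END AS HAD_DISEASE")
    suffix = (" FROM PERSON_T p"
              " LEFT JOIN DISEASE_T d ON d.PERSON_ID = p.PERSON_ID"
              " AND d.DISEASE_ID = ?"
              " LEFT JOIN DRUG_T dr ON dr.PERSON_ID = p.PERSON_ID"
              " AND dr.DRUG_START_DATE < d.DISEASE_START_DATE"
              " GROUP BY p.PERSON_ID, p.NAME, p.GENDER, d.DISEASE_ID"
              " ORDER BY p.PERSON_ID")
    # Step 3: one CASE column per drug; int() guards the interpolated ids.
    dynamic = "".join(
        f", CASE WHEN SUM(CASE WHEN dr.DRUG_ID = {int(i)} THEN 1 ELSE 0 END) > 0"
        f" THEN 1 ELSE 0 END AS TOOK_DRUG_{int(i)}"
        for i in drug_ids)
    # Step 4: combine into a single statement.
    return prefix + dynamic + suffix

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PERSON_T (PERSON_ID INTEGER, GENDER TEXT, NAME TEXT);
CREATE TABLE DISEASE_T (DISEASE_ID INTEGER, PERSON_ID INTEGER,
                        DISEASE_START_DATE TEXT, DISEASE_END_DATE TEXT);
CREATE TABLE DRUG_T (DRUG_ID INTEGER, PERSON_ID INTEGER,
                     DRUG_START_DATE TEXT, DRUG_END_DATE TEXT);
-- Made-up sample data.
INSERT INTO PERSON_T VALUES (1, 'F', 'Ann'), (2, 'M', 'Bob'), (3, 'F', 'Cy');
INSERT INTO DISEASE_T VALUES (10, 1, '2014-03-01', NULL), (10, 2, '2014-03-01', NULL);
INSERT INTO DRUG_T VALUES (1, 1, '2014-01-01', NULL), (2, 1, '2014-04-01', NULL),
                          (2, 2, '2014-02-01', NULL), (1, 3, '2014-01-01', NULL);
""")

result = conn.execute(build_pivot_query(conn), (10,)).fetchall()
for row in result:
    print(row)
```

The disease id is passed as a bound parameter, while the drug ids have to be interpolated into the SQL text because they determine the column list. Ann took drug 1 before the disease and drug 2 after it, so only TOOK_DRUG_1 is set for her; Cy never had the disease, so all of her flags are 0.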