I have a table of patients which has the following columns: patient_id, obs_id, obs_date. Obs_id is the ID of a clinical observation (such as weight reading, blood pressure reading....etc), and obs_date is when that observation was taken. Each patient could have several readings on different dates...etc. Currently I have a query to get all patients that had obs_id = 1 and insert them into a temporary table (has two columns, patient_id, and flag which I set to 0 here):
insert into temp_table (select patient_id, 0 from patients_table
where obs_id = 1 group by patient_id having count(*) >= 1)
I also execute an update statement to set the flag to 1 for all patients that also had obs_id = 5:
UPDATE temp_table SET flag = 1 WHERE EXISTS (
SELECT patient_id FROM patients_table WHERE obs_id = 5 group by patient_id having count(*) >=1
) v WHERE temp_table.patient_id = v.patient_id
Here's my question: How do I modify both queries (without combining them or removing the group by statement) such that I can answer the following question:
"get all patients who had obs_id = 5 after obs_id = 1". If I add a min(obs_date) or max(obs_date) to the select of each query and then add "AND v.obs_date > temp_table.obs_date" to the second one, is that correct??
The reason why I need not remove the group by statement or combine is because these queries are generated by a code generator (from a web app), and i'd like to do that modification without messing up the code generator or re-writing it.
Many thanks in advance,
The advantage of SQL is that it works with sets. You don't need to create temporary tables or get all procedural.
As you describe the problem (find all patients who have obs_id 5 after obs_id 1), I'd start with something like this
select distinct p1.patient_id
from patients_table p1, patients_table p2
where
p1.obs_id = 1 and
p2.obs_id = 5 and
p2.patient_id = p1.patient_id and
p2.obs_date > p1.obs_date
Of course, that doesn't help you deal with your code generator. Sometimes, tools that make things easier can also get in the way.
Related
I have a code block that has 7/8 with clauses(
sub-query refactoring) in queries. I'm looking on how to run this query as I'm getting 'sql compilation errors' when running these!, While I'm trying to run them I'm getting errors in snowflake. for eg:
with valid_Cars_Stock as (
select car_id
from vw_standard.agile_car_issue_dime
where car_stock_expiration_ts is null
and car_type_name in ('hatchback')
and car_id = 1102423975
)
, car_sale_hist as (
select vw.issue_id, vw.delivery_effective_ts, bm.car_id,
lag(bm.sprint_id) over (partition by vw.issue_id order by vw.delivery_effective_ts) as previous_stock_id
from valid_Cars_Stock i
join vw_standard.agile_car_fact vw on vw.car_id = bm.car_id
left join vw_standard.agile_board_stock_bridge b on b.board_stock_bridge_dim_key = vw.issue_board_sprint_bridge_dim_key
order by vw.car_stock_expiration_ts desc
)
,
So how to run this 2 queries separately or together! I'm new to sql aswell any help would be ideal
So lets just reformate that code as it stands:
with valid_Cars_Stock as (
select
car_id
from vw_standard.agile_car_issue_dime
where car_stock_expiration_ts is null
and car_type_name in ('hatchback')
and car_id = 1102423975
), car_sale_hist as (
select
vw.issue_id,
vw.delivery_effective_ts,
bm.car_id,
lag(bm.sprint_id) over (partition by vw.issue_id order by vw.delivery_effective_ts) as previous_stock_id
from valid_Cars_Stock i
join vw_standard.agile_car_fact vw
on vw.car_id = bm.car_id
left join vw_standard.agile_board_stock_bridge b
on b.board_stock_bridge_dim_key = vw.issue_board_sprint_bridge_dim_key
order by vw.car_stock_expiration_ts desc
),
There are clearly part of a larger block of code.
For an aside of CTE, you should 100% ignore anything anyone (including me) says about them. They are 2 things, a statical sugar, and they allow avoidance of repetition, thus the Common Table Expression name. Anyways, they CAN perform better than temp tables, AND they CAN perform worse than just repeating the say SQL many times in the same block. There is no one rule. Testing is the only way to find for you SQL what is "fastest" and it can and does change as updates/releases are made. So ignoring performance comments.
if I am trying to run a chain like this to debug it I alter the point I would like to stop normally like so:
with valid_Cars_Stock as (
select
car_id
from vw_standard.agile_car_issue_dime
where car_stock_expiration_ts is null
and car_type_name in ('hatchback')
and car_id = 1102423975
)--, car_sale_hist as (
select
vw.issue_id,
vw.delivery_effective_ts,
bm.car_id,
lag(bm.sprint_id) over (partition by vw.issue_id order by vw.delivery_effective_ts) as previous_stock_id
from valid_Cars_Stock i
join vw_standard.agile_car_fact vw
on vw.car_id = bm.car_id
left join vw_standard.agile_board_stock_bridge b
on b.board_stock_bridge_dim_key = vw.issue_board_sprint_bridge_dim_key
order by vw.car_stock_expiration_ts desc
;), NEXT_AWESOME_CTE_THAT_TOTALLY_MAKES_SENSE (
-- .....
and now the result of car_sale_hist will be returned. because we "completed" the CTE chain by not "starting another" and the ; stopped the this is all part of my SQL block.
Then once you have that steps working nicely, remove the semi-colon and end of line comments, and get of with value real value.
I have a table named Ticket Numbers, which (for this example) contain the columns:
Ticket_Number
Assigned_Group
Assigned_Group_Sequence_No
Reported_Date
Each ticket number could contain 4 rows, depending on how many times the ticket changed assigned groups. Some of these rows could contain an assigned group of "Desktop Support," but some may not. Here is an example:
Example of raw data
What I am trying to accomplish is to get the an output that contains any ticket numbers that contain 'Desktop Support', but also the assigned group of the max sequence number. Here is what I am trying to accomplish with SQL:
Queried Data
I'm trying to use SQL with the following query but have no clue what I'm doing wrong:
select ih.incident_number,ih.assigned_group, incident_history2.maxseq, incident_history2.assigned_group
from incident_history_public as ih
left join
(
select max(assigned_group_seq_no) maxseq, incident_number, assigned_group
from incident_history_public
group by incident_number, assigned_group
) incident_history2
on ih.incident_number = incident_history2.incident_number
and ih.assigned_group_seq_no = incident_history2.maxseq
where ih.ASSIGNED_GROUP LIKE '%DS%'
Does anyone know what I am doing wrong?
You might want to create a proper alias for incident_history. e.g.
from incident_history as incident_history1
and
on incident_history1.ticket_number = incident_history2.ticket_number
and incident_history1.assigned_group_seq_no = incident_history2.maxseq
In my humble opinion a first error could be that I don't see any column named "incident_history2.assigned_group".
I would try to use common table expression, to get only ticket number that contains "Desktop_support":
WITH desktop as (
SELECT distinct Ticket_Number
FROM incident_history
WHERE Assigned_Group = "Desktop Support"
),
Than an Inner Join of the result with your inner table to get ticket number and maxSeq, so in a second moment you can get also the "MAXGroup":
WITH tmp AS (
SELECT i2.Ticket_Number, i2.maxseq
FROM desktop D inner join
(SELECT Ticket_number, max(assigned_group_seq_no) as maxseq
FROM incident_history
GROUP BY ticket_number) as i2
ON D.Ticket_Number = i2.Ticket_Number
)
SELECT i.Ticket_Number, i.Assigned_Group as MAX_Group, T.maxseq, i.Reported_Date
FROM tmp T inner join incident_history i
ON T.Ticket_Number = i.Ticket_Number and i.assigned_group_seq_no = T.maxseq
I think there are several different method to resolve this question, but I really hope it's helpful for you!
For more information about Common Table Expression: https://www.essentialsql.com/introduction-common-table-expressions-ctes/
Can you please tell me what SQL query can I use to change duplicates in one column of my table?
I found these duplicates:
SELECT Model, count(*) FROM Devices GROUP BY model HAVING count(*) > 1;
I was looking for information on exactly how to change one of the duplicate values, but unfortunately I did not find a specific option for myself, and all the more information is all in abundance filled by deleting the duplicate value line, which I don't need. Not strong in SQL at all. I ask for help. Thank you so much.
You can easily use a Window Functions such as ROW_NUMBER() with partitioning option in order to group by Model column to eliminate the duplicates, and then pick the first rows(rn=1) returning from the subquery such as
WITH d AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY Model) AS rn
FROM Devices
)
SELECT ID, Model -- , and the other columns
FROM d
WHERE rn = 1
Demo
use exists as follows:
update d
set Model = '-'
from Devices d
where exists (select 1 from device dd where dd.model = d.model and dd.id > d.id)
After the command:
SELECT Model, count (*) FROM Devices GROUP BY model HAVING count (*)> 1;
i get the result:
1895 lines = NULL;
3383 lines with duplicate values;
and all these values are 1243.
after applying your command:
update Devices set
Model = '-'
where id not in
(select
min(Devices .id)
from Devices
group by Devices.Model)
i got 4035 lines changed.
if you count, it turns out, (3383 + 1895) = 5278 - 1243 = 4035
and it seems like everything fits together, the result suits, it works.
The database is Postgres but any SQL logic should help.
I am retrieving the set of sales quotations that contain a given product within the bill of materials. I'm doing that in two steps: step 1, retrieve all DISTINCT quote numbers which contain a given product (by product number).
The second step, retrieve the full quote, with all products listed for each unique quote number.
So far, so good. Now the tough bit. Some rows are duplicates, some are not. Those that are duplicates (quote number & quote version & line number) might or might not have maintenance on them. I want to pick the row that has maintenance greater than 0. The duplicate rows I want to exclude are those that have a 0 maintenance. The problem is that some rows, which have no duplicates, have 0 maintenance, so I can't just filter on maintenance.
To make this exciting, the database holds quotes over 20+ years. And the data scientists guys have just admitted that maybe the ETL process has some bugs...
--- step 0
--- cleanup the workspace
SET CLIENT_ENCODING TO 'UTF8';
DROP TABLE IF EXISTS product_quotes;
--- step 1
--- get list of Product Quotes
CREATE TEMPORARY TABLE product_quotes AS (
SELECT DISTINCT master_quote_number
FROM w_quote_line_d
WHERE item_number IN ( << model numbers >> )
);
--- step 2
--- Now join on that list
SELECT
d.quote_line_number,
d.item_number,
d.item_description,
d.item_quantity,
d.unit_of_measure,
f.ref_list_price_amount,
f.quote_amount_entered,
f.negtd_discount,
--- need to calculate discount rate based on list price and negtd discount (%)
CASE
WHEN ref_list_price_amount > 0
THEN 100 - (ref_list_price_amount + negtd_discount) / ref_list_price_amount *100
ELSE 0
END AS discount_percent,
f.warranty_months,
f.master_quote_number,
f.quote_version_number,
f.maintenance_months,
f.territory_wid,
f.district_wid,
f.sales_rep_wid,
f.sales_organization_wid,
f.install_at_customer_wid,
f.ship_to_customer_wid,
f.bill_to_customer_wid,
f.sold_to_customer_wid,
d.net_value,
d.deal_score,
f.transaction_date,
f.reporting_date
FROM w_quote_line_d d
INNER JOIN product_quotes pq ON (pq.master_quote_number = d.master_quote_number)
INNER JOIN w_quote_f f ON
(f.quote_line_number = d.quote_line_number
AND f.master_quote_number = d.master_quote_number
AND f.quote_version_number = d.quote_version_number)
WHERE d.net_value >= 0 AND item_quantity > 0
ORDER BY f.master_quote_number, f.quote_version_number, d.quote_line_number
The logic to filter the duplicate rows is like this:
For each master_quote_number / version_number pair, check to see if there are duplicate line numbers. If so, pick the one with maintenance > 0.
Even in a CASE statement, I'm not sure how to write that.
Thoughts? The database is Postgres but any SQL logic should help.
I think you will want to use Window Functions. They are, in a word, awesome.
Here is a query that would "dedupe" based on your criteria:
select *
from (
select
* -- simplifying here to show the important parts
,row_number() over (
partition by master_quote_number, version_number
order by maintenance desc) as seqnum
from w_quote_line_d d
inner join product_quotes pq
on (pq.master_quote_number = d.master_quote_number)
inner join w_quote_f f
on (f.quote_line_number = d.quote_line_number
and f.master_quote_number = d.master_quote_number
and f.quote_version_number = d.quote_version_number)
) x
where seqnum = 1
The use of row_number() and the chosen partition by and order by criteria guarantee that only ONE row for each combination of quote_number/version_number will get the value of 1, and it will be the one with the highest value in maintenance (if your colleagues are right, there would only be one with a value > 0 anyway).
Can you do something like...
select
*
from
w_quote_line_d d
inner join
(
select
...
,max(maintenance)
from
w_quote_line_d
group by
...
) d1
on
d1.id = d.id
and d1.maintenance = d.maintenance;
Am I understanding your problem correctly?
Edit: Forgot the group by!
I'm not sure, but maybe you could Group By all other columns and use MAX(Maintenance) to get only the greatest.
What do you think?
I am trying to write a query for an inventory system. In order to do this I have to count the number of duplicates in one table and then compare it to the default quantity values taken from another table.
Here are two queries which I am working with currently:
SELECT Template_ID,
COUNT(Template_ID) AS Howmuch, t.name
FROM consumables e, templates t
Where t.consumable_type_id = '2410980'
GROUP BY template_id, t.name
HAVING ( COUNT(Template_ID) > 1 )
The query above takes account of each unique Template Id and gives me a count of how many duplicates are present which tells me the amount of a single substance.
Select
property_descriptors.default_value,
templates.name
From
templates,
Property_descriptors
Where
templates.consumable_type_id = '858190' And
templates.audit_id = property_descriptors.audit_id And
property_descriptors.name = 'Reorder Point'
This query finds the amount of each individual substance we would like to have in our system.
My Issue is that I dont know of a way to compare the results from the 2 queries.
Ideally, I want the query to only give the substance which have a duplicate count lower than their default value (found using query 2).
any ideas would be appreciated!
here is the table schema for reference:
Consumables
ID|Template_ID|
Templates
ID|Property_Descriptor_ID|Name|audit_id
Property_Descriptors
ID| Name|Default_Value|audit_id
Thanks!
SELECT q1.name, q2.default_value - q1.Howmuch FROM
(SELECT Template_ID, COUNT(Template_ID) AS Howmuch, t.name
FROM consumables e, templates t
Where t.consumable_type_id = '2410980'
GROUP BY template_id, t.name
HAVING ( COUNT(Template_ID) > 1 )) q1,
(SELECT property_descriptors.default_value default_value,
templates.name name
FROM
templates,
Property_descriptors
WHERE
templates.consumable_type_id = '858190' And
templates.audit_id = property_descriptors.audit_id And
property_descriptors.name = 'Reorder Point') q2
where q1.name = q2.name
should do the trick you'll need to clean up the result a bit to work away negative results
or add q2.default_value - q1.Howmuch > 0 in the outer WHERE clause