SQL - When result is duplicated on 2 fields remove all

When I run this query:
SELECT
DT.CONTRACT_NUMBER,
DT.ROLE,
DT.TAX_ID,
DT.EFFECTIVE_DATE
FROM DATA_TABLE DT
I get this result.
I'd like to remove results where the TAX_ID appears more than once for each contract.
That is, every row of a duplicated TAX_ID should be gone, not just the extra copies. If a TAX_ID had 3 rows, all 3 would be gone.

I think window functions might be the way to go:
SELECT DT.CONTRACT_NUMBER, DT.ROLE, DT.TAX_ID, DT.EFFECTIVE_DATE
FROM (SELECT DT.CONTRACT_NUMBER, DT.ROLE, DT.TAX_ID, DT.EFFECTIVE_DATE,
COUNT(*) OVER (PARTITION BY TAX_ID) as cnt
FROM DATA_TABLE DT
WHERE DT.CONTRACT_NUMBER = '551000280'
) DT
WHERE CNT = 1;
If you actually want to keep one row per tax id, then use row_number() instead of count(*).
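As a minimal sketch of the COUNT(*) OVER idea, the following runs the query against an in-memory SQLite database from Python. The sample rows and role names are made up for illustration, and the partition here is on (CONTRACT_NUMBER, TAX_ID) rather than TAX_ID alone, which is an assumption about how the query might be extended to cover all contracts at once instead of a single hard-coded one. It assumes a SQLite build with window-function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_table ("
    "contract_number TEXT, role TEXT, tax_id TEXT, effective_date TEXT)"
)
# Hypothetical sample data, not taken from the question.
conn.executemany(
    "INSERT INTO data_table VALUES (?, ?, ?, ?)",
    [
        ("551000280", "OWNER", "111", "2015-01-01"),
        ("551000280", "PAYER", "111", "2015-01-01"),  # tax id repeated in this contract
        ("551000280", "OWNER", "222", "2015-02-01"),
        ("551000281", "OWNER", "111", "2015-03-01"),  # same tax id, different contract
    ],
)

query = """
SELECT contract_number, role, tax_id, effective_date
FROM (SELECT dt.*,
             COUNT(*) OVER (PARTITION BY contract_number, tax_id) AS cnt
      FROM data_table dt) counted
WHERE cnt = 1              -- keep only tax ids that occur once per contract
ORDER BY contract_number, tax_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Both rows for tax id 111 in contract 551000280 are dropped, while the single occurrence of 111 in the other contract is kept.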

SSRS Table of locations per item type

I have a basic query which shows what the latest product to be put in each location (FVTank) is:
SELECT TOP 1
T0.[DateTime],
T0.[TankName],
T1.[Item]
FROM
t005_pci_data T0
INNER JOIN t001_fvbatch T1 ON T1.[FVBatch] = T0.[FVBatch]
WHERE
T0.[TankName] = 'FV101'
UNION
SELECT TOP 1
T0.[DateTime],
T0.[TankName],
T1.[Item]
FROM
t005_pci_data T0
INNER JOIN t001_fvbatch T1 ON T1.[FVBatch] = T0.[FVBatch]
WHERE
T0.[TankName] = 'FV102'
[...etc...]
ORDER BY
T0.[DateTime] DESC
Which gives a result like this:
What I'd like to do is create a summary page on SSRS which would display all the locations which currently hold each item. Ideally it would look something like this:
There are 50 locations and 7 main items so I need it to have 8 headers (one additional one for "other".)
Is there a way to do this in SSRS? Or is there a better solution by doing it in SQL?
Thank you.
Add an additional column to your dataset that calculates a row number for each Item, ordered by the DateTime field:
row_number() over (partition by Item order by DateTime desc) as rn
Judging by your source query in your question, this may be best included as a wrapping select around your final query:
select DateTime
,TankName
,Item
,row_number() over (partition by Item order by DateTime desc) as rn
from(
<Your original query here>
) a
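To see what the rn column looks like, here is a small sketch using Python's sqlite3 module. The table name latest and the item names are hypothetical stand-ins for the output of the original query, and it assumes a SQLite build with window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "latest" stands in for the result of the original query (hypothetical name).
conn.execute("CREATE TABLE latest (DateTime TEXT, TankName TEXT, Item TEXT)")
conn.executemany(
    "INSERT INTO latest VALUES (?, ?, ?)",
    [
        ("2016-05-03", "FV101", "Lager"),
        ("2016-05-01", "FV102", "Lager"),
        ("2016-05-02", "FV103", "Stout"),
    ],
)

query = """
SELECT DateTime, TankName, Item,
       ROW_NUMBER() OVER (PARTITION BY Item ORDER BY DateTime DESC) AS rn
FROM latest
ORDER BY Item, rn
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each item restarts its numbering at 1, so a row group on rn places the first tank of every item on the same report row.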
You can then use rn as your row group; without a row group you will not get the top-aligned layout you are after in each Item column. Remember to delete the rn column from the table but keep the grouping:
When you run this report you will get the following format (I didn't bother typing out all your data into my dataset query, hence the missing values):

select only last try

Consider the following table structure for an imaginary table named score:
player_name |player_lastname |try |score
primary key: (player_name,player_lastname,try)
(Please don't discuss the table schema; it's just an example.)
This table holds the scores of all players - every player should be able to play either one OR two times. Now, how could I fetch data about every player's last try only (i.e. first tries should be ignored for those who played more than once)?
An example of what I'm trying to achieve:
player_name,player_lastname,try,score
=====================================
bart,simpson,1,250
lisa,simpson,1,150
lisa,simpson,2,250
homer,simpson,1,300
homer,simpson,2,350
maggi,simpson,1,50
The result should be:
player_name,player_lastname,try,score
=====================================
bart,simpson,1,250
lisa,simpson,2,250
homer,simpson,2,350
maggi,simpson,1,50
One option is to JOIN the table to itself using a subquery with MAX:
select s.*
from score s
join (
select max(try) maxtry, player_name, player_lastname
from score
group by player_name, player_lastname
) s2 on s.player_name = s2.player_name
and s.player_lastname = s2.player_lastname
and s.try = s2.maxtry
SQL Fiddle Demo
Depending on your database, you may be able to take advantage of analytic functions such as ROW_NUMBER(), which would make this easier. Here is another fiddle to demonstrate.
Since you are using PostgreSQL, you should be able to use the analytic ROW_NUMBER() function. This should work as well:
select *
from (
select try, player_name, player_lastname, score,
Row_Number() Over (Partition By player_name, player_lastname order by try desc) rn
from score
) s
where rn = 1
BTW -- I'd consider adding a player_id as a primary key.
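The ROW_NUMBER() approach can be checked against the sample data from the question with a short sketch like the one below. It uses Python's sqlite3 module purely as a convenient test bed (the question itself targets PostgreSQL) and assumes SQLite 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE score ("
    "player_name TEXT, player_lastname TEXT, try INTEGER, score INTEGER, "
    "PRIMARY KEY (player_name, player_lastname, try))"
)
conn.executemany(
    "INSERT INTO score VALUES (?, ?, ?, ?)",
    [
        ("bart", "simpson", 1, 250),
        ("lisa", "simpson", 1, 150),
        ("lisa", "simpson", 2, 250),
        ("homer", "simpson", 1, 300),
        ("homer", "simpson", 2, 350),
        ("maggi", "simpson", 1, 50),
    ],
)

query = """
SELECT player_name, player_lastname, try, score
FROM (SELECT s.*,
             ROW_NUMBER() OVER (PARTITION BY player_name, player_lastname
                                ORDER BY try DESC) AS rn
      FROM score s) ranked
WHERE rn = 1               -- rn = 1 is each player's highest try number
ORDER BY player_name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```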
This will probably have the best performance:
select distinct on (player_name, player_lastname)
player_name, player_lastname, try, score
from score
order by 1, 2, 3 desc
A Rank function can solve this:
SELECT player_name,player_lastname,TRY,score
FROM (SELECT player_name,player_lastname,TRY,score,RANK() OVER (PARTITION BY player_name, Player_Lastname ORDER BY TRY DESC)AS try_rank
FROM score
)sub
WHERE try_rank = 1
I'm assuming 'try' is the attempt number, which can be 1 or 2.
SELECT player_name,player_lastname,try,score
FROM score sc
WHERE NOT EXISTS (
SELECT *
FROM score nx
WHERE nx.player_name = sc.player_name
AND nx.player_lastname = sc.player_lastname
AND nx.try > sc.try
);
Try this out:
select player_name,
player_lastname,
try,
score
from score
where try = 2
or (try = 1
and (player_name, player_lastname) not in
(select player_name, player_lastname from score where try = 2));

Debugging a SQL Query

I have a table structure like below. I need to select the row where User_Id = 100 and User_sub_id = 1, where time_used is the minimum among those rows and, of the rows that share that minimum, Timestamp is the highest. The output of my query should be:
US;1365510103204;NY;1365510103;100;1;678;
My query looks like this.
select *
from my_table
where CODE='US'
and User_Id = 100
and User_sub_id = 1
and time_used = (select min(time_used)
from my_table
where CODE='US'
and User_Id=100
and User_sub_id= 1);
This returns all 4 rows that share the minimum time_used. I need only 1: the one with the highest timestamp.
Many Thanks
CODE;Timestamp;Location;Time_recorded;User_Id;User_sub_Id;time_used;
US;1365510102420;NY;1365510102;100;1;1078;
US;1365510102719;NY;1365510102;100;1;978;
US;1365510103204;NY;1365510103;100;1;878;
US;1365510102232;NY;1365510102;100;1;678;
US;1365510102420;NY;1365510102;100;1;678;
US;1365510102719;NY;1365510102;100;1;678;
US;1365510103204;NY;1365510103;100;1;678;
US;1365510102420;NY;1365510102;101;1;678;
US;1365510102719;NY;1365510102;101;1;638;
US;1365510103204;NY;1365510103;101;1;638;
Another possibly faster solution is using window functions:
select *
from (
select code,
timestamp,
-- lowest time_used first, then the latest timestamp among ties
row_number() over (partition by user_id, user_sub_id order by time_used, timestamp desc) as rn,
time_used,
user_id,
user_sub_id
from my_table
where CODE='US'
and User_Id = 100
and User_sub_id = 1
) t
where rn = 1;
This only needs to scan the table once instead of twice as your solution with the sub-select is doing.
I would strongly recommend renaming the column timestamp.
First, it is a reserved word, and using reserved words as identifiers is not recommended.
Second, it doesn't document anything, which makes it a poor name. time_used is much better, and you should find something similar for timestamp. Is that the "recording time", the "expiration time", the "due time" or something completely different?
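The single-scan window-function idea can be tried on the sample rows from the question with the sketch below. It uses Python's sqlite3 module as a stand-in database (an assumption, since the question does not name one) and needs SQLite 3.25 or later. Ordering ROW_NUMBER() by time_used first and timestamp descending second is one way to express "minimum time_used, then highest timestamp" in a single ranking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (code TEXT, timestamp INTEGER, location TEXT, "
    "time_recorded INTEGER, user_id INTEGER, user_sub_id INTEGER, time_used INTEGER)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("US", 1365510102420, "NY", 1365510102, 100, 1, 1078),
        ("US", 1365510102719, "NY", 1365510102, 100, 1, 978),
        ("US", 1365510103204, "NY", 1365510103, 100, 1, 878),
        ("US", 1365510102232, "NY", 1365510102, 100, 1, 678),
        ("US", 1365510102420, "NY", 1365510102, 100, 1, 678),
        ("US", 1365510102719, "NY", 1365510102, 100, 1, 678),
        ("US", 1365510103204, "NY", 1365510103, 100, 1, 678),
        ("US", 1365510102420, "NY", 1365510102, 101, 1, 678),
        ("US", 1365510102719, "NY", 1365510102, 101, 1, 638),
        ("US", 1365510103204, "NY", 1365510103, 101, 1, 638),
    ],
)

query = """
SELECT code, timestamp, user_id, user_sub_id, time_used
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY user_id, user_sub_id
                                ORDER BY time_used, timestamp DESC) AS rn
      FROM my_table t
      WHERE code = 'US' AND user_id = 100 AND user_sub_id = 1) ranked
WHERE rn = 1
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The single returned row has time_used 678 and timestamp 1365510103204, matching the expected output in the question.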
Then try this:
select *
from my_table
where CODE='US'
and User_Id=100
and User_sub_id=1
and time_used=(
select min(time_used)
from my_table
where CODE='US'
and User_Id=100 and User_sub_id=1
)
order by "timestamp" desc -- <-- this adds sorting
limit 1; -- <-- this retrieves only one row
Add the following clause to the end of your query:
ORDER BY Timestamp DESC LIMIT 1

How do I get the top 10 results of a query?

I have a postgresql query like this:
with r as (
select
1 as reason_type_id,
rarreason as reason_id,
count(*) over() count_all
from
workorderlines
where
rarreason != 0
and finalinsdate >= '2012-12-01'
)
select
r.reason_id,
rt.desc,
count(r.reason_id) as num,
round((count(r.reason_id)::float / (select count(*) as total from r) * 100.0)::numeric, 2) as pct
from r
left outer join
rtreasons as rt
on
r.reason_id = rt.rtreason
and r.reason_type_id = rt.rtreasontype
group by
r.reason_id,
rt.desc
order by r.reason_id asc
This returns a table of results with 4 columns: the reason id, the description associated with that reason id, the number of entries having that reason id, and the percent of the total that number represents.
This table looks like this:
What I would like to do is only display the top 10 results based off the total number of entries having a reason id. However, whatever is leftover, I would like to compile into another row with a description called "Other". How would I do this?
with r2 as (
...everything before the select list...
-- rank by number of entries, highest first
dense_rank() over(order by count(r.reason_id) desc) cause_rank
...the rest of your query...
)
select * from r2 where cause_rank < 11
union all
select
NULL as reason_id,
'Other' as "desc", -- desc is a reserved word, so quote it
sum(r2.num) as num,
sum(r2.pct) as pct,
11 as cause_rank
from r2
where cause_rank >= 11
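A self-contained sketch of the "top N plus Other" pattern follows, run through Python's sqlite3 module. Everything in it is simplified for illustration: the table holds only the rarreason column, the counts are made up, TOP_N is 2 instead of 10, the label column is a hypothetical stand-in for the description, and ROW_NUMBER() replaces dense_rank() so that ties cannot push more than N rows into the top group. It assumes SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE workorderlines (rarreason INTEGER)")
# Hypothetical data: reason 1 occurs 4 times, 2 three times, 3 twice, 4 once.
reasons = [1] * 4 + [2] * 3 + [3] * 2 + [4] * 1
conn.executemany("INSERT INTO workorderlines VALUES (?)", [(r,) for r in reasons])

TOP_N = 2  # the question asks for 10; 2 keeps the sample small

query = """
WITH counted AS (
    SELECT rarreason AS reason_id,
           COUNT(*) AS num,
           ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM workorderlines), 2) AS pct
    FROM workorderlines
    GROUP BY rarreason
),
ranked AS (
    SELECT counted.*,
           ROW_NUMBER() OVER (ORDER BY num DESC, reason_id) AS rnk
    FROM counted
)
SELECT reason_id, CAST(reason_id AS TEXT) AS label, num, pct, rnk
FROM ranked
WHERE rnk <= :n
UNION ALL
-- Collapse everything below the cut-off into a single 'Other' row.
SELECT NULL, 'Other', SUM(num), SUM(pct), :n + 1
FROM ranked
WHERE rnk > :n
ORDER BY rnk
"""
rows = conn.execute(query, {"n": TOP_N}).fetchall()
for row in rows:
    print(row)
```

Note that if there are no rows past the cut-off, the second SELECT still returns one row with NULL sums, so a real query may want to filter that case out.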
As said above, use LIMIT to get the top rows; to skip them and get the rest, use OFFSET.
Not sure about PostgreSQL, but SELECT TOP 10... (written as ORDER BY ... LIMIT 10 in PostgreSQL, which has no TOP) should do the trick if you sort correctly.
As for the second part: you might use a RIGHT JOIN for this. Join the top 10 result with the whole table data and use only the records that do not appear on the left side. If you calculate the sum of those, you should get your "sum of the rest" result.
I assume that vw_my_top_10 is the view showing you the top 10 records. vw_all_records shows all records (including the top 10).
Like this:
SELECT SUM(a_field)
FROM vw_my_top_10
RIGHT JOIN vw_all_records
ON (vw_my_top_10.Key = vw_all_records.Key)
WHERE vw_my_top_10.Key IS NULL

SQL query count divided by a distinct count of same query

Having some trouble with some SQL.
Take the following result for instance:
LOC_CODE CHANNEL
------------ --------------------
3ATEST-01 CHAN2
3ATEST-01 CHAN3
3ATEST-02 CHAN4
What I need to do is get a count of the above query, grouped by channel, but I want that count to be divided by the number of times the LOC_CODE appears.
Example of the result I am after is:
CHANNEL COUNT
---------------- ----------
CHAN2 0.5
CHAN3 0.5
CHAN4 1
The explanation is that CHAN2 appears next to "3ATEST-01", but the LOC_CODE "3ATEST-01" appears twice, so the count should be divided by 2.
I know I can do this by basically duplicating the query with a distinct count, but the underlying query is quite complex and I don't really want to harm performance.
Please let me know if you would like more information!
Try:
select channel,
count(*) over (partition by channel, loc_code)
/ count(*) over (partition by loc_code) as count_ratio
from my_table
SELECT t.CHANNEL, COUNT(*) / gr.TotalCount
FROM my_table t JOIN (
SELECT LOC_CODE, COUNT(*) TotalCount
FROM my_table
GROUP BY LOC_CODE
) gr USING(LOC_CODE)
GROUP BY t.LOC_CODE, t.CHANNEL
Create an index on (LOC_CODE, CHANNEL).
If there are no duplicate channels, replace COUNT(*) / gr.TotalCount with 1 / gr.TotalCount and remove the GROUP BY clause.
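The join-against-a-grouped-subquery approach can be checked on the three sample rows with the sketch below, using Python's sqlite3 module as a stand-in database. Two details are assumptions added for this test bed: the count is multiplied by 1.0 because SQLite (like several other databases) truncates integer division, and total_count is added to the GROUP BY so the query does not rely on bare-column behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (loc_code TEXT, channel TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [
        ("3ATEST-01", "CHAN2"),
        ("3ATEST-01", "CHAN3"),
        ("3ATEST-02", "CHAN4"),
    ],
)

query = """
SELECT t.channel,
       1.0 * COUNT(*) / gr.total_count AS count_ratio  -- 1.0 forces real division
FROM my_table t
JOIN (SELECT loc_code, COUNT(*) AS total_count
      FROM my_table
      GROUP BY loc_code) gr
  ON gr.loc_code = t.loc_code
GROUP BY t.loc_code, t.channel, gr.total_count
ORDER BY t.channel
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```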
First, find a query that gets you the correct results. Then, see if it can be optimised. My guess is that it's hard to optimise, as you require two different groupings: one per Channel and one per Loc_Code.
I'm not even sure that this fits your description:
SELECT t.CHANNEL
, COUNT(*) / SUM(grp.TotalCount)
FROM my_table t
JOIN
( SELECT LOC_CODE
, COUNT(*) TotalCount --- or is it perhaps?:
--- COUNT(DISTINCT CHANNEL)
FROM my_table
GROUP BY LOC_CODE
) grp
ON grp.LOC_CODE = t.LOC_CODE
GROUP BY t.CHANNEL
Your requirements are still a bit unclear to me when it comes to duplicate CHANNELs, but this should work if you want grouping on both CHANNEL and LOC_CODE to sum up later;
SELECT L1.CHANNEL, 1/COUNT(L2.LOC_CODE)
FROM Locations L1
LEFT JOIN Locations L2 ON L1.LOC_CODE = L2.LOC_CODE
GROUP BY L1.CHANNEL, L1.LOC_CODE
Demo here.