SQL Server: Return results when not duplicate - sql

I just can't seem to get my mind wrapped around this. To simplify as much as possible, let's say I have a table:
Id cid Account
1 4010 Bank Co
2 5323 Webazon
3 3513 Internal
4 3513 PhoneCo
5 5597 Internal
I'm wanting to return all results except for the lines that are Account = 'Internal' where there's also a customer with the same cid. So, in this case, we would return lines 1,2,4, and 5. Line 3 would not be returned, because 'PhoneCo' and 'Internal' share cid 3513. However, line 5 would be returned because there's not another record that shares cid 5597.
I'm going down the road of doing it with a UNION, where the first part is eliminating all 'Internal' records, and the second part is just those I'm interested in, but I may be going about it the wrong way.

Here is one method:
select t.*
from t
where t.account <> 'Internal' or
      not exists (select 1
                  from t t2
                  where t2.cid = t.cid and t2.account <> 'Internal'
                 );
That is, select all non-internal records, plus any internal record for which there is no non-internal record with the same cid.
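To check the logic against the sample rows from the question, here is a minimal sketch that runs the same query in an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for SQL Server here, and the helper name rows_kept is made up for illustration.

```python
import sqlite3

def rows_kept():
    """Run the NOT EXISTS query on the question's sample rows; return the ids that survive."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not SQL Server
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, cid INTEGER, account TEXT)")
    conn.executemany(
        "INSERT INTO t (id, cid, account) VALUES (?, ?, ?)",
        [(1, 4010, "Bank Co"), (2, 5323, "Webazon"), (3, 3513, "Internal"),
         (4, 3513, "PhoneCo"), (5, 5597, "Internal")],
    )
    query = """
        SELECT t.id
        FROM t
        WHERE t.account <> 'Internal'
           OR NOT EXISTS (SELECT 1
                          FROM t t2
                          WHERE t2.cid = t.cid AND t2.account <> 'Internal')
        ORDER BY t.id
    """
    ids = [row[0] for row in conn.execute(query)]
    conn.close()
    return ids

print(rows_kept())  # [1, 2, 4, 5]
```

Row 3 is dropped because PhoneCo shares cid 3513; row 5 stays because nothing else uses cid 5597.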

Related

How to write a SQL query to calculate percentages based on values across different tables?

Suppose I have a database containing two tables, similar to below:
Table 1:
tweet_id tweet
1 Scrap the election results
2 The election was great!
3 Great stuff
Table 2:
politician tweet_id
TRUE 1
FALSE 2
FALSE 3
I'm trying to write a SQL query which returns the percentage of tweets that contain the word 'election' broken down by whether they were a politician or not.
So for instance here, the first 2 tweets in Table 1 contain the word election. By looking at Table 2, you can see that tweet_id 1 was written by a politician, whereas tweet_id 2 was written by a non-politician.
Hence, the result of the SQL query should return 50% for politicians and 50% for non-politicians (i.e. two tweets contained the word 'election', one by a politician and one by a non-politician).
Any ideas how to write this in SQL?
You could do this by creating one subquery to return all election tweets, and one subquery to return all election tweets by politicians, then join.
Here is a sample. Note that you may need to cast the totals to decimals before dividing (depending on which SQL provider you are working in).
select
    politician_tweets.total / election_tweets.total
from
    (
        select count(tweet) as total
        from table_1
        join table_2 on table_1.tweet_id = table_2.tweet_id
        where tweet like '%election%'
    ) election_tweets
join
    (
        select count(tweet) as total
        from table_1
        join table_2 on table_1.tweet_id = table_2.tweet_id
        where tweet like '%election%'
          and politician = 1
    ) politician_tweets
    on 1 = 1
You can use aggregation like this:
select t2.politician, avg( case when t.tweet like '%election%' then 1.0 else 0 end) as election_ratio
from tweets t join
table2 t2
on t.tweet_id = t2.tweet_id
group by t2.politician;
Here is a db<>fiddle.
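One thing worth checking: the aggregation query reports, for each group, the fraction of that group's own tweets that mention 'election', which is not the same number as each group's share of all election tweets (the 50%/50% the question expects). The sketch below runs both on the sample data in in-memory SQLite via Python's sqlite3. The table names tweets and table2 follow the aggregation answer; storing politician as 0/1 and the second query are my assumptions, not part of either answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for whatever engine you use
conn.executescript("""
    CREATE TABLE tweets (tweet_id INTEGER PRIMARY KEY, tweet TEXT);
    CREATE TABLE table2 (politician INTEGER, tweet_id INTEGER);
    INSERT INTO tweets VALUES (1, 'Scrap the election results'),
                              (2, 'The election was great!'),
                              (3, 'Great stuff');
    INSERT INTO table2 VALUES (1, 1), (0, 2), (0, 3);
""")

# Fraction of each group's own tweets that mention 'election'.
within_group = conn.execute("""
    SELECT t2.politician,
           AVG(CASE WHEN t.tweet LIKE '%election%' THEN 1.0 ELSE 0 END)
    FROM tweets t JOIN table2 t2 ON t.tweet_id = t2.tweet_id
    GROUP BY t2.politician
    ORDER BY t2.politician
""").fetchall()

# Each group's share of all tweets that mention 'election'.
share_of_election = conn.execute("""
    SELECT t2.politician,
           COUNT(*) * 1.0 / (SELECT COUNT(*) FROM tweets WHERE tweet LIKE '%election%')
    FROM tweets t JOIN table2 t2 ON t.tweet_id = t2.tweet_id
    WHERE t.tweet LIKE '%election%'
    GROUP BY t2.politician
    ORDER BY t2.politician
""").fetchall()

print(within_group)       # [(0, 0.5), (1, 1.0)]
print(share_of_election)  # [(0, 0.5), (1, 0.5)]
```

Multiplying by 1.0 forces decimal division, which is the cast the first answer warns about.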

SQL to return records that do not have a complete set according to a second table

I have two tables. I want to find the erroneous records in the first table based on the fact that they aren't complete set as determined by the second table. eg:
custID service transID
1 20 1
1 20 2
1 50 2
2 49 1
2 138 1
3 80 1
3 140 1
comboID combinations
1 Y00020Y00050
2 Y00049Y00138
3 Y00020Y00049
4 Y00020Y00080Y00140
So in this example I would want a query to return the first row of the first table because it does not have a matching 49 or 50 or (80 and 140), and the last two rows as well (because there is no 20). The second transaction is fine, and the second customer is fine.
I couldn't figure this out with a query, so I wound up writing a program that loads the services per customer and transid into an array, iterates over them, and ensures that there is at least one matching combination record where all the services in the combination are present in the initially loaded array. Even that came off as hamfisted, but it was less of a nightmare than the awkward outer joining of multiple joins I was trying to accomplish with SQL.
Taking a step back, I think I need to restructure the combinations table into something more accommodating, but I still can't think of what the approach would be.
I do not have DB2, so I tested this on Oracle; the listagg function should be available in DB2 as well. Here service is the first table and comb is the second. I assume the services inside the combinations column are listed in ascending order.
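select service.*
from service
join
(
    select S.custid, S.transid
    from
    (
        -- pad each service to five digits so that 138 becomes Y00138, as in the combinations column
        -- (on DB2 the service number may need an explicit cast to a character type first)
        select custid, transid,
               listagg(concat('Y', lpad(service, 5, '0'))) within group (order by service) as agg
        from service
        group by custid, transid
    ) S
    where not exists
    (
        select *
        from comb
        where S.agg = comb.combinations
    )
) NOT_F on NOT_F.custid = service.custid and NOT_F.transid = service.transid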
I dare to say that your database design does not conform to the first normal form since the combinations column is not atomic. Think about it.
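As a rough illustration of what a normalized layout could look like, the sketch below stores one row per (combination, service) and checks each transaction for an exact match by relational division. It runs in in-memory SQLite through Python's sqlite3 rather than DB2; the table combo_item and its columns are hypothetical, and it assumes a service appears at most once per transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in, not DB2
conn.executescript("""
    CREATE TABLE service (custid INTEGER, service INTEGER, transid INTEGER);
    INSERT INTO service VALUES (1, 20, 1), (1, 20, 2), (1, 50, 2),
                               (2, 49, 1), (2, 138, 1), (3, 80, 1), (3, 140, 1);
    -- Hypothetical normalized form of the combinations table.
    CREATE TABLE combo_item (combo_id INTEGER, service INTEGER);
    INSERT INTO combo_item VALUES (1, 20), (1, 50), (2, 49), (2, 138),
                                  (3, 20), (3, 49), (4, 20), (4, 80), (4, 140);
""")

incomplete = conn.execute("""
    WITH trans_size AS (
        SELECT custid, transid, COUNT(*) AS n FROM service GROUP BY custid, transid
    ),
    combo_size AS (
        SELECT combo_id, COUNT(*) AS n FROM combo_item GROUP BY combo_id
    ),
    matched AS (
        -- A transaction matches a combination of the same size
        -- when every member service of that combination is present.
        SELECT t.custid, t.transid
        FROM trans_size t
        JOIN combo_size c ON c.n = t.n
        JOIN combo_item ci ON ci.combo_id = c.combo_id
        JOIN service s ON s.custid = t.custid AND s.transid = t.transid
                      AND s.service = ci.service
        GROUP BY t.custid, t.transid, c.combo_id, c.n
        HAVING COUNT(*) = c.n
    )
    SELECT s.custid, s.service, s.transid
    FROM service s
    WHERE NOT EXISTS (SELECT 1 FROM matched m
                      WHERE m.custid = s.custid AND m.transid = s.transid)
    ORDER BY s.custid, s.transid, s.service
""").fetchall()

print(incomplete)  # [(1, 20, 1), (3, 80, 1), (3, 140, 1)]
```

These are the rows the question expects: the lone service 20 in customer 1's first transaction, and customer 3's 80 and 140, which lack the 20.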

Unable to remove duplicates with subquery

I had built a query previously that returned zero duplicates until I decided to join in a couple more tables. Now that I've joined them in, I'm unable to create the desired flag due to duplicates being returned. I've attached the scenario below as an example.
I only want one occurrence of each reference number (123456789). I want to create a flag when certain criteria are met. For example, I want to see when reference numbers for a certain account meet "X", but when I join the table I get every instance of that reference number in the joined table.
REFNO BEG END STATUS
123456789 123 456 E
123456789 456 789 E
123456789 789 012 A
I want to see all of the REFNOs based on other parameters set in the query, but I want a flag for anything where END = '012'. I can't left join to the table because I will get all three lines. If I do an inner join, I just get the 012 lines. I tried the code below in my select statement to flag only when that scenario exists, but I'm getting odd results and don't know why. I feel like this should be fairly easy to accomplish, but I can't wrap my head around how to create a flag for just that scenario without getting duplicates or removing results with an inner join.
,(CASE WHEN EXISTS(SELECT 1
FROM QW.ABCD Z
WHERE Z.ABCD = P.ABCD
AND Z.END = '012'
AND Z.TIMESTAMP IS NULL
AND Z.STATUS IN ('A','E'))
THEN 'Y' ELSE 'N'
END)
AS "FLAG"
Please help as I'm not sure what I'm doing wrong to get the flag I want to see.
I am not sure if DB2 allows this combination of UNION and GROUP BY, but what you want is probably something like this:
SELECT 'N' AS FLAG, REF_NO, MIN(BEG), MAX(THEEND)
FROM TAB_QW A
WHERE NOT EXISTS (SELECT * FROM TAB_QW B WHERE B.REF_NO = A.REF_NO AND B.THEEND = '012')
GROUP BY REF_NO
UNION
SELECT 'Y' AS FLAG, REF_NO, MIN(BEG), MAX(THEEND)
FROM TAB_QW C
WHERE EXISTS (SELECT * FROM TAB_QW D WHERE D.REF_NO = C.REF_NO AND D.THEEND = '012')
GROUP BY REF_NO
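A different way to get one row per reference number is conditional aggregation, which avoids the UNION altogether. This is a swapped-in technique, not the query above; the sketch runs it in in-memory SQLite via Python's sqlite3, reusing the TAB_QW column names, and the second reference number is a made-up row added to show the 'N' case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in, not DB2
conn.executescript("""
    CREATE TABLE tab_qw (ref_no TEXT, beg TEXT, theend TEXT, status TEXT);
    INSERT INTO tab_qw VALUES ('123456789', '123', '456', 'E'),
                              ('123456789', '456', '789', 'E'),
                              ('123456789', '789', '012', 'A'),
                              ('987654321', '100', '200', 'E');  -- made-up row with no '012'
""")

# 'Y' sorts after 'N', so MAX() yields 'Y' as soon as any row of the group ends in '012'.
flags = conn.execute("""
    SELECT ref_no,
           MAX(CASE WHEN theend = '012' THEN 'Y' ELSE 'N' END) AS flag
    FROM tab_qw
    GROUP BY ref_no
    ORDER BY ref_no
""").fetchall()

print(flags)  # [('123456789', 'Y'), ('987654321', 'N')]
```

Extra conditions such as the STATUS or TIMESTAMP checks from the question would go inside the CASE expression.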

How to make one column fixed?

A scheme contains several items. When the user passes a SchemeID to the procedure, it should return the SchemeName once, followed by all items in that scheme (DescriptionOfItems, Quantity, Unit, Rate, Amount), in this format:
SchemeName DescriptionOfItems Quantity Unit Rate Amount
Scheme01 Bulbs 2 M2 200 400
Titles 10 M3 300 3000
SolarPanels 2 M2 1000 2000
Bricks 50 M9 50 2500
Total 7900
My attempt works, but it repeats the SchemeName on every row and I can't work out how to get the total:
Select
    Schemes.SchemeName,
    ContractorsWorkDetails.ContractorsWorkDetailsItemDescription,
    ContractorsWorkDetails.ContractorsWorkDetailsUnit,
    ContractorsWorkDetails.ContractorsWorkDetailsItemQuantity,
    ContractorsWorkDetails.ContractorsWorkDetailsItemRate,
    ContractorsWorkDetails.ContractorsWorkDetailsAmount
From ContractorsWorkDetails
Inner Join Schemes
    ON Schemes.pk_Schemes_SchemeID = ContractorsWorkDetails.fk_Schemes_ContractorsWorkDetails_SchemeID
Where ContractorsWorkDetails.fk_Schemes_ContractorsWorkDetails_SchemeID = 2
Update:
I tested the query as suggested below but it gives this kinda result
You can get the total using grouping sets. I would advise you to keep the scheme name on each row. If you want it blanked out on certain rows, then do that at the application layer.
Now, having said that, I think this will do what you want in SQL:
Select (case when GROUPING(cwd.ContractorsWorkDetailsItemDescription) = 1
             then 'Total'   -- GROUPING() returns 1 on the rolled-up (total) row
             when row_number() over (partition by s.SchemeName
                                     -- sort the total row (NULL description) last so the first item row gets number 1
                                     order by (case when cwd.ContractorsWorkDetailsItemDescription is null then 1 else 0 end),
                                              cwd.ContractorsWorkDetailsItemDescription
                                    ) = 1
             then s.SchemeName else ''
        end) as SchemeName,
       cwd.ContractorsWorkDetailsItemDescription,
       cwd.ContractorsWorkDetailsUnit,
       cwd.ContractorsWorkDetailsItemQuantity,
       cwd.ContractorsWorkDetailsItemRate,
       SUM(cwd.ContractorsWorkDetailsAmount) as ContractorsWorkDetailsAmount
From ContractorsWorkDetails cwd Inner Join
     Schemes s
     ON s.pk_Schemes_SchemeID = cwd.fk_Schemes_ContractorsWorkDetails_SchemeID
Where cwd.fk_Schemes_ContractorsWorkDetails_SchemeID = 2
group by GROUPING SETS ((s.SchemeName,
                         cwd.ContractorsWorkDetailsItemDescription,
                         cwd.ContractorsWorkDetailsUnit,
                         cwd.ContractorsWorkDetailsItemQuantity,
                         cwd.ContractorsWorkDetailsItemRate
                        ), s.SchemeName)
Order By GROUPING(cwd.ContractorsWorkDetailsItemDescription),
         s.SchemeName, cwd.ContractorsWorkDetailsItemDescription;
The reason you don't want to do this in SQL is that the result set no longer has a relational structure: the ordering of the rows becomes significant.
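For comparison, here is a minimal sketch of the application-layer route recommended above: the query keeps the scheme name on every row, and the presentation code blanks the repeats and appends the total. The function name format_scheme_report and the tuple layout are assumptions for illustration; the sample rows are the ones from the question.

```python
def format_scheme_report(rows):
    """Blank repeated scheme names and append a total row.

    rows: (scheme_name, description, quantity, unit, rate, amount) tuples,
    already ordered so that rows of the same scheme are adjacent.
    """
    report = []
    previous_scheme = None
    total = 0
    for scheme, description, quantity, unit, rate, amount in rows:
        shown_name = scheme if scheme != previous_scheme else ""
        report.append((shown_name, description, quantity, unit, rate, amount))
        previous_scheme = scheme
        total += amount
    report.append(("Total", "", "", "", "", total))
    return report

sample_rows = [
    ("Scheme01", "Bulbs", 2, "M2", 200, 400),
    ("Scheme01", "Titles", 10, "M3", 300, 3000),
    ("Scheme01", "SolarPanels", 2, "M2", 1000, 2000),
    ("Scheme01", "Bricks", 50, "M9", 50, 2500),
]
report = format_scheme_report(sample_rows)
for line in report:
    print(line)
```

The last line printed is the total row with 7900, and only the first item row carries the scheme name.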

DB2 query to get next available number in table

I have a table with a few columns and I want to achieve the following using a DB2 query.
For example, say the USR table has a User ID column and an Option ID column:
USER ID OPTION ID
1 1
1 5
1 22
1 100
1 999
I want to write a query that returns the next available number in the sequence.
The first time the query is executed, it should return 2 as the next available option ID. The user then enters #2, so the table now contains:
USER ID OPTION ID
1 1
1 2
1 5
1 22
1 100
1 999
Now when the query is executed, it should return 3 as the next available Option ID.
Can somebody help with an optimized query that returns the correct results?
Please note that I think that exposing option_id to the user is a terrible idea, business requirement or no. Surrogate id's like this are meant to be completely hidden from the end user ('natural' keys, like credit-card numbers, obviously have to be exposed, but still shouldn't be dictated in this manner).
The following should work on any version of DB2:
SELECT a.optionid + :nextIncrement as next_value
FROM Usr as a
LEFT JOIN Usr as b
       ON b.userid = a.userid
      AND b.optionid = a.optionid + :nextIncrement
WHERE a.userid = :userId
  AND b.userid IS NULL
ORDER BY a.optionid ASC
FETCH FIRST 1 ROW ONLY
(statement run against a local table on my iSeries instance, with host variables replaced)
Again, I strongly recommend you not use this, and see about getting the business requirement changed.
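For anyone who wants to see the query's behaviour on the sample data, here is a minimal sketch in in-memory SQLite via Python's sqlite3. It is a stand-in for DB2, so FETCH FIRST 1 ROW ONLY becomes LIMIT 1, the increment is fixed at 1, and the helper next_option is a made-up name.

```python
import sqlite3

# Same anti-join as above, with the increment fixed at 1 and LIMIT in place of FETCH FIRST.
NEXT_OPTION_SQL = """
    SELECT a.optionid + 1 AS next_value
    FROM usr AS a
    LEFT JOIN usr AS b
           ON b.userid = a.userid
          AND b.optionid = a.optionid + 1
    WHERE a.userid = ?
      AND b.userid IS NULL
    ORDER BY a.optionid
    LIMIT 1
"""

def next_option(conn, user_id):
    """Return the first gap after an existing option id, or None if the user has no rows."""
    row = conn.execute(NEXT_OPTION_SQL, (user_id,)).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usr (userid INTEGER, optionid INTEGER)")
conn.executemany("INSERT INTO usr VALUES (?, ?)",
                 [(1, 1), (1, 5), (1, 22), (1, 100), (1, 999)])

first = next_option(conn, 1)   # 2
conn.execute("INSERT INTO usr VALUES (1, ?)", (first,))
second = next_option(conn, 1)  # 3
print(first, second)
```

Note that the query only finds gaps after an existing row, so it never proposes a value below the user's smallest option ID and returns nothing for a user with no rows.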