SQL - Sum of Unique Values From Reused Column

I need a sum of Shoes and a sum of Hats from a table containing a User, Filename, and Payload. Duplicate records should be ignored, where a duplicate record is one with the same User, the same Payload, and the same portion of the Filename following the '/'. In the example table below, record #3 is a duplicate of record #2 under these rules. The desired result is a sum of Shoes and a sum of Hats, as in the example below.
Example Data
+---+------+----------+-----------+
| # | User | Filename | Payload |
+---+------+----------+-----------+
| 1 | A | a/123 | Shoes = 3 |
| 2 | A | a/123 | Hats = 2 |
| 3 | A | b/123 | Hats = 2 |
| 4 | B | a/123 | Shoes = 1 |
| 5 | B | a/123 | Hats = 1 |
+---+------+----------+-----------+
Expected Output
+-------+------+
| Shoes | Hats |
+-------+------+
| 4 | 3 |
+-------+------+

Hive happens to support substring_index(), so you can do:
select sum(case when payload like 'Shoes%'
                then cast(substring_index(payload, ' = ', -1) as int)
                else 0
           end) as num_shoes,
       sum(case when payload like 'Hats%'
                then cast(substring_index(payload, ' = ', -1) as int)
                else 0
           end) as num_hats
from (select t.*,
             row_number() over (partition by user, payload, substring_index(filename, '/', -1)
                                order by user
                               ) as seqnum
      from t
     ) t
where seqnum = 1;
I strongly suggest that you change your data model and not store the payload as a string. Numbers should be stored as numbers. Names should be stored as names. They should not be combined in a string, if that can be avoided.
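To make the data model suggestion concrete, here is a minimal sketch using Python's built-in sqlite3 module instead of Hive. The table layout and the column names (user_name, file_dir, file_id, item, quantity) are assumptions made up for illustration; they are not part of the question. With the item and the quantity in their own columns, de-duplication becomes a plain DISTINCT and no string parsing is needed.

```python
import sqlite3

# Hypothetical normalized layout: the item name and the quantity get their own
# columns, and the filename is split into the parts before and after the '/'.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (user_name TEXT, file_dir TEXT, file_id TEXT,"
    " item TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        ("A", "a", "123", "Shoes", 3),
        ("A", "a", "123", "Hats", 2),
        ("A", "b", "123", "Hats", 2),  # duplicate of the previous row
        ("B", "a", "123", "Shoes", 1),
        ("B", "a", "123", "Hats", 1),
    ],
)

# DISTINCT drops rows that agree on user, file id, item and quantity;
# the outer query then sums each item separately.
query = """
SELECT SUM(CASE WHEN item = 'Shoes' THEN quantity ELSE 0 END) AS num_shoes,
       SUM(CASE WHEN item = 'Hats'  THEN quantity ELSE 0 END) AS num_hats
FROM (SELECT DISTINCT user_name, file_id, item, quantity FROM t) AS d
"""
totals = conn.execute(query).fetchone()
print(totals)  # (4, 3)
```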

Related

SQL Server: GROUP BY with multiple columns produces duplicate results

I'm trying to include a 3rd column into my existing SQL Server query but I am getting duplicate result values.
Here is an example of the data contained in tb_IssuedPermits:
| EmployeeName | Current |
|--------------|---------|
| Person A | 0 |
| Person A | 0 |
| Person B | 1 |
| Person C | 0 |
| Person B | 0 |
| Person A | 1 |
This is my current query which produces duplicate values based on 1 or 0 bit values.
SELECT EmployeeName, COUNT(*) AS Count, [Current]
FROM tb_IssuedPermits
GROUP BY EmployeeName, [Current]
| EmployeeName | Count | Current |
|--------------|-------|---------|
| Person A | 2 | 0 |
| Person B | 1 | 0 |
| Person C | 1 | 0 |
| Person A | 1 | 1 |
| Person B | 1 | 1 |
Any ideas on how I can amend my query to get the following expected result? I want one result row per EmployeeName. Current should be 1 if any row for that EmployeeName has Current = 1; otherwise it should be 0.
| EmployeeName | Count | Current |
|--------------|-------|---------|
| Person A | 3 | 1 |
| Person B | 2 | 1 |
| Person C | 1 | 0 |
The result does not need to be in any specific order.
TIA
If your Current column contains the string values 'FALSE' and 'TRUE', you can do this:
SELECT EmployeeName, Count(*) AS Count,
MAX([Current]) AS [Current]
FROM tb_IssuedPermits
GROUP BY EmployeeName
It's a hack, but it works: as strings, 'TRUE' sorts after 'FALSE', so MAX returns 'TRUE' for any group that contains one.
If your Current column is a BIT, cast to INT and cast back, as @ThorstenKettner suggested. (MAX does not accept BIT directly, and CURRENT is a reserved word, so the alias needs brackets too.)
SELECT EmployeeName,
Count(*) AS Count,
CAST(MAX(CAST([Current] AS INT)) AS BIT) AS [Current]
FROM tb_IssuedPermits
GROUP BY EmployeeName
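Alternatively, you can use conditional aggregation. NULLIF turns every 0 into NULL, COUNT ignores NULLs, and any non-zero count casts to a BIT value of 1:
SELECT EmployeeName,
Count(*) AS Count,
CAST(COUNT(NULLIF([Current], 0)) AS BIT) AS [Current]
FROM tb_IssuedPermits
GROUP BY EmployeeName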
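As a quick way to check both ideas against the sample data, here is a small sketch using Python's sqlite3 module as a stand-in for SQL Server. SQLite has no BIT type, so an INTEGER column holding 0 or 1 is assumed, and the output column names (PermitCount, FlagViaMax, FlagViaCount) are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite has no BIT type, so an INTEGER holding 0 or 1 stands in for it.
conn.execute(
    "CREATE TABLE tb_IssuedPermits (EmployeeName TEXT, [Current] INTEGER)"
)
conn.executemany(
    "INSERT INTO tb_IssuedPermits VALUES (?, ?)",
    [("Person A", 0), ("Person A", 0), ("Person B", 1),
     ("Person C", 0), ("Person B", 0), ("Person A", 1)],
)

# MAX picks up a 1 if the group has one; COUNT(NULLIF(x, 0)) counts only the
# non-zero rows, and comparing that count with 0 yields a 0/1 flag.
query = """
SELECT EmployeeName,
       COUNT(*)                        AS PermitCount,
       MAX([Current])                  AS FlagViaMax,
       COUNT(NULLIF([Current], 0)) > 0 AS FlagViaCount
FROM tb_IssuedPermits
GROUP BY EmployeeName
ORDER BY EmployeeName
"""
permit_rows = conn.execute(query).fetchall()
for row in permit_rows:
    print(row)
# ('Person A', 3, 1, 1)
# ('Person B', 2, 1, 1)
# ('Person C', 1, 0, 0)
```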
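You can also do it like this. Note that SUM returns the number of rows with Current = 1 rather than a 0/1 flag, so it matches the expected output only when each employee has at most one such row:
SELECT EmployeeName,
Count(1) AS Count,
SUM(CAST([Current] AS INT)) AS [Current]
FROM tb_IssuedPermits
GROUP BY EmployeeName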

Count the number of appearances of char given a ID

I have to perform a query where I can count the number of distinct codes per Id.
|Id | Code
------------
| 1 | C
| 1 | I
| 2 | I
| 2 | C
| 2 | D
| 2 | D
| 3 | C
| 3 | I
| 3 | D
| 4 | I
| 4 | C
| 4 | C
The output should be something like:
|Id | Count | #Code C | #Code I | #Code D
-------------------------------------------
| 1 | 2 | 1 | 1 | 0
| 2 | 3 | 1 | 0 | 2
| 3 | 3 | 1 | 1 | 1
| 4 | 2 | 2 | 1 | 0
Can you give me some advice on this?
This answers the original version of the question.
You are looking for count(distinct):
select id, count(distinct code)
from t
group by id;
If the codes are limited to the ones provided, the following query gives the desired result.
select
    pvt.Id,
    codes.total As [Count],
    COALESCE(C, 0) AS [#Code C],
    COALESCE(I, 0) AS [#Code I],
    COALESCE(D, 0) AS [#Code D]
from
    ( select Id, Code, Count(code) cnt
      from t
      Group by Id, Code) s
    PIVOT(MAX(cnt) FOR Code IN ([C], [I], [D])) pvt
    join (select Id, count(distinct Code) total from t group by Id) codes on pvt.Id = codes.Id ;
Note: in the sample input data, code 'I' appears for every Id, yet the expected output in the question shows a count of zero for Id = 2.
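Here is the correct output for the sample data (DB Fiddle):
|Id | Count | #Code C | #Code I | #Code D
-------------------------------------------
| 1 | 2 | 1 | 1 | 0
| 2 | 3 | 1 | 1 | 2
| 3 | 3 | 1 | 1 | 1
| 4 | 2 | 2 | 1 | 0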
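PIVOT is SQL Server syntax. Where it is not available, the same per-code counts can be produced with conditional aggregation. The sketch below swaps in that technique and runs it through Python's sqlite3 module on the sample data from the question; the output column names (DistinctCodes, CodeC, CodeI, CodeD) are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Id INTEGER, Code TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "C"), (1, "I"), (2, "I"), (2, "C"), (2, "D"), (2, "D"),
     (3, "C"), (3, "I"), (3, "D"), (4, "I"), (4, "C"), (4, "C")],
)

# One SUM(CASE ...) per code replaces the PIVOT; COUNT(DISTINCT ...) gives
# the number of different codes seen for each Id.
query = """
SELECT Id,
       COUNT(DISTINCT Code)                        AS DistinctCodes,
       SUM(CASE WHEN Code = 'C' THEN 1 ELSE 0 END) AS CodeC,
       SUM(CASE WHEN Code = 'I' THEN 1 ELSE 0 END) AS CodeI,
       SUM(CASE WHEN Code = 'D' THEN 1 ELSE 0 END) AS CodeD
FROM t
GROUP BY Id
ORDER BY Id
"""
code_counts = conn.execute(query).fetchall()
for row in code_counts:
    print(row)
# (1, 2, 1, 1, 0)
# (2, 3, 1, 1, 2)
# (3, 3, 1, 1, 1)
# (4, 2, 2, 1, 0)
```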

SQL Order By with case when

I am trying to understand the ORDER BY with CASE WHEN.
My aim is to understand it fundamentally, so I created several use cases.
My base table is as below:
| Name |
|--------|
| BPM |
| BXR |
| Others |
| XZA |
| XYZ |
| PQR |
| ABC |
Query 1: Basic ORDER BY
SELECT *
FROM City
ORDER BY Name
Query 1 Result: Gave the correct output, shown below (Name column sorted in ascending order).
| Name |
|--------|
| ABC |
| BPM |
| BXR |
| Others |
| PQR |
| XYZ |
| XZA |
Query 2: I want Others last
SELECT *
FROM City
ORDER BY CASE
WHEN Name = 'Others' THEN 1
ELSE 0
END
Query 2 Result: I got a partially correct result. I got Others last, but I expected the other names to be in ascending order. They actually appear in the order they have in the base table.
| Name |
|--------|
| BPM |
| BXR |
| XZA |
| XYZ |
| PQR |
| ABC |
| Others |
I also don't understand what 0 and 1 actually mean in the ORDER BY clause.
Query 3: I want BXR and Others last.
SELECT *
FROM City
ORDER BY CASE
WHEN Name = 'BXR' THEN 1
WHEN Name = 'Others' THEN 2
ELSE 0
END
Query 3 Result: I got a partially correct result. I got 'BXR' and 'Others' last, but the other names are not in alphabetical order, same as in Query 2. Here too I don't understand the significance of 0, 1, 2.
| Name |
|--------|
| BPM |
| XZA |
| XYZ |
| PQR |
| ABC |
| BXR |
| Others |
Query 4: I want Others and PQR at top.
SELECT *
FROM City
ORDER BY CASE
WHEN Name = 'PQR' THEN 0
WHEN Name = 'Others' THEN 1
ELSE 2
END
Query 4 Result: I get PQR and Others at the top, but the remaining names are not in alphabetical order.
| Name |
|--------|
| PQR |
| Others |
| BPM |
| BXR |
| XZA |
| XYZ |
| ABC |
My assumption about 0, 1, 2 is that they are just numbers deciding the "order" in which a record should appear.
(The record with 0 should come first, and if all other records have 1, those should be sorted alphabetically.)
(If there are 0, 1, and 2, the record with 0 should come first, the record with 1 second, and all other records, having 2, should be sorted alphabetically.)
Correct me if I am wrong about this.
SQLFiddle
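You also need to add name to the ORDER BY. The CASE expression only ranks the rows into groups (0, 1, 2); rows with the same rank come back in no guaranteed order unless you add a second sort key.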
DEMO
SELECT *
FROM City
ORDER BY CASE
           WHEN Name = 'PQR' THEN 0
           WHEN Name = 'Others' THEN 1
           ELSE 2
         END, Name
OUTPUT:
**Name**
PQR
Others
ABC
BPM
BXR
XYZ
XZA
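One way to see what the 0, 1, 2 mean: they are simply the first component of a two-part sort key, and Name is the second. The sketch below shows the same ordering twice, once as a plain Python sort on a (rank, name) tuple and once as SQL run through Python's sqlite3 module. The rank function is a hypothetical helper written for this example.

```python
import sqlite3

names = ["BPM", "BXR", "Others", "XZA", "XYZ", "PQR", "ABC"]


def rank(name):
    """Mirror the CASE expression: smaller numbers sort first."""
    if name == "PQR":
        return 0
    if name == "Others":
        return 1
    return 2


# Sorting by (rank, name) is the Python analogue of ORDER BY CASE ... END, Name:
# ties on the rank are broken alphabetically by the name.
in_python = sorted(names, key=lambda n: (rank(n), n))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE City (Name TEXT)")
conn.executemany("INSERT INTO City VALUES (?)", [(n,) for n in names])
in_sql = [
    row[0]
    for row in conn.execute(
        """
        SELECT Name
        FROM City
        ORDER BY CASE
                   WHEN Name = 'PQR' THEN 0
                   WHEN Name = 'Others' THEN 1
                   ELSE 2
                 END, Name
        """
    )
]

print(in_python)  # ['PQR', 'Others', 'ABC', 'BPM', 'BXR', 'XYZ', 'XZA']
print(in_sql)     # same order
```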
In MySQL, we can also ORDER BY using FIELD:
SELECT *
FROM City
ORDER BY FIELD(Name, 'Others', 'PQR') DESC, name;
Demo
The behavior of FIELD is such that it will return 1 for Others, 2 for PQR, and 0 for any other name. So, we use a descending order to ensure that PQR appears first, followed by Others, followed by all other names.
You may keep the Name column in the ELSE branch:
SELECT *
FROM City
ORDER BY CASE
WHEN Name = 'PQR' THEN 0
WHEN Name = 'Others' THEN 1
ELSE Name
END
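This relies on digits sorting before letters once the CASE result is compared as a string, which is how MySQL treats a CASE that mixes numbers and strings. On a database that instead converts every branch to a number (SQL Server, for example), the ELSE Name branch raises a conversion error.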
SQL Fiddle Demo

SQL select multiple values present in multiple columns

I have two tables, DiagnosisCodes and DiagnosisConditions, as shown below. I need to find the members (IDs) who have both Hypertension and Diabetes. The problem here is that the diagnosis codes are spread across 10 columns. How do I check whether a member qualifies for both conditions?
DiagnosisCodes
+----+-------+-------+-------+-----+--------+
| ID | Diag1 | Diag2 | Diag3 | ... | Diag10 |
+----+-------+-------+-------+-----+--------+
| A | 2502 | 2593 | NULL | ... | NULL |
| B | 2F93 | 2509 | 2593 | ... | NULL |
| C | C257 | 2509 | C6375 | ... | NULL |
+----+-------+-------+-------+-----+--------+
DiagnosisConditions
+------+--------------+
| Code | Condition |
+------+--------------+
| 2502 | Hypertension |
| 2593 | Diabetes |
| 2509 | Diabetes |
| 2F93 | Hypertension |
| 2673 | HeartFailure |
+------+--------------+
Expected Result
+---------+
| Members |
+---------+
| A |
| B |
+---------+
How do I write a query that checks for multiple values that may be present in multiple columns? Would you suggest using EXISTS?
SELECT DISTINCT id
FROM diagnosiscodes
WHERE ( diag1, diag2...diag10 ) IN (SELECT code
FROM diagnosiscondition
WHERE condition IN ( 'Hypertension','Diabetes' )
)
I would do this using group by and having:
select dc.id
from diagnosiscodes dc join
     diagnosisconditions dcon
     on dcon.code in (dc.diag1, dc.diag2, . . . )
group by dc.id
having sum(case when dcon.condition = 'Diabetes' then 1 else 0 end) > 0 and
       sum(case when dcon.condition = 'Hypertension' then 1 else 0 end) > 0;
Then, you should fix your data structure. Having separate columns with the same information distinguished by a number is usually a sign of a poor data structure. You should have a table, called something like PatientDiagnoses, with one row per patient and diagnosis.
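A minimal sketch of what that could look like, run through Python's sqlite3 module with the sample data from the question. The column names (Id, Code, ConditionName) are assumptions for illustration. With one row per patient and diagnosis, "has both conditions" becomes a join plus HAVING COUNT(DISTINCT ...) = 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical normalized layout: one row per patient and diagnosis.
    CREATE TABLE PatientDiagnoses (Id TEXT, Code TEXT);
    CREATE TABLE DiagnosisConditions (Code TEXT, ConditionName TEXT);
    """
)
conn.executemany(
    "INSERT INTO PatientDiagnoses VALUES (?, ?)",
    [("A", "2502"), ("A", "2593"),
     ("B", "2F93"), ("B", "2509"), ("B", "2593"),
     ("C", "C257"), ("C", "2509"), ("C", "C6375")],
)
conn.executemany(
    "INSERT INTO DiagnosisConditions VALUES (?, ?)",
    [("2502", "Hypertension"), ("2593", "Diabetes"), ("2509", "Diabetes"),
     ("2F93", "Hypertension"), ("2673", "HeartFailure")],
)

# Keep only the two conditions of interest, then require that a member
# has both of them (two distinct condition names).
query = """
SELECT pd.Id
FROM PatientDiagnoses pd
JOIN DiagnosisConditions dc ON dc.Code = pd.Code
WHERE dc.ConditionName IN ('Hypertension', 'Diabetes')
GROUP BY pd.Id
HAVING COUNT(DISTINCT dc.ConditionName) = 2
ORDER BY pd.Id
"""
members = [row[0] for row in conn.execute(query)]
print(members)  # ['A', 'B']
```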
Here is one way, by unpivoting the data and then requiring both conditions per member:
SELECT dc.id
FROM diagnosiscodes dc
CROSS APPLY (VALUES (Diag1),(Diag2),..(Diag10)) tc(Diag)
JOIN diagnosisconditions con ON con.code = tc.Diag
WHERE con.condition IN ('Hypertension', 'Diabetes')
GROUP BY dc.id
HAVING COUNT(DISTINCT con.condition) = 2
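CROSS APPLY is SQL Server syntax. Where it is not available, the same unpivot can be written with UNION ALL. The sketch below swaps in that technique and runs it through Python's sqlite3 module. It is a simplification under stated assumptions: only Diag1 to Diag3 are modelled (the real table has ten such columns, each needing its own UNION ALL branch), and the column name ConditionName is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified wide table: only three of the ten Diag columns.
    CREATE TABLE DiagnosisCodes (Id TEXT, Diag1 TEXT, Diag2 TEXT, Diag3 TEXT);
    CREATE TABLE DiagnosisConditions (Code TEXT, ConditionName TEXT);
    INSERT INTO DiagnosisCodes VALUES
        ('A', '2502', '2593', NULL),
        ('B', '2F93', '2509', '2593'),
        ('C', 'C257', '2509', 'C6375');
    INSERT INTO DiagnosisConditions VALUES
        ('2502', 'Hypertension'), ('2593', 'Diabetes'), ('2509', 'Diabetes'),
        ('2F93', 'Hypertension'), ('2673', 'HeartFailure');
    """
)

# Each UNION ALL branch turns one Diag column into (Id, Code) rows;
# NULL codes simply fail the join and drop out.
query = """
WITH unpivoted AS (
    SELECT Id, Diag1 AS Code FROM DiagnosisCodes
    UNION ALL
    SELECT Id, Diag2 FROM DiagnosisCodes
    UNION ALL
    SELECT Id, Diag3 FROM DiagnosisCodes
)
SELECT u.Id
FROM unpivoted u
JOIN DiagnosisConditions dc ON dc.Code = u.Code
WHERE dc.ConditionName IN ('Hypertension', 'Diabetes')
GROUP BY u.Id
HAVING COUNT(DISTINCT dc.ConditionName) = 2
ORDER BY u.Id
"""
qualifying = [row[0] for row in conn.execute(query)]
print(qualifying)  # ['A', 'B']
```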

How to Order by enum in Oracle DB

I want to order by a string column where that column is an enumeration. For example:
+----+--------+----------------------+
| ID | NAME | STATUS |
+----+--------+----------------------+
| 1 | Serdar | ACTIVE |
| 2 | John | DEACTIVE |
| 3 | Jerry | WAITING_FOR_APPROVAL |
| 4 | Jessie | REJECTED |
+----+--------+----------------------+
I want to order by STATUS. It should sort the results such that the first result must have STATUS = WAITING_FOR_APPROVAL, then ACTIVE, then DEACTIVE and then REJECTED.
Is there any way to do that in SQL? Is there something like Comparator in Java?
You can enumerate the values in a CASE expression and order by that:
SELECT id, name, status
FROM your_table
ORDER BY (CASE status
            WHEN 'WAITING_FOR_APPROVAL' THEN 1
            WHEN 'ACTIVE' THEN 2
            WHEN 'DEACTIVE' THEN 3
            WHEN 'REJECTED' THEN 4
            ELSE 5
          END)
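To see the CASE ordering in action, and how it compares with a Comparator-style sort in application code, here is a small sketch using Python's sqlite3 module as a stand-in for Oracle. The STATUS_RANK dictionary is a hypothetical helper for this example; it plays the role the CASE expression plays in the query.

```python
import sqlite3

people = [
    (1, "Serdar", "ACTIVE"),
    (2, "John", "DEACTIVE"),
    (3, "Jerry", "WAITING_FOR_APPROVAL"),
    (4, "Jessie", "REJECTED"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER, name TEXT, status TEXT)")
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", people)

# The CASE expression maps each status to its position in the desired order.
ordered_in_sql = conn.execute(
    """
    SELECT id, name, status
    FROM your_table
    ORDER BY CASE status
               WHEN 'WAITING_FOR_APPROVAL' THEN 1
               WHEN 'ACTIVE' THEN 2
               WHEN 'DEACTIVE' THEN 3
               WHEN 'REJECTED' THEN 4
               ELSE 5
             END
    """
).fetchall()

# The application-side equivalent: a lookup table used as the sort key,
# with 5 as the fallback for any unknown status.
STATUS_RANK = {"WAITING_FOR_APPROVAL": 1, "ACTIVE": 2, "DEACTIVE": 3, "REJECTED": 4}
ordered_in_python = sorted(people, key=lambda row: STATUS_RANK.get(row[2], 5))

print([row[0] for row in ordered_in_sql])     # [3, 1, 2, 4]
print([row[0] for row in ordered_in_python])  # [3, 1, 2, 4]
```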