SQL A Union B result different from A+B?


I was using UNION to merge two SQL queries, but the result differs from what I get by running the two queries separately.
Below is my SQL code:
(SELECT CONCAT(Name, '(', LEFT(Occupation, 1), ')')
FROM OCCUPATIONS
ORDER BY Name ASC)
UNION
(SELECT CONCAT('There are a total of ', COUNT(Occupation), ' ',
LOWER(Occupation), 's.')
FROM OCCUPATIONS
GROUP BY Occupation
ORDER BY Occupation ASC)
If I only run the first half of the query, I get the following result:
Aamina(D)
Ashley(P)
Belvet(P)
Britney(P)
Christeen(S)
Eve(A)
Jane(S)
Jennifer(A)
Jenny(S)
Julia(D)
Ketty(A)
Kristeen(S)
Maria(P)
Meera(P)
Naomi(P)
Priya(D)
Priyanka(P)
Samantha(A)
If I only run the second half of the query, I get the following result:
There are a total of 4 actors.
There are a total of 3 doctors.
There are a total of 7 professors.
There are a total of 4 singers.
Both results above are in the expected order. However, if I run the whole query, I get the following result:
Ashley(P)
Samantha(A)
Julia(D)
Britney(P)
Maria(P)
Meera(P)
Priya(D)
Priyanka(P)
Jennifer(A)
Ketty(A)
Belvet(P)
Naomi(P)
Jane(S)
Jenny(S)
Kristeen(S)
Christeen(S)
Eve(A)
Aamina(D)
There are a total of 4 actors.
There are a total of 3 doctors.
There are a total of 7 professors.
There are a total of 4 singers.
As you may notice, the order of the first half is screwed. Does anyone know why?
How is UNION different from running two separate SQL queries? Thanks!

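The order is not "screwed". You have an ORDER BY only on the subqueries, not on the overall query, and the ordering of a subquery is not preserved in the combined result. MySQL, for example, may simply ignore an ORDER BY inside a parenthesized SELECT that has no LIMIT. Also note that you are using UNION, which removes duplicates; UNION ALL keeps every row.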
The safe way to execute this query is:
select str
from ((select concat(Name, '(', LEFT(Occupation, 1), ')') as str, 1 as which
       from OCCUPATIONS
      )
      union all
      (select concat('There are a total of ', COUNT(Occupation), ' ',
                     lower(Occupation), 's.') as str, 2 as which
       from OCCUPATIONS
       group by occupation
      )
     ) o
order by which, str
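The following is a minimal Python sketch of the same pattern, using the standard-library sqlite3 module and a few made-up OCCUPATIONS rows. SQLite is only a stand-in here. Older versions lack CONCAT() and LEFT(), and SQLite does not accept parenthesized SELECTs inside a UNION, so the sketch uses ||, substr() and a plain derived table instead. It also carries an extra, hypothetical sort_key column. Ordering by str alone sorts the summary lines by their text, which means by count first ("3 doctors" before "4 actors"). The question's second half was ordered by occupation instead.

```python
import sqlite3

# In-memory stand-in for OCCUPATIONS; these rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OCCUPATIONS (Name TEXT, Occupation TEXT)")
conn.executemany(
    "INSERT INTO OCCUPATIONS VALUES (?, ?)",
    [("Samantha", "Actor"), ("Julia", "Doctor"), ("Ashley", "Professor"),
     ("Eve", "Actor"), ("Jane", "Singer"), ("Aamina", "Doctor")],
)

# `which` tags each branch of the UNION ALL; `sort_key` (hypothetical) orders
# rows inside each branch. Only the single outer ORDER BY determines the
# order of the final result.
QUERY = """
SELECT str
FROM (
    SELECT Name || '(' || substr(Occupation, 1, 1) || ')' AS str,
           1 AS which,
           Name AS sort_key
    FROM OCCUPATIONS
    UNION ALL
    SELECT 'There are a total of ' || COUNT(Occupation) || ' '
               || lower(Occupation) || 's.',
           2,
           Occupation
    FROM OCCUPATIONS
    GROUP BY Occupation
) AS o
ORDER BY which, sort_key
"""

rows = [r[0] for r in conn.execute(QUERY)]
for line in rows:
    print(line)
```

The constant `which` column keeps the two halves apart. Whatever the database does internally to combine the branches, the outer ORDER BY is the only ordering the final result is guaranteed to follow.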

Related

sum of two columns assigned to a condition

Hi, I'm trying to get the total of two columns under an alias and then filter on it, but I get an error on the 'Rebounds' alias on line 3.
The offreb and defreb columns are integers, and some values are stored as 0 (zero).
SELECT team, CONCAT(firstname,' ',lastname) AS name, SUM(offreb + defreb) AS Rebounds
FROM boxscore
WHERE round = 'Finals' AND game = 7 AND Rebounds > 0
ORDER BY team, Rebounds;

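You are filtering in the WHERE clause on a column that has not been calculated yet. WHERE is evaluated before the SELECT list and the aggregation, so the alias Rebounds does not exist at that point. You can use a subquery or HAVING instead. The query also needs a GROUP BY for the non-aggregated columns.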
It should be something like this:
SELECT team, CONCAT(firstname,' ',lastname) AS name, SUM(offreb + defreb) AS Rebounds
FROM boxscore
WHERE round = 'Finals' AND game = 7
GROUP BY team, CONCAT(firstname,' ',lastname)
HAVING SUM(offreb + defreb) > 0
ORDER BY team, Rebounds;
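Here is a minimal sketch of the WHERE-versus-HAVING split. It runs against SQLite through Python's sqlite3 module, using a few made-up boxscore rows; the column names follow the question, but the data is invented. CONCAT() is replaced by ||, which older SQLite versions require.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE boxscore (team TEXT, firstname TEXT, lastname TEXT, "
    "round TEXT, game INTEGER, offreb INTEGER, defreb INTEGER)"
)
# Made-up rows: two stat lines for one player, a zero-rebound player,
# and a game-6 row that the WHERE clause should drop.
conn.executemany(
    "INSERT INTO boxscore VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("BOS", "Ann", "Lee", "Finals", 7, 2, 5),
        ("BOS", "Ann", "Lee", "Finals", 7, 1, 0),
        ("BOS", "Bob", "Ray", "Finals", 7, 0, 0),
        ("LAL", "Cy", "Moe", "Finals", 7, 3, 4),
        ("LAL", "Cy", "Moe", "Finals", 6, 9, 9),
    ],
)

QUERY = """
SELECT team,
       firstname || ' ' || lastname AS name,
       SUM(offreb + defreb) AS Rebounds
FROM boxscore
WHERE round = 'Finals' AND game = 7        -- row filter, applied before grouping
GROUP BY team, firstname || ' ' || lastname
HAVING SUM(offreb + defreb) > 0            -- group filter, applied after SUM()
ORDER BY team, Rebounds
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

The game-6 row never reaches the aggregation because WHERE drops it first. Bob's group is computed but then removed by HAVING, because its sum is 0.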

Here, using a HAVING clause solves your issue.
If a table has been grouped using GROUP BY, but only certain groups
are of interest, the HAVING clause can be used, much like a WHERE
clause, to eliminate groups from the result.
(Source: official PostgreSQL documentation)
SELECT
team,
CONCAT(firstname,' ',lastname) AS name,
SUM(offreb + defreb) AS "Rebounds"
FROM
boxscore
WHERE
round = 'Finals' AND game = 7
GROUP BY
team,
CONCAT(firstname,' ',lastname)
HAVING
SUM(offreb + defreb) > 0
ORDER BY
team, "Rebounds";
Note that you cannot use a column alias in the WHERE clause (or in HAVING), but you can use it in the ORDER BY clause; PostgreSQL also accepts it in GROUP BY. Wrap the alias in double quotes to preserve its case.

How to use regexp_replace() with GROUP BY clause in presto query

I am trying to retrieve records based on a custom field "ci_ku". For the same value of "ci_ku" there are multiple "l1m_visits" values, and I want to retrieve the minimum value of "l1m_visits" for each "ci_ku".
Sample Data:
ku              ci_ku      l1m_visits
1234-5678-HIJK  1234-HIJK  A
1234-9012-HIJK  1234-HIJK  B
Expected Output:
ku              ci_ku      l1m_visits
1234-5678-HIJK  1234-HIJK  A
I have tried the query below:
SELECT DISTINCT REGEXP_REPLACE(ku, CONCAT('-',CAST(v_nbr AS varchar)), '') AS ci_ku,
ku,
MIN(l1m_visits),
last_refresh_date
FROM db.schema.table
GROUP BY ci_ku;
and I get the following error:
line 1:194: Column 'ci_ku' cannot be resolved

That error is raised because the alias "ci_ku" has not been generated yet when the GROUP BY clause is evaluated. There are also some other issues in your query:
- not all non-aggregated columns appear in the GROUP BY clause ("ku" and "last_refresh_date" would need to be included)
- the DISTINCT keyword removes duplicate rows, but none are left after grouping, so it is redundant.
Instead of aggregation, you can use the ROW_NUMBER window function, which also lets you keep the other columns of the row that holds the minimum. It numbers the rows within each "ci_ku" value (PARTITION BY ci_ku) in order of "l1m_visits" (ORDER BY l1m_visits). The row numbered 1 therefore holds the lowest "l1m_visits" for each "ci_ku".
WITH tab_with_ci_ku AS (
    SELECT REGEXP_REPLACE(ku, CONCAT('-', CAST(v_nbr AS varchar)), '') AS ci_ku,
           ku,
           l1m_visits,
           last_refresh_date
    FROM db.schema.table
), ranked_visits AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY ci_ku ORDER BY l1m_visits) AS rn
    FROM tab_with_ci_ku
)
SELECT ku,
       ci_ku,
       l1m_visits
FROM ranked_visits
WHERE rn = 1
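You can try the ROW_NUMBER approach locally with SQLite (3.25 or later, for window functions) through Python's sqlite3 module. This is a sketch under several assumptions:
- SQLite has no REGEXP_REPLACE, so the plain replace() function stands in for it. That works here only because the removed piece is a literal '-<v_nbr>'.
- The table name visits and the v_nbr values are hypothetical.
- The second ci_ku group is made up to show the partitioning.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; v_nbr is the piece that gets stripped out of ku.
conn.execute("CREATE TABLE visits (ku TEXT, v_nbr INTEGER, l1m_visits TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [
        ("1234-5678-HIJK", 5678, "A"),   # rows from the question's sample
        ("1234-9012-HIJK", 9012, "B"),
        ("5555-1111-ABCD", 1111, "D"),   # made-up second group
        ("5555-2222-ABCD", 2222, "C"),
    ],
)

QUERY = """
WITH tab_with_ci_ku AS (
    -- replace() stands in for Presto's REGEXP_REPLACE (literal pattern only)
    SELECT replace(ku, '-' || CAST(v_nbr AS TEXT), '') AS ci_ku,
           ku,
           l1m_visits
    FROM visits
), ranked_visits AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY ci_ku ORDER BY l1m_visits) AS rn
    FROM tab_with_ci_ku
)
SELECT ku, ci_ku, l1m_visits
FROM ranked_visits
WHERE rn = 1
ORDER BY ci_ku
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

Each ci_ku keeps exactly one row: the one with the smallest l1m_visits, together with its original ku.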
If you're using PostgreSQL (version 13 or later), you can also use the FETCH FIRST n ROWS WITH TIES clause. It returns the first n rows plus every row that ties with the last of them in the sort order. Each "ci_ku" has exactly one row with row number 1, so ordering by that row number and fetching the first row WITH TIES returns the row numbered 1 for every "ci_ku":
WITH tab_with_ci_ku AS (
    SELECT REGEXP_REPLACE(ku, CONCAT('-', CAST(v_nbr AS varchar)), '') AS ci_ku,
           ku,
           l1m_visits,
           last_refresh_date
    FROM db.schema.table
)
SELECT ku,
       ci_ku,
       l1m_visits
FROM tab_with_ci_ku
ORDER BY ROW_NUMBER() OVER (PARTITION BY ci_ku ORDER BY l1m_visits)
FETCH FIRST 1 ROWS WITH TIES;

How to combine 2 result sets

To solve the question below, I want to combine two result sets (Query #1 + Query #2) using SQL.
I've tried UNION but it didn't work. Help!
Question:
Sample Data:
Query #1:
SELECT
COUNT(student_id),
MAX(registration_date),
MAX(lab1)
FROM grades;
Query #1 result:
Query #2:
SELECT
MAX(SUM(NVL(lab1, 0) + NVL(lab2, 0)))
FROM grades
GROUP BY student_id;
Query #2 result:

You had a few decent attempts. If you could edit your post and supply some sample data, it would help show us what the underlying data looks like. Are there multiple rows per lab1 and/or lab2? Are multiple classes represented? You want a sum, but you are taking a max?
You need a GROUP BY per student so that each student shows up on their own row, and you had that. You also had both MAX() and SUM(). Hopefully this is close to what you need:
SELECT
g.student_id,
max( g.registration_date ) NewestRegistrationDate,
max( coalesce( g.lab1, 0 )) HighestLab1,
max( coalesce( g.lab2, 0 )) HighestLab2,
max( coalesce( g.lab1, 0 ))
+ max( coalesce( g.lab2, 0 )) HighestBothLabs
FROM
grades g
group by
g.student_id
The above at least gives you the per-student basis that your grid image represents. It is not the final answer, which apparently wants the total number of students and the highest grade overall. For that one you WOULD want the COUNT(), and you would drop the GROUP BY because you want the answer over the entire set. But do you want the highest score in each lab category, or the highest per-student total across both labs?
Example:
Student  Lab1  Lab2  Total
1        75    92    167
2        74    100   174
Here you have two students. The first has the highest lab1, but the second has the highest total. You need to decide which one you want, or at least get clarification from the teacher. If you want the highest by lab 1, you probably need to wrap the first query inside the second, using one result as the basis for the outer query. That may be beyond what you are learning so far, but it looks like this:
select
count(*) TotalStudents,
Max( PerStudent.NewestRegistrationDate ) MostRecentRegistrationDate,
Max( PerStudent.HighestLab1 ) HighestFirstLab,
Max( PerStudent.HighestBothLabs ) HighestAllLabs
from
( SELECT
g.student_id,
max( g.registration_date ) NewestRegistrationDate,
max( coalesce( g.lab1, 0 )) HighestLab1,
max( coalesce( g.lab2, 0 )) HighestLab2,
max( coalesce( g.lab1, 0 ))
+ max( coalesce( g.lab2, 0 )) HighestBothLabs
FROM
grades g
group by
g.student_id ) PerStudent
The inner pre-query (alias PerStudent) keeps the data at the per-student level. That gives the OUTER query, which does NOT have a GROUP BY, the basis for getting the highest values. Still, run the inner query on its own first to confirm the baseline data. If that is not correct, the outer query won't make sense either.
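Here is a small sketch of the "aggregate the per-student aggregates" pattern, using SQLite through Python's sqlite3 module. The grades rows reuse the two students from the example table above, plus a made-up third student with a missing lab2 score to exercise COALESCE. The registration dates are invented. The inner query sums both labs per student, which matches the Total column in the example table. The outer query has no GROUP BY, so it summarizes over all students.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE grades (student_id INTEGER, registration_date TEXT, "
    "lab1 INTEGER, lab2 INTEGER)"
)
conn.executemany(
    "INSERT INTO grades VALUES (?, ?, ?, ?)",
    [
        (1, "2023-01-10", 75, 92),    # students 1 and 2 from the example table
        (2, "2023-02-01", 74, 100),
        (3, "2023-01-20", 80, None),  # made-up student with a missing lab2
    ],
)

QUERY = """
-- outer query: no GROUP BY, so it summarizes over all per-student rows
SELECT COUNT(*)                          AS TotalStudents,
       MAX(PerStudent.LatestRegistration) AS MostRecentRegistrationDate,
       MAX(PerStudent.HighestLab1)        AS HighestFirstLab,
       MAX(PerStudent.BothLabs)           AS HighestBothLabs
FROM (
    -- inner query: one row per student
    SELECT g.student_id,
           MAX(g.registration_date)                       AS LatestRegistration,
           MAX(COALESCE(g.lab1, 0))                       AS HighestLab1,
           SUM(COALESCE(g.lab1, 0) + COALESCE(g.lab2, 0)) AS BothLabs
    FROM grades g
    GROUP BY g.student_id
) AS PerStudent
"""

summary = conn.execute(QUERY).fetchone()
print(summary)
```

Student 1 has the best lab 1 among the two original students, but student 2 has the best total (174). The made-up student 3 shows that a NULL lab2 counts as 0 rather than wiping out the sum.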

Thanks to DRapp, I solved it with the code below.
SELECT
COUNT(*) TotalStudents,
MAX(PerStudent.MostRecentRegistrationDate) MostRecentRegistrationDate,
MAX(PerStudent.HighestFirstLab) HighestFirstLab,
MAX(PerStudent.SumBothLabs) HighestBothLabs
FROM
(SELECT
MAX(g.registration_date) MostRecentRegistrationDate,
MAX(coalesce(g.lab1, 0)) HighestFirstLab,
SUM(coalesce(g.lab1, 0) + coalesce(g.lab2, 0)) SumBothLabs
FROM grades g
GROUP BY g.student_id) PerStudent;

Teradata Show people who have over 10 occurrences on the list

I need some help figuring this one out. See the Teradata query below. I am only trying to show the people (l_name, f_name) who have 10 or more occurrences of calls under 60 seconds. For example, if a person has 9 occurrences of calls under 60 seconds, none of those records should appear in the results. However, if they have 11 occurrences, all 11 records should appear in the results.
select
group_name, device_id as record, starttime, length, csr_id,
f_name, l_name, sum(score), sum(poss_score), Manager,
case
when gro.name in ('Group A') then 'Group 1'
when gro.name in ('Group B') then 'Group 2'
when gro.name in ('Group c') then 'Group 3'
else gro.name
end as Group_Name
from
rep_voice re
left join
qa_complete com on re.record_ck = com.record_ck
join
users_groups us on us.user_ck = re.user_ck and us.current = 1
where
cast(re.starttime as date) between TRUNC((CURRENT_DATE-7)) and LAST_DAY((CURRENT_DATE-1))
and duration<= 60
and name in ('ABCD','EFGH','IJKL','LMNO')
qualify rank () over (partition by us.user_ck order by cast (us.modifiedon as date) desc) =1
group by
1,2,3,4,5,6,7,9,us.user_ck,us.modifiedon

I am not fully familiar with your schema layout (just as Gordon mentioned), but I think what you are looking for is this:
select * from call_table
qualify count(*) over (partition by user_caller)>=10
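SQLite has no QUALIFY clause, so this sketch shows the equivalent form that most databases accept: compute the window count in a derived table and filter on it in the outer query. It runs through Python's sqlite3 module. The table call_table and its columns user_caller and duration are hypothetical stand-ins, as in the answer above. The rows are made up: caller A has 10 short calls and caller B has only 9.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE call_table (user_caller TEXT, duration INTEGER)")
# Made-up data: caller A has 10 short calls (plus one long one),
# caller B has only 9 short calls (plus one long one).
calls = [("A", 30)] * 10 + [("A", 120)] + [("B", 45)] * 9 + [("B", 300)]
conn.executemany("INSERT INTO call_table VALUES (?, ?)", calls)

# Equivalent of Teradata's
#   QUALIFY COUNT(*) OVER (PARTITION BY user_caller) >= 10
# written as a derived table plus an outer WHERE.
QUERY = """
SELECT user_caller, duration
FROM (
    SELECT user_caller,
           duration,
           COUNT(*) OVER (PARTITION BY user_caller) AS short_calls
    FROM call_table
    WHERE duration <= 60
) AS c
WHERE short_calls >= 10
"""

rows = conn.execute(QUERY).fetchall()
print(len(rows), {caller for caller, _ in rows})
```

The inner WHERE runs before the window count, so only short calls are counted. Every qualifying call is still returned individually, which is the "all 11 records" behaviour the question asks for.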

SQL unique combinations

I have a table with three columns: an ID, a therapeutic class, and a generic name. A therapeutic class can be mapped to multiple generic names.
ID therapeutic_class generic_name
1 YG4 insulin
1 CJ6 maleate
1 MG9 glargine
2 C4C diaoxy
2 KR3 supplies
3 YG4 insulin
3 CJ6 maleate
3 MG9 glargine
First I need to look at each patient's combination of therapeutic class and generic name pairs, and then count how many patients share the same combination. I want the output to have three columns: the count of patients with the combination, the combination of generic names, and the combination of therapeutic classes, like this:
Count Combination_generic combination_therapeutic
2 insulin, maleate, glargine YG4, CJ6, MG9
1 supplies, diaoxy C4C, KR3

One way to match patients by the sets of pairs (therapeutic_class, generic_name) is to create the comma-separated strings in your desired output, and to group by them and count. To do this right, you need a way to identify the pairs. See my Comment under the original question and my Comments to Gordon's Answer to understand some of the issues.
I do this identification as preliminary work in the solution below. As I mentioned in my Comment, it would be better if the pairs and their unique IDs already existed in your data model; here I create them on the fly.
Important note: This assumes the comma-separated lists don't become too long. The limit is 4000 bytes, or approx. 32000 in Oracle 12 with certain options turned on. Beyond that you CAN aggregate the strings into CLOBs, but you CAN'T GROUP BY CLOBs (in general, not just in this case), so this approach will fail. A more robust approach is to match the sets of pairs, not some aggregation of them. That solution is more complicated, and I will not cover it unless your problem needs it.
with
-- Begin simulated data (not part of the solution)
test_data ( id, therapeutic_class, generic_name ) as (
select 1, 'GY6', 'insulin' from dual union all
select 1, 'MH4', 'maleate' from dual union all
select 1, 'KJ*', 'glargine' from dual union all
select 2, 'GY6', 'supplies' from dual union all
select 2, 'C4C', 'diaoxy' from dual union all
select 3, 'GY6', 'insulin' from dual union all
select 3, 'MH4', 'maleate' from dual union all
select 3, 'KJ*', 'glargine' from dual
),
-- End of simulated data (for testing purposes only).
-- SQL query solution continues BELOW THIS LINE
valid_pairs ( pair_id, therapeutic_class, generic_name ) as (
select rownum, therapeutic_class, generic_name
from (
select distinct therapeutic_class, generic_name
from test_data
)
),
first_agg ( id, tc_list, gn_list ) as (
select t.id,
listagg(p.therapeutic_class, ',') within group (order by p.pair_id),
listagg(p.generic_name , ',') within group (order by p.pair_id)
from test_data t join valid_pairs p
on t.therapeutic_class = p.therapeutic_class
and t.generic_name = p.generic_name
group by t.id
)
select count(*) as cnt, tc_list, gn_list
from first_agg
group by tc_list, gn_list
;
Output:
CNT TC_LIST GN_LIST
--- ------------------ ------------------------------
1 GY6,C4C supplies,diaoxy
2 GY6,KJ*,MH4 insulin,glargine,maleate
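The "more robust" idea of matching the sets of pairs themselves can be sketched in plain Python, independent of Oracle. The steps are:
1. Collect each ID's (therapeutic_class, generic_name) pairs.
2. Sort the pairs as units to get a canonical key.
3. Count identical keys.

Sorting whole pairs, rather than each column separately, keeps the two output lists aligned pair by pair. The rows follow the question's sample, with ID 3's generic spelled "insulin" as the expected count of 2 implies. The function name count_combinations is hypothetical.

```python
from collections import Counter, defaultdict

# (id, therapeutic_class, generic_name), following the question's sample
rows = [
    (1, "YG4", "insulin"), (1, "CJ6", "maleate"), (1, "MG9", "glargine"),
    (2, "C4C", "diaoxy"), (2, "KR3", "supplies"),
    (3, "YG4", "insulin"), (3, "CJ6", "maleate"), (3, "MG9", "glargine"),
]


def count_combinations(rows):
    """Count IDs that share exactly the same set of (class, generic) pairs."""
    pairs_by_id = defaultdict(set)
    for patient_id, t_class, generic in rows:
        pairs_by_id[patient_id].add((t_class, generic))

    # Sort whole pairs (not each column on its own) so the canonical key is
    # the same for equal sets and both output lists stay aligned pair by pair.
    counts = Counter(tuple(sorted(pairs)) for pairs in pairs_by_id.values())

    result = []
    for key, cnt in counts.items():
        generics = ", ".join(g for _, g in key)
        classes = ", ".join(t for t, _ in key)
        result.append((cnt, generics, classes))
    return sorted(result, reverse=True)


for line in count_combinations(rows):
    print(line)
```

Because the key is a tuple of pairs rather than a concatenated string, it has no length limit comparable to the 4000-byte LISTAGG issue.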

You are looking for listagg() followed by another aggregation. I think something like:
select therapeutics, generics, count(*)
from (select id, listagg(therapeutic_class, ', ') within group (order by therapeutic_class) as therapeutics,
listagg(generic_name, ', ') within group (order by generic_name) as generics
from t
group by id
) t
group by therapeutics, generics;
