Postgresql: Query using multiple attributes - sql

I'm trying to create a query with Postgresql. Unfortunately, the attributes in the attribute_table are listed as rows instead of columns which makes it harder to pull. I want to pull a count based on the three attributes I have listed below (1000 = gender - 2 = female, 1001 = age group - 5 = 55-64, 1002 = household size = 1). How do I adjust this query so that it only gives me one row vs three rows of the same personal_ID? Also when I use this query, it doesn't pull any values but if I put only 1 attribute it works.
select sa.country_id ,count(distinct sa.personal_id )
from study_table sa ,attribute_table a
where sa.country_id =a.country_id and sa.personal_id =a.personal_id
and to_char(sa.mailing_date,'yyyy-MM')='2021-01'
and attribute_id =1000 and a.attribute_number =2
and attribute_id =1001 and a.attribute_number =5
and attribute_id =1002 and a.attribute_number =1
and study_type ='Wave'
and status not in ('NEW','EXCLUDED','ERROR')
group by sa.country_id

First, use proper, explicit, standard JOIN syntax.
Second, your WHERE conditions are contradictory. You need ORs . . . and then a HAVING for the final filtering:
select sa.country_id ,count(distinct sa.personal_id )
from study_table sa join
attribute_table a
on sa.country_id = a.country_id and
sa.personal_id = a.personal_id
where to_char(sa.mailing_date,'yyyy-MM') = '2021-01' and
( (attribute_id = 1000 and a.attribute_number = 2) or
(attribute_id = 1001 and a.attribute_number = 5) or
(attribute_id = 1002 and a.attribute_number = 1)
) and
study_type ='Wave'
status not in ('NEW','EXCLUDED','ERROR')
group by sa.country_id
having count(distinct attribute_id) = 3;

It seems your attribute_table follows an Entity-Attribute-Value. (IMHO that is an extremely bad bad plan - but if that is what you got that is what you deal with). So to get/validate/set 3 attributes you need to reference that table 3 times in the query -once for each attribute.
select sa.country_id
, count(distinct sa.personal_id )
from study_table sa
join ( select g.personal_id
from attribute_table g
join attribute_table ag on ag.personal_id = ag.personal_id
join attribute_table hs on hs.personal_id = hs.personal_id
where 1=1
and (g.attribute_id =1000 and g.attribute_number =2)
and (ag.attribute_id =1001 and ag.attribute_number =5)
and (hs.attribute_id =1002 and hs.attribute_number =1)
) sf
on sf.personal_id = sa.personal_id
where 1=1
and to_char(sa.mailing_date,'yyyy-mm')='2021-01'
and sa.study_type ='Wave'
and sa.status not in ('NEW','EXCLUDED','ERROR')
group by sa.country_id;
The above of course assumes the column names study_type and status originate in the study table - seems likely.
But if not you will need 2 more references to the attribute table.
Hint: Always use tables aliases on ALL column references.

Related

SQL Server Select Statement Columns

Original Query:
select StudyID, count(CompletedDate), count(Removed), count(RemovalReason)
from Study a
full outer join Households b
on a.HouseholdID = b.HouseholdID
where StudyID = '123456'
and Removed = 1
and RemovalReason = 5
group by StudyID
How do I write out this query so that for each column (CompletedDate, Removed, and RemovalReason) is not restricted to the conditions (i.e. Removed = 1, Removal Reason = 5) and only applies to the specific column. If I execute this query, it will not show me the total count for CompletedDate because I'm restricting it to these conditions. Is there a way to write it directly next to count?
Table/Columns - Study:
HouseholdID (primary key),
StudyID,
CompletedDate
Table/Columns - Households:
HouseholdID (primary key),
Removed,
RemovalReason
I think you are looking for something like this, but your question is a little loose with details:
select StudyID
, count(CompletedDate)
, sum(case when Removed = 1 then 1 else 0 end)
, sum(case when RemovalReason = 5 then 1 else 0 end)
from Study a
join Households b
on a.HouseholdID = b.HouseholdID
where StudyID = '123456'
group by StudyID

How to find highest count of result set using multiple tables in SQL (Oracle)

I have four tables. Here are the skeletons...
ACADEMIC_TBL
academic_id
academic_name
AFFILIATION_TBL
academic_id*
institution_id*
joined_date
leave_date
INSTITUTION_TBL
institution_id
institution_name
REVIEW_TBL
academic_id*
institution_id*
date_posted
review_score
Using these tables I need to find the academic (displaying their name, not ID) with the highest number of reviews and the institution name (not ID) they are currently affiliated with. I imagine this will need to be done using multiple sub-select scripts but I'm having trouble figuring out how to structure it.
this will work:
SELECT at.academic_name,
it.institution_name,
Max(rt.review_score),
from academic_tbl at,
affiliation_tbl afft,
institution_tbl it,
review_tbl rt
WHERE AT.academic_id=afft.academic_id
AND afft.institution_id=it.institution_id
AND afft.academic_id=rt.academic_id
GROUP BY at.academia_name,it.instituton_id
You need an aggregated query that JOINs all 4 tables to count how many reviews were performed by each academic.
Query :
SELECT
inst.institution_name,
aca.academic_name,
COUNT(*)
FROM
academic_tbl aca
INNER JOIN affiliation_tbl aff ON aff.academic_id = aca.academic_id
INNER JOIN institution_tbl inst ON inst.institution_id = aff.institution_id
INNER JOIN review_tbl rev ON rev.academic_id = aca.academic_id AND rev.institution_id = aff.institution_id
GROUP BY
inst.institution_name,
aca.academic_name,
inst.institution_id,
aca.academic_id
NB :
added the academic and institution id to the GROUP BY clause to prevent potential academics or institutions having the same name from being (wrongly) grouped together
if the same academic performed reviews for different institutions, then you will find one row for each academic / institution couple, which, if I understood you right, is what you want
Try this one:
select
inst.institution_name
, aca.academic_name
from
academic_tbl aca
, institution_tbl inst
, affiliation_tbl aff
, review_tbl rev
, (
select
max(rt.review_score) max_score
from
review_tbl rt
, affiliation_tbl aff_inn
where
rt.date_posted >= aff_inn.join_date
and rt.date_posted <= aff_inn.leave_date
and rt.academic_id = aff_inn.academic_id
and rt.institution_id = aff_inn.institution_id
)
agg
where
aca.academic_id = inst.academic_id
and inst.institution_id = aff.institution_id
and aff.institution_id = rev.institution_id
and aff.academic_id = rev.academic_id
and rev.date_posted >= aff.join_date
and rev.date_posted <= aff.leave_date
and rev.review_score = agg.max_score
;
It might return more than one academic, if there are more with the same score (maximum one).

How to Get a Count of Records Using Partitioning in Oracle

I have the following query:
SELECT
F.IID,
F.E_NUM AS M_E_NUM,
MCI.E_NUM AS MCI_E_NUM,
F.C_NUM AS M_C_NUM,
MCI.C_NUM AS MCI_C_NUM,
F.ET_ID AS M_ET_ID,
EDIE.ET_ID AS ED_INDV_ET_ID,
COUNT(*) OVER (PARTITION BY F.IID) IID_COUNT
FROM FT_T F JOIN CEMEI_T MCI ON F.IID = MCI.IID
JOIN EDE_T EDE ON MCI.E_NUM = EDE.E_NUM
JOIN EDIE_T EDIE ON EDIE.IID = F.IID AND EDIE.ET_ID = EDE.ET_ID
WHERE
F.DEL_F = 'N'
AND MCI.EFF_END_DT IS NULL
AND MCI.TOS = 'BVVB'
AND EDE.PTEND_DT IS NULL
AND EDE.DEL_S = 'N'
AND EDE.CUR_IND = 'A'
AND EDIE.TAR_N = 'Y'
AND F.IID IN
(
SELECT DISTINCT IID
FROM FT_T
WHERE GROUP_ID = 'BG'
AND DEL_F = 'N'
AND (IID, E_NUM) NOT IN
(
SELECT IID, E_NUM FROM CEMEI_T
WHERE TOS = 'BVVB' AND EFF_END_DT IS NULL
)
);
I am basically grabbing information from several tables and creating a flat record of them.
Everything works accordingly except now I need to find out whether there are two records in FT_T table with identical IID's and display that count as part of the result set.
I tried to use partitioning but all the rows in the result set return a single count even though there are ones that have 2 records with identical IID's in FT_T.
The reason I initially said that I'm gathering information from several tables is due to the fact that FT_T might not have all the information I need if two records are not available for the same IID, so I have to retrieve them from other tables JOINed in the query. However, I need to know which FT_T.IID's have two records in FT_T (or greater than one).
Perhaps you need to calculate the count before the join and filtering:
SELECT . . .
FROM (SELECT F.*,
COUNT(*) OVER (PARTITION BY F.IID) as IID_CNT
FROM FT_T F
) JOIN
CEMEI_T MCI
ON F.IID = MCI.IID JOIN
EDE_T EDE
ON MCI.E_NUM = EDE.E_NUM JOIN
EDIE_T EDIE
ON EDIE.IID = F.IID AND EDIE.ET_ID = EDE.ET_ID
. . .
this is merely a comment/observation, but formatting is needed
You use of in(...) with select distinct and not in(...,...) seems complex and could be a problem if some values are NULL. I suggest you consider using EXISTS and NOT EXISTS instead. e.g.
AND EXISTS (
SELECT
NULL
FROM FT_T
WHERE F.IID = FT_T.IID
AND FT_T.GROUP_ID = 'BG'
AND FT_T.DEL_F = 'N'
AND NOT EXISTS (
SELECT
NULL
FROM CEMEI_T
WHERE FT_T.IID = CEMEI_T.IID
AND FT_T.E_NUM = CEMEI_T.E_NUM
AND CEMEI_T.TOS = 'BVVB'
AND CEMEI_T.EFF_END_DT IS NULL
)
)

How to get fields from multiple tables

I want to get fields from 2 different tables . The last field candidate_score_id has a many to one relationship. So how should I join the below 2 queries
1) To get candidate_score_id from the candidate_score table.
select candidate_score_id from candidate_score a where
a.assessment_id = NEW.assessment_id and
a.candidate_id = NEW.candidate_id and
a.attempt_Count = NEW.attempt_count;
2) To insert different fields in to the candidate_score_details table. The field in this table should be obtained by query above.
insert into candidate_score_details(candidate_score_details_id, candidate_id, assessment_id, attempt_count, score_type, score_tag,correct, candidate_score_id)
select uuid();
select a.candidate_id, a.assessment_id,a.attempt_count,"BY-COMPLEXITY",
case c.complexity
when 1 then "HIGH"
when 2 then "MEDIUM"
when 3 then "LOW"
end, count(*) from candidate_answer a, answer_key b, question_meta_data c where a.candidate_id = NEW.candidate_id and
a.assessment_id = NEW.assessment_id and
a.attempt_count = NEW.attempt_count and
a.assessment_id = b.assessment_id and
a.question_id = b.question_number and
a.response = b.answer and
a.question_id = c.question_number
group by a.candidate_id, a.assessment_id, a.attempt_count, c.complexity;
Just looking at the SQL joining aspect of your question, you'll need to specify the table I THINK you're aliasing a 2nd table with the "NEW" reference. If that's the case, then the query would be (replacing "OTHER_TABLE_NAME" with the name of the 2nd table:
select a.candidate_score_id
from candidate_score a
left join OTHER_TABLE_NAME new on
and a.assessment_id = NEW.assessment_id
and a.candidate_id = NEW.candidate_id
and a.attempt_Count = NEW.attempt_count
Seems that Query 1 has the same 3 criteria on the "candidate_score" table as for the "candidate_answer" table in Query 2.
So how about adding a LEFT JOIN of "candidate_score" to "candidate_answer" on those 3 fields?
For example:
INSERT INTO candidate_score_details
(
candidate_score_details_id,
candidate_id,
assessment_id,
attempt_count,
score_type,
score_tag,
correct,
candidate_score_id
)
SELECT
uuid(),
answer.candidate_id,
answer.assessment_id,
answer.attempt_count,
'BY-COMPLEXITY' AS score_type,
(CASE meta.complexity
WHEN 1 THEN 'HIGH'
WHEN 2 THEN 'MEDIUM'
WHEN 3 THEN 'LOW'
END) AS score_tag,
COUNT(*) AS correct,
MAX(score.candidate_score_id) AS max_candidate_score_id
FROM candidate_answer AS answer
JOIN answer_key AS akey
ON (akey.assessment_id = answer.assessment_id AND akey.question_number = answer.question_id AND akey.answer = answer.response)
LEFT JOIN candidate_score AS score
ON (score.candidate_id = answer.candidate_id AND score.assessment_id = answer.assessment_id AND score.attempt_count = answer.attempt_count)
LEFT JOIN question_meta_data AS meta
ON meta.question_number = answer.question_id
WHERE answer.candidate_id = NEW.candidate_id
AND answer.assessment_id = NEW.assessment_id
AND answer.attempt_count = NEW.attempt_count
GROUP BY answer.candidate_id, answer.assessment_id, answer.attempt_count, meta.complexity;

SQL: Split rows with same ID into columns + left join

I have a cs cart database and I am trying to select all the attributes for all the products, the problem is that for each separate attribute for a product, my query creates a new row, I want to to have a single row for each products that has all the attributes into columns.
This is my query right now:
SELECT a.product_id, b.variant, c.description, d.product_code
FROM cscart_product_features_values a
LEFT JOIN cscart_product_feature_variant_descriptions b ON a.variant_id = b.variant_id
LEFT JOIN cscart_product_features_descriptions c ON a.feature_id = c.feature_id
LEFT JOIN cscart_products d ON a.product_id = d.product_id
After I run the query, I get the following result:
product_id;"variant";"description";"product_code"
38;"1st";"Grade Level";"750"
38;"Math";"Subject Area";"750"
38;"Evan-Moor";"Publisher";"750"
etc next product
What I want is this:
product_id;"product_code";"Grade Level";"Subject Area";"Publisher"
38;"750";"1st";"Math";"Evan-Moor"
etc next product
We only have 3 type of attributes: Grade Level, Subject Area and Publisher.
Any ideas how to improve my query and achieve this? I would be happy even with concatenating all 3 attributes in one column, delimited by ",".
This is a generic SQL solution using GROUP BY and MAX(case expression) to achieve the transformation of 3 rows into a single row with the 3 columns.
SELECT
v.product_id
, p.product_code
, MAX(CASE WHEN fd.description = 'Grade Level' THEN vd.variant END) AS GradeLevel
, MAX(CASE WHEN fd.description = 'Subject Area' THEN vd.variant END) AS SubjectArea
, MAX(CASE WHEN fd.description = 'Publisher' THEN vd.variant END) AS Publisher
FROM cscart_products p
LEFT JOIN cscart_product_features_values v ON p.product_id = v.product_id
LEFT JOIN cscart_product_feature_variant_descriptions vd ON v.variant_id = vd.variant_id
LEFT JOIN cscart_product_features_descriptions fd ON v.feature_id = fd.feature_id
GROUP BY
v.product_id
, p.product_code
This approach should work on just about any SQL database.
Note also that I have changed the order of tables because I presume there has to be a row in cscart_products, but there might not be related rows in the other tables.
I have also changed the aliases, personally I do not care for aliaes based on the order of use in a query (e.g. I just changed the order so I had to change all references). I have use 'p' = product, 'v' = variant, 'vd' = variant description & 'fd' = feature description' - with such a convention for aliases I can re-arrange the query without changing every reference.