I want to count rows of two columns with different values - sql

I have two columns, gender (values 'Male', 'Female') and grade (values 'senior officer', 'Junior Officer'), and the table name is employee. I want to count all males who are senior officers and all males who are junior officers, and vice versa for all females. Below is my code:
SELECT staff_gender, staff_grade, COUNT(*)
FROM permanent_staff
WHERE staff_gender='Female' AND staff_grade='senior officer'

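Aggregate with GROUP BY on both columns, so that you get one count per gender/grade combination instead of filtering down to a single combination: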
SELECT staff_gender, staff_grade
, COUNT(*) AS total
FROM permanent_staff
GROUP BY staff_gender, staff_grade
ORDER BY staff_gender, staff_grade
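A minimal runnable sketch of the GROUP BY answer, using Python's built-in sqlite3 module as a stand-in database. The sample rows are made up for illustration. Each gender/grade pair comes back as its own row with its own count:

```python
import sqlite3

# In-memory stand-in for permanent_staff; the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE permanent_staff (staff_gender TEXT, staff_grade TEXT)")
conn.executemany(
    "INSERT INTO permanent_staff VALUES (?, ?)",
    [
        ("Male", "senior officer"),
        ("Male", "senior officer"),
        ("Male", "junior officer"),
        ("Female", "senior officer"),
        ("Female", "junior officer"),
        ("Female", "junior officer"),
        ("Female", "junior officer"),
    ],
)

# One output row per (gender, grade) combination instead of one filtered pair.
rows = conn.execute(
    """
    SELECT staff_gender, staff_grade, COUNT(*) AS total
    FROM permanent_staff
    GROUP BY staff_gender, staff_grade
    ORDER BY staff_gender, staff_grade
    """
).fetchall()

for gender, grade, total in rows:
    print(f"{gender:<7}{grade:<16}{total}")
```

SQLite compares TEXT case-sensitively by default, so 'Junior Officer' and 'junior officer' would end up in separate groups; other databases may use case-insensitive collations.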

Related

How to find every combination of features shared across multiple rows?

I am pretty new to using SQL (currently StandardSQL via BigQuery) and unfortunately my Google-fu could not find me a solution to this issue.
I'm working with a dataset where each row is a different person and each column is an attribute (name, age, gender, weight, ethnicity, height, bmi, education level, GPA, etc.). I am trying to 'cluster' these people into all of the feature combinations that match 5 or more people.
Originally I did this manually with 3 feature columns: I built a concatenated 'cluster name' column and wrote 7 SELECT queries, one per grouping, each keeping only groups of 5 or more people, and then combined them with UNION:
gender
age
ethnicity
gender + age
gender + ethnicity
age + ethnicity
gender + age + ethnicity
Unfortunately, doing it this way makes the number of combinations balloon: k features give 2^k - 1 non-empty combinations, so my anticipated ~15 features would mean 32,767 of them, which seems infeasible. I'd also like a less manual approach, so that adding a new feature in the future does not require major edits to my cluster identification.
Is there a function or existing process that could accomplish something like this? Ideally I'd like to identify ALL combinations that meet my minimum user count per combination (so the same rows are expected to match multiple different clusters). Any advice or help would be appreciated! Thanks.
If only BigQuery supported GROUPING SETS or CUBE, this would be simple. One fairly generalizable method enumerates the 7 groupings and then uses the bits of each grouping number to decide which columns to aggregate on:
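-- n runs over the 7 non-empty groupings (2^3 - 1); bit 1 = gender, bit 2 = age, bit 4 = ethnicity
select (case when (n & 1) > 0 then gender end) as gender,
(case when (n & 2) > 0 then age end) as age,
(case when (n & 4) > 0 then ethnicity end) as ethnicity,
count(*) as cnt
from t cross join
unnest(generate_array(1, 7)) n
group by n, 1, 2, 3
having count(*) >= 5;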
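GENERATE_ARRAY and UNNEST are BigQuery-specific, so this sketch reproduces the same bitmask technique in SQLite (through Python's sqlite3), with a recursive CTE generating the mask numbers 1 to 2^k - 1. It also builds the CASE expressions from a list of column names, which addresses the wish to add features without rewriting queries. The helper name feature_clusters, the table people and the sample rows are hypothetical:

```python
import sqlite3


def feature_clusters(conn, table, features, min_size):
    """Return (mask, value_1, ..., value_k, count) for every combination of
    `features` that matches at least `min_size` rows.

    Bit i of `mask` is set when features[i] is part of the combination.
    `table` and `features` are pasted into the SQL text, so they must be
    trusted identifiers, never user input.
    """
    k = len(features)
    # Keep a feature's value only when its bit is set in the mask; otherwise NULL.
    masked_cols = ",\n".join(
        f"CASE WHEN (m.n & {1 << i}) > 0 THEN t.{col} END AS f{i}"
        for i, col in enumerate(features)
    )
    group_cols = ", ".join(f"f{i}" for i in range(k))
    sql = f"""
        WITH RECURSIVE masks(n) AS (
            SELECT 1 UNION ALL SELECT n + 1 FROM masks WHERE n < {(1 << k) - 1}
        ),
        masked AS (
            SELECT m.n AS mask, {masked_cols}
            FROM {table} AS t CROSS JOIN masks AS m
        )
        SELECT mask, {group_cols}, COUNT(*) AS cnt
        FROM masked
        GROUP BY mask, {group_cols}
        HAVING COUNT(*) >= ?
        ORDER BY mask, {group_cols}
    """
    return conn.execute(sql, (min_size,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (gender TEXT, age INTEGER, ethnicity TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("M", 30, "X"), ("M", 30, "Y"), ("F", 30, "X"), ("F", 40, "X")],
)
# A minimum of 2 instead of 5 only because the sample is tiny.
rows = feature_clusters(conn, "people", ["gender", "age", "ethnicity"], 2)
for row in rows:
    print(row)
```

Keeping the mask column in the output is deliberate: it distinguishes a genuine NULL attribute value from "this feature is not part of the combination".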
Another, trickier method is to reconstruct the groups using rollup(). Something like this:
select gender, age, ethnicity, count(*)
from t
group by rollup(gender, age, ethnicity);
This produces three of the groups you want (gender + age + ethnicity, gender + age, and gender), plus a grand-total row. So:
select gender, age, ethnicity, count(*)
from t
group by rollup(gender, age, ethnicity)
union all
select gender, null, ethnicity, count(*)
from t
group by gender, ethnicity
union all
select null, age, ethnicity, count(*)
from t
group by rollup (ethnicity, age)
union all
select null, age, null, count(*)
from t
group by age;
The above reconstructs all seven groupings. Each ROLLUP() also adds a grand-total row with all three columns NULL, which you may want to filter out.
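ROLLUP(c1, ..., ck) yields only the k + 1 prefixes of its column list (down to the empty grand total), which is why several pieces have to be unioned. A small Python sketch (plain set arithmetic, no database) can check which groupings a set of rollups covers. As one possible alternative arrangement, rolling up the three columns in cyclic order, ROLLUP(gender, age, ethnicity), ROLLUP(age, ethnicity) and ROLLUP(ethnicity, gender), covers all seven non-empty groupings:

```python
from itertools import combinations


def rollup_groupings(cols):
    """Groupings produced by GROUP BY ROLLUP(cols): every prefix, including the empty grand total."""
    return {frozenset(cols[:i]) for i in range(len(cols) + 1)}


def nonempty_groupings(cols):
    """Every non-empty subset of cols, i.e. what CUBE would give minus the grand total."""
    return {frozenset(c) for r in range(1, len(cols) + 1) for c in combinations(cols, r)}


features = ["gender", "age", "ethnicity"]
rollups = [["gender", "age", "ethnicity"], ["age", "ethnicity"], ["ethnicity", "gender"]]

covered = set().union(*(rollup_groupings(r) for r in rollups))
missing = nonempty_groupings(features) - covered
print(sorted(sorted(g) for g in covered if g))
print("missing:", missing)
```

Each ROLLUP in such a union contributes its own grand-total row, so three rollups produce three identical all-NULL rows.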

Oracle SQL - Counting distinct column combinations

Running Oracle 12.1. I have a Line Items table. Its structure is fixed, and I cannot change it. I need to build a dashboard-style page that summarizes the Line Items table so a person can look at their sales territory. This person might be a GVP, who owns a large territory, or a Manager, or an individual rep. The Line Items table is quite denormalized, as this copy is part of a DW (data warehouse). This ‘copy’ of the table is only updated every 2 weeks, and it looks like this:
Line_Item_ID // PK
Account_ID //
Company_Name // The legal name of the Headquarters
LOB_Name // Line of business, aka Division within the Company_Name
Account_Type // One of 2 values, ‘NAMED’ or ‘GENERAL’
ADG_STATUS // 3 possible values, ‘A’, ‘D’ or ‘G’
Industry // One of 15 values, for this example assume it is ONLY ‘MFG’, ‘GOV’, ‘HEALTHCARE’
// Now have the sales hierarchy of the rep who sold this
GVP // Group Vice President
SVP // Sales Vice President
RVP // Regional Vice President
RM // Regional Manager
REP // Sales Rep
// Now have information about the product sold
ProductName
ProductPrice
VariousOtherFields….
I need to make an aggregated table which will be used for quick access of the dashboard. It will have counts of various combinations, and there will be one row per PERSON, not account. A person is every UNIQUE person listed in any of the GVP, SVP, RVP, RM or REP fields. Here is what the end result table will look like. Other than PERSON, every column is based on a DISTINCT count, and it is an integer value.
PERSON
TOTAL_COMPANIES // For this person, count of DISTINCT COMPANY_NAME
TOTAL_LOBS // For this person, count of DISTINCT LOBS
TOTAL_COMPANIES_NAMED // count of DISTINCT COMPANY_NAME with ACCOUNT_TYPE=’NAMED’
TOTAL_COMPANIES_GENERAL // count of DISTINCT COMPANY_NAME with ACCOUNT_TYPE=’GENERAL’
TOTAL_LOBS_NAMED // count of DISTINCT LOB_NAME with ACCOUNT_TYPE=’NAMED’
TOTAL_LOBS_GENERAL // count of DISTINCT LOB_NAME with ACCOUNT_TYPE=’GENERAL’
TOTAL_COMPANIES_STATUS_A // count of DISTINCT COMPANY_NAME with ADG_STATUS=’A’
TOTAL_COMPANIES_STATUS_D // count of DISTINCT COMPANY_NAME with ADG_STATUS=’D’
TOTAL_COMPANIES_STATUS_G // count of DISTINCT COMPANY_NAME with ADG_STATUS=’G’
TOTAL_LOB_STATUS_A // count of DISTINCT LOB_NAME with ADG_STATUS=’A’
TOTAL_LOB_STATUS_D // count of DISTINCT LOB_NAME with ADG_STATUS=’D’
TOTAL_LOB_STATUS_G // count of DISTINCT LOB_NAME with ADG_STATUS=’G’
//Now Various Industry Permutations. I have 15 different industries, but only showing 2. This will only be at the COMPANY_NAME level, not the LOB_NAME level
MFG_COMPANIES_STATUS_A // count of DISTINCT COMPANY_NAME with ADG_STATUS=’A’ and Industry = ‘MFG’
MFG_COMPANIES_STATUS_D // count of DISTINCT COMPANY_NAME with ADG_STATUS=’D’ and Industry = ‘MFG’
MFG_COMPANIES_STATUS_G // count of DISTINCT COMPANY_NAME with ADG_STATUS=’G’ and Industry = ‘MFG’
GOV_COMPANIES_STATUS_A // count of DISTINCT COMPANY_NAME with ADG_STATUS=’A’ and Industry = ‘GOV’
GOV_COMPANIES_STATUS_D // count of DISTINCT COMPANY_NAME with ADG_STATUS=’D’ and Industry = ‘GOV’
GOV_COMPANIES_STATUS_G // count of DISTINCT COMPANY_NAME with ADG_STATUS=’G’ and Industry = ‘GOV’
There are approx. 400 people, 35000 unique accounts, and 200,000 entries in the line items table.
So what is my strategy? I have thought about making another table of unique PERSON values, and using it as a driving table. Let’s call this table PERSON_LIST.
Pseudo-code…
For each entry in PERSON_LIST
For all LINE_ITEMS where person_list in ANY(GVP, SVP, RVP, RM, REP) do
Calculations…
This would be an incredibly long running process…
How can I do this more effectively (set based as opposed to row by row)? I believe I would have to use the PIVOT operator for the INDUSTRY list, but can I use PIVOT with additional criteria? Aka count of distinct COMPANY with a specific industry and a specific ADG_STATUS?
Any ideas or SQL code most appreciated.
You could unpivot the original data to get the values from the GVP, SVP, RVP, RM and REP columns into one 'person' column (UNPIVOT excludes NULLs by default, so empty hierarchy levels are skipped):
select * from line_items
unpivot (person for role in (gvp as 'GVP', svp as 'SVP', rvp as 'RVP',
rm as 'RM', rep as 'REP'))
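SQLite has no UNPIVOT operator, so to try the idea outside Oracle this sketch stacks one SELECT per role column with UNION ALL, the usual portable equivalent; the IS NOT NULL filter mirrors UNPIVOT's default EXCLUDE NULLS. The trimmed-down line_items columns and the sample names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed-down, hypothetical version of LINE_ITEMS with only the columns needed here.
conn.execute(
    "CREATE TABLE line_items (line_item_id INTEGER, company_name TEXT, "
    "gvp TEXT, svp TEXT, rvp TEXT, rm TEXT, rep TEXT)"
)
conn.executemany(
    "INSERT INTO line_items VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "Acme", "Gina", "Sam", "Rita", "Max", "Pat"),
        (2, "Beta", "Gina", "Sam", "Rita", "Max", None),  # no rep on this line
    ],
)

ROLES = ["gvp", "svp", "rvp", "rm", "rep"]
# One SELECT per role column, stacked with UNION ALL. The IS NOT NULL filter
# mirrors UNPIVOT's default EXCLUDE NULLS behaviour.
unpivot_sql = " UNION ALL ".join(
    f"SELECT line_item_id, company_name, {role} AS person, '{role.upper()}' AS role "
    f"FROM line_items WHERE {role} IS NOT NULL"
    for role in ROLES
)
rows = conn.execute(unpivot_sql + " ORDER BY line_item_id, role").fetchall()
for row in rows:
    print(row)
```

Each line item now appears once per person in its sales hierarchy, which is the shape the per-person aggregation needs.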
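Then use that as a CTE or inline view, with pretty much the columns you listed, computed by conditional aggregation: a CASE expression inside each COUNT(DISTINCT ...). Rows that don't match a CASE condition yield NULL, which COUNT(DISTINCT ...) ignores. Something like: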
select person,
count(distinct company_name) as total_companies,
count(distinct lob_name) as total_lobs,
count(distinct case when account_type='NAMED' then company_name end)
as total_companies_named,
count(distinct case when account_type='GENERAL' then company_name end)
as total_companies_general,
count(distinct case when account_type='NAMED' then lob_name end)
as total_lobs_named,
count(distinct case when account_type='GENERAL' then lob_name end)
as total_lobs_general,
count(distinct case when adg_status='A' then company_name end)
as total_companies_status_a,
count(distinct case when adg_status='D' then company_name end)
as total_companies_status_d,
count(distinct case when adg_status='G' then company_name end)
as total_companies_status_g,
count(distinct case when adg_status='A' then lob_name end)
as total_lob_status_a,
count(distinct case when adg_status='D' then lob_name end)
as total_lob_status_d,
count(distinct case when adg_status='G' then lob_name end)
as total_lob_status_g,
count(distinct case when adg_status='A' and industry = 'MFG' then company_name end)
as mfg_companies_status_a,
count(distinct case when adg_status='D' and industry = 'MFG' then company_name end)
as mfg_companies_status_d,
count(distinct case when adg_status='G' and industry = 'MFG' then company_name end)
as mfg_companies_status_g,
count(distinct case when adg_status='A' and industry = 'GOV' then company_name end)
as gov_companies_status_a,
count(distinct case when adg_status='D' and industry = 'GOV' then company_name end)
as gov_companies_status_d,
count(distinct case when adg_status='G' and industry = 'GOV' then company_name end)
as gov_companies_status_g
from (
select * from line_items
unpivot (person for role in (gvp as 'GVP', svp as 'SVP', rvp as 'RVP',
rm as 'RM', rep as 'REP'))
)
group by person;
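With 15 industries, writing every industry/status column by hand gets tedious, so one option is to generate the COUNT(DISTINCT CASE ...) expressions from lists, a hand-rolled pivot. This sketch runs in SQLite against a hypothetical, already-unpivoted person_items table; the list contents and sample rows are illustrative only:

```python
import sqlite3

# Only two of the 15 industries, as in the question; extend these lists for the real table.
INDUSTRIES = ["MFG", "GOV"]
STATUSES = ["A", "D", "G"]


def industry_status_columns():
    """One COUNT(DISTINCT CASE ...) per industry/status pair."""
    return [
        f"COUNT(DISTINCT CASE WHEN adg_status = '{s}' AND industry = '{i}' "
        f"THEN company_name END) AS {i.lower()}_companies_status_{s.lower()}"
        for i in INDUSTRIES
        for s in STATUSES
    ]


conn = sqlite3.connect(":memory:")
# Stand-in for the unpivoted inline view: one row per (person, line item).
conn.execute(
    "CREATE TABLE person_items (person TEXT, company_name TEXT, lob_name TEXT, "
    "account_type TEXT, adg_status TEXT, industry TEXT)"
)
conn.executemany(
    "INSERT INTO person_items VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Pat", "Acme", "Acme Tools", "NAMED", "A", "MFG"),
        ("Pat", "Acme", "Acme Parts", "NAMED", "A", "MFG"),
        ("Pat", "City", "City Works", "GENERAL", "D", "GOV"),
        ("Max", "Acme", "Acme Tools", "NAMED", "A", "MFG"),
    ],
)

extra_columns = ",\n       ".join(industry_status_columns())
sql = f"""
SELECT person,
       COUNT(DISTINCT company_name) AS total_companies,
       COUNT(DISTINCT lob_name) AS total_lobs,
       COUNT(DISTINCT CASE WHEN account_type = 'NAMED' THEN company_name END) AS total_companies_named,
       {extra_columns}
FROM person_items
GROUP BY person
ORDER BY person
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

Pat's two Acme line items count as one company thanks to DISTINCT, while both LOBs are counted.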

Postgresql SQL to find revenue from different department on invoice number

I need to find the doctors' revenue from various departments, such as laboratory, radiology, pharmacy and other departments, through patients.
I only have a document_number column, with values such as 'L1432', 'R87j7', 'P652', etc. If document_number starts with 'L' it is laboratory, if it starts with 'R' it is radiology, and if it starts with 'P' it is pharmacy. How can I do this in SQL?
Output should look like this:
doctor_name laboratory radiology pharmacy others
Michel 23098 6763 78732 98838
John 77838 89898 56542 52654
Cranys 98973 78763 5432 65565
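This is a conditional aggregation on the first character of document_number. The query assumes the amounts are stored in a column named turnover, and the FILTER clause requires PostgreSQL 9.4 or later: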
select doctor_name,
sum(turnover) filter (where left(document_number,1) = 'L') as laboratory,
sum(turnover) filter (where left(document_number,1) = 'R') as radiology,
sum(turnover) filter (where left(document_number,1) = 'P') as pharmacy,
sum(turnover) filter (where left(document_number,1) not in ('L','R','P')) as others
from the_table
group by doctor_name
order by doctor_name;
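A CASE expression inside SUM is the portable equivalent of SUM(...) FILTER (WHERE ...). This sketch runs it in SQLite, where substr(x, 1, 1) plays the role of PostgreSQL's left(x, 1); the turnover column and the sample rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (doctor_name TEXT, document_number TEXT, turnover INTEGER)")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?)",
    [
        ("Michel", "L1432", 100),
        ("Michel", "R87j7", 50),
        ("Michel", "P652", 20),
        ("Michel", "X901", 5),
        ("John", "L1", 10),
    ],
)

# CASE without ELSE yields NULL for non-matching rows, and SUM skips NULLs.
rows = conn.execute(
    """
    SELECT doctor_name,
           SUM(CASE WHEN substr(document_number, 1, 1) = 'L' THEN turnover END) AS laboratory,
           SUM(CASE WHEN substr(document_number, 1, 1) = 'R' THEN turnover END) AS radiology,
           SUM(CASE WHEN substr(document_number, 1, 1) = 'P' THEN turnover END) AS pharmacy,
           SUM(CASE WHEN substr(document_number, 1, 1) NOT IN ('L', 'R', 'P') THEN turnover END) AS others
    FROM the_table
    GROUP BY doctor_name
    ORDER BY doctor_name
    """
).fetchall()
for row in rows:
    print(row)
```

A department with no documents comes back as NULL, as it would with FILTER; wrap each sum in COALESCE(..., 0) if zeros are preferred.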

Pivoting to calculate average with 1's and 0's

I need help building a pivot query from a data set that looks like this:
1 indicates the employee spoke with the contact or location, and 0 indicates they haven't spoken with them.
I want to calculate the % of contacts spoken to and the % of locations spoken to, by employee and then by manager, so the table would look like this:
Any ideas on how to pivot this so it calculates the percentage for each employee's assigned contacts and then the percentage for each manager's employee's assigned contacts?
You don't need a pivot if you only have two categories. Try CASE expressions:
Select employee as [Name]
,count(case when ContactType = 'Individual' and SpokeTo = 1 then LocationName end) * 100.0/ NULLIF(count(case when ContactType = 'Individual' then LocationName end), 0) as IndividualContacts
,count(case when ContactType = 'Location' and SpokeTo = 1 then LocationName end) * 100.0/ NULLIF(count(case when ContactType = 'Location' then LocationName end), 0) as LocationContacts
from MyTable
group by Employee
Note the *100.0 is important to avoid integer division (or you can cast explicitly to decimal). The NULLIF is optional unless you have some employees that were not assigned any contacts of one type or another - then you must include it to avoid division by 0 errors.
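The question also asks for manager-level percentages. Assuming the table has a Manager column (hypothetical, since the sample data isn't shown), the same expressions only need a different GROUP BY. This sketch runs the answer's query in SQLite, where integer / integer also truncates, so the 100.0 matters there too; the sample rows are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (Employee TEXT, Manager TEXT, ContactType TEXT, "
    "SpokeTo INTEGER, LocationName TEXT)"
)
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?, ?)",
    [
        ("Ann", "Mia", "Individual", 1, "L1"),
        ("Ann", "Mia", "Individual", 0, "L2"),
        ("Ann", "Mia", "Individual", 1, "L3"),
        ("Ann", "Mia", "Location", 1, "L4"),
        ("Bob", "Mia", "Individual", 0, "L5"),  # Bob has no Location contacts
    ],
)


def contact_rates(group_col):
    # group_col is pasted into the SQL text, so it must be a trusted column name.
    sql = f"""
        SELECT {group_col},
               COUNT(CASE WHEN ContactType = 'Individual' AND SpokeTo = 1 THEN LocationName END) * 100.0
                 / NULLIF(COUNT(CASE WHEN ContactType = 'Individual' THEN LocationName END), 0) AS IndividualContacts,
               COUNT(CASE WHEN ContactType = 'Location' AND SpokeTo = 1 THEN LocationName END) * 100.0
                 / NULLIF(COUNT(CASE WHEN ContactType = 'Location' THEN LocationName END), 0) AS LocationContacts
        FROM MyTable
        GROUP BY {group_col}
        ORDER BY {group_col}
    """
    return conn.execute(sql).fetchall()


by_employee = contact_rates("Employee")
by_manager = contact_rates("Manager")
print(by_employee)
print(by_manager)
```

Bob's LocationContacts is NULL rather than an error: NULLIF turns the zero denominator into NULL, and dividing by NULL gives NULL.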

Need a count of one column when another column has multiple values

I have this query:
select qos.orgname, qos.org, qos.suborg, qos.Archive, qos.location, count(c.coe) AS DEPT, c.coe AS DEP,
qos.siteid, qos.admin as sitelead,
CASE When qos.Archive = 0 THEN 'Active'
when qos.Archive is null THEN '-'
ELSE 'Archived'
END AS STATUS
from qryOrgsite qos WITH (NOLOCK)
LEFT JOIN ltbcoe c WITH (NOLOCK) on qos.orgname = c.orgname and qos.location= c.location
group by qos.orgname, qos.location, qos.org, qos.suborg, qos.Archive, c.coe,
qos.siteid, qos.ADMIN
This gives me some records as follows:
So I want the count of the "Dept" column for the active rows. It should return only one row, with Organization B and Dept as 7 (e.g. here the Dept column should be 7).
That means I want the count of the c.coe column.
The problem here is your GROUP BY is too inclusive. What the query is asking for is a count, but the results have to be unique by all of the columns in your GROUP BY. If you only want a count per orgname, you will need to do
SELECT qos.orgname, COUNT(*)
FROM qryOrgsite qos
GROUP BY qos.orgname
This essentially says that you want to count all rows per orgname. Each column you add to the GROUP BY creates unique combinations for your COUNT. For example, if you grouped by orgname and location, you would get a separate count for each combination of those two columns. Based on the data you show above, this would result in:
OrganizationB Demo-Fixe 1
OrganizationB GE CapitalP 3
OrganizationB Hadasa Plant 1
OrganizationB Mostoles Plant 1
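A small sketch of how each extra GROUP BY column splits the counts, run in SQLite through Python. The table and rows are hypothetical stand-ins, not the asker's data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE qryOrgsite (orgname TEXT, location TEXT)")
conn.executemany(
    "INSERT INTO qryOrgsite VALUES (?, ?)",
    [("OrgA", "Plant1"), ("OrgA", "Plant1"), ("OrgA", "Plant2"), ("OrgB", "Plant3")],
)


def counts_by(*cols):
    # cols are pasted into the SQL text, so they must be trusted column names.
    col_list = ", ".join(cols)
    sql = f"SELECT {col_list}, COUNT(*) FROM qryOrgsite GROUP BY {col_list} ORDER BY {col_list}"
    return conn.execute(sql).fetchall()


per_org = counts_by("orgname")                         # one row per organization
per_org_location = counts_by("orgname", "location")    # one row per organization/location pair
print(per_org)
print(per_org_location)
```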
You can wrap your query in another:
select orgname, count(*)
from (
select qos.orgname, qos.org, qos.suborg, qos.Archive, qos.location, count(c.coe) AS DEPT, c.coe AS DEP,
qos.siteid, qos.admin as sitelead,
CASE When qos.Archive = 0 THEN 'Active'
when qos.Archive is null THEN '-'
ELSE 'Archived'
END AS STATUS
from qryOrgsite qos WITH (NOLOCK)
LEFT JOIN ltbcoe c WITH (NOLOCK) on qos.orgname = c.orgname and qos.location= c.location
group by qos.orgname, qos.location, qos.org, qos.suborg, qos.Archive, c.coe,
qos.siteid, qos.ADMIN) t1
where t1.orgname = 'Organization B' and t1.STATUS = 'Active'
group by t1.orgname
Guys, I have got the answer.
I had to remove the department name (c.coe) from the GROUP BY and the SELECT,
because count(c.coe) had no effect while c.coe was in the GROUP BY.
Thanks for all your help.
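A sketch of why that fix works, run in SQLite through Python with hypothetical tables and rows. With c.coe in the GROUP BY, each group holds a single department, so COUNT(c.coe) never exceeds 1; without it, the count is per site, and filtering on Archive = 0 before grouping by orgname gives the active total. The LEFT JOIN keeps sites with no departments, and COUNT(c.coe) counts them as 0 because it skips NULLs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE qryOrgsite (orgname TEXT, location TEXT, Archive INTEGER)")
conn.execute("CREATE TABLE ltbcoe (orgname TEXT, location TEXT, coe TEXT)")
conn.executemany(
    "INSERT INTO qryOrgsite VALUES (?, ?, ?)",
    [("OrgB", "Plant1", 0), ("OrgB", "Plant2", 0), ("OrgB", "Plant3", 1), ("OrgB", "Plant4", 0)],
)
conn.executemany(
    "INSERT INTO ltbcoe VALUES (?, ?, ?)",
    [("OrgB", "Plant1", "HR"), ("OrgB", "Plant1", "IT"), ("OrgB", "Plant2", "QA"), ("OrgB", "Plant3", "Ops")],
)

JOIN = """FROM qryOrgsite qos
LEFT JOIN ltbcoe c ON qos.orgname = c.orgname AND qos.location = c.location"""

# c.coe in the GROUP BY: one group per department, so each count is 0 or 1.
per_dept = conn.execute(
    f"SELECT qos.location, c.coe, COUNT(c.coe) {JOIN} GROUP BY qos.orgname, qos.location, c.coe"
).fetchall()

# c.coe removed from the GROUP BY: departments counted per site.
per_site = conn.execute(
    f"SELECT qos.location, COUNT(c.coe) {JOIN} GROUP BY qos.orgname, qos.location ORDER BY qos.location"
).fetchall()

# Organization-level total for active sites only (Archive = 0).
active_total = conn.execute(
    f"SELECT qos.orgname, COUNT(c.coe) {JOIN} WHERE qos.Archive = 0 GROUP BY qos.orgname"
).fetchall()
print(per_site, active_total)
```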