how to classify employers into three columns based on a condition?

I want to classify the employers who took the track into three columns, based on the number of days they took to complete the courses. The DB column lrn_complt holds the number of days taken:
No. of emp who completed the track in
0-30 days    30-60 days    60-90 days
1st column   2nd column    3rd column
I need SQL for this; an explanation of the logic would help too.

You'll need to post create table and insert statements for anyone to understand your problem correctly. At the very least, include your input table, data, expected output and target RDBMS.
http://tkyte.blogspot.com/2005/06/how-to-ask-questions.html
Assuming you have two columns like this...
You can try inline queries like below...
Select id,
(select count(*) from courses where days between 0 and 30) days_0_to_30,
(select count(*) from courses where days between 31 and 60) days_31_to_60,
(select count(*) from courses where days between 61 and 90) days_61_to_90
from courses;

Basically, you need to make 3 subqueries inside one master query:
SELECT
(SELECT COUNT(*) FROM EMPLOYER WHERE LRN_COMPLT BETWEEN 0 AND 30) AS COLUMN1,
(SELECT COUNT(*) FROM EMPLOYER WHERE LRN_COMPLT BETWEEN 31 AND 60) AS COLUMN2,
(SELECT COUNT(*) FROM EMPLOYER WHERE LRN_COMPLT BETWEEN 61 AND 90) AS COLUMN3
FROM DUAL

Looks like you need a PIVOT. Conditional aggregation gives you all three counts in one pass over the table:
Select
COUNT(CASE WHEN lrn_complt between 0 and 30 THEN 1 END) Group1,
COUNT(CASE WHEN lrn_complt between 31 and 60 THEN 1 END) Group2,
COUNT(CASE WHEN lrn_complt between 61 and 90 THEN 1 END) Group3
from courses;
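The conditional-aggregation idea above can be tried out quickly with Python's built-in sqlite3 module. This is only a sketch: the employer table and its sample rows are made up for illustration, and only the lrn_complt column name comes from the question.

```python
import sqlite3

# Hypothetical table and data; only the lrn_complt column name is from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employer (id INTEGER PRIMARY KEY, lrn_complt INTEGER)")
conn.executemany(
    "INSERT INTO employer (lrn_complt) VALUES (?)",
    [(5,), (30,), (31,), (45,), (60,), (75,), (120,)],
)

# One pass over the table: COUNT ignores NULLs, so each COUNT only
# sees the rows whose CASE expression produced a value.
bucket_counts = conn.execute(
    """
    SELECT
        COUNT(CASE WHEN lrn_complt BETWEEN 0 AND 30 THEN 1 END)  AS days_0_30,
        COUNT(CASE WHEN lrn_complt BETWEEN 31 AND 60 THEN 1 END) AS days_31_60,
        COUNT(CASE WHEN lrn_complt BETWEEN 61 AND 90 THEN 1 END) AS days_61_90
    FROM employer
    """
).fetchone()

print(bucket_counts)  # (2, 3, 1)
```

The row with 120 days falls outside every bucket, so it is not counted in any column.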

Related

Finding all instances where a foreign key appears multiple times grouped by month

I am not too familiar with SQL, and I have been tasked with something that I quite frankly have no clue how to go about it.
I am just going to simplify the tables to the point where only the necessary fields are taken into consideration.
The tables look as follows.
Submission(course(string), student(foreign_key), date-submitted)
Student(id)
What I need to do is produce a table of active students per month, per course, with a total. An active student is anyone who has more than 4 submissions in the month. I am only looking at specific courses, so I will need to hard-code the values that I need; for the sake of the example, "CourseA" and "CourseB".
The result should be as follows
month | courseA | CourseB | Total
------------------------------------------
03/2020 50 27 77
02/2020 25 12 37
01/2020 43 20 63
Any help would be greatly appreciated.

You can do this with two levels of aggregation: first by month, course and student (while filtering on students having more than 4 submissions), then by month (while pivoting the dataset):
select
month_submitted,
count(*) filter(where course = 'courseA') active_students_in_courseA,
count(*) filter(where course = 'courseB') active_students_in_courseB,
count(*) total
from (
select
date_trunc('month', date_submitted) month_submitted,
course,
student_id,
count(*) no_submissions
from submission
where course in ('courseA', 'courseB')
group by 1, 2, 3
having count(*) > 4
) t
group by 1
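A rough way to check the two-level aggregation is to run it against a few made-up rows in SQLite via Python. Two things differ from the answer and are assumptions of this sketch: SQLite has no date_trunc, so strftime('%Y-%m', ...) stands in for it, and SUM(CASE ...) replaces the FILTER clause for portability. The sample data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE submission (course TEXT, student_id INTEGER, date_submitted TEXT)"
)

rows = []
# Student 1: 5 submissions to courseA in March 2020 (active).
rows += [("courseA", 1, f"2020-03-{d:02d}") for d in range(1, 6)]
# Student 2: 4 submissions to courseA in March 2020 (not active, needs more than 4).
rows += [("courseA", 2, f"2020-03-{d:02d}") for d in range(1, 5)]
# Student 2: 6 submissions to courseB in March 2020 (active).
rows += [("courseB", 2, f"2020-03-{d:02d}") for d in range(10, 16)]
# Student 3: 5 submissions to courseB in February 2020 (active).
rows += [("courseB", 3, f"2020-02-{d:02d}") for d in range(1, 6)]
conn.executemany("INSERT INTO submission VALUES (?, ?, ?)", rows)

active_per_month = conn.execute(
    """
    SELECT
        month_submitted,
        SUM(CASE WHEN course = 'courseA' THEN 1 ELSE 0 END) AS course_a,
        SUM(CASE WHEN course = 'courseB' THEN 1 ELSE 0 END) AS course_b,
        COUNT(*) AS total
    FROM (
        -- Inner level: one row per (month, course, student) with > 4 submissions.
        SELECT strftime('%Y-%m', date_submitted) AS month_submitted,
               course,
               student_id
        FROM submission
        WHERE course IN ('courseA', 'courseB')
        GROUP BY 1, 2, 3
        HAVING COUNT(*) > 4
    ) AS t
    GROUP BY month_submitted
    ORDER BY month_submitted DESC
    """
).fetchall()

print(active_per_month)  # [('2020-03', 1, 1, 2), ('2020-02', 0, 1, 1)]
```

Student 2 appears in both courses in March but only clears the threshold in courseB, which is why the inner query has to group by student before the outer query pivots.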

You could use subqueries with the WITH keyword like this (as written, this counts submissions per course and month; to count only active students, aggregate per student first and keep those with more than 4 submissions):
WITH monthsA AS (
SELECT to_char(date_submitted, 'MM/YYYY') AS month, course, COUNT(*) AS students
FROM Submission
WHERE course = 'courseA'
GROUP BY 1, 2
), monthsB AS (
SELECT to_char(date_submitted, 'MM/YYYY') AS month, course, COUNT(*) AS students
FROM Submission
WHERE course = 'courseB'
GROUP BY 1, 2
)
SELECT ma.month,
COALESCE(ma.students, 0) AS courseA,
COALESCE(mb.students, 0) AS courseB,
COALESCE(ma.students, 0) + COALESCE(mb.students, 0) AS Total
FROM monthsA ma
LEFT JOIN monthsB mb ON ma.month = mb.month
ORDER BY 1 DESC

SQL COUNT with condition and without - using JOIN

My goal is something like following table:
Key | Count since date X | Count total
1 | 4 | 28
With two simple selects I can get these values (the key of the table consists of 3 columns [t$ncmp, t$trav, t$seqn]):
1. SELECT COUNT(*) FROM db.table WHERE t$date >= sysdate-2 GROUP BY t$ncmp, t$trav, t$seqn
2. SELECT COUNT(*) FROM db.table GROUP BY t$ncmp, t$trav, t$seqn
How can I join these statements?
What I tried:
SELECT n.t$trav, COUNT(n.t$trav), m.total FROM db.table n
LEFT JOIN (SELECT t$ncmp, t$trav, t$seqn, COUNT(*) as total FROM db.table
GROUP BY t$ncmp, t$trav, t$seqn) m
ON (n.t$ncmp = m.t$ncmp AND n.t$trav = m.t$trav AND n.t$seqn = m.t$seqn)
WHERE n.t$date >= sysdate-2
GROUP BY n.t$ncmp, n.t$trav, n.t$seqn
I tried different variants, but always got errors like 'group by is missing' or 'unknown qualifier'.
Now this at least executes, but total is always 2.
T$TRAV COUNT(N.T$TRAV) TOTAL
4 2 2
29 3 2
51 1 2
62 2 2
16 1 2
....
If it matters, I will run this as an OPENQUERY from MS SQL Server against an Oracle DB.

I'd try
GROUP BY n.t$trav, m.total
You typically GROUP BY the same columns as you SELECT, except those that are arguments to set functions.

My goal is something like following table:
If so, you seem to want conditional aggregation:
select key, count(*) as total,
sum(case when datecol >= date 'xxxx-xx-xx' then 1 else 0 end) as total_since_x
from t
group by key;
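To see the conditional aggregation produce both counts in one query, here is a small sketch using Python's sqlite3 module. The table t, the column names item_key and datecol, the cutoff date and the rows are all placeholders chosen for this example, not taken from the question's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Placeholder schema: item_key stands in for the (t$ncmp, t$trav, t$seqn) key.
conn.execute("CREATE TABLE t (item_key INTEGER, datecol TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        (1, "2024-01-01"), (1, "2024-01-10"), (1, "2024-02-01"),
        (2, "2024-01-05"), (2, "2024-01-06"),
    ],
)

cutoff = "2024-01-10"  # "date X"; ISO-formatted strings compare correctly as text
counts = conn.execute(
    """
    SELECT item_key,
           SUM(CASE WHEN datecol >= ? THEN 1 ELSE 0 END) AS total_since_x,
           COUNT(*) AS total
    FROM t
    GROUP BY item_key
    ORDER BY item_key
    """,
    (cutoff,),
).fetchall()

print(counts)  # [(1, 2, 3), (2, 0, 2)]
```

Because the date condition sits inside the SUM rather than in a WHERE clause, keys with no recent rows still appear, with a count of 0.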
I'm not sure how this relates to your sample queries. I simply don't see the relationship between that code and your question.

Case Statement for multiple criteria

I would like to ignore some of the results of my query because, for all intents and purposes, some of them are duplicates. Based on the way the request was made, we need to use this hierarchy, and although we are seeing different Company_Name values, we need to ignore one of the results.
Query:
SELECT
COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM
some_table AS A12
GROUP BY
2
ORDER BY
3 ASC, 2 ASC
This code omits half a dozen joins and where clauses that are not germane to this question.
Results:
Customer_Name_Count Company_Name Total_Sales
-------------------------------------------------------------
1 3 Blockbuster 1,000
2 6 Jimmy's Bar 1,500
3 6 Jimmy's Restaurant 1,500
4 9 Impala Hotel 2,000
5 12 Sports Drink 2,500
In the above set, we can see that numbers 2 & 3 have the same count and the same total_sales number and similar company names. Is there a way to create a case statement that takes these 3 factors into consideration and then drops one or the other for Jimmy's enterprises? The other issue is that this has to be variable as there are other instances where this happens. And I would only want this to happen if the count and sales number match each other with a similar name in the company name.
Desired result:
Customer_Name_Count Company_Name Total_Sales
--------------------------------------------------------------
1 3 Blockbuster 1,000
2 6 Jimmy's Bar 1,500
3 9 Impala Hotel 2,000
4 12 Sports Drink 2,500

The other answers look accurate, assuming the Company_IDs are the same for both.
If the Company_IDs are different for Jimmy's Bar and Jimmy's Restaurant, then you can use something like this. I suggest you get functional users involved and do some data clean-up, or else you'll be maintaining this every time the issue arises:
SELECT
COUNT(DISTINCT CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END) AS Customer_Name_Count
,CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END AS Company_Name
,SUM(A12.Total_Sales) AS Total_Sales
FROM some_table AS A12
GROUP BY CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END

Your problem is that the joins you are using are multiplying the number of rows. Somewhere along the way, multiple names are associated with exactly the same entity (which is why the numbers are the same). You can fix this by aggregating by the right id:
SELECT COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
MAX(Company_Name) as Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM some_table AS A12
GROUP BY Company_id -- I'm guessing the column is something like this
ORDER BY 3 ASC, 2 ASC;
This might actually overstate the sales (I don't know). Better would be fixing the join so it only returned one name. One possibility is that it is a type-2 dimension, meaning that there is a time component for values that change over time. You may need to restrict the join to a single time period.

You need a function that returns a common name for the companies, and then use DISTINCT:
SELECT DISTINCT
Customer_Name_Count,
dbo.GetCommonName(Company_Name) as Company_Name,
Total_Sales
FROM dbo.theTable

You can use the ROW_NUMBER window function to number the rows within each (Customer_Name_Count, Total_Sales) pair, then keep only rn = 1:
SELECT * FROM (
SELECT *,ROW_NUMBER() OVER(PARTITION BY Customer_Name_Count,Total_Sales ORDER BY Company_Name) rn
FROM (
SELECT
COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM
some_table AS A12
GROUP BY
Company_Name
)t1
)t2
WHERE rn = 1

Pivot for redshift database

I know this question has been asked before, but none of the answers helped me meet my requirements, so I am asking it in a new thread.
In Redshift, how can I pivot the data into one row per unique dimension set, e.g.:
id Name Category count
8660 Iced Chocolate Coffees 105
8660 Iced Chocolate Milkshakes 10
8662 Old Monk Beer 29
8663 Burger Snacks 18
to
id Name Cofees Milkshakes Beer Snacks
8660 Iced Chocolate 105 10 0 0
8662 Old Monk 0 0 29 0
8663 Burger 0 0 0 18
The categories listed above keep changing.
Redshift does not support the pivot operator, and a case expression would not be of much help (if it would, please suggest how to do it).
How can I achieve this result in Redshift?
(The above is just an example; we would have 1000+ categories, and these categories keep changing.)

I don't think there is an easy way to do that in Redshift.
You also say you have more than 1000 categories and the number is growing,
so you need to take into account the limit of 1600 columns per table;
see the link below:
http://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_usage.html
You can use case, but then you need to create a case for each category:
select id,
name,
sum(case when Category='Coffees' then count end) as Cofees,
sum(case when Category='Milkshakes' then count end) as Milkshakes,
sum(case when Category='Beer' then count end) as Beer,
sum(case when Category='Snacks' then count end) as Snacks
from my_table
group by 1,2
Another option is to load the table into R, for example, where you can use the cast function:
cast(data, name~ category)
and then upload the data back to S3 or Redshift.

We do a lot of pivoting at Ro, so we built a Python-based tool for autogenerating pivot queries. This tool allows for the same basic options as what you'd find in Excel, including specifying aggregation functions as well as whether you want overall aggregates.
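The general idea of generating a pivot query from the current category list can be sketched in a few lines of Python. This is not the tool mentioned above; build_pivot_query, quote_ident and quote_literal are hypothetical helpers written for this example, and SQLite stands in for Redshift so the sketch can run locally with the sample rows from the question.

```python
import sqlite3


def quote_ident(name):
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    """Single-quote a string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_pivot_query(table, categories):
    """Build one SUM(CASE ...) column per category (hypothetical helper)."""
    cases = [
        'SUM(CASE WHEN Category = {lit} THEN "count" ELSE 0 END) AS {col}'.format(
            lit=quote_literal(c), col=quote_ident(c)
        )
        for c in categories
    ]
    return (
        "SELECT id, Name,\n    "
        + ",\n    ".join(cases)
        + "\nFROM " + quote_ident(table)
        + "\nGROUP BY id, Name\nORDER BY id"
    )


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE my_table (id INTEGER, Name TEXT, Category TEXT, "count" INTEGER)')
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        (8660, "Iced Chocolate", "Coffees", 105),
        (8660, "Iced Chocolate", "Milkshakes", 10),
        (8662, "Old Monk", "Beer", 29),
        (8663, "Burger", "Snacks", 18),
    ],
)

# Read the category list at run time, so new categories are picked up automatically.
categories = [
    row[0]
    for row in conn.execute("SELECT DISTINCT Category FROM my_table ORDER BY Category")
]
pivot_sql = build_pivot_query("my_table", categories)
pivot_rows = conn.execute(pivot_sql).fetchall()

print(pivot_sql)
print(pivot_rows)
```

With the sample data, the columns come out in alphabetical category order (Beer, Coffees, Milkshakes, Snacks). The per-table column limit still applies to the generated query, so for 1000+ categories you would likely pivot only the subset you need.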

Redshift released PIVOT/UNPIVOT functionality at re:Invent 2021 (December 2021): https://docs.aws.amazon.com/redshift/latest/dg/r_FROM_clause-pivot-unpivot-examples.html
SELECT *
FROM (SELECT id, Name, Category, count FROM my_table) PIVOT (
SUM(count) FOR Category IN ('Coffees', 'Milkshakes', 'Beer', 'Snacks')
);

If you will typically want to query specific subsets of the categories from the pivot table, a workaround based on the approach linked in the comments might work.
You can populate your "pivot_table" from the original like so:
insert into pivot_table (id, Name, json_cats) (
select id, Name,
'{' || listagg(quote_ident(Category) || ':' || count, ',')
within group (order by Category) || '}' as json_cats
from to_pivot
group by id, Name
)
And access specific categories this way:
select id, Name,
nvl(json_extract_path_text(json_cats, 'Snacks')::int, 0) Snacks,
nvl(json_extract_path_text(json_cats, 'Beer')::int, 0) Beer
from pivot_table
Using varchar(max) for the JSON column type will give 65535 bytes which should be room for a couple thousand categories.

@user3600910 is right about the approach; however, 'END' is required, otherwise a '500310' invalid operation error occurs.
select id,
name,
sum(case when Category='Coffees' then count END) as Cofees,
sum(case when Category='Milkshakes' then count END) as Milkshakes,
sum(case when Category='Beer' then count END) as Beer,
sum(case when Category='Snacks' then count END) as Snacks
from my_table
group by 1,2

The answer given above worked for me after switching count to 1
select id,
name,
sum(case when Category='Coffees' then 1 end) as Cofees,
sum(case when Category='Milkshakes' then 1 end) as Milkshakes,
sum(case when Category='Beer' then 1 end) as Beer,
sum(case when Category='Snacks' then 1 end) as Snacks
from my_table
group by 1,2

Difference in output from two SQL queries

What is the difference between the two SQL queries below other than Query2 returning an additional field? Are there any possible scenarios where the output of the two queries would be different (other than the additional field in Query2)
Query1:
SELECT Field1, COUNT(*)
FROM Table1
GROUP BY Field1
HAVING COUNT(*) > 1
Query2:
SELECT Field1, Field2, COUNT(*)
FROM Table1
GROUP BY Field1, Field2
HAVING COUNT(*) > 1

Absolutely, these are different. Query2's Group By clause specifies an extra field. That means when the results are aggregated, they will be aggregated for the combined unique values of Field1 AND Field2. That is, two records are aggregated if and only if both Field1 and Field2 are equal.
For example:
SELECT Profession, Count(*)
FROM People
GROUP BY Profession
HAVING Count(*) > 1
will return a list of professions with associated counts like:
Software Developer, 10
PM, 5
Tester, 2
whereas:
SELECT Profession, Gender, Count(*)
FROM People
GROUP BY Profession, Gender
HAVING Count(*) > 1
will return a list of professions broken out by gender like:
Software Developer, Male, 5
Software Developer, Female, 5
PM, Male, 3
PM, Female, 2
Tester, Male, 2
Edit with additional requested information:
You can retrieve counts of professions with rows for both genders via:
SELECT Profession, Count(*)
FROM People
GROUP BY Profession
HAVING SUM(case Gender when 'Female' then 1 else 0 end) > 0 AND SUM(case Gender when 'Male' then 1 else 0 end) > 0
It gets a bit hairy (you need subqueries) if you also need the associated gender counts.
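For the simple case of two fixed gender values, one alternative worth trying is to reuse the same conditional sums in the SELECT list, which returns the per-gender counts without a subquery. The sketch below runs this in SQLite through Python; the People rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (Profession TEXT, Gender TEXT)")
# Invented sample data: only Software Developer has rows for both genders.
people = (
    [("Software Developer", "Male")] * 2
    + [("Software Developer", "Female")] * 1
    + [("PM", "Female")] * 2
    + [("Tester", "Male")] * 3
)
conn.executemany("INSERT INTO People VALUES (?, ?)", people)

mixed_professions = conn.execute(
    """
    SELECT Profession,
           SUM(CASE Gender WHEN 'Female' THEN 1 ELSE 0 END) AS females,
           SUM(CASE Gender WHEN 'Male' THEN 1 ELSE 0 END)   AS males,
           COUNT(*) AS total
    FROM People
    GROUP BY Profession
    HAVING SUM(CASE Gender WHEN 'Female' THEN 1 ELSE 0 END) > 0
       AND SUM(CASE Gender WHEN 'Male' THEN 1 ELSE 0 END) > 0
    """
).fetchall()

print(mixed_professions)  # [('Software Developer', 1, 2, 3)]
```

PM and Tester are dropped by the HAVING clause because each has rows for only one gender.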

The extra GROUP BY column in Query2 changes how the rows are grouped. To see how, look at the example below.
Test data:
id name
1 a
2 b
3 a
4 a
When you group by name, SQL first finds the distinct values of name. For the query below, it goes like this:
select name,sum(id)
from test
group by name
--first, find the distinct values of the group by column (here name)
a
b
--next, for each distinct value, collect the rows that fall into that group
a 1 a
4 a
3 a
b 2 b
From the groups above you can now calculate any aggregation per group. In our case it is sum, so the output looks like this:
a 8
b 2
As the output above shows, you can calculate any aggregation per group (here the a and b values), such as count(id), and also use expressions on the grouping column, such as len(name):
select name,len(name),sum(id)
from test
group by name
The same thing happens when you group by an additional field, as below:
select id,name
from
test
group by id,name
In this case, SQL first finds all distinct combinations of id and name:
1 a
2 b
3 a
4 a
The next step is to get the rows that fall into each group:
group by columns --rows that fall into this group
1 a 1 a
2 b 2 b
3 a 3 a
4 a 4 a
Now you can calculate aggregations on the groups above. I hope this helps in visualizing your GROUP BY. Further, HAVING eliminates groups after the GROUP BY phase, while WHERE eliminates rows before the GROUP BY phase.
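The WHERE versus HAVING distinction can be checked against the same test data. This sketch uses Python's sqlite3 module; the filter conditions (id > 1 and COUNT(*) > 1) are arbitrary choices made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "a"), (4, "a")],
)

# WHERE removes rows before grouping: id 1 is dropped, so group 'a' sums 3 + 4.
where_result = conn.execute(
    "SELECT name, SUM(id) FROM test WHERE id > 1 GROUP BY name ORDER BY name"
).fetchall()

# HAVING removes whole groups after grouping: 'b' has a single row and is dropped,
# while group 'a' still sums all three of its rows (1 + 3 + 4).
having_result = conn.execute(
    "SELECT name, SUM(id) FROM test GROUP BY name HAVING COUNT(*) > 1 ORDER BY name"
).fetchall()

print(where_result)   # [('a', 7), ('b', 2)]
print(having_result)  # [('a', 8)]
```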
