Using instructions found here I've tried to create a crosstab query to show historical data from three previous years and I would like to output it in a report.
I've got a few complications that are making this difficult and I'm having trouble getting the data to show correctly.
The query it is based on is structured like this:
EmpID | ReviewYearID | YearName | ReviewDate | SelfRating | ManagerRating | NotSelfRating |
1 | 5 | 2013 | 01/09/2013 | 3.5 | 3.5 | 3.5 |
1 | 6 | 2014 | 01/09/2014 | 2.5 | 2.5 | 2.5 |
1 | 7 | 2015 | 01/09/2015 | 4.5 | 4.5 | 4.5 |
2 | 6 | 2014 | 01/09/2014 | 2.0 | 2.0 | 2.0 |
2 | 7 | 2015 | 01/09/2015 | 2.0 | 2.0 | 2.0 |
3 | 7 | 2015 | 01/09/2015 | 5.0 | 5.0 | 5.0 |
[Edit]: Here is the SQL for the base query. It is combining data from two tables:
SELECT tblEmployeeYear.EmployeeID AS EmpID, tblReviewYear.ID AS ReviewYearID, tblReviewYear.YearName, tblReviewYear.ReviewDate, tblEmployeeYear.SelfRating, tblEmployeeYear.ManagerRating, tblEmployeeYear.NotSelfRating
FROM tblReviewYear INNER JOIN tblEmployeeYear ON tblReviewYear.ID = tblEmployeeYear.ReviewYearID;
[/Edit]
I would like a crosstab query that transposes the columns/rows to show historical data for up to 3 previous years (based on review date) for a specific employee. The end result would look something like this for Employee ID 1:
Year | 2015 | 2014 | 2013 |
SelfRating | 4.5 | 2.5 | 3.5 |
ManagerRating | 4.5 | 2.5 | 3.5 |
NotSelfRating | 4.5 | 2.5 | 3.5 |
Other employees would have fewer columns since they don't have data for previous years.
I'm having issues with filtering it down to a specific employee and sorting the years by their review date (the name isn't always a reliable way to sort them).
In the end I'm looking to use this as the data for a report.
If there is a different way than a crosstab query to accomplish this I would be okay with that as well.
Thanks!
You need a single column that holds all the rating types, not an individual column for each type. If you can't redesign the table, I would suggest creating a new one for this purpose. The query below uses a UNION to add that type column: you create a column and hard-code the value (SelfRating, ManagerRating, etc.):
SELECT * INTO EmployeeRatings
FROM (SELECT tblEmployeeYear.EmployeeId AS EmpId, ReviewYearId, "SelfRating" AS Category, SelfRating AS Score
FROM tblEmployeeYear
WHERE SelfRating Is Not Null
UNION ALL
SELECT tblEmployeeYear.EmployeeId, ReviewYearId, "ManagerRating", ManagerRating
FROM tblEmployeeYear
WHERE ManagerRating Is Not Null
UNION ALL
SELECT tblEmployeeYear.EmployeeId, ReviewYearId, "NotSelfRating", NotSelfRating
FROM tblEmployeeYear
WHERE NotSelfRating Is Not Null) AS T;
Then use the newly created table in place of tblEmployeeYear. Note that I use Year([ReviewDate]) which will return only the year. Also, since it looks like it may be possible to have more than one of each review type per year, I averaged the Score for the year.
TRANSFORM Avg(Score)
SELECT EmpId, Category
FROM (SELECT EmpId, Category, ReviewDate, Score
FROM tblReviewYear
INNER JOIN EmployeeRatings
ON tblReviewYear.ID = EmployeeRatings.ReviewYearID) AS Reviews
GROUP BY EmpId, Category
PIVOT Year([ReviewDate]);
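To cover the two sticking points from the question, filtering to one employee and limiting to the last three review years, the crosstab can take the employee as a parameter and restrict the date range. This is only a sketch: the [WhichEmployee] parameter name and the rolling three-year window are my assumptions about how the report will supply its criteria.
PARAMETERS [WhichEmployee] Long;
TRANSFORM Avg(Score)
SELECT Category
FROM (SELECT EmpId, Category, ReviewDate, Score
FROM tblReviewYear
INNER JOIN EmployeeRatings
ON tblReviewYear.ID = EmployeeRatings.ReviewYearID) AS Reviews
WHERE EmpId = [WhichEmployee]
AND ReviewDate >= DateAdd("yyyy", -3, Date())
GROUP BY Category
PIVOT Year([ReviewDate]);
Because the column headings come from Year([ReviewDate]) rather than YearName, the unreliable-name sorting issue goes away; if you specifically need the columns in descending order, you can pin them with a PIVOT ... IN (...) list, at the cost of hard-coding the years.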
Related
I want to use post and pre revenue of an interaction to calculate net revenue. Sometimes there are multiple customers in an interaction. The data is like:
InteractionID | Customer ID | Pre | Post
--------------+-------------+--------+--------
1 | ab12 | 10 | 30
2 | cd12 | 40 | 15
3 | de12;gh12 | 15;30 | 20;10
The expected output sums the pre and post values within each interaction and calculates the net:
InteractionID | Customer ID | Pre | Post | Net
--------------+---------------+--------+-------+------
1 | ab12 | 10 | 30 | 20
2 | cd12 | 40 | 15 | -25
3 | de12;gh12 | 45 | 30 | -15
How do I get the net revenue column?
The proper solution is to normalize your relational design by adding a separate table for customers and their respective pre and post.
While stuck with the current design, this would do it:
SELECT *, post - pre AS net
FROM (
SELECT interaction_id, customer_id
,(SELECT sum(x::numeric) FROM string_to_table(pre, ';') x) AS pre
,(SELECT sum(x::numeric) FROM string_to_table(post, ';') x) AS post
FROM tbl
) sub;
string_to_table() requires at least Postgres 14.
You did not declare your Postgres version, so I assume the current version, Postgres 14.
For older versions, replace it with regexp_split_to_table() or unnest(string_to_array()).
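On an older version the query might look like this instead, only swapping the split function and otherwise unchanged:
SELECT *, post - pre AS net
FROM (
   SELECT interaction_id, customer_id
        ,(SELECT sum(x::numeric) FROM unnest(string_to_array(pre, ';')) x) AS pre
        ,(SELECT sum(x::numeric) FROM unnest(string_to_array(post, ';')) x) AS post
   FROM tbl
   ) sub;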
I'm using DBISAM to build a table from existing data.
My problem is:
| NAME | COUNT(QTY) | PRICE | ?   |
|------|------------|-------|-----|
| A    | 1          | 9.5   | 9.5 |
| B    | 2          | 12.5  | 25  |
| C    | 5          | 20    | 100 |
My SQL query is SELECT NAME, COUNT(QTY), PRICE FROM ARTICLE.
But I can't find how to get the last column (the one with the ?).
I tried PRICE * COUNT(QTY) but DBISAM can't handle this and says "Invalid use of non-aggregated column".
Do you have an idea how I can get the last column?
Thanks
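I don't know DBISAM's exact limits, but one standard-SQL workaround worth trying is to group on PRICE as well, so it is no longer a non-aggregated column, and let SUM(PRICE) stand in for PRICE * COUNT(QTY). That equivalence is an assumption: it only holds if PRICE is the same on every row for a given NAME and QTY is never null, and it is untested against DBISAM.
SELECT NAME, COUNT(QTY) AS QTY, PRICE, SUM(PRICE) AS TOTAL
FROM ARTICLE
GROUP BY NAME, PRICE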
Apologies if this is a silly question...
I have a table that holds rates for multiple currencies at differing dates, as follows:
+---------+----------+----------+
| FOREKEY | FOREDATE | FORERATE |
+---------+----------+----------+
| 1       | 01/01/16 | 1.5      |
| 2       | 01/01/16 | 1.9      |
| 3       | 01/01/16 | 9.2      |
| 4       | 01/01/16 | 1.0      |
| 2       | 01/02/16 | 1.7      |
| 3       | 01/03/16 | 9.0      |
| 4       | 01/04/16 | 1.1      |
+---------+----------+----------+
I would like to create a query that gives the prevailing currency rate at any of the given dates.
I have tried the query below, but this does not show the FOREKEY 1 rate on 01/03/16, for example.
SELECT
F.FOREDATE,
F.FOREKEY,
F.FORERATE
FROM
FORERATE F
INNER JOIN
(SELECT
MAX(FOREDATE) FOREDATE
FROM
FORERATE
GROUP BY
FOREDATE) FSUB
ON (F.FOREDATE = FSUB.FOREDATE) OR (F.FOREDATE > FSUB.FOREDATE)
Any help gratefully received!
You could use a SELECT MAX() in the where clause like below:
SELECT
F.FOREDATE,
F.FOREKEY,
F.FORERATE
FROM FORERATE F
WHERE F.FOREDATE = (SELECT MAX(FOREDATE)
FROM FORERATE FMD
WHERE F.FOREKEY=FMD.FOREKEY)
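The query above returns only the newest row per key. If what you are after is the prevailing rate for every key at every date in the table (so FOREKEY 1 still shows 1.5 on 01/03/16), one way is to pair each key with each date and pick the latest rate on or before that date. This is a sketch with plain correlated subqueries; you didn't say which database, so the syntax may need minor adjustment:
SELECT D.FOREDATE,
       K.FOREKEY,
       (SELECT F.FORERATE
        FROM FORERATE F
        WHERE F.FOREKEY = K.FOREKEY
          AND F.FOREDATE = (SELECT MAX(F2.FOREDATE)
                            FROM FORERATE F2
                            WHERE F2.FOREKEY = K.FOREKEY
                              AND F2.FOREDATE <= D.FOREDATE)) AS FORERATE
FROM (SELECT DISTINCT FOREDATE FROM FORERATE) D
CROSS JOIN (SELECT DISTINCT FOREKEY FROM FORERATE) K
ORDER BY D.FOREDATE, K.FOREKEY;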
I have data on approximately 1,000 individuals, where each individual can have multiple rows with different dates, and where the columns indicate the program admitted to and a code number.
I need each row to contain a distinct date, so I need to delete the rows with duplicate dates from my table. Where there are multiple rows with the same date, I need to keep the row that has the lowest code number. Where more than one row has both the same date and the same lowest code, I need to keep the row that was in program (prog) B. For example:
| ID | DATE | CODE | PROG|
--------------------------------
| 1 | 1996-08-16 | 24 | A |
| 1 | 1997-06-02 | 123 | A |
| 1 | 1997-06-02 | 123 | B |
| 1 | 1997-06-02 | 211 | B |
| 1 | 1997-08-19 | 67 | A |
| 1 | 1997-08-19 | 23 | A |
So my desired output would look like this:
| ID | DATE | CODE | PROG|
--------------------------------
| 1 | 1996-08-16 | 24 | A |
| 1 | 1997-06-02 | 123 | B |
| 1 | 1997-08-19 | 23 | A |
I'm struggling to come up with a solution to this, so any help greatly appreciated!
Microsoft SQL Server 2012 (X64)
The following works with your test data:
SELECT ID, date, MIN(code), MAX(prog) FROM table
GROUP BY ID, date
You can then use the results of this query to create or populate a new table, or to delete all records not returned by it.
SQLFiddle http://sqlfiddle.com/#!9/0ebb5/5
You can use the min() function:
select ID, DATE, min(CODE), max(PROG)
from table
group by ID, DATE
I assume that your table has a valid primary key. However, I would recommend making ID the primary key. Hope this helps.
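Since this is SQL Server 2012, ROW_NUMBER() is another option: it encodes both rules, lowest CODE first and PROG B preferred on ties, and can delete the duplicates in place. A sketch only, with YourTable standing in for the real table name, which wasn't given:
WITH Ranked AS (
    SELECT ID, [DATE], CODE, PROG,
           ROW_NUMBER() OVER (
               PARTITION BY ID, [DATE]
               ORDER BY CODE ASC,                              -- lowest code wins
                        CASE WHEN PROG = 'B' THEN 0 ELSE 1 END -- prefer PROG B on ties
           ) AS rn
    FROM YourTable
)
DELETE FROM Ranked
WHERE rn > 1;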
Simplified, I've got the following situation: there are two tables, and one migration has multiple checks through checks.migration_id. The column checks.old describes the type of check. Now I want to get, for each migration, the check with the biggest time where old is true (query 1) and where old is false (query 2).
There are about 30,000 migrations and each has around 1,000 checks where old = true and 1,000 checks where old = false. The checks table will keep growing considerably. The order of the checks is not given and could be totally mixed up.
I want to get the latest check for a maximum of 150 migrations at once.
SQL Fiddle: http://sqlfiddle.com/#!15/282ce/15
I'm using PostgreSQL 9.3 and Rails 3.2 (shouldn't matter)
What's the most efficient way to get the latest subrecord where old = true?
Table Migrations:
| ID |
|----|
| 1 |
| 2 |
Table Checks:
| ID | MIGRATION_ID | OLD | OK | TIME |
|----|--------------|-----|----|----------------------------------|
| 1 | 1 | 1 | 1 | September, 22 2014 12:00:01+0000 |
| 2 | 1 | 0 | 1 | September, 22 2014 12:00:02+0000 |
| 3 | 2 | 1 | 1 | September, 22 2014 12:00:01+0000 |
| 4 | 2 | 0 | 1 | September, 22 2014 12:00:02+0000 |
| 5 | 1 | 1 | 1 | September, 22 2014 12:00:03+0000 |
| 6 | 1 | 0 | 1 | September, 22 2014 12:00:04+0000 |
| 7 | 2 | 1 | 1 | September, 22 2014 12:00:03+0000 |
| 8 | 2 | 0 | 1 | September, 22 2014 12:00:04+0000 |
Query 1 should return the following result:
| Migration.id | Check_ID | OLD | OK | TIME |
|--------------|----------|-----|----|----------------------------------|
| 1 | 5 | 1 | 1 | September, 22 2014 12:00:03+0000 |
| 2 | 7 | 1 | 1 | September, 22 2014 12:00:03+0000 |
Query 2 should return the following result:
| Migration.id | Check_ID | OLD | OK | TIME |
|--------------|----------|-----|----|----------------------------------|
| 1 | 6 | 0 | 1 | September, 22 2014 12:00:04+0000 |
| 2 | 8 | 0 | 1 | September, 22 2014 12:00:04+0000 |
I tried to solve it with a max in a subquery, but then I lose the information about checks.ok and checks.time.
SELECT eq.id, (SELECT max(checks.id) FROM checks WHERE checks.migration_id = eq.id and checks.old = 't') AS latest FROM migrations eq;
SELECT eq.id, (SELECT max(checks.id) FROM checks WHERE checks.migration_id = eq.id and checks.old = 'f') AS latest FROM migrations eq;
(I know that I get max(id) instead of max(time).)
In Rails I tried to fetch the latest record for each Migration, which resulted in the N+1 problem. I'm not able to include all Checks because there are way too many of them.
A simple solution with the Postgres specific DISTINCT ON:
Query 1 ("for each migration the check with the biggest time where old is true"):
SELECT DISTINCT ON (migration_id)
migration_id, id AS check_id, old, ok, time
FROM checks
WHERE old
ORDER BY migration_id, time DESC;
Invert the WHERE condition for Query 2:
...
WHERE NOT old
...
Details:
Select first row in each GROUP BY group?
But if you want better read performance with big tables, use JOIN LATERAL (Postgres 9.3+, standard SQL), building on a multicolumn index like:
CREATE INDEX checks_special_idx ON checks(old, migration_id, time DESC);
Query 1:
SELECT m.id AS migration_id
, c.id AS check_id, c.old, c.ok, c.time
FROM migrations m
-- FROM (SELECT id FROM migrations LIMIT 150) m
JOIN LATERAL (
SELECT id, old, ok, time
FROM checks
WHERE migration_id = m.id
AND old
ORDER BY time DESC
LIMIT 1
) c ON TRUE;
Switch the condition on old again for query 2.
For an unspecified "maximum of 150 migrations", use the commented alternative line.
Details:
Optimize GROUP BY query to retrieve latest record per user
Aside: don't use "time" as an identifier. It's a reserved word in standard SQL and a basic type name in Postgres.
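If renaming is an option, one statement gets rid of the problem for good (checked_at is just a placeholder name, pick whatever fits your schema):
ALTER TABLE checks RENAME COLUMN "time" TO checked_at;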