Creating a Postgres crosstab query and calculating difference between columns - sql

Tni analysis
I have data in the format
Patid | TNT | date
A123 | 1.2 | 23/1/2012
A123 | 1.3 | 23/1/2012
B123 | 2.6 | 24/7/2011
B123 | 2.7 | 24/7/2011
And I would like to be able to calculate the difference between two rows like so
rowid | TNT-1 | TNT-2 | difference
A123 | 1.2 | 1.3 | 0.1
B123 | 2.6 | 2.7 | 0.1
Etc
I presume this is a use case for the crosstab function in Postgres, but I am struggling to get results. Any help is greatly appreciated.

You can pivot by hand and then take the difference (assuming you always have 2 records for each Patid and you don't have to take date into account):
with cte1 as (
select
Patid, TNT, date, row_number() over(partition by Patid order by TNT) as rn
from Table1
), cte2 as (
select
Patid,
max(case when rn = 1 then TNT end) as "TNT-1",
max(case when rn = 2 then TNT end) as "TNT-2"
from cte1
group by Patid
)
select
Patid as rowid, "TNT-1", "TNT-2", "TNT-2" - "TNT-1" as difference
from cte2
-------------------------------------
ROWID TNT-1 TNT-2 DIFFERENCE
A123 1.2 1.3 0.1
B123 2.6 2.7 0.1
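For anyone who wants to try the manual pivot without a Postgres instance, here is a minimal sketch of the same idea. It assumes an in-memory SQLite database (via Python's sqlite3, SQLite 3.25 or later for window functions) as a stand-in for Postgres, ISO-formatted dates, and underscore column names (tnt_1, tnt_2) chosen only for this illustration. ROUND is added because the subtraction is done in floating point.

```python
import sqlite3

# In-memory SQLite stands in for Postgres (an assumption made for portability).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (patid TEXT, tnt REAL, date TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        ("A123", 1.2, "2012-01-23"),
        ("A123", 1.3, "2012-01-23"),
        ("B123", 2.6, "2011-07-24"),
        ("B123", 2.7, "2011-07-24"),
    ],
)

query = """
WITH numbered AS (
    -- Number the readings of each patient from lowest to highest TNT.
    SELECT patid, tnt,
           ROW_NUMBER() OVER (PARTITION BY patid ORDER BY tnt) AS rn
    FROM table1
)
SELECT patid,
       MAX(CASE WHEN rn = 1 THEN tnt END) AS tnt_1,
       MAX(CASE WHEN rn = 2 THEN tnt END) AS tnt_2,
       ROUND(MAX(CASE WHEN rn = 2 THEN tnt END)
             - MAX(CASE WHEN rn = 1 THEN tnt END), 2) AS difference
FROM numbered
GROUP BY patid
ORDER BY patid
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Like the query above, this relies on exactly two rows per Patid; a third reading would simply be ignored by the two CASE expressions.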

Related

Group by and fetch column that is not in group by clause

I have (sample) data:
equipment_id | node_id | value (type: jsonb)
------------------------------
1 | 1 | 0.3
1 | 2 | 0.4
2 | 3 | 0.7
2 | 4 | 0.6
2 | 5 | 0.7
And I want to get the rows that have the max value within the same equipment_id:
equipment_id | node_id | value
------------------------------
1 | 2 | 0.4
2 | 3 | 0.7
2 | 5 | 0.7
There is a query that does what I want, but I'm afraid of performance degradation because of casting jsonb to float:
with cte as (
select
equipment_id,
max(value::text::float) as val
from metrics
group by equipment_id
)
select cte.equipment_id, m.node_id, cte.val
from cte
join metrics m on cte.equipment_id = m.equipment_id and cte.val = m.value::text::float
How can I avoid casting?
Use distinct on:
select distinct on (equipment_id) m.*
from metrics m
order by equipment_id, value desc;
If your value is actually stored as a string, then use:
order by equipment_id, value::numeric desc;
You can use row_number()
select * from
(
select *, row_number() over(partition by equipment_id order by value::text::float desc) as rn
from tablename
)A where rn=1
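Note that both distinct on and row_number() return exactly one row per equipment_id, whereas the desired output in the question keeps both tied nodes (3 and 5) for equipment 2. A minimal sketch, assuming ties should be kept, swaps in rank() instead. It runs against an in-memory SQLite database through Python's sqlite3 (SQLite 3.25 or later) only so that it is self-contained; the value column is a plain REAL here rather than jsonb, which is an assumption.

```python
import sqlite3

# SQLite stand-in for the Postgres table; value is REAL, not jsonb (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metrics (equipment_id INTEGER, node_id INTEGER, value REAL)"
)
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?, ?)",
    [(1, 1, 0.3), (1, 2, 0.4), (2, 3, 0.7), (2, 4, 0.6), (2, 5, 0.7)],
)

query = """
SELECT equipment_id, node_id, value
FROM (
    -- RANK() gives every row tied for the maximum the same rank of 1.
    SELECT m.*,
           RANK() OVER (PARTITION BY equipment_id ORDER BY value DESC) AS rnk
    FROM metrics m
) ranked
WHERE rnk = 1
ORDER BY equipment_id, node_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

In Postgres the ORDER BY inside the window would still need whatever cast the jsonb column requires, so this changes the tie handling, not the casting cost.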

Get MAX and MIN in a row SQL

;WITH CTE AS
(
SELECT * FROM
(
SELECT CandidateID, t_Candidate.Name, ISNULL(CAST(AVG(Rate) AS DECIMAL(12,2)),0) AS Rate, t_Ambassadors.Name AS CN
FROM t_Vote INNER JOIN t_Candidate
ON t_Vote.CandidateID = t_Candidate.ID
INNER JOIN t_Ambassadors
ON t_Vote.AmbassadorID = t_Ambassadors.ID
GROUP BY Rate, CandidateID, t_Candidate.Name, t_Ambassadors.Name
)MySrc
PIVOT
(
AVG(Rate)
FOR CN IN ([Jean],[Anna],[Felicia])
)AS nSrc
)SELECT CandidateID, Name, CAST([Jean] AS DECIMAL(12,2)) AS AHH ,CAST([Anna] AS DECIMAL(12,2)) AS MK,CAST([Felicia] AS DECIMAL(12,2)) AS DIL, CAST(([Jean] + [Anna] + [Felicia])/3 AS DECIMAL(12,2)) AS Total
FROM CTE
GROUP BY Cte.CandidateID, cte.Name, cte.[Jean], cte.[Anna], cte.[Felicia]
I solved my previous problem with the query above. I am posting a new question because I have a new problem: how do I get the MAX and MIN rate in a row?
The following is the result I get from the above query:
| CandidateID | Name | AHH | MK | DIL | Total |
|-------------|------|-------|------|------|-------|
| CID1 | Jay | 7.00 | 3.00 | 3.00 | 4.33 |
| CID2 | Mia | 2.00 | 9.00 | 7.00 | 6.00 |
What I want to achieve is this:
| CandidateID | Name | AHH | MK | DIL | Total |
|-------------|------|-------|------|------|-------|
| CID1 | Jay | 7.00 | 3.00 | 3.00 | 3.00 |
| CID2 | Mia | 2.00 | 9.00 | 7.00 | 7.00 |
In the second result, the highest and lowest rates are removed from each row, and Total is the average of the remaining rates. AHH, MK and DIL are not the only voters; there are 14 of them. I show only the first three to keep the example short and clear.
I believe you're looking for something like the following (though I'm using case aggregation rather than a pivot).
Essentially, it does the same thing your query does except that it uses a row number to figure out the highest and lowest and exclude them from the final "total" (in the case of a tie, it'll just select one of them, but you can use RANK() instead of row_number() if you don't want to include tied highest/lowest in the average):
WITH CTE AS
(
SELECT CandidateID,
Name,
CN,
Rate,
Lowest = ROW_NUMBER() OVER (PARTITION BY CandidateID, Name ORDER BY Rate),
Highest = ROW_NUMBER() OVER (PARTITION BY CandidateID, Name ORDER BY Rate DESC)
FROM
(
SELECT CandidateID,
t_Candidate.Name,
CN = t_Ambassadors.Name,
Rate = ISNULL(CAST(AVG(Rate) AS DECIMAL(12,2)),0)
FROM t_Vote
JOIN t_Candidate
ON t_Vote.CandidateID = t_Candidate.ID
JOIN t_Ambassadors
ON t_Vote.AmbassadorID = t_Ambassadors.ID
GROUP BY CandidateID, t_Candidate.Name, t_Ambassadors.Name
) AS T
)
SELECT CandidateID,
Name,
AHH = MAX(CASE WHEN CN = 'Jean' THEN Rate END),
MK = MAX(CASE WHEN CN = 'Anna' THEN Rate END),
DIL = MAX(CASE WHEN CN = 'Felicia' THEN Rate END), -- and so on and so forth for each CN
Total = AVG(CASE WHEN Lowest != 1 AND Highest != 1 THEN Rate END)
FROM CTE
GROUP BY CandidateID, Name;
EDIT: It is possible to do this using PIVOT, but unless I'm mistaken, it becomes a matter of working out the average of the ones that aren't highest and lowest before pivoting, which becomes a bit more convoluted. It's all around easier to use case aggregation, IMO.
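To check the trimmed average against the sample rows from the question, here is a minimal sketch of the same case-aggregation idea. It assumes the per-voter averages are already in a single hypothetical table named rates (so the three-table join is skipped) and uses an in-memory SQLite database via Python's sqlite3 (SQLite 3.25 or later) instead of SQL Server.

```python
import sqlite3

# Hypothetical pre-aggregated table: one rate per candidate and voter.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rates (candidate_id TEXT, name TEXT, cn TEXT, rate REAL)"
)
conn.executemany(
    "INSERT INTO rates VALUES (?, ?, ?, ?)",
    [
        ("CID1", "Jay", "Jean", 7.0),
        ("CID1", "Jay", "Anna", 3.0),
        ("CID1", "Jay", "Felicia", 3.0),
        ("CID2", "Mia", "Jean", 2.0),
        ("CID2", "Mia", "Anna", 9.0),
        ("CID2", "Mia", "Felicia", 7.0),
    ],
)

query = """
WITH ranked AS (
    -- Number each candidate's rates from both ends.
    SELECT candidate_id, name, cn, rate,
           ROW_NUMBER() OVER (PARTITION BY candidate_id ORDER BY rate) AS lowest,
           ROW_NUMBER() OVER (PARTITION BY candidate_id ORDER BY rate DESC) AS highest
    FROM rates
)
SELECT candidate_id, name,
       MAX(CASE WHEN cn = 'Jean' THEN rate END) AS ahh,
       MAX(CASE WHEN cn = 'Anna' THEN rate END) AS mk,
       MAX(CASE WHEN cn = 'Felicia' THEN rate END) AS dil,
       -- Skip exactly one lowest and one highest rate per candidate.
       AVG(CASE WHEN lowest <> 1 AND highest <> 1 THEN rate END) AS total
FROM ranked
GROUP BY candidate_id, name
ORDER BY candidate_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

For Jay, one of the two tied 3.00 rates is dropped as the lowest and 7.00 as the highest, which leaves the other 3.00 as the total, matching the desired output.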

Find last version (major, minor) of some information

I have a table with versioned information for several companies.
|Useful info | major | minor | week_id | company_id |
---------------------------------------|------------|
|************| 1 | 0 | 2015_01 | 1 |
|************| 1 | 1 | 2015_01 | 1 |
|************| 2 | 0 | 2015_01 | 1 |
|************| 1 | 0 | 2015_01 | 2 |
|************| 1 | 1 | 2015_01 | 2 |
So, for each week, I need to get the information corresponding to the last version (max (major, minor))
I tried :
select * from my_table
where (major, minor) = max(major, minor)
group by company_id, week_id
It did not work because max() is not supposed to take several arguments.
So I decided to change (major, minor) to 100 * major + minor. I tried :
select * from my_table
where (company_id, week_id, 100 * major + minor) in
(
select sec_semaine_cinema_id, cpx_complexe_id, max(100 * dlo_version_major + dlo_version_minor)
from demande_log_dlo
group by sec_semaine_cinema_id, cpx_complexe_id
)
This works! But: It will obviously force a full scan.
Do you have a better solution?
(I am using Postgresql 9.3)
You can do this easily using distinct on:
select distinct on (company_id, week_id) t.*
from my_table t
order by company_id, week_id, major desc, minor desc;
If you prefer to use more standard SQL, use row_number():
select t.*
from (select t.*,
row_number() over (partition by company_id, week_id order by major desc, minor desc) as seqnum
from my_table t
) t
where seqnum = 1;
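Ordering by major and minor separately also sidesteps a limitation of the 100 * major + minor encoding, which can misorder versions once minor goes above 99. A minimal sketch, assuming an in-memory SQLite database via Python's sqlite3 (SQLite 3.25 or later) in place of PostgreSQL, with the question's rows plus one hypothetical row (version 1.150) added only to show the difference:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table "
    "(info TEXT, major INTEGER, minor INTEGER, week_id TEXT, company_id INTEGER)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?, ?)",
    [
        ("a", 1, 0, "2015_01", 1),
        ("b", 1, 1, "2015_01", 1),
        ("c", 2, 0, "2015_01", 1),
        ("d", 1, 0, "2015_01", 2),
        ("e", 1, 1, "2015_01", 2),
        # Hypothetical row, not in the question: a minor version above 99.
        ("f", 1, 150, "2015_01", 1),
    ],
)

# Latest version per (company, week), comparing major first and then minor.
by_columns = conn.execute("""
    SELECT company_id, week_id, major, minor
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY company_id, week_id
                                  ORDER BY major DESC, minor DESC) AS seqnum
        FROM my_table t
    ) ranked
    WHERE seqnum = 1
    ORDER BY company_id
""").fetchall()

# Same question answered with the 100 * major + minor encoding.
by_encoding = conn.execute("""
    SELECT company_id, week_id, major, minor
    FROM my_table
    WHERE (company_id, week_id, 100 * major + minor) IN (
        SELECT company_id, week_id, MAX(100 * major + minor)
        FROM my_table
        GROUP BY company_id, week_id
    )
    ORDER BY company_id
""").fetchall()

print(by_columns)
print(by_encoding)
```

With the hypothetical row present, the column-wise ordering still picks 2.0 for company 1, while the encoded comparison picks 1.150 because 250 is greater than 200. If minor never exceeds 99 in your data, the two approaches agree.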

SQL Query: get the unique id/date combos based on latest dates - need speed improvement

Not sure how to title or ask this really. Say I am getting a result set like this on a join of two tables, one contains the Id (C), the other contains the Rating and CreatedDate (R) with a foreign key to the first table:
-----------------------------------
| C.Id | R.Rating | R.CreatedDate |
-----------------------------------
| 2 | 5 | 12/08/1981 |
| 2 | 3 | 01/01/2001 |
| 5 | 1 | 11/11/2011 |
| 5 | 2 | 10/10/2010 |
I want this result set (the newest ones only):
-----------------------------------
| C.Id | R.Rating | R.CreatedDate |
-----------------------------------
| 2 | 3 | 01/01/2001 |
| 5 | 1 | 11/11/2011 |
This is a very large data set, and my methods (I won't mention which, so there is no bias) are very slow. Any ideas on how to get this set? It doesn't necessarily have to be a single query; this is in a stored procedure.
Thank you!
You need a CTE with a ROW_NUMBER():
WITH CTE AS (
SELECT ID, Rating, CreatedDate, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY CreatedDate DESC) RowID
FROM [TABLESWITHJOIN]
)
SELECT *
FROM CTE
WHERE RowID = 1;
You can use row_number():
select t.*
from (select t.*,
row_number() over (partition by id order by createddate desc) as seqnum
from table t
) t
where seqnum = 1;
If you are using SQL Server 2008 or later, you should consider using windowing functions. For example:
select ID, Rating, CreatedDate from (
select ID, Rating, CreatedDate,
rowseq=ROW_NUMBER() over (partition by ID order by CreatedDate desc)
from MyTable
) x
where rowseq = 1
Also, please understand that while this is an efficient query in and of itself, your overall performance depends even more heavily on the underlying tables and, in particular, the indexes and explain plans that are used when joining the tables in the first place, etc.

Selecting Top 1 for Every ID

I have the following table:
| ID | ExecOrd | date |
| 1 | 1.0 | 3/4/2014|
| 1 | 2.0 | 7/7/2014|
| 1 | 3.0 | 8/8/2014|
| 2 | 1.0 | 8/4/2013|
| 2 | 2.0 |12/2/2013|
| 2 | 3.0 | 1/3/2014|
| 2 | 4.0 | |
I need to get the date of the top ExecOrd per ID of about 8000 records, and so far I can only do it for one ID:
SELECT TOP 1 date
FROM TABLE
WHERE DATE IS NOT NULL and ID = '1'
ORDER BY ExecOrd DESC
A little help would be appreciated. I have been trying to find a similar question to mine with no success.
There are several ways of doing this. A generic approach is to join the table back to itself using max():
select t.date
from yourtable t
join (select max(execord) execord, id
from yourtable
group by id
) t2 on t.id = t2.id and t.execord = t2.execord
If you're using 2005+, I prefer to use row_number():
select date
from (
select row_number() over (partition by id order by execord desc) rn, date
from yourtable
) t
where rn = 1;
Note: they will give different results if ties exist.
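The note about ties can be made concrete with a small sketch. It assumes an in-memory SQLite database via Python's sqlite3 (SQLite 3.25 or later) rather than SQL Server, reads the question's dates as month/day/year and rewrites them in ISO form, leaves out the row with the empty date, and adds one hypothetical duplicate of the top ExecOrd for ID 1 to force a tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER, execord REAL, date TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [
        (1, 1.0, "2014-03-04"),
        (1, 2.0, "2014-07-07"),
        (1, 3.0, "2014-08-08"),
        # Hypothetical row, not in the question: ties with the top execord.
        (1, 3.0, "2014-09-09"),
        (2, 1.0, "2013-08-04"),
        (2, 2.0, "2013-12-02"),
        (2, 3.0, "2014-01-03"),
    ],
)

# Join back to the per-id maximum: every row tied for the maximum is returned.
joined = conn.execute("""
    SELECT t.id, t.date
    FROM yourtable t
    JOIN (SELECT id, MAX(execord) AS execord
          FROM yourtable
          GROUP BY id) t2
      ON t.id = t2.id AND t.execord = t2.execord
    ORDER BY t.id, t.date
""").fetchall()

# row_number(): exactly one row per id, chosen arbitrarily among the ties.
numbered = conn.execute("""
    SELECT id, date
    FROM (
        SELECT id, date,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY execord DESC) AS rn
        FROM yourtable
    ) t
    WHERE rn = 1
    ORDER BY id
""").fetchall()

print(joined)    # three rows: both tied rows for id 1, one row for id 2
print(numbered)  # two rows: one per id
```

Adding a tie-breaker to the window's ORDER BY (for example, date DESC) makes the row_number() result deterministic.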
;with cte as (
SELECT id, row_number() over(partition by ID order by ExecOrd DESC) r
FROM TABLE WHERE DATE IS NOT NULL )
select id from
cte where r=1