Crosstab transpose query request - sql

Using Postgres 9.3.4, I've got this table:
create table tbl1(country_code text, metric1 int, metric2 int, metric3 int);
insert into tbl1 values('us', 10, 20, 30);
insert into tbl1 values('uk', 11, 21, 31);
insert into tbl1 values('fr', 12, 22, 32);
I need a crosstab query to convert it to this:
create table tbl1(metric text, us int, uk int, fr int);
insert into tbl1 values('metric1', 10, 11, 12);
insert into tbl1 values('metric2', 20, 21, 22);
insert into tbl1 values('metric3', 30, 31, 32);
As an added bonus, I'd love a rollup:
create table tbl1(metric text, total int, us int, uk int, fr int);
insert into tbl1 values('metric1', 33, 10, 11, 12);
insert into tbl1 values('metric2', 63, 20, 21, 22);
insert into tbl1 values('metric3', 93, 30, 31, 32);
I'm done staring at the crosstab spec. I have it written with CASE statements, but it's mad unruly and long, so can someone who's fluent in crosstab please whip up a quick query so I can move on?

The special difficulty is that your data is not ready for cross tabulation. You need data in the form row_name, category, value. You can get that with a UNION query:
SELECT 'metric1' AS metric, country_code, metric1 FROM tbl1
UNION ALL
SELECT 'metric2' AS metric, country_code, metric2 FROM tbl1
UNION ALL
SELECT 'metric3' AS metric, country_code, metric3 FROM tbl1
ORDER BY 1, 2 DESC;
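To see the (row_name, category, value) shape this UNION query produces, here is a small sketch. It runs the same SQL against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for Postgres here, so the sketch does not exercise crosstab() itself.

```python
import sqlite3

# Stand-in engine: in-memory SQLite instead of Postgres 9.3.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl1 (country_code TEXT, metric1 INT, metric2 INT, metric3 INT);
    INSERT INTO tbl1 VALUES ('us', 10, 20, 30), ('uk', 11, 21, 31), ('fr', 12, 22, 32);
""")

# One SELECT per metric column: each wide row becomes three narrow rows.
long_rows = conn.execute("""
    SELECT 'metric1' AS metric, country_code, metric1 FROM tbl1
    UNION ALL
    SELECT 'metric2', country_code, metric2 FROM tbl1
    UNION ALL
    SELECT 'metric3', country_code, metric3 FROM tbl1
    ORDER BY 1, 2 DESC
""").fetchall()

for row in long_rows:
    print(row)  # ('metric1', 'us', 10), ('metric1', 'uk', 11), ...
```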
But a smart LATERAL query only needs a single table scan and will be faster:
SELECT x.metric, t.country_code, x.val
FROM tbl1 t
, LATERAL (VALUES
('metric1', metric1)
, ('metric2', metric2)
, ('metric3', metric3)
) x(metric, val)
ORDER BY 1, 2 DESC;
Related:
What is the difference between LATERAL JOIN and a subquery in PostgreSQL?
SELECT DISTINCT on multiple columns
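Using the simple form of crosstab() with 1 parameter, with this query as input (crosstab() is part of the additional module tablefunc, installed once per database with CREATE EXTENSION tablefunc;):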
SELECT * FROM crosstab(
$$
SELECT x.metric, t.country_code, x.val
FROM tbl1 t
, LATERAL (
VALUES
('metric1', metric1)
, ('metric2', metric2)
, ('metric3', metric3)
) x(metric, val)
ORDER BY 1, 2 DESC
$$
) AS ct (metric text, us int, uk int, fr int);
List country names in alphabetically descending order (like in your demo).
This also assumes all metrics are defined NOT NULL.
If one or both are not the case, use the 2-parameter form instead:
PostgreSQL Crosstab Query
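These caveats exist because the 1-parameter form fills columns by position. The sketch below models that behaviour in Python, based on my reading of the tablefunc documentation: there is one output row per consecutive group of equal row_name values, values are placed left to right, surplus rows are skipped, and missing columns are padded with NULL. `simple_crosstab` is a hypothetical helper written for illustration. It is not the real implementation.

```python
from itertools import groupby


def simple_crosstab(rows, n_value_cols):
    """Model of crosstab(text): rows are (row_name, category, value) tuples,
    already sorted like the source query's ORDER BY. The category only
    matters through that sort order; values are placed by position."""
    result = []
    # One output row per *consecutive* run of identical row_name values.
    for row_name, group in groupby(rows, key=lambda r: r[0]):
        values = [value for _, _, value in group]
        # Drop surplus values, pad missing ones with None (NULL).
        values = values[:n_value_cols] + [None] * (n_value_cols - len(values))
        result.append((row_name, *values))
    return result


complete = [
    ("metric1", "us", 10), ("metric1", "uk", 11), ("metric1", "fr", 12),
    ("metric2", "us", 20), ("metric2", "uk", 21), ("metric2", "fr", 22),
]
print(simple_crosstab(complete, 3))
# [('metric1', 10, 11, 12), ('metric2', 20, 21, 22)]

# If the ('metric1', 'uk') row is missing, fr's value slides into the uk column.
missing_uk = [r for r in complete if r[:2] != ("metric1", "uk")]
print(simple_crosstab(missing_uk, 3))
# [('metric1', 10, 12, None), ('metric2', 20, 21, 22)]
```

The second call shows why incomplete data calls for the 2-parameter form, which matches values to columns by category instead of by position.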
Add "rollup"
I.e. totals per metric:
SELECT * FROM crosstab(
$$
SELECT x.metric, t.country_code, x.val
FROM (
TABLE tbl1
UNION ALL
SELECT 'zzz_total', sum(metric1)::int, sum(metric2)::int, sum(metric3)::int -- etc.
FROM tbl1
) t
, LATERAL (
VALUES
('metric1', metric1)
, ('metric2', metric2)
, ('metric3', metric3)
) x(metric, val)
ORDER BY 1, 2 DESC
$$
) AS ct (metric text, total int, us int, uk int, fr int);
'zzz_total' is an arbitrary label that must sort last alphabetically. That way it comes first under ORDER BY 1, 2 DESC and fills the total column (otherwise you need the 2-parameter form of crosstab()).
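One way to cross-check the expected rollup output is a small sketch that runs on SQLite via Python's sqlite3 module. SQLite has no crosstab(), so this swaps in conditional aggregation (the CASE approach from the question) over the unpivoted rows.

```python
import sqlite3

# Stand-in engine: SQLite, which lacks crosstab(), so CASE pivoting is used.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl1 (country_code TEXT, metric1 INT, metric2 INT, metric3 INT);
    INSERT INTO tbl1 VALUES ('us', 10, 20, 30), ('uk', 11, 21, 31), ('fr', 12, 22, 32);
""")

rows = conn.execute("""
    WITH long_t AS (                      -- unpivot: (metric, country_code, val)
        SELECT 'metric1' AS metric, country_code, metric1 AS val FROM tbl1
        UNION ALL SELECT 'metric2', country_code, metric2 FROM tbl1
        UNION ALL SELECT 'metric3', country_code, metric3 FROM tbl1
    )
    SELECT metric,
           SUM(val)                                        AS total,  -- the rollup
           SUM(CASE WHEN country_code = 'us' THEN val END) AS us,
           SUM(CASE WHEN country_code = 'uk' THEN val END) AS uk,
           SUM(CASE WHEN country_code = 'fr' THEN val END) AS fr
    FROM long_t
    GROUP BY metric
    ORDER BY metric
""").fetchall()

for row in rows:
    print(row)
# ('metric1', 33, 10, 11, 12)
# ('metric2', 63, 20, 21, 22)
# ('metric3', 93, 30, 31, 32)
```

Because each output column is named explicitly here, no sort trick like 'zzz_total' is needed. The price is one CASE per country, which is exactly the verbosity crosstab() avoids.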
If you have lots of metric columns, you might want to build the query string dynamically. Related:
How to perform the same aggregation on every column, without listing the columns?
Executing queries dynamically in PL/pgSQL
Also note that Postgres 9.5 introduced dedicated SQL syntax for this: ROLLUP (along with CUBE and GROUPING SETS) in the GROUP BY clause.
Related:
Spatial query on large table with multiple self joins performing slow

Related

Re-format table, placing multiple column headers as rows

I have a table of fishing catches, showing the number of fish and the total kg for each fishing day. The current format of the data is shown below.
In the other reference table is a list of all the official fish species with codes and names.
How can I re-format the first table so that the rows are repeated for each day, with each row showing one species and its corresponding total catch (n) and kg? Instead of every species having its own n and kg columns, I would have the species in rows, with only one n column and one kg column. I am thinking of looping through the list of all species and duplicating the rows so that each row gets the right n and kg values for its species. This is the final format I need. My database is SQL Server.
You may use a union query here:
SELECT Day, 'Albacore' AS Species, ALB_n AS n, ALB_kg AS kg FROM yourTable
UNION ALL
SELECT Day, 'Big eye tuna', BET_n, BET_kg FROM yourTable
UNION ALL
SELECT Day, 'Sword fish', SWO_n, SWO_kg FROM yourTable
ORDER BY Day, Species;
You can also use a cross apply here, e.g.:
/*
* Data setup...
*/
create table dbo.Source (
Day int,
ALB_n int,
ALB_kg int,
BET_n int,
BET_kg int,
SWO_n int,
SWO_kg int
);
insert dbo.Source (Day, ALB_n, ALB_kg, BET_n, BET_kg, SWO_n, SWO_kg) values
(1, 10, 120, 4, 60, 2, 55),
(2, 15, 170, 8, 100, 1, 30);
create table dbo.Species (
Sp_id int,
Sp_name nvarchar(20)
);
insert dbo.Species (Sp_id, Sp_name) values
(1, N'Albacore'),
(2, N'Big eye tuna'),
(3, N'Sword fish');
/*
* Unpivot data using cross apply...
*/
select Day, Sp_name as Species, n, kg
from dbo.Source
cross apply dbo.Species
cross apply (
select
case
when Sp_name=N'Albacore' then ALB_n
when Sp_name=N'Big eye tuna' then BET_n
when Sp_name=N'Sword fish' then SWO_n
else null end as n,
case
when Sp_name=N'Albacore' then ALB_kg
when Sp_name=N'Big eye tuna' then BET_kg
when Sp_name=N'Sword fish' then SWO_kg
else null end as kg
) unpivotted (n, kg);
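The same idea can be tried outside SQL Server: pair every day with every row of the species reference table, then let CASE pick that species' columns. The sketch below runs in SQLite through Python's sqlite3 module. SQLite has no CROSS APPLY, so a plain CROSS JOIN with the CASE expressions in the select list replaces it.

```python
import sqlite3

# Stand-in engine: SQLite (no CROSS APPLY, so CROSS JOIN + CASE instead).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Source (Day INT, ALB_n INT, ALB_kg INT, BET_n INT, BET_kg INT,
                         SWO_n INT, SWO_kg INT);
    INSERT INTO Source VALUES (1, 10, 120, 4, 60, 2, 55), (2, 15, 170, 8, 100, 1, 30);
    CREATE TABLE Species (Sp_id INT, Sp_name TEXT);
    INSERT INTO Species VALUES (1, 'Albacore'), (2, 'Big eye tuna'), (3, 'Sword fish');
""")

# Every (day, species) pair; CASE maps the species name to its n/kg columns.
rows = conn.execute("""
    SELECT s.Day, sp.Sp_name AS Species,
           CASE sp.Sp_name WHEN 'Albacore'     THEN s.ALB_n
                           WHEN 'Big eye tuna' THEN s.BET_n
                           WHEN 'Sword fish'   THEN s.SWO_n END AS n,
           CASE sp.Sp_name WHEN 'Albacore'     THEN s.ALB_kg
                           WHEN 'Big eye tuna' THEN s.BET_kg
                           WHEN 'Sword fish'   THEN s.SWO_kg END AS kg
    FROM Source AS s
    CROSS JOIN Species AS sp
    ORDER BY s.Day, sp.Sp_id
""").fetchall()

for row in rows:
    print(row)  # (1, 'Albacore', 10, 120), (1, 'Big eye tuna', 4, 60), ...
```

A species that exists in the reference table but has no matching CASE branch would come out with NULL n and kg, just as with the else null branches in the CROSS APPLY version.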

Insert Multiple Rows SQL Teradata

I am creating a volatile table and trying to insert rows to the table. I can upload one row like below...
create volatile table Example
(
ProductID VARCHAR(15),
Price DECIMAL (15,2)
)
on commit preserve rows;
et;
INSERT INTO Example
Values
('Steve',4);
However, when I try to upload multiple I get the error:
"Syntax error: expected something between ')' and ','."
INSERT INTO Example
Values
('Steve',4),
('James',8);
As Gordon said, Teradata doesn't support VALUES with multiple rows (and the UNION ALL approach will fail because of the missing FROM).
You can utilize a Multi Statement Request (MSR) instead:
INSERT INTO Example Values('Steve',4)
;INSERT INTO Example Values('James',8)
;
If it's a BTEQ job, the inserts are submitted as one block after the final semicolon (when a new command starts on the same line, it's part of the MSR). In SQL Assistant or Studio, you must submit it using F9 instead of F5.
I don't think Teradata supports the multiple row values syntax. Just use select:
INSERT INTO Example(ProductId, Price)
WITH dual as (SELECT 1 as x)
SELECT 'Steve' as ProductId, 4 as Price FROM dual UNION ALL
SELECT 'James' as ProductId, 8 as Price FROM dual;
CTE syntax (working):
insert into target_table1 (col1, col2)
with cte as (select 1 col1)
select 'value1', 'value2' from cte
union all
select 'value1a', 'value2a' from cte
;
CTE Syntax not working in Teradata
(error: expected something between ")" and the "insert" keyword)
with cte as (select 1 col1)
insert into target_table1 (col1, col2)
select 'value1', 'value2' from cte
union all
select 'value1a', 'value2a' from cte
;
I found a solution for this via RECURSIVE. It goes like this:
INSERT INTO table (col1, col2)
with recursive table (col1, col2) as
(select 'val1','val2' from table) -- 1
select 'val1','val2' from table -- 2
union all select 'val3','val4' from table
union all select 'val5','val6' from table;
The data of line 1 does not get inserted (but you need this line). Starting from line 2, the data you enter for val1, val2, etc. gets inserted into the respective columns. Use as many UNION ALLs as there are rows you want to insert. Hope this helps :)
At least in our version of Teradata, we are not able to use an insert statement with a CTE. Instead, find a real table (preferably small in size) and do a top 1.
Insert Into OtherRealTable(x, y)
Select top 1
'x' as x,
'y' as y
FROM RealTable
create table dummy as (select '1' col1) with data;
INSERT INTO Student
(Name, Maths, Science, English)
SELECT 'Tilak', 90, 40, 60 from dummy union
SELECT 'Raj', 30, 20, 10 from dummy
;
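One detail of this dummy-table approach is worth checking: it stacks the rows with UNION, which also removes duplicate rows, so two identical rows would be inserted only once. The sketch below runs on SQLite via Python's sqlite3 module as a stand-in for Teradata. It shows the difference, using a hypothetical duplicate 'Raj' row. SQLite's CREATE TABLE ... AS has no WITH DATA clause.

```python
import sqlite3

# Stand-in engine: SQLite. One-row helper table, as in the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dummy AS SELECT '1' AS col1;
    CREATE TABLE Student (Name TEXT, Maths INT, Science INT, English INT);
""")


def insert_rows(set_operator):
    """Insert three literal rows (one deliberate duplicate) and count them."""
    conn.execute("DELETE FROM Student")
    # Each SELECT ... FROM dummy yields exactly one row; the operator stacks them.
    conn.execute(f"""
        INSERT INTO Student (Name, Maths, Science, English)
        SELECT 'Tilak', 90, 40, 60 FROM dummy
        {set_operator} SELECT 'Raj', 30, 20, 10 FROM dummy
        {set_operator} SELECT 'Raj', 30, 20, 10 FROM dummy
    """)
    return conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]


print(insert_rows("UNION ALL"))  # 3: every row is kept
print(insert_rows("UNION"))      # 2: the duplicate 'Raj' row is dropped
```

UNION ALL also skips the duplicate-elimination step, so it is usually the safer choice when the goal is simply to insert literal rows.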
Yes, you can try this:
INSERT INTO Student
SELECT Name, Maths, Science, English FROM JSON_Table
(ON (SELECT 1 id,cast('{"DataSet" : [
{"s":"m", "Name":"Tilak", "Maths":"90","Science":"40", "English":"60" },
{"s":"m", "Name":"Raj", "Maths":"30","Science":"20", "English":"10" }
]
}' AS json ) jsonCol)
USING rowexpr('$.DataSet[*]')
colexpr('[{"jsonpath":"$.s","type":"CHAR(1)"},{"jsonpath":"$.Name","type":"VARCHAR(30)"}, {"jsonpath":"$.Maths","type":"INTEGER"}, {"jsonpath":"$.Science","type":"INTEGER"}, {"jsonpath":"$.English","type":"INTEGER"}]')
) AS JT(id,State,Name, Maths, Science, English)

create view for items not in list

I have two tables. Table 1 is a master list of equipment with equipment_id and equipment_description. So, let's say this table has ten equipment_ids (1, 2, 3, ..., 10), each with some description attached.
Table 2 logs when the equipment has been inspected:
equipment_id|inspection_date
1 | '1-22-2012'
2 | '1-22-2012'
4 | '1-22-2012'
2 | '1-23-2012'
3 | '1-23-2012'
I've created a view, v_dates, which pulls all of the distinct inspection dates out of table 2. I'm not sure I needed it, but I did it anyway.
I would like to create another view which shows all equipment that was NOT inspected for each date in the v_dates. So it would show:
3 | '1-22-2012'
5 | '1-22-2012'
and so on.
Rookie here and just not sure how to join these tables correctly. Can't get it to work and would appreciate any help.
Untested, but I think this should give the desired result (assuming table 1 is named equipment):
SELECT e.equipment_id, d.inspection_date FROM
( SELECT DISTINCT inspection_date FROM inspections ) d
CROSS JOIN equipment e
LEFT JOIN
inspections i
ON d.inspection_date = i.inspection_date
AND e.equipment_id = i.equipment_id
WHERE i.equipment_id IS NULL
ORDER BY 1,2
As mentioned in the comments, a table of inspection dates would really help.
The following appears to work based on my test data using SQL SERVER 2005. I am using a CROSS JOIN of distinct dates along with a LEFT JOIN to throw out EQUIPMENT_ID records that exist for those dates.
IF OBJECT_ID('tempdb..#EQUIPMENT') IS NOT NULL
DROP TABLE #EQUIPMENT
CREATE TABLE #EQUIPMENT
( EQUIPMENT_ID smallint,
EQUIPMENT_DESC varchar(32)
)
INSERT INTO #EQUIPMENT
( EQUIPMENT_ID, EQUIPMENT_DESC )
SELECT 1, 'AAA'
UNION SELECT 2, 'BBB'
UNION SELECT 3, 'CCC'
UNION SELECT 4, 'DDD'
UNION SELECT 5, 'EEE'
UNION SELECT 6, 'FFF'
UNION SELECT 7, 'GGG'
UNION SELECT 8, 'HHH'
UNION SELECT 9, 'III'
UNION SELECT 10, 'JJJ'
IF OBJECT_ID('tempdb..#INSPECTION') IS NOT NULL
DROP TABLE #INSPECTION
CREATE TABLE #INSPECTION
( EQUIPMENT_ID smallint,
INSPECTION_DATE smalldatetime
)
INSERT INTO #INSPECTION
( EQUIPMENT_ID, INSPECTION_DATE )
SELECT 1, '1-22-2012'
UNION SELECT 1, '1-27-2012'
UNION SELECT 3, '1-27-2012'
UNION SELECT 5, '1-29-2012'
UNION SELECT 7, '1-22-2012'
UNION SELECT 7, '1-27-2012'
UNION SELECT 7, '1-29-2012'
SELECT E.EQUIPMENT_ID, D.INSPECTION_DATE
FROM #EQUIPMENT E
CROSS JOIN ( SELECT DISTINCT INSPECTION_DATE
FROM #INSPECTION
) D
LEFT JOIN #INSPECTION I2
ON E.EQUIPMENT_ID = I2.EQUIPMENT_ID
AND D.INSPECTION_DATE = I2.INSPECTION_DATE
WHERE I2.EQUIPMENT_ID IS NULL
ORDER BY E.EQUIPMENT_ID, D.INSPECTION_DATE
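The CROSS JOIN + LEFT JOIN anti-join can be tried on the question's own sample data. The sketch below runs on SQLite via Python's sqlite3 module. The table names equipment and inspection, the placeholder descriptions, and the ISO date strings are assumptions made for the sketch.

```python
import sqlite3

# Stand-in engine: SQLite. Hypothetical table names; ISO dates for easy sorting.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE equipment (equipment_id INT, equipment_desc TEXT)")
conn.executemany("INSERT INTO equipment VALUES (?, ?)",
                 [(i, f"item {i}") for i in range(1, 11)])  # placeholder descriptions
conn.execute("CREATE TABLE inspection (equipment_id INT, inspection_date TEXT)")
conn.executemany("INSERT INTO inspection VALUES (?, ?)", [
    (1, "2012-01-22"), (2, "2012-01-22"), (4, "2012-01-22"),
    (2, "2012-01-23"), (3, "2012-01-23"),
])

# All (equipment, date) combinations, keeping only those with no matching inspection.
missing = conn.execute("""
    SELECT e.equipment_id, d.inspection_date
    FROM equipment AS e
    CROSS JOIN (SELECT DISTINCT inspection_date FROM inspection) AS d
    LEFT JOIN inspection AS i
           ON i.equipment_id = e.equipment_id
          AND i.inspection_date = d.inspection_date
    WHERE i.equipment_id IS NULL
    ORDER BY d.inspection_date, e.equipment_id
""").fetchall()

print(missing[:3])  # [(3, '2012-01-22'), (5, '2012-01-22'), (6, '2012-01-22')]
```

The first rows match the expected output in the question (3 and 5 on 1-22-2012, and so on).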
As per my comment on the question, you really need a table of valid inspection dates. It makes the SQL much more sensible. Besides, it's the only way to list all items for dates when inspections were supposed to be done but none were done.
So, assuming the two tables:
create table inspections (equipment_id int, inspection_date date);
create table inspection_dates (id int, inspection_date date);
Then a join to get all the equipment that does not have an inspection on a date when an inspection should have taken place would be:
select i.equipment_id, id.inspection_date
from inspection_dates id,
(select distinct equipment_id from inspections) i
where not exists (select * from inspections i2
where i2.inspection_date = id.inspection_date
and i2.equipment_id = i.equipment_id);
You want the combinations that do not exist, hence the NOT EXISTS predicate.
Note again that presumably you would have a table of all the unique equipment_ids, but since I didn't know of one, I had to construct it in place.
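The claim that only a dates table can surface days with no inspections at all can be checked with a small sketch. It runs the NOT EXISTS query on SQLite via Python's sqlite3 module. It adds a hypothetical scheduled date, 2012-01-24, on which nothing was inspected. A DISTINCT-dates approach could never report that date, because it never appears in inspections.

```python
import sqlite3

# Stand-in engine: SQLite. 2012-01-24 is a hypothetical scheduled-but-skipped date.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inspections (equipment_id INT, inspection_date TEXT);
    INSERT INTO inspections VALUES
        (1, '2012-01-22'), (2, '2012-01-22'), (4, '2012-01-22'),
        (2, '2012-01-23'), (3, '2012-01-23');
    CREATE TABLE inspection_dates (id INT, inspection_date TEXT);
    INSERT INTO inspection_dates VALUES
        (1, '2012-01-22'), (2, '2012-01-23'), (3, '2012-01-24');
""")

rows = conn.execute("""
    SELECT i.equipment_id, id.inspection_date
    FROM inspection_dates AS id,
         (SELECT DISTINCT equipment_id FROM inspections) AS i
    WHERE NOT EXISTS (SELECT * FROM inspections AS i2
                      WHERE i2.inspection_date = id.inspection_date
                        AND i2.equipment_id = i.equipment_id)
    ORDER BY id.inspection_date, i.equipment_id
""").fetchall()

for row in rows:
    print(row)  # every known equipment_id shows up for 2012-01-24
```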

Is there a nesting limit for correlated subqueries in some versions of Oracle?

Here is the code that will help you understand my question:
create table con ( content_id number);
create table mat ( material_id number, content_id number, resolution number, file_location varchar2(50));
create table con_groups (content_group_id number, content_id number);
insert into con values (99);
insert into mat values (1, 99, 7, 'C:\foo.jpg');
insert into mat values (2, 99, 2, '\\server\xyz.mov');
insert into mat values (3, 99, 5, '\\server2\xyz.wav');
insert into con values (100);
insert into mat values (4, 100, 5, 'C:\bar.png');
insert into mat values (5, 100, 3, '\\server\xyz.mov');
insert into mat values (6, 100, 7, '\\server2\xyz.wav');
insert into con_groups values (10, 99);
insert into con_groups values (10, 100);
commit;
SELECT m.material_id,
(SELECT file_location
FROM (SELECT file_location
FROM mat
WHERE mat.content_id = m.content_id
ORDER BY resolution DESC) special_mats_for_this_content
WHERE rownum = 1) special_mat_file_location
FROM mat m
WHERE m.material_id IN (select material_id
from mat
inner join con on con.content_id = mat.content_id
inner join con_groups on con_groups.content_id = con.content_id
where con_groups.content_group_id = 10);
Please consider the number 10 at the end of the query to be a parameter. In other words this value is just hardcoded in this example; it would change depending on the input.
My question is: Why do I get the error
"M"."CONTENT_ID": invalid identifier
for the nested, correlated subquery? Is there some sort of nesting limit? This subquery needs to be run for every row in the resultset because the results will change based on the content_id, which can be different for each row. How can I accomplish this with Oracle?
Not that I'm trying to start a SQL Server vs Oracle discussion, but I come from a SQL Server background and I'd like to point out that the following, equivalent query runs fine on SQL Server:
create table con ( content_id int);
create table mat ( material_id int, content_id int, resolution int, file_location varchar(50));
create table con_groups (content_group_id int, content_id int);
insert into con values (99);
insert into mat values (1, 99, 7, 'C:\foo.jpg');
insert into mat values (2, 99, 2, '\\server\xyz.mov');
insert into mat values (3, 99, 5, '\\server2\xyz.wav');
insert into con values (100);
insert into mat values (4, 100, 5, 'C:\bar.png');
insert into mat values (5, 100, 3, '\\server\xyz.mov');
insert into mat values (6, 100, 7, '\\server2\xyz.wav');
insert into con_groups values (10, 99);
insert into con_groups values (10, 100);
SELECT m.material_id,
(SELECT file_location
FROM (SELECT TOP 1 file_location
FROM mat
WHERE mat.content_id = m.content_id
ORDER BY resolution DESC) special_mats_for_this_content
) special_mat_file_location
FROM mat m
WHERE m.material_id IN (select material_id
from mat
inner join con on con.content_id = mat.content_id
inner join con_groups on con_groups.content_id = con.content_id
where con_groups.content_group_id = 10);
Can you please help me understand why I can do this in SQL Server but not Oracle 9i? If there is a nesting limit, how can I accomplish this in a single select query in Oracle without resorting to looping and/or temporary tables?
Recent versions of Oracle do not have a limit, but most older versions limit correlation to one level deep: a subquery can reference columns of its immediate parent query only.
This works on all versions:
SELECT (
SELECT *
FROM dual dn
WHERE dn.dummy = do.dummy
)
FROM dual do
This query works in 12c and 18c but does not work in 10g and 11g. (However, there is at least one version of 10g that allowed this query. And there is a patch to enable this behavior in 11g.)
SELECT (
SELECT *
FROM (
SELECT *
FROM dual dn
WHERE dn.dummy = do.dummy
)
WHERE rownum = 1
)
FROM dual do
If necessary, you can work around this limitation with window functions (which you can use in SQL Server too):
SELECT *
FROM (
SELECT m.material_id, m.content_id, m.file_location,
ROW_NUMBER() OVER (PARTITION BY m.content_id ORDER BY m.resolution DESC) AS rn
FROM mat m
WHERE m.content_id IN
(
SELECT con.content_id
FROM con_groups
JOIN con
ON con.content_id = con_groups.content_id
WHERE con_groups.content_group_id = 10
)
)
WHERE rn = 1
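The window-function workaround can also produce the original per-material shape: rank once per content_id, then join the top-ranked row back to every material. Nothing is correlated, so the nesting depth no longer matters. The sketch below runs on SQLite via Python's sqlite3 module as a stand-in for Oracle. It assumes SQLite 3.25 or newer, which is needed for ROW_NUMBER().

```python
import sqlite3

# Stand-in engine: SQLite >= 3.25 (window functions). Raw string keeps backslashes.
conn = sqlite3.connect(":memory:")
conn.executescript(r"""
    CREATE TABLE mat (material_id INT, content_id INT, resolution INT, file_location TEXT);
    INSERT INTO mat VALUES
        (1, 99, 7, 'C:\foo.jpg'), (2, 99, 2, '\\server\xyz.mov'), (3, 99, 5, '\\server2\xyz.wav'),
        (4, 100, 5, 'C:\bar.png'), (5, 100, 3, '\\server\xyz.mov'), (6, 100, 7, '\\server2\xyz.wav');
    CREATE TABLE con_groups (content_group_id INT, content_id INT);
    INSERT INTO con_groups VALUES (10, 99), (10, 100);
""")

rows = conn.execute("""
    WITH ranked AS (          -- highest resolution gets rn = 1 per content_id
        SELECT content_id, file_location,
               ROW_NUMBER() OVER (PARTITION BY content_id ORDER BY resolution DESC) AS rn
        FROM mat
    )
    SELECT m.material_id, r.file_location AS special_mat_file_location
    FROM mat AS m
    JOIN ranked AS r ON r.content_id = m.content_id AND r.rn = 1
    WHERE m.content_id IN (SELECT content_id FROM con_groups WHERE content_group_id = 10)
    ORDER BY m.material_id
""").fetchall()

for row in rows:
    print(row)
```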
@Quassnoi This was the case in Oracle 9. From Oracle 10 ...
From Oracle Database SQL Reference 10g Release 1 (10.1)
Oracle performs a correlated subquery when a nested subquery references a column from a table referred to a parent statement any number of levels above the subquery
From Oracle9i SQL Reference Release 2 (9.2)
Oracle performs a correlated subquery when the subquery references a column from a table referred to in the parent statement.
A subquery in the WHERE clause of a SELECT statement is also called a nested subquery. You can nest up to 255 levels of subqueries in the a nested subquery.
I don't think it works if you have something like select * from (select * from ( select * from ( ....))))
It only works for something like select * from TableName alias where colName = (select * from SomeTable where someCol = (select * from SomeTable x where x.id = alias.col))
Check out http://forums.oracle.com/forums/thread.jspa?threadID=378604
Quassnoi answered my question about nesting, and made a great call by suggesting window analytic functions. Here is the exact query that I need:
SELECT m.material_id, m.content_id,
(SELECT max(file_location) keep (dense_rank first order by resolution desc)
FROM mat
WHERE mat.content_id = m.content_id) special_mat_file_location
FROM mat m
WHERE m.material_id IN (select material_id
from mat
inner join con on con.content_id = mat.content_id
inner join con_groups on con_groups.content_id = con.content_id
where con_groups.content_group_id = 10);
Thanks!

DB2 SQL - median with GROUP BY

First of all, I am running on DB2 for i5/OS V5R4. I have ROW_NUMBER(), RANK() and common table expressions. I do not have TOP n PERCENT or LIMIT OFFSET.
The actual data set I'm working with is hard to explain, so let's just say I have a weather history table where the columns are (city, temperature, timestamp). I want to compare medians to averages for each group (city).
This was the cleanest way I found to get a median for a whole table aggregation. I adapted it from the IBM Redbook here:
WITH base_t AS
( SELECT temperature, row_number() over (order by temperature) AS rownum FROM t ),
count_t AS
( SELECT COUNT(temperature) + 1 AS base_count FROM base_t ),
median_t AS
( SELECT temperature FROM base_t, count_t
WHERE rownum in (FLOOR(base_count/2e0), CEILING(base_count/2e0)) )
SELECT DECIMAL(AVG(temperature),10,2) AS median FROM median_t
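The FLOOR/CEILING trick works because of the "+ 1" in count_t. For n sorted rows, positions floor((n+1)/2) and ceil((n+1)/2) are the two middle rows when n is even and the same middle row when n is odd. The Python sketch below illustrates that arithmetic. `median_positions` and `median_via_positions` are illustrative helpers only, checked against the standard library's statistics.median.

```python
import math
from statistics import median


def median_positions(n):
    """1-based row numbers the Redbook query keeps for n sorted values."""
    base_count = n + 1                       # COUNT(temperature) + 1
    return math.floor(base_count / 2), math.ceil(base_count / 2)


def median_via_positions(values):
    ordered = sorted(values)                 # row_number() over (order by temperature)
    lo, hi = median_positions(len(ordered))
    picked = {lo, hi}                        # rownum IN (x, x) matches one row once
    return sum(ordered[p - 1] for p in picked) / len(picked)  # AVG(temperature)


print(median_positions(4))  # (2, 3): even count, average the two middle rows
print(median_positions(5))  # (3, 3): odd count, both bounds hit the middle row
```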
That works well for getting a single row back, but it seems to fall apart for grouping. Conceptually, this is what I want:
SELECT city, AVG(temperature), MEDIAN(temperature) FROM ...
city | mean_temp | median_temp
===================================================
'Minneapolis' | 60 | 64
'Milwaukee' | 65 | 66
'Muskegon' | 70 | 61
There could be an answer that makes me look stupid, but I'm having a mental block and this isn't my #1 thing to work on right now. Seems like it could be possible, but I can't use something that's extremely complex since it's a large table and I want the ability to customize which columns are being aggregated.
In SQL Server, aggregate functions like count(*) can be computed over a partition (with OVER (PARTITION BY ...)) without a GROUP BY. I looked quickly through the referenced Redbook, and it looks like DB2 has the same feature. But if not, then this won't work:
create table TemperatureHistory
(City varchar(20)
, Temperature decimal(5, 2)
, DateTaken datetime)
insert into TemperatureHistory values ('Minneapolis', 61, '20090101')
insert into TemperatureHistory values ('Minneapolis', 59, '20090102')
insert into TemperatureHistory values ('Milwaukee', 65, '20090101')
insert into TemperatureHistory values ('Milwaukee', 65, '20090102')
insert into TemperatureHistory values ('Milwaukee', 100, '20090103')
insert into TemperatureHistory values ('Muskegon', 80, '20090101')
insert into TemperatureHistory values ('Muskegon', 70, '20090102')
insert into TemperatureHistory values ('Muskegon', 70, '20090103')
insert into TemperatureHistory values ('Muskegon', 20, '20090104')
; with base_t as
(select city
, Temperature
, row_number() over (partition by city order by temperature) as RowNum
, (count(*) over (partition by city)) + 1 as CountPlusOne
from TemperatureHistory)
select City
, avg(Temperature) as MeanTemp
, avg(case
when RowNum in (FLOOR(CountPlusOne/2.0), CEILING(CountPlusOne/2.0))
then Temperature
else null end) as MedianTemp
from base_t
group by City
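The grouped median can be checked on the same sample data with a small sketch. It runs on SQLite via Python's sqlite3 module as a stand-in for SQL Server or DB2, and assumes SQLite 3.25 or newer for window functions. FLOOR and CEILING are not available in every SQLite build, so the sketch swaps in integer division: x / 2 and (x + 1) / 2 give the floor and ceiling of x / 2 for positive integers. The results are compared against Python's statistics.median.

```python
import sqlite3
from statistics import median

data = [
    ("Minneapolis", 61), ("Minneapolis", 59),
    ("Milwaukee", 65), ("Milwaukee", 65), ("Milwaukee", 100),
    ("Muskegon", 80), ("Muskegon", 70), ("Muskegon", 70), ("Muskegon", 20),
]
conn = sqlite3.connect(":memory:")  # stand-in engine: SQLite >= 3.25
conn.execute("CREATE TABLE TemperatureHistory (City TEXT, Temperature REAL)")
conn.executemany("INSERT INTO TemperatureHistory VALUES (?, ?)", data)

rows = conn.execute("""
    WITH base_t AS (
        SELECT City, Temperature,
               ROW_NUMBER() OVER (PARTITION BY City ORDER BY Temperature) AS RowNum,
               COUNT(*) OVER (PARTITION BY City) + 1 AS CountPlusOne
        FROM TemperatureHistory
    )
    SELECT City,
           AVG(Temperature) AS MeanTemp,
           -- integer division: floor and ceiling of CountPlusOne / 2
           AVG(CASE WHEN RowNum IN (CountPlusOne / 2, (CountPlusOne + 1) / 2)
                    THEN Temperature END) AS MedianTemp
    FROM base_t
    GROUP BY City
    ORDER BY City
""").fetchall()

for city, mean_temp, median_temp in rows:
    expected = median([t for c, t in data if c == city])
    print(city, round(mean_temp, 2), median_temp, median_temp == expected)
```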