unpivot sql table - sql

I have a table with logid, skilllevel and logonskill columns, where the data looks like this:
logid, skilllevel1, skilllevel2, skilllevel3, logonskill1, logonskill2, logonskill3
101, 90, 40, 60, 1, 2, 3
102, 30, 20, 10, 4, 5, 6
I want to get it arranged like the following:
logid, skilllevel, logonskill, skillposition
101, 90, 1, 1
101, 40, 2, 2
102, 30, 4, 1
skilllevel1 corresponds to logonskill1, and so on.
skillposition is the numeric suffix of the logonskill column name.
How can I achieve this?

My preferred method is a lateral join, using apply:
select v.*
from t cross apply
(values (logid, skilllevel1, logonskill1, 1),
(logid, skilllevel2, logonskill2, 2),
(logid, skilllevel3, logonskill3, 3)
) v(logid, skilllevel, logonskill, skillposition)
where skilllevel is not null or logonskill is not null;
Lateral joins are very powerful. This is just one of many things that you can do with apply.
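For engines without APPLY, the same reshape can be sketched with UNION ALL instead. This is a different technique from the lateral join above (it reads the table once per column pair), shown here through Python's built-in sqlite3 module so that it can be run as-is. The table name t and the sample rows come from the question; everything else is an assumption for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (logid INT,"
    " skilllevel1 INT, skilllevel2 INT, skilllevel3 INT,"
    " logonskill1 INT, logonskill2 INT, logonskill3 INT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?, ?, ?)",
    [(101, 90, 40, 60, 1, 2, 3), (102, 30, 20, 10, 4, 5, 6)],
)

# One SELECT per (skilllevelN, logonskillN) pair; the literal is the position.
unpivot_sql = """
SELECT logid, skilllevel1 AS skilllevel, logonskill1 AS logonskill,
       1 AS skillposition
FROM t
UNION ALL
SELECT logid, skilllevel2, logonskill2, 2 FROM t
UNION ALL
SELECT logid, skilllevel3, logonskill3, 3 FROM t
ORDER BY logid, skillposition
"""
unpivoted = conn.execute(unpivot_sql).fetchall()
for row in unpivoted:
    print(row)
```

On SQL Server the cross apply version is usually preferable because it scans the table only once.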

Related

DB2 SQL : selecting rows where value is different from previous one

Let's say I have a table (PERSON) like this :
I would like to select only the rows where the value of column "C" has changed from previous row.
In this case, I should get : rows 1, 4, 5, 7, 8, 9 and 15.
I can't figure out how to achieve this.
Does anyone have an idea, please?
Thank you
Try this:
WITH PERSON (ROW_NUMBER, C) AS
(
VALUES
( 1, NULL::INT)
, ( 3, NULL::INT)
, ( 4, 189)
, ( 5, NULL::INT)
, ( 6, NULL::INT)
, ( 7, 212)
, ( 8, NULL::INT)
, ( 9, 235)
, (10, 235)
, (11, NULL::INT)
)
SELECT ROW_NUMBER, C
FROM
(
SELECT
P.*
, LAG (P.C) OVER (ORDER BY ROW_NUMBER) AS C_PREV
, LAG (P.ROW_NUMBER) OVER (ORDER BY ROW_NUMBER) AS ROW_NUMBER_PREV
FROM PERSON P
)
WHERE
ROW_NUMBER_PREV IS NULL
OR (C IS DISTINCT FROM C_PREV)
ROW_NUMBER, C
1, NULL
4, 189
5, NULL
7, 212
8, NULL
9, 235
11, NULL
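The LAG approach above can be tried outside DB2 with a small sketch using Python's built-in sqlite3 module. Two assumptions apply: the bundled SQLite must be 3.25 or newer (for window functions), and SQLite's null-safe IS NOT operator stands in for IS DISTINCT FROM. The column is named row_num here only to avoid confusion with the ROW_NUMBER function; the sample values are those from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (row_num INT, c INT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [(1, None), (3, None), (4, 189), (5, None), (6, None),
     (7, 212), (8, None), (9, 235), (10, 235), (11, None)],
)

changed_sql = """
SELECT row_num, c
FROM (
    SELECT row_num, c,
           LAG(c)       OVER (ORDER BY row_num) AS c_prev,
           LAG(row_num) OVER (ORDER BY row_num) AS row_num_prev
    FROM person
)
WHERE row_num_prev IS NULL   -- the first row has no predecessor
   OR c IS NOT c_prev        -- null-safe comparison with the previous value
ORDER BY row_num
"""
changed = conn.execute(changed_sql).fetchall()
for row in changed:
    print(row)
```

The separate row_num_prev check is what keeps the first row: its previous value is NULL, and so is its own C, so the null-safe comparison alone would treat it as unchanged.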

How to use a table in SQL WITH statement?

I am trying to use a pre-existing table in the SQL statement at the bottom of the question rather than the data that is being generated in the SQL statement. Currently, there is some data that is generated using:
WITH polys(poly_id, geom) AS (VALUES (1, 'POLYGON((1 1, 1 5, 4 5, 4 4, 2 4, 2 2, 4 2, 4 1, 1 1))'::GEOMETRY),
(2, 'POLYGON((6 6, 6 10, 8 10, 9 7, 8 6, 6 6))'::GEOMETRY)),
However, let's say I already have a table named polys with the poly_id and geom columns, exactly as what would be created above. How can I insert my pre-existing polys table into this SQL statement (i.e. what syntax would I use)?
I have tried the following to add a pre-existing polys table using:
CREATE TABLE polys_pts AS
WITH polys(poly_id, geom) AS,
with the following error:
ERROR: syntax error at or near ","
LINE 2: WITH polys(poly_id, geom) AS,
^
Full Code:
CREATE TABLE polys_pts AS
WITH polys(poly_id, geom) AS (VALUES (1, 'POLYGON((1 1, 1 5, 4 5, 4 4, 2 4, 2 2, 4 2, 4 1, 1 1))'::GEOMETRY),
(2, 'POLYGON((6 6, 6 10, 8 10, 9 7, 8 6, 6 6))'::GEOMETRY)),
pnt_clusters AS (SELECT polys.poly_id,
CASE
WHEN ST_Area(polys.geom)>9 THEN ST_ClusterKMeans(pts.geom, 8) OVER(PARTITION BY polys.poly_id)
ELSE ST_ClusterKMeans(pts.geom, 2) OVER(PARTITION BY polys.poly_id)
END AS cluster_id, pts.geom FROM polys,
LATERAL ST_Dump(ST_GeneratePoints(polys.geom, 1000, 1)) AS pts),
centroids AS (SELECT cluster_id, ST_PointOnSurface(ST_collect(geom)) AS geom FROM pnt_clusters GROUP BY poly_id, cluster_id),
neg_buffer AS (SELECT poly_id, (ST_Buffer(geom, -0.4, 'endcap=flat join=round')) geom FROM polys GROUP BY poly_id, polys.geom),
neg_buffer_pts_out AS (SELECT a.cluster_id, (a.geom) geom FROM centroids a WHERE EXISTS (SELECT 1 FROM neg_buffer b WHERE ST_Intersects(a.geom, b.geom))),
neg_buffer_pts_in AS (SELECT a.cluster_id, (a.geom) geom FROM centroids a WHERE NOT EXISTS (SELECT 1 FROM neg_buffer b WHERE ST_Intersects(a.geom, b.geom))),
snap_pts_clusters_in AS (SELECT DISTINCT ST_ClosestPoint(ST_ExteriorRing(a.geom), b.geom) AS geom FROM neg_buffer a, neg_buffer_pts_in b),
node_pts AS (SELECT ST_StartPoint(ST_ExteriorRing(geom)) geom FROM neg_buffer),
snap_pts AS (SELECT b.cluster_id, a.geom FROM snap_pts_clusters_in a JOIN centroids b ON ST_DWithin(a.geom, b.geom, 0.4))
SELECT a.cluster_id, (a.geom) geom FROM snap_pts a WHERE NOT EXISTS (SELECT 1 FROM node_pts b WHERE ST_Intersects(a.geom, b.geom))
UNION SELECT c.cluster_id, (c.geom) geom FROM neg_buffer_pts_out c ORDER BY cluster_id;
I'm not sure I understand your question, so I'll give you a broad answer.
To create a table from a query you must use:
CREATE TABLE foo AS
SELECT * FROM my_table;
CTEs are built like this:
WITH
tmp1 AS (
SELECT * from my_table1
), -- comma
tmp2 AS (
SELECT * from my_table2
)
SELECT * from tmp1 JOIN tmp2 ON tmp1.id = tmp2.id -- no comma
;
Note that a comma separates the different "temporary" tables defined in the CTE, but the final statement is not preceded by one.
So to create a table from a CTE the syntax will be:
CREATE TABLE foo AS
WITH
tmp1 AS (
SELECT * from my_table1
),
tmp2 AS (
SELECT * from my_table2
)
SELECT * from tmp1 JOIN tmp2 ON tmp1.id = tmp2.id -- no comma
;
Creating a table from a VALUES clause works the same way as the other cases:
CREATE TABLE polys2 AS
VALUES
(1, 'POLYGON((1 1, 1 5, 4 5, 4 4, 2 4, 2 2, 4 2, 4 1, 1 1))'::GEOMETRY),
(2, 'POLYGON((6 6, 6 10, 8 10, 9 7, 8 6, 6 6))'::GEOMETRY)
;
If you already have a table called polys2, created for example as shown in the previous example, you can replace
CREATE TABLE polys_pts AS
WITH
polys(poly_id, geom) AS (
VALUES
(1, 'POLYGON((1 1, 1 5, 4 5, 4 4, 2 4, 2 2, 4 2, 4 1, 1 1))'::GEOMETRY),
(2, 'POLYGON((6 6, 6 10, 8 10, 9 7, 8 6, 6 6))'::GEOMETRY)),
pnt_clusters AS (SELECT polys.poly_id, ...
with
CREATE TABLE polys_pts AS
WITH
polys(poly_id, geom) AS (
SELECT poly_id, geom FROM polys2
),
pnt_clusters AS (SELECT polys.poly_id, ...
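The pattern of feeding an existing table into the first CTE of a CREATE TABLE ... AS statement can be checked with a small sketch. It uses Python's built-in sqlite3 module rather than PostgreSQL/PostGIS, so the geometry is stored as plain text, and the second CTE (labelled) is a hypothetical stand-in for the real pnt_clusters logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The pre-existing table; geom is plain text here because SQLite has no GEOMETRY type.
conn.execute("CREATE TABLE polys2 (poly_id INT, geom TEXT)")
conn.executemany(
    "INSERT INTO polys2 VALUES (?, ?)",
    [
        (1, "POLYGON((1 1, 1 5, 4 5, 4 4, 2 4, 2 2, 4 2, 4 1, 1 1))"),
        (2, "POLYGON((6 6, 6 10, 8 10, 9 7, 8 6, 6 6))"),
    ],
)

conn.execute("""
CREATE TABLE polys_pts AS
WITH
polys(poly_id, geom) AS (
    SELECT poly_id, geom FROM polys2      -- existing table instead of VALUES
),
labelled AS (                             -- hypothetical stand-in for pnt_clusters
    SELECT poly_id, 'poly-' || poly_id AS label FROM polys
)
SELECT poly_id, label FROM labelled       -- final SELECT, no comma before it
""")

created = conn.execute(
    "SELECT poly_id, label FROM polys_pts ORDER BY poly_id"
).fetchall()
print(created)
```

The later CTEs keep referring to polys exactly as before; only the body of the first CTE changes.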
The question is not 100% clear to me, and I am not familiar with the peculiarities of PostgreSQL, but my first bet would be to try
WITH polys(...) AS (...),
pnt_clusters AS (...)
CREATE TABLE polys_pts AS (
SELECT ..
FROM polys... etc.
)
but I guess this is not allowed, since WITH only goes with DML (data manipulation) statements, not DDL (data definition) statements like CREATE.
My next bet would be to take polys and pnt_clusters, which you defined inside the WITH clause, and inline them in the SELECT statement, given that
WITH a AS (
SELECT x, y FROM z
)
SELECT *
FROM a
is the same as
SELECT *
FROM (
SELECT x, y
FROM z
) AS a
Otherwise, I would split the process into two steps: first create temporary tables for polys and pnt_clusters, then run the CREATE.
The definition of a CTE must be a complete statement, so you have to use
WITH polys(poly_id, geom) AS (
SELECT *
FROM (VALUES
(1, 'POLYGON((1 1, 1 5, 4 5, 4 4, 2 4, 2 2, 4 2, 4 1, 1 1))'::GEOMETRY),
(2, 'POLYGON((6 6, 6 10, 8 10, 9 7, 8 6, 6 6))'::GEOMETRY)
) AS p(p, g)
)

Group elements of a column into multiple subgroups SQL

I am looking at different breeds of cattle and their AnimalTypeCode, BreedCategoryID and resultant Growth.
I have the following query
SELECT DATEPART(yyyy,[KillDate])
,[AnimalTypeCode]
,AVG([Growth])
,[BreedCategoryID]
FROM [dbo].[tblAnimal]
WHERE (AnimalTypeCode='C'
or AnimalTypeCode= 'E')
GROUP BY DATEPART(yyyy,[KillDate])
,[AnimalTypeCode]
,[BreedCategoryID]
GO
This query is good and gives me almost what I want, but BreedCategoryID is numbered 1 through 7 and I would like to group them:
(1 = Pure Dairy),
(2 and 3 = Dairy)
(4, 5, 6 and 7 = Beef)
So instead of getting the mean Growth for each BreedCategoryID, I would like to get the average for Pure Dairy, Dairy, and Beef.
Any help greatly appreciated!
You can assign a new "variable" using cross apply in the from clause:
SELECT YEAR(a.KillDate), a.AnimalTypeCode, v.grp,
AVG(a.Growth)
FROM [dbo].[tblAnimal] a CROSS APPLY
(VALUES (CASE WHEN a.BreedCategoryID IN (1) THEN 'Pure Dairy'
WHEN a.BreedCategoryID IN (2, 3) THEN 'Dairy'
WHEN a.BreedCategoryID IN (4, 5, 6, 7) THEN 'Beef'
END)
) as v(grp)
WHERE a.AnimalTypeCode IN ('C', 'E')
GROUP BY YEAR(a.KillDate), a.AnimalTypeCode, v.grp;
Note that I also introduced table aliases and qualified all the column references.
Do the calculations in a derived table (the subquery). GROUP BY its result:
select killyear, [AnimalTypeCode], AVG([Growth]), BreedCat
from (
SELECT DATEPART(yyyy,[KillDate]) killyear
,[AnimalTypeCode]
,[Growth]
,case when [BreedCategoryID] = 1 then 'Pure Dairy'
when [BreedCategoryID] in (2, 3) then 'Dairy'
when [BreedCategoryID] in (4, 5, 6, 7) then 'Beef'
end BreedCat
FROM [dbo].[tblAnimal]
WHERE (AnimalTypeCode='C'
or AnimalTypeCode= 'E')
) dt
GROUP BY killyear
,[AnimalTypeCode]
,BreedCat
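The derived-table approach can be exercised with a small sketch using Python's built-in sqlite3 module. The rows are made up for illustration, and the kill year is stored directly in a hypothetical KillYear column instead of being extracted with DATEPART, which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblAnimal"
    " (KillYear INT, AnimalTypeCode TEXT, BreedCategoryID INT, Growth REAL)"
)
conn.executemany(
    "INSERT INTO tblAnimal VALUES (?, ?, ?, ?)",
    [
        (2020, "C", 1, 1.0),   # Pure Dairy
        (2020, "C", 2, 2.0),   # Dairy
        (2020, "C", 3, 4.0),   # Dairy
        (2020, "C", 4, 5.0),   # Beef
        (2020, "C", 7, 7.0),   # Beef
        (2020, "X", 5, 9.0),   # filtered out by AnimalTypeCode
    ],
)

grouped_sql = """
SELECT KillYear, AnimalTypeCode, BreedCat, AVG(Growth) AS AvgGrowth
FROM (
    SELECT KillYear, AnimalTypeCode, Growth,
           CASE WHEN BreedCategoryID = 1            THEN 'Pure Dairy'
                WHEN BreedCategoryID IN (2, 3)      THEN 'Dairy'
                WHEN BreedCategoryID IN (4, 5, 6, 7) THEN 'Beef'
           END AS BreedCat
    FROM tblAnimal
    WHERE AnimalTypeCode IN ('C', 'E')
) dt
GROUP BY KillYear, AnimalTypeCode, BreedCat
ORDER BY BreedCat
"""
grouped = conn.execute(grouped_sql).fetchall()
for row in grouped:
    print(row)
```

Computing the label once in the inner query means the CASE expression does not have to be repeated in the GROUP BY.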

Maintain WHERE Order In SQL Select

Is it possible to maintain the order of the WHERE clause when doing a SELECT for specific records?
For instance, given the following SELECT statement:
SELECT [RecSeq] FROM [MyData] WHERE
[RecSeq]=3 OR [RecSeq]=2 OR [RecSeq]=1 OR [RecSeq]=21 OR [RecSeq]=20 OR
[RecSeq]=19 OR [RecSeq]=110 OR [RecSeq]=109 OR [RecSeq]=108 OR
[RecSeq]=53 OR [RecSeq]=52 OR [RecSeq]=51;
I'd like the results to come back as:
3
2
1
21
20
19
110
109
108
53
52
51
However, what I get back isn't in any particular order. Currently I have a loop that calls the SELECT statement for each record required. This could range anywhere from 1 to 700,000 times. Needless to say the performance isn't the best.
Any solutions or am I stuck in the loop?
You need the ORDER BY FIELD clause.
SELECT RecSeq From MyData WHERE RecSeq IN (3, 2, 1, 21, 20, 19, 110, 109, 108, 53, 52, 51)
ORDER BY FIELD (RecSeq, 3, 2, 1, 21, 20, 19, 110, 109, 108, 53, 52, 51);
You don't say what database system you are using - I know this works in MySQL.
There is exactly one way to reliably enforce an ordering of the results of a SQL statement: use an ORDER BY clause. I don't know if it is standard SQL, but in Oracle you could do something like this:
select ... from ...
where recseq in ( 3, 2, 1, 21, 20, 19, 110, 109, 108, 53, 52, 51)
order by decode(recseq, 3,1, 2,2, 1,3, 21,4, 20,5, 19,6, 110,7, 109,8, 108,9, 53,10, 52,11, 51,12, 13)
A WHERE clause cannot specify your output order.
You will have to sort your results using an ORDER BY.
If you absolutely need this order, try a 'pseudo-column', or fake column, with a UNION clause (performance warning here).
select 0 as my_fake_column, blah_columns from table where recseq = 3
UNION
select 1, blah_columns from table where recseq = 2
UNION
select 2, blah_columns from table where recseq = 1
UNION
select 3, blah_columns from table where recseq = 21
order by my_fake_column
The above will deliver the results in your specific order 3,2,1,21.
As the other poster said, adding a column could be an option.
You can use a derived table for filtering and sorting like this
SELECT t.RecSeq
FROM MyData t
JOIN (
SELECT 3, 1 UNION ALL
SELECT 2, 2 UNION ALL
SELECT 1, 3 UNION ALL
SELECT 21, 4 UNION ALL
SELECT 20, 5 UNION ALL
SELECT 19, 6
...
) f(RecSeq, SortKey)
ON t.RecSeq = f.RecSeq
ORDER BY f.SortKey
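When the list of wanted records is long (the question mentions up to 700,000), the same join can be fed from a temporary table instead of a literal derived table. The following is a minimal sketch using Python's built-in sqlite3 module; the temp table name wanted and the sample contents of MyData are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyData (RecSeq INT)")
conn.executemany("INSERT INTO MyData VALUES (?)", [(n,) for n in range(1, 121)])

# The order the caller wants the records back in (taken from the question).
wanted_order = [3, 2, 1, 21, 20, 19, 110, 109, 108, 53, 52, 51]

# Load the keys once, each paired with its position in the requested order.
conn.execute("CREATE TEMP TABLE wanted (RecSeq INT, SortKey INT)")
conn.executemany(
    "INSERT INTO wanted VALUES (?, ?)",
    [(rec, pos) for pos, rec in enumerate(wanted_order, start=1)],
)

# A single query both filters (via the join) and sorts (via SortKey).
ordered = [
    row[0]
    for row in conn.execute(
        "SELECT t.RecSeq FROM MyData t"
        " JOIN wanted f ON t.RecSeq = f.RecSeq"
        " ORDER BY f.SortKey"
    )
]
print(ordered)
```

This replaces the per-record loop with one bulk insert and one SELECT.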
Yes, there is a way, although some might consider it a hack. Also, I want to point out that you can (and should) use IN instead of the giant conditional statement.
SELECT [RecSeq]
FROM [MyData]
WHERE [RecSeq] in (3,2,1,21,20,19,110,109,108,53,52,51)
ORDER BY DECODE (recseq, 3,1, 2,2, 1,3, 21,4,......)
You could try using a UNION. Something like:
SELECT [RecSeq], 1 FROM [MyData] WHERE [RecSeq]=3
UNION
SELECT [RecSeq], 2 FROM [MyData] WHERE [RecSeq]=2
UNION
SELECT [RecSeq], 3 FROM [MyData] WHERE [RecSeq]=1
*etc...*
ORDER BY 2

How to do equivalent of "limit distinct"?

How can I limit a result set to n distinct values of a given column(s), where the actual number of rows may be higher?
Input table:
client_id, employer_id, other_value
1, 2, abc
1, 3, defg
2, 3, dkfjh
3, 1, ldkfjkj
4, 4, dlkfjk
4, 5, 342
4, 6, dkj
5, 1, dlkfj
6, 1, 34kjf
7, 7, 34kjf
8, 6, lkjkj
8, 7, 23kj
desired output, where limit distinct=5 distinct values of client_id:
1, 2, abc
1, 3, defg
2, 3, dkfjh
3, 1, ldkfjkj
4, 4, dlkfjk
4, 5, 342
4, 6, dkj
5, 1, dlkfj
The intended platform is MySQL.
You can use a subselect
select * from table where client_id in
(select distinct client_id from table order by client_id limit 5)
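MySQL has historically rejected LIMIT inside an IN subquery (the "doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'" error), so a variant that joins to a derived table may be needed there. The sketch below shows that join form through Python's built-in sqlite3 module, using the sample rows from the question; the table name clients and the alias k are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clients (client_id INT, employer_id INT, other_value TEXT)"
)
conn.executemany(
    "INSERT INTO clients VALUES (?, ?, ?)",
    [
        (1, 2, "abc"), (1, 3, "defg"), (2, 3, "dkfjh"), (3, 1, "ldkfjkj"),
        (4, 4, "dlkfjk"), (4, 5, "342"), (4, 6, "dkj"), (5, 1, "dlkfj"),
        (6, 1, "34kjf"), (7, 7, "34kjf"), (8, 6, "lkjkj"), (8, 7, "23kj"),
    ],
)

limit_distinct_sql = """
SELECT t.client_id, t.employer_id, t.other_value
FROM clients t
JOIN (
    -- the first 5 distinct client_id values
    SELECT DISTINCT client_id
    FROM clients
    ORDER BY client_id
    LIMIT 5
) k ON k.client_id = t.client_id
ORDER BY t.client_id, t.employer_id
"""
limited = conn.execute(limit_distinct_sql).fetchall()
for row in limited:
    print(row)
```

The derived table picks the keys, and the join pulls back every row for those keys, so the row count can exceed the limit.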
This is for SQL Server. I can't remember, MySQL may use a LIMIT keyword instead of TOP. That may make the query more efficient if you can get rid of the inner most subquery by using the LIMIT and DISTINCT in the same subquery. (It looks like Vinko used this method and that LIMIT is correct. I'll leave this here for the second possible answer though.)
SELECT
client_id,
employer_id,
other_value
FROM
MyTable
WHERE
client_id IN
(
SELECT TOP 5
client_id
FROM
(
SELECT DISTINCT
client_id
FROM
MyTable
) SQ
ORDER BY
client_id
)
Of course, add in your own WHERE clause and ORDER BY clause in the subquery.
Another possibility (compare performance and see which works out better) is:
SELECT
client_id,
employer_id,
other_value
FROM
MyTable T1
WHERE
T1.client_id IN
(
SELECT
T2.client_id
FROM
MyTable T2
WHERE
(SELECT COUNT(DISTINCT T3.client_id) FROM MyTable T3 WHERE T3.client_id < T2.client_id) < 5
)
-- Using Common Table Expression in Microsoft SQL Server.
-- LIMIT function does not exist in MS SQL.
WITH CTE
AS
(SELECT DISTINCT([COLUMN_NAME])
FROM [TABLE_NAME])
SELECT TOP (5) [COLUMN_NAME]
FROM CTE;
This works for MS SQL if anyone is on that platform:
SET ROWCOUNT 10;
SELECT DISTINCT
column1, column2, column3,...
FROM
Table1
WHERE ...