Select top N columns based on standardized values - sql

I have a question that is hard to search for. Is it possible to select, say, the top 10 columns based on the values in each column, given that all the values are standardized?
For example:
cluster Id | v1  | v2  | v3  | v4   | v6  | v26
___________________________________________
1          | 4.2 | 0.9 | 0.5 | 3.2  | 0.7 | 0.5
2          | 1.2 | 0.1 | 0.9 | 0.21 | 0.3 | 0.1
So in this example, if I wanted the top three columns for cluster 1, I'd have:
cluster ID | v1  | v4  | v2
1          | 4.2 | 3.2 | 0.9
I'm open to any suggestions. At the moment I'm using Oracle SQL, but I'm willing to switch if there's a solution on a different platform and it's impossible using SQL.
Edit: I've added an image which shows the feature I'm trying to replicate in SQL Developer. The fetch size is the number of variables/attributes, and there must be some table sitting behind the model that is queried when I change the fetch size. That is the statement I'm trying to reproduce.
Thank you.

If you want the top three values, I would unpivot the data and reaggregate. Oracle 12c has some useful functionality for this; for earlier versions I would just use more traditional SQL methods.
It is unclear whether you want the column names or the values. The following does both:
select id,
       max(case when seqnum = 1 then v end) as v_1,
       max(case when seqnum = 2 then v end) as v_2,
       max(case when seqnum = 3 then v end) as v_3,
       max(case when seqnum = 1 then which end) as which_1,
       max(case when seqnum = 2 then which end) as which_2,
       max(case when seqnum = 3 then which end) as which_3
from (select id, v, which,
             row_number() over (partition by id order by v desc) as seqnum
      from ((select id, v1 as v, 'v1' as which from t) union all
            (select id, v2 as v, 'v2' as which from t) union all
            (select id, v3 as v, 'v3' as which from t) union all
            (select id, v4 as v, 'v4' as which from t) union all
            (select id, v5 as v, 'v5' as which from t)
           ) t
     ) t
group by id;
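The unpivot, rank, and re-aggregate steps can be tried end to end with a small script. This is a minimal sketch, not the answerer's code: it assumes Python's built-in sqlite3 module with an SQLite build that supports window functions (3.25 or later), and the table t, its columns v1 to v5, and the two sample rows are adapted from the question's example rather than taken from a real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, v1 real, v2 real, v3 real, v4 real, v5 real)")
conn.executemany(
    "insert into t values (?, ?, ?, ?, ?, ?)",
    [(1, 4.2, 0.9, 0.5, 3.2, 0.7),
     (2, 1.2, 0.1, 0.9, 0.21, 0.3)],
)

# Unpivot with UNION ALL, rank each id's values, then pivot the top 3 back.
query = """
select id,
       max(case when seqnum = 1 then which end) as which_1,
       max(case when seqnum = 1 then v end)     as v_1,
       max(case when seqnum = 2 then which end) as which_2,
       max(case when seqnum = 2 then v end)     as v_2,
       max(case when seqnum = 3 then which end) as which_3,
       max(case when seqnum = 3 then v end)     as v_3
from (select id, v, which,
             row_number() over (partition by id order by v desc) as seqnum
      from (select id, v1 as v, 'v1' as which from t union all
            select id, v2, 'v2' from t union all
            select id, v3, 'v3' from t union all
            select id, v4, 'v4' from t union all
            select id, v5, 'v5' from t) unpivoted
     ) ranked
group by id
order by id
"""
top3 = conn.execute(query).fetchall()
for row in top3:
    print(row)
```

For cluster 1 this returns v1, v4, and v2 with their values, matching the result the question asks for.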

In the end, the approach I took was to go through all the Oracle Data Miner tables created during the clustering of my dataset. One of them, table DM$PTCLUS_K_M_1_2, contained a pivot table with all the clusters, values, variable IDs and names. Recreated here using my example:
cluster_id | variable_id | value | variable_name
1          | 1           | 4.2   | v1
By using a nested select statement with a where clause on cluster_id and ordering by value in descending order, I could then pick out the top 10 variables and their values for each cluster:
select * from
  (select * from DM$PTCLUS_K_M_1_2
   where cluster_id = 1
   order by value desc)
where rownum < 11
For those with a similar problem who want to get cluster centroids or values, I suggest looking at the Data Miner schema and checking the tables there. A few of them will contain the data you need.
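Once the data is in this long format (one row per cluster and variable), picking the top N is a filter, a descending sort, and a row limit. The sketch below assumes Python's sqlite3 module; the table name cluster_vars is a hypothetical stand-in for the Data Miner table, the rows are made up from the question's example, and SQLite's LIMIT plays the role of Oracle's rownum filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table cluster_vars ("
    "cluster_id integer, variable_id integer, value real, variable_name text)"
)
conn.executemany(
    "insert into cluster_vars values (?, ?, ?, ?)",
    [(1, 1, 4.2, "v1"), (1, 2, 0.9, "v2"), (1, 3, 0.5, "v3"), (1, 4, 3.2, "v4"),
     (2, 1, 1.2, "v1"), (2, 2, 0.1, "v2"), (2, 3, 0.9, "v3"), (2, 4, 0.21, "v4")],
)

def top_variables(conn, cluster_id, n):
    """Return the n largest (variable_name, value) pairs for one cluster."""
    return conn.execute(
        "select variable_name, value from cluster_vars "
        "where cluster_id = ? order by value desc limit ?",
        (cluster_id, n),
    ).fetchall()

print(top_variables(conn, 1, 3))
```

In Oracle the limit has to be applied outside the ordered subquery when rownum is used, because rownum is assigned before the sort; Oracle 12c also offers FETCH FIRST n ROWS ONLY.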

Related

SQL - Returning unique row based on criteria and a priority

I have a data table that looks in practice like this:
Team Shirt Number Name
1 1 Seaman
1 13 Lucas
2 1 Bosnic
2 14 Schmidt
2 23 Woods
3 13 Tubilandu
3 14 Lev
3 15 Martin
I want to remove duplicates of team using the following logic: if there is a shirt number 1, use that. If not, look for a 13. If not, look for a 14, then any.
I realise it is probably quite basic but I don't seem to be making any progress with case statements. I know it's something with sub-queries and case statements but I'm struggling and any help gratefully received!
Using SSMS.
Since you didn't specify a DBMS, I'll assume row_number() is available:
-- Rank the rows of each team by shirt priority, keep the first (Seq = 1)
-- and delete the rest.
DELETE t
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY Team
                                ORDER BY (CASE WHEN Shirt_Number = 1
                                               THEN 1
                                               WHEN Shirt_Number = 13
                                               THEN 2
                                               WHEN Shirt_Number = 14
                                               THEN 3
                                               ELSE 4
                                          END)
                               ) AS Seq
      FROM table t
     ) t
WHERE Seq > 1;
The CASE expression is only needed if other shirt numbers (for example 2 to 12) can sort ahead of 13 or 14; otherwise ORDER BY Shirt_Number alone is enough.
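The ranking logic can be checked without deleting anything by selecting the one row per team that the priority rule prefers. A minimal sketch, assuming Python's sqlite3 module with window-function support (SQLite 3.25 or later); the table name teamshirts is borrowed from another answer in this thread and the rows are the sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table teamshirts (team integer, shirt_number integer, name text)")
conn.executemany(
    "insert into teamshirts values (?, ?, ?)",
    [(1, 1, "Seaman"), (1, 13, "Lucas"),
     (2, 1, "Bosnic"), (2, 14, "Schmidt"), (2, 23, "Woods"),
     (3, 13, "Tubilandu"), (3, 14, "Lev"), (3, 15, "Martin")],
)

# Priority: shirt 1 first, then 13, then 14, then anything else.
query = """
select team, shirt_number, name
from (select t.*,
             row_number() over (
                 partition by team
                 order by case shirt_number
                              when 1 then 1 when 13 then 2 when 14 then 3 else 4
                          end
             ) as seq
      from teamshirts t) ranked
where seq = 1
order by team
"""
chosen = conn.execute(query).fetchall()
print(chosen)
```

Team 3 has no shirt 1, so the row with shirt 13 is chosen for it.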
I think you are looking for the PARTITION BY clause. The solution below worked in SQL Server.
create table #eray
(team int, shirtnumber int, name varchar(200))
insert into #eray values
(1, 1, 'Seaman'),
(1, 13, 'Lucas'),
(2, 1, 'Bosnic'),
(2, 14, 'Schmidt')
;with cte as (
Select Team, ShirtNumber, Name,
ROW_NUMBER() OVER (PARTITION BY Team ORDER BY ShirtNumber ASC) AS rn
From #eray
where ShirtNumber in (1,13,14)
)
select * from cte where rn=1
If you have a table of teams, you can use cross apply:
select ts.*
from teams t cross apply
(select top (1) ts.*
from teamshirts ts
where ts.team = t.team
order by (case shirt_number when 1 then 1 when 13 then 2 when 14 then 3 else 4 end)
) ts;
If you have no numbers between 2 and 12, you can simplify this to:
select ts.*
from teams t cross apply
(select top (1) ts.*
from teamshirts ts
where ts.team = t.team
order by shirt_number
) ts;

Aggregate data from multiple rows into single row

In my table each row has some data columns and a Priority column (for example, a timestamp or just an integer). I want to group my data by ID and then, in each group, take the latest non-null value of each column. For example, I have the following table:
id A B C Priority
1 NULL 3 4 1
1 5 6 NULL 2
1 8 NULL NULL 3
2 634 346 359 1
2 34 NULL 734 2
Desired result is :
id A B C
1 8 6 4
2 34 346 734
In this example the table is small and has only 5 columns, but the real table will be much larger. I really want this script to run fast. I tried to do it myself, but my script only works on SQL Server 2012+, so I deleted it as not applicable.
Numbers: the table could have 150k rows, 20 columns, and 20-80k unique ids, and on average SELECT COUNT(id) FROM T GROUP BY ID returns 2 to 5.
Now I have working code (thanks to @ypercubeᵀᴹ), but it runs very slowly on big tables. In my case the script can take a minute or even more (with indexes and so on).
How can it be sped up?
SELECT
d.id,
d1.A,
d2.B,
d3.C
FROM
( SELECT id
FROM T
GROUP BY id
) AS d
OUTER APPLY
( SELECT TOP (1) A
FROM T
WHERE id = d.id
AND A IS NOT NULL
ORDER BY priority DESC
) AS d1
OUTER APPLY
( SELECT TOP (1) B
FROM T
WHERE id = d.id
AND B IS NOT NULL
ORDER BY priority DESC
) AS d2
OUTER APPLY
( SELECT TOP (1) C
FROM T
WHERE id = d.id
AND C IS NOT NULL
ORDER BY priority DESC
) AS d3 ;
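In my test database with a realistic amount of data I get the following execution plan:
This should do the trick. power(x, 0) returns 1 for every value except NULL, so Priority*power(A,0) is NULL exactly when A is NULL, and those rows sort last in descending order: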
DECLARE @t table(id int, A int, B int, C int, Priority int)
INSERT @t
VALUES (1, NULL, 3,    4,    1),
       (1, 5,    6,    NULL, 2),
       (1, 8,    NULL, NULL, 3),
       (2, 634,  346,  359,  1),
       (2, 34,   NULL, 734,  2)
;WITH CTE as
(
  SELECT id,
    CASE WHEN row_number() over
      (partition by id order by Priority*power(A,0) desc) = 1 THEN A END A,
    CASE WHEN row_number() over
      (partition by id order by Priority*power(B,0) desc) = 1 THEN B END B,
    CASE WHEN row_number() over
      (partition by id order by Priority*power(C,0) desc) = 1 THEN C END C
  FROM @t
)
SELECT id, max(a) a, max(b) b, max(c) c
FROM CTE
GROUP BY id
Result:
id a b c
1 8 6 4
2 34 346 734
One alternative that might be faster is a multiple join approach. Get the priority for each column and then join back to the original table. For the first part:
select id,
max(case when a is not null then priority end) as pa,
max(case when b is not null then priority end) as pb,
max(case when c is not null then priority end) as pc
from t
group by id;
Then join back to this table:
with pabc as (
select id,
max(case when a is not null then priority end) as pa,
max(case when b is not null then priority end) as pb,
max(case when c is not null then priority end) as pc
from t
group by id
)
select pabc.id, ta.a, tb.b, tc.c
from pabc left join
t ta
on pabc.id = ta.id and pabc.pa = ta.priority left join
t tb
on pabc.id = tb.id and pabc.pb = tb.priority left join
t tc
on pabc.id = tc.id and pabc.pc = tc.priority ;
This can also take advantage of an index on t(id, priority).
The previous code will work with the following syntax:
with pabc as (
select id,
max(case when a is not null then priority end) as pa,
max(case when b is not null then priority end) as pb,
max(case when c is not null then priority end) as pc
from t
group by id
)
select pabc.Id,ta.a, tb.b, tc.c
from pabc
left join t ta on pabc.id = ta.id and pabc.pa = ta.priority
left join t tb on pabc.id = tb.id and pabc.pb = tb.priority
left join t tc on pabc.id = tc.id and pabc.pc = tc.priority ;
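The join-back approach can be checked against the sample data from the question. This is a minimal sketch, assuming Python's sqlite3 module; it also assumes that (id, priority) is unique, since otherwise the joins would return more than one row per id. The index name t_id_priority is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, a integer, b integer, c integer, priority integer)")
conn.executemany(
    "insert into t values (?, ?, ?, ?, ?)",
    [(1, None, 3, 4, 1), (1, 5, 6, None, 2), (1, 8, None, None, 3),
     (2, 634, 346, 359, 1), (2, 34, None, 734, 2)],
)
# Index suggested in the answer; helps the three join lookups.
conn.execute("create index t_id_priority on t (id, priority)")

# Step 1 (pabc): highest priority with a non-null value, per id and column.
# Step 2: join back to t once per column to fetch the value at that priority.
query = """
with pabc as (
    select id,
           max(case when a is not null then priority end) as pa,
           max(case when b is not null then priority end) as pb,
           max(case when c is not null then priority end) as pc
    from t
    group by id
)
select pabc.id, ta.a, tb.b, tc.c
from pabc
left join t ta on pabc.id = ta.id and pabc.pa = ta.priority
left join t tb on pabc.id = tb.id and pabc.pb = tb.priority
left join t tc on pabc.id = tc.id and pabc.pc = tc.priority
order by pabc.id
"""
latest = conn.execute(query).fetchall()
print(latest)
```

The output matches the desired result in the question: 8, 6, 4 for id 1 and 34, 346, 734 for id 2.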
This looks rather strange. You have a log table for all column changes, but no associated table with current data. Now you are looking for a query to collect your current values from the log table, which is a laborious task naturally.
The solution is simple: have an additional table with the current data. You can even link the tables with a trigger (so either every time a record gets inserted into your log table you update the current table, or every time a change is written to the current table you write a log entry).
Then just query your current table:
select id, a, b, c from currenttable order by id;

Advanced SQL Select and Union Statements

I've seen other similar questions and I have tried implementing many solutions, but to no avail so far. This specific question involves a little more complexity. What I need to do is create a table and join columns to the right side depending on certain criteria. It seems simple enough, but there are a few bumps that I am encountering.
The tables are as follows:
ADC_DATA_COLLECTION_HEADER
(PK)Transaction_ID | BEMSID | DEVICE | TIMESTAMP | CONFIG_NAME
ADC_DATA_COLLECTION_APPS
(FK)CONFIG_NAME | NUM_DATA_ELEMENTS | DATA_ELEMENT1 | DATA_ELEMENT2 | DATA_ELEMENT3 | DATA_ELEMENT4
ADC_DATA_COLLECTION_DATA
(FK)TRANSACTION_ID | DATA_ELEMENT_NUMBER | DATA
I want my final output to look like:
TRANSACTION_ID | DEVICE | CONFIG_NAME | DATA | DATA | DATA | DATA
The "data" column is filled in using the table ADC_DATA_COLLECTION_DATA. The first instance of "data" would be the "data" field in ADC_DATA_COLLECTION_DATA where DATA_ELEMENT_NUMBER = 1. The second instance of "data" would be the "data" field in ADC_DATA_COLLECTION_DATA where DATA_ELEMENT_NUMBER = 2... And so on.
The furthest I have gotten is by using a join statement, except I have nulls in places I do not want them. The code I have used and the results are posted below. So far I only wrote code for the first two columns of data.
SELECT
ADC_Data_Collection_header.BEMSID,
ADC_Data_Collection_header.DEVICE,
ADC_Data_Collection_header.CONFIG_NAME,
null AS locationlabel,
null AS partno
/*null AS partno2,
null AS DE4,
null AS DE5,
null AS DE6 */
FROM
ADC_Data_Collection_header,
ADC_Data_Collection_apps,
ADC_Data_Collection_data
WHERE
ADC_Data_Collection_header.CONFIG_NAME = 'mobileScanning'
AND ADC_Data_Collection_header.BEMSID = '2386531'
AND ADC_Data_Collection_header.CONFIG_NAME = ADC_Data_Collection_apps.CONFIG_NAME
AND (TO_DATE('7/19/2013','MM/DD/YYYY') <= timestamp AND TO_DATE('7/27/2013','MM/DD/YYYY') >= timestamp)
AND ADC_DATA_COLLECTION_HEADER.transaction_ID = ADC_DATA_COLLECTION_DATA.Transaction_ID
UNION
SELECT
null as BEMSID,
null as DEVICE,
null as CONFIG_NAME,
ADC_Data_Collection_DATA.DATA AS locationlabel,
null as partno
FROM
ADC_DATA_COLLECTION_DATA,
ADC_Data_Collection_header,
ADC_Data_Collection_apps
WHERE
ADC_DATA_COLLECTION_DATA.DATA_ELEMENT_NUMBER = 3
AND ADC_Data_Collection_header.CONFIG_NAME = 'mobileScanning'
AND (TO_DATE('7/19/2013','MM/DD/YYYY') <= timestamp AND TO_DATE('7/27/2013','MM/DD/YYYY') >= timestamp)
AND ADC_DATA_COLLECTION_HEADER.transaction_ID = ADC_DATA_COLLECTION_DATA.Transaction_ID
UNION
SELECT
null as BEMSID,
null as DEVICE,
null as CONFIG_NAME,
null as locationlabel,
ADC_Data_Collection_DATA.DATA AS partno
FROM
ADC_DATA_COLLECTION_DATA,
ADC_Data_Collection_header,
ADC_Data_Collection_apps
WHERE
ADC_DATA_COLLECTION_DATA.DATA_ELEMENT_NUMBER = 4
AND ADC_Data_Collection_header.CONFIG_NAME = 'mobileScanning'
AND (TO_DATE('7/19/2013','MM/DD/YYYY') <= timestamp AND TO_DATE('7/27/2013','MM/DD/YYYY') >= timestamp)
AND ADC_DATA_COLLECTION_HEADER.transaction_ID = ADC_DATA_COLLECTION_DATA.Transaction_ID
The result from this appears with null values which I do not want to have.
If you can offer an explicit solution using a join statement or a fix to this union approach, it would be much appreciated. Thank you in advance!
UNION gives you additional rows so it's not the right tool for this situation.
Here's an abbreviated version that uses your ADC_DATA_COLLECTION_DATA table only; you should be able to incorporate this into your query:
SELECT
Transaction_ID,
MAX(CASE WHEN Data_Element_Number = 1 THEN Data END) AS Data1,
MAX(CASE WHEN Data_Element_Number = 2 THEN Data END) AS Data2,
MAX(CASE WHEN Data_Element_Number = 3 THEN Data END) AS Data3,
MAX(CASE WHEN Data_Element_Number = 4 THEN Data END) AS Data4
FROM ADC_DATA_COLLECTION_DATA
GROUP BY Transaction_ID
This is a fairly common "Pivot Table" hack for Oracle (and MySQL and SQL Server). Oracle also supports PIVOT queries but I'm not that good with them.
Note that once you put your final query together with the Device and Config_Name columns, you'll need to add those columns to your GROUP BY.
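A sketch of how the conditional aggregation might look once the header columns are joined in and added to the GROUP BY. It assumes Python's sqlite3 module; the table and column names follow the question, while the transaction ids, device names, and data values are invented sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table adc_data_collection_header (
    transaction_id integer primary key, device text, config_name text);
create table adc_data_collection_data (
    transaction_id integer, data_element_number integer, data text);
insert into adc_data_collection_header values
    (100, 'scanner-A', 'mobileScanning'),
    (101, 'scanner-B', 'mobileScanning');
insert into adc_data_collection_data values
    (100, 1, 'LOC-7'), (100, 2, 'PN-42'),
    (101, 1, 'LOC-9'), (101, 3, 'PN-77');
""")

# One output row per transaction; each element number becomes its own column.
query = """
select h.transaction_id, h.device, h.config_name,
       max(case when d.data_element_number = 1 then d.data end) as data1,
       max(case when d.data_element_number = 2 then d.data end) as data2,
       max(case when d.data_element_number = 3 then d.data end) as data3,
       max(case when d.data_element_number = 4 then d.data end) as data4
from adc_data_collection_header h
join adc_data_collection_data d on d.transaction_id = h.transaction_id
group by h.transaction_id, h.device, h.config_name
order by h.transaction_id
"""
pivoted = conn.execute(query).fetchall()
for row in pivoted:
    print(row)
```

A NULL now appears only where a transaction really has no value for that element number, rather than on every row as with the UNION attempt.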
I would use pivot for this:
select
h.transaction_id,
h.device,
h.config_name,
d.data1,
d.data2,
d.data3,
d.data4
from
ADC_DATA_COLLECTION_HEADER h
inner join (
select *
from ADC_DATA_COLLECTION_DATA
pivot
(
max(data)
for data_element_number in (1 as data1, 2 as data2, 3 as data3, 4 as data4)
)
) d
on d.transaction_id = h.transaction_id
where
(TO_DATE('7/19/2013','MM/DD/YYYY') <= timestamp AND TO_DATE('7/27/2013','MM/DD/YYYY') >= timestamp);
I put together an example SQL Fiddle at: http://www.sqlfiddle.com/#!4/fe1c94/9/0

Sort data row in sql

Please help me. I have columns from more than one table, and the data type of all these columns is integer.
I want to sort the values within each row (not the columns, so not ORDER BY), except for the primary key column.
For example:
column1(pk) column2 column3 column4 column5
1 6 5 3 1
2 10 2 3 1
3 1 2 4 3
How do I get this result?
column1(pk) column2 column3 column4 column5
1 1 3 5 6
2 1 2 3 10
3 1 2 3 4
Is this possible or impossible?
If it is impossible, how could I get a similar result without sorting?
What database are you using? The capabilities of the database are important. Second, this suggests a data structure issue. Things that need to be sorted would normally be separate entities . . . that is, separate rows in a table. The rest of this post answers the question.
If the database supports pivot/unpivot you can do the following:
(1) Unpivot the data to get it in the format <id>, <column name>, <value>.
(2) Use row_number() to assign a new column, based on the ordering of the values.
(3) Use the row_number() to create a varchar new column name.
(4) Pivot the data again using the new column.
You can do something similar if this functionality is not available.
First, change the data to rows:
(select id, 'col1', col1 as val from t) union all
(select id, 'col2', col2 from t) union all
. . .
Call this byrow. The following query appends a row number:
select br.*, row_number() over (partition by id order by val) as seqnum
from byrow br
Put this into a subquery and pivot the result back into columns. The final solution looks like:
with byrow as (<the big union all query>)
select id,
max(case when seqnum = 1 then val end) as col1,
max(case when seqnum = 2 then val end) as col2,
...
from (select br.*, row_number() over (partition by id order by val) as seqnum
from byrow br
) br
group by id
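The union-all, row_number, and conditional-aggregation steps above can be run on the question's sample rows. A minimal sketch, assuming Python's sqlite3 module with window-function support (SQLite 3.25 or later); the table name t and the column names id and col1 to col4 are placeholders standing in for the question's column1(pk) to column5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (id integer primary key, "
    "col1 integer, col2 integer, col3 integer, col4 integer)"
)
conn.executemany(
    "insert into t values (?, ?, ?, ?, ?)",
    [(1, 6, 5, 3, 1), (2, 10, 2, 3, 1), (3, 1, 2, 4, 3)],
)

# byrow: one row per (id, value); seqnum: position of the value within its id.
query = """
with byrow as (
    select id, col1 as val from t union all
    select id, col2 from t union all
    select id, col3 from t union all
    select id, col4 from t
)
select id,
       max(case when seqnum = 1 then val end) as col1,
       max(case when seqnum = 2 then val end) as col2,
       max(case when seqnum = 3 then val end) as col3,
       max(case when seqnum = 4 then val end) as col4
from (select br.*, row_number() over (partition by id order by val) as seqnum
      from byrow br) br
group by id
order by id
"""
sorted_rows = conn.execute(query).fetchall()
for row in sorted_rows:
    print(row)
```

Each output row keeps its key and carries the same four values in ascending order, which is the result the question asks for.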
You can use the PIVOT function of SQL Server to convert the rows into columns, apply the sorting there, and then convert the columns back to rows using UNPIVOT.
Here is a good example using PIVOT; you should be able to adapt it to meet your needs:
http://blogs.msdn.com/b/spike/archive/2009/03/03/pivot-tables-in-sql-server-a-simple-sample.aspx

sql query that will get a distinct type, brand, and model, but get a count of how many duplicates were found

I have a table "Competitor" and here are some of its columns:
Type | Brand | Model | Date | Resolution | etc.
The table will have duplicate Model entries (with obviously same Brand as well, but possibly a different Type (two possible types: 'ProAV' and 'Disti')). I need to build a query that will output a table like this:
Top (ProAV) | Top (Disti) | Last Occurrence | Brand | Model | Resolution | etc.
Basically I need a query that will get a distinct type, brand, and model, but get a count of how many duplicates were found and put that number in either Top (ProAV) or Top (Disti), whichever Type it has. I would need to pull the most recent (given Date) out of the duplicates, so that I can put its Date as the Last Occurrence field. I hope this makes sense, let me know if it doesn't.
SELECT SUM(CASE WHEN Type = 'ProAV' THEN 1 ELSE 0 END) AS TopProAV,
SUM(CASE WHEN Type = 'Disti' THEN 1 ELSE 0 END) AS TopDisti,
MAX(Date) AS LastOccurence,
Brand, Model, Resolution
FROM Competitor
GROUP BY Brand, Model, Resolution
EDIT: Based on the comment, you could use a subquery or CTE to accomplish what you want. Something like:
WITH cteMaxDate AS (
SELECT SUM(CASE WHEN Type = 'ProAV' THEN 1 ELSE 0 END) AS TopProAV,
SUM(CASE WHEN Type = 'Disti' THEN 1 ELSE 0 END) AS TopDisti,
MAX(Date) AS LastOccurence,
Brand, Model, Resolution
FROM Competitor
GROUP BY Brand, Model, Resolution
)
SELECT md.TopProAV, md.TopDisti,
md.LastOccurence,
md.Brand, md.Model, md.Resolution,
c.AdditionalColumn1, c.AdditionalColumn2
FROM cteMaxDate md
INNER JOIN Competitor c
ON md.Brand = c.Brand
AND md.Model = c.Model
AND md.Resolution = c.Resolution
AND md.LastOccurence = c.Date
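The counting part of this answer (one SUM(CASE ...) per type plus MAX(Date)) can be tried on a few rows. A minimal sketch, assuming Python's sqlite3 module; the Competitor rows, brand names, and ISO-formatted date strings are invented for illustration, and dates are stored as text so that MAX compares them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Competitor (Type text, Brand text, Model text, Date text, Resolution text)"
)
conn.executemany(
    "insert into Competitor values (?, ?, ?, ?, ?)",
    [("ProAV", "Acme", "X1", "2013-01-05", "1080p"),
     ("Disti", "Acme", "X1", "2013-03-02", "1080p"),
     ("ProAV", "Acme", "X1", "2013-02-10", "1080p"),
     ("Disti", "Bolt", "Z9", "2013-04-01", "720p")],
)

# One row per (Brand, Model, Resolution): per-type counts and the latest date.
query = """
select sum(case when Type = 'ProAV' then 1 else 0 end) as TopProAV,
       sum(case when Type = 'Disti' then 1 else 0 end) as TopDisti,
       max(Date) as LastOccurrence,
       Brand, Model, Resolution
from Competitor
group by Brand, Model, Resolution
order by Brand, Model
"""
summary = conn.execute(query).fetchall()
for row in summary:
    print(row)
```

Model X1 appears twice as ProAV and once as Disti, so its counts are 2 and 1, with the most recent date among the three rows.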
Do you have a limited number of Types? In that case you can solve your problem using PIVOT.
More specifically, for the table
Type Model
---- -----
A X
B X
C Y
A Z
NULL NULL
you run this query
Select Model, [A], [B], [C]
From
(select Model, Type
from dbo.Competitor) as SourceTable
PIVOT
(Count([Type]) for [Type] in ([A], [B], [C])) as PivotTable
to get
Model A B C
------ - - -
X 1 1 0
Y 0 0 1
Z 1 0 0