I'm trying to get exclusive max values from a query.
My first query (raw data) looks something like this:
Material  Fornecedor
X         B
X         B
X         B
X         C
X         C
Y         B
Y         D
Y         D
First, I need to build the max-values query for the table above. For that, I need to count consecutive rows with the same Material AND Fornecedor, i.e. keep counting until SQL finds a row with a different Material or Fornecedor.
After that, I'll get a result like the one shown below (max_line is the number of rows found with the same Material and Fornecedor):
max_line  Material  Fornecedor
3         X         B
2         X         C
1         Y         B
2         Y         D
In the end, I need to get the row with the highest max_line for each Material. Based on the table above, the result of the query I need to construct should be:
max_line  Material  Fornecedor
3         X         B
2         Y         D
My code so far is shown below:
select * from
(SELECT max(w2.line) as max_line, w2.Material, w2.[fornecedor] FROM
(SELECT w.Material, ROW_NUMBER() OVER(PARTITION BY w.Material, w.[fornecedor]
ORDER BY w.[fornecedor] DESC) as line, w.[fornecedor]
FROM [Database].[dbo].['Table1'] w) as w2
group by w2.Material, w2.[fornecedor]) as w1
inner join (SELECT w1.Material, MAX(w1.max_line) AS maximo FROM w1 GROUP BY w1.material) as w3
ON w1.Material = w3.Material AND w1.row = w3.maximo
I'm stuck on the inner join, since I can't alias a subquery (w1) and then reuse that alias inside the inner join.
Could you, please, help me?
Thank you,
Use a window function to find the max row number, then filter by it.
SELECT MAXROW, w1.Material, w1.[fornecedor]
FROM (
SELECT w2.Material, w2.[fornecedor], w2.[ROW]
, max([ROW]) over (partition by Material) MAXROW
FROM (
SELECT w.Material, w.[fornecedor]
, ROW_NUMBER() OVER (PARTITION BY w.Material, w.[fornecedor] ORDER BY w.[fornecedor] DESC) as [ROW]
FROM [Database].[dbo].['Table1'] w
) AS w2
) AS w1
WHERE w1.[ROW] = w1.MAXROW;
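A minimal, runnable sketch of this answer's query, using Python's built-in sqlite3 module (window functions need SQLite 3.25 or later) and the sample rows from the question. The table name table1 stands in for [Database].[dbo].['Table1'] and the row-number column is renamed rn; both are assumptions for the demo. Note that the middle subquery has to carry the row number forward, otherwise the outer WHERE has nothing to compare against MAXROW.

```python
import sqlite3

# Sample rows from the question; "table1" stands in for [Database].[dbo].['Table1'].
rows = [("X", "B"), ("X", "B"), ("X", "B"), ("X", "C"), ("X", "C"),
        ("Y", "B"), ("Y", "D"), ("Y", "D")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (Material TEXT, fornecedor TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)

query = """
SELECT w1.MAXROW, w1.Material, w1.fornecedor
FROM (
    -- carry rn forward and attach the largest rn seen for each Material
    SELECT w2.Material, w2.fornecedor, w2.rn,
           MAX(w2.rn) OVER (PARTITION BY w2.Material) AS MAXROW
    FROM (
        -- number the rows 1..n inside each (Material, fornecedor) pair
        SELECT w.Material, w.fornecedor,
               ROW_NUMBER() OVER (PARTITION BY w.Material, w.fornecedor
                                  ORDER BY w.fornecedor DESC) AS rn
        FROM table1 w
    ) AS w2
) AS w1
WHERE w1.rn = w1.MAXROW  -- keep the pair with the most rows per Material
ORDER BY w1.Material
"""
result = conn.execute(query).fetchall()
print(result)  # [(3, 'X', 'B'), (2, 'Y', 'D')]
```

If two Fornecedor values tie for the highest count within a Material, both rows pass the filter.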
I have a table showing production steps (PosID) for a production order (OrderID) and which machine (MachID) they will be run on; I’m trying to reduce the table to show one record for each order – the lowest position (field “PosID”) that is still open (field “Open” = Y); i.e. the next production step for the order.
Example data I have:
OrderID  PosID  MachID  Open
1        1      A       N
1        2      B       Y
1        3      C       Y
2        4      C       Y
2        5      D       Y
2        6      E       Y
Example result I want:
OrderID  PosID  MachID
1        2      B
2        4      C
I’ve tried two approaches, but I can’t seem to get either to work:
I don’t want to put “MachID” in the GROUP BY because that gives me all the records that are open, but I also don’t think there is an appropriate aggregate function for the “MachID” field to make this work.
SELECT "OrderID", MIN("PosID"), "MachID"
FROM Table T0
WHERE "Open" = 'Y'
GROUP BY "OrderID"
With this approach, I keep getting error messages that T1."PosID" (in the JOIN clause) is an invalid column. I've also tried T1.MIN("PosID") and MIN(T1."PosID").
SELECT T0."OrderID", T0."PosID", T0."MachID"
FROM Table T0
JOIN
(SELECT "OrderID", MIN("PosID")
FROM Table
WHERE "Open" = 'Y'
GROUP BY "OrderID") T1
ON T0."OrderID" = T1."OrderID"
AND T0."PosID" = T1."PosID"
Try this:
SELECT "OrderID", "PosID", "MachID" FROM (
SELECT
T0."OrderID",
T0."PosID",
T0."MachID",
ROW_NUMBER() OVER (PARTITION BY "OrderID" ORDER BY "PosID") RNK
FROM Table T0
WHERE "Open" = 'Y'
) AS A
WHERE RNK = 1
I've kept the double quotes around the column names as you wrote them in the question, but in general they're not needed.
The query first keeps only the open rows, then numbers the rows within each OrderID from 1 to N, ordered by PosID:
OrderID  PosID  MachID  Open  RNK
1        2      B       Y     1
1        3      C       Y     2
2        4      C       Y     1
2        5      D       Y     2
2        6      E       Y     3
The outer query then keeps only the rows with RNK = 1, i.e. the lowest open PosID per OrderID. ROW_NUMBER() in the SELECT clause is called a window function, and there are many more that are quite useful.
P.S. The solution above should work in MSSQL.
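To check the ROW_NUMBER() filter end to end, here is a small sketch that runs the same query in SQLite through Python's sqlite3 module on the example data. The table name orders is a hypothetical stand-in for the question's Table (TABLE is a reserved word); the quoted column names match the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "orders" is a hypothetical stand-in for the question's "Table".
conn.execute('CREATE TABLE orders ("OrderID" INT, "PosID" INT, "MachID" TEXT, "Open" TEXT)')
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, 1, "A", "N"), (1, 2, "B", "Y"), (1, 3, "C", "Y"),
    (2, 4, "C", "Y"), (2, 5, "D", "Y"), (2, 6, "E", "Y"),
])

query = """
SELECT "OrderID", "PosID", "MachID" FROM (
    SELECT T0."OrderID", T0."PosID", T0."MachID",
           ROW_NUMBER() OVER (PARTITION BY "OrderID" ORDER BY "PosID") AS RNK
    FROM orders T0
    WHERE "Open" = 'Y'      -- only open steps get numbered
) AS A
WHERE RNK = 1               -- lowest open PosID per order
ORDER BY "OrderID"
"""
result = conn.execute(query).fetchall()
print(result)  # [(1, 2, 'B'), (2, 4, 'C')]
```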
I am not very fluent with SQL. I'm facing a small issue in writing the best and most efficient SQL query. I have a table with a composite key of columns A and B, as shown below:
A  B  C
1  1  4
1  2  5
1  3  3
2  2  4
2  1  5
3  1  4
So what I need is to find rows where column C has both the values 4 and 5 (4 and 5 are just examples) for a particular value of column A. Here, 4 and 5 are both present for two A values (1 and 2). For A = 3, 4 is present but 5 is not, so we cannot take it.
My explanation is so confusing. I hope you get it.
After this, I need to keep only those where the B value for 4 (the first number) is less than the B value for 5 (the second number). In this case, for A = 1, the row (A=1, B=1, C=4) has a lower B value than the row (A=1, B=2, C=5), so we take it. For A = 2, the row (A=2, B=2, C=4) has a higher B value than the row (A=2, B=1, C=5), so we cannot take it.
I hope someone gets it and can help. Thanks.
This returns the rows for each a that has both c = 4 and c = 5, provided that ordering those rows by b gives the same order as ordering them by c.
select a, b, c
from (
select tbl.*,
count(*) over(partition by a) cnt,
row_number() over (partition by a order by b) brn,
row_number() over (partition by a order by c) crn
from tbl
where c in (4, 5)
) t
where cnt = 2 and brn = crn;
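A quick way to see what the three window columns do is to run the query on the question's sample table. This sketch uses SQLite via Python's sqlite3 module; the table name tbl is taken from the answer, and only a = 1 survives, because for a = 2 the b order (5 before 4) disagrees with the c order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (a INT, b INT, c INT, PRIMARY KEY (a, b))")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", [
    (1, 1, 4), (1, 2, 5), (1, 3, 3), (2, 2, 4), (2, 1, 5), (3, 1, 4),
])

query = """
select a, b, c
from (
    select tbl.*,
           count(*) over (partition by a) cnt,                -- how many of 4/5 this a has
           row_number() over (partition by a order by b) brn, -- rank by b
           row_number() over (partition by a order by c) crn  -- rank by c
    from tbl
    where c in (4, 5)
) t
where cnt = 2 and brn = crn
order by a, b
"""
result = conn.execute(query).fetchall()
print(result)  # [(1, 1, 4), (1, 2, 5)]
```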
EDIT
If the order of the parameters matters, the position of each parameter must be set explicitly. This version compares the b ordering to an explicit parameter position:
with params(val, pos) as (
select 4,2 union all
select 5,1
)
select a, b, c
from (
select tbl.*,
count(*) over(partition by a) cnt,
row_number() over (partition by a order by b) brn,
p.pos
from tbl
join params p on tbl.c = p.val
) t
where cnt = (select count(*) from params) and brn = pos;
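The sketch below runs the params version in SQLite through Python's sqlite3 module. To compare orderings, the params CTE is filled from bound parameters, and the helper name run is hypothetical. With the answer's positions (4 at position 2, 5 at position 1) the query selects a = 2, where 5 has the smaller b; swapping the positions gives the ordering described in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (a INT, b INT, c INT, PRIMARY KEY (a, b))")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", [
    (1, 1, 4), (1, 2, 5), (1, 3, 3), (2, 2, 4), (2, 1, 5), (3, 1, 4),
])

# Same query as above, but the params rows come from bound parameters.
query = """
with params(val, pos) as (
    select ?, ? union all
    select ?, ?
)
select a, b, c
from (
    select tbl.*,
           count(*) over (partition by a) cnt,
           row_number() over (partition by a order by b) brn,
           p.pos
    from tbl
    join params p on tbl.c = p.val
) t
where cnt = (select count(*) from params) and brn = pos
order by a, b
"""

def run(pos_of_4, pos_of_5):
    """Hypothetical helper: rows whose b-rank equals the position given to their c."""
    return conn.execute(query, (4, pos_of_4, 5, pos_of_5)).fetchall()

answer_params = run(2, 1)   # as in the answer: 5 must have the smaller b
question_order = run(1, 2)  # as in the question: 4 must have the smaller b
print(answer_params)   # [(2, 1, 5), (2, 2, 4)]
print(question_order)  # [(1, 1, 4), (1, 2, 5)]
```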
I assume you want the values of a where this is true. If so, you can use aggregation:
select a
from t
where c in (4, 5)
group by a
having count(distinct c) = 2;
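On the sample data this aggregation returns a = 1 and a = 2. The question's second condition (b for 4 below b for 5) can be folded into the same HAVING clause with conditional aggregation; that extension is an assumption layered on this answer, not part of it. Below is a SQLite sketch via Python's sqlite3 module. If a value of c repeats within one a, max() picks the larger b.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INT, b INT, c INT, PRIMARY KEY (a, b))")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, 1, 4), (1, 2, 5), (1, 3, 3), (2, 2, 4), (2, 1, 5), (3, 1, 4),
])

# The answer's query: values of a that have both c = 4 and c = 5.
both = """
select a
from t
where c in (4, 5)
group by a
having count(distinct c) = 2
order by a
"""

# Hypothetical extension: also require b for c = 4 to be below b for c = 5.
ordered = """
select a
from t
where c in (4, 5)
group by a
having count(distinct c) = 2
   and max(case when c = 4 then b end) < max(case when c = 5 then b end)
order by a
"""

a_with_both = [row[0] for row in conn.execute(both)]
a_in_order = [row[0] for row in conn.execute(ordered)]
print(a_with_both)  # [1, 2]
print(a_in_order)   # [1]
```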
I'm looking for an SQL statement similar to R's any() function. I have a time-series data set that begins in 2014 and ends in 2020. I have a column that identifies whether, in 2016, individuals voluntarily or involuntarily changed a drug. I want to completely remove any individuals who involuntarily changed a drug. In R, I would group by the individual's ID and delete all rows for an ID if its DrugChange column is 'Involuntarily'. My R code would look like this:
df<-df%>%group_by(ID)%>%filter(!any(DrugChange=='Involuntarily'))
In SQL I've been searching for a reasonably simple solution, and (stupidly) thought that just using a WHERE clause would work, but it only removes the matching row, not all the rows for that ID. Is there a way to do this with a WHERE clause, or is there a better method?
I think you want something like this:
select id
from t
group by id
having sum(case when DrugChange = 'Involuntarily' then 1 else 0 end) = 0;
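This query returns only the IDs to keep, not the rows themselves. The sketch below runs it in SQLite via Python's sqlite3 module on the same four test rows used in the Note further down, and then, as a hypothetical follow-up, uses it as an IN subquery to fetch the full rows for those IDs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (DrugChange TEXT, ID INT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [("Involuntarily", 1), ("X", 1), ("X", 2), ("X", 2)])

# The answer's query: IDs that never have an 'Involuntarily' row.
keep_ids = """
select id
from t
group by id
having sum(case when DrugChange = 'Involuntarily' then 1 else 0 end) = 0
"""
ids = [row[0] for row in conn.execute(keep_ids)]

# Hypothetical follow-up: fetch every row belonging to those IDs.
rows = conn.execute(f"select * from t where id in ({keep_ids}) order by id").fetchall()
print(ids)   # [2]
print(rows)  # [('X', 2), ('X', 2)]
```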
My understanding is that you want a subset of rows such that, if any row for an ID has Involuntarily in the DrugChange column, then all rows for that ID are excluded. In the example in the Note at the end, all rows for ID 1 would be excluded and all rows for ID 2 would be kept.
1) windowing function: Using the test data in the Note at the end and an SQL window function, create a column ok that is 1 for every row of an ID without any Involuntarily in the DrugChange column, and then pick only those rows. We have removed the ok column; if you want to keep it, omit the [-1].
library(sqldf)
sqldf("select * from (
select not max(DrugChange = 'Involuntarily') over (partition by ID) ok, *
from df
) where ok")[-1]
giving:
DrugChange ID
1 X 2
2 X 2
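Because sqldf uses SQLite by default, the same window query can be tried directly from Python's sqlite3 module. This sketch recreates the df test data from the Note and selects DrugChange and ID explicitly instead of dropping ok with [-1].

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Mirrors the R test data in the Note: DrugChange and ID columns.
conn.execute("CREATE TABLE df (DrugChange TEXT, ID INT)")
conn.executemany("INSERT INTO df VALUES (?, ?)",
                 [("Involuntarily", 1), ("X", 1), ("X", 2), ("X", 2)])

query = """
select DrugChange, ID from (
    -- ok is 1 on every row of an ID that never has 'Involuntarily'
    select not max(DrugChange = 'Involuntarily') over (partition by ID) as ok, *
    from df
) where ok
"""
result = conn.execute(query).fetchall()
print(result)  # [('X', 2), ('X', 2)]
```

The comparison DrugChange = 'Involuntarily' evaluates to 0 or 1 in SQLite, so max() over the partition acts like R's any().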
1a) This could be written in terms of a CTE like this:
sqldf("with inner as (
select not max(DrugChange = 'Involuntarily') over (partition by ID) ok, *
from df
)
select * from inner where ok")[-1]
2) join: An alternative approach is to generate one row per ID with its ok value and then join it to df, keeping only the IDs whose ok is 1.
sqldf("select a.*
from df a join (select ID, not max(DrugChange = 'Involuntarily') ok
from df
group by ID) b on a.ID = b.ID and b.ok")
giving:
DrugChange ID
1 X 2
2 X 2
2a) We could also write this in terms of a CTE like this:
sqldf("with right as (
select ID, not max(DrugChange = 'Involuntarily') ok
from df
group by ID
)
select a.* from df a join right b on a.ID = b.ID and b.ok")
3) in: A different approach is to use NOT IN, as shown here:
sqldf("select *
from df
where id not in (select distinct id from df where DrugChange = 'Involuntarily')")
giving:
DrugChange ID
1 X 2
2 X 2
It will also work without the distinct keyword.
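The NOT IN version can be tried the same way in SQLite through Python's sqlite3 module. One general caveat: if the subquery could ever return a NULL id, NOT IN would return no rows at all; that cannot happen with this test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df (DrugChange TEXT, ID INT)")
conn.executemany("INSERT INTO df VALUES (?, ?)",
                 [("Involuntarily", 1), ("X", 1), ("X", 2), ("X", 2)])

# Drop every row whose id appears with 'Involuntarily' anywhere in df.
query = """
select *
from df
where id not in (select distinct id from df where DrugChange = 'Involuntarily')
"""
result = conn.execute(query).fetchall()
print(result)  # [('X', 2), ('X', 2)]
```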
3a) We could also write it with a CTE like this:
sqldf("with ids as (
select distinct id from df where DrugChange = 'Involuntarily'
)
select * from df where id not in ids")
Note
Test data used.
df <- data.frame(DrugChange = c("Involuntarily", "X", "X", "X"), ID = c(1,1,2,2))
How can I build a query that returns records in which both columns are unique?
This is my attempted code:
Select x.a, y.b
from table1 x, table2 y
where x.id = y.id;
It returns:
1 - a
2 - b
3 - a
4 - b
5 - c
1 - c
6 - a
2 - a
should return this:
1 - a
2 - b
5 - c
There is data loss, but that's okay; I only want both columns to be unique.
I have tried GROUP BY, but that doesn't make both columns unique.
To describe the intention in more detail:
table1 above stores classifications, and table2 stores values. One classification can have multiple values, and one value can belong to any number of classifications. I want to pick all (or some) classifications from table1 and, for each one, pick one corresponding value, on the condition that a value must not repeat once it has been picked with another classification. In my example, 1, 2, 3, ..., 6 are classifications and a, b, c are values. I picked classifications 1, 2, 5 with values a, b, c, because any other classification would repeat a value (for example, 3 would repeat a, since a is already picked with classification 1).
The OP is looking for unique values (table2), each paired with an arbitrary classification (table1). One way to do this is to number the classification/value pairs within each group of values from table2 and then keep only the first pair from each group.
WITH
enumerated_data AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY y.b ORDER BY x.a DESC) rn
, x.a
, y.b
FROM table1 x
INNER JOIN table2 y ON x.id = y.id
)
SELECT a, b
FROM enumerated_data
WHERE rn = 1
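A sketch of this query in SQLite via Python's sqlite3 module. The two tables are hypothetical: table1(id, a) and table2(id, b) are built so that joining them on id reproduces the eight pairs from the question. Because of ORDER BY x.a DESC, the query keeps the highest classification for each value, so it returns (6, a), (4, b), (5, c) rather than the asker's sample output; each value appears once, but the query by itself does not stop one classification from being chosen for two different values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: joining on id yields the question's eight pairs.
conn.execute("CREATE TABLE table1 (id INT, a INT)")
conn.execute("CREATE TABLE table2 (id INT, b TEXT)")
pairs = [(1, "a"), (2, "b"), (3, "a"), (4, "b"), (5, "c"), (1, "c"), (6, "a"), (2, "a")]
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [(i, a) for i, (a, _) in enumerate(pairs)])
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [(i, b) for i, (_, b) in enumerate(pairs)])

query = """
WITH enumerated_data AS (
    -- number the classifications within each value, highest first
    SELECT ROW_NUMBER() OVER (PARTITION BY y.b ORDER BY x.a DESC) rn, x.a, y.b
    FROM table1 x
    INNER JOIN table2 y ON x.id = y.id
)
SELECT a, b FROM enumerated_data WHERE rn = 1 ORDER BY b
"""
result = conn.execute(query).fetchall()
print(result)  # [(6, 'a'), (4, 'b'), (5, 'c')]
```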
If your data is as you posted above, then it is already unique.
1 - a
2 - b
3 - a
4 - b
5 - c
1 - c
6 - a
2 - a
[1,a], [3,a], [6,a], [2,a] are all separate, unique pairs.
You can use an aggregate function like min or max on the number portion if you want to match the data that way. Your desired output is not the min or the max, though, so I'm not really sure what you are trying to achieve.
Sounds like a two-level group by is required?
select count(*), x.a, y.b
from table1 x
inner join table2 y on x.id = y.id
group by x.a, y.b;
This is my first post, and I am new to SQL.
I have a table like this:
H Amount Count ID
h1 2 1 x
h2 3 2 x
h3 5 3 x
h1 3 3 x
h1 1 5 y
h2 3 2 x
h3 1 1 x
h3 2 3 y
h2 5 5 y
and for each H group I want the SUM(Amount*Count) of each ID divided by the total SUM(Amount*Count) of that H group,
i.e.:
H value ID
h1 11/16 x value = (2*1+3*3)/(2*1+3*3+1*5)
h1 5/16 y value = (1*5)/(2*1+3*3+1*5)
h2 12/37 x
h2 25/37 y
h3 16/22 x
h3 6/22 y
My aim is to group by H and then, for EACH GROUP, compute SUM(Amount*Count) OVER (PARTITION BY ID) / SUM(Amount*Count),
but I am not able to write such a query. Can you please help me?
And sorry about the formatting
Thanks
Try this:
SELECT t2.h, t1.value1 * 1.0 / t2.value2 AS value, t1.id
FROM
(SELECT h, id, sum(amount*count) as value1 from table
group by h, id) as t1,
(SELECT h, sum(amount*count) as value2 from table
group by h) as t2
WHERE t1.h = t2.h
The easy answer is to use an inner query like so:
SELECT H, ID, SUM(Amount * Count) * 1.0 / (SELECT SUM(Amount * Count) FROM table AS t2 WHERE t2.H = t1.H) AS value
FROM table AS t1
GROUP BY H, ID
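A runnable sketch of the correlated-subquery idea in SQLite via Python's sqlite3 module, with the division done inside the query. The table name tbl is a stand-in for table, and the * 1.0 is there because SQLite, like SQL Server, truncates when dividing two integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "tbl" stands in for the question's table.
conn.execute("CREATE TABLE tbl (H TEXT, Amount INT, Count INT, ID TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", [
    ("h1", 2, 1, "x"), ("h2", 3, 2, "x"), ("h3", 5, 3, "x"),
    ("h1", 3, 3, "x"), ("h1", 1, 5, "y"), ("h2", 3, 2, "x"),
    ("h3", 1, 1, "x"), ("h3", 2, 3, "y"), ("h2", 5, 5, "y"),
])

query = """
SELECT H, ID,
       -- per-(H, ID) total divided by the total of the whole H group
       SUM(Amount * Count) * 1.0
         / (SELECT SUM(Amount * Count) FROM tbl AS t2 WHERE t2.H = t1.H) AS value
FROM tbl AS t1
GROUP BY H, ID
ORDER BY H, ID
"""
result = conn.execute(query).fetchall()
# h1: 11/16 and 5/16, h2: 12/37 and 25/37, h3: 16/22 and 6/22
for h, id_, value in result:
    print(h, id_, round(value, 4))
```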
This essentially refers to the same table as both t1 and t2 for two different queries.
However, the specific database management system you're using (MySQL, Microsoft SQL Server, sqlite, whatever) may have a built-in function to handle this sort of thing. You should look into what your DBMS offers (or tag your question here with a specific platform).
What you want to do is get the dividend value (sum of amount*count per h -> id group) and join the divisor value (sum of amount*count per h group) in another subselect:
SELECT
a.h, a.id, a.dividend * 1.0 / b.divisor AS value
FROM
(
SELECT h, id, SUM(amount*count) AS dividend
FROM tbl
GROUP BY h, id
) a
INNER JOIN
(
SELECT h, SUM(amount*count) AS divisor
FROM tbl
GROUP BY h
) b ON a.h = b.h
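The join version can be checked the same way in SQLite via Python's sqlite3 module (tbl as in the answer). The sketch also runs the division without * 1.0 to show why the cast matters: every dividend here is smaller than its divisor, so integer division returns 0 for every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (h TEXT, amount INT, count INT, id TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", [
    ("h1", 2, 1, "x"), ("h2", 3, 2, "x"), ("h3", 5, 3, "x"),
    ("h1", 3, 3, "x"), ("h1", 1, 5, "y"), ("h2", 3, 2, "x"),
    ("h3", 1, 1, "x"), ("h3", 2, 3, "y"), ("h2", 5, 5, "y"),
])

template = """
SELECT a.h, a.id, {expr} AS value
FROM (SELECT h, id, SUM(amount*count) AS dividend   -- per (h, id)
      FROM tbl GROUP BY h, id) a
INNER JOIN
     (SELECT h, SUM(amount*count) AS divisor        -- per h
      FROM tbl GROUP BY h) b ON a.h = b.h
ORDER BY a.h, a.id
"""
result = conn.execute(template.format(expr="a.dividend * 1.0 / b.divisor")).fetchall()
truncated = conn.execute(template.format(expr="a.dividend / b.divisor")).fetchall()
for h, id_, value in result:
    print(h, id_, round(value, 4))
print([v for _, _, v in truncated])  # [0, 0, 0, 0, 0, 0]
```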