This is the result set of my raw data.
What I want is to fill the blank rows in the name column with the first name of each group. In this example, rowids 1881 and 1879 should be filled with MAR ROXAS, and 1881-1887 with RODRIGO DUTERTE, and so on.
The ideal way is lag(ignore nulls), but SQL Server doesn't support that. Instead, you can use two levels of window functions:
select t.*, max(name) over (partition by name_rowid) as new_name
from (select t.*,
             max(case when name is not null then rowid end) over (order by rowid) as name_rowid
      from billtrans t
     ) t;
The above works in SQL Server 2012+. In SQL Server 2008, you can use much less efficient methods, such as outer apply:
select t.*, t2.name as new_name
from billtrans t outer apply
     (select top 1 t2.name
      from billtrans t2
      where t2.rowid <= t.rowid and t2.name is not null
      order by t2.rowid desc
     ) t2;
You can also phrase this using a similarly structured correlated subquery.
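For example, a minimal sketch of that correlated-subquery version, using the same billtrans table and rowid column as above:
select t.*,
       (select top 1 t2.name
        from billtrans t2
        where t2.rowid <= t.rowid and t2.name is not null
        order by t2.rowid desc
       ) as new_name
from billtrans t;
And for what it's worth, SQL Server 2022 did finally add IGNORE NULLS to the windowed value functions, so on that version a direct sketch would be:
select t.*,
       last_value(name) ignore nulls over (order by rowid
                                           rows between unbounded preceding and current row) as new_name
from billtrans t;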
I have the following table:
I need to write a query that returns the corresponding value for each code in the previous month, like the following:
Like I said, I think it is a simple query, but I can't work it out.
Just use lag():
select t.*, lag(value) over (partition by code order by date) as value_last_month
from t;
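If you would rather get a default (say 0) instead of NULL for the first month of each code, note that lag() also accepts a default as an optional third argument; a minimal sketch:
select t.*, lag(value, 1, 0) over (partition by code order by date) as value_last_month
from t;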
In older versions of SQL Server, you can use apply:
select t.*, t2.value as value_last_month
from t outer apply
     (select top (1) t2.*
      from t t2
      where t2.code = t.code and t2.date < t.date
      order by t2.date desc
     ) t2;
I have a table with two fields:
CREATE TABLE Temp_tab
(
id int identity primary key,
value float
);
INSERT INTO Temp_tab(value)
VALUES (65.09),(17.09);
I want to select all the records that are greater than Avg(Value).
Say... Select * from temp_tab where value > (select avg(value) from temp_tab);
This query (using a subquery) gives me the expected output:
1 65.09
I want to achieve this without using a subquery, CTE, or procedure, since I am using Spark, and Spark SQL does not support subqueries, CTEs, or procedures.
You can do this quite painfully with a cross join and aggregation:
Select t1.id, t1.value
from temp_tab t1 cross join
temp_tab t2
group by t1.id, t1.value
having t1.value > avg(t2.value);
As a note: Spark SQL claims to support subqueries (see here). So, your original query should work. If it only supports subqueries in the from clause, then you can do:
Select t.*
from temp_tab t join
(select avg(value) as avgvalue from temp_tab) a
on t.value > a.avgvalue;
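Alternatively, if your version of Spark supports window functions, here is a sketch using avg() over an empty window; note that it still relies on a derived table in the from clause:
select id, value
from (select t.*, avg(value) over () as avgvalue
      from temp_tab t
     ) t
where value > avgvalue;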
spark-sql accepts this query under version 1.6.x:
select * from (select * from tenmin_history order by TS_TIME DESC limit 144) a order by TS_TIME
This query solved my problem.
So I have this basic setup:
Declare @temp Table(t1 varchar(1)
                   ,t2 int)
insert into @temp (t1,t2)
Values
('a','0')
,('a','1')
,('a','2')
,('a','3')
,('a',null)
select t1,t2,ROW_NUMBER() OVER ( PARTITION BY T1 ORDER BY t2) 'rnk'
from @temp
The problem is that the NULL value gets ranked highest. What I am trying to do is give the first non-zero/non-null value the highest rank (lowest number). The current output is:
t1 t2 rnk
a NULL 1
a 0 2
a 1 3
a 2 4
a 3 5
What I want is:
t1 t2 rnk
a NULL 4/5 --either or
a 0 4/5
a 1 1
a 2 2
a 3 3
I know I can do this with subqueries, but the problem is that t2 is really a 200-character case statement that I don't want to copy and paste all over: once to calculate it, then once more to order by it, and so on. I am looking at a query to get the values, inside a query to get the rank, inside a query to pull only those ranked 1, which is three levels deep, and I don't like that. Note: I know the question says Oracle, and I am sure at least one person will mark me down since this is written in SQL Server, but the actual code is in Oracle; I am just much better in SQL Server, and it is easy to translate unless Oracle has some magic function that makes this easier.
You can use two keys for the order by. The following is compatible with both SQL Server and Oracle:
select t1, t2,
       ROW_NUMBER() OVER (PARTITION BY T1
                          ORDER BY (CASE WHEN t2 IS NOT NULL AND t2 <> 0 THEN 0 ELSE 1 END),
                                   t2
                         ) as rnk
from @temp;
Oracle supports NULLS LAST, which makes it easier: ORDER BY t2 NULLS LAST.
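In full, a sketch of the Oracle version using NULLS LAST, assuming the data lives in a regular table called temp (Oracle has no table variables); this pushes the NULLs to the bottom, while the zeros would still need the CASE key above:
select t1, t2,
       ROW_NUMBER() OVER (PARTITION BY t1 ORDER BY t2 NULLS LAST) as rnk
from temp;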
Another option would be to use the ISNULL function to set it to the max value of the type on null values. So if t2 is an integer it would be:
select t1,t2,ROW_NUMBER() OVER ( PARTITION BY T1 ORDER BY ISNULL(t2,2147483647)) 'rnk'
from @temp
This would prevent you from having to use t2 (your big case statement) in your expression more than once.
I believe Oracle uses NVL instead of ISNULL.
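So the Oracle spelling of this answer would be a sketch like the following, again assuming a regular table called temp:
select t1, t2,
       ROW_NUMBER() OVER (PARTITION BY t1 ORDER BY NVL(t2, 2147483647)) as rnk
from temp;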
I have this SQL query for pagination:
SELECT * FROM
(
    SELECT T1.*, T2.*, ROW_NUMBER() over (ORDER BY T1.ID DESC) row
    FROM table1 t1
         LEFT JOIN table2 t2 on t1.id = t2.pid
) tbl
WHERE row >= @start and row < @end
Now the problem is that the select can return thousands of records, and it will be executed for each page by each user.
Any suggestions on how I can limit the select (select fewer records)?
The ROW_NUMBER could be ordered by ID or by DATE.
And by the way, selecting * is just for the simplicity of the sample code.
If you have SQL Server 2012 or above, you can use the OFFSET and FETCH keywords, as stated here.
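For example, a sketch of the OFFSET/FETCH version of the query above, where @pagesize is an illustrative variable for the number of rows per page and @start is treated as a zero-based offset:
SELECT t1.*, t2.*
FROM table1 t1
     LEFT JOIN table2 t2 ON t1.id = t2.pid
ORDER BY t1.id DESC
OFFSET @start ROWS FETCH NEXT @pagesize ROWS ONLY;
This also lets the engine stop once the requested page has been produced, which is usually cheaper than numbering the entire result set first.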
I have a Microsoft SQL Server 2005 table where the entire row is not duplicated, but one column is:
1 aaa
1 bbb
1 ccc
2 abc
2 def
How can I delete all but one of the rows that have the first column duplicated?
For clarification, I need to get rid of the second, third, and fifth rows.
Try the following query in SQL Server 2005:
WITH T AS (SELECT ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) AS rnum, * FROM dbo.Table_1)
DELETE FROM T WHERE rnum > 1
Let's call these the id and the Col1 columns.
DELETE FROM myTable T1
WHERE EXISTS
      (SELECT * FROM myTable T2
       WHERE T2.id = T1.id AND T2.Col1 > T1.Col1)
Edit: As pointed out by Andomar, the above doesn't get rid of exact duplicates, where both id and Col1 are the same in different rows.
These can be handled as follows:
(note: whereas the above query is generic SQL, the following applies to MSSQL 2005 and above)
It uses the Common Table Expression (CTE) feature, along with the ROW_NUMBER() function, to produce a distinctive row value. It is essentially the same construct as the one above, except that it now works with a "table" (CTEs are mostly like tables) which has a truly distinct identifier key.
Note that by removing "AND T2.Col1 = T1.Col1", we produce a query which can handle both types of duplicates (id-only duplicates, and duplicates of both id and Col1) in a single query, i.e. in a similar fashion to Hamadri's solution (the PARTITION in his/her CTE serves the same purpose as the subquery in this solution; essentially the same amount of work is done). Depending on the situation, it may be preferable, performance-wise or otherwise, to handle the situation in two steps.
WITH T AS
     (SELECT ROW_NUMBER() OVER (ORDER BY id, Col1) AS rn, id, Col1 FROM MyTable)
DELETE T1
FROM T AS T1
WHERE EXISTS
      (SELECT *
       FROM T AS T2
       WHERE T2.id = T1.id AND T2.Col1 = T1.Col1
             AND T2.rn > T1.rn
      )
DELETE FROM tableName
WHERE col2 NOT IN (SELECT MIN(col2) FROM tableName AS t2 GROUP BY col1)
Make sure the sub select returns the rows you want to keep.
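For instance, a hypothetical variant that keeps the last row of each group instead of the first, just by swapping the aggregate in the subselect:
DELETE FROM tableName
WHERE col2 NOT IN (SELECT MAX(col2) FROM tableName AS t2 GROUP BY col1)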
Try this:
DELETE FROM <TABLE_NAME_HERE> WHERE <SECOND_COLUMN_NAME_HERE> IN ('bbb','abc','def');
SQL Server is not my native SQL database, but maybe something like this? The idea is to self-join on the duplicated column and delete the rows with the larger value in the second column, which should leave only the first one. I don't know if this is what you want or if it will work, but the logic seems sound:
DELETE T1
FROM MyTable T1
     JOIN MyTable T2
       ON T1.Col1 = T2.Col1
      AND T1.Col2 > T2.Col2
Please feel free to correct me if SQL Server can't handle that kind of treatment :)
Another idea using ROW_NUMBER():
Delete MyTable
Where Id IN
(
    Select T.Id FROM
    (
        SELECT Id, ROW_NUMBER() OVER (PARTITION BY UniqueColumn ORDER BY Id) AS RowNumber
        FROM MyTable
    ) T
    WHERE T.RowNumber > 1
)