Maintain order when using SQLite WHERE-clause and IN operator


Consider the following tbl:
CREATE TABLE tbl (ID INTEGER, ticker TEXT, desc TEXT);
INSERT INTO tbl (ID, ticker, desc)
VALUES (1, 'GDBR30', '30YR'),
(2, 'GDBR10', '10YR'),
(3, 'GDBR5', '5YR'),
(4, 'GDBR2', '2YR');
For reference, tbl looks like this:
ID ticker desc
1 GDBR30 30YR
2 GDBR10 10YR
3 GDBR5 5YR
4 GDBR2 2YR
When issuing the following statement, the result will be ordered according to ID.
SELECT * FROM tbl
WHERE ticker in ('GDBR10', 'GDBR5', 'GDBR30')
ID ticker desc
1 GDBR30 30YR
2 GDBR10 10YR
3 GDBR5 5YR
However, I need the ordering to adhere to the order of the passed list of values. Here's what I am looking for:
ID ticker desc
2 GDBR10 10YR
3 GDBR5 5YR
1 GDBR30 30YR

You can create a CTE that returns two columns, the sort position and the value you search for, and join it to the table.
In the ORDER BY clause, use the sort position column to sort the results:
WITH cte(id, ticker) AS (VALUES (1, 'GDBR10'), (2, 'GDBR5'), (3, 'GDBR30'))
SELECT t.*
FROM tbl t INNER JOIN cte c
ON c.ticker = t.ticker
ORDER BY c.id
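When the list of values comes from application code, the VALUES rows of the CTE can be generated from it. The following is a minimal sketch using Python's built-in sqlite3 module against an in-memory copy of tbl; the helper name select_in_order is made up for illustration, and the column desc is quoted only as a precaution because it is also a keyword.

```python
import sqlite3

def select_in_order(conn, tickers):
    """Return the rows of tbl whose ticker is in `tickers`, in list order."""
    if not tickers:
        return []  # an empty VALUES list would be a syntax error
    # One "(?, ?)" row per ticker: (sort position, ticker).
    values_sql = ", ".join("(?, ?)" for _ in tickers)
    params = []
    for position, ticker in enumerate(tickers, start=1):
        params.extend([position, ticker])
    sql = f"""
        WITH cte(id, ticker) AS (VALUES {values_sql})
        SELECT t.*
        FROM tbl t INNER JOIN cte c ON c.ticker = t.ticker
        ORDER BY c.id
    """
    return conn.execute(sql, params).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE tbl (ID INTEGER, ticker TEXT, "desc" TEXT)')
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(1, "GDBR30", "30YR"), (2, "GDBR10", "10YR"),
     (3, "GDBR5", "5YR"), (4, "GDBR2", "2YR")],
)

rows = select_in_order(conn, ["GDBR10", "GDBR5", "GDBR30"])
for row in rows:
    print(row)
```

Only the placeholders are formatted into the SQL text; the positions and tickers themselves are passed as bound parameters.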

The only way to be sure of the final ordering of records is to use the ORDER BY clause.
The order in which the values are listed in IN has no effect on the final ordering.
In your case, the only solution is to give each value a 'weight' to use as the sort order.
For example, you could replace the IN operator with the INSTR function to get a result that is both filterable and sortable.
Try something like this:
SELECT *, INSTR(',GDBR10,GDBR5,GDBR30,', ',' || ticker || ',') POS
FROM tbl
WHERE POS>0
ORDER BY POS;
If you don't want the position in the selected fields list you can use a subquery:
SELECT *
FROM (SELECT *, INSTR(',GDBR10,GDBR5,GDBR30,', ',' || ticker || ',') pos FROM tbl) X
WHERE POS>0
ORDER BY POS;
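If the list is built at run time, the delimited string can be assembled in application code and passed as a single parameter. This is only a sketch in Python's sqlite3 module; the helper name select_by_instr is hypothetical, and it assumes that no ticker contains the comma used as delimiter.

```python
import sqlite3

def select_by_instr(conn, tickers):
    """Filter and order tbl by the position of each ticker in the list."""
    if any("," in t for t in tickers):
        raise ValueError("tickers must not contain the delimiter ','")
    # Builds ',GDBR10,GDBR5,GDBR30,'; the surrounding commas stop a ticker
    # such as 'GDBR3' from matching inside 'GDBR30'.
    haystack = "," + ",".join(tickers) + ","
    sql = """
        SELECT ID, ticker, "desc"
        FROM (SELECT *, INSTR(?, ',' || ticker || ',') AS pos FROM tbl)
        WHERE pos > 0
        ORDER BY pos
    """
    return conn.execute(sql, (haystack,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE tbl (ID INTEGER, ticker TEXT, "desc" TEXT)')
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(1, "GDBR30", "30YR"), (2, "GDBR10", "10YR"),
     (3, "GDBR5", "5YR"), (4, "GDBR2", "2YR")],
)

ordered = select_by_instr(conn, ["GDBR10", "GDBR5", "GDBR30"])
print(ordered)
```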

Related

How to select rows corresponding to a randomly selected column value in SQL

My query returns a result like shown in the table. I would like to randomly pick an ID from the ID column and get all the rows having that ID. How can I do that in SnowFlake or SQL:
ID Postalcode Value ...
1e3d NK25F4 3214 ...
1e3d NK25F4 3258 ...
1e3d NK25F4 3354 ...
1f74 NG2LK8 5524
1f74 NG2LK8 5548
3e9a N6B7H4 3694
3e9a N6B7H4 3325
38e4 N6C7H2 3654 ...

Snowflake has a SAMPLE clause that returns a fixed number of "random" rows, so using it reduces the need to read all rows.
SELECT t.*
FROM your_table as t
JOIN (SELECT ID FROM your_table SAMPLE (1 ROWS)) as r
ON t.id = r.id
thus using your data above:
with your_table(id, postalcode, value) as (
select * from values
('1e3d', 'NK25F4', 3214),
('1e3d', 'NK25F4', 3258),
('1e3d', 'NK25F4', 3354),
('1f74', 'NG2LK8', 5524),
('1f74', 'NG2LK8', 5548),
('3e9a', 'N6B7H4', 3694),
('3e9a', 'N6B7H4', 3325),
('38e4', 'N6C7H2', 3654)
)
The result is random; one run returned:
ID POSTALCODE VALUE
1f74 NG2LK8 5,524
1f74 NG2LK8 5,548
You could also use a NATURAL JOIN like:
SELECT *
FROM your_table
NATURAL JOIN (SELECT ID FROM your_table SAMPLE (1 ROWS))

You could put your existing query in a common table expression, then pick a random ID from it, and use it to filter the dataset:
with
dat as ( ... your query ...),
tid as (select id from dat order by random() fetch first 1 row)
select d.*
from dat d
inner join tid t on t.id = d.id
The second CTE, tid picks the random id; it does that by randomly ordering the dataset, then getting the id of the top row.
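The two-CTE pattern can be tried out locally. The sketch below is a translation to SQLite through Python's sqlite3 module, not Snowflake syntax: it assumes random() and LIMIT 1 in place of FETCH FIRST 1 ROW, and it stands in a plain table for "your query".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id TEXT, postalcode TEXT, value INTEGER)")
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", [
    ("1e3d", "NK25F4", 3214), ("1e3d", "NK25F4", 3258), ("1e3d", "NK25F4", 3354),
    ("1f74", "NG2LK8", 5524), ("1f74", "NG2LK8", 5548),
    ("3e9a", "N6B7H4", 3694), ("3e9a", "N6B7H4", 3325),
    ("38e4", "N6C7H2", 3654),
])

sql = """
    WITH dat AS (SELECT * FROM your_table),
         tid AS (SELECT id FROM dat ORDER BY random() LIMIT 1)
    SELECT d.*
    FROM dat d
    INNER JOIN tid t ON t.id = d.id
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)  # every printed row carries the same, randomly chosen id
```

Because tid picks a random row, an id with more rows is chosen more often; selecting from SELECT DISTINCT id FROM dat inside tid would give every id the same chance.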

Something like
SELECT t.*
FROM Table_Name t
JOIN (SELECT ID FROM Table_Name ORDER BY RAND() LIMIT 1) r ON r.ID = t.ID;
should work. The random ID is picked in a derived table because MySQL does not allow LIMIT inside an IN subquery. The query is not particularly efficient, though, and in many application scenarios it would arguably be more reasonable overall to compute the random ID in your application (e.g. by keeping the set of all IDs cached and refreshing it periodically if need be).
(Note: The query assumes MySQL; other dialects may use slightly different keywords or structure, e.g. for the random function.)

WITH DATA AS (
select '1e3d' id,'NK25F4' postalcode,3214 some_value union all
select '1e3d' id,'NK25F4' postalcode,3258 some_value union all
select '1e3d' id,'NK25F4' postalcode,3354 some_value union all
select '1f74' id,'NG2LK8' postalcode,5524 some_value union all
select '1f74' id,'NG2LK8' postalcode,5548 some_value union all
select '3e9a' id,'N6B7H4' postalcode,3694 some_value union all
select '3e9a' id,'N6B7H4' postalcode,3325 some_value union all
select '38e4' id,'N6C7H2' postalcode,3654 some_value )
SELECT *
FROM DATA, LATERAL (SELECT ID FROM DATA SAMPLE (1 ROWS)) I
WHERE I.ID = DATA.ID

You can also play with the window frame a little and let qualify do the work
select *
from your_table
qualify id=first_value(id) over (order by random() rows between unbounded preceding and unbounded following)
Snowflake deviates from ANSI standard on the default window frames for rank-related functions (first_value, last_value, nth_value), so that makes the above equivalent to :
select *
from your_table
qualify id=first_value(id) over (order by random())

Query historized data

To describe my query problem, the following data is helpful:
A single table contains the columns ID (int), VAL (varchar) and ORD (int)
The values of VAL may change over time: older items identified by ID are not updated; instead, new rows are appended. The last valid item for an ID is identified by the highest ORD value (which increases over time).
T0, T1 and T2 are points in time where data got entered.
How do I get to the result set efficiently?
A solution must not involve materialized views etc. but should be expressible in a single SQL-query. Using Postgresql 9.3.

In PostgreSQL, the idiomatic way to select the groupwise maximum is DISTINCT ON:
SELECT DISTINCT ON (id) sysid, id, val, ord
FROM my_table
ORDER BY id,ord DESC;

You want all records for which no newer record exists:
select *
from mytable
where not exists
(
select *
from mytable newer
where newer.id = mytable.id
and newer.ord > mytable.ord
)
order by id;
You can do the same with row numbers. Give the latest entry per ID the number 1 and keep these:
select sysid, id, val, ord
from
(
select
sysid, id, val, ord,
row_number() over (partition by id order by ord desc) as rn
from mytable
) numbered
where rn = 1
order by id;
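To check that the NOT EXISTS and row-number variants agree, both can be run on a small data set. The sketch below uses Python's sqlite3 module rather than PostgreSQL (window functions need SQLite 3.25 or later), and the rows are hypothetical since the original illustration is not reproduced here: id 1 changes twice, id 2 once, id 3 never.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (sysid INTEGER, id INTEGER, val TEXT, ord INTEGER)"
)
# Hypothetical history: rows are appended, never updated.
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", [
    (1, 1, "a", 1), (2, 2, "b", 2), (3, 3, "c", 3),   # entered at T0
    (4, 1, "a2", 4), (5, 2, "b2", 5),                 # entered at T1
    (6, 1, "a3", 6),                                  # entered at T2
])

# Variant 1: keep rows for which no newer row with the same id exists.
not_exists = conn.execute("""
    SELECT sysid, id, val, ord
    FROM mytable
    WHERE NOT EXISTS (
        SELECT 1 FROM mytable newer
        WHERE newer.id = mytable.id AND newer.ord > mytable.ord)
    ORDER BY id
""").fetchall()

# Variant 2: number the rows per id, newest first, and keep number 1.
row_numbers = conn.execute("""
    SELECT sysid, id, val, ord
    FROM (SELECT sysid, id, val, ord,
                 row_number() OVER (PARTITION BY id ORDER BY ord DESC) AS rn
          FROM mytable) numbered
    WHERE rn = 1
    ORDER BY id
""").fetchall()

print(not_exists)
print(row_numbers)
```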

Left join the table (A) against itself (B) on the condition that B is more recent than A. Pick only the rows where B does not exist (i.e. A is the most recent row).
SELECT last_value.*
FROM my_table AS last_value
LEFT JOIN my_table
ON my_table.id = last_value.id
AND my_table.ord > last_value.ord
WHERE my_table.id IS NULL;

Postgresql : How do I select top n percent(%) entries from each group/category

We are new to Postgres. We have the following query, which selects the top N records from each category.
create table temp (
gp char,
val int
);
insert into temp values ('A',10);
insert into temp values ('A',8);
insert into temp values ('A',6);
insert into temp values ('A',4);
insert into temp values ('B',3);
insert into temp values ('B',2);
insert into temp values ('B',1);
select a.gp,a.val
from temp a
where a.val in (
select b.val
from temp b
where a.gp=b.gp
order by b.val desc
limit 2);
Output of above query is something like this
gp val
----------
A 10
A 8
B 3
B 2
But our requirement is different: we want to select the top n% of records from each category, where n is not fixed but is based on some percentage of the elements in each group.

To retrieve the rows based on the percentage of the number of rows in each group you can use two window functions: one to count the rows and one to give them a unique number.
select gp,
val
from (
select gp,
val,
count(*) over (partition by gp) as cnt,
row_number() over (partition by gp order by val desc) as rn
from temp
) t
where rn::numeric / cnt <= 0.75;
SQLFiddle example: http://sqlfiddle.com/#!15/94fdd/1
Btw: using char is almost always a bad idea because it is a fixed-length data type that is padded to the defined length. I hope you only did that for setting up the example and don't use it in your real table.
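The count-plus-row-number approach can be parameterised on the fraction to keep. This is a sketch in Python's sqlite3 module, not PostgreSQL (it needs SQLite 3.25 or later for window functions); the function name top_fraction_per_group is made up, and the table name is quoted because temp also names SQLite's temporary schema. The row number is cast to a floating-point value before dividing so that the ratio is not truncated by integer division.

```python
import sqlite3

def top_fraction_per_group(conn, fraction):
    """Return the top `fraction` (0..1) of rows per gp, highest val first."""
    sql = """
        SELECT gp, val
        FROM (SELECT gp, val,
                     count(*) OVER (PARTITION BY gp) AS cnt,
                     row_number() OVER (PARTITION BY gp ORDER BY val DESC) AS rn
              FROM "temp") t
        WHERE CAST(rn AS REAL) / cnt <= ?
        ORDER BY gp, val DESC
    """
    return conn.execute(sql, (fraction,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "temp" (gp TEXT, val INTEGER)')
conn.executemany('INSERT INTO "temp" VALUES (?, ?)', [
    ("A", 10), ("A", 8), ("A", 6), ("A", 4),
    ("B", 3), ("B", 2), ("B", 1),
])

print(top_fraction_per_group(conn, 0.5))
print(top_fraction_per_group(conn, 0.75))
```

With the sample data, a fraction of 0.5 keeps two of the four A rows and one of the three B rows, since the second B row would already cover two thirds of its group.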

Referencing the response from a_horse_with_no_name, you can achieve something similar using percent_rank()
SELECT
gp,
val,
pct_rank
FROM (
SELECT
gp,
val,
percent_rank() over (partition by gp order by val desc) as pct_rank
FROM variables.temp
) t
WHERE pct_rank <= 0.75;
You can then set the final WHERE clause to return data at whatever percent_rank() threshold you require.

The accepted answer did not work for me. I found this solution, which does:
SELECT * FROM temp ORDER BY val DESC
LIMIT (SELECT (count(*) / 10) AS selnum FROM temp )
It takes the top 10% of the whole table rather than of each group and is not optimal for performance, but it works.

remove duplicate records with a criteria

I am using a script that requires only unique values. I have a table with duplicates like the one below, and I need to keep only unique values (the first occurrence), irrespective of what is inside the brackets.
Can I delete the duplicate records and keep the unique ones using a single query?
Input table
ID Name
1 (Del)testing
2 (Del)test
3 (Delete)testing
4 (Delete)tester
5 (Del)tst
6 (Delete)tst
So the output table should be something like
Output table
ID Name
1 (Del)testing
2 (Del)test
3 (Delete)tester
4 (Del)tst

SELECT DISTINCT * FROM FOO;
It depends on how much data you have to retrieve. If you only have to change Delete -> Del, you can try REPLACE:
http://technet.microsoft.com/en-us/library/ms186862.aspx
Grouping functions should also help.
I don't think this will be an easy query.

Assumption: The name column always has all strings in the format given in the sample data.
Try this:
;with cte as
(select *, rank() over
(partition by substring(name, charindex(')',name)+1,len(name)+1 - charindex(')',name))
order by id) rn
from tbl
),
filtered_cte as
(select * from cte
where rn = 1
)
select rank() over (partition by getdate() order by id,getdate()) id , name
from filtered_cte
How this works:
The first CTE cte uses rank() to rank the occurrence of the string outside brackets in the name column.
The second CTE filtered_cte only returns the first row for each occurrence of the specified string. In this step, we get the expected results, but not in the desired format.
The final SELECT partitions by and orders by the getdate() function. This function is chosen as a dummy to give us continuous values for the id column while using the rank function as we did in step 1.
Note that this solution will return filtered values, but not delete anything in the source table. If you wish, you can delete from the CTE created in step 1 to remove data from the source table.
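To actually remove the duplicates rather than filter them out, one option is to delete every row whose id is not the smallest id for its bracket-free suffix. The sketch below is an assumption-laden translation to SQLite, run through Python's sqlite3 module: substr and instr stand in for SQL Server's substring and charindex, and it relies on every name containing a closing bracket, as in the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?, ?)", [
    (1, "(Del)testing"), (2, "(Del)test"), (3, "(Delete)testing"),
    (4, "(Delete)tester"), (5, "(Del)tst"), (6, "(Delete)tst"),
])

# substr(name, instr(name, ')') + 1) is the text after the closing bracket.
# For each such suffix the smallest id is kept; all other rows are deleted.
conn.execute("""
    DELETE FROM tbl
    WHERE id NOT IN (
        SELECT min(id)
        FROM tbl
        GROUP BY substr(name, instr(name, ')') + 1))
""")

remaining = conn.execute("SELECT id, name FROM tbl ORDER BY id").fetchall()
print(remaining)
```

The surviving rows keep their original ids; renumbering them, as in the expected output of the question, would be a separate step.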

First use this update to make them uniform
Update table set name = replace(Name, '(Del)' , '(Delete)')
then delete the repetitive names
Delete from table where id in
(Select id from (Select Row_Number() over(Partition by Name order by id) as rn,* from table) x
where rn > 1)

First create the input data table
CREATE TABLE test
(ID int,Name varchar(20));
INSERT INTO test
(ID, Name)
VALUES
(1, '(Del)testing'),
(2, '(Del)test'),
(3, '(Delete)testing'),
(4, '(Delete)tester'),
(5, '(Del)tst'),
(6, '(Delete)tst');
Select Query
select id, name
from (
select id, name ,
ROW_NUMBER() OVER(PARTITION BY substring(name,PATINDEX('%)%',name)+1,20) ORDER BY id) rn
from test ) t
where rn= 1
order by 1
SQL Fiddle Link
http://www.sqlfiddle.com/#!6/a02b0/34

Row with the highest ID

You have three fields: ID, Date and Total. Your table contains multiple rows for the same day, which is valid data; however, for reporting purposes you need to show only one row per day. The row with the highest ID per day should be returned; the rest should be hidden from users (not returned).
To better picture the question, below are sample data and sample output:
ID, Date, Total
1, 2011-12-22, 50
2, 2011-12-22, 150
The correct result is:
2, 2011-12-22, 150
The correct output is a single row for the date 2011-12-22, and this row was chosen because it has the highest ID (2 > 1).

Assuming that you have a database that supports window functions, and that the date column is indeed just date (and not datetime), then something like:
SELECT
* --TODO - Pick columns
FROM
(
SELECT ID,[Date],Total,ROW_NUMBER() OVER (PARTITION BY [Date] ORDER BY ID desc) rn
FROM [Table]
) t
WHERE
rn = 1
Should produce one row per day - and the selected row for any given day is that with the highest ID value.
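The same row-number query can be tried in SQLite (3.25 or later) through Python's sqlite3 module. This is only a sketch: the table name daily is made up, the bracket quoting is replaced by double quotes, and a third row for another day is added to the sample data to show that each day yields its own row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE daily (ID INTEGER, "Date" TEXT, Total INTEGER)')
conn.executemany("INSERT INTO daily VALUES (?, ?, ?)", [
    (1, "2011-12-22", 50),
    (2, "2011-12-22", 150),
    (3, "2011-12-23", 75),   # hypothetical extra day, not in the question
])

# Number the rows of each day from the highest ID down, then keep number 1.
rows = conn.execute("""
    SELECT ID, "Date", Total
    FROM (SELECT ID, "Date", Total,
                 row_number() OVER (PARTITION BY "Date" ORDER BY ID DESC) AS rn
          FROM daily) t
    WHERE rn = 1
    ORDER BY "Date"
""").fetchall()
print(rows)
```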

SELECT *
FROM table
WHERE ID IN ( SELECT MAX(ID)
FROM table
GROUP BY Date )

This will work.
SELECT *
FROM tableName a
INNER JOIN
(
SELECT `DATE`, MAX(ID) maxID
FROM tableName
GROUP BY `DATE`
) b ON a.id = b.MaxID AND
a.`date` = b.`date`

Probably
SELECT * FROM your_table ORDER BY ID DESC LIMIT 1

Select MAX(ID),Date,Total from foo
for MySQL

Another simple way is
SELECT TOP 1 * FROM YourTable ORDER BY ID DESC
And, I think this is the most simple way!

SELECT * FROM TABLE_SUM S WHERE S.ID =
(
SELECT MAX(ID) FROM TABLE_SUM
WHERE CDATE = S.CDATE
GROUP BY CDATE
)
