Using ORDER BY and LIMIT in an SQL view - sql

I'm trying to create a view that is limited to the last entry per id.
My table structure is as follows:
CREATE TABLE IF NOT EXISTS `u_tbleeditlog` (
`editID` bigint(20) NOT NULL AUTO_INCREMENT,
`editType` int(1) NOT NULL,
`editTypeID` bigint(20) NOT NULL,
`editedID` bigint(20) NOT NULL,
`editedDtm` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`editID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
I'm trying to create a view that will only display the last entry for each combination of editType and editTypeID.
My view so far:
CREATE OR REPLACE VIEW vwu_editlog AS
SELECT u_tbleeditlog.*, CONCAT_WS(' ',u_users.user_firstname,u_users.user_lastname) AS editedEditor
FROM u_tbleeditlog
JOIN u_users ON u_users.user_id = u_tbleeditlog.editedID
ORDER BY u_tbleeditlog.editedDtm DESC LIMIT 1
But my problem is that this limits the entire view to just 1 result overall, and I get the message "Current selection does not contain a unique column. Grid edit, checkbox, Edit, Copy and Delete features are not available."
So say there are multiple rows for the same type and type ID, such as 1, 1, 2017-08-16 and 1, 1, 2016-05-14; for that pair it should only return 1, 1, 2017-08-16.
Can anyone please tell me if what I'm trying to do is possible, and if so how? :)

Do this with the not exists approach to getting the last row in a series:
CREATE OR REPLACE VIEW vwu_editlog AS
SELECT el.*, CONCAT_WS(' ', u.user_firstname, u.user_lastname) AS editedEditor
FROM u_tbleeditlog el JOIN
u_users u
ON u.user_id = el.editedID
WHERE not exists (select 1
from u_tbleeditlog el2
where el2.editType = el.editType and
el2.editTypeID = el.editTypeID and
el2.editedDtm > el.editedDtm
);
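To check how the NOT EXISTS filter behaves, here is a minimal sketch that runs the same predicate against an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for MySQL here, the sample rows are made up, and the join to u_users is left out to keep the example short.

```python
import sqlite3

# In-memory stand-in for u_tbleeditlog (sample rows are invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE u_tbleeditlog (
    editID     INTEGER PRIMARY KEY,
    editType   INTEGER NOT NULL,
    editTypeID INTEGER NOT NULL,
    editedID   INTEGER NOT NULL,
    editedDtm  TEXT    NOT NULL
);
INSERT INTO u_tbleeditlog (editType, editTypeID, editedID, editedDtm) VALUES
    (1, 1, 10, '2016-05-14 00:00:00'),
    (1, 1, 11, '2017-08-16 00:00:00'),
    (1, 2, 10, '2015-01-01 00:00:00'),
    (2, 1, 12, '2017-01-01 00:00:00'),
    (2, 1, 10, '2017-03-01 00:00:00');
""")

# Keep a row only if no later row exists for the same (editType, editTypeID).
latest_rows = conn.execute("""
    SELECT el.editType, el.editTypeID, el.editedDtm
    FROM u_tbleeditlog el
    WHERE NOT EXISTS (
        SELECT 1
        FROM u_tbleeditlog el2
        WHERE el2.editType = el.editType
          AND el2.editTypeID = el.editTypeID
          AND el2.editedDtm > el.editedDtm
    )
    ORDER BY el.editType, el.editTypeID
""").fetchall()

for row in latest_rows:
    print(row)
```

Note that if two rows in the same group share the latest editedDtm, both are returned; comparing on editID instead would break such ties.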

You can also use GROUP BY with HAVING for that. What database are you using?
It should look something like this:
SELECT editType, editTypeID, editedDtm
FROM u_tbleeditlog AS u
GROUP BY editType, editTypeID, editedDtm
HAVING editedDtm = (SELECT MAX(editedDtm) FROM u_tbleeditlog WHERE editType = u.editType AND editTypeID = u.editTypeID)
ORDER BY editedDtm DESC

Related

Redshift create list and search different table with it

I think there are a few ways to tackle this, but I'm not sure how to do any of them.
I have two tables. The first has IDs and Numbers. The IDs and Numbers can each be listed more than once, so I create a result table that lists the unique Numbers grouped by ID.
My second table has rows (100 million) with the ID and Numbers again. I need to search that table for any ID that has a Number not in the list of Numbers from the result table.
Can Redshift do a query based on whether the ID matches and the Number exists in the list from the table? Can this all be done in memory/one statement?
DROP TABLE IF EXISTS `myTable`;
CREATE TABLE `myTable` (
`id` mediumint(8) unsigned NOT NULL auto_increment,
`ID` varchar(255),
`Numbers` mediumint default NULL,
PRIMARY KEY (`id`)
) AUTO_INCREMENT=1;
INSERT INTO `myTable` (`ID`,`Numbers`)
VALUES
("CRQ44MPX1SZ",1890),
("UHO21QQY3TW",4370),
("JTQ62CBP6ER",1825),
("RFD95MLC2MI",5014),
("URZ04HGG2YQ",2859),
("CRQ44MPX1SZ",1891),
("UHO21QQY3TW",4371),
("JTQ62CBP6ER",1826),
("RFD95MLC2MI",5015),
("URZ04HGG2YQ",2860),
("CRQ44MPX1SZ",1892),
("UHO21QQY3TW",4372),
("JTQ62CBP6ER",1827),
("RFD95MLC2MI",5016),
("URZ04HGG2YQ",2861);
SELECT ID, listagg(distinct Numbers,',') as Number_List, count(Numbers) as Numbers_Count
FROM myTable
GROUP BY ID
AS result
DROP TABLE IF EXISTS `myTable2`;
CREATE TABLE `myTable2` (
`id` mediumint(8) unsigned NOT NULL auto_increment,
`ID` varchar(255),
`Numbers` mediumint default NULL,
PRIMARY KEY (`id`)
) AUTO_INCREMENT=1;
INSERT INTO `myTable2` (`ID`,`Numbers`)
VALUES
("CRQ44MPX1SZ",1870),
("UHO21QQY3TW",4350),
("JTQ62CBP6ER",1825),
("RFD95MLC2MI",5014),
("URZ04HGG2YQ",2859),
("CRQ44MPX1SZ",1891),
("UHO21QQY3TW",4371),
("JTQ62CBP6ER",1826),
("RFD95MLC2MI",5015),
("URZ04HGG2YQ",2860),
("CRQ44MPX1SZ",1882),
("UHO21QQY3TW",4372),
("JTQ62CBP6ER",1827),
("RFD95MLC2MI",5016),
("URZ04HGG2YQ",2861);
Pseudo Code
Select ID, listagg(distinct Numbers) as Violation
Where Numbers NOT IN result.Numbers_List
or possibly: WHERE Numbers NOT LIKE '%' || result.Numbers_List|| '%'
Desired Output
("CRQ44MPX1SZ", "1870,1882")
("UHO21QQY3TW", "4350")
EDIT
Going the JOIN route, I am not getting the right results...but I'm pretty sure my WHERE implementation is wrong.
SELECT mytable1.ID, listagg(distinct mytable2.Numbers, ',') as unauth_list, count(mytable2.Numbers) as unauth_count
FROM mytable1
LEFT JOIN mytable2 on mytable1.id = mytable2.id
WHERE (mytable1.id = mytable2.id)
AND (mytable1.Numbers <> mytable2.Numbers)
GROUP BY mytable1.id
Expected output:
("CRQ44MPX1SZ", "1870,1882", 2)
("UHO21QQY3TW", "4350", 1)
Just left join the two tables on both ID and Numbers, then check in the WHERE clause for rows where no match was found. There shouldn't be any need for listagg() or complex comparisons. Or did I miss part of the question?
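The left-join idea can be sketched as follows, using Python's sqlite3 module as a stand-in for Redshift. The table names allowed_numbers and observed_numbers and the column names ext_id and num are hypothetical (chosen to avoid the id/ID clash in the sample DDL), and only a subset of the sample rows is loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical names: allowed_numbers plays the role of myTable,
-- observed_numbers plays the role of myTable2.
CREATE TABLE allowed_numbers  (ext_id TEXT, num INTEGER);
CREATE TABLE observed_numbers (ext_id TEXT, num INTEGER);
INSERT INTO allowed_numbers VALUES
    ('CRQ44MPX1SZ', 1890), ('CRQ44MPX1SZ', 1891), ('CRQ44MPX1SZ', 1892),
    ('UHO21QQY3TW', 4370), ('UHO21QQY3TW', 4371), ('UHO21QQY3TW', 4372),
    ('JTQ62CBP6ER', 1825), ('JTQ62CBP6ER', 1826), ('JTQ62CBP6ER', 1827);
INSERT INTO observed_numbers VALUES
    ('CRQ44MPX1SZ', 1870), ('CRQ44MPX1SZ', 1891), ('CRQ44MPX1SZ', 1882),
    ('UHO21QQY3TW', 4350), ('UHO21QQY3TW', 4371), ('UHO21QQY3TW', 4372),
    ('JTQ62CBP6ER', 1825), ('JTQ62CBP6ER', 1826), ('JTQ62CBP6ER', 1827);
""")

# Anti-join: keep observed rows that found no partner on (ext_id, num).
unmatched = conn.execute("""
    SELECT o.ext_id, o.num
    FROM observed_numbers o
    LEFT JOIN allowed_numbers a
           ON a.ext_id = o.ext_id AND a.num = o.num
    WHERE a.ext_id IS NULL
    ORDER BY o.ext_id, o.num
""").fetchall()

# Group the violating numbers per id on the client side.
violations = {}
for ext_id, num in unmatched:
    violations.setdefault(ext_id, []).append(num)

print(violations)
```

In Redshift the grouping step could presumably be done in SQL with listagg() over the same anti-join result; it is done in Python here only to keep the sketch portable.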

Why does WHERE NOT EXISTS return ids that exist?

My query:
select a.id, a.affiliation
FROM public.affiliation AS a
WHERE NOT EXISTS (
SELECT *
FROM ncbi.affi_known1 AS b
WHERE a.id = b.id
)
limit 5000
It returns:
id | affiliation
4683763 | Psychopharmacology Unit, Dorothy Hodgkin Building, University of Bristol, Whitson Street, Bristol, BS1 3NY, UK.
as the first row.
But
select * from ncbi.affi_known1 where id = 4683763
does return the data with id = 4683763.
Both ids are of type int8.
Table a:
CREATE TABLE "public"."affiliation" (
"id" int8 NOT NULL,
"affiliation" text COLLATE "pg_catalog"."default",
"tsv_affiliation" tsvector,
CONSTRAINT "affiliation_pkey" PRIMARY KEY ("id")
)
;
CREATE INDEX "affi_idx_tsv" ON "public"."affiliation" USING gin (
to_tsvector('english'::regconfig, affiliation) "pg_catalog"."tsvector_ops"
);
CREATE INDEX "tsv_affiliation_idx" ON "public"."affiliation" USING gin (
"tsv_affiliation" "pg_catalog"."tsvector_ops"
);
Table b:
CREATE TABLE "ncbi"."affi_known1" (
"id" int8 NOT NULL,
"affi_raw" text COLLATE "pg_catalog"."default",
"affi_main" text COLLATE "pg_catalog"."default",
"affi_known" bool,
"divide" text COLLATE "pg_catalog"."default",
"divide_known" bool,
"sub_divides" text[] COLLATE "pg_catalog"."default",
"country" text COLLATE "pg_catalog"."default",
CONSTRAINT "affi_known_pkey" PRIMARY KEY ("id")
)
;
Update:
After creating an index on id, everything works well.
After deleting the index, it seems to go wrong again.
So why does the primary key on id fail here?
Update 2:
Table b is generated from table a, using:
query = '''
select a.id, a.affiliation
FROM public.affiliation AS a
WHERE NOT EXISTS (
SELECT 1
FROM ncbi.affi_known AS b
WHERE a.id = b.id
)
limit 2000000
'''
data = pd.read_sql(query, conn)
while len(data):
    for i, row in tqdm(data.iterrows()):
        ...
        curser_insert.execute(
            'insert into ncbi.affi_known(id,affi_raw, affi_main ,affi_known,divide,country) values ( %s, %s, %s,%s,%s,%s) ',
            [affi_id, affi_raw, affi_main, affi_known, devide, country]
        )
        conn2.commit()
    conn2.commit()
    conn.commit()
    data = pd.read_sql(query, conn)
and the code exited improperly.
Your understanding of how EXISTS works might be off. Your current exists query is saying that id 4683763 exists in the affiliation table, not the affi_known1 table. So, the following query should return the single record:
SELECT a.id, a.affiliation
FROM public.affiliation a
WHERE a.id = 4683763;
I am assuming the requirement is to fetch rows only when the id is not present in the second table, so you can try this:
select a.id, a.affiliation
FROM public.affiliation AS a
WHERE a.id NOT IN (
SELECT id
FROM ncbi.affi_known1
)
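As a quick check of how the two anti-join forms compare, the sketch below runs NOT EXISTS and NOT IN side by side on an in-memory SQLite database via Python's sqlite3 module. The tables are cut down to the id column plus one text column, the rows are invented, and known_nullable is a hypothetical extra table added only to show what happens when the subquery can yield NULL. It does not reproduce the behaviour reported in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE affiliation (id INTEGER PRIMARY KEY, affiliation TEXT);
CREATE TABLE affi_known1 (id INTEGER PRIMARY KEY, affi_raw TEXT);
-- Hypothetical table whose id column is nullable.
CREATE TABLE known_nullable (id INTEGER);
INSERT INTO affiliation VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO affi_known1 VALUES (1, 'a'), (3, 'c');
INSERT INTO known_nullable VALUES (1), (3), (NULL);
""")

def ids(sql):
    """Run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql)]

via_not_exists = ids("""
    SELECT a.id FROM affiliation a
    WHERE NOT EXISTS (SELECT 1 FROM affi_known1 b WHERE b.id = a.id)
    ORDER BY a.id
""")

via_not_in = ids("""
    SELECT a.id FROM affiliation a
    WHERE a.id NOT IN (SELECT id FROM affi_known1)
    ORDER BY a.id
""")

# With a NULL in the subquery result, NOT IN evaluates to NULL, not TRUE.
not_in_with_null = ids("""
    SELECT a.id FROM affiliation a
    WHERE a.id NOT IN (SELECT id FROM known_nullable)
    ORDER BY a.id
""")

print(via_not_exists, via_not_in, not_in_with_null)
```

Because id is declared NOT NULL in both tables from the question, the two forms should select the same rows there; the NULL case matters only for nullable columns.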
If id were an integer, your query would do what you want.
If id is a string, you could have issues with "look-alikes". It is very hard to say what the problem is -- there could be spaces in the id, hidden characters, or something else. And this could be in either table.
Assuming the ids look like numbers, you could filter "bad" ids out using regular expressions:
select id
from ncbi.affi_known1
where not id ~ '^[0-9]*$';

How to show which students are still in school using sql

This table shows the records of students entering and leaving the school. IN represents a student entering the school and OUT represents a student leaving it. I'm wondering how to show which students are still in school.
I've tried a lot but still cannot figure it out. Can anyone help me? Thank you so much.
DROP TABLE IF EXISTS `student`;
CREATE TABLE `student` (
`id` int(11) NOT NULL auto_increment,
`time` varchar(128) default NULL,
`status` varchar(128) default NULL,
`stu_id` varchar(128) default NULL,
PRIMARY KEY (`id`)
);
INSERT INTO `student` (`id`, `time`, `status`, `stu_id`) VALUES
(1,'11AM','IN','1'),
(2,'11AM','IN','2'),
(3,'12AM','OUT','1'),
(4,'12AM','IN','3'),
(5,'1PM','OUT','3'),
(6,'2PM','IN','3'),
(11,'2PM','IN','4');
I expect the answer to be 2, 3, 4.
The number of students in the school is the sum of the ins minus the sum of the outs:
select sum(case when status = 'in' then 1
when status = 'out' then -1
else 0
end)
from student;
To see the students who are in the school, you want the students whose last status is in. One way uses a correlated subquery:
select s.stu_id
from student s
where s.time = (select max(s2.time)
from student s2
where s2.stu_id = s.stu_id
) and
s.status = 'in';
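Both ideas above can be tried against the sample data with a small sketch using Python's sqlite3 module in place of MySQL. Two things differ from the queries as written and are assumptions of this sketch: the "last" row per student is picked by the auto-increment id rather than by time (time is stored as text such as '11AM', so MAX(time) compares strings), and the status literals are upper case because SQLite compares text case-sensitively by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (
    id     INTEGER PRIMARY KEY,
    time   TEXT,
    status TEXT,
    stu_id TEXT
);
INSERT INTO student (id, time, status, stu_id) VALUES
    (1,  '11AM', 'IN',  '1'),
    (2,  '11AM', 'IN',  '2'),
    (3,  '12AM', 'OUT', '1'),
    (4,  '12AM', 'IN',  '3'),
    (5,  '1PM',  'OUT', '3'),
    (6,  '2PM',  'IN',  '3'),
    (11, '2PM',  'IN',  '4');
""")

# Head count: every IN adds one student, every OUT removes one.
in_count = conn.execute("""
    SELECT SUM(CASE WHEN status = 'IN'  THEN 1
                    WHEN status = 'OUT' THEN -1
                    ELSE 0 END)
    FROM student
""").fetchone()[0]

# Students whose most recent row (highest id) has status IN.
still_in = [row[0] for row in conn.execute("""
    SELECT s.stu_id
    FROM student s
    WHERE s.status = 'IN'
      AND s.id = (SELECT MAX(s2.id)
                  FROM student s2
                  WHERE s2.stu_id = s.stu_id)
    ORDER BY s.stu_id
""")]

print(in_count, still_in)
```

Ordering by id assumes rows are inserted in chronological order, which holds for the sample data but is not stated in the question.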
If status is either only IN or OUT can't you do
SELECT * from student WHERE status="IN"
Here's the query considering the auto-increment id:
select t2.* from
student t2
left join (select ROW_NUMBER() OVER(PARTITION by stu_id ORDER BY id desc) as row_num, id from student) t1 on t1.id = t2.id
where t1.row_num = 1 and t2.status = 'IN'

Ambiguous column name SQL

I get the following error when I want to execute a SQL query:
"Msg 209, Level 16, State 1, Line 9
Ambiguous column name 'i_id'."
This is the SQL query I want to execute:
SELECT DISTINCT x.*
FROM items x LEFT JOIN items y
ON y.i_id = x.i_id
AND x.last_seen < y.last_seen
WHERE x.last_seen > '4-4-2017 10:54:11'
AND x.spot = 'spot773'
AND (x.technology = 'Bluetooth LE' OR x.technology = 'EPC Gen2')
AND y.id IS NULL
GROUP BY i_id
This is what my table looks like:
CREATE TABLE [dbo].[items] (
[id] INT IDENTITY (1, 1) NOT NULL,
[i_id] VARCHAR (100) NOT NULL,
[last_seen] DATETIME2 (0) NOT NULL,
[location] VARCHAR (200) NOT NULL,
[code_hex] VARCHAR (100) NOT NULL,
[technology] VARCHAR (100) NOT NULL,
[url] VARCHAR (100) NOT NULL,
[spot] VARCHAR (200) NOT NULL,
PRIMARY KEY CLUSTERED ([id] ASC));
I've tried a couple of things but I'm not an SQL expert:)
Any help would be appreciated
EDIT:
I do get duplicate rows when I remove the GROUP BY line.
I'm adding another answer in order to show how you'd typically select the latest record per group without getting duplicates. You'd use ROW_NUMBER for this, marking every last record per i_id with row number 1.
SELECT *
FROM
(
SELECT
i.*,
ROW_NUMBER() over (PARTITION BY i_id ORDER BY last_seen DESC) as rn
FROM items i
WHERE last_seen > '2017-04-04 10:54:11'
AND spot = 'spot773'
AND technology IN ('Bluetooth LE', 'EPC Gen2')
) ranked
WHERE rn = 1;
(You'd use RANK or DENSE_RANK instead of ROW_NUMBER if you wanted to keep ties, i.e. several rows sharing the latest last_seen.)
You forgot the table alias in GROUP BY i_id.
Anyway, why are you writing an anti-join query where you are trying to get rid of duplicates with both DISTINCT and GROUP BY? Did you have issues with a straightforward NOT EXISTS query? You are making things way more complicated than they actually are.
SELECT *
FROM items i
WHERE last_seen > '2017-04-04 10:54:11'
AND spot = 'spot773'
AND technology IN ('Bluetooth LE', 'EPC Gen2')
AND NOT EXISTS
(
SELECT *
FROM items other
WHERE i.i_id = other.i_id
AND i.last_seen < other.last_seen
);
(There are other techniques of course to get the last seen record per i_id. This is one; another is to compare with MAX(last_seen); another is to use ROW_NUMBER.)
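To compare the ROW_NUMBER and NOT EXISTS techniques on the same data, here is a minimal sketch using Python's sqlite3 module as a stand-in for SQL Server. It assumes a SQLite build with window-function support (3.25 or later), keeps only the columns the filters touch, stores last_seen as ISO-formatted text, and uses invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (
    id         INTEGER PRIMARY KEY,
    i_id       TEXT NOT NULL,
    last_seen  TEXT NOT NULL,
    technology TEXT NOT NULL,
    spot       TEXT NOT NULL
);
INSERT INTO items VALUES
    (1, 'tagA', '2017-04-05 10:00:00', 'Bluetooth LE', 'spot773'),
    (2, 'tagA', '2017-04-06 10:00:00', 'Bluetooth LE', 'spot773'),
    (3, 'tagB', '2017-04-05 12:00:00', 'EPC Gen2',     'spot773'),
    (4, 'tagB', '2017-04-03 12:00:00', 'EPC Gen2',     'spot773'),
    (5, 'tagC', '2017-04-07 09:00:00', 'Bluetooth LE', 'spot001');
""")

# Technique 1: number the rows per i_id, newest first, and keep number 1.
via_row_number = [row[0] for row in conn.execute("""
    SELECT id FROM (
        SELECT i.id,
               ROW_NUMBER() OVER (PARTITION BY i.i_id
                                  ORDER BY i.last_seen DESC) AS rn
        FROM items i
        WHERE i.last_seen > '2017-04-04 10:54:11'
          AND i.spot = 'spot773'
          AND i.technology IN ('Bluetooth LE', 'EPC Gen2')
    ) ranked
    WHERE rn = 1
    ORDER BY id
""")]

# Technique 2: keep a row only if no newer row exists for the same i_id.
via_not_exists = [row[0] for row in conn.execute("""
    SELECT i.id
    FROM items i
    WHERE i.last_seen > '2017-04-04 10:54:11'
      AND i.spot = 'spot773'
      AND i.technology IN ('Bluetooth LE', 'EPC Gen2')
      AND NOT EXISTS (
          SELECT 1 FROM items other
          WHERE other.i_id = i.i_id
            AND other.last_seen > i.last_seen
      )
    ORDER BY i.id
""")]

print(via_row_number, via_not_exists)
```

The two queries agree on this data, but they are not identical in general: the NOT EXISTS subquery looks at all rows for an i_id, so an item whose newest row is at a different spot drops out entirely, while the ROW_NUMBER version ranks only rows that already passed the filters.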

MySQL Subquery returning incorrect result?

I've got the following MySQL query / subquery:
SELECT id, user_id, another_id, myvalue, created, modified,
(
SELECT id
FROM users_values AS ParentUsersValue
WHERE ParentUsersValue.user_id = UsersValue.user_id
AND ParentUsersValue.another_id = UsersValue.another_id
AND ParentUsersValue.id < UsersValue.id
ORDER BY id DESC
LIMIT 1
) AS old_id
FROM users_values AS UsersValue
WHERE created >= '2009-12-20'
AND created <= '2010-01-21'
AND user_id = 9917
AND another_id = 23
Given the criteria listed, the result for the subquery (old_id) should be null (no matches would be found in my table). Instead of MySQL returning null, it just seems to drop the "WHERE ParentUsersValue.user_id = UsersValue.user_id" clause and pick the first value that matches the other two fields. Is this a MySQL bug, or is this for some reason the expected behavior?
Update:
CREATE TABLE users_values (
id int(11) NOT NULL AUTO_INCREMENT,
user_id int(11) DEFAULT NULL,
another_id int(11) DEFAULT NULL,
myvalue double DEFAULT NULL,
created datetime DEFAULT NULL,
modified datetime DEFAULT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=2801 DEFAULT CHARSET=latin1
EXPLAIN EXTENDED:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY UsersValue index_merge user_id,another_id user_id,another_id 5,5 NULL 1 100.00 Using intersect(user_id,another_id); Using where
2 DEPENDENT SUBQUERY ParentUsersValue index PRIMARY,user_id,another_id PRIMARY 4 NULL 1 100.00 Using where
EXPLAIN EXTENDED Warning 1003:
select `mydb`.`UsersValue`.`id` AS `id`,`mydb`.`UsersValue`.`user_id` AS `user_id`,`mydb`.`UsersValue`.`another_id` AS `another_id`,`mydb`.`UsersValue`.`myvalue` AS `myvalue`,`mydb`.`UsersValue`.`created` AS `created`,`mydb`.`UsersValue`.`modified` AS `modified`,(select `mydb`.`ParentUsersValue`.`id` AS `id` from `mydb`.`users_values` `ParentUsersValue` where ((`mydb`.`ParentUsersValue`.`user_id` = `mydb`.`UsersValue`.`user_id`) and (`mydb`.`ParentUsersValue`.`another_id` = `mydb`.`UsersValue`.`another_id`) and (`mydb`.`ParentUsersValue`.`id` < `mydb`.`UsersValue`.`id`)) order by `mydb`.`ParentUsersValue`.`id` desc limit 1) AS `old_id` from `mydb`.`users_values` `UsersValue` where ((`mydb`.`UsersValue`.`another_id` = 23) and (`mydb`.`UsersValue`.`user_id` = 9917) and (`mydb`.`UsersValue`.`created` >= '2009-12-20') and (`mydb`.`UsersValue`.`created` <= '2010-01-21'))
This returns correct results (NULL) for me:
CREATE TABLE users_values (id INT NOT NULL PRIMARY KEY, user_id INT NOT NULL, another_id INT NOT NULL, created DATETIME NOT NULL);
INSERT
INTO users_values VALUES (1, 9917, 23, '2010-01-01');
SELECT *,
(
SELECT id
FROM users_values AS ParentUsersValue
WHERE ParentUsersValue.user_id = UsersValue.user_id
AND ParentUsersValue.another_id = UsersValue.another_id
AND ParentUsersValue.id < UsersValue.id
ORDER BY id
DESC
LIMIT 1
) AS old_id
FROM users_values AS UsersValue
WHERE created >= '2009-12-20'
AND created <= '2010-01-21'
AND user_id = 9917
AND another_id = 23
Could you please run this query:
SELECT COUNT(*)
FROM users_values AS UsersValue
WHERE user_id = 9917
AND another_id = 23
and make sure it returns 1?
Note that your subquery does not filter on created, so the subquery can return values out of the range the main query defines.
Update:
This is definitely a bug in MySQL.
Most probably the reason is that the access path chosen for UsersValue is index_merge with an intersection of the user_id and another_id indexes.
This selects the appropriate ranges from both indexes and builds their intersection.
Due to the bug, the dependent subquery is evaluated before the intersection completes, which is why you get results with the correct another_id but the wrong user_id.
Could you please check if the problem persists when you force a PRIMARY scan on UsersValue:
SELECT *,
(
SELECT id
FROM users_values AS ParentUsersValue
WHERE ParentUsersValue.user_id = UsersValue.user_id
AND ParentUsersValue.another_id = UsersValue.another_id
AND ParentUsersValue.id < UsersValue.id
ORDER BY id
DESC
LIMIT 1
) AS old_id
FROM users_values AS UsersValue FORCE INDEX (PRIMARY)
WHERE created >= '2009-12-20'
AND created <= '2010-01-21'
AND user_id = 9917
AND another_id = 23
Also, for this query you should create a composite index on (user_id, another_id, id) rather than two distinct indexes on user_id and another_id.
Create the index and rewrite the query a little:
SELECT *,
(
SELECT id
FROM users_values AS ParentUsersValue
WHERE ParentUsersValue.user_id = UsersValue.user_id
AND ParentUsersValue.another_id = UsersValue.another_id
AND ParentUsersValue.id < UsersValue.id
ORDER BY
user_id DESC, another_id DESC, id DESC
LIMIT 1
) AS old_id
FROM users_values AS UsersValue
WHERE created >= '2009-12-20'
AND created <= '2010-01-21'
AND user_id = 9917
AND another_id = 23
The user_id DESC, another_id DESC clauses are logically redundant, but they allow the index to be used for ordering.
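For reference, a composite index and the expected result of the correlated subquery can be sketched like this, using Python's sqlite3 module rather than MySQL. The index name ix_user_another_id and the sample rows are made up, and SQLite does not reproduce the MySQL behaviour discussed above; the sketch only shows what old_id should be when the correlation on user_id is honoured.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users_values (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER,
    another_id INTEGER,
    myvalue    REAL,
    created    TEXT
);
-- One composite index instead of separate indexes on user_id and another_id.
-- The index name is hypothetical.
CREATE INDEX ix_user_another_id ON users_values (user_id, another_id, id);
INSERT INTO users_values (id, user_id, another_id, myvalue, created) VALUES
    (1, 1111, 23, 1.0, '2009-12-25 00:00:00'),
    (2, 9917, 23, 2.0, '2010-01-01 00:00:00'),
    (3, 9917, 23, 3.0, '2010-01-10 00:00:00'),
    (4, 9917, 99, 4.0, '2010-01-15 00:00:00');
""")

# For each selected row, look up the previous id with the same
# (user_id, another_id); NULL (None) when there is no earlier row.
rows = conn.execute("""
    SELECT uv.id,
           (SELECT p.id
            FROM users_values p
            WHERE p.user_id = uv.user_id
              AND p.another_id = uv.another_id
              AND p.id < uv.id
            ORDER BY p.id DESC
            LIMIT 1) AS old_id
    FROM users_values uv
    WHERE uv.created >= '2009-12-20'
      AND uv.created <= '2010-01-21'
      AND uv.user_id = 9917
      AND uv.another_id = 23
    ORDER BY uv.id
""").fetchall()

print(rows)
```

Row 2 gets no old_id even though row 1 shares another_id = 23, because row 1 belongs to a different user_id; that is the result the question expected from MySQL.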
Did you try running the subquery only to see if you are getting the right results? Could you show us the schema for your users_values table?
Also, try replacing your SELECT id in your subquery by SELECT ParentUsersValue.id