How do I insert records with the next sequential number into a table? - sql

For example, given this table:
+----+------+------+
| id | name | price|
+----+------+------+
| 4 | ABC | 1000 |
| 5 | ABD | 1001 |
+----+------+------+
How do I insert the following rows into the table?
+----+------+------+
| 6 | ABF | 1002 |
| 7 | ABG | 1003 |
| 8 | ABH | 1004 |
+----+------+------+
This script does not work correctly:
insert into table
(id, name, price)
select max(id)+1, 'ABF', max(price)+1 from table
union all
select max(id)+1, 'ABG', max(price)+1 from table
union all
select max(id)+1, 'ABH', max(price)+1 from table

We don't know your database, but if id is an auto-increment column, then you should not be passing a value for it. Try this version:
INSERT INTO yourTable (name, price)
VALUES
('ABF', 1002),
('ABG', 1003),
('ABH', 1004);
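As a rough illustration of why the UNION ALL script fails and the version above works, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in, since the actual database is unknown, and the AUTOINCREMENT column definition is an assumption about how id is declared.

```python
import sqlite3

# In-memory SQLite stands in for the unknown database (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, price INTEGER)"
)
conn.execute(
    "INSERT INTO yourTable (id, name, price) "
    "VALUES (4, 'ABC', 1000), (5, 'ABD', 1001)"
)

# The script from the question: every UNION ALL branch reads the same
# MAX(id) and MAX(price), so all three rows get identical numbers.
broken = conn.execute(
    "SELECT MAX(id) + 1, 'ABF', MAX(price) + 1 FROM yourTable "
    "UNION ALL "
    "SELECT MAX(id) + 1, 'ABG', MAX(price) + 1 FROM yourTable "
    "UNION ALL "
    "SELECT MAX(id) + 1, 'ABH', MAX(price) + 1 FROM yourTable"
).fetchall()

# Omitting id lets the database assign the next values itself.
conn.execute(
    "INSERT INTO yourTable (name, price) "
    "VALUES ('ABF', 1002), ('ABG', 1003), ('ABH', 1004)"
)
rows = conn.execute(
    "SELECT id, name, price FROM yourTable ORDER BY id"
).fetchall()

print(broken)
print(rows)
```

The first print shows the same id three times, which is the duplicate-key problem in the original script; the second shows ids 6, 7 and 8 assigned by the database.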

Related

Kudu Conditional UPSERT INTO

Does Kudu support conditions on the UPDATE portion of UPSERT INTO?
Can I provide a conditional clause to only update given values based on a comparison between the insert values and destination table?
The actual use case is to update a timestamp column with the latest.
Here's the behavior as I imagine it.
CREATE TABLE my_first_table
(
id INT,
name STRING,
status INT,
PRIMARY KEY(id)
)
PARTITION BY HASH PARTITIONS 4
STORED AS KUDU;
INSERT INTO my_first_table VALUES (1, "lee", 101), (2, "shiv", 102), (3, "bob", 103);
--CONDITION FALSE, UPDATE NOT PERFORMED
UPSERT INTO my_first_table AS t
VALUES (3, "bobby", 100) AS v
WHERE v.status > t.status
+----+------+--------+
| id | name | status |
+----+------+--------+
| 1 | lee | 101 |
| 2 | shiv | 102 |
| 3 | bob | 103 |
+----+------+--------+
--CONDITION TRUE, UPDATE PERFORMED
UPSERT INTO my_first_table AS t
VALUES (3, "bobby", 100) AS v
WHERE v.status < t.status
+----+------+--------+
| id | name | status |
+----+------+--------+
| 1 | lee | 101 |
| 2 | shiv | 102 |
| 3 | bobby| 100 |
+----+------+--------+
In the case where 3 does not exist, it should insert.
Is there an elegant workaround if not?
A solution I found was to use a LEFT JOIN and filter in the SELECT expression. So say we have a table to_upsert, identical in structure to the destination table, holding all our potential upserts...
INSERT INTO to_upsert VALUES (3, "bobby", 100), (5, "newgal", 600);
UPSERT INTO my_first_table
SELECT to_upsert.id, to_upsert.name, to_upsert.status
FROM to_upsert
LEFT JOIN my_first_table ON to_upsert.id = my_first_table.id
WHERE my_first_table.status > to_upsert.status OR my_first_table.id IS NULL;
SELECT * FROM my_first_table;
+----+--------+--------+
| id | name | status |
+----+--------+--------+
| 3 | bobby | 100 |
| 1 | lee | 101 |
| 2 | shiv | 102 |
| 5 | newgal | 600 |
+----+--------+--------+
Thank you for watching this episode of watching me learn sql.
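For anyone who wants to try the LEFT JOIN filter without a Kudu cluster, here is a small sketch using Python's sqlite3 module. It is an approximation: SQLite's INSERT OR REPLACE is used as a stand-in for UPSERT INTO, and the row (2, 'shivani', 200) is a made-up extra candidate added only to show a case where the condition is false.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_first_table (id INTEGER PRIMARY KEY, name TEXT, status INTEGER);
CREATE TABLE to_upsert (id INTEGER PRIMARY KEY, name TEXT, status INTEGER);
INSERT INTO my_first_table VALUES (1, 'lee', 101), (2, 'shiv', 102), (3, 'bob', 103);
-- (2, 'shivani', 200) is a hypothetical row whose condition is false.
INSERT INTO to_upsert VALUES (3, 'bobby', 100), (5, 'newgal', 600), (2, 'shivani', 200);
""")

# INSERT OR REPLACE is SQLite's stand-in here for Kudu's UPSERT INTO.
# The WHERE clause keeps a candidate only if the existing status is
# greater than the new one, or if the id does not exist yet.
conn.execute("""
INSERT OR REPLACE INTO my_first_table
SELECT u.id, u.name, u.status
FROM to_upsert AS u
LEFT JOIN my_first_table AS t ON u.id = t.id
WHERE t.status > u.status OR t.id IS NULL
""")

rows = conn.execute(
    "SELECT id, name, status FROM my_first_table ORDER BY id"
).fetchall()
print(rows)
```

Row 3 is updated, row 5 is inserted, and row 2 is left alone because its existing status (102) is not greater than the candidate's (200).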

How to select timestamp values in PostgreSQL under conditions?

I have a database table 'table1' as follows:
f_key | begin | counts|
1 | 2018-10-04 | 15 |
1 | 2018-10-06 | 20 |
1 | 2018-10-08 | 34 |
1 | 2018-10-09 | 56 |
I have another database table 'table2' as follows:
f_key | p_time | percent|
1 | 2018-10-05 | 80 |
1 | 2018-10-07 | 90 |
1 | 2018-10-08 | 70 |
1 | 2018-10-10 | 60 |
The tables can be joined by the f_key field.
I want to get a combined table as shown below:
If the begin time is earlier than every p_time, then the p_time value in the combined table should be the same as the begin time and the percent value should be 50 (as shown in row 1 of the following table).
If the begin time is later than at least one p_time, then the p_time value in the combined table should be the closest p_time before begin, and the percent value should be the one that belongs to the selected p_time
(as shown in rows 2, 3 and 4 of the following table).
row | f_key | begin | counts| p_time | percent|
1 | 1 | 2018-10-04 | 15 | 2018-10-04 | 50 |
2 | 1 | 2018-10-06 | 20 | 2018-10-05 | 80 |
3 | 1 | 2018-10-08 | 34 | 2018-10-07 | 90 |
4 | 1 | 2018-10-09 | 56 | 2018-10-08 | 70 |
You can use the ROW_NUMBER window function to rank, for each row of table1, the table2 rows that have an earlier p_time, so that rn = 1 marks the closest earlier one.
Then use the COALESCE function so that when begin is earlier than every p_time, p_time falls back to begin and percent falls back to 50.
PostgreSQL 9.6 Schema Setup:
CREATE TABLE table1(
f_key INT,
begin DATE,
counts INT
);
INSERT INTO table1 VALUES (1,'2018-10-04',15);
INSERT INTO table1 VALUES (1,'2018-10-06',20);
INSERT INTO table1 VALUES (1,'2018-10-08',34);
INSERT INTO table1 VALUES (1,'2018-10-09',56);
CREATE TABLE table2(
f_key INT,
p_time DATE,
percent INT
);
INSERT INTO table2 VALUES (1, '2018-10-05',80);
INSERT INTO table2 VALUES (1, '2018-10-07',90);
INSERT INTO table2 VALUES (1, '2018-10-08',70);
INSERT INTO table2 VALUES (1, '2018-10-10',60);
Query 1:
SELECT ROW_NUMBER() OVER(ORDER BY begin) "row",
t1.f_key,
t1.counts,
coalesce(t1.p_time,t1.begin) p_time,
coalesce(t1.percent,50) percent
FROM (
SELECT ROW_NUMBER() OVER(PARTITION BY t1.begin,t1.f_key order by t2.p_time desc) rn,
t2.p_time,
t2.percent,
t1.counts,
t1.f_key,
t1.begin
FROM table1 t1
LEFT JOIN table2 t2 ON t1.f_key = t2.f_key and t1.begin > t2.p_time
)t1
WHERE rn = 1
Results:
| row | f_key | counts | p_time | percent |
|-----|-------|--------|------------|---------|
| 1 | 1 | 15 | 2018-10-04 | 50 |
| 2 | 1 | 20 | 2018-10-05 | 80 |
| 3 | 1 | 34 | 2018-10-07 | 90 |
| 4 | 1 | 56 | 2018-10-08 | 70 |
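The same query can be tried locally with Python's sqlite3 module; this is only a sketch, with SQLite (3.25 or later, for window functions) standing in for PostgreSQL and the dates stored as ISO text so that string comparison orders them correctly. The alias row_no and the extra begin column in the output are choices made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (f_key INTEGER, "begin" TEXT, counts INTEGER);
CREATE TABLE table2 (f_key INTEGER, p_time TEXT, percent INTEGER);
INSERT INTO table1 VALUES
  (1, '2018-10-04', 15), (1, '2018-10-06', 20),
  (1, '2018-10-08', 34), (1, '2018-10-09', 56);
INSERT INTO table2 VALUES
  (1, '2018-10-05', 80), (1, '2018-10-07', 90),
  (1, '2018-10-08', 70), (1, '2018-10-10', 60);
""")

# Inner query: rank earlier p_time values per table1 row, latest first.
# Outer query: keep rank 1 and fall back to begin / 50 when none exists.
rows = conn.execute("""
SELECT ROW_NUMBER() OVER (ORDER BY t."begin") AS row_no,
       t.f_key,
       t."begin",
       t.counts,
       COALESCE(t.p_time, t."begin") AS p_time,
       COALESCE(t.percent, 50) AS percent
FROM (
    SELECT ROW_NUMBER() OVER (
               PARTITION BY t1.f_key, t1."begin"
               ORDER BY t2.p_time DESC
           ) AS rn,
           t1.f_key, t1."begin", t1.counts, t2.p_time, t2.percent
    FROM table1 AS t1
    LEFT JOIN table2 AS t2
           ON t1.f_key = t2.f_key AND t1."begin" > t2.p_time
) AS t
WHERE t.rn = 1
ORDER BY t."begin"
""").fetchall()

for row in rows:
    print(row)
```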

SQL Query to count number of records must match total number of records

I have two tables:
Result Master
+------+-------------+
| QnID | Description |
+------+-------------+
| 1 | Qn1 |
| 2 | Qn2 |
| 3 | Qn3 |
| 4 | Qn4 |
| 5 | Qn5 |
+------+-------------+
Result Details
+----+------+--------+--------+
| ID | QnID | TCDesc | Result |
+----+------+--------+--------+
| 1 | 1 | TC1 | PASS |
| 2 | 1 | TC2 | FAIL |
| 3 | 1 | TC3 | PASS |
| 4 | 2 | TC1 | PASS |
| 5 | 3 | TC1 | PASS |
| 6 | 3 | TC1 | PASS |
| 7 | 3 | TC3 | PASS |
+----+------+--------+--------+
I need a query which will return following result:
+----+------+--------+
| ID | QnID | Result |
+----+------+--------+
| 1 | 2 | PASS |
| 2 | 3 | PASS |
| 3 | 4 | ERROR |
| 4 | 5 | ERROR |
+----+------+--------+
Conditions:
Each question has a different number of test cases in "Result Details". I need to select the questions for which all test cases passed (the number of entries for a question must equal the number of passed test cases for it), shown as PASS, or which have no entry in Result Details at all, shown as ERROR.
Can anyone please help me with a query, thank you.
You can get the desired results using a common table expression and conditional aggregation.
First, create and populate sample tables (Please save us this step in your future questions):
DECLARE @ResultMaster AS TABLE
(
QnID int,
Description char(3)
);
INSERT INTO @ResultMaster (QnID, Description) VALUES
(1, 'Qn1'),
(2, 'Qn2'),
(3, 'Qn3'),
(4, 'Qn4'),
(5, 'Qn5');
DECLARE @ResultDetails AS TABLE
(
ID int,
QnID int,
TCDesc char(3),
Result char(4)
);
INSERT INTO @ResultDetails VALUES
(1, 1, 'TC1', 'PASS'),
(2, 1, 'TC2', 'FAIL'),
(3, 1, 'TC3', 'PASS'),
(4, 2, 'TC1', 'PASS'),
(5, 3, 'TC1', 'PASS'),
(6, 3, 'TC1', 'PASS'),
(7, 3, 'TC3', 'PASS');
Then, use a common table expression to calculate the number of pass details and a simple count to get the number of total details:
WITH CTE AS
(
SELECT M.QnId,
COUNT(CASE WHEN Result = 'PASS' THEN 1 END) As CountPass,
COUNT(Result) As CountDetails
FROM @ResultMaster As M
LEFT JOIN @ResultDetails As D ON M.QnId = D.QnId
GROUP BY M.QnId
)
Then, select from that cte:
SELECT ROW_NUMBER() OVER(ORDER BY QnId) AS Id,
QnId,
CASE WHEN CountDetails = 0 THEN
'ERROR'
ELSE
'PASS'
END
FROM CTE
WHERE CountPass = CountDetails
Results:
+----+------+--------+
| ID | QnID | Result |
+----+------+--------+
| 1 | 2 | PASS |
| 2 | 3 | PASS |
| 3 | 4 | ERROR |
| 4 | 5 | ERROR |
+----+------+--------+
You can see a live demo on rextester.
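If rextester is not available, the conditional-aggregation approach can also be checked with Python's sqlite3 module. This is a sketch under the assumption that ordinary tables in SQLite (3.25 or later) can stand in for the T-SQL table variables; the Result alias on the CASE expression is added here for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ResultMaster (QnID INTEGER, Description TEXT);
CREATE TABLE ResultDetails (ID INTEGER, QnID INTEGER, TCDesc TEXT, Result TEXT);
INSERT INTO ResultMaster VALUES
  (1, 'Qn1'), (2, 'Qn2'), (3, 'Qn3'), (4, 'Qn4'), (5, 'Qn5');
INSERT INTO ResultDetails VALUES
  (1, 1, 'TC1', 'PASS'), (2, 1, 'TC2', 'FAIL'), (3, 1, 'TC3', 'PASS'),
  (4, 2, 'TC1', 'PASS'), (5, 3, 'TC1', 'PASS'), (6, 3, 'TC1', 'PASS'),
  (7, 3, 'TC3', 'PASS');
""")

# COUNT ignores NULLs, so the CASE without ELSE counts only PASS rows,
# and COUNT(D.Result) is 0 for questions with no detail rows at all.
rows = conn.execute("""
WITH CTE AS (
    SELECT M.QnID,
           COUNT(CASE WHEN D.Result = 'PASS' THEN 1 END) AS CountPass,
           COUNT(D.Result) AS CountDetails
    FROM ResultMaster AS M
    LEFT JOIN ResultDetails AS D ON M.QnID = D.QnID
    GROUP BY M.QnID
)
SELECT ROW_NUMBER() OVER (ORDER BY QnID) AS ID,
       QnID,
       CASE WHEN CountDetails = 0 THEN 'ERROR' ELSE 'PASS' END AS Result
FROM CTE
WHERE CountPass = CountDetails
ORDER BY QnID
""").fetchall()

print(rows)
```

Question 1 drops out because it has three detail rows but only two passes, while questions 4 and 5 match with zero of each and are reported as ERROR.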

Inserting extra rows within table

I want to add two extra rows to my table for each part number that is present. Currently I have something like this:
+-------------+-----------+---------------+
| item_number | operation | resource_code |
+-------------+-----------+---------------+
| abc | 10 | kit |
| abc | 20 | build |
| abc | 30 | test |
+-------------+-----------+---------------+
There are hundreds more items set up like this in the table. I want to add two extra records for each part number, so that afterwards my data set looks like this:
+-------------+-----------+---------------+
| item_number | operation | resource_code |
+-------------+-----------+---------------+
| abc | 10 | kit |
| abc | 20 | build |
| abc | 30 | test |
| abc | NULL | NULL |
| abc | NULL | NULL |
+-------------+-----------+---------------+
I want these new records to be blank for now; I will fill them in later.
I am using Access and am looking for the SQL to add these new records to the table.
Try this on for size:
INSERT INTO my_table
SELECT item_number, NULL AS operation, NULL AS resource_code
FROM my_table
GROUP BY item_number
UNION ALL
SELECT item_number, NULL AS operation, NULL AS resource_code
FROM my_table
GROUP BY item_number
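Here is a quick way to sanity-check the idea outside Access, using Python's sqlite3 module. It is only a sketch: SQLite is a stand-in and Access may treat a UNION inside INSERT ... SELECT differently, and the 'def' item is a made-up second part number added to show the per-item behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (item_number TEXT, operation INTEGER, resource_code TEXT);
-- 'def' is a hypothetical second part number.
INSERT INTO my_table VALUES
  ('abc', 10, 'kit'), ('abc', 20, 'build'), ('abc', 30, 'test'),
  ('def', 10, 'kit'), ('def', 20, 'build');
""")

# Each branch yields one blank row per distinct item_number;
# UNION ALL keeps both copies, giving two blank rows per item.
conn.execute("""
INSERT INTO my_table
SELECT item_number, NULL, NULL FROM my_table GROUP BY item_number
UNION ALL
SELECT item_number, NULL, NULL FROM my_table GROUP BY item_number
""")

blank_counts = conn.execute("""
SELECT item_number, COUNT(*)
FROM my_table
WHERE operation IS NULL
GROUP BY item_number
ORDER BY item_number
""").fetchall()
total = conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0]

print(blank_counts, total)
```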

Optimal query to fetch a cumulative sum in MySQL

What is the 'correct' query to fetch a cumulative sum in MySQL?
I have a table where I keep information about files; one column contains the size of each file in bytes (the actual files are kept on disk somewhere).
I would like to get the cumulative file size like this:
+------------+---------+--------+----------------+
| fileInfoId | groupId | size | cumulativeSize |
+------------+---------+--------+----------------+
| 1 | 1 | 522120 | 522120 |
| 2 | 2 | 316042 | 316042 |
| 4 | 2 | 711084 | 1027126 |
| 5 | 2 | 697002 | 1724128 |
| 6 | 2 | 663425 | 2387553 |
| 7 | 2 | 739553 | 3127106 |
| 8 | 2 | 700938 | 3828044 |
| 9 | 2 | 695614 | 4523658 |
| 10 | 2 | 744204 | 5267862 |
| 11 | 2 | 609022 | 5876884 |
| ... | ... | ... | ... |
+------------+---------+--------+----------------+
20000 rows in set (19.2161 sec.)
Right now, I use the following query to get the above results
SELECT
a.fileInfoId
, a.groupId
, a.size
, SUM(b.size) AS cumulativeSize
FROM fileInfo AS a
LEFT JOIN fileInfo AS b USING(groupId)
WHERE a.fileInfoId >= b.fileInfoId
GROUP BY a.fileInfoId
ORDER BY a.groupId, a.fileInfoId
My solution is, however, extremely slow (around 19 seconds without cache).
Explain gives the following execution details
+----+--------------+-------+-------+-------------------+-----------+---------+----------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+-------+-------+-------------------+-----------+---------+----------------+-------+-------------+
| 1 | SIMPLE | a | index | PRIMARY,foreignId | PRIMARY | 4 | NULL | 14905 | |
| 1 | SIMPLE | b | ref | PRIMARY,foreignId | foreignId | 4 | db.a.foreignId | 36 | Using where |
+----+--------------+-------+-------+-------------------+-----------+---------+----------------+-------+-------------+
My question is:
How can I optimize the above query?
Update
I've updated the question to provide the table structure and a procedure that fills the table with 20,000 records of test data.
CREATE TABLE `fileInfo` (
`fileInfoId` int(10) unsigned NOT NULL AUTO_INCREMENT
, `groupId` int(10) unsigned NOT NULL
, `name` varchar(128) NOT NULL
, `size` int(10) unsigned NOT NULL
, PRIMARY KEY (`fileInfoId`)
, KEY `groupId` (`groupId`)
) ENGINE=InnoDB;
delimiter $$
DROP PROCEDURE IF EXISTS autofill$$
CREATE PROCEDURE autofill()
BEGIN
DECLARE i INT DEFAULT 0;
DECLARE gid INT DEFAULT 0;
DECLARE nam char(20);
DECLARE siz INT DEFAULT 0;
WHILE i < 20000 DO
SET gid = FLOOR(RAND() * 250);
SET nam = CONV(FLOOR(RAND() * 10000000000000), 20, 36);
SET siz = FLOOR((RAND() * 1024 * 1024));
INSERT INTO `fileInfo` (`groupId`, `name`, `size`) VALUES(gid, nam, siz);
SET i = i + 1;
END WHILE;
END;$$
delimiter ;
CALL autofill();
About the possible duplicate question
The question linked by Forgotten Semicolon is not the same question. My question has an extra column, groupId, and because of it the accepted answer there does not work for my problem (maybe it can be adapted to work, but I don't know how, hence my question).
You could use a variable - it's far quicker than any join:
SELECT
id,
size,
@total := @total + size AS cumulativeSize
FROM table, (SELECT @total:=0) AS t;
Here's a quick test case on a Pentium III with 128MB RAM running Debian 5.0:
Create the table:
DROP TABLE IF EXISTS `table1`;
CREATE TABLE `table1` (
`id` int(11) NOT NULL auto_increment,
`size` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
Fill with 20,000 random numbers:
DELIMITER //
DROP PROCEDURE IF EXISTS autofill//
CREATE PROCEDURE autofill()
BEGIN
DECLARE i INT DEFAULT 0;
WHILE i < 20000 DO
INSERT INTO table1 (size) VALUES (FLOOR((RAND() * 1000)));
SET i = i + 1;
END WHILE;
END;
//
DELIMITER ;
CALL autofill();
Check the row count:
SELECT COUNT(*) FROM table1;
+----------+
| COUNT(*) |
+----------+
| 20000 |
+----------+
Run the cumulative total query:
SELECT
id,
size,
@total := @total + size AS cumulativeSize
FROM table1, (SELECT @total:=0) AS t;
+-------+------+----------------+
| id | size | cumulativeSize |
+-------+------+----------------+
| 1 | 226 | 226 |
| 2 | 869 | 1095 |
| 3 | 668 | 1763 |
| 4 | 733 | 2496 |
...
| 19997 | 966 | 10004741 |
| 19998 | 522 | 10005263 |
| 19999 | 713 | 10005976 |
| 20000 | 0 | 10005976 |
+-------+------+----------------+
20000 rows in set (0.07 sec)
UPDATE
I'd missed the grouping by groupId in the original question, and that certainly made things a bit trickier. I then wrote a solution which used a temporary table, but I didn't like it—it was messy and overly complicated. I went away and did some more research, and have come up with something far simpler and faster.
I can't claim all the credit for this—in fact, I can barely claim any at all, as it is just a modified version of Emulate row number from Common MySQL Queries.
It's beautifully simple, elegant, and very quick:
SELECT fileInfoId, groupId, name, size, cumulativeSize
FROM (
SELECT
fileInfoId,
groupId,
name,
size,
@cs := IF(@prev_groupId = groupId, @cs+size, size) AS cumulativeSize,
@prev_groupId := groupId AS prev_groupId
FROM fileInfo, (SELECT @prev_groupId:=0, @cs:=0) AS vars
ORDER BY groupId
) AS tmp;
You can remove the outer SELECT ... AS tmp if you don't mind the prev_groupID column being returned. I found that it ran marginally faster without it.
Here's a simple test case:
INSERT INTO `fileInfo` VALUES
( 1, 3, 'name0', '10'),
( 5, 3, 'name1', '10'),
( 7, 3, 'name2', '10'),
( 8, 1, 'name3', '10'),
( 9, 1, 'name4', '10'),
(10, 2, 'name5', '10'),
(12, 4, 'name6', '10'),
(20, 4, 'name7', '10'),
(21, 4, 'name8', '10'),
(25, 5, 'name9', '10');
SELECT fileInfoId, groupId, name, size, cumulativeSize
FROM (
SELECT
fileInfoId,
groupId,
name,
size,
@cs := IF(@prev_groupId = groupId, @cs+size, size) AS cumulativeSize,
@prev_groupId := groupId AS prev_groupId
FROM fileInfo, (SELECT @prev_groupId := 0, @cs := 0) AS vars
ORDER BY groupId
) AS tmp;
+------------+---------+-------+------+----------------+
| fileInfoId | groupId | name | size | cumulativeSize |
+------------+---------+-------+------+----------------+
| 8 | 1 | name3 | 10 | 10 |
| 9 | 1 | name4 | 10 | 20 |
| 10 | 2 | name5 | 10 | 10 |
| 1 | 3 | name0 | 10 | 10 |
| 5 | 3 | name1 | 10 | 20 |
| 7 | 3 | name2 | 10 | 30 |
| 12 | 4 | name6 | 10 | 10 |
| 20 | 4 | name7 | 10 | 20 |
| 21 | 4 | name8 | 10 | 30 |
| 25 | 5 | name9 | 10 | 10 |
+------------+---------+-------+------+----------------+
Here's a sample of the last few rows from a 20,000 row table:
| 19481 | 248 | 8CSLJX22RCO | 1037469 | 51270389 |
| 19486 | 248 | 1IYGJ1UVCQE | 937150 | 52207539 |
| 19817 | 248 | 3FBU3EUSE1G | 616614 | 52824153 |
| 19871 | 248 | 4N19QB7PYT | 153031 | 52977184 |
| 132 | 249 | 3NP9UGMTRTD | 828073 | 828073 |
| 275 | 249 | 86RJM39K72K | 860323 | 1688396 |
| 802 | 249 | 16Z9XADLBFI | 623030 | 2311426 |
...
| 19661 | 249 | ADZXKQUI0O3 | 837213 | 39856277 |
| 19870 | 249 | 9AVRTI3QK6I | 331342 | 40187619 |
| 19972 | 249 | 1MTAEE3LLEM | 1027714 | 41215333 |
+------------+---------+-------------+---------+----------------+
20000 rows in set (0.31 sec)
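For readers on MySQL 8.0 or later, which added window functions, the same per-group running total can probably be written without user variables as SUM(size) OVER (PARTITION BY groupId ORDER BY fileInfoId). This is a different technique from the variable-based query above. The sketch below runs that window query through Python's sqlite3 module (SQLite 3.25 or later as a stand-in for MySQL) on the small test case from this answer; no timing claim is made for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fileInfo (
    fileInfoId INTEGER PRIMARY KEY,
    groupId INTEGER NOT NULL,
    name TEXT NOT NULL,
    size INTEGER NOT NULL
);
INSERT INTO fileInfo VALUES
  ( 1, 3, 'name0', 10), ( 5, 3, 'name1', 10), ( 7, 3, 'name2', 10),
  ( 8, 1, 'name3', 10), ( 9, 1, 'name4', 10), (10, 2, 'name5', 10),
  (12, 4, 'name6', 10), (20, 4, 'name7', 10), (21, 4, 'name8', 10),
  (25, 5, 'name9', 10);
""")

# The running total restarts for each groupId (PARTITION BY) and
# accumulates in fileInfoId order within the group.
rows = conn.execute("""
SELECT fileInfoId,
       groupId,
       size,
       SUM(size) OVER (PARTITION BY groupId ORDER BY fileInfoId) AS cumulativeSize
FROM fileInfo
ORDER BY groupId, fileInfoId
""").fetchall()

for row in rows:
    print(row)
```

Unlike the variable-based version, the order within each group is stated explicitly in the OVER clause rather than depending on how the rows happen to be read.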
I think that MySQL is only using one of the indexes on the table. In this case, it's choosing the index on foreignId.
Add a covering compound index that includes both primaryId and foreignId.