Kudu Conditional UPSERT INTO - impala

Does Kudu support conditions on the UPDATE portion of UPSERT INTO?
Can I provide a conditional clause to only update given values based on a comparison between the insert values and destination table?
The actual use case is to update a timestamp column with the latest.
Here's the behavior as I imagine it.
CREATE TABLE my_first_table
(
id INT,
name STRING,
status INT,
PRIMARY KEY(id)
)
PARTITION BY HASH PARTITIONS 4
STORED AS KUDU;
INSERT INTO my_first_table VALUES (1, "lee", 101), (2, "shiv", 102), (3, "bob", 103);
--CONDITION FALSE, UPDATE NOT PERFORMED
UPSERT INTO my_first_table AS t
VALUES (3, "bobby", 100) AS v
WHERE v.status > t.status
+----+------+--------+
| id | name | status |
+----+------+--------+
| 1  | lee  | 101    |
| 2  | shiv | 102    |
| 3  | bob  | 103    |
+----+------+--------+
--CONDITION TRUE, UPDATE PERFORMED
UPSERT INTO my_first_table AS t
VALUES (3, "bobby", 100) AS v
WHERE v.status < t.status
+----+-------+--------+
| id | name  | status |
+----+-------+--------+
| 1  | lee   | 101    |
| 2  | shiv  | 102    |
| 3  | bobby | 100    |
+----+-------+--------+
In the case where 3 does not exist, it should insert.
Is there an elegant workaround if not?

A solution I found was to use a LEFT JOIN and filter in the SELECT expression. So say we have a table to_upsert, identical to the destination table, holding all our potential upserts...
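A minimal sketch of that staging table, mirroring the destination schema (the hash partitioning is an assumption, copied from my_first_table above):
-- staging table; schema mirrors my_first_table so it can feed the UPSERT below
CREATE TABLE to_upsert
(
id INT,
name STRING,
status INT,
PRIMARY KEY(id)
)
PARTITION BY HASH PARTITIONS 4
STORED AS KUDU;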
INSERT INTO to_upsert VALUES (3, "bobby", 100), (5, "newgal", 600);
UPSERT INTO my_first_table
SELECT to_upsert.id, to_upsert.name, to_upsert.status
FROM to_upsert
LEFT JOIN my_first_table ON to_upsert.id = my_first_table.id
-- unmatched rows (id IS NULL) become inserts; matched rows update only when the new status is lower
WHERE my_first_table.status > to_upsert.status OR my_first_table.id IS NULL;
SELECT * FROM my_first_table;
+----+--------+--------+
| id | name   | status |
+----+--------+--------+
| 3  | bobby  | 100    |
| 1  | lee    | 101    |
| 2  | shiv   | 102    |
| 5  | newgal | 600    |
+----+--------+--------+
Thank you for watching this episode of watching me learn SQL.

Related

How do I add a record with the next number in sequence to a table?

For example, given this table:
+----+------+-------+
| id | name | price |
+----+------+-------+
| 4  | ABC  | 1000  |
| 5  | ABD  | 1001  |
+----+------+-------+
How do I insert the following rows into the table?
+----+------+-------+
| 6  | ABF  | 1002  |
| 7  | ABG  | 1003  |
| 8  | ABH  | 1004  |
+----+------+-------+
This script does not work correctly:
insert into table
(id, name, price)
select max(id)+1, 'ABF', max(price)+1 from table
union all
select max(id)+1, 'ABG', max(price)+1 from table
union all
select max(id)+1, 'ABH', max(price)+1 from table
We don't know your database, but note that all three SELECTs read max(id) from the same, unchanged table, so every row would get the same id and price. Also, if id is an auto-increment column, you should not be passing a value for it at all. Try this version:
INSERT INTO yourTable (name, price)
VALUES
('ABF', 1002),
('ABG', 1003),
('ABH', 1004);
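If id is not auto-increment and you really do want max()-based numbering, here is a sketch that computes every offset against a single snapshot of the table, so the three rows no longer collide (yourTable is a stand-in name, and the VALUES row constructor assumes SQL Server or a similar database):
INSERT INTO yourTable (id, name, price)
-- offsets are computed once, from one snapshot of the current maximums
SELECT m.max_id + v.n, v.name, m.max_price + v.n
FROM (SELECT MAX(id) AS max_id, MAX(price) AS max_price FROM yourTable) m
CROSS JOIN (VALUES (1, 'ABF'), (2, 'ABG'), (3, 'ABH')) AS v(n, name);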

Linearly extrapolate values down to 0 from variable starting points

I want to build a query which allows me to flexibly extrapolate a number linearly down to Age 0, starting from the last known value. The table (see below) has two columns, Age and Volume. My last known volume is 321.60 at age 11; how can I linearly extrapolate the 321.60 down to age 0 in annual steps? Also, I would like to design the query in a way which allows the age to change. For example, in another scenario the last volume is at age 27. I have been experimenting with the LEAD function; as a result I can extrapolate the volume at age 10, but the function does not allow me to extrapolate down to 0. How can I design a query which (A) allows me to linearly extrapolate to age 0 and (B) is flexible and allows different starting points for the extrapolation?
SELECT [age],
       [volume],
       CONCAT(CASE WHEN volume IS NULL
                   THEN (LEAD(volume, 1, 0) OVER (ORDER BY age)) / (age + 1) * age
              END, volume) AS 'Extrapolate'
FROM tbl_volume
+-----+--------+-------------+
| Age | Volume | Extrapolate |
+-----+--------+-------------+
| 0   | NULL   | NULL        |
| 1   | NULL   | NULL        |
| 2   | NULL   | NULL        |
| 3   | NULL   | NULL        |
| 4   | NULL   | NULL        |
| 5   | NULL   | NULL        |
| 6   | NULL   | NULL        |
| 7   | NULL   | NULL        |
| 8   | NULL   | NULL        |
| 9   | NULL   | NULL        |
| 10  | NULL   | 292.363     |
| 11  | 321.60 | 321.60      |
| 12  | 329.80 | 329.80      |
| 13  | 337.16 | 337.16     |
| 13  | 343.96 | 343.96      |
| 14  | 349.74 | 349.74      |
+-----+--------+-------------+
If I assume that the value is 0 at age 0, then you can use simple arithmetic: the extrapolated volume at any age is age * (first known volume / first known age). At age 10 that gives 10 * 321.60 / 11 ≈ 292.36, matching your expected output. This seems to work in your case:
select t.*,
coalesce(t.volume, t.age * (t2.volume / t2.age)) as extrapolated_volume
from t cross join
(select top (1) t2.*
from t t2
where t2.volume is not null
order by t2.age asc
) t2;
Here is a db<>fiddle
You can use a windowing function with an empty over() for this kind of thing. As a trivial example:
create table t(j int, k decimal(3,2));
insert t values (1, null), (2, null), (3, 3), (4, 4);
select j, j * avg(k / j) over ()
from t
Note that avg() ignores nulls, so avg(k / j) over () is the slope computed from the known rows only; multiplying by j extends that straight line through the origin to every row.

SQL query to check that the number of passed records matches the total number of records

I have 2 tables:
Result Master
+------+-------------+
| QnID | Description |
+------+-------------+
| 1    | Qn1         |
| 2    | Qn2         |
| 3    | Qn3         |
| 4    | Qn4         |
| 5    | Qn5         |
+------+-------------+
Result Details
+----+------+--------+--------+
| ID | QnID | TCDesc | Result |
+----+------+--------+--------+
| 1  | 1    | TC1    | PASS   |
| 2  | 1    | TC2    | FAIL   |
| 3  | 1    | TC3    | PASS   |
| 4  | 2    | TC1    | PASS   |
| 5  | 3    | TC1    | PASS   |
| 6  | 3    | TC1    | PASS   |
| 7  | 3    | TC3    | PASS   |
+----+------+--------+--------+
I need a query which will return the following result:
+----+------+--------+
| ID | QnID | Result |
+----+------+--------+
| 1  | 2    | PASS   |
| 2  | 3    | PASS   |
| 3  | 4    | ERROR  |
| 4  | 5    | ERROR  |
+----+------+--------+
Conditions:
Each question will have a different number of test cases in ResultDetails. I need to select questions for which all test cases passed (the number of entries for a particular question must equal the number of passed test cases for it), or ERROR when ResultDetails has no entry for a question.
Can anyone please help me with a query, thank you.
You can get the desired results using a common table expression and conditional aggregation.
First, create and populate sample tables (Please save us this step in your future questions):
DECLARE @ResultMaster AS TABLE
(
QnID int,
Description char(3)
);
INSERT INTO @ResultMaster (QnID, Description) VALUES
(1, 'Qn1'),
(2, 'Qn2'),
(3, 'Qn3'),
(4, 'Qn4'),
(5, 'Qn5');
DECLARE @ResultDetails AS TABLE
(
ID int,
QnID int,
TCDesc char(3),
Result char(4)
);
INSERT INTO @ResultDetails VALUES
(1, 1, 'TC1', 'PASS'),
(2, 1, 'TC2', 'FAIL'),
(3, 1, 'TC3', 'PASS'),
(4, 2, 'TC1', 'PASS'),
(5, 3, 'TC1', 'PASS'),
(6, 3, 'TC1', 'PASS'),
(7, 3, 'TC3', 'PASS');
Then, use a common table expression to calculate the number of pass details and a simple count to get the number of total details:
WITH CTE AS
(
SELECT M.QnId,
COUNT(CASE WHEN Result = 'PASS' THEN 1 END) As CountPass,
COUNT(Result) As CountDetails
FROM @ResultMaster As M
LEFT JOIN @ResultDetails As D ON M.QnId = D.QnId
GROUP BY M.QnId
)
Then, select from that cte:
SELECT ROW_NUMBER() OVER(ORDER BY QnId) AS Id,
QnId,
CASE WHEN CountDetails = 0 THEN
'ERROR'
ELSE
'PASS'
END As Result
FROM CTE
WHERE CountPass = CountDetails
Results:
+----+------+--------+
| ID | QnID | Result |
+----+------+--------+
| 1  | 2    | PASS   |
| 2  | 3    | PASS   |
| 3  | 4    | ERROR  |
| 4  | 5    | ERROR  |
+----+------+--------+
You can see a live demo on rextester.

How to use joins and sum() in SQL Server query

I have two tables, Items and Transactions. All the items are listed in the Items table. The Transactions table records each request: a particular employee requests a given quantity of an item.
How do I use joins to merge the data from the two tables and compute the balance quantity of each item?
Note: (Quantity Balance = Quantity - TR_Qty)
ITEMS table:
| ID | ITEM    | UNIT | QUANTITY | PRICE  |
| 1  | Perfume | btl. | 500      | 200.00 |
| 2  | Battery | pc.  | 100      | 25.00  |
| 3  | Milk    | can  | 250      | 70.00  |
| 4  | Soap    | pack | 400      | 150.00 |
TRANSACTIONS table:
| ID | ITEM_ID | TR_QTY | REQUESTOR   | PROCESSOR   | Date       | Time  |
| 1  | 1       | 20     | A. Jordan   | K. Koslav   | 12-22-2014 | 09:00 |
| 2  | 2       | 8      | B. Wilkins  | Z. Flores   | 12-22-2014 | 10:03 |
| 3  | 3       | 80     | C. Potran   | A. Mabag    | 12-26-2014 | 14:23 |
| 4  | 3       | 45     | D. Korvak   | D. Sanchez  | 12-28-2014 | 15:33 |
| 5  | 4       | 22     | C. Carvicci | A. Flux     | 12-31-2014 | 16:02 |
| 6  | 1       | 18     | F. Sansi    | N. Mahone   | 01-22-2015 | 08:45 |
| 7  | 4       | 14     | Z. Gorai    | M. Sucre    | 01-30-2015 | 16:33 |
| 8  | 2       | 7      | L. ZOnsey   | P. Panchito | 02-11-2015 | 17:22 |
Desired output:
| ID | ITEM    | QUANTITY BALANCE |
| 1  | Perfume | 462              |
| 2  | Battery | 85               |
| 3  | Milk    | 125              |
| 4  | Soap    | 364              |
Try this:
DECLARE @Items TABLE(ID INT, Item NVARCHAR(10), Q INT)
DECLARE @Transactions TABLE(ID INT, ItemID INT, TQ INT)
INSERT INTO @Items VALUES
(1, 'Perfume', 500),
(2, 'Battery', 100),
(3, 'Milk', 250),
(4, 'Soap', 400)
INSERT INTO @Transactions VALUES
(1, 1, 20),
(2, 2, 8),
(3, 3, 80),
(4, 3, 45),
(5, 4, 22),
(6, 1, 18),
(7, 4, 14),
(8, 2, 7)
-- MAX(i.Q) is safe here: Q is constant per item, but it must be aggregated because it is not in the GROUP BY
SELECT i.ID, i.Item, MAX(i.Q) - ISNULL(SUM(t.TQ), 0) AS Balance
FROM @Items i
LEFT JOIN @Transactions t ON t.ItemID = i.ID
GROUP BY i.ID, i.Item
ORDER BY i.ID
Output:
ID  Item     Balance
1   Perfume  462
2   Battery  85
3   Milk     125
4   Soap     364
You can do it for example by using outer apply and creating the sum of quantities in there; the subquery is correlated per item, so no GROUP BY is needed:
select
I.ID,
I.ITEM,
I.QUANTITY - isnull(T.QUANTITY, 0) as BALANCE
from
ITEMS I
outer apply (
select sum(TR_QTY) as QUANTITY
from TRANSACTIONS T
where T.ITEM_ID = I.ID
) T
SELECT ITEM, QUANTITY - (SELECT SUM(TRANSACTIONS.TR_QTY) FROM TRANSACTIONS WHERE TRANSACTIONS.ITEM_ID = ITEMS.ID) AS [QUANTITY BALANCE] FROM ITEMS
Field names and table names are as you mentioned in the question (note that a space is not valid in a plain alias, which is why the alias is bracketed above).

Optimal query to fetch a cumulative sum in MySQL

What is the 'correct' query to fetch a cumulative sum in MySQL?
I have a table where I keep information about files; one column contains the size of the files in bytes. (The actual files are kept on disk somewhere.)
I would like to get the cumulative file size like this:
+------------+---------+--------+----------------+
| fileInfoId | groupId | size   | cumulativeSize |
+------------+---------+--------+----------------+
|          1 |       1 | 522120 |         522120 |
|          2 |       2 | 316042 |         316042 |
|          4 |       2 | 711084 |        1027126 |
|          5 |       2 | 697002 |        1724128 |
|          6 |       2 | 663425 |        2387553 |
|          7 |       2 | 739553 |        3127106 |
|          8 |       2 | 700938 |        3828044 |
|          9 |       2 | 695614 |        4523658 |
|         10 |       2 | 744204 |        5267862 |
|         11 |       2 | 609022 |        5876884 |
| ...        | ...     | ...    | ...            |
+------------+---------+--------+----------------+
20000 rows in set (19.2161 sec.)
Right now, I use the following query to get the above results
SELECT
a.fileInfoId
, a.groupId
, a.size
, SUM(b.size) AS cumulativeSize
FROM fileInfo AS a
LEFT JOIN fileInfo AS b USING(groupId)
WHERE a.fileInfoId >= b.fileInfoId
GROUP BY a.fileInfoId
ORDER BY a.groupId, a.fileInfoId
My solution, however, is extremely slow (around 19 seconds without cache).
Explain gives the following execution details
+----+--------------+-------+-------+-------------------+-----------+---------+----------------+-------+-------------+
| id | select_type  | table | type  | possible_keys     | key       | key_len | ref            | rows  | Extra       |
+----+--------------+-------+-------+-------------------+-----------+---------+----------------+-------+-------------+
|  1 | SIMPLE       | a     | index | PRIMARY,foreignId | PRIMARY   | 4       | NULL           | 14905 |             |
|  1 | SIMPLE       | b     | ref   | PRIMARY,foreignId | foreignId | 4       | db.a.foreignId |    36 | Using where |
+----+--------------+-------+-------+-------------------+-----------+---------+----------------+-------+-------------+
My question is:
How can I optimize the above query?
Update
I've updated the question to provide the table structure and a procedure to fill the table with 20,000 records of test data.
CREATE TABLE `fileInfo` (
`fileInfoId` int(10) unsigned NOT NULL AUTO_INCREMENT
, `groupId` int(10) unsigned NOT NULL
, `name` varchar(128) NOT NULL
, `size` int(10) unsigned NOT NULL
, PRIMARY KEY (`fileInfoId`)
, KEY `groupId` (`groupId`)
) ENGINE=InnoDB;
delimiter $$
DROP PROCEDURE IF EXISTS autofill$$
CREATE PROCEDURE autofill()
BEGIN
DECLARE i INT DEFAULT 0;
DECLARE gid INT DEFAULT 0;
DECLARE nam char(20);
DECLARE siz INT DEFAULT 0;
WHILE i < 20000 DO
SET gid = FLOOR(RAND() * 250);
SET nam = CONV(FLOOR(RAND() * 10000000000000), 20, 36);
SET siz = FLOOR((RAND() * 1024 * 1024));
INSERT INTO `fileInfo` (`groupId`, `name`, `size`) VALUES(gid, nam, siz);
SET i = i + 1;
END WHILE;
END;$$
delimiter ;
CALL autofill();
About the possible duplicate question
The question linked by Forgotten Semicolon is not the same question. My question has an extra column; because of this extra groupId column, the accepted answer there does not work for my problem. (Maybe it can be adapted to work, but I don't know how, hence my question.)
You could use a variable - it's far quicker than any join:
SELECT
id,
size,
@total := @total + size AS cumulativeSize
FROM `table`, (SELECT @total := 0) AS t;
Here's a quick test case on a Pentium III with 128MB RAM running Debian 5.0:
Create the table:
DROP TABLE IF EXISTS `table1`;
CREATE TABLE `table1` (
`id` int(11) NOT NULL auto_increment,
`size` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
Fill with 20,000 random numbers:
DELIMITER //
DROP PROCEDURE IF EXISTS autofill//
CREATE PROCEDURE autofill()
BEGIN
DECLARE i INT DEFAULT 0;
WHILE i < 20000 DO
INSERT INTO table1 (size) VALUES (FLOOR((RAND() * 1000)));
SET i = i + 1;
END WHILE;
END;
//
DELIMITER ;
CALL autofill();
Check the row count:
SELECT COUNT(*) FROM table1;
+----------+
| COUNT(*) |
+----------+
|    20000 |
+----------+
Run the cumulative total query:
SELECT
id,
size,
@total := @total + size AS cumulativeSize
FROM table1, (SELECT @total := 0) AS t;
+-------+------+----------------+
| id    | size | cumulativeSize |
+-------+------+----------------+
|     1 |  226 |            226 |
|     2 |  869 |           1095 |
|     3 |  668 |           1763 |
|     4 |  733 |           2496 |
...
| 19997 |  966 |       10004741 |
| 19998 |  522 |       10005263 |
| 19999 |  713 |       10005976 |
| 20000 |    0 |       10005976 |
+-------+------+----------------+
20000 rows in set (0.07 sec)
UPDATE
I'd missed the grouping by groupId in the original question, and that certainly made things a bit trickier. I then wrote a solution which used a temporary table, but I didn't like it—it was messy and overly complicated. I went away and did some more research, and have come up with something far simpler and faster.
I can't claim all the credit for this—in fact, I can barely claim any at all, as it is just a modified version of Emulate row number from Common MySQL Queries.
It's beautifully simple, elegant, and very quick:
SELECT fileInfoId, groupId, name, size, cumulativeSize
FROM (
SELECT
fileInfoId,
groupId,
name,
size,
@cs := IF(@prev_groupId = groupId, @cs+size, size) AS cumulativeSize,
@prev_groupId := groupId AS prev_groupId
FROM fileInfo, (SELECT @prev_groupId:=0, @cs:=0) AS vars
ORDER BY groupId
) AS tmp;
You can remove the outer SELECT ... AS tmp if you don't mind the prev_groupId column being returned. I found that it ran marginally faster without it.
Here's a simple test case:
INSERT INTO `fileInfo` VALUES
( 1, 3, 'name0', '10'),
( 5, 3, 'name1', '10'),
( 7, 3, 'name2', '10'),
( 8, 1, 'name3', '10'),
( 9, 1, 'name4', '10'),
(10, 2, 'name5', '10'),
(12, 4, 'name6', '10'),
(20, 4, 'name7', '10'),
(21, 4, 'name8', '10'),
(25, 5, 'name9', '10');
SELECT fileInfoId, groupId, name, size, cumulativeSize
FROM (
SELECT
fileInfoId,
groupId,
name,
size,
@cs := IF(@prev_groupId = groupId, @cs+size, size) AS cumulativeSize,
@prev_groupId := groupId AS prev_groupId
FROM fileInfo, (SELECT @prev_groupId := 0, @cs := 0) AS vars
ORDER BY groupId
) AS tmp;
+------------+---------+-------+------+----------------+
| fileInfoId | groupId | name  | size | cumulativeSize |
+------------+---------+-------+------+----------------+
|          8 |       1 | name3 |   10 |             10 |
|          9 |       1 | name4 |   10 |             20 |
|         10 |       2 | name5 |   10 |             10 |
|          1 |       3 | name0 |   10 |             10 |
|          5 |       3 | name1 |   10 |             20 |
|          7 |       3 | name2 |   10 |             30 |
|         12 |       4 | name6 |   10 |             10 |
|         20 |       4 | name7 |   10 |             20 |
|         21 |       4 | name8 |   10 |             30 |
|         25 |       5 | name9 |   10 |             10 |
+------------+---------+-------+------+----------------+
Here's a sample of the last few rows from a 20,000 row table:
|      19481 |     248 | 8CSLJX22RCO | 1037469 |       51270389 |
|      19486 |     248 | 1IYGJ1UVCQE |  937150 |       52207539 |
|      19817 |     248 | 3FBU3EUSE1G |  616614 |       52824153 |
|      19871 |     248 | 4N19QB7PYT  |  153031 |       52977184 |
|        132 |     249 | 3NP9UGMTRTD |  828073 |         828073 |
|        275 |     249 | 86RJM39K72K |  860323 |        1688396 |
|        802 |     249 | 16Z9XADLBFI |  623030 |        2311426 |
...
|      19661 |     249 | ADZXKQUI0O3 |  837213 |       39856277 |
|      19870 |     249 | 9AVRTI3QK6I |  331342 |       40187619 |
|      19972 |     249 | 1MTAEE3LLEM | 1027714 |       41215333 |
+------------+---------+-------------+---------+----------------+
20000 rows in set (0.31 sec)
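As an aside, if you are on MySQL 8.0 or later (an assumption; this question predates it), a window function computes the same per-group running total directly, with no user variables:
-- running total per groupId, ordered by fileInfoId
SELECT fileInfoId, groupId, name, size,
SUM(size) OVER (PARTITION BY groupId ORDER BY fileInfoId) AS cumulativeSize
FROM fileInfo;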
I think that MySQL is only using one of the indexes on the table. In this case, it's choosing the index on foreignId.
Add a covering compound index that includes both primaryId and foreignId.
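Using the column names from the table definition in the question (the EXPLAIN output calls them primaryId and foreignId), such an index might look like this; the index name is arbitrary:
ALTER TABLE fileInfo ADD INDEX idx_group_file (groupId, fileInfoId); -- covers both the join on groupId and the fileInfoId comparison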