Duplicate selected row in Vertica Database - sql

I have been asked to duplicate a specific row in a table. For this I used a simple SQL statement:
insert into xyz_tablename(x, y) select x, y from xyz_tablename where x = 'something';
However, this statement copies every row where x = 'something', so it simply doubles the matching rows.
What I want is to control, via a counter, how many copies of the row are inserted. Is there any function or procedure for this in Vertica?
What I have done so far:
Studied functions (I understand them, but I cannot use them for this problem).
Studied procedures (I have read about them, but I cannot understand how to write the required bash file).
Learned that there are no FOR/WHILE loops in Vertica.
Can anyone help me with this problem? Let me know if I am missing something. Thanks in advance.

Try the following:
create table mystore.xyz_tablename(
x varchar (10)
,y INT
)
;
INSERT INTO mystore.xyz_tablename VALUES('abc' ,1);
INSERT INTO mystore.xyz_tablename VALUES('abc' ,2);
INSERT INTO mystore.xyz_tablename VALUES('abc' ,3);
INSERT INTO mystore.xyz_tablename VALUES('abc' ,4);
INSERT INTO mystore.xyz_tablename VALUES('abc' ,5);
INSERT INTO mystore.xyz_tablename VALUES('abc' ,6);
INSERT INTO mystore.xyz_tablename VALUES('abc' ,7);
INSERT INTO mystore.xyz_tablename VALUES('abc' ,8);
select * from mystore.xyz_tablename;
mystore_owner=> select * from mystore.xyz_tablename;
x | y
-----+---
abc | 1
abc | 2
abc | 3
abc | 4
abc | 5
abc | 6
abc | 7
abc | 8
(8 rows)
INSERT INTO mystore.xyz_tablename(x,y)
SELECT a.*
FROM (SELECT * from mystore.xyz_tablename where y = 8 LIMIT 1) a
INNER JOIN (SELECT y FROM mystore.xyz_tablename LIMIT 5) b
ON (1=1)
;
OUTPUT
--------
5
(1 row)
mystore_owner=> select * from mystore.xyz_tablename;
x | y
-----+---
abc | 1
abc | 2
abc | 3
abc | 4
abc | 5
abc | 6
abc | 7
abc | 8
abc | 8
abc | 8
abc | 8
abc | 8
abc | 8
(13 rows)
Let us know if this works for your requirement.
The number of copies is controlled by the LIMIT clause in subquery b, which is currently 5; change it to whatever you need. Subquery b can also select from any other table that has enough rows. If that table has fewer rows than the LIMIT value, you will get fewer copies than requested.
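If you would rather not depend on how many rows some other table happens to have, a variation is to cross join the selected row with an inline list of numbers and let a WHERE clause act as the counter. This is a hedged sketch: the derived table `n` is a hypothetical hand-written helper (extend it if you need more than 10 copies), the table is recreated without the `mystore` schema so the snippet stands alone, and it uses only plain SELECT, UNION ALL and CROSS JOIN, which Vertica should accept, though it has not been checked against a specific Vertica version.

```sql
-- Minimal two-column table mirroring the example above (no schema prefix).
CREATE TABLE xyz_tablename (x VARCHAR(10), y INT);
INSERT INTO xyz_tablename VALUES ('abc', 7);
INSERT INTO xyz_tablename VALUES ('abc', 8);

-- Insert 5 extra copies of the y = 8 row.
-- "r" picks exactly one source row; "n" is a hand-written counter table
-- holding 1..10, and the WHERE clause decides how many of its rows
-- (and therefore how many copies) take part in the cross join.
INSERT INTO xyz_tablename (x, y)
SELECT r.x, r.y
FROM (SELECT x, y FROM xyz_tablename WHERE y = 8 LIMIT 1) r
CROSS JOIN (
              SELECT 1 AS n
    UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
    UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7
    UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL SELECT 10
) n
WHERE n.n <= 5;   -- the counter: change 5 to the number of copies you want

-- Expect one y = 7 row and six y = 8 rows (the original plus 5 copies).
SELECT y, COUNT(*) AS copies FROM xyz_tablename GROUP BY y ORDER BY y;
```

Unlike the LIMIT approach, the number of copies no longer depends on the size of another table; it is capped only by the length of the list in `n`.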


Is there a way to preserve the order of an array when using ANY in a Postgres query?

I'd like to run a query using ANY that preserves the order of the array passed to ANY. Consider this simple example:
create table stuff (
id serial,
value int
);
insert into stuff (value) values (1), (2), (3), (4), (5);
select * from stuff where value = ANY(ARRAY[1,2,3,4,5]);
select * from stuff where value = ANY(ARRAY[5,4,3,2,1]);
which results in the same order for both queries, even though the arrays had a different order.
id | value
----+-------
1 | 1
2 | 2
3 | 3
4 | 4
5 | 5
(5 rows)
id | value
----+-------
1 | 1
2 | 2
3 | 3
4 | 4
5 | 5
(5 rows)
I'd like a shorthand way, if possible, to return the results in the order of the array inside ANY. Is this possible?
So far I've had to write something like this, which feels a bit heavy-handed:
CREATE FUNCTION ordered_any (
ints int[]
) RETURNS int[] as $$
DECLARE
results int[];
i int;
value int;
BEGIN
FOR i IN 1 .. cardinality(ints) LOOP
SELECT f.id FROM stuff f
WHERE f.value = ints[i]
INTO value;
results = array_append(results, value);
END LOOP;
RETURN results;
END;
$$
LANGUAGE 'plpgsql';
select ordered_any(ARRAY[5,4,3,2,1]);
"Any" help is appreciated! No pun intended ;)
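array_position() (PostgreSQL 9.5+) returns the position of each row's value in the array, so you can order by it:
select
id,
value,
array_position(array[5,4,3,2,1], value) as ord
from stuff where value = any(array[5,4,3,2,1])
order by ord;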
output:
id | value | ord
----+-------+-----
5 | 5 | 1
4 | 4 | 2
3 | 3 | 3
2 | 2 | 4
1 | 1 | 5
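Another option in PostgreSQL 9.4 and later is `unnest(...) WITH ORDINALITY`, which returns each array element together with its 1-based position; joining on the element and ordering by that position keeps the array's order. A minimal sketch, recreating the question's `stuff` table as a temporary table:

```sql
CREATE TEMP TABLE stuff (id serial, value int);
INSERT INTO stuff (value) VALUES (1), (2), (3), (4), (5);

-- a.value is the array element, a.ord its position in the array (1, 2, ...).
SELECT s.id, s.value
FROM unnest(ARRAY[5,4,3,2,1]) WITH ORDINALITY AS a(value, ord)
JOIN stuff s ON s.value = a.value
ORDER BY a.ord;
```

One behavioural difference from `= ANY`: because this is a join, an element that appears twice in the array returns its matching row twice.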

Update records in SQL by looking up values in a different table

I am copying data from a few tables on SQL Server A to server B. I have a set of staging tables on B and need to update some of them based on the updated values in the final target table on B.
Example:
Server B:
StagingTable1:
ID | NAME | CITY
---+------+-----
1  | ABC  | XYZ
2  | BCD  | XXX
StagingTable2:
ID | AGE | Table1ID (FK)
---+-----+--------------
10 | 15  | 1
20 | 16  | 2
After copying StagingTable1 to TargetTable1, the IDs are auto-populated and the rows get new IDs: ID 1 becomes 2 and ID 2 becomes 3.
TargetTable1:
ID | NAME | CITY
---+------+-----
1  | PQR  | YYY   (pre-existing record)
2  | ABC  | XYZ
3  | BCD  | XXX
So before I can copy StagingTable2, I need to update its Table1ID column with the correct values from TargetTable1.
StagingTable2 should become:
ID | AGE | Table1ID (FK)
---+-----+--------------
10 | 15  | 2
20 | 16  | 3
I am writing a stored procedure for this and am not sure how to look up and update the records in the staging tables.
Assuming that (name, city) tuples are unique in StagingTable1 and TargetTable1, you can use an updatable common table expression to generate the new mapping and assign the corresponding values:
with cte as (
select st2.Table1ID, tt1.id
from StagingTable2 st2
inner join StagingTable1 st1 on st1.ID = st2.Table1ID
inner join TargetTable1 tt1 on tt1.name = st1.name and tt1.city = st1.city
)
update cte set Table1ID = id
Content of StagingTable2 after the update:
id | age | Table1ID
---+-----+---------
10 | 15  | 2
20 | 16  | 3
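If you prefer not to update through a CTE, SQL Server also accepts the same lookup as an `UPDATE ... FROM` with joins. The sketch below rebuilds the example tables (the `IDENTITY` column on `TargetTable1` is an assumption standing in for the auto-populated IDs) and relies on the same assumption that (NAME, CITY) is unique in both tables:

```sql
CREATE TABLE StagingTable1 (ID int, NAME varchar(10), CITY varchar(10));
CREATE TABLE StagingTable2 (ID int, AGE int, Table1ID int);
CREATE TABLE TargetTable1 (ID int IDENTITY(1,1), NAME varchar(10), CITY varchar(10));

INSERT INTO StagingTable1 VALUES (1, 'ABC', 'XYZ'), (2, 'BCD', 'XXX');
INSERT INTO StagingTable2 VALUES (10, 15, 1), (20, 16, 2);
INSERT INTO TargetTable1 (NAME, CITY) VALUES ('PQR', 'YYY');   -- pre-existing row, ID 1
INSERT INTO TargetTable1 (NAME, CITY)
SELECT NAME, CITY FROM StagingTable1 ORDER BY ID;               -- copied rows, IDs 2 and 3

-- Map each old StagingTable1 ID to the new TargetTable1 ID via (NAME, CITY).
UPDATE st2
SET Table1ID = tt1.ID
FROM StagingTable2 st2
JOIN StagingTable1 st1 ON st1.ID = st2.Table1ID
JOIN TargetTable1 tt1 ON tt1.NAME = st1.NAME AND tt1.CITY = st1.CITY;

SELECT ID, AGE, Table1ID FROM StagingTable2 ORDER BY ID;
```

Both forms do the same join; the CTE version simply names the join result first and updates through it.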

SQL - Delete duplicate rows error [duplicate]

This question already has answers here:
How to delete duplicate rows in SQL Server?
I have the following table (TBL_VIDEO) with duplicate values in the "TIMESTAMP" column, and I want to delete the duplicate rows only when the "CAMERA" number also matches.
BEFORE:
ANALYSIS_ID | TIMESTAMP | EMOTION | CAMERA
-------------------------------------------
1 | 5 | HAPPY | 1
2 | 10 | SAD | 1
3 | 10 | SAD | 1
4 | 5 | HAPPY | 2
5 | 15 | ANGRY | 2
6 | 15 | HAPPY | 2
AFTER:
ANALYSIS_ID | TIMESTAMP | EMOTION | CAMERA
-------------------------------------------
1 | 5 | HAPPY | 1
2 | 10 | SAD | 1
4 | 5 | HAPPY | 2
5 | 15 | ANGRY | 2
I have attempted the statement below, but the rows were not deleted as expected. I would appreciate any help producing a correct SQL statement. Thanks in advance!
delete y
from TBL_VIDEO y
where exists (select 1 from TBL_VIDEO y2 where y.TIMESTAMP = y2.TIMESTAMP and y2.CAMERA < y.CAMERA);
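CREATE TABLE Table12
([ANALYSIS_ID] int, [TIMESTAMP] int, [EMOTION] varchar(5), [CAMERA] int)
;
INSERT INTO Table12
([ANALYSIS_ID], [TIMESTAMP], [EMOTION], [CAMERA])
VALUES
(1, 5, 'HAPPY', 1),
(2, 10, 'SAD', 1),
(3, 10, 'SAD', 1),
(4, 5, 'HAPPY', 2),
(5, 15, 'ANGRY', 2),
(6, 15, 'HAPPY', 2)
;
-- number the rows within each (TIMESTAMP, CAMERA) group, lowest ANALYSIS_ID first,
-- then delete every row after the first one in its group
with cte as (select *, row_number() over (partition by [TIMESTAMP], [CAMERA] order by [ANALYSIS_ID]) as rn from Table12)
delete from cte
where rn > 1;
select * from Table12 order by [ANALYSIS_ID];
output
ANALYSIS_ID TIMESTAMP EMOTION CAMERA
1 5 HAPPY 1
2 10 SAD 1
4 5 HAPPY 2
5 15 ANGRY 2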
You have two questions:
What is wrong with my code?
Is there a better way to delete the duplicate entries?
The second question is a duplicate of the linked question.
For the first question, please refer to https://learn.microsoft.com/en-us/sql/t-sql/statements/delete-transact-sql?view=sql-server-2017 (or press F1 on DELETE). The correct syntax is:
delete y
from Table12 y
where exists (
A generic SQL Server command is below; substitute your own column names, ordering condition and table name.
DELETE T from
(
SELECT ROW_NUMBER()over(partition by column1 order by column2)a,* FROM TABLENAME
)T
where a>1
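-- keep only the lowest ANALYSIS_ID in each (TIMESTAMP, CAMERA) group
delete y
from TBL_VIDEO y
where y.ANALYSIS_ID > (select min(y2.ANALYSIS_ID)
                       from TBL_VIDEO y2
                       where y2.[TIMESTAMP] = y.[TIMESTAMP]
                         and y2.CAMERA = y.CAMERA);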
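As for what is wrong with the original attempt: its EXISTS condition `y2.CAMERA < y.CAMERA` deletes any row whose TIMESTAMP also appears on a lower-numbered camera, and never compares rows from the same camera, so with the sample data it deletes only row 4. A sketch of the same `DELETE y FROM ... WHERE EXISTS` statement with the condition corrected, assuming the row to keep in each (TIMESTAMP, CAMERA) group is the one with the lowest ANALYSIS_ID:

```sql
-- Sample data from the question.
CREATE TABLE TBL_VIDEO (ANALYSIS_ID int, [TIMESTAMP] int, EMOTION varchar(5), CAMERA int);
INSERT INTO TBL_VIDEO VALUES
  (1, 5, 'HAPPY', 1), (2, 10, 'SAD', 1), (3, 10, 'SAD', 1),
  (4, 5, 'HAPPY', 2), (5, 15, 'ANGRY', 2), (6, 15, 'HAPPY', 2);

-- Delete a row if an earlier row (smaller ANALYSIS_ID) exists
-- with the same TIMESTAMP on the same CAMERA.
DELETE y
FROM TBL_VIDEO y
WHERE EXISTS (
    SELECT 1
    FROM TBL_VIDEO y2
    WHERE y2.[TIMESTAMP] = y.[TIMESTAMP]
      AND y2.CAMERA = y.CAMERA
      AND y2.ANALYSIS_ID < y.ANALYSIS_ID
);

SELECT * FROM TBL_VIDEO ORDER BY ANALYSIS_ID;  -- rows 1, 2, 4 and 5 remain
```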

Why does the IN operator return distinct rows when passed duplicate values (value1, value1, ...)?

Using SQL Server 2008
Why does the IN operator return distinct values when selecting duplicate values?
Table #temp
x | 1 | 2 | 3
--+------------+-------------+------------
1 | first 1 | first 2 | first 3
2 | Second 1 | second 2 | second 3
When I execute this query
SELECT * FROM #temp WHERE x IN (1,1)
it will return
x | 1 | 2 | 3
--+------------+-------------+------------
1 | first 1 | first 2 | first 3
How can I make it so it returns this instead:
x | 1 | 2 | 3
--+------------+-------------+------------
1 | first 1 | first 2 | first 3
1 | first 1 | first 2 | first 3
What is the alternative of IN in this case?
If you want to return duplicates, then you need to phrase the query as a join. IN simply tests a condition on each row. Whether the condition is met once or twice doesn't matter: the row either stays in or gets filtered out.
with xes as (
select 1 as x union all
select 1 as x
)
SELECT *
FROM #temp t join
xes
on t.x = xes.x;
EDIT:
If you have a subquery, then it is even simpler:
select *
from #temp t join
(<subquery>) s
on t.x = s.x
This would be a "normal" use of a join.
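On SQL Server 2008 and later, the list of values can also be written inline with a table value constructor, which keeps the query close to the original `IN (1,1)` form. A minimal sketch; the `#temp` definition is reconstructed from the sample output, so its column types are assumptions:

```sql
CREATE TABLE #temp (x int, [1] varchar(20), [2] varchar(20), [3] varchar(20));
INSERT INTO #temp VALUES (1, 'first 1', 'first 2', 'first 3'),
                         (2, 'Second 1', 'second 2', 'second 3');

-- Each entry in the VALUES list becomes its own row of v, so the
-- row with x = 1 matches twice and is returned twice.
SELECT t.*
FROM #temp t
JOIN (VALUES (1), (1)) AS v(x) ON t.x = v.x;
```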

Recursive SQL statement (PostgreSQL) - simplified version

This is a simplified version of a more complicated question posted here:
Recursive SQL statement (PostgreSQL 9.1.4)
Simplified question
Given an upper triangular matrix stored in three columns (RowIndex, ColumnIndex, MatrixValue):
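RowIndex \ ColumnIndex | 1 | 2 | 3 | 4 | 5
-----------------------+---+---+---+---+---
1                      | 2 | 2 | 3 | 3 | 4
2                      | 4 | 4 | 5 | 6 | X
3                      | 3 | 2 | 2 | X | X
4                      | 2 | 1 | X | X | X
5                      | 1 | X | X | X | X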
The X values are to be calculated using the following recurrence:
M[i,j] = (M[i-1,j] + M[i,j-1]) / 2
(i = row, j = column, M = matrix)
Example:
M[3,4] = (M[2,4] + M[3,3]) / 2
M[3,5] = (M[2,5] + M[3,4]) / 2
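The full required result is:
RowIndex \ ColumnIndex | 1 | 2 | 3    | 4    | 5
-----------------------+---+---+------+------+-------
1                      | 2 | 2 | 3    | 3    | 4
2                      | 4 | 4 | 5    | 6    | 5
3                      | 3 | 2 | 2    | 4    | 4.5
4                      | 2 | 1 | 1.5  | 2.75 | 3.625
5                      | 1 | 1 | 1.25 | 2.00 | 2.8125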
Sample data:
create table matrix_data (
RowIndex integer,
ColumnIndex integer,
MatrixValue numeric);
insert into matrix_data values (1,1,2);
insert into matrix_data values (1,2,2);
insert into matrix_data values (1,3,3);
insert into matrix_data values (1,4,3);
insert into matrix_data values (1,5,4);
insert into matrix_data values (2,1,4);
insert into matrix_data values (2,2,4);
insert into matrix_data values (2,3,5);
insert into matrix_data values (2,4,6);
insert into matrix_data values (3,1,3);
insert into matrix_data values (3,2,2);
insert into matrix_data values (3,3,2);
insert into matrix_data values (4,1,2);
insert into matrix_data values (4,2,1);
insert into matrix_data values (5,1,1);
Can this be done?
Test setup:
CREATE TEMP TABLE matrix (
rowindex integer,
columnindex integer,
matrixvalue numeric);
INSERT INTO matrix VALUES
(1,1,2),(1,2,2),(1,3,3),(1,4,3),(1,5,4)
,(2,1,4),(2,2,4),(2,3,5),(2,4,6)
,(3,1,3),(3,2,2),(3,3,2)
,(4,1,2),(4,2,1)
,(5,1,1);
Run INSERTs in a LOOP with DO:
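DO $$
BEGIN
  FOR i IN 2 .. 5 LOOP              -- rows that contain missing cells
    FOR j IN 7-i .. 5 LOOP          -- missing cells in row i start at column 7-i
      INSERT INTO matrix
      VALUES (i, j, (
        SELECT sum(matrixvalue)/2   -- (M[i-1,j] + M[i,j-1]) / 2
        FROM matrix
        WHERE (rowindex, columnindex) IN ((i-1, j), (i, j-1))
      ));
    END LOOP;
  END LOOP;
END;
$$;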
See result:
SELECT * FROM matrix order BY 1,2;
This can be done in a single SQL select statement, but only because recursion is not necessary. I'll outline the solution. If you actually want the SQL code, let me know.
First, notice that the only items that contribute to the sums are along the diagonal (the known cells where RowIndex + ColumnIndex = 6). Now, if we follow the contribution of the value "4" in (1, 5), it contributes 4/2 to (2,5), 4/4 to (3,5) and 4/8 to (4,5). Each time, the contribution is cut in half, because (a+b)/2 is (a/2 + b/2).
When we extend this, we start to see a pattern similar to Pascal's triangle. In fact, for any given point in the lower triangular matrix (below where you have values), you can find the diagonal elements that contribute to the value. Extend a vertical line up to hit the diagonal and a horizontal line to hit the diagonal. Those are the contributors from the diagonal row.
How much do they contribute? Well, for that we can go to Pascal's triangle. For the first diagonal below where we have values, the contributions are (1,1)/2. For the second diagonal, (1,2,1)/4. For the third, (1,3,3,1)/8 . . . and so on.
Fortunately, we can calculate the contributions for each value using a formula (the "choose" function from combinatorics). The power of 2 is easy. And, determining how far a given cell is from the diagonal is not too hard.
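Written out for this 5×5 example, where the known cells are exactly those with i + j ≤ 6, one reading of the outline is the closed form below, with d the distance of cell (i, j) from that diagonal. It reproduces M[3,5] and M[5,5] from the expected result:

```latex
\begin{aligned}
d &= i + j - 6 \quad (d \ge 1), \\
M[i,j] &= \frac{1}{2^{d}} \sum_{k=0}^{d} \binom{d}{k}\, M[i-k,\; j-d+k], \\
M[3,5] &= \tfrac{1}{4}\bigl(1\cdot 2 + 2\cdot 6 + 1\cdot 4\bigr) = 4.5, \\
M[5,5] &= \tfrac{1}{16}\bigl(1\cdot 1 + 4\cdot 1 + 6\cdot 2 + 4\cdot 6 + 1\cdot 4\bigr) = \tfrac{45}{16} = 2.8125.
\end{aligned}
```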
All of this can be combined into a single Postgres SQL statement. However, Erwin's solution also works. I only want to put the effort into debugging the statement if his solution doesn't meet your needs.
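The single statement is not written out above; the following is a sketch of one way to do it in PostgreSQL, not the answerer's code. It hard-codes the 5×5 size and the known diagonal i + j = 6 from this example, computes the binomial coefficient from the built-in `factorial()` function, and uses `generate_series` with `LATERAL` (PostgreSQL 9.3+):

```sql
CREATE TEMP TABLE matrix_data (
    RowIndex integer,
    ColumnIndex integer,
    MatrixValue numeric);
INSERT INTO matrix_data VALUES
 (1,1,2),(1,2,2),(1,3,3),(1,4,3),(1,5,4),
 (2,1,4),(2,2,4),(2,3,5),(2,4,6),
 (3,1,3),(3,2,2),(3,3,2),
 (4,1,2),(4,2,1),
 (5,1,1);

WITH targets AS (
    -- every missing cell (i, j) and its distance d from the known diagonal i + j = 6
    SELECT r.i, c.j, r.i + c.j - 6 AS d
    FROM generate_series(1, 5) AS r(i)
    CROSS JOIN generate_series(1, 5) AS c(j)
    WHERE r.i + c.j > 6
)
INSERT INTO matrix_data (RowIndex, ColumnIndex, MatrixValue)
SELECT t.i, t.j,
       -- sum over k of C(d, k) * M[i-k, j-d+k], divided by 2^d
       SUM(factorial(t.d) / (factorial(k.k) * factorial(t.d - k.k)) * m.MatrixValue)
           / power(2::numeric, t.d)
FROM targets t
CROSS JOIN LATERAL generate_series(0, t.d) AS k(k)
JOIN matrix_data m
  ON m.RowIndex = t.i - k.k
 AND m.ColumnIndex = t.j - t.d + k.k
GROUP BY t.i, t.j, t.d;

SELECT * FROM matrix_data ORDER BY 1, 2;
```

No recursion is involved: every missing cell reads only the original 15 rows, which is the point made in the outline.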
... and here comes the recursive CTE with multiple embedded CTE's (tm):
DROP SCHEMA IF EXISTS tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;
CREATE TABLE matrix_data (
yyy integer,
xxx integer,
val numeric);
insert into matrix_data (yyy,xxx,val) values
(1,1,2) , (1,2,2) , (1,3,3) , (1,4,3) , (1,5,4)
, (2,1,4) , (2,2,4) , (2,3,5) , (2,4,6)
, (3,1,3) , (3,2,2) , (3,3,2)
, (4,1,2) , (4,2,1)
, (5,1,1)
;
WITH RECURSIVE rr AS (
WITH xx AS (
SELECT MIN(xxx) AS x0
, MAX(xxx) AS x1
FROM matrix_data
)
, mimax AS (
SELECT generate_series(xx.x0,xx.x1) AS xxx
FROM xx
)
, yy AS (
SELECT MIN(yyy) AS y0
, MAX(yyy) AS y1
FROM matrix_data
)
, mimay AS (
SELECT generate_series(yy.y0,yy.y1) AS yyy
FROM yy
)
, cart AS (
SELECT * FROM mimax mm
JOIN mimay my ON (1=1)
)
, empty AS (
SELECT * FROM cart ca
WHERE NOT EXISTS (
SELECT *
FROM matrix_data nx
WHERE nx.xxx = ca.xxx
AND nx.yyy = ca.yyy
)
)
, hot AS (
SELECT * FROM empty emp
WHERE EXISTS (
SELECT *
FROM matrix_data ex
WHERE ex.xxx = emp.xxx -1
AND ex.yyy = emp.yyy
)
AND EXISTS (
SELECT *
FROM matrix_data ex
WHERE ex.xxx = emp.xxx
AND ex.yyy = emp.yyy -1
)
)
-- UPDATE from here:
SELECT h.xxx,h.yyy, md.val / 2 AS val
FROM hot h
JOIN matrix_data md ON
(md.yyy = h.yyy AND md.xxx = h.xxx-1)
OR (md.yyy = h.yyy-1 AND md.xxx = h.xxx)
UNION ALL
SELECT e.xxx,e.yyy, r.val / 2 AS val
FROM empty e
JOIN rr r ON ( e.xxx = r.xxx+1 AND e.yyy = r.yyy)
OR ( e.xxx = r.xxx AND e.yyy = r.yyy+1 )
)
INSERT INTO matrix_data(yyy,xxx,val)
SELECT DISTINCT yyy,xxx
,SUM(val)
FROM rr
GROUP BY yyy,xxx
;
SELECT * FROM matrix_data
;
New result:
NOTICE: drop cascades to table tmp.matrix_data
DROP SCHEMA
CREATE SCHEMA
SET
CREATE TABLE
INSERT 0 15
INSERT 0 10
yyy | xxx | val
-----+-----+------------------------
1 | 1 | 2
1 | 2 | 2
1 | 3 | 3
1 | 4 | 3
1 | 5 | 4
2 | 1 | 4
2 | 2 | 4
2 | 3 | 5
2 | 4 | 6
3 | 1 | 3
3 | 2 | 2
3 | 3 | 2
4 | 1 | 2
4 | 2 | 1
5 | 1 | 1
2 | 5 | 5.0000000000000000
5 | 5 | 2.81250000000000000000
4 | 3 | 1.50000000000000000000
3 | 5 | 4.50000000000000000000
5 | 2 | 1.00000000000000000000
3 | 4 | 4.00000000000000000000
5 | 3 | 1.25000000000000000000
4 | 5 | 3.62500000000000000000
4 | 4 | 2.75000000000000000000
5 | 4 | 2.00000000000000000000
(25 rows)
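A T-SQL (SQL Server) alternative with a WHILE loop, filling one anti-diagonal (constant RowIndex + ColumnIndex) per iteration. It assumes MatrixValue has a fractional scale, e.g. decimal(18,6); a plain numeric column in SQL Server is numeric(18,0) and would lose the fractional parts:
-- stop once the last cell (5,5), with RowIndex + ColumnIndex = 10, exists
while (select max(ColumnIndex+RowIndex) from matrix_data) < 10
begin
    -- new cell (i, j+1) = (M[i, j] + M[i-1, j+1]) / 2, built from the current outermost diagonal
    insert matrix_data
    select c1.RowIndex, c1.ColumnIndex+1, (c1.MatrixValue+c2.MatrixValue)/2
    from matrix_data c1
    inner join matrix_data c2
        on c1.ColumnIndex+1 = c2.ColumnIndex and c1.RowIndex-1 = c2.RowIndex
    where c1.RowIndex+c1.ColumnIndex = (select max(RowIndex+ColumnIndex) from matrix_data)
      and c1.ColumnIndex < 5
end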