How to sum columns that have a field in common in PostgreSQL?

How to calculate total length for all rows that have a certain value?
Let's say there's the following table:
id | unit_id | length | total_length (to be filled)
----+---------+--------+-----------------------------
1 | 1 | 10 |
2 | 1 | 4 |
3 | 1 | 5 |
4 | 2 | 3 |
5 | 3 | 3 |
6 | 3 | 6 |
In this case, how do I update the table so that every row with unit_id 1 gets the sum of the lengths of all rows with unit_id 1 (10 + 4 + 5 = 19), and both rows with unit_id 3 get 9 (3 + 6)?
I've tried
update test.routes
set total_length = (select sum(length) from test.routes where unit_id = unit_id) where unit_id = unit_id
But that just updates the entire table with the same value. How do I update each row with the correct sum for its unit_id?

Try a CTE with a window function:
t=# with a as (select *, sum(length) over (partition by unit_id) from routes)
t-# update routes u set total_length = a.sum
t-# from a
t-# where a.id = u.id;
UPDATE 6
Time: 0.520 ms
t=# select * from routes ;
id | unit_id | length | total_length
----+---------+--------+--------------
1 | 1 | 10 | 19
2 | 1 | 4 | 19
3 | 1 | 5 | 19
4 | 2 | 3 | 3
5 | 3 | 3 | 3
6 | 4 | 6 | 6
(6 rows)

You need to qualify the references to the column unit_id. Otherwise, inside the subquery both sides of a condition like where unit_id = unit_id refer to the same (inner) row, so it is always true (apart from NULLs) and the subquery sums up everything:
update test.routes r1 set total_length = (select sum(length) from test.routes r2 where r2.unit_id = r1.unit_id)
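A quick way to see why the qualification matters is to run both versions on the question's sample rows. The sketch below uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for PostgreSQL. This is an assumption of the sketch: the schema prefix test. is dropped, and the outer table is referenced by its name, routes, instead of an alias r1. The unqualified version gives every row the grand total of 31; the qualified one gives each row the sum for its own unit_id.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE routes (id INTEGER PRIMARY KEY, unit_id INTEGER, "
    "length INTEGER, total_length INTEGER)"
)
conn.executemany(
    "INSERT INTO routes (id, unit_id, length) VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 4), (3, 1, 5), (4, 2, 3), (5, 3, 3), (6, 3, 6)],
)

# Unqualified: both sides of unit_id = unit_id resolve to the subquery's own
# row, so the condition is always true and every row gets the grand total.
conn.execute(
    "UPDATE routes SET total_length = "
    "(SELECT sum(length) FROM routes WHERE unit_id = unit_id)"
)
wrong = [r[0] for r in conn.execute("SELECT total_length FROM routes ORDER BY id")]

# Qualified: r2 is the subquery's row, routes is the row being updated,
# so each row sums only the lengths that share its unit_id.
conn.execute(
    "UPDATE routes SET total_length = "
    "(SELECT sum(r2.length) FROM routes AS r2 WHERE r2.unit_id = routes.unit_id)"
)
rows = conn.execute(
    "SELECT id, unit_id, length, total_length FROM routes ORDER BY id"
).fetchall()

print(wrong)  # [31, 31, 31, 31, 31, 31]
for row in rows:
    print(row)
```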

This should do the work.
update routes as s
set total_length = g.total_length
from (
select unit_id, sum(length) as total_length from routes group by unit_id
) as g
where s.unit_id = g.unit_id;
Here g is a derived table that holds the total length for each unit_id. PostgreSQL joins it to routes through UPDATE ... FROM, and the target column in SET is written without the alias. Computing each group's sum once in g is often more efficient than a correlated subquery that is evaluated for every row.

Related

Generate serial numbers in decreasing order given a variable in Netezza (Aginity) SQL

Is there any Netezza SQL syntax that, given a column NUMBER, generates rows counting down from that number to 0?
Below is an example of what I'm trying to do
BEFORE
ID | NUMBER
-------+--------
A | 4
B | 5
AFTER
ID | NUMBER
-------+--------
A | 4
A | 3
A | 2
A | 1
B | 5
B | 4
B | 3
B | 2
B | 1
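You can use the _v_vector_idx table for this purpose. Its idx values start at 0, so the join below also returns a 0 row for each ID; add and idx > 0 to the join condition if, as in the AFTER example, the sequence should stop at 1.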
select
id, idx
from
test join _v_vector_idx
on idx <= number
order
by id asc, idx desc ;
Here's the example in action
select * from test
ID | NUMBER
-------+--------
A | 4
B | 5
(2 rows)
select id, idx from test join _v_vector_idx on
idx <= number order by id asc, idx desc ;
ID | IDX
-------+-----
A | 4
A | 3
A | 2
A | 1
A | 0
B | 5
B | 4
B | 3
B | 2
B | 1
B | 0
(11 rows)
insert into test values ('C', 3);
INSERT 0 1
select * from test;
ID | NUMBER
-------+--------
A | 4
B | 5
C | 3
(3 rows)
select id, idx from test join _v_vector_idx
on idx <= number order by id asc, idx desc ;
ID | IDX
-------+-----
A | 4
A | 3
A | 2
A | 1
A | 0
B | 5
B | 4
B | 3
B | 2
B | 1
B | 0
C | 3
C | 2
C | 1
C | 0
(15 rows)
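For experimenting outside Netezza, here is a minimal sketch of the same join in SQLite via Python's sqlite3 module. SQLite has no _v_vector_idx, so a recursive CTE named vector_idx (hypothetical, with an arbitrary upper bound of 100) stands in for it, and the extra idx >= 1 condition makes the output stop at 1 as in the question's AFTER example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id TEXT, number INTEGER)")
conn.executemany("INSERT INTO test VALUES (?, ?)", [("A", 4), ("B", 5), ("C", 3)])

query = """
WITH RECURSIVE vector_idx(idx) AS (
    -- Hypothetical stand-in for Netezza's _v_vector_idx: idx = 0, 1, ..., 100
    SELECT 0
    UNION ALL
    SELECT idx + 1 FROM vector_idx WHERE idx < 100
)
SELECT id, idx
FROM test
JOIN vector_idx ON idx <= number AND idx >= 1  -- drop idx >= 1 to keep the 0 rows
ORDER BY id ASC, idx DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```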

Query to count occurrences of a particular column value

Let's say I have a table with the following values:
1
1
1
2
2
2
3
3
3
1
1
1
2
2
2
I need to get output like this, which counts each occurrence of a particular value:
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
1 1
1 2
1 3
2 1
2 2
2 3
NB: This is a sample table. The actual table is complex, with lots of rows and columns, and the query contains some more conditions.
If the number repeats over different "islands", you first need to calculate a value that identifies each island (grpnum). You can get it by subtracting a row number partitioned by the value from a raw top-to-bottom row number (raw_rownum): within one island both numbers increase together, so their difference is constant for that island and can then be used to partition a subsequent row number. Because each ORDER BY can disturb the outcome, I find it necessary to use individual steps and to pass each prior calculation up so it can be reused.
MS SQL Server 2014 Schema Setup:
CREATE TABLE Table1 ([num] int);
INSERT INTO Table1 ([num])
VALUES (1),(1),(1),(2),(2),(2),(3),(3),(3),(1),(1),(1),(2),(2),(2);
Query 1:
select
num
, row_number() over(partition by grpnum, num order by raw_rownum) rn -- the pair is unique per island; grpnum + num can collide
, grpnum + num island_num
from (
select
num
, raw_rownum - row_number() over(partition by num order by raw_rownum) grpnum
, raw_rownum
from (
select
num
, row_number() over(order by (select null)) as raw_rownum
from table1
) r
) d
;
Results:
| num | rn | island_num |
|-----|----|------------|
| 1 | 1 | 1 |
| 1 | 2 | 1 |
| 1 | 3 | 1 |
| 2 | 1 | 5 |
| 2 | 2 | 5 |
| 2 | 3 | 5 |
| 1 | 1 | 7 |
| 1 | 2 | 7 |
| 1 | 3 | 7 |
| 3 | 1 | 9 |
| 3 | 2 | 9 |
| 3 | 3 | 9 |
| 2 | 1 | 11 |
| 2 | 2 | 11 |
| 2 | 3 | 11 |
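The same gaps-and-islands idea can be tried in SQLite (3.25 or later, for window functions) through Python's sqlite3 module. This sketch assumes a hypothetical seq column that records the intended order, since the sample table has none, and it partitions by the pair (grpnum, num): a sum of two numbers can coincide for two different islands (for the sequence 2, 1 both rows get grpnum + num = 2), while the pair cannot.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# seq is a hypothetical ordering column; the question's sample table has none.
conn.execute("CREATE TABLE table1 (seq INTEGER PRIMARY KEY, num INTEGER)")
nums = [1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2]
conn.executemany(
    "INSERT INTO table1 (seq, num) VALUES (?, ?)", list(enumerate(nums, start=1))
)

query = """
SELECT num,
       row_number() OVER (PARTITION BY grpnum, num ORDER BY seq) AS rn
FROM (
    -- Within one island, seq and the per-value row number rise together,
    -- so their difference (grpnum) is constant for that island.
    SELECT num, seq,
           seq - row_number() OVER (PARTITION BY num ORDER BY seq) AS grpnum
    FROM table1
) AS d
ORDER BY seq
"""
rows = conn.execute(query).fetchall()
for num, rn in rows:
    print(num, rn)
```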
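SQL Server provides the row_number() function:
select ID, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) RN FROM <TABLE_NAME>
EDIT:
select * , case when (row_number() over (order by (select 1))) %3 = 0 then 3 else
(row_number() over (order by (select 1))) %3 end [rn] from table
Note that the first query numbers every occurrence of a value across the whole table (1 to 6 for the values 1 and 2), and the EDIT version simply cycles 1, 2, 3, so it matches the sample only because every run there is exactly three rows long.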
I think there is a problem with your sample, in that you have an implied order but not an explicit one. There is no guarantee that the database will keep and store the values the way you have them listed, so there has to be some inherent/explicit ordering mechanism to tell the database to give those values back exactly the way you listed.
For example, if you did this:
update test
set val = val + 2
where val < 3
You would find your select * no longer comes back the way you expected.
You indicated your actual table was huge, so I assume you have something like this you can use. There should be something in the table to indicate the order you want... a timestamp, perhaps, or maybe a surrogate key.
That said, assuming you have something like that and can leverage it, I believe a series of windowing functions would work.
with rowed as (
select
val,
case
when lag (val, 1, -1) over (order by 1) = val then 0
else 1
end as idx,
row_number() over (order by 1) as rn -- fix this once you have your order
from
test
),
partitioned as (
select
val, rn,
sum (idx) over (order by rn) as instance
from rowed
)
select
val, instance, count (1) over (partition by instance order by rn)
from
partitioned
This example relies on the order in which the rows happen to come back from the database, which is not guaranteed. Change the ORDER BY in both the lag and the row_number window functions to whatever your real ordering mechanism is. With the sample data it returns:
1 1 1
1 1 2
1 1 3
2 2 1
2 2 2
2 2 3
3 3 1
3 3 2
3 3 3
1 4 1
1 4 2
1 4 3
2 5 1
2 5 2
2 5 3
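The running-sum approach can be checked the same way. The sketch below runs it in SQLite (3.25 or later) via Python's sqlite3 module, with a hypothetical id column supplying the explicit order the answer asks for, used consistently in every window function. It reproduces the val, instance, count output shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# id is a hypothetical ordering column standing in for a timestamp or surrogate key.
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, val INTEGER)")
vals = [1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2]
conn.executemany(
    "INSERT INTO test (id, val) VALUES (?, ?)", list(enumerate(vals, start=1))
)

query = """
WITH rowed AS (
    SELECT val, id,
           -- 1 marks the first row of a new run, 0 a continuation of the previous one
           CASE WHEN lag(val, 1, -1) OVER (ORDER BY id) = val THEN 0 ELSE 1 END AS idx
    FROM test
),
partitioned AS (
    SELECT val, id,
           -- running total of run starts gives each run its own number
           sum(idx) OVER (ORDER BY id) AS instance
    FROM rowed
)
SELECT val, instance, count(*) OVER (PARTITION BY instance ORDER BY id) AS n
FROM partitioned
ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(*row)
```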

Find duplicate combinations

I need a query to find duplicate combinations in these tables:
AttributeValue:
id | name
------------------
1 | green
2 | blue
3 | red
4 | 100x200
5 | 150x200
Product:
id | name
----------------
1 | Produkt A
ProductAttribute:
id | id_product | price
--------------------------
1 | 1 | 100
2 | 1 | 200
3 | 1 | 100
4 | 1 | 200
5 | 1 | 100
6 | 1 | 200
7 | 1 | 100 -- duplicate combination
8 | 1 | 100 -- duplicate combination
ProductAttributeCombinations:
id_product_attribute | id_attribute
-------------------------------------
1 | 1
1 | 4
2 | 1
2 | 5
3 | 2
3 | 4
4 | 2
4 | 5
5 | 3
5 | 4
6 | 3
6 | 5
7 | 1
7 | 4
8 | 1
8 | 5
I need SQL that creates result like:
id_product | duplicate_attributes
----------------------------------
1 | {7,8}
If I understand correctly, 7 is a duplicate of 1 and 8 is a duplicate of 2. As phrased, your question is a bit confusing, because 7 and 8 are not related to each other and the only table of interest is ProductAttributeCombinations.
If this is the case, then one method is to use string aggregation:
with combos as (
select id_product_attribute,
string_agg(id_attribute::text, ',' order by id_attribute) as combo
from ProductAttributeCombinations pac
group by id_product_attribute
)
select *
from combos c
where exists (select 1
from combos c2
where c2.id_product_attribute < c.id_product_attribute and -- an earlier row has the same combo, so 7 and 8 are returned
c2.combo = c.combo
);
Your question leaves some room for interpretation. Here is my educated guess:
For each product, return an array of all instances with the same set of attributes as any other instance of the same product with smaller ID.
WITH combo AS (
SELECT id_product, id, array_agg(id_attribute) AS attributes
FROM (
SELECT pa.id_product, pa.id, pac.id_attribute
FROM ProductAttribute pa
JOIN ProductAttributeCombinations pac ON pac.id_product_attribute = pa.id
ORDER BY pa.id_product, pa.id, pac.id_attribute
) sub
GROUP BY 1, 2
)
SELECT id_product, array_agg(id) AS duplicate_attributes
FROM combo c
WHERE EXISTS (
SELECT 1
FROM combo
WHERE id_product = c.id_product
AND attributes = c.attributes
AND id < c.id
)
GROUP BY 1;
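The logic of this query (build a sorted attribute list per instance, then keep the instances whose list already appeared at a smaller id of the same product) can also be sketched in plain Python, which makes it easy to test against the sample data. This is an illustration of the idea rather than a replacement for the SQL; the function name find_duplicate_combos is hypothetical, and instances without any attribute rows are skipped to mirror the inner join.

```python
from collections import defaultdict

# Sample data from the question.
# ProductAttribute rows as (id, id_product).
product_attribute = [(1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 1), (8, 1)]
# ProductAttributeCombinations rows as (id_product_attribute, id_attribute).
combinations = [(1, 1), (1, 4), (2, 1), (2, 5), (3, 2), (3, 4), (4, 2), (4, 5),
                (5, 3), (5, 4), (6, 3), (6, 5), (7, 1), (7, 4), (8, 1), (8, 5)]


def find_duplicate_combos(product_attribute, combinations):
    """Map id_product -> ids whose attribute set matches a smaller id of that product."""
    # Like array_agg(id_attribute ORDER BY id_attribute): one sorted tuple per instance.
    attrs = defaultdict(list)
    for pa_id, attr_id in combinations:
        attrs[pa_id].append(attr_id)
    signature = {pa_id: tuple(sorted(values)) for pa_id, values in attrs.items()}

    result = defaultdict(list)
    seen = set()  # (id_product, signature) pairs already met at a smaller id
    for pa_id, product_id in sorted(product_attribute):
        if pa_id not in signature:
            continue  # no attribute rows: dropped by the inner join in the SQL
        key = (product_id, signature[pa_id])
        if key in seen:
            result[product_id].append(pa_id)  # a smaller id has the same set
        else:
            seen.add(key)
    return dict(result)


print(find_duplicate_combos(product_attribute, combinations))  # {1: [7, 8]}
```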
Sorting can be inlined into the aggregate function, so we don't need a subquery for the sort (as @Gordon already did). This is shorter, but also typically slower:
WITH combo AS (
SELECT pa.id_product, pa.id
, array_agg(pac.id_attribute ORDER BY pac.id_attribute) AS attributes
FROM ProductAttribute pa
JOIN ProductAttributeCombinations pac ON pac.id_product_attribute = pa.id
GROUP BY 1, 2
)
SELECT ...
This only returns products with duplicate instances.
Your table names are rather misleading / contradict the rest of your question. Your sample data is not very clear either, only featuring a single product. I assume there are many in your table.
It's also unclear whether you are using double-quoted table names preserving CaMeL-case spelling. I assume: no.

Partitions with number of rows (Oracle)

I have a view in an Oracle DB; it looks as follows:
id | type | numrows
----|--------|----------
1 | S | 2
2 | L | 3
3 | S | 2
4 | S | 2
5 | L | 3
6 | S | 2
7 | L | 3
8 | S | 2
9 | L | 3
10 | L | 3
The idea is: if TYPE is 'S' then return 2 rows (randomly), and if TYPE is 'L' then return 3 rows (randomly).
Example:
id | type | numrows
----|--------|----------
1 | S | 2
3 | S | 2
2 | L | 3
5 | L | 3
7 | L | 3
You need to tell Oracle which 2 or 3 rows to return. One idea is to fabricate a row number:
select id, type, numrows
from
(select
id,
type,
numrows,
row_number() over (partition by type order by type) rnk --fabricated
from table)
where
(type = 'S' and rnk <= 2 )
or
(type = 'L' and rnk <= 3 );
You can order by anything you want in that analytic function. For example, you can order by dbms_random.random() for random choices.
If your numrows column is correct and holds the number of rows you want for each type, then the where clause is simpler:
select id, type, numrows
from
(select
id,
type,
numrows,
row_number() over (partition by type order by dbms_random.random()) rnk --fabricated
from table)
where
rnk <= numrows;
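A minimal sketch of the second query in SQLite (3.25 or later) via Python's sqlite3 module, assuming a hypothetical table v holding the view's rows. SQLite's random() stands in for Oracle's dbms_random.random(); which rows come back changes from run to run, but each type always returns numrows rows.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
# v is a hypothetical table holding the rows of the Oracle view.
conn.execute("CREATE TABLE v (id INTEGER, type TEXT, numrows INTEGER)")
data = [(1, "S", 2), (2, "L", 3), (3, "S", 2), (4, "S", 2), (5, "L", 3),
        (6, "S", 2), (7, "L", 3), (8, "S", 2), (9, "L", 3), (10, "L", 3)]
conn.executemany("INSERT INTO v VALUES (?, ?, ?)", data)

query = """
SELECT id, type, numrows
FROM (
    SELECT id, type, numrows,
           -- random order within each type, so rnk picks random rows
           row_number() OVER (PARTITION BY type ORDER BY random()) AS rnk
    FROM v
)
WHERE rnk <= numrows
ORDER BY type DESC, id
"""
rows = conn.execute(query).fetchall()
print(rows)
print(Counter(row[1] for row in rows))  # 2 rows of type S, 3 of type L
```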

SQLite Optimize Query

I'm having trouble with SQLite query optimization. The query runs fine, but for large tables it takes too much time, and I need some help optimizing it.
Source table:
-------+----------+----------------
IdMain | IdParent | ColumnToUpdate
-------+----------+----------------
1 | |
2 | 1 | 999 <-- IdParent = 1 \
3 | | \
4 | 5 | 123 > DISTINCT ITEMS COUNT = 1
5 | | / IdParent = 1
6 | 1 | 999 <-- IdParent = 1 / UPDATE Row with IdMain = IdParent
7 | 4 |
8 | 3 | 456
-------+----------+----------------
Query to optimize
UPDATE Table
SET ColumnToUpdate = (SELECT DISTINCT ColumnToUpdate
FROM Table T
WHERE T.ColumnToUpdate IS NOT NULL
AND T.IdParent = Table.IdMain)
WHERE Table.ColumnToUpdate IS NULL
AND (SELECT COUNT(*) FROM (SELECT DISTINCT ColumnToUpdate
FROM Table T2
WHERE T2.ColumnToUpdate IS NOT NULL
AND T2.IdParent = Table.IdMain)) = 1 ;
Expected table
-------+----------+----------------
IdMain | IdParent | ColumnToUpdate
-------+----------+----------------
1 | | 999 <-- UPDATE
2 | 1 | 999
3 | |
4 | 5 | 123
5 | |
6 | 1 | 999
7 | 4 |
8 | 3 | 456
-------+----------+----------------
Pseudo algorithm
FOR each Row DO
BEGIN
IF ColumnToUpdate IS NULL THEN
BEGIN
// count distinct non-NULL values of ColumnToUpdate among rows whose IdParent = this row's IdMain
X = COUNT(DISTINCT ColumnToUpdate WHERE IdParent = IdMain)
// update the row ONLY when the distinct count equals 1
IF X = 1 THEN
UPDATE(ColumnToUpdate)
END
END
I've tried to split it up within the source code (currently Delphi), but that is slow too. Is there any way to speed it up?
I wonder if this might speed things up:
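UPDATE Table
SET ColumnToUpdate = coalesce((SELECT CASE WHEN count(DISTINCT T.ColumnToUpdate) = 1
THEN min(T.ColumnToUpdate)
END
FROM Table T
WHERE T.ColumnToUpdate IS NOT NULL AND
T.IdParent = Table.IdMain),
Table.ColumnToUpdate
)
WHERE Table.ColumnToUpdate IS NULL;
This executes only one subquery per row instead of two. The CASE returns the single value when the children hold exactly one distinct non-NULL value, and NULL otherwise, in which case coalesce keeps the current value.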
An index on Table(IdParent, ColumnToUpdate) might also improve performance.
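A minimal sketch of a single-subquery version with the suggested index, run in SQLite through Python's sqlite3 module. The table name items and the two extra children of row 7 (with different values) are hypothetical, added so that the exactly-one-distinct-value rule has something to reject. Because the outer WHERE only touches rows that are still NULL, writing NULL back when the distinct count is not 1 leaves those rows unchanged. Under this rule rows 3 and 5 are filled too (456 and 123), as the question's own query would do, even though its expected table shows only row 1 changing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "items" is a hypothetical name; the question uses the placeholder Table.
conn.execute(
    "CREATE TABLE items (IdMain INTEGER PRIMARY KEY, IdParent INTEGER, ColumnToUpdate INTEGER)"
)
rows_in = [
    (1, None, None), (2, 1, 999), (3, None, None), (4, 5, 123),
    (5, None, None), (6, 1, 999), (7, 4, None), (8, 3, 456),
    # Hypothetical extra children of row 7 with two different values.
    (9, 7, 1), (10, 7, 2),
]
conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows_in)

# Lets the correlated subquery look up one parent's children directly.
conn.execute("CREATE INDEX idx_items_parent ON items (IdParent, ColumnToUpdate)")

conn.execute("""
UPDATE items
SET ColumnToUpdate = (
    -- Aggregate without GROUP BY: always one row, holding the value when
    -- the children have exactly one distinct non-NULL value, else NULL.
    SELECT CASE WHEN count(DISTINCT c.ColumnToUpdate) = 1
                THEN min(c.ColumnToUpdate) END
    FROM items AS c
    WHERE c.IdParent = items.IdMain
      AND c.ColumnToUpdate IS NOT NULL
)
WHERE ColumnToUpdate IS NULL
""")
result = dict(conn.execute("SELECT IdMain, ColumnToUpdate FROM items"))
print(result)
```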