How to get a sum of values per ID and update existing records in another table - SQL

I have two tables like:
ID      | TRAFFIC
--------+--------
fd56756 | 4398
645effa | 567899
894fac6 | 611900
894fac6 | 567899
and
USER   | ID      | TRAFFIC
-------+---------+--------
andrew | fd56756 | 0
peter  | 645effa | 0
john   | 894fac6 | 0
I need to get SUM("TRAFFIC") per ID from the first table and set the traffic column in the second table where the first table's ID equals the second table's ID. IDs in the first table are not unique and can be duplicated.
How can I do this?
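For the sample data above, the expected result in the second table would be (894fac6 sums 611900 + 567899 = 1179799):
USER   | ID      | TRAFFIC
-------+---------+--------
andrew | fd56756 | 4398
peter  | 645effa | 567899
john   | 894fac6 | 1179799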

Table names are taken from your later comment. Chances are, you are reporting table and column names incorrectly.
UPDATE users u
SET    "TRAFFIC" = sub.sum_traffic
FROM  (
   SELECT "ID", sum("TRAFFIC") AS sum_traffic
   FROM   stats.traffic
   GROUP  BY 1
   ) sub
WHERE  u."ID" = sub."ID";
Aside: It's unwise to use mixed-case identifiers in Postgres. Use legal, lower-case identifiers, which do not need to be double-quoted, to make your life easier. Start with the chapter on identifiers and key words in the manual.
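For illustration, a sketch of the same update if the tables had been created with lower-case identifiers (assuming the columns were created as id and traffic; no double-quoting needed anywhere):
UPDATE users u
SET    traffic = sub.sum_traffic
FROM  (
   SELECT id, sum(traffic) AS sum_traffic
   FROM   stats.traffic
   GROUP  BY id
   ) sub
WHERE  u.id = sub.id;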

Something like this?
UPDATE users t2
SET traffic = t1.sum_traffic
FROM (SELECT id, sum(traffic) AS sum_traffic
      FROM stats.traffic
      GROUP BY id) t1
WHERE t1.id = t2.id;
(Note that with the mixed-case identifiers from the question, "ID" and "TRAFFIC" would need double quotes, as in the answer above.)

Related

Transform Row Values to Column Names

I have a table of customer contacts and their role. Simplified example below.
customer | role | userid
----------------------------
1 | Support | 123
1 | Support | 456
1 | Procurement | 567
...
desired output
customer | Support1 | Support2 | Support3 | Support4 | Procurement1 | Procurement2
-----------------------------------------------------------------------------------
1 | 123 | 456 | null | null | 567 | null
2 | 123 | 456 | 12333 | 45776 | 888 | 56723
So: dynamically create the number of required columns based on how many users are in that role. It's a small number of roles. I can also assume a maximum of 5 users in the same role, which means in the worst case I need to generate 5 columns per role. The userids don't need to be in any particular order.
My current approach gets 1 userid per role/customer. Then a second query pulls another id that wasn't part of the first result set, and so on. But that way I have to statically create 5 queries. It works, but I was wondering whether there is a more efficient way of dynamically creating the needed columns.
Example of pulling one user per role:
SELECT customer, role,
       (SELECT TOP 1 userid
        FROM temp AS tmp1
        WHERE tmp1.customer = tmp2.customer
          AND tmp1.role = tmp2.role) AS userid
FROM temp AS tmp2
GROUP BY customer, role
ORDER BY customer, role
SQL create with dummy data
create table temp
(
customer int,
role nvarchar(20),
userid int
)
insert into temp values (1,'Support',123)
insert into temp values (1,'Support',456)
insert into temp values (1,'Procurement',567)
insert into temp values (2,'Support',123)
insert into temp values (2,'Support',456)
insert into temp values (2,'Procurement',888)
insert into temp values (2,'Support',12333)
insert into temp values (2,'Support',45776)
insert into temp values (2,'Procurement',56723)
You may need to adapt your approach slightly if you want to avoid getting into the realm of programming user-defined table functions (which is what you would need in order to generate columns dynamically). You don't mention which SQL database variant you are using (SQL Server, PostgreSQL, ?). I'm going to assume that it supports some form of string aggregation (they pretty much all do), but the syntax varies, so you will probably have to adjust the code to your circumstances.
You mention that the number of roles is small (5-ish?). The proposed solution is to generate a comma-separated list of user ids, one list per role, using common table expressions (CTEs) and the LISTAGG function (variously named STRING_AGG, GROUP_CONCAT, etc. in other databases).
WITH tsupport AS (
     SELECT customer,
            LISTAGG(userid, ',') AS "Support"
     FROM temp
     WHERE role = 'Support'
     GROUP BY customer),
tprocurement AS (
     SELECT customer,
            LISTAGG(userid, ',') AS "Procurement"
     FROM temp
     WHERE role = 'Procurement'
     GROUP BY customer)
--> tnextrole AS (
-->  SELECT customer, LISTAGG(userid, ',') ... for additional roles
SELECT a.customer,
       "Support",
       "Procurement"
       --> ,"Next Role" etc.
FROM tsupport a
JOIN tprocurement b
  ON a.customer = b.customer
--> JOIN tnextrole ...
With your dummy data, the result is one row per customer: customer 1 has Support = '123,456' and Procurement = '567'; customer 2 has Support = '123,456,12333,45776' and Procurement = '888,56723'.
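If your database turns out to be SQL Server 2017 or later, a sketch of the same idea using STRING_AGG with conditional aggregation needs no CTE per role (STRING_AGG skips the NULLs produced by the CASE expressions):
SELECT customer,
       STRING_AGG(CASE WHEN role = 'Support' THEN CAST(userid AS varchar(20)) END, ',') AS Support,
       STRING_AGG(CASE WHEN role = 'Procurement' THEN CAST(userid AS varchar(20)) END, ',') AS Procurement
FROM temp
GROUP BY customer;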

update a single column with join lookups

I have a table adjustments with columns adjustable_id | adjustable_type | order_id.
order_id is the target column to fill with values; the value should come from another table line_items, which has an order_id column.
adjustable_id (int) and adjustable_type (varchar) reference that table.
table: adjustments
id | adjustable_id | adjustable_type | order_id
------------------------------------------------
100 | 1 | line_item | NULL
101 | 2 | line_item | NULL
table: line_items
id | order_id | other | columns
--------------------------------
1 | 10 | bla | bla
2 | 20 | bla | bla
In the case above I guess I need a join query to update adjustments.order_id: the first row with value 10, the second row with 20, and so on for the other rows, using Postgres 9.3+.
In case the lookup fails, I need to delete the invalid adjustments rows, i.e. those that have no corresponding line_items row.
There are two ways to do this. The first one uses a correlated sub-query:
update adjustments a
set order_id = (select l.order_id
                from line_items l
                where l.id = a.adjustable_id)
where a.adjustable_type = 'line_item';
This is standard ANSI SQL, as the SQL standard does not define a join condition for the UPDATE statement.
The second way is using a join, which is a Postgres extension to the SQL standard (other DBMS also support that but with different semantics and syntax).
update adjustments a
set order_id = l.order_id
from line_items l
where l.id = a.adjustable_id
and a.adjustable_type = 'line_item';
The join is probably the faster one. Note that both versions will only work reliably if the join between line_items and adjustments always returns exactly one row from the line_items table. If there are multiple matches, the first version fails with an error (a scalar sub-query must not return more than one row), while the second silently uses an arbitrary matching row.
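For the second part of the question, removing adjustments rows where the lookup fails, a minimal sketch that deletes rows with no corresponding line_items row:
delete from adjustments a
where a.adjustable_type = 'line_item'
  and not exists (select 1
                  from line_items l
                  where l.id = a.adjustable_id);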
The reason why Arockia's query was "eating your RAM" is that the query creates a cross join between table1 and table1, which is then joined against table2.
The Postgres manual contains a warning about that:
Note that the target table must not appear in the from_list, unless you intend a self-join
The problematic query repeated the target table in the FROM clause:
update a set A.name=B.name
from table1 A join table2 B on A.id=B.id
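A corrected version following that warning, listing only the other table in the FROM clause (a sketch using the same illustrative names):
update table1 A
set name = B.name
from table2 B
where A.id = B.id;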

Multiple records in a table matched with a column

The architecture of my DB involves records in a Tags table. Each record in the Tags table has a Name string and a foreign key to the PrimaryIDs of records in another Worker table.
Records in the Worker table have tags. Every time we create a Tag for a worker, we add a new row in the Tags table with the inputted Name and foreign key to the worker's PrimaryID. Therefore, we can have multiple Tags with different names per same worker.
Worker Table
ID | Worker Name | Other Information
__________________________________________________________________
1 | Worker1 | ..........................
2 | Worker2 | ..........................
3 | Worker3 | ..........................
4 | Worker4 | ..........................
Tags Table
ID |Foreign Key(WorkerID) | Name
__________________________________________________________________
1 | 1 | foo
2 | 1 | bar
3 | 2 | foo
5 | 3 | foo
6 | 3 | bar
7 | 3 | baz
8 | 1 | qux
My goal is to filter WorkerIDs based on an inputted table of strings. I want to get the set of WorkerIDs that have all of the inputted tags. For example, if the inputted strings are foo and bar, I would like to return WorkerIDs 1 and 3. Any idea how to do this? I was thinking something to do with GROUP BY or JOINing tables. I am new to SQL and can't seem to figure it out.
This is a variant of relational division. Here's one attempt:
select workerid
from tags
where name in ('foo', 'bar')
group by workerid
having count(distinct name) = 2
You can use the following:
select WorkerID
from tags
where name in ('foo', 'bar')
group by WorkerID
having count(*) = 2
and this will retrieve your desired result. Note that count(*) assumes a worker cannot carry the same tag twice; otherwise use count(distinct name) as above.
Regards.
While the answer from @Lennart works fine in Query Analyzer, you're not going to be able to duplicate that in a stored procedure or from a consuming application without opening yourself up to SQL injection attacks. To extend the solution, you'll want to look into passing your list of tags as a table-valued parameter, since SQL Server doesn't support arrays.
Essentially, you create a custom type in the database that mimics a table with only one column:
CREATE TYPE list_of_tags AS TABLE (t varchar(50) NOT NULL PRIMARY KEY)
Then you populate an instance of that type in memory:
DECLARE @mylist list_of_tags
INSERT @mylist (t) VALUES ('foo'), ('bar')
Then you can select against that as a join using the GROUP BY/HAVING described in the previous answers:
select workerid
from tags inner join @mylist on name = t
group by workerid
having count(distinct name) = (select count(*) from @mylist)
*Note: I'm not at a computer where I can test the query. If someone sees a flaw in my query, please let me know and I'll happily correct it and thank them.
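To call this from a consuming application, a hypothetical wrapper procedure (the procedure name is illustrative; table-valued parameters must be passed READONLY):
CREATE PROCEDURE FindWorkersByTags
    @taglist list_of_tags READONLY
AS
BEGIN
    SELECT t.workerid
    FROM tags t
    INNER JOIN @taglist l ON t.name = l.t
    GROUP BY t.workerid
    HAVING COUNT(DISTINCT t.name) = (SELECT COUNT(*) FROM @taglist);
END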

How can I optimize this query?

I have the following query:
SELECT `masters_tp`.*, `masters_cp`.`cp` as cp, `masters_cp`.`punti` as punti
FROM (`masters_tp`)
LEFT JOIN `masters_cp` ON `masters_cp`.`nickname` = `masters_tp`.`nickname`
WHERE `masters_tp`.`stake` = 'report_A'
AND `masters_cp`.`stake` = 'report_A'
ORDER BY `masters_tp`.`tp` DESC, `masters_cp`.`punti` DESC
LIMIT 400;
Is there something wrong with this query that could affect the server memory?
Here is the output of EXPLAIN
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+-------+----------------------------------------------+
| 1 | SIMPLE | masters_cp | ALL | NULL | NULL | NULL | NULL | 8943 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | masters_tp | ALL | NULL | NULL | NULL | NULL | 12693 | Using where |
You can see from your EXPLAIN output that no indexes are being used, and it's having to look at thousands of rows to get your result. Try adding an index on the columns used to perform the join, e.g. nickname and stake:
ALTER TABLE masters_tp ADD INDEX(nickname),ADD INDEX(stake);
ALTER TABLE masters_cp ADD INDEX(nickname),ADD INDEX(stake);
(I've assumed the columns might have duplicated values, if not, use UNIQUE rather than INDEX). See the MySQL manual for more information.
Replace the "masters_tp.* " bit by explicitly naming only the fields from that table you actually need. Even if you need them all, name them all.
There's actually no reason to do a left join here. You're using your filters to whisk away any leftiness of the join. Try this:
SELECT
`masters_tp`.*,
`masters_cp`.`cp` as cp,
`masters_cp`.`punti` as punti
FROM
`masters_tp`
INNER JOIN `masters_cp` ON
`masters_tp`.`stake` = `masters_cp`.`stake`
and `masters_tp`.`nickname` = `masters_cp`.`nickname`
WHERE
`masters_tp`.`stake` = 'report_A'
ORDER BY
`masters_tp`.`tp` DESC,
`masters_cp`.`punti` DESC
LIMIT 400;
Inner joins tend to be faster than left joins, and the query can limit the number of rows that have to be joined using the predicates (aka the WHERE clause). This means the database is potentially handling a lot fewer rows, which obviously speeds things up.
Additionally, make sure you have a non-clustered index on stake and nickname (in that order).
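For example, in MySQL (the index names are illustrative):
ALTER TABLE masters_tp ADD INDEX idx_stake_nickname (stake, nickname);
ALTER TABLE masters_cp ADD INDEX idx_stake_nickname (stake, nickname);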
It is a simple query; I think everything is OK with it. You can try adding indexes on the stake fields or lowering the limit.

Remove rows NOT referenced by a foreign key

I have a table with a primary key, and I have several tables that reference that primary key (using foreign keys). I need to remove rows from that table, where the primary key isn't being referenced in any of those other tables (as well as a few other constraints).
For example:
Group
groupid | groupname
--------+-----------
1       | 'group 1'
2       | 'group 3'
3       | 'group 2'
...     | '...'
Table1
tableid | groupid | data
--------+---------+-----
1       | 3       | ...
...     | ...     | ...
Table2
tableid | groupid | data
--------+---------+-----
1       | 2       | ...
...     | ...     | ...
and so on. Some of the rows in Group aren't referenced in any of the tables, and I need to remove those rows. In addition to this, I need to know how to find all of the tables/rows that reference a given row in Group.
I know that I can just query every table and check the groupid's, but since they are foreign keys, I imagine that there is a better way of doing it.
This is using Postgresql 8.3 by the way.
DELETE
FROM "Group" g
WHERE NOT EXISTS
      (
      SELECT NULL
      FROM table1 t1
      WHERE t1.groupid = g.groupid
      UNION ALL
      SELECT NULL
      FROM table2 t2
      WHERE t2.groupid = g.groupid
      UNION ALL
      …
      )
At the heart of it, SQL servers don't maintain two-way information for constraints, so your only option is to do what the server would do internally if you were to delete the row: check every other referencing table.
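To find which tables reference Group in the first place, you can ask the system catalogs; a sketch that works on PostgreSQL 8.3 (adjust the cast if your table name uses a different case):
SELECT conrelid::regclass AS referencing_table,
       conname            AS constraint_name
FROM   pg_constraint
WHERE  contype = 'f'
AND    confrelid = '"Group"'::regclass;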
If (and be damn sure first) your constraints are simple checks and don't carry any "on delete cascade" type clauses, you can attempt to delete everything from your group table; any row that does delete would thus have had nothing referencing it. Otherwise, you're stuck with Quassnoi's answer.