Table just for grouping - sql

Is it a common case to have a table with a single column for the purpose of grouping rows in another table?
I'm inserting data in batches and I want to have an autoincrement key for each batch to be able to group data based on generated id.
Concretely I want to get from this
A
id, x, y, b_id
id PRIMARY KEY
b_id FOREIGN KEY REFERENCES B.id
B
id, timestamp
id PRIMARY KEY
SELECT count(*) as number, B.timestamp FROM A inner join B on A.b_id=B.id
where A.x='value' and A.y='value'
group by B.id;
to
A
id, x, y, timestamp, b_id
id PRIMARY KEY
b_id FOREIGN KEY REFERENCES B.id
B
id
id PRIMARY KEY
SELECT count(*) as number, A.timestamp FROM A
where A.x='value' and A.y='value'
group by A.b_id, A.timestamp;
So basically move timestamp from B to A (denormalize) and use the foreign key only for grouping. I want to avoid a join whose only purpose is to fetch the timestamp stored in B. The tables are quite big (60M rows) and the join is very slow. If I filter only on A and keep the foreign key just for grouping, that should speed things up a lot.
Concretely, I'm using MySQL.

Denormalization can be acceptable for performance reasons. Just make sure that the performance improvement outweighs the cost of that denormalization. The cost is not only additional space (which can cause its own performance issues) but also the risk of data errors, for example when two rows end up in table "A" with the same b_id but different timestamp values.
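One way to watch for that kind of drift is to check periodically for batches whose rows disagree on the timestamp. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL; the table and column names come from the question, the sample rows are made up, and the SELECT is ordinary SQL that should carry over with little change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE B (id INTEGER PRIMARY KEY);
    CREATE TABLE A (
        id INTEGER PRIMARY KEY,
        x TEXT,
        y TEXT,
        timestamp TEXT,
        b_id INTEGER REFERENCES B(id)
    );
    INSERT INTO B (id) VALUES (1), (2);
    -- Made-up sample data: batch 2 carries two different timestamps.
    INSERT INTO A (x, y, timestamp, b_id) VALUES
        ('value', 'value', '2020-01-01 10:00:00', 1),
        ('value', 'value', '2020-01-01 10:00:00', 1),
        ('value', 'value', '2020-01-02 09:00:00', 2),
        ('value', 'value', '2020-01-02 09:30:00', 2);
""")

# Batches whose rows do not all share one timestamp.
inconsistent_batches = [
    row[0]
    for row in conn.execute("""
        SELECT b_id
        FROM A
        GROUP BY b_id
        HAVING COUNT(DISTINCT timestamp) > 1
        ORDER BY b_id
    """)
]
print(inconsistent_batches)
```

If this list is ever non-empty, the grouping query from the question would split that batch into several result rows.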

Related

Sql 2 primary keys remove rows where 1 primary key is duplicated

I have a SQL table (MariaDB) with 2 primary keys. I want to remove the rows where the first primary key is duplicated. (Yes, I know that primary keys can't be duplicated, but with 2 primary keys they work like a tuple, so it is possible, though in my case not wanted.) Example:
id(pk)  name(pk)  smth  smth else
1       a         1234  qwerty
1       b         4567  asdf
I want to remove the 2nd row because its id is duplicated.
Tried:
almost any delete query with a row count
The query I tried last:
WITH CTE AS
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) AS RN
FROM product_names
)
DELETE FROM CTE WHERE RN<>1
To clarify the definition: you cannot have two primary keys in a table. The primary key of your table is a single key composed of two columns.
To improve your schema, you may want to alter your table so that the primary key is based on the first column only. However, depending on the database engine, it can be useful to keep your composite key. It may speed up queries that can read the second column straight from the primary key. In that case you may want to add a unique constraint on the first column of your primary key.
To clean up your table you can use the query below, but beware that it does not filter on the second column, meaning that which of the rows sharing an id gets deleted depends only on their order.
WITH duplicated AS (
-- number the rows within each id, ordered by name; the first one is kept
SELECT id, name, row_number() OVER (PARTITION BY id ORDER BY name) AS rn
FROM product_names
)
DELETE FROM product_names
WHERE (id, name) IN (SELECT id, name FROM duplicated WHERE rn > 1);
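As a rough check of the row_number idea, here is a small sketch that partitions by id, orders by name, and deletes every row after the first in each partition. It runs against SQLite through Python's sqlite3 module rather than MariaDB, so treat the syntax as an assumption to verify on your own server. The third sample row and the column name smth_else are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_names (
        id INTEGER,
        name TEXT,
        smth INTEGER,
        smth_else TEXT,
        PRIMARY KEY (id, name)
    );
    INSERT INTO product_names VALUES
        (1, 'a', 1234, 'qwerty'),
        (1, 'b', 4567, 'asdf'),
        (2, 'c', 8910, 'zxcv');  -- hypothetical extra row with a unique id
""")

# Number the rows inside each id (ordered by name) and delete all but the first.
conn.execute("""
    WITH duplicated AS (
        SELECT id, name,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY name) AS rn
        FROM product_names
    )
    DELETE FROM product_names
    WHERE (id, name) IN (SELECT id, name FROM duplicated WHERE rn > 1)
""")

remaining = conn.execute(
    "SELECT id, name FROM product_names ORDER BY id, name"
).fetchall()
print(remaining)
```

Ordering by name inside the window makes the surviving row deterministic: the one with the smallest name per id is kept.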

Reading an SQL record and all descendants

What would be the most efficient way to selectively read records from a SQL table, along with all their descendants - i.e., records in other tables with different schemas but of the same database, referencing the starting table via foreign keys?
For example, the following 4 tables have a primary key called id and records in Table A are referenced from the other 3 tables via the column x_id, where x = B, C, or D and x_id is the foreign key pointing to a record of Table X with primary key id.
Table A: id
Table B: id, a_id
Table C: id, a_id, b_id
Table D: id, a_id, c_id
Assuming that somehow the application knows the "graph" of table foreign key dependencies and thus can enumerate all "children" of a record (i.e., all records from other tables directly referencing it via its id), the data can be retrieved via multiple SELECT queries. That, however, becomes very slow for large tables.
Is there any more efficient way of getting the same information?

Non-unique foreign key Oracle?

I have two tables, data/model is fake for simplicity purposes:
Table A:
Order ID Delivered
1 Y
2 N
3 Y
And
Table B:
Order ID Customer ID
1 123
1 234
1 455
2 789
Order ID is a primary key on Table A, and I want to use it as a Foreign Key on Table B.
Is this acceptable, given that Order ID on Table B is not unique?
Please ignore any normalisation/structural issues, my question is simply whether you can have a non-unique foreign key, I just thought the illustration would help..
Thanks,
Dearg
Is this acceptable, given that Order ID on Table B is not unique?
Yes, absolutely. This is the standard way of modeling a 1:many relationship.
You should nevertheless find a primary key for TableB. If a customer cannot be assigned to the same order more than once, then using (order_id, customer_id) as the PK would make sense.
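A minimal sketch of that layout, using SQLite via Python's sqlite3 module instead of Oracle (the table names table_a and table_b and the column spellings are assumptions based on the question's sample data): the foreign key column repeats freely, an order that does not exist in the parent table is rejected, and the composite primary key blocks an exact duplicate pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE table_a (
        order_id INTEGER PRIMARY KEY,
        delivered TEXT NOT NULL
    );
    CREATE TABLE table_b (
        order_id INTEGER NOT NULL REFERENCES table_a(order_id),
        customer_id INTEGER NOT NULL,
        PRIMARY KEY (order_id, customer_id)
    );
    INSERT INTO table_a VALUES (1, 'Y'), (2, 'N'), (3, 'Y');
    -- order_id 1 appears three times: a non-unique foreign key is fine.
    INSERT INTO table_b VALUES (1, 123), (1, 234), (1, 455), (2, 789);
""")

rows_for_order_1 = conn.execute(
    "SELECT COUNT(*) FROM table_b WHERE order_id = 1"
).fetchone()[0]

def is_rejected(order_id, customer_id):
    """Return True if inserting the pair violates a constraint."""
    try:
        conn.execute("INSERT INTO table_b VALUES (?, ?)", (order_id, customer_id))
        return False
    except sqlite3.IntegrityError:
        return True

orphan_rejected = is_rejected(99, 111)    # no order 99 in table_a
duplicate_rejected = is_rejected(1, 123)  # pair already present
print(rows_for_order_1, orphan_rejected, duplicate_rejected)
```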

Inner join performance

I have a table that has a lot of foreign keys that I will need to inner join so I can search. There might be upwards of 10 of them, meaning I'd have to do 10 inner joins. Each of the tables being joined may only be a few rows, compared to the massive (millions of rows) table that I am joining them with.
I just need to know if the joins are a fast way (using only Postgres) to do this, or if there might be a more clever way I can do it using subqueries or something.
Here is some made up data as an example:
create table a (
a_id serial primary key,
name character varying(32)
);
create table b (
b_id serial primary key,
name character varying(32)
);
--just 2 tables for simplicity, but i really need like 10 or more
create table big_table (
big_id serial primary key,
a_id int references a(a_id),
b_id int references b(b_id)
);
--filter big_table based on the name column of a and b
--big_table only contains fks to a and b, so in this example im using
--left joins so i can compare by the name column
select big_id,a.name,b.name from big_table
left join a using (a_id)
left join b using (b_id)
where (? is null or a.name=?) and (? is null or b.name=?);
Basically, joins are a fast way. Which way might be the fastest depends on the exact requirements. A couple of hints:
The purpose of your WHERE clause is unclear. It seems you intend to join to all look-up tables and include a condition for each, while you only actually need some of them. That's inefficient. Rather, use dynamic SQL and only include in the query the joins and conditions you actually need.
With the current query, since all of your fk columns in the main table can be NULL, you must use LEFT JOIN instead of JOIN or you will exclude rows with NULL values in the fk columns.
The name columns in the look-up tables should certainly be defined NOT NULL. And I would not use the non-descriptive column name "name", that's an unhelpful naming convention. I also would use text instead of varchar(32).
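The dynamic SQL hint could look roughly like the sketch below. It is written in Python with the built-in sqlite3 module standing in for Postgres; build_query and search are hypothetical helper names and the sample rows are made up. Each look-up table is joined only when its filter is supplied, and values are still passed as bound parameters. Because a filtered look-up table must match anyway, a plain JOIN is enough there. The sketch returns only big_id; if the names are needed in the output, the corresponding joins would have to stay.

```python
import sqlite3

def build_query(a_name=None, b_name=None):
    """Return (sql, params), joining a look-up table only when it is filtered on."""
    joins, conditions, params = [], [], []
    if a_name is not None:
        joins.append("JOIN a USING (a_id)")
        conditions.append("a.name = ?")
        params.append(a_name)
    if b_name is not None:
        joins.append("JOIN b USING (b_id)")
        conditions.append("b.name = ?")
        params.append(b_name)
    sql = "SELECT big_id FROM big_table " + " ".join(joins)
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql + " ORDER BY big_id", params

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (a_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE b (b_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE big_table (
        big_id INTEGER PRIMARY KEY,
        a_id INTEGER REFERENCES a(a_id),
        b_id INTEGER REFERENCES b(b_id)
    );
    -- Made-up sample data, including NULL foreign keys.
    INSERT INTO a VALUES (1, 'red'), (2, 'blue');
    INSERT INTO b VALUES (1, 'small'), (2, 'large');
    INSERT INTO big_table VALUES (1, 1, 1), (2, 1, 2), (3, 2, NULL), (4, NULL, 2);
""")

def search(a_name=None, b_name=None):
    sql, params = build_query(a_name, b_name)
    return [row[0] for row in conn.execute(sql, params)]

print(search())                               # no filter, no join at all
print(search(a_name="red"))                   # joins only table a
print(search(a_name="red", b_name="large"))   # joins both
```

With no filters the query touches big_table alone, so rows with NULL foreign keys are still returned without needing any LEFT JOIN.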

Constraints and optimal table design

Here's my design as is
I want a constraint that will ensure only (at most) one result of
select ID
from A a, B b
where a.ID = b.PartialKey_Ref_A
and a.PartCandidateB = 'valueA'
and b.PartialKeyB = 'valueB'
Incidentally (perhaps changes the optimal design) I want at most one result from
select ID
from A
where PartCandidateA = 'valueA2'
and PartCandidateB = 'valueB2'
How can I enforce the constraint and optimize the design?
I assume that where you write Key, you mean Unique or Primary Key. And that ID means a surrogate (auto-generated) identifier. With these assumptions, the two tables are in 1:n relationship and you could change them into:
Table A
-------
PartCandidateA
PartCandidateB
ID
PRIMARY KEY (ID)
UNIQUE KEY (PartCandidateA, PartCandidateB) --- or PRIMARY if you drop the ID
--- this is your second constraint
Table B
-------
PartCandidateA
PartCandidateB
PartialKeyB
PRIMARY KEY (PartCandidateB, PartialKeyB) --- or UNIQUE
--- this is your first constraint
FOREIGN KEY (PartCandidateA, PartCandidateB)
REFERENCES A (PartCandidateA, PartCandidateB)
So, your query to find the ID will be written as:
SELECT ID
FROM A a, B b
WHERE a.PartCandidateA = b.PartCandidateA
AND a.PartCandidateB = b.PartCandidateB
AND b.PartCandidateB = 'valueA'
AND b.PartialKeyB = 'valueB'
I think I need to add the PartCandidateB column to Table B.
Then I can add the unique constraint on (PartialKeyB,PartCandidateB).
This will increase the DB by sizeof(PartCandidateB)*rows in TableB.
But the constraint will be enforced:)
I don't think this introduces any problems other than the size increase thing
Thanks to everyone
You can simply create a unique constraint on those two columns by creating a unique index over them:
create unique index ind1 on tablea(PartCandidateA, PartCandidateB);
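To see the effect of such an index, here is a small sketch using SQLite through Python's sqlite3 module (the column types and sample values are assumptions, since the question does not give them). A second row with the same (PartCandidateA, PartCandidateB) pair is rejected, while a row that differs in one of the two columns is accepted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablea (
        ID INTEGER PRIMARY KEY,
        PartCandidateA TEXT NOT NULL,
        PartCandidateB TEXT NOT NULL
    );
    CREATE UNIQUE INDEX ind1 ON tablea (PartCandidateA, PartCandidateB);
    INSERT INTO tablea (PartCandidateA, PartCandidateB)
    VALUES ('valueA2', 'valueB2');
""")

insert_sql = "INSERT INTO tablea (PartCandidateA, PartCandidateB) VALUES (?, ?)"

try:
    # Same pair as the existing row: the unique index refuses it.
    conn.execute(insert_sql, ("valueA2", "valueB2"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Differs in the second column, so it is allowed.
conn.execute(insert_sql, ("valueA2", "other"))
row_count = conn.execute("SELECT COUNT(*) FROM tablea").fetchone()[0]
print(duplicate_rejected, row_count)
```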