Sorting child elements after their parent element - sql

I'm trying to implement a category table.
A simplified table description looks like this:
id -- name -- parent_id
Assuming sample data like:
id - name - parent_id
1 test1 null
2 test2 null
3 test3 null
4 test4 1
5 test5 4
6 test6 2
7 test7 1
I'm struggling to come up with an SQL query that will return the record set in the following order:
id - name - parent_id
1 test1 null
4 test4 1
5 test5 4
7 test7 1
2 test2 null
6 test6 2
3 test3 null
Basically the child elements are returned after their parent element.
----------------------- SOLVED BY USING LINQ/recursion in code -------------------------
Not exactly an SQL solution, but ultimately it works.

Based on what you are trying to do with the query, you don't need to sort it that way; you just need to ensure that the parents are created first. So run your query sorted by parent ID, put the result into an array, and loop over that array. On each iteration, check that the row's parent already exists (if it has one). If the parent doesn't exist yet, move that item to the end of the array and go on to the next one. You should only end up with a few cases that get moved, so it remains reasonably efficient.

What I have always done in the past is split the database up into the following tables (I'm not the best at SQL, though, so there may be other solutions for you):
categories
- category_id | int(11) | Primary Key / Auto_Increment
..
..
sub_categories
- sub_category_id | int(11) | Primary Key / Auto_Increment
- category_id | int(11) | Foreign Key to categories table
..
..

Here is what I would do:
SELECT id, name, parent_id,
       (CASE WHEN COALESCE(parent_id, 0) = 0 THEN CAST(id AS varchar(20))
             ELSE CAST(parent_id AS varchar(20)) + '.' + CAST(id AS varchar(20))
        END) AS orderid
FROM myTable
ORDER BY (CASE WHEN COALESCE(parent_id, 0) = 0 THEN CAST(id AS varchar(20))
               ELSE CAST(parent_id AS varchar(20)) + '.' + CAST(id AS varchar(20))
          END)
This should create a new column called orderid that contains the parent id, a dot, and the id (1.4, 4.5, etc.). For rows where the parent id is null, it contains just the id. This way you would get the order 1, 1.4, 4, 4.5, etc.
Please check the code since I wrote this on the fly without testing. It should be close.

The query below works by adding an extra order_parent column that contains either the parent id or the row's own id, depending on whether the row is a parent. It then sorts primarily by order_parent to group each family together, and then by parent_id so that the nulls (the actual parents) come first.
Two things:
This has one more column than you originally wanted, so just ignore it (or see the variation after the query below).
In case your database returns the nulls of parent_id last, add a DESC.
Good question, by the way!
SELECT id,
       name,
       parent_id,
       (CASE
            WHEN parent_id IS NULL THEN id
            ELSE parent_id
        END) AS order_parent
FROM myTable
ORDER BY order_parent, parent_id
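If you'd rather not return the extra column at all, the same CASE expression can usually be moved straight into the ORDER BY clause (a sketch of the same idea, untested, using the same myTable name as above):
SELECT id,
       name,
       parent_id
FROM myTable
ORDER BY (CASE WHEN parent_id IS NULL THEN id ELSE parent_id END),
         parent_id
Most databases accept expressions in ORDER BY that are not part of the select list, so the result order is the same as with the order_parent column.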

Related

SQL query for category hierarchy validations

I need to add validation on category creation.
CASE 1: parentId should be valid if supplied
CASE 2: a sibling's name must not be duplicated
I have this table categories:
id | parentId | name
-----|-----------|------
1 | NULL | CatA
2 | 1 | CatA.1
(Note: my parent-child hierarchy can go up to the nth level.)
Now, in the above scenario, what should not be allowed is:
I cannot provide an invalid parentId
I cannot create a category with name CatA where parentId is null
I cannot create a category with name CatA.1 where parentId = 1
I am working in Node.js, so I need to return these 2 validation errors:
The provided parentId is invalid
Duplicate name detected
Now I want to achieve this using a single optimized SQL query.
I can use if/else statements later based on the query response.
But it is really important to me that I use a single query, and that query should be as optimized as possible.
What I tried so far is:
SELECT TOP 1
       parentId,
       name,
       (CASE
            WHEN name = 'CatA.2' THEN 1
            ELSE 0
        END) AS sortOrder
FROM categories
WHERE parentId = 1
ORDER BY sortOrder DESC
Now the issue with my query is that it doesn't cover all the scenarios.
Can anyone help me with the query?
The problem with a single query is that you have two cases with different validation needs:
The provided parentId is null
The provided parentId is not null
Isn't it easier to write two queries and call the correct one from Node.js? (Both are sketched below.)
Query 1: Select rows where parentId is null and name matches the passed name. If it doesn't return any rows then all is OK; otherwise return an error. Note that the condition should be parentId IS NULL and not parentId = NULL.
Query 2: The query you wrote. If it returns no rows, then parentId is invalid. If it returns a row with sortOrder = 1, you have a duplicate; otherwise all is well.
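A minimal sketch of those two queries, assuming the table and column names from the question (categories, id, parentId, name) and @name / @parentId as hypothetical placeholders for the values passed in from Node.js (the exact parameter syntax depends on your driver):
-- Query 1: parentId is null, so only the sibling-name check is needed
SELECT id
FROM categories
WHERE parentId IS NULL
  AND name = @name;
-- any row returned means "Duplicate name detected"

-- Query 2: parentId supplied, so check parent existence and duplicate name in one pass
SELECT TOP 1
       parentId,
       name,
       CASE WHEN name = @name THEN 1 ELSE 0 END AS sortOrder
FROM categories
WHERE parentId = @parentId
ORDER BY sortOrder DESC;
-- no row: "The provided parentId is invalid"; sortOrder = 1: "Duplicate name detected"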

How to migrate data using a .sql script?

I'm struggling with how to migrate data using a .sql script. I'm quite new to SQL and trying to figure out how to do the migration purely in SQL. I want to add my old data to the new table as new records with a different structure.
Here's my case: I have two old tables and I want to merge them into my new structured table with additional columns. I'm somewhat stuck here since I'm not used to using conditionals in SQL.
The table prefixes are schema names.
Old tables
old.groups
id | group_name
---|-----------
10 | Apex
11 | Pred
12 | Tor
old.sub_groups
parent_id | sub_group
----------|-------------
10 | sub-apex
11 | sub-pred
11 | sub-sub-pred
New Table:
Expected Migrated Data
public.new_groups (*id is auto incremented)
Fresh new populated table
id | group_name | level | parent_id
---|--------------|-------|----------
0 | Apex | 1 | 10
1 | Pred | 1 | 11
2 | Tor | null | null
3 | sub-apex | 2 | 10
4 | sub-pred | 2 | 11
5 | sub-sub-pred | 2 | 11
I want to merge them with conditions, but I can't keep up with the SQL queries.
Condition 1: If old.groups.id has no match in old.sub_groups.parent_id, it is inserted into public.new_groups, with public.new_groups.level and public.new_groups.parent_id defaulting to null.
Condition 2: If old.groups.id has a match in old.sub_groups.parent_id, it is also inserted into public.new_groups with level tagged as 1 (1 means parent group in my structure), together with three new records for the sub_groups it matched (new_groups ids 3, 4 and 5), which are tagged with level 2. Their parent_id is the parent_id of old.sub_groups, i.e. the id of the parent in old.groups.
This is my unfinished query. I'm only able to select the data; it's missing the conditional logic and the update, and I think it is also wrong:
INSERT INTO public.new_groups(
SELECT *, b.sub_group as group_name, b.parent_id FROM old.groups as a
LEFT JOIN old.sub_groups as b ON a.id = b.parent_id....
)
Assuming you created your table like this:
CREATE TABLE new (
    id SERIAL PRIMARY KEY,
    group_name VARCHAR(20),
    level INTEGER,
    parent_id INTEGER
);
You can copy the tables with this statement:
INSERT INTO new(group_name, level, parent_id)
SELECT DISTINCT
group_name,
CASE WHEN subgroups.parent_id IS NULL THEN NULL ELSE 1 END as level,
subgroups.parent_id
FROM old
LEFT JOIN subgroups ON old.id = subgroups.parent_id
UNION ALL
SELECT
sub_group,
2,
parent_id
FROM subgroups;
see: DBFIDDLE
The only difference is that my id starts at 1 rather than 0.
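If you keep the schema-qualified names from the question instead of the shortened fiddle names, the same statement would look roughly like this (a sketch only, untested, assuming PostgreSQL and the columns shown in the old tables above):
INSERT INTO public.new_groups (group_name, level, parent_id)
SELECT DISTINCT
       g.group_name,
       CASE WHEN sg.parent_id IS NULL THEN NULL ELSE 1 END AS level,
       sg.parent_id
FROM old.groups AS g
LEFT JOIN old.sub_groups AS sg ON g.id = sg.parent_id
UNION ALL
SELECT sg.sub_group,
       2,
       sg.parent_id
FROM old.sub_groups AS sg;
The DISTINCT only applies to the first SELECT, so a parent with several sub_groups (Pred, for example) is still inserted once.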

Recursively duplicating entries

I am attempting to duplicate an entry. That part isn't hard. The tricky part is: there are n entries connected with a foreign key. And for each of those entries, there are n entries connected to that. I did it manually using a lookup to duplicate and cross reference the foreign keys.
Is there some subroutine or method to duplicate an entry and then find and duplicate the entries that reference it? Perhaps there is a name for this type of replication I haven't stumbled on yet; is there a specific database-related term for this type of operation?
PostgreSQL 8.4.13
main entry (uid is serial)
uid | title
-----+-------
1 | stuff
department (departmentid is serial, uidref is foreign key for uid above)
departmentid | uidref | title
--------------+--------+-------
100 | 1 | Foo
101 | 1 | Bar
sub_category of department (textid is serial, departmentref is foreign for departmentid above)
textid | departmentref | title
-------+---------------+----------------
1000 | 100 | Text for Foo 1
1001 | 100 | Text for Foo 2
1002 | 101 | Text for Bar 1
You can do it all in a single statement using data-modifying CTEs (requires Postgres 9.1 or later).
Your primary keys being serial columns makes it easier:
WITH m AS (
INSERT INTO main (<all columns except pk>)
SELECT <all columns except pk>
FROM main
WHERE uid = 1
RETURNING uid AS uidref -- returns new uid
)
, d AS (
INSERT INTO department (<all columns except pk>)
SELECT <all columns except pk>
FROM m
JOIN department d USING (uidref)
RETURNING departmentid AS departmentref -- returns new departmentids
)
INSERT INTO sub_category (<all columns except pk>)
SELECT <all columns except pk>
FROM d
JOIN sub_category s USING (departmentref);
Replace <all columns except pk> with your actual columns. pk is for primary key, like main.uid.
As written, the query returns nothing. It could return pretty much anything via a RETURNING clause on the final INSERT; you just didn't specify anything.
You wouldn't call that "replication". That term is usually applied to keeping multiple database instances or objects in sync. You are just duplicating an entry - and its dependent objects, recursively.
Aside about naming conventions:
It would get even simpler with a naming convention that labels all columns signifying "ID of table foo" with the same (descriptive) name, like foo_id. There are other naming conventions floating around, but this is the best for writing queries, IMO.

Auto-incrementing field that depends on another field [duplicate]

This question already has answers here:
Is it possible to use a PG sequence on a per record label?
(4 answers)
I have two models, A and B. A has many B. Originally, both A and B had an auto-incrementing primary key field called id, and B had an a_id field. Now I have found myself needing a unique sequence of numbers for each B within an A. I was keeping track of this within my application, but then I thought it might make more sense to let the database take care of it. I thought I could give B a compound key where the first component is a_id and the second component auto-increments, taking into consideration the a_id. So if I insert two records with a_id 1 and one with a_id 2 then I will have something like:
a_id | other_id
1 | 1
1 | 2
2 | 1
If ids with lower numbers are deleted, then the sequence should not recycle these numbers. So if (1, 2) gets deleted:
a_id | other_id
1 | 1
2 | 1
When the next record with a_id 1 is added, the table will look like:
a_id | other_id
1 | 1
2 | 1
1 | 3
How can I do this in SQL? Are there reasons not to do something like this?
I am using in-memory H2 (testing and development) and PostgreSQL 9.3 (production).
The answer to your question is that you would need a trigger to get this functionality (a sketch of one is shown after the view below). However, you could just create a view that uses the row_number() function:
create view v_table as
    select t.*,
           row_number() over (partition by a_id order by id) as seqnum
    from b t;
Here I am calling the table b and its primary key id, matching the names in your question.
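For completeness, here is a sketch of the trigger approach mentioned above (PostgreSQL only, untested; it assumes the tables are called a and b, that a.id and b.a_id are as described in the question, and it adds a hypothetical counter column a.last_other_id). Because the counter only ever increases, deleted numbers are never recycled:
ALTER TABLE a ADD COLUMN last_other_id integer NOT NULL DEFAULT 0;

CREATE OR REPLACE FUNCTION set_other_id() RETURNS trigger AS $$
BEGIN
    -- Bump the per-A counter and take the new value.
    -- The UPDATE also locks the parent row, so concurrent inserts
    -- for the same a_id cannot produce the same other_id.
    UPDATE a
       SET last_other_id = last_other_id + 1
     WHERE id = NEW.a_id
    RETURNING last_other_id INTO NEW.other_id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER b_set_other_id
BEFORE INSERT ON b
FOR EACH ROW EXECUTE PROCEDURE set_other_id();
Since H2 does not speak PL/pgSQL, this only covers the PostgreSQL side; for a mixed H2/PostgreSQL setup, the view (or application code) may be the more practical option.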

Update table row with certain id while deleting the recurrent row

I have 2 tables.
Table name: Attributes
attribute_id | attribute_name
-------------|---------------
1 | attr_name_1
2 | attr_name_2
3 | attr_name_1
4 | attr_name_2
Table name: Products
product_id | product_name | attribute_id
-----------|--------------|-------------
1 | prod_name_1 | 1
2 | prod_name_2 | 2
3 | prod_name_3 | 3
4 | prod_name_4 | 4
As you can see, attribute_id in the Products table contains the ids (1,2,3,4) instead of (1,2,1,2).
The problem is in the Attributes table: there are repeating values (attribute_names) with different ids, so I want to:
Pick one id of each repeated name from the Attributes table
Update the Products table with that "picked" id (only where attribute_id points to a repeated name in the Attributes table)
After that, delete the repeated values from the Attributes table which are no longer used in the Products table
Output:
Table name: Attributes
attribute_id | attribute_name
-------------|---------------
1 | attr_name_1
2 | attr_name_2
Table name: Products
product_id | product_name | attribute_id
-----------|--------------|-------------
1 | prod_name_1 | 1
2 | prod_name_2 | 2
3 | prod_name_3 | 1
4 | prod_name_4 | 2
Demo on SQLFiddle
Note:
It will help me a lot if I can use SQL instead of fixing this issue manually.
UPDATE Products
SET attribute_id = (
    SELECT MIN(attribute_id)
    FROM Attributes a
    WHERE a.attribute_name = (
        SELECT attribute_name
        FROM Attributes a2
        WHERE a2.attribute_id = Products.attribute_id
    )
);
DELETE
FROM Attributes
WHERE attribute_id NOT IN
(
SELECT MIN(attribute_id)
FROM Attributes
GROUP BY attribute_name
);
The following may be faster than Alexander Sigachov's suggestion, but it requires at least SQL Server 2005, while Alexander's solution would work on any (reasonable) version of SQL Server. Still, even if only for the sake of providing an alternative, here you go:
WITH Min_IDs AS (
SELECT
attribute_id,
min_attribute_id = MIN(attribute_id) OVER (PARTITION BY attribute_name)
FROM Attributes
)
UPDATE p
SET p.attribute_id = a.min_attribute_id
FROM Products p
JOIN Min_IDs a ON a.attribute_id = p.attribute_id
WHERE a.attribute_id <> a.min_attribute_id
;
DELETE FROM Attributes
WHERE attribute_id NOT IN (
SELECT attribute_id
FROM Products
WHERE attribute_id IS NOT NULL
)
;
The first statement's CTE returns a row set where every attribute_id is mapped to the minimum attribute_id for the same attribute_name. By joining to this mapping set, the UPDATE statement uses it to replace attribute_ids in the Products table.
When subsequently deleting from Attributes, it is enough to check whether Attributes.attribute_id is not found in the Products.attribute_id column, which is what the second statement does. That is to say, grouping and aggregation, as in the other answer, are not needed at this point.
The WHERE attribute_id IS NOT NULL condition is added to the second query's subquery in case the column is nullable and may indeed contain NULLs. NULLs need to be filtered out in that case, or their presence would make the NOT IN predicate evaluate to UNKNOWN, which SQL Server treats the same as FALSE (so no rows would effectively be deleted). If there cannot be NULLs in Products.attribute_id, the condition may be dropped.
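As an aside, the same delete can be written with NOT EXISTS, which is not affected by NULLs in Products.attribute_id (a sketch, equivalent to the second statement above):
DELETE FROM Attributes
WHERE NOT EXISTS (
    SELECT 1
    FROM Products p
    WHERE p.attribute_id = Attributes.attribute_id
);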