Inserting data into multiple tables in Postgres - sql

I currently have a MongoDB database with the following schema:
Image: { name: String, src: String, category: String, tags: [String] }
I'd like to migrate this to Postgres, and for that I'd have four tables:
image (id, src, name, category_id)
tag (id, name)
image_tag (image_id, tag_id)
category (id, name)
New tags may appear on every image insert, so when using a CTE I need to select all the tags and insert only the ones that don't exist yet. I was thinking about using a cache (Redis) to store the already inserted tags, so I don't need to select them from the database.
So my question is: should I go with a CTE using INSERT INTO tag ... WHERE NOT EXISTS statements, or with a CTE plus Redis, inserting a tag only when it is not found in the cache?

Here is a single statement that inserts an image with a category and multiple tags into the tables of a Postgres database. It assumes that the name column in the tables category and tag has a unique constraint. For completeness I also created a statement without that constraint (see the Example section).
Postgres statement
WITH image_values(image_name, src, category) AS (
VALUES
('Goldkraut', 'goldkraut.jpg', 'logo')
),
tag_values(tag_name) AS (
VALUES
('music'), ('band')
),
category_select AS (
SELECT id, name FROM category
WHERE name IN (SELECT category FROM image_values)
),
category_insert AS (
INSERT INTO category(name)
SELECT category FROM image_values
ON CONFLICT (name) DO NOTHING
RETURNING id, name
),
category_created AS (
SELECT id, name FROM category_select
UNION ALL
SELECT id, name FROM category_insert
),
tag_select AS (
SELECT id, name FROM tag
WHERE name IN (SELECT tag_name FROM tag_values)
),
tag_insert AS (
INSERT INTO tag(name)
SELECT tag_name FROM tag_values
ON CONFLICT (name) DO NOTHING
RETURNING id, name
),
tag_created AS (
SELECT id, name FROM tag_select
UNION ALL
SELECT id, name FROM tag_insert
),
image_insert AS (
INSERT INTO image(src, name, category_id)
SELECT src, image_name, category_created.id
FROM image_values
LEFT JOIN category_created ON(image_values.category=category_created.name)
RETURNING id, src, name, category_id
),
image_tag_insert AS (
INSERT INTO image_tag(image_id, tag_id)
SELECT image_insert.id, tag_created.id FROM image_insert
CROSS JOIN tag_created
RETURNING image_id, tag_id
)
SELECT image_insert.*, category_created.name as category_name, image_tag_insert.*, tag_created.name as "tag.name"
FROM image_tag_insert
LEFT JOIN image_insert ON (image_id = image_insert.id)
LEFT JOIN category_created ON (category_created.id = image_insert.category_id)
LEFT JOIN tag_created ON (tag_created.id = tag_id)
Explanation of the statement
In the first common table expression (CTE), image_values, you define all values that belong to the image in a 1:1 relation. The next expression, tag_values, defines all tag names for that image.
Now let's start with the categories. To know whether a category with the given name already exists, category_select queries for it. The expression category_insert creates a new entry if the category does not exist yet: ON CONFLICT (name) DO NOTHING skips a name that is already present, so RETURNING yields only newly inserted rows. To store the category id in the image table we need the category entry, whether it already existed (from category_select) or was just inserted (from category_insert), so we union these two expressions in category_created.
The tags follow the same pattern: query for existing tags in tag_select, insert missing tags in tag_insert, and union both sets of entries in tag_created.
Next we insert the image in image_insert. We select the values from the expression image_values and join the expression category_created to get the id of the category. To insert the image-to-tag relation we need the id of the inserted image, so we return this value. The other returned values are not strictly necessary, but we use them to get a nicer result set in the final query.
Now we have the primary key of the inserted image and can store the associations of the image to the tags. In the expression image_tag_insert we select the id of the inserted image and cross join it with every tag id we selected or inserted.
For the final statement, SELECT * FROM image_tag_insert would be enough to execute all the expressions. But for an overview of what was stored in the database I joined all the relations. The result looks like this:
Joined result
| id | src | name | category_id | category_name | image_id | tag_id | tag.name |
|----|---------------|-----------|-------------|---------------|----------|--------|----------|
| 1 | goldkraut.jpg | Goldkraut | 2 | logo | 1 | 3 | band |
| 1 | goldkraut.jpg | Goldkraut | 2 | logo | 1 | 1 | music |
Example
On this sqlfiddle you can see the given query in action. In another sqlfiddle I have added some extras to the last statement to format all inserted tags as a list. If you have not added a unique constraint to the name column in the tables tag and category, you can use this example.
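The get-or-create idea behind the statement can also be tried out locally. The following is a minimal sketch in Python's sqlite3, used here only as a stand-in: SQLite has no data-modifying CTEs, so the single Postgres statement is split into separate statements inside one transaction, and INSERT OR IGNORE plays the role of ON CONFLICT (name) DO NOTHING. The function name insert_image and the in-memory schema are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE tag      (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE image    (id INTEGER PRIMARY KEY, src TEXT, name TEXT,
                       category_id INTEGER REFERENCES category(id));
CREATE TABLE image_tag (image_id INTEGER REFERENCES image(id),
                        tag_id   INTEGER REFERENCES tag(id));
INSERT INTO tag(name) VALUES ('music');  -- a tag that already exists
""")


def insert_image(conn, name, src, category, tags):
    """Insert one image, creating its category and any missing tags."""
    with conn:  # commit on success, roll back on error
        # Get-or-create: the UNIQUE constraint turns existing names into no-ops.
        conn.execute("INSERT OR IGNORE INTO category(name) VALUES (?)", (category,))
        conn.executemany("INSERT OR IGNORE INTO tag(name) VALUES (?)",
                         [(t,) for t in tags])
        # Look up the category id while inserting the image.
        cur = conn.execute(
            "INSERT INTO image(src, name, category_id) "
            "SELECT ?, ?, id FROM category WHERE name = ?",
            (src, name, category))
        image_id = cur.lastrowid
        # Link the new image to every requested tag, old or new.
        marks = ", ".join("?" for _ in tags)
        conn.execute(
            "INSERT INTO image_tag(image_id, tag_id) "
            f"SELECT ?, id FROM tag WHERE name IN ({marks})",
            (image_id, *tags))
    return image_id


image_id = insert_image(conn, "Goldkraut", "goldkraut.jpg", "logo",
                        ["music", "band"])
print(conn.execute("SELECT name FROM tag ORDER BY name").fetchall())
```

Because the database enforces uniqueness, no external cache is needed to decide whether a tag already exists in this sketch.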

Related

Combine multiple rows with different column values into a single one

I'm trying to create a single row from multiple rows, combining them based on different column values. Here is the result I reached with the following query:
select distinct ID, case info when 'name' then value end as 'NAME', case info when 'id' then value end as 'serial'
FROM TABLENAME t
WHERE info = 'name' or info = 'id'
However, the expected result should be something along the lines of
I tried with group by clauses but that doesn't seem to work.
The RDBMS is Microsoft SQL Server.
Thanks
SELECT X.ID, MAX(X.NAME) AS NAME, MAX(X.SERIAL) AS SERIAL FROM
(
SELECT 100 AS ID, NULL AS NAME, '24B6-97F3' AS SERIAL UNION ALL
SELECT 100, 'A', NULL UNION ALL
SELECT 200, NULL, '8113-B600' UNION ALL
SELECT 200, 'B', NULL
) X
GROUP BY X.ID
For me GROUP BY works
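To see the GROUP BY approach applied to the key/value layout from the question rather than to hard-coded rows, here is a small sketch in Python's sqlite3 (a stand-in for SQL Server). The table name tablename and the sample values are assumptions taken from the snippets in this thread. MAX ignores NULLs, so each CASE column collapses to the one non-NULL value per ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tablename (id INTEGER, info TEXT, value TEXT);
INSERT INTO tablename VALUES
  (100, 'name', 'A'), (100, 'id', '24B6-97F3'),
  (200, 'name', 'B'), (200, 'id', '8113-B600');
""")

# Conditional aggregation: one CASE per target column, one row per id.
rows = conn.execute("""
SELECT id,
       MAX(CASE WHEN info = 'name' THEN value END) AS name,
       MAX(CASE WHEN info = 'id'   THEN value END) AS serial
FROM tablename
WHERE info IN ('name', 'id')
GROUP BY id
ORDER BY id
""").fetchall()

print(rows)  # [(100, 'A', '24B6-97F3'), (200, 'B', '8113-B600')]
```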
A simple PIVOT operator can achieve this for dynamic results:
SELECT *
FROM
(
SELECT id AS id_column, info, value
FROM tablename
) src
PIVOT
(
MAX(value) FOR info IN ([name], [id])
) piv
ORDER BY id_column ASC;
Result:
| id_column | name | id |
|-----------|------|------------|
| 100 | a | 24b6-97f3 |
| 200 | b | 8113-b600 |
Fiddle here.
I'm a fan of a self join for things like this
SELECT tName.ID, tName.Value AS Name, tSerial.Value AS Serial
FROM TableName AS tName
INNER JOIN TableName AS tSerial ON tSerial.ID = tName.ID AND tSerial.Info = 'id' -- the question stores the serial under info = 'id'
WHERE tName.Info = 'Name'
This initially selects only the name rows, then self-joins on the same IDs and filters the joined side to the serial rows. You may want to change the INNER JOIN to a LEFT JOIN if not every ID has both a name and a serial and you want to know which names don't have a serial.
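The difference between the two join types can be checked with a small sketch in Python's sqlite3 (a stand-in for SQL Server). It assumes the serial is stored under info = 'id' as in the question, and it adds a hypothetical ID 300 that has a name but no serial.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tablename (id INTEGER, info TEXT, value TEXT);
INSERT INTO tablename VALUES
  (100, 'name', 'A'), (100, 'id', '24B6-97F3'),
  (200, 'name', 'B'), (200, 'id', '8113-B600'),
  (300, 'name', 'C');  -- hypothetical row: a name without a serial
""")

# Same self join, with only the join type swapped in.
query = """
SELECT n.id, n.value AS name, s.value AS serial
FROM tablename AS n
{join} JOIN tablename AS s ON s.id = n.id AND s.info = 'id'
WHERE n.info = 'name'
ORDER BY n.id
"""

inner_rows = conn.execute(query.format(join="INNER")).fetchall()
left_rows = conn.execute(query.format(join="LEFT")).fetchall()

print(inner_rows)  # ID 300 is dropped
print(left_rows)   # ID 300 is kept with a NULL (None) serial
```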

Adding a LEFT JOIN on a INSERT INTO....RETURNING

My query inserts a value and returns the newly inserted row:
INSERT INTO
event_comments(date_posted, e_id, created_by, parent_id, body, num_likes, thread_id)
VALUES(1575770277, 1, '9e028aaa-d265-4e27-9528-30858ed8c13d', 9, 'December 7th', 0, 'zRfs2I')
RETURNING comment_id, date_posted, e_id, created_by, parent_id, body, num_likes, thread_id
I want to join created_by with the user_id from my users table.
SELECT * from users WHERE user_id = created_by
Is it possible to join that new returning row with another table row?
Consider using a WITH structure to pass the data from the insert to a query that can then be joined.
Example:
-- Setup some initial tables
create table colors (
id SERIAL primary key,
color VARCHAR UNIQUE
);
create table animals (
id SERIAL primary key,
a_id INTEGER references colors(id),
animal VARCHAR UNIQUE
);
-- provide some initial data in colors
insert into colors (color) values ('red'), ('green'), ('blue');
-- Store returned data in inserted_animal for use in next query
with inserted_animal as (
-- Insert a new record into animals
insert into animals (a_id, animal) values (3, 'fish') returning *
) select * from inserted_animal
left join colors on inserted_animal.a_id = colors.id;
-- Output
-- id | a_id | animal | id | color
-- 1 | 3 | fish | 3 | blue
Explanation:
A WITH query lets the rows returned by an initial query, including rows returned by a RETURNING clause, be treated as a temporary result set. The statement that follows can then continue to work with that result set, for example by using it in a JOIN.
You were right, I misunderstood
This should do it:
-- PL/pgSQL fragment: it has to run inside a function body,
-- here assumed to be declared as RETURNS SETOF users
DECLARE mycreated_by event_comments.created_by%TYPE;
BEGIN
INSERT INTO
event_comments(date_posted, e_id, created_by, parent_id, body, num_likes, thread_id)
VALUES(1575770277, 1, '9e028aaa-d265-4e27-9528-30858ed8c13d', 9, 'December 7th', 0, 'zRfs2I')
RETURNING created_by INTO mycreated_by;
RETURN QUERY SELECT * FROM users WHERE user_id = mycreated_by;
END;

Find parent based on child table keywords

I'm trying to get the parents based on matching keywords using a child table:
AssetKeyword
============
AssetID (int)
KeywordID (int)
I'm trying to find the assets that have entries in the table for all of a given set of keywords, for example keywords 3 and 4 and 5.
I've tried subqueries and aggregates but can't get my head around it. Thankful for any help. Those fridays...
This isn't very dynamic, I guess:
select
AssetID
from (
select distinct
AssetID,
KeywordID
from AssetKeyword
where
KeywordID in (3,4,5)
) t
group by
AssetID
having
COUNT(*) = 3
You can use EXISTS:
SELECT a.AssetID, a.Col2, ...
FROM dbo.Asset a
WHERE EXISTS
(
SELECT 1 FROM AssetKeyword ak -- it doesn't matter what you "select" here
WHERE ak.AssetID = a.AssetID
AND ak.KeywordID IN (3, 4, 5)
)
This selects all parent records where there is at least one child with at least one of those keywords.
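The two answers return different sets: the GROUP BY / HAVING query keeps only assets that carry all three keywords, while the EXISTS query keeps assets that carry any of them. A small sketch in Python's sqlite3 makes this visible; the sample rows and the minimal Asset table are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Asset (AssetID INTEGER PRIMARY KEY);
CREATE TABLE AssetKeyword (AssetID INTEGER, KeywordID INTEGER);
INSERT INTO Asset VALUES (1), (2), (3), (4);
INSERT INTO AssetKeyword VALUES
  (1, 3), (1, 4), (1, 5),   -- asset 1 has all three keywords
  (2, 3), (2, 4),           -- asset 2 has two of them
  (3, 5), (3, 5), (3, 9);   -- asset 3 has keyword 5 twice, plus another
""")

# "All of": count the distinct matching keywords per asset.
all_of = [r[0] for r in conn.execute("""
SELECT AssetID
FROM AssetKeyword
WHERE KeywordID IN (3, 4, 5)
GROUP BY AssetID
HAVING COUNT(DISTINCT KeywordID) = 3
ORDER BY AssetID
""")]

# "Any of": EXISTS is satisfied by a single matching child row.
any_of = [r[0] for r in conn.execute("""
SELECT a.AssetID
FROM Asset a
WHERE EXISTS (
  SELECT 1 FROM AssetKeyword ak
  WHERE ak.AssetID = a.AssetID
    AND ak.KeywordID IN (3, 4, 5)
)
ORDER BY a.AssetID
""")]

print(all_of)  # [1]
print(any_of)  # [1, 2, 3]
```

COUNT(DISTINCT KeywordID) does the same job as the inner SELECT DISTINCT in the first answer: duplicate child rows do not inflate the count.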

Select query to get all data from junction table to one field

I have 2 tables and 1 junction table:
table 1 (Log): | Id | Title | Date | ...
table 2 (Category): | Id | Title | ...
junction table between table 1 and 2:
LogCategory: | Id | LogId | CategoryId
Now I want a SQL query that returns all logs with all of their category titles in one field,
something like this:
LogId, LogTitle, ..., Categories(that contains all category title assigned to this log id)
Can anyone help me solve this? Thanks.
Try this code:
DECLARE @results TABLE
(
idLog int,
LogTitle varchar(20),
idCategory int,
CategoryTitle varchar(20)
)
INSERT INTO @results
SELECT l.idLog, l.LogTitle, c.idCategory, c.CategoryTitle
FROM
LogCategory lc
INNER JOIN Log l
ON lc.IdLog = l.IdLog
INNER JOIN Category c
ON lc.IdCategory = c.IdCategory
SELECT DISTINCT
idLog,
LogTitle,
STUFF (
(SELECT ', ' + r1.CategoryTitle
FROM @results r1
WHERE r1.idLog = r2.idLog
ORDER BY r1.idLog
FOR XML PATH ('')
), 1, 2, '')
FROM
@results r2
Here you have a simple SQL Fiddle example
I'm sure this query can be written using only one select, but this way it is readable and I can explain what the code does.
The first select takes all Log - Category matches into a table variable.
The second part uses FOR XML to select the category names and return the result as XML instead of as a table. By using FOR XML PATH ('') and prepending ', ' in the select, all the XML tags are removed from the result.
Finally, the STUFF function replaces the leading ', ' characters of every row with an empty string, so the string is formatted correctly.
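For comparison, the same per-log concatenation can be sketched with a string aggregate function. The example below uses Python's sqlite3 and SQLite's group_concat as a stand-in (SQL Server 2017 and later offer STRING_AGG for the same purpose). The table and column names follow the question; the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Log (Id INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE Category (Id INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE LogCategory (Id INTEGER PRIMARY KEY, LogId INTEGER, CategoryId INTEGER);
INSERT INTO Log VALUES (1, 'first log'), (2, 'second log');
INSERT INTO Category VALUES (1, 'error'), (2, 'network');
INSERT INTO LogCategory (LogId, CategoryId) VALUES (1, 1), (1, 2), (2, 2);
""")

# One row per log; the aggregate joins its category titles with ', '.
rows = conn.execute("""
SELECT l.Id, l.Title, group_concat(c.Title, ', ') AS Categories
FROM Log l
JOIN LogCategory lc ON lc.LogId = l.Id
JOIN Category c ON c.Id = lc.CategoryId
GROUP BY l.Id, l.Title
ORDER BY l.Id
""").fetchall()

# group_concat does not promise an order, so compare the titles as sorted lists.
result = {log_id: sorted(cats.split(", ")) for log_id, _title, cats in rows}
print(result)
```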

SQL query that gives distinct results that match multiple columns

Sorry, I couldn't provide a better title for my problem as I am quite new to SQL.
I am looking for a SQL query string that solves the below problem.
Let's assume the following table:
DOCUMENT_ID | TAG
----------------------------
1 | tag1
1 | tag2
1 | tag3
2 | tag2
3 | tag1
3 | tag2
4 | tag1
5 | tag3
Now I want to select all distinct document id's that contain one or more tags (but those must provide all specified tags).
For example:
Select all document_id's with tag1 and tag2 would return 1 and 3 (but not 4 for example as it doesn't have tag2).
What would be the best way to do that?
Regards,
Kai
SELECT document_id
FROM table
WHERE tag = 'tag1' OR tag = 'tag2'
GROUP BY document_id
HAVING COUNT(DISTINCT tag) = 2
Edit:
Updated for lack of constraints...
This assumes DocumentID and Tag are the Primary Key.
Edit: Changed HAVING clause to count DISTINCT tags. That way it doesn't matter what the primary key is.
Test Data
-- Populate Test Data
CREATE TABLE #table (
DocumentID varchar(8) NOT NULL,
Tag varchar(8) NOT NULL
)
INSERT INTO #table VALUES ('1','tag1')
INSERT INTO #table VALUES ('1','tag2')
INSERT INTO #table VALUES ('1','tag3')
INSERT INTO #table VALUES ('2','tag2')
INSERT INTO #table VALUES ('3','tag1')
INSERT INTO #table VALUES ('3','tag2')
INSERT INTO #table VALUES ('4','tag1')
INSERT INTO #table VALUES ('5','tag3')
INSERT INTO #table VALUES ('3','tag2') -- Edit: test duplicate tags
Query
-- Return Results
SELECT DocumentID FROM #table
WHERE Tag IN ('tag1','tag2')
GROUP BY DocumentID
HAVING COUNT(DISTINCT Tag) = 2
Results
DocumentID
----------
1
3
select DOCUMENT_ID
from {TABLE}
where TAG in ('tag1', 'tag2', ... 'tagN')
group by DOCUMENT_ID
having count(distinct TAG) = N
Adjust N and the tag list as needed.
select document_id
from {TABLE}
where tag in ('tag1','tag2')
group by document_id
having count(distinct tag) >= 2
How you generate the list of tags in the where clause depends on your application structure. If you are dynamically generating the query as part of your code then you might simply construct the query as a big dynamically generated string.
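If the query is built in application code, one option is to generate only the placeholders dynamically and pass the tag values as parameters, so the tags themselves never become part of the SQL string. A minimal sketch in Python's sqlite3 follows; the table name document_tag and the helper documents_with_all_tags are assumptions made for illustration, and the rows are the ones from the question.

```python
import sqlite3


def documents_with_all_tags(conn, tags):
    """Return the ids of documents that carry every tag in `tags`."""
    wanted = sorted(set(tags))  # duplicates in the input must not raise the count
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)
    sql = (
        "SELECT document_id FROM document_tag "
        f"WHERE tag IN ({placeholders}) "
        "GROUP BY document_id "
        "HAVING COUNT(DISTINCT tag) = ? "
        "ORDER BY document_id"
    )
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE document_tag (document_id INTEGER, tag TEXT);
INSERT INTO document_tag VALUES
  (1, 'tag1'), (1, 'tag2'), (1, 'tag3'),
  (2, 'tag2'),
  (3, 'tag1'), (3, 'tag2'), (3, 'tag2'),  -- duplicate tag on document 3
  (4, 'tag1'),
  (5, 'tag3');
""")

print(documents_with_all_tags(conn, ["tag1", "tag2"]))  # [1, 3]
```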
We always used stored procedures to query the data. In that case, we pass in the list of tags as an XML document. A procedure like that might look something like one of these, where the input argument would be:
<tags>
<tag>tag1</tag>
<tag>tag2</tag>
</tags>
CREATE PROCEDURE [dbo].[GetDocumentIdsByTag]
@tagList xml
AS
BEGIN
declare @tagCount int
-- number of distinct tags in the XML input
select @tagCount = count(distinct tags.value('.','varchar(20)')) from @tagList.nodes('tags/tag') R(tags)
SELECT DISTINCT documentid
FROM {TABLE}
JOIN @tagList.nodes('tags/tag') R(tags) ON {TABLE}.tag = tags.value('.','varchar(20)')
group by documentid
having count(distinct tag) >= @tagCount
END
OR
CREATE PROCEDURE [dbo].[GetDocumentIdsByTag]
@tagList xml
AS
BEGIN
declare @tagCount int
select @tagCount = count(*) from @tagList.nodes('tags/tag') R(tags)
SELECT DISTINCT documentid
FROM {TABLE}
WHERE tag in
(
SELECT tags.value('.','varchar(20)')
FROM @tagList.nodes('tags/tag') R(tags)
)
group by documentid
having count(distinct tag) >= @tagCount
END