Let's say I have a table like this:
CREATE TABLE book (
    id INTEGER PRIMARY KEY,
    title VARCHAR(32) NOT NULL
);
and I want to support paged queries for an API that returns a list of books ordered by the non-unique title field, with a given offset and limit.
The question here is: what is the most efficient way [1] to define a unique index (or helper column, or anything like that) over the non-unique title column that could be used as an opaque offset token in queries using ORDER BY title? I thought about an index on a function returning a unique numeric position for each row, but I'm afraid this would severely affect INSERT and UPDATE timings on big tables, and I suspect there is a better solution.
While this is straightforward for ORDER BY {unique_field} queries [2], I don't see an easy way to achieve the same for non-unique fields.
Also, let's assume the solution should work in both PostgreSQL and MySQL.
Notes:
[1] Since straightforward solutions like SELECT id, title FROM book ORDER BY title OFFSET [number] LIMIT [number] perform extremely badly for big offset values, I would introduce some sort of opaque token that represents an offset into a given result set in my API for fetching book chunks.
So the API method that returns a list of books ordered by title from a given offset would look like this (pseudocode):
BookPage getBooks(optional string offsetToken, int limit)
where BookPage is defined as follows:
class BookPage {
    nonnull List<Book> books;
    nonnull string offsetToken; // expected to be used to fetch the next page
}
Example use, where the book table contains 2*N books:
// 1st call
BookPage page1 = getBooks(null, 2); // get first 2 books
BookPage page2 = getBooks(page1.offsetToken, 2); // get next 2 books
BookPage page3 = getBooks(page2.offsetToken, 2); // get next 2 books
//...
BookPage pageN = getBooks(pageN-1.offsetToken, 2); // get last 2 books
and the concatenation of the lists page1.books, page2.books, ..., pageN.books would produce the complete list of books ordered by title in ascending order.
[2] For example: if the getBooks API ordered books by id (which is the primary key), offsetToken would be the id of the last book, and the implementation of getBooks would look as follows (pseudocode):
BookPage getBooks(optional string offsetToken, int limit) {
    Long startId = (offsetToken != null ? toLong(offsetToken) : null);
    page.books = (SELECT id, title FROM book
                  WHERE :startId IS NULL OR id > :startId
                  ORDER BY id
                  LIMIT :limit);
    page.offsetToken = toString(lastElementOf(page.books).id);
    return page;
}
The easiest and simplest solution I have found so far is to use the non-unique column in conjunction with the primary key. It slightly complicates the SELECT queries - e.g. for the original problem you need to write something like (title = :title AND id > :id) OR (title > :title), where :title and :id together constitute the offsetToken (the last item's title and id).
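For the record, here is a minimal sketch of that composite-key ("keyset") approach against the book table, assuming Go's database/sql with a PostgreSQL driver; the token format (base64 of "id:title") and all helper names are illustrative, not part of the question:

// Sketch of keyset pagination over (title, id); getBooks and the
// token encoding are illustrative assumptions.
package books

import (
    "database/sql"
    "encoding/base64"
    "fmt"
    "strconv"
    "strings"
)

type Book struct {
    ID    int64
    Title string
}

type BookPage struct {
    Books       []Book
    OffsetToken string // opaque: base64 of "<id>:<title>" of the last row
}

func getBooks(db *sql.DB, offsetToken string, limit int) (BookPage, error) {
    var page BookPage
    query := `SELECT id, title FROM book ORDER BY title, id LIMIT $1`
    args := []interface{}{limit}
    if offsetToken != "" {
        raw, err := base64.StdEncoding.DecodeString(offsetToken)
        if err != nil {
            return page, err
        }
        // id comes first so a title containing ':' cannot break the token.
        parts := strings.SplitN(string(raw), ":", 2)
        if len(parts) != 2 {
            return page, fmt.Errorf("malformed offset token")
        }
        id, err := strconv.ParseInt(parts[0], 10, 64)
        if err != nil {
            return page, err
        }
        // Resume strictly after the last (title, id) pair of the previous page.
        query = `SELECT id, title FROM book
                 WHERE (title = $2 AND id > $3) OR title > $2
                 ORDER BY title, id LIMIT $1`
        args = append(args, parts[1], id)
    }
    rows, err := db.Query(query, args...)
    if err != nil {
        return page, err
    }
    defer rows.Close()
    for rows.Next() {
        var b Book
        if err := rows.Scan(&b.ID, &b.Title); err != nil {
            return page, err
        }
        page.Books = append(page.Books, b)
    }
    if n := len(page.Books); n > 0 {
        last := page.Books[n-1]
        page.OffsetToken = base64.StdEncoding.EncodeToString(
            []byte(fmt.Sprintf("%d:%s", last.ID, last.Title)))
    }
    return page, rows.Err()
}

A composite index on (title, id) lets both branches of the WHERE clause use index range scans, which is what keeps the token cheap no matter how deep into the set it points.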
Related
I want to map an array of key-value pairs of GroupCount to a composite type of GroupsResult, mapping only specific keys.
I'm using unnest to turn the array into rows, and then use 3 separate select statements to pull out the values.
This feels like a lot of code for something so simple.
Is there an easier / more concise way to do the mapping from the array type to the GroupsResult type?
create type GroupCount AS (
    Name text,
    Count int
);
create type GroupsResult AS (
    Cats int,
    Dogs int,
    Birds int
);
WITH unnestedTable AS (
    WITH resultTable AS (
        SELECT ARRAY[('Cats', 5)::GroupCount, ('Dogs', 2)::GroupCount] resp
    )
    SELECT unnest(resp)::GroupCount t
    FROM resultTable
)
SELECT (
    (SELECT (unnestedTable.t::GroupCount).count FROM unnestedTable WHERE (unnestedTable.t::GroupCount).name = 'Cats'),
    (SELECT (unnestedTable.t::GroupCount).count FROM unnestedTable WHERE (unnestedTable.t::GroupCount).name = 'Dogs'),
    (SELECT (unnestedTable.t::GroupCount).count FROM unnestedTable WHERE (unnestedTable.t::GroupCount).name = 'Birds')
)::GroupsResult
fiddle: http://sqlfiddle.com/#!17/56aa2/1
A bit simpler. :)
SELECT (min(u.count) FILTER (WHERE name = 'Cats')
, min(u.count) FILTER (WHERE name = 'Dogs')
, min(u.count) FILTER (WHERE name = 'Birds'))::GroupsResult
FROM unnest('{"(Cats,5)","(Dogs,2)"}'::GroupCount[]) u;
db<>fiddle here
See:
Aggregate columns with additional (distinct) filters
Subtle difference: the original raises an exception if one of the names pops up more than once, while this one just returns the minimum count. That may or may not be what you want - or it may be irrelevant if duplicates can never occur.
For many different names, crosstab() is typically faster. See:
PostgreSQL Crosstab Query
Question
When dealing with a one-to-many or many-to-many SQL relationship in Golang, what is the best (efficient, recommended, "Go-like") way of mapping the rows to a struct?
Taking the example setup below, I have tried to detail some approaches with the pros and cons of each, but I was wondering what the community recommends.
Requirements
Works with PostgreSQL (can be generic, but should not use MySQL/Oracle-specific features)
Efficiency - No brute forcing every combination
No ORM - Ideally using only database/sql and jmoiron/sqlx
Example
For the sake of clarity I have removed error handling.
Models
type Tag struct {
    ID   int
    Name string
}
type Item struct {
    ID   int
    Tags []Tag
}
Database
CREATE TABLE item (
    id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY
);
CREATE TABLE tag (
    id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    name VARCHAR(160),
    item_id INT REFERENCES item(id)
);
Approach 1 - Select all Items, then select tags per item
var items []Item
sqlxdb.Select(&items, "SELECT * FROM item")
for i, item := range items {
    var tags []Tag
    sqlxdb.Select(&tags, "SELECT * FROM tag WHERE item_id = $1", item.ID)
    items[i].Tags = tags
}
Pros
Simple
Easy to understand
Cons
Inefficient, with the number of database queries increasing in proportion to the number of items
Approach 2 - Construct SQL join and loop through rows manually
var itemTags = make(map[int][]Tag)
var items = []Item{}
rows, _ := sqlxdb.Queryx("SELECT i.id, t.id, t.name FROM item AS i JOIN tag AS t ON t.item_id = i.id")
for rows.Next() {
    var (
        itemID  int
        tagID   int
        tagName string
    )
    rows.Scan(&itemID, &tagID, &tagName)
    itemTags[itemID] = append(itemTags[itemID], Tag{ID: tagID, Name: tagName})
}
for itemID, tags := range itemTags {
    items = append(items, Item{
        ID:   itemID,
        Tags: tags,
    })
}
Pros
A single database call and cursor that can be looped through without eating too much memory
Cons
Complicated and harder to develop with multiple joins and many attributes on the struct
Not too performant; it trades the extra network calls for more memory usage and processing time
Failed approach 3 - sqlx struct scanning
Despite failing, I want to include this approach as it is my current aim of efficiency paired with development simplicity. My hope was that by explicitly setting the db tag on each struct field, sqlx could do some advanced struct scanning.
var items []Item
sqlxdb.Select(&items, "SELECT i.id AS item_id, t.id AS tag_id, t.name AS tag_name FROM item AS i JOIN tag AS t ON t.item_id = i.id")
Unfortunately this errors out with missing destination name tag_id in *[]Item, leading me to believe StructScan is not advanced enough to recursively loop through rows (no criticism - it is a complicated scenario).
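One middle ground that does work with sqlx (an untested sketch under my own naming; itemTagRow, loadItems and the column aliases are mine, and it reuses the Item/Tag models above): scan each joined row into a flat helper struct with db tags, then group in application code.

// Sketch: flat row struct for the join, grouped into Items afterwards.
// Assumes import "github.com/jmoiron/sqlx".
type itemTagRow struct {
    ItemID  int    `db:"item_id"`
    TagID   int    `db:"tag_id"`
    TagName string `db:"tag_name"`
}

func loadItems(db *sqlx.DB) ([]Item, error) {
    var rows []itemTagRow
    err := db.Select(&rows, `SELECT i.id AS item_id, t.id AS tag_id, t.name AS tag_name
                             FROM item AS i JOIN tag AS t ON t.item_id = i.id
                             ORDER BY i.id`)
    if err != nil {
        return nil, err
    }
    var items []Item
    for _, r := range rows {
        // Rows are ordered by item id, so a new id starts a new Item.
        if len(items) == 0 || items[len(items)-1].ID != r.ItemID {
            items = append(items, Item{ID: r.ItemID})
        }
        last := &items[len(items)-1]
        last.Tags = append(last.Tags, Tag{ID: r.TagID, Name: r.TagName})
    }
    return items, nil
}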
Possible approach 4 - PostgreSQL array aggregators and GROUP BY
While I am sure this will not work as written, I have included this untested option to see if it could be improved upon so that it may work.
var items = []Item{}
sqlxdb.Select(&items, "SELECT i.id as item_id, array_agg(t.*) as tags FROM item AS i JOIN tag AS t ON t.item_id = i.id GROUP BY i.id")
When I have some time I will try and run some experiments here.
The SQL in Postgres:
create schema temp;
set search_path = temp;
create table item (
    id INT generated by default as identity primary key
);
create table tag (
    id INT generated by default as identity primary key,
    name VARCHAR(160),
    item_id INT references item (id)
);
create view item_tags as
select id,
       (select array_to_json(array_agg(row_to_json(taglist.*))) as array_to_json
        from (select tag.name, tag.id
              from tag
              where item_id = item.id) taglist) as tags
from item;
Then Golang queries this SQL:
select row_to_json(row)
from (
select * from item_tags
) row;
and unmarshals it into a Go struct:
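A minimal sketch of that unmarshalling step, assuming database/sql and encoding/json; ItemTags and loadItemTags are illustrative names, with json tags matching the keys row_to_json emits from the view:

// Sketch: each result row is one JSON document produced by row_to_json.
// Assumes imports "database/sql" and "encoding/json".
type Tag struct {
    ID   int    `json:"id"`
    Name string `json:"name"`
}

type ItemTags struct {
    ID   int   `json:"id"`
    Tags []Tag `json:"tags"` // null in the view becomes a nil slice
}

func loadItemTags(db *sql.DB) ([]ItemTags, error) {
    rows, err := db.Query(`select row_to_json(row) from (select * from item_tags) row`)
    if err != nil {
        return nil, err
    }
    defer rows.Close()
    var items []ItemTags
    for rows.Next() {
        var raw []byte
        if err := rows.Scan(&raw); err != nil {
            return nil, err
        }
        var it ItemTags
        if err := json.Unmarshal(raw, &it); err != nil {
            return nil, err
        }
        items = append(items, it)
    }
    return items, rows.Err()
}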
Pros:
Postgres manages the data relations; you add/update data with SQL functions.
Golang manages the business model and logic.
It's an easy way.
I can suggest another approach which I have used before.
You build a JSON array of the tags inside the query and return it.
Pros: You have 1 call to the db, which aggregates the data, and all you have to do is parse the json into an array.
Cons: It's a bit ugly. Feel free to bash me for it.
type jointItem struct {
    Item
    ParsedTags string
    Tags       []Tag `gorm:"-"`
}
var jointItems []*jointItem
db.Raw(`SELECT
items.*,
(SELECT CONCAT(
'[',
GROUP_CONCAT(
JSON_OBJECT('id', id,
'name', name
)
),
']'
)) as parsed_tags
FROM items`).Scan(&jointItems)
for _, o := range jointItems {
    var tempTags []Tag
    if err := json.Unmarshal([]byte(o.ParsedTags), &tempTags); err != nil {
        // do something
    }
    o.Tags = tempTags
}
Edit: the code might behave weirdly, so I find it better to unmarshal into a temporary tags array and then assign it, rather than reusing the same struct field.
You can use carta.Map() from https://github.com/jackskj/carta
It tracks has-many relationships automatically.
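A rough sketch of how that might look with the schema from the question, going by carta's README; the column aliases and db tags here are my assumptions, not verified:

// Sketch: carta.Map scans *sql.Rows into a slice, grouping the nested
// Tags slice by the parent's mapped columns. Assumes a *sql.DB named db
// and import "github.com/jackskj/carta".
type Tag struct {
    ID   int    `db:"tag_id"`
    Name string `db:"tag_name"`
}

type Item struct {
    ID   int   `db:"item_id"`
    Tags []Tag
}

rows, err := db.Query(`SELECT i.id AS item_id, t.id AS tag_id, t.name AS tag_name
                       FROM item AS i JOIN tag AS t ON t.item_id = i.id`)
if err != nil {
    // handle error
}
var items []Item
err = carta.Map(rows, &items)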
I have the classic arrangement for a many-to-many relation in a small flashcard-like application built using SQLite. Every card can have multiple tags, and every tag can have multiple cards. These two entities each have their own table, with a third table to link the records.
This is the table for Cards:
CREATE TABLE Cards (CardId INTEGER PRIMARY KEY AUTOINCREMENT,
                    Text TEXT NOT NULL,
                    Answer INTEGER NOT NULL,
                    Success INTEGER NOT NULL,
                    Fail INTEGER NOT NULL);
This is the table for Tags:
CREATE TABLE Tags (TagId INTEGER PRIMARY KEY AUTOINCREMENT,
                   Name TEXT UNIQUE NOT NULL);
This is the cross reference table:
CREATE TABLE CardsRelatedToTags (CardId INTEGER,
                                 TagId INTEGER,
                                 PRIMARY KEY (CardId, TagId));
I need to get a table of cards with their associated tags in a column separated by commas.
I can already get what I need for a single row knowing its Id with the following query:
SELECT Cards.CardId, Cards.Text,
(SELECT group_concat(Tags.Name, ', ') FROM Tags
JOIN CardsRelatedToTags ON CardsRelatedToTags.TagId = Tags.TagId
WHERE CardsRelatedToTags.CardId = 1) AS TagsList
FROM Cards
WHERE Cards.CardId = 1
This will result in something like this:
CardId | Text                          | TagsList
1      | Some specially formatted text | Tag1, Tag2, TagN...
How can I get this type of result (the group_concat TagsList) for every row in Cards using a single SQL query? Is it advisable to do so from a performance point of view? Or should I do this sort of "presentation" work in application code, using a simpler request to the DB?
Answering your code question:
SELECT
c.CardId,
c.Text,
GROUP_CONCAT(t.Name,', ') AS TagsList
FROM
Cards c
JOIN CardsRelatedToTags crt ON
c.CardId = crt.CardId
JOIN Tags t ON
crt.TagId = t.TagId
GROUP BY c.CardId, c.Text
Now, to the matter of performance. Databases are a powerful tool and do not end at simple SELECT statements. You can definitely do what you need inside the DB (even SQLite). It is bad practice to use a SELECT statement to feed one column inside another SELECT: it requires scanning the table once for each row of the input.
I have 2 tables:
CREATE TABLE article (
    id serial NOT NULL,
    title text,
    tags integer[] -- array of tag ids from the TAG table
);
CREATE TABLE tag (
    id serial NOT NULL,
    description character varying(250) NOT NULL
);
... and I need to select the tags from the TAG table that are held in ARTICLE's tags integer[], based on the article's title.
So I tried something like
SELECT *
FROM tag
WHERE tag.id IN ( (select article.tags::int4
from article
where article.title = 'some title' ) );
... which gives me
ERROR: cannot cast type integer[] to integer
LINE 1: ...FROM tag WHERE tag.id IN ( (select article.tags::int4 from ...
I am stuck with PostgreSQL 8.3 in both dev and production environments.
Use the array overlaps operator &&:
SELECT *
FROM tag
WHERE ARRAY[id] && ANY (SELECT tags FROM article WHERE title = '...');
Using contrib/intarray you can even index this sort of thing quite well.
Take a look at section "8.14.5. Searching in Arrays", but consider the tip at the end of that section:
Tip: Arrays are not sets; searching for specific array elements can be a sign of database misdesign. Consider using a separate table with a row for each item that would be an array element. This will be easier to search, and is likely to scale better for a large number of elements.
You did not mention your Postgres version, so I assume you are using an up-to-date version (8.4 or 9.0).
This should work then:
SELECT *
FROM tag
WHERE tag.id IN ( select unnest(tags)
from article
where title = 'some title' );
But you should really consider changing your table design.
Edit
For 8.3, the unnest() function can easily be added; see this wiki page:
http://wiki.postgresql.org/wiki/Array_Unnest
I have two tables A & B, and B has a many:1 relationship with A.
When querying rows from A I'd also like to have corresponding B records returned as an array and added to the result array from A, so I end up with something like this:
A-ROW
field
field
B-ITEMS
item1
item2
item3
Is there a clean way to do this with one query (perhaps a join?), or should I just perform a second query of B on the id from A and add that to the result array?
It would be more efficient to join table B onto table A. That will not give you the data in the shape you are looking for, but you can iterate over the result and build the data into the desired shape.
Here is some code to illustrate the idea:
// Join table B on table A through a foreign key
$sql = 'select a.id, a.x, b.y
from a
left join b on b.a_id=a.id
order by a.id';
// Execute query
$result = $this->db->query($sql)->result_array();
// Initialise desired result
$shaped_result = array();
// Loop through the SQL result creating the data in your desired shape
foreach ($result as $row)
{
// The primary key of A
$id = $row['id'];
// Add a new result row for A if we have not come across this key before
if (!array_key_exists($id, $shaped_result))
{
$shaped_result[$id] = array('id' => $id, 'x' => $row['x'], 'b_items' => array());
}
if ($row['y'] != null)
{
// Push B item onto sub array
$shaped_result[$id]['b_items'][] = $row['y'];
}
}
"... just perform a second query of B on the id from A and add that to the result array ..." -- that is the correct solution. SQL won't comprehend nested array structure.
To build on what Smandoli said--
Running the secondary query separately is more efficient because, even if row data in the primary table (A) has changed, unchanged data in the secondary table (B) will still produce a (MySQL) query cache hit, assuming the IDs never change.
This is not necessarily true of the join query approach.
There will also be less data coming over the wire, since the join approach fetches duplicate primary-table (A) data whenever the secondary table (B) has multiple rows associated with a single row in the primary table.
Hopefully anyone looking to do this (relatively) common type of data retrieval may find this useful.