Tricky PostgreSQL join and order query - sql

I've got four tables in a PostgreSQL 9.3.6 database:
sections
fields (child of sections)
entries (child of sections)
data (child of entries)
CREATE TABLE section (
id serial PRIMARY KEY,
title text,
"group" integer
);
CREATE TABLE fields (
id serial PRIMARY KEY,
title text,
section integer,
type text,
"default" json
);
CREATE TABLE entries (
id serial PRIMARY KEY,
section integer
);
CREATE TABLE data (
id serial PRIMARY KEY,
data json,
field integer,
entry integer
);
I'm trying to generate a page that looks like this:
section title
field 1 title | field 2 title | field 3 title
entry 1 | data 'as' json | data 1 json | data 3 json <-- table
entry 2 | data 'df' json | data 5 json | data 6 json
entry 3 | data 'gh' json | data 8 json | data 9 json
The way I have it set up right now each piece of 'data' has an entry it's linked to, a corresponding field (that field has columns that determine how the data's json field should be interpreted), a json field to store different types of data, and an id (1-9 here in the table).
In this example there are 3 entries, and 3 fields and there is a data piece for each of the cells in between.
It's set up like this because one section can have different field types and quantity than another section and therefore different quantities and types of data.
Challenge 1:
I'm trying to join the tables together so that the result is sortable by any of the columns (the contents of the data's json column for that field). For example, if I sort by field 3 (the third column) in reverse order, the table would look like this:
section title
field 1 title | field 2 title | field 3 title
entry 3 | data 'gh' json | data 8 json | data 9 json
entry 2 | data 'df' json | data 5 json | data 6 json
entry 1 | data 'as' json | data 1 json | data 3 json <-- table
I'm open to doing it another way too if there's a better one.
Challenge 2:
Each field has a 'default value' column - Ideally I only have to create 'data' entries when they have a value that isn't that default value. So the table might actually look like this if field 2's default value was 'asdf':
section title
field 1 title | field 2 title | field 3 title
entry 3 | data 'gh' json | data 8 json | data 9 json
entry 2 | data 'df' json | 'asdf' | data 6 json
entry 1 | data 'as' json | 'asdf' | data 3 json <-- table

The key to writing this query is understanding that you just need to fetch all the data for a single section; the rest is joins. With your schema you also can't filter data by section directly, so you'll need to join entries just for that:
SELECT d.* FROM data d JOIN entries e ON (d.entry = e.id)
WHERE e.section = ?
You can then join field to each row to get defaults, types and titles:
SELECT d.*, f.title, f.type, f."default"
FROM data d JOIN entries e ON (d.entry = e.id)
JOIN fields f ON (d.field = f.id)
WHERE e.section = ?
Or you can select fields in a separate query to save some network traffic.
So that was the answer; here come the bonuses:
Use foreign keys instead of plain integers to refer to other tables; the database will then check consistency for you.
Relations (tables) should be named in the singular by convention, so it's section, entry and field.
Referring fields are called <name>_id, e.g. field_id or section_id, also by convention.
The whole point of JSON fields is to store a collection of data that is not statically defined, so it would make much more sense to drop the entries and data tables and use a single table with a JSON column containing all the fields instead.
Like this:
CREATE TABLE row ( -- less generic name would be even better
id int primary key,
section_id int references section (id),
data json
)
With data fields containing something like:
{
"title": "iPhone 6",
"price": 650,
"available": true,
...
}

@Suor has provided good advice, some of which you already accepted. I am building on the updated schema.
Schema
CREATE TABLE section (
section_id serial PRIMARY KEY,
title text,
grp integer
);
CREATE TABLE field (
field_id serial PRIMARY KEY,
section_id integer REFERENCES section,
title text,
type text,
default_val json
);
CREATE TABLE entry (
entry_id serial PRIMARY KEY,
section_id integer REFERENCES section
);
CREATE TABLE data (
data_id serial PRIMARY KEY,
field_id integer REFERENCES field,
entry_id integer REFERENCES entry,
data json
);
I changed two more details:
section_id instead of id, etc. "id" as a column name is an anti-pattern that has become popular because a couple of ORMs use it. Don't. Descriptive names are much better, and identical names for identical content are a helpful guideline. It also lets you use the USING shortcut in join clauses, as in JOIN field f USING (section_id).
Don't use reserved words as identifiers. Use legal, lower-case, unquoted names exclusively to make your life easier.
Are PostgreSQL column names case-sensitive?
Referential integrity?
There is another inherent weakness in your design. What stops entries in data from referencing a field and an entry that don't go together? Closely related question on dba.SE
Enforcing constraints “two tables away”
Query
Not sure if you need the complex design at all. But to answer the question, this is the base query:
SELECT entry_id, field_id, COALESCE(d.data, f.default_val) AS data
FROM entry e
JOIN field f USING (section_id)
LEFT JOIN data d USING (field_id, entry_id) -- can be missing
WHERE e.section_id = 1
ORDER BY 1, 2;
The LEFT JOIN is crucial to allow for missing data entries and use the default instead.
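To see the fallback to the default in action, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for PostgreSQL, stores the JSON values as plain text, and uses made-up sample rows modelled on the question; all of that is an assumption for illustration. The pivot and the sort by field 3 happen in application code here, not in SQL.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE field (field_id INTEGER PRIMARY KEY, section_id INTEGER, title TEXT, default_val TEXT);
CREATE TABLE entry (entry_id INTEGER PRIMARY KEY, section_id INTEGER);
CREATE TABLE data  (data_id INTEGER PRIMARY KEY, field_id INTEGER, entry_id INTEGER, data TEXT);

INSERT INTO field VALUES (1, 1, 'field 1', NULL), (2, 1, 'field 2', '"asdf"'), (3, 1, 'field 3', NULL);
INSERT INTO entry VALUES (1, 1), (2, 1), (3, 1);
-- Only entry 3 stores a value for field 2; entries 1 and 2 rely on the default.
INSERT INTO data (field_id, entry_id, data) VALUES
  (1, 1, '"as"'), (3, 1, '3'),
  (1, 2, '"df"'), (3, 2, '6'),
  (1, 3, '"gh"'), (2, 3, '8'), (3, 3, '9');
""")

# Same shape as the base query, written with explicit ON clauses.
BASE_QUERY = """
SELECT e.entry_id, f.field_id, COALESCE(d.data, f.default_val) AS data
FROM entry e
JOIN field f ON f.section_id = e.section_id
LEFT JOIN data d ON d.field_id = f.field_id AND d.entry_id = e.entry_id
WHERE e.section_id = ?
ORDER BY 1, 2
"""
rows = conn.execute(BASE_QUERY, (1,)).fetchall()

# Pivot in application code: one dict per entry, keyed by field_id.
table = {}
for entry_id, field_id, value in rows:
    table.setdefault(entry_id, {})[field_id] = value

# Sort the entries by the decoded JSON value of field 3, descending.
sorted_entries = sorted(table, key=lambda e: json.loads(table[e][3]), reverse=True)
```

Pivoting in the application avoids the static return type that a server-side pivot needs, at the cost of sorting outside the database.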
crosstab()
The final step is cross tabulation. Cannot show this in SQL Fiddle since the additional module tablefunc is not installed.
Basics for crosstab():
PostgreSQL Crosstab Query
SELECT * FROM crosstab(
$$
SELECT entry_id, field_id, COALESCE(d.data, f.default_val) AS data
FROM entry e
JOIN field f USING (section_id)
LEFT JOIN data d USING (field_id, entry_id) -- can be missing
WHERE e.section_id = 1
ORDER BY 1, 2
$$
,$$SELECT field_id FROM field WHERE section_id = 1 ORDER BY field_id$$
) AS ct (entry int, f1 json, f2 json, f3 json) -- static
ORDER BY f3->>'a'; -- static
The tricky part here is the return type of the function. I provided a static type for 3 fields, but you really want that dynamic. Also, I am referencing a field in the json type that may or may not be there ...
So build that query dynamically and execute it in a second call.
More about that:
Dynamic alternative to pivot with CASE and GROUP BY

Related

SQLite, Many to many relations, How to aggregate?

I have the classic arrangement for a many-to-many relation in a small flashcard-like application built using SQLite. Every card can have multiple tags, and every tag can have multiple cards. Each of these two entities has its own table, with a third table to link the records.
This is the table for Cards:
CREATE TABLE Cards (CardId INTEGER PRIMARY KEY AUTOINCREMENT,
Text TEXT NOT NULL,
Answer INTEGER NOT NULL,
Success INTEGER NOT NULL,
Fail INTEGER NOT NULL);
This is the table for Tags:
CREATE TABLE Tags (TagId INTEGER PRIMARY KEY AUTOINCREMENT,
Name TEXT UNIQUE NOT NULL);
This is the cross reference table:
CREATE TABLE CardsRelatedToTags (CardId INTEGER,
TagId INTEGER,
PRIMARY KEY (CardId, TagId));
I need to get a table of cards with their associated tags in a column separated by commas.
I can already get what I need for a single row knowing its Id with the following query:
SELECT Cards.CardId, Cards.Text,
(SELECT group_concat(Tags.Name, ', ') FROM Tags
JOIN CardsRelatedToTags ON CardsRelatedToTags.TagId = Tags.TagId
WHERE CardsRelatedToTags.CardId = 1) AS TagsList
FROM Cards
WHERE Cards.CardId = 1
This will result in something like this:
CardId | Text | TagsList
1 | Some specially formatted text | Tag1, Tag2, TagN...
How can I get this type of result (TagsList from group_concat) for every row in Cards using a SQL query? Is it advisable to do so from a performance point of view? Or should I do this sort of "presentation" work in application code, using a simpler request to the DB?
Answering your code question:
SELECT
c.CardId,
c.Text,
GROUP_CONCAT(t.Name,', ') AS TagsList
FROM
Cards c
JOIN CardsRelatedToTags crt ON
c.CardId = crt.CardId
JOIN Tags t ON
crt.TagId = t.TagId
WHERE
c.CardId = 1 -- drop this filter to get every card
GROUP BY c.CardId, c.Text
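Now, to the matter of performance. Databases are powerful tools that can do much more than simple SELECT statements, and you can definitely do what you need inside the DB (even SQLite). Avoid feeding one output column from a SELECT nested inside another SELECT: such a correlated subquery is evaluated once for each row of the outer query, whereas the join with GROUP BY lets the database aggregate all cards in a single query. Note that the inner joins skip cards without tags; use LEFT JOIN if those should be listed too (with a NULL TagsList).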
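For the "every row in Cards" case, a minimal sketch using Python's standard-library sqlite3 module follows. The Cards table is trimmed to two columns and the sample rows are made up for illustration. It uses LEFT JOIN so that a card without tags still appears, with NULL (None in Python) as its TagsList.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Cards (CardId INTEGER PRIMARY KEY AUTOINCREMENT, Text TEXT NOT NULL);
CREATE TABLE Tags (TagId INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT UNIQUE NOT NULL);
CREATE TABLE CardsRelatedToTags (CardId INTEGER, TagId INTEGER, PRIMARY KEY (CardId, TagId));

INSERT INTO Cards (Text) VALUES ('first card'), ('second card'), ('untagged card');
INSERT INTO Tags (Name) VALUES ('Tag1'), ('Tag2');
INSERT INTO CardsRelatedToTags VALUES (1, 1), (1, 2), (2, 2);
""")

# One row per card; tags are aggregated by the GROUP BY.
rows = conn.execute("""
SELECT c.CardId, c.Text, GROUP_CONCAT(t.Name, ', ') AS TagsList
FROM Cards c
LEFT JOIN CardsRelatedToTags crt ON crt.CardId = c.CardId
LEFT JOIN Tags t ON t.TagId = crt.TagId
GROUP BY c.CardId, c.Text
ORDER BY c.CardId
""").fetchall()

for card_id, text, tags in rows:
    print(card_id, text, tags)
```

SQLite does not guarantee the order in which group_concat joins the names, so don't rely on the tag order inside TagsList.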

Combined SELECT from unnested composite type array and regular column

I have a table my_friends_cards:
id | name | rare_cards_composite[] |
---+---------+------------------------
1 | 'timmy' | { {1923, 'baberuth'}, {1999, 'jeter'}}
2 |'jimmy' | { {1955, 'Joey D'}, {1995, 'juice_head'}}
3 |'bob' | {{2001, 'mo_jeter'}}
I want to make a request kind of like this:
Select name, (cards.x).player
FROM SELECT UNNEST(base_ball_card) as x
FROM my_friends_cards
WHERE name=ANY(['timmy', 'jimmy'])) as cards
WHERE (cards.x).year > 1990
(I know this doesn't work, because there is no 'name' field in the unnested composite array.)
I am getting the feeling that my composite type array column should just be another table, and then I could do a join, but is there any way around this?
I would expect this result:
[('timmy', 'jeter')
,('jimmy', 'juice_head')]
version: PostgreSQL 9.3.3
Your feeling is correct: a normalized schema with another table instead of the array of composite types would be the superior approach in many respects.
While stuck with your unfortunate design:
Test setup
(You should have provided this.)
CREATE TYPE card AS (year int, cardname text);
CREATE TABLE my_friends_cards (id int, name text, rare_cards_composite card[]);
INSERT INTO my_friends_cards VALUES
(1, 'timmy', '{"(1923,baberuth)","(1999,jeter)"}')
, (2, 'jimmy', '{"(1955,Joey D)","(1995,juice_head)"}')
, (3, 'bob' , '{"(2001,mo_jeter)"}')
;
Query
Requires Postgres 9.3+.
SELECT t.name, c.cardname
FROM my_friends_cards t
, unnest(t.rare_cards_composite) c
WHERE t.name = ANY('{timmy,jimmy}')
AND c.year > 1990;
Note that the composite type is decomposed in the unnesting.

Recursively duplicating entries

I am attempting to duplicate an entry. That part isn't hard. The tricky part is: there are n entries connected with a foreign key. And for each of those entries, there are n entries connected to that. I did it manually using a lookup to duplicate and cross reference the foreign keys.
Is there some subroutine or method to duplicate an entry and search for and duplicate foreign entries? Perhaps there is a name for this type of replication I haven't stumbled on yet, is there a specific database related title for this type of operation?
PostgreSQL 8.4.13
main entry (uid is serial)
uid | title
-----+-------
1 | stuff
department (departmentid is serial, uidref is foreign key for uid above)
departmentid | uidref | title
--------------+--------+-------
100 | 1 | Foo
101 | 1 | Bar
sub_category of department (textid is serial, departmentref is foreign for departmentid above)
textid | departmentref | title
-------+---------------+----------------
1000 | 100 | Text for Foo 1
1001 | 100 | Text for Foo 2
1002 | 101 | Text for Bar 1
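You can do it all in a single statement using data-modifying CTEs (requires Postgres 9.1 or later).
Your primary keys being serial columns makes it easier. One catch: RETURNING only hands back the new keys, while the child rows still hang off the old ones, so map old keys to new keys before inserting the children:
WITH m AS (
INSERT INTO main (title) -- all columns except pk
SELECT title
FROM main
WHERE uid = 1
RETURNING uid AS new_uid -- returns new uid
)
, dmap AS ( -- old departmentid -> pre-fetched new departmentid
SELECT d.departmentid AS old_id
, nextval(pg_get_serial_sequence('department', 'departmentid')) AS new_id
, d.title
FROM department d
WHERE d.uidref = 1 -- children of the original row
)
, d AS (
INSERT INTO department (departmentid, uidref, title)
SELECT dmap.new_id, m.new_uid, dmap.title
FROM dmap CROSS JOIN m
)
INSERT INTO sub_category (departmentref, title)
SELECT dmap.new_id, s.title
FROM dmap
JOIN sub_category s ON s.departmentref = dmap.old_id;
The column lists assume the tables hold only the columns shown in the question. Add any other columns you have, leaving out only the generated primary keys (pk), like main.uid and sub_category.textid.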
The query returns nothing. You can return pretty much anything. You just didn't specify anything.
You wouldn't call that "replication". That term usually is applied for keeping multiple database instances or objects in sync. You are just duplicating an entry - and depending objects recursively.
Aside about naming conventions:
It would get even simpler with a naming convention that labels all columns signifying "ID of table foo" with the same (descriptive) name, like foo_id. There are other naming conventions floating around, but this is the best for writing queries, IMO.
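The same old-key-to-new-key bookkeeping can also be done step by step in client code, which is roughly the manual lookup the question describes. The sketch below uses Python's standard-library sqlite3 module as a stand-in for PostgreSQL; the function name duplicate_main is hypothetical, and the tables contain only the columns shown in the question.

```python
import sqlite3

def duplicate_main(conn, uid):
    """Copy one main row with its departments and sub_categories; return the new uid."""
    row = conn.execute("SELECT title FROM main WHERE uid = ?", (uid,)).fetchone()
    if row is None:
        raise LookupError(f"no main row with uid {uid}")
    with conn:  # commit everything together, or roll back on error
        new_uid = conn.execute("INSERT INTO main (title) VALUES (?)", row).lastrowid
        departments = conn.execute(
            "SELECT departmentid, title FROM department WHERE uidref = ?", (uid,)
        ).fetchall()
        for old_dep, title in departments:
            new_dep = conn.execute(
                "INSERT INTO department (uidref, title) VALUES (?, ?)", (new_uid, title)
            ).lastrowid
            # Children of the old department are re-pointed at the new one.
            conn.execute(
                "INSERT INTO sub_category (departmentref, title) "
                "SELECT ?, title FROM sub_category WHERE departmentref = ?",
                (new_dep, old_dep),
            )
    return new_uid


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE main (uid INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE department (departmentid INTEGER PRIMARY KEY, uidref INTEGER REFERENCES main (uid), title TEXT);
CREATE TABLE sub_category (textid INTEGER PRIMARY KEY, departmentref INTEGER REFERENCES department (departmentid), title TEXT);
INSERT INTO main VALUES (1, 'stuff');
INSERT INTO department VALUES (100, 1, 'Foo'), (101, 1, 'Bar');
INSERT INTO sub_category VALUES (1000, 100, 'Text for Foo 1'), (1001, 100, 'Text for Foo 2'), (1002, 101, 'Text for Bar 1');
""")
new_uid = duplicate_main(conn, 1)
```

This costs one round trip per department, so the single-statement approach is preferable when the server supports it.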

Fetch a single field from DB table into itab

I want to fetch a field, say excep_point, from a transparent table z_accounts for a given combination of company_code and account_number. How can I do this in ABAP SQL?
Assume that table structure is
|company_code | account_number | excep_point |
Assuming you have the full primary key...
data: gv_excep_point type zaccounts-excep_point.
select single excep_point
into gv_excep_point
from zaccounts
where company_code = some_company_code
and account_number = some_account_number.
If you don't have the full PK and there could be multiple values for excep_point:
data: gt_excep_points type table of zaccounts-excep_point.
select excep_point
into table gt_excep_points
from zaccounts
where company_code = some_company_code
and account_number = some_account_number.
There is at least another variation, but those are 2 I use most often.
For information only: when you select data into an internal table, you can write expressions that combine several database fields into one target field. Say you have an internal table (itab) with two fields, "A" and "B", and you want to select from a DB table (dbtab) which has 6 columns, "z", "x", "y", "u", "v", "w", each of type char2. The aim is to combine "z", "x", "y", "u" in field "A" of the internal table and "v", "w" in field "B". You can write:
select z as A+0(2)
x as A+2(2)
y as A+4(2)
u as A+6(2)
v as B+0(2)
w as B+2(2) FROM dbtab
INTO CORRESPONDING FIELDS OF TABLE itab
WHERE <where condition>.
This short statement gets the job done.
In addition to Bryan's answer, here is the official online documentation about Open SQL.

SQL Server 2008 localization of tables

I need to localize a SQL Server 2008 database. After investigating recommendations, I have found that it is best to have separate tables for each of the languages for the strings. That way different sorting settings can be set for each table. For example, a typical Product table has ProdID, Product Description, and Price fields. The recommended solution is to reduce the Product table to ProdID and Price. Then a specific table for each language would have the following structure: ProdID and Description.
My question is how do I create a stored procedure that takes the culture as a parameter and uses it to pick the sub-table to join? The sub-table needs to change based on the parameter. How can that be done? I am using SQL Server 2008.
First off, are you sure you really want to implement different tables for each culture? It would make more sense to modify your Product table to remove the description, and then add a ProductDescription table with a ProdID, culture, and description field. This way you don't have to toy around with dynamic SQL (which is what you'll have to use) to select the correct table based on the culture parameter.
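A minimal sketch of that single-table layout follows, using Python's standard-library sqlite3 module as a stand-in for SQL Server. The table and column names follow the suggestion above; the Culture codes, the sample rows and the helper name products_for_culture are made up for illustration. The culture is passed as an ordinary query parameter, so no dynamic SQL is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (ProdID INTEGER PRIMARY KEY, Price REAL NOT NULL);
CREATE TABLE ProductDescription (
    ProdID      INTEGER NOT NULL REFERENCES Product (ProdID),
    Culture     TEXT NOT NULL,
    Description TEXT NOT NULL,
    PRIMARY KEY (ProdID, Culture)  -- one description per product and culture
);
INSERT INTO Product VALUES (1, 9.99);
INSERT INTO ProductDescription VALUES (1, 'en-US', 'Red bicycle'), (1, 'fr-FR', 'Vélo rouge');
""")

def products_for_culture(conn, culture):
    """Return (ProdID, Description, Price) rows for one culture."""
    return conn.execute(
        "SELECT p.ProdID, d.Description, p.Price "
        "FROM Product p "
        "JOIN ProductDescription d ON d.ProdID = p.ProdID "
        "WHERE d.Culture = ? "
        "ORDER BY p.ProdID",
        (culture,),
    ).fetchall()

french = products_for_culture(conn, "fr-FR")
```

In a stored procedure the same SELECT would take the culture as its parameter in place of the ? placeholder.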
...specific table for each language would have the following structure: ProdID and Description.
...which is why you're having to look at a really involved setup to get your information out of the database.
A better approach would be to use a single table, and use a code for the language. You don't want to be defining a column per attribute you want translated either, so you'd be looking at implementing something like:
LANGUAGES table
LANGUAGE_ID, pk
LANGUAGE_DESCRIPTION
Example data:
LANGUAGE_ID | LANGUAGE_DESCRIPTION
------------------------------------
1 | ENGLISH
2 | FRENCH
TRANSLATED_ATTRIBUTES table
TRANSLATED_ATTRIBUTE_ID, pk
TRANSLATED_ATTRIBUTE_DESC
Example data:
TRANSLATED_ATTRIBUTE_ID | TRANSLATED_ATTRIBUTE_DESC
------------------------------------
1 | PROD_ID
2 | PROD_DESC
LOCALIZATIONS table
LANGUAGE_ID, pk
TRANSLATED_ATTRIBUTE_ID, pk
TRANSLATED_VALUE
Example data:
LANGUAGE_ID | TRANSLATED_ATTRIBUTE_ID | TRANSLATED_VALUE
----------------------------------------------------------
1 | 1 | Product ID
2 | 1 | Produit ID
You'll want a table associating the TRANSLATED_ATTRIBUTE_ID with a given item - Product is the example you've given so:
ATTRIBUTES table
ATTRIBUTE_ID, pk
ATTRIBUTE_TYPE_CODE, fk
TRANSLATED_ATTRIBUTE_ID, fk
Example data:
ATTRIBUTE_ID | ATTRIBUTE_TYPE_CODE | TRANSLATED_ATTRIBUTE_ID
----------------------------------------------------------------
1 | PRODUCT | 1
If you want to relate on a per product basis:
ATTRIBUTES table
ATTRIBUTE_ID, pk
PRODUCT_ID, fk
TRANSLATED_ATTRIBUTE_ID, fk
Now you can use two parameters, the language (English) and what the item is (Product):
SELECT ta.translated_attribute_desc,
t.translated_value
FROM LOCALIZATIONS t
JOIN TRANSLATED_ATTRIBUTES ta ON ta.translated_attribute_id = t.translated_attribute_id
JOIN ATTRIBUTES a ON a.translated_attribute_id = ta.translated_attribute_id
JOIN ATTRIBUTE_TYPE_CODES atc ON atc.attribute_type_code = a.attribute_type_code
JOIN LANGUAGES lang ON lang.language_id = t.language_id
WHERE lang.language_description = 'ENGLISH' --alternate: lang.language_id = 1
AND atc.attribute_type_code = 'PRODUCT'
You can pivot the data as necessary.