Sort SQL results and include missing keys

I have a Postgres table like this (greatly simplified):
id | object_id (foreign id) | key (text) | value (text)
---+------------------------+------------+-------------
 1 | 1                      | A          | 0foo
 2 | 1                      | B          | 1bar
 3 | 1                      | C          | 2baz
 4 | 1                      | D          | 3ham
 5 | 2                      | C          | 4sam
 6 | 3                      | F          | 5pam
…
(billions of rows)
I select object_ids according to some query (not relevant here), and then sort them according to the value of a specified key.
def sort_query_result(query, sort_by, limit, offset):
    return query\
        .with_entities(Table.object_id)\
        .filter(Table.key == sort_by)\
        .order_by(desc(Table.value))\
        .limit(limit).offset(offset).subquery()
For example, assume a query matches object_ids 1 and 2 above. When sort_by=C, I want the result to be returned in the order [2, 1], because 4sam > 2baz.
This works well but there's one big problem:
Object ids that are returned by the query but do not have any row for the sort_by key are not returned at all.
For example, for a query that matches object_ids 1 and 2, sort_query_result(query, sort_by='D') == [1]. The object_id 2 is dropped because it has no D row, which is undesirable.
Instead, I'd like to return all object_ids from the query. Those without the sort key should be sorted at the end, in any order: sort_query_result(query, sort_by='D') == [1, 2].
What's the best way to achieve that?
Note: I do not have the freedom to change the DB schema or business logic. But I can change the query code. I use SQLAlchemy ORM from Python, but could execute raw Postgres commands if necessary. Thank you.
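One common way to get this behaviour (a sketch only, not from the original post: "matched" stands for the set of object_ids produced by the original query, and "t" for the key/value table) is to move the key filter out of the WHERE clause into a LEFT JOIN condition, so that objects without the sort key survive with a NULL value, and then push those NULLs to the end of the sort:

SELECT m.object_id
FROM matched m
LEFT JOIN t ON t.object_id = m.object_id AND t.key = 'D'
ORDER BY t.value DESC NULLS LAST
LIMIT 10 OFFSET 0;

In SQLAlchemy terms this corresponds roughly to an outerjoin() plus order_by(nullslast(desc(Table.value))).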

Related

Filtering Columns in PLSQL

I have a table with tons and tons of columns and I'm trying to select only certain columns based on the data the columns contain. The table is part of an application I'm building in Oracle APEX and looks something like this:
| Row Header | Criteria 1 | Criteria 2 | Criteria 3 | Criteria 4 | Criteria 5 |
| Category   | Type A     | Type B     | Type B     | Type A     | Type A     |
| ID         | 2.3        | 2.4        | 2.5        | 3.1        | 3.2        |
| Part A     | Yes        | Yes        | Yes        | No         | Yes        |
| Part B     | Yes        | No         | Yes        | Yes        | Yes        |
| Part C     | No         | Yes        | Yes        | Yes        | No         |
It goes on like this for around 1000ish criteria and 100ish parts. I need to find a way to select all the columns that are of a specific type into their own table using SQL.
I'd like the return to look like this:
| Row Header | Criteria 1 | Criteria 5 |
| Category   | Type A     | Type A     |
| ID         | 3.1        | 3.2        |
| Part A     | No         | Yes        |
| Part B     | Yes        | Yes        |
| Part C     | Yes        | No         |
This way I only have the columns showing that are part of the "Type A" Category and have an ID greater than 3.
I've looked into GROUP BY and FILTER functions that SQL has to offer as well as PIVOT and I don't believe these will help me, but I'd be happy to be proven wrong.
In a relational database, columns are meant to be discrete, non-repeating attributes of a thing. Rows are meant to be multiple instances of that thing. Your table is reversed, using columns for what should be rows, and rows for what should be columns. Another factor is that Oracle limits you to 1000 columns, and you start undergoing severe performance degradation when you exceed 254 columns. Tables simply weren't meant to have hundreds, let alone thousands, of columns. So the first step is to pivot your table like this:
Criteria_No, Cat, ID, PtA, PtB, PtC
---------------------------------------------
Row 1: Criteria 1, Type A, 2.3, Yes, Yes, No
Row 2: Criteria 2, Type B, 2.4, Yes, No, Yes
Row 3: Criteria 3, Type B, 2.5, Yes, Yes, Yes
. . . thousands more
But even then, you mentioned that you have 100s of "parts", so Parts A, B, C aren't the only three - the series continues. If so, it would be a violation of normal form to have such a repeating list in a single row. So you have one more step to fix your design: Break this into three tables.
CRITERIA
Criteria_No, Cat, ID
---------------------------------------------
Row 1: Criteria 1, Type A, 2.3
Row 2: Criteria 2, Type B, 2.4
Row 3: Criteria 3, Type B, 2.5
PARTS
Part, anything-else-about-part
-----------------
Part A, blah
Part B, blah
Part C, blah
. . .
And now the bridge table between them:
CRITERIA_PARTS
Criteria_No, Part
-----------------
1, Part A
1, Part B
1, Part C
2, Part A
2, Part B
. . . and so on
You should also place a foreign key on each of the bridge table columns to point to their respective parent tables to ensure data integrity.
Now you query by joining the tables together in your SQL.
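For instance, a join over the proposed tables that reproduces the original goal (Type A categories with an ID greater than 3) could look roughly like this sketch (column names follow the layout above):

SELECT c.criteria_no, c.cat, c.id, p.part
FROM criteria c
JOIN criteria_parts cp ON cp.criteria_no = c.criteria_no
JOIN parts p ON p.part = cp.part
WHERE c.cat = 'Type A'
  AND c.id > 3;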
Updated: you asked how to move data into this new criteria table from your existing one. Use dynamic SQL like this:
BEGIN
  FOR i IN 1..1000
  LOOP
    EXECUTE IMMEDIATE 'INSERT INTO criteria (criteria_no,cat,id) SELECT criteria_'||i||',category,id FROM oldtable';
  END LOOP;
  COMMIT;
END;
But of course set the 1000 to the real number of criteria_n columns.
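If that number isn't known offhand, a quick sketch for finding it (assuming the old table is literally named OLDTABLE and its columns follow the criteria_n pattern) is to count them in the data dictionary:

SELECT COUNT(*)
FROM user_tab_columns
WHERE table_name = 'OLDTABLE'
  AND column_name LIKE 'CRITERIA\_%' ESCAPE '\';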

Filter json values regardless of keys in PostgreSQL

I have a table called diary which includes columns listed below:
| id | user_id | custom_foods |
|----|---------|--------------------|
| 1 | 1 | {"56": 2, "42": 0} |
| 2 | 1 | {"19861": 1} |
| 3 | 2 | {} |
| 4 | 3 | {"331": 0} |
I would like to count, for each user, how many diaries have custom_foods value(s) larger than 0. I don't care about the keys, since the keys can be any number in string form.
The desired output is:
| user_id | count |
|---------|---------|
| 1 | 2 |
| 2 | 0 |
| 3 | 0 |
I started with:
select *
from diary as d
join json_each_text(d.custom_foods) as e
on d.custom_foods != '{}'
where e.value > 0
I don't even know whether the syntax is correct. Now I am getting the error:
ERROR: function json_each_text(text) does not exist
LINE 3: join json_each_text(d.custom_foods) as e
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
The version I am using is: psql (10.5 (Ubuntu 10.5-1.pgdg14.04+1), server 9.4.19). According to the PostgreSQL 9.4.19 documentation, that function should exist. I am so confused that I don't know how to proceed.
Threads that I referred to:
Postgres and jsonb - search value at any key
Query postgres jsonb by value regardless of keys
Your custom_foods column is defined as text, so you should cast it to json before applying json_each_text. Since json_each_text produces no rows for empty JSON objects, you can get the count as 0 for those rows from a separate CTE and combine the results with a UNION ALL:
WITH empty AS (
  SELECT DISTINCT user_id,
         0 AS count
  FROM diary
  WHERE custom_foods = '{}' )
SELECT user_id,
       count(CASE WHEN value::int > 0 THEN 1 END)
FROM diary d,
     json_each_text(d.custom_foods::json)
GROUP BY user_id
UNION ALL
SELECT *
FROM empty
ORDER BY user_id;
Demo

PostgreSQL: Efficiently split JSON array into rows

I have a table (Table A) that includes a text column that contains JSON encoded data.
The JSON data is always an array with between one and a few thousand plain objects.
I have another table (Table B) with a few columns, including a column with a datatype of 'JSON'
I want to select all the rows from table A, split the json array into its elements and insert each element into table B
Bonus objective: Each object (almost) always has a key, x. I want to pull the value of x out into its own column, and delete x from the original object (if it exists).
E.g.: Table A
| id | json_array (text) |
+----+--------------------------------+
| 1 | '[{"x": 1}, {"y": 8}]' |
| 2 | '[{"x": 2, "y": 3}, {"x": 1}]' |
| 3 | '[{"x": 8, "z": 2}, {"z": 3}]' |
| 4 | '[{"x": 5, "y": 2, "z": 3}]' |
...would become: Table B
| id | a_id | x | json (json) |
+----+------+------+--------------------+
| 0 | 1 | 1 | '{}' |
| 1 | 1 | NULL | '{"y": 8}' |
| 2 | 2 | 2 | '{"y": 3}' |
| 3 | 2 | 1 | '{}' |
| 4 | 3 | 8 | '{"y": 2}' |
| 5 | 3 | NULL | '{"z": 3}' |
| 6 | 4 | 5 | '{"y": 2, "z": 3}' |
This initially has to work on a few million rows, and would then need to be run at regular intervals, so making it efficient would be a priority.
Is it possible to do this without using a loop and PL/PgSQL? I haven't been making much progress.
The json data type is not particularly suitable (or intended) for modification at the database level. Extracting "x" objects from the JSON object is therefore cumbersome, although it can be done.
You should create your table B (with hopefully a more creative column name than "json"; I am using item here) and make the id column a serial that starts at 0. A pure json solution then looks like this:
INSERT INTO b (a_id, x, item)
SELECT sub.a_id, sub.x,
       ('{' ||
        string_agg(
          CASE WHEN i.k IS NULL THEN '' ELSE '"' || i.k || '":' || i.v END,
          ', ') ||
        '}')::json
FROM (
  SELECT a.id AS a_id, (j.items->>'x')::integer AS x, j.items
  FROM a, json_array_elements(json_array) j(items) ) sub
LEFT JOIN json_each(sub.items) i(k,v) ON i.k <> 'x'
GROUP BY sub.a_id, sub.x
ORDER BY sub.a_id;
In the sub-query this extracts the a_id and x values, as well as the JSON object. In the outer query the JSON object is broken into its individual pieces and the objects with key x thrown out (the LEFT JOIN ON i.k <> 'x'). In the select list the pieces are put back together again with string concatenation and grouped into compound objects.
This necessarily has to be like this because json has no built-in manipulation functions of any consequence. This works on PG versions 9.3+, i.e. since time immemorial insofar as JSON support is concerned.
If you are using PG9.5+, the solution is much simpler through a cast to jsonb:
INSERT INTO b (a_id, x, item)
SELECT a.id, (j.items->>'x')::integer, j.items #- '{x}'
FROM a, jsonb_array_elements(json_array::jsonb) j(items);
The #- operator on the jsonb data type does all the dirty work here. Obviously, there is a lot of work going on behind the scenes, converting json to jsonb, so if you find that you need to manipulate your JSON objects more frequently then you are better off using the jsonb type to begin with. In your case I suggest you do some benchmarking with EXPLAIN ANALYZE SELECT ... (you can safely forget about the INSERT while testing) on perhaps 10,000 rows to see which works best for your setup.
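For example, the jsonb variant could be timed on a sample roughly like this (a sketch; the LIMIT of 10,000 rows is arbitrary, following the suggestion above):

EXPLAIN ANALYZE
SELECT a.id, (j.items->>'x')::integer, j.items #- '{x}'
FROM a, jsonb_array_elements(json_array::jsonb) j(items)
LIMIT 10000;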

Custom ordering in sqlite

Is there a way to have a custom order by query in sqlite?
For example, I have essentially an enum
_id|Name|Key
------------
1 | One | Named
2 | Two | Contributing
3 | Three | Named
4 | Four | Key
5 | Five | Key
6 | Six | Contributing
7 | Seven | Named
And the values in the 'Key' column have an ordering. Say Key > Named > Contributing.
Is there a way to make
SELECT * FROM table ORDER BY Key
return something to the effect of
_id|Name|Key
------------
4 | Four | Key
5 | Five | Key
1 | One | Named
3 | Three | Named
7 | Seven | Named
2 | Two | Contributing
6 | Six | Contributing
this?
SELECT _id, Name, Key
FROM my_table t
ORDER BY CASE WHEN Key = 'Key' THEN 0
              WHEN Key = 'Named' THEN 1
              WHEN Key = 'Contributing' THEN 2
         END, _id;
If you have a lot of CASEs (or a complicated set of conditions), Adam's solution may result in an extremely large query.
SQLite does allow you to write your own functions (in C++). You could write a function to return values similar to the way Adam does, but because you're using C++, you could work with a much larger set of conditions (or separate table, etc).
Once the function is written, you can refer to it in your SELECT as if it were a built-in function:
SELECT * FROM my_table ORDER BY MyOrder(Key)
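The "separate table" idea mentioned above can also be done in plain SQL, along these lines (a sketch; key_order and rank are made-up names):

CREATE TABLE key_order (Key TEXT PRIMARY KEY, rank INTEGER);
INSERT INTO key_order VALUES ('Key', 0), ('Named', 1), ('Contributing', 2);

SELECT t._id, t.Name, t.Key
FROM my_table t
JOIN key_order k ON k.Key = t.Key
ORDER BY k.rank, t._id;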
Did you try (not tested on my side but relying on a technique I previously used):
ORDER BY Key = 'Key' DESC,
         Key = 'Named' DESC,
         Key = 'Contributing' DESC

How to properly group SQL results set?

SQL noob, please bear with me!!
I am storing a 3-tuple in a database (x,y, {signal1, signal2,..}).
I have a database with tables coordinates (x,y) and another table called signals (signal, coordinate_id, group) which stores the individual signal values. There can be several signals at the same coordinate.
The group is just an abitrary integer which marks the entries in the signal table as belonging to the same set (provided they belong to the same coordinate). So that any signals with the same 'coordinate_id' and 'group' together form a tuple as shown above.
For example,
Coordinates table
| id | x | y |
| 1  | 1 | 2 |
| 2  | 2 | 5 |

Signals table
| id | signal | coordinate_id | group |
| 1  | 45     | 1             | 1     |
| 2  | 95     | 1             | 1     |
| 3  | 33     | 1             | 1     |
| 4  | 65     | 1             | 2     |
| 5  | 57     | 1             | 2     |
| 6  | 63     | 2             | 1     |
This would produce the tuples (1, 2, {45, 95, 33}), (1, 2, {65, 57}), (2, 5, {63}) and so on.
I would like to retrieve the sets of {signal1, signal2,...} for each coordinate. The signals belonging to a set have the same coordinate_id and group, but I do not necessarily know the group value. I only know that if the group value is the same for a particular coordinate_id, then all those with that group form one set.
I tried looking into SQL GROUP BY, but I realized that it is for use with aggregate functions.
Can someone point out how to do this properly in SQL or give tips for improving my database structure.
SQLite supports the GROUP_CONCAT() aggregate function similar to MySQL. It rolls up a set of values in the group and concatenates them together comma-separated.
SELECT c.x, c.y, GROUP_CONCAT(s.signal) AS signal_list
FROM Signals s
JOIN Coordinates c ON s.coordinate_id = c.id
GROUP BY s.coordinate_id, s."group"
SQLite also permits the mismatch between columns in the select-list and columns in the group-by clause, even though this isn't strictly permitted by ANSI SQL and most implementations.
personally I would write the database as 3 tables:
x_y(x, y, id)
coords_groups(pos, group, id)
signals(group, signal)
with signals.group->coords_groups.id and coords_groups.pos->x_y.id
as you are trying to represent a sort-of 4 dimensional array.
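In DDL terms that layout might look roughly like this (a sketch; column types are assumptions, and "group" is quoted because it is a reserved word):

CREATE TABLE x_y (
  id INTEGER PRIMARY KEY,
  x  INTEGER,
  y  INTEGER
);
CREATE TABLE coords_groups (
  id      INTEGER PRIMARY KEY,
  pos     INTEGER REFERENCES x_y(id),
  "group" INTEGER
);
CREATE TABLE signals (
  "group" INTEGER REFERENCES coords_groups(id),
  signal  INTEGER
);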
Then, to get from a coordinate pair (X, Y) to an ArrayList of Lists of Signals, you can use this:
SELECT temp."group", signals.signal
FROM (
  SELECT cg."group", cg.id
  FROM x_y JOIN coords_groups AS cg ON x_y.id = cg.pos
  WHERE x_y.x = X AND x_y.y = Y
) AS temp
JOIN signals ON temp.id = signals."group"
ORDER BY temp."group" ASC
(X and Y go in the innermost WHERE)
inside this sort of pseudo-code:
getSignalsGroups(X, Y)
    ArrayList<List<Signals>> a
    List<Signals> temp
    query = sqlLiteExecute(THE_SQL_SNIPPET, x, y)
    row = query.fetch()              // fetch the first row to set the group counter
    actualGroup = row.group
    temp.add(row.signal)
    for(row : query)                 // for each row, add the signal to the list
        if(row.group != actualGroup) // or start a new list if it is a new group
            a.add(actualGroup, temp)
            actualGroup = row.group; temp = new List
        temp.add(row.signal)
    a.add(actualGroup, temp)         // don't forget the last group
    return a