PostgreSQL ORDER BY out of order - SQL

I have a database where I need to retrieve the data in the same order it was populated into the table. The table name is bible. When I type table bible; in psql, it prints the data in the order it was populated, but when I try to query it, some rows always come back out of order, as in the example below:
table bible
-[ RECORD 1 ]-----------------------------------------------------------------------------------------------------------------------------------------
id | 1
day | 1
book | Genesis
chapter | 1
verse | 1
text | In the beginning God created the heavens and the earth.
link | https://api.biblia.com/v1/bible/content/asv.txt.txt?passage=Genesis1.1&key=dc5e2d416f46150bf6ceb21d884b644f
-[ RECORD 2 ]-----------------------------------------------------------------------------------------------------------------------------------------
id | 2
day | 1
book | John
chapter | 1
verse | 1
text | In the beginning was the Word, and the Word was with God, and the Word was God.
link | https://api.biblia.com/v1/bible/content/asv.txt.txt?passage=John1.1&key=dc5e2d416f46150bf6ceb21d884b644f
-[ RECORD 3 ]-----------------------------------------------------------------------------------------------------------------------------------------
id | 3
day | 1
book | John
chapter | 1
verse | 2
text | The same was in the beginning with God.
link | https://api.biblia.com/v1/bible/content/asv.txt.txt?passage=John1.2&key=dc5e2d416f46150bf6ceb21d884b644f
Everything is in order, but when I query the same thing using, for example, select * from bible where day='1', select * from bible where day='1' order by day, or select * from bible where day='1' order by day, id;, I always get some rows out of order, either within the selected day (here 1) or within any other day.
I have been using Django to interface with the Postgres database, but since I found this problem I have also tried querying with plain SQL; still, I get rows out of order, even though they all have unique ids, which I verified with select count(distinct id), count(id) from bible;
-[ RECORD 1 ]-----------------------------------------------------------------------------------------------------------------------------------------
id | 1
day | 1
book | Genesis
chapter | 1
verse | 1
text | In the beginning God created the heavens and the earth.
link | https://api.biblia.com/v1/bible/content/asv.txt.txt?passage=Genesis1.1&key=dc5e2d416f46150bf6ceb21d884b644f
-[ RECORD 2 ]-----------------------------------------------------------------------------------------------------------------------------------------
id | 10
day | 1
book | Colossians
chapter | 1
verse | 18
text | And he is the head of the body, the church: who is the beginning, the firstborn from the dead; that in all things he might have the preeminence.
link | https://api.biblia.com/v1/bible/content/asv.txt.txt?passage=Colossians1.18&key=dc5e2d416f46150bf6ceb21d884b644f
-[ RECORD 3 ]-----------------------------------------------------------------------------------------------------------------------------------------
id | 11
day | 1
book | Genesis
chapter | 1
verse | 2
text | And the earth was waste and void; and darkness was upon the face of the deep: and the Spirit of God moved upon the face of the waters.
link | https://api.biblia.com/v1/bible/content/asv.txt.txt?passage=Genesis1.2&key=dc5e2d416f46150bf6ceb21d884b644f
As you can see, the ids come back in the order 1, 10, 11: they are being sorted as text, character by character, not as numbers.
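This can be reproduced on any Postgres database; a minimal demonstration of how text ordering behaves:
SELECT x FROM (VALUES ('1'), ('2'), ('10')) AS t(x) ORDER BY x;
-- returns 1, 10, 2, because '1' < '10' < '2' when compared as text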
My table:
Table "public.bible"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
---------+------+-----------+----------+---------+----------+--------------+-------------
id | text | | | | extended | |
day | text | | | | extended | |
book | text | | | | extended | |
chapter | text | | | | extended | |
verse | text | | | | extended | |
text | text | | | | extended | |
link | text | | | | extended | |
Access method: heap
The id field is of type text because I used pandas' to_sql() method to populate the bible table. I tried dropping the id column and adding it again as a PK with ALTER TABLE bible ADD COLUMN id SERIAL PRIMARY KEY;, but I still get data returned out of order.
Is there any way I can retrieve the data ordered by id without having some of the rows completely out of order? Thank you in advance!

Thou shalt cast thy id to integer to order it as a number.
SELECT * FROM bible ORDER BY cast(id AS integer);
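Postgres also accepts the shorthand cast syntax: ORDER BY id::integer.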

While @jordanvrtanoski is correct, the way to do this in Django is:
>>> Bible.objects.extra(select={'id': 'CAST(id AS INTEGER)'}).order_by('id').values('id')
<QuerySet [{'id': 1}, {'id': 2}, {'id': 3}, {'id': 10}, {'id': 20}]>
Side note: If you want to filter on day as an example, you can do this:
>>> Bible.objects.extra(select={
'id': 'CAST(id AS INTEGER)',
'day': 'CAST(day AS INTEGER)'}
).order_by('id').values('id', 'day').filter(day=2)
<QuerySet [{'id': 2, 'day': 2}, {'id': 10, 'day': 2}, {'id': 11, 'day': 2}, {'id': 20, 'day': 2}]>
Otherwise you get this issue (notice that 1 is followed by 10, not 2):
>>> Bible.objects.order_by('id').values('id')
<QuerySet [{'id': '1'}, {'id': '10'}, {'id': '2'}, {'id': '20'}, {'id': '3'}]>
I HIGHLY suggest you DO NOT do any of this, and instead set up your tables correctly (use the proper column types rather than making everything text), or your query performance is going to suck... BIG TIME.

Building on the answers from @jordanvrtanoski and @Javier Buzzi, and some searching online: the issue is that the ids are of type TEXT (or VARCHAR), so the id column needs to be converted to type INTEGER, as follows:
ALTER TABLE bible ALTER COLUMN id TYPE integer USING (id::integer);
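If the column should also keep auto-numbering new rows, a sequence can be attached as well; a sketch (the sequence name matches the default visible in the table below, and assumes no such sequence exists yet):
CREATE SEQUENCE bible_id_seq OWNED BY bible.id;
SELECT setval('bible_id_seq', (SELECT max(id) FROM bible));
ALTER TABLE bible ALTER COLUMN id SET DEFAULT nextval('bible_id_seq');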
Now here is my table
Table "public.bible"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
---------+---------+-----------+----------+-----------------------------------------+----------+--------------+-------------
id | integer | | | nextval('bible_id_seq'::regclass) | plain | |
day | text | | | | extended | |
book | text | | | | extended | |
chapter | text | | | | extended | |
verse | text | | | | extended | |
text | text | | | | extended | |
link | text | | | | extended | |
Indexes:
"lesson_unique_id" UNIQUE CONSTRAINT, btree (id)
Referenced by:
TABLE "notes_note" CONSTRAINT "notes_note_verse_id_5586a4bf_fk" FOREIGN KEY (verse_id) REFERENCES days_lesson(id) DEFERRABLE INITIALLY DEFERRED
Access method: heap
Hope this helps other people, and thank you everyone!

Related

Calculate Equation From Separate Tables Data

I'm working on my senior High School Project and am reaching out to the community for help! (As my teacher doesn't know the answer to my question).
I have a simple "Products" table, as shown below:
I also have an "Orders" table, shown below:
Is there a way I can create a field in the "Orders" table named "Total Cost" and make it automatically calculate the total cost of all the products selected?
Firstly, I would advise against storing calculated values, and would also strongly advise against using calculated fields in tables. In general, calculations should be performed by queries.
I would also strongly advise against the use of multivalued fields, as your images appear to show.
In general, when following the rules of database normalisation, most sales databases are structured in a very similar manner, containing the following main tables (amongst others):
Products (aka Stock Items)
Customers
Order Header
Order Line (aka Order Detail)
A good example for you to learn from would be the classic Northwind sample database provided free of charge as a template for MS Access.
With the above structure, observe that each table serves a purpose with each record storing information pertaining to a single entity (whether it be a single product, single customer, single order, or single order line).
For example, you might have something like:
Products
Primary Key: Prd_ID
+--------+-----------+-----------+
| Prd_ID | Prd_Desc | Prd_Price |
+--------+-----------+-----------+
| 1 | Americano | $8.00 |
| 2 | Mocha | $6.00 |
| 3 | Latte | $5.00 |
+--------+-----------+-----------+
Customers
Primary Key: Cus_ID
+--------+--------------+
| Cus_ID | Cus_Name |
+--------+--------------+
| 1 | Joe Bloggs |
| 2 | Robert Smith |
| 3 | Lee Mac |
+--------+--------------+
Order Header
Primary Key: Ord_ID
Foreign Keys: Ord_Cust
+--------+----------+------------+
| Ord_ID | Ord_Cust | Ord_Date |
+--------+----------+------------+
| 1 | 1 | 2020-02-16 |
| 2 | 1 | 2020-01-15 |
| 3 | 2 | 2020-02-15 |
+--------+----------+------------+
Order Line
Primary Key: Orl_Order + Orl_Line
Foreign Keys: Orl_Order, Orl_Prod
+-----------+----------+----------+---------+
| Orl_Order | Orl_Line | Orl_Prod | Orl_Qty |
+-----------+----------+----------+---------+
| 1 | 1 | 1 | 2 |
| 1 | 2 | 3 | 1 |
| 2 | 1 | 2 | 1 |
| 3 | 1 | 1 | 4 |
| 3 | 2 | 3 | 2 |
+-----------+----------+----------+---------+
You might also opt to store the product description & price on the order line records, so that these are retained at the point of sale, as the information in the Products table is likely to change over time.
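With that structure, the order total the question asks about is computed on demand by a query rather than stored. A sketch in Access-style SQL, using the sample names above:
SELECT oh.Ord_ID, SUM(ol.Orl_Qty * p.Prd_Price) AS Total_Cost
FROM ([Order Header] AS oh
INNER JOIN [Order Line] AS ol ON ol.Orl_Order = oh.Ord_ID)
INNER JOIN Products AS p ON p.Prd_ID = ol.Orl_Prod
GROUP BY oh.Ord_ID;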

How should I design a table where a row can have different columns depending on the type of row?

I'm planning to use the Reddit API and store my saved posts in a database. The saves can be of two types, Comments or Posts. Both have a few common columns (author, score, subreddit, etc.) and a few columns unique to each category:
comment - body_text, comment_id, parent_id, etc.
posts - selftext,link_url,is_video, etc.
I decided to separate the 2 categories into their own tables - Comments table and Posts table. But I don't know how to link these tables to the master table "saves".
My current solution is to have a column kind for the type of save, with the comment_id and post_id columns linking the save to its own table. However, this feels like a messy and somewhat cumbersome solution: a save can have either a comment_id or a post_id (but not both, and not neither), and I also have to manage this constraint.
Saves Table :
+----+---------+-------+---------------------------------------------+---------+------------+---------+
| ID | Kind | title | post_url | author | comment_id | post_id |
+----+---------+-------+---------------------------------------------+---------+------------+---------+
| 1 | comment | abc | https://redd.i/redditpostid/redditcommentid | FusionX | 1 | NULL |
| 2 | post | xyz | https://redd.i/redditpostid | XnoisuF | NULL | 1 |
+----+---------+-------+---------------------------------------------+---------+------------+---------+
Post Table :
+----+---------+-------------------------------------------+-----------------------+--------------+--------------+
| ID | is_self | selftext | post_url | num_comments | thumbnail |
+----+---------+-------------------------------------------+-----------------------+--------------+--------------+
| 1 | no | NULL | i.imgur.com/xyz.jpg | 1020 | someimageurl |
| 2 | yes | "some random selftext of variable length" | redd.it/redditpostid/ | 10 | |
+----+---------+-------------------------------------------+-----------------------+--------------+--------------+
Comment table:
+----+---------------------------------+---------------------+--------------------+
| ID | body_html | reddit_comment_id | reddit_parent_id |
+----+---------------------------------+---------------------+--------------------+
| 1 | comment text of variable length | <reddit comment id> | <reddit parent id> |
+----+---------------------------------+---------------------+--------------------+
(Reddit IDs are different from my table's own IDs and are only relevant on Reddit's end.)
Is there a better way to design this database?
I think you should move the owning side of the relation to the two other tables.
So instead of having comment_id and post_id columns in the saves table, have a saves_id column in the post table and in the comment table.
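A minimal DDL sketch of that inverted design (names taken from the question's tables; generic SQL syntax):
-- Each child row points back at its save, so saves needs no
-- comment_id/post_id columns and no either-or constraint.
CREATE TABLE saves (
    id       INTEGER PRIMARY KEY,
    kind     TEXT NOT NULL CHECK (kind IN ('comment', 'post')),
    title    TEXT,
    post_url TEXT,
    author   TEXT
);
CREATE TABLE post (
    id           INTEGER PRIMARY KEY,
    saves_id     INTEGER NOT NULL UNIQUE REFERENCES saves(id),
    is_self      INTEGER,
    selftext     TEXT,
    post_url     TEXT,
    num_comments INTEGER,
    thumbnail    TEXT
);
CREATE TABLE comment (
    id                INTEGER PRIMARY KEY,
    saves_id          INTEGER NOT NULL UNIQUE REFERENCES saves(id),
    body_html         TEXT,
    reddit_comment_id TEXT,
    reddit_parent_id  TEXT
);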

Adding a field to differentiate parts of tables

I have several gigabytes of ArduCopter binary flight logs. Each log is a series of messages.
MessageType1: param1, param2, param3
MessageType2: param3, param4, param5, param6
...
The logs are self-describing, in the sense that the first time a message type appears in the log, it gives the names of its params.
MessageType1: timestamp, a, b
MessageType1: value 1, value 2, value 3
MessageType2: timestamp, c, d, e
MessageType1: value 4, value 5, value 6
MessageType1: value 7, value 8, value 9
MessageType2: value 10, value 11, value 12, value 13
I have written a python script that takes the logs apart and creates tables for each message type in a sqlite database where the message type is the table name and the parameter name is the column name.
Table MessageType1
| Flight Index | Timestamp | a | b |
|--------------|-----------|-------|---------|
| ... | | | |
| "Flight 1" | 111 | 14725 | 10656.0 |
| "Flight 1" | 112 | 57643 | 10674.0 |
| "Flight 1" | 113 | 57157 | 13674.0 |
| ... | | | |
| "Flight 2" | 111 | 56434 | 16543.7 |
| "Flight 2" | 112 | 56434 | 16543.7 |
Table MessageType2
| Flight Index | Timestamp | c | d | e |
|--------------|-----------|-------|---------|--------|
| ... | | | | |
| "Flight 1" | 111 | 14725 | 10656.0 | 462642 |
| "Flight 1" | 112 | 57643 | 10674.0 | 426428 |
| "Flight 1" | 113 | 57157 | 13674.0 | 642035 |
| ... | | | | |
| "Flight 2" | 111 | 56434 | 16543.7 | 365454 |
| "Flight 2" | 112 | 56434 | 16543.7 | 754632 |
| ... | | | | |
For a single log this database is good enough, but I would like to add several logs, meaning messages of the same type from several logs go into a single table.
In this case I added a column "Flight Index", which is what I would like to have, but:
Each log processed should have a unique identifier
The identifier should be minimal in size, as I'm dealing with tables that may have millions of rows.
I'm thinking of adding the flight index as an integer and just incrementing the number when processing logs; if the database already exists, taking the last row of a table and using its index + 1. Is this optimal, or is there a SQL-native way of doing it?
Am I doing something wrong in general, as I'm not experienced with SQL?
EDIT: added a second table and example messages to show that messages don't have the same number of parameters.
You can achieve this with two tables
Table 1
Flights
Flight name, Flight number, date, device, etc. (any other data points make sense)
"Flight 1", 1, 1/1/2018,...
"Flight 2", 2, 1/2/2018,...
Table 2
Flight_log
Flight_number, timestamp, parameter1, parameter2,
1,111,14725,10656.0
1,112,57643,10674.0
1,113,57157,13674.0
...
2,111,56434,16543.7
2,112,56434,16543.7
Before you load the Flight_log table you should have an entry in the Flights table; you can do a "lookup" to get the Flight_number from the Flights table.
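As a sketch of the SQL-native way the question asks about (assuming SQLite, since the script already writes to a SQLite database): let the Flights table assign the number through its INTEGER PRIMARY KEY, and read it back with last_insert_rowid() instead of manually taking the last row + 1.
-- One row per processed log; SQLite assigns the next rowid itself.
CREATE TABLE IF NOT EXISTS flights (
    flight_number INTEGER PRIMARY KEY,  -- compact integer identifier
    flight_name   TEXT,
    flight_date   TEXT
);
INSERT INTO flights (flight_name, flight_date) VALUES ('Flight 1', '2018-01-01');
-- The id just assigned, to stamp onto this log's message rows:
SELECT last_insert_rowid();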
After reading about data normalization I ended up with the following database. This minimizes the number of tables. I could have created 35 tables (one for each message type) with the right parameters as columns, but that would make the database more fragile if the parameters of a message ever change.
EDIT: replaced the image as datamodler got fixed.

Query M:N contains

I am trying to filter a set of tables that includes an M:N junction table in Android Room (SQLite).
An image can have many subjects. I'd like to allow filtering by a subject so that I get a row with complete image information (including all subjects). So if an image has the subjects (National Park, Yosemite), filtering for either should result in one row with both keywords. Unless I messed something up, a typical join results in multiple rows, such that matching Yosemite would get the right image, but you'd be missing National Park. I came up with this:
SELECT *,
(SELECT GROUP_CONCAT(name)
FROM meta_subject_junction
JOIN subject
ON subject.id = meta_subject_junction.subjectId
WHERE meta_subject_junction.metaId = meta.id) AS keywords,
(SELECT documentUri
FROM image_parent
WHERE meta.parentId = image_parent.id ) AS parentUri
FROM meta
Now this gets me the complete rows, but I think at this point I'd need to:
WHERE keywords LIKE '%Yosemite%'
and I think the LIKE is less than ideal, not to mention an imprecise match. Is there a better way to accomplish this? Thanks, this is bending my novice SQL brain.
Further details
meta
+----+----------+--+
| id | name | |
+----+----------+--+
| 1 | yosemite | |
| 2 | bryce | |
| 3 | flowers | |
+----+----------+--+
subject
+----+---------------+--+
| id | name | |
+----+---------------+--+
| 1 | National Park | |
| 2 | Yosemite | |
| 3 | Tulip | |
+----+---------------+--+
junction
+--------+-----------+
| metaId | subjectId |
+--------+-----------+
| 1 | 1 |
| 1 | 2 |
| 2 | 1 |
| 3 | 3 |
+--------+-----------+
Although I may have done something wrong, as far as I can tell Android Room doesn't like:
+----+-----------+---------------+
| id | name | subject |
+----+-----------+---------------+
| 1 | yosemite | National Park |
| 1 | yosemite | Yosemite |
+----+-----------+---------------+
so I'm trying to reduce the rows:
+----+-----------+-------------------------+
| id | name | subject |
+----+-----------+-------------------------+
| 1 | yosemite | National Park, Yosemite |
+----+-----------+-------------------------+
which the above query does. However, I also want to filter by subject, so a National Park filter would yield:
+----+-----------+-------------------------+
| id | name | subject |
+----+-----------+-------------------------+
| 1 | yosemite | National Park, Yosemite |
| 2 | bryce | National Park |
+----+-----------+-------------------------+
I'd like to be more precise/efficient than LIKE against the already concatenated subject list. Most of my attempts end up with no results in Room (multi-row), or reduce the subject list to only the filter keyword.
Update
Here's a test I've been using to compare the actual SQL results from a query to what Android Room ends up with:
http://sqlfiddle.com/#!7/0ac11/10/0
That join query is interpreted as four objects in Android Room, so I'm trying to reduce the rows, but retain the full subject results while filtering for any image containing the subject keyword.
If you want multiple keywords, then where, group by, and having can be used:
select image_id
from image_subject
where subject_id in ('a', 'b', 'c') -- whatever
group by image_id
having count(distinct subject_id) = 3; -- same count as in `where`
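Applied to the junction table from the question (subject ids 1 and 2 being National Park and Yosemite), that pattern might look like:
SELECT metaId
FROM junction
WHERE subjectId IN (1, 2)
GROUP BY metaId
HAVING COUNT(DISTINCT subjectId) = 2; -- image must match all listed subjects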
This gets the result I need, though I'd love to hear a better option if this is particularly inefficient.
SELECT meta.*,
(SELECT GROUP_CONCAT(name)
FROM junction
JOIN subject
ON subject.id = junction.subjectId
WHERE junction.metaId = meta.id) AS keywords,
junction.subjectId
FROM meta
LEFT JOIN junction ON junction.metaId = meta.id
WHERE subjectId IN (1,2)
GROUP BY meta.id
+----+----------+------------------------+-----------+
| id | name | keywords | subjectId |
+----+----------+------------------------+-----------+
| 1 | yosemite | National Park,Yosemite | 2 |
| 2 | bryce | National Park | 1 |
+----+----------+------------------------+-----------+
http://sqlfiddle.com/#!7/86a76/13
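A variant that avoids grouping the outer query at all: filter through a subquery on the junction table, and keep the correlated GROUP_CONCAT for the full keyword list (a sketch using the question's tables; untested in Room):
SELECT meta.*,
(SELECT GROUP_CONCAT(name)
FROM junction
JOIN subject ON subject.id = junction.subjectId
WHERE junction.metaId = meta.id) AS keywords
FROM meta
WHERE meta.id IN (SELECT metaId FROM junction WHERE subjectId IN (1, 2));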

Primary key auto-increment manipulation

Is there any way to have a primary key with a feature that increments it but fills in gaps? Assuming I have the following table:
____________________
| ID | Value |
| 1 | A |
| 2 | B |
| 3 | C |
^^^^^^^^^^^^^^^^^^^^^
Notice that the value is only an example, the order has nothing to do with the question.
Once I remove the row with the ID of 2 (the table will look like this):
____________________
| ID | Value |
| 1 | A |
| 3 | C |
^^^^^^^^^^^^^^^^^^^^^
And I add another row, with regular auto-increment feature it will look like this:
____________________
| ID | Value |
| 1 | A |
| 3 | C |
| 4 | D |
^^^^^^^^^^^^^^^^^^^^^
As expected.
The output I'd want would be:
____________________
| ID | Value |
| 1 | A |
| 2 | D |
| 3 | C |
^^^^^^^^^^^^^^^^^^^^^
Where the gap is filled with the new row. Also note that maybe, in memory, it would look different. But the point is that the primary key would fill the gaps.
Given the primary keys (for instance) 1, 2, 3, 6, 7, 10, 11, the value 4 should be filled in first, then 5, then 8, and so on... When the table is empty (even if it previously held a million rows) it should start over from 1.
How do I accomplish that? Is there any built-in feature similar to that? Can I implement it?
EDIT: If it's not possible, why not?
No, you don't want to do that, as juergen-d said. It's unlikely to do what you think it does, and it will do so even less reliably in a multi-user environment.
In a multi-user environment you are likely to get gaps even when there are no deletes, just from aborted inserts.
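For example, in Postgres a rolled-back insert still consumes a sequence value (a minimal demonstration, assuming a fresh table):
CREATE TABLE t (id serial PRIMARY KEY, value text);
BEGIN;
INSERT INTO t (value) VALUES ('A'); -- consumes id 1
ROLLBACK;                           -- id 1 is never reused
INSERT INTO t (value) VALUES ('B'); -- gets id 2: a gap with no delete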