Most efficient post-ordering database design - SQL

I have a posts table with a post_order column in which I store the order of each post. When I change the order of a row from 25 to 15, I have to update every row from 15 onwards.
That is fine for a few rows, but with thousands of rows it performs badly.
Is there a better, more efficient design for ordering posts?

Why not swap with the related row instead of shifting everything down from 15? Let's say you have a table like this:
Post  Post_Order
----  ----------
x      1
y      2
z      3
.      .
.      .
t     10
If you want to make t the first post, you can change t's post_order to 1 and set the row that previously had order 1 (x) to the value t had originally (10).
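A minimal sketch of that swap, assuming a posts table with post and post_order columns (the question doesn't give the actual names):
-- Swap the order values of the moved post (t) and the post that held the target slot (x);
-- ideally run both statements inside one transaction so they succeed or fail together.
UPDATE posts SET post_order = 1  WHERE post = 't';
UPDATE posts SET post_order = 10 WHERE post = 'x';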

You can use the old BASIC trick (from the time BASIC still had line numbers), of leaving gaps.
For example (shamelessly copied from Kuzgun's answer):
x     10
y     20
z     30
.      .
.      .
t    100
Then moving t to, say, second place would involve updating just one row:
x     10
t     15
y     20
z     30
.      .
.      .
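With the gaps in place, that move is genuinely a single-row update; a sketch, again assuming a posts table with post and post_order columns:
-- Any free value between the neighbours' orders (10 and 20) works; 15 is the midpoint.
UPDATE posts SET post_order = 15 WHERE post = 't';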
Of course, you'll still need to move more than one row from time to time (when they "bunch up" too much), but this should be relatively rare (and you can make initial gaps larger if that becomes a problem).
Alternatively, just continue doing what you're doing.
Unless reordering thousands of items is really frequent, a modern DBMS on modern hardware shouldn't have much trouble with it - just be careful to do it in one command, for example...
UPDATE POST
SET POST_ORDER = POST_ORDER + 1
WHERE POST_ORDER > 1 -- AND other criteria
...instead of issuing a separate UPDATE for each row.

Related

A more efficient way to sum the difference between columns in postgres?

For my application I have a table with these three columns: user, item, value
Here's some sample data:
user  item  value
----  ----  -----
   1     1     50
   1     2     45
   1    23     35
   2     1     88
   2    23     44
   3     2     12
   3     1     27
   3     5     76
   3    23     44
What I need to do is, for a given user, perform simple arithmetic against everyone else's values.
Let's say I want to compare user 1 against everyone else. The calculation looks something like this:
first_user  second_user  result
----------  -----------  ------------------------------------------
         1            2  SUM(ABS(50-88) + ABS(35-44))
         1            3  SUM(ABS(50-27) + ABS(45-12) + ABS(35-44))
This is currently the bottleneck in my program. For example, many of my queries are starting to take 500+ milliseconds, with this algorithm taking around 95% of the time.
I have many rows in my database, and the comparison is O(n^2) (it has to compare all of user 1's values against everyone else's matching values).
I believe I have only two options for how to make this more efficient. First, I could cache the results. But the resulting table would be huge because of the NxN space required, and the values need to be relatively fresh.
The second way is to make the algorithm much quicker. I searched for "postgres SIMD" because I think SIMD sounds like the perfect solution to optimize this. I found a couple related links like this and this, but I'm not sure if they apply here. Also, they seem to both be around 5 years old and relatively unmaintained.
Does Postgres have support for this sort of feature? Where you can "vectorize" a column or possibly import or enable some extension or feature to allow you to quickly perform these sorts of basic arithmetic operations against many rows?
I'm not sure where you get O(n^2) for this. You need to look up the rows for user 1 and then read the data for everyone else. Assuming there are few items and many users, this would be essentially O(n), where "n" is the number of rows in the table.
The query could be phrased as:
select t1.user, t.user, sum(abs(t.value - t1.value))
from t left join
     t t1
     on t1.item = t.item and
        t1.user <> t.user and
        t1.user = 1
group by t1.user, t.user;
For this query, you want an index on t(item, user, value).
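A sketch of that index, assuming the table really is named t as in the query above (the question never names it); note that user is a reserved word in Postgres, so it is quoted here:
create index idx_t_item_user_value on t (item, "user", value);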

How to select next rows from any row in a table

I have a table with a column named CIS. The CIS column contains a unique number in each row. I want to get 50 rows starting from a specific row, located via its CIS number.
For example, let's say I have the following table:
CIS  MODEL
---  -----
123      1
212      2
213      3
325      4
452      3
  .      .
  .      .
  .      .
841      4
And all I have is a CIS number, nothing more. Let's say my CIS number is 212. I want to get the next 50 rows after the row with CIS number 212. How can I do that?
Does this do what you want?
select top 50 t.*
from table t
where cis > 212
order by cis;
It assumes that "next" means the next rows ordered by the CIS number.
Based on your question and comment, it sounds like you want to get the next 50 rows based on the order those rows were inserted in the table. As others have suggested, SQL Server does not have a method for retrieving by insert order. A SELECT query with no ORDER BY does not retrieve data in any particular order, even though it might seem to. If you are interested in why that is, you may want to review the post at https://stackoverflow.com/a/10064571/4656137
Since you want to know what is next and there is no concept of insert order, SQL needs to know what order to evaluate the rows to provide what you are looking for. One option might be if you have an auto-incrementing key column on that table, you could try to order by that (using Gordon's example).
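A sketch of that idea, assuming the table is named t and has an auto-incrementing identity column named id (neither name appears in the question):
select top 50 t.*
from t
where t.id > (select id from t where cis = 212)
order by t.id;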
Note: I would have simply commented on Gordon's response, but I don't have enough reputation to comment on others answers yet.
Hopefully this helps some.

What is the best way to reassign ordinal numbers for a move operation

I have a column in SQL Server called "Ordinal" that is used to indicate the display order of the rows. It starts at 0 and skips 10 for the next row, so we have something like this:
Id  Ordinal
 1        0
 2       20
 3       10
It skips 10 because we wanted to be able to move an item in between other items (based on ordinal) without having to reassign ordinal numbers for the entire table.
As you can imagine, the ordinal numbers will eventually need to be reassigned for such a move-in-between operation, either on the surrounding rows or across the entire table, once the unused ordinal numbers between the target items are all used up.
Is there an algorithm I can use to reassign the ordinal numbers for the move operation effectively, taking into consideration the long-term maintainability of the table and minimizing update operations?
You can re-number the sequences using a somewhat complicated UPDATE statement:
UPDATE u
SET u.sequence = 10 * (c.num_below - 1)
FROM test u
JOIN (
    SELECT t.id, count(*) AS num_below
    FROM test t
    JOIN test tr ON tr.sequence <= t.sequence
    GROUP BY t.id
) c ON c.id = u.id
The idea is to count, for each row, the rows whose sequence is less than or equal to its own, subtract one, multiply by ten, and assign the result as the new sequence.
The content of test before the UPDATE:
ID  Sequence
__  ________
 1         0
 2        20
 3        10
 4        12
The content of test after the UPDATE:
ID  Sequence
__  ________
 1         0
 2        30
 3        10
 4        20
Now the sequence numbers are evenly spread again, so you can continue inserting in the middle until you run out of new sequence numbers; then you can re-number again.
Demo.
These won't answer your question directly--I just thought I might suggest some other approaches:
One possibility--don't try to do it by hand. Have your software manage the numbers. If they need re-writing, just save them with new numbers.
a second--use a "Linked List" instead. In each record store the index of the next record you want displayed, then have your code load that directly into a linked list.
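A minimal sketch of that linked-list layout, with an assumed Next_Id pointer column (the table and column names are illustrative, not from the question):
-- Each row points at the row to display after it; NULL marks the last item.
CREATE TABLE posts (
    Id      INT PRIMARY KEY,
    Title   VARCHAR(200),
    Next_Id INT NULL REFERENCES posts(Id)
);
-- Moving a row then means rewriting at most three Next_Id pointers,
-- no matter how many rows the table holds.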
Yet another simple approach. Let's say you're inserting a new record with an ordinal equal to x.
First, check whether there's already a row with ordinal value x. If there is, update all the records whose ordinal value is equal to or greater than x, increasing them by y. Then you are safe to insert the new record.
This way you're sure you won't run the update every time, and of course you'll keep the order.
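A sketch of that check-then-shift insert in SQL Server syntax, assuming a posts table with Id and Ordinal columns and a gap size y of 10 (the names and the @x / @newId values are illustrative):
DECLARE @x INT = 20, @newId INT = 4;   -- illustrative values
-- Shift later rows only when the target ordinal is already taken.
IF EXISTS (SELECT 1 FROM posts WHERE Ordinal = @x)
    UPDATE posts SET Ordinal = Ordinal + 10 WHERE Ordinal >= @x;
INSERT INTO posts (Id, Ordinal) VALUES (@newId, @x);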

Getting fields that take fewer than a certain number of distinct values

If I have data with two columns, feature and feature_value, just like the example data set below,
feature  feature_value
X        1
X        1
X        2
Y        7
Y        8
Y        9
Z        100
and I want to get only the feature, feature_value columns for features that have fewer than 3 distinct values (in this case only the rows having X and Z), what is the most efficient way to do it? Using COUNT(DISTINCT) with a filter condition, or is there a faster way?
Please note that this answer uses generic SQL. Since your question isn't entirely clear about why the expected output is only the "X" and "Z" records, I've taken the liberty of reading it as asking for the features that appear fewer than three times in the feature column, per your wording "columns [sic] for features that have less than 3 distinct values." If you meant 3 or more, you can easily adjust that in the subquery below by changing < to >=.
Subquery to GROUP BY feature and get the count, then select only those records.
SELECT *
FROM my
WHERE feature IN
    (SELECT feature
     FROM my
     GROUP BY feature
     HAVING COUNT(*) < 3);
http://sqlfiddle.com/#!2/29c05/1
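If the goal really is fewer than 3 distinct values per feature (which is what the X and Z example suggests), a variant of the same query can count distinct values instead of rows; a sketch against the same table:
SELECT *
FROM my
WHERE feature IN
    (SELECT feature
     FROM my
     GROUP BY feature
     HAVING COUNT(DISTINCT feature_value) < 3);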

Index structure to maximize speed across any combination of index columns

I have a database with about five possible index columns, all of which are useful in different ways. Let's call them System, Source, Heat, Time, and Row. Using System and Row together will make a unique key, and if sorted by System-Row the database will also be sorted for any combination of the five index variables (in the order I listed them above).
My problem is that I use all combinations of these columns: sometimes I want to JOIN each System-Row to the next System-(Row+1), sometimes I want to GROUP or WHERE by System-Source-Heat, sometimes I want to look at all entries of System-Source WHERE Time is in a specific window, etc.
Basically, I want an index structure that functions similarly to every possible permutation of those five indexes (in the correct order, of course), without actually making every permutation (although I am willing to do so if necessary). I'm doing statistics / analytics, not traditional database work, so the size of the index and the speed of creating / updating it are not a concern; I only care about speeding up my improvised queries, as I tend to think them up, run them, wait 5-10 minutes, and then never use them again. Thus my main concern is reducing the "wait 5-10 minutes" to something more like "wait 1-2 minutes."
My sorted data would look something like this:
Sys  So  H  Ti  R
  1   1  0  .1  1
  1   1  1  .2  2
  1   1  1  .3  3
  1   1  2  .3  4
  1   2  0  .5  5
  1   2  0  .6  6
  1   2  1  .8  7
  1   2  2  .8  8
EDIT: It may simplify things a bit that System virtually always needs to be included as the first column for any of the other 4 columns to be in sorted order.
If you are ONLY concerned with SELECT speed and don't care about INSERT, then you can materialize ALL the combinations as INDEXED views. You only need 24 times the storage of the original table, making one table and 23 INDEXED VIEWs of 5 columns each.
e.g.
create table data (
id int identity primary key clustered,
sys int,
so int,
h float,
ti datetime,
r int);
GO
create view dbo.data_v1 with schemabinding as
select sys, so, h, ti, r
from dbo.data;
GO
create unique clustered index cix_data_v1 on data_v1(sys, h, ti, r, so)
GO
create view dbo.data_v2 with schemabinding as
select sys, so, h, ti, r
from dbo.data;
GO
create unique clustered index cix_data_v2 on data_v2(sys, ti, r, so, h)
GO
-- and so on and so forth, keeping "sys" anchored at the front
Do note, however:
Q. Why isn't my indexed view being picked up by the query optimizer for use in the query plan? (search within linked article)
If space IS an issue, then the next best thing is to create individual indexes on each of the 4 columns, leading with system, i.e. (sys,ti), (sys,r) etc. These can be used together if it will help the query, otherwise it will revert to a full table scan.
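A sketch of those narrower two-column indexes, using the data table from the example above:
create index ix_data_sys_so on data (sys, so);
create index ix_data_sys_h  on data (sys, h);
create index ix_data_sys_ti on data (sys, ti);
create index ix_data_sys_r  on data (sys, r);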
Sorry for taking a while to get back to this; I had to work on something else for a few weeks. Anyway, after trying a bunch of things (including everything suggested here, even the brute-force "make an index for every permutation" method), I haven't found any indexing method that significantly improves performance.
However, I HAVE found an alternate, non-indexing solution: selecting only the rows and columns I'm interested in into intermediate tables, and then working with those instead of the complete table (so I use about 5 million rows of 6 columns instead of 30 million rows of 35 columns). The initial select and table creation is a bit slow, but the steps after that are so much faster that I actually save time even if I only run it once (and considering how often I change things, it's usually much more than once).
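A sketch of that intermediate-table approach, reusing the hypothetical data table and columns from the earlier answer (the real 35-column schema isn't shown in the post):
-- Pull only the rows and columns of interest into a scratch table,
-- then point the ad-hoc analytic queries at it instead of the full table.
select sys, so, h, ti, r
into working_subset           -- hypothetical scratch table name
from data
where sys = 1;                -- whatever filter defines the subset of interest

create index ix_ws_sys_ti on working_subset (sys, ti);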
I have a suspicion that the reason for this vast improvement will be obvious to most SQL users (probably something about pagefile size), and I apologize if so. My only excuse is that I'm a statistician trying to teach myself how to do this as I go, and while I'm pretty decent at getting what I want done to happen (eventually), my understanding of the mechanics of how it's being done is distressingly close to "it's a magic black box, don't worry about it."