SQL Relational Ordering Based on User Ranking - sql

I am using Oracle 11g. The rankings are not static. They change based on how a user ranks the items.
I would like to keep a live record of rankings 1-10 using both a Rank (1-10) column and the ID of the object, so that a user can continually rate a collection of objects and the rankings stay correct. I've found a decent workaround below, but there are problems with gaps:
// REMOVAL
UPDATE ... SET orderingID=NULL WHERE orderingID=removedID
UPDATE ... SET orderingID=orderingID-1 WHERE orderingID > removedID
// INSERTION
UPDATE ... SET orderingID=orderingID+1 WHERE orderingID > insertionID
UPDATE ... SET orderingID=insertionID WHERE ID=addedID
If I have a table as such:
RANK | OBJECT
1 | AA
6 | BB
Using the above algorithm I ran into problems with gaps in the numbering. For example, if I insert at rank 2 I get:
RANK | OBJECT
1 | AA
2 | DD
7 | BB
The 6 becomes a 7 when it should stay at 6. Is there a way to do this in SQL using update statements or a stored procedure?
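One way to keep the 6 in place, sketched here under the assumption of a hypothetical table rankings(rank_pos, object_id) and a bind variable :new_rank for the insertion point, is to shift only the contiguous run of ranks that starts at the insertion point, so anything sitting after a gap is left alone:
-- shift only the unbroken run of ranks starting at :new_rank
UPDATE rankings r
   SET r.rank_pos = r.rank_pos + 1
 WHERE r.rank_pos >= :new_rank
   AND r.rank_pos - :new_rank = (SELECT COUNT(*)
                                   FROM rankings r2
                                  WHERE r2.rank_pos >= :new_rank
                                    AND r2.rank_pos < r.rank_pos);
-- then place the new object at the freed rank
INSERT INTO rankings (rank_pos, object_id) VALUES (:new_rank, :added_id);
With the example data, the row at rank 6 fails the count check (no ranks between 2 and 6 are occupied), so it keeps its rank, while rows in an unbroken run such as 2, 3, 4 would each be pushed down by one.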

Related

Why does Postgres return unordered data in a select query after updating a row?

I am a bit confused about the default ordering of the rows returned by Postgres.
postgres=# select * from check_user;
id | name
----+------
1 | x
2 | y
3 | z
4 | a
5 | c1\
6 | c2
7 | c3
(7 rows)
postgres=# update check_user set name = 'c1' where name = 'c1\';
UPDATE 1
postgres=# select * from check_user;
id | name
----+------
1 | x
2 | y
3 | z
4 | a
6 | c2
7 | c3
5 | c1
(7 rows)
Before the update, it was returning rows ordered by id, but after the update, the order has changed. So my question is: if ORDER BY is not specified, what default ordering does Postgres use?
Thanks in advance.
Put very simply the "default order" is whatever it happens to read from the disk. Updating a row will not change the row in place... Usually it marks the old row as deleted and writes a new one.
When postgres reads rows from pages of memory, it will (probably) read them in the order they are stored on the page. It will read pages in whatever order it thinks is quickest (that may or may not be how they appear on disk). It can change based on whether or not it decides to use an index. So it can suddenly change without your app asking for anything different.
If you don't specify an order by it will not take any action to re-order them.
NEVER rely on the default order. It is undefined behaviour.
SQL tables represent unordered sets.
SQL result sets are unordered unless you explicitly include an order by.
Your select has no order by. Hence, the rows can come back in any order. Even running the same query twice can produce different orders.
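If the rows are needed in id order, the query has to ask for it explicitly; against the table above that is simply:
SELECT * FROM check_user ORDER BY id;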

django database design when you will have too many rows

I have a django web app with postgres db; the general operation is that every day I have an array of values that need to be stored in one of the tables.
There is no foreseeable need to query the values of the array but need to be able to plot the values for a specific day.
The problem is that this array is pretty big and if I were to store it in the db, I'd have 60 million rows per year but if I store each row as a blob object, I'd have 60 thousand rows per year.
Is it a good decision to use a blob object to reduce table size when you do not need to query the individual rows of values?
Here are the two options:
option1: keeping all
group(foreignkey)| parent(foreignkey) | pos(int) | length(int)
A | B | 232 | 45
A | B | 233 | 45
A | B | 234 | 45
A | B | 233 | 46
...
option2: collapsing the array into a blob:
group(fk)| parent(fk) | mean_len(float)| values(blob)
A | B | 45 |[(pos=232, len=45),...]
...
so I do NOT want to query pos or length but I want to query group or parent.
An example of read query that I'm talking about is:
SELECT * FROM "mytable"
LEFT OUTER JOIN "group"
ON ( "group"."id" = "mytable"."group_id" )
ORDER BY "pos" DESC LIMIT 100
which is a typical django admin list_view page main query.
I tried loading the data and tried displaying the table in the django admin page without doing any complex query (just a read query).
When I get past 1.5 million rows, the admin page freezes. All it takes is some count query on that table to cause the app to crash, so I should definitely either keep the data as a blob or not keep it in the db at all and use the filesystem instead.
I want to emphasize that I've used django 1.8 as my test bench so this is not a postgres evaluation but rather a system evaluation with django admin and postgres.

How can I perform a similar UPDATE-WHERE statement on SparkSQL2.0?

How can I implement an SQL query like this in SparkSQL 2.0, using DataFrames and the Scala language? I've read a lot of posts but none of them seems to achieve what I need, or if you can point me to one, that would do. Here's the problem:
UPDATE table SET value = 100 WHERE id = 2
UPDATE table SET value = 70 WHERE id = 4
.....
Suppose that you have a table table with two columns like this:
id | value
--- | ---
1 | 1
2 | null
3 | 3
4 | null
5 | 5
Is there a way to implement the above query, using map, match cases, UDFs or if-else statements? The values that I need to store in the value field are not sequential, so I have specific values to put there. I'm also aware that it is not possible to modify immutable data when dealing with DataFrames. I have no code to share because I can't get it to work nor reproduce any errors.
Yes you can, it's very simple. You can use when and otherwise.
val pf = df.select($"id", when($"id" === 2, lit(100)).otherwise(when($"id" === 4, lit(70)).otherwise($"value")).as("value"))
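An equivalent route (not part of the original answer, so treat it as a sketch): if the DataFrame is registered as a temporary view, e.g. with df.createOrReplaceTempView("table"), the same transformation can be written as a CASE expression in Spark SQL:
SELECT id,
       CASE WHEN id = 2 THEN 100
            WHEN id = 4 THEN 70
            ELSE value
       END AS value
FROM `table`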

Pulling items out of a DB with weighted chance

Let's say I had a table full of records that I wanted to pull random records from. However, I want certain rows in that table to appear more often than others (and which ones vary by user). What's the best way to go about this, using SQL?
The only way I can think of is to create a temporary table, fill it with the rows I want to be more common, and then pad it with other randomly selected rows from the table. Is there a better way?
One way I can think of is to create another column in the table which is a rolling sum of your weights, then pull your records by generating a random number between 0 and the total of all your weights, and returning the row with the smallest rolling sum that is greater than the random number.
For example, if you had four rows with the following weights:
+---+--------+------------+
|row| weight | rollingsum |
+---+--------+------------+
| a | 3 | 3 |
| b | 3 | 6 |
| c | 4 | 10 |
| d | 1 | 11 |
+---+--------+------------+
Then, choose a random number n with 0 <= n < 11, and return row a if 0 <= n < 3, b if 3 <= n < 6, and so on.
Here are some links on generating rolling sums:
http://dev.mysql.com/tech-resources/articles/rolling_sums_in_mysql.html
http://dev.mysql.com/tech-resources/articles/rolling_sums_in_mysql_followup.html
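A minimal sketch of the lookup itself, assuming a hypothetical table weighted_rows(row_id, weight, rollingsum) shaped like the one above and a random value :n with 0 <= :n < total weight generated by the application:
SELECT row_id
FROM weighted_rows
WHERE rollingsum > :n
ORDER BY rollingsum
LIMIT 1;  -- TOP 1 / FETCH FIRST 1 ROW ONLY in other dialects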
I don't know that it can be done very easily with SQL alone. With T-SQL or similar, you could write a loop to duplicate rows, or you can use the SQL to generate the instructions for doing the row duplication instead.
I don't know your probability model, but you could use an approach like this to achieve the latter. Given these table definitions:
RowSource
---------
RowID
UserRowProbability
------------------
UserId
RowId
FrequencyMultiplier
You could write a query like this (SQL Server specific):
SELECT TOP 100 rs.RowId, urp.FrequencyMultiplier
FROM RowSource rs
LEFT JOIN UserRowProbability urp ON rs.RowId = urp.RowId
ORDER BY ISNULL(urp.FrequencyMultiplier, 1) DESC, NEWID()
This would take care of selecting a random set of rows as well as how many should be repeated. Then, in your application logic, you could do the row duplication and shuffle the results.
Start with 3 tables: users, data and user_data. User_data contains which rows should be preferred for each user.
Then create one view based on the data rows that are preferred by the user.
Create a second view that has the non-preferred data.
Create a third view which is a union of the first two. The union should select more rows from the preferred data.
Then finally select random rows from the third view.
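A minimal sketch of those views, assuming a hypothetical data table with primary key id and a user_data(user_id, data_id) table, and leaving the per-user filter out for brevity:
-- rows flagged as preferred
CREATE VIEW preferred_data AS
  SELECT d.* FROM data d JOIN user_data ud ON ud.data_id = d.id;
-- everything else
CREATE VIEW other_data AS
  SELECT d.* FROM data d
  WHERE NOT EXISTS (SELECT 1 FROM user_data ud WHERE ud.data_id = d.id);
-- weight the union toward preferred rows by repeating them
CREATE VIEW combined_data AS
  SELECT * FROM preferred_data
  UNION ALL
  SELECT * FROM preferred_data
  UNION ALL
  SELECT * FROM other_data;
Random rows can then be pulled from combined_data with whatever random-order construct the database offers (ORDER BY RAND() or RANDOM() with a LIMIT).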

How should I go about implementing an "autonumber" field in SQL Server 2005?

I'm aware of IDENTITY fields but I have a feeling that I couldn't use one to solve my problem.
Let's say I have multiple clients. Each client has multiple orders. Each client needs to have their orders numbered sequentially, specific to them.
Example table structure:
Orders:
OrderID | ClientID | ClientOrderID | etc...
Some example rows for this table would be:
OrderID | ClientID | ClientOrderID | etc...
1 | 1 | 1 | ...
2 | 1 | 2 | ...
3 | 2 | 1 | ...
4 | 3 | 1 | ...
5 | 1 | 3 | ...
6 | 2 | 2 | ...
I know the naive way would be to take the MAX ClientOrderID for any client and use that value for INSERTs, but that would be subject to concurrency issues. I was considering using a transaction, but I'm not quite sure of the broadest isolation level that can be used for this. I'll be using LINQ to SQL, but I have a feeling that isn't relevant.
Somebody correct me if I'm wrong, but as long as your MAX() call is in the same step as your insert, you won't have a problem with concurrency.
So, you could not do
select @newOrderID=max(ClientOrderID) + 1
from orders
where clientid=@myClientID;
insert into orders ( ClientID, ClientOrderID, ...)
values( @myClientID, @newOrderID, ...);
But you can do
insert into orders ( ClientID, ClientOrderID, ...)
select @myClientID, max(ClientOrderID) + 1, ...
from orders
where clientid=@myClientID;
I'm assuming OrderID is an identity column.
Again, if I'm incorrect on this, please let me know, preferably with a URL.
You could use a Repository pattern to handle your Orders and let it control the numbering of each specific client's orders. If you implement the OrderRepository correctly it could control the concurrency and number the order before saving it to the database (let the repository and not the db set the number).
Repository pattern: http://martinfowler.com/eaaCatalog/repository.html
One possibility (though I don't like to do this) is to have a lookup table that would tell you the greatest Order Number given for each vendor. Inside of a transaction, you'd fetch the most recent one from VendorOrderNumber, save your new order, increment the value in VendorOrderNumber, and commit the transaction.
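A minimal T-SQL sketch of that flow, assuming a hypothetical VendorOrderNumber(ClientID, LastOrderID) layout and that @myClientID arrives as a procedure parameter; the UPDLOCK/HOLDLOCK hints keep two concurrent orders for the same client from reading the same counter value:
BEGIN TRANSACTION;
DECLARE @next int;
-- read and lock this client's counter row
SELECT @next = LastOrderID + 1
FROM VendorOrderNumber WITH (UPDLOCK, HOLDLOCK)
WHERE ClientID = @myClientID;
-- save the new order with the client-specific number
INSERT INTO Orders (ClientID, ClientOrderID)
VALUES (@myClientID, @next);
-- bump the counter
UPDATE VendorOrderNumber
SET LastOrderID = @next
WHERE ClientID = @myClientID;
COMMIT TRANSACTION;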
This is an odd way to store data, but assuming you need it, there is nothing built-in that you can use.
Your suggestion of Max(ClientOrderID) is straightforward and pretty easy to implement (follow John MacIntyre's advice). It will probably work acceptably well on tables with a few thousand orders. As the table grows this approach will of course slow down.
Nick DeVore's suggestion of a lookup table is a little messier to implement but won't be substantially affected by data growth.
Depending on where/when you actually need the ClientOrderID, you could calculate the id when needed like this:
SELECT *,
ROW_NUMBER() OVER(ORDER BY OrderID) AS ClientOrderID
FROM Orders
WHERE ClientID = 1
This assumes that the ClientOrderIDs are in the same sequence as the OrderID. Without actually persisting the ID, it is awkward to use as a key to anything else. This approach should not be affected by data growth.