Is there a single query that can update a "sequence number" across multiple groups? - sql

Given a table like below, is there a single-query way to update the table from this:
| id | type_id | created_at | sequence |
|----|---------|------------|----------|
| 1 | 1 | 2010-04-26 | NULL |
| 2 | 1 | 2010-04-27 | NULL |
| 3 | 2 | 2010-04-28 | NULL |
| 4 | 3 | 2010-04-28 | NULL |
To this (note that created_at is used for ordering, and sequence is "grouped" by type_id):
| id | type_id | created_at | sequence |
|----|---------|------------|----------|
| 1 | 1 | 2010-04-26 | 1 |
| 2 | 1 | 2010-04-27 | 2 |
| 3 | 2 | 2010-04-28 | 1 |
| 4 | 3 | 2010-04-28 | 1 |
I've seen some code before that used an # variable like the following, that I thought might work:
SET #seq = 0;
UPDATE `log` SET `sequence` = #seq := #seq + 1
ORDER BY `created_at`;
But that obviously doesn't reset the sequence to 1 for each type_id.
If there's no single-query way to do this, what's the most efficient way?
Data in this table may be deleted, so I'm planning to run a stored procedure after the user is done editing to re-sequence the table.

You can use another variable storing the previous type_id (#type_id). The query is ordered by type_id, so whenever there is a change in type_id, sequence has to be reset to 1 again.
Set #seq = 0;
Set #type_id = -1;
Update `log`
Set `sequence` = If(#type_id=(#type_id:=`type_id`), (#seq:=#seq+1), (#seq:=1))
Order By `type_id`, `created_at`;

I don't know MySQL very well, but you could use a sub query though it may be very slow.
UPDATE 'log' set 'sequence' = (
select count(*) from 'log' as log2
where log2.type_id = log.type_id and
log2.created_at < log.created_at) + 1
You'll get duplicate sequences, though, if two type_ids have the same created_at date.

Related

How to add data or change schema to production database

I am new to working with databases and I want to make sure I understand the best way to add or remove data from a database without making a mess of any related data.
Here is a scenario I am working with:
I have a Tags table, with an Identity ID column. The Tags can be selected via the web application to categorize stories that are submitted by a user. When the database was first seeded; like tags were seeded in order together. As you can see all the Campuses (cities) were 1-4, the Colleges (subjects) are 5-7, and Populations are 8-11.
If this database is live in production and the client wants to add a new Campus (City) tag, what is the best way to do this?
All the other city tags are sort of organized at the top, it seems like the only option is to insert any new tags at to bottom of the table, where they will end up taking whatever the next ID available is. I suppose this is fine because the Display category column will allow us to know which categories these new tags actually belong to.
Is this typical? Is there better ways to set up the database or handle this situation such that everything remains more organized?
Thank you
+----+------------------+---------------+-----------------+--------------+--------+----------+
| ID | DisplayName | DisplayDetail | DisplayCategory | DisplayOrder | Active | ParentID |
+----+------------------+---------------+-----------------+--------------+--------+----------+
| 1 | Albany | NULL | 1 | 0 | 1 | NULL |
| 2 | Buffalo | NULL | 1 | 1 | 1 | NULL |
| 3 | New York City | NULL | 1 | 2 | 1 | NULL |
| 4 | Syracuse | NULL | 1 | 3 | 1 | NULL |
| 5 | Business | NULL | 2 | 0 | 1 | NULL |
| 6 | Dentistry | NULL | 2 | 1 | 1 | NULL |
| 7 | Law | NULL | 2 | 2 | 1 | NULL |
| 8 | Student-Athletes | NULL | 3 | 0 | 1 | NULL |
| 9 | Alumni | NULL | 3 | 1 | 1 | NULL |
| 10 | Faculty | NULL | 3 | 2 | 1 | NULL |
| 11 | Staff | NULL | 3 | 3 | 1 | NULL |
+----+------------------+---------------+-----------------+--------------+--------+----------+
The terms "top" and "bottom" which you use aren't really applicable. "Albany" isn't at the "Top" of the table - it's merely at the top of the specific view you see when you query the table without specifying a meaningful sort order. It defaults to a sort order based on the Id or an internal ROWID parameter, which isn't the logical way to show this data.
Data in the table isn't inherently ordered. If you want to view your tags organized by their category, simply order your query by DisplayCategory (and probably by DisplayOrder afterwards), and you'll see your data properly organized. You can even create a persistent View that sorts it that way for your convenience.

SQL Query not Working on ORDER BY

I have a SQL Table that I'm trying to query and order the return. I am able to query just fine and the SQL Statement that I'm using is also working with the exception of the last ORDER BY statement that I need to execute. The sort order is as follows:
Sort the Status column so that 'open' is on top, 'closed' on bottom
Order the 'Flag' column so that empty (null) values are on bottom (above Status = Closed) and values on top
Order the results of items 1 and 2 by the Number column
Here is an example of the raw data:
| Flag | Number | Status |
|------------------------|
| a | 1 | open |
| | 5 | open |
| | 3 | closed |
| a | 4 | open |
| a | 2 | closed |
Here is what I'm going for:
| Flag | Number | Status |
|------------------------|
| a | 1 | open |
| a | 4 | open |
| | 5 | open |
| a | 2 | closed |
| | 3 | closed |
The query statement that I'm using is as follows:
sqlCom.CommandText = "SELECT * FROM Questions
WHERE Identifier = #identifier
AND Flag <> 'DELETED'
ORDER BY Status DESC
, (CASE WHEN Flag is null THEN 1 ELSE 0 END) ASC
, Flag DESC
, [Number] * 1 ASC";
Now, everything works fine, but the 3rd item above (sorting by Number column) doesn't work. Any ideas why?
What I'm currently getting:
| Flag | Number | Status |
|------------------------|
| a | 4 | open | <-- Out of order. Should be below the next record
| a | 1 | open | <-- Out of order. Should be one record up
| | 5 | open | <-- OK
| | 6 | open | <-- OK
| | 3 | closed | <-- OK
| a | 2 | closed | <-- OK
Thanks in advance for any helpful input. I have tried fiddling with the query in SSMS but no luck.
Your third sort expression is on Flag. Those values are being sorted alphabetically before the QNumber sort applies. And note that case matters in the ordering as well.
Here's how I would write it:
ORDER BY
Status DESC, -- might be better to use a case expression
CASE WHEN Flag IS NOT NULL THEN 0 ELSE 1 END,
QNumber
Since your data in the examples contradicts the data in the screenshot, it's not clear whether you needed to remove the third sort column entirely or just sort by ignoring the case of the text.

Pick a record based on a given value in postgres

I have a table in postgres like below,
alg_campaignid | alg_score | cp | sum
----------------+-----------+---------+----------
9829 | 30.44056 | 12.4000 | 12.4000
9880 | 29.59280 | 12.0600 | 24.4600
9882 | 29.59280 | 12.0600 | 36.5200
9827 | 29.27504 | 11.9300 | 48.4500
9821 | 29.14840 | 11.8800 | 60.3300
9881 | 29.14840 | 11.8800 | 72.2100
9883 | 29.14840 | 11.8800 | 84.0900
10026 | 28.79280 | 11.7300 | 95.8200
10680 | 10.31504 | 4.1800 | 100.0000
From which i have to select a record based on randomly generated number from 0 to 100.i.e first record should be returned if random number picked is between 0 and 12.4000,second if rendom is between 12.4000 and 24.4600,and likewise last if random no is between 95.8200 and 100.0000.
For Example
if the random number picked is 8 then the first record should be returned
or
if the random number picked is 48 then the fourth record should be returned
Is it possible to do this postgres if so kindly recommend a solution for this..
Yes, you can do this in Postgres. If you want to generate the number in the database:
with r as (
select random() * 100 as r
)
select t.*
from table t cross join r
where t.sum <= r.r
order by t.sum desc
limit 1;

Transaction management and temporary tables in SQL Server

Sorry for the title, perhaps it's not very clear.
I have some SQL queries in a script that depend on each other.
The script uses a temporary table in which the data is inserted (the #temp_data table).
This is the expected output:
___________________________________
| speed1 | speed2 | distance |
| 1 | NULL | 10 |
| 3 | NULL | 40 |
| 5 | NULL | 90 |
| NULL | 1 | 10 |
| NULL | 3 | 40 |
| NULL | 5 | 90 |
Here is the query structure (I didn't include the actual query since it's too big):
-- First group
queryForSpeed1
queryToUpdateDistanceBasedOnSpeed1
-- Second group
queryForSpeed2
queryToUpdateDistanceBasedOnSpeed2
If I run the first group of queries (queryForSpeed1 and queryToUpdateDistanceBasedOnSpeed1) separately from the second group then I get the expected output: only the speed1 and distance columns contain data:
___________________________________
| speed1 | speed2 | distance |
| 1 | NULL | 10 |
| 3 | NULL | 40 |
| 5 | NULL | 90 |
| NULL | NULL | NULL |
| NULL | NULL | NULL |
| NULL | NULL | NULL |
The same happens when I run the second group:
___________________________________
| speed1 | speed2 | distance |
| NULL | NULL | NULL |
| NULL | NULL | NULL |
| NULL | NULL | NULL |
| NULL | 1 | 10 |
| NULL | 2 | 40 |
| NULL | 3 | 90 |
BUT, when I run both groups: all the distances are NULL:
___________________________________
| speed1 | speed2 | distance |
| 1 | NULL | NULL |
| 3 | NULL | NULL |
| 5 | NULL | NULL |
| NULL | 1 | NULL |
| NULL | 2 | NULL |
| NULL | 3 | NULL |
I believe this is somehow related to transaction management and temporary tables, although I wasn't able to find anything relevant to solve the problem on Google.
From what I've read, SQL Server keeps a transaction log where it stores every update, insert and whatever... when it arrives at the end of the script it actually does all those insertions and updates.
So the update I did for the distance column finds all the speeds as being NULL because the data wasn't yet inserted in the temporary table from the previous updates, but at the end of the query the speeds are inserted in the table so that's why they are visible.
I played a bit with the GO statement to execute my script in batches, but no luck so far...
What am I doing wrong? Can someone point me in the right direction, please?
EDIT
Here is the actual query.
The problem is not related to transactions, but rather to the way you conduct updates to #temp_speed_profile. The second pass through #temp_speed_profile retrieves all six records. Speed_new is null in first record of Voyage_Id, consequently #distance becomes null. As you retain the value of #distance in next turn, it remains null.
Problem goes away when using different temporary tables because second pass works on second set of data only.
A note on cursors - when defining one make sure to add local and fast_forward. Local because it is limiting cursors' scope, and fast_forward to optimize fetches.
It is almost certainly caused by the way you have written your queries.
To confirm, just rewrite your queries using #temp_data1 and #temp_data2, rather than a single table #temp_data.

Eliminate full table scan due to BETWEEN (and GROUP BY)

Description
According to the explain command, there is a range that is causing a query to perform a full table scan (160k rows). How do I keep the range condition and reduce the scanning? I expect the culprit to be:
Y.YEAR BETWEEN 1900 AND 2009 AND
Code
Here is the code that has the range condition (the STATION_DISTRICT is likely superfluous).
SELECT
COUNT(1) as MEASUREMENTS,
AVG(D.AMOUNT) as AMOUNT,
Y.YEAR as YEAR,
MAKEDATE(Y.YEAR,1) as AMOUNT_DATE
FROM
CITY C,
STATION S,
STATION_DISTRICT SD,
YEAR_REF Y FORCE INDEX(YEAR_IDX),
MONTH_REF M,
DAILY D
WHERE
-- For a specific city ...
--
C.ID = 10663 AND
-- Find all the stations within a specific unit radius ...
--
6371.009 *
SQRT(
POW(RADIANS(C.LATITUDE_DECIMAL - S.LATITUDE_DECIMAL), 2) +
(COS(RADIANS(C.LATITUDE_DECIMAL + S.LATITUDE_DECIMAL) / 2) *
POW(RADIANS(C.LONGITUDE_DECIMAL - S.LONGITUDE_DECIMAL), 2)) ) <= 50 AND
-- Get the station district identification for the matching station.
--
S.STATION_DISTRICT_ID = SD.ID AND
-- Gather all known years for that station ...
--
Y.STATION_DISTRICT_ID = SD.ID AND
-- The data before 1900 is shaky; insufficient after 2009.
--
Y.YEAR BETWEEN 1900 AND 2009 AND
-- Filtered by all known months ...
--
M.YEAR_REF_ID = Y.ID AND
-- Whittled down by category ...
--
M.CATEGORY_ID = '003' AND
-- Into the valid daily climate data.
--
M.ID = D.MONTH_REF_ID AND
D.DAILY_FLAG_ID <> 'M'
GROUP BY
Y.YEAR
Update
The SQL is performing a full table scan, which results in MySQL performing a "copy to tmp table", as shown here:
+----+-------------+-------+--------+-----------------------------------+--------------+---------+-------------------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-----------------------------------+--------------+---------+-------------------------------+--------+-------------+
| 1 | SIMPLE | C | const | PRIMARY | PRIMARY | 4 | const | 1 | |
| 1 | SIMPLE | Y | range | YEAR_IDX | YEAR_IDX | 4 | NULL | 160422 | Using where |
| 1 | SIMPLE | SD | eq_ref | PRIMARY | PRIMARY | 4 | climate.Y.STATION_DISTRICT_ID | 1 | Using index |
| 1 | SIMPLE | S | eq_ref | PRIMARY | PRIMARY | 4 | climate.SD.ID | 1 | Using where |
| 1 | SIMPLE | M | ref | PRIMARY,YEAR_REF_IDX,CATEGORY_IDX | YEAR_REF_IDX | 8 | climate.Y.ID | 54 | Using where |
| 1 | SIMPLE | D | ref | INDEX | INDEX | 8 | climate.M.ID | 11 | Using where |
+----+-------------+-------+--------+-----------------------------------+--------------+---------+-------------------------------+--------+-------------+
Answer
After using the STRAIGHT_JOIN:
+----+-------------+-------+--------+-----------------------------------+---------------+---------+-------------------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-----------------------------------+---------------+---------+-------------------------------+------+---------------------------------+
| 1 | SIMPLE | C | const | PRIMARY | PRIMARY | 4 | const | 1 | Using temporary; Using filesort |
| 1 | SIMPLE | S | ALL | PRIMARY | NULL | NULL | NULL | 7795 | Using where |
| 1 | SIMPLE | SD | eq_ref | PRIMARY | PRIMARY | 4 | climate.S.STATION_DISTRICT_ID | 1 | Using index |
| 1 | SIMPLE | Y | ref | PRIMARY,STAT_YEAR_IDX | STAT_YEAR_IDX | 4 | climate.S.STATION_DISTRICT_ID | 1650 | Using where |
| 1 | SIMPLE | M | ref | PRIMARY,YEAR_REF_IDX,CATEGORY_IDX | YEAR_REF_IDX | 8 | climate.Y.ID | 54 | Using where |
| 1 | SIMPLE | D | ref | INDEX | INDEX | 8 | climate.M.ID | 11 | Using where |
+----+-------------+-------+--------+-----------------------------------+---------------+---------+-------------------------------+------+---------------------------------+
Related
http://dev.mysql.com/doc/refman/5.0/en/how-to-avoid-table-scan.html
http://dev.mysql.com/doc/refman/5.0/en/where-optimizations.html
Optimize SQL that uses between clause
Thank you!
ONE Request... It looks like you KNOW your data. Add the keyword "STRAIGHT_JOIN" and see the results...
SELECT STRAIGHT_JOIN ... the rest of your query...
Straight-join tells MySql to DO IT AS I HAVE LISTED. So, your CITY table is the first in the FROM list, thus indicating you expect that to be your primary... Additionally, your WHERE clause of the CITY is the immediate filter. With that being said, it will probably fly through the rest of the query...
Hope it helps... Its worked for me with gov't data of millions of records queried and joined to 10+ lookup tables where mySql was trying to think for me.
in order to do efficient between queries you are going to want a b tree index on your YEAR column. for example:
CREATE INDEX id_index USING BTREE ON YEAR_REF (YEAR);
BTREE indexes allow for efficient range queries, if this is in fact the root problem then having an index like this should get rid of the full table scan and have it only scan the part of the table that is in the range. read more about btrees on wikipedia
However, as with any optimisation advice, you should measure to make sure that you don't do more harm than good.
Can you change from searching within a radius to search in a bounding box?
You know the city so you can calculate a bounding box in your application.
Perhaps this
S.LATITUDE_DECIMAL >= latitude_lower and
S.LATITUDE_DECIMAL <= latitude_upper and
S.LONGITUDE_DECIMAL >= longitude_lower and
S.LONGITUDE_DECIMAL <= longitude_upper
could be a little faster?