Is there another way to subtract the smallest value from all the values of a column, effectively offsetting the values?
The only way I have found becomes horribly complicated for more complex queries.
CREATE TABLE offsettest(value NUMBER);
INSERT INTO offsettest VALUES(100);
INSERT INTO offsettest VALUES(200);
INSERT INTO offsettest VALUES(300);
INSERT INTO offsettest VALUES(400);
SELECT value - (SELECT MIN(value) FROM offsettest) FROM offsettest;
DROP TABLE offsettest;
I'd like to limit it to a single query (no stored procedures, variables, etc) if possible and standard SQL is preferred (although I am using Oracle).
I believe this works as of ANSI 1999.
SELECT value - MIN(value) OVER() FROM offsettest;
It would have helped to see your actual query, though, since depending on whether you need to manipulate more than one column this way, and whether the various minimums come from different rows, there may be more efficient ways to do it. If the OVER() works for you, then fine.
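For example, if there were a second column to offset independently, a minimal sketch could look like this (value2 is a hypothetical column, not part of the table above):
SELECT value  - MIN(value)  OVER () AS value_offset,
       value2 - MIN(value2) OVER () AS value2_offset  -- value2 is hypothetical
FROM offsettest;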
I came across an old script that in essence does the following:
CREATE TABLE #T (ColA VARCHAR (20), ID INT)
INSERT INTO #T VALUES ('BBBBBBBB', 1), ('AAAAAAA', 4), ('RRRRRR', 3)
CREATE TABLE #S (ColA VARCHAR (100), ID INT)
INSERT INTO #S
SELECT * FROM #T
ORDER BY ID -- odd to do an order by in an insert statement, but that's the code as it is...
SELECT * FROM #S
DROP TABLE #T, #S
First, I want to mention that I am aware that tables such as the ones created here have no actual order; we just order the result set if we want to.
However, if you run the script above on SQL Server 2008, you will get the results in the order that was specified in the INSERT statement. On a SQL Server 2016 machine, this is not the case: there the rows come back in the order they were originally inserted into #T. Does anyone know what change causes this different behaviour?
Thanks a lot!
As to your example - nothing has changed. In relational theory, a relation is represented in SQL by a table, and a relation is not ordered. So, you are not allowed to define how rows are ordered when they are materialized - and you should not care about this.
If you want to SELECT the data in an ordered way each time, you must specify unique ORDER BY criteria.
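For example, with the temp table from your script:
SELECT * FROM #S ORDER BY ID;  -- a unique column makes the order deterministic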
Also, in your example - you can SELECT the data one billion times and it may come back "as you inserted it" every single time, yet on the very next run you can get different results. When no order is specified, the engine returns the data in whatever way it considers "best" at the moment, and this can change at any time.
As you know - unless ORDER BY is specified, the database engine returns the rows in an arbitrary order. How this order is generated has to do with the internals of the database engine; the algorithm may change between versions, even between service packs, without any need for documentation, since it's known to be arbitrary.
Please note that arbitrary is not the same as random - meaning you should not expect to get a different row order each time you run the query. In fact, you will probably get the same row order every time until something changes - that might be a restart of the server, a rebuild of an index, another row added to the table, an index created or removed - I can't say, because it's not documented anywhere.
Moreover, unless you have an IDENTITY column in your table, the optimizer will simply ignore the ORDER BY clause in the INSERT...SELECT statement, exactly because of what you already wrote in your question - database tables have no intrinsic order.
Order the result set of a query by the specified column list and,
optionally, limit the rows returned to a specified range. The order
in which rows are returned in a result set are not guaranteed unless
an ORDER BY clause is specified.
MSSQL Docs
I want to insert multiple rows efficiently into VERTICA. In PostgreSQL (and probably other SQL implementations) it is possible to INSERT multiple rows in one statement, which is a lot faster than doing single inserts (especially when in autocommit mode).
A minimal self-contained example to load two rows in a newly created table could look like this (a):
CREATE TABLE my_schema.my_table (
row_count int,
some_float float,
some_string varchar(8));
INSERT INTO my_schema.my_table (row_count, some_float, some_string)
VALUES (1,1.0,'foo'),(2,2.0,'bar');
But the beauty of this is that the order in which the columns (and their values) are listed can be changed to something like (b):
INSERT INTO my_schema.my_table (some_float, some_string, row_count)
VALUES (1.0,'foo',1),(2.0,'bar',2);
Furthermore, this syntax allows leaving out columns, which are then filled with default values (such as auto-incrementing integers, etc.).
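For example (purely hypothetical, assuming row_count had been declared with a default or as an auto-incrementing column), one could write:
INSERT INTO my_schema.my_table (some_float, some_string)
VALUES (1.0,'foo'),(2.0,'bar');  -- row_count would be filled with its default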
However, VERTICA does not seem to offer a multi-row insert with the same flexibility. In fact, the only way to emulate similar behaviour seems to be to UNION several SELECTs together, as in (c):
INSERT INTO my_schema.my_table SELECT 1,1.0,'foo' UNION SELECT 2,2.0,'bar';
as in this answer: Vertica SQL insert multiple rows in one statement.
However, this seems to work only when the order of the inserted columns matches the order of their initial definition. My question is: is it possible to craft a single insert like (c) but with the column order changed as in (b)? Or am I tackling the problem completely wrong? If so, what alternative is there to a multi-row insert? Should I try COPY LOCAL?
Just list the columns in the insert:
INSERT INTO my_schema.my_table (row_count, some_float, some_string)
SELECT 1,1.0,'foo'
UNION ALL
SELECT 2,2.0,'bar';
Note the use of UNION ALL instead of UNION. UNION incurs overhead for removing duplicates, which is not needed.
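As for the COPY LOCAL mentioned in the question: for larger batches, bulk loading is usually the faster route in Vertica. A rough sketch, assuming the data already sits in a local CSV file (the path and delimiter are illustrative):
COPY my_schema.my_table (row_count, some_float, some_string)
FROM LOCAL '/tmp/my_rows.csv' DELIMITER ',';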
In SQLite, if I do:
CREATE TABLE fraction (
id Int,
tag Int,
num Int,
den Int,
PRIMARY KEY (id)
);
INSERT INTO fraction VALUES (1,1,3,4);
INSERT INTO fraction VALUES (2,1,5,6);
INSERT INTO fraction VALUES (3,2,3,8);
INSERT INTO fraction VALUES (4,2,5,7);
INSERT INTO fraction VALUES (5,1,10,13);
INSERT INTO fraction VALUES (6,2,5,7);
SELECT fraction.tag, max(1.0 * fraction.num / fraction.den)
FROM fraction
GROUP BY fraction.tag;
I will get the result:
1|0.833333333333333
2|0.714285714285714
Then, if I issue:
SELECT fraction.tag, max(1.0 * fraction.num / fraction.den),
fraction.num, fraction.den
FROM fraction
GROUP BY fraction.tag;
I will get the result:
1|0.833333333333333|5|6
2|0.714285714285714|5|7
The latter is what I would expect, but it seems like a happy accident more than anything predictable or reliable. For example, were the aggregate function SUM instead of MAX, some type of "rider" column wouldn't make sense.
In a current project that I'm doing, I'm using a table joined to itself to simulate the latter:
SELECT DISTINCT fraction_a.tag, fraction_a.high,
fraction_b.num, fraction_b.den
FROM
(SELECT fraction.tag, max(1.0 * fraction.num / fraction.den) AS high
FROM fraction
GROUP BY fraction.tag)
AS fraction_a JOIN
(SELECT fraction.tag, fraction.num, fraction.den
FROM fraction)
AS fraction_b
ON fraction_a.tag = fraction_b.tag
AND fraction_a.high = 1.0 * fraction_b.num / fraction_b.den;
yielding
1|0.833333333333333|5|6
2|0.714285714285714|5|7
But I find that syntax ugly, impractical and unmaintainable.
As I'll be porting my project between several dialects of SQL, I need a solution that is reliable in all dialects. So, if I have to bite the bullet and use the ugly syntax I will, but I'd prefer using the cleaner one.
When you're using GROUP BY, the database has to create a single output row from (possibly) multiple input rows.
Columns mentioned in the GROUP BY clause have the same value for all rows in the group, so this is the output value to be used.
Columns with some aggregate function use that to compute the output value.
However, other columns are a problem, because there might be different values in the group.
The SQL standard forbids this.
MySQL forgets to check for this error, and gives some random row's value for the output.
SQLite allows this for compatibility with MySQL.
Since version 3.7.11, when you're using MIN or MAX, SQLite guarantees that the other columns will come from the record that has the minimum/maximum value.
Including non-aggregated columns in your SELECT clause that don't appear in your GROUP BY clause is non-portable and will likely cause errors / unexpected results. The syntax you're using is not cleaner - it is plain wrong and happens to work on SQLite. It won't work on Oracle (causing a syntax error), it won't work as expected on MySQL (where it will return random values from the group), and it likely won't work on other RDBMS.
The most straightforward way to implement this would be to use a windowing function - but since you need to support SQLite, that's out of the question.
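For reference, a sketch of that windowing approach in dialects that do support it (ties within a tag are reduced to one arbitrary row here):
SELECT tag, high, num, den
FROM (SELECT tag, num, den,
             1.0 * num / den AS high,
             ROW_NUMBER() OVER (PARTITION BY tag
                                ORDER BY 1.0 * num / den DESC) AS rn
      FROM fraction) ranked
WHERE rn = 1;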
Please note that your second approach (the "ugly" query) will return multiple rows per tag if you happen to have several maxima. This might or might not be what you want.
So bite the bullet and use something like your ugly approach - it's portable and will work as expected.
Is there a "semi-portable" way to get the md5() or the sha1() of an entire row? (Or better, of an entire group of rows ordered by all their fields, i.e. ORDER BY 1,2,3,...,n)? Unfortunately not all DBs are PostgreSQL... I have to deal with at least Microsoft SQL Server, Sybase, and Oracle.
Ideally, I'd like to have an aggregator (server side) and use it to detect changes in groups of rows. For example, in tables that have some timestamp column, I'd like to store a unique signature for, say, each month. Then I could quickly detect months that have changed since my last visit (I am mirroring certain tables to a server running Greenplum) and re-load those.
I've looked at a few options, e.g. CHECKSUM(*) in T-SQL (horror: it's very collision-prone, since it's based on a bunch of XORs and 32-bit values), and HASHBYTES('MD5', field), but the latter can't be applied to an entire row, and it would give me a solution just for one of the SQL flavors I have to deal with.
Any idea? Even for just one of the SQL idioms mentioned above, that would be great.
You could calculate the hashbytes value for the entire row in an update trigger. I used this as part of an ETL process where previously they were comparing all columns in the tables; the speed increase was huge.
Hashbytes works on varchar, nvarchar, or varbinary datatypes, and I wanted to compare integer keys and text fields; casting everything would have been a nightmare, so I used the FOR XML clause in SQL Server as follows:
CREATE TRIGGER get_hash_value ON staging_table
FOR UPDATE, INSERT AS
-- FOR XML RAW serializes the listed columns of the row being updated into an XML
-- string that hashbytes can digest; note that without a WHERE clause this UPDATE
-- recomputes the hash for every row in staging_table, not just the rows affected
-- by the triggering statement
UPDATE staging_table
SET sha1_hash = (SELECT hashbytes('sha1', (SELECT col1, col2, col3 FOR XML RAW)))
GO
Alternatively, if you plan to do many updates on all the rows, you could calculate the values in a similar way outside of a trigger, using a subquery with the FOR XML clause as well. If going this route, you can even change it to a SELECT *, but not in the trigger, as each time you ran it you would get a different value, because the sha1_hash column itself would then be part of the hash and would differ each time.
You could also modify the SELECT statement to get more than one row.
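A rough sketch of that set-based variant (col1, col2, col3 are carried over from the trigger example above and stand in for the real column list; listing them explicitly keeps sha1_hash itself out of the hash):
UPDATE s
SET sha1_hash = hashbytes('sha1', (SELECT s.col1, s.col2, s.col3 FOR XML RAW))
FROM staging_table AS s;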
In MSSQL you can use HashBytes across the entire row by using XML:
SELECT MBT.id,
hashbytes('MD5',
(SELECT MBT.*
FROM (
VALUES(NULL))foo(bar)
FOR xml auto)) AS [Hash]
FROM <Table> AS MBT;
You need the FROM (VALUES(NULL)) foo(bar) clause to be able to use xml auto; it serves no other purpose.
'Tis my first question, so sorry if it's not well structured; I have looked for the answer for a while but no joy, so here goes...
Basically, I have 20 columns and want to take the result of adding columns (a+b), (b+c), etc. and make this the value of my new columns.
When I do a simple SELECT statement the values appear as expected, but I can't seem to get them to appear in a new table.
The columns are varchars.
This is one of the 20 select expressions:
((accidentlogs.before_T18/16-accidentlogs.before_T19/16)/21.954),
It seems like such an easy thing to do, and it probably is, but stick a fork in me on this one.
You can use the result of a SELECT statement as the values for an INSERT statement. The exact syntax may vary for the SQL dialect you use (Oracle, Postgres, MySQL...).
This is the code for Postgres:
INSERT INTO table (field1, field2...) SELECT 'value1', 'value2'...
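Applied to the question, a sketch could look like the following (new_table and diff_T18_T19 are hypothetical names; since the source columns are varchars, an explicit CAST to a numeric type may be needed depending on the dialect):
INSERT INTO new_table (diff_T18_T19)
SELECT (accidentlogs.before_T18/16 - accidentlogs.before_T19/16) / 21.954
FROM accidentlogs;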