I'm looking to use some data, but the table doesn't have the row I need. I'm supposed to do this without creating a row in the table, so is there a way I can create a row only for the query I'm using and not in the main table? I don't have access to create rows in the table.
Here's the data I am trying to use; I get it by applying a formula to some of the existing data.
Select <Columns>
From <Table>
left join (select sum(case when number = '440' then amount end)
           / sum(case when number = '430' then amount end) as GP
           from <table>) gp  -- derived tables need an alias in most engines
on <relations>
I need to use it in one of the columns. I want to create a small row that is active only in this query and won't appear in the main database. I don't have access to edit the main database. Can I create it in this query?
Is it possible?
If it isn't possible, let me know via an answer, and that will close the question.
I am not sure if this is what you want (I'd need more details to understand the problem), but you can easily add rows with UNION ALL, for example:
SELECT numberOfThings,
       TypeOfThings
FROM thingsTable tt
UNION ALL
SELECT 24, 'Fake Type of Things'
This way the extra row exists only within your query, not in the DB.
(You can use this in your subqueries too, as sketched below.)
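For instance, a quick sketch of using it inside a subquery (the outer WHERE filter is just illustrative):
SELECT *
FROM (
    SELECT numberOfThings, TypeOfThings
    FROM thingsTable
    UNION ALL
    SELECT 24, 'Fake Type of Things'
) t
WHERE t.numberOfThings > 10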
If this helps, you can read further about combining queries here.
You can use a common table expression to make a virtual table that appears to have an additional row of your choosing, for the purposes of a SELECT.
For example, create the real table:
create user foo login;
create table dont_touch ( x text);
insert into dont_touch values ('one fish'),('two fish'), ('red fish');
revoke all on dont_touch from foo;
grant select on dont_touch to foo;
Then log in as "foo" and do:
with t as (select * from dont_touch union all select 'blue fish')
select * from t;
However, queries upon this t might not be able to benefit from indexes that exist on the underlying real table, so the queries might be much slower.
I am in the process of writing a custom upsert function for a specific use case for a Redshift table. In their docs, AWS suggests two methods that I'm drawing inspiration from. Here is what I want to accomplish:
Insert any new rows into an existing table, but only if they don't already exist.
There is never a need to delete or modify an existing row (for my use case)
I have so far come up with two separate ways to do this, but I'm wondering what the tradeoffs of each could be.
Using an EXCEPT query to insert only the new rows from a temp table:
insert into persisted_table (
select *
from temp_table
except
select *
from persisted_table
);
Store the results of a UNION query over the temp table and the persisted table, and use that as the persisted table:
insert into new_table (
select *
from temp_table
union
select *
from persisted_table
);
alter table persisted_table rename to old_persisted_table_marked_for_deletion;
alter table new_table rename to persisted_table;
I'm aware that UNION (with its implicit de-duplication) is slow and generally not recommended for bulk/large-scale operations. Apart from that, though, are there any arguments that could influence this decision?
The first advice I'd give is to remember that Redshift is a cluster. Whatever process you select, if the data is large, you will want the comparison that determines whether a row already exists to stay "on node". You will want the tables in question to be distributed on the same key.
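For example, a sketch of co-locating both tables on the join key (Redshift syntax; the column names are illustrative):
create table persisted_table (
    id bigint,
    payload varchar(100)
) distkey(id);

-- the LIKE clause copies the column definitions and inherits the distkey
create temp table temp_table (like persisted_table);
With this, the row-existence comparison on id can be resolved on each node without redistributing data across the cluster.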
Next, I would think about what the indexes into the data are. The processes you laid out compare all columns when checking for existence. Is this needed? If a subset of columns can serve as the key, this can make things more efficient:
insert into persisted_table (
    select a.*
    from temp_table a
    left join persisted_table b on {keys}
    where b.{key} is null  -- anti-join: keep only rows with no match in persisted_table
);
Hopefully these aspects will help your decision process.
I have some SQL that is broken into two SELECT statements. The first SELECT inserts its results INTO a temp table. The second SELECT is a COALESCE that reads from the temp table the first one populated. I need to be able to run these together (one after the other), and unfortunately I cannot put them into a stored procedure because of the old reporting tool my company uses; the reporting tool must read from a VIEW or a TABLE. I wanted to put these into a VIEW, but have researched that a view cannot have more than one SELECT. Any ideas and examples on how to accomplish this? My original post/solution showing the SQL is in this post.
The temp table select could be converted to a CTE (WITH clause), with the second part becoming the select query of the view.
Alternatively, you could just inline it with sub-selects, but depending on complexity that might make it harder to maintain.
CREATE VIEW yourView AS
WITH myFirstSelect(someFields) AS
(
SELECT somefields FROM sometable
)
SELECT * from myFirstSelect
Docs: https://learn.microsoft.com/en-us/sql/t-sql/queries/with-common-table-expression-transact-sql?view=sql-server-2017
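For the inline alternative mentioned above, a minimal sketch using the same hypothetical names:
CREATE VIEW yourView AS
SELECT *
FROM (
    SELECT somefields FROM sometable
) AS myFirstSelect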
I have a table in an Oracle database which is called my_table, for example. It is a log-type table. It has an incremental column named "id" and a "registration_number" which is unique per registered user. Now I want to get the latest change for each registered user, so I wrote the queries below to accomplish this task:
First version:
SELECT t.*
FROM my_table t
WHERE t.id =
(SELECT MAX(id) FROM my_table t_m WHERE t_m.registration_number = t.registration_number
);
Second version:
SELECT t.*
FROM my_table t
INNER JOIN
( SELECT MAX(id) m_id FROM my_table GROUP BY registration_number
) t_m
ON t.id = t_m.m_id;
My first question is: which of the above queries is recommended, and why? And my second: if sometimes there are about 70,000 inserts to this table, but mostly the number of inserted rows varies between 0 and 2,000, is it reasonable to add an index to this table?
An analytical query might be the fastest way to get the latest change for each registered user:
SELECT registration_number, id
FROM (
SELECT
registration_number,
id,
ROW_NUMBER() OVER (PARTITION BY registration_number ORDER BY id DESC) AS IDRankByUser
FROM my_table
)
WHERE IDRankByUser = 1
As for indexes, I'm assuming you already have an index by registration_number. An additional index on id will help the query, but maybe not by much and maybe not enough to justify the index. I say that because if you're inserting 70K rows at one time the additional index will slow down the INSERT. You'll have to experiment (and check the execution plans) to figure out if the index is worth it.
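If you do experiment, one option to try (a sketch only; the index name is made up) is a composite index matching the analytic query's PARTITION BY and ORDER BY columns:
CREATE INDEX my_table_reg_id_ix ON my_table (registration_number, id);
A composite index like this lets Oracle find the latest id per registration_number without touching the table, though it still adds maintenance cost on every insert.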
In order to check for the faster query, you should check the execution plan and cost; that will give you a fair idea. But I agree with Ed Gibbs's solution, as analytics make the query run much faster.
If you feel this table is going to grow very big, then I would suggest partitioning the table and using local indexes. They will definitely help you get faster queries.
In cases where you want to insert lots of rows, indexes slow down insertion, since each insert must also update the index (I would not recommend an index on ID). There are two solutions I can think of for this:
You can drop the index before the insertion and then recreate it afterwards (see the sketch after this list).
Use reverse key indexes. Check this link: http://oracletoday.blogspot.in/2006/09/there-is-option-to-create-index.html. A reverse key index can impact your queries a bit, so there is a trade-off.
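A hedged sketch of option 1 (the index name and column are illustrative):
drop index my_table_reg_ix;

-- ... perform the bulk insert here ...

create index my_table_reg_ix on my_table (registration_number);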
If you are looking for a faster solution and there is a real need to maintain a list of the last activity for each user, then the most robust solution is to maintain a separate table with unique registration_number values and the rowid of the last record created in the log table.
E.g. (only for demo, not checked for syntax validity, sequences and triggers omitted):
create table my_log(id number not null, registration_number number, action_id varchar2(100))
/
create table last_user_action(registration_number number not null, last_action rowid)
/
alter table last_user_action
add constraint pk_last_user_action primary key (registration_number) using index
/
create or replace procedure write_log(p_reg_num number, p_action_id varchar2)
is
v_row_id rowid;
begin
insert into my_log(registration_number, action_id)
values(p_reg_num, p_action_id)
returning rowid into v_row_id;
update last_user_action
set last_action = v_row_id
where registration_number = p_reg_num;
end;
/
With such schema you can simple query last actions for every user with good performance:
select
  lua.registration_number,
  l.*
from
last_user_action lua,
my_log l
where
l.rowid (+) = lua.last_action
Rowid is a physical storage identity that directly addresses a storage block, so you can't use it after moving to another server, restoring from backups, etc. But if you need such functionality, it's simple to also add the id column from the my_log table to last_user_action, and use one or the other depending on requirements.
I need to write a query to retrieve a big list of ids.
We support many backends (MySQL, Firebird, SQL Server, Oracle, PostgreSQL ...), so I need to write standard SQL.
The size of the id set could be big, and the query would be generated programmatically. So, what is the best approach?
1) Writing a query using IN
SELECT * FROM TABLE WHERE ID IN (id1, id2, ..., idn)
My question here is: what happens if n is very big? Also, what about performance?
2) Writing a query using OR
SELECT * FROM TABLE WHERE ID = id1 OR ID = id2 OR ... OR ID = idn
I think this approach does not have a limit on n, but what about performance if n is very big?
3) Writing a programmatic solution:
foreach (var id in myIdList)
{
var item = GetItemByQuery("SELECT * FROM TABLE WHERE ID = " + id);
myObjectList.Add(item);
}
We experienced some problems with this approach when the database server is queried over the network. Normally it is better to do one query that retrieves all results than to make a lot of small queries. Maybe I'm wrong.
What would be a correct solution for this problem?
Option 1 is the only good solution.
Why?
Option 2 does the same, but you repeat the column name many times; additionally, the SQL engine doesn't immediately know that you want to check whether the value is one of the values in a fixed list. However, a good SQL engine could optimize it to perform the same as IN. There's still the readability issue, though...
Option 3 is simply horrible performance-wise. It sends a query on every loop iteration and hammers the database with small queries. It also prevents the database from using any optimizations for "value is one of those in a given list".
An alternative approach might be to use another table to contain the id values. This other table can then be inner joined to your TABLE to constrain the returned rows. This has the major advantage that you won't need dynamic SQL (problematic at the best of times), and you won't have an infinitely long IN clause.
You would truncate this other table, insert your large number of rows, then perhaps create an index to aid join performance. It would also let you detach the accumulation of these rows from the retrieval of data, perhaps giving you more options for tuning performance (see the sketch below).
Update: Although you could use a temporary table, I did not mean to imply that you must or even should. A permanent table used for temporary data is a common solution with merits beyond that described here.
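A minimal sketch of the approach, assuming a permanent filter table (all names and id values here are illustrative, and multi-row VALUES syntax varies slightly by backend):
-- one-time setup
CREATE TABLE id_filter (id INT NOT NULL);
CREATE INDEX id_filter_ix ON id_filter (id);

-- per batch: reload the ids, then join
TRUNCATE TABLE id_filter;
INSERT INTO id_filter (id) VALUES (101), (102), (103);

SELECT t.*
FROM my_table t
INNER JOIN id_filter f ON f.id = t.id;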
What Ed Guiness suggested is really a performance booster. I had a query like this:
select * from table where id in (id1,id2.........long list)
What I did:
DECLARE @temp TABLE (
    ID int
)

INSERT INTO @temp
SELECT * FROM dbo.fnSplitter('#idlist#')
Then I inner joined the table variable with the main table:
select * from main_table t inner join @temp tmp on tmp.ID = t.ID
And performance improved drastically.
The first option is definitely the best option.
SELECT * FROM TABLE WHERE ID IN (id1, id2, ..., idn)
However, considering that the list of ids could be very large, say millions, you should consider chunking, as below:
Divide your list of Ids into chunks of a fixed size, say 100
The chunk size should be decided based upon the memory size of your server
Suppose you have 10,000 Ids; you will have 10000/100 = 100 chunks
Process one chunk at a time, resulting in 100 database calls for the select
Why should you divide into chunks?
You will never get a memory overflow exception, which is very common in scenarios like yours.
You will have optimized number of database calls resulting in better performance.
It has always worked like a charm for me. Hope it works for my fellow developers as well :)
Doing a SELECT * FROM MyTable WHERE id IN (…) query on an Azure SQL table with 500 million records resulted in a wait time of more than 7 minutes!
Doing this instead returned results immediately:
select b.id, a.* from MyTable a
join (values (250000), (2500001), (2600000)) as b(id)
ON a.id = b.id
Use a join.
In most database systems, IN (val1, val2, …) and a series of OR are optimized to the same plan.
The third way would be to import the list of values into a temporary table and join it, which is more efficient on most systems if there are lots of values.
You may want to read this article:
Passing parameters in MySQL: IN list vs. temporary table
I think you mean SQL Server, but on Oracle there is a hard limit on how many elements an IN list can contain: 1000.
Sample 3 would be the worst performer of them all because you are hitting the database countless times for no apparent reason.
Loading the data into a temp table and then joining on that would be by far the fastest. After that the IN should work slightly faster than the group of ORs.
For the 1st option: add the IDs into a temp table and inner join it with the main table.
CREATE TABLE #temp (id int)
INSERT INTO #temp (id)
SELECT t.column1 FROM (VALUES (1),(2),(3),...(10000)) AS t(column1)
Try this
SELECT Position_ID, Position_Name
FROM position
WHERE Position_ID IN (6, 7, 8)
ORDER BY Position_Name
I've got 10 tables that I'm joining together to create a view. I'm only selecting the ID from each table, but in the view each ID can appear more than once; however, the combination of all the IDs will always be unique. Is there a way to create another column in this view that will be a unique ID?
I'd like to be able to store the unique ID and use it to query against the view in order to get all the other ID's.
I had a similar issue where I needed to establish a hierarchy across multiple tables. If you are using an integer as the id in each of the tables, you could simply convert the ids of each table to a varchar and prefix them with a different letter per table. For instance:
CREATE VIEW LocationHierarchy as
SELECT 'C' + CONVERT(VARCHAR,[Id]) as Id
,[Name]
,'S' + CONVERT(VARCHAR,[State]) as parent
FROM [City]
UNION
SELECT 'U' + CONVERT(VARCHAR,[Id]) as Id  -- 'U' for Suburb, so it doesn't clash with the 'S' used for State
      ,[Name]
      ,'C' + CONVERT(VARCHAR,[City]) as parent  -- the owning City's id (column name assumed)
FROM [Suburb]
etc
The effectiveness of this solution will depend on how large your dataset is.
I think you can do that using ROW_NUMBER(), at least if you can guarantee an ordering of the view. For example:
SELECT
ROW_NUMBER() OVER (ORDER BY col1, col2, col3) as UniqueId
FROM <lotsa joins>
As long as the ordering stays the same and new rows only sort to the end, the id will stay consistent between runs; within a single run it is always unique.
Yes, recently I had the same requirement.
When creating the view, keep the select statement as a derived table (TEMP_TABLE below) and then apply the Row_number() function over it.
Here's an example:
CREATE VIEW VIEW_NM
AS
SELECT COL1, COL2,
       Row_number() OVER(ORDER BY COL1 DESC) AS RowNumber
FROM
    (SELECT COL1, COL2 FROM TABLE1
     UNION
     SELECT COL1, COL2 FROM TABLE2
    ) AS TEMP_TABLE;
No, you cannot do that in a view. What you can do is create a temporary table to hold all this information, and create a key or unique id there for each row.
The fact that you're attempting to do this points to much bigger problems in the database design and/or application architecture.
Since you have 10 tables, and I'm going to guess that the DB designer(s) just slapped ID INT IDENTITY onto all of them, you could end up with about (2^31)^10 possible rows in the view.
The only data type I can think of that might cover that range would be to convert all of the integers into 0-padded strings and concatenate them into one big CHAR.
My guess though is that your real problem isn't getting this ID for a view but some other thing that you're attempting to do, which is the question that you should be asking. Just a hunch though.
One possibility would be to call a function from your view that calculates a hash code on all of the ID columns. If you use a decent cryptographic hash algorithm, the odds of a collision are minuscule (probably more likely that the disk would deliver bad data). Even easier, you could of course just concatenate the different IDs into a single, much larger ID, perhaps as a binary or varbinary column.
Storing that ID and being able to query against it would be a bit more work. I can't think of a way to both compute it and store it from a view. You would probably need a stored procedure to create it first; the details depend heavily on the specifics of your app.
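As a rough illustration of the hash idea (SQL Server syntax; the view, table, and column names are invented):
CREATE VIEW combined_ids AS
SELECT t1.id AS id1,
       t2.id AS id2,
       -- SHA2_256 over a delimited concatenation of the key columns;
       -- the '|' delimiter avoids ambiguous concatenations such as (1, 23) vs (12, 3)
       HASHBYTES('SHA2_256', CONCAT(t1.id, '|', t2.id)) AS unique_id
FROM table1 t1
JOIN table2 t2 ON t2.t1_id = t1.id;
The same pattern extends to all ten ID columns; with SHA2_256 the collision risk is negligible for any realistic row count.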