postgresql querying on multiple identical tables - sql

Hello I have several databases, such as mytable_2009_11_19_03 where last 2 numbers identify the hour (table for 03:00), now I want to query something from _00 to _23 .
It could be done in such way but it is really clumsy
select * from mytable_2009_11_19_00 where type = 15
UNION
select * from mytable_2009_11_19_01 where type = 15
UNION
...........
select * from mytable_2009_11_19_23 where type = 15
How could I do it easier?
regards

This table structure sounds like it was intended to be part of a data partitioning scheme. This is not bad design. This is a very good thing!
Time based data like this is always being added to in the front and dropped off the back as it expires. Using one huge table would result in large amounts of index fragmentation as the data updates and in very large maintenance times for operations like VACUUM.
If you follow the link I included in the first paragraph and read all about partitioning, you'll be able to use CHECK constraints to make date searches very fast. Any SELECT that includes a WHERE timestamp > x AND timestamp < y will be quick.
Now, if these tables don't include any timestamps then the partitioning with CHECK constraints won't work and you will just have to write scripts to write the clumsy UNION queries for you.

If you know that all the tables generated are always identical in schema, in PostgreSQL you could create a parent table and set the newly created tables to INHERIT from the parent.
CREATE TABLE mytable_2009_11_19 (LIKE mytable_2009_11_19_00);
ALTER TABLE mytable_2009_11_19_00 INHERIT mytable_2009_11_19;
ALTER TABLE mytable_2009_11_19_01 INHERIT mytable_2009_11_19;
.
.
.
ALTER TABLE mytable_2009_11_19_23 INHERIT mytable_2009_11_19;
SELECT * FROM mytable_2009_11_19 where type = 15;
This is similar to using a view, but there are differences (listing the CONs first):
(CON) This method requires you to be able to ALTER the individual tables, which you may not have the rights to do. A VIEW does not require this level of access.
(CON) This method requires the tables to all have the same structure or, at the very least, the parent must have ONLY the common elements between all the children.
(PRO) To add tables to a VIEW, you have to drop and redefine the VIEW (and it's permissions, if necessary). With a parent table, it's easy to add or remove tables by ALTERing each one with INHERIT/NOINHERIT.
(PRO) If your schema contains fields like date and timestamp and the structure rarely changes, you can build a single parent table and use INHERIT/NOINHERIT on a rolling basis to provide a time "window" that you can query against without having to query the entire history.

The easiest solution would likely be to build a view of all of the tables, then you can query them easily. You could easily write a procedure to generate the view. Also, if you use "union all", it will be faster, if the result you want is all of the rows (as opposed to distinct rows) and you can still grab distinct rows by selecting distinct from the view if you need it sometimes.

Related

google-bigquery Is there a way to copy a table and have it be updated when the original is?

Basically we have a lot of tables throughout several datasets but only want to share a few of the tables with people and the only way for access control is on a dataset level so the idea was to make a copy of the tables we wanted to show in a new dataset that would be dynamically updated when the original was. Thanks!
There is no way to create a table that updates based on another table's contents. The best way to do this is to define a logical view, which is treated similarly to a table. If you want to give access to all columns in the underlying table, you can define your view (make sure to use standard SQL) as:
SELECT * FROM `your-project.your_dataset.table_name`;
If the target table is partitioned, you can define a view that exposes the partitioning column:
SELECT *, DATE(_PARTITIONTIME) AS partition_date
FROM `your-project.your_dataset.table_name`;
The view will stay up to date with whatever the contents of the underlying table are.

How do I manage large data set spanning multiple tables? UNIONs vs. Big Tables?

I have an aggregate data set that spans multiple years. The data for each respective year is stored in a separate table named Data. The data is currently sitting in MS ACCESS tables, and I will be migrating it to SQL Server.
I would prefer that data for each year is kept in separate tables, to be merged and queried at runtime. I do not want to do this at the expense of efficiency, however, as each year is approx. 1.5M records of 40ish fields.
I am trying to avoid having to do an excessive number of UNIONS in the query. I would also like to avoid having to edit the query as each new year is added, leading to an ever-expanding number of UNIONs.
Is there an easy way to do these UNIONs at runtime without an extensive SQL query and high system utility? Or, if all the data should be managed in one large table, is there a quick and easy way to append all the tables together in a single query?
If you really want to store them in separate tables, then I would create a view that does that unioning for you.
create view AllData
as
(
select * from Data2001
union all
select * from Data2002
union all
select * from Data2003
)
But to be honest, if you use this, why not put all the data into 1 table. Then if you wanted you could create the views the other way.
create view Data2001
as
(
select * from AllData
where CreateDate >= '1/1/2001'
and CreateDate < '1/1/2002'
)
A single table is likely the best choice for this type of query. HOwever you have to balance that gainst the other work the db is doing.
One choice you did not mention is creating a view that contains the unions and then querying on theview. That way at least you only have to add the union statement to the view each year and all queries using the view will be correct. Personally if I did that I would write a createion query that creates the table and then adjusts the view to add the union for that table. Once it was tested and I knew it would run, I woudl schedule it as a job to run on the last day of the year.
One way to do this is by using horizontal partitioning.
You basically create a partitioning function that informs the DBMS to create separate tables for each period, each with a constraint informing the DBMS that there will only be data for a specific year in each.
At query execution time, the optimiser can decide whether it is possible to completely ignore one or more partitions to speed up execution time.
The setup overhead of such a schema is non-trivial, and it only really makes sense if you have a lot of data. Although 1.5 million rows per year might seem a lot, depending on your query plans, it shouldn't be any big deal (for a decently specced SQL server). Refer to documentation
I can't add comments due to low rep, but definitely agree with 1 table, and partitioning is helpful for large data sets, and is supported in SQL Server, where the data will be getting migrated to.
If the data is heavily used and frequently updated then monthly partitioning might be useful, but if not, given the size, partitioning probably isn't going to be very helpful.

How to create a view and update it later?

Is it possible to update the same view with new data? Using UPDATE after CREATE didn't seem to work.
Have only 1 table. Want the view to be a subset of that table. After using the view, I would like the same view to hold a different subset of the data from the only table.
Was thinking I can create the view, drop it, then create it again with the same name ut different subset from the table.....but not sure if there is a better way?
Create view ID 1-10 if it does not exist.
.
. //
. //
.
Update view **ID** 2-10
Any help is appreciated.
I think you are misunderstanding the purpose of a view. What you are trying to do can be handled with a simple select by changing the WHERE clause. A view normally represents a fixed window into the table (or tables) defined by its selection criteria. You wouldn't normally go changing the view dynamically to represent different selection criteria. Normally, you'd simply do a select either against the table or against the view itself, if you're selecting a subset of the columns in the view or doing a join of multiple tables in the view. Since you have a single table, I'd suggest just constructing the query you need dynamically and skipping the view entirely.
select * from table where ID > 0 and ID <= 10
then
select * from table where ID > 1 and ID <= 10
Note that in many cases you could save this as a stored procedure and parameterize the query if need be. If your language/framework supports it, use parameterized queries when issuing simple commands as well.
Yes, updates to base tables will be visible in a view (all other things being equal).
More information would be helpful.
Views are the wrong tool for this.
You should probably make a stored procedure that takes the ID range as a parameter.
To answer the question, you're looking for the ALTER VIEW statement.

What SQL query shows me the tables and indexes used by a view on Informix?

What SQL query shows me the tables & indexes used by a view on Informix?
I know how to find the "original create statement" for a view in SYS_VIEWS, but that requires a human brain scanning/grokking that select. I believe I can find if the tables are indexed, once they have been identified.
Background: I need to make certain that some critical views point to current (e.g. after being "reorganized") tables. Too often I have seen views pointing to old backup tables, that were no longer indexed, and took forever to query.
I need to identify these queries regularly and "remind" the tuning DBA to rebuild the views/indexes.
The table sysdepend documents view dependencies. The columns are:
btabid - base table ID number
btype - normally T for table or V for view
dtabid - dependent table ID number
dtype - normally T for table or V for view
Consequently, for a given view with tabid N, you can write:
SELECT b.owner, b.tabname, d.*
FROM "informix".systables b, "informix".sysdepend d
WHERE d.dtabid = N
AND d.btabid = b.tabid;
If you only know the view name, then determining the view's tabid is surprisingly tricky if your database is a MODE ANSI database where you may have multiple tables with the same table name (or view name in this case) but each with a different owner. However, in the usual case (a non-ANSI database, or a unique table/view name), the query is easy enough:
SELECT b.owner, b.tabname, d.*
FROM "informix".systables b, "informix".sysdepend d
WHERE d.dtabid = (SELECT v.tabid FROM "informix".systables v
WHERE v.tabname = "viewname"
)
AND d.btabid = b.tabid;
The question asks about the indexes used by a view. Indexes are not used by a view, per se; indexes are used by the query engine when processing a query, but the indexes used could change depending on the total query - so different indexes might be used for these two queries:
SELECT * FROM SomeView;
SELECT * FROM SomeView
WHERE Column1 BETWEEN 12 AND 314;
The indexes that will be used are not recorded anywhere in the system catalog; they are redetermined dynamically when a statement is prepared.
The question also notes:
Background: I need to make certain that some critical views point to current (e.g. after being "reorganized") tables. Too often I have seen views pointing to old backup tables, that were no longer indexed, and took forever to query.
How do you do your reorganization? Do you create a new table with the desired structure, copy the data from old to new, then rename old, rename new? That probably would be the explanation - the table renaming reworks the views that reference the table. What form of reorganization are you doing? Can you use a different technique? A classic standby is to use ALTER INDEX indexname TO CLUSTER (after altering it to NOT CLUSTER if it was already clustered). This rebuilds the table and the indexes - without the views breaking. Alternatively, you can consider an ALTER FRAGMENT operation.
It also seems a bit odd to keep the old tables around. That suggests that your reorganization is more a question of dropping old data. Maybe you should fragment your table by date ranges, so that you detach a fragment when it has reached its 'end of useful life' date. Dropping the tables would also drop the views that depend on it, ensuring that you rebuild the views with the new table names.
Another alternative, therefore, is to simply ensure that the reorganization drops and recreates the views.
I need to identify these queries regularly and "remind" the tuning DBA to rebuild the views/indexes.
Worrisome...it should just be part of the standard procedure for completing the reorganization. Basically, you have a bug to report in that procedure - it does not ensure that the views are fully operational.

Why is using '*' to build a view bad?

Why is using '*' to build a view bad ?
Suppose that you have a complex join and all fields may be used somewhere.
Then you just have to chose fields needed.
SELECT field1, field2 FROM aview WHERE ...
The view "aview" could be SELECT table1.*, table2.* ... FROM table1 INNER JOIN table2 ...
We have a problem if 2 fields have the same name in table1 and table2.
Is this only the reason why using '*' in a view is bad?
With '*', you may use the view in a different context because the information is there.
What am I missing ?
Regards
I don't think there's much in software that is "just bad", but there's plenty of stuff that is misused in bad ways :-)
The example you give is a reason why * might not give you what you expect, and I think there are others. For example, if the underlying tables change, maybe columns are added or removed, a view that uses * will continue to be valid, but might break any applications that use it. If your view had named the columns explicitly then there was more chance that someone would spot the problem when making the schema change.
On the other hand, you might actually want your view to blithely
accept all changes to the underlying tables, in which case a * would
be just what you want.
Update: I don't know if the OP had a specific database vendor in mind, but it is now clear that my last remark does not hold true for all types. I am indebted to user12861 and Jonny Leeds for pointing this out, and sorry it's taken over 6 years for me to edit my answer.
Although many of the comments here are very good and reference one common problem of using wildcards in queries, such as causing errors or different results if the underlying tables change, another issue that hasn't been covered is optimization. A query that pulls every column of a table tends to not be quite as efficient as a query that pulls only those columns you actually need. Granted, there are those times when you need every column and it's a major PIA having to reference them all, especially in a large table, but if you only need a subset, why bog down your query with more columns than you need.
Another reason why "*" is risky, not only in views but in queries, is that columns can change name or change position in the underlying tables. Using a wildcard means that your view accommodates such changes easily without needing to be changed. But if your application references columns by position in the result set, or if you use a dynamic language that returns result sets keyed by column name, you could experience problems that are hard to debug.
I avoid using the wildcard at all times. That way if a column changes name, I get an error in the view or query immediately, and I know where to fix it. If a column changes position in the underlying table, specifying the order of the columns in the view or query compensates for this.
These other answers all have good points, but on SQL server at least they also have some wrong points. Try this:
create table temp (i int, j int)
go
create view vtemp as select * from temp
go
insert temp select 1, 1
go
alter table temp add k int
go
insert temp select 1, 1, 1
go
select * from vtemp
SQL Server doesn't learn about the "new" column when it is added. Depending on what you want this could be a good thing or a bad thing, but either way it's probably not good to depend on it. So avoiding it just seems like a good idea.
To me this weird behavior is the most compelling reason to avoid select * in views.
The comments have taught me that MySQL has similar behavior and Oracle does not (it will learn about changes to the table). This inconsistency to me is all the more reason not to use select * in views.
Using '*' for anything production is bad. It's great for one-off queries, but in production code you should always be as explicit as possible.
For views in particular, if the underlying tables have columns added or removed, the view will either be wrong or broken until it is recompiled.
Using SELECT * within the view does not incur much of a performance overhead if columns aren't used outside the view - the optimizer will optimize them out; SELECT * FROM TheView can perhaps waste bandwidth, just like any time you pull more columns across a network connection.
In fact, I have found that views which link almost all the columns from a number of huge tables in my datawarehouse have not introduced any performance issues at all, even through relatively few of those columns are requested from outside the view. The optimizer handles that well and is able to push the external filter criteria down into the view very well.
However, for all the reasons given above, I very rarely use SELECT *.
I have some business processes where a number of CTEs are built on top of each other, effectively building derived columns from derived columns from derived columns (which will hopefully one day being refactored as the business rationalizes and simplifies these calculations), and in that case, I need all the columns to drop through each time, and I use SELECT * - but SELECT * is not used at the base layer, only in between the first CTE and the last.
The situation on SQL Server is actually even worse than the answer by #user12861 implies: if you use SELECT * against multiple tables, adding columns to a table referenced early in the query will actually cause your view to return the values of the new columns under the guise of the old columns. See the example below:
-- create two tables
CREATE TABLE temp1 (ColumnA INT, ColumnB DATE, ColumnC DECIMAL(2,1))
CREATE TABLE temp2 (ColumnX INT, ColumnY DATE, ColumnZ DECIMAL(2,1))
GO
-- populate with dummy data
INSERT INTO temp1 (ColumnA, ColumnB, ColumnC) VALUES (1, '1/1/1900', 0.5)
INSERT INTO temp2 (ColumnX, ColumnY, ColumnZ) VALUES (1, '1/1/1900', 0.5)
GO
-- create a view with a pair of SELECT * statements
CREATE VIEW vwtemp AS
SELECT *
FROM temp1 INNER JOIN temp2 ON 1=1
GO
-- SELECT showing the columns properly assigned
SELECT * FROM vwTemp
GO
-- add a few columns to the first table referenced in the SELECT
ALTER TABLE temp1 ADD ColumnD varchar(1)
ALTER TABLE temp1 ADD ColumnE varchar(1)
ALTER TABLE temp1 ADD ColumnF varchar(1)
GO
-- populate those columns with dummy data
UPDATE temp1 SET ColumnD = 'D', ColumnE = 'E', ColumnF = 'F'
GO
-- notice that the original columns have the wrong data in them now, causing any datatype-specific queries (e.g., arithmetic, dateadd, etc.) to fail
SELECT *
FROM vwtemp
GO
-- clean up
DROP VIEW vwTemp
DROP TABLE temp2
DROP TABLE temp1
It's because you don't always need every variable, and also to make sure that you are thinking about what you specifically need.
There's no point getting all the hashed passwords out of the database when building a list of users on your site for instance, so a select * would be unproductive.
Once upon a time, I created a view against a table in another database (on the same server) with
Select * From dbname..tablename
Then one day, a column was added to the targetted table. The view started returning totally incorrect results until it was redeployed.
Totally incorrect : no rows.
This was on Sql Server 2000.
I speculate that this is because of syscolumns values that the view had captured, even though I used *.
A SQL query is basically a functional unit designed by a programmer for use in some context. For long-term stability and supportability (possibly by someone other than you) everything in a functional unit should be there for a purpose, and it should be reasonably evident (or documented) why it's there - especially every element of data.
If I were to come along two years from now with the need or desire to alter your query, I would expect to grok it pretty thoroughly before I would be confident that I could mess with it. Which means I would need to understand why all the columns are called out. (This is even more obviously true if you are trying to reuse the query in more than one context. Which is problematic in general, for similar reasons.) If I were to see columns in the output that I couldn't relate to some purpose, I'd be pretty sure that I didn't understand what it did, and why, and what the consequences would be of changing it.
It's generally a bad idea to use *. Some code certification engines mark this as a warning and advise you to explicitly refer only the necessary columns. The use of * can lead to performance louses as you might only need some columns and not all. But, on the other hand, there are some cases where the use of * is ideal. Imagine that, no matter what, using the example you provided, for this view (aview) you would always need all the columns in these tables. In the future, when a column is added, you wouldn't need to alter the view. This can be good or bad depending the case you are dealing with.
I think it depends on the language you are using. I prefer to use select * when the language or DB driver returns a dict(Python, Perl, etc.) or associative array(PHP) of the results. It makes your code alot easier to understand if you are referring to the columns by name instead of as an index in an array.
No one else seems to have mentioned it, but within SQL Server you can also set up your view with the schemabinding attribute.
This prevents modifications to any of the base tables (including dropping them) that would affect the view definition.
This may be useful to you for some situations. I realise that I haven't exactly answered your question, but thought I would highlight it nonetheless.
And if you have joins using select * automatically means you are returning more data than you need as the data in the join fields is repeated. This is wasteful of database and network resources.
If you are naive enough to use views that call other views, using select * can make them even worse performers (This is technique that is bad for performance on its own, calling mulitple columns you don't need makes it much worse).