Oracle Query on 24 tables with same columns - sql

I have 24 tables (table1, table2, table3, ...) with the same columns to keep track of customer data on an hourly basis, and one rate table recording which rate is applied for a specific hour; rateId is a foreign key in all 24 other tables. I need a dynamic query to fetch data from those tables on a date and time basis. Can anyone provide an example or guide me toward that query?

You should not store the same data in 24 different tables. Partitioning (mentioned in a comment) is a very good solution when you have lots and lots of data and want to split it for performance reasons.
In any case, one way to structure your query is:
select t.*
from ((select * from table1) union all
      (select * from table2) union all
      . . .
      (select * from table24)
     ) t
where <whatever you want>
You can then join this to whatever other tables you like (using rateId, for instance), filter on the fields, or whatever.
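For example, joining to the rate table might look like this; the rate table name (rates) and the hourly timestamp column (record_time) are assumptions, since the question doesn't give them:
select t.*, r.rate
from ((select * from table1) union all
      (select * from table2) union all
      . . .
      (select * from table24)
     ) t
join rates r on r.rateId = t.rateId                      -- 'rates' is an assumed name
where t.record_time between :start_time and :end_time    -- assumed timestamp column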
If you need to know the table where something came from, then you can get this as well:
select t.*
from ((select t.*, 1 as which from table1 t) union all
      (select t.*, 2 as which from table2 t) union all
      . . .
      (select t.*, 24 as which from table24 t)
     ) t
where <whatever you want>
Note: I am using * here because the OP explicitly states that the tables have the same format. Even so, it is probably a good idea to list all the columns in each subquery.
EDIT:
As Bill suggests in the comment, you might want to turn this into a view. That way, you can write lots of queries on the tables, without worrying about the detailed tables. (And, better yet, you can fix the data structure by combining the tables, then change the view, and existing queries will work).
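A minimal sketch of such a view (the view name is an assumption):
create or replace view all_hourly_data as
    select t.*, 1 as which from table1 t union all
    select t.*, 2 as which from table2 t union all
    . . .
    select t.*, 24 as which from table24 t;
Queries can then read from all_hourly_data as if it were a single table, and if the 24 tables are ever combined, only the view definition has to change.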

Related

SQL: Move duplicates to another table where condition

I am quite new to SQL and Stackoverflow, so pardon the layout of my post.
Currently, I am struggling with putting the following workflow into an executable SQL statement:
I have a table containing the following columns:
ID (not unique)
PARTYTYPE (1 or 2)
DATE column
several other, not relevant columns
Now I need to find those observations (rows) that have the same ID and the same PARTYTYPE but are not the most recent, i.e. have a date in the DATE column that is less than the most recent date for the given combination of PARTYTYPE and ID. The rows that satisfy this condition need to be moved to another table with the same schema in order to archive them.
Is there an efficient, yet simple way to accomplish this in SQL?
I have been looking for a long time, but since it involves finding duplicates with certain conditions and inserting it into a table, it is a rather specific problem.
This is what I have so far:
INSERT INTO table_history
select ID, PARTYTYPE, count(*) as count_
from table
group by ID, PARTYTYPE, DATE
having DATE = MAX(DATE)
Any help would be appreciated!
The way you describe the SQL almost exactly conforms to a correlated subquery:
INSERT INTO table_history( . . . )
    select t.*
    from table t
    where date < (select max(date)
                  from table t2
                  where t2.id = t.id and t2.partytype = t.partytype
                 );
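Since the rows are to be moved rather than just copied, the insert can be followed by a delete that uses the same correlated subquery. A sketch; alias syntax and support for same-table subqueries vary slightly by database:
DELETE FROM table t
WHERE t.date < (SELECT MAX(t2.date)
                FROM table t2
                WHERE t2.id = t.id AND t2.partytype = t.partytype);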

Optimize data retrieval from different servers with more than 10M records

SELECT party_code, max(date) AS date
FROM server1.table1 WITH (nolock)
GROUP BY party_code
UNION
SELECT party_code, max(date) AS date
FROM server2.table1 WITH (nolock)
GROUP BY party_code
UNION
SELECT party_code, max(date) AS date
FROM server3.table1 WITH (nolock)
GROUP BY party_code
As shown above, I have 17 similar tables on different servers, so I UNION them to get the records. The total data adds up to more than 36 crore (360 million) rows, which affects execution time and the database's ability to retrieve records. Can someone help me optimize this, or suggest another solution?
First, you need a covering index on your tables. So if you do not have it already, create this index on all of your tables:
CREATE NONCLUSTERED INDEX IX_Table1_party_code__date
ON server1.table1 (party_code) INCLUDE (date)
Second, replace the UNION operators with UNION ALL. UNION sorts and de-duplicates the combined data sets, which you don't need, since you are keeping the records from each server separately.
If that doesn't help enough, maybe you can look at some of the other options:
Maybe you can first UNION ALL all the records (adding a ServerID column in the process) and then do one GROUP BY on the combined data set (on party_code and ServerID), but I can't tell for sure whether this will be better or worse (you'll have to test); see the sketch after this list.
Try using indexed views.
Staging tables that are calculated and filled overnight?
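A sketch of that first idea, with illustrative ServerID values:
SELECT party_code, ServerID, MAX(date) AS date
FROM (SELECT party_code, date, 1 AS ServerID FROM server1.table1 WITH (nolock)
      UNION ALL
      SELECT party_code, date, 2 AS ServerID FROM server2.table1 WITH (nolock)
      UNION ALL
      SELECT party_code, date, 3 AS ServerID FROM server3.table1 WITH (nolock)
      -- ... remaining servers
     ) AS combined
GROUP BY party_code, ServerID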
Do not use UNION; instead, you can use UNION ALL and finally delete duplicate records, if any.
Insert all records into a staging (temp) table and finally delete duplicate records, if any.
If the number of records is huge, you can use SSIS for a faster process.

distinct values from multiple fields within one table ORACLE SQL

How can I get distinct values from multiple fields within one table with just one request?
Option 1
SELECT WM_CONCAT(DISTINCT(FIELD1)) FIELD1S,WM_CONCAT(DISTINCT(FIELD2)) FIELD2S,..FIELD10S
FROM TABLE;
WM_CONCAT is limited, though (it is an undocumented function, and it was removed in Oracle 12c).
Option 2
select DISTINCT(FIELD1) FIELDVALUE, 'FIELD1' FIELDNAME
FROM TABLE
UNION
select DISTINCT(FIELD2) FIELDVALUE, 'FIELD2' FIELDNAME
FROM TABLE
... FIELD 10
is just too slow
If you are scanning only a small range of the data (not full-scanning the whole table), you could use a WITH clause to optimise your query, e.g.:
WITH a AS
(SELECT field1,field2,field3..... FROM TABLE WHERE condition)
SELECT field1 FROM a
UNION
SELECT field2 FROM a
UNION
SELECT field3 FROM a
.....etc
For my problem, I had:
WL1   WL2   correlation
A     B     0.8
B     A     0.8
A     C     0.9
C     A     0.9
How do I eliminate the symmetry from this table?
select WL1, WL2, correlation
from table
where least(WL1, WL2) || greatest(WL1, WL2) = WL1 || WL2
order by WL1
This gives:
WL1   WL2   correlation
A     B     0.8
A     C     0.9
:)
The best option in SQL is the UNION, though you may be able to save some performance by taking out the distinct keywords:
select FIELD1 FROM TABLE
UNION
select FIELD2 FROM TABLE
UNION produces the unique set from the two queries, so distinct is redundant in this case. There simply isn't any way to write this query differently to make it perform faster. There's no magic formula that makes searching 200,000+ rows faster. It has to search every row of the table twice and sort for uniqueness, which is exactly what UNION does.
The only way you can make it faster is to create separate indexes on the two fields (maybe) or pare down the set of data that you're searching across.
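For example, using the question's placeholder names (index names are illustrative):
CREATE INDEX table_field1_idx ON TABLE (FIELD1);
CREATE INDEX table_field2_idx ON TABLE (FIELD2);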
Alternatively, if you're doing this a lot and adding new fields rarely, you could use a materialized view to store the result and only refresh it periodically.
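A sketch of that materialized-view approach (the view name and refresh policy are illustrative):
CREATE MATERIALIZED VIEW distinct_field_values
BUILD IMMEDIATE
REFRESH COMPLETE ON DEMAND
AS
SELECT FIELD1 AS FIELDVALUE FROM TABLE
UNION
SELECT FIELD2 FROM TABLE;
-- refresh periodically, e.g. from a scheduled job:
EXEC DBMS_MVIEW.REFRESH('DISTINCT_FIELD_VALUES');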
Incidentally, your second query doesn't appear to do what you want it to. Distinct always applies to all of the columns in the select list, so your constants with the field names will cause the query to always return separate rows for the two columns.
I've come up with another method that, experimentally, seems to be a little faster. In effect, this allows us to trade one full-table scan for a Cartesian join. In most cases, I would still opt to use the union, as it's much more obvious what the query is doing.
SELECT DISTINCT CASE lvl WHEN 1 THEN field1 ELSE field2 END
FROM table
CROSS JOIN (SELECT LEVEL lvl
            FROM DUAL
            CONNECT BY LEVEL <= 2);
It's also worthwhile to add that I tested both queries on a table without useful indexes containing 800,000 rows and it took roughly 45 seconds (returning 145,000 rows). However, most of that time was spent actually fetching the records, not running the query (the query took 3-7 seconds). If you're getting a sizable number of rows back, it may simply be the number of rows that is causing the performance issue you're seeing.
When you get distinct values from multiple columns, they won't line up as a single data table. Consider the following data:
Column A   Column B
10         50
30         50
10         50
When you take the distinct values, you get 2 rows from the first column and 1 row from the second column. It simply won't work as one result set.
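To illustrate with the data above (table and column names are illustrative):
SELECT DISTINCT column_a FROM t;   -- 10, 30  (2 rows)
SELECT DISTINCT column_b FROM t;   -- 50      (1 row)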
And something like this?
SELECT 'FIELD1',FIELD1, 'FIELD2',FIELD2,...
FROM TABLE
GROUP BY FIELD1,FIELD2,...

How do I calculate a moving average using MySQL?

I need to do something like:
SELECT value_column1
FROM table1
WHERE datetime_column1 >= '2009-01-01 00:00:00'
ORDER BY datetime_column1;
Except in addition to value_column1, I also need to retrieve a moving average of the previous 20 values of value_column1.
Standard SQL is preferred, but I will use MySQL extensions if necessary.
This is just off the top of my head, and I'm on the way out the door, so it's untested. I also can't imagine that it would perform very well on any kind of large data set. I did confirm that it at least runs without an error though. :)
SELECT
    value_column1,
    (
        SELECT AVG(value_column1) AS moving_average
        FROM Table1 T2
        WHERE (
                  SELECT COUNT(*)
                  FROM Table1 T3
                  WHERE T3.date_column1 BETWEEN T2.date_column1 AND T1.date_column1
              ) BETWEEN 1 AND 20
    )
FROM Table1 T1
Tom H's approach will work. You can simplify it like this if you have an identity column:
SELECT T1.id, T1.value_column1, avg(T2.value_column1)
FROM table1 T1
INNER JOIN table1 T2 ON T2.Id BETWEEN T1.Id - 19 AND T1.Id
GROUP BY T1.id, T1.value_column1
I realize that this answer is about 7 years too late. I had a similar requirement and thought I'd share my solution in case it's useful to someone else.
There are some MySQL extensions for technical analysis that include a simple moving average. They're really easy to install and use: https://github.com/mysqludf/lib_mysqludf_ta#readme
Once you've installed the UDF (per instructions in the README), you can include a simple moving average in a select statement like this:
SELECT TA_SMA(value_column1, 20) AS sma_20 FROM table1 ORDER BY datetime_column1
When I had a similar problem, I ended up using temp tables for a variety of reasons, but it made this a lot easier! What I did looks very similar to what you're doing, as far as the schema goes.
Make the schema something like ID identity, start_date, end_date, value. When you select, do a subselect avg of the previous 20 based on the identity ID.
Only do this if you find yourself already using temp tables for other reasons though (I hit the same rows over and over for different metrics, so it was helpful to have the small dataset).
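A rough sketch of that temp-table idea, with assumed names and types (not the original code). A plain staging table is used here because MySQL does not allow a TEMPORARY table to be referenced twice in the same query:
CREATE TABLE tmp_metrics (
    id INT AUTO_INCREMENT PRIMARY KEY,
    start_date DATETIME,
    end_date DATETIME,
    value DECIMAL(18,4)
);
-- rows receive sequential ids in date order
INSERT INTO tmp_metrics (start_date, value)
SELECT datetime_column1, value_column1
FROM table1
WHERE datetime_column1 >= '2009-01-01 00:00:00'
ORDER BY datetime_column1;
-- subselect the average of the previous 20 rows by id
SELECT t1.id, t1.value,
       (SELECT AVG(t2.value)
        FROM tmp_metrics t2
        WHERE t2.id BETWEEN t1.id - 19 AND t1.id) AS moving_average
FROM tmp_metrics t1;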
My solution adds a row number to the table. The following example code may help:
set @MA_period = 5;
select id1, tmp1.date_time, tmp1.c, avg(tmp2.c)
from (select @b := @b + 1 as id1, date_time, c
      from websource.EURUSD, (select @b := 0) bb
      order by date_time asc) tmp1,
     (select @a := @a + 1 as id2, date_time, c
      from websource.EURUSD, (select @a := 0) aa
      order by date_time asc) tmp2
where id1 > @MA_period and id1 >= id2 and id2 > (id1 - @MA_period)
group by id1
order by id1 asc, id2 asc
In my experience, MySQL as of 5.5.x tends not to use indexes on dependent selects, whether a subquery or a join. This can have a very significant impact on performance where the dependent select criteria change on every row.
A moving average is an example of a query that falls into this category. Execution time may increase with the square of the number of rows. To avoid this, choose a database engine that can perform indexed look-ups on dependent selects. I find Postgres works effectively for this problem.
In MySQL 8, a window function frame can be used to obtain the averages.
SELECT value_column1,
       AVG(value_column1) OVER (ORDER BY datetime_column1 ROWS 19 PRECEDING) AS ma
FROM table1
WHERE datetime_column1 >= '2009-01-01 00:00:00'
ORDER BY datetime_column1;
This calculates the average of the current row and the 19 preceding rows. Note that for the first 19 rows of the result the frame contains fewer than 20 rows, so the average is taken over however many rows are available.

Aggregate functions in WHERE clause in SQLite

Simply put, I have a table with, among other things, a column for timestamps. I want to get the row with the most recent (i.e. greatest value) timestamp. Currently I'm doing this:
SELECT * FROM table ORDER BY timestamp DESC LIMIT 1
But I'd much rather do something like this:
SELECT * FROM table WHERE timestamp=max(timestamp)
However, SQLite rejects this query:
SQL error: misuse of aggregate function max()
The documentation confirms this behavior (bottom of page):
Aggregate functions may only be used in a SELECT statement.
My question is: is it possible to write a query to get the row with the greatest timestamp without ordering the select and limiting the number of returned rows to 1? This seems like it should be possible, but I guess my SQL-fu isn't up to snuff.
SELECT * from foo where timestamp = (select max(timestamp) from foo)
or, if SQLite insists on treating subselects as sets,
SELECT * from foo where timestamp in (select max(timestamp) from foo)
There are many ways to skin a cat.
If you have an identity column with auto-increment functionality, a faster query would be to return the last record by ID, due to the indexing of that column, unless of course you wish to put an index on the timestamp column instead.
SELECT * FROM TABLE ORDER BY ID DESC LIMIT 1
I think I've answered this question 5 times in the past week now, but I'm too tired to find a link to one of those right now, so here it is again...
SELECT *
FROM table T1
LEFT OUTER JOIN table T2
    ON T2.timestamp > T1.timestamp
WHERE T2.timestamp IS NULL
You're basically looking for the row where no other row matches that is later than it.
NOTE: As pointed out in the comments, this method will not perform as well in this kind of situation. It will usually work better (for SQL Server at least) in situations where you want the last row for each customer (as an example).
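For reference, a sketch of that per-customer variant (table and column names are illustrative):
SELECT T1.*
FROM orders T1
LEFT OUTER JOIN orders T2
    ON T2.customer_id = T1.customer_id
   AND T2.timestamp > T1.timestamp
WHERE T2.timestamp IS NULL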
You can simply do
SELECT *, max(timestamp) FROM table
Edit:
An aggregate function can't be used like this, so it gives an error. I guess what SquareCog suggested was the best thing to do:
SELECT * FROM table WHERE timestamp = (select max(timestamp) from table)