Rails - Get latest entries from multiple tables - sql

I am using RoR for app development and want to implement the following:
For the app's RSS feed I need to pull the 10 latest entries from the database. The part I am not able to figure out is how to get the last 10 entries across multiple tables.
Pulling the last n records from a single database table is easy, but how do I pull the last n records across multiple database tables?
Any suggestions?

There are no built-in methods in ActiveRecord for this, but doing it in SQL is very simple: you just use UNION.
SELECT * FROM (
  SELECT table1.*, 'table_1_type' AS table_type FROM table1
  UNION ALL SELECT table2.*, 'table_2_type' AS table_type FROM table2
  UNION ALL SELECT table3.*, 'table_3_type' AS table_type FROM table3
) AS my_union ORDER BY created_at DESC LIMIT 10;
This query will return 10 rows, and each row carries a table_type column telling you which table it came from. I suggest you create a plain Ruby class that executes this raw query and rebuilds the objects.
Or, even better, just fetch the values you need and represent them with plain Ruby classes (not with your models).
You can access the Postgres connection through any AR class, like this: MyModelClass.connection
Example of using raw connection:
conn = MyModelClass.connection.raw_connection
res = conn.exec('select tablename, tableowner from pg_tables')
res.each do |row|
  row.each do |column, value|
    puts "#{column}: #{value}"
  end
end
In the same fashion you can execute your query and populate Ruby objects.
UPD
There is also a third option you can use: create a new model RssItem and back it with a SQL view instead of a table. The approach is explained in this post:
http://hashrocket.com/blog/posts/sql-views-and-activerecord

Is there a particular reason you want to combine queries for multiple tables into one SQL query? I don't think it's possible to do a UNION query.
You can try something like this; it's not exactly what you want though:
@all_items = [@record_videos.all, @musics.all, @new_topics.all].flatten.sort_by { |thing| thing.created_at }
taken from:
rails 3 merge multiple queries from different tables
If it is important to you then the true way to accomplish this in Rails is to use STI (single table inheritance). But that would require restructuring your DB and models.
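For what it's worth, here is a rough sketch of what an STI-backed layout might look like; the table, column, and type names below are made up purely for illustration:

CREATE TABLE feed_items (
    id         serial PRIMARY KEY,
    type       varchar(50) NOT NULL,   -- Rails subclass name, e.g. 'Video', 'Music', 'Topic'
    title      varchar(255),
    body       text,
    created_at timestamp NOT NULL
);

-- With everything in one table, the RSS feed becomes a single ordered query
SELECT * FROM feed_items ORDER BY created_at DESC LIMIT 10;

The trade-off is that the separate models collapse into one table, which is exactly the restructuring mentioned above.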

Related

Select From Multiple and New Tables will be created

My simple query is:
SELECT MAX(DATETIME)
FROM gss.dbo.contacts_23
WHERE Identification = ''
GROUP BY Identification
How can I select MAX(date) from 25 tables, and from new tables as they get created?
The tables are named like contacts_22, contacts_25, contacts_29, contacts_36, ... plus whatever gets added later.
I tried to think about it as follows:
Use UNION, but then what about the new tables that will be created next time?
Select all tables whose names start with 'contacts_', so I catch both the existing tables and the ones that will be created next time:
SELECT TABLE_NAME
FROM GSS.INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME like 'contacts_%'
but that result can't be used directly in a FROM clause.
How can I do that: get MAX(date) from all these tables and from any new ones?
I'm using SQL Server 2012
Thanks in advance
The data model is horrible, and there is no clean way of achieving what you want.
The first way is to create a view which unions all your contacts_ tables, and then to run your query on that view.
Create view contacts_view as
select * from contacts_20
union
select * from contacts_21
...
This is the simplest solution, but only works if you can be certain that you're not creating more tables which need to be included in the view.
The alternative is to use dynamic SQL. This would use your query on the schema database to construct a SQL statement, which you then execute. This can get moderately complex, and is generally hard to test and debug, but does allow new tables to be automatically included in the results.
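To give a feel for the dynamic SQL route, here is a sketch built from the names in the question (treat it as a starting point rather than a tested script; it assumes all the tables expose the same DATETIME column):

DECLARE @sql nvarchar(max);

-- Build one UNION ALL branch per contacts_ table found in the catalog
SELECT @sql = STUFF((
    SELECT ' UNION ALL SELECT MAX([DATETIME]) AS MaxDate FROM GSS.dbo.' + QUOTENAME(TABLE_NAME)
    FROM GSS.INFORMATION_SCHEMA.TABLES
    WHERE TABLE_NAME LIKE 'contacts[_]%'
    FOR XML PATH('')), 1, 11, '');

SET @sql = 'SELECT MAX(MaxDate) AS MaxDate FROM (' + @sql + ') AS all_contacts;';

EXEC sp_executesql @sql;

Because the table list is read from the catalog at run time, tables created later are picked up automatically the next time this runs.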

API, showing results twice and not in correct order

For the first time I'm creating an API to import data into a Swift app (iOS).
But I get all my results twice, and the results don't belong where they should or are in the wrong order...
I think it's because I have several tables I'm pulling from.
An example:
$sql = "SELECT * FROM node, field_data_body";
//Drupal tables
I have heard about the WHERE clause, but I've got no clue how to use it.
What you are doing is an implicit CROSS JOIN, since you select from multiple tables. Without conditions in your WHERE clause, you get every combination of rows from those tables.
If your tables are fully separate/independent, you can run two different SELECT queries, one per table.
Otherwise, if there is a relationship between them, you should pick the JOIN type that is most appropriate for your goal and describe that relationship in the query.
Let's assume that the node table has an "id" column and field_data_body has a "node_id" column. Then the query can be the following:
SELECT * FROM node
LEFT JOIN field_data_body ON node.id = field_data_body.node_id
or, using the implicit-join syntax from your original query (this behaves like an INNER JOIN):
SELECT * FROM node, field_data_body
WHERE node.id = field_data_body.node_id
Here is a good visual representation of SQL JOINs.
As far as I know it's recommended to use the Entity API to query data from Drupal; here is a document: https://www.drupal.org/node/1343708
You can also set up your query with the Views module and set the output to JSON.
The third option could be using https://www.drupal.org/project/rest_server module.
Perhaps the tables node and field_data_body contain the same data? Your SELECT * FROM statement would return everything if it's not distinct.

What's the best way to select fields from multiple tables with a common prefix?

I have sensor data from a client which is in ongoing acquisition. Every week we get a table of new data (about one million rows each) and each table has the same prefix. I'd like to run a query and select some columns across all of these tables.
What would be the best way to go about this?
I have seen some solutions that use dynamic SQL, and I was considering writing a stored procedure that would build a dynamic SQL statement and execute it for me. But I'm not sure this is the best way.
I see you are using PostgreSQL. This is an ideal case for partitioning with constraint exclusion based on dates. You create one master table without data, and the other tables added each week inherit from it. In your case, you don't even have to worry about the nuisance of triggers on INSERT; it sounds like there is never any insertion other than the weekly bulk creation of a new table. See the link above for full documentation.
Queries can be run against the parent table, and Postgres takes care of looking in all the child tables, plus it is smart enough to skip child tables ruled out by WHERE criteria.
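A minimal sketch of that setup (table and column names here are invented; on the PostgreSQL versions of that era, partitioning is done through inheritance plus CHECK constraints):

-- Parent table: holds no data itself
CREATE TABLE sensor_data (
    sensor_id   integer,
    reading     numeric,
    recorded_at timestamp NOT NULL
);

-- One child table per week; the CHECK constraint lets the planner skip
-- children that cannot match the WHERE clause (constraint exclusion)
CREATE TABLE sensor_data_2013_w01 (
    CHECK (recorded_at >= DATE '2013-01-01' AND recorded_at < DATE '2013-01-08')
) INHERITS (sensor_data);

-- Queries target the parent and cover all children automatically
SELECT sensor_id, avg(reading)
FROM sensor_data
WHERE recorded_at >= DATE '2013-01-01' AND recorded_at < DATE '2013-01-08'
GROUP BY sensor_id;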
You could query the meta data for tables with the same prefix.
select table_name from information_schema.tables where table_name like 'week%'
Then you could use union all to combine queries like
select * from week001
union all
select * from week002
[...]
However I suggest appending new records to one single table, and use an index on the timestamp column. This would especially speed up queries which span multiple weeks etc. It will simplify your queries a lot, if you only have to deal with one table. If the table is getting too large, you could partition by date etc. So there should be no need to partition manually by having multiple tables.
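As a sketch of that single-table route (the table and column names are assumptions, not from the question):

-- One table, indexed on the acquisition timestamp
CREATE INDEX idx_sensor_data_recorded_at ON sensor_data (recorded_at);

-- A query spanning several weeks stays simple and can use the index
SELECT sensor_id, reading, recorded_at
FROM sensor_data
WHERE recorded_at >= '2013-01-01' AND recorded_at < '2013-01-15';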
You are correct, sometimes you have to write dynamic SQL to handle cases such as this.
If all of your tables are loaded you can query for table names within your stored procedure. Something like this:
SELECT TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE'
Play with that to get the specific table names you need.
How are the table names differentiated? By date? Some incrementing ID?

SQL argument limit in Oracle

It appears that there is a limit of 1000 arguments in an Oracle SQL IN list. I ran into this when generating queries such as...
select * from orders where user_id IN(large list of ids over 1000)
My workaround is to create a temporary table, insert the user ids into that first instead of issuing a query via JDBC that has a giant list of parameters in the IN.
Does anybody know of an easier workaround? Since we are using Hibernate I wonder if it automatically is able to do a similar workaround transparently.
An alternative approach would be to pass an array to the database and use a TABLE() function in the IN clause. This will probably perform better than a temporary table. It will certainly be more efficient than running multiple queries. But you will need to monitor PGA memory usage if you have a large number of sessions doing this stuff. Also, I'm not sure how easy it will be to wire this into Hibernate.
Note: TABLE() functions operate in the SQL engine, so they need us to declare a SQL type.
create or replace type tags_nt as table of varchar2(10);
/
The following sample populates an array with a couple of thousand random tags. It then uses the array in the IN clause of a query.
declare
    search_tags tags_nt;
    n           pls_integer;
begin
    select name
    bulk collect into search_tags
    from ( select name
           from temp_tags
           order by dbms_random.value )
    where rownum <= 2000;

    select count(*)
    into   n
    from   big_table
    where  name in ( select * from table (search_tags) );

    dbms_output.put_line('tags match '||n||' rows!');
end;
/
As long as the temporary table is a global temporary table (ie only visible to the session), this is the recommended way of doing things (and I'd go that route for anything more than a dozen arguments, let alone a thousand).
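In case it helps, the global temporary table route might look roughly like this (object names are illustrative):

-- One-time DDL: rows are private to each session
CREATE GLOBAL TEMPORARY TABLE tmp_user_ids (
    user_id NUMBER PRIMARY KEY
) ON COMMIT PRESERVE ROWS;

-- Per request: batch-insert the ids (e.g. through JDBC), then join
INSERT INTO tmp_user_ids (user_id) VALUES (:id);

SELECT o.*
FROM   orders o
JOIN   tmp_user_ids t ON t.user_id = o.user_id;

-- Clear the session's rows when done
DELETE FROM tmp_user_ids;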
I'd wonder where/how you are building that list of 1000 arguments. If this is a semi-permanent grouping (eg all employees based in a particular location) then that grouping should be in the database and the join done there. Databases are designed and built to do joins really quickly. Much quicker than pulling a bunch of id's back to the mid tier and then sending them back to the database.
select * from orders
where user_id in
(select user_id from users where location = :loc)
You can add additional predicates to split the list into chunks of 1000:
select * from orders where user_id IN (<first batch of 1000>)
OR user_id IN (<second batch of 1000>)
OR user_id IN ...
The comments regarding "if these IDs are in your database, use joins/correlation instead" hold true. However, if your list of IDs comes from elsewhere, like a SOLR result, you can get around the temp-table requirement by issuing multiple queries, each with no more than 1000 ids present, and then merging the results of the queries in memory. If you place the initial list of ids in a unique collection like a hashset, you can pop off 1000 ids at a time.

Is there efficient SQL to query a portion of a large table

The typical way of selecting data is:
select * from my_table
But what if the table contains 10 million records and you only want records 300,010 to 300,020?
Is there a way to create a SQL statement on Microsoft SQL that only gets 10 records at once?
E.g.
select * from my_table from records 300,010 to 300,020
This would be way more efficient than retrieving 10 million records across the network, storing them in the IIS server and then counting to the records you want.
SELECT * FROM my_table is just the tip of the iceberg. Assuming you're talking a table with an identity field for the primary key, you can just say:
SELECT * FROM my_table WHERE ID >= 300010 AND ID <= 300020
You should also know that selecting * is considered poor practice in many circles. They want you to specify the exact column list.
Try looking at info about pagination. Here's a short summary of it for SQL Server.
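If you are on SQL Server 2012 or later, the OFFSET/FETCH clause of ORDER BY handles this kind of paging directly; a sketch, assuming you page by the ID column:

SELECT [columns]
FROM my_table
ORDER BY ID
OFFSET 300009 ROWS        -- skip the first 300,009 rows
FETCH NEXT 10 ROWS ONLY;  -- then return rows 300,010 through 300,019

On older versions, the ROW_NUMBER() wrapper shown further down in this thread does the same job.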
Absolutely. On MySQL and PostgreSQL (the two databases I've used), the syntax would be
SELECT [columns] FROM table LIMIT 10 OFFSET 300010;
On MS SQL, it's something like SELECT TOP 10 ...; I don't know the syntax for offsetting the record list.
Note that you never want to use SELECT *; it's a maintenance nightmare if anything ever changes. This query, though, is going to be incredibly slow since your database will have to scan through and throw away the first 300,010 records to get to the 10 you want. It'll also be unpredictable, since you haven't told the database which order you want the records in.
This is the core of SQL: tell it which 10 records you want, identified by a key in a specific range, and the database will do its best to grab and return those records with minimal work. Look up any tutorial on SQL for more information on how it works.
When working with large tables, it is often a good idea to make use of Partitioning techniques available in SQL Server.
The rules of your partition function typically dictate that only a range of data can reside within a given partition. You could split your partitions by date range or ID, for example.
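For illustration, a partition function and scheme splitting a table by date range might be declared like this (every name here is hypothetical):

-- Boundary values define which date range each partition holds
CREATE PARTITION FUNCTION pf_by_year (datetime)
AS RANGE RIGHT FOR VALUES ('2011-01-01', '2012-01-01', '2013-01-01');

-- Map every partition to a filegroup (all to PRIMARY in this sketch)
CREATE PARTITION SCHEME ps_by_year
AS PARTITION pf_by_year ALL TO ([PRIMARY]);

-- Create the table on the partition scheme, keyed on the date column
CREATE TABLE my_large_table (
    id         int      NOT NULL,
    created_at datetime NOT NULL
) ON ps_by_year (created_at);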
In order to select from a particular partition you would use a query similar to the following.
SELECT <Column Name1>, <Column Name2>, ...
FROM <Table Name>
WHERE $PARTITION.<Partition Function Name>(<Column Name>) = <Partition Number>
Take a look at the following white paper for more detailed information on partitioning in SQL Server 2005.
http://msdn.microsoft.com/en-us/library/ms345146.aspx
I hope this helps; however, please feel free to pose further questions.
Cheers, John
I use wrapper queries to select the core query and then isolate just the row numbers I want from it. This lets the SQL server do all the heavy lifting inside the core query and pass back only the small slice of the table I requested. All you need to do is pass the [start_row_variable] and the [end_row_variable] into the SQL query.
NOTE: The order clause is specified OUTSIDE the core query [sql_order_clause].
w1 and w2 are the aliases of the wrapper (derived) tables that SQL Server builds around the core query.
SELECT w1.*
FROM (
    SELECT w2.*,
           ROW_NUMBER() OVER ([sql_order_clause]) AS ROW
    FROM (
        -- CORE QUERY START
        SELECT [columns]
        FROM [table_name]
        WHERE [sql_string]
        -- CORE QUERY END
    ) AS w2
) AS w1
WHERE ROW BETWEEN [start_row_variable] AND [end_row_variable]
This method has hugely optimized my database systems. It works very well.
IMPORTANT: Be sure to always explicitly specify only the exact columns you wish to retrieve in the core query as fetching unnecessary data in these CORE queries can cost you serious overhead
Use TOP to select only a limited amount of rows, like:
SELECT TOP 10 * FROM my_table WHERE ID >= 300010
Add an ORDER BY if you want the results in a particular order.
To be efficient there has to be an index on the ID column.