I found a program where, in a loop over a hundred records, they are selecting from 12 different tables the description of a field value (e.g. umskz, fdgrv, etc.).
I know that the best way is to fetch the descriptions together with the values of these fields, through a join.
But if for some reason I do not want to do it like this, what is the next best practice for getting the descriptions?
By selecting from the tables for each record, or
by loading them into internal tables (ITABs), and for each record reading the ITABs to get the descriptions?
Of course, when I finish getting the descriptions the second way, I will FREE the ITABs.
I am looking forward to your opinions.
Thanks
Elias
Your question is a bit fuzzy, so let me first summarize what I understand:
You want to select a large set of data with 1M records from one "central" table. The data has 11 columns that contain codes. You want to join the descriptions for these codes.
Your setup sounds like a star schema. The best option, especially with SAP S/4 HANA, will usually be to create a CDS view that describes the required joins and produces output of exactly the form you need. This allows the database to perform expensive execution path calculations well before you are selecting the data, and thus allows the database to choose the optimal way to give you the data.
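As a rough illustration only - the view, table, and field names below are hypothetical placeholders, not your actual objects - such a CDS view could look roughly like this:
@AbapCatalog.sqlViewName: 'ZVCENTRALDESCR'
@AccessControl.authorizationCheck: #NOT_REQUIRED
define view Z_Central_With_Descriptions
  as select from zcentral as c
    left outer join zcode1_text as t1 on c.code1 = t1.code
    left outer join zcode2_text as t2 on c.code2 = t2.code
    -- ... one join per code table, up to the eleventh
{
  key c.id,
      c.code1,
      t1.description as description_1,
      c.code2,
      t2.description as description_2
      -- ... further description columns follow the same pattern
}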
The second most efficient way to do this with SAP HANA will be a single OpenSQL SELECT that joins all data with LEFT OUTER joins in a single step. You already found this on your own, but let me repeat it in pseudocode for clarity:
SELECT
<central table>-<field list>,
<first code table>-description AS description_1,
...
<eleventh code table>-description AS description_11
INTO TABLE DATA(data)
FROM <central table>
LEFT OUTER JOIN <first code table>
ON <central table>-<first code field> = <first code table>-key
...
LEFT OUTER JOIN <eleventh code table>
ON <central table>-<eleventh code field> = <eleventh code table>-key.
The third best choice, which is what you are initially asking for, will usually be a pre-selection of the main data, followed by subsequent selections of the code descriptions, plus a final "join" in ABAP. Range tables may simplify selecting the required codes. SORTED tables may ensure the join provides acceptable performance:
" select the main data
SELECT <central table>-<field list>
FROM <central table>
INTO CORRESPONDING FIELDS OF TABLE data.
" collect the codes
LOOP AT data REFERENCE INTO DATA(record).
INSERT VALUE #(
sign = 'I'
option = 'EQ'
low = record-><first code field> )
INTO TABLE first_codes.
...
INSERT VALUE #(
sign = 'I'
option = 'EQ'
low = record-><eleventh code field> )
INTO TABLE eleventh_codes.
ENDLOOP.
" select the descriptions
SELECT <key>, description
FROM <first code table>
INTO TABLE first_descriptions
WHERE <key> IN first_codes.
...
SELECT <key>, description
FROM <eleventh code table>
INTO TABLE eleventh_descriptions
WHERE <key> IN eleventh_codes.
" join main data and descriptions
LOOP AT data REFERENCE INTO record.
record->description_1 =
first_descriptions[ <key> = record-><first code field> ]-description.
...
record->description_11 =
eleventh_descriptions[ <key> = record-><eleventh code field> ]-description.
ENDLOOP.
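To make the SORTED table remark above concrete, a description table for one of the code tables could be declared roughly like this (the structure and field names are hypothetical placeholders):
" hypothetical sketch: keyed description table, so the table expression
" lookup in the loop above is a binary search instead of a full scan
TYPES: BEGIN OF ty_description,
         code        TYPE c LENGTH 2,
         description TYPE c LENGTH 40,
       END OF ty_description.
DATA first_descriptions TYPE SORTED TABLE OF ty_description
  WITH UNIQUE KEY code.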
As Sandra Rossi points out, performance is a highly individual thing that stands and falls with tiny details and often defies best practices. As a consequence, these suggestions can only ever be that: suggestions. The programmers who wrote the 13-hour job you are improving probably followed their best practices back then...
This is a follow-up question regarding Jordan's answer here: Weird error in BigQuery
I had been using a reference table inside TABLE_QUERY() for quite some time. Now, following the recent changes Jordan is referring to, many of our queries are broken... I would like to ask the community's advice on an alternative solution to what we are doing.
I have tables containing events ("MyTable_YYYYMMDD"). I want to query my data for the period of a specific campaign (or several). The period of each campaign is stored in a table with all campaign data (ID, StartCampaignDate, EndCampaignDate). In order to query only the relevant tables, we use TABLE_QUERY(), and within the TABLE_QUERY() we construct a list of all relevant table names based on the campaign data.
This query runs in various forms many times with different parameters. The reason for using the wildcard function (rather than querying the entire dataset) is performance, execution cost, and maintenance cost. Having it query all tables and filter just the results is not an option, as it drives execution costs too high.
A sample query looks like this:
SELECT
*
FROM
TABLE_QUERY([MyProject:MyDataSet] 'table_id IN
(SELECT CONCAT("MyTable_",STRING(Year*100+Month)) TBL_NAME
FROM DWH.Dim_Periods P
CROSS JOIN DWH.Campaigns AS LC
WHERE ID IN ("86254e5a-b856-3b5a-85e1-0f5ab3ff20d6")
AND DATE(P.Date) BETWEEN DATE(StartCampaignDate) AND DATE(EndCampaignDate))')
This is now broken...
My question: the information about which tables to query is stored in a reference table. How would you query only the relevant tables (partitions) when TABLE_QUERY() is no longer allowed to query reference tables?
Many thanks
The "simple" way I see is split it to two steps
Step 1 - build list that will be used to filter table_id's
SELECT GROUP_CONCAT_UNQUOTED(
CONCAT('"',"MyTable_",STRING(Year*100+Month),'"')
) TBL_NAME_LIST
FROM DWH.Dim_Periods P
CROSS JOIN DWH.Campaigns AS LC
WHERE ID IN ("86254e5a-b856-3b5a-85e1-0f5ab3ff20d6")
AND DATE(P.Date) BETWEEN DATE(StartCampaignDate) AND DATE(EndCampaignDate)
Note the change to your query: it transforms the result into a list that you will use in step 2.
Step 2 - final query
SELECT
*
FROM
TABLE_QUERY([MyProject:MyDataSet],
'table_id IN (<paste list (TBL_NAME_LIST) built in first query>)')
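For example, if step 1 returned the string "MyTable_201301","MyTable_201302" (hypothetical table names built from Year*100+Month), the pasted-together step 2 would read:
SELECT
  *
FROM
  TABLE_QUERY([MyProject:MyDataSet],
    'table_id IN ("MyTable_201301","MyTable_201302")')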
The above steps are easy to implement in any client you may be using.
If you use it from within the BigQuery Web UI, this forces you into a little extra manual work that you might not be happy about.
My answer is obvious and you most likely already have this as an option, but I wanted to mention it.
This is not an ideal solution, but it seems to do the job.
In my previous query I passed the ID list as a parameter in an external process that constructed the query. I wanted this process to be unaware of any logic implemented in the query.
Eventually we came up with this solution:
Instead of passing a list of IDs, we pass a JSON that contains the relevant metadata for each ID. We parse this JSON within the TABLE_QUERY() function. So instead of querying a physical reference table, we query a sort of "table variable" that we have put in the JSON.
Below is a sample query that runs on the public dataset that demonstrates this solution.
SELECT
YEAR,
COUNT (*) CNT
FROM
TABLE_QUERY([fh-bigquery:weather_gsod], 'table_id in
(Select table_id
From
(Select table_id,concat(Right(table_id,4),"0101") as TBL_Date from [fh-bigquery:weather_gsod.__TABLES_SUMMARY__]
where table_id Contains "gsod"
)TBLs
CROSS JOIN
(select
Regexp_Replace(Regexp_extract(SPLIT(DatesInput,"},{"),r"\"fromDate\":\"(\d\d\d\d-\d\d-\d\d)\""),"-","") as fromDate,
Regexp_Replace(Regexp_extract(SPLIT(DatesInput,"},{"),r"\"toDate\":\"(\d\d\d\d-\d\d-\d\d)\""),"-","") as toDate,
FROM
(Select
"[
{
\"CycleID\":\"123456\",
\"fromDate\":\"1929-01-01\",
\"toDate\":\"1950-01-10\"
},{
\"CycleID\":\"123456\",
\"fromDate\":\"1970-02-01\",
\"toDate\":\"2000-02-10\"
}
]"
as DatesInput)) RefDates
WHERE TBLs.TBL_Date>=RefDates.fromDate
AND TBLs.TBL_Date<=RefDates.toDate
)')
GROUP BY
YEAR
ORDER BY
YEAR
This solution is not ideal as it requires an external process to be aware of the data stored in the reference tables.
Ideally the BigQuery team will re-enable this very useful functionality.
I'm working with relatively large data sets (~200GB). The data comes from text files that are imported into SQL via a script. They are bulk copied into a temp table, with the normalized tables waiting to receive the data.
My question comes from the fact that I'm mostly a scripter, so my instinct would be to loop through each row and do individual checks per row to put the data where it needs to go, but I read a different post on SO saying that's really the wrong approach for SQL.
So my question is: if I have one temp table (31 columns) that is to be normalized across 5 others, what's the best way to go about this?
Table relationship is as follows:
System - Table that contains machine information (e.g. name, domain, etc.)
File - File information (e.g. name, size, directory, etc.)
SystemFile - The many-to-many system<->file relationship table.
Metadata - File metadata (language, etc.) - has foreign key relationship to file primary key
DigitalSignature - File digital signature status - has foreign key relationship to file primary key
Thanks
I don't have any links, and I don't have enough experience with things like SSIS etc. to give a balanced view, but when doing the task you are talking about, my normal process would be (a generic, simple version):
1. Look at the normalised data set and consider the least dependent components in the data being imported (e.g. order headers are created before order items).
2. Create queries that select out the data I will have. These often have this form:
select
t.x,t.y,t.z
from
temp_table as t
left outer join normalise_table as n
on t.x=n.x
and t.y=n.y
and t.z=n.z
where
n.x is null
Here temp_table may have lots of columns, but these three represent whatever normalised nugget I want to add first; the left outer join and the IS NULL check make sure I only get the new values (if you are merging, the idea is the same).
Verify that I am getting good information and that I am only getting the new rows I want. Often you have to use GROUP BY or DISTINCT on the temp data to get accurate data for inserting, something like:
select
t.x,t.y,t.z
from
(select
distinct x,y,z
from
temp_table ) as t
left outer join normalise_table as n
on t.x=n.x
and t.y=n.y
and t.z=n.z
where
n.x is null
3. Wrap that select in an insert:
insert into
normalise_table (x,y,z)
select
t.x,t.y,t.z
from
(select
distinct x,y,z
from
temp_table ) as t
left outer join normalise_table as n
on t.x=n.x
and t.y=n.y
and t.z=n.z
where
n.x is null
In this way you are inserting sets of data. The procedural part is doing this for each set to be inserted, but in general you are not iterating over rows.
BTW, T-SQL has a MERGE command for when you may or may not already have the data in the target table (and for when you want to remove keys missing from the temp tables):
http://msdn.microsoft.com/en-us/library/bb510625.aspx
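A rough sketch of what MERGE could look like for the x/y/z example above (the table and column names are the same placeholders used earlier, not real objects):
-- sketch only: insert x/y/z combinations that are not yet in the target
MERGE normalise_table AS target
USING (SELECT DISTINCT x, y, z FROM temp_table) AS source
    ON  target.x = source.x
    AND target.y = source.y
    AND target.z = source.z
WHEN NOT MATCHED BY TARGET THEN
    INSERT (x, y, z) VALUES (source.x, source.y, source.z);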
Some comments on foreign keys - these tend to be more specific to the situation:
Can you identify the relationship without the primary key? This is the easiest situation to deal with.
Imagine I have inserted my xyz object into a normalised table, but it has 100 child rows (abc's) in another table (each child may have 100 children too; that would mean 10,000 rows in the de-normalised data for one xyz).
You would have to go through the same validation as before, but your final query may look something like:
insert into
normalise_table_2 (parentID,a,b,c)
select
n.id,t.a,t.b,t.c
from
(select
distinct x,y,z,a,b,c
from
temp_table ) as t
inner join normalise_table as n
on t.x=n.x
and t.y=n.y
and t.z=n.z
left outer join normalise_table_2 as n2
on n.id = n2.parentID
and t.a = n2.a
and t.b = n2.b
and t.c = n2.c
where
n2.a is null
or maybe a more readable way:
insert into normalise_table_2 (parentID,a,b,c)
select
*
from (
select distinct
n.id,t.a,t.b,t.c
from
normalise_table as n
inner join temp_table as t
on t.x = n.x
and t.y = n.y
and t.z = n.z
left outer join normalise_table_2 as n2
on t.a = n2.a
and t.b = n2.b
and t.c = n2.c
and n2.parentID = n.id
where
n2.id is null
) as x
If you are having trouble identifying the rows without the id, here are some points to consider:
I often give a unique id to every row in the de-normalised/import data. This makes it easier to track what has and has not been done, and it pays off in other ways (e.g. when blanks in the source data are meant to mean "same as the row above").
I have created temp tables to track relationships like this as I go along.
Sometimes (especially for less consistent data) these are not temp tables, as they can be used after the fact to analyse what did and didn't import (and where it went). Sometimes I have a comments column that the update queries populate with any details about exceptions relating to the import of that row.
Sometimes you are lucky and there is some kind of source or oldId field in the target that can be used to link the de-normalised data and the normalised version (this is particularly true of system-migration type tasks, as people often want to be able to look up items in the old system). Sometimes this can be weird and wonderful, e.g. using the updated-by or created-by field and looking for a special account that executes this particular process (though I would not particularly recommend that).
Sometimes it makes sense to update the source tables in some way, e.g. replacing identifiers there.
Sometimes you come up with ID ranges or similar that are used for import; you break the normal rules about where ids are generated, and your import process creates the IDs.
This often means shutting down all other access to the target system while the import is executed. It may sound mad, but sometimes this is the best way for very complex uploads that require a lot of preparation.
But often, when you think about it, there is a particular order in which you can add your data and avoid this issue, as you will always be able to identify the correct data. I have used the above techniques to make my life easier, but I am not sure I have ever HAD to use them.
The only exception I can think of is generating IDs outside of the system, which I have had to use, but this was so that IDs would be consistent across multiple trial loads and the final production load. Also, data was coming from many sources with many people working on it; it made life easier that they could be in control of their own IDs, but it did bring other issues ;).
Generally I would try to leave the source data alone and ensure that if you re-run any of your scripts they won't have any effect. This makes the whole system much more robust and gives everyone more confidence, as you can re-import the same data, or a file that has some of the same data, and run everything again without anything breaking.
Note I have not tested any of these queries and just wrote them off the top of my head, so sorry if they are not totally accurate.
EDIT: Here is a more complete set of code that shows exactly what's going on per the answer below.
libname output '/data/files/jeff';
%let DateStart = '01Jan2013'd;
%let DateEnd = '01Jun2013'd;
proc sql;
CREATE TABLE output.id AS (
SELECT DISTINCT id
FROM mydb.sale_volume AS sv
WHERE sv.category IN ('a', 'b', 'c') AND
sv.trans_date BETWEEN &DateStart AND &DateEnd
);
CREATE TABLE output.sums AS (
SELECT id, SUM(sales)
FROM mydb.sale_volume AS sv
INNER JOIN output.id AS ids
ON ids.id = sv.id
WHERE sv.trans_date BETWEEN &DateStart AND &DateEnd
GROUP BY id
);
quit;
The goal is to simply query the table for some id's based on category membership. Then I sum these members' activity across all categories.
The above approach is far slower than:
Running the first query to get the subset
Running a second query that sums every ID
Running a third query that inner joins the two result sets.
If I'm understanding correctly, it may be more efficient to make sure that all of my code is completely passed through rather than cross-loading.
After posting a question yesterday, a member suggested I might benefit from asking a separate question on performance that was more specific to my situation.
I'm using SAS Enterprise Guide to write some programs/data queries. I don't have permissions to modify the underlying data, which is stored in 'Teradata'.
My basic problem is writing efficient SQL queries in this environment. For example, I query a large table (with tens of millions of records) for a small subset of ID's. Then, I use this subset to query the larger table again:
proc sql;
CREATE TABLE subset AS (
SELECT
id
FROM
bigTable
WHERE
someValue = x AND
date BETWEEN a AND b
)
This works in a matter of seconds and returns 90k ID's. Next, I want to query this set of ID's against the big table, and problems ensue. I'm wanting to sum values over time for the ID's:
proc sql;
CREATE TABLE subset_data AS (
SELECT
bigTable.id,
SUM(bigTable.value) AS total
FROM
bigTable
INNER JOIN subset
ON subset.id = bigTable.id
WHERE
bigTable.date BETWEEN a AND b
GROUP BY
bigTable.id
)
For whatever reason, this takes a really long time. The difference is that the first query flags 'someValue'. The second looks at all activity, regardless of what's in 'someValue'. For example, I could flag every customer who orders a pizza. Then I would look at every purchase for all customers who ordered pizza.
I'm not overly familiar with SAS so I'm looking for any advice on how to do this more efficiently or speed things up. I'm open to any thoughts or suggestions and please let me know if I can offer more detail. I guess I'm just surprised the second query takes so long to process.
The most critical thing to understand when using SAS to access data in Teradata (or any other external database, for that matter) is that the SAS software prepares SQL and submits it to the database. The idea is to try to relieve you (the user) from all the database-specific details. SAS does this using a concept called "implicit pass-through", which just means that SAS does the translation from SAS code into DBMS code. Among the many things that occur is data type conversion: SAS has only two data types, numeric and character.
SAS deals with translating things for you but it can be confusing. For example, I've seen "lazy" database tables defined with VARCHAR(400) columns having values that never exceed some smaller length (like column for a person's name). In the data base this isn't much of a problem, but since SAS does not have a VARCHAR data type, it creates a variable 400 characters wide for each row. Even with data set compression, this can really make the resulting SAS dataset unnecessarily large.
The alternative is to use "explicit pass-through", where you write native queries using the actual syntax of the DBMS in question. These queries execute entirely on the DBMS and return results back to SAS (which still does the data type conversion for you). For example, here is a "pass-through" query that performs a join of two tables and creates a SAS dataset as the result:
proc sql;
connect to teradata (user=userid password=password mode=teradata);
create table mydata as
select * from connection to teradata (
select a.customer_id
, a.customer_name
, b.last_payment_date
, b.last_payment_amt
from base.customers a
join base.invoices b
on a.customer_id=b.customer_id
where b.bill_month = date '2013-07-01'
and b.paid_flag = 'N'
);
quit;
Notice that everything inside the pair of parentheses is native Teradata SQL and that the join operation itself is running inside the database.
The example code you have shown in your question is NOT a complete, working example of a SAS/Teradata program. To better assist, you need to show the real program, including any library references. For example, suppose your real program looks like this:
proc sql;
CREATE TABLE subset_data AS
SELECT bigTable.id,
SUM(bigTable.value) AS total
FROM TDATA.bigTable bigTable
JOIN TDATA.subset subset
ON subset.id = bigTable.id
WHERE bigTable.date BETWEEN a AND b
GROUP BY bigTable.id
;
That would indicate a previously assigned LIBNAME statement through which SAS was connecting to Teradata. The syntax of that WHERE clause would be very relevant to whether SAS is even able to pass the complete query to Teradata. (Your example doesn't show what "a" and "b" refer to.) It is very possible that the only way SAS can perform the join is to drag both tables back into a local work session and perform the join on your SAS server.
One thing I can strongly suggest is that you try to convince your Teradata administrators to allow you to create "driver" tables in some utility database. The idea is that you would create a relatively small table inside Teradata containing the ID's you want to extract, then use that table to perform explicit joins. I'm sure you would need a bit more formal database training to do that (like how to define a proper index and how to "collect statistics"), but with that knowledge and ability, your work will just fly.
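A rough sketch of that idea is below. All of the object names (the utility database utildb, the driver table, and the Teradata table names) are hypothetical, and the exact DDL and statistics syntax will depend on your site and Teradata release; treat it as an outline, not a working program:
proc sql;
   connect to teradata (user=userid password=password mode=teradata);

   /* create a small driver table with a unique primary index */
   execute (
      create table utildb.driver_ids (id integer)
      unique primary index (id)
   ) by teradata;

   /* load it entirely inside Teradata */
   execute (
      insert into utildb.driver_ids
      select distinct id
      from base.sale_volume
      where category in ('a','b','c')
        and trans_date between date '2013-01-01' and date '2013-06-01'
   ) by teradata;

   /* help the optimizer */
   execute (collect statistics on utildb.driver_ids column (id)) by teradata;

   /* join against the driver table, still inside Teradata */
   create table work.sums as
   select * from connection to teradata (
      select sv.id, sum(sv.sales) as total
      from base.sale_volume sv
      join utildb.driver_ids d
        on d.id = sv.id
      where sv.trans_date between date '2013-01-01' and date '2013-06-01'
      group by sv.id
   );

   disconnect from teradata;
quit;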
I could go on and on but I'll stop here. I use SAS with Teradata extensively every day against what I'm told is one of the largest Teradata environments on the planet. I enjoy programming in both.
You imply an assumption that the 90k records in your first query are all unique ids. Is that definite?
I ask because the implication from your second query is that they're not unique.
- One id can have multiple values over time, and have different somevalues
If the ids are not unique in the first dataset, you need to GROUP BY id or use DISTINCT, in the first query.
Imagine that the 90k rows consist of 30k unique ids, with an average of 3 rows per id.
Then imagine those 30k unique ids actually have 9 records each in your time window, including rows where somevalue <> x.
You will then get 3x9 = 27 join rows back per id, so the grouped SUM per id is three times too large and the intermediate result is roughly 810k rows instead of 90k.
And as those two numbers grow, the number of records produced by your second query grows geometrically.
Alternative Query
If that's not the problem, an alternative query (which is not ideal, but possible) would be...
SELECT
bigTable.id,
SUM(bigTable.value) AS total
FROM
bigTable
WHERE
bigTable.date BETWEEN a AND b
GROUP BY
bigTable.id
HAVING
MAX(CASE WHEN bigTable.somevalue = x THEN 1 ELSE 0 END) = 1
If ID is unique and a single value, then you can try constructing a format.
Create a dataset that looks like this:
fmtname, start, label
where fmtname is the same for all records and is a legal format name (begins and ends with a letter, contains alphanumerics or underscores); start is the ID value; and label is 1. Then add one row with the same value for fmtname, a blank start, a label of 0, and another variable, hlo='o' (for 'other'). Then import it into PROC FORMAT using the CNTLIN option, and you now have a 1/0 value conversion.
Here's a brief example using SASHELP.CLASS. ID here is name, but it can be numeric or character - whichever is right for your use.
data for_fmt;
set sashelp.class;
retain fmtname '$IDF'; *Format name is up to you. Should have $ if ID is character, no $ if numeric;
start=name; *this would be your ID variable - the look up;
label='1';
output;
if _n_ = 1 then do;
hlo='o';
call missing(start);
label='0';
output;
end;
run;
proc format cntlin=for_fmt;
quit;
Now instead of doing a join, you can do your query 'normally' but with an additional where clause of and put(id,$IDF.)='1'. This won't be optimized with an index or anything, but it may be faster than the join. (It may also not be faster - depends on how the SQL optimizer is working.)
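As a minimal sketch using the placeholder names from the question (bigTable, value, date, a, b), the summing query would then become something like:
proc sql;
   create table subset_data as
   select id,
          sum(value) as total
   from bigTable
   where date between a and b
     and put(id, $IDF.) = '1'
   group by id;
quit;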
If the id is unique you might add a UNIQUE PRIMARY INDEX(id) to that table, otherwise it defaults to a Non-unique PI.
Knowing about uniqueness helps the optimizer to produce a better plan.
Without more info like an Explain (just put EXPLAIN in front of the SELECT) it's hard to tell how this can be improved.
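For instance, taking the second query from the question (placeholder names and all), you would submit it natively to Teradata, e.g. via explicit pass-through or a Teradata client, as:
EXPLAIN
SELECT bigTable.id,
       SUM(bigTable.value) AS total
FROM bigTable
INNER JOIN subset
   ON subset.id = bigTable.id
WHERE bigTable.date BETWEEN a AND b
GROUP BY bigTable.id;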
One alternate solution is to use SAS procedures. I don't know what your actual SQL is doing, but if you're just doing frequencies (or something else that can be done in a PROC), you could do:
proc sql;
create view blah as select ... (your join);
quit;
proc freq data=blah;
tables id/out=summary(rename=(count=total) keep=id count);
run;
Or any number of other options (PROC MEANS, PROC TABULATE, etc.). That may be faster than doing the sum in SQL (depending on some details, such as how your data is organized, what you're actually doing, and how much memory you have available). It has the added benefit that SAS might choose to do this in-database, if you create the view in the database, which might be faster. (In fact, if you just run the freq off the base table, it's possible that would be even faster, and then join the results to the smaller table).
I have two tables that I am joining with the following query...
select *
from Partners p
inner join OrganizationMembers om on p.ParID = om.OrganizationId
where om.EmailAddress = 'my_email@address.com'
and om.deleted = 0
This works great, but I want some of the columns from Partners replaced with similarly named columns from OrganizationMembers. The number of columns I want to replace in the joined table is very small, no more than 3.
It is possible to get the result I want by selectively choosing the columns I want in the resulting join like so...
select om.MemberID,
p.ParID,
p.Levelz,
p.encryptedSecureToken,
p.PartnerGroupName,
om.EmailAddress,
om.FirstName,
om.LastName
from Partners p
inner join OrganizationMembers om on p.ParID = om.OrganizationId
where om.EmailAddress = 'my_email@address.com'
and om.deleted = 0
But this creates a very long sequence of select p.a, p.b, p.c, p.d, ... etc ... which I am trying to avoid.
In summary I am trying to get several columns from the Partners table and up to 3 columns from the OrganizationMembers table without having a long column specification sequence at the beginning of the query. Is it possible or am I just dreaming?
select om.MemberID as mem
Use the AS keyword. This is called aliasing.
You are dreaming in your implementation.
Also, as a best practice, select * is something that is typically frowned upon by DBA's.
If you want to limit the results or change anything, you must explicitly name the columns. As a potential "stop gap" you could do something like this:
SELECT p.*, om.MemberId, etc..
But this ONLY works if you want ALL columns from the first table, and then selected items.
Try this:
select p.*,
       om.EmailAddress,
       om.FirstName,
       om.LastName
from Partners p
inner join OrganizationMembers om on p.ParID = om.OrganizationId
You should never use * though. Always specifying the columns you actually need makes it easier to see what the query is doing.
But this creates a very long sequence
of select p.a, p.b, p.c, p.d, ... etc
... which I am trying to avoid.
Don't avoid it. Embrace it!
There are lots of reasons why it's best practice to explicitly list the desired columns.
It's easier to do searches for where a particular column is being used.
The behavior of the query is more obvious to someone who is trying to maintain it.
Adding a column to the table won't automatically change the behavior of your query.
Removing a column from the table will break your query earlier, making bugs appear closer to the source, and easier to find and fix.
And anything that uses the query is going to have to list all the columns anyway, so there's no point being lazy about it!
OK, here is my dilemma: I have a database set up with about 5 tables, all with the exact same data structure. The data is separated in this manner for localization purposes and to split up a total of about 4.5 million records.
A majority of the time only one table is needed and all is well. However, sometimes data is needed from 2 or more of the tables and it needs to be sorted by a user defined column. This is where I am having problems.
data columns:
id, band_name, song_name, album_name, genre
MySQL statement:
SELECT * from us_music, de_music where `genre` = 'punk'
MySQL spits out this error:
#1052 - Column 'genre' in where clause is ambiguous
Obviously, I am doing this wrong. Anyone care to shed some light on this for me?
I think you're looking for the UNION clause, a la
(SELECT * from us_music where `genre` = 'punk')
UNION
(SELECT * from de_music where `genre` = 'punk')
It sounds like you'd be happier with a single table. The fact that the five have the same schema, and sometimes need to be presented as if they came from one table, points to putting it all in one table.
Add a new column which can be used to distinguish among the five languages (I'm assuming it's language that is different among the tables since you said it was for localization). Don't worry about having 4.5 million records. Any real database can handle that size no problem. Add the correct indexes, and you'll have no trouble dealing with them as a single table.
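As a rough sketch of that idea (the table and column names here are made up, not from your schema):
-- hypothetical sketch: one combined table with a locale column and an
-- index that supports the common "genre within a locale" lookups
ALTER TABLE music ADD COLUMN locale CHAR(2) NOT NULL DEFAULT 'us';
CREATE INDEX idx_music_locale_genre ON music (locale, genre);

-- one locale or several then become a simple WHERE clause
SELECT * FROM music WHERE genre = 'punk' AND locale IN ('us', 'de');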
Any of the above answers are valid; an alternative is to qualify the column with the table name as well - eg:
SELECT * from us_music, de_music where `us_music`.`genre` = 'punk' AND `de_music`.`genre` = 'punk'
The column is ambiguous because it appears in both tables; you would need to specify the where (or sort) field fully, such as us_music.genre or de_music.genre, but you'd usually specify two tables only if you were then going to join them together in some fashion. The structure you're dealing with is occasionally referred to as a partitioned table, although it's usually done to separate the dataset into distinct files as well, rather than just to split the dataset arbitrarily. If you're in charge of the database structure and there's no good reason to partition the data, then I'd build one big table with an extra "origin" field that contains a country code, but you're probably doing it for a legitimate performance reason.
Either use a UNION to combine the tables you're interested in (http://dev.mysql.com/doc/refman/5.0/en/union.html), or use the MERGE storage engine (http://dev.mysql.com/doc/refman/5.1/en/merge-storage-engine.html).
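A rough sketch of the MERGE-engine route (it requires the underlying tables to be identical MyISAM tables; the column types below are guesses based on the columns you listed):
-- hypothetical sketch: a MERGE table that presents the per-country tables as one
CREATE TABLE all_music (
    id INT NOT NULL,
    band_name VARCHAR(255),
    song_name VARCHAR(255),
    album_name VARCHAR(255),
    genre VARCHAR(100),
    INDEX (genre)
) ENGINE=MERGE UNION=(us_music, de_music) INSERT_METHOD=LAST;

SELECT * FROM all_music WHERE genre = 'punk' ORDER BY band_name;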
Your original attempt to span both tables creates an implicit JOIN. This is frowned upon by most experienced SQL programmers because it separates the tables being combined from the condition that says how to combine them.
The UNION is a good solution for the tables as they are, but there should be no reason they can't be put into the one table with decent indexing. I've seen adding the correct index to a large table increase query speed by three orders of magnitude.
The UNION statement can cost a great deal of time on huge data sets. It is better to perform the select in 2 steps:
select the ids first,
then select from the main table using them.