SQL select join: is it possible to prefix all columns as 'prefix.*'? - sql

I'm wondering if this is possible in SQL. Say you have two tables A and B, and you do a select on table A and join on table B:
SELECT a.*, b.* FROM TABLE_A a JOIN TABLE_B b USING (some_id);
If table A has columns 'a_id', 'name', and 'some_id', and table B has 'b_id', 'name', and 'some_id', the query will return columns 'a_id', 'name', 'some_id', 'b_id', 'name', 'some_id'. Is there any way to prefix the column names of table B without listing every column individually? The equivalent of this:
SELECT a.*, b.b_id as 'b.b_id', b.name as 'b.name', b.some_id as 'b.some_id'
FROM TABLE_A a JOIN TABLE_B b USING (some_id);
But, as mentioned, without listing every column, so something like:
SELECT a.*, b.* as 'b.*'
FROM TABLE_A a JOIN TABLE_B b USING (some_id);
Basically something to say, "prefix every column returned by b.* with 'something'". Is this possible or am I out of luck?
EDITS
Advice on not using SELECT * and so on is valid advice but not relevant in my context, so please stick to the problem at hand -- is it possible to add a prefix (a constant specified in the SQL query) to all the column names of a table in a join?
My ultimate goal is to be able to do a SELECT * on two tables with a join, and be able to tell, from the names of the columns I get in my result set, which columns came from table A and which columns came from table B. Again, I don't want to have to list columns individually, I need to be able to do a SELECT *.

It seems the answer to your question is no, however one hack you can use is to assign a dummy column to separate each new table. This works especially well if you're looping through a result set for a list of columns in a scripting language such as Python or PHP.
SELECT '' as table1_dummy, table1.*, '' as table2_dummy, table2.*, '' as table3_dummy, table3.* FROM table1
JOIN table2 ON table2.table1id = table1.id
JOIN table3 ON table3.table1id = table1.id
I realize this doesn't answer your question exactly, but if you're a coder this is a great way to separate tables with duplicate column names.

I see two possible situations here. First, you want to know if there is a SQL standard for this, that you can use in general regardless of the database. No, there is not. Second, you want to know with regard to a specific dbms product. Then you need to identify it. But I imagine the most likely answer is that you'll get back something like "a.id, b.id" since that's how you'd need to identify the columns in your SQL expression. And the easiest way to find out what the default is, is just to submit such a query and see what you get back. If you want to specify what prefix comes before the dot, you can use "SELECT * FROM a AS my_alias", for instance.

I totally understand why this is necessary - at least for me it's handy during rapid prototyping when there are a lot of tables necessary to be joined, including many inner joins. As soon as a column name is the same in a second "joinedtable.*" field wild card, the main table's field values are overriden with the joinedtable values. Error prone, frustrating and a violation of DRY when having to manually specify the table fields with aliases over and over...
Here is a PHP (Wordpress) function to achieve this through code generation together with an example of how to use it. In the example, it is used to rapidly generate a custom query that will provide the fields of a related wordpress post that was referenced through a advanced custom fields field.
function prefixed_table_fields_wildcard($table, $alias)
{
global $wpdb;
$columns = $wpdb->get_results("SHOW COLUMNS FROM $table", ARRAY_A);
$field_names = array();
foreach ($columns as $column)
{
$field_names[] = $column["Field"];
}
$prefixed = array();
foreach ($field_names as $field_name)
{
$prefixed[] = "`{$alias}`.`{$field_name}` AS `{$alias}.{$field_name}`";
}
return implode(", ", $prefixed);
}
function test_prefixed_table_fields_wildcard()
{
global $wpdb;
$query = "
SELECT
" . prefixed_table_fields_wildcard($wpdb->posts, 'campaigns') . ",
" . prefixed_table_fields_wildcard($wpdb->posts, 'venues') . "
FROM $wpdb->posts AS campaigns
LEFT JOIN $wpdb->postmeta meta1 ON (meta1.meta_key = 'venue' AND campaigns.ID = meta1.post_id)
LEFT JOIN $wpdb->posts venues ON (venues.post_status = 'publish' AND venues.post_type = 'venue' AND venues.ID = meta1.meta_value)
WHERE 1
AND campaigns.post_status = 'publish'
AND campaigns.post_type = 'campaign'
LIMIT 1
";
echo "<pre>$query</pre>";
$posts = $wpdb->get_results($query, OBJECT);
echo "<pre>";
print_r($posts);
echo "</pre>";
}
The output:
SELECT
`campaigns`.`ID` AS `campaigns.ID`, `campaigns`.`post_author` AS `campaigns.post_author`, `campaigns`.`post_date` AS `campaigns.post_date`, `campaigns`.`post_date_gmt` AS `campaigns.post_date_gmt`, `campaigns`.`post_content` AS `campaigns.post_content`, `campaigns`.`post_title` AS `campaigns.post_title`, `campaigns`.`post_excerpt` AS `campaigns.post_excerpt`, `campaigns`.`post_status` AS `campaigns.post_status`, `campaigns`.`comment_status` AS `campaigns.comment_status`, `campaigns`.`ping_status` AS `campaigns.ping_status`, `campaigns`.`post_password` AS `campaigns.post_password`, `campaigns`.`post_name` AS `campaigns.post_name`, `campaigns`.`to_ping` AS `campaigns.to_ping`, `campaigns`.`pinged` AS `campaigns.pinged`, `campaigns`.`post_modified` AS `campaigns.post_modified`, `campaigns`.`post_modified_gmt` AS `campaigns.post_modified_gmt`, `campaigns`.`post_content_filtered` AS `campaigns.post_content_filtered`, `campaigns`.`post_parent` AS `campaigns.post_parent`, `campaigns`.`guid` AS `campaigns.guid`, `campaigns`.`menu_order` AS `campaigns.menu_order`, `campaigns`.`post_type` AS `campaigns.post_type`, `campaigns`.`post_mime_type` AS `campaigns.post_mime_type`, `campaigns`.`comment_count` AS `campaigns.comment_count`,
`venues`.`ID` AS `venues.ID`, `venues`.`post_author` AS `venues.post_author`, `venues`.`post_date` AS `venues.post_date`, `venues`.`post_date_gmt` AS `venues.post_date_gmt`, `venues`.`post_content` AS `venues.post_content`, `venues`.`post_title` AS `venues.post_title`, `venues`.`post_excerpt` AS `venues.post_excerpt`, `venues`.`post_status` AS `venues.post_status`, `venues`.`comment_status` AS `venues.comment_status`, `venues`.`ping_status` AS `venues.ping_status`, `venues`.`post_password` AS `venues.post_password`, `venues`.`post_name` AS `venues.post_name`, `venues`.`to_ping` AS `venues.to_ping`, `venues`.`pinged` AS `venues.pinged`, `venues`.`post_modified` AS `venues.post_modified`, `venues`.`post_modified_gmt` AS `venues.post_modified_gmt`, `venues`.`post_content_filtered` AS `venues.post_content_filtered`, `venues`.`post_parent` AS `venues.post_parent`, `venues`.`guid` AS `venues.guid`, `venues`.`menu_order` AS `venues.menu_order`, `venues`.`post_type` AS `venues.post_type`, `venues`.`post_mime_type` AS `venues.post_mime_type`, `venues`.`comment_count` AS `venues.comment_count`
FROM wp_posts AS campaigns
LEFT JOIN wp_postmeta meta1 ON (meta1.meta_key = 'venue' AND campaigns.ID = meta1.post_id)
LEFT JOIN wp_posts venues ON (venues.post_status = 'publish' AND venues.post_type = 'venue' AND venues.ID = meta1.meta_value)
WHERE 1
AND campaigns.post_status = 'publish'
AND campaigns.post_type = 'campaign'
LIMIT 1
Array
(
[0] => stdClass Object
(
[campaigns.ID] => 33
[campaigns.post_author] => 2
[campaigns.post_date] => 2012-01-16 19:19:10
[campaigns.post_date_gmt] => 2012-01-16 19:19:10
[campaigns.post_content] => Lorem ipsum
[campaigns.post_title] => Lorem ipsum
[campaigns.post_excerpt] =>
[campaigns.post_status] => publish
[campaigns.comment_status] => closed
[campaigns.ping_status] => closed
[campaigns.post_password] =>
[campaigns.post_name] => lorem-ipsum
[campaigns.to_ping] =>
[campaigns.pinged] =>
[campaigns.post_modified] => 2012-01-16 21:01:55
[campaigns.post_modified_gmt] => 2012-01-16 21:01:55
[campaigns.post_content_filtered] =>
[campaigns.post_parent] => 0
[campaigns.guid] => http://example.com/?p=33
[campaigns.menu_order] => 0
[campaigns.post_type] => campaign
[campaigns.post_mime_type] =>
[campaigns.comment_count] => 0
[venues.ID] => 84
[venues.post_author] => 2
[venues.post_date] => 2012-01-16 20:12:05
[venues.post_date_gmt] => 2012-01-16 20:12:05
[venues.post_content] => Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
[venues.post_title] => Lorem ipsum venue
[venues.post_excerpt] =>
[venues.post_status] => publish
[venues.comment_status] => closed
[venues.ping_status] => closed
[venues.post_password] =>
[venues.post_name] => lorem-ipsum-venue
[venues.to_ping] =>
[venues.pinged] =>
[venues.post_modified] => 2012-01-16 20:53:37
[venues.post_modified_gmt] => 2012-01-16 20:53:37
[venues.post_content_filtered] =>
[venues.post_parent] => 0
[venues.guid] => http://example.com/?p=84
[venues.menu_order] => 0
[venues.post_type] => venue
[venues.post_mime_type] =>
[venues.comment_count] => 0
)
)

The only database I know that does this is SQLite, depending on the settings you configure with PRAGMA full_column_names and PRAGMA short_column_names. See http://www.sqlite.org/pragma.html
Otherwise all I can recommend is to fetch columns in a result set by ordinal position rather than by column name, if it's too much trouble for you to type the names of the columns in your query.
This is a good example of why it's bad practice to use SELECT * -- because eventually you'll have a need to type out all the column names anyway.
I understand the need to support columns that may change name or position, but using wildcards makes that harder, not easier.

This question is very useful in practice. It's only necessary to list every explicit columns in software programming, where you pay particular careful to deal with all conditions.
Imagine when debugging, or, try to use DBMS as daily office tool, instead of something alterable implementation of specific programmer's abstract underlying infrastructure, we need to code a lot of SQLs. The scenario can be found everywhere, like database conversion, migration, administration, etc. Most of these SQLs will be executed only once and never be used again, give the every column names is just waste of time. And don't forget the invention of SQL isn't only for the programmers use.
Usually I will create a utility view with column names prefixed, here is the function in pl/pgsql, it's not easy but you can convert it to other procedure languages.
-- Create alias-view for specific table.
create or replace function mkaview(schema varchar, tab varchar, prefix varchar)
returns table(orig varchar, alias varchar) as $$
declare
qtab varchar;
qview varchar;
qcol varchar;
qacol varchar;
v record;
sql varchar;
len int;
begin
qtab := '"' || schema || '"."' || tab || '"';
qview := '"' || schema || '"."av' || prefix || tab || '"';
sql := 'create view ' || qview || ' as select';
for v in select * from information_schema.columns
where table_schema = schema and table_name = tab
loop
qcol := '"' || v.column_name || '"';
qacol := '"' || prefix || v.column_name || '"';
sql := sql || ' ' || qcol || ' as ' || qacol;
sql := sql || ', ';
return query select qcol::varchar, qacol::varchar;
end loop;
len := length(sql);
sql := left(sql, len - 2); -- trim the trailing ', '.
sql := sql || ' from ' || qtab;
raise info 'Execute SQL: %', sql;
execute sql;
end
$$ language plpgsql;
Examples:
-- This will create a view "avp_person" with "p_" prefix to all column names.
select * from mkaview('public', 'person', 'p_');
select * from avp_person;

In postgres, I use the json functions to instead return json objects....
then, after querying, I json_decode the fields with a _json suffix.
IE:
select row_to_json(tab1.*) AS tab1_json, row_to_json(tab2.*) AS tab2_json
from tab1
join tab2 on tab2.t1id=tab1.id
then in PHP (or any other language), I loop through the returned columns and json_decode() them if they have the "_json" suffix (also removing the suffix. In the end, I get an object called "tab1" that includes all tab1 fields, and another called "tab2" that includes all tab2 fields.

I am in kind of the same boat as OP - I have dozens of fields from 3 different tables that I'm joining, some of which have the same name(ie. id, name, etc). I don't want to list each field, so my solution was to alias those fields that shared a name and use select * for those that have a unique name.
For example :
table a :
id,
name,
field1,
field2 ...
table b :
id,
name,
field3,
field4 ...
select a.id as aID, a.name as aName, a. * , b.id as bID, b.name as bName, b. * .....
When accessing the results I us the aliased names for these fields and ignore the "original" names.
Maybe not the best solution but it works for me....i'm use mysql

DIfferent database products will give you different answers; but you're setting yourself up for hurt if you carry this very far. You're far better off choosing the columns you want, and giving them your own aliases so the identity of each column is crystal-clear, and you can tell them apart in the results.

I totally understand your problem about duplicated field names.
I needed that too until I coded my own function to solve it. If you are using PHP you can use it, or code yours in the language you are using for if you have this following facilities.
The trick here is that mysql_field_table() returns the table name and mysql_field_name() the field for each row in the result if it's got with mysql_num_fields() so you can mix them in a new array.
This prefixes all columns ;)
Regards,
function mysql_rows_with_columns($query) {
$result = mysql_query($query);
if (!$result) return false; // mysql_error() could be used outside
$fields = mysql_num_fields($result);
$rows = array();
while ($row = mysql_fetch_row($result)) {
$newRow = array();
for ($i=0; $i<$fields; $i++) {
$table = mysql_field_table($result, $i);
$name = mysql_field_name($result, $i);
$newRow[$table . "." . $name] = $row[$i];
}
$rows[] = $newRow;
}
mysql_free_result($result);
return $rows;
}

There is no SQL standard for this.
However With code generation (either on demand as the tables are created or altered or at runtime), you can do this quite easily:
CREATE TABLE [dbo].[stackoverflow_329931_a](
[id] [int] IDENTITY(1,1) NOT NULL,
[col2] [nchar](10) NULL,
[col3] [nchar](10) NULL,
[col4] [nchar](10) NULL,
CONSTRAINT [PK_stackoverflow_329931_a] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
CREATE TABLE [dbo].[stackoverflow_329931_b](
[id] [int] IDENTITY(1,1) NOT NULL,
[col2] [nchar](10) NULL,
[col3] [nchar](10) NULL,
[col4] [nchar](10) NULL,
CONSTRAINT [PK_stackoverflow_329931_b] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
DECLARE #table1_name AS varchar(255)
DECLARE #table1_prefix AS varchar(255)
DECLARE #table2_name AS varchar(255)
DECLARE #table2_prefix AS varchar(255)
DECLARE #join_condition AS varchar(255)
SET #table1_name = 'stackoverflow_329931_a'
SET #table1_prefix = 'a_'
SET #table2_name = 'stackoverflow_329931_b'
SET #table2_prefix = 'b_'
SET #join_condition = 'a.[id] = b.[id]'
DECLARE #CRLF AS varchar(2)
SET #CRLF = CHAR(13) + CHAR(10)
DECLARE #a_columnlist AS varchar(MAX)
DECLARE #b_columnlist AS varchar(MAX)
DECLARE #sql AS varchar(MAX)
SELECT #a_columnlist = COALESCE(#a_columnlist + #CRLF + ',', '') + 'a.[' + COLUMN_NAME + '] AS [' + #table1_prefix + COLUMN_NAME + ']'
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = #table1_name
ORDER BY ORDINAL_POSITION
SELECT #b_columnlist = COALESCE(#b_columnlist + #CRLF + ',', '') + 'b.[' + COLUMN_NAME + '] AS [' + #table2_prefix + COLUMN_NAME + ']'
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = #table2_name
ORDER BY ORDINAL_POSITION
SET #sql = 'SELECT ' + #a_columnlist + '
,' + #b_columnlist + '
FROM [' + #table1_name + '] AS a
INNER JOIN [' + #table2_name + '] AS b
ON (' + #join_condition + ')'
PRINT #sql
-- EXEC (#sql)

I use to_jsonb function in PostgreSQL 13 to get all fields in joined table as a single column.
select
TABLE_A.*,
to_jsonb(TABLE_B.*) as b,
to_jsonb(TABLE_C.*) as c
from TABLE_A
left join TABLE_B on TABLE_B.a_id=TABLE_A.id
left join TABLE_C on TABLE_C.a_id=TABLE_A.id
where TABLE_A.id=1
As a result you will get the number of TABLE_A columns plus b and c columns:
id
name
some_other_col
b
c
1
Some name
Some other value
{"id":1,"a_id":1,"prop":"value"}
{"id":1,"a_id":1,"prop":"value"}
1
Some other name
Another value
{"id":1,"a_id":1,"prop":"value"}
{"id":1,"a_id":1,"prop":"value"}
You just need to parse the b and c columns to convert them to an object.

Or you could use Red Gate SQL Refactor or SQL Prompt, which expands your SELECT * into column lists with a click of the Tab button
so in your case, if you type in SELECT * FROM A JOIN B ...
Go to the end of *, Tab button, voila! you'll see
SELECT A.column1, A.column2, .... , B.column1, B.column2 FROM A JOIN B
It's not free though

I solved a similar problem of mine by renaming the fields in the tables involved. Yes, I had the privilege of doing this and understand that everybody may not have it. I added prefix to each field within a table representing the table name. Thus the SQL posted by OP would remain unchanged -
SELECT a.*, b.* FROM TABLE_A a JOIN TABLE_B b USING (some_id);
and still give the expected results - ease of identifying which table the output fields belongs to.

Recently ran into this issue in NodeJS and Postgres.
ES6 approach
There aren't any RDBMS features I know of that provide this functionality, so I created an object containing all my fields, e.g.:
const schema = { columns: ['id','another_column','yet_another_column'] }
Defined a reducer to concatenate the strings together with a table name:
const prefix = (table, columns) => columns.reduce((previous, column) => {
previous.push(table + '.' + column + ' AS ' + table + '_' + column);
return previous;
}, []);
This returns an array of strings. Call it for each table and combine the results:
const columns_joined = [...prefix('tab1',schema.columns), ...prefix('tab2',schema.columns)];
Output the final SQL statement:
console.log('SELECT ' + columns_joined.join(',') + ' FROM tab1, tab2 WHERE tab1.id = tab2.id');

select * usually makes for bad code, as new columns tend to get added or order of columns change in tables quite frequently which usually breaks select * in a very subtle ways. So listing out columns is the right solution.
As to how to do your query, not sure about mysql but in sqlserver you could select column names from syscolumns and dynamically build the select clause.

There are two ways I can think of to make this happen in a reusable way. One is to rename all of your columns with a prefix for the table they have come from. I have seen this many times, but I really don't like it. I find that it's redundant, causes a lot of typing, and you can always use aliases when you need to cover the case of a column name having an unclear origin.
The other way, which I would recommend you do in your situation if you are committed to seeing this through, is to create views for each table that alias the table names. Then you join against those views, rather than the tables. That way, you are free to use * if you wish, free to use the original tables with original column names if you wish, and it also makes writing any subsequent queries easier because you have already done the renaming work in the views.
Finally, I am not clear why you need to know which table each of the columns came from. Does this matter? Ultimately what matters is the data they contain. Whether UserID came from the User table or the UserQuestion table doesn't really matter. It matters, of course, when you need to update it, but at that point you should already know your schema well enough to determine that.

If concerned about schema changes this might work for you:
1. Run a 'DESCRIBE table' query on all tables involved.
2. Use the returned field names to dynamically construct a string of column names prefixed with your chosen alias.

There is a direct answer to your question for those who use the MySQL C-API.
Given the SQL:
SELECT a.*, b.*, c.* FROM table_a a JOIN table_b b USING (x) JOIN table_c c USING (y)
The results from 'mysql_stmt_result_metadata()' gives the definition of your fields from your prepared SQL query into the structure MYSQL_FIELD[]. Each field contains the following data:
char *name; /* Name of column (may be the alias) */
char *org_name; /* Original column name, if an alias */
char *table; /* Table of column if column was a field */
char *org_table; /* Org table name, if table was an alias */
char *db; /* Database for table */
char *catalog; /* Catalog for table */
char *def; /* Default value (set by mysql_list_fields) */
unsigned long length; /* Width of column (create length) */
unsigned long max_length; /* Max width for selected set */
unsigned int name_length;
unsigned int org_name_length;
unsigned int table_length;
unsigned int org_table_length;
unsigned int db_length;
unsigned int catalog_length;
unsigned int def_length;
unsigned int flags; /* Div flags */
unsigned int decimals; /* Number of decimals in field */
unsigned int charsetnr; /* Character set */
enum enum_field_types type; /* Type of field. See mysql_com.h for types */
Take notice the fields: catalog,table,org_name
You now know which fields in your SQL belongs to which schema (aka catalog) and table.
This is enough to generically identify each field from a multi-table sql query, without having to alias anything.
An actual product SqlYOG is show to use this exact data in such a manor that they are able to independently update each table of a multi-table join, when the PK fields are present.

Cant do this without aliasing , simply because, how are you going to reference a field in the where clause, if that field exists in the 2 or 3 tables you are joining?
It will be unclear for mysql which one you are trying to reference.

Developing from this solution, this is how I would approach the problem:
First create a list of all the AS statements:
DECLARE #asStatements varchar(8000)
SELECT #asStatements = ISNULL(#asStatements + ', ','') + QUOTENAME(table_name) + '.' + QUOTENAME(column_name) + ' AS ' + '[' + table_name + '.' + column_name + ']'
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'TABLE_A' OR TABLE_NAME = 'TABLE_B'
ORDER BY ORDINAL_POSITION
Then use it in your query:
EXEC('SELECT ' + #asStatements + ' FROM TABLE_A a JOIN TABLE_B b USING (some_id)');
However, this might need modifications because something similar is only tested in SQL Server. But this code doesn't exactly work in SQL Server because USING is not supported.
Please comment if you can test/correct this code for e.g. MySQL.

PHP 7.2 + MySQL/Mariadb
MySQL will send you multiple fields with the same name. Even in the terminal client. But if you want an associative array, you'll have to make the keys yourself.
Thanks to #axelbrz for the original. I've ported it to newer php and cleaned it up a little:
function mysqli_rows_with_columns($link, $query) {
$result = mysqli_query($link, $query);
if (!$result) {
return mysqli_error($link);
}
$field_count = mysqli_num_fields($result);
$fields = array();
for ($i = 0; $i < $field_count; $i++) {
$field = mysqli_fetch_field_direct($result, $i);
$fields[] = $field->table . '.' . $field->name; # changed by AS
#$fields[] = $field->orgtable . '.' . $field->orgname; # actual table/field names
}
$rows = array();
while ($row = mysqli_fetch_row($result)) {
$new_row = array();
for ($i = 0; $i < $field_count; $i++) {
$new_row[$fields[$i]] = $row[$i];
}
$rows[] = $new_row;
}
mysqli_free_result($result);
return $rows;
}
$link = mysqli_connect('localhost', 'fixme', 'fixme', 'fixme');
print_r(mysqli_rows_with_columns($link, 'select foo.*, bar.* from foo, bar'));

I implemented a solution based upon the answer suggesting using dummy or sentinel columns in node. You would use it by generating SQL like:
select
s.*
, '' as _prefix__creator_
, u.*
, '' as _prefix__speaker_
, p.*
from statements s
left join users u on s.creator_user_id = u.user_id
left join persons p on s.speaker_person_id = p.person_id
And then post-processing the row you get back from your database driver like addPrefixes(row).
Implementation (based upon the fields/rows returned by my driver, but should be easy to change for other DB drivers):
const PREFIX_INDICATOR = '_prefix__'
const STOP_PREFIX_INDICATOR = '_stop_prefix'
/** Adds a <prefix> to all properties that follow a property with the name: PREFIX_INDICATOR<prefix> */
function addPrefixes(fields, row) {
let prefix = null
for (const field of fields) {
const key = field.name
if (key.startsWith(PREFIX_INDICATOR)) {
if (row[key] !== '') {
throw new Error(`PREFIX_INDICATOR ${PREFIX_INDICATOR} must not appear with a value, but had value: ${row[key]}`)
}
prefix = key.substr(PREFIX_INDICATOR.length)
delete row[key]
} else if (key === STOP_PREFIX_INDICATOR) {
if (row[key] !== '') {
throw new Error(`STOP_PREFIX_INDICATOR ${STOP_PREFIX_INDICATOR} must not appear with a value, but had value: ${row[key]}`)
}
prefix = null
delete row[key]
} else if (prefix) {
const prefixedKey = prefix + key
row[prefixedKey] = row[key]
delete row[key]
}
}
return row
}
Test:
const {
addPrefixes,
PREFIX_INDICATOR,
STOP_PREFIX_INDICATOR,
} = require('./BaseDao')
describe('addPrefixes', () => {
test('adds prefixes', () => {
const fields = [
{name: 'id'},
{name: PREFIX_INDICATOR + 'my_prefix_'},
{name: 'foo'},
{name: STOP_PREFIX_INDICATOR},
{name: 'baz'},
]
const row = {
id: 1,
[PREFIX_INDICATOR + 'my_prefix_']: '',
foo: 'bar',
[STOP_PREFIX_INDICATOR]: '',
baz: 'spaz'
}
const expected = {
id: 1,
my_prefix_foo: 'bar',
baz: 'spaz',
}
expect(addPrefixes(fields, row)).toEqual(expected)
})
})

What I do is use Excel to concatenate the procedure. For instance, first I select * and get all of the columns, paste them in Excel. Then write out the code I need to surround the column. Say i needed to ad prev to a bunch of columns. I'd have my fields in the a column and "as prev_" in column B and my fields again in column c. In column d i'd have a column.
Then use concatanate in column e and merge them together, making sure to include spaces. Then cut and paste this into your sql code. I've also used this method to make case statements for the same field and other longer codes i need to do for each field in a multihundred field table.

This creates the list of fields with a given prefix
select
name + ' as prefix.' + name + ','
from sys.columns where object_id = object_id('mytable')
order by column_id

Same response as the very good 'PHP (Wordpress) function' but coded for CakePHP 4.3.
Place in src/Controller/Component/MyUtilsComponent.php
<?php
namespace App\Controller\Component;
use Cake\Controller\Component;
use Cake\Datasource\ConnectionManager;
class MyUtilsComponent extends Component
{
public static function prefixedTableFieldsWildcard(string $table, string $alias, string $connexion = 'default'): string
{
$c = ConnectionManager::get($connexion);
$columns = $c->execute("SHOW COLUMNS FROM $table");
$field_names = [];
foreach ($columns as $column) {
$field_names[] = $column['Field'];
}
$prefixed = [];
foreach ($field_names as $field_name) {
$prefixed[] = "`{$alias}`.`{$field_name}` AS `{$alias}.{$field_name}`";
}
return implode(', ', $prefixed);
}
}
Tests and usage
function testPrefixedTableFieldsWildcard(): void
{
$fields = MyUtilsComponent::prefixedTableFieldsWildcard('metas', 'u', 'test');
$this->assertEquals('`u`.`id` AS `u.id`, `u`.`meta_key` AS `u.meta_key`, `u`.`meta_value` AS `u.meta_value`, `u`.`meta_default` AS `u.meta_default`, `u`.`meta_desc` AS `u.meta_desc`', $fields,);
}

You would think that over 13 years Microsoft would have put this in. It would be incredibly useful for debugging purposes.
What i've gotten in the habit of doing is selecting the columns I think I want to compare and then putting an * at the end to catch anything else I might want to look at.
select a.breed, a.size, p.breed, p.size,a.,p.
from animal a
join pet p on a.breed=p.breed
anyway you get the idea.

Related

Create union from multiple column query result

I'm trying to convert the result of an sql query into a set.
The table looks like:
create table edges (
a int,
b int
);
I'd like to use the result of a query as a subquery which needs both columns 'a', and 'b' in a union.
select * from ... where id in (select a from edges union select b from edges))
The above query works, but it would be great if I could convert the result of the subquery into a union already without needing to join it with another subquery which returns the 2nd column.
... in (select makeunion(a, b) from edges)
Can anyone help?
I think if you need to use IN statement you can't avoid UNION. But you can use other WHERE conditions, for example:
where EXISTS(select a from edges where a=id or b=id)
I actually think you can't factorize this code. UNION requires you to use both queries, one first, one after.
But, you can make a php function that does it for you like :
function makeUnion ($table, $rows) {
$ret = '(';
$array_rows = explode (',', $rows);
foreach ($array_rows as $v) {
if ($ret != '(')
$ret .= ' UNION ';
$ret .= 'SELECT ' .$v. ' FROM ' .$table;
}
$ret .= ')';
return $ret;
}
And then use it in your query like this :
$sql->query("... IN " .makeUnion('edges','a,b'));
I haven't tested it, so don't there may be some errors, but I think you get the point.

Creating a new table from grouped substring of existing table

I am having some trouble creating some SQL (for SQL server 2008).
I have a table of tasks that are priority ordered, comma delimited tasks:
Id = 1, LongTaskName = "a,b,c"
Id = 2, LongTaskName = "a,c"
Id = 3, LongTaskName = "b,c"
Id = 4, LongTaskName = "a"
etc...
I am trying to build a new table that groups them by the first task, along with the id:
GroupName: "a", TaskId: 1
GroupName: "a", TaskId: 2
GroupName: "a", TaskId: 4
GroupName: "b", TaskId: 3
Here is the naive, slow, linq code:
foreach(var t in Tasks)
{
var gt = new GroupedTasks();
gt.TaskId = t.Id;
var firstWord = t.LongTaskName.Split(',');
if(firstWord.Count() > 0)
{
gt.GroupName = firstWord.First();
}
else
{
gt.GroupName = t.LongTaskName;
}
GroupedTasks.InsertOnSubmit(gt);
}
I wrote a sql function to do the string split:
create function fn_Split(
#String nvarchar (4000),
#Delimiter nvarchar (10)
)
returns nvarchar(4000)
begin
declare #FirstComma int
set #FirstComma = charindex(#Delimiter,#String)
if(#FirstComma = 0)
return #String
return substring(#String, 0, #FirstComma)
end
go
However, I am getting stuck on the real sql to do the work.
I can get the group by alone:
SELECT dbo.fn_Split(LongTaskName, ',')
FROM [dbo].[Tasks]
GROUP BY dbo.fn_Split(LongTaskName, ',')
And I know I need to head down something like this:
DECLARE #RowSet TABLE (GroupName nvarchar(1024), Id nvarchar(5))
insert into #RowSet
select ???
FROM [dbo].Tasks as T
INNER JOIN
(
SELECT dbo.fn_Split(LongTaskName, ',')
FROM [dbo].[Tasks]
GROUP BY dbo.fn_Split(LongTaskName, ',')
) G
ON T.??? = G.???
ORDER BY ???
INSERT INTO dbo.GroupedTasks(GroupName, Id)
select * from #RowSet
But I am not quite groking how to reference the grouped relationships and am confused about having to call split multiple times.
Any thoughts?
If you only care about the first item in the list, there's no need really for a function. I would recommend this way. You also don't need the #RowSet table variable for any temporary holding.
INSERT dbo.GroupedTasks(GroupName, Id)
SELECT
LEFT(LongTaskName, COALESCE(NULLIF(CHARINDEX(',', LongTaskName)-1, -1), 1024)),
Id
FROM dbo.Tasks;
It is even easier if the tasks are 1-character long, you can use LEFT(LongTaskName, 1) instead of the ugly SUBSTRING/CHARINDEX mess. But I'm guessing your task names are not one character long (if this is the case, you should include some data that varies a bit so that others don't make assumptions about length).
Now, keep in mind that you'll have to do something like this to keep dbo.GroupedTasks up to date every time a dbo.Tasks row is inserted, updated or deleted. How are you going to keep these two tables in sync?
More to the point, you should consider storing the top priority task separately in the first place, either by using a computed column or separating it out before the insert. Munging data together is something that you do with hash tables and arrays in application code, but it rarely has any positive attributes inside a database. You almost always spend more time and effort extracting the data apart than you ever saved by keeping it together in the first place. This will negate the need for a second table at all.
Select Id, Split( ',', LongTaskName ) as GroupName into TasksWithGroupInfo
Does this answer your question?

What is the best way to implement a substring search in SQL?

We have a simple SQL problem here. In a varchar column, we wanted to search for a string anywhere in the field. What is the best way to implement this for performance? Obviously an index is not going to help here, any other tricks?
We are using MySQL and have about 3 million records. We need to execute many of these queries per second so really trying to implement these with the best performance.
The most simple way to do this is so far is:
Select * from table where column like '%search%'
I should further specify that the column is actually a long string like "sadfasdfwerwe" and I have to search for "asdf" in this column. So they are not sentences and trying to match a word in them. Would full text search still help here?
Check out my presentation Practical Fulltext Search in MySQL.
I compared:
LIKE predicates
Regular expression predicates (no better than LIKE)
MyISAM FULLTEXT indexing
Sphinx Search
Apache Lucene
Inverted indexing
Google Custom Search Engine
Today what I would use is Apache Solr, which puts Lucene into a service with a bunch of extra features and tools.
Re your comment: Aha, okay, no. None of the fulltext search capabilities I mentioned are going to help, since they all assume some kind of word boundaries
The other way to efficiently find arbitrary substrings is the N-gram approach. Basically, create an index of all possible sequences of N letters and point to the strings where each respective sequence occurs. Typically this is done with N=3, or a trigram, because it's a point of compromise between matching longer substrings and keeping the index to a manageable size.
I don't know of any SQL database that supports N-gram indexing transparently, but you could set it up yourself using an inverted index:
create table trigrams (
trigram char(3) primary key
);
create table trigram_matches (
trigram char(3),
document_id int,
primary key (trigram, document_id),
foreign key (trigram) references trigrams(trigram),
foreign key (document_id) references mytable(document_id)
);
Now populate it the hard way:
insert into trigram_matches
select t.trigram, d.document_id
from trigrams t join mytable d
on d.textcolumn like concat('%', t.trigram, '%');
Of course this will take quite a while! But once it's done, you can search much more quickly:
select d.*
from mytable d join trigram_matches t
on t.document_id = d.document_id
where t.trigram = 'abc'
Of course you could be searching for patterns longer than three characters, but the inverted index still helps to narrow your search a lot:
select d.*
from mytable d join trigram_matches t
on t.document_id = d.document_id
where t.trigram = 'abc'
and d.textcolumn like '%abcdef%';
I you want to match whole words, look at a FULLTEXT index & MATCH() AGAINST(). And of course, take a load of your database server: cache results for a appropriate amount of time for you specific needs.
First, maybe this is an issue with a badly designed table that stores a delimited string in one field instead of correctly designing to make a related table. If this is the case, you should fix your design.
If you have a field with long descriptive text (saya a notes field) and the search is always by whole word, you can do a full-text search.
Consider if you can require your users to at least give you the first character of what they are searching for if it is an ordinary field like Last_name.
Consider doing an exact match search first and only performing the wildcard match if no results are returned. This will work if you have users who can provide exact matches. We did this once with airport name searches, it came back really fast if they put inthe exact name and slower if they did not.
If you want to search just for strings that are not words that may be somewhere in the text, you are pretty much stuck with bad performance.
mysql fulltext search's quality (for this purpose) is poor, if your language is not English
trigram search gives very good results, for this task
postgreSQL has trigram index, it's easy to use :)
but if you need to do it in mysql, try this, improved version of Bill Karwin's answer:
-each trigram is stored only once
-a simple php class uses the data
<?php
/*
# mysql table structure
CREATE TABLE `trigram2content` (
`trigram_id` int NOT NULL REFERENCES trigrams(id),
`content_type_id` int(11) NOT NULL,
`record_id` int(11) NOT NULL,
PRIMARY KEY (`content_type_id`,`trigram_id`,`record_id`)
);
#each trigram is stored only once
CREATE TABLE `trigrams` (
`id` int not null auto_increment,
`token` varchar(3) NOT NULL,
PRIMARY KEY (id),
UNIQUE token(token)
) DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
SELECT count(*), record_id FROM trigrams t
inner join trigram2content c ON t.id=c.trigram_id
WHERE (
t.token IN ('loc','ock','ck ','blo',' bl', ' bu', 'bur', 'urn')
AND c.content_type_id = 0
)
GROUP by record_id
ORDER BY count(*) DESC
limit 20;
*/
class trigram
{
private $dbLink;
var $types = array(
array(0, 'name'),
array(1, 'city'));
function trigram()
{
//connect to db
$this->dbLink = mysql_connect("localhost", "username", "password");
if ($this->dbLink) mysql_select_db("dbname");
else mysql_error();
mysql_query("SET NAMES utf8;", $this->dbLink);
}
function get_type_value($type_name){
for($i=0; $i<count($this->types); $i++){
if($this->types[$i][1] == $type_name)
return $this->types[$i][0];
}
return "";
}
function getNgrams($word, $n = 3) {
$ngrams = array();
$len = mb_strlen($word, 'utf-8');
for($i = 0; $i < $len-($n-1); $i++) {
$ngrams[] = mysql_real_escape_string(mb_substr($word, $i, $n, 'utf-8'), $this->dbLink);
}
return $ngrams;
}
/**
input: array('hel', 'ell', 'llo', 'lo ', 'o B', ' Be', 'Bel', 'ell', 'llo', 'lo ', 'o ')
output: array(1, 2, 3, 4, 5, 6, 7, 2, 3, 4, 8)
*/
private function getTrigramIds(&$t){
$u = array_unique($t);
$q = "SELECT * FROM trigrams WHERE token IN ('" . implode("', '", $u) . "')";
$query = mysql_query($q, $this->dbLink);
$n = mysql_num_rows($query);
$ids = array(); //these trigrams are already in db, they have id
$ok = array();
for ($i=0; $i<$n; $i++)
{
$row = mysql_fetch_array($query, MYSQL_ASSOC);
$ok []= $row['token'];
$ids[ $row['token'] ] = $row['id'];
}
$diff = array_diff($u, $ok); //these trigrams are not yet in the db
foreach($diff as $n){
mysql_query("INSERT INTO trigrams (token) VALUES('$n')", $this->dbLink);
$ids[$n]= mysql_insert_id();
}
//so many ids than items (if a trigram occurs more times in input, then it will occur more times in output as well)
$result = array();
foreach($t as $n){
$result[]= $ids[$n];
}
return $result;
}
function insertData($id, $data, $type){
$t = $this->getNgrams($data);
$id = intval($id);
$type = $this->get_type_value($type);
$tIds = $this->getTrigramIds($t);
$q = "INSERT INTO trigram2content (trigram_id, content_type_id, record_id) VALUES ";
$rows = array();
foreach($tIds as $n => $tid){
$rows[]= "($tid, $type, $id)";
}
$q .= implode(", ", $rows);
mysql_query($q, $this->dbLink);
}
function updateData($id, $data, $type){
mysql_query("DELETE FROM trigram2content WHERE record_id=".intval($id)." AND content_type_id=".$this->get_type_value($type), $this->dbLink);
$this->insertData($id, $data, $type);
}
function search($str, $type){
$tri = $this->getNgrams($str);
$max = count($tri);
$q = "SELECT count(*), count(*)/$max as score, record_id FROM trigrams t inner join trigram2content c ON t.id=c.trigram_id
WHERE (
t.token IN ('" . implode("', '", $tri) . "')
AND c.content_type_id = ".$this->get_type_value($type)."
)
GROUP by record_id
HAVING score >= 0.6
ORDER BY count(*) DESC
limit 20;";
$query = mysql_query($q, $this->dbLink);
$n = mysql_num_rows($query);
$result = array();
for ($i=0; $i<$n; $i++)
{
$row = mysql_fetch_array($query, MYSQL_ASSOC);
$result[] = $row;
}
return $result;
}
};
and usage:
$t = new trigram();
$t->insertData(1, "hello bello", "name");
$t->insertData(2, "hellllo Mammmma mia", "name");
print_r($t->search("helo", "name"));

MySQL: Compare differences between two tables

Same as oracle diff: how to compare two tables? except in mysql.
Suppose I have two tables, t1 and t2 which are identical in layout but which may contain different data.
What's the best way to diff these two tables?
To be more precise, I'm trying to figure out a simple SQL query that tells me if data from one row in t1 is different from the data from the corresponding row in t2
It appears I cannot use the intersect nor minus. When I try
SELECT * FROM robot intersect SELECT * FROM tbd_robot
I get an error code:
[Error Code: 1064, SQL State: 42000] You have an error in your SQL
syntax; check the manual that corresponds to your MySQL server version
for the right syntax to use near 'SELECT * FROM tbd_robot' at line 1
Am I doing something syntactically wrong? If not, is there another query I can use?
Edit: Also, I'm querying through a free version DbVisualizer. Not sure if that might be a factor.
INTERSECT needs to be emulated in MySQL:
SELECT 'robot' AS `set`, r.*
FROM robot r
WHERE ROW(r.col1, r.col2, …) NOT IN
(
SELECT col1, col2, ...
FROM tbd_robot
)
UNION ALL
SELECT 'tbd_robot' AS `set`, t.*
FROM tbd_robot t
WHERE ROW(t.col1, t.col2, …) NOT IN
(
SELECT col1, col2, ...
FROM robot
)
You can construct the intersection manually using UNION. It's easy if you have some unique field in both tables, e.g. ID:
SELECT * FROM T1
WHERE ID NOT IN (SELECT ID FROM T2)
UNION
SELECT * FROM T2
WHERE ID NOT IN (SELECT ID FROM T1)
If you don't have a unique value, you can still expand the above code to check for all fields instead of just the ID, and use AND to connect them (e.g. ID NOT IN(...) AND OTHER_FIELD NOT IN(...) etc)
I found another solution in this link
SELECT MIN (tbl_name) AS tbl_name, PK, column_list
FROM
(
SELECT ' source_table ' as tbl_name, S.PK, S.column_list
FROM source_table AS S
UNION ALL
SELECT 'destination_table' as tbl_name, D.PK, D.column_list
FROM destination_table AS D
) AS alias_table
GROUP BY PK, column_list
HAVING COUNT(*) = 1
ORDER BY PK
select t1.user_id,t2.user_id
from t1 left join t2 ON t1.user_id = t2.user_id
and t1.username=t2.username
and t1.first_name=t2.first_name
and t1.last_name=t2.last_name
try this. This will compare your table and find all matching pairs, if any mismatch return NULL on left.
Based on Haim's answer I created a PHP code to test and display all the differences between two databases.
This will also display if a table is present in source or test databases.
You have to change with your details the <> variables content.
<?php
$User = "<DatabaseUser>";
$Pass = "<DatabasePassword>";
$SourceDB = "<SourceDatabase>";
$TestDB = "<DatabaseToTest>";
$link = new mysqli( "p:". "localhost", $User, $Pass, "" );
if ( mysqli_connect_error() ) {
die('Connect Error ('. mysqli_connect_errno() .') '. mysqli_connect_error());
}
mysqli_set_charset( $link, "utf8" );
mb_language( "uni" );
mb_internal_encoding( "UTF-8" );
$sQuery = 'SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA="'. $SourceDB .'";';
$SourceDB_Content = query( $link, $sQuery );
if ( !is_array( $SourceDB_Content) ) {
echo "Table $SourceDB cannot be accessed";
exit(0);
}
$sQuery = 'SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA="'. $TestDB .'";';
$TestDB_Content = query( $link, $sQuery );
if ( !is_array( $TestDB_Content) ) {
echo "Table $TestDB cannot be accessed";
exit(0);
}
$SourceDB_Tables = array();
foreach( $SourceDB_Content as $item ) {
$SourceDB_Tables[] = $item["TABLE_NAME"];
}
$TestDB_Tables = array();
foreach( $TestDB_Content as $item ) {
$TestDB_Tables[] = $item["TABLE_NAME"];
}
//var_dump( $SourceDB_Tables, $TestDB_Tables );
$LookupTables = array_merge( $SourceDB_Tables, $TestDB_Tables );
$NoOfDiscrepancies = 0;
echo "
<table border='1' width='100%'>
<tr>
<td>Table</td>
<td>Found in $SourceDB (". count( $SourceDB_Tables ) .")</td>
<td>Found in $TestDB (". count( $TestDB_Tables ) .")</td>
<td>Test result</td>
<tr>
";
foreach( $LookupTables as $table ) {
$FoundInSourceDB = in_array( $table, $SourceDB_Tables ) ? 1 : 0;
$FoundInTestDB = in_array( $table, $TestDB_Tables ) ? 1 : 0;
echo "
<tr>
<td>$table</td>
<td><input type='checkbox' ". ($FoundInSourceDB == 1 ? "checked" : "") ."></td>
<td><input type='checkbox' ". ($FoundInTestDB == 1 ? "checked" : "") ."></td>
<td>". compareTables( $SourceDB, $TestDB, $table ) ."</td>
</tr>
";
}
echo "
</table>
<br><br>
No of discrepancies found: $NoOfDiscrepancies
";
function query( $link, $q ) {
$result = mysqli_query( $link, $q );
$errors = mysqli_error($link);
if ( $errors > "" ) {
echo $errors;
exit(0);
}
if( $result == false ) return false;
else if ( $result === true ) return true;
else {
$rset = array();
while ( $row = mysqli_fetch_assoc( $result ) ) {
$rset[] = $row;
}
return $rset;
}
}
function compareTables( $source, $test, $table ) {
global $link;
global $NoOfDiscrepancies;
$sQuery = "
SELECT column_name,ordinal_position,data_type,column_type FROM
(
SELECT
column_name,ordinal_position,
data_type,column_type,COUNT(1) rowcount
FROM information_schema.columns
WHERE
(
(table_schema='$source' AND table_name='$table') OR
(table_schema='$test' AND table_name='$table')
)
AND table_name IN ('$table')
GROUP BY
column_name,ordinal_position,
data_type,column_type
HAVING COUNT(1)=1
) A;
";
$result = query( $link, $sQuery );
$data = "";
if( is_array( $result ) && count( $result ) > 0 ) {
$NoOfDiscrepancies++;
$data = "<table><tr><td>column_name</td><td>ordinal_position</td><td>data_type</td><td>column_type</td></tr>";
foreach( $result as $item ) {
$data .= "<tr><td>". $item["column_name"] ."</td><td>". $item["ordinal_position"] ."</td><td>". $item["data_type"] ."</td><td>". $item["column_type"] ."</td></tr>";
}
$data .= "</table>";
return $data;
}
else {
return "Checked but no discrepancies found!";
}
}
?>
Problem below, is to compare table before and after i do big update!.
If you use Linux, you can use commands as follow:
In terminal,
mysqldump -hlocalhost -uroot -p schema_name_here table_name_here > /home/ubuntu/database_dumps/dump_table_before_running_update.sql
mysqldump -hlocalhost -uroot -p schema_name_here table_name_here > /home/ubuntu/database_dumps/dump_table_after_running_update.sql
diff -uP /home/ubuntu/database_dumps/dump_some_table_after_running_update.sql /home/ubuntu/database_dumps/dump_table_before_running_update.sql > /home/ubuntu/database_dumps/diff.txt
You will need online tools for
Formatting SQL exported from the dumps,
e.g http://www.dpriver.com/pp/sqlformat.htm [Not the best I've seen]
We have diff.txt, you have to take manually the + - showing inside, which is 1 line of insert statements, that has the values.
Do diff online for the 2 lines - & + in diff.txt, past them in online diff tool
e.g https://www.diffchecker.com [you can save and share it, and has no limit on file size!]
Note: be extra careful if its sensitive/production data!
you can try The big data comparison platform in https://github.com/zhugezifang/dataCompare
this is a introduction of it
Design and practice of open source big data comparison platform
1. Background & current situation
In the process of developing large numbers, it is often encountered that data migration or upgrade, or different business parties have processed data according to their needs, but think that the data on both sides is still the same, so it will be necessary to manually compare the data. So is the data on both sides consistent? If not, what are the differences?
If there is no platform, you need to manually write some SQL scripts for comparison, and there is no evaluation standard. This is inefficient.
"Alibaba's Road to Big Data" actually mentions such a platform, but because it is not used externally, the introduction in the book is relatively simple. Based on previous work experience, a big data comparison platform was developed to assist in verifying data, named dataCompare.
Main solutions:
(1) Verify data and data comparison, which wastes great labor costs
(2) Without a set of standards, the results of verification are difficult to evaluate
(3) Automatic data verification and comparison can be achieved by interface interaction, check or low-code
[enter image description here][1]
2. Purpose
(1) Automatic data verification and comparison can be achieved by interface interaction, check or low-code.
(2) The data team's data comparison efficiency is increased by at least about 50%.
(3) A set of unified data verification scheme to meet the standard specifications of data verification and comparison
3. System architecture design
4. The current version has implemented the following functions
(1) Low-code simple configuration completes the core function of data comparison
(2) Data magnitude comparison and data consistency comparison
5. Follow-up Development Plan
(1) Discrepancy case finding
(2) Data pointer detection---- enumeration value detection, range detection, numerical detection, primary key mode detection
(3) Data comparison task is scheduled and automatically scheduled
(4) Automatically send an email report to the comparison results
6. The core code is opening in githup
https://github.com/zhugezifang/dataCompare
[enter image description here][1]
Based on Haim's answer here's a simplified example if you're looking to compare values that exist in BOTH tables, otherwise if there's a row in one table but not the other it will also return it....
Took me a couple of hours to figure out. Here's a fully tested simply query for comparing "tbl_a" and "tbl_b"
SELECT ID, col
FROM
(
SELECT
tbl_a.ID, tbl_a.col FROM tbl_a
UNION ALL
SELECT
tbl_b.ID, tbl_b.col FROM tbl_b
) t
WHERE ID IN (select ID from tbl_a) AND ID IN (select ID from tbl_b)
GROUP BY
ID, col
HAVING COUNT(*) = 1
ORDER BY ID
So you need to add the extra "where in" clause:
WHERE ID IN (select ID from tbl_a) AND ID IN (select ID from tbl_b)
Also:
For ease of reading if you want to indicate the table names you can use the following:
SELECT tbl, ID, col
FROM
(
SELECT
tbl_a.ID, tbl_a.col, "name_to_display1" as "tbl" FROM tbl_a
UNION ALL
SELECT
tbl_b.ID, tbl_b.col, "name_to_display2" as "tbl" FROM tbl_b
) t
WHERE ID IN (select ID from tbl_a) AND ID IN (select ID from tbl_b)
GROUP BY
ID, col
HAVING COUNT(*) = 1
ORDER BY ID
you can user my own developed tool
https://github.com/hardeepvicky/MySql-Schema-Compare
I tried the above answer but found that if one table has null values and the second table has values in a column then the intersect code above does not report this fact.
select p.pcn,p.period,p.account_no,p.ytd_debit,a.ytd_debit
-- select count(*) -- 157,283
from Plex.account_period_balance p -- 157,283/202207,148,998
join Azure.account_period_balance a -- 157,283/202207,148,998
on p.pcn = a.pcn
and p.period = a.period
and p.account_no = a.account_no -- 157,283
where p.period_display = a.period_display -- 157,283
and p.debit = a.debit -- 157,283
-- and p.ytd_debit = a.ytd_debit -- 148,998
-- and p.ytd_debit != a.ytd_debit -- 0

Is there SQL parameter binding for arrays?

Is there a standard way to bind arrays (of scalars) in a SQL query? I want to bind into an IN clause, like so:
SELECT * FROM junk WHERE junk.id IN (?);
I happen to be using Perl::DBI which coerces parameters to scalars, so I end up with useless queries like:
SELECT * FROM junk WHERE junk.id IN ('ARRAY(0xdeadbeef)');
Clarification: I put the query in its own .sql file, so the string is already formed. Where the answers mention creating the query string dynamically I'd probably do a search and replace instead.
Edit: This question is kind of a duplicate of Parameterizing a SQL IN clause?. I originally thought that it should be closed as such, but it seems like it's accumulating some good Perl-specific info.
If you don't like the map there, you can use the 'x' operator:
my $params = join ', ' => ('?') x #foo;
my $sql = "SELECT * FROM table WHERE id IN ($params)";
my $sth = $dbh->prepare( $sql );
$sth->execute( #foo );
The parentheses are needed around the '?' because that forces 'x' to be in list context.
Read "perldoc perlop" and search for 'Binary "x"' for more information (it's in the "Multiplicative Operators" section).
You specify "this is the SQL for a query with one parameter" -- that won't work when you want many parameters. It's a pain to deal with, of course. Two other variations to what was suggested already:
1) Use DBI->quote instead of place holders.
my $sql = "select foo from bar where baz in ("
. join(",", map { $dbh->quote($_) } #bazs)
. ")";
my $data = $dbh->selectall_arrayref($sql);
2) Use an ORM to do this sort of low level stuff for you. DBIx::Class or Rose::DB::Object, for example.
I do something like:
my $dbh = DBI->connect( ... );
my #vals= ( 1,2,3,4,5 );
my $sql = 'SELECT * FROM table WHERE id IN (' . join( ',', map { '?' } #vals ) . ')';
my $sth = $dbh->prepare( $sql );
$sth->execute( #vals );
And yet another way to build SQL is to use something like SQL::Abstract....
use SQL::Abstract;
my $sql = SQL::Abstract->new;
my $values = [ 1..3 ];
my $query = $sql->select( 'table', '*', { id => { -in => $values } } );
say $query; # => SELECT * FROM table WHERE ( id IN ( ?, ?, ? ) )
With plain DBI you'd have to build the SQL yourself, as suggested above. DBIx::Simple (a wrapper for DBI) does this for you automatically using the '??' notation:
$db->query("select * from foo where bar in (??)", #values);
In python, I've always ended up doing something like:
query = 'select * from junk where junk.id in ('
for id in junkids:
query = query + '?,'
query = query + ')'
cursor.execute(query, junkids)
...which essentially builds a query with one '?' for each element of the list.
(and if there's other parameters in there too, you need to make sure you line things up correctly when you execute the query)
[edit to make the code easier to understand for non-python people. There is a bug, where the query will have an extra comma after the last ?, which I will leave in because fixing it would just cloud the general idea]
I use DBIx::DWIW. It contains a function called InList(). This will create the part of the SQL that is needed for the list. However this only works if you have all your SQL in the program instead of outside in a separate file.
Use
SELECT * FROM junk WHERE junk.id = ANY (?);
instead