How to adapt a GROUP BY across multiple tables? - sql

I'm trying to optimize a SQL query that uses a GROUP BY across multiple tables. Essentially, I have several tables which all contain a PID column, and the output I need is every PID in any of the tables along with a count of how many records across all of those tables contain that PID. When I try GROUP BY PID with multiple tables in the query, I get a "column ambiguously defined" error. Here is an example of the code I am using to retrieve the proper data from one table (you can ignore the WHERE clause):
select pid, count(*)
from table1
where vendor_id in(1,2)
and delay_code <=23
and age between 18 and 49 and sex = 'M'
group by pid
Essentially, I want to do this across a group of tables (i.e. table1, table2, table3 etc), but can't figure out how to do so without getting a "column ambiguously defined" error.

You need to qualify which table's pid you are referencing. You can do that either by prefixing the column with the table name or by using an alias. Aliases are required when you have multiple references to the same table.
Specify table:
SELECT table1.pid, COUNT(*)
FROM table1
GROUP BY table1.pid
Use alias:
SELECT t1.pid, COUNT(*)
FROM table1 AS t1
GROUP BY t1.pid
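Once each reference is qualified, the original goal (one count per PID across several tables) can be handled by stacking the tables with UNION ALL inside a derived table and grouping once. A minimal sketch using SQLite, with made-up sample data:

```python
import sqlite3

# In-memory database with three hypothetical tables sharing a pid column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (pid INTEGER);
    CREATE TABLE table2 (pid INTEGER);
    CREATE TABLE table3 (pid INTEGER);
    INSERT INTO table1 (pid) VALUES (1), (1), (2);
    INSERT INTO table2 (pid) VALUES (2), (3);
    INSERT INTO table3 (pid) VALUES (1), (3);
""")

# Stack the tables with UNION ALL, then GROUP BY once on the combined set.
# Inside the derived table there is only one pid column, so nothing is ambiguous.
rows = conn.execute("""
    SELECT pid, COUNT(*) AS n
    FROM (
        SELECT pid FROM table1
        UNION ALL
        SELECT pid FROM table2
        UNION ALL
        SELECT pid FROM table3
    )
    GROUP BY pid
    ORDER BY pid
""").fetchall()

print(rows)  # [(1, 3), (2, 2), (3, 2)]
```

Each per-table SELECT can carry its own WHERE clause (like the vendor_id/delay_code filters in the question) before the branches are combined.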

Related

How to Identify matching records in two tables?

I have two tables with same column names. There are a total 40 columns in each table. Both the tables have same unique IDs. If I perform an inner join on the ID columns I get a match on 80% of the data. However, I would like to see if this match has exactly same data in each of the columns.
If there were a few rows like say 50-100 I could have performed a simple union operation ordered by ID and manually checked for the data. But both the tables contain more than 5000 records.
Is a join on each of the columns a valid solution for this or do I need to perform concatenation?
Suppose you have N columns; you can wrap the UNION ALL in a derived table and then GROUP BY COL1, COL2, ..., COLN. Any row that is identical in every column in both tables will appear twice in the union, so HAVING COUNT(*) > 1 reports the matches:
select COL1, COL2, ..., COLN
from (
    select * from table1
    union all
    select * from table2
) t
group by COL1, COL2, ..., COLN
having count(*) > 1;
Reference: link
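The technique can be sketched with an in-memory SQLite database; two small two-column tables stand in for the 40-column ones here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, val TEXT);
    CREATE TABLE table2 (id INTEGER, val TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table2 VALUES (1, 'a'), (2, 'x'), (4, 'd');
""")

# A row that is identical in every column in both tables appears twice in the
# UNION ALL, so grouping on all columns with HAVING COUNT(*) > 1 flags matches.
matches = conn.execute("""
    SELECT id, val
    FROM (
        SELECT id, val FROM table1
        UNION ALL
        SELECT id, val FROM table2
    )
    GROUP BY id, val
    HAVING COUNT(*) > 1
""").fetchall()

print(matches)  # [(1, 'a')]
```

Only id 1 matches: id 2 exists in both tables but with different val, so its two rows land in different groups.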

Duplicate column names in the result are not supported in BigQuery

I'm trying to select some columns in BQ and getting a complaint about duplicate IDs:
Duplicate column names in the result are not supported. Found duplicate(s): id
The query I'm using is:
SELECT
billing_account_id,service.id,service.description,sku.id
FROM `billing-management-edab.billing_dataset.gcp_billing_export_v1_blah_blah_blah`
Why are service.id and sku.id considered duplicates? And how can I get around that in my query?
Give aliases to the two id columns:
SELECT
billing_account_id,
service.id AS service_id,
service.description,
sku.id AS sku_id
FROM `billing-management-edab.billing_dataset.gcp_billing_export_v1_blah_blah_blah`
Actually, you may have left out part of your query, in particular the other table(s), which themselves were aliased as service and sku. In any case, giving each of the two id columns in your SELECT clause a distinct alias should resolve the error.
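The same aliasing idea can be demonstrated outside BigQuery. A sketch in SQLite, using hypothetical service and sku tables in place of BigQuery's nested record fields:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us fetch result columns by name
conn.executescript("""
    CREATE TABLE service (id TEXT, description TEXT);
    CREATE TABLE sku (id TEXT, service_id TEXT);
    INSERT INTO service VALUES ('svc-1', 'Compute');
    INSERT INTO sku VALUES ('sku-9', 'svc-1');
""")

# Both base columns are named "id"; distinct aliases give the result set
# exactly one column per name, so name-based access is unambiguous.
row = conn.execute("""
    SELECT service.id AS service_id,
           service.description,
           sku.id AS sku_id
    FROM service
    JOIN sku ON sku.service_id = service.id
""").fetchone()

print(row["service_id"], row["sku_id"])  # svc-1 sku-9
```

Without the aliases, both result columns would be called id and any consumer that looks columns up by name would see only one of them.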

Duplicate values SQL (MS Access)

I need to find duplicate records across 2 or more fields. But using this does not work in Access:
SELECT assay.depth_from, assay.au_gt
FROM assay
GROUP BY depth_from, au_gt
HAVING count(*) >1;
Am I missing something? It matches up with various answers here, so I'm not sure what's wrong.
I just get records with duplicate depth_from, but the au_gt values are not duplicates. Actually, not even all of the depth_from values are duplicated.
I see two possible syntax issues with your SQL. First, you probably don't need the assay. prefix before your field names, since you have already specified which table you are selecting from, and it makes your references to those fields in GROUP BY inconsistent; if you do use assay. in your SELECT statement, use it in GROUP BY as well. Secondly, you should include count(*) in the SELECT statement, for basically the same reason: whatever you reference in GROUP BY and HAVING should match what you specified in SELECT. Try this:
SELECT depth_from, au_gt, count(*)
FROM assay
GROUP BY depth_from, au_gt
HAVING count(*) >1;
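To see what "duplicate across 2 fields" means here, a small SQLite sketch with invented sample data may help; a row is only reported when both columns repeat together:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE assay (depth_from REAL, au_gt REAL);
    INSERT INTO assay VALUES (10.0, 1.5), (10.0, 1.5), (10.0, 2.0), (12.0, 0.7);
""")

# Rows are duplicates only when BOTH depth_from and au_gt repeat together;
# (10.0, 2.0) shares a depth_from with other rows but is not reported.
dupes = conn.execute("""
    SELECT depth_from, au_gt, COUNT(*)
    FROM assay
    GROUP BY depth_from, au_gt
    HAVING COUNT(*) > 1
""").fetchall()

print(dupes)  # [(10.0, 1.5, 2)]
```

If the output looks like rows where only depth_from repeats, that usually means one of the grouped columns differs in some invisible way (trailing spaces, rounding), not that the query is wrong.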

How to merge two tables using ORDER BY?

While merging two tables, when rows are not matched, how do I insert them based on an order? For example, in TABLE_2 I have a column "Type" (sample values 1, 2, 3, etc.), so when I do an insert for unmatched codes I need to insert records with Type 1 first, then 2, and so on.
So far I have tried the code below:
WITH tab1 AS
(
select * From TABLE_2 order by Type
)
merge tab1 as Source using TABLE_1 as Target on Target.Code=Source.Code
when matched then update set Target.Description=Source.Description
when not matched then insert (Code,Description,Type)
values (Source.Code,Source.Description,Source.Type);
But I get "The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified." error because of using order by in sub query.
So how do I insert records based on an order while merging two tables?
Thanks in advance.
Change
select *
to
select top 100 percent *
That will let the ORDER BY in the CTE compile. Be aware, though, that SQL Server is free to ignore an ORDER BY inside a TOP 100 PERCENT subquery, so the insert order still isn't guaranteed; if the order truly matters, consider running a separate insert per Type value.
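As an aside, the intended behavior (update matched codes, then insert the unmatched ones in Type order) can be sketched dialect-neutrally as two statements. SQLite has no MERGE, so this sketch splits the upsert, using the table and column names from the question with invented sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE_1 (Code TEXT, Description TEXT, Type INTEGER);
    CREATE TABLE TABLE_2 (Code TEXT, Description TEXT, Type INTEGER);
    INSERT INTO TABLE_1 VALUES ('A', 'old', 1);
    INSERT INTO TABLE_2 VALUES ('A', 'new', 1), ('C', 'gamma', 3), ('B', 'beta', 2);
""")

# "WHEN MATCHED": update descriptions for codes present in both tables.
conn.execute("""
    UPDATE TABLE_1
    SET Description = (SELECT Description FROM TABLE_2
                       WHERE TABLE_2.Code = TABLE_1.Code)
    WHERE Code IN (SELECT Code FROM TABLE_2)
""")

# "WHEN NOT MATCHED": insert the remaining codes, explicitly ordered by Type.
conn.execute("""
    INSERT INTO TABLE_1 (Code, Description, Type)
    SELECT Code, Description, Type FROM TABLE_2
    WHERE Code NOT IN (SELECT Code FROM TABLE_1)
    ORDER BY Type
""")

# Reading back in rowid order shows the physical insert order: B before C.
final = conn.execute(
    "SELECT Code, Description, Type FROM TABLE_1 ORDER BY rowid").fetchall()
print(final)  # [('A', 'new', 1), ('B', 'beta', 2), ('C', 'gamma', 3)]
```

Because each pass is a separate statement, the insert order is under your control rather than left to the optimizer.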

DISTINCT pulling duplicate column values

The following query is pulling duplicate site_ids even though I'm using DISTINCT, and I can't figure out why...
SELECT
DISTINCT site_id,
deal_woot.*,
site.woot_off,
site.name AS site_name
FROM deal_woot
INNER JOIN site ON site.id = site_id
WHERE site_id IN (2, 3, 4, 5, 6)
ORDER BY deal_woot.id DESC LIMIT 5
DISTINCT applies to the entire row, not just the column written directly after it. Your first instinct might be to switch to GROUP BY, but that alone won't work either:
Non-working code:
SELECT
site_id,
deal_woot.*,
site.woot_off,
site.name AS site_name
FROM deal_woot
INNER JOIN site ON site.id = site_id
WHERE site_id IN (2, 3, 4, 5, 6)
GROUP BY site_id
Why doesn't it work? If you GROUP BY a column, you should apply an aggregate function (such as MIN or MAX) to each of the remaining columns -- otherwise, if there are multiple woot_off values for a given site_id, it's not clear to SQL which of those values you want to SELECT.
You will probably have to expand deal_woot.* to list each of its fields.
Side-note: if you're using MySQL, I believe it's not technically necessary to specify an aggregate function for the remaining columns. If you don't, MySQL picks a single value for you (in practice, an arbitrary one from the group).
Your query is returning DISTINCT rows; it is not looking at site_id alone. In other words, if any of the columns differ, a new row is returned from this query.
This makes sense: if the rows actually do differ, what should the server return as the values for deal_woot.*? If you want just one row per site_id, you need to specify which one -- for example, by getting the distinct site_ids and then fetching the other values with LIMIT 1 in a subquery with an appropriate ORDER BY clause.
You are selecting distinct values from one table only. When you join with the other table, it pulls every row that matches each of your distinct values, causing duplicate ids.
If you want to select site info and a single row from deal_woot table with the same site_id, you need to use a different query. For example,
SELECT site.id, deal_woot.*, site.woot_off, site.name
FROM site
INNER JOIN
(SELECT site_id, MAX(id) as id FROM deal_woot
WHERE site_id IN (2,3,4,5,6) GROUP BY site_id) X
ON (X.site_id = site.id)
INNER JOIN deal_woot ON (deal_woot.id = X.id)
WHERE site.id IN (2,3,4,5,6);
This query should work regardless of SQL dialect/DB vendor. For MySQL, you can just add GROUP BY site_id to your original query, since MySQL lets you use GROUP BY without aggregate functions.
Note: I assume that deal_woot.id and site.id are the primary keys of the deal_woot and site tables, respectively.
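The MAX-id-per-group pattern above can be verified with SQLite; the schema below is a guess at the original (a title column stands in for the rest of deal_woot's fields):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE site (id INTEGER PRIMARY KEY, woot_off INTEGER, name TEXT);
    CREATE TABLE deal_woot (id INTEGER PRIMARY KEY, site_id INTEGER, title TEXT);
    INSERT INTO site VALUES (2, 0, 'alpha'), (3, 1, 'beta');
    INSERT INTO deal_woot VALUES (10, 2, 'old deal'), (11, 2, 'new deal'),
                                 (12, 3, 'only deal');
""")

# The derived table X keeps one deal id (the MAX, i.e. the latest) per site,
# so the outer joins can no longer multiply rows: exactly one deal_woot row
# survives per site_id, even for site 2 which has two deals.
rows = conn.execute("""
    SELECT site.id, deal_woot.title, site.name
    FROM site
    INNER JOIN (SELECT site_id, MAX(id) AS id FROM deal_woot
                WHERE site_id IN (2, 3, 4, 5, 6) GROUP BY site_id) X
        ON X.site_id = site.id
    INNER JOIN deal_woot ON deal_woot.id = X.id
    ORDER BY site.id
""").fetchall()

print(rows)  # [(2, 'new deal', 'alpha'), (3, 'only deal', 'beta')]
```

The aggregation happens before the join, which is what removes the duplication that DISTINCT could not.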