Get Count of Each Distinct Pair - sql

I am trying to create a query that will give me all combinations of original sources and new sources, along with how many times each occurs. What I have below seems to do the first part of giving me all of the different pairs, but I am struggling with getting it to display how many occurrences each pair has.
SELECT DISTINCT original_source, new_source
FROM sources
WHERE identifier = 1
ORDER BY original_source

You need to use GROUP BY and count the rows in each group:
SELECT original_source, new_source, count(1) [Count]
FROM sources
WHERE identifier = 1
GROUP BY original_source, new_source
ORDER BY original_source
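
For anyone who wants to try the GROUP BY approach without a database server, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up for illustration, and the alias is written as `AS pair_count` because the `[Count]` bracket syntax above is a SQL Server convention.

```python
import sqlite3

# In-memory database with a few invented rows (assumption: not the asker's data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sources (identifier INTEGER, original_source TEXT, new_source TEXT)"
)
conn.executemany(
    "INSERT INTO sources VALUES (?, ?, ?)",
    [
        (1, "web", "email"),
        (1, "web", "email"),
        (1, "web", "phone"),
        (1, "store", "web"),
        (2, "web", "email"),  # filtered out by identifier = 1
    ],
)

# One output row per distinct pair, with the number of rows in that pair's group.
pair_counts = conn.execute(
    """
    SELECT original_source, new_source, COUNT(*) AS pair_count
    FROM sources
    WHERE identifier = 1
    GROUP BY original_source, new_source
    ORDER BY original_source, new_source
    """
).fetchall()

for row in pair_counts:
    print(row)
```

GROUP BY already collapses each pair to one row, so DISTINCT is no longer needed.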

Related

(Hive) SQL retrieving data from a column that has 1 to N relationship in another column

How can I retrieve rows where BID comes up multiple times in AID?
You can see the sample below: the AID and BID columns are under the PrimaryID, and BIDs are under AID. I want to come up with an output that only takes records where BIDs have a 1-to-many relationship with records in the AID column. Example output below.
I provided a small sample of data; I am trying to retrieve 20+ columns and joining 4 tables. I have unique PrimaryIDs and under those I have multiple unique AIDs. However, under these AIDs I can have multiple non-unique BIDs that can repeatedly come up under different AIDs.
Hive supports window functions. A window function can attach an attribute of a group to every row in that group, and COUNT() is one of the supported functions. In your case you can use it to select the rows for which that count is greater than 1.
In the PARTITION BY clause you specify which columns define the group, the same way that you would in the more familiar GROUP BY clause.
Something like this:
select * from
(
Select *,
count(*) over (partition by primaryID,AID) counts
from mytable
) x
Where counts>1
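
The window-count filter can be tried locally with a rough sketch like the one below. It runs on SQLite through Python's sqlite3 module rather than Hive, so it assumes an SQLite build with window function support (3.25 or later), and the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (primaryID INTEGER, AID TEXT, BID TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "A1", "B1"),
        (1, "A1", "B2"),
        (1, "A2", "B3"),  # only row in its (primaryID, AID) group, so it is dropped
        (2, "A3", "B4"),
        (2, "A3", "B5"),
        (2, "A3", "B6"),
    ],
)

# COUNT(*) OVER (...) tags every row with the size of its group
# without collapsing the rows the way GROUP BY would.
rows = conn.execute(
    """
    SELECT primaryID, AID, BID, counts
    FROM (
        SELECT *, COUNT(*) OVER (PARTITION BY primaryID, AID) AS counts
        FROM mytable
    ) AS x
    WHERE counts > 1
    ORDER BY primaryID, AID, BID
    """
).fetchall()

for row in rows:
    print(row)
```

The derived table is needed because a window function result cannot be referenced in the WHERE clause of the same SELECT.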

Validate that only one value exists

I have a table with two relevant columns. I'll call them EID and MID. They are not unique.
In theory, if the data is set up correctly, there will be many records for each EID and every one of those records should have the same MID.
There are situations where someone may manually update data incorrectly and I need to be able to quickly identify if there is a second MID for any EID.
Ideally, I'd have a query that returns how many MIDs for each EID, but only showing results where there is more than 1 MID. Below is what I'd like the results to look like.
EID Count of Distinct MID values
200345 2
304334 3
I've tried several different forms of queries, but I can't seem to figure out how to reach this result. We're on SQL Server.
You can use COUNT with DISTINCT, together with a HAVING clause to keep only the groups with more than one value:
SELECT EID, COUNT(DISTINCT MID)
FROM table_name
GROUP BY EID
HAVING COUNT(DISTINCT MID) > 1
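
A small self-contained sketch of the COUNT(DISTINCT ...) plus HAVING check, run on SQLite through Python's sqlite3 module instead of SQL Server. The table name `eid_mid` and the rows are assumptions made up to reproduce the shape of the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eid_mid (EID INTEGER, MID INTEGER)")
conn.executemany(
    "INSERT INTO eid_mid VALUES (?, ?)",
    [
        (100000, 9), (100000, 9),                # consistent: one MID
        (200345, 1), (200345, 1), (200345, 2),   # two different MIDs
        (304334, 5), (304334, 6), (304334, 7),   # three different MIDs
    ],
)

# HAVING filters after grouping, so only EIDs with conflicting MIDs remain.
conflicts = conn.execute(
    """
    SELECT EID, COUNT(DISTINCT MID) AS distinct_mids
    FROM eid_mid
    GROUP BY EID
    HAVING COUNT(DISTINCT MID) > 1
    ORDER BY EID
    """
).fetchall()

for row in conflicts:
    print(row)
```

An empty result means every EID maps to exactly one MID.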

SELECT list expression references column user_id which is neither grouped nor aggregated at [8:5]

I have 2 data sets. One of all patients who got ill (endo-2) and one of a special group of patients that also exists in endo-2 called "xp-56"
I've been trying to run this query and I'm not sure why it isn't working. I want to do counts of 3 columns in endo-2 of those patients that belong in the xp-56 table.
This is the code I've been using, which produces the following error:
SELECT list expression references column user_id which is neither grouped nor aggregated at [8:5]
How do I fix this so I never make the same mistake again?
SELECT
Virus_Exposure,
Medical_Delivery,
Number_of_Site
FROM
(
SELECT
medical_id,
COUNT(DISTINCT Virus_id) AS Virus_Exposure,
COUNT(EndoCrin_id) AS Medical_Delivery,
COUNT (site_id_clinic) AS Number_of_Site
FROM
`endo-2`
WHERE
_PARTITIONTIME BETWEEN TIMESTAMP("2017-12-15")
AND TIMESTAMP("2018-01-10")) AS a
RIGHT JOIN
(
SELECT
medical_id
FROM
`xp-56`
ORDER BY
medical_id DESC) AS b
ON
a.medical_id=b.medical_id
GROUP BY
medical_id
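Why doesn't the medical_id in table a work?
Why not just do this? The error comes from the inner subquery, which selects medical_id next to COUNT aggregates without a GROUP BY of its own; the outer GROUP BY does not apply to it. Joining first and grouping once avoids the problem: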
SELECT e.medical_id,
COUNT(DISTINCT e.Virus_id) AS Virus_Exposure,
COUNT(e.EndoCrin_id) AS Medical_Delivery,
COUNT(e.site_id_clinic) AS Number_of_Site
FROM `endo-2` e JOIN
`xp-56` x
ON x.medical_id = e.medical_id
WHERE e._PARTITIONTIME BETWEEN TIMESTAMP("2017-12-15") AND TIMESTAMP("2018-01-10")
GROUP BY e.medical_id;
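
The join-then-group shape can be sketched on a toy dataset as follows. This is only an illustration on SQLite via Python's sqlite3 module: the table names `endo_2` and `xp_56` are simplified stand-ins, the `_PARTITIONTIME` filter is omitted, and the rows are invented. Note that SQLite is more lenient than BigQuery about ungrouped columns, so it would not reproduce the original error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE endo_2 (medical_id INTEGER, Virus_id TEXT, "
    "EndoCrin_id TEXT, site_id_clinic TEXT)"
)
conn.execute("CREATE TABLE xp_56 (medical_id INTEGER)")
conn.executemany(
    "INSERT INTO endo_2 VALUES (?, ?, ?, ?)",
    [
        (1, "v1", "e1", "s1"),
        (1, "v1", "e2", "s2"),
        (1, "v2", None, "s3"),   # NULL EndoCrin_id is skipped by COUNT(column)
        (2, "v3", "e3", "s4"),   # patient 2 is not in xp_56
        (3, "v4", "e4", "s5"),
    ],
)
conn.executemany("INSERT INTO xp_56 VALUES (?)", [(1,), (3,)])

# Every selected column is either in GROUP BY or wrapped in an aggregate.
per_patient = conn.execute(
    """
    SELECT e.medical_id,
           COUNT(DISTINCT e.Virus_id) AS Virus_Exposure,
           COUNT(e.EndoCrin_id)       AS Medical_Delivery,
           COUNT(e.site_id_clinic)    AS Number_of_Site
    FROM endo_2 AS e
    JOIN xp_56 AS x ON x.medical_id = e.medical_id
    GROUP BY e.medical_id
    ORDER BY e.medical_id
    """
).fetchall()

for row in per_patient:
    print(row)
```

The rule of thumb: a column in the SELECT list must appear in GROUP BY or inside an aggregate function, in the same query block.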

Querying records that meet multiple criteria

Hi, I’m trying to write a query and I’m struggling to figure out how to go about it.
I have a suppliers table and a supplier parts table. I want to write a query that lists suppliers that have all of the specified related parts in the supplier parts table. If a supplier doesn’t have all specified related parts, then they should not be listed.
At the moment I have written a very basic query that lists the supplier if they have any related supplier part that meets the criteria.
SELECT id ,name
FROM
efacdb.dbo.suppliers INNER JOIN [efacdb].[dbo].[spmatrix] ON
id = spmsupp
WHERE spmpart
IN ('ALUM_5083', 'ALUM_6082')
I only want to show the supplier if they have both parts related. Does anyone know how I could do this?
Use a subquery that counts distinct occurrences:
select * from suppliers s
where 2 = (select count(distinct spmpart) from spmatrix
where id = spmsupp and spmpart in ('ALUM_5083', 'ALUM_6082'))
As a note, you can modify your query to get what you want, just by using an aggregation:
SELECT id, name
FROM efacdb.dbo.suppliers INNER JOIN
[efacdb].[dbo].[spmatrix]
ON id = spmsupp
WHERE spmpart IN ('ALUM_5083', 'ALUM_6082')
GROUP BY id, name
HAVING MIN(spmpart) <> MAX(spmpart);
If you know there are no duplicates, then having count(*) = 2 also solves the problem.
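
To see how these variants behave when duplicates are present, here is a hedged sketch on SQLite via Python's sqlite3 module. The supplier names and part rows are invented, and `COUNT(DISTINCT spmpart) = 2` is used as the duplicate-safe form of the `count(*) = 2` idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE suppliers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE spmatrix (spmsupp INTEGER, spmpart TEXT)")
conn.executemany(
    "INSERT INTO suppliers VALUES (?, ?)",
    [(1, "Acme"), (2, "Bolt"), (3, "Core")],
)
conn.executemany(
    "INSERT INTO spmatrix VALUES (?, ?)",
    [
        (1, "ALUM_5083"), (1, "ALUM_6082"), (1, "STEEL_304"),  # has both parts
        (2, "ALUM_5083"), (2, "ALUM_5083"),                    # same part twice
        (3, "ALUM_6082"), (3, "STEEL_304"),                    # only one of the two
    ],
)

base = """
    SELECT id, name
    FROM suppliers
    INNER JOIN spmatrix ON id = spmsupp
    WHERE spmpart IN ('ALUM_5083', 'ALUM_6082')
    GROUP BY id, name
    HAVING {condition}
    ORDER BY id
"""

# Duplicate-safe: counts each required part once per supplier.
by_distinct = conn.execute(base.format(condition="COUNT(DISTINCT spmpart) = 2")).fetchall()
# With exactly two required parts, differing MIN and MAX means both are present.
by_min_max = conn.execute(base.format(condition="MIN(spmpart) <> MAX(spmpart)")).fetchall()
# COUNT(*) also counts the duplicated row, so supplier 2 slips through.
by_count_star = conn.execute(base.format(condition="COUNT(*) = 2")).fetchall()

print(by_distinct, by_min_max, by_count_star)
```

The MIN/MAX trick only works for exactly two required parts; for a longer list, comparing COUNT(DISTINCT spmpart) to the list length generalizes more easily.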

Need SQL query to group together but sort overall

I have a table with Display_UPC, Brand, Item_Description, and other fields. There are several items with the same Display_UPC (all items belonging to the same display), and some displays have multiple brands.
I'm trying to print out a page that shows all of the Display contents (all Display_UPC together) with the various item descriptions, but sorted by Brand (so it starts with the "A" brands at the top of the page, then "B" brands, etc...).
Problem is, if I try:
SELECT DISTINCT *
FROM tbl_All_Displays
ORDER BY Brand, Display_UPC
some of the displays (the ones containing multiple brands) are missing some items because they are different brands. I can get rid of "Brand" in the ORDER BY and it returns complete displays together but they are not sorted by Brand (obviously).
I'm guessing maybe a GROUP BY is needed here but I can't get one to work. If I try something like:
SELECT DISTINCT *
FROM tbl_All_Displays
GROUP BY Display_UPC
ORDER BY Brand, Display_UPC
I get the error:
Column 'tbl_All_Displays.Item_Description' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
But I need Item_Description (and the other various fields) to be displayed on the page. They just aren't important in the ordering/grouping.
Sample Data:
Sample Expected Result:
So, basically, it doesn't matter which brand in a display the query uses in the sort. If a display contains a brand then it's okay for it to belong to that brand "group" if that makes sense. Is this possible?
Note: I deleted and reposted a previous question because it got messy with edits.
Edit: Here is sqlfiddle with sample table data - http://sqlfiddle.com/#!6/5069c
You want to sort first by the min(Brand) of each Display_UPC?
Then you need to sort by a "Group Min" first:
SELECT *
FROM Table1
ORDER BY
min(Brand) over (partition by Display_UPC),
Display_UPC,
Brand
dnoeth posted a great solution, but I want to show the one I was working on.
with
minBrand as (
SELECT Display_UPC, MIN(BRAND) Brand
from Table1
GROUP BY Display_UPC
)
select m.Brand MainBrand, t.*
from minBrand m
inner join Table1 t
on m.Display_UPC = t.Display_UPC
order by m.Brand, t.Display_UPC, t.Brand, t.Item_Description
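
Both answers sort each display by its alphabetically first brand, which can be checked with a rough sketch like the one below. It runs on SQLite via Python's sqlite3 module (assuming a build with window function support, 3.25 or later), the rows are invented rather than taken from the fiddle, and the window version computes the group minimum as a column in a derived table instead of directly in ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table1 (Display_UPC INTEGER, Brand TEXT, Item_Description TEXT)"
)
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [
        (100, "Zeta", "Z item"),
        (100, "Alpha", "A item"),   # display 100 sorts under "Alpha"
        (200, "Beta", "B item"),
        (300, "Gamma", "G item"),
        (300, "Beta", "B2 item"),   # display 300 sorts under "Beta"
    ],
)

# Window version: tag each row with the smallest brand in its display.
window_rows = conn.execute(
    """
    SELECT Display_UPC, Brand, Item_Description
    FROM (
        SELECT *, MIN(Brand) OVER (PARTITION BY Display_UPC) AS group_min
        FROM Table1
    ) AS t
    ORDER BY group_min, Display_UPC, Brand
    """
).fetchall()

# CTE version: compute the minimum per display, then join it back.
cte_rows = conn.execute(
    """
    WITH minBrand AS (
        SELECT Display_UPC, MIN(Brand) AS Brand
        FROM Table1
        GROUP BY Display_UPC
    )
    SELECT t.Display_UPC, t.Brand, t.Item_Description
    FROM minBrand AS m
    INNER JOIN Table1 AS t ON m.Display_UPC = t.Display_UPC
    ORDER BY m.Brand, t.Display_UPC, t.Brand, t.Item_Description
    """
).fetchall()

for row in window_rows:
    print(row)
```

Sorting by Display_UPC right after the group minimum keeps two displays that share the same first brand from interleaving their items.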