SQL query result from multiple tables without duplicates - sql

I have a number of tables, each holding customer IDs filtered from the full record set, along with the Last Order Date, that order's Total $, and the Segment Name. Each filter is based on different criteria, but the same customer ID can belong to two different tables, i.e. two different segments. The same ID would then have different Last Order Date and Total values in each table. The segments (and table names) are A, B, C, D.
I need to combine the records from all the segment tables so that there are no duplicate IDs in the set, i.e. if an ID appears in more than one table (say ID 2 is in tables A and B), the result set has to show that ID's columns from the first table, table A.
So I need all the records and their column values from the Segment A table; all the records from the Segment B table except those whose ID is in Segment A; and all the records from the Segment C table except those whose ID is in Segment A or B. I hope that makes sense.
I made it sound like a question from the 70-461 exam :D I've researched it quite thoroughly, but perhaps I don't know how to phrase the question. I wonder if anyone has an idea of how to build a query to get that result. Big thanks for any suggestions.
Thanks guys. I couldn't seem to post a screenshot, so let me type it out instead. There are more segment tables, but I'm only showing two to give you an idea.
Segment A
ID | Last Order Date | Total | Segment
--------------------------------------
1 | 01/01/2012 | $1 | A
2 | 01/01/2012 | $1 | A
3 | 01/01/2012 | $5 | A
6 | 01/01/2012 | $7 | A
8 | 01/01/2012 | $8 | A
Segment B
ID | Last Order Date | Total | Segment
--------------------------------------
4 | 01/01/2010 | $3 | B
2 | 01/01/2010 | $5 | B
1 | 01/01/2010 | $2 | B
3 | 01/01/2010 | $1 | B
5 | 01/01/2010 | $7 | B
Result Set
ID | Last Order Date | Total | Segment
--------------------------------------
1 | 01/01/2012 | $1 | A
2 | 01/01/2012 | $1 | A
3 | 01/01/2012 | $5 | A
4 | 01/01/2010 | $3 | B
5 | 01/01/2010 | $7 | B

Here's something to get you started:
SELECT ID, LastOrderDate, Total, Segment
FROM SegmentA
UNION ALL
SELECT ID, LastOrderDate, Total, Segment
FROM SegmentB
WHERE ID NOT IN (SELECT ID FROM SegmentA)
UNION ALL
SELECT ID, LastOrderDate, Total, Segment
FROM SegmentC
WHERE ID NOT IN (SELECT ID FROM SegmentA)
AND ID NOT IN (SELECT ID FROM SegmentB)
UNION ALL
SELECT ID, LastOrderDate, Total, Segment
FROM SegmentD
WHERE ID NOT IN (SELECT ID FROM SegmentA)
AND ID NOT IN (SELECT ID FROM SegmentB)
AND ID NOT IN (SELECT ID FROM SegmentC)
A very simplistic answer, more information is needed if you want to optimize this.
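To try the idea out without a database server, here is a minimal sketch using Python's built-in sqlite3 module and the two sample tables from the question. The column types and ISO date strings are assumptions made for the demo. It also swaps NOT IN for NOT EXISTS, which is a deliberate change: NOT IN returns no rows at all if the subquery ever yields a NULL ID, while NOT EXISTS does not have that problem.

```python
import sqlite3

# Hypothetical in-memory copies of the question's Segment A and Segment B samples.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SegmentA (ID INTEGER, LastOrderDate TEXT, Total INTEGER, Segment TEXT);
    CREATE TABLE SegmentB (ID INTEGER, LastOrderDate TEXT, Total INTEGER, Segment TEXT);
    INSERT INTO SegmentA VALUES
        (1, '2012-01-01', 1, 'A'), (2, '2012-01-01', 1, 'A'), (3, '2012-01-01', 5, 'A'),
        (6, '2012-01-01', 7, 'A'), (8, '2012-01-01', 8, 'A');
    INSERT INTO SegmentB VALUES
        (4, '2010-01-01', 3, 'B'), (2, '2010-01-01', 5, 'B'), (1, '2010-01-01', 2, 'B'),
        (3, '2010-01-01', 1, 'B'), (5, '2010-01-01', 7, 'B');
""")


def merged_segments(conn):
    """Return every Segment A row plus the Segment B rows whose ID is not in A."""
    query = """
        SELECT ID, LastOrderDate, Total, Segment
        FROM SegmentA
        UNION ALL
        SELECT ID, LastOrderDate, Total, Segment
        FROM SegmentB AS b
        WHERE NOT EXISTS (SELECT 1 FROM SegmentA AS a WHERE a.ID = b.ID)
        ORDER BY ID
    """
    return conn.execute(query).fetchall()


for row in merged_segments(conn):
    print(row)
```

With this sample data, IDs 1, 2 and 3 come from Segment A, and only IDs 4 and 5 come from Segment B. IDs 6 and 8 are returned as well, because they exist in Segment A.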

Related

Postgres create view with column values based on another table?

I'm implementing a view to store leaderboard data of the top 10 users that is computed using an expensive COUNT(*). I'm planning on the view to look something like this:
id SERIAL PRIMARY KEY
user_id TEXT
type TEXT
rank INTEGER
count INTEGER
-- adding an index to user_id
-- adding a two-column unique index to user_id and type
I'm having trouble with seeing how this view should be created to properly account for the rank and type. Essentially, I have a big table (~30 million rows) like this:
+----+---------+---------+----------------------------+
| id | user_id | type | created_at |
+----+---------+---------+----------------------------+
| 1 | 1 | Diamond | 2021-05-11 17:35:18.399517 |
| 2 | 1 | Diamond | 2021-05-12 17:35:17.399517 |
| 3 | 1 | Diamond | 2021-05-12 17:35:18.399517 |
| 4 | 2 | Diamond | 2021-05-13 17:35:18.399517 |
| 5 | 1 | Clay | 2021-05-14 17:35:18.399517 |
| 6 | 1 | Clay | 2021-05-15 17:35:18.399517 |
+----+---------+---------+----------------------------+
With the table above, I'm trying to achieve something like this:
+----+---------+---------+------+-------+
| id | user_id | type | rank | count |
+----+---------+---------+------+-------+
| 1 | 1 | Diamond | 1 | 3 |
| 2 | 2 | Diamond | 2 | 1 |
| 3 | 1 | Clay | 1 | 2 |
| 4 | 1 | Weekly | 1 | 5 | -- 3 diamonds + 2 clay obtained between Mon-Sun
| 5 | 2 | Weekly | 2 | 1 |
+----+---------+---------+------+-------+
By Weekly I am counting the time from the last Sunday to the upcoming Sunday.
Is this doable using only SQL, or is some kind of script needed? If doable, how would this be done? It's worth mentioning that there are thousands of different types, so not having to manually specify type would be preferred.
If there's anything unclear, please let me know and I'll do my best to clarify. Thanks!
The "weekly" rows are produced in a different way compared to the "user" rows (I called them two different "categories"). To get the result you want you can combine two queries using UNION ALL.
For example:
select 'u' as category, user_id, type,
rank() over(partition by type order by count(*) desc) as rk,
count(*) as cnt
from scores
group by user_id, type
union all
select 'w', user_id, 'Weekly',
rank() over(order by count(*) desc),
count(*) as cnt
from scores
group by user_id
order by category, type desc, rk
Result:
category user_id type rk cnt
--------- -------- -------- --- ---
u 1 Diamond 1 3
u 2 Diamond 2 1
u 1 Clay 1 2
w 1 Weekly 1 5
w 2 Weekly 2 1
See running example at DB Fiddle.
Note: For the sake of simplicity I left the filtering by timestamp out of the query. If you really needed to include only the rows of the last 7 days (or other period of time), it would be a matter of adding a WHERE clause in both subqueries.
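For what it's worth, here is a rough sketch of how the time filter could look, run against SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or newer). It assumes the per-type counts are all-time and only the Weekly rows are limited to the week; if both should be limited, the same WHERE goes into the first branch too. The week boundaries and parameter names are made up for the demo, and the rank is computed in an outer query, so it would clash with a real type that is literally named 'Weekly'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scores (id INTEGER PRIMARY KEY, user_id INTEGER, type TEXT, created_at TEXT);
    INSERT INTO scores VALUES
        (1, 1, 'Diamond', '2021-05-11 17:35:18.399517'),
        (2, 1, 'Diamond', '2021-05-12 17:35:17.399517'),
        (3, 1, 'Diamond', '2021-05-12 17:35:18.399517'),
        (4, 2, 'Diamond', '2021-05-13 17:35:18.399517'),
        (5, 1, 'Clay',    '2021-05-14 17:35:18.399517'),
        (6, 1, 'Clay',    '2021-05-15 17:35:18.399517');
""")

LEADERBOARD_SQL = """
    SELECT type, user_id, cnt,
           rank() OVER (PARTITION BY type ORDER BY cnt DESC) AS rk
    FROM (
        -- per-type counts over all rows
        SELECT type, user_id, count(*) AS cnt
        FROM scores
        GROUP BY user_id, type
        UNION ALL
        -- weekly counts, limited to the half-open range [week_start, week_end)
        SELECT 'Weekly', user_id, count(*)
        FROM scores
        WHERE created_at >= :week_start AND created_at < :week_end
        GROUP BY user_id
    )
    ORDER BY type, rk
"""


def leaderboard(conn, week_start, week_end):
    """Rank users per type, plus a 'Weekly' pseudo-type for the given week."""
    params = {"week_start": week_start, "week_end": week_end}
    return conn.execute(LEADERBOARD_SQL, params).fetchall()


# Sunday 2021-05-09 up to (not including) Sunday 2021-05-16.
for row in leaderboard(conn, "2021-05-09", "2021-05-16"):
    print(row)
```

The timestamps are stored as ISO-formatted text here, so plain string comparison is enough for the range check. In Postgres the column would be a real timestamp and the same comparison operators apply.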
I think this is what you were talking about, right?
WITH scores_plus_weekly AS ((
SELECT id, user_id, 'Weekly' AS type, created_at
FROM scores
WHERE created_at BETWEEN '2021-05-10' AND '2021-05-17'
)
UNION (
SELECT * FROM scores
))
SELECT
row_number() OVER (ORDER BY CASE "type" WHEN 'Diamond' THEN 0 WHEN 'Clay' THEN 1 ELSE 2 END, count(*) DESC) as "id",
user_id,
"type",
row_number() OVER (PARTITION BY "type" ORDER BY count(*) DESC) as "rank",
count(*)
FROM scores_plus_weekly
GROUP BY user_id, "type"
ORDER BY "id";
I'm sure this is not the only way, but I thought the result wasn't too complex. This query first combines the original table with a copy of all scores from this week, relabelled as type 'Weekly'. For the sake of consistency I picked a date range that covers your entire example set. It then groups by user_id and type to get the counts for each combination. The two row_number() calls give you the overall row id and the rank per type. A big part of this query consists of sorting by type, so if you're joining another table that contains the order or priority of the types, the CASE can probably be simplified.
Then, lastly, this entire query can be captured in a view using CREATE VIEW score_ranks AS, followed by the query.

Rows that have same value in a column, sum all values in another column and display 1 row

Example Table user:
ID | USER_ID | SCORE |
1 | 555 | 50 |
2 | 555 | 10 |
3 | 555 | 20 |
4 | 123 | 5 |
5 | 123 | 5 |
6 | 999 | 30 |
The result set should be like
ID | USER_ID | SCORE | COUNT |
1 | 555 | 80 | 3 |
2 | 123 | 10 | 2 |
3 | 999 | 30 | 1 |
Is it possible to write SQL that returns the table above? So far I can only count the rows where a certain user_id appears, but I don't know how to sum the scores and show the result for every user.
You've included a column called "ID" in both the source data and desired results, but I'm going to assume that these ID values are not related and simply represent the row or line number - otherwise the question doesn't make sense.
In which case, you can simply use:
SELECT
USER_ID,
SUM(SCORE) AS SCORE,
COUNT(USER_ID) AS COUNT
FROM
<Table>
GROUP BY
USER_ID
If you really want to generate the ID column as well, then how you do this depends on the database platform being used. For example on Oracle you could use the ROWNUM pseudocolumn, on SQL Server you will need to use ROW_NUMBER() function (which also works for Oracle).
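As a rough illustration of the ROW_NUMBER() route, here is a sketch run against SQLite through Python's sqlite3 module (SQLite 3.25 or newer also supports this function). The table name user_scores and the aliases first_id and CNT are made up for the demo. The rows are numbered in order of each user's first appearance, which is an assumption about what the ID column in the desired output means.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_scores (ID INTEGER PRIMARY KEY, USER_ID INTEGER, SCORE INTEGER);
    INSERT INTO user_scores VALUES
        (1, 555, 50), (2, 555, 10), (3, 555, 20),
        (4, 123, 5), (5, 123, 5), (6, 999, 30);
""")


def totals_with_row_id(conn):
    """Sum and count scores per user, then number the grouped rows."""
    query = """
        SELECT ROW_NUMBER() OVER (ORDER BY first_id) AS ID,
               USER_ID, SCORE, CNT
        FROM (
            -- one row per user; first_id remembers where the user first appears
            SELECT MIN(ID) AS first_id,
                   USER_ID,
                   SUM(SCORE) AS SCORE,
                   COUNT(*) AS CNT
            FROM user_scores
            GROUP BY USER_ID
        )
        ORDER BY ID
    """
    return conn.execute(query).fetchall()


for row in totals_with_row_id(conn):
    print(row)
```

Doing the aggregation in a subquery and the numbering outside keeps the two steps separate, which tends to port between database platforms with little change.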
SELECT USER_ID
,sum(SCORE)
,count(USER_ID)
FROM Table
GROUP BY
USER_ID
I think COUNT is the number of scores per user_id. If so, your SQL query should be:
SELECT
USER_ID,
SUM(SCORE) AS SCORE,
COUNT(SCORE) AS COUNT
FROM
TABLE
GROUP BY
USER_ID

HQL: DISTINCT Issue

Say I have a table of Customer and the Vendor the Customer visited, with each row being a distinct time a certain customer visited a vendor.
Row | Customer | Vendor
1 | 1 | 001
2 | 1 | 001
3 | 1 | 002
4 | 2 | 001
My question is, how can I write a query to show every distinct customer/vendor visit? For the above table, I'd like to see output of:
Row | Customer | Vendor
1 | 1 | 001
2 | 1 | 002
3 | 2 | 001
You can simply use the DISTINCT clause, assuming that the row column is just for illustration here and not part of the actual table:
SELECT DISTINCT customer, vendor
FROM table
You can use group by:
select min(row) as row, Customer, Vendor
from table t
group by Customer, Vendor;

Selecting unique records from database

Running this query,
select * from table;
Returns the following
|branch | number |
-------------------
| 1 | 123 |
| 1 | 001 |
| 2 | 123 |
| 3 | 123 |
| 4 | 123 |
| 1 | 123 |
| 1 | 789 |
| 2 | 123 |
| 3 | 123 |
| 4 | 009 |
I want to find values that are unique to ONLY branch 1
| 1 | 001 |
| 1 | 789 |
Can this be done without the data being stored in separate tables? I've tried a few "select distinct" queries & don't seem to get the results I'm expecting.
SELECT branch, number
FROM table
WHERE branch = 1
GROUP BY branch, number
If you do not need any aggregates, you can use distinct instead of group by:
select distinct branch
, number
from YourTable
where branch = 1
I guess what I'm trying to say is that I want to find all numbers that are unique to ONLY branch 1. If they are found in any other branch, I don't want to see them.
I guess this is what you want.
SELECT distinct number
FROM MyTable
WHERE branch=1 and number not in
( SELECT distinct number
FROM MyTable
WHERE branch != 1 )
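If you want to check this against the sample data, here is a small sketch using Python's sqlite3 module. The table name branch_numbers is made up, and number is stored as text so that the leading zeros in 001 and 009 survive. Note that it uses a different formulation from the query above: a single GROUP BY with a HAVING test that the only branch ever seen for a number is branch 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE branch_numbers (branch INTEGER, number TEXT);
    INSERT INTO branch_numbers VALUES
        (1, '123'), (1, '001'), (2, '123'), (3, '123'), (4, '123'),
        (1, '123'), (1, '789'), (2, '123'), (3, '123'), (4, '009');
""")


def numbers_only_in_branch(conn, branch):
    """Return the numbers that appear in the given branch and in no other."""
    query = """
        SELECT number
        FROM branch_numbers
        GROUP BY number
        -- smallest and largest branch both equal the target: no other branch has it
        HAVING MIN(branch) = :branch AND MAX(branch) = :branch
        ORDER BY number
    """
    return [row[0] for row in conn.execute(query, {"branch": branch})]


print(numbers_only_in_branch(conn, 1))
```

Unlike a COUNT(number) = 1 test, this still works when a number shows up several times inside branch 1 but nowhere else.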
Try this:
SELECT branch, number
FROM table
GROUP BY branch, number
Here is a SQLFiddle for you to have a look at
If you want to limit it to only branch 1, then just add a where clause.
SELECT branch, number
FROM table
WHERE branch = 1
GROUP BY branch, number
To select all values that are unique in column number and have a branch value of 1 you can use the following code:
SELECT branch, number
FROM table1
WHERE number IN (
SELECT number
FROM table1
GROUP BY number
HAVING (COUNT(number ) = 1)
)
AND branch = 1
For a demo see http://sqlfiddle.com/#!2/97145/62

How to get a MAX and a COUNT from a three table join?

I got an interview question where there's a Car sale modeled in a DB. Each Car represents a physical car in a Car sale which refers to a Make and a Model table. A Sale table keeps track of each Car that is sold. A Sale only consists of one Car, so there's a record in Sale per every unique Car that had been sold.
The question was to find out the name of the most sold Model in the car sale. I answered with a 3-level nested query. The interviewer specifically asked for a solution using joins, and I only managed to join the tables without the aggregates.
How would you join 3 tables as below (Car, Make, Sale) while using two other aggregates?
Here's a rough sketch of the schema. The most sold Model here should return 'Corolla'
Car
| carid| modid | etc...
_________________
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 2 |
| 5 | 2 |
Make
| mkid | name |
_________________
| 1 | Toyota |
| 2 | Nissan |
| 3 | Chevy |
| 4 | Merc |
| 5 | Ford |
Model
| modid| name | mkid |
________________________
| 1 | Corolla| 1
| 2 | Sunny | 2
| 3 | Carina | 1
| 4 | Skyline| 2
| 5 | Focus | 5
Sale
| sid | carid | etc...
_________________
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
Edit:
Using MS SQL Server 2008
Output needed:
Model Name | Count
_____________________
Corolla | 3
i.e. The model of the Car that has been sold the most.
Notice only 3 Corollas and 2 Sunnys are in the Car table, while the Sale table holds one record for each of those with other sale details. The 5 Sale records are actually Corolla, Corolla, Corolla, Sunny and Sunny.
Since you are using SQL Server 2008, make use of Common Table Expression and Window Function.
WITH recordList
AS
(
SELECT c.name, COUNT(*) [Count],
DENSE_RANK() OVER (ORDER BY COUNT(*) DESC) rn
FROM Sale a
INNER JOIN Car b
ON a.carid = b.carID
INNER JOIN Model c
ON b.modID = c.modID
GROUP BY c.Name
)
SELECT name, [Count]
FROM recordList
WHERE rn = 1
SQLFiddle Demo
When interviewers ask for this they usually want you to say that you'd use windowed functions. You could give each sale a unique ascending number partitioned by model and the highest sale number you'd get would be the max count.
http://www.postgresql.org/docs/9.1/static/tutorial-window.html
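A minimal sketch of that numbering idea, run against SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or newer). The tables are cut-down copies of the question's schema, and the alias sale_no is made up for the demo. LIMIT is used because this is SQLite; on SQL Server the equivalent would be TOP 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE model (modid INTEGER PRIMARY KEY, name TEXT, mkid INTEGER);
    CREATE TABLE car (carid INTEGER PRIMARY KEY, modid INTEGER);
    CREATE TABLE sale (sid INTEGER PRIMARY KEY, carid INTEGER);
    INSERT INTO model VALUES
        (1, 'Corolla', 1), (2, 'Sunny', 2), (3, 'Carina', 1),
        (4, 'Skyline', 2), (5, 'Focus', 5);
    INSERT INTO car VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2);
    INSERT INTO sale VALUES (1, 1), (2, 2), (3, 3), (4, 4), (5, 5);
""")


def most_sold_model(conn):
    """Number each sale within its model; the highest number is the top count."""
    query = """
        SELECT name, sale_no
        FROM (
            SELECT m.name AS name,
                   ROW_NUMBER() OVER (PARTITION BY m.modid ORDER BY s.sid) AS sale_no
            FROM sale AS s
            JOIN car AS c ON c.carid = s.carid
            JOIN model AS m ON m.modid = c.modid
        )
        ORDER BY sale_no DESC
        LIMIT 1
    """
    return conn.execute(query).fetchone()


print(most_sold_model(conn))
```

If two models tie for the top spot, this returns only one of them, so a ranking function such as DENSE_RANK() over the counts is the safer choice when ties matter.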
The following query works on Oracle 11g:
SELECT name FROM (
SELECT model.name AS name FROM car , sale , model
WHERE car.carid=sale.carid
AND car.modid=model.modid
GROUP BY model.name
ORDER BY count(*) DESC )
WHERE rownum = 1;
Or
SELECT name FROM (
SELECT model.name AS name FROM car natural join sale natural join model
GROUP BY model.name
ORDER BY count(*) DESC )
WHERE rownum = 1;
OUTPUT
| NAME |
-----------
| Corolla |
This is based on your newly added SQL Server 2008 tag. If you are using a different RDBMS, you'll probably need to use LIMIT instead of TOP and place it at the end of the top_sold_car subquery.
select Make.name as Make, Model.name as Model
from (
select top 1 Car.modid, count(*) as num_sold
from Sale
join Car
on (Sale.carid = Car.carid)
group by Car.modid
order by num_sold desc) as top_sold_car
join Model
on (top_sold_car.modid = Model.modid)
join Make
on (Model.mkid = Make.mkid)