Better SQL query to remove equivalent rows - sql

Guys, I'm new at SQL and can't figure out the "right way" to do the last part of a query. I have a table that contains a list of items and their equivalents. There are essentially twice as many rows as needed, and I'm trying to find a SQL way to select half of the entries so there are no duplicates.
Starting Table with duplicates:
Item Name EquivItem
---- ------ ----------
100 bubba 106
103 gump 109
106 shrimp 100
109 grits 103
And the resulting table would be:
Item Name EquivItem
----- ----- ----------
100 bubba 106
103 gump 109
I was using a couple of nested loops in procedural code to filter out the duplicates, but I finally wrote a query that works, although it feels like a hack.
I'm arbitrarily using WHERE Item < EquivItem to select only one row of each pair. The actual tables are a bit more complex, and I'm afraid there may be a case where this doesn't work.
SELECT *
FROM T
WHERE Item < EquivItem
I'm trying to take some time to figure out the right way to do things before I develop too many bad habits. Any suggestions? Thanks.
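A minimal sketch of the query above run against the sample rows. It uses Python's built-in sqlite3 module as a stand-in database, because the question doesn't name a DBMS. The second query is an extra assumption check: it counts rows whose mirror row (EquivItem, Item) is missing. That is exactly the case where WHERE Item < EquivItem could silently drop an equivalence.

```python
import sqlite3

# In-memory SQLite table standing in for the poster's table T.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (Item INTEGER, Name TEXT, EquivItem INTEGER)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [(100, "bubba", 106), (103, "gump", 109), (106, "shrimp", 100), (109, "grits", 103)],
)

# Each pair is stored twice (a -> b and b -> a); keeping the row with the
# smaller Item leaves one row per pair.
deduped = conn.execute(
    "SELECT Item, Name, EquivItem FROM T WHERE Item < EquivItem ORDER BY Item"
).fetchall()

# Assumption check: count rows that have no mirror row.
# A non-zero count means the filter above could drop an equivalence entirely.
unmatched = conn.execute(
    """
    SELECT COUNT(*) FROM T a
    WHERE NOT EXISTS (
        SELECT 1 FROM T b WHERE b.Item = a.EquivItem AND b.EquivItem = a.Item
    )
    """
).fetchone()[0]

print(deduped)    # [(100, 'bubba', 106), (103, 'gump', 109)]
print(unmatched)  # 0
```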

Is it possible for more than two items to be equivalent, such as 100 = 103 = 106? Can this happen?
Item Name EquivItem
---- ------ ----------
100 bubba 103
103 gump 106
106 shrimp 103
As long as the equivalents can't be chained together and always have a 1-to-1 relationship, your solution looks perfectly fine to me.
If this scenario can happen, I would first scrub the data to make sure that all the EquivItems refer to the lowest Item ID in the chain... and then your original query would still do the job.
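One possible way to do that scrub, sketched with SQLite's WITH RECURSIVE on the chained example above. Dialects differ here. SQL Server, for example, omits the RECURSIVE keyword and allows only UNION ALL in the recursive part, so the cycle handling below would need rework there. The query treats each equivalence as undirected and finds the lowest Item ID reachable from every item.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (Item INTEGER, Name TEXT, EquivItem INTEGER)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [(100, "bubba", 103), (103, "gump", 106), (106, "shrimp", 103)],
)

# edges: every equivalence in both directions, so a chain can be walked
# from any member. reach: every node reachable from each starting Item.
# UNION (not UNION ALL) discards repeated rows, so the 103 <-> 106 cycle ends.
canonical = conn.execute(
    """
    WITH RECURSIVE
      edges(a, b) AS (
        SELECT Item, EquivItem FROM T
        UNION
        SELECT EquivItem, Item FROM T
      ),
      reach(item, node) AS (
        SELECT Item, Item FROM T
        UNION
        SELECT r.item, e.b FROM reach r JOIN edges e ON e.a = r.node
      )
    SELECT item, MIN(node) AS lowest FROM reach GROUP BY item ORDER BY item
    """
).fetchall()

print(canonical)  # [(100, 100), (103, 100), (106, 100)]
```

The lowest value per item is what the answer suggests writing back into EquivItem (for example with an UPDATE) before running the original query.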

Related

Second highest column

I have seen a similar question asked, "How to get second highest value among multiple columns in SQL"; however, the solution won't work in Microsoft Access (ROW_NUMBER/OVER (PARTITION BY ...) isn't valid in Access).
My Access query includes dozens of fields. I would like to create a new field/column that returns the second-highest value of 10 specific columns included in the query; I will call this field "Cover". Something like this:
Product Bid1 Bid2 Bid3 Bid4 Cover
Watch 104 120 115 108 115
Shoe 65 78 79 76 78
Hat 20 22 19 20 20
I can do a really long SWITCH formula such as the following equivalent Excel formula:
IF( AND(Bid1> Bid2, Bid1 > Bid3, Bid1 > Bid4), Bid1,
AND(Bid2> Bid1, Bid2 > Bid3, Bid2 > Bid4), Bid2,
.....
But there must be a more efficient solution. A MAXIF equivalent would work perfectly if MS-Access Query had such a function.
Any ideas? Thank you in advance.
This would be easier if the data were laid out in a more normalized way. The clue is the numbered field names.
Your data is currently organized as a Pivot (known in Access as crosstab), but can easily be Unpivoted.
This data is much easier to work with when laid out in a more normalized fashion, which in this case would be:
Product Bid Amount
--------- ----- --------
Watch 1 104
Watch 2 120
Watch 3 115
Watch 4 108
Shoe 1 65
Shoe 2 78
Shoe 3 79
Shoe 4 76
Hat 1 20
Hat 2 22
Hat 3 19
Hat 4 20
This way querying becomes simpler.
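The unpivot itself can be written as a UNION ALL query with one SELECT per numbered column. Access supports UNION ALL queries, but this sketch runs on SQLite. It assumes a hypothetical wide table named Wide with only Bid1 to Bid4, while the real query has ten bid columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical wide table shaped like the question (four bid columns shown).
conn.execute(
    "CREATE TABLE Wide (Product TEXT, Bid1 INTEGER, Bid2 INTEGER, Bid3 INTEGER, Bid4 INTEGER)"
)
conn.executemany(
    "INSERT INTO Wide VALUES (?, ?, ?, ?, ?)",
    [("Watch", 104, 120, 115, 108), ("Shoe", 65, 78, 79, 76), ("Hat", 20, 22, 19, 20)],
)

# One SELECT per numbered column; the literal records which bid the row came from.
unpivot_sql = """
    SELECT Product, 1 AS Bid, Bid1 AS Amount FROM Wide
    UNION ALL SELECT Product, 2, Bid2 FROM Wide
    UNION ALL SELECT Product, 3, Bid3 FROM Wide
    UNION ALL SELECT Product, 4, Bid4 FROM Wide
"""
rows = conn.execute(
    "SELECT Product, Bid, Amount FROM (" + unpivot_sql + ") ORDER BY Product, Bid"
).fetchall()
for row in rows:
    print(row)
```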
It looks like you want the maximum of the bids, grouped by Product, so:
select Product, max(amount) as maxAmount
from myTable
group by product
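The question actually asks for the second-highest bid (Cover), not the maximum. On the normalized layout, one option that avoids ROW_NUMBER is a correlated COUNT: a row is in second place when exactly one row for the same product ranks ahead of it. This sketch breaks ties by the lower bid number, which is an assumption. With that rule Hat's Cover comes out as 20, as in the question. The table name Bids is hypothetical and the sketch runs on SQLite. The same correlated-subquery shape should carry over to Access SQL, but it is untested there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Bids (Product TEXT, Bid INTEGER, Amount INTEGER)")
data = {"Watch": [104, 120, 115, 108], "Shoe": [65, 78, 79, 76], "Hat": [20, 22, 19, 20]}
conn.executemany(
    "INSERT INTO Bids VALUES (?, ?, ?)",
    [(p, i + 1, amt) for p, amts in data.items() for i, amt in enumerate(amts)],
)

# Count rows of the same product that rank ahead of row a: a higher amount,
# or an equal amount with a lower bid number (the assumed tie-break).
# Exactly one row ahead means row a is in second place.
cover = conn.execute(
    """
    SELECT a.Product, a.Amount AS Cover
    FROM Bids a
    WHERE (SELECT COUNT(*) FROM Bids b
           WHERE b.Product = a.Product
             AND (b.Amount > a.Amount OR (b.Amount = a.Amount AND b.Bid < a.Bid))) = 1
    ORDER BY a.Product
    """
).fetchall()

print(cover)  # [('Hat', 20), ('Shoe', 78), ('Watch', 115)]
```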
Really, we shouldn't be storing the same text over and over, so Product should be an ID number, with the associated product names stored once in a separate table instead of several times in this one, like:
ProdID ProdName
-------- ----------
1 Watch
2 Shoe
3 Hat
... but that's another lesson.
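A minimal sketch of that split, with hypothetical table names Products and Bids and only a few of the sample bids. Names live once in Products, bids refer to them by ProdID, and a join brings the name back when it is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Products (ProdID INTEGER PRIMARY KEY, ProdName TEXT NOT NULL);
    CREATE TABLE Bids (
        ProdID INTEGER NOT NULL REFERENCES Products(ProdID),
        Bid INTEGER NOT NULL,
        Amount INTEGER NOT NULL,
        PRIMARY KEY (ProdID, Bid)
    );
    INSERT INTO Products VALUES (1, 'Watch'), (2, 'Shoe'), (3, 'Hat');
    INSERT INTO Bids VALUES (1, 1, 104), (1, 2, 120), (2, 1, 65), (3, 1, 20);
    """
)

# The product name is stored once and joined in by ProdID.
rows = conn.execute(
    """
    SELECT p.ProdName, MAX(b.Amount) AS maxAmount
    FROM Bids b JOIN Products p ON p.ProdID = b.ProdID
    GROUP BY p.ProdID, p.ProdName
    ORDER BY p.ProdID
    """
).fetchall()
print(rows)  # [('Watch', 120), ('Shoe', 65), ('Hat', 20)]
```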
Generally speaking, repeating anything should be avoided... that's pretty much the purpose of a database... but the links below will explain it better than I can. :)
Quackit : Microsoft Access Tutorial
YouTube : DB Planning
Microsoft : Database Design Basics
Microsoft : Database Normalization Basics
Wikipedia : Database Normalization

Is it possible to select a column followed by a wildcard (to retrieve all columns) in SQL?

This might be a weird question, but I'll ask it anyway. When I'm working on queries, I'm usually interested in only a few particular columns at first and when I'm happy with the result, I'll add other columns to the query.
In other words: first I make it work, then I'll add the details. But I usually end up writing another query (that retrieves all table data) just above my "work-in-progress-query" in order to look at column names and inspect the data at a glance. It would be nice if that extra "retrieve all query" wasn't necessary and if I could just use a wildcard directly.
To put it simply, I'd like to do:
SELECT column, * from myTable;
So let's say I've got a table Person:
id name description number categoryId modified created
---------------------------------------------------------------------------------
1 Sven Ugly man 42 67 2014-03-03 2014-03-03
2 Anna Pretty woman 25 33 2014-03-03 2014-03-03
Then I would like to do:
SELECT number, * from Person
Which should lead to:
number id name description number categoryId modified created
--------------------------------------------------------------------------------------
42 1 Sven Ugly man 42 67 2014-03-03 2014-03-03
25 2 Anna Pretty woman 25 33 2014-03-03 2014-03-03
Is such a thing possible?
Yes, it is possible, and it's used frequently when testing.
It's not encouraged, though, as * produces messy results with columns sharing names (especially duplicated columns, references, keys, etc.).
Short answer: yes, but not sensible.
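A small sketch with SQLite and the Person sample. Qualifying the star with a table alias (p.*) is the safer spelling, because some products, MySQL and Oracle for example, reject an unqualified * after other select-list items. Giving the pinned column an alias also avoids the duplicate column name mentioned above. The alias pinned_number is just an illustrative choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Person (id INTEGER, name TEXT, description TEXT, number INTEGER,"
    " categoryId INTEGER, modified TEXT, created TEXT)"
)
conn.executemany(
    "INSERT INTO Person VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "Sven", "Ugly man", 42, 67, "2014-03-03", "2014-03-03"),
        (2, "Anna", "Pretty woman", 25, 33, "2014-03-03", "2014-03-03"),
    ],
)

# Pin the column of interest first, then every column via the qualified p.*.
# The AS alias keeps the pinned column's name distinct from p.number.
cur = conn.execute("SELECT p.number AS pinned_number, p.* FROM Person p ORDER BY p.id")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(columns[0])  # pinned_number
print(rows[0])     # (42, 1, 'Sven', 'Ugly man', 42, 67, '2014-03-03', '2014-03-03')
```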

Database design for a step by step wizard

I am designing a system made up of logical steps, each with some associated actions (the actions are not part of the question, but they are crucial for each step in the list)!
The thing is that I need a way to define all the logical steps in order, so that I can get the list with a query and also make modifications later on!
Anyone with some experience in this kind of database design?
I have been thinking of having a table named wizard_steps (or something similar) and using a priority column for the order, but for some reason I feel that this design will fail at some point (items with the same priority, adding new items would mean rearranging the rest of the items, and so forth)!
Another design I have been thinking about is a "next item" column in the wizard_step table, but I don't feel this is the correct approach either!
So, to summarize: I am trying to make a list of elements where the order is crucial (and the design should be open enough to support multiple lists)!
Any ideas on what the database should look like?
Thanks!
EDIT: I found this yii component I will check out: http://www.yiiframework.com/extension/simpleworkflow/
Might be a good solution!
If I understand you correctly, your main concern is to create a schema that supports ordered lists and allows easy insertion and reordering of items.
The following table design:
id_list item_priority foreign_itemdef_id
1 1 245
1 2 32
1 3 45
2 1 156
2 2 248
2 3 127
coupled to a table of item definitions, will be easy to query but difficult to maintain, especially for insertions.
That one:
id_list first_item_id
1 45
2 38
coupled to the linked list:
item_id next_item foreign_itemdef_id
45 381 56
381 NULL 59
38 39 89
39 42 78
42 NULL 45
Will be both difficult to query and update (you should update the linked list inside a transaction, otherwise your linked list can get corrupted).
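For comparison, reading one list back in order from the linked-list design needs a recursive query. A sketch in SQLite using the sample rows; recursive CTE support and syntax vary by database. The table names lists and items are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE lists (id_list INTEGER PRIMARY KEY, first_item_id INTEGER);
    CREATE TABLE items (item_id INTEGER PRIMARY KEY, next_item INTEGER,
                        foreign_itemdef_id INTEGER);
    INSERT INTO lists VALUES (1, 45), (2, 38);
    INSERT INTO items VALUES (45, 381, 56), (381, NULL, 59),
                             (38, 39, 89), (39, 42, 78), (42, NULL, 45);
    """
)

# Start at the list's first item and follow next_item until it is NULL,
# numbering the steps so the result can be ordered.
ordered = conn.execute(
    """
    WITH RECURSIVE walk(pos, item_id, next_item, def_id) AS (
        SELECT 1, i.item_id, i.next_item, i.foreign_itemdef_id
        FROM lists l JOIN items i ON i.item_id = l.first_item_id
        WHERE l.id_list = ?
        UNION ALL
        SELECT w.pos + 1, i.item_id, i.next_item, i.foreign_itemdef_id
        FROM walk w JOIN items i ON i.item_id = w.next_item
    )
    SELECT def_id FROM walk ORDER BY pos
    """,
    (2,),
).fetchall()
print([r[0] for r in ordered])  # [89, 78, 45]
```

Compare this with the single ORDER BY item_priority that the first design needs.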
I would prefer the first solution for simplicity.
Depending on your update frequency, you may consider using large increments between item_priority to help insertion:
id_list item_priority foreign_itemdef_id
1 1000 245
1 2000 32
1 3000 45
2 1000 156
2 2000 248
2 3000 127
1 2500 46 -- late insertion
1 2750 47 -- late insertion
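The gap trick comes down to giving a late item a priority between its neighbours. A sketch in SQLite, using the table name item_priority_table borrowed from the query further down and a hypothetical helper insert_between. When two neighbours have no integer left between them, the list needs renumbering, and the helper signals that with an exception.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item_priority_table (id_list INTEGER, item_priority INTEGER,"
    " foreign_itemdef_id INTEGER)"
)
conn.executemany(
    "INSERT INTO item_priority_table VALUES (?, ?, ?)",
    [(1, 1000, 245), (1, 2000, 32), (1, 3000, 45)],
)


def insert_between(conn, id_list, before, after, itemdef_id):
    """Hypothetical helper: insert an item midway between two priorities."""
    if after - before < 2:
        # No free integer between the neighbours: renumber the list first.
        raise ValueError("no gap left; renumber the list")
    priority = (before + after) // 2
    conn.execute(
        "INSERT INTO item_priority_table VALUES (?, ?, ?)",
        (id_list, priority, itemdef_id),
    )
    return priority


insert_between(conn, 1, 2000, 3000, 46)  # lands at 2500
insert_between(conn, 1, 2500, 3000, 47)  # lands at 2750
order = conn.execute(
    "SELECT foreign_itemdef_id FROM item_priority_table"
    " WHERE id_list = 1 ORDER BY item_priority"
).fetchall()
print([r[0] for r in order])  # [245, 32, 46, 47, 45]
```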
EDIT:
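Here's a query that makes room for an insertion: it increments the priority of every row in the list at or above the new position, so the new item doesn't collide with the one already there:
$query_make_room_for_new_item = "UPDATE item_priority_table SET item_priority = item_priority + 1 WHERE item_priority >= :new_priority AND id_list = :id_list";
Bind :new_priority to $new_item_position_priority and :id_list to $id_list with a prepared statement (rather than concatenating them into the SQL), run it, and then insert your item with priority $new_item_position_priority.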
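A sketch of the make-room-then-insert sequence using bound parameters, run in one transaction so a failure can't leave a half-shifted list. It uses SQLite from Python instead of the answer's PHP. The helper name insert_at and the itemdef id 99 are hypothetical. Rows at or above the target position are shifted (>=), so the new item never shares a priority with an existing one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item_priority_table (id_list INTEGER, item_priority INTEGER,"
    " foreign_itemdef_id INTEGER)"
)
conn.executemany(
    "INSERT INTO item_priority_table VALUES (?, ?, ?)",
    [(1, 1, 245), (1, 2, 32), (1, 3, 45), (2, 1, 156)],
)


def insert_at(conn, id_list, position, itemdef_id):
    """Hypothetical helper: shift later items down one, then insert."""
    # "with conn" commits on success and rolls back if either statement fails.
    with conn:
        conn.execute(
            "UPDATE item_priority_table SET item_priority = item_priority + 1"
            " WHERE id_list = ? AND item_priority >= ?",
            (id_list, position),
        )
        conn.execute(
            "INSERT INTO item_priority_table VALUES (?, ?, ?)",
            (id_list, position, itemdef_id),
        )


insert_at(conn, 1, 2, 99)
rows = conn.execute(
    "SELECT item_priority, foreign_itemdef_id FROM item_priority_table"
    " WHERE id_list = 1 ORDER BY item_priority"
).fetchall()
print(rows)  # [(1, 245), (2, 99), (3, 32), (4, 45)]
```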

Help managing many-to-one results for lookup

I didn't see this exact question asked, so I'm hoping it hasn't been.
I have a table with multiple columns holding code values, and a table that holds all the lookup codes and descriptions for the whole database. Is there a way to join the lookup values so that everything stays on one row, instead of what I'm getting, where one row has the race value and another row has the sex value? Thanks. I'm using TOAD but understand SQL.
Table 1
User_id Race_cd Sex_cd
101 3201 4501
102 3201 4502
103 3202 4501
104 3203 4501
Table 2
CD_Num CD_descrip
3201 White
3202 Black
3203 Asian
4501 Male
4502 Female
I played around with joins over your tables for an hour, without an easy result.
Then I created views like this:
create view race as select * from lookup where CD_Num < 4000
create view sex as select * from lookup where CD_Num > 4000
After that, the select was this easy:
select u.User_id, race.CD_descrip as race_desc, sex.CD_descrip as sex_desc from users u, race, sex
where u.Race_cd = race.CD_Num
and u.Sex_cd = sex.CD_Num
which returns:
101 White Male
102 White Female
103 Black Male
104 Asian Male
I hope this inspires a nice solution! (You will naturally need a "BETWEEN value AND value" predicate when creating your views, so that each view only covers its own range of codes.)
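A sketch of the views approach in SQLite with the question's sample data. The table names users and lookup are assumptions, because the question only calls them Table 1 and Table 2, and the BETWEEN ranges are read off the sample codes. The second query is a common alternative: joining the same lookup table twice under two aliases gives the same rows without views or code ranges, since each code appears only once in the lookup table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (User_id INTEGER, Race_cd INTEGER, Sex_cd INTEGER);
    CREATE TABLE lookup (CD_Num INTEGER PRIMARY KEY, CD_descrip TEXT);
    INSERT INTO users VALUES (101, 3201, 4501), (102, 3201, 4502),
                             (103, 3202, 4501), (104, 3203, 4501);
    INSERT INTO lookup VALUES (3201, 'White'), (3202, 'Black'), (3203, 'Asian'),
                              (4501, 'Male'), (4502, 'Female');
    -- The code ranges are an assumption read off the sample data.
    CREATE VIEW race AS SELECT * FROM lookup WHERE CD_Num BETWEEN 3200 AND 3299;
    CREATE VIEW sex AS SELECT * FROM lookup WHERE CD_Num BETWEEN 4500 AND 4599;
    """
)

via_views = conn.execute(
    """
    SELECT u.User_id, race.CD_descrip, sex.CD_descrip
    FROM users u
    JOIN race ON race.CD_Num = u.Race_cd
    JOIN sex ON sex.CD_Num = u.Sex_cd
    ORDER BY u.User_id
    """
).fetchall()

# Same result without views: one lookup alias per code column.
via_aliases = conn.execute(
    """
    SELECT u.User_id, r.CD_descrip AS race, s.CD_descrip AS sex
    FROM users u
    JOIN lookup r ON r.CD_Num = u.Race_cd
    JOIN lookup s ON s.CD_Num = u.Sex_cd
    ORDER BY u.User_id
    """
).fetchall()

print(via_views)
print(via_views == via_aliases)  # True
```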

Compare 2 values of different types inside of subquery

I am using an MS SQL database and I have 3 tables: 'base_info', 'messages', 'config'
bases:
ID Name NameNum
====================================
1 Home 101
2 Castle 102
3 Car 103
messages:
ID Signal RecBy HQ
============================
111 120 Home 1
111 110 Castle 1
111 125 Car 1
222 120 Home 2
222 125 Castle 2
222 130 Car 2
333 100 Home 1
333 110 Car 2
config:
ID SignalRec SignalOut RecBy HQ
====================================
111 60 45 101 1
111 40 60 102 1
222 50 60 102 2
222 30 90 101 2
333 80 10 103 1
OK, so now I have a subquery in which I select 'SignalRec' and 'SignalOut' from the config table and match them to the messages table by ID and Date (not included above). The problem is that I need it to match where messages.RecBy = config.RecBy, but messages.RecBy is a string, while config.RecBy is a number whose equivalent Name is in the bases table. So I almost need to do a subquery inside a subquery, or some type of join, and compare the returned value.
Here is what I have so far:
(SELECT TOP 1 config.SignalRec from config WHERE config.ID = messages.ID AND ||I need the other comparison here||...Order By...) As cfgSignalRec,
(SELECT TOP 1 config.SignalOut from config WHERE config.ID = messages.ID AND ||I need the other comparison here||...Order By...) As cfgSignalOut
I tried to make this as clear as possible but if you need more info let me know.
I would normalize out RecBy in your messages table to reference the bases table. Why would you insert the string content there if it's also referenced in bases?
This is exactly why normalization exists: reduce redundancy, reduce ambiguity, and enforce referential integrity.
To make this more clear, RecBy in the messages table should be a foreign key to Bases.
I think this could do the trick (although I have not tried it...)
SELECT
c.SignalRec
FROM config c
INNER JOIN bases b
ON c.RecBy = b.NameNum
INNER JOIN messages m
ON b.Name = m.RecBy
WHERE c.ID = m.ID
However, as Anthony pointed out, you probably want to normalize out the strings in the RecBy column in the messages table, as you have the same data in the bases table.
From your description, it just sounds like you need two JOINs:
SELECT TOP 1
c.SignalRec
FROM
config c
INNER JOIN
bases b
ON c.RecBy = b.NameNum
INNER JOIN
messages m
ON b.Name = m.RecBy
AND c.ID = m.ID
-- add an ORDER BY: without one, TOP 1 returns an arbitrary matching row
I think I might have not been clear enough what I wanted to do, sorry about that.
The data is actually different in the 2 tables, although the correlations are the same. It's kind of confusing to explain without going into detail about how the system works.
I actually found a very fast way of doing this.
Inside my sub-query I do this:
(SELECT TOP 1 config.SignalRec FROM config, bases
WHERE config.ID = messages.ID AND bases.Name = messages.RecBy AND bases.NameNum = config.RecBy
Order By...)
So this effectively matches the two RecBy columns of different tables through the bases table, even though one is an integer and the other is a string. It reminds me of a MATCH and lookup in Excel.
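A sketch of that correlated subquery against the sample tables, run on SQLite, which spells TOP 1 as LIMIT 1. The ORDER BY is left out because the Date column it presumably sorts on isn't shown. Messages with no matching config row come back as NULL (None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE bases (ID INTEGER, Name TEXT, NameNum INTEGER);
    CREATE TABLE messages (ID INTEGER, Signal INTEGER, RecBy TEXT, HQ INTEGER);
    CREATE TABLE config (ID INTEGER, SignalRec INTEGER, SignalOut INTEGER,
                         RecBy INTEGER, HQ INTEGER);
    INSERT INTO bases VALUES (1, 'Home', 101), (2, 'Castle', 102), (3, 'Car', 103);
    INSERT INTO messages VALUES (111, 120, 'Home', 1), (111, 110, 'Castle', 1),
        (111, 125, 'Car', 1), (222, 120, 'Home', 2), (222, 125, 'Castle', 2),
        (222, 130, 'Car', 2), (333, 100, 'Home', 1), (333, 110, 'Car', 2);
    INSERT INTO config VALUES (111, 60, 45, 101, 1), (111, 40, 60, 102, 1),
        (222, 50, 60, 102, 2), (222, 30, 90, 101, 2), (333, 80, 10, 103, 1);
    """
)

# bases translates the string messages.RecBy into the integer config.RecBy.
rows = conn.execute(
    """
    SELECT m.ID, m.RecBy,
      (SELECT c.SignalRec FROM config c JOIN bases b ON b.NameNum = c.RecBy
        WHERE c.ID = m.ID AND b.Name = m.RecBy LIMIT 1) AS cfgSignalRec,
      (SELECT c.SignalOut FROM config c JOIN bases b ON b.NameNum = c.RecBy
        WHERE c.ID = m.ID AND b.Name = m.RecBy LIMIT 1) AS cfgSignalOut
    FROM messages m
    ORDER BY m.ID, m.RecBy
    """
).fetchall()
for row in rows:
    print(row)
```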