PostgreSQL query - finding presence of id in any field of a record

I have two tables which look like the following
tools:
id | part name
---------------
0 | hammer
1 | sickle
2 | axe
people:
personID | ownedTool1 | ownedTool2 | ownedTool3 ..... ownedTool20
------------------------------------------------------------------
0 | 2 | 1 | 3 ... ... 0
I'm trying to find out how many people own a particular tool. A person cannot own multiple copies of the same tool.
The only way I can think of doing this is something like
SELECT COUNT(*)
FROM tools JOIN people ON tools.id = people.ownedTool1 OR tools.id = people.ownedTool2 ... and so on
WHERE tools.id = 0
to get the number of people who own hammers. I believe this will work; however, it involves having 20 OR conditions in the query. Surely there is a more appropriate way to form such a query, and I'm interested to learn how to do this.

You shouldn't have 20 columns each possibly containing an ID in the first place. You should properly establish a normalized schema. If a tool can belong to only one user - but a user can have multiple tools, you should establish a One to Many relationship. Each tool will have a user id in its row that maps back to the user it belongs to. If a tool can belong to one or more users you will need to establish a Many to Many relationship. This will require an intermediate table that contains rows of user_id to tool_id mappings. Having a schema set up appropriately like that will make the query you're looking to perform trivial.
In your particular case it seems like a user can have many tools and a tool can be "shared" by many users. For your many-to-many relation all you would have to do is count the number of rows in that intermediate table having your desired tool_id.
Something like this:
SELECT COUNT(ID) FROM UserTools Where ToolID = #desired_tool_id
Googling the terms above (normalized schema, one-to-many relationship, many-to-many relationship) should get you pointed in the correct direction. If you're stuck with that schema, then the way you pointed out is the only way to do it.
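As a rough sketch of the many-to-many layout (shown here in SQLite via Python; the UserTools table and its column names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tools  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
-- junction table: one row per (person, tool) ownership
CREATE TABLE UserTools (
    PersonID INTEGER REFERENCES people(id),
    ToolID   INTEGER REFERENCES tools(id),
    PRIMARY KEY (PersonID, ToolID)  -- a person cannot own the same tool twice
);
INSERT INTO tools VALUES (0, 'hammer'), (1, 'sickle'), (2, 'axe');
INSERT INTO people VALUES (0, 'Alice'), (1, 'Bob');
INSERT INTO UserTools VALUES (0, 0), (0, 2), (1, 0);
""")

# How many people own tool 0 (the hammer)?
hammer_owners = conn.execute(
    "SELECT COUNT(*) FROM UserTools WHERE ToolID = 0"
).fetchone()[0]
print(hammer_owners)  # 2
```

The composite primary key on the junction table also enforces the "a person cannot own multiple copies of the same tool" rule from the question.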

If you cannot change the model (and I'm sure you will tell us that), then the only sensible way to work around this broken datamodel is to create a view that will give you a normalized view (pun intended) on the data:
create view normalized_people
as
select personid,
ownedTool1 as toolid
from people
union all
select personid,
ownedTool2 as toolid
from people
union all
select personid,
ownedTool3 as toolid
from people
... you get the picture ...
Then your query is as simple as
select count(personid)
from normalized_people
where toolid = 0;
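Here is a minimal runnable version of the view approach (SQLite via Python, trimmed to three ownedTool columns for brevity):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (personid INTEGER, ownedTool1 INTEGER,
                     ownedTool2 INTEGER, ownedTool3 INTEGER);
INSERT INTO people VALUES (0, 2, 1, 0), (1, 0, 3, 4);

-- one UNION ALL branch per ownedToolN column
CREATE VIEW normalized_people AS
SELECT personid, ownedTool1 AS toolid FROM people
UNION ALL
SELECT personid, ownedTool2 AS toolid FROM people
UNION ALL
SELECT personid, ownedTool3 AS toolid FROM people;
""")

# Count the people owning tool 0 against the normalized view
owners = conn.execute(
    "SELECT COUNT(personid) FROM normalized_people WHERE toolid = 0"
).fetchone()[0]
print(owners)  # 2
```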

You received your (warranted) lectures about the database design.
As to your question, there is a simple way:
SELECT count(*) AS person_ct
FROM tbl t
WHERE translate((t)::text, '()', ',,')
~~ ('%,' || #desired_tool_id::text || ',%')
Or, if the first column is person_id and you want to exclude that one from the search:
SELECT count(*) AS person_ct
FROM tbl t
WHERE replace((t)::text, ')', ',')
~~ ('%,' || #desired_tool_id::text || ',%')
Explanation
Every table is accompanied by a matching composite type in PostgreSQL. So you can query any table this way:
SELECT (tbl) FROM tbl;
Yields one column per row, holding the whole row.
PostgreSQL can cast such a row type to text in one fell swoop: (tbl)::text
I replace both parens () with a comma , so every value of the row is delimited by commas ,.
My second query does not translate the opening parenthesis, so the first column (person_id) is excluded from the search.
Now I can search all columns with a simple LIKE (~~) expression using the desired number delimited by commas: ~~ '%,17,%'
Voilà: all done with one simple command. This is reliable as long as you don't have columns like text or int[] in your table that could also hold ,17, within their values, or additional columns with numbers, which could lead to false positives.
It won't deliver performance wonders as it cannot use standard indexes. (You could create a GiST or GIN index on an expression using the pg_trgm module in pg 9.1, but that's another story.)
Anyway, if you want to optimize, you'd better start by normalizing your table layout as has been suggested.
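The delimiting logic can be illustrated outside the database with a small plain-Python sketch (no PostgreSQL required; the function name is made up):

```python
# Simulating the row-to-text trick: a row like (5, 17, 3) becomes the
# string "(5,17,3)"; translating both parens to commas yields ",5,17,3,",
# so every value is fully delimited and LIKE '%,17,%' cannot match "117".
def row_contains(row, value):
    text = "(" + ",".join(str(v) for v in row) + ")"
    delimited = text.replace("(", ",").replace(")", ",")
    return f",{value}," in delimited

print(row_contains((5, 17, 3), 17))   # True
print(row_contains((5, 117, 3), 17))  # False: 17 inside 117 is not delimited
```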

Related

SQL: Combine tables and order all rows / Avoid storing lots of null data

Typically in SQL, a JOIN requires you to join two tables ON a specific column, and then rows get merged. That isn't what I'm looking for.
Is it possible to join tables in SQL in a way that you can ORDER BY columns with the same name, such that x rows are returned, where x = the sum of the rows in table 1 and table 2?
To hopefully clarify what I mean, here's an example query:
SELECT * FROM (combined Real and Placeholder items)
ORDER BY StartDate, OnDayIndex
and here's what results might look like:
ID OnDayIndex StartDate ItemType Name TemplatePointer
12308 2 1996-09-18 Real Actual Name Null
10309 11 1996-09-19 Placeholder Null 123
30310 5 1996-09-20 Real Actual Name Null
30410 6 1996-09-20 Placeholder Null 456
My use case is a calendar application with recurring events. To save space, it doesn't make sense to store every recurrence of an event. If it weren't for the particulars of my use case, I'd just store a template with a rule, and recurring events would be generated when viewed, except for one-off events. The problem is that the calendar app I'm working on allows you to move items around within the day they're on, and saves the way you order the items. I'm already using the ranked-model gem (link here: https://github.com/mixonic/ranked-model) to cut down on the number of writes needed to update the "onDayIndex". The template approach on its own turns into a bit of a nightmare when "onDayIndex" is factored in... (I could say more...)
So I'd like to store slimmed-down 'Placeholder' items that store the items' position and a pointer to the template, perhaps in a separate table if possible.
If this isn't possible, an alternative approach I've considered for conserving space is moving most columns from the Items table to an ItemData table, and storing an ItemDataID on Items.
But I'd really like to know if it is possible, as I'm pretty junior in SQL, as well as any other vital information I may be missing.
I'm using Rails with a Postgres database.
Are you talking about using UNION / UNION ALL to stack result sets on top of each other, but where the sources have different columns?
If so, you need to fill in the 'missing' columns (you can only UNION two sets if their signatures match, or can be coerced to match).
For example...
SELECT col1, col2, NULL as col3
FROM tbl1
UNION ALL
SELECT col1, NULL AS col2, col3
FROM tbl2
Note: UNION expends additional resources to remove duplicates from the results. Use UNION ALL if such effort is wasted.
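A runnable sketch of the NULL-padding approach (SQLite via Python; the table and column names are invented to loosely match the question's example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE real_items  (id INTEGER, on_day_index INTEGER,
                          start_date TEXT, name TEXT);
CREATE TABLE placeholders (id INTEGER, on_day_index INTEGER,
                           start_date TEXT, template_ptr INTEGER);
INSERT INTO real_items VALUES (12308, 2, '1996-09-18', 'Actual Name');
INSERT INTO placeholders VALUES (10309, 11, '1996-09-19', 123);
""")

# Pad the 'missing' columns with NULL so both SELECT signatures match,
# then ORDER BY the shared columns across the combined set.
rows = conn.execute("""
SELECT id, on_day_index, start_date, 'Real' AS item_type,
       name, NULL AS template_ptr
FROM real_items
UNION ALL
SELECT id, on_day_index, start_date, 'Placeholder',
       NULL, template_ptr
FROM placeholders
ORDER BY start_date, on_day_index
""").fetchall()
for row in rows:
    print(row)
```

The literal 'Real'/'Placeholder' column is a cheap way to remember which source table each combined row came from.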

SQL Key Value Pair Query

I have two tables:
Product Table
ID (PK), Description, CategoryID, SegmentID, TypeID, SubTypeID, etc.
Attribute Table
ID (PK), ProductID (FK), Key, Value
And I would like to query these two tables in a join that returns 1 row for each product, with all of the Key/Value pair records in the Attribute table returned in a single column, perhaps separated by a pipe character (Key1: Value1 | Key2: Value2 | Key3: Value3 | etc.). Each product could have a different number of key/value pairs, with some products having as few as 2-3 and others as many as 30. I would like to figure out how to get the query results to look something like this (perhaps selected into a new table):
product.ID, product.Description, [special attributes column], product.CategoryID, product.SegmentID, etc.
example result:
65839, "WonderWidget", "HeightInInches: 26 | WeightInLbs: 5 | Color: Black", "Widgets", "Commercial"
Conversely, it would be helpful to figure out how to take the query results, formatted as mentioned above, and push them back into the original Attribute table. For example, if we output the query above into a table where the [special attributes column] was modified (values updated/corrected by a human), it would be nice to know how to use the table containing the [special attributes column] to update the original Attribute table. I think for that to be possible, the Attribute.ID field would need to be included in the query output.
In the end, what I am trying to accomplish is way to export the Product and Attribute data out to 1 row per product with all the attribute data so that it can be reviewed/updated/corrected by a human in something as simple as an Excel file, and then pushed back into SQL. I think I can figure out how to do all of that once I get over the hurdle of figuring out how to get the products and attributes out as one row per product. Perhaps the correct answer is to pivot all of the attributes into columns, but I'm afraid the query would be incredibly wide and wasteful. Open to suggestions for this as well. Changing to a document type database is not an option right now; need to figure out the best way to handle this in relational SQL.
You first need to build the key/value pairs. This can be achieved using a concat operator like ||; you need to think about NULLs as well, since NULL concatenated with NULL is still NULL in most DBs.
SELECT ProductID, Key || ':' || Value as KeyValue FROM AttributeTable
Then you would need to group those using an aggregate function like STRING_AGG (assuming SQL Server 2017 or above). Other databases have different aggregate functions; MySQL, for example, uses GROUP_CONCAT.
https://learn.microsoft.com/en-us/sql/t-sql/functions/string-agg-transact-sql?view=sql-server-2017
https://www.geeksforgeeks.org/mysql-group_concat-function/
SELECT ProductID, STRING_AGG( Key || ':' || Value, '|') AS KeyValue FROM AttributeTable GROUP BY ProductID
I can expand on the answer if you can provide more information.
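For a runnable illustration, here is the same idea in SQLite (via Python), which uses GROUP_CONCAT rather than STRING_AGG; the sample data follows the WonderWidget example above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (ID INTEGER PRIMARY KEY, Description TEXT);
CREATE TABLE Attribute (ID INTEGER PRIMARY KEY, ProductID INTEGER,
                        "Key" TEXT, "Value" TEXT);
INSERT INTO Product VALUES (65839, 'WonderWidget');
INSERT INTO Attribute VALUES
  (1, 65839, 'HeightInInches', '26'),
  (2, 65839, 'WeightInLbs', '5'),
  (3, 65839, 'Color', 'Black');
""")

# One row per product; all key/value pairs collapsed into one column.
row = conn.execute("""
SELECT p.ID, p.Description,
       GROUP_CONCAT("Key" || ': ' || "Value", ' | ') AS Attrs
FROM Product p
JOIN Attribute a ON a.ProductID = p.ID
GROUP BY p.ID, p.Description
""").fetchone()
print(row)
```

Note that the order of pairs inside GROUP_CONCAT is not guaranteed, which matters if a human will be diffing the exported rows.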

select a column according to specific integer value in database string field

I want to select the id in the database table where the allot field contains a specific integer value in its string.
example: In the allot column I want to search for the value 26 in the comma(,) separated string; here the result should be id=72
Fix your data structure! You should be using a junction/association table with one row per value and per id. That is the SQL way to represent the data. Why is your structure bad?
Data should be stored using the appropriate type. Numbers should be stored as numbers, not strings.
Columns should contain one value.
Databases have great data structures for storing lists of values. The best known one is tables. Strings are not the appropriate data structures.
SQL engines have (relatively) poor string processing capabilities.
Operations on strings do not (in almost all cases) take advantage of indexes and other engine optimizations.
If these are ids, then foreign key relationships should be properly declared.
Sometimes, we are stuck with other people's really, really bad design decisions. In those cases, you can use like:
SELECT p.id
FROM Prospects p
WHERE ',' || allot || ',' like '%,26,%';
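A quick runnable check of why the wrapping matters (SQLite via Python, with invented sample rows): without the ',' || allot || ',' wrapping, a plain '%,26,%' pattern would miss values at the start or end of the string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Prospects (id INTEGER, allot TEXT);
INSERT INTO Prospects VALUES
  (72, '26,33,41'),  -- 26 at the start: plain '%,26,%' would miss it
  (73, '126,226'),   -- contains "26" only inside other numbers
  (74, '33,26');     -- 26 at the end
""")

# Wrapping allot in commas delimits every element, including the
# first and last, so one LIKE pattern covers all positions.
ids = [r[0] for r in conn.execute("""
SELECT id FROM Prospects
WHERE ',' || allot || ',' LIKE '%,26,%'
ORDER BY id
""")]
print(ids)  # [72, 74]
```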
Try the command: SELECT id FROM (table_name) WHERE allot LIKE '%,26,%'
The '%x%' pattern will look for anything with an x in it in the provided column -
basically, "if you find something with x in it, give it to me".
Using the LIKE operator you should be able to solve your requirement. Considering that your table name is Prospects:
SELECT id FROM Prospects
WHERE allot LIKE '%,26,%'
EDIT-1: You can narrow down the search finer by adding additional commas in the query as mentioned here!
EDIT-2: To additionally handle values at the very start or end of the string, you can extend the query with a UNION like this. This is not something you should implement as-is; rather, implement a stored procedure that checks these scenarios and handles them in your logic.
SELECT id FROM Prospects
WHERE allot LIKE '%,26,%'
UNION
SELECT id FROM Prospects
WHERE allot LIKE '26,%'
UNION
SELECT id FROM Prospects
WHERE allot LIKE '%,26'
Hope this answers your question!

SQL or statement vs multiple select queries

I have a table with an id and a name.
I'm getting a list of ids and I need their names.
To my knowledge I have two options.
Create a for loop in my code which executes:
SELECT name from table where id=x
where x is always a number.
Or I write a single query like this:
SELECT name from table where id=1 OR id=2 OR id=3
The list of ids and names is enormous, so I don't think you would want that.
The problem with the ids is that an id is not always a number but a randomly generated id containing numbers and characters, so talking about ranges is not a solution.
I'm asking this from a performance point of view.
What's a nice solution for this problem?
SQLite has limits on the size of a query, so if there is no known upper limit on the number of IDs, you cannot use a single query.
When you are reading multiple rows (note: IN (1, 2, 3) is easier than many ORs), you don't know to which ID a name belongs unless you also SELECT that, or sort the results by the ID.
There should be no noticeable difference in performance; SQLite is an embedded database without client/server communication overhead, and the query does not need to be parsed again if you use a prepared statement.
A "nice" solution is using the IN operator:
SELECT name from table where id in (1,2,3)
Also, the IN operator is syntactic sugar built for exactly this purpose.
SELECT name from table where id IN (1,2,3,4,5,6.....)
Assuming that you are given the list of IDs to look up as an input temp table #InputIDTable, you can do:
SELECT name from table WHERE ID IN (SELECT id from #InputIDTable)
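A small sketch of the IN approach with string IDs (SQLite via Python; generating the placeholder list dynamically is one common way to parameterize a variable-length IN clause):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id TEXT PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [("a1", "Alice"), ("b2", "Bob"), ("c3", "Carol")])

# One round trip with IN; the placeholder list is generated to match
# however many IDs we were handed.
wanted = ["a1", "c3"]
marks = ",".join("?" * len(wanted))
rows = conn.execute(
    f"SELECT id, name FROM t WHERE id IN ({marks}) ORDER BY id", wanted
).fetchall()
print(rows)  # [('a1', 'Alice'), ('c3', 'Carol')]
```

Selecting the id alongside the name also addresses the point above about knowing which name belongs to which ID.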

Query - If more than one record ID# (non primary key) matches, use later date or larger primary key

I built an MS Access database that takes survey data to create a custom report. The survey application that was used does not give us the reports we need. I usually grab the data (Excel), import it into Access, and build reports the way we need them.
For the first time, we have people redoing the survey because they are updating something or they forgot to add something. I need to be able to grab the most recent survey's data so we don't get duplicates when we run the report. (My main report is composed of several subreports. Some subreports are not visible if null, and any questions not answered are hidden and shrunk to prevent bulky reports with unnecessary whitespace.)
record ID (PK) | FName | LName | IDNum | Completed
1 | Bob | Smith | 57 | 3/31/2013 5:00pm
2 | Bob | Smith | 57 | 3/31/2013 7:00pm
I want record ID 2 or the one that was completed at 7pm.
The queries and reports are already completed, so I have been trying to find a way to add a line of code in the criteria line of my query to grab the most recent record if the IDNum matches more than one record.
I have been trying to find the best way to do it for the past several hours. I don't think modifying my table to a 'table without duplicates' is an option: after the database is complete, someone less technical will be using it. All they are going to do is import a new Excel file to overwrite the table, and the queries do everything to build the report. I don't want to manually delete the duplicate records either.
I know I need to do something along the lines with
IIF(count(IDNum)>1, *something, *something)
*I get stuck on the true and false part. How do I tell Access that it needs to check within the table again to find the record with the larger primary key?
I thought this was going to be easy but I guess I was wrong. lol
I am fairly new at MS Access, so I know I am not using its full potential and I might be going at this from the wrong angle. Any advice would be appreciated greatly.
I'm a student going into Info Systems, so I would really like to learn how to do this.
I believe the query you are looking for is
SELECT t1.*
FROM YourTable t1 INNER JOIN
(SELECT IDNum, MAX(Completed) AS MaxOfCompleted
FROM YourTable GROUP BY IDNum
) t2
ON t1.IDNum = t2.IDNum AND t1.Completed = t2.MaxOfCompleted;
When you are using an if function, note that it is spelled IIf, not IFF.
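A runnable version of the GROUP BY/MAX join (SQLite via Python rather than Access SQL, using the sample rows from the question plus one extra person):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE YourTable (RecordID INTEGER PRIMARY KEY, FName TEXT,
                        LName TEXT, IDNum INTEGER, Completed TEXT);
INSERT INTO YourTable VALUES
  (1, 'Bob', 'Smith', 57, '2013-03-31 17:00'),
  (2, 'Bob', 'Smith', 57, '2013-03-31 19:00'),
  (3, 'Ann', 'Jones', 58, '2013-03-30 09:00');
""")

# Join each row against the latest Completed per IDNum; only the
# most recent survey for each IDNum survives.
rows = conn.execute("""
SELECT t1.RecordID, t1.IDNum, t1.Completed
FROM YourTable t1
JOIN (SELECT IDNum, MAX(Completed) AS MaxOfCompleted
      FROM YourTable GROUP BY IDNum) t2
  ON t1.IDNum = t2.IDNum AND t1.Completed = t2.MaxOfCompleted
ORDER BY t1.RecordID
""").fetchall()
print([r[0] for r in rows])  # [2, 3]
```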
I'd recommend a correlated subquery, such as the following:
SELECT
Data.RecordID
, Data.FName
, Data.LName
, Data.IDNum
, Data.Completed
FROM
Data
WHERE
Data.Completed IN
(
SELECT TOP 1
DataSQ.Completed
FROM
Data as DataSQ
WHERE
DataSQ.IDNum = Data.IDNum
GROUP BY
DataSQ.Completed
ORDER BY
DataSQ.Completed DESC
)
GROUP BY
Data.RecordID
, Data.FName
, Data.LName
, Data.IDNum
, Data.Completed
;
Explanation
Instead of using a function such as Max or IIF, you can embed another SELECT query within the WHERE clause of your main query. The nested query is used to determine the most recent Completed date for every IDNum. Unlike selecting the most recent survey directly from your table with SELECT TOP 1 + ORDER BY, which would only return one record, the WHERE clause in your nested query refers back to the main query and produces a result for each IDNum. This is known as the Top N per Group pattern, and I've found it to be very useful. Note that in the nested query you will need to use a table name alias so that Access will be able to differentiate between the two queries.
Also, I'd generally recommend against trying to use a table PK to perform sorts. There are many cases when the PK order value will not be a good indicator of the values of related fields.
This code worked when tested on dummy data - best of luck!
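A condensed runnable sketch of the correlated-subquery pattern (SQLite via Python; Access's TOP 1 ... ORDER BY DESC is written as MAX here, which is equivalent for picking the latest date):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Data (RecordID INTEGER PRIMARY KEY, FName TEXT,
                   LName TEXT, IDNum INTEGER, Completed TEXT);
INSERT INTO Data VALUES
  (1, 'Bob', 'Smith', 57, '2013-03-31 17:00'),
  (2, 'Bob', 'Smith', 57, '2013-03-31 19:00');
""")

# The inner query is re-evaluated for each IDNum of the outer query,
# returning that person's latest Completed timestamp.
rows = conn.execute("""
SELECT d.RecordID, d.IDNum, d.Completed
FROM Data d
WHERE d.Completed =
  (SELECT MAX(d2.Completed) FROM Data d2 WHERE d2.IDNum = d.IDNum)
""").fetchall()
print(rows)  # [(2, 57, '2013-03-31 19:00')]
```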