I have a quick question regarding writing a SQL query to obtain a complete entry from two or more entries where the data is missing in different columns.
This is the example, suppose I have this table:
Client Id | Name | Email
1234 | John | (null)
1244 | (null) | john#example.com
Would it be possible to write a query that would return the following?
Client Id | Name | Email
1234 | John | john#example.com
I am finding this particularly hard because these are 2 entires in the same table.
I apologize if this is trivial, I am still studying SQL and learning, but I wasn't able to come up with a solution for this and I although I've tried looking online I couldn't phrase the question in the proper way, I suppose and I couldn't really find the answer I was after.
Many thanks in advance for the help!
Yes, but actually no.
It is possible to write a query that works with your example data.
But just under the assumption that the first part of the mail is always equal to the name.
SELECT clients.id,clients.name,bclients.email FROM clients
JOIN clients bclients ON upper(clients.name) = upper(substring(bclients.email from 0 for position('#' in bclients.email)));
db<>fiddle
Explanation:
We join the table onto itself, to get the information into one row.
For this we first search for the position of the '#' in the email, get the substring from the start (0) of the string for the amount of characters until we hit the # (result of positon).
To avoid case-problems the name and substring are cast to uppercase for comparsion.
(lowercase would work the same)
The design is flawed
How can a client have multiple ids and different kind of information about the same user at the same time?
I think you want to split the table between clients and users, so that a user can have multiple clients.
I recommend that you read information about database normalization as this provides you with necessary knowledge for successfull database design.
Related
I haven't been able to figure out exactly how to put together this SQL string. I'd really appreciate it if someone could help me out. I am using Access 2016, so please only provide answers that will work with Access. I have two queries that both have different fields except for one in common. I need to find the minimum absolute difference between the two similar columns. Then, I need to be able to pull the data from that corresponding record. For instance,
qry1.Col1 | qry1.Col2
-----------|-----------
10245.123 | Have
302044.31 | A
qry2.Col1 | qry2.Col2
----------------------
23451.321 | Great
345622.34 | Day
Find minimum absolute difference in a third query, qry3. For instance, Min(Abs(qry1!Col1 - qry2!Col1) I imagine it would produce one of these tables for each value in qry1.Col1. For the value 10245.123,
qry3.Col1
----------
13206.198
335377.217
Since 13206.198 is the minimum absolute difference, I want to pull the record corresponding to that from qry2 and associate it with the data from qry1 (I'm assuming this uses a JOIN). Resulting in a fourth query like this,
qry4.Col1 (qry1.Col1) | qry4.Col2 (qry1.Col2) | qry4.Col3 (qry2.Col2)
----------------------------------------------------------------------
10245.123 | Have | Great
302044.31 | A | Day
If this is all doable in one SQL string, that would be great. If a couple of steps are required, that's okay as well. I just would like to avoid having to time consumingly do this using loops and RecordSet.Findfirst in VBA.
You can use a correlated subquery:
select q1.*,
(select top 1 q2.col2
from qry2 as q2
order by abs(q2.col1 - q1.col1), q2.col2
) as qry2_col2
from qry1 as q1;
I am sure this question has been asked before, but I'm so new to SQL, I can't even combine the correct search terms to find an answer! So, apologies if this is a repetition.
The db I'm creating has to be created at run-time, then the data is entered after creation. Some fields will have a varying number of entries, but the number is unknown at creation time.
I'm struggling to come up with a db design to handle this variation.
As an (anonymised) example, please see below:
| salad_name | salad_type | salad_ingredients | salad_cost |
| apple | fruity | apple | cheap |
| unlikely | meaty | sausages, chorizo | expensive |
| normal | standard | leaves, cucumber, tomatoes | mid |
As you can see, the contents of "salad_ingredients" varies.
My thoughts were:
just enter a single, comma-separated string and separate at run-time. Seems hacky, and couldn't search by salad_ingredients!
have another table, for each salad, such as "apple_ingredients", which could have a varying number of rows for each ingredient. However, I can't do this, because I don't know the salad_name at creation time! :(
Have a separate salad_ingredients table, where each row is a salad_name, and there is an arbitrary number of ingredients fields, say 10, so you could have up to 10 ingredients. Again, seems slightly hacky, as I don't like to unused fields, and what happens if a super-complicated salad comes along?
Is there a solution that I've missed?
Thanks,
Dan
based on my experience the best solution is based on a normalized set of tables
table salads
id
salad_name
salad_type
salad_cost
.
table ingredients
id
name
and
table salad_ingredients
id
id_salad
id_ingredients
where id_salad is the corresponding if from salads
and id_ingredients is the corresponding if from ingredients
using proper join you can get (select) and filter (where) all the values you need
Following join is supposed to retrieve user info along with their messages for users with a certain status:
SELECT * FROM user, message WHERE message.user_id=user.id AND user.status=1
The problem is that all rows about a certain user in the result set contain redundant columns that repeat the same data about that user (those fields retrieved from user table), only fields from the message table contain non-redundant information. Something like this:
user.id username email message.id subject
1 jane jane#gmail.com 120 Notification
1 jane jane#gmail.com 122 Re:Hello
1 jane jane#gmail.com 125 Quotation
2 john john#yahoo.com 127 Hi jane
2 john john#yahoo.com 128 Fix thiss
2 john john#yahoo.com 129 Ok
3 jim jim#msn.com 140 Re:Re:Quotation
As you can see many data are redundant and we do not want to first find the users and then go about their messages in loop like structures or something like that. Loops that cause micro-queries should be avoided at all costs.
I am not concerned about the output of my program, that is well handled in the UI. I think perhaps the network traffic produced by returning the result of this query could be substantially reduced if somehow I can manage to eliminate the repetition of user data in all rows pertaining to that user.
There are several things you should know.
The first is that the default SQL JOIN construct is essentially a set cross product, restricted by the WHERE clause. This means it's multiplicative - you get duplicate results out which you then prune down. You also have to be careful in the presence of NULL fields.
The second is that there is a 'DISTINCT' keyword. When you prefix a column in the selection with this, you'll get at most one instance of a certain value for that column in the results. So, as per your query, 'SELECT DISTINCT user.id FROM' will eliminate the redundancies on the server side.
The third is that the correct way to solve this problem is likely not using the * operator. I suggest:
SELECT user.id,username,email,subject FROM message m,user WHERE m.user_id=user.id AND user.status=1
This uses the simple, easy-to-understand implicit-join syntax and should be valid SQL on whatever server. I can vouch for it working with MySQL, at least. It also aliases the 'message' table to 'm' as shorthand.
As you surmise, this will reduce the traffic from the SQL server to your database.
edit: if you want to eliminate the "redundant" email information, you can't - you must make two distinct queries. SQL results are tables and must be rectangular, with all known values filled. There's no 'ditto' entry.
edit 2: You only have to make two queries. For instance:
SELECT subject FROM message WHERE message.id IN (SELECT user.id FROM user WHERE status=1)
This is one query that contains a nested query, so it's really making two database hits. But it doesn't have any programmatic loops.
In the straight sql query there is not if you keep them as a single query. If you are programmatically printing this out, then you would order by user id and only reprint that information if the user id changes.
In the SQL standard, you would use NATURAL JOIN; this joins on common column names and only preserves one copy of those common names.
In practice, you carefully list the columns you want, rather than resorting to the '*' shorthand notation.
Assuming you can use stored procedure, you could write one to run the above query and then use a cursor to store nulls for 'redundant information' to get something like
user.id username email message.id subject
1 jane jane#gmail.com 120 Notification
null null null 122 Re:Hello
null null null 125 Quotation
2 john john#yahoo.com 127 Hi jane
null null null 128 Fix thiss
null null null 129 Ok
3 jim jim#msn.com 140 Re:Re:Quotation
and then return this resultset in a temporary table. but while this may reduce network traffic, it will add a processing overhead
Another way is to run 2 queries, one to get the user information, and the other to get the message information with just the linked user id and then do the "join" using application server side code. something like
SELECT DISTINCT user.* FROM user, message WHERE message.user_id=user.id AND user.status=1
and
SELECT user.id, message.* FROM user, message WHERE message.user_id=user.id AND user.status=1
which will result in 2 trips to the database, instead of 1, which might eventually be slower, even if the network traffic is reduced.
And another way is to bunch these 2 into a single resultset with something like
SELECT user.* FROM user, message WHERE message.user_id=user.id AND user.status=1
UNION ALL
SELECT user.id, message.* FROM user, message WHERE message.user_id=user.id AND user.status=1
to get something like
user.id username/message.id email/subject
1 jane jane#gmail.com
2 john john#yahoo.com
3 jim jim#msn.com
1 120 Notification
1 122 Re:Hello
1 125 Quotation
2 127 Hi jane
2 128 Fix thiss
2 129 Ok
3 140 Re:Re:Quotation
and then use application server logic to separate it out. reduced network traffic but more application server load / marginally more database server load.
But the saved network traffic is rarely worth the added complexity.
I've been beating my head on the desk trying to figure this one out. I have a table that stores job information, and reasons for a job not being completed. The reasons are numeric,01,02,03,etc. You can have two reasons for a pending job. If you select two reasons, they are stored in the same column, separated by a comma. This is an example from the JOBID table:
Job_Number User_Assigned PendingInfo
1 user1 01,02
There is another table named Pending, that stores what those values actually represent. 01=Not enough info, 02=Not enough time, 03=Waiting Review. Example:
Pending_Num PendingWord
01 Not Enough Info
02 Not Enough Time
What I'm trying to do is query the database to give me all the job numbers, users, pendinginfo, and pending reason. I can break out the first value, but can't figure out how to do the second. What my limited skills have so far:
select Job_number,user_assigned,SUBSTRING(pendinginfo,0,3),pendingword
from jobid,pending
where
SUBSTRING(pendinginfo,0,3)=pending.pending_num and
pendinginfo!='00,00' and
pendinginfo!='NULL'
What I would like to see for this example would be:
Job_Number User_Assigned PendingInfo PendingWord PendingInfo PendingWord
1 User1 01 Not Enough Info 02 Not Enough Time
Thanks in advance
You really shouldn't store multiple items in one column if your SQL is ever going to want to process them individually. The "SQL gymnastics" you have to perform in those cases are both ugly hacks and performance degraders.
The ideal solution is to split the individual items into separate columns and, for 3NF, move those columns to a separate table as rows if you really want to do it properly (but baby steps are probably okay if you're sure there will never be more than two reasons in the short-medium term).
Then your queries will be both simpler and faster.
However, if that's not an option, you can use the afore-mentioned SQL gymnastics to do something like:
where find ( ',' |fld| ',', ',02,' ) > 0
assuming your SQL dialect has a string search function (find in this case, but I think charindex for SQLServer).
This will ensure all sub-columns begin and start with a comma (comma plus field plus comma) and look for a specific desired value (with the commas on either side to ensure it's a full sub-column match).
If you can't control what the application puts in that column, I would opt for the DBA solution - DBA solutions are defined as those a DBA has to do to work around the inadequacies of their users :-).
Create two new columns in that table and make an insert/update trigger which will populate them with the two reasons that a user puts into the original column.
Then query those two new columns for specific values rather than trying to split apart the old column.
This means that the cost of splitting is only on row insert/update, not on _every single select`, amortising that cost efficiently.
Still, my answer is to re-do the schema. That will be the best way in the long term in terms of speed, readable queries and maintainability.
I hope you are just maintaining the code and it's not a brand new implementation.
Please consider to use a different approach using a support table like this:
JOBS TABLE
jobID | userID
--------------
1 | user13
2 | user32
3 | user44
--------------
PENDING TABLE
pendingID | pendingText
---------------------------
01 | Not Enough Info
02 | Not Enough Time
---------------------------
JOB_PENDING TABLE
jobID | pendingID
-----------------
1 | 01
1 | 02
2 | 01
3 | 03
3 | 01
-----------------
You can easily query this tables using JOIN or subqueries.
If you need retro-compatibility on your software you can add a view to reach this goal.
I have a tables like:
Events
---------
eventId int
eventTypeIds nvarchar(50)
...
EventTypes
--------------
eventTypeId
Description
...
Each Event can have multiple eventtypes specified.
All I do is write 2 procedures in my site code, not SQL code
One procedure converts the table field (eventTypeIds) value like "3,4,15,6" into a ViewState array, so I can use it any where in code.
This procedure does the opposite it collects any options your checked and converts it in
If changing the schema is an option (which it probably should be) shouldn't you implement a many-to-many relationship here so that you have a bridging table between the two items? That way, you would store the number and its wording in one table, jobs in another, and "failure reasons for jobs" in the bridging table...
Have a look at a similar question I answered here
;WITH Numbers AS
(
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS N
FROM JobId
),
Split AS
(
SELECT JOB_NUMBER, USER_ASSIGNED, SUBSTRING(PENDING_INFO, Numbers.N, CHARINDEX(',', PENDING_INFO + ',', Numbers.N) - Numbers.N) AS PENDING_NUM
FROM JobId
JOIN Numbers ON Numbers.N <= DATALENGTH(PENDING_INFO) + 1
AND SUBSTRING(',' + PENDING_INFO, Numbers.N, 1) = ','
)
SELECT *
FROM Split JOIN Pending ON Split.PENDING_NUM = Pending.PENDING_NUM
The basic idea is that you have to multiply each row as many times as there are PENDING_NUMs. Then, extract the appropriate part of the string
While I agree with DBA perspective not to store multiple values in a single field it is doable, as bellow, practical for application logic and some performance issues. Let say you have 10000 user groups, each having average 1000 members. You may want to have a table user_groups with columns such as groupID and membersID. Your membersID column could be populated like this:
(',10,2001,20003,333,4520,') each number being a memberID, all separated with a comma. Add also a comma at the start and end of the data. Then your select would use like '%,someID,%'.
If you can not change your data ('01,02,03') or similar, let say you want rows containing 01 you still can use " select ... LIKE '01,%' OR '%,01' OR '%,01,%' " which will insure it match if at start, end or inside, while avoiding similar number (ie:101).
I'm working now for a while on a reporting applications where I use hibernate to define my queries. However, more and more I get the feeling that for reporting use cases this is not the best approach.
The queries only result partial columns, and thus not typed objects
(unless you cast all fields in java).
It is hard to express queries without going straight into sql or
hql.
My current problem is that I want to get the top N per group, for example the last 5 days per element in a group, where on each day I display the amount of visitors.
The result should look like:
| RowName | 1-1-2009 | 2-1-2009 | 3-1-2009 | 4-1-2009 | 5-1-2009
| SomeName| 1 | 42 | 34 | 32 | 35
What is the best approach to transform the data which is stored per day per row to an output like this? Is it time to fall back on regular sql and work with untyped data?
I really want to use typed objects for my results but java makes my life pretty hard for that. Any suggestions are welcome!
Using the Criteria API, you can do this:
Session session = ...;
Criteria criteria = session.createCriteria(MyClass.class);
criteria.setFirstResult(1);
criteria.setMaxResults(5);
... any other criteria ...
List topFive = criteria.list();
To do this in vanilla SQL (and to confirm that Hibernate is doing what you expect) check out this SO post: