Can I do SQL-style query on an in-memory dataset (or cellarray, or structure, etc) in MATLAB?
Why I ask is, sometimes, I don't want to talk to the database for 1000 times when I want to do different operations on each of the 1000 rows of data. Instead, I'd rather read all 1000 from the database and operate on them in MATLAB.
For example, I have read the following out of from the database:
age first_name last_name income
30 Mike Smith 45
17 David Oxgon 17
22 Osama Lumbermaster 3
Now I want to find out the full names of the people that are under the age of 25. I know how to do it, but is there any syntax as clean and intuitive as SQL like this?
SELECT first_name + ' ' + last_name AS name FROM people WHERE age < income
In the docs page Access Data in a Table (see the example Index Using a Logical Expression) it shows that your examples could be achieved as follows:
MyTable({'first_name','last_name'}, MyTable.age < MyTable.income)
These docs don't specifically explain how to merge the name and surname into one variable but I'm sure it's easy. Give it a try and let us know if you get it.
Related
I have a quick question regarding writing a SQL query to obtain a complete entry from two or more entries where the data is missing in different columns.
This is the example, suppose I have this table:
Client Id | Name | Email
1234 | John | (null)
1244 | (null) | john#example.com
Would it be possible to write a query that would return the following?
Client Id | Name | Email
1234 | John | john#example.com
I am finding this particularly hard because these are 2 entires in the same table.
I apologize if this is trivial, I am still studying SQL and learning, but I wasn't able to come up with a solution for this and I although I've tried looking online I couldn't phrase the question in the proper way, I suppose and I couldn't really find the answer I was after.
Many thanks in advance for the help!
Yes, but actually no.
It is possible to write a query that works with your example data.
But just under the assumption that the first part of the mail is always equal to the name.
SELECT clients.id,clients.name,bclients.email FROM clients
JOIN clients bclients ON upper(clients.name) = upper(substring(bclients.email from 0 for position('#' in bclients.email)));
db<>fiddle
Explanation:
We join the table onto itself, to get the information into one row.
For this we first search for the position of the '#' in the email, get the substring from the start (0) of the string for the amount of characters until we hit the # (result of positon).
To avoid case-problems the name and substring are cast to uppercase for comparsion.
(lowercase would work the same)
The design is flawed
How can a client have multiple ids and different kind of information about the same user at the same time?
I think you want to split the table between clients and users, so that a user can have multiple clients.
I recommend that you read information about database normalization as this provides you with necessary knowledge for successfull database design.
I'm working with a database that has a locations table such as:
locationID | locationHierarchy
1 | 0
2 | 1
3 | 1,2
4 | 1
5 | 1,4
6 | 1,4,5
which makes a tree like this
1
--2
----3
--4
----5
------6
where locationHierarchy is a csv string of the locationIDs of all its ancesters (think of a hierarchy tree). This makes it easy to determine the hierarchy when working toward the top of the tree given a starting locationID.
Now I need to write code to start with an ancestor and recursively find all descendants. MySQL has a function called 'find_in_set' which easily parses a csv string to look for a value. It's nice because I can just say "find in set the value 4" which would give all locations that are descendants of locationID of 4 (including 4 itself).
Unfortunately this is being developed on SQL Server 2014 and it has no such function. The CSV string is a variable length (virtually unlimited levels allowed) and I need a way to find all ancestors of a location.
A lot of what I've found on the internet to mimic the find_in_set function into SQL Server assumes a fixed depth of hierarchy such as 4 levels maximum) which wouldn't work for me.
Does anyone have a stored procedure or anything that I could integrate into a query? I'd really rather not have to pull all records from this table to use code to individually parse the CSV string.
I would imagine searching the locationHierarchy string for locationID% or %,{locationid},% would work but be pretty slow.
I think you want like -- in either database. Something like this:
select l.*
from locations l
where l.locationHierarchy like #LocationHierarchy + ',%';
If you want the original location included, then one method is:
select l.*
from locations l
where l.locationHierarchy + ',' like #LocationHierarchy + ',%';
I should also note that SQL Server has proper support for recursive queries, so it has other options for hierarchies apart from hierarchy trees (which are still a very reasonable solution).
Finally It worked for me..
SELECT * FROM locations WHERE locationHierarchy like CONCAT(#param,',%%') OR
o.unitnumber like CONCAT('%%,',#param,',%%') OR
o.unitnumber like CONCAT('%%,',#param)
I've struggled with this for a while now trying to figure out how to do this most efficiently.
The problem is as follows. I have items in a database to be marketed for specific age groups such as ages 10 to 20 or ages 16+ and I need to be able to make a query like, find item that is for 17 year old
Here are my two best ideas (but I don't like either, as I think they're both inefficient).
Have a csv column with values like 10-20 and 16+ , retrieve the entire list, and parse through it (Bad idea, I know, I'm fresh out of ideas here though)
Have a csv column with values like 10,11,12,13...20 for ranges, so I can look for it using WHERE ages LIKE "%17%", and for cases like 16+ I'd have to retrieve those special cases using something like WHERE ages LIKE "%+%" and parse through those.
I'm of course leaning towards the second option, but in the very best scenario, I'm running two queries one for regular items, and one for things like 16+
Is there a better way? If not, do you think you could make either of my models more efficient? Thanks.
You can do it like this:
Add lower_age and upper_age columns to your table, both integers that allow NULLs.
If lower_age is NULL then there is no lower bound.
If upper_age is NULL then there is no upper bound.
Combine COALESCE and BETWEEN for your queries.
To clarify (4), you want to say things like this:
select *
from your_table
where $n between coalesce(lower_age, $n) and coalesce(upper_age, $n)
where $n is the age you're looking for. BETWEEN uses inclusive bounds so coalesce(lower_age, $n) ignores $n if lower_age is not NULL and gives you $n >= $n (i.e. an automatic true on that bound) if lower_age is NULL; similarly for the upper_age.
If something is suitable for only 11 year olds, then your [lower_age,upper_age] closed interval would be [11, 11], 16+ would be [16, NULL], six and lower would be [NULL, 6], everyone would be [NULL, NULL], and no one would be [23, 11] or anything else with lower_age > upper_age (or, more likely, invalid data that a CHECK constraint would throw a hissy fit over).
You can do this a number of ways. If you store the age of the user(whatever) in the row. Then you can query the age and with > 16 or < 30 or between 10-20 whatever. The other option is to store this as a bitwise. Have a reference table and store your different ranges if they can have multiples then you just add the two row values together.
1 = 10
2 = 16+
4 = 10-20
8 = 20-30
16 = 20+
32 = 30+
.
.
.
.
then in the table that stores the persons info you can set the column to an int or bigint take your preference and then for whatever groups they belong to you can determine this by the number for example:
Table of Users
ID Name BitWise
1 test 2
2 something 6 (2+4)
3 blah 24 (8+16)
However I think that it may be a bit overkill with the bitwise you might be best just storing the age as a number an running queries against that. More than likely this will be the most efficient.
You have a range of options (no pun intended). For age recommendations, the easiest way is to store a min_age and max_age and query like this:
select * from item where :age between min_age and max_age
where you have to decide whether you allow nulls for these columns (then you need to use coalesce() or nvl() or whatever function your database provides for dealing with comparisons with nulls), or set boundary values for these columns where you can be sure :age will always fall in between.
Alternatively, you can use a m:n table
create table item_ages (item_id int not null, age int not null, constraint item_ages_pk primary key (item_id, age)
and fill it with explicit values:
item_id | age
-------------
1 | 16
1 | 17
1 | 18
and so on. This is more cumbersome tha using a range, but also more flexible, and since your database can index the table and probably store that index in memory, queries should be fast. You only have to touch this table when a new item is entered or the age range for a particular item changes.
Note that CBRRacer's answer has similar properties: both share the idea that you prepare a datastructure that can easily be indexed, and answer the filter question from that index. This is a popular method for storing marketing data in ecommerce applications. The extreme end of that range would be to use a dedicated package for storing inverted indexes for that purpose. But for a simple age recommendation that's of course overkill.
Someting like this:
SELECT *
FROM tablename
WHERE 17 BETWEEN start_age AND end_age
I've been beating my head on the desk trying to figure this one out. I have a table that stores job information, and reasons for a job not being completed. The reasons are numeric,01,02,03,etc. You can have two reasons for a pending job. If you select two reasons, they are stored in the same column, separated by a comma. This is an example from the JOBID table:
Job_Number User_Assigned PendingInfo
1 user1 01,02
There is another table named Pending, that stores what those values actually represent. 01=Not enough info, 02=Not enough time, 03=Waiting Review. Example:
Pending_Num PendingWord
01 Not Enough Info
02 Not Enough Time
What I'm trying to do is query the database to give me all the job numbers, users, pendinginfo, and pending reason. I can break out the first value, but can't figure out how to do the second. What my limited skills have so far:
select Job_number,user_assigned,SUBSTRING(pendinginfo,0,3),pendingword
from jobid,pending
where
SUBSTRING(pendinginfo,0,3)=pending.pending_num and
pendinginfo!='00,00' and
pendinginfo!='NULL'
What I would like to see for this example would be:
Job_Number User_Assigned PendingInfo PendingWord PendingInfo PendingWord
1 User1 01 Not Enough Info 02 Not Enough Time
Thanks in advance
You really shouldn't store multiple items in one column if your SQL is ever going to want to process them individually. The "SQL gymnastics" you have to perform in those cases are both ugly hacks and performance degraders.
The ideal solution is to split the individual items into separate columns and, for 3NF, move those columns to a separate table as rows if you really want to do it properly (but baby steps are probably okay if you're sure there will never be more than two reasons in the short-medium term).
Then your queries will be both simpler and faster.
However, if that's not an option, you can use the afore-mentioned SQL gymnastics to do something like:
where find ( ',' |fld| ',', ',02,' ) > 0
assuming your SQL dialect has a string search function (find in this case, but I think charindex for SQLServer).
This will ensure all sub-columns begin and start with a comma (comma plus field plus comma) and look for a specific desired value (with the commas on either side to ensure it's a full sub-column match).
If you can't control what the application puts in that column, I would opt for the DBA solution - DBA solutions are defined as those a DBA has to do to work around the inadequacies of their users :-).
Create two new columns in that table and make an insert/update trigger which will populate them with the two reasons that a user puts into the original column.
Then query those two new columns for specific values rather than trying to split apart the old column.
This means that the cost of splitting is only on row insert/update, not on _every single select`, amortising that cost efficiently.
Still, my answer is to re-do the schema. That will be the best way in the long term in terms of speed, readable queries and maintainability.
I hope you are just maintaining the code and it's not a brand new implementation.
Please consider to use a different approach using a support table like this:
JOBS TABLE
jobID | userID
--------------
1 | user13
2 | user32
3 | user44
--------------
PENDING TABLE
pendingID | pendingText
---------------------------
01 | Not Enough Info
02 | Not Enough Time
---------------------------
JOB_PENDING TABLE
jobID | pendingID
-----------------
1 | 01
1 | 02
2 | 01
3 | 03
3 | 01
-----------------
You can easily query this tables using JOIN or subqueries.
If you need retro-compatibility on your software you can add a view to reach this goal.
I have a tables like:
Events
---------
eventId int
eventTypeIds nvarchar(50)
...
EventTypes
--------------
eventTypeId
Description
...
Each Event can have multiple eventtypes specified.
All I do is write 2 procedures in my site code, not SQL code
One procedure converts the table field (eventTypeIds) value like "3,4,15,6" into a ViewState array, so I can use it any where in code.
This procedure does the opposite it collects any options your checked and converts it in
If changing the schema is an option (which it probably should be) shouldn't you implement a many-to-many relationship here so that you have a bridging table between the two items? That way, you would store the number and its wording in one table, jobs in another, and "failure reasons for jobs" in the bridging table...
Have a look at a similar question I answered here
;WITH Numbers AS
(
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS N
FROM JobId
),
Split AS
(
SELECT JOB_NUMBER, USER_ASSIGNED, SUBSTRING(PENDING_INFO, Numbers.N, CHARINDEX(',', PENDING_INFO + ',', Numbers.N) - Numbers.N) AS PENDING_NUM
FROM JobId
JOIN Numbers ON Numbers.N <= DATALENGTH(PENDING_INFO) + 1
AND SUBSTRING(',' + PENDING_INFO, Numbers.N, 1) = ','
)
SELECT *
FROM Split JOIN Pending ON Split.PENDING_NUM = Pending.PENDING_NUM
The basic idea is that you have to multiply each row as many times as there are PENDING_NUMs. Then, extract the appropriate part of the string
While I agree with DBA perspective not to store multiple values in a single field it is doable, as bellow, practical for application logic and some performance issues. Let say you have 10000 user groups, each having average 1000 members. You may want to have a table user_groups with columns such as groupID and membersID. Your membersID column could be populated like this:
(',10,2001,20003,333,4520,') each number being a memberID, all separated with a comma. Add also a comma at the start and end of the data. Then your select would use like '%,someID,%'.
If you can not change your data ('01,02,03') or similar, let say you want rows containing 01 you still can use " select ... LIKE '01,%' OR '%,01' OR '%,01,%' " which will insure it match if at start, end or inside, while avoiding similar number (ie:101).
I have a SQL server table that has a list of Usernames and a list of points ranging from 1 to 10,000.
I want to display the number of points in my VB.NET program from the username given. For example, if I type in "john" in the first box the program will give me a message box saying whatever amount of points "john" has. I'm not really sure about SQL queries so please help me out here.
This is the Table Structure:
Usernames Points
-----------------------------
John 20
Kate 40
Dan 309
Smith 4958
Depending on the structure of the table, a suitable query is one of these:
select sum(points) as points
from usernames
where name='username';
or
select points
from usernames
where name='username';
or
select count(*) as points
from usernames
where name='username';