How to simulate ifs in a sql query that is not database server dependent? - sql

Given the below table:
|idAsPrimaryKey|Id - it has a diff name, but it is easier like this|column A|
How can I select in a single sql query, not database server specific, something similar to:
List of results = null
for each different id:
    if there is a row for this id that has for column A the value V1
        ListOfResults add this found row
    else
        if there is a row for this id that has for column A the value V2
            ListOfResults add this found row
        else
            add to ListOfResults the first row found for this id

Quite easy. Since you don't seem to know anything about SQL, here's a "teach a man how to fish..." answer.
In SQL you have a set of data and "only" a language for retrieving it; there is nothing to really "program". (Of course there are functions, procedures and so on, but those are used in other circumstances, or the programmer is making things more complicated than necessary.)
Because of this, you have to find a way to combine the data, sometimes even with itself, to get what you want. This blog post explains the basics of joins (that's how you combine tables or data from subqueries): A Visual Explanation of SQL Joins (for critics of this post, please read on...)
With this basic knowledge you should now try to create a query, where you join your table to itself two times. To choose the right value for your ListOfResults you then have to use the COALESCE() function. It returns the first of its parameters which isn't NULL.
Here comes the criticism of the link I posted above. The Venn diagrams used in that post don't represent how much data you get back from a join. To learn about this, read this answer here on SO: sql joins as venn diagram
Okay, now you have learned that you might get more data back than you expect. And here comes another problem, in the wording of your question. There's no "first" row in a relational database; you have to describe exactly which row you want, otherwise the data you get back is worth nothing. You get random data. A solution for both problems is using GROUP BY and (important!) an appropriate aggregate function.
This should be enough info for you to solve the problem. Feel free to ask more questions if anything is unclear.
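One way the pieces described above (two self-joins, COALESCE, GROUP BY with an aggregate) could fit together is sketched here. The table name items, the column names pk, id and col_a, and the sample rows are made up for illustration, and "first row" is assumed to mean "smallest primary key". The SQL itself sticks to standard constructs; Python's sqlite3 module is used only to run it.

```python
import sqlite3

# Hypothetical stand-in for the asker's table: pk = idAsPrimaryKey, id = Id, col_a = column A.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (pk INTEGER PRIMARY KEY, id INTEGER, col_a TEXT);
    INSERT INTO items (pk, id, col_a) VALUES
        (1, 1, 'X'), (2, 1, 'V2'), (3, 1, 'V1'),
        (4, 2, 'Z'), (5, 2, 'V2'),
        (6, 3, 'Q'), (7, 3, 'P');
""")

QUERY = """
    SELECT r.id, r.pk, r.col_a
    FROM items r
    JOIN (
        -- One row per id: prefer a V1 row, then a V2 row, then the smallest pk.
        SELECT t.id AS id,
               COALESCE(MIN(v1.pk), MIN(v2.pk), MIN(t.pk)) AS chosen_pk
        FROM items t
        LEFT JOIN items v1 ON v1.id = t.id AND v1.col_a = 'V1'
        LEFT JOIN items v2 ON v2.id = t.id AND v2.col_a = 'V2'
        GROUP BY t.id
    ) pick ON pick.chosen_pk = r.pk
    ORDER BY r.id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The self-joins multiply rows per id, which is exactly the "more data than expected" effect; MIN() inside GROUP BY collapses them again and makes the choice of row deterministic.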

Related

Replacing Cursors in SQL Script?

I have created an SQL script that runs really fast and is doing everything I want. It uses a cursor to loop through the parent record and then does some calculations on each and then outputs the results into a temporary table. I have another cursor in that one to extract all the children records of that parent and again does some work and puts it into a temporary table.
My senior Dev is saying cursors are awful and I should do it another way, but also doesn't tell me what a better way is.
So how do I loop through records and do steps of calculations and create an output for each record without using a cursor?
I'm sorry, but because it is work product and the script is so large, I can't post its code. The format of it is:
cursor loops through table that holds parent records
For each parent record it takes field values and does conversions from strings to time.
Those conversions are then used in between statements to figure out if a time falls between the 2 field times
An insert statement with the output is put into a temp table and summed at the end.
Another cursor is created in the parent cursor to pull child records of the parent record from another table. The same process as the parent happens.
I'm not actually upset with my script. It's working as intended and it's running very quickly so far, but I am open to better practices if possible.
First of all, I hope you're aware that SQL Server 2008 has been out of support, even with SP4, for a few months now.
Second, as others already said, your Senior DBA is right about cursors. And if your code is too big to post it here, it probably is too time consuming for him to go through it, understand your code and then change it for you. I would expect a senior to give some hints on what to search for, though.
About your question, I find it very hard to think of an answer because your description only gives me a vague idea what you're trying to accomplish. E.g., what are the "field values" that you convert to time?
In my experience, SQL Server does a pretty good job of interpreting datetime strings. You may also find cast/convert and datepart useful.
As far as I understood your parent/child table, you'd probably want to use a table join here. They are well explained here: https://www.w3schools.com/sql/sql_join.asp
You can then aggregate the result set with sum(). But again, my understanding of your endeavour is too vague.
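Since the real script was not posted, the following is only a guess at the shape of a set-based replacement: one join plus one SUM() instead of nested cursors. The tables parents and children, their columns, and the sample data are all hypothetical. Times are stored as zero-padded 'HH:MM' text so that BETWEEN compares them correctly in SQLite; in SQL Server you would more likely cast/convert the strings to a time type first.

```python
import sqlite3

# Hypothetical parent/child tables; the real schema was not shown in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parents (parent_id INTEGER PRIMARY KEY, start_time TEXT, end_time TEXT);
    CREATE TABLE children (child_id INTEGER PRIMARY KEY, parent_id INTEGER,
                           event_time TEXT, amount INTEGER);
    INSERT INTO parents VALUES (1, '08:00', '12:00'), (2, '13:00', '17:00');
    INSERT INTO children VALUES
        (1, 1, '09:30', 10), (2, 1, '12:30', 5), (3, 1, '11:00', 7),
        (4, 2, '14:00', 3),  (5, 2, '18:00', 9);
""")

QUERY = """
    -- For every parent, sum the child amounts whose time falls inside the parent's window.
    SELECT p.parent_id,
           SUM(CASE WHEN c.event_time BETWEEN p.start_time AND p.end_time
                    THEN c.amount ELSE 0 END) AS total_in_window
    FROM parents p
    LEFT JOIN children c ON c.parent_id = p.parent_id
    GROUP BY p.parent_id
    ORDER BY p.parent_id
"""

totals = conn.execute(QUERY).fetchall()
print(totals)
```

The loop over parents and the inner loop over children both disappear: the join pairs the rows and GROUP BY produces one output row per parent.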

I'm being asked to create IN queries for different GUIDs...huh?

I'm a GIS intern.
I've been asked:
"Could you also create IN queries for the different sets of GUID’s? Here is an example:
"GlobalID" IN '{58BEE03F-1656-4BD5-B53D-B887E93A5287}', '{009C7364-8D77-46B3-A531-B60ED4E5B407}', '{0105263C-1305-4AB9-A00A-4BED01832177}')"
I'm not sure what that means or why I'd have to do it. What I can tell you is that I have several .shp that I have geocoded and then created global IDs for.
I've googled this for hours now and am no closer to understanding the request than I was. It could be that the answer is staring me in the face but I don't think I know enough to know that.
Thank you,
Kathy
In order to create and understand IN queries, first you'll have to understand the basics of a query. It sounds like this might not be something you're familiar with, so I'll start with that.
There are 3 main parts to a query: SELECT, FROM, and WHERE.
SELECT is the information (or columns) you want to return. You can SELECT * to select all columns or SELECT specificColumn1, specificColumn2 to select specific columns.
The next step is the FROM statement. FROM determines what table(s) you will be querying. You can query multiple tables here if you like, and tables can also be aliased like so: FROM table1 t1.
The third statement is the WHERE statement, which specifies any conditions that the query is required to meet. In your case, this is where your IN statement will go. There are a ton of different keywords you can use here, but I'll just give a quick sample query for you (keep in mind I have no idea what your schema looks like).
SELECT *
FROM GUIDData
WHERE GlobalID IN ('{58BEE03F-1656-4BD5-B53D-B887E93A5287}', '{009C7364-8D77-46B3-A531-B60ED4E5B407}', '{0105263C-1305-4AB9-A00A-4BED01832177}');
This query will give you all the data for each item in the GUIDData table with a global ID of {58BEE03F-1656-4BD5-B53D-B887E93A5287}, {009C7364-8D77-46B3-A531-B60ED4E5B407}, or {0105263C-1305-4AB9-A00A-4BED01832177}.
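If you have to produce such IN expressions for several sets of GUIDs, a small helper can assemble the text for you. This is only a sketch: the function name build_in_clause is made up, and it assumes the expression is going to be pasted into a tool as text rather than sent to a database from code (in application code, bound parameters would be the safer choice).

```python
def build_in_clause(field, guids):
    """Return text such as: "GlobalID" IN ('{...}', '{...}')."""
    if not guids:
        raise ValueError("need at least one GUID")
    # Single quotes inside a value are doubled, as SQL string literals require.
    quoted = ", ".join("'" + g.replace("'", "''") + "'" for g in guids)
    return f'"{field}" IN ({quoted})'


clause = build_in_clause("GlobalID", [
    "{58BEE03F-1656-4BD5-B53D-B887E93A5287}",
    "{009C7364-8D77-46B3-A531-B60ED4E5B407}",
])
print(clause)
```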
Did this help?

Determine if a SQL Insert/Update statement affects the result from a stored Select Statement

Thought this would be a good place to ask for some "brainstorming." Apologies if it's a little broad/off subject.
I was wondering if anyone here had any ideas on how to approach the following problem:
First assume that I have a select statement stored somewhere as an object (this can be the tree form of the query). For example (for simplicity):
SELECT A, B FROM table_A WHERE A > 10;
It's easy to determine the below would change the result of the above query:
INSERT INTO table_A (A,B) VALUES (12,15);
But, given any possible Insert/Update/Whatever statement, as well as any possible starting Select (but we know the Selects and can analyze them all day) I'd like to determine if it would affect the result of the Select Statement.
It's fine to assume that there won't be any "outside" queries, and that we know about all the queries being sent to the DB. It is also assumed we know the DB schema.
No, this isn't for homework. Just a brain teaser I've been thinking about and started to get stuck on (obviously, SQL can get very complicated.)
Based on the reply to the comment, I'd say that without additional criteria, this ranges between very hard and impossible.
Very hard (leastways, it would be for me) because you'd have to write something to parse and interpret your SQL statements into a workable frame of reference for your goals. Doable, but can it be worth the effort?
Impossible because some queries transcend phrases like "Byzantinely complex". (Think nested queries, correlated subqueries, views, common table expressions, triggers, outer joins, and who knows what all.) Without setting criteria such as "no subqueries, no views or triggers, no more than X joins" and so forth, the problem becomes open-ended enough to warrant an NP Complete answer.
My first thought would be to put a trigger on table_A, where if any of the columns you're affecting (col A in this case) changes to meet (or no longer meet) the condition (> 10 here), then the trigger records that an "affecting" change has taken place.
E.g. have another little table to record a "last update timestamp", which the trigger could pop a getdate() into when it detects such a change.
Then, you could check that table to see if the timestamp has changed since the last time you ran the select query - if it has, then you know you need to re-run it, if it hasn't, then you know the results would be the same.
The table could hold many such timestamps (one per row, perhaps with the table/trigger name as a key value in another column) to service many such triggers.
Advantage? Being done in a trigger on the table means no risk of a change that could affect the select statement being missed.
Disadvantage? I guess depending on how your select statements come into existence, you might have an undesirable/unmanageable overhead in creating the trigger(s).
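A minimal sketch of the trigger idea for the example query (A > 10) might look like this. It uses SQLite through Python, so CURRENT_TIMESTAMP stands in for getdate(); the table query_watch, the key 'a_gt_10' and the change_count column are made up for illustration (the counter is an addition of this sketch, because it is easier to check than a timestamp).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_A (A INTEGER, B INTEGER);

    -- One row per watched SELECT; names here are hypothetical.
    CREATE TABLE query_watch (query_name TEXT PRIMARY KEY,
                              change_count INTEGER NOT NULL,
                              last_change TEXT);
    INSERT INTO query_watch VALUES ('a_gt_10', 0, NULL);

    CREATE TRIGGER watch_insert AFTER INSERT ON table_A
    WHEN NEW.A > 10
    BEGIN
        UPDATE query_watch
        SET change_count = change_count + 1, last_change = CURRENT_TIMESTAMP
        WHERE query_name = 'a_gt_10';
    END;

    -- Fires if the row matched before or matches after the update.
    CREATE TRIGGER watch_update AFTER UPDATE ON table_A
    WHEN OLD.A > 10 OR NEW.A > 10
    BEGIN
        UPDATE query_watch
        SET change_count = change_count + 1, last_change = CURRENT_TIMESTAMP
        WHERE query_name = 'a_gt_10';
    END;

    CREATE TRIGGER watch_delete AFTER DELETE ON table_A
    WHEN OLD.A > 10
    BEGIN
        UPDATE query_watch
        SET change_count = change_count + 1, last_change = CURRENT_TIMESTAMP
        WHERE query_name = 'a_gt_10';
    END;
""")


def change_count():
    sql = "SELECT change_count FROM query_watch WHERE query_name = 'a_gt_10'"
    return conn.execute(sql).fetchone()[0]


conn.execute("INSERT INTO table_A (A, B) VALUES (12, 15)")  # matches A > 10
after_relevant_insert = change_count()
conn.execute("INSERT INTO table_A (A, B) VALUES (5, 1)")    # does not match
after_irrelevant_insert = change_count()
conn.execute("UPDATE table_A SET A = 20 WHERE A = 5")       # row starts matching
after_update = change_count()
print(after_relevant_insert, after_irrelevant_insert, after_update)
```

The update trigger is deliberately conservative: it also fires when a matching row is rewritten with identical values, so it can report a change that does not alter the result, but it should not miss one for this simple predicate.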

How would you give the user a preference for how a name from an SQL table is to be printed?

I've been given a task by a prospective employer which involves SQL tables. One requirement they mentioned is that they want the name retrieved from a table called "Employees" to come in the format of either "<LastName>, <FirstName>" OR "<FirstName> <MiddleName> <LastName> <Suffix>".
This is confusing to me because it sounds like they're asking me to write a function or something similar. I could probably do this in a programming language and retrieve the information that way, but doing it exclusively in SQL seems weird to me. I'm rather new to SQL, and my familiarity with it doesn't exceed simple tasks such as creating databases, tables and fields, inserting data into fields, updating fields in records, deleting records that meet a specific condition, and selecting fields from tables.
I hope that this isn't considered cheating since I mentioned that this was for a prospective employer, but if I was still in school then I could just outright ask a professor where I can find a clue for this or he would've outright told me in class. But, for a prospective job, I'm not sure who I would ask about any confusion. Thanks in advance for anyone's help.
A SQL query has a fixed column output: you can't change it. To achieve this, you could concatenate the columns inside a CASE expression to produce one varchar column, but then you need something (a parameter) to switch the CASE.
So, this is presentation, not querying SQL.
I'd return all 4 columns mentioned and decide how I want them in the client.
Unless you have just been asked for 2 different queries on the same SQL table.
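The "return all 4 columns and decide in the client" approach could look roughly like this. The column names are taken from the formats in the question; the sample rows, the function format_name and the style names "last_first" and "full" are made up for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (FirstName TEXT, MiddleName TEXT, LastName TEXT, Suffix TEXT);
    INSERT INTO Employees VALUES
        ('Ada', NULL, 'Lovelace', NULL),
        ('Martin', 'Luther', 'King', 'Jr.');
""")


def format_name(first, middle, last, suffix, style):
    """Format one employee name according to the user's preferred style."""
    if style == "last_first":
        return f"{last}, {first}"
    if style == "full":
        # Skip missing (NULL/empty) parts so no stray spaces appear.
        return " ".join(part for part in (first, middle, last, suffix) if part)
    raise ValueError(f"unknown style: {style}")


rows = conn.execute(
    "SELECT FirstName, MiddleName, LastName, Suffix FROM Employees ORDER BY LastName"
).fetchall()

last_first = [format_name(*row, style="last_first") for row in rows]
full = [format_name(*row, style="full") for row in rows]
print(last_first)
print(full)
```

The query stays the same for both formats; only the presentation step changes, which is the point made above.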
You haven't specified the RDBMS, but in SQL Server you could accomplish this using Computed Columns.
Typically, you would use a View over the table.

Strategy for avoiding a common sql development error (misleading result on join bug)

When I'm writing moderately complex SELECT statements with a few JOINs, I sometimes use the wrong key columns in a JOIN and the statement still returns valid-looking results.
Because the auto-numbering values (especially early in development) all tend to fall in similar ranges (sub 100s or so), the SELECT still produces some results. These results often look valid at first glance, and the problem is not detected until much, much later, which makes debugging much more difficult because familiarity with the data structures and code has gone stale in the dev's mind.
I just spent several hours tracking down yet another instance of this issue, which I've run into too many times before. I name my tables and columns carefully and write my SQL statements methodically, but this is an issue I can't seem to completely avoid. It comes back and bites me for hours of productivity about twice a year on average.
My question is: Has anyone come up with a clever method for avoiding what I assume is a common SQL bug/mistake?
I have thought of trying to auto-number starting with different start values, but this feels kludgy and would get ugly trying to keep such a scheme straight for data models with dozens of tables... Any better ideas?
P.S.
I am very careful and methodical in naming my tables and columns. The Patient table gets a PatientId column, Facility gets a FacilityId, etc. This issue tends to arise when there are join tables involved where the linkage takes on extra meaning, such as: RelatedPatientId, ReferingPatientId, FavoriteItemId etc.
When writing long complex SELECT statements try to limit the result to one record.
For instance, assume you have this gigantic enormous awesome CMS system and you have to write internal reports because the reports that come with it are horrendous. You notice that there are about 500 tables. Your select statement joins 30 of these tables. Your result should limit your row count by using a WHERE clause.
My advice is, rather than getting all this code written and generalized for all cases, to break the problem up and use WHERE to limit the row count to, say, a single record. Check all fields; if they look OK, loosen the restriction and let your code return more rows. Only after further checking should you generalize.
It bites a lot of us who keep adding more and more joins until it seems to look ok, but only after Joe Blow the accountant runs the report does he realize that the PO for 4 million was really the telephone bill for the entire year. Somehow that join got messed up!
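To make the "limit it to one record and check" advice concrete, here is a small sketch. The Patient and Referral tables, their rows and the helper referral_ids are invented for illustration; ReferingPatientId is the kind of "extra meaning" column mentioned in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Patient (PatientId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Referral (ReferralId INTEGER PRIMARY KEY,
                           PatientId INTEGER, ReferingPatientId INTEGER);
    INSERT INTO Patient VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Cara');
    INSERT INTO Referral VALUES (1, 1, 2), (2, 1, 3), (3, 2, 1);
""")


def referral_ids(join_column, patient_id):
    """Referral ids for one patient, joining Referral on the given column."""
    # join_column is interpolated only because it is a fixed identifier in this sketch.
    sql = f"""
        SELECT r.ReferralId
        FROM Patient p
        JOIN Referral r ON r.{join_column} = p.PatientId
        WHERE p.PatientId = ?
        ORDER BY r.ReferralId
    """
    return [row[0] for row in conn.execute(sql, (patient_id,))]


# Restrict to one patient whose referrals you know by heart, then compare.
intended = referral_ids("PatientId", 1)          # the join that was meant
mistaken = referral_ids("ReferingPatientId", 1)  # wrong key, still returns rows
print(intended, mistaken)
```

Both joins return plausible-looking rows because the small auto-number values overlap; only the comparison against a single, known record exposes the wrong key.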
One option would be to use your natural keys.
More practically, Red Gate SQL Prompt picks the FK columns for me.
I also tend to build up one JOIN at a time to see how things look.
If you have a visualization or diagramming tool for your SQL statements, you can follow the joins visually, and any errors will become immediately apparent, provided you have followed a sensible naming scheme for your primary and foreign keys.
Your column names should take care of this unless you named them all "ID". Are you writing multiple select statements using the same tables? You may want to create views for the more common ones.
If you're using SQL Server, you can use GUID columns as primary keys (that's what we do). You won't have problems with collisions again.
You could use GUIDs as your primary keys, but it has its pros and cons.
This pro is actually not mentioned on that page.
I have never tried doing this myself - I use a tool on top of SQL that makes incorrect joins very unlikely, so I don't have this problem. I just thought I'd mention it as another option though!
For IDs use TableNameID, for example for table Person, use PersonID
Use db model and look at the drawing when writing queries.
This way join looks like:
... ON p.PersonID = d.PersonID
as opposed to:
... ON p.ID = d.ID
Auto-increment integer PKs are among your best friends.