SQL query returns different number of rows depending on the columns selected - sql

Thanks for the contributions so far. After more digging I am re-stating the question (and indeed the title of the question) as follows:
I am selecting just 2 columns from a view that contains several columns. The view returns 50,497 rows if I select all columns, but only 50,496 (i.e. 1 fewer) when I select just 2 columns, these being [Patient_ID] (which is a bigint column) and [Condition_Code] (a varchar(6) column).
Version 1:
SELECT * FROM [vw_Query1]
returns 50,497 rows.
But:
SELECT [Patient_ID], [Condition_Code] FROM [vw_Query1]
returns 50,496 rows.
I can post the code for [vw_Query1] if required, but an understanding at a fundamental level how this can happen when no GROUP BY clause has been used is the key question for me.
UPDATE:
It turns out that if I exclude one particular column, I get the lower number of rows of 50,496. This column is unique in having a Case-Sensitive collation. I still dont understand why it is dropping one particular row but at least I am getting closer to an understanding.

Related

Selecting a large number of rows by index using SQL

I am trying to select a number of rows by the value of a column called ID. I know you can do this pretty easily by:
SELECT col1, col2, col3 FROM mytable WHERE id IN (1,2,3,4,5...)
However, what if there are a few million IDs I want to select and the IDs don't always have pattern (which means I can't use something like BETWEEN x AND y)? Does this select statement still work or is there better ways of doing so?
The actual application is this. Filters are specified by users, which is compared to some attributes of the records. From those filters, we create a subset of the data which is of interest to a particular user. There are about 30 million records each with roughly ~3000 attributes (which is stored in roughly 30 tables, but every table has ID as a primary key), so every time someone makes a query about their desired subset of records, we'd have to join many tables, apply those filters, and figure out what his subset looks like. In order to avoid joining many tables all the time, I thought maybe it's a better idea to join the tables once, figure out the id of the selected subset, and this way each time a new query is made, all we have to do is select the relevant columns of the rows that match the filtered ids.
This depends on the database and the interface you are using. For a few hundred or thousand values, no problem. But your question specifies millions. And that could start to get into limits on the length of the query -- either specified by the database, the tool you are using, or intermediate libraries.
If you have so many ids, I would strongly recommend that you load them into a table in the database with the id as the primary key. Then use join or exists to identify the rows in your table that match.
Often, such a list would be generated in the database anyway. In that case, you can use a subquery or CTE and just include that code in your final query.

SQL to identify duplicate columns from table having hundreds of column

I've 250+ columns in customer table. As per my process, there should be only one row per customer however I've found few customers who are having more than one entry in the table
After running distinct on entire table for that customer it still returns two rows for me. I suspect one of column may be suffixed with space / junk from source tables resulting two rows of same information.
select distinct * from ( select * from customer_table where custoemr = '123' ) a;
Above query returns two rows. If you see with naked eye to results there is not difference in any of column.
I can identify which column is causing duplicates if I run query every time for each column with distinct but thinking that would be very manual task for 250+ columns.
This sounds like very dumb question but kind of stuck here. Please suggest if you have any better way to identify this, thank you.
Solving this one-time issue with sql is too much effort. Simply copy-paste to excel, transpose data into columns and use some simple function like "if a==b then 1 else 0".

Update multiple NULL values using the NON-NULL value available in SQL

I have two tables that store two sets of ID's which are the same, when any of the four ID's are NULL there's an issue on the front end application. These four values always vary as to which can be NULL but there will always be one with the correct entry.
My question is can I enter these four values into a temp table then update all the NULL values using the column which has actually has a value? As the column with the correct value changes all the time it makes it harder.
Basically i'm making a stored proc but can't figure this logic out.
It sounds like you just need to use coalesce to find the non-NULL value.
coalesce(table1.col1, table1.col2, table2,col1, table2.col2)
The only caveat is that if two columns have different non-NULL values, then this expression returns the first one (in the order you list the columns) it finds. But if you don't have that situation occur, or if you can specify which column you'd use when it does occur, this should work regardless of what combination of columns has NULL.
Using table expression to join two tables then from the result table, update the column missing data with the ones have.

String Grouping from a single column in Oracle database having million rows and removing duplicates

We have a huge table and one of the column contains queries like e.g. in row 1
1. (((firstname:Adam OR firstname:Neil ) AND lastname:Lee) ) AND category:"Legal" AND type:Individual
and in row 2 of same column
2. (((firstname:Adam* OR firstname:Neil ) AND lastname:Lee) ) AND category:"Legal" AND type:Organization
Similarly there are few other types of Query strings which are used eventually to query external services.
Issue is based on certain criteria I have to group and remove duplicates from this table.
There are few rules to determine grouping of Strings in different rows.One of them is that if first name and lastname are same then ignore category and type values, therefore above two rows will be grouped to one. There are around million rows. Comparing Strings and doing grouping is not looking elegant solution. What could be best possible solution using sql.

How are these tasks done in SQL?

I have a table, and there is no column which stores a field of when the record/row was added. How can I get the latest entry into this table? There would be two cases in this:
Loop through entire table and get the largest ID, if a numeric ID is being used as the identifier. But this would be very inefficient for a large table.
If a random string is being used as the identifier (which is probably very, very bad practise), then this would require more thinking (I personally have no idea other than my first point above).
If I have one field in each row of my table which is numeric, and I want to add it up to get a total (so row 1 has a field which is 3, row 2 has a field which is 7, I want to add all these up and return the total), how would this be done?
Thanks
1) If the id is incremental, "select max(id) as latest from mytable". If a random string was used, there should still be an incremental numeric primary key in addition. Add it. There is no reason not to have one, and databases are optimized to use such a primary key for relations.
2) "select sum(mynumfield) as total from mytable"
for the last thing use a SUM()
SELECT SUM(OrderPrice) AS OrderTotal FROM Orders
assuming they are all in the same column.
Your first question is a bit unclear, but if you want to know when a row was inserted (or updated), then the only way is to record the time when the insert/update occurs. Typically, you use a DEFAULT constraint for inserts and a trigger for updates.
If you want to know the maximum value (which may not necessarily be the last inserted row) then use MAX, as others have said:
SELECT MAX(SomeColumn) FROM dbo.SomeTable
If the column is indexed, MSSQL does not need to read the whole table to answer this query.
For the second question, just do this:
SELECT SUM(SomeColumn) FROM dbo.SomeTable
You might want to look into some SQL books and tutorials to pick up the basic syntax.