How to manipulate a column selected by * in SQLite?

I want a query to return all rows and all columns with one caveat: if, in a given row, colN is null, then instead return the string 'FOO'.
Why don't I just use SELECT col1, col2, ..., COALESCE(colN, 'FOO')?
I am implementing an abstract interface, so I am required to use queries that SELECT * (because I cannot make assumptions about which columns there are). I can only assume that one column exists: colN.
What would this provide me?
I need this because this query is used in combination with a UNION and this allows me to keep track of the origin of the data.
Any ideas on how to do this?

One thing you could do is
SELECT *, COALESCE(colN, 'FOO') as CoalescedColN
if it's possible to adjust the other select(s) in the UNION accordingly
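As a rough illustration of the suggestion above, here is a minimal sketch using Python's built-in sqlite3 module. The table name t and the column col1 are made up for the demo; only colN comes from the question. It shows that SQLite accepts an extra expression after * and appends it as an additional column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the query itself only assumes that colN exists.
conn.execute("CREATE TABLE t (col1 INTEGER, colN TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, None)])

cur = conn.execute(
    "SELECT *, COALESCE(colN, 'FOO') AS CoalescedColN FROM t ORDER BY col1"
)
columns = [d[0] for d in cur.description]  # names of the returned columns
rows = cur.fetchall()
print(columns)
print(rows)
```

Note that the original colN is still returned with its NULL; the substituted value lives in the extra column, which is why the other SELECTs in the UNION would need a matching extra column.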

I don't know if SQLite can use this technique, but this is what I would do in most other databases:
select * from
(SELECT col1, col2, ..., COALESCE(colN, 'FOO') AS colN from table ) a

Related

Find duplicate records in a table where all columns are the same

Say I have a table with hundreds of columns. I want to find duplicate records in which all the columns are the same; basically, find identical records.
I tried group by as the following
select *
from some_table
group by *
having count(*) > 1
but it seems like group by * is not allowed in SQL. Does anyone have an idea what kind of command I could run to find identical records? Thanks in advance.
Just put a comma-separated list of columns instead of * in both places: select and group by. But not in count: the count(*) should remain as is.
I verified it on SQL Server, but I am pretty sure it is ANSI SQL and should work on most (any?) ANSI SQL compatible RDBMS.
A PostgreSQL solution, I think.
Select all rows, then use EXCEPT ALL to remove one copy of each distinct row (the SELECT DISTINCT). Only the extra copies of duplicated rows remain.
select * from table
except all
select distinct * from table
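SQLite does not support EXCEPT ALL, so the query above cannot be run there directly. The following is only a sketch of its multiset semantics, emulated in Python with collections.Counter on made-up sample rows: removing one copy of each distinct row leaves the surplus copies.

```python
from collections import Counter

# Made-up sample rows standing in for "select * from table".
rows = [(1, "x"), (1, "x"), (1, "x"), (2, "y"), (3, "z"), (3, "z")]

all_rows = Counter(rows)          # multiset of every row
one_of_each = Counter(set(rows))  # one copy per distinct row (SELECT DISTINCT)

# Counter subtraction drops entries whose count falls to zero,
# mirroring what EXCEPT ALL does with row multiplicities.
remaining = all_rows - one_of_each
extra_copies = sorted(remaining.elements())
print(extra_copies)
```

A row that occurs three times shows up twice in the result, so the output lists surplus copies rather than one line per duplicated row.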
You have to list out all the columns:
select col1, col2, col3, . . .
from t
group by col1, col2, col3, . . .
having count(*) > 1;
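With hundreds of columns, typing the list by hand is tedious, so one option is to generate it. This is a minimal sketch for SQLite via Python's sqlite3 module, assuming a small made-up table named some_table; it reads the column names from PRAGMA table_info and builds the GROUP BY query from them. The extra COUNT(*) in the select list is an addition for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (a INTEGER, b TEXT, c REAL)")
conn.executemany(
    "INSERT INTO some_table VALUES (?, ?, ?)",
    [(1, "x", 1.5), (1, "x", 1.5), (2, "y", 2.5), (3, "z", None), (3, "z", None)],
)

# PRAGMA table_info returns one row per column; index 1 holds the column name.
names = [row[1] for row in conn.execute("PRAGMA table_info(some_table)")]
# Quote each identifier so unusual column names do not break the query.
col_list = ", ".join('"' + n.replace('"', '""') + '"' for n in names)

sql = (
    f"SELECT {col_list}, COUNT(*) FROM some_table "
    f"GROUP BY {col_list} HAVING COUNT(*) > 1 ORDER BY {col_list}"
)
duplicates = conn.execute(sql).fetchall()
print(duplicates)
```

GROUP BY places NULLs in the same group, so rows that match everywhere including their NULL columns are reported as duplicates too.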
MSSQL 2016+
Add a new column to the table that hashes all the columns, using MSSQL HASHBYTES.
Notes to consider:
You need to convert all the columns to varchar or varbinary.
Decide whether your comparison is case sensitive; if it should not be, normalize with upper() or lower().
Handle NULL values, and use a column separator.
Consider the performance of the hashing algorithm on the server.
I usually go for something like:
select col1 , col2, col3 , col4
,HASHBYTES ( 'MD5',
concat(
Convert (varbinary ,col1),'|'
,Convert (varbinary ,col2),'|'
,Convert (varbinary ,col3),'|'
,Convert (varbinary ,col4),'|'
)
) as Row_Hash
from table1
The row_hash can then be used as a single column in the table/CTE to represent the content of all the other columns.
You can count by it and order by it to find the duplicates.
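The same idea can be sketched outside SQL Server. The example below is an illustration only, not the HASHBYTES query above: it registers a hypothetical row_hash function in SQLite from Python, uses a separator between columns and a marker for NULL, and groups by the hash. The table contents are made up.

```python
import hashlib
import sqlite3

def row_hash(*values):
    # A distinct marker for NULL keeps NULL apart from the empty string.
    # The separator keeps ('ab', 'c') apart from ('a', 'bc'); values that
    # themselves contain the separator could still collide.
    parts = ["\x00NULL" if v is None else str(v) for v in values]
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()

conn = sqlite3.connect(":memory:")
conn.create_function("row_hash", -1, row_hash)  # -1 = any number of arguments
conn.execute("CREATE TABLE table1 (col1 INTEGER, col2 TEXT, col3 TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, "a", None), (1, "a", None), (1, "a", ""), (2, "b", "c")],
)

result = conn.execute(
    "SELECT row_hash(col1, col2, col3) AS h, COUNT(*) "
    "FROM table1 GROUP BY h HAVING COUNT(*) > 1"
).fetchall()
dup_counts = [count for _, count in result]
print(dup_counts)
```

Because every value is converted with str(), the integer 1 and the text '1' hash identically here, which is the same kind of type-flattening trade-off as converting all columns to one type before hashing.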

Adding a Records Count row to top of query results in SQL

I want a row count above my query results. I found an article that suggested using sections but the summary and select query do not have matching columns/data types.
Ex.
Total Records 25
Col1 Col2 Col3...
XXXX XXXX XXXX
Here is the example from the suggestion I found, but my columns and data types do not match between the two queries:
SELECT * FROM (SELECT [Section]=2, Col1, Col2, ..., Value1, Value2
FROM #TEMP
UNION ALL
SELECT [Section]=1, 'Total', '----', ..., SUM(Value1), SUM(Value2)
FROM #TEMP
) AS T
ORDER BY [Section], Col1, ...
To be polite, you are not using the tool the way it is meant to be used. The contents of those columns are strongly typed. Each one contains strings, dates, numbers, etc, and you're adding another row with strings on top.
The only way I can see this working is if you were to convert all of your columns to VARCHAR columns and cast all of your data to VARCHAR(MAX) in the query.
Otherwise, I think that the most reasonable solution would be to perform a second query for the totals.
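A minimal sketch of the two-query approach, written against SQLite through Python's sqlite3 module purely for illustration. The table temp_data and its columns are made up. The count is fetched separately and combined with the detail rows in the client, so no column has to change type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp_data (col1 TEXT, value1 INTEGER)")
conn.executemany(
    "INSERT INTO temp_data VALUES (?, ?)", [("a", 10), ("b", 20), ("c", 30)]
)

# Query 1: the summary. Query 2: the detail rows, with their own types intact.
total = conn.execute("SELECT COUNT(*) FROM temp_data").fetchone()[0]
rows = conn.execute("SELECT col1, value1 FROM temp_data ORDER BY col1").fetchall()

# Presentation happens in the client, not in the result set.
lines = [f"Total Records {total}"] + [f"{c} {v}" for c, v in rows]
print("\n".join(lines))
```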

Row_number() function for Informix

Does Informix have a function similar to SQL Server's and Oracle's row_number()?
I have to make a query using row_number() between two values, but I don't know how.
This is my query in SQLServer:
SELECT col1, col2
FROM (SELECT col1, col2, ROW_NUMBER()
OVER (ORDER BY col1) AS ROWNUM FROM table) AS TB
WHERE TB.ROWNUM BETWEEN value1 AND value2
Some help?
If, as it appears, you are seeking to get first rows 1-100, then rows 101-200, and so on, then you can use a more direct (but non-standard) syntax. Other DBMS have analogous notations, handled somewhat differently.
To fetch rows 101-200:
SELECT SKIP 100 FIRST 100 t.*
FROM Table AS T
WHERE ...other criteria...
You can use a host variable in place of either literal 100 (or a single prepared statement with different values for the placeholders on different iterations).
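The Informix syntax above cannot be run without an Informix server, so here is a sketch of the analogous notation in SQLite (LIMIT ... OFFSET ...), driven from Python's sqlite3 module. The ? placeholders play the role of the host variables; the table t, its id column, and the helper fetch_page are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1, 251)])

def fetch_page(conn, skip, first):
    # One statement text, different placeholder values on each iteration.
    cur = conn.execute(
        "SELECT id FROM t ORDER BY id LIMIT ? OFFSET ?", (first, skip)
    )
    return [row[0] for row in cur]

page2 = fetch_page(conn, 100, 100)  # rows 101-200
page3 = fetch_page(conn, 200, 100)  # only 50 rows are left
print(page2[0], page2[-1], len(page3))
```

An explicit ORDER BY matters for paging like this; without it, the rows that land in each window are not guaranteed to be stable between calls.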

Is there a difference between DISTINCT colname and DISTINCT(colname)?

I've seen both versions around. On iSeries DB2 you can use either and as far as I can tell they do the same thing. Is there a difference?
No, there is no difference because DISTINCT is a keyword and not a function call.
It's the same difference as between SOME_COLUMN and (SOME_COLUMN) (without any keyword in front)
If you have only one column in your select, then there is no difference.
However, when you use distinct with several columns, as in
select distinct col1, col2, col3 from table
it applies distinct to the whole tuple (col1, col2, col3).
In short, there is no difference between writing select distinct col and select distinct(col).
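A quick way to check this is to run both spellings against the same data. The sketch below uses SQLite through Python's sqlite3 module rather than DB2, with a made-up table t, and shows that putting parentheses around the first column does not restrict DISTINCT to that column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)", [(1, "a"), (1, "a"), (1, "b"), (2, "a")]
)

plain = conn.execute(
    "SELECT DISTINCT col1, col2 FROM t ORDER BY col1, col2"
).fetchall()
# The parentheses only wrap the expression col1; DISTINCT still
# applies to the whole (col1, col2) tuple.
parens = conn.execute(
    "SELECT DISTINCT (col1), col2 FROM t ORDER BY col1, col2"
).fetchall()
print(plain == parens, plain)
```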

Where clause on a column that's a result of a UDF

I have a user defined function (e.g. myUDF(a,b)) that returns an integer.
I am trying to ensure this function will be called only once and its results can be used as a condition in the WHERE clause:
SELECT col1, col2, col3,
myUDF(col1,col2) AS X
From myTable
WHERE x>0
SQL Server tries to detect x as column, but it's really an alias for a computed value.
How can you re-write this query so that the filtering can be done on the computed value without having to execute the UDF more than once?
With Tbl AS
(SELECT col1, col2, col3, myUDF(col1,col2) AS X
From myTable )
SELECT * FROM Tbl WHERE X > 0
If you are using SQL Server 2005 and beyond, you can use Cross Apply:
Select T.col1, T.col2, FuncResult.X
From myTable As T
Cross Apply ( Select myUdf(T.col1, T.col2) As X ) As FuncResult
Where FuncResult.X > 0
try
SELECT col1, col2, col3, dbo.myUDF(col1,col2) AS X
From myTable
WHERE dbo.myUDF(col1,col2) >0
but be aware that this will cause a scan since it is not SARGable
Here is another way
select * from(
SELECT col1, col2, col3, dbo.myUDF(col1,col2) AS X
From myTable ) as y
WHERE x>0
SQL Server does not allow you to reference a column alias in the WHERE clause of the same SELECT. You either have to write out the expression twice:
SELECT col1, col2, col3, myUDF(col1,col2) AS X
From myTable
WHERE myUDF(col1,col2) > 0
Or use a subquery:
SELECT *
FROM (
SELECT col1, col2, col3, myUDF(col1,col2) AS X
From myTable
) as subq
WHERE x > 0
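To see the subquery shape in action without a SQL Server instance, here is a sketch in SQLite via Python's sqlite3 module. The function body (a - b) and the sample rows are invented for the demo. It only demonstrates that filtering on the computed alias works through a derived table; it makes no claim about how many times a given engine actually invokes the function, since an optimizer is free to merge the subquery into the outer query.

```python
import sqlite3

def my_udf(a, b):
    # Stand-in for the real UDF; the actual logic is unknown.
    return a - b

conn = sqlite3.connect(":memory:")
conn.create_function("myUDF", 2, my_udf)
conn.execute("CREATE TABLE myTable (col1 INTEGER, col2 INTEGER, col3 TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [(5, 3, "p"), (1, 4, "q"), (7, 7, "r"), (9, 2, "s")],
)

rows = conn.execute(
    "SELECT * FROM ("
    "  SELECT col1, col2, col3, myUDF(col1, col2) AS x FROM myTable"
    ") AS subq WHERE x > 0 ORDER BY col1"
).fetchall()
print(rows)
```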
Depending on the udf and how useful or frequently used it is, you may consider adding it to the table as a computed column. You could then filter on the column as normal and not have to write out the function at all in queries.
I'm not 100% sure what you are doing but since x isn't a column I would remove it from your SQL statement so you have :
SELECT col1, col2, col3, myUDF(col1,col2) AS X From myTable
And then add the condition to your code so you only call it when x > 0
Your question is best answered by the "With" clauses (CTEs, I think, in SQL Server).
Really the best question is: Should I store this computed value or recalculate it for every row, each and every time I query the table.
Are there 10 rows in the table and always 10 rows?
Are rows being added constantly?
Do you have a purge strategy in place or just let it grow?
Query that table only once a month?
If this is a "long running" function (even after you've optimized the hell out of it), why do you want to execute it more than once, ever?
You asked for once, but you are really asking for once per row, per query.
Storing the answer in an index or "virtual column"
Pros:
Calculate exactly once per row.
Query times don't grow linearly.
Cons:
Increases insert/update time
Calculating every time
Pros:
Insert/update time optimized
Cons:
Query time grows with row count. (not scalable)
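The "store the answer" option can be sketched as follows, using SQLite through Python's sqlite3 module. This is an assumption-laden illustration, not a SQL Server computed column: the value is computed in application code at insert time and saved in an ordinary column x, and the function body is made up. A counter shows that later queries do not call the function again.

```python
import sqlite3

counter = {"calls": 0}

def my_udf(a, b):
    # Stand-in for the expensive function; counts its own invocations.
    counter["calls"] += 1
    return a - b

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (col1 INTEGER, col2 INTEGER, x INTEGER)")

# Pay the cost once per row, at insert time.
for a, b in [(5, 3), (1, 4), (9, 2)]:
    conn.execute("INSERT INTO myTable VALUES (?, ?, ?)", (a, b, my_udf(a, b)))
calls_after_insert = counter["calls"]

# Repeated queries filter on the stored value and never call my_udf.
for _ in range(3):
    positive = conn.execute(
        "SELECT col1, col2, x FROM myTable WHERE x > 0 ORDER BY col1"
    ).fetchall()
print(calls_after_insert, counter["calls"], positive)
```

The trade-off listed above is visible here: inserts do extra work, and an update to col1 or col2 would also have to refresh x, which this sketch does not handle.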
If you're querying once a month, why do you care how bad the performance is? Go tune something that actually has a big impact on your operations (very slightly facetious).
If you're not inserting a bunch (depends on your hardware) of rows per second, is spending that time up front going to make a big difference?