Determine datatypes of columns - SQL selection - sql

Is it possible to determine the data type of each column after a SQL selection, based on the received results? I know it is possible through information_schema.columns, but the data I receive comes from multiple tables, is joined together, and the columns are renamed. Besides that, I'm not able to see or use that query or execute other queries myself.
My job is to store this received data in another table, without knowing beforehand what I will receive. I'm obviously able to check, for example, whether a certain column contains numbers or text, but not whether it was originally stored as a TINYINT(1) or a BIGINT(128). How should I approach this? To clarify, it is alright if the data types of the source and destination columns aren't exactly the same, but I don't want to reserve too much space beforehand (or too little, for that matter).
As I'm typing, I realize I'm formulating the question wrong. What would be the best approach to handle the described situation? I thought about altering tables on the fly (e.g. increasing column sizes if needed), but that seems a bit, well, wrong and not the proper way.
Thanks

Can you issue the following query against your new table after you create it?
SELECT *
INTO JoinedQueryResults
FROM TableA AS A
INNER JOIN TableB AS B ON A.ID = B.ID
SELECT *
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'JoinedQueryResults'
Is the query too big to run before you know how large the results will be? Get an idea of how many rows it may return; the trick with join queries is to group on the columns you are joining on, which helps the estimate come back more quickly. Here's an example of returning just a row count for the query that would have created the JoinedQueryResults table above.
SELECT SUM(A.NumRows * B.NumRows)
FROM (SELECT ID, COUNT(*) AS NumRows
      FROM TableA
      GROUP BY ID) AS A
INNER JOIN (SELECT ID, COUNT(*) AS NumRows
            FROM TableB
            GROUP BY ID) AS B ON A.ID = B.ID
The query above will run faster if all you need is a record count to help you estimate a size.
Also try instantiating a table for your results with a query like this.
SELECT TOP 0 *
INTO JoinedQueryResults
FROM TableA AS A
INNER JOIN TableB AS B ON A.ID = B.ID
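If the server happens to be SQL Server 2012 or later (an assumption, since the question doesn't state a version), sys.dm_exec_describe_first_result_set can report the column names and data types of an arbitrary query without materializing it into a table first. A minimal sketch:
-- Assumes SQL Server 2012+; describes the result shape of the joined query
-- without executing it or creating JoinedQueryResults.
SELECT name, system_type_name, max_length, is_nullable
FROM sys.dm_exec_describe_first_result_set(
    N'SELECT A.ID, B.SomeColumn
      FROM TableA AS A
      INNER JOIN TableB AS B ON A.ID = B.ID',
    NULL, 0);
-- B.SomeColumn is a placeholder; substitute the actual select list you receive.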

Related

SQL Server Query In Operator

I have this type of record:
Rajkot,Gandhinagar
but I want the above record to be changed so it looks like the record below:
'Rajkot','Gandhinagar'
I want to use the IN operator to get the result.
Note that using a junction table will usually perform better as noted in the comments.
Nevertheless, assuming you are stuck with the design:
TableA
ID   ValueList
1    Uno,Dos,Tres
2    Foo,Bar,Baz,Quux
And you want to do the equivalent of this:
Select *
from TableA a
where @Value in ValueList -- ERROR
Try this:
Select *
from TableA a
where ','+ValueList+',' like '%,'+@Value+',%'
If you want to do this:
select *
from TableB b
where b.Value in (select ValueList from TableA a where a.ID = b.ID)
Try:
select *
from TableB b
where exists (
select 1 from TableA a
where a.ID = b.ID and ','+a.ValueList+',' like '%,'+b.Value+',%'
)
Notes on design and performance: This design prevents any index being used on the column ValueList. This may not be a problem if:
TableA is very small and has very few rows (e.g. < 10 rows). This is because if the data fits into one or two pages, the overhead involved with looking up the index may be greater than the overhead involved in just scanning the page and doing string comparisons.
Or only a very small subset of rows are actually being searched.
For example, if you are looking up individual rows by a unique key, or a few tens of rows by an efficient index, and just want to filter based on whether a string is in ValueList, this may be faster than a junction table, because the data is held in the same page.
It may also be faster than filtering client-side (because rows which fail the test don't have to be returned to the client).
In other words, if you are not searching by values from this list, but merely filtering by them, it may not be worth putting them into a junction table.
As always one should not be dogmatic about design, but test.
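For reference, here's a rough sketch of the junction-table alternative mentioned above (table and column names are hypothetical, and the column sizes are guesses):
-- Hypothetical junction table replacing the comma-separated ValueList column.
CREATE TABLE TableAValues (
    ID    int          NOT NULL,   -- references TableA.ID
    Value nvarchar(50) NOT NULL,
    PRIMARY KEY (ID, Value)
);

-- The "is @Value in this row's list?" test then becomes an indexable join:
SELECT a.*
FROM TableA AS a
INNER JOIN TableAValues AS v ON v.ID = a.ID
WHERE v.Value = @Value;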
Code (note that this handles exactly two comma-separated values):
select ''''+substring('Rajkot,Gandhinagar',1,charindex(',','Rajkot,Gandhinagar',0)-1)+'''' + ',' +''''+
substring('Rajkot,Gandhinagar',charindex(',','Rajkot,Gandhinagar',0)+1,len('Rajkot,Gandhinagar'))+''''
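If the server is SQL Server 2016 or later (an assumption), STRING_SPLIT sidesteps the two-value limitation of the snippet above:
-- Assumes SQL Server 2016+ and a hypothetical table City(Name nvarchar(50)).
SELECT c.Name
FROM City AS c
WHERE c.Name IN (SELECT value FROM STRING_SPLIT('Rajkot,Gandhinagar', ','));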

Is there a simple way to select all data from a specific table

I have two tables: A and B. I need to select all data from B.
I could do this like:
SELECT id, b1, b2, ... b20 FROM A,B WHERE A.id = B.id;
but that's not a great solution: I would need to update this statement whenever I modify B's schema. Is there something like
SELECT *(B)
that would select all data from B and none from A?
My tables:
A: id, a1, a2, ..., a20
B: id, b1, b2, ..., b20
If you want to design your database seriously, you should focus on efficiency and speed rather than on how simple the statement is to write. My advice is to use a JOIN: it is the best solution for selecting data from two or more tables, because it is as fast as possible and, at least to me, cleaner than, for example, a nested select.
You wrote "I need to select all data from B"; that means SELECT * FROM B, not the statement you wrote.
My advice is to use this:
SELECT * FROM A <INNER / LEFT / RIGHT / NATURAL> JOIN B ON A.id = B.id;
or to select specific columns
SELECT A.column1, A.column2, ... FROM A <INNER / LEFT / RIGHT / NATURAL> JOIN B ON A.id = B.id;
Note:
NATURAL JOIN will work in the above example since the primary key and the foreign key in the two tables have the same name. So you must be very careful using NATURAL JOIN queries in the absence of properly matched columns.
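For instance, a minimal sketch of the NATURAL JOIN form; note that it takes no ON clause and implicitly joins on every column name the two tables share, which in the question's schema is just id:
SELECT * FROM A NATURAL JOIN B;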
So you really should think about how you will create the database, how you will work with it, how you will pull data from it for your views, how you will insert new data, and so on.
Regards man!
Use following query:
SELECT * FROM B;
or
SELECT * FROM B INNER JOIN A ON A.id = B.id;
if you want to join tables A and B.
I suspect that the others have sufficiently answered your question about selecting all fields from table B. That's great, as you really should understand the SQL basics. If they haven't, I'd also advise that you check out the SQLite.org site for a clarification of the SQL syntax understood by SQLite.
But, assuming you've answered your question, I just want to voice two words of caution about using the asterisk syntax.
First, what if, at some later date, you add a column to B that is a big hairy blob (e.g. a multimegabyte image). If you use the * (or B.*) syntax to retrieve all of the columns from B, you may be retrieving a ton of information you might not need for your particular function. Don't retrieve data from SQLite if you don't need it.
Second, is your Objective-C code retrieving the data from your select statement based on the column names of the result, or based on the index number of the column in question? If it's the latter, then using the * syntax can be dangerous, because you can break your code if the physical order of columns in your table ever changes.
Using named columns avoids the memory and performance cost of retrieving too much data, and it also isolates the Objective-C code from the physical implementation of the table in SQLite. Generally, I would not advise developers to use the * syntax when retrieving data from a SQL database. Perhaps this isn't an issue for a trivial project, but as projects become more complicated, you may want to think carefully about the implications of the * syntax.
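For instance, a named-column version of the earlier join, using a few of the placeholder column names from the question:
SELECT B.id, B.b1, B.b2
FROM A INNER JOIN B ON A.id = B.id;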
I don't know if the following query works in SQLite; I know it works in Oracle, but you can give it a try...
SELECT B.* FROM A,B WHERE A.id = B.id;

Why does my left join in Access have fewer rows than the left table?

I have two tables in an MS Access 2010 database: TBLIndividuals and TblIndividualsUpdates. They have a lot of the same data, but the primary key may not be the same for a given person's record in both tables. So I'm doing a join between the two tables on names and birthdates to see which records correspond. I'm using a left join so that I also get rows for the people who are in TblIndividualsUpdates but not in TBLIndividuals. That way I know which records need to be added to TBLIndividuals to get it up to date.
SELECT TblIndividuals.PersonID AS OldID,
TblIndividualsUpdates.PersonID AS UpdateID
FROM TblIndividualsUpdates LEFT JOIN TblIndividuals
ON ( (TblIndividuals.FirstName = TblIndividualsUpdates.FirstName)
and (TblIndividuals.LastName = TblIndividualsUpdates.LastName)
AND (TblIndividuals.DateBorn = TblIndividualsUpdates.DateBorn
or (TblIndividuals.DateBorn is null
and (TblIndividuals.MidName is null and TblIndividualsUpdates.MidName is null
or TblIndividuals.MidName = TblIndividualsUpdates.MidName))));
TblIndividualsUpdates has 4149 rows, but the query returns only 4103 rows. There are about 50 new records in TblIndividualsUpdates, but only 4 rows in the query result where OldID is null.
If I export the data from Access to PostgreSQL and run the same query there, I get all 4149 rows.
Is this a bug in Access? Is there a difference between Access's left join semantics and PostgreSQL's? Is my database corrupted (Compact and Repair doesn't help)?
ON (
TblIndividuals.FirstName = TblIndividualsUpdates.FirstName
and
TblIndividuals.LastName = TblIndividualsUpdates.LastName
AND (
TblIndividuals.DateBorn = TblIndividualsUpdates.DateBorn
or
(
TblIndividuals.DateBorn is null
and
(
TblIndividuals.MidName is null
and TblIndividualsUpdates.MidName is null
or TblIndividuals.MidName = TblIndividualsUpdates.MidName
)
)
)
);
What I would do is systematically remove all the join conditions except the first two until you find where the records drop off. Then you will know where your problem is.
This should never happen. Unless rows are being inserted/deleted in the meantime,
the query:
SELECT *
FROM a LEFT JOIN b
ON whatever ;
should never return fewer rows than:
SELECT *
FROM a ;
If it happens, it's a bug. Are you sure the queries are exactly like this (and you haven't omitted some detail, like a WHERE clause)? Are you sure the first returns 4149 rows and the second one 4103 rows? You could make another check by changing the * above to COUNT(*).
Drop any indexes from both tables that include those JOIN fields (FirstName, LastName, and DateBorn). Then see whether you get the expected 4149 rows with this simplified query.
SELECT
i.PersonID AS OldID,
u.PersonID AS UpdateID
FROM
TblIndividualsUpdates AS u
LEFT JOIN TblIndividuals AS i
ON
(
(i.FirstName = u.FirstName)
AND (i.LastName = u.LastName)
AND (i.DateBorn = u.DateBorn)
);
For whatever it is worth, since this seems to be an elusive bug and any additional information could help in resolving it, I have had the same problem.
The query is too big to post here and I don't have the time to reduce it to something suitable right now, but I can report what I found. In what follows, all joins are left joins.
I was gradually refining and changing my query. It contained a derived table (D), and the whole thing was then made into a derived table (T) and joined to a last table (L). At one point in its development, no field in T that originated in D participated in the join to L. That is when the problem occurred: the total number of rows mysteriously became less than in the main table, which should be impossible. As soon as I again let a field from D participate (via T) in the join to L, the number went back up to normal.
It was as if the join condition to D was being treated as a WHERE clause whenever no field from it participated (via T) in the join to L. But I don't really know what the explanation is.
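Returning to the original question, one diagnostic worth running (assuming the problem query is saved in Access under a name such as qryJoinedUpdates, which is a hypothetical name here): list the update rows that the LEFT JOIN query silently drops, so they can be inspected for oddities such as trailing spaces or unexpected Nulls.
SELECT u.PersonID, u.FirstName, u.LastName, u.DateBorn
FROM TblIndividualsUpdates AS u
WHERE u.PersonID NOT IN (SELECT q.UpdateID FROM qryJoinedUpdates AS q);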

Sql query optimization using IN over INNER JOIN

Given:
Table y
id int clustered index
name nvarchar(25)
Table anothertable
id int clustered Index
name nvarchar(25)
Table someFunction
does some math then returns a valid ID
Compare:
SELECT y.name
FROM y
WHERE dbo.SomeFunction(y.id) IN (SELECT anotherTable.id
FROM AnotherTable)
vs:
SELECT y.name
FROM y
JOIN AnotherTable ON dbo.SomeFunction(y.id) ON anotherTable.id
Question:
While timing these two queries, I found that on large data sets the first query, using IN, is much faster than the second query, using an INNER JOIN. I do not understand why; can someone help explain?
Execution Plan
Generally speaking, IN is different from JOIN in that a JOIN can return additional rows when a row has more than one match in the joined table.
From your estimated execution plan, though, it can be seen that in this case the two queries are semantically the same:
SELECT
A.Col1
,dbo.Foo(A.Col1)
,MAX(A.Col2)
FROM A
WHERE dbo.Foo(A.Col1) IN (SELECT Col1 FROM B)
GROUP BY
A.Col1,
dbo.Foo(A.Col1)
versus
SELECT
A.Col1
,dbo.Foo(A.Col1)
,MAX(A.Col2)
FROM A
JOIN B ON dbo.Foo(A.Col1) = B.Col1
GROUP BY
A.Col1,
dbo.Foo(A.Col1)
Even if duplicates are introduced by the JOIN then they will be removed by the GROUP BY as it only references columns from the left hand table. Additionally these duplicate rows will not alter the result as MAX(A.Col2) will not change. This would not be the case for all aggregates however. If you were to use SUM(A.Col2) (or AVG or COUNT) then the presence of the duplicates would change the result.
It seems that SQL Server doesn't have any logic to differentiate between aggregates such as MAX and those such as SUM, so quite possibly it is expanding out all the duplicates, aggregating them later, and simply doing a lot more work.
The estimated number of rows being aggregated is 2893.54 for IN vs 28271800 for JOIN but these estimates won't necessarily be very reliable as the join predicate is unsargable.
Your second query is a bit funny (the ON clause is malformed) - can you try this one instead?
SELECT y.name
FROM dbo.y
INNER JOIN dbo.AnotherTable a ON a.id = dbo.SomeFunction(y.id)
Does that make any difference?
Otherwise: look at the execution plans! And possibly post them here. Without knowing a lot more about your tables (amount and distribution of data, etc.) and your system (RAM, disk, etc.), it's really hard to give a "globally" valid statement.
Well, for one thing: get rid of the scalar UDF that is implied by dbo.SomeFunction(y.id). That will kill your performance real good. Even if you replace it with a one-row inline table-valued function it will be better.
As for your actual question, I have found similar results in other situations and have been similarly perplexed. The optimizer just treats them differently; I'll be interested to see what answers others provide.
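A minimal sketch of the inline table-valued rewrite suggested above; since dbo.SomeFunction's body isn't shown in the question, the arithmetic inside is a placeholder assumption:
-- Hypothetical inline TVF standing in for the scalar dbo.SomeFunction.
CREATE FUNCTION dbo.SomeFunctionTvf (@id int)
RETURNS TABLE
AS
RETURN (SELECT @id * 2 + 1 AS ResultId);   -- placeholder math; the real logic is unknown
GO

SELECT y.name
FROM dbo.y
CROSS APPLY dbo.SomeFunctionTvf(y.id) AS f
INNER JOIN dbo.AnotherTable AS a ON a.id = f.ResultId;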

SQL Method of checking that INNER / LEFT join doesn't duplicate rows

Is there a good or standard SQL method of asserting that a join does not duplicate any rows (produces 0 or 1 copies of the source table row)? Assert as in causes the query to fail or otherwise indicate that there are duplicate rows.
A common problem in a lot of queries is when a table is expected to be 1:1 with another table, but there might exist 2 rows that match the join criteria. This can cause errors that are hard to track down, especially for people not necessarily entirely familiar with the tables.
It seems like there should be something simple and elegant - this would be very easy for the SQL engine to detect (have I already joined this source row to a row in the other table? ok, error out) but I can't seem to find anything on this. I'm aware that there are long / intrusive solutions to this problem, but for many ad hoc queries those just aren't very fun to work out.
EDIT / CLARIFICATION: I'm looking for a one-step query-level fix. Not a verification step on the results of that query.
If you are only testing for linked rows rather than requiring output, then you'd use EXISTS.
More correctly, you need a "semi-join", but most RDBMSs don't support this directly except in the form of EXISTS.
SELECT a.*
FROM TableA a
WHERE EXISTS (SELECT * FROM TableB b WHERE a.id = b.id)
Also see:
Using 'IN' with a sub-query in SQL Statements
EXISTS vs JOIN and use of EXISTS clause
SELECT JoinField
FROM MyJoinTable
GROUP BY JoinField
HAVING COUNT(*) > 1
LIMIT 1
Is that simple enough? I don't have Postgres to hand, but I think it's valid syntax.
Something along the lines of
SELECT a.id, COUNT(b.id)
FROM TableA a
JOIN TableB b ON a.id = b.id
GROUP BY a.id
HAVING COUNT(b.id) > 1
Should return rows in TableA that have more than one associated row in TableB.
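For a quick ad hoc variant of the same idea (PostgreSQL-flavoured, and assuming id is unique in TableA), one statement can compare the join's row count against the number of distinct left-side ids that found a match; if the two differ, some row was duplicated by the join:
SELECT COUNT(*)             AS joined_rows,
       COUNT(DISTINCT a.id) AS matched_ids
FROM TableA a
JOIN TableB b ON a.id = b.id;
-- joined_rows > matched_ids means at least one TableA row matched several TableB rows.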