How to do damage with SQL by adding to the end of a statement?

Perhaps I am not creative or knowledgeable enough with SQL... but it looks like there is no way to do a DROP TABLE or DELETE FROM within a SELECT without the ability to start a new statement.
Basically, we have a situation where our codebase has some gigantic, "less-than-robust" SQL generation component that never uses prepared statements and we now have an API that interacts with this legacy component.
Right now we can modify a query by appending to the end of it, but have been unable to insert any semicolons. Thus, we can do something like this:
/query?[...]&location_ids=loc1')%20or%20L1.ID%20in%20('loc2
which will result in this
SELECT...WHERE L1.PARENT_ID='1' and L1.ID IN ('loc1') or L1.ID in ('loc2');...
This is just one example.
Basically, we can append pretty much anything to the end of any/most generated SQL queries, except a semicolon.
Any ideas on how this could potentially do some damage? Can you add something to the end of a SQL query that deletes from or drops tables? Or create a query so absurd that it takes up all CPU and never completes?

You said that this:
/query?[...]&location_ids=loc1')%20or%20L1.ID%20in%20('loc2
will result in this:
SELECT...WHERE L1.PARENT_ID='1' and L1.ID IN ('loc1') or L1.ID in ('loc2');
so it looks like this:
/query?[...]&location_ids=');DROP%20TABLE users;--
will result in this:
SELECT...WHERE L1.PARENT_ID='1' and L1.ID IN ('');DROP TABLE users;--');
which is a SELECT, a DROP and a comment.

If it’s not possible to inject another statement, you are limited to the existing statement and its abilities.
Like in this case, if you are limited to SELECT and you know where the injection happens, have a look at PostgreSQL’s SELECT syntax to see what your options are. Since you’re injecting into the WHERE clause, you can only inject additional conditions or other clauses that are allowed after the WHERE clause.
If the result of the SELECT is returned back to the user, you may want to add your own SELECT with a UNION operation. However, PostgreSQL requires compatible data types for corresponding columns:
The two SELECT statements that represent the direct operands of the UNION must produce the same number of columns, and corresponding columns must be of compatible data types.
So you would need to know the number and data types of the columns of the original SELECT first.
The number of columns can be detected with the ORDER BY clause by specifying the column number like ORDER BY 3, which would order the result by the values of the third column. If the specified column does not exist, the query will fail.
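For example, a probing payload in the style of the query above might look like this (the trailing comment discards the rest of the original statement):
loc1') ORDER BY 3 --
If the query still succeeds, the result has at least three columns; if it fails, it has fewer.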
Now, after determining the number of columns, you can inject a UNION SELECT with the appropriate number of columns, using a null value for each column of your UNION SELECT:
loc1') UNION SELECT null,null,null,null,null --
Now you determine the type of each column by using a different value for each column, one by one. If the value’s type is incompatible with the column, you may get an error that hints at the expected data type, like:
ERROR: invalid input syntax for integer
ERROR: UNION types text and integer cannot be matched
After you have determined enough column types (one column may be sufficient when it’s one that is presented to the user), you can change your SELECT to select whatever you want.
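For example, assuming the result has five columns and the second and third turned out to be text, a payload built on standard PostgreSQL functions would expose server details in the normal query output:
loc1') UNION SELECT null, version(), current_user, null, null --
The same pattern then works for any table the database user is allowed to read.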

Related

Why does joining on different data types produce a conversion type inconsistently?

As I try to join tables together on a value that's represented in different data types, I get really odd errors. Please consider the following:
I have two tables; let's say one is in database "CoffeeWarehouse," and the other is in database "CoffeeAnalytics":
Table 1: CoffeeWarehouse.dbo.BeanInfo
Table 2: CoffeeAnalytics.dbo.BeanOrderRecord
Now, both tables have a field called OrderNumber (although in table 2, it's spelled as [order number]); in Table 1, it's represented as a string, and in Table 2, it's represented as a float.
I proceed to join the tables together:
SELECT ordernumber,
bor.*
FROM CoffeeWarehouse.dbo.BeanInfo AS bni
LEFT JOIN CoffeeAnalytics.dbo.BeanOrderRecord AS bor ON bor.[order number] = bni.ordernumber;
If I specify the order numbers I'd like by adding the following:
WHERE bni.ordernumber = '48911'
then I see the complete table I'd like: all the fields from the table I've joined are populated properly.
If I add more order numbers, it works too:
WHERE bni.ordernumber IN ('48911', '83716', '98811', ...)
Now for the problem:
Suppose I want to select everything in the table where another field, i.e. CountryOfOrigin, is not null. I'm not going to enter several thousand order numbers- I just want to use a where clause to weed out the rows with incomplete data.
So I add the following to my original query:
WHERE bor.CountryOfOrigin IS NOT NULL
When I execute, I get this error:
Msg 8114, Level 16, State 5, Line 1
Error converting data type varchar to float.
I get the same error if I even simply use this as a where clause:
WHERE bni.ordernumber IS NOT NULL
Why is this the case? When I specify the ordernumber, the join works well; when I want to select many ordernumbers, I get a conversion error.
Any help/insight?
The SQL Server query optimiser can choose different paths to get your results, even with the same query from minute to minute.
In this query, say:
SELECT ordernumber,
bor.*
FROM CoffeeWarehouse.dbo.BeanInfo AS bni
LEFT JOIN CoffeeAnalytics.dbo.BeanOrderRecord AS bor ON bor.[order number] = bni.ordernumber
WHERE bni.ordernumber = '48911';
The query optimiser may, for example, take one of two paths:
It may choose to use BeanInfo as the "driving" table, use an index to narrow down the rows in that table to, say, a single row with order number 48911, and then join to BeanOrderRecord using just that one order number.
It may choose to use BeanOrderRecord as the driving table, join the two tables together by order number to get a full set of results, and then filter that resultset by the order number.
Which path the query optimiser takes will depend on a variety of things, including defined indexes, the number of rows in the table, cardinality, and so on.
Now, if it just so happens that one of your order numbers isn't convertible to a float—say someone typed '!2345' by accident—the first optimiser choice may always work, and the second one may always fail. But you don't get to choose which path the optimiser takes.
This is why you're seeing what you think of as weird results. In one of your queries, all the order numbers are being analysed and that's triggering the error, in another only order numbers that are convertible to float are being analysed, so there's no error. But it's basically just luck that it's working out the way it is. It could just as well be the other way around, or neither query might ever work.
This is one reason it's bad to store things in inappropriate data types. Fixing that would be the obvious solution.
A dirty and terrible fix, however, might be to always cast your FLOAT to a VARCHAR when doing the order number comparison, as I believe it's always safe to cast from FLOAT to VARCHAR. Though you may need to experiment to make sure the resulting VARCHAR value is formatted the same as your order number (or cast to INTEGER first...)
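A sketch of that workaround, assuming the float order numbers are whole values that fit in an INT (the VARCHAR length is a guess):
SELECT bni.ordernumber,
       bor.*
FROM CoffeeWarehouse.dbo.BeanInfo AS bni
LEFT JOIN CoffeeAnalytics.dbo.BeanOrderRecord AS bor
  ON CAST(CAST(bor.[order number] AS INT) AS VARCHAR(20)) = bni.ordernumber
WHERE bor.CountryOfOrigin IS NOT NULL;
Because both sides of the join condition are now VARCHAR, no implicit conversion of the string column to FLOAT can occur, regardless of which plan the optimiser picks.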
You'll have to resort to some quite fiddly trickery to get any performance out of your existing setup, though. If they were both VARCHAR values you could easily make the table join very fast by indexing each order number column, but as it is the casting you'll have to do will render normal indexes unusable for a join.
If you're using a recent version of SQL Server (2012 or later), you can use TRY_CAST to find the problem row(s):
SELECT * FROM BeanInfo WHERE TRY_CAST(ordernumber AS FLOAT) IS NULL AND ordernumber IS NOT NULL
...will find rows whose VARCHAR ordernumber can't be converted to a FLOAT. (Note it's the string-to-float direction that has to be tested; as mentioned above, FLOAT to VARCHAR always succeeds, so testing that direction would find nothing.)

What does * mean in sql?

For example, I know what SELECT * FROM example_table; means. However, I feel uncomfortable not knowing what each part of the code means.
The part after the SELECT keyword is the list of columns you want to retrieve for each record you are getting.
You can obviously retrieve multiple columns for each record, and (only if you want to retrieve all the columns) you can replace the list of them with *, which means "all columns".
So, in a SELECT statement, writing * is the same as listing all the columns the entity has.
I am providing you an answer by separating each part of the code.
SELECT == it orders the computer to select content from the named table.
(*) == means all {till here the code means: select everything from the table}.
FROM == it refers to where we have to select the data from.
example_table == this is the name of the table from which we have to select data.
The overall meaning is:
select all data from the table whose name is example_table.
For a beginner, knowing the following concepts can be really useful:
SELECT refers to attributes that you want to have displayed in your final query result. There are different 'SELECT' statements such as 'SELECT DISTINCT' which returns only unique values (if there were duplicate values in the original query result)
FROM basically means from which table you want the data. There can be one or many tables listed under the 'FROM' statement.
WHERE means the condition you want rows to satisfy. You can also order the result by using 'ORDER BY ... DESC' (there is little point writing ASC explicitly, as ascending is already the default order for the ORDER BY clause).
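A quick illustration of these clauses together (the table and column names are made up):
SELECT DISTINCT country, city   -- only unique country/city combinations
FROM customers                  -- the table the data comes from
WHERE city IS NOT NULL          -- keep only rows that satisfy the condition
ORDER BY country DESC;          -- sort descending (ascending is the default)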
Refer to W3schools for a better understanding.

DB2/SQL equivalent of SAS's sum(of ) function

SAS has a sum(of col1 - coln ) function which finds the sum of all the values from col1, col2, col3...coln. (ie, you don't have to list out all the column names, as long as they are numbered consecutively). This is a handy shortcut to find a sum of several (suitably named) variables.
Question - Is there a DB2/SQL equivalent of this? I have 50 columns (they are named col1, col2, col3....col50 and I need to find the sum of them.
ie:
select sum(col1, col2, col3,....,col50) AggregateSum
from foo.table
No, DB2 has no such beast, at least to my knowledge. However, you can dynamically create such a query by first querying the database metadata to extract the columns for a given table.
From memory, DB2 has a sysibm.syscolumns table which basically contains the column information that you could use to construct a query on the fly.
You would first use a query like:
select column from sysibm.syscolumns
where schema = 'foo' and tablename = 'table'
and column like 'col%'
(the column names may not match exactly but, since they're not the same on the differing variants of DB2 (DB2/z, DB2/LUW, iSeries DB2, etc) anyway, that hardly matters).
Then use the results of that query to construct your actual query:
select col1+col2+...+colN AggregateSum from foo.table
where the col1+col2+...+colN bit has been built from the previous query.
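On DB2 for LUW, you could even have the database generate that expression for you with LISTAGG (this sketch assumes the SYSCAT catalog of DB2/LUW and upper-case identifiers; adjust for the other variants):
select 'select ' || listagg(colname, ' + ') || ' as AggregateSum from foo.table'
from syscat.columns
where tabschema = 'FOO' and tabname = 'TABLE' and colname like 'COL%';
Running the statement this returns then gives you the per-row sum.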
If, as you mention in a comment, you only want the eighteen "highest" columns (e.g., if columns 1 thru 100 exist, you only want 83 thru 100), you can modify the first query to do that, with something like:
select column from sysibm.syscolumns
where schema = 'foo' and tablename = 'table'
and column like 'col%'
order by column desc
fetch first 18 rows only
but, in that case, you may want to call the columns col0001, col0145 and so on, or make the sorting able to handle variable width numbers.
Although it may be easier (if you can't change the column names) to get all the columns colNNN, sort them yourself by the numeric (not string) value after the col, and throw away all but the last eighteen when constructing the second query.
Both these options will return only eighteen rows maximum.
But you may also want to think, in that case, about moving the variable data to another table, if that's possible in your situation. If you ever find yourself maintaining an array within a table, it's usually better to separate that out.
So your main table would then be something like:
main_id primary key
other_data
and your auxiliary table would be akin to:
main_id foreign key to main(main_id)
sequence_num
other_data
primary key (main_id, sequence_num)
That would allow you to have sparse data if needed, and also to add data without having to change the schema of the main table. The query to get the latest eighteen results would be a little more complicated but still a relatively simple join of the two tables.
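A rough sketch of that join, assuming the numeric values live in the auxiliary table's other_data column, the sequence numbers are dense, and the tables are named main and aux as in the layouts above:
select m.main_id, sum(a.other_data) as AggregateSum
from main m
join aux a on a.main_id = m.main_id
where a.sequence_num > (select max(x.sequence_num) - 18
                        from aux x
                        where x.main_id = m.main_id)
group by m.main_id;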

Oracle CONTAINS() not returning results for numbers

So I have this table with a full text indexed column "value". The field contains some number strings in it, which are correctly returned when I use a query like so:
select value
from mytable
where value like '17946234';
Trouble is, it's incredibly slow because there are a lot of rows in this table, but when I use the CONTAINS operator I get no results:
select value
from mytable
where CONTAINS ( value, '17946234',1)>0
Anyone got any thoughts?
Unfortunately, I'm not an Oracle dude, and the same query works fine in SQL Server. I feel like it must be a stoplist or something with the Oracle Lexer that I can change, but not really sure how.
This could be due to INDEXTYPE IS CTXSYS.CONTEXT in general, or the index not having been updated after the looked-for records were added (CONTEXT type indexes are not transactional, whilst CTXCAT type ones are).
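If the index has simply fallen behind, synchronizing it may already fix the problem (the index name here is made up):
exec ctx_ddl.sync_index('my_value_idx');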
However, if you did not accidentally lose the wildcard in your statement (in other words: if no wildcard is required at all), you could just query
select value from mytable where value = '17946234';
which could possibly be backed by an ordinary index. (Depending on your specific data distribution and the queries run, an index might not help query performance.)
Otherwise
select value from mytable where instr(value, '17946234') > 0;
might be sufficient.

I have a large query, how do I debug this?

So, I get this error message:
EDT ERROR: syntax error at or near "union" at character 436
The query in question is a large query that consists of 12 smaller queries all connected together with UNION ALL, and each small query has two inner join statements. So, something like:
SELECT table.someid as id
,table.lastname as name
,table2.groupname as groupname
, 'Leads ' as Type
from table
inner join table3 on table3.specificid = table.someid
INNER JOIN table2 on table3.specificid=table2.groupid
where table3.deleted=0
and table.someid > 0
and table2.groupid in ('2','3','4')
LIMIT 5
UNION all
query2....
Note that table2 and table3 are the same tables in each query, and the fields from table2 and table3 are also the same, I think.
Quick question (I am still kinda new to all this):
What does 'Leads ' as Type mean? Unlike the other statements preceding an AS, this one isn't written like table.something.
Quick edit question: What does table2.groupid in ('2','3','4') mean?
I checked each small query one by one; each one works and returns a result, though the results are always empty for some reason (this may or may not be dependent on the user logged in, though, as some PHP code generated this query).
As for the results themselves, most of them look something like this (they are arranged horizontally though):
id(integer)
name (character varying(80))
groupname (character varying(100))
type (unknown)
The difference in the results are twofold:
1) Most of the results contain the same field names, but quite a few of them have different field lengths. For example, some will say character varying(80), while others will say character varying(100); please correct me if this is actually not field length.
2) Two of the queries contain different fields, but only the id field is different, and it's probably because they don't have the "as id" part.
I am not quite sure what the requirements of UNION ALL are, but I think it is meant to only work if all the fields are the same. If that funky number changes (the one in the brackets), are the fields considered to be different even if they have the same name?
Also, what's strange is that some of the queries returned the exact same fields, with the same field length, so I tried to UNION ALL only those queries, but no luck, still got a syntax error at UNION.
Another important thing I should mention is that the DB used to be MySQL, but we changed to PostGreSQL, so this bug might be a result of the change (i.e. code that might work in MySQL but not in PostGres).
Thanks for your time.
You can have only one unparenthesized "LIMIT xxx" clause: at the end of the whole query, not before the UNION.
The error you get is due to missing parentheses here:
...
LIMIT 5
UNION all
...
The manual:
(ORDER BY and LIMIT can be attached to a subexpression if it is enclosed in parentheses. Without parentheses, these clauses will be taken to apply to the result of the UNION, not to its right-hand input expression.)
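Applied to the query in the question, parenthesizing each arm keeps its LIMIT local to that arm (the second arm is elided here just as in the question):
(SELECT table.someid as id
,table.lastname as name
,table2.groupname as groupname
, 'Leads ' as Type
from table
inner join table3 on table3.specificid = table.someid
INNER JOIN table2 on table3.specificid=table2.groupid
where table3.deleted=0
and table.someid > 0
and table2.groupid in ('2','3','4')
LIMIT 5)
UNION all
(query2.... LIMIT 5)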
Later example:
Sum results of a few queries and then find top 5 in SQL
The only real way I have found to debug big queries is to break them into understandable parts and debug each subexpression independently:
Does each show the expected rows?
Are the resulting fields and types as expected?
For union, do the result fields and types exactly match corresponding other subexpressions?