I have millions of records in a set. I would like to retrieve all the records that match the same pattern.
For example I may have :
id=4444?mode=mode1?fieldA=abc
id=4444?mode=mode1?fieldA=azerty
id=4444?mode=mode1?fieldA=qwerty
id=4444?mode=mode1?fieldA=foo
id=4444?mode=mode1?fieldA=bar
Is it possible to make a query to get all the above records without knowing in advance the value of the fieldA ? Something like this in regex :
id=4444?mode=mode1?fieldA=[\w]*
Thanks for you time.
Yes, it can be done. You would need to query by a secondary index first to narrow the result set to a manageable size first, then write a filter using Lua which filters out the ones you don't want. This filter could take the regex you want to match against (passed in dynamically) and return only those records that match.
Whilst this would work, it would not be as performant as the key-value operations in Aerospike. You would definitely want to benchmark such a solution before putting it into production.
Predicate filtering was added in release 3.12 on March 15. You can use the stringRegex method of the PredExp class of the Java client to build complex filters such as the one you mentioned. It also currently exists for the C, C# and Go clients.
There's a similar example in the Aerospike Java client:
Statement stmt = new Statement();
stmt.setNamespace(params.namespace);
stmt.setSetName(params.set);
stmt.setFilter(Filter.range(binName, begin, end));
stmt.setPredExp(
PredExp.stringBin("bin3"),
PredExp.stringValue("prefix.*suffix"),
PredExp.stringRegex(RegexFlag.ICASE | RegexFlag.NEWLINE)
);
The RegexFlag class in com.aerospike.client.query defines which regular expressions you can use, and how they'd behave.
Looking at the documentation, there are fairly good instructions on how to build up a SQL query.
My code looks like this:
$sqlQuery = new SQLQuery();
$sqlQuery->setFrom('PropertyPage')->selectField('PropertyType')->setDistinct(true);
I'm aiming to get the following SQL:
SELECT DISTINCT PropertyType FROM PropertyPage
but instead I'm getting back this:
SELECT DISTINCT *, PropertyType FROM PropertyPage
Even their own example seems to give back a 'SELECT DISTINCT *. How can I avoid this?
Why do you use SQLQuery directly?
With pure ORM it should go like this:
$result = PropertyPage::get()->setQueriedColumns(array('PropertyType'))->distinct();
which returns a DataList you can loop over.
See setQueriedColumns() and distinct()
What version of SS framework are you using? Distinct was added in 3.1.7 iirc.
Just to add to #wmk's answer as well as directly address how to do it with SQLQuery, your call to selectField is the reason why that query happened. The documentation to selectField shows that it adds an additional field to select, not restrict the list to just that field.
The reason why their examples (A and B) also had the issue is that the first parameter for the constructor for SQLQuery is a default value of "*" for select statement.
To still use your code as a base, replace your use of selectField with setSelect. Your query will then look like this:
SELECT DISTINCT PropertyType FROM PropertyPage
It isn't bad to directly query the database using SQLQuery especially if you really just want the raw data of a particular column or the result itself cannot be expressed against a DataObject (eg. using column aliases). You would also have a small performance improvement that PHP would not have to instantiate a DataObject.
While saying that, it can be far more useful having a DataObject and the various functions that it exposes per record.
One way of conducting an SQL query is the defined NamedQuery in JPA:
Query query = entityManager.createNamedQuery("Users.findByName");
An alternative to this is running it without defining a NamedQuery:
Query query = entityManager.createQuery("SELECT SELECT u FROM Users u");
From what i see, NamedQuery is favorable for it is defined at one-place-for-all in the entity class and is available to a pojo that has a use for it without getting into SQL.
Are there any differences between the two?
The only difference is that one is a String that only can be used in the class where is declared, the one of your second example
Query query = entityManager.createQuery(" SELECT u FROM Users u");
The NamedQueries can be used in different DAO without need to define them again, just call and PersistenceProvider will find them using #NamedQueries - #NamedQuery, or xml files when you define them.
Basically a named queries are a powerful tool for organizing query
definition and improving application performance.
Also a good important stuff is that some provider processed the JPQL inside the namedqueries at the startup time, this gives a hit on the performance, in the second case that you set in yout question, persistence provider is not aware of the query existence and does not have the chance to process it on startup and need to run the process when is required.
I saw a query run in a log file on an application. and it contained a query like:
SELECT ID FROM CUST_ATTR49 WHERE 1=0
what is the use of such a query that is bound to return nothing?
A query like this can be used to ping the database. The clause:
WHERE 1=0
Ensures that non data is sent back, so no CPU charge, no Network traffic or other resource consumption.
A query like that can test for:
server availability
CUST_ATTR49 table existence
ID column existence
Keeping a connection alive
Cause a trigger to fire without changing any rows (with the where clause, but not in a select query)
manage many OR conditions in dynamic queries (e.g WHERE 1=0 OR <condition>)
This may be also used to extract the table schema from a table without extracting any data inside that table. As Andrea Colleoni said those will be the other benefits of using this.
A usecase I can think of: you have a filter form where you don't want to have any search results. If you specify some filter, they get added to the where clause.
Or it's usually used if you have to create a sql query by hand. E.g. you don't want to check whether the where clause is empty or not..and you can just add stuff like this:
where := "WHERE 0=1"
if X then where := where + " OR ... "
if Y then where := where + " OR ... "
(if you connect the clauses with OR you need 0=1, if you have AND you have 1=1)
As an answer - but also as further clarification to what #AndreaColleoni already mentioned:
manage many OR conditions in dynamic queries (e.g WHERE 1=0 OR <condition>)
Purpose as an on/off switch
I am using this as a switch (on/off) statement for portions of my Query.
If I were to use
WHERE 1=1
AND (0=? OR first_name = ?)
AND (0=? OR last_name = ?)
Then I can use the first bind variable (?) to turn on or off the first_name search criterium. , and the third bind variable (?) to turn on or off the last_name criterium.
I have also added a literal 1=1 just for esthetics so the text of the query aligns nicely.
For just those two criteria, it does not appear that helpful, as one might thing it is just easier to do the same by dynamically building your WHERE condition by either putting only first_name or last_name, or both, or none. So your code will have to dynamically build 4 versions of the same query. Imagine what would happen if you have 10 different criteria to consider, then how many combinations of the same query will you have to manage then?
Compile Time Optimization
I also might add that adding in the 0=? as a bind variable switch will not work very well if all your criteria are indexed. The run time optimizer that will select appropriate indexes and execution plans, might just not see the cost benefit of using the index in those slightly more complex predicates. Hence I usally advice, to inject the 0 / 1 explicitly into your query (string concatenating it in in your sql, or doing some search/replace). Doing so will give the compiler the chance to optimize out redundant statements, and give the Runtime Executer a much simpler query to look at.
(0=1 OR cond = ?) --> (cond = ?)
(0=0 OR cond = ?) --> Always True (ignore predicate)
In the second statement above the compiler knows that it never has to even consider the second part of the condition (cond = ?), and it will simply remove the entire predicate. If it were a bind variable, the compiler could never have accomplished this.
Because you are simply, and forcedly, injecting a 0/1, there is zero chance of SQL injections.
In my SQL's, as one approach, I typically place my sql injection points as ${literal_name}, and I then simply search/replace using a regex any ${...} occurrence with the appropriate literal, before I even let the compiler have a stab at it. This basically leads to a query stored as follows:
WHERE 1=1
AND (0=${cond1_enabled} OR cond1 = ?)
AND (0=${cond2_enabled} OR cond2 = ?)
Looks good, easily understood, the compiler handles it well, and the Runtime Cost Based Optimizer understands it better and will have a higher likelihood of selecting the right index.
I take special care in what I inject. Prime way for passing variables is and remains bind variables for all the obvious reasons.
This is very good in metadata fetching and makes thing generic.
Many DBs have optimizer so they will not actually execute it but its still a valid SQL statement and should execute on all DBs.
This will not fetch any result, but you know column names are valid, data types etc. If it does not execute you know something is wrong with DB(not up etc.)
So many generic programs execute this dummy statement for testing and fetching metadata.
Some systems use scripts and can dynamically set selected records to be hidden from a full list; so a false condition needs to be passed to the SQL. For example, three records out of 500 may be marked as Privacy for medical reasons and should not be visible to everyone. A dynamic query will control the 500 records are visible to those in HR, while 497 are visible to managers. A value would be passed to the SQL clause that is conditionally set, i.e. ' WHERE 1=1 ' or ' WHERE 1=0 ', depending who is logged into the system.
quoted from Greg
If the list of conditions is not known at compile time and is instead
built at run time, you don't have to worry about whether you have one
or more than one condition. You can generate them all like:
and
and concatenate them all together. With the 1=1 at the start, the
initial and has something to associate with.
I've never seen this used for any kind of injection protection, as you
say it doesn't seem like it would help much. I have seen it used as an
implementation convenience. The SQL query engine will end up ignoring
the 1=1 so it should have no performance impact.
Why would someone use WHERE 1=1 AND <conditions> in a SQL clause?
If the user intends to only append records, then the fastest method is open the recordset without returning any existing records.
It can be useful when only table metadata is desired in an application. For example, if you are writing a JDBC application and want to get the column display size of columns in the table.
Pasting a code snippet here
String query = "SELECT * from <Table_name> where 1=0";
PreparedStatement stmt = connection.prepareStatement(query);
ResultSet rs = stmt.executeQuery();
ResultSetMetaData rsMD = rs.getMetaData();
int columnCount = rsMD.getColumnCount();
for(int i=0;i<columnCount;i++) {
System.out.println("Column display size is: " + rsMD.getColumnDisplaySize(i+1));
}
Here having a query like "select * from table" can cause performance issues if you are dealing with huge data because it will try to fetch all the records from the table. Instead if you provide a query like "select * from table where 1=0" then it will fetch only table metadata and not the records so it will be efficient.
Per user milso in another thread, another purpose for "WHERE 1=0":
CREATE TABLE New_table_name as select * FROM Old_table_name WHERE 1 =
2;
this will create a new table with same schema as old table. (Very
handy if you want to load some data for compares)
An example of using a where condition of 1=0 is found in the Northwind 2007 database. On the main page the New Customer Order and New Purchase Order command buttons use embedded macros with the Where Condition set to 1=0. This opens the form with a filter that forces the sub-form to display only records related to the parent form. This can be verified by opening either of those forms from the tree without using the macro. When opened this way all records are displayed by the sub-form.
In ActiveRecord ORM, part of RubyOnRails:
Post.where(category_id: []).to_sql
# => SELECT * FROM posts WHERE 1=0
This is presumably because the following is invalid (at least in Postgres):
select id FROM bookings WHERE office_id IN ()
It seems like, that someone is trying to hack your database. It looks like someone tried mysql injection. You can read more about it here: Mysql Injection
I have 150+ SQL queries in separate text files that I need to analyze (just the actual SQL code, not the data results) in order to identify all column names and table names used. Preferably with the number of times each column and table makes an appearance. Writing a brand new SQL parsing program is trickier than is seems, with nested SELECT statements and the like.
There has to be a program, or code out there that does this (or something close to this), but I have not found it.
I actually ended up using a tool called
SQL Pretty Printer. You can purchase a desktop version, but I just used the free online application. Just copy the query into the text box, set the Output to "List DB Object" and click the Format SQL button.
It work great using around 150 different (and complex) SQL queries.
How about using the Execution Plan report in MS SQLServer? You can save this to an xml file which can then be parsed.
You may want to looking to something like this:
JSqlParser
which uses JavaCC to parse and return the query string as an object graph. I've never used it, so I can't vouch for its quality.
If you're application needs to do it, and has access to a database that has the tables etc, you could run something like:
SELECT TOP 0 * FROM MY_TABLE
Using ADO.NET. This would give you a DataTable instance for which you could query the columns and their attributes.
Please go with antlr... Write a grammar n follow the steps..which is given in antlr site..eventually you will get AST(abstract syntax tree). For the given query... we can traverse through this and bring all table ,column which is present in the query..
In DB2 you can append your query with something such as the following, but 1 is the minimum you can specify; it will throw an error if you try to specify 0:
FETCH FIRST 1 ROW ONLY