I have the following model:
class PlatformVersion(models.Model):
    platform = models.ForeignKey(Platform)
    version = models.FloatField(db_index=True)
    display_version = models.CharField(max_length=32, db_index=True)
    is_enabled = models.BooleanField(default=False)
Initial DB seeding code:
def add_version(self, platform, version, display_version):
    try:
        return PlatformVersion.objects.get(platform=platform, version=version)
    except PlatformVersion.DoesNotExist:
        return PlatformVersion.objects.create(platform=platform, version=version,
                                              display_version=display_version,
                                              is_enabled=True)
self.add_version(platform, 9.0, '9.0')
self.add_version(platform, 9.1, '9.1')
self.add_version(platform, 9.2, '9.2')
self.add_version(platform, 9.3, '9.3')
As you can see, I'm adding minor versions only, skipping patch versions like 9.0.1, 9.0.2, etc.
And I have the following use case: the client sends a request with a platform version, which could be, for example, 9.0, 9.0.0, 9.0.1, or 10.0 (missing from the DB entirely), etc. I then need to fetch the following PlatformVersion instance from the DB:
for 9.0 -> 9.0
for 9.0.0 -> 9.0
for 9.0.1 -> 9.0
for 9.0.999 -> 9.0
for 9.1.1 -> 9.1
for 10.0 (10.* missing in DB) -> closest version = 9.3 for my initial seed data.
I'm sure I could write such code myself, but there will be a very large number of these queries per second, so I care about efficiency.
Can you suggest the best option in terms of performance?
Ok, I've realized that it's better to store major and minor versions as separate integer fields in my model, and filter by their values.
By the way, I also think it's bad practice to store versions like "x.x" in a float field. If a single-field representation is required, a DecimalField is the better choice, because float and decimal representations differ and floats cannot store many decimal values exactly.
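For illustration, here is a minimal sketch of that integer-field approach; the major/minor field names and the resolve_version helper are mine, not part of the original model, and the fallback rules are assumptions:

from django.db import models
from django.db.models import Q


class PlatformVersion(models.Model):
    platform = models.ForeignKey('Platform', on_delete=models.CASCADE)
    major = models.PositiveIntegerField(db_index=True)
    minor = models.PositiveIntegerField(db_index=True)
    display_version = models.CharField(max_length=32, db_index=True)
    is_enabled = models.BooleanField(default=False)

    class Meta:
        # One composite unique index backs the (platform, major, minor) lookups.
        unique_together = ('platform', 'major', 'minor')


def resolve_version(platform, version_string):
    """Map a string such as '9.0.1' or '10.0' to the closest stored row."""
    parts = version_string.split('.')
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    qs = PlatformVersion.objects.filter(platform=platform)
    # Exact or closest-lower match: 9.0.1 -> 9.0, 9.1.1 -> 9.1, 10.0 -> 9.3.
    match = (qs.filter(Q(major__lt=major) | Q(major=major, minor__lte=minor))
               .order_by('-major', '-minor')
               .first())
    # If the requested version predates everything stored, fall back to the
    # oldest row (assumed behaviour; the question does not cover this case).
    return match or qs.order_by('major', 'minor').first()

Each lookup is a single SELECT with an ORDER BY over the composite index, which should hold up well under a high request rate; caching the resolved rows (they rarely change) would cut the query count further.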
I have been trying to run a Nutch 1.16 crawl using the code example and instructions from https://cwiki.apache.org/confluence/display/NUTCH/NutchTutorial, but no matter what I do, I get stuck when initiating the actual crawl.
I'm running it through Cygwin64 on a Windows 10 machine, using a binary installation (though I have also tried compiling one, with the same results). Initially, Nutch would throw an UnsatisfiedLinkError (NativeIO$Windows.access0), which I fixed by adding the libraries suggested in several other answers to the same issue. With that done I could at least start a server, but trying to crawl through Nutch itself would return a NoSuchMethodError no matter what I did. nutch-site.xml only contains the http.agent.name and plugin.includes properties, both taken from the same example (a minimal sketch of that file follows).
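For reference, the minimal nutch-site.xml looks roughly like this; the agent name is a placeholder, and the plugin.includes value (omitted here) is copied verbatim from the tutorial:

<?xml version="1.0"?>
<configuration>
  <!-- Placeholder value; Nutch refuses to fetch without an agent name. -->
  <property>
    <name>http.agent.name</name>
    <value>MyTestCrawler</value>
  </property>
  <!-- plugin.includes is declared the same way, with the value from the tutorial. -->
</configuration>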
The following is the error message (I also tried omitting seed.txt):
$ bin/nutch inject crawl/crawldb urls/seed.txt
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.commons.cli.OptionBuilder.withArgPattern(Ljava/lang/String;I)Lorg/apache/commons/cli/OptionBuilder;
at org.apache.hadoop.util.GenericOptionsParser.buildGeneralOptions(GenericOptionsParser.java:207)
at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:370)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:138)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:59)
at org.apache.nutch.crawl.Injector.main(Injector.java:534)
The following is the list of libraries currently present in the lib directory:
activation-1.1.jar
amqp-client-5.2.0.jar
animal-sniffer-annotations-1.14.jar
antlr-runtime-3.5.2.jar
antlr4-4.5.1.jar
aopalliance-1.0.jar
apache-nutch-1.16.jar
apacheds-i18n-2.0.0-M15.jar
apacheds-kerberos-codec-2.0.0-M15.jar
api-asn1-api-1.0.0-M20.jar
api-util-1.0.0-M20.jar
args4j-2.0.16.jar
ascii-utf-themes-0.0.1.jar
asciitable-0.3.2.jar
asm-3.3.1.jar
asm-7.1.jar
avro-1.7.7.jar
bootstrap-3.0.3.jar
cglib-2.2.1-v20090111.jar
cglib-2.2.2.jar
char-translation-0.0.2.jar
checker-compat-qual-2.0.0.jar
closure-compiler-v20130603.jar
commons-beanutils-1.7.0.jar
commons-beanutils-core-1.8.0.jar
commons-cli-1.2-sources.jar
commons-cli-1.2.jar
commons-codec-1.11.jar
commons-collections-3.2.2.jar
commons-collections4-4.2.jar
commons-compress-1.18.jar
commons-configuration-1.6.jar
commons-daemon-1.0.13.jar
commons-digester-1.8.jar
commons-el-1.0.jar
commons-httpclient-3.1.jar
commons-io-2.4.jar
commons-jexl-2.1.1.jar
commons-lang-2.6.jar
commons-lang3-3.8.1.jar
commons-logging-1.1.3.jar
commons-math3-3.1.1.jar
commons-net-3.1.jar
crawler-commons-1.0.jar
curator-client-2.7.1.jar
curator-framework-2.7.1.jar
curator-recipes-2.7.1.jar
cxf-core-3.3.3.jar
cxf-rt-bindings-soap-3.3.3.jar
cxf-rt-bindings-xml-3.3.3.jar
cxf-rt-databinding-jaxb-3.3.3.jar
cxf-rt-frontend-jaxrs-3.3.3.jar
cxf-rt-frontend-jaxws-3.3.3.jar
cxf-rt-frontend-simple-3.3.3.jar
cxf-rt-security-3.3.3.jar
cxf-rt-transports-http-3.3.3.jar
cxf-rt-transports-http-jetty-3.3.3.jar
cxf-rt-ws-addr-3.3.3.jar
cxf-rt-ws-policy-3.3.3.jar
cxf-rt-wsdl-3.3.3.jar
dom4j-1.6.1.jar
ehcache-3.3.1.jar
elasticsearch-0.90.1.jar
error_prone_annotations-2.1.3.jar
FastInfoset-1.2.16.jar
geronimo-jcache_1.0_spec-1.0-alpha-1.jar
gora-hbase-0.3.jar
gson-2.2.4.jar
guava-25.0-jre.jar
guice-3.0.jar
guice-servlet-3.0.jar
h2-1.4.197.jar
hadoop-0.20.0-ant.jar
hadoop-0.20.0-core.jar
hadoop-0.20.0-examples.jar
hadoop-0.20.0-test.jar
hadoop-0.20.0-tools.jar
hadoop-annotations-2.9.2.jar
hadoop-auth-2.9.2.jar
hadoop-common-2.9.2.jar
hadoop-core-1.2.1.jar
hadoop-core_0.20.0.xml
hadoop-core_0.21.0.xml
hadoop-core_0.22.0.xml
hadoop-hdfs-2.9.2.jar
hadoop-hdfs-client-2.9.2.jar
hadoop-mapreduce-client-common-2.2.0.jar
hadoop-mapreduce-client-common-2.9.2.jar
hadoop-mapreduce-client-core-2.2.0.jar
hadoop-mapreduce-client-core-2.9.2.jar
hadoop-mapreduce-client-jobclient-2.2.0.jar
hadoop-mapreduce-client-jobclient-2.9.2.jar
hadoop-mapreduce-client-shuffle-2.2.0.jar
hadoop-mapreduce-client-shuffle-2.9.2.jar
hadoop-yarn-api-2.9.2.jar
hadoop-yarn-client-2.9.2.jar
hadoop-yarn-common-2.9.2.jar
hadoop-yarn-registry-2.9.2.jar
hadoop-yarn-server-common-2.9.2.jar
hadoop-yarn-server-nodemanager-2.9.2.jar
hbase-0.90.0-tests.jar
hbase-0.90.0.jar
hbase-0.92.1.jar
hbase-client-0.98.0-hadoop2.jar
hbase-common-0.98.0-hadoop2.jar
hbase-protocol-0.98.0-hadoop2.jar
HikariCP-java7-2.4.12.jar
htmlparser-1.6.jar
htrace-core-2.04.jar
htrace-core4-4.1.0-incubating.jar
httpclient-4.5.6.jar
httpcore-4.4.9.jar
httpcore-nio-4.4.9.jar
icu4j-61.1.jar
istack-commons-runtime-3.0.8.jar
j2objc-annotations-1.1.jar
jackson-annotations-2.9.9.jar
jackson-core-2.9.9.jar
jackson-core-asl-1.9.13.jar
jackson-databind-2.9.9.jar
jackson-dataformat-cbor-2.9.9.jar
jackson-jaxrs-1.9.13.jar
jackson-jaxrs-base-2.9.9.jar
jackson-jaxrs-json-provider-2.9.9.jar
jackson-mapper-asl-1.9.13.jar
jackson-module-jaxb-annotations-2.9.9.jar
jackson-xc-1.9.13.jar
jakarta.activation-api-1.2.1.jar
jakarta.ws.rs-api-2.1.5.jar
jakarta.xml.bind-api-2.3.2.jar
jasper-compiler-5.5.12.jar
jasper-runtime-5.5.12.jar
java-xmlbuilder-0.4.jar
javassist-3.12.1.GA.jar
javax.annotation-api-1.3.2.jar
javax.inject-1.jar
javax.persistence-2.2.0.jar
javax.servlet-api-3.1.0.jar
jaxb-api-2.2.2.jar
jaxb-impl-2.2.3-1.jar
jaxb-runtime-2.3.2.jar
jcip-annotations-1.0-1.jar
jersey-client-1.19.4.jar
jersey-core-1.9.jar
jersey-guice-1.9.jar
jersey-json-1.9.jar
jersey-server-1.9.jar
jets3t-0.9.0.jar
jettison-1.1.jar
jetty-6.1.26.jar
jetty-client-6.1.22.jar
jetty-continuation-9.4.19.v20190610.jar
jetty-http-9.4.19.v20190610.jar
jetty-io-9.4.19.v20190610.jar
jetty-security-9.4.19.v20190610.jar
jetty-server-9.4.19.v20190610.jar
jetty-sslengine-6.1.26.jar
jetty-util-6.1.26.jar
jetty-util-9.4.19.v20190610.jar
joda-time-2.3.jar
jquery-2.0.3-1.jar
jquery-selectors-0.0.3.jar
jquery-ui-1.10.2-1.jar
jquerypp-1.0.1.jar
jsch-0.1.54.jar
json-smart-1.3.1.jar
jsp-2.1-6.1.14.jar
jsp-api-2.1-6.1.14.jar
jsp-api-2.1.jar
jsr305-3.0.0.jar
junit-3.8.1.jar
juniversalchardet-1.0.3.jar
leveldbjni-all-1.8.jar
log4j-1.2.17.jar
lucene-analyzers-common-4.3.0.jar
lucene-codecs-4.3.0.jar
lucene-core-4.3.0.jar
lucene-grouping-4.3.0.jar
lucene-highlighter-4.3.0.jar
lucene-join-4.3.0.jar
lucene-memory-4.3.0.jar
lucene-queries-4.3.0.jar
lucene-queryparser-4.3.0.jar
lucene-sandbox-4.3.0.jar
lucene-spatial-4.3.0.jar
lucene-suggest-4.3.0.jar
maven-parent-config-0.3.4.jar
metrics-core-3.0.1.jar
modernizr-2.6.2-1.jar
mssql-jdbc-6.2.1.jre7.jar
neethi-3.1.1.jar
netty-3.6.2.Final.jar
netty-all-4.0.23.Final.jar
nimbus-jose-jwt-4.41.1.jar
okhttp-2.7.5.jar
okio-1.6.0.jar
org.apache.commons.cli-1.2.0.jar
ormlite-core-5.1.jar
ormlite-jdbc-5.1.jar
oro-2.0.8.jar
paranamer-2.3.jar
protobuf-java-2.5.0.jar
reflections-0.9.8.jar
servlet-api-2.5-20081211.jar
servlet-api-2.5.jar
skb-interfaces-0.0.1.jar
slf4j-api-1.7.26.jar
slf4j-log4j12-1.7.25.jar
snappy-java-1.0.5.jar
spatial4j-0.3.jar
spring-aop-4.0.9.RELEASE.jar
spring-beans-4.0.9.RELEASE.jar
spring-context-4.0.9.RELEASE.jar
spring-core-4.0.9.RELEASE.jar
spring-expression-4.0.9.RELEASE.jar
spring-web-4.0.9.RELEASE.jar
ST4-4.0.8.jar
stax-api-1.0-2.jar
stax-ex-1.8.1.jar
stax2-api-3.1.4.jar
t-digest-3.2.jar
tika-core-1.22.jar
txw2-2.3.2.jar
typeaheadjs-0.9.3.jar
warc-hadoop-0.1.0.jar
webarchive-commons-1.1.5.jar
wicket-bootstrap-core-0.9.2.jar
wicket-bootstrap-extensions-0.9.2.jar
wicket-core-6.17.0.jar
wicket-extensions-6.13.0.jar
wicket-ioc-6.17.0.jar
wicket-request-6.17.0.jar
wicket-spring-6.17.0.jar
wicket-util-6.17.0.jar
wicket-webjars-0.4.0.jar
woodstox-core-5.0.3.jar
wsdl4j-1.6.3.jar
xercesImpl-2.12.0.jar
xml-apis-1.4.01.jar
xml-resolver-1.2.jar
xmlenc-0.52.jar
xmlParserAPIs-2.6.2.jar
xmlschema-core-2.2.4.jar
zookeeper-3.4.6.jar
This is my java version:
java version "1.8.0_241"
Java(TM) SE Runtime Environment (build 1.8.0_241-b07)
Java HotSpot(TM) 64-Bit Server VM (build 25.241-b07, mixed mode)
I'd also like to point out that, despite what another answer may have said, nutch 1.4 (or any other version of nutch for that matter) did NOT resolve the issue, at least on Windows.
EDIT: The following answer worked for me, but I left the original one because it may still be useful to someone working with other versions of nutch.
Again thanks to Sebastian Nagel: to get around the NoSuchMethodError, edit ivy\ivy.xml to reference a different version of the Hadoop libraries. In my case I installed Hadoop 3.1.3, and I also added the corresponding 3.1.3 versions of winutils.exe and hadoop.dll to the hadoop\bin directory referenced by HADOOP_HOME. Running bin/crawl now seems to work correctly.
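To give an idea of the kind of change meant, the Hadoop dependency entries in ivy\ivy.xml end up looking roughly like the lines below; the exact module list and attributes differ between Nutch releases, so treat these as illustrative only:

<!-- Bump the rev of the Hadoop dependencies from the bundled 2.x version
     to the locally installed one (3.1.3 in this case). -->
<dependency org="org.apache.hadoop" name="hadoop-common" rev="3.1.3" conf="*->default"/>
<dependency org="org.apache.hadoop" name="hadoop-hdfs" rev="3.1.3" conf="*->default"/>
<dependency org="org.apache.hadoop" name="hadoop-mapreduce-client-core" rev="3.1.3" conf="*->default"/>
<dependency org="org.apache.hadoop" name="hadoop-mapreduce-client-jobclient" rev="3.1.3" conf="*->default"/>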
Outdated answer: After working on the source code itself (courtesy of https://github.com/apache/commons-cli), at the suggestion of Sebastian Nagel, I was able to find the (very simple) implementation of the method (https://github.com/marcelmaatkamp/EntityExtractorUtils/blob/master/src/main/java/org/apache/commons/cli/OptionBuilder.java):
/**
 * The next Option created will have an argument pattern and
 * the number of pattern occurrences.
 *
 * @param argPattern string representing a pattern regex
 * @param limit the number of pattern occurrences in the argument
 * @return the OptionBuilder instance
 */
public static OptionBuilder withArgPattern( String argPattern,
                                            int limit )
{
    OptionBuilder.argPattern = argPattern;
    OptionBuilder.limit = limit;
    // (completion assumed: like the other OptionBuilder methods, it returns
    // the shared builder instance)
    return instance;
}
Using Maven I was then able to compile the code into its own jar file, which I then added to the lib folder of Apache Nutch.
This still did not completely resolve my problem, as the Nutch framework appears to rely on deprecated functions elsewhere as well, which will probably mean more work of the same kind (for instance, right after adding the new jar I got a NoSuchMethodError for org.apache.hadoop.mapreduce.Job.getInstance).
I leave this answer here as a temporary solution for anyone who has gotten stuck on the same issue, but I wish there were an easier way of finding out which methods appear in which jar file without exploring their entire file structure; perhaps there is one I'm simply not aware of.
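As a small aid for that last point, a throwaway helper along these lines (written here purely as an illustration; it is not part of Nutch or Hadoop) will at least print which jar a class is actually loaded from and whether it declares the missing method:

import java.lang.reflect.Method;

public class WhichJar {
    public static void main(String[] args) throws Exception {
        // Class and method names are the ones from the NoSuchMethodError above.
        Class<?> cls = Class.forName("org.apache.commons.cli.OptionBuilder");
        System.out.println("Loaded from: "
                + cls.getProtectionDomain().getCodeSource().getLocation());
        for (Method m : cls.getDeclaredMethods()) {
            if (m.getName().equals("withArgPattern")) {
                System.out.println("Declares: " + m);
            }
        }
    }
}

Run it with the same jars on the classpath that bin/nutch uses (for example java -cp "lib/*:." WhichJar) and it will show which of the several commons-cli jars in the lib directory actually wins.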
We recently updated our Kafka version from 0.10 to 1.0, and I am updating the deprecated code
KTable<Long, myClass> myKTable = this.streamBuilder
        .stream(Serdes.Long(), mySerde, sub_topic)
        .groupByKey(Serdes.Long(), mySerde)
        .reduce(myReducer, my_store);
to this
KTable<Long, myClass> myKTable = this.streamBuilder
        .stream(sub_topic, Consumed.with(Serdes.Long(), mySerde))
        .groupByKey(Serialized.with(Serdes.Long(), mySerde))
        .reduce(myReducer, Materialized.as(my_store));
My stream throws an error while serializing in groupByKey. Serialized.with() does not use the key serde provided and falls back to the default byte-array serde, which then encounters my Long key and throws a cast error.
Has anyone else encountered this error with Kafka 1.0.0? The first snippet, using the deprecated API, works fine, but the updated code using Serialized.with() does not seem to work. Any help is greatly appreciated.
Can you share the stack trace? I actually think the issue is with reduce() -- you need to specify the Serdes via Materialized again.
This is a regression of sorts in the new API and was fixed recently in trunk: https://github.com/apache/kafka/pull/4919 The upcoming 2.0 release will contain the fix.
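For completeness, a sketch of the suggested workaround: pass the serdes to reduce() explicitly via Materialized so the state store does not fall back to the default byte-array serdes. The identifiers myClass, mySerde, myReducer, sub_topic and my_store are taken from the question; the rest follows the Kafka Streams 1.0 API.

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.Consumed;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Serialized;
import org.apache.kafka.streams.state.KeyValueStore;

// Only the reduce() call changes: Materialized now carries the key and value
// serdes for the my_store state store.
KTable<Long, myClass> myKTable = this.streamBuilder
        .stream(sub_topic, Consumed.with(Serdes.Long(), mySerde))
        .groupByKey(Serialized.with(Serdes.Long(), mySerde))
        .reduce(myReducer,
                Materialized.<Long, myClass, KeyValueStore<Bytes, byte[]>>as(my_store)
                        .withKeySerde(Serdes.Long())
                        .withValueSerde(mySerde));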
Can someone explain to me how versioning in Boost Serialization works? The archive version is always 10 and the class version 0. I thought the version was automatically incremented when the archive differs from the previous version. Do I have to define the version number myself if I change something?
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE boost_serialization>
<boost_serialization signature="serialization::archive" version="10">
<EventSet class_id="0" tracking_level="0" version="0">
<Size>1</Size>
<Event>
...
</Event>
</EventSet>
</boost_serialization>
It is not described in the Boost documentation, but the line boost_serialization signature="serialization::archive" version="10" corresponds to the version of the Boost.Archive library; it changes occasionally when a new version of Boost is released.
As the documentation explains, the line EventSet class_id="0" tracking_level="0" version="0" corresponds to the class version. You can change it in your code with the macro BOOST_CLASS_VERSION(EventSet, 1).
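As a minimal sketch (the EventSet member below is a placeholder, since the real class is not shown in the question), the class version is declared with that macro and arrives as the version parameter of serialize():

#include <boost/serialization/access.hpp>
#include <boost/serialization/nvp.hpp>
#include <boost/serialization/version.hpp>

class EventSet {
    friend class boost::serialization::access;
    int size = 0;  // placeholder member

    template <class Archive>
    void serialize(Archive& ar, const unsigned int version) {
        // 'version' is the class version stored in (or written to) the archive;
        // it stays 0 unless BOOST_CLASS_VERSION below raises it.
        ar & BOOST_SERIALIZATION_NVP(size);
        if (version >= 1) {
            // handle fields that only exist from version 1 onwards
        }
    }
};

// The class version is not bumped automatically; raise it yourself whenever
// the serialized layout of EventSet changes.
BOOST_CLASS_VERSION(EventSet, 1)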
Expanding on the answer by Michael: the Boost archive version depends on your Boost version and is bumped whenever a meaningful change to the output format is made. There is a comment explaining these versions in src/basic_archive.cpp of the Boost.Serialization Git repository at https://github.com/boostorg/serialization/blob/develop/src/basic_archive.cpp At the time of writing it reads:
// this should change if the capabilities are added to the library
// such that archives can be created which can't be read by previous
// versions of this library
// 1 - initial version
// 2 - made address tracking optional
// 3 - numerous changes - can't guarantee compatibility with previous versions
// 4 - Boost 1.34
// added item_version to properly support versioning for collections
// 5 - Boost 1.36
// changed serialization of collections: adding version even for primitive
// types caused backwards compatibility breaking change in 1.35
// 6 - Boost 1.41 17 Nov 2009
// serializing collection sizes as std::size_t
// 7 Boost 1.42 2 Feb 2010
// error - changed binary version to 16 bits w/o changing library version #
// That is - binary archives are recorded with #6 even though they are
// different from the previous versions. This means that binary archives
// created with versions 1.42 and 1.43 will have to be fixed with a special
// program which fixes the library version # in the header
// Boost 1.43 6 May 2010
// no change
// 8 - Boost 1.44
// separated version_type into library_version_type and class_version_type
// changed version_type to be stored as 8 bits.
// 10- fixed base64 output/input.
// 11- not changes
// 12- improved serialization of collections
// 13- simplified visibility, removed Borland, removed pfto
// 14- improved visibility, refactor map/set
// 15- corrections to optional and collection loading
// 16- eliminated dependency on <codecvt> which is buggy in some libraries
// and now officially deprecated in the standard
// 17- Boost 1.68 August 2018
// 18- addressed undefined behavior in archive constructors.
// init() called from base wrote archive header before archive
// was fully constructed.
I am able to reproduce something I would have bet was not possible :-)
An Entity Framework 5 based application populates an entity set named "dbSetName" from a stored procedure result, like this:
DbContext.Database.Initialize(false);
IDbCommand cmd = DbContext.Database.Connection.CreateCommand();
cmd.CommandType = CommandType.StoredProcedure;
cmd.CommandText = sprocName;
DbContext.Database.Connection.Open();
using (DbDataReader reader = (DbDataReader)cmd.ExecuteReader())
{
    var tmpresult = ((IObjectContextAdapter)this.DbContext).ObjectContext.Translate<T>(reader, dbSetName, MergeOption.AppendOnly);
}
The reader is a SqlDataReader. DbContext is derived from System.Data.Entity.DbContext - nothing fancy.
When I execute the stored procedure in SSMS, it returns a single result set - the output of the first and only stand-alone SELECT in its body. It consists of 3383 records.
When enumerated, tmpresult also contains 3383 records, but CONSISTENTLY and without apparent reason, a few records that appear in the raw output are replaced with duplicates of other records from the raw output.
For example, RAW output looks like this
Row Values
1.1 1.2 1.3
2.1 2.2 2.3
3.1 3.2 3.3
4.1 4.2 4.3
5.1 5.2 5.3
...
But the result in tmpresult is consistently like that:
Row Values
1.1 1.2 1.3
2.1 2.2 2.3
3.1 3.2 3.3
2.1 2.2 2.3
5.1 5.2 5.3
...
I can stop in the debugger and verify that I am executing the raw query against the same SQL Server/database/sproc that EF uses... Bizarre...
Not sure what additional info to show, but I hope someone could explain this.
ADDITIONAL INFO: Replacing ObjectContext.Translate<T> with SqlQuery<T> removes the issue. So do I now debug whatever assembly implements ObjectContext.Translate<T>?
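For reference, the SqlQuery<T> variant mentioned above looks roughly like this (a sketch only: sprocName is the stored procedure name from the original snippet, T is the entity/DTO type being materialized, and unlike ObjectContext.Translate<T> with an entity set name, SqlQuery<T> on Database does not attach the results to the context):

// Materialize the rows without change tracking or identity resolution.
var tmpresult = DbContext.Database
    .SqlQuery<T>("EXEC " + sprocName)
    .ToList();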
I have a single-node Cassandra 0.8.2 installation. I have created some column families with cassandra-cli, like
create column family demo;
Now I need to use secondary indexes on this column family. For that I need to update the schema. When I try to update it with cassandra-cli like
update column family demo with comparator=BytesType and column_metadata=[{column_name: col1, validation_class: UTF8Type, index_type: KEYS}];
I get the following error message:
org.apache.cassandra.db.marshal.MarshalException: cannot parse 'col1'
I have tried using bytes('col1'), the assume keyword, ascii and utf8, as well as hex bytes; none of them works.
The same thing works perfectly fine with Cassandra 0.8.4, though.
You answered your own question. It's a bug in 0.8.2 and you should upgrade. (To the latest 0.8 release, which is 0.8.7 at this time.)