Fuzzy facts in CLIPS - fuzzy-logic

I made a fuzzy template that represents a student's knowledge in a certain domain. The problem is that when I declare the student John as low, he is also matched as med, because low and med overlap between 30 and 40.
How can I declare a student as being low, without being med?
Note that I know I can do something like (student (name John) (knowledge (20 0) (21 1) (22 0))) , but what if I want to declare him using the fuzzy value?
(deftemplate fz-knowledge
0 100
( (low (20 1) (40 0))
(med (30 0) (50 1) (70 0))
(high (60 0) (80 1))
))
(deftemplate student
(slot name)
(slot knowledge (type FUZZY-VALUE fz-knowledge))
)
(deffacts students
(student (name John) (knowledge low) )
)

By having overlapping ranges it kind of makes sense. But maybe overlapping isn't what you want to do. What about:
(deftemplate fz-knowledge
0 100
( (low (20 1) (40 0))
(high (60 0) (80 1))
(med NOT [ low OR high ] )
))
That way, you can clearly tell when a score is low or high, and the loosey-goosey med just fills in the cracks.
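To see what the redefined med looks like numerically, here is a minimal Python sketch. It assumes the usual conventions: membership is linearly interpolated between the listed points, OR is the maximum, and NOT is the complement (1 - x). The function names are hypothetical and only mirror the template above; this is not FuzzyCLIPS itself.

```python
def ramp_down(x, full, zero):
    """Membership is 1 up to `full`, then falls linearly to 0 at `zero`."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


def ramp_up(x, zero, full):
    """Membership is 0 up to `zero`, then rises linearly to 1 at `full`."""
    if x <= zero:
        return 0.0
    if x >= full:
        return 1.0
    return (x - zero) / (full - zero)


def low(x):
    return ramp_down(x, 20, 40)   # (low (20 1) (40 0))


def high(x):
    return ramp_up(x, 60, 80)     # (high (60 0) (80 1))


def med(x):
    # med NOT [ low OR high ], assuming OR = max and NOT = 1 - x
    return 1.0 - max(low(x), high(x))


if __name__ == "__main__":
    for score in (10, 30, 50, 70, 90):
        print(score, low(score), med(score), high(score))
```

Under these assumptions a score of 10 is fully low and not med at all, while a score of 30 is still half low and half med, so the sets meet only on the slopes.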

Count(*) -1 in the denominator - sql

Can someone please explain the count(*) - 1 in the denominator here? Why is the -1 needed in the query below?
Q: The query finds the average number of days between orders for each customer.
A: select CustomerID
, cast(DATEDIFF(dd, min(OrderDate), max(OrderDate)) as decimal) / (count(*) - 1) as [Avg_day]
from Orders
group by CustomerID
having count(*) > 1
Consider a sequence of times such as:
A........B........C........D
You want to find the average time between two events. Well, this is defined as:
( (B - A) + (C - B) + (D - C) ) / 3
You can expand this out:
B/3 - A/3 + C/3 - B/3 + D/3 - C/3
Notice that Bs and Cs cancel out, so you are left with:
-A/3 + D/3
which is
(D - A) / 3
That is your original expression. The 3 is one less than the number of points you started with.
This generalizes to any number of events. The divisor is one less than the total number of events (really, the number of adjacent pairs).
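The cancellation above can be checked numerically. The following is a minimal Python sketch with made-up order dates (the dates and the function name are illustrative assumptions, not data from the question); it computes the average gap both pair by pair and with the (max - min) / (count - 1) shortcut.

```python
from datetime import date


def average_gap_days(order_dates):
    """Return (pairwise average, shortcut average) of days between orders."""
    dates = sorted(order_dates)
    if len(dates) < 2:
        # Mirrors the HAVING count(*) > 1 filter: one order has no gaps.
        raise ValueError("need at least two orders")
    # Gaps between each adjacent pair of orders.
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    pairwise = sum(gaps) / len(gaps)
    # Telescoped form: only the first and last dates survive.
    shortcut = (dates[-1] - dates[0]).days / (len(dates) - 1)
    return pairwise, shortcut


if __name__ == "__main__":
    orders = [date(2024, 1, 1), date(2024, 1, 11),
              date(2024, 1, 16), date(2024, 1, 31)]
    print(average_gap_days(orders))
```

Four orders give three gaps (10, 5 and 15 days here), so both forms divide 30 by 3.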

How to calculate defect severity index?

Let's say my defect severity levels are 4 (Critical), 3 (Serious), 2 (Medium), 1 (Low). The total number of defects is 4.
We can use the following steps
1) We assign a number to each severity, e.g. Blocker=9, Critical=8, Major=3, Minor=2, Trivial=1
2) Then we multiply the number of issues in each category by the assigned number and add the results: (Number of Blocker issues * 9) + (Number of Critical issues * 8) + ... and so on for the remaining categories
3) Then we divide by the total issue count
ex: ( (Number of Blocker issues * 9) + (Number of Critical issues * 8) + (Number of Major issues * 3) + (Number of Minor issues * 2) + (Number of Trivial issues * 1) ) / Total issue count
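The steps above can be sketched in Python as follows. The weights are the ones listed in step 1; the defect counts in the example are invented for illustration and the function name is hypothetical.

```python
# Severity weights taken from step 1 above.
WEIGHTS = {"Blocker": 9, "Critical": 8, "Major": 3, "Minor": 2, "Trivial": 1}


def defect_severity_index(counts, weights=WEIGHTS):
    """Weighted sum of defect counts divided by the total defect count."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no defects logged, index is undefined")
    weighted = sum(weights[severity] * n for severity, n in counts.items())
    return weighted / total


if __name__ == "__main__":
    # Illustrative counts: one defect in each of four categories.
    counts = {"Blocker": 1, "Critical": 1, "Major": 1, "Minor": 1}
    print(defect_severity_index(counts))
```

With one defect in each of those four categories the index is (9 + 8 + 3 + 2) / 4.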
@hirosht Defect Severity Index provides a measurement of the quality of a product under test. If the DSI drops over multiple test iterations, that may indicate that the quality of the product/feature is increasing. That said, the number alone may mislead us: before treating it as an indication of increasing quality, we also need to take into account the number of defects logged per iteration and the severity of the defects identified in each cycle.

Is a better solution like a single set operation possible

I can't think of a single T-SQL operation through which the following problem can be solved. I can only think of a record-by-record operation to solve it.
The problem is as follows:
For each village a number of shops are assigned ( from 1 to n).
Same shop can serve more than one village.
Each shop has different maximum capacity (that is given in a table)
Need to assign all members of a family (based on family id) to the same shop in such a way that a 'nearly' equal number of families is assigned to each FPS. As the number of families may not be evenly divisible by the number of FPS, a few shops may get one additional family. While assigning the last family, it is acceptable if the FPS max capacity is exceeded by a few members. This would not happen, however, if the last family has just one member.
Some families may remain unassigned if the max capacity is reached for every FPS assigned to that village.
Available tables
Population: Uniqid, Familyid, name, shopcode, villagecode
Village: VillageId
Shop: ShopId, Name, MaxCapacity
VillageShopMap: VillageId, ShopId
My solution is as follows
Take each village
Get one Family for that village
Get a shop with minimum number of person allotted for that village , whose current capacity < max Capacity
Continue until that population from that village is exhausted, or Shop MaxCapacity is reached (in that case some people remain unassigned to shops, that is acceptable)
Loop
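For reference, the loop described above can be sketched in Python for a single village. This is an illustration of the procedure, not the original T-SQL; the data shapes (a list of family sizes and a dict of shop capacities) and all names are assumptions made for the example.

```python
def assign_families(families, shops):
    """Greedy assignment for one village.

    families: list of (family_id, member_count)
    shops:    dict of shop_id -> max capacity
    Returns (assignment dict family_id -> shop_id, list of unassigned ids).
    """
    load = {shop_id: 0 for shop_id in shops}   # persons allotted so far
    assignment, unassigned = {}, []
    for family_id, size in families:
        # Shops that still have room before this family is added.
        open_shops = [s for s in shops if load[s] < shops[s]]
        if not open_shops:
            unassigned.append(family_id)
            continue
        # Pick the shop with the fewest persons allotted (ties by shop id).
        target = min(open_shops, key=lambda s: (load[s], s))
        load[target] += size   # may exceed capacity by a few members
        assignment[family_id] = target
    return assignment, unassigned


if __name__ == "__main__":
    families = [("F1", 4), ("F2", 3), ("F3", 5), ("F4", 2), ("F5", 1)]
    shops = {"S1": 6, "S2": 6}
    print(assign_families(families, shops))
```

In this example S2 ends up over capacity by two members after F3, which the problem statement allows, and F5 stays unassigned because both shops are full.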
My solution is extremely slow. Looking for a better solution.
Thanks
It is not much, but you could use this to fill a shop in one pass.
In this case 20 is the shop capacity.
The top 20 is just there to avoid evaluating more rows than needed, since every family has at least one member.
This could leave some shops empty.
To avoid that, you could scale the capacity down to a fraction of the actual capacity.
with famA as
( select top 20 sParID as ID, count(*) as famSize
from docSVsys
group by sParID
)
, fam as
( select famA.*, ROW_NUMBER() over (order by ID) as rn
from famA
)
, famCum as
( select fam.ID, famSize, fam.rn,
(select sum(f.famSize) from fam f where f.rn <= fam.rn) as cum
from fam
)
select famCum.*
from famCum
where famCum.rn <= (select max(f.rn) from famCum f where f.cum <= 20) + 1
order by famCum.rn
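The idea behind the query, a running total of family sizes cut off one family past the capacity, can be sketched in Python. The family sizes below are invented and the function name is hypothetical; the logic mirrors the famCum step under the assumption that every family has at least one member.

```python
from itertools import accumulate


def families_for_shop(family_sizes, capacity):
    """family_sizes: list of (family_id, size) in assignment order.

    Returns the families whose running total fits within `capacity`,
    plus the next one (which may push the shop slightly over).
    """
    running = list(accumulate(size for _, size in family_sizes))
    # Count of families whose cumulative size is still within capacity.
    fitting = sum(1 for total in running if total <= capacity)
    if fitting == 0:
        # Mirrors the SQL: max() over no rows is NULL, so nothing is returned.
        return []
    return family_sizes[:fitting + 1]


if __name__ == "__main__":
    sizes = [("A", 8), ("B", 7), ("C", 4), ("D", 6), ("E", 3)]
    print(families_for_shop(sizes, 20))
```

Here the running totals are 8, 15, 19, 25, 28, so three families fit within 20 and the fourth is included as the allowed overflow.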
Repeating shopcode and villagecode in Population is not 3NF.
You should have a Family table, and I would denormalize and store a famsize in it so you are not calculating the size over and over.
Or assume you have the above Family table and a ShopView with CurCapacity.
You can then assign one family to each open shop in one pass:
with ShopOne as
( select ShopId, min(VillageID) as VillageID
from ShopView
where CurCapacity < MaxCapacity
group by ShopId
)
, FamilyRn as
( select Family.*, row_number() over (partition by VillageID order by ID) as rn
from Family where ShopID is null
)
select FamilyRn.*, ShopOne.*
from ShopOne
join FamilyRn
on ShopOne.VillageID = FamilyRn.VillageID
and FamilyRn.rn = 1

How to group results from a query based on one-to-many association by some criterion in the "many"?

Please forgive the awkward title. I had a hard time distilling my question into one phrase. If anyone can come up with a better one, feel free.
I have the following simplified schema:
vendors
INT id
locations
INT id
INT vendor_id
FLOAT latitude
FLOAT longitude
I am perfectly capable of returning a list of the nearest locations, sorted by proximity, limited by an approximation of radius:
SELECT * FROM locations
WHERE latitude IS NOT NULL AND longitude IS NOT NULL
AND ABS(latitude - 30) + ABS(longitude - 30) < 50
ORDER BY ABS(latitude - 30) + ABS(longitude - 30) ASC
I can't at this moment find my way around the repetition of the order/limit term. I initially attempted aliasing it as "distance" among the SELECT fields, but psql told me that this alias wasn't available in the WHERE clause. Fine. If there's some fancy pants way around this, I'm all ears, but on to my main question:
What I'd like to do is to return a list of vendors, each joined with the closest of its locations, and have this list ordered by proximity and limited by radius.
So supposing I have 2 vendors, each with two locations. I want a query that limits the radius such that only one of the four locations is within it to return that location's associated vendor alongside the vendor itself. If the radius encompassed all the locations, I'd want vendor 1 presented with the closest between its locations and vendor 2 with the closest between its locations, ultimately ordering vendors 1 and 2 based on the proximity of their closest location.
In MySQL, I managed to get the closest location in each vendor's row by using GROUP BY and then MIN(distance). But PostgreSQL seems to be stricter on the usage of GROUP BY.
I'd like to, if possible, avoid meddling with the SELECT clause. I'd also like to, if possible reuse the WHERE and ORDER parts of the above query. But these are by no means absolute requirements.
I have made hackneyed attempts at DISTINCT ON and GROUP BY, but these gave me a fair bit of trouble, mostly in terms of me missing mirrored statements elsewhere, which I won't elaborate in great detail on now.
Solution
I ended up adopting a solution based off OMG Ponies' excellent answer.
SELECT vendors.* FROM (
SELECT locations.*,
ABS(locations.latitude - 2.1) + ABS(locations.longitude - 2.1) AS distance,
ROW_NUMBER() OVER(PARTITION BY locations.locatable_id, locations.locatable_type
ORDER BY ABS(locations.latitude - 2.1) + ABS(locations.longitude - 2.1) ASC) AS rank
FROM locations
WHERE locations.latitude IS NOT NULL
AND locations.longitude IS NOT NULL
AND locations.locatable_type = 'Vendor'
) ranked_locations
INNER JOIN vendors ON vendors.id = ranked_locations.locatable_id
WHERE (ranked_locations.rank = 1)
AND (ranked_locations.distance <= 0.5)
ORDER BY ranked_locations.distance;
Some deviations from OMG Ponies' solution:
Locations are now polymorphically associated via _type. A bit of a premise change.
I moved the join outside the subquery. I don't know if there are performance implications, but it made sense in my mind to see the subquery as a getting of locations and partitioned rankings and then the larger query as an act of bringing it all together.
(Minor) Took away table name aliasing. Although I'm plenty used to aliasing, it just made it harder for me to follow along. I'll wait until I'm more experienced with PostgreSQL before working in that flair.
For PostgreSQL 8.4+, you can use analytics like ROW_NUMBER:
SELECT x.*
FROM (SELECT v.*,
t.*,
ABS(t.latitude - 30) + ABS(t.longitude - 30) AS distance,
ROW_NUMBER() OVER(PARTITION BY v.id
ORDER BY ABS(t.latitude - 30) + ABS(t.longitude - 30)) AS rank
FROM VENDORS v
JOIN LOCATIONS t ON t.vendor_id = v.id
WHERE t.latitude IS NOT NULL
AND t.longitude IS NOT NULL) x
WHERE x.rank = 1
AND x.distance < 50
ORDER BY x.distance
I left the filtering on distance, in case the top ranked value was over 50 so the vendor would not appear. Remove the distance check being less than 50 portion if you don't want this to happen.
ROW_NUMBER will return a distinct sequential value that resets for every vendor in this example. If you want duplicates, you'd need to look at using DENSE_RANK.
See this article for emulating ROW_NUMBER on PostgreSQL pre-8.4.
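What the ROW_NUMBER query computes can also be written out procedurally. The Python sketch below is only an illustration of the logic (keep each vendor's closest location, drop vendors whose closest location is outside the radius, sort by distance); the sample rows and the function name are assumptions, and it uses the same ABS + ABS distance approximation as the queries above.

```python
def nearest_location_per_vendor(locations, lat, lon, radius):
    """Return [(distance, location)] with one entry per vendor, nearest first."""
    best = {}  # vendor_id -> (distance, location)
    for loc in locations:
        if loc["latitude"] is None or loc["longitude"] is None:
            continue  # same as the IS NOT NULL filters
        distance = abs(loc["latitude"] - lat) + abs(loc["longitude"] - lon)
        current = best.get(loc["vendor_id"])
        # Keep only the closest location per vendor (the rank = 1 row).
        if current is None or distance < current[0]:
            best[loc["vendor_id"]] = (distance, loc)
    hits = [pair for pair in best.values() if pair[0] < radius]
    return sorted(hits, key=lambda pair: pair[0])


if __name__ == "__main__":
    rows = [
        {"id": 1, "vendor_id": 1, "latitude": 50, "longitude": 50},
        {"id": 2, "vendor_id": 1, "latitude": 35, "longitude": 30},
        {"id": 3, "vendor_id": 2, "latitude": 5, "longitude": 30},
    ]
    for distance, loc in nearest_location_per_vendor(rows, 30, 30, 50):
        print(loc["vendor_id"], loc["id"], distance)
```

Because the radius filter is applied after picking the closest location, a vendor disappears entirely when even its nearest location is too far away.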
MySQL extends GROUP BY and not all columns are required to be aggregates. http://dev.mysql.com/doc/refman/5.0/en/group-by-hidden-columns.html
I have seen many questions here with the same issue. The trick is to get the necessary columns in a subquery and then join it back to the table in the outer query:
create temp table locations (id int, vender_id int, latitude int, longitude int);
CREATE TABLE
insert into locations values
(1, 1, 50, 50),
(2, 1, 35, 30),
(3, 2, 5, 30)
;
SELECT
locations.*, distance
FROM
(
SELECT
vender_id,
MIN(ABS(latitude - 30) + ABS(longitude - 30)) as distance
FROM locations
WHERE latitude IS NOT NULL AND longitude IS NOT NULL
GROUP BY vender_id
) AS min_locations
JOIN locations ON
ABS(latitude - 30) + ABS(longitude - 30) = distance
AND min_locations.vender_id = locations.vender_id
WHERE distance < 50
ORDER BY distance
;
id | vender_id | latitude | longitude | distance
----+-----------+----------+-----------+----------
2 | 1 | 35 | 30 | 5
3 | 2 | 5 | 30 | 25

Does this MySQL "spatial" query work in SQL Server 2008 as well?

Before I embark on a pretty decent overhaul of my web app to use a spatial query, I'd like to know if this MySQL query works in SQL Server 2008:
SELECT id, ( 3959 * acos( cos( radians(37) ) * cos( radians( lat ) ) *
cos( radians( lng ) - radians(-122) ) + sin( radians(37) ) *
sin( radians( lat ) ) ) ) AS distance
FROM markers HAVING distance < 25
ORDER BY distance LIMIT 0 , 20;
Or is there a better way to do this in SQL Server 2008?
My database currently stores the lat/long of businesses near military bases in Japan. However, I'm querying the table to find businesses that contain the specified base's id.
Biz table
----------------------
PK BizId bigint (auto increment)
Name
Address
Lat
Long
**FK BaseId int (from the MilBase table)**
A spatial query, based on having a center lat/long and given radius (in km) would be a better fit for the app and would open up some new possibilities.
Any help is greatly appreciated!
It looks like you're selecting the distance between two points. In SQL Server 2008, you can use the STDistance method of the geography data type. This will look something like this:
SELECT TOP 20
id,
geography::STGeomFromText('POINT(-122.0 37.0)', 4326).STDistance(p) / 1609.344 AS distance -- meters converted to miles
FROM markers
WHERE geography::STGeomFromText('POINT(-122.0 37.0)', 4326).STDistance(p) < 25 * 1609.344 -- 25 miles in meters
ORDER BY geography::STGeomFromText('POINT(-122.0 37.0)', 4326).STDistance(p);
Here p is a field of type geography instead of two separate decimal fields. With SRID 4326, STDistance returns meters, hence the conversion factor of 1609.344 meters per mile. You will probably also want to create a spatial index on your p field for better performance.
To use the geography data type, simply specify your field as geography in your CREATE TABLE:
CREATE TABLE markers (
id int IDENTITY (1,1) PRIMARY KEY, -- a spatial index requires a primary key on the table
p geography,
title varchar(100)
);
Inserting values into your markers table will now look like this:
INSERT INTO markers (p, title) -- id is an IDENTITY column, so it is generated automatically
VALUES (
geography::STGeomFromText('POINT(-122.0 37.0)', 4326),
'My Marker'
);
Where -122.0 is the longitude, and 37.0 is the latitude.
Creating a spatial index would look something like this:
CREATE SPATIAL INDEX ix_sp_markers
ON markers(p)
USING GEOGRAPHY_GRID
WITH ( GRIDS = (HIGH, HIGH, HIGH, HIGH),
CELLS_PER_OBJECT = 2,
PAD_INDEX = ON);
If you are only interested in retrieving points within 25 miles, there is no need to use spherical or great-circle math in the distance calculation. The standard Cartesian distance formula is more than sufficient:
Where Square(Delta-X) + Square(Delta-Y) < 625
(625 is 25 squared.) All you need to do is convert the difference in latitudes and the difference in longitudes to distances in whatever units you are using (statute miles, nautical miles, whatever).
If you are using nautical miles, each degree of latitude = 60 nm,
and each degree of longitude = 60 * cos(Latitude) nm.
Since both points are within 25 miles of one another, you don't even need to worry about how this factor differs from one point to the other.
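A minimal Python sketch of this flat-earth check, working in nautical miles as described above. The function name and the sample coordinates are assumptions for illustration; the longitude scale factor is taken at the first point's latitude, which is the simplification the answer says is acceptable at this range.

```python
import math

NM_PER_DEGREE = 60.0  # nautical miles per degree of latitude


def within_radius_nm(lat1, lon1, lat2, lon2, radius_nm):
    """Approximate test: is point 2 within radius_nm nautical miles of point 1?"""
    # North-south separation in nautical miles.
    delta_y = (lat2 - lat1) * NM_PER_DEGREE
    # East-west separation, shrunk by cos(latitude) of the first point.
    delta_x = (lon2 - lon1) * NM_PER_DEGREE * math.cos(math.radians(lat1))
    # Compare squared distances to avoid a square root.
    return delta_x ** 2 + delta_y ** 2 < radius_nm ** 2


if __name__ == "__main__":
    print(within_radius_nm(37.0, -122.0, 37.25, -122.0, 25))  # 15 nm north
    print(within_radius_nm(37.0, -122.0, 37.5, -122.0, 25))   # 30 nm north
```

The approximation degrades near the poles and across the 180 degree meridian, so it is only a sketch for small radii at moderate latitudes.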