Predicting a result from an imbalanced distribution - data-science

Hi, below is a sample dataset from a quality control test of electronic parts:
import numpy as np
UnderTesting = np.array([47098, 46729, 45612, 43297, 40085, 36365, 32562, 28947, 25992,
23615, 21475, 19964, 18952, 18138, 17393, 16659, 16117, 15656,
15186, 14715, 14300, 13678, 12344, 11664, 11159, 10669, 10155,
9688, 9066, 8443, 7838, 7121, 6542, 6045, 5535, 5078,
4569, 4205, 3884, 3549, 3276, 3010, 2783, 2576, 2379,
2165, 1940, 1796, 1518, 1377, 1237, 1123, 1044, 982,
933, 886, 836, 777, 718, 678, 635, 603, 571,
546, 509, 473, 448, 416, 398, 379, 362, 338,
319, 310, 296, 286, 273, 260, 219, 199, 188,
181, 172, 168, 164, 156, 146, 142, 139, 137,
134, 129, 125, 122, 120, 108, 100, 97, 94,
91, 88, 85, 84, 82, 77, 75, 71, 67,
66, 65, 63, 63, 63, 62, 58, 57, 54,
53, 52, 50])
DailyFailure = np.array([11855, 11704, 11257, 10484, 9493, 8428, 7374, 6351, 5423,
4727, 4094, 3619, 3238, 2915, 2627, 2349, 2145, 2009,
1864, 1737, 1621, 1492, 1363, 1279, 1209, 1138, 1065,
997, 922, 864, 821, 778, 734, 703, 654, 606,
561, 529, 501, 465, 436, 394, 361, 340, 323,
302, 290, 275, 267, 249, 233, 220, 212, 203,
199, 186, 181, 173, 167, 164, 162, 158, 152,
148, 137, 130, 127, 121, 116, 111, 109, 105,
99, 98, 95, 89, 86, 82, 81, 77, 72,
70, 67, 67, 66, 64, 60, 60, 60, 59,
59, 56, 55, 54, 54, 46, 43, 42, 41,
40, 40, 40, 40, 39, 36, 36, 35, 32,
32, 32, 32, 32, 32, 32, 30, 30, 29,
29, 29, 28])
Both arrays are indexed by day, from Day 0 to Day 119 after the test started: for each day, UnderTesting holds the number of parts still under the continuous stress test, and DailyFailure holds the number of parts that failed that day.
We are now trying to build a prediction based on the percentage of failure, computed from the sample dataset as (DailyFailure / UnderTesting) * 100.
Question 1: How can we detect that a breakout day is needed because the data changed behavior, when the sample percentage on that day is too small to rely on the breakout?
Question 2: How can we predict the failure percentage from this imbalanced distribution while avoiding bias?
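Not part of the original question, but here is a minimal sketch of the kind of check both questions call for, assuming each day's failures can be treated as binomial draws from the parts still under test (statsmodels is used purely as one convenient option):
import numpy as np
from statsmodels.stats.proportion import proportion_confint

# UnderTesting and DailyFailure as defined above
days = np.arange(len(UnderTesting))
failure_pct = DailyFailure / UnderTesting * 100

# Question 1: treat a day-over-day jump as a breakout candidate only if it
# exceeds that day's statistical noise (Wilson interval width, in pct points)
ci_low, ci_high = proportion_confint(DailyFailure, UnderTesting,
                                     alpha=0.05, method="wilson")
ci_width_pct = (ci_high - ci_low) * 100
candidate_breakout = np.abs(np.diff(failure_pct)) > ci_width_pct[1:]

# Question 2: when fitting a trend, weight each day by its sample size so the
# noisy low-count tail does not dominate the fit (binomial-style sqrt(n) weights)
coef = np.polyfit(days, failure_pct, deg=2, w=np.sqrt(UnderTesting))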

Related

How to plot two clusters using a dictionary file

I have a dictionary saved in a .npy file that contains two sets of cluster IDs that I want to plot in a scatter plot (the ID values saved under key '0' form one cluster, and the ID values saved under key '1' form the other).
My script:
import numpy as np
import matplotlib.pyplot as plt
data = np.load("dict.npy", allow_pickle=True)
print(data)
array({0: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,
52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64,
65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77,
78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 90, 91,
92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 125,
126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138,
139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151,
152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177,
178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190,
191, 192, 193, 194, 195, 196, 197, 198, 199]), 1: array([ 89, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115,
116, 117, 118, 119, 120, 121, 122, 123, 124])}, dtype=object)
An example, as you requested:
#you will need these libraries:
import numpy as np
from sklearn.cluster import KMeans
from matplotlib import pyplot as plt
Then generate some random 2D data, just for this example:
#the data you want to cluster
X = np.random.multivariate_normal(mean=[1,2], cov=[[.5, .25], [.25,.75]], size=1800)
plt.scatter(*X.T, alpha=.25, color="k")
Finally run the clustering and see the result:
X_cluster = KMeans(n_clusters=2).fit_predict(X)
for c in set(X_cluster):
    plt.scatter(*X[X_cluster == c].T, alpha=.25)
plt.figure(figsize=(7, 7))
data = np.load("dict.npy", allow_pickle=True).item()  # .item() unwraps the 0-d object array into a plain dict
for cluster in data:
    plt.scatter(X[data[cluster], 0], X[data[cluster], 1])
plt.show()
where X is the dataset that you used for the clustering, with shape (N, 2) (N is the number of samples).
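As a side note (not from the original answer): the reason .item() is needed is that np.save wraps a plain dict in a 0-d object array. A minimal round-trip sketch, assuming the same dict.npy file name:
import numpy as np

clusters = {0: np.arange(5), 1: np.arange(5, 8)}
np.save("dict.npy", clusters)                    # np.save wraps the dict in a 0-d object array
loaded = np.load("dict.npy", allow_pickle=True)
print(loaded.shape)                              # () -- a 0-d array, not directly iterable
d = loaded.item()                                # unwrap back to a plain dict
print(d[0], d[1])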

How can I fix ODOO 9 mismatching shipping address between [/shop/checkout] and [/payment] in the Website Sale module?

My problem is that if I select a shipping address on the website's checkout page (among the various addresses available for that user) and click CONFIRM, the shipping address on the next page is not the one I selected.
It seems like Odoo is losing track of the ID of the address I want (with parent_id linked to the user) and always sets the same one.
Has anybody else encountered this issue? Why won't the shipping ID remain 85262?
I tried debugging the code but I can't understand what is causing this problem.
Many thanks
[CHECKOUT] https://github.com/odoo/odoo/blob/9.0/addons/website_sale/controllers/main.py
PARTNER SHIPPING ID [inizio] CONFIRM ORDER >>> :: :: 85261
2019-05-30 09:00:25,262 11905 DEBUG retail openerp.addons.website_sale.controllers.main:
ShippingID TRY:: :: 85262
2019-05-30 09:00:25,266 11905 DEBUG retail openerp.addons.website_sale.controllers.main:
ShippingID (CHECKOUT VALUES):: :: 85262
2019-05-30 09:00:25,266 11905 DEBUG retail openerp.addons.website_sale.controllers.main:
Values AL TERMINE DI CHECKOUT:: :: {'states': res.country.state(1, 10, 9, 12, 11, 13, 14, 15, 17, 16, 18, 19, 20, 24, 21, 22, 23, 25, 26, 27, 42, 41, 28, 43, 44, 46, 45, 29, 36, 37, 30, 32, 33, 34, 2, 3, 31, 35, 38, 39, 40, 47, 4, 48, 5, 49, 50, 6, 51, 52, 53, 55, 7, 54, 8, 56, 58, 57, 59), 'has_check_vat': True, 'only_services': False, 'shipping_id': 85262, 'countries': res.country(3, 16, 6, 63, 12, 1, 9, 5, 10, 4, 11, 7, 15, 14, 13, 17, 33, 24, 20, 19, 37, 21, 38, 26, 28, 34, 30, 31, 18, 36, 35, 32, 106, 29, 23, 22, 25, 117, 48, 39, 53, 124, 41, 216, 47, 49, 55, 40, 50, 119, 43, 42, 46, 51, 98, 52, 54, 56, 57, 60, 59, 61, 62, 225, 64, 66, 211, 88, 68, 65, 70, 73, 75, 72, 71, 76, 80, 217, 77, 85, 79, 58, 81, 82, 89, 84, 78, 87, 92, 91, 83, 86, 93, 94, 99, 96, 238, 97, 95, 100, 109, 105, 101, 108, 107, 102, 104, 103, 110, 45, 112, 114, 111, 113, 125, 115, 118, 123, 116, 126, 135, 127, 132, 131, 136, 129, 133, 134, 148, 144, 142, 156, 158, 155, 145, 153, 143, 150, 151, 154, 248, 157, 74, 139, 138, 147, 140, 152, 137, 159, 146, 160, 169, 168, 166, 8, 170, 161, 172, 165, 162, 164, 171, 163, 149, 121, 167, 173, 179, 186, 184, 174, 177, 187, 175, 178, 182, 180, 176, 185, 183, 188, 189, 190, 192, 193, 27, 200, 120, 128, 141, 181, 210, 239, 246, 205, 194, 206, 191, 196, 204, 199, 212, 203, 201, 195, 207, 250, 90, 122, 209, 69, 130, 197, 208, 202, 214, 198, 44, 213, 229, 220, 230, 219, 218, 221, 224, 227, 223, 226, 222, 215, 228, 232, 231, 2, 233, 235, 236, 234, 237, 244, 240, 243, 241, 242, 245, 67, 247, 249, 252, 251, 253), 'shippings': res.partner(85262,), 'error': {}, 'checkout': {'city': u'citt\xe0 invisibile', 'name': u'aaa-pluto', 'zip': u'17325', 'shipping_name': u'via Pisa', 'shipping_state_id': 15, 'street2': u'via questa', 'shipping_street': u'via pisa', 'country_id': 235, 'shipping_id': 85262, 'phone': u'2', 'shipping_zip': u'17325', 'street': u'strada per fatturazione', 'shipping_country_id': 235, 'state_id': 15, 'email': u'pluto#doglover.com', 'vat': u'', 'shipping_city': u'livorno'}}
2019-05-30 09:00:25,266 11905 DEBUG retail openerp.addons.website_sale.controllers.main:
Values ORDER :: :: {'states': res.country.state(1, 10, 9, 12, 11, 13, 14, 15, 17, 16, 18, 19, 20, 24, 21, 22, 23, 25, 26, 27, 42, 41, 28, 43, 44, 46, 45, 29, 36, 37, 30, 32, 33, 34, 2, 3, 31, 35, 38, 39, 40, 47, 4, 48, 5, 49, 50, 6, 51, 52, 53, 55, 7, 54, 8, 56, 58, 57, 59), 'has_check_vat': True, 'only_services': False, 'shipping_id': 85262, 'countries': res.country(3, 16, 6, 63, 12, 1, 9, 5, 10, 4, 11, 7, 15, 14, 13, 17, 33, 24, 20, 19, 37, 21, 38, 26, 28, 34, 30, 31, 18, 36, 35, 32, 106, 29, 23, 22, 25, 117, 48, 39, 53, 124, 41, 216, 47, 49, 55, 40, 50, 119, 43, 42, 46, 51, 98, 52, 54, 56, 57, 60, 59, 61, 62, 225, 64, 66, 211, 88, 68, 65, 70, 73, 75, 72, 71, 76, 80, 217, 77, 85, 79, 58, 81, 82, 89, 84, 78, 87, 92, 91, 83, 86, 93, 94, 99, 96, 238, 97, 95, 100, 109, 105, 101, 108, 107, 102, 104, 103, 110, 45, 112, 114, 111, 113, 125, 115, 118, 123, 116, 126, 135, 127, 132, 131, 136, 129, 133, 134, 148, 144, 142, 156, 158, 155, 145, 153, 143, 150, 151, 154, 248, 157, 74, 139, 138, 147, 140, 152, 137, 159, 146, 160, 169, 168, 166, 8, 170, 161, 172, 165, 162, 164, 171, 163, 149, 121, 167, 173, 179, 186, 184, 174, 177, 187, 175, 178, 182, 180, 176, 185, 183, 188, 189, 190, 192, 193, 27, 200, 120, 128, 141, 181, 210, 239, 246, 205, 194, 206, 191, 196, 204, 199, 212, 203, 201, 195, 207, 250, 90, 122, 209, 69, 130, 197, 208, 202, 214, 198, 44, 213, 229, 220, 230, 219, 218, 221, 224, 227, 223, 226, 222, 215, 228, 232, 231, 2, 233, 235, 236, 234, 237, 244, 240, 243, 241, 242, 245, 67, 247, 249, 252, 251, 253), 'shippings': res.partner(85262,), 'error': {}, 'checkout': {'city': u'citt\xe0 invisibile', 'name': u'aaa-pluto', 'zip': u'17325', 'shipping_name': u'via Pisa', 'shipping_state_id': 15, 'street2': u'via questa', 'shipping_street': u'via pisa', 'country_id': 235, 'shipping_id': 85262, 'phone': u'2', 'shipping_zip': u'17325', 'street': u'strada per fatturazione', 'shipping_country_id': 235, 'state_id': 15, 'email': u'pluto#doglover.com', 'vat': u'', 'shipping_city': u'livorno'}}
2019-05-30 09:00:26,836 11905 DEBUG retail openerp.addons.website_sale.controllers.main:
PARTNER SHIPPING ID [fine] CONFIRM ORDER >>> :: :: 85261
Add loop "for record_or_somthing in self: " before your condition and replace self in condition by "record_or_somthing".
for exemple :
for record in self:
if record.invoice_count >= 1:
record.client_actif = True
else:
record.client_actif = False
Thanks Saloua, I solved it this way.
As I suspected, the partner shipping ID was never saved in the DB.
order_obj.write(cr, SUPERUSER_ID, [order.id], order_info, context=context)
# DAVIDE BUG FIX #
# THIS LINE NEEDS TO BE ADDED AT THE END OF CHECKOUT_FORM_SAVE #
order.write({'partner_shipping_id': checkout.get('shipping_id')})
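A hypothetical verification sketch (not from the original post), in the same old-API style as the fix above, to confirm the value actually persisted:
# Re-browse the order and check that the shipping partner now matches
order_check = order_obj.browse(cr, SUPERUSER_ID, [order.id], context=context)[0]
assert order_check.partner_shipping_id.id == checkout.get('shipping_id')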

INSERT INTO with WHERE IN clause

INSERT INTO dbo.ASTMListCustom ([ASTMID], [EDMID], [Distance], [Selected])
SELECT
ASTMID, 'HWG - VT', 1, 1
FROM
dbo.ASTMListCustom
WHERE
ASTMID IN ( 15, 21, 22, 23, 25, 38, 63, 72, 73, 74, 75, 82, 83, 125, 130, 163, 165, 182, 206, 207, 208, 214, 217, 250, 255, 256, 257, 264, 266, 299, 317, 342, 348, 349, 350, 357, 381, 382, 391, 392, 397, 398, 422, 448, 450, 451, 466, 481, 9, 12, 17, 18, 26, 61, 62, 67, 68, 69, 70, 77, 85, 92, 93, 94, 95, 126, 128, 129, 136, 137, 145, 146, 153, 154, 179, 203, 211, 213, 219, 221, 237, 253, 254, 261, 262, 301, 326, 327, 328, 329, 343, 346, 353, 368, 369, 386, 394, 436)
I'm trying to insert one row for each ASTMID, but it ends up multiplying the number of rows per ASTMID, and the extra rows are exact duplicates.
It appears that your ASTMListCustom table has duplicate rows for the ASTMIDs on your list. You can fix this by adding a GROUP BY to your SELECT:
INSERT INTO dbo.ASTMListCustom
([ASTMID]
,[EDMID]
,[Distance]
,[Selected])
SELECT ASTMID
,'HWG - VT'
,1
,1
FROM dbo.ASTMListCustom
WHERE ASTMID IN ( 15, ... )
GROUP BY ASTMID

LISTAGG function gives me duplicates

Each district # in this query corresponds to more than one state. LISTAGG is great for concatenating the list of states for each district #; however, it is giving me duplicate states. How can I fix this? I should add that the reason I get duplicates is the number of retail stores in each district: I have multiple stores across a number of states within each district. But I'd like to see if there is a way to get just the unique states...
SELECT distinct DISTRICT_NBR,
LISTAGG( str_state_abbr, ',')
WITHIN GROUP (order by str_state_abbr) AS States
from DIM_LOCATION
where DISTRICT_NBR in (1, 3, 4, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18, 20, 21, 22, 23, 25, 28, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 44, 45, 46, 47, 48, 50, 52, 53, 54, 55, 56, 57, 58, 59, 100, 101, 102, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 122, 123, 124, 125, 126, 127, 128, 130, 131, 134, 135, 136, 140, 143, 152, 153, 154, 155, 156, 157, 158, 159, 160, 163, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 184, 185, 186, 188, 189, 190, 191, 193, 194, 195, 196, 198, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 221, 222, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 249, 250, 251, 252, 253, 254, 255, 256, 258, 259, 260, 261, 263, 266, 267, 268, 270, 271, 274, 275, 276, 277, 282, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 297, 300, 302, 304, 305, 306, 307, 308, 310, 311, 313, 315, 316, 317, 318, 319, 324, 325, 326, 327, 328, 330, 351, 352, 354, 355, 358, 359, 362, 364, 365, 366, 367, 369, 370, 371, 372, 373
)
and STR_STATE_ABBR is not null
group by DISTRICT_NBR
order by DISTRICT_NBR
Deduplicate the (district, state) pairs in a subquery first, then apply LISTAGG:
select district_nbr,
listagg( str_state_abbr, ',')
within group (order by str_state_abbr) as states
from (select distinct district_nbr, str_state_abbr from dim_location
where district_nbr in (1, 3, 4, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18, 20, 21, 22, 23, 25, 28, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 44, 45, 46, 47, 48, 50, 52, 53, 54, 55, 56, 57, 58, 59, 100, 101, 102, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 122, 123, 124, 125, 126, 127, 128, 130, 131, 134, 135, 136, 140, 143, 152, 153, 154, 155, 156, 157, 158, 159, 160, 163, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 184, 185, 186, 188, 189, 190, 191, 193, 194, 195, 196, 198, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 221, 222, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 249, 250, 251, 252, 253, 254, 255, 256, 258, 259, 260, 261, 263, 266, 267, 268, 270, 271, 274, 275, 276, 277, 282, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 297, 300, 302, 304, 305, 306, 307, 308, 310, 311, 313, 315, 316, 317, 318, 319, 324, 325, 326, 327, 328, 330, 351, 352, 354, 355, 358, 359, 362, 364, 365, 366, 367, 369, 370, 371, 372, 373)
and str_state_abbr is not null
)
group by district_nbr
order by district_nbr;

The query processor ran out of internal resources exception

I have the following query which runs on SQL Server 2008. It works well when the data is small, but when it is huge I get the exception. Is there any way I can optimize the query?
select distinct olu.ID
from olu_1 (nolock) olu
join mystat (nolock) s
    on s.stat_int = olu.stat_int
cross apply dbo.GetFeeds(
    s.stat_id,
    olu.cha_int,
    olu.odr_int,
    olu.odr_line_id,
    olu.ID
) channels
join event_details (nolock) fed
    on fed.air_date = olu.intended_air_date
    and fed.cha_int = channels.cha_int
    and fed.break_code_int = olu.break_code_int
join formats (nolock) fmt
    on fed.format_int = fmt.format_int
where
fed.cha_int in (125, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 35, 36, 37, 38, 39, 40, 41, 43, 117, 45, 42, 44, 47, 49, 50, 51, 46, 52, 53, 54, 55, 56, 48, 59, 60, 57, 63, 58, 62, 64, 66, 69, 68, 67, 65, 70, 73, 71, 74, 72, 75, 76, 77, 78, 79, 82, 80, 159, 160, 161, 81, 83, 84, 85, 88, 87, 86, 89, 90, 61, 91, 92, 93, 95, 96, 97, 98, 99, 100, 94, 155, 156, 157, 158, 103, 104, 102, 101, 105, 106, 107, 108, 109, 110, 119, 111, 167, 168, 169, 112, 113, 114, 115, 116, 170, 118, 120, 121, 122, 123, 127, 162, 163, 164, 165, 166, 128, 129, 130, 124, 133, 131, 132, 126, 134, 136, 135, 137, 171, 138, 172, 173, 174) and
fed.air_date between '5/27/2013 12:00:00 AM' and '6/2/2013 12:00:00 AM' and
fmt.cha_int in (125, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 35, 36, 37, 38, 39, 40, 41, 43, 117, 45, 42, 44, 47, 49, 50, 51, 46, 52, 53, 54, 55, 56, 48, 59, 60, 57, 63, 58, 62, 64, 66, 69, 68, 67, 65, 70, 73, 71, 74, 72, 75, 76, 77, 78, 79, 82, 80, 159, 160, 161, 81, 83, 84, 85, 88, 87, 86, 89, 90, 61, 91, 92, 93, 95, 96, 97, 98, 99, 100, 94, 155, 156, 157, 158, 103, 104, 102, 101, 105, 106, 107, 108, 109, 110, 119, 111, 167, 168, 169, 112, 113, 114, 115, 116, 170, 118, 120, 121, 122, 123, 127, 162, 163, 164, 165, 166, 128, 129, 130, 124, 133, 131, 132, 126, 134, 136, 135, 137, 171, 138, 172, 173, 174) and
fmt.air_date between '5/27/2013 12:00:00 AM' and '6/2/2013 12:00:00 AM'
From IN (Transact-SQL):
Including an extremely large number of values (many thousands) in an IN clause can consume resources and return errors 8623 or 8632. To work around this problem, store the items in the IN list in a table.
So I would recommend inserting the values into a temp table and then either joining onto that table or selecting the IN list from the table.
Something like:
DECLARE @Table TABLE (
    val INT
)
INSERT INTO @Table VALUES (1), (2), (3), (4), (5)

SELECT *
FROM MyTable
WHERE ID IN (SELECT val FROM @Table)
Take a backup of your production database (with many rows) and play with it locally on your development machine. The optimization may take some time; it may actually be quite hard if you are new to SQL. Break the query down into several temporary tables and join them together at the end. Try removing the dbo.GetFeeds(...) function from the query to see whether that function is the problem.