Get similar employees based on their attribute values - sql

Consider the following sample table("Customer") with these records
=========
Customer
=========
-----------------------------------------------------------------------------------------------
| customer-id | att-a | att-b | att-c | att-d | att-e | att-f | att-g | att-h | att-i | att-j |
--------------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| customer-1 | att-a-7 | att-b-3 | att-c-10 | att-d-10 | att-e-15 | att-f-11 | att-g-2 | att-h-7 | att-i-5 | att-j-14 |
--------------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| customer-2 | att-a-9 | att-b-7 | att-c-12 | att-d-4 | att-e-10 | att-f-4 | att-g-13 | att-h-4 | att-i-1 | att-j-13 |
--------------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| customer-3 | att-a-10 | att-b-6 | att-c-1 | att-d-1 | att-e-13 | att-f-12 | att-g-9 | att-h-6 | att-i-7 | tt-j-4 |
--------------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| customer-19 | att-a-7 | att-b-9 | att-c-13 | att-d-5 | att-e-8 | att-f-5 | att-g-12 | att-h-14 | att-i-13 | att-j-15 |
--------------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
I have these records and many more records dumped into SQL database and wanted to find top 10 similar customer based on the attribute value. For example customer-1 and customer-19 have atleast one column value matching .i.e "att-a-7" so the output should give me 2 customer-id's or top similar customer that are customer-1 and customer-19.
P.S - there can be one or more columns similar across rows.
I'm using windowing technique to find top 10 similar customer and im not sure if I'm correct.
following is my approach I used in my query :
row_number() over (partition by att-a, att-b,..,att-j order by customer-id) as customers
is this correct. ?

Related

Loop to find multiple minimum and maximum values

I have a table (tblProduct) with a field (SerialNum).
I am trying to find multiple minimum and maximum values from the field SerialNum, or better put: ranges of sequential serial numbers.
The serial numbers are 5 digits and a letter. Most of the values are sequential, but NOT all!
I need the output for a report to look something like:
00001A - 00014A
00175A - 00180A
00540A - 00549A
12345A - 12349A
04500B - 04503B
04522B - 04529B
04595B
04627B - 04631B
If the values in-between are present.
I tried a loop, but I realized I was using record sets. I need one serial num to be compared to ALL the ranges. Record sets were looking at one range.
I have been able to determine the max and min of the entire series, but not of each sequential group.
| SerialNum |
| -------- |
| 00001A|
| 00002A|
| 00003A|
| 00004A|
| 00005A|
| 00006A|
| 00007A|
| 00008A|
| 00009A|
| 00010A|
| 00011A|
| 00012A|
| 00013A|
| 00014A|
| 00175A|
| 00176A|
| 00177A|
| 00178A|
| 00179A|
| 00180A|
| 00540A|
| 00541A|
| 00542A|
| 00543A|
| 00544A|
| 00545A|
| 00546A|
| 00547A|
| 00548A|
| 00549A|
| 12345A|
| 12346A|
| 12347A|
| 12348A|
| 12349A|
| 04500B|
| 04501B|
| 04502B|
| 04503B|
| 04522B|
| 04523B|
| 04524B|
| 04525B|
| 04526B|
| 04527B|
| 04528B|
| 04529B|
| 04595B|
| 04627B|
| 04628B|
| 04629B|
| 04630B|
| 04631B|
Try to group by the number found with Val:
Select
Min(SerialNum) As MinimumSerialNum,
Max(SerialNum) As MaximumSerialNum
From
tblProduct
Group By
Val(SerialNum)

Postgresql query substract from one table

I have a one tables in Postgresql and cannot find how to build a query.
The table contains columns nr_serii and deleteing_time. I trying to count nr_serii and substract from this positions with deleting_time.
My query:
select nr_serii , count(nr_serii ) as ilosc,count(deleting_time) as ilosc_delete
from MyTable
group by nr_serii, deleting_time
output is:
+--------------------+
| "666666";1;1 |
| "456456";1;0 |
| "333333";3;0 |
| "333333";1;1 |
| "111111";1;1 |
| "111111";3;0 |
+--------------------+
The part of table with raw data:
+--------------------------------+
| "666666";"2020-11-20 14:08:13" |
| "456456";"" |
| "333333";"" |
| "333333";"" |
| "333333";"" |
| "333333";"2020-11-20 14:02:23" |
| "111111";"" |
| "111111";"" |
| "111111";"2020-11-20 14:08:04" |
| "111111";"" |
+--------------------------------+
And i need substract column ilosc and column ilosc_delete
example:
nr_serii:333333 ilosc:3-1=2
Expected output:
+-------------+
| "666666";-1 |
| "456456";1 |
| "333333";2 |
| "111111";2 |
| ... |
+-------------+
I think this is very simple solution for this but i have empty in my head.
I see what you want now. You want to subtract the number where deleting_time is not null from the ones where it is null:
select nr_serii,
count(*) filter (where deleting_time is null) - count(deleting_time) as ilosc_delete
from MyTable
group by nr_serii;
Here is a db<>fiddle.

Query from secondary index on aerospike

I'm considering aerospike for one of our projects. So I currently created a 3 node cluster and loaded some data on it.
Sample data
ns: imei
set: imei_data
+-------------------+-----------------------+-----------------------+----------------------------+--------------+--------------+
| imsi | fcheck | lcheck | msc | fcheck_epoch | lcheck_epoch |
+-------------------+-----------------------+-----------------------+----------------------------+--------------+--------------+
| "413010324064956" | "2017-03-01 14:30:26" | "2017-03-01 14:35:30" | "13d20b080011044917004100" | 1488358826 | 1488359130 |
| "413012628090023" | "2016-09-21 10:06:49" | "2017-09-16 13:54:40" | "13dc0b080011044917006100" | 1474432609 | 1505550280 |
| "413010130130320" | "2016-12-29 22:05:07" | "2017-10-09 16:17:10" | "13d20b080011044917003100" | 1483029307 | 1507546030 |
| "413011330114274" | "2016-09-06 01:48:06" | "2017-10-09 11:53:41" | "13d20b080011044917003100" | 1473106686 | 1507530221 |
| "413012629781993" | "2017-08-16 16:03:01" | "2017-09-13 18:10:48" | "13dc0b080011044917004100" | 1502879581 | 1505306448 |
Then I created a secondary index on lcheck_epoch using AQL since I want to query based on date.
create index idx_lcheck on imei.imei_data (lcheck_epoch) NUMERIC
+--------+----------------+-----------+-------------+-------+--------------+----------------+-----------+
| ns | bin | indextype | set | state | indexname | path | type |
+--------+----------------+-----------+-------------+-------+--------------+----------------+-----------+
| "imei" | "lcheck_epoch" | "NONE" | "imei_data" | "RW" | "idx_lcheck" | "lcheck_epoch" | "NUMERIC" |
+--------+----------------+-----------+-------------+-------+--------------+----------------+-----------+
When I execute
select imsi from imei.imei_data where idx_lcheck=1476165806
I'm getting
Error: (204) AEROSPIKE_ERR_INDEX
Please explain.
You're using the index name, not the bin name, in your query. Try this:
SELECT imsi FROM imei.imei_data WHERE lcheck_epoch=1476165806
Or
SELECT imsi FROM imei.imei_data WHERE lcheck_epoch BETWEEN 1490000000 AND 1510000000
Just a note, you can do much more complex queries using predicate filtering through several of the language clients (Java, C, C#, Go). For example the PredExp class of the Java client (see examples.)

PostgreSQL: Get elapsed amount between integers

I'm trying to get the difference between the first start_time and the last stop_time in the table. But I can't seem to get this done in one query. Can somebody help me? This is some sample data:
start_time | stop_time
-------------------+-------------------
1398871312.769668 | 1398871312.769676
1398871312.771368 | 1398871312.771429
1398871312.771471 | 1398871312.771476
1398871312.771494 | 1398871312.771543
1398871312.781109 | 1398871312.781115
1398871312.781150 | 1398871312.781154
1398871312.781233 | 1398871312.781282
1398871312.992759 | 1398871312.992765
1398871312.992795 | 1398871312.992798
1398871312.992832 | 1398871312.992881
1398871313.3387 | 1398871313.3399
1398871313.3435 | 1398871313.3440
1398871313.3703 | 1398871313.3745
1398871313.203462 | 1398871313.203469
1398871313.203497 | 1398871313.203501
1398871313.203560 | 1398871313.203600
1398871313.214120 | 1398871313.214127
1398871313.214153 | 1398871313.214158
1398871313.214177 | 1398871313.214192
1398871313.214208 | 1398871313.214248
1398871313.415027 | 1398871313.415035
1398871313.415136 | 1398871313.415140
1398871313.415218 | 1398871313.415226
1398871313.415252 | 1398871313.415265
1398871313.415290 | 1398871313.415298
1398871313.415332 | 1398871313.415339
1398871313.415350 | 1398871313.415362
1398871314.144867 | 1398871314.144886
1398871314.144896 | 1398871314.144901
1398871314.144906 | 1398871314.144912
1398871314.144918 | 1398871314.144923
1398871314.144927 | 1398871314.144931
1398871314.144935 | 1398871314.144939
1398871314.144965 | 1398871314.144974
1398871314.145055 | 1398871314.145060
1398871314.145138 | 1398871314.145146
1398871314.145152 | 1398871314.145158
1398871314.145166 | 1398871314.145173
1398871314.145211 | 1398871314.145215
1398871314.145235 | 1398871314.145243
1398871314.145247 | 1398871314.145252
1398871314.145262 | 1398871314.145267
1398871314.145307 | 1398871314.145314
1398871314.145547 | 1398871314.145551
1398871314.145563 | 1398871314.145571
1398871314.145576 | 1398871314.145581
1398871314.145586 | 1398871314.145590
1398871314.145600 | 1398871314.145606
1398871314.145611 | 1398871314.145618
1398871314.145623 | 1398871314.145627
1398871314.145634 | 1398871314.145641
1398871314.145999 | 1398871314.146003
1398871314.146014 | 1398871314.146022
1398871314.146026 | 1398871314.146033
1398871314.146043 | 1398871314.146050
1398871314.146140 | 1398871314.146145
1398871314.146160 | 1398871314.146168
1398871314.146178 | 1398871314.146185
So I want the difference between 1398871312 and 1398871314, in one query. Is this possible? Can anyone help me? Thanks!
Something like this?
select max(stop_time) - min(start_time)
from Table1

casting a REAL as INT and comparing

I am casting a real to an int and a float to an int and comparing the two like this:
where
cast(a.[SUM(PAID_AMT)] as int)!=cast(b.PAID_AMT as int)
but i am still getting results where the two are equal. for example:
+-----------+-----------+------------+------------+----------+
| accn | load_dt | pmtdt | sumpaidamt | Bpaidamt |
+-----------+-----------+------------+------------+----------+
| A133312 | 6/7/2011 | 11/28/2011 | 98.39 | 98.39 |
| A445070 | 6/2/2011 | 9/22/2011 | 204.93 | 204.93 |
| A465606 | 5/19/2011 | 10/19/2011 | 560.79 | 560.79 |
| A508742 | 7/12/2011 | 10/19/2011 | 279.65 | 279.65 |
| A567730 | 5/27/2011 | 10/24/2011 | 212.76 | 212.76 |
| A617277 | 7/12/2011 | 10/12/2011 | 322.02 | 322.02 |
| A626384 | 6/16/2011 | 10/21/2011 | 415.84 | 415.84 |
| AA0000044 | 5/12/2011 | 5/23/2011 | 197.38 | 197.38 |
+-----------+-----------+------------+------------+----------+
here is the full query:
select
a.accn,
a.load_dt,
a.pmtdt,
a.[SUM(PAID_AMT)] sumpaidamt,
sum(b.paid_amt) Bpaidamt
from
[MILLENNIUM_DW_DEV].[dbo].[Millennium_Payment_Data_May2011_July2012] a
join
F_PAYOR_PAYMENTS_DAILY b
on
a.accn=b.ACCESSION_ID
and
a.final_rpt_dt=b.FINAL_REPORT_DATE
and
a.load_dt=b.LOAD_DATE
and
a.pmtdt=b.PAYMENT_DATE
where
cast(a.[SUM(PAID_AMT)] as int)!=cast(b.PAID_AMT as int)
group by
a.accn,
a.load_dt,
a.pmtdt,
a.[SUM(PAID_AMT)]
what am i doing wrong? how do i return only records that are NOT equal?
I don't see why there is an issue.
The query is returning the sum of the payments in b (sum(b.paid_amt) Bpaidamt). The where clause is comparing individual payments. This just means that there is more than one payment.
Perhaps your intention is to have a HAVING clause instead:
having cast(a.[SUM(PAID_AMT)] as int)!=cast(sum(b.PAID_AMT) as int)
You can do a round and a cast statement.
cast(round(sumpaidamt,2) as money) <> cast(round(Bpaidamt,2) as money)
Sql Fiddle showing how it would work http://sqlfiddle.com/#!3/4eb79/1