How many bitcoins were transferred from one wallet to another? - sql

The problem is simple: I want to query how many BTC were transferred from Wallet A to Wallet B, directly or through any number of intermediate hops (up to as many hops as there are blocks in the blockchain).
Example:
A transferred 1 BTC to C and 1 BTC to D.
C transferred 0.1 to B
D transferred 0.5 to E and 0.5 to F
E transferred 0.1 to B
Total 0.2 BTC transferred from A to B
I figure I could do this by using BigQuery on the blockchain. The problem is that I do not know how to write a recursive query like that. My SQL skills tend to zero.
The cause is noble. I have a few addresses that were used in what proved to be a Ponzi scheme (1). I have another set of addresses that are being used in ANOTHER scheme, which I believe is another scam (2) laundering money from scheme 1.
I know who the person behind scam 2 is.
If I prove that a great amount of BTC from the first scam went to the wallets related to the second scam, it would be a strong indication that the same person is behind both.
Note that I've said a great amount of BTCs. I know that some of BTCs may wind up at the wallets of scheme 2 by chance, but for the majority to end up there is not at all a coincidence.
Disclosure: I am NOT obtaining any financial benefits from this, I only intend to reveal this scammer.

Since you did not post a data structure, your mileage may vary. Here is a hypothetical chain structure (I know nothing about bitcoin data structures). Use a recursive CTE: an anchor query selects the starting rows, and a recursive part joins back to the CTE itself. I am using Source and Target below; they could be swapped for the equivalent bitcoin terms.
DECLARE @T TABLE(ChainID INT, SourceID INT, TargetID INT, Amount INT)
INSERT @T VALUES
(1,100,300,1),
(2,900,800,1),
(1,100,400,1),
(2,800,700,1),
(1,300,200,1),
(1,400,500,1),
(2,700,600,1),
(1,500,600,1),
(1,500,200,1),
(2,600,500,1),
(2,500,400,1)
DECLARE @ChainID INT = 2
-- Get the first source of a chain. This relies on natural order; if there is a more suitable order field, use it!
DECLARE @StartID INT = (SELECT SourceID FROM (SELECT SourceID, RN = ROW_NUMBER() OVER (ORDER BY ChainID) FROM @T WHERE ChainID = @ChainID) AS X WHERE RN = 1)
;WITH RecursiveWalk AS
(
    -- Anchor
    SELECT
        SourceID,
        TargetID = T.TargetID,
        LevelID = 1
    FROM
        @T T
    WHERE
        T.SourceID = @StartID AND ChainID = @ChainID
    UNION ALL
    -- Recursive bit
    SELECT
        T.SourceID,
        TargetID = T.TargetID,
        LevelID = LevelID + 1
    FROM
        @T T
        INNER JOIN RecursiveWalk RW ON T.SourceID = RW.TargetID
    WHERE
        ChainID = @ChainID
)
SELECT
    SourceID,
    TargetID,
    LevelID
FROM
    RecursiveWalk
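For the A-to-B example in the question, a rough sketch of the same recursive-CTE idea follows. It runs on SQLite through Python's sqlite3 module rather than SQL Server or BigQuery, and the transfers table, its columns and the satoshi amounts are all assumptions made up for illustration. It sums everything that arrives at B from any wallet reachable from A, which matches the 0.2 BTC in the example but is only an upper bound: it does not prove that those particular coins originated at A, and real bitcoin data is organised as transaction inputs and outputs rather than wallet-to-wallet rows.

```python
import sqlite3

# Hypothetical edge list: one row per transfer, amounts in satoshis
# (integers avoid floating-point rounding). All names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transfers (source TEXT, target TEXT, satoshis INTEGER)")
conn.executemany(
    "INSERT INTO transfers VALUES (?, ?, ?)",
    [
        ("A", "C", 100_000_000),  # A -> C: 1 BTC
        ("A", "D", 100_000_000),  # A -> D: 1 BTC
        ("C", "B", 10_000_000),   # C -> B: 0.1 BTC
        ("D", "E", 50_000_000),   # D -> E: 0.5 BTC
        ("D", "F", 50_000_000),   # D -> F: 0.5 BTC
        ("E", "B", 10_000_000),   # E -> B: 0.1 BTC
    ],
)

QUERY = """
WITH RECURSIVE reachable(wallet) AS (
    SELECT :start                 -- anchor: the starting wallet
    UNION                         -- UNION (not UNION ALL) drops repeats, so cycles terminate
    SELECT t.target
    FROM transfers AS t
    JOIN reachable AS r ON t.source = r.wallet
    WHERE t.target <> :dest       -- stop walking once the destination is reached
)
SELECT COALESCE(SUM(t.satoshis), 0)
FROM transfers AS t
JOIN reachable AS r ON t.source = r.wallet
WHERE t.target = :dest
"""

total_satoshis = conn.execute(QUERY, {"start": "A", "dest": "B"}).fetchone()[0]
print(total_satoshis / 100_000_000, "BTC")
```

Using UNION instead of UNION ALL is a deliberate choice here: wallet graphs can contain cycles, and deduplicating the visited wallets is what keeps the recursion finite.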

SQL Query that uses GEOGRAPHY to select record where distances match

On this thread, I found this example:
DECLARE @source geography = 'POINT(-94.25 45.46)'
DECLARE @target geography = 'POINT(-94.19 45.57)'
SELECT (@source.STDistance(@target)/1000) * 0.62137
This accurately tells me that there are ~8+ miles between the two points. That's VERY helpful. But now what I am trying to do is a bit more complex.
I have a table, Criteria that looks like this:
ID State Zip Lat Long Radius
------------------------------------------------
1 MN 56301 45.46 -94.25 25
There are more records than this, but that's enough for our purposes. Now, I need to query for records where there is a direct State match, a direct Zip match, or a point that falls within the record's Radius. So...
DECLARE @CompareState VARCHAR(2) = NULL
DECLARE @CompareZip VARCHAR(5) = NULL
-- a bare DECIMAL defaults to scale 0 and would round these to -94 and 46
DECLARE @CompareLon DECIMAL(9,6) = -94.19
DECLARE @CompareLat DECIMAL(9,6) = 45.57
SELECT
    *
FROM
    Criteria c
WHERE
    c.State = @CompareState
    OR c.Zip = @CompareZip
    OR (Distance between two sets of Lat and Long is <= c.Radius)
In the query above, the row with ID of 1 should be returned. I'm struggling with the syntax.
Got it.
DECLARE @CompareState VARCHAR(2) = NULL
DECLARE @CompareZip VARCHAR(5) = NULL
-- a bare DECIMAL defaults to scale 0 and would round these to -94 and 46
DECLARE @CompareLon DECIMAL(9,6) = -94.19
DECLARE @CompareLat DECIMAL(9,6) = 45.57
SELECT
    *
FROM
    LeadSalesCampaignCriterias c
    JOIN LeadSalesCampaignCriterias c2
        ON c.LeadSalesCampaignCriteriaID = c2.LeadSalesCampaignCriteriaID
        AND c2.Latitude IS NOT NULL
        AND c2.Longitude IS NOT NULL
WHERE
    c.State = @CompareState
    OR c.Zip = @CompareZip
    OR
    (
        ((geography::Point(c2.Latitude, c2.Longitude, 4326).STDistance(geography::Point(@CompareLat, @CompareLon, 4326))/1000) * 0.62137) < c.Radius
    )
I honestly don't know what 4326 is.
See: https://msdn.microsoft.com/en-us/library/bb933811.aspx
I cribbed your answer and changed it a bit for efficiency.
DECLARE @CompareState VARCHAR(2) = NULL
DECLARE @CompareZip VARCHAR(5) = NULL
-- a bare DECIMAL defaults to scale 0 and would round these to -94 and 46
DECLARE @CompareLon DECIMAL(9,6) = -94.19
DECLARE @CompareLat DECIMAL(9,6) = 45.57
-- you appear to be wanting to find things within 1 mile
-- the magic number 1609.34 is the number of meters in a mile
DECLARE @RangeDisk geography = geography::Point(@CompareLat, @CompareLon, 4326).STBuffer(1609.34);
SELECT
    *
FROM
    LeadSalesCampaignCriterias c
    JOIN LeadSalesCampaignCriterias c2
        ON c.LeadSalesCampaignCriteriaID = c2.LeadSalesCampaignCriteriaID
        AND c2.Latitude IS NOT NULL
        AND c2.Longitude IS NOT NULL
WHERE
    c.State = @CompareState
    OR c.Zip = @CompareZip
    OR geography::Point(c2.Latitude, c2.Longitude, 4326).STIntersects(@RangeDisk) = 1
A couple of other notes. If you can alter your table to have the geography column pre-computed, that will make this even better as you won't have to convert it on the fly in the where clause (that predicate would take the form of new_column.STIntersects(#RangeDisk) = 1). A spatial index on that new column will do wonders for the efficiency of the query!
I'm also a little confused by the self join. Is LeadSalesCampaignCriteriaID the primary key in the table? If so, I don't think that join is necessary (and is very likely hurting your performance).
Lastly, in your self-answer, you mentioned not knowing what the magic number 4326 was. It's called a spatial reference id (aka SRID). Essentially, there have been multiple attempts historically to model the earth. When you get representations of geographic features from external sources, they will have been created with one of those systems in mind. Even if you're creating them whole cloth though, you need to know what the unit of measure is (when you compute something like distance, for example). You can see properties of the SRIDs that SQL knows about in sys.spatial_reference_systems.
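As a sanity check outside the database, the "~8+ miles" figure and the radius test can be approximated with the haversine formula. This is only a sketch under a spherical-Earth assumption (mean radius 6371 km); it is not how STDistance computes distance for SRID 4326, so expect small differences. The function name and constants are made up for illustration. Note the argument order: the WKT 'POINT(-94.25 45.46)' is longitude then latitude, while geography::Point takes latitude first.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius; spherical approximation (assumption)
KM_TO_MILES = 0.62137      # same conversion factor used in the question


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    km = 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
    return km * KM_TO_MILES


# Row 1 of the Criteria table versus the comparison point from the question.
distance = haversine_miles(45.46, -94.25, 45.57, -94.19)
within_radius = distance <= 25   # Radius column of row 1, in miles
print(round(distance, 2), within_radius)
```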

SQL Combining results from multiple tables, and rows, in to one row in one table

so here's my situation.
I have two tables: keysetdata115, containing vendor information, and keysetdata117, which contains either a Remit or a Payment address.
Here are the structures with one sample entry:
keysetdata115:
keysetnum  ks183  ks178  ks184                    ks185       ks187  usagecount
2160826    1      6934   AUDIO DIGEST FOUNDATION  26-1180877  A      0
keysetdata117 (I truncated values for ks192 and ks191 to fit formatting)
keysetnum  ks183  ks178  ks188  ks189  ks190     ks192      ks191  usagecount
2160827    1      6934   P001   P      EBSCO...  TOP OF...  A      0
2160828    1      6934   R002   R      EBSCO...  123 SE...  A      0
There is no 1:1 relationship, and the only thing that makes a record unique is the combination of Remit Code, Payment Code, vendor number and vendor group. The codes can only be obtained by referencing the address and/or name.
Ideally what I'd like to do is set this up so that I can pass in the addresses and return all the related values.
I'm dumping this in a table called 'dbo.test' right now (for testing, obviously), which has the following columns, shown with what they correspond to in the above tables: vengroup (ks183), vendnum (ks178), remit (ks188), payment (ks188). ks188 will be a remit or a payment code depending on the value in ks189.
This is what I'm doing so far, using 3 select queries and it works, but there's a lot of redundancy and it's very inefficient.
Any suggestions on how I can streamline it would be MUCH appreciated.
insert into dbo.test (vengroup,vendnum)
select ks183, ks178
from hsi.keysetdata115
where ks184 like 'AUDIO DIGEST%'
update dbo.test
set dbo.test.remit = y.remit
from
dbo.test tst
INNER JOIN
(Select ksd.ks188 as remit, ksd.ks183 as vengroup, ksd.ks178 as vendnum
from hsi.keysetdata117 ksd
inner join dbo.test tst
on tst.vengroup = ksd.ks183 and tst.vendnum = ksd.ks178
where ksd.ks190 like 'EBSCO%' and ks189 = 'R') y
on tst.vengroup = y.vengroup and tst.vendnum = y.vendnum
update dbo.test
set dbo.test.payment = y.payment
from
dbo.test tst
INNER JOIN
(Select ksd.ks188 as payment, ksd.ks183 as vengroup, ksd.ks178 as vendnum
from hsi.keysetdata117 ksd
inner join dbo.test tst
on tst.vengroup = ksd.ks183 and tst.vendnum = ksd.ks178
where ksd.ks190 like 'EBSCO%' and ks189 = 'P') y
on tst.vengroup = y.vengroup and tst.vendnum = y.vendnum
Thanks so much for any suggestions!
You can do what you want in one statement; you just have to make the selection on the fly. The way the statement below is written, if Remit gets the value, Payment gets a NULL and vice versa. If you want the other value to be non-null, just add an ELSE clause to each CASE, for example THEN b.ks188 ELSE 0 END.
INSERT INTO dbo.TEST( vengroup, vendnum, remit, payment )
SELECT a.ks183, a.ks178,
CASE b.ks189 WHEN 'R' THEN b.ks188 END,
CASE b.ks189 WHEN 'P' THEN b.ks188 END
FROM keysetdata115 a
JOIN keysetdata117 b
ON b.ks183 = a.ks183
AND b.ks178 = a.ks178
AND b.ks190 LIKE 'EBSCO%'
WHERE a.ks184 LIKE 'AUDIO DIGEST%';
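The statement above inserts one row per matching address row, so a vendor with both a remit and a payment address ends up with two rows. If a single row per vendor is wanted, one option (not part of the answer above) is conditional aggregation: wrap each CASE in MAX and group by the vendor key. Below is a minimal sketch, run on SQLite through Python's sqlite3 module rather than SQL Server; the tables are cut down to the columns used and the 'EBSCO (sample)' value is a made-up placeholder for the truncated name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE keysetdata115 (ks183 INTEGER, ks178 INTEGER, ks184 TEXT);
CREATE TABLE keysetdata117 (ks183 INTEGER, ks178 INTEGER, ks188 TEXT, ks189 TEXT, ks190 TEXT);
INSERT INTO keysetdata115 VALUES (1, 6934, 'AUDIO DIGEST FOUNDATION');
-- 'EBSCO (sample)' is a placeholder; the real values are truncated in the question.
INSERT INTO keysetdata117 VALUES (1, 6934, 'P001', 'P', 'EBSCO (sample)');
INSERT INTO keysetdata117 VALUES (1, 6934, 'R002', 'R', 'EBSCO (sample)');
""")

# MAX ignores NULLs, so each group keeps the one non-null code per column.
rows = conn.execute("""
SELECT a.ks183 AS vengroup,
       a.ks178 AS vendnum,
       MAX(CASE b.ks189 WHEN 'R' THEN b.ks188 END) AS remit,
       MAX(CASE b.ks189 WHEN 'P' THEN b.ks188 END) AS payment
FROM keysetdata115 AS a
JOIN keysetdata117 AS b
  ON b.ks183 = a.ks183
 AND b.ks178 = a.ks178
 AND b.ks190 LIKE 'EBSCO%'
WHERE a.ks184 LIKE 'AUDIO DIGEST%'
GROUP BY a.ks183, a.ks178
""").fetchall()
print(rows)
```

If a vendor can have several remit or several payment codes, MAX picks just one of them, so this only fits when each vendor has at most one code of each kind for the given address.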

SQL sub query logic

I am trying to calculate values in a column called Peak, but I need to apply different calculations dependent on the 'ChargeCode'.
Below is kind of what I am trying to do, but it results in 3 columns called Peak - Which I know is what I asked for :)
Can anyone help with the correct syntax, so that I end up with one column called Peak?
Use Test
Select Chargecode,
(SELECT 1 Where Chargecode='1') AS [Peak],
(SELECT 1 Where Chargecode='1242') AS [Peak],
Peak*2 AS [Peak],
CallType
from Daisy_March2014
Thanks
You want a case statement. I think this is what you are looking for:
Select Chargecode,
(case when chargecode = '1' then 1
when chargecode = '1242' then 2
else 2 * Peak
end) as Peak,
CallType
from Daisy_March2014;
Thanks Gordon, I have marked your response as the answer. Here is the final working code:
(case when chargecode in ('1') then 1 when chargecode in ('1264') then 2 else Peak*2 end) as Peak,
Since it depends on your charge code, I'm going to make a wild assumption that this might be an ongoing thing where new charge codes / rules could be added. Why not store this as metadata either in the charge code table or in a new table? You could generate the initial data with this:
SELECT ChargeCode,
Multiplier
INTO ChargeMeta
FROM (
Select 1 AS ChargeCode,
1 AS Multiplier
UNION ALL
SELECT 1242 AS ChargeCode,
1 AS Multiplier
UNION ALL
SELECT ChargeCode,
2 AS Multiplier
FROM Daisy_March2014
WHERE ChargeCode NOT IN (1,1242)
) SQ
Then just join to your original data.
SELECT a.ChargeCode,
a.Peak*b.Multiplier AS Peak
FROM Daisy_March2014 a
JOIN ChargeMeta b
ON a.ChargeCode = b.ChargeCode
If you do not want to maintain all charge code multipliers, you could maintain your non-standard ones, and store the standard one in the SQL. This would be about the same as a case statement, but it may still add benefit to store the overrides in a table. At the very least, it makes it easier to re-use elsewhere. No need to check all the queries that deal with Peak values and make them consistent, if ChargeCode 42 needs to have a new multiplier set.
If you want to store the default in the table, you could use two joins instead of one, storing the default charge code under a value that will never be used. (-1?)
SELECT a.ChargeCode,
a.Peak*COALESCE(b.Multiplier,c.Multiplier) AS Peak
FROM Daisy_March2014 a
LEFT JOIN ChargeMeta b ON a.ChargeCode = b.ChargeCode
LEFT JOIN ChargeMeta c ON c.ChargeCode = -1
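To see how the default-row lookup behaves, here is a small sketch run on SQLite through Python's sqlite3 module rather than SQL Server. The Peak values and the charge code 42 are made-up sample data, and -1 is the assumed "never used" code that holds the default multiplier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Daisy_March2014 (ChargeCode INTEGER, Peak INTEGER);
CREATE TABLE ChargeMeta (ChargeCode INTEGER PRIMARY KEY, Multiplier INTEGER);
INSERT INTO Daisy_March2014 VALUES (1, 10), (1242, 10), (42, 10);
-- -1 is the reserved default row; 1 and 1242 are the overrides.
INSERT INTO ChargeMeta VALUES (-1, 2), (1, 1), (1242, 1);
""")

# b finds an override if one exists; c always finds the default row.
rows = conn.execute("""
SELECT a.ChargeCode,
       a.Peak * COALESCE(b.Multiplier, c.Multiplier) AS Peak
FROM Daisy_March2014 AS a
LEFT JOIN ChargeMeta AS b ON a.ChargeCode = b.ChargeCode
LEFT JOIN ChargeMeta AS c ON c.ChargeCode = -1
ORDER BY a.ChargeCode
""").fetchall()
print(rows)
```

Codes 1 and 1242 keep their Peak of 10 (multiplier 1), while code 42 has no override and falls back to the default multiplier of 2.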

Joining large tables in SQL

I have a table called "calls", there are columns:
a_imei_number
b_imei_number
a_phone_number
b_phone_number
call_start_time
call_end_time
If a specific phone x calls y, the IMEI number of x is in the a_imei_number column; if y calls x, the IMEI of x is in b_imei_number. In short, the difference between a_imei_number and b_imei_number is outgoing versus incoming calls for an IMEI. The same goes for the phone_number columns.
I am searching for calls for a specific IMEI that happen at the same time (cloned IMEI numbers). I thought that if I find a call whose call_start_time falls between another call's call_start_time and call_end_time, I would find the cloned phones. Logically, the IMEI numbers must be the same and the phone numbers must be different.
So I wrote:
select * from calls c1 , calls c2
where (c1.a_imei = 1234 or c1.b_imei = 1234)
and
c1.call_start_time between c2.call_start_time and c2.call_end_time
The table has maybe 500M rows, so this query does not return; maybe it would return a result in a week. Is there any other way to find the result without joining the table to itself like this?
If I understand correctly, you are looking for calls that occur at the same time as calls to or from a specific number. The following query expresses this idea:
select c2.*
from (select c.*
from calls c
where c.a_imei = 1234 or c.b_imei = 1234
) cbase join
calls c2
on cbase.call_start_time between c2.call_start_time and c2.call_end_time;
The performance is going to depend greatly on the number of matches of the first query.
Sometimes the database engine has a hard time optimizing an OR in a condition. I would suggest having indexes on calls(a_imei, call_start_time) and calls(b_imei, call_start_time) and rewriting the query as:
select c2.*
from ((select c.call_start_time
from calls c
where c.a_imei = 1234
) union all
(select c.call_start_time
from calls c
where c.b_imei = 1234
)
) cbase join
calls c2
on cbase.call_start_time between c2.call_start_time and c2.call_end_time;
For the final join, a third index would be useful: calls(call_start_time, call_end_time).
This probably won't help completely, but will hopefully give someone with more knowledge something to start with:
Improving the join
SELECT *
FROM calls c1
INNER JOIN calls c2 ON c1.call_start_time BETWEEN c2.call_start_time AND c2.call_end_time
WHERE (c1.a_imei = 1234 or c1.b_imei = 1234)
Other Comments:
SELECT * will be inefficient as it is, especially as it will be returning non unique column names, you should only select the columns relevant to the query in question.
There are several things you can do to improve your query.
Indexes
It seems that you should have indexes defined on a_imei and b_imei. Perhaps you also would want to include call start and end times in those indexes as well, this depends.
Specify columns
Don't use select *, instead specify the list of columns you want to return.
select
a_imei_number,
b_imei_number,
call_start_time,
call_end_time
Proper Join
This depends on exactly what you are looking for in results. If you want to report on all possible duplicates, you would structure it one way.
select c2.a_imei, c2.b_imei, c2.call_start_time, c2.call_end_time
from (select c.a_imei, c.b_imei, c.call_start_time, c.call_end_time
from calls c
where c.a_imei = c.b_imei
) cbase join
calls c2
on cbase.call_start_time between c2.call_start_time and c2.call_end_time;
If you have a known imei_number and want to search for it, the query would be structured differently.
select c2.a_imei, c2.b_imei, c2.call_start_time, c2.call_end_time
from (select c.a_imei, c.b_imei, c.call_start_time, c.call_end_time
from calls c
where c.a_imei = 1234 or c.b_imei = 1234
) cbase join
calls c2
on cbase.call_start_time between c2.call_start_time and c2.call_end_time;
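None of the queries above applies the rule from the question that the IMEI must be the same while the phone numbers differ. One possible way to add it, sketched here on SQLite through Python's sqlite3 module, is to first reduce the table to the "legs" that involve the IMEI (one row per call, with the phone number that used that IMEI) and then join only those legs to each other. The call_id column, the integer timestamps and the sample rows are assumptions for illustration, and the overlap test is the symmetric form (each call starts before the other ends) rather than the BETWEEN used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calls (
    call_id INTEGER PRIMARY KEY,          -- hypothetical surrogate key
    a_imei INTEGER, b_imei INTEGER,
    a_phone_number TEXT, b_phone_number TEXT,
    call_start_time INTEGER, call_end_time INTEGER
);
CREATE INDEX ix_calls_a ON calls (a_imei, call_start_time);
CREATE INDEX ix_calls_b ON calls (b_imei, call_start_time);
INSERT INTO calls VALUES
    (1, 1234, 5555, '555-0001', '555-0100', 100, 200),
    (2, 9999, 1234, '555-0200', '555-0002', 150, 250),
    (3, 1234, 7777, '555-0001', '555-0300', 300, 400);
""")

QUERY = """
WITH legs AS (
    -- one row per call in which the IMEI took part, with the phone number it used
    SELECT call_id, a_phone_number AS phone, call_start_time AS s, call_end_time AS e
    FROM calls WHERE a_imei = :imei
    UNION ALL
    SELECT call_id, b_phone_number, call_start_time, call_end_time
    FROM calls WHERE b_imei = :imei
)
SELECT x.call_id, y.call_id
FROM legs AS x
JOIN legs AS y
  ON x.call_id < y.call_id          -- report each pair once
 AND x.s <= y.e AND y.s <= x.e      -- the two calls overlap in time
 AND x.phone <> y.phone             -- same IMEI, different phone number
ORDER BY x.call_id, y.call_id
"""

pairs = conn.execute(QUERY, {"imei": 1234}).fetchall()
print(pairs)
```

Because both sides of the join are restricted to a single IMEI, the self-join only touches that IMEI's calls instead of the whole 500M-row table.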

Selecting elements that don't exist

I am working on an application that has to assign numeric codes to elements. These codes are not consecutive, and my idea is not to insert them in the database until I have the related element, but I would like to find, in SQL, the codes that are not yet assigned, and I don't know how to do it.
Any ideas?
Thanks!!!
Edit 1
The table can be so simple:
code | element
-----------------
3 | three
7 | seven
2 | two
And I would like something like this: 1, 4, 5, 6. Without any other table.
Edit 2
Thanks for the feedback, your answers have been very helpful.
This will return NULL if a code is not assigned:
SELECT assigned_codes.code
FROM codes
LEFT JOIN
assigned_codes
ON assigned_codes.code = codes.code
WHERE codes.code = @code
This will return all non-assigned codes:
SELECT codes.code
FROM codes
LEFT JOIN
assigned_codes
ON assigned_codes.code = codes.code
WHERE assigned_codes.code IS NULL
There is no pure SQL way to do exactly the thing you want.
In Oracle, you can do the following:
SELECT lvl
FROM (
SELECT level AS lvl
FROM dual
CONNECT BY
level <=
(
SELECT MAX(code)
FROM elements
)
)
LEFT OUTER JOIN
elements
ON code = lvl
WHERE code IS NULL
In PostgreSQL, you can do the following:
SELECT lvl
FROM generate_series(
1,
(
SELECT MAX(code)
FROM elements
)) lvl
LEFT OUTER JOIN
elements
ON code = lvl
WHERE code IS NULL
Contrary to the assertion that this cannot be done using pure SQL, here is a counter example showing how it can be done. (Note that I didn't say it was easy - it is, however, possible.) Assume the table's name is value_list with columns code and value as shown in the edits (why does everyone forget to include the table name in the question?):
SELECT b.bottom, t.top
FROM (SELECT l1.code - 1 AS top
FROM value_list l1
WHERE NOT EXISTS (SELECT * FROM value_list l2
WHERE l2.code = l1.code - 1)) AS t,
(SELECT l1.code + 1 AS bottom
FROM value_list l1
WHERE NOT EXISTS (SELECT * FROM value_list l2
WHERE l2.code = l1.code + 1)) AS b
WHERE b.bottom <= t.top
AND NOT EXISTS (SELECT * FROM value_list l2
WHERE l2.code >= b.bottom AND l2.code <= t.top);
The two parallel queries in the from clause generate values that are respectively at the top and bottom of a gap in the range of values in the table. The cross-product of these two lists is then restricted so that the bottom is not greater than the top, and such that there is no value in the original list in between the bottom and top.
On the sample data, this produces the range 4-6. When I added an extra row (9, 'nine'), it also generated the range 8-8. Clearly, you also have two other possible ranges for a suitable definition of 'infinity':
-infinity .. MIN(code)-1
MAX(code)+1 .. +infinity
Note that:
If you are using this routinely, there will generally not be many gaps in your lists.
Gaps can only appear when you delete rows from the table (or you ignore the ranges returned by this query or its relatives when inserting data).
It is usually a bad idea to reuse identifiers, so in fact this effort is probably misguided.
However, if you want to do it, here is one way to do so.
This is the same idea that Quassnoi has published.
I just linked all ideas together in T-SQL like code.
DECLARE @series TABLE (n int)
DECLARE @max_n int, @i int
SET @i = 1
-- max value in elements table
SELECT @max_n = MAX(code) FROM elements
-- fill @series with the numbers from 1 to max_n - 1 (max_n itself is always assigned)
WHILE @i < @max_n BEGIN
    INSERT INTO @series (n) VALUES (@i)
    SET @i = @i + 1
END
-- unassigned codes: those without a matching row in the elements table
SELECT
    n
FROM
    @series AS series
    LEFT JOIN elements
        ON elements.code = series.n
WHERE
    elements.code IS NULL
EDIT:
This is, of course, not an ideal solution. If you have a lot of elements or check for unassigned codes often, this could cause performance issues.
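On engines that support recursive common table expressions, the number series can also be generated inside a single query, without a loop or a helper table. The sketch below runs on SQLite through Python's sqlite3 module; the elements table and its rows are taken from the question's sample, and the syntax may need adjusting for other databases (SQL Server, for example, omits the RECURSIVE keyword and limits recursion depth by default).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE elements (code INTEGER PRIMARY KEY, element TEXT);
INSERT INTO elements VALUES (3, 'three'), (7, 'seven'), (2, 'two');
""")

# series counts from 1 up to MAX(code); the LEFT JOIN keeps the numbers with no match.
missing = [row[0] for row in conn.execute("""
WITH RECURSIVE series(n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM series
    WHERE n < (SELECT MAX(code) FROM elements)
)
SELECT s.n
FROM series AS s
LEFT JOIN elements AS e ON e.code = s.n
WHERE e.code IS NULL
ORDER BY s.n
""")]
print(missing)
```

On the sample data this yields 1, 4, 5 and 6, the same codes asked for in the question.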