R group by distinct pairs within a column and then count - combn

I have a data frame with two columns, CustomerID and SHOP. For every pair of shops, I want to count the unique customer IDs that appear in both shops of the pair. My original data frame is as follows:
CustomerID | SHOP
-----------|-----
1 | A
2 | A
3 | B
1 | C
2 | D
4 | E
The intended output should be as follows:
SHOP PAIR | CUSTOMERS
----------|----------
A-C | 1
A-D | 1
Is there a smart way to achieve this in R? Thanks for the help!
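As a language-neutral illustration of the logic (written in Python rather than R, and not taken from any answer), a minimal sketch could collect the shops per customer and then count every two-shop combination. The function name `count_customers_per_shop_pair` is hypothetical; in R the pairing step would presumably be done with `combn`, as the title suggests.

```python
from collections import Counter
from itertools import combinations

# Sample rows from the question: (CustomerID, SHOP)
rows = [(1, "A"), (2, "A"), (3, "B"), (1, "C"), (2, "D"), (4, "E")]


def count_customers_per_shop_pair(rows):
    """Count distinct customers for every pair of shops they both appear in."""
    # Collect the set of shops per customer (a set removes repeated visits).
    shops_by_customer = {}
    for customer, shop in rows:
        shops_by_customer.setdefault(customer, set()).add(shop)

    pair_counts = Counter()
    for shops in shops_by_customer.values():
        # Sorting keeps a pair in one canonical order, e.g. always A-C, never C-A.
        for first, second in combinations(sorted(shops), 2):
            pair_counts[f"{first}-{second}"] += 1
    return dict(pair_counts)


pair_counts = count_customers_per_shop_pair(rows)
print(pair_counts)  # {'A-C': 1, 'A-D': 1}
```

Customers with a single shop (3 and 4 here) produce no pairs, which is why B and E do not appear in the result.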

SQL - Update in a cross apply query

UPDATE Table1
SET SomeColumn = X.SomeOtherColumn
FROM Table1 T1
CROSS APPLY
(SELECT TOP 1 SomeOtherColumn
FROM Table2 T2
WHERE T2.SomeJoinColumn = T1.SomeJoinColumn
ORDER BY CounterColumn) AS X
I want to increase CounterColumn by 1 each time the cross apply query runs. Is there any way I could achieve this?
Some context and sample data
I have a table containing information about companies. I want to anonymize the company numbers in this table. To do this, I want to use data from another table containing synthesized data. This table has a much smaller sample size, so I have to reuse the same synthetic companies multiple times. For each row in the table I anonymize, I want to pick a synthetic company of the same type. I want to use all the synthetic companies. That's where the counter comes in: it counts how many times I've used that specific synthetic company. By sorting by this counter, I was hoping to always pick the synthetic company that has been used the least.
Company table (Table1)
CompanyNumber | Type
--------------|-----
67923 | 2
82034 | 2
90238 | 7
29378 | 2
92809 | 5
72890 | 2
Synthetic company table (Table2)
SyntheticCompanyNumber | Type | Counter
-----------------------|------|--------
08366 | 5 | 0
12588 | 2 | 0
33823 | 2 | 0
27483 | 7 | 0
Expected output of Company table:
CompanyNumber | Type
--------------|-----
12588 | 2
33823 | 2
27483 | 7
12588 | 2
08366 | 5
33823 | 2
Expected output of synthetic company table:
SyntheticCompanyNumber | Type | Counter
-----------------------|------|--------
08366 | 5 | 1
12588 | 2 | 2
33823 | 2 | 2
27483 | 7 | 1
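The intended row-by-row behaviour can be sketched outside SQL. The Python below is only an illustration of the "pick the least-used synthetic company of the same type" rule, not a T-SQL solution. The function name `anonymize` and the dictionary keys are hypothetical, and it assumes every company type has at least one synthetic company and that ties are broken by table order.

```python
# Rows from the question; numbers are strings so leading zeros survive.
companies = [
    ("67923", 2), ("82034", 2), ("90238", 7),
    ("29378", 2), ("92809", 5), ("72890", 2),
]
synthetic = [
    {"number": "08366", "type": 5, "counter": 0},
    {"number": "12588", "type": 2, "counter": 0},
    {"number": "33823", "type": 2, "counter": 0},
    {"number": "27483", "type": 7, "counter": 0},
]


def anonymize(companies, synthetic):
    """Replace each company number with the least-used synthetic one of its type.

    Updates the counters in `synthetic` in place.
    """
    result = []
    for _number, company_type in companies:
        candidates = [s for s in synthetic if s["type"] == company_type]
        # min() returns the first minimum, so ties fall back to table order.
        chosen = min(candidates, key=lambda s: s["counter"])
        chosen["counter"] += 1  # the increment is visible to the next row
        result.append((chosen["number"], company_type))
    return result


anonymized = anonymize(companies, synthetic)
for number, company_type in anonymized:
    print(number, company_type)
```

With the sample data this reproduces both expected tables from the question: the two type-2 synthetic companies alternate and each ends with a counter of 2.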

Grouping and Summing Totals in a Joined Table

I have two tables, MEDICATION and INVENTORY. I'm trying to SELECT the details below from both tables, but the INVENTORY table lists each medication ID several times with different BRANCH_NO values (the primary key of INVENTORY is the composite key BRANCH_NO, MEDICATION_ID).
I need to total the stock for each MEDICATION_ID and join the tables in one SELECT statement, displaying all the information for each medication (there are 5) with its total at the end of each row. But I'm getting muddled trying GROUP BY and SUM and, at one point, PARTITION. Help please, I'm new to this.
Below is the latest non-working version. It doesn't display
Medication Name
Medication Desc
Manufacturer
Pack Size
as I hoped it might.
SELECT I.MEDICATION_ID,
SUM(I.STOCK_LEVEL)
FROM INVENTORY I
INNER JOIN (SELECT MEDICATION_NAME, SUBSTR(MEDICATION_DESC,1,20) "Medication Description",
MANUFACTURER, PACK_SIZE FROM MEDICATION) M ON MEDICATION_ID=I.MEDICATION_ID
GROUP BY I.MEDICATION_ID;
For the data imagine I want this sort of output:
MEDICATION_ID MEDICATION_NAME STOCK_LEVEL OtherColumns.....
1 Alpha 10
2 Bravo 20
3 Charlie 20
1 Alpha 30
4 Delta 10
5 Echo 20
5 Echo 40
2 Bravo 10
grouping and totalling into this:
MEDICATION_ID MEDICATION_NAME STOCK_LEVEL OtherColumns.....
1 Alpha 40
2 Bravo 30
3 Charlie 20
4 Delta 10
5 Echo 60
I can get this when it's just one table, but when I'm trying to join tables and also SELECT other columns, it's just not working.
Thanks in advance, guys. I appreciate it may be a simple solution, but it will be a big help.
You need to list every non-aggregated column explicitly in both the SELECT and GROUP BY lists. (By the way, there is no need for a nested query, and if you do use one, the MEDICATION_ID column is missing from it.)
SELECT I.MEDICATION_ID, M.MEDICATION_NAME, SUM(I.STOCK_LEVEL) AS STOCK_LEVEL,
SUBSTR(M.MEDICATION_DESC,1,20) "Medication Description", M.MANUFACTURER, M.PACK_SIZE
FROM INVENTORY I
JOIN MEDICATION M ON M.MEDICATION_ID = I.MEDICATION_ID
GROUP BY I.MEDICATION_ID, M.MEDICATION_NAME, SUBSTR(M.MEDICATION_DESC,1,20),
M.MANUFACTURER, M.PACK_SIZE;
This way, you'll be able to return all the listed columns.
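To try the join-then-group pattern on the sample rows, here is a small sketch using Python's built-in `sqlite3` module. It is an assumption-laden stand-in for the asker's database: SQLite replaces the original engine, the BRANCH_NO values are invented so that the composite key holds, and the description, manufacturer and pack-size columns are left out because the question gives no data for them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MEDICATION (
        MEDICATION_ID   INTEGER PRIMARY KEY,
        MEDICATION_NAME TEXT
    );
    CREATE TABLE INVENTORY (
        BRANCH_NO     INTEGER,
        MEDICATION_ID INTEGER,
        STOCK_LEVEL   INTEGER,
        PRIMARY KEY (BRANCH_NO, MEDICATION_ID)
    );
""")
conn.executemany(
    "INSERT INTO MEDICATION VALUES (?, ?)",
    [(1, "Alpha"), (2, "Bravo"), (3, "Charlie"), (4, "Delta"), (5, "Echo")],
)
# (BRANCH_NO, MEDICATION_ID, STOCK_LEVEL); branch numbers are hypothetical.
conn.executemany(
    "INSERT INTO INVENTORY VALUES (?, ?, ?)",
    [(1, 1, 10), (1, 2, 20), (1, 3, 20), (2, 1, 30),
     (1, 4, 10), (1, 5, 20), (2, 5, 40), (2, 2, 10)],
)

# Every non-aggregated column in SELECT also appears in GROUP BY.
totals = conn.execute("""
    SELECT I.MEDICATION_ID, M.MEDICATION_NAME, SUM(I.STOCK_LEVEL) AS STOCK_LEVEL
    FROM INVENTORY I
    JOIN MEDICATION M ON M.MEDICATION_ID = I.MEDICATION_ID
    GROUP BY I.MEDICATION_ID, M.MEDICATION_NAME
    ORDER BY I.MEDICATION_ID
""").fetchall()

for row in totals:
    print(row)
conn.close()
```

The five result rows match the grouped table in the question (Alpha 40, Bravo 30, Charlie 20, Delta 10, Echo 60).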

Count duplicates in an internal table?

I just want to ask how to count duplicates in an internal table. I want to do this so that I can count the customers per sales employee and put the result into the Customer Count column.
Sales Employee Customer Customer Count
a 1 2
a 2 2
b 3 3
b 2 3
b 4 3
c 1 1
As suncatcher mentions in his comment, using SQL aggregates is more efficient than looping through internal tables. But if that is not possible in your case, one way would be to use the collect statement. collect adds entries to an internal table and adds up the numeric fields when a row with the same key fields already exists. Create an internal table with a field for your sales employee and another field for the count, then loop through your sales table, using collect to update your count table for each sale.
types: begin of t_count,
         employee type text10,
         count    type i,
       end of t_count.
data: it_count type standard table of t_count,
      wa_count type t_count.
loop at it_sales into wa_sales.
  move: wa_sales-employee to wa_count-employee,
        1 to wa_count-count.
  collect wa_count into it_count.
endloop.
The example assumes you have a table it_sales, a work area wa_sales, both with a field employee. Table it_count then contains a list of your employees (in the order they appear in your sales table) and the number of times they appeared in the sales table.
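For readers unfamiliar with collect, its effect in this loop can be imitated in a few lines of Python. This is only an analogy, not ABAP: the dictionary plays the role of it_count, and the function name `collect_counts` is hypothetical.

```python
# (sales employee, customer) rows from the question
sales = [("a", 1), ("a", 2), ("b", 3), ("b", 2), ("b", 4), ("c", 1)]


def collect_counts(sales):
    """Imitate collect: add a new key with 1, or add 1 to an existing key."""
    counts = {}
    for employee, _customer in sales:
        counts[employee] = counts.get(employee, 0) + 1
    return counts


counts = collect_counts(sales)
print(counts)  # {'a': 2, 'b': 3, 'c': 1}
```

As with the ABAP version, the keys come out in the order the employees first appear in the sales rows.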
FIELD-SYMBOLS : <lfs_sales> TYPE ty_sales.
Assume li_sales is an internal table with the columns Sales_employee, Customer and customer_count. Initially, the table entries are as follows.
Sales_employee Customer customer_count
a 1 0
a 2 0
b 3 0
b 2 0
b 4 0
c 1 0
We need to count the rows per sales_employee and update the customer_count field. We can use the collect statement as suggested by Dirik, or use control break statements as shown below.
To use the SUM keyword, first initialize customer_count to 1 in each row, so that SUM adds up the customer count for rows with the same sales_employee.
LOOP AT li_sales ASSIGNING <lfs_sales>.
<lfs_sales>-customer_count = 1.
ENDLOOP.
Now the entries look as shown below.
Sales_employee Customer customer_count
a 1 1
a 2 1
b 3 1
b 2 1
b 4 1
c 1 1
The following code updates the customer_count field.
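" Assumes li_sales is sorted by sales_employee, its first column.
LOOP AT li_sales INTO rec_sales.
  AT END OF sales_employee.
    SUM. " totals the numeric fields of the current group into rec_sales
    MOVE-CORRESPONDING rec_sales TO rec_count.
    APPEND rec_count TO li_count.
    CLEAR rec_count.
  ENDAT.
ENDLOOP.
SORT li_count BY sales_employee.
" Write each group's total back to every row of that employee.
LOOP AT li_sales ASSIGNING <lfs_sales>.
  CLEAR rec_count.
  READ TABLE li_count INTO rec_count
    WITH KEY sales_employee = <lfs_sales>-sales_employee
    BINARY SEARCH.
  IF sy-subrc IS INITIAL.
    <lfs_sales>-customer_count = rec_count-customer_count.
  ENDIF.
ENDLOOP.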
Now the internal table has customer_count filled in as below.
Sales_employee Customer customer_count
a 1 2
a 2 2
b 3 3
b 2 3
b 4 3
c 1 1
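The control-break idea (process sorted rows group by group, then write each group's total back to its rows) can also be sketched in Python with `itertools.groupby`. This is an analogy rather than ABAP; the function name `fill_customer_counts` and the dictionary layout are hypothetical.

```python
from itertools import groupby

# Rows as in the answer's starting table, with customer_count still 0.
rows = [
    {"sales_employee": "a", "customer": 1, "customer_count": 0},
    {"sales_employee": "a", "customer": 2, "customer_count": 0},
    {"sales_employee": "b", "customer": 3, "customer_count": 0},
    {"sales_employee": "b", "customer": 2, "customer_count": 0},
    {"sales_employee": "b", "customer": 4, "customer_count": 0},
    {"sales_employee": "c", "customer": 1, "customer_count": 0},
]


def fill_customer_counts(rows):
    """Set customer_count on every row to the size of its employee group."""
    # groupby only merges adjacent rows, like a control break, so sort first.
    rows.sort(key=lambda r: r["sales_employee"])
    for _employee, group in groupby(rows, key=lambda r: r["sales_employee"]):
        group_rows = list(group)
        for row in group_rows:
            row["customer_count"] = len(group_rows)
    return rows


fill_customer_counts(rows)
for row in rows:
    print(row["sales_employee"], row["customer"], row["customer_count"])
```

The printed rows match the final table above: 2 for both "a" rows, 3 for the "b" rows and 1 for "c".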

Find the pairs that appear the most times in StoreID and each STOREID has only this pair SQL

I want to find the pairs of MOVIEIDs that appear together in the most STOREIDs.
Additionally, each counted STOREID should have only this pair as its MOVIEIDs. My table has two columns: STOREID and MOVIEID.
For example:
STOREID | MOVIEID
--------|---------
1 | a
1 | b
1 | c
2 | a
2 | b
3 | a
3 | b
5 | a
5 | b
In this case the answer would be pair: (a,b) 3 times.
Thanks in advance!
As far as I understand, you want to consider only stores that sell exactly one pair of movies. This makes it a lot simpler. First, group by store and keep only those stores with exactly two movies. Generating pairs would be tricky if a store had more than two movies; you would need window functions. With exactly two, however, aggregate functions give you both movies: one with MIN and the other with MAX. These functions also ensure that the same pair always has the same order. For example, the pair (a,b) will always be (a,b) and never (b,a).
SELECT COUNT(*), MOVIE_1, MOVIE_2
FROM (
    SELECT MIN(MOVIEID) MOVIE_1
          ,MAX(MOVIEID) MOVIE_2
          ,STOREID
    FROM STORE_MOVIES -- your table
    GROUP BY STOREID
    HAVING COUNT(*) = 2
) MOVIE_PAIRS
GROUP BY MOVIE_1, MOVIE_2
ORDER BY COUNT(*) DESC
FETCH FIRST ROW ONLY;
For HAVING COUNT(*) = 2 to work, I assume each combination of STOREID and MOVIEID is unique.
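The same filter-then-count logic can be checked against the sample data with a short Python sketch. It is an illustration only, not part of the SQL answer; the function name `most_common_exclusive_pair` is hypothetical. Using a set per store also sidesteps the uniqueness assumption, since duplicate rows collapse.

```python
from collections import Counter

# (STOREID, MOVIEID) rows from the question
rows = [
    (1, "a"), (1, "b"), (1, "c"),
    (2, "a"), (2, "b"),
    (3, "a"), (3, "b"),
    (5, "a"), (5, "b"),
]


def most_common_exclusive_pair(rows):
    """Return ((movie_1, movie_2), store_count) for the most frequent pair,
    counting only stores that carry exactly two movies."""
    movies_by_store = {}
    for store, movie in rows:
        movies_by_store.setdefault(store, set()).add(movie)

    # min/max give each pair a fixed order, like MIN(MOVIEID) and MAX(MOVIEID).
    pair_counts = Counter(
        (min(movies), max(movies))
        for movies in movies_by_store.values()
        if len(movies) == 2
    )
    return pair_counts.most_common(1)[0] if pair_counts else None


best = most_common_exclusive_pair(rows)
print(best)  # (('a', 'b'), 3)
```

Store 1 is skipped because it carries three movies, so the pair (a,b) is counted for stores 2, 3 and 5 only.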
Although the request does not really make sense to me, the design is not ours to question. I used a three-part self-join on your movies table. The first instance (m1) joins to the second (m2) on the same store, but only where the (m2) movie is greater than the (m1) movie. This avoids counting both (a,b) and (b,a). Then I join (m2) to (m3) on the same store, with the (m3) movie greater than the (m2) movie. This is intentionally a LEFT JOIN, because not all stores have more than two movies; for those stores, the (m3) columns are NULL (no matching row). So I filter on m3.storeId IS NULL. The inner JOIN between (m1) and (m2) requires both the first and the second movie to exist. Finally, the HAVING clause keeps only those pairs that appear at multiple stores.
select
    m1.movieID as Movie1,
    m2.movieID as Movie2,
    count(*) TimesPaired
from
    Movies m1
    JOIN movies m2
        on m1.storeId = m2.storeId
        AND m1.movieId < m2.movieId
    LEFT JOIN movies m3
        on m2.storeId = m3.storeId
        AND m2.movieId < m3.movieId
where
    m3.storeId IS NULL
group by
    m1.movieID,
    m2.movieID
having
    count(*) > 1

MDX: joining two sets on same member of several dimensions?

Is it possible in MDX to combine the results of two queries based on the same member of several dimensions?
In my case:
There are two types of reports, BuyersReports and SellersReports, e.g.:
BuyersReports
Buyer Seller Amount
A B 10
B C 20
SellersReports
Seller Buyer Amount
B A 10
C B 15
Each company (A, B, C) could be both buyer and seller.
I need to achieve something like this:
Buy Sell-To (Diff1) Sell Buy-From (Diff2)
B 20 15 5 10 10 0
Currently I have two measures: [Buyings] and [Sellings], and two instances of the same dimension of companies: [Buyers] and [Sellers].
I can get both parts of the desired query for company "B":
SELECT
[Measure].[Buyings],[Measure].[Sellings] ON COLUMNS,
[Buyers].[Name], [Sellers].[Name] ON ROWS
FROM
(
SELECT [Buyers].[Name].&[B] ON COLUMNS
FROM MyCube
)
gives me
B C 20 15
And
SELECT
[Measure].[Buyings],[Measure].[Sellings] ON COLUMNS,
[Buyers].[Name], [Sellers].[Name] ON ROWS
FROM
(
SELECT [Sellers].[Name].&[B] ON COLUMNS
FROM MyCube
)
with result
A B 10 10
Is it possible to combine results of these two queries to achieve combined buyer-seller report for each company?
SELECT
[Measure].[Buyings],[Measure].[Sellings] ON COLUMNS,
[Buyers].[Name], [Sellers].[Name] ON ROWS
FROM
(
SELECT {[Sellers].[Name].&[B],[Buyers].[Name].&[B]} ON COLUMNS
FROM MyCube
)
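To make the target of the combined report concrete, the row the question asks for can be computed outside MDX. The Python sketch below is not an MDX solution and says nothing about how the cube should be queried; it only spells out, under my reading of the sample tables, which report feeds which column. The function name `combined_report` and the dictionary keys are hypothetical.

```python
# BuyersReports rows: (buyer, seller, amount)
buyers_reports = [("A", "B", 10), ("B", "C", 20)]
# SellersReports rows: (seller, buyer, amount)
sellers_reports = [("B", "A", 10), ("C", "B", 15)]


def combined_report(company, buyers_reports, sellers_reports):
    """Build the combined buyer/seller row for one company."""
    # What the company reports buying vs. what sellers report selling to it.
    buy = sum(a for buyer, _seller, a in buyers_reports if buyer == company)
    sell_to = sum(a for _seller, buyer, a in sellers_reports if buyer == company)
    # What the company reports selling vs. what buyers report buying from it.
    sell = sum(a for seller, _buyer, a in sellers_reports if seller == company)
    buy_from = sum(a for _buyer, seller, a in buyers_reports if seller == company)
    return {
        "Buy": buy, "Sell-To": sell_to, "Diff1": buy - sell_to,
        "Sell": sell, "Buy-From": buy_from, "Diff2": sell - buy_from,
    }


report_b = combined_report("B", buyers_reports, sellers_reports)
print(report_b)
```

For company B this gives 20, 15, 5, 10, 10, 0, the same values as the desired row in the question.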