I have two CSV files, the first like so:
Book1:
ID,TITLE,SUBJECT
0001,BLAH,OIL
0002,BLAH,HAMSTER
0003,BLAH,HAMSTER
0004,BLAH,PLANETS
0005,BLAH,JELLO
0006,BLAH,OIL
0007,BLAH,HAMSTER
0008,BLAH,JELLO
0009,BLAH,JELLO
0010,BLAH,HAMSTER
0011,BLAH,OIL
0012,BLAH,OIL
0013,BLAH,OIL
0014,BLAH,JELLO
0015,BLAH,JELLO
0016,BLAH,HAMSTER
0017,BLAH,PLANETS
0018,BLAH,PLANETS
0019,BLAH,HAMSTER
0020,BLAH,HAMSTER
And then a second CSV with items associated with the first list, with ID being the common attribute between the two.
Book2:
ID,ITEM
0001,PURSE
0001,STEAM
0001,SEASHELL
0002,TRUMPET
0002,TRAMPOLINE
0003,PURSE
0003,DOLPHIN
0003,ENVELOPE
0004,SEASHELL
0004,SERPENT
0004,TRUMPET
0005,CAR
0005,NOODLE
0006,CANNONBALL
0006,NOODLE
0006,ORANGE
0006,SEASHELL
0007,CREAM
0007,CANNONBALL
0007,GUM
0008,SERPENT
0008,NOODLE
0008,CAR
0009,CANNONBALL
0009,SERPENT
0009,GRAPE
0010,SERPENT
0010,CAR
0010,TAPE
0011,CANNONBALL
0011,GRAPE
0012,ORANGE
0012,GUM
0012,SEASHELL
0013,NOODLE
0013,CAR
0014,STICK
0014,ORANGE
0015,GUN
0015,GRAPE
0015,STICK
0016,BASEBALL
0016,SEASHELL
0017,CANNONBALL
0017,ORANGE
0017,TRUMPET
0018,GUM
0018,STICK
0018,GRAPE
0018,CAR
0019,CANNONBALL
0019,TRUMPET
0019,ORANGE
0020,TRUMPET
0020,CHERRY
0020,ORANGE
0020,GUM
The real datasets are millions of records, so I'm sorry in advance for my simple example.
The problem I need to solve is merging and collating the data so that I can see which item groupings most commonly appear together on the same ID (e.g. GRAPE, GUM, SEASHELL appear together 340 times; ORANGE and STICK 89 times; etc.).
Then I need to see whether there is any change/deviation from those overall co-occurrence results when they are grouped by SUBJECT.
Tools I'm familiar with are Excel and SQL, but I also have PowerBI and Alteryx at my disposal.
Full disclosure: Not homework, or work, but a volunteer project, thus my unfamiliarity with this kind of data manipulation.
Thanks in advance.
An Alteryx solution:
Drag the two .csv files onto your canvas (seen as book1.csv and book2.csv in my picture); Alteryx will create "Input" tools for you.
Drag a "Join" tool on and connect the two .csv files to its inputs; select "ID" as the join field; deselect "Right_ID" in the output since it's merely a duplicate of "ID".
Drag a "Summary" tool on and connect the Join tool's output to the Summary tool's input; select all three of the output fields and add them as a "group by", then add the ID column with a "count".
Drag a "Browse" tool on and connect the Summary tool's output to the Browse tool's input.
Run the workflow.
After all that, click on the Browse tool and you should see the result in my screenshot (which shows just the first ten rows of output).
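Since SQL is also on your list of tools, the join-and-summarize above is roughly equivalent to the following sketch (assuming the two files have been loaded into tables named Book1 and Book2):
SELECT b1.TITLE, b1.SUBJECT, b2.ITEM, COUNT(b1.ID) AS Records
FROM Book1 AS b1
INNER JOIN Book2 AS b2
    ON b1.ID = b2.ID   -- ID is the common attribute between the two books
GROUP BY b1.TITLE, b1.SUBJECT, b2.ITEM;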
+1 for taking on a volunteer project - I think anyone who knows data can have a big impact in support of their favourite group or cause.
I would just pull the 2 files into Power BI as 2 separate tables (Get Data / From File). Create a relationship between the 2 tables based on ID (it might get auto-generated). It should be one to many.
Then I would add a Calculated Column to the Book1 table to concatenate the related ITEM values, e.g.:
Items =
CALCULATE (
    CONCATENATEX (
        DISTINCT ( 'Book2'[ITEM] ),
        'Book2'[ITEM],
        ", ",
        'Book2'[ITEM], ASC
    )
)
Now you can use that Items field in visuals (e.g. a Table), along with Count of ID to get the frequency.
Adding Subject to a copy of the table (e.g. to the Columns well of a Matrix) will produce your grouped scenario, or you could add a Subject Slicer.
As you will be comparing subsets of varying size, I would change Count of ID to Show value as - % of grand total.
A slightly different solution using Alteryx.
With this dataset, there are very few repeating 3- or 4-item groups. You can do the two-item affinity analysis and derive a probability for the 3- or 4-item groups, or you can count the 3- and 4-item groups individually. I believe what you want is the latter, as your probability of getting grapes with oranges may be altered by whether you have bananas in the cart or not.
Anyway, I did not join in the subject until after finding all of my combinations. I found all the combinations by taking the Cartesian join of two, then three, then four copies of the original set. I then removed duplicates by ensuring items were always in alphabetical order within each row, and counted occurrences of each combination. More joins can be added in the same pattern to count groups of 5, 6, 7...
Once you have the counts of occurrences, I would join back to the subjects, perform the same analysis for each group, and compare to the overall results.
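For reference, if you end up doing this part in SQL instead, the same idea (self/Cartesian join, force alphabetical order to drop duplicate orderings, then count) might look roughly like this for pairs, again assuming tables named Book1 and Book2:
SELECT a.ITEM AS Item1, b.ITEM AS Item2, COUNT(*) AS Occurrences
FROM Book2 AS a
INNER JOIN Book2 AS b
    ON a.ID = b.ID
   AND a.ITEM < b.ITEM   -- alphabetical order keeps each pair only once
GROUP BY a.ITEM, b.ITEM
ORDER BY COUNT(*) DESC;
A third copy of Book2 with b.ITEM < c.ITEM extends this to triples, and joining Book1 on ID lets you add SUBJECT to the GROUP BY for the per-subject comparison.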
I'm supposed to disclose that I work for Alteryx.
First of all, if you are using Windows, just navigate to the directory that contains the CSV files and run the following command:
copy pattern newFileName.csv
For example:
copy *.csv merged.csv
Now you have created one CSV file. If the file is too large to process in one pass, use whatever your programming language offers: in Python you can use generators to process it line by line, or read it chunk by chunk with pandas.
I hope this helps.
We have an investment account, with mainly equities. We can download transaction history in xlsx format.
The table format looks like this
Date    Activity   Symbols/description    Price
10/29   Buy        FLY/leasing LTD        13.92
10/22   Sell       BRFRF/BURFORD cap       1.99
10/20   Buy        BRFRF/BURFORD cap       1.92
10/17   NameChg    JGWPT/JGWPT Holding
10/13   Buy        JGW/JG Wentworth        5.31
Now I am going to write a VBA module to:
1. Generate a list in a separate tab (worksheet) holding the UNIQUE symbols -- the names of the equities in my portfolio.
2. Only consider the 'Buy' or 'Sell' transactions; all other activities should be ignored.
3. Any time the transaction history is updated, check for new symbols (I don't care whether it is fully automatic or triggered by a button click) and add them to the list.
I am sure I can do it with a lot of conditional statements and loops, but I believe there must be better ways.
Any idea would be helpful. Thanks so much.
Split Symbols/Description on the "/" using Text to Columns. Make sure you preserve the Price column.
Use Microsoft Query on the Excel file in question to pull the relevant information into a table in the second sheet.
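The query behind that Microsoft Query step would be something like the following sketch (it assumes the transactions sit on a sheet named Transactions, which the Excel driver exposes as [Transactions$]):
SELECT DISTINCT [Symbols/description]
FROM [Transactions$]
WHERE Activity IN ('Buy', 'Sell');
The symbol can then be split off at the "/" with Text to Columns as suggested above, or in the query itself if the driver supports string functions such as Left() and InStr().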
I'm building a system for my company to keep track of internal orders between our warehouses. We have material that goes out from warehouse 1 to warehouse 2, and we tend to lose track of how much of "x" is in warehouse 1 and how much is in warehouse 2. So I want to implement an Access database where a user fills in a form and says: order 1: 500 of "x"; order 2: 300 of "y". Then another user fills in an exit form saying 1 of "x" is going out, so I need the program to keep track of the total ordered and how much has gone out to fill order 1, and so on.
My idea is to have both an order number and an ID number for each "x". Every time someone assembles one "x", they fill in the form and print a label directly from Access (I have this part working already), while keeping a record of when it was assembled, who verified it, and what was verified (it will also work as quality control).
What I don't know is how to program the database so that when it reaches 500 of "x", the ID number for "x" starts again from 1.
This is the one major issue with my program right now. I'm not experienced with Access databases or VBA, but I'm getting there with a tip and a trick from here and there, so no need to be careful with the technical language; I will Google it if I have to :p
EDIT:
The table structure goes as follows:
One table is the main table where I record the check that is made for every product; it includes the model of the product, the ID that I want to reset after a set number of products checked, and a concatenated field that includes most of this information to generate a QR code.
Then there is a table for the order number, which is connected to a form that records each new order with a date/time field, the order number itself, and the number of products. This number of products must then be compared by the code against how many products have been checked to date, keeping the order number field updated so we can keep track of the order.
Then there is another minor table just to supply values for the form (the product models).
Thank you for your answers ;)
See this MSDN Documentation
Unfortunately in Access, you cannot 'reset' an ID field, unless you move the records to a newly created table and use that table for every 500 records.
As for the user control and login form, I'm afraid those are separate questions that must be asked in a different thread.
To get you started:
You can set the RecordSource of a form to a table, and when users make entries, the data will be saved to the table. You can also use a form with controls (text boxes, comboboxes, etc.) and create a button that runs a query to insert these records into a table.
The login piece - you can encrypt the database with a password. That may/may not be sufficient.
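As a minimal sketch of that second option (all names here - tblChecks, frmCheck, and its controls - are hypothetical placeholders for your own objects), the button could run an append query such as:
INSERT INTO tblChecks (ProductModel, VerifiedBy)
VALUES (Forms!frmCheck!cboModel, Forms!frmCheck!txtVerifiedBy);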
I would suggest you change your schema, if possible. Something like the following:
Orders
OrderID (Autonumber)
ProductID (link to your Products table)
QuantityRequested
Deliverables
DeliverableID (Autonumber)
OrderID (link to your Orders table)
SequenceNumber: in the BeforeInsert event set this value equal to:
DCount("*", "Deliverables", "OrderID=" & Me.OrderID) + 1
I'm assuming that your form has a control named OrderID that is bound to the OrderID field of the Deliverables table.
The code uses the DCount() function to get the count of all the other deliverables that have already been created for this order. If this is the first deliverable, DCount() will return 0. It then adds 1 to this count to get the sequence number of the next deliverable.
If the new SequenceNumber is greater than the quantity requested, you could display a message saying that the order has been filled and cancel the creation of the Deliverable record.
This is just one approach and it is not a complete solution. I'm assuming that once assigned a sequence number a deliverable cannot be deleted. You might need to make allowances for deliverables that get lost or damaged. You could incorporate a status field to the Deliverable table to deal with this, but you would still need to make a decision about what to do with the SequenceNumber.
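For reference, the suggested schema in Access DDL might look roughly like this (names are placeholders; building the tables in the designer works just as well):
CREATE TABLE Orders (
    OrderID COUNTER CONSTRAINT pkOrders PRIMARY KEY,
    ProductID LONG,
    QuantityRequested LONG
);
CREATE TABLE Deliverables (
    DeliverableID COUNTER CONSTRAINT pkDeliverables PRIMARY KEY,
    OrderID LONG,
    SequenceNumber LONG
);
Access runs only one DDL statement per query, so execute the two CREATE TABLE statements separately. ProductID and OrderID would then be linked to the Products and Orders tables in the Relationships window (or with FOREIGN KEY constraints).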
I have two Excel files at work. In one I maintain the rates of assets and the dates when the rates were issued; the other has the list of assets and the dates when they were sold.
So one Excel file has the following columns:
Asset------Rate------Rate_Issued_On
1. X-------1500------21-Apr-2014
2. X-------2000------28-Aug-2013
3. Z-------2200------11-Jan-2014
4. X-------3000------1-Jan-2014
The other Excel file has (let's suppose):
Asset-----Sold_Date
1. X------1-Dec-2013
2. Z------12-Mar-2014
Now, since the sold date of Asset X (1-Dec-2013) lies between 28-Aug-2013 and 1-Jan-2014, it should take the rate of 2000. If, for example, the sold date were 22-Apr-2014, it should take the rate 1500. If the sold date is 27-Aug-2013, it should display a blank record. So basically the sold date should be greater than the rate's issued date, and the rate should come from the latest such issued date.
I can easily get this working in Excel, but the file has now become so large that it runs very slowly, so I want to do this in MS Access instead. Is this possible? (I am a novice in MS Access, so I kindly request you to go a little easy on me.)
Thanks
Yes - a few simple queries can match up the data the way you want. If your two tables are called Rates and Sales, you could use two queries to get the results you need. The first query would use the Sales and Rates tables to find the largest rate-issued date that is less than the sold date, and the second query would match this back to the Rates table to get the rate on that date.
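A rough sketch of those two queries in Access SQL, assuming the tables are imported as Rates(Asset, Rate, Rate_Issued_On) and Sales(Asset, Sold_Date), with the first saved as qryLatestRateDate:
SELECT s.Asset, s.Sold_Date, MAX(r.Rate_Issued_On) AS LatestRateDate
FROM Sales AS s
INNER JOIN Rates AS r
    ON r.Asset = s.Asset
   AND r.Rate_Issued_On < s.Sold_Date
GROUP BY s.Asset, s.Sold_Date;
SELECT q.Asset, q.Sold_Date, r.Rate
FROM qryLatestRateDate AS q
INNER JOIN Rates AS r
    ON r.Asset = q.Asset
   AND r.Rate_Issued_On = q.LatestRateDate;
Sales rows with no earlier rate simply drop out of the first query; switch the joins to LEFT JOIN if those should appear with a blank rate instead.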
A very similar problem is described in How to use another table fields as a criteria for MS Access
I have an Access form with a few controls on it, like start_date, end_date, move_type, mover_name, etc. When the user fills in these fields and clicks the query command button, a select query runs and fetches records from various tables depending on the criteria the user entered.
Later the same records are exported to an Excel file and a report is generated for the user.
Now I need to do more with this: my select query has a field quoted_price, and depending on the price quoted the records need to be categorized.
For example, if quoted_price < 500 it is "domestic apartment1"; if quoted_price is > 500 and < 1500 it is "domestic apartment2"; etc. When all the records returned by the select query have been checked against these criteria, the records need to be grouped accordingly. Then the count for each category is multiplied by a unit value and a total charge is calculated.
I tried building many queries and lookup tables to do this, but to no avail. Any help will be much appreciated.
Are you looking to create a new field depending on the condition? That may be step one of what you are trying to do; if so, that is pretty easy. See here: http://allenbrowne.com/casu-14.html
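Building on that link, here is a sketch of the calculated-field approach in Access SQL. It assumes your existing select query is saved as qryMoves and the price field is named quoted_price (adjust to your real names). First, a query that adds the category, saved as qryMovesWithCategory:
SELECT q.*,
    Switch([quoted_price] < 500, "domestic apartment1",
           [quoted_price] < 1500, "domestic apartment2") AS Category
FROM qryMoves AS q;
Then a second query groups and counts:
SELECT Category, COUNT(*) AS Records
FROM qryMovesWithCategory
GROUP BY Category;
Switch() evaluates its conditions in order, so the second band only catches prices from 500 up to 1500; add more condition/value pairs for the remaining categories. The per-category unit values can live in a small lookup table joined to this result to calculate the total charge.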