Group by in Orange, Data mining

I have a supermarket dataset in which each item of a transaction is stored in its own row. So, if transaction 1 contained milk, bread and coffee, the items are on three separate rows and the transaction attribute occurs three times. What I want to do is group the rows by transaction so that all of a transaction's items are concatenated in one column. Then, lastly, I want to apply association rules, with each item separated into its own column as an itemset. Is this even possible in Orange?
Worth mentioning, I managed to do this in RapidMiner easily with the same dataset. I used the Aggregate operator, concatenated the item attributes and then grouped by transactions.

If I understand this correctly, you wish to aggregate columns, not rows. If so, there's an Aggregate Columns widget available for that. To perform a group-by on rows, there's currently the Pivot widget, which has a Group by output. We are working on a separate Group by widget, which should be available in the next release.
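For comparison, the grouping described in the question can be sketched outside Orange in a few lines of plain Python. This is only an illustration under assumptions: the sample rows and the names `rows`, `baskets`, `concatenated` and `itemsets` are made up, and it does not use any Orange API.

```python
from collections import defaultdict

# Hypothetical sample data: one (transaction, item) pair per row,
# mirroring the layout described in the question.
rows = [
    (1, "milk"),
    (1, "bread"),
    (1, "coffee"),
    (2, "bread"),
    (2, "eggs"),
]

# Group rows by transaction, keeping items in their original order.
baskets = defaultdict(list)
for transaction, item in rows:
    baskets[transaction].append(item)

# Concatenated form: one string of items per transaction.
concatenated = {t: ", ".join(items) for t, items in baskets.items()}

# Itemset form: one boolean column per item, as association-rule
# tools usually expect.
all_items = sorted({item for _, item in rows})
itemsets = {
    t: {name: name in items for name in all_items}
    for t, items in baskets.items()
}

print(concatenated)
print(itemsets)
```

The `concatenated` mapping corresponds to what the RapidMiner Aggregate step produced, and `itemsets` is the one-column-per-item layout that the association-rule step needs.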

Related

SQL - Selecting columns based on attributes of the column

I am currently designing a SQL database to house a large amount of biological data. The main table has over 100 columns, where each row is a particular sampling event and each column is a species name. Values are the number of individuals found of that species for that sampling event.
Often, I would like to aggregate species together based on their taxonomy. For example: suppose Sp1, Sp2, and Sp3 belong to Family1; Sp4, Sp5, and Sp6 belong to Family2; and Family1 and Family2 belong to Class1. How do I structure the database so I can simply query a particular Family or Class, instead of listing 100+ columns each time?
My first thought was to create a second table that lists the attributes of each column from the first table, such that the primary key in the second table corresponds to the column headers in table 1, and the columns in table 2 are the categories I would want to select by (such as Family, Feeding type, life stage, etc.). However, I'm not sure how to write a query that can join the tables in such a way.
I'm a newbie to SQL, and am not sure if I'm going about this in completely the wrong way. How can I structure my data/write queries to accomplish my goal?
Thanks in advance.

No, no, no. Don't make species columns in the table.
Instead, where you have one row now, you want multiple rows. It would have columns such as:
id: auto generated sequential number
sampleId: whatever each row in the current table belongs to
speciesId: reference to the species table
columns of data for that species on that sampling
The species table could then hold the entire taxonomic hierarchy for each species: genus, family, order, and so on.
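A minimal sketch of this layout, using Python's built-in sqlite3 module so it can be run as-is. The table names `species` and `observation`, the columns `individuals` and `className`, and the sample counts are assumptions made for illustration; the species and family names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE species (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    family    TEXT NOT NULL,
    className TEXT NOT NULL
);
CREATE TABLE observation (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    sampleId    INTEGER NOT NULL,
    speciesId   INTEGER NOT NULL REFERENCES species(id),
    individuals INTEGER NOT NULL
);
""")

# Taxonomy from the question: Sp1-Sp3 in Family1, Sp4-Sp6 in Family2.
conn.executemany(
    "INSERT INTO species VALUES (?, ?, ?, ?)",
    [(i, f"Sp{i}", "Family1" if i <= 3 else "Family2", "Class1")
     for i in range(1, 7)],
)

# Hypothetical counts: (sampleId, speciesId, individuals).
conn.executemany(
    "INSERT INTO observation (sampleId, speciesId, individuals) VALUES (?, ?, ?)",
    [(1, 1, 3), (1, 2, 5), (1, 4, 2), (2, 3, 1), (2, 5, 4), (2, 6, 6)],
)

def family_totals(family):
    """Total individuals per sampling event for one family."""
    return conn.execute(
        """
        SELECT o.sampleId, SUM(o.individuals)
        FROM observation AS o
        JOIN species AS s ON s.id = o.speciesId
        WHERE s.family = ?
        GROUP BY o.sampleId
        ORDER BY o.sampleId
        """,
        (family,),
    ).fetchall()

print(family_totals("Family1"))
print(family_totals("Family2"))
```

Querying by class works the same way with `WHERE s.className = ?`, so no query ever needs to list individual species.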

What's an elegant way to find the minimum value in each row of a table?

I've got a table which has a row per product, with the price that product has at each of ten different merchants. What I'd like to see is the minimum price each product has among those different merchants.
In Excel this would be easy, because the MIN() function there works on any set of cells, whether they're arranged horizontally or vertically. However, MIN() in SQL only acts on columns, so I'd be able to find the cheapest price merchant 1 had across all products, etc.
Is there an elegant way to obtain the minimum price for each row? (Are there OLAP functions that would do this, or does the problem have to be approached using a loop?)

In PostgreSQL, you can do:
select least(price1, price2, price3, ..)
from products
LEAST gives you the minimum value of a list of values. It's the non-aggregate version of MIN.
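The same row-wise idea can be tried without a PostgreSQL server. The sketch below uses Python's built-in sqlite3 module, where the multi-argument form of min() plays the role of LEAST. The three price columns and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (name TEXT, price1 REAL, price2 REAL, price3 REAL)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [("widget", 9.5, 8.0, 8.75), ("gadget", 20.0, 21.5, 19.0)],
)

# With two or more arguments, SQLite's min() is a scalar function that
# compares values within a row, not an aggregate over rows.
cheapest = conn.execute(
    "SELECT name, min(price1, price2, price3) FROM products ORDER BY name"
).fetchall()

print(cheapest)
```

One difference worth checking against the documentation before relying on it: PostgreSQL's LEAST ignores NULL arguments, while SQLite's multi-argument min() returns NULL if any argument is NULL.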

Form/Query: A row of a table, plus two editable fields from two different rows of a 1:n related table

I have three tables, one holding material-data (materials), one holding suppliers (suppliers) and one holding prices per supplier and material (supplierPrices). One material can have multiple prices, one price per supplier.
I have a form that displays various material-data per row. This form also displays an editable price of a specific supplier (supplierID 100). The table relationship in the query is "include all rows of materials where the joined fields are equal" and in the criteria supplierID = 100. So there's exactly one row per material, including the editable price of that supplier.
But now I would like to display a second editable price per row, the price of supplierID 200. If I extend the criteria to "supplierID = 100 OR supplierID = 200", I get two rows per material, which is not what I want. What I want is to display both prices in one row, together with the big bunch of material-data. At first I did it with a VBA function called from the query, but then the control source is an expression, and the data can't be edited or stored.
Is there a way to do this with some special select in the query? Or would I rather have to use VBA (again) to store it in the proper table?
Thanks for your hints.

TRANSFORM Max(supplierPrices.[price]) AS price
SELECT supplierPrices.[materialID]
FROM supplierPrices
GROUP BY supplierPrices.[materialID]
PIVOT supplierPrices.[supplierID];
This returns one row per material with one price column per supplier, but the result is read-only.
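To see what the crosstab computes, here is a rough equivalent using conditional aggregation, run through Python's built-in sqlite3 module rather than Access. The sample prices and the column aliases `price100` and `price200` are assumptions. Like the crosstab, it aggregates, so it only illustrates the shape of the result and does not make the prices editable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE supplierPrices (materialID INTEGER, supplierID INTEGER, price REAL)"
)
conn.executemany(
    "INSERT INTO supplierPrices VALUES (?, ?, ?)",
    [(1, 100, 5.0), (1, 200, 5.5), (2, 100, 7.0), (2, 200, 6.5), (2, 300, 9.0)],
)

# One row per material; each CASE picks out the price of one supplier.
pivoted = conn.execute(
    """
    SELECT materialID,
           MAX(CASE WHEN supplierID = 100 THEN price END) AS price100,
           MAX(CASE WHEN supplierID = 200 THEN price END) AS price200
    FROM supplierPrices
    GROUP BY materialID
    ORDER BY materialID
    """
).fetchall()

print(pivoted)
```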

combining multiple SSAS queries as one dataset

I have a fact table with one date dimension linked three times, to attributes for Ordered, Prepared and Shipped dates. So I am able to get counts on ordering, manufacturing and dispatch, and currently I am running these as three separate Excel pivot table requests and combining them into one graph giving three bars per month. I am wondering if there is a sleeker way of doing this: writing one query that returns the three separate counts as individual measures, rather than running the query three times, once against each dimension.
Currently, the fact table looks something like this: -
DEPID, PRODCODE, ORDERDATE, PROCESSDATE,SHIPDATE
001,001,20120101,20120102,20120104
002,001,20120103,20120105,20120106
003,002,20120104,20120106,20120107
004,002,20120105,20120107,20120108

You could create 3 calculated measures in the cube (Ordering Count, Manufacturing Count, Dispatch Count)...then just drag them into your pivot table in the values section and use a date dimension in the rows group.
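The shape of the result this would give (one row per month with three counts side by side) can be illustrated in plain Python on the sample rows from the question. This is not MDX and not an SSAS API; the names `fact_rows`, `month` and `report` are made up for the illustration.

```python
from collections import Counter

# Sample fact rows from the question:
# DEPID, PRODCODE, ORDERDATE, PROCESSDATE, SHIPDATE
fact_rows = [
    ("001", "001", "20120101", "20120102", "20120104"),
    ("002", "001", "20120103", "20120105", "20120106"),
    ("003", "002", "20120104", "20120106", "20120107"),
    ("004", "002", "20120105", "20120107", "20120108"),
]

def month(yyyymmdd):
    """Reduce a YYYYMMDD date key to its YYYYMM month key."""
    return yyyymmdd[:6]

# One counter per role of the date dimension.
ordered = Counter(month(row[2]) for row in fact_rows)
processed = Counter(month(row[3]) for row in fact_rows)
shipped = Counter(month(row[4]) for row in fact_rows)

# A single result set: month, ordering count, manufacturing count, dispatch count.
months = sorted(set(ordered) | set(processed) | set(shipped))
report = [(m, ordered[m], processed[m], shipped[m]) for m in months]

print(report)
```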

What is the best way to do this SQL task?

What would be the best approach with this SQL-based logic:
I need to get some groups from a table. However there are thousands of items, which can belong to only one of those groups (say one of the five/ten/fifteen groups returned). I can get the groups and then loop all of the item objects and insert them into the group.
Or would it be better to get all the objects which belong to a group, loop over them, and insert them into the group they belong to? What would be the difference in performance?

If you're just looking for the groups, then a simple SELECT DISTINCT group FROM Table will return those. If you want each of the rows and their associated groups, well a SELECT * (not for production use...) would get that as well. If you want them in order, then a SELECT with an ORDER BY group would do.
What are you going to do with this information?
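If the goal is to end up with each item placed under its group, one ordered query followed by a single pass is usually enough. The sketch below assumes a hypothetical `items` table with a `group_name` column (named that way because GROUP is a reserved word in SQL) and uses Python's built-in sqlite3 module.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, group_name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [
        ("apple", "fruit"),
        ("carrot", "vegetable"),
        ("banana", "fruit"),
        ("leek", "vegetable"),
        ("cherry", "fruit"),
    ],
)

# One query returns every item with its group, sorted so that all
# rows of the same group are adjacent.
rows = conn.execute(
    "SELECT group_name, name FROM items ORDER BY group_name, name"
).fetchall()

# groupby() relies on that ordering to collect each group in one pass.
grouped = {
    group: [name for _, name in members]
    for group, members in groupby(rows, key=lambda row: row[0])
}

print(grouped)
```

This avoids issuing one query per group, and it also avoids scanning all items once for every group.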
