Excel - How to count(*) and group by similar to SQL

I'm looking for a way to perform a SQL-type command in Excel. I need to get a count of each string in a column without knowing the strings' text beforehand.
Here's some sample data; I want to get a count of each Name.
Name
----
A
B
C
A
D
B
In SQL I'd write:
SELECT Name, count(*)
FROM #table
group by Name
And I'd expect to get
Name | Count
-----|------
A | 2
B | 2
C | 1
D | 1
How can I perform this operation in Excel?

You could go with a pivot table, which gives you some options to analyze your data. There is a good example with an explanation on this website: http://www.contextures.com/pivottablecountunique.html
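If you'd rather use worksheet formulas than a pivot table, here is a minimal sketch, assuming your names are in A2:A7 and the distinct names are listed in C2:C5 (in Excel 365 you could generate that list with =UNIQUE(A2:A7)):
=COUNTIF($A$2:$A$7, C2)
Enter that in D2 and fill it down; each row then shows how many times the name in column C appears in column A.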

SQL pivot with text based fields

Forgive me, but I can't get this working.
I can find lots of complex pivot examples using numeric values, but nothing basic based on strings to build upon.
Let's suppose this is my source query from a temp table. I can't change this:
select * from #tmpTable
This provides 12 rows:
Row | Name | Code
---------------------------------
1 | July 2019 | 19/20-01
2 | August 2019 | 19/20-02
3 | September 2019 | 19/20-03
.. .. ..
12 | June 2020 | 19/20-12
I want to pivot this and return the data like this:
Data Type | [0] | [1] | [3] | [12]
---------------------------------------------------------------------------
Name | July 2019 | August 2019 | September 2019 | June 2020
Code | 19/20-01 | 19/20-02 | 19/20-03 | 19/20-12
Thanks in advance..
Strings and numbers aren't much different in pivot terms; it's just that you can't use numeric aggregators like SUM or AVG on them. MAX will be fine, and in this case you'll only have one value per cell, so nothing will be lost.
You need to pull your data out into a taller key/value representation before pivoting it back to face the other way round from how it does now.
Unpivot the data:
WITH upiv AS (
SELECT 'Name' as t, row as r, name as v FROM #tmpTable
UNION ALL
SELECT 'Code' as t, row, code FROM #tmpTable
)
Now the data can be regrouped and conditionally aggregated on the r column:
SELECT
t,
MAX(CASE WHEN r = 1 THEN v END) as r1,
MAX(CASE WHEN r = 2 THEN v END) as r2,
...
MAX(CASE WHEN r = 12 THEN v END) as r12
FROM
upiv
GROUP BY
t
You'll need to put the two SQL blocks I present here together so they form a single SQL statement. If you want to know more about how this works, I suggest you run the SQL statement inside the WITH block on its own and take a look at it, and also remove the GROUP BY/MAX from the full statement and look at the result. You'll see the WITH-block query makes the data taller: essentially a key/value pair that tracks what type each value is (Name or Code). When you run the full SQL without the GROUP BY/MAX, you'll see the tall data spread out sideways, giving a lot of NULLs and a diagonal set of cell data (if ordered by r). The GROUP BY collapses all these NULLs, because MAX will pick any value over NULL (and there is only one non-NULL value per cell).
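Assembled into one statement (using the temp table name from the question, and with r3 to r11 elided just as above), it reads:
WITH upiv AS (
SELECT 'Name' as t, row as r, name as v FROM #tmpTable
UNION ALL
SELECT 'Code' as t, row, code FROM #tmpTable
)
SELECT
t,
MAX(CASE WHEN r = 1 THEN v END) as r1,
MAX(CASE WHEN r = 2 THEN v END) as r2,
...
MAX(CASE WHEN r = 12 THEN v END) as r12
FROM
upiv
GROUP BY
t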
You could also do this as an UNPIVOT followed by a PIVOT. I've always preferred the form above because not every database supports the UNPIVOT/PIVOT keywords. Arguably, UNPIVOT/PIVOT could perform better because there may be specific optimizations the developers can make (e.g. UNPIVOT can scan the table once, whereas this UNION ALL approach may require multiple scans, and the ways around that can be more memory intensive), but in this case it's only 12 rows. I suspect you're using SQL Server, but if you're using a database that doesn't understand WITH, you can place the bracketed statement of the WITH (including the brackets) between the FROM and the upiv to make it a subquery, following the pattern SELECT ... FROM (SELECT ... UNION ALL SELECT ...) upiv GROUP BY ...; there is no difference in the result.
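For comparison, a rough UNPIVOT/PIVOT sketch in SQL Server syntax (not tested here, and with the middle columns elided the same way) might look like:
SELECT DataType, [1], [2], [3], ..., [12]
FROM (SELECT [Row], Name, Code FROM #tmpTable) src
UNPIVOT (v FOR DataType IN (Name, Code)) AS up
PIVOT (MAX(v) FOR [Row] IN ([1], [2], [3], ..., [12])) AS pvt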
I'll leave renaming the output columns as an exercise for you, but I would urge you to consider not putting spaces or square brackets in the column names as you show in your question.

Same entity from different tables/procedures

I have 2 procedures (say A and B). They both return data with a similar column set (Id, Name, Count). To be more concrete, example results from the procedures are listed below:
A:
Id Name Count
1 A 10
2 B 11
B:
Id Name Count
1 E 14
2 F 15
3 G 16
4 H 17
The IDs are generated with ROW_NUMBER(), as I don't have my own identifiers for these records because they are aggregated values.
In code I query over both results using the same class, NameAndCountView.
And finally my problem. When I look into results after executing both procedures sequentially I get the following:
A:
Id Name Count
1 A 10 ->|
2 B 11 ->|
|
B: |
Id Name Count |
1 A 10 <-|
2 B 11 <-|
3 G 16
4 H 17
As you can see, the results in the second set are replaced by the results with the same IDs from the first. Of course, the problem takes place because I use the same class for retrieving the data, right?
The question is: how do I make this work without creating an additional NameAndCountView2-like class?
If possible, and if you don't really care about the original Id values, maybe you can try having the first query return even Ids:
ROW_NUMBER() over (order by .... )*2
while the second returns odd Ids:
ROW_NUMBER() over (order by .... )*2+1
This would also allow you to know where the Ids come from.
I guess this would be repeatable with N queries by having query number i select:
ROW_NUMBER() over (order by .... )*n+i
Hope this will help.
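A minimal sketch of the idea, with hypothetical aggregate queries standing in for whatever procedures A and B actually select:
-- in procedure A: even Ids (2, 4, 6, ...)
SELECT ROW_NUMBER() OVER (ORDER BY agg.Name) * 2 AS Id, agg.Name, agg.[Count]
FROM (SELECT Name, COUNT(*) AS [Count] FROM dbo.SourceA GROUP BY Name) agg;
-- in procedure B: odd Ids (3, 5, 7, ...)
SELECT ROW_NUMBER() OVER (ORDER BY agg.Name) * 2 + 1 AS Id, agg.Name, agg.[Count]
FROM (SELECT Name, COUNT(*) AS [Count] FROM dbo.SourceB GROUP BY Name) agg;
The two procedures can then never produce the same Id, so the shared NameAndCountView class no longer maps two different rows to one key.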

SQL - SSRS Search for list of values using LIKE

First I will show you example data, expected input and output:
VALUE1 | QTY
-------------
111-01 | 5
111-02 | 3
111-03 | 2
112-01 | 4
The expected input from the user is a VALUE1 or a list of VALUE1 values (in SSRS, a multi-value TEXT parameter).
The expected output is, for example, the SUM of QTY for each VALUE1 selected by the user, but matched on a prefix condition:
like SUBSTRING(VALUE1,1,3)+'%'
In this case, for the user selection 111-01, the output is:
VALUE1 | QTY
-------------
111 | 10
So far it seems I need something like a LIKE operator inside an IN statement. The only solution I have found is to split the parameter from SSRS and do some loop like this (pseudocode):
foreach #parameter in #parameter.Split
where VALUE1 like '#parameter[0]'+'%' or ...
I think there is a more elegant solution. Anyway, this approach is really slow. I am not very experienced with SSRS, so maybe some grouping after the dataset is created could be a solution.
You might want to try something like this (pseudocode):
WITH condition AS (
    -- DISTINCT so a prefix shared by several selected values is only counted once
    SELECT DISTINCT SUBSTRING(c.SplitValue, 1, 3) AS Criteria
    FROM dbo.fncSplit('111-1,112-2,113-3,114-4,115-1,116-1', ',') c
)
SELECT con.Criteria AS VALUE1, SUM(t.QTY) AS QTY
FROM dbo.tblTest t
INNER JOIN condition con ON con.Criteria = SUBSTRING(t.Value1, 1, 3)
GROUP BY con.Criteria
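If you happen to be on SQL Server 2016 or later, the built-in STRING_SPLIT function could stand in for the custom dbo.fncSplit. A sketch under that assumption, with @Parameter holding the comma-joined SSRS multi-value parameter:
WITH condition AS (
    SELECT DISTINCT SUBSTRING(s.value, 1, 3) AS Criteria
    FROM STRING_SPLIT(@Parameter, ',') s
)
SELECT con.Criteria AS VALUE1, SUM(t.QTY) AS QTY
FROM dbo.tblTest t
INNER JOIN condition con ON con.Criteria = SUBSTRING(t.Value1, 1, 3)
GROUP BY con.Criteria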
Try adding a full-text index on the table; the performance might be improved.

sql logical compression of records

I have a table in SQL with more than 1 million records which I want to compress using the following algorithm, and now I'm looking for the best way to do that, preferably without using a cursor.
If the table contains all 10 possible last digits (from 0 to 9) for a number (like 252637 in the following example), we find the most used Source (in our example 'A'), remove all of those digits where Source = 'A', and insert the collapsed digit (here 252637) in their place.
The example below should help with understanding.
Original table :
Digit (bigint) | Source
---------------|-------
2526370 | A
2526371 | A
2526372 | A
2526373 | B
2526374 | C
2526375 | A
2526376 | B
2526377 | A
2526378 | B
2526379 | B
Compressed result:
252637 |A
2526373 |B
2526374 |C
2526376 |B
2526378 |B
2526379 |B
This is just another version of Tom Morgan's accepted answer. It uses division instead of substring to trim the least significant digit off the BIGINT digit column:
SELECT
    t.Digit/10 AS Digit,
    (
        -- For each t, get the Source character that is most abundant (the statistical mode).
        SELECT TOP 1
            i.Source
        FROM
            table i
        WHERE
            (i.Digit/10) = (t.Digit/10)
        GROUP BY
            i.Source
        ORDER BY
            COUNT(*) DESC
    ) AS Source
FROM
    table t
GROUP BY
    t.Digit/10
HAVING
    COUNT(*) = 10
I think it'll be faster, but you should test it and see.
You could identify the rows which are candidates for compression without a cursor (I think) by GROUPing by a substring of the Digit (its length - 1) HAVING count = 10. That would identify digits with 10 child rows. You could use this list to insert into a new table, then use it again to delete from the original table. What would be left would be rows that don't have all 10, which you'd also want to insert into the new table (or copy the new data back to the original).
Does that make sense? I can write it out a bit better if it doesn't.
Possible SQL Solution:
SELECT
    SUBSTRING(t.Digit, 1, LEN(t.Digit) - 1),
    (SELECT TOP 1 Source
     FROM innerTable i
     WHERE SUBSTRING(i.Digit, 1, LEN(i.Digit) - 1)
         = SUBSTRING(t.Digit, 1, LEN(t.Digit) - 1)
     GROUP BY i.Source
     ORDER BY COUNT(*) DESC
    )
FROM table t
GROUP BY SUBSTRING(t.Digit, 1, LEN(t.Digit) - 1)
HAVING COUNT(*) = 10
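To turn this into the actual compression described above (build the list of collapsed rows, delete the originals they replace, then add the collapsed rows back), a rough sketch in SQL Server syntax, assuming the data lives in a hypothetical table dbo.Digits(Digit, Source) and using the division form:
-- 1. Capture one collapsed row per fully populated group, with its most common Source.
SELECT
    t.Digit / 10 AS Digit,
    (SELECT TOP 1 i.Source
     FROM dbo.Digits i
     WHERE i.Digit / 10 = t.Digit / 10
     GROUP BY i.Source
     ORDER BY COUNT(*) DESC) AS Source
INTO #collapsed
FROM dbo.Digits t
GROUP BY t.Digit / 10
HAVING COUNT(*) = 10;

-- 2. Remove the original rows the collapsed rows replace (same group, same most common Source).
DELETE d
FROM dbo.Digits d
JOIN #collapsed c ON c.Digit = d.Digit / 10 AND c.Source = d.Source;

-- 3. Add the collapsed rows.
INSERT INTO dbo.Digits (Digit, Source)
SELECT Digit, Source FROM #collapsed;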

Add a summary row to MS Access query

I have a query stored in MS Access which is doing a standard select from an Access table. I would like to add a summary row at the end showing sums for some of the data above.
I have looked at DSum(), but it isn't suitable, as I would have to include a running total on each row as opposed to just at the end.
Also, note that I don't want to sum data in column a - I would like to get an empty field for the summary of column a.
Example:
a | b | c
-------------
0 | 1 | 2
1 | 1 | 9
| 2 | 11 <-- Sums data above
Does anyone know how this problem can be solved in Access? An alternative might be to define a second query which does the aggregation and then merge it with the recordset of the first one, but this doesn't seem particularly elegant to me.
In SQL Server it is apparently possible to use COMPUTE or ROLLUP, but these are not supported in MS Access.
You can use a union query; the extra Sort column makes the total row sort after the data rows:
SELECT "" As Sort, a,b,c FROM Table
UNION ALL
SELECT "Total" As Sort, Sum(a) As A, Sum(b) As b, Sum(c) As C FROM Table
ORDER BY Sort
EDIT: to leave column a blank in the summary row, as requested:
SELECT "" As Sort, a,b,c FROM Table
UNION ALL
SELECT "Total" As Sort, "" As A, Sum(b) As b, Sum(c) As C FROM Table
ORDER BY Sort