Efficient SQL to merge results or leave to client browser JavaScript?

I was wondering what the most efficient way is to combine results into a single result.
I want to turn
Num Ani Country
---- ----- -------
22 cows Canada
20 pigs Canada
40 cows USA
34 pigs USA
into:
cows pigs Country
----- ----- -------
22 20 Canada
40 34 USA
I want to know if it would be better to use SQL only or if I should feed the whole query result set to the user. Once given to the user, I could use JavaScript to parse it into the desired format.
Also, I do not know exactly how I would write this as a SQL query. The only approach I can think of is a very roundabout one that dynamically creates a temporary table.

The operation you're after is called "pivoting" - the PIVOT info page has a little more detail:
SELECT MAX(CASE WHEN t.ani = 'cows' THEN t.num ELSE NULL END) AS cows,
MAX(CASE WHEN t.ani = 'pigs' THEN t.num ELSE NULL END) AS pigs,
t.country
FROM YOUR_TABLE t
GROUP BY t.country
ORDER BY t.country
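To try the conditional-aggregation query above without a MySQL server, here is a minimal sketch that runs the same pattern against an in-memory SQLite database from Python. The table name your_table, the function name pivot_animals and the sample rows are assumptions standing in for YOUR_TABLE. A CASE without an ELSE yields NULL, so it behaves the same as the ELSE NULL form.

```python
import sqlite3


def pivot_animals(rows):
    """Pivot (num, ani, country) rows into one row per country using
    conditional aggregation: one MAX(CASE ...) per output column."""
    conn = sqlite3.connect(":memory:")
    # your_table is a hypothetical stand-in for YOUR_TABLE in the answer.
    conn.execute("CREATE TABLE your_table (num INTEGER, ani TEXT, country TEXT)")
    conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT MAX(CASE WHEN t.ani = 'cows' THEN t.num END) AS cows,
               MAX(CASE WHEN t.ani = 'pigs' THEN t.num END) AS pigs,
               t.country
        FROM your_table t
        GROUP BY t.country
        ORDER BY t.country
        """
    ).fetchall()
    conn.close()
    return result


data = [(22, "cows", "Canada"), (20, "pigs", "Canada"),
        (40, "cows", "USA"), (34, "pigs", "USA")]
print(pivot_animals(data))  # [(22, 20, 'Canada'), (40, 34, 'USA')]
```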

There should be an efficient way to do the pivoting on the client side (PHP) using a 2-D array. To address Ken Downs' concern about byte pushing: ragged raw data consumes fewer bytes than a fully materialized 2-D pivot table. The simplest case is:
cows | pigs | sheep | goats | country
1 null null null Canada
null 2 null null USA
null null 3 null Egypt
null null null 4 England
This is built from only 4 rows of raw data (3 columns each), i.e. 12 values instead of the 20 cells of the materialized table.
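The client-side pivot can be sketched as follows. Python dictionaries stand in here for the PHP 2-D array (or a JavaScript object), and the function name pivot_rows is hypothetical. It takes the 4 raw rows from the example and builds the ragged pivot, discovering the pivot columns from the data itself.

```python
def pivot_rows(rows):
    """Build a 2-D mapping country -> {animal: num} from raw
    (num, ani, country) rows, collecting pivot columns as they appear."""
    table = {}    # country -> {animal: num}; dicts keep insertion order
    columns = []  # pivot columns in first-seen order
    for num, ani, country in rows:
        if ani not in columns:
            columns.append(ani)  # discover pivot columns dynamically
        table.setdefault(country, {})[ani] = num
    # Materialize one row per country; missing cells become None (NULL).
    grid = [[table[country].get(col) for col in columns] + [country]
            for country in table]
    return columns, grid


raw = [(1, "cows", "Canada"), (2, "pigs", "USA"),
       (3, "sheep", "Egypt"), (4, "goats", "England")]
cols, grid = pivot_rows(raw)
print(cols)     # ['cows', 'pigs', 'sheep', 'goats']
print(grid[0])  # [1, None, None, None, 'Canada']
```

Only the 4 × 3 raw values cross the network; the 4 × 5 grid exists only on the client. Because the columns are discovered while iterating, the same code also copes with an unknown number of pivot columns.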
Doing it in the front end also solves the problem of dynamic pivoting. If the number of pivot columns is unknown, you would need a MySQL procedure that builds a dynamic SQL statement with a "MAX(CASE ...)" expression for each column.
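For comparison, a server-side dynamic pivot means generating one MAX(CASE ...) expression per distinct value before running the query. The answer mentions a MySQL procedure for this; the sketch below instead builds the statement in Python and runs it against SQLite, so the function name dynamic_pivot, the table your_table and the extra goats row are illustrative assumptions. Because the values come from data, each one is escaped as a string literal and as a quoted identifier.

```python
import sqlite3


def dynamic_pivot(conn):
    """Build one MAX(CASE ...) column per distinct animal, then run it."""
    animals = [r[0] for r in conn.execute(
        "SELECT DISTINCT ani FROM your_table ORDER BY ani")]
    cols = []
    for ani in animals:
        literal = "'" + ani.replace("'", "''") + "'"  # SQL string literal
        alias = '"' + ani.replace('"', '""') + '"'     # quoted identifier
        cols.append(f"MAX(CASE WHEN ani = {literal} THEN num END) AS {alias}")
    sql = ("SELECT " + ", ".join(cols) + ", country FROM your_table "
           "GROUP BY country ORDER BY country")
    cursor = conn.execute(sql)
    header = [d[0] for d in cursor.description]
    return header, cursor.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (num INTEGER, ani TEXT, country TEXT)")
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", [
    (22, "cows", "Canada"), (20, "pigs", "Canada"),
    (40, "cows", "USA"), (34, "pigs", "USA"), (7, "goats", "USA"),
])
header, rows = dynamic_pivot(conn)
print(header)  # ['cows', 'goats', 'pigs', 'country']
print(rows)    # [(22, None, 20, 'Canada'), (40, 7, 34, 'USA')]
```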
There are advantages to doing this on the client side:
- it can be done (or at least considered as an alternative);
- the result can be rendered sooner if the savings in network traffic are significant, even though it requires either (1) building the pivot table in PHP or (2) client-side JavaScript;
- it does not require a MySQL procedure for dynamic pivoting.

Related

How to pivot the table containing each value in the output row in SQL

I can't resolve this issue. I tried to use the PIVOT() function; I've read the documentation and tried to apply it. I also searched for an answer but didn't find one.
The main problem with PIVOT() is that it requires an aggregate function, but I don't need one; I only need to pivot the table without any aggregation.
The source table:
COUNTRY LEVEL  NUMBER
------- ------ ------
Germany High   22
Germany Medium 5
Germany Low    3
Italy   High   43
Italy   Medium 21
Italy   Low    8
Canada  High   9
Canada  Medium 3
Canada  Low    13
I'd like the output table to look like this:
COUNTRY High Medium Low
------- ---- ------ ---
Germany 22   5      3
Italy   43   21     8
Canada  9    3      13
Can anybody help me?
How can I do that without an aggregate function, or with one but still get all the values? For example, if I use min() or max(), I only get the minimum or maximum value and the other cells would be empty.
Why do you think that using min/max would leave empty cells? Since there is only one value for each country/level combination, using min or max effectively just picks that one value.
Obviously, if your source data had more than one record for a country/level combination, then you'd need to decide how to deal with it.
This SQL seems to work fine:
select *
from COUNTRY_INFO
pivot(max(NUMBER) for LEVEL in ('High', 'Medium', 'Low'))
as p
order by country;
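SQLite has no PIVOT operator, so the sketch below uses the equivalent conditional aggregation on the question's data to show that MAX() keeps every value when each country/level pair occurs once. It also runs a GROUP BY ... HAVING COUNT(*) > 1 check, which is one way (an addition, not part of the answer) to confirm that assumption before pivoting. The lowercase table and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country_info (country TEXT, level TEXT, number INTEGER)")
conn.executemany("INSERT INTO country_info VALUES (?, ?, ?)", [
    ("Germany", "High", 22), ("Germany", "Medium", 5), ("Germany", "Low", 3),
    ("Italy", "High", 43), ("Italy", "Medium", 21), ("Italy", "Low", 8),
    ("Canada", "High", 9), ("Canada", "Medium", 3), ("Canada", "Low", 13),
])

# Any country/level pair with more than one row would make MAX() lossy.
duplicates = conn.execute(
    "SELECT country, level, COUNT(*) FROM country_info "
    "GROUP BY country, level HAVING COUNT(*) > 1").fetchall()

# Conditional aggregation: the portable equivalent of PIVOT(MAX(...)).
pivoted = conn.execute("""
    SELECT country,
           MAX(CASE WHEN level = 'High'   THEN number END) AS high,
           MAX(CASE WHEN level = 'Medium' THEN number END) AS medium,
           MAX(CASE WHEN level = 'Low'    THEN number END) AS low
    FROM country_info
    GROUP BY country
    ORDER BY country
""").fetchall()

print(duplicates)  # []
print(pivoted)     # [('Canada', 9, 3, 13), ('Germany', 22, 5, 3), ('Italy', 43, 21, 8)]
```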

SQL Combine null rows with non null

Due to the way a particular table is written, I need to do something a little strange in SQL, and I can't find a 'simple' way to do it.
Table
Name  Place    Amount
Chris Scotland NULL
Chris NULL     £1
Amy   England  NULL
Amy   NULL     £5
Output
Chris Scotland £1
Amy England £5
As shown above, I want the null values to be ignored and the rows 'grouped' by Name.
I have this working using FOR XML, but it is incredibly slow. Is there a smarter way to do this?
This is where MAX works, because aggregate functions ignore NULLs:
select
Name
,Place = Max(Place)
,Amount = Max(Amount)
from
YourTable
group by
Name
Naturally, if you have more than one occurrence of a place for a given name, you may get unexpected results.
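A minimal sketch of the same GROUP BY against in-memory SQLite, assuming the blank cells are NULL. It uses AS aliases because the Alias = expression form in the answer is SQL Server syntax; the table name your_table is kept from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (name TEXT, place TEXT, amount TEXT)")
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", [
    ("Chris", "Scotland", None), ("Chris", None, "£1"),
    ("Amy", "England", None), ("Amy", None, "£5"),
])
# MAX() skips NULLs, so each group keeps its single non-NULL value per column.
collapsed = conn.execute("""
    SELECT name, MAX(place) AS place, MAX(amount) AS amount
    FROM your_table
    GROUP BY name
    ORDER BY name
""").fetchall()
print(collapsed)  # [('Amy', 'England', '£5'), ('Chris', 'Scotland', '£1')]
```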

How to load grouped data with SSIS

I have a tricky flat file data source. The data is grouped, like this:
Country City
U.S. New York
Washington
Baltimore
Canada Toronto
Vancouver
But I want it to be in this format when it's loaded into the database:
Country City
U.S. New York
U.S. Washington
U.S. Baltimore
Canada Toronto
Canada Vancouver
Has anyone run into this problem before? Any idea how to deal with it?
The only idea I have so far is to use a cursor, but it is just too slow.
Thank you!
The answer by cha will work, but here is another in case you need to do it in SSIS without temporary/staging tables:
You can run your dataflow through a Script Transformation that uses a DataFlow-level variable. As each row comes in, the script checks the value of the Country column.
If it has a non-blank value, then populate the variable with that value, and pass it along in the dataflow.
If Country has a blank value, then overwrite it with the value of the variable, which will be the last non-blank Country value seen.
EDIT: I looked up your error message and learned something new about Script Components (the Data Flow tool, as opposed to Script Tasks, the Control Flow tool):
"The collection of ReadWriteVariables is only available in the PostExecute method to maximize performance and minimize the risk of locking conflicts. Therefore you cannot directly increment the value of a package variable as you process each row of data. Increment the value of a local variable instead, and set the value of the package variable to the value of the local variable in the PostExecute method after all data has been processed. You can also use the VariableDispenser property to work around this limitation, as described later in this topic. However, writing directly to a package variable as each row is processed will negatively impact performance and increase the risk of locking conflicts."
That comes from this MSDN article, which also has more information about the VariableDispenser workaround if you want to go that route. Apparently I misled you above when I said you can set the value of the package variable in the script. You have to use a variable that is local to the script, and then assign it to the package variable in the PostExecute method. I can't tell from the article whether that means you will not be able to read the variable in the script; if that's the case, the VariableDispenser would be the only option. Alternatively, you could create another variable that the script has read-only access to, and set its value to an expression so that it always has the value of the read-write variable. That might work.
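The per-row logic itself is small. In SSIS it would be written in C# or VB.NET inside the Script Component; the Python sketch below only illustrates the logic, with a hypothetical fill_down function and the last non-blank country kept in a variable local to the loop, as the quoted documentation recommends.

```python
def fill_down(rows):
    """Carry the last non-blank Country forward onto rows whose Country
    is blank. rows: iterable of (country, city) tuples in file order."""
    last_country = None  # plays the role of the script's local variable
    for country, city in rows:
        if country and country.strip():
            last_country = country.strip()  # non-blank: remember it
        yield (last_country, city)          # blank: reuse the last one


raw = [("U.S.", "New York"), ("", "Washington"), ("", "Baltimore"),
       ("Canada", "Toronto"), ("", "Vancouver")]
print(list(fill_down(raw)))
```

Because the state lives in a local variable, no package variable has to be written while rows are processed.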
Yes, it is possible. First you need to load the data into a table with an IDENTITY column:
-- drop table #t
CREATE TABLE #t (id INTEGER IDENTITY PRIMARY KEY,
Country VARCHAR(20),
City VARCHAR(20))
INSERT INTO #t(Country, City)
SELECT a.Country, a.City
FROM OPENROWSET( BULK 'c:\import.txt',
FORMATFILE = 'c:\format.fmt',
FIRSTROW = 2) AS a;
select * from #t
The result will be:
id Country City
----------- -------------------- --------------------
1 U.S. New York
2 Washington
3 Baltimore
4 Canada Toronto
5 Vancouver
And now with a bit of recursive CTE magic you can populate the missing details:
;WITH a as(
SELECT Country
,City
,ID
FROM #t WHERE ID = 1
UNION ALL
SELECT COALESCE(NULLIF(LTrim(#t.Country), ''),a.Country)
,#t.City
,#t.ID
FROM a INNER JOIN #t ON a.ID+1 = #t.ID
)
SELECT * FROM a
OPTION (MAXRECURSION 0)
Result:
Country City ID
-------------------- -------------------- -----------
U.S. New York 1
U.S. Washington 2
U.S. Baltimore 3
Canada Toronto 4
Canada Vancouver 5
Update:
As Tab Alleman suggested below, the same result can be achieved without the recursive query:
SELECT ID
, COALESCE(NULLIF(LTrim(a.Country), ''), (SELECT TOP 1 Country FROM #t t WHERE t.ID < a.ID AND LTrim(t.Country) <> '' ORDER BY t.ID DESC)) AS Country
, City
FROM #t a
ORDER BY ID
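The non-recursive version can be tried without SQL Server. This sketch assumes SQLite, so #t becomes an ordinary table t, the IDENTITY column becomes INTEGER PRIMARY KEY, and TOP 1 becomes LIMIT 1; the rest of the query follows the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# INTEGER PRIMARY KEY numbers rows 1, 2, 3, ... in insertion order.
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, country TEXT, city TEXT)")
conn.executemany("INSERT INTO t (country, city) VALUES (?, ?)", [
    ("U.S.", "New York"), ("", "Washington"), ("", "Baltimore"),
    ("Canada", "Toronto"), ("", "Vancouver"),
])
# For a blank country, look up the nearest earlier row with a non-blank one.
filled = conn.execute("""
    SELECT a.id,
           COALESCE(NULLIF(LTRIM(a.country), ''),
                    (SELECT t.country FROM t
                     WHERE t.id < a.id AND LTRIM(t.country) <> ''
                     ORDER BY t.id DESC LIMIT 1)) AS country,
           a.city
    FROM t AS a
    ORDER BY a.id
""").fetchall()
print(filled)
```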
BTW, this is the format file for your input data (if you want to try the scripts, save the input data as c:\import.txt and the format file below as c:\format.fmt):
9.0
2
1 SQLCHAR 0 11 "" 1 Country SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 100 "\r\n" 2 City SQL_Latin1_General_CP1_CI_AS

Eliminate duplicate records/rows?

I'm trying to list the results of a multi-table query as one row with 2 columns. I have the correct data that I need; I merely need to trim it down to 1 line of results. In other words, eliminate duplicate entries in the result. I'm using a value not shown here, school_id. Should I go with that as a distinct value? Can I do that without displaying the school_id?
SQL> select DISTINCT(school_name),Team_Name
from school, team
where team.team_name like '%B%'
AND school.school_id = team.school_id;
SCHOOL_NAME TEAM_NAME
-------------------------------------------------- ----------
Lawrence Central High School Bears
Lawrence Central High School BEars
Lawrence Central High School BEARS
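The problem, as I'm sure you know, is that "Bears" appears in 3 different letter cases here. The simple fix is to apply UPPER (or LOWER) to Team_Name so that only 1 record is returned:
select DISTINCT school_name, UPPER(Team_Name) AS Team_Name
from school, team
where team.team_name like '%B%'
AND school.school_id = team.school_id;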
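A minimal check of the UPPER fix against in-memory SQLite, with the two tables reduced to the columns the query uses (the schema and school_id value are assumptions). The comma join is written as an equivalent explicit JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE school (school_id INTEGER, school_name TEXT);
    CREATE TABLE team (school_id INTEGER, team_name TEXT);
    INSERT INTO school VALUES (1, 'Lawrence Central High School');
    INSERT INTO team VALUES (1, 'Bears'), (1, 'BEars'), (1, 'BEARS');
""")
# Normalizing case before DISTINCT collapses the three spellings into one.
rows = conn.execute("""
    SELECT DISTINCT school.school_name, UPPER(team.team_name) AS team_name
    FROM school
    JOIN team ON school.school_id = team.school_id
    WHERE team.team_name LIKE '%B%'
""").fetchall()
print(rows)  # [('Lawrence Central High School', 'BEARS')]
```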

Storing parameterized definitions of sets of elements and single pass queries to fetch them in SQL

Suppose a database table contains properties of some elements:
Table Element (let's say 1 000 000 rows):
ElementId Property_1 Property_2 Property_3
------- ---------- ---------- ----------
1 abc 1 1
2 bcd 1 2
3 def 2 4
...
The table is updated frequently. I'd like to store definitions of sets of these elements so that a single SQL statement returns, e.g.:
SetId Element
--- -------
A 2
B 1
B 3
C 2
C 3
...
I'd also like to change the definitions when needed. So far I have stored the definitions of the sets as unions of intersections like this:
Table Subset (~1 000 rows):
SubsetId Property Value Operator
-------- -------- ----- --------
1 1 bcd =
1 3 1 >
2 2 3 <=
...
and
Table Set (~300 rows):
SetId SubsetId
--- ------
...
E 3
E 4
F 7
F 9
...
In SQL I suppose I could generate lots of case expressions from the tables, but so far I've just loaded the tables and used an external tool to do essentially the same thing.
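To make the union-of-intersections model concrete, here is a Python sketch of what such an external tool might do: each Subset is the AND of its (Property, Value, Operator) rows, and each Set is the OR of its subsets. The in-memory tables, the set ids X and Y, and the supported operator list are hypothetical; values are assumed to already have the right types.

```python
import operator

# Hypothetical in-memory copies of the Element, Subset and Set tables.
elements = {  # ElementId -> {property number: value}
    1: {1: "abc", 2: 1, 3: 1},
    2: {1: "bcd", 2: 1, 3: 2},
    3: {1: "def", 2: 2, 3: 4},
}
subset_rows = [  # (SubsetId, Property, Value, Operator)
    (1, 1, "bcd", "="),
    (1, 3, 1, ">"),
    (2, 2, 3, "<="),
]
set_rows = [("X", 1), ("Y", 1), ("Y", 2)]  # (SetId, SubsetId), hypothetical

OPS = {"=": operator.eq, "<>": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def members_of_subset(subset_id):
    """Intersection: elements satisfying every condition of the subset."""
    conditions = [(p, v, op) for sid, p, v, op in subset_rows if sid == subset_id]
    return {eid for eid, props in elements.items()
            if all(OPS[op](props[p], v) for p, v, op in conditions)}


def members_of_set(set_id):
    """Union of the member subsets of a set."""
    result = set()
    for sid, subset_id in set_rows:
        if sid == set_id:
            result |= members_of_subset(subset_id)
    return result


def set_membership():
    """(SetId, ElementId) pairs, like the desired query output."""
    set_ids = sorted({sid for sid, _ in set_rows})
    return [(sid, eid) for sid in set_ids for eid in sorted(members_of_set(sid))]


print(set_membership())  # [('X', 2), ('Y', 1), ('Y', 2), ('Y', 3)]
```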
When I came up with this, I was pleased (and also implemented it). Lately I've been wondering whether it is as wonderful as I thought. Is there a better way to store the definitions of the sets?
As an alternative, I think duck typing may be intuitive here.
For example, all modern languages (C#, Java, Python) have the concept of sets. If you are going to "intersect" or "union" them (set operators) via SQL, then you have to store them in a relational way. Otherwise, why not store them in a language-native way (as opposed to a relational one)? By native way, I mean that if it was done in Python with a Python set, then that is what I would persist. Same with Java or C#.
So if set 10 had the members 1, 4, 5, 6, it would be persisted in the DB as follows:
SetId Set
______________________________________
10 1,4,5,6
11 2,3
12 null
Sure, this has the disadvantage that it could be proprietary, or maybe even non-performant, which you can judge better since you have the complete problem definition. If you need SQL to analyze it, my suggestion may have further downsides.
In a sense, the set representation in each of these languages is like a DSL (domain-specific language): if you need to 'talk' a lot of set-stuff between your application classes/objects, then why not use the natural fit?
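The language-native storage suggested above amounts to serializing the set into one column and parsing it back. A minimal Python sketch, assuming integer members and sorting them so the stored text is stable; the helper names serialize and deserialize are hypothetical. With this layout the database itself can no longer join or index on individual members.

```python
def serialize(members):
    """Store a set as comma-separated text, e.g. {6, 1, 5, 4} -> '1,4,5,6'.
    None (no set, like set 12 in the table) stays None (NULL)."""
    if members is None:
        return None
    return ",".join(str(m) for m in sorted(members))


def deserialize(text):
    """Parse the stored text back into a Python set of ints."""
    if text is None:
        return None
    return {int(part) for part in text.split(",") if part}


print(serialize({6, 1, 5, 4}))  # 1,4,5,6
print(deserialize("2,3"))       # {2, 3}
```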