Which clause could you use to restrict the values returned by a group function?

The LIMIT clause in a SELECT query sets a maximum number of rows for the result set. Pre-selecting the maximum size of the result set helps Impala to optimize memory usage while processing a distributed query.

Syntax:

LIMIT constant_integer_expression

The argument to the LIMIT clause must evaluate to a constant value. It can be a numeric literal, or another kind of numeric expression involving operators, casts, and function return values. You cannot refer to a column or use a subquery.

Usage notes:

This clause is useful in contexts such as:

  • To return exactly N items from a top-N query, such as the 10 highest-rated items in a shopping category or the 50 hostnames that refer the most traffic to a web site.
  • To demonstrate some sample values from a table or a particular query. (To display some arbitrary items, use a query with no ORDER BY clause. An ORDER BY clause causes additional memory and/or disk usage during the query.)
  • To keep queries from returning huge result sets by accident if a table is larger than expected, or a WHERE clause matches more rows than expected.

Originally, the value for the LIMIT clause had to be a numeric literal. In Impala 1.2.1 and higher, it can be a numeric expression.

Prior to Impala 1.4.0, Impala required any query including an ORDER BY clause to also use a LIMIT clause. In Impala 1.4.0 and higher, the LIMIT clause is optional for ORDER BY queries. In cases where sorting a huge result set requires enough memory to exceed the Impala memory limit for a particular executor Impala daemon, Impala automatically uses a temporary disk work area to perform the sort operation.

See ORDER BY Clause for details.

In Impala 1.2.1 and higher, you can combine a LIMIT clause with an OFFSET clause to produce a small result set that is different from a top-N query, for example, to return items 11 through 20. This technique can be used to simulate "paged" results. Because Impala queries typically involve substantial amounts of I/O, use this technique only for compatibility in cases where you cannot rewrite the application logic. For best performance and scalability, wherever practical, query as many items as you expect to need, cache them on the application side, and display small groups of results to users using application logic.

Restrictions:

Correlated subqueries used in EXISTS and IN operators cannot include a LIMIT clause.

Examples:

The following example shows how the LIMIT clause caps the size of the result set, with the limit being applied after any other clauses such as WHERE.

[localhost:21000] > create database limits;
[localhost:21000] > use limits;
[localhost:21000] > create table numbers (x int);
[localhost:21000] > insert into numbers values (1), (3), (4), (5), (2);
Inserted 5 rows in 1.34s
[localhost:21000] > select x from numbers limit 100;
+---+
| x |
+---+
| 1 |
| 3 |
| 4 |
| 5 |
| 2 |
+---+
Returned 5 row(s) in 0.26s
[localhost:21000] > select x from numbers limit 3;
+---+
| x |
+---+
| 1 |
| 3 |
| 4 |
+---+
Returned 3 row(s) in 0.27s
[localhost:21000] > select x from numbers where x > 2 limit 2;
+---+
| x |
+---+
| 3 |
| 4 |
+---+
Returned 2 row(s) in 0.27s

For top-N and bottom-N queries, you use the ORDER BY and LIMIT clauses together:

[localhost:21000] > select x as "Top 3" from numbers order by x desc limit 3;
+-------+
| top 3 |
+-------+
| 5     |
| 4     |
| 3     |
+-------+
[localhost:21000] > select x as "Bottom 3" from numbers order by x limit 3;
+----------+
| bottom 3 |
+----------+
| 1        |
| 2        |
| 3        |
+----------+
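
The LIMIT and OFFSET combination described earlier can be sketched against the same five-row numbers table. This is a minimal illustration that assumes the table still holds the values 1 through 5; note that Impala expects an ORDER BY clause whenever OFFSET is used, so that the skipped rows are well defined.

```sql
-- Second "page" of two rows, ordered ascending.
-- The rows 1 and 2 are skipped, so the query returns 3 and 4.
SELECT x FROM numbers ORDER BY x LIMIT 2 OFFSET 2;
```

As the usage notes point out, paging this way re-runs the query for every page, so it is best kept for cases where the application cannot cache a larger result set.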

You can use constant values besides integer literals as the LIMIT argument:

-- Other expressions that yield constant integer values work too.
SELECT x FROM t1 LIMIT 1e6;                        -- Limit is one million.
SELECT x FROM t1 LIMIT length('hello world');      -- Limit is 11.
SELECT x FROM t1 LIMIT 2+2;                        -- Limit is 4.
SELECT x FROM t1 LIMIT cast(truncate(9.9) AS INT); -- Limit is 9.

This SQL tutorial focuses on the SQL Group Functions in SQL Server, and provides explanations, examples and exercises.

Definition

SQL Server group functions operate on sets of rows to give one result per group, for example:

[Illustration: sample table used for the results below]

SQL Server Common Group Functions

Function                Description                                                    Syntax
SUM                     Returns the total sum                                          SUM(salary)           -- Result: 20000
MIN                     Returns the lowest value                                       MIN(salary)           -- Result: 1000
MAX                     Returns the highest value                                      MAX(salary)           -- Result: 7000
AVG                     Returns the average value                                      AVG(salary)           -- Result: 4000
COUNT(*)                Returns the number of records in a table                       COUNT(*)              -- Result: 5
COUNT(column)           Returns the number of non-NULL values in the specified column  COUNT(name)           -- Result: 4
COUNT(DISTINCT column)  Returns the number of distinct values                          COUNT(DISTINCT name)  -- Result: 3

* Results based on the illustration mentioned above 
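
Because the illustration is not reproduced here, the following sketch uses a hypothetical five-row temporary table whose rows were chosen so that each function returns the result listed above. The table name #Employees, the names, and the individual salaries are assumptions made for illustration only.

```sql
-- Hypothetical sample data; these rows are assumptions, not the tutorial's own table.
CREATE TABLE #Employees (name VARCHAR(20), salary INT);

INSERT INTO #Employees (name, salary)
VALUES ('Ann', 1000),
       ('Ann', 3000),
       ('Bob', 4000),
       ('Cid', 5000),
       (NULL,  7000);

SELECT SUM(salary)          AS total_salary,    -- 20000
       MIN(salary)          AS lowest_salary,   -- 1000
       MAX(salary)          AS highest_salary,  -- 7000
       AVG(salary)          AS average_salary,  -- 4000
       COUNT(*)             AS row_count,       -- 5
       COUNT(name)          AS name_count,      -- 4 (the NULL name is skipped)
       COUNT(DISTINCT name) AS distinct_names   -- 3 (Ann, Bob, Cid)
FROM   #Employees;

DROP TABLE #Employees;
```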

SQL Server GROUP Functions and NULL

  • In SQL Server, all group functions except COUNT(*) ignore NULL values. For example, the average salary is calculated only from the rows where a salary is stored (the total salary divided by the number of employees receiving a salary).
  • You can use the ISNULL function to make a group function treat NULL values as a substitute value. In the following example, NULL salaries are replaced with 0, so the average is calculated over all rows in the table (the total salary divided by the total number of rows):
SELECT       AVG(ISNULL(salary,0))
FROM         Employees
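
To make the difference concrete, here is a small sketch that compares both averages over three hypothetical salary values, one of which is NULL. The inline VALUES list and the alias e are assumptions used only to keep the example self-contained.

```sql
-- Three hypothetical salaries: 3000, 6000 and NULL.
SELECT AVG(salary)            AS avg_ignoring_nulls,  -- (3000 + 6000) / 2 = 4500
       AVG(ISNULL(salary, 0)) AS avg_nulls_as_zero    -- (3000 + 6000 + 0) / 3 = 3000
FROM   (VALUES (3000), (6000), (NULL)) AS e(salary);
```

Which of the two figures is appropriate depends on whether a missing salary should count as "unknown" or as zero.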

The SQL Server GROUP BY Clause

SELECT       column_name , group_function(column_name)
FROM         table_name
WHERE        condition
GROUP BY     column_name

So far, each group function described here has treated the table as one large group of data. In most cases you need to divide the table into smaller groups. Instead of getting the average salary of all employees in the Employees table, you would rather see, for example, the average salary for each department (the average salary of the HR department, the IT department, and so on).

You can use the SQL Server GROUP BY clause to divide the rows in a table into groups. Then you can use the group functions to retrieve summary information for each group.

SELECT      department_id , AVG(salary)
FROM        employees
GROUP BY    department_id

A Few Guidelines

All columns in the SQL Server SELECT clause that are not group functions must be in the GROUP BY clause

When you specify a group function (AVG) in a SELECT clause alongside other individual items (department_id), you must include a GROUP BY clause. The GROUP BY clause must list these individual items (department_id in this case); otherwise an error is generated.
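
The rule can be sketched as a rejected query next to its accepted form. The example assumes the tutorial's employees table with department_id and salary columns; the alias avg_salary is added here for readability.

```sql
-- Rejected by SQL Server: department_id is neither aggregated nor listed in GROUP BY.
-- SELECT   department_id, AVG(salary)
-- FROM     employees;

-- Accepted: every non-aggregated item in the SELECT list appears in GROUP BY.
SELECT   department_id, AVG(salary) AS avg_salary
FROM     employees
GROUP BY department_id;
```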

The SQL Server GROUP BY columns don’t have to be in the SQL Server SELECT clause

You can group by columns without listing them in the SQL Server SELECT clause, although the result is then hard to interpret. This example displays the average salary for each department without displaying the respective department numbers.

SELECT      AVG(salary)
FROM        employees
GROUP BY    department_id

You can list more than one column after the SQL Server GROUP BY clause

Sometimes you need to see results for groups within groups. For example, each department contains various positions (administrative positions, clerks, maintenance employees, and so on). The queries so far displayed one average per department; the query below displays the average for each department, broken down by position type:

SELECT      department_id , job_id , AVG(salary)
FROM        employees
WHERE       department_id IN (50, 80, 90)
GROUP BY    department_id , job_id

Using the SQL Server WHERE clause, you can exclude rows before dividing them into groups.

For example, if you need to display the average salary only for departments 50, 80, and 90:

SELECT      department_id , AVG(salary)
FROM        employees
WHERE       department_id IN (50, 80, 90)
GROUP BY    department_id

You cannot use the SQL Server WHERE clause to restrict groups

As seen in the last example, using the SQL Server WHERE clause you can restrict rows before dividing them into groups. However, it is not possible to specify group functions in a SQL Server WHERE clause, as that would result in an error.

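For example, the following attempt to list the departments where the average salary is higher than 5000 fails:

SELECT      department_id , AVG(salary)
FROM        employees
WHERE       AVG(salary) > 5000
GROUP BY    department_id

(error: a group function is not allowed in the WHERE clause)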

The HAVING Clause

The SQL Server HAVING clause filters the aggregated results produced by the SQL Server GROUP BY clause. In the same way that you use the SQL Server WHERE clause to restrict rows, you use the SQL Server HAVING clause to restrict groups.

SELECT       column_name , group_function(column_name)
FROM         table_name
WHERE        condition
GROUP BY     column_name
HAVING       condition

The departments where the average salary is higher than 5000:

SELECT      department_id , AVG(salary)
FROM        employees
GROUP BY    department_id
HAVING      AVG(salary) > 5000

You can use both the SQL Server WHERE and HAVING clauses in a single query. For example, among departments 50, 80, and 90, the following query returns the departments where the average salary is higher than 5000:

SELECT      department_id , AVG(salary)
FROM        employees
WHERE       department_id IN (50, 80, 90)
GROUP BY    department_id
HAVING      AVG(salary) > 5000

You can also filter on a group function other than the one that appears in the SQL Server SELECT clause:

SELECT      department_id , AVG(salary)
FROM        employees
WHERE       department_id IN (50, 80, 90)
GROUP BY    department_id
HAVING      MAX(salary) > 5000
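
The interplay of WHERE and HAVING is easier to follow with concrete rows. The sketch below uses a hypothetical temporary table; the department numbers follow the tutorial, while the table name #employees and the salaries are assumptions chosen so that the two HAVING conditions give different answers.

```sql
-- Hypothetical rows; the salaries are assumptions made for illustration.
CREATE TABLE #employees (department_id INT, salary INT);

INSERT INTO #employees (department_id, salary)
VALUES (50, 3000), (50, 4000),
       (80, 4000), (80, 8000),
       (90, 4000), (90, 5500),
       (60, 9000);

-- WHERE drops department 60 before grouping; HAVING then keeps only the groups
-- whose average exceeds 5000. Returns department 80 (average 6000).
SELECT   department_id, AVG(salary) AS avg_salary
FROM     #employees
WHERE    department_id IN (50, 80, 90)
GROUP BY department_id
HAVING   AVG(salary) > 5000;

-- Filtering on MAX instead returns departments 80 (average 6000) and 90 (average 4750).
SELECT   department_id, AVG(salary) AS avg_salary
FROM     #employees
WHERE    department_id IN (50, 80, 90)
GROUP BY department_id
HAVING   MAX(salary) > 5000;

DROP TABLE #employees;
```

Department 60 has the highest average in the sample data, yet it never appears, because the WHERE clause removes its rows before any group is formed.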

Which clause can restrict groups?

The HAVING clause is used to restrict group results.

Which clause is used with the GROUP BY clause to restrict the groups of returned rows to those where the condition is TRUE?

The SQL HAVING clause is used in combination with the GROUP BY clause to restrict the groups of returned rows to only those whose condition is TRUE.

Which clause is used to restrict the groups that will be included in a query result: LIKE, HAVING, WHERE, or HAVE?

The HAVING clause is used to restrict groups, based on aggregated results.

Which clause is used to filter data based on group functions?

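The HAVING clause filters groups based on the results of group functions. The FILTER clause, which some databases such as PostgreSQL support, instead filters the input rows of an individual aggregation function, which makes it more flexible than a WHERE clause that applies to the whole query.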
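
SQL Server does not implement the FILTER clause, but a CASE expression inside the aggregate gives the same per-function filtering. The following sketch assumes the tutorial's employees table; the job_id value 'IT_PROG' and the column aliases are hypothetical.

```sql
-- FILTER form, shown for comparison only (accepted by PostgreSQL, not by SQL Server):
-- SELECT   department_id,
--          AVG(salary) FILTER (WHERE job_id = 'IT_PROG') AS avg_it_salary
-- FROM     employees
-- GROUP BY department_id;

-- Conditional aggregation: rows that fail the test yield NULL, which AVG ignores.
SELECT   department_id,
         AVG(salary)                                       AS avg_salary,
         AVG(CASE WHEN job_id = 'IT_PROG' THEN salary END) AS avg_it_salary
FROM     employees
GROUP BY department_id;
```

Unlike a WHERE condition, this restricts only the one aggregate, so the unfiltered average can be returned in the same row.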