Publication
A. Ramírez, J.A. Parejo, J.R. Romero*, S. Segura and A. Ruiz-Cortés. “Evolutionary composition of QoS-aware web services: a many-objective perspective”. Expert Systems with Applications, vol. 72, pp. 357-370. 2017.
Abstract
Web service based applications usually invoke services provided by third-parties in their workflow. The Quality of Service (QoS) provided by the invoked supplier can be expressed in terms of the Service Level Agreement specifying the values contracted for particular aspects like cost or throughput, among others. Hence, developers are required to scrutinize the service market in order to select those candidates that best fit with the expected composition focusing on different QoS aspects. This search problem, a.k.a. QoS-aware web service composition, is characterized by the presence of many diverse QoS properties to be simultaneously optimized from a multi-objective perspective. This paper explores the suitability of many-objective evolutionary algorithms for tackling the binding problem of web services on the basis of a real-world benchmark with 9 QoS properties. Then, a complete comparative study provides empirical evidence on the adequacy of the most recent and sophisticated techniques to achieve a better trade-off between all the QoS properties. Furthermore, an in-depth study shows that some algorithms are able to promote specific QoS properties while keeping high values for the rest of attributes, enabling appealing advantages for the application of many-objective evolutionary algorithms within the field of service oriented computation.
Highlights
- QoS-aware web service composition requires the simultaneous optimization of multiple QoS attributes.
- Having conflicting QoS properties requires computationally efficient approaches.
- A comparative experimental study of multi- and many-objective algorithms is presented.
- Many-objective proposals can promote certain QoS properties while preserving the overall trade-off.
Additional material
Experimental study
- Experiment #1. It considers web service compositions with a maximum of 10, 20, 30, 40, or 50 tasks, where each task contains a different set of candidate services. Combining these elements, a total of 15 problem instances have been generated, i.e. 3 instances per maximum number of tasks, each one associated with a different set of candidate services but sharing the same workflow (a data-layout sketch is given after this list).
- Experiment #2. In order to validate the conclusions drawn from Experiment #1, Experiment #2 serves to verify that the fixed parameter, i.e. the workflow, does not have a marked influence on the outcomes. Therefore, 15 different composition structures were generated for 3 representative sizes, i.e. 10, 30 and 50 tasks, leading to a total of 45 problem instances.
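The exact instance format is defined by the generator referenced below; purely as an illustration, the following Python sketch shows one plausible in-memory layout for such an instance (all names, sizes and the random QoS values are hypothetical):

```python
# Purely illustrative: the real instances were produced with the generator by
# Parejo et al. referenced below; all names, sizes and the random QoS values
# here are hypothetical.
import random

N_QOS_ATTRIBUTES = 9  # the benchmark uses 9 QoS properties per candidate service

def generate_instance(max_tasks: int, candidates_per_task: int, seed: int = 0):
    """Build a toy composition instance: for each task of the workflow, a list
    of candidate services, each described by a vector of QoS values in [0, 1]."""
    rng = random.Random(seed)
    n_tasks = rng.randint(1, max_tasks)
    return [
        [[rng.random() for _ in range(N_QOS_ATTRIBUTES)]
         for _ in range(candidates_per_task)]
        for _ in range(n_tasks)
    ]

# A candidate solution ("binding") selects one service index per task.
instance = generate_instance(max_tasks=10, candidates_per_task=20, seed=42)
binding = [random.randrange(len(candidates)) for candidates in instance]
```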
Problem instances
Problem instances for Experiments #1 and #2 are available for download as a ZIP file (855 KB).
All the problem instances used in the experimentation were generated with the instance generator proposed in: J.A. Parejo, S. Segura, P. Fernández and A. Ruiz-Cortés. “QoS-aware web services composition using GRASP with path relinking”. Expert Systems with Applications, vol. 41(9), pp. 4211-4223. 2014.
The QoS values of the candidate services have been extracted from the QWS dataset.
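For illustration only, the sketch below shows how QWS-style attribute values might be min-max scaled to a common [0, 1] range in which every objective is maximized; the attribute names are those of the public QWS dataset, but the exact scaling and aggregation used in the paper are not reproduced here.

```python
# Hedged sketch: min-max scaling of QWS-style attribute values so that every
# objective is maximized. Attribute names follow the public QWS dataset; the
# exact scaling/aggregation used in the paper is not reproduced here.
COST_ATTRIBUTES = {"Response Time", "Latency"}  # lower raw values are better
BENEFIT_ATTRIBUTES = {"Availability", "Throughput", "Successability",
                      "Reliability", "Compliance", "Best Practices",
                      "Documentation"}          # higher raw values are better

def normalize(value: float, lo: float, hi: float, attribute: str) -> float:
    """Min-max normalization so that 1.0 is always the preferred extreme."""
    if hi == lo:
        return 1.0
    scaled = (value - lo) / (hi - lo)
    return 1.0 - scaled if attribute in COST_ATTRIBUTES else scaled
```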
Experimental results
Results are available for download in Excel format:
- Experiment #1 (79 KB)
- Experiment #2 (172 KB)
These files contain the mean and standard deviation of the QoS values of the solutions belonging to the Pareto sets returned by each algorithm, as well as the quality indicators used for the statistical validation.
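As a rough guide to how such summaries can be reproduced, the sketch below computes the per-objective mean and standard deviation of a Pareto set together with one common formulation of the spacing indicator (Schott's metric over minimum L1 distances); it is an illustrative approximation, not the exact tooling used for the paper.

```python
# Illustrative approximation of the per-algorithm summaries in these files:
# per-objective mean and standard deviation over a Pareto set, plus one common
# formulation of the spacing indicator (Schott's metric).
import numpy as np

def qos_summary(pareto_front):
    """pareto_front: array of shape (n_solutions, n_objectives)."""
    front = np.asarray(pareto_front, dtype=float)
    return front.mean(axis=0), front.std(axis=0)

def spacing(pareto_front) -> float:
    front = np.asarray(pareto_front, dtype=float)
    n = len(front)
    if n < 2:
        return 0.0
    # d_i: L1 distance from each solution to its nearest neighbour in the set
    dists = np.abs(front[:, None, :] - front[None, :, :]).sum(axis=2)
    np.fill_diagonal(dists, np.inf)
    d = dists.min(axis=1)
    return float(np.sqrt(((d.mean() - d) ** 2).sum() / (n - 1)))
```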
Statistical tests
Experiment #1
Friedman and Holm tests
Comparison of the algorithms in terms of hypervolume
i | Algorithm | Ranking | z | p | alpha/i | Hypothesis |
---|---|---|---|---|---|---|
7 | NSGA-III | 8.0000 | 7.3790 | 1.5945E-13 | 0.0071 | Rejected
6 | SPEA2 | 6.1333 | 5.2920 | 1.2097E-07 | 0.0083 | Rejected |
5 | GrEA | 6.0000 | 5.1430 | 2.7045E-07 | 0.0100 | Rejected |
4 | MOEA/D | 4.8000 | 3.8013 | 1.4393E-04 | 0.0125 | Rejected |
3 | IBEA | 4.2667 | 3.2050 | 1.3505E-03 | 0.0167 | Rejected |
2 | NSGA-II | 3.4000 | 2.2361 | 2.5347E-02 | 0.0250 | Accepted |
1 | HypE | 2.0000 | 0.6708 | 5.0233E-01 | 0.0500 | Accepted |
0 | e-MOEA | 1.4000 | – | – | – | – |
Friedman test:
Iman-Davenport statistic (distributed according to an F-distribution with 7 and 98 degrees of freedom): 63.1879
Critical value at the significance level (alpha=0.01): 2.8272
Holm test:
Holm test rejects those hypotheses that have a p-value < 0.025.
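For readers who want to reproduce this kind of table, the sketch below outlines the usual pipeline under standard formulations: average Friedman ranks per algorithm, the Iman-Davenport correction of the Friedman statistic, and Holm's step-down comparison against the best-ranked (control) algorithm. It is a generic reimplementation, not the exact tool used to produce the numbers above.

```python
# Generic reimplementation, under standard formulations, of the statistics in
# the tables above: average Friedman ranks, the Iman-Davenport correction and
# Holm's step-down post-hoc test against the best-ranked (control) algorithm.
import numpy as np
from scipy.stats import norm, rankdata

def friedman_holm(scores, minimize=False, alpha=0.05):
    """scores: array of shape (n_instances, n_algorithms) holding a quality
    indicator (e.g. hypervolume, where higher is better, so minimize=False)."""
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape

    # Rank the algorithms on every instance (rank 1 = best) and average.
    ranks = np.vstack([rankdata(row if minimize else -row) for row in scores])
    avg = ranks.mean(axis=0)

    # Friedman statistic and its Iman-Davenport correction, F-distributed
    # with (k - 1) and (k - 1)(n - 1) degrees of freedom.
    chi2 = 12 * n / (k * (k + 1)) * (np.sum(avg ** 2) - k * (k + 1) ** 2 / 4)
    iman_davenport = (n - 1) * chi2 / (n * (k - 1) - chi2)

    # Post-hoc comparison of every algorithm against the best-ranked control.
    control = int(np.argmin(avg))
    se = np.sqrt(k * (k + 1) / (6 * n))
    z = (avg - avg[control]) / se
    p = 2 * norm.sf(np.abs(z))

    # Holm step-down: sort the k-1 p-values ascending and compare the j-th
    # smallest to alpha / (k - 1 - j); stop rejecting at the first acceptance.
    order = [i for i in np.argsort(p) if i != control]
    rejected, stop = {}, False
    for j, idx in enumerate(order):
        stop = stop or p[idx] > alpha / (k - 1 - j)
        rejected[int(idx)] = not stop
    return avg, iman_davenport, z, p, rejected
```

With k = 8 algorithms, the degrees of freedom are (7, 98) for Experiment #1 (n = 15 instances) and (7, 308) for Experiment #2 (n = 45 instances), as reported in the tables.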
Comparison of the algorithms in terms of spacing
i | Algorithm | Ranking | z | p | alpha/i | Hypothesis |
---|---|---|---|---|---|---|
7 | IBEA | 8.0000 | 7.8262 | 5.0269E-15 | 0.0071 | Rejected |
6 | HypE | 6.5333 | 6.1865 | 6.1532E-10 | 0.0083 | Rejected |
5 | GrEA | 6.4000 | 6.0374 | 1.5663E-09 | 0.0100 | Rejected |
4 | NSGA-III | 4.7333 | 4.1740 | 2.9931E-05 | 0.0125 | Rejected |
3 | e-MOEA | 4.0000 | 3.3541 | 7.9623E-04 | 0.0167 | Rejected |
2 | SPEA2 | 2.7333 | 1.9379 | 5.2632E-02 | 0.0250 | Accepted |
1 | MOEA/D | 2.6000 | 1.7889 | 7.3638E-02 | 0.0500 | Accepted |
0 | NSGA-II | 1.0000 | – | – | – | – |
Friedman test:
Iman-Davenport statistic (distributed according to an F-distribution with 7 and 98 degrees of freedom): 202.1765
Critical value at the significance level (alpha=0.01): 2.8272
Holm test:
Holm test rejects those hypotheses that have a p-value < 0.025.
Cliff’s Delta test (effect size)
Cliff’s Delta test results in raw format (.txt):
- Hypervolume (9 KB)
- Spacing (9 KB)
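The files above report Cliff's delta values; as a reference, a minimal formulation of this non-parametric effect size is sketched below (a value of +1 or -1 indicates that all pairwise comparisons favour one algorithm, 0 indicates complete overlap between the two samples).

```python
# Reference formulation of Cliff's delta, the non-parametric effect size in
# the raw files above.
import numpy as np

def cliffs_delta(a, b) -> float:
    """a, b: 1-D sequences of a quality indicator (e.g. hypervolume) obtained
    by two algorithms over the same runs/instances."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diff = a[:, None] - b[None, :]
    return float((np.sum(diff > 0) - np.sum(diff < 0)) / (a.size * b.size))
```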
Experiment #2
Friedman and Holm tests
Comparison of the algorithms in terms of hypervolume
i | Algorithm | Ranking | z | p | alpha/i | Hypothesis |
---|---|---|---|---|---|---|
7 | NSGA-III | 8.0000 | 12.4366 | 1.6544E-35 | 0.0071 | Rejected
6 | SPEA2 | 6.4222 | 9.3812 | 6.5210E-21 | 0.0083 | Rejected |
5 | GrEA | 5.7778 | 8.1333 | 4.1788E-16 | 0.0100 | Rejected |
4 | IBEA | 4.6667 | 5.9816 | 2.2095E-09 | 0.0125 | Rejected |
3 | MOEA/D | 4.6444 | 5.9386 | 2.8751E-09 | 0.0167 | Rejected |
2 | NSGA-II | 2.9556 | 2.6681 | 7.6292E-03 | 0.0250 | Rejected |
1 | HypE | 1.9556 | 0.7316 | 4.6444E-01 | 0.0500 | Accepted |
0 | e-MOEA | 1.5778 | – | – | – | – |
Friedman test:
Iman-Davenport statistic (distributed according to an F-distribution with 7 and 308 degrees of freedom): 220.9533
Critical value at the significance level (alpha=0.01): 2.6977
Holm test:
Holm test rejects those hypotheses that have a p-value < 0.05.
Comparison of the algorithms in terms of spacing
i | Algorithm | Ranking | z | p | alpha/i | Hypothesis |
---|---|---|---|---|---|---|
7 | IBEA | 8.0000 | 13.5554 | 7.3568E-42 | 0.0071 | Rejected |
6 | HypE | 6.6222 | 10.8874 | 5.2671E-25 | 0.0083 | Rejected |
5 | GrEA | 6.3333 | 10.3280 | 5.2671E-25 | 0.0100 | Rejected |
4 | NSGA-III | 4.3333 | 6.4550 | 1.0824E-10 | 0.0125 | Rejected |
3 | e-MOEA | 3.9778 | 5.7664 | 8.0963E-09 | 0.0167 | Rejected |
2 | SPEA2 | 3.2444 | 4.3463 | 1.3842E-05 | 0.0250 | Rejected |
1 | MOEA/D | 2.4889 | 2.8832 | 3.9363E-03 | 0.0500 | Rejected |
0 | NSGA-II | 1.0000 | – | – | – | – |
Friedman test:
Iman-Davenport statistic (distributed according to an F-distribution with 7 and 308 degrees of freedom): 453.6330
Critical value at the significance level (alpha=0.01): 2.6977
Holm test:
Holm test rejects all the hypotheses.
Cliff’s Delta test (effect size)
Cliff’s Delta test results in raw format:
- Hypervolume (10 KB)
- Spacing (9 KB)