January 31, 2020
Genstat 23 Masterclass: data mining and advanced modelling
In this masterclass we will explore some of the data mining and more advanced statistical modelling facilities available in Genstat. As not all of these facilities are available through the menus, the session will begin with a brief recap of Genstat’s command language.
Topics will include:
- Genstat command language
- Data mining
  - association rules
  - self-organising maps
  - discriminant analysis
  - support vector machines
  - classification trees and random forests
  - k-nearest-neighbours classification
- Advanced modelling
  - specialised regression (e.g. lasso, quantile regression)
  - comparisons and contrasts in regression and REML
  - GLMMs and HGLMs: how do they differ, and what are their advantages and disadvantages?
  - new facilities for assessing fixed models in REML
- Generating reports and exporting results (i.e. interoperability)
The sessions will involve a mixture of examples and practicals, so please bring your laptops (ideally with Genstat 23 already installed).
Biographies of Workshop Presenters
Roger Payne leads the development of Genstat at VSN, now working part-time after 15 years in the full-time role of VSN’s Chief Science and Technology Officer. He has a degree in Mathematics and a PhD in Mathematical Statistics from the University of Cambridge, and is a Chartered Statistician of the Royal Statistical Society. Prior to joining VSN, Roger was a statistical consultant and researcher at Rothamsted, becoming their expert on the design and analysis of experiments, as well as leader of their statistical computing activities. He originally took over the leadership of Genstat there in 1985, when John Nelder retired. His other statistical interests include generalized and hierarchical generalized linear models, linear mixed models, and the study of efficient identification methods (with applications in particular to the identification of yeasts). Roger’s statistical research has resulted in 9 books with commercial publishers, as well as over 100 scientific papers. He holds a visiting professorship at Liverpool John Moores University, and also retains an honorary position at Rothamsted to help him keep in touch with practical statistics.
David Baird is a consultant statistician with 35 years’ experience and has been a Genstat developer for 25 years. He was a biometrician at AgResearch for 25 years before starting his own company, VSN NZ. He has worked in a wide range of disciplines, including biosecurity, entomology, agriculture, ecology, soil science, plant breeding and microarrays. His statistical interests include experimental design, spatial analysis, data mining and statistical modelling. For the last 9 years he has been the statistical consultant to New Zealand’s Earthquake Commission. In 2019 he was awarded an Alf Cornish Award for contributions to biometrics in Australasia. David has an MSc in Applied Statistics from the University of Reading and a PhD in Statistics from the University of Otago.
Vanessa Cave is an applied statistician interested in the application of statistics to the biosciences, in particular agriculture and ecology, and is a developer of the Genstat statistical software package. She has many years’ experience collaborating with scientists in the agricultural and environmental sciences, using statistics to solve real-world problems. As a biometrician, Vanessa provides expertise on experiment and survey design, data collection and management, statistical analysis, and the interpretation of statistical findings. Vanessa is also an active member of the Australasian statistical community, serving on the New Zealand Statistical Association committee and as president-elect of the Australasian Region of the International Biometric Society. She is also an editorial board member for the New Zealand Veterinary Journal, an associate editor for Agronomy Journal, and an honorary academic at the University of Auckland. Vanessa has an honours degree in Statistics from the University of Otago and a PhD in Statistics from the University of St Andrews.