Jaime Buitrago is a mechanical engineer and industrial systems engineer with 30 years of experience. He obtained his PhD in industrial and systems engineering from the University of Miami in August 2017, and his MSc in mechanical engineering from the Massachusetts Institute of Technology in 1988. His PhD dissertation was on the topic of short-term electrical load forecasting using artificial neural networks. Dr. Buitrago has been an entrepreneur for many years, starting several technology businesses in diverse fields, including software development and the automotive sector. His consulting clients have included multinational companies in the US, Europe and Latin America. His interests include using software tools for machine learning and clustering.
Project: Authorship Identification by Using a Machine Learning Approach
Given a sufficient amount of text tagged with author information, can we accurately produce a feature vector that describes an author’s style and serves as a fingerprint of it?
Main Results in Detail
Reuters 50/50 is a dataset of 5,000 news articles written by 50 different authors, and it serves as a benchmark for authorship identification. Wolfram’s standard Classify and FeatureExtraction functions were applied to full articles, sentences and normalized sentences to generate a baseline dimension-reduced vector space. These results were compared with those of an ad hoc neural network classifier that used two different pre-trained neural networks, GloVe 100 and ELMo Contextual, as feature extractors and further processed their output with various combinations of LSTM and GRU layers. The vector space plot produced from this classifier showed much better results than the baseline.
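To make the idea of a style feature vector concrete, here is a minimal sketch in Python using only the standard library. It is not the method used in the project, which relied on Wolfram's Classify and FeatureExtraction and on pre-trained GloVe and ELMo networks. As a stand-in, it assumes character trigram frequencies as the feature vector and a nearest-fingerprint rule based on cosine similarity. All function names and the toy texts are hypothetical, and with such tiny samples the vectors capture topic words as much as style.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def style_vector(text, n=3):
    """Return a unit-length sparse vector (a dict) of character n-gram counts."""
    counts = char_ngrams(text, n)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}
    return {gram: c / norm for gram, c in counts.items()}


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller vector
    return sum(weight * v.get(gram, 0.0) for gram, weight in u.items())


def author_fingerprints(labeled_texts, n=3):
    """Pool each author's texts and build one fingerprint vector per author."""
    pooled = {}
    for author, text in labeled_texts:
        pooled.setdefault(author, []).append(text)
    return {author: style_vector(" ".join(texts), n) for author, texts in pooled.items()}


def predict_author(text, fingerprints, n=3):
    """Return the author whose fingerprint is most similar to the text's vector."""
    vec = style_vector(text, n)
    return max(fingerprints, key=lambda author: cosine(vec, fingerprints[author]))


# Toy training data, invented for illustration only (not from Reuters 50/50).
training = [
    ("author_a", "Shares of the bank fell sharply on Tuesday, analysts said."),
    ("author_a", "Analysts said the bank's shares fell again on Wednesday."),
    ("author_b", "The striker scored twice as the team won the cup final."),
    ("author_b", "The team won again after the striker scored a late goal."),
]

fingerprints = author_fingerprints(training)
guess_a = predict_author("The bank's shares fell, analysts said on Friday.", fingerprints)
guess_b = predict_author("A late goal from the striker won the match for the team.", fingerprints)
print(guess_a, guess_b)
```

A learned extractor, such as the word embeddings followed by LSTM or GRU layers described above, replaces the hand-built trigram counts with a dense vector, but the comparison step between an unseen text and the per-author vectors can stay the same.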
As future work, the parameters of the neural networks in the model still need to be tuned, and extensive training with the ELMo Contextual neural network will be necessary to assess its impact.