Self-Intro

Hi, I’m George Zhu. I graduated from Kyushu University in Japan with a master’s degree in engineering.

I have more than 8 years of experience in the software development industry, including work as a data scientist at a consulting company in Japan, where I provided data-driven solutions.

I currently work as a freelancer, mainly in data science and software development.

Studying and working in Japan gave me a wide range of experience and taught me a great deal. It also gave me a clear perspective on how international teams can work together, which I draw on to help foreign companies enter the Chinese market.

Services

Data-Driven Software Development

Solutions using Machine Learning or Deep Learning

Visualization using Graph Databases (Neo4j) and Construction of Knowledge Graphs

Assistance for companies entering the Chinese market, including setting up and operating WeChat Official Accounts, WeChat Mini Programs, and similar channels

Projects

Data Science Projects

Anomaly detection for electric fans

This project used deep learning to detect anomalies in a factory environment, with the goal of providing a manufacturing client with an anomaly detection solution. Because abnormal data is hard to collect in a factory, we first chose unsupervised methods: a variational autoencoder (VAE), which is a deep generative model, and an Isolation Forest. The unsupervised methods did not perform well (about 60% accuracy), so we also fine-tuned a pre-trained VGG16 model to build a supervised classifier, which reached 98% accuracy. I was responsible for researching the most suitable machine learning techniques, collecting test data, developing demonstration programs, and reporting to the client.

Tools: Python3, Keras, GCP, Matplotlib, Pandas, Flask, Kafka, OpenCV
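To illustrate the Isolation Forest idea mentioned above, here is a minimal from-scratch sketch in plain Python. It is not the project's code (which used Keras on GCP) and it leaves out the VAE and VGG16 parts; in practice a library implementation such as scikit-learn's `IsolationForest` would be the usual choice. The function and parameter names are hypothetical. Each tree splits on a random feature at a random value, and points that get isolated after only a few splits receive scores close to 1.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def average_path_length(n):
    """c(n): average path length of an unsuccessful binary-search-tree search over n points."""
    if n > 2:
        return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n
    if n == 2:
        return 1.0
    return 0.0


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value until points are isolated."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # identical values on this feature: stop splitting here
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in points if p[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[dim] >= split], depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Depth at which x lands in a leaf, plus c(size) for leaves that were not fully split."""
    if "split" not in node:
        return depth + average_path_length(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def isolation_forest_scores(points, n_trees=100, sample_size=256, seed=0):
    """Anomaly score s = 2 ** (-mean_path / c(psi)); values near 1 suggest anomalies."""
    points = list(points)
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    psi = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(points, psi), 0, max_depth, rng) for _ in range(n_trees)]
    c = average_path_length(psi)
    return [2.0 ** (-(sum(path_length(x, t) for t in trees) / n_trees) / c) for x in points]
```

For example, scoring 50 random points in the unit square plus one point at `(10.0, 10.0)` should give the distant point the highest score. Random splits are used instead of learned ones because anomalies are few and different, so they tend to be separated early without any labels, which matches the project's constraint that abnormal data was hard to collect.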

Predictive analysis for sewage purification in sewers

Because of drainage flow, sewer reservoirs need to be cleaned from time to time. The amount of purifying agent required has been decided by operators based on experience, and it depends on uncontrollable external factors such as the weather. Too little agent fails to purify the water, while too much causes odor. This project used machine learning to predict how much purifying agent to add. I held regular meetings with the client to discuss project progress, analyzed the client's data with statistical methods and reported the findings, and used linear regression, decision trees, and other machine learning methods to analyze and select variables. I also used clustering to extract previously unknown features, wrote a web crawler to collect public data from a third-party website for modeling, and built a simple prediction application to streamline the client's operations.

Tools: Python3, Matplotlib, Pandas, Scikit-learn, Jupyter Notebook, Git, Docker
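The core idea of predicting a dosage while avoiding both under- and over-dosing can be sketched as a least-squares line plus a clamp. This is a simplified illustration under stated assumptions, not the project's model: it uses a single hypothetical predictor (for example rainfall) instead of the selected variables, and the `min_dose`/`max_dose` bounds are hypothetical operator-defined limits. The function names are also hypothetical.

```python
from statistics import fmean


def fit_dosage_model(xs, ys):
    """Ordinary least squares for dosage = slope * x + intercept (single predictor)."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_dosage(model, x, min_dose, max_dose):
    """Predict a dosage and clamp it: too little fails to purify, too much causes odor."""
    slope, intercept = model
    raw = slope * x + intercept
    return max(min_dose, min(max_dose, raw))
```

The clamp keeps a purely statistical prediction inside a range operators already trust, which is one simple way to encode the "too little / too much" constraint described above; a decision tree or multivariate regression from scikit-learn could replace the fitting step without changing the clamping logic.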

Proposal for a knowledge acquisition system based on a knowledge database

To expand the company's business in data analysis, we designed and developed a knowledge database system with a human-machine interface, built on a graph database (Neo4j). In this proof of concept, we used natural language processing to import blog posts from the company's website into Neo4j. A topic model clustered the topics in the blog posts, and the results were used to build the knowledge graph. For natural language queries entered by users, Neo4j Cypher queries were generated automatically through word segmentation, case analysis, and similar steps to retrieve potentially relevant results. Unlike full-text search, the goal was not to return exact matches but to surface undiscovered knowledge latent in the knowledge database. I was involved in the overall design and development of the system.

Tools: Python3, Neo4j, WordNet, Gensim, Cabocha
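A minimal sketch of turning a user's question into a parameterized Cypher query is shown below. It makes several assumptions that are not from the project: the graph schema `(:Blog)-[:HAS_TOPIC]->(:Topic)-[:HAS_KEYWORD]->(:Keyword {name})` is hypothetical, and a simple English regex tokenizer with a stopword list stands in for the Japanese word segmentation and case analysis (Cabocha) used in the original system. `build_query` and `STOPWORDS` are hypothetical names.

```python
import re

# Hypothetical stopword list; the real system used morphological and case analysis instead.
STOPWORDS = {"a", "an", "the", "and", "about", "of", "on", "for", "is", "are",
             "what", "which", "show", "me", "blogs", "posts"}

# Hypothetical schema: match topics linked to the keywords, then pull in other blogs
# that share those topics, so results go beyond exact keyword hits.
CYPHER_TEMPLATE = """
MATCH (b:Blog)-[:HAS_TOPIC]->(t:Topic)-[:HAS_KEYWORD]->(k:Keyword)
WHERE k.name IN $keywords
OPTIONAL MATCH (t)<-[:HAS_TOPIC]-(other:Blog)
WHERE other <> b
RETURN t.name AS topic,
       collect(DISTINCT b.title) AS matched_blogs,
       collect(DISTINCT other.title) AS related_blogs
LIMIT $limit
""".strip()


def build_query(question, limit=10):
    """Extract keywords from a question and return (cypher, parameters)."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    keywords = sorted({t for t in tokens if t not in STOPWORDS})
    return CYPHER_TEMPLATE, {"keywords": keywords, "limit": limit}
```

For example, `build_query("Show me blogs about graph databases and Neo4j", limit=5)` yields the keywords `["databases", "graph", "neo4j"]`. Keeping the keywords as query parameters rather than interpolating them into the Cypher string avoids injection problems and lets Neo4j reuse the query plan; with the official Neo4j Python driver the pair can be passed as `session.run(query, params)`.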

Infrastructure construction for customer behavior analysis

Use the graph database Neo4j to build a system for customer behavior analysis. Discuss the system design with the client and report progress weekly. Build a Neo4j instance on AWS EC2, import customer data into it, and link third-party data from other companies so that more customer features can be analyzed.

Tools: Python3, Pandas, Apache Spark, Docker, AWS
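Importing customer records into Neo4j is commonly done in batches with `UNWIND` and `MERGE`. The sketch below only prepares the batches; it assumes a hypothetical `Customer` label keyed by a hypothetical `customer_id` field, does not use Apache Spark, and does not reflect the project's actual data model. `IMPORT_QUERY` and `batched_import_params` are hypothetical names.

```python
from itertools import islice

# Hypothetical schema: one Customer node per customer_id; other fields become properties.
IMPORT_QUERY = """
UNWIND $rows AS row
MERGE (c:Customer {customer_id: row.customer_id})
SET c += row.properties
""".strip()


def batched_import_params(records, batch_size=1000):
    """Yield parameter dicts of at most batch_size rows for IMPORT_QUERY."""
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield {
            "rows": [
                {
                    "customer_id": r["customer_id"],
                    "properties": {k: v for k, v in r.items() if k != "customer_id"},
                }
                for r in batch
            ]
        }
```

Each yielded dict can be sent as one transaction, for example `session.run(IMPORT_QUERY, params)` with the official Neo4j Python driver. `MERGE` makes re-imports idempotent, so third-party data keyed by the same `customer_id` can be attached later without creating duplicate nodes.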

Software Development Projects

Development of smart meter management systems

Develop a smart meter management system that simplifies meter operations for electricity companies. Because electricity meters are distributed throughout the city, the system provides a range of remote operations, including collecting data from the meters, operating the meter switches, and switching them at specified time intervals. Work onsite with Mitsubishi's engineers to design and develop the system. Joined projects for two customers in Japan and one customer in Taiwan. Guide new team members. Handle basic design, detailed design, coding, and testing according to the client's requirements.

Tools: Java, Struts, Spring, Hibernate, JPA, DHTMLX, jQuery, Oracle, VoltDB, JavaScript

Offshore software development

Work onsite with the client's offshore development team, leading a team of 3-4 members who work with the client's engineers. Serve as a BSE (bridge system engineer): communicate with the Japanese development team, carry out detailed design, and promptly relay changes in the Japanese customers' requirements to the project. Coding, testing, and review.

Tools: Java, Struts, Spring, Hibernate, iBatis, Oracle, jQuery, PowerCenter

Japanese Blog Posts on Data Science (Written for My Previous Employer)

Publications

  • “XMLデータベースにおける構造要約索引を用いたTree Pattern問い合わせ処理方式に関する検討” [A Study on Tree Pattern Query Processing Using Structural Summary Indexes in XML Databases], Forum on Data Engineering and Information Management (DEIM), 2011.
  • “XSim: A New Method for Generating Simulation Quotient of XML Documents in a Relational Database”, Technical Committee on Data Engineering (DE), 2011.
  • “XSim: The First Method for Generating the Simulation Quotient of XML Documents in a Relational Database”, 2012 International Conference on Future Information Technology and Management Science & Engineering, Paper ID: 40.
  • “リレーショナルデータベースを用いた模倣索引の生成” [Generating Simulation Indexes Using a Relational Database], 64th Joint Conference of Electrical and Electronics Engineers in Kyushu, 2011.
Education

M.S. in Engineering (Advanced Information Technology), Kyushu University, Japan

Languages

Mandarin: Native

Japanese: Business level

English: Business level