Participant %7: Team CSV, Universidad Central "Marta Abreu" de Las Villas #20

Open
chenzimin opened this issue May 18, 2018 · 10 comments
Labels
participant Participant of the CodRep-competition

Comments

@chenzimin
Collaborator

Created for Team CSV (@cesarsotovalero) from the Universidad Central "Marta Abreu" de Las Villas for discussions. Welcome!

@chenzimin added the "participant" label (Participant of the CodRep-competition) on May 18, 2018
@monperrus
Collaborator

monperrus commented May 18, 2018 via email

@cesarsotovalero

cesarsotovalero commented May 18, 2018

My current scores, using only a very naive approach based on string comparison:

Score on dataset1: 0.1236735
Score on dataset2: 0.1096176

No machine learning yet.
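
A minimal sketch of what a naive string-comparison baseline could look like, assuming the task is to guess which line of a source file a given new line replaces. The thread does not show the actual implementation, so the function names `rank_lines` and `predict_line` and the use of `difflib` are illustrative assumptions only.

```python
from difflib import SequenceMatcher


def rank_lines(file_lines, new_line):
    """Return 1-based line numbers, most similar to new_line first."""
    target = new_line.strip()
    scored = []
    for number, line in enumerate(file_lines, start=1):
        # ratio() is a similarity in [0, 1]; 1.0 means identical strings.
        ratio = SequenceMatcher(None, line.strip(), target).ratio()
        scored.append((ratio, number))
    # Highest similarity first; ties go to the earlier line number.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [number for _, number in scored]


def predict_line(file_lines, new_line):
    """Guess the line most likely to be replaced by new_line (hypothetical helper)."""
    return rank_lines(file_lines, new_line)[0]
```

For example, with the lines `int a = 0;`, `int b = a + 1;`, `return b;` and the new line `int b = a + 2;`, the sketch picks line 2, because that line differs from the new one by a single character.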

@monperrus
Collaborator

Yes. The first 0.8 are easy to get (purely due to the data).

The remaining points are super hard.

Best score seen so far:

  • Dataset1: 0.114
  • Dataset2: 0.085

@cesarsotovalero

cesarsotovalero commented May 29, 2018

My last scores:

Dataset     Perfect Match   Score
Dataset 1   3867            0.11842962430821
Dataset 2   9833            0.108660931336428
Dataset 3   17197           0.0753167732657934

My current approach: string matching + parse checking

A related paper: A comparison of code similarity analysers
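
The "parse checking" step mentioned above can be pictured as a filter over candidate lines: substitute the new line at each candidate position and discard positions where the file no longer parses. The sketch below is only a rough stand-in under stated assumptions. It uses a bracket-balance test instead of a real Java parser, it ignores brackets inside strings and comments, and the names `brackets_balanced` and `filter_candidates` are hypothetical, not taken from the author's code.

```python
PAIRS = {")": "(", "]": "[", "}": "{"}


def brackets_balanced(source):
    """Crude stand-in for a parse check: True if (), [] and {} nest correctly."""
    stack = []
    for ch in source:
        if ch in "([{":
            stack.append(ch)
        elif ch in PAIRS:
            # A closer must match the most recent unclosed opener.
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    return not stack


def filter_candidates(file_lines, new_line, candidates):
    """Keep 1-based candidate lines whose replacement leaves brackets balanced."""
    kept = []
    for number in candidates:
        patched = file_lines[:number - 1] + [new_line] + file_lines[number:]
        if brackets_balanced("\n".join(patched)):
            kept.append(number)
    return kept
```

Combined with a similarity ranking, such a filter would only remove candidates. The ordering of the surviving lines still comes from the string matching.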

@chenzimin
Collaborator Author

Thanks, I have updated the rankings

@monperrus
Collaborator

monperrus commented May 31, 2018 via email

@cesarsotovalero

cesarsotovalero commented Aug 21, 2018

Hi everyone, here is an update of my scores for the preliminary ranking:

Dataset     Perfect Match   Score
Dataset1    3900            0.1111243868013270
Dataset2    9948            0.0995737723246198
Dataset3    17438           0.0631975953292782
Dataset4    15773           0.0769219481612277

My current approach is: string matching + parse checking + decision rules + heuristics

@monperrus
Collaborator

monperrus commented Aug 22, 2018 via email

@cesarsotovalero

cesarsotovalero commented Aug 22, 2018

Thanks @monperrus!!
However, my approach has performance issues. For instance, it takes almost 2 hours on Dataset1, which is far slower than the results of @tdurieux. I also think the accuracy (in terms of the loss function) needs to improve considerably to really win the competition. I'll keep working on that.
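
The thread does not state the loss formula. As a sketch, assuming the score is the mean over tasks of tanh(|predicted line - actual line|), with a loss of 1 when no prediction is given, it could be computed as follows. The function names `task_loss` and `mean_loss` are hypothetical.

```python
import math


def task_loss(predicted, actual):
    """Loss for one task (assumed formula); None means no prediction was made."""
    if predicted is None:
        return 1.0
    return math.tanh(abs(predicted - actual))


def mean_loss(predictions, answers):
    """Average the per-task losses; lower is better, 0 means all perfect matches."""
    losses = [task_loss(p, a) for p, a in zip(predictions, answers)]
    return sum(losses) / len(losses)
```

Under this assumed formula, a prediction that is off by one line already costs about 0.76, so most of the score comes from the number of perfect matches. That would be consistent with the tables reporting both columns.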

@tdurieux
Contributor

tdurieux commented Aug 22, 2018

Strangely, my technique is still better on Dataset 2 but worse on the others.

I still have some room for improvement, but I am very happy with the performance of my technique. It takes less than 10 minutes to get results on all datasets, which helps a lot when trying new improvements.


4 participants