Became part of big old project? How to not fuck up all codebase? Tips & tricks and some practices.

C#, Architectural pattern, Best Practices,

Hi Everyone.
In the beginning, being a part of any big project we know nothing about the codebase, of course, we have some guesswork and we know architecture approaches which must be in there but it is always not as we expected. We face a huge amount of code and the help of other devs are not always help because they already work on the project and explanation is based on their experience and we have to investigate the codebase by ourselves. Nevertheless, it is possible to follow some simple rules to handle your tasks on a new project without messing up. This post most for juniors devs but hope experiences devs also can find something interesting.

Look at my code

First, do not think 'my code is bad' if it works it is not bad code, it can be not clear, not optimize, not stylish, etc... there is always be a way to improve and that all is subjective things. You should show it someones and if they understand it that code is good enough. There are always someone's around you who judge your code because they have more skills or they just think to have more skills but actually they not. Let's consider some simple rules to keep code readable and supportable. Somehow, all of these present in design principles like OOP, SOLID, KISS, etc.


Do not overcomplicate the code
A simple decision is always better than a complex if it solves the task. Do not implement complex architecture for the future reusing if there is no requirement to do it. Just keep it simple - K.I.S.S.

The client always choose the easy way
If your code possible to use improper it will be used that way if that way is easier - 99.9% chance. Try to design code so that is impossible to use it wrong. It is not simple but you should think like a client of your code. Think about what is unclear about usage. Can the client use it without knowing about internal logic? Does the usage of public API intuitive? It is about applying Encapsulation in the right way.

Don't like creating a unit test, just thinking about it!
Very good practice, to think about how you will testing code even if you don't add any unit tests just think how it should be. Improving this skill will allow you to keep low coupling and make better composition/reusing of your code. You will see how all SOLID principles just become visible or noticeable in your code.

Many dependencies
If you see that your class must have many dependencies, look at it again maybe some of the functions should be out of this class. Keep the single-responsibility principle.

Performance is not your goal
Do not improve performance before having your code worked.

Do not overload constructor
Constructors must not have heavy logic or async/await or invoking virtual\abstract methods. Use patterns as Builder, Creator

Be careful with static
Careful means - not use if possible or encapsulate it inside a module/class to avoid public static. Here is the good article about it

Codebase base navigation skill

We have many tools inside IDE to search, refactor, and highlights pitfalls in the code. I will talk about MS Visual Studio without any extensions as ReSharper it is most general. So, it is very important to know how to efficiently use IDE. Debugging with Watch and Immediate Window, investigate async problems with Threads and breakpoints with conditions.
Find All References is the most often used option to code navigate and most helpful to understand how and where is used certain class/method/property.

Looking at codebase

Before creating new help functions or classes research codebase because in 99% of cases what you need has been already created. Use them, don't create duplicates.

Encoding, formating, date

Just use Encoding.UTF8 for all case explicit, if you don't have requirements to use another. Explicit means like this:

SerializerFactory.Deserialize(path, Encoding.UTF8);
SerializerFactory.Serialize(path, obj, Encoding.UTF8);

Always use the correct formatting of numeric values to string. If you show it in UI use local CultureInfo.CurrentCulture. If you send/serialize/store somewhere as text use only CultureInfo.InvariantCulture. It'll solve a lot of problems in the future. Having numerics in InvariantCulture you able to convert to any local representation in any culture.

float num = 0.1234f;
var invNumText = num.ToString(CultureInfo.InvariantCulture);
//
float invNum = float.Parse(invNumText, CultureInfo.InvariantCulture);
var german = invNum.ToString(new CultureInfo("de")); //0,1234f;
var english = invNum.ToString(new CultureInfo("en"));  //0.1234f;
//
var deNumText = num.ToString(new CultureInfo("de"));
//doesn't work correctly, result will be - 1234
//the only certain culture that was used in text converting can be used.
invNum = float.Parse(invNumText, CultureInfo.InvariantCulture);
invNum = float.Parse(invNumText, new CultureInfo("en")); 

// this is bad but this works var fixedText = deNumText.Replace(",", "."); //invNum looks right invNum = float.Parse(fixedText, CultureInfo.InvariantCulture);

As for numerics Invariant, for the date is UTC. Only for UI representation convert date to local in other cases use only UTC.

Do not extend the method with boolean parameters, add a new one.

void OpenOperation(Operation op); // existed method 
void OpenOperation(Operation op, bool isNew = false); //extended

This change is easy to add and we don't need to change any usage of this method. But, this is bad.

  • Mixing different logics in one method
  • OpenOperation contains two handlings for default case and new case. That means modification for one case potentially may regression of an another.
  • Not intuitive public API.
  • Using Find All References shows the usage list but checking what case is called we need to look at each reference.

Better way.

void OpenNewOperation(Operation op);
void OpenOperation(NewOperation op);

Naming is important

Specific naming is better than common

public Data GetData(); //not clear
// Better
public RegisteredUserData GetRegisteredUserData();
class RegisteredUserAccount {
  public RegisteredUserData GetUserData();
}
public Task CallToMyService(); // --> BAD
public Task CallToMyServiceAsync(); // --> GOOD

Do not use more than three input values

public Geometry BuildCylinder(float radius, float height, bool close, float minTriangulationEdge); // --> BAD
struct CylinderBuildData{
  public readonly float Radius;
  public readonly float Height;
  public readonly bool Close;
  public readonly float MinTriangulationEdge;
}
public Geometry BuildCylinder(CylinderBuildData data); // --> GOOD

It is not mandatory but it will make your code easy to understand.

Reduce the variation of code usage

It is because - "The client always choose the easy way" but here it is also about immutable data objects. We always have some data in input and output and data that transfers through layers. Has these data in an immutable state make code less fragile and knowledge, where is exactly the place of changing, save a lot of time to research. As a result, we achieve reducing incorrect usages and thread-safe of data-objects.

  • Use uint instead of int if your code does not support negative values
  • Use IEnumerable/ReadOnlyCollection as output/input values instead of List if the code doesn't modify a collection
  • Use struct instead of class if all data is simple types or all properties as readonly.

Try to avoid simple types

There are we talking about public API and special values like ID, Key maybe Type like this.

UserData GetUserDataById(int id);
UserData GetUserDataById(string id);

If I want to analyze how and where ID is used in case int ID I can't. Using Find All References by int is useless. I can only check usages of GetUserDataById or Find All by 'int id'. If ID is a custom type we can more flexible to investigate its usages in the codebase.

readonly struct UserId : IEquatable<UserId>{
  public UserId(int id){}
  public override bool Equals(object obj);
  public override int GetHashCode();
  public override string ToString();
  public bool Equals(UserId other);
}

how to override methods see here.

Use appropriate Lambda expressions for your case

Using correct expressions allow having condition and execution in one line.

var single = list.Single(x => x.Name == name);

If your logic depends on conditions like 'collection is empty' or 'more than one element satisfies' or 'the source sequence is empty' than .Single(..) checks these conditions and throws Exception otherwise. For example:

list.SingleOrDefault(x => x.Type == MyType).DoSomething(); //--> BAD
list.Single(x => x.Type == MyType).DoSomething(); // --> GOOD

In the first case, we got NullReferenceException this is not correct because the problem is the missing value by the condition. In the second case, we got InvalidOperationException with 'Sequence contains more than one element' and this is the correct exception.
Or in case of value is allowed to be absent.

var single = list.SingleOrDefault(x => x.Type == MyType);
if(single != null){
  single.DoSomething(); // --> GOOD
}
list.SingleOrDefault(x => x.Type == MyType)?.DoSomething(); // GOOD

Follow correct Set/Get semantic of methods

public int SomeValue(); // --> BAD

The ability to get value is not described by the semantic of the method.

public int SomeValue { get; }// --> GOOD
public int GetSomeValue();// --> GOOD
public int TakeSomeValue();// --> GOOD

Input value in getter

public int GetSomeValue(int val) ;// --> BAD

What do we expect when we see input value in the getter method?

  • Is val a default value?
  • public ValueStruct GetSomeValueOrDefault(ValueStruct _default);
  • Is val Max/Min?
  • public IEnumerable<int> GetSomeValuesLessThan(int maxVal);
  • Is val used in some calculation and then we get the result of the method?
  • public int GetCalulatedValueBy(float factor);

What I see is not clear semantic of passing val in this case.

Setter

public void SomeValue(int val);// --> BAD

Not clear, Does that method just set value or executes more complex logic and then set value?

public int SetSomeValue(int val);// --> BAD

Why does the setter return value? It will be the new value or previous or value after some specific logic?

public void SetSomeValue(int val);// --> GOOD
public void ApplySomeValue(int val) {  }// --> GOOD

Use Try get/set semantic

public bool TrySetValue(int val);
public bool TryGetValue();

If your setter/getter has executing restrictions show it to the client by using bool Try semantic.
And also do not use complex logic inside get/set properties use methods instead.

If the method does not expect some values, check them and throw an exception

float Calculate(float a, int b){
  if(b == 0){
    throw new InvalidArgumentException(nameof(b));
  }
  return a / b;
}

Try do not to eliminate the whole function by the condition like this.

void Handle(Dto dto){
  if(dto.IsValid){
    //do something
  }
}

This method does nothing in case data is invalid, the condition hides the function and problem, validation of data should be outside of the method it is the responsibility of the client code who invokes this method. Otherwise, a client doesn't understand why Handle is not working.

void Handle(Dto dto){
  if(!dto.IsValid){
    throw new InvalidArgumentException(nameof(dto));
  }
}

Or use bool Try and the client will know this function executed or not, do not hide this information.

bool TryHandle(Dto dto){
  if(!dto.IsValid){
    return false;
  }
  ...
  return true;
}

And this the worst case, never do like this.

Dto UpdateSomethingInDto(Dto dto){
  if(dto.IsValid){
    //do something
  }
  return null;// hiding and creating stupid NullReferenceException in another place
  return dto; // perfect hiding and moving a problem to another place
}

In most cases conditions like if(obj != null) just passes the problem NullRefferenceException to next module/code. The exception will be moved from the place where it exactly exists to the place where it was not expected this increasing time to investigate and fix.

Do not reuse existed classes for not applicable cases, create a new one

class MessageToRequest {
  public byte[] Data;
} 
public void SendMessage(MessageToRequest data);

Have to add GetMessage with the same data as SendMessage has.
Do not make like this.

public MessageToRequest GetMessage();

It is wrong semantic and it is increasing the complexity of code for no reason.
For example:

//here the client does not aware that response is MessageRequest	
//and it seems correct
var response = service.GetMessage();
//an increasing level of obfuscation, MessageToRequest is turned into response message
processor.HandleMessage(response);
//looking at MessageToRequest dev expects Request data not response
void HandleMessage(MessageToRequest response){
//have no idea what should be done with response-request message!!!
}

Here is totally not clear what handling must be.

class ResponseMessage {
	public byte[] Data;
} 
public ResponseMessage GetMessage();

It is not a problem to have several classes with the same properties as public byte[] Data. You always can create a base class for both but still keep the semantics of the types (see 'Specific naming is better than common')

Never return NULL as a collection

IEnumerable<MyItem> GetItems(){
  ...
  if(/*something goes wrong*/){
    return null;  // --> BAD
    return Enumerable.Empty<MyItem>();  // --> GOOD
    throw;  // --> GOOD in case impossible to execute the rest of code
  }
  ...
}

Why is this bad? Because of collections usage. Collections are created for iterating on them think as it is a container there is can be elements or not but the collection must be existed.

foreach(var i in GetItems()) // --> NullReferenceException
if(GetItems().Any()) // --> NullReferenceException -> if(GetItems() != null && GetItems().Any())
return GetItems().ToArray();// --> NullReferenceException
var found = GetItems().Where(x => x.IsValid);// --> NullReferenceException

To fix all these we must to add != null but it is not all problems.

  • When clients use collections from your module they expect an enumerable and the condition on null is always lost.
  • All operations of the collection must be executable and not throwing NullRefferenceException .
  • All operations of the collection must be executable and not fail on NullReferenceException otherwise we can't use them as pipeline Collection.Shuffle().ShuffleAgain().etc;

to be continue ...